% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/schema.R
\docType{class}
\name{Schema}
\alias{Schema}
\alias{schema}
\title{Schema class}
\usage{
schema(...)
}
\arguments{
\item{...}{a named list of \link[=data-type]{data types}, or a list of
\link[=field]{fields}, defining the fields of the schema}
}
\description{
A \code{Schema} is a list of \link{Field}s, which map names to Arrow
\link[=data-type]{data types}. Create a \code{Schema} when you want to convert
an R \code{data.frame} to Arrow but don't want to rely on the default mapping of
R types to Arrow types, such as when you want to choose a specific numeric
precision, or when creating a \link{Dataset} and you want to ensure a specific
schema rather than inferring it from the various files.

Many Arrow objects, including \link{Table} and \link{Dataset}, have a \verb{$schema} method
(active binding) that lets you access their schema.
}
\section{Methods}{

\itemize{
\item \verb{$ToString()}: convert to a string
\item \verb{$field(i)}: returns the field at index \code{i} (0-based)
\item \verb{$GetFieldByName(x)}: returns the field with name \code{x}
\item \verb{$WithMetadata(metadata)}: returns a new \code{Schema} with the key-value
\code{metadata} set. Note that all list elements in \code{metadata} will be coerced
to \code{character}.
}
}

\section{Active bindings}{

\itemize{
\item \verb{$names}: returns the field names (called in \code{names(Schema)})
\item \verb{$num_fields}: returns the number of fields (called in \code{length(Schema)})
\item \verb{$fields}: returns the list of \code{Field}s in the \code{Schema}, suitable for
iterating over
\item \verb{$HasMetadata}: logical: does this \code{Schema} have extra metadata?
\item \verb{$metadata}: returns the key-value metadata as a named list.
Modify or replace by assigning in (\code{sch$metadata <- new_metadata}).
All list elements are coerced to string.
}
}

\section{R Metadata}{


When converting a \code{data.frame} to an Arrow Table or RecordBatch, attributes
from the \code{data.frame} are saved alongside tables so that the object can be
reconstructed faithfully in R (e.g. with \code{as.data.frame()}). This metadata
can be at the top level of the \code{data.frame} (e.g. \code{attributes(df)}), at the
column level (e.g. \code{attributes(df$col_a)}), or, for list columns only, at the
element level (e.g. \code{attributes(df[1, "col_a"])}). For example, this allows
for storing \code{haven} columns in a table and being able to faithfully
re-create them when pulled back into R. This metadata is separate from the
schema (column names and types), which is compatible with other Arrow
clients. The R metadata is only read by R and is ignored by other clients
(e.g. Pandas has its own custom metadata). This metadata is stored in
\verb{$metadata$r}.

Since Schema metadata keys and values must be strings, this metadata is
saved by serializing R's attribute list structure to a string. If the
serialized metadata exceeds 100Kb in size, it is compressed by default,
starting in version 3.0.0. To disable this compression (e.g. for tables
that need to be compatible with Arrow versions before 3.0.0 and that include
large amounts of metadata), set the option \code{arrow.compress_metadata} to
\code{FALSE}. Files with compressed metadata are readable by older versions of
arrow, but the metadata is dropped.
}

\examples{
\dontshow{if (arrow_available()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
tab1 <- arrow_table(df)
tab1$schema
tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
tab2$schema
\dontshow{\}) # examplesIf}
}