author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-21 11:54:28 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-21 11:54:28 +0000
commit e6918187568dbd01842d8d1d2c808ce16a894239
tree 64f88b554b444a49f656b6c656111a145cbbaa28 /src/arrow/r/man/Schema.Rd
parent Initial commit.
Adding upstream version 18.2.2.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/arrow/r/man/Schema.Rd')
-rw-r--r-- src/arrow/r/man/Schema.Rd 86
1 files changed, 86 insertions, 0 deletions
diff --git a/src/arrow/r/man/Schema.Rd b/src/arrow/r/man/Schema.Rd
new file mode 100644
index 000000000..7322c70f2
--- /dev/null
+++ b/src/arrow/r/man/Schema.Rd
@@ -0,0 +1,86 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/schema.R
+\docType{class}
+\name{Schema}
+\alias{Schema}
+\alias{schema}
+\title{Schema class}
+\usage{
+schema(...)
+}
+\arguments{
+\item{...}{named list of \link[=data-type]{data types}, or a list of
+\link[=field]{fields}, defining the schema}
+}
+\description{
+A \code{Schema} is a list of \link{Field}s, which map names to
+Arrow \link[=data-type]{data types}. Create a \code{Schema} when you
+want to convert an R \code{data.frame} to Arrow but don't want to rely on the
+default mapping of R types to Arrow types, such as when you want to choose a
+specific numeric precision, or when creating a \link{Dataset} and you want to
+ensure a specific schema rather than having it inferred from the individual files.
+
+Many Arrow objects, including \link{Table} and \link{Dataset}, have a \verb{$schema}
+active binding that returns their schema.
+}
+\section{Methods}{
+
+\itemize{
+\item \verb{$ToString()}: returns a string representation of the \code{Schema}
+\item \verb{$field(i)}: returns the field at index \code{i} (0-based)
+\item \verb{$GetFieldByName(x)}: returns the field with name \code{x}
+\item \verb{$WithMetadata(metadata)}: returns a new \code{Schema} with the key-value
+\code{metadata} set. Note that all list elements in \code{metadata} will be coerced
+to \code{character}.
+}
+}
+
+\section{Active bindings}{
+
+\itemize{
+\item \verb{$names}: returns the field names (also available as \code{names(Schema)})
+\item \verb{$num_fields}: returns the number of fields (also available as \code{length(Schema)})
+\item \verb{$fields}: returns the list of \code{Field}s in the \code{Schema}, suitable for
+iterating over
+\item \verb{$HasMetadata}: logical: does this \code{Schema} have extra metadata?
+\item \verb{$metadata}: returns the key-value metadata as a named list.
+Modify or replace by assigning in (\code{sch$metadata <- new_metadata}).
+All list elements are coerced to \code{character}.
+}
+}
+
+\section{R Metadata}{
+
+
+When converting a \code{data.frame} to an Arrow Table or RecordBatch, attributes
+from the \code{data.frame} are saved alongside the table so that the object can be
+reconstructed faithfully in R (e.g. with \code{as.data.frame()}). This metadata
+can be stored at the top level of the \code{data.frame} (e.g. \code{attributes(df)}),
+at the column level (e.g. \code{attributes(df$col_a)}), or, for list columns only,
+at the element level (e.g. \code{attributes(df[1, "col_a"])}). For example, this
+allows \code{haven} columns to be stored in a table and faithfully re-created
+when pulled back into R. This metadata is separate from the schema (column names
+and types), which is compatible with other Arrow clients. The R metadata is read
+only by R and is ignored by other clients (e.g. pandas has its own custom
+metadata). This metadata is stored in \verb{$metadata$r}.
+
+Since Schema metadata keys and values must be strings, this metadata is
+saved by serializing R's attribute list structure to a string. Starting in
+version 3.0.0, serialized metadata larger than 100 kB is compressed by default.
+To disable this compression (e.g. for tables that include large amounts of
+metadata and must remain fully readable by arrow versions earlier than 3.0.0),
+set the option \code{arrow.compress_metadata} to \code{FALSE}.
+Files with compressed metadata are readable by older versions of arrow, but
+the metadata is dropped.
+}
+
+\examples{
+\dontshow{if (arrow_available()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
+df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+tab1 <- arrow_table(df)
+tab1$schema
+tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+tab2$schema
+\dontshow{\}) # examplesIf}
+}
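The methods, active bindings, and R metadata behaviour documented in this file can be exercised with a short script. This is a minimal sketch that assumes the arrow R package is installed; the field names (id, name, score), the metadata keys (source, version), and the "note" attribute are hypothetical values chosen only for illustration.

```r
# Minimal sketch: assumes the arrow R package is installed.
# Field names, metadata keys and the "note" attribute are illustrative only.
library(arrow)

# Build a schema from named data types
sch <- schema(id = int32(), name = utf8(), score = float64())

# Methods
cat(sch$ToString(), "\n")              # human-readable description
first <- sch$field(0)                  # 0-based index: the "id" field
by_name <- sch$GetFieldByName("name")  # look up a field by its name

# Active bindings (names() and length() use these)
print(sch$names)
print(sch$num_fields)
print(sch$HasMetadata)                 # a freshly built schema has no metadata

# WithMetadata() returns a new Schema; values are coerced to character
sch2 <- sch$WithMetadata(list(source = "example", version = 2))
print(sch2$metadata$version)           # stored as the string "2"

# R metadata: a top-level data.frame attribute survives the round trip
df <- data.frame(x = 1:3)
attr(df, "note") <- "hello"
tab <- arrow_table(df)
print(names(tab$metadata))             # includes "r"
df2 <- as.data.frame(tab)
print(attr(df2, "note"))

# Turn off compression of large R metadata for one conversion, then restore
old <- options(arrow.compress_metadata = FALSE)
tab_uncompressed <- arrow_table(df)
options(old)
```

Because `WithMetadata()` returns a new object, `sch` itself is left without metadata; use the `$metadata` active binding instead when the schema should be modified in place.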