author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-21 11:54:28 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-21 11:54:28 +0000 |
commit | e6918187568dbd01842d8d1d2c808ce16a894239 | |
tree | 64f88b554b444a49f656b6c656111a145cbbaa28 /src/arrow/r/man/Schema.Rd | |
parent | Initial commit. | |
Adding upstream version 18.2.2.
upstream/18.2.2
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/arrow/r/man/Schema.Rd')
-rw-r--r-- | src/arrow/r/man/Schema.Rd | 86 |
1 file changed, 86 insertions, 0 deletions
diff --git a/src/arrow/r/man/Schema.Rd b/src/arrow/r/man/Schema.Rd
new file mode 100644
index 000000000..7322c70f2
--- /dev/null
+++ b/src/arrow/r/man/Schema.Rd
@@ -0,0 +1,86 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/schema.R
+\docType{class}
+\name{Schema}
+\alias{Schema}
+\alias{schema}
+\title{Schema class}
+\usage{
+schema(...)
+}
+\arguments{
+\item{...}{named list containing \link[=data-type]{data types} or
+a list of \link[=field]{fields} containing the fields for the schema}
+}
+\description{
+A \code{Schema} is a list of \link{Field}s, which map names to
+Arrow \link[=data-type]{data types}. Create a \code{Schema} when you
+want to convert an R \code{data.frame} to Arrow but don't want to rely on the
+default mapping of R types to Arrow types, such as when you want to choose a
+specific numeric precision, or when creating a \link{Dataset} and you want to
+ensure a specific schema rather than inferring it from the various files.
+
+Many Arrow objects, including \link{Table} and \link{Dataset}, have a \verb{$schema} method
+(active binding) that lets you access their schema.
+}
+\section{Methods}{
+
+\itemize{
+\item \verb{$ToString()}: convert to a string
+\item \verb{$field(i)}: returns the field at index \code{i} (0-based)
+\item \verb{$GetFieldByName(x)}: returns the field with name \code{x}
+\item \verb{$WithMetadata(metadata)}: returns a new \code{Schema} with the key-value
+\code{metadata} set. Note that all list elements in \code{metadata} will be coerced
+to \code{character}.
+}
+}
+
+\section{Active bindings}{
+
+\itemize{
+\item \verb{$names}: returns the field names (called in \code{names(Schema)})
+\item \verb{$num_fields}: returns the number of fields (called in \code{length(Schema)})
+\item \verb{$fields}: returns the list of \code{Field}s in the \code{Schema}, suitable for
+iterating over
+\item \verb{$HasMetadata}: logical: does this \code{Schema} have extra metadata?
+\item \verb{$metadata}: returns the key-value metadata as a named list.
+Modify or replace by assigning in (\code{sch$metadata <- new_metadata}).
+All list elements are coerced to string.
+}
+}
+
+\section{R Metadata}{
+
+
+When converting a data.frame to an Arrow Table or RecordBatch, attributes
+from the \code{data.frame} are saved alongside tables so that the object can be
+reconstructed faithfully in R (e.g. with \code{as.data.frame()}). This metadata
+can be both at the top-level of the \code{data.frame} (e.g. \code{attributes(df)}) or
+at the column (e.g. \code{attributes(df$col_a)}) or for list columns only:
+element level (e.g. \code{attributes(df[1, "col_a"])}). For example, this allows
+for storing \code{haven} columns in a table and being able to faithfully
+re-create them when pulled back into R. This metadata is separate from the
+schema (column names and types) which is compatible with other Arrow
+clients. The R metadata is only read by R and is ignored by other clients
+(e.g. Pandas has its own custom metadata). This metadata is stored in
+\verb{$metadata$r}.
+
+Since Schema metadata keys and values must be strings, this metadata is
+saved by serializing R's attribute list structure to a string. If the
+serialized metadata exceeds 100Kb in size, by default it is compressed
+starting in version 3.0.0. To disable this compression (e.g. for tables
+that are compatible with Arrow versions before 3.0.0 and include large
+amounts of metadata), set the option \code{arrow.compress_metadata} to \code{FALSE}.
+Files with compressed metadata are readable by older versions of arrow, but
+the metadata is dropped.
+}
+
+\examples{
+\dontshow{if (arrow_available()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
+df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+tab1 <- arrow_table(df)
+tab1$schema
+tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+tab2$schema
+\dontshow{\}) # examplesIf}
+}