path: root/src/arrow/r/man/CsvReadOptions.Rd
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-21 11:54:28 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-21 11:54:28 +0000
commite6918187568dbd01842d8d1d2c808ce16a894239 (patch)
tree64f88b554b444a49f656b6c656111a145cbbaa28 /src/arrow/r/man/CsvReadOptions.Rd
parentInitial commit. (diff)
Adding upstream version 18.2.2.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/arrow/r/man/CsvReadOptions.Rd')
-rw-r--r-- src/arrow/r/man/CsvReadOptions.Rd | 107
1 file changed, 107 insertions(+), 0 deletions(-)
diff --git a/src/arrow/r/man/CsvReadOptions.Rd b/src/arrow/r/man/CsvReadOptions.Rd
new file mode 100644
index 000000000..d08869270
--- /dev/null
+++ b/src/arrow/r/man/CsvReadOptions.Rd
@@ -0,0 +1,107 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/csv.R, R/json.R
+\docType{class}
+\name{CsvReadOptions}
+\alias{CsvReadOptions}
+\alias{CsvWriteOptions}
+\alias{CsvParseOptions}
+\alias{TimestampParser}
+\alias{CsvConvertOptions}
+\alias{JsonReadOptions}
+\alias{JsonParseOptions}
+\title{File reader options}
+\description{
+\code{CsvReadOptions}, \code{CsvParseOptions}, \code{CsvConvertOptions},
+\code{JsonReadOptions}, \code{JsonParseOptions}, and \code{TimestampParser} are containers for various
+file reading options. See their usage in \code{\link[=read_csv_arrow]{read_csv_arrow()}} (for the
+CSV classes) and \code{\link[=read_json_arrow]{read_json_arrow()}} (for the JSON classes).
+}
+\section{Factory}{
+
+
+The \code{CsvReadOptions$create()} and \code{JsonReadOptions$create()} factory methods
+take the following arguments:
+\itemize{
+\item \code{use_threads} Whether to use the global CPU thread pool
+\item \code{block_size} Block size we request from the IO layer; also determines
+the size of chunks when \code{use_threads} is \code{TRUE}. NB: if \code{use_threads}
+is \code{FALSE}, JSON input must end with an empty line.
+}
+
+\code{CsvReadOptions$create()} accepts these additional arguments:
+\itemize{
+\item \code{skip_rows} Number of lines to skip before reading data (default 0)
+\item \code{column_names} Character vector to supply column names. If length-0
+(the default), the first non-skipped row will be parsed to generate column
+names, unless \code{autogenerate_column_names} is \code{TRUE}.
+\item \code{autogenerate_column_names} Logical: generate column names instead of
+using the first non-skipped row (the default)? If \code{TRUE}, column names will
+be "f0", "f1", ..., "fN".
+}
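As a minimal sketch of the read options above — assuming the \code{arrow} package is attached and that \code{read_csv_arrow()} forwards a \code{read_options} argument (the path \code{"data.csv"} is a hypothetical placeholder):

```r
library(arrow)

# Skip two metadata lines at the top of the file and supply
# explicit column names instead of reading them from the file.
read_opts <- CsvReadOptions$create(
  use_threads = TRUE,
  skip_rows = 2,
  column_names = c("id", "value")
)

df <- read_csv_arrow("data.csv", read_options = read_opts)
```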
+
+\code{CsvParseOptions$create()} takes the following arguments:
+\itemize{
+\item \code{delimiter} Field delimiting character (default \code{","})
+\item \code{quoting} Logical: are strings quoted? (default \code{TRUE})
+\item \code{quote_char} Quoting character, if \code{quoting} is \code{TRUE}
+\item \code{double_quote} Logical: are quotes inside values double-quoted? (default \code{TRUE})
+\item \code{escaping} Logical: whether escaping is used (default \code{FALSE})
+\item \code{escape_char} Escaping character, if \code{escaping} is \code{TRUE}
+\item \code{newlines_in_values} Logical: are values allowed to contain CR (\code{0x0d})
+and LF (\code{0x0a}) characters? (default \code{FALSE})
+\item \code{ignore_empty_lines} Logical: should empty lines be ignored (default) or
+generate a row of missing values (if \code{FALSE})?
+}
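A sketch of the parse options in use, assuming a semicolon-delimited file at the hypothetical path \code{"data.csv"}:

```r
library(arrow)

# Semicolon-delimited input; keep empty lines as rows of
# missing values instead of dropping them.
parse_opts <- CsvParseOptions$create(
  delimiter = ";",
  ignore_empty_lines = FALSE
)

df <- read_csv_arrow("data.csv", parse_options = parse_opts)
```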
+
+\code{JsonParseOptions$create()} accepts only the \code{newlines_in_values} argument.
+
+\code{CsvConvertOptions$create()} takes the following arguments:
+\itemize{
+\item \code{check_utf8} Logical: check UTF-8 validity of string columns? (default \code{TRUE})
+\item \code{null_values} Character vector of recognized spellings for null values.
+Analogous to the \code{na.strings} argument to
+\code{\link[utils:read.table]{read.csv()}} or \code{na} in \code{readr::read_csv()}.
+\item \code{strings_can_be_null} Logical: can string / binary columns have
+null values? Similar to the \code{quoted_na} argument to \code{readr::read_csv()}.
+(default \code{FALSE})
+\item \code{true_values} Character vector of recognized spellings for \code{TRUE} values
+\item \code{false_values} Character vector of recognized spellings for \code{FALSE} values
+\item \code{col_types} A \code{Schema} or \code{NULL} to infer types
+\item \code{auto_dict_encode} Logical: whether to try to automatically
+dictionary-encode string / binary data (think \code{stringsAsFactors}). Default \code{FALSE}.
+This setting is ignored for non-inferred columns (those in \code{col_types}).
+\item \code{auto_dict_max_cardinality} If \code{auto_dict_encode} is \code{TRUE}, string/binary
+columns are dictionary-encoded up to this number of unique values (default 50),
+after which conversion switches to regular encoding.
+\item \code{include_columns} If non-empty, the names of columns from the
+CSV file that should actually be read and converted (in the vector's order).
+\item \code{include_missing_columns} Logical: if \code{include_columns} is provided, should
+columns named in it but not found in the data be included as a column of
+type \code{null()}? The default (\code{FALSE}) means that the reader will instead
+raise an error.
+\item \code{timestamp_parsers} User-defined timestamp parsers. If more than one
+parser is specified, the CSV conversion logic will try parsing values
+starting from the beginning of this vector. Possible values are
+(a) \code{NULL}, the default, which uses the ISO-8601 parser;
+(b) a character vector of \link[base:strptime]{strptime} parse strings; or
+(c) a list of \link{TimestampParser} objects.
+}
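For illustration (the file path and column names are hypothetical), the convert options might be combined as follows:

```r
library(arrow)

# Treat "NA" and "-" as nulls, even inside quoted strings,
# and read only the two named columns.
convert_opts <- CsvConvertOptions$create(
  null_values = c("NA", "-"),
  strings_can_be_null = TRUE,
  include_columns = c("id", "value")
)

df <- read_csv_arrow("data.csv", convert_options = convert_opts)
```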
+
+\code{TimestampParser$create()} takes an optional \code{format} string argument.
+See \code{\link[base:strptime]{strptime()}} for example syntax.
+The default is to use an ISO-8601 format parser.
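A sketch of custom timestamp parsing; the \code{"%d/%m/%Y"} format is an illustrative assumption, and parsers are tried in list order:

```r
library(arrow)

# Try a day/month/year parser first, then fall back to the
# default ISO-8601 parser.
ts_parsers <- list(
  TimestampParser$create(format = "%d/%m/%Y"),
  TimestampParser$create()
)

convert_opts <- CsvConvertOptions$create(timestamp_parsers = ts_parsers)
```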
+
+The \code{CsvWriteOptions$create()} factory method takes the following arguments:
+\itemize{
+\item \code{include_header} Whether to write an initial header line with column names
+\item \code{batch_size} Maximum number of rows processed at a time. Default is 1024.
+}
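A minimal write sketch, assuming \code{write_csv_arrow()} accepts a \code{write_options} argument (present in recent arrow releases; the output path is hypothetical):

```r
library(arrow)

# Write without a header line, processing 2048 rows at a time.
write_opts <- CsvWriteOptions$create(
  include_header = FALSE,
  batch_size = 2048
)

write_csv_arrow(mtcars, "out.csv", write_options = write_opts)
```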
+}
+
+\section{Active bindings}{
+
+\itemize{
+\item \code{column_names}: from \code{CsvReadOptions}
+}
+}
+