Diffstat (limited to '')
-rw-r--r-- | src/arrow/r/man/Scanner.Rd | 51 |
1 files changed, 51 insertions, 0 deletions
diff --git a/src/arrow/r/man/Scanner.Rd b/src/arrow/r/man/Scanner.Rd
new file mode 100644
index 000000000..db6488f50
--- /dev/null
+++ b/src/arrow/r/man/Scanner.Rd
@@ -0,0 +1,51 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/dataset-scan.R
+\name{Scanner}
+\alias{Scanner}
+\alias{ScannerBuilder}
+\title{Scan the contents of a dataset}
+\description{
+A \code{Scanner} iterates over a \link{Dataset}'s fragments and returns data
+according to given row filtering and column projection. A \code{ScannerBuilder}
+can help create one.
+}
+\section{Factory}{
+
+\code{Scanner$create()} wraps the \code{ScannerBuilder} interface to make a \code{Scanner}.
+It takes the following arguments:
+\itemize{
+\item \code{dataset}: A \code{Dataset} or \code{arrow_dplyr_query} object, as returned by the
+\code{dplyr} methods on \code{Dataset}.
+\item \code{projection}: A character vector of column names to select columns or a
+named list of expressions
+\item \code{filter}: A \code{Expression} to filter the scanned rows by, or \code{TRUE} (default)
+to keep all rows.
+\item \code{use_threads}: logical: should scanning use multithreading? Default \code{TRUE}
+\item \code{use_async}: logical: should the async scanner (performs better on
+high-latency/highly parallel filesystems like S3) be used? Default \code{FALSE}
+\item \code{...}: Additional arguments, currently ignored
+}
+}
+
+\section{Methods}{
+
+\code{ScannerBuilder} has the following methods:
+\itemize{
+\item \verb{$Project(cols)}: Indicate that the scan should only return columns given
+by \code{cols}, a character vector of column names
+\item \verb{$Filter(expr)}: Filter rows by an \link{Expression}.
+\item \verb{$UseThreads(threads)}: logical: should the scan use multithreading?
+The method's default input is \code{TRUE}, but you must call the method to enable
+multithreading because the scanner default is \code{FALSE}.
+\item \verb{$UseAsync(use_async)}: logical: should the async scanner be used?
+\item \verb{$BatchSize(batch_size)}: integer: Maximum row count of scanned record
+batches, default is 32K. If scanned record batches are overflowing memory
+then this method can be called to reduce their size.
+\item \verb{$schema}: Active binding, returns the \link{Schema} of the Dataset
+\item \verb{$Finish()}: Returns a \code{Scanner}
+}
+
+\code{Scanner} currently has a single method, \verb{$ToTable()}, which evaluates the
+query and returns an Arrow \link{Table}.
+}
+
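
Usage sketch (not part of the patch): the two routes described in the added Rd file, the `Scanner$create()` factory and the step-by-step `ScannerBuilder`, might be exercised roughly as follows. This is a minimal sketch under stated assumptions: the temporary dataset and its columns `x` and `y` are made up for illustration, and `write_dataset()`, `open_dataset()`, `Dataset$NewScan()` and `Expression$field_ref()` are taken from the wider arrow R package rather than from this file.

```r
library(arrow)

# Illustrative data only: write a tiny Parquet dataset to a temporary directory.
path <- tempfile()
write_dataset(data.frame(x = 1:10, y = letters[1:10]), path)
ds <- open_dataset(path)

# Factory route: pass the projection and filter straight to Scanner$create().
tab1 <- Scanner$create(
  ds,
  projection = c("x", "y"),
  filter = Expression$field_ref("x") > 5L
)$ToTable()

# Builder route: configure a ScannerBuilder, then call $Finish() to get a Scanner.
# Assumption: ds$NewScan() is how a ScannerBuilder is obtained from a Dataset.
builder <- ds$NewScan()
builder$Project(c("x", "y"))
builder$Filter(Expression$field_ref("x") > 5L)
builder$UseThreads(TRUE)  # the builder does not enable threads unless asked
tab2 <- builder$Finish()$ToTable()
```

Both routes should yield an Arrow Table with the same rows. The factory is the shorter path, while the builder exposes knobs such as `$BatchSize()` that the factory argument list above does not mention.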