Diffstat (limited to '')
-rw-r--r-- | src/arrow/r/man/map_batches.Rd | 30 |
1 files changed, 30 insertions, 0 deletions
diff --git a/src/arrow/r/man/map_batches.Rd b/src/arrow/r/man/map_batches.Rd
new file mode 100644
index 000000000..08e7b86c0
--- /dev/null
+++ b/src/arrow/r/man/map_batches.Rd
@@ -0,0 +1,30 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/dataset-scan.R
+\name{map_batches}
+\alias{map_batches}
+\title{Apply a function to a stream of RecordBatches}
+\usage{
+map_batches(X, FUN, ..., .data.frame = TRUE)
+}
+\arguments{
+\item{X}{A \code{Dataset} or \code{arrow_dplyr_query} object, as returned by the
+\code{dplyr} methods on \code{Dataset}.}
+
+\item{FUN}{A function or \code{purrr}-style lambda expression to apply to each
+batch}
+
+\item{...}{Additional arguments passed to \code{FUN}}
+
+\item{.data.frame}{logical: collect the resulting chunks into a single
+\code{data.frame}? Default \code{TRUE}}
+}
+\description{
+As an alternative to calling \code{collect()} on a \code{Dataset} query, you can
+use this function to access the stream of \code{RecordBatch}es in the \code{Dataset}.
+This lets you aggregate on each chunk and pull the intermediate results into
+a \code{data.frame} for further aggregation, even if you couldn't fit the whole
+\code{Dataset} result in memory.
+}
+\details{
+This is experimental and not recommended for production use.
+}