author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-21 11:54:28 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-21 11:54:28 +0000 |
commit | e6918187568dbd01842d8d1d2c808ce16a894239 (patch) | |
tree | 64f88b554b444a49f656b6c656111a145cbbaa28 /src/arrow/r/vignettes/flight.Rmd | |
parent | Initial commit. (diff) | |
Adding upstream version 18.2.2. (upstream/18.2.2)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/arrow/r/vignettes/flight.Rmd')
-rw-r--r-- | src/arrow/r/vignettes/flight.Rmd | 87 |
1 files changed, 87 insertions, 0 deletions
diff --git a/src/arrow/r/vignettes/flight.Rmd b/src/arrow/r/vignettes/flight.Rmd
new file mode 100644
index 000000000..e8af5cad6
--- /dev/null
+++ b/src/arrow/r/vignettes/flight.Rmd
@@ -0,0 +1,87 @@
+---
+title: "Connecting to Flight RPC Servers"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Connecting to Flight RPC Servers}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+[**Flight**](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/)
+is a general-purpose client-server framework for high performance
+transport of large datasets over network interfaces, built as part of the
+[Apache Arrow](https://arrow.apache.org) project.
+
+Flight allows for highly efficient data transfer as it:
+
+* removes the need for deserialization during data transfer
+* allows for parallel data streaming
+* is highly optimized to take advantage of Arrow's columnar format.
+
+The arrow package provides methods for connecting to Flight RPC servers
+to send and receive data.
+
+## Getting Started
+
+The `flight` functions in the package use [reticulate](https://rstudio.github.io/reticulate/) to call methods in the
+[pyarrow](https://arrow.apache.org/docs/python/api/flight.html) Python package.
+
+Before using them for the first time,
+you'll need to be sure you have reticulate and pyarrow installed:
+
+```r
+install.packages("reticulate")
+arrow::install_pyarrow()
+```
+
+See `vignette("python", package = "arrow")` for more details on setting up
+`pyarrow`.
+
+## Example
+
+The package includes methods for starting a Python-based Flight server, as well
+as methods for connecting to a Flight server running elsewhere.
+
+To illustrate both sides, in one process let's start a demo server:
+
+```r
+library(arrow)
+demo_server <- load_flight_server("demo_flight_server")
+server <- demo_server$DemoFlightServer(port = 8089)
+server$serve()
+```
+
+We'll leave that one running.
+
+In a different R process, let's connect to it and put some data in it.
+
+```r
+library(arrow)
+client <- flight_connect(port = 8089)
+# Upload some data to our server so there's something to demo
+flight_put(client, iris, path = "test_data/iris")
+```
+
+Now, in a new R process, let's connect to the server and pull the data we
+put there:
+
+```r
+library(arrow)
+library(dplyr)
+client <- flight_connect(port = 8089)
+client %>%
+  flight_get("test_data/iris") %>%
+  group_by(Species) %>%
+  summarize(max_petal = max(Petal.Length))
+
+## # A tibble: 3 x 2
+##   Species    max_petal
+##   <fct>          <dbl>
+## 1 setosa           1.9
+## 2 versicolor       5.1
+## 3 virginica        6.9
+```
+
+Because `flight_get()` returns an Arrow data structure, you can directly pipe
+its result into a [dplyr](https://dplyr.tidyverse.org/) workflow.
+See `vignette("dataset", package = "arrow")` for more information on working with Arrow objects via a dplyr interface.
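
Note (not part of the commit): the vignette added in this diff uploads a data frame and reads it back from separate R processes. The sketch below is a possible variation on that client-side flow, not code from the vignette. It assumes the demo server from the vignette is still serving on port 8089, and that the installed arrow R package exports `flight_path_exists()` and `list_flights()` alongside the `flight_connect()`, `flight_put()` and `flight_get()` calls the vignette uses. Whether those two helpers are available may depend on the arrow version.

```r
library(arrow)

# Assumption: the demo Flight server from the vignette is already running
# in another R process and listening on port 8089.
client <- flight_connect(port = 8089)

# Same path the vignette uses for the iris data.
path <- "test_data/iris"

# Upload only if nothing is stored under this path yet.
# flight_path_exists() is assumed to be exported by the installed arrow version.
if (!flight_path_exists(client, path)) {
  flight_put(client, iris, path = path)
}

# Show which paths the server currently exposes.
print(list_flights(client))

# Fetch the data back as an Arrow object and convert it to a base R data frame.
iris_tbl <- flight_get(client, path)
iris_df <- as.data.frame(iris_tbl)
str(iris_df)
```

Checking for the path first avoids re-sending data that the server already holds, at the cost of one extra round trip. The final `as.data.frame()` call is only needed when the result should leave Arrow's in-memory format.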