author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-21 11:54:28 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-21 11:54:28 +0000
commit: e6918187568dbd01842d8d1d2c808ce16a894239 (patch)
tree: 64f88b554b444a49f656b6c656111a145cbbaa28 /src/arrow/r/vignettes/flight.Rmd
parent: Initial commit. (diff)
Adding upstream version 18.2.2. (tag: upstream/18.2.2)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/arrow/r/vignettes/flight.Rmd')
-rw-r--r-- src/arrow/r/vignettes/flight.Rmd 87
1 files changed, 87 insertions, 0 deletions
diff --git a/src/arrow/r/vignettes/flight.Rmd b/src/arrow/r/vignettes/flight.Rmd
new file mode 100644
index 000000000..e8af5cad6
--- /dev/null
+++ b/src/arrow/r/vignettes/flight.Rmd
@@ -0,0 +1,87 @@
+---
+title: "Connecting to Flight RPC Servers"
+output: rmarkdown::html_vignette
+vignette: >
+ %\VignetteIndexEntry{Connecting to Flight RPC Servers}
+ %\VignetteEngine{knitr::rmarkdown}
+ %\VignetteEncoding{UTF-8}
+---
+
+[**Flight**](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/)
+is a general-purpose client-server framework for high-performance
+transport of large datasets over network interfaces, built as part of the
+[Apache Arrow](https://arrow.apache.org) project.
+
+Flight transfers data efficiently because it:
+
+* removes the need for deserialization during data transfer
+* allows for parallel data streaming
+* is highly optimized to take advantage of Arrow's columnar format.
+
+The arrow package provides methods for connecting to Flight RPC servers
+to send and receive data.
+
+## Getting Started
+
+The `flight` functions in the package use [reticulate](https://rstudio.github.io/reticulate/) to call methods in the
+[pyarrow](https://arrow.apache.org/docs/python/api/flight.html) Python package.
+
+Before using them for the first time,
+make sure that reticulate and pyarrow are installed:
+
+```r
+install.packages("reticulate")
+arrow::install_pyarrow()
+```
+
+See `vignette("python", package = "arrow")` for more details on setting up
+`pyarrow`.
+
+## Example
+
+The package includes methods for starting a Python-based Flight server, as well
+as methods for connecting to a Flight server running elsewhere.
+
+To illustrate both sides, in one process let's start a demo server:
+
+```r
+library(arrow)
+demo_server <- load_flight_server("demo_flight_server")
+server <- demo_server$DemoFlightServer(port = 8089)
+server$serve()
+```
+
+We'll leave that one running.
+
+In a different R process, let's connect to it and put some data in it.
+
+```r
+library(arrow)
+client <- flight_connect(port = 8089)
+# Upload some data to our server so there's something to demo
+flight_put(client, iris, path = "test_data/iris")
+```
+
+Now, in a new R process, let's connect to the server and pull the data we
+put there:
+
+```r
+library(arrow)
+library(dplyr)
+client <- flight_connect(port = 8089)
+client %>%
+  flight_get("test_data/iris") %>%
+  group_by(Species) %>%
+  summarize(max_petal = max(Petal.Length))
+
+## # A tibble: 3 x 2
+##   Species    max_petal
+##   <fct>          <dbl>
+## 1 setosa           1.9
+## 2 versicolor       5.1
+## 3 virginica        6.9
+```
+
+Because `flight_get()` returns an Arrow data structure, you can directly pipe
+its result into a [dplyr](https://dplyr.tidyverse.org/) workflow.
+See `vignette("dataset", package = "arrow")` for more information on working with Arrow objects via a dplyr interface.