summaryrefslogtreecommitdiffstats
path: root/src/arrow/docs/source/format/Flight.rst
diff options
context:
space:
mode:
Diffstat (limited to 'src/arrow/docs/source/format/Flight.rst')
-rw-r--r--src/arrow/docs/source/format/Flight.rst152
1 files changed, 152 insertions, 0 deletions
diff --git a/src/arrow/docs/source/format/Flight.rst b/src/arrow/docs/source/format/Flight.rst
new file mode 100644
index 000000000..c79c56386
--- /dev/null
+++ b/src/arrow/docs/source/format/Flight.rst
@@ -0,0 +1,152 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _flight-rpc:
+
+Arrow Flight RPC
+================
+
+Arrow Flight is an RPC framework for high-performance data services
+based on Arrow data, and is built on top of gRPC_ and the :doc:`IPC
+format <IPC>`.
+
+Flight is organized around streams of Arrow record batches, being
+either downloaded from or uploaded to another service. A set of
+metadata methods offers discovery and introspection of streams, as
+well as the ability to implement application-specific methods.
+
+Methods and message wire formats are defined by Protobuf, enabling
+interoperability with clients that may support gRPC and Arrow
+separately, but not Flight. However, Flight implementations include
+further optimizations to avoid overhead in usage of Protobuf (mostly
+around avoiding excessive memory copies).
+
+.. _gRPC: https://grpc.io/
+
+RPC Methods
+-----------
+
+Flight defines a set of RPC methods for uploading/downloading data,
+retrieving metadata about a data stream, listing available data
+streams, and for implementing application-specific RPC methods. A
+Flight service implements some subset of these methods, while a Flight
+client can call any of these methods. Thus, one Flight client can
+connect to any Flight service and perform basic operations.
+
+Data streams are identified by descriptors, which are either a path or
+an arbitrary binary command. A client that wishes to download the data
+would:
+
+#. Construct or acquire a ``FlightDescriptor`` for the data set they
+ are interested in. A client may know what descriptor they want
+ already, or they may use methods like ``ListFlights`` to discover
+ them.
+#. Call ``GetFlightInfo(FlightDescriptor)`` to get a ``FlightInfo``
+ message containing details on where the data is located (as well as
+ other metadata, like the schema and possibly an estimate of the
+ dataset size).
+
+ Flight does not require that data live on the same server as
+ metadata: this call may list other servers to connect to. The
+ ``FlightInfo`` message includes a ``Ticket``, an opaque binary
+ token that the server uses to identify the exact data set being
+ requested.
+#. Connect to other servers (if needed).
+#. Call ``DoGet(Ticket)`` to get back a stream of Arrow record
+ batches.
+
+To upload data, a client would:
+
+#. Construct or acquire a ``FlightDescriptor``, as before.
+#. Call ``DoPut(FlightData)`` and upload a stream of Arrow record
+ batches. They would also include the ``FlightDescriptor`` with the
+ first message.
+
+See `Protocol Buffer Definitions`_ for full details on the methods and
+messages involved.
+
+Authentication
+--------------
+
+Flight supports application-implemented authentication
+methods. Authentication, if enabled, has two phases: at connection
+time, the client and server can exchange any number of messages. Then,
+the client can provide a token alongside each call, and the server can
+validate that token.
+
+Applications may use any part of this; for instance, they may ignore
+the initial handshake and send an externally acquired token on each
+call, or they may establish trust during the handshake and not
+validate a token for each call. (Note that the latter is not secure if
+you choose to deploy a layer 7 load balancer, as is common with gRPC.)
+
+Error Handling
+--------------
+
+Arrow Flight defines its own set of error codes. The implementation
+differs between languages (e.g. in C++, Unimplemented is a general
+Arrow error status while it's a Flight-specific exception in Java),
+but the following set is exposed:
+
++----------------+-------------------------------------------+
+|Error Code |Description |
++================+===========================================+
+|UNKNOWN |An unknown error. The default if no other |
+| |error applies. |
++----------------+-------------------------------------------+
+|INTERNAL |An error internal to the service |
+| |implementation occurred. |
++----------------+-------------------------------------------+
+|INVALID_ARGUMENT|The client passed an invalid argument to |
+| |the RPC. |
++----------------+-------------------------------------------+
+|TIMED_OUT |The operation exceeded a timeout or |
+| |deadline. |
++----------------+-------------------------------------------+
+|NOT_FOUND |The requested resource (action, data |
+| |stream) was not found. |
++----------------+-------------------------------------------+
+|ALREADY_EXISTS |The resource already exists. |
++----------------+-------------------------------------------+
+|CANCELLED |The operation was cancelled (either by the |
+| |client or the server). |
++----------------+-------------------------------------------+
+|UNAUTHENTICATED |The client is not authenticated. |
++----------------+-------------------------------------------+
+|UNAUTHORIZED |The client is authenticated, but does not |
+| |have permissions for the requested |
+| |operation. |
++----------------+-------------------------------------------+
+|UNIMPLEMENTED |The RPC is not implemented. |
++----------------+-------------------------------------------+
+|UNAVAILABLE |The server is not available. May be emitted|
+| |by the client for connectivity reasons. |
++----------------+-------------------------------------------+
+
+
+External Resources
+------------------
+
+- https://arrow.apache.org/blog/2018/10/09/0.11.0-release/
+- https://www.slideshare.net/JacquesNadeau5/apache-arrow-flight-overview
+
+Protocol Buffer Definitions
+---------------------------
+
+.. literalinclude:: ../../../format/Flight.proto
+ :language: protobuf
+ :linenos: