Diffstat (limited to 'src/arrow/docs/source/format/Flight.rst')
-rw-r--r-- | src/arrow/docs/source/format/Flight.rst | 152 |
1 files changed, 152 insertions, 0 deletions
diff --git a/src/arrow/docs/source/format/Flight.rst b/src/arrow/docs/source/format/Flight.rst
new file mode 100644
index 000000000..c79c56386
--- /dev/null
+++ b/src/arrow/docs/source/format/Flight.rst
@@ -0,0 +1,152 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _flight-rpc:
+
+Arrow Flight RPC
+================
+
+Arrow Flight is an RPC framework for high-performance data services
+based on Arrow data, and is built on top of gRPC_ and the :doc:`IPC
+format <IPC>`.
+
+Flight is organized around streams of Arrow record batches, being
+either downloaded from or uploaded to another service. A set of
+metadata methods offers discovery and introspection of streams, as
+well as the ability to implement application-specific methods.
+
+Methods and message wire formats are defined by Protobuf, enabling
+interoperability with clients that may support gRPC and Arrow
+separately, but not Flight. However, Flight implementations include
+further optimizations to avoid overhead in usage of Protobuf (mostly
+around avoiding excessive memory copies).
+
+.. _gRPC: https://grpc.io/
+
+RPC Methods
+-----------
+
+Flight defines a set of RPC methods for uploading/downloading data,
+retrieving metadata about a data stream, listing available data
+streams, and for implementing application-specific RPC methods. A
+Flight service implements some subset of these methods, while a Flight
+client can call any of these methods. Thus, one Flight client can
+connect to any Flight service and perform basic operations.
+
+Data streams are identified by descriptors, which are either a path or
+an arbitrary binary command. A client that wishes to download the data
+would:
+
+#. Construct or acquire a ``FlightDescriptor`` for the data set they
+   are interested in. A client may know what descriptor they want
+   already, or they may use methods like ``ListFlights`` to discover
+   them.
+#. Call ``GetFlightInfo(FlightDescriptor)`` to get a ``FlightInfo``
+   message containing details on where the data is located (as well as
+   other metadata, like the schema and possibly an estimate of the
+   dataset size).
+
+   Flight does not require that data live on the same server as
+   metadata: this call may list other servers to connect to. The
+   ``FlightInfo`` message includes a ``Ticket``, an opaque binary
+   token that the server uses to identify the exact data set being
+   requested.
+#. Connect to other servers (if needed).
+#. Call ``DoGet(Ticket)`` to get back a stream of Arrow record
+   batches.
+
+To upload data, a client would:
+
+#. Construct or acquire a ``FlightDescriptor``, as before.
+#. Call ``DoPut(FlightData)`` and upload a stream of Arrow record
+   batches. They would also include the ``FlightDescriptor`` with the
+   first message.
+
+See `Protocol Buffer Definitions`_ for full details on the methods and
+messages involved.
+
+Authentication
+--------------
+
+Flight supports application-implemented authentication
+methods. Authentication, if enabled, has two phases: at connection
+time, the client and server can exchange any number of messages. Then,
+the client can provide a token alongside each call, and the server can
+validate that token.
+
+Applications may use any part of this; for instance, they may ignore
+the initial handshake and send an externally acquired token on each
+call, or they may establish trust during the handshake and not
+validate a token for each call. (Note that the latter is not secure if
+you choose to deploy a layer 7 load balancer, as is common with gRPC.)
+
+Error Handling
+--------------
+
+Arrow Flight defines its own set of error codes. The implementation
+differs between languages (e.g. in C++, Unimplemented is a general
+Arrow error status while it's a Flight-specific exception in Java),
+but the following set is exposed:
+
++----------------+-------------------------------------------+
+|Error Code      |Description                                |
++================+===========================================+
+|UNKNOWN         |An unknown error. The default if no other  |
+|                |error applies.                             |
++----------------+-------------------------------------------+
+|INTERNAL        |An error internal to the service           |
+|                |implementation occurred.                   |
++----------------+-------------------------------------------+
+|INVALID_ARGUMENT|The client passed an invalid argument to   |
+|                |the RPC.                                   |
++----------------+-------------------------------------------+
+|TIMED_OUT       |The operation exceeded a timeout or        |
+|                |deadline.                                  |
++----------------+-------------------------------------------+
+|NOT_FOUND       |The requested resource (action, data       |
+|                |stream) was not found.                     |
++----------------+-------------------------------------------+
+|ALREADY_EXISTS  |The resource already exists.               |
++----------------+-------------------------------------------+
+|CANCELLED       |The operation was cancelled (either by the |
+|                |client or the server).                     |
++----------------+-------------------------------------------+
+|UNAUTHENTICATED |The client is not authenticated.           |
++----------------+-------------------------------------------+
+|UNAUTHORIZED    |The client is authenticated, but does not  |
+|                |have permissions for the requested         |
+|                |operation.                                 |
++----------------+-------------------------------------------+
+|UNIMPLEMENTED   |The RPC is not implemented.                |
++----------------+-------------------------------------------+
+|UNAVAILABLE     |The server is not available. May be emitted|
+|                |by the client for connectivity reasons.    |
++----------------+-------------------------------------------+
+
+
+External Resources
+------------------
+
+- https://arrow.apache.org/blog/2018/10/09/0.11.0-release/
+- https://www.slideshare.net/JacquesNadeau5/apache-arrow-flight-overview
+
+Protocol Buffer Definitions
+---------------------------
+
+.. literalinclude:: ../../../format/Flight.proto
+   :language: protobuf
+   :linenos:
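
The download and upload sequences described under "RPC Methods" (``DoPut``, then ``GetFlightInfo`` followed by ``DoGet``) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: ``ToyFlightServer``, ``download``, the ``locations`` field, and the way tickets are derived from the descriptor path are all hypothetical names and choices made for illustration. None of it is the Flight wire protocol or any Arrow library API, and plain Python lists stand in for Arrow record batches.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


# Toy stand-ins for the Flight messages (hypothetical, not a real Flight API).
@dataclass(frozen=True)
class FlightDescriptor:
    path: Tuple[str, ...]


@dataclass(frozen=True)
class Ticket:
    ticket: bytes  # opaque to the client; only the server interprets it


@dataclass(frozen=True)
class FlightInfo:
    descriptor: FlightDescriptor
    ticket: Ticket
    locations: Tuple[str, ...] = ()  # assumption: empty means "same server"


class NotFound(Exception):
    """Models the NOT_FOUND error code from the table above."""


class ToyFlightServer:
    def __init__(self) -> None:
        self._tickets: Dict[FlightDescriptor, bytes] = {}
        self._streams: Dict[bytes, List[list]] = {}

    def do_put(self, descriptor: FlightDescriptor, batches: List[list]) -> None:
        # Upload: the descriptor travels with the data stream.
        token = "/".join(descriptor.path).encode()  # arbitrary toy ticket format
        self._tickets[descriptor] = token
        self._streams[token] = list(batches)

    def get_flight_info(self, descriptor: FlightDescriptor) -> FlightInfo:
        if descriptor not in self._tickets:
            raise NotFound(descriptor)
        return FlightInfo(descriptor, Ticket(self._tickets[descriptor]))

    def do_get(self, ticket: Ticket) -> Iterator[list]:
        if ticket.ticket not in self._streams:
            raise NotFound(ticket)
        return iter(self._streams[ticket.ticket])


def download(server: ToyFlightServer, descriptor: FlightDescriptor) -> List[list]:
    # Step 2: ask where the data lives and obtain the ticket.
    info = server.get_flight_info(descriptor)
    # Step 3 (connecting to info.locations) is skipped in this single-server toy.
    # Step 4: exchange the ticket for the stream of batches.
    return list(server.do_get(info.ticket))


server = ToyFlightServer()
descriptor = FlightDescriptor(path=("sales", "2021"))
server.do_put(descriptor, [[1, 2], [3]])
batches = download(server, descriptor)
```

The point of the model is the indirection: the client never builds a ticket itself, it only passes back the one handed out with the ``FlightInfo``, which is what lets metadata and data live on different servers.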