.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements.  See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership.  The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied.  See the License for the
.. specific language governing permissions and limitations
.. under the License.

.. _flight-rpc:

Arrow Flight RPC
================

Arrow Flight is an RPC framework for high-performance data services
based on Arrow data, and is built on top of gRPC_ and the :doc:`IPC
format <IPC>`.

Flight is organized around streams of Arrow record batches, which are
either downloaded from or uploaded to another service. A set of
metadata methods offers discovery and introspection of streams, as
well as the ability to implement application-specific methods.

Methods and message wire formats are defined by Protobuf, enabling
interoperability with clients that may support gRPC and Arrow
separately, but not Flight. However, Flight implementations include
further optimizations that reduce the overhead of using Protobuf,
mostly by avoiding excessive memory copies.

.. _gRPC: https://grpc.io/

RPC Methods
-----------

Flight defines a set of RPC methods for uploading/downloading data,
retrieving metadata about a data stream, listing available data
streams, and for implementing application-specific RPC methods. A
Flight service implements some subset of these methods, while a Flight
client can call any of these methods. Thus, one Flight client can
connect to any Flight service and perform basic operations.

Data streams are identified by descriptors, which are either a path or
an arbitrary binary command. A client that wishes to download the data
would:

#. Construct or acquire a ``FlightDescriptor`` for the data set they
   are interested in. A client may know what descriptor they want
   already, or they may use methods like ``ListFlights`` to discover
   them.
#. Call ``GetFlightInfo(FlightDescriptor)`` to get a ``FlightInfo``
   message containing details on where the data is located (as well as
   other metadata, like the schema and possibly an estimate of the
   dataset size).

   Flight does not require that data live on the same server as
   metadata: this call may list other servers to connect to. The
   ``FlightInfo`` message includes a ``Ticket``, an opaque binary
   token that the server uses to identify the exact data set being
   requested.
#. Connect to other servers (if needed).
#. Call ``DoGet(Ticket)`` to get back a stream of Arrow record
   batches.
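
The download steps above can be sketched with a small in-memory model.
This is not the API of any Flight implementation: every class and
function name below (``MetadataServer``, ``DataServer``, ``download``)
is hypothetical, record batches are stood in for by plain lists of
integers, and the server location string is made up for illustration.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Dict, Iterator, List, Tuple

   Batch = List[int]  # stand-in for an Arrow record batch


   @dataclass(frozen=True)
   class FlightDescriptor:
       """Identifies a stream by a path or by an opaque command."""
       path: Tuple[str, ...] = ()
       command: bytes = b""


   @dataclass(frozen=True)
   class Endpoint:
       ticket: bytes                # opaque token naming the exact data set
       locations: Tuple[str, ...]   # servers able to serve this ticket


   @dataclass(frozen=True)
   class FlightInfo:
       schema: Tuple[str, ...]
       endpoints: Tuple[Endpoint, ...]
       total_records: int           # estimate of the dataset size


   class MetadataServer:
       """Hypothetical service answering ListFlights and GetFlightInfo."""

       def __init__(self, catalog: Dict[FlightDescriptor, FlightInfo]) -> None:
           self._catalog = catalog

       def list_flights(self) -> List[FlightDescriptor]:
           return list(self._catalog)

       def get_flight_info(self, descriptor: FlightDescriptor) -> FlightInfo:
           return self._catalog[descriptor]


   class DataServer:
       """Hypothetical service answering DoGet."""

       def __init__(self, streams: Dict[bytes, List[Batch]]) -> None:
           self._streams = streams

       def do_get(self, ticket: bytes) -> Iterator[Batch]:
           yield from self._streams[ticket]


   def download(metadata: MetadataServer,
                data_servers: Dict[str, DataServer],
                descriptor: FlightDescriptor) -> List[Batch]:
       info = metadata.get_flight_info(descriptor)          # GetFlightInfo
       batches: List[Batch] = []
       for endpoint in info.endpoints:
           server = data_servers[endpoint.locations[0]]     # connect elsewhere
           batches.extend(server.do_get(endpoint.ticket))   # DoGet(Ticket)
       return batches


   # Made-up catalog: one data set served by one separate data server.
   descriptor = FlightDescriptor(path=("sales", "2021"))
   location = "grpc://data-1.example:8815"
   data_servers = {location: DataServer({b"ticket-0": [[1, 2], [3]]})}
   metadata = MetadataServer({
       descriptor: FlightInfo(
           schema=("amount",),
           endpoints=(Endpoint(b"ticket-0", (location,)),),
           total_records=3,
       )
   })
   batches = download(metadata, data_servers, descriptor)

Note that the client never interprets the ticket: it only passes the
bytes it received from the metadata call back to the data server.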

To upload data, a client would:

#. Construct or acquire a ``FlightDescriptor``, as before.
#. Call ``DoPut(FlightData)`` and upload a stream of Arrow record
   batches. They would also include the ``FlightDescriptor`` with the
   first message.
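
As a rough illustration of the upload flow, the sketch below models a
``DoPut`` stream in which only the first message carries the
descriptor. It is a toy model, not a real Flight client or server: the
``FlightData`` class here has only two fields, and ``make_put_stream``
and ``Server`` are hypothetical names introduced for this example.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Dict, Iterable, Iterator, List, Optional, Tuple

   Batch = List[int]  # stand-in for an Arrow record batch


   @dataclass(frozen=True)
   class FlightDescriptor:
       path: Tuple[str, ...] = ()
       command: bytes = b""


   @dataclass
   class FlightData:
       batch: Batch
       descriptor: Optional[FlightDescriptor] = None


   def make_put_stream(descriptor: FlightDescriptor,
                       batches: Iterable[Batch]) -> Iterator[FlightData]:
       """Attach the descriptor to the first message only."""
       for index, batch in enumerate(batches):
           yield FlightData(batch, descriptor if index == 0 else None)


   class Server:
       """Hypothetical service that stores uploaded batches per descriptor."""

       def __init__(self) -> None:
           self.stored: Dict[FlightDescriptor, List[Batch]] = {}

       def do_put(self, stream: Iterable[FlightData]) -> int:
           messages = iter(stream)
           first = next(messages, None)
           if first is None or first.descriptor is None:
               raise ValueError("first message must carry a descriptor")
           target = self.stored.setdefault(first.descriptor, [])
           target.append(first.batch)
           count = 1
           for message in messages:
               target.append(message.batch)
               count += 1
           return count  # number of batches received in this call


   descriptor = FlightDescriptor(path=("uploads", "demo"))
   server = Server()
   received = server.do_put(make_put_stream(descriptor, [[1], [2, 3]]))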

See `Protocol Buffer Definitions`_ for full details on the methods and
messages involved.

Authentication
--------------

Flight supports application-implemented authentication
methods. Authentication, if enabled, has two phases. First, at
connection time, the client and server can exchange any number of
messages (the handshake). Then, the client can provide a token
alongside each call, and the server can validate that token.

Applications may use any part of this; for instance, they may ignore
the initial handshake and send an externally acquired token on each
call, or they may establish trust during the handshake and not
validate a token for each call. (Note that the latter is not secure if
you choose to deploy a layer 7 load balancer, as is common with gRPC.)
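
The two phases can be illustrated with a small model. It assumes a
username and password handshake that takes a single round trip and
issues a random session token; Flight itself does not prescribe any of
this, and ``AuthServer``, ``handshake``, and ``call`` are hypothetical
names, not part of any Flight library.

.. code-block:: python

   import hmac
   import secrets
   from typing import Dict


   class AuthServer:
       """Hypothetical server: handshake once, then check a token per call."""

       def __init__(self, passwords: Dict[str, bytes]) -> None:
           self._passwords = passwords
           self._sessions: Dict[bytes, str] = {}

       def handshake(self, username: str, password: bytes) -> bytes:
           # Phase 1: exchange credentials at connection time.
           expected = self._passwords.get(username)
           if expected is None or not hmac.compare_digest(expected, password):
               raise PermissionError("UNAUTHENTICATED")
           token = secrets.token_bytes(16)
           self._sessions[token] = username
           return token

       def call(self, token: bytes, method: str) -> str:
           # Phase 2: validate the token sent alongside each call.
           user = self._sessions.get(token)
           if user is None:
               raise PermissionError("UNAUTHENTICATED")
           return f"{method} ok for {user}"


   server = AuthServer({"alice": b"s3cret"})
   token = server.handshake("alice", b"s3cret")
   result = server.call(token, "DoGet")

Because the token is checked on every call, this variant does not rely
on all calls of a client arriving over the connection that performed
the handshake.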

Error Handling
--------------

Arrow Flight defines its own set of error codes. The implementation
differs between languages (e.g. in C++, Unimplemented is a general
Arrow error status while it's a Flight-specific exception in Java),
but the following set is exposed:

+----------------+-------------------------------------------+
|Error Code      |Description                                |
+================+===========================================+
|UNKNOWN         |An unknown error. The default if no other  |
|                |error applies.                             |
+----------------+-------------------------------------------+
|INTERNAL        |An error internal to the service           |
|                |implementation occurred.                   |
+----------------+-------------------------------------------+
|INVALID_ARGUMENT|The client passed an invalid argument to   |
|                |the RPC.                                   |
+----------------+-------------------------------------------+
|TIMED_OUT       |The operation exceeded a timeout or        |
|                |deadline.                                  |
+----------------+-------------------------------------------+
|NOT_FOUND       |The requested resource (action, data       |
|                |stream) was not found.                     |
+----------------+-------------------------------------------+
|ALREADY_EXISTS  |The resource already exists.               |
+----------------+-------------------------------------------+
|CANCELLED       |The operation was cancelled (either by the |
|                |client or the server).                     |
+----------------+-------------------------------------------+
|UNAUTHENTICATED |The client is not authenticated.           |
+----------------+-------------------------------------------+
|UNAUTHORIZED    |The client is authenticated, but does not  |
|                |have permissions for the requested         |
|                |operation.                                 |
+----------------+-------------------------------------------+
|UNIMPLEMENTED   |The RPC is not implemented.                |
+----------------+-------------------------------------------+
|UNAVAILABLE     |The server is not available. May be emitted|
|                |by the client for connectivity reasons.    |
+----------------+-------------------------------------------+
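
As an illustration only, the table above could be mirrored in
application code as an enumeration. The names ``FlightStatusCode``,
``FlightError``, and ``is_auth_problem`` are hypothetical; as noted
above, real implementations expose these codes differently in each
language.

.. code-block:: python

   from enum import Enum


   class FlightStatusCode(Enum):
       UNKNOWN = "An unknown error."
       INTERNAL = "An error internal to the service implementation occurred."
       INVALID_ARGUMENT = "The client passed an invalid argument to the RPC."
       TIMED_OUT = "The operation exceeded a timeout or deadline."
       NOT_FOUND = "The requested resource was not found."
       ALREADY_EXISTS = "The resource already exists."
       CANCELLED = "The operation was cancelled."
       UNAUTHENTICATED = "The client is not authenticated."
       UNAUTHORIZED = "The client lacks permissions for the operation."
       UNIMPLEMENTED = "The RPC is not implemented."
       UNAVAILABLE = "The server is not available."


   class FlightError(Exception):
       """Hypothetical exception carrying one of the codes above."""

       def __init__(self, code: FlightStatusCode = FlightStatusCode.UNKNOWN,
                    detail: str = "") -> None:
           super().__init__(f"{code.name}: {detail or code.value}")
           self.code = code


   def is_auth_problem(error: FlightError) -> bool:
       """True for missing credentials or insufficient permissions."""
       return error.code in (FlightStatusCode.UNAUTHENTICATED,
                             FlightStatusCode.UNAUTHORIZED)


   try:
       raise FlightError(FlightStatusCode.UNAUTHORIZED)
   except FlightError as caught:
       error = caught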


External Resources
------------------

- https://arrow.apache.org/blog/2018/10/09/0.11.0-release/
- https://www.slideshare.net/JacquesNadeau5/apache-arrow-flight-overview

Protocol Buffer Definitions
---------------------------

.. literalinclude:: ../../../format/Flight.proto
   :language: protobuf
   :linenos: