diff options
Diffstat (limited to 'src/arrow/docs/source/cpp/overview.rst')
-rw-r--r-- | src/arrow/docs/source/cpp/overview.rst | 97 |
1 files changed, 97 insertions, 0 deletions
diff --git a/src/arrow/docs/source/cpp/overview.rst b/src/arrow/docs/source/cpp/overview.rst new file mode 100644 index 000000000..ccebdba45 --- /dev/null +++ b/src/arrow/docs/source/cpp/overview.rst @@ -0,0 +1,97 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. default-domain:: cpp +.. highlight:: cpp + +High-Level Overview +=================== + +The Arrow C++ library is comprised of different parts, each of which serves +a specific purpose. + +The physical layer +------------------ + +**Memory management** abstractions provide a uniform API over memory that +may be allocated through various means, such as heap allocation, the memory +mapping of a file or a static memory area. In particular, the **buffer** +abstraction represents a contiguous area of physical data. + +The one-dimensional layer +------------------------- + +**Data types** govern the *logical* interpretation of *physical* data. +Many operations in Arrow are parametered, at compile-time or at runtime, +by a data type. + +**Arrays** assemble one or several buffers with a data type, allowing to +view them as a logical contiguous sequence of values (possibly nested). + +**Chunked arrays** are a generalization of arrays, comprising several same-type +arrays into a longer logical sequence of values. + +The two-dimensional layer +------------------------- + +**Schemas** describe a logical collection of several pieces of data, +each with a distinct name and type, and optional metadata. + +**Tables** are collections of chunked array in accordance to a schema. They +are the most capable dataset-providing abstraction in Arrow. + +**Record batches** are collections of contiguous arrays, described +by a schema. They allow incremental construction or serialization of tables. + +The compute layer +----------------- + +**Datums** are flexible dataset references, able to hold for example an array or table +reference. + +**Kernels** are specialized computation functions running in a loop over a +given set of datums representing input and output parameters to the functions. + +The IO layer +------------ + +**Streams** allow untyped sequential or seekable access over external data +of various kinds (for example compressed or memory-mapped). + +The Inter-Process Communication (IPC) layer +------------------------------------------- + +A **messaging format** allows interchange of Arrow data between processes, using +as few copies as possible. + +The file formats layer +---------------------- + +Reading and writing Arrow data from/to various file formats is possible, for +example **Parquet**, **CSV**, **Orc** or the Arrow-specific **Feather** format. + +The devices layer +----------------- + +Basic **CUDA** integration is provided, allowing to describe Arrow data backed +by GPU-allocated memory. + +The filesystem layer +-------------------- + +A filesystem abstraction allows reading and writing data from different storage +backends, such as the local filesystem or a S3 bucket. |