Diffstat (limited to 'src/arrow/docs/source/cpp/overview.rst')
-rw-r--r-- src/arrow/docs/source/cpp/overview.rst | 97
1 files changed, 97 insertions, 0 deletions
diff --git a/src/arrow/docs/source/cpp/overview.rst b/src/arrow/docs/source/cpp/overview.rst
new file mode 100644
index 000000000..ccebdba45
--- /dev/null
+++ b/src/arrow/docs/source/cpp/overview.rst
@@ -0,0 +1,97 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. default-domain:: cpp
+.. highlight:: cpp
+
+High-Level Overview
+===================
+
+The Arrow C++ library is composed of several parts, each of which serves
+a specific purpose.
+
+The physical layer
+------------------
+
+**Memory management** abstractions provide a uniform API over memory that
+may be allocated through various means, such as heap allocation, memory
+mapping of a file, or a static memory area. In particular, the **buffer**
+abstraction represents a contiguous area of physical data.
+
+The one-dimensional layer
+-------------------------
+
+**Data types** govern the *logical* interpretation of *physical* data.
+Many operations in Arrow are parameterized, at compile time or at runtime,
+by a data type.
+
+**Arrays** assemble one or several buffers with a data type, so that they can
+be viewed as a logically contiguous sequence of values (possibly nested).
+
+**Chunked arrays** are a generalization of arrays, combining several arrays of
+the same type into a longer logical sequence of values.
+
+The two-dimensional layer
+-------------------------
+
+**Schemas** describe a logical collection of several pieces of data,
+each with a distinct name and type, and optional metadata.
+
+**Tables** are collections of chunked arrays that conform to a schema. They
+are the most capable dataset-providing abstraction in Arrow.
+
+**Record batches** are collections of contiguous arrays, described
+by a schema. They allow incremental construction or serialization of tables.
+
+The compute layer
+-----------------
+
+**Datums** are flexible dataset references, able to hold, for example, an array
+or table reference.
+
+**Kernels** are specialized computation functions running in a loop over a
+given set of datums representing input and output parameters to the functions.
+
+The IO layer
+------------
+
+**Streams** allow untyped sequential or seekable access over external data
+of various kinds (for example compressed or memory-mapped).
+
+The Inter-Process Communication (IPC) layer
+-------------------------------------------
+
+A **messaging format** allows interchange of Arrow data between processes, using
+as few copies as possible.
+
+The file formats layer
+----------------------
+
+Arrow data can be read from and written to various file formats, for
+example **Parquet**, **CSV**, **ORC** or the Arrow-specific **Feather** format.
+
+The devices layer
+-----------------
+
+Basic **CUDA** integration is provided, making it possible to describe Arrow
+data backed by GPU-allocated memory.
+
+The filesystem layer
+--------------------
+
+A filesystem abstraction allows reading and writing data from different storage
+backends, such as the local filesystem or an S3 bucket.