summaryrefslogtreecommitdiffstats
path: root/src/arrow/docs/source/cpp/memory.rst
diff options
context:
space:
mode:
Diffstat (limited to 'src/arrow/docs/source/cpp/memory.rst')
-rw-r--r--src/arrow/docs/source/cpp/memory.rst203
1 files changed, 203 insertions, 0 deletions
diff --git a/src/arrow/docs/source/cpp/memory.rst b/src/arrow/docs/source/cpp/memory.rst
new file mode 100644
index 000000000..ff8ffb044
--- /dev/null
+++ b/src/arrow/docs/source/cpp/memory.rst
@@ -0,0 +1,203 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. default-domain:: cpp
+.. highlight:: cpp
+
+.. _cpp_memory_management:
+
+=================
+Memory Management
+=================
+
+.. seealso::
+ :doc:`Memory management API reference <api/memory>`
+
+Buffers
+=======
+
+To avoid passing around raw data pointers with varying and non-obvious
+lifetime rules, Arrow provides a generic abstraction called :class:`arrow::Buffer`.
+A Buffer encapsulates a pointer and data size, and generally also ties its
+lifetime to that of an underlying provider (in other words, a Buffer should
+*always* point to valid memory till its destruction). Buffers are untyped:
+they simply denote a physical memory area regardless of its intended meaning
+or interpretation.
+
+Buffers may be allocated by Arrow itself , or by third-party routines.
+For example, it is possible to pass the data of a Python bytestring as a Arrow
+buffer, keeping the Python object alive as necessary.
+
+In addition, buffers come in various flavours: mutable or not, resizable or
+not. Generally, you will hold a mutable buffer when building up a piece
+of data, then it will be frozen as an immutable container such as an
+:doc:`array <arrays>`.
+
+.. note::
+ Some buffers may point to non-CPU memory, such as GPU-backed memory
+ provided by a CUDA context. If you're writing a GPU-aware application,
+ you will need to be careful not to interpret a GPU memory pointer as
+ a CPU-reachable pointer, or vice-versa.
+
+Accessing Buffer Memory
+-----------------------
+
+Buffers provide fast access to the underlying memory using the
+:func:`~arrow::Buffer::size` and :func:`~arrow::Buffer::data` accessors
+(or :func:`~arrow::Buffer::mutable_data` for writable access to a mutable
+buffer).
+
+Slicing
+-------
+
+It is possible to make zero-copy slices of buffers, to obtain a buffer
+referring to some contiguous subset of the underlying data. This is done
+by calling the :func:`arrow::SliceBuffer` and :func:`arrow::SliceMutableBuffer`
+functions.
+
+Allocating a Buffer
+-------------------
+
+You can allocate a buffer yourself by calling one of the
+:func:`arrow::AllocateBuffer` or :func:`arrow::AllocateResizableBuffer`
+overloads::
+
+ arrow::Result<std::unique_ptr<Buffer>> maybe_buffer = arrow::AllocateBuffer(4096);
+ if (!maybe_buffer.ok()) {
+ // ... handle allocation error
+ }
+
+ std::shared_ptr<arrow::Buffer> buffer = *std::move(maybe_buffer);
+ uint8_t* buffer_data = buffer->mutable_data();
+ memcpy(buffer_data, "hello world", 11);
+
+Allocating a buffer this way ensures it is 64-bytes aligned and padded
+as recommended by the :doc:`Arrow memory specification <../format/Layout>`.
+
+Building a Buffer
+-----------------
+
+You can also allocate *and* build a Buffer incrementally, using the
+:class:`arrow::BufferBuilder` API::
+
+ BufferBuilder builder;
+ builder.Resize(11); // reserve enough space for 11 bytes
+ builder.Append("hello ", 6);
+ builder.Append("world", 5);
+
+ auto maybe_buffer = builder.Finish();
+ if (!maybe_buffer.ok()) {
+ // ... handle buffer allocation error
+ }
+ std::shared_ptr<arrow::Buffer> buffer = *maybe_buffer;
+
+If a Buffer is meant to contain values of a given fixed-width type (for
+example the 32-bit offsets of a List array), it can be more convenient to
+use the template :class:`arrow::TypedBufferBuilder` API::
+
+ TypedBufferBuilder<int32_t> builder;
+ builder.Reserve(2); // reserve enough space for two int32_t values
+ builder.Append(0x12345678);
+ builder.Append(-0x765643210);
+
+ auto maybe_buffer = builder.Finish();
+ if (!maybe_buffer.ok()) {
+ // ... handle buffer allocation error
+ }
+ std::shared_ptr<arrow::Buffer> buffer = *maybe_buffer;
+
+Memory Pools
+============
+
+When allocating a Buffer using the Arrow C++ API, the buffer's underlying
+memory is allocated by a :class:`arrow::MemoryPool` instance. Usually this
+will be the process-wide *default memory pool*, but many Arrow APIs allow
+you to pass another MemoryPool instance for their internal allocations.
+
+Memory pools are used for large long-lived data such as array buffers.
+Other data, such as small C++ objects and temporary workspaces, usually
+goes through the regular C++ allocators.
+
+Default Memory Pool
+-------------------
+
+The default memory pool depends on how Arrow C++ was compiled:
+
+- if enabled at compile time, a `jemalloc <http://jemalloc.net/>`_ heap;
+- otherwise, if enabled at compile time, a
+ `mimalloc <https://github.com/microsoft/mimalloc>`_ heap;
+- otherwise, the C library ``malloc`` heap.
+
+Overriding the Default Memory Pool
+----------------------------------
+
+One can override the above selection algorithm by setting the
+``ARROW_DEFAULT_MEMORY_POOL`` environment variable to one of the following
+values: ``jemalloc``, ``mimalloc`` or ``system``. This variable is inspected
+once when Arrow C++ is loaded in memory (for example when the Arrow C++ DLL
+is loaded).
+
+STL Integration
+---------------
+
+If you wish to use a Arrow memory pool to allocate the data of STL containers,
+you can do so using the :class:`arrow::stl::allocator` wrapper.
+
+Conversely, you can also use a STL allocator to allocate Arrow memory,
+using the :class:`arrow::stl::STLMemoryPool` class. However, this may be less
+performant, as STL allocators don't provide a resizing operation.
+
+Devices
+=======
+
+Many Arrow applications only access host (CPU) memory. However, in some cases
+it is desirable to handle on-device memory (such as on-board memory on a GPU)
+as well as host memory.
+
+Arrow represents the CPU and other devices using the
+:class:`arrow::Device` abstraction. The associated class :class:`arrow::MemoryManager`
+specifies how to allocate on a given device. Each device has a default memory manager, but
+additional instances may be constructed (for example, wrapping a custom
+:class:`arrow::MemoryPool` the CPU).
+:class:`arrow::MemoryManager` instances which specify how to allocate
+memory on a given device (for example, using a particular
+:class:`arrow::MemoryPool` on the CPU).
+
+Device-Agnostic Programming
+---------------------------
+
+If you receive a Buffer from third-party code, you can query whether it is
+CPU-readable by calling its :func:`~arrow::Buffer::is_cpu` method.
+
+You can also view the Buffer on a given device, in a generic way, by calling
+:func:`arrow::Buffer::View` or :func:`arrow::Buffer::ViewOrCopy`. This will
+be a no-operation if the source and destination devices are identical.
+Otherwise, a device-dependent mechanism will attempt to construct a memory
+address for the destination device that gives access to the buffer contents.
+Actual device-to-device transfer may happen lazily, when reading the buffer
+contents.
+
+Similarly, if you want to do I/O on a buffer without assuming a CPU-readable
+buffer, you can call :func:`arrow::Buffer::GetReader` and
+:func:`arrow::Buffer::GetWriter`.
+
+For example, to get an on-CPU view or copy of an arbitrary buffer, you can
+simply do::
+
+ std::shared_ptr<arrow::Buffer> arbitrary_buffer = ... ;
+ std::shared_ptr<arrow::Buffer> cpu_buffer = arrow::Buffer::ViewOrCopy(
+ arbitrary_buffer, arrow::default_cpu_memory_manager());