Diffstat (limited to 'src/arrow/docs/source/cpp/memory.rst')
-rw-r--r-- | src/arrow/docs/source/cpp/memory.rst | 203 |
1 files changed, 203 insertions, 0 deletions
diff --git a/src/arrow/docs/source/cpp/memory.rst b/src/arrow/docs/source/cpp/memory.rst
new file mode 100644
index 000000000..ff8ffb044
--- /dev/null
+++ b/src/arrow/docs/source/cpp/memory.rst
@@ -0,0 +1,203 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. default-domain:: cpp
+.. highlight:: cpp
+
+.. _cpp_memory_management:
+
+=================
+Memory Management
+=================
+
+.. seealso::
+   :doc:`Memory management API reference <api/memory>`
+
+Buffers
+=======
+
+To avoid passing around raw data pointers with varying and non-obvious
+lifetime rules, Arrow provides a generic abstraction called :class:`arrow::Buffer`.
+A Buffer encapsulates a pointer and data size, and generally also ties its
+lifetime to that of an underlying provider (in other words, a Buffer should
+*always* point to valid memory until its destruction). Buffers are untyped:
+they simply denote a physical memory area regardless of its intended meaning
+or interpretation.
+
+Buffers may be allocated by Arrow itself, or by third-party routines.
+For example, it is possible to pass the data of a Python bytestring as an Arrow
+buffer, keeping the Python object alive as necessary.
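The lifetime rule described above can be illustrated without Arrow at all. The following is a minimal sketch, not Arrow's implementation: ``OwnedView`` and ``ViewOfString`` are hypothetical names, and a ``std::string`` stands in for the third-party owner (the Python bytestring in the example above). The view stores a pointer, a size, and a shared handle that keeps the owner alive for as long as the view exists.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a buffer: pointer + size + a handle on the owner.
struct OwnedView {
  const uint8_t* data;
  std::size_t size;
  std::shared_ptr<const void> owner;  // keeps the underlying storage alive
};

// Wrap the bytes of a shared string without copying them.
// The pointer and size are read before the handle is moved into the view.
inline OwnedView ViewOfString(std::shared_ptr<const std::string> s) {
  const uint8_t* p = reinterpret_cast<const uint8_t*>(s->data());
  const std::size_t n = s->size();
  return OwnedView{p, n, std::move(s)};
}
```

Under these assumptions, a caller can drop its own handle on the string and the view's ``data`` pointer remains valid, because the view still holds a reference to the owner.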
+
+In addition, buffers come in various flavours: mutable or not, resizable or
+not. Generally, you will hold a mutable buffer when building up a piece
+of data, then it will be frozen as an immutable container such as an
+:doc:`array <arrays>`.
+
+.. note::
+   Some buffers may point to non-CPU memory, such as GPU-backed memory
+   provided by a CUDA context. If you're writing a GPU-aware application,
+   you will need to be careful not to interpret a GPU memory pointer as
+   a CPU-reachable pointer, or vice-versa.
+
+Accessing Buffer Memory
+-----------------------
+
+Buffers provide fast access to the underlying memory using the
+:func:`~arrow::Buffer::size` and :func:`~arrow::Buffer::data` accessors
+(or :func:`~arrow::Buffer::mutable_data` for writable access to a mutable
+buffer).
+
+Slicing
+-------
+
+It is possible to make zero-copy slices of buffers, to obtain a buffer
+referring to some contiguous subset of the underlying data. This is done
+by calling the :func:`arrow::SliceBuffer` and :func:`arrow::SliceMutableBuffer`
+functions.
+
+Allocating a Buffer
+-------------------
+
+You can allocate a buffer yourself by calling one of the
+:func:`arrow::AllocateBuffer` or :func:`arrow::AllocateResizableBuffer`
+overloads::
+
+   arrow::Result<std::unique_ptr<arrow::Buffer>> maybe_buffer = arrow::AllocateBuffer(4096);
+   if (!maybe_buffer.ok()) {
+      // ... handle allocation error
+   }
+
+   std::shared_ptr<arrow::Buffer> buffer = *std::move(maybe_buffer);
+   uint8_t* buffer_data = buffer->mutable_data();
+   memcpy(buffer_data, "hello world", 11);
+
+Allocating a buffer this way ensures it is 64-byte aligned and padded
+as recommended by the :doc:`Arrow memory specification <../format/Layout>`.
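To make the alignment and padding recommendation concrete, here is a minimal sketch in standard C++ that does not use Arrow. It assumes that "aligned" means the start address is a multiple of 64 and that "padded" means the allocated size is rounded up to a multiple of 64. ``RoundUpTo64`` and ``IsAligned64`` are hypothetical helpers written for this illustration, not Arrow functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kAlignment = 64;

// Round a byte count up to the next multiple of 64 (0 stays 0).
// Works because kAlignment is a power of two.
constexpr std::size_t RoundUpTo64(std::size_t nbytes) {
  return (nbytes + kAlignment - 1) & ~(kAlignment - 1);
}

// Check whether an address is a multiple of 64.
inline bool IsAligned64(const void* ptr) {
  return reinterpret_cast<std::uintptr_t>(ptr) % kAlignment == 0;
}
```

With these helpers, a request for 11 bytes would be padded to 64 bytes and a request for 65 bytes to 128 bytes.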
+
+Building a Buffer
+-----------------
+
+You can also allocate *and* build a Buffer incrementally, using the
+:class:`arrow::BufferBuilder` API::
+
+   arrow::BufferBuilder builder;
+   builder.Resize(11);  // reserve enough space for 11 bytes
+   builder.Append("hello ", 6);
+   builder.Append("world", 5);
+
+   auto maybe_buffer = builder.Finish();
+   if (!maybe_buffer.ok()) {
+      // ... handle buffer allocation error
+   }
+   std::shared_ptr<arrow::Buffer> buffer = *maybe_buffer;
+
+If a Buffer is meant to contain values of a given fixed-width type (for
+example the 32-bit offsets of a List array), it can be more convenient to
+use the template :class:`arrow::TypedBufferBuilder` API::
+
+   arrow::TypedBufferBuilder<int32_t> builder;
+   builder.Reserve(2);  // reserve enough space for two int32_t values
+   builder.Append(0x12345678);
+   builder.Append(-0x76543210);
+
+   auto maybe_buffer = builder.Finish();
+   if (!maybe_buffer.ok()) {
+      // ... handle buffer allocation error
+   }
+   std::shared_ptr<arrow::Buffer> buffer = *maybe_buffer;
+
+Memory Pools
+============
+
+When allocating a Buffer using the Arrow C++ API, the buffer's underlying
+memory is allocated by an :class:`arrow::MemoryPool` instance. Usually this
+will be the process-wide *default memory pool*, but many Arrow APIs allow
+you to pass another MemoryPool instance for their internal allocations.
+
+Memory pools are used for large long-lived data such as array buffers.
+Other data, such as small C++ objects and temporary workspaces, usually
+goes through the regular C++ allocators.
+
+Default Memory Pool
+-------------------
+
+The default memory pool depends on how Arrow C++ was compiled:
+
+- if enabled at compile time, a `jemalloc <http://jemalloc.net/>`_ heap;
+- otherwise, if enabled at compile time, a
+  `mimalloc <https://github.com/microsoft/mimalloc>`_ heap;
+- otherwise, the C library ``malloc`` heap.
+
+Overriding the Default Memory Pool
+----------------------------------
+
+One can override the above selection algorithm by setting the
+``ARROW_DEFAULT_MEMORY_POOL`` environment variable to one of the following
+values: ``jemalloc``, ``mimalloc`` or ``system``. This variable is inspected
+once when Arrow C++ is loaded into memory (for example when the Arrow C++ DLL
+is loaded).
+
+STL Integration
+---------------
+
+If you wish to use an Arrow memory pool to allocate the data of STL containers,
+you can do so using the :class:`arrow::stl::allocator` wrapper.
+
+Conversely, you can also use an STL allocator to allocate Arrow memory,
+using the :class:`arrow::stl::STLMemoryPool` class. However, this may be less
+performant, as STL allocators don't provide a resizing operation.
+
+Devices
+=======
+
+Many Arrow applications only access host (CPU) memory. However, in some cases
+it is desirable to handle on-device memory (such as on-board memory on a GPU)
+as well as host memory.
+
+Arrow represents the CPU and other devices using the
+:class:`arrow::Device` abstraction. The associated class
+:class:`arrow::MemoryManager` specifies how to allocate memory on a
+given device. Each device has a default memory manager, but
+additional instances may be constructed (for example, one wrapping a
+custom :class:`arrow::MemoryPool` on the CPU).
+
+Device-Agnostic Programming
+---------------------------
+
+If you receive a Buffer from third-party code, you can query whether it is
+CPU-readable by calling its :func:`~arrow::Buffer::is_cpu` method.
+
+You can also view the Buffer on a given device, in a generic way, by calling
+:func:`arrow::Buffer::View` or :func:`arrow::Buffer::ViewOrCopy`. This will
+be a no-operation if the source and destination devices are identical.
+Otherwise, a device-dependent mechanism will attempt to construct a memory
+address for the destination device that gives access to the buffer contents.
+Actual device-to-device transfer may happen lazily, when reading the buffer
+contents.
+
+Similarly, if you want to do I/O on a buffer without assuming a CPU-readable
+buffer, you can call :func:`arrow::Buffer::GetReader` and
+:func:`arrow::Buffer::GetWriter`.
+
+For example, to get an on-CPU view or copy of an arbitrary buffer, you can
+simply do::
+
+   std::shared_ptr<arrow::Buffer> arbitrary_buffer = ... ;
+   std::shared_ptr<arrow::Buffer> cpu_buffer = arrow::Buffer::ViewOrCopy(
+      arbitrary_buffer, arrow::default_cpu_memory_manager()).ValueOrDie();