.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.
.. currentmodule:: pyarrow.cuda
CUDA Integration
================
Arrow is not limited to CPU buffers (located in the computer's main memory,
also called "host memory"). It also has provisions for accessing buffers
located on a CUDA-capable GPU device (in "device memory").
.. note::

   This functionality is optional and must have been enabled at build time.
   If this is not done by your package manager, you might have to build Arrow
   yourself.
CUDA Contexts
-------------
A CUDA context represents access to a particular CUDA-capable device.
For example, the following creates a CUDA context for CUDA device number 0::

    >>> from pyarrow import cuda
    >>> ctx = cuda.Context(0)
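If a machine has several GPUs, one context can be created per device. The sketch below assumes a CUDA-enabled ``pyarrow`` build and uses ``Context.get_num_devices()`` to count the visible devices. ``open_all_contexts`` is a hypothetical helper, not part of the Arrow API; it returns an empty list when ``pyarrow.cuda`` cannot be imported.

```python
def open_all_contexts():
    """Return one pyarrow.cuda.Context per visible CUDA device.

    Hypothetical helper: returns an empty list if pyarrow is missing or
    was built without CUDA support (the import below then fails).
    """
    try:
        from pyarrow import cuda
    except ImportError:
        return []
    # Context(n) gives access to CUDA device number n.
    return [cuda.Context(i) for i in range(cuda.Context.get_num_devices())]
```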
CUDA Buffers
------------
A CUDA buffer can be created by copying data from host memory to the memory
of a CUDA device, using the :meth:`Context.buffer_from_data` method.
The source data can be any Python buffer-like object, including Arrow buffers::

    >>> import numpy as np
    >>> arr = np.arange(4, dtype=np.int32)
    >>> arr.nbytes
    16
    >>> cuda_buf = ctx.buffer_from_data(arr)
    >>> type(cuda_buf)
    pyarrow._cuda.CudaBuffer
    >>> cuda_buf.size  # The buffer's size in bytes
    16
    >>> cuda_buf.address  # The buffer's address in device memory
    30088364544
    >>> cuda_buf.context.device_number
    0
Conversely, you can copy a CUDA buffer back to host memory, getting a regular
CPU buffer::

    >>> buf = cuda_buf.copy_to_host()
    >>> type(buf)
    pyarrow.lib.Buffer
    >>> np.frombuffer(buf, dtype=np.int32)
    array([0, 1, 2, 3], dtype=int32)
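Neither the CUDA buffer nor its host copy carries a data type: both are plain sequences of bytes, which is why ``np.frombuffer`` needs an explicit ``dtype``. The CPU-only sketch below uses a ``bytes`` object as a stand-in for such an untyped buffer, to show that the same 16 bytes decode differently depending on the ``dtype`` chosen.

```python
import numpy as np

arr = np.arange(4, dtype=np.int32)
raw = arr.tobytes()  # 16 bytes with no dtype attached

# Read back with the original dtype: the original four values.
same = np.frombuffer(raw, dtype=np.int32)
# Read back with another dtype: the same 16 bytes become eight int16 values
# (their exact values depend on the machine's byte order).
wrong = np.frombuffer(raw, dtype=np.int16)
```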
.. warning::

   Many Arrow functions expect a CPU buffer but will not check the buffer's
   actual type. You will get a crash if you pass a CUDA buffer to such a
   function::

       >>> import pyarrow as pa
       >>> pa.py_buffer(b"x" * 16).equals(cuda_buf)
       Segmentation fault
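One defensive pattern is to make sure a buffer lives in host memory before handing it to such functions. The sketch below is a hypothetical helper (``to_host`` is not part of the Arrow API); it relies on ``pyarrow.cuda.CudaBuffer`` and its ``copy_to_host()`` method, and returns its argument unchanged when the argument is not a CUDA buffer or when ``pyarrow.cuda`` is not available.

```python
try:
    from pyarrow.cuda import CudaBuffer
except ImportError:  # pyarrow missing or built without CUDA support
    CudaBuffer = None


def to_host(buf):
    """Return ``buf`` as host memory, copying it off the GPU if needed.

    Hypothetical helper: objects that are not CUDA buffers are returned
    unchanged.
    """
    if CudaBuffer is not None and isinstance(buf, CudaBuffer):
        # copy_to_host() returns a regular pyarrow.Buffer in host memory.
        return buf.copy_to_host()
    return buf
```

With it, ``pa.py_buffer(b"x" * 16).equals(to_host(cuda_buf))`` compares two host buffers instead of passing device memory to a CPU-only function. The price is a device-to-host transfer on every call.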
Numba Integration
-----------------
There is not much you can do directly with Arrow CUDA buffers from Python,
but they support interoperation with `Numba `_,
a JIT compiler which can turn Python code into optimized CUDA kernels.
Arrow to Numba
~~~~~~~~~~~~~~
First, let's define a Numba CUDA kernel operating on an ``int32`` array. Here,
we will simply increment each array element (assuming the array is writable)::

    import numba.cuda

    @numba.cuda.jit
    def increment_by_one(an_array):
        # Absolute index of this thread across the whole 1D grid
        pos = numba.cuda.grid(1)
        # Threads past the end of the array must not write anything
        if pos < an_array.size:
            an_array[pos] += 1
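To see what each GPU thread does, the launch can be emulated sequentially on the CPU. In this sketch, ``emulate_increment_by_one`` is a hypothetical name and plain Python loops stand in for threads that really run in parallel. A thread's position is computed as ``numba.cuda.grid(1)`` does for a one-dimensional launch (block index times threads per block, plus thread index), and the bounds check keeps surplus threads from writing past the end of the data.

```python
def emulate_increment_by_one(values, griddim, blockdim):
    """Sequentially mimic ``increment_by_one[griddim, blockdim](values)``.

    Hypothetical CPU-only illustration; a real launch runs these
    threads in parallel on the GPU.
    """
    for block_idx in range(griddim):
        for thread_idx in range(blockdim):
            # What numba.cuda.grid(1) returns for a 1D launch.
            pos = block_idx * blockdim + thread_idx
            # Bounds check: surplus threads must not touch memory.
            if pos < len(values):
                values[pos] += 1
    return values
```

Note that if the grid is too small, trailing elements are silently left untouched, since no thread ever reaches their index.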
Then we need to wrap our CUDA buffer into a Numba "device array" with the right
array metadata (shape, strides and datatype). This is necessary so that Numba
can identify the array's characteristics and compile the kernel with the
appropriate type declarations.
In this case, the metadata can simply be taken from the original NumPy array.
Note that the GPU data is not copied, only referenced::

    >>> from numba.cuda.cudadrv.devicearray import DeviceNDArray
    >>> device_arr = DeviceNDArray(arr.shape, arr.strides, arr.dtype, gpu_data=cuda_buf.to_numba())
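The strides are what let Numba turn an index into a location in the raw device allocation: element ``(i0, i1, ...)`` starts ``i0 * strides[0] + i1 * strides[1] + ...`` bytes after the start of the buffer. For ``arr`` above, the shape is ``(4,)``, the strides are ``(4,)`` and the dtype is ``int32``. The pure-Python sketch below (hypothetical helper names) computes row-major strides and byte offsets the way NumPy lays out C-contiguous arrays.

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-contiguous (row-major) array."""
    strides = []
    step = itemsize
    # The last axis varies fastest, so walk the shape from the end.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def byte_offset(index, strides):
    """Offset in bytes of element ``index`` from the buffer start."""
    return sum(i * s for i, s in zip(index, strides))
```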
(Ideally, we could define an Arrow array in CPU memory, copy it to CUDA
memory without losing type information, and then invoke the Numba kernel on it
without constructing the DeviceNDArray by hand; this is not yet possible.)
Finally, we can run the Numba CUDA kernel on the Numba device array (here
with a grid of 16 blocks of 16 threads each)::

    >>> increment_by_one[16, 16](device_arr)
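The ``[16, 16]`` launch configuration starts 16 blocks of 16 threads, that is 256 threads for a 4-element array; the bounds check in the kernel makes the extra threads do nothing. A common alternative is to derive the number of blocks from the data size with a ceiling division, as in this sketch (``blocks_needed`` is a hypothetical helper, not a Numba function):

```python
def blocks_needed(n_elements, threads_per_block):
    """Smallest block count so that blocks * threads >= n_elements."""
    # Integer ceiling division, avoiding floating point.
    return (n_elements + threads_per_block - 1) // threads_per_block
```

For example, ``increment_by_one[blocks_needed(arr.size, 32), 32](device_arr)`` would launch a single block of 32 threads for the 4-element array. The value of 32 threads per block is an arbitrary choice for illustration.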
The results can be checked by copying the CUDA buffer back to CPU memory::

    >>> np.frombuffer(cuda_buf.copy_to_host(), dtype=np.int32)
    array([1, 2, 3, 4], dtype=int32)
Numba to Arrow
~~~~~~~~~~~~~~
Conversely, a Numba-created device array can be viewed as an Arrow CUDA buffer,
using the :meth:`CudaBuffer.from_numba` factory method.
For the sake of example, let's first create a Numba device array::

    >>> arr = np.arange(10, 14, dtype=np.int32)
    >>> arr
    array([10, 11, 12, 13], dtype=int32)
    >>> device_arr = numba.cuda.to_device(arr)
Then we can create a CUDA buffer pointing to the device array's memory.
We don't need to pass a CUDA context explicitly this time: the appropriate
CUDA context is automatically retrieved and adapted from the Numba object.

::

    >>> cuda_buf = cuda.CudaBuffer.from_numba(device_arr.gpu_data)
    >>> cuda_buf.size
    16
    >>> cuda_buf.address
    30088364032
    >>> cuda_buf.context.device_number
    0
Of course, we can copy the CUDA buffer back to host memory::

    >>> np.frombuffer(cuda_buf.copy_to_host(), dtype=np.int32)
    array([10, 11, 12, 13], dtype=int32)
.. seealso::

   Documentation for Numba's `CUDA support `_.