diff options
Diffstat (limited to 'Documentation/accel/introduction.rst')
-rw-r--r-- | Documentation/accel/introduction.rst | 110 |
1 files changed, 110 insertions, 0 deletions
diff --git a/Documentation/accel/introduction.rst b/Documentation/accel/introduction.rst new file mode 100644 index 0000000000..89984dfece --- /dev/null +++ b/Documentation/accel/introduction.rst @@ -0,0 +1,110 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============ +Introduction +============ + +The Linux compute accelerators subsystem is designed to expose compute +accelerators in a common way to user-space and provide a common set of +functionality. + +These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU. +Although these devices are typically designed to accelerate +Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer +is not limited to handling these types of accelerators. + +Typically, a compute accelerator will belong to one of the following +categories: + +- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA, + or an IP inside a SoC (e.g. laptop web camera). These devices + are typically configured using registers and can work with or without DMA. + +- Inference data-center - single/multi user devices in a large server. This + type of device can be stand-alone or an IP inside a SoC or a GPU. It will + have on-board DRAM (to hold the DL topology), DMA engines and + command submission queues (either kernel or user-space queues). + It might also have an MMU to manage multiple users and might also enable + virtualization (SR-IOV) to support multiple VMs on the same device. In + addition, these devices will usually have some tools, such as profiler and + debugger. + +- Training data-center - Similar to Inference data-center cards, but typically + have more computational power and memory b/w (e.g. HBM) and will likely have + a method of scaling-up/out, i.e. connecting to other training cards inside + the server or in other servers, respectively. + +All these devices typically have different runtime user-space software stacks, +that are tailored-made to their h/w. In addition, they will also probably +include a compiler to generate programs to their custom-made computational +engines. Typically, the common layer in user-space will be the DL frameworks, +such as PyTorch and TensorFlow. + +Sharing code with DRM +===================== + +Because this type of devices can be an IP inside GPUs or have similar +characteristics as those of GPUs, the accel subsystem will use the +DRM subsystem's code and functionality. i.e. the accel core code will +be part of the DRM subsystem and an accel device will be a new type of DRM +device. + +This will allow us to leverage the extensive DRM code-base and +collaborate with DRM developers that have experience with this type of +devices. In addition, new features that will be added for the accelerator +drivers can be of use to GPU drivers as well. + +Differentiation from GPUs +========================= + +Because we want to prevent the extensive user-space graphic software stack +from trying to use an accelerator as a GPU, the compute accelerators will be +differentiated from GPUs by using a new major number and new device char files. + +Furthermore, the drivers will be located in a separate place in the kernel +tree - drivers/accel/. + +The accelerator devices will be exposed to the user space with the dedicated +261 major number and will have the following convention: + +- device char files - /dev/accel/accel\* +- sysfs - /sys/class/accel/accel\*/ +- debugfs - /sys/kernel/debug/accel/\*/ + +Getting Started +=============== + +First, read the DRM documentation at Documentation/gpu/index.rst. +Not only it will explain how to write a new DRM driver but it will also +contain all the information on how to contribute, the Code Of Conduct and +what is the coding style/documentation. All of that is the same for the +accel subsystem. + +Second, make sure the kernel is configured with CONFIG_DRM_ACCEL. + +To expose your device as an accelerator, two changes are needed to +be done in your driver (as opposed to a standard DRM driver): + +- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's + driver_features field. It is important to note that this driver feature is + mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want + to expose both graphics and compute device char files should be handled by + two drivers that are connected using the auxiliary bus framework. + +- Change the open callback in your driver fops structure to accel_open(). + Alternatively, your driver can use DEFINE_DRM_ACCEL_FOPS macro to easily + set the correct function operations pointers structure. + +External References +=================== + +email threads +------------- + +* `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022) +* `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022) + +Conference talks +---------------- + +* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022) |