From 5d1646d90e1f2cceb9f0828f4b28318cd0ec7744 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Sat, 27 Apr 2024 12:05:51 +0200 Subject: Adding upstream version 5.10.209. Signed-off-by: Daniel Baumann --- Documentation/gpu/drm-mm.rst | 506 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 506 insertions(+) create mode 100644 Documentation/gpu/drm-mm.rst (limited to 'Documentation/gpu/drm-mm.rst') diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst new file mode 100644 index 000000000..9abee1589 --- /dev/null +++ b/Documentation/gpu/drm-mm.rst @@ -0,0 +1,506 @@ +===================== +DRM Memory Management +===================== + +Modern Linux systems require large amount of graphics memory to store +frame buffers, textures, vertices and other graphics-related data. Given +the very dynamic nature of many of that data, managing graphics memory +efficiently is thus crucial for the graphics stack and plays a central +role in the DRM infrastructure. + +The DRM core includes two memory managers, namely Translation Table Maps +(TTM) and Graphics Execution Manager (GEM). TTM was the first DRM memory +manager to be developed and tried to be a one-size-fits-them all +solution. It provides a single userspace API to accommodate the need of +all hardware, supporting both Unified Memory Architecture (UMA) devices +and devices with dedicated video RAM (i.e. most discrete video cards). +This resulted in a large, complex piece of code that turned out to be +hard to use for driver development. + +GEM started as an Intel-sponsored project in reaction to TTM's +complexity. Its design philosophy is completely different: instead of +providing a solution to every graphics memory-related problems, GEM +identified common code between drivers and created a support library to +share it. GEM has simpler initialization and execution requirements than +TTM, but has no video RAM management capabilities and is thus limited to +UMA devices. + +The Translation Table Manager (TTM) +=================================== + +TTM design background and information belongs here. + +TTM initialization +------------------ + + **Warning** + This section is outdated. + +Drivers wishing to support TTM must pass a filled :c:type:`ttm_bo_driver +` structure to ttm_bo_device_init, together with an +initialized global reference to the memory manager. The ttm_bo_driver +structure contains several fields with function pointers for +initializing the TTM, allocating and freeing memory, waiting for command +completion and fence synchronization, and memory migration. + +The :c:type:`struct drm_global_reference ` is made +up of several fields: + +.. code-block:: c + + struct drm_global_reference { + enum ttm_global_types global_type; + size_t size; + void *object; + int (*init) (struct drm_global_reference *); + void (*release) (struct drm_global_reference *); + }; + + +There should be one global reference structure for your memory manager +as a whole, and there will be others for each object created by the +memory manager at runtime. Your global TTM should have a type of +TTM_GLOBAL_TTM_MEM. The size field for the global object should be +sizeof(struct ttm_mem_global), and the init and release hooks should +point at your driver-specific init and release routines, which probably +eventually call ttm_mem_global_init and ttm_mem_global_release, +respectively. + +Once your global TTM accounting structure is set up and initialized by +calling ttm_global_item_ref() on it, you need to create a buffer +object TTM to provide a pool for buffer object allocation by clients and +the kernel itself. The type of this object should be +TTM_GLOBAL_TTM_BO, and its size should be sizeof(struct +ttm_bo_global). Again, driver-specific init and release functions may +be provided, likely eventually calling ttm_bo_global_ref_init() and +ttm_bo_global_ref_release(), respectively. Also, like the previous +object, ttm_global_item_ref() is used to create an initial reference +count for the TTM, which will call your initialization function. + +See the radeon_ttm.c file for an example of usage. + +The Graphics Execution Manager (GEM) +==================================== + +The GEM design approach has resulted in a memory manager that doesn't +provide full coverage of all (or even all common) use cases in its +userspace or kernel API. GEM exposes a set of standard memory-related +operations to userspace and a set of helper functions to drivers, and +let drivers implement hardware-specific operations with their own +private API. + +The GEM userspace API is described in the `GEM - the Graphics Execution +Manager `__ article on LWN. While +slightly outdated, the document provides a good overview of the GEM API +principles. Buffer allocation and read and write operations, described +as part of the common GEM API, are currently implemented using +driver-specific ioctls. + +GEM is data-agnostic. It manages abstract buffer objects without knowing +what individual buffers contain. APIs that require knowledge of buffer +contents or purpose, such as buffer allocation or synchronization +primitives, are thus outside of the scope of GEM and must be implemented +using driver-specific ioctls. + +On a fundamental level, GEM involves several operations: + +- Memory allocation and freeing +- Command execution +- Aperture management at command execution time + +Buffer object allocation is relatively straightforward and largely +provided by Linux's shmem layer, which provides memory to back each +object. + +Device-specific operations, such as command execution, pinning, buffer +read & write, mapping, and domain ownership transfers are left to +driver-specific ioctls. + +GEM Initialization +------------------ + +Drivers that use GEM must set the DRIVER_GEM bit in the struct +:c:type:`struct drm_driver ` driver_features +field. The DRM core will then automatically initialize the GEM core +before calling the load operation. Behind the scene, this will create a +DRM Memory Manager object which provides an address space pool for +object allocation. + +In a KMS configuration, drivers need to allocate and initialize a +command ring buffer following core GEM initialization if required by the +hardware. UMA devices usually have what is called a "stolen" memory +region, which provides space for the initial framebuffer and large, +contiguous memory regions required by the device. This space is +typically not managed by GEM, and must be initialized separately into +its own DRM MM object. + +GEM Objects Creation +-------------------- + +GEM splits creation of GEM objects and allocation of the memory that +backs them in two distinct operations. + +GEM objects are represented by an instance of struct :c:type:`struct +drm_gem_object `. Drivers usually need to +extend GEM objects with private information and thus create a +driver-specific GEM object structure type that embeds an instance of +struct :c:type:`struct drm_gem_object `. + +To create a GEM object, a driver allocates memory for an instance of its +specific GEM object type and initializes the embedded struct +:c:type:`struct drm_gem_object ` with a call +to drm_gem_object_init(). The function takes a pointer +to the DRM device, a pointer to the GEM object and the buffer object +size in bytes. + +GEM uses shmem to allocate anonymous pageable memory. +drm_gem_object_init() will create an shmfs file of the +requested size and store it into the struct :c:type:`struct +drm_gem_object ` filp field. The memory is +used as either main storage for the object when the graphics hardware +uses system memory directly or as a backing store otherwise. + +Drivers are responsible for the actual physical pages allocation by +calling shmem_read_mapping_page_gfp() for each page. +Note that they can decide to allocate pages when initializing the GEM +object, or to delay allocation until the memory is needed (for instance +when a page fault occurs as a result of a userspace memory access or +when the driver needs to start a DMA transfer involving the memory). + +Anonymous pageable memory allocation is not always desired, for instance +when the hardware requires physically contiguous system memory as is +often the case in embedded devices. Drivers can create GEM objects with +no shmfs backing (called private GEM objects) by initializing them with a call +to drm_gem_private_object_init() instead of drm_gem_object_init(). Storage for +private GEM objects must be managed by drivers. + +GEM Objects Lifetime +-------------------- + +All GEM objects are reference-counted by the GEM core. References can be +acquired and release by calling drm_gem_object_get() and drm_gem_object_put() +respectively. + +When the last reference to a GEM object is released the GEM core calls +the :c:type:`struct drm_driver ` gem_free_object_unlocked +operation. That operation is mandatory for GEM-enabled drivers and must +free the GEM object and all associated resources. + +void (\*gem_free_object) (struct drm_gem_object \*obj); Drivers are +responsible for freeing all GEM object resources. This includes the +resources created by the GEM core, which need to be released with +drm_gem_object_release(). + +GEM Objects Naming +------------------ + +Communication between userspace and the kernel refers to GEM objects +using local handles, global names or, more recently, file descriptors. +All of those are 32-bit integer values; the usual Linux kernel limits +apply to the file descriptors. + +GEM handles are local to a DRM file. Applications get a handle to a GEM +object through a driver-specific ioctl, and can use that handle to refer +to the GEM object in other standard or driver-specific ioctls. Closing a +DRM file handle frees all its GEM handles and dereferences the +associated GEM objects. + +To create a handle for a GEM object drivers call drm_gem_handle_create(). The +function takes a pointer to the DRM file and the GEM object and returns a +locally unique handle. When the handle is no longer needed drivers delete it +with a call to drm_gem_handle_delete(). Finally the GEM object associated with a +handle can be retrieved by a call to drm_gem_object_lookup(). + +Handles don't take ownership of GEM objects, they only take a reference +to the object that will be dropped when the handle is destroyed. To +avoid leaking GEM objects, drivers must make sure they drop the +reference(s) they own (such as the initial reference taken at object +creation time) as appropriate, without any special consideration for the +handle. For example, in the particular case of combined GEM object and +handle creation in the implementation of the dumb_create operation, +drivers must drop the initial reference to the GEM object before +returning the handle. + +GEM names are similar in purpose to handles but are not local to DRM +files. They can be passed between processes to reference a GEM object +globally. Names can't be used directly to refer to objects in the DRM +API, applications must convert handles to names and names to handles +using the DRM_IOCTL_GEM_FLINK and DRM_IOCTL_GEM_OPEN ioctls +respectively. The conversion is handled by the DRM core without any +driver-specific support. + +GEM also supports buffer sharing with dma-buf file descriptors through +PRIME. GEM-based drivers must use the provided helpers functions to +implement the exporting and importing correctly. See ?. Since sharing +file descriptors is inherently more secure than the easily guessable and +global GEM names it is the preferred buffer sharing mechanism. Sharing +buffers through GEM names is only supported for legacy userspace. +Furthermore PRIME also allows cross-device buffer sharing since it is +based on dma-bufs. + +GEM Objects Mapping +------------------- + +Because mapping operations are fairly heavyweight GEM favours +read/write-like access to buffers, implemented through driver-specific +ioctls, over mapping buffers to userspace. However, when random access +to the buffer is needed (to perform software rendering for instance), +direct access to the object can be more efficient. + +The mmap system call can't be used directly to map GEM objects, as they +don't have their own file handle. Two alternative methods currently +co-exist to map GEM objects to userspace. The first method uses a +driver-specific ioctl to perform the mapping operation, calling +do_mmap() under the hood. This is often considered +dubious, seems to be discouraged for new GEM-enabled drivers, and will +thus not be described here. + +The second method uses the mmap system call on the DRM file handle. void +\*mmap(void \*addr, size_t length, int prot, int flags, int fd, off_t +offset); DRM identifies the GEM object to be mapped by a fake offset +passed through the mmap offset argument. Prior to being mapped, a GEM +object must thus be associated with a fake offset. To do so, drivers +must call drm_gem_create_mmap_offset() on the object. + +Once allocated, the fake offset value must be passed to the application +in a driver-specific way and can then be used as the mmap offset +argument. + +The GEM core provides a helper method drm_gem_mmap() to +handle object mapping. The method can be set directly as the mmap file +operation handler. It will look up the GEM object based on the offset +value and set the VMA operations to the :c:type:`struct drm_driver +` gem_vm_ops field. Note that drm_gem_mmap() doesn't map memory to +userspace, but relies on the driver-provided fault handler to map pages +individually. + +To use drm_gem_mmap(), drivers must fill the struct :c:type:`struct drm_driver +` gem_vm_ops field with a pointer to VM operations. + +The VM operations is a :c:type:`struct vm_operations_struct ` +made up of several fields, the more interesting ones being: + +.. code-block:: c + + struct vm_operations_struct { + void (*open)(struct vm_area_struct * area); + void (*close)(struct vm_area_struct * area); + vm_fault_t (*fault)(struct vm_fault *vmf); + }; + + +The open and close operations must update the GEM object reference +count. Drivers can use the drm_gem_vm_open() and drm_gem_vm_close() helper +functions directly as open and close handlers. + +The fault operation handler is responsible for mapping individual pages +to userspace when a page fault occurs. Depending on the memory +allocation scheme, drivers can allocate pages at fault time, or can +decide to allocate memory for the GEM object at the time the object is +created. + +Drivers that want to map the GEM object upfront instead of handling page +faults can implement their own mmap file operation handler. + +For platforms without MMU the GEM core provides a helper method +drm_gem_cma_get_unmapped_area(). The mmap() routines will call this to get a +proposed address for the mapping. + +To use drm_gem_cma_get_unmapped_area(), drivers must fill the struct +:c:type:`struct file_operations ` get_unmapped_area field with +a pointer on drm_gem_cma_get_unmapped_area(). + +More detailed information about get_unmapped_area can be found in +Documentation/admin-guide/mm/nommu-mmap.rst + +Memory Coherency +---------------- + +When mapped to the device or used in a command buffer, backing pages for +an object are flushed to memory and marked write combined so as to be +coherent with the GPU. Likewise, if the CPU accesses an object after the +GPU has finished rendering to the object, then the object must be made +coherent with the CPU's view of memory, usually involving GPU cache +flushing of various kinds. This core CPU<->GPU coherency management is +provided by a device-specific ioctl, which evaluates an object's current +domain and performs any necessary flushing or synchronization to put the +object into the desired coherency domain (note that the object may be +busy, i.e. an active render target; in that case, setting the domain +blocks the client and waits for rendering to complete before performing +any necessary flushing operations). + +Command Execution +----------------- + +Perhaps the most important GEM function for GPU devices is providing a +command execution interface to clients. Client programs construct +command buffers containing references to previously allocated memory +objects, and then submit them to GEM. At that point, GEM takes care to +bind all the objects into the GTT, execute the buffer, and provide +necessary synchronization between clients accessing the same buffers. +This often involves evicting some objects from the GTT and re-binding +others (a fairly expensive operation), and providing relocation support +which hides fixed GTT offsets from clients. Clients must take care not +to submit command buffers that reference more objects than can fit in +the GTT; otherwise, GEM will reject them and no rendering will occur. +Similarly, if several objects in the buffer require fence registers to +be allocated for correct rendering (e.g. 2D blits on pre-965 chips), +care must be taken not to require more fence registers than are +available to the client. Such resource management should be abstracted +from the client in libdrm. + +GEM Function Reference +---------------------- + +.. kernel-doc:: include/drm/drm_gem.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/drm_gem.c + :export: + +GEM CMA Helper Functions Reference +---------------------------------- + +.. kernel-doc:: drivers/gpu/drm/drm_gem_cma_helper.c + :doc: cma helpers + +.. kernel-doc:: include/drm/drm_gem_cma_helper.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/drm_gem_cma_helper.c + :export: + +GEM SHMEM Helper Function Reference +----------------------------------- + +.. kernel-doc:: drivers/gpu/drm/drm_gem_shmem_helper.c + :doc: overview + +.. kernel-doc:: include/drm/drm_gem_shmem_helper.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/drm_gem_shmem_helper.c + :export: + +GEM VRAM Helper Functions Reference +----------------------------------- + +.. kernel-doc:: drivers/gpu/drm/drm_gem_vram_helper.c + :doc: overview + +.. kernel-doc:: include/drm/drm_gem_vram_helper.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/drm_gem_vram_helper.c + :export: + +GEM TTM Helper Functions Reference +----------------------------------- + +.. kernel-doc:: drivers/gpu/drm/drm_gem_ttm_helper.c + :doc: overview + +.. kernel-doc:: drivers/gpu/drm/drm_gem_ttm_helper.c + :export: + +VMA Offset Manager +================== + +.. kernel-doc:: drivers/gpu/drm/drm_vma_manager.c + :doc: vma offset manager + +.. kernel-doc:: include/drm/drm_vma_manager.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/drm_vma_manager.c + :export: + +.. _prime_buffer_sharing: + +PRIME Buffer Sharing +==================== + +PRIME is the cross device buffer sharing framework in drm, originally +created for the OPTIMUS range of multi-gpu platforms. To userspace PRIME +buffers are dma-buf based file descriptors. + +Overview and Lifetime Rules +--------------------------- + +.. kernel-doc:: drivers/gpu/drm/drm_prime.c + :doc: overview and lifetime rules + +PRIME Helper Functions +---------------------- + +.. kernel-doc:: drivers/gpu/drm/drm_prime.c + :doc: PRIME Helpers + +PRIME Function References +------------------------- + +.. kernel-doc:: include/drm/drm_prime.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/drm_prime.c + :export: + +DRM MM Range Allocator +====================== + +Overview +-------- + +.. kernel-doc:: drivers/gpu/drm/drm_mm.c + :doc: Overview + +LRU Scan/Eviction Support +------------------------- + +.. kernel-doc:: drivers/gpu/drm/drm_mm.c + :doc: lru scan roster + +DRM MM Range Allocator Function References +------------------------------------------ + +.. kernel-doc:: include/drm/drm_mm.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/drm_mm.c + :export: + +DRM Cache Handling +================== + +.. kernel-doc:: drivers/gpu/drm/drm_cache.c + :export: + +DRM Sync Objects +=========================== + +.. kernel-doc:: drivers/gpu/drm/drm_syncobj.c + :doc: Overview + +.. kernel-doc:: include/drm/drm_syncobj.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/drm_syncobj.c + :export: + +GPU Scheduler +============= + +Overview +-------- + +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c + :doc: Overview + +Scheduler Function References +----------------------------- + +.. kernel-doc:: include/drm/gpu_scheduler.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c + :export: -- cgit v1.2.3