diff options
Diffstat (limited to 'Documentation/gpu/amdgpu')
-rw-r--r-- | Documentation/gpu/amdgpu/debugging.rst | 80 | ||||
-rw-r--r-- | Documentation/gpu/amdgpu/dgpu-asic-info-table.csv | 2 | ||||
-rw-r--r-- | Documentation/gpu/amdgpu/display/dcn-blocks.rst | 78 | ||||
-rw-r--r-- | Documentation/gpu/amdgpu/display/display-contributing.rst | 168 | ||||
-rw-r--r-- | Documentation/gpu/amdgpu/display/display-manager.rst | 3 | ||||
-rw-r--r-- | Documentation/gpu/amdgpu/display/index.rst | 78 | ||||
-rw-r--r-- | Documentation/gpu/amdgpu/index.rst | 1 |
7 files changed, 400 insertions, 10 deletions
diff --git a/Documentation/gpu/amdgpu/debugging.rst b/Documentation/gpu/amdgpu/debugging.rst new file mode 100644 index 0000000000..e75f97d0e4 --- /dev/null +++ b/Documentation/gpu/amdgpu/debugging.rst @@ -0,0 +1,80 @@ +=============== + GPU Debugging +=============== + +GPUVM Debugging +=============== + +To aid in debugging GPU virtual memory related problems, the driver supports a +number of options module parameters: + +`vm_fault_stop` - If non-0, halt the GPU memory controller on a GPU page fault. + +`vm_update_mode` - If non-0, use the CPU to update GPU page tables rather than +the GPU. + + +Decoding a GPUVM Page Fault +=========================== + +If you see a GPU page fault in the kernel log, you can decode it to figure +out what is going wrong in your application. A page fault in your kernel +log may look something like this: + +:: + + [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32777, for process glxinfo pid 2424 thread glxinfo:cs0 pid 2425) + in page starting at address 0x0000800102800000 from IH client 0x1b (UTCL2) + VM_L2_PROTECTION_FAULT_STATUS:0x00301030 + Faulty UTCL2 client ID: TCP (0x8) + MORE_FAULTS: 0x0 + WALKER_ERROR: 0x0 + PERMISSION_FAULTS: 0x3 + MAPPING_ERROR: 0x0 + RW: 0x0 + +First you have the memory hub, gfxhub and mmhub. gfxhub is the memory +hub used for graphics, compute, and sdma on some chips. mmhub is the +memory hub used for multi-media and sdma on some chips. + +Next you have the vmid and pasid. If the vmid is 0, this fault was likely +caused by the kernel driver or firmware. If the vmid is non-0, it is generally +a fault in a user application. The pasid is used to link a vmid to a system +process id. If the process is active when the fault happens, the process +information will be printed. + +The GPU virtual address that caused the fault comes next. + +The client ID indicates the GPU block that caused the fault. +Some common client IDs: + +- CB/DB: The color/depth backend of the graphics pipe +- CPF: Command Processor Frontend +- CPC: Command Processor Compute +- CPG: Command Processor Graphics +- TCP/SQC/SQG: Shaders +- SDMA: SDMA engines +- VCN: Video encode/decode engines +- JPEG: JPEG engines + +PERMISSION_FAULTS describe what faults were encountered: + +- bit 0: the PTE was not valid +- bit 1: the PTE read bit was not set +- bit 2: the PTE write bit was not set +- bit 3: the PTE execute bit was not set + +Finally, RW, indicates whether the access was a read (0) or a write (1). + +In the example above, a shader (cliend id = TCP) generated a read (RW = 0x0) to +an invalid page (PERMISSION_FAULTS = 0x3) at GPU virtual address +0x0000800102800000. The user can then inspect their shader code and resource +descriptor state to determine what caused the GPU page fault. + +UMR +=== + +`umr <https://gitlab.freedesktop.org/tomstdenis/umr>`_ is a general purpose +GPU debugging and diagnostics tool. Please see the umr +`documentation <https://umr.readthedocs.io/en/main/>`_ for more information +about its capabilities. diff --git a/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv b/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv index 882d2518f8..3825f00ca9 100644 --- a/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv +++ b/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv @@ -16,6 +16,7 @@ Radeon (RX|TM) (PRO|WX) Vega /MI25 /V320 /V340L /8200 /9100 /SSG MxGPU, VEGA10, AMD Radeon (Pro) VII /MI50 /MI60, VEGA20, DCE 12, 9.4.0, VCE 4.1.0 / UVD 7.2.0, 4.2.0 MI100, ARCTURUS, *, 9.4.1, VCN 2.5.0, 4.2.2 MI200, ALDEBARAN, *, 9.4.2, VCN 2.6.0, 4.4.0 +MI300, AQUA_VANGARAM, *, 9.4.3, VCN 4.0.3, 4.4.2 AMD Radeon (RX|Pro) 5600(M|XT) /5700 (M|XT|XTB) /W5700, NAVI10, DCN 2.0.0, 10.1.10, VCN 2.0.0, 5.0.0 AMD Radeon (Pro) 5300 /5500XTB/5500(XT|M) /W5500M /W5500, NAVI14, DCN 2.0.0, 10.1.1, VCN 2.0.2, 5.0.2 AMD Radeon RX 6800(XT) /6900(XT) /W6800, SIENNA_CICHLID, DCN 3.0.0, 10.3.0, VCN 3.0.0, 5.2.0 @@ -23,4 +24,5 @@ AMD Radeon RX 6700 XT / 6800M / 6700M, NAVY_FLOUNDER, DCN 3.0.0, 10.3.2, VCN 3.0 AMD Radeon RX 6600(XT) /6600M /W6600 /W6600M, DIMGREY_CAVEFISH, DCN 3.0.2, 10.3.4, VCN 3.0.16, 5.2.4 AMD Radeon RX 6500M /6300M /W6500M /W6300M, BEIGE_GOBY, DCN 3.0.3, 10.3.5, VCN 3.0.33, 5.2.5 AMD Radeon RX 7900 XT /XTX, , DCN 3.2.0, 11.0.0, VCN 4.0.0, 6.0.0 +AMD Radeon RX 7800 XT, , DCN 3.2.0, 11.0.3, VCN 4.0.0, 6.0.3 AMD Radeon RX 7600M (XT) /7700S /7600S, , DCN 3.2.1, 11.0.2, VCN 4.0.4, 6.0.2 diff --git a/Documentation/gpu/amdgpu/display/dcn-blocks.rst b/Documentation/gpu/amdgpu/display/dcn-blocks.rst new file mode 100644 index 0000000000..a3fbd3ea02 --- /dev/null +++ b/Documentation/gpu/amdgpu/display/dcn-blocks.rst @@ -0,0 +1,78 @@ +========== +DCN Blocks +========== + +In this section, you will find some extra details about some of the DCN blocks +and the code documentation when it is automatically generated. + +DCHUBBUB +-------- + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :doc: overview + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :export: + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :internal: + +HUBP +---- + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :doc: overview + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :export: + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :internal: + +DPP +--- + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :doc: overview + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :export: + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :internal: + +MPC +--- + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h + :doc: overview + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h + :export: + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h + :internal: + +OPP +--- + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/opp.h + :doc: overview + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/opp.h + :export: + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/opp.h + :internal: + +DIO +--- + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h + :doc: overview + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h + :export: + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h + :internal: diff --git a/Documentation/gpu/amdgpu/display/display-contributing.rst b/Documentation/gpu/amdgpu/display/display-contributing.rst new file mode 100644 index 0000000000..36f3077eee --- /dev/null +++ b/Documentation/gpu/amdgpu/display/display-contributing.rst @@ -0,0 +1,168 @@ +.. _display_todos: + +============================== +AMDGPU - Display Contributions +============================== + +First of all, if you are here, you probably want to give some technical +contribution to the display code, and for that, we say thank you :) + +This page summarizes some of the issues you can help with; keep in mind that +this is a static page, and it is always a good idea to try to reach developers +in the amdgfx or some of the maintainers. Finally, this page follows the DRM +way of creating a TODO list; for more information, check +'Documentation/gpu/todo.rst'. + +Gitlab issues +============= + +Users can report issues associated with AMD GPUs at: + +- https://gitlab.freedesktop.org/drm/amd + +Usually, we try to add a proper label to all new tickets to make it easy to +filter issues. If you can reproduce any problem, you could help by adding more +information or fixing the issue. + +Level: diverse + +IGT +=== + +`IGT`_ provides many integration tests that can be run on your GPU. We always +want to pass a large set of tests to increase the test coverage in our CI. If +you wish to contribute to the display code but are unsure where a good place +is, we recommend you run all IGT tests and try to fix any failure you see in +your hardware. Keep in mind that this failure can be an IGT problem or a kernel +issue; it is necessary to analyze case-by-case. + +Level: diverse + +.. _IGT: https://gitlab.freedesktop.org/drm/igt-gpu-tools + +Compilation +=========== + +Fix compilation warnings +------------------------ + +Enable the W1 or W2 warning level in the kernel compilation and try to fix the +issues on the display side. + +Level: Starter + +Fix compilation issues when using um architecture +------------------------------------------------- + +Linux has a User-mode Linux (UML) feature, and the kernel can be compiled to +the **um** architecture. Compiling for **um** can bring multiple advantages +from the test perspective. We currently have some compilation issues in this +area that we need to fix. + +Level: Intermediate + +Code Refactor +============= + +Add prefix to DC functions to improve the debug with ftrace +----------------------------------------------------------- + +The Ftrace debug feature (check 'Documentation/trace/ftrace.rst') is a +fantastic way to check the code path when developers try to make sense of a +bug. Ftrace provides a filter mechanism that can be useful when the developer +has some hunch of which part of the code can cause the issue; for this reason, +if a set of functions has a proper prefix, it becomes easy to create a good +filter. Additionally, prefixes can improve stack trace readability. + +The DC code does not follow some prefix rules, which makes the Ftrace filter +more complicated and reduces the readability of the stack trace. If you want +something simple to start contributing to the display, you can make patches for +adding prefixes to DC functions. To create those prefixes, use part of the file +name as a prefix for all functions in the target file. Check the +'amdgpu_dm_crtc.c` and `amdgpu_dm_plane.c` for some references. However, we +strongly advise not to send huge patches changing these prefixes; otherwise, it +will be hard to review and test, which can generate second thoughts from +maintainers. Try small steps; in case of double, you can ask before you put in +effort. We recommend first looking at folders like dceXYZ, dcnXYZ, basics, +bios, core, clk_mgr, hwss, resource, and irq. + +Level: Starter + +Reduce code duplication +----------------------- + +AMD has an extensive portfolio with various dGPUs and APUs that amdgpu +supports. To maintain the new hardware release cadence, DCE/DCN was designed in +a modular design, making the bring-up for new hardware fast. Over the years, +amdgpu accumulated some technical debt in the code duplication area. For this +task, it would be a good idea to find a tool that can discover code duplication +(including patterns) and use it as guidance to reduce duplications. + +Level: Intermediate + +Make atomic_commit_[check|tail] more readable +--------------------------------------------- + +The functions responsible for atomic commit and tail are intricate and +extensive. In particular `amdgpu_dm_atomic_commit_tail` is a long function and +could benefit from being split into smaller helpers. Improvements in this area +are more than welcome, but keep in mind that changes in this area will affect +all ASICs, meaning that refactoring requires a comprehensive verification; in +other words, this effort can take some time for validation. + +Level: Advanced + +Documentation +============= + +Expand kernel-doc +----------------- + +Many DC functions do not have a proper kernel-doc; understanding a function and +adding documentation is a great way to learn more about the amdgpu driver and +also leave an outstanding contribution to the entire community. + +Level: Starter + +Beyond AMDGPU +============= + +AMDGPU provides features that are not yet enabled in the userspace. This +section highlights some of the coolest display features, which could be enabled +with the userspace developer helper. + +Enable underlay +--------------- + +AMD display has this feature called underlay (which you can read more about at +'Documentation/gpu/amdgpu/display/mpo-overview.rst') which is intended to +save power when playing a video. The basic idea is to put a video in the +underlay plane at the bottom and the desktop in the plane above it with a hole +in the video area. This feature is enabled in ChromeOS, and from our data +measurement, it can save power. + +Level: Unknown + +Adaptive Backlight Modulation (ABM) +----------------------------------- + +ABM is a feature that adjusts the display panel's backlight level and pixel +values depending on the displayed image. This power-saving feature can be very +useful when the system starts to run off battery; since this will impact the +display output fidelity, it would be good if this option was something that +users could turn on or off. + +Level: Unknown + + +HDR & Color management & VRR +---------------------------- + +HDR, Color Management, and VRR are huge topics and it's hard to put these into +concise ToDos. If you are interested in this topic, we recommend checking some +blog posts from the community developers to better understand some of the +specific challenges and people working on the subject. If anyone wants to work +on some particular part, we can try to help with some basic guidance. Finally, +keep in mind that we already have some kernel-doc in place for those areas. + +Level: Unknown diff --git a/Documentation/gpu/amdgpu/display/display-manager.rst b/Documentation/gpu/amdgpu/display/display-manager.rst index be2651ecdd..67a811e689 100644 --- a/Documentation/gpu/amdgpu/display/display-manager.rst +++ b/Documentation/gpu/amdgpu/display/display-manager.rst @@ -132,9 +132,6 @@ The DRM blend mode and its elements are then mapped by AMDGPU display manager (MPC), as follows: .. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h - :doc: mpc-overview - -.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h :functions: mpcc_blnd_cfg Therefore, the blending configuration for a single MPCC instance on the MPC diff --git a/Documentation/gpu/amdgpu/display/index.rst b/Documentation/gpu/amdgpu/display/index.rst index f8a4f53d70..f0c342e00a 100644 --- a/Documentation/gpu/amdgpu/display/index.rst +++ b/Documentation/gpu/amdgpu/display/index.rst @@ -7,18 +7,80 @@ drm/amd/display - Display Core (DC) AMD display engine is partially shared with other operating systems; for this reason, our Display Core Driver is divided into two pieces: -1. **Display Core (DC)** contains the OS-agnostic components. Things like +#. **Display Core (DC)** contains the OS-agnostic components. Things like hardware programming and resource management are handled here. -2. **Display Manager (DM)** contains the OS-dependent components. Hooks to the - amdgpu base driver and DRM are implemented here. +#. **Display Manager (DM)** contains the OS-dependent components. Hooks to the + amdgpu base driver and DRM are implemented here. For example, you can check + display/amdgpu_dm/ folder. + +------------------ +DC Code validation +------------------ + +Maintaining the same code base across multiple OSes requires a lot of +synchronization effort between repositories and exhaustive validation. In the +DC case, we maintain a tree to centralize code from different parts. The shared +repository has integration tests with our Internal Linux CI farm, and we run a +comprehensive set of IGT tests in various AMD GPUs/APUs (mostly recent dGPUs +and APUs). Our CI also checks ARM64/32, PPC64/32, and x86_64/32 compilation +with DCN enabled and disabled. + +When we upstream a new feature or some patches, we pack them in a patchset with +the prefix **DC Patches for <DATE>**, which is created based on the latest +`amd-staging-drm-next <https://gitlab.freedesktop.org/agd5f/linux>`_. All of +those patches are under a DC version tested as follows: + +* Ensure that every patch compiles and the entire series pass our set of IGT + test in different hardware. +* Prepare a branch with those patches for our validation team. If there is an + error, a developer will debug as fast as possible; usually, a simple bisect + in the series is enough to point to a bad change, and two possible actions + emerge: fix the issue or drop the patch. If it is not an easy fix, the bad + patch is dropped. +* Finally, developers wait a few days for community feedback before we merge + the series. + +It is good to stress that the test phase is something that we take extremely +seriously, and we never merge anything that fails our validation. Follows an +overview of our test set: + +#. Manual test + * Multiple Hotplugs with DP and HDMI. + * Stress test with multiple display configuration changes via the user interface. + * Validate VRR behaviour. + * Check PSR. + * Validate MPO when playing video. + * Test more than two displays connected at the same time. + * Check suspend/resume. + * Validate FPO. + * Check MST. +#. Automated test + * IGT tests in a farm with GPUs and APUs that support DCN and DCE. + * Compilation validation with the latest GCC and Clang from LTS distro. + * Cross-compilation for PowerPC 64/32, ARM 64/32, and x86 32. + +In terms of test setup for CI and manual tests, we usually use: + +#. The latest Ubuntu LTS. +#. In terms of userspace, we only use fully updated open-source components + provided by the distribution official package manager. +#. Regarding IGT, we use the latest code from the upstream. +#. Most of the manual tests are conducted in the GNome but we also use KDE. + +Notice that someone from our test team will always reply to the cover letter +with the test report. + +-------------- +DC Information +-------------- The display pipe is responsible for "scanning out" a rendered frame from the GPU memory (also called VRAM, FrameBuffer, etc.) to a display. In other words, it would: -1. Read frame information from memory; -2. Perform required transformation; -3. Send pixel data to sink devices. +#. Read frame information from memory; +#. Perform required transformation; +#. Send pixel data to sink devices. If you want to learn more about our driver details, take a look at the below table of content: @@ -26,7 +88,9 @@ table of content: .. toctree:: display-manager.rst - dc-debug.rst dcn-overview.rst + dcn-blocks.rst mpo-overview.rst + dc-debug.rst + display-contributing.rst dc-glossary.rst diff --git a/Documentation/gpu/amdgpu/index.rst b/Documentation/gpu/amdgpu/index.rst index 912e699fd3..847e049240 100644 --- a/Documentation/gpu/amdgpu/index.rst +++ b/Documentation/gpu/amdgpu/index.rst @@ -15,4 +15,5 @@ Next (GCN), Radeon DNA (RDNA), and Compute DNA (CDNA) architectures. ras thermal driver-misc + debugging amdgpu-glossary |