summaryrefslogtreecommitdiffstats
path: root/Documentation/gpu/amdgpu
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/gpu/amdgpu')
-rw-r--r--Documentation/gpu/amdgpu/debugging.rst80
-rw-r--r--Documentation/gpu/amdgpu/dgpu-asic-info-table.csv2
-rw-r--r--Documentation/gpu/amdgpu/display/dcn-blocks.rst78
-rw-r--r--Documentation/gpu/amdgpu/display/display-contributing.rst168
-rw-r--r--Documentation/gpu/amdgpu/display/display-manager.rst3
-rw-r--r--Documentation/gpu/amdgpu/display/index.rst78
-rw-r--r--Documentation/gpu/amdgpu/index.rst1
7 files changed, 400 insertions, 10 deletions
diff --git a/Documentation/gpu/amdgpu/debugging.rst b/Documentation/gpu/amdgpu/debugging.rst
new file mode 100644
index 0000000000..e75f97d0e4
--- /dev/null
+++ b/Documentation/gpu/amdgpu/debugging.rst
@@ -0,0 +1,80 @@
+===============
+ GPU Debugging
+===============
+
+GPUVM Debugging
+===============
+
+To aid in debugging GPU virtual memory related problems, the driver supports a
+number of options module parameters:
+
+`vm_fault_stop` - If non-0, halt the GPU memory controller on a GPU page fault.
+
+`vm_update_mode` - If non-0, use the CPU to update GPU page tables rather than
+the GPU.
+
+
+Decoding a GPUVM Page Fault
+===========================
+
+If you see a GPU page fault in the kernel log, you can decode it to figure
+out what is going wrong in your application. A page fault in your kernel
+log may look something like this:
+
+::
+
+ [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32777, for process glxinfo pid 2424 thread glxinfo:cs0 pid 2425)
+ in page starting at address 0x0000800102800000 from IH client 0x1b (UTCL2)
+ VM_L2_PROTECTION_FAULT_STATUS:0x00301030
+ Faulty UTCL2 client ID: TCP (0x8)
+ MORE_FAULTS: 0x0
+ WALKER_ERROR: 0x0
+ PERMISSION_FAULTS: 0x3
+ MAPPING_ERROR: 0x0
+ RW: 0x0
+
+First you have the memory hub, gfxhub and mmhub. gfxhub is the memory
+hub used for graphics, compute, and sdma on some chips. mmhub is the
+memory hub used for multi-media and sdma on some chips.
+
+Next you have the vmid and pasid. If the vmid is 0, this fault was likely
+caused by the kernel driver or firmware. If the vmid is non-0, it is generally
+a fault in a user application. The pasid is used to link a vmid to a system
+process id. If the process is active when the fault happens, the process
+information will be printed.
+
+The GPU virtual address that caused the fault comes next.
+
+The client ID indicates the GPU block that caused the fault.
+Some common client IDs:
+
+- CB/DB: The color/depth backend of the graphics pipe
+- CPF: Command Processor Frontend
+- CPC: Command Processor Compute
+- CPG: Command Processor Graphics
+- TCP/SQC/SQG: Shaders
+- SDMA: SDMA engines
+- VCN: Video encode/decode engines
+- JPEG: JPEG engines
+
+PERMISSION_FAULTS describe what faults were encountered:
+
+- bit 0: the PTE was not valid
+- bit 1: the PTE read bit was not set
+- bit 2: the PTE write bit was not set
+- bit 3: the PTE execute bit was not set
+
+Finally, RW, indicates whether the access was a read (0) or a write (1).
+
+In the example above, a shader (cliend id = TCP) generated a read (RW = 0x0) to
+an invalid page (PERMISSION_FAULTS = 0x3) at GPU virtual address
+0x0000800102800000. The user can then inspect their shader code and resource
+descriptor state to determine what caused the GPU page fault.
+
+UMR
+===
+
+`umr <https://gitlab.freedesktop.org/tomstdenis/umr>`_ is a general purpose
+GPU debugging and diagnostics tool. Please see the umr
+`documentation <https://umr.readthedocs.io/en/main/>`_ for more information
+about its capabilities.
diff --git a/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv b/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv
index 882d2518f8..3825f00ca9 100644
--- a/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv
+++ b/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv
@@ -16,6 +16,7 @@ Radeon (RX|TM) (PRO|WX) Vega /MI25 /V320 /V340L /8200 /9100 /SSG MxGPU, VEGA10,
AMD Radeon (Pro) VII /MI50 /MI60, VEGA20, DCE 12, 9.4.0, VCE 4.1.0 / UVD 7.2.0, 4.2.0
MI100, ARCTURUS, *, 9.4.1, VCN 2.5.0, 4.2.2
MI200, ALDEBARAN, *, 9.4.2, VCN 2.6.0, 4.4.0
+MI300, AQUA_VANGARAM, *, 9.4.3, VCN 4.0.3, 4.4.2
AMD Radeon (RX|Pro) 5600(M|XT) /5700 (M|XT|XTB) /W5700, NAVI10, DCN 2.0.0, 10.1.10, VCN 2.0.0, 5.0.0
AMD Radeon (Pro) 5300 /5500XTB/5500(XT|M) /W5500M /W5500, NAVI14, DCN 2.0.0, 10.1.1, VCN 2.0.2, 5.0.2
AMD Radeon RX 6800(XT) /6900(XT) /W6800, SIENNA_CICHLID, DCN 3.0.0, 10.3.0, VCN 3.0.0, 5.2.0
@@ -23,4 +24,5 @@ AMD Radeon RX 6700 XT / 6800M / 6700M, NAVY_FLOUNDER, DCN 3.0.0, 10.3.2, VCN 3.0
AMD Radeon RX 6600(XT) /6600M /W6600 /W6600M, DIMGREY_CAVEFISH, DCN 3.0.2, 10.3.4, VCN 3.0.16, 5.2.4
AMD Radeon RX 6500M /6300M /W6500M /W6300M, BEIGE_GOBY, DCN 3.0.3, 10.3.5, VCN 3.0.33, 5.2.5
AMD Radeon RX 7900 XT /XTX, , DCN 3.2.0, 11.0.0, VCN 4.0.0, 6.0.0
+AMD Radeon RX 7800 XT, , DCN 3.2.0, 11.0.3, VCN 4.0.0, 6.0.3
AMD Radeon RX 7600M (XT) /7700S /7600S, , DCN 3.2.1, 11.0.2, VCN 4.0.4, 6.0.2
diff --git a/Documentation/gpu/amdgpu/display/dcn-blocks.rst b/Documentation/gpu/amdgpu/display/dcn-blocks.rst
new file mode 100644
index 0000000000..a3fbd3ea02
--- /dev/null
+++ b/Documentation/gpu/amdgpu/display/dcn-blocks.rst
@@ -0,0 +1,78 @@
+==========
+DCN Blocks
+==========
+
+In this section, you will find some extra details about some of the DCN blocks
+and the code documentation when it is automatically generated.
+
+DCHUBBUB
+--------
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
+ :doc: overview
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
+ :export:
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
+ :internal:
+
+HUBP
+----
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
+ :doc: overview
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
+ :export:
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
+ :internal:
+
+DPP
+---
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
+ :doc: overview
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
+ :export:
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h
+ :internal:
+
+MPC
+---
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
+ :doc: overview
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
+ :export:
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
+ :internal:
+
+OPP
+---
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/opp.h
+ :doc: overview
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/opp.h
+ :export:
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/opp.h
+ :internal:
+
+DIO
+---
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h
+ :doc: overview
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h
+ :export:
+
+.. kernel-doc:: drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h
+ :internal:
diff --git a/Documentation/gpu/amdgpu/display/display-contributing.rst b/Documentation/gpu/amdgpu/display/display-contributing.rst
new file mode 100644
index 0000000000..36f3077eee
--- /dev/null
+++ b/Documentation/gpu/amdgpu/display/display-contributing.rst
@@ -0,0 +1,168 @@
+.. _display_todos:
+
+==============================
+AMDGPU - Display Contributions
+==============================
+
+First of all, if you are here, you probably want to give some technical
+contribution to the display code, and for that, we say thank you :)
+
+This page summarizes some of the issues you can help with; keep in mind that
+this is a static page, and it is always a good idea to try to reach developers
+in the amdgfx or some of the maintainers. Finally, this page follows the DRM
+way of creating a TODO list; for more information, check
+'Documentation/gpu/todo.rst'.
+
+Gitlab issues
+=============
+
+Users can report issues associated with AMD GPUs at:
+
+- https://gitlab.freedesktop.org/drm/amd
+
+Usually, we try to add a proper label to all new tickets to make it easy to
+filter issues. If you can reproduce any problem, you could help by adding more
+information or fixing the issue.
+
+Level: diverse
+
+IGT
+===
+
+`IGT`_ provides many integration tests that can be run on your GPU. We always
+want to pass a large set of tests to increase the test coverage in our CI. If
+you wish to contribute to the display code but are unsure where a good place
+is, we recommend you run all IGT tests and try to fix any failure you see in
+your hardware. Keep in mind that this failure can be an IGT problem or a kernel
+issue; it is necessary to analyze case-by-case.
+
+Level: diverse
+
+.. _IGT: https://gitlab.freedesktop.org/drm/igt-gpu-tools
+
+Compilation
+===========
+
+Fix compilation warnings
+------------------------
+
+Enable the W1 or W2 warning level in the kernel compilation and try to fix the
+issues on the display side.
+
+Level: Starter
+
+Fix compilation issues when using um architecture
+-------------------------------------------------
+
+Linux has a User-mode Linux (UML) feature, and the kernel can be compiled to
+the **um** architecture. Compiling for **um** can bring multiple advantages
+from the test perspective. We currently have some compilation issues in this
+area that we need to fix.
+
+Level: Intermediate
+
+Code Refactor
+=============
+
+Add prefix to DC functions to improve the debug with ftrace
+-----------------------------------------------------------
+
+The Ftrace debug feature (check 'Documentation/trace/ftrace.rst') is a
+fantastic way to check the code path when developers try to make sense of a
+bug. Ftrace provides a filter mechanism that can be useful when the developer
+has some hunch of which part of the code can cause the issue; for this reason,
+if a set of functions has a proper prefix, it becomes easy to create a good
+filter. Additionally, prefixes can improve stack trace readability.
+
+The DC code does not follow some prefix rules, which makes the Ftrace filter
+more complicated and reduces the readability of the stack trace. If you want
+something simple to start contributing to the display, you can make patches for
+adding prefixes to DC functions. To create those prefixes, use part of the file
+name as a prefix for all functions in the target file. Check the
+'amdgpu_dm_crtc.c` and `amdgpu_dm_plane.c` for some references. However, we
+strongly advise not to send huge patches changing these prefixes; otherwise, it
+will be hard to review and test, which can generate second thoughts from
+maintainers. Try small steps; in case of double, you can ask before you put in
+effort. We recommend first looking at folders like dceXYZ, dcnXYZ, basics,
+bios, core, clk_mgr, hwss, resource, and irq.
+
+Level: Starter
+
+Reduce code duplication
+-----------------------
+
+AMD has an extensive portfolio with various dGPUs and APUs that amdgpu
+supports. To maintain the new hardware release cadence, DCE/DCN was designed in
+a modular design, making the bring-up for new hardware fast. Over the years,
+amdgpu accumulated some technical debt in the code duplication area. For this
+task, it would be a good idea to find a tool that can discover code duplication
+(including patterns) and use it as guidance to reduce duplications.
+
+Level: Intermediate
+
+Make atomic_commit_[check|tail] more readable
+---------------------------------------------
+
+The functions responsible for atomic commit and tail are intricate and
+extensive. In particular `amdgpu_dm_atomic_commit_tail` is a long function and
+could benefit from being split into smaller helpers. Improvements in this area
+are more than welcome, but keep in mind that changes in this area will affect
+all ASICs, meaning that refactoring requires a comprehensive verification; in
+other words, this effort can take some time for validation.
+
+Level: Advanced
+
+Documentation
+=============
+
+Expand kernel-doc
+-----------------
+
+Many DC functions do not have a proper kernel-doc; understanding a function and
+adding documentation is a great way to learn more about the amdgpu driver and
+also leave an outstanding contribution to the entire community.
+
+Level: Starter
+
+Beyond AMDGPU
+=============
+
+AMDGPU provides features that are not yet enabled in the userspace. This
+section highlights some of the coolest display features, which could be enabled
+with the userspace developer helper.
+
+Enable underlay
+---------------
+
+AMD display has this feature called underlay (which you can read more about at
+'Documentation/gpu/amdgpu/display/mpo-overview.rst') which is intended to
+save power when playing a video. The basic idea is to put a video in the
+underlay plane at the bottom and the desktop in the plane above it with a hole
+in the video area. This feature is enabled in ChromeOS, and from our data
+measurement, it can save power.
+
+Level: Unknown
+
+Adaptive Backlight Modulation (ABM)
+-----------------------------------
+
+ABM is a feature that adjusts the display panel's backlight level and pixel
+values depending on the displayed image. This power-saving feature can be very
+useful when the system starts to run off battery; since this will impact the
+display output fidelity, it would be good if this option was something that
+users could turn on or off.
+
+Level: Unknown
+
+
+HDR & Color management & VRR
+----------------------------
+
+HDR, Color Management, and VRR are huge topics and it's hard to put these into
+concise ToDos. If you are interested in this topic, we recommend checking some
+blog posts from the community developers to better understand some of the
+specific challenges and people working on the subject. If anyone wants to work
+on some particular part, we can try to help with some basic guidance. Finally,
+keep in mind that we already have some kernel-doc in place for those areas.
+
+Level: Unknown
diff --git a/Documentation/gpu/amdgpu/display/display-manager.rst b/Documentation/gpu/amdgpu/display/display-manager.rst
index be2651ecdd..67a811e689 100644
--- a/Documentation/gpu/amdgpu/display/display-manager.rst
+++ b/Documentation/gpu/amdgpu/display/display-manager.rst
@@ -132,9 +132,6 @@ The DRM blend mode and its elements are then mapped by AMDGPU display manager
(MPC), as follows:
.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
- :doc: mpc-overview
-
-.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
:functions: mpcc_blnd_cfg
Therefore, the blending configuration for a single MPCC instance on the MPC
diff --git a/Documentation/gpu/amdgpu/display/index.rst b/Documentation/gpu/amdgpu/display/index.rst
index f8a4f53d70..f0c342e00a 100644
--- a/Documentation/gpu/amdgpu/display/index.rst
+++ b/Documentation/gpu/amdgpu/display/index.rst
@@ -7,18 +7,80 @@ drm/amd/display - Display Core (DC)
AMD display engine is partially shared with other operating systems; for this
reason, our Display Core Driver is divided into two pieces:
-1. **Display Core (DC)** contains the OS-agnostic components. Things like
+#. **Display Core (DC)** contains the OS-agnostic components. Things like
hardware programming and resource management are handled here.
-2. **Display Manager (DM)** contains the OS-dependent components. Hooks to the
- amdgpu base driver and DRM are implemented here.
+#. **Display Manager (DM)** contains the OS-dependent components. Hooks to the
+ amdgpu base driver and DRM are implemented here. For example, you can check
+ display/amdgpu_dm/ folder.
+
+------------------
+DC Code validation
+------------------
+
+Maintaining the same code base across multiple OSes requires a lot of
+synchronization effort between repositories and exhaustive validation. In the
+DC case, we maintain a tree to centralize code from different parts. The shared
+repository has integration tests with our Internal Linux CI farm, and we run a
+comprehensive set of IGT tests in various AMD GPUs/APUs (mostly recent dGPUs
+and APUs). Our CI also checks ARM64/32, PPC64/32, and x86_64/32 compilation
+with DCN enabled and disabled.
+
+When we upstream a new feature or some patches, we pack them in a patchset with
+the prefix **DC Patches for <DATE>**, which is created based on the latest
+`amd-staging-drm-next <https://gitlab.freedesktop.org/agd5f/linux>`_. All of
+those patches are under a DC version tested as follows:
+
+* Ensure that every patch compiles and the entire series pass our set of IGT
+ test in different hardware.
+* Prepare a branch with those patches for our validation team. If there is an
+ error, a developer will debug as fast as possible; usually, a simple bisect
+ in the series is enough to point to a bad change, and two possible actions
+ emerge: fix the issue or drop the patch. If it is not an easy fix, the bad
+ patch is dropped.
+* Finally, developers wait a few days for community feedback before we merge
+ the series.
+
+It is good to stress that the test phase is something that we take extremely
+seriously, and we never merge anything that fails our validation. Follows an
+overview of our test set:
+
+#. Manual test
+ * Multiple Hotplugs with DP and HDMI.
+ * Stress test with multiple display configuration changes via the user interface.
+ * Validate VRR behaviour.
+ * Check PSR.
+ * Validate MPO when playing video.
+ * Test more than two displays connected at the same time.
+ * Check suspend/resume.
+ * Validate FPO.
+ * Check MST.
+#. Automated test
+ * IGT tests in a farm with GPUs and APUs that support DCN and DCE.
+ * Compilation validation with the latest GCC and Clang from LTS distro.
+ * Cross-compilation for PowerPC 64/32, ARM 64/32, and x86 32.
+
+In terms of test setup for CI and manual tests, we usually use:
+
+#. The latest Ubuntu LTS.
+#. In terms of userspace, we only use fully updated open-source components
+ provided by the distribution official package manager.
+#. Regarding IGT, we use the latest code from the upstream.
+#. Most of the manual tests are conducted in the GNome but we also use KDE.
+
+Notice that someone from our test team will always reply to the cover letter
+with the test report.
+
+--------------
+DC Information
+--------------
The display pipe is responsible for "scanning out" a rendered frame from the
GPU memory (also called VRAM, FrameBuffer, etc.) to a display. In other words,
it would:
-1. Read frame information from memory;
-2. Perform required transformation;
-3. Send pixel data to sink devices.
+#. Read frame information from memory;
+#. Perform required transformation;
+#. Send pixel data to sink devices.
If you want to learn more about our driver details, take a look at the below
table of content:
@@ -26,7 +88,9 @@ table of content:
.. toctree::
display-manager.rst
- dc-debug.rst
dcn-overview.rst
+ dcn-blocks.rst
mpo-overview.rst
+ dc-debug.rst
+ display-contributing.rst
dc-glossary.rst
diff --git a/Documentation/gpu/amdgpu/index.rst b/Documentation/gpu/amdgpu/index.rst
index 912e699fd3..847e049240 100644
--- a/Documentation/gpu/amdgpu/index.rst
+++ b/Documentation/gpu/amdgpu/index.rst
@@ -15,4 +15,5 @@ Next (GCN), Radeon DNA (RDNA), and Compute DNA (CDNA) architectures.
ras
thermal
driver-misc
+ debugging
amdgpu-glossary