summaryrefslogtreecommitdiffstats
path: root/Documentation/driver-api/edac.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/driver-api/edac.rst')
-rw-r--r--Documentation/driver-api/edac.rst298
1 files changed, 298 insertions, 0 deletions
diff --git a/Documentation/driver-api/edac.rst b/Documentation/driver-api/edac.rst
new file mode 100644
index 000000000..f4f044b95
--- /dev/null
+++ b/Documentation/driver-api/edac.rst
@@ -0,0 +1,298 @@
+Error Detection And Correction (EDAC) Devices
+=============================================
+
+Main Concepts used at the EDAC subsystem
+----------------------------------------
+
+There are several things to be aware of that aren't at all obvious, like
+*sockets, *socket sets*, *banks*, *rows*, *chip-select rows*, *channels*,
+etc...
+
+These are some of the many terms that are thrown about that don't always
+mean what people think they mean (Inconceivable!). In the interest of
+creating a common ground for discussion, terms and their definitions
+will be established.
+
+* Memory devices
+
+The individual DRAM chips on a memory stick. These devices commonly
+output 4 and 8 bits each (x4, x8). Grouping several of these in parallel
+provides the number of bits that the memory controller expects:
+typically 72 bits, in order to provide 64 bits + 8 bits of ECC data.
+
+* Memory Stick
+
+A printed circuit board that aggregates multiple memory devices in
+parallel. In general, this is the Field Replaceable Unit (FRU) which
+gets replaced, in the case of excessive errors. Most often it is also
+called DIMM (Dual Inline Memory Module).
+
+* Memory Socket
+
+A physical connector on the motherboard that accepts a single memory
+stick. Also called as "slot" on several datasheets.
+
+* Channel
+
+A memory controller channel, responsible to communicate with a group of
+DIMMs. Each channel has its own independent control (command) and data
+bus, and can be used independently or grouped with other channels.
+
+* Branch
+
+It is typically the highest hierarchy on a Fully-Buffered DIMM memory
+controller. Typically, it contains two channels. Two channels at the
+same branch can be used in single mode or in lockstep mode. When
+lockstep is enabled, the cacheline is doubled, but it generally brings
+some performance penalty. Also, it is generally not possible to point to
+just one memory stick when an error occurs, as the error correction code
+is calculated using two DIMMs instead of one. Due to that, it is capable
+of correcting more errors than on single mode.
+
+* Single-channel
+
+The data accessed by the memory controller is contained into one dimm
+only. E. g. if the data is 64 bits-wide, the data flows to the CPU using
+one 64 bits parallel access. Typically used with SDR, DDR, DDR2 and DDR3
+memories. FB-DIMM and RAMBUS use a different concept for channel, so
+this concept doesn't apply there.
+
+* Double-channel
+
+The data size accessed by the memory controller is interlaced into two
+dimms, accessed at the same time. E. g. if the DIMM is 64 bits-wide (72
+bits with ECC), the data flows to the CPU using a 128 bits parallel
+access.
+
+* Chip-select row
+
+This is the name of the DRAM signal used to select the DRAM ranks to be
+accessed. Common chip-select rows for single channel are 64 bits, for
+dual channel 128 bits. It may not be visible by the memory controller,
+as some DIMM types have a memory buffer that can hide direct access to
+it from the Memory Controller.
+
+* Single-Ranked stick
+
+A Single-ranked stick has 1 chip-select row of memory. Motherboards
+commonly drive two chip-select pins to a memory stick. A single-ranked
+stick, will occupy only one of those rows. The other will be unused.
+
+.. _doubleranked:
+
+* Double-Ranked stick
+
+A double-ranked stick has two chip-select rows which access different
+sets of memory devices. The two rows cannot be accessed concurrently.
+
+* Double-sided stick
+
+**DEPRECATED TERM**, see :ref:`Double-Ranked stick <doubleranked>`.
+
+A double-sided stick has two chip-select rows which access different sets
+of memory devices. The two rows cannot be accessed concurrently.
+"Double-sided" is irrespective of the memory devices being mounted on
+both sides of the memory stick.
+
+* Socket set
+
+All of the memory sticks that are required for a single memory access or
+all of the memory sticks spanned by a chip-select row. A single socket
+set has two chip-select rows and if double-sided sticks are used these
+will occupy those chip-select rows.
+
+* Bank
+
+This term is avoided because it is unclear when needing to distinguish
+between chip-select rows and socket sets.
+
+* High Bandwidth Memory (HBM)
+
+HBM is a new memory type with low power consumption and ultra-wide
+communication lanes. It uses vertically stacked memory chips (DRAM dies)
+interconnected by microscopic wires called "through-silicon vias," or
+TSVs.
+
+Several stacks of HBM chips connect to the CPU or GPU through an ultra-fast
+interconnect called the "interposer". Therefore, HBM's characteristics
+are nearly indistinguishable from on-chip integrated RAM.
+
+Memory Controllers
+------------------
+
+Most of the EDAC core is focused on doing Memory Controller error detection.
+The :c:func:`edac_mc_alloc`. It uses internally the struct ``mem_ctl_info``
+to describe the memory controllers, with is an opaque struct for the EDAC
+drivers. Only the EDAC core is allowed to touch it.
+
+.. kernel-doc:: include/linux/edac.h
+
+.. kernel-doc:: drivers/edac/edac_mc.h
+
+PCI Controllers
+---------------
+
+The EDAC subsystem provides a mechanism to handle PCI controllers by calling
+the :c:func:`edac_pci_alloc_ctl_info`. It will use the struct
+:c:type:`edac_pci_ctl_info` to describe the PCI controllers.
+
+.. kernel-doc:: drivers/edac/edac_pci.h
+
+EDAC Blocks
+-----------
+
+The EDAC subsystem also provides a generic mechanism to report errors on
+other parts of the hardware via :c:func:`edac_device_alloc_ctl_info` function.
+
+The structures :c:type:`edac_dev_sysfs_block_attribute`,
+:c:type:`edac_device_block`, :c:type:`edac_device_instance` and
+:c:type:`edac_device_ctl_info` provide a generic or abstract 'edac_device'
+representation at sysfs.
+
+This set of structures and the code that implements the APIs for the same, provide for registering EDAC type devices which are NOT standard memory or
+PCI, like:
+
+- CPU caches (L1 and L2)
+- DMA engines
+- Core CPU switches
+- Fabric switch units
+- PCIe interface controllers
+- other EDAC/ECC type devices that can be monitored for
+ errors, etc.
+
+It allows for a 2 level set of hierarchy.
+
+For example, a cache could be composed of L1, L2 and L3 levels of cache.
+Each CPU core would have its own L1 cache, while sharing L2 and maybe L3
+caches. On such case, those can be represented via the following sysfs
+nodes::
+
+ /sys/devices/system/edac/..
+
+ pci/ <existing pci directory (if available)>
+ mc/ <existing memory device directory>
+ cpu/cpu0/.. <L1 and L2 block directory>
+ /L1-cache/ce_count
+ /ue_count
+ /L2-cache/ce_count
+ /ue_count
+ cpu/cpu1/.. <L1 and L2 block directory>
+ /L1-cache/ce_count
+ /ue_count
+ /L2-cache/ce_count
+ /ue_count
+ ...
+
+ the L1 and L2 directories would be "edac_device_block's"
+
+.. kernel-doc:: drivers/edac/edac_device.h
+
+
+Heterogeneous system support
+----------------------------
+
+An AMD heterogeneous system is built by connecting the data fabrics of
+both CPUs and GPUs via custom xGMI links. Thus, the data fabric on the
+GPU nodes can be accessed the same way as the data fabric on CPU nodes.
+
+The MI200 accelerators are data center GPUs. They have 2 data fabrics,
+and each GPU data fabric contains four Unified Memory Controllers (UMC).
+Each UMC contains eight channels. Each UMC channel controls one 128-bit
+HBM2e (2GB) channel (equivalent to 8 X 2GB ranks). This creates a total
+of 4096-bits of DRAM data bus.
+
+While the UMC is interfacing a 16GB (8high X 2GB DRAM) HBM stack, each UMC
+channel is interfacing 2GB of DRAM (represented as rank).
+
+Memory controllers on AMD GPU nodes can be represented in EDAC thusly:
+
+ GPU DF / GPU Node -> EDAC MC
+ GPU UMC -> EDAC CSROW
+ GPU UMC channel -> EDAC CHANNEL
+
+For example: a heterogeneous system with 1 AMD CPU is connected to
+4 MI200 (Aldebaran) GPUs using xGMI.
+
+Some more heterogeneous hardware details:
+
+- The CPU UMC (Unified Memory Controller) is mostly the same as the GPU UMC.
+ They have chip selects (csrows) and channels. However, the layouts are different
+ for performance, physical layout, or other reasons.
+- CPU UMCs use 1 channel, In this case UMC = EDAC channel. This follows the
+ marketing speak. CPU has X memory channels, etc.
+- CPU UMCs use up to 4 chip selects, So UMC chip select = EDAC CSROW.
+- GPU UMCs use 1 chip select, So UMC = EDAC CSROW.
+- GPU UMCs use 8 channels, So UMC channel = EDAC channel.
+
+The EDAC subsystem provides a mechanism to handle AMD heterogeneous
+systems by calling system specific ops for both CPUs and GPUs.
+
+AMD GPU nodes are enumerated in sequential order based on the PCI
+hierarchy, and the first GPU node is assumed to have a Node ID value
+following those of the CPU nodes after latter are fully populated::
+
+ $ ls /sys/devices/system/edac/mc/
+ mc0 - CPU MC node 0
+ mc1 |
+ mc2 |- GPU card[0] => node 0(mc1), node 1(mc2)
+ mc3 |
+ mc4 |- GPU card[1] => node 0(mc3), node 1(mc4)
+ mc5 |
+ mc6 |- GPU card[2] => node 0(mc5), node 1(mc6)
+ mc7 |
+ mc8 |- GPU card[3] => node 0(mc7), node 1(mc8)
+
+For example, a heterogeneous system with one AMD CPU is connected to
+four MI200 (Aldebaran) GPUs using xGMI. This topology can be represented
+via the following sysfs entries::
+
+ /sys/devices/system/edac/mc/..
+
+ CPU # CPU node
+ ├── mc 0
+
+ GPU Nodes are enumerated sequentially after CPU nodes have been populated
+ GPU card 1 # Each MI200 GPU has 2 nodes/mcs
+ ├── mc 1 # GPU node 0 == mc1, Each MC node has 4 UMCs/CSROWs
+ │   ├── csrow 0 # UMC 0
+ │   │   ├── channel 0 # Each UMC has 8 channels
+ │   │   ├── channel 1 # size of each channel is 2 GB, so each UMC has 16 GB
+ │   │   ├── channel 2
+ │   │   ├── channel 3
+ │   │   ├── channel 4
+ │   │   ├── channel 5
+ │   │   ├── channel 6
+ │   │   ├── channel 7
+ │   ├── csrow 1 # UMC 1
+ │   │   ├── channel 0
+ │   │   ├── ..
+ │   │   ├── channel 7
+ │   ├── .. ..
+ │   ├── csrow 3 # UMC 3
+ │   │   ├── channel 0
+ │   │   ├── ..
+ │   │   ├── channel 7
+ │   ├── rank 0
+ │   ├── .. ..
+ │   ├── rank 31 # total 32 ranks/dimms from 4 UMCs
+ ├
+ ├── mc 2 # GPU node 1 == mc2
+ │   ├── .. # each GPU has total 64 GB
+
+ GPU card 2
+ ├── mc 3
+ │   ├── ..
+ ├── mc 4
+ │   ├── ..
+
+ GPU card 3
+ ├── mc 5
+ │   ├── ..
+ ├── mc 6
+ │   ├── ..
+
+ GPU card 4
+ ├── mc 7
+ │   ├── ..
+ ├── mc 8
+ │   ├── ..