Error Detection And Correction (EDAC) Devices
=============================================

Main Concepts Used in the EDAC Subsystem
----------------------------------------

There are several things to be aware of that aren't at all obvious, like
*sockets*, *socket sets*, *banks*, *rows*, *chip-select rows*, *channels*,
etc...

These are some of the many terms that are thrown about that don't always
mean what people think they mean (Inconceivable!). In the interest of
creating a common ground for discussion, terms and their definitions
will be established.

* Memory devices

The individual DRAM chips on a memory stick. These devices commonly
output 4 or 8 bits each (x4, x8). Grouping several of these in parallel
provides the number of bits that the memory controller expects:
typically 72 bits, in order to provide 64 bits of data plus 8 bits of
ECC data.

* Memory Stick

A printed circuit board that aggregates multiple memory devices in
parallel. In general, this is the Field Replaceable Unit (FRU) which
gets replaced in the case of excessive errors. It is most often called
a DIMM (Dual Inline Memory Module).

* Memory Socket

A physical connector on the motherboard that accepts a single memory
stick. Also called a "slot" in several datasheets.

* Channel

A memory controller channel, responsible for communicating with a group
of DIMMs. Each channel has its own independent control (command) and
data bus, and can be used independently or grouped with other channels.

* Branch

Typically the highest level of the hierarchy on a Fully-Buffered DIMM
memory controller. It usually contains two channels. Two channels on
the same branch can be used in single mode or in lockstep mode. When
lockstep is enabled, the cache line width is doubled, but it generally
brings some performance penalty. Also, it is generally not possible to
point to just one memory stick when an error occurs, as the error
correction code is calculated using two DIMMs instead of one. On the
other hand, lockstep mode is capable of correcting more errors than
single mode.

* Single-channel

The data accessed by the memory controller is contained in one DIMM
only. E.g. if the data is 64 bits wide, the data flows to the CPU using
one 64-bit parallel access. Typically used with SDR, DDR, DDR2 and DDR3
memories. FB-DIMM and RAMBUS use a different concept for channel, so
this concept doesn't apply there.

* Double-channel

The data size accessed by the memory controller is interleaved across
two DIMMs, accessed at the same time. E.g. if the DIMM is 64 bits wide
(72 bits with ECC), the data flows to the CPU using a 128-bit parallel
access.

* Chip-select row

This is the name of the DRAM signal used to select the DRAM ranks to be
accessed. Common chip-select rows for single channel are 64 bits, for
dual channel 128 bits. It may not be visible to the memory controller,
as some DIMM types have a memory buffer that can hide direct access to
it from the Memory Controller.

* Single-Ranked stick

A single-ranked stick has 1 chip-select row of memory. Motherboards
commonly drive two chip-select pins to a memory stick. A single-ranked
stick will occupy only one of those rows. The other will be unused.

.. _doubleranked:

* Double-Ranked stick

A double-ranked stick has two chip-select rows which access different
sets of memory devices. The two rows cannot be accessed concurrently.

* Double-sided stick

**DEPRECATED TERM**, see :ref:`Double-Ranked stick <doubleranked>`.

A double-sided stick has two chip-select rows which access different
sets of memory devices. The two rows cannot be accessed concurrently.
"Double-sided" is irrespective of the memory devices being mounted on
both sides of the memory stick.

* Socket set

All of the memory sticks that are required for a single memory access,
or all of the memory sticks spanned by a chip-select row. A single
socket set has two chip-select rows, and if double-sided sticks are
used, these will occupy those chip-select rows.

* Bank

This term is avoided because it is ambiguous: it is unclear whether it
refers to chip-select rows or to socket sets.

* High Bandwidth Memory (HBM)

HBM is a new memory type with low power consumption and ultra-wide
communication lanes. It uses vertically stacked memory chips (DRAM dies)
interconnected by microscopic wires called "through-silicon vias", or
TSVs.

Several stacks of HBM chips connect to the CPU or GPU through an
ultra-fast interconnect called the "interposer". Therefore, HBM's
characteristics are nearly indistinguishable from on-chip integrated
RAM.

Memory Controllers
------------------

Most of the EDAC core is focused on doing Memory Controller error
detection. Memory controllers are allocated via :c:func:`edac_mc_alloc`,
which internally uses struct ``mem_ctl_info`` to describe them. This
struct is opaque to the EDAC drivers; only the EDAC core is allowed to
touch it.
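As a rough sketch (not taken from any in-tree driver), a driver's probe
routine could allocate and register a memory controller as below. The
geometry and the ``my_``-prefixed names are purely illustrative, and
the exact prototypes may differ between kernel versions; see the
headers included below for the authoritative API::

    #include <linux/pci.h>
    #include <linux/edac.h>
    #include "edac_mc.h"    /* in-tree drivers live in drivers/edac/ */

    #define MY_NR_CSROWS    4       /* hypothetical geometry */
    #define MY_NR_CHANNELS  2

    static int my_probe(struct pci_dev *pdev)
    {
        struct edac_mc_layer layers[2];
        struct mem_ctl_info *mci;

        /* Describe the chip-select row/channel geometry */
        layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
        layers[0].size = MY_NR_CSROWS;
        layers[0].is_virt_csrow = true;
        layers[1].type = EDAC_MC_LAYER_CHANNEL;
        layers[1].size = MY_NR_CHANNELS;
        layers[1].is_virt_csrow = false;

        mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, 0);
        if (!mci)
            return -ENOMEM;

        mci->pdev = &pdev->dev;
        mci->ctl_name = "my_ctl";
        mci->dev_name = dev_name(&pdev->dev);

        /* Hand the controller over to the EDAC core */
        if (edac_mc_add_mc(mci)) {
            edac_mc_free(mci);
            return -ENODEV;
        }

        return 0;
    }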
.. kernel-doc:: include/linux/edac.h

.. kernel-doc:: drivers/edac/edac_mc.h

PCI Controllers
---------------

The EDAC subsystem provides a mechanism to handle PCI controllers by
calling :c:func:`edac_pci_alloc_ctl_info`. It uses struct
:c:type:`edac_pci_ctl_info` to describe the PCI controllers.

.. kernel-doc:: drivers/edac/edac_pci.h

EDAC Blocks
-----------

The EDAC subsystem also provides a generic mechanism to report errors
on other parts of the hardware via the
:c:func:`edac_device_alloc_ctl_info` function (a registration sketch is
given after the example below).

The structures :c:type:`edac_dev_sysfs_block_attribute`,
:c:type:`edac_device_block`, :c:type:`edac_device_instance` and
:c:type:`edac_device_ctl_info` provide a generic or abstract
'edac_device' representation in sysfs.

This set of structures, and the code that implements its APIs, provides
for registering EDAC-type devices which are NOT standard memory or PCI,
like:

- CPU caches (L1 and L2)
- DMA engines
- Core CPU switches
- Fabric switch units
- PCIe interface controllers
- other EDAC/ECC type devices that can be monitored for
  errors, etc.

It allows for a two-level hierarchy.

For example, a cache could be composed of L1, L2 and L3 levels of cache.
Each CPU core would have its own L1 cache, while sharing L2 and maybe L3
caches. In such a case, those can be represented via the following sysfs
nodes::

    /sys/devices/system/edac/..

    pci/            <existing pci directory (if available)>
    mc/             <existing memory device directory>
    cpu/cpu0/..     <L1 and L2 block directory>
        /L1-cache/ce_count
                 /ue_count
        /L2-cache/ce_count
                 /ue_count
    cpu/cpu1/..     <L1 and L2 block directory>
        /L1-cache/ce_count
                 /ue_count
        /L2-cache/ce_count
                 /ue_count
    ...

    the L1 and L2 directories would be "edac_device_block's"
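As a minimal sketch of the registration side, assuming the two-level
cpu/cache layout above, a driver could do something like the following.
The names and counts are illustrative, and the
:c:func:`edac_device_alloc_ctl_info` prototype has changed over kernel
versions, so treat this as a sketch rather than a reference
implementation::

    #include "edac_device.h"    /* drivers/edac/ internal header */

    static struct edac_device_ctl_info *cpu_edac;

    static int my_cpu_edac_register(struct device *dev, int nr_cpus)
    {
        /*
         * One instance per CPU core, two blocks per instance.
         * With block name "L" and offset 1 the blocks show up
         * as "L1" and "L2" (real drivers pick their own names).
         */
        cpu_edac = edac_device_alloc_ctl_info(0, "cpu", nr_cpus,
                                              "L", 2, 1, NULL, 0,
                                              edac_device_alloc_index());
        if (!cpu_edac)
            return -ENOMEM;

        cpu_edac->dev = dev;
        cpu_edac->mod_name = "my_cpu_edac";
        cpu_edac->ctl_name = "cpu_cache";

        return edac_device_add_device(cpu_edac);
    }

    /* Later, on a corrected error in cpu1's L2 block: */
    /* edac_device_handle_ce(cpu_edac, 1, 1, "L2 data error"); */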
.. kernel-doc:: drivers/edac/edac_device.h


Heterogeneous system support
----------------------------

An AMD heterogeneous system is built by connecting the data fabrics of
both CPUs and GPUs via custom xGMI links. Thus, the data fabric on the
GPU nodes can be accessed the same way as the data fabric on CPU nodes.

The MI200 accelerators are data center GPUs. They have 2 data fabrics,
and each GPU data fabric contains four Unified Memory Controllers
(UMC). Each UMC contains eight channels, and each UMC channel controls
one 128-bit HBM2e (2GB) channel (equivalent to 8 X 2GB ranks). This
creates a 4096-bit DRAM data bus per data fabric.

While each UMC interfaces with a 16GB (8-high X 2GB DRAM) HBM stack,
each UMC channel interfaces with 2GB of DRAM (represented as a rank).

Memory controllers on AMD GPU nodes can be represented in EDAC as
follows::

    GPU DF / GPU Node -> EDAC MC
    GPU UMC           -> EDAC CSROW
    GPU UMC channel   -> EDAC CHANNEL

For example: a heterogeneous system where one AMD CPU is connected to
four MI200 (Aldebaran) GPUs using xGMI.

Some more heterogeneous hardware details:

- The CPU UMC (Unified Memory Controller) is mostly the same as the GPU
  UMC. They have chip selects (csrows) and channels. However, the
  layouts are different for performance, physical layout, or other
  reasons.
- CPU UMCs use 1 channel, so in this case UMC = EDAC channel. This
  follows the marketing speak: a CPU has X memory channels, etc.
- CPU UMCs use up to 4 chip selects, so UMC chip select = EDAC CSROW.
- GPU UMCs use 1 chip select, so UMC = EDAC CSROW.
- GPU UMCs use 8 channels, so UMC channel = EDAC channel.

The EDAC subsystem provides a mechanism to handle AMD heterogeneous
systems by calling system-specific ops for both CPUs and GPUs.

AMD GPU nodes are enumerated in sequential order based on the PCI
hierarchy, and the first GPU node is assumed to have a Node ID value
following those of the CPU nodes after the latter are fully populated::

    $ ls /sys/devices/system/edac/mc/
    mc0 - CPU MC node 0
    mc1 |
    mc2 |- GPU card[0] => node 0(mc1), node 1(mc2)
    mc3 |
    mc4 |- GPU card[1] => node 0(mc3), node 1(mc4)
    mc5 |
    mc6 |- GPU card[2] => node 0(mc5), node 1(mc6)
    mc7 |
    mc8 |- GPU card[3] => node 0(mc7), node 1(mc8)

For example, a heterogeneous system with one AMD CPU connected to four
MI200 (Aldebaran) GPUs over xGMI can be represented via the following
sysfs entries::

    /sys/devices/system/edac/mc/..

    CPU                         # CPU node
    ├── mc 0

    GPU Nodes are enumerated sequentially after CPU nodes have been populated
    GPU card 1                  # Each MI200 GPU has 2 nodes/mcs
    ├── mc 1                    # GPU node 0 == mc1, each MC node has 4 UMCs/csrows
    │   ├── csrow 0             # UMC 0
    │   │   ├── channel 0       # Each UMC has 8 channels
    │   │   ├── channel 1       # size of each channel is 2 GB, so each UMC has 16 GB
    │   │   ├── channel 2
    │   │   ├── channel 3
    │   │   ├── channel 4
    │   │   ├── channel 5
    │   │   ├── channel 6
    │   │   ├── channel 7
    │   ├── csrow 1             # UMC 1
    │   │   ├── channel 0
    │   │   ├── ..
    │   │   ├── channel 7
    │   ├── ..
    │   ├── csrow 3             # UMC 3
    │   │   ├── channel 0
    │   │   ├── ..
    │   │   ├── channel 7
    │   ├── rank 0
    │   ├── ..
    │   ├── rank 31             # total 32 ranks/dimms from 4 UMCs
    │
    ├── mc 2                    # GPU node 1 == mc2
    │   ├── ..                  # each GPU node has 64 GB in total

    GPU card 2
    ├── mc 3
    │   ├── ..
    ├── mc 4
    │   ├── ..

    GPU card 3
    ├── mc 5
    │   ├── ..
    ├── mc 6
    │   ├── ..

    GPU card 4
    ├── mc 7
    │   ├── ..
    ├── mc 8
    │   ├── ..
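To tie the mapping above back to the allocation API, here is a hedged
sketch (not the actual ``amd64_edac`` code) of how one GPU node's
geometry could be described when allocating its EDAC memory controller;
the function name and macros are hypothetical::

    #include <linux/edac.h>
    #include "edac_mc.h"

    /* One MI200 GPU node: 4 UMCs, 8 channels per UMC (see above) */
    #define GPU_NR_UMCS             4
    #define GPU_NR_UMC_CHANNELS     8

    static struct mem_ctl_info *gpu_node_alloc_mc(int mc_idx)
    {
        struct edac_mc_layer layers[2];

        /* GPU UMC         -> EDAC CSROW */
        layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
        layers[0].size = GPU_NR_UMCS;
        layers[0].is_virt_csrow = true;

        /* GPU UMC channel -> EDAC CHANNEL */
        layers[1].type = EDAC_MC_LAYER_CHANNEL;
        layers[1].size = GPU_NR_UMC_CHANNELS;
        layers[1].is_virt_csrow = false;

        /* 4 csrows x 8 channels = 32 ranks of 2 GB = 64 GB per node */
        return edac_mc_alloc(mc_idx, ARRAY_SIZE(layers), layers, 0);
    }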