diff options
Diffstat (limited to '')
-rw-r--r-- | Documentation/x86/pat.rst | 240 |
1 files changed, 240 insertions, 0 deletions
diff --git a/Documentation/x86/pat.rst b/Documentation/x86/pat.rst new file mode 100644 index 000000000..5d9017710 --- /dev/null +++ b/Documentation/x86/pat.rst @@ -0,0 +1,240 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================== +PAT (Page Attribute Table) +========================== + +x86 Page Attribute Table (PAT) allows for setting the memory attribute at the +page level granularity. PAT is complementary to the MTRR settings which allows +for setting of memory types over physical address ranges. However, PAT is +more flexible than MTRR due to its capability to set attributes at page level +and also due to the fact that there are no hardware limitations on number of +such attribute settings allowed. Added flexibility comes with guidelines for +not having memory type aliasing for the same physical memory with multiple +virtual addresses. + +PAT allows for different types of memory attributes. The most commonly used +ones that will be supported at this time are: + +=== ============== +WB Write-back +UC Uncached +WC Write-combined +WT Write-through +UC- Uncached Minus +=== ============== + + +PAT APIs +======== + +There are many different APIs in the kernel that allows setting of memory +attributes at the page level. In order to avoid aliasing, these interfaces +should be used thoughtfully. Below is a table of interfaces available, +their intended usage and their memory attribute relationships. Internally, +these APIs use a reserve_memtype()/free_memtype() interface on the physical +address range to avoid any aliasing. + ++------------------------+----------+--------------+------------------+ +| API | RAM | ACPI,... | Reserved/Holes | ++------------------------+----------+--------------+------------------+ +| ioremap | -- | UC- | UC- | ++------------------------+----------+--------------+------------------+ +| ioremap_cache | -- | WB | WB | ++------------------------+----------+--------------+------------------+ +| ioremap_uc | -- | UC | UC | ++------------------------+----------+--------------+------------------+ +| ioremap_wc | -- | -- | WC | ++------------------------+----------+--------------+------------------+ +| ioremap_wt | -- | -- | WT | ++------------------------+----------+--------------+------------------+ +| set_memory_uc, | UC- | -- | -- | +| set_memory_wb | | | | ++------------------------+----------+--------------+------------------+ +| set_memory_wc, | WC | -- | -- | +| set_memory_wb | | | | ++------------------------+----------+--------------+------------------+ +| set_memory_wt, | WT | -- | -- | +| set_memory_wb | | | | ++------------------------+----------+--------------+------------------+ +| pci sysfs resource | -- | -- | UC- | ++------------------------+----------+--------------+------------------+ +| pci sysfs resource_wc | -- | -- | WC | +| is IORESOURCE_PREFETCH | | | | ++------------------------+----------+--------------+------------------+ +| pci proc | -- | -- | UC- | +| !PCIIOC_WRITE_COMBINE | | | | ++------------------------+----------+--------------+------------------+ +| pci proc | -- | -- | WC | +| PCIIOC_WRITE_COMBINE | | | | ++------------------------+----------+--------------+------------------+ +| /dev/mem | -- | WB/WC/UC- | WB/WC/UC- | +| read-write | | | | ++------------------------+----------+--------------+------------------+ +| /dev/mem | -- | UC- | UC- | +| mmap SYNC flag | | | | ++------------------------+----------+--------------+------------------+ +| /dev/mem | -- | WB/WC/UC- | WB/WC/UC- | +| mmap !SYNC flag | | | | +| and | |(from existing| (from existing | +| any alias to this area | |alias) | alias) | ++------------------------+----------+--------------+------------------+ +| /dev/mem | -- | WB | WB | +| mmap !SYNC flag | | | | +| no alias to this area | | | | +| and | | | | +| MTRR says WB | | | | ++------------------------+----------+--------------+------------------+ +| /dev/mem | -- | -- | UC- | +| mmap !SYNC flag | | | | +| no alias to this area | | | | +| and | | | | +| MTRR says !WB | | | | ++------------------------+----------+--------------+------------------+ + + +Advanced APIs for drivers +========================= + +A. Exporting pages to users with remap_pfn_range, io_remap_pfn_range, +vmf_insert_pfn. + +Drivers wanting to export some pages to userspace do it by using mmap +interface and a combination of: + + 1) pgprot_noncached() + 2) io_remap_pfn_range() or remap_pfn_range() or vmf_insert_pfn() + +With PAT support, a new API pgprot_writecombine is being added. So, drivers can +continue to use the above sequence, with either pgprot_noncached() or +pgprot_writecombine() in step 1, followed by step 2. + +In addition, step 2 internally tracks the region as UC or WC in memtype +list in order to ensure no conflicting mapping. + +Note that this set of APIs only works with IO (non RAM) regions. If driver +wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc() +as step 0 above and also track the usage of those pages and use set_memory_wb() +before the page is freed to free pool. + +MTRR effects on PAT / non-PAT systems +===================================== + +The following table provides the effects of using write-combining MTRRs when +using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally +mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will +be a no-op on PAT enabled systems. The region over which a arch_phys_wc_add() +is made, should already have been ioremapped with WC attributes or PAT entries, +this can be done by using ioremap_wc() / set_memory_wc(). Devices which +combine areas of IO memory desired to remain uncacheable with areas where +write-combining is desirable should consider use of ioremap_uc() followed by +set_memory_wc() to white-list effective write-combined areas. Such use is +nevertheless discouraged as the effective memory type is considered +implementation defined, yet this strategy can be used as last resort on devices +with size-constrained regions where otherwise MTRR write-combining would +otherwise not be effective. +:: + + ==== ======= === ========================= ===================== + MTRR Non-PAT PAT Linux ioremap value Effective memory type + ==== ======= === ========================= ===================== + PAT Non-PAT | PAT + |PCD | + ||PWT | + ||| | + WC 000 WB _PAGE_CACHE_MODE_WB WC | WC + WC 001 WC _PAGE_CACHE_MODE_WC WC* | WC + WC 010 UC- _PAGE_CACHE_MODE_UC_MINUS WC* | UC + WC 011 UC _PAGE_CACHE_MODE_UC UC | UC + ==== ======= === ========================= ===================== + + (*) denotes implementation defined and is discouraged + +.. note:: -- in the above table mean "Not suggested usage for the API". Some + of the --'s are strictly enforced by the kernel. Some others are not really + enforced today, but may be enforced in future. + +For ioremap and pci access through /sys or /proc - The actual type returned +can be more restrictive, in case of any existing aliasing for that address. +For example: If there is an existing uncached mapping, a new ioremap_wc can +return uncached mapping in place of write-combine requested. + +set_memory_[uc|wc|wt] and set_memory_wb should be used in pairs, where driver +will first make a region uc, wc or wt and switch it back to wb after use. + +Over time writes to /proc/mtrr will be deprecated in favor of using PAT based +interfaces. Users writing to /proc/mtrr are suggested to use above interfaces. + +Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access +types. + +Drivers should use set_memory_[uc|wc|wt] to set access type for RAM ranges. + + +PAT debugging +============= + +With CONFIG_DEBUG_FS enabled, PAT memtype list can be examined by:: + + # mount -t debugfs debugfs /sys/kernel/debug + # cat /sys/kernel/debug/x86/pat_memtype_list + PAT memtype list: + uncached-minus @ 0x7fadf000-0x7fae0000 + uncached-minus @ 0x7fb19000-0x7fb1a000 + uncached-minus @ 0x7fb1a000-0x7fb1b000 + uncached-minus @ 0x7fb1b000-0x7fb1c000 + uncached-minus @ 0x7fb1c000-0x7fb1d000 + uncached-minus @ 0x7fb1d000-0x7fb1e000 + uncached-minus @ 0x7fb1e000-0x7fb25000 + uncached-minus @ 0x7fb25000-0x7fb26000 + uncached-minus @ 0x7fb26000-0x7fb27000 + uncached-minus @ 0x7fb27000-0x7fb28000 + uncached-minus @ 0x7fb28000-0x7fb2e000 + uncached-minus @ 0x7fb2e000-0x7fb2f000 + uncached-minus @ 0x7fb2f000-0x7fb30000 + uncached-minus @ 0x7fb31000-0x7fb32000 + uncached-minus @ 0x80000000-0x90000000 + +This list shows physical address ranges and various PAT settings used to +access those physical address ranges. + +Another, more verbose way of getting PAT related debug messages is with +"debugpat" boot parameter. With this parameter, various debug messages are +printed to dmesg log. + +PAT Initialization +================== + +The following table describes how PAT is initialized under various +configurations. The PAT MSR must be updated by Linux in order to support WC +and WT attributes. Otherwise, the PAT MSR has the value programmed in it +by the firmware. Note, Xen enables WC attribute in the PAT MSR for guests. + + ==== ===== ========================== ========= ======= + MTRR PAT Call Sequence PAT State PAT MSR + ==== ===== ========================== ========= ======= + E E MTRR -> PAT init Enabled OS + E D MTRR -> PAT init Disabled - + D E MTRR -> PAT disable Disabled BIOS + D D MTRR -> PAT disable Disabled - + - np/E PAT -> PAT disable Disabled BIOS + - np/D PAT -> PAT disable Disabled - + E !P/E MTRR -> PAT init Disabled BIOS + D !P/E MTRR -> PAT disable Disabled BIOS + !M !P/E MTRR stub -> PAT disable Disabled BIOS + ==== ===== ========================== ========= ======= + + Legend + + ========= ======================================= + E Feature enabled in CPU + D Feature disabled/unsupported in CPU + np "nopat" boot option specified + !P CONFIG_X86_PAT option unset + !M CONFIG_MTRR option unset + Enabled PAT state set to enabled + Disabled PAT state set to disabled + OS PAT initializes PAT MSR with OS setting + BIOS PAT keeps PAT MSR with BIOS setting + ========= ======================================= + |