author    Daniel Baumann <daniel.baumann@progress-linux.org>  2024-08-07 13:17:52 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org>  2024-08-07 13:17:52 +0000
commit    3afb00d3f86d3d924f88b56fa8285d4e9db85852
tree      95a985d3019522cea546b7d8df621369bc44fc6c /Documentation/arch/powerpc
parent    Adding debian version 6.9.12-1.
Merging upstream version 6.10.3.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'Documentation/arch/powerpc')
-rw-r--r-- Documentation/arch/powerpc/dexcr.rst                   141
-rw-r--r-- Documentation/arch/powerpc/firmware-assisted-dump.rst   91
-rw-r--r-- Documentation/arch/powerpc/kvm-nested.rst                4
3 files changed, 184 insertions, 52 deletions
diff --git a/Documentation/arch/powerpc/dexcr.rst b/Documentation/arch/powerpc/dexcr.rst
index 615a631f51..ab0724212f 100644
--- a/Documentation/arch/powerpc/dexcr.rst
+++ b/Documentation/arch/powerpc/dexcr.rst
@@ -36,8 +36,145 @@ state for a process.
Configuration
=============
-The DEXCR is currently unconfigurable. All threads are run with the
-NPHIE aspect enabled.
+prctl
+-----
+
+A process can control its own userspace DEXCR value using the
+``PR_PPC_GET_DEXCR`` and ``PR_PPC_SET_DEXCR`` pair of
+:manpage:`prctl(2)` commands. These calls have the form::
+
+    prctl(PR_PPC_GET_DEXCR, unsigned long which, 0, 0, 0);
+    prctl(PR_PPC_SET_DEXCR, unsigned long which, unsigned long ctrl, 0, 0);
+
+The possible 'which' and 'ctrl' values are as follows. Note there is no relation
+between the 'which' value and the DEXCR aspect's index.
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 7 1
+
+   * - ``prctl()`` which
+     - Aspect name
+     - Aspect index
+
+   * - ``PR_PPC_DEXCR_SBHE``
+     - Speculative Branch Hint Enable (SBHE)
+     - 0
+
+   * - ``PR_PPC_DEXCR_IBRTPD``
+     - Indirect Branch Recurrent Target Prediction Disable (IBRTPD)
+     - 3
+
+   * - ``PR_PPC_DEXCR_SRAPD``
+     - Subroutine Return Address Prediction Disable (SRAPD)
+     - 4
+
+   * - ``PR_PPC_DEXCR_NPHIE``
+     - Non-Privileged Hash Instruction Enable (NPHIE)
+     - 5
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 8
+
+   * - ``prctl()`` ctrl
+     - Meaning
+
+   * - ``PR_PPC_DEXCR_CTRL_EDITABLE``
+     - This aspect can be configured with PR_PPC_SET_DEXCR (get only)
+
+   * - ``PR_PPC_DEXCR_CTRL_SET``
+     - This aspect is set / set this aspect
+
+   * - ``PR_PPC_DEXCR_CTRL_CLEAR``
+     - This aspect is clear / clear this aspect
+
+   * - ``PR_PPC_DEXCR_CTRL_SET_ONEXEC``
+     - This aspect will be set after exec / set this aspect after exec
+
+   * - ``PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC``
+     - This aspect will be clear after exec / clear this aspect after exec
+
+Note that
+
+* which is a plain value, not a bitmask. Aspects must be worked with individually.
+
+* ctrl is a bitmask. ``PR_PPC_GET_DEXCR`` returns both the current and onexec
+  configuration. For example, ``PR_PPC_GET_DEXCR`` may return
+  ``PR_PPC_DEXCR_CTRL_EDITABLE | PR_PPC_DEXCR_CTRL_SET |
+  PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC``. This would indicate the aspect is currently
+  set, it will be cleared when you run exec, and you can change this with the
+  ``PR_PPC_SET_DEXCR`` prctl.
+
+* The set/clear terminology refers to setting/clearing the bit in the DEXCR.
+  For example::
+
+      prctl(PR_PPC_SET_DEXCR, PR_PPC_DEXCR_IBRTPD, PR_PPC_DEXCR_CTRL_SET, 0, 0);
+
+  will set the IBRTPD aspect bit in the DEXCR, causing indirect branch prediction
+  to be disabled.
+
+* The status returned by ``PR_PPC_GET_DEXCR`` represents what value the process
+  would like applied. It does not include any alternative overrides, such as if
+  the hypervisor is enforcing the aspect be set. To see the true DEXCR state
+  software should read the appropriate SPRs directly.
+
+* The aspect state when starting a process is copied from the parent's state on
+  :manpage:`fork(2)`. The state is reset to a fixed value on
+  :manpage:`execve(2)`. The PR_PPC_SET_DEXCR prctl() can control both of these
+  values.
+
+* The ``*_ONEXEC`` controls do not change the current process's DEXCR.
+
+Use ``PR_PPC_SET_DEXCR`` with one of ``PR_PPC_DEXCR_CTRL_SET`` or
+``PR_PPC_DEXCR_CTRL_CLEAR`` to edit a given aspect.
+
+Common error codes for both getting and setting the DEXCR are as follows:
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 8
+
+   * - Error
+     - Meaning
+
+   * - ``EINVAL``
+     - The DEXCR is not supported by the kernel.
+
+   * - ``ENODEV``
+     - The aspect is not recognised by the kernel or not supported by the
+       hardware.
+
+``PR_PPC_SET_DEXCR`` may also report the following error codes:
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 8
+
+   * - Error
+     - Meaning
+
+   * - ``EINVAL``
+     - The ctrl value contains unrecognised flags.
+
+   * - ``EINVAL``
+     - The ctrl value contains mutually conflicting flags (e.g.,
+       ``PR_PPC_DEXCR_CTRL_SET | PR_PPC_DEXCR_CTRL_CLEAR``)
+
+   * - ``EPERM``
+     - This aspect cannot be modified with prctl() (check for the
+       PR_PPC_DEXCR_CTRL_EDITABLE flag with PR_PPC_GET_DEXCR).
+
+   * - ``EPERM``
+     - The process does not have sufficient privilege to perform the operation.
+       For example, clearing NPHIE on exec is a privileged operation (a process
+       can still clear its own NPHIE aspect without privileges).
+
+This interface allows a process to control its own DEXCR aspects, and also set
+the initial DEXCR value for any children in its process tree (up to the next
+child to use an ``*_ONEXEC`` control). This allows fine-grained control over the
+default value of the DEXCR, for example allowing containers to run with different
+default values.
coredump and ptrace
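
As a rough illustration of the prctl interface documented in the dexcr.rst
hunk above, the following C sketch queries one aspect, decodes the returned
ctrl bitmask, and changes an aspect only when it is reported as editable. The
helper names (``dexcr_ctrl_str``, ``dexcr_get``, ``dexcr_set_if_editable``) are
hypothetical, and the fallback ``#define`` values are an assumption that they
match the Linux 6.10 uapi ``linux/prctl.h``; check them against your own
headers before relying on them.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/prctl.h>

/* Fallbacks for headers older than Linux 6.10. These values are assumed to
 * match include/uapi/linux/prctl.h; verify against your kernel headers. */
#ifndef PR_PPC_GET_DEXCR
#define PR_PPC_GET_DEXCR               72
#define PR_PPC_SET_DEXCR               73
#define PR_PPC_DEXCR_SBHE              0
#define PR_PPC_DEXCR_IBRTPD            1
#define PR_PPC_DEXCR_SRAPD             2
#define PR_PPC_DEXCR_NPHIE             3
#define PR_PPC_DEXCR_CTRL_EDITABLE     0x1
#define PR_PPC_DEXCR_CTRL_SET          0x2
#define PR_PPC_DEXCR_CTRL_CLEAR        0x4
#define PR_PPC_DEXCR_CTRL_SET_ONEXEC   0x8
#define PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC 0x10
#endif

/* Hypothetical helper: render a ctrl bitmask as "FLAG|FLAG|..." into buf. */
const char *dexcr_ctrl_str(unsigned long ctrl, char *buf, size_t len)
{
    static const struct { unsigned long bit; const char *name; } flags[] = {
        { PR_PPC_DEXCR_CTRL_EDITABLE,     "EDITABLE" },
        { PR_PPC_DEXCR_CTRL_SET,          "SET" },
        { PR_PPC_DEXCR_CTRL_CLEAR,        "CLEAR" },
        { PR_PPC_DEXCR_CTRL_SET_ONEXEC,   "SET_ONEXEC" },
        { PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC, "CLEAR_ONEXEC" },
    };

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (!(ctrl & flags[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, flags[i].name, len - strlen(buf) - 1);
    }
    return buf;
}

/* Hypothetical helper: return the ctrl bitmask for one aspect ('which' is a
 * plain value, not a bitmask), or -1 with errno set. */
long dexcr_get(unsigned long which)
{
    return prctl(PR_PPC_GET_DEXCR, which, 0UL, 0UL, 0UL);
}

/* Hypothetical helper: apply ctrl to one aspect, but only if the kernel
 * reports it as editable. Returns 0 on success, -1 with errno set. */
int dexcr_set_if_editable(unsigned long which, unsigned long ctrl)
{
    long cur = dexcr_get(which);

    if (cur < 0)
        return -1;      /* e.g. EINVAL or ENODEV, per the error tables */
    if (!(cur & PR_PPC_DEXCR_CTRL_EDITABLE)) {
        errno = EPERM;  /* mirrors the error SET would be expected to give */
        return -1;
    }
    return prctl(PR_PPC_SET_DEXCR, which, ctrl, 0UL, 0UL);
}
```

For instance, ``dexcr_set_if_editable(PR_PPC_DEXCR_IBRTPD, PR_PPC_DEXCR_CTRL_SET)``
would request that the IBRTPD bit be set for the calling process. On kernels or
hardware without DEXCR support the calls are expected to fail as listed in the
error tables of the patch.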
diff --git a/Documentation/arch/powerpc/firmware-assisted-dump.rst b/Documentation/arch/powerpc/firmware-assisted-dump.rst
index e363fc4852..7e37aadd1f 100644
--- a/Documentation/arch/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/arch/powerpc/firmware-assisted-dump.rst
@@ -134,12 +134,12 @@ that are run. If there is dump data, then the
memory is held.
If there is no waiting dump data, then only the memory required to
-hold CPU state, HPTE region, boot memory dump, FADump header and
-elfcore header, is usually reserved at an offset greater than boot
-memory size (see Fig. 1). This area is *not* released: this region
-will be kept permanently reserved, so that it can act as a receptacle
-for a copy of the boot memory content in addition to CPU state and
-HPTE region, in the case a crash does occur.
+hold CPU state, HPTE region, boot memory dump, and FADump header is
+usually reserved at an offset greater than boot memory size (see Fig. 1).
+This area is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy of the boot
+memory content in addition to CPU state and HPTE region, in the case
+a crash does occur.
Since this reserved memory area is used only after the system crash,
there is no point in blocking this significant chunk of memory from
@@ -153,22 +153,22 @@ that were present in CMA region::
o Memory Reservation during first kernel
-  Low memory                                                 Top of memory
-  0    boot memory size   |<--- Reserved dump area --->|       |
-  |           |           |    Permanent Reservation   |       |
-  V           V           |                            |       V
-  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
-  |           |           |///|////|  DUMP | HDR | ELF |////|  |
-  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
-        |                   ^    ^     ^      ^           ^
-        |                   |    |     |      |           |
-        \                  CPU  HPTE   /      |           |
-         ------------------------------       |           |
-      Boot memory content gets transferred    |           |
-      to reserved area by firmware at the     |           |
-      time of crash.                          |           |
-                                        FADump Header     |
-                                         (meta area)      |
+  Low memory                                                  Top of memory
+  0    boot memory size   |<------ Reserved dump area ----->|     |
+  |           |           |      Permanent Reservation      |     |
+  V           V           |                                 |     V
+  +-----------+-----/ /---+---+----+-----------+-------+----+-----+
+  |           |           |///|////|    DUMP   |  HDR  |////|     |
+  +-----------+-----/ /---+---+----+-----------+-------+----+-----+
+        |                   ^    ^       ^         ^      ^
+        |                   |    |       |         |      |
+        \                  CPU  HPTE     /         |      |
+         --------------------------------          |      |
+      Boot memory content gets transferred         |      |
+      to reserved area by firmware at the          |      |
+      time of crash.                               |      |
+                                           FADump Header  |
+                                            (meta area)   |
                                                           |
                                                           |
Metadata: This area holds a metadata structure whose
@@ -186,13 +186,20 @@ that were present in CMA region::
   0      boot memory size                                      |
   |           |<------------ Crash preserved area ------------>|
   V           V           |<--- Reserved dump area --->|       |
-  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
-  |           |           |///|////|  DUMP | HDR | ELF |////|  |
-  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
-          |                                           |
-          V                                           V
-     Used by second                                /proc/vmcore
-     kernel to boot
+  +----+---+--+-----/ /---+---+----+-------+-----+-----+-------+
+  |    |ELF|  |           |///|////|  DUMP | HDR |/////|       |
+  +----+---+--+-----/ /---+---+----+-------+-----+-----+-------+
+       |   |  |                            |     |             |
+       -----  ------------------------------     ---------------
+         \              |                               |
+           \            |                               |
+             \          |                               |
+               \        |    ----------------------------
+                 \      |   /
+                   \    |  /
+                     \  | /
+                  /proc/vmcore
+
           +---+
           |///| -> Regions (CPU, HPTE & Metadata) marked like this in the above
@@ -200,6 +207,12 @@ that were present in CMA region::
                    does not have CPU & HPTE regions while Metadata region is
                    not supported on pSeries currently.
+          +---+
+          |ELF| -> elfcorehdr, it is created in second kernel after crash.
+          +---+
+
+        Note: Memory from 0 to the boot memory size is used by second kernel
+
                    Fig. 2
@@ -353,26 +366,6 @@ TODO:
- Need to come up with the better approach to find out more
accurate boot memory size that is required for a kernel to
boot successfully when booted with restricted memory.
- - The FADump implementation introduces a FADump crash info structure
- in the scratch area before the ELF core header. The idea of introducing
- this structure is to pass some important crash info data to the second
- kernel which will help second kernel to populate ELF core header with
- correct data before it gets exported through /proc/vmcore. The current
- design implementation does not address a possibility of introducing
- additional fields (in future) to this structure without affecting
- compatibility. Need to come up with the better approach to address this.
-
- The possible approaches are:
-
- 1. Introduce version field for version tracking, bump up the version
- whenever a new field is added to the structure in future. The version
- field can be used to find out what fields are valid for the current
- version of the structure.
- 2. Reserve the area of predefined size (say PAGE_SIZE) for this
- structure and have unused area as reserved (initialized to zero)
- for future field additions.
-
- The advantage of approach 1 over 2 is we don't need to reserve extra space.
Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
diff --git a/Documentation/arch/powerpc/kvm-nested.rst b/Documentation/arch/powerpc/kvm-nested.rst
index 630602a8aa..5defd13cc6 100644
--- a/Documentation/arch/powerpc/kvm-nested.rst
+++ b/Documentation/arch/powerpc/kvm-nested.rst
@@ -546,7 +546,9 @@ table information.
 +--------+-------+----+--------+----------------------------------+
 | 0x1052 | 0x08  | RW |   T    | CTRL                             |
 +--------+-------+----+--------+----------------------------------+
-| 0x1053-|       |    |        | Reserved                         |
+| 0x1053 | 0x08  | RW |   T    | DPDES                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1054-|       |    |        | Reserved                         |
 | 0x1FFF |       |    |        |                                  |
 +--------+-------+----+--------+----------------------------------+
 | 0x2000 | 0x04  | RW |   T    | CR                               |