Diffstat (limited to 'Documentation/powerpc/papr_hcalls.rst')
-rw-r--r-- | Documentation/powerpc/papr_hcalls.rst | 302 |
1 files changed, 0 insertions, 302 deletions
diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
deleted file mode 100644
index 80d2c0aada..0000000000
--- a/Documentation/powerpc/papr_hcalls.rst
+++ /dev/null
@@ -1,302 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-===========================
-Hypercall Op-codes (hcalls)
-===========================
-
-Overview
-=========
-
-Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
-specification [1]_ which describes the run-time environment for a guest
-operating system and how it should interact with the hypervisor for
-privileged operations. Currently there are two PAPR compliant hypervisors:
-
-- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
-  IBM-i and Linux as supported guests (termed as Logical Partitions
-  or LPARS). It supports the full PAPR specification.
-
-- **Qemu/KVM**: Supports PPC64 linux guests running on a PPC64 linux host.
-  Though it only implements a subset of PAPR specification called LoPAPR [2]_.
-
-On PPC64 arch a guest kernel running on top of a PAPR hypervisor is called
-a *pSeries guest*. A pseries guest runs in a supervisor mode (HV=0) and must
-issue hypercalls to the hypervisor whenever it needs to perform an action
-that is hypervisor privileged [3]_ or for other services managed by the
-hypervisor.
-
-Hence a Hypercall (hcall) is essentially a request by the pseries guest
-asking hypervisor to perform a privileged operation on behalf of the guest. The
-guest issues a with necessary input operands. The hypervisor after performing
-the privilege operation returns a status code and output operands back to the
-guest.
-
-HCALL ABI
-=========
-The ABI specification for a hcall between a pseries guest and PAPR hypervisor
-is covered in section 14.5.3 of ref [2]_. Switch to the Hypervisor context is
-done via the instruction **HVCS** that expects the Opcode for hcall is set in *r3*
-and any in-arguments for the hcall are provided in registers *r4-r12*. If values
-have to be passed through a memory buffer, the data stored in that buffer should be
-in Big-endian byte order.
-
-Once control returns back to the guest after hypervisor has serviced the
-'HVCS' instruction the return value of the hcall is available in *r3* and any
-out values are returned in registers *r4-r12*. Again like in case of in-arguments,
-any out values stored in a memory buffer will be in Big-endian byte order.
-
-Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx** defined
-in a arch specific header [4]_ to issue hcalls from the linux kernel
-running as pseries guest.
-
-Register Conventions
-====================
-
-Any hcall should follow same register convention as described in section 2.2.1.1
-of "64-Bit ELF V2 ABI Specification: Power Architecture"[5]_. Table below
-summarizes these conventions:
-
-+----------+----------+-------------------------------------------+
-| Register |Volatile  | Purpose                                   |
-| Range    |(Y/N)     |                                           |
-+==========+==========+===========================================+
-| r0       |    Y     | Optional-usage                            |
-+----------+----------+-------------------------------------------+
-| r1       |    N     | Stack Pointer                             |
-+----------+----------+-------------------------------------------+
-| r2       |    N     | TOC                                       |
-+----------+----------+-------------------------------------------+
-| r3       |    Y     | hcall opcode/return value                 |
-+----------+----------+-------------------------------------------+
-| r4-r10   |    Y     | in and out values                         |
-+----------+----------+-------------------------------------------+
-| r11      |    Y     | Optional-usage/Environmental pointer      |
-+----------+----------+-------------------------------------------+
-| r12      |    Y     | Optional-usage/Function entry address at  |
-|          |          | global entry point                        |
-+----------+----------+-------------------------------------------+
-| r13      |    N     | Thread-Pointer                            |
-+----------+----------+-------------------------------------------+
-| r14-r31  |    N     | Local Variables                           |
-+----------+----------+-------------------------------------------+
-| LR       |    Y     | Link Register                             |
-+----------+----------+-------------------------------------------+
-| CTR      |    Y     | Loop Counter                              |
-+----------+----------+-------------------------------------------+
-| XER      |    Y     | Fixed-point exception register.           |
-+----------+----------+-------------------------------------------+
-| CR0-1    |    Y     | Condition register fields.                |
-+----------+----------+-------------------------------------------+
-| CR2-4    |    N     | Condition register fields.                |
-+----------+----------+-------------------------------------------+
-| CR5-7    |    Y     | Condition register fields.                |
-+----------+----------+-------------------------------------------+
-| Others   |    N     |                                           |
-+----------+----------+-------------------------------------------+
-
-DRC & DRC Indexes
-=================
-::
-
-     DR1                                  Guest
-     +--+        +------------+         +---------+
-     |  | <----> |            |         |  User   |
-     +--+  DRC1  |            |   DRC   |  Space  |
-                 |    PAPR    |  Index  +---------+
-     DR2         | Hypervisor |         |         |
-     +--+        |            | <-----> |  Kernel |
-     |  | <----> |            |  Hcall  |         |
-     +--+  DRC2  +------------+         +---------+
-
-PAPR hypervisor terms shared hardware resources like PCI devices, NVDIMMs etc
-available for use by LPARs as Dynamic Resource (DR). When a DR is allocated to
-an LPAR, PHYP creates a data-structure called Dynamic Resource Connector (DRC)
-to manage LPAR access. An LPAR refers to a DRC via an opaque 32-bit number
-called DRC-Index. The DRC-index value is provided to the LPAR via device-tree
-where its present as an attribute in the device tree node associated with the
-DR.
-
-HCALL Return-values
-===================
-
-After servicing the hcall, hypervisor sets the return-value in *r3* indicating
-success or failure of the hcall. In case of a failure an error code indicates
-the cause for error. These codes are defined and documented in arch specific
-header [4]_.
-
-In some cases a hcall can potentially take a long time and need to be issued
-multiple times in order to be completely serviced. These hcalls will usually
-accept an opaque value *continue-token* within there argument list and a
-return value of *H_CONTINUE* indicates that hypervisor hasn't still finished
-servicing the hcall yet.
-
-To make such hcalls the guest need to set *continue-token == 0* for the
-initial call and use the hypervisor returned value of *continue-token*
-for each subsequent hcall until hypervisor returns a non *H_CONTINUE*
-return value.
-
-HCALL Op-codes
-==============
-
-Below is a partial list of HCALLs that are supported by PHYP. For the
-corresponding opcode values please look into the arch specific header [4]_:
-
-**H_SCM_READ_METADATA**
-
-| Input: *drcIndex, offset, buffer-address, numBytesToRead*
-| Out: *numBytesRead*
-| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*
-
-Given a DRC Index of an NVDIMM, read N-bytes from the metadata area
-associated with it, at a specified offset and copy it to provided buffer.
-The metadata area stores configuration information such as label information,
-bad-blocks etc. The metadata area is located out-of-band of NVDIMM storage
-area hence a separate access semantics is provided.
-
-**H_SCM_WRITE_METADATA**
-
-| Input: *drcIndex, offset, data, numBytesToWrite*
-| Out: *None*
-| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*
-
-Given a DRC Index of an NVDIMM, write N-bytes to the metadata area
-associated with it, at the specified offset and from the provided buffer.
-
-**H_SCM_BIND_MEM**
-
-| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
-| *targetLogicalMemoryAddress, continue-token*
-| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
-| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
-| *H_Too_Big, H_P5, H_Busy*
-
-Given a DRC-Index of an NVDIMM, map a continuous SCM blocks range
-*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
-at *targetLogicalMemoryAddress* within guest physical address space. In
-case *targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF* then hypervisor
-assigns a target address to the guest. The HCALL can fail if the Guest has
-an active PTE entry to the SCM block being bound.
-
-**H_SCM_UNBIND_MEM**
-| Input: drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind
-| Out: numScmBlocksUnbound
-| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
-| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*
-
-Given a DRC-Index of an NVDimm, unmap *numScmBlocksToUnbind* SCM blocks starting
-at *startingScmLogicalMemoryAddress* from guest physical address space. The
-HCALL can fail if the Guest has an active PTE entry to the SCM block being
-unbound.
-
-**H_SCM_QUERY_BLOCK_MEM_BINDING**
-
-| Input: *drcIndex, scmBlockIndex*
-| Out: *Guest-Physical-Address*
-| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*
-
-Given a DRC-Index and an SCM Block index return the guest physical address to
-which the SCM block is mapped to.
-
-**H_SCM_QUERY_LOGICAL_MEM_BINDING**
-
-| Input: *Guest-Physical-Address*
-| Out: *drcIndex, scmBlockIndex*
-| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*
-
-Given a guest physical address return which DRC Index and SCM block is mapped
-to that address.
-
-**H_SCM_UNBIND_ALL**
-
-| Input: *scmTargetScope, drcIndex*
-| Out: *None*
-| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
-| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*
-
-Depending on the Target scope unmap all SCM blocks belonging to all NVDIMMs
-or all SCM blocks belonging to a single NVDIMM identified by its drcIndex
-from the LPAR memory.
-
-**H_SCM_HEALTH**
-
-| Input: drcIndex
-| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
-| Return Value: *H_Success, H_Parameter, H_Hardware*
-
-Given a DRC Index return the info on predictive failure and overall health of
-the PMEM device. The asserted bits in the health-bitmap indicate one or more states
-(described in table below) of the PMEM device and health-bit-valid-bitmap indicate
-which bits in health-bitmap are valid. The bits are reported in
-reverse bit ordering for example a value of 0xC400000000000000
-indicates bits 0, 1, and 5 are valid.
-
-Health Bitmap Flags:
-
-+------+-----------------------------------------------------------------------+
-| Bit  | Definition                                                            |
-+======+=======================================================================+
-|  00  | PMEM device is unable to persist memory contents.                     |
-|      | If the system is powered down, nothing will be saved.                 |
-+------+-----------------------------------------------------------------------+
-|  01  | PMEM device failed to persist memory contents. Either contents were   |
-|      | not saved successfully on power down or were not restored properly on |
-|      | power up.                                                             |
-+------+-----------------------------------------------------------------------+
-|  02  | PMEM device contents are persisted from previous IPL. The data from   |
-|      | the last boot were successfully restored.                             |
-+------+-----------------------------------------------------------------------+
-|  03  | PMEM device contents are not persisted from previous IPL. There was no|
-|      | data to restore from the last boot.                                   |
-+------+-----------------------------------------------------------------------+
-|  04  | PMEM device memory life remaining is critically low                   |
-+------+-----------------------------------------------------------------------+
-|  05  | PMEM device will be garded off next IPL due to failure                |
-+------+-----------------------------------------------------------------------+
-|  06  | PMEM device contents cannot persist due to current platform health    |
-|      | status. A hardware failure may prevent data from being saved or       |
-|      | restored.                                                             |
-+------+-----------------------------------------------------------------------+
-|  07  | PMEM device is unable to persist memory contents in certain conditions|
-+------+-----------------------------------------------------------------------+
-|  08  | PMEM device is encrypted                                              |
-+------+-----------------------------------------------------------------------+
-|  09  | PMEM device has successfully completed a requested erase or secure    |
-|      | erase procedure.                                                      |
-+------+-----------------------------------------------------------------------+
-|10:63 | Reserved / Unused                                                     |
-+------+-----------------------------------------------------------------------+
-
-**H_SCM_PERFORMANCE_STATS**
-
-| Input: drcIndex, resultBuffer Addr
-| Out: None
-| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*
-
-Given a DRC Index collect the performance statistics for NVDIMM and copy them
-to the resultBuffer.
-
-**H_SCM_FLUSH**
-
-| Input: *drcIndex, continue-token*
-| Out: *continue-token*
-| Return Value: *H_SUCCESS, H_Parameter, H_P2, H_BUSY*
-
-Given a DRC Index Flush the data to backend NVDIMM device.
-
-The hcall returns H_BUSY when the flush takes longer time and the hcall needs
-to be issued multiple times in order to be completely serviced. The
-*continue-token* from the output to be passed in the argument list of
-subsequent hcalls to the hypervisor until the hcall is completely serviced
-at which point H_SUCCESS or other error is returned by the hypervisor.
-
-References
-==========
-.. [1] "Power Architecture Platform Reference"
-       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
-.. [2] "Linux on Power Architecture Platform Reference"
-       https://members.openpowerfoundation.org/document/dl/469
-.. [3] "Definitions and Notation" Book III-Section 14.5.3
-       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
-.. [4] arch/powerpc/include/asm/hvcall.h
-.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"
-       https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture
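
Note on the removed H_SCM_HEALTH text: the "reverse bit ordering" it describes numbers bit 0 as the most significant bit of the 64-bit value, which is why 0xC400000000000000 corresponds to bits 0, 1 and 5. The following is a minimal sketch of that decoding in C. The helper names ``papr_bit`` and ``papr_health_bit_set`` are hypothetical and are not taken from the kernel sources or from the deleted file; they only illustrate the bit numbering.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers (assumed names, not kernel API).
 * Bits are numbered the way the H_SCM_HEALTH text describes them:
 * bit 0 is the most significant bit of the 64-bit value.
 */
static inline uint64_t papr_bit(unsigned int n)
{
	assert(n < 64);
	return UINT64_C(1) << (63 - n);
}

/*
 * Returns nonzero only if bit n is asserted in health_bitmap AND
 * marked valid in valid_bitmap (health-bit-valid-bitmap).
 */
static inline int papr_health_bit_set(uint64_t health_bitmap,
				      uint64_t valid_bitmap,
				      unsigned int n)
{
	uint64_t mask = papr_bit(n);

	return (valid_bitmap & mask) != 0 && (health_bitmap & mask) != 0;
}
```

With these helpers, ``papr_bit(0) | papr_bit(1) | papr_bit(5)`` reproduces the 0xC400000000000000 value used as the example in the text, and a health bit is only treated as meaningful when the matching bit of the valid bitmap is also set.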