diff options
Diffstat (limited to 'Documentation/power/powercap')
-rw-r--r-- | Documentation/power/powercap/dtpm.rst | 212 | ||||
-rw-r--r-- | Documentation/power/powercap/powercap.rst | 262 |
2 files changed, 474 insertions, 0 deletions
diff --git a/Documentation/power/powercap/dtpm.rst b/Documentation/power/powercap/dtpm.rst new file mode 100644 index 0000000000..a38dee3d81 --- /dev/null +++ b/Documentation/power/powercap/dtpm.rst @@ -0,0 +1,212 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================================== +Dynamic Thermal Power Management framework +========================================== + +On the embedded world, the complexity of the SoC leads to an +increasing number of hotspots which need to be monitored and mitigated +as a whole in order to prevent the temperature to go above the +normative and legally stated 'skin temperature'. + +Another aspect is to sustain the performance for a given power budget, +for example virtual reality where the user can feel dizziness if the +performance is capped while a big CPU is processing something else. Or +reduce the battery charging because the dissipated power is too high +compared with the power consumed by other devices. + +The user space is the most adequate place to dynamically act on the +different devices by limiting their power given an application +profile: it has the knowledge of the platform. + +The Dynamic Thermal Power Management (DTPM) is a technique acting on +the device power by limiting and/or balancing a power budget among +different devices. + +The DTPM framework provides an unified interface to act on the +device power. + +Overview +======== + +The DTPM framework relies on the powercap framework to create the +powercap entries in the sysfs directory and implement the backend +driver to do the connection with the power manageable device. + +The DTPM is a tree representation describing the power constraints +shared between devices, not their physical positions. + +The nodes of the tree are a virtual description aggregating the power +characteristics of the children nodes and their power limitations. + +The leaves of the tree are the real power manageable devices. + +For instance:: + + SoC + | + `-- pkg + | + |-- pd0 (cpu0-3) + | + `-- pd1 (cpu4-5) + +The pkg power will be the sum of pd0 and pd1 power numbers:: + + SoC (400mW - 3100mW) + | + `-- pkg (400mW - 3100mW) + | + |-- pd0 (100mW - 700mW) + | + `-- pd1 (300mW - 2400mW) + +When the nodes are inserted in the tree, their power characteristics are propagated to the parents:: + + SoC (600mW - 5900mW) + | + |-- pkg (400mW - 3100mW) + | | + | |-- pd0 (100mW - 700mW) + | | + | `-- pd1 (300mW - 2400mW) + | + `-- pd2 (200mW - 2800mW) + +Each node have a weight on a 2^10 basis reflecting the percentage of power consumption along the siblings:: + + SoC (w=1024) + | + |-- pkg (w=538) + | | + | |-- pd0 (w=231) + | | + | `-- pd1 (w=794) + | + `-- pd2 (w=486) + + Note the sum of weights at the same level are equal to 1024. + +When a power limitation is applied to a node, then it is distributed along the children given their weights. For example, if we set a power limitation of 3200mW at the 'SoC' root node, the resulting tree will be:: + + SoC (w=1024) <--- power_limit = 3200mW + | + |-- pkg (w=538) --> power_limit = 1681mW + | | + | |-- pd0 (w=231) --> power_limit = 378mW + | | + | `-- pd1 (w=794) --> power_limit = 1303mW + | + `-- pd2 (w=486) --> power_limit = 1519mW + + +Flat description +---------------- + +A root node is created and it is the parent of all the nodes. This +description is the simplest one and it is supposed to give to user +space a flat representation of all the devices supporting the power +limitation without any power limitation distribution. + +Hierarchical description +------------------------ + +The different devices supporting the power limitation are represented +hierarchically. There is one root node, all intermediate nodes are +grouping the child nodes which can be intermediate nodes also or real +devices. + +The intermediate nodes aggregate the power information and allows to +set the power limit given the weight of the nodes. + +User space API +============== + +As stated in the overview, the DTPM framework is built on top of the +powercap framework. Thus the sysfs interface is the same, please refer +to the powercap documentation for further details. + + * power_uw: Instantaneous power consumption. If the node is an + intermediate node, then the power consumption will be the sum of all + children power consumption. + + * max_power_range_uw: The power range resulting of the maximum power + minus the minimum power. + + * name: The name of the node. This is implementation dependent. Even + if it is not recommended for the user space, several nodes can have + the same name. + + * constraint_X_name: The name of the constraint. + + * constraint_X_max_power_uw: The maximum power limit to be applicable + to the node. + + * constraint_X_power_limit_uw: The power limit to be applied to the + node. If the value contained in constraint_X_max_power_uw is set, + the constraint will be removed. + + * constraint_X_time_window_us: The meaning of this file will depend + on the constraint number. + +Constraints +----------- + + * Constraint 0: The power limitation is immediately applied, without + limitation in time. + +Kernel API +========== + +Overview +-------- + +The DTPM framework has no power limiting backend support. It is +generic and provides a set of API to let the different drivers to +implement the backend part for the power limitation and create the +power constraints tree. + +It is up to the platform to provide the initialization function to +allocate and link the different nodes of the tree. + +A special macro has the role of declaring a node and the corresponding +initialization function via a description structure. This one contains +an optional parent field allowing to hook different devices to an +already existing tree at boot time. + +For instance:: + + struct dtpm_descr my_descr = { + .name = "my_name", + .init = my_init_func, + }; + + DTPM_DECLARE(my_descr); + +The nodes of the DTPM tree are described with dtpm structure. The +steps to add a new power limitable device is done in three steps: + + * Allocate the dtpm node + * Set the power number of the dtpm node + * Register the dtpm node + +The registration of the dtpm node is done with the powercap +ops. Basically, it must implements the callbacks to get and set the +power and the limit. + +Alternatively, if the node to be inserted is an intermediate one, then +a simple function to insert it as a future parent is available. + +If a device has its power characteristics changing, then the tree must +be updated with the new power numbers and weights. + +Nomenclature +------------ + + * dtpm_alloc() : Allocate and initialize a dtpm structure + + * dtpm_register() : Add the dtpm node to the tree + + * dtpm_unregister() : Remove the dtpm node from the tree + + * dtpm_update_power() : Update the power characteristics of the dtpm node diff --git a/Documentation/power/powercap/powercap.rst b/Documentation/power/powercap/powercap.rst new file mode 100644 index 0000000000..e75d12596d --- /dev/null +++ b/Documentation/power/powercap/powercap.rst @@ -0,0 +1,262 @@ +======================= +Power Capping Framework +======================= + +The power capping framework provides a consistent interface between the kernel +and the user space that allows power capping drivers to expose the settings to +user space in a uniform way. + +Terminology +=========== + +The framework exposes power capping devices to user space via sysfs in the +form of a tree of objects. The objects at the root level of the tree represent +'control types', which correspond to different methods of power capping. For +example, the intel-rapl control type represents the Intel "Running Average +Power Limit" (RAPL) technology, whereas the 'idle-injection' control type +corresponds to the use of idle injection for controlling power. + +Power zones represent different parts of the system, which can be controlled and +monitored using the power capping method determined by the control type the +given zone belongs to. They each contain attributes for monitoring power, as +well as controls represented in the form of power constraints. If the parts of +the system represented by different power zones are hierarchical (that is, one +bigger part consists of multiple smaller parts that each have their own power +controls), those power zones may also be organized in a hierarchy with one +parent power zone containing multiple subzones and so on to reflect the power +control topology of the system. In that case, it is possible to apply power +capping to a set of devices together using the parent power zone and if more +fine grained control is required, it can be applied through the subzones. + + +Example sysfs interface tree:: + + /sys/devices/virtual/powercap + └──intel-rapl + ├──intel-rapl:0 + │ ├──constraint_0_name + │ ├──constraint_0_power_limit_uw + │ ├──constraint_0_time_window_us + │ ├──constraint_1_name + │ ├──constraint_1_power_limit_uw + │ ├──constraint_1_time_window_us + │ ├──device -> ../../intel-rapl + │ ├──energy_uj + │ ├──intel-rapl:0:0 + │ │ ├──constraint_0_name + │ │ ├──constraint_0_power_limit_uw + │ │ ├──constraint_0_time_window_us + │ │ ├──constraint_1_name + │ │ ├──constraint_1_power_limit_uw + │ │ ├──constraint_1_time_window_us + │ │ ├──device -> ../../intel-rapl:0 + │ │ ├──energy_uj + │ │ ├──max_energy_range_uj + │ │ ├──name + │ │ ├──enabled + │ │ ├──power + │ │ │ ├──async + │ │ │ [] + │ │ ├──subsystem -> ../../../../../../class/power_cap + │ │ └──uevent + │ ├──intel-rapl:0:1 + │ │ ├──constraint_0_name + │ │ ├──constraint_0_power_limit_uw + │ │ ├──constraint_0_time_window_us + │ │ ├──constraint_1_name + │ │ ├──constraint_1_power_limit_uw + │ │ ├──constraint_1_time_window_us + │ │ ├──device -> ../../intel-rapl:0 + │ │ ├──energy_uj + │ │ ├──max_energy_range_uj + │ │ ├──name + │ │ ├──enabled + │ │ ├──power + │ │ │ ├──async + │ │ │ [] + │ │ ├──subsystem -> ../../../../../../class/power_cap + │ │ └──uevent + │ ├──max_energy_range_uj + │ ├──max_power_range_uw + │ ├──name + │ ├──enabled + │ ├──power + │ │ ├──async + │ │ [] + │ ├──subsystem -> ../../../../../class/power_cap + │ ├──enabled + │ ├──uevent + ├──intel-rapl:1 + │ ├──constraint_0_name + │ ├──constraint_0_power_limit_uw + │ ├──constraint_0_time_window_us + │ ├──constraint_1_name + │ ├──constraint_1_power_limit_uw + │ ├──constraint_1_time_window_us + │ ├──device -> ../../intel-rapl + │ ├──energy_uj + │ ├──intel-rapl:1:0 + │ │ ├──constraint_0_name + │ │ ├──constraint_0_power_limit_uw + │ │ ├──constraint_0_time_window_us + │ │ ├──constraint_1_name + │ │ ├──constraint_1_power_limit_uw + │ │ ├──constraint_1_time_window_us + │ │ ├──device -> ../../intel-rapl:1 + │ │ ├──energy_uj + │ │ ├──max_energy_range_uj + │ │ ├──name + │ │ ├──enabled + │ │ ├──power + │ │ │ ├──async + │ │ │ [] + │ │ ├──subsystem -> ../../../../../../class/power_cap + │ │ └──uevent + │ ├──intel-rapl:1:1 + │ │ ├──constraint_0_name + │ │ ├──constraint_0_power_limit_uw + │ │ ├──constraint_0_time_window_us + │ │ ├──constraint_1_name + │ │ ├──constraint_1_power_limit_uw + │ │ ├──constraint_1_time_window_us + │ │ ├──device -> ../../intel-rapl:1 + │ │ ├──energy_uj + │ │ ├──max_energy_range_uj + │ │ ├──name + │ │ ├──enabled + │ │ ├──power + │ │ │ ├──async + │ │ │ [] + │ │ ├──subsystem -> ../../../../../../class/power_cap + │ │ └──uevent + │ ├──max_energy_range_uj + │ ├──max_power_range_uw + │ ├──name + │ ├──enabled + │ ├──power + │ │ ├──async + │ │ [] + │ ├──subsystem -> ../../../../../class/power_cap + │ ├──uevent + ├──power + │ ├──async + │ [] + ├──subsystem -> ../../../../class/power_cap + ├──enabled + └──uevent + +The above example illustrates a case in which the Intel RAPL technology, +available in Intel® IA-64 and IA-32 Processor Architectures, is used. There is one +control type called intel-rapl which contains two power zones, intel-rapl:0 and +intel-rapl:1, representing CPU packages. Each of these power zones contains +two subzones, intel-rapl:j:0 and intel-rapl:j:1 (j = 0, 1), representing the +"core" and the "uncore" parts of the given CPU package, respectively. All of +the zones and subzones contain energy monitoring attributes (energy_uj, +max_energy_range_uj) and constraint attributes (constraint_*) allowing controls +to be applied (the constraints in the 'package' power zones apply to the whole +CPU packages and the subzone constraints only apply to the respective parts of +the given package individually). Since Intel RAPL doesn't provide instantaneous +power value, there is no power_uw attribute. + +In addition to that, each power zone contains a name attribute, allowing the +part of the system represented by that zone to be identified. +For example:: + + cat /sys/class/power_cap/intel-rapl/intel-rapl:0/name + +package-0 +--------- + +Depending on different power zones, the Intel RAPL technology allows +one or multiple constraints like short term, long term and peak power, +with different time windows to be applied to each power zone. +All the zones contain attributes representing the constraint names, +power limits and the sizes of the time windows. Note that time window +is not applicable to peak power. Here, constraint_j_* attributes +correspond to the jth constraint (j = 0,1,2). + +For example:: + + constraint_0_name + constraint_0_power_limit_uw + constraint_0_time_window_us + constraint_1_name + constraint_1_power_limit_uw + constraint_1_time_window_us + constraint_2_name + constraint_2_power_limit_uw + constraint_2_time_window_us + +Power Zone Attributes +===================== + +Monitoring attributes +--------------------- + +energy_uj (rw) + Current energy counter in micro joules. Write "0" to reset. + If the counter can not be reset, then this attribute is read only. + +max_energy_range_uj (ro) + Range of the above energy counter in micro-joules. + +power_uw (ro) + Current power in micro watts. + +max_power_range_uw (ro) + Range of the above power value in micro-watts. + +name (ro) + Name of this power zone. + +It is possible that some domains have both power ranges and energy counter ranges; +however, only one is mandatory. + +Constraints +----------- + +constraint_X_power_limit_uw (rw) + Power limit in micro watts, which should be applicable for the + time window specified by "constraint_X_time_window_us". + +constraint_X_time_window_us (rw) + Time window in micro seconds. + +constraint_X_name (ro) + An optional name of the constraint + +constraint_X_max_power_uw(ro) + Maximum allowed power in micro watts. + +constraint_X_min_power_uw(ro) + Minimum allowed power in micro watts. + +constraint_X_max_time_window_us(ro) + Maximum allowed time window in micro seconds. + +constraint_X_min_time_window_us(ro) + Minimum allowed time window in micro seconds. + +Except power_limit_uw and time_window_us other fields are optional. + +Common zone and control type attributes +--------------------------------------- + +enabled (rw): Enable/Disable controls at zone level or for all zones using +a control type. + +Power Cap Client Driver Interface +================================= + +The API summary: + +Call powercap_register_control_type() to register control type object. +Call powercap_register_zone() to register a power zone (under a given +control type), either as a top-level power zone or as a subzone of another +power zone registered earlier. +The number of constraints in a power zone and the corresponding callbacks have +to be defined prior to calling powercap_register_zone() to register that zone. + +To Free a power zone call powercap_unregister_zone(). +To free a control type object call powercap_unregister_control_type(). +Detailed API can be generated using kernel-doc on include/linux/powercap.h. |