summaryrefslogtreecommitdiffstats
path: root/Documentation/power/powercap
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/power/powercap')
-rw-r--r--Documentation/power/powercap/dtpm.rst212
-rw-r--r--Documentation/power/powercap/powercap.rst262
2 files changed, 474 insertions, 0 deletions
diff --git a/Documentation/power/powercap/dtpm.rst b/Documentation/power/powercap/dtpm.rst
new file mode 100644
index 0000000000..a38dee3d81
--- /dev/null
+++ b/Documentation/power/powercap/dtpm.rst
@@ -0,0 +1,212 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================================
+Dynamic Thermal Power Management framework
+==========================================
+
+On the embedded world, the complexity of the SoC leads to an
+increasing number of hotspots which need to be monitored and mitigated
+as a whole in order to prevent the temperature to go above the
+normative and legally stated 'skin temperature'.
+
+Another aspect is to sustain the performance for a given power budget,
+for example virtual reality where the user can feel dizziness if the
+performance is capped while a big CPU is processing something else. Or
+reduce the battery charging because the dissipated power is too high
+compared with the power consumed by other devices.
+
+The user space is the most adequate place to dynamically act on the
+different devices by limiting their power given an application
+profile: it has the knowledge of the platform.
+
+The Dynamic Thermal Power Management (DTPM) is a technique acting on
+the device power by limiting and/or balancing a power budget among
+different devices.
+
+The DTPM framework provides an unified interface to act on the
+device power.
+
+Overview
+========
+
+The DTPM framework relies on the powercap framework to create the
+powercap entries in the sysfs directory and implement the backend
+driver to do the connection with the power manageable device.
+
+The DTPM is a tree representation describing the power constraints
+shared between devices, not their physical positions.
+
+The nodes of the tree are a virtual description aggregating the power
+characteristics of the children nodes and their power limitations.
+
+The leaves of the tree are the real power manageable devices.
+
+For instance::
+
+ SoC
+ |
+ `-- pkg
+ |
+ |-- pd0 (cpu0-3)
+ |
+ `-- pd1 (cpu4-5)
+
+The pkg power will be the sum of pd0 and pd1 power numbers::
+
+ SoC (400mW - 3100mW)
+ |
+ `-- pkg (400mW - 3100mW)
+ |
+ |-- pd0 (100mW - 700mW)
+ |
+ `-- pd1 (300mW - 2400mW)
+
+When the nodes are inserted in the tree, their power characteristics are propagated to the parents::
+
+ SoC (600mW - 5900mW)
+ |
+ |-- pkg (400mW - 3100mW)
+ | |
+ | |-- pd0 (100mW - 700mW)
+ | |
+ | `-- pd1 (300mW - 2400mW)
+ |
+ `-- pd2 (200mW - 2800mW)
+
+Each node have a weight on a 2^10 basis reflecting the percentage of power consumption along the siblings::
+
+ SoC (w=1024)
+ |
+ |-- pkg (w=538)
+ | |
+ | |-- pd0 (w=231)
+ | |
+ | `-- pd1 (w=794)
+ |
+ `-- pd2 (w=486)
+
+ Note the sum of weights at the same level are equal to 1024.
+
+When a power limitation is applied to a node, then it is distributed along the children given their weights. For example, if we set a power limitation of 3200mW at the 'SoC' root node, the resulting tree will be::
+
+ SoC (w=1024) <--- power_limit = 3200mW
+ |
+ |-- pkg (w=538) --> power_limit = 1681mW
+ | |
+ | |-- pd0 (w=231) --> power_limit = 378mW
+ | |
+ | `-- pd1 (w=794) --> power_limit = 1303mW
+ |
+ `-- pd2 (w=486) --> power_limit = 1519mW
+
+
+Flat description
+----------------
+
+A root node is created and it is the parent of all the nodes. This
+description is the simplest one and it is supposed to give to user
+space a flat representation of all the devices supporting the power
+limitation without any power limitation distribution.
+
+Hierarchical description
+------------------------
+
+The different devices supporting the power limitation are represented
+hierarchically. There is one root node, all intermediate nodes are
+grouping the child nodes which can be intermediate nodes also or real
+devices.
+
+The intermediate nodes aggregate the power information and allows to
+set the power limit given the weight of the nodes.
+
+User space API
+==============
+
+As stated in the overview, the DTPM framework is built on top of the
+powercap framework. Thus the sysfs interface is the same, please refer
+to the powercap documentation for further details.
+
+ * power_uw: Instantaneous power consumption. If the node is an
+ intermediate node, then the power consumption will be the sum of all
+ children power consumption.
+
+ * max_power_range_uw: The power range resulting of the maximum power
+ minus the minimum power.
+
+ * name: The name of the node. This is implementation dependent. Even
+ if it is not recommended for the user space, several nodes can have
+ the same name.
+
+ * constraint_X_name: The name of the constraint.
+
+ * constraint_X_max_power_uw: The maximum power limit to be applicable
+ to the node.
+
+ * constraint_X_power_limit_uw: The power limit to be applied to the
+ node. If the value contained in constraint_X_max_power_uw is set,
+ the constraint will be removed.
+
+ * constraint_X_time_window_us: The meaning of this file will depend
+ on the constraint number.
+
+Constraints
+-----------
+
+ * Constraint 0: The power limitation is immediately applied, without
+ limitation in time.
+
+Kernel API
+==========
+
+Overview
+--------
+
+The DTPM framework has no power limiting backend support. It is
+generic and provides a set of API to let the different drivers to
+implement the backend part for the power limitation and create the
+power constraints tree.
+
+It is up to the platform to provide the initialization function to
+allocate and link the different nodes of the tree.
+
+A special macro has the role of declaring a node and the corresponding
+initialization function via a description structure. This one contains
+an optional parent field allowing to hook different devices to an
+already existing tree at boot time.
+
+For instance::
+
+ struct dtpm_descr my_descr = {
+ .name = "my_name",
+ .init = my_init_func,
+ };
+
+ DTPM_DECLARE(my_descr);
+
+The nodes of the DTPM tree are described with dtpm structure. The
+steps to add a new power limitable device is done in three steps:
+
+ * Allocate the dtpm node
+ * Set the power number of the dtpm node
+ * Register the dtpm node
+
+The registration of the dtpm node is done with the powercap
+ops. Basically, it must implements the callbacks to get and set the
+power and the limit.
+
+Alternatively, if the node to be inserted is an intermediate one, then
+a simple function to insert it as a future parent is available.
+
+If a device has its power characteristics changing, then the tree must
+be updated with the new power numbers and weights.
+
+Nomenclature
+------------
+
+ * dtpm_alloc() : Allocate and initialize a dtpm structure
+
+ * dtpm_register() : Add the dtpm node to the tree
+
+ * dtpm_unregister() : Remove the dtpm node from the tree
+
+ * dtpm_update_power() : Update the power characteristics of the dtpm node
diff --git a/Documentation/power/powercap/powercap.rst b/Documentation/power/powercap/powercap.rst
new file mode 100644
index 0000000000..e75d12596d
--- /dev/null
+++ b/Documentation/power/powercap/powercap.rst
@@ -0,0 +1,262 @@
+=======================
+Power Capping Framework
+=======================
+
+The power capping framework provides a consistent interface between the kernel
+and the user space that allows power capping drivers to expose the settings to
+user space in a uniform way.
+
+Terminology
+===========
+
+The framework exposes power capping devices to user space via sysfs in the
+form of a tree of objects. The objects at the root level of the tree represent
+'control types', which correspond to different methods of power capping. For
+example, the intel-rapl control type represents the Intel "Running Average
+Power Limit" (RAPL) technology, whereas the 'idle-injection' control type
+corresponds to the use of idle injection for controlling power.
+
+Power zones represent different parts of the system, which can be controlled and
+monitored using the power capping method determined by the control type the
+given zone belongs to. They each contain attributes for monitoring power, as
+well as controls represented in the form of power constraints. If the parts of
+the system represented by different power zones are hierarchical (that is, one
+bigger part consists of multiple smaller parts that each have their own power
+controls), those power zones may also be organized in a hierarchy with one
+parent power zone containing multiple subzones and so on to reflect the power
+control topology of the system. In that case, it is possible to apply power
+capping to a set of devices together using the parent power zone and if more
+fine grained control is required, it can be applied through the subzones.
+
+
+Example sysfs interface tree::
+
+ /sys/devices/virtual/powercap
+ └──intel-rapl
+ ├──intel-rapl:0
+ │   ├──constraint_0_name
+ │   ├──constraint_0_power_limit_uw
+ │   ├──constraint_0_time_window_us
+ │   ├──constraint_1_name
+ │   ├──constraint_1_power_limit_uw
+ │   ├──constraint_1_time_window_us
+ │   ├──device -> ../../intel-rapl
+ │   ├──energy_uj
+ │   ├──intel-rapl:0:0
+ │   │   ├──constraint_0_name
+ │   │   ├──constraint_0_power_limit_uw
+ │   │   ├──constraint_0_time_window_us
+ │   │   ├──constraint_1_name
+ │   │   ├──constraint_1_power_limit_uw
+ │   │   ├──constraint_1_time_window_us
+ │   │   ├──device -> ../../intel-rapl:0
+ │   │   ├──energy_uj
+ │   │   ├──max_energy_range_uj
+ │   │   ├──name
+ │   │   ├──enabled
+ │   │   ├──power
+ │   │   │   ├──async
+ │   │   │   []
+ │   │   ├──subsystem -> ../../../../../../class/power_cap
+ │   │   └──uevent
+ │   ├──intel-rapl:0:1
+ │   │   ├──constraint_0_name
+ │   │   ├──constraint_0_power_limit_uw
+ │   │   ├──constraint_0_time_window_us
+ │   │   ├──constraint_1_name
+ │   │   ├──constraint_1_power_limit_uw
+ │   │   ├──constraint_1_time_window_us
+ │   │   ├──device -> ../../intel-rapl:0
+ │   │   ├──energy_uj
+ │   │   ├──max_energy_range_uj
+ │   │   ├──name
+ │   │   ├──enabled
+ │   │   ├──power
+ │   │   │   ├──async
+ │   │   │   []
+ │   │   ├──subsystem -> ../../../../../../class/power_cap
+ │   │   └──uevent
+ │   ├──max_energy_range_uj
+ │   ├──max_power_range_uw
+ │   ├──name
+ │   ├──enabled
+ │   ├──power
+ │   │   ├──async
+ │   │   []
+ │   ├──subsystem -> ../../../../../class/power_cap
+ │   ├──enabled
+ │   ├──uevent
+ ├──intel-rapl:1
+ │   ├──constraint_0_name
+ │   ├──constraint_0_power_limit_uw
+ │   ├──constraint_0_time_window_us
+ │   ├──constraint_1_name
+ │   ├──constraint_1_power_limit_uw
+ │   ├──constraint_1_time_window_us
+ │   ├──device -> ../../intel-rapl
+ │   ├──energy_uj
+ │   ├──intel-rapl:1:0
+ │   │   ├──constraint_0_name
+ │   │   ├──constraint_0_power_limit_uw
+ │   │   ├──constraint_0_time_window_us
+ │   │   ├──constraint_1_name
+ │   │   ├──constraint_1_power_limit_uw
+ │   │   ├──constraint_1_time_window_us
+ │   │   ├──device -> ../../intel-rapl:1
+ │   │   ├──energy_uj
+ │   │   ├──max_energy_range_uj
+ │   │   ├──name
+ │   │   ├──enabled
+ │   │   ├──power
+ │   │   │   ├──async
+ │   │   │   []
+ │   │   ├──subsystem -> ../../../../../../class/power_cap
+ │   │   └──uevent
+ │   ├──intel-rapl:1:1
+ │   │   ├──constraint_0_name
+ │   │   ├──constraint_0_power_limit_uw
+ │   │   ├──constraint_0_time_window_us
+ │   │   ├──constraint_1_name
+ │   │   ├──constraint_1_power_limit_uw
+ │   │   ├──constraint_1_time_window_us
+ │   │   ├──device -> ../../intel-rapl:1
+ │   │   ├──energy_uj
+ │   │   ├──max_energy_range_uj
+ │   │   ├──name
+ │   │   ├──enabled
+ │   │   ├──power
+ │   │   │   ├──async
+ │   │   │   []
+ │   │   ├──subsystem -> ../../../../../../class/power_cap
+ │   │   └──uevent
+ │   ├──max_energy_range_uj
+ │   ├──max_power_range_uw
+ │   ├──name
+ │   ├──enabled
+ │   ├──power
+ │   │   ├──async
+ │   │   []
+ │   ├──subsystem -> ../../../../../class/power_cap
+ │   ├──uevent
+ ├──power
+ │   ├──async
+ │   []
+ ├──subsystem -> ../../../../class/power_cap
+ ├──enabled
+ └──uevent
+
+The above example illustrates a case in which the Intel RAPL technology,
+available in Intel® IA-64 and IA-32 Processor Architectures, is used. There is one
+control type called intel-rapl which contains two power zones, intel-rapl:0 and
+intel-rapl:1, representing CPU packages. Each of these power zones contains
+two subzones, intel-rapl:j:0 and intel-rapl:j:1 (j = 0, 1), representing the
+"core" and the "uncore" parts of the given CPU package, respectively. All of
+the zones and subzones contain energy monitoring attributes (energy_uj,
+max_energy_range_uj) and constraint attributes (constraint_*) allowing controls
+to be applied (the constraints in the 'package' power zones apply to the whole
+CPU packages and the subzone constraints only apply to the respective parts of
+the given package individually). Since Intel RAPL doesn't provide instantaneous
+power value, there is no power_uw attribute.
+
+In addition to that, each power zone contains a name attribute, allowing the
+part of the system represented by that zone to be identified.
+For example::
+
+ cat /sys/class/power_cap/intel-rapl/intel-rapl:0/name
+
+package-0
+---------
+
+Depending on different power zones, the Intel RAPL technology allows
+one or multiple constraints like short term, long term and peak power,
+with different time windows to be applied to each power zone.
+All the zones contain attributes representing the constraint names,
+power limits and the sizes of the time windows. Note that time window
+is not applicable to peak power. Here, constraint_j_* attributes
+correspond to the jth constraint (j = 0,1,2).
+
+For example::
+
+ constraint_0_name
+ constraint_0_power_limit_uw
+ constraint_0_time_window_us
+ constraint_1_name
+ constraint_1_power_limit_uw
+ constraint_1_time_window_us
+ constraint_2_name
+ constraint_2_power_limit_uw
+ constraint_2_time_window_us
+
+Power Zone Attributes
+=====================
+
+Monitoring attributes
+---------------------
+
+energy_uj (rw)
+ Current energy counter in micro joules. Write "0" to reset.
+ If the counter can not be reset, then this attribute is read only.
+
+max_energy_range_uj (ro)
+ Range of the above energy counter in micro-joules.
+
+power_uw (ro)
+ Current power in micro watts.
+
+max_power_range_uw (ro)
+ Range of the above power value in micro-watts.
+
+name (ro)
+ Name of this power zone.
+
+It is possible that some domains have both power ranges and energy counter ranges;
+however, only one is mandatory.
+
+Constraints
+-----------
+
+constraint_X_power_limit_uw (rw)
+ Power limit in micro watts, which should be applicable for the
+ time window specified by "constraint_X_time_window_us".
+
+constraint_X_time_window_us (rw)
+ Time window in micro seconds.
+
+constraint_X_name (ro)
+ An optional name of the constraint
+
+constraint_X_max_power_uw(ro)
+ Maximum allowed power in micro watts.
+
+constraint_X_min_power_uw(ro)
+ Minimum allowed power in micro watts.
+
+constraint_X_max_time_window_us(ro)
+ Maximum allowed time window in micro seconds.
+
+constraint_X_min_time_window_us(ro)
+ Minimum allowed time window in micro seconds.
+
+Except power_limit_uw and time_window_us other fields are optional.
+
+Common zone and control type attributes
+---------------------------------------
+
+enabled (rw): Enable/Disable controls at zone level or for all zones using
+a control type.
+
+Power Cap Client Driver Interface
+=================================
+
+The API summary:
+
+Call powercap_register_control_type() to register control type object.
+Call powercap_register_zone() to register a power zone (under a given
+control type), either as a top-level power zone or as a subzone of another
+power zone registered earlier.
+The number of constraints in a power zone and the corresponding callbacks have
+to be defined prior to calling powercap_register_zone() to register that zone.
+
+To Free a power zone call powercap_unregister_zone().
+To free a control type object call powercap_unregister_control_type().
+Detailed API can be generated using kernel-doc on include/linux/powercap.h.