summaryrefslogtreecommitdiffstats
path: root/docs/performance/power_profiling_overview.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/performance/power_profiling_overview.md')
-rw-r--r--docs/performance/power_profiling_overview.md326
1 files changed, 326 insertions, 0 deletions
diff --git a/docs/performance/power_profiling_overview.md b/docs/performance/power_profiling_overview.md
new file mode 100644
index 0000000000..bb8f511fe2
--- /dev/null
+++ b/docs/performance/power_profiling_overview.md
@@ -0,0 +1,326 @@
+# Power profiling
+
+This article covers important background information about power
+profiling, with an emphasis on Intel processors used in desktop and
+laptop machines. It serves as a starting point for anybody doing power
+profiling for the first time.
+
+## Basic physics concepts
+
+In physics, *[power](https://en.wikipedia.org/wiki/Power_%28physics%29)*
+is the rate of doing
+*[work](https://en.wikipedia.org/wiki/Work_%28physics%29 "Work (physics)")*.
+It is equivalent to an amount of
+*[energy](https://en.wikipedia.org/wiki/Energy_%28physics%29 "Energy (physics)"){.mw-redirect}*
+consumed per unit time. In SI units, energy is measured in Joules, and
+power is measured in Watts, which is equivalent to Joules per second.
+
+Although power is an instantaneous concept, in practice measurements of
+it are determined in a non-instantaneous fashion, i.e. by dividing an
+energy amount by a non-infinitesimal time period. Strictly speaking,
+such a computation gives the *average power* but this is often referred
+to as just the *power* when context makes it clear.
+
+In the context of computing, a fully-charged mobile device battery (as
+found in a laptop or smartphone) holds a certain amount of energy, and
+the speed at which that stored energy is depleted depends on the power
+consumption of the mobile device. That in turn depends on the software
+running on the device. Web browsers are popular applications and can be
+power-intensive, and therefore can significantly affect battery life. As
+a result, it is worth optimizing (i.e. reducing) the power consumption
+caused by Firefox and Firefox OS.
+
+## Intel processor basics
+
+### Processor layout
+
+The following diagram (from the [Intel Power Governor
+documentation)](https://software.intel.com/en-us/articles/intel-power-governor)
+shows how machines using recent Intel processors are constructed.
+
+![](img/power-planes.jpg)
+
+The important points are as follows.
+
+- The processor has one or more *packages*. These are part of the
+ actual processor that you buy from Intel. Client processors (e.g.
+ Core i3/i5/i7) have one package. Server processors (e.g. Xeon)
+ typically have two or more packages.
+- Each package contains multiple *cores*.
+- Each core typically has
+ [hyper-threading](https://en.wikipedia.org/wiki/Hyper-threading),
+ which means it contains two logical *CPUs*.
+- The part of the package outside the cores is called the [*uncore* or
+ *system agent*](https://en.wikipedia.org/wiki/Uncore)*.* It includes
+ various components including the L3 cache, memory controller, and,
+ for processors that have one, the integrated GPU.
+- RAM is separate from the processor.
+
+### C-states
+
+Intel processors have aggressive power-saving features. The first is the
+ability to switch frequently (thousands of times per second) between
+active and idle states, and there are actually several different kinds
+of idle states. These different states are called
+*[C-states](https://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface#Processor_states).*
+C0 is the active/busy state, where instructions are being executed. The
+other states have higher numbers and reflect increasing deeper idle
+states. The deeper an idle state is, the less power it uses, but the
+longer it takes to wake up from.
+
+Note: the [ACPI
+standard](https://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface)
+specifies four states, C0, C1, C2 and C3. Intel maps these to
+processor-specific states such as C0, C1, C2, C6 and C7. and many tools
+report C-states using the latter names. The exact relationship is
+confusing, and chapter 13 of the [Intel optimization
+manual](http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html)
+has more details. The important thing is that C0 is always the active
+state, and for the idle states a higher number always means less power
+consumption.
+
+The other thing to note about C-states is that they apply both to cores
+and the entire package --- i.e. if all cores are idle then the entire
+package can also become idle, which reduces power consumption even
+further.
+
+The fraction of time that a package or core spends in an idle C-state is
+called the *C-state residency*. This is a misleading term --- the active
+state, C0, is also a C-state --- but one that is nonetheless common.
+
+Intel processors have model-specific registers (MSRs) containing
+measurements of how much time is spent in different C-states, and tools
+such as [powermetrics](powermetrics.md)
+(Mac), powertop and
+[turbostat](turbostat.md) (Linux) can
+expose this information.
+
+A *wakeup* occurs when a core or package transitions from an idle state
+to the active state. This happens when the OS schedules a process to run
+due to some kind of event. Common causes of wakeups include scheduled
+timers going off and blocked I/O system calls receiving data.
+Maintaining C-state residency is crucial to keep power consumption low,
+and so reducing wakeup frequency is one of the best ways to reduce power
+consumption.
+
+One consequence of the existence of C-states is that observations made
+during power profiling --- even more than with other kinds of profiling
+--- can disturb what is being observed. For example, the Gecko Profiler
+takes samples at 1000Hz using a timer. Each of these samples can trigger
+a wakeup, which consumes power and obscures Firefox's natural wakeup
+patterns. For this reason, integrating power measurements into the Gecko
+Profiler is unlikely to be useful, and other power profiling tools
+typically use much lower sampling rates (e.g. 1Hz.)
+
+### P-states
+
+Intel processors also support multiple *P-states*. P0 is the state where
+the processor is operating at maximum frequency and voltage, and
+higher-numbered P-states operate at a lower frequency and voltage to
+reduce power consumption. Processors can have dozens of P-states, but
+the transitions are controlled by the hardware and OS and so P-states
+are of less interest to application developers than C-states.
+
+## Power and power-related measurements
+
+There are several kinds of power and power-related measurements. Some
+are global (whole-system) and some are per-process. The following
+sections list them from best to worst.
+
+### Power measurements
+
+The best measurements are measured in joules and/or watts, and are taken
+by measuring the actual hardware in some fashion. These are global
+(whole-system) measurements that are affected by running programs but
+also by other things such as (for laptops) how bright the monitor
+backlight is.
+
+- Devices such as ammeters give the best results, but these can be
+ expensive and difficult to set up.
+- A cruder technique that works with mobile machines and devices is to
+ run a program for a long time and simply time how long it takes for
+ the battery to drain. The long measurement times required are a
+ disadvantage, though.
+
+### Power estimates
+
+The next best measurements come from recent (Sandy Bridge and later)
+Intel processors that implement the *RAPL* (Running Average Power Limit)
+interface that provides MSRs containing energy consumption estimates for
+up to four *power planes* or *domains* of a machine, as seen in the
+diagram above.
+
+- PKG: The entire package.
+ - PP0: The cores.
+ - PP1: An uncore device, usually the GPU (not available on all
+ processor models.)
+- DRAM: main memory (not available on all processor models.)
+
+The following relationship holds: PP0 + PP1 \<= PKG. DRAM is independent
+of the other three domains.
+
+These values are computed using a power model that uses
+processor-internal counts as inputs, and they have been
+[verified](http://www.computer.org/csdl/proceedings/ispass/2013/5776/00/06557170.pdf)
+as being fairly accurate. They are also updated frequently, at
+approximately 1,000 Hz, though the variability in their update latency
+means that they are probably only accurate at lower frequencies, e.g. up
+to 20 Hz or so. See section 14.9 of Volume 3 of the [Intel Software
+Developer's
+Manual](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
+for more details about RAPL.
+
+Tools that can take RAPL readings include the following.
+
+- `tools/power/rapl`: all planes; Linux and Mac.
+- [Intel Power
+ Gadget](intel_power_gadget.md): PKG and
+ PP0 planes; Windows, Mac and Linux.
+- [powermetrics](powermetrics.md): PKG
+ plane; Mac.
+- [perf](perf.md): all planes; Linux.
+- [turbostat](turbostat.md): PKG, PP0 and
+ PP1 planes; Linux.
+
+Of these,
+[tools/power/rapl](tools_power_rapl.md) is
+generally the easiest and best to use because it reads all power planes,
+it's a command line utility, and it doesn't measure anything else.
+
+### Proxy measurements
+
+The next best measurements are proxy measurements, i.e. measurements of
+things that affect power consumption such as CPU activity, GPU activity,
+wakeup frequency, C-state residency, disk activity, and network
+activity. Some of these are measured on a global basis, and some can be
+measured on a per-process basis. Some can also be measured via
+instrumentation within Firefox itself.
+
+The correlation between each proxy measure and power consumption is hard
+to know and can vary greatly. When used carefully, however, they can
+still be useful. This is because they can often be measured in a more
+fine-grained fashion than power measurements and estimates, which is
+vital for gaining insight into how a program can reduce power
+consumption.
+
+Most profiling tools provide at least some proxy measurements.
+
+### Hybrid proxy measurements
+
+These are combinations of proxy measurements. The combinations are
+semi-arbitrary, they amplify the unreliability of proxy measurements,
+and unlike non-hybrid proxy measurements, they don't have a clear
+physical meaning. Avoid them.
+
+The most notable example of a hybrid proxy measurement is the ["Energy
+Impact" used by OS X's Activity
+[Monitor](activity_monitor_and_top.md#What-does-Energy-Impact-measure).
+
+## Ways to user power-related measurements
+
+### Low-context measurements
+
+Most power-related measurements are global or per-process. Such
+low-context measurements are typically good for understand *if* power
+consumption is good or bad, but in the latter case they often don't
+provide much insight into why the problem is occurring, which part of
+the code is at fault, or how it can be fixed. Nonetheless, they can
+still help improve understanding of a problem by using *differential
+profiling*.
+
+- Compare browsers to see if Firefox is doing better or worse than
+ another browser on a particular workload.
+- Compare different versions of Firefox to see if Firefox has improved
+ or worsened over time on a particular workload. This can identify
+ specific changes that caused regressions, for example.
+- Compare different configurations of Firefox to see if a particular
+ feature is affecting things.
+- Compare different workloads. This can be particularly useful if the
+ workloads only vary slightly. For example, it can be useful to
+ gradually remove elements from a web page and see how the
+ power-related measurements change. Even just switching a tab from
+ the foreground to the background can make a difference.
+
+### High-context measurements
+
+A few power-related measurements can be obtained in a high-context
+fashion, e.g. with stack traces that clearly pinpoint specific parts of
+the code as being responsible.
+
+- Standard performance profiling tools that measure CPU usage or
+ proxies of CPU usage (such as instruction counts) typically provide
+ high-context measurements. This is useful because high CPU usage
+ typically causes high power consumption.
+- Some tools can provide high-context wakeup measurements:
+ [dtrace](dtrace.md) (on Mac) and
+ [perf](perf.md) (on Linux.)
+- Source-level instrumentation, such as [TimerFirings
+ logging](timerfirings_logging.md), can
+ identify which timers are firing frequently.
+
+## Power profiling how-to
+
+This section aims to put together all the above information and provide
+a set of strategies for finding, diagnosing and fixing cases of high
+power consumption.
+
+- First of all, all measurements are best done on a quiet machine that
+ is running little other than the program of interest. Global
+ measurements in particular can be completely skewed and unreliable
+ if this is not the case.
+- Find or confirm a test case where Firefox's power consumption is
+ high. "High" can most easily be gauged by comparing against other
+ browsers. Use power measurements or estimates (e.g. via
+ [tools/power/rapl](tools_power_rapl.md),
+ or `mach power` on Mac, or [Intel Power
+ Gadget](intel_power_gadget.md) on
+ Windows) for the comparisons. Avoid lower-quality measurements,
+ especially Activity Monitor's "Energy Impact".
+- Try using differential profiling to narrow down the cause.
+ - Try turning hardware acceleration on or off; e10s on or off;
+ Flash on or off.
+ - Try putting the relevant tab in the foreground vs. in the
+ background.
+ - If the problem manifests on a particular website, try saving a
+ local copy of the site and then manually removing HTML elements
+ to see if a particular page feature is causing the problem
+- Many power problems are caused by either high CPU usage or high
+ wakeup frequency. Use one of the low-context tools to determine if
+ this is the case (e.g. on Mac use
+ [powermetrics](powermetrics.md).) If
+ so, follow that up by using a tool that gives high-context
+ measurements, which hopefully will identify the cause of the
+ problem.
+ - For high CPU usage, many profilers can be used: Firefox's dev
+ tools profiler, the Gecko Profiler, or generic performance
+ profilers.
+ - For high wakeup counts, use
+ [dtrace](dtrace.md) or
+ [perf](perf.md) or [TimerFirings logging](timerfirings_logging.md).
+- On Mac workloads that use graphics, Activity Monitor's "Energy"
+ tab can tell you if the high-performance GPU is being used, which
+ uses more power than the integrated GPU.
+- If neither CPU usage nor wakeup frequency identifies the problem,
+ more ingenuity may be needed. Looking at other measurements (C-state
+ residency, GPU usage, etc.) may be helpful.
+- Animations are sometimes the cause of high power consumption. The
+ [animation
+ inspector](/devtools-user/page_inspector/how_to/work_with_animations/index.rst#animation-inspector)
+ in the Firefox Devtools can identify them. Alternatively, [here is
+ an
+ explanation](https://bugzilla.mozilla.org/show_bug.cgi?id=1190721#c10)
+ of how one developer diagnosed two animation-related problems the
+ hard way (which required genuine platform expertise).
+- The approximate cause of power problems often isn't that hard to
+ find. Fixing them is often the hard part. Good luck.
+- If you do fix a problem by improving a proxy measurement, you should
+ verify that it also improves a power measurement or estimate. That
+ way you know the fix had a genuine effect.
+
+## Further reading
+
+Chapter 13 of the [Intel optimization
+manual](http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html)
+has many details about optimizing for power consumption. Section 13.5
+("Tuning Software for Intelligent Power Consumption") in particular is
+worth reading.