# Power profiling

This article covers important background information about power
profiling, with an emphasis on Intel processors used in desktop and
laptop machines. It serves as a starting point for anybody doing power
profiling for the first time.

## Basic physics concepts

In physics, *[power](https://en.wikipedia.org/wiki/Power_%28physics%29)*
is the rate of doing
*[work](https://en.wikipedia.org/wiki/Work_%28physics%29 "Work (physics)")*.
It is equivalent to an amount of
*[energy](https://en.wikipedia.org/wiki/Energy_%28physics%29 "Energy (physics)")*
consumed per unit time. In SI units, energy is measured in Joules, and
power is measured in Watts, which is equivalent to Joules per second.

Although power is an instantaneous concept, in practice measurements of
it are determined in a non-instantaneous fashion, i.e. by dividing an
energy amount by a non-infinitesimal time period. Strictly speaking,
such a computation gives the *average power* but this is often referred
to as just the *power* when context makes it clear.
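
As a small illustration of the point above, the following sketch computes average power from two energy readings and the time between them. The function name and the sample numbers are made up for this example.

```python
def average_power_watts(energy_start_j, energy_end_j, interval_s):
    """Average power over an interval: energy consumed divided by elapsed time."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (energy_end_j - energy_start_j) / interval_s


# Hypothetical readings: 30 J consumed over 2 s is an average of 15 W.
print(average_power_watts(100.0, 130.0, 2.0))
```

The shorter the interval, the closer the result gets to the instantaneous power, but it is always an average over that interval.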

In the context of computing, a fully-charged mobile device battery (as
found in a laptop or smartphone) holds a certain amount of energy, and
the speed at which that stored energy is depleted depends on the power
consumption of the mobile device. That in turn depends on the software
running on the device. Web browsers are popular applications and can be
power-intensive, and therefore can significantly affect battery life. As
a result, it is worth optimizing (i.e. reducing) the power consumption
caused by Firefox and Firefox OS.
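
To make the link between power consumption and battery life concrete, here is a rough sketch that estimates how long a battery lasts at a constant average power draw. The capacity and power figures are invented for illustration, and real batteries do not drain this linearly.

```python
def battery_life_hours(capacity_wh, average_power_w):
    """Idealized battery life: stored energy (Wh) divided by average power (W)."""
    if average_power_w <= 0:
        raise ValueError("average_power_w must be positive")
    return capacity_wh / average_power_w


# Hypothetical 50 Wh laptop battery.
print(battery_life_hours(50.0, 10.0))   # 5.0 hours at 10 W
print(battery_life_hours(50.0, 12.5))   # 4.0 hours at 12.5 W
```

In this idealized model, an extra 2.5 W of average draw costs a full hour of battery life.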

## Intel processor basics

### Processor layout

The following diagram (from the [Intel Power Governor
documentation](https://software.intel.com/en-us/articles/intel-power-governor))
shows how machines using recent Intel processors are constructed.

![](img/power-planes.jpg)

The important points are as follows.

-   The processor has one or more *packages*. These are part of the
    actual processor that you buy from Intel. Client processors (e.g.
    Core i3/i5/i7) have one package. Server processors (e.g. Xeon)
    typically have two or more packages.
-   Each package contains multiple *cores*.
-   Each core typically has
    [hyper-threading](https://en.wikipedia.org/wiki/Hyper-threading),
    which means it contains two logical *CPUs*.
-   The part of the package outside the cores is called the [*uncore* or
    *system agent*](https://en.wikipedia.org/wiki/Uncore). It includes
    various components including the L3 cache, memory controller, and,
    for processors that have one, the integrated GPU.
-   RAM is separate from the processor.
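
The package/core/logical CPU hierarchy described above can be sketched as a small data structure. The class and field names are hypothetical and the example machine is made up; the point is only that the number of logical CPUs is the product of the three levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorLayout:
    packages: int
    cores_per_package: int
    threads_per_core: int  # 2 with hyper-threading, otherwise 1

    def logical_cpus(self) -> int:
        """Logical CPUs = packages x cores per package x threads per core."""
        return self.packages * self.cores_per_package * self.threads_per_core


# A hypothetical client processor: one package, four hyper-threaded cores.
client = ProcessorLayout(packages=1, cores_per_package=4, threads_per_core=2)
print(client.logical_cpus())  # 8
```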

### C-states

Intel processors have aggressive power-saving features. The first is the
ability to switch frequently (thousands of times per second) between
active and idle states, and there are actually several different kinds
of idle states. These different states are called
*[C-states](https://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface#Processor_states).*
C0 is the active/busy state, where instructions are being executed. The
other states have higher numbers and reflect increasingly deeper idle
states. The deeper an idle state is, the less power it uses, but the
longer it takes to wake up from.

Note: the [ACPI
standard](https://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface)
specifies four states, C0, C1, C2 and C3. Intel maps these to
processor-specific states such as C0, C1, C2, C6 and C7, and many tools
report C-states using the latter names. The exact relationship is
confusing, and chapter 13 of the [Intel optimization
manual](http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html)
has more details. The important thing is that C0 is always the active
state, and for the idle states a higher number always means less power
consumption.

The other thing to note about C-states is that they apply both to cores
and the entire package --- i.e. if all cores are idle then the entire
package can also become idle, which reduces power consumption even
further.

The fraction of time that a package or core spends in an idle C-state is
called the *C-state residency*. This is a misleading term --- the active
state, C0, is also a C-state --- but one that is nonetheless common.
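
As a sketch of how C-state residency is derived, assume a tool has reported how much time a core spent in each C-state during a measurement interval. The state names follow the text, but the dictionary layout and the numbers are assumptions made for this example.

```python
def c_state_fractions(time_in_state_ms):
    """Fraction of the interval spent in each C-state."""
    total = sum(time_in_state_ms.values())
    if total <= 0:
        raise ValueError("total time must be positive")
    return {state: t / total for state, t in time_in_state_ms.items()}


def idle_residency(time_in_state_ms):
    """C-state residency: fraction of time spent in any idle state (not C0)."""
    total = sum(time_in_state_ms.values())
    if total <= 0:
        raise ValueError("total time must be positive")
    active = time_in_state_ms.get("C0", 0)
    return (total - active) / total


# Hypothetical one-second sample for a single core.
sample = {"C0": 250, "C1": 50, "C6": 700}
print(c_state_fractions(sample))
print(idle_residency(sample))  # 0.75
```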

Intel processors have model-specific registers (MSRs) containing
measurements of how much time is spent in different C-states, and tools
such as [powermetrics](powermetrics.md)
(Mac), powertop and
[turbostat](turbostat.md) (Linux) can
expose this information.

A *wakeup* occurs when a core or package transitions from an idle state
to the active state. This happens when the OS schedules a process to run
due to some kind of event. Common causes of wakeups include scheduled
timers going off and blocked I/O system calls receiving data.
Maintaining C-state residency is crucial to keep power consumption low,
and so reducing wakeup frequency is one of the best ways to reduce power
consumption.
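
Wakeup frequency is usually reported as a rate. A minimal sketch, assuming a tool gives a cumulative wakeup count at the start and end of a measurement interval (the counter values here are invented):

```python
def wakeups_per_second(count_start, count_end, interval_s):
    """Wakeup frequency: wakeups observed during the interval, per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (count_end - count_start) / interval_s


# Hypothetical counters: 750 wakeups in 5 s is 150 wakeups per second.
print(wakeups_per_second(1200, 1950, 5.0))
```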

One consequence of the existence of C-states is that observations made
during power profiling --- even more than with other kinds of profiling
--- can disturb what is being observed. For example, the Gecko Profiler
takes samples at 1000 Hz using a timer. Each of these samples can trigger
a wakeup, which consumes power and obscures Firefox's natural wakeup
patterns. For this reason, integrating power measurements into the Gecko
Profiler is unlikely to be useful, and other power profiling tools
typically use much lower sampling rates (e.g. 1 Hz).
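
The scale of this disturbance can be estimated with a back-of-the-envelope sketch. It assumes the worst case, in which every sample causes its own wakeup and none coincide with wakeups the program would have had anyway. The natural wakeup rate of 50 per second is a made-up figure.

```python
def sampler_share_of_wakeups(natural_hz, sampling_hz):
    """Worst-case fraction of observed wakeups caused by the sampling timer."""
    observed = natural_hz + sampling_hz
    if observed <= 0:
        raise ValueError("total wakeup rate must be positive")
    return sampling_hz / observed


natural = 50  # hypothetical wakeups per second without any profiler
print(sampler_share_of_wakeups(natural, 1000))  # roughly 0.95
print(sampler_share_of_wakeups(natural, 1))     # roughly 0.02
```

Under these assumptions a 1000 Hz sampler would account for most of the wakeups being measured, while a 1 Hz sampler is close to negligible.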

### P-states

Intel processors also support multiple *P-states*. P0 is the state where
the processor is operating at maximum frequency and voltage, and
higher-numbered P-states operate at a lower frequency and voltage to
reduce power consumption. Processors can have dozens of P-states, but
the transitions are controlled by the hardware and OS and so P-states
are of less interest to application developers than C-states.

## Power and power-related measurements

There are several kinds of power and power-related measurements. Some
are global (whole-system) and some are per-process. The following
sections list them from best to worst.

### Power measurements

The best measurements are measured in joules and/or watts, and are taken
by measuring the actual hardware in some fashion. These are global
(whole-system) measurements that are affected by running programs but
also by other things such as (for laptops) how bright the monitor
backlight is.

-   Devices such as ammeters give the best results, but these can be
    expensive and difficult to set up.
-   A cruder technique that works with mobile machines and devices is to
    run a program for a long time and simply time how long it takes for
    the battery to drain. The long measurement times required are a
    disadvantage, though.

### Power estimates

The next best measurements come from recent (Sandy Bridge and later)
Intel processors that implement the *RAPL* (Running Average Power Limit)
interface that provides MSRs containing energy consumption estimates for
up to four *power planes* or *domains* of a machine, as seen in the
diagram above.

-   PKG: The entire package.
    -   PP0: The cores.
    -   PP1: An uncore device, usually the GPU (not available on all
        processor models).
-   DRAM: main memory (not available on all processor models).

The following relationship holds: PP0 + PP1 \<= PKG. DRAM is independent
of the other three domains.
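
Because PP0 and PP1 are both contained in PKG, whatever is left over can be attributed to the rest of the package. The sketch below checks the relationship and computes that remainder. The function name and the wattages are hypothetical.

```python
def rest_of_package_w(pkg_w, pp0_w, pp1_w=0.0):
    """Power in the package not accounted for by PP0 or PP1.

    Raises ValueError if the readings violate PP0 + PP1 <= PKG.
    """
    if pp0_w + pp1_w > pkg_w:
        raise ValueError("PP0 + PP1 must not exceed PKG")
    return pkg_w - pp0_w - pp1_w


# Hypothetical estimates in watts: 10.0 for PKG, 6.5 for PP0, 1.5 for PP1.
print(rest_of_package_w(10.0, 6.5, 1.5))  # 2.0
```

DRAM is deliberately left out, since it is independent of the other three domains.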

These values are computed using a power model that uses
processor-internal counts as inputs, and they have been
[verified](http://www.computer.org/csdl/proceedings/ispass/2013/5776/00/06557170.pdf)
as being fairly accurate. They are also updated frequently, at
approximately 1,000 Hz, though the variability in their update latency
means that they are probably only accurate at lower frequencies, e.g. up
to 20 Hz or so. See section 14.9 of Volume 3 of the [Intel Software
Developer's
Manual](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
for more details about RAPL.
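
RAPL exposes cumulative energy counters, so a power figure comes from the difference between two readings divided by the time between them. The sketch below assumes the readings are already available as integers in microjoules, together with the counter's maximum range (on Linux, for instance, the powercap sysfs interface reports such values). How the counter wraps exactly is an assumption here, and the function names are made up.

```python
def energy_delta_uj(previous_uj, current_uj, max_range_uj):
    """Energy used between two cumulative readings, allowing for one wraparound."""
    if current_uj >= previous_uj:
        return current_uj - previous_uj
    # Assumption: the counter wrapped once and restarted from zero.
    return current_uj + (max_range_uj - previous_uj)


def average_power_w(previous_uj, current_uj, max_range_uj, interval_s):
    """Average power in watts over the sampling interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return energy_delta_uj(previous_uj, current_uj, max_range_uj) / 1_000_000 / interval_s


# Hypothetical readings taken 0.5 s apart: 5 J consumed, so 10 W on average.
print(average_power_w(1_000_000, 6_000_000, 2**32, 0.5))
# A wrapped counter: 100 uJ before the wrap point plus 100 uJ after it.
print(energy_delta_uj(900, 100, 1000))  # 200
```

Given the update-latency variability mentioned above, sampling intervals of 50 ms or longer (20 Hz or less) are the safer choice.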

Tools that can take RAPL readings include the following.

-   `tools/power/rapl`: all planes; Linux and Mac.
-   [Intel Power
    Gadget](intel_power_gadget.md): PKG and
    PP0 planes; Windows, Mac and Linux.
-   [powermetrics](powermetrics.md): PKG
    plane; Mac.
-   [perf](perf.md): all planes; Linux.
-   [turbostat](turbostat.md): PKG, PP0 and
    PP1 planes; Linux.

Of these,
[tools/power/rapl](tools_power_rapl.md) is
generally the easiest and best to use because it reads all power planes,
it's a command line utility, and it doesn't measure anything else.

### Proxy measurements

The next best measurements are proxy measurements, i.e. measurements of
things that affect power consumption such as CPU activity, GPU activity,
wakeup frequency, C-state residency, disk activity, and network
activity. Some of these are measured on a global basis, and some can be
measured on a per-process basis. Some can also be measured via
instrumentation within Firefox itself.

The correlation between each proxy measure and power consumption is hard
to know and can vary greatly. When used carefully, however, they can
still be useful. This is because they can often be measured in a more
fine-grained fashion than power measurements and estimates, which is
vital for gaining insight into how a program can reduce power
consumption.

Most profiling tools provide at least some proxy measurements.

### Hybrid proxy measurements

These are combinations of proxy measurements. The combinations are
semi-arbitrary, they amplify the unreliability of proxy measurements,
and unlike non-hybrid proxy measurements, they don't have a clear
physical meaning. Avoid them.

The most notable example of a hybrid proxy measurement is the ["Energy
Impact" used by OS X's Activity
Monitor](activity_monitor_and_top.md#What-does-Energy-Impact-measure).

## Ways to use power-related measurements

### Low-context measurements

Most power-related measurements are global or per-process. Such
low-context measurements are typically good for understanding *if* power
consumption is good or bad, but in the latter case they often don't
provide much insight into why the problem is occurring, which part of
the code is at fault, or how it can be fixed. Nonetheless, they can
still help improve understanding of a problem by using *differential
profiling*.

-   Compare browsers to see if Firefox is doing better or worse than
    another browser on a particular workload.
-   Compare different versions of Firefox to see if Firefox has improved
    or worsened over time on a particular workload. This can identify
    specific changes that caused regressions, for example.
-   Compare different configurations of Firefox to see if a particular
    feature is affecting things.
-   Compare different workloads. This can be particularly useful if the
    workloads only vary slightly. For example, it can be useful to
    gradually remove elements from a web page and see how the
    power-related measurements change. Even just switching a tab from
    the foreground to the background can make a difference.
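
A differential comparison can be as simple as taking several readings per configuration and comparing their means. The sketch below assumes the readings are package power estimates in watts, gathered by running the same workload repeatedly. All numbers are invented.

```python
from statistics import mean


def compare_configurations(baseline_w, candidate_w):
    """Return (baseline mean, candidate mean, percent change) for two runs."""
    base = mean(baseline_w)
    cand = mean(candidate_w)
    percent_change = (cand - base) / base * 100
    return base, cand, percent_change


# Hypothetical readings: a feature turned off (baseline) vs. on (candidate).
feature_off = [10.0, 10.5, 9.5]
feature_on = [12.0, 12.5, 11.5]
print(compare_configurations(feature_off, feature_on))  # (10.0, 12.0, 20.0)
```

A handful of readings is only a starting point; noisy workloads need more runs before a difference can be trusted.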

### High-context measurements

A few power-related measurements can be obtained in a high-context
fashion, e.g. with stack traces that clearly pinpoint specific parts of
the code as being responsible.

-   Standard performance profiling tools that measure CPU usage or
    proxies of CPU usage (such as instruction counts) typically provide
    high-context measurements. This is useful because high CPU usage
    typically causes high power consumption.
-   Some tools can provide high-context wakeup measurements:
    [dtrace](dtrace.md) (on Mac) and
    [perf](perf.md) (on Linux).
-   Source-level instrumentation, such as [TimerFirings
    logging](timerfirings_logging.md), can
    identify which timers are firing frequently.

## Power profiling how-to

This section aims to put together all the above information and provide
a set of strategies for finding, diagnosing and fixing cases of high
power consumption.

-   First of all, all measurements are best done on a quiet machine that
    is running little other than the program of interest. Global
    measurements in particular can be completely skewed and unreliable
    if this is not the case.
-   Find or confirm a test case where Firefox's power consumption is
    high. "High" can most easily be gauged by comparing against other
    browsers. Use power measurements or estimates (e.g. via
    [tools/power/rapl](tools_power_rapl.md),
    or `mach power` on Mac, or [Intel Power
    Gadget](intel_power_gadget.md) on
    Windows) for the comparisons. Avoid lower-quality measurements,
    especially Activity Monitor's "Energy Impact".
-   Try using differential profiling to narrow down the cause.
    -   Try turning hardware acceleration on or off; e10s on or off;
        Flash on or off.
    -   Try putting the relevant tab in the foreground vs. in the
        background.
    -   If the problem manifests on a particular website, try saving a
        local copy of the site and then manually removing HTML elements
        to see if a particular page feature is causing the problem.
-   Many power problems are caused by either high CPU usage or high
    wakeup frequency. Use one of the low-context tools to determine if
    this is the case (e.g. on Mac use
    [powermetrics](powermetrics.md)). If
    so, follow that up by using a tool that gives high-context
    measurements, which hopefully will identify the cause of the
    problem.
    -   For high CPU usage, many profilers can be used: Firefox's dev
        tools profiler, the Gecko Profiler, or generic performance
        profilers.
    -   For high wakeup counts, use
        [dtrace](dtrace.md) or
        [perf](perf.md) or [TimerFirings logging](timerfirings_logging.md).
-   On Mac workloads that use graphics, Activity Monitor's "Energy"
    tab can tell you if the high-performance GPU is being used, which
    uses more power than the integrated GPU.
-   If neither CPU usage nor wakeup frequency identifies the problem,
    more ingenuity may be needed. Looking at other measurements (C-state
    residency, GPU usage, etc.) may be helpful.
-   Animations are sometimes the cause of high power consumption. The
    [animation
    inspector](/devtools-user/page_inspector/how_to/work_with_animations/index.rst#animation-inspector)
    in the Firefox Devtools can identify them. Alternatively, [here is
    an
    explanation](https://bugzilla.mozilla.org/show_bug.cgi?id=1190721#c10)
    of how one developer diagnosed two animation-related problems the
    hard way (which required genuine platform expertise).
-   The approximate cause of power problems often isn't that hard to
    find. Fixing them is often the hard part. Good luck.
-   If you do fix a problem by improving a proxy measurement, you should
    verify that it also improves a power measurement or estimate. That
    way you know the fix had a genuine effect.
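
The final check in the list above can be written down as a simple rule: a fix counts as confirmed only if both the proxy measurement and the power measurement improved. The function, the threshold parameter and the numbers are hypothetical. The threshold is a crude guard against measurement noise.

```python
def fix_confirmed(proxy_before, proxy_after,
                  power_before_w, power_after_w,
                  min_power_saving_w=0.0):
    """True if the proxy improved and power dropped by more than the threshold."""
    proxy_improved = proxy_after < proxy_before
    power_saving = power_before_w - power_after_w
    return proxy_improved and power_saving > min_power_saving_w


# Wakeups per second fell from 400 to 120, and package power fell too.
print(fix_confirmed(400, 120, 9.0, 7.5))  # True
# Wakeups fell, but power did not: the fix had no genuine effect.
print(fix_confirmed(400, 120, 9.0, 9.1))  # False
```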

## Further reading

Chapter 13 of the [Intel optimization
manual](http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html)
has many details about optimizing for power consumption. Section 13.5
("Tuning Software for Intelligent Power Consumption") in particular is
worth reading.