Diffstat (limited to 'docs/performance/profiling_with_xperf.md')
-rw-r--r-- docs/performance/profiling_with_xperf.md 180
1 files changed, 180 insertions, 0 deletions
diff --git a/docs/performance/profiling_with_xperf.md b/docs/performance/profiling_with_xperf.md
new file mode 100644
index 0000000000..030dae7c68
--- /dev/null
+++ b/docs/performance/profiling_with_xperf.md
@@ -0,0 +1,180 @@
+# Profiling with xperf
+
+Xperf is part of the Microsoft Windows Performance Toolkit, and has
+functionality similar to that of Shark, oprofile, and (for some things)
+dtrace/Instruments. For stack walking, Windows Vista or higher is
+required; I haven't tested it at all on XP.
+
+This page applies to xperf version **4.8.7701 or newer**. To see your
+xperf version, either run '`xperf`' on a command line with no
+arguments, or start '`xperfview`' and look at Help -\> About
+Performance Analyzer. (Note that it's not the first version number in
+the About window; that's the Windows version.)
+
+If you have an older version, you will experience bugs, especially
+around symbol loading for local builds.
+
+## Installation
+
+For all versions, the tools are part of the latest [Windows 7 SDK (SDK
+Version
+7.1)](http://www.microsoft.com/downloads/details.aspx?FamilyID=6b6c21d2-2006-4afa-9702-529fa782d63b&displaylang=en "http://www.microsoft.com/downloads/details.aspx?FamilyID=6b6c21d2-2006-4afa-9702-529fa782d63b&displaylang=en"){.external}.
+Use the web installer to install at least the \"Win32 Development
+Tools\". Once the SDK installs, execute either `wpt_x86.msi` or
+`wpt_x64.msi` in the `Redist/Windows Performance Toolkit` folder of the
+SDK's install location (typically Program Files/Microsoft
+SDKs/Windows/v7.1/Redist/Windows Performance Toolkit) to actually
+install the Windows Performance Toolkit tools.
+
+The toolkit might already have been installed by the Windows SDK; check
+whether C:\\Program Files\\Microsoft Windows Performance Toolkit exists.
+
+For 64-bit Windows 7 or Vista, you'll need to do a registry tweak and
+then restart to enable stack walking:\
+\
+`REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD -f`
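+
+To confirm that the value was written before you restart, you can read
+it back. This is a small sketch that assumes the same key and value
+name as the `REG ADD` command above; it only queries the registry and
+changes nothing:
+
+```bat
+rem Read-only check: the DisablePagingExecutive value should be listed as REG_DWORD 0x1.
+REG QUERY "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" /v DisablePagingExecutive
+```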
+
+## Symbol Server Setup
+
+With the latest versions of the Windows Performance Toolkit, you can
+modify the symbol path directly from within the program via the Trace
+menu. Just make sure you set the symbol paths before enabling \"Load
+Symbols\" and before opening a summary view. You can also modify the
+`_NT_SYMBOL_PATH` and `_NT_SYMCACHE_PATH` environment variables to make
+these changes permanent.
+
+The standard symbol path that includes both Mozilla's and Microsoft's
+symbol server configuration is as follows:
+
+`_NT_SYMCACHE_PATH: C:\symbols`
+
+`_NT_SYMBOL_PATH: srv*c:\symbols*http://msdl.microsoft.com/download/symbols;srv*c:\symbols*http://symbols.mozilla.org/firefox/`
+
+To add symbols **from your own builds**, add
+`C:\path\to\objdir\dist\bin` to `_NT_SYMBOL_PATH`. As with other Windows
+path lists, entries in the symbol path are separated by semicolons (`;`).
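+
+As a sketch of how these pieces fit together, the commands below set
+both variables from a command prompt. They assume `C:\symbols` as the
+cache directory (as in the path above) and keep the placeholder
+`C:\path\to\objdir\dist\bin` for a local build; adjust both to your
+setup. `set` only affects the current prompt and programs started from
+it, so launch xperfview from the same prompt:
+
+```bat
+rem Create the cache directory first if it does not exist yet.
+if not exist C:\symbols mkdir C:\symbols
+
+rem Entries in the symbol path are separated by semicolons.
+set _NT_SYMCACHE_PATH=C:\symbols
+set _NT_SYMBOL_PATH=srv*c:\symbols*http://msdl.microsoft.com/download/symbols;srv*c:\symbols*http://symbols.mozilla.org/firefox/;C:\path\to\objdir\dist\bin
+```
+
+To keep the settings across sessions, `setx` can be used in place of
+`set` (with the value in quotes); it only takes effect in prompts opened
+afterwards.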
+
+Make sure you select the Trace -\> Load Symbols menu option in the
+Windows Performance Analyzer (xperfview).
+
+xperf seems to have a bug in its symbol handling: it is very sensitive
+to when the symbol path is edited. If you change it within the program,
+you'll have to close all summary tables and reopen them for it to pick
+up the new symbol path data.
+
+You'll have to agree to a EULA for the Microsoft symbols \-- if you're
+not prompted for this, then something isn't configured right in your
+symbol path. (Also make sure that the directories named in the symbol
+path exist; if they don't, the failure is silent.)
+
+## Quick Start
+
+By default, all these tools live in C:\\Program Files\\Microsoft
+Windows Performance Toolkit. Either run these commands from there, or
+add the directory to your `PATH`. You will need to use an elevated
+command prompt to start or stop profiling.
+
+Start recording data:
+
+`xperf -on latency -stackwalk profile`
+
+\"Latency\" is a special provider name that turns on a few predefined
+kernel providers; run \"xperf -providers k\" to view a full list of
+providers and groups. You can combine providers, e.g., \"xperf -on
+DiagEasy+FILE_IO\". \"-stackwalk profile\" tells xperf to capture a
+stack for each PROFILE event; you could also do \"-stackwalk
+profile+file_io\" to capture a stack on each CPU profile tick and each
+file I/O completion event.
+
+Stop:
+
+`xperf -d out.etl`
+
+View:
+
+`xperfview out.etl`
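+
+Put together, a whole session might look like the sketch below. It
+assumes the default install directory mentioned above and an elevated
+command prompt; `pause` is only there to hold the recording open while
+you reproduce the problem:
+
+```bat
+rem Make the tools available in this prompt (default install location assumed).
+set PATH=%PATH%;C:\Program Files\Microsoft Windows Performance Toolkit
+
+rem Start recording, capturing a stack on each profile event.
+xperf -on latency -stackwalk profile
+
+rem Reproduce the slow behavior, then press a key to continue.
+pause
+
+rem Stop recording, write the trace, and open it in the viewer.
+xperf -d out.etl
+xperfview out.etl
+```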
+
+The MSDN
+\"[Quickstart](http://msdn.microsoft.com/en-us/library/ff190971%28v=VS.85%29.aspx){.external}\"
+page goes over this in more detail, and also has good explanations of
+how to use xperfview. I'm not going to repeat it here, because I'd be
+using essentially the same screenshots, so go look there.
+
+The 'stack' view will give results similar to Shark.
+
+## Heap Profiling
+
+xperf has good tools for heap allocation profiling, but they have one
+major limitation: heap events are not generated in builds that use
+jemalloc. The stock Windows CRT allocator is horrible about
+fragmentation, and causes memory usage to rise drastically even if only
+a small fraction of that memory is in use. Even so, profiling a build
+without jemalloc is a useful way to track allocations/deallocations.
+
+### Capturing Heap Data
+
+The \"-heap\" option is used to set up heap tracing. Firefox generates
+lots of events, so you may want to play with the
+BufferSize/MinBuffers/MaxBuffers options as well to ensure that you
+don't get dropped events. Also, when recording the stack, I've found
+that a heap trace is often missing module information (I believe this is
+a bug in xperf). It's possible to get around that by doing a
+simultaneous capture of non-heap data.
+
+To start a trace session, launching a new Firefox instance:
+
+`xperf -on base`
+
+`xperf -start heapsession -heap -PidNewProcess "./firefox.exe -P test -no-remote" -stackwalk HeapAlloc+HeapRealloc -BufferSize 512 -MinBuffers 128 -MaxBuffers 512`
+
+To stop a session and merge the resulting files:
+
+`xperf -stop heapsession -d heap.etl`
+
+`xperf -d main.etl`
+
+`xperf -merge main.etl heap.etl result.etl`
+
+\"result.etl\" will contain your merged data; you can delete main.etl
+and heap.etl. Note that it's possible to capture even more data for the
+non-heap profile; for example, you might want to be able to correlate
+heap events with performance data, so you can do
+\"`xperf -on base -stackwalk profile`\".
+
+In the viewer, when summary data is viewed for heap events (Heap
+Allocations Outstanding, etc. all lead to the same summary graphs), 3
+types of allocations are listed \-- AIFI, AIFO, AOFI. This is shorthand
+for \"Allocated Inside, Freed Inside\", \"Allocated Inside, Freed
+Outside\", \"Allocated Outside, Freed Inside\". These refer to the time
+range that was selected for the summary graph; for example, something
+that's in the AOFI category was allocated before the start of the
+selected time range, but the free event happened inside.
+
+## Tips
+
+- In the summary views, the yellow bar can be dragged left and right
+ to change the grouping \-- for example, drag it to the left of the
+ Module column to have grouping happen only by process (stuff that's
+ to the left), so that you get symbols in order of weight, regardless
+ of what module they're in.
+- Dragging the columns around will change grouping in various ways;
+ experiment to get the data that you're looking for. Also experiment
+ with turning columns on and off; removing a column will allow data
+ to be aggregated without considering that column's contributions.
+- Disabling all but one core will make the numbers add up to 100%.
+ This can be done by running 'msconfig' and going to Advanced
+ options from the \"Boot\" tab.
+
+## Building Firefox
+
+To get good data from a Firefox build, it is important to build with the
+following options in your mozconfig:
+
+`export CFLAGS="-Oy-"`
+
+`export CXXFLAGS="-Oy-"`
+
+This disables frame-pointer omission, which lets xperf do a much
+better job unwinding the stack. Traces can be captured fine without this
+option (for example, from nightlies), but the stack information will not
+be useful.
+
+`ac_add_options --enable-debug-symbols`
+
+This gives us symbols.
+
+## For More Information
+
+Microsoft's [documentation for xperf](http://msdn.microsoft.com/en-us/library/ff191077.aspx "http://msdn.microsoft.com/en-us/library/ff191077.aspx")
+is pretty good; there is a lot of depth to this tool, and you should
+look there for more details.