The Jprof Profiler

jim_nance@yahoo.com

Recent (4/2011) updates Randell Jesup (see bugzilla for contact info)


Introduction | Operation | Setup | Usage | Interpretation

Introduction

Jprof is a profiling tool. I am writing it because I need to find out where mozilla is spending its time, and there do not seem to be any profilers for Linux that can handle threads and/or shared libraries. This code is based heavily on Kipp Hickman's leaky.

Operation

Jprof operates by installing a timer which periodically interrupts mozilla. When this timer goes off, the jprof code inside mozilla walks the function call stack to determine which code was executing and saves the results into the jprof-log and jprof-map files. By collecting a large number of these call stacks, it is possible to deduce where mozilla is spending its time.

Setup

Configure your mozilla with jprof support by adding --enable-jprof to your configure options (eg adding ac_add_options --enable-jprof to your .mozconfig) and making sure that you do not have the --enable-strip configure option set -- jprof needs symbols to operate. On many architectures with GCC, you'll need to add --enable-optimize="-O3 -fno-omit-frame-pointer" or the equivalent to ensure frame pointer generation in the compiler you're using.

Finally, build mozilla with your new configuration. Now you can run jprof.

Usage

 jprof [-v] [-t] [-e exclude] [-i include] [-s stackdepth] [--last] [--all] [--start n [--end m]] [--output-dir dir] prog log [log2 ...]
Options: The behavior of jprof is determined by the value of the JPROF_FLAGS environment variable. This environment variable can be composed of several substrings which have the following meanings:

Starting and stopping jprof from JavaScript

A build with jprof enabled adds four functions to the Window object:

JProfStartProfiling() and JProfStopProfiling(): When used with JP_DEFER, these allow one to start and stop the timer just around whatever critical section is being profiled.

JProfClearCircular() and JProfSaveCircular(): These clear the circular buffer and save the buffer (without stopping), respectively.

Examples of JPROF_FLAGS usage

Pausing profiles

jprof can be paused at any time by sending a SIGUSR1 to mozilla (kill -USR1). This will cause the timer signals to stop and jprof-map to be written, but it will not close jprof-log. Combining SIGUSR1 with the JP_DEFER option allows profiling of one sequence of actions by starting the timer right before starting the actions and stopping the timer right afterward.

After a SIGUSR1, sending another timer signal (SIGPROF, SIGALRM, or SIGPOLL (aka SIGIO), depending on the mode) can be used to continue writing data to the same output.

SIGUSR2 will cause the circular buffer to be cleared, if it's in use. This is useful right before running a test when you're using a large, continuous circular buffer, or programmatically at the start of an action which might take too long (JProfClearCircular()).

Looking at the results

Now that we have jprof-log and jprof-map files, we can use the jprof executable is used to turn them into readable output. To do this jprof needs the name of the mozilla binary and the log file. It deduces the name of the map file:
  ./jprof /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log > tmp.html
This will generate the file tmp.html which you should view in a web browser.
  ./jprof --output-dir=/tmp /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log*
This will generate a set of files in /tmp for each process.

Interpretation

The Jprof output is split into a flat portion and a hierarchical portion. There are links to each section at the top of the page. It is typically easier to analyze the profile by starting with the flat output and following the links contained in the flat output up to the hierarchical output.

Flat output

The flat portion of the profile indicates which functions were executing when the timer was going off. It is displayed as a list of functions names on the right and the number of times that function was interrupted on the left. The list is sorted by decreasing interrupt count. For example:
Total hit count: 151603
Count %Total  Function Name

8806   5.8     __libc_poll
2254   1.5     __i686.get_pc_thunk.bx
2053   1.4     _int_malloc
1777   1.2     ComputedStyle::GetStyleData(nsStyleStructID)
1600   1.1     __libc_malloc
1552   1.0     nsCOMPtr_base::~nsCOMPtr_base()
This shows that of the 151603 times the timer fired, 1777 (1.2% of the total) were inside ComputedStyle::GetStyleData() and 1552 (1.0% of the total) were in the nsCOMPtr_base destructor.

In general, the functions with the highest count are the functions which are taking the most time.

The function names are linked to the entry for that function in the hierarchical profile, which is described in the next section.

Hierarchical output

The hierarchical output is divided up into sections, with each section corresponding to one function. A typical section looks something like this:
 index  Count         Hits      Function Name
                           545 (46.4%) nsBlockFrame::ReflowInlineFrames(nsBlockReflowState&, nsLineList_iterator, int*)
                           100 (8.5%)  nsBlockFrame::ReflowDirtyLines(nsBlockReflowState&)
 72870      4 (0.3%)       645 (54.9%) nsBlockFrame::DoReflowInlineFrames(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFlowAreaRect&, int&, nsFloatManager::SavedState*, int*, LineReflowStatus*, int)
                           545 (46.4%) nsBlockFrame::ReflowInlineFrame(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsIFrame*, LineReflowStatus*)
                            83 (7.1%)  nsBlockFrame::PlaceLine(nsBlockReflowState&, nsLineLayout&, nsLineList_iterator, nsFloatManager::SavedState*, nsRect&, int&, int*)
                             9 (0.8%)  nsLineLayout::BeginLineReflow(int, int, int, int, int, int)
                             1 (0.1%)  nsTextFrame::GetType() const
                             1 (0.1%)  nsLineLayout::RelativePositionFrames(nsOverflowAreas&)
                             1 (0.1%)  __i686.get_pc_thunk.bx
                             1 (0.1%)  PL_ArenaAllocate
The information this block tells us is: The rest of this section explains how to read this information off from the jprof output.

This block corresponds to the function nsBlockFrame::DoReflowInlineFrames, which is therefore bolded and not a link. The name of this function is preceded by five numbers which have the following meaning. The number on the left (72870) is the index number, and is not important. The next number (4) and the percentage following (0.3%) are the number of times this function was interrupted by the timer and the percentage of the total hits that is. The last number pair ("645 (54.9%)") are the number of times this function was in the call stack when the timer went off. That is, the timer went off while we were in code that was ultimately called from nsBlockFrame::DoReflowInlineFrames.

For our example we can see that our function was in the call stack for 645 interrupt ticks, but we were only the function that was running when the interrupt arrived 4 times.

The functions listed above the line for nsBlockFrame::DoReflowInlineFrames are its callers. The numbers to the left of these function names are the numbers of times these functions were in the call stack as callers of nsBlockFrame::DoReflowInlineFrames. In our example, we were called 545 times by nsBlockFrame::ReflowInlineFrames and 100 times by nsBlockFrame::ReflowDirtyLines.

The functions listed below the line for nsBlockFrame::DoReflowInlineFrames are its callees. The numbers to the left of the function names are the numbers of times these functions were in the callstack as callees of nsBlockFrame::DoReflowInlineFrames and the corresponding percentages. In our example, of the 645 profiler hits under nsBlockFrame::DoReflowInlineFrames 545 were under nsBlockFrame::ReflowInlineFrame, 83 were under nsBlockFrame::PlaceLine, and so forth.

NOTE: If there are loops of execution or recursion, the numbers will not add up and percentages can exceed 100%. If a function directly calls itself "(self)" will be appended to the line, but indirect recursion will not be marked.

Bugs

The current build of Jprof has only been tested under Ubuntu 8.04 LTS, but should work under any fairly modern linux distribution using GCC/GLIBC. Please update this document with any known compatibilities/incompatibilities.

If you get an error:

Inconsistency detected by ld.so: dl-open.c: 260: dl_open_worker: Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!

that means you've hit a timing hole in the version of glibc you're running. See Redhat bug 4578.