diff options
Diffstat (limited to 'docs/performance/memory')
-rw-r--r-- | docs/performance/memory/DOM_allocation_example.md | 57 | ||||
-rw-r--r-- | docs/performance/memory/about_colon_memory.md | 274 | ||||
-rw-r--r-- | docs/performance/memory/aggregate_view.md | 198 | ||||
-rw-r--r-- | docs/performance/memory/awsy.md | 22 | ||||
-rw-r--r-- | docs/performance/memory/basic_operations.md | 82 | ||||
-rw-r--r-- | docs/performance/memory/bloatview.md | 245 | ||||
-rw-r--r-- | docs/performance/memory/dmd.md | 489 | ||||
-rw-r--r-- | docs/performance/memory/dominators.md | 90 | ||||
-rw-r--r-- | docs/performance/memory/dominators_view.md | 221 | ||||
-rw-r--r-- | docs/performance/memory/gc_and_cc_logs.md | 109 | ||||
-rw-r--r-- | docs/performance/memory/heap_scan_mode.md | 313 | ||||
-rw-r--r-- | docs/performance/memory/leak_gauge.md | 45 | ||||
-rw-r--r-- | docs/performance/memory/leak_hunting_strategies_and_tips.md | 219 | ||||
-rw-r--r-- | docs/performance/memory/memory.md | 64 | ||||
-rw-r--r-- | docs/performance/memory/monster_example.md | 79 | ||||
-rw-r--r-- | docs/performance/memory/refcount_tracing_and_balancing.md | 235 | ||||
-rw-r--r-- | docs/performance/memory/tree_map_view.md | 62 |
17 files changed, 2804 insertions, 0 deletions
diff --git a/docs/performance/memory/DOM_allocation_example.md b/docs/performance/memory/DOM_allocation_example.md new file mode 100644 index 0000000000..db9a1f2c71 --- /dev/null +++ b/docs/performance/memory/DOM_allocation_example.md @@ -0,0 +1,57 @@ +# DOM allocation example + +This article describes a very simple web page that we\'ll use to +illustrate some features of the Memory tool. + +You can try out the site at +<https://mdn.github.io/performance-scenarios/dom-allocs/alloc.html>. + +It just contains a script that creates a large number of DOM nodes: + +```js +var toolbarButtonCount = 20; +var toolbarCount = 200; + +function getRandomInt(min, max) { + return Math.floor(Math.random() * (max - min + 1)) + min; +} + +function createToolbarButton() { + var toolbarButton = document.createElement("span"); + toolbarButton.classList.add("toolbarbutton"); + // stop Spidermonkey from sharing instances + toolbarButton[getRandomInt(0,5000)] = "foo"; + return toolbarButton; +} + +function createToolbar() { + var toolbar = document.createElement("div"); + // stop Spidermonkey from sharing instances + toolbar[getRandomInt(0,5000)] = "foo"; + for (var i = 0; i < toolbarButtonCount; i++) { + var toolbarButton = createToolbarButton(); + toolbar.appendChild(toolbarButton); + } + return toolbar; +} + +function createToolbars() { + var container = document.getElementById("container"); + for (var i = 0; i < toolbarCount; i++) { + var toolbar = createToolbar(); + container.appendChild(toolbar); + } +} + +createToolbars(); +``` + +A simple pseudocode representation of how this code operates looks like +this: + + createToolbars() + -> createToolbar() // called 200 times, creates 1 DIV element each time + -> createToolbarButton() // called 20 times per toolbar, creates 1 SPAN element each time + +In total, then, it creates 200 `HTMLDivElement` objects, and 4000 +`HTMLSpanElement` objects. diff --git a/docs/performance/memory/about_colon_memory.md b/docs/performance/memory/about_colon_memory.md new file mode 100644 index 0000000000..ab9dc81062 --- /dev/null +++ b/docs/performance/memory/about_colon_memory.md @@ -0,0 +1,274 @@ +# about:memory + +about:memory is a special page within Firefox that lets you view, save, +load, and diff detailed measurements of Firefox's memory usage. It also +lets you do other memory-related operations like trigger GC and CC, dump +GC & CC logs, and dump DMD reports. It is present in all builds and does +not require any preparation to be used. + +## How to generate memory reports + +Let's assume that you want to measure Firefox's memory usage. Perhaps +you want to investigate it yourself, or perhaps someone has asked you to +use about:memory to generate "memory reports" so they can investigate +a problem you are having. Follow these steps. + +- At the moment of interest (e.g. once Firefox's memory usage has + gotten high) open a new tab and type "about:memory" into the + address bar and hit "Enter". +- If you are using a communication channel where files can be sent, + such as Bugzilla or email, click on the "Measure and save..." + button. This will open a file dialog that lets you save the memory + reports to a file of your choosing. (The filename will have a + `.json.gz` suffix.) You can then attach or upload the file + appropriately. The recipients will be able to view the contents of + this file within about:memory in their own Firefox instance. +- If you are using a communication channel where only text can be + sent, such as a comment thread on a website, click on the + "Measure..." button. This will cause a tree-like structure to be + generated text within about:memory. This structure is just text, so + you can copy and paste some or all of this text into any kind of + text buffer. (You don't need to take a screenshot.) This text + contains fewer measurements than a memory reports file, but is often + good enough to diagnose problems. Don't click "Measure..." + repeatedly, because that will cause the memory usage of about:memory + itself to rise, due to it discarding and regenerating large numbers + of DOM nodes. + +Note that in both cases the generated data contains privacy-sensitive +details such as the full list of the web pages you have open in other +tabs. If you do not wish to share this information, you can select the +"anonymize" checkbox before clicking on "Measure and save..." or +"Measure...". This will cause the privacy-sensitive data to be +stripped out, but it may also make it harder for others to investigate +the memory usage. + +## Loading memory reports from file + +The easiest way to load memory reports from file is to use the +"Load..." button. You can also use the "Load and diff..." button +to get the difference between two memory report files. + +Single memory report files can also be loaded automatically when +about:memory is loaded by appending a `file` query string, for example: + + about:memory?file=/home/username/reports.json.gz + +This is most useful when loading memory reports files obtained from a +Firefox OS device. + +Memory reports are saved to file as gzipped JSON. These files can be +loaded as is, but they can also be loaded after unzipping. + +## Interpreting memory reports + +Almost everything you see in about:memory has an explanatory tool-tip. +Hover over any button to see a description of what it does. Hover over +any measurement to see a description of what it means. + +### [Measurement basics] + +Most measurements use bytes as their unit, but some are counts or +percentages. + +Most measurements are presented within trees. For example: + + 585 (100.0%) -- preference-service + └──585 (100.0%) -- referent + ├──493 (84.27%) ── strong + └───92 (15.73%) -- weak + ├──92 (15.73%) ── alive + └───0 (00.00%) ── dead + +Leaf nodes represent actual measurements; the value of each internal +node is the sum of all its children. + +The use of trees allows measurements to be broken down into further +categories, sub-categories, sub-sub-categories, etc., to arbitrary +depth, as needed. All the measurements within a single tree are +non-overlapping. + +Tree paths can be written using \'/\' as a separator. For example, +`preference/referent/weak/dead` represents the path to the final leaf +node in the example tree above. + +Sub-trees can be collapsed or expanded by clicking on them. If you find +any particular tree overwhelming, it can be helpful to collapse all the +sub-trees immediately below the root, and then gradually expand the +sub-trees of interest. + +### [Sections] + +Memory reports are displayed on a per-process basis, with one process +per section. Within each process's measurements, there are the +following subsections. + +#### Explicit Allocations + +This section contains a single tree, called "explicit", that measures +all the memory allocated via explicit calls to heap allocation functions +(such as `malloc` and `new`) and to non-heap allocations functions (such +as `mmap` and `VirtualAlloc`). + +Here is an example for a browser session where tabs were open to +cnn.com, techcrunch.com, and arstechnica.com. Various sub-trees have +been expanded and others collapsed for the sake of presentation. + + 191.89 MB (100.0%) -- explicit + ├───63.15 MB (32.91%) -- window-objects + │ ├──24.57 MB (12.80%) -- top(http://edition.cnn.com/, id=8) + │ │ ├──20.18 MB (10.52%) -- active + │ │ │ ├──10.57 MB (05.51%) -- window(http://edition.cnn.com/) + │ │ │ │ ├───4.55 MB (02.37%) ++ js-compartment(http://edition.cnn.com/) + │ │ │ │ ├───2.60 MB (01.36%) ++ layout + │ │ │ │ ├───1.94 MB (01.01%) ── style-sheets + │ │ │ │ └───1.48 MB (00.77%) -- (2 tiny) + │ │ │ │ ├──1.43 MB (00.75%) ++ dom + │ │ │ │ └──0.05 MB (00.02%) ── property-tables + │ │ │ └───9.61 MB (05.01%) ++ (18 tiny) + │ │ └───4.39 MB (02.29%) -- js-zone(0x7f69425b5800) + │ ├──15.75 MB (08.21%) ++ top(http://techcrunch.com/, id=20) + │ ├──12.85 MB (06.69%) ++ top(http://arstechnica.com/, id=14) + │ ├───6.40 MB (03.33%) ++ top(chrome://browser/content/browser.xul, id=3) + │ └───3.59 MB (01.87%) ++ (4 tiny) + ├───45.74 MB (23.84%) ++ js-non-window + ├───33.73 MB (17.58%) ── heap-unclassified + ├───22.51 MB (11.73%) ++ heap-overhead + ├────6.62 MB (03.45%) ++ images + ├────5.82 MB (03.03%) ++ workers/workers(chrome) + ├────5.36 MB (02.80%) ++ (16 tiny) + ├────4.07 MB (02.12%) ++ storage + ├────2.74 MB (01.43%) ++ startup-cache + └────2.16 MB (01.12%) ++ xpconnect + +Some expertise is required to understand the full details here, but +there are various things worth pointing out. + +- This "explicit" value at the root of the tree represents all the + memory allocated via explicit calls to allocation functions. +- The "window-objects" sub-tree represents all JavaScript `window` + objects, which includes the browser tabs and UI windows. For + example, the "top(http://edition.cnn.com/, id=8)" sub-tree + represents the tab open to cnn.com, and + "top(chrome://browser/content/browser.xul, id=3)" represents the + main browser UI window. +- Within each window's measurements are sub-trees for JavaScript + ("js-compartment(...)" and "js-zone(...)"), layout, + style-sheets, the DOM, and other things. +- It's clear that the cnn.com tab is using more memory than the + techcrunch.com tab, which is using more than the arstechnica.com + tab. +- Sub-trees with names like "(2 tiny)" are artificial nodes inserted + to allow insignificant sub-trees to be collapsed by default. If you + select the "verbose" checkbox before measuring, all trees will be + shown fully expanded and no artificial nodes will be inserted. +- The "js-non-window" sub-tree represents JavaScript memory usage + that doesn't come from windows, but from the browser core. +- The "heap-unclassified" value represents heap-allocated memory + that is not measured by any memory reporter. This is typically + 10--20% of "explicit". If it gets higher, it indicates that + additional memory reporters should be added. + [DMD](./dmd.md) + can be used to determine where these memory reporters should be + added. +- There are measurements for other content such as images and workers, + and for browser subsystems such as the startup cache and XPConnect. + +Some add-on memory usage is identified, as the following example shows. + + ├───40,214,384 B (04.17%) -- add-ons + │ ├──21,184,320 B (02.20%) ++ {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d}/js-non-window/zones/zone(0x100496800)/compartment([System Principal], jar:file:///Users/njn/Library/Application%20Support/Firefox/Profiles/puna0zr8.new/extensions/%7Bd10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d%7D.xpi!/bootstrap.js (from: resource://gre/modules/addons/XPIProvider.jsm:4307)) + │ ├──11,583,312 B (01.20%) ++ jid1-xUfzOsOFlzSOXg@jetpack/js-non-window/zones/zone(0x100496800) + │ ├───5,574,608 B (00.58%) -- {59c81df5-4b7a-477b-912d-4e0fdf64e5f2} + │ │ ├──5,529,280 B (00.57%) -- window-objects + │ │ │ ├──4,175,584 B (00.43%) ++ top(chrome://chatzilla/content/chatzilla.xul, id=4293) + │ │ │ └──1,353,696 B (00.14%) ++ top(chrome://chatzilla/content/output-window.html, id=4298) + │ │ └─────45,328 B (00.00%) ++ js-non-window/zones/zone(0x100496800)/compartment([System Principal], file:///Users/njn/Library/Application%20Support/Firefox/Profiles/puna0zr8.new/extensions/%7B59c81df5-4b7a-477b-912d-4e0fdf64e5f2%7D/components/chatzilla-service.js) + │ └───1,872,144 B (00.19%) ++ treestyletab@piro.sakura.ne.jp/js-non-window/zones/zone(0x100496800) + +More things worth pointing out are as follows. + +- Some add-ons are identified by a name, such as Tree Style Tab. + Others are identified only by a hexadecimal identifier. You can look + in about:support to see which add-on a particular identifier belongs + to. For example, `59c81df5-4b7a-477b-912d-4e0fdf64e5f2` is + Chatzilla. +- All JavaScript memory usage for an add-on is measured separately and + shown in this sub-tree. +- For add-ons that use separate windows, such as Chatzilla, the memory + usage of those windows will show up in this sub-tree. +- For add-ons that use XUL overlays, such as AdBlock Plus, the memory + usage of those overlays will not show up in this sub-tree; it will + instead be in the non-add-on sub-trees and won't be identifiable as + being caused by the add-on. + +#### Other Measurements + +This section contains multiple trees, includes many that cross-cut the +measurements in the "explicit" tree. For example, in the "explicit" +tree all DOM and layout measurements are broken down by window by +window, but in "Other Measurements" those measurements are aggregated +into totals for the whole browser, as the following example shows. + + 26.77 MB (100.0%) -- window-objects + ├──14.59 MB (54.52%) -- layout + │ ├───6.22 MB (23.24%) ── style-sets + │ ├───4.00 MB (14.95%) ── pres-shell + │ ├───1.79 MB (06.68%) ── frames + │ ├───0.89 MB (03.33%) ── style-contexts + │ ├───0.62 MB (02.33%) ── rule-nodes + │ ├───0.56 MB (02.10%) ── pres-contexts + │ ├───0.47 MB (01.75%) ── line-boxes + │ └───0.04 MB (00.14%) ── text-runs + ├───6.53 MB (24.39%) ── style-sheets + ├───5.59 MB (20.89%) -- dom + │ ├──3.39 MB (12.66%) ── element-nodes + │ ├──1.56 MB (05.84%) ── text-nodes + │ ├──0.54 MB (02.03%) ── other + │ └──0.10 MB (00.36%) ++ (4 tiny) + └───0.06 MB (00.21%) ── property-tables + +Some of the trees in this section measure things that do not cross-cut +the measurements in the "explicit" tree, such as those in the +"preference-service" example above. + +Finally, at the end of this section are individual measurements, as the +following example shows. + + 0.00 MB ── canvas-2d-pixels + 5.38 MB ── gfx-surface-xlib + 0.00 MB ── gfx-textures + 0.00 MB ── gfx-tiles-waste + 0 ── ghost-windows + 109.22 MB ── heap-allocated + 164 ── heap-chunks + 1.00 MB ── heap-chunksize + 114.51 MB ── heap-committed + 164.00 MB ── heap-mapped + 4.84% ── heap-overhead-ratio + 1 ── host-object-urls + 0.00 MB ── imagelib-surface-cache + 5.27 MB ── js-main-runtime-temporary-peak + 0 ── page-faults-hard + 203,349 ── page-faults-soft + 274.99 MB ── resident + 251.47 MB ── resident-unique + 1,103.64 MB ── vsize + +Some measurements of note are as follows. + +- "resident". Physical memory usage. If you want a single + measurement to summarize memory usage, this is probably the best + one. +- "vsize". Virtual memory usage. This is often much higher than any + other measurement (particularly on Mac). It only really matters on + 32-bit platforms such as Win32. There is also + "vsize-max-contiguous" (not measured on all platforms, and not + shown in this example), which indicates the largest single chunk of + available virtual address space. If this number is low, it's likely + that memory allocations will fail due to lack of virtual address + space quite soon. +- Various graphics-related measurements ("gfx-*"). The measurements + taken vary between platforms. Graphics is often a source of high + memory usage, and so these measurements can be helpful for detecting + such cases. diff --git a/docs/performance/memory/aggregate_view.md b/docs/performance/memory/aggregate_view.md new file mode 100644 index 0000000000..9a4f01e01e --- /dev/null +++ b/docs/performance/memory/aggregate_view.md @@ -0,0 +1,198 @@ +# Aggregate view + +Before Firefox 48, this was the default view of a heap snapshot. After +Firefox 48, the default view is the [Tree map +view](tree_map_view.md), and you can switch to the +Aggregate view using the dropdown labeled \"View:\": + +![](../img/memory-tool-switch-view.png) + +The Aggregate view looks something like this: + +![](../img/memory-tool-aggregate-view.png) + +It presents a breakdown of the heap\'s contents, as a table. There are +three main ways to group the data: + +- Type +- Call Stack +- Inverted Call Stack + +You can switch between them using the dropdown menu labeled \"Group +by:\" located at the top of the panel: + +There\'s also a box labeled \"Filter\" at the top-right of the pane. You +can use this to filter the contents of the snapshot that are displayed, +so you can quickly see, for example, how many objects of a specific +class were allocated. + +## Type + +This is the default view, which looks something like this: + +![](../img/memory-tool-aggregate-view.png) + +It groups the things on the heap into types, including: + +- **JavaScript objects:** such as `Function` or `Array` +- **DOM elements:** such as `HTMLSpanElement` or `Window` +- **Strings:** listed as `"strings"` +- **JavaScript sources:** listed as \"`JSScript"` +- **Internal objects:** such as \"`js::Shape`\". These are prefixed + with `"js::"`. + +Each type gets a row in the table, and rows are ordered by the amount of +memory occupied by objects of that type. For example, in the screenshot +above you can see that JavaScript `Object`s account for most memory, +followed by strings. + +- The \"Total Count\" column shows you the number of objects of each + category that are currently allocated. +- The \"Total Bytes\" column shows you the number of bytes occupied by + objects in each category, and that number as a percentage of the + whole heap size for that tab. + +The screenshots in this section are taken from a snapshot of the +[monster example page](monster_example.md). + +For example, in the screenshot above, you can see that: + +- there are four `Array` objects +- that account for 15% of the total heap. + +Next to the type\'s name, there\'s an icon that contains three stars +arranged in a triangle: + +![](../img/memory-tool-in-group-icon.png) + +Click this to see every instance of that type. For example, the entry +for `Array` tells us that there are four `Array` objects in the +snapshot. If we click the star-triangle, we\'ll see all four `Array` +instances: + +![](../img/memory-tool-in-group.png) + +For each instance, you can see the [retained size and shallow +size](dominators.md#shallow_and_retained_size) of +that instance. In this case, you can see that the first three arrays +have a fairly large shallow size (5% of the total heap usage) and a much +larger retained size (26% of the total). + +On the right-hand side is a pane that just says \"Select an item to view +its retaining paths\". If you select an item, you\'ll see the [Retaining +paths +panel](dominators_view.md#retaining_paths_panel) +for that item: + +![](../img/memory-tool-in-group-retaining-paths.png) + +<iframe width="595" height="325" src="https://www.youtube.com/embed/uLjzrvx_VCg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"></iframe> + + +## Call Stack + +The Call Stack shows you exactly where in your code you are making heap +allocations. + +Because tracing allocations has a runtime cost, it must be explicitly +enabled by checking \"Record call stacks\" *before* you allocate the +memory in the snapshot. + +You\'ll then see a list of all the functions that allocated objects, +ordered by the size of the allocations they made: + +![](../img/memory-tool-call-stack.png) +\ +The first entry says that: + +- 4,832,592 bytes, comprising 93% of the total heap usage, were + allocated in a function at line 35 of \"alloc.js\", **or in + functions called by that function** + +We can use the disclosure triangle to drill down the call tree, to find +the exact place your code made those allocations. + +It\'s easier to explain this with reference to a simple example. For +this we\'ll use the [DOM allocation +example](DOM_allocation_example.md). This page +runs a script that creates a large number of DOM nodes (200 +`HTMLDivElement` objects and 4000 `HTMLSpanElement` objects). + +Let\'s get an allocation trace: + +1. open the Memory tool +2. check \"Record call stacks\" +3. load + <https://mdn.github.io/performance-scenarios/dom-allocs/alloc.html> +4. take a snapshot +5. select \"View/Aggregate\" +6. select \"Group by/Call Stack\" + +<iframe width="595" height="325" src="https://www.youtube.com/embed/DyLulu9eoKY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"></iframe> + +You should see something like this: + +![](../img/memory-tool-call-stack.png) + +This is telling us that 93% of the total heap snapshot was allocated in +functions called from \"alloc.js\", line 35 (our initial +`createToolbars()` call). + +We can use the disclosure arrow to expand the tree to find out exactly +where we\'re allocating memory: + +![](../img/memory-tool-call-stack-expanded.png) + +This is where the \"Bytes\" and \"Count\" columns are useful: they show +allocation size and number of allocations at that exact point. + +So in the example above, we can see that we made 4002 allocations, +accounting for 89% of the total heap, in `createToolbarButton()`, at +[alloc.js line 9, position +23](https://github.com/mdn/performance-scenarios/blob/gh-pages/dom-allocs/scripts/alloc.js#L9): +that is, the exact point where we create the span +elements. + +The file name and line number is a link: if we click it, we go directly +to that line in the debugger: + +<iframe width="595" height="325" src="https://www.youtube.com/embed/zlnJcr1IFyY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"></iframe> + + +## Inverted Call Stack + +The Call Stack view is top-down: it shows allocations that happen at +that point **or points deeper in the call tree**. So it\'s good for +getting an overview of where your program is memory-hungry. However, +this view means you have to drill a long way down to find the exact +place where the allocations are happening. + +The \"Inverted Call Stack\" view helps with that. It gives you the +bottom-up view of the program showing the exact places where allocations +are happening, ranked by the size of allocation at each place. The +disclosure arrow then walks you back up the call tree towards the top +level. + +Let\'s see what the example looks like when we select \"Inverted Call +Stack\": + +![](../img/memory-tool-inverted-call-stack.png) + +Now at the top we can immediately see the `createToolbarButton()` call +accounting for 89% of the heap usage in our page. + +## no stack available + +In the example above you\'ll note that 7% of the heap is marked \"(no +stack available)\". This is because not all heap usage results from your +JavaScript. + +For example: + +- any scripts the page loads occupy heap space +- sometimes an object is allocated when there is no JavaScript on the + stack. For example, DOM Event objects are allocated + before the JavaScript is run and event handlers are called. + +Many real-world pages will have a much higher \"(no stack available)\" +share than 7%. diff --git a/docs/performance/memory/awsy.md b/docs/performance/memory/awsy.md new file mode 100644 index 0000000000..5026f055aa --- /dev/null +++ b/docs/performance/memory/awsy.md @@ -0,0 +1,22 @@ +# Are We Slim Yet (AWSY) + +The Are We Slim Yet project (commonly known as AWSY) for several years +tracked memory usage across builds on the (now defunct) website. +It used the same +infrastructure as +[about:memory](about_colon_memory.md) to measure +memory usage on a predefined snapshot of Alexa top 100 pages known as +tp5. + +Since Firefox transitioned to using multiple processes by default, we +[moved AWSY into the +TaskCluster](https://bugzilla.mozilla.org/show_bug.cgi?id=1272113) +infrastructure. This allowed us to run measurements on all branches and +platforms. The results are posted to +[perfherder](https://treeherder.mozilla.org/perf.html) where we can +track regressions automatically. + +As new processes are added to Firefox we want to make sure their memory +usage is also tracked by AWSY. To this end we request that memory +reporting be integrated into any new process before it is enabled on +Nightly. diff --git a/docs/performance/memory/basic_operations.md b/docs/performance/memory/basic_operations.md new file mode 100644 index 0000000000..276c38bc2e --- /dev/null +++ b/docs/performance/memory/basic_operations.md @@ -0,0 +1,82 @@ +# Basic operations + +## Opening the Memory tool + +Before Firefox 50, the Memory tool is not enabled by default. To enable +it, open the developer tool settings, and check the "Memory" box under +"Default Firefox Developer Tools": + +<iframe width="595" height="325" src="https://www.youtube.com/embed/qi-0CoCOXw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"></iframe> + +From Firefox 50 onwards, the Memory tool is enabled by default. + +## Taking a heap snapshot + +To take a snapshot of the heap, click the "Take snapshot" button, or +the camera icon on the left: + +![memoryimage1](../img/memory-1-small.png) + +The snapshot will occupy the large pane on the right-hand side. On the +left, you'll see an entry for the new snapshot, including its +timestamp, size, and controls to save or clear this snapshot: + +![memoryimage2](../img/memory-2-small.png) + +## Clearing a snapshot + +To remove a snapshot, click the "X" icon: + +![memoryimage3](../img/memory-3-small.png) + +## Saving and loading snapshots + +If you close the Memory tool, all unsaved snapshots will be discarded. +To save a snapshot click "Save": + +![memoryimage4](../img/memory-4-small.png) + +You'll be prompted for a name and location, and the file will be saved +with an `.fxsnapshot` extension. + +To load a snapshot from an existing `.fxsnapshot` file, click the import +button, which looks like a rectangle with an arrow rising from it +(before Firefox 49, this button was labeled with the text +"Import\...\"): + +![memoryimage5](../img/memory-5-small.png) + +You'll be prompted to find a snapshot file on disk. + +## Comparing snapshots + +Starting in Firefox 45, you can diff two heap snapshots. The diff shows +you where memory was allocated or freed between the two snapshots. + +To create a diff, click the button that looks like a Venn diagram next +to the camera icon (before Firefox 47, this looked like a \"+/-\" icon): + +![memoryimage6](../img/memory-6-small.png) + +You'll be prompted to select the snapshot to use as a baseline, then +the snapshot to compare. The tool then shows you the differences between +the two snapshots: + +<iframe width="595" height="325" src="https://www.youtube.com/embed/3Ow-mdK6b2M" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"></iframe> + + +::: {.note} +When you're looking at a comparison, you can't use the Dominators view +or the Tree Map view. +::: + +## Recording call stacks + +The Memory tool can tell you exactly where in your code you are +allocating memory. However, recording this information has a run-time +cost, so you must ask the tool to record memory calls *before* the +memory is allocated, if you want to see memory call sites in the +snapshot. To do this, check "Record call stacks" (before Firefox 49 +this was labeled "Record allocation stacks"): + +![memoryimage7](../img/memory-7-small.png) diff --git a/docs/performance/memory/bloatview.md b/docs/performance/memory/bloatview.md new file mode 100644 index 0000000000..9e290011b1 --- /dev/null +++ b/docs/performance/memory/bloatview.md @@ -0,0 +1,245 @@ +# Bloatview + +BloatView is a tool that shows information about cumulative memory usage +and leaks. If it finds leaks, you can use [refcount tracing and balancing](refcount_tracing_and_balancing.md) +to discover the root cause. + +## How to build with BloatView + +Build with `--enable-debug` or `--enable-logrefcnt`. + +## How to run with BloatView + +The are two environment variables that can be used. + + XPCOM_MEM_BLOAT_LOG + +If set, this causes a *bloat log* to be printed on program exit, and +each time `nsTraceRefcnt::DumpStatistics` is called. This log contains +data on leaks and bloat (a.k.a. usage). + + XPCOM_MEM_LEAK_LOG + +This is similar to `XPCOM_MEM_BLOAT_LOG`, but restricts the log to only +show data on leaks. + +You can set these environment variables to any of the following values. + +- **1** - log to stdout. +- **2** - log to stderr. +- ***filename*** - write log to a file. + +## Reading individual bloat logs + +Full BloatView output contains per-class statistics on allocations and +refcounts, and provides gross numbers on the amount of memory being +leaked broken down by class. Here's a sample of the BloatView output. + + == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 1862 + |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| + | | Per-Inst Leaked| Total Rem| + 0 |TOTAL | 17 2484|253953338 38| + 17 |AsyncTransactionTrackersHolder | 40 40| 10594 1| + 78 |CompositorChild | 472 472| 1 1| + 79 |CondVar | 24 48| 3086 2| + 279 |MessagePump | 8 8| 30 1| + 285 |Mutex | 20 60| 89987 3| + 302 |PCompositorChild | 412 412| 1 1| + 308 |PImageBridgeChild | 416 416| 1 1| + +The first line tells you the pid of the leaking process, along with the +type of process. + +Here's how you interpret the columns. + +- The first, numerical column [is the index](https://searchfox.org/mozilla-central/source/xpcom/base/nsTraceRefcnt.cpp#365) + of the leaking class. +- **Class** - The name of the class in question (truncated to 20 + characters). +- **Bytes Per-Inst** - The number of bytes returned if you were to + write `sizeof(Class)`. Note that this number does not reflect any + memory held onto by the class, such as internal buffers, etc. (E.g. + for `nsString` you'll see the size of the header struct, not the + size of the string contents!) +- **Bytes Leaked** - The number of bytes per instance times the number + of objects leaked: (Bytes Per-Inst) x (Objects Rem). Use this number + to look for the worst offenders. (Should be zero!) +- **Objects Total** - The total count of objects allocated of a given + class. +- **Objects Rem** - The number of objects allocated of a given class + that weren't deleted. (Should be zero!) + +Interesting things to look for: + +- **Are your classes in the list?** - Look! If they aren't, then + you're not using the `NS_IMPL_ADDREF` and `NS_IMPL_RELEASE` (or + `NS_IMPL_ISUPPORTS` which calls them) for xpcom objects, or + `MOZ_COUNT_CTOR` and `MOZ_COUNT_DTOR` for non-xpcom objects. Not + having your classes in the list is *not* ok. That means no one is + looking at them, and we won't be able to tell if someone introduces + a leak. (See + [below](#how-to-instrument-your-objects-for-bloatview) + for how to fix this.) +- **The Bytes Leaked for your classes should be zero!** - Need I say + more? If it isn't, you should use the other tools to fix it. +- **The number of objects remaining might not be equal to the total + number of objects.** This could indicate a hand-written Release + method (that doesn't use the `NS_LOG_RELEASE` macro from + nsTraceRefcnt.h), or perhaps you're just not freeing any of the + instances you've allocated. These sorts of leaks are easy to fix. +- **The total number of objects might be 1.** This might indicate a + global variable or service. Usually this will have a large number of + refcounts. + +If you find leaks, you can use [refcount tracing and balancing](refcount_tracing_and_balancing.md) +to discover the root cause. + +## Combining and sorting bloat logs + +You can view one or more bloat logs in your browser by running the +following program. + + perl tools/bloatview/bloattable.pl *log1* *log2* \... *logn* > + *htmlfile* + +This will produce an HTML file that contains a table similar to the +following (but with added JavaScript so you can sort the data by +column). + + Byte Bloats + + ---------- ---------------- -------------------------- + Name File Date + blank `blank.txt` Tue Aug 29 14:17:40 2000 + mozilla `mozilla.txt` Tue Aug 29 14:18:42 2000 + yahoo `yahoo.txt` Tue Aug 29 14:19:32 2000 + netscape `netscape.txt` Tue Aug 29 14:20:14 2000 + ---------- ---------------- -------------------------- + +The numbers do not include malloc'd data such as string contents. + +Click on a column heading to sort by that column. Click on a class name +to see details for that class. + + -------------------- --------------- ----------------- --------- --------- ---------- ---------- ------------------------------- --------- -------- ---------- --------- + Class Name Instance Size Bytes allocated Bytes allocated but not freed + blank mozilla yahoo netscape Total blank mozilla yahoo netscape Total + TOTAL 1754408 432556 179828 404184 2770976 + nsStr 20 6261600 3781900 1120920 1791340 12955760 222760 48760 13280 76160 360960 + nsHashKey 8 610568 1842400 2457872 1134592 6045432 32000 536 568 1216 34320 + nsTextTransformer 548 8220 469088 1414936 1532756 3425000 0 0 0 0 0 + nsStyleContextData 736 259808 325312 489440 338560 1413120 141312 220800 -11040 94944 446016 + nsLineLayout 1100 2200 225500 402600 562100 1192400 0 0 0 0 0 + nsLocalFile 424 558832 19928 1696 1272 581728 72080 1272 424 -424 73352 + -------------------- --------------- ----------------- --------- --------- ---------- ---------- ------------------------------- --------- -------- ---------- --------- + +The first set of columns, **Bytes allocated**, shows the amount of +memory allocated for the first log file (`blank.txt`), the difference +between the first log file and the second (`mozilla.txt`), the +difference between the second log file and the third (`yahoo.txt`), the +difference between the third log file and the fourth (`netscape.txt`), +and the total amount of memory allocated in the fourth log file. These +columns provide an idea of how hard the memory allocator has to work, +but they do not indicate the size of the working set. + +The second set of columns, **Bytes allocated but not freed**, shows the +net memory gain or loss by subtracting the amount of memory freed from +the amount allocated. + +The **Show Objects** and **Show References** buttons show the same +statistics but counting objects or `AddRef`'d references rather than +bytes. + +## Comparing Bloat Logs + +You can also compare any two bloat logs (either those produced when the +program shuts down, or written to the bloatlogs directory) by running +the following program. + + `perl tools/bloatview/bloatdiff.pl` <previous-log> <current-log> + +This will give you output of the form: + + Bloat/Leak Delta Report + Current file: dist/win32_D.OBJ/bin/bloatlogs/all-1999-10-22-133450.txt + Previous file: dist/win32_D.OBJ/bin/bloatlogs/all-1999-10-16-010302.txt + -------------------------------------------------------------------------- + CLASS LEAKS delta BLOAT delta + -------------------------------------------------------------------------- + TOTAL 6113530 2.79% 67064808 9.18% + StyleContextImpl 265440 81.19% 283584 -26.99% + CToken 236500 17.32% 306676 20.64% + nsStr 217760 14.94% 5817060 7.63% + nsXULAttribute 113048 -70.92% 113568 -71.16% + LiteralImpl 53280 26.62% 75840 19.40% + nsXULElement 51648 0.00% 51648 0.00% + nsProfile 51224 0.00% 51224 0.00% + nsFrame 47568 -26.15% 48096 -50.49% + CSSDeclarationImpl 42984 0.67% 43488 0.67% + +This "delta report" shows the leak offenders, sorted from most leaks +to fewest. The delta numbers show the percentage change between runs for +the amount of leaks and amount of bloat (negative numbers are better!). +The bloat number is a metric determined by multiplying the total number +of objects allocated of a given class by the class size. Note that +although this isn't necessarily the amount of memory consumed at any +given time, it does give an indication of how much memory we're +consuming. The more memory in general, the worse the performance and +footprint. The percentage 99999.99% will show up indicating an +"infinite" amount of leakage. This happens when something that didn't +leak before is now leaking. + +## BloatView and continuous integration + +BloatView runs on debug builds for many of the test suites Mozilla has +running under continuous integration. If a new leak occurs, it will +trigger a test job failure. + +BloatView's output file can also show you where the leaked objects are +allocated. To do so, the `XPCOM_MEM_LOG_CLASSES` environment variable +should be set to the name of the class from the BloatView table: + + XPCOM_MEM_LOG_CLASSES=MyClass mach mochitest [options] + +Multiple class names can be specified by setting `XPCOM_MEM_LOG_CLASSES` +to a comma-separated list of names: + + XPCOM_MEM_LOG_CLASSES=MyClass,MyOtherClass,DeliberatelyLeakedClass mach mochitest [options] + +Test harness scripts typically accept a `--setenv` option for specifying +environment variables, which may be more convenient in some cases: + + mach mochitest --setenv=XPCOM_MEM_LOG_CLASSES=MyClass [options] + +For getting allocation stacks in automation, you can add the appropriate +`--setenv` options to the test configurations for the platforms you're +interested in. Those configurations are located in +`testing/mozharness/configs/`. The most likely configs you'll want to +modify are listed below: + +- Linux: `unittests/linux_unittest.py` +- Mac: `unittests/mac_unittest.py` +- Windows: `unittests/win_unittest.py` +- Android: `android/androidarm.py` + +## How to instrument your objects for BloatView + +First, if your object is an xpcom object and you use the +`NS_IMPL_ADDREF` and `NS_IMPL_RELEASE` (or a variation thereof) macro to +implement your `AddRef` and `Release` methods, then there is nothing you +need do. By default, those macros support refcnt logging directly. + +If your object is not an xpcom object then some manual editing is in +order. The following sample code shows what must be done: + + MyType::MyType() + { + MOZ_COUNT_CTOR(MyType); + ... + } + + MyType::~MyType() + { + MOZ_COUNT_DTOR(MyType); + ... + } diff --git a/docs/performance/memory/dmd.md b/docs/performance/memory/dmd.md new file mode 100644 index 0000000000..ebd6b5a2f8 --- /dev/null +++ b/docs/performance/memory/dmd.md @@ -0,0 +1,489 @@ +# Dark Matter Detector (DMD) + +DMD (short for "dark matter detector") is a heap profiler within +Firefox. It has four modes. + +- "Dark Matter" mode. In this mode, DMD tracks the contents of the + heap, including which heap blocks have been reported by memory + reporters. It helps us reduce the "heap-unclassified" value in + Firefox's about:memory page, and also detects if any heap blocks + are reported twice. Originally, this was the only mode that DMD had, + which explains DMD's name. This is the default mode. +- "Live" mode. In this mode, DMD tracks the current contents of the + heap. You can dump that information to file, giving a profile of the + live heap blocks at that point in time. This is good for + understanding how memory is used at an interesting point in time, + such as peak memory usage. +- "Cumulative" mode. In this mode, DMD tracks both the past and + current contents of the heap. You can dump that information to file, + giving a profile of the heap usage for the entire session. This is + good for finding parts of the code that cause high heap churn, e.g. + by allocating many short-lived allocations. +- "Heap scanning" mode. This mode is like live mode, but it also + records the contents of every live block in the log. This can be + used to investigate leaks by figuring out which objects might be + holding references to other objects. + +## Building and Running + +### Nightly Firefox + +The easiest way to use DMD is with the normal Nightly Firefox build, +which has DMD already enabled in the build. To have DMD active while +running it, you just need to set the environment variable `DMD=1` when +running. For instance, on OSX, you can run something like: + + DMD=1 /Applications/Firefox\ Nightly.app/Contents/MacOS/firefox + +You can tell it is working by going to `about:memory` and looking for +"Save DMD Output". If DMD has been properly enabled, the "Save" +button won't be grayed out. Look at the "Trigger" section below to +see the full list of ways to get a DMD report once you have it +activated. Note that the stack information you get will likely be less +detailed, due to being unable to symbolicate. You will be able to get +function names, but not line numbers. + +### Desktop Firefox + +#### Build + +Build Firefox with this option added to your mozconfig: + + ac_add_options --enable-dmd + +If building via try server, modify +`browser/config/mozconfigs/linux64/common-opt` or a similar file before +pushing. + +#### Launch + +Use `mach run --dmd`; use `--mode` to choose the mode. + +On a Windows build done by the try server, [these +instructions](https://bugzilla.mozilla.org/show_bug.cgi?id=936784#c69) from +2013 may or may not be useful. + +#### Trigger + +There are a few ways to trigger a DMD snapshot. Most of these will also +first get a memory report. When DMD is working on writing its output, it +will print logging like this: + + DMD[5222] opened /tmp/dmd-1414556492-5222.json.gz for writing + DMD[5222] Dump 1 { + DMD[5222] Constructing the heap block list... + DMD[5222] Constructing the stack trace table... + DMD[5222] Constructing the stack frame table... + DMD[5222] } + +You'll see separate output for each process. This step can take 10 or +more seconds and may make Firefox freeze temporarily. + +If you see the "opened" line, it tells you where the file was saved. +It's always in a temp directory, and the filenames are always of the +form dmd-<pid>. + +The ways to trigger a DMD snapshot are: + +1. Visit about:memory and click the "Save" button under "Save DMD output". + The button won't be present in non-DMD builds, and will be grayed out + in DMD builds if DMD isn't enabled at start-up. + +2. If you wish to trigger DMD dumps from within C++ or JavaScript code, + you can use `nsIMemoryInfoDumper.dumpMemoryInfoToTempDir`. For example, + from JavaScript code you can do the following. + + const Cc = Components.classes; + let mydumper = Cc["@mozilla.org/memory-info-dumper;1"] + .getService(Ci.nsIMemoryInfoDumper); + mydumper.dumpMemoryInfoToTempDir(identifier, anonymize, minimize); + + This will dump memory reports and DMD output to the temporary + directory. `identifier` is a string that will be used for part of + the filename (or a timestamp will be used if it is an empty string); + `anonymize` is a boolean that indicates if the memory reports should + be anonymized; and `minimize` is a boolean that indicates if memory + usage should be minimized first. + +3. On Linux, you can send signal 34 to the firefox process, e.g. + with the following command. + + $ killall -34 firefox + +4. The `MOZ_DMD_SHUTDOWN_LOG` environment variable, if set, triggers a DMD + run at shutdown; its value must be a directory where the logs will be + placed. This is mostly useful for debugging leaks. Which processes get + logged is controlled by the `MOZ_DMD_LOG_PROCESS` environment variable. + If this is not set, it will log all processes. It can be set to any valid + value of `XRE_GetProcessTypeString()` and will log only those processes. + For instance, if set to `default` it will only log the parent process. If + set to `tab`, it will log content processes only. + + For example, if you have + + MOZ_DMD_SHUTDOWN_LOG=~/dmdlogs/ MOZ_DMD_LOG_PROCESS=tab + + then DMD will create logs at shutdown for content processes and save them to + `~/dmdlogs/`. + +**NOTE:** + +- To dump DMD data from content processes, you'll need to disable the + sandbox with `MOZ_DISABLE_CONTENT_SANDBOX=1`. +- MOZ_DMD_SHUTDOWN_LOG must (currently) include the trailing separator + (\'\'/\") + + +### Fennec + +**NOTE:** + +You'll note from the name of this section being "Fennec" that these instructions +are very old. Hopefully they'll be more useful than not having them. + +**NOTE:** + +In order to use DMD on Fennec you will need root access on the Android +device. Instructions on how to root your device is outside the scope of +this document. + + +#### Build + +Build with these options: + + ac_add_options --enable-dmd + +#### Prep + +In order to prepare your device for running Fennec with DMD enabled, you +will need to do a few things. First, you will need to push the libdmd.so +library to the device so that it can by dynamically loaded by Fennec. +You can do this by running: + + adb push $OBJDIR/dist/bin/libdmd.so /sdcard/ + +Second, you will need to make an executable wrapper for Fennec which +sets an environment variable before launching it. (If you are familiar +with the recommended "--es env0" method for setting environment +variables when launching Fennec, note that you cannot use this method +here because those are processed too late in the startup process. If you +are not familiar with that method, you can ignore this parenthetical +note.) First make the executable wrapper on your host machine using the +editor of your choice. Name the file dmd_fennec and enter this as the +contents: + + #!/system/bin/sh + export MOZ_REPLACE_MALLOC_LIB=/sdcard/libdmd.so + exec "$@" + +If you want to use other DMD options, you can enter additional +environment variables above. You will need to push this to the device +and make it executable. Since you cannot mark files in /sdcard/ as +executable, we will use /data/local/tmp for this purpose: + + adb push dmd_fennec /data/local/tmp + adb shell + cd /data/local/tmp + chmod 755 dmd_fennec + +The final step is to make Android use the above wrapper script while +launching Fennec, so that the environment variable is present when +Fennec starts up. Assuming you have done a local build, the app +identifier will be `org.mozilla.fennec_$USERNAME` (`$USERNAME` is your +username on the host machine) and so we do this as shown below. If you +are using a DMD-enabled try build, or build from other source, adjust +the app identifier as necessary. + + adb shell + su # You need root access for the setprop command to take effect + setprop wrap.org.mozilla.fennec_$USERNAME "/data/local/tmp/dmd_fennec" + +Once this is set up, starting the `org.mozilla.fennec_$USERNAME` app +will use the wrapper script. + +#### Launch + +Launch Fennec either by tapping on the icon as usual, or from the +command line (as before, be sure to replace +`org.mozilla.fennec_$USERNAME` with the app identifier as appropriate). + + adb shell am start -n org.mozilla.fennec_$USERNAME/.App + +#### Trigger + +Use the existing memory-report dumping hook: + + adb shell am broadcast -a org.mozilla.gecko.MEMORY_DUMP + +In logcat, you should see output similar to this: + + I/DMD (20731): opened /storage/emulated/0/Download/memory-reports/dmd-default-20731.json.gz for writing + ... + I/GeckoConsole(20731): nsIMemoryInfoDumper dumped reports to /storage/emulated/0/Download/memory-reports/unified-memory-report-default-20731.json.gz + +The path is where the memory reports and DMD reports get dumped to. You +can pull them like so: + + adb pull /sdcard/Download/memory-reports/dmd-default-20731.json.gz + adb pull /sdcard/Download/memory-reports/unified-memory-report-default-20731.json.gz + +## Processing the output + +DMD outputs one gzipped JSON file per process that contains a +description of that process's heap. You can analyze these files (either +gzipped or not) using `dmd.py`. On Nightly Firefox, `dmd.py` is included +in the distribution. For instance on OS X, it is located in the +directory `/Applications/Firefox Nightly.app/Contents/Resources/`. For +Nightly, symbolication will fail, but you can at least get some +information. In a local build, `dmd.py` will be located in the directory +`$OBJDIR/dist/bin/`. + +Some platforms (Linux, Mac, Android) require stack fixing, which adds +missing filenames, function names and line number information. This will +occur automatically the first time you run `dmd.py` on the output file. +This can take 10s of seconds or more to complete. (This will fail if +your build does not contain symbols. However, if you have crash reporter +symbols for your build -- as tryserver builds do -- you can use [this +script](https://github.com/mstange/analyze-tryserver-profiles/blob/master/resymbolicate_dmd.py) +instead: clone the whole repo, edit the paths at the top of +`resymbolicate_dmd.py` and run it.) The simplest way to do this is to +just run the `dmd.py` script on your DMD report while your working +directory is `$OBJDIR/dist/bin`. This will allow the local libraries to +be found and used. + +If you invoke `dmd.py` without arguments you will get output appropriate +for the mode in which DMD was invoked. + +### "Dark matter" mode output + +For "dark matter" mode, `dmd.py`'s output describes how the live heap +blocks are covered by memory reports. This output is broken into +multiple sections. + +1. "Invocation". This tells you how DMD was invoked, i.e. what + options were used. +2. "Twice-reported stack trace records". This tells you which heap + blocks were reported twice or more. The presence of any such records + indicates bugs in one or more memory reporters. +3. "Unreported stack trace records". This tells you which heap blocks + were not reported, which indicate where additional memory reporters + would be most helpful. +4. "Once-reported stack trace records": like the "Unreported stack + trace records" section, but for blocks reported once. +5. "Summary": gives measurements of the total heap, and the + unreported/once-reported/twice-reported portions of it. + +The "Twice-reported stack trace records" and "Unreported stack trace +records" sections are the most important, because they indicate ways in +which the memory reporters can be improved. + +Here's an example stack trace record from the "Unreported stack trace +records" section. + + Unreported { + 150 blocks in heap block record 283 of 5,495 + 21,600 bytes (20,400 requested / 1,200 slop) + Individual block sizes: 144 x 150 + 0.00% of the heap (16.85% cumulative) + 0.02% of unreported (94.68% cumulative) + Allocated at { + #01: replace_malloc (/home/njn/moz/mi5/go64dmd/memory/replace/dmd/../../../../memory/replace/dmd/DMD.cpp:1286) + #02: malloc (/home/njn/moz/mi5/go64dmd/memory/build/../../../memory/build/replace_malloc.c:153) + #03: moz_xmalloc (/home/njn/moz/mi5/memory/mozalloc/mozalloc.cpp:84) + #04: nsCycleCollectingAutoRefCnt::incr(void*, nsCycleCollectionParticipant*) (/home/njn/moz/mi5/go64dmd/dom/xul/../../dist/include/nsISupportsImpl.h:250) + #05: nsXULElement::Create(nsXULPrototypeElement*, nsIDocument*, bool, bool,mozilla::dom::Element**) (/home/njn/moz/mi5/dom/xul/nsXULElement.cpp:287) + #06: nsXBLContentSink::CreateElement(char16_t const**, unsigned int, mozilla::dom::NodeInfo*, unsigned int, nsIContent**, bool*, mozilla::dom::FromParser) (/home/njn/moz/mi5/dom/xbl/nsXBLContentSink.cpp:874) + #07: nsCOMPtr<nsIContent>::StartAssignment() (/home/njn/moz/mi5/go64dmd/dom/xml/../../dist/include/nsCOMPtr.h:753) + #08: nsXMLContentSink::HandleStartElement(char16_t const*, char16_t const**, unsigned int, unsigned int, bool) (/home/njn/moz/mi5/dom/xml/nsXMLContentSink.cpp:1007) + } + } + +It tells you that there were 150 heap blocks that were allocated from +the program point indicated by the "Allocated at" stack trace, that +these blocks took up 21,600 bytes, that all 150 blocks had a size of 144 +bytes, and that 1,200 of those bytes were "slop" (wasted space caused +by the heap allocator rounding up request sizes). It also indicates what +percentage of the total heap size and the unreported portion of the heap +these blocks represent. + +Within each section, records are listed from largest to smallest. + +Once-reported and twice-reported stack trace records also have stack +traces for the report point(s). For example: + + Reported at { + #01: mozilla::dmd::Report(void const*) (/home/njn/moz/mi2/memory/replace/dmd/DMD.cpp:1740) 0x7f68652581ca + #02: CycleCollectorMallocSizeOf(void const*) (/home/njn/moz/mi2/xpcom/base/nsCycleCollector.cpp:3008) 0x7f6860fdfe02 + #03: nsPurpleBuffer::SizeOfExcludingThis(unsigned long (*)(void const*)) const (/home/njn/moz/mi2/xpcom/base/nsCycleCollector.cpp:933) 0x7f6860fdb7af + #04: nsCycleCollector::SizeOfIncludingThis(unsigned long (*)(void const*), unsigned long*, unsigned long*, unsigned long*, unsigned long*, unsigned long*) const (/home/njn/moz/mi2/xpcom/base/nsCycleCollector.cpp:3029) 0x7f6860fdb6b1 + #05: CycleCollectorMultiReporter::CollectReports(nsIMemoryMultiReporterCallback*, nsISupports*) (/home/njn/moz/mi2/xpcom/base/nsCycleCollector.cpp:3075) 0x7f6860fde432 + #06: nsMemoryInfoDumper::DumpMemoryReportsToFileImpl(nsAString_internal const&) (/home/njn/moz/mi2/xpcom/base/nsMemoryInfoDumper.cpp:626) 0x7f6860fece79 + #07: nsMemoryInfoDumper::DumpMemoryReportsToFile(nsAString_internal const&, bool, bool) (/home/njn/moz/mi2/xpcom/base/nsMemoryInfoDumper.cpp:344) 0x7f6860febaf9 + #08: mozilla::(anonymous namespace)::DumpMemoryReportsRunnable::Run() (/home/njn/moz/mi2/xpcom/base/nsMemoryInfoDumper.cpp:58) 0x7f6860fefe03 + } + +You can tell which memory reporter made the report by the name of the +`MallocSizeOf` function near the top of the stack trace. In this case it +was the cycle collector's reporter. + +By default, DMD does not record an allocation stack trace for most +blocks, to make it run faster. The decision on whether to record is done +probabilistically, and larger blocks are more likely to have an +allocation stack trace recorded. All unreported blocks that lack an +allocation stack trace will end up in a single record. For example: + + Unreported { + 420,010 blocks in heap block record 2 of 5,495 + 29,203,408 bytes (27,777,288 requested / 1,426,120 slop) + Individual block sizes: 2,048 x 3; 1,024 x 103; 512 x 147; 496 x 7; 480 x 31; 464 x 6; 448 x 50; 432 x 41; 416 x 28; 400 x 53; 384 x 43; 368 x 216; 352 x 141; 336 x 58; 320 x 104; 304 x 5,130; 288 x 150; 272 x 591; 256 x 6,017; 240 x 1,372; 224 x 93; 208 x 488; 192 x 1,919; 176 x 18,903; 160 x 1,754; 144 x 5,041; 128 x 36,709; 112 x 5,571; 96 x 6,280; 80 x 40,738; 64 x 37,925; 48 x 78,392; 32 x 136,199; 16 x 31,001; 8 x 4,706 + 3.78% of the heap (10.24% cumulative) + 21.24% of unreported (57.53% cumulative) + Allocated at { + #01: (no stack trace recorded due to --stacks=partial) + } + } + +In contrast, stack traces are always recorded when a block is reported, +which means you can end up with records like this where the allocation +point is unknown but the reporting point *is* known: + + Once-reported { + 104,491 blocks in heap block record 13 of 4,689 + 10,392,000 bytes (10,392,000 requested / 0 slop) + Individual block sizes: 512 x 124; 256 x 242; 192 x 813; 128 x 54,664; 64 x 48,648 + 1.35% of the heap (48.65% cumulative) + 1.64% of once-reported (59.18% cumulative) + Allocated at { + #01: (no stack trace recorded due to --stacks=partial) + } + Reported at { + #01: mozilla::dmd::DMDFuncs::Report(void const*) (/home/njn/moz/mi5/go64dmd/memory/replace/dmd/../../../../memory/replace/dmd/DMD.cpp:1646) + #02: WindowsMallocSizeOf(void const*) (/home/njn/moz/mi5/dom/base/nsWindowMemoryReporter.cpp:189) + #03: nsAttrAndChildArray::SizeOfExcludingThis(unsigned long (*)(void const*)) const (/home/njn/moz/mi5/dom/base/nsAttrAndChildArray.cpp:880) + #04: mozilla::dom::FragmentOrElement::SizeOfExcludingThis(unsigned long (*)(void const*)) const (/home/njn/moz/mi5/dom/base/FragmentOrElement.cpp:2337) + #05: nsINode::SizeOfIncludingThis(unsigned long (*)(void const*)) const (/home/njn/moz/mi5/go64dmd/parser/html/../../../dom/base/nsINode.h:307) + #06: mozilla::dom::NodeInfo::NodeType() const (/home/njn/moz/mi5/go64dmd/dom/base/../../dist/include/mozilla/dom/NodeInfo.h:127) + #07: nsHTMLDocument::DocAddSizeOfExcludingThis(nsWindowSizes*) const (/home/njn/moz/mi5/dom/html/nsHTMLDocument.cpp:3710) + #08: nsIDocument::DocAddSizeOfIncludingThis(nsWindowSizes*) const (/home/njn/moz/mi5/dom/base/nsDocument.cpp:12820) + } + } + +The choice of whether to record an allocation stack trace for all blocks +is controlled by an option (see below). + +### "Live" mode output + + +For "live" mode, dmd.py's output describes what live heap blocks are +present. This output is broken into multiple sections. + +1. "Invocation". This tells you how DMD was invoked, i.e. what + options were used. +2. "Live stack trace records". This tells you which heap blocks were + present. +3. "Summary": gives measurements of the total heap. + +The individual records are similar to those output in "dark matter" +mode. + +### "Cumulative" mode output + +For "cumulative" mode, dmd.py's output describes how the live heap +blocks are covered by memory reports. This output is broken into +multiple sections. + +1. "Invocation". This tells you how DMD was invoked, i.e. what + options were used. +2. "Cumulative stack trace records". This tells you which heap blocks + were allocated during the session. +3. "Summary": gives measurements of the total (cumulative) heap. + +The individual records are similar to those output in "dark matter" +mode. + +### "Scan" mode output + +For "scan" mode, the output of `dmd.py` is the same as "live" mode. +A separate script, `block_analyzer.py`, can be used to find out +information about which blocks refer to a particular block. +`dmd.py --clamp-contents` needs to be run on the log first. See [this +other page](heap_scan_mode.md) for an +overview of how to use heap scan mode to fix a leak involving refcounted +objects. + +## Options + +### Runtime + +When you run `mach run --dmd` you can specify additional options to +control how DMD runs. Run `mach help run` for documentation on these. + +The most interesting one is `--mode`. Acceptable values are +`dark-matter` (the default), `live`, `cumulative`, and `scan`. + +Another interesting one is `--stacks`. Acceptable values are `partial` +(the default) and `full`. In the former case most blocks will not have +an allocation stack trace recorded. However, because larger blocks are +more likely to have one recorded, most allocated bytes should have an +allocation stack trace even though most allocated blocks do not. Use +`--stacks=full` if you want complete information, but note that DMD will +run substantially slower in that case. + +The options may also be put in the environment variable DMD, or set DMD +to 1 to enable DMD with default options (dark-matter and partial +stacks). + +### Post-processing + +`dmd.py` also takes options that control how it works. Run `dmd.py -h` +for documentation. The following options are the most interesting ones. + +- `-f` / `--max-frames`. By default, records show up to 8 stack + frames. You can choose a smaller number, in which case more + allocations will be aggregated into each record, but you'll have + less context. Or you can choose a larger number, in which cases + allocations will be split across more records, but you will have + more context. There is no single best value, but values in the range + 2..10 are often good. The maximum is 24. + +- `-a` / `--ignore-alloc-fns`. Many allocation stack traces start + with multiple frames that mention allocation wrapper functions, e.g. + `js_calloc()` calls `replace_calloc()`. This option filters these + out. It often helps improve the quality of the output when using a + small `--max-frames` value. + +- `-s` / `--sort-by`. This controls how records are sorted. Acceptable + values are `usable` (the default), `req`, `slop` and `num-blocks`. + +- `--clamp-contents`. For a heap scan log, this performs a + conservative pointer analysis on the contents of each block, + changing any value that is a pointer into the middle of a live block + into a pointer to the start of that block. All other values are + changes to null. In addition, all trailing nulls are removed from + the block contents. + +As an example that combines multiple options, if you apply the following +command to a profile obtained in "live" mode: + + dmd.py -r -f 2 -a -s slop + +it will give you a good idea of where the major sources of slop are. + +`dmd.py` can also compute the difference between two DMD output files, +so long as those files were produced in the same mode. Simply pass it +two filenames instead of one to get the difference. + +## Which heap blocks are reported? + +At this stage you might wonder how DMD knows, in "dark matter" mode, +which allocations have been reported and which haven't. DMD only knows +about heap blocks that are measured via a function created with one of +the following two macros: + + MOZ_DEFINE_MALLOC_SIZE_OF + MOZ_DEFINE_MALLOC_SIZE_OF_ON_ALLOC + +Fortunately, most of the existing memory reporters do this. See +[Performance/Memory_Reporting](https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Memory_reporting "Platform/Memory Reporting") +for more details about how memory reporters are written. diff --git a/docs/performance/memory/dominators.md b/docs/performance/memory/dominators.md new file mode 100644 index 0000000000..e64c465e62 --- /dev/null +++ b/docs/performance/memory/dominators.md @@ -0,0 +1,90 @@ +# Dominators + +This article provides an introduction to the concepts of *Reachability*, +*Shallow* versus *Retained* size, and *Dominators*, as they apply in +garbage-collected languages like JavaScript. + +These concepts matter in memory analysis, because often an object may +itself be small, but may hold references to other much larger objects, +and by doing this will prevent the garbage collector from freeing that +extra memory. + +You can see the dominators in a page using the [Dominators +view](dominators_view.md) in the Memory tool. + +With a garbage-collected language, like JavaScript, the programmer +doesn\'t generally have to worry about deallocating memory. They can +just create and use objects, and when the objects are no longer needed, +the runtime takes care of cleaning up, and frees the memory the objects +occupied. + +## Reachability + +In modern JavaScript implementations, the runtime decides whether an +object is no longer needed based on *reachability*. In this system the +heap is represented as one or more graphs. Each node in the graph +represents an object, and each connection between nodes (edge) +represents a reference from one object to another. The graph starts at a +root node, indicated in these diagrams with \"R\". + +![](../img/memory-graph.svg) + +During garbage collection, the runtime traverses the graph, starting at +the root, and marks every object it finds. Any objects it doesn\'t find +are unreachable, and can be deallocated. + +So when an object becomes unreachable (for example, because it is only +referenced by a single local variable which goes out of scope) then any +objects it references also become unreachable, as long as no other +objects reference them: + +![](../img/memory-graph-unreachable.svg) + +Conversely, this means that objects are kept alive as long as some other +reachable object is holding a reference to them. + +## Shallow and retained size + +This gives rise to a distinction between two ways to look at the size of +an object: + +- *shallow size*: the size of the object itself +- *retained size*: the size of the object itself, plus the size of + other objects that are kept alive by this object + +Often, objects will have a small shallow size but a much larger retained +size, through the references they contain to other objects. Retained +size is an important concept in analyzing memory usage, because it +answers the question \"if this object ceases to exist, what\'s the total +amount of memory freed?\". + +## Dominators + +A related concept is that of the *dominator*. Node B is said to dominate +node A if every path from the root to A passes through B: + +![](../img/memory-graph-dominators.svg) + +If any of node A\'s dominators are freed, then node A itself becomes +eligible for garbage collection. + +[If node B dominates node A, but does not dominate any of A\'s other +dominators, then B is the *immediate dominator* of +A:] + +![](../img/memory-graph-immediate-dominator.svg) + +[One slight subtlety here is that if an object A is referenced by two +other objects B and C, then neither object is its +dominator], because you could remove either B or C from +the graph, and A would still be retained by its other referrer. Instead, +the immediate dominator of A would be its first common ancestor:\ +![](../img/memory-graph-dominator-multiple-references.svg) + +## See also + +[Dominators in graph +theory](https://en.wikipedia.org/wiki/Dominator_%28graph_theory%29). + +[Tracing garbage +collection](https://en.wikipedia.org/wiki/Tracing_garbage_collection). diff --git a/docs/performance/memory/dominators_view.md b/docs/performance/memory/dominators_view.md new file mode 100644 index 0000000000..05de01fa4e --- /dev/null +++ b/docs/performance/memory/dominators_view.md @@ -0,0 +1,221 @@ +# Dominators view + +The Dominators view is new in Firefox 46. + +Starting in Firefox 46, the Memory tool includes a new view called the +Dominators view. This is useful for understanding the \"retained size\" +of objects allocated by your site: that is, the size of the objects +themselves plus the size of the objects that they keep alive through +references. + +If you already know what shallow size, retained size, and dominators +are, skip to the Dominators UI section. Otherwise, you might want to +review the article on [Dominators +concepts](dominators.md). + +## Dominators UI + +To see the Dominators view for a snapshot, select \"Dominators\" in the +\"View\" drop-down list. It looks something like this: + +![](../img/dominators-1.png) + +The Dominators view consists of two panels: + +- the [Dominators Tree + panel](#dominators-tree-panel) + shows you which nodes in the snapshot are retaining the most memory +- the [Retaining Paths + panel](#retaining-paths-panel) + (new in Firefox 47) shows the 5 shortest retaining paths for a + single node. + +![](../img/dominators-2.png) + +### Dominators Tree panel + +The Dominators Tree tells you which objects in the snapshot are +retaining the most memory. + +In the main part of the UI, the first row is labeled \"GC Roots\". +Immediately underneath that is an entry for: + +- Every GC root node. In Gecko, there is more than one memory graph, + and therefore more than one root. There may be many (often + temporary) roots. For example: variables allocated on the stack need + to be rooted, or internal caches may need to root their elements. +- Any other node that\'s referenced from two different roots (since in + this case, neither root dominates it). + +Each entry displays: + +- the retained size of the node, as bytes and as a percentage of the + total +- the shallow size of the node, as bytes and as a percentage of the + total +- the nodes\'s name and address in memory. + +Entries are ordered by the amount of memory that they retain. For +example: + +![](../img/dominators-3.png) + +In this screenshot we can see five entries under \"GC Roots\". The first +two are Call and Window objects, and retain about 21% and 8% of the +total size of the memory snapshot, respectively. You can also see that +these objects have a relatively tiny \"Shallow Size\", so almost all of +the retained size is in the objects that they dominate. + +Immediately under each GC root, you\'ll see all the nodes for which this +root is the [immediate +dominator](/dominators.html#immediate_dominator). +These nodes are also ordered by their retained size. + +For example, if we click on the first Window object: + +![](../img/dominators-4.png) + +We can see that this Window dominates a CSS2Properties object, whose +retained size is 2% of the total snapshot size. Again the shallow size +is very small: almost all of its retained size is in the nodes that it +dominates. By clicking on the disclosure arrow next to the Function, we +can see those nodes. + +In this way you can quickly get a sense of which objects retain the most +memory in the snapshot. + +You can use [Alt]{.kbd} + click to expand the whole graph under a node. + +#### Call Stack {#Call_Stack} + +In the toolbar at the top of the tool is a dropdown called \"Label by\": + +![](../img/dominators-5.png) + +By default, this is set to \"Type\". However, you can set it instead to +\"Call Stack\" to see exactly where in your code the objects are being +allocated. + +::: {.note} +This option is called \"Allocation Stack\" in Firefox 46. +::: + +To enable this, you must check the box labeled \"Record call stacks\" +*before* you run the code that allocates the objects. Then take a +snapshot, then select \"Call Stack\" in the \"Label by\" drop-down. + +Now the node\'s name will contain the name of the function that +allocated it, and the file, line number and character position of the +exact spot where the function allocated it. Clicking the file name will +take you to that spot in the Debugger. + +<iframe width="595" height="325" src="https://www.youtube.com/embed/qTF5wCSD124" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"></iframe> + +::: +Sometimes you\'ll see \"(no stack available)\" here. In particular, +allocation stacks are currently only recorded for objects, not for +arrays, strings, or internal structures. +::: + +### Retaining Paths panel + +::: {.geckoVersionNote} +The Retaining Paths panel is new in Firefox 47. +::: + +The Retaining Paths panel shows you, for a given node, the 5 shortest +paths back from this node to a GC root. This enables you to see all the +nodes that are keeping the given node from being garbage-collected. If +you suspect that an object is being leaked, this will show you exactly +which objects are holding a reference to it. + +To see the retaining paths for a node, you have to select the node in +the Dominators Tree panel: + +![](../img/dominators-6.png) + +Here, we\'ve selected an object, and can see a single path back to a GC +root. + +The `Window` GC root holds a reference to an `HTMLDivElement` object, +and that holds a reference to an `Object`, and so on. If you look in the +Dominators Tree panel, you can trace the same path there. If either of +these references were removed, the items below them could be +garbage-collected. + +Each connection in the graph is labeled with the variable name for the +referenced object. + +Sometimes there\'s more than one retaining path back from a node: + +![](../img/dominators-7.png) + +Here there are three paths back from the `DocumentPrototype` node to a +GC root. If one were removed, then the `DocumentPrototype` would still +not be garbage-collected, because it\'s still retained by the other two +path. + +## Example {#Example} + +Let\'s see how some simple code is reflected in the Dominators view. + +We\'ll use the [monster allocation +example](monster_example.md), which creates three +arrays, each containing 5000 monsters, each monster having a +randomly-generated name. + +### Taking a snapshot + +To see what it looks like in the Dominators view: + +- load the page +- enable the Memory tool in the + [Settings](https://developer.mozilla.org/en-US/docs/Tools/Tools_Toolbox#settings), if you + haven\'t already +- open the Memory tool +- check \"Record call stacks\" +- press the button labeled \"Make monsters!\" +- take a snapshot +- switch to the \"Dominators\" view + +<iframe width="595" height="325" src="https://www.youtube.com/embed/HiWnfMoMs2c" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"></iframe> + +### Analyzing the Dominators Tree + +You\'ll see the three arrays as the top three GC roots, each retaining +about 23% of the total memory usage: + +![](../img/dominators-8.png) + +If you expand an array, you\'ll see the objects (monsters) it contains. +Each monster has a relatively small shallow size of 160 bytes. This +includes the integer eye- and tentacle-counts. Each monster has a bigger +retained size, which is accounted for by the string used for the +monster\'s name: + +![](../img/dominators-9.png) + +All this maps closely to the [memory graph we were expecting to +see](/monster_example.html#allocation-graph). One +thing you might be wondering, though, is: where\'s the top-level object +that retains all three arrays? If we look at the Retaining Paths panel +for one of the arrays, we\'ll see it: + +![](../img/dominators-10.png) + +Here we can see the retaining object, and even that this particular +array is the array of `fierce` monsters. But the array is also rooted +directly, so if the object were to stop referencing the array, it would +still not be eligible for garbage collection. + +This means that the object does not dominate the array, and is therefore +not shown in the Dominators Tree view. [See the relevant section of the +Dominators concepts +article](dominators.html#multiple-paths). + +### Using the Call Stack view {#Using_the_Call_Stack_view} + +Finally, you can switch to the Call Stack view, see where the objects +are being allocated, and jump to that point in the Debugger: + +<iframe width="595" height="325" src="https://www.youtube.com/embed/qTF5wCSD124" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"></iframe> diff --git a/docs/performance/memory/gc_and_cc_logs.md b/docs/performance/memory/gc_and_cc_logs.md new file mode 100644 index 0000000000..62e151dff4 --- /dev/null +++ b/docs/performance/memory/gc_and_cc_logs.md @@ -0,0 +1,109 @@ +# GC and CC logs + +Garbage collector (GC) and cycle collector (CC) logs give information +about why various JS and C++ objects are alive in the heap. Garbage +collector logs and cycle collector logs can be analyzed in various ways. +In particular, CC logs can be used to understand why the cycle collector +is keeping an object alive. These logs can either be manually or +automatically generated, and they can be generated in both debug and +non-debug builds. + +This logs the contents of the Javascript heap to a file named +`gc-edges-NNNN.log`. It also creates a file named `cc-edges-NNNN.log` to +which it dumps the parts of the heap visible to the cycle collector, +which includes native C++ objects that participate in cycle collection, +as well as JS objects being held alive by those C++ objects. + +## Generating logs + +### From within Firefox + +To manually generate GC and CC logs, navigate to `about:memory` and use +the buttons under \"Save GC & CC logs.\" \"Save concise\" will generate +a smaller CC log, \"Save verbose\" will provide a more detailed CC log. +(The GC log will be the same size in either case.) + +With multiprocess Firefox, you can't record logs from the content +process, due to sandboxing. You'll need to disable sandboxing by +setting `MOZ_DISABLE_CONTENT_SANDBOX=t` when you run Firefox. + +### From the commandline + +TLDR: if you just want shutdown GC/CC logs to debug leaks that happen in +our automated tests, you probably want something along the lines of: + + MOZ_DISABLE_CONTENT_SANDBOX=t MOZ_CC_LOG_DIRECTORY=/full/path/to/log/directory/ MOZ_CC_LOG_SHUTDOWN=1 MOZ_CC_ALL_TRACES=shutdown ./mach ... + +As noted in the previous section, with multiprocess Firefox, you can't +record logs from the content process, due to sandboxing. You'll need to +disable sandboxing by setting `MOZ_DISABLE_CONTENT_SANDBOX=t` when you +run Firefox. + +On desktop Firefox you can override the default location of the log +files by setting the `MOZ_CC_LOG_DIRECTORY` environment variable. By +default, they go to a temporary directory which differs per OS - it's +`/tmp/` on Linux/BSD, `$LOCALAPPDATA\Temp\` on Windows, and somewhere in +`/var/folders/` on Mac (whatever the directory service returns for +`TmpD`/`NS_OS_TEMP_DIR`). Note that just `MOZ_CC_LOG_DIRECTORY=.` won't +work - you need to specify a full path. On Firefox for Android you can +use the cc-dump.xpi +extension to save the files to `/sdcard`. By default, the file is +created in some temp directory, and the path to the file is printed to +the Error Console. + +To log every cycle collection, set the `MOZ_CC_LOG_ALL` environment +variable. To log only shutdown collections, set `MOZ_CC_LOG_SHUTDOWN`. +To make all CCs verbose, set `MOZ_CC_ALL_TRACES to "all`\", or to +\"`shutdown`\" to make only shutdown CCs verbose. + +Live GC logging can be enabled with the pref +`javascript.options.mem.log`. Output to a file can be controlled with +the MOZ_GCTIMER environment variable. See the [Statistics +API](https://developer.mozilla.org/en-US/docs/Tools/Tools_Toolbox#settings/en-US/docs/SpiderMonkey/Internals/GC/Statistics_API) page for +details on values. + +Set the environment variable `MOZ_CC_LOG_THREAD` to `main` to only log +main thread CCs, or to `worker` to only log worker CCs. The default +value is `all`, which will log all CCs. + +To get cycle collector logs on Try server, set `MOZ_CC_LOG_DIRECTORY` to +`MOZ_UPLOAD_DIR`, then set the other variables appropriately to generate +CC logs. The way to set environment variables depends on the test +harness, or you can modify the code in nsCycleCollector to set that +directly. To find the CC logs once the try run has finished, click on +the particular job, then click on \"Job Details\" in the bottom pane in +TreeHerder, and you should see download links. + +To set the environment variable, find the `buildBrowserEnv` method in +the Python file for the test suite you are interested in, and add +something like this code to the file: + + browserEnv["MOZ_CC_LOG_DIRECTORY"] = os.environ["MOZ_UPLOAD_DIR"] + browserEnv["MOZ_CC_LOG_SHUTDOWN"] = "1" + +## Analyzing GC and CC logs + +There are numerous scripts that analyze GC and CC logs on +[GitHub](https://github.com/amccreight/heapgraph/) + + +To find out why an object is being kept alive, you should use `find_roots.py` +in the root of the github repository. Calling `find_roots.py` on a CC log +with a specific object or kind of object will produce paths from rooting +objects to the specified objects. Most big leaks include an `nsGlobalWindow`, +so that's a good class to try if you don't have any better idea. + +To fix a leak, the next step is to figure out why the rooting object is +alive. For a C++ object, you need to figure out where the missing +references are from. For a JS object, you need to figure out why the JS +object is reachable from a JS root. For the latter, you can use the +corresponding [`find_roots.py` for +JS](https://github.com/amccreight/heapgraph/tree/master/g) +on the GC log. + +## Alternatives + +There are two add-ons that can be used to create and analyze CC graphs. + +- [about:cc](https://bugzilla.mozilla.org/show_bug.cgi?id=726346) + is simple, ugly, but rather powerful. diff --git a/docs/performance/memory/heap_scan_mode.md b/docs/performance/memory/heap_scan_mode.md new file mode 100644 index 0000000000..ea5a45016a --- /dev/null +++ b/docs/performance/memory/heap_scan_mode.md @@ -0,0 +1,313 @@ +# DMD heap scan mode + +Firefox's DMD heap scan mode tracks the set of all live blocks of +malloc-allocated memory and their allocation stacks, and allows you to +log these blocks, and the values stored in them, to a file. When +combined with cycle collector logging, this can be used to investigate +leaks of refcounted cycle collected objects, by figuring out what holds +a strong reference to a leaked object. + +**When should you use this?** DMD heap scan mode is intended to be used +to investigate leaks of cycle collected (CCed) objects. DMD heap scan +mode is a "tool of last resort" that should only be used when all +other avenues have been tried and failed, except possibly [ref count +logging](refcount_tracing_and_balancing.md). +It is particularly useful if you have no idea what is causing the leak. +If you have a patch that introduces a leak, you are probably better off +auditing all of the strong references that your patch creates before +trying this. + +The particular steps given below are intended for the case where the +leaked object is alive all the way through shutdown. You could modify +these steps for leaks that go away in shutdown by collecting a CC and +DMD log prior to shutdown. However, in that case it may be easier to use +refcount logging, or rr with a conditional breakpoint set on calls to +`Release()` for the leaking object, to see what object actually does the +release that causes the leaked object to go away. + +## Prerequisites + +- A debug DMD build of Firefox. [This + page](dmd.md) + describes how to do that. This should probably be an optimized + build. Non-optimized DMD builds will generate better stack traces, + but they can be so slow as to be useless. +- The build is going to be very slow, so you may need to disable some + shutdown checks. First, in + `toolkit/components/terminator/nsTerminator.cpp`, delete everything + in `RunWatchDog` but the call to `NS_SetCurrentThreadName`. This + will keep the watch dog from killing the browser when shut down + takes multiple minutes. Secondly, you may need to comment out the + call to `MOZ_CRASH("NSS_Shutdown failed");` in + `xpcom/build/XPCOMInit.cpp`, as this also seems to trigger when + shutdown is extremely slow. +- You need the cycle collector analysis script `find_roots.py`, which + can be downloaded as part of [this repo on + Github](https://github.com/amccreight/heapgraph). + +## Generating Logs + +The next step is to generate a number of log files. You need to get a +shutdown CC log and a DMD log, for a single run. + +**Definitions** I'll write `$objdir` for the object directory for your +Firefox DMD build, `$srcdir` for the top level of the Firefox source +directory, and `$heapgraph` for the location of the heapgraph repo, and +`$logdir` for the location you want logs to go to. `$logdir` should end +in a path separator. For instance, `~/logs/leak/`. + +The command you need to run Firefox will look something like this: + + XPCOM_MEM_BLOAT_LOG=1 MOZ_CC_LOG_SHUTDOWN=1 MOZ_DISABLE_CONTENT_SANDBOX=t MOZ_CC_LOG_DIRECTORY=$logdir + MOZ_CC_LOG_PROCESS=content MOZ_CC_LOG_THREAD=main MOZ_DMD_SHUTDOWN_LOG=$logdir MOZ_DMD_LOG_PROCESS=tab ./mach run --dmd --mode=scan + +Breaking this down: + +- `XPCOM_MEM_BLOAT_LOG=1`: This reports a list of the counts of every + object created and destroyed and tracked by the XPCOM leak tracking + system. From this chart, you can see how many objects of a + particular type were leaked through shutdown. This can come in handy + during the manual analysis phase later, to get evidence to support + your hunches. For instance, if you think that an `nsFoo` object + might be holding your leaking object alive, you can use this to + easily see if we leaked an `nsFoo` object. +- `MOZ_CC_LOG_SHUTDOWN=1`: This generates a cycle collector log during + shutdown. Creating this log during shutdown is nice because there + are less things unrelated to the leak in the log, and various cycle + collector optimizations are disabled. A garbage collector log will + also be created, which you may not need. +- `MOZ_DISABLE_CONTENT_SANDBOX=t`: This disables the content process + sandbox, which is needed because the DMD and CC log files are + created directly by the child processes. +- `MOZ_CC_LOG_DIRECTORY=$logdir`: This selects the location for cycle + collector logs to be saved. +- `MOZ_CC_LOG_PROCESS=content MOZ_CC_LOG_THREAD=main`: These options + specify that we only want CC logs for the main thread of content + processes, to make shutdown less slow. If your leak is happening in + a different process or thread, change the options, which are listed + in `xpcom/base/nsCycleCollector.cpp`. +- `MOZ_DMD_SHUTDOWN_LOG=$logdir`: This option specifies that we want a + DMD log to be taken very late in XPCOM shutdown, and the location + for that log to be saved. Like with the CC log, we want this log + very late to avoid as many non-leaking things as possible. +- `MOZ_DMD_LOG_PROCESS=tab`: As with the CC, this means that we only + want these logs in content processes, in order to make shutdown + faster. The allowed values here are the same as those returned by + `XRE_GetProcessType()`, so adjust as needed. +- Finally, the `--dmd` option need to be passed in so that DMD will be + run. `--mode=scan` is needed so that when we get a DMD log the + entire contents of each block of memory is saved for later analysis. + +With that command line in hand, you can start Firefox. Be aware that +this may take multiple minutes if you have optimization disabled. + +Once it has started, go through the steps you need to reproduce your +leak. If your leak is a ghost window, it can be handy to get an +`about:memory` report and write down the PID of the leaking process. You +may want to wait 10 or so seconds after this to make sure as much as +possible is cleaned up. + +Next, exit the browser. This will cause a lot of logs to be written out, +so it can take a while. + +## Analyzing the Logs + +### Getting the PID and address of the leaking object + +The first step is to figure out the **PID** of the leaking process. The +second step is to figure out **the address of the leaking object**, +usually a window. Conveniently, you can usually do both at once using +the cycle collector log. If you are investigating a leak of +`www.example.com`, then from `$logdir` you can do +`"grep nsGlobalWindow cc-edges* | grep example.com"`. This looks through +all of the windows in all of the CC logs (which may leaked, this late in +shutdown), and then filters out windows where the URL contains +`example.com`. + +The result of that grep will contain output that looks something like +this: + + cc-edges.15873.log:0x7f0897082c00 [rc=1285] nsGlobalWindowInner # 2147483662 inner https://www.example.com/ + +cc-edges.15873.log: The first part is the file name where it was +found. `15873` is the PID of the process that leaked. You'll want to +write down the name of the file and the PID. Let's call the file +`$cclog` and the pid `$pid`. + +0x7f0897082c00: This is the address of the leaking window. You'll +also want to write that down. Let's call this `$winaddr`. + +If there are multiple files, you'll end up with one that looks like +`cc-edges.$pid.log` and one or more that look like +`cc-edges.$pid-$n.log` for various values of `$n`. You want the one with +the largest `$n`, as this was recorded the latest, and so it will +contain the least non-garbage. + +### Identifying the root in the cycle collector log + +The next step is to figure out why the cycle collector could not collect +the window, using the `find_roots.py` script from the heapgraph +repository. The command to invoke this looks like this: + + python $heapgraph/find_roots.py $cclog $winaddr + +This may take a few seconds. It will eventually produce some output. +You'll want to save a copy of this output for later. + +The output will look something like this, after a message about loading +progress: + + 0x7f0882fe3230 [FragmentOrElement (xhtml) script https://www.example.com] + --[[via hash] mListenerManager]--> 0x7f0899b4e550 [EventListenerManager] + --[mListeners event=onload listenerType=3 [i]]--> 0x7f0882ff8f80 [CallbackObject] + --[mIncumbentGlobal]--> 0x7f0897082c00 [nsGlobalWindowInner # 2147483662 inner https://www.example.com] + +Root 0x7f0882fe3230 is a ref counted object with 1 unknown edge(s). + known edges: + 0x7f08975a24c0 [FragmentOrElement (xhtml) head https://www.example.com] --[mAttrsAndChildren[i]]--> 0x7f0882fe3230 + 0x7f08967e7b20 [JS Object (HTMLScriptElement)] --[UnwrapDOMObject(obj)]--> 0x7f0882fe3230 + +The first two lines mean that the script element `0x7f0882fe3230` +contains a strong reference to the EventListenerManager +`0x7f0899b4e550`. "[via hash] mListenerManager" is a description of +that strong reference. Together, these lines show a chain of strong +references from an object the cycle collector thinks needs to be kept +alive, `0x7f0899b4e550`, to the object` 0x7f0897082c00` that you asked +about. Most of the time, the actual chain is not important, because the +cycle collector can only tell us about what went right. Let us call the +address of the leaking object (`0x7f0882fe3230` in this case) +`$leakaddr`. + +Besides `$leakaddr`, the other interesting part is the chunk at the +bottom. It tells us that there is 1 unknown edge, and 2 known edges. +What this means is that the leaking object has a refcount of 3, but the +cycle collector was only told about these two references. In this case, +a head element and a JS object (the JS reflector of the script element). +We need to figure out what the unknown reference is from, as that is +where our leak really is. + +### Figure out what is holding the leaking object alive. + +Now we need to use the DMD heap scan logs. These contain the contents of +every live block of memory. + +The first step to using the DMD heap scan logs is to do some +pre-processing for the DMD log. Stacks need to be symbolicated, and we +need to clamp the values contained in the heap. Clamping is the same +kind of analysis that a conservative GC does: if a word-aligned value in +a heap block points to somewhere within another heap block, replace that +value with the address of the block. + +Both kinds of preprocessing are done by the `dmd.py` script, which can +be invoked like this: + + $objdir/dist/bin/dmd.py --clamp-contents dmd-$pid.log.gz + +This can take a few minutes due to symbolification, but you only need to +run it once on a log file. + +You can also locally symbolicate stacks from DMD logs generated on TreeHerder, +but it will [take a few extra steps](/contributing/debugging/local_symbols.rst) +that you need to do before running `dmd.py`. + +After that is done, we can finally find out which objects (possibly) +point to other objects, using the block_analyzer script: + + python $srcdir/memory/replace/dmd/block_analyzer.py dmd-$pid.log.gz $leakaddr + +This will look through every block of memory in the log, and give some +basic information about any block of memory that (possibly) contains a +pointer to that object. You can pass some additional options to affect +how the results are displayed. "-sfl 10000 -a" is useful. The -sfl 10000 +tells it to not truncate stack frames, and -a tells it to not display +generic frames related to the allocator. + +Caveat: I think block_analyzer.py does not attempt to clamp the address +you pass into it, so if it is an offset from the start of the block, it +won't find it. + + block_analyzer.py` will return a series of entries that look like this + with the [...] indicating where I have removed things): + 0x7f089306b000 size = 4096 bytes at byte offset 2168 + nsAttrAndChildArray::GrowBy[...] + nsAttrAndChildArray::InsertChildAt[...] + [...] + +`0x7f089306b000` is the address of the block that contains `$leakaddr`. +144 bytes is the size of that block. That can be useful for confirming +your guess about what class the block actually is. The byte offset tells +you were in the block the pointer is. This is mostly useful for larger +objects, and you can potentially combine this with debugging information +to figure out exactly what field this is. The rest of the entry is the +stack trace for the allocation of the block, which is the most useful +piece of information. + +What you need to do now is to go through every one of these entries and +place it into three categories: strong reference known to the cycle +collector, weak reference, or something else! The goal is to eventually +shrink down the "something else" category until there are only as many +things in it as there are unknown references to the leaking object, and +then you have your leaker. + +To place an entry into one of the categories, you must look at the code +locations given in the stack trace, and see if you can tell what the +object is based on that, then compare that to what `find_roots.py` told +you. + +For instance, one of the strong references in the CC log is from a head +element to its child via `mAttrsAndChildren`, and that sounds a lot like +this, so we can mark it as being a strong known reference. + +This is an iterative process, where you first go through and mark off +the things that are easily categorizable, and repeat until you have a +small list of things to analyze. + +### Example analysis of block_analyzer.py results + +In one debugging session where I was investigating the leak from bug +1451985, I eventually reduced the list of entries until this was the +most suspicious looking entry: + + 0x7f0892f29630 size = 392 bytes at byte offset 56 + mozilla::dom::ScriptLoader::ProcessExternalScript[...] + [...] + +I went to that line of `ScriptLoader::ProcessExternalScript()`, and it +contained a call to ScriptLoader::CreateLoadRequest(). Fortunately, this +method mostly just contains two calls to `new`, one for +`ScriptLoadRequest` and one for `ModuleLoadRequest`. (This is where an +unoptimized build comes in handy, as it would have pointed out the exact +line. Unfortunately, in this particular case, the unoptimized build was +so slow I wasn't getting any logs.) I then looked through the list of +leaked objects generated by `XPCOM_MEM_BLOAT_LOG` and saw that we were +leaking a `ScriptLoadRequest`, so I went and looked at its class +definition, where I noticed that `ScriptLoadRequest` had a strong +reference to an element that it wasn't telling the cycle collector +about, which seemed suspicious. + +The first thing I did to try to confirm that this was the source of the +leak was pass the address of this object into the cycle collector +analysis log, `find_roots.py`, that we used at an earlier step. That +gave a result that contained this: + + 0x7f0882fe3230 [FragmentOrElement (xhtml) script [...] + --[mNodeInfo]--> 0x7f0897431f00 [NodeInfo (xhtml) script] + [...] + --[mLoadingAsyncRequests]--> 0x7f0892f29630 [ScriptLoadRequest] + +This confirms that this block is actually a ScriptLoadRequest. Secondly, +notice that the load request is being held alive by the very same script +element that is causing the window leak! This strongly suggests that +there is a cycle of strong references between the script element and the +load request. I then added the missing field to the traverse and unlink +methods of ScriptLoadRequest, and confirmed that I couldn't reproduce +the leak. + +Keep in mind that you may need to run `block_analyzer.py` multiple +times. For instance, if the script element was being held alive by some +container being held alive by a runnable, we'd first need to figure out +that the container was holding the element. If it isn't possible to +figure out what is holding that alive, you'd have to run block_analyzer +again. This isn't too bad, because unlike ref count logging, we have the +full state of memory in our existing log, so we don't need to run the +browser again. diff --git a/docs/performance/memory/leak_gauge.md b/docs/performance/memory/leak_gauge.md new file mode 100644 index 0000000000..153303549e --- /dev/null +++ b/docs/performance/memory/leak_gauge.md @@ -0,0 +1,45 @@ +# Leak Gauge + +Leak Gauge is a tool that can be used to detect certain kinds of leaks +in Gecko, including those involving documents, window objects, and +docshells. It has two parts: instrumentation in Gecko that produces a +log file, and a script to post-process the log file. + +## Getting a log file + +To get a log file, run the browser with these settings: + + NSPR_LOG_MODULES=DOMLeak:5,DocumentLeak:5,nsDocShellLeak:5,NodeInfoManagerLeak:5 + NSPR_LOG_FILE=nspr.log # or any other filename of your choice + +This overwrites any existing file named `nspr.log`. The browser runs +with a negligible slowdown. For reliable results, exit the browser +before post-processing the log file. + +## Post-processing the log file + +Post-process the log file with +[tools/leak-gauge/leak-gauge.pl](https://searchfox.org/mozilla-central/source/tools/leak-gauge/leak-gauge.html) + +If there are no leaks, the output looks like this: + + Results of processing log leak.log : + Summary: + Leaked 0 out of 11 DOM Windows + Leaked 0 out of 44 documents + Leaked 0 out of 3 docshells + Leaked content nodes in 0 out of 0 documents + +If there are leaks, the output looks like this: + + Results of processing log leak2.log : + Leaked outer window 2c6e410 at address 2c6e410. + Leaked outer window 2c6ead0 at address 2c6ead0. + Leaked inner window 2c6ec80 (outer 2c6ead0) at address 2c6ec80. + Summary: + Leaked 13 out of 15 DOM Windows + Leaked 35 out of 46 documents + Leaked 4 out of 4 docshells + Leaked content nodes in 42 out of 53 documents + +If you find leaks, please file a bug report. diff --git a/docs/performance/memory/leak_hunting_strategies_and_tips.md b/docs/performance/memory/leak_hunting_strategies_and_tips.md new file mode 100644 index 0000000000..a5689223ea --- /dev/null +++ b/docs/performance/memory/leak_hunting_strategies_and_tips.md @@ -0,0 +1,219 @@ +# Leak hunting strategies and tips + +This document is old and some of the information is out-of-date. Use +with caution. + +## Strategy for finding leaks + +When trying to make a particular testcase not leak, I recommend focusing +first on the largest object graphs (since these entrain many smaller +objects), then on smaller reference-counted object graphs, and then on +any remaining individual objects or small object graphs that don't +entrain other objects. + +Because (1) large graphs of leaked objects tend to include some objects +pointed to by global variables that confuse GC-based leak detectors, +which can make leaks look smaller (as in [bug +99180](https://bugzilla.mozilla.org/show_bug.cgi?id=99180){.external +.text}) or hide them completely and (2) large graphs of leaked objects +tend to hide smaller ones, it's much better to go after the large +graphs of leaks first. + +A good general pattern for finding and fixing leaks is to start with a +task that you want not to leak (for example, reading email). Start +finding and fixing leaks by running part of the task under nsTraceRefcnt +logging, gradually building up from as little as possible to the +complete task, and fixing most of the leaks in the first steps before +adding additional steps. (By most of the leaks, I mean the leaks of +large numbers of different types of objects or leaks of objects that are +known to entrain many non-logged objects such as JS objects. Seeing a +leaked `GlobalWindowImpl`, `nsXULPDGlobalObject`, +`nsXBLDocGlobalObject`, or `nsXPCWrappedJS` is a sign that there could +be significant numbers of JS objects leaked.) + +For example, start with bringing up the mail window and closing the +window without doing anything. Then go on to selecting a folder, then +selecting a message, and then other activities one does while reading +mail. + +Once you've done this, and it doesn't leak much, then try the action +under trace-malloc or LSAN or Valgrind to find the leaks of smaller +graphs of objects. (When I refer to the size of a graph of objects, I'm +referring to the number of objects, not the size in bytes. Leaking many +copies of a string could be a very large leak, but the object graphs are +small and easy to identify using GC-based leak detection.) + +## What leak tools do we have? + +| Tool | Finds | Platforms | Requires | +|------------------------------------------|------------------------------------------------------|---------------------|--------------| +| Leak tools for large object graphs | | | | +| [Leak Gauge](leak_gauge.md) | Windows, documents, and docshells only | All platforms | Any build | +| [GC and CC logs](gc_and_cc_logs.md) | JS objects, DOM objects, many other kinds of objects | All platforms | Any build | +| Leak tools for medium-size object graphs | | | | +| [BloatView](bloatview.md), [refcount tracing and balancing](refcount_tracing_and_balancing.md) | Objects that implement `nsISupports` or use `MOZ_COUNT_{CTOR,DTOR}` | All tier 1 platforms | Debug build (or build opt with `--enable-logrefcnt`)| +| Leak tools for debugging memory growth that is cleaned up on shutdown | | | + +## Common leak patterns + +When trying to find a leak of reference-counted objects, there are a +number of patterns that could cause the leak: + +1. Ownership cycles. The most common source of hard-to-fix leaks is + ownership cycles. If you can avoid creating cycles in the first + place, please do, since it's often hard to be sure to break the + cycle in every last case. Sometimes these cycles extend through JS + objects (discussed further below), and since JS is + garbage-collected, every pointer acts like an owning pointer and the + potential for fan-out is larger. See [bug + 106860](https://bugzilla.mozilla.org/show_bug.cgi?id=106860){.external + .text} and [bug + 84136](https://bugzilla.mozilla.org/show_bug.cgi?id=84136){.external + .text} for examples. (Is this advice still accurate now that we have + a cycle collector? \--Jesse) +2. Dropping a reference on the floor by: + 1. Forgetting to release (because you weren't using `nsCOMPtr` + when you should have been): See [bug + 99180](https://bugzilla.mozilla.org/show_bug.cgi?id=99180){.external + .text} or [bug + 93087](https://bugzilla.mozilla.org/show_bug.cgi?id=93087){.external + .text} for an example or [bug + 28555](https://bugzilla.mozilla.org/show_bug.cgi?id=28555){.external + .text} for a slightly more interesting one. This is also a + frequent problem around early returns when not using `nsCOMPtr`. + 2. Double-AddRef: This happens most often when assigning the result + of a function that returns an AddRefed pointer (bad!) into an + `nsCOMPtr` without using `dont_AddRef()`. See [bug + 76091](https://bugzilla.mozilla.org/show_bug.cgi?id=76091){.external + .text} or [bug + 49648](https://bugzilla.mozilla.org/show_bug.cgi?id=49648){.external + .text} for an example. + 3. \[Obscure\] Double-assignment into the same variable: If you + release a member variable and then assign into it by calling + another function that does the same thing, you can leak the + object assigned into the variable by the inner function. (This + can happen equally with or without `nsCOMPtr`.) See [bug + 38586](https://bugzilla.mozilla.org/show_bug.cgi?id=38586){.external + .text} and [bug + 287847](https://bugzilla.mozilla.org/show_bug.cgi?id=287847){.external + .text} for examples. +3. Dropping a non-refcounted object on the floor (especially one that + owns references to reference counted objects). See [bug + 109671](https://bugzilla.mozilla.org/show_bug.cgi?id=109671){.external + .text} for an example. +4. Destructors that should have been virtual: If you expect to override + an object's destructor (which includes giving a derived class of it + an `nsCOMPtr` member variable) and delete that object through a + pointer to the base class using delete, its destructor better be + virtual. (But we have many virtual destructors in the codebase that + don't need to be -- don't do that.) + +## Debugging leaks that go through XPConnect + +Many large object graphs that leak go through +[XPConnect](http://www.mozilla.org/scriptable/){.external .text}. This +can mean there will be XPConnect wrapper objects showing up as owning +the leaked objects, but it doesn't mean it's XPConnect's fault +(although that [has been known to +happen](https://bugzilla.mozilla.org/show_bug.cgi?id=76102){.external +.text}, it's rare). Debugging leaks that go through XPConnect requires +a basic understanding of what XPConnect does. XPConnect allows an XPCOM +object to be exposed to JavaScript, and it allows certain JavaScript +objects to be exposed to C++ code as normal XPCOM objects. + +When a C++ object is exposed to JavaScript (the more common of the two), +an XPCWrappedNative object is created. This wrapper owns a reference to +the native object until the corresponding JavaScript object is +garbage-collected. This means that if there are leaked GC roots from +which the wrapper is reachable, the wrapper will never release its +reference on the native object. While this can be debugged in detail, +the quickest way to solve these problems is often to simply debug the +leaked JS roots. These roots are printed on shutdown in DEBUG builds, +and the name of the root should give the type of object it is associated +with. + +One of the most common ways one could leak a JS root is by leaking an +`nsXPCWrappedJS` object. This is the wrapper object in the reverse +direction \-- when a JS object is used to implement an XPCOM interface +and be used transparently by native code. The `nsXPCWrappedJS` object +creates a GC root that exists as long as the wrapper does. The wrapper +itself is just a normal reference-counted object, so a leaked +`nsXPCWrappedJS` can be debugged using the normal refcount-balancer +tools. + +If you really need to debug leaks that involve JS objects closely, you +can get detailed printouts of the paths JS uses to mark objects when it +is determining the set of live objects by using the functions added in +[bug +378261](https://bugzilla.mozilla.org/show_bug.cgi?id=378261){.external +.text} and [bug +378255](https://bugzilla.mozilla.org/show_bug.cgi?id=378255){.external +.text}. (More documentation of this replacement for GC_MARK_DEBUG, the +old way of doing it, would be useful. It may just involve setting the +`XPC_SHUTDOWN_HEAP_DUMP` environment variable to a file name, but I +haven't tested that.) + +## Post-processing of stack traces + +On Mac and Linux, the stack traces generated by our internal debugging +tools don't have very good symbol information (since they just show the +results of `dladdr`). The stacks can be significantly improved (better +symbols, and file name / line number information) by post-processing. +Stacks can be piped through the script `tools/rb/fix_stacks.py` to do +this. These scripts are designed to be run on balance trees in addition +to raw stacks; since they are rather slow, it is often **much faster** +to generate balance trees (e.g., using `make-tree.pl` for the refcount +balancer or `diffbloatdump.pl --use-address` for trace-malloc) and*then* +run the balance trees (which are much smaller) through the +post-processing. + +## Getting symbol information for system libraries + +### Windows + +Setting the environment variable `_NT_SYMBOL_PATH` to something like +`symsrv*symsrv.dll*f:\localsymbols*http://msdl.microsoft.com/download/symbols` +as described in [Microsoft's +article](http://support.microsoft.com/kb/311503){.external .text}. This +needs to be done when running, since we do the address to symbol mapping +at runtime. + +### Linux + +Many Linux distros provide packages containing external debugging +symbols for system libraries. `fix_stacks.py` uses this debugging +information (although it does not verify that they match the library +versions on the system). + +For example, on Fedora, these are in \*-debuginfo RPMs (which are +available in yum repositories that are disabled by default, but easily +enabled by editing the system configuration). + +## Tips + +### Disabling Arena Allocation + +With many lower-level leak tools (particularly trace-malloc based ones, +like leaksoup) it can be helpful to disable arena allocation of objects +that you're interested in, when possible, so that each object is +allocated with a separate call to malloc. Some places you can do this +are: + +layout engine +: Define `DEBUG_TRACEMALLOC_FRAMEARENA` where it is commented out in + `layout/base/nsPresShell.cpp` + +glib +: Set the environment variable `G_SLICE=always-malloc` + +## Other References + +- [Performance + tools](https://wiki.mozilla.org/Performance:Tools "Performance:Tools") +- [Leak Debugging Screencasts](https://dbaron.org/mozilla/leak-screencasts/){.external + .text} +- [LeakingPages](https://wiki.mozilla.org/LeakingPages "LeakingPages") - + a list of pages known to leak +- [mdc:Performance](https://developer.mozilla.org/en/Performance "mdc:Performance"){.extiw} - + contains documentation for all of our memory profiling and leak + detection tools diff --git a/docs/performance/memory/memory.md b/docs/performance/memory/memory.md new file mode 100644 index 0000000000..d571fb6b9c --- /dev/null +++ b/docs/performance/memory/memory.md @@ -0,0 +1,64 @@ +# Memory Tools + +The Memory tool lets you take a snapshot of the current tab's memory +[heap](https://en.wikipedia.org/wiki/Memory_management#HEAP). +It then provides a number of views of the heap that can +show you which objects account for memory usage and exactly where in +your code you are allocating memory. + +<iframe width="595" height="325" src="https://www.youtube.com/embed/DJLoq5E5ww0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"></iframe> + +------------------------------------------------------------------------ + +## The basics +- Opening [the memory + tool](basic_operations.md#opening-the-memory-tool) +- [Taking a heap + snapshot](basic_operations.md#saving-and-loading-snapshots) +- [Comparing two + snapshots](basic_operations.md#comparing-snapshots) +- [Deleting + snapshots](basic_operations.md#clearing-a-snapshot) +- [Saving and loading + snapshots](basic_operations.md#saving-and-loading-snapshots) +- [Recording call + stacks](basic_operations.md#recording-call-stacks) + +------------------------------------------------------------------------ + +## Analyzing snapshots + +The Tree map view is new in Firefox 48, and the Dominators view is new +in Firefox 46. + +Once you've taken a snapshot, there are three main views the Memory +tool provides: + +- [the Tree map view](tree_map_view.md) shows + memory usage as a + [treemap](https://en.wikipedia.org/wiki/Treemapping). +- [the Aggregate view](aggregate_view.md) shows + memory usage as a table of allocated types. +- [the Dominators view](dominators_view.md) + shows the "retained size" of objects: that is, the size of objects + plus the size of other objects that they keep alive through + references. + +If you've opted to record allocation stacks for the snapshot, the +Aggregate and Dominators views can show you exactly where in your code +allocations are happening. + +------------------------------------------------------------------------ + +## Concepts + +- What are [Dominators](dominators.md)? + +------------------------------------------------------------------------ + +## Example pages + +Examples used in the Memory tool documentation. + +- The [Monster example](monster_example.md) +- The [DOM allocation example](DOM_allocation_example.md) diff --git a/docs/performance/memory/monster_example.md b/docs/performance/memory/monster_example.md new file mode 100644 index 0000000000..d351803a8d --- /dev/null +++ b/docs/performance/memory/monster_example.md @@ -0,0 +1,79 @@ +# Monster example slug + +This article describes a very simple web page that we'll use to +illustrate some features of the Memory tool. + +You can try the site at +<https://mdn.github.io/performance-scenarios/js-allocs/alloc.html>. +Heres the code: + +```js +var MONSTER_COUNT = 5000; +var MIN_NAME_LENGTH = 2; +var MAX_NAME_LENGTH = 48; + +function Monster() { + + function randomInt(min, max) { + return Math.floor(Math.random() * (max - min + 1)) + min; + } + + function randomName() { + var chars = "abcdefghijklmnopqrstuvwxyz"; + var nameLength = randomInt(MIN_NAME_LENGTH, MAX_NAME_LENGTH); + var name = ""; + for (var j = 0; j < nameLength; j++) { + name += chars[randomInt(0, chars.length-1)]; + } + return name; + } + + this.name = randomName(); + this.eyeCount = randomInt(0, 25); + this.tentacleCount = randomInt(0, 250); +} + +function makeMonsters() { + var monsters = { + "friendly": [], + "fierce": [], + "undecided": [] + }; + + for (var i = 0; i < MONSTER_COUNT; i++) { + monsters.friendly.push(new Monster()); + } + + for (var i = 0; i < MONSTER_COUNT; i++) { + monsters.fierce.push(new Monster()); + } + + for (var i = 0; i < MONSTER_COUNT; i++) { + monsters.undecided.push(new Monster()); + } + + console.log(monsters); +} + +var makeMonstersButton = document.getElementById("make-monsters"); +makeMonstersButton.addEventListener("click", makeMonsters); +``` + +The page contains a button: when you push the button, the code creates +some monsters. Specifically: + +- the code creates an object with three properties, each an array: + - one for fierce monsters + - one for friendly monsters + - one for monsters who haven't decided yet. +- for each array, the code creates and appends 5000 + randomly-initialized monsters. Each monster has: + - a string, for the monster's name + - a number representing the number of eyes it has + - a number representing the number of tentacles it has. + +So the structure of the memory allocated on the JavaScript heap is an +object containing three arrays, each containing 5000 objects (monsters), +each object containing a string and two integers: + +[![](../img/monsters.svg)] diff --git a/docs/performance/memory/refcount_tracing_and_balancing.md b/docs/performance/memory/refcount_tracing_and_balancing.md new file mode 100644 index 0000000000..fe3e3c7ae4 --- /dev/null +++ b/docs/performance/memory/refcount_tracing_and_balancing.md @@ -0,0 +1,235 @@ +# Refcount Tracing and Balancing + +Refcount tracing and balancing are advanced techniques for tracking down +leak of refcounted objects found with +[BloatView](bloatview.md). The first step +is to run Firefox with refcount tracing enabled, which produces one or +more log files. Refcount tracing logs calls to `Addref` and `Release`, +preferably for a particular set of classes, including call-stacks in +symbolic form (on platforms that support this). Refcount balancing is a +follow-up step that analyzes the resulting log to help a developer +figure out where refcounting went wrong. + +## How to build for refcount tracing + +Build with `--enable-debug` or `--enable-logrefcnt`. + +## How to run with refcount tracing on + +There are several environment variables that can be used. + +First, you select one of three environment variables to choose what kind +of logging you want. You almost certainly want `XPCOM_MEM_REFCNT_LOG`. + +NOTE: Due to an issue with the sandbox on Windows (bug +[1345568](https://bugzilla.mozilla.org/show_bug.cgi?id=1345568) +refcount logging currently requires the MOZ_DISABLE_CONTENT_SANDBOX +environment variable to be set. + +`XPCOM_MEM_REFCNT_LOG` + +Setting this environment variable enables refcount tracing. If you set +this environment variable to the name of a file, the log will be output +to that file. You can also set it to 1 to log to stdout or 2 to log to +stderr, but these logs are large and expensive to capture, so you +probably don't want to do that. **WARNING**: you should never use this +without `XPCOM_MEM_LOG_CLASSES` and/or `XPCOM_MEM_LOG_OBJECTS`, because +without some filtering the logging will be completely useless due to how +slow the browser will run and how large the logs it produces will be. + +`XPCOM_MEM_COMPTR_LOG` + +This environment variable enables logging of additions and releases of +objects into `nsCOMPtr`s. This requires C++ dynamic casts, so it is not +supported on all platforms. However, having an nsCOMPtr log and using it +in the creation of the balance tree allows AddRef and Release calls that +we know are matched to be eliminated from the tree, so it makes it much +easier to debug reference count leaks of objects that have a large +amount of reference counting traffic. + +`XPCOM_MEM_ALLOC_LOG` + +For platforms that don't have stack-crawl support, XPCOM supports +logging at the call site to `AddRef`/`Release` using the usual cpp +`__FILE__` and __LINE__ number macro expansion hackery. This results +in slower code, but at least you get some data about where the leaks +might be occurring from. + +You must also set one or two additional environment variables, +`XPCOM_MEM_LOG_CLASSES` and `XPCOM_MEM_LOG_OBJECTS,` to reduce the set +of objects being logged, in order to improve performance to something +vaguely tolerable. + +`XPCOM_MEM_LOG_CLASSES` + +This variable should contain a comma-separated list of names which will +be used to compare against the types of the objects being logged. For +example: + + env XPCOM_MEM_LOG_CLASSES=nsDocShell XPCOM_MEM_REFCNT_LOG=./refcounts.log ./mach run + +This will log the `AddRef` and `Release` calls only for instances of +`nsDocShell` while running the browser using `mach`, to a file +`refcounts.log`. Note that setting `XPCOM_MEM_LOG_CLASSES` will also +list the *serial number* of each object that leaked in the "bloat log" +(that is, the file specified by the `XPCOM_MEM_BLOAT_LOG` variable; see +[the BloatView documentation](bloatview.md) +for more details). An object's serial number is simply a unique number, +starting at one, that is assigned to the object when it is allocated. + +You may use an object's serial number with the following variable to +further restrict the reference count tracing: + + XPCOM_MEM_LOG_OBJECTS + +Set this variable to a comma-separated list of object *serial number* or +ranges of *serial number*, e.g., `1,37-42,73,165` (serial numbers start +from 1, not 0). When this is set, along with `XPCOM_MEM_LOG_CLASSES` and +`XPCOM_MEM_REFCNT_LOG`, a stack track will be generated for *only* the +specific objects that you list. For example, + + env XPCOM_MEM_LOG_CLASSES=nsDocShell XPCOM_MEM_LOG_OBJECTS=2 XPCOM_MEM_REFCNT_LOG=./refcounts.log ./mach run + +will log stack traces to `refcounts.log` for the 2nd `nsDocShell` object +that gets allocated, and nothing else. + +## **Post-processing step 1: finding the leakers** + +First you have to figure out which objects leaked. The script +`tools/rb/find_leakers.py` does this. It grovels through the log file, +and figures out which objects got allocated (it knows because they were +just allocated because they got `AddRef()`-ed and their refcount became +1). It adds them to a list. When it finds an object that got freed (it +knows because its refcount goes to 0), it removes it from the list. +Anything left over is leaked. + +The scripts output looks like the following. + + 0x00253ab0 (1) + 0x00253ae0 (2) + 0x00253bd0 (4) + +The number in parentheses indicates the order in which it was allocated, +if you care. Pick one of these pointers for use with Step 2. + +## Post-processing step 2: filtering the log + +Once you've picked an object that leaked, you can use +`tools/rb/filter-log.pl` to filter the log file to drop the call +stack for other objects; This process reduces the size of the log file +and also improves the performance. + + perl -w tools/rb/filter-log.pl --object 0x00253ab0 < ./refcounts.log > my-leak.log + +### Symbolicating stacks + +The log files often lack function names, file +names and line numbers. You'll need to run a script to fix the call +stack. + + python3 tools/rb/fix_stacks.py < ./refcounts.log > fixed_stack.log + +Also, it is possible to [locally symbolicate](/contributing/debugging/local_symbols.rst) +logs generated on TreeHerder. + +## **Post-processing step 3: building the balance tree** + +Now that you've the log file fully prepared, you can build a *balance +tree*. This process takes all the stack `AddRef()` and `Release()` stack +traces and munges them into a call graph. Each node in the graph +represents a call site. Each call site has a *balance factor*, which is +positive if more `AddRef()`s than `Release()`es have happened at the +site, zero if the number of `AddRef()`s and `Release()`es are equal, and +negative if more `Release()`es than `AddRef()`s have happened at the +site. + +To build the balance tree, run `tools/rb/make-tree.pl`, specifying the +object of interest. For example: + + perl -w tools/rb/make-tree.pl --object 0x00253ab0 < my-leak.log + +This will build an indented tree that looks something like this (except +probably a lot larger and leafier): + + .root: bal=1 + main: bal=1 + DoSomethingWithFooAndReturnItToo: bal=2 + NS_NewFoo: bal=1 + +Let's pretend in our toy example that `NS_NewFoo()` is a factory method +that makes a new foo and returns it. +`DoSomethingWithFooAndReturnItToo()` is a method that munges the foo +before returning it to `main()`, the main program. + +What this little tree is telling you is that you leak *one refcount* +overall on object `0x00253ab0`. But, more specifically, it shows you +that: + +- `NS_NewFoo()` "leaks" a refcount. This is probably "okay" + because it's a factory method that creates an `AddRef()`-ed object. +- `DoSomethingWithFooAndReturnItToo()` leaks *two* refcounts. + Hmm...this probably isn't okay, especially because... +- `main()` is back down to leaking *one* refcount. + +So from this, we can deduce that `main()` is correctly releasing the +refcount that it got on the object returned from +`DoSomethingWithFooAndReturnItToo()`, so the leak *must* be somewhere in +that function. + +So now say we go fix the leak in `DoSomethingWithFooAndReturnItToo()`, +re-run our trace, grovel through the log by hand to find the object that +corresponds to `0x00253ab0` in the new run, and run `make-tree.pl`. What +we'd hope to see is a tree that looks like: + + .root: bal=0 + main: bal=0 + DoSomethingWithFooAndReturnItToo: bal=1 + NS_NewFoo: bal=1 + +That is, `NS_NewFoo()` "leaks" a single reference count; this leak is +"inherited" by `DoSomethingWithFooAndReturnItToo()`; but is finally +balanced by a `Release()` in `main()`. + +## Hints + +Clearly, this is an iterative and analytical process. Here are some +tricks that make it easier. + +**Check for leaks from smart pointers.** If the leak comes from a smart +pointer that is logged in the XPCOM_MEM_COMPTR_LOG, then +find-comptr-leakers.pl will find the exact stack for you, and you don't +have to look at trees. + +**Ignore balanced trees**. The `make-tree.pl` script accepts an option +`--ignore-balanced`, which tells it *not* to bother printing out the +children of a node whose balance factor is zero. This can help remove +some of the clutter from an otherwise noisy tree. + +**Ignore matched releases from smart pointers.** If you've checked (see +above) that the leak wasn't from a smart pointer, you can ignore the +references that came from smart pointers (where we can use the pointer +identity of the smart pointer to match the AddRef and the Release). This +requires using an XPCOM_MEM_REFCNT_LOG and an XPCOM_MEM_COMPTR_LOG that +were collected at the same time. For more details, see the [old +documentation](http://www-archive.mozilla.org/performance/leak-tutorial.html) +(which should probably be incorporated here). This is best used with +`--ignore-balanced` + +**Play Mah Jongg**. An unbalanced tree is not necessarily an evil thing. +More likely, it indicates that one `AddRef()` is cancelled by another +`Release()` somewhere else in the code. So the game is to try to match +them with one another. + +**Exclude Functions.** To aid in this process, you can create an +"excludes file", that lists the name of functions that you want to +exclude from the tree building process (presumably because you've +matched them). `make-tree.pl` has an option `--exclude [file]`, where +`[file]` is a newline-separated list of function names that will be +*excluded* from consideration while building the tree. Specifically, any +call stack that contains that call site will not contribute to the +computation of balance factors in the tree. + +## How to instrument your objects for refcount tracing and balancing + +The process is the same as instrumenting them for BloatView because BloatView +and refcount tracing share underlying infrastructure. diff --git a/docs/performance/memory/tree_map_view.md b/docs/performance/memory/tree_map_view.md new file mode 100644 index 0000000000..30d9968db6 --- /dev/null +++ b/docs/performance/memory/tree_map_view.md @@ -0,0 +1,62 @@ +# Tree map view + +The Tree map view is new in Firefox 48. + +The Tree map view provides a visual representation of the snapshot, that +helps you quickly get an idea of which objects are using the most +memory. + +A treemap displays [\"hierarchical (tree-structured) data as a set of +nested rectangles\"](https://en.wikipedia.org/wiki/Treemapping). The +size of the rectangles corresponds to some quantitative aspect of the +data. + +For the treemaps shown in the Memory tool, things on the heap are +divided at the top level into four categories: + +- **objects**: JavaScript and DOM objects, such as `Function`, + `Object`, or `Array`, and DOM types like `Window` and + `HTMLDivElement`. +- **scripts**: JavaScript sources loaded by the page. +- **strings** +- **other**: this includes internal + [SpiderMonkey](https://developer.mozilla.org/en-US/docs/Tools/Tools_Toolbox#settings/en-US/docs/Mozilla/Projects/SpiderMonkey) objects. + +Each category is represented with a rectangle, and the size of the +rectangle corresponds to the proportion of the heap occupied by items in +that category. This means you can quickly get an idea of roughly what +sorts of things allocated by your site are using the most memory. + +Within top-level categories: + +- **objects** is further divided by the object's type. +- **scripts** is further subdivided by the script's origin. It also + includes a separate rectangle for code that can't be correlated + with a file, such as JIT-optimized code. +- **other** is further subdivided by the object's type. + +Here are some example snapshots, as they appear in the Tree map view: + +![](../img/treemap-domnodes.png) + +This treemap is from the [DOM allocation +example](DOM_allocation_example.md), which runs a +script that creates a large number of DOM nodes (200 `HTMLDivElement` +objects and 4000 `HTMLSpanElement` objects). You can see how almost all +the heap usage is from the `HTMLSpanElement` objects that it creates. + +![](../img/treemap-monsters.png) + +This treemap is from the [monster allocation +example](monster_example.md), which creates three +arrays, each containing 5000 monsters, each monster having a +randomly-generated name. You can see that most of the heap is occupied +by the strings used for the monsters' names, and the objects used to +contain the monsters' other attributes. + +![](../img/treemap-bbc.png) + +This treemap is from <http://www.bbc.com/>, and is probably more +representative of real life than the examples. You can see the much +larger proportion of the heap occupied by scripts, that are loaded from +a large number of origins. |