js/src/doc/HazardAnalysis/index.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98

# Static Analysis for Rooting and Heap Write Hazards

Treeherder can run two static analysis builds: the full browser (linux64-haz), just the JS shell (linux64-shell-haz). They show up on treeherder as `H` and `SM(H)`.

## Diagnosing a hazard failure

The first step is to look at what sort of hazard is being reported. There are two types that cause the job to fail: stack rooting hazards for garbage collection, and heap write thread safety hazards for stylo.

The summary output will include either the string `<N> rooting hazards detected` or `<N> heap write hazards detected out of <M> allowed`. See the appropriate section below for each.

## Diagnosing a rooting hazards failure

Click on the `H` build link, select the "Artifacts" pane on the bottom left, and download the `public/build/hazards.txt.gz` file.

Example snippet:

    Function 'jsopcode.cpp:uint8 DecompileExpressionFromStack(JSContext*, int32, int32, class JS::Handle<JS::Value>, int8**)' has unrooted 'ed' of type 'ExpressionDecompiler' live across GC call 'uint8 ExpressionDecompiler::decompilePC(uint8*)' at js/src/jsopcode.cpp:1866
        js/src/jsopcode.cpp:1866: Assume(74,75, !__temp_23*, true)
        js/src/jsopcode.cpp:1867: Assign(75,76, return := 0)
        js/src/jsopcode.cpp:1867: Call(76,77, ed.~ExpressionDecompiler())
    GC Function: uint8 ExpressionDecompiler::decompilePC(uint8*)
        JSString* js::ValueToSource(JSContext*, class JS::Handle<JS::Value>)
        uint8 js::Invoke(JSContext*, JS::Value*, JS::Value*, uint32, JS::Value*, class JS::MutableHandle<JS::Value>)
        uint8 js::Invoke(JSContext*, JS::CallArgs, uint32)
        JSScript* JSFunction::getOrCreateScript(JSContext*)
        uint8 JSFunction::createScriptForLazilyInterpretedFunction(JSContext*, class JS::Handle<JSFunction*>)
        uint8 JSRuntime::cloneSelfHostedFunctionScript(JSContext*, class JS::Handle<js::PropertyName*>, class JS::Handle<JSFunction*>)
        JSScript* js::CloneScript(JSContext*, class JS::Handle<JSObject*>, class JS::Handle<JSFunction*>, const class JS::Handle<JSScript*>, uint32)
        JSObject* js::CloneStaticBlockObject(JSContext*, class JS::Handle<JSObject*>, class JS::Handle<js::StaticBlockObject*>)
        js::StaticBlockObject* js::StaticBlockObject::create(js::ExclusiveContext*)
        js::Shape* js::EmptyShape::getInitialShape(js::ExclusiveContext*, js::Class*, js::TaggedProto, JSObject*, JSObject*, uint32, uint32)
        js::Shape* js::EmptyShape::getInitialShape(js::ExclusiveContext*, js::Class*, js::TaggedProto, JSObject*, JSObject*, uint64, uint32)
        js::UnownedBaseShape* js::BaseShape::getUnowned(js::ExclusiveContext*, js::StackBaseShape*)
        js::BaseShape* js_NewGCBaseShape(js::ThreadSafeContext*) [with js::AllowGC allowGC = (js::AllowGC)1u]
        js::BaseShape* js::gc::NewGCThing(js::ThreadSafeContext*, uint32, uint64, uint32) [with T = js::BaseShape; js::AllowGC allowGC = (js::AllowGC)1u; size_t = long unsigned int]
        void js::gc::RunDebugGC(JSContext*)
        void js::MinorGC(JSRuntime*, uint32)
        GC

This means that a rooting hazard was discovered at `js/src/jsopcode.cpp` line 1866, in the function `DecompileExpressionFromStack` (it is prefixed with the filename because it's a static function.) The problem is that there is an unrooted variable `ed` that holds an `ExpressionDecompiler` live across a call to `decompilePC`. "Live" means that the variable is used after the call to `decompilePC` returns. `decompilePC` may trigger a GC according to the static call stack given starting from the line beginning with "`GC Function:`".

The hazard itself has some barely comprehensible `Assume(...)` and `Call(...)` gibberish that describes the exact data flow path of the variable into the function call. That stuff is rarely useful -- usually, you'll only need to look at it if it's complaining about a temporary and you want to know where the temporary came from. The type `ExpressionDecompiler` is believed to hold pointers to GC-controlled objects of some sort. The analysis currently does not describe the exact field it is worried about.

To unpack this a little, the analysis is saying the following can happen:

* `ExpressionDecompiler` contains some pointer to a GC thing. For example, it might have a field `obj` of type `JSObject*`.
* `DecompileExpressionFromStack` is called.
* A pointer is stored in that field of the `ed` variable.
* `decompilePC` is invoked, which calls `ValueToSource`, which calls `Invoke`, which eventually calls `js::MinorGC`
* During the resulting garbage collection, the object pointed to by `ed.obj` is moved to a different location. All pointers stored in the JS heap are updated automatically, as are all rooted pointers. `ed.obj` is not, because the GC doesn't know about it.
* After `decompilePC` returns, something accesses `ed.obj`. This is now a stale pointer, and may refer to just about anything -- the wrong object, an invalid object, or whatever. As TeX would say, **badness 10000**.

## Diagnosing a heap write hazard failure

For the thread unsafe heap write analysis, a hazard means that some Gecko_* function calls, directly or indirectly, code that writes to something on the heap, or calls an unknown function that *might* write to something on the heap. The analysis requires quite a few annotations to describe things that are actually safe. This section will be expanded as we gain more experience with the analysis, but here are some common issues:

* Adding a new Gecko_* function: often, you will need to annotate any outparams or owned (thread-local) parameters in the `treatAsSafeArgument` function in `js/src/devtools/rootAnalysis/analyzeHeapWrites.js`.
* Calling some libc function: if you add a call to some random libc function (eg `sin()` or `floor()` or `ceil()`, though the latter two are already annotated), the analysis will report an "External Function". Add it to `checkExternalFunction`, assuming it *doesn't* have the possibility of writing to shared heap memory.
* If you call some non-returning (crashing) function that the analysis doesn't know about, you'll need to add it to `ignoreContents`.

On the other hand, you might have a real thread safety issue on your hands. Shared caches are common problems. Fix it.

## Analysis implementation

These builds do the following:

* set up a build environment and run the analysis within it, then upload the resulting files
 * compile an optimized JS shell to later run the analysis
 * compile the browser with gcc, using a slightly modified version of the sixgill (http://svn.sixgill.org) gcc plugin
* produce a set of `.xdb` files describing everything encountered during the compilation
* analyze the `.xdb` files with scripts in `js/src/devtools/rootAnalysis`

The format of the information stored in those files is [somewhat documented][CFG].

## Running the analysis

### Pushing to try

The easiest way to run an analysis is to push to try with `mach try fuzzy -q "'haz"` (or, if the hazards of interest are contained entirely within `js/src`, use `mach try fuzzy -q "'shell-haz"` for a much faster result). The expected turnaround time for linux64-haz is just under 1.5 hours (~20 minutes for `hazard-linux64-shell-haz`).

The output will be uploaded and an output file `hazards.txt.xz` will be placed into the "Artifacts" info pane on treeherder.

### Running locally

The rooting [hazard analysis may be run][running] using mach.

## So you broke the analysis by adding a hazard. Now what?

Backout, fix the hazard, or (final resort) update the expected number of hazards in `js/src/devtools/rootAnalysis/expect.browser.json` (but don't do that).

The most common way to fix a hazard is to change the variable to be a `Rooted` type, as described in [RootingAPI.h][rooting]

For more complicated cases, ask on the Matrix channel (see [spidermonkey.dev][spidermonkey] for contact info). If you don't get a response, ping sfink or jonco for rooting hazards, bholley or sfink for heap write hazards.

[running]: running.md
[rooting]: https://searchfox.org/mozilla-central/source/js/public/RootingAPI.h
[spidermonkey]: https://spidermonkey.dev/
[CFG]: CFG.md