From 26a029d407be480d791972afb5975cf62c9360a6 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Fri, 19 Apr 2024 02:47:55 +0200 Subject: Adding upstream version 124.0.1. Signed-off-by: Daniel Baumann --- js/src/doc/Debugger/Conventions.md | 254 +++++ js/src/doc/Debugger/Debugger-API.md | 129 +++ js/src/doc/Debugger/Debugger.Environment.md | 182 ++++ js/src/doc/Debugger/Debugger.Frame.md | 489 ++++++++++ js/src/doc/Debugger/Debugger.Memory.md | 575 +++++++++++ js/src/doc/Debugger/Debugger.Object.md | 684 +++++++++++++ js/src/doc/Debugger/Debugger.Script.md | 506 ++++++++++ js/src/doc/Debugger/Debugger.Source.md | 286 ++++++ js/src/doc/Debugger/Debugger.md | 570 +++++++++++ js/src/doc/Debugger/Tutorial-Alloc-Log-Tree.md | 226 +++++ js/src/doc/Debugger/Tutorial-Breakpoint.md | 130 +++ js/src/doc/Debugger/Tutorial-Debugger-Statement.md | 94 ++ js/src/doc/Debugger/alloc-plot-console.png | Bin 0 -> 82359 bytes js/src/doc/Debugger/console.png | Bin 0 -> 41695 bytes js/src/doc/Debugger/debugger-alert.png | Bin 0 -> 27770 bytes js/src/doc/Debugger/enable-chrome-devtools.png | Bin 0 -> 28465 bytes js/src/doc/Debugger/index.rst | 20 + .../Debugger/scratchpad-browser-environment.png | Bin 0 -> 30443 bytes js/src/doc/Debugger/shadows.svg | 1000 ++++++++++++++++++++ js/src/doc/HazardAnalysis/CFG.md | 347 +++++++ js/src/doc/HazardAnalysis/index.md | 100 ++ js/src/doc/HazardAnalysis/running.md | 124 +++ js/src/doc/MIR-optimizations/index.md | 97 ++ js/src/doc/SavedFrame/index.md | 95 ++ js/src/doc/build.rst | 247 +++++ js/src/doc/bytecode_checklist.md | 44 + js/src/doc/feature_checklist.md | 79 ++ js/src/doc/gc.rst | 140 +++ js/src/doc/hacking_tips.md | 588 ++++++++++++ js/src/doc/index.rst | 204 ++++ js/src/doc/test.rst | 89 ++ 31 files changed, 7299 insertions(+) create mode 100644 js/src/doc/Debugger/Conventions.md create mode 100644 js/src/doc/Debugger/Debugger-API.md create mode 100644 js/src/doc/Debugger/Debugger.Environment.md create mode 100644 
js/src/doc/Debugger/Debugger.Frame.md create mode 100644 js/src/doc/Debugger/Debugger.Memory.md create mode 100644 js/src/doc/Debugger/Debugger.Object.md create mode 100644 js/src/doc/Debugger/Debugger.Script.md create mode 100644 js/src/doc/Debugger/Debugger.Source.md create mode 100644 js/src/doc/Debugger/Debugger.md create mode 100644 js/src/doc/Debugger/Tutorial-Alloc-Log-Tree.md create mode 100644 js/src/doc/Debugger/Tutorial-Breakpoint.md create mode 100644 js/src/doc/Debugger/Tutorial-Debugger-Statement.md create mode 100644 js/src/doc/Debugger/alloc-plot-console.png create mode 100644 js/src/doc/Debugger/console.png create mode 100644 js/src/doc/Debugger/debugger-alert.png create mode 100644 js/src/doc/Debugger/enable-chrome-devtools.png create mode 100644 js/src/doc/Debugger/index.rst create mode 100644 js/src/doc/Debugger/scratchpad-browser-environment.png create mode 100644 js/src/doc/Debugger/shadows.svg create mode 100644 js/src/doc/HazardAnalysis/CFG.md create mode 100644 js/src/doc/HazardAnalysis/index.md create mode 100644 js/src/doc/HazardAnalysis/running.md create mode 100644 js/src/doc/MIR-optimizations/index.md create mode 100644 js/src/doc/SavedFrame/index.md create mode 100644 js/src/doc/build.rst create mode 100644 js/src/doc/bytecode_checklist.md create mode 100644 js/src/doc/feature_checklist.md create mode 100644 js/src/doc/gc.rst create mode 100644 js/src/doc/hacking_tips.md create mode 100644 js/src/doc/index.rst create mode 100644 js/src/doc/test.rst (limited to 'js/src/doc') diff --git a/js/src/doc/Debugger/Conventions.md b/js/src/doc/Debugger/Conventions.md new file mode 100644 index 0000000000..5d89eb7343 --- /dev/null +++ b/js/src/doc/Debugger/Conventions.md @@ -0,0 +1,254 @@ +# General Conventions + +This page describes general conventions used in the [`Debugger`][debugger] API, +and defines some terminology used throughout the specification. 
+ + +## Properties + +Properties of objects that comprise the `Debugger` interface, and those +that the interface creates, follow some general conventions: + +- Instances and prototypes are extensible; you can add your own properties + and methods to them. + +- Properties are configurable. This applies to both "own" and prototype + properties, and to both methods and data properties. (Leaving these + properties open to redefinition will hopefully make it easier for + JavaScript debugger code to cope with bugs, bug fixes, and changes in the + interface over time.) + +- Method properties are writable. + +- We prefer inherited accessor properties to own data properties. Both are + read using the same syntax, but inherited accessors seem like a more + accurate reflection of what's going on. Unless otherwise noted, these + properties have getters but no setters, as they cannot meaningfully be + assigned to. + + +## Debuggee Values + +The `Debugger` interface follows some conventions to help debuggers safely +inspect and modify the debuggee's objects and values. Primitive values are +passed freely between debugger and debuggee; copying or wrapping is handled +transparently. Objects received from the debuggee (including host objects +like DOM elements) are fronted in the debugger by `Debugger.Object` +instances, which provide reflection-oriented methods for inspecting their +referents; see `Debugger.Object`, below. + +Of the debugger's objects, only `Debugger.Object` instances may be passed +to the debuggee: when this occurs, the debuggee receives the +`Debugger.Object`'s referent, not the `Debugger.Object` instance itself. + +In the descriptions below, the term "debuggee value" means either a +primitive value or a `Debugger.Object` instance; it is a value that might +be received from the debuggee, or that could be passed to the debuggee. + + +## Debuggee Code + +Each `Debugger` instance maintains a set of global objects that, taken +together, comprise the debuggee. 
Code evaluated in the scope of a debuggee +global object, directly or indirectly, is considered *debuggee code*. +Similarly: + +- a *debuggee frame* is a frame running debuggee code; + +- a *debuggee function* is a function that closes over a debuggee + global object (and thus the function's code is debuggee code); + +- a *debuggee environment* is an environment whose outermost + enclosing environment is a debuggee global object; and + +- a *debuggee script* is a script containing debuggee code. + + +## Completion Values + +The `Debugger` API often needs to convey the result of running some JS code. For example, suppose you get a `frame.onPop` callback telling you that a method in the debuggee just finished. Did it return successfully? Did it throw? What did it return? The debugger passes the `onPop` handler a *completion value* that tells what happened. + +A completion value is one of these: + +* `{ return: value }` + + The code completed normally, returning *value*. *Value* is a + debuggee value. + +* `{ throw: value, stack: stack }` + + The code threw *value* as an exception. *Value* is a debuggee + value. *Stack* is a `SavedFrame` representing the location from which + the value was thrown, and may be missing. + +* `null` + + The code was terminated, as if by the "slow script" ribbon. + +Generators and async functions add a wrinkle: they can suspend themselves (with `yield` or `await`), which removes their frame from the stack. Later, the generator or async frame might be returned to the stack and continue running where it left off. Does it count as "completion" when a generator suspends itself? + +The `Debugger` API says yes. `yield` and `await` do trigger the `frame.onPop` handler, passing a completion value that explains why the frame is being suspended. The completion value gets an extra `.yield` or `.await` property, to distinguish this kind of completion from a normal `return`. 
+ +```js +{ return: value, yield: true } +``` + +where *value* is a debuggee value for the iterator result object, like `{ value: 1, done: false }`, for the yield. + +When a generator function is called, it first evaluates any default argument +expressions and destructures its arguments. Then its frame is suspended, and the +new generator object is returned to the caller. This initial suspension is reported +to any `onPop` handlers as a completion value of the form: + +```js +{ return: generatorObject, yield: true, initial: true } +``` + +where *generatorObject* is a debuggee value for the generator object being +returned to the caller. + +When an async function awaits a promise, its suspension is reported to any +`onPop` handlers as a completion value of the form: + +```js +{ return: promise, await: true } +``` + +where *promise* is a debuggee value for the promise being returned to the +caller. + +The first time a call to an async function awaits, returns, or throws, a promise +of its result is returned to the caller. Subsequent resumptions of the async +call, if any, are initiated directly from the job queue's event loop, with no +calling frame on the stack. Thus, if needed, an `onPop` handler can distinguish +an async call's initial suspension, which returns the promise, from any +subsequent suspensions by checking the `Debugger.Frame`'s `older` property: if +that is `null`, the call was resumed directly from the event loop. + +Async generators are a combination of async functions and generators that can +use both `yield` and `await` expressions. Suspensions of async generator frames +are reported using any combination of the completion values above. + + +## Resumption Values + +As the debuggee runs, the `Debugger` interface calls various +debugger-provided handler functions to report the debuggee's behavior. +Some of these calls can return a value indicating how the debuggee's +execution should continue; these are called *resumption values*. 
A +resumption value has one of the following forms: + +* `undefined` + + The debuggee should continue execution normally. + +* `{ return: value }` + + Force the top frame of the debuggee to return *value* immediately, + as if by executing a `return` statement. *Value* must be a debuggee + value. (Most handler functions support this, except those whose + descriptions say otherwise.) See the list of special cases below. + +* `{ throw: value }` + + Throw *value* as an exception from the current bytecode + instruction. *Value* must be a debuggee value. Note that unlike + completion values, resumption values do not specify a stack. When + initiating an exceptional return from a handler, the current debuggee stack + will be used. If a handler wants to avoid modifying the stack of an + already-thrown exception, it should return `undefined`. + +* `null` + + Terminate the debuggee, as if it had been cancelled by the "slow script" + dialog box. + +In some places, the JS language treats `return` statements specially or +doesn't allow them at all. So there are a few special cases. + +* An arrow function without curly braces can't contain a return + statement, but `{ return: value }` works anyway, + returning the specified value. + + Likewise, if the top frame of the debuggee is not in a function at + all (that is, it's running toplevel code in a `script` tag, or `eval` + code), then *value* is returned even though `return` statements + aren't legal in that kind of code. (In the case of a `script` tag, + the browser discards the return value.) + +* If the debuggee is in a function that was called as a constructor (that + is, via a `new` expression), then *value* serves as the value + returned by the function's body, not that produced by the `new` + expression: if the value is not an object, the `new` expression returns + the frame's `this` value. + + Similarly, if the function is the constructor for a subclass, then a + non-object value may result in a `TypeError`. 
+ +* Returning from a generator simulates a `return`, not a `yield`; + there is no way to force a debuggee generator to `yield`. + + The way generators execute is rather odd. When a generator-function + is first called, it is put onto the stack and runs just a few + bytecode instructions (or more, if the generator-function has any + default argument values to compute), then performs the "initial + suspend". At that point, a new generator object is created and + returned to the caller. Thereafter, the caller may cause execution + of the generator to resume at any time, by calling `genObj.next()`, + and the generator may pause itself again using `yield`. + + JS generators normally can't return before the "initial + suspend"—there’s no place to put a `return` statement—but + `{ return: value }` there works anyway, replacing + the generator object that the initial suspend would normally create + and return. + + Returning from a generator that's been resumed via `genobj.next()` + (or one of the other methods) closes the generator, and the + `genobj.next()` or other method returns a new object of the form + `{ done: true, value: value }`. + +If a debugger hook function throws an exception, rather than returning a +resumption value, we never propagate such an exception to the debuggee; +instead, we call the associated `Debugger` instance's +`uncaughtExceptionHook` property, as described below. + + +## Timestamps + +Timestamps are expressed in units of milliseconds since an arbitrary, +but fixed, epoch. The resolution of timestamps is generally greater +than milliseconds, though no specific resolution is guaranteed. + + +## The `Debugger.DebuggeeWouldRun` Exception + +Some debugger operations that appear to simply inspect the debuggee's state +may actually cause debuggee code to run. 
For example, reading a variable +might run a getter function on the global or on a `with` expression's +operand; and getting an object's property descriptor will run a handler +trap if the object is a proxy. To protect the debugger's integrity, only +methods whose stated purpose is to run debuggee code can do so. These +methods are called [invocation functions][inv fr], and they follow certain +common conventions to report the debuggee's behavior safely. For other +methods, if their normal operation would cause debuggee code to run, they +throw an instance of the `Debugger.DebuggeeWouldRun` exception. + +If there are debugger frames on the stack from multiple `Debugger` instances, the +thrown exception is an instance of the topmost locking debugger's global's +`Debugger.DebuggeeWouldRun`. + +A `Debugger.DebuggeeWouldRun` exception may have a `cause` property, +providing more detailed information on why the debuggee would have run. The +`cause` property's value is one of the following strings: + +* `"proxy"`: Carrying out the operation would have caused a proxy handler to run. +* `"getter"`: Carrying out the operation would have caused an object property getter to run. +* `"setter"`: Carrying out the operation would have caused an object property setter to run. + +If the system can't determine why control attempted to enter the debuggee, +it will leave the exception's `cause` property undefined. + + +[debugger]: Debugger-API.md +[inv fr]: Debugger.Frame.md#invocation-functions-and-debugger-frames diff --git a/js/src/doc/Debugger/Debugger-API.md b/js/src/doc/Debugger/Debugger-API.md new file mode 100644 index 0000000000..79f36c0e22 --- /dev/null +++ b/js/src/doc/Debugger/Debugger-API.md @@ -0,0 +1,129 @@ +# The `Debugger` Interface + +Mozilla's JavaScript engine, SpiderMonkey, provides a debugging interface +named `Debugger` which lets JavaScript code observe and manipulate the +execution of other JavaScript code. 
Both Firefox's built-in developer tools +and the Firebug add-on use `Debugger` to implement their JavaScript +debuggers. However, `Debugger` is quite general, and can be used to +implement other kinds of tools like tracers, coverage analysis, +patch-and-continue, and so on. + +`Debugger` has three essential qualities: + +- It is a *source level* interface: it operates in terms of the JavaScript + language, not machine language. It operates on JavaScript objects, stack + frames, environments, and code, and presents a consistent interface + regardless of whether the debuggee is interpreted, compiled, or + optimized. If you have a strong command of the JavaScript language, you + should have all the background you need to use `Debugger` successfully, + even if you have never looked into the language's implementation. + +- It is for use *by JavaScript code*. JavaScript is both the debuggee + language and the tool implementation language, so the qualities that make + JavaScript effective on the web can be brought to bear in crafting tools + for developers. As is expected of JavaScript APIs, `Debugger` is a + *sound* interface: using (or even misusing) `Debugger` should never cause + Gecko to crash. Errors throw proper JavaScript exceptions. + +- It is an *intra-thread* debugging API. Both the debuggee and the code + using `Debugger` to observe it must run in the same thread. Cross-thread, + cross-process, and cross-device tools must use `Debugger` to observe the + debuggee from within the same thread, and then handle any needed + communication themselves. (Firefox's builtin tools have a + [protocol][protocol] defined for this purpose.) + +In Gecko, the `Debugger` API is available to chrome code only. By design, +it ought not to introduce security holes, so in principle it could be made +available to content as well; but it is hard to justify the security risks +of the additional attack surface. + +The `Debugger` API cannot currently observe self-hosted JavaScript. 
This is not +inherent in the API's design, but simply that the self-hosting infrastructure +isn't prepared for the kind of invasions the `Debugger` API can perform. + + +## Debugger Instances and Shadow Objects + +`Debugger` reflects every aspect of the debuggee's state as a JavaScript +value---not just actual JavaScript values like objects and primitives, +but also stack frames, environments, scripts, and compilation units, which +are not normally accessible as objects in their own right. + +Here is a JavaScript program in the process of running a timer callback function: + +![A running JavaScript program and its Debugger shadows][img-shadows] + +This diagram shows the various types of shadow objects that make up the +Debugger API (which all follow some [general conventions][conventions]): + +- A [`Debugger.Object`][object] represents a debuggee object, offering a + reflection-oriented API that protects the debugger from accidentally + invoking getters, setters, proxy traps, and so on. + +- A [`Debugger.Script`][script] represents a block of JavaScript + code---either a function body or a top-level script. Given a + `Debugger.Script`, one can set breakpoints, translate between source + positions and bytecode offsets (a deviation from the "source level" + design principle), and find other static characteristics of the code. + +- A [`Debugger.Frame`][frame] represents a running stack frame. You can use + these to walk the stack and find each frame's script and environment. You + can also set `onStep` and `onPop` handlers on frames. + +- A [`Debugger.Environment`][environment] represents an environment, + associating variable names with storage locations. Environments may + belong to a running stack frame, captured by a function closure, or + reflect some global object's properties as variables. 
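
As a rough illustration of how the shadow objects in the list above fit together, the following sketch walks a chain of `Debugger.Frame` instances through their `older` links and describes each one using its `callee`, `type`, and `script` properties. The helper name `describeStack` is hypothetical and not part of the API; it only assumes that it is handed a frame obtained elsewhere (for example, one passed to a handler).

```js
// Hypothetical helper: walk from a given Debugger.Frame toward the oldest
// frame, collecting a one-line description of each frame along the way.
function describeStack(newestFrame) {
  const lines = [];
  for (let frame = newestFrame; frame !== null; frame = frame.older) {
    // `callee` is null for frames that are not function calls (for example,
    // global or eval code), so fall back to the frame's `type` in that case.
    const name = frame.callee
      ? (frame.callee.name || "(anonymous)")
      : "(" + frame.type + ")";
    // `script` may be absent for some kinds of frames.
    const url = frame.script ? frame.script.url : "(no script)";
    lines.push(name + " @ " + url);
  }
  return lines;
}
```

Inside a breakpoint or `onDebuggerStatement` handler, something like `describeStack(frame).join("\n")` could be logged to see which debuggee calls led to the current point. This is a sketch under the stated assumptions rather than a definitive recipe.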
+ +The [`Debugger`][debugger-object] instance itself is not really a shadow of +anything in the debuggee; rather, it maintains the set of global objects +which are to be considered debuggees. A `Debugger` observes only execution +taking place in the scope of these global objects. You can set functions to +be called when new stack frames are pushed; when new code is loaded; and so +on. + +Omitted from this picture are [`Debugger.Source`][source] instances, which +represent JavaScript compilation units. A `Debugger.Source` can furnish a +full copy of its source code, and explain how the code entered the system, +whether via a call to `eval`, a `` elements. + +* `"injectedScript"`, for code belonging to scripts that _would_ be + `"inlineScript"` except that they were not part of the initial file itself. + + For example, scripts created via: + + * `document.write("")` + * `var s = document.createElement("script"); s.text = "code";` + +* `"importedModule"`, for code that was loaded indirectly by being imported + by another script using ESM static or dynamic imports. + +* `"javascriptURL"`, for code presented in `javascript:` URLs. + +* `"domTimer"`, for code passed to `setTimeout`/`setInterval` as a string. + +* `"self-hosted"`, for internal self-hosted JS code. + +* `undefined`, if the implementation doesn't know how the code was + introduced. + +**If the instance refers to WebAssembly code**, `"wasm"`. + +### `introductionScript` & `introductionOffset` +**If the instance refers to JavaScript source**, and if this source was +introduced by calling a function from debuggee code, then +`introductionScript` is the [`Debugger.Script`][script] instance referring +to the script containing that call, and `introductionOffset` is the call's +bytecode offset within that script. Otherwise, these are both `undefined`. +Taken together, these properties indicate the location of the introducing +call. 
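
The accessors described above can be combined into a short summary of how a source entered the system. The sketch below is illustrative only: `describeIntroduction` is a hypothetical helper name, and it assumes its argument is a `Debugger.Source` (or an object with the same `introductionType`, `introductionScript`, and `introductionOffset` properties).

```js
// Hypothetical helper: summarize how a source was introduced, using only
// the introductionType, introductionScript, and introductionOffset accessors.
function describeIntroduction(source) {
  const type = source.introductionType;
  if (type === "wasm") {
    return "WebAssembly code";
  }
  // introductionType is undefined when the implementation doesn't know
  // how the code was introduced.
  const how = type === undefined ? "unknown introduction" : type;
  const script = source.introductionScript;
  if (script === undefined) {
    // Both introductionScript and introductionOffset are undefined unless the
    // source was introduced by a function call made from debuggee code.
    return how + " (no introducing call in debuggee code)";
  }
  return how + " introduced at bytecode offset " +
    source.introductionOffset + " of " + script.url;
}
```

A tool might call this on each source it discovers in order to group dynamically generated code by the call that created it.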
+ +For the purposes of these accessors, assignments to accessor properties are +treated as function calls. Thus, setting a DOM element's event handler IDL +attribute by assigning to the corresponding JavaScript property creates a +source whose `introductionScript` and `introductionOffset` refer to the +property assignment. + +Since a ` + ``` + +6. Open the browser console (Menu Button > Developer > Browser Console), and + then evaluate the expression `demoTrackAllocations()` in the browser + console. This begins logging allocations in the current browser tab. + +7. In the browser tab, click on the text that says "Click here...". The event + handler should add some text to the end of the page. + +8. Back in the browser console, evaluate the expression + `demoPlotAllocations()`. This stops logging allocations, and displays a tree + of allocations: + + ![An allocation plot, displayed in the console][img-alloc-plot] + + The numbers at the left edge of each line show the total number of objects + allocated at that site or at sites called from there. After the count, we + see the function name, and the source code location of the call site or + allocation. + + The `(root)` node's count includes objects allocated in the content page by + the web browser, like DOM events. Indeed, this display shows that + `popup.xml` and `content.js`, which are internal components of Firefox, + allocated more objects in the page's compartment than the page itself. (We + will probably revise the allocation log to present such allocations in a way + that is more informative, and that exposes less of Firefox's internal + structure.) + + As expected, the `onclick` handler is responsible for all allocation done by + the page's own code. (The line number for the onclick handler is `1`, + indicating that the allocating call is located on line one of the handler + text itself. We will probably change this to be the line number within + `page.html`, not the line number within the handler code.) 
+ + The `onclick` handler calls `doDivsAndSpans`, which calls `divsAndSpans`, + which invokes closures of `factory` to do all the actual allocation. (It is + unclear why `spanFactory` allocated thirteen objects, despite being called + only ten times.) + + +[debugger]: Debugger-API.md +[img-chrome-pref]: enable-chrome-devtools.png +[img-scratchpad-browser]: scratchpad-browser-environment.png +[img-alloc-plot]: alloc-plot-console.png diff --git a/js/src/doc/Debugger/Tutorial-Breakpoint.md b/js/src/doc/Debugger/Tutorial-Breakpoint.md new file mode 100644 index 0000000000..014cf947aa --- /dev/null +++ b/js/src/doc/Debugger/Tutorial-Breakpoint.md @@ -0,0 +1,130 @@ +Tutorial: Set a breakpoint using `Debugger` +=========================================== + +This page shows how you can try out the [`Debugger` API][debugger] yourself +using Firefox's Scratchpad. We use `Debugger` to set a breakpoint in a function, +and then evaluate an expression whenever it is hit. + +This tutorial was tested against Firefox 58 Beta and Nightly. It does not work in Firefox 57. + +1. Since the `Debugger` API is only available to privileged JavaScript code, + you'll need to use the Browser Content Toolbox to try it out. To do this, + open the Firefox developer tools, click on the options gear at the upper + right of the toolbox, and make sure that both “Enable browser chrome and + add-on debugging toolboxes” and “Enable remote debugging” are checked. These + are located at the bottom right of the options panel; you may need to scroll + to see them. Once they're checked, you can close the developer tools. + + +2. Save the following text to an HTML file: + + ```html +
Click me!
+
Or me!
+ + ``` + +3. Visit the HTML file in your browser, and open the Browser Content Toolbox by + opening the Firefox menu, choosing “Browser Tools”, and then “Browser + Content Toolbox”. If that item doesn't appear in the “Browser Tools” menu, + make sure you checked both boxes to enable the Browser Content Toolbox as + explained in Step 1. + +4. Our example code is long enough that the best way to run it is to use the + Scratchpad panel, which is not enabled by default. To enable it, click on + the options gear at the upper right of the Browser Content Toolbox, and make + sure the “Scratchpad” box in the “Default Developer Tools” section the left + is checked. The Scratchpad panel should appear at the top of the Toolbox + alongside the Console, Debugger, and Memory panels. + +5. Click on the Scratchpad panel and enter the following code: + + ```js + const { addDebuggerToGlobal } = ChromeUtils.importESModule( + "resource://gre/modules/jsdebugger.sys.mjs" + ); + + // This simply defines 'Debugger' in this Scratchpad; + // it doesn't actually start debugging anything. + addDebuggerToGlobal(globalThis); + + // Create a 'Debugger' instance. + var dbg = new Debugger; + + // Make the tab's top window a debuggee, and get a + // Debugger.Object referring to the window. + var windowDO = dbg.addDebuggee(tabs[0].content); + + // Get a Debugger.Object referring to the window's `report` + // function. + var reportDO = windowDO.getOwnPropertyDescriptor('report').value; + + // Set a breakpoint at the entry point of `report`. + reportDO.script.setBreakpoint(0, { + hit: function (frame) { + console.log('hit breakpoint in ' + frame.callee.name); + console.log('what = ' + frame.eval('what').return); + } + }); + + console.log('Finished setting breakpoint!'); + ``` + +6. In the Scratchpad, ensure that no text is selected, and press the "Run" + button. + + Now, click on the text that says "Click me!" in the web page. + This runs the `div` element's `onclick` handler. 
+ When control reaches the start of the `report` function, + `Debugger` calls the breakpoint handler's `hit` method, + passing a `Debugger.Frame` instance. + The `hit` method logs the breakpoint hit to the browser content toolbox's console. + Then it evaluates the expression `what` in the given stack frame, and logs its result. + The toolbox's console now looks like this: + + ![The breakpoint handler's console output][img-example-console] + + + You can also click on the text that says “Or me!”, to see `report` called from a + different handler. + + If `Debugger` is unable to find the `report` function, or the console output + does not appear, evaluate the expression `tabs[0].content.document.location` + in the console to make sure that `tabs[0]` indeed refers to the HTML file you + visited. If you have more than one tab visiting a `file:` URL, they all share + a single content process, so you may need to use a different element of the + array as the debuggee. + +7. Press "Run" in the Scratchpad again. Now, clicking on "Click me!" causes the + breakpoint hit to be logged twice---one for each `Debugger` instance. + + Multiple `Debugger` instances can observe the same debuggee. Re-running the code + in the Scratchpad creates a fresh `Debugger` instance, adds the same web page as + its debuggee, and then sets a new breakpoint. When you click on the `div` + element, both `Debugger`s' breakpoints are hit, and both handlers run. + + This shows how any number of `Debugger`-based tools can observe a single web + page simultaneously. In fact, you can use the Browser Content Toolbox's Debugger + panel to set its own breakpoint in `report`, and it will trigger along with the + first two. Keep in mind, however, that when multiple Debuggers share a debuggee, + the order in which their handlers run is not specified. If more than one tool + tries to influence the debuggee's behavior, their combined behavior could be + unpredictable. + +8. 
Close the web page and the Browser Content Toolbox. + + Since both the Scratchpad's global object and the debuggee window are + now gone, the `Debugger` instances will be garbage collected, since + they can no longer have any visible effect on Firefox's behavior. The + `Debugger` API tries to interact with garbage collection as + transparently as possible; for example, if both a `Debugger.Object` + instance and its referent are not reachable, they will both be + collected, even while the `Debugger` instance to which the shadow + belonged continues to exist. + +[debugger]: Debugger-API.md +[img-example-console]: console.png diff --git a/js/src/doc/Debugger/Tutorial-Debugger-Statement.md b/js/src/doc/Debugger/Tutorial-Debugger-Statement.md new file mode 100644 index 0000000000..ebac35e445 --- /dev/null +++ b/js/src/doc/Debugger/Tutorial-Debugger-Statement.md @@ -0,0 +1,94 @@ +Tutorial: Evaluate an Expression When a debugger; Statement Is Executed +======================================================================= + +**NOTE: This tutorial no longer works in current versions of Firefox.** +Instead, please try the updated and expanded [breakpoint tutorial][tut breakpoint]. + +This page shows how you can try out the [`Debugger` API][debugger] yourself +using Firefox's Scratchpad. We use the API to evaluate an expression in the web +page whenever it executes a JavaScript `debugger;` statement. + +1. Visit the URL `about:config`, and set the `devtools.chrome.enabled` + preference to `true`: + + ![Setting the 'devtools.chrome.enabled' preference][img-chrome-pref] + +2. Save the following HTML text to a file, and visit the file in your + browser: + + ```html +
Click me!
+ ``` + +3. Open a developer Scratchpad (Menu button > Developer > Scratchpad), and + select "Browser" from the "Environment" menu. (This menu will not be + present unless you have changed the preference as explained above.) + + ![Selecting the 'browser' context in the Scratchpad][img-scratchpad-browser] + +4. Enter the following code in the Scratchpad: + + ```js + // This simply defines 'Debugger' in this Scratchpad; + // it doesn't actually start debugging anything. + const { addDebuggerToGlobal } = ChromeUtils.importESModule( + "resource://gre/modules/jsdebugger.sys.mjs" + ); + addDebuggerToGlobal(window); + + // Create a 'Debugger' instance. + var dbg = new Debugger; + + // Get the current tab's content window, and make it a debuggee. + var w = gBrowser.selectedBrowser.contentWindow.wrappedJSObject; + dbg.addDebuggee(w); + + // When the debuggee executes a 'debugger' statement, evaluate + // the expression 'x' in that stack frame, and show its value. + dbg.onDebuggerStatement = function (frame) { + alert('hit debugger statement; x = ' + frame.eval('x').return); + } + ``` + +5. In the Scratchpad, ensure that no text is selected, and press the "Run" + button. + +6. Now, click on the text that says "Click me!" in the web page. This runs + the `div` element's `onclick` handler. When control reaches the + `debugger;` statement, `Debugger` calls your callback function, passing + a `Debugger.Frame` instance. Your callback function evaluates the + expression `x` in the given stack frame, and displays the alert: + + ![The Debugger callback displaying an alert][img-example-alert] + +7. Press "Run" in the Scratchpad again. Now, clicking on the "Click me!" + text causes *two* alerts to show---one for each `Debugger` + instance. + + Multiple `Debugger` instances can observe the same debuggee. 
Re-running + the code in the Scratchpad created a fresh `Debugger` instance, added + the same web page as its debuggee, and then registered a fresh + `debugger;` statement handler with the new instance. When you clicked + on the `div` element, both of them ran. This shows how any number of + `Debugger`-based tools can observe a single web page + simultaneously---although, since the order in which their handlers + run is not specified, such tools should probably only observe, and not + influence, the debuggee's behavior. + +8. Close the web page and the Scratchpad. + + Since both the Scratchpad's global object and the debuggee window are + now gone, the `Debugger` instances will be garbage collected, since + they can no longer have any visible effect on Firefox's behavior. The + `Debugger` API tries to interact with garbage collection as + transparently as possible; for example, if both a `Debugger.Object` + instance and its referent are not reachable, they will both be + collected, even while the `Debugger` instance to which the shadow + belonged continues to exist. 
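
The handler in step 4 reads `.return` straight off the result of `frame.eval('x')`. As a hedged sketch, the completion value could be inspected first: per the conventions page it is `{ return: value }`, `{ throw: value }`, or `null` when the debuggee code was terminated. The names `describeCompletion` and `makeDebuggerStatementHandler` and the `report` callback are hypothetical helpers, not part of the `Debugger` API, and the string conversion assumes the value is a primitive rather than a `Debugger.Object`.

```js
// Turn a Debugger completion value into a short message.
// Assumed shapes: { return: v }, { throw: v }, or null (terminated).
function describeCompletion(completion) {
  if (completion === null) {
    return "evaluation was terminated";
  }
  if ("return" in completion) {
    return "x = " + String(completion.return);
  }
  if ("throw" in completion) {
    return "evaluation threw: " + String(completion.throw);
  }
  return "unrecognized completion value";
}

// Hypothetical factory for an onDebuggerStatement handler.
// 'report' stands in for alert() so the handler can be exercised anywhere.
function makeDebuggerStatementHandler(report) {
  return function (frame) {
    report("hit debugger statement; " + describeCompletion(frame.eval("x")));
  };
}
```

In the Scratchpad code above, this would be installed with `dbg.onDebuggerStatement = makeDebuggerStatementHandler(alert);`.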
+ +[tut breakpoint]: Tutorial-Breakpoint.md +[debugger]: Debugger-API.md + +[img-chrome-pref]: enable-chrome-devtools.png +[img-scratchpad-browser]: scratchpad-browser-environment.png +[img-example-alert]: debugger-alert.png diff --git a/js/src/doc/Debugger/alloc-plot-console.png b/js/src/doc/Debugger/alloc-plot-console.png new file mode 100644 index 0000000000..5411724724 Binary files /dev/null and b/js/src/doc/Debugger/alloc-plot-console.png differ diff --git a/js/src/doc/Debugger/console.png b/js/src/doc/Debugger/console.png new file mode 100644 index 0000000000..9e9ce4ae74 Binary files /dev/null and b/js/src/doc/Debugger/console.png differ diff --git a/js/src/doc/Debugger/debugger-alert.png b/js/src/doc/Debugger/debugger-alert.png new file mode 100644 index 0000000000..2bf9362243 Binary files /dev/null and b/js/src/doc/Debugger/debugger-alert.png differ diff --git a/js/src/doc/Debugger/enable-chrome-devtools.png b/js/src/doc/Debugger/enable-chrome-devtools.png new file mode 100644 index 0000000000..033468991f Binary files /dev/null and b/js/src/doc/Debugger/enable-chrome-devtools.png differ diff --git a/js/src/doc/Debugger/index.rst b/js/src/doc/Debugger/index.rst new file mode 100644 index 0000000000..35ed8fa10f --- /dev/null +++ b/js/src/doc/Debugger/index.rst @@ -0,0 +1,20 @@ +===================== +Debugger API +===================== + +.. 
toctree:: + :maxdepth: 4 + + Debugger-API.md + Debugger.md + Debugger.Object.md + Debugger.Script.md + Debugger.Source.md + Debugger.Environment.md + Debugger.Frame.md + Debugger.Memory.md + Conventions.md + + Tutorial-Alloc-Log-Tree.md + Tutorial-Breakpoint.md + Tutorial-Debugger-Statement.md diff --git a/js/src/doc/Debugger/scratchpad-browser-environment.png b/js/src/doc/Debugger/scratchpad-browser-environment.png new file mode 100644 index 0000000000..534d0f9500 Binary files /dev/null and b/js/src/doc/Debugger/scratchpad-browser-environment.png differ diff --git a/js/src/doc/Debugger/shadows.svg b/js/src/doc/Debugger/shadows.svg new file mode 100644 index 0000000000..44bceddb4c --- /dev/null +++ b/js/src/doc/Debugger/shadows.svg @@ -0,0 +1,1000 @@ +[SVG markup omitted. Text labels in the diagram: Debugger; Debugger.Object; Debugger.Environment; Debugger.Frame; Debugger.Script; global environment; global object (Date; Math; ...); function alertLater(msg, delay) { setTimeout(function () { alert(msg); }, delay); } with [[Code]] and [[Scope]]; bindings msg: 'xlerb' and delay: 1000; anonymous(); alert('xlerb').] diff --git a/js/src/doc/HazardAnalysis/CFG.md b/js/src/doc/HazardAnalysis/CFG.md new file mode 100644 index 0000000000..1038e9f350 --- /dev/null +++ b/js/src/doc/HazardAnalysis/CFG.md @@ -0,0 +1,347 @@ +# sixgill CFG format + +The main output of the sixgill plugin is what is loosely labeled a control flow graph (CFG) associated with each function compiled. +These are stored in the file `src_body.xdb`, which contains a mapping from function names ("mangled\$unmangled") to function data.
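
As a minimal sketch of how such a key might be taken apart, the snippet below splits at the first `$`. It assumes the mangled part never contains a `$` itself. The helper name `splitFunctionKey` is hypothetical and is not claimed to match the analysis scripts.

```js
// Split a "mangled$unmangled" function key into its two halves.
// Assumption: the first '$' is the separator. Keys without a '$'
// are returned with both halves set to the whole key.
function splitFunctionKey(key) {
  const separator = key.indexOf("$");
  if (separator === -1) {
    return { mangled: key, unmangled: key };
  }
  return {
    mangled: key.slice(0, separator),
    unmangled: key.slice(separator + 1),
  };
}
```

For the key `_Z12refptr_test9v$Cell* refptr_test9()`, the mangled half is the linker-level symbol and the unmangled half is the human-readable signature.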
+ +The graph is really a set of directed acyclic data flow graphs, stitched together via "loops" that imply back edges in the control flow graph. + +Function data is an array of "bodies", one body for the toplevel code in the function, and another body for each loop. A body is _not_ a basic block, since they can contain interior branches. (The nodes in a body do not necessarily dominate the following nodes.) A body is a DAG, and thus has no back edges or cross edges. Flow starts only at the entry point and ends only at the exit point, though (1) a loop body's entry point implicitly follows its exit point and (2) `Call` nodes will cause the actual program counter to go to another (possibly recursive) body. A body really describes data flow, not dynamic control flow. + +## Function Body + +A body (whether toplevel or loop) contains: + +- .BlockId + - `.Kind`: "Function" for the toplevel function or "Loop" for a (possibly nested) loop within it. + - `.Loop`: if .Kind == "Loop", then a string identifier distinguishing the loop, in the format "loop#n" where n is the index of the loop in the body. Nested loops will extend this to "loop#n#m". + - `.Variable`: + - `.Kind`: "Func" + - `.Name[]`: the function `Name` (see below) +- `.Version`: always zero +- `.Command`: the command used to compile this function, if recorded. This command will _not_ include the -fplugin parameters. +- `.Location[]`: a length-2 array of the source positions of the first and last line of the function definition. Hopefully it will be in the same file. Note that this Location is different from a `PPoint.Location` (see below), which will have a single source position. Each source position is: + - `.CacheString`: the filename + - `.Line`: the line number +- `.DefineVariable[]`: a list of variables defined in the body. The first one is for the function itself. Each variable has: + - `.Type`: the type of the variable. See `Type`, below. 
+ - `.Variable`: + - `.Kind`: one of + - "Func" for the function itself + - "This" for the C++ `this` parameter + - "Arg" for parameters + - "Temp" for temporaries + - `.Name[]`: the variable `Name` (see below) +- `.Index[]`: a 2-tuple of the first and last index in the body. +- `.PPoint[]`: the filename and line number of each point in the body + - `.Location`: a single source point (see above). +- `.PEdge[]`: the bulk of the body. See Edges, below. +- `.LoopIsomorphic[]`: a list of `{"Index": point}` points in the body that are cloned in loop bodies. See the edge Kind `Loop`, below. + +A loop body (a body with BlockId.Kind == "Loop") will additionally have: + +- `.BlockPPoint`: an array of full references to points within parent bodies that represent the entry point of this loop. Each has: + - `.BlockId`: the BlockId of the parent body + - `.Index`: the index of the point within the parent body + - `.Version`: the value zero, intended for incremental analyses but unused in the GC hazard analysis. + +Note that a loop may appear in more than one parent body. I believe this will not be used for regular structured code, but could be necessary to properly disentangle loops when using `goto`. + +`Name`: a 2-tuple containing a variable or function name. The first element is a raw, internal name, and the second is a more user-facing name. For non-functions, both elements are normally the same, but `.Name[0]` could have a `:` suffix if there are multiple variables of that name in different scopes within the same function, or a `:` prefix for static variables. For functions, `.Name[0]` is the full name of the function (in format "mangled\$unmangled") and .Name[1] is the base name of the function (unqualified, with no type or parameters): + + "Variable": { + "Kind": "Func", + "Name": [ + "_Z12refptr_test9v$Cell* refptr_test9()", + "refptr_test9" + ] + } + +Bodies are an array of "edges" between "points". All behavior is described as happening on these edges. 
`body.Index[0]` gives the first point in the body. Each edge has a source and destination point. So eg if `body.Index[0]` is 1, then (unless the body is empty) there will be at least one edge with `edge.Index = [1, 2]`. The code `if (C) { x = 1; } else { x = 2; }; f();`, will have two edges sharing a common destination: + + Assume(1,2, C*, true) + Assign(2,4, x := 1) + Assume(1,3, C*, false) + Assign(3,4, x := 2) + Call(4,5, f()) + +Note that the above syntax is part of the default output of `xdbfind src_body.xdb `. It is a much-simplified version of the full JSON output from `xdbfind -json src_body.xdb `. It will be used in this document to describe examples because the JSON output is much too verbose. + +Every body is a directed acyclic graph (DAG), stored as a set of edges with source,destination point tuples. Any cycles in the original flow graph are replaced with Loop edges (see below). + +## Edges + +The edges are stored in an array named `PEdge`, with properties: + +- `.Index[]`: a 2-tuple giving the source and destination points. +- `.Kind`: One of 7 different Kinds. The rest of the attributes will depend on this Kind. + +Sixgill boils the control flow graph down to a small set of edge Kinds: + +### Assign + +- `.Exp[]`: a 2-tuple of [lhs, rhs] of the assignment, each an expression (see `Expressions`, below.) +- `.Type`: the overall type of the expression, which I believe is the type of the lhs? (See `Types`, below.) + +Note that `Call` is also used for assignments, when the result of the function call is being assigned to a variable. + +### Call + +- `.Exp[0]`: an expression representing the function being called (the "callee"). The callee might be a simple function, in which case `exp.Kind == "Var"`. Or it could be a computed function pointer or whatever. The expression evaluates to the function being called. +- `.Exp[1]` (optional): where to assign the return value. +- `.PEdgeCallArguments[]`: an array of expressions, one for each argument being passed. 
This does not include the `this` argument. +- `.PEdgeCallInstance`: the expression for the object to call the method on, which will be passed as the `this` argument. + +### Assume + +The destination of an `Assume` node can rely on the given value assumption, eg `Assume(1,2, __temp_1* == 7)` means that `__temp_1` will be 7 at point 2. + +A conditional branch will be represented as a pair of `Assume` edges coming off of the expression for the branch condition. These edges produce a data flow graph where you can know the value of a variable if it has passed through an `Assume` edge (at least, until it reaches an `Assign` or `Call` edge.) + +- `.Exp`: the expression being tested. +- `.PEdgeAssumeNonZero`: if present, this will be set to true, and means we are on the edge where `Exp` is `!= 0`. If this is not present, then `Exp` is `0`. + +Example: the C++ function body + + SomeRAIIType raii; + if (flipcoin()) { + return 1; + } else { + return 2; + } + +could produce something like: + + Call(3,4, __temp_1 := flipcoin()) + Assume(4,5, __temp_1*, true) + Assume(4,6, __temp_1*, false) + Assign(5,7, return := 1) + Assign(6,7, return := 2) + Call(7,8, raii.~__dt_comp ()) + +### Loop + +The edge corresponds to an entire loop. The meaning of a "loop" is subtle. It is mainly what is required to convert a general graph into a set of acyclic DAGs by finding back edges, and creating a "loop body" from the subgraph between the entry point (the destination of the back edge) and the source of the back edge. (Multiple back edges with a common destination will be a single loop.) Only the main body nodes that are necessary for (postdominated by) one of the back edges will be removed. Shared nodes will be cloned and will appear in both the main body and the loop body. The cloned nodes are described as "isomorphic". + +- `.BlockId` : the `BlockId` of the loop body. +- `.Loop` : an id like "loop#0" that will match up with the .BlockId.Loop property of the corresponding loop body. 
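
To connect a `Loop` edge to its loop body, the loop id on the edge can be matched against each body's `.BlockId.Loop`. The sketch below runs over the JSON form of one function's bodies. It is an illustration under assumptions: the helper name `findLoopBody` is hypothetical, and because the exact nesting of the id on the edge is not confirmed here, it accepts either `edge.BlockId.Loop` or `edge.Loop`.

```js
// Find the loop body that a Loop edge refers to.
// 'bodies' is the array of bodies for one function, as described under
// "Function Body". Assumption: the loop id is found at edge.BlockId.Loop
// or, failing that, at edge.Loop.
function findLoopBody(bodies, edge) {
  if (edge.Kind !== "Loop") {
    return undefined;
  }
  const loopId = (edge.BlockId && edge.BlockId.Loop) || edge.Loop;
  return bodies.find(
    (body) => body.BlockId.Kind === "Loop" && body.BlockId.Loop === loopId
  );
}
```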
+ +Example: consider the C++ code + + float testfunc(int val) { + int x = val; + x++; + loophead: + int y = x + 2; + if (y == 8) goto loophead; + y++; + if (y == 10) return 2.4; + if (y == 12) goto loophead; + return 3.6; + } + +This will produce the loop body: + + block: float32 testfunc(int32):loop#0 + parent: float32 testfunc(int32):3 + pentry: 1 + pexit: 6 + Assign(1,2, y := (x* + 2)) + Assume(2,6, (y* == 8), true) /* 6 is the exit point, so loops back to the entry point 1 */ + Assume(2,3, (y* == 8), false) + Assign(3,4, y := (y* + 1)) + Assume(4,5, (y* == 10), false) + Assume(5,6, (y* == 12), true) /* 6 is the exit point, so loops back to the entry point 1 */ + +and the main body: + + block: float32 testfunc(int32) + pentry: 1 + pexit: 11 + isomorphic: [4,5,6,7,9] + Assign(1,2, x := val*) + Assign(2,3, x := (x* + 1)) + Loop(3,4, loop#0) + Assign(4,5, y := (x* + 2)) /* edge is also in the loop */ + Assume(5,6, (y* == 8), false) /* edge is also in the loop */ + Assign(6,7, y := (y* + 1)) /* edge is also in the loop */ + Assume(7,8, (y* == 10), true) + Assume(7,9, (y* == 10), false) /* edge is also in the loop */ + Assign(8,11, return := 2.4) + Assume(9,10, (y* == 12), false) + Assign(10,11, return := 3.6) + +The isomorphic points correspond to the C++ code: + + y = x + 2; + if (y == 8) /* when y != 8 */ + y++; + if (y == 10) /* when y != 10 */ + +which is the code that will execute in order to reach the post-loop edge `Assume(9,10, (y* == 12), false)`. (If point 9 in the main body is reached and y _is_ equal to 12, then the `Assume(9,10,...)` edge will not be taken. Point 9 in the main body corresponds to point 5 in the loop body, so the edge `Assume(5,6, (y* == 12), true)` will be taken instead.) When "control flow" is at an isomorphic point, it can be considered to be at all "instantiations" of that point at the same time. 
Really, though, these are acyclic data flow graphs where a loop's exit point is externally known to flow into the entry point, and the main body lacks any `Assume` or other back edges that would make it cyclic. + +For a `while` loop, the isomorphic points will evaluate the conditional expression. + +Another example: the C++ code + + void testfunc() { + static Cell cell; + RefPtr v10; + v10.assign_with_AddRef(&somefloat); + while (flipcoin()) { + v10.forget(); + } + } + +generates + + block: void testfunc():loop#0 + parent: void testfunc():3 + pentry: 1 + pexit: 4 + Call(1,2, __temp_1 := flipcoin()) + Assume(2,3, __temp_1*, true) + Call(3,4, v10.forget()) + + block: void testfunc() + pentry: 1 + pexit: 7 + isomorphic: [3,4] + Call(1,2, v10.assign_with_AddRef(somefloat)) + Loop(2,3, loop#0) + Call(3,4, __temp_1 := flipcoin()) + Assume(4,5, __temp_1*, false) + Call(5,6, v10.~__dt_comp ()) + +The first block is the loop body, the second is the main body. Points 3 and 4 of the main body are equivalent to points 1 and 2 of the loop body. Notice the "parent" field of the loop body, which gives the point (3) in the main body that is equivalent to the loop's entry point. + +### Assembly + +An opaque wad of assembly code. + +### Annotation + +I'm not sure if I've seen these? They might be for the old annotation mechanism. + +### Skip + +These appear to be internal "epsilon" edges to simplify graph building and loop splitting. They are removed before the final CFG is emitted. + +## Expressions + +Expressions are the bulk of the CFG. + +- `.Width` (optional) : width in bits. I'm not sure when this is used. It is much more common for a Type to have a width. +- `.Unsigned` (optional) : boolean saying that this expression is unsigned. +- `.Kind` : one of the following values + +### Program lvalues + +- "Empty" : used in limited contexts when nothing is needed. +- "Var" : expression referring to a variable + - `.Type` +- "Drf" : dereference (as in, \*foo or foo->...
or something implicit) + - `.Exp[0]` : target being dereferenced + - `.Type` +- "Fld" + - `.Exp[0]` : target object containing the field + - `.Field` + - `.Name[]` : 2-tuple of [qualified name, unqualified name] + - can be unnamed, in which case the name will be "field:". This is used for base classes. + - `.FieldCSU` : type of the CSU that the field is a member of + - `.Type` : type of the field + - `.FieldInstanceFunction` : "whether this is a virtual instance function rather than data field of the containing CSU". Presence or absence is what matters. All examples I have seen are for pure virtual functions (`virtual void foo() = 0`). + - `.Annotation[]` : any annotations on the specific field +- "Rfld" : ? some kind of "reverse" field access + - same children as Fld +- "Index" : array element access + - `.Exp[0]` : the target array + - `.Index` : the index being accessed (an Exp) + - `.Type` : the type of the element +- "String" : string constant + - `.Type` : the type of the string + - `.Count` : number of elements (chars) in the string + - `.String` : the actual data in the string +- "Clobber" : "additional lvalue generated by the memory model" (?) + - callee + - overwrite + - optional value kind + - point + - optional location + +### Program rvalues + +- "Int", "Float" : constant values + - `.String` : the string form of the value (this is the only way the value is stored) +- "Unop", "Binop" : operators + - `.OpCode` : the various opcodes + - `.Exp[0]` and `.Exp[1]` (the latter for Binop only) : parameters + - stride type (optional) + +### Expression modifiers + +- "Exit", "Initial" : ? + - `.Exp[0]` : target expression + - value kind (optional) +- "Val" : ? + - lvalue + - value kind (optional) + - index (body point) + - boolean saying whether it is relative (?) +- "Frame" : (unused) + +### Immutable properties + +These appear to be synthetic properties intended for the built-in analyses that we are not using. + +- "NullTest" : ? 
+ - `.Exp[0]` : target being tested +- "Bound" : ? appears to be bounds-checked index access + - bound kind + - stride type + - `.Exp[0]` (optional) : target that the bound applies to +- "Directive" : ? + - directive kind + +### Mutable properties + +These appear to be synthetic properties intended for the built-in analyses that we are not using. + +- "Terminate" + - stride type + - terminate test (Exp) + - terminate int (Exp) + - `.Exp[0]` (optional) : target +- "GCSafe" : (unused) + - `.Exp[0]` (optional) : target + +## Types + +- `.Kind` : the kind of type being described, one of: + +Possible Type Kinds: + +- "Void" : the C/C++ void type +- "Int" + - `.Width` : width in bits + - `.Sign` (optional) : whether the type is signed + - `.Variant` (optional) : ? +- "Float" + - `.Width` : width in bits +- "Pointer" : pointer or reference type + - `.Width` : width in bits + - `.Reference` : 0 for pointer, 1 for regular reference, 2 for rvalue reference + - `.Type` : type of the target +- "Array" + - `.Type` : type of the elements + - `.Count` : number of elements, given as a plain constant integer +- "CSU" : class, struct, or union + - `.Name` : qualified name, as a plain string +- "Function" + - `.TypeFunctionCSU` (optional) : if present, the type of the CSU containing the function + - `.FunctionVarArgs` (?) (optional) : if this is present, the function is varargs (eg f(...)) + - `.TypeFunctionArgument` : array of argument types. Present if at least one parameter. + - `.Type` : type of argument + - `.Annotation` (optional) : any explicit annotations (`__attribute__((foo))`) for this parameter + - `.Variable` : the variable representing the function + - `.Annotation` (optional) : any explicit annotation for this function +- "Error" : there was an error handling this type in sixgill. Probably something unimplemented.
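
The Type kinds listed above nest recursively (a "Pointer" or "Array" carries another Type), so a small recursive walk is enough to print one. The following is a sketch only: the helper name `typeToString` is hypothetical, it assumes a truthy `.Sign` means signed and that `.Reference` is stored as a number, and its output merely resembles the `uint8` / `int32` / `float32` spellings seen in the examples rather than reproducing any tool's exact format.

```js
// Render a sixgill Type object (JSON form) as a short string.
// Assumptions: truthy .Sign means signed; .Reference is a number
// (0 pointer, 1 reference, 2 rvalue reference); a missing .Reference
// is treated as a plain pointer.
function typeToString(type) {
  switch (type.Kind) {
    case "Void":
      return "void";
    case "Int":
      return (type.Sign ? "int" : "uint") + type.Width;
    case "Float":
      return "float" + type.Width;
    case "Pointer": {
      const suffix =
        type.Reference === 2 ? "&&" : type.Reference === 1 ? "&" : "*";
      return typeToString(type.Type) + suffix;
    }
    case "Array":
      return typeToString(type.Type) + "[" + type.Count + "]";
    case "CSU":
      return type.Name;
    case "Function":
      return "function";
    default:
      // Covers "Error" and any Kind not handled by this sketch.
      return "<" + type.Kind + ">";
  }
}
```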
diff --git a/js/src/doc/HazardAnalysis/index.md b/js/src/doc/HazardAnalysis/index.md new file mode 100644 index 0000000000..813369404d --- /dev/null +++ b/js/src/doc/HazardAnalysis/index.md @@ -0,0 +1,100 @@ +# Static Analysis for Rooting and Heap Write Hazards + +Treeherder can run two static analysis builds: the full browser (linux64-haz) and just the JS shell (linux64-shell-haz). They show up on Treeherder as `H` and `SM(H)`. + +## Diagnosing a hazard failure + +The first step is to look at what sort of hazard is being reported. There are two types that cause the job to fail: stack rooting hazards for garbage collection, and heap write thread safety hazards for stylo. + +The summary output will include either the string ` rooting hazards detected` or ` heap write hazards detected out of allowed`. See the appropriate section below for each. + +## Diagnosing a rooting hazards failure + +Click on the `H` build link, select the "Artifacts" pane on the bottom left, and download the `public/build/hazards.txt.gz` and `public/build/hazards.html.gz` files. The HTML file is most useful when running the analysis locally, since it will link to the exact parts of the code in question, but it's easier to talk about the text file here.
+ +Example snippet from `hazards.txt`: + + Function 'jsopcode.cpp:uint8 DecompileExpressionFromStack(JSContext*, int32, int32, class JS::Handle, int8**)' has unrooted 'ed' of type 'ExpressionDecompiler' live across GC call 'uint8 ExpressionDecompiler::decompilePC(uint8*)' at js/src/jsopcode.cpp:1866 + js/src/jsopcode.cpp:1866: Assume(74,75, !__temp_23*, true) + js/src/jsopcode.cpp:1867: Assign(75,76, return := 0) + js/src/jsopcode.cpp:1867: Call(76,77, ed.~ExpressionDecompiler()) + GC Function: uint8 ExpressionDecompiler::decompilePC(uint8*) + JSString* js::ValueToSource(JSContext*, class JS::Handle) + uint8 js::Invoke(JSContext*, JS::Value*, JS::Value*, uint32, JS::Value*, class JS::MutableHandle) + uint8 js::Invoke(JSContext*, JS::CallArgs, uint32) + JSScript* JSFunction::getOrCreateScript(JSContext*) + uint8 JSFunction::createScriptForLazilyInterpretedFunction(JSContext*, class JS::Handle) + uint8 JSRuntime::cloneSelfHostedFunctionScript(JSContext*, class JS::Handle, class JS::Handle) + JSScript* js::CloneScript(JSContext*, class JS::Handle, class JS::Handle, const class JS::Handle, uint32) + JSObject* js::CloneStaticBlockObject(JSContext*, class JS::Handle, class JS::Handle) + js::StaticBlockObject* js::StaticBlockObject::create(js::ExclusiveContext*) + js::Shape* js::EmptyShape::getInitialShape(js::ExclusiveContext*, js::Class*, js::TaggedProto, JSObject*, JSObject*, uint32, uint32) + js::Shape* js::EmptyShape::getInitialShape(js::ExclusiveContext*, js::Class*, js::TaggedProto, JSObject*, JSObject*, uint64, uint32) + js::UnownedBaseShape* js::BaseShape::getUnowned(js::ExclusiveContext*, js::StackBaseShape*) + js::BaseShape* js_NewGCBaseShape(js::ThreadSafeContext*) [with js::AllowGC allowGC = (js::AllowGC)1u] + js::BaseShape* js::gc::NewGCThing(js::ThreadSafeContext*, uint32, uint64, uint32) [with T = js::BaseShape; js::AllowGC allowGC = (js::AllowGC)1u; size_t = long unsigned int] + void js::gc::RunDebugGC(JSContext*) + void js::MinorGC(JSRuntime*, uint32) + 
GC + +This means that a rooting hazard was discovered at `js/src/jsopcode.cpp` line 1866, in the function `DecompileExpressionFromStack` (it is prefixed with the filename because it's a static function.) The problem is that there is an unrooted variable `ed` that holds an `ExpressionDecompiler` live across a call to `decompilePC`. "Live" means that the variable is used after the call to `decompilePC` returns. `decompilePC` may trigger a GC according to the static call stack given starting from the line beginning with "`GC Function:`". + +The hazard itself has some barely comprehensible `Assume(...)` and `Call(...)` [gibberish][CFG] that describes the exact data flow path of the variable into the function call. That stuff is rarely useful -- usually, you'll only need to look at it if it's complaining about a temporary and you want to know where the temporary came from. The type `ExpressionDecompiler` is believed to hold pointers to GC-controlled objects of some sort. The analysis currently does not describe the exact field it is worried about. + +To unpack this a little, the analysis is saying the following can happen: + +* `ExpressionDecompiler` contains some pointer to a GC thing. For example, it might have a field `obj` of type `JSObject*`. (There is a `gcTypes.txt` file inside `hazardIntermediates.tar.xz` that will give the detailed explanation for all types.) +* `DecompileExpressionFromStack` is called. +* A pointer is stored in that field of the `ed` variable. +* `decompilePC` is invoked, which calls `ValueToSource`, which calls `Invoke`, which eventually calls `js::MinorGC` +* During the resulting garbage collection, the object pointed to by `ed.obj` is moved to a different location. All pointers stored in the JS heap are updated automatically, as are all rooted pointers. `ed.obj` is not, because the GC doesn't know about it. +* After `decompilePC` returns, something accesses `ed.obj`. 
This is now a stale pointer, and may refer to just about anything -- the wrong object, an invalid object, or whatever. As TeX would say, **badness 10000**. + +## Diagnosing a heap write hazard failure + +OBSOLETE: The heap write hazard analysis has not been updated in years and is looking for things that no longer exist, and therefore will always report zero problems. + +For the thread unsafe heap write analysis, a hazard means that some Gecko_* function calls, directly or indirectly, code that writes to something on the heap, or calls an unknown function that *might* write to something on the heap. The analysis requires quite a few annotations to describe things that are actually safe. This section will be expanded as we gain more experience with the analysis, but here are some common issues: + +* Adding a new Gecko_* function: often, you will need to annotate any outparams or owned (thread-local) parameters in the `treatAsSafeArgument` function in `js/src/devtools/rootAnalysis/analyzeHeapWrites.js`. +* Calling some libc function: if you add a call to some random libc function (eg `sin()` or `floor()` or `ceil()`, though the latter two are already annotated), the analysis will report an "External Function". Add it to `checkExternalFunction`, assuming it *doesn't* have the possibility of writing to shared heap memory. +* If you call some non-returning (crashing) function that the analysis doesn't know about, you'll need to add it to `ignoreContents`. + +On the other hand, you might have a real thread safety issue on your hands. Shared caches are common problems. Fix it. 
+ +## Analysis implementation + +These builds do the following: + +* set up a build environment and run the analysis within it, then upload the resulting files + * compile an optimized JS shell to later run the analysis + * compile the browser with gcc, using a slightly modified version of the sixgill (http://svn.sixgill.org) gcc plugin +* produce a set of `.xdb` files describing everything encountered during the compilation +* analyze the `.xdb` files with scripts in `js/src/devtools/rootAnalysis` + +The format of the information stored in those files is [somewhat documented][CFG]. + +## Running the analysis + +### Pushing to try + +The easiest way to run an analysis is to push to try with `mach try fuzzy -q "'haz"` (or, if the hazards of interest are contained entirely within `js/src`, use `mach try fuzzy -q "'shell-haz"` for a much faster result). The expected turnaround time for linux64-haz is just under 1.5 hours (~20 minutes for `hazard-linux64-shell-haz`). + +The output will be uploaded and an output file `hazards.txt.xz` will be placed into the "Artifacts" info pane on treeherder. + +### Running locally + +The rooting [hazard analysis may be run][running] using mach. + +## So you broke the analysis by adding a hazard. Now what? + +Backout, fix the hazard, or (final resort) update the expected number of hazards in `js/src/devtools/rootAnalysis/expect.browser.json` (but don't do that). + +The most common way to fix a hazard is to change the variable to be a `Rooted` type, as described in [RootingAPI.h][rooting] + +For more complicated cases, ask on the Matrix channel (see [spidermonkey.dev][spidermonkey] for contact info). If you don't get a response, ping sfink or jonco for rooting hazards, bholley or sfink for heap write hazards. 
+ +[running]: running.md +[rooting]: https://searchfox.org/mozilla-central/source/js/public/RootingAPI.h +[spidermonkey]: https://spidermonkey.dev/ +[CFG]: CFG.md diff --git a/js/src/doc/HazardAnalysis/running.md b/js/src/doc/HazardAnalysis/running.md new file mode 100644 index 0000000000..4de0696986 --- /dev/null +++ b/js/src/doc/HazardAnalysis/running.md @@ -0,0 +1,124 @@ +# Running the Rooting Hazard Analysis + +The `js/src/devtools/rootAnalysis` directory contains scripts for running Brian +Hackett's static GC rooting and thread heap write safety analyses on a JS +source directory. + +To run the analysis on SpiderMonkey: + +1. Unset your $MOZCONFIG + + unset MOZCONFIG + +2. Install prerequisites. + + mach hazards bootstrap + +3. Build the shell to run the analysis. + + mach hazards build-shell + +4. Compile all the code to gather info. + + mach hazards gather --project=js + +5. Analyze the gathered info. + + mach hazards analyze --project=js + +Output goes to `$srctop/haz-js/hazards.txt`. This will run the analysis on the js/src +tree only; if you wish to analyze the full browser, use + + --project=browser + +(or leave it off; `--project=browser` is the default) + +6. (optional) View the resulting hazards. + + mach hazards view --project=js + +After running the analysis once, you can reuse the `*.xdb` database files +generated, using modified analysis scripts, by running either the `mach hazards +analyze` command above, or by adding on `mach hazards analyze ` to +run a subset of the analysis steps; `mach hazards analyze -- --list` to see +step names. + +Also, you can pass `-- -v` to get exact command lines to cut & paste for running +the various stages, which is helpful for running under a debugger. + +## Incremental Analyses + +Once you have an analysis, you can make code changes and rebuild with `mach hazards gather`. 
This will add to the existing `*.xdb` files, which will *usually* work ok, but sometimes older compilations will have left around information that will get in the way. A typical example is with lambda functions: you may get hazards reported due to lambdas that no longer exist, but the newer compile will not replace them. Although this could be fixed with some amount of effort, you're fighting against something of a fundamental problem where the analysis is depending on certain things *NOT* happening (eg calls to the GC) and incremental compilation only adds and replaces existing information. It does not remove information unless it is replacing it with something of a matching name (and things like lambdas have autogenerated numbers in their names that vary between compiles.) + +In short: for development speed, feel free to use incremental analyses but don't trust them. If the hazard analysis starts claiming the impossible is happening, try `mach hazards clobber` and do a full rebuild. + +## Overview of what is going on here + +So what does this actually do? + +1. It downloads a GCC compiler and plugin ("sixgill") from Mozilla servers. + +2. It runs `run_complete`, a script that builds the target codebase with the + downloaded GCC, generating a few database files containing control flow + graphs of the full compile, along with type information etc. + +3. Then it runs `analyze.py`, a Python script, which runs all the scripts + which actually perform the analysis -- the tricky parts. + (Those scripts are written in JS.) + +The easiest way to get this running is to not try to do the instrumented +compilation locally. Instead, grab the relevant files from a try server push +and analyze them locally. + +## Local Analysis of Downloaded Intermediate Files + +Another useful path is to let the continuous integration system do the hard +work of generating the intermediate files and analyze them locally. This is +particularly useful if you are working on the analysis itself. 
+ +* Do a try push with "--upload-xdbs" appended to the "try: ..." line. + + mach try fuzzy -q "'haz" --upload-xdbs + +* Create an empty directory to run the analysis. + +* When the try job is complete, download the resulting `src_body.xdb.bz2`, +`src_comp.xdb.bz2`, and `file_source.xdb.bz2` files into your directory. + +* Fetch a compiler and sixgill plugin to use: + + mach hazards bootstrap + +If you are on macOS, these will not be available. Instead, build sixgill manually +(these directions are a little stale): + + hg clone https://hg.mozilla.org/users/sfink_mozilla.com/sixgill + cd sixgill + CC=$HOME/.mozbuild/hazard-tools/gcc/bin/gcc ./release.sh --build # This will fail horribly. + make bin/xdb.so CXX=clang++ + +* Build an optimized JS shell with ctypes. Note that this does not need to +match the source you are analyzing in any way; in fact, you pretty much never +need to update this once you've built it. (Though I reserve the right to use +any new JS features implemented in SpiderMonkey in the future...) + + mach hazards build-shell + +The shell will be placed by default in `$topsrcdir/obj-haz-shell`. + +* Make a defaults.py file containing the following, with your own paths filled in: + + js = "/dist/bin/js" + sixgill_bin = "/bin" + +* For the rooting analysis, run + + python /js/src/devtools/rootAnalysis/analyze.py gcTypes + +* For the heap write analysis, run + + python /js/src/devtools/rootAnalysis/analyze.py heapwrites + +Also, you may wish to run with -v (aka --verbose) to see the exact commands +executed that you can cut & paste if needed. (I use them to run under the JS +debugger when I'm working on the analysis.)
diff --git a/js/src/doc/MIR-optimizations/index.md b/js/src/doc/MIR-optimizations/index.md new file mode 100644 index 0000000000..32c38bd36a --- /dev/null +++ b/js/src/doc/MIR-optimizations/index.md @@ -0,0 +1,97 @@ +# MIR optimizations from a thousand feet + +MIR is the intermediate representation (`IR`) used in Ion, SpiderMonkey's optimizing compiler backend. MIR is generated by [WarpBuilder](https://hacks.mozilla.org/2020/11/warp-improved-js-performance-in-firefox-83/), then optimized by a succession of passes. (MIR is also used to compile WebAssembly code, but this document is focused on JavaScript compilation.) + +This is a quick summary of all the MIR passes as of Feb 2021. The italicized passes are classic optimizations that are likely to be extensively covered in a compiler textbook. Non-italicized passes are either JS-specific, or too trivial to cover. + +The state of the MIR after each of these passes can be visualized using [iongraph](https://github.com/sstangl/iongraph). + +## *BuildSSA* +[Static Single Assignment](https://en.wikipedia.org/wiki/Static_single_assignment_form) is a form of IR in which every value is defined exactly once. It has several nice properties for optimization. Note: SSA is why we have phi nodes. + +## Prune Unused Branches +What it says on the tin: prunes away branches that are never taken. + +## Fold Empty Blocks +A simple cleanup pass to get rid of empty blocks with one predecessor and one successor by folding them into their successor. + +## Fold Tests +Simplifies the code generated for conditional operations. [See the comment here](https://searchfox.org/mozilla-central/rev/bd92b9b4a3c2ff022e830c1358968a84e6e69c95/js/src/jit/IonAnalysis.cpp#849-871). + +## Split Critical Edges +In subsequent passes, we may choose to move code around. In preparation, this pass adds empty blocks along [critical edges](https://en.wikipedia.org/wiki/Control-flow_graph#Special_edges), so that we have a safe place to put those instructions.
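
As a rough illustration of the Split Critical Edges pass, the Python sketch below splits critical edges in a toy control-flow graph. The graph representation (a dict mapping block names to successor lists) and the `split_critical_edges` helper are assumptions made for this example; they are not Ion's actual data structures.

```python
def split_critical_edges(cfg):
    """Return a copy of cfg with an empty block inserted on every critical edge.

    cfg maps a block name to the list of its successor block names.
    An edge A -> B is critical when A has several successors and
    B has several predecessors.
    """
    # Count predecessors of every block (one per incoming edge).
    pred_count = {block: 0 for block in cfg}
    for succs in cfg.values():
        for succ in succs:
            pred_count[succ] += 1

    result = {block: list(succs) for block, succs in cfg.items()}
    for block, succs in cfg.items():
        if len(succs) < 2:
            continue  # Edges out of a single-successor block are never critical.
        for i, succ in enumerate(succs):
            if pred_count[succ] < 2:
                continue  # The target has a single predecessor: not critical.
            # Route the edge through a fresh, empty block.
            split = f"{block}->{succ}"
            result[block][i] = split
            result[split] = [succ]
    return result


# Toy graph: "entry" branches either to "then" or straight to "join".
cfg = {
    "entry": ["then", "join"],
    "then": ["join"],
    "join": [],
}
new_cfg = split_critical_edges(cfg)
print(new_cfg)
```

Here only `entry -> join` is critical, so it is the only edge that gains an intermediate block. Code that must run only when that edge is taken can then be placed in the new block without affecting the path through `then`.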
+ +## Renumber Blocks +This pass literally just reassigns block numbers. + +## Eliminate Phis +After some of the above optimizations, some of our phi nodes may be things like `b = phi(a,a)`, which is redundant. This pass cleans those up. + +## *Scalar Replacement* +If a function allocates and uses an object, but we can [prove that the object never escapes the function](https://en.wikipedia.org/wiki/Escape_analysis), then we can sometimes avoid the allocation entirely by tracking each of the object’s components (fields in C/C++; slots/elements in JS) individually. + +## Apply types +Each type of MIR node has a [TypePolicy](https://searchfox.org/mozilla-central/rev/fd853f4aea89186efdb368e759a71b7a90c2b89c/js/src/jit/TypePolicy.h#23-35) defining what type of input it accepts. If necessary, this pass inserts (potentially fallible) conversions to guarantee that the types work out. + +## *Alias Analysis* +[Alias analysis](https://en.wikipedia.org/wiki/Alias_analysis) determines whether two instructions may use/modify the same memory. If they do, then they can not be reordered with respect to each other, because that could change the semantics of the program. + +## *GVN* +[Global Value Numbering](https://en.wikipedia.org/wiki/Value_numbering) is a classic optimization for finding places where we compute the same value multiple times, and eliminating the redundancy. + +## *LICM* +[Loop-Invariant Code Motion](https://en.wikipedia.org/wiki/Loop-invariant_code_motion) finds instructions in a loop that will compute the same value every time and hoists them out of the loop. + +## Beta +This is done in preparation for range analysis. This particular approach to range analysis is [taken from a paper by Gough and Klaren](https://searchfox.org/mozilla-central/rev/fd853f4aea89186efdb368e759a71b7a90c2b89c/js/src/jit/RangeAnalysis.cpp#49-108). 
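
To make the idea behind beta nodes more concrete, here is a minimal Python sketch of how a branch condition can narrow the known range of an int32 value on each side of the branch. The `Range` class and the `beta_ranges` helper are hypothetical names for this illustration and do not mirror the classes in `RangeAnalysis.cpp`.

```python
from dataclasses import dataclass

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


@dataclass(frozen=True)
class Range:
    lower: int
    upper: int

    def intersect(self, other):
        # The value has to satisfy both constraints at once.
        return Range(max(self.lower, other.lower), min(self.upper, other.upper))


def beta_ranges(value_range, bound):
    """Ranges of x on each side of the branch `if (x < bound)`.

    Returns (range_if_true, range_if_false).
    """
    if_true = value_range.intersect(Range(INT32_MIN, bound - 1))
    if_false = value_range.intersect(Range(bound, INT32_MAX))
    return if_true, if_false


x = Range(INT32_MIN, INT32_MAX)        # nothing is known about x yet
taken, not_taken = beta_ranges(x, 10)  # branch on `x < 10`
print(taken)
print(not_taken)
```

In this sketch the narrowed range is attached to a new name for `x` inside each successor block, which is roughly the role a beta node plays: it records what the branch taught us so that range analysis can use it.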
+ +## *Range Analysis* +[A classic optimization](https://en.wikipedia.org/wiki/Value_range_analysis) that determines the possible range of values a definition can take on. Used to implement many of the following passes. + +## De-Beta +Remove beta nodes now that we don’t need them. + +## RA Check UCE +Check to see if range analysis has made any code eligible for Unreachable Code Elimination. + +## Truncate Doubles +Strength-reduce double arithmetic to integer arithmetic if range analysis says it’s okay. + +## Sink +If we compute a value that will only be used along some paths, and we could recover the value if one of the other paths bailed out, then we can postpone the computation of that variable until we are sure we will need it. [More details here](https://bugzilla.mozilla.org/show_bug.cgi?id=1093674). + +## Remove Unnecessary Bitops +Remove bit-wise arithmetic operators that don’t do anything (like `x | 0` on integer input). + +## Fold Linear Arithmetic Constants +Fold `a + constant1 + constant2` into `a + (constant1+constant2)`. + +## Effective Address Analysis +This was added to ensure that we generated good code for memory accesses in asm.js. + +## *DCE* +[Dead code elimination](https://en.wikipedia.org/wiki/Dead_code_elimination) removes instructions whose results are never needed. + +## *Reordering* +Shuffle instructions around within a block to reduce the lifetime of intermediate values and reduce register pressure. This is a relatively simple version of [instruction scheduling](https://en.wikipedia.org/wiki/Instruction_scheduling). + +## Make loops contiguous +Reorder blocks so that all the blocks in a loop are generated in one contiguous chunk, which is good for cache locality. + +## Edge Case Analysis (Late) +A place to check for edge cases after code has stopped being moved around. Currently used for checking whether some instructions need to handle negative zero. 
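
To see why negative zero needs special care, consider what happens when double arithmetic is replaced by integer arithmetic. JavaScript numbers are IEEE 754 doubles, which distinguish `-0` from `0`, while an int32 cannot represent `-0`. The Python sketch below illustrates the mismatch (Python floats are also IEEE 754 doubles); the helper name `is_negative_zero` is made up for this example.

```python
import math


def is_negative_zero(x):
    # -0.0 == 0.0 compares equal, so inspect the sign bit instead.
    return x == 0 and math.copysign(1.0, x) < 0


double_result = 0.0 * -5.0  # double multiplication yields -0.0
int_result = 0 * -5         # integer multiplication yields plain 0

print(is_negative_zero(double_result))      # True
print(is_negative_zero(float(int_result)))  # False
```

An integer multiply that stands in for a double multiply would therefore need a check for this case, unless the compiler can show that the sign of a zero result is never observed.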
+ +## *Bounds Check Elimination* +[A classic optimization](https://en.wikipedia.org/wiki/Bounds-checking_elimination) that eliminates bounds checks if we can prove at compile time that they can’t fail. + +## FoldLoadsWithUnbox +Loading a NaN-boxed value and then unboxing it can be slightly more efficient if we do both operations at once. + +## Add KeepAlive Instructions +While we access the slots or elements of an object, we have to ensure that the object itself is not collected by the GC. See [bug 1160884](https://bugzilla.mozilla.org/show_bug.cgi?id=1160884). + +## Generate LIR +After optimization is over, we lower our IR from MIR to LIR to do register allocation and code generation. + +## *Allocate Registers* +Programs can have way more variables than hardware has registers, so we have to decide which values live in which registers when. [This is a very well studied area](https://en.wikipedia.org/wiki/Register_allocation). diff --git a/js/src/doc/SavedFrame/index.md b/js/src/doc/SavedFrame/index.md new file mode 100644 index 0000000000..cdd4643a64 --- /dev/null +++ b/js/src/doc/SavedFrame/index.md @@ -0,0 +1,95 @@ +# SavedFrame + +A `SavedFrame` instance is a singly linked list of stack frames. It represents a +JavaScript call stack at a past moment of execution. Younger frames hold a +reference to the frames that invoked them. The older tails are shared across +many younger frames. + +`SavedFrame` stacks should generally be captured, allocated, and live within the +compartment that is being observed or debugged. Usually this is a content +compartment. + +## Capturing `SavedFrame` Stacks + +### From C++ + +Use `JS::CaptureCurrentStack` declared in `jsapi.h`. + +### From JS + +Use `saveStack`, accessible via `Components.utils.getJSTestingFunction()`. + +## Including and Excluding Chrome Frames + +Consider the following `SavedFrame` stack. 
Arrows represent links from child to +parent frame; `content.js` is from a compartment with content principals, and +`chrome.js` is from a compartment with chrome principals. + +```text +function A from content.js + | + V +function B from chrome.js + | + V +function C from content.js +``` +The content compartment will only ever have one view of this stack: `A -> C`. + +However, a chrome compartment has a choice: it can either take the same view +that the content compartment has (`A -> C`), or it can view all stack frames, +including the frames from chrome compartments (`A -> B -> C`). To view +everything, use an `XrayWrapper`. This is the default wrapper. To see the stack +as the content compartment sees it, waive the xray wrapper with +`Components.utils.waiveXrays`: + + const contentViewOfStack = Components.utils.waiveXrays(someStack); + +## Accessor Properties of the `SavedFrame.prototype` Object + +### `source` +The source URL for this stack frame, as a string. + +### `sourceId` +The process-unique internal integer ID of this source. Usable to match up +a SavedFrame with a [Debugger.Source][dbg-source] using its `id` property. + +### `line` +The line number for this stack frame. + +### `column` +The column number for this stack frame. + +### `functionDisplayName` +Either SpiderMonkey's inferred name for this stack frame's function, or + `null`. + +### `asyncCause` +If this stack frame is the `asyncParent` of other stack frames, then this is +a string representing the type of asynchronous call by which this frame +invoked its children. For example, if this frame's children are calls to +handlers for a promise this frame created, this frame's `asyncCause` would +be `"Promise"`. If the asynchronous call was started in a descendant frame +to which the requester of the property does not have access, this will be +the generic string `"Async"`. If this is not an asynchronous call point, +this will be `null`.
+ +### `asyncParent` +If this stack frame was called as a result of an asynchronous operation, for +example if the function referenced by this frame is a promise handler, this +property points to the stack frame responsible for the asynchronous call, +for example where the promise was created. If the frame responsible for the +call is not accessible to the caller, this points to the youngest accessible +ancestor of the real frame, if any. In all other cases, this is `null`. + +### `parent` +This stack frame's caller, or `null` if this is the oldest frame on the +stack. In this case, there might be an `asyncParent` instead. + +## Function Properties of the `SavedFrame.prototype` Object + +### `toString()` +Return this frame and its parents formatted as a human readable stack trace +string. + +[dbg-source]: ../Debugger/Debugger.Source.md diff --git a/js/src/doc/build.rst b/js/src/doc/build.rst new file mode 100644 index 0000000000..7237c80dff --- /dev/null +++ b/js/src/doc/build.rst @@ -0,0 +1,247 @@ +Building and testing SpiderMonkey +================================= + +**The first step is to run our “bootstrap” script to help ensure you have the +right build tools for your operating system. This will also help you get a copy +of the source code. You do not need to run the “mach build” command just yet +though.** + +* :ref:`Building Firefox On Linux` +* :ref:`Building Firefox On Windows` +* :ref:`Building Firefox On MacOS` + +This guide shows you how to build SpiderMonkey using ``mach``, which is +Mozilla's multipurpose build tool. This replaces old guides that advised +running the "configure" script directly. + +These instructions assume you have a clone of `mozilla-unified` and are +interested in building the JS shell. 
+ +Developer (debug) build +~~~~~~~~~~~~~~~~~~~~~~~ + +For developing and debugging SpiderMonkey itself, it is best to have +both a debug build (for everyday debugging) and an optimized build (for +performance testing), in separate build directories. We'll start by +covering how to create a debug build. + +Setting up a MOZCONFIG +----------------------- + +First, we will create a ``MOZCONFIG`` file. This file describes the characteristics +of the build you'd like `mach` to create. Since it is likely you will have a +couple of ``MOZCONFIGs``, a directory like ``$HOME/mozconfigs`` is a useful thing to +have. + +A basic ``MOZCONFIG`` file for doing a debug build, put into ``$HOME/mozconfigs/debug`` looks like this + +.. code:: text + + # Build only the JS shell + ac_add_options --enable-project=js + + # Enable the debugging tools: Assertions, debug only code etc. + ac_add_options --enable-debug + + # Enable optimizations as well so that the test suite runs much faster. If + # you are having trouble using a debugger, you should disable optimization. + ac_add_options --enable-optimize + + # Use a dedicated objdir for SpiderMonkey debug builds to avoid + # conflicting with Firefox build with default configuration. + mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-debug-@CONFIG_GUESS@ + +To activate a particular ``MOZCONFIG``, set the environment variable: + +.. code:: text + + export MOZCONFIG=$HOME/mozconfigs/debug + +Building +-------- + +Once you have activated a ``MOZCONFIG`` by setting the environment variable +you can then ask ``mach``, located in the top directory of your checkout, +to do your build: + +.. code:: console + + $ cd + $ ./mach build + +.. note:: + + If you are on Mac and baldrdash fails to compile with something similar to + + :: + + /usr/local/Cellar/llvm/7.0.1/lib/clang/7.0.1/include/inttypes.h:30:15: fatal error: 'inttypes.h' file not found + + This is because, starting from Mojave, headers are no longer + installed in ``/usr/include``. 
Refer to the `release + notes `__ under + Command Line Tools -> New Features + + The release notes also state that this compatibility package will no longer be provided in the near + future, so the build system on macOS will have to be adapted to look for headers in the SDK. + + Until then, the following should help: + + :: + + open /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pk + +Once you have successfully built the shell, you can run it using ``mach run``. + +Testing +~~~~~~~ + +Once built, you can then use ``mach`` to run the ``jit-tests``: + +.. code:: console + + $ ./mach jit-test + +Similarly, you can also run ``jstests``. These include a local, +intermittently updated, copy of all `test262 `_ +tests. + +.. code:: console + + $ ./mach jstests + +See :doc:`Running Automated JavaScript Tests` for more details. + +Optimized Builds +~~~~~~~~~~~~~~~~ + +To switch to an optimized build, such as for performance testing, one need only +have an optimized build ``MOZCONFIG``, and then activate it. An example +``$HOME/mozconfigs/optimized`` ``MOZCONFIG`` looks like this: + +.. code:: text + + # Build only the JS shell + ac_add_options --enable-project=js + + # Enable optimization for speed + ac_add_options --enable-optimize + + # Disable debug checks to better match a release build of Firefox. + ac_add_options --disable-debug + + # Use a separate objdir for optimized builds to allow easy + # switching between optimized and debug builds while developing. + mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-opt-@CONFIG_GUESS@ + +SpiderMonkey on Android aarch64 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Building SpiderMonkey on Android +-------------------------------- + +- First, run `mach bootstrap` and answer `GeckoView/Firefox for Android` when + asked which project you want to build. This will download a recent Android + NDK, make sure all the build dependencies required to compile on Android are + present, etc.
+- Make sure that `$MOZBUILD_DIR/android-sdk-linux/platform-tools` is present in + your `PATH` environment. You can do this by running the following line in a + shell, or adding it to a shell profile init file: + +.. code:: console + + $ export PATH="$PATH:~/.mozbuild/android-sdk-linux/platform-tools" + +- Create a typical `mozconfig` file for compiling SpiderMonkey, as outlined in + the :ref:`Setting up a MOZCONFIG` documentation, and include the following + line: + +.. code:: console + + $ ac_add_options --target=aarch64-linux-android + +- Then compile as usual with `mach build` with this `MOZCONFIG` file. + +Running jit-tests on Android +---------------------------- + +- Plug your Android device to the machine which compiled the shell for aarch64 + as described above, or make sure it is on the same subnetwork as the host. It + should appear in the list of devices seen by `adb`: + +.. code:: console + + $ adb devices + +This command should show you a device ID with the name of the device. If it +doesn't, make sure that you have enabled Developer options on your device, as +well as `enabled USB debugging on the device `_. + +- Run `mach jit-test --remote {JIT_TEST_ARGS}` with the android-aarch64 + `MOZCONFIG` file. This will upload the JS shell and its dependencies to the + Android device, in a temporary directory (`/data/local/tmp/test_root/bin` as + of 2020-09-02). Then it will start running the jit-test suite. + +Debugging jit-tests on Android +------------------------------ + +Debugging on Android uses the GDB remote debugging protocol, so we'll set up a +GDB server on the Android device, that is going to be controlled remotely by +the host machine. + +- Upload the `gdbserver` precompiled binary from the NDK from the host machine + to the Android device, using this command on the host: + +.. 
code:: console + + $ adb push \ + ~/.mozbuild/android-ndk-r23c/prebuilt/android-arm64/gdbserver/gdbserver \ + /data/local/tmp/test_root/bin + +- Make sure that the `ncurses5` library is installed on the host. On + Debian-like distros, this can be done with `sudo apt install -y libncurses5`. + +- Set up port forwarding for the GDB port, from the Android device to the host, + so we can connect to a local port from the host, without needing to find what + the IP address of the Android device is: + +.. code:: console + + $ adb forward tcp:5039 tcp:5039 + +- Start `gdbserver` on the phone, passing the JS shell command line arguments + to gdbserver: + +.. code:: console + + $ adb shell export LD_LIBRARY_PATH=/data/local/tmp/test_root/bin '&&' /data/local/tmp/test_root/bin/gdbserver :5039 /data/local/tmp/test_root/bin/js /path/to/test.js + +.. note:: + + Note that this will make the gdbserver listen on port 5039 on all the + network interfaces. In particular, the gdbserver will be reachable from + every other device on the same networks as your phone. Since the gdbserver + protocol is unsafe, it is strongly recommended to double-check that the + gdbserver process has properly terminated when exiting the shell, and to + not run it more than needed. + +.. note:: + + You can find the full command line that the `jit_test.py` script is + using by giving it the `-s` parameter, and copy/paste it as the final + argument to the gdbserver invocation above. + +- On the host, start the precompiled NDK version of GDB that matches your host + architecture, passing it the path to the shell compiled with `mach` above: + +.. code:: console + + $ ~/.mozbuild/android-ndk-r23c/prebuilt/linux-x86_64/bin/gdb /path/to/objdir-aarch64-linux-android/dist/bin/js + +- Then connect remotely to the GDB server that's listening on the Android + device: + +..
code:: console + + $(gdb) target remote :5039 + $(gdb) continue diff --git a/js/src/doc/bytecode_checklist.md b/js/src/doc/bytecode_checklist.md new file mode 100644 index 0000000000..be3c1011a4 --- /dev/null +++ b/js/src/doc/bytecode_checklist.md @@ -0,0 +1,44 @@ +# So You Want to Add a Bytecode Op + +Occasionally we need to add a new bytecode operation in order to express or optimize +some feature of JavaScript. This document is intended to assist you in determining if +you ought to add a bytecode op, and what kind of changes and integrations will be +required if you do this. + +Think of this as a more specialized version of the feature addition document. + +## First: Do you need a new bytecode op? + +There are alternatives to increasing the bytecode space! + +* [Self-hosted intrinsics][intrinsic] can be accessed directly from the bytecode + generator using the `GetIntrinsic` bytecode. For an example see + [`BytecodeEmitter::emitCopyDataProperties`][emitCopy]. Calls to intrinsics can be + [(optionally) optimized with CacheIR as well][optimize]. +* Desugar the behavior you want into a sequence of already existing ops. If this is + possible this is often the right choice: Support for everything comes along mostly + for free. + * Is it possible to teach the JIT compilers to recognize your sequence of bytecode + if special handling is required? If that kind of idiom recognition is too costly, + you may be better served by a new bytecode op. + + +## Second: Can we make your bytecode fast? + +To implement bytecode in our JIT compilers we put a few constraints on our bytecode. +e.g. Each bytecode must be atomic, to support bailouts and exception behaviour. This +means that your op must not create torn objects or invalid states. + +Ideally there would be a fast inline cache possible to implement your bytecode, as +this brings speed and performance to our BaselineInterpreter layer, rather than +waiting for Warp. 
+ +ICs are valuable when behaviour varies substantially at runtime (ie, we choose +different paths based on operand types). Examples of where we've used this are +`JSOp::OptimizeSpreadCall` and `JSOp::CloseIter`, both of which let us generate a +fastpath in the common case or fall back to something more expensive if necessary. + + +[intrinsic]: https://searchfox.org/mozilla-central/search?q=intrinsic_&path=js%2Fsrc%2F&case=false&regexp=false +[emitCopy]: https://searchfox.org/mozilla-central/rev/650c19c96529eb28d081062c1ca274bc50ef3635/js/src/frontend/BytecodeEmitter.cpp#5018,5027,5039,5045,5050,5055,5067 +[optimize]: https://searchfox.org/mozilla-central/rev/c1180ea13e73eb985a49b15c0d90e977a1aa919c/js/src/jit/CacheIR.cpp#10140 diff --git a/js/src/doc/feature_checklist.md b/js/src/doc/feature_checklist.md new file mode 100644 index 0000000000..7cf847f725 --- /dev/null +++ b/js/src/doc/feature_checklist.md @@ -0,0 +1,79 @@ +# JavaScript Language Feature Checklist +So you're working on a new JavaScript feature in SpiderMonkey: Congratulations! Here's a set of checklists and guidelines to help you on your way. + +## High Level Feature Ship Checklist. +(Note: Some of these pieces can happen in parallel, so it's not necessary to +work directly top-down) + +- ☐ Send an Intent to Prototype email to `dev-platform`. This is part of the + [Exposure Guidelines](https://wiki.mozilla.org/ExposureGuidelines) process. We + historically haven't been amazing at sending intent-to-prototype emails, but + we can always get better. +- ☐ Create a shell option for the feature. +- ☐ Stage 2 or earlier proposals should be developed under compile time guards, + disabled by default. +- ☐ Create a browser preference for the feature. +- ☐ Implement the Feature. +- ☐ Land feature disabled by pref and shell-option. +- ☐ Import the test262 test cases for the feature, or enable them if they're + already imported.
(See `js/src/test/Readme.txt` for guidance) +- ☐ Contact `fuzzing@mozilla.org` to arrange fuzzing for the feature. +- ☐ Add shell option to `js/src/shell/fuzz-flags.txt`. This signals to other + fuzzers as well that the feature is ready for fuzzing. +- ☐ Send an Intent to Ship Email to `dev-platform`. This is also part of the + [Exposure Guidelines](https://wiki.mozilla.org/ExposureGuidelines) process. +- ☐ Ship the feature; default the preference to true and the command-line + option to true. +- ☐ Open a followup bug to later remove the preference and the command line + option. + + +## Supplemental Checklists +### Shipping Consideration Checklist + +- ☐ If it seems possible that the feature will cause webcompat issues, + consider shipping `NIGHTLY_ONLY` for a cycle or two, to use nightly as an + attempt to shake out potential webcompat issues. + + +### Web Platform Integration Checklist + +_Sometimes the complexity of the web platform leaks into JS feature work._ + +- ☐ Ensure the appropriate web-platform tests exist, and are being run. +- ☐ Is your feature correctly enabled inside of Workers? (They have a different + option set than the main thread, and it's easy to forget them!) You may want to + write a mochitest. + +### Syntax Features Checklist + +- ☐ Does `Reflect.parse` correctly parse and return results for your new syntax? + - `Reflect.parse` tests are interesting as well, because they can be written + for new syntax before bytecode emission is done. + - ☐ Are the locations correct for the new syntax entries in the parse tree? +- ☐ Are your errors emitted with sensible location info? + +### Testing Consideration Checklist + +_There's lots of complexity in SpiderMonkey that isn't always captured by the +specification, so below is some guidance on behaviour to pay +attention to that may not be tested by a feature's test262 tests._ + +- ☐ How does your feature interact with multiple compartments?
What happens if + references happen across compartments, or if `this` is a + `CrossCompartmentWrapper`? +- ☐ Are your error messages being emitted in the correct realm, with the + correct prototype? +- ☐ If async functions or promises are involved, are user-code objects + resolved? If so, does the feature correctly handle [the `.then` property + behaviour of promise + resolution?](https://www.stefanjudis.com/today-i-learned/promise-resolution-with-objects-including-a-then-property/) +- ☐ Have you written some OOM tests for your feature to ensure your OOM + handling is correct? + +#### Web Platform Testing Considerations +- ☐ Does the feature have to handle exotic objects specially? Consider what + happens when your feature interacts with the very exotic objects on the web + platform, such as `WindowProxy`, `Location` (cross-origin objects). +- ☐ What happens when your feature interacts with + [X-rays](/dom/scriptSecurity/xray_vision.rst)? diff --git a/js/src/doc/gc.rst b/js/src/doc/gc.rst new file mode 100644 index 0000000000..e0a70c9d7a --- /dev/null +++ b/js/src/doc/gc.rst @@ -0,0 +1,140 @@ +SpiderMonkey garbage collector +============================== + +The SpiderMonkey garbage collector is responsible for allocating memory +representing JavaScript data structures and deallocating them when they are no +longer in use. It aims to collect as much data as possible in as little time +as possible. As well as JavaScript data it is also used to allocate some +internal SpiderMonkey data structures. + +The garbage collector is a hybrid tracing collector, and has the following +features: + + - :ref:`Precise ` + - :ref:`Incremental ` + - :ref:`Generational ` + - :ref:`Partially concurrent ` + - :ref:`Parallel ` + - :ref:`Compacting ` + - :ref:`Partitioned heap ` + +For an overview of garbage collection see: +https://en.wikipedia.org/wiki/Tracing_garbage_collection + +Description of features +####################### + +.. 
_precise-gc: + +Precise collection +****************** + +The GC is 'precise' in that it knows the layout of allocations (which is used +to determine reachable children) and also the location of all stack roots. This +means it does not need to resort to conservative techniques that may cause +garbage to be retained unnecessarily. + +Knowledge of the stack is achieved with C++ wrapper classes that must be used +for stack roots and handles (pointers) to them. This is enforced by the +SpiderMonkey API (which operates in terms of these types) and checked by a +static analysis that reports places where unrooted GC pointers can be present +when a GC could occur. + +For details of stack rooting, see: https://github.com/mozilla-spidermonkey/spidermonkey-embedding-examples/blob/esr78/docs/GC%20Rooting%20Guide.md + +We also have a :doc:`static analysis ` for detecting +errors in rooting. It can be :doc:`run locally or in CI `. + +.. _incremental-gc: + +Incremental collection +********************** + +'Stop the world' collectors run a whole collection in one go, which can result +in unacceptable pauses for users. An incremental collector breaks its +execution into a number of small slices, reducing user impact. + +As far as possible the SpiderMonkey collector runs incrementally. Not all +parts of a collection can be performed incrementally, however, as there are some +operations that need to complete atomically with respect to the rest of the +program. + +Currently, most of the collection is performed incrementally. Root marking, +compacting, and an initial part of sweeping are not. + +.. _generational-gc: + +Generational collection +*********************** + +Most real world allocations either die very quickly or live for a long +time. This suggests an approach to collection where allocations are moved +between 'generations' (separate heaps) depending on how long they have +survived.
Generations containing young allocations are fast to collect and can +be collected more frequently; older generations are collected less often. + +The SpiderMonkey collector implements a single young generation (the nursery) +and a single old generation (the tenured heap). Collecting the nursery is +known as a minor GC as opposed to a major GC that collects the whole heap +(including the nursery). + +.. _concurrent-gc: + +Concurrent collection +********************* + +Many systems have more than one CPU and therefore can benefit from offloading +GC work to another core. In GC terms 'concurrent' usually refers to GC work +happening while the main program continues to run. + +The SpiderMonkey collector currently only uses concurrency in limited phases. + +This includes most finalization work (there are some restrictions as not all +finalization code can tolerate this) and some other aspects such as allocating +and decommitting blocks of memory. + +Performing marking work concurrently is currently being investigated. + +.. _parallel-gc: + +Parallel collection +******************* + +In GC terms 'parallel' usually means work performed in parallel while the +collector is running, as opposed to the main program itself. The SpiderMonkey +collector performs work within GC slices in parallel wherever possible. + +.. _compacting-gc: + +Compacting collection +********************* + +The collector allocates data with the same type and size in 'arenas' (often known +as slabs). After many allocations have died this can leave many arenas +containing free space (external fragmentation). Compacting remedies this by +moving allocations between arenas to free up as much memory as possible. + +Compacting involves tracing the entire heap to update pointers to moved data +and is not incremental so it only happens rarely, or in response to memory +pressure notifications. + +..
_partitioned-heap: + +Partitioned heap +**************** + +The collector has the concept of 'zones' which are separate heaps which can be +collected independently. Objects in different zones can refer to each other +however. + +Zones are also used to help incrementalize parts of the collection. For +example, compacting is not fully incremental but can be performed one zone at +a time. + +Other documentation +################### + +More details about the Garbage Collector (GC) can be found by looking for the +`[SMDOC] Garbage Collector`_ comment in the sources. + +.. _[SMDOC] Garbage Collector: https://searchfox.org/mozilla-central/search?q=[SMDOC]+Garbage+Collector diff --git a/js/src/doc/hacking_tips.md b/js/src/doc/hacking_tips.md new file mode 100644 index 0000000000..58c42ee5e0 --- /dev/null +++ b/js/src/doc/hacking_tips.md @@ -0,0 +1,588 @@ +# Hacking Tips + +**These tips were archived from [MDN](https://mdn-archive.mossop.dev/en-US/docs/Mozilla/Projects/SpiderMonkey/Hacking_Tips)**: This may be out of date! + +This is archived here because it captures valuable documentation that even if potentially out of date, provides inspiration. + +--- + + +This page lists a few tips to help you investigate issues related to SpiderMonkey. All tips listed here are dealing with the JavaScript shell obtained at the end of the [build documentation of SpiderMonkey](build.rst). It is separated in 2 parts, one section related to debugging and another section related to drafting optimizations. Many of these tips only apply to debug builds of the JS shell; they will not function in a release build. 
+ +## Tools + +Here are some debugging tools above and beyond your standard debugger that might help you: + +* [rr](https://rr-project.org/) is a record-and-replay deterministic debugger for Linux +* [Pernosco](https://pernos.co/) takes an rr recording and adds omniscient debugging tools to help you + * It is a paid service with a free trial + * Mozilla has a license for internal developers; you can contact Matthew Gaudet for details + + +## Debugging Tips + +### Getting help (from JS shell) + +Use the **help** function to get the list of all primitive functions of the shell with their description. Note that some functions have been moved under an 'os' object, and **help(os)** will give brief help on just the members of that "namespace". + +You can also use **help(/Regex/)** to get help for members of the global namespace that match the given regular expression. + +### Getting the bytecode of a function (from JS shell) + +The shell has a small function named **dis** to dump the bytecode of a function with its source notes. Without arguments, it will dump the bytecode of its caller. + +``` +js> function f () { + return 1; +} +js> dis(f); +flags: +loc op +----- -- +main: +00000: one +00001: return +00002: stop + +Source notes: + ofs line pc delta desc args +---- ---- ----- ------ -------- ------ + 0: 1 0 [ 0] newline + 1: 2 0 [ 0] colspan 2 + 3: 2 2 [ 2] colspan 9 + +``` + +### Getting the bytecode of a function (from gdb) + +In _jsopcode.cpp_, a function named **js::DisassembleAtPC** can print the bytecode of a script. Some variants of this function, such as **js::DumpScript** etc., are convenient for debugging. + +### Printing the JS stack (from gdb) + +In _jsobj.cpp_, a function named **js::DumpBacktrace** prints a backtrace à la gdb for the JS stack. 
Each line of the backtrace contains, in order: the stack depth; the interpreter frame pointer (see _js/src/vm/Stack.h_, **StackFrame** class), or (nil) if the frame was compiled with IonMonkey; the file and line number of the call location; and, in parentheses, the **JSScript** pointer and the **jsbytecode** pointer (pc) being executed. + +``` +$ gdb --args js +[…] +(gdb) b js::ReportOverRecursed +(gdb) r +js> function f(i) { + if (i % 2) f(i + 1); + else f(i + 3); +} +js> f(0) + +Breakpoint 1, js::ReportOverRecursed (maybecx=0xfdca70) at /home/nicolas/mozilla/ionmonkey/js/src/jscntxt.cpp:495 +495 if (maybecx) +(gdb) call js::DumpBacktrace(maybecx) +#0 (nil) typein:2 (0x7fffef1231c0 @ 0) +#1 (nil) typein:2 (0x7fffef1231c0 @ 24) +#2 (nil) typein:3 (0x7fffef1231c0 @ 47) +#3 (nil) typein:2 (0x7fffef1231c0 @ 24) +#4 (nil) typein:3 (0x7fffef1231c0 @ 47) +[…] +#25157 0x7fffefbbc250 typein:2 (0x7fffef1231c0 @ 24) +#25158 0x7fffefbbc1c8 typein:3 (0x7fffef1231c0 @ 47) +#25159 0x7fffefbbc140 typein:2 (0x7fffef1231c0 @ 24) +#25160 0x7fffefbbc0b8 typein:3 (0x7fffef1231c0 @ 47) +#25161 0x7fffefbbc030 typein:5 (0x7fffef123280 @ 9) + +``` + +Note, you can do the exact same exercise above using `lldb` (necessary on OSX after Apple removed `gdb`) by running `lldb -f js` then following the remaining steps. + +Since SpiderMonkey 48, we have a gdb unwinder. This unwinder is able to read the frames created by the JIT, and to display the frames which are after these JIT frames.
+ +``` +$ gdb --args out/dist/bin/js ./foo.js +[…] +SpiderMonkey unwinder is disabled by default, to enable it type: + enable unwinder .* SpiderMonkey +(gdb) b js::math_cos +(gdb) run +[…] +#0 js::math_cos (cx=0x14f2640, argc=1, vp=0x7fffffff6a88) at js/src/jsmath.cpp:338 +338 CallArgs args = CallArgsFromVp(argc, vp); +(gdb) enable unwinder .* SpiderMonkey +(gdb) backtrace 10 +#0 0x0000000000f89979 in js::math_cos(JSContext*, unsigned int, JS::Value*) (cx=0x14f2640, argc=1, vp=0x7fffffff6a88) at js/src/jsmath.cpp:338 +#1 0x0000000000ca9c6e in js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) (cx=0x14f2640, native=0xf89960 , args=...) at js/src/jscntxtinlines.h:235 +#2 0x0000000000c87625 in js::Invoke(JSContext*, JS::CallArgs const&, js::MaybeConstruct) (cx=0x14f2640, args=..., construct=js::NO_CONSTRUCT) at js/src/vm/Interpreter.cpp:476 +#3 0x000000000069bdcf in js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICCall_Fallback*, uint32_t, JS::Value*, JS::MutableHandleValue) (cx=0x14f2640, frame=0x7fffffff6ad8, stub_=0x1798838, argc=1, vp=0x7fffffff6a88, res=JSVAL_VOID) at js/src/jit/BaselineIC.cpp:6113 +#4 0x00007ffff7f41395 in +``` + +Note, when you enable the unwinder, the current version of gdb (7.10.1) does not flush the backtrace. Therefore, the JIT frames do not appear until you settle on the next breakpoint. 
To work-around this issue you can use the recording feature of `gdb`, to step one instruction, and settle back to where you came from with the following set of `gdb` commands: + +``` +(gdb) record full +(gdb) si +(gdb) record goto 0 +(gdb) record stop +``` + +If you have a core file, you can use the gdb unwinder the same way, or do everything from the command line as follows: + +``` +$ gdb -ex 'enable unwinder .* SpiderMonkey' -ex 'bt 0' -ex 'thread apply all backtrace' -ex 'quit' out/dist/bin/js corefile +``` + +The gdb unwinder is supposed to be loaded by `dist/bin/js-gdb.py` and load python scripts which are located in `js/src/gdb/mozilla` under gdb. If gdb does not load the unwinder by default, you can force it to, by using the `source` command with the `js-gdb.py` file. + +### Setting a breakpoint in the generated code (from gdb, x86 / x86-64, arm) + +To set a breakpoint in the generated code of a specific JSScript compiled with IonMonkey, set a breakpoint on the instruction you are interested in. If you have no precise idea which function you are looking at, you can set a breakpoint on the **js::ion::CodeGenerator::visitStart** function. Optionally, a condition on the **ins->id()** of the LIR instruction can be added to select precisely the instruction you are looking for. Once the breakpoint is on the **CodeGenerator** function of the LIR instruction, add a command to generate a static breakpoint in the generated code. + +``` +$ gdb --args js +[…] +(gdb) b js::ion::CodeGenerator::visitStart +(gdb) command +>call masm.breakpoint() +>continue +>end +(gdb) r +js> function f(a, b) { return a + b; } +js> for (var i = 0; i < 100000; i++) f(i, i + 1); + +Breakpoint 1, js::ion::CodeGenerator::visitStart (this=0x101ed20, lir=0x10234e0) + at /home/nicolas/mozilla/ionmonkey/js/src/ion/CodeGenerator.cpp:609 +609 } + +Program received signal SIGTRAP, Trace/breakpoint trap. +0x00007ffff7fb165a in ?? 
() +(gdb) + +``` + +Once you hit the generated breakpoint, you can replace it by a gdb breakpoint to make it conditional. The procedure is to first replace the generated breakpoint by a nop instruction, and to set a breakpoint at the address of the nop. + +``` +(gdb) x /5i $pc - 1 + 0x7ffff7fb1659: int3 +=> 0x7ffff7fb165a: mov 0x28(%rsp),%rax + 0x7ffff7fb165f: mov %eax,%ecx + 0x7ffff7fb1661: mov 0x30(%rsp),%rdx + 0x7ffff7fb1666: mov %edx,%ebx + +(gdb) # replace the int3 by a nop +(gdb) set *(unsigned char *) ($pc - 1) = 0x90 +(gdb) x /1i $pc - 1 + 0x7ffff7fb1659: nop + +(gdb) # set a breakpoint at the previous location +(gdb) b *0x7ffff7fb1659 +Breakpoint 2 at 0x7ffff7fb1659 +``` + +### Printing Ion generated assembly code (from gdb) + +If you want to look at the assembly code generated by IonMonkey, you can follow this procedure: + +1. Place a breakpoint at CodeGenerator.cpp on the CodeGenerator::link method. +1. Step next a few times, so that the "code" variable gets generated +1. Print code->code\_, which is the address of the code +1. Disassemble code read at this address (using x/Ni address, where N is the number of instructions you would like to see) + +Here is an example. It might be simpler to use the CodeGenerator::link lineno instead of the full qualified name to put the breakpoint. Let's say that the line number of this function is 4780, for instance: + + (gdb) b CodeGenerator.cpp:4780 + Breakpoint 1 at 0x84cade0: file /home/code/mozilla-central/js/src/ion/CodeGenerator.cpp, line 4780. + (gdb) r + Starting program: /home/code/mozilla-central/js/src/32-release/js -f /home/code/jaeger.js + [Thread debugging using libthread_db enabled] + Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 
+ [New Thread 0xf7903b40 (LWP 12563)] + [New Thread 0xf6bdeb40 (LWP 12564)] + Run#0 + + Breakpoint 1, js::ion::CodeGenerator::link (this=0x86badf8) + at /home/code/mozilla-central/js/src/ion/CodeGenerator.cpp:4780 + 4780 { + (gdb) n + 4781 JSContext *cx = GetIonContext()->cx; + (gdb) n + 4783 Linker linker(masm); + (gdb) n + 4784 IonCode *code = linker.newCode(cx, JSC::ION*CODE); + (gdb) n + 4785 if (!code) + (gdb) p code->code* + $1 = (uint8_t \*) 0xf7fd25a8 "\201", + (gdb) x/2i 0xf7fd25a8 + 0xf7fd25a8: sub $0x80,%esp + 0xf7fd25ae: mov 0x94(%esp),%ecx + +On arm, the compiled JS code will always be ARM machine code, whereas SpiderMonkey itself is frequently Thumb2. Since there isn't debug info for the JIT'd code, you will need to tell gdb that you are looking at ARM code: + + (gdb) set arm force-mode arm + +Or you can wrap the x command in your own command: + + def xi + set arm force-mode arm + eval "x/%di %d", $arg0, $arg1 + set arm force-mode auto + end + +### Printing asm.js/wasm generated assembly code (from gdb) + +- Set a breakpoint on `js::wasm::Instance::callExport` (defined in `WasmInstance.cpp` as of November 18th 2016). This will trigger for _any_ asm.js/wasm call, so you should find a way to set this breakpoint for the only generated codes you want to look at. +- Run the program. +- Do `next` in gdb until you reach the definition of the `funcPtr`: +``` +// Call the per-exported-function trampoline created by GenerateEntry. +auto funcPtr = JS*DATA_TO_FUNC_PTR(ExportFuncPtr, codeBase() + func.entryOffset()); +if (!CALL_GENERATED_2(funcPtr, exportArgs.begin(), &tlsData*)) + return false; +``` +- After it's set, `x/64i funcPtr` will show you the trampoline code. There should be a call to an address at some point; that's what we're targeting. Copy that address. 
+ +``` + 0x7ffff7ff6000: push %r15 + 0x7ffff7ff6002: push %r14 + 0x7ffff7ff6004: push %r13 + 0x7ffff7ff6006: push %r12 + 0x7ffff7ff6008: push %rbp + 0x7ffff7ff6009: push %rbx + 0x7ffff7ff600a: movabs $0xea4f80,%r10 + 0x7ffff7ff6014: mov 0x178(%r10),%r10 + 0x7ffff7ff601b: mov %rsp,0x40(%r10) + 0x7ffff7ff601f: mov (%rsi),%r15 + 0x7ffff7ff6022: mov %rdi,%r10 + 0x7ffff7ff6025: push %r10 + 0x7ffff7ff6027: test $0xf,%spl + 0x7ffff7ff602b: je 0x7ffff7ff6032 + 0x7ffff7ff6031: int3 + 0x7ffff7ff6032: callq 0x7ffff7ff5000 <------ right here +``` + +- `x/64i address` (in this case: `x/64i 0x7ffff7ff6032`). +- If you want to put a breakpoint at the function's entry, you can do: `b *address` (for instance here, `b* 0x7ffff7ff6032`). Then you can display the instructions around pc with `x/20i $pc,` and execute instruction by instruction with `stepi`. + +### Finding the script of Ion generated assembly (from gdb) + +When facing a bug in which you are in the middle of IonMonkey generated code, the first thing to note is that gdb's backtrace is not reliable, because the generated code does not keep a frame pointer. To figure it out, you have to read the stack to infer the IonMonkey frame. + +``` +(gdb) x /64a $sp +[…] +0x7fffffff9838: 0x7ffff7fad2da 0x141 +0x7fffffff9848: 0x7fffef134d40 0x2 +[…] +(gdb) p (*(JSFunction**) 0x7fffffff9848)->u.i.script_->lineno +$1 = 1 +(gdb) p (*(JSFunction**) 0x7fffffff9848)->u.i.script_->filename +$2 = 0xff92d1 "typein" +``` + +The stack is ordered as defined in js/src/ion/IonFrames-x86-shared.h. It is composed of the return address, a descriptor (a small value), the JSFunction (if it is even) or a JSScript (if it is odd; remove it to dereference the pointer) and the frame ends with the number of actual arguments (a small value too). If you want to know at which LIR the code is failing at, the **js::ion::CodeGenerator::generateBody** function can be instrumented to dump the LIR **id** before each instruction. 
+ +``` +for (; iter != current->end(); iter++) { + IonSpew(IonSpew_Codegen, "instruction %s", iter->opName()); + […] + + masm.store16(Imm32(iter->id()), Address(StackPointer, -8)); // added + if (!iter->accept(this)) + return false; +``` + +This modification will add an instruction which abuses the stack pointer to store an immediate value (the LIR id) to a location which would never be generated by any sane compiler. Thus, when dumping the assembly under gdb, this kind of instruction is easily noticeable. + +### Viewing the MIRGraph of Ion/Odin compilations (from gdb) + +With gdb instrumentation, we can call the [iongraph](https://github.com/sstangl/iongraph) program within gdb when the execution is stopped. This instrumentation adds an **`iongraph`** command which, when provided with an instance of a **`MIRGenerator*`**, will call `iongraph`, `graphviz` and your preferred png viewer to display the MIR graph at the precise time of the execution. To find **`MIRGenerator*`** instances, it is best to look up the stack for `OptimizeMIR` or `CodeGenerator::generateBody`. The **`OptimizeMIR`** function has a **`mir`** argument, and the **`CodeGenerator::generateBody`** function has a member **`this->gen`**. + +``` +(gdb) bt +#0 0x00000000007eaad4 in js::InlineList::begin() const (this=0x33dbbc0) at …/js/src/jit/InlineList.h:280 +#1 0x00000000007cb845 in js::jit::MIRGraph::begin() (this=0x33dbbc0) at …/js/src/jit/MIRGraph.h:787 +#2 0x0000000000837d25 in js::jit::BuildPhiReverseMapping(js::jit::MIRGraph&) (graph=...) at …/js/src/jit/IonAnalysis.cpp:2436 +#3 0x000000000083317f in js::jit::OptimizeMIR(js::jit::MIRGenerator*) (mir=0x33dbdf0) at …/js/src/jit/Ion.cpp:1570 +… +(gdb) frame 3 +#3 0x000000000083317f in js::jit::OptimizeMIR(js::jit::MIRGenerator*) (mir=0x33dbdf0) at …/js/src/jit/Ion.cpp:1570 +(gdb) iongraph mir + function 0 (asm.js compilation): success; 1 passes.
+/* open your png viewer with the result of iongraph */ +``` + +This gdb instrumentation is supposed to work with debug builds, or with optimized builds compiled with the `--enable-jitspew` configure flag. External programs such as `iongraph`, `dot`, and your png viewer are searched for in the `PATH`; otherwise, custom ones can be configured either with environment variables (`GDB_IONGRAPH`, `GDB_DOT`, `GDB_PNGVIEWER`) before starting gdb, or with gdb parameters (`set iongraph-bin `, `set dot-bin `, `set pngviewer-bin `) within gdb. + +Enabling GDB instrumentation may require launching a JS shell executable that shares a directory with a file named "js-gdb.py". If js/src/js does not provide the "iongraph" command, try js/src/shell/js. GDB may complain that ~/.gdbinit requires modification to authorize user scripts, and if so will print out directions. + +### Finding the code that generated a JIT instruction (from rr) + +If you are looking at a JIT instruction and need to know what code generated it, you can use [jitsrc.py](https://searchfox.org/mozilla-central/source/js/src/gdb/mozilla/jitsrc.py). This script adds a `jitsrc` command to rr that will trace backwards from the JIT instruction to the code that generated it. + +To use the `jitsrc` command, add the following line to your .gdbinit file, or run it manually: + + source js/src/gdb/mozilla/jitsrc.py + +And you use the command like this: `jitsrc
`. + +Running the command will leave the application at the point of execution where that JIT instruction was originally emitted. For example, the backtrace might contain a frame at [js::jit::MacroAssemblerX64::loadPtr](https://searchfox.org/mozilla-central/rev/ddde3bbcafabe0fc8a36c660b3b673507d3e3874/js/src/jit/x64/MacroAssembler-x64.h#575). + +The way this works is by setting a watchpoint on the JIT instruction and `reverse-continue`ing the program execution to reach the point when that memory address was assigned to. JIT instruction memory can be copied or moved, so the `jitsrc` command automates updating the watchpoint across the copy/move to continue back to the original source of the JIT instruction. + +### Break on valgrind errors + +Sometimes, a bug can be reproduced under valgrind but with great difficulty under gdb. One way to investigate is to let valgrind start gdb for you; the other way documented here is to let valgrind act as a gdb server which can be manipulated from the gdb remote. + +``` +$ valgrind --smc-check=all-non-file +``` + +This command will tell you how to start gdb as a remote. Be aware that functions which are usually dumping some output will do it in the shell where valgrind is started and not in the shell where gdb is started. Thus functions such as **js::DumpBacktrace**, when called from gdb, will print their output in the shell containing valgrind. + +### Adding spew for Compilations & Bailouts & Invalidations (from gdb) + +If you are in rr, and forgot to record with the spew enabled with IONFLAGS or because this is an optimized build, then you can add similar spew with extra breakpoints within gdb. gdb has the ability to set breakpoints with commands, but a simpler / friendlier version is to use **dprintf**, with a location, and followed by printf-like arguments. 
+ + (gdb) dprintf js::jit::IonBuilder::IonBuilder, "Compiling %s:%d:%d-%d\n", info->script*->scriptSource()->filename*.mTuple.mFirstA, info->script*->lineno*, info->script*->sourceStart*, info->script*->sourceEnd* + Dprintf 1 at 0x7fb4f6a104eb: file /home/nicolas/mozilla/contrib-push/js/src/jit/IonBuilder.cpp, line 159. + (gdb) cond 1 inliningDepth == 0 + (gdb) dprintf js::jit::BailoutIonToBaseline, "Bailout from %s:%d:%d-%d\n", iter.script()->scriptSource()->filename*.mTuple.mFirstA, iter.script()->lineno*, iter.script()->sourceStart*, iter.script()->sourceEnd* + Dprintf 2 at 0x7fb4f6fe43dc: js::jit::BailoutIonToBaseline. (2 locations) + (gdb) dprintf Ion.cpp:3196, "Invalidate %s:%d:%d-%d\n", co->script*->scriptSource()->filename*.mTuple.mFirstA, co->script*->lineno*, co->script*->sourceStart*, co->script*->sourceEnd* + Dprintf 3 at 0x7fb4f6a0b62a: file /home/nicolas/mozilla/contrib-push/js/src/jit/Ion.cpp, line 3196. + `(gdb) continue` + Compiling self-hosted:650:20470-21501 + Bailout from self-hosted:20:403-500 + Invalidate self-hosted:20:403-500 + +Note: the line 3196, listed above, corresponds to the location of the [Jit spew inside jit::Invalidate function](https://searchfox.org/mozilla-central/rev/655f49c541108e3d0a232aa7173fbcb9af88d80b/js/src/jit/Ion.cpp#2475). + +## Hacking tips + + +### Using the Gecko Profiler (browser / xpcshell) + +See the section dedicated to [profiling with the Gecko Profiler](/tools/profiler/index.rst). This method of profiling has the advantage of mixing the JavaScript stack with the C++ stack, which is useful for analyzing library function issues. + +One tip is to start looking at a script with an inverted JS stack to locate the most expensive JS function, then to focus on the frame of this JS function, and to remove the inverted stack and look at C++ part of this function to determine from where the cost is coming from. 
+ +These archived [tips on using the Gecko Profiler](https://mdn-archive.mossop.dev/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler "/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler") and [FAQ](https://mdn-archive.mossop.dev/en-US/docs/Mozilla/Performance/Gecko_Profiler_FAQ "Gecko Profiler FAQ") might also be useful as inspiration, but are old enough that they are probably not accurate any more. + +### Using callgrind (JS shell) + +Because SpiderMonkey just-in-time compilers rewrite the executed program, valgrind should be informed from the command line by adding **--smc-check=all-non-file**. + +``` +$ valgrind --tool=callgrind --callgrind-out-file=bench.clg \ + --smc-check=all-non-file +``` + +The output file can then be used with **kcachegrind**, which provides a graphical view of the call graph. + +### Using IonMonkey spew (JS shell) + +IonMonkey spew is extremely verbose (not as much as the INFER spew), but you can filter it to focus on the list of compiled scripts or channels, IonMonkey spew channels can be selected with the IONFLAGS environment variable, and compilation spew can be filtered with IONFILTER. + +IONFLAGS contains the names of [each channel separated by commas](https://searchfox.org/mozilla-central/source/js/src/jit/JitSpewer.cpp#338). The **logs** channel produces one file (_/tmp/ion.json_), made to be used with [iongraph](https://github.com/sstangl/iongraph) (made by Sean Stangl). This tool will show the MIR & LIR steps done by IonMonkey during the compilation. To use [iongraph](https://github.com/sstangl/iongraph), you must install [Graphviz](https://www.graphviz.org/download/ "graphviz downloads"). + +Compilation logs and spew can be filtered with the IONFILTER environment variable which contains locations as output by other spew channels. Multiple locations can be specified using comma as a separator. 
+ +``` +$ IONFILTER=pdfjs.js:16934 IONFLAGS=logs,scripts,osi,bailouts ./js --ion-offthread-compile=off ./run.js 2>&1 | less +``` + +The **bailouts** channel is likely to be the first thing you should focus on, because a bailout means that something does not stay in IonMonkey and falls back to the interpreter. This channel outputs locations (as returned by the **id()** function of both instructions) of the latest MIR and the latest LIR phases. These locations should correspond to phases of the **logs** and a filter can be used to remove uninteresting functions. + +### Using the ARM simulator + +The ARM simulator can be used to test the ARM JIT backend on x86/x64 hardware. An ARM simulator build is an x86 shell (or browser) with the ARM JIT backend. Instead of entering JIT code, it runs it in a simulator (interpreter) for ARM code. To use the simulator, compile an x86 shell (32-bit, x64 doesn't work as we use a different Value format there), and pass --enable-arm-simulator to configure. For instance, on a 64-bit Linux host you can use the following configure command to get an ARM simulator build: + +```shell +AR=ar CC="gcc -m32" CXX="g++ -m32" ../configure --target=i686-pc-linux \ +--enable-debug --disable-optimize --enable-threadsafe --enable-simulator=arm +``` + +Or on OS X: + +```shell +$ AR=ar CC="clang -m32" CXX="clang++ -m32" ../configure --target=i686-apple-darwin10.0.0 --enable-debug --disable-optimize --enable-threadsafe --enable-arm-simulator +``` + +An **--enable-debug --enable-optimize** build is recommended if you want to run jit-tests or jstests. + +#### Use the VIXL Debugger in the simulator (arm64) + +Set a breakpoint (see the section above about setting a breakpoint in generated code) and run with the environment variable `USE_DEBUGGER=1`. This will then drop you into a simple debugger provided with VIXL, the ARM simulator technology used for arm64 simulation.
+ +#### Use the Simulator Debugger for arm32 + +The same instructions for arm64 in the preceding section apply, but the environment variable differs: Use `ARM_SIM_DEBUGGER=1`. + +#### Building the browser with the ARM simulator + +You can also build the entire browser with the ARM simulator backend, for instance to reproduce browser-only JS failures on ARM. Make sure to build a browser for x86 (32-bits) and add this option to your mozconfig file: + +ac_add_options --enable-arm-simulator + +If you are under an Ubuntu or Debian 64-bits distribution and you want to build a 32-bits browser, it might be hard to find the relevant 32-bits dependencies. You can use [padenot's scripts](https://github.com/padenot/fx-32-on-64.sh) which will magically setup a chrooted 32-bits environment and do All The Things (c) for you (you just need to modify the mozconfig file). + +### Using rr on a test + +Get the command line for your test run using -s: + +./jit_test.py -s $JS_SHELL saved-stacks/async.js + +Insert 'rr' before the shell invocation: + +``` +rr $JS_SHELL -f $JS_SRC/jit-test/lib/prolog.js --js-cache $JS_SRC/jit-test/.js-cache -e "const platform='linux2'; const libdir='$JS_SRC/jit-test/lib/'; const scriptdir='$JS_SRC/jit-test/tests/saved-stacks/'" -f $JS_SRC/jit-test/tests/saved-stacks/async.js +``` + +(note that the above is an example; simply setting JS_SHELL and JS_SRC will not work). Or if this is an intermittent, run it in a loop capturing an rr log for every one until it fails: + +``` +n=1; while rr ...same.as.above...; do echo passed $n; n=$(( $n + 1 )); done +``` + +Wait until it hits a failure. Now you can run `rr replay` to replay that last (failed) run under gdb. + +#### rr with reftest + +To break on the write of a differing pixel: + +1. Find the X/Y of a pixel that differs +2. Use `run Z` where Z is the mark in the log for TEST-START. 
For example in '[rr 28496 607198]REFTEST TEST-START | file:///home/bgirard/mozilla-central/tree/image/test/reftest/bmp/bmpsuite/b/wrapper.html?badpalettesize.bmp', Z would be 607198. +3. `break 'mozilla::dom::CanvasRenderingContext2D::DrawWindow(nsGlobalWindow&, double, double, double, double, nsAString_internal const&, unsigned int, mozilla::ErrorResult&)'` +4. `cont` +5. `break 'PresShell::RenderDocument(nsRect const&, unsigned int, unsigned int, gfxContext\*)'` +6. `set print object on` +7. `set $x = ` +8. `set $y = ` +9. `print &((cairo_image_surface_t*)aThebesContext->mDT.mRawPtr->mSurface).data[$y * ((cairo_image_surface_t*)aThebesContext->mDT.mRawPtr->mSurface).stride + $x * ((cairo_image_surface_t\*)aThebesContext->mDT.mRawPtr->mSurface).depth / 8]` +10. `watch *(char*)
` (NOTE: If you set a watch on the previous expression, gdb will watch the expression and run out of watchpoints) + +#### rr with emacs + +Within emacs, do `M-x gud-gdb` and replace the command line with `rr replay`. When gdb comes up, enter + +``` +set annot 1 +``` + +to get it to emit file location information so that emacs will pop up the corresponding source. Note that if you `reverse-continue` over a SIGSEGV and you're using the standard .gdbinit that sets a catchpoint for that signal, you'll get an additional stop at the catchpoint. Just `reverse-continue` again to continue to your breakpoints or whatever. + +### [Hack] Replacing one instruction + +To replace one specific instruction, you can customize the instruction's visit function using the JSScript **filename** and **lineno** fields, as well as the **id()** of the LIR / MIR instructions. The JSScript can be obtained from **info().script()**. + +``` +bool +CodeGeneratorX86Shared::visitGuardShape(LGuardShape *guard) +{ + if (info().script()->lineno == 16934 && guard->id() == 522) { + [… another impl only for this one …] + return true; + } + [… old impl …] +``` + +### [Hack] Spewing all compiled code + +I usually just add this to the appropriate `executableCopy()` function. + + if (getenv("INST_DUMP")) { + char buf[4096]; + sprintf(buf, "gdb /proc/%d/exe %d -batch -ex 'set pagination off' -ex 'set arm force-mode arm' -ex 'x/%di %p' -ex 'set arm force-mode auto'", getpid(), getpid(), m_buffer.size() / 4, buffer); + system(buf); + } + +If you aren't running on arm, you should omit the `-ex 'set arm force-mode arm'` and `-ex 'set arm force-mode auto'`. And you should change the size()/4 to be something more appropriate for your architecture. + +### Benchmarking with sub-milliseconds (JS shell) + +In the shell, we have two simple ways to benchmark a script.
We can either use the **-b** shell option (**--print-timing**) which will evaluate a script given on the command line without any need to instrument the benchmark and print an extra line showing the run-time of the script. The other way is to wrap the section that you want to measure with the **dateNow()** function call, which returns the number of milliseconds, with a decimal part for sub-milliseconds. + +```js +js> dateNow() - dateNow() +-0.0009765625 +``` + +Since [Firefox 61](https://bugzilla.mozilla.org/show_bug.cgi?id=1439788), the shell also has **performance.now()** available. + +### Benchmarking with sub-milliseconds (browser) + +Similar to how you can use **dateNow()** in the JS shell, you can use **performance.now()** in the JavaScript code of a page. + +### Dumping the JavaScript heap + +From the shell, you can call the `dumpHeap` function to dump out all GC things (reachable and unreachable) that are present in the heap. By default, the function writes to stdout, but a filename can be specified as an argument. + +Example output might look as follows: + +``` +0x1234abcd B global object +``` + +The output is textual. The first section of the file contains a list of roots, one per line. Each root has the form "0xabcd1234 \ \", where \ is the color of the given GC thing (B for black, G for gray, W for white) and \ is a string. The list of roots ends with a line containing "==========". + +After the roots come a series of zones. A zone starts with several "comment lines" that start with hashes. The first comment declares the zone. It is followed by lines listing each compartment within the zone. After all the compartments come arenas, which is where the GC things are actually stored. Each arena is followed by all the GC things in the arena. A GC thing starts with a line giving its address, its color, and the thing kind (object, function, whatever). After this comes a list of addresses that the GC thing points to, each one starting with ">". 
+ +It's also possible to dump the JavaScript heap from C++ code (or from gdb) using the `js::DumpHeap` function. It is part of jsfriendapi.h and it is available in release builds. + +### Inspecting MIR objects within a debugger + +For MIRGraph, MBasicBlock, and MDefinition and its subclasses (MInstruction, MConstant, etc.), call the dump member function. + + (gdb) call graph->dump() + (gdb) call block->dump() + (gdb) call def->dump() + + +### How to debug oomTest() failures + +The oomTest() function executes a piece of code many times, simulating an OOM failure at each successive allocation it makes. It's designed to highlight incorrect OOM handling, which may show up as a crash or assertion failure at some later point. + +When debugging such a crash, the most useful thing is to locate the last simulated allocation failure, as that is usually what has caused the subsequent crash. + +My workflow for doing this is as follows: + +1. Build a version of the engine with `--enable-debug` and `--enable-oom-breakpoint` configure flags. +2. Set the environment variable `OOM_VERBOSE=1` and reproduce the failure. This will print an allocation count at each simulated failure. Note the count of the last allocation. +3. Run the engine under a debugger and set a breakpoint on the function `js_failedAllocBreakpoint`. +4. Run the program and `continue` the necessary number of times until you reach the final allocation. + - e.g. in lldb, if the allocation failure number shown is 1500, run `continue -i 1498` (subtracted 2 because we've already hit it once and don't want to skip the last). Drop "-i" for gdb. +5. Dump a backtrace. This should show you the point at which the OOM is incorrectly handled, which will be a few frames up from the breakpoint. + +Note: if you are on linux, it may be simpler to use rr. + +Some guidelines for handling OOM that lead to failures when they are not followed: + +1. Check for allocation failure! 
+ - Fallible allocations must always be checked and handled, at a minimum by returning a status indicating failure to the caller. +2. Report OOM to the context if you have one + - If a function has a `JSContext*` argument, usually it should call `js::ReportOutOfMemory(cx)` on allocation failure to report this to the context. +3. Sometimes it's OK to ignore OOM + - For example, if you are performing a speculative optimisation you might abandon it and continue anyway. In this case, you may have to call `cx->recoverFromOutOfMemory()` if something further down the stack has already reported the failure. + +### Debugging GC marking/rooting + +The **js::debug** namespace contains some functions that are useful for watching mark bits for an individual `JSObject*` (or any `Cell*`). [js/src/gc/Heap.h](https://searchfox.org/mozilla-central/rev/dc5027f02e5ea1d6b56cfbd10f4d3a0830762115/js/src/gc/Heap.h#817-835) contains a comment describing an example usage. Reproduced here: + + // Sample usage from gdb: + // + // (gdb) p $word = js::debug::GetMarkWordAddress(obj) + // $1 = (uintptr_t *) 0x7fa56d5fe360 + // (gdb) p/x $mask = js::debug::GetMarkMask(obj, js::gc::GRAY) + // $2 = 0x200000000 + // (gdb) watch *$word + // Hardware watchpoint 7: *$word + // (gdb) cond 7 *$word & $mask + // (gdb) cont + // + // Note that this is *not* a watchpoint on a single bit. It is a watchpoint on + // the whole word, which will trigger whenever the word changes and the + // selected bit is set after the change. + // + // So if the bit changing is the desired one, this is exactly what you want. + // But if a different bit changes (either set or cleared), you may still stop + // execution if the $mask bit happened to already be set. gdb does not expose + // enough information to restrict the watchpoint to just a single bit. + +Most of the time, you will want **js::gc::BLACK** (or you can just use 0) for the 2nd param to **js::debug::GetMarkMask**.
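The caveat in the comment above (the watchpoint covers the whole word, and the condition only checks the mask after a change) can be illustrated with a small simulation. This is a Python sketch of the described behaviour only, not of gdb internals; `watch_hits` and the sequence of word values are hypothetical, and the mask value is taken from the sample session.

```python
def watch_hits(word_values, mask):
    """Return the indices at which a conditional whole-word watchpoint stops.

    A stop happens when the word changes AND the masked bit is set afterwards,
    mirroring `watch *$word` combined with `cond N *$word & $mask`.
    """
    hits = []
    for i in range(1, len(word_values)):
        changed = word_values[i] != word_values[i - 1]
        if changed and (word_values[i] & mask):
            hits.append(i)
    return hits


MASK = 0x200000000  # mask value from the sample gdb session above

# Hypothetical history of the mark word:
#   1: an unrelated bit is set      -> no stop (mask bit clear)
#   2: the mask bit is set          -> stop (the change we care about)
#   3: another unrelated bit is set -> stop anyway (mask bit already set)
#   4: the mask bit is cleared      -> no stop (mask bit clear after change)
history = [0x0, 0x1, 0x200000001, 0x200000003, 0x3]

hits = watch_hits(history, MASK)
print(hits)
```

Step 3 is the spurious stop that the comment warns about: the watched bit did not change, but the word did while the bit was already set.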
diff --git a/js/src/doc/index.rst b/js/src/doc/index.rst new file mode 100644 index 0000000000..82c36d52ad --- /dev/null +++ b/js/src/doc/index.rst @@ -0,0 +1,204 @@ +============ +SpiderMonkey +============ + +*SpiderMonkey* is the *JavaScript* and *WebAssembly* implementation library of +the *Mozilla Firefox* web browser. The implementation behaviour is defined by +the `ECMAScript `_ and `WebAssembly +`_ specifications. + +Much of the internal technical documentation of the engine can be found +throughout the source files themselves by looking for comments labelled with +`[SMDOC]`_. Information about the team, our processes, and about embedding +*SpiderMonkey* in your own projects can be found at https://spidermonkey.dev. + +Specific documentation on a few topics is available at: + +.. toctree:: + :maxdepth: 1 + + build + test + hacking_tips + Debugger/index + SavedFrame/index + feature_checklist + bytecode_checklist + + +Components of SpiderMonkey +########################## + +🧹 Garbage Collector +********************* + +.. toctree:: + :maxdepth: 2 + :hidden: + + Overview + Rooting Hazard Analysis + Running the Analysis + +*JavaScript* is a garbage collected language and at the core of *SpiderMonkey* +we manage a garbage-collected memory heap. Elements of this heap have a base +C++ type of `gc::Cell`_. Each round of garbage collection will free up any +*Cell* that is not referenced by a *root* or another live *Cell* in turn. + +See :doc:`GC overview` for more details. + + +📦 JS::Value and JSObject +************************** + +*JavaScript* values are divided into either objects or primitives +(*Undefined*, *Null*, *Boolean*, *Number*, *BigInt*, *String*, or *Symbol*). +Values are represented with the `JS::Value`_ type which may in turn point to +an object that extends from the `JSObject`_ type. Objects include both plain +*JavaScript* objects and exotic objects representing various things from +functions to *ArrayBuffers* to *HTML Elements* and more. 
+ +Most objects extend ``NativeObject`` (which is a subtype of ``JSObject``) +which provides a way to store properties as key-value pairs similar to a hash +table. These objects hold their *values* and point to a *Shape* that +represents the set of *keys*. Similar objects point to the same *Shape* which +saves memory and allows the JITs to quickly work with objects similar to ones +it has seen before. See the `[SMDOC] Shapes`_ comment for more details. + +C++ (and Rust) code may create and manipulate these objects using the +collection of interfaces we traditionally call the **JSAPI**. + + +🗃️ JavaScript Parser +********************* + +In order to evaluate script text, we parse it using the *Parser* into an +`Abstract Syntax Tree`_ (AST) temporarily and then run the *BytecodeEmitter* +(BCE) to generate `Bytecode`_ and associated metadata. We refer to this +resulting format as `Stencil`_ and it has the helpful characteristic that it +does not utilize the Garbage Collector. The *Stencil* can then be +instantiated into a series of GC *Cells* that can be mutated and understood +by the execution engines described below. + +Each function as well as the top-level itself generates a distinct script. +This is the unit of execution granularity since functions may be set as +callbacks that the host runs at a later time. There are both +``ScriptStencil`` and ``js::BaseScript`` forms of scripts. + +By default, the parser runs in a mode called *syntax* or *lazy* parsing where +we avoid generating full bytecode for functions within the source that we are +parsing. This lazy parsing is still required to check for all *early errors* +that the specification describes. When such a lazily compiled inner function +is first executed, we recompile just that function in a process called +*delazification*. Lazy parsing avoids allocating the AST and bytecode which +saves both CPU time and memory. 
In practice, many functions are never +executed during a given load of a webpage, so this delayed parsing can be +quite beneficial. + + +⚙️ JavaScript Interpreter +************************** + +The *bytecode* generated by the parser may be executed by an interpreter +written in C++ that manipulates objects in the GC heap and invokes native +code of the host (e.g. web browser). See `[SMDOC] Bytecode Definitions`_ for +descriptions of each bytecode opcode and ``js/src/vm/Interpreter.cpp`` for +their implementation. + + +⚡ JavaScript JITs +******************* + +.. toctree:: + :maxdepth: 1 + :hidden: + + MIR-optimizations/index + +In order to speed up execution of *bytecode*, we use a series of Just-In-Time +(JIT) compilers to generate specialized machine code (e.g. x86, ARM, etc.) +tailored to the *JavaScript* that is run and the data that is processed. + +As an individual script runs more times (or has a loop that runs many times) +we describe it as getting *hotter* and at certain thresholds we *tier-up* by +JIT-compiling it. Each subsequent JIT tier spends more time compiling but +aims for better execution performance. + +Baseline Interpreter +-------------------- + +The *Baseline Interpreter* is a hybrid interpreter/JIT that interprets the +*bytecode* one opcode at a time, but attaches small fragments of code called +*Inline Caches* (ICs) that rapidly speed up executing the same opcode the next +time (if the data is similar enough). See the `[SMDOC] JIT Inline Caches`_ +comment for more details. + +Baseline Compiler +----------------- + +The *Baseline Compiler* uses the same *Inline Caches* mechanism from the +*Baseline Interpreter* but additionally translates the entire bytecode to +native machine code. This removes dispatch overhead and does minor local +optimizations. This machine code still calls back into C++ for complex +operations. The translation is very fast but the ``BaselineScript`` uses +memory and requires ``mprotect`` and flushing CPU caches.
+ +WarpMonkey +---------- + +The *WarpMonkey* JIT replaces the former *IonMonkey* engine and is the +highest level of optimization for the most frequently run scripts. It is able +to inline other scripts and specialize code based on the data and arguments +being processed. + +We translate the *bytecode* and *Inline Cache* data into a Mid-level +`Intermediate Representation`_ (Ion MIR). This graph is +transformed and optimized before being *lowered* to a Low-level Intermediate +Representation (Ion LIR). This *LIR* performs register allocation and then +generates native machine code in a process called *Code Generation*. + +See `MIR Optimizations`_ for an overview of MIR optimizations. + +The optimizations here assume that a script continues to see data similar +to what has been seen before. The *Baseline* JITs are essential to success here +because they generate *ICs* that match observed data. If, after a script is +compiled with *Warp*, it encounters data that it is not prepared to handle, it +performs a *bailout*. The *bailout* mechanism reconstructs the native machine +stack frame to match the layout used by the *Baseline Interpreter* and then +branches to that interpreter as though we were running it all along. Building +this stack frame may use a special side-table saved by *Warp* to reconstruct +values that are not otherwise available. + + +🟪 WebAssembly +*************** + +In addition to *JavaScript*, the engine is also able to execute *WebAssembly* +(WASM) sources. + +WASM-Baseline (RabaldrMonkey) +----------------------------- + +This engine performs fast translation to machine code in order to minimize +latency to first execution. + +WASM-Ion (BaldrMonkey) +---------------------- + +This engine translates the WASM input into the same *MIR* form that *WarpMonkey* +uses and uses the *IonBackend* to optimize. These optimizations (and in +particular, the register allocation) generate very fast native machine code. + + +.. 
_gc::Cell: https://searchfox.org/mozilla-central/search?q=[SMDOC]+GC+Cell +.. _JSObject: https://searchfox.org/mozilla-central/search?q=[SMDOC]+JSObject+layout +.. _JS::Value: https://searchfox.org/mozilla-central/search?q=[SMDOC]+JS%3A%3AValue+type&path=js%2F +.. _[SMDOC]: https://searchfox.org/mozilla-central/search?q=[SMDOC]&path=js%2F +.. _[SMDOC] Shapes: https://searchfox.org/mozilla-central/search?q=[SMDOC]+Shapes +.. _[SMDOC] Bytecode Definitions: https://searchfox.org/mozilla-central/search?q=[SMDOC]+Bytecode+Definitions&path=js%2F +.. _[SMDOC] JIT Inline Caches: https://searchfox.org/mozilla-central/search?q=[SMDOC]+JIT+Inline+Caches +.. _Stencil: https://searchfox.org/mozilla-central/search?q=[SMDOC]+Script+Stencil +.. _Bytecode: https://en.wikipedia.org/wiki/Bytecode +.. _Abstract Syntax Tree: https://en.wikipedia.org/wiki/Abstract_syntax_tree +.. _Intermediate Representation: https://en.wikipedia.org/wiki/Intermediate_representation +.. _MIR Optimizations: ./MIR-optimizations/index.html diff --git a/js/src/doc/test.rst b/js/src/doc/test.rst new file mode 100644 index 0000000000..32d7a2abf2 --- /dev/null +++ b/js/src/doc/test.rst @@ -0,0 +1,89 @@ +Running Automated JavaScript Tests +================================== + +SpiderMonkey uses two separate test suites. + +The `Test262 test suite `__ is the implementation conformance test suite for ECMA-262, the language specification for JavaScript. All JavaScript engines use Test262 to ensure that they implement JavaScript correctly. Test262 is run using ``mach jstests``. + +In addition to Test262, SpiderMonkey also has a large collection of internal tests. These tests are run using ``mach jit-test``. + +Both sets of tests can be run from a normal build of the JS shell. + +Running Test262 locally +~~~~~~~~~~~~~~~~~~~~~~~ + +The jstests shell harness is in ``js/src/tests/jstests.py``. It can be invoked using + +.. 
code:: bash + + ./mach jstests + +Note that mach will generally find the JS shell itself; the --shell argument can be used to specify the location manually. All other flags will be passed along to the harness. + +Test262 includes a lot of tests. When working on a specific feature, it is often useful to specify a subset of tests: + +.. code:: bash + + ./mach jstests TEST_PATH_SUBSTRING [TEST_PATH_SUBSTRING_2 ...] + +This runs all tests whose paths contain any TEST_PATH_SUBSTRING. To exclude tests, you can use the ``--exclude=EXCLUDED`` option. Other useful options include: + +- ``-s`` (``--show-cmd``): Show the exact command line used to run each test. +- ``-o`` (``--show-output``): Print each test's output. +- ``--args SHELL_ARGS``: Extra arguments to pass to the JS shell. +- ``--rr`` Run tests under the `rr `__ record-and-replay debugger. + +For a complete list of options, use: + +.. code:: bash + + ./mach jstests -- -h + +The Test262 tests can also be run in the browser, using: + +.. code:: bash + + ./mach jstestbrowser + +To run a specific test, you can use the ``--filter=PATTERN`` option, where PATTERN is a RegExp pattern that is tested against ``file:///{PATH_TO_OBJ_DIR}/dist/test-stage/jsreftest/tests/jsreftest.html?test={RELATIVE_PATH_TO_TEST_FROM_js/src/tests}``. + +For historical reasons, ``jstests`` also includes a SpiderMonkey specific suite of additional language feature tests in ``js/src/tests/non262``. + +Running jit-tests locally +~~~~~~~~~~~~~~~~~~~~~~~~~ + +The jit-test shell harness is in ``js/src/jit-test/jit_test.py``. It can be invoked using + +.. code:: bash + + ./mach jit-test + +Basic usage of ``mach jit-test`` is similar to ``mach jstests``. A subset of tests can be specified: + +.. code:: bash + + ./mach jit-test TEST_PATH_SUBSTRING [TEST_PATH_SUBSTRING_2 ...] + +The ``--jitflags`` option allows you to test the JS executable with different flags. 
The flags are used to control which features are run, such as which JITs are enabled and how quickly they will kick in. The valid options are named bundles like 'all', 'interp', 'none', etc. See the full list in the ``-h`` / ``--help`` output. + +Another helpful option specific to ``jit-tests`` is ``-R FILENAME`` (or ``--retest FILENAME``). The first time this is run, it runs the entire test suite and writes a list of tests that failed to the given filename. The next time it is run, it will run only the tests in the given filename, and then will write the list of tests that failed when it is done. Thus, you can keep retesting as you fix bugs, and only the tests that failed the last time will run, speeding things up a bit. + +Other useful options include: + +- ``-s`` (``--show-cmd``): Show the exact command line used to run each test. +- ``-o`` (``--show-output``): Print each test's output. +- ``--args SHELL_ARGS``: Extra arguments to pass to the JS shell. +- ``--debug-rr``: Run a test under the `rr `__ record-and-replay debugger. +- ``--cgc``: Run a test with frequent compacting GCs (equivalent to ``SM(cgc)``). + +Adding new jit-tests +~~~~~~~~~~~~~~~~~~~~ + +Creating new tests for jit-tests is easy. Just add a new JS file in an appropriate directory under ``js/src/jit-test/tests``. (By default, tests should go in ``tests/basic``.) The test harness will automatically find the test and run it. The test is considered to pass if the exit code of the JS shell is zero (i.e., JS didn't crash and there were no JS errors). Use the ``assertEq`` function to verify values in your test. + +There are some advanced options for tests. See the README (in ``js/src/jit-test``) for details. + +Running jstests on Treeherder +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +On Treeherder, jstests run in the browser are shown as ``R(J)`` (search for ``jsreftest`` in ``mach try fuzzy``). SpiderMonkey shell jobs are shown as ``SM(...)``; most of them include JS shell runs of both jstests and jit-tests. -- cgit v1.2.3