From 9e3c08db40b8916968b9f30096c7be3f00ce9647 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Sun, 21 Apr 2024 13:44:51 +0200 Subject: Adding upstream version 1:115.7.0. Signed-off-by: Daniel Baumann --- tools/profiler/docs/buffer.rst | 70 + tools/profiler/docs/code-overview.rst | 1494 ++++++++++++++++++++ tools/profiler/docs/fissionprofiler-20200424.png | Bin 0 -> 131301 bytes tools/profiler/docs/fissionprofiler.umlet.uxf | 546 +++++++ tools/profiler/docs/index.rst | 37 + tools/profiler/docs/instrumenting-javascript.rst | 60 + tools/profiler/docs/instrumenting-rust.rst | 433 ++++++ tools/profiler/docs/markers-guide.rst | 485 +++++++ tools/profiler/docs/memory.rst | 46 + tools/profiler/docs/profilerclasses-20220913.png | Bin 0 -> 727313 bytes tools/profiler/docs/profilerclasses.umlet.uxf | 811 +++++++++++ .../docs/profilerthreadregistration-20220913.png | Bin 0 -> 383738 bytes .../docs/profilerthreadregistration.umlet.uxf | 710 ++++++++++ 13 files changed, 4692 insertions(+) create mode 100644 tools/profiler/docs/buffer.rst create mode 100644 tools/profiler/docs/code-overview.rst create mode 100644 tools/profiler/docs/fissionprofiler-20200424.png create mode 100644 tools/profiler/docs/fissionprofiler.umlet.uxf create mode 100644 tools/profiler/docs/index.rst create mode 100644 tools/profiler/docs/instrumenting-javascript.rst create mode 100644 tools/profiler/docs/instrumenting-rust.rst create mode 100644 tools/profiler/docs/markers-guide.rst create mode 100644 tools/profiler/docs/memory.rst create mode 100644 tools/profiler/docs/profilerclasses-20220913.png create mode 100644 tools/profiler/docs/profilerclasses.umlet.uxf create mode 100644 tools/profiler/docs/profilerthreadregistration-20220913.png create mode 100644 tools/profiler/docs/profilerthreadregistration.umlet.uxf (limited to 'tools/profiler/docs') diff --git a/tools/profiler/docs/buffer.rst b/tools/profiler/docs/buffer.rst new file mode 100644 index 0000000000..dd7ef30dfd --- /dev/null +++ b/tools/profiler/docs/buffer.rst @@ -0,0 +1,70 @@ +Buffers and Memory Management +============================= + +In a post-Fission world, precise memory management across many threads and processes is +especially important. In order for the profiler to achieve this, it uses a chunked buffer +strategy. + +The `ProfileBuffer`_ is the overall buffer class that controls the memory and storage +for the profile, it allows allocating objects into it. This can be used freely +by things like markers and samples to store data as entries, without needing to know +about the general strategy for how the memory is managed. + +The `ProfileBuffer`_ is then backed by the `ProfileChunkedBuffer`_. This specialized +buffer grows incrementally, by allocating additional `ProfileBufferChunk`_ objects. +More and more chunks will be allocated until a memory limit is reached, where they will +be released. After releasing, the chunk will either be recycled or freed. + +The limiting of memory usage is coordinated by the `ProfilerParent`_ in the parent +process. The `ProfilerParent`_ and `ProfilerChild`_ exchange IPC messages with information +about how much memory is being used. When the maximum byte threshold is passed, +the ProfileChunkManager in the parent process removes the oldest chunk, and then the +`ProfilerParent`_ sends a `DestroyReleasedChunksAtOrBefore`_ message to all of child +processes so that the oldest chunks in the profile are released. This helps long profiles +to keep having data in a similar time frame. + +Profile Buffer Terminology +########################## + +ProfilerParent + The main profiler machinery is installed in the parent process. It uses IPC to + communicate to the child processes. The PProfiler is the actor which is used + to communicate across processes to coordinate things. See `ProfilerParent.h`_. The + ProfilerParent uses the DestroyReleasedChunksAtOrBefore meessage to control the + overall chunk limit. + +ProfilerChild + ProfilerChild is installed in every child process, it will receive requests from + DestroyReleasedChunksAtOrBefore. + +Entry + This is an individual entry in the `ProfileBuffer.h`_,. These entry sizes are not + related to the chunks sizes. An individual entry can straddle two different chunks. + An entry can contain various pieces of data, like markers, samples, and stacks. + +Chunk + An arbitrary sized chunk of memory, managed by the `ProfileChunkedBuffer`_, and + IPC calls from the ProfilerParent. + +Unreleased Chunk + This chunk is currently being used to write entries into. + +Released chunk + This chunk is full of data. When memory limits happen, it can either be recycled + or freed. + +Recycled chunk + This is a chunk that was previously written into, and full. When memory limits occur, + rather than freeing the memory, it is re-used as the next chunk. + +.. _ProfileChunkedBuffer: https://searchfox.org/mozilla-central/search?q=ProfileChunkedBuffer&path=&case=true®exp=false +.. _ProfileChunkManager: https://searchfox.org/mozilla-central/search?q=ProfileBufferChunkManager.h&path=&case=true®exp=false +.. _ProfileBufferChunk: https://searchfox.org/mozilla-central/search?q=ProfileBufferChunk&path=&case=true®exp=false +.. _ProfileBufferChunkManagerWithLocalLimit: https://searchfox.org/mozilla-central/search?q=ProfileBufferChunkManagerWithLocalLimit&case=true&path= +.. _ProfilerParent.h: https://searchfox.org/mozilla-central/source/tools/profiler/public/ProfilerParent.h +.. _ProfilerChild.h: https://searchfox.org/mozilla-central/source/tools/profiler/public/ProfilerChild.h +.. _ProfileBuffer.h: https://searchfox.org/mozilla-central/source/tools/profiler/core/ProfileBuffer.h +.. _ProfileBuffer: https://searchfox.org/mozilla-central/search?q=ProfileBuffer&path=&case=true®exp=false +.. _ProfilerParent: https://searchfox.org/mozilla-central/search?q=ProfilerParent&path=&case=true®exp=false +.. _ProfilerChild: https://searchfox.org/mozilla-central/search?q=ProfilerChild&path=&case=true®exp=false +.. _DestroyReleasedChunksAtOrBefore: https://searchfox.org/mozilla-central/search?q=DestroyReleasedChunksAtOrBefore&path=&case=true®exp=false diff --git a/tools/profiler/docs/code-overview.rst b/tools/profiler/docs/code-overview.rst new file mode 100644 index 0000000000..3ca662e141 --- /dev/null +++ b/tools/profiler/docs/code-overview.rst @@ -0,0 +1,1494 @@ +Profiler Code Overview +###################### + +This is an overview of the code that implements the Profiler inside Firefox +with dome details around tricky subjects, or pointers to more detailed +documentation and/or source code. + +It assumes familiarity with Firefox development, including Mercurial (hg), mach, +moz.build files, Try, Phabricator, etc. + +It also assumes knowledge of the user-visible part of the Firefox Profiler, that +is: How to use the Firefox Profiler, and what profiles contain that is shown +when capturing a profile. See the main website https://profiler.firefox.com, and +its `documentation `_. + +For just an "overview", it may look like a huge amount of information, but the +Profiler code is indeed quite expansive, so it takes a lot of words to explain +even just a high-level view of it! For on-the-spot needs, it should be possible +to search for some terms here and follow the clues. But for long-term +maintainers, it would be worth skimming this whole document to get a grasp of +the domain, and return to get some more detailed information before diving into +the code. + +WIP note: This document should be correct at the time it is written, but the +profiler code constantly evolves to respond to bugs or to provide new exciting +features, so this document could become obsolete in parts! It should still be +useful as an overview, but its correctness should be verified by looking at the +actual code. If you notice any significant discrepancy or broken links, please +help by +`filing a bug `_. + +***** +Terms +***** + +This is the common usage for some frequently-used terms, as understood by the +Dev Tools team. But incorrect usage can sometimes happen, context is key! + +* **profiler** (a): Generic name for software that enables the profiling of + code. (`"Profiling" on Wikipedia `_) +* **Profiler** (the): All parts of the profiler code inside Firefox. +* **Base Profiler** (the): Parts of the Profiler that live in + mozglue/baseprofiler, and can be used from anywhere, but has limited + functionality. +* **Gecko Profiler** (the): Parts of the Profiler that live in tools/profiler, + and can only be used from other code in the XUL library. +* **Profilers** (the): Both the Base Profiler and the Gecko Profiler. +* **profiling session**: This is the time during which the profiler is running + and collecting data. +* **profile** (a): The output from a profiling session, either as a file, or a + shared viewable profile on https://profiler.firefox.com +* **Profiler back-end** (the): Other name for the Profiler code inside Firefox, + to distinguish it from... +* **Profiler front-end** (the): The website https://profiler.firefox.com that + displays profiles captured by the back-end. +* **Firefox Profiler** (the): The whole suite comprised of the back-end and front-end. + +****************** +Guiding Principles +****************** + +When working on the profiler, here are some guiding principles to keep in mind: + +* Low profiling overhead in cpu and memory. For the Profiler to provide the best + value, it should stay out of the way and consume as few resources (in time and + memory) as possible, so as not to skew the actual Firefox code too much. + +* Common data structures and code should be in the Base Profiler when possible. + + WIP note: Deduplication is slowly happening, see + `meta bug 1557566 `_. + This document focuses on the Profiler back-end, and mainly the Gecko Profiler + (because this is where most of the code lives, the Base Profiler is mostly a + subset, originally just a cut-down version of the Gecko Profiler); so unless + specified, descriptions below are about the Gecko Profiler, but know that + there may be some equivalent code in the Base Profiler as well. + +* Use appropriate programming-language features where possible to reduce coding + errors in both our code, and our users' usage of it. In C++, this can be done + by using a specific class/struct types for a given usage, to avoid misuse + (e.g., an generic integer representing a **process** could be incorrectly + given to a function expecting a **thread**; we have specific types for these + instead, more below.) + +* Follow the + `Coding Style `_. + +* Whenever possible, write tests (if not present already) for code you add or + modify -- but this may be too difficult in some case, use good judgement and + at least test manually instead. + +****************** +Profiler Lifecycle +****************** + +Here is a high-level view of the Base **or** Gecko Profiler lifecycle, as part +of a Firefox run. The following sections will go into much more details. + +* Profiler initialization, preparing some common data. +* Threads de/register themselves as they start and stop. +* During each User/test-controlled profiling session: + + * Profiler start, preparing data structures that will store the profiling data. + * Periodic sampling from a separate thread, happening at a user-selected + frequency (usually once every 1-2 ms), and recording snapshots of what + Firefox is doing: + + * CPU sampling, measuring how much time each thread has spent actually + running on the CPU. + * Stack sampling, capturing a stack of functions calls from whichever leaf + function the program is in at this point in time, up to the top-most + caller (i.e., at least the ``main()`` function, or its callers if any). + Note that unlike most external profilers, the Firefox Profiler back-end + is capable or getting more useful information than just native functions + calls (compiled from C++ or Rust): + + * Labels added by Firefox developers along the stack, usually to identify + regions of code that perform "interesting" operations (like layout, file + I/Os, etc.). + * JavaScript function calls, including the level of optimization applied. + * Java function calls. + * At any time, Markers may record more specific details of what is happening, + e.g.: User operations, page rendering steps, garbage collection, etc. + * Optional profiler pause, which stops most recording, usually near the end of + a session so that no data gets recorded past this point. + * Profile JSON output, generated from all the recorded profiling data. + * Profiler stop, tearing down profiling session objects. +* Profiler shutdown. + +Note that the Base Profiler can start earlier, and then the data collected so +far, as well as the responsibility for periodic sampling, is handed over to the +Gecko Profiler: + +#. (Firefox starts) +#. Base Profiler init +#. Base Profiler start +#. (Firefox loads the libxul library and initializes XPCOM) +#. Gecko Profiler init +#. Gecko Profiler start +#. Handover from Base to Gecko +#. Base Profiler stop +#. (Bulk of the profiling session) +#. JSON generation +#. Gecko Profiler stop +#. Gecko Profiler shutdown +#. (Firefox ends XPCOM) +#. Base Profiler shutdown +#. (Firefox exits) + +Base Profiler functions that add data (mostly markers and labels) may be called +from anywhere, and will be recorded by either Profiler. The corresponding +functions in Gecko Profiler can only be called from other libxul code, and can +only be recorded by the Gecko Profiler. + +Whenever possible, Gecko Profiler functions should be preferred if accessible, +as they may provide extended functionality (e.g., better stacks with JS in +markers). Otherwise fallback on Base Profiler functions. + +*********** +Directories +*********** + +* Non-Profiler supporting code + + * `mfbt `_ - Mostly + replacements for C++ std library facilities. + + * `mozglue/misc `_ + + * `PlatformMutex.h `_ - + Mutex base classes. + * `StackWalk.h `_ - + Stack-walking functions. + * `TimeStamp.h `_ - + Timestamps and time durations. + + * `xpcom `_ + + * `ds `_ - + Data structures like arrays, strings. + + * `threads `_ - + Threading functions. + +* Profiler back-end + + * `mozglue/baseprofiler `_ - + Base Profiler code, usable from anywhere in Firefox. Because it lives in + mozglue, it's loaded right at the beginning, so it's possible to start the + profiler very early, even before Firefox loads its big&heavy "xul" library. + + * `baseprofiler's public `_ - + Public headers, may be #included from anywhere. + * `baseprofiler's core `_ - + Main implementation code. + * `baseprofiler's lul `_ - + Special stack-walking code for Linux. + * `../tests/TestBaseProfiler.cpp `_ - + Unit tests. + + * `tools/profiler `_ - + Gecko Profiler code, only usable from the xul library. That library is + loaded a short time after Firefox starts, so the Gecko Profiler is not able + to profile the early phase of the application, Base Profiler handles that, + and can pass its collected data to the Gecko Profiler when the latter + starts. + + * `public `_ - + Public headers, may be #included from most libxul code. + * `core `_ - + Main implementation code. + * `gecko `_ - + Control from JS, and multi-process/IPC code. + * `lul `_ - + Special stack-walking code for Linux. + * `rust-api `_, + `rust-helper `_ + * `tests `_ + + * `devtools/client/performance-new `_, + `devtools/shared/performance-new `_ - + Middleware code for about:profiling and devtools panel functionality. + + * js, starting with + `js/src/vm/GeckoProfiler.h `_ - + JavaScript engine support, mostly to capture JS stacks. + + * `toolkit/components/extensions/schemas/geckoProfiler.json `_ - + File that needs to be updated when Profiler features change. + +* Profiler front-end + + * Out of scope for this document, but its code and bug repository can be found at: + https://github.com/firefox-devtools/profiler . Sometimes work needs to be + done on both the back-end of the front-end, especially when modifying the + back-end's JSON output format. + +******* +Headers +******* + +The most central public header is +`GeckoProfiler.h `_, +from which almost everything else can be found, it can be a good starting point +for exploration. +It includes other headers, which together contain important top-level macros and +functions. + +WIP note: GeckoProfiler.h used to be the header that contained everything! +To better separate areas of functionality, and to hopefully reduce compilation +times, parts of it have been split into smaller headers, and this work will +continue, see `bug 1681416 `_. + +MOZ_GECKO_PROFILER and Macros +============================= + +Mozilla officially supports the Profiler on `tier-1 platforms +`_: +Windows, macos, Linux and Android. +There is also some code running on tier 2-3 platforms (e.g., for FreeBSD), but +the team at Mozilla is not obligated to maintain it; we do try to keep it +running, and some external contributors are keeping an eye on it and provide +patches when things do break. + +To reduce the burden on unsupported platforms, a lot of the Profilers code is +only compiled when ``MOZ_GECKO_PROFILER`` is #defined. This means that some +public functions may not always be declared or implemented, and should be +surrounded by guards like ``#ifdef MOZ_GECKO_PROFILER``. + +Some commonly-used functions offer an empty definition in the +non-``MOZ_GECKO_PROFILER`` case, so these functions may be called from anywhere +without guard. + +Other functions have associated macros that can always be used, and resolve to +nothing on unsupported platforms. E.g., +``PROFILER_REGISTER_THREAD`` calls ``profiler_register_thread`` where supported, +otherwise does nothing. + +WIP note: There is an effort to eventually get rid of ``MOZ_GECKO_PROFILER`` and +its associated macros, see +`bug 1635350 `_. + +RAII "Auto" macros and classes +============================== +A number of functions are intended to be called in pairs, usually to start and +then end some operation. To ease their use, and ensure that both functions are +always called together, they usually have an associated class and/or macro that +may be called only once. This pattern of using an object's destructor to ensure +that some action always eventually happens, is called +`RAII `_ in C++, with the +common prefix "auto". + +E.g.: In ``MOZ_GECKO_PROFILER`` builds, +`AUTO_PROFILER_INIT `_ +instantiates an +`AutoProfilerInit `_ +object, which calls ``profiler_init`` when constructed, and +``profiler_shutdown`` when destroyed. + +********************* +Platform Abstractions +********************* + +This section describes some platform abstractions that are used throughout the +Profilers. (Other platform abstractions will be described where they are used.) + +Process and Thread IDs +====================== + +The Profiler back-end often uses process and thread IDs (aka "pid" and "tid"), +which are commonly just a number. +For better code correctness, and to hide specific platform details, they are +encapsulated in opaque types +`BaseProfilerProcessId `_ +and +`BaseProfilerThreadId `_. +These types should be used wherever possible. +When interfacing with other code, they may be converted using the member +functions ``FromNumber`` and ``ToNumber``. + +To find the current process or thread ID, use +`profiler_current_process_id `_ +or +`profiler_current_thread_id `_. + +The main thread ID is available through +`profiler_main_thread_id `_ +(assuming +`profiler_init_main_thread_id `_ +was called when the application started -- especially important in stand-alone +test programs.) +And +`profiler_is_main_thread `_ +is a quick way to find out if the current thread is the main thread. + +Locking +======= +The locking primitives in PlatformMutex.h are not supposed to be used as-is, but +through a user-accessible implementation. For the Profilers, this is in +`BaseProfilerDetail.h `_. + +In addition to the usual ``Lock``, ``TryLock``, and ``Unlock`` functions, +`BaseProfilerMutex `_ +objects have a name (which may be helpful when debugging), +they record the thread on which they are locked (making it possible to know if +the mutex is locked on the current thread), and in ``DEBUG`` builds there are +assertions verifying that the mutex is not incorrectly used recursively, to +verify the correct ordering of different Profiler mutexes, and that it is +unlocked before destruction. + +Mutexes should preferably be locked within C++ block scopes, or as class +members, by using +`BaseProfilerAutoLock `_. + +Some classes give the option to use a mutex or not (so that single-threaded code +can more efficiently bypass locking operations), for these we have +`BaseProfilerMaybeMutex `_ +and +`BaseProfilerMaybeAutoLock `_. + +There is also a special type of shared lock (aka RWLock, see +`RWLock on wikipedia `_), +which may be locked in multiple threads (through ``LockShared`` or preferably +`BaseProfilerAutoLockShared `_), +or locked exclusively, preventing any other locking (through ``LockExclusive`` or preferably +`BaseProfilerAutoLockExclusive `_). + +********************* +Main Profiler Classes +********************* + +Diagram showing the most important Profiler classes, see details in the +following sections: + +(As noted, the "RegisteredThread" classes are now obsolete in the Gecko +Profiler, see the "Thread Registration" section below for an updated diagram and +description.) + +.. image:: profilerclasses-20220913.png + +*********************** +Profiler Initialization +*********************** + +`profiler_init `_ +and +`baseprofiler::profiler_init `_ +must be called from the main thread, and are used to prepare important aspects +of the profiler, including: + +* Making sure the main thread ID is recorded. +* Handling ``MOZ_PROFILER_HELP=1 ./mach run`` to display the command-line help. +* Creating the ``CorePS`` instance -- more details below. +* Registering the main thread. +* Initializing some platform-specific code. +* Handling other environment variables that are used to immediately start the + profiler, with optional settings provided in other env-vars. + +CorePS +====== + +The `CorePS class `_ +has a single instance that should live for the duration of the Firefox +application, and contains important information that could be needed even when +the Profiler is not running. + +It includes: + +* A static pointer to its single instance. +* The process start time. +* JavaScript-specific data structures. +* A list of registered + `PageInformations `_, + used to keep track of the tabs that this process handles. +* A list of + `BaseProfilerCounts `_, + used to record things like the process memory usage. +* The process name, and optionally the "eTLD+1" (roughly sub-domain) that this + process handles. +* In the Base Profiler only, a list of + `RegisteredThreads `_. + WIP note: This storage has been reworked in the Gecko Profiler (more below), + and in practice the Base Profiler only registers the main thread. This should + eventually disappear as part of the de-duplication work + (`bug 1557566 `_). + +******************* +Thread Registration +******************* + +Threads need to register themselves in order to get fully profiled. +This section describes the main data structures that record the list of +registered threads and their data. + +WIP note: There is some work happening to add limited profiling of unregistered +threads, with the hope that more and more functionality could be added to +eventually use the same registration data structures. + +Diagram showing the relevant classes, see details in the following sub-sections: + +.. image:: profilerthreadregistration-20220913.png + +ProfilerThreadRegistry +====================== + +The +`static ProfilerThreadRegistry object `_ +contains a list of ``OffThreadRef`` objects. + +Each ``OffThreadRef`` points to a ``ProfilerThreadRegistration``, and restricts +access to a safe subset of the thread data, and forces a mutex lock if necessary +(more information under ProfilerThreadRegistrationData below). + +ProfilerThreadRegistration +========================== + +A +`ProfilerThreadRegistration object `_ +contains a lot of information relevant to its thread, to help with profiling it. + +This data is accessible from the thread itself through an ``OnThreadRef`` +object, which points to the ``ThreadRegistration``, and restricts access to a +safe subset of thread data, and forces a mutex lock if necessary (more +information under ProfilerThreadRegistrationData below). + +ThreadRegistrationData and accessors +==================================== + +`The ProfilerThreadRegistrationData.h header `_ +contains a hierarchy of classes that encapsulate all the thread-related data. + +``ThreadRegistrationData`` contains all the actual data members, including: + +* Some long-lived + `ThreadRegistrationInfo `_, + containing the thread name, its registration time, the thread ID, and whether + it's the main thread. +* A ``ProfilingStack`` that gathers developer-provided pseudo-frames, and JS + frames. +* Some platform-specific ``PlatformData`` (usually required to actually record + profiling measurements for that thread). +* A pointer to the top of the stack. +* A shared pointer to the thread's ``nsIThread``. +* A pointer to the ``JSContext``. +* An optional pre-allocated ``JsFrame`` buffer used during stack-sampling. +* Some JS flags. +* Sleep-related data (to avoid costly sampling while the thread is known to not + be doing anything). +* The current ``ThreadProfilingFeatures``, to know what kind of data to record. +* When profiling, a pointer to a ``ProfiledThreadData``, which contains some + more data needed during and just after profiling. + +As described in their respective code comments, each data member is supposed to +be accessed in certain ways, e.g., the ``JSContext`` should only be "written +from thread, read from thread and suspended thread". To enforce these rules, +data members can only be accessed through certain classes, which themselves can +only be instantiated in the correct conditions. + +The accessor classes are, from base to most-derived: + +* ``ThreadRegistrationData``, not an accessor itself, but it's the base class + with all the ``protected`` data. +* ``ThreadRegistrationUnlockedConstReader``, giving unlocked ``const`` access to + the ``ThreadRegistrationInfo``, ``PlatformData``, and stack top. +* ``ThreadRegistrationUnlockedConstReaderAndAtomicRW``, giving unlocked + access to the atomic data members: ``ProfilingStack``, sleep-related data, + ``ThreadProfilingFeatures``. +* ``ThreadRegistrationUnlockedRWForLockedProfiler``, giving access that's + protected by the Profiler's main lock, but doesn't require a + ``ThreadRegistration`` lock, to the ``ProfiledThreadData`` +* ``ThreadRegistrationUnlockedReaderAndAtomicRWOnThread``, giving unlocked + mutable access, but only on the thread itself, to the ``JSContext``. +* ``ThreadRegistrationLockedRWFromAnyThread``, giving locked access from any + thread to mutex-protected data: ``ThreadProfilingFeatures``, ``JsFrame``, + ``nsIThread``, and the JS flags. +* ``ThreadRegistrationLockedRWOnThread``, giving locked access, but only from + the thread itself, to the ``JSContext`` and a JS flag-related operation. +* ``ThreadRegistration::EmbeddedData``, containing all of the above, and stored + as a data member in each ``ThreadRegistration``. + +To recapitulate, if some code needs some data on the thread, it can use +``ThreadRegistration`` functions to request access (with the required rights, +like a mutex lock). +To access data about another thread, use similar functions from +``ThreadRegistry`` instead. +You may find some examples in the implementations of the functions in +ProfilerThreadState.h (see the following section). + +ProfilerThreadState.h functions +=============================== + +The +`ProfilerThreadState.h `_ +header provides a few helpful functions related to threads, including: + +* ``profiler_is_active_and_thread_is_registered`` +* ``profiler_thread_is_being_profiled`` (for the current thread or another + thread, and for a given set of features) +* ``profiler_thread_is_sleeping`` + +************** +Profiler Start +************** + +There are multiple ways to start the profiler, through command line env-vars, +and programmatically in C++ and JS. + +The main public C++ function is +`profiler_start `_. +It takes all the features specifications, and returns a promise that gets +resolved when the Profiler has fully started in all processes (multi-process +profiling is described later in this document, for now the focus will be on each +process running its instance of the Profiler). It first calls ``profiler_init`` +if needed, and also ``profiler_stop`` if the profiler was already running. + +The main implementation, which can be called from multiple sources, is +`locked_profiler_start `_. +It performs a number of operations to start the profiling session, including: + +* Record the session start time. +* Pre-allocate some work buffer to capture stacks for markers on the main thread. +* In the Gecko Profiler only: If the Base Profiler was running, take ownership + of the data collected so far, and stop the Base Profiler (we don't want both + trying to collect the same data at the same time!) +* Create the ActivePS, which keeps track of most of the profiling session + information, more about it below. +* For each registered thread found in the ``ThreadRegistry``, check if it's one + of the threads to profile, and if yes set the appropriate data into the + corresponding ``ThreadRegistrationData`` (including informing the JS engine to + start recording profiling data). +* On Android, start the Java sampler. +* If native allocations are to be profiled, setup the appropriate hooks. +* Start the audio callback tracing if requested. +* Set the public shared "active" state, used by many functions to quickly assess + whether to actually record profiling data. + +ActivePS +======== + +The `ActivePS class `_ +has a single instance at a time, that should live for the length of the +profiling session. + +It includes: + +* The session start time. +* A way to track "generations" (in case an old ActivePS still lives when the + next one starts, so that in-flight data goes to the correct place.) +* Requested features: Buffer capacity, periodic sampling interval, feature set, + list of threads to profile, optional: specific tab to profile. +* The profile data storage buffer and its chunk manager (see "Storage" section + below for details.) +* More data about live and dead profiled threads. +* Optional counters for per-process CPU usage, and power usage. +* A pointer to the ``SamplerThread`` object (see "Periodic Sampling" section + below for details.) + +******* +Storage +******* + +During a session, the profiling data is serialized into a buffer, which is made +of "chunks", each of which contains "blocks", which have a size and the "entry" +data. + +During a profiling session, there is one main profile buffer, which may be +started by the Base Profiler, and then handed over to the Gecko Profiler when +the latter starts. + +The buffer is divided in chunks of equal size, which are allocated before they +are needed. When the data reaches a user-set limit, the oldest chunk is +recycled. This means that for long-enough profiling sessions, only the most +recent data (that could fit under the limit) is kept. + +Each chunk stores a sequence of blocks of variable length. The chunk itself +only knows where the first full block starts, and where the last block ends, +which is where the next block will be reserved. + +To add an entry to the buffer, a block is reserved, the size is written first +(so that readers can find the start of the next block), and then the entry bytes +are written. + +The following sessions give more technical details. + +leb128iterator.h +================ + +`This utility header `_ +contains some functions to read and write unsigned "LEB128" numbers +(`LEB128 on wikipedia `_). + +They are an efficient way to serialize numbers that are usually small, e.g., +numbers up to 127 only take one byte, two bytes up to 16,383, etc. + +ProfileBufferBlockIndex +======================= + +`A ProfileBufferBlockIndex object `_ +encapsulates a block index that is known to be the valid start of a block. It is +created when a block is reserved, or when trusted code computes the start of a +block in a chunk. + +The more generic +`ProfileBufferIndex `_ +type is used when working inside blocks. + +ProfileBufferChunk +================== + +`A ProfileBufferChunk `_ +is a variable-sized object. It contains: + +* A public copyable header, itself containing: + + * The local offset to the first full block (a chunk may start with the end of + a block that was started at the end of the previous chunk). That offset in + the very first chunk is the natural start to read all the data in the + buffer. + * The local offset past the last reserved block. This is where the next block + should be reserved, unless it points past the end of this chunk size. + * The timestamp when the chunk was first used. + * The timestamp when the chunk became full. + * The number of bytes that may be stored in this chunk. + * The number of reserved blocks. + * The global index where this chunk starts. + * The process ID writing into this chunk. + +* An owning unique pointer to the next chunk. It may be null for the last chunk + in a chain. + +* In ``DEBUG`` builds, a state variable, which is used to ensure that the chunk + goes through a known sequence of states (e.g., Created, then InUse, then + Done, etc.) See the sequence diagram + `where the member variable is defined `_. + +* The actual buffer data. + +Because a ProfileBufferChunk is variable-size, it must be created through its +static ``Create`` function, which takes care of allocating the correct amount +of bytes, at the correct alignment. + +Chunk Managers +============== + +ProfilerBufferChunkManager +-------------------------- + +`The ProfileBufferChunkManager abstract class `_ +defines the interface of classes that manage chunks. + +Concrete implementations are responsible for: +* Creating chunks for their user, with a mechanism to pre-allocate chunks before they are actually needed. +* Taking back and owning chunks when they are "released" (usually when full). +* Automatically destroying or recycling the oldest released chunks. +* Giving temporary access to extant released chunks. + +ProfileBufferChunkManagerSingle +------------------------------- + +`A ProfileBufferChunkManagerSingle object `_ +manages a single chunk. + +That chunk is always the same, it is never destroyed. The user may use it and +optionally release it. The manager can then be reset, and that one chunk will +be available again for use. + +A request for a second chunk would always fail. + +This manager is short-lived and not thread-safe. It is useful when there is some +limited data that needs to be captured without blocking the global profiling +buffer, usually one stack sample. This data may then be extracted and quickly +added to the global buffer. + +ProfileBufferChunkManagerWithLocalLimit +--------------------------------------- + +`A ProfileBufferChunkManagerWithLocalLimit object `_ +implements the ``ProfileBufferChunkManager`` interface fully, managing a number +of chunks, and making sure their total combined size stays under a given limit. +This is the main chunk manager user during a profiling session. + +Note: It also implements the ``ProfileBufferControlledChunkManager`` interface, +this is explained in the later section "Multi-Process Profiling". + +It is thread-safe, and one instance is shared by both Profilers. + +ProfileChunkedBuffer +==================== + +`A ProfileChunkedBuffer object `_ +uses a ``ProfilerBufferChunkManager`` to store data, and handles the different +C++ types of data that the Profilers want to read/write as entries in buffer +chunks. + +Its main function is ``ReserveAndPut``: + +* It takes an invocable object (like a lambda) that should return the size of + the entry to store, this is to potentially avoid costly operations just to + compute a size, when the profiler may not be running. +* It attempts to reserve the space in its chunks, requesting a new chunk if + necessary. +* It then calls a provided invocable object with a + `ProfileBufferEntryWriter `_, + which offers a range of functions to help serialize C++ objects. The + de/serialization functions are found in specializations of + `ProfileBufferEntryWriter::Serializer `_ + and + `ProfileBufferEntryReader::Deserializer `_. + +More "put" functions use ``ReserveAndPut`` to more easily serialize blocks of +memory, or C++ objects. + +``ProfileChunkedBuffer`` is optionally thread-safe, using a +``BaseProfilerMaybeMutex``. + +WIP note: Using a mutex makes this storage too noisy for profiling some +real-time (like audio processing). +`Bug 1697953 `_ will look +at switching to using atomic variables instead. +An alternative would be to use a totally separate non-thread-safe buffers for +each real-time thread that requires it (see +`bug 1754889 `_). + +ProfileBuffer +============= + +`A ProfileBuffer object `_ +uses a ``ProfileChunkedBuffer`` to store data, and handles the different kinds +of entries that the Profilers want to read/write. + +Each entry starts with a tag identifying a kind. These kinds can be found in +`ProfileBufferEntryKinds.h `_. + +There are "legacy" kinds, which are small fixed-length entries, such as: +Categories, labels, frame information, counters, etc. These can be stored in +`ProfileBufferEntry objects `_ + +And there are "modern" kinds, which have variable sizes, such as: Markers, CPU +running times, full stacks, etc. These are more directly handled by code that +can access the underlying ``ProfileChunkedBuffer``. + +The other major responsibility of a ``ProfileChunkedBuffer`` is to read back all +this data, sometimes during profiling (e.g., to duplicate a stack), but mainly +at the end of a session when generating the output JSON profile. + +***************** +Periodic Sampling +***************** + +Probably the most important job of the Profiler is to sample stacks of a number +of running threads, to help developers know which functions get used a lot when +performing some operation on Firefox. + +This is accomplished from a special thread, which regularly springs into action +and captures all this data. + +SamplerThread +============= + +`The SamplerThread object `_ +manages the information needed during sampling. It is created when the profiler +starts, and is stored inside the ``ActivePS``, see above for details. + +It includes: + +* A ``Sampler`` object that contains platform-specific details, which are + implemented in separate files like platform-win32.cpp, etc. +* The same generation index as its owning ``ActivePS``. +* The requested interval between samples. +* A handle to the thread where the sampling happens, its main function is + `Run() function `_. +* A list of callbacks to invoke after the next sampling. These may be used by + tests to wait for sampling to actually happen. +* The unregistered-thread-spy data, and an optional handle on another thread + that takes care of "spying" on unregistered thread (on platforms where that + operation is too expensive to run directly on the sampling thread). + +The ``Run()`` function takes care of performing the periodic sampling work: +(more details in the following sections) + +* Retrieve the sampling parameters. +* Instantiate a ``ProfileBuffer`` on the stack, to capture samples from other threads. +* Loop until a ``break``: + + * Lock the main profiler mutex, and do: + + * Check if sampling should stop, and break from the loop. + * Clean-up exit profiles (these are profiles sent from dying sub-processes, + and are kept for as long as they overlap with this process' own buffer range). + * Record the CPU utilization of the whole process. + * Record the power consumption. + * Sample each registered counter, including the memory counter. + * For each registered thread to be profiled: + + * Record the CPU utilization. + * If the thread is marked as "still sleeping", record a "same as before" + sample, otherwise suspend the thread and take a full stack sample. + * On some threads, record the event delay to compute the + (un)responsiveness. WIP note: This implementation may change. + + * Record profiling overhead durations. + + * Unlock the main profiler mutex. + * Invoke registered post-sampling callbacks. + * Spy on unregistered threads. + * Based on the requested sampling interval, and how much time this loop took, + compute when the next sampling loop should start, and make the thread sleep + for the appropriate amount of time. The goal is to be as regular as + possible, but if some/all loops take too much time, don't try too hard to + catch up, because the system is probably under stress already. + * Go back to the top of the loop. + +* If we're here, we hit a loop ``break`` above. +* Invoke registered post-sampling callbacks, to let them know that sampling + stopped. + +CPU Utilization +=============== + +CPU Utilization is stored as a number of milliseconds that a thread or process +has spent running on the CPU since the previous sampling. + +Implementations are platform-dependent, and can be found in +`the GetThreadRunningTimesDiff function `_ +and +`the GetProcessRunningTimesDiff function `_. + +Power Consumption +================= + +Energy probes added in 2022. + +Stacks +====== + +Stacks are the sequence of calls going from the entry point in the program +(generally ``main()`` and some OS-specific functions above), down to the +function where code is currently being executed. + +Native Frames +------------- + +Compiled code, from C++ and Rust source. + +Label Frames +------------ + +Pseudo-frames with arbitrary text, added from any language, mostly C++. + +JS, Wasm Frames +--------------- + +Frames corresponding to JavaScript functions. + +Java Frames +----------- + +Recorded by the JavaSampler. + +Stack Merging +------------- + +The above types of frames are all captured in different ways, and when finally +taking an actual stack sample (apart from Java), they get merged into one stack. + +All frames have an associated address in the call stack, and can therefore be +merged mostly by ordering them by this stack address. See +`MergeStacks `_ +for the implementation details. + +Counters +======== + +Counters are a special kind of probe, which can be continuously updated during +profiling, and the ``SamplerThread`` will sample their value at every loop. + +Memory Counter +-------------- + +This is the main counter. During a profiling session, hooks into the memory +manager keep track of each de/allocation, so at each sampling we know how many +operations were performed, and what is the current memory usage compared to the +previous sampling. + +Profiling Overhead +================== + +The ``SamplerThread`` records timestamps between parts of its sampling loop, and +records this as the sampling overhead. This may be useful to determine if the +profiler itself may have used too much of the computer resources, which could +skew the profile and give wrong impressions. + +Unregistered Thread Profiling +============================= + +At some intervals (not necessarily every sampling loop, depending on the OS), +the profiler may attempt to find unregistered threads, and record some +information about them. + +WIP note: This feature is experimental, and data is captured in markers on the +main thread. More work is needed to put this data in tracks like regular +registered threads, and capture more data like stack samples and markers. + +******* +Markers +******* + +Markers are events with a precise timestamp or time range, they have a name, a +category, options (out of a few choices), and optional marker-type-specific +payload data. + +Before describing the implementation, it is useful to be familiar with how +markers are natively added from C++, because this drives how the implementation +takes all this information and eventually outputs it in the final JSON profile. + +Adding Markers from C++ +======================= + +See https://firefox-source-docs.mozilla.org/tools/profiler/markers-guide.html + +Implementation +============== + +The main function that records markers is +`profiler_add_marker `_. +It's a variadic templated function that takes the different the expected +arguments, first checks if the marker should actually be recorded (the profiler +should be running, and the target thread should be profiled), and then calls +into the deeper implementation function ``AddMarkerToBuffer`` with a reference +to the main profiler buffer. + +`AddMarkerToBuffer `_ +takes the marker type as an object, removes it from the function parameter list, +and calls the next function with the marker type as an explicit template +parameter, and also a pointer to the function that can capture the stack +(because it is different between Base and Gecko Profilers, in particular the +latter one knows about JS). + +From here, we enter the land of +`BaseProfilerMarkersDetail.h `_, +which employs some heavy template techniques, in order to most efficiently +serialize the given marker payload arguments, in order to make them +deserializable when outputting the final JSON. In previous implementations, for +each new marker type, a new C++ class derived from a payload abstract class was +required, that had to implement all the constructors and virtual functions to: + +* Create the payload object. +* Serialize the payload into the profile buffer. +* Deserialize from the profile buffer to a new payload object. +* Convert the payload into the final output JSON. + +Now, the templated functions automatically take care of serializing all given +function call arguments directly (instead of storing them somewhere first), and +preparing a deserialization function that will recreate them on the stack and +directly call the user-provided JSONification function with these arguments. + +Continuing from the public ``AddMarkerToBuffer``, +`mozilla::base_profiler_markers_detail::AddMarkerToBuffer `_ +sets some defaults if not specified by the caller: Target the current thread, +use the current time. + +Then if a stack capture was requested, attempt to do it in +the most efficient way, using a pre-allocated buffer if possible. + +WIP note: This potential allocation should be avoided in time-critical thread. +There is already a buffer for the main thread (because it's the busiest thread), +but there could be more pre-allocated threads, for specific real-time thread +that need it, or picked from a pool of pre-allocated buffers. See +`bug 1578792 `_. + +From there, `AddMarkerWithOptionalStackToBuffer `_ +handles ``NoPayload`` markers (usually added with ``PROFILER_MARKER_UNTYPED``) +in a special way, mostly to avoid the extra work associated with handling +payloads. Otherwise it continues with the following function. + +`MarkerTypeSerialization::Serialize `_ +retrieves the deserialization tag associated with the marker type. If it's the +first time this marker type is used, +`Streaming::TagForMarkerTypeFunctions `_ +adds it to the global list (which stores some function pointers used during +deserialization). + +Then the main serialization happens in +`StreamFunctionTypeHelper::Serialize `_. +Deconstructing this mouthful of an template: + +* ``MarkerType::StreamJSONMarkerData`` is the user-provided function that will + eventually produce the final JSON, but here it's only used to know the + parameter types that it expects. +* ``StreamFunctionTypeHelper`` takes that function prototype, and can extract + its argument by specializing on ```R(SpliceableJSONWriter&, As...)``, now + ``As...`` is a parameter pack matching the function parameters. +* Note that ``Serialize`` also takes a parameter pack, which contains all the + referenced arguments given to the top ``AddBufferToMarker`` call. These two + packs are supposed to match, at least the given arguments should be + convertible to the target pack parameter types. +* That specialization's ``Serialize`` function calls the buffer's ``PutObjects`` + variadic function to write all the marker data, that is: + + * The entry kind that must be at the beginning of every buffer entry, in this + case `ProfileBufferEntryKind::Marker `_. + * The common marker data (options first, name, category, deserialization tag). + * Then all the marker-type-specific arguments. Note that the C++ types + are those extracted from the deserialization function, so we know that + whatever is serialized here can be later deserialized using those same + types. + +The deserialization side is described in the later section "JSON output of +Markers". + +Adding Markers from Rust +======================== + +See https://firefox-source-docs.mozilla.org/tools/profiler/instrumenting-rust.html#adding-markers + +Adding Markers from JS +====================== + +See https://firefox-source-docs.mozilla.org/tools/profiler/instrumenting-javascript.html + +Adding Markers from Java +======================== + +See https://searchfox.org/mozilla-central/source/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/ProfilerController.java + +************* +Profiling Log +************* + +During a profiling session, some profiler-related events may be recorded using +`ProfilingLog::Access `_. + +The resulting JSON object is added near the end of the process' JSON generation, +in a top-level property named "profilingLog". This object is free-form, and is +not intended to be displayed, or even read by most people. But it may include +interesting information for advanced users, or could be an early temporary +prototyping ground for new features. + +See "profileGatheringLog" for another log related to late events. + +WIP note: This was introduced shortly before this documentation, so at this time +it doesn't do much at all. + +*************** +Profile Capture +*************** + +Usually at the end of a profiling session, a profile is "captured", and either +saved to disk, or sent to the front-end https://profiler.firefox.com for +analysis. This section describes how the captured data is converted to the +Gecko Profiler JSON format. + +FailureLatch +============ + +`The FailureLatch interface `_ +is used during the JSON generation, in order to catch any unrecoverable error +(such as running Out Of Memory), to exit the process early, and to forward the +error to callers. + +There are two main implementations, suffixed "source" as they are the one source +of failure-handling, which is passed as ``FailureLatch&`` throughout the code: + +* `FailureLatchInfallibleSource `_ + is an "infallible" latch, meaning that it doesn't expect any failure. So if + a failure actually happened, the program would immediately terminate! (This + was the default behavior prior to introducing these latches.) +* `FailureLatchSource `_ + is a "fallible" latch, it will record the first failure that happens, and + "latch" into the failure state. The code should regularly examine this state, + and return early when possible. Eventually this failure state may be exposed + to end users. + +ProgressLogger, ProportionValue +=============================== + +`A ProgressLogger object `_ +is used to track the progress of a long operation, in this case the JSON +generation process. + +To match how the JSON generation code works (as a tree of C++ functions calls), +each ``ProgressLogger`` in a function usually records progress from 0 to 100% +locally inside that function. If that function calls a sub-function, it gives it +a sub-logger, which in the caller function is set to represent a local sub-range +(like 20% to 40%), but to the called function it will look like its own local +``ProgressLogger`` that goes from 0 to 100%. The very top ``ProgressLogger`` +converts the deepest local progress value to the corresponding global progress. + +Progress values are recorded in +`ProportionValue objects `_, +which effectively record fractional value with no loss of precision. + +This progress is most useful when the parent process is waiting for child +processes to do their work, to make sure progress does happen, otherwise to stop +waiting for frozen processes. More about that in the "Multi-Process Profiling" +section below. + +JSONWriter +========== + +`A JSONWriter object `_ +offers a simple way to create a JSON stream (start/end collections, add +elements, etc.), and calls back into a provided +`JSONWriteFunc interface `_ +to output characters. + +While these classes live outside of the Profiler directories, it may sometimes be +worth maintaining and/or modifying them to better serve the Profiler's needs. +But there are other users, so be careful not to break other things! + +SpliceableJSONWriter and SpliceableChunkedJSONWriter +==================================================== + +Because the Profiler deals with large amounts of data (big profiles can take +tens to hundreds of megabytes!), some specialized wrappers add better handling +of these large JSON streams. + +`SpliceableJSONWriter `_ +is a subclass of ``JSONWriter``, and allows the "splicing" of JSON strings, +i.e., being able to take a whole well-formed JSON string, and directly inserting +it as a JSON object in the target JSON being streamed. + +It also offers some functions that are often useful for the Profiler, such as: +* Converting a timestamp into a JSON object in the stream, taking care of keeping a nanosecond precision, without unwanted zeroes or nines at the end. +* Adding a number of null elements. +* Adding a unique string index, and add that string to a provided unique-string list if necessary. (More about UniqueStrings below.) + +`SpliceableChunkedJSONWriter `_ +is a subclass of ``SpliceableJSONWriter``. Its main attribute is that it provides its own writer +(`ChunkedJSONWriteFunc `_), +which stores the stream as a sequence of "chunks" (heap-allocated buffers). +It starts with a chunk of a default size, and writes incoming data into it, +later allocating more chunks as needed. This avoids having massive buffers being +resized all the time. + +It also offers the same splicing abilities as its parent class, but in case an +incoming JSON string comes from another ``SpliceableChunkedJSONWriter``, it's +able to just steal the chunks and add them to its list, thereby avoiding +expensive allocations and copies and destructions. + +UniqueStrings +============= + +Because a lot of strings would be repeated in profiles (e.g., frequent marker +names), such strings are stored in a separate JSON array of strings, and an +index into this list is used instead of that full string object. + +Note that these unique-string indices are currently only located in specific +spots in the JSON tree, they cannot be used just anywhere strings are accepted. + +`The UniqueJSONStrings class `_ +stores this list of unique strings in a ``SpliceableChunkedJSONWriter``. +Given a string, it takes care of storing it if encountered for the first time, +and inserts the index into a target ``SpliceableJSONWriter``. + +JSON Generation +=============== + +The "Gecko Profile Format" can be found at +https://github.com/firefox-devtools/profiler/blob/main/docs-developer/gecko-profile-format.md . + +The implementation in the back-end is +`locked_profiler_stream_json_for_this_process `_. +It outputs each JSON top-level JSON object, mostly in sequence. See the code for +how each object is output. Note that there is special handling for samples and +markers, as explained in the following section. + +ProcessStreamingContext and ThreadStreamingContext +-------------------------------------------------- + +In JSON profiles, samples and markers are separated by thread and by +samples/markers. Because there are potentially tens to a hundred threads, it +would be very costly to read the full profile buffer once for each of these +groups. So instead the buffer is read once, and all samples and markers are +handled as they are read, and their JSON output is sent to separate JSON +writers. + +`A ProcessStreamingContext object `_ +contains all the information to facilitate this output, including a list of +`ThreadStreamingContext's `_, +which each contain one ``SpliceableChunkedJSONWriter`` for the samples, and one +for the markers in this thread. + +When reading entries from the profile buffer, samples and markers are found by +their ``ProfileBufferEntryKind``, and as part of deserializing either kind (more +about each below), the thread ID is read, and determines which +``ThreadStreamingContext`` will receive the JSON output. + +At the end of this process, all ``SpliceableChunkedJSONWriters`` are efficiently +spliced (mainly a pointer move) into the final JSON output. + +JSON output of Samples +---------------------- + +This work is done in +`ProfileBuffer::DoStreamSamplesAndMarkersToJSON `_. + +From the main ``ProfileChunkedBuffer``, each entry is visited, its +``ProfileBufferEntryKind`` is read first, and for samples all frames from +captured stack are converted to the appropriate JSON. + +`A UniqueStacks object `_ +is used to de-duplicate frames and even sub-stacks: + +* Each unique frame string is written into a JSON array inside a + ``SpliceableChunkedJSONWriter``, and its index is the frame identifier. +* Each stack level is also de-duplicated, and identifies the associated frame + string, and points at the calling stack level (i.e., closer to the root). +* Finally, the identifier for the top of the stack is stored, along with a + timestamp (and potentially some more information) as the sample. + +For example, if we have collected the following samples: + +#. A -> B -> C +#. A -> B +#. A -> B -> D + +The frame table would contain each frame name, something like: +``["A", "B", "C", "D"]``. So the frame containing "A" has index 0, "B" is at 1, +etc. + +The stack table would contain each stack level, something like: +``[[0, null], [1, 0], [2, 1], [3, 1]]``. ``[0, null]`` means the frame is 0 +("A"), and it has no caller, it's the root frame. ``[1, 0]`` means the frame is +1 ("B"), and its caller is stack 0, which is just the previous one in this +example. + +And the three samples stored in the thread data would be therefore be: 2, 1, 3 +(E.g.: "2" points in the stack table at the frame [2,1] with "C", and from them +down to "B", then "A"). + +All this contains all the information needed to reconstruct all full stack +samples. + +JSON output of Markers +---------------------- + +This also happens +`inside ProfileBuffer::DoStreamSamplesAndMarkersToJSON `_. + +When a ``ProfileBufferEntryKind::Marker`` is encountered, +`the DeserializeAfterKindAndStream function `_ +reads the ``MarkerOptions`` (stored as explained above), which include the +thread ID, identifying which ``ThreadStreamingContext``'s +``SpliceableChunkedJSONWriter`` to use. + +After that, the common marker data (timing, category, etc.) is output. + +Then the ``Streaming::DeserializerTag`` identifies which type of marker this is. +The special case of ``0`` (no payload) means nothing more is output. + +Otherwise some more common data is output as part of the payload if present, in +particular the "inner window id" (used to match markers with specific html +frames), and stack. + +WIP note: Some of these may move around in the future, see +`bug 1774326 `_, +`bug 1774328 `_, and +others. + +In case of a C++-written payload, the ``DeserializerTag`` identifies the +``MarkerDataDeserializer`` function to use. This is part of the heavy templated +code in BaseProfilerMarkersDetail.h, the function is defined as +`MarkerTypeSerialization::Deserialize `_, +which outputs the marker type name, and then each marker payload argument. The +latter is done by using the user-defined ``MarkerType::StreamJSONMarkerData`` +parameter list, and recursively deserializing each parameter from the profile +buffer into an on-stack variable of a corresponding type, at the end of which +``MarkerType::StreamJSONMarkerData`` can be called with all of these arguments +at it expects, and that function does the actual JSON streaming as the user +programmed. + +************* +Profiler Stop +************* + +See "Profiler Start" and do the reverse! + +There is some special handling of the ``SampleThread`` object, just to ensure +that it gets deleted outside of the main profiler mutex being locked, otherwise +this could result in a deadlock (because it needs to take the lock before being +able to check the state variable indicating that the sampling loop and thread +should end). + +***************** +Profiler Shutdown +***************** + +See "Profiler Initialization" and do the reverse! + +One additional action is handling the optional ``MOZ_PROFILER_SHUTDOWN`` +environment variable, to output a profile if the profiler was running. + +*********************** +Multi-Process Profiling +*********************** + +All of the above explanations focused on what the profiler is doing is each +process: Starting, running and collecting samples, markers, and more data, +outputting JSON profiles, and stopping. + +But Firefox is a multi-process program, since +`Electrolysis aka e10s `_ introduce child +processes to handle web content and extensions, and especially since +`Fission `_ forced even parts of the +same webpage to run in separate processes, mainly for added security. Since then +Firefox can spawn many processes, sometimes 10 to 20 when visiting busy sites. + +The following sections explains how profiling Firefox as a whole works. + +IPC (Inter-Process Communication) +================================= + +See https://firefox-source-docs.mozilla.org/ipc/. + +As a quick summary, some message-passing function-like declarations live in +`PProfiler.ipdl `_, +and corresponding ``SendX`` and ``RecvX`` C++ functions are respectively +generated in +`PProfilerParent.h `_, +and virtually declared (for user implementation) in +`PProfilerChild.h `_. + +During Profiling +================ + +Exit profiles +------------- + +One IPC message that is not in PProfiler.ipdl, is +`ShutdownProfile `_ +in +`PContent.ipdl `_. + +It's called from +`ContentChild::ShutdownInternal `_, +just before a child process ends, and if the profiler was running, to ensure +that the profile data is collected and sent to the parent, for storage in its +``ActivePS``. + +See +`ActivePS::AddExitProfile `_ +for details. Note that the current "buffer position at gathering time" (which is +effectively the largest ``ProfileBufferBlockIndex`` that is present in the +global profile buffer) is recorded. Later, +`ClearExpiredExitProfiles `_ +looks at the **smallest** ``ProfileBufferBlockIndex`` still present in the +buffer (because early chunks may have been discarded to limit memory usage), and +discards exit profiles that were recorded before, because their data is now +older than anything stored in the parent. + +Profile Buffer Global Memory Control +------------------------------------ + +Each process runs its own profiler, with each its own profile chunked buffer. To +keep the overall memory usage of all these buffers under the user-picked limit, +processes work together, with the parent process overseeing things. + +Diagram showing the relevant classes, see details in the following sub-sections: + +.. image:: fissionprofiler-20200424.png + +ProfileBufferControlledChunkManager +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +`The ProfileBufferControlledChunkManager interface `_ +allows a controller to get notified about all chunk updates, and to force the +destruction/recycling of old chunks. +`The ProfileBufferChunkManagerWithLocalLimit class `_ +implements it. + +`An Update object `_ +contains all information related to chunk changes: How much memory is currently +used by the local chunk manager, how much has been "released" (and therefore +could be destroyed/recycled), and a list of all chunks that were released since +the previous update; it also has a special state meaning that the child is +shutting down so there won't be updates anymore. An ``Update`` may be "folded" +into a previous one, to create a combined update equivalent to the two separate +ones one after the other. + +Update Handling in the ProfilerChild +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +When the profiler starts in a child process, the ``ProfilerChild`` +`starts to listen for updates `_. + +These updates are stored and folded into previous ones (if any). At some point, +`an AwaitNextChunkManagerUpdate message `_ +will be received, and any update can be forwarded to the parent. The local +update is cleared, ready to store future updates. + +Update Handling in the ProfilerParent +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +When the profiler starts AND when there are child processes, the +`ProfilerParent's ProfilerParentTracker `_ +creates +`a ProfileBufferGlobalController `_, +which starts to listen for updates from the local chunk manager. + +The ``ProfilerParentTracker`` is also responsible for keeping track of child +processes, and to regularly +`send them AwaitNextChunkManagerUpdate messages `_, +that the child's ``ProfilerChild`` answers to with updates. The update may +indicate that the child is shutting down, in which case the tracker will stop +tracking it. + +All these updates (from the local chunk manager, and from child processes' own +chunk managers) are processed in +`ProfileBufferGlobalController::HandleChunkManagerNonFinalUpdate `_. +Based on this stream of updates, it is possible to calculate the total memory +used by all profile buffers in all processes, and to keep track of all chunks +that have been "released" (i.e., are full, and can be destroyed). When the total +memory usage reaches the user-selected limit, the controller can lookup the +oldest chunk, and get it destroyed (either a local call for parent chunks, or by +sending +`a DestroyReleasedChunksAtOrBefore message `_ +to the owning child). + +Historical note: Prior to Fission, the Profiler used to keep one fixed-size +circular buffer in each process, but as Fission made the possible number of +processes unlimited, the memory consumption grew too fast, and required the +implementation of the above system. But there may still be mentions of +"circular buffers" in the code or documents; these have effectively been +replaced by chunked buffers, with centralized chunk control. + +Gathering Child Profiles +======================== + +When it's time to capture a full profile, the parent process performs its own +JSON generation (as described above), and sends +`a GatherProfile message `_ +to all child processes, which will make them generate their JSON profile and +send it back to the parent. + +All child profiles, including the exit profiles collected during profiling, are +stored as elements of a top-level array with property name "processes". + +During the gathering phase, while the parent is waiting for child responses, it +regularly sends +`GetGatherProfileProgress messages `_ +to all child processes that have not sent their profile yet, and the parent +expects responses within a short timeframe. The response carries a progress +value. If at some point two messages went with no progress was made anywhere +(either there was no response, or the progress value didn't change), the parent +assumes that remaining child processes may be frozen indefinitely, stops the +gathering and considers the JSON generation complete. + +During all of the above work, events are logged (especially issues with child +processes), and are added at the end of the JSON profile, in a top-level object +with property name "profileGatheringLog". This object is free-form, and is not +intended to be displayed, or even read by most people. But it may include +interesting information for advanced users regarding the profile-gathering +phase. diff --git a/tools/profiler/docs/fissionprofiler-20200424.png b/tools/profiler/docs/fissionprofiler-20200424.png new file mode 100644 index 0000000000..1602877a5b Binary files /dev/null and b/tools/profiler/docs/fissionprofiler-20200424.png differ diff --git a/tools/profiler/docs/fissionprofiler.umlet.uxf b/tools/profiler/docs/fissionprofiler.umlet.uxf new file mode 100644 index 0000000000..3325294e25 --- /dev/null +++ b/tools/profiler/docs/fissionprofiler.umlet.uxf @@ -0,0 +1,546 @@ + + + 10 + + UMLClass + + 70 + 110 + 300 + 70 + + /PProfilerParent/ +bg=light_gray +-- +*+SendAwaitNextChunkManagerUpdate()* +*+SendDestroyReleasedChunksAtOrBefore()* + + + + UMLClass + + 470 + 20 + 210 + 70 + + *ProfileBufferChunkMetadata* +bg=light_gray +-- ++doneTimeStamp ++bufferBytes + + + + + UMLClass + + 780 + 110 + 330 + 70 + + /PProfilerChild/ +bg=light_gray +-- +*/+RecvAwaitNextChunkManagerUpdate() = 0/* +*/+RecvDestroyReleasedChunksAtOrBefore() = 0/* + + + + + UMLClass + + 110 + 260 + 220 + 70 + + ProfilerParent +-- +*-processId* +-- + + + + + Relation + + 210 + 170 + 30 + 110 + + lt=<<- + 10.0;10.0;10.0;90.0 + + + UMLClass + + 740 + 250 + 410 + 90 + + ProfilerChild +-- +-UpdateStorage: unreleased bytes, released: {pid, rangeStart[ ]} +-- +*+RecvAwaitNextChunkUpdate()* +*+RecvDestroyReleasedChunksAtOrBefore()* + + + + + Relation + + 930 + 170 + 30 + 100 + + lt=<<- + 10.0;10.0;10.0;80.0 + + + UMLClass + + 110 + 400 + 220 + 70 + + ProfilerParentTracker +-- +_+Enumerate()_ +_*+ForChild()*_ + + + + Relation + + 210 + 320 + 190 + 100 + + lt=<- +m1=0..n +nsTArray<ProfilerParent*> + 10.0;10.0;10.0;80.0 + + + UMLClass + + 80 + 1070 + 150 + 30 + + ProfileBufferChunk + + + + UMLClass + + 380 + 1070 + 210 + 30 + + /ProfileBufferChunkManager/ + + + + UMLClass + + 180 + 900 + 700 + 50 + + ProfileBufferChunkManagerWithLocalLimit +-- +-mUpdateCallback + + + + Relation + + 480 + 940 + 30 + 150 + + lt=<<- + 10.0;130.0;10.0;10.0 + + + UMLClass + + 380 + 1200 + 210 + 30 + + ProfileChunkedBuffer + + + + Relation + + 410 + 1090 + 140 + 130 + + lt=->>>> +mChunkManager + 10.0;10.0;10.0;110.0 + + + UMLClass + + 960 + 1200 + 100 + 30 + + CorePS + + + + UMLClass + + 960 + 1040 + 100 + 30 + + ActivePS + + + + Relation + + 580 + 1200 + 400 + 40 + + lt=->>>>> +mCoreBuffer + 10.0;20.0;380.0;20.0 + + + Relation + + 870 + 940 + 250 + 120 + + lt=->>>>> +mProfileBufferChunkManager + 10.0;10.0;90.0;100.0 + + + UMLClass + + 830 + 1140 + 100 + 30 + + ProfileBuffer + + + + Relation + + 920 + 1060 + 130 + 110 + + lt=->>>>> +mProfileBuffer + 10.0;90.0;40.0;10.0 + + + Relation + + 580 + 1160 + 270 + 70 + + lt=->>>> +mEntries + 10.0;50.0;250.0;10.0 + + + Relation + + 90 + 1090 + 310 + 150 + + lt=->>>>> +m1=0..1 +mCurrentChunk: UniquePtr<> + 10.0;10.0;10.0;130.0;290.0;130.0 + + + Relation + + 210 + 1080 + 200 + 150 + + lt=->>>>> +m1=0..N +mNextChunks: UniquePtr<> + 20.0;10.0;170.0;130.0 + + + Relation + + 200 + 940 + 230 + 150 + + lt=->>>>> +m1=0..N +mReleasedChunks: UniquePtr<> + 10.0;130.0;10.0;10.0 + + + Relation + + 530 + 1090 + 270 + 130 + + lt=->>>>> +mOwnedChunkManager: UniquePtr<> + 10.0;10.0;10.0;110.0 + + + UMLClass + + 480 + 390 + 550 + 150 + + *ProfileBufferGlobalController* +-- +-mMaximumBytes +-mCurrentUnreleasedBytesTotal +-mCurrentUnreleasedBytes: {pid, unreleased bytes}[ ] sorted by pid +-mCurrentReleasedBytes +-mReleasedChunks: {doneTimeStamp, bytes, pid}[ ] sorted by timestamp +-mDestructionCallback: function<void(pid, rangeStart)> +-- ++Update(pid, unreleased bytes, released: ProfileBufferChunkMetadata[ ]) + + + + Relation + + 320 + 420 + 180 + 40 + + lt=->>>>> +mController + 160.0;20.0;10.0;20.0 + + + Relation + + 20 + 400 + 110 + 80 + + lt=->>>>> +_sInstance_ + 90.0;60.0;10.0;60.0;10.0;10.0;90.0;10.0 + + + UMLNote + + 480 + 250 + 220 + 120 + + The controller is only needed +if there *are* child processes, +so we can create it with the first +child (at which point the tracker +can register itself with the local +profiler), and destroyed with the +last child. +bg=blue + + + + Relation + + 690 + 330 + 100 + 80 + + + 10.0;10.0;80.0;60.0 + + + Relation + + 130 + 460 + 200 + 380 + + lt=->>>> +mParentChunkManager + 180.0;360.0;10.0;360.0;10.0;10.0 + + + Relation + + 740 + 330 + 350 + 510 + + lt=->>>> +mLocalBufferChunkManager + 10.0;490.0;330.0;490.0;330.0;10.0 + + + UMLClass + + 470 + 650 + 400 + 100 + + *ProfileBufferControlledChunkManager::Update* +-- +-mUnreleasedBytes +-mReleasedBytes +-mOldestDoneTimeStamp +-mNewReleasedChunks: ChunkMetadata[ ] + + + + UMLClass + + 470 + 560 + 400 + 60 + + *ProfileBufferControlledChunkManager::ChunkMetadata* +-- +-mDoneTimeStamp +-mBufferBytes + + + + Relation + + 670 + 610 + 30 + 60 + + lt=<. + 10.0;10.0;10.0;40.0 + + + Relation + + 670 + 740 + 30 + 60 + + lt=<. + 10.0;10.0;10.0;40.0 + + + Relation + + 670 + 50 + 130 + 110 + + lt=<. + 10.0;10.0;110.0;90.0 + + + Relation + + 360 + 50 + 130 + 110 + + lt=<. + 110.0;10.0;10.0;90.0 + + + UMLClass + + 400 + 130 + 350 + 100 + + *ProfileBufferChunkManagerUpdate* +bg=light_gray +-- +-unreleasedBytes +-releasedBytes +-oldestDoneTimeStamp +-newlyReleasedChunks: ProfileBufferChunkMetadata[ ] + + + + UMLClass + + 310 + 780 + 440 + 70 + + *ProfileBufferControlledChunkManager* +-- +*/+SetUpdateCallback(function<void(update: Update&&)>)/* +*/+DestroyChunksAtOrBefore(timeStamp)/* + + + + Relation + + 480 + 840 + 30 + 80 + + lt=<<- + 10.0;10.0;10.0;60.0 + + diff --git a/tools/profiler/docs/index.rst b/tools/profiler/docs/index.rst new file mode 100644 index 0000000000..53920e7d2f --- /dev/null +++ b/tools/profiler/docs/index.rst @@ -0,0 +1,37 @@ +Gecko Profiler +============== + +The Firefox Profiler is the collection of tools used to profile Firefox. This is backed +by the Gecko Profiler, which is the primarily C++ component that instruments Gecko. It +is configurable, and supports a variety of data sources and recording modes. Primarily, +it is used as a statistical profiler, where the execution of threads that have been +registered with the profile is paused, and a sample is taken. Generally, this includes +a stackwalk with combined native stack frame, JavaScript stack frames, and custom stack +frame labels. + +In addition to the sampling, the profiler can collect markers, which are collected +deterministically (as opposed to statistically, like samples). These include some +kind of text description, and optionally a payload with more information. + +This documentation serves to document the Gecko Profiler and Base Profiler components, +while the profiler.firefox.com interface is documented at `profiler.firefox.com/docs/ `_ + +.. toctree:: + :maxdepth: 1 + + code-overview + buffer + instrumenting-javascript + instrumenting-rust + markers-guide + memory + +The following areas still need documentation: + + * LUL + * Instrumenting Java + * Registering Threads + * Samples and Stack Walking + * Triggering Gecko Profiles in Automation + * JS Tracer + * Serialization diff --git a/tools/profiler/docs/instrumenting-javascript.rst b/tools/profiler/docs/instrumenting-javascript.rst new file mode 100644 index 0000000000..928d94781e --- /dev/null +++ b/tools/profiler/docs/instrumenting-javascript.rst @@ -0,0 +1,60 @@ +Instrumenting JavaScript +======================== + +There are multiple ways to use the profiler with JavaScript. There is the "JavaScript" +profiler feature (via about:profiling), which enables stack walking for JavaScript code. +This is most likely turned on already for every profiler preset. + +In addition, markers can be created to specifically marker an instant in time, or a +duration. This can be helpful to make sense of a particular piece of the front-end, +or record events that normally wouldn't show up in samples. + +.. note:: + This guide explains JavaScript markers in depth. To learn more about how to add a + marker in C++ or Rust, please take a look at their documentation + in :doc:`markers-guide` or :doc:`instrumenting-rust` respectively. + +Markers in Browser Chrome +************************* + +If you have access to ChromeUtils, then adding a marker is relatively easily. + +.. code-block:: javascript + + // Add an instant marker, representing a single point in time + ChromeUtils.addProfilerMarker("MarkerName"); + + // Add a duration marker, representing a span of time. + const startTime = Cu.now(); + doWork(); + ChromeUtils.addProfilerMarker("MarkerName", startTime); + + // Add a duration marker, representing a span of time, with some additional tex + const startTime = Cu.now(); + doWork(); + ChromeUtils.addProfilerMarker("MarkerName", startTime, "Details about this event"); + + // Add an instant marker, with some additional tex + const startTime = Cu.now(); + doWork(); + ChromeUtils.addProfilerMarker("MarkerName", undefined, "Details about this event"); + +Markers in Content Code +*********************** + +If instrumenting content code, then the `UserTiming`_ API is the best bet. +:code:`performance.mark` will create an instant marker, and a :code:`performance.measure` +will create a duration marker. These markers will show up under UserTiming in +the profiler UI. + +.. code-block:: javascript + + // Create an instant marker. + performance.mark("InstantMarkerName"); + + doWork(); + + // Measuring with the performance API will also create duration markers. + performance.measure("DurationMarkerName", "InstantMarkerName"); + +.. _UserTiming: https://developer.mozilla.org/en-US/docs/Web/API/User_Timing_API diff --git a/tools/profiler/docs/instrumenting-rust.rst b/tools/profiler/docs/instrumenting-rust.rst new file mode 100644 index 0000000000..0c5021eec1 --- /dev/null +++ b/tools/profiler/docs/instrumenting-rust.rst @@ -0,0 +1,433 @@ +Instrumenting Rust +================== + +There are multiple ways to use the profiler with Rust. Native stack sampling already +includes the Rust frames without special handling. There is the "Native Stacks" +profiler feature (via about:profiling), which enables stack walking for native code. +This is most likely turned on already for every profiler presets. + +In addition to that, there is a profiler Rust API to instrument the Rust code +and add more information to the profile data. There are three main functionalities +to use: + +1. Register Rust threads with the profiler, so the profiler can record these threads. +2. Add stack frame labels to annotate and categorize a part of the stack. +3. Add markers to specifically mark instants in time, or durations. This can be + helpful to make sense of a particular piece of the code, or record events that + normally wouldn't show up in samples. + +Crate to Include as a Dependency +-------------------------------- + +Profiler Rust API is located inside the ``gecko-profiler`` crate. This needs to +be included in the project dependencies before the following functionalities can +be used. + +To be able to include it, a new dependency entry needs to be added to the project's +``Cargo.toml`` file like this: + +.. code-block:: toml + + [dependencies] + gecko-profiler = { path = "../../tools/profiler/rust-api" } + +Note that the relative path needs to be updated depending on the project's location +in mozilla-central. + +Registering Threads +------------------- + +To be able to see the threads in the profile data, they need to be registered +with the profiler. Also, they need to be unregistered when they are exiting. +It's important to give a unique name to the thread, so they can be filtered easily. + +Registering and unregistering a thread is straightforward: + +.. code-block:: rust + + // Register it with a given name. + gecko_profiler::register_thread("Thread Name"); + // After doing some work, and right before exiting the thread, unregister it. + gecko_profiler::unregister_thread(); + +For example, here's how to register and unregister a simple thread: + +.. code-block:: rust + + let thread_name = "New Thread"; + std::thread::Builder::new() + .name(thread_name.into()) + .spawn(move || { + gecko_profiler::register_thread(thread_name); + // DO SOME WORK + gecko_profiler::unregister_thread(); + }) + .unwrap(); + +Or with a thread pool: + +.. code-block:: rust + + let worker = rayon::ThreadPoolBuilder::new() + .thread_name(move |idx| format!("Worker#{}", idx)) + .start_handler(move |idx| { + gecko_profiler::register_thread(&format!("Worker#{}", idx)); + }) + .exit_handler(|_idx| { + gecko_profiler::unregister_thread(); + }) + .build(); + +.. note:: + Registering a thread only will not make it appear in the profile data. In + addition, it needs to be added to the "Threads" filter in about:profiling. + This filter input is a comma-separated list. It matches partial names and + supports the wildcard ``*``. + +Adding Stack Frame Labels +------------------------- + +Stack frame labels are useful for annotating a part of the call stack with a +category. The category will appear in the various places on the Firefox Profiler +analysis page like timeline, call tree tab, flame graph tab, etc. + +``gecko_profiler_label!`` macro is used to add a new label frame. The added label +frame will exist between the call of this macro and the end of the current scope. + +Adding a stack frame label: + +.. code-block:: rust + + // Marking the stack as "Layout" category, no subcategory provided. + gecko_profiler_label!(Layout); + // Marking the stack as "JavaScript" category and "Parsing" subcategory. + gecko_profiler_label!(JavaScript, Parsing); + + // Or the entire function scope can be marked with a procedural macro. This is + // essentially a syntactical sugar and it expands into a function with a + // gecko_profiler_label! call at the very start: + #[gecko_profiler_fn_label(DOM)] + fn foo(bar: u32) -> u32 { + bar + } + +See the list of all profiling categories in the `profiling_categories.yaml`_ file. + +Adding Markers +-------------- + +Markers are packets of arbitrary data that are added to a profile by the Firefox code, +usually to indicate something important happening at a point in time, or during an interval of time. + +Each marker has a name, a category, some common optional information (timing, backtrace, etc.), +and an optional payload of a specific type (containing arbitrary data relevant to that type). + +.. note:: + This guide explains Rust markers in depth. To learn more about how to add a + marker in C++ or JavaScript, please take a look at their documentation + in :doc:`markers-guide` or :doc:`instrumenting-javascript` respectively. + +Examples +^^^^^^^^ + +Short examples, details are below. + +.. code-block:: rust + + // Record a simple marker with the category of Graphics, DisplayListBuilding. + gecko_profiler::add_untyped_marker( + // Name of the marker as a string. + "Marker Name", + // Category with an optional sub-category. + gecko_profiler_category!(Graphics, DisplayListBuilding), + // MarkerOptions that keeps options like marker timing and marker stack. + // It will be a point in type by default. + Default::default(), + ); + +.. code-block:: rust + + // Create a marker with some additional text information. + let info = "info about this marker"; + gecko_profiler::add_text_marker( + // Name of the marker as a string. + "Marker Name", + // Category with an optional sub-category. + gecko_profiler_category!(DOM), + // MarkerOptions that keeps options like marker timing and marker stack. + MarkerOptions { + timing: MarkerTiming::instant_now(), + ..Default::default() + }, + // Additional information as a string. + info, + ); + +.. code-block:: rust + + // Record a custom marker of type `ExampleNumberMarker` (see definition below). + gecko_profiler::add_marker( + // Name of the marker as a string. + "Marker Name", + // Category with an optional sub-category. + gecko_profiler_category!(Graphics, DisplayListBuilding), + // MarkerOptions that keeps options like marker timing and marker stack. + Default::default(), + // Marker payload. + ExampleNumberMarker { number: 5 }, + ); + + .... + + // Marker type definition. It needs to derive Serialize, Deserialize. + #[derive(Serialize, Deserialize, Debug)] + pub struct ExampleNumberMarker { + number: i32, + } + + // Marker payload needs to implement the ProfilerMarker trait. + impl gecko_profiler::ProfilerMarker for ExampleNumberMarker { + // Unique marker type name. + fn marker_type_name() -> &'static str { + "example number" + } + // Data specific to this marker type, serialized to JSON for profiler.firefox.com. + fn stream_json_marker_data(&self, json_writer: &mut gecko_profiler::JSONWriter) { + json_writer.int_property("number", self.number.into()); + } + // Where and how to display the marker and its data. + fn marker_type_display() -> gecko_profiler::MarkerSchema { + use gecko_profiler::marker::schema::*; + let mut schema = MarkerSchema::new(&[Location::MarkerChart]); + schema.set_chart_label("Name: {marker.name}"); + schema.add_key_label_format("number", "Number", Format::Integer); + schema + } + } + +Untyped Markers +^^^^^^^^^^^^^^^ + +Untyped markers don't carry any information apart from common marker data: +Name, category, options. + +.. code-block:: rust + + gecko_profiler::add_untyped_marker( + // Name of the marker as a string. + "Marker Name", + // Category with an optional sub-category. + gecko_profiler_category!(Graphics, DisplayListBuilding), + // MarkerOptions that keeps options like marker timing and marker stack. + MarkerOptions { + timing: MarkerTiming::instant_now(), + ..Default::default() + }, + ); + +1. Marker name + The first argument is the name of this marker. This will be displayed in most places + the marker is shown. It can be a literal string, or any dynamic string. +2. `Profiling category pair`_ + A category + subcategory pair from the `the list of categories`_. + ``gecko_profiler_category!`` macro should be used to create a profiling category + pair since it's easier to use, e.g. ``gecko_profiler_category!(JavaScript, Parsing)``. + Second parameter can be omitted to use the default subcategory directly. + ``gecko_profiler_category!`` macro is encouraged to use, but ``ProfilingCategoryPair`` + enum can also be used if needed. +3. `MarkerOptions`_ + See the options below. It can be omitted if there are no arguments with ``Default::default()``. + Some options can also be omitted, ``MarkerOptions {, ..Default::default()}``, + with one or more of the following options types: + + * `MarkerTiming`_ + This specifies an instant or interval of time. It defaults to the current instant if + left unspecified. Otherwise use ``MarkerTiming::instant_at(ProfilerTime)`` or + ``MarkerTiming::interval(pt1, pt2)``; timestamps are usually captured with + ``ProfilerTime::Now()``. It is also possible to record only the start or the end of an + interval, pairs of start/end markers will be matched by their name. + * `MarkerStack`_ + By default, markers do not record a "stack" (or "backtrace"). To record a stack at + this point, in the most efficient manner, specify ``MarkerStack::Full``. To + capture a stack without native frames for reduced overhead, specify + ``MarkerStack::NonNative``. + + *Note: Currently, all C++ marker options are not present in the Rust side. They will + be added in the future.* + +Text Markers +^^^^^^^^^^^^ + +Text markers are very common, they carry an extra text as a fourth argument, in addition to +the marker name. Use the following macro: + +.. code-block:: rust + + let info = "info about this marker"; + gecko_profiler::add_text_marker( + // Name of the marker as a string. + "Marker Name", + // Category with an optional sub-category. + gecko_profiler_category!(DOM), + // MarkerOptions that keeps options like marker timing and marker stack. + MarkerOptions { + stack: MarkerStack::Full, + ..Default::default() + }, + // Additional information as a string. + info, + ); + +As useful as it is, using an expensive ``format!`` operation to generate a complex text +comes with a variety of issues. It can leak potentially sensitive information +such as URLs during the profile sharing step. profiler.firefox.com cannot +access the information programmatically. It won't get the formatting benefits of the +built-in marker schema. Please consider using a custom marker type to separate and +better present the data. + +Other Typed Markers +^^^^^^^^^^^^^^^^^^^ + +From Rust code, a marker of some type ``YourMarker`` (details about type definition follow) can be +recorded like this: + +.. code-block:: rust + + gecko_profiler::add_marker( + // Name of the marker as a string. + "Marker Name", + // Category with an optional sub-category. + gecko_profiler_category!(JavaScript), + // MarkerOptions that keeps options like marker timing and marker stack. + Default::default(), + // Marker payload. + YourMarker { number: 5, text: "some string".to_string() }, + ); + +After the first three common arguments (like in ``gecko_profiler::add_untyped_marker``), +there is a marker payload struct and it needs to be defined. Let's take a look at +how to define it. + +How to Define New Marker Types +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Each marker type must be defined once and only once. +The definition is a Rust ``struct``, it's constructed when recording markers of +that type in Rust. Each marker struct holds the data that is required for them +to show in the profiler.firefox.com. +By convention, the suffix "Marker" is recommended to better distinguish them +from non-profiler entities in the source. + +Each marker payload must derive ``serde::Serialize`` and ``serde::Deserialize``. +They are also exported from ``gecko-profiler`` crate if a project doesn't have it. +Each marker payload should include its data as its fields like this: + +.. code-block:: rust + + #[derive(Serialize, Deserialize, Debug)] + pub struct YourMarker { + number: i32, + text: String, + } + +Each marker struct must also implement the `ProfilerMarker`_ trait. + +``ProfilerMarker`` trait +************************ + +`ProfilerMarker`_ trait must be implemented for all marker types. Its methods are +similar to C++ counterparts, please refer to :ref:`the C++ markers guide to learn +more about them `. It includes three methods that +needs to be implemented: + +1. ``marker_type_name() -> &'static str``: + A marker type must have a unique name, it is used to keep track of the type of + markers in the profiler storage, and to identify them uniquely on profiler.firefox.com. + (It does not need to be the same as the struct's name.) + + E.g.: + + .. code-block:: rust + + fn marker_type_name() -> &'static str { + "your marker type" + } + +2. ``stream_json_marker_data(&self, json_writer: &mut JSONWriter)`` + All markers of any type have some common data: A name, a category, options like + timing, etc. as previously explained. + + In addition, a certain marker type may carry zero of more arbitrary pieces of + information, and they are always the same for all markers of that type. + + These are defined in a special static member function ``stream_json_marker_data``. + + It's a member method and takes a ``&mut JSONWriter`` as a parameter, + it will be used to stream the data as JSON, to later be read by + profiler.firefox.com. See `JSONWriter object and its methods`_. + + E.g.: + + .. code-block:: rust + + fn stream_json_marker_data(&self, json_writer: &mut JSONWriter) { + json_writer.int_property("number", self.number.into()); + json_writer.string_property("text", &self.text); + } + +3. ``marker_type_display() -> schema::MarkerSchema`` + Now that how to stream type-specific data (from Firefox to + profiler.firefox.com) is defined, it needs to be described where and how this + data will be displayed on profiler.firefox.com. + + The static member function ``marker_type_display`` returns an opaque ``MarkerSchema`` + object, which will be forwarded to profiler.firefox.com. + + See the `MarkerSchema::Location enumeration for the full list`_. Also see the + `MarkerSchema struct for its possible methods`_. + + E.g.: + + .. code-block:: rust + + fn marker_type_display() -> schema::MarkerSchema { + // Import MarkerSchema related types for easier use. + use crate::marker::schema::*; + // Create a MarkerSchema struct with a list of locations provided. + // One or more constructor arguments determine where this marker will be displayed in + // the profiler.firefox.com UI. + let mut schema = MarkerSchema::new(&[Location::MarkerChart]); + + // Some labels can optionally be specified, to display certain information in different + // locations: set_chart_label, set_tooltip_label, and set_table_label``; or + // set_all_labels to define all of them the same way. + schema.set_all_labels("{marker.name} - {marker.data.number}); + + // Next, define the main display of marker data, which will appear in the Marker Chart + // tooltips and the Marker Table sidebar. + schema.add_key_label_format("number", "Number", Format::Number); + schema.add_key_label_format("text", "Text", Format::String); + schema.add_static_label_value("Help", "This is my own marker type"); + + // Lastly, return the created schema. + schema + } + + Note that the strings in ``set_all_labels`` may refer to marker data within braces: + + * ``{marker.name}``: Marker name. + * ``{marker.data.X}``: Type-specific data, as streamed with property name "X" + from ``stream_json_marker_data``. + + :ref:`See the C++ markers guide for more details about it `. + +.. _profiling_categories.yaml: https://searchfox.org/mozilla-central/source/mozglue/baseprofiler/build/profiling_categories.yaml +.. _Profiling category pair: https://searchfox.org/mozilla-central/define?q=gecko_profiler::gecko_bindings::profiling_categories::ProfilingCategoryPair +.. _the list of categories: https://searchfox.org/mozilla-central/source/mozglue/baseprofiler/build/profiling_categories.yaml +.. _MarkerOptions: https://searchfox.org/mozilla-central/define?q=gecko_profiler::marker::options::MarkerOptions +.. _MarkerTiming: https://searchfox.org/mozilla-central/define?q=gecko_profiler::marker::options::MarkerTiming +.. _MarkerStack: https://searchfox.org/mozilla-central/define?q=gecko_profiler::marker::options::MarkerStack +.. _ProfilerMarker: https://searchfox.org/mozilla-central/define?q=gecko_profiler::marker::ProfilerMarker +.. _MarkerSchema::Location enumeration for the full list: https://searchfox.org/mozilla-central/define?q=T_mozilla%3A%3AMarkerSchema%3A%3ALocation +.. _JSONWriter object and its methods: https://searchfox.org/mozilla-central/define?q=gecko_profiler::json_writer::JSONWriter +.. _MarkerSchema struct for its possible methods: https://searchfox.org/mozilla-central/define?q=gecko_profiler::marker::schema::MarkerSchema diff --git a/tools/profiler/docs/markers-guide.rst b/tools/profiler/docs/markers-guide.rst new file mode 100644 index 0000000000..82fe6f3cda --- /dev/null +++ b/tools/profiler/docs/markers-guide.rst @@ -0,0 +1,485 @@ +Markers +======= + +Markers are packets of arbitrary data that are added to a profile by the Firefox code, usually to +indicate something important happening at a point in time, or during an interval of time. + +Each marker has a name, a category, some common optional information (timing, backtrace, etc.), +and an optional payload of a specific type (containing arbitrary data relevant to that type). + +.. note:: + This guide explains C++ markers in depth. To learn more about how to add a + marker in JavaScript or Rust, please take a look at their documentation + in :doc:`instrumenting-javascript` or :doc:`instrumenting-rust` respectively. + +Example +------- + +Short example, details below. + +Note: Most marker-related identifiers are in the ``mozilla`` namespace, to be added where necessary. + +.. code-block:: c++ + + // Record a simple marker with the category of DOM. + PROFILER_MARKER_UNTYPED("Marker Name", DOM); + + // Create a marker with some additional text information. (Be wary of printf!) + PROFILER_MARKER_TEXT("Marker Name", JS, MarkerOptions{}, "Additional text information."); + + // Record a custom marker of type `ExampleNumberMarker` (see definition below). + PROFILER_MARKER("Number", OTHER, MarkerOptions{}, ExampleNumberMarker, 42); + +.. code-block:: c++ + + // Marker type definition. + struct ExampleNumberMarker { + // Unique marker type name. + static constexpr Span MarkerTypeName() { return MakeStringSpan("number"); } + // Data specific to this marker type, serialized to JSON for profiler.firefox.com. + static void StreamJSONMarkerData(SpliceableJSONWriter& aWriter, int aNumber) { + aWriter.IntProperty("number", aNumber); + } + // Where and how to display the marker and its data. + static MarkerSchema MarkerTypeDisplay() { + using MS = MarkerSchema; + MS schema(MS::Location::MarkerChart, MS::Location::MarkerTable); + schema.SetChartLabel("Number: {marker.data.number}"); + schema.AddKeyLabelFormat("number", "Number", MS::Format::Number); + return schema; + } + }; + + +How to Record Markers +--------------------- + +Header to Include +^^^^^^^^^^^^^^^^^ + +If the compilation unit only defines and records untyped, text, and/or its own markers, include +`the main profiler markers header `_: + +.. code-block:: c++ + + #include "mozilla/ProfilerMarkers.h" + +If it also records one of the other common markers defined in +`ProfilerMarkerTypes.h `_, +include that one instead: + +.. code-block:: c++ + + #include "mozilla/ProfilerMarkerTypes.h" + +And if it uses any other profiler functions (e.g., labels), use +`the main Gecko Profiler header `_ +instead: + +.. code-block:: c++ + + #include "GeckoProfiler.h" + +The above works from source files that end up in libxul, which is true for the majority +of Firefox source code. But some files live outside of libxul, such as mfbt, in which +case the advice is the same but the equivalent headers are from the Base Profiler instead: + +.. code-block:: c++ + + #include "mozilla/BaseProfilerMarkers.h" // Only own/untyped/text markers + #include "mozilla/BaseProfilerMarkerTypes.h" // Only common markers + #include "BaseProfiler.h" // Markers and other profiler functions + +Untyped Markers +^^^^^^^^^^^^^^^ + +Untyped markers don't carry any information apart from common marker data: +Name, category, options. + +.. code-block:: c++ + + PROFILER_MARKER_UNTYPED( + // Name, and category pair. + "Marker Name", OTHER, + // Marker options, may be omitted if all defaults are acceptable. + MarkerOptions(MarkerStack::Capture(), ...)); + +``PROFILER_MARKER_UNTYPED`` is a macro that simplifies the use of the main +``profiler_add_marker`` function, by adding the appropriate namespaces, and a surrounding +``#ifdef MOZ_GECKO_PROFILER`` guard. + +1. Marker name + The first argument is the name of this marker. This will be displayed in most places + the marker is shown. It can be a literal C string, or any dynamic string object. +2. `Category pair name `_ + Choose a category + subcategory from the `the list of categories `_. + This is the second parameter of each ``SUBCATEGORY`` line, for instance ``LAYOUT_Reflow``. + (Internally, this is really a `MarkerCategory `_ + object, in case you need to construct it elsewhere.) +3. `MarkerOptions `_ + See the options below. It can be omitted if there are no other arguments, ``{}``, or + ``MarkerOptions()`` (no specified options); only one of the following option types + alone; or ``MarkerOptions(...)`` with one or more of the following options types: + + * `MarkerThreadId `_ + Rarely used, as it defaults to the current thread. Otherwise it specifies the target + "thread id" (aka "track") where the marker should appear; This may be useful when + referring to something that happened on another thread (use ``profiler_current_thread_id()`` + from the original thread to get its id); or for some important markers, they may be + sent to the "main thread", which can be specified with ``MarkerThreadId::MainThread()``. + * `MarkerTiming `_ + This specifies an instant or interval of time. It defaults to the current instant if + left unspecified. Otherwise use ``MarkerTiming::InstantAt(timestamp)`` or + ``MarkerTiming::Interval(ts1, ts2)``; timestamps are usually captured with + ``TimeStamp::Now()``. It is also possible to record only the start or the end of an + interval, pairs of start/end markers will be matched by their name. *Note: The + upcoming "marker sets" feature will make this pairing more reliable, and also + allow more than two markers to be connected*. + * `MarkerStack `_ + By default, markers do not record a "stack" (or "backtrace"). To record a stack at + this point, in the most efficient manner, specify ``MarkerStack::Capture()``. To + record a previously captured stack, first store a stack into a + ``UniquePtr`` with ``profiler_capture_backtrace()``, then pass + it to the marker with ``MarkerStack::TakeBacktrace(std::move(stack))``. + * `MarkerInnerWindowId `_ + If you have access to an "inner window id", consider specifying it as an option, to + help profiler.firefox.com to classify them by tab. + +Text Markers +^^^^^^^^^^^^ + +Text markers are very common, they carry an extra text as a fourth argument, in addition to +the marker name. Use the following macro: + +.. code-block:: c++ + + PROFILER_MARKER_TEXT( + // Name, category pair, options. + "Marker Name", OTHER, {}, + // Text string. + "Here are some more details." + ); + +As useful as it is, using an expensive ``printf`` operation to generate a complex text +comes with a variety of issues string. It can leak potentially sensitive information +such as URLs can be leaked during the profile sharing step. profiler.firefox.com cannot +access the information programmatically. It won't get the formatting benefits of the +built-in marker schema. Please consider using a custom marker type to separate and +better present the data. + +Other Typed Markers +^^^^^^^^^^^^^^^^^^^ + +From C++ code, a marker of some type ``YourMarker`` (details about type definition follow) can be +recorded like this: + +.. code-block:: c++ + + PROFILER_MARKER( + "YourMarker name", OTHER, + MarkerOptions(MarkerTiming::IntervalUntilNowFrom(someStartTimestamp), + MarkerInnerWindowId(innerWindowId))), + YourMarker, "some string", 12345, "http://example.com", someTimeStamp); + +After the first three common arguments (like in ``PROFILER_MARKER_UNTYPED``), there are: + +4. The marker type, which is the name of the C++ ``struct`` that defines that type. +5. A variadic list of type-specific argument. They must match the number of, and must + be convertible to, ``StreamJSONMarkerData`` parameters as specified in the marker type definition. + +"Auto" Scoped Interval Markers +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +To capture time intervals around some important operations, it is common to store a timestamp, do the work, +and then record a marker, e.g.: + +.. code-block:: c++ + + void DoTimedWork() { + TimeStamp start = TimeStamp::Now(); + DoWork(); + PROFILER_MARKER_TEXT("Timed work", OTHER, MarkerTiming::IntervalUntilNowFrom(start), "Details"); + } + +`RAII `_ objects automate this, by recording the time +when the object is constructed, and later recording the marker when the object is destroyed at the end +of its C++ scope. +This is especially useful if there are multiple scope exit points. + +``AUTO_PROFILER_MARKER_TEXT`` is `the only one implemented `_ at this time. + +.. code-block:: c++ + + void MaybeDoTimedWork(bool aDoIt) { + AUTO_PROFILER_MARKER_TEXT("Timed work", OTHER, "Details"); + if (!aDoIt) { /* Marker recorded here... */ return; } + DoWork(); + /* ... or here. */ + } + +Note that these RAII objects only record one marker. In some situation, a very long +operation could be missed if it hasn't completed by the end of the profiling session. +In this case, consider recording two distinct markers, using +``MarkerTiming::IntervalStart()`` and ``MarkerTiming::IntervalEnd()``. + +Where to Define New Marker Types +-------------------------------- + +The first step is to determine the location of the marker type definition: + +* If this type is only used in one function, or a component, it can be defined in a + local common place relative to its use. +* For a more common type that could be used from multiple locations: + + * If there is no dependency on XUL, it can be defined in the Base Profiler, which can + be used in most locations in the codebase: + `mozglue/baseprofiler/public/BaseProfilerMarkerTypes.h `__ + + * However, if there is a XUL dependency, then it needs to be defined in the Gecko Profiler: + `tools/profiler/public/ProfilerMarkerTypes.h `__ + +.. _how-to-define-new-marker-types: + +How to Define New Marker Types +------------------------------ + +Each marker type must be defined once and only once. +The definition is a C++ ``struct``, its identifier is used when recording +markers of that type in C++. +By convention, the suffix "Marker" is recommended to better distinguish them +from non-profiler entities in the source. + +.. code-block:: c++ + + struct YourMarker { + +Marker Type Name +^^^^^^^^^^^^^^^^ + +A marker type must have a unique name, it is used to keep track of the type of +markers in the profiler storage, and to identify them uniquely on profiler.firefox.com. +(It does not need to be the same as the ``struct``'s name.) + +This name is defined in a special static member function ``MarkerTypeName``: + +.. code-block:: c++ + + // … + static constexpr Span MarkerTypeName() { + return MakeStringSpan("YourMarker"); + } + +Marker Type Data +^^^^^^^^^^^^^^^^ + +All markers of any type have some common data: A name, a category, options like +timing, etc. as previously explained. + +In addition, a certain marker type may carry zero of more arbitrary pieces of +information, and they are always the same for all markers of that type. + +These are defined in a special static member function ``StreamJSONMarkerData``. + +The first function parameters is always ``SpliceableJSONWriter& aWriter``, +it will be used to stream the data as JSON, to later be read by +profiler.firefox.com. + +.. code-block:: c++ + + // … + static void StreamJSONMarkerData(SpliceableJSONWriter& aWriter, + +The following function parameters is how the data is received as C++ objects +from the call sites. + +* Most C/C++ `POD (Plain Old Data) `_ + and `trivially-copyable `_ + types should work as-is, including ``TimeStamp``. +* Character strings should be passed using ``const ProfilerString8View&`` (this handles + literal strings, and various ``std::string`` and ``nsCString`` types, and spans with or + without null terminator). Use ``const ProfilerString16View&`` for 16-bit strings such as + ``nsString``. +* Other types can be used if they define specializations for ``ProfileBufferEntryWriter::Serializer`` + and ``ProfileBufferEntryReader::Deserializer``. You should rarely need to define new + ones, but if needed see how existing specializations are written, or contact the + `perf-tools team for help `_. + +Passing by value or by reference-to-const is recommended, because arguments are serialized +in binary form (i.e., there are no optimizable ``move`` operations). + +For example, here's how to handle a string, a 64-bit number, another string, and +a timestamp: + +.. code-block:: c++ + + // … + const ProfilerString8View& aString, + const int64_t aBytes, + const ProfilerString8View& aURL, + const TimeStamp& aTime) { + +Then the body of the function turns these parameters into a JSON stream. + +When this function is called, the writer has just started a JSON object, so +everything that is written should be a named object property. Use +``SpliceableJSONWriter`` functions, in most cases ``...Property`` functions +from its parent class ``JSONWriter``: ``NullProperty``, ``BoolProperty``, +``IntProperty``, ``DoubleProperty``, ``StringProperty``. (Other nested JSON +types like arrays or objects are not supported by the profiler.) + +As a special case, ``TimeStamps`` must be streamed using ``aWriter.TimeProperty(timestamp)``. + +The property names will be used to identify where each piece of data is stored and +how it should be displayed on profiler.firefox.com (see next section). + +Here's how the above functions parameters could be streamed: + +.. code-block:: c++ + + // … + aWriter.StringProperty("myString", aString); + aWriter.IntProperty("myBytes", aBytes); + aWriter.StringProperty("myURL", aURL); + aWriter.TimeProperty("myTime", aTime); + } + +.. _marker-type-display-schema: + +Marker Type Display Schema +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Now that we have defined how to stream type-specific data (from Firefox to +profiler.firefox.com), we need to describe where and how this data will be +displayed on profiler.firefox.com. + +The static member function ``MarkerTypeDisplay`` returns an opaque ``MarkerSchema`` +object, which will be forwarded to profiler.firefox.com. + +.. code-block:: c++ + + // … + static MarkerSchema MarkerTypeDisplay() { + +The ``MarkerSchema`` type will be used repeatedly, so for convenience we can define +a local type alias: + +.. code-block:: c++ + + // … + using MS = MarkerSchema; + +First, we construct the ``MarkerSchema`` object to be returned at the end. + +One or more constructor arguments determine where this marker will be displayed in +the profiler.firefox.com UI. See the `MarkerSchema::Location enumeration for the +full list `_. + +Here is the most common set of locations, showing markers of that type in both the +Marker Chart and the Marker Table panels: + +.. code-block:: c++ + + // … + MS schema(MS::Location::MarkerChart, MS::Location::MarkerTable); + +Some labels can optionally be specified, to display certain information in different +locations: ``SetChartLabel``, ``SetTooltipLabel``, and ``SetTableLabel``; or +``SetAllLabels`` to define all of them the same way. + +The arguments is a string that may refer to marker data within braces: + +* ``{marker.name}``: Marker name. +* ``{marker.data.X}``: Type-specific data, as streamed with property name "X" from ``StreamJSONMarkerData`` (e.g., ``aWriter.IntProperty("X", aNumber);`` + +For example, here's how to set the Marker Chart label to show the marker name and the +``myBytes`` number of bytes: + +.. code-block:: c++ + + // … + schema.SetChartLabel("{marker.name} – {marker.data.myBytes}"); + +profiler.firefox.com will apply the label with the data in a consistent manner. For +example, with this label definition, it could display marker information like the +following in the Firefox Profiler's Marker Chart: + + * "Marker Name – 10B" + * "Marker Name – 25.204KB" + * "Marker Name – 512.54MB" + +For implementation details on this processing, see `src/profiler-logic/marker-schema.js `_ +in the profiler's front-end. + +Next, define the main display of marker data, which will appear in the Marker +Chart tooltips and the Marker Table sidebar. + +Each row may either be: + +* A dynamic key-value pair, using one of the ``MarkerSchema::AddKey...`` functions. Each function is given: + + * Key: Element property name as streamed in ``StreamJSONMarkerData``. + * Label: Optional prefix. Defaults to the key name. + * Format: How to format the data element value, see `MarkerSchema::Format for details `_. + * Searchable: Optional boolean, indicates if the value is used in searches, defaults to false. + +* Or a fixed label and value strings, using ``MarkerSchema::AddStaticLabelValue``. + +.. code-block:: c++ + + // … + schema.AddKeyLabelFormatSearchable( + "myString", "My String", MS::Format::String, true); + schema.AddKeyLabelFormat( + "myBytes", "My Bytes", MS::Format::Bytes); + schema.AddKeyLabelFormat( + "myUrl", "My URL", MS::Format::Url); + schema.AddKeyLabelFormat( + "myTime", "Event time", MS::Format::Time); + +Finally the ``schema`` object is returned from the function: + +.. code-block:: c++ + + // … + return schema; + } + +Any other ``struct`` member function is ignored. There could be utility functions used by the above +compulsory functions, to make the code clearer. + +And that is the end of the marker definition ``struct``. + +.. code-block:: c++ + + // … + }; + +Performance Considerations +-------------------------- + +During profiling, it is best to reduce the amount of work spent doing profiler +operations, as they can influence the performance of the code that you want to profile. + +Whenever possible, consider passing simple types to marker functions, such that +``StreamJSONMarkerData`` will do the minimum amount of work necessary to serialize +the marker type-specific arguments to its internal buffer representation. POD types +(numbers) and strings are the easiest and cheapest to serialize. Look at the +corresponding ``ProfileBufferEntryWriter::Serializer`` specializations if you +want to better understand the work done. + +Avoid doing expensive operations when recording markers. E.g.: ``printf`` of +different things into a string, or complex computations; instead pass the +``printf``/computation arguments straight through to the marker function, so that +``StreamJSONMarkerData`` can do the expensive work at the end of the profiling session. + +Marker Architecture Description +------------------------------- + +The above sections should give all the information needed for adding your own marker +types. However, if you are wanting to work on the marker architecture itself, this +section will describe how the system works. + +TODO: + * Briefly describe the buffer and serialization. + * Describe the template strategy for generating marker types + * Describe the serialization and link to profiler front-end docs on marker processing (if they exist) diff --git a/tools/profiler/docs/memory.rst b/tools/profiler/docs/memory.rst new file mode 100644 index 0000000000..347a91f9e7 --- /dev/null +++ b/tools/profiler/docs/memory.rst @@ -0,0 +1,46 @@ +Profiling Memory +================ + +Sampling stacks from native allocations +--------------------------------------- + +The profiler can sample allocations and de-allocations from malloc using the +"Native Allocations" feature. This can be enabled by going to `about:profiling` and +enabling the "Native Allocations" checkbox. It is only available in Nightly, as it +uses a technique of hooking into malloc that could be a little more risky to apply to +the broader population of Firefox users. + +This implementation is located in: `tools/profiler/core/memory_hooks.cpp +`_ + +It works by hooking into all of the malloc calls. When the profiler is running, it +performs a `Bernoulli trial`_, that will pass for a given probability of per-byte +allocated. What this means is that larger allocations have a higher chance of being +recorded compared to smaller allocations. Currently, there is no way to configure +the per-byte probability. This means that sampled allocation sizes will be closer +to the actual allocated bytes. + +This infrastructure is quite similar to DMD, but with the additional motiviations of +making it easy to turn on and use with the profiler. The overhead is quite high, +especially on systems with more expensive stack walking, like Linux. Turning off +thee "Native Stacks" feature can help lower overhead, but will give less information. + +For more information on analyzing these profiles, see the `Firefox Profiler docs`_. + +Memory counters +--------------- + +Similar to the Native Allocations feature, memory counters use the malloc memory hook +that is only available in Nightly. When it's available, the memory counters are always +turned on. This is a lightweight way to count in a very granular fashion how much +memory is being allocated and deallocated during the profiling session. + +This information is then visualized in the `Firefox Profiler memory track`_. + +This feature uses the `Profiler Counters`_, which can be used to create other types +of cheap counting instrumentation. + +.. _Bernoulli trial: https://en.wikipedia.org/wiki/Bernoulli_trial +.. _Firefox Profiler docs: https://profiler.firefox.com/docs/#/./memory-allocations +.. _Firefox Profiler memory track: https://profiler.firefox.com/docs/#/./memory-allocations?id=memory-track +.. _Profiler Counters: https://searchfox.org/mozilla-central/source/tools/profiler/public/ProfilerCounts.h diff --git a/tools/profiler/docs/profilerclasses-20220913.png b/tools/profiler/docs/profilerclasses-20220913.png new file mode 100644 index 0000000000..a5ba265407 Binary files /dev/null and b/tools/profiler/docs/profilerclasses-20220913.png differ diff --git a/tools/profiler/docs/profilerclasses.umlet.uxf b/tools/profiler/docs/profilerclasses.umlet.uxf new file mode 100644 index 0000000000..c807853401 --- /dev/null +++ b/tools/profiler/docs/profilerclasses.umlet.uxf @@ -0,0 +1,811 @@ + + + 10 + + UMLClass + + 80 + 370 + 340 + 190 + + ThreadInfo +-- +-mName: nsCString +-mRegisterTime: TimeStamp +-mThreadId: int +-mIsMainThread: bool +-- +NS_INLINE_DECL_THREADSAFE_REFCOUNTING ++Name() ++RegisterTime() ++ThreadId() ++IsMainThread() + + + + + UMLClass + + 470 + 300 + 600 + 260 + + RacyRegisteredThread +-- +-mProfilingStackOwner: NotNull<RefPtr<ProfilingStackOwner>> +-mThreadId +-mSleep: Atomic<int> /* AWAKE, SLEEPING_NOT_OBSERVED, SLEEPING_OBSERVED */ +-mIsBeingProfiled: Atomic<bool, Relaxed> +-- ++SetIsBeingProfiled() ++IsBeingProfiled() ++ReinitializeOnResume() ++CanDuplicateLastSampleDueToSleep() ++SetSleeping() ++SetAwake() ++IsSleeping() ++ThreadId() ++ProfilingStack() ++ProfilingStackOwner() + + + + UMLClass + + 470 + 650 + 350 + 360 + + RegisteredThread +-- +-mPlatformData: UniquePlatformData +-mStackTop: const void* +-mThread: nsCOMPtr<nsIThread> +-mContext: JSContext* +-mJSSampling: enum {INACTIVE, ACTIVE_REQUESTED, ACTIVE, INACTIVE_REQUESTED} +-mmJSFlags: uint32_t +-- ++RacyRegisteredThread() ++GetPlatformData() ++StackTop() ++GetRunningEventDelay() ++SizeOfIncludingThis() ++SetJSContext() ++ClearJSContext() ++GetJSContext() ++Info(): RefPtr<ThreadInfo> ++GetEventTarget(): nsCOMPtr<nsIEventTarget> ++ResetMainThread(nsIThread*) ++StartJSSampling() ++StopJSSampling() ++PollJSSampling() + + + + + Relation + + 750 + 550 + 180 + 120 + + lt=<<<<<- +mRacyRegisteredThread + 10.0;100.0;10.0;10.0 + + + Relation + + 290 + 550 + 230 + 120 + + lt=<<<<- +mThreadInfo: RefPtr<> + 210.0;100.0;10.0;10.0 + + + UMLClass + + 70 + 660 + 340 + 190 + + PageInformation +-- +-mBrowsingContextID: uint64_t +-mInnerWindowID: uint64_t +-mUrl: nsCString +-mEmbedderInnerWindowID: uint64_t +-- +NS_INLINE_DECL_THREADSAFE_REFCOUNTING ++SizeOfIncludingThis(MallocSizeOf) ++Equals(PageInformation*) ++StreamJSON(SpliceableJSONWriter&) ++InnerWindowID() ++BrowsingContextID() ++Url() ++EmbedderInnerWindowID() ++BufferPositionWhenUnregistered(): Maybe<uint64_t> ++NotifyUnregistered(aBufferPosition: uint64_t) + + + + UMLClass + + 760 + 1890 + 570 + 120 + + ProfilerBacktrace +-- +-mName: UniqueFreePtr<char> +-mThreadId: int +-mProfileChunkedBuffer: UniquePtr<ProfileChunkedBuffer> +-mProfileBuffer: UniquePtr<ProfileBuffer> +-- ++StreamJSON(SpliceableJSONWriter&, aProcessStartTime: TimeStamp, UniqueStacks&) + + + + + UMLClass + + 20 + 2140 + 620 + 580 + + ProfileChunkedBuffer +-- +-mMutex: BaseProfilerMaybeMutex +-mChunkManager: ProfileBufferChunkManager* +-mOwnedChunkManager: UniquePtr<ProfileBufferChunkManager> +-mCurrentChunk: UniquePtr<ProfileBufferChunk> +-mNextChunks: UniquePtr<ProfileBufferChunk> +-mRequestedChunkHolder: RefPtr<RequestedChunkRefCountedHolder> +-mNextChunkRangeStart: ProfileBufferIndex +-mRangeStart: Atomic<ProfileBufferIndex, ReleaseAcquire> +-mRangeEnd: ProfileBufferIndex +-mPushedBlockCount: uint64_t +-mClearedBlockCount: Atomic<uint64_t, ReleaseAcquire> +-- ++Byte = ProfileBufferChunk::Byte ++Length = ProfileBufferChunk::Length ++IsThreadSafe() ++IsInSession() ++ResetChunkManager() ++SetChunkManager() ++Clear() ++BufferLength(): Maybe<size_t> ++SizeOfExcludingThis(MallocSizeOf) ++SizeOfIncludingThis(MallocSizeOf) ++GetState() ++IsThreadSafeAndLockedOnCurrentThread(): bool ++LockAndRun(Callback&&) ++ReserveAndPut(CallbackEntryBytes&&, Callback<auto(Maybe<ProfileBufferEntryWriter>&)>&&) ++Put(aEntryBytes: Length, Callback<auto(Maybe<ProfileBufferEntryWriter>&)>&&) ++PutFrom(const void*, Length) ++PutObjects(const Ts&...) ++PutObject(const T&) ++GetAllChunks() ++Read(Callback<void(Reader&)>&&): bool ++ReadEach(Callback<void(ProfileBufferEntryReader& [, ProfileBufferBlockIndex])>&&) ++ReadAt(ProfileBufferBlockIndex, Callback<void(Maybe<ProfileBufferEntryReader>&&)>&&) ++AppendContents + + + + UMLClass + + 810 + 2100 + 500 + 620 + + ProfileBufferChunk +-- ++Header: { + mOffsetFirstBlock; mOffsetPastLastBlock; mDoneTimeStamp; + mBufferBytes; mBlockCount; mRangeStart; mProcessId; + } +-InternalHeader: { mHeader: Header; mNext: UniquePtr<ProfileBufferChunk>; } +-- +-mInternalHeader: InternalHeader +-mBuffer: Byte /* First byte */ +-- ++Byte = uint8_t ++Length = uint32_t ++SpanOfBytes = Span<Byte> +/+Create(aMinBufferBytes: Length): UniquePtr<ProfileBufferChunk>/ ++ReserveInitialBlockAsTail(Length): SpanOfBytes ++ReserveBlock(Length): { SpanOfBytes, ProfileBufferBlockIndex } ++MarkDone() ++MarkRecycled() ++ChunkHeader() ++BufferBytes() ++ChunkBytes() ++SizeOfExcludingThis(MallocSizeOf) ++SizeOfIncludingThis(MallocSizeOf) ++RemainingBytes(): Length ++OffsetFirstBlock(): Length ++OffsetPastLastBlock(): Length ++BlockCount(): Length ++ProcessId(): int ++SetProcessId(int) ++RangeStart(): ProfileBufferIndex ++SetRangeStart(ProfileBufferIndex) ++BufferSpan(): Span<const Byte> ++ByteAt(aOffset: Length) ++GetNext(): maybe-const ProfileBufferChunk* ++ReleaseNext(): UniquePtr<ProfileBufferChunk> ++InsertNext(UniquePtr<ProfileBufferChunk>&&) ++Last(): const ProfileBufferChunk* ++SetLast(UniquePtr<ProfileBufferChunk>&&) +/+Join(UniquePtr<ProfileBufferChunk>&&, UniquePtr<ProfileBufferChunk>&&)/ + + + + + UMLClass + + 120 + 2850 + 570 + 350 + + ProfileBufferEntryReader +-- +-mCurrentSpan: SpanOfConstBytes +-mNextSpanOrEmpty: SpanOfConstBytes +-mCurrentBlockIndex: ProfileBufferBlockIndex +-mNextBlockIndex: ProfileBufferBlockIndex +-- ++RemainingBytes(): Length ++SetRemainingBytes(Length) ++CurrentBlockIndex(): ProfileBufferBlockIndex ++NextBlockIndex(): ProfileBufferBlockIndex ++EmptyIteratorAtOffset(Length): ProfileBufferEntryReader ++operator*(): const Byte& ++operator++(): ProfileBufferEntryReader& ++operator+=(Length): ProfileBufferEntryReader& ++operator==(const ProfileBufferEntryReader&) ++operator!=(const ProfileBufferEntryReader&) ++ReadULEB128<T>(): T ++ReadBytes(void*, Length) ++ReadIntoObject(T&) ++ReadIntoObjects(Ts&...) ++ReadObject<T>(): T + + + + UMLClass + + 740 + 2850 + 570 + 300 + + ProfileBufferEntryWriter +-- +-mCurrentSpan: SpanOfBytes +-mNextSpanOrEmpty: SpanOfBytes +-mCurrentBlockIndex: ProfileBufferBlockIndex +-mNextBlockIndex: ProfileBufferBlockIndex +-- ++RemainingBytes(): Length ++CurrentBlockIndex(): ProfileBufferBlockIndex ++NextBlockIndex(): ProfileBufferBlockIndex ++operator*(): Byte& ++operator++(): ProfileBufferEntryReader& ++operator+=(Length): ProfileBufferEntryReader& +/+ULEB128Size(T): unsigned/ ++WriteULEB128(T) +/+SumBytes(const Ts&...): Length/ ++WriteFromReader(ProfileBufferEntryReader&, Length) ++WriteObject(const T&) ++WriteObjects(const T&) + + + + UMLClass + + 120 + 3270 + 570 + 80 + + ProfileBufferEntryReader::Deserializer<T> +/to be specialized for all types read from ProfileBufferEntryReader/ +-- +/+ReadInto(ProfileBufferEntryReader&, T&)/ +/+Read<T>(ProfileBufferEntryReader&): T/ + + + + UMLClass + + 740 + 3270 + 570 + 80 + + ProfileBufferEntryWriter::Serializer<T> +/to be specialized for all types written into ProfileBufferEntryWriter/ +-- +/+Bytes(const T&): Length/ +/+Write(ProfileBufferEntryWriter&, const T&)/ + + + + Relation + + 330 + 2710 + 110 + 160 + + lt=.> +<<creates>> + 10.0;10.0;60.0;140.0 + + + Relation + + 430 + 2710 + 360 + 160 + + lt=.> +<<creates>> + 10.0;10.0;340.0;140.0 + + + Relation + + 660 + 2710 + 260 + 160 + + lt=.> +<<points into>> + 10.0;140.0;240.0;10.0 + + + Relation + + 870 + 2710 + 140 + 160 + + lt=.> +<<points into>> + 10.0;140.0;80.0;10.0 + + + Relation + + 630 + 2170 + 200 + 40 + + lt=<<<<- +mCurrentChunk + 10.0;20.0;180.0;20.0 + + + Relation + + 630 + 2230 + 200 + 40 + + lt=<<<<- +mNextChunks + 10.0;20.0;180.0;20.0 + + + Relation + + 1100 + 2030 + 170 + 90 + + lt=<<<<- +mInternalHeader.mNext + 10.0;70.0;10.0;20.0;150.0;20.0;150.0;70.0 + + + Relation + + 490 + 3190 + 70 + 100 + + lt=.> +<<uses>> + 10.0;10.0;10.0;80.0 + + + Relation + + 580 + 3190 + 230 + 100 + + lt=.> +<<uses>> + 10.0;10.0;210.0;80.0 + + + UMLClass + + 50 + 1620 + 570 + 410 + + ProfileBuffer +-- +-mFirstSamplingTimeNs: double +-mLastSamplingTimeNs: double +-mIntervalNs, etc.: ProfilerStats +-- ++IsThreadSafe(): bool ++AddEntry(const ProfileBufferEntry&): uint64_t ++AddThreadIdEntry(int): uint64_t ++PutObjects(Kind, const Ts&...): ProfileBufferBlockIndex ++CollectCodeLocation(...) ++AddJITInfoForRange(...) ++StreamSamplesToJSON(SpliceableJSONWriter&, aThreadId: int, aSinceTime: double, UniqueStacks&) ++StreamMarkersToJSON(SpliceableJSONWriter&, ...) ++StreamPausedRangesToJSON(SpliceableJSONWriter&, aSinceTime: double) ++StreamProfilerOverheadToJSON(SpliceableJSONWriter&, ...) ++StreamCountersToJSON(SpliceableJSONWriter&, ...) ++DuplicateLsstSample ++DiscardSamplesBeforeTime(aTime: double) ++GetEntry(aPosition: uint64_t): ProfileBufferEntry ++SizeOfExcludingThis(MallocSizeOf) ++SizeOfIncludingThis(MallocSizeOf) ++CollectOverheadStats(...) ++GetProfilerBufferInfo(): ProfilerBufferInfo ++BufferRangeStart(): uint64_t ++BufferRangeEnd(): uint64_t + + + + UMLClass + + 690 + 1620 + 230 + 60 + + ProfileBufferEntry +-- ++mKind: Kind ++mStorage: uint8_t[kNumChars=8] + + + + UMLClass + + 930 + 1620 + 440 + 130 + + UniqueJSONStrings +-- +-mStringTableWriter: SpliceableChunkedJSONWriter +-mStringHashToIndexMap: HashMap<HashNumber, uint32_t> +-- ++SpliceStringTableElements(SpliceableJSONWriter&) ++WriteProperty(JSONWriter&, aName: const char*, aStr: const char*) ++WriteElement(JSONWriter&, aStr: const char*) ++GetOrAddIndex(const char*): uint32_t + + + + UMLClass + + 680 + 1760 + 470 + 110 + + UniqueStack +-- +-mFrameTableWriter: SpliceableChunkedJSONWriter +-mFrameToIndexMap: HashMap<FrameKey, uint32_t, FrameKeyHasher> +-mStackTableWriter: SpliceableChunkedJSONWriter +-mStackToIndexMap: HashMap<StackKey, uint32_t, StackKeyHasher> +-mJITInfoRanges: Vector<JITFrameInfoForBufferRange> + + + + Relation + + 320 + 2020 + 230 + 140 + + lt=<<<<- +mEntries: ProfileChunkedBuffer& + 10.0;10.0;10.0;120.0 + + + Relation + + 610 + 1640 + 100 + 40 + + lt=.> +<<uses>> + 10.0;20.0;80.0;20.0 + + + Relation + + 610 + 1710 + 340 + 40 + + lt=.> +<<uses>> + 10.0;20.0;320.0;20.0 + + + Relation + + 610 + 1800 + 90 + 40 + + lt=.> +<<uses>> + 10.0;20.0;70.0;20.0 + + + Relation + + 610 + 1900 + 170 + 40 + + lt=<<<<- +mProfileBuffer + 150.0;20.0;10.0;20.0 + + + Relation + + 590 + 1940 + 250 + 220 + + lt=<<<<- +mProfileChunkedBuffer + 170.0;10.0;10.0;200.0 + + + UMLClass + + 20 + 1030 + 490 + 550 + + CorePS +-- +/-sInstance: CorePS*/ +-mMainThreadId: int +-mProcessStartTime: TimeStamp +-mCoreBuffer: ProfileChunkedBuffer +-mRegisteredThreads: Vector<UniquePtr<RegisteredThread>> +-mRegisteredPages: Vector<RefPtr<PageInformation>> +-mCounters: Vector<BaseProfilerCount*> +-mLul: UniquePtr<lul::LUL> /* linux only */ +-mProcessName: nsAutoCString +-mJsFrames: JsFrameBuffer +-- ++Create ++Destroy ++Exists(): bool ++AddSizeOf(...) ++MainThreadId() ++ProcessStartTime() ++CoreBuffer() ++RegisteredThreads(PSLockRef) ++JsFrames(PSLockRef) +/+AppendRegisteredThread(PSLockRef, UniquePtr<RegisteredThread>)/ +/+RemoveRegisteredThread(PSLockRef, RegisteredThread*)/ ++RegisteredPages(PSLockRef) +/+AppendRegisteredPage(PSLockRef, RefPtr<PageInformation>)/ +/+RemoveRegisteredPage(PSLockRef, aRegisteredInnerWindowID: uint64_t)/ +/+ClearRegisteredPages(PSLockRef)/ ++Counters(PSLockRef) ++AppendCounter ++RemoveCounter ++Lul(PSLockRef) ++SetLul(PSLockRef, UniquePtr<lul::LUL>) ++ProcessName(PSLockRef) ++SetProcessName(PSLockRef, const nsACString&) + + + + + Relation + + 20 + 1570 + 110 + 590 + + lt=<<<<<- +mCoreBuffer + 10.0;10.0;10.0;570.0 + + + Relation + + 160 + 840 + 150 + 210 + + lt=<<<<- +mRegisteredPages + 10.0;190.0;10.0;10.0 + + + Relation + + 250 + 840 + 240 + 210 + + lt=<<<<- +mRegisteredThreads + 10.0;190.0;220.0;10.0 + + + UMLClass + + 920 + 860 + 340 + 190 + + SamplerThread +-- +-mSampler: Sampler +-mActivityGeneration: uint32_t +-mIntervalMicroseconds: int +-mThread /* OS-specific */ +-mPostSamplingCallbackList: UniquePtr<PostSamplingCallbackListItem> +-- ++Run() ++Stop(PSLockRef) ++AppendPostSamplingCallback(PSLockRef, PostSamplingCallback&&) + + + + UMLClass + + 1060 + 600 + 340 + 190 + + Sampler +-- +-mOldSigprofHandler: sigaction +-mMyPid: int +-mSamplerTid: int ++sSigHandlerCoordinator +-- ++Disable(PSLockRef) ++SuspendAndSampleAndResumeThread(PSLockRef, const RegisteredThread&, aNow: TimeStamp, const Func&) + + + + + Relation + + 1190 + 780 + 90 + 100 + + lt=<<<<<- +mSampler + 10.0;80.0;10.0;10.0 + + + UMLClass + + 610 + 1130 + 470 + 400 + + ActivePS +-- +/-sInstance: ActivePS*/ +-mGeneration: const uint32_t +/-sNextGeneration: uint32_t/ +-mCapacity: const PowerOfTwo +-mDuration: const Maybe<double> +-mInterval: const double /* milliseconds */ +-mFeatures: const uint32_t +-mFilters: Vector<std::string> +-mActiveBrowsingContextID: uint64_t +-mProfileBufferChunkManager: ProfileBufferChunkManagerWithLocalLimit +-mProfileBuffer: ProfileBuffer +-mLiveProfiledThreads: Vector<LiveProfiledThreadData> +-mDeadProfiledThreads: Vector<UniquePtr<ProfiledThreadData>> +-mDeadProfiledPages: Vector<RefPtr<PageInformation>> +-mSamplerThread: SamplerThread* const +-mInterposeObserver: RefPtr<ProfilerIOInterposeObserver> +-mPaused: bool +-mWasPaused: bool /* linux */ +-mBaseProfileThreads: UniquePtr<char[]> +-mGeckoIndexWhenBaseProfileAdded: ProfileBufferBlockIndex +-mExitProfiles: Vector<ExitProfile> +-- ++ + + + + Relation + + 970 + 1040 + 140 + 110 + + lt=<<<<- +mSamplerThread + 10.0;90.0;10.0;10.0 + + + UMLNote + + 500 + 160 + 510 + 100 + + bg=red +This document pre-dates the generated image profilerclasses-20220913.png! +Unfortunately, the changes to make the image were lost. + +This previous version may still be useful to start reconstructing the image, +if there is a need to update it. + + + diff --git a/tools/profiler/docs/profilerthreadregistration-20220913.png b/tools/profiler/docs/profilerthreadregistration-20220913.png new file mode 100644 index 0000000000..8f7049d743 Binary files /dev/null and b/tools/profiler/docs/profilerthreadregistration-20220913.png differ diff --git a/tools/profiler/docs/profilerthreadregistration.umlet.uxf b/tools/profiler/docs/profilerthreadregistration.umlet.uxf new file mode 100644 index 0000000000..3e07215db4 --- /dev/null +++ b/tools/profiler/docs/profilerthreadregistration.umlet.uxf @@ -0,0 +1,710 @@ + + + 10 + + UMLClass + + 200 + 330 + 370 + 250 + + ThreadRegistry::OffThreadRef +-- ++UnlockedConstReaderCRef() const ++WithUnlockedConstReader(F&& aF) const ++UnlockedConstReaderAndAtomicRWCRef() const ++WithUnlockedConstReaderAndAtomicRW(F&& aF) const ++UnlockedConstReaderAndAtomicRWRef() ++WithUnlockedConstReaderAndAtomicRW(F&& aF) ++UnlockedRWForLockedProfilerCRef() ++WithUnlockedRWForLockedProfiler(F&& aF) ++UnlockedRWForLockedProfilerRef() ++WithUnlockedRWForLockedProfiler(F&& aF) ++ConstLockedRWFromAnyThread() ++WithConstLockedRWFromAnyThread(F&& aF) ++LockedRWFromAnyThread() ++WithLockedRWFromAnyThread(F&& aF) + + + + UMLClass + + 310 + 80 + 560 + 160 + + ThreadRegistry +-- +-sRegistryMutex: RegistryMutex (aka BaseProfilerSharedMutex) +/exclusive lock used during un/registration, shared lock for other accesses/ +-- +friend class ThreadRegistration +-Register(ThreadRegistration::OnThreadRef) +-Unregister(ThreadRegistration::OnThreadRef) +-- ++WithOffThreadRef(ProfilerThreadId, auto&& aF) static ++WithOffThreadRefOr(ProfilerThreadId, auto&& aF, auto&& aFallbackReturn) static: auto + + + + UMLClass + + 310 + 630 + 530 + 260 + + ThreadRegistration +-- +-mDataMutex: DataMutex (aka BaseProfilerMutex) +-mIsOnHeap: bool +-mIsRegistryLockedSharedOnThisThread: bool +-tlsThreadRegistration: MOZ_THREAD_LOCAL(ThreadRegistration*) +-GetTLS() static: tlsThreadRegistration* +-GetFromTLS() static: ThreadRegistration* +-- ++ThreadRegistration(const char* aName, const void* aStackTop) ++~ThreadRegistration() ++RegisterThread(const char* aName, const void* aStackTop) static: ProfilingStack* ++UnregisterThread() static ++IsRegistered() static: bool ++GetOnThreadPtr() static OnThreadPtr ++WithOnThreadRefOr(auto&& aF, auto&& aFallbackReturn) static: auto ++IsDataMutexLockedOnCurrentThread() static: bool + + + + UMLClass + + 880 + 620 + 450 + 290 + + ThreadRegistration::OnThreadRef +-- ++UnlockedConstReaderCRef() const ++WithUnlockedConstReader(auto&& aF) const: auto ++UnlockedConstReaderAndAtomicRWCRef() const ++WithUnlockedConstReaderAndAtomicRW(auto&& aF) const: auto ++UnlockedConstReaderAndAtomicRWRef() ++WithUnlockedConstReaderAndAtomicRW(auto&& aF): auto ++UnlockedRWForLockedProfilerCRef() const ++WithUnlockedRWForLockedProfiler(auto&& aF) const: auto ++UnlockedRWForLockedProfilerRef() ++WithUnlockedRWForLockedProfiler(auto&& aF): auto ++UnlockedReaderAndAtomicRWOnThreadCRef() const ++WithUnlockedReaderAndAtomicRWOnThread(auto&& aF) const: auto ++UnlockedReaderAndAtomicRWOnThreadRef() ++WithUnlockedReaderAndAtomicRWOnThread(auto&& aF): auto ++RWOnThreadWithLock LockedRWOnThread() ++WithLockedRWOnThread(auto&& aF): auto + + + + UMLClass + + 1040 + 440 + 230 + 70 + + ThreadRegistration::OnThreadPtr +-- ++operator*(): OnThreadRef ++operator->(): OnThreadRef + + + + UMLClass + + 450 + 940 + 350 + 240 + + ThreadRegistrationData +-- +-mProfilingStack: ProfilingStack +-mStackTop: const void* const +-mThread: nsCOMPtr<nsIThread> +-mJSContext: JSContext* +-mJsFrameBuffer: JsFrame* +-mJSFlags: uint32_t +-Sleep: Atomic<int> +-mThreadCpuTimeInNsAtLastSleep: Atomic<uint64_t> +-mWakeCount: Atomic<uint64_t, Relaxed> +-mRecordWakeCountMutex: BaseProfilerMutex +-mAlreadyRecordedWakeCount: uint64_t +-mAlreadyRecordedCpuTimeInMs: uin64_t +-mThreadProfilingFeatures: ThreadProfilingFeatures + + + + UMLClass + + 460 + 1220 + 330 + 80 + + ThreadRegistrationUnlockedConstReader +-- ++Info() const: const ThreadRegistrationInfo& ++PlatformDataCRef() const: const PlatformData& ++StackTop() const: const void* + + + + UMLClass + + 440 + 1340 + 370 + 190 + + ThreadRegistrationUnlockedConstReaderAndAtomicRW +-- ++ProfilingStackCRef() const: const ProfilingStack& ++ProfilingStackRef(): ProfilingStack& ++ProfilingFeatures() const: ThreadProfilingFeatures ++SetSleeping() ++SetAwake() ++GetNewCpuTimeInNs(): uint64_t ++RecordWakeCount() const ++ReinitializeOnResume() ++CanDuplicateLastSampleDueToSleep(): bool ++IsSleeping(): bool + + + + UMLClass + + 460 + 1570 + 330 + 60 + + ThreadRegistrationUnlockedRWForLockedProfiler +-- ++GetProfiledThreadData(): const ProfiledThreadData* ++GetProfiliedThreadData(): ProfiledThreadData* + + + + UMLClass + + 430 + 1670 + 390 + 50 + + ThreadRegistrationUnlockedReaderAndAtomicRWOnThread +-- ++GetJSContext(): JSContext* + + + + UMLClass + + 380 + 1840 + 490 + 190 + + ThreadRegistrationLockedRWFromAnyThread +-- ++SetProfilingFeaturesAndData( + ThreadProfilingFeatures, ProfiledThreadData*, const PSAutoLock&) ++ClearProfilingFeaturesAndData(const PSAutoLock&) ++GetJsFrameBuffer() const JsFrame* ++GetEventTarget() const: const nsCOMPtr<nsIEventTarget> ++ResetMainThread() ++GetRunningEventDelay(const TimeStamp&, TimeDuration&, TimeDuration&) ++StartJSSampling(uint32_t) ++StopJSSampling() + + + + UMLClass + + 490 + 2070 + 260 + 80 + + ThreadRegistrationLockedRWOnThread +-- ++SetJSContext(JSContext*) ++ClearJSContext() ++PollJSSampling() + + + + Relation + + 610 + 1170 + 30 + 70 + + lt=<<- + 10.0;10.0;10.0;50.0 + + + UMLClass + + 500 + 2190 + 240 + 60 + + ThreadRegistration::EmbeddedData +-- + + + + Relation + + 610 + 1290 + 30 + 70 + + lt=<<- + 10.0;10.0;10.0;50.0 + + + Relation + + 610 + 1520 + 30 + 70 + + lt=<<- + 10.0;10.0;10.0;50.0 + + + Relation + + 610 + 1620 + 30 + 70 + + lt=<<- + 10.0;10.0;10.0;50.0 + + + Relation + + 650 + 1710 + 30 + 150 + + lt=<<- + 10.0;10.0;10.0;130.0 + + + Relation + + 610 + 2020 + 30 + 70 + + lt=<<- + 10.0;10.0;10.0;50.0 + + + Relation + + 610 + 2140 + 30 + 70 + + lt=<<- + 10.0;10.0;10.0;50.0 + + + Relation + + 340 + 880 + 180 + 1370 + + lt=->>>>> +mData + 160.0;1350.0;10.0;1350.0;10.0;10.0 + + + UMLClass + + 990 + 930 + 210 + 100 + + ThreadRegistrationInfo +-- ++Name(): const char* ++RegisterTime(): const TimeStamp& ++ThreadId(): ProfilerThreadId ++IsMainThread(): bool + + + + Relation + + 790 + 980 + 220 + 40 + + lt=->>>>> +mInfo + 200.0;20.0;10.0;20.0 + + + UMLClass + + 990 + 1040 + 210 + 50 + + PlatformData +-- + + + + + Relation + + 790 + 1040 + 220 + 40 + + lt=->>>>> +mPlatformData + 200.0;20.0;10.0;20.0 + + + UMLClass + + 990 + 1100 + 210 + 60 + + ProfiledThreadData +-- + + + + Relation + + 790 + 1100 + 220 + 40 + + lt=->>>> +mProfiledThreadData: * + 200.0;20.0;10.0;20.0 + + + Relation + + 710 + 480 + 350 + 170 + + lt=->>>> +m1=0..1 +mThreadRegistration: * + 10.0;150.0;330.0;10.0 + + + Relation + + 830 + 580 + 260 + 130 + + lt=->>>> +m1=1 +mThreadRegistration: * + 10.0;110.0;40.0;20.0;220.0;20.0;240.0;40.0 + + + Relation + + 1140 + 500 + 90 + 140 + + lt=<. +<creates> + 10.0;120.0;10.0;10.0 + + + Relation + + 780 + 900 + 450 + 380 + + lt=<. +<accesses> + 10.0;360.0;430.0;360.0;430.0;10.0 + + + Relation + + 800 + 900 + 510 + 560 + + lt=<. +<accesses> + 10.0;540.0;420.0;540.0;420.0;10.0 + + + Relation + + 780 + 900 + 540 + 720 + + lt=<. +<accesses> + 10.0;700.0;450.0;700.0;450.0;10.0 + + + Relation + + 810 + 900 + 520 + 820 + + lt=<. +<accesses> + 10.0;800.0;430.0;800.0;430.0;10.0 + + + UMLClass + + 900 + 2070 + 410 + 80 + + ThreadRegistration::OnThreadRef::ConstRWOnThreadWithLock +-- +-mDataLock: BaseProfilerAutoLock +-- ++DataCRef() const: ThreadRegistrationLockedRWOnThread& ++operator->() const: ThreadRegistrationLockedRWOnThread& + + + + Relation + + 740 + 2100 + 180 + 40 + + lt=->>>> +mLockedRWOnThread + 10.0;20.0;160.0;20.0 + + + Relation + + 1250 + 900 + 90 + 1190 + + lt=<. +<creates> + 10.0;1170.0;10.0;10.0 + + + Relation + + 660 + 440 + 400 + 210 + + lt=<. +<creates> + 380.0;10.0;10.0;190.0 + + + Relation + + 740 + 880 + 160 + 50 + + lt=<. +<creates> + 140.0;30.0;50.0;30.0;10.0;10.0 + + + Relation + + 460 + 230 + 150 + 120 + + lt=->>>> +m1=0..N +sRegistryContainer: +static Vector<> + 10.0;100.0;10.0;10.0 + + + UMLClass + + 800 + 250 + 470 + 150 + + ThreadRegistry::LockedRegistry +-- +-mRegistryLock: RegistryLockShared (aka BaseProfilerAutoLockShared) +-- ++LockedRegistry() ++~LockedRegistry() ++begin() const: const OffThreadRef* ++end() const: const OffThreadRef* ++begin(): OffThreadRef* ++end(): OffThreadRef* + + + + Relation + + 560 + 350 + 260 + 50 + + lt=<. +<accesses with +shared lock> + 10.0;20.0;240.0;20.0 + + + Relation + + 550 + 390 + 330 + 260 + + lt=<. +<updates +mIsRegistryLockedSharedOnThisThread> + 10.0;240.0;310.0;10.0 + + + Relation + + 330 + 570 + 170 + 80 + + lt=->>>> +m1=1 +mThreadRegistration: * + 120.0;60.0;40.0;10.0 + + + Relation + + 280 + 570 + 200 + 710 + + lt=<. +<accesses> + 180.0;690.0;10.0;690.0;10.0;10.0 + + + Relation + + 270 + 570 + 190 + 890 + + lt=<. +<accesses> + 170.0;870.0;10.0;870.0;10.0;10.0 + + + UMLClass + + 200 + 1740 + 440 + 80 + + ThreadRegistry::OffThreadRef::{,Const}RWFromAnyThreadWithLock +-- +-mDataLock: BaseProfilerAutoLock +-- ++DataCRef() {,const}: ThreadRegistrationLockedRWOnThread& ++operator->() {,const}: ThreadRegistrationLockedRWOnThread& + + + + Relation + + 250 + 570 + 90 + 1190 + + lt=<. +<creates> + 10.0;1170.0;10.0;10.0 + + + Relation + + 180 + 1810 + 220 + 120 + + lt=->>>> +mLockedRWFromAnyThread + 200.0;100.0;80.0;100.0;80.0;10.0 + + -- cgit v1.2.3