From 26a029d407be480d791972afb5975cf62c9360a6 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Fri, 19 Apr 2024 02:47:55 +0200 Subject: Adding upstream version 124.0.1. Signed-off-by: Daniel Baumann --- testing/talos/perfdocs/index.rst | 728 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 728 insertions(+) create mode 100644 testing/talos/perfdocs/index.rst (limited to 'testing/talos/perfdocs/index.rst') diff --git a/testing/talos/perfdocs/index.rst b/testing/talos/perfdocs/index.rst new file mode 100644 index 0000000000..e5e4ebe2a5 --- /dev/null +++ b/testing/talos/perfdocs/index.rst @@ -0,0 +1,728 @@ +===== +Talos +===== + +Talos is a cross-platform Python performance testing framework that is specifically for +Firefox on desktop. New performance tests should be added to the newer framework +`mozperftest `_ unless there are limitations +there (highly unlikely) that make it absolutely necessary to add them to Talos. Talos is +named after the `bronze automaton from Greek myth `_. + +.. contents:: + :depth: 1 + :local: + +Talos tests are run in a similar manner to xpcshell and mochitests. They are started via +the command :code:`mach talos-test`. A `python script `_ +then launches Firefox, which runs the tests via JavaScript special powers. The test timing +information is recorded in a text log file, e.g. :code:`browser_output.txt`, and then processed +into the `JSON format supported by Perfherder `_. + +Talos bugs can be filed in `Testing::Talos `_. + +Talos infrastructure is still mostly documented `on the Mozilla Wiki `_. +In addition, there are plans to surface all of the individual tests using PerfDocs. +This work is tracked in `Bug 1674220 `_. + +Examples of current Talos runs can be `found in Treeherder by searching for "Talos" `_. +If none are immediately available, then scroll to the bottom of the page and load more test +runs. 
The tests all share a group symbol starting with a :code:`T`, for example +:code:`T(c d damp g1)` or :code:`T-gli(webgl)`. + +Running Talos Locally +********************* + +Running tests locally is most likely only useful for debugging what is going on in a test, +as the test output is only reported as raw JSON. The CLI is documented via: + +.. code-block:: bash + + ./mach talos-test --help + +To quickly try out the :code:`./mach talos-test` command, the following can be run to do a +single run of the DevTools' simple netmonitor test. + +.. code-block:: bash + + # Run the "simple.netmonitor" test very quickly with 1 cycle, and 1 page cycle. + ./mach talos-test --activeTests damp --subtests simple.netmonitor --cycles 1 --tppagecycles 1 + + +The :code:`--print-suites` and :code:`--print-tests` are two helpful command flags to +figure out what suites and tests are available to run. + +.. code-block:: bash + + # Print out the suites: + ./mach talos-test --print-suites + + # Available suites: + # bcv (basic_compositor_video) + # chromez (about_preferences_basic:tresize) + # dromaeojs (dromaeo_css:kraken) + # ... + + # Run all of the tests in the "bcv" test suite: + ./mach talos-test --suite bcv + + # Print out the tests: + ./mach talos-test --print-tests + + # Available tests: + # ================ + # + # a11yr + # ----- + # This test ensures basic a11y tables and permutations do not cause + # performance regressions. + # + # ... + + # Run the tests in "a11yr" listed above + ./mach talos-test --activeTests a11yr + +Running Talos on Try +******************** + +Talos runs can be generated through the mach try fuzzy finder: + +.. code-block:: bash + + ./mach try fuzzy + +The following is an example output at the time of this writing. Refine the query for the +platform and test suites of your choosing. + +.. 
code-block:: + + | test-windows10-64-qr/opt-talos-bcv-swr-e10s + | test-linux64-shippable/opt-talos-webgl-e10s + | test-linux64-shippable/opt-talos-other-e10s + | test-linux64-shippable-qr/opt-talos-g5-e10s + | test-linux64-shippable-qr/opt-talos-g4-e10s + | test-linux64-shippable-qr/opt-talos-g3-e10s + | test-linux64-shippable-qr/opt-talos-g1-e10s + | test-windows10-64/opt-talos-webgl-gli-e10s + | test-linux64-shippable/opt-talos-tp5o-e10s + | test-linux64-shippable/opt-talos-svgr-e10s + | test-linux64-shippable/opt-talos-damp-e10s + > test-windows7-32/opt-talos-webgl-gli-e10s + | test-linux64-shippable/opt-talos-bcv-e10s + | test-linux64-shippable/opt-talos-g5-e10s + | test-linux64-shippable/opt-talos-g4-e10s + | test-linux64-shippable/opt-talos-g3-e10s + | test-linux64-shippable/opt-talos-g1-e10s + | test-linux64-qr/opt-talos-bcv-swr-e10s + + For more shortcuts, see mach help try fuzzy and man fzf + select: , accept: , cancel: , select-all: , cursor-up: , cursor-down: + 1379/2967 + > talos + +At a glance +*********** + +- Tests are defined in + `testing/talos/talos/test.py `__ +- Treeherder abbreviations are defined in + `taskcluster/ci/test/talos.yml `__ +- Suites are defined for production in + `testing/talos/talos.json `__ + +Test lifecycle +************** + +- Taskcluster schedules `talos + jobs `__ +- Taskcluster runs a Talos job on a hardware machine when one is + available - this is bootstrapped by + `mozharness `__ + + - mozharness downloads the build, talos.zip (found in + `talos.json `__), + and creates a virtualenv for running the test. 
+ - mozharness `configures the test and runs + it `__ + - After the test is completed the data is uploaded to + `Perfherder `__ + +- Treeherder displays a green (all OK) status and has a link to + `Perfherder `__ +- 13 pushes later, + `analyze_talos.py `__ + is run, which compares your push to the previous 12 pushes and next 12 + pushes to look for a + `regression `__ + + - If a regression is found, it will be posted on `Perfherder + Alerts `__ + +Test types +********** + +There are two different types of Talos tests: + +- Startup_: Start up the browser and wait for either the load event or the paint event and exit, measuring the time +- `Page load`_: Load a manifest of pages + +In addition we have some variations on existing tests: + +- Heavy_: Run tests with the heavy user profile instead of a blank one +- WebExtension_: Run tests with a WebExtension to see the perf impact extensions have +- `Real-world WebExtensions`_: Run tests with a set of 5 popular real-world WebExtensions installed and enabled. + +Some tests measure different things: + +- Paint_: These measure events from the browser like moz_after_paint, etc. +- ASAP_: These tests go really fast and typically measure how many frames we can render in a time window +- Benchmarks_: These are benchmarks that measure specific items and report a summarized score + +Startup +======= + +`Startup +tests `__ +launch Firefox and measure the time to the onload or paint events. We +run this in a series of cycles (20 by default) to generate a full set of +data. The current startup tests are: + +- `ts_paint <#ts_paint>`_ +- tpaint_ +- `tresize <#tresize>`_ +- `sessionrestore <#sessionrestore>`_ +- `sessionrestore_no_auto_restore <#sessionrestore_no_auto_restore>`_ +- `sessionrestore_many_windows <#sessionrestore_many_windows>`_ + +Page load +========= + +Many of the talos tests use the page loader to load a manifest of pages. 
+These are tests that load a specific page and measure the time it takes +to load the page, scroll the page, draw the page etc. In order to run a +page load test, you need a manifest of pages to run. The manifest is +simply a list of URLs of pages to load, one per line, +e.g.: + +.. code-block:: none + + https://www.mozilla.org + https://www.mozilla.com + +Example: +`svgx.manifest `__ + +Manifests may also specify that a test computes its own data by +prepending a ``%`` to the line: + +.. code-block:: none + + % https://www.mozilla.org + % https://www.mozilla.com + +Example: +`v8.manifest `__ + +The file you created should be referenced in your test config inside of +`test.py `__. +For example, open test.py, and look for the line referring to the test +you want to run: + +.. code-block:: python + + tpmanifest = '${talos}/page_load_test/svgx/svgx.manifest' + tpcycles = 1 # run a single cycle + tppagecycles = 25 # load each page 25 times before moving on to the next page + +Heavy +===== + +All our testing is done with blank profiles, which is not ideal for +finding issues for end users. We recently undertook a task to create a +daily update to a profile so it is modern and relevant. It browses a +variety of web pages, and has history and cache to give us a more +realistic scenario. + +The toolchain is documented on +`GitHub `__ and was added +to Talos in `bug +1407398 `__. + +Currently we have issues with this on Windows (it takes too long to unpack +the files from the profile), so we have turned this off there. Our goal +is to run this on basic pageload and startup tests. + +WebExtension +============= + +WebExtensions are what Firefox has switched to and there are different +code paths and APIs used vs addons. Historically we have not tested with +addons (other than our test addons) and have been missing out on common +slowdowns. 
In 2017 we started running some startup and basic pageload +tests with a WebExtension in the profile (`bug +1398974 `__). We +have updated the extension to be more realistic and will continue to do +that. + +Real-world WebExtensions +======================== + +We've added a variation on our test suite that automatically downloads, +installs and enables 5 popular WebExtensions. This is used to measure +things like the impact of real-world WebExtensions on start-up time. + +Currently, the following extensions are installed: + +- Adblock Plus (3.5.2) +- Cisco Webex Extension (1.4.0) +- Easy Screenshot (3.67) +- NoScript (10.6.3) +- Video DownloadHelper (7.3.6) + +Note that these add-ons and versions are "pinned" by being held in a +compressed file that's hosted in an archive by our test infrastructure +and downloaded at test runtime. To update the add-ons in this set, one +must provide a new ZIP file to someone on the test automation team. See +`this comment in +Bugzilla `__. + +Paint +===== + +Paint tests measure the time to receive both the +`MozAfterPaint `__ +and OnLoad events instead of just the OnLoad event. Most tests now look +for this unless they are ASAP tests or internal benchmarks. + +ASAP +==== + +We have a variety of tests which we now run in ASAP mode where we render +as fast as possible (disabling vsync and letting the rendering iterate +as fast as it can using ``requestAnimationFrame``). In fact, we have +replaced some original tests with 'x' versions that run in this mode. 
+ +ASAP tests are: + +- `basic_compositor_video <#basic_compositor_video>`_ +- `displaylist_mutate <#displaylist_mutate>`_ +- `glterrain <#glterrain>`_ +- `rasterflood_svg <#rasterflood_svg>`_ +- `rasterflood_gradient <#rasterflood_gradient>`_ +- `tsvgx <#tsvgx>`_ +- `tscrollx <#tscrollx>`_ +- `tp5o_scroll <#tp5o_scroll>`_ +- `tabswitch <#tabswitch>`_ +- `tart <#tart>`_ + +Benchmarks +========== + +Many tests have internal benchmarks which we report as accurately as +possible. These are the exceptions to the general rule of calculating +the suite score as a geometric mean of the subtest values (which are +median values of the raw data from the subtests). + +Tests which are imported benchmarks are: + +- `ARES6 <#ares6>`_ +- `dromaeo <#dromaeo>`_ +- `JetStream <#jetstream>`_ +- `kraken <#kraken>`_ +- `motionmark <#motionmark>`_ +- `stylebench <#stylebench>`_ + +Row major vs. column major +========================== + +To get more stable numbers, tests are run multiple times. There are two +ways that we do this: row major and column major. Row major means each +test is run multiple times and then we move to the next test (and run it +multiple times). Column major means that each test is run once one after +the other and then the whole sequence of tests is run again. + +More background information about these approaches can be found in Joel +Maher's `Reducing the Noise in +Talos `__ +blog post. + +Page sets +********* + +We run our tests 100% offline, but serve pages via a webserver. Knowing +this we need to store and make available the offline pages we use for +testing. + +tp5pages +======== + +Some tests make use of a set of 50 "real world" pages, known as the tp5n +set. These pages are not part of the talos repository, but without them +the tests which use them won't run. + +- To add these pages to your local setup, download the latest tp5n zip + from `tooltool `__, and extract + it such that ``tp5n`` ends up as ``testing/talos/talos/tests/tp5n``. 
+ You can also obtain it by running a talos test locally to get the zip + into ``testing/talos/talos/tests/``, e.g. ``./mach talos-test --suite damp`` +- See also `tp5o <#tp5o>`_. + +{documentation} + +Extra Talos Tests +***************** + +.. contents:: + :depth: 1 + :local: + +File IO +------- + +File IO is tested using the tp5 test set in the `xperf`_ +test. + +Possible regression causes +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- **nonmain_startup_fileio opt (with or without e10s) windows7-32** – + `bug + 1274018 `__ + This test seems to consistently report a higher result for + mozilla-central compared to Try even for an identical revision due to + extension signing checks. In other words, if you are comparing Try + and mozilla-central you may see a false-positive regression on + Perfherder. Graphs: + `non-e10s `__ + `e10s `__ + +Xres (X Resource Monitoring) +---------------------------- + +A memory metric tracked during tp5 test runs. This metric is sampled +every 20 seconds. This metric is collected on Linux only. + +`xres man page `__. + +% CPU +----- + +CPU usage tracked during tp5 test runs. This metric is sampled every 20 +seconds. This metric is collected on Windows only. + +Responsiveness +-------------- + +contact: :jimm, :overholt + +Measures the delay for the event loop to process a `tracer +event `__. +For more details, see `bug +631571 `__. + +The score on this benchmark is proportional to the sum of squares of all +event delays that exceed a 20ms threshold. Lower is better. + +We collect 8000+ data points from the browser during the test and apply +`this +formula `__ +to the results: + +.. code-block:: python + + return sum([float(x)*float(x) / 1000000.0 for x in val_list]) + +tpaint +====== + +.. warning:: + + This test no longer exists + +- contact: :davidb +- source: + `tpaint-window.html `__ +- type: Startup_ +- data: we load the tpaint test window 20 times, resulting in 1 set of + 20 data points. 
+- summarization: + + - subtest: `ignore first`_ **5** data points, then take the `median`_ of the remaining 15; `source: + test.py `__ + - suite: identical to subtest + ++-----------------+---------------------------------------------------+ +| Talos test name | Description                                       | ++-----------------+---------------------------------------------------+ +| tpaint          | twinopen but measuring the time after we receive  | +|                 | the `MozAfterPaint and OnLoad event <#paint>`__.  | ++-----------------+---------------------------------------------------+ + +Tests the amount of time it takes to open a new window. This test does +not include startup time. Multiple test windows are opened in +succession; the reported results are the average amount of time required to +create and display a window in the running instance of the browser. +(Measures ctrl-n performance.) + +**Example Data** + +.. code-block:: none + + [209.219, 222.180, 225.299, 225.970, 228.090, 229.450, 230.625, 236.315, 239.804, 242.795, 244.5, 244.770, 250.524, 251.785, 253.074, 255.349, 264.729, 266.014, 269.399, 326.190] + +Possible regression causes +-------------------------- + +- None listed yet. If you fix a regression for this test and have some + tips to share, this is a good place for them. + +xperf +===== + +- contact: fx-perf@mozilla.com +- source: `xperf + instrumentation `__ +- type: `Page load`_ (tp5n) / Startup_ +- measuring: IO counters from Windows (currently, only startup IO is in + scope) +- reporting: Summary of read/write counters for disk, network (lower is + better) + +These tests only run on Windows builds. See `this active-data +query `__ +for an updated set of platforms that xperf can be found on. If the query +is not found, use the following on the query page: + +.. 
code-block:: javascript + + { + "from":"task", + "groupby":["run.name","build.platform"], + "limit":2000, + "where":{"regex":{"run.name":".*xperf.*"}} + } + +Talos will turn orange for 'x' jobs on windows 7 if your changeset +accesses files which are not predefined in the +`allowlist `__ +during startup; specifically, before the +"`sessionstore-windows-restored `__" +Firefox event. If your job turns orange, you will see a list of files in +Treeherder (or in the log file) which have been accessed unexpectedly +(similar to this): + +.. code-block:: none + + TEST-UNEXPECTED-FAIL : xperf: File '{profile}\secmod.db' was accessed and we were not expecting it. DiskReadCount: 6, DiskWriteCount: 0, DiskReadBytes: 16904, DiskWriteBytes: 0 + TEST-UNEXPECTED-FAIL : xperf: File '{profile}\cert8.db' was accessed and we were not expecting it. DiskReadCount: 4, DiskWriteCount: 0, DiskReadBytes: 33288, DiskWriteBytes: 0 + TEST-UNEXPECTED-FAIL : xperf: File 'c:\$logfile' was accessed and we were not expecting it. DiskReadCount: 0, DiskWriteCount: 2, DiskReadBytes: 0, DiskWriteBytes: 32768 + TEST-UNEXPECTED-FAIL : xperf: File '{profile}\secmod.db' was accessed and we were not expecting it. DiskReadCount: 6, DiskWriteCount: 0, DiskReadBytes: 16904, DiskWriteBytes: 0 + TEST-UNEXPECTED-FAIL : xperf: File '{profile}\cert8.db' was accessed and we were not expecting it. DiskReadCount: 4, DiskWriteCount: 0, DiskReadBytes: 33288, DiskWriteBytes: 0 + TEST-UNEXPECTED-FAIL : xperf: File 'c:\$logfile' was accessed and we were not expecting it. DiskReadCount: 0, DiskWriteCount: 2, DiskReadBytes: 0, DiskWriteBytes: 32768 + +In the case that these files are expected to be accessed during startup +by your changeset, then we can add them to the +`allowlist `__. + +Xperf runs tp5 while collecting xperf metrics for disk IO and network +IO. 
The providers we listen for are: + +- `'PROC_THREAD', 'LOADER', 'HARD_FAULTS', 'FILENAME', 'FILE_IO', + 'FILE_IO_INIT' `__ + +The values we collect during stackwalk are: + +- `'FileRead', 'FileWrite', + 'FileFlush' `__ + +Notes: + +- Currently some runs may `return all-zeros and skew the + results `__ +- Additionally, these runs don't have dedicated hardware and have + large variability. At least 30 runs are likely to be needed to get + stable statistics (xref `bug + 1616236 `__) + +Build metrics +************* + +These are not part of the Talos code, but like Talos they are benchmarks +that record data using the graphserver and are analyzed by the same +scripts for regressions. + +Number of constructors (num_ctors) +================================== + +This test runs at build time and measures the number of static +initializers in the compiled code. Reducing this number is helpful for +`startup +optimizations `__. + +- https://hg.mozilla.org/build/tools/file/348853aee492/buildfarm/utils/count_ctors.py + + - These are run for Linux 32+64 opt and pgo builds. + +Platform microbenchmark +*********************** + +IsASCII and IsUTF8 gtest microbenchmarks +======================================== + +- contact: :hsivonen +- source: + `TestStrings.cpp `__ +- type: Microbench_ +- reporting: intervals in ms (lower is better) +- data: each test is run and measured 5 times +- summarization: take the `median`_ of the 5 data points; `source: + MozGTestBench.cpp `__ + +Tests whose names start with PerfIsASCII test the performance of the +XPCOM string IsASCII function with ASCII inputs of different lengths. + +Tests whose names start with PerfIsUTF8 test the performance of the XPCOM +string IsUTF8 function with ASCII inputs of different lengths. + +Possible regression causes +-------------------------- + +- The ``--enable-rust-simd`` option accidentally getting turned off in automation. +- Changes to encoding_rs internals. 
+- LLVM optimizations regressing between updates to the copy of LLVM + included in the Rust compiler. + +Microbench +========== + +- contact: :bholley +- source: + `MozGTestBench.cpp `__ +- type: Custom GTest micro-benchmarking +- data: Time taken for a GTest function to execute +- summarization: Not a Talos test. This suite provides a way to add + low-level platform performance regression tests for things that are + not suited to be tested by Talos. + +PerfStrip Tests +=============== + +- contact: :davidb +- source: + https://dxr.mozilla.org/mozilla-central/source/xpcom/tests/gtest/TestStrings.cpp +- type: Microbench_ +- reporting: execution time in ms (lower is better) for 100k function + calls +- data: each test is run and measured 5 times +- summarization: + +PerfStripWhitespace - call StripWhitespace() on 5 different test cases +20k times (each) + +PerfStripCharsWhitespace - call StripChars("\f\t\r\n") on 5 different +test cases 20k times (each) + +PerfStripCRLF - call StripCRLF() on 5 different test cases 20k times +(each) + +PerfStripCharsCRLF - call StripChars("\r\n") on 5 different test cases +20k times (each) + +Stylo gtest microbenchmarks +=========================== + +- contact: :bholley, :SimonSapin +- source: + `gtest `__ +- type: Microbench_ +- reporting: intervals in ms (lower is better) +- data: each test is run and measured 5 times +- summarization: take the `median`_ of the 5 data points; `source: + MozGTestBench.cpp `__ + +Servo_StyleSheet_FromUTF8Bytes_Bench parses a sample stylesheet 20 times +with Stylo’s CSS parser, which is written in Rust. It starts from an +in-memory UTF-8 string, so that I/O or UTF-16-to-UTF-8 conversion is not +measured. + +Gecko_nsCSSParser_ParseSheet_Bench does the same with Gecko’s previous +CSS parser, which is written in C++, for comparison. 
+ +Servo_DeclarationBlock_SetPropertyById_Bench parses the string "10px" +with Stylo’s CSS parser and sets it as the value of a property in a +declaration block, a million times. This is similar to animations that +are based on JavaScript code modifying Element.style instead of using +CSS @keyframes. + +Servo_DeclarationBlock_SetPropertyById_WithInitialSpace_Bench is the +same, but with the string " 10px" with an initial space. That initial +space is less typical of JS animations, but is almost always there in +stylesheets or full declarations like "width: 10px". This microbenchmark +was used to test the effect of some specific code changes. Regressions +here may be acceptable if Servo_StyleSheet_FromUTF8Bytes_Bench is not +affected. + +History of tp tests +******************* + +tp +== + +The original tp test created by Mozilla to test browser page load time. +Cycled through 40 pages. The pages were copied from the live web during +November, 2000. Pages were cycled by loading them within the main +browser window from a script that lived in content. + +tp2/tp_js +========= + +The same tp test but loading the individual pages into a frame instead +of the main browser window. Still used the old 40 page, year 2000 web +page test set. + +tp3 +=== + +An update to both the page set and the method by which pages are cycled. +The page set is now 393 pages from December, 2006. The pageloader is +re-built as an extension that is pre-loaded into the browser +chrome/components directories. + +tp4 +=== + +Updated web page test set to 100 pages from February 2009. + +tp4m +==== + +This is a smaller pageset (21 pages) designed for mobile Firefox. This +is a blend of regular and mobile friendly pages. + +We landed on this on April 18th, 2011 in `bug +648307 `__. This +runs for Android and Maemo mobile builds only. + +tp5 +=== + +Updated web page test set to 100 pages from April 8th, 2011. 
Effort was +made for the pages to no longer be splash screens/login pages/home pages +but to be pages that better reflect the actual content of the site in +question. There are two test page data sets for tp5 which are used in +multiple tests (e.g. awsy and xperf): (i) an optimized data set +called tp5o, and (ii) the standard data set called tp5n. + +tp6 +=== + +Created in June 2017 from pages recorded via mitmproxy on modern Google, +Amazon, YouTube, and Facebook. Ideally this will contain more realistic +user accounts that have full content. In addition, we would have more +than 4 sites, up to the top 10 or maybe top 20. + +These were migrated to Raptor between 2018 and 2019. + +.. _geometric mean: https://wiki.mozilla.org/TestEngineering/Performance/Talos/Data#geometric_mean +.. _ignore first: https://wiki.mozilla.org/TestEngineering/Performance/Talos/Data#ignore_first +.. _median: https://wiki.mozilla.org/TestEngineering/Performance/Talos/Data#median
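
The summarization steps this document refers to (ignore first, median, geometric mean) and the Responsiveness formula can be illustrated with a minimal Python sketch. The function names ``summarize_subtest``, ``summarize_suite`` and ``responsiveness_score`` are hypothetical and are not taken from the Talos source; only the arithmetic follows the text. The sketch does not apply the 20ms threshold mentioned for Responsiveness, since the text does not show where that filtering happens. The tpaint example data is listed in ascending order in this document, so dropping the first 5 points here only demonstrates the mechanics.

```python
import math
from statistics import median


def summarize_subtest(replicates, ignore_first=5):
    """Drop the first `ignore_first` data points, then take the median of the rest."""
    kept = replicates[ignore_first:]
    if not kept:
        raise ValueError("no data points left after ignoring the first ones")
    return median(kept)


def summarize_suite(subtest_values):
    """Geometric mean of the subtest values (all values must be positive)."""
    return math.exp(sum(math.log(v) for v in subtest_values) / len(subtest_values))


def responsiveness_score(val_list):
    """Sum of squared event delays divided by 1e6, as in the quoted formula."""
    return sum(float(x) * float(x) / 1000000.0 for x in val_list)


# The tpaint example data from this document (20 replicates).
tpaint = [209.219, 222.180, 225.299, 225.970, 228.090, 229.450, 230.625,
          236.315, 239.804, 242.795, 244.5, 244.770, 250.524, 251.785,
          253.074, 255.349, 264.729, 266.014, 269.399, 326.190]

# Median of the 15 points that remain after ignoring the first 5.
subtest_value = summarize_subtest(tpaint)
```

For tpaint the suite value is identical to the subtest value, so ``summarize_suite`` only matters for suites with several subtests, where it is applied to the list of subtest medians.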