diff options
Diffstat (limited to 'testing/docs/tests-for-new-config/manual.rst')
-rw-r--r-- | testing/docs/tests-for-new-config/manual.rst | 224 |
1 files changed, 224 insertions, 0 deletions
diff --git a/testing/docs/tests-for-new-config/manual.rst b/testing/docs/tests-for-new-config/manual.rst new file mode 100644 index 0000000000..cf2485251a --- /dev/null +++ b/testing/docs/tests-for-new-config/manual.rst @@ -0,0 +1,224 @@ +:orphan: + +Turning on Firefox tests for a new configuration (manual) +========================================================= + +You are ready to go with turning on Firefox tests for a new config. Once you +get to this stage, you will have seen a try push with all the tests running +(many not green) to verify some tests pass and there are enough machines +available to run tests. + +For the purpose of this document, assume you are tasked with upgrading Windows +10 OS from 1803 -> 1903. To simplify this we can call this `windows_1903`, and +we need to: + + * push to try + * analyze test failures + * disable tests in manifests + * repeat try push until no failures + * file bugs for test failures + * land changes and turn on tests + * turn on run only failures + +There are many edge cases, and I will outline them inside each step. + + +Push to Try Server +------------------ + +As you have new machines (or cloud instances) available with the updated +OS/config, it is time to push to try. + +In order to run all tests, we would need to execute: + ``./mach try fuzzy --no-artifact -q 'test-windows !-raptor- !-talos- --rebuild 10`` + +There are a few exceptions here: + + * Perf tests don't need to be run (hence the ``!-raptor- !-talos-``) + * Need to make sure we are not building with artifact builds (hence the + ``--no-artifact``) + * There are jobs hidden behind tier-3, some for a good reason (code coverage is + a good example, but fission tests might not be green) + + The last piece to sort out is running on the new config, here are some + considerations for new configs: + + * duplicated jobs (i.e. fission, a11y-checks), you can just run those specific + tasks: ``./mach try fuzzy --no-artifact -q 'test-windows fission --rebuild + 5`` + * new OS/hardware (i.e. aarch64, os upgrade), you need to reference the new + hardware, typically this is with ``--worker-override``: ``./mach try fuzzy + --no-artifact -q 'test-windows --rebuild 10 --worker-override + t-win10-64=gecko-t/t-win10-64-1903`` + + * the risk here is a scenario where hardware is limited, then ``--rebuild + 10`` will create too many tasks and some will expire. + * in low hardware situations, either run a subset of tests (i.e. + web-platform-tests, mochitest), or ``--rebuild 3`` and repeat. + + +Analyze Test Failures +--------------------- + +A try push will take many hours, it is best to push when you start work and +then results will be ready later in your day, or push at the end of your day +and results will be ready when you come back to work the next day. Please +make sure some tasks start before walking away, otherwise a small typo can +delay this process by hours or a full day. + +The best way to look at test failures is to use Push Health to avoid misleading +data. Push Health will bucket failures into possible regressions, known +regression, etc. When looking at 5 data points (from ``--rebuild 10``), this +will filter out intermittent failures. + +There are many reasons you might have invalid or misleading data: + + # Tests fail intermittently, we need a pattern to know if it is consistent or + intermittent. + # We still want to disable high frequency intermittent tests, those are just + annoying. + # You could be pushing off a bad base revision (regression or intermittent that + comes from the base revision). + # The machines you run on could be bad, skewing the data. + # Infrastructure problems could cause jobs to fail at random places, repeated + jobs filter that out. + # Some failures could affect future tests in the same browser session or tasks. + # If a crash occurs, or we timeout- it is possible that we will not run all of + the tests in the task, therefore believing a test was run 5 times, but maybe it + was only run once (and failed), or never run at all. + # Task failures that do not have a test name (leak on shutdown, crash on + shutdown, timeout on shutdown, etc.) + +That is a long list of reasons to not trust the data, luckily most of the time +using ``--rebuild 10`` will give us enough data to give enough confidence we +found all failures and can ignore random/intermittent failures. + +Knowing the reasons for misleading data, here is a way to use `Push Health +<https://treeherder.mozilla.org/push-health/push?revision=abaff26f8e084ac719bea0438dba741ace3cf5d8&repo=try&testGroup=pr>`__. + + * Alternatively, you could use the `API + <https://treeherder.mozilla.org/api/project/try/push/health/?revision=abaff26f8e084ac719bea0438dba741ace3cf5d8>`__ + to get raw data and work towards building a tool + * If you write a tool, you need to parse the resulting JSON file and keep in + mind to build a list of failures and match it with a list of jobnames to find + how many times the job ran and failed/passed. + +The main goal here is to know what <path>/<filenames> are failing, and having a +list of those. Ideally you would record some additional information like +timeout, crash, failure, etc. In the end you might end up with:: + + dom/html/test/test_fullscreen-api.html, scrollbar + gfx/layers/apz/test/mochitest/test_group_hittest.html, scrollbar + image/test/mochitest/test_animSVGImage.html, timeout + browser/base/content/test/general/browser_restore_isAppTab.js, crashed + + + + +Disable Tests in the Manifest Files +----------------------------------- + +The code sheriffs have been using `this documentation +<https://wiki.mozilla.org/Auto-tools/Projects/Stockwell/disable-recommended>`__ +for training and reference when they disable intermittents. + +First you need to add a keyword to be available in the manifest (e.g. ``skip-if += windows_1903``). + +There are many exceptions, the bulk of the work will fall into one of 4 +categories: + + # `manifestparser <mochitest_xpcshell_manifest_keywords>`_: \*.toml (mochitest*, + firefox-ui, marionette, xpcshell) easy to edit by adding a ``skip-if = + windows_1903 # <comment>``, a few exceptions here + # `reftest <reftest_manifest_keywords>`_: \*.list (reftest, crashtest) need to + add a ``fuzzy-if(windows_1903, A, B)``, this is more specific + # web-platform-test: testing/web-platform/meta/\*\*.ini (wpt, wpt-reftest, + etc.) need to edit/add testing/web-platform/meta/<path>/<testname>.ini, and add + expected results + # Other (compiled tests, jsreftest, etc.) edit source code, ask for help. + +Basically we want to take every non-intermittent failure found from push health +and edit the manifest, this typically means: + + * Finding the proper manifest. + * Adding the right text to the manifest. + +To find the proper manifest, it is typically <path>/<harness>.[toml|list]. +There are exceptions and if in doubt use searchfox.org/ to find the manifest +which contains the testname. + +Once you have the manifest, open it in an editor, and search for the exact test +name (there could be similar named tests). + +Rerun Try Push, Repeat as Necessary +----------------------------------- + +It is important to test your changes and for a new platform that will be +sheriffed, to rerun all the tests at scale. + +With your change in a commit, push again to try with ``--rebuild 10`` and come +back the next day. + +As there are so many edge cases, it is quite likely that you will have more +failures, mentally plan on 3 iterations of this, where each iteration has fewer +failures. + +Once you get a full push to show no persistent failures, it is time to land +those changes and turn on the new tests. There is a large risk here that the +longer you take to find all failures, the greater the chance of: + + * Bitrot of your patch + * New tests being added which could fail on your config + * Other edits to tests/tools which could affect your new config + +Since the new config process is designed to find failures fast and get the +changes landed fast, we do not need to ask developers for review, that comes +after the new config is running successfully where we notify the teams of what +tests are failing. + +File Bugs for Test Failures +--------------------------- + +Once the failure jobs are running on mozilla-central, now we have full coverage +and the ability to run tests on try server. There could be >100 tests that are +marked as ``skip-if`` and that would take a lot of time to file bugs. Instead +we will file a bug for each manifest that is edited, typically this reduces the +bugs to about 40% the total tests (average out to 2.5 test failures/manifest). + +When filing the bug, indicate the timeline, how to run the failure, link to the +bug where we created the config, describe briefly the config change (i.e. +upgrade windows 10 from version 1803 to 1903), and finally needinfo the triage +owner indicating this is a heads up and these tests are running reguarly on +mozilla-central for the next 6-7 weeks. + +Land Changes and Turn on Tests +------------------------------ + +After you have a green test run, it is time to land the patches. There could +be changes needed to the taskgraph in order to add the new hardware type and +duplicate tests to run on both the old and the new, or create a new variant and +denote which tests to run on that variant. + +Using our example of ``windows_1903``, this would be a new worker type that +would require these edits: + + * `transforms/tests.py <https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/tests.py#97>`__ (duplicate windows 10 entries) + * `test-platforms.py <https://searchfox.org/mozilla-central/source/taskcluster/ci/test/test-platforms.yml#229>`__ (copy windows10 debug/opt/shippable/asan entries and make win10_1903) + * `test-sets.py <https://searchfox.org/mozilla-central/source/taskcluster/ci/test/test-sets.yml#293>`__ (ideally you need nothing, otherwise copy ``windows-tests`` and edit the test list) + +In general this should allow you to have tests scheduled with no custom flags +in try server and all of these will be scheduled by default on +``mozilla-central``, ``autoland``, and ``release-branches``. + +Turn on Run Only Failures +------------------------- + +Now that we have tests running regularly, the next step is to take all the +disabled tests and run them in the special failures job. + +We have a basic framework created, but for every test harness (i.e. xpcshell, +mochitest-gpu, browser-chrome, devtools, web-platform-tests, crashtest, etc.), +there will need to be a corresponding tier-3 job that is created. + +TODO: point to examples of how to add this after we get our first jobs running. |