Diffstat (limited to 'testing/web-platform/tests/tools/wptrunner/docs/expectation.rst')
-rw-r--r-- | testing/web-platform/tests/tools/wptrunner/docs/expectation.rst | 366
1 file changed, 366 insertions, 0 deletions
diff --git a/testing/web-platform/tests/tools/wptrunner/docs/expectation.rst b/testing/web-platform/tests/tools/wptrunner/docs/expectation.rst
new file mode 100644
index 0000000000..fea676565b
--- /dev/null
+++ b/testing/web-platform/tests/tools/wptrunner/docs/expectation.rst
@@ -0,0 +1,366 @@

Test Metadata
=============

Directory Layout
----------------

Metadata files must be stored under the ``metadata`` directory passed
to the test runner. The directory layout follows that of
web-platform-tests, with each test source path having a corresponding
metadata file. Because the metadata path is based on the source file
path, files that generate multiple URLs, e.g. tests with multiple
variants or multi-global tests generated from an ``any.js`` input
file, share the same metadata file for all their corresponding
tests. The metadata path under the ``metadata`` directory is the same
as the source path under the ``tests`` directory, with an additional
``.ini`` suffix.

For example, a test with URL::

  /spec/section/file.html?query=param

generated from a source file with path::

  <tests root>/spec/section/file.html

would have the metadata file::

  <metadata root>/spec/section/file.html.ini

As an optimisation, files which produce only default results
(i.e. ``PASS`` or ``OK``), and which don't have any other associated
metadata, don't require a corresponding metadata file.

Directory Metadata
~~~~~~~~~~~~~~~~~~

In addition to per-test metadata, default metadata can be applied to
all the tests in a given source location, using a ``__dir__.ini``
metadata file. For example, to apply metadata to all tests under
``<tests root>/spec/``, add the metadata in
``<tests root>/spec/__dir__.ini``.

Metadata Format
---------------

The format of the metadata files is based on the ini format. Files are
divided into sections, each (apart from the root section) having a
heading enclosed in square brackets. Within each section are key-value
pairs. There are several notable differences from standard .ini files,
however:

 * Sections may be hierarchically nested, with significant whitespace
   indicating nesting depth.

 * Only ``:`` is valid as a key/value separator.

A simple example of a metadata file is::

  root_key: root_value

  [section]
    section_key: section_value

    [subsection]
      subsection_key: subsection_value

  [another_section]
    another_key: [list, value]

Conditional Values
~~~~~~~~~~~~~~~~~~

In order to support values that depend on some external data, the
right hand side of a key/value pair can take a set of conditionals
rather than a plain value. These values are placed on a new line
following the key, with significant indentation. Conditional values
are prefixed with ``if`` and terminated with a colon, for example::

  key:
    if cond1: value1
    if cond2: value2
    value3

In this example, the value associated with ``key`` is determined by
first evaluating ``cond1`` against external data. If that is true,
``key`` is assigned the value ``value1``; otherwise ``cond2`` is
evaluated in the same way. If both ``cond1`` and ``cond2`` are false,
the unconditional ``value3`` is used.

Conditions themselves use a Python-like expression syntax. Operands
can be variables (corresponding to data passed in), numbers (integer
or floating point; exponential notation is not supported) or
quote-delimited strings. Equality is tested using ``==`` and
inequality using ``!=``. The operators ``and``, ``or`` and ``not`` are
used in the expected way. Parentheses can also be used for
grouping. For example::

  key:
    if (a == 2 or a == 3) and b == "abc": value1
    if a == 1 or b != "abc": value2
    value3

Here ``a`` and ``b`` are variables whose values will be supplied when
the metadata is used.
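For illustration only, the following Python sketch mirrors this
first-match-wins resolution; the ``resolve`` helper and the plain
``run_info`` dictionary are invented for this example and are not part
of wptrunner::

  # Illustrative only: conditions are written as Python callables here
  # rather than in the metadata expression syntax, but the resolution
  # order is the same as described above.

  def resolve(conditions, default, run_info):
      """Return the value of the first condition that evaluates true,
      falling back to the unconditional default."""
      for condition, value in conditions:
          if condition(run_info):
              return value
      return default

  run_info = {"a": 1, "b": "abc"}
  value = resolve(
      [(lambda r: (r["a"] == 2 or r["a"] == 3) and r["b"] == "abc", "value1"),
       (lambda r: r["a"] == 1 or r["b"] != "abc", "value2")],
      "value3",
      run_info)
  assert value == "value2"

In real metadata the conditions are written in the expression syntax
shown above rather than as Python callables; only the evaluation order
is the same.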
Web-Platform-Tests Metadata
---------------------------

When used for expectation data, metadata files have the following format:

 * A section per test URL provided by the corresponding source file,
   with the section heading being the part of the test URL following
   the last ``/`` in the path (this allows multiple tests in a single
   metadata file with the same path part of the URL, but different
   query parts). This may be omitted if there's no non-default
   metadata for the test.

 * A subsection per subtest, with the heading being the title of the
   subtest. This may be omitted if there's no non-default metadata for
   the subtest.

 * The following known keys:

   :expected:
     The expectation value or values of each (sub)test. In the case
     that this value is a list, the first value represents the typical
     expected test outcome, and subsequent values indicate known
     intermittent outcomes, e.g. ``expected: [PASS, ERROR]`` would
     indicate a test that usually passes but has a known-flaky
     ``ERROR`` outcome.

   :disabled:
     Any value apart from the special value ``@False`` indicates that
     the (sub)test is disabled and should either not be run (for
     tests) or that its results should be ignored (for subtests).

   :restart-after:
     Any value apart from the special value ``@False`` indicates that
     the runner should restart the browser after running this test
     (e.g. to clear out unwanted state).

   :fuzzy:
     Used for reftests. This is interpreted as a list containing
     entries like the ``<meta name=fuzzy>`` content value, which
     consists of an optional reference identifier followed by a colon,
     then a range indicating the maximum permitted pixel difference
     per channel, then a semicolon, then a range indicating the
     maximum permitted total number of differing pixels. The reference
     identifier is either a single relative URL, resolved against the
     base test URL, in which case the fuzziness applies to any
     comparison with that URL, or takes the form
     ``lhs url == rhs url``, in which case the fuzziness only applies
     to comparisons involving that specific pair of URLs. Some
     illustrative examples are given below.

   :implementation-status:
     One of the values ``implementing``, ``not-implementing`` or
     ``default``. This is used in conjunction with the
     ``--skip-implementation-status`` command line argument to
     ``wptrunner`` to ignore certain features where running the test
     is low value.

   :tags:
     A list of labels associated with a given test that can be used
     in conjunction with the ``--tag`` command line argument to
     ``wptrunner`` for test selection.

   In addition there are extra keys which are currently tied to
   specific implementations. For example, Gecko-based browsers support
   the ``min-asserts``, ``max-asserts``, ``prefs``, ``lsan-disabled``,
   ``lsan-allowed``, ``lsan-max-stack-depth``, ``leak-allowed``, and
   ``leak-threshold`` properties.

 * Variables taken from the ``RunInfo`` data which describe the
   configuration of the test run. Common properties include:

   :product: A string giving the name of the browser under test
   :browser_channel: A string giving the release channel of the browser under test
   :debug: A Boolean indicating whether the build is a debug build
   :os: A string indicating the operating system
   :version: A string indicating the particular version of that operating system
   :processor: A string indicating the processor architecture

   This information is typically provided by :py:mod:`mozinfo`, but
   different environments may add additional information, and not all
   the properties above are guaranteed to be present in all
   environments. The definitive list of available properties for a
   specific run may be determined by looking at the ``run_info`` key
   in the ``wptreport.json`` output for the run (see the sketch after
   this list).

 * Top level keys are taken as defaults for the whole file. So, for
   example, a top level key with ``expected: FAIL`` would indicate
   that all tests and subtests in the file are expected to fail,
   unless they have an ``expected`` key of their own.
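The sketch referred to in the list above is simply a matter of reading
the report. Assuming a ``wptreport.json`` file produced with
``--log-wptreport=wptreport.json`` (the filename is only an example), a
minimal standalone Python snippet to list the available properties is::

  import json

  # Load a wptreport log and list the run_info properties recorded for
  # that run; these are the names that can appear in metadata conditions.
  with open("wptreport.json") as f:
      report = json.load(f)

  for prop, value in sorted(report["run_info"].items()):
      print(f"{prop} = {value!r}")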
A simple example metadata file might look like::

  [test.html?variant=basic]
    type: testharness

    [Test something unsupported]
      expected: FAIL

    [Test with intermittent statuses]
      expected: [PASS, TIMEOUT]

  [test.html?variant=broken]
    expected: ERROR

  [test.html?variant=unstable]
    disabled: http://test.bugs.example.org/bugs/12345

A more complex metadata file with conditional properties might be::

  [canvas_test.html]
    expected:
      if os == "mac": FAIL
      if os == "windows" and version == "XP": FAIL
      PASS

Note that ``PASS`` in the above works, but is unnecessary since it's
the default expected result.

A metadata file with fuzzy reftest values might be::

  [reftest.html]
    fuzzy: [10;200, ref1.html:20;200-300, subtest1.html==ref2.html:10-15;20]

In this case the default fuzziness for any comparison would be to
require a maximum difference per channel of less than or equal to 10
and less than or equal to 200 total pixels different. For any
comparison involving ``ref1.html`` on the right hand side, the limits
would instead be a difference per channel of not more than 20 and a
total difference count of not less than 200 and not more than 300. For
the specific comparison ``subtest1.html == ref2.html`` (both resolved
against the test URL) these limits would instead be 10 to 15 and 0 to
20, respectively.
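To make the syntax above concrete, here is a small illustrative Python
sketch that breaks such entries into their parts; ``parse_fuzzy`` and
``parse_range`` are invented names, the sketch simplifies by assuming
reference identifiers contain no colons, and it is not wptrunner's
actual parser::

  def parse_range(text):
      """Return (minimum, maximum); a single number N means 0 to N."""
      if "-" in text:
          low, high = text.split("-")
          return int(low), int(high)
      return 0, int(text)

  def parse_fuzzy(entry):
      """Split an entry into (reference identifier or None,
      per-channel difference range, differing-pixel count range)."""
      if ":" in entry:
          ref, ranges = entry.rsplit(":", 1)
      else:
          ref, ranges = None, entry
      per_channel, total_pixels = ranges.split(";")
      return ref, parse_range(per_channel), parse_range(total_pixels)

  # The entries from the example metadata file above:
  assert parse_fuzzy("10;200") == (None, (0, 10), (0, 200))
  assert parse_fuzzy("ref1.html:20;200-300") == \
      ("ref1.html", (0, 20), (200, 300))
  assert parse_fuzzy("subtest1.html==ref2.html:10-15;20") == \
      ("subtest1.html==ref2.html", (10, 15), (0, 20))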
Generating Expectation Files
----------------------------

wpt provides the ``wpt update-expectations`` command to generate
expectation files from the results of a set of test runs. The basic
syntax for this is::

  ./wpt update-expectations [options] [logfile]...

Each ``logfile`` is a wptreport log file from a previous run. These
can be generated from wptrunner using the ``--log-wptreport`` option,
e.g. ``--log-wptreport=wptreport.json``.

``update-expectations`` takes several options:

--full  Overwrite all the expectation data for any tests that have a
        result in the passed log files, not just data for the same run
        configuration.

--disable-intermittent  When updating test results, disable tests that
                        have inconsistent results across many runs.
                        This option can be followed by a message giving
                        the reason why the test was disabled. If no
                        message is provided, ``unstable`` is the
                        default text.

--update-intermittent  When this option is used, the ``expected`` key
                       stores expected intermittent statuses in
                       addition to the primary expected status. If
                       there is more than one status, it appears as a
                       list. The default behaviour of this option is
                       to retain any existing intermittent statuses in
                       the list unless ``--remove-intermittent`` is
                       specified.

--remove-intermittent  This option is used in conjunction with
                       ``--update-intermittent``. When the
                       ``expected`` statuses are updated, any obsolete
                       intermittent statuses that did not occur in the
                       specified log files are removed from the list.

Property Configuration
~~~~~~~~~~~~~~~~~~~~~~

In cases where the expectation depends on the run configuration, ``wpt
update-expectations`` is able to generate conditional values. Because
the relevant variables depend on the range of configurations that need
to be covered, it's necessary to specify the list of configuration
variables that should be used. This is done using a JSON-format file
that can be specified with the ``--properties-file`` command line
argument to ``wpt update-expectations``. When this isn't supplied, the
defaults from ``<metadata root>/update_properties.json`` are used, if
present.

Properties File Format
++++++++++++++++++++++

The file is JSON formatted with two top-level keys:

:``properties``:
  A list of property names to consider for conditionals,
  e.g. ``["product", "os"]``.

:``dependents``:
  An optional dictionary containing properties that should only be
  used as "tie-breakers" when differentiation based on a top-level
  property alone has failed. This is useful when the dependent
  property is always more specific than the top-level property, but
  less understandable when used directly. For example, the ``version``
  property covering different OS versions is typically unique amongst
  different operating systems, but using it when the ``os`` property
  would do instead is likely to produce metadata that's too specific
  to the current configuration and more difficult to read. But where
  there are multiple versions of the same operating system with
  different results, it can be necessary. So specifying
  ``{"os": ["version"]}`` as a dependent property means that the
  ``version`` property will only be used if the condition already
  contains the ``os`` property and further conditions are required to
  separate the observed results.

So an example ``update_properties.json`` file might look like::

  {
    "properties": ["product", "os"],
    "dependents": {"product": ["browser_channel"], "os": ["version"]}
  }
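As a quick sanity check of such a file, the following illustrative
Python snippet (the ``check_properties_file`` helper is an invented
name, not part of wpt) loads it and verifies the structure described
above::

  import json

  def check_properties_file(path):
      """Check the structure described above: "properties" is a list
      of property names, and every key in the optional "dependents"
      mapping is itself listed in "properties"."""
      with open(path) as f:
          data = json.load(f)

      properties = data["properties"]
      assert isinstance(properties, list) and all(
          isinstance(prop, str) for prop in properties)

      for parent, extras in data.get("dependents", {}).items():
          assert parent in properties, f"{parent} is not a top-level property"
          assert isinstance(extras, list)

      return data

  # Example: check_properties_file("update_properties.json")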
Examples
~~~~~~~~

Update all the expectations from a set of cross-platform test runs::

  wpt update-expectations --full osx.log linux.log windows.log

Add expectation data for some new tests that are expected to be
platform-independent::

  wpt update-expectations tests.log

Why a Custom Format?
--------------------

Given the use of the metadata files in CI systems, it was desirable to
have something with the following properties:

 * Human readable

 * Human editable

 * Machine readable / writable

 * Capable of storing key-value pairs

 * Suitable for storing in a version control system (i.e. text-based)

The need for different results per platform means either having
multiple expectation files, one per platform, or having a way to
express conditional values within a single file. The former would be
rather cumbersome for humans updating the expectation files, so the
latter approach has been adopted, leading to one further requirement:

 * Capable of storing result values that are conditional on the platform.

There are few extant formats that clearly meet these requirements. In
particular, although conditional properties could be expressed in many
existing formats, the representation would likely be cumbersome and
error-prone for hand authoring. Therefore it was decided that a custom
format offered the best tradeoffs given the requirements.