Test Metadata
=============

Directory Layout
----------------

Metadata files must be stored under the ``metadata`` directory passed
to the test runner. The directory layout follows that of
web-platform-tests with each test source path having a corresponding
metadata file. Because the metadata path is based on the source file
path, files that generate multiple URLs e.g. tests with multiple
variants, or multi-global tests generated from an ``any.js`` input
file, share the same metadata file for all their corresponding
tests. The metadata path under the ``metadata`` directory is the same
as the source path under the ``tests`` directory, with an additional
``.ini`` suffix.

For example a test with URL::

  /spec/section/file.html?query=param

generated from a source file with path::

  <tests root>/spec/section/file.html

would have a metadata file ::

  <metadata root>/spec/section/file.html.ini

As an optimisation, files which produce only default results
(i.e. ``PASS`` or ``OK``), and which don't have any other associated
metadata, don't require a corresponding metadata file.
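
The path mapping described above can be sketched in Python. This is
only an illustration: ``metadata_path`` is a hypothetical helper, not
part of wptrunner's API, and the root directories are placeholder
values::

  from pathlib import PurePosixPath


  def metadata_path(tests_root, metadata_root, source_path):
      """Return the metadata file path for a test source file.

      Hypothetical helper: mirrors the source path (relative to the
      tests root) under the metadata root and appends ``.ini``.
      """
      rel = PurePosixPath(source_path).relative_to(tests_root)
      return str(PurePosixPath(metadata_root) / (str(rel) + ".ini"))


  # Placeholder roots, chosen only for this example.
  print(metadata_path("/wpt/tests", "/wpt/meta",
                      "/wpt/tests/spec/section/file.html"))
  # /wpt/meta/spec/section/file.html.ini

Because the mapping uses the source path only, every URL generated
from that file (for example each ``?query`` variant) resolves to the
same metadata file.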

Directory Metadata
~~~~~~~~~~~~~~~~~~

In addition to per-test metadata, default metadata can be applied to
all the tests in a given source location, using a ``__dir__.ini``
metadata file. For example, to apply metadata to all tests under
``<tests root>/spec/``, add the metadata in
``<metadata root>/spec/__dir__.ini``.

Metadata Format
---------------

The format of the metadata files is based on the ini format. Files are
divided into sections, each (apart from the root section) having a
heading enclosed in square brackets. Within each section are key-value
pairs. There are several notable differences from standard .ini files,
however:

 * Sections may be hierarchically nested, with significant whitespace
   indicating nesting depth.

 * Only ``:`` is valid as a key/value separator.

A simple example of a metadata file is::

  root_key: root_value

  [section]
    section_key: section_value

    [subsection]
       subsection_key: subsection_value

  [another_section]
    another_key: [list, value]

Conditional Values
~~~~~~~~~~~~~~~~~~

In order to support values that depend on some external data, the
right hand side of a key/value pair can take a set of conditionals
rather than a plain value. These values are placed on a new line
following the key, with significant indentation. Conditional values
are prefixed with ``if`` and terminated with a colon, for example::

  key:
    if cond1: value1
    if cond2: value2
    value3

In this example, the value associated with ``key`` is determined by
first evaluating ``cond1`` against external data. If that is true,
``key`` is assigned the value ``value1``, otherwise ``cond2`` is
evaluated in the same way. If both ``cond1`` and ``cond2`` are false,
the unconditional ``value3`` is used.

Conditions themselves use a Python-like expression syntax. Operands
can either be variables, corresponding to data passed in, numbers
(integer or floating point; exponential notation is not supported) or
quote-delimited strings. Equality is tested using ``==`` and
inequality by ``!=``. The operators ``and``, ``or`` and ``not`` are
used in the expected way. Parentheses can also be used for
grouping. For example::

  key:
    if (a == 2 or a == 3) and b == "abc": value1
    if a == 1 or b != "abc": value2
    value3

Here ``a`` and ``b`` are variables, the value of which will be
supplied when the metadata is used.
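
The first-match evaluation order can be sketched in Python. This is a
rough illustration under stated assumptions, not wptrunner's parser:
``resolve`` is a hypothetical helper, and it leans on ``eval`` only
because the condition syntax happens to be Python-like::

  def resolve(conditionals, default, run_info):
      """Return the value of the first condition true for run_info.

      ``conditionals`` is a list of (condition, value) pairs in file
      order; ``default`` is the trailing unconditional value.
      """
      for condition, value in conditionals:
          # eval() is a shortcut for illustration only; the real
          # implementation has its own tokenizer and parser.
          if eval(condition, {"__builtins__": {}}, dict(run_info)):
              return value
      return default


  conditionals = [
      ('(a == 2 or a == 3) and b == "abc"', "value1"),
      ('a == 1 or b != "abc"', "value2"),
  ]
  print(resolve(conditionals, "value3", {"a": 3, "b": "abc"}))  # value1
  print(resolve(conditionals, "value3", {"a": 1, "b": "abc"}))  # value2
  print(resolve(conditionals, "value3", {"a": 4, "b": "abc"}))  # value3

Order matters: a configuration that satisfies several conditions takes
the value of the first one listed.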

Web-Platform-Tests Metadata
---------------------------

When used for expectation data, metadata files have the following format:

 * A section per test URL provided by the corresponding source file,
   with the section heading being the part of the test URL following
   the last ``/`` in the path (this allows multiple tests in a single
   metadata file with the same path part of the URL, but different
   query parts). This may be omitted if there's no non-default
   metadata for the test.

 * A subsection per subtest, with the heading being the title of the
   subtest. This may be omitted if there's no non-default metadata for
   the subtest.

 * The following known keys:

   :expected:
      The expectation value or values of each (sub)test. In
      the case this value is a list, the first value represents the
      typical expected test outcome, and subsequent values indicate
      known intermittent outcomes e.g. ``expected: [PASS, ERROR]``
      would indicate a test that usually passes but has a known-flaky
      ``ERROR`` outcome.

   :disabled:
     Any value apart from the special value ``@False``
     indicates that the (sub)test is disabled and should either not be
     run (for tests) or that its results should be ignored (subtests).

   :restart-after:
     Any value apart from the special value ``@False``
     indicates that the runner should restart the browser after running
     this test (e.g. to clear out unwanted state).

   :fuzzy:
     Used for reftests. This is interpreted as a list of entries, each
     in the same format as the content value of ``<meta name=fuzzy>``:
     an optional reference identifier followed by a colon, then a range
     indicating the maximum permitted pixel difference per channel, then
     a semicolon, then a range indicating the maximum permitted total
     number of differing pixels. The reference identifier is either a
     single relative URL, resolved against the base test URL, in which
     case the fuzziness applies to any comparison with that URL, or
     takes the form lhs URL, comparison, rhs URL, in which case the
     fuzziness only applies to comparisons involving that specific
     pair of URLs. Some illustrative examples are given below.

   :implementation-status:
     One of the values ``implementing``,
     ``not-implementing`` or ``backlog``. This is used in conjunction
     with the ``--skip-implementation-status`` command line argument to
     ``wptrunner`` to ignore certain features where running the test is
     low value.

   :tags:
     A list of labels associated with a given test that can be
     used in conjunction with the ``--tag`` command line argument to
     ``wptrunner`` for test selection.

   In addition there are extra arguments which are currently tied to
   specific implementations. For example Gecko-based browsers support
   ``min-asserts``, ``max-asserts``, ``prefs``, ``lsan-disabled``,
   ``lsan-allowed``, ``lsan-max-stack-depth``, ``leak-allowed``, and
   ``leak-threshold`` properties.

 * Variables taken from the ``RunInfo`` data which describe the
   configuration of the test run. Common properties include:

   :product: A string giving the name of the browser under test
   :browser_channel: A string giving the release channel of the browser under test
   :debug: A Boolean indicating whether the build is a debug build
   :os: A string indicating the operating system
   :version: A string indicating the particular version of that operating system
   :processor: A string indicating the processor architecture.

   This information is typically provided by :py:mod:`mozinfo`, but
   different environments may add additional information, and not all
   the properties above are guaranteed to be present in all
   environments. The definitive list of available properties for a
   specific run may be determined by looking at the ``run_info`` key
   in the ``wptreport.json`` output for the run.

 * Top level keys are taken as defaults for the whole file. So, for
   example, a top level key with ``expected: FAIL`` would indicate
   that all tests and subtests in the file are expected to fail,
   unless they have an ``expected`` key of their own.
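
To see which ``RunInfo`` properties are available for use in
conditions, the ``run_info`` key of a ``wptreport.json`` file can be
read directly. The following is a minimal sketch: ``load_run_info`` is
a hypothetical helper, and the stand-in report written by the demo
contains made-up values rather than real run output::

  import json
  import tempfile
  from pathlib import Path


  def load_run_info(report_path):
      """Return the ``run_info`` dictionary from a wptreport file."""
      with open(report_path, encoding="utf-8") as f:
          return json.load(f)["run_info"]


  # Demo using a stand-in report; these values are invented.
  with tempfile.TemporaryDirectory() as tmp:
      report = Path(tmp) / "wptreport.json"
      sample = {"run_info": {"product": "firefox", "os": "linux",
                             "debug": False}}
      report.write_text(json.dumps(sample), encoding="utf-8")
      run_info = load_run_info(report)

  for name in sorted(run_info):
      print(name, "=", repr(run_info[name]))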

A simple example metadata file might look like::

  [test.html?variant=basic]
    type: testharness

    [Test something unsupported]
       expected: FAIL

    [Test with intermittent statuses]
       expected: [PASS, TIMEOUT]

  [test.html?variant=broken]
    expected: ERROR

  [test.html?variant=unstable]
    disabled: http://test.bugs.example.org/bugs/12345

A more complex metadata file with conditional properties might be::

  [canvas_test.html]
    expected:
      if os == "mac": FAIL
      if os == "windows" and version == "XP": FAIL
      PASS

Note that ``PASS`` in the above works, but is unnecessary since it's
the default expected result.

A metadata file with fuzzy reftest values might be::

  [reftest.html]
    fuzzy: [10;200, ref1.html:20;200-300, subtest1.html==ref2.html:10-15;20]

In this case the default fuzziness for any comparison allows a
difference per channel of at most 10 and at most 200 differing pixels
in total. For any comparison involving ``ref1.html`` on the right hand
side, the limits are instead a difference per channel of at most 20
and a total count of differing pixels between 200 and 300. For the
specific comparison ``subtest1.html == ref2.html`` (both resolved against
the test URL) the limits are instead 10 to 15 and 0 to 20,
respectively.
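
The way a single fuzzy entry breaks down into a reference and two
ranges can be sketched in Python. ``parse_range`` and ``parse_fuzzy``
are hypothetical helpers written for this illustration; they handle
only the numeric forms shown in the example above and are not
wptrunner's own parser::

  def parse_range(text):
      """Parse "N" as (0, N) and "A-B" as (A, B)."""
      if "-" in text:
          low, high = text.split("-", 1)
          return int(low), int(high)
      return 0, int(text)


  def parse_fuzzy(entry):
      """Split an entry into (reference, channel_range, pixel_range).

      ``reference`` is None when the entry has no reference identifier.
      """
      reference = None
      ranges = entry
      if ":" in entry:
          # The ranges never contain a colon, so split on the last one.
          reference, ranges = entry.rsplit(":", 1)
      channel, pixels = ranges.split(";")
      return reference, parse_range(channel), parse_range(pixels)


  for entry in ["10;200", "ref1.html:20;200-300",
                "subtest1.html==ref2.html:10-15;20"]:
      print(parse_fuzzy(entry))
  # (None, (0, 10), (0, 200))
  # ('ref1.html', (0, 20), (200, 300))
  # ('subtest1.html==ref2.html', (10, 15), (0, 20))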

Generating Expectation Files
----------------------------

wpt provides the ``wpt update-expectations`` command to generate
expectation files from the results of a set of test runs. The basic
syntax for this is::

  ./wpt update-expectations [options] [logfile]...

Each ``logfile`` is a wptreport log file from a previous run. These
can be generated from wptrunner using the ``--log-wptreport`` option
e.g. ``--log-wptreport=wptreport.json``.

``update-expectations`` takes several options:

--full  Overwrite all the expectation data for any tests that have a
        result in the passed log files, not just data for the same run
        configuration.

--disable-intermittent  When updating test results, disable tests that
                        have inconsistent results across many
                        runs. The option can be followed by a message
                        giving the reason why the test is disabled. If
                        no message is provided, ``unstable`` is the
                        default text.

--update-intermittent  When this option is used, the ``expected`` key
                       stores expected intermittent statuses in
                       addition to the primary expected status. If
                       there is more than one status, it appears as a
                       list. The default behaviour of this option is to
                       retain any existing intermittent statuses in the
                       list unless ``--remove-intermittent`` is
                       specified.

--remove-intermittent  This option is used in conjunction with
                       ``--update-intermittent``.  When the
                       ``expected`` statuses are updated, any obsolete
                       intermittent statuses that did not occur in the
                       specified log files are removed from the list.

Property Configuration
~~~~~~~~~~~~~~~~~~~~~~

In cases where the expectation depends on the run configuration ``wpt
update-expectations`` is able to generate conditional values. Because
the relevant variables depend on the range of configurations that need
to be covered, it's necessary to specify the list of configuration
variables that should be used. This is done using a ``json`` format
file that can be specified with the ``--properties-file`` command line
argument to ``wpt update-expectations``. When this isn't supplied the
defaults from ``<metadata root>/update_properties.json`` are used, if
present.

Properties File Format
++++++++++++++++++++++

The file is JSON formatted with two top-level keys:

:``properties``:
  A list of property names to consider for conditionals
  e.g. ``["product", "os"]``.

:``dependents``:
  An optional dictionary containing properties that
  should only be used as "tie-breakers" when differentiating based on a
  specific top-level property has failed. This is useful when the
  dependent property is always more specific than the top-level
  property, but less understandable when used directly. For example the
  ``version`` property covering different OS versions is typically
  unique amongst different operating systems, but using it when the
  ``os`` property would do instead is likely to produce metadata that's
  too specific to the current configuration and more difficult to
  read. But where there are multiple versions of the same operating
  system with different results, it can be necessary. So specifying
  ``{"os": ["version"]}`` as a dependent property means that the
  ``version`` property will only be used if the condition already
  contains the ``os`` property and further conditions are required to
  separate the observed results.

An example ``update_properties.json`` file might look like::

  {
    "properties": ["product", "os"],
    "dependents": {"product": ["browser_channel"], "os": ["version"]}
  }

Examples
~~~~~~~~

Update all the expectations from a set of cross-platform test runs::

  wpt update-expectations --full osx.log linux.log windows.log

Add expectation data for some new tests that are expected to be
platform-independent::

  wpt update-expectations tests.log

Why a Custom Format?
--------------------

Given the use of the metadata files in CI systems, it was desirable to
have something with the following properties:

 * Human readable

 * Human editable

 * Machine readable / writable

 * Capable of storing key-value pairs

 * Suitable for storing in a version control system (i.e. text-based)

The need for different results per platform means either having
multiple expectation files for each platform, or having a way to
express conditional values within a certain file. The former would be
rather cumbersome for humans updating the expectation files, so the
latter approach has been adopted, leading to the requirement:

 * Capable of storing result values that are conditional on the platform.

There are few extant formats that clearly meet these requirements. In
particular although conditional properties could be expressed in many
existing formats, the representation would likely be cumbersome and
error-prone for hand authoring. Therefore it was decided that a custom
format offered the best tradeoffs given the requirements.