| author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-19 00:47:55 +0000 |
|---|---|---|
| committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-19 00:47:55 +0000 |
| commit | 26a029d407be480d791972afb5975cf62c9360a6 (patch) | |
| tree | f435a8308119effd964b339f76abb83a57c29483 /dom/webgpu/tests/cts/checkout/docs | |
| parent | Initial commit. (diff) | |
Adding upstream version 124.0.1. (upstream/124.0.1)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'dom/webgpu/tests/cts/checkout/docs')
| Mode | Path | Lines |
|---|---|---|
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/adding_timing_metadata.md | 163 |
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/build.md | 43 |
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/deno.md | 24 |
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/fp_primer.md | 871 |
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/helper_index.txt | 93 |
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/implementing.md | 97 |
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/intro/README.md | 99 |
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/intro/convert_to_issue.png | bin (0 -> 2061 bytes) |
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/intro/developing.md | 134 |
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/intro/life_of.md | 46 |
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/intro/plans.md | 82 |
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/intro/tests.md | 25 |
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/organization.md | 166 |
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/reviews.md | 70 |
| -rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/terms.md | 270 |
15 files changed, 2183 insertions, 0 deletions
diff --git a/dom/webgpu/tests/cts/checkout/docs/adding_timing_metadata.md b/dom/webgpu/tests/cts/checkout/docs/adding_timing_metadata.md new file mode 100644 index 0000000000..fe32cead20 --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/adding_timing_metadata.md @@ -0,0 +1,163 @@ +# Adding Timing Metadata + +## listing_meta.json files + +`listing_meta.json` files are SEMI AUTO-GENERATED. + +The raw data may be edited manually, to add entries or change timing values. + +The **list** of tests must stay up to date, so it can be used by external +tools. This is verified by presubmit checks. + +The `subcaseMS` values are estimates. They can be set to 0 if for some reason +you can't estimate the time (or there's an existing test with a long name and +slow subcases that would result in query strings that are too long), but this +will produce a non-fatal warning. Avoid creating new warnings whenever +possible. Any existing failures should be fixed (eventually). + +### Performance + +Note this data is typically captured by developers using higher-end +computers, so typical test machines might execute more slowly. For this +reason, the WPT chunking should be configured to generate chunks much shorter +than 5 seconds (a typical default time limit in WPT test executors) so they +should still execute in under 5 seconds on lower-end computers. + +## Problem + +When adding new tests to the CTS you may occasionally see an error like this +when running `npm test` or `npm run standalone`: + +``` +ERROR: Tests missing from listing_meta.json. Please add the new tests (set subcaseMS to 0 if you cannot estimate it): + webgpu:shader,execution,expression,binary,af_matrix_addition:matrix:* + +/home/runner/work/cts/cts/src/common/util/util.ts:38 + throw new Error(msg && (typeof msg === 'string' ? msg : msg())); + ^ +Error: + at assert (/home/runner/work/cts/cts/src/common/util/util.ts:38:11) + at crawl (/home/runner/work/cts/cts/src/common/tools/crawl.ts:155:11) +Warning: non-zero exit code 1 + Use --force to continue. + +Aborted due to warnings. +``` + +What this error message is trying to tell us, is that there is no entry for +`webgpu:shader,execution,expression,binary,af_matrix_addition:matrix:*` in +`src/webgpu/listing_meta.json`. + +These entries are estimates for the amount of time that subcases take to run, +and are used as inputs into the WPT tooling to attempt to portion out tests into +approximately same-sized chunks. + +If a value has been defaulted to 0 by someone, you will see warnings like this: + +``` +... +WARNING: subcaseMS≤0 found in listing_meta.json (allowed, but try to avoid): + webgpu:shader,execution,expression,binary,af_matrix_addition:matrix:* +... +``` + +These messages should be resolved by adding appropriate entries to the JSON +file. + +## Solution 1 (manual, best for simple tests) + +If you're developing new tests and need to update this file, it is sometimes +easiest to do so manually. Run your tests under your usual development workflow +and see how long they take. In the standalone web runner `npm start`, the total +time for a test case is reported on the right-hand side when the case logs are +expanded. + +Record the average time per *subcase* across all cases of the test (you may need +to compute this) into the `listing_meta.json` file. + +## Solution 2 (semi-automated) + +There exists tooling in the CTS repo for generating appropriate estimates for +these values, though they do require some manual intervention. 
The rest of this +doc will be a walkthrough of running these tools. + +Timing data can be captured in bulk and "merged" into this file using +the `merge_listing_times` tool. This is useful when a large number of tests +change or otherwise a lot of tests need to be updated, but it also automates the +manual steps above. + +The tool can also be used without any inputs to reformat `listing_meta.json`. +Please read the help message of `merge_listing_times` for more information. + +### Placeholder Value + +If your development workflow requires a clean build, the first step is to add a +placeholder value for entry to `src/webgpu/listing_meta.json`, since there is a +chicken-and-egg problem for updating these values. + +``` + "webgpu:shader,execution,expression,binary,af_matrix_addition:matrix:*": { "subcaseMS": 0 }, +``` + +(It should have a value of 0, since later tooling updates the value if the newer +value is higher.) + +### Websocket Logger + +The first tool that needs to be run is `websocket-logger`, which receives data +on a WebSocket channel to capture timing data when CTS is run. This +should be run in a separate process/terminal, since it needs to stay running +throughout the following steps. + +In the `tools/websocket-logger/` directory: + +``` +npm ci +npm start +``` + +The output from this command will indicate where the results are being logged, +which will be needed later. For example: + +``` +... +Writing to wslog-2023-09-12T18-57-34.txt +... +``` + +### Running CTS + +Now we need to run the specific cases in CTS that we need to time. +This should be possible under any development workflow (as long as its runtime environment, like Node, supports WebSockets), but the most well-tested way is using the standalone web runner. + +This requires serving the CTS locally. In the project root: + +``` +npm run standalone +npm start +``` + +Once this is started you can then direct a WebGPU enabled browser to the +specific CTS entry and run the tests, for example: + +``` +http://localhost:8080/standalone/?q=webgpu:shader,execution,expression,binary,af_matrix_addition:matrix:* +``` + +If the tests have a high variance in runtime, you can run them multiple times. +The longest recorded time will be used. + +### Merging metadata + +The final step is to merge the new data that has been captured into the JSON +file. + +This can be done using the following command: + +``` +tools/merge_listing_times webgpu -- tools/websocket-logger/wslog-2023-09-12T18-57-34.txt +``` + +where the text file is the result file from websocket-logger. + +Now you just need to commit the pending diff in your repo. diff --git a/dom/webgpu/tests/cts/checkout/docs/build.md b/dom/webgpu/tests/cts/checkout/docs/build.md new file mode 100644 index 0000000000..2d7b2f968c --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/build.md @@ -0,0 +1,43 @@ +# Building + +Building the project is not usually needed for local development. +However, for exports to WPT, or deployment (https://gpuweb.github.io/cts/), +files can be pre-generated. + +The project builds into two directories: + +- `out/`: Built framework and test files, needed to run standalone or command line. +- `out-wpt/`: Build directory for export into WPT. 
Contains: + - An adapter for running WebGPU CTS tests under WPT + - A copy of the needed files from `out/` + - A copy of any `.html` test cases from `src/` + +To build and run all pre-submit checks (including type and lint checks and +unittests), use: + +```sh +npm test +``` + +For checks only: + +```sh +npm run check +``` + +For a quicker iterative build: + +```sh +npm run standalone +``` + +## Run + +To serve the built files (rather than using the dev server), run `npx grunt serve`. + +## Export to WPT + +Run `npm run wpt`. + +Copy (or symlink) the `out-wpt/` directory as the `webgpu/` directory in your +WPT checkout or your browser's "internal" WPT test directory. diff --git a/dom/webgpu/tests/cts/checkout/docs/deno.md b/dom/webgpu/tests/cts/checkout/docs/deno.md new file mode 100644 index 0000000000..22a54c79bd --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/deno.md @@ -0,0 +1,24 @@ +# Running the CTS on Deno + +Since version 1.8, Deno experimentally implements the WebGPU API out of the box. +You can use the `./tools/deno` script to run the CTS in Deno. To do this you +will first need to install Deno: [stable](https://deno.land#installation), or +build the main branch from source +(`cargo install --git https://github.com/denoland/deno --bin deno`). + +On macOS and recent Linux, you can just run `./tools/run_deno` as is. On Windows and +older Linux releases you will need to run +`deno run --unstable --allow-read --allow-write --allow-env ./tools/deno`. + +## Usage + +``` +Usage: + tools/run_deno [OPTIONS...] QUERIES... + tools/run_deno 'unittests:*' 'webgpu:buffers,*' +Options: + --verbose Print result/log of every test as it runs. + --debug Include debug messages in logging. + --print-json Print the complete result JSON in the output. + --expectations Path to expectations file. +``` diff --git a/dom/webgpu/tests/cts/checkout/docs/fp_primer.md b/dom/webgpu/tests/cts/checkout/docs/fp_primer.md new file mode 100644 index 0000000000..a8302fb461 --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/fp_primer.md @@ -0,0 +1,871 @@ +# Floating Point Primer + +This document is meant to be a primer of the concepts related to floating point +numbers that are needed to be understood when working on tests in WebGPU's CTS. + +WebGPU's CTS is responsible for testing if an implementation of WebGPU +satisfies the spec, and thus meets the expectations of programmers based on the +contract defined by the spec. + +Floating point math makes up a significant portion of the WGSL spec, and has +many subtle corner cases to get correct. + +Additionally, floating point math, unlike integer math, is broadly not exact, so +how inaccurate a calculation is allowed to be is required to be stated in the +spec and tested in the CTS, as opposed to testing for a singular correct +response. + +Thus, the WebGPU CTS has a significant amount of machinery around how to +correctly test floating point expectations in a fluent manner. + +## Floating Point Numbers + +For some of the following discussion of floating point numbers 32-bit +floating numbers are assumed, also known as single precision IEEE floating +point numbers or `f32`s. Most of the discussions that apply to this format apply +to other concrete formats that are handled, i.e. 16-bit/f16/half-precision. +There are some significant differences with respect to AbstractFloats, which +will be discussed in its own section. 
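
For a quick, concrete feel of how precision differs between these formats, here is a small TypeScript sketch (illustrative only, not CTS code). A TypeScript `number` is an f64, and `Math.fround` rounds it to the nearest f32:

```typescript
// Illustrative sketch, not CTS code: TypeScript numbers are f64 values, and
// Math.fround quantizes a number to the nearest f32. The same real number has
// a different nearest representative in each format.
const x = 0.1; // not exactly representable in binary floating point
console.log(x);              // 0.1 (the nearest f64, printed with round-trip digits)
console.log(Math.fround(x)); // 0.10000000149011612 (the nearest f32, widened back to f64)
```
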
+ +Details of how these formats work are discussed as needed below, but for a more +involved discussion, please see the references in the Resources sections. + +Additionally, in the Appendix there is a table of interesting/common values that +are often referenced in tests or this document. + +A floating point number system defines +- A finite set of values to stand as representatives for the infinite set of + real numbers, and +- Arithmetic operations on those representatives, trying to approximate the + ideal operations on real numbers. + +The cardinality mismatch alone implies that any floating point number system necessarily loses information. + +This means that not all numbers in the bounds can be exactly represented as a +floating point value. + +For example, the integer `1` is exactly represented as a f32 as `0x3f800000`, +but the next nearest number `0x3f800001` is `1.00000011920928955`. + +So any number between `1` and `1.00000011920928955` is not exactly representable +as a f32 and instead is approximated as either `1` or `1.00000011920928955`. + +When a number X is not exactly representable by a floating point value, there +are normally two neighbouring numbers that could reasonably represent X: the +nearest floating point value above X, and the nearest floating point value below +X. Which of these values gets used is dictated by the rounding mode being used, +which may be something like always round towards 0 or go to the nearest +neighbour, or something else entirely. + +The process of converting numbers between different precisions is called +quantization. WGSL does not prescribe a specific rounding mode when +quantizing, so either of the neighbouring values is considered valid when +converting a non-exactly representable value to a floating point value. This has +significant implications on the CTS that are discussed later. + +From here on, we assume you are familiar with the internal structure of a +floating point number (a sign bit, a biased exponent, and a mantissa). For +reference, see +[binary64 on Wikipedia](https://en.wikipedia.org/wiki/Double-precision_floating-point_format), +[binary32 on Wikipedia](https://en.wikipedia.org/wiki/Single-precision_floating-point_format), +and +[binary16 on Wikipedia](https://en.wikipedia.org/wiki/Half-precision_floating-point_format). + +In the floating points formats described above, there are two possible zero +values, one with all bits being 0, called positive zero, and one all the same +except with the sign bit being 1, called negative zero. + +For WGSL, and thus the CTS's purposes, these values are considered equivalent. +Typescript, which the CTS is written in, treats all zeros as positive zeros, +unless you explicitly escape hatch to differentiate between them, so most of the +time there being two zeros doesn't materially affect code. + +### Normal Numbers + +Normal numbers are floating point numbers whose biased exponent is not all 0s or +all 1s. When working with normal numbers the mantissa starts with an implied +leading 1. For WGSL these numbers behave as you expect for floating point values +with no interesting caveats. + +### Subnormal Numbers + +Subnormal numbers are finite non-zero numbers whose biased exponent is all 0s, +sometimes called denorms. + +These are the closest numbers to zero, both positive and negative, and fill in +the gap between the normal numbers with the smallest magnitude, and 0. 
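
The following TypeScript sketch (illustrative only, not CTS code) pokes at the edges of the f32 subnormal range from the test environment's f64 numbers:

```typescript
// Illustrative sketch, not CTS code: exploring the f32 subnormal range.
const minNormal = 2 ** -126;    // smallest positive normal f32, ~1.1754944e-38
const minSubnormal = 2 ** -149; // smallest positive subnormal f32, ~1.4012985e-45
const maxSubnormal = minNormal - minSubnormal; // largest subnormal, ~1.1754942e-38

console.log(Math.fround(minSubnormal) === minSubnormal); // true: exactly representable as f32
console.log(Math.fround(maxSubnormal) === maxSubnormal); // true: exactly representable as f32
console.log(Math.fround(2 ** -150));                     // 0: rounds past the subnormals to zero
```
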
+ +Some devices, for performance reasons, do not handle operations on the +subnormal numbers, and instead treat them as being zero, this is called *flush +to zero* or FTZ behaviour. + +This means in the CTS that when a subnormal number is consumed or produced by an +operation, an implementation may choose to replace it with zero. + +Like the rounding mode for quantization, this adds significant complexity to the +CTS, which will be discussed later. + +### Inf & NaNs + +Floating point numbers include positive and negative infinity to represent +values that are out of the bounds supported by the current precision. + +Implementations may assume that infinities are not present. When an evaluation +at runtime would produce an infinity, an indeterminate value is produced +instead. + +When a value goes out of bounds for a specific precision there are special +rounding rules that apply. If it is 'near' the edge of finite values for that +precision, it is considered to be near-overflowing, and the implementation may +choose to round it to the edge value or the appropriate infinity. If it is not +near the finite values, which it is considered to be far-overflowing, then it +must be rounded to the appropriate infinity. + +This of course is vague, but the spec does have a precise definition where the +transition from near to far overflow is. + +Let `x` be our value. + +Let `exp_max` be the (unbiased) exponent of the largest finite value for the +floating point type. + +If `|x|` < `2 ** (exp_max + 1)`, but not in +the finite range, than it is considered to be near-overflowing for the +floating point type. + +If the magnitude is equal to or greater than this limit, then it is +far-overflowing for the floating point type. + +This concept of near-overflow vs far-overflow divides the real number line into +5 distinct regions. + +| Region | Rule | +|-----------------------------------------------|---------------------------------| +| -∞ < `x` <= `-(2 ** (exp_max + 1))` | must round to -∞ | +| `-(2 ** (exp_max + 1))` < `x` <= min fp value | must round to -∞ or min value | +| min fp value < `x` < max fp value | round as discussed below | +| max fp value <= `x` < `2 ** (exp_max + 1)` | must round to max value or ∞ | +| `2 ** (exp_max + 1))` < `x` | implementations must round to ∞ | + + +The CTS encodes the least restrictive interpretation of the rules in the spec, +i.e. assuming someone has made a slightly adversarial implementation that always +chooses the thing with the least accuracy. + +This means that the above rules about infinities and overflow combine to say +that any time a non-finite value for the specific floating point type is seen, +any finite value is acceptable afterward. This is because the non-finite value +may be converted to an infinity and then an indeterminate value can be used +instead of the infinity. + +(This comes with the caveat that this is only for runtime execution on a GPU, +the rules for compile time execution will be discussed below.) + +Signaling NaNs are treated as quiet NaNs in the WGSL spec. And quiet NaNs have +the same "may-convert-to-indeterminate-value" behaviour that infinities have, so +for the purpose of the CTS they are handled by the infinite/out of bounds logic +normally. + +## Notation/Terminology + +When discussing floating point values in the CTS, there are a few terms used +with precise meanings, which will be elaborated here. + +Additionally, any specific notation used will be specified here to avoid +confusion. 
+ +### Operations + +The CTS tests for the proper execution of builtins, i.e. `sin`, `sqrt`, `abs`, +etc, and expressions, i.e. `*`, `/`, `<`, etc, when provided with floating +point inputs. These collectively can be referred to as floating point +operations. + +Operations, which can be thought of as mathematical functions, are mappings from +a set of inputs to a set of outputs. + +Denoted `f(x, y) = X`, where `f` is a placeholder or the name of the operation, +lower case variables are the inputs to the function, and uppercase variables are +the outputs of the function. + +Operations have one or more inputs and an output value. + +Values are generally defined as floats, integers, booleans, vectors, and +matrices. Consult the [WGSL Spec](https://www.w3.org/TR/WGSL/) for the exact +list of types and their definitions. + +Most operations inputs and output are the same type of value. There are some +exceptions that accept or emit heterogeneous data types, normally a floating +point type and a integer type or a boolean. + +There are a couple of builtins (`frexp` and `modf`) that return composite +outputs where there are multiple values being returned, there is a single result +value made of structured data. Whereas composite inputs are handle by having +multiple input parameters. + +Some examples of different types of operations: + +`multiplication(x, y) = X`, which represents the WGSL expression `x * y`, takes +in floating point values, `x` and `y`, and produces a floating point value `X`. + +`lessThan(x, y) = X`, which represents the WGSL expression `x < y`, again takes +in floating point values, but in this case returns a boolean value. + +`ldexp(x, y) = X`, which builds a floating point value, takes in a floating +point value `x` and a restricted integer `y`. + +### Domain, Range, and Intervals + +For an operation `f(x) = X`, the interval of valid values for the input, `x`, is +called the *domain*, and the interval for valid results, `X`, is called the +*range*. + +An interval, `[a, b]`, is a set of real numbers that contains `a`, `b`, and all +the real numbers between them. + +Open-ended intervals, i.e. ones that don't include `a` and/or `b`, are avoided, +and are called out explicitly when they occur. + +The convention in this doc and the CTS code is that `a <= b`, so `a` can be +referred to as the beginning of the interval and `b` as the end of the interval. + +When talking about intervals, this doc and the code endeavours to avoid using +the term **range** to refer to the span of values that an interval covers, +instead using the term bounds to avoid confusion of terminology around output of +operations. + +## Accuracy + +As mentioned above floating point numbers are not able to represent all the +possible values over their bounds, but instead represent discrete values in that +interval, and approximate the remainder. + +Additionally, floating point numbers are not evenly distributed over the real +number line, but instead are more densely clustered around zero, with the space +between values increasing in steps as the magnitude increases. + +When discussing operations on floating point numbers, there is often reference +to a true value. This is the value that given no performance constraints and +infinite precision you would get, i.e `acos(1) = π`, where π has infinite +digits of precision. 
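
As a concrete illustration (a TypeScript sketch, not CTS code), the true value π is not exactly representable as an f32, so the nearest f32 sits a measurable distance away from it:

```typescript
// Illustrative sketch, not CTS code: the nearest f32 to the true value π.
const truth = Math.PI;            // best f64 approximation of the true value
const asF32 = Math.fround(truth); // nearest f32, ~3.1415927410125732
console.log(asF32 - truth);       // ~8.74e-8, the quantization error at this magnitude
```
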
+ +For the CTS it is often sufficient to calculate the true value using TypeScript, +since its native number format is higher precision (double-precision/f64), so +all f64, f32, and f16 values can be represented in it. Where this breaks down +will be discussed in the section on compile time vs runtime execution. + +The true value is sometimes representable exactly as a floating point value, but +often is not. + +Additionally, many operations are implemented using approximations from +numerical analysis, where there is a tradeoff between the precision of the +result and the cost. + +Thus, the spec specifies what the accuracy constraints for specific operations +is, how close to truth an implementation is required to be, to be +considered conforming. + +There are 5 different ways that accuracy requirements are defined in the spec: + +1. *Exact* + + This is the situation where it is expected that true value for an operation + is always expected to be exactly representable. This doesn't happen for any + of the operations that return floating point values, but does occur for + logical operations that return boolean values. + + +2. *Correctly Rounded* + + For the case that the true value is exactly representable as a floating + point, this is the equivalent of exactly from above. In the event that the + true value is not exact, then the acceptable answer for most numbers is + either the nearest representable value above or below the true value. + + For values near the subnormal range, e.g. close to zero, this becomes more + complex, since an implementation may FTZ at any point. So if the exact + solution is subnormal or either of the neighbours of the true value are + subnormal, zero becomes a possible result, thus the acceptance interval is + wider than naively expected. + + On the edge of and beyond the bounds of a floating point type the definition + of correctly rounded becomes complex, which is discussed in detail in the + section on overflow. + + +3. *Absolute Error* + + This type of accuracy specifies an error value, ε, and the calculated result + is expected to be within that distance from the true value, i.e. + `[ X - ε, X + ε ]`. + + The main drawback with this manner of specifying accuracy is that it doesn't + scale with the level of precision in floating point numbers themselves at a + specific value. Thus, it tends to be only used for specifying accuracy over + specific limited intervals, i.e. [-π, π]. + + +4. *Units of Least Precision (ULP)* + + The solution to the issue of not scaling with precision of floating point is + to use units of least precision. + + ULP(X) is min (b-a) over all pairs (a,b) of representable floating point + numbers such that (a <= X <= b and a =/= b). For a more formal discussion of + ULP see + [On the definition of ulp(x)](https://hal.inria.fr/inria-00070503/document). + + n * ULP or nULP means `[X - n * ULP @ X, X + n * ULP @ X]`. + + +5. *Inherited* + + When an operation's accuracy is defined in terms of other operations, then + its accuracy is said to be inherited. Handling of inherited accuracies is + one of the main driving factors in the design of testing framework, so will + need to be discussed in detail. + +## Acceptance Intervals + +The first four accuracy types; Exact, Correctly Rounded, Absolute Error, and +ULP, sometimes called simple accuracies, can be defined in isolation from each +other, and by association can be implemented using relatively independent +implementations. 
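
To make that concrete, each simple accuracy can be pictured as a function from a true value to an acceptance interval. The TypeScript below is a hedged sketch written for this doc; the names (`Interval`, `absoluteErrorInterval`, `f32UlpInterval`) are invented for illustration and the ULP calculation is deliberately simplified, so it is not the CTS's actual framework code:

```typescript
// Hedged sketch, not the CTS's actual FP framework code.
interface Interval {
  begin: number;
  end: number;
}

// Absolute error: the result may be anywhere within ±epsilon of the true value X.
function absoluteErrorInterval(X: number, epsilon: number): Interval {
  return { begin: X - epsilon, end: X + epsilon };
}

// n-ULP error for f32: the result may be within n ULPs of X. The ULP here is a
// simplified estimate (spacing of the binade containing X, clamped to the
// subnormal spacing), which is good enough to show the shape of the idea.
function f32UlpInterval(X: number, n: number): Interval {
  const magnitude = Math.abs(Math.fround(X)) || 2 ** -126;
  const ulp = Math.max(2 ** (Math.floor(Math.log2(magnitude)) - 23), 2 ** -149);
  return { begin: X - n * ulp, end: X + n * ulp };
}
```

For example, the `sin(x)` accuracy quoted later in this doc would be expressed as `absoluteErrorInterval(Math.sin(x), 2 ** -11)`.
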
+ +The original implementation of the floating point framework did this as it was +being built out, but ran into difficulties when defining the inherited +accuracies. + +For examples, `tan(x) inherits from sin(x)/cos(x)`, one can take the defined +rules and manually build up a bespoke solution for checking the results, but +this is tedious, error-prone, and doesn't allow for code re-use. + +Instead, it would be better if there was a single conceptual framework that one +can express all the 'simple' accuracy requirements in, and then have a mechanism +for composing them to define inherited accuracies. + +In the WebGPU CTS this is done via the concept of acceptance intervals, which is +derived from a similar concept in the Vulkan CTS, though implemented +significantly differently. + +The core of this idea is that each of different accuracy types can be integrated +into the definition of the operation, so that instead of transforming an input +from the domain to a point in the range, the operation is producing an interval +in the range, that is the acceptable values an implementation may emit. + + +The simple accuracies can be defined as follows: + +1. *Exact* + + `f(x) => [X, X]` + + +2. *Correctly Rounded* + + If `X` is precisely defined as a floating point value + + `f(x) => [X, X]` + + otherwise, + + `[a, b]` where `a` is the largest representable number with `a <= X`, and `b` + is the smallest representable number with `X <= b` + + +3. *Absolute Error* + + `f(x) => [ X - ε, X + ε ]`, where ε is the absolute error value + + +4. **ULP Error** + + `f(x) = X => [X - n*ULP(X), X + n*ULP(X)]` + +As defined, these definitions handle mapping from a point in the domain into an +interval in the range. + +This is insufficient for implementing inherited accuracies, since inheritance +sometimes involve mapping domain intervals to range intervals. + +Here we use the convention for naturally extending a function on real numbers +into a function on intervals of real numbers, i.e. `f([a, b]) = [A, B]`. + +Given that floating point numbers have a finite number of precise values for any +given interval, one could implement just running the accuracy computation for +every point in the interval and then spanning together the resultant intervals. +That would be very inefficient though and make your reviewer sad to read. + +For mapping intervals to intervals the key insight is that we only need to be +concerned with the extrema of the operation in the interval, since the +acceptance interval is the bounds of the possible outputs. + +In more precise terms: +``` + f(x) => X, x = [a, b] and X = [A, B] + + X = [min(f(x)), max(f(x))] + X = [min(f([a, b])), max(f([a, b]))] + X = [f(m), f(n)] +``` +where `m` and `n` are in `[a, b]`, `m <= n`, and produce the min and max results +for `f` on the interval, respectively. + +So how do we find the minima and maxima for our operation in the domain? + +The common general solution for this requires using calculus to calculate the +derivative of `f`, `f'`, and then find the zeroes `f'` to find inflection +points of `f`. + +This solution wouldn't be sufficient for all builtins, i.e. `step` which is not +differentiable at edge values. + +Thankfully we do not need a general solution for the CTS, since all the builtin +operations are defined in the spec, so `f` is from a known set of options. + +These operations can be divided into two broad categories: monotonic, and +non-monotonic, with respect to an interval. 
+ +The monotonic operations are ones that preserve the order of inputs in their +outputs (or reverse it). Their graph only ever decreases or increases, +never changing from one or the other, though it can have flat sections. + +The non-monotonic operations are ones whose graph would have both regions of +increase and decrease. + +The monotonic operations, when mapping an interval to an interval, are simple to +handle, since the extrema are guaranteed to be the ends of the domain, `a` and +`b`. + +So `f([a, b])` = `[f(a), f(b)]` or `[f(b), f(a)]`. We could figure out if `f` is +increasing or decreasing beforehand to determine if it should be `[f(a), f(b)]` +or `[f(b), f(a)]`. + +It is simpler to just use min & max to have an implementation that is agnostic +to the details of `f`. +``` + A = f(a), B = f(b) + X = [min(A, B), max(A, B)] +``` + +The non-monotonic functions that we need to handle for interval-to-interval +mappings are more complex. Thankfully are a small number of the overall +operations that need to be handled, since they are only the operations that are +used in an inherited accuracy and take in the output of another operation as +part of that inherited accuracy. + +So in the CTS we just have bespoke implementations for each of them. + +Part of the operation definition in the CTS is a function that takes in the +domain interval, and returns a sub-interval such that the subject function is +monotonic over that sub-interval, and hence the function's minima and maxima are +at the ends. + +This adjusted domain interval can then be fed through the same machinery as the +monotonic functions. + +### Inherited Accuracy + +So with all of that background out of the way, we can now define an inherited +accuracy in terms of acceptance intervals. + +The crux of this is the insight that the range of one operation can become the +domain of another operation to compose them together. + +And since we have defined how to do this interval to interval mapping above, +transforming things becomes mechanical and thus implementable in reusable code. + +When talking about inherited accuracies `f(x) => g(x)` is used to denote that +`f`'s accuracy is a defined as `g`. + +An example to illustrate inherited accuracies, in f32: + +``` + tan(x) => sin(x)/cos(x) + + sin(x) => [sin(x) - 2 ** -11, sin(x) + 2 ** -11]` + cos(x) => [cos(x) - 2 ** -11, cos(x) + 2-11] + + x/y => [x/y - 2.5 * ULP(x/y), x/y + 2.5 * ULP(x/y)] +``` + +`sin(x)` and `cos(x)` are non-monotonic, so calculating out a closed generic +form over an interval is a pain, since the min and max vary depending on the +value of x. Let's isolate this to a single point, so you don't have to read +literally pages of expanded intervals. + +``` + x = π/2 + + sin(π/2) => [sin(π/2) - 2 ** -11, sin(π/2) + 2 ** -11] + => [0 - 2 ** -11, 0 + 2 ** -11] + => [-0.000488…, 0.000488…] + cos(π/2) => [cos(π/2) - 2 ** -11, cos(π/2) + 2 ** -11] + => [-0.500488…, -0.499511…] + + tan(π/2) => sin(π/2)/cos(π/2) + => [-0.000488…, 0.000488…]/[-0.500488…, -0.499511…] + => [min(-0.000488…/-0.500488…, -0.000488…/-0.499511…, 0.000488…/-0.500488…, 0.000488…/-0.499511…), + max(-0.000488…/-0.500488…, -0.000488…/-0.499511…, 0.000488…/-0.500488…, 0.000488…/-0.499511…)] + => [0.000488…/-0.499511…, 0.000488…/0.499511…] + => [-0.0009775171, 0.0009775171] +``` + +For clarity this has omitted a bunch of complexity around FTZ behaviours, and +that these operations are only defined for specific domains, but the high-level +concepts hold. 
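
The interval division step in that example can be written down directly. The sketch below is illustrative only (the names are invented for this doc); it ignores FTZ, the division operation's own 2.5 ULP allowance, and divisor intervals that straddle zero, just like the worked example above:

```typescript
// Illustrative sketch, not CTS code: dividing one acceptance interval by another
// by taking the extremes of the four endpoint quotients, as in the tan(π/2)
// example above. Assumes the divisor interval does not straddle zero.
interface Interval {
  begin: number;
  end: number;
}

function divideIntervals(x: Interval, y: Interval): Interval {
  const quotients = [
    x.begin / y.begin, x.begin / y.end,
    x.end / y.begin, x.end / y.end,
  ];
  return { begin: Math.min(...quotients), end: Math.max(...quotients) };
}

// The (truncated) sin(π/2) and cos(π/2) intervals from above:
const sinInterval = { begin: -0.000488, end: 0.000488 };
const cosInterval = { begin: -0.500488, end: -0.499511 };
console.log(divideIntervals(sinInterval, cosInterval)); // ≈ { begin: -0.000977…, end: 0.000977… }
```
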
+ +For each of the inherited operations we could implement a manually written out +closed form solution, but that would be quite error-prone and not be +re-using code between builtins. + +Instead, the CTS takes advantage of the fact in addition to testing +implementations of `tan(x)` we are going to be testing implementations of +`sin(x)`, `cos(x)` and `x/y`, so there should be functions to generate +acceptance intervals for those operations. + +The `tan(x)` acceptance interval can be constructed by generating the acceptance +intervals for `sin(x)`, `cos(x)` and `x/y` via function calls and composing the +results. + +This algorithmically looks something like this: + +``` + tan(x): + Calculate sin(x) interval + Calculate cos(x) interval + Calculate sin(x) result divided by cos(x) result + Return division result +``` + +## Compile vs Run Time Evaluation + +The above discussions have been primarily agnostic to when and where a +calculation is occurring, with an implicit bias to runtime execution on a GPU. + +In reality where/when a computation is occurring has a significant impact on the +expected outcome when dealing with edge cases. + +### Terminology + +There are two related axes that will be referred to when it comes to evaluation. +These are compile vs run time, and CPU vs GPU. Broadly speaking compile time +execution happens on the host CPU, and run time evaluation occurs on a dedicated +GPU. + +(Software graphics implementations like WARP and SwiftShader technically break this by +being a software emulation of a GPU that runs on the CPU, but conceptually one can +think of these implementations being a type of GPU in this context, since it has +similar constraints when it comes to precision, etc.) + +Compile time evaluation is execution that occurs when setting up a shader +module, i.e. when compiling WGSL to a platform specific shading language. It is +part of resolving values for things like constants, and occurs once before the +shader is run by the caller. It includes constant evaluation and override +evaluation. All AbstractFloat operations are compile time evaluated. + +Runtime evaluation is execution that occurs every time the shader is run, and +may include dynamic data that is provided between invocations. It is work that +is sent to the GPU for execution in the shader. + +WGSL const-expressions and override-expressions are evaluated before runtime and +both are considered "compile time" in this discussion. WGSL runtime-expressions +are evaluated at runtime. + +### Behavioural Differences + +For a well-defined operation with a finite result, runtime and compile time +evaluation should be indistinguishable. + +For example: +``` +// runtime +@group(0) @binding(0) var a : f32; +@group(0) @binding(1) var b : f32; + +let c: f32 = a + b +``` +and +``` +// compile time +const c: f32 = 1.0f + 2.0f +``` +should produce the same result of `3.0` in the variable `c`, assuming `1.0` and `2.0` +were passed in as `a` and `b`. + +The only difference, is when/where the execution occurs. + +The difference in behaviour between these two occur when the result of the +operation is not finite for the underlying floating point type. + +If instead of `1.0` and `2.0`, we had `10.0` and `f32.max`, so the true result is +`f32.max + 10.0`, the behaviours differ. Specifically the runtime +evaluated version will still run, but the result in `c` will be an indeterminate +value, which is any finite f32 value. For the compile time example instead, +compiling the shader will fail validation. 
+ +This applies to any operation, and isn't restricted to just addition. Anytime a +value goes outside the finite values the shader will hit these results, +indeterminate for runtime execution and validation failure for compile time +execution. + +Unfortunately we are dealing with intervals of results and not precise results. +So this leads to more even conceptual complexity. For runtime evaluation, this +isn't too bad, because the rule becomes: if any part of the interval is +non-finite then an indeterminate value can be a result, and the interval for an +indeterminate result `[fp min, fp max]`, will include any finite portions of the +interval. + +Compile time evaluation becomes significantly more complex, because difference +isn't what interval is returned, but does this shader compile or not, which are +mutually exclusive. This is compounded even further by having to consider +near-overflow vs far-overflow behaviour. Thankfully this can be broken down into +a case by case basis based on where an interval falls. + +Assuming `X`, is the well-defined result of an operation, i.e. not indeterminate +due to the operation isn't defined for the inputs: + +| Region | | Result | +|------------------------------|------------------------------------------------------|--------------------------------| +| `abs(X) <= fp max` | interval falls completely in the finite bounds | validation succeeds | +| `abs(X) >= 2 ** (exp_max+1)` | interval falls completely in the far-overflow bounds | validation fails | +| Otherwise | interval intersects the near-overflow region | validation may succeed or fail | + +The final case is somewhat difficult from a CTS perspective, because now it +isn't sufficient to know that a non-finite result has occurred, but what the +specific result is needs to be tracked. Additionally, the expected result is +somewhat ambiguous, since a shader may or may not compile. This could in theory +still be tested by the CTS, via having switching logic that determines in this +region, if the shader compiles expect these results, otherwise pass the test. +This adds a significant amount of complexity to the testing code for thoroughly +testing a relatively small segment of values. Other environments do not have the +behaviour in this region as rigorously defined nor tested, so fully testing +here would likely find lots of issues that would just need to be mitigated in +the CTS. + +Currently, we choose to avoid testing validation of near-overflow scenarios. + +### Additional Technical Limitations + +The above description of compile and runtime evaluation was somewhat based in +the theoretical world that the intervals being used for testing are infinitely +precise, when in actuality they are implemented by the ECMAScript `number` type, +which is implemented as a f64 value. + +For the vast majority of cases, even out of bounds and overflow, this is +sufficient. There is one small slice where this breaks down. Specifically if +the result just outside the finite range by less than 1 f64 ULP of the edge +value. An example of this is `2 ** -11 + f32.max`. This will be between `f32.max` +and `f32.max + ULPF64(f32.max)`. This becomes a problem, because this value +technically falls into the out-of-bounds region, but depending on how +quantization for f64 is handled in the test runner will be either `f32.max` or +`f32.max + ULPF64(f32.max)`. 
So as a compile time evaluation either we expect an +implementation to always handle this, or it might fail, but we cannot easily +detect it, since this is pushing hard on the limits of precision of the testing +environment. + +(A parallel version of this probably exists on the other side of the +out-of-bounds region, but I don't have a proven example of this) + +The high road fix to this problem is to use an arbitrary precision floating +point implementation. Unfortunately such a library is not on the standards +track for ECMAScript at this time, so we would have to evaluate and pick a +third party dependency to use. Beyond the selection process, this would also +require a significant refactoring of the existing framework code for fixing a +very marginal case. + +(This differs from Float16 support, where the prototyped version of the proposed +API has been pulled in, and the long term plan it use the ECMAScript +implementation's version, once all the major runtimes support it. So it can +be viewed as a polyfill). + +This region currently is not tested as part of the decision to defer testing on +the entire out-of-bounds but not overflowing region. + +In the future if we decide to add testing to the out-of-bounds region, to avoid +perfect being the enemy of good here, it is likely the CTS would still avoid +testing these regions where f64 precision breaks down. If someone is interested +in taking on the effort needed to migrate to an arbitrary precision float +library, or if this turns out to be a significant issue in the future, this +decision can be revisited. + +## Abstract Float + +### Accuracy + +For the concrete floating point types (f32 & f16) the accuracy of operations are +defined in terms of their own type. Specifically for f32, correctly rounded +refers to the nearest f32 values, and ULP is in terms of the distance between +f32 values. + +AbstractFloat internally is defined as a f64, and this applies for exact and +correctly rounded accuracies. Thus, correctly rounded refers to the nearest f64 +values. However, AbstractFloat differs for ULP and absolute errors. Reading +the spec strictly, these all have unbounded accuracies, but it is recommended +that their accuracies be at least as good as the f32 equivalent. + +The difference between f32 and f64 ULP at a specific value X are significant, so +at least as good as f32 requirement is always less strict than if it was +calculated in terms of f64. Similarly, for absolute accuracies the interval +`[x - epsilon, x + epsilon]` is always equal or wider if calculated as f32s +vs f64s. + +If an inherited accuracy is only defined in terms of correctly rounded +accuracies, then the interval is calculated in terms of f64s. If any of the +defining accuracies are ULP or absolute errors, then the result falls into the +unbounded accuracy, but recommended to be at least as good as f32 bucket. + +What this means from a CTS implementation is that for these "at least as good as +f32" error intervals, if the infinitely accurate result is finite for f32, then +the error interval for f64 is just the f32 interval. If the result is not finite +for f32, then the accuracy interval is just the unbounded interval. + +How this is implemented in the CTS is by having the FPTraits for AbstractFloat +forward to the f32 implementation for the operations that are tested to be as +good as f32. + +### Implementation + +AbstractFloats are a compile time construct that exist in WGSL. 
They are +expressible as literal values or the result of operations that return them, but +a variable cannot be typed as an AbstractFloat. Instead, the variable needs be a +concrete type, i.e. f32 or f16, and the AbstractFloat value will be quantized +on assignment. + +Because they cannot be stored nor passed via buffers, it is tricky to test them. +There are two approaches that have been proposed for testing the results of +operations that return AbstractFloats. + +As of the writing of this doc, this second option for testing AbstractFloats +is the one being pursued in the CTS. + +#### const_assert + +The first proposal is to lean on the `const_assert` statement that exists in +WGSL. For each test case a snippet of code would be written out that has a form +something like this + +``` +// foo(x) is the operation under test +const_assert lower < foo(x) // Result was below the acceptance interval +const_assert upper > foo(x) // Result was above the acceptance interval +``` + +where lower and upper would actually be string replaced with literals for the +bounds of the acceptance interval when generating the shader text. + +This approach has a number of limitations that made it unacceptable for the CTS. +First, how errors are reported is a pain to debug. Someone working with the CTS +would either get a report of a failed shader compile, or a failed compile with +the line number, but they will not get the result of `foo(x)`. Just that it is +out of range. Additionally, if you place many of these stanzas in the same +shader to optimize dispatch, you will not get a report that these 3 of 32 cases +failed with these results, you will just get this batch failed. All of these +makes for a very poor experience in attempting to understand what is failing. + +Beyond the lack of ergonomics, this approach also makes things like AF +comparison and const_assert very load bearing for the CTS. It is possible that +a bug could exist in an implementation of const_assert for example that would +cause it to not fail shader compilation, which could lead to silent passing of +tests. Conceptually you can think of this instead of depending on a signal to +indicate something is working, we would be depending on a signal that it isn't +working, and assuming if we don't receive that signal everything is good, not +that our signal mechanism was broken. + +#### Extracting Bits + +The other proposal that was developed depends on the fact that AbstractFloat is +spec'd to be a f64 internally. So the CTS could store the result of an operation +as two 32-bit unsigned integers (or broken up into sign, exponent, and +mantissa). These stored integers could be exported to the testing framework via +a buffer, which could in turn rebuild the f64 values. + +This approach allows the CTS to test values directly in the testing framework, +thus provide the same diagnostics as other tests, as well as reusing the same +running harness. + +The major downsides come from actually implementing extracting the bits. Due to +the restrictions on AbstractFloats the actual code to extract the bits is +tricky. Specifically there is no simple bit cast to something like an +AbstractInt that can be used. Instead, `frexp` needs to be used with additional +operations. This leads to problems, since as is `frexp` is not defined for +subnormal values, so it is impossible to extract a subnormal AbstractFloat, +though 0 could be returned when one is encountered. 
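
On the framework side, reassembling the f64 from the two 32-bit words read back from the buffer is straightforward. The following TypeScript sketch (illustrative only, not the CTS's actual helper) shows one way to do it:

```typescript
// Illustrative sketch, not CTS code: rebuild an f64 from the two 32-bit halves
// of its bit pattern, as read back from a buffer written by the shader.
function f64FromWords(highBits: number, lowBits: number): number {
  const view = new DataView(new ArrayBuffer(8));
  view.setUint32(0, highBits); // sign, exponent, and top of the mantissa
  view.setUint32(4, lowBits);  // remaining mantissa bits
  return view.getFloat64(0);   // DataView defaults to big-endian, matching the order above
}

console.log(f64FromWords(0x400921fb, 0x54442d18)); // 3.141592653589793 (π)
```
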
+ +Test that do try to extract bits to determine the result should either avoid +cases with subnormal results or check for the nearest normal or zero number. + +The inability to store AbstractFloats in non-lossy fashion also has additional +issues, since this means that user defined functions that take in or return +them do not exist in WGSL. Thus, the snippet of code for extracting +AbstractFloats cannot just be inserted as a function at the top of a testing +shader, and then invoked on each test case. Instead, it needs to be inlined +into the shader at each call-site. Actually implementing this in the CTS isn't +difficult, but it does make the shaders significantly longer and more +difficult to read. It also may have an impact on how many test cases can be in +a batch, since runtime for some backends is sensitive to the length of the +shader being run. + +# Appendix + +### Significant f64 Values + +| Name | Decimal (~) | Hex | Sign Bit | Exponent Bits | Significand Bits | +|------------------------|----------------:|----------------------:|---------:|--------------:|-----------------------------------------------------------------:| +| Negative Infinity | -∞ | 0xfff0 0000 0000 0000 | 1 | 111 1111 1111 | 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 | +| Min Negative Normal | -1.79769313E308 | 0xffef ffff ffff ffff | 1 | 111 1111 1110 | 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 | +| Max Negative Normal | -2.2250738E−308 | 0x8010 0000 0000 0000 | 1 | 000 0000 0001 | 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 | +| Min Negative Subnormal | -2.2250738E−308 | 0x800f ffff ffff ffff | 1 | 000 0000 0000 | 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 | +| Max Negative Subnormal | -4.9406564E−324 | 0x8000 0000 0000 0001 | 1 | 000 0000 0000 | 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 | +| Negative Zero | -0 | 0x8000 0000 0000 0000 | 1 | 000 0000 0000 | 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 | +| Positive Zero | 0 | 0x0000 0000 0000 0000 | 0 | 000 0000 0000 | 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 | +| Min Positive Subnormal | 4.9406564E−324 | 0x0000 0000 0000 0001 | 0 | 000 0000 0000 | 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 | +| Max Positive Subnormal | 2.2250738E−308 | 0x000f ffff ffff ffff | 0 | 000 0000 0000 | 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 | +| Min Positive Normal | 2.2250738E−308 | 0x0010 0000 0000 0000 | 0 | 000 0000 0001 | 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 | +| Max Positive Normal | 1.79769313E308 | 0x7fef ffff ffff ffff | 0 | 111 1111 1110 | 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 | +| Negative Infinity | ∞ | 0x7ff0 0000 0000 0000 | 0 | 111 1111 1111 | 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 | + +### Significant f32 Values + +| Name | Decimal (~) | Hex | Sign Bit | Exponent Bits | Significand Bits | +|------------------------|---------------:|------------:|---------:|--------------:|-----------------------------:| +| Negative Infinity | -∞ | 0xff80 0000 | 1 | 1111 1111 | 0000 0000 0000 0000 0000 000 | +| Min Negative Normal | -3.40282346E38 | 0xff7f ffff | 1 | 1111 1110 | 1111 1111 1111 1111 1111 111 | +| Max Negative Normal | -1.1754943E−38 | 0x8080 0000 | 1 | 0000 0001 | 0000 0000 0000 0000 0000 000 | +| Min Negative Subnormal | -1.1754942E-38 | 0x807f ffff | 1 | 0000 0000 | 1111 1111 1111 1111 
1111 111 | +| Max Negative Subnormal | -1.4012984E−45 | 0x8000 0001 | 1 | 0000 0000 | 0000 0000 0000 0000 0000 001 | +| Negative Zero | -0 | 0x8000 0000 | 1 | 0000 0000 | 0000 0000 0000 0000 0000 000 | +| Positive Zero | 0 | 0x0000 0000 | 0 | 0000 0000 | 0000 0000 0000 0000 0000 000 | +| Min Positive Subnormal | 1.4012984E−45 | 0x0000 0001 | 0 | 0000 0000 | 0000 0000 0000 0000 0000 001 | +| Max Positive Subnormal | 1.1754942E-38 | 0x007f ffff | 0 | 0000 0000 | 1111 1111 1111 1111 1111 111 | +| Min Positive Normal | 1.1754943E−38 | 0x0080 0000 | 0 | 0000 0001 | 0000 0000 0000 0000 0000 000 | +| Max Positive Normal | 3.40282346E38 | 0x7f7f ffff | 0 | 1111 1110 | 1111 1111 1111 1111 1111 111 | +| Negative Infinity | ∞ | 0x7f80 0000 | 0 | 1111 1111 | 0000 0000 0000 0000 0000 000 | + +### Significant f16 Values + +| Name | Decimal (~) | Hex | Sign Bit | Exponent Bits | Significand Bits | +|------------------------|--------------:|-------:|---------:|--------------:|-----------------:| +| Negative Infinity | -∞ | 0xfc00 | 1 | 111 11 | 00 0000 0000 | +| Min Negative Normal | -65504 | 0xfbff | 1 | 111 10 | 11 1111 1111 | +| Max Negative Normal | -6.1035156E−5 | 0x8400 | 1 | 000 01 | 00 0000 0000 | +| Min Negative Subnormal | -6.0975552E−5 | 0x83ff | 1 | 000 00 | 11 1111 1111 | +| Max Negative Subnormal | -5.9604645E−8 | 0x8001 | 1 | 000 00 | 00 0000 0001 | +| Negative Zero | -0 | 0x8000 | 1 | 000 00 | 00 0000 0000 | +| Positive Zero | 0 | 0x0000 | 0 | 000 00 | 00 0000 0000 | +| Min Positive Subnormal | 5.9604645E−8 | 0x0001 | 0 | 000 00 | 00 0000 0001 | +| Max Positive Subnormal | 6.0975552E−5 | 0x03ff | 0 | 000 00 | 11 1111 1111 | +| Min Positive Normal | 6.1035156E−5 | 0x0400 | 0 | 000 01 | 00 0000 0000 | +| Max Positive Normal | 65504 | 0x7bff | 0 | 111 10 | 11 1111 1111 | +| Negative Infinity | ∞ | 0x7c00 | 0 | 111 11 | 00 0000 0000 | + +# Resources +- [WebGPU Spec](https://www.w3.org/TR/webgpu/) +- [WGSL Spec](https://www.w3.org/TR/WGSL/) +- [binary64 on Wikipedia](https://en.wikipedia.org/wiki/Double-precision_floating-point_format) +- [binary32 on Wikipedia](https://en.wikipedia.org/wiki/Single-precision_floating-point_format) +- [binary16 on Wikipedia](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) +- [IEEE-754 Floating Point Converter](https://www.h-schmidt.net/FloatConverter/IEEE754.html) +- [IEEE 754 Calculator](http://weitz.de/ieee/) +- [On the definition of ulp(x)](https://hal.inria.fr/inria-00070503/document) +- [Float Exposed](https://float.exposed/) diff --git a/dom/webgpu/tests/cts/checkout/docs/helper_index.txt b/dom/webgpu/tests/cts/checkout/docs/helper_index.txt new file mode 100644 index 0000000000..3cdf868bb4 --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/helper_index.txt @@ -0,0 +1,93 @@ +<!-- + View this file in Typedoc! + + - At https://gpuweb.github.io/cts/docs/tsdoc/ + - Or locally: + - npm run tsdoc + - npm start + - http://localhost:8080/docs/tsdoc/ + + This file is parsed as a tsdoc. +--> + +## Index of Test Helpers + +This index is a quick-reference of helper functions in the test suite. +Use it to determine whether you can reuse a helper, instead of writing new code, +to improve readability and reviewability. + +Whenever a new generally-useful helper is added, it should be indexed here. + +**See linked documentation for full helper listings.** + +- {@link common/framework/params_builder!CaseParamsBuilder} and {@link common/framework/params_builder!SubcaseParamsBuilder}: + Combinatorial generation of test parameters. 
They are iterated by the test framework at runtime. + See `examples.spec.ts` for basic examples of how this behaves. + - {@link common/framework/params_builder!CaseParamsBuilder}: + `ParamsBuilder` for adding "cases" to a test. + - {@link common/framework/params_builder!CaseParamsBuilder#beginSubcases}: + "Finalizes" the `CaseParamsBuilder`, returning a `SubcaseParamsBuilder`. + - {@link common/framework/params_builder!SubcaseParamsBuilder}: + `ParamsBuilder` for adding "subcases" to a test. + +### Fixtures + +(Uncheck the "Inherited" box to hide inherited methods from documentation pages.) + +- {@link common/framework/fixture!Fixture}: Base fixture for all tests. +- {@link webgpu/gpu_test!GPUTest}: Base fixture for WebGPU tests. +- {@link webgpu/api/validation/validation_test!ValidationTest}: Base fixture for WebGPU validation tests. +- {@link webgpu/shader/validation/shader_validation_test!ShaderValidationTest}: Base fixture for WGSL shader validation tests. +- {@link webgpu/idl/idl_test!IDLTest}: + Base fixture for testing the exposed interface is correct (without actually using WebGPU). + +### WebGPU Helpers + +- {@link webgpu/capability_info}: Structured information about texture formats, binding types, etc. +- {@link webgpu/constants}: + Constant values (needed anytime a WebGPU constant is needed outside of a test function). +- {@link webgpu/util/buffer}: Helpers for GPUBuffers. +- {@link webgpu/util/texture}: Helpers for GPUTextures. +- {@link webgpu/util/unions}: Helpers for various union typedefs in the WebGPU spec. +- {@link webgpu/util/math}: Helpers for common math operations. +- {@link webgpu/util/check_contents}: Check the contents of TypedArrays, with nice messages. + Also can be composed with {@link webgpu/gpu_test!GPUTest#expectGPUBufferValuesPassCheck}, used to implement + GPUBuffer checking helpers in GPUTest. +- {@link webgpu/util/conversion}: Numeric encoding/decoding for float/unorm/snorm values, etc. +- {@link webgpu/util/copy_to_texture}: + Helper class for copyToTexture test suites for execution copy and check results. +- {@link webgpu/util/color_space_conversion}: + Helper functions to do color space conversion. The algorithm is the same as defined in + CSS Color Module Level 4. +- {@link webgpu/util/create_elements}: + Helpers for creating web elements like HTMLCanvasElement, OffscreenCanvas, etc. +- {@link webgpu/util/shader}: Helpers for creating fragment shader based on intended output values, plainType, and componentCount. +- {@link webgpu/util/prng}: Seed-able deterministic pseudo random number generator. Replacement for Math.random(). +- {@link webgpu/util/texture/base}: General texture-related helpers. +- {@link webgpu/util/texture/data_generation}: Helper for generating dummy texture data. +- {@link webgpu/util/texture/layout}: Helpers for working with linear image data + (like in copyBufferToTexture, copyTextureToBuffer, writeTexture). +- {@link webgpu/util/texture/subresource}: Helpers for working with texture subresource ranges. +- {@link webgpu/util/texture/texel_data}: Helpers encoding/decoding texel formats. +- {@link webgpu/util/texture/texel_view}: Helper class to create and view texture data through various representations. +- {@link webgpu/util/texture/texture_ok}: Helpers for checking texture contents. +- {@link webgpu/shader/types}: Helpers for WGSL data types. +- {@link webgpu/shader/execution/expression/expression}: Helpers for WGSL expression execution tests. +- {@link webgpu/web_platform/util}: Helpers for web platform features (e.g. 
video elements). + +### General Helpers + +- {@link common/framework/resources}: Provides the path to the `resources/` directory. +- {@link common/util/navigator_gpu}: Finds and returns the `navigator.gpu` object or equivalent. +- {@link common/util/util}: Miscellaneous utilities. + - {@link common/util/util!assert}: Assert a condition, otherwise throw an exception. + - {@link common/util/util!unreachable}: Assert unreachable code. + - {@link common/util/util!assertReject}, {@link common/util/util!resolveOnTimeout}, + {@link common/util/util!rejectOnTimeout}, + {@link common/util/util!raceWithRejectOnTimeout}, and more. +- {@link common/util/collect_garbage}: + Attempt to trigger garbage collection, for testing that garbage collection is not observable. +- {@link common/util/preprocessor}: A simple template-based, non-line-based preprocessor, + implementing if/elif/else/endif. Possibly useful for WGSL shader generation. +- {@link common/util/timeout}: Use this instead of `setTimeout`. +- {@link common/util/types}: Type metaprogramming helpers. diff --git a/dom/webgpu/tests/cts/checkout/docs/implementing.md b/dom/webgpu/tests/cts/checkout/docs/implementing.md new file mode 100644 index 0000000000..ae6848839a --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/implementing.md @@ -0,0 +1,97 @@ +# Test Implementation + +Concepts important to understand when writing tests. See existing tests for examples to copy from. + +## Test fixtures + +Most tests can use one of the several common test fixtures: + +- `Fixture`: Base fixture, provides core functions like `expect()`, `skip()`. +- `GPUTest`: Wraps every test in error scopes. Provides helpers like `expectContents()`. +- `ValidationTest`: Extends `GPUTest`, provides helpers like `expectValidationError()`, `getErrorTextureView()`. +- Or create your own. (Often not necessary - helper functions can be used instead.) + +Test fixtures or helper functions may be defined in `.spec.ts` files, but if used by multiple +test files, should be defined in separate `.ts` files (without `.spec`) alongside the files that +use them. + +### GPUDevices in tests + +`GPUDevice`s are largely stateless (except for `lost`-ness, error scope stack, and `label`). +This allows the CTS to reuse one device across multiple test cases using the `DevicePool`, +which provides `GPUDevice` objects to tests. + +Currently, there is one `GPUDevice` with the default descriptor, and +a cache of several more, for devices with additional capabilities. +Devices in the `DevicePool` are automatically removed when certain things go wrong. + +Later, there may be multiple `GPUDevice`s to allow multiple test cases to run concurrently. + +## Test parameterization + +The CTS provides helpers (`.params()` and friends) for creating large cartesian products of test parameters. +These generate "test cases" further subdivided into "test subcases". +See `basic,*` in `examples.spec.ts` for examples, and the [helper index](./helper_index.txt) +for a list of capabilities. + +Test parameterization should be applied liberally to ensure the maximum coverage +possible within reasonable time. You can skip some with `.filter()`. And remember: computers are +pretty fast - thousands of test cases can be reasonable. + +Use existing lists of parameters values (such as +[`kTextureFormats`](https://github.com/gpuweb/cts/blob/0f38b85/src/suites/cts/capability_info.ts#L61), +to parameterize tests), instead of making your own list. 
Use the info tables (such as +`kTextureFormatInfo`) to define and retrieve information about the parameters. + +## Asynchrony in tests + +Since there are no synchronous operations in WebGPU, almost every test is asynchronous in some +way. For example: + +- Checking the result of a readback. +- Capturing the result of a `popErrorScope()`. + +That said, test functions don't always need to be `async`; see below. + +### Checking asynchronous errors/results + +Validation is inherently asynchronous (`popErrorScope()` returns a promise). However, the error +scope stack itself is synchronous - operations immediately after a `popErrorScope()` are outside +that error scope. + +As a result, tests can assert things like validation errors/successes without having an `async` +test body. + +**Example:** + +```typescript +t.expectValidationError(() => { + device.createThing(); +}); +``` + +does: + +- `pushErrorScope('validation')` +- `popErrorScope()` and "eventually" check whether it returned an error. + +**Example:** + +```typescript +t.expectGPUBufferValuesEqual(srcBuffer, expectedData); +``` + +does: + +- copy `srcBuffer` into a new mappable buffer `dst` +- `dst.mapReadAsync()`, and "eventually" check what data it returned. + +Internally, this is accomplished via an "eventual expectation": `eventualAsyncExpectation()` +takes an async function, calls it immediately, and stores off the resulting `Promise` to +automatically await at the end before determining the pass/fail state. + +### Asynchronous parallelism + +A side effect of test asynchrony is that it's possible for multiple tests to be in flight at +once. We do not currently do this, but it will eventually be an option to run `N` tests in +"parallel", for faster local test runs. diff --git a/dom/webgpu/tests/cts/checkout/docs/intro/README.md b/dom/webgpu/tests/cts/checkout/docs/intro/README.md new file mode 100644 index 0000000000..e5f8bcedc6 --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/intro/README.md @@ -0,0 +1,99 @@ +# Introduction + +These documents contains guidelines for contributors to the WebGPU CTS (Conformance Test Suite) +on how to write effective tests, and on the testing philosophy to adopt. + +The WebGPU CTS is arguably more important than the WebGPU specification itself, because +it is what forces implementation to be interoperable by checking they conform to the specification. +However writing a CTS is hard and requires a lot of effort to reach good coverage. + +More than a collection of tests like regular end2end and unit tests for software artifacts, a CTS +needs to be exhaustive. Contrast for example the WebGL2 CTS with the ANGLE end2end tests: they +cover the same functionality (WebGL 2 / OpenGL ES 3) but are structured very differently: + +- ANGLE's test suite has one or two tests per functionality to check it works correctly, plus + regression tests and special tests to cover implementation details. +- WebGL2's CTS can have thousands of tests per API aspect to cover every combination of + parameters (and global state) used by an operation. + +Below are guidelines based on our collective experience with graphics API CTSes like WebGL's. +They are expected to evolve over time and have exceptions, but should give a general idea of what +to do. + +## Contributing + +Testing tasks are tracked in the [CTS project tracker](https://github.com/orgs/gpuweb/projects/3). +Go here if you're looking for tasks, or if you have a test idea that isn't already covered. 
+
+If contributing conformance tests, the directory you'll work in is [`src/webgpu/`](../src/webgpu/).
+This directory is organized according to the goal of the test (API validation behavior vs
+actual results) and its target (API entry points and spec areas, e.g. texture sampling).
+
+The contents of a test file (`src/webgpu/**/*.spec.ts`) are twofold:
+
+- Documentation ("test plans") on what tests do, how they do it, and what cases they cover.
+  Some test plans are fully or partially unimplemented:
+  they either contain "TODO" in a description or are `.unimplemented()`.
+- Actual tests.
+
+**Please read the following short documents before contributing.**
+
+### 0. [Developing](developing.md)
+
+- Reviewers should also read [Review Requirements](../reviews.md).
+
+### 1. [Life of a Test Change](life_of.md)
+
+### 2. [Adding or Editing Test Plans](plans.md)
+
+### 3. [Implementing Tests](tests.md)
+
+## [Additional Documentation](../)
+
+## Examples
+
+### Operation testing of vertex input id generation
+
+This section provides an example of the planning process for a test.
+It has not been refined into a set of final test plan descriptions.
+(Note: this predates the actual implementation of these tests, so doesn't match the actual tests.)
+
+Somewhere under the `api/operation` node are tests checking that running `GPURenderPipelines` on
+the device using the `GPURenderEncoderBase.draw` family of functions works correctly. Render
+pipelines are composed of several stages that are mostly independent, so they can be split into
+several parts such as `vertex_input`, `rasterization`, `blending`.
+
+Vertex input itself has several parts that are mostly separate in hardware:
+
+- generation of the vertex and instance indices to run for this draw
+- fetching of vertex data from vertex buffers based on these indices
+- conversion from the vertex attribute `GPUVertexFormat` to the datatype for the input variable
+  in the shader
+
+Each of these is tested separately and has cases for each combination of the variables that may
+affect them. This means that `api/operation/render/vertex_input/id_generation` checks that the
+correct operation is performed for the cartesian product of all the following dimensions:
+
+- for encoding in a `GPURenderPassEncoder` or a `GPURenderBundleEncoder`
+- whether the draw is direct or indirect
+- whether the draw is indexed or not
+- for various values of the `firstInstance` argument
+- for various values of the `instanceCount` argument
+- if the draw is not indexed:
+  - for various values of the `firstVertex` argument
+  - for various values of the `vertexCount` argument
+- if the draw is indexed:
+  - for each `GPUIndexFormat`
+  - for various values of the indices in the index buffer including the primitive restart values
+  - for various values of the `offset` argument to `setIndexBuffer`
+  - for various values of the `firstIndex` argument
+  - for various values of the `indexCount` argument
+  - for various values of the `baseVertex` argument
+
+"Various values" above mean several small values, including `0` and the second smallest valid
+value to check for corner cases, as well as some large value.
+
+An instance of the test sets up a `draw*` call based on the parameters, using point rendering and
+a fragment shader that outputs to a storage buffer. After the draw, the test checks the content of
+the storage buffer to make sure that all expected vertex shader invocations, and only those, have
+been generated.
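+
+To make the dimensions listed above concrete, a plan along these lines could be expressed with the
+parameter builder (a hypothetical sketch only: parameter names such as `encoderType`, `indirect`,
+and `indexed` are illustrative and do not match the actual tests):
+
+```ts
+g.test('id_generation')
+  .params(u =>
+    u
+      .combine('encoderType', ['render pass', 'render bundle'])
+      .combine('indirect', [false, true])
+      .combine('indexed', [false, true])
+      .beginSubcases()
+      // "Various values": several small values plus a large one.
+      .combine('firstInstance', [0, 1, 10000])
+      .combine('instanceCount', [1, 2, 10000])
+  )
+  .unimplemented();
+```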
diff --git a/dom/webgpu/tests/cts/checkout/docs/intro/convert_to_issue.png b/dom/webgpu/tests/cts/checkout/docs/intro/convert_to_issue.png Binary files differnew file mode 100644 index 0000000000..672324a9d9 --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/intro/convert_to_issue.png diff --git a/dom/webgpu/tests/cts/checkout/docs/intro/developing.md b/dom/webgpu/tests/cts/checkout/docs/intro/developing.md new file mode 100644 index 0000000000..5b1aeed36d --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/intro/developing.md @@ -0,0 +1,134 @@ +# Developing + +The WebGPU CTS is written in TypeScript. + +## Setup + +After checking out the repository and installing node/npm, run: + +```sh +npm ci +``` + +Before uploading, you can run pre-submit checks (`npm test`) to make sure it will pass CI. +Use `npm run fix` to fix linting issues. + +`npm run` will show available npm scripts. +Some more scripts can be listed using `npx grunt`. + +## Dev Server + +To start the development server, use: + +```sh +npm start +``` + +Then, browse to the standalone test runner at the printed URL. + +The server will generate and compile code on the fly, so no build step is necessary. +Only a reload is needed to see saved changes. +(TODO: except, currently, `README.txt` and file `description` changes won't be reflected in +the standalone runner.) + +Note: The first load of a test suite may take some time as generating the test suite listing can +take a few seconds. + +## Standalone Test Runner / Test Plan Viewer + +**The standalone test runner also serves as a test plan viewer.** +(This can be done in a browser without WebGPU support.) +You can use this to preview how your test plan will appear. + +You can view different suites (webgpu, unittests, stress, etc.) or different subtrees of +the test suite. + +- `http://localhost:8080/standalone/` (defaults to `?runnow=0&worker=0&debug=0&q=webgpu:*`) +- `http://localhost:8080/standalone/?q=unittests:*` +- `http://localhost:8080/standalone/?q=unittests:basic:*` + +The following url parameters change how the harness runs: + +- `runnow=1` runs all matching tests on page load. +- `debug=1` enables verbose debug logging from tests. +- `worker=1` runs the tests on a Web Worker instead of the main thread. +- `power_preference=low-power` runs most tests passing `powerPreference: low-power` to `requestAdapter` +- `power_preference=high-performance` runs most tests passing `powerPreference: high-performance` to `requestAdapter` + +### Web Platform Tests (wpt) - Ref Tests + +You can inspect the actual and reference pages for web platform reftests in the standalone +runner by navigating to them. For example, by loading: + + - `http://localhost:8080/out/webgpu/web_platform/reftests/canvas_clear.https.html` + - `http://localhost:8080/out/webgpu/web_platform/reftests/ref/canvas_clear-ref.html` + +You can also run a minimal ref test runner. + + - open 2 terminals / command lines. + - in one, `npm start` + - in the other, `node tools/run_wpt_ref_tests <path-to-browser-executable> [name-of-test]` + +Without `[name-of-test]` all ref tests will be run. `[name-of-test]` is just a simple check for +substring so passing in `rgba` will run every test with `rgba` in its filename. 
+ +Examples: + +MacOS + +``` +# Chrome +node tools/run_wpt_ref_tests /Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary +``` + +Windows + +``` +# Chrome +node .\tools\run_wpt_ref_tests "C:\Users\your-user-name\AppData\Local\Google\Chrome SxS\Application\chrome.exe" +``` + +## Editor + +Since this project is written in TypeScript, it integrates best with +[Visual Studio Code](https://code.visualstudio.com/). +This is optional, but highly recommended: it automatically adds `import` lines and +provides robust completions, cross-references, renames, error highlighting, +deprecation highlighting, and type/JSDoc popups. + +Open the `cts.code-workspace` workspace file to load settings convenient for this project. +You can make local configuration changes in `.vscode/`, which is untracked by Git. + +## Pull Requests + +When opening a pull request, fill out the PR checklist and attach the issue number. +If an issue hasn't been opened, find the draft issue on the +[project tracker](https://github.com/orgs/gpuweb/projects/3) and choose "Convert to issue": + +![convert to issue button screenshot](convert_to_issue.png) + +Opening a pull request will automatically notify reviewers. + +To make the review process smoother, once a reviewer has started looking at your change: + +- Avoid major additions or changes that would be best done in a follow-up PR. +- Avoid rebases (`git rebase`) and force pushes (`git push -f`). These can make + it difficult for reviewers to review incremental changes as GitHub often cannot + view a useful diff across a rebase. If it's necessary to resolve conflicts + with upstream changes, use a merge commit (`git merge`) and don't include any + consequential changes in the merge, so a reviewer can skip over merge commits + when working through the individual commits in the PR. +- When you address a review comment, mark the thread as "Resolved". + +Pull requests will (usually) be landed with the "Squash and merge" option. + +### TODOs + +The word "TODO" refers to missing test coverage. It may only appear inside file/test descriptions +and README files (enforced by linting). + +To use comments to refer to TODOs inside the description, use a backreference, e.g., in the +description, `TODO: Also test the FROBNICATE usage flag [1]`, and somewhere in the code, `[1]: +Need to add FROBNICATE to this list.`. + +Use `MAINTENANCE_TODO` for TODOs which don't impact test coverage. diff --git a/dom/webgpu/tests/cts/checkout/docs/intro/life_of.md b/dom/webgpu/tests/cts/checkout/docs/intro/life_of.md new file mode 100644 index 0000000000..8dced4ad84 --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/intro/life_of.md @@ -0,0 +1,46 @@ +# Life of a Test Change + +A "test change" could be a new test, an expansion of an existing test, a test bug fix, or a +modification to existing tests to make them match new spec changes. + +**CTS contributors should contribute to the tracker and strive to keep it up to date, especially +relating to their own changes.** + +Filing new draft issues in the CTS project tracker is very lightweight. +Anyone with access should do this eagerly, to ensure no testing ideas are forgotten. +(And if you don't have access, just file a regular issue.) + +1. Enter a [draft issue](https://github.com/orgs/gpuweb/projects/3), with the Status + set to "New (not in repo)", and any available info included in the issue description + (notes/plans to ensure full test coverage of the change). 
The source of this may be: + + - Anything in the spec/API that is found not to be covered by the CTS yet. + - Any test is found to be outdated or otherwise buggy. + - A spec change from the "Needs CTS Issue" column in the + [spec project tracker](https://github.com/orgs/gpuweb/projects/1). + Once information on the required test changes is entered into the CTS project tracker, + the spec issue moves to "Specification Done". + + Note: at some point, someone may make a PR to flush "New (not in repo)" issues into `TODO`s in + CTS file/test description text, changing their "Status" to "Open". + These may be done in bulk without linking back to the issue. + +1. As necessary: + + - Convert the draft issue to a full, numbered issue for linking from later PRs. + + ![convert to issue button screenshot](convert_to_issue.png) + + - Update the "Assignees" of the issue when an issue is assigned or unassigned + (you can assign yourself). + - Change the "Status" of the issue to "Started" once you start the task. + +1. Open one or more PRs, **each linking to the associated issue**. + Each PR may is reviewed and landed, and may leave further TODOs for parts it doesn't complete. + + 1. Test are "planned" in test descriptions. (For complex tests, open a separate PR with the + tests `.unimplemented()` so a reviewer can evaluate the plan before you implement tests.) + 1. Tests are implemented. + +1. When **no TODOs remain** for an issue, close it and change its status to "Complete". + (Enter a new more, specific draft issue into the tracker if you need to track related TODOs.) diff --git a/dom/webgpu/tests/cts/checkout/docs/intro/plans.md b/dom/webgpu/tests/cts/checkout/docs/intro/plans.md new file mode 100644 index 0000000000..f8d7af3a78 --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/intro/plans.md @@ -0,0 +1,82 @@ +# Adding or Editing Test Plans + +## 1. Write a test plan + +For new tests, if some notes exist already, incorporate them into your plan. + +A detailed test plan should be written and reviewed before substantial test code is written. +This allows reviewers a chance to identify additional tests and cases, opportunities for +generalizations that would improve the strength of tests, similar existing tests or test plans, +and potentially useful [helpers](../helper_index.txt). + +**A test plan must serve two functions:** + +- Describes the test, succinctly, but in enough detail that a reader can read *only* the test + plans and evaluate coverage completeness of a file/directory. +- Describes the test precisely enough that, when code is added, the reviewer can ensure that the + test really covers what the test plan says. + +There should be one test plan for each test. It should describe what it tests, how, and describe +important cases that need to be covered. Here's an example: + +```ts +g.test('x,some_detail') + .desc( + ` +Tests [some detail] about x. Tests calling x in various 'mode's { mode1, mode2 }, +with various values of 'arg', and checks correctness of the result. +Tries to trigger [some conditional path]. + +- Valid values (control case) // <- (to make sure the test function works well) +- Unaligned values (should fail) // <- (only validation tests need to intentionally hit invalid cases) +- Extreme values` + ) + .params(u => + u // + .combine('mode', ['mode1', 'mode2']) + .beginSubcases() + .combine('arg', [ + // Valid // <- Comment params as you see fit. 
+ 4, + 8, + 100, + // Invalid + 2, + 6, + 1e30, + ]) + ) + .unimplemented(); +``` + +"Cases" each appear as individual items in the `/standalone/` runner. +"Subcases" run inside each case, like a for-loop wrapping the `.fn(`test function`)`. +Documentation on the parameter builder can be found in the [helper index](../helper_index.txt). + +It's often impossible to predict the exact case/subcase structure before implementing tests, so they +can be added during implementation, instead of planning. + +For any notes which are not specific to a single test, or for preliminary notes for tests that +haven't been planned in full detail, put them in the test file's `description` variable at +the top. Or, if they aren't associated with a test file, put them in a `README.txt` file. + +**Any notes about missing test coverage must be marked with the word `TODO` inside a +description or README.** This makes them appear on the `/standalone/` page. + +## 2. Open a pull request + +Open a PR, and work with the reviewer(s) to revise the test plan. + +Usually (probably), plans will be landed in separate PRs before test implementations. + +## Conventions used in test plans + +- `Iff`: If and only if +- `x=`: "cartesian-cross equals", like `+=` for cartesian product. + Used for combinatorial test coverage. + - Sometimes this will result in too many test cases; simplify/reduce as needed + during planning *or* implementation. +- `{x,y,z}`: list of cases to test + - e.g. `x= texture format {r8unorm, r8snorm}` +- *Control case*: a case included to make sure that the rest of the cases aren't + missing their target by testing some other error case. diff --git a/dom/webgpu/tests/cts/checkout/docs/intro/tests.md b/dom/webgpu/tests/cts/checkout/docs/intro/tests.md new file mode 100644 index 0000000000..a67b6a20cc --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/intro/tests.md @@ -0,0 +1,25 @@ +# Implementing Tests + +Once a test plan is done, you can start writing tests. +To add new tests, imitate the pattern in neigboring tests or neighboring files. +New test files must be named ending in `.spec.ts`. + +For an example test file, see [`src/webgpu/examples.spec.ts`](../../src/webgpu/examples.spec.ts). +For a more complex, well-structured reference test file, see +[`src/webgpu/api/validation/vertex_state.spec.ts`](../../src/webgpu/api/validation/vertex_state.spec.ts). + +Implement some tests and open a pull request. You can open a PR any time you're ready for a review. +(If two tests are non-trivial but independent, consider separate pull requests.) + +Before uploading, you can run pre-submit checks (`npm test`) to make sure it will pass CI. +Use `npm run fix` to fix linting issues. + +## Test Helpers + +It's best to be familiar with helpers available in the test suite for simplifying +test implementations. + +New test helpers can be added at any time to either of those files, or to new `.ts` files anywhere +near the `.spec.ts` file where they're used. + +Documentation on existing helpers can be found in the [helper index](../helper_index.txt). diff --git a/dom/webgpu/tests/cts/checkout/docs/organization.md b/dom/webgpu/tests/cts/checkout/docs/organization.md new file mode 100644 index 0000000000..fd7020afd6 --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/organization.md @@ -0,0 +1,166 @@ +# Test Organization + +## `src/webgpu/` + +Because of the glorious amount of test needed, the WebGPU CTS is organized as a tree of arbitrary +depth (a filesystem with multiple tests per file). 
+
+Each directory may have a `README.txt` describing its contents.
+Tests are grouped in large families (each of which has a `README.txt`);
+the root and first few levels look like the following (some nodes omitted for simplicity):
+
+- **`api`** with tests for full coverage of the JavaScript API surface of WebGPU.
+  - **`validation`** with positive and negative tests for all the validation rules of the API.
+  - **`operation`** with tests that check the result of performing valid WebGPU operations,
+    taking advantage of parametrization to exercise interactions between parts of the API.
+  - **`regression`** for one-off tests that reproduce bugs found in implementations to prevent
+    the bugs from appearing again.
+- **`shader`** with tests for full coverage of the shaders that can be passed to WebGPU.
+  - **`validation`**.
+  - **`execution`** similar to `api/operation`.
+  - **`regression`**.
+- **`idl`** with tests to check that the WebGPU IDL is correctly implemented, for example that
+  objects expose exactly the correct members, and that methods throw when passed incomplete
+  dictionaries.
+- **`web-platform`** with tests for Web platform-specific interactions like `GPUSwapChain` and
+  `<canvas>`, WebXR, and `GPUQueue.copyExternalImageToTexture`.
+
+At the same time, test hierarchies can be used to split the testing of a single sub-object into
+several files for maintainability. For example, `GPURenderPipeline` has a large descriptor, and some
+parts can be tested independently, like `vertex_input` vs. `primitive_topology` vs. `blending`,
+but all live under the `render_pipeline` directory.
+
+In addition to the test tree, each test can be parameterized. For coverage it is important to
+test all enum values, for example for `GPUTextureFormat`. Instead of having a loop to iterate
+over all the `GPUTextureFormat` values, it is better to parameterize the test over them. Each format
+will have a different entry in the test list, which will help WebGPU implementers debug the test,
+or suppress the failure without losing test coverage while they fix the bug.
+
+Extra capabilities (limits and features) are often tested in the same files as the rest of the API.
+For example, a compressed texture format capability would simply add a `GPUTextureFormat` to the
+parametrization lists of many tests, while a capability adding significant new functionality
+like ray-tracing could have a separate subtree.
+
+Operation tests for optional features should be skipped using `t.selectDeviceOrSkipTestCase()` or
+`t.skip()`. Validation tests should be written that test the behavior with and without the
+capability enabled via `t.selectDeviceOrSkipTestCase()`, to ensure the functionality is valid
+only with the capability enabled.
+
+### Validation tests
+
+Validation tests check the validation rules that are (or will be) set by the
+WebGPU spec. Validation tests try to carefully trigger the individual validation
+rules in the spec, without simultaneously triggering other rules.
+
+Validation errors *generally* generate WebGPU errors, not exceptions.
+But check the spec on a case-by-case basis.
+
+Like all `GPUTest`s, `ValidationTest`s are wrapped in both types of error scope. These
+"catch-all" error scopes look for any errors during the test, and report them as test failures.
+Since error scopes can be nested, validation tests can nest an error scope to expect that there
+*are* errors from specific operations.
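+
+For example, a validation test body can exercise a control case and an error case using the
+`expectValidationError` helper (a minimal sketch: `createThing` and the descriptors are
+placeholders for whichever call and arguments are under test, not real API):
+
+```ts
+// Control case: valid arguments. The fixture's catch-all error scopes will fail the
+// test if this unexpectedly produces an error.
+t.device.createThing(validDescriptor);
+
+// Error case: invalid arguments. Expect a validation error from just this call.
+t.expectValidationError(() => {
+  t.device.createThing(invalidDescriptor);
+});
+```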
+ +#### Parameterization + +Test parameterization can help write many validation tests more succinctly, +while making it easier for both authors and reviewers to be confident that +an aspect of the API is tested fully. Examples: + +- [`webgpu:api,validation,render_pass,resolve:resolve_attachment:*`](https://github.com/gpuweb/cts/blob/ded3b7c8a4680a1a01621a8ac859facefadf32d0/src/webgpu/api/validation/render_pass/resolve.spec.ts#L35) +- [`webgpu:api,validation,createBindGroupLayout:bindingTypeSpecific_optional_members:*`](https://github.com/gpuweb/cts/blob/ded3b7c8a4680a1a01621a8ac859facefadf32d0/src/webgpu/api/validation/createBindGroupLayout.spec.ts#L68) + +Use your own discretion when deciding the balance between heavily parameterizing +a test and writing multiple separate tests. + +#### Guidelines + +There are many aspects that should be tested in all validation tests: + +- each individual argument to a method call (including `this`) or member of a descriptor + dictionary should be tested including: + - what happens when an error object is passed. + - what happens when an optional feature enum or method is used. + - what happens for numeric values when they are at 0, too large, too small, etc. +- each validation rule in the specification should be checked both with a control success case, + and error cases. +- each set of arguments or state that interact for validation. + +When testing numeric values, it is important to check on both sides of the boundary: if the error +happens for value N and not N - 1, both should be tested. Alignment of integer values should also +be tested but boundary testing of alignment should be between a value aligned to 2^N and a value +aligned to 2^(N-1). + +Finally, this is probably also where we would test that extensions follow the rule that: if the +browser supports a feature but it is not enabled on the device, then calling methods from that +feature throws `TypeError`. + +- Test providing unknown properties *that are definitely not part of any feature* are + valid/ignored. (Unfortunately, due to the rules of IDL, adding a member to a dictionary is + always a breaking change. So this is how we have to test this unless we can get a "strict" + dictionary type in IDL. We can't test adding members from non-enabled extensions.) + +### Operation tests + +Operation tests test the actual results of using the API. They execute +(sometimes significant) code and check that the result is within the expected +set of behaviors (which can be quite complex to compute). + +Note that operation tests need to test a lot of interactions between different +parts of the API, and so can become quite complex. Try to reduce the complexity by +utilizing combinatorics and [helpers](./helper_index.txt), and splitting/merging test files as needed. + +#### Errors + +Operation tests are usually `GPUTest`s. As a result, they automatically fail on any validation +errors that occur during the test. + +When it's easier to write an operation test with invalid cases, use +`ParamsBuilder.filter`/`.unless` to avoid invalid cases, or detect and +`expect` validation errors in some cases. + +#### Implementation + +Use helpers like `expectContents` (and more to come) to check the values of data on the GPU. +(These are "eventual expectations" - the harness will wait for them to finish at the end). + +When testing something inside a shader, it's not always necessary to output the result to a +render output. In fragment shaders, you can output to a storage buffer. 
In vertex shaders, you +can't - but you can render with points (simplest), send the result to the fragment shader, and +output it from there. (Someday, we may end up wanting a helper for this.) + +#### Testing Default Values + +Default value tests (for arguments and dictionary members) should usually be operation tests - +all you have to do is include `undefined` in parameterizations of other tests to make sure the +behavior with `undefined` has the same expected result that you have when the default value is +specified explicitly. + +### IDL tests + +TODO: figure out how to implement these. https://github.com/gpuweb/cts/issues/332 + +These tests test only rules that come directly from WebIDL. For example: + +- Values out of range for `[EnforceRange]` cause exceptions. +- Required function arguments and dictionary members cause exceptions if omitted. +- Arguments and dictionary members cause exceptions if passed the wrong type. + +They may also test positive cases like the following, but the behavior of these should be tested in +operation tests. + +- OK to omit optional arguments/members. +- OK to pass the correct argument/member type (or of any type in a union type). + +Every overload of every method should be tested. + +## `src/stress/`, `src/manual/` + +Stress tests and manual tests for WebGPU that are not intended to be run in an automated way. + +## `src/unittests/` + +Unit tests for the test framework (`src/common/framework/`). + +## `src/demo/` + +A demo of test hierarchies for the purpose of testing the `standalone` test runner page. diff --git a/dom/webgpu/tests/cts/checkout/docs/reviews.md b/dom/webgpu/tests/cts/checkout/docs/reviews.md new file mode 100644 index 0000000000..1a8c3f9624 --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/reviews.md @@ -0,0 +1,70 @@ +# Review Requirements + +A review should have several items checked off before it is landed. +Checkboxes are pre-filled into the pull request summary when it's created. + +The uploader may pre-check-off boxes if they are not applicable +(e.g. TypeScript readability on a plan PR). + +## Readability + +A reviewer has "readability" for a topic if they have enough expertise in that topic to ensure +good practices are followed in pull requests, or know when to loop in other reviewers. +Perfection is not required! + +**It is up to reviewers' own discretion** whether they are qualified to check off a +"readability" checkbox on any given pull request. + +- WebGPU Readability: Familiarity with the API to ensure: + + - WebGPU is being used correctly; expected results seem reasonable. + - WebGPU is being tested completely; tests have control cases. + - Test code has a clear correspondence with the test description. + - [Test helpers](./helper_index.txt) are used or created appropriately + (where the reviewer is familiar with the helpers). + +- TypeScript Readability: Make sure TypeScript is utilized in a way that: + + - Ensures test code is reasonably type-safe. + Reviewers may recommend changes to make type-safety either weaker (`as`, etc.) or stronger. + - Is understandable and has appropriate verbosity and dynamicity + (e.g. type inference and `as const` are used to reduce unnecessary boilerplate). + +## Plan Reviews + +**Changes *must* have an author or reviewer with the following readability:** WebGPU + +Reviewers must carefully ensure the following: + +- The test plan name accurately describes the area being tested. 
+- The test plan covers the area described by the file/test name and file/test description + as fully as possible (or adds TODOs for incomplete areas). +- Validation tests have control cases (where no validation error should occur). +- Each validation rule is tested in isolation, in at least one case which does not validate any + other validation rules. + +See also: [Adding or Editing Test Plans](intro/plans.md). + +## Implementation Reviews + +**Changes *must* have an author or reviewer with the following readability:** WebGPU, TypeScript + +Reviewers must carefully ensure the following: + +- The coverage of the test implementation precisely matches the test description. +- Everything required for test plan reviews above. + +Reviewers should ensure the following: + +- New test helpers are documented in [helper index](./helper_index.txt). +- Framework and test helpers are used where they would make test code clearer. + +See also: [Implementing Tests](intro/tests.md). + +## Framework + +**Changes *must* have an author or reviewer with the following readability:** TypeScript + +Reviewers should ensure the following: + +- Changes are reasonably type-safe, and covered by unit tests where appropriate. diff --git a/dom/webgpu/tests/cts/checkout/docs/terms.md b/dom/webgpu/tests/cts/checkout/docs/terms.md new file mode 100644 index 0000000000..032639be57 --- /dev/null +++ b/dom/webgpu/tests/cts/checkout/docs/terms.md @@ -0,0 +1,270 @@ +# Terminology + +Each test suite is organized as a tree, both in the filesystem and further within each file. + +- _Suites_, e.g. `src/webgpu/`. + - _READMEs_, e.g. `src/webgpu/README.txt`. + - _Test Spec Files_, e.g. `src/webgpu/examples.spec.ts`. + Identified by their file path. + Each test spec file provides a description and a _Test Group_. + A _Test Group_ defines a test fixture, and contains multiple: + - _Tests_. + Identified by a comma-separated list of parts (e.g. `basic,async`) + which define a path through a filesystem-like tree (analogy: `basic/async.txt`). + Defines a _test function_ and contains multiple: + - _Test Cases_. + Identified by a list of _Public Parameters_ (e.g. `x` = `1`, `y` = `2`). + Each Test Case has the same test function but different Public Parameters. + +## Test Tree + +A _Test Tree_ is a tree whose leaves are individual Test Cases. + +A Test Tree can be thought of as follows: + +- Suite, which is the root of a tree with "leaves" which are: + - Test Spec Files, each of which is a tree with "leaves" which are: + - Tests, each of which is a tree with leaves which are: + - Test Cases. + +(In the implementation, this conceptual tree of trees is decomposed into one big tree +whose leaves are Test Cases.) + +**Type:** `TestTree` + +## Suite + +A suite of tests. +A single suite has a directory structure, and many _test spec files_ +(`.spec.ts` files containing tests) and _READMEs_. +Each member of a suite is identified by its path within the suite. + +**Example:** `src/webgpu/` + +### README + +**Example:** `src/webgpu/README.txt` + +Describes (in prose) the contents of a subdirectory in a suite. + +READMEs are only processed at build time, when generating the _Listing_ for a suite. + +**Type:** `TestSuiteListingEntryReadme` + +## Queries + +A _Query_ is a structured object which specifies a subset of cases in exactly one Suite. +A Query can be represented uniquely as a string. +Queries are used to: + +- Identify a subtree of a suite (by identifying the root node of that subtree). +- Identify individual cases. 
+- Represent the list of tests that a test runner (standalone, wpt, or cmdline) should run. +- Identify subtrees which should not be "collapsed" during WPT `cts.https.html` generation, + so that that cts.https.html "variants" can have individual test expectations + (i.e. marked as "expected to fail", "skip", etc.). + +There are four types of `TestQuery`: + +- `TestQueryMultiFile` represents any subtree of the file hierarchy: + - `suite:*` + - `suite:path,to,*` + - `suite:path,to,file,*` +- `TestQueryMultiTest` represents any subtree of the test hierarchy: + - `suite:path,to,file:*` + - `suite:path,to,file:path,to,*` + - `suite:path,to,file:path,to,test,*` +- `TestQueryMultiCase` represents any subtree of the case hierarchy: + - `suite:path,to,file:path,to,test:*` + - `suite:path,to,file:path,to,test:my=0;*` + - `suite:path,to,file:path,to,test:my=0;params="here";*` +- `TestQuerySingleCase` represents as single case: + - `suite:path,to,file:path,to,test:my=0;params="here"` + +Test Queries are a **weakly ordered set**: any query is +_Unordered_, _Equal_, _StrictSuperset_, or _StrictSubset_ relative to any other. +This property is used to construct the complete tree of test cases. +In the examples above, every example query is a StrictSubset of the previous one +(note: even `:*` is a subset of `,*`). + +In the WPT and standalone harnesses, the query is stored in the URL, e.g. +`index.html?q=q:u,e:r,y:*`. + +Queries are selectively URL-encoded for readability and compatibility with browsers +(see `encodeURIComponentSelectively`). + +**Type:** `TestQuery` + +## Listing + +A listing of the **test spec files** in a suite. + +This can be generated only in Node, which has filesystem access (see `src/tools/crawl.ts`). +As part of the build step, a _listing file_ is generated (see `src/tools/gen.ts`) so that the +Test Spec Files can be discovered by the web runner (since it does not have filesystem access). + +**Type:** `TestSuiteListing` + +### Listing File + +Each Suite has one Listing File (`suite/listing.[tj]s`), containing a list of the files +in the suite. + +In `src/suite/listing.ts`, this is computed dynamically. +In `out/suite/listing.js`, the listing has been pre-baked (by `tools/gen_listings`). + +**Type:** Once `import`ed, `ListingFile` + +**Example:** `out/webgpu/listing.js` + +## Test Spec File + +A Test Spec File has a `description` and a Test Group (under which tests and cases are defined). + +**Type:** Once `import`ed, `SpecFile` + +**Example:** `src/webgpu/**/*.spec.ts` + +## Test Group + +A subtree of tests. There is one Test Group per Test Spec File. + +The Test Fixture used for tests is defined at TestGroup creation. + +**Type:** `TestGroup` + +## Test + +One test. It has a single _test function_. + +It may represent multiple _test cases_, each of which runs the same Test Function with different +Parameters. + +A test is named using `TestGroup.test()`, which returns a `TestBuilder`. +`TestBuilder.params()`/`.paramsSimple()`/`.paramsSubcasesOnly()` +can optionally be used to parametrically generate instances (cases and subcases) of the test. +Finally, `TestBuilder.fn()` provides the Test Function +(or, a test can be marked unimplemented with `TestBuilder.unimplemented()`). + +### Test Function + +When a test subcase is run, the Test Function receives an instance of the +Test Fixture provided to the Test Group, producing test results. + +**Type:** `TestFn` + +## Test Case / Case + +A single case of a test. It is identified by a `TestCaseID`: a test name, and its parameters. 
+ +Each case appears as an individual item (tree leaf) in `/standalone/`, +and as an individual "step" in WPT. + +If `TestBuilder.params()`/`.paramsSimple()`/`.paramsSubcasesOnly()` are not used, +there is exactly one case with one subcase, with parameters `{}`. + +**Type:** During test run time, a case is encapsulated as a `RunCase`. + +## Test Subcase / Subcase + +A single "subcase" of a test. It can also be identified by a `TestCaseID`, though +not all contexts allow subdividing cases into subcases. + +All of the subcases of a case will run _inside_ the case, essentially as a for-loop wrapping the +test function. They do _not_ appear individually in `/standalone/` or WPT. + +If `CaseParamsBuilder.beginSubcases()` is not used, there is exactly one subcase per case. + +## Test Parameters / Params + +Each Test Subcase has a (possibly empty) set of Test Parameters, +The parameters are passed to the Test Function `f(t)` via `t.params`. + +A set of Public Parameters identifies a Test Case or Test Subcase within a Test. + +There are also Private Parameters: any parameter name beginning with an underscore (`_`). +These parameters are not part of the Test Case identification, but are still passed into +the Test Function. They can be used, e.g., to manually specify expected results. + +**Type:** `TestParams` + +## Test Fixture / Fixture + +_Test Fixtures_ provide helpers for tests to use. +A new instance of the fixture is created for every run of every test case. + +There is always one fixture class for a whole test group (though this may change). + +The fixture is also how a test gets access to the _case recorder_, +which allows it to produce test results. + +They are also how tests produce results: `.skip()`, `.fail()`, etc. + +**Type:** `Fixture` + +### `UnitTest` Fixture + +Provides basic fixture utilities most useful in the `unittests` suite. + +### `GPUTest` Fixture + +Provides utilities useful in WebGPU CTS tests. + +# Test Results + +## Logger + +A logger logs the results of a whole test run. + +It saves an empty `LiveTestSpecResult` into its results map, then creates a +_test spec recorder_, which records the results for a group into the `LiveTestSpecResult`. + +**Type:** `Logger` + +### Test Case Recorder + +Refers to a `LiveTestCaseResult` created by the logger. +Records the results of running a test case (its pass-status, run time, and logs) into it. + +**Types:** `TestCaseRecorder`, `LiveTestCaseResult` + +#### Test Case Status + +The `status` of a `LiveTestCaseResult` can be one of: + +- `'running'` (only while still running) +- `'pass'` +- `'skip'` +- `'warn'` +- `'fail'` + +The "worst" result from running a case is always reported (fail > warn > skip > pass). +Note this means a test can still fail if it's "skipped", if it failed before +`.skip()` was called. + +**Type:** `Status` + +## Results Format + +The results are returned in JSON format. + +They are designed to be easily merged in JavaScript: +the `"results"` can be passed into the constructor of `Map` and merged from there. + +(TODO: Write a merge tool, if needed.) + +```js +{ + "version": "bf472c5698138cdf801006cd400f587e9b1910a5-dirty", + "results": [ + [ + "unittests:async_mutex:basic:", + { "status": "pass", "timems": 0.286, "logs": [] } + ], + [ + "unittests:async_mutex:serial:", + { "status": "pass", "timems": 0.415, "logs": [] } + ] + ] +} +``` |
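+
+For instance, two such files could be combined along these lines (a sketch only, assuming the JSON
+shape shown above; the types and function below are illustrative and not part of the framework):
+
+```ts
+type CaseResult = { status: string; timems: number; logs: unknown[] };
+type ResultsFile = { version: string; results: [string, CaseResult][] };
+
+// Merge two results files by loading their "results" arrays into a Map;
+// entries from `b` overwrite entries from `a` for the same query string.
+function mergeResults(a: ResultsFile, b: ResultsFile): ResultsFile {
+  const merged = new Map<string, CaseResult>(a.results);
+  for (const [query, result] of b.results) {
+    merged.set(query, result);
+  }
+  return { version: b.version, results: Array.from(merged.entries()) };
+}
+```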