Diffstat (limited to 'dom/webgpu/tests/cts/checkout/docs/fp_primer.md')
-rw-r--r-- dom/webgpu/tests/cts/checkout/docs/fp_primer.md | 516
1 file changed, 516 insertions(+), 0 deletions(-)
diff --git a/dom/webgpu/tests/cts/checkout/docs/fp_primer.md b/dom/webgpu/tests/cts/checkout/docs/fp_primer.md
new file mode 100644
index 0000000000..234a43de40
--- /dev/null
+++ b/dom/webgpu/tests/cts/checkout/docs/fp_primer.md
@@ -0,0 +1,516 @@
+# Floating Point Primer
+
+This document is a primer on the floating point concepts that need to be
+understood when working on tests in WebGPU's CTS.
+
+WebGPU's CTS is responsible for testing whether implementations of WebGPU
+conform to the spec, and are thus interoperable with each other.
+
+Floating point math makes up a significant portion of the WGSL spec, and has
+many subtle corner cases to get correct.
+
+Additionally, floating point math, unlike integer math, is broadly not exact.
+The spec must therefore state how inaccurate a calculation is allowed to be,
+and the CTS must test against that allowance, as opposed to testing for a
+single correct answer.
+
+Thus, the WebGPU CTS has a significant amount of machinery around how to
+correctly test floating point expectations in a fluent manner.
+
+## Floating Point Numbers
+
+For the context of this discussion, floating point numbers, fp for short, are
+single precision IEEE floating point numbers, f32 for short.
+
+Details of how this format works are discussed as needed below, but for a more
+involved discussion, please see the references in the Resources sections.
+
+Additionally, in the Appendix there is a table of interesting/common values that
+are often referenced in tests or this document.
+
+*In the future support for f16 and abstract floats will be added to the CTS, and
+this document will need to be updated.*
+
+Floating point numbers are effectively a lossy compression of the infinite
+number of possible values over their range down to the distinct points
+representable in 32 bits.
+
+This means that not all numbers in the range can be exactly represented as a f32.
+
+For example, the integer `1` is exactly represented as `0x3f800000`, but the
+next representable value above it, `0x3f800001`, is `1.00000011920928955`.
+
+So any number between `1` and `1.00000011920928955` is not exactly represented
+as a f32 and instead is approximated as either `1` or `1.00000011920928955`.
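+
+To make this concrete, here is a small TypeScript sketch (the CTS is written
+in TypeScript) that inspects f32 bit patterns. The helper names are
+illustrative, not CTS APIs.
+
```typescript
// Reinterpret a number's f32 encoding as raw bits and back.
// Both views share one buffer, so this works regardless of endianness.
const f32Bits = (n: number): number =>
  new Uint32Array(new Float32Array([n]).buffer)[0];
const f32FromBits = (bits: number): number =>
  new Float32Array(new Uint32Array([bits]).buffer)[0];

// 1 is exactly representable as 0x3f800000...
console.log(f32Bits(1).toString(16)); // '3f800000'
// ...and the next representable f32 above it is 1 + 2^-23.
console.log(f32FromBits(0x3f800001) === 1 + 2 ** -23); // true
// A value strictly between the two quantizes to one of them;
// Math.fround performs round-to-nearest f32 quantization.
console.log(Math.fround(1.0000000001)); // 1
```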
+
+When a number X is not exactly represented by a f32 value, there are normally
+two neighbouring numbers that could reasonably represent X: the nearest f32
+value above X, and the nearest f32 value below X. Which of these values gets
+used is dictated by the rounding mode being used, which may be something like
+always round towards 0 or go to the nearest neighbour, or something else
+entirely.
+
+The process of converting numbers between precisions, like non-f32 to f32, is
+called quantization. WGSL does not prescribe a specific rounding mode when
+quantizing, so either of the neighbouring values is considered valid
+when converting a non-exactly representable value to f32. This has significant
+implications on the CTS that are discussed later.
+
+From here on, we assume you are familiar with the internal structure of a f32
+value: a sign bit, a biased exponent, and a mantissa. For reference, see
+[float32 on Wikipedia](https://en.wikipedia.org/wiki/Single-precision_floating-point_format)
+
+In the f32 format as described above, there are two possible zero values, one
+with all bits being 0, called positive zero, and one all the same except with
+the sign bit being 1, called negative zero.
+
+For WGSL, and thus the CTS's purposes, these values are considered equivalent.
+TypeScript, which the CTS is written in, treats all zeros as positive zeros
+unless you explicitly use an escape hatch such as `Object.is` to differentiate
+between them, so most of the time having two zeros doesn't materially affect
+code.
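+
+A quick TypeScript sketch of the two zeros:
+
```typescript
// === and most arithmetic treat the two zeros as equal...
console.log(-0 === 0); // true
// ...but Object.is is the escape hatch that can tell them apart.
console.log(Object.is(-0, 0)); // false
// The sign bit is still present in the f32 encoding of -0.
const negZeroBits = new Uint32Array(new Float32Array([-0]).buffer)[0];
console.log(negZeroBits.toString(16)); // '80000000'
```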
+
+### Normals
+
+Normal numbers are floating point numbers whose biased exponent is not all 0s or
+all 1s. For WGSL these numbers behave as you expect for floating point values
+with no interesting caveats.
+
+### Subnormals
+
+Subnormal numbers are numbers whose biased exponent is all 0s, also called
+denorms.
+
+These are the closest numbers to zero, both positive and negative, and fill in
+the gap between the normal numbers with smallest magnitude, and 0.
+
+Some devices, for performance reasons, do not handle operations on subnormal
+numbers, and instead treat them as being zero; this is called *flush to zero*
+or FTZ behaviour.
+
+This means in the CTS that when a subnormal number is consumed or produced by an
+operation, an implementation may choose to replace it with zero.
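+
+A minimal sketch of FTZ in TypeScript, using hypothetical helper names (not
+the CTS's actual API):
+
```typescript
// Smallest positive normal f32 is 2^-126; anything non-zero with a
// smaller magnitude is subnormal.
const MIN_POSITIVE_NORMAL_F32 = 2 ** -126;

function isSubnormalF32(n: number): boolean {
  return n !== 0 && Math.abs(n) < MIN_POSITIVE_NORMAL_F32;
}

// An FTZ implementation may replace a subnormal input or output with zero.
function flushToZero(n: number): number {
  return isSubnormalF32(n) ? 0 : n;
}

const smallestSubnormal = 2 ** -149; // bit pattern 0x00000001
console.log(isSubnormalF32(smallestSubnormal)); // true
console.log(flushToZero(smallestSubnormal)); // 0
console.log(flushToZero(1.5)); // 1.5
```
+
+This is why CTS expectations must accept both the subnormal value itself and
+zero whenever a subnormal is involved.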
+
+Like the rounding mode for quantization, this adds significant complexity to the
+CTS, which will be discussed later.
+
+### Inf & NaNs
+
+Floating point numbers include positive and negative infinity to represent
+values that are out of the bounds supported by the current precision.
+
+Implementations may assume that infinities are not present. When an evaluation
+would produce an infinity, an undefined value is produced instead.
+
+Additionally, when a calculation would produce a finite value outside the
+bounds of the current precision, the implementation may convert that value to
+either an infinity with the same sign, or the min/max representable value, as
+appropriate.
+
+The CTS encodes the least restrictive interpretation of the rules in the spec,
+i.e. assuming someone has made a slightly adversarial implementation that always
+chooses the thing with the least accuracy.
+
+This means that the above rules about infinities combine to say that any time an
+out of bounds value is seen, any finite value is acceptable afterwards.
+
+This is because the out of bounds value may be converted to an infinity and then
+an undefined value can be used instead of the infinity.
+
+This is actually a significant boon for the CTS implementation, because it short
+circuits a bunch of complexity about clamping to edge values and handling
+infinities.
+
+Signaling NaNs are treated as quiet NaNs in the WGSL spec, and quiet NaNs
+have the same "may-convert-to-undefined-value" behaviour that infinities
+have, so for the purposes of the CTS they are handled by the infinite/out of
+bounds logic normally.
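+
+The combined rule can be sketched as a single predicate; once it fires, any
+finite result must be accepted. The name and constant below are illustrative,
+not the CTS's actual helpers.
+
```typescript
// Largest finite f32, (2 - 2^-23) * 2^127.
const F32_MAX = (2 - 2 ** -23) * 2 ** 127;

// True when a computed value escapes the finite f32 bounds (or is NaN),
// after which any finite value is an acceptable result.
function isOutOfBoundsF32(n: number): boolean {
  return Number.isNaN(n) || Math.abs(n) > F32_MAX;
}

console.log(isOutOfBoundsF32(2e38)); // false: still within f32 bounds
console.log(isOutOfBoundsF32(4e38)); // true: would overflow f32
console.log(isOutOfBoundsF32(NaN)); // true
console.log(isOutOfBoundsF32(-Infinity)); // true
```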
+
+## Notation/Terminology
+
+When discussing floating point values in the CTS, there are a few terms used
+with precise meanings, which will be elaborated here.
+
+Additionally, any specific notation used will be specified here to avoid
+confusion.
+
+### Operations
+
+The CTS tests for the proper execution of f32 builtins, e.g. sin, sqrt, abs,
+etc., and expressions, e.g. *, /, <, etc. These are collectively referred to
+as f32 operations.
+
+Operations, which can be thought of as mathematical functions, are mappings from
+a set of inputs to a set of outputs.
+
+Denoted `f(x, y) = X`, where f is a placeholder or the name of the operation,
+lower case variables are the inputs to the function, and uppercase variables are
+the outputs of the function.
+
+Operations have one or more inputs and an output. Being a f32 operation means
+that the primary space for input and output values is f32, but there is some
+flexibility in this definition. For example, operations whose values are
+restricted to a subset of integers that are representable as f32 are often
+referred to as being f32 based.
+
+Values are generally floats, integers, booleans, vectors, and matrices.
+Consult the WGSL spec for the exact list of types and their definitions.
+
+For composite outputs, where there are multiple values being returned, there
+is a single result value made of structured data, whereas multiple inputs are
+handled by having multiple input parameters.
+
+Some examples of different types of operations:
+
+`multiplication(x, y) = X`, which represents the WGSL expression `x * y`, takes
+in f32 values, `x` and `y`, and produces a f32 value `X`.
+
+`lessThan(x, y) = X`, which represents the WGSL expression `x < y`, again
+takes in f32 values, but in this case returns a boolean value.
+
+`ldexp(x, y) = X`, which builds a f32 value, takes in a f32 value `x` and a
+restricted integer `y`.
+
+### Domain, Range, and Intervals
+
+For an operation `f(x) = X`, the interval of valid values for the input, `x`, is
+called the *domain*, and the interval for valid results, `X`, is called the
+*range*.
+
+An interval, `[a, b]`, is a set of real numbers that contains `a`, `b`, and all
+the real numbers between them.
+
+Open-ended intervals, i.e. ones that don't include `a` and/or `b`, are avoided,
+and are called out explicitly when they occur.
+
+The convention in this doc and the CTS code is that `a <= b`, so `a` can be
+referred to as the beginning of the interval and `b` as the end of the interval.
+
+When talking about intervals, this doc and the code endeavour to avoid using
+the term **range** for the span of values that an interval covers, instead
+using the term **bounds**, to avoid confusion with the term for the output of
+an operation.
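+
+These conventions can be captured in a small TypeScript sketch (hypothetical,
+and much simpler than the CTS's real interval machinery):
+
```typescript
// A closed interval [begin, end] of reals, with begin <= end.
class Interval {
  constructor(public readonly begin: number, public readonly end: number) {
    if (!(begin <= end)) throw new Error('interval requires begin <= end');
  }
  // Closed interval: both endpoints are contained.
  contains(n: number): boolean {
    return this.begin <= n && n <= this.end;
  }
}

const i = new Interval(-1, 1);
console.log(i.contains(0)); // true
console.log(i.contains(1)); // true: endpoints are included
console.log(i.contains(1.5)); // false
```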
+
+## Accuracy
+
+As mentioned above, floating point numbers are not able to represent all the
+possible values over their bounds, but instead represent discrete values in
+that interval, and approximate the remainder.
+
+Additionally, floating point numbers are not evenly distributed over the real
+number line, but instead are clustered closer together near zero, and further
+apart as their magnitudes grow.
+
+When discussing operations on floating point numbers, there is often a
+reference to a true value. This is the value you would get given no
+performance constraints and infinite precision, e.g. `acos(-1) = π`, where π
+has infinite digits of precision.
+
+For the CTS it is often sufficient to calculate the true value using TypeScript,
+since its native number format is higher precision (double-precision/f64), and
+all f32 values can be represented in it.
+
+The true value is sometimes representable exactly as a f32 value, but often is
+not.
+
+Additionally, many operations are implemented using approximations from
+numerical analysis, where there is a tradeoff between the precision of the
+result and the cost.
+
+Thus, the spec specifies the accuracy constraints for specific operations,
+i.e. how close to the true value an implementation is required to be, to be
+considered conformant.
+
+There are 5 different ways that accuracy requirements are defined in the spec:
+
+1. *Exact*
+
+ This is the situation where the true value for an operation is always
+ expected to be exactly representable. This doesn't happen for any of the
+ operations that return floating point values, but does occur for logical
+ operations that return boolean values.
+
+
+2. *Correctly Rounded*
+
+ For the case that the true value is exactly representable as a f32, this is
+ the equivalent of Exact from above. In the event that the true value is not
+ exactly representable, then the acceptable answer for most numbers is either
+ the nearest f32 above or the nearest f32 below the true value.
+
+ For values near the subnormal range, i.e. close to zero, this becomes more
+ complex, since an implementation may FTZ at any point. So if the true value
+ is subnormal, or either of the neighbours of the true value is subnormal,
+ zero becomes a possible result, thus the acceptance interval is wider than
+ naively expected.
+
+
+3. *Absolute Error*
+
+ This type of accuracy specifies an error value, ε, and the calculated result
+ is expected to be within that distance from the true value, i.e.
+ `[ X - ε, X + ε ]`.
+
+ The main drawback with this manner of specifying accuracy is that it doesn't
+ scale with the level of precision in floating point numbers themselves at a
+ specific value. Thus, it tends to only be used for specifying accuracy over
+ specific limited intervals, e.g. [-π, π].
+
+
+4. *Units of Least Precision (ULP)*
+
+ The solution to the issue of not scaling with precision of floating point is
+ to use units of least precision.
+
+ ULP(X) is min (b-a) over all pairs (a,b) of representable floating point
+ numbers such that (a <= X <= b and a =/= b). For a more formal discussion of
+ ULP see
+ [On the definition of ulp(x)](https://hal.inria.fr/inria-00070503/document).
+
+ An accuracy of n ULP, written n * ULP or nULP, means
+ `[X - n * ULP(X), X + n * ULP(X)]`.
+
+
+5. *Inherited*
+
+ When an operation's accuracy is defined in terms of other operations, then
+ its accuracy is said to be inherited. Handling of inherited accuracies is
+ one of the main driving factors in the design of the testing framework, so
+ it will need to be discussed in detail.
+
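+The ULP definition from item 4 can be computed for f32 by stepping to the
+neighbouring representable values. This sketch assumes a finite, positive,
+normal input to keep the bit arithmetic simple, and is not the CTS's actual
+implementation.
+
```typescript
const bitsOf = (n: number): number =>
  new Uint32Array(new Float32Array([n]).buffer)[0];
const fromBits = (b: number): number =>
  new Float32Array(new Uint32Array([b]).buffer)[0];

// ULP(X): the smallest gap b - a over representable a <= X <= b, a != b.
// Assumes x quantizes to a finite positive normal f32.
function ulpF32(x: number): number {
  const q = Math.fround(x);
  const gapAbove = fromBits(bitsOf(q) + 1) - q;
  const gapBelow = q - fromBits(bitsOf(q) - 1);
  return Math.min(gapAbove, gapBelow);
}

// At 1, the gap below (2^-24) is smaller than the gap above (2^-23),
// because the exponent changes at 1.
console.log(ulpF32(1) === 2 ** -24); // true
```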
+## Acceptance Intervals
+
+The first four accuracy types (Exact, Correctly Rounded, Absolute Error, and
+ULP), sometimes called simple accuracies, can be defined in isolation from
+each other, and thus can be implemented using relatively independent
+implementations.
+
+The original implementation of the floating point framework did this as it was
+being built out, but ran into difficulties when defining the inherited
+accuracies.
+
+For example, `tan(x)` inherits from `sin(x)/cos(x)`. One can take the defined
+rules and manually build up a bespoke solution for checking the results, but
+this is tedious, error-prone, and doesn't allow for code re-use.
+
+Instead, it would be better if there was a single conceptual framework that one
+can express all the 'simple' accuracy requirements in, and then have a mechanism
+for composing them to define inherited accuracies.
+
+In the WebGPU CTS this is done via the concept of acceptance intervals, which is
+derived from a similar concept in the Vulkan CTS, though implemented
+significantly differently.
+
+The core of this idea is that each of the different accuracy types can be
+integrated into the definition of the operation, so that instead of
+transforming an input from the domain to a point in the range, the operation
+produces an interval in the range: the acceptable values an implementation
+may emit.
+
+
+The simple accuracies can be defined as follows:
+
+1. *Exact*
+
+ `f(x) => [X, X]`
+
+
+2. *Correctly Rounded*
+
+ If `X` is exactly representable as a f32
+
+ `f(x) => [X, X]`
+
+ otherwise,
+
+ `[a, b]` where `a` is the largest representable number with `a <= X`, and `b`
+ is the smallest representable number with `X <= b`
+
+
+3. *Absolute Error*
+
+ `f(x) => [ X - ε, X + ε ]`, where ε is the absolute error value
+
+
+4. *ULP Error*
+
+ `f(x) = X => [X - n*ULP(X), X + n*ULP(X)]`
+
+As defined, these definitions handle mapping from a point in the domain into an
+interval in the range.
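+
+The four definitions above can be sketched as functions from a point to an
+acceptance interval. All names are illustrative, not the CTS's actual API;
+positive finite inputs are assumed to keep the bit stepping simple, and
+ULP(X) is passed in rather than computed.
+
```typescript
type AcceptanceInterval = [number, number];

const bitsOf = (n: number): number =>
  new Uint32Array(new Float32Array([n]).buffer)[0];
const fromBits = (b: number): number =>
  new Float32Array(new Uint32Array([b]).buffer)[0];

// 1. Exact: f(x) => [X, X]
const exact = (X: number): AcceptanceInterval => [X, X];

// 2. Correctly Rounded: the nearest f32s on either side of X.
function correctlyRounded(X: number): AcceptanceInterval {
  const q = Math.fround(X);
  if (q === X) return [X, X]; // exactly representable
  // q is one neighbour; step one representable value toward X's other side.
  const other = q < X ? fromBits(bitsOf(q) + 1) : fromBits(bitsOf(q) - 1);
  return q < X ? [q, other] : [other, q];
}

// 3. Absolute Error: [X - ε, X + ε]
const absoluteError = (X: number, eps: number): AcceptanceInterval =>
  [X - eps, X + eps];

// 4. ULP Error: [X - n * ULP(X), X + n * ULP(X)], with ULP(X) supplied.
const ulpError = (X: number, n: number, ulp: number): AcceptanceInterval =>
  [X - n * ulp, X + n * ulp];

// 1/3 is not exactly representable, so the interval spans two f32s.
const [lo, hi] = correctlyRounded(1 / 3);
console.log(lo < 1 / 3 && 1 / 3 < hi); // true
```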
+
+This is insufficient for implementing inherited accuracies, since inheritance
+sometimes involves mapping domain intervals to range intervals.
+
+Here we use the convention for naturally extending a function on real numbers
+into a function on intervals of real numbers, i.e. `f([a, b]) = [A, B]`.
+
+Given that floating point numbers have a finite number of precise values for any
+given interval, one could implement just running the accuracy computation for
+every point in the interval and then spanning together the resultant intervals.
+That would be very inefficient though and make your reviewer sad to read.
+
+For mapping intervals to intervals the key insight is that we only need to be
+concerned with the extrema of the operation in the interval, since the
+acceptance interval is the bounds of the possible outputs.
+
+In more precise terms:
+```
+ f(x) => X, x = [a, b] and X = [A, B]
+
+ X = [min(f(x)), max(f(x))]
+ X = [min(f([a, b])), max(f([a, b]))]
+ X = [f(m), f(M)]
+```
+where m and M are in `[a, b]`, `m <= M`, and produce the min and max results
+for `f` on the interval, respectively.
+
+So how do we find the minima and maxima for our operation in the domain?
+
+The common general solution for this requires using calculus to calculate the
+derivative of `f`, `f'`, and then find the zeroes of `f'`, i.e. the critical
+points of `f` where extrema can occur.
+
+This solution wouldn't be sufficient for all builtins, e.g. `step`, which is
+not differentiable at 'edge' values.
+
+Thankfully we do not need a general solution for the CTS, since all the builtin
+operations are defined in the spec, so `f` is from a known set of options.
+
+These operations can be divided into two broad categories: monotonic, and
+non-monotonic, with respect to an interval.
+
+The monotonic operations are ones that preserve the order of inputs in their
+outputs (or reverse it). Their graph only ever decreases or increases,
+never changing from one or the other, though it can have flat sections.
+
+The non-monotonic operations are ones whose graph would have both regions of
+increase and decrease.
+
+The monotonic operations, when mapping an interval to an interval, are simple to
+handle, since the extrema are guaranteed to be the ends of the domain, `a` and `b`.
+
+So `f([a, b])` = `[f(a), f(b)]` or `[f(b), f(a)]`. We could figure out if `f` is
+increasing or decreasing beforehand to determine if it should be `[f(a), f(b)]`
+or `[f(b), f(a)]`.
+
+It is simpler to just use min & max to have an implementation that is agnostic
+to the details of `f`.
+```
+ A = f(a), B = f(b)
+ X = [min(A, B), max(A, B)]
+```
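+
+The endpoint rule above can be written as a runnable TypeScript sketch
+(`mapMonotonic` is a hypothetical helper, not the CTS API):
+
```typescript
type IntervalT = [number, number];

// Map a domain interval through a monotonic operation by evaluating only
// its endpoints; min/max keeps this agnostic to whether f increases or
// decreases over the interval.
function mapMonotonic(f: (x: number) => number, [a, b]: IntervalT): IntervalT {
  const A = f(a);
  const B = f(b);
  return [Math.min(A, B), Math.max(A, B)];
}

console.log(mapMonotonic((x) => x * x, [1, 3])); // [1, 9]: increasing here
console.log(mapMonotonic((x) => -x, [2, 5])); // [-5, -2]: decreasing
```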
+
+The non-monotonic functions that we need to handle for interval-to-interval
+mappings are more complex. Thankfully they are a small fraction of the overall
+operations, since they are only the operations that are used in an inherited
+accuracy and take in the output of another operation as part of that inherited
+accuracy.
+
+So in the CTS we just have bespoke implementations for each of them.
+
+Part of the operation definition in the CTS is a function that takes in the
+domain interval, and returns a sub-interval such that the subject function is
+monotonic over that sub-interval, and hence the function's minima and maxima are
+at the ends.
+
+This adjusted domain interval can then be fed through the same machinery as the
+monotonic functions.
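+
+To illustrate with a stand-in operation (squaring, which is not one of the
+CTS's inherited builtins, but whose single interior extremum at 0 makes the
+idea concrete): the domain is adjusted to a sub-interval on which the
+operation is monotonic, then fed through the endpoint machinery. All names
+here are illustrative.
+
```typescript
type IntervalT = [number, number];

// Endpoint rule for monotonic operations, as in the previous section.
const mapMonotonic = (f: (x: number) => number, [a, b]: IntervalT): IntervalT => {
  const A = f(a);
  const B = f(b);
  return [Math.min(A, B), Math.max(A, B)];
};

// x * x is monotonic on [a, b] unless the interval spans 0; in that case
// return a sub-interval whose endpoints produce the true min and max.
function adjustSquareDomain([a, b]: IntervalT): IntervalT {
  if (a >= 0 || b <= 0) return [a, b]; // already monotonic here
  return [0, Math.max(-a, b)]; // interior extremum at 0 becomes an endpoint
}

const square = (x: number) => x * x;
console.log(mapMonotonic(square, adjustSquareDomain([2, 3]))); // [4, 9]
console.log(mapMonotonic(square, adjustSquareDomain([-2, 3]))); // [0, 9]
```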
+
+### Inherited Accuracy
+
+So with all of that background out of the way, we can now define an inherited
+accuracy in terms of acceptance intervals.
+
+The crux of this is the insight that the range of one operation can become the
+domain of another operation to compose them together.
+
+And since we have defined how to do this interval to interval mapping above,
+transforming things becomes mechanical and thus implementable in reusable code.
+
+When talking about inherited accuracies, `f(x) => g(x)` is used to denote
+that `f`'s accuracy is defined as `g`.
+
+An example to illustrate inherited accuracies:
+
+```
+ tan(x) => sin(x)/cos(x)
+
+ sin(x) => [sin(x) - 2^-11, sin(x) + 2^-11]
+ cos(x) => [cos(x) - 2^-11, cos(x) + 2^-11]
+
+ x/y => [x/y - 2.5 * ULP(x/y), x/y + 2.5 * ULP(x/y)]
+```
+
+`sin(x)` and `cos(x)` are non-monotonic, so calculating out a closed generic
+form over an interval is a pain, since the min and max vary depending on the
+value of x. Let's isolate this to a single point, so you don't have to read
+literally pages of expanded intervals.
+
+```
+ x = π
+
+ sin(π) => [sin(π) - 2^-11, sin(π) + 2^-11]
+        => [0 - 2^-11, 0 + 2^-11]
+        => [-0.000488..., 0.000488...]
+ cos(π) => [cos(π) - 2^-11, cos(π) + 2^-11]
+        => [-1.000488..., -0.999511...]
+
+ tan(π) => sin(π)/cos(π)
+        => [-0.000488..., 0.000488...]/[-1.000488..., -0.999511...]
+        => [min({-0.000488.../-1.000488..., -0.000488.../-0.999511..., ...}),
+            max({-0.000488.../-1.000488..., -0.000488.../-0.999511..., ...})]
+        => [0.000488.../-0.999511..., -0.000488.../-0.999511...]
+        => [-0.000488519..., 0.000488519...]
+```
+
+For clarity this has omitted a bunch of complexity around FTZ behaviours, and
+that these operations are only defined for specific domains, but the high-level
+concepts hold.
+
+For each of the inherited operations we could implement a manually written-out
+closed-form solution, but that would be quite error-prone and would not re-use
+code between builtins.
+
+Instead, the CTS takes advantage of the fact that, in addition to testing
+implementations of `tan(x)`, we are going to be testing implementations of
+`sin(x)`, `cos(x)` and `x/y`, so there should be functions to generate
+acceptance intervals for those operations.
+
+The `tan(x)` acceptance interval can be constructed by generating the acceptance
+intervals for `sin(x)`, `cos(x)` and `x/y` via function calls and composing the
+results.
+
+This algorithmically looks something like this:
+
+```
+ tan(x):
+ Calculate sin(x) interval
+ Calculate cos(x) interval
+ Calculate sin(x) result divided by cos(x) result
+ Return division result
+```
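+
+A runnable TypeScript sketch of that composition, using the stand-in
+accuracies from the example above (names are hypothetical; FTZ, domain
+restrictions, and the 2.5 ULP expansion of division are elided):
+
```typescript
type IntervalT = [number, number];

// 2^-11 absolute-error accuracies for sin and cos, as in the example.
const EPS = 2 ** -11;
const sinInterval = (x: number): IntervalT =>
  [Math.sin(x) - EPS, Math.sin(x) + EPS];
const cosInterval = (x: number): IntervalT =>
  [Math.cos(x) - EPS, Math.cos(x) + EPS];

// Interval division: when the divisor interval excludes 0, the extremes
// are among the four endpoint quotients.
function divideIntervals([a, b]: IntervalT, [c, d]: IntervalT): IntervalT {
  if (c <= 0 && 0 <= d) throw new Error('divisor interval must exclude 0');
  const q = [a / c, a / d, b / c, b / d];
  return [Math.min(...q), Math.max(...q)];
}

// tan's acceptance interval is built by composing the generators,
// mirroring the pseudocode above.
function tanInterval(x: number): IntervalT {
  return divideIntervals(sinInterval(x), cosInterval(x));
}

// Near x = π the true tan is ~0, and the result interval brackets it.
const [lo, hi] = tanInterval(Math.PI);
console.log(lo < 0 && 0 < hi); // true
```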
+
+# Appendix
+
+### Significant f32 Values
+
+| Name | Decimal (~) | Hex | Sign Bit | Exponent Bits | Significand Bits |
+| ---------------------- | --------------: | ----------: | -------: | ------------: | ---------------------------: |
+| Negative Infinity | -∞ | 0xff80 0000 | 1 | 1111 1111 | 0000 0000 0000 0000 0000 000 |
+| Min Negative Normal | -3.40282346E38 | 0xff7f ffff | 1 | 1111 1110 | 1111 1111 1111 1111 1111 111 |
+| Max Negative Normal | -1.1754943E−38 | 0x8080 0000 | 1 | 0000 0001 | 0000 0000 0000 0000 0000 000 |
+| Min Negative Subnormal | -1.1754942E-38 | 0x807f ffff | 1 | 0000 0000 | 1111 1111 1111 1111 1111 111 |
+| Max Negative Subnormal | -1.4012984E−45 | 0x8000 0001 | 1 | 0000 0000 | 0000 0000 0000 0000 0000 001 |
+| Negative Zero | -0 | 0x8000 0000 | 1 | 0000 0000 | 0000 0000 0000 0000 0000 000 |
+| Positive Zero | 0 | 0x0000 0000 | 0 | 0000 0000 | 0000 0000 0000 0000 0000 000 |
+| Min Positive Subnormal | 1.4012984E−45 | 0x0000 0001 | 0 | 0000 0000 | 0000 0000 0000 0000 0000 001 |
+| Max Positive Subnormal | 1.1754942E-38 | 0x007f ffff | 0 | 0000 0000 | 1111 1111 1111 1111 1111 111 |
+| Min Positive Normal | 1.1754943E−38 | 0x0080 0000 | 0 | 0000 0001 | 0000 0000 0000 0000 0000 000 |
+| Max Positive Normal | 3.40282346E38 | 0x7f7f ffff | 0 | 1111 1110 | 1111 1111 1111 1111 1111 111 |
+| Positive Infinity | ∞ | 0x7f80 0000 | 0 | 1111 1111 | 0000 0000 0000 0000 0000 000 |
+
+# Resources
+- [WebGPU Spec](https://www.w3.org/TR/webgpu/)
+- [WGSL Spec](https://www.w3.org/TR/WGSL/)
+- [float32 on Wikipedia](https://en.wikipedia.org/wiki/Single-precision_floating-point_format)
+- [IEEE-754 Floating Point Converter](https://www.h-schmidt.net/FloatConverter/IEEE754.html)
+- [IEEE 754 Calculator](http://weitz.de/ieee/)
+- [Keisan High Precision Calculator](https://keisan.casio.com/calculator)
+- [On the definition of ulp(x)](https://hal.inria.fr/inria-00070503/document)