Diffstat (limited to 'dom/webgpu/tests/cts/checkout/docs/fp_primer.md')
-rw-r--r-- | dom/webgpu/tests/cts/checkout/docs/fp_primer.md | 516 |
1 file changed, 516 insertions, 0 deletions
diff --git a/dom/webgpu/tests/cts/checkout/docs/fp_primer.md b/dom/webgpu/tests/cts/checkout/docs/fp_primer.md
new file mode 100644
index 0000000000..234a43de40
--- /dev/null
+++ b/dom/webgpu/tests/cts/checkout/docs/fp_primer.md
@@ -0,0 +1,516 @@

# Floating Point Primer

This document is meant to be a primer on the floating point concepts that need
to be understood when working on tests in WebGPU's CTS.

WebGPU's CTS is responsible for testing whether implementations of WebGPU
conform to the spec, and are thus interoperable with each other.

Floating point math makes up a significant portion of the WGSL spec, and has
many subtle corner cases to get correct.

Additionally, floating point math, unlike integer math, is broadly not exact,
so how inaccurate a calculation is allowed to be must be stated in the spec and
tested in the CTS, as opposed to testing for a single correct answer.

Thus, the WebGPU CTS has a significant amount of machinery around how to
correctly test floating point expectations in a fluent manner.

## Floating Point Numbers

For the context of this discussion, floating point numbers, fp for short, are
single precision IEEE floating point numbers, f32 for short.

Details of how this format works are discussed as needed below, but for a more
involved discussion, please see the references in the Resources section.

Additionally, the Appendix contains a table of interesting/common values that
are often referenced in tests or this document.

*In the future, support for f16 and abstract floats will be added to the CTS,
and this document will need to be updated.*

Floating point numbers are effectively a lossy compression of the infinite
number of possible values over their range down to 32 bits of distinct points.

This means that not all numbers in the range can be exactly represented as a
f32.
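This is easy to see directly from TypeScript. A minimal sketch (the `bitsToF32` helper is illustrative, not part of the CTS API) that reinterprets u32 bit patterns as the f32 values they encode:

```typescript
// Sketch: reinterpret a u32 bit pattern as the f32 value it encodes.
// DataView defaults to big-endian, so set and get use matching byte order.
function bitsToF32(bits: number): number {
  const view = new DataView(new ArrayBuffer(4));
  view.setUint32(0, bits);
  return view.getFloat32(0);
}

console.log(bitsToF32(0x3f800000)); // 1: the integer 1 is exactly representable
console.log(bitsToF32(0x3f800001)); // 1.0000001192092896: the next f32 above 1
```

Any real number strictly between those two printed values has no exact f32 representation.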
For example, the integer `1` is exactly represented as `0x3f800000`, but the
next nearest number, `0x3f800001`, is `1.00000011920928955`.

So any number between `1` and `1.00000011920928955` is not exactly represented
as a f32 and instead is approximated as either `1` or `1.00000011920928955`.

When a number X is not exactly represented by a f32 value, there are normally
two neighbouring numbers that could reasonably represent X: the nearest f32
value above X, and the nearest f32 value below X. Which of these values gets
used is dictated by the rounding mode being used, which may be something like
always round towards 0, go to the nearest neighbour, or something else
entirely.

The process of converting numbers between precisions, like non-f32 to f32, is
called quantization. WGSL does not prescribe a specific rounding mode when
quantizing, so either of the neighbouring values is considered valid when
converting a non-exactly-representable value to f32. This has significant
implications for the CTS that are discussed later.

From here on, we assume you are familiar with the internal structure of a f32
value: a sign bit, a biased exponent, and a mantissa. For reference, see
[float32 on Wikipedia](https://en.wikipedia.org/wiki/Single-precision_floating-point_format).

In the f32 format as described above, there are two possible zero values: one
with all bits being 0, called positive zero, and one all the same except with
the sign bit being 1, called negative zero.

For WGSL's, and thus the CTS's, purposes these values are considered
equivalent. TypeScript, which the CTS is written in, treats all zeros as
positive zeros unless you explicitly use an escape hatch to differentiate
between them, so most of the time the existence of two zeros doesn't
materially affect code.

### Normals

Normal numbers are floating point numbers whose biased exponent is not all 0s
or all 1s.
For WGSL these numbers behave as you expect for floating point values, with no
interesting caveats.

### Subnormals

Subnormal numbers, also called denorms, are numbers whose biased exponent is
all 0s.

These are the closest numbers to zero, both positive and negative, and fill in
the gap between the normal numbers with the smallest magnitude and 0.

Some devices, for performance reasons, do not handle operations on subnormal
numbers, and instead treat them as being zero; this is called *flush to zero*
or FTZ behaviour.

For the CTS this means that when a subnormal number is consumed or produced by
an operation, an implementation may choose to replace it with zero.

Like the rounding mode for quantization, this adds significant complexity to
the CTS, which will be discussed later.

### Inf & NaNs

Floating point numbers include positive and negative infinity to represent
values that are out of the bounds supported by the current precision.

Implementations may assume that infinities are not present. When an evaluation
would produce an infinity, an undefined value is produced instead.

Additionally, when a calculation would produce a finite value outside the
bounds of the current precision, the implementation may convert that value to
either an infinity with the same sign, or the min/max representable value, as
appropriate.

The CTS encodes the least restrictive interpretation of the rules in the spec,
i.e. it assumes someone has made a slightly adversarial implementation that
always chooses the option with the least accuracy.

This means that the above rules about infinities combine to say that any time
an out of bounds value is seen, any finite value is acceptable afterwards.

This is because the out of bounds value may be converted to an infinity, and
then an undefined value can be used instead of the infinity.
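One way this "anything finite is acceptable after going out of bounds" rule can be encoded is sketched below; the helper and its name are illustrative assumptions, not the CTS's actual API:

```typescript
// Largest finite f32 magnitude (Max Positive Normal from the appendix table).
const kF32Max = 3.4028234663852886e38;

// Sketch: if a computed acceptance bound went out of f32 bounds (or is NaN),
// the least restrictive reading of the spec admits any finite f32 value.
function widenIfOutOfBounds(lo: number, hi: number): [number, number] {
  const inBounds = Math.abs(lo) <= kF32Max && Math.abs(hi) <= kF32Max;
  return inBounds ? [lo, hi] : [-kF32Max, kF32Max];
}

console.log(widenIfOutOfBounds(1, 2));        // stays [1, 2]
console.log(widenIfOutOfBounds(1, Infinity)); // widened to all finite f32 values
```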
This is actually a significant boon for the CTS implementation, because it
short-circuits a bunch of complexity about clamping to edge values and
handling infinities.

Signaling NaNs are treated as quiet NaNs in the WGSL spec. And quiet NaNs have
the same "may convert to undefined value" behaviour that infinities have, so
for the purposes of the CTS they are handled by the infinite/out of bounds
logic normally.

## Notation/Terminology

When discussing floating point values in the CTS, there are a few terms used
with precise meanings, which will be elaborated here.

Additionally, any specific notation used will be specified here to avoid
confusion.

### Operations

The CTS tests for the proper execution of f32 builtins, i.e. sin, sqrt, abs,
etc., and expressions, i.e. *, /, <, etc. These can collectively be referred
to as f32 operations.

Operations, which can be thought of as mathematical functions, are mappings
from a set of inputs to a set of outputs.

These are denoted `f(x, y) = X`, where `f` is a placeholder or the name of the
operation, lower case variables are the inputs to the function, and upper case
variables are the outputs of the function.

Operations have one or more inputs and an output. Being a f32 operation means
that the primary space for input and output values is f32, but there is some
flexibility in this definition. For example, operations whose values are
restricted to a subset of the integers that are representable as f32 are often
referred to as being f32-based.

Values are generally floats, integers, booleans, vectors, and matrices.
Consult the WGSL spec for the exact list of types and their definitions.

For composite outputs, where there are multiple values being returned, there
is a single result value made of structured data, whereas inputs handle this
by having multiple input parameters.
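The notation above maps naturally onto plain functions. A sketch in TypeScript (the type aliases and names are illustrative, not the CTS's actual definitions):

```typescript
// Sketch: f32 operations as mappings from inputs to an output. Inputs are
// assumed to already be f32 values; Math.fround quantizes the f64 result of
// the arithmetic back to the nearest f32.
type BinaryToScalarOp = (x: number, y: number) => number;

// multiplication(x, y) = X, representing the WGSL expression `x * y`:
const multiplication: BinaryToScalarOp = (x, y) => Math.fround(x * y);

// An operation need not return a float: lessThan(x, y) = X takes f32 inputs
// but produces a boolean.
const lessThan = (x: number, y: number): boolean => x < y;

console.log(multiplication(3, 2)); // 6
console.log(lessThan(3, 2));       // false
```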
Some examples of different types of operations:

`multiplication(x, y) = X`, which represents the WGSL expression `x * y`,
takes in f32 values, `x` and `y`, and produces a f32 value, `X`.

`lessThan(x, y) = X`, which represents the WGSL expression `x < y`, again
takes in f32 values, but in this case returns a boolean value.

`ldexp(x, y) = X`, which builds a f32 value, takes in a f32 value `x` and a
restricted integer `y`.

### Domain, Range, and Intervals

For an operation `f(x) = X`, the interval of valid values for the input, `x`,
is called the *domain*, and the interval of valid results, `X`, is called the
*range*.

An interval, `[a, b]`, is a set of real numbers that contains `a`, `b`, and
all the real numbers between them.

Open-ended intervals, i.e. ones that don't include `a` and/or `b`, are
avoided, and are called out explicitly when they occur.

The convention in this doc and the CTS code is that `a <= b`, so `a` can be
referred to as the beginning of the interval and `b` as the end of the
interval.

When talking about intervals, this doc and the code endeavour to avoid using
the term **range** to refer to the span of values that an interval covers,
instead using the term bounds, to avoid confusion with the terminology around
the output of operations.

## Accuracy

As mentioned above, floating point numbers are not able to represent all the
possible values over their bounds, but instead represent discrete values in
that interval, and approximate the remainder.

Additionally, floating point numbers are not evenly distributed over the real
number line, but instead are clustered closer together near zero, and further
apart as their magnitudes grow.

When discussing operations on floating point numbers, there is often a
reference to a true value. This is the value that, given no performance
constraints and infinite precision, you would get, e.g. `acos(-1) = π`, where
π has infinite digits of precision.
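Since TypeScript's `number` is an f64, it can hold true values to more precision than any f32, as a quick sketch shows (`Math.fround` quantizes a number to the nearest f32):

```typescript
// π computed at f64 precision stands in for the infinitely precise true value.
const trueValue = Math.acos(-1); // 3.141592653589793

// Quantizing to f32 loses precision: π is not exactly representable as a f32.
const asF32 = Math.fround(trueValue);

console.log(asF32 === trueValue); // false: the nearest f32 differs from f64 π
```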
For the CTS it is often sufficient to calculate the true value using
TypeScript, since its native number format is higher precision
(double-precision/f64), and all f32 values can be represented in it.

The true value is sometimes representable exactly as a f32 value, but often is
not.

Additionally, many operations are implemented using approximations from
numerical analysis, where there is a tradeoff between the precision of the
result and the cost.

Thus, the spec specifies what the accuracy constraints for specific operations
are, i.e. how close to the true value an implementation is required to be, to
be considered conformant.

There are 5 different ways that accuracy requirements are defined in the spec:

1. *Exact*

   This is the situation where the true value for an operation is always
   expected to be exactly representable. This doesn't happen for any of the
   operations that return floating point values, but does occur for logical
   operations that return boolean values.

2. *Correctly Rounded*

   For the case that the true value is exactly representable as a f32, this
   is equivalent to *Exact* from above. In the event that the true value is
   not exact, then the acceptable answer for most numbers is either the
   nearest f32 above or the nearest f32 below the true value.

   For values near the subnormal range, e.g. close to zero, this becomes more
   complex, since an implementation may FTZ at any point. So if the exact
   solution is subnormal, or either of the neighbours of the true value is
   subnormal, zero becomes a possible result, thus the acceptance interval is
   wider than naively expected.

3. *Absolute Error*

   This type of accuracy specifies an error value, ε, and the calculated
   result is expected to be within that distance from the true value, i.e.
   `[X - ε, X + ε]`.
   The main drawback of this manner of specifying accuracy is that it doesn't
   scale with the level of precision of the floating point numbers themselves
   at a specific value. Thus, it tends to only be used for specifying accuracy
   over specific limited intervals, i.e. `[-π, π]`.

4. *Units of Least Precision (ULP)*

   The solution to the issue of not scaling with the precision of floating
   point numbers is to use units of least precision.

   ULP(X) is min(b - a) over all pairs (a, b) of representable floating point
   numbers such that a <= X <= b and a =/= b. For a more formal discussion of
   ULP see
   [On the definition of ulp(x)](https://hal.inria.fr/inria-00070503/document).

   n * ULP, or nULP, means `[X - n * ULP @ X, X + n * ULP @ X]`.

5. *Inherited*

   When an operation's accuracy is defined in terms of other operations, its
   accuracy is said to be inherited. Handling of inherited accuracies is one
   of the main driving factors in the design of the testing framework, so it
   will need to be discussed in detail.

## Acceptance Intervals

The first four accuracy types (Exact, Correctly Rounded, Absolute Error, and
ULP), sometimes called simple accuracies, can be defined in isolation from
each other, and by association can be implemented using relatively independent
implementations.

The original implementation of the floating point framework did this as it was
being built out, but ran into difficulties when defining the inherited
accuracies.

For example, for `tan(x) inherits from sin(x)/cos(x)`, one can take the
defined rules and manually build up a bespoke solution for checking the
results, but this is tedious, error-prone, and doesn't allow for code re-use.

Instead, it would be better if there was a single conceptual framework in
which one can express all the 'simple' accuracy requirements, and then have a
mechanism for composing them to define inherited accuracies.
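As a concrete illustration of the ULP definition above, a minimal sketch for finite, positive, normal f32 inputs (a simplification: the CTS's real implementation must also handle zeros, subnormals, and infinities):

```typescript
// Bit-level helpers to step between neighbouring f32 values.
function f32ToBits(x: number): number {
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, x);
  return view.getUint32(0);
}

function bitsToF32(bits: number): number {
  const view = new DataView(new ArrayBuffer(4));
  view.setUint32(0, bits);
  return view.getFloat32(0);
}

// ULP(x): the smallest gap (b - a) over representable pairs a <= x <= b with
// a =/= b. Only valid for finite, positive, normal f32 inputs in this sketch.
function ulpF32(x: number): number {
  const up = bitsToF32(f32ToBits(x) + 1) - x;   // gap to the next f32 above x
  const down = x - bitsToF32(f32ToBits(x) - 1); // gap to the next f32 below x
  return Math.min(up, down);
}

console.log(ulpF32(1) === Math.pow(2, -24)); // true: 1 sits on an exponent boundary
```

Note that at a power of two the gap below is half the gap above, which is why taking the min over pairs matters.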
In the WebGPU CTS this is done via the concept of acceptance intervals, which
is derived from a similar concept in the Vulkan CTS, though implemented
significantly differently.

The core of this idea is that each of the different accuracy types can be
integrated into the definition of the operation, so that instead of
transforming an input from the domain to a point in the range, the operation
produces an interval in the range containing the acceptable values an
implementation may emit.

The simple accuracies can be defined as follows:

1. *Exact*

   `f(x) => [X, X]`

2. *Correctly Rounded*

   If `X` is precisely defined as a f32,

   `f(x) => [X, X]`

   otherwise,

   `[a, b]`, where `a` is the largest representable number with `a <= X`, and
   `b` is the smallest representable number with `X <= b`

3. *Absolute Error*

   `f(x) => [X - ε, X + ε]`, where ε is the absolute error value

4. *ULP Error*

   `f(x) = X => [X - n*ULP(X), X + n*ULP(X)]`

As defined, these definitions handle mapping from a point in the domain to an
interval in the range.

This is insufficient for implementing inherited accuracies, since inheritance
sometimes involves mapping domain intervals to range intervals.

Here we use the convention for naturally extending a function on real numbers
into a function on intervals of real numbers, i.e. `f([a, b]) = [A, B]`.

Given that floating point numbers have a finite number of precise values for
any given interval, one could implement this by just running the accuracy
computation for every point in the interval and then spanning together the
resultant intervals. That would be very inefficient though, and make your
reviewer sad to read.

For mapping intervals to intervals, the key insight is that we only need to be
concerned with the extrema of the operation in the interval, since the
acceptance interval is the bounds of the possible outputs.
In more precise terms:
```
  f(x) => X, x = [a, b] and X = [A, B]

  X = [min(f(x)), max(f(x))]
  X = [min(f([a, b])), max(f([a, b]))]
  X = [f(m), f(M)]
```
where `m` and `M` are in `[a, b]`, `m <= M`, and produce the min and max
results for `f` on the interval, respectively.

So how do we find the minima and maxima for our operation in the domain?

The common general solution for this requires using calculus to calculate the
derivative of `f`, `f'`, and then finding the zeroes of `f'` to find the
inflection points of `f`.

This solution wouldn't be sufficient for all builtins, e.g. `step`, which is
not differentiable at 'edge' values.

Thankfully we do not need a general solution for the CTS, since all the
builtin operations are defined in the spec, so `f` is from a known set of
options.

These operations can be divided into two broad categories: monotonic and
non-monotonic, with respect to an interval.

The monotonic operations are ones that preserve the order of inputs in their
outputs (or reverse it). Their graph only ever decreases or increases, never
changing from one to the other, though it can have flat sections.

The non-monotonic operations are ones whose graph has both regions of increase
and regions of decrease.

The monotonic operations, when mapping an interval to an interval, are simple
to handle, since the extrema are guaranteed to be at the ends of the domain,
`a` and `b`.

So `f([a, b])` = `[f(a), f(b)]` or `[f(b), f(a)]`. We could figure out whether
`f` is increasing or decreasing beforehand to determine if the result should
be `[f(a), f(b)]` or `[f(b), f(a)]`.

It is simpler to just use min & max, to have an implementation that is
agnostic to the details of `f`:
```
  A = f(a), B = f(b)
  X = [min(A, B), max(A, B)]
```

The non-monotonic functions that we need to handle for interval-to-interval
mappings are more complex.
Thankfully there are only a small number of the overall operations that need
to be handled, since they are only the operations that are used in an
inherited accuracy and take in the output of another operation as part of that
inherited accuracy.

So in the CTS we just have bespoke implementations for each of them.

Part of the operation definition in the CTS is a function that takes in the
domain interval and returns a sub-interval such that the subject function is
monotonic over that sub-interval, and hence the function's minima and maxima
are at its ends.

This adjusted domain interval can then be fed through the same machinery as
the monotonic functions.

### Inherited Accuracy

So with all of that background out of the way, we can now define an inherited
accuracy in terms of acceptance intervals.

The crux of this is the insight that the range of one operation can become the
domain of another operation to compose them together.

And since we have defined how to do this interval-to-interval mapping above,
transforming things becomes mechanical, and thus implementable in reusable
code.

When talking about inherited accuracies, `f(x) => g(x)` is used to denote that
`f`'s accuracy is defined as `g`.

An example to illustrate inherited accuracies:

```
  tan(x) => sin(x)/cos(x)

  sin(x) => [sin(x) - 2^-11, sin(x) + 2^-11]
  cos(x) => [cos(x) - 2^-11, cos(x) + 2^-11]

  x/y => [x/y - 2.5 * ULP(x/y), x/y + 2.5 * ULP(x/y)]
```

`sin(x)` and `cos(x)` are non-monotonic, so calculating out a closed generic
form over an interval is a pain, since the min and max vary depending on the
value of x. Let's isolate this to a single point, so you don't have to read
literally pages of expanded intervals.

```
  x = π

  sin(π) => [sin(π) - 2^-11, sin(π) + 2^-11]
         => [0 - 2^-11, 0 + 2^-11]
         => [-0.000488..., 0.000488...]
  cos(π) => [cos(π) - 2^-11, cos(π) + 2^-11]
         => [-1.000488..., -0.999511...]

  tan(π) => sin(π)/cos(π)
         => [-0.000488..., 0.000488...]/[-1.000488..., -0.999511...]
         => [min({-0.000488.../-1.000488..., -0.000488.../-0.999511..., ...}),
             max({-0.000488.../-1.000488..., -0.000488.../-0.999511..., ...})]
         => [0.000488.../-0.999511..., -0.000488.../-0.999511...]
         => [-0.0004885..., 0.0004885...]
```

For clarity this has omitted a bunch of complexity, around FTZ behaviours and
the fact that these operations are only defined for specific domains, but the
high-level concepts hold.

For each of the inherited operations we could implement a manually written-out
closed-form solution, but that would be quite error-prone and wouldn't re-use
code between builtins.

Instead, the CTS takes advantage of the fact that, in addition to testing
implementations of `tan(x)`, we are going to be testing implementations of
`sin(x)`, `cos(x)`, and `x/y`, so there should be functions to generate
acceptance intervals for those operations.

The `tan(x)` acceptance interval can be constructed by generating the
acceptance intervals for `sin(x)`, `cos(x)`, and `x/y` via function calls and
composing the results.
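A sketch of that composition in TypeScript, using hypothetical `sinInterval`, `cosInterval`, and `divisionInterval` helpers (the CTS's actual functions differ in naming and in handling of FTZ, domains, and ULP widening):

```typescript
interface Interval { begin: number; end: number; }

// Hypothetical simple-accuracy helpers: absolute error of 2^-11 for sin/cos.
const ABS_ERROR = Math.pow(2, -11);
const sinInterval = (x: number): Interval =>
  ({ begin: Math.sin(x) - ABS_ERROR, end: Math.sin(x) + ABS_ERROR });
const cosInterval = (x: number): Interval =>
  ({ begin: Math.cos(x) - ABS_ERROR, end: Math.cos(x) + ABS_ERROR });

// Interval division via the extrema of the endpoint quotients. This sketch
// assumes the divisor interval does not straddle zero and omits ULP widening.
function divisionInterval(n: Interval, d: Interval): Interval {
  const q = [n.begin / d.begin, n.begin / d.end, n.end / d.begin, n.end / d.end];
  return { begin: Math.min(...q), end: Math.max(...q) };
}

// tan(x)'s inherited accuracy, built by composing the helpers above.
function tanInterval(x: number): Interval {
  return divisionInterval(sinInterval(x), cosInterval(x));
}
```

The composition itself contains no tan-specific interval math; each constituent operation's acceptance-interval generator is reused as-is.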
This algorithmically looks something like this:

```
  tan(x):
    Calculate sin(x) interval
    Calculate cos(x) interval
    Calculate sin(x) result divided by cos(x) result
    Return division result
```

# Appendix

### Significant f32 Values

|          Name          |  Decimal (~)   |     Hex     | Sign Bit | Exponent Bits |       Significand Bits       |
| ---------------------- | -------------: | ----------: | -------: | ------------: | ---------------------------: |
| Negative Infinity      | -∞             | 0xff80 0000 |        1 |     1111 1111 | 0000 0000 0000 0000 0000 000 |
| Min Negative Normal    | -3.40282346E38 | 0xff7f ffff |        1 |     1111 1110 | 1111 1111 1111 1111 1111 111 |
| Max Negative Normal    | -1.1754943E-38 | 0x8080 0000 |        1 |     0000 0001 | 0000 0000 0000 0000 0000 000 |
| Min Negative Subnormal | -1.1754942E-38 | 0x807f ffff |        1 |     0000 0000 | 1111 1111 1111 1111 1111 111 |
| Max Negative Subnormal | -1.4012984E-45 | 0x8000 0001 |        1 |     0000 0000 | 0000 0000 0000 0000 0000 001 |
| Negative Zero          | -0             | 0x8000 0000 |        1 |     0000 0000 | 0000 0000 0000 0000 0000 000 |
| Positive Zero          | 0              | 0x0000 0000 |        0 |     0000 0000 | 0000 0000 0000 0000 0000 000 |
| Min Positive Subnormal | 1.4012984E-45  | 0x0000 0001 |        0 |     0000 0000 | 0000 0000 0000 0000 0000 001 |
| Max Positive Subnormal | 1.1754942E-38  | 0x007f ffff |        0 |     0000 0000 | 1111 1111 1111 1111 1111 111 |
| Min Positive Normal    | 1.1754943E-38  | 0x0080 0000 |        0 |     0000 0001 | 0000 0000 0000 0000 0000 000 |
| Max Positive Normal    | 3.40282346E38  | 0x7f7f ffff |        0 |     1111 1110 | 1111 1111 1111 1111 1111 111 |
| Positive Infinity      | ∞              | 0x7f80 0000 |        0 |     1111 1111 | 0000 0000 0000 0000 0000 000 |

# Resources
- [WebGPU Spec](https://www.w3.org/TR/webgpu/)
- [WGSL Spec](https://www.w3.org/TR/WGSL/)
- [float32 on Wikipedia](https://en.wikipedia.org/wiki/Single-precision_floating-point_format)
- [IEEE-754 Floating Point Converter](https://www.h-schmidt.net/FloatConverter/IEEE754.html)
- [IEEE 754 Calculator](http://weitz.de/ieee/)
- [Keisan High Precision Calculator](https://keisan.casio.com/calculator)
- [On the definition of ulp(x)](https://hal.inria.fr/inria-00070503/document)