# swgl
Software OpenGL implementation for WebRender
## Overview
This is a relatively simple single-threaded software rasterizer designed
for use by WebRender. It shades one quad at a time using a 4xf32 vector
with one vertex per lane. It rasterizes quads using spans and shades each
span 4 pixels at a time.
## Building
clang-cl is required to build on Windows. Install the LLVM binaries from
https://releases.llvm.org/ and add the installation to the path with
something like `set PATH=%PATH%;C:\Program Files\LLVM\bin`. Then
`set CC=clang-cl` and `set CXX=clang-cl`. That should be sufficient
for `cc-rs` to use `clang-cl` instead of `cl`.
## Extensions
SWGL contains a number of OpenGL and GLSL extensions designed both to ease
integration with WebRender and to help accelerate span rasterization.

GLSL extension intrinsics are generally prefixed with `swgl_` to distinguish
them from other items in the GLSL namespace.

Inside GLSL, the `SWGL` preprocessor token is defined so that usage of SWGL
extensions may be conditionally compiled.
```
void swgl_drawSpanRGBA8();
void swgl_drawSpanR8();
int swgl_SpanLength;
int swgl_StepSize;
mixed swgl_interpStep(mixed varying_input);
void swgl_stepInterp();
```
SWGL's default fragment processing calls a fragment shader's `main` function
on groups of fragments in units of `swgl_StepSize`. On return, the value of
`gl_FragColor` is read, packed to an appropriate pixel format, and sent to the
blend stage for output to the destination framebuffer. This can be inefficient
for some types of fragment shaders, such as those that must look up a texel
from a texture and immediately output it, unpacking the texels only to
subsequently repack them at cost. Also, various per-fragment conditions in the
shader might need to be repeatedly checked, even though they are actually
constant over the entire primitive.

To work around this inefficiency, SWGL allows fragments to optionally be
processed over entire spans. This can both side-step the packing inefficiency
and deal more efficiently with conditions that remain constant over an
entire span. SWGL also introduces various specialized intrinsics for more
efficiently dealing with certain types of primitive spans with optimal
fixed-function processing.

Inside a fragment shader, a `swgl_drawSpan` function may be defined to override
the normal fragment processing for that fragment shader. The function must then
call some form of `swgl_commit` intrinsic to actually output to the destination
framebuffer via the blend stage, because the normal fragment processing that
would have followed `main` does not otherwise take place. This function is used
by the rasterizer to process an entire span of fragments that have passed the
depth test (if applicable) and clipping, but have not yet been output to the
blend stage.

The number of fragments within the span to be processed is designated by
`swgl_SpanLength` and is always aligned to units of `swgl_StepSize`.

`swgl_StepSize` designates the size of the group of fragments that the
`swgl_commit` intrinsics process and output at a time. The `swgl_commit`
intrinsics deduct accordingly from `swgl_SpanLength` in units of
`swgl_StepSize` to reflect the fragments actually processed, which may be
anything up to the entire span.

Fragments should be output until `swgl_SpanLength` becomes zero to process the
entire span. If `swgl_drawSpan` returns while leaving any fragments unprocessed,
the remaining fragments will be processed as normal by the fragment shader's
`main` function. This can be used to conditionally handle certain fast-paths
in a fragment shader by otherwise defaulting to the `main` function if
`swgl_drawSpan` can't appropriately process some or all of the fragments.
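
The span bookkeeping described above can be modeled outside of GLSL. The following Python sketch is only an illustration of the documented behavior, not SWGL's implementation: the `Span` class, `rasterize_span`, and `fast_path_first_two` are hypothetical names, and a step size of 4 is assumed from the Overview.

```python
STEP_SIZE = 4  # assumed value of swgl_StepSize (the Overview shades 4 pixels at a time)


class Span:
    """Hypothetical stand-in for one span; not SWGL's real data structure."""

    def __init__(self, length):
        if length % STEP_SIZE != 0:
            raise ValueError("span length must be a multiple of STEP_SIZE")
        self.length = length  # plays the role of swgl_SpanLength
        self.log = []         # which path shaded each group of fragments

    def commit(self, groups, path):
        """Model of a swgl_commit intrinsic: output groups, deduct from length."""
        groups = min(groups, self.length // STEP_SIZE)
        self.log.extend([path] * groups)
        self.length -= groups * STEP_SIZE


def rasterize_span(length, draw_span):
    span = Span(length)
    draw_span(span)          # optional fast path, may stop early
    while span.length > 0:   # leftovers take the normal per-group path
        span.commit(1, "main")
    return span.log


def fast_path_first_two(span):
    # A fast path that only handles the first two groups of the span.
    span.commit(2, "drawSpan")
```

In this model, `rasterize_span(16, fast_path_first_two)` shades the first two groups through the fast path and leaves the last two to the `main` path, mirroring the fallback rule above.
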
The values of any varying inputs to the fragment shader will be set to their
values for the start of the span, but do not automatically update over the
course of a span within a given call to `swgl_drawSpan`. The
`swgl_interpStep` intrinsic may be used to get the derivative per `swgl_StepSize`
group of fragments of a varying input so that the caller may update such
variables manually if desired or otherwise use that information for processing.
The `swgl_stepInterp` intrinsic forces all such varying inputs to advance by
a single step.
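
As a rough illustration of how varyings advance over a span, the sketch below models both intrinsics in Python. It assumes simple linear interpolation (no perspective correction) and a step size of 4; the function names and the dictionary of varyings are hypothetical and only mirror the roles of `swgl_interpStep` and `swgl_stepInterp`.

```python
STEP_SIZE = 4  # assumed value of swgl_StepSize


def interp_step(per_pixel_delta):
    """Model of swgl_interpStep: change in a varying across one group."""
    return per_pixel_delta * STEP_SIZE


def step_interp(varyings, per_pixel_deltas):
    """Model of swgl_stepInterp: advance every varying by one step."""
    return {name: value + interp_step(per_pixel_deltas[name])
            for name, value in varyings.items()}


def walk_span(varyings, per_pixel_deltas, span_length):
    """Return the varying values seen at the start of each group."""
    seen = []
    while span_length > 0:
        seen.append(dict(varyings))
        varyings = step_interp(varyings, per_pixel_deltas)
        span_length -= STEP_SIZE
    return seen
```

A varying that changes by 0.25 per pixel therefore changes by 1.0 per group in this model, so a 12-fragment span starting at 0.0 sees 0.0, 1.0, and 2.0 at the start of its three groups.
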
The RGBA8 version of `swgl_drawSpan` will be used when the destination
framebuffer is RGBA8 format, and the R8 version will be used when the
destination framebuffer is R8. Various other intrinsics described below may be
restricted to a certain destination framebuffer format and are noted as such.
```
void swgl_clipMask(sampler2D mask, vec2 offset, vec2 bb_origin, vec2 bb_size);
```
When called from the vertex shader, this specifies a clip mask texture to
be used to mask the currently drawn primitive while blending is enabled. This
mask will only apply to the current primitive.

The mask must be an R8 texture that will be interpreted as alpha weighting
applied to the source pixel prior to the blend stage. It is sampled 1:1 with
nearest filtering without any applied transform. The given offset specifies
the positioning of the clip mask relative to the framebuffer's viewport.

The supplied bounding box constrains sampling of the clip mask to only fall
within the given rectangle, specified relative to the clip mask offset.
Anything falling outside this rectangle will be clipped entirely. If the
rectangle is empty, then the clip mask will be ignored.
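
The lookup rules above can be sketched per pixel as follows. This is a hypothetical Python model, not SWGL code: the mask is a list of rows of 0..255 values, the sign convention of `offset` (subtracted from the framebuffer pixel position) is an assumption, and the bounding box is assumed to lie within the mask texture.

```python
def clip_weight(mask, offset, bb_origin, bb_size, pixel):
    """Alpha weight in 0..1 applied to the source pixel at `pixel`."""
    bb_width, bb_height = bb_size
    if bb_width <= 0 or bb_height <= 0:
        return 1.0  # empty rectangle: the clip mask is ignored

    # 1:1 mapping from framebuffer pixel to mask texel (assumed sign convention).
    mask_x = pixel[0] - offset[0]
    mask_y = pixel[1] - offset[1]

    inside = (bb_origin[0] <= mask_x < bb_origin[0] + bb_width and
              bb_origin[1] <= mask_y < bb_origin[1] + bb_height)
    if not inside:
        return 0.0  # outside the bounding box: clipped entirely

    return mask[mask_y][mask_x] / 255.0  # nearest sample, no transform
```
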
```
void swgl_antiAlias(int edgeMask);
```
When called from the vertex shader, this enables anti-aliasing for the
currently drawn primitive while blending is enabled. This setting will only
apply to the current primitive. Anti-aliasing will be applied only to the
edges corresponding to bits supplied in the mask. For simple use-cases,
the edge mask can be set to all 1 bits to enable AA for the entire quad.

The order of the bits in the edge mask must match the winding order in which
the vertices are output in the vertex shader if processed as a quad, so that
the edge ends on that vertex. The easiest way to understand this ordering
is that for a rectangle (x0,y0,x1,y1), the Nth edge bit corresponds
to the edge where the Nth coordinate in the rectangle is constant.
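
To make the bit ordering concrete, here is a small Python sketch that decodes an edge mask using the rectangle rule above. The `EDGE_NAMES` table and `aa_edges` helper are hypothetical and exist only for illustration; edges are named by the coordinate that stays constant along them.

```python
# Bit N selects the edge along which the Nth coordinate of (x0, y0, x1, y1)
# is constant.
EDGE_NAMES = ("x = x0", "y = y0", "x = x1", "y = y1")


def aa_edges(edge_mask):
    """List the edges of a quad that an edge mask enables AA for."""
    return [name for bit, name in enumerate(EDGE_NAMES)
            if edge_mask & (1 << bit)]
```

For example, a mask of `0b0101` selects the two edges where x is constant, and `0b1111` selects all four.
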
SWGL tries to use an anti-aliasing method that is reasonably close to
WebRender's (WR's) signed-distance field approximation. WR would normally try
to discern the 2D local-space coordinates of a given destination pixel relative
to the 2D local-space bounding rectangle of a primitive. It then uses the
screen-space derivative to try to determine how many local-space units equate
to a distance of around one screen-space pixel. A distance approximation
of coverage is then used based on the distance in local-space from the
current pixel's center, roughly at half-intensity at pixel center
and ranging to zero or full intensity within a radius of half a pixel
away from the center. To account for AAing going outside the normal geometry
boundaries of the primitive, WR has to extrude the primitive by a local-space
estimate to allow some AA to happen within the extruded region.

SWGL can ultimately do this approximation more simply and get around the
extrusion limitations by just ensuring spans encompass any pixel that is
partially covered when computing span boundaries. Further, since SWGL already
knows the slope of an edge and the coordinate of the span relative to the span
boundaries, finding the partial coverage of a given span becomes easy to do
without requiring any extra interpolants to track against local-space bounds.

Essentially, SWGL just performs anti-aliasing on the actual geometry bounds,
but when the pixels on a span's edge are determined to be partially covered
during span rasterization, it uses the same distance field method as WR on
those span boundary pixels to estimate the coverage based on edge slope.
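
One way to picture the coverage estimate is the Python sketch below. It is an assumption-laden model rather than SWGL's actual code: the edge is described by its x position on the current scanline and its slope (dx/dy), the horizontal offset is converted to a perpendicular distance, and coverage is half at the edge and saturates half a pixel away, as described above. The `edge_coverage` function and its parameters are hypothetical.

```python
import math


def edge_coverage(pixel_center_x, edge_x, edge_slope, inside_is_right=True):
    """Estimate coverage of a span boundary pixel from its distance to an edge.

    edge_x is where the edge crosses this scanline and edge_slope is dx/dy.
    """
    horizontal = pixel_center_x - edge_x
    if not inside_is_right:
        horizontal = -horizontal
    # Perpendicular distance from the pixel center to the edge, in pixels.
    distance = horizontal / math.sqrt(1.0 + edge_slope * edge_slope)
    # Half coverage on the edge, reaching 0 or 1 half a pixel away.
    return min(1.0, max(0.0, 0.5 + distance))
```

With a vertical edge, a pixel whose center lies a quarter pixel inside the edge gets a coverage of 0.75 in this model; a slanted edge reduces the perpendicular distance and so softens the transition along the span.
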
```
void swgl_commitTextureLinearRGBA8(sampler, vec2 uv, vec4 uv_bounds);
void swgl_commitTextureLinearR8(sampler, vec2 uv, vec4 uv_bounds);
void swgl_commitTextureLinearR8ToRGBA8(sampler, vec2 uv, vec4 uv_bounds);
void swgl_commitTextureLinearColorRGBA8(sampler, vec2 uv, vec4 uv_bounds, vec4|float color);
void swgl_commitTextureLinearColorR8(sampler, vec2 uv, vec4 uv_bounds, vec4|float color);
void swgl_commitTextureLinearColorR8ToRGBA8(sampler, vec2 uv, vec4 uv_bounds, vec4|float color);
void swgl_commitTextureLinearRepeatRGBA8(sampler, vec2 uv, vec2 tile_repeat, vec4 uv_repeat, vec4 uv_bounds);
void swgl_commitTextureLinearRepeatColorRGBA8(sampler, vec2 uv, vec2 tile_repeat, vec4 uv_repeat, vec4 uv_bounds, vec4|float color);
void swgl_commitTextureNearestRGBA8(sampler, vec2 uv, vec4 uv_bounds);
void swgl_commitTextureNearestColorRGBA8(sampler, vec2 uv, vec4 uv_bounds, vec4|float color);
void swgl_commitTextureNearestRepeatRGBA8(sampler, vec2 uv, vec2 tile_repeat, vec4 uv_repeat, vec4 uv_bounds);
void swgl_commitTextureNearestRepeatColorRGBA8(sampler, vec2 uv, vec2 tile_repeat, vec4 uv_repeat, vec4 uv_bounds, vec4|float color);
void swgl_commitTextureRGBA8(sampler, vec2 uv, vec4 uv_bounds);
void swgl_commitTextureColorRGBA8(sampler, vec2 uv, vec4 uv_bounds, vec4|float color);
void swgl_commitTextureRepeatRGBA8(sampler, vec2 uv, vec2 tile_repeat, vec4 uv_repeat, vec4 uv_bounds);
void swgl_commitTextureRepeatColorRGBA8(sampler, vec2 uv, vec2 tile_repeat, vec4 uv_repeat, vec4 uv_bounds, vec4|float color);
void swgl_commitPartialTextureLinearR8(int len, sampler, vec2 uv, vec4 uv_bounds);
void swgl_commitPartialTextureLinearInvertR8(int len, sampler, vec2 uv, vec4 uv_bounds);
```
Samples and commits an entire span of texture starting at the given uv and
within the supplied uv bounds. The color variations also accept a supplied color
that modulates the result.

The RGBA8 versions may only be used to commit within `swgl_drawSpanRGBA8`, and
the R8 versions may only be used to commit within `swgl_drawSpanR8`. The R8ToRGBA8
versions may be used to sample from an R8 source while committing to an RGBA8
framebuffer.

The Linear variations use a linear filter that bilinearly interpolates between
the four samples near the pixel. The Nearest variations use a nearest filter
that chooses the closest aliased sample to the center of the pixel. If neither
Linear nor Nearest is specified in the `swgl_commitTexture` variation name, then
it will automatically select either the Linear or Nearest variation depending
on the sampler's specified filter.
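
The difference between the two filters can be shown with a small single-channel model. The Python sketch below uses the conventional definitions of nearest and bilinear filtering with texel centers at half-integer coordinates and clamping at the edges; it does not reproduce SWGL's fixed-point implementation, and all names in it are hypothetical.

```python
import math


def texel(tex, x, y):
    """Fetch a texel, clamping coordinates to the texture edges."""
    y = min(max(y, 0), len(tex) - 1)
    x = min(max(x, 0), len(tex[0]) - 1)
    return tex[y][x]


def sample_nearest(tex, u, v):
    """Pick the texel whose area contains (u, v), given in texel units."""
    return texel(tex, math.floor(u), math.floor(v))


def sample_linear(tex, u, v):
    """Bilinearly blend the four texels surrounding (u, v)."""
    u -= 0.5  # texel centers sit at half-integer coordinates
    v -= 0.5
    x0, y0 = math.floor(u), math.floor(v)
    fx, fy = u - x0, v - y0
    top = texel(tex, x0, y0) * (1 - fx) + texel(tex, x0 + 1, y0) * fx
    bottom = texel(tex, x0, y0 + 1) * (1 - fx) + texel(tex, x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


def sample(tex, u, v, sampler_filter):
    """Model of the unqualified variations: defer to the sampler's filter."""
    if sampler_filter == "linear":
        return sample_linear(tex, u, v)
    return sample_nearest(tex, u, v)
```

On a 2x2 texture `[[0, 100], [200, 300]]`, sampling at the shared corner `(1.0, 1.0)` gives the average of all four texels with the linear filter and a single texel value with the nearest filter.
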
The Repeat variations take a repeat rect that specifies how to
scale and offset the UVs, assuming the UVs are normalized to repeat in the
range 0 to 1. For NearestRepeat variations, it is assumed the repeat rect is
always within the bounds. The tile repeat limit, if non-zero, specifies the
maximum number of repetitions allowed.
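
A minimal sketch of the UV mapping for Repeat variations, in Python, might look like this. It assumes `uv_repeat` is laid out as `(x0, y0, x1, y1)`, which the text above does not confirm, and it deliberately leaves out the tile repeat limit. The `fract` and `repeat_uv` helpers are hypothetical.

```python
import math


def fract(x):
    """Fractional part in [0, 1), also for negative inputs."""
    return x - math.floor(x)


def repeat_uv(uv, uv_repeat):
    """Wrap normalized UVs to 0..1, then scale and offset into the repeat rect."""
    x0, y0, x1, y1 = uv_repeat  # assumed layout of the repeat rect
    return (x0 + fract(uv[0]) * (x1 - x0),
            y0 + fract(uv[1]) * (y1 - y0))
```

Here `repeat_uv((1.25, 2.5), (10.0, 20.0, 30.0, 60.0))` wraps the UVs to `(0.25, 0.5)` and maps them to `(15.0, 40.0)` inside the repeat rect.
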
The Partial variations allow committing only a sub-span rather than the entire
remaining span. These are currently only implemented in linear R8 variants
for optimizing clip shaders in WebRender. The Invert variant of these is
useful for implementing clip-out modes by inverting the source texture value.
```
// Premultiplied alpha over blend, but with source color set to source alpha modulated with a constant color.
void swgl_blendDropShadow(vec4 color);
// Premultiplied alpha over blend, but treats the source as a subpixel mask modulated with a constant color.
void swgl_blendSubpixelText(vec4 color);
```
SWGL allows overriding the blend mode per-primitive by calling `swgl_blend`
intrinsics in the vertex shader. The existing blend mode set by the GL is
replaced with the one specified by the intrinsic for the current primitive.
The blend mode will be reset to the blend mode set by the GL for the next
primitive after the current one, even within the same draw call.
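
The two blend modes described in the comments above can be written out as arithmetic on premultiplied RGBA values in the 0..1 range. The Python sketch below is one plausible reading of those comments, not SWGL's implementation; in particular, weighting the destination by the color's alpha times the per-channel mask in the subpixel case is an assumption, and all function names are hypothetical.

```python
def over(src, dst):
    """Premultiplied-alpha over blend of two RGBA tuples."""
    return tuple(s + d * (1.0 - src[3]) for s, d in zip(src, dst))


def blend_drop_shadow(src, dst, color):
    """Over blend where the source color is the source alpha times `color`."""
    shadow = tuple(c * src[3] for c in color)
    return over(shadow, dst)


def blend_subpixel_text(mask, dst, color):
    """Over blend where each source channel is a separate coverage mask."""
    return tuple(c * m + d * (1.0 - color[3] * m)
                 for c, m, d in zip(color, mask, dst))
```

In the drop shadow case only the source alpha matters, so the source RGB values are discarded; in the subpixel case each channel of the destination is attenuated independently.
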