diff options
Diffstat (limited to 'gfx/wr/webrender/doc')
-rw-r--r-- | gfx/wr/webrender/doc/CLIPPING_AND_POSITIONING.md | 150 | ||||
-rw-r--r-- | gfx/wr/webrender/doc/blob.md | 43 | ||||
-rw-r--r-- | gfx/wr/webrender/doc/swizzling.md | 31 | ||||
-rw-r--r-- | gfx/wr/webrender/doc/text-rendering.md | 368 |
4 files changed, 592 insertions, 0 deletions
diff --git a/gfx/wr/webrender/doc/CLIPPING_AND_POSITIONING.md b/gfx/wr/webrender/doc/CLIPPING_AND_POSITIONING.md new file mode 100644 index 0000000000..4aa8d0c684 --- /dev/null +++ b/gfx/wr/webrender/doc/CLIPPING_AND_POSITIONING.md @@ -0,0 +1,150 @@ +# Original Design + +To understand the current design for clipping and positioning (transformations +and scrolling) in WebRender it can be useful to have a little background about +the original design for these features. The most important thing to remember is +that originally clipping, scrolling regions, and transformations were +properties of stacking contexts and they were completely _hierarchical_. This +goes a long way toward representing the majority of CSS content on the web, but +fails when dealing with important edges cases and features including: + 1. Support for sticky positioned content + 2. Scrolling areas that include content that is ordered both above and below + intersecting content from outside the scroll area. + 3. Items in the same scrolling root, clipped by different clips one or more of + which are defined outside the scrolling root itself. + 4. Completely non-hierarchical clipping situations, such as when items are + clipped by some clips in the hierarchy, but not others. + +Design changes have been a step by step path from the original design to one +that can handle all CSS content. + +# Current Design + +All positioning and clipping is handled by the `SpatialTree`. The name is a +holdover from when this tree was a tree of `Layers` which handled both +positioning and clipping. Currently the `SpatialTree` holds: + 1. A hierarchical collection of `SpatialNodes`, with the final screen + transformation of each node depending on the relative transformation of the + node combined with the transformations of all of its ancestors. These nodes + are responsible for positioning display list items and clips. + 2. A collection of `ClipNodes` which specify a rectangular clip and, optionally, + a set of rounded rectangle clips and a masking image. + 3. A collection of `ClipChains`. Each `ClipChain` is a list of `ClipNode` + elements. Every display list item has an assigned `ClipChain` which + specifies what `ClipNodes` are applied to that item. + +The `SpatialNode` of each clip applied to an item is completely independent of +the `SpatialNode` applied to the item itself. + +One holdover from the previous design is that both `ClipNode` and `SpatialNodes` +have a parent node, which is either a `SpatialNode` or a `ClipNode`. From this +node WebRender can determine both a parent `ClipNode` and a parent `SpatialNode` +by finding the first ancestor of that type. This is handled by the +`DisplayListFlattener`. + +## `SpatialNode` +There are three types of `SpatialNodes`: + 1. Reference frames which are created when content needs to apply + transformation or perspective properties to display list items. Reference + frames establish a new coordinate system, so internally all coordinates on + display list items are relative to the reference frame origin. Later + any non-reference frame positioning nodes that display list items belong + to can adjust this position relative to the reference frame origin. + 2. Scrolling nodes are used to define scrolling areas. These nodes have scroll + offsets which are a 2D translation relative to ancestor nodes and, ultimately, + the reference frame origin. + 3. Sticky frames are responsible for implementing position:sticky behavior. + This is also an 2D translation. + +`SpatialNodes` are defined as items in the display list. After scene building +each node is traversed hierarchically during the `SpatialTree::update()` step. +Once reference frame transforms and relative offsets are calculated, a to screen +space transformation can be calculated for each `SpatialNode`. This transformation +is added the `TransformPalette` and becomes directly available to WebRender shaders. + +In addition to screen space transformation calculation, the `SpatialNode` tree +is divided up into _compatible coordinate systems_. These are coordinate systems +which differ only by 2D translations from their parent system. These compatible +coordinate systems may even cross reference frame boundaries. The goal here is +to allow the application clipping rectangles from different compatible +coordinate systems without generating mask images. + +## `ClipNode` + +Each clip node holds a clip rectangle along with an optional collection of +rounded clip rectangles and a mask image. The fact that `ClipNodes` all have a +clip rectangle is important because it means that all content clipped by a +clip node has a bounding rectangle, which can be converted into a bounding +screen space rectangle. This rectangle is called the _outer rectangle_ of the +clip. `ClipNodes` may also have an _inner rectangle_, which is an area within +the boundaries of the _outer rectangle_ that is completely unclipped. + +These rectangles are calculated during the `SpatialTree::update()` phase. In +addition, each `ClipNode` produces a template `ClipChainNode` used to build +the `ClipChains` which use that node. + +## `ClipChains` + +There are two ways that `ClipChains` are defined in WebRender. The first is +through using the API for manually specifying `ClipChains` via a parent +`ClipChain` and a list of `ClipNodes`. The second is through the hierarchy of a +`ClipNode` established by its parent node. Every `ClipNode` has a chain of +ancestor `SpatialNodes` and `ClipNodes`. The creation of a `ClipNode` +automatically defines a `ClipChain` for this hierarchy. This behavior is a +compatibility feature with the old completely hierarchical clipping architecture +and is still how Gecko and Servo create most of their `ClipChains`. These +hierarchical `ClipChains` are constructed during the `ClipNode::update()` step. + +During `ClipChain` construction, WebRender tries to eliminate clips that will +not affect rendering, by looking at the combined _outer rectangle_ and _inner +rectangle_ of a `ClipChain` and the _outer rectangle_ and _inner rectangle_ of +any `ClipNode` appended to the chain. An example of the goal of this process is +to avoid having to render a mask for a large rounded rectangle when the rest of +the clip chain constrains the content to an area completely inside that +rectangle. Avoiding mask rasterization in this case and others has large +performance impacts on WebRender. + +# Clipping and Positioning in the Display List + +Each non-structural WebRender display list item has + * A `SpatialId` of a `SpatialNode` for positioning + * A `ClipId` of a `ClipNode` or a `ClipChain` for clipping + * An item-specific rectangular clip rectangle + +The positioning node determines how that item is positioned. It's assumed that +the positioning node and the item are children of the same reference frame. The +clipping node determines how that item is clipped. This should be fully +independent of how the node is positioned and items can be clipped by any +`ClipChain` regardless of the reference frame of their member clips. Finally, +the item-specific clipping rectangle is applied directly to the item and should +never result in the creation of a clip mask itself. + +## Converting user-exposed `ClipId`/`SpatialId` to internal indices + +WebRender must access `ClipNodes` and `SpatialNodes` quite a bit when building +scenes and frames, so it tries to convert `ClipId`/`SpatialId`, which are already +per-pipeline indices, to global scene-wide indices. Internally this is a +conversion from `ClipId` into `ClipNodeIndex` or `ClipChainIndex`, and from +`SpatialId` into `SpatialNodeIndex`. In order to make this conversion cheaper, the +`DisplayListFlattner` assigns offsets for each pipeline and node type in the +scene-wide `SpatialTree`. + +Nodes are added to their respective arrays sequentially as the display list is +processed during scene building. When encountering an iframe, the +`DisplayListFlattener` must start processing the nodes for that iframe's +pipeline, meaning that nodes are now being added out of order to the node arrays +of the `SpatialTree`. In this case, the `SpatialTree` fills in the gaps in +the node arrays with placeholder nodes. + +# Hit Testing + +Hit testing is the responsibility of the `HitTester` data structure. This +structure copies information necessary for hit testing from the +`SpatialTree`. This is done so that hit testing can still take place while a +new `SpatialTree` is under construction. + +# Ideas for the Future +1. Expose the difference between `ClipId` and `ClipChainId` in the API. +2. Prevent having to duplicate the `SpatialTree` for hit testing. +3. Avoid having to create placeholder nodes in the `SpatialTree` while + processing iframes. diff --git a/gfx/wr/webrender/doc/blob.md b/gfx/wr/webrender/doc/blob.md new file mode 100644 index 0000000000..b910f6f76a --- /dev/null +++ b/gfx/wr/webrender/doc/blob.md @@ -0,0 +1,43 @@ +# Blob images + +Blob image is fallback mechanism for webrender that Gecko uses to render primitives that aren't currently supported by webrender. The main idea is to provide webrender with a custom handler that can take arbitray drawing commands serialized as buffers of bytes (the blobs) and turn them into images that webrender internally will treat as regular images. + +At the API level, blob images are treated as other images. They are resources created and associated with image keys, and are used in the display list with regular image display items. + + +## Active area + +In order to support scrolling very large content, blob images don't necessarily have a finite size. They can grow in any direction. At any time they do have an "active area", also called "visible area" which defines the portion that has to be rasterized. Typically this active area moves along large blob images depending on the scroll position. +The coordinate system of active area the *should* be the one of the blob's drawing commands (this is really up to the blob handler implementation to enforce that, Gecko does), and its scale should correspond to device pixels. The active area's coordinates can be negative. + +As far as positioning goes, the active area maps to the image display item's bounds. In other words the content at the top-left corner of the active area will be rendered on screen at the position of the top-left corner of the display item's local rect. + +In Gecko, the active area corresponds to the intersection of the fallback content's rect and the displayport. + +The terms "visible area" and "visible rect" are used a lot in the blobs code, unfortunately they collide with frame building's visibility/culling terminology. They don't correspond to what is visible to the user, but rather what is in the displayport. + + +## Tiling + +Blob images can be either tiled or non-tiled. Non-tiled blob images support invalid rects while tiled blob images track only validty at the tile level. In gecko all blobs are tiled with a tile size of 256x256. + +Just like regular tiled images, blob image tiles along the border of the image are shrinked to fit the remaining size. The only difference is that the tiling pattern always starts at the top-left corner for regular images (smaller boundary tiles only along the right and bottom edges), while it can be aribtrarily positioned for blob images (smaller boundary tiles potentially on all sides). + +The tiling logic is in webrender/src/image.rs. + + +## Async rasterization + +Blobs are typically too slow to rasterize on the critical path. We try to avoid blocking frame building on blob image rasterization. In order to do that we rasterize blobs as part of scene building. Rather than rasterize tiles on demand from visibility informating, we rasterize the entire active area during scene building. This means we potentially process a lot more content than will be displayed if the user doesn't scroll through all of the visible area. + +When the render backend receives a transaction, it looks for all new and update blob images, and generate blob rasterization requests for all tiles of the blob images that overlap their active area. The requests are bundled with an `AsyncBlobImageRasterizer` object in the transaction that is sent to the scene builder thread. The async rasterizer is created by the `BlobImageHandler` at each transaction. It is a snapshot of the state of the blobs as well as external information such as fonts, and does the actual rasterization. + +While tiles are rasterized eagerly during scene building, their content is uploaded lazily to the texture cache depending on the result of the visibility pass during frame building. + + +## Late rasterization + +In some case we run into a missing blob image during frame building and have to rasterize it synchronously. This happens when a rasterized tile is uploaded to the texture cache (at which point the CPU side is discarded), the texture cache entry expires and after scrolling back into view the tile is needed again. +We should really keep the rasterized blobs around just like we keep regular images in the cache. Hopefully this section will become obsolete eventually and we'll be able to remove late blob rasterization. + +The information needed for async rasterization corresponds to the state of blobs before scene building while late rasterization needs the state of blobs after the last complete scene build. This means we have to be careful about which version we manipulate in the resource cache. diff --git a/gfx/wr/webrender/doc/swizzling.md b/gfx/wr/webrender/doc/swizzling.md new file mode 100644 index 0000000000..4b38791940 --- /dev/null +++ b/gfx/wr/webrender/doc/swizzling.md @@ -0,0 +1,31 @@ +> It'd be great to have some (in-tree) docs describing the process you've worked through here, the overall motivation, how it works on different GPUs / platforms etc. Perhaps as a follow up? + +# Swizzling in WR + +## Problem statement + +The goal is to avoid the CPU conversion of data done by the driver on texture data uploads. It's slow and always done synchronously, hurting our "Renderer" thread CPU utilization. + +Gecko produces all of the image data in BGRA. Switching "imagelib" to RGBA is possible, but modifying Skia to follow is less trivial. +OpenGL support for BGRA8 as an internal texture format is a complex story: it's officially supported in Angle and a fair share of Android devices, but it's not available on the desktop GL (and until recently wasn't available in the Android emulator). Unofficially, when textures are initialized with `glTexImage` (as opposed to `glTexStorage`) with RGBA internal format, the desktop GL drivers often prefer to store the data in BGRA8 format, actually. + +The only way to avoid the CPU conversion is to provide the data in exactly the same format that the driver is using internally for a texture. In this case, the driver does a straght `memcpy` into its CPU-visible memory, which is the best we can hope for with OpenGL API. + +## Solution: swizzling + +https://phabricator.services.mozilla.com/D21965 is providing the solution to this problem. The main principles are: + + 1. Use `glTexStorage` whenever it's available. Doing so gives us full control of the internal format, also allows to avoid allocating memory for mipmaps that we don't use. + 2. Make the shared texture cache format to be determined at the init time, based on the GL device capabilities. For Angle and OpenGL ES this is BGRA8, for desktop this is RGBA8 (since desktop GL doesn't support BGRA internal formats). WebRender is now able to tell Gecko, which color format it prefers the texture data to use. + 3. If the data comes in a format that is different from our best case, we pretend that the data is actually in our best case format, and associate the allocated cache entry with the `Swizzle`. That swizzle configuration changes the way shaders sample from a texture, adjusting for the fact the data was provided in a different format. + 4. The lifetime of a "swizzled" texture cache data is starting at the point the data is uploaded and ending at a point where any shader samples from this data. Any other operation on that data (copying or blitting) is not configurable by `Swizzle` and thus would produce incorrect results. To address this, the change enhances `cs_copy` shader to be used in place of blitting from the texture cache, where needed. + 5. Swizzling becomes a part of the batch key per texture. Mixing up draw calls with texture data that is differently swizzled then introduces the batch breaks. This is a downside for the swizzling approach in general, but it's not clear to what extent this would affect Gecko. + +## Code paths + +Windows/Angle and Android: + - we use `glTexStorage` with BGRA8 internal format, no swizzling is needed in general case. + +Windows (non-Angle), Mac, Linux: + - if `glTexStorage` is available, we use it with RGBA8 internal format, swizzling everything on texture sampling. + - otherwise, we use RGBA unsized format with `gTexImage` and expect the data to come in BGRA format, no swizzling is involved. diff --git a/gfx/wr/webrender/doc/text-rendering.md b/gfx/wr/webrender/doc/text-rendering.md new file mode 100644 index 0000000000..0ac945ac19 --- /dev/null +++ b/gfx/wr/webrender/doc/text-rendering.md @@ -0,0 +1,368 @@ +# Text Rendering + +This document describes the details of how WebRender renders text, particularly the blending stage of text rendering. +We will go into grayscale text blending, subpixel text blending, and "subpixel text with background color" blending. + +### Prerequisites + +The description below assumes you're familiar with regular rgba compositing, operator over, +and the concept of premultiplied alpha. + +### Not covered in this document + +We are going to treat the origin of the text mask as a black box. +We're also going to assume we can blend text in the device color space and will not go into the gamma correction and linear pre-blending that happens in some of the backends that produce the text masks. + +## Grayscale Text Blending + +Grayscale text blending is the simplest form of text blending. Our blending function has three inputs: + + - The text color, as a premultiplied rgba color. + - The text mask, as a single-channel alpha texture. + - The existing contents of the framebuffer that we're rendering to, the "destination". This is also a premultiplied rgba buffer. + +Note: The word "grayscale" here does *not* mean that we can only draw gray text. +It means that the mask only has a single alpha value per pixel, so we can visualize +the mask in our minds as a grayscale image. + +### Deriving the math + +We want to mask our text color using the single-channel mask, and composite that to the destination. +This compositing step uses operator "over", just like regular compositing of rgba images. + +I'll be using GLSL syntax to describe the blend equations, but please consider most of the code below pseudocode. + +We can express the blending described above as the following blend equation: + +```glsl +vec4 textblend(vec4 text_color, vec4 mask, vec4 dest) { + return over(in(text_color, mask), dest); +} +``` + +with `over` being the blend function for (premultiplied) operator "over": + +```glsl +vec4 over(vec4 src, vec4 dest) { + return src + (1.0 - src.a) * dest; +} +``` + +and `in` being the blend function for (premultiplied) operator "in", i.e. the masking operator: + +```glsl +vec4 in(vec4 src, vec4 mask) { + return src * mask.a; +} +``` + +So the complete blending function is: + +```glsl +result.r = text_color.r * mask.a + (1.0 - text_color.a * mask.a) * dest.r; +result.g = text_color.g * mask.a + (1.0 - text_color.a * mask.a) * dest.g; +result.b = text_color.b * mask.a + (1.0 - text_color.a * mask.a) * dest.b; +result.a = text_color.a * mask.a + (1.0 - text_color.a * mask.a) * dest.a; +``` + +### Rendering this with OpenGL + +In general, a fragment shader does not have access to the destination. +So the full blend equation needs to be expressed in a way that the shader only computes values that are independent of the destination, +and the parts of the equation that use the destination values need to be applied by the OpenGL blend pipeline itself. +The OpenGL blend pipeline can be tweaked using the functions `glBlendEquation` and `glBlendFunc`. + +In our example, the fragment shader can output just `text_color * mask.a`: + +```glsl + oFragColor = text_color * mask.a; +``` + +and the OpenGL blend pipeline can be configured like so: + +```rust + pub fn set_blend_mode_premultiplied_alpha(&self) { + self.gl.blend_func(gl::ONE, gl::ONE_MINUS_SRC_ALPHA); + self.gl.blend_equation(gl::FUNC_ADD); + } +``` + +This results in an overall blend equation of + +``` +result.r = 1 * oFragColor.r + (1 - oFragColor.a) * dest.r; + ^ ^ ^^^^^^^^^^^^^^^^^ + | | | + +--gl::ONE | +-- gl::ONE_MINUS_SRC_ALPHA + | + +-- gl::FUNC_ADD + + = 1 * (text_color.r * mask.a) + (1 - (text_color.a * mask.a)) * dest.r + = text_color.r * mask.a + (1 - text_color.a * mask.a) * dest.r +``` + +which is exactly what we wanted. + +### Differences to the actual WebRender code + +There are two minor differences between the shader code above and the actual code in the text run shader in WebRender: + +```glsl +oFragColor = text_color * mask.a; // (shown above) +// vs. +oFragColor = vColor * mask * alpha; // (actual webrender code) +``` + +`vColor` is set to the text color. The differences are: + + - WebRender multiplies with all components of `mask` instead of just with `mask.a`. + However, our font rasterization code fills the rgb values of `mask` with the value of `mask.a`, + so this is completely equivalent. + - WebRender applies another alpha to the text. This is coming from the clip. + You can think of this alpha to be a pre-adjustment of the text color for that pixel, or as an + additional mask that gets applied to the mask. + +## Subpixel Text Blending + +Now that we have the blend equation for single-channel text blending, we can look at subpixel text blending. + +The main difference between subpixel text blending and grayscale text blending is the fact that, +for subpixel text, the text mask contains a separate alpha value for each color component. + +### Component alpha + +Regular painting uses four values per pixel: three color values, and one alpha value. The alpha value applies to all components of the pixel equally. + +Imagine for a second a world in which you have *three alpha values per pixel*, one for each color component. + + - Old world: Each pixel has four values: `color.r`, `color.g`, `color.b`, and `color.a`. + - New world: Each pixel has *six* values: `color.r`, `color.a_r`, `color.g`, `color.a_g`, `color.b`, and `color.a_b`. + +In such a world we can define a component-alpha-aware operator "over": + +```glsl +vec6 over_comp(vec6 src, vec6 dest) { + vec6 result; + result.r = src.r + (1.0 - src.a_r) * dest.r; + result.g = src.g + (1.0 - src.a_g) * dest.g; + result.b = src.b + (1.0 - src.a_b) * dest.b; + result.a_r = src.a_r + (1.0 - src.a_r) * dest.a_r; + result.a_g = src.a_g + (1.0 - src.a_g) * dest.a_g; + result.a_b = src.a_b + (1.0 - src.a_b) * dest.a_b; + return result; +} +``` + +and a component-alpha-aware operator "in": + +```glsl +vec6 in_comp(vec6 src, vec6 mask) { + vec6 result; + result.r = src.r * mask.a_r; + result.g = src.g * mask.a_g; + result.b = src.b * mask.a_b; + result.a_r = src.a_r * mask.a_r; + result.a_g = src.a_g * mask.a_g; + result.a_b = src.a_b * mask.a_b; + return result; +} +``` + +and even a component-alpha-aware version of `textblend`: + +```glsl +vec6 textblend_comp(vec6 text_color, vec6 mask, vec6 dest) { + return over_comp(in_comp(text_color, mask), dest); +} +``` + +This results in the following set of equations: + +```glsl +result.r = text_color.r * mask.a_r + (1.0 - text_color.a_r * mask.a_r) * dest.r; +result.g = text_color.g * mask.a_g + (1.0 - text_color.a_g * mask.a_g) * dest.g; +result.b = text_color.b * mask.a_b + (1.0 - text_color.a_b * mask.a_b) * dest.b; +result.a_r = text_color.a_r * mask.a_r + (1.0 - text_color.a_r * mask.a_r) * dest.a_r; +result.a_g = text_color.a_g * mask.a_g + (1.0 - text_color.a_g * mask.a_g) * dest.a_g; +result.a_b = text_color.a_b * mask.a_b + (1.0 - text_color.a_b * mask.a_b) * dest.a_b; +``` + +### Back to the real world + +If we want to transfer the component alpha blend equation into the real world, we need to make a few small changes: + + - Our text color only needs one alpha value. + So we'll replace all instances of `text_color.a_r/g/b` with `text_color.a`. + - We're currently not making use of the mask's `r`, `g` and `b` values, only of the `a_r`, `a_g` and `a_b` values. + So in the real world, we can use the rgb channels of `mask` to store those component alphas and + replace `mask.a_r/g/b` with `mask.r/g/b`. + +These two changes give us: + +```glsl +result.r = text_color.r * mask.r + (1.0 - text_color.a * mask.r) * dest.r; +result.g = text_color.g * mask.g + (1.0 - text_color.a * mask.g) * dest.g; +result.b = text_color.b * mask.b + (1.0 - text_color.a * mask.b) * dest.b; +result.a_r = text_color.a * mask.r + (1.0 - text_color.a * mask.r) * dest.a_r; +result.a_g = text_color.a * mask.g + (1.0 - text_color.a * mask.g) * dest.a_g; +result.a_b = text_color.a * mask.b + (1.0 - text_color.a * mask.b) * dest.a_b; +``` + +There's a third change we need to make: + + - We're rendering to a destination surface that only has one alpha channel instead of three. + So `dest.a_r/g/b` and `result.a_r/g/b` will need to become `dest.a` and `result.a`. + +This creates a problem: We're currently assigning different values to `result.a_r`, `result.a_g` and `result.a_b`. +Which of them should we use to compute `result.a`? + +This question does not have an answer. One alpha value per pixel is simply not sufficient +to express the same information as three alpha values. + +However, see what happens if the destination is already opaque: + +We have `dest.a_r == 1`, `dest.a_g == 1`, and `dest.a_b == 1`. + +``` +result.a_r = text_color.a * mask.r + (1 - text_color.a * mask.r) * dest.a_r + = text_color.a * mask.r + (1 - text_color.a * mask.r) * 1 + = text_color.a * mask.r + 1 - text_color.a * mask.r + = 1 +same for result.a_g and result.a_b +``` + +In other words, for opaque destinations, it doesn't matter what which channel of the mask we use when computing `result.a`, the result will always be completely opaque anyways. In WebRender we just pick `mask.g` (or rather, +have font rasterization set `mask.a` to the value of `mask.g`) because it's as good as any. + +The takeaway here is: **Subpixel text blending is only supported for opaque destinations.** Attempting to render subpixel +text into partially transparent destinations will result in bad alpha values. Or rather, it will result in alpha values which +are not anticipated by the r, g, and b values in the same pixel, so that subsequent blend operations, which will mix r and a values +from the same pixel, will produce incorrect colors. + +Here's the final subpixel blend function: + +```glsl +vec4 subpixeltextblend(vec4 text_color, vec4 mask, vec4 dest) { + vec4 result; + result.r = text_color.r * mask.r + (1.0 - text_color.a * mask.r) * dest.r; + result.g = text_color.g * mask.g + (1.0 - text_color.a * mask.g) * dest.g; + result.b = text_color.b * mask.b + (1.0 - text_color.a * mask.b) * dest.b; + result.a = text_color.a * mask.a + (1.0 - text_color.a * mask.a) * dest.a; + return result; +} +``` + +or for short: + +```glsl +vec4 subpixeltextblend(vec4 text_color, vec4 mask, vec4 dest) { + return text_color * mask + (1.0 - text_color.a * mask) * dest; +} +``` + +To recap, here's what we gained and lost by making the transition from the full-component-alpha world to the +regular rgba world: All colors and textures now only need four values to be represented, we still use a +component alpha mask, and the results are equivalent to the full-component-alpha result assuming that the +destination is opaque. We lost the ability to draw to partially transparent destinations. + +### Making this work in OpenGL + +We have the complete subpixel blend function. +Now we need to cut it into pieces and mix it with the OpenGL blend pipeline in such a way that +the fragment shader does not need to know about the destination. + +Compare the equation for the red channel and the alpha channel between the two ways of text blending: + +``` + single-channel alpha: + result.r = text_color.r * mask.a + (1.0 - text_color.a * mask.a) * dest.r + result.a = text_color.a * mask.a + (1.0 - text_color.a * mask.a) * dest.r + + component alpha: + result.r = text_color.r * mask.r + (1.0 - text_color.a * mask.r) * dest.r + result.a = text_color.a * mask.a + (1.0 - text_color.a * mask.a) * dest.r +``` + +Notably, in the single-channel alpha case, all three destination color channels are multiplied with the same thing: +`(1.0 - text_color.a * mask.a)`. This factor also happens to be "one minus `oFragColor.a`". +So we were able to take advantage of OpenGL's `ONE_MINUS_SRC_ALPHA` blend func. + +In the component alpha case, we're not so lucky: Each destination color channel +is multiplied with a different factor. We can use `ONE_MINUS_SRC_COLOR` instead, +and output `text_color.a * mask` from our fragment shader. +But then there's still the problem that the first summand of the computation for `result.r` uses +`text_color.r * mask.r` and the second summand uses `text_color.a * mask.r`. + +There are multiple ways to deal with this. They are: + + 1. Making use of `glBlendColor` and the `GL_CONSTANT_COLOR` blend func. + 2. Using a two-pass method. + 3. Using "dual source blending". + +Let's look at them in order. + +#### 1. Subpixel text blending in OpenGL using `glBlendColor` + +In this approach we return `text_color.a * mask` from the shader. +Then we set the blend color to `text_color / text_color.a` and use `GL_CONSTANT_COLOR` as the source blendfunc. +This results in the following blend equation: + +``` +result.r = (text_color.r / text_color.a) * oFragColor.r + (1 - oFragColor.r) * dest.r; + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^ ^^^^^^^^^^^^^^^^^ + | | | + +--gl::CONSTANT_COLOR | +-- gl::ONE_MINUS_SRC_COLOR + | + +-- gl::FUNC_ADD + + = (text_color.r / text_color.a) * (text_color.a * mask.r) + (1 - (text_color.a * mask.r)) * dest.r + = text_color.r * mask.r + (1 - text_color.a * mask.r) * dest.r +``` + +At the very beginning of this document, we defined `text_color` as the *premultiplied* text color. +So instead of actually doing the calculation `text_color.r / text_color.a` when specifying the blend color, +we really just want to use the *unpremultiplied* text color in that place. +That's usually the representation we start with anyway. + +#### 2. Two-pass subpixel blending in OpenGL + +The `glBlendColor` method has the disadvantage that the text color is part of the OpenGL state. +So if we want to draw text with different colors, we have two use separate batches / draw calls +to draw the differently-colored parts of text. + +Alternatively, we can use a two-pass method which avoids the need to use the `GL_CONSTANT_COLOR` blend func: + + - The first pass outputs `text_color.a * mask` from the fragment shader and + uses `gl::ZERO, gl::ONE_MINUS_SRC_COLOR` as the glBlendFuncs. This achieves: + +``` +oFragColor = text_color.a * mask; + +result_after_pass0.r = 0 * oFragColor.r + (1 - oFragColor.r) * dest.r + = (1 - text_color.a * mask.r) * dest.r + +result_after_pass0.g = 0 * oFragColor.g + (1 - oFragColor.g) * dest.r + = (1 - text_color.a * mask.r) * dest.r + +... +``` + + - The second pass outputs `text_color * mask` from the fragment shader and uses + `gl::ONE, gl::ONE` as the glBlendFuncs. This results in the correct overall blend equation. + +``` +oFragColor = text_color * mask; + +result_after_pass1.r + = 1 * oFragColor.r + 1 * result_after_pass0.r + = text_color.r * mask.r + result_after_pass0.r + = text_color.r * mask.r + (1 - text_color.a * mask.r) * dest.r +``` + +#### 3. Dual source subpixel blending in OpenGL + +The third approach is similar to the second approach, but makes use of the [`ARB_blend_func_extended`](https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_blend_func_extended.txt) extension +in order to fold the two passes into one: +Instead of outputting the two different colors in two separate passes, we output them from the same pass, +as two separate fragment shader outputs. +Those outputs can then be treated as two different sources in the blend equation. |