diff options
Diffstat (limited to 'gfx/docs/RenderingOverview.rst')
-rw-r--r-- | gfx/docs/RenderingOverview.rst | 384 |
1 files changed, 384 insertions, 0 deletions
diff --git a/gfx/docs/RenderingOverview.rst b/gfx/docs/RenderingOverview.rst new file mode 100644 index 0000000000..1b2e30b6f6 --- /dev/null +++ b/gfx/docs/RenderingOverview.rst @@ -0,0 +1,384 @@ +Rendering Overview +================== + +This document is an overview of the steps to render a webpage, and how HTML +gets transformed and broken down, step by step, into commands that can execute +on the GPU. + +If you're coming into the graphics team with not a lot of background +in browsers, start here :) + +.. contents:: + +High level overview +------------------- + +.. image:: RenderingOverviewSimple.png + :width: 100% + +Layout +~~~~~~ +Starting at the left in the above image, we have a document +represented by a DOM - a Document Object Model. A Javascript engine +will execute JS code, either to make changes to the DOM, or to respond to +events generated by the DOM (or do both). + +The DOM is a high level description and we don't know what to draw or +where until it is combined with a Cascading Style Sheet (CSS). +Combining these two and figuring out what, where and how to draw +things is the responsibility of the Layout team. The +DOM is converted into a hierarchical Frame Tree, which nests visual +elements (boxes). Each element points to some node in a Style Tree +that describes what it should look like -- color, transparency, etc. +The result is that now we know exactly what to render where, what goes +on top of what (layering and blending) and at what pixel coordinate. +This is the Display List. + +The Display List is a light-weight data structure because it's shallow +-- it mostly points back to the Frame Tree. There are two problems +with this. First, we want to cross process boundaries at this point. +Everything up until now happens in a Content Process (of which there are +several). Actual GPU rendering happens in a GPU Process (on some +platforms). Second, everything up until now was written in C++; but +WebRender is written in Rust. Thus the shallow Display List needs to +be serialized in a completely self-contained binary blob that will +survive Interprocess Communication (IPC) and a language switch (C++ to +Rust). The result is the WebRender Display List. + +WebRender +~~~~~~~~~ + +The GPU process receives the WebRender Display List blob and +de-serializes it into a Scene. This Scene contains more than the +strictly visible elements; for example, to anticipate scrolling, we +might have several paragraphs of text extending past the visible page. + +For a given viewport, the Scene gets culled and stripped down to a +Frame. This is also where we start preparing data structures for GPU +rendering, for example getting some font glyphs into an atlas for +rasterizing text. + +The final step takes the Frame and submits commands to the GPU to +actually render it. The GPU will execute the commands and composite +the final page. + +Software +~~~~~~~~ + +The above is the new WebRender-enabled way to do things. But in the +schematic you'll note a second branch towards the bottom: this is the +legacy code path which does not use WebRender (nor Rust). In this +case, the Display List is converted into a Layer Tree. The purpose of +this Tree is to try and avoid having to re-render absolutely +everything when the page needs to be refreshed. For example, when +scrolling we should be able to redraw the page by mostly shifting +things around. However that requires those 'things' to still be around +from last time we drew the page. In other words, visual elements that +are likely to be static and reusable need to be drawn into their own +private "page" (a cache). Then we can recombine (composite) all of +these when redrawing the actual page. + +Figuring out which elements would be good candidates for this, and +striking a balance between good performance versus excessive memory +use, is the purpose of the Layer Tree. Each 'layer' is a cached image +of some element(s). This logic also takes occlusion into account, eg. +don't allocate and render a layer for elements that are known to be +completely obscured by something in front of them. + +Redrawing the page by combining the Layer Tree with any newly +rasterized elements is the job of the Compositor. + + +Even when a layer cannot be reused in its entirety, it is likely +that only a small part of it was invalidated. Thus there is an +elaborate system for tracking dirty rectangles, starting an update by +copying the area that can be salvaged, and then redrawing only what +cannot. + +In fact, this idea can be extended to delta-tracking of display lists +themselves. Traversing the layout tree and building a display list is +also not cheap, so the code tries to partially invalidate and rebuild +the display list incrementally when possible. +This optimization is used both for non-WebRender and WebRender in +fact. + + +Asynchronous Panning And Zooming +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Earlier we mentioned that a Scene might contain more elements than are +strictly necessary for rendering what's visible (the Frame). The +reason for that is Asynchronous Panning and Zooming, or APZ for short. +The browser will feel much more responsive if scrolling & zooming can +short-circuit all of these data transformations and IPC boundaries, +and instead directly update an offset of some layer and recomposite. +(Think of late-latching in a VR context) + +This simple idea introduces a lot of complexity: how much extra do you +rasterize, and in which direction? How much memory can we afford? +What about Javascript that responds to scroll events and perhaps does +something 'interesting' with the page in return? What about nested +frames or nested scrollbars? What if we scroll so much that we go +past the boundaries of the Scene that we know about? + +See AsyncPanZoom.rst for all that and more. + +A Few More Details +~~~~~~~~~~~~~~~~~~ + +Here's another schematic which basically repeats the previous one, but +showing a little bit more detail. Note that the direction is reversed +-- the data flow starts at the right. Sorry about that :) + +.. image:: RenderingOverviewDetail.png + :width: 100% + +Some things to note: + +- there are multiple content processes, currently 4 of them. This is + for security reasons (sandboxing), stability (isolate crashes) and + performance (multi-core machines); +- ideally each "webpage" would run in its own process for security; + this is being developed under the term 'fission'; +- there is only a single GPU process, if there is one at all; + some platforms have it as part of the Parent; +- not shown here is the Extension process that isolates WebExtensions; +- for non-WebRender, rasterization happens in the Content Process, and + we send entire Layers to the GPU/Compositor process (via shared + memory, only using actual IPC for its metadata like width & height); +- if the GPU process crashes (a bug or a driver issue) we can simply + restart it, resend the display list, and the browser itself doesn't crash; +- the browser UI is just another set of DOM+JS, albeit one that runs + with elevated privileges. That is, its JS can do things that + normal JS cannot. It lives in the Parent Process, which then uses + IPC to get it rendered, same as regular Content. (the IPC arrow also + goes to WebRender Display List but is omitted to reduce clutter); +- UI events get routed to APZ first, to minimize latency. By running + inside the GPU process, we may have access to data such + as rasterized clipping masks that enables finer grained hit testing; +- the GPU process talks back to the content process; in particular, + when APZ scrolls out of bounds, it asks Content to enlarge/shift the + Scene with a new "display port"; +- we still use the GPU when we can for compositing even in the + non-WebRender case; + + +WebRender In Detail +------------------- + +Converting a display list into GPU commands is broken down into a +number of steps and intermediate data structures. + + +.. image:: RenderingOverviewTrees.png + :width: 75% + :align: center + +.. + + *Each element in the picture tree points to exactly one node in the spatial + tree. Only a few of these links are shown for clarity (the dashed lines).* + +The Picture Tree +~~~~~~~~~~~~~~~~ + +The incoming display list uses "stacking contexts". For example, to +render some text with a drop shadow, a display list will contain three +items: + +- "enable shadow" with some parameters such as shadow color, blur size, and offset; +- the text item; +- "pop all shadows" to deactivate shadows; + +WebRender will break this down into two distinct elements, or +"pictures". The first represents the shadow, so it contains a copy of the +text item, but modified to use the shadow's color, and to shift the +text by the shadow's offset. The second picture contains the original text +to draw on top of the shadow. + +The fact that the first picture, the shadow, needs to be blurred, is a +"compositing" property of the picture which we'll deal with later. + +Thus, the stack-based display list gets converted into a list of pictures +-- or more generally, a hierarchy of pictures, since items are nested +as per the original HTML. + +Example visual elements are a TextRun, a LineDecoration, or an Image +(like a .png file). + +Compared to 3D rendering, the picture tree is similar to a scenegraph: it's a +parent/child hierarchy of all the drawable elements that make up the "scene", in +this case the webpage. One important difference is that the transformations are +stored in a separate tree, the spatial tree. + +The Spatial Tree +~~~~~~~~~~~~~~~~ + +The nodes in the spatial tree represent coordinate transforms. Every time the +DOM hierarchy needs child elements to be transformed relative to their parent, +we add a new Spatial Node to the tree. All those child elements will then point +to this node as their "local space" reference (aka coordinate frame). In +traditional 3D terms, it's a scenegraph but only containing transform nodes. + +The nodes are called frames, as in "coordinate frame": + +- a Reference Frame corresponds to a ``<div>``; +- a Scrolling Frame corresponds to a scrollable part of the page; +- a Sticky Frame corresponds to some fixed position CSS style. + +Each element in the picture tree then points to a spatial node inside this tree, +so by walking up and down the tree we can find the absolute position of where +each element should render (traversing down) and how large each element needs to +be (traversing up). Originally the transform information was part of the +picture tree, as in a traditional scenegraph, but visual elements and their +transforms were split apart for technical reasons. + +Some of these nodes are dynamic. A scroll-frame can obviously scroll, but a +Reference Frame might also use a property binding to enable a live link with +JavaScript, for dynamic updates of (currently) the transform and opacity. + +Axis-aligned transformations (scales and translations) are considered "simple", +and are conceptually combined into a single "CoordinateSystem". When we +encounter a non-axis-aligned transform, we start a new CoordinateSystem. We +start in CoordinateSystem 0 at the root, and would bump this to CoordinateSystem +1 when we encounter a Reference Frame with a rotation or 3D transform, for +example. This would then be the CoordinateSystem index for all its children, +until we run into another (nested) non-simple transform, and so on. Roughly +speaking, as long as we're in the same CoordinateSystem, the transform stack is +simple enough that we have a reasonable chance of being able to flatten it. That +lets us directly rasterize text at its final scale for example, optimizing +away some of the intermediate pictures (offscreen textures). + +The layout code positions elements relative to their parent. Thus to position +the element on the actual page, we need to walk the Spatial Tree all the way to +the root and apply each transform; the result is a ``LayoutToWorldTransform``. + +One final step transforms from World to Device coordinates, which deals with +DPI scaling and such. + +.. csv-table:: + :header: "WebRender term", "Rough analogy" + + Spatial Tree, Scenegraph -- transforms only + Picture Tree, Scenegraph -- drawables only (grouping) + Spatial Tree Rootnode, World Space + Layout space, Local/Object Space + Picture, RenderTarget (sort of; see RenderTask below) + Layout-To-World transform, Local-To-World transform + World-To-Device transform, World-To-Clipspace transform + + +The Clip Tree +~~~~~~~~~~~~~ + +Finally, we also have a Clip Tree, which contains Clip Shapes. For +example, a rounded corner div will produce a clip shape, and since +divs can be nested, you end up with another tree. By pointing at a Clip Shape, +visual elements will be clipped against this shape plus all parent shapes above it +in the Clip Tree. + +As with CoordinateSystems, a chain of simple 2D clip shapes can be collapsed +into something that can be handled in the vertex shader, at very little extra +cost. More complex clips must be rasterized into a mask first, which we then +sample from to ``discard`` in the pixel shader as needed. + +In summary, at the end of scene building the display list turned into +a picture tree, plus a spatial tree that tells us what goes where +relative to what, plus a clip tree. + +RenderTask Tree +~~~~~~~~~~~~~~~ + +Now in a perfect world we could simply traverse the picture tree and start +drawing things: one drawcall per picture to render its contents, plus one +drawcall to draw the picture into its parent. However, recall that the first +picture in our example is a "text shadow" that needs to be blurred. We can't +just rasterize blurry text directly, so we need a number of steps or "render +passes" to get the intended effect: + +.. image:: RenderingOverviewBlurTask.png + :align: right + :height: 400px + +- rasterize the text into an offscreen rendertarget; +- apply one or more downscaling passes until the blur radius is reasonable; +- apply a horizontal Gaussian blur; +- apply a vertical Gaussian blur; +- use the result as an input for whatever comes next, or blit it to + its final position on the page (or more generally, on the containing + parent surface/picture). + +In the general case, which passes we need and how many of them depends +on how the picture is supposed to be composited (CSS filters, SVG +filters, effects) and its parameters (very large vs. small blur +radius, say). + +Thus, we walk the picture tree and build a render task tree: each high +level abstraction like "blur me" gets broken down into the necessary +render passes to get the effect. The result is again a tree because a +render pass can have multiple input dependencies (eg. blending). + +(Cfr. games, this has echoes of the Frostbite Framegraph in that it +dynamically builds up a renderpass DAG and dynamically allocates storage +for the outputs). + +If there are complicated clip shapes that need to be rasterized first, +so their output can be sampled as a texture for clip/discard +operations, that would also end up in this tree as a dependency... (I think?). + +Once we have the entire tree of dependencies, we analyze it to see +which tasks can be combined into a single pass for efficiency. We +ping-pong rendertargets when we can, but sometimes the dependencies +cut across more than one level of the rendertask tree, and some +copying is necessary. + +Once we've figured out the passes and allocated storage for anything +we wish to persist in the texture cache, we finally start rendering. + +When rasterizing the elements into the Picture's offscreen texture, we'd +position them by walking the transform hierarchy as far up as the picture's +transform node, resulting in a ``Layout To Picture`` transform. The picture +would then go onto the page using a ``Picture To World`` coordinate transform. + +Caching +``````` + +Just as with layers in the software rasterizer, it is not always necessary to +redraw absolutely everything when parts of a document change. The webrender +equivalent of layers is Slices -- a grouping of pictures that are expected to +render and update together. Slices are automatically created based on +heuristics and layout hints/flags. + +Implementation wise, slices reuse a lot of the existing machinery for Pictures; +in fact they're implemented as a "Virtual picture" of sorts. The similarities +make sense: both need to allocate offscreen textures in a cache, both will +position and render all their children into it, and both then draw themselves +into their parent as part of the parent's draw. + +If a slice isn't expected to change much, we give it a TileCacheInstance. It is +itself made up of Tiles, where each tile will track what's in it, what's +changing, and if it needs to be invalidated and redrawn or not as a result. +Thus the "damage" from changes can be localized to single tiles, while we +salvage the rest of the cache. If tiles keep seeing a lot of invalidations, +they will recursively divide themselves in a quad-tree like structure to try and +localize the invalidations. (And conversely, they'll recombine children if +nothing is invalidating them "for a while"). + +Interning +````````` + +To spot invalidated tiles, we need a fast way to compare its contents from the +previous frame with the current frame. To speed this up, we use interning; +similar to string-interning, this means that each ``TextRun``, ``Decoration``, +``Image`` and so on is registered in a repository (a ``DataStore``) and +consequently referred to by its unique ID. Cache contents can then be encoded as a +list of IDs (one such list per internable element type). Diffing is then just a +fast list comparison. + + +Callbacks +````````` +GPU text rendering assumes that the individual font-glyphs are already +available in a texture atlas. Likewise SVG is not being rendered on +the GPU. Both inputs are prepared during scene building; glyph +rasterization via a thread pool from within Rust itself, and SVG via +opaque callbacks (back to C++) that produce blobs. |