Rendering Overview
==================

This document is an overview of the steps to render a webpage, and how HTML
gets transformed and broken down, step by step, into commands that can execute
on the GPU.

If you're coming into the graphics team without much background in browsers,
start here :)

.. contents::

High level overview
-------------------

.. image:: RenderingOverviewSimple.png
   :width: 100%

Layout
~~~~~~

Starting at the left in the above image, we have a document represented by a
DOM (Document Object Model). A JavaScript engine will execute JS code, either
to make changes to the DOM, or to respond to events generated by the DOM (or
do both).

The DOM is a high level description and we don't know what to draw or where
until it is combined with a Cascading Style Sheet (CSS). Combining these two
and figuring out what, where and how to draw things is the responsibility of
the Layout team. The DOM is converted into a hierarchical Frame Tree, which
nests visual elements (boxes). Each element points to some node in a Style
Tree that describes what it should look like -- color, transparency, etc. The
result is that now we know exactly what to render where, what goes on top of
what (layering and blending) and at what pixel coordinate. This is the
Display List.

The Display List is a lightweight data structure because it's shallow -- it
mostly points back to the Frame Tree. There are two problems with this.
First, we want to cross process boundaries at this point. Everything up until
now happens in a Content Process (of which there are several). Actual GPU
rendering happens in a GPU Process (on some platforms). Second, everything up
until now was written in C++, but WebRender is written in Rust. Thus the
shallow Display List needs to be serialized into a completely self-contained
binary blob that will survive Interprocess Communication (IPC) and a language
switch (C++ to Rust). The result is the WebRender Display List.

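The step from a shallow display list to a self-contained blob can be sketched
as follows. This is only an illustration in Python, not the browser's actual
code: ``Frame``, ``DisplayItem``, ``serialize_display_list`` and
``deserialize_display_list`` are hypothetical names, and JSON stands in for
the real binary encoding.

```python
import json
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical frame-tree node: a box with a position, size and style."""
    x: int
    y: int
    width: int
    height: int
    color: str


@dataclass
class DisplayItem:
    """Shallow display item: it only points back to a frame."""
    frame: Frame
    z_order: int


def serialize_display_list(items):
    """Copy everything the items point to into a self-contained blob.

    JSON is a stand-in (an assumption of this sketch) for the real
    binary format.
    """
    flat = [
        {
            "rect": [it.frame.x, it.frame.y, it.frame.width, it.frame.height],
            "color": it.frame.color,
            "z": it.z_order,
        }
        for it in sorted(items, key=lambda it: it.z_order)
    ]
    return json.dumps(flat).encode("utf-8")


def deserialize_display_list(blob):
    """Rebuild the item list on the receiving side, with no Frame objects."""
    return json.loads(blob.decode("utf-8"))


background = Frame(0, 0, 800, 600, "white")
headline = Frame(20, 20, 400, 40, "black")
shallow = [DisplayItem(headline, z_order=1), DisplayItem(background, z_order=0)]

blob = serialize_display_list(shallow)       # would cross the IPC boundary
received = deserialize_display_list(blob)    # usable without the frame tree
```

The point of the sketch is that ``received`` no longer refers to any
``Frame`` object, so it stays valid in another process where the frame tree
does not exist.
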

WebRender
~~~~~~~~~

The GPU process receives the WebRender Display List blob and deserializes it
into a Scene. This Scene contains more than the strictly visible elements;
for example, to anticipate scrolling, we might have several paragraphs of
text extending past the visible page.

For a given viewport, the Scene gets culled and stripped down to a Frame.
This is also where we start preparing data structures for GPU rendering, for
example getting some font glyphs into an atlas for rasterizing text.

The final step takes the Frame and submits commands to the GPU to actually
render it. The GPU will execute the commands and composite the final page.

Software
~~~~~~~~

The above is the new WebRender-enabled way to do things. But in the schematic
you'll note a second branch towards the bottom: this is the legacy code path
which does not use WebRender (nor Rust). In this case, the Display List is
converted into a Layer Tree. The purpose of this Tree is to try and avoid
having to re-render absolutely everything when the page needs to be
refreshed. For example, when scrolling we should be able to redraw the page
by mostly shifting things around. However, that requires those 'things' to
still be around from the last time we drew the page. In other words, visual
elements that are likely to be static and reusable need to be drawn into
their own private "page" (a cache). Then we can recombine (composite) all of
these when redrawing the actual page.

Figuring out which elements would be good candidates for this, and striking a
balance between good performance and excessive memory use, is the purpose of
the Layer Tree. Each 'layer' is a cached image of some element(s). This logic
also takes occlusion into account, e.g. don't allocate and render a layer for
elements that are known to be completely obscured by something in front of
them.

Redrawing the page by combining the Layer Tree with any newly rasterized
elements is the job of the Compositor.

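The caching idea behind layers can be sketched as follows. This is a
simplified illustration in Python under stated assumptions, not the real
Layer Tree or Compositor: ``Layer``, ``Compositor``, ``rasterize`` and
``composite`` are hypothetical names, a string stands in for the cached
image, and occlusion handling is left out.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical cached layer: rasterized once, then reused."""
    name: str
    y: int                # position of the layer in page coordinates
    scrolls: bool = True  # False for elements fixed to the viewport
    raster_count: int = 0
    cache: str = ""

    def rasterize(self):
        # Stand-in for the expensive step of drawing the elements.
        self.raster_count += 1
        self.cache = f"<pixels of {self.name}>"


class Compositor:
    def __init__(self, layers):
        self.layers = layers

    def composite(self, scroll_y):
        """Recombine cached layers, shifted by the current scroll offset."""
        placed = []
        for layer in self.layers:
            if not layer.cache:
                layer.rasterize()  # only layers without a cached image
            screen_y = layer.y - scroll_y if layer.scrolls else layer.y
            placed.append((layer.name, screen_y))
        return placed


layers = [Layer("article", y=100), Layer("toolbar", y=0, scrolls=False)]
compositor = Compositor(layers)

first = compositor.composite(scroll_y=0)    # rasterizes both layers
second = compositor.composite(scroll_y=50)  # only shifts the cached images
```

After the second call each layer has still been rasterized only once:
scrolling was handled by shifting cached images, which is the saving the
Layer Tree is meant to provide.
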

Even when a layer cannot be reused in its entirety, it is likely that only a
small part of it was invalidated. Thus there is an elaborate system for
tracking dirty rectangles, starting an update by copying the area that can be
salvaged, and then redrawing only what cannot.

In fact, this idea can be extended to delta-tracking of display lists
themselves. Traversing the layout tree and building a display list is also
not cheap, so the code tries to partially invalidate and rebuild the display
list incrementally when possible. This optimization is used in both the
non-WebRender and the WebRender code paths.

Asynchronous Panning And Zooming
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Earlier we mentioned that a Scene might contain more elements than are
strictly necessary for rendering what's visible (the Frame). The reason for
that is Asynchronous Panning and Zooming, or APZ for short. The browser will
feel much more responsive if scrolling & zooming can short-circuit all of
these data transformations and IPC boundaries, and instead directly update an
offset of some layer and recomposite. (Think of late-latching in a VR
context.)

This simple idea introduces a lot of complexity: how much extra do you
rasterize, and in which direction? How much memory can we afford? What about
JavaScript that responds to scroll events and perhaps does something
'interesting' with the page in return? What about nested frames or nested
scrollbars? What if we scroll so much that we go past the boundaries of the
Scene that we know about?

See :ref:`apz` for all that and more.

A Few More Details
~~~~~~~~~~~~~~~~~~

Here's another schematic which basically repeats the previous one, but
showing a little bit more detail. Note that the direction is reversed -- the
data flow starts at the right. Sorry about that :)

.. image:: RenderingOverviewDetail.png
   :width: 100%

Some things to note:

- there are multiple content processes, currently 4 of them.
  This is for security reasons (sandboxing), stability (isolate crashes) and
  performance (multi-core machines);
- ideally each "webpage" would run in its own process for security; this is
  being developed under the term 'fission';
- there is only a single GPU process, if there is one at all; some platforms
  have it as part of the Parent;
- not shown here is the Extension process that isolates WebExtensions;
- for non-WebRender, rasterization happens in the Content Process, and we
  send entire Layers to the GPU/Compositor process (via shared memory, only
  using actual IPC for its metadata like width & height);
- if the GPU process crashes (a bug or a driver issue) we can simply restart
  it, resend the display list, and the browser itself doesn't crash;
- the browser UI is just another set of DOM+JS, albeit one that runs with
  elevated privileges. That is, its JS can do things that normal JS cannot.
  It lives in the Parent Process, which then uses IPC to get it rendered,
  same as regular Content. (The IPC arrow also goes to the WebRender Display
  List but is omitted to reduce clutter.);
- UI events get routed to APZ first, to minimize latency. By running inside
  the GPU process, we may have access to data such as rasterized clipping
  masks that enable finer-grained hit testing;
- the GPU process talks back to the content process; in particular, when APZ
  scrolls out of bounds, it asks Content to enlarge/shift the Scene with a
  new "display port";
- we still use the GPU when we can for compositing even in the non-WebRender
  case.

WebRender In Detail
-------------------

Converting a display list into GPU commands is broken down into a number of
steps and intermediate data structures.

.. image:: RenderingOverviewTrees.png
   :width: 75%
   :align: center

..

    *Each element in the picture tree points to exactly one node in the
    spatial tree. Only a few of these links are shown for clarity (the dashed
    lines).*

The Picture Tree
~~~~~~~~~~~~~~~~

The incoming display list uses "stacking contexts".
For example, to render some text with a drop shadow, a display list will
contain three items:

- "enable shadow" with some parameters such as shadow color, blur size, and
  offset;
- the text item;
- "pop all shadows" to deactivate shadows;

WebRender will break this down into two distinct elements, or "pictures". The
first represents the shadow, so it contains a copy of the text item, but
modified to use the shadow's color, and to shift the text by the shadow's
offset. The second picture contains the original text to draw on top of the
shadow.

The fact that the first picture, the shadow, needs to be blurred, is a
"compositing" property of the picture which we'll deal with later.

Thus, the stack-based display list gets converted into a list of pictures --
or more generally, a hierarchy of pictures, since items are nested as per the
original HTML.

Example visual elements are a TextRun, a LineDecoration, or an Image (like a
.png file).

Compared to 3D rendering, the picture tree is similar to a scenegraph: it's a
parent/child hierarchy of all the drawable elements that make up the "scene",
in this case the webpage. One important difference is that the
transformations are stored in a separate tree, the spatial tree.

The Spatial Tree
~~~~~~~~~~~~~~~~

The nodes in the spatial tree represent coordinate transforms. Every time the
DOM hierarchy needs child elements to be transformed relative to their
parent, we add a new Spatial Node to the tree. All those child elements will
then point to this node as their "local space" reference (aka coordinate
frame). In traditional 3D terms, it's a scenegraph but only containing
transform nodes.

The nodes are called frames, as in "coordinate frame":

- a Reference Frame corresponds to a ``