Diffstat (limited to 'gfx/docs')
-rw-r--r--  gfx/docs/AsyncPanZoom.rst              |  929
-rw-r--r--  gfx/docs/AsyncPanZoomArchitecture.png  |  bin 0 -> 67837 bytes
-rw-r--r--  gfx/docs/GraphicsOverview.rst          |  149
-rw-r--r--  gfx/docs/LayersHistory.rst             |   63
-rw-r--r--  gfx/docs/Moz2D.rst                     |   16
-rw-r--r--  gfx/docs/RenderingOverview.rst         |  384
-rw-r--r--  gfx/docs/RenderingOverviewBlurTask.png |  bin 0 -> 16264 bytes
-rw-r--r--  gfx/docs/RenderingOverviewDetail.png   |  bin 0 -> 148839 bytes
-rw-r--r--  gfx/docs/RenderingOverviewSimple.png   |  bin 0 -> 54981 bytes
-rw-r--r--  gfx/docs/RenderingOverviewTrees.png    |  bin 0 -> 80062 bytes
-rw-r--r--  gfx/docs/Silk.rst                      |  472
-rw-r--r--  gfx/docs/SilkArchitecture.png          |  bin 0 -> 221047 bytes
-rw-r--r--  gfx/docs/index.rst                     |   17
13 files changed, 2030 insertions(+), 0 deletions(-)
diff --git a/gfx/docs/AsyncPanZoom.rst b/gfx/docs/AsyncPanZoom.rst
new file mode 100644
index 0000000000..761c8fbb4f
--- /dev/null
+++ b/gfx/docs/AsyncPanZoom.rst
@@ -0,0 +1,929 @@

.. _apz:

Asynchronous Panning and Zooming
================================

**This document is a work in progress. Some information may be missing or incomplete.**

.. image:: AsyncPanZoomArchitecture.png

Goals
-----

We need to be able to provide a visual response to user input with minimal latency. In particular, on devices with touch input, content must track the finger exactly while panning, or the user experience is very poor. According to the UX team, 120ms is an acceptable latency between user input and response.

Context and surrounding architecture
------------------------------------

The fundamental problem we are trying to solve with the Asynchronous Panning and Zooming (APZ) code is that of responsiveness. By default, web browsers operate in a “game loop” that looks like this:

::

    while true:
        process input
        do computations
        repaint content
        display repainted content

In browsers the “do computations” step can be arbitrarily expensive because it can involve running event handlers in web content. Therefore, there can be an arbitrary delay between the input being received and the on-screen display getting updated.

Responsiveness is always good, and with touch-based interaction it is even more important than with mouse or keyboard input. In order to ensure responsiveness, we split the “game loop” model of the browser into a multithreaded variant which looks something like this:

::

    Thread 1 (compositor thread)
    while true:
        receive input
        send a copy of input to thread 2
        adjust rendered content based on input
        display adjusted rendered content

    Thread 2 (main thread)
    while true:
        receive input from thread 1
        do computations
        rerender content
        update the copy of rendered content in thread 1

This multithreaded model is called off-main-thread compositing (OMTC), because the compositing (where the content is displayed on-screen) happens on a separate thread from the main thread. Note that this is a very simplified model, but in this model the “adjust rendered content based on input” step is the primary function of the APZ code.

A couple of notes on APZ's relationship to other browser architecture improvements:

1. Due to Electrolysis (e10s), Site Isolation (Fission), and GPU Process
   isolation, the above two threads often actually run in different
   processes. APZ is largely agnostic to this, as all communication
   between the two threads for APZ purposes happens using an IPC layer
   that abstracts over communication between threads vs. processes.
2. With the WebRender graphics backend, part of the rendering pipeline is
   also offloaded from the main thread. In this architecture, the
   information sent from the main thread consists of a display list, and
   scrolling-related metadata referencing content in that display list.
   The metadata is kept in a queue until the display list undergoes an
   additional rendering step in the compositor (scene building). At this
   point, we are ready to tell APZ about the new content and have it
   start applying adjustments to it, as further rendering steps beyond
   scene building are done synchronously on each composite.

The compositor in theory can continuously composite previously rendered content (adjusted on each composite by APZ) to the screen while the main thread is busy doing other things and rendering new content.

The APZ code takes the input events that are coming in from the hardware and uses them to figure out what the user is trying to do (e.g. pan the page, zoom in). It then expresses this user intention in the form of translation and/or scale transformation matrices. These transformation matrices are applied to the rendered content at composite time, so that what the user sees on-screen reflects what they are trying to do as closely as possible.

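As a concrete illustration, the “adjustment” can be pictured as a small scale-plus-translation applied to previously rendered content. This is a minimal sketch with made-up types; the real code produces full transformation matrices:

::

    #include <cstdio>

    // Sketch only: the fields below represent the parts of the user's
    // gesture that the main thread has not yet rendered.
    struct AsyncTransform {
      float zoom = 1.0f;     // async scale on top of the rendered content
      float scrollX = 0.0f;  // async scroll offset, in pixels
      float scrollY = 0.0f;
    };

    // Map a point of previously rendered content to its on-screen position.
    void Apply(const AsyncTransform& t, float& x, float& y) {
      x = x * t.zoom - t.scrollX;
      y = y * t.zoom - t.scrollY;
    }

    int main() {
      AsyncTransform t{1.0f, 0.0f, 10.0f};  // user async-scrolled down 10px
      float x = 0.0f, y = 300.0f;
      Apply(t, x, y);
      std::printf("content y=300 composites to screen y=%.0f\n", y);  // 290
    }
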
Technical overview
------------------

As per the heavily simplified model described above, the fundamental purpose of the APZ code is to take input events and produce transformation matrices. This section attempts to break that down and identify the different problems that make this task non-trivial.

Checkerboarding
~~~~~~~~~~~~~~~

The area of page content for which a display list is built and sent to the compositor is called the “displayport”. The APZ code is responsible for determining how large the displayport should be. On the one hand, we want the displayport to be as large as possible. At the very least it needs to be larger than what is visible on-screen, because otherwise, as soon as the user pans, there will be some unpainted area of the page exposed. However, we cannot always set the displayport to be the entire page, because the page can be arbitrarily long and this would require an unbounded amount of memory to store. Therefore, a good displayport size is one that is larger than the visible area but not so large that it is a huge drain on memory. Because the displayport is usually smaller than the whole page, it is always possible for the user to scroll so fast that they end up in an area of the page outside the displayport. When this happens, they see unpainted content; this is referred to as “checkerboarding”, and we try to avoid it where possible.

There are many possible ways to determine what the displayport should be in order to balance the tradeoffs involved (i.e. having one that is too big is bad for memory usage, and having one that is too small results in excessive checkerboarding). Ideally, the displayport should cover exactly the area that we know the user will make visible. Although we cannot know this for sure, we can use heuristics based on current panning velocity and direction to ensure a reasonably-chosen displayport area. This calculation is done in the APZ code, and a new desired displayport is frequently sent to the main thread as the user is panning around.

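One such heuristic might look like the following sketch. The structure and numbers are made up for illustration; the real displayport logic in APZ is considerably more nuanced:

::

    // Made-up heuristic, for illustration only: inflate the visible rect
    // and skew the inflation toward the direction the user is panning.
    struct Rect {
      float x, y, width, height;
    };

    Rect ComputeDisplayport(const Rect& visible, float velocityY /* px per ms */) {
      Rect dp = visible;
      // Keep a quarter-viewport margin above and below by default.
      const float margin = visible.height / 4;
      dp.y -= margin;
      dp.height += 2 * margin;
      // Extend by roughly 200ms worth of travel in the direction of motion.
      const float lookahead = velocityY * 200;
      if (lookahead > 0) {
        dp.height += lookahead;  // panning down: extend below
      } else {
        dp.y += lookahead;       // panning up: extend above
        dp.height -= lookahead;
      }
      return dp;
    }
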
Multiple scrollable elements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider, for example, a scrollable page that contains an iframe which itself is scrollable. The iframe can be scrolled independently of the top-level page, and we would like both the page and the iframe to scroll responsively. This means that we want independent asynchronous panning for both the top-level page and the iframe. In addition to iframes, elements that have the overflow:scroll CSS property set are also scrollable. In the display list, scrollable elements are arranged in a tree structure, and in the APZ code we have a matching tree of AsyncPanZoomController (APZC) objects, one for each scrollable element. To manage this tree of APZC instances, we have a single APZCTreeManager object. Each APZC is relatively independent and handles the scrolling for its associated scrollable element, but there are some cases in which they need to interact; these cases are described in the sections below.

Hit detection
~~~~~~~~~~~~~

Consider again the case where we have a scrollable page that contains an iframe which itself is scrollable. As described above, we will have two APZC instances - one for the page and one for the iframe. When the user puts their finger down on the screen and moves it, we need to do some sort of hit detection in order to determine whether their finger is on the iframe or on the top-level page. Based on where their finger lands, the appropriate APZC instance needs to handle the input.

This hit detection is done by APZCTreeManager in collaboration with WebRender, which has more detailed information about the structure of the page content than is stored in APZ directly. See :ref:`this section <wr-hit-test-details>` for more details.

Also note that for some types of input (e.g. when the user puts two fingers down to do a pinch) we do not want the input to be “split” across two different APZC instances. In the case of a pinch, for example, we find a “common ancestor” APZC instance - one that is zoomable and contains all of the touch input points, and direct the input to that APZC instance.

Scroll Handoff
~~~~~~~~~~~~~~

Consider yet again the case where we have a scrollable page that contains an iframe which itself is scrollable. Say the user scrolls the iframe so that it reaches the bottom. If the user continues panning on the iframe, the expectation is that the top-level page will start scrolling. However, as discussed in the section on hit detection, the APZC instance for the iframe is separate from the APZC instance for the top-level page. Thus, we need the two APZC instances to communicate in some way such that input events on the iframe result in scrolling on the top-level page. This behaviour is referred to as “scroll handoff” (or “fling handoff” in the case where analogous behaviour results from the scrolling momentum of the page after the user has lifted their finger).

.. _input-event-untransformation:

Input event untransformation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The APZC architecture by definition results in two copies of a “scroll position” for each scrollable element. There is the original copy on the main thread that is accessible to web content and the layout and painting code. And there is a second copy on the compositor side, which is updated asynchronously based on user input, and corresponds to what the user visually sees on the screen. Although these two copies may diverge temporarily, they are reconciled periodically. In particular, they diverge while the APZ code is performing an async pan or zoom action on behalf of the user, and are reconciled when the APZ code requests a repaint from the main thread.

Because of the way input events are represented, this has some unfortunate consequences. Input event coordinates are represented relative to the device screen - so if the user touches at the same physical spot on the device, the same input events will be delivered regardless of the content scroll position. When the main thread receives a touch event, it combines that with the content scroll position in order to figure out what DOM element the user touched. However, because we now have two different scroll positions, this process may not work perfectly.

A concrete example follows:

Consider a device with screen size 600 pixels tall. On this device, a user is viewing a document that is 1000 pixels tall, and that is scrolled down by 200 pixels. That is, the vertical section of the document from 200px to 800px is visible. Now, if the user touches a point 100px from the top of the physical display, the hardware will generate a touch event with y=100. This will get sent to the main thread, which will add the scroll position (200) and get a document-relative touch event with y=300. This new y-value will be used in hit detection to figure out what the user touched. If the document had an absolutely-positioned div at y=300, then that would receive the touch event.

Now let us add some async scrolling to this example. Say that the user additionally scrolls the document by another 10 pixels asynchronously (i.e. only on the compositor thread), and then does the same touch event. The same input event is generated by the hardware, and as before, the document will deliver the touch event to the div at y=300. However, visually, the document is scrolled by an additional 10 pixels so this outcome is wrong. What needs to happen is that the APZ code needs to intercept the touch event and account for the 10 pixels of asynchronous scroll. Therefore, the input event with y=100 gets converted to y=110 in the APZ code before being passed on to the main thread. The main thread then adds the scroll position it knows about and determines that the user touched at a document-relative position of y=310.

Analogous input event transformations need to be done for horizontal scrolling and zooming.

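The arithmetic in this example is simple enough to sketch directly. The following is an illustrative helper only, not the real APZ untransformation code (which works with full matrices):

::

    struct TouchPoint {
      float x, y;  // screen-relative coordinates from the hardware
    };

    // asyncScrollY: pixels of vertical scroll applied only on the
    // compositor thread (10 in the example above).
    TouchPoint Untransform(const TouchPoint& screenPoint, float asyncScrollY) {
      // y=100 becomes y=110, so that the main thread's own scroll
      // position (200) then yields the correct document y of 310.
      return TouchPoint{screenPoint.x, screenPoint.y + asyncScrollY};
    }
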
Content independently adjusting scrolling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As described above, there are two copies of the scroll position in the APZ architecture - one on the main thread and one on the compositor thread. Usually for architectures like this, there is a single “source of truth” value and the other value is simply a copy. However, in this case that is not easily possible to do. The reason is that both of these values can be legitimately modified. On the compositor side, the input events the user is triggering modify the scroll position, which is then propagated to the main thread. However, on the main thread, web content might be running Javascript code that programmatically sets the scroll position (via window.scrollTo, for example). Scroll changes driven from the main thread are just as legitimate and need to be propagated to the compositor thread, so that the visual display updates in response.

Because the cross-thread messaging is asynchronous, reconciling the two types of scroll changes is a tricky problem. Our design solves this using various flags and generation counters. The general heuristic we have is that content-driven scroll position changes (e.g. scrollTo from JS) are never lost. For instance, if the user is doing an async scroll with their finger and content does a scrollTo in the middle, then some of the async scroll would occur before the “jump” and the rest after the “jump”.

Content preventing default behaviour of input events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Another problem that we need to deal with is that web content is allowed to intercept touch events and prevent the “default behaviour” of scrolling. This ability is defined in web standards and is non-negotiable. Touch event listeners in web content are allowed to call preventDefault() on the touchstart or first touchmove event for a touch point; doing this is supposed to “consume” the event and prevent touch-based panning. As we saw in a previous section, the input event needs to be untransformed by the APZ code before it can be delivered to content. But, because of the preventDefault problem, we cannot fully process the touch event in the APZ code until content has had a chance to handle it.

To balance the needs of correctness (which calls for allowing web content to successfully prevent default handling of events if it wishes to) and responsiveness (which calls for avoiding blocking on web content Javascript for a potentially-unbounded amount of time before reacting to an event), APZ gives web content a "deadline" to process the event and tell APZ whether preventDefault() was called on the event. The deadline is 400ms from the time APZ receives the event on desktop, and 600ms on mobile. If web content is able to process the event before this deadline, the decision to preventDefault() the event or not will be respected. If web content fails to process the event before the deadline, APZ assumes preventDefault() will not be called and goes ahead and processes the event.

To implement this, upon receiving a touch event, APZ immediately returns an untransformed version that can be dispatched to content. It also schedules the 400ms or 600ms timeout. There is an API that allows the main-thread event dispatching code to notify APZ as to whether or not the default action should be prevented. If the APZ content response timeout expires, or if the main-thread event dispatching code notifies the APZ of the preventDefault status, then the APZ continues with the processing of the events (which may involve discarding the events).

To limit the responsiveness impact of this round-trip to content, APZ tries to identify cases where it can rule out preventDefault() as a possible outcome. To this end, the hit-testing information sent to the compositor includes information about which regions of the page are occupied by elements that have a touch event listener. If an event targets an area outside of these regions, preventDefault() can be ruled out, and the round-trip skipped.

Additionally, recent enhancements to web standards have given page authors new tools that can further limit the responsiveness impact of preventDefault():

1. Event listeners can be registered as "passive", which means they
   are not allowed to call preventDefault(). Authors can use this flag
   when writing listeners that only need to observe the events, not alter
   their behaviour via preventDefault(). The presence of passive event
   listeners does not cause APZ to perform the content round-trip.
2. If page authors wish to disable certain types of touch interactions
   completely, they can use the ``touch-action`` CSS property from the
   pointer-events spec to do so declaratively, instead of registering
   event listeners that call preventDefault(). Touch-action flags are
   also included in the hit-test information sent to the compositor, and
   APZ uses this information to respect ``touch-action``. (Note that the
   touch-action information sent to the compositor is not always 100%
   accurate, and sometimes APZ needs to fall back on asking the main
   thread for touch-action information, which again involves a
   round-trip.)

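Putting the above together: for an event that cannot be ruled out as preventDefault-able, APZ races the content response against the deadline. A minimal sketch with invented names (the real logic lives in APZ's input queue code):

::

    // Illustrative sketch of the content-response deadline race; not the
    // real implementation. The deadline is 400ms on desktop, 600ms on mobile.
    class InputBlock {
     public:
      explicit InputBlock(bool onMobile) : mDeadlineMs(onMobile ? 600 : 400) {}

      int DeadlineMs() const { return mDeadlineMs; }

      // Called if the main thread reports the preventDefault() status in time.
      void OnContentResponse(bool defaultPrevented) {
        if (mResolved) return;  // deadline already expired; response ignored
        mResolved = true;
        if (defaultPrevented) {
          DropEvents();         // content consumed the gesture
        } else {
          ProcessEvents();
        }
      }

      // Called when the deadline timer fires first.
      void OnTimeout() {
        if (mResolved) return;
        mResolved = true;
        ProcessEvents();        // assume preventDefault() will not be called
      }

     private:
      void ProcessEvents() { /* scrolling, tap gestures, ... */ }
      void DropEvents() { /* discard the queued events */ }

      const int mDeadlineMs;
      bool mResolved = false;
    };
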
Other event types
~~~~~~~~~~~~~~~~~

The above sections talk mostly about touch events, but over time APZ has been extended to handle a variety of other event types, such as trackpad and mousewheel scrolling, scrollbar thumb dragging, and keyboard scrolling in some cases. Much of the above applies to these other event types too (for example, wheel events can be prevent-defaulted as well).

Importantly, the "untransformation" described above needs to happen even for event types which are not handled in APZ, such as mouse click events, since async scrolling can still affect the correct targeting of such events.

Technical details
-----------------

This section describes various pieces of the APZ code, and goes into more specific detail on APIs and code than the previous sections. The primary purpose of this section is to help people who plan on making changes to the code, while also not going into so much detail that it needs to be updated with every patch.

Overall flow of input events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section describes how input events flow through the APZ code.

Disclaimer: some details in this section are out of date (for example, it assumes the case where the main thread and compositor thread are in the same process, which is rarely the case these days, so in practice e.g. steps 6 and 8 involve IPC, not just "stack unwinding").

1. Input events arrive from the hardware/widget code into the APZ via
   APZCTreeManager::ReceiveInputEvent. The thread that invokes this is
   called the "controller thread", and may or may not be the same as the
   Gecko main thread.
2. Conceptually the first thing that the APZCTreeManager does is to
   associate these events with “input blocks”. An input block is a set
   of events that share certain properties, and generally are intended
   to represent a single gesture. For example with touch events, all
   events following a touchstart up to but not including the next
   touchstart are in the same block. All of the events in a given block
   will go to the same APZC instance and will either all be processed
   or all be dropped.
3. Using the first event in the input block, the APZCTreeManager does a
   hit-test to see which APZC it hits. If no APZC is hit, the events are
   discarded and we jump to step 6. Otherwise, the input block is tagged
   with the hit APZC as a tentative target and put into a global APZ
   input queue. In addition to the target APZC, the result of the hit test
   also includes whether the input event landed on a "dispatch-to-content"
   region. These are regions of the page where there is something going
   on that requires dispatching the event to content and waiting for
   a response *before* processing the event in APZ; an example of this
   is a region containing an element with a non-passive event listener,
   as described above. (TODO: Add a section that talks about the other
   uses of the dispatch-to-content mechanism.)
4.

   i. If the input events landed outside a dispatch-to-content region,
      any available events in the input block are processed. These may
      trigger behaviours like scrolling or tap gestures.
   ii. If the input events landed inside a dispatch-to-content region,
       the events are left in the queue and a timeout is initiated. If
       the timeout expires before step 9 is completed, the APZ assumes
       the input block was not cancelled and the tentative target is
       correct, and processes the events as part of step 10.

5. The call stack unwinds back to APZCTreeManager::ReceiveInputEvent,
   which does an in-place modification of the input event so that any
   async transforms are removed.
6. The call stack unwinds back to the widget code that called
   ReceiveInputEvent. This code now has the event in the coordinate
   space Gecko is expecting, and so can dispatch it to the Gecko main
   thread.
7. Gecko performs its own usual hit-testing and event dispatching for
   the event. As part of this, it records whether any touch listeners
   cancelled the input block by calling preventDefault(). It also
   activates inactive scrollframes that were hit by the input events.
8. The call stack unwinds back to the widget code, which sends two
   notifications to the APZ code on the controller thread. The first
   notification is via APZCTreeManager::ContentReceivedInputBlock, and
   informs the APZ whether the input block was cancelled. The second
   notification is via APZCTreeManager::SetTargetAPZC, and informs the
   APZ of the results of the Gecko hit-test during event dispatch. Note
   that Gecko may report that the input event did not hit any
   scrollable frame at all. The SetTargetAPZC notification happens only
   once per input block, while the ContentReceivedInputBlock
   notification may happen once per block, or multiple times per block,
   depending on the input type.
9.

   i. If the events were processed as part of step 4(i), the
      notifications from step 8 are ignored and step 10 is skipped.
   ii. If events were queued as part of step 4(ii), and steps 5-8
       complete before the timeout, the arrival of both notifications
       from step 8 will mark the input block ready for processing.
   iii. If events were queued as part of step 4(ii), but steps 5-8 take
        longer than the timeout, the notifications from step 8 will be
        ignored and step 10 will already have happened.

10. If events were queued as part of step 4(ii) they are now either
    processed (if the input block was not cancelled and Gecko detected a
    scrollframe under the input event, or if the timeout expired) or
    dropped (all other cases). Note that the APZC that processes the
    events may be different at this step than the tentative target from
    step 3, depending on the SetTargetAPZC notification. Processing the
    events may trigger behaviours like scrolling or tap gestures.

If the CSS touch-action property is enabled, the above steps are modified as follows:

* In step 4, the APZC also requires the allowed touch-action behaviours
  for the input event. This might have been determined as part of the
  hit-test in APZCTreeManager; if not, the events are queued.
* In step 6, the widget code determines the content element at the point
  under the input event, and notifies the APZ code of the allowed
  touch-action behaviours. This notification is sent via a call to
  APZCTreeManager::SetAllowedTouchBehavior on the input thread.
* In step 9(ii), the input block will only be marked ready for processing
  once all three notifications arrive.

Threading considerations
^^^^^^^^^^^^^^^^^^^^^^^^

The bulk of the input processing in the APZ code happens on what we call “the controller thread”. In practice the controller thread could be the Gecko main thread, the compositor thread, or some other thread. There are obvious downsides to using the Gecko main thread - that is, “asynchronous” panning and zooming is not really asynchronous as input events can only be processed while Gecko is idle.

In an e10s environment, using the Gecko main thread of the chrome process is acceptable, because the code running in that process is more controllable and well-behaved than arbitrary web content. Using the compositor thread as the controller thread could work on some platforms, but may be inefficient on others. For example, on Android (Fennec) we receive input events from the system on a dedicated UI thread. We would have to redispatch the input events to the compositor thread if we wanted the input thread to be the same as the compositor thread. This introduces a potential for higher latency, particularly if the compositor does any blocking operations - blocking SwapBuffers operations, for example. As a result, the APZ code itself does not assume that the controller thread will be the same as the Gecko main thread or the compositor thread.

Active vs. inactive scrollframes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The number of scrollframes on a page is potentially unbounded. However, we do not want to create a separate displayport for each scrollframe right away, as this would require large amounts of memory. Therefore, scrollframes are designated as either “active” or “inactive”. Active scrollframes get a displayport, and an APZC on the compositor side. Inactive scrollframes do not get a displayport (a display list is only built for their viewport, i.e. what is currently visible) and do not get an APZC.

Consider a page with a scrollframe that is initially inactive. This scroll frame does not get an APZC, and therefore events targeting it will target the APZC for the nearest active scrollable ancestor (let's call it P; note, the rootmost scroll frame in a given process is always active). However, the presence of the inactive scroll frame is reflected by a dispatch-to-content region that prevents events over the frame from erroneously scrolling P.

When the user starts interacting with that content, the hit-test in the APZ code hits the dispatch-to-content region of P. The input block therefore has a tentative target of P when it goes into step 4(ii) in the flow above. When Gecko processes the input event, it must detect the inactive scrollframe and activate it, as part of step 7. Finally, the widget code sends the SetTargetAPZC notification in step 8 to notify the APZ that the input block should really apply to this new APZC. An issue here is that the transaction containing metadata for the newly active scroll frame must reach the compositor and APZ before the SetTargetAPZC notification. If this does not occur within the 400ms timeout, the APZ code will be unable to update the tentative target, and will continue to use P for that input block. Input blocks that start after the transaction will get correctly routed to the new scroll frame as there will now be an APZC instance for the active scrollframe.

This model implies that when the user initially attempts to scroll an inactive scrollframe, it may end up scrolling an ancestor scrollframe. Only after the round-trip to the Gecko thread is complete is there an APZC for async scrolling to actually occur on the scrollframe itself. At that point the scrollframe will start receiving new input blocks and will scroll normally.

Note: with Fission (where inactive scroll frames would make it impossible to target the correct process in all situations; see :ref:`this section <fission-hit-testing>` for more details) and WebRender (which makes displayports more lightweight as the actual rendering is offloaded to the compositor and can be done on demand), inactive scroll frames are being phased out, and we are moving towards a model where all scroll frames with nonempty scroll ranges are active and get a displayport and an APZC. To conserve memory, displayports for scroll frames which have not been recently scrolled are kept to a "minimal" size equal to the viewport size.

WebRender Integration
~~~~~~~~~~~~~~~~~~~~~

This section describes how APZ interacts with the WebRender graphics backend.

Note that APZ predates WebRender, having initially been written to work with the earlier Layers graphics backend. The design of Layers has influenced APZ significantly, and this still shows in some places in the code. Now that the Layers backend has been removed, there may be opportunities to streamline the interaction between APZ and WebRender.

HitTestingTree
^^^^^^^^^^^^^^

The APZCTreeManager keeps as part of its internal state a tree of HitTestingTreeNode instances. This is referred to as the HitTestingTree.

The main purpose of the HitTestingTree is to model the spatial relationships between content that's affected by async scrolling. Tree nodes fall roughly into the following categories:

* Nodes representing scrollable content in an active scroll frame. These
  nodes are associated with the scroll frame's APZC.
* Nodes representing other content that may move in special ways in
  response to async scrolling, such as fixed content, sticky content, and
  scrollbars.
* (Non-leaf) nodes which do not represent any content, just metadata
  (e.g. a transform) that applies to their descendant nodes.

An APZC may be associated with multiple nodes, if e.g. a scroll frame scrolls two pieces of content that are interleaved with non-scrolling content.

Arranging these nodes in a tree allows modelling relationships such as what content is scrolled by a given scroll frame, what the scroll handoff relationships are between APZCs, and what content is subject to what transforms.

An additional use of the HitTestingTree is to allow APZ to keep content processes up to date about enclosing transforms that they are subject to. See :ref:`this section <sending-transforms-to-content-processes>` for more details.

(In the past, with the Layers backend, the HitTestingTree was also used for compositor hit testing, hence the name. This is no longer the case, and there may be opportunities to simplify the tree as a result.)

The HitTestingTree is created from another tree data structure called WebRenderScrollData. The relevant types here are:

* WebRenderScrollData, which stores the entire tree.
* WebRenderLayerScrollData, which represents a single "layer" of content,
  i.e. a group of display items that move together when scrolling (or
  metadata applying to a subtree of such layers). In the Layers backend,
  such content would be rendered into a single texture which could then
  be moved asynchronously at composite time. Since a layer of content can
  be scrolled by multiple (nested) scroll frames, a
  WebRenderLayerScrollData may contain scroll metadata for more than one
  scroll frame.
* WebRenderScrollDataWrapper, which wraps WebRenderLayerScrollData
  but "expanded" in a way that each node only stores metadata for
  a single scroll frame. WebRenderScrollDataWrapper nodes have a
  1:1 correspondence with HitTestingTreeNodes.

It's not clear whether the distinction between WebRenderLayerScrollData and WebRenderScrollDataWrapper is still useful in a WebRender-only world. The code could potentially be revised such that we directly build and store nodes of a single type with the behaviour of WebRenderScrollDataWrapper.

The WebRenderScrollData structure is built on the main thread, and then shipped over IPC to the compositor where it's used to construct the HitTestingTree.

WebRenderScrollData is built in WebRenderCommandBuilder, during the same traversal of the Gecko display list that is used to build the WebRender display list. As of this writing, the architecture for this is that, as we walk the Gecko display list, we query it to see if it contains any information that APZ might need to know (e.g. CSS transforms) via a call to ``nsDisplayItem::UpdateScrollData(nullptr, nullptr)``. If this call returns true, we create a WebRenderLayerScrollData instance for the item, and populate it with the necessary information in ``WebRenderLayerScrollData::Initialize``. We also create WebRenderLayerScrollData instances if we detect (via ASR changes) that we are now processing a Gecko display item that is in a different scrollframe than the previous item.

The main sources of complexity in this code come from:

1. Ensuring the ScrollMetadata instances end up on the proper
   WebRenderLayerScrollData instances (such that every path from a leaf
   WebRenderLayerScrollData node to the root has a consistent ordering of
   scrollframes without duplications).
2. The deferred-transform optimization that is described in more detail
   at the declaration of ``StackingContextHelper::mDeferredTransformItem``.

.. _wr-hit-test-details:

Hit-testing
^^^^^^^^^^^

Since the HitTestingTree is not used for actual hit-testing purposes with the WebRender backend (see previous section), this section describes how hit-testing actually works with WebRender.

The Gecko display list contains display items (``nsDisplayCompositorHitTestInfo``) that store hit-testing state. These items implement the ``CreateWebRenderCommands`` method and generate a "hit-test item" into the WebRender display list. This is basically just a rectangle item in the WebRender display list that is a no-op for painting purposes, but contains information that should be returned by the hit-test (specifically the hit info flags and the scrollId of the enclosing scrollframe). The hit-test item gets clipped and transformed in the same way that all the other items in the WebRender display list do, via clip chains and enclosing reference frame/stacking context items.

When WebRender needs to do a hit-test, it goes through its display list, taking into account the current clips and transforms, adjusted for the most recent async scroll/zoom, and determines which hit-test item(s) are under the target point, and returns those items. APZ can then take the frontmost item from that list (or skip over it if it happens to be inside an OOP subdocument that's ``pointer-events:none``) and use that as the hit target.

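Conceptually, the data handed back to APZ for each hit-test item can be pictured like this (field and function names are invented for illustration):

::

    #include <cstdint>
    #include <optional>
    #include <vector>

    // Invented names, for illustration only: roughly what a hit-test
    // result carries back to APZ for each hit-test item.
    struct HitTestItem {
      uint64_t scrollId;      // identifies the enclosing scroll frame
      uint16_t hitInfoFlags;  // e.g. "has non-passive touch listener"
    };

    // WebRender returns the items under the point, frontmost first; APZ
    // picks the frontmost usable one as the target.
    std::optional<HitTestItem> PickTarget(const std::vector<HitTestItem>& items) {
      if (items.empty()) {
        return std::nullopt;  // no APZC is hit; the events get discarded
      }
      return items.front();   // simplified: real code may skip some items
    }
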
Note that the hit-test uses the last transform provided by the ``SampleForWebRender`` API (see next section), which generally reflects the last composite, and doesn't take into account further changes to the transforms that have occurred since then. In practice, we should be compositing frequently enough that this doesn't matter much.

When debugging hit-test issues, it is often useful to apply the patches on bug 1656260, which introduce a guid on Gecko display items and propagate it all the way through to where APZ gets the hit-test result. This allows answering the question "which nsDisplayCompositorHitTestInfo was responsible for this hit-test result?", which is often a very good first step in solving the bug. From there, one can determine if there was some other display item in front that should have generated an nsDisplayCompositorHitTestInfo but didn't, or if the display item itself had incorrect information. The second patch on that bug further allows exposing hand-written debug info to the APZ code, so that the WR hit-testing mechanism itself can be more effectively debugged, in case there is a problem with the WR display items getting improperly transformed or clipped.

The information returned by WebRender to APZ in response to the hit test is enough for APZ to identify a HitTestingTreeNode as the target of the event. APZ can then take actions such as scrolling the target node's associated APZC, or other appropriate actions (e.g. initiating a scrollbar drag if a scrollbar thumb node was targeted by a mouse-down event).

Sampling
^^^^^^^^

The compositing step needs to read the latest async transforms from APZ in order to ensure scrollframes are rendered at the right position. The API for this is exposed via the ``APZSampler`` class. When WebRender is ready to do a composite, it invokes ``APZSampler::SampleForWebRender``. In here, APZ gathers all async transforms that WebRender needs to know about, including transforms to apply to scrolled content, fixed and sticky content, and scrollbar thumbs.

Along with sampling the APZ transforms, the compositor also triggers APZ animations to advance to the next timestep (usually the next vsync). This happens just before reading the APZ transforms.

Fission Integration
~~~~~~~~~~~~~~~~~~~

This section describes how APZ interacts with the Fission (Site Isolation) project.

Introduction
^^^^^^^^^^^^

Fission is an architectural change motivated by security considerations, where web content from each origin is isolated in its own process. Since a page can contain a mixture of content from different origins (for example, the top level page can be content from origin A, and it can contain an iframe with content from origin B), that means that rendering and interacting with a page can now involve coordination between APZ and multiple content processes.

.. _fission-hit-testing:

Content Process Selection for Input Events
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Input events are initially received in the browser's parent process. With Fission, the browser needs to decide which of possibly several content processes an event is targeting.

Since process boundaries correspond to iframe (subdocument) boundaries, and every (html) document has a root scroll frame, process boundaries are therefore also scroll frame boundaries.

Since APZ already needs a hit test mechanism to be able to determine which scroll frame an event targets, this hit test mechanism was a good fit to also use to determine which content process an event targets.

APZ's hit test was therefore expanded to serve this purpose as well. This mostly required only minor modifications, such as making sure that APZ knows about the root scroll frames of iframes even if they're not scrollable. Since APZ already needs to process all input events to potentially apply :ref:`untransformations <input-event-untransformation>` related to async scrolling, as part of this process it now also labels input events with information identifying which content process they target.

Hit Testing Accuracy
^^^^^^^^^^^^^^^^^^^^

Prior to Fission, APZ's hit test could afford to be somewhat inaccurate, as it could fall back on the dispatch-to-content mechanism to wait for a more accurate answer from the main thread if necessary, suffering a performance cost only (not a correctness cost).

With Fission, an inaccurate compositor hit test now implies a correctness cost, as there is no cross-process main-thread fallback mechanism. (Such a mechanism was considered, but judged to require too much complexity and IPC traffic to be worth it.)

Luckily, with WebRender the compositor has much more detailed information available to use for hit testing than it did with Layers. For example, the compositor can perform accurate hit testing even in the presence of irregular shapes such as rounded corners.

APZ leverages WebRender's more accurate hit-testing ability to select the correct target process (and target scroll frame) for an event in the general case.

One consequence of this is that the dispatch-to-content mechanism is now used less often than before (its primary remaining use is handling ``preventDefault()``).

.. _sending-transforms-to-content-processes:

Sending Transforms To Content Processes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Content processes sometimes need to be able to convert between screen coordinates and their local coordinates. To do this, they need to know about any transforms that their containing iframe and its ancestors are subject to, including async transforms (particularly in cases where the async transforms persist for more than just a few frames).

APZ has information about these transforms in its HitTestingTree. With Fission, APZ periodically sends content processes information about these transforms so that they are kept relatively up to date.

Testing
-------

APZ makes use of several test frameworks to verify the expected behavior is seen.

Mochitest
~~~~~~~~~

The APZ-specific mochitests are useful when specific gestures or events need to be tested with specific content. The APZ mochitests are located in `gfx/layers/apz/test/mochitest`_. To run all of the APZ mochitests, run something like the following:

::

   ./mach mochitest ./gfx/layers/apz/test/mochitest

The APZ mochitests are often organized as subtests that run in a group. For example, the `test_group_hittest-2.html`_ group contains more than 20 subtests like `helper_hittest_overscroll.html`_. When working on a specific subtest, it is often helpful to use the ``apz.subtest`` preference to filter the subtests run down to just the tests you are working on. For example, the following would only run the `helper_hittest_overscroll.html`_ subtest of the `test_group_hittest-2.html`_ group.

::

   ./mach mochitest --setpref apz.subtest=helper_hittest_overscroll.html \
       ./gfx/layers/apz/test/mochitest/test_group_hittest-2.html

For more information on mochitest, see the `Mochitest Documentation`_.

.. _gfx/layers/apz/test/mochitest: https://searchfox.org/mozilla-central/source/gfx/layers/apz/test/mochitest
.. _test_group_hittest-2.html: https://searchfox.org/mozilla-central/source/gfx/layers/apz/test/mochitest/test_group_hittest-2.html
.. _helper_hittest_overscroll.html: https://searchfox.org/mozilla-central/source/gfx/layers/apz/test/mochitest/helper_hittest_overscroll.html
.. _Mochitest Documentation: /testing/mochitest-plain/index.html

GTest
~~~~~

The APZ-specific GTests can be found in `gfx/layers/apz/test/gtest/`_. To run these tests, run something like the following:

::

   ./mach gtest "APZ*"

For more information, see the `GTest Documentation`_.

.. _GTest Documentation: /gtest/index.html
.. _gfx/layers/apz/test/gtest/: https://searchfox.org/mozilla-central/source/gfx/layers/apz/test/gtest/

Reftests
~~~~~~~~

The APZ reftests can be found in `layout/reftests/async-scrolling/`_ and `gfx/layers/apz/test/reftest`_. To run a large portion of the APZ reftests, run something like the following:

::

   ./mach reftest ./layout/reftests/async-scrolling/

Useful information about the reftests can be found in the `Reftest Documentation`_.

There is no defined process for choosing which directory the APZ reftests should be placed in, but in general reftests should exist where other similar tests do.

.. _layout/reftests/async-scrolling/: https://searchfox.org/mozilla-central/source/layout/reftests/async-scrolling/
.. _gfx/layers/apz/test/reftest: https://searchfox.org/mozilla-central/source/gfx/layers/apz/test/reftest/
.. _Reftest Documentation: /layout/Reftest.html

Threading / Locking Overview
----------------------------

Threads
~~~~~~~

There are three threads relevant to APZ: the **controller thread**, the **updater thread**, and the **sampler thread**. This table lists which threads play these roles on each platform / configuration:

===================== ============= ============== =============
APZ Thread Name       Desktop       Desktop+GPU    Android
===================== ============= ============== =============
**controller thread** UI main       GPU main       Java UI
**updater thread**    SceneBuilder  SceneBuilder   SceneBuilder
**sampler thread**    RenderBackend RenderBackend  RenderBackend
===================== ============= ============== =============

Locks
~~~~~

There are also a number of locks used in APZ code:

======================= ==============================
Lock type               How many instances
======================= ==============================
APZ tree lock           one per APZCTreeManager
APZC map lock           one per APZCTreeManager
APZC instance lock      one per AsyncPanZoomController
APZ test lock           one per APZCTreeManager
Checkerboard event lock one per AsyncPanZoomController
======================= ==============================

Thread / Lock Ordering
~~~~~~~~~~~~~~~~~~~~~~

To avoid deadlocks, the threads and locks have a global **ordering** which must be respected.

Respecting the ordering means the following:

- Let "A < B" denote that A occurs earlier than B in the ordering
- Thread T may only acquire lock L, if T < L
- A thread may only acquire lock L2 while holding lock L1, if L1 < L2
- A thread may only block on a response from another thread T while holding a lock L, if L < T

**The lock ordering is as follows**:

1. UI main
2. GPU main (only if GPU process enabled)
3. Compositor thread
4. SceneBuilder thread
5. **APZ tree lock**
6. RenderBackend thread
7. **APZC map lock**
8. **APZC instance lock**
9. **APZ test lock**
10. **Checkerboard event lock**

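For instance, a thread updating the APZC map must take the tree lock before the map lock, never the reverse. A minimal sketch using plain std::mutex for illustration (the real code uses Mozilla's own lock types):

::

    #include <mutex>

    std::mutex gTreeLock;  // APZ tree lock (5th in the ordering)
    std::mutex gMapLock;   // APZC map lock (7th in the ordering)

    void UpdateApzcMap() {
      // Acquire locks in ascending order; taking the tree lock while
      // already holding the map lock would risk a deadlock.
      std::lock_guard<std::mutex> tree(gTreeLock);
      std::lock_guard<std::mutex> map(gMapLock);
      // ... mutate the map of APZCs ...
    }
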
Example workflows
^^^^^^^^^^^^^^^^^

Here are some example APZ workflows. Observe how they all obey the global thread/lock ordering. Feel free to add others:

- **Input handling** (with GPU process): UI main -> GPU main -> APZ tree lock -> RenderBackend thread
- **Sync messages** in ``PCompositorBridge.ipdl``: UI main thread -> Compositor thread
- **GetAPZTestData**: Compositor thread -> SceneBuilder thread -> test lock
- **Scene swap**: SceneBuilder thread -> APZ tree lock -> RenderBackend thread
- **Updating hit-testing tree**: SceneBuilder thread -> APZ tree lock -> APZC instance lock
- **Updating APZC map**: SceneBuilder thread -> APZ tree lock -> APZC map lock
- **Sampling and animation deferred tasks** [1]_: RenderBackend thread -> APZC map lock -> APZC instance lock
- **Advancing animations**: RenderBackend thread -> APZC instance lock

.. [1] It looks like there are two deferred tasks that actually need the tree lock,
   ``AsyncPanZoomController::HandleSmoothScrollOverscroll`` and
   ``AsyncPanZoomController::HandleFlingOverscroll``. We should be able to rewrite
   these to use the map lock instead of the tree lock.
   This will allow us to continue running the deferred tasks on the sampler
   thread rather than having to bounce them to another thread.

diff --git a/gfx/docs/AsyncPanZoomArchitecture.png b/gfx/docs/AsyncPanZoomArchitecture.png
new file mode 100644
index 0000000000..d19dcb7c8b
Binary files differ

diff --git a/gfx/docs/GraphicsOverview.rst b/gfx/docs/GraphicsOverview.rst
new file mode 100644
index 0000000000..a37bc255ff
--- /dev/null
+++ b/gfx/docs/GraphicsOverview.rst
@@ -0,0 +1,149 @@

Graphics Overview
=================

Work in progress. Possibly incorrect or incomplete.
---------------------------------------------------

Jargon
------

There's a lot of jargon in the graphics stack. We try to maintain a list of common words and acronyms `here <https://wiki.mozilla.org/Platform/GFX/Jargon>`__.

Overview
--------

The graphics system is responsible for rendering (painting, drawing) the frame tree (rendering tree) elements as created by the layout system. Each leaf in the tree has content, bounded by a rectangle (or perhaps another shape, in the case of SVG).

The simple approach for producing the result would thus involve traversing the frame tree, in a correct order, drawing each frame into the resulting buffer and displaying (printing notwithstanding) that buffer when the traversal is done.

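Sketched as code (illustrative only, not Gecko's actual painting code), the simple approach amounts to a recursive traversal:

::

    #include <vector>

    // Illustrative sketch; Gecko's painting code is far more involved.
    struct DrawTarget {};

    struct Frame {
      std::vector<Frame*> children;  // kept in back-to-front order
      void DrawSelf(DrawTarget& dt) const { /* paint this frame's content */ }
    };

    // The "simple approach": recursively draw the frame tree, back to
    // front, into a single buffer.
    void PaintSubtree(const Frame& frame, DrawTarget& dt) {
      frame.DrawSelf(dt);          // this frame's own content first...
      for (const Frame* child : frame.children) {
        PaintSubtree(*child, dt);  // ...then everything on top of it
      }
    }
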
It is worth spending some time on the “correct order” note above. If there are no overlapping frames, this is fairly simple - any order will do, as long as there is no background. If there is background, we just have to worry about drawing that first. Since we do not control the content, chances are the page is more complicated. There are overlapping frames, likely with transparency, so we need to make sure the elements are drawn “back to front”, in layers, so to speak. Layers are an important concept, and we will revisit them shortly, as they are central to fixing a major issue with the above simple approach.

While the above simple approach will work, the performance will suffer. Each time anything changes in any of the frames, the complete process needs to be repeated, everything needs to be redrawn. Further, there is very little space to take advantage of modern graphics (GPU) hardware, or multi-core computers. If you recall from the previous sections, the frame tree is only accessible from the UI thread, so while we’re doing all this work, the UI is basically blocked.

(Retained) Layers
~~~~~~~~~~~~~~~~~

The layers framework was introduced to address the above performance issues, by having a part of the design address each item. At the high level:

1. We create a layer tree. The leaf elements of the tree contain all
   frames (possibly multiple frames per leaf).
2. We render each layer tree element and cache (retain) the result.
3. We composite (combine) all the leaf elements into the final result.

Let’s examine each of these steps, in reverse order.

Compositing
~~~~~~~~~~~

We use the term composite as it implies that the order is important. If the elements being composited overlap, whether there is transparency involved or not, the order in which they are combined will affect the result. Compositing is where we can use some of the power of the modern graphics hardware. It is optimal for doing this job. In the scenarios where only the position of individual frames changes, without the content inside them changing, we see why caching each layer would be advantageous - we only need to repeat the final compositing step, completely skipping the layer tree creation and the rendering of each leaf, thus speeding up the process considerably.

Another benefit is equally apparent in the context of the stated deficiencies of the simple approach. We can use the available graphics hardware accelerated APIs to do the compositing step. Direct3D and OpenGL can be used on different platforms and are well suited to accelerate this step.

Finally, we can now envision performing the compositing step on a separate thread, unblocking the UI thread for other work, and doing more work in parallel. More on this below.

It is important to note that the number of operations in this step is proportional to the number of layer tree (leaf) elements, so there is additional work and complexity involved when the layer tree is large.

Render and retain layer elements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As we saw, the compositing step benefits from caching the intermediate result. This does result in extra memory usage, so it needs to be considered during the layer tree creation. Beyond the caching, we can accelerate the rendering of each element by (indirectly) using the available platform APIs (e.g., Direct2D, CoreGraphics, even some of the 3D APIs like OpenGL or Direct3D) as available. This is actually done through a platform-independent API (see Moz2D below), but it is important to realize it does get accelerated appropriately.

Creating the layer tree
~~~~~~~~~~~~~~~~~~~~~~~

We need to create a layer tree (from the frame tree), which will give us the correct result while striking the right balance between a layer per frame element and a single layer for the complete frame tree. As was mentioned above, there is an overhead in traversing the whole tree and caching each of the elements, balanced by the performance improvements. Some of the performance improvements are only noticed when something changes (e.g., one element is moving, so we only need to redo the compositing step).

Refresh Driver
~~~~~~~~~~~~~~

Layers
~~~~~~

Rendering each layer
~~~~~~~~~~~~~~~~~~~~

Tiling vs. Buffer Rotation vs. Full paint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Compositing for the final result
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Graphics API
~~~~~~~~~~~~

Compositing
~~~~~~~~~~~

Image Decoding
~~~~~~~~~~~~~~

Image Animation
~~~~~~~~~~~~~~~

`Historical Documents <http://www.youtube.com/watch?v=lLZQz26-kms>`__
---------------------------------------------------------------------

A number of posts and blogs that will give you more details or more background, or reasoning that led to different solutions and approaches.

- 2010-01 `Layers: Cross Platform Acceleration <http://www.basschouten.com/blog1.php/layers-cross-platform-acceleration>`__
- 2010-04 `Layers <http://robert.ocallahan.org/2010/04/layers_01.html>`__
- 2010-07 `Retained Layers <http://robert.ocallahan.org/2010/07/retained-layers_16.html>`__
- 2011-04 `Introduction <https://web.archive.org/web/20140604005804/https://blog.mozilla.org/joe/2011/04/26/introducing-the-azure-project/>`__
- 2011-07 `Shadow Layers <http://chrislord.net/index.php/2011/07/25/shadow-layers-and-learning-by-failing/>`__
- 2011-09 `Graphics API Design <http://robert.ocallahan.org/2011/09/graphics-api-design.html>`__
- 2012-04 `Moz2D Canvas on OSX <http://muizelaar.blogspot.ca/2012/04/azure-canvas-on-os-x.html>`__
- 2012-05 `Mask Layers <http://featherweightmusings.blogspot.co.uk/2012/05/mask-layers_26.html>`__
- 2013-07 `Graphics related <http://www.basschouten.com/blog1.php>`__

diff --git a/gfx/docs/LayersHistory.rst b/gfx/docs/LayersHistory.rst
new file mode 100644
index 0000000000..5ab2e2a821
--- /dev/null
+++ b/gfx/docs/LayersHistory.rst
@@ -0,0 +1,63 @@

Layers History
==============

This is an overview of the major events in the history of our Layers infrastructure.

- iPhone released in July 2007 (built on a toolkit called LayerKit)

- Core Animation (October 2007) LayerKit was publicly renamed to Core
  Animation and shipped as part of OS X 10.5

- WebKit CSS 3D transforms (July 2009)

- Original layers API (March 2010) Introduced the idea of a layer
  manager that would composite. One of the first use cases for this was
  hardware accelerated YUV conversion for video.

- Retained layers (July 7 2010 - Bug 564991) This was an important
  concept that introduced the idea of persisting the layer content
  across paints in Gecko-controlled buffers instead of just by the OS.
  This introduced the concept of buffer rotation to deal with scrolling
  instead of using native scrolling APIs like ScrollWindowEx.

- Layers IPC (July 2010 - Bug 570294) This introduced shadow layers and
  edit lists and was originally done for e10s v1.

- 3D transforms (September 2011 - Bug 505115)

- OMTC (December 2011 - Bug 711168) This was prototyped on OS X but
  shipped first for Fennec.

- Tiling v1 (April 2012 - Bug 739679) Originally done for Fennec. This
  was done to avoid situations where we had to do a bunch of work for
  scrolling a small amount (i.e. buffer rotation). It allowed us to have
  a variety of interesting features like progressive painting and lower
  resolution painting.

- C++ async pan zoom controller (July 2012 - Bug 750974) The existing
  APZ code was in Java for Fennec, so this was reimplemented in C++.

- Streaming WebGL Buffers (February 2013 - Bug 716859) Infrastructure
  to allow OMTC WebGL and avoid the need to glFinish() every frame.

- Compositor API (April 2013 - Bug 825928) The planning for this
  started around November 2012. Layers refactoring created a compositor
  API that abstracted away the differences between D3D and OpenGL.
  The main piece of API is DrawQuad.

- Tiling v2 (Mar 7 2014 - Bug 963073) Tiling for B2G. This work is
  mainly porting tiled layers to new textures, implementing
  double-buffered tiles and implementing a texture client pool, to be
  used by tiled content clients.

  A large motivation for the pool was the very slow performance of
  allocating tiles because of the sync messages to the compositor.

  The slow performance of allocating was directly addressed by bug 959089,
  which allowed us to allocate gralloc buffers without sync messages to
  the compositor thread.

- B2G WebGL performance (May 2014 - Bugs 1006957, 1001417, 1024144) This
  work improved the synchronization mechanism between the compositor
  and the producer.

diff --git a/gfx/docs/Moz2D.rst b/gfx/docs/Moz2D.rst
new file mode 100644
index 0000000000..0be251a209
--- /dev/null
+++ b/gfx/docs/Moz2D.rst
@@ -0,0 +1,16 @@

Moz2D
=====

The ``gfx/2d`` directory contains our abstraction of a typical 2D API (similar to the HTML Canvas API). It has different backends used for different purposes. Direct2D is used for implementing hardware accelerated canvas on Windows. Skia is used for any software drawing needs, and Cairo is used for printing.

Previously, Moz2D aimed to be buildable independently from the rest of Gecko, but we've slipped from this because C++/Gecko don't have a good mechanism for modularization/dependencies. That being said, we still try to keep the coupling with the rest of Gecko low for hygiene, simplicity and perhaps a more modular future.

See also the `Moz2D documentation on wiki <https://wiki.mozilla.org/Platform/GFX/Moz2D>`__.

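As a rough sketch of what the API looks like in use (exact signatures and types vary across Gecko versions, so treat this as illustrative rather than authoritative):

::

    #include "mozilla/gfx/2D.h"

    using namespace mozilla::gfx;

    // Illustrative only: create a software (Skia) draw target and fill a
    // rectangle with a solid color, much like an HTML canvas would.
    void DrawExample() {
      RefPtr<DrawTarget> dt = Factory::CreateDrawTarget(
          BackendType::SKIA, IntSize(256, 256), SurfaceFormat::B8G8R8A8);
      if (!dt) {
        return;  // requested backend not available
      }
      dt->FillRect(Rect(16, 16, 128, 128),
                   ColorPattern(DeviceColor(0.0f, 0.5f, 1.0f, 1.0f)));
      dt->Flush();
    }
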
The +DOM is converted into a hierarchical Frame Tree, which nests visual +elements (boxes). Each element points to some node in a Style Tree +that describes what it should look like -- color, transparency, etc. +The result is that now we know exactly what to render where, what goes +on top of what (layering and blending) and at what pixel coordinate. +This is the Display List. + +The Display List is a light-weight data structure because it's shallow +-- it mostly points back to the Frame Tree. There are two problems +with this. First, we want to cross process boundaries at this point. +Everything up until now happens in a Content Process (of which there are +several). Actual GPU rendering happens in a GPU Process (on some +platforms). Second, everything up until now was written in C++; but +WebRender is written in Rust. Thus the shallow Display List needs to +be serialized in a completely self-contained binary blob that will +survive Interprocess Communication (IPC) and a language switch (C++ to +Rust). The result is the WebRender Display List. + +WebRender +~~~~~~~~~ + +The GPU process receives the WebRender Display List blob and +de-serializes it into a Scene. This Scene contains more than the +strictly visible elements; for example, to anticipate scrolling, we +might have several paragraphs of text extending past the visible page. + +For a given viewport, the Scene gets culled and stripped down to a +Frame. This is also where we start preparing data structures for GPU +rendering, for example getting some font glyphs into an atlas for +rasterizing text. + +The final step takes the Frame and submits commands to the GPU to +actually render it. The GPU will execute the commands and composite +the final page. + +Software +~~~~~~~~ + +The above is the new WebRender-enabled way to do things. But in the +schematic you'll note a second branch towards the bottom: this is the +legacy code path which does not use WebRender (nor Rust). In this +case, the Display List is converted into a Layer Tree. The purpose of +this Tree is to try and avoid having to re-render absolutely +everything when the page needs to be refreshed. For example, when +scrolling we should be able to redraw the page by mostly shifting +things around. However that requires those 'things' to still be around +from last time we drew the page. In other words, visual elements that +are likely to be static and reusable need to be drawn into their own +private "page" (a cache). Then we can recombine (composite) all of +these when redrawing the actual page. + +Figuring out which elements would be good candidates for this, and +striking a balance between good performance versus excessive memory +use, is the purpose of the Layer Tree. Each 'layer' is a cached image +of some element(s). This logic also takes occlusion into account, eg. +don't allocate and render a layer for elements that are known to be +completely obscured by something in front of them. + +Redrawing the page by combining the Layer Tree with any newly +rasterized elements is the job of the Compositor. + + +Even when a layer cannot be reused in its entirety, it is likely +that only a small part of it was invalidated. Thus there is an +elaborate system for tracking dirty rectangles, starting an update by +copying the area that can be salvaged, and then redrawing only what +cannot. + +In fact, this idea can be extended to delta-tracking of display lists +themselves. 
+Traversing the layout tree and building a display list is also not
+cheap, so the code tries to partially invalidate and rebuild the
+display list incrementally when possible. In fact, this optimization
+is used both with and without WebRender.
+
+Asynchronous Panning And Zooming
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Earlier we mentioned that a Scene might contain more elements than are
+strictly necessary for rendering what's visible (the Frame). The
+reason for that is Asynchronous Panning and Zooming, or APZ for short.
+The browser will feel much more responsive if scrolling & zooming can
+short-circuit all of these data transformations and IPC boundaries,
+and instead directly update an offset of some layer and recomposite.
+(Think of late-latching in a VR context.)
+
+This simple idea introduces a lot of complexity: how much extra do you
+rasterize, and in which direction? How much memory can we afford?
+What about JavaScript that responds to scroll events and perhaps does
+something 'interesting' with the page in return? What about nested
+frames or nested scrollbars? What if we scroll so much that we go
+past the boundaries of the Scene that we know about?
+
+See AsyncPanZoom.rst for all that and more.
+
+A Few More Details
+~~~~~~~~~~~~~~~~~~
+
+Here's another schematic which basically repeats the previous one, but
+showing a little bit more detail. Note that the direction is reversed
+-- the data flow starts at the right. Sorry about that :)
+
+.. image:: RenderingOverviewDetail.png
+   :width: 100%
+
+Some things to note:
+
+- there are multiple content processes, currently four of them. This is
+  for security reasons (sandboxing), stability (crash isolation) and
+  performance (multi-core machines);
+- ideally each "webpage" would run in its own process for security;
+  this is being developed under the term 'fission';
+- there is only a single GPU process, if there is one at all;
+  some platforms have it as part of the Parent;
+- not shown here is the Extension process that isolates WebExtensions;
+- for non-WebRender, rasterization happens in the Content Process, and
+  we send entire Layers to the GPU/Compositor process (via shared
+  memory, only using actual IPC for its metadata like width & height);
+- if the GPU process crashes (a bug or a driver issue) we can simply
+  restart it, resend the display list, and the browser itself doesn't crash;
+- the browser UI is just another set of DOM+JS, albeit one that runs
+  with elevated privileges. That is, its JS can do things that
+  normal JS cannot. It lives in the Parent Process, which then uses
+  IPC to get it rendered, same as regular Content. (the IPC arrow also
+  goes to WebRender Display List but is omitted to reduce clutter);
+- UI events get routed to APZ first, to minimize latency. By running
+  inside the GPU process, we may have access to data such
+  as rasterized clipping masks that enable finer-grained hit testing;
+- the GPU process talks back to the content process; in particular,
+  when APZ scrolls out of bounds, it asks Content to enlarge/shift the
+  Scene with a new "display port";
+- we still use the GPU when we can for compositing, even in the
+  non-WebRender case.
+
+WebRender In Detail
+-------------------
+
+Converting a display list into GPU commands is broken down into a
+number of steps and intermediate data structures.
+
+.. image:: RenderingOverviewTrees.png
+   :width: 75%
+   :align: center
+
+..
+
+   *Each element in the picture tree points to exactly one node in the spatial
+   tree.
Only a few of these links are shown for clarity (the dashed lines).* + +The Picture Tree +~~~~~~~~~~~~~~~~ + +The incoming display list uses "stacking contexts". For example, to +render some text with a drop shadow, a display list will contain three +items: + +- "enable shadow" with some parameters such as shadow color, blur size, and offset; +- the text item; +- "pop all shadows" to deactivate shadows; + +WebRender will break this down into two distinct elements, or +"pictures". The first represents the shadow, so it contains a copy of the +text item, but modified to use the shadow's color, and to shift the +text by the shadow's offset. The second picture contains the original text +to draw on top of the shadow. + +The fact that the first picture, the shadow, needs to be blurred, is a +"compositing" property of the picture which we'll deal with later. + +Thus, the stack-based display list gets converted into a list of pictures +-- or more generally, a hierarchy of pictures, since items are nested +as per the original HTML. + +Example visual elements are a TextRun, a LineDecoration, or an Image +(like a .png file). + +Compared to 3D rendering, the picture tree is similar to a scenegraph: it's a +parent/child hierarchy of all the drawable elements that make up the "scene", in +this case the webpage. One important difference is that the transformations are +stored in a separate tree, the spatial tree. + +The Spatial Tree +~~~~~~~~~~~~~~~~ + +The nodes in the spatial tree represent coordinate transforms. Every time the +DOM hierarchy needs child elements to be transformed relative to their parent, +we add a new Spatial Node to the tree. All those child elements will then point +to this node as their "local space" reference (aka coordinate frame). In +traditional 3D terms, it's a scenegraph but only containing transform nodes. + +The nodes are called frames, as in "coordinate frame": + +- a Reference Frame corresponds to a ``<div>``; +- a Scrolling Frame corresponds to a scrollable part of the page; +- a Sticky Frame corresponds to some fixed position CSS style. + +Each element in the picture tree then points to a spatial node inside this tree, +so by walking up and down the tree we can find the absolute position of where +each element should render (traversing down) and how large each element needs to +be (traversing up). Originally the transform information was part of the +picture tree, as in a traditional scenegraph, but visual elements and their +transforms were split apart for technical reasons. + +Some of these nodes are dynamic. A scroll-frame can obviously scroll, but a +Reference Frame might also use a property binding to enable a live link with +JavaScript, for dynamic updates of (currently) the transform and opacity. + +Axis-aligned transformations (scales and translations) are considered "simple", +and are conceptually combined into a single "CoordinateSystem". When we +encounter a non-axis-aligned transform, we start a new CoordinateSystem. We +start in CoordinateSystem 0 at the root, and would bump this to CoordinateSystem +1 when we encounter a Reference Frame with a rotation or 3D transform, for +example. This would then be the CoordinateSystem index for all its children, +until we run into another (nested) non-simple transform, and so on. Roughly +speaking, as long as we're in the same CoordinateSystem, the transform stack is +simple enough that we have a reasonable chance of being able to flatten it. 
That +lets us directly rasterize text at its final scale for example, optimizing +away some of the intermediate pictures (offscreen textures). + +The layout code positions elements relative to their parent. Thus to position +the element on the actual page, we need to walk the Spatial Tree all the way to +the root and apply each transform; the result is a ``LayoutToWorldTransform``. + +One final step transforms from World to Device coordinates, which deals with +DPI scaling and such. + +.. csv-table:: + :header: "WebRender term", "Rough analogy" + + Spatial Tree, Scenegraph -- transforms only + Picture Tree, Scenegraph -- drawables only (grouping) + Spatial Tree Rootnode, World Space + Layout space, Local/Object Space + Picture, RenderTarget (sort of; see RenderTask below) + Layout-To-World transform, Local-To-World transform + World-To-Device transform, World-To-Clipspace transform + + +The Clip Tree +~~~~~~~~~~~~~ + +Finally, we also have a Clip Tree, which contains Clip Shapes. For +example, a rounded corner div will produce a clip shape, and since +divs can be nested, you end up with another tree. By pointing at a Clip Shape, +visual elements will be clipped against this shape plus all parent shapes above it +in the Clip Tree. + +As with CoordinateSystems, a chain of simple 2D clip shapes can be collapsed +into something that can be handled in the vertex shader, at very little extra +cost. More complex clips must be rasterized into a mask first, which we then +sample from to ``discard`` in the pixel shader as needed. + +In summary, at the end of scene building the display list turned into +a picture tree, plus a spatial tree that tells us what goes where +relative to what, plus a clip tree. + +RenderTask Tree +~~~~~~~~~~~~~~~ + +Now in a perfect world we could simply traverse the picture tree and start +drawing things: one drawcall per picture to render its contents, plus one +drawcall to draw the picture into its parent. However, recall that the first +picture in our example is a "text shadow" that needs to be blurred. We can't +just rasterize blurry text directly, so we need a number of steps or "render +passes" to get the intended effect: + +.. image:: RenderingOverviewBlurTask.png + :align: right + :height: 400px + +- rasterize the text into an offscreen rendertarget; +- apply one or more downscaling passes until the blur radius is reasonable; +- apply a horizontal Gaussian blur; +- apply a vertical Gaussian blur; +- use the result as an input for whatever comes next, or blit it to + its final position on the page (or more generally, on the containing + parent surface/picture). + +In the general case, which passes we need and how many of them depends +on how the picture is supposed to be composited (CSS filters, SVG +filters, effects) and its parameters (very large vs. small blur +radius, say). + +Thus, we walk the picture tree and build a render task tree: each high +level abstraction like "blur me" gets broken down into the necessary +render passes to get the effect. The result is again a tree because a +render pass can have multiple input dependencies (eg. blending). + +(Cfr. games, this has echoes of the Frostbite Framegraph in that it +dynamically builds up a renderpass DAG and dynamically allocates storage +for the outputs). + +If there are complicated clip shapes that need to be rasterized first, +so their output can be sampled as a texture for clip/discard +operations, that would also end up in this tree as a dependency... (I think?). 
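+
+To make the blur example concrete, here is a hedged sketch of building
+such a pass chain. This is illustrative only: WebRender's real render
+task code is written in Rust and carries far more state, and the C++
+names below are invented for this document::
+
+    #include <vector>
+
+    enum class TaskKind { Rasterize, Downscale, HBlur, VBlur };
+
+    struct RenderTask {
+      TaskKind kind;
+      float stdDev;  // blur radius still to be applied at this stage
+      int input;     // index of the task whose output we consume (-1: none)
+    };
+
+    // Break "blur this picture by stdDev" into concrete passes. Each
+    // downscale halves the resolution, which halves the effective blur
+    // radius, until a single shader kernel can handle what remains.
+    std::vector<RenderTask> BuildBlurTasks(float stdDev, float maxRadius) {
+      std::vector<RenderTask> tasks;
+      tasks.push_back({TaskKind::Rasterize, 0.0f, -1});
+      int prev = 0;
+      while (stdDev > maxRadius) {
+        tasks.push_back({TaskKind::Downscale, stdDev, prev});
+        prev = static_cast<int>(tasks.size()) - 1;
+        stdDev *= 0.5f;
+      }
+      tasks.push_back({TaskKind::HBlur, stdDev, prev});
+      tasks.push_back({TaskKind::VBlur, stdDev,
+                       static_cast<int>(tasks.size()) - 1});
+      return tasks;
+    }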
+
+Once we have the entire tree of dependencies, we analyze it to see
+which tasks can be combined into a single pass for efficiency. We
+ping-pong rendertargets when we can, but sometimes the dependencies
+cut across more than one level of the rendertask tree, and some
+copying is necessary.
+
+Once we've figured out the passes and allocated storage for anything
+we wish to persist in the texture cache, we finally start rendering.
+
+When rasterizing the elements into the Picture's offscreen texture, we'd
+position them by walking the transform hierarchy as far up as the picture's
+transform node, resulting in a ``Layout To Picture`` transform. The picture
+would then go onto the page using a ``Picture To World`` coordinate transform.
+
+Caching
+```````
+
+Just as with layers in the software rasterizer, it is not always necessary to
+redraw absolutely everything when parts of a document change. The WebRender
+equivalent of layers is Slices -- a grouping of pictures that are expected to
+render and update together. Slices are automatically created based on
+heuristics and layout hints/flags.
+
+Implementation-wise, slices reuse a lot of the existing machinery for
+Pictures; in fact they're implemented as a "virtual picture" of sorts. The
+similarities make sense: both need to allocate offscreen textures in a cache,
+both will position and render all their children into it, and both then draw
+themselves into their parent as part of the parent's draw.
+
+If a slice isn't expected to change much, we give it a TileCacheInstance. It
+is itself made up of Tiles, where each tile will track what's in it, what's
+changing, and whether it needs to be invalidated and redrawn as a result.
+Thus the "damage" from changes can be localized to single tiles, while we
+salvage the rest of the cache. If tiles keep seeing a lot of invalidations,
+they will recursively divide themselves in a quad-tree-like structure to try
+and localize the invalidations. (And conversely, they'll recombine children
+if nothing is invalidating them "for a while".)
+
+Interning
+`````````
+
+To spot invalidated tiles, we need a fast way to compare a tile's contents
+from the previous frame with the current frame. To speed this up, we use
+interning; similar to string interning, this means that each ``TextRun``,
+``Decoration``, ``Image`` and so on is registered in a repository (a
+``DataStore``) and consequently referred to by its unique ID. Cache contents
+can then be encoded as a list of IDs (one such list per internable element
+type). Diffing is then just a fast list comparison.
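+
+As a sketch of the idea (hypothetical C++ stand-ins; the real
+``DataStore`` is a Rust structure inside WebRender)::
+
+    #include <cstdint>
+    #include <unordered_map>
+    #include <vector>
+
+    using ItemId = uint32_t;
+
+    // One repository per internable item type (TextRun, Image, ...).
+    // Item must be hashable and equality-comparable.
+    template <typename Item>
+    class DataStore {
+     public:
+      // Return a stable ID for the item, registering it if new.
+      ItemId Intern(const Item& aItem) {
+        auto it = mIds.find(aItem);
+        if (it != mIds.end()) {
+          return it->second;
+        }
+        ItemId id = static_cast<ItemId>(mItems.size());
+        mItems.push_back(aItem);
+        mIds.emplace(aItem, id);
+        return id;
+      }
+
+     private:
+      std::vector<Item> mItems;
+      std::unordered_map<Item, ItemId> mIds;
+    };
+
+    // With interned contents, diffing a tile between two frames is
+    // just comparing two small lists of IDs.
+    bool TileChanged(const std::vector<ItemId>& aPrev,
+                     const std::vector<ItemId>& aCurr) {
+      return aPrev != aCurr;
+    }
+
+Callbacks
+`````````
+GPU text rendering assumes that the individual font glyphs are already
+available in a texture atlas. Likewise, SVG is not rendered on the
+GPU. Both inputs are prepared during scene building: glyph
+rasterization via a thread pool from within Rust itself, and SVG via
+opaque callbacks (back to C++) that produce blobs.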
diff --git a/gfx/docs/RenderingOverviewBlurTask.png b/gfx/docs/RenderingOverviewBlurTask.png
Binary files differ
new file mode 100644
index 0000000000..baffc08f32
--- /dev/null
+++ b/gfx/docs/RenderingOverviewBlurTask.png
diff --git a/gfx/docs/RenderingOverviewDetail.png b/gfx/docs/RenderingOverviewDetail.png
Binary files differ
new file mode 100644
index 0000000000..2909a811e4
--- /dev/null
+++ b/gfx/docs/RenderingOverviewDetail.png
diff --git a/gfx/docs/RenderingOverviewSimple.png b/gfx/docs/RenderingOverviewSimple.png
Binary files differ
new file mode 100644
index 0000000000..43c0a59439
--- /dev/null
+++ b/gfx/docs/RenderingOverviewSimple.png
diff --git a/gfx/docs/RenderingOverviewTrees.png b/gfx/docs/RenderingOverviewTrees.png
Binary files differ
new file mode 100644
index 0000000000..ffdf0812fa
--- /dev/null
+++ b/gfx/docs/RenderingOverviewTrees.png
diff --git a/gfx/docs/Silk.rst b/gfx/docs/Silk.rst
new file mode 100644
index 0000000000..16e4cdfc7b
--- /dev/null
+++ b/gfx/docs/Silk.rst
@@ -0,0 +1,472 @@
+Silk Overview
+=============
+
+.. image:: SilkArchitecture.png
+
+Architecture
+------------
+
+Our current architecture is to align three components to hardware vsync
+timers:
+
+1. Compositor
+2. RefreshDriver / Painting
+3. Input Events
+
+The flow of our rendering engine is as follows:
+
+1. A hardware vsync event occurs on an OS-specific *Hardware Vsync
+   Thread* on a per-monitor basis.
+2. The *Hardware Vsync Thread* attached to the monitor notifies the
+   ``CompositorVsyncDispatchers`` and ``VsyncDispatcher``.
+3. For every Firefox window on the specific monitor, notify a
+   ``CompositorVsyncDispatcher``. The ``CompositorVsyncDispatcher`` is
+   specific to one window.
+4. The ``CompositorVsyncDispatcher`` notifies a
+   ``CompositorWidgetVsyncObserver`` when remote compositing, or a
+   ``CompositorVsyncScheduler::Observer`` when compositing in-process.
+5. If remote compositing, a vsync notification is sent from the
+   ``CompositorWidgetVsyncObserver`` to the ``VsyncBridgeChild`` on the
+   UI process, which sends an IPDL message to the ``VsyncBridgeParent``
+   on the compositor thread of the GPU process, which then dispatches to
+   ``CompositorVsyncScheduler::Observer``.
+6. The ``VsyncDispatcher`` notifies the Chrome
+   ``RefreshTimer`` that a vsync has occurred.
+7. The ``VsyncDispatcher`` sends IPC messages to all content
+   processes to tick their respective active ``RefreshTimer``.
+8. The ``Compositor`` dispatches input events on the *Compositor
+   Thread*, then composites. Input events are only dispatched on the
+   *Compositor Thread* on b2g.
+9. The ``RefreshDriver`` paints on the *Main Thread*.
+
+Hardware Vsync
+--------------
+
+Hardware vsync events from (1) occur on a specific ``Display`` object.
+The ``Display`` object is responsible for enabling / disabling vsync
+on a per-connected-display basis. For example, if two monitors are
+connected, two ``Display`` objects will be created, each listening to
+vsync events for their respective displays. We require one ``Display``
+object per monitor, as each monitor may have a different vsync rate.
+As a fallback solution, we have one global ``Display`` object that can
+synchronize across all connected displays. The global ``Display`` is
+useful if a window is positioned halfway between the two monitors. Each
+platform will have to implement a specific ``Display`` object to hook
+and listen to vsync events.
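+
+Conceptually, a ``Display`` boils down to an observer list keyed by
+monitor. A hedged sketch with illustrative names only (not the actual
+Gecko classes)::
+
+    #include <vector>
+
+    // Stand-in for whatever wants per-monitor vsync notifications.
+    class VsyncObserverStub {
+     public:
+      virtual ~VsyncObserverStub() = default;
+      // Called on the Hardware Vsync Thread with the vsync timestamp.
+      virtual void NotifyVsync(double aTimestampMs) = 0;
+    };
+
+    class Display {
+     public:
+      void AddObserver(VsyncObserverStub* aObserver) {
+        mObservers.push_back(aObserver);
+      }
+      // The platform backend (a CVDisplayLink callback, a DwmFlush()
+      // thread, ...) calls this once per hardware vsync of this
+      // monitor.
+      void OnHardwareVsync(double aTimestampMs) {
+        for (auto* observer : mObservers) {
+          observer->NotifyVsync(aTimestampMs);
+        }
+      }
+
+     private:
+      std::vector<VsyncObserverStub*> mObservers;
+    };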
+As of this writing, both Firefox OS and OS X create their own
+hardware-specific *Hardware Vsync Thread* that executes after a vsync
+has occurred. OS X creates one *Hardware Vsync Thread* per
+``CVDisplayLinkRef``. We do not currently support multiple displays, so
+we use one global ``CVDisplayLinkRef`` that works across all active
+displays. On Windows, we have to create a new platform thread that
+waits for DwmFlush(), which works across all active displays. Once the
+thread wakes up from DwmFlush(), the vsync timestamp is retrieved from
+DwmGetCompositionTimingInfo(); that timestamp is what actually gets
+passed to the compositor and refresh driver.
+
+When a vsync occurs on a ``Display``, the *Hardware Vsync Thread*
+callback fetches all ``CompositorVsyncDispatchers`` associated with the
+``Display``. Each ``CompositorVsyncDispatcher`` is notified that a vsync
+has occurred with the vsync’s timestamp. It is the responsibility of the
+``CompositorVsyncDispatcher`` to notify the ``Compositor`` that is
+awaiting vsync notifications. The ``Display`` will then notify the
+associated ``VsyncDispatcher``, which should notify all
+active ``RefreshDrivers`` to tick.
+
+All ``Display`` objects are encapsulated in a ``VsyncSource`` object.
+The ``VsyncSource`` object lives in ``gfxPlatform`` and is instantiated
+only on the parent process when ``gfxPlatform`` is created. The
+``VsyncSource`` is destroyed when ``gfxPlatform`` is destroyed. It can
+also be destroyed when the layout frame rate pref (or another pref that
+influences the frame rate) is changed. This may mean we switch from
+hardware to software vsync (or vice versa) at runtime. During the
+switch, there may briefly be two vsync sources. Otherwise, there is
+only one ``VsyncSource`` object throughout the entire lifetime of
+Firefox. Each platform is expected to implement its own ``VsyncSource``
+to manage vsync events. On OS X, this is through ``CVDisplayLinkRef``.
+On Windows, it should be through ``DwmGetCompositionTimingInfo``.
+
+Compositor
+----------
+
+When the ``CompositorVsyncDispatcher`` is notified of the vsync event,
+the ``CompositorVsyncScheduler::Observer`` associated with the
+``CompositorVsyncDispatcher`` begins execution. Since the
+``CompositorVsyncDispatcher`` executes on the *Hardware Vsync Thread*
+and the ``Compositor`` composites on the ``CompositorThread``, the
+``CompositorVsyncScheduler::Observer`` posts a task to the
+``CompositorThread``. The ``CompositorBridgeParent`` then composites.
+This model -- the ``CompositorVsyncDispatcher`` notifies components on
+the *Hardware Vsync Thread*, and each component schedules a task on its
+appropriate thread -- is used everywhere.
+
+The ``CompositorVsyncScheduler::Observer`` listens to vsync events as
+needed and stops listening to vsync when composites are no longer
+scheduled or required. Every ``CompositorBridgeParent`` is associated
+and tied to one ``CompositorVsyncScheduler::Observer``, which is
+associated with the ``CompositorVsyncDispatcher``. Each
+``CompositorBridgeParent`` is associated with one widget and is created
+when a new platform window or ``nsBaseWidget`` is created. The
+``CompositorBridgeParent``, ``CompositorVsyncDispatcher``,
+``CompositorVsyncScheduler::Observer``, and ``nsBaseWidget`` all have
+the same lifetime; they are created and destroyed together.
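+
+The shape of that hand-off can be sketched as follows. This is a
+hedged illustration only -- ``PostTaskFn`` stands in for Gecko's real
+message-loop machinery, and the class name is invented::
+
+    #include <atomic>
+    #include <functional>
+    #include <utility>
+
+    using PostTaskFn = std::function<void(std::function<void()>)>;
+
+    class VsyncObserverSketch {
+     public:
+      explicit VsyncObserverSketch(PostTaskFn aPostToCompositorThread)
+          : mPostToCompositorThread(std::move(aPostToCompositorThread)) {}
+
+      // Flipped on when a composite is scheduled, off when idle.
+      void SetObservingVsync(bool aObserve) { mObserving = aObserve; }
+
+      // Runs on the Hardware Vsync Thread; must stay cheap.
+      void NotifyVsync(double aTimestampMs) {
+        if (!mObserving) {
+          return;  // no composite needed for this vsync
+        }
+        mPostToCompositorThread([aTimestampMs] {
+          Composite(aTimestampMs);  // runs on the Compositor Thread
+        });
+      }
+
+     private:
+      static void Composite(double /*aTimestampMs*/) { /* ... */ }
+
+      std::atomic<bool> mObserving{false};
+      PostTaskFn mPostToCompositorThread;
+    };
+
+Out-of-process Compositors
+--------------------------
+
+When compositing out-of-process, this model changes slightly.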
In this +case there are effectively two observers: a UI process observer +(``CompositorWidgetVsyncObserver``), and the +``CompositorVsyncScheduler::Observer`` in the GPU process. There are +also two dispatchers: the widget dispatcher in the UI process +(``CompositorVsyncDispatcher``), and the IPDL-based dispatcher in the +GPU process (``CompositorBridgeParent::NotifyVsync``). The UI process +observer and the GPU process dispatcher are linked via an IPDL protocol +called PVsyncBridge. ``PVsyncBridge`` is a top-level protocol for +sending vsync notifications to the compositor thread in the GPU process. +The compositor controls vsync observation through a separate actor, +``PCompositorWidget``, which (as a subactor for +``CompositorBridgeChild``) links the compositor thread in the GPU +process to the main thread in the UI process. + +Out-of-process compositors do not go through +``CompositorVsyncDispatcher`` directly. Instead, the +``CompositorWidgetDelegate`` in the UI process creates one, and gives it +a ``CompositorWidgetVsyncObserver``. This observer forwards +notifications to a Vsync I/O thread, where ``VsyncBridgeChild`` then +forwards the notification again to the compositor thread in the GPU +process. The notification is received by a ``VsyncBridgeParent``. The +GPU process uses the layers ID in the notification to find the correct +compositor to dispatch the notification to. + +CompositorVsyncDispatcher +------------------------- + +The ``CompositorVsyncDispatcher`` executes on the *Hardware Vsync +Thread*. It contains references to the ``nsBaseWidget`` it is associated +with and has a lifetime equal to the ``nsBaseWidget``. The +``CompositorVsyncDispatcher`` is responsible for notifying the +``CompositorBridgeParent`` that a vsync event has occurred. There can be +multiple ``CompositorVsyncDispatchers`` per ``Display``, one +``CompositorVsyncDispatcher`` per window. The only responsibility of the +``CompositorVsyncDispatcher`` is to notify components when a vsync event +has occurred, and to stop listening to vsync when no components require +vsync events. We require one ``CompositorVsyncDispatcher`` per window so +that we can handle multiple ``Displays``. When compositing in-process, +the ``CompositorVsyncDispatcher`` is attached to the CompositorWidget +for the window. When out-of-process, it is attached to the +CompositorWidgetDelegate, which forwards observer notifications over +IPDL. In the latter case, its lifetime is tied to a CompositorSession +rather than the nsIWidget. + +Multiple Displays +----------------- + +The ``VsyncSource`` has an API to switch a ``CompositorVsyncDispatcher`` +from one ``Display`` to another ``Display``. For example, when one +window either goes into full screen mode or moves from one connected +monitor to another. When one window moves to another monitor, we expect +a platform specific notification to occur. The detection of when a +window enters full screen mode or moves is not covered by Silk itself, +but the framework is built to support this use case. The expected flow +is that the OS notification occurs on ``nsIWidget``, which retrieves the +associated ``CompositorVsyncDispatcher``. The +``CompositorVsyncDispatcher`` then notifies the ``VsyncSource`` to +switch to the correct ``Display`` the ``CompositorVsyncDispatcher`` is +connected to. Because the notification works through the ``nsIWidget``, +the actual switching of the ``CompositorVsyncDispatcher`` to the correct +``Display`` should occur on the *Main Thread*. 
The current
+implementation of Silk does not handle this case and needs to be built
+out.
+
+CompositorVsyncScheduler::Observer
+----------------------------------
+
+The ``CompositorVsyncScheduler::Observer`` handles the vsync
+notifications and interactions with the ``CompositorVsyncDispatcher``.
+When the ``Compositor`` requires a scheduled composite, it notifies the
+``CompositorVsyncScheduler::Observer`` that it needs to listen to vsync.
+The ``CompositorVsyncScheduler::Observer`` then observes / unobserves
+vsync as needed from the ``CompositorVsyncDispatcher`` to enable
+composites.
+
+GeckoTouchDispatcher
+--------------------
+
+The ``GeckoTouchDispatcher`` is a singleton that resamples touch events
+to smooth out jank while tracking a user’s finger. Because input and
+composite are linked together, the
+``CompositorVsyncScheduler::Observer`` has a reference to the
+``GeckoTouchDispatcher`` and vice versa.
+
+Input Events
+------------
+
+One large goal of Silk is to align touch events with vsync events. On
+Firefox OS, touchscreens often have a different touch scan rate than
+the display’s refresh rate. A Flame device has a touch scan rate of
+75 Hz and a Nexus 4 one of 100 Hz, while both devices’ displays refresh
+at 60 Hz. When a vsync event occurs, we resample touch events, and then
+dispatch the resampled touch event to APZ. Touch events on Firefox OS
+occur on a *Touch Input Thread*, whereas they are processed by APZ on
+the *APZ Controller Thread*. We use `Google
+Android’s touch
+resampling <https://web.archive.org/web/20200909082458/http://www.masonchang.com/blog/2014/8/25/androids-touch-resampling-algorithm>`__
+algorithm to resample touch events.
+
+Currently, we have a strict ordering between composites and touch
+events. When a touch event occurs on the *Touch Input Thread*, we store
+the touch event in a queue. When a vsync event occurs, the
+``CompositorVsyncDispatcher`` notifies the ``Compositor`` of a vsync
+event, which notifies the ``GeckoTouchDispatcher``. The
+``GeckoTouchDispatcher`` processes the touch event first on the *APZ
+Controller Thread*, which is the same as the *Compositor Thread* on b2g,
+then the ``Compositor`` finishes compositing. We require this strict
+ordering because if a vsync notification were dispatched to both the
+``Compositor`` and ``GeckoTouchDispatcher`` at the same time, a race
+would occur between processing the touch event (and thus updating the
+scroll position) and compositing with that position. In practice, this
+creates very janky scrolling. As of this writing, we have not analyzed
+input events on desktop platforms.
+
+One slight quirk is that input events can start a composite, for example
+during a scroll and after the ``Compositor`` is no longer listening to
+vsync events. In these cases, we notify the ``Compositor`` to observe
+vsync so that it dispatches touch events. Since the ``Compositor`` is
+not listening to vsync events in that state, the touch events would
+otherwise never be dispatched. The ``GeckoTouchDispatcher`` handles
+this case by always forcing the ``Compositor`` to listen to vsync
+events while touch events are occurring.
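+
+The resampling itself can be sketched as a small interpolation step
+aligned to the vsync timestamp. This is a hedged illustration in the
+spirit of the Android algorithm linked above; the constant and the
+names are invented, not Gecko’s actual implementation::
+
+    #include <algorithm>
+
+    struct TouchSample {
+      double timeMs;
+      float x, y;
+    };
+
+    // Sample slightly in the past so we interpolate more often than
+    // we extrapolate.
+    constexpr double kResampleLatencyMs = 5.0;
+
+    // Produce a touch sample aligned to the current vsync from the two
+    // most recent hardware samples.
+    TouchSample ResampleTouch(const TouchSample& prev,
+                              const TouchSample& curr,
+                              double vsyncTimeMs) {
+      double sampleTimeMs = vsyncTimeMs - kResampleLatencyMs;
+      double denom = curr.timeMs - prev.timeMs;
+      if (denom <= 0.0) {
+        return curr;  // degenerate input; nothing to resample
+      }
+      double alpha = (sampleTimeMs - prev.timeMs) / denom;
+      // Clamp to bound extrapolation when the touchscreen lags vsync.
+      alpha = std::clamp(alpha, 0.0, 1.5);
+      return {sampleTimeMs,
+              static_cast<float>(prev.x + (curr.x - prev.x) * alpha),
+              static_cast<float>(prev.y + (curr.y - prev.y) * alpha)};
+    }
+
+Widget, Compositor, CompositorVsyncDispatcher, GeckoTouchDispatcher Shutdown Procedure
+--------------------------------------------------------------------------------------
+
+When the `nsBaseWidget shuts
+down <https://hg.mozilla.org/mozilla-central/file/0df249a0e4d3/widget/nsBaseWidget.cpp#l182>`__,
+it calls nsBaseWidget::DestroyCompositor on the *Gecko Main Thread*.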
+During nsBaseWidget::DestroyCompositor, it first destroys the
+CompositorBridgeChild. CompositorBridgeChild sends a sync IPC call to
+CompositorBridgeParent::RecvStop, which calls
+`CompositorBridgeParent::Destroy <https://hg.mozilla.org/mozilla-central/file/ab0490972e1e/gfx/layers/ipc/CompositorParent.cpp#l509>`__.
+During this time, the *main thread* is blocked on the parent process.
+CompositorBridgeParent::RecvStop runs on the *Compositor thread* and
+cleans up some resources, including setting the
+``CompositorVsyncScheduler::Observer`` to nullptr.
+CompositorBridgeParent::RecvStop also explicitly keeps the
+CompositorBridgeParent alive and posts another task to run
+CompositorBridgeParent::DeferredDestroy on the Compositor loop so that
+all IPDL code can finish executing. The
+``CompositorVsyncScheduler::Observer`` also unobserves vsync and
+cancels any pending composite tasks. Once
+CompositorBridgeParent::RecvStop finishes, the *main thread* in the
+parent process continues shutting down the nsBaseWidget.
+
+At the same time, the *Compositor thread* is executing tasks until
+CompositorBridgeParent::DeferredDestroy runs, which flushes the
+compositor message loop. Now we have two references to release: the
+nsBaseWidget releases its reference to the Compositor on the *main
+thread* during destruction, and CompositorBridgeParent::DeferredDestroy
+releases its reference to the CompositorBridgeParent on the *Compositor
+Thread*. Finally, the CompositorBridgeParent itself is destroyed on the
+*main thread* once both references are gone, due to explicit `main
+thread destruction <https://hg.mozilla.org/mozilla-central/file/50b95032152c/gfx/layers/ipc/CompositorParent.h#l148>`__.
+
+With the ``CompositorVsyncScheduler::Observer``, any accesses to the
+widget after nsBaseWidget::DestroyCompositor executes are invalid. Any
+accesses to the compositor between the time
+nsBaseWidget::DestroyCompositor runs and the
+CompositorVsyncScheduler::Observer’s destructor runs aren’t safe
+either, as a hardware vsync event could occur between these times.
+Since any tasks posted on the Compositor loop after
+CompositorBridgeParent::DeferredDestroy is posted are invalid, we make
+sure that no vsync tasks can be posted once
+CompositorBridgeParent::RecvStop executes and DeferredDestroy is posted
+on the Compositor thread. When the sync call to
+CompositorBridgeParent::RecvStop executes, we explicitly set the
+CompositorVsyncScheduler::Observer to null to prevent vsync
+notifications from occurring. If vsync notifications were allowed to
+occur, then, since the ``CompositorVsyncScheduler::Observer``\ ’s vsync
+notification executes on the *hardware vsync thread*, it would post a
+task to the Compositor loop that might execute after
+CompositorBridgeParent::DeferredDestroy. Thus, we explicitly shut down
+vsync events in the ``CompositorVsyncDispatcher`` and
+``CompositorVsyncScheduler::Observer`` during nsBaseWidget::Shutdown to
+prevent any vsync tasks from executing after
+CompositorBridgeParent::DeferredDestroy.
+
+The ``CompositorVsyncDispatcher`` may be destroyed on either the *main
+thread* or *Compositor Thread*, since both the nsBaseWidget and
+``CompositorVsyncScheduler::Observer`` race to destroy on different
+threads. nsBaseWidget is destroyed on the *main thread* and releases a
+reference to the ``CompositorVsyncDispatcher`` during destruction.
The
+``CompositorVsyncScheduler::Observer`` races to be destroyed either
+during CompositorBridgeParent shutdown or from the
+``GeckoTouchDispatcher``, which is destroyed on the main thread with
+`ClearOnShutdown <https://hg.mozilla.org/mozilla-central/file/21567e9a6e40/xpcom/base/ClearOnShutdown.h#l15>`__.
+Whichever of the CompositorBridgeParent or the ``GeckoTouchDispatcher``
+is destroyed last holds the last reference to the
+``CompositorVsyncDispatcher`` and destroys it.
+
+Refresh Driver
+--------------
+
+The Refresh Driver is ticked from a `single active
+timer <https://hg.mozilla.org/mozilla-central/file/ab0490972e1e/layout/base/nsRefreshDriver.cpp#l11>`__.
+The assumption is that there are multiple ``RefreshDrivers`` connected
+to a single ``RefreshTimer``. There are two ``RefreshTimers``: an active
+and an inactive ``RefreshTimer``. Each Tab has its own
+``RefreshDriver``, which connects to one of the global
+``RefreshTimers``. The ``RefreshTimers`` execute on the *Main Thread*
+and tick their connected ``RefreshDrivers``. We do not want to break
+this model of multiple ``RefreshDrivers`` per set of two global
+``RefreshTimers``. Each ``RefreshDriver`` switches between the active
+and inactive ``RefreshTimer``.
+
+Instead, we create a new ``RefreshTimer``, the ``VsyncRefreshTimer``,
+which ticks based on vsync messages. We replace the current active timer
+with a ``VsyncRefreshTimer``. All tabs will then tick based on this new
+active timer. Since the ``RefreshTimer`` has the lifetime of the
+process, we only need to create a single ``VsyncDispatcher`` per
+``Display`` when Firefox starts. Even if we do not have any content
+processes, the Chrome process will still need a ``VsyncRefreshTimer``,
+thus we can associate the ``VsyncDispatcher`` with each
+``Display``.
+
+When Firefox starts, we initially create a new ``VsyncRefreshTimer`` in
+the Chrome process. The ``VsyncRefreshTimer`` will listen to vsync
+notifications from the ``VsyncDispatcher`` on the global
+``Display``. When nsRefreshDriver::Shutdown executes, it will delete the
+``VsyncRefreshTimer``. This creates a problem, as all the
+``RefreshTimers`` are currently manually memory-managed whereas
+``VsyncObservers`` are ref-counted. To work around this problem, we
+create a new ``RefreshDriverVsyncObserver`` as an inner class of
+``VsyncRefreshTimer``, which actually receives vsync notifications. It
+then ticks the ``RefreshDrivers`` inside ``VsyncRefreshTimer``.
+
+With content processes, the startup process is more complicated. We
+send vsync IPC messages via the PBackground thread on the
+parent process, which allows us to send messages from the parent
+process without waiting on the *main thread*. This sends messages from
+the Parent::\ *PBackground Thread* to the Child::\ *Main Thread*. The
+*main thread* receiving IPC messages on the content process is
+acceptable because ``RefreshDrivers`` must execute on the *main thread*.
+However, some amount of time is required to set up the IPC
+connection upon process creation, and during this time the
+``RefreshDrivers`` must tick to set up the process. To get around this,
+we initially use software ``RefreshTimers`` that already exist during
+content process startup and swap in the ``VsyncRefreshTimer`` once the
+IPC connection is created.
+
+During nsRefreshDriver::ChooseTimer, we create an async PBackground IPC
+open request to create a ``VsyncParent`` and ``VsyncChild``.
At the same
+time, we create a software ``RefreshTimer`` and tick the
+``RefreshDrivers`` as normal. Once the PBackground callback is executed
+and an IPC connection exists, we swap all ``RefreshDrivers`` currently
+associated with the active ``RefreshTimer`` over to the
+``VsyncRefreshTimer``. Since all interactions on the content process
+occur on the main thread, there is no need for locks. The
+``VsyncParent`` listens to vsync events through the
+``VsyncRefreshTimerDispatcher`` on the parent side and sends vsync
+IPC messages to the ``VsyncChild``. The ``VsyncChild`` notifies the
+``VsyncRefreshTimer`` on the content process.
+
+During the shutdown process of the content process, ActorDestroy is
+called on the ``VsyncChild`` and ``VsyncParent`` due to the normal
+PBackground shutdown process. Once ActorDestroy is called, no IPC
+messages should be sent across the channel. After ActorDestroy is
+called, the IPDL machinery will delete the **VsyncParent/Child** pair.
+The ``VsyncParent``, due to being a ``VsyncObserver``, is ref-counted.
+After ``VsyncParent::ActorDestroy`` is called, it unregisters itself
+from the ``VsyncDispatcher``, which holds the last reference
+to the ``VsyncParent``, and the object will be deleted.
+
+Thus the overall flow during normal execution is:
+
+1. VsyncSource::Display::VsyncDispatcher receives a vsync
+   notification from the OS in the parent process.
+2. VsyncDispatcher notifies
+   VsyncRefreshTimer::RefreshDriverVsyncObserver that a vsync occurred
+   on the parent process on the hardware vsync thread.
+3. VsyncDispatcher notifies the VsyncParent on the hardware
+   vsync thread that a vsync occurred.
+4. The VsyncRefreshTimer::RefreshDriverVsyncObserver in the parent
+   process posts a task to the main thread that ticks the refresh
+   drivers.
+5. VsyncParent posts a task to the PBackground thread to send a vsync
+   IPC message to VsyncChild.
+6. VsyncChild receives a vsync notification on the content process on
+   the main thread and ticks its respective RefreshDrivers.
+
+Compressing Vsync Messages
+--------------------------
+
+Vsync messages occur quite often and the *main thread* can be busy for
+long periods of time due to JavaScript. Consistently sending vsync
+messages to the refresh driver timer can flood the *main thread* with
+refresh driver ticks, causing even more delays. To avoid this problem,
+we compress vsync messages on both the parent and child processes.
+
+On the parent process, newer vsync messages update a vsync timestamp but
+do not actually queue any tasks on the *main thread*. Once the parent
+process’ *main thread* executes the refresh driver tick, it uses the
+most updated vsync timestamp to tick the refresh driver. After the
+refresh driver has ticked, a single vsync message is queued for
+another refresh driver tick task. On the content process, the IPDL
+``compress`` keyword automatically compresses IPC messages.
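+
+The parent-side compression boils down to "latest timestamp wins, at
+most one tick task in flight". A hedged sketch (invented names;
+``PostToMainThread`` stands in for the real event loop)::
+
+    #include <atomic>
+    #include <functional>
+    #include <utility>
+
+    class CompressedVsyncTicker {
+     public:
+      explicit CompressedVsyncTicker(
+          std::function<void(std::function<void()>)> aPostToMainThread)
+          : mPostToMainThread(std::move(aPostToMainThread)) {}
+
+      // Called for every vsync, possibly faster than the main thread
+      // can keep up with.
+      void NotifyVsync(double aTimestampMs) {
+        mLatestVsyncMs.store(aTimestampMs, std::memory_order_relaxed);
+        bool expected = false;
+        // Only queue a tick task if one isn't already pending.
+        if (mTickPending.compare_exchange_strong(expected, true)) {
+          mPostToMainThread([this] {
+            mTickPending.store(false);
+            // Tick with the newest timestamp, not the one that
+            // originally queued this task.
+            TickRefreshDrivers(
+                mLatestVsyncMs.load(std::memory_order_relaxed));
+          });
+        }
+      }
+
+     private:
+      static void TickRefreshDrivers(double /*aVsyncMs*/) { /* ... */ }
+
+      std::atomic<double> mLatestVsyncMs{0.0};
+      std::atomic<bool> mTickPending{false};
+      std::function<void(std::function<void()>)> mPostToMainThread;
+    };
+
+Multiple Monitors
+-----------------
+
+In order to have multiple monitor support for the ``RefreshDrivers``, we
+have multiple active ``RefreshTimers``. Each ``RefreshTimer`` is
+associated with a specific ``Display`` via an id and ticks when its
+respective ``Display``’s vsync occurs. We have **N RefreshTimers**,
+where N is the number of connected displays. Each ``RefreshTimer``
+still has multiple ``RefreshDrivers``.
+
+When a tab or window changes monitors, the ``nsIWidget`` receives a
+display changed notification.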
+Based on which display the window is on, the window switches to the
+correct ``VsyncDispatcher`` and ``CompositorVsyncDispatcher`` on the
+parent process, selected by the display id. Each ``TabParent`` should
+also send a notification to its child. Each ``TabChild``, given the
+display ID, switches to the correct ``RefreshTimer`` associated with
+that display ID. When each display’s vsync occurs, it sends one IPC
+message to notify vsync. The vsync message contains a display ID, to
+tick the appropriate ``RefreshTimer`` on the content process. There is
+still only one **VsyncParent/VsyncChild** pair; each vsync notification
+simply includes a display ID, which maps to the correct
+``RefreshTimer``.
+
+Object Lifetime
+---------------
+
+1. CompositorVsyncDispatcher - Lives as long as the nsBaseWidget
+   associated with the VsyncDispatcher.
+2. CompositorVsyncScheduler::Observer - Lives and dies at the same time
+   as the CompositorBridgeParent.
+3. VsyncDispatcher - Lives as long as the associated display
+   object, which is the lifetime of Firefox.
+4. VsyncSource - Lives as long as the gfxPlatform on the chrome process,
+   which is the lifetime of Firefox.
+5. VsyncParent/VsyncChild - Lives as long as the content process.
+6. RefreshTimer - Lives as long as the process.
+
+Threads
+-------
+
+All ``VsyncObservers`` are notified on the *Hardware Vsync Thread*. It
+is the responsibility of the ``VsyncObservers`` to post tasks to their
+respective correct thread. For example, the
+``CompositorVsyncScheduler::Observer`` will be notified on the *Hardware
+Vsync Thread*, and post a task to the *Compositor Thread* to do the
+actual composition.
+
+1. Compositor Thread - Nothing changes.
+2. Main Thread - PVsyncChild receives IPC messages on the main thread.
+   We also enable/disable vsync on the main thread.
+3. PBackground Thread - Creates a connection from the PBackground thread
+   on the parent process to the main thread in the content process.
+4. Hardware Vsync Thread - Every platform is different, but we always
+   have the concept of a hardware vsync thread. Sometimes this is
+   actually created by the host OS. On Windows, we have to create a
+   separate platform thread that blocks on DwmFlush().
diff --git a/gfx/docs/SilkArchitecture.png b/gfx/docs/SilkArchitecture.png
Binary files differ
new file mode 100644
index 0000000000..938c585e40
--- /dev/null
+++ b/gfx/docs/SilkArchitecture.png
diff --git a/gfx/docs/index.rst b/gfx/docs/index.rst
new file mode 100644
index 0000000000..9d1d7b0396
--- /dev/null
+++ b/gfx/docs/index.rst
@@ -0,0 +1,17 @@
+Graphics
+========
+
+This collection of linked pages contains design documents for the
+Mozilla graphics architecture. The design documents live in the
+gfx/docs directory.
+
+This `wiki page <https://wiki.mozilla.org/Platform/GFX>`__ contains
+information about graphics and the graphics team at Mozilla.
+
+.. toctree::
+   :maxdepth: 1
+
+   RenderingOverview
+   LayersHistory
+   AsyncPanZoom
+   Silk
+   Moz2D