summaryrefslogtreecommitdiffstats
path: root/gfx/docs/AsyncPanZoom.rst
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-28 14:29:10 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-28 14:29:10 +0000
commit2aa4a82499d4becd2284cdb482213d541b8804dd (patch)
treeb80bf8bf13c3766139fbacc530efd0dd9d54394c /gfx/docs/AsyncPanZoom.rst
parentInitial commit. (diff)
downloadfirefox-2aa4a82499d4becd2284cdb482213d541b8804dd.tar.xz
firefox-2aa4a82499d4becd2284cdb482213d541b8804dd.zip
Adding upstream version 86.0.1.upstream/86.0.1upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'gfx/docs/AsyncPanZoom.rst')
-rw-r--r--gfx/docs/AsyncPanZoom.rst687
1 files changed, 687 insertions, 0 deletions
diff --git a/gfx/docs/AsyncPanZoom.rst b/gfx/docs/AsyncPanZoom.rst
new file mode 100644
index 0000000000..01bf2776df
--- /dev/null
+++ b/gfx/docs/AsyncPanZoom.rst
@@ -0,0 +1,687 @@
+.. _apz:
+
+Asynchronous Panning and Zooming
+================================
+
+**This document is a work in progress. Some information may be missing
+or incomplete.**
+
+.. image:: AsyncPanZoomArchitecture.png
+
+Goals
+-----
+
+We need to be able to provide a visual response to user input with
+minimal latency. In particular, on devices with touch input, content
+must track the finger exactly while panning, or the user experience is
+very poor. According to the UX team, 120ms is an acceptable latency
+between user input and response.
+
+Context and surrounding architecture
+------------------------------------
+
+The fundamental problem we are trying to solve with the Asynchronous
+Panning and Zooming (APZ) code is that of responsiveness. By default,
+web browsers operate in a “game loop” that looks like this:
+
+::
+
+ while true:
+ process input
+ do computations
+ repaint content
+ display repainted content
+
+In browsers the “do computation” step can be arbitrarily expensive
+because it can involve running event handlers in web content. Therefore,
+there can be an arbitrary delay between the input being received and the
+on-screen display getting updated.
+
+Responsiveness is always good, and with touch-based interaction it is
+even more important than with mouse or keyboard input. In order to
+ensure responsiveness, we split the “game loop” model of the browser
+into a multithreaded variant which looks something like this:
+
+::
+
+ Thread 1 (compositor thread)
+ while true:
+ receive input
+ send a copy of input to thread 2
+ adjust painted content based on input
+ display adjusted painted content
+
+ Thread 2 (main thread)
+ while true:
+ receive input from thread 1
+ do computations
+ repaint content
+ update the copy of painted content in thread 1
+
+This multithreaded model is called off-main-thread compositing (OMTC),
+because the compositing (where the content is displayed on-screen)
+happens on a separate thread from the main thread. Note that this is a
+very very simplified model, but in this model the “adjust painted
+content based on input” is the primary function of the APZ code.
+
+The “painted content” is stored on a set of “layers”, that are
+conceptually double-buffered. That is, when the main thread does its
+repaint, it paints into one set of layers (the “client” layers). The
+update that is sent to the compositor thread copies all the changes from
+the client layers into another set of layers that the compositor holds.
+These layers are called the “shadow” layers or the “compositor” layers.
+The compositor in theory can continuously composite these shadow layers
+to the screen while the main thread is busy doing other things and
+painting a new set of client layers.
+
+The APZ code takes the input events that are coming in from the hardware
+and uses them to figure out what the user is trying to do (e.g. pan the
+page, zoom in). It then expresses this user intention in the form of
+translation and/or scale transformation matrices. These transformation
+matrices are applied to the shadow layers at composite time, so that
+what the user sees on-screen reflects what they are trying to do as
+closely as possible.
+
+Technical overview
+------------------
+
+As per the heavily simplified model described above, the fundamental
+purpose of the APZ code is to take input events and produce
+transformation matrices. This section attempts to break that down and
+identify the different problems that make this task non-trivial.
+
+Checkerboarding
+~~~~~~~~~~~~~~~
+
+The content area that is painted and stored in a shadow layer is called
+the “displayport”. The APZ code is responsible for determining how large
+the displayport should be. On the one hand, we want the displayport to
+be as large as possible. At the very least it needs to be larger than
+what is visible on-screen, because otherwise, as soon as the user pans,
+there will be some unpainted area of the page exposed. However, we
+cannot always set the displayport to be the entire page, because the
+page can be arbitrarily long and this would require an unbounded amount
+of memory to store. Therefore, a good displayport size is one that is
+larger than the visible area but not so large that it is a huge drain on
+memory. Because the displayport is usually smaller than the whole page,
+it is always possible for the user to scroll so fast that they end up in
+an area of the page outside the displayport. When this happens, they see
+unpainted content; this is referred to as “checkerboarding”, and we try
+to avoid it where possible.
+
+There are many possible ways to determine what the displayport should be
+in order to balance the tradeoffs involved (i.e. having one that is too
+big is bad for memory usage, and having one that is too small results in
+excessive checkerboarding). Ideally, the displayport should cover
+exactly the area that we know the user will make visible. Although we
+cannot know this for sure, we can use heuristics based on current
+panning velocity and direction to ensure a reasonably-chosen displayport
+area. This calculation is done in the APZ code, and a new desired
+displayport is frequently sent to the main thread as the user is panning
+around.
+
+Multiple layers
+~~~~~~~~~~~~~~~
+
+Consider, for example, a scrollable page that contains an iframe which
+itself is scrollable. The iframe can be scrolled independently of the
+top-level page, and we would like both the page and the iframe to scroll
+responsively. This means that we want independent asynchronous panning
+for both the top-level page and the iframe. In addition to iframes,
+elements that have the overflow:scroll CSS property set are also
+scrollable, and also end up on separate scrollable layers. In the
+general case, the layers are arranged in a tree structure, and so within
+the APZ code we have a matching tree of AsyncPanZoomController (APZC)
+objects, one for each scrollable layer. To manage this tree of APZC
+instances, we have a single APZCTreeManager object. Each APZC is
+relatively independent and handles the scrolling for its associated
+layer, but there are some cases in which they need to interact; these
+cases are described in the sections below.
+
+Hit detection
+~~~~~~~~~~~~~
+
+Consider again the case where we have a scrollable page that contains an
+iframe which itself is scrollable. As described above, we will have two
+APZC instances - one for the page and one for the iframe. When the user
+puts their finger down on the screen and moves it, we need to do some
+sort of hit detection in order to determine whether their finger is on
+the iframe or on the top-level page. Based on where their finger lands,
+the appropriate APZC instance needs to handle the input. This hit
+detection is also done in the APZCTreeManager, as it has the necessary
+information about the sizes and positions of the layers. Currently this
+hit detection is not perfect, as it uses rects and does not account for
+things like rounded corners and opacity.
+
+Also note that for some types of input (e.g. when the user puts two
+fingers down to do a pinch) we do not want the input to be “split”
+across two different APZC instances. In the case of a pinch, for
+example, we find a “common ancestor” APZC instance - one that is
+zoomable and contains all of the touch input points, and direct the
+input to that APZC instance.
+
+Scroll Handoff
+~~~~~~~~~~~~~~
+
+Consider yet again the case where we have a scrollable page that
+contains an iframe which itself is scrollable. Say the user scrolls the
+iframe so that it reaches the bottom. If the user continues panning on
+the iframe, the expectation is that the top-level page will start
+scrolling. However, as discussed in the section on hit detection, the
+APZC instance for the iframe is separate from the APZC instance for the
+top-level page. Thus, we need the two APZC instances to communicate in
+some way such that input events on the iframe result in scrolling on the
+top-level page. This behaviour is referred to as “scroll handoff” (or
+“fling handoff” in the case where analogous behaviour results from the
+scrolling momentum of the page after the user has lifted their finger).
+
+Input event untransformation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The APZC architecture by definition results in two copies of a “scroll
+position” for each scrollable layer. There is the original copy on the
+main thread that is accessible to web content and the layout and
+painting code. And there is a second copy on the compositor side, which
+is updated asynchronously based on user input, and corresponds to what
+the user visually sees on the screen. Although these two copies may
+diverge temporarily, they are reconciled periodically. In particular,
+they diverge while the APZ code is performing an async pan or zoom
+action on behalf of the user, and are reconciled when the APZ code
+requests a repaint from the main thread.
+
+Because of the way input events are stored, this has some unfortunate
+consequences. Input events are stored relative to the device screen - so
+if the user touches at the same physical spot on the device, the same
+input events will be delivered regardless of the content scroll
+position. When the main thread receives a touch event, it combines that
+with the content scroll position in order to figure out what DOM element
+the user touched. However, because we now have two different scroll
+positions, this process may not work perfectly. A concrete example
+follows:
+
+Consider a device with screen size 600 pixels tall. On this device, a
+user is viewing a document that is 1000 pixels tall, and that is
+scrolled down by 200 pixels. That is, the vertical section of the
+document from 200px to 800px is visible. Now, if the user touches a
+point 100px from the top of the physical display, the hardware will
+generate a touch event with y=100. This will get sent to the main
+thread, which will add the scroll position (200) and get a
+document-relative touch event with y=300. This new y-value will be used
+in hit detection to figure out what the user touched. If the document
+had a absolute-positioned div at y=300, then that would receive the
+touch event.
+
+Now let us add some async scrolling to this example. Say that the user
+additionally scrolls the document by another 10 pixels asynchronously
+(i.e. only on the compositor thread), and then does the same touch
+event. The same input event is generated by the hardware, and as before,
+the document will deliver the touch event to the div at y=300. However,
+visually, the document is scrolled by an additional 10 pixels so this
+outcome is wrong. What needs to happen is that the APZ code needs to
+intercept the touch event and account for the 10 pixels of asynchronous
+scroll. Therefore, the input event with y=100 gets converted to y=110 in
+the APZ code before being passed on to the main thread. The main thread
+then adds the scroll position it knows about and determines that the
+user touched at a document-relative position of y=310.
+
+Analogous input event transformations need to be done for horizontal
+scrolling and zooming.
+
+Content independently adjusting scrolling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As described above, there are two copies of the scroll position in the
+APZ architecture - one on the main thread and one on the compositor
+thread. Usually for architectures like this, there is a single “source
+of truth” value and the other value is simply a copy. However, in this
+case that is not easily possible to do. The reason is that both of these
+values can be legitimately modified. On the compositor side, the input
+events the user is triggering modify the scroll position, which is then
+propagated to the main thread. However, on the main thread, web content
+might be running Javascript code that programmatically sets the scroll
+position (via window.scrollTo, for example). Scroll changes driven from
+the main thread are just as legitimate and need to be propagated to the
+compositor thread, so that the visual display updates in response.
+
+Because the cross-thread messaging is asynchronous, reconciling the two
+types of scroll changes is a tricky problem. Our design solves this
+using various flags and generation counters. The general heuristic we
+have is that content-driven scroll position changes (e.g. scrollTo from
+JS) are never lost. For instance, if the user is doing an async scroll
+with their finger and content does a scrollTo in the middle, then some
+of the async scroll would occur before the “jump” and the rest after the
+“jump”.
+
+Content preventing default behaviour of input events
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Another problem that we need to deal with is that web content is allowed
+to intercept touch events and prevent the “default behaviour” of
+scrolling. This ability is defined in web standards and is
+non-negotiable. Touch event listeners in web content are allowed call
+preventDefault() on the touchstart or first touchmove event for a touch
+point; doing this is supposed to “consume” the event and prevent
+touch-based panning. As we saw in a previous section, the input event
+needs to be untransformed by the APZ code before it can be delivered to
+content. But, because of the preventDefault problem, we cannot fully
+process the touch event in the APZ code until content has had a chance
+to handle it. Web browsers in general solve this problem by inserting a
+delay of up to 300ms before processing the input - that is, web content
+is allowed up to 300ms to process the event and call preventDefault on
+it. If web content takes longer than 300ms, or if it completes handling
+of the event without calling preventDefault, then the browser
+immediately starts processing the events.
+
+The way the APZ implementation deals with this is that upon receiving a
+touch event, it immediately returns an untransformed version that can be
+dispatched to content. It also schedules a 400ms timeout (600ms on
+Android) during which content is allowed to prevent scrolling. There is
+an API that allows the main-thread event dispatching code to notify the
+APZ as to whether or not the default action should be prevented. If the
+APZ content response timeout expires, or if the main-thread event
+dispatching code notifies the APZ of the preventDefault status, then the
+APZ continues with the processing of the events (which may involve
+discarding the events).
+
+The touch-action CSS property from the pointer-events spec is intended
+to allow eliminating this 400ms delay in many cases (although for
+backwards compatibility it will still be needed for a while). Note that
+even with touch-action implemented, there may be cases where the APZ
+code does not know the touch-action behaviour of the point the user
+touched. In such cases, the APZ code will still wait up to 400ms for the
+main thread to provide it with the touch-action behaviour information.
+
+Technical details
+-----------------
+
+This section describes various pieces of the APZ code, and goes into
+more specific detail on APIs and code than the previous sections. The
+primary purpose of this section is to help people who plan on making
+changes to the code, while also not going into so much detail that it
+needs to be updated with every patch.
+
+Overall flow of input events
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This section describes how input events flow through the APZ code.
+
+1. Input events arrive from the hardware/widget code into the APZ via
+ APZCTreeManager::ReceiveInputEvent. The thread that invokes this is
+ called the input thread, and may or may not be the same as the Gecko
+ main thread.
+2. Conceptually the first thing that the APZCTreeManager does is to
+ associate these events with “input blocks”. An input block is a set
+ of events that share certain properties, and generally are intended
+ to represent a single gesture. For example with touch events, all
+ events following a touchstart up to but not including the next
+ touchstart are in the same block. All of the events in a given block
+ will go to the same APZC instance and will either all be processed
+ or all be dropped.
+3. Using the first event in the input block, the APZCTreeManager does a
+ hit-test to see which APZC it hits. This hit-test uses the event
+ regions populated on the layers, which may be larger than the true
+ hit area of the layer. If no APZC is hit, the events are discarded
+ and we jump to step 6. Otherwise, the input block is tagged with the
+ hit APZC as a tentative target and put into a global APZ input
+ queue.
+4.
+
+ i. If the input events landed outside the dispatch-to-content event
+ region for the layer, any available events in the input block
+ are processed. These may trigger behaviours like scrolling or
+ tap gestures.
+ ii. If the input events landed inside the dispatch-to-content event
+ region for the layer, the events are left in the queue and a
+ 400ms timeout is initiated. If the timeout expires before step 9
+ is completed, the APZ assumes the input block was not cancelled
+ and the tentative target is correct, and processes them as part
+ of step 10.
+
+5. The call stack unwinds back to APZCTreeManager::ReceiveInputEvent,
+ which does an in-place modification of the input event so that any
+ async transforms are removed.
+6. The call stack unwinds back to the widget code that called
+ ReceiveInputEvent. This code now has the event in the coordinate
+ space Gecko is expecting, and so can dispatch it to the Gecko main
+ thread.
+7. Gecko performs its own usual hit-testing and event dispatching for
+ the event. As part of this, it records whether any touch listeners
+ cancelled the input block by calling preventDefault(). It also
+ activates inactive scrollframes that were hit by the input events.
+8. The call stack unwinds back to the widget code, which sends two
+ notifications to the APZ code on the input thread. The first
+ notification is via APZCTreeManager::ContentReceivedInputBlock, and
+ informs the APZ whether the input block was cancelled. The second
+ notification is via APZCTreeManager::SetTargetAPZC, and informs the
+ APZ of the results of the Gecko hit-test during event dispatch. Note
+ that Gecko may report that the input event did not hit any
+ scrollable frame at all. The SetTargetAPZC notification happens only
+ once per input block, while the ContentReceivedInputBlock
+ notification may happen once per block, or multiple times per block,
+ depending on the input type.
+9.
+
+ i. If the events were processed as part of step 4(i), the
+ notifications from step 8 are ignored and step 10 is skipped.
+ ii. If events were queued as part of step 4(ii), and steps 5-8 take
+ less than 400ms, the arrival of both notifications from step 8
+ will mark the input block ready for processing.
+ iii. If events were queued as part of step 4(ii), but steps 5-8 take
+ longer than 400ms, the notifications from step 8 will be
+ ignored and step 10 will already have happened.
+
+10. If events were queued as part of step 4(ii) they are now either
+ processed (if the input block was not cancelled and Gecko detected a
+ scrollframe under the input event, or if the timeout expired) or
+ dropped (all other cases). Note that the APZC that processes the
+ events may be different at this step than the tentative target from
+ step 3, depending on the SetTargetAPZC notification. Processing the
+ events may trigger behaviours like scrolling or tap gestures.
+
+If the CSS touch-action property is enabled, the above steps are
+modified as follows: \* In step 4, the APZC also requires the allowed
+touch-action behaviours for the input event. This might have been
+determined as part of the hit-test in APZCTreeManager; if not, the
+events are queued. \* In step 6, the widget code determines the content
+element at the point under the input element, and notifies the APZ code
+of the allowed touch-action behaviours. This notification is sent via a
+call to APZCTreeManager::SetAllowedTouchBehavior on the input thread. \*
+In step 9(ii), the input block will only be marked ready for processing
+once all three notifications arrive.
+
+Threading considerations
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The bulk of the input processing in the APZ code happens on what we call
+“the input thread”. In practice the input thread could be the Gecko main
+thread, the compositor thread, or some other thread. There are obvious
+downsides to using the Gecko main thread - that is, “asynchronous”
+panning and zooming is not really asynchronous as input events can only
+be processed while Gecko is idle. In an e10s environment, using the
+Gecko main thread of the chrome process is acceptable, because the code
+running in that process is more controllable and well-behaved than
+arbitrary web content. Using the compositor thread as the input thread
+could work on some platforms, but may be inefficient on others. For
+example, on Android (Fennec) we receive input events from the system on
+a dedicated UI thread. We would have to redispatch the input events to
+the compositor thread if we wanted to the input thread to be the same as
+the compositor thread. This introduces a potential for higher latency,
+particularly if the compositor does any blocking operations - blocking
+SwapBuffers operations, for example. As a result, the APZ code itself
+does not assume that the input thread will be the same as the Gecko main
+thread or the compositor thread.
+
+Active vs. inactive scrollframes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The number of scrollframes on a page is potentially unbounded. However,
+we do not want to create a separate layer for each scrollframe right
+away, as this would require large amounts of memory. Therefore,
+scrollframes as designated as either “active” or “inactive”. Active
+scrollframes are the ones that do have their contents put on a separate
+layer (or set of layers), and inactive ones do not.
+
+Consider a page with a scrollframe that is initially inactive. When
+layout generates the layers for this page, the content of the
+scrollframe will be flattened into some other PaintedLayer (call it P).
+The layout code also adds the area (or bounding region in case of weird
+shapes) of the scrollframe to the dispatch-to-content region of P.
+
+When the user starts interacting with that content, the hit-test in the
+APZ code finds the dispatch-to-content region of P. The input block
+therefore has a tentative target of P when it goes into step 4(ii) in
+the flow above. When gecko processes the input event, it must detect the
+inactive scrollframe and activate it, as part of step 7. Finally, the
+widget code sends the SetTargetAPZC notification in step 8 to notify the
+APZ that the input block should really apply to this new layer. The
+issue here is that the layer transaction containing the new layer must
+reach the compositor and APZ before the SetTargetAPZC notification. If
+this does not occur within the 400ms timeout, the APZ code will be
+unable to update the tentative target, and will continue to use P for
+that input block. Input blocks that start after the layer transaction
+will get correctly routed to the new layer as there will now be a layer
+and APZC instance for the active scrollframe.
+
+This model implies that when the user initially attempts to scroll an
+inactive scrollframe, it may end up scrolling an ancestor scrollframe.
+(This is because in the absence of the SetTargetAPZC notification, the
+input events will get applied to the closest ancestor scrollframe’s
+APZC.) Only after the round-trip to the gecko thread is complete is
+there a layer for async scrolling to actually occur on the scrollframe
+itself. At that point the scrollframe will start receiving new input
+blocks and will scroll normally.
+
+WebRender Integration
+~~~~~~~~~~~~~~~~~~~~~
+
+The APZ code was originally written to work with the "layers" graphics
+backend. Many of the concepts (and therefore variable/function names)
+stem from the integration with the layers backend. After the WebRender
+backend was added, the existing code evolved over time to integrate
+with that backend as well, resulting in a bit of a hodge-podge effect.
+With that cautionary note out of the way, there are three main pieces
+that need to be understood to grasp the integration between the APZ
+code and WebRender. These are detailed below.
+
+HitTestingTree
+^^^^^^^^^^^^^^
+
+The APZCTreeManager keeps as part of its internal state a tree of
+HitTestingTreeNode instances. This is referred to as the HitTestingTree.
+As the name implies, this was used for hit-testing purposes, so that
+APZ could determine which scrollframe a particular incoming input event
+would be targeting. Doing the hit-test requires access to a bunch of state,
+such as CSS transforms and clip rects, as well as ancillary data like
+event regions, which affect how APZ reacts to input events.
+
+With the layers backend, all this information was provided by a layer tree
+update, and so the HitTestingTree was created to mirror the layer tree,
+allowing APZ access to that information from other threads. The structure
+of the tree was identical to the layer tree. But with WebRender, there
+is no "layer tree" per se, and instead we "fake it" by creating a
+HitTestingTree structure that is similar to what it would be like on the
+equivalent layer tree. But the bigger difference is that with WebRender,
+the HitTestingTree is not actually used for hit-testing at all; instead
+we get WebRender to do the hit-test for us, as it can do so using its
+own internal state and produce a more precise result.
+
+Information stored in the HitTestingTree (e.g. CSS transforms) is still
+used by other pieces of APZ (e.g. some of the scrollbar manipulation code)
+so it is still needed, even with the WebRender backend. For this reason,
+and for consistency between the two backends, we try to populate as much
+information in the HitTestingTree that we can, even with the WebRender
+backend.
+
+With the layers backend, the way the HitTestingTree is created is by
+walking the layer tree with a LayerMetricsWrapper class. This wraps
+a layer tree but also expands layers with multiple ScrollMetadata into
+multiple nodes. The equivalent in the WebRender world is the
+WebRenderScrollDataWrapper, which wraps a WebRenderScrollData object. The
+WebRenderScrollData object is roughly analogous to a layer tree, but
+is something that is constructed deliberately rather than being a natural
+output from the WebRender paint transaction (i.e. we create it explicitly
+for APZ's consumption, rather than something that we would create anyway
+for WebRender's consumption).
+
+The WebRenderScrollData structure contains within it a tree of
+WebRenderLayerScrollData instances, which are analogous to individual
+layers in a layer tree. These instances contain various fields like
+CSS transforms, fixed/sticky position info, etc. that would normally be
+found on individual layers in the layer tree. This allows the code
+that builds the HitTestingTree to consume either a WebRenderScrollData
+or a layer tree in a more-or-less unified fashion.
+
+Working backwards a bit more, the WebRenderLayerScrollData instances
+are created as we traverse the Gecko display list and build the
+WebRender display list. In the layers world, the code in FrameLayerBuilder
+was responsible for building the layer tree from the Gecko display list,
+but in the WebRender world, this happens primarily in WebRenderCommandBuilder.
+As of this writing, the architecture for this is that, as we walk
+the Gecko display list, we query it to see if it contains any information
+that APZ might need to know (e.g. CSS transforms) via a call to
+`nsDisplayItem::UpdateScrollData(nullptr, nullptr)`. If this call
+returns true, we create a WebRenderLayerScrollData instance for the item,
+and populate it with the necessary information in
+`WebRenderLayerScrollData::Initialize`. We also create
+WebRenderLayerScrollData instances if we detect (via ASR changes) that
+we are now processing a Gecko display item that is in a different scrollframe
+than the previous item. This is equivalent to how FrameLayerBuilder will
+flatten items with different ASRs into different layers, so that it
+is cheap to scroll scrollframes in the compositor.
+
+The main sources of complexity in this code come from:
+
+1. Ensuring the ScrollMetadata instances end on the proper
+ WebRenderLayerScrollData instances (such that every path from a leaf
+ WebRenderLayerScrollData node to the root has a consistent ordering of
+ scrollframes without duplications).
+2. The deferred-transform optimization that is described in more detail
+ at the declaration of StackingContextHelper::mDeferredTransformItem.
+
+Hit-testing
+^^^^^^^^^^^
+
+Since the HitTestingTree is not used for actual hit-testing purposes
+with the WebRender backend (see previous section), this section describes
+how hit-testing actually works with WebRender.
+
+With both layers and WebRender, the Gecko display list contains display items
+(`nsDisplayCompositorHitTestInfo`) that store hit-testing state. These
+items implement the `CreateWebRenderCommands` method and generate a "hit-test
+item" into the WebRender display list. This is basically just a rectangle
+item in the WebRender display list that is no-op for painting purposes,
+but contains information that should be returned by the hit-test (specifically
+the hit info flags and the scrollId of the enclosing scrollframe). The
+hit-test item gets clipped and transformed in the same way that all the other
+items in the WebRender display list do, via clip chains and enclosing
+reference frame/stacking context items.
+
+When WebRender needs to do a hit-test, it goes through its display list,
+taking into account the current clips and transforms, adjusted for the
+most recent async scroll/zoom, and determines which hit-test item(s) are under
+the target point, and returns those items. APZ can then take the frontmost
+item from that list (or skip over it if it happens to be inside a OOP
+subdocument that's pointer-events:none) and use that as the hit target.
+It's important to note that when APZ does hit-testing for the layers backend,
+it uses the most recent available async transforms, even if those transforms
+have not yet been composited. With WebRender, the hit-test uses the last
+transform provided by the `SampleForWebRender` API (see next section) which
+generally reflects the last composite, and doesn't take into account further
+changes to the transforms that have occurred since then.
+
+When debugging hit-test issues, it is often useful to apply the patches
+on bug 1656260, which introduce a guid on Gecko display items and propagate
+it all the way through to where APZ gets the hit-test result. This allows
+answering the question "which nsDisplayCompositorHitTestInfo was responsible
+for this hit-test result?" which is often a very good first step in
+solving the bug. From there, one can determine if there was some other
+display item in front that should have generated a
+nsDisplayCompositorHitTestInfo but didn't, or if display item itself had
+incorrect information. The second patch on that bug further allows exposing
+hand-written debug info to the APZ code, so that the WR hit-testing
+mechanism itself can be more effectively debugged, in case there is a problem
+with the WR display items getting improperly transformed or clipped.
+
+Sampling
+^^^^^^^^
+
+With both the layers and WebRender backend, the compositing step needs to
+read the latest async transforms from APZ in order to ensure scrollframes
+are rendered at the right position. In both cases, the API for this is
+exposed via the `APZSampler` class. The difference is that with the layers
+backend, the `AsyncCompositionManager` walks the layer tree and queries
+the transform components for each layer individually via the various getters
+on `APZSampler`. In contrast, with the WebRender backend, there is a single
+`APZSampler::SampleForWebRender` API that returns all the information needed
+for all the scrollframes, scrollthumbs, etc. Conceptually though, the
+functionality is pretty similar, because the compositor needs the same
+information from APZ regardless of which backend is in use.
+
+Along with sampling the APZ transforms, the compositor also triggers APZ
+animations to advance to the next timestep (usually the next vsync). Again,
+with both the WebRender and layers backend, this happens just before reading
+the APZ transforms. The only difference is that with the layers backend,
+the `AsyncCompositionManager` invokes the `APZSampler::AdvanceAnimations` API
+directly, whereas with the WebRender backend this happens as part of the
+`APZSampler::SampleForWebRender` implementation.
+
+Threading / Locking Overview
+----------------------------
+
+Threads
+~~~~~~~
+
+There are three threads relevant to APZ: the **controller thread**,
+the **updater thread**, and the **sampler thread**. This table lists
+which threads play these roles on each platform / configuration:
+
+===================== ========== =========== ============= ============== ========== =============
+APZ Thread Name Desktop Desktop+GPU Desktop+WR Desktop+WR+GPU Android Android+WR
+===================== ========== =========== ============= ============== ========== =============
+**controller thread** UI main GPU main UI main GPU main Java UI Java UI
+**updater thread** Compositor Compositor SceneBuilder SceneBuilder Compositor SceneBuilder
+**sampler thread** Compositor Compositor RenderBackend RenderBackend Compositor RenderBackend
+===================== ========== =========== ============= ============== ========== =============
+
+Locks
+~~~~~
+
+There are also a number of locks used in APZ code:
+
+======================= ==============================
+Lock type How many instances
+======================= ==============================
+APZ tree lock one per APZCTreeManager
+APZC map lock one per APZCTreeManager
+APZC instance lock one per AsyncPanZoomController
+APZ test lock one per APZCTreeManager
+Checkerboard event lock one per AsyncPanZoomController
+======================= ==============================
+
+Thread / Lock Ordering
+~~~~~~~~~~~~~~~~~~~~~~
+
+To avoid deadlocks, the threads and locks have a global **ordering**
+which must be respected.
+
+Respecting the ordering means the following:
+
+- Let "A < B" denote that A occurs earlier than B in the ordering
+- Thread T may only acquire lock L, if T < L
+- A thread may only acquire lock L2 while holding lock L1, if L1 < L2
+- A thread may only block on a response from another thread T while holding a lock L, if L < T
+
+**The lock ordering is as follows**:
+
+1. UI main
+2. GPU main (only if GPU enabled)
+3. Compositor thread
+4. SceneBuilder thread (only if WR enabled)
+5. **APZ tree lock**
+6. RenderBackend thread (only if WR enabled)
+7. **APZC map lock**
+8. **APZC instance lock**
+9. **APZ test lock**
+10. **Checkerboard event lock**
+
+Example workflows
+^^^^^^^^^^^^^^^^^
+
+Here are some example APZ workflows. Observe how they all obey
+the global thread/lock ordering. Feel free to add others:
+
+- **Input handling** (in WR+GPU) case: UI main -> GPU main -> APZ tree lock -> RenderBackend thread
+- **Sync messages** in ``PCompositorBridge.ipdl``: UI main thread -> Compositor thread
+- **GetAPZTestData**: Compositor thread -> SceneBuilder thread -> test lock
+- **Scene swap**: SceneBuilder thread -> APZ tree lock -> RenderBackend thread
+- **Updating hit-testing tree**: SceneBuilder thread -> APZ tree lock -> APZC instance lock
+- **Updating APZC map**: SceneBuilder thread -> APZ tree lock -> APZC map lock
+- **Sampling and animation deferred tasks** [1]_: RenderBackend thread -> APZC map lock -> APZC instance lock
+- **Advancing animations**: RenderBackend thread -> APZC instance lock
+
+.. [1] It looks like there are two deferred tasks that actually need the tree lock,
+ ``AsyncPanZoomController::HandleSmoothScrollOverscroll`` and
+ ``AsyncPanZoomController::HandleFlingOverscroll``. We should be able to rewrite
+ these to use the map lock instead of the tree lock.
+ This will allow us to continue running the deferred tasks on the sampler
+ thread rather than having to bounce them to another thread.