Diffstat (limited to 'gfx/docs')
-rw-r--r-- | gfx/docs/AsyncPanZoom.rst             | 929
-rw-r--r-- | gfx/docs/AsyncPanZoomArchitecture.png | bin 0 -> 67837 bytes
-rw-r--r-- | gfx/docs/GraphicsOverview.rst         | 149
-rw-r--r-- | gfx/docs/LayersHistory.rst            | 63
-rw-r--r-- | gfx/docs/Moz2D.rst                    | 16
-rw-r--r-- | gfx/docs/RenderingOverview.rst        | 384
-rw-r--r-- | gfx/docs/RenderingOverviewBlurTask.png | bin 0 -> 16264 bytes
-rw-r--r-- | gfx/docs/RenderingOverviewDetail.png  | bin 0 -> 148839 bytes
-rw-r--r-- | gfx/docs/RenderingOverviewSimple.png  | bin 0 -> 54981 bytes
-rw-r--r-- | gfx/docs/RenderingOverviewTrees.png   | bin 0 -> 80062 bytes
-rw-r--r-- | gfx/docs/Silk.rst                     | 472
-rw-r--r-- | gfx/docs/SilkArchitecture.png         | bin 0 -> 221047 bytes
-rw-r--r-- | gfx/docs/index.rst                    | 18
13 files changed, 2031 insertions, 0 deletions
diff --git a/gfx/docs/AsyncPanZoom.rst b/gfx/docs/AsyncPanZoom.rst
new file mode 100644
index 0000000000..39bdc48c7d
--- /dev/null
+++ b/gfx/docs/AsyncPanZoom.rst
@@ -0,0 +1,929 @@

.. _apz:

Asynchronous Panning and Zooming
================================

**This document is a work in progress. Some information may be missing
or incomplete.**

.. image:: AsyncPanZoomArchitecture.png

Goals
-----

We need to be able to provide a visual response to user input with
minimal latency. In particular, on devices with touch input, content
must track the finger exactly while panning, or the user experience is
very poor. According to the UX team, 120ms is an acceptable latency
between user input and response.

Context and surrounding architecture
------------------------------------

The fundamental problem we are trying to solve with the Asynchronous
Panning and Zooming (APZ) code is that of responsiveness. By default,
web browsers operate in a “game loop” that looks like this:

::

   while true:
       process input
       do computations
       repaint content
       display repainted content

In browsers the “do computation” step can be arbitrarily expensive
because it can involve running event handlers in web content. Therefore,
there can be an arbitrary delay between the input being received and the
on-screen display getting updated.

Responsiveness is always good, and with touch-based interaction it is
even more important than with mouse or keyboard input. In order to
ensure responsiveness, we split the “game loop” model of the browser
into a multithreaded variant which looks something like this:

::

   Thread 1 (compositor thread)
       while true:
           receive input
           send a copy of input to thread 2
           adjust rendered content based on input
           display adjusted rendered content

   Thread 2 (main thread)
       while true:
           receive input from thread 1
           do computations
           rerender content
           update the copy of rendered content in thread 1

This multithreaded model is called off-main-thread compositing (OMTC),
because the compositing (where the content is displayed on-screen)
happens on a separate thread from the main thread. Note that this is a
very very simplified model, but in this model the “adjust rendered
content based on input” is the primary function of the APZ code.

A couple of notes on APZ's relationship to other browser architecture
improvements:

1. Due to Electrolysis (e10s), Site Isolation (Fission), and GPU Process
   isolation, the above two threads often actually run in different
   processes. APZ is largely agnostic to this, as all communication
   between the two threads for APZ purposes happens using an IPC layer
   that abstracts over communication between threads vs. processes.
2. With the WebRender graphics backend, part of the rendering pipeline is
   also offloaded from the main thread. In this architecture, the
   information sent from the main thread consists of a display list, and
   scrolling-related metadata referencing content in that display list.
   The metadata is kept in a queue until the display list undergoes an
   additional rendering step in the compositor (scene building). At this
   point, we are ready to tell APZ about the new content and have it
   start applying adjustments to it, as further rendering steps beyond
   scene building are done synchronously on each composite.
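To make the “adjust rendered content based on input” step concrete, here
is a minimal stand-alone sketch of the bookkeeping involved. All names
here are hypothetical; the real APZ code works with full transformation
matrices and several coordinate spaces, not scalar offsets:

::

   #include <cstdio>

   // Hypothetical sketch: the compositor remembers the scroll offset at
   // which the main thread last painted, plus the up-to-date offset
   // driven by user input. The difference becomes a translation applied
   // to the cached content on every composite.
   struct ScrollOffset {
     float x = 0;
     float y = 0;
   };

   struct AsyncTransform {
     float translateX;
     float translateY;
     float scale;
   };

   AsyncTransform ComputeAsyncTransform(const ScrollOffset& aPainted,
                                        const ScrollOffset& aCurrent,
                                        float aAsyncZoom) {
     // Content painted at aPainted must be shifted so that aCurrent
     // appears at the top-left of the screen.
     return AsyncTransform{aPainted.x - aCurrent.x,
                           aPainted.y - aCurrent.y,
                           aAsyncZoom};
   }

   int main() {
     ScrollOffset painted{0, 200};  // main thread painted at y=200
     ScrollOffset current{0, 210};  // user has since scrolled 10px more
     AsyncTransform t = ComputeAsyncTransform(painted, current, 1.0f);
     std::printf("translate (%.0f, %.0f), scale %.1f\n",
                 t.translateX, t.translateY, t.scale);  // (0, -10), 1.0
     return 0;
   }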
+ +The compositor in theory can continuously composite previously rendered +content (adjusted on each composite by APZ) to the screen while the +main thread is busy doing other things and rendering new content. + +The APZ code takes the input events that are coming in from the hardware +and uses them to figure out what the user is trying to do (e.g. pan the +page, zoom in). It then expresses this user intention in the form of +translation and/or scale transformation matrices. These transformation +matrices are applied to the rendered content at composite time, so that +what the user sees on-screen reflects what they are trying to do as +closely as possible. + +Technical overview +------------------ + +As per the heavily simplified model described above, the fundamental +purpose of the APZ code is to take input events and produce +transformation matrices. This section attempts to break that down and +identify the different problems that make this task non-trivial. + +Checkerboarding +~~~~~~~~~~~~~~~ + +The area of page content for which a display list is built and sent to +the compositor is called the “displayport”. The APZ code is responsible +for determining how large the displayport should be. On the one hand, we +want the displayport to be as large as possible. At the very least it +needs to be larger than what is visible on-screen, because otherwise, as +soon as the user pans, there will be some unpainted area of the page +exposed. However, we cannot always set the displayport to be the entire +page, because the page can be arbitrarily long and this would require an +unbounded amount of memory to store. Therefore, a good displayport size +is one that is larger than the visible area but not so large that it is a +huge drain on memory. Because the displayport is usually smaller than the +whole page, it is always possible for the user to scroll so fast that +they end up in an area of the page outside the displayport. When this +happens, they see unpainted content; this is referred to as +“checkerboarding”, and we try to avoid it where possible. + +There are many possible ways to determine what the displayport should be +in order to balance the tradeoffs involved (i.e. having one that is too +big is bad for memory usage, and having one that is too small results in +excessive checkerboarding). Ideally, the displayport should cover +exactly the area that we know the user will make visible. Although we +cannot know this for sure, we can use heuristics based on current +panning velocity and direction to ensure a reasonably-chosen displayport +area. This calculation is done in the APZ code, and a new desired +displayport is frequently sent to the main thread as the user is panning +around. + +Multiple scrollable elements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Consider, for example, a scrollable page that contains an iframe which +itself is scrollable. The iframe can be scrolled independently of the +top-level page, and we would like both the page and the iframe to scroll +responsively. This means that we want independent asynchronous panning +for both the top-level page and the iframe. In addition to iframes, +elements that have the overflow:scroll CSS property set are also +scrollable. In the display list, scrollable elements are arranged in a +tree structure, and in the APZ code we have a matching tree of +AsyncPanZoomController (APZC) objects, one for each scrollable element. +To manage this tree of APZC instances, we have a single APZCTreeManager +object. 
Each APZC is relatively independent and handles the scrolling for +its associated scrollable element, but there are some cases in which they +need to interact; these cases are described in the sections below. + +Hit detection +~~~~~~~~~~~~~ + +Consider again the case where we have a scrollable page that contains an +iframe which itself is scrollable. As described above, we will have two +APZC instances - one for the page and one for the iframe. When the user +puts their finger down on the screen and moves it, we need to do some +sort of hit detection in order to determine whether their finger is on +the iframe or on the top-level page. Based on where their finger lands, +the appropriate APZC instance needs to handle the input. + +This hit detection is done by APZCTreeManager in collaboration with +WebRender, which has more detailed information about the structure of +the page content than is stored in APZ directly. See +:ref:`this section <wr-hit-test-details>` for more details. + +Also note that for some types of input (e.g. when the user puts two +fingers down to do a pinch) we do not want the input to be “split” +across two different APZC instances. In the case of a pinch, for +example, we find a “common ancestor” APZC instance - one that is +zoomable and contains all of the touch input points, and direct the +input to that APZC instance. + +Scroll Handoff +~~~~~~~~~~~~~~ + +Consider yet again the case where we have a scrollable page that +contains an iframe which itself is scrollable. Say the user scrolls the +iframe so that it reaches the bottom. If the user continues panning on +the iframe, the expectation is that the top-level page will start +scrolling. However, as discussed in the section on hit detection, the +APZC instance for the iframe is separate from the APZC instance for the +top-level page. Thus, we need the two APZC instances to communicate in +some way such that input events on the iframe result in scrolling on the +top-level page. This behaviour is referred to as “scroll handoff” (or +“fling handoff” in the case where analogous behaviour results from the +scrolling momentum of the page after the user has lifted their finger). + +.. _input-event-untransformation: + +Input event untransformation +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The APZC architecture by definition results in two copies of a “scroll +position” for each scrollable element. There is the original copy on the +main thread that is accessible to web content and the layout and +painting code. And there is a second copy on the compositor side, which +is updated asynchronously based on user input, and corresponds to what +the user visually sees on the screen. Although these two copies may +diverge temporarily, they are reconciled periodically. In particular, +they diverge while the APZ code is performing an async pan or zoom +action on behalf of the user, and are reconciled when the APZ code +requests a repaint from the main thread. + +Because of the way input events are represented, this has some +unfortunate consequences. Input event coordinates are represented +relative to the device screen - so if the user touches at the same +physical spot on the device, the same input events will be delivered +regardless of the content scroll position. When the main thread receives +a touch event, it combines that with the content scroll position in order +to figure out what DOM element the user touched. However, because we now +have two different scroll positions, this process may not work perfectly. 
A concrete example follows:

Consider a device with a screen 600 pixels tall. On this device, a
user is viewing a document that is 1000 pixels tall, and that is
scrolled down by 200 pixels. That is, the vertical section of the
document from 200px to 800px is visible. Now, if the user touches a
point 100px from the top of the physical display, the hardware will
generate a touch event with y=100. This will get sent to the main
thread, which will add the scroll position (200) and get a
document-relative touch event with y=300. This new y-value will be used
in hit detection to figure out what the user touched. If the document
had an absolutely-positioned div at y=300, then that would receive the
touch event.

Now let us add some async scrolling to this example. Say that the user
additionally scrolls the document by another 10 pixels asynchronously
(i.e. only on the compositor thread), and then does the same touch
event. The same input event is generated by the hardware, and as before,
the document will deliver the touch event to the div at y=300. However,
visually, the document is scrolled by an additional 10 pixels so this
outcome is wrong. What needs to happen is that the APZ code needs to
intercept the touch event and account for the 10 pixels of asynchronous
scroll. Therefore, the input event with y=100 gets converted to y=110 in
the APZ code before being passed on to the main thread. The main thread
then adds the scroll position it knows about and determines that the
user touched at a document-relative position of y=310.

Analogous input event transformations need to be done for horizontal
scrolling and zooming.
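The untransformation in this example boils down to simple arithmetic.
The sketch below (hypothetical names; the real code applies
transformation matrices rather than scalar offsets) reproduces the
numbers above:

::

   #include <cassert>

   // Sketch of the y-coordinate flow in the example above. APZ
   // "untransforms" the screen-relative event by the async scroll amount
   // that only the compositor knows about; the main thread then applies
   // the scroll position it knows about.
   float ApzUntransform(float aScreenY, float aAsyncScrollY) {
     return aScreenY + aAsyncScrollY;
   }

   float MainThreadToDocument(float aEventY, float aMainThreadScrollY) {
     return aEventY + aMainThreadScrollY;
   }

   int main() {
     const float screenY = 100;           // hardware touch position
     const float asyncScrollY = 10;       // compositor-only scroll
     const float mainThreadScrollY = 200; // scroll position content knows about

     float untransformed = ApzUntransform(screenY, asyncScrollY);    // 110
     float documentY = MainThreadToDocument(untransformed,
                                            mainThreadScrollY);      // 310
     assert(untransformed == 110);
     assert(documentY == 310);
     return 0;
   }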
Content independently adjusting scrolling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As described above, there are two copies of the scroll position in the
APZ architecture - one on the main thread and one on the compositor
thread. Usually for architectures like this, there is a single “source
of truth” value and the other value is simply a copy. However, in this
case that is not easily possible to do. The reason is that both of these
values can be legitimately modified. On the compositor side, the input
events the user is triggering modify the scroll position, which is then
propagated to the main thread. However, on the main thread, web content
might be running Javascript code that programmatically sets the scroll
position (via window.scrollTo, for example). Scroll changes driven from
the main thread are just as legitimate and need to be propagated to the
compositor thread, so that the visual display updates in response.

Because the cross-thread messaging is asynchronous, reconciling the two
types of scroll changes is a tricky problem. Our design solves this
using various flags and generation counters. The general heuristic we
have is that content-driven scroll position changes (e.g. scrollTo from
JS) are never lost. For instance, if the user is doing an async scroll
with their finger and content does a scrollTo in the middle, then some
of the async scroll would occur before the “jump” and the rest after the
“jump”.

Content preventing default behaviour of input events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Another problem that we need to deal with is that web content is allowed
to intercept touch events and prevent the “default behaviour” of
scrolling. This ability is defined in web standards and is
non-negotiable. Touch event listeners in web content are allowed to call
preventDefault() on the touchstart or first touchmove event for a touch
point; doing this is supposed to “consume” the event and prevent
touch-based panning. As we saw in a previous section, the input event
needs to be untransformed by the APZ code before it can be delivered to
content. But, because of the preventDefault problem, we cannot fully
process the touch event in the APZ code until content has had a chance
to handle it.

To balance the needs of correctness (which calls for allowing web content
to successfully prevent default handling of events if it wishes to) and
responsiveness (which calls for avoiding blocking on web content
Javascript for a potentially-unbounded amount of time before reacting to
an event), APZ gives web content a "deadline" to process the event and
tell APZ whether preventDefault() was called on the event. The deadline
is 400ms from the time APZ receives the event on desktop, and 600ms on
mobile. If web content is able to process the event before this deadline,
the decision to preventDefault() the event or not will be respected. If
web content fails to process the event before the deadline, APZ assumes
preventDefault() will not be called and goes ahead and processes the
event. (A sketch of this logic appears at the end of this section.)

To implement this, upon receiving a touch event, APZ immediately returns
an untransformed version that can be dispatched to content. It also
schedules the 400ms or 600ms timeout. There is an API that allows the
main-thread event dispatching code to notify APZ as to whether or not the
default action should be prevented. If the APZ content response timeout
expires, or if the main-thread event dispatching code notifies the APZ of
the preventDefault status, then the APZ continues with the processing of
the events (which may involve discarding the events).

To limit the responsiveness impact of this round-trip to content, APZ
tries to identify cases where it can rule out preventDefault() as a
possible outcome. To this end, the hit-testing information sent to the
compositor includes information about which regions of the page are
occupied by elements that have a touch event listener. If an event
targets an area outside of these regions, preventDefault() can be ruled
out, and the round-trip skipped.

Additionally, recent enhancements to web standards have given page
authors new tools that can further limit the responsiveness impact of
preventDefault():

1. Event listeners can be registered as "passive", which means they
   are not allowed to call preventDefault(). Authors can use this flag
   when writing listeners that only need to observe the events, not alter
   their behaviour via preventDefault(). The presence of passive event
   listeners does not cause APZ to perform the content round-trip.
2. If page authors wish to disable certain types of touch interactions
   completely, they can use the ``touch-action`` CSS property from the
   pointer-events spec to do so declaratively, instead of registering
   event listeners that call preventDefault(). Touch-action flags are
   also included in the hit-test information sent to the compositor, and
   APZ uses this information to respect ``touch-action``. (Note that the
   touch-action information sent to the compositor is not always 100%
   accurate, and sometimes APZ needs to fall back on asking the main
   thread for touch-action information, which again involves a
   round-trip.)
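The following is a minimal sketch of the content-response deadline logic
referred to above, with hypothetical names (the real implementation
lives in APZ's input queue and is integrated with input blocks and the
controller thread):

::

   #include <chrono>
   #include <optional>
   #include <queue>

   // Hypothetical sketch: events are queued until either content reports
   // its preventDefault() decision or the deadline passes, whichever
   // comes first.
   using Clock = std::chrono::steady_clock;

   struct PendingTouchBlock {
     std::queue<int> events;                // stand-in for real touch events
     Clock::time_point deadline;            // now + 400ms (600ms on mobile)
     std::optional<bool> contentCancelled;  // set when content responds
                                            // (ContentReceivedInputBlock)
   };

   bool ShouldProcessNow(const PendingTouchBlock& aBlock,
                         Clock::time_point aNow) {
     // Process once content has answered, or once the deadline expired.
     return aBlock.contentCancelled.has_value() || aNow >= aBlock.deadline;
   }

   bool ShouldDropEvents(const PendingTouchBlock& aBlock) {
     // Drop the block only if content answered in time and called
     // preventDefault(); a missed deadline counts as "not cancelled".
     return aBlock.contentCancelled.value_or(false);
   }

   int main() {
     PendingTouchBlock block;
     block.deadline = Clock::now() + std::chrono::milliseconds(400);
     block.contentCancelled = false;  // content answered: not cancelled
     return ShouldProcessNow(block, Clock::now()) && !ShouldDropEvents(block)
                ? 0 : 1;
   }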
Other event types
~~~~~~~~~~~~~~~~~

The above sections talk mostly about touch events, but over time APZ has
been extended to handle a variety of other event types, such as trackpad
and mousewheel scrolling, scrollbar thumb dragging, and keyboard
scrolling in some cases. Much of the above applies to these other event
types too (for example, wheel events can be prevent-defaulted as well).

Importantly, the "untransformation" described above needs to happen even
for event types which are not handled in APZ, such as mouse click events,
since async scrolling can still affect the correct targeting of such
events.


Technical details
-----------------

This section describes various pieces of the APZ code, and goes into
more specific detail on APIs and code than the previous sections. The
primary purpose of this section is to help people who plan on making
changes to the code, while also not going into so much detail that it
needs to be updated with every patch.

Overall flow of input events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section describes how input events flow through the APZ code.

Disclaimer: some details in this section are out of date (for example,
it assumes the case where the main thread and compositor thread are
in the same process, which is rarely the case these days, so in practice
e.g. steps 6 and 8 involve IPC, not just "stack unwinding").

1. Input events arrive from the hardware/widget code into the APZ via
   APZCTreeManager::ReceiveInputEvent. The thread that invokes this is
   called the "controller thread", and may or may not be the same as the
   Gecko main thread.
2. Conceptually the first thing that the APZCTreeManager does is to
   associate these events with “input blocks”. An input block is a set
   of events that share certain properties, and generally are intended
   to represent a single gesture. For example with touch events, all
   events following a touchstart up to but not including the next
   touchstart are in the same block. All of the events in a given block
   will go to the same APZC instance and will either all be processed
   or all be dropped.
3. Using the first event in the input block, the APZCTreeManager does a
   hit-test to see which APZC it hits. If no APZC is hit, the events are
   discarded and we jump to step 6. Otherwise, the input block is tagged
   with the hit APZC as a tentative target and put into a global APZ
   input queue. In addition to the target APZC, the result of the hit
   test also includes whether the input event landed on a
   "dispatch-to-content" region. These are regions of the page where
   there is something going on that requires dispatching the event to
   content and waiting for a response _before_ processing the event in
   APZ; an example of this is a region containing an element with a
   non-passive event listener, as described above. (TODO: Add a section
   that talks about the other uses of the dispatch-to-content mechanism.)
4.

   i. If the input events landed outside a dispatch-to-content region,
      any available events in the input block are processed. These may
      trigger behaviours like scrolling or tap gestures.
   ii. If the input events landed inside a dispatch-to-content region,
       the events are left in the queue and a timeout is initiated. If
       the timeout expires before step 9 is completed, the APZ assumes
       the input block was not cancelled and the tentative target is
       correct, and processes them as part of step 10.
5. The call stack unwinds back to APZCTreeManager::ReceiveInputEvent,
   which does an in-place modification of the input event so that any
   async transforms are removed.
6. The call stack unwinds back to the widget code that called
   ReceiveInputEvent. This code now has the event in the coordinate
   space Gecko is expecting, and so can dispatch it to the Gecko main
   thread.
7. Gecko performs its own usual hit-testing and event dispatching for
   the event. As part of this, it records whether any touch listeners
   cancelled the input block by calling preventDefault(). It also
   activates inactive scrollframes that were hit by the input events.
8. The call stack unwinds back to the widget code, which sends two
   notifications to the APZ code on the controller thread. The first
   notification is via APZCTreeManager::ContentReceivedInputBlock, and
   informs the APZ whether the input block was cancelled. The second
   notification is via APZCTreeManager::SetTargetAPZC, and informs the
   APZ of the results of the Gecko hit-test during event dispatch. Note
   that Gecko may report that the input event did not hit any
   scrollable frame at all. The SetTargetAPZC notification happens only
   once per input block, while the ContentReceivedInputBlock
   notification may happen once per block, or multiple times per block,
   depending on the input type.
9.

   i. If the events were processed as part of step 4(i), the
      notifications from step 8 are ignored and step 10 is skipped.
   ii. If events were queued as part of step 4(ii), and steps 5-8
       complete before the timeout, the arrival of both notifications
       from step 8 will mark the input block ready for processing.
   iii. If events were queued as part of step 4(ii), but steps 5-8 take
        longer than the timeout, the notifications from step 8 will be
        ignored and step 10 will already have happened.

10. If events were queued as part of step 4(ii) they are now either
    processed (if the input block was not cancelled and Gecko detected a
    scrollframe under the input event, or if the timeout expired) or
    dropped (all other cases). Note that the APZC that processes the
    events may be different at this step than the tentative target from
    step 3, depending on the SetTargetAPZC notification. Processing the
    events may trigger behaviours like scrolling or tap gestures.

If the CSS touch-action property is enabled, the above steps are
modified as follows:

* In step 4, the APZC also requires the allowed touch-action behaviours
  for the input event. This might have been determined as part of the
  hit-test in APZCTreeManager; if not, the events are queued.
* In step 6, the widget code determines the content element at the point
  under the input element, and notifies the APZ code of the allowed
  touch-action behaviours. This notification is sent via a call to
  APZCTreeManager::SetAllowedTouchBehavior on the input thread.
* In step 9(ii), the input block will only be marked ready for processing
  once all three notifications arrive.

Threading considerations
^^^^^^^^^^^^^^^^^^^^^^^^

The bulk of the input processing in the APZ code happens on what we call
“the controller thread”. In practice the controller thread could be the
Gecko main thread, the compositor thread, or some other thread. There are
obvious downsides to using the Gecko main thread - that is, “asynchronous”
panning and zooming is not really asynchronous as input events can only
be processed while Gecko is idle.
In an e10s environment, using the Gecko
main thread of the chrome process is acceptable, because the code running
in that process is more controllable and well-behaved than arbitrary web
content. Using the compositor thread as the controller thread could work
on some platforms, but may be inefficient on others. For example, on
Android (Fennec) we receive input events from the system on a dedicated
UI thread. We would have to redispatch the input events to the compositor
thread if we wanted the input thread to be the same as the compositor
thread. This introduces a potential for higher latency, particularly if
the compositor does any blocking operations - blocking SwapBuffers
operations, for example. As a result, the APZ code itself does not assume
that the controller thread will be the same as the Gecko main thread or
the compositor thread.

Active vs. inactive scrollframes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The number of scrollframes on a page is potentially unbounded. However,
we do not want to create a separate displayport for each scrollframe
right away, as this would require large amounts of memory. Therefore,
scrollframes are designated as either “active” or “inactive”. Active
scrollframes get a displayport, and an APZC on the compositor side.
Inactive scrollframes do not get a displayport (a display list is only
built for their viewport, i.e. what is currently visible) and do not get
an APZC.

Consider a page with a scrollframe that is initially inactive. This
scroll frame does not get an APZC, and therefore events targeting it will
target the APZC for the nearest active scrollable ancestor (let's call it
P; note, the rootmost scroll frame in a given process is always active).
However, the presence of the inactive scroll frame is reflected by a
dispatch-to-content region that prevents events over the frame from
erroneously scrolling P.

When the user starts interacting with that content, the hit-test in the
APZ code hits the dispatch-to-content region of P. The input block
therefore has a tentative target of P when it goes into step 4(ii) in the
flow above. When Gecko processes the input event, it must detect the
inactive scrollframe and activate it, as part of step 7. Finally, the
widget code sends the SetTargetAPZC notification in step 8 to notify the
APZ that the input block should really apply to this new APZC. An issue
here is that the transaction containing metadata for the newly active
scroll frame must reach the compositor and APZ before the SetTargetAPZC
notification. If this does not occur within the 400ms timeout, the APZ
code will be unable to update the tentative target, and will continue to
use P for that input block. Input blocks that start after the transaction
will get correctly routed to the new scroll frame as there will now be an
APZC instance for the active scrollframe.

This model implies that when the user initially attempts to scroll an
inactive scrollframe, it may end up scrolling an ancestor scrollframe.
Only after the round-trip to the Gecko thread is complete is there an
APZC for async scrolling to actually occur on the scrollframe itself. At
that point the scrollframe will start receiving new input blocks and will
scroll normally.
+ +Note: with Fission (where inactive scroll frames would make it impossible +to target the correct process in all situations; see +:ref:`this section <fission-hit-testing>` for more details) and WebRender +(which makes displayports more lightweight as the actual rendering is +offloaded to the compositor and can be done on demand), inactive scroll +frames are being phased out, and we are moving towards a model where all +scroll frames with nonempty scroll ranges are active and get a +displayport and an APZC. To conserve memory, displayports for scroll +frames which have not been recently scrolled are kept to a "minimal" size +equal to the viewport size. + +WebRender Integration +~~~~~~~~~~~~~~~~~~~~~ + +This section describes how APZ interacts with the WebRender graphics +backend. + +Note that APZ predates WebRender, having initially been written to work +with the earlier Layers graphics backend. The design of Layers has +influenced APZ significantly, and this still shows in some places in the +code. Now that the Layers backend has been removed, there may be +opportunities to streamline the interaction between APZ and WebRender. + + +HitTestingTree +^^^^^^^^^^^^^^ + +The APZCTreeManager keeps as part of its internal state a tree of +HitTestingTreeNode instances. This is referred to as the HitTestingTree. + +The main purpose of the HitTestingTree is to model the spatial +relationships between content that's affected by async scrolling. Tree +nodes fall roughly into the following categories: + +* Nodes representing scrollable content in an active scroll frame. These + nodes are associated with the scroll frame's APZC. +* Nodes representing other content that may move in special ways in + response to async scrolling, such as fixed content, sticky content, and + scrollbars. +* (Non-leaf) nodes which do not represent any content, just metadata + (e.g. a transform) that applies to its descendant nodes. + +An APZC may be associated with multiple nodes, if e.g. a scroll frame +scrolls two pieces of content that are interleaved with non-scrolling +content. + +Arranging these nodes in a tree allows modelling relationships such as +what content is scrolled by a given scroll frame, what the scroll handoff +relationships are between APZCs, and what content is subject to what +transforms. + +An additional use of the HitTestingTree is to allow APZ to keep content +processes up to date about enclosing transforms that they are subject to. +See :ref:`this section <sending-transforms-to-content-processes>` for +more details. + +(In the past, with the Layers backend, the HitTestingTree was also used +for compositor hit testing, hence the name. This is no longer the case, +and there may be opportunities to simplify the tree as a result.) + +The HitTestingTree is created from another tree data structure called +WebRenderScrollData. The relevant types here are: + +* WebRenderScrollData which stores the entire tree. +* WebRenderLayerScrollData, which represents a single "layer" of content, + i.e. a group of display items that move together when scrolling (or + metadata applying to a subtree of such layers). In the Layers backend, + such content would be rendered into a single texture which could then + be moved asynchronously at composite time. Since a layer of content can + be scrolled by multiple (nested) scroll frames, a + WebRenderLayerScrollData may contain scroll metadata for more than one + scroll frame. 
* WebRenderScrollDataWrapper, which wraps WebRenderLayerScrollData
  but "expanded" in a way that each node only stores metadata for
  a single scroll frame. WebRenderScrollDataWrapper nodes have a
  1:1 correspondence with HitTestingTreeNodes.

It's not clear whether the distinction between WebRenderLayerScrollData
and WebRenderScrollDataWrapper is still useful in a WebRender-only world.
The code could potentially be revised such that we directly build and
store nodes of a single type with the behaviour of
WebRenderScrollDataWrapper.

The WebRenderScrollData structure is built on the main thread, and
then shipped over IPC to the compositor where it's used to construct
the HitTestingTree.

WebRenderScrollData is built in WebRenderCommandBuilder, during the
same traversal of the Gecko display list that is used to build the
WebRender display list. As of this writing, the architecture for this is
that, as we walk the Gecko display list, we query it to see if it
contains any information that APZ might need to know (e.g. CSS
transforms) via a call to ``nsDisplayItem::UpdateScrollData(nullptr,
nullptr)``. If this call returns true, we create a
WebRenderLayerScrollData instance for the item, and populate it with the
necessary information in ``WebRenderLayerScrollData::Initialize``. We also
create WebRenderLayerScrollData instances if we detect (via ASR changes)
that we are now processing a Gecko display item that is in a different
scrollframe than the previous item.

The main sources of complexity in this code come from:

1. Ensuring the ScrollMetadata instances end on the proper
   WebRenderLayerScrollData instances (such that every path from a leaf
   WebRenderLayerScrollData node to the root has a consistent ordering of
   scrollframes without duplications).
2. The deferred-transform optimization that is described in more detail
   at the declaration of ``StackingContextHelper::mDeferredTransformItem``.

.. _wr-hit-test-details:

Hit-testing
^^^^^^^^^^^

Since the HitTestingTree is not used for actual hit-testing purposes
with the WebRender backend (see previous section), this section describes
how hit-testing actually works with WebRender.

The Gecko display list contains display items
(``nsDisplayCompositorHitTestInfo``) that store hit-testing state. These
items implement the ``CreateWebRenderCommands`` method and generate a
"hit-test item" into the WebRender display list. This is basically just a
rectangle item in the WebRender display list that is a no-op for painting
purposes, but contains information that should be returned by the hit-test
(specifically the hit info flags and the scrollId of the enclosing
scrollframe). The hit-test item gets clipped and transformed in the same
way that all the other items in the WebRender display list do, via clip
chains and enclosing reference frame/stacking context items.

When WebRender needs to do a hit-test, it goes through its display list,
taking into account the current clips and transforms, adjusted for the
most recent async scroll/zoom, and determines which hit-test item(s) are
under the target point, and returns those items. APZ can then take the
frontmost item from that list (or skip over it if it happens to be inside
an OOP subdocument that's ``pointer-events:none``) and use that as the
hit target.
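As a rough illustration of that last step, here is a sketch of APZ's
side of the hit-test. The types and names are hypothetical
simplifications of the real interfaces:

::

   #include <cstdint>
   #include <vector>

   // Hypothetical, simplified hit-test result item: each carries the
   // scrollId of the enclosing scrollframe plus flags describing the
   // content that was hit.
   struct HitTestItem {
     uint64_t scrollId;
     bool insideInertOopSubdoc;  // e.g. OOP subdoc with pointer-events:none
   };

   // WebRender returns the items front-to-back; APZ picks the first one
   // that is actually eligible to be a hit target.
   const HitTestItem* PickHitTarget(const std::vector<HitTestItem>& aItems) {
     for (const HitTestItem& item : aItems) {
       if (!item.insideInertOopSubdoc) {
         return &item;
       }
     }
     return nullptr;  // no APZC is hit; the events will be discarded
   }

   int main() {
     std::vector<HitTestItem> items = {{7, true}, {3, false}};
     const HitTestItem* target = PickHitTarget(items);
     return target && target->scrollId == 3 ? 0 : 1;
   }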
Note that the hit-test uses the last transform provided by the
``SampleForWebRender`` API (see next section) which generally reflects the
last composite, and doesn't take into account further changes to the
transforms that have occurred since then. In practice, we should be
compositing frequently enough that this doesn't matter much.

When debugging hit-test issues, it is often useful to apply the patches
on bug 1656260, which introduce a guid on Gecko display items and propagate
it all the way through to where APZ gets the hit-test result. This allows
answering the question "which nsDisplayCompositorHitTestInfo was responsible
for this hit-test result?" which is often a very good first step in
solving the bug. From there, one can determine if there was some other
display item in front that should have generated a
nsDisplayCompositorHitTestInfo but didn't, or if the display item itself
had incorrect information. The second patch on that bug further allows
exposing hand-written debug info to the APZ code, so that the WR
hit-testing mechanism itself can be more effectively debugged, in case
there is a problem with the WR display items getting improperly
transformed or clipped.

The information returned by WebRender to APZ in response to the hit test
is enough for APZ to identify a HitTestingTreeNode as the target of the
event. APZ can then take actions such as scrolling the target node's
associated APZC, or other appropriate actions (e.g. initiating a scrollbar
drag if a scrollbar thumb node was targeted by a mouse-down event).

Sampling
^^^^^^^^

The compositing step needs to read the latest async transforms from APZ
in order to ensure scrollframes are rendered at the right position. The
API for this is exposed via the ``APZSampler`` class. When WebRender is
ready to do a composite, it invokes ``APZSampler::SampleForWebRender``.
In here, APZ gathers all async transforms that WebRender needs to know
about, including transforms to apply to scrolled content, fixed and
sticky content, and scrollbar thumbs.

Along with sampling the APZ transforms, the compositor also triggers APZ
animations to advance to the next timestep (usually the next vsync). This
happens just before reading the APZ transforms.

Fission Integration
~~~~~~~~~~~~~~~~~~~

This section describes how APZ interacts with the Fission (Site Isolation)
project.

Introduction
^^^^^^^^^^^^

Fission is an architectural change motivated by security considerations,
where web content from each origin is isolated in its own process. Since
a page can contain a mixture of content from different origins (for
example, the top level page can be content from origin A, and it can
contain an iframe with content from origin B), that means that rendering
and interacting with a page can now involve coordination between APZ and
multiple content processes.

.. _fission-hit-testing:

Content Process Selection for Input Events
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Input events are initially received in the browser's parent process.
With Fission, the browser needs to decide which of possibly several
content processes an event is targeting.

Since process boundaries correspond to iframe (subdocument) boundaries,
and every (html) document has a root scroll frame, process boundaries are
therefore also scroll frame boundaries.
Since APZ already needs a hit +test mechanism to be able to determine which scroll frame an event +targets, this hit test mechanism was a good fit to also use to determine +which content process an event targets. + +APZ's hit test was therefore expanded to serve this purpose as well. This +mostly required only minor modifications, such as making sure that APZ +knows about the root scroll frames of iframes even if they're not +scrollable. Since APZ already needs to process all input events to +potentially apply :ref:`untransformations <input-event-untransformation>` +related to async scrolling, as part of this process it now also labels +input events with information identifying which content process they +target. + +Hit Testing Accuracy +^^^^^^^^^^^^^^^^^^^^ + +Prior to Fission, APZ's hit test could afford to be somewhat inaccurate, +as it could fall back on the dispatch-to-content mechanism to wait for +a more accurate answer from the main thread if necessary, suffering a +performance cost only (not a correctness cost). + +With Fission, an inaccurate compositor hit test now implies a correctness +cost, as there is no cross-process main-thread fallback mechanism. +(Such a mechanism was considered, but judged to require too much +complexity and IPC traffic to be worth it.) + +Luckily, with WebRender the compositor has much more detailed information +available to use for hit testing than it did with Layers. For example, +the compositor can perform accurate hit testing even in the presence of +irregular shapes such as rounded corners. + +APZ leverages WebRender's more accurate hit testing ability to aim to +accurately select the target process (and target scroll frame) for an +event in general. + +One consequence of this is that the dispatch-to-content mechanism is now +used less often than before (its primary remaining use is handling +`preventDefault()`). + +.. _sending-transforms-to-content-processes: + +Sending Transforms To Content Processes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Content processes sometimes need to be able to convert between screen +coordinates and their local coordinates. To do this, they need to know +about any transforms that their containing iframe and its ancestors are +subject to, including async transforms (particularly in cases where the +async transforms persist for more than just a few frames). + +APZ has information about these transforms in its HitTestingTree. With +Fission, APZ periodically sends content processes information about these +transforms so that they are kept relatively up to date. + +Testing +------- + +APZ makes use of several test frameworks to verify the expected behavior +is seen. + +Mochitest +~~~~~~~~~ + +The APZ specific mochitests are useful when specific gestures or events need to be tested +with specific content. The APZ mochitests are located in `gfx/layers/apz/test/mochitest`_. +To run all of the APZ mochitests, run something like the following: + +:: + + ./mach mochitest ./gfx/layers/apz/test/mochitest + +The APZ mochitests are often organized as subtests that run in a group. For example, +the `test_group_hittest-2.html`_ contains >20 subtests like +`helper_hittest_overscroll.html`_. When working on a specific subtest, it is often +helpful to use the `apz.subtest` preference to filter the subtests run to just the +tests you are working on. For example, the following would only run the +`helper_hittest_overscroll.html`_ subtest of the `test_group_hittest-2.html`_ group. 
::

   ./mach mochitest --setpref apz.subtest=helper_hittest_overscroll.html \
   ./gfx/layers/apz/test/mochitest/test_group_hittest-2.html

For more information on mochitest, see the `Mochitest Documentation`_.

.. _gfx/layers/apz/test/mochitest: https://searchfox.org/mozilla-central/source/gfx/layers/apz/test/mochitest
.. _test_group_hittest-2.html: https://searchfox.org/mozilla-central/source/gfx/layers/apz/test/mochitest/test_group_hittest-2.html
.. _helper_hittest_overscroll.html: https://searchfox.org/mozilla-central/source/gfx/layers/apz/test/mochitest/helper_hittest_overscroll.html
.. _Mochitest Documentation: /testing/mochitest-plain/index.html

GTest
~~~~~

The APZ specific GTests can be found in `gfx/layers/apz/test/gtest/`_. To run
these tests, run something like the following:

::

   ./mach gtest "APZ*"

For more information, see the `GTest Documentation`_.

.. _GTest Documentation: /gtest/index.html
.. _gfx/layers/apz/test/gtest/: https://searchfox.org/mozilla-central/source/gfx/layers/apz/test/gtest/

Reftests
~~~~~~~~

The APZ reftests can be found in `layout/reftests/async-scrolling/`_ and
`gfx/layers/apz/test/reftest`_. To run a large portion of the APZ
reftests, run something like the following:

::

   ./mach reftest ./layout/reftests/async-scrolling/

Useful information about the reftests can be found in the `Reftest Documentation`_.

There is no defined process for choosing which directory the APZ reftests
should be placed in, but in general reftests should exist where other
similar tests do.

.. _layout/reftests/async-scrolling/: https://searchfox.org/mozilla-central/source/layout/reftests/async-scrolling/
.. _gfx/layers/apz/test/reftest: https://searchfox.org/mozilla-central/source/gfx/layers/apz/test/reftest/
.. _Reftest Documentation: /layout/Reftest.html

Threading / Locking Overview
----------------------------

Threads
~~~~~~~

There are three threads relevant to APZ: the **controller thread**,
the **updater thread**, and the **sampler thread**. This table lists
which threads play these roles on each platform / configuration:

===================== ============= ============== =============
APZ Thread Name       Desktop       Desktop+GPU    Android
===================== ============= ============== =============
**controller thread** UI main       GPU main       Java UI
**updater thread**    SceneBuilder  SceneBuilder   SceneBuilder
**sampler thread**    RenderBackend RenderBackend  RenderBackend
===================== ============= ============== =============

Locks
~~~~~

There are also a number of locks used in APZ code:

======================= ==============================
Lock type               How many instances
======================= ==============================
APZ tree lock           one per APZCTreeManager
APZC map lock           one per APZCTreeManager
APZC instance lock      one per AsyncPanZoomController
APZ test lock           one per APZCTreeManager
Checkerboard event lock one per AsyncPanZoomController
======================= ==============================

Thread / Lock Ordering
~~~~~~~~~~~~~~~~~~~~~~

To avoid deadlocks, the threads and locks have a global **ordering**
which must be respected.
Respecting the ordering means the following:

- Let "A < B" denote that A occurs earlier than B in the ordering
- Thread T may only acquire lock L, if T < L
- A thread may only acquire lock L2 while holding lock L1, if L1 < L2
- A thread may only block on a response from another thread T while holding a lock L, if L < T

**The lock ordering is as follows**:

1. UI main
2. GPU main (only if GPU process enabled)
3. Compositor thread
4. SceneBuilder thread
5. **APZ tree lock**
6. RenderBackend thread
7. **APZC map lock**
8. **APZC instance lock**
9. **APZ test lock**
10. **Checkerboard event lock**

Example workflows
^^^^^^^^^^^^^^^^^

Here are some example APZ workflows. Observe how they all obey
the global thread/lock ordering. Feel free to add others:

- **Input handling** (with GPU process): UI main -> GPU main -> APZ tree lock -> RenderBackend thread
- **Sync messages** in ``PCompositorBridge.ipdl``: UI main thread -> Compositor thread
- **GetAPZTestData**: Compositor thread -> SceneBuilder thread -> test lock
- **Scene swap**: SceneBuilder thread -> APZ tree lock -> RenderBackend thread
- **Updating hit-testing tree**: SceneBuilder thread -> APZ tree lock -> APZC instance lock
- **Updating APZC map**: SceneBuilder thread -> APZ tree lock -> APZC map lock
- **Sampling and animation deferred tasks** [1]_: RenderBackend thread -> APZC map lock -> APZC instance lock
- **Advancing animations**: RenderBackend thread -> APZC instance lock

.. [1] It looks like there are two deferred tasks that actually need the tree lock,
   ``AsyncPanZoomController::HandleSmoothScrollOverscroll`` and
   ``AsyncPanZoomController::HandleFlingOverscroll``. We should be able to rewrite
   these to use the map lock instead of the tree lock.
   This will allow us to continue running the deferred tasks on the sampler
   thread rather than having to bounce them to another thread.
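To make the acquisition-order rule concrete, here is a stand-alone
sketch using plain ``std::mutex`` stand-ins (the real code uses Mozilla's
own lock types), mirroring the "Updating hit-testing tree" workflow
above:

::

   #include <mutex>

   // Stand-ins for the real APZ locks. The comments give their position
   // in the global ordering listed above.
   std::mutex gTreeLock;      // ordering position 5
   std::mutex gInstanceLock;  // ordering position 8

   void UpdateHitTestingTree() {
     // Correct: tree lock (5) is acquired before instance lock (8).
     std::scoped_lock tree(gTreeLock);
     // ... walk the tree; for each APZC that needs updating:
     {
       std::scoped_lock instance(gInstanceLock);
       // ... update per-APZC state ...
     }
   }

   // Acquiring these locks in the opposite order anywhere else in the
   // code (instance lock first, then tree lock) could deadlock against
   // UpdateHitTestingTree, which is exactly what the global ordering
   // forbids.

   int main() {
     UpdateHitTestingTree();
     return 0;
   }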
diff --git a/gfx/docs/AsyncPanZoomArchitecture.png b/gfx/docs/AsyncPanZoomArchitecture.png
new file mode 100644
index 0000000000..d19dcb7c8b
--- /dev/null
+++ b/gfx/docs/AsyncPanZoomArchitecture.png
Binary files differ
diff --git a/gfx/docs/GraphicsOverview.rst b/gfx/docs/GraphicsOverview.rst
new file mode 100644
index 0000000000..77b0379743
--- /dev/null
+++ b/gfx/docs/GraphicsOverview.rst
@@ -0,0 +1,149 @@

Graphics Overview
=========================

Work in progress. Possibly incorrect or incomplete.
---------------------------------------------------

Jargon
------

There's a lot of jargon in the graphics stack. We try to maintain a list
of common words and acronyms `here <https://wiki.mozilla.org/Platform/GFX/Jargon>`__.

Overview
--------

The graphics system is responsible for rendering (painting, drawing)
the frame tree (rendering tree) elements as created by the layout
system. Each leaf in the tree has content, either bounded by a rectangle
(or perhaps another shape, in the case of SVG).

The simple approach for producing the result would thus involve
traversing the frame tree, in a correct order, drawing each frame into
the resulting buffer and displaying (printing notwithstanding) that
buffer when the traversal is done. It is worth spending some time on the
“correct order” note above. If there are no overlapping frames, this is
fairly simple - any order will do, as long as there is no background. If
there is background, we just have to worry about drawing that first.
Since we do not control the content, chances are the page is more
complicated. There are overlapping frames, likely with transparency, so
we need to make sure the elements are drawn “back to front”, in layers,
so to speak. Layers are an important concept, and we will revisit them
shortly, as they are central to fixing a major issue with the above
simple approach.

While the above simple approach will work, the performance will suffer.
Each time anything changes in any of the frames, the complete process
needs to be repeated, everything needs to be redrawn. Further, there is
very little space to take advantage of the modern graphics (GPU)
hardware, or multi-core computers. If you recall from the previous
sections, the frame tree is only accessible from the UI thread, so while
we’re doing all this work, the UI is basically blocked.

(Retained) Layers
~~~~~~~~~~~~~~~~~

The layers framework was introduced to address the above performance
issues, by having a part of the design address each item. At the high
level:

1. We create a layer tree. The leaf elements of the tree contain all
   frames (possibly multiple frames per leaf).
2. We render each layer tree element and cache (retain) the result.
3. We composite (combine) all the leaf elements into the final result.

Let’s examine each of these steps, in reverse order.

Compositing
~~~~~~~~~~~

We use the term composite as it implies that the order is important. If
the elements being composited overlap, whether there is transparency
involved or not, the order in which they are combined will affect the
result. Compositing is where we can use some of the power of the modern
graphics hardware. It is optimal for doing this job. In the scenarios
where only the position of individual frames changes, without the
content inside them changing, we see why caching each layer would be
advantageous - we only need to repeat the final compositing step,
completely skipping the layer tree creation and the rendering of each
leaf, thus speeding up the process considerably.

Another benefit is equally apparent in the context of the stated
deficiencies of the simple approach. We can use the available graphics
hardware accelerated APIs to do the compositing step. Direct3D, OpenGL
can be used on different platforms and are well suited to accelerate
this step.

Finally, we can now envision performing the compositing step on a
separate thread, unblocking the UI thread for other work, and doing more
work in parallel. More on this below.

It is important to note that the number of operations in this step is
proportional to the number of layer tree (leaf) elements, so there is
additional work and complexity involved, when the layer tree is large.

Render and retain layer elements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As we saw, the compositing step benefits from caching the intermediate
result. This does result in extra memory usage, so it needs to be
considered during the layer tree creation. Beyond the caching, we can
accelerate the rendering of each element by (indirectly) using the
available platform APIs (e.g., Direct2D, CoreGraphics, even some of the
3D APIs like OpenGL or Direct3D) as available. This is actually done
through a platform independent API (see Moz2D below), but it is
important to realize it does get accelerated appropriately.
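As an illustration of the retain-then-composite idea described in the
two sections above, here is a toy sketch (hypothetical types, no real
graphics API) of why caching pays off when only positions change:

::

   #include <vector>

   // Toy model of retained layers: each layer owns a cached rendering of
   // its frames, plus a position that can change without re-rendering.
   struct Layer {
     int x = 0;
     int y = 0;
     bool contentDirty = true;
     // Cached pixels would live here in a real implementation.

     void Render() { contentDirty = false; }  // expensive: redraw frames
   };

   // Compositing walks the layers back to front. Only layers whose
   // *content* changed need the expensive Render(); layers that merely
   // moved are recombined from their caches.
   void Composite(std::vector<Layer>& aLayers) {
     for (Layer& layer : aLayers) {
       if (layer.contentDirty) {
         layer.Render();
       }
       // ... draw cached layer at (layer.x, layer.y), cheap on the GPU ...
     }
   }

   int main() {
     std::vector<Layer> layers(3);
     Composite(layers);  // first composite renders everything
     layers[1].y += 10;  // scroll one layer
     Composite(layers);  // second composite re-renders nothing
     return 0;
   }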
Creating the layer tree
~~~~~~~~~~~~~~~~~~~~~~~

We need to create a layer tree (from the frame tree), which will give
us the correct result while striking the right balance between a layer
per frame element and a single layer for the complete frame tree. As
was mentioned above, there is an overhead in traversing the whole tree
and caching each of the elements, balanced by the performance
improvements. Some of the performance improvements are only noticed when
something changes (e.g., one element is moving, we only need to redo the
compositing step).

Refresh Driver
~~~~~~~~~~~~~~

Layers
~~~~~~

Rendering each layer
~~~~~~~~~~~~~~~~~~~~

Tiling vs. Buffer Rotation vs. Full paint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Compositing for the final result
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Graphics API
~~~~~~~~~~~~

Compositing
~~~~~~~~~~~

Image Decoding
~~~~~~~~~~~~~~

Image Animation
~~~~~~~~~~~~~~~

`Historical Documents <http://www.youtube.com/watch?v=lLZQz26-kms>`__
---------------------------------------------------------------------

A number of posts and blogs that will give you more details or more
background, or reasoning that led to different solutions and approaches.

- 2010-01 `Layers: Cross Platform Acceleration <http://www.basschouten.com/blog1.php/layers-cross-platform-acceleration>`__
- 2010-04 `Layers <http://robert.ocallahan.org/2010/04/layers_01.html>`__
- 2010-07 `Retained Layers <http://robert.ocallahan.org/2010/07/retained-layers_16.html>`__
- 2011-04 `Introduction <https://web.archive.org/web/20140604005804/https://blog.mozilla.org/joe/2011/04/26/introducing-the-azure-project/>`__
- 2011-07 `Shadow Layers <http://chrislord.net/index.php/2011/07/25/shadow-layers-and-learning-by-failing/>`__
- 2011-09 `Graphics API Design <http://robert.ocallahan.org/2011/09/graphics-api-design.html>`__
- 2012-04 `Moz2D Canvas on OSX <http://muizelaar.blogspot.ca/2012/04/azure-canvas-on-os-x.html>`__
- 2012-05 `Mask Layers <http://featherweightmusings.blogspot.co.uk/2012/05/mask-layers_26.html>`__
- 2013-07 `Graphics related <http://www.basschouten.com/blog1.php>`__
diff --git a/gfx/docs/LayersHistory.rst b/gfx/docs/LayersHistory.rst
new file mode 100644
index 0000000000..360df9b37d
--- /dev/null
+++ b/gfx/docs/LayersHistory.rst
@@ -0,0 +1,63 @@

Layers History
==============

This is an overview of the major events in the history of our Layers
infrastructure.

- iPhone released in July 2007 (built on a toolkit called LayerKit)

- Core Animation (October 2007) - LayerKit was publicly renamed Core
  Animation and shipped as part of OS X 10.5

- WebKit CSS 3D transforms (July 2009)

- Original layers API (March 2010) Introduced the idea of a layer
  manager that would composite. One of the first use cases for this was
  hardware accelerated YUV conversion for video.

- Retained layers (July 7 2010 - Bug 564991) This was an important
  concept that introduced the idea of persisting the layer content
  across paints in gecko controlled buffers instead of just by the OS.
  This introduced the concept of buffer rotation to deal with scrolling
  instead of using the native scrolling APIs like ScrollWindowEx

- Layers IPC (July 2010 - Bug 570294) This introduced shadow layers and
  edit lists and was originally done for e10s v1

- 3D transforms (September 2011 - Bug 505115)

- OMTC (December 2011 - Bug 711168) This was prototyped on OS X but
  shipped first for Fennec

- Tiling v1 (April 2012 - Bug 739679) Originally done for Fennec.
  This was done to avoid situations where we had to do a bunch of work
  for scrolling a small amount, i.e. buffer rotation. It allowed us to
  have a variety of interesting features like progressive painting and
  lower resolution painting.

- C++ Async pan zoom controller (July 2012 - Bug 750974) The existing
  APZ code was in Java for Fennec so this was reimplemented.

- Streaming WebGL Buffers (February 2013 - Bug 716859) Infrastructure
  to allow OMTC WebGL and avoid the need to glFinish() every frame.

- Compositor API (April 2013 - Bug 825928) The planning for this
  started around November 2012. Layers refactoring created a compositor
  API that abstracted away the differences between D3D and OpenGL.
  The main piece of API is DrawQuad.

- Tiling v2 (Mar 7 2014 - Bug 963073) Tiling for B2G. This work is
  mainly porting tiled layers to new textures, implementing
  double-buffered tiles and implementing a texture client pool, to be
  used by tiled content clients.

  A large motivation for the pool was the very slow performance of
  allocating tiles because of the sync messages to the compositor.

  The slow performance of allocating was directly addressed by bug 959089
  which allowed us to allocate gralloc buffers without sync messages to
  the compositor thread.

- B2G WebGL performance (May 2014 - Bug 1006957, 1001417, 1024144) This
  work improved the synchronization mechanism between the compositor
  and the producer.
diff --git a/gfx/docs/Moz2D.rst b/gfx/docs/Moz2D.rst
new file mode 100644
index 0000000000..0be251a209
--- /dev/null
+++ b/gfx/docs/Moz2D.rst
@@ -0,0 +1,16 @@

Moz2D
========================

The `gfx/2d` directory contains our abstraction of a typical 2D API
(similar to the HTML Canvas API). It has different backends used for
different purposes. Direct2D is used for implementing hardware
accelerated canvas on Windows. Skia is used for any software drawing
needs and Cairo is used for printing.

Previously, Moz2D aimed to be buildable independently from the rest of
Gecko but we've slipped from this because C++/Gecko don't have a good
mechanism for modularization/dependencies. That being said, we still try
to keep the coupling with the rest of Gecko low for hygiene, simplicity
and perhaps a more modular future.

See also the `Moz2D documentation on wiki <https://wiki.mozilla.org/Platform/GFX/Moz2D>`_.
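Since Moz2D is Canvas-like, a small usage sketch may help orient new
readers. This is not taken from the tree; the names are from memory of
the gfx/2d interfaces (Factory, DrawTarget, patterns) and exact headers,
enum values and colour types have changed over time, so treat it as
illustrative rather than exact:

::

   #include "mozilla/gfx/2D.h"

   using namespace mozilla::gfx;

   void DrawRedSquare() {
     // Pick a backend explicitly; in Gecko this choice is normally made
     // by the platform integration code, not by the caller.
     RefPtr<DrawTarget> dt = Factory::CreateDrawTarget(
         BackendType::SKIA, IntSize(256, 256), SurfaceFormat::B8G8R8A8);
     if (!dt) {
       return;
     }
     // Canvas-like drawing: fill a rectangle with a solid colour pattern.
     dt->FillRect(Rect(32, 32, 128, 128),
                  ColorPattern(DeviceColor(1.0f, 0.0f, 0.0f, 1.0f)));
     // A snapshot could now be read back or handed to the compositor.
     RefPtr<SourceSurface> snapshot = dt->Snapshot();
   }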
+The DOM is converted into a hierarchical Frame Tree, which nests visual
+elements (boxes). Each element points to some node in a Style Tree that
+describes what it should look like -- color, transparency, etc. The
+result is that now we know exactly what to render where, what goes on
+top of what (layering and blending) and at what pixel coordinate. This
+is the Display List.
+
+The Display List is a light-weight data structure because it's shallow
+-- it mostly points back to the Frame Tree. There are two problems with
+this. First, we want to cross process boundaries at this point.
+Everything up until now happens in a Content Process (of which there are
+several). Actual GPU rendering happens in a GPU Process (on some
+platforms). Second, everything up until now was written in C++, but
+WebRender is written in Rust. Thus the shallow Display List needs to be
+serialized into a completely self-contained binary blob that will
+survive Interprocess Communication (IPC) and a language switch (C++ to
+Rust). The result is the WebRender Display List.
+
+WebRender
+~~~~~~~~~
+
+The GPU process receives the WebRender Display List blob and
+de-serializes it into a Scene. This Scene contains more than the
+strictly visible elements; for example, to anticipate scrolling, we
+might have several paragraphs of text extending past the visible page.
+
+For a given viewport, the Scene gets culled and stripped down to a
+Frame. This is also where we start preparing data structures for GPU
+rendering, for example getting some font glyphs into an atlas for
+rasterizing text.
+
+The final step takes the Frame and submits commands to the GPU to
+actually render it. The GPU will execute the commands and composite
+the final page.
+
+Software
+~~~~~~~~
+
+The above is the new WebRender-enabled way to do things. But in the
+schematic you'll note a second branch towards the bottom: this is the
+legacy code path which does not use WebRender (nor Rust). In this
+case, the Display List is converted into a Layer Tree. The purpose of
+this Tree is to try to avoid having to re-render absolutely
+everything when the page needs to be refreshed. For example, when
+scrolling we should be able to redraw the page by mostly shifting
+things around. However, that requires those 'things' to still be around
+from the last time we drew the page. In other words, visual elements
+that are likely to be static and reusable need to be drawn into their
+own private "page" (a cache). Then we can recombine (composite) all of
+these when redrawing the actual page.
+
+Figuring out which elements would be good candidates for this, and
+striking a balance between good performance and excessive memory
+use, is the purpose of the Layer Tree. Each 'layer' is a cached image
+of some element(s). This logic also takes occlusion into account, e.g.
+don't allocate and render a layer for elements that are known to be
+completely obscured by something in front of them.
+
+Redrawing the page by combining the Layer Tree with any newly
+rasterized elements is the job of the Compositor.
+
+Even when a layer cannot be reused in its entirety, it is likely
+that only a small part of it was invalidated. Thus there is an
+elaborate system for tracking dirty rectangles, starting an update by
+copying the area that can be salvaged, and then redrawing only what
+cannot (see the sketch below).
+
+In fact, this idea can be extended to delta-tracking of display lists
+themselves.
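+As a rough illustration of that dirty-area bookkeeping, here is a
+minimal sketch. The names are hypothetical, and Gecko's real
+invalidation machinery tracks whole regions (not one rectangle) with
+far more subtlety; this only shows the accumulate-then-salvage idea:
+
+::
+
+    #include <algorithm>
+
+    struct Rect {
+      int x0 = 0, y0 = 0, x1 = 0, y1 = 0;
+      bool IsEmpty() const { return x0 >= x1 || y0 >= y1; }
+    };
+
+    // Grow a dirty rect to cover both inputs.
+    Rect Union(const Rect& a, const Rect& b) {
+      if (a.IsEmpty()) return b;
+      if (b.IsEmpty()) return a;
+      return {std::min(a.x0, b.x0), std::min(a.y0, b.y0),
+              std::max(a.x1, b.x1), std::max(a.y1, b.y1)};
+    }
+
+    struct Layer {
+      Rect mDirty;  // damage accumulated since the last paint
+
+      void Invalidate(const Rect& aRect) { mDirty = Union(mDirty, aRect); }
+
+      void Paint() {
+        if (mDirty.IsEmpty()) return;  // the whole cache can be salvaged
+        // ...copy the salvageable area, then redraw only mDirty...
+        mDirty = Rect{};
+      }
+    };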
+Traversing the layout tree and building a display list is also not
+cheap, so the code tries to partially invalidate and rebuild the
+display list incrementally when possible. In fact, this optimization
+is used for both WebRender and non-WebRender.
+
+Asynchronous Panning And Zooming
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Earlier we mentioned that a Scene might contain more elements than are
+strictly necessary for rendering what's visible (the Frame). The
+reason for that is Asynchronous Panning and Zooming, or APZ for short.
+The browser will feel much more responsive if scrolling & zooming can
+short-circuit all of these data transformations and IPC boundaries,
+and instead directly update an offset of some layer and recomposite.
+(Think of late-latching in a VR context.)
+
+This simple idea introduces a lot of complexity: how much extra do you
+rasterize, and in which direction? How much memory can we afford?
+What about JavaScript that responds to scroll events and perhaps does
+something 'interesting' with the page in return? What about nested
+frames or nested scrollbars? What if we scroll so much that we go
+past the boundaries of the Scene that we know about?
+
+See AsyncPanZoom.rst for all that and more.
+
+A Few More Details
+~~~~~~~~~~~~~~~~~~
+
+Here's another schematic which basically repeats the previous one, but
+showing a little bit more detail. Note that the direction is reversed
+-- the data flow starts at the right. Sorry about that :)
+
+.. image:: RenderingOverviewDetail.png
+   :width: 100%
+
+Some things to note:
+
+- there are multiple content processes, currently 4 of them. This is
+  for security reasons (sandboxing), stability (isolate crashes) and
+  performance (multi-core machines);
+- ideally each "webpage" would run in its own process for security;
+  this is being developed under the term 'Fission';
+- there is only a single GPU process, if there is one at all;
+  some platforms have it as part of the Parent;
+- not shown here is the Extension process that isolates WebExtensions;
+- for non-WebRender, rasterization happens in the Content Process, and
+  we send entire Layers to the GPU/Compositor process (via shared
+  memory, only using actual IPC for its metadata like width & height);
+- if the GPU process crashes (a bug or a driver issue) we can simply
+  restart it, resend the display list, and the browser itself doesn't crash;
+- the browser UI is just another set of DOM+JS, albeit one that runs
+  with elevated privileges. That is, its JS can do things that
+  normal JS cannot. It lives in the Parent Process, which then uses
+  IPC to get it rendered, same as regular Content (the IPC arrow also
+  goes to the WebRender Display List but is omitted to reduce clutter);
+- UI events get routed to APZ first, to minimize latency. By running
+  inside the GPU process, we may have access to data such
+  as rasterized clipping masks that enable finer-grained hit testing;
+- the GPU process talks back to the content process; in particular,
+  when APZ scrolls out of bounds, it asks Content to enlarge/shift the
+  Scene with a new "display port";
+- we still use the GPU when we can for compositing even in the
+  non-WebRender case.
+
+WebRender In Detail
+-------------------
+
+Converting a display list into GPU commands is broken down into a
+number of steps and intermediate data structures.
+
+.. image:: RenderingOverviewTrees.png
+   :width: 75%
+   :align: center
+
+..
+
+   *Each element in the picture tree points to exactly one node in the
+   spatial tree. Only a few of these links are shown for clarity (the
+   dashed lines).*
+
+The Picture Tree
+~~~~~~~~~~~~~~~~
+
+The incoming display list uses "stacking contexts". For example, to
+render some text with a drop shadow, a display list will contain three
+items:
+
+- "enable shadow" with some parameters such as shadow color, blur size,
+  and offset;
+- the text item;
+- "pop all shadows" to deactivate shadows.
+
+WebRender will break this down into two distinct elements, or
+"pictures". The first represents the shadow, so it contains a copy of the
+text item, but modified to use the shadow's color, and to shift the
+text by the shadow's offset. The second picture contains the original text
+to draw on top of the shadow.
+
+The fact that the first picture, the shadow, needs to be blurred, is a
+"compositing" property of the picture which we'll deal with later.
+
+Thus, the stack-based display list gets converted into a list of pictures
+-- or more generally, a hierarchy of pictures, since items are nested
+as per the original HTML.
+
+Example visual elements are a TextRun, a LineDecoration, or an Image
+(like a .png file).
+
+Compared to 3D rendering, the picture tree is similar to a scenegraph: it's a
+parent/child hierarchy of all the drawable elements that make up the "scene", in
+this case the webpage. One important difference is that the transformations are
+stored in a separate tree, the spatial tree.
+
+The Spatial Tree
+~~~~~~~~~~~~~~~~
+
+The nodes in the spatial tree represent coordinate transforms. Every time the
+DOM hierarchy needs child elements to be transformed relative to their parent,
+we add a new Spatial Node to the tree. All those child elements will then point
+to this node as their "local space" reference (aka coordinate frame). In
+traditional 3D terms, it's a scenegraph but only containing transform nodes.
+
+The nodes are called frames, as in "coordinate frame":
+
+- a Reference Frame corresponds to a ``<div>``;
+- a Scrolling Frame corresponds to a scrollable part of the page;
+- a Sticky Frame corresponds to ``position: sticky`` content.
+
+Each element in the picture tree then points to a spatial node inside this tree,
+so by walking up and down the tree we can find the absolute position of where
+each element should render (traversing down) and how large each element needs to
+be (traversing up). Originally the transform information was part of the
+picture tree, as in a traditional scenegraph, but visual elements and their
+transforms were split apart for technical reasons.
+
+Some of these nodes are dynamic. A scroll-frame can obviously scroll, but a
+Reference Frame might also use a property binding to enable a live link with
+JavaScript, for dynamic updates of (currently) the transform and opacity.
+
+Axis-aligned transformations (scales and translations) are considered "simple",
+and are conceptually combined into a single "CoordinateSystem". When we
+encounter a non-axis-aligned transform, we start a new CoordinateSystem. We
+start in CoordinateSystem 0 at the root, and would bump this to CoordinateSystem
+1 when we encounter a Reference Frame with a rotation or 3D transform, for
+example. This would then be the CoordinateSystem index for all its children,
+until we run into another (nested) non-simple transform, and so on. Roughly
+speaking, as long as we're in the same CoordinateSystem, the transform stack is
+simple enough that we have a reasonable chance of being able to flatten it
+(see the sketch below).
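+As a minimal sketch of that bookkeeping -- hypothetical names and a
+simplified 2D transform; the real spatial tree is a Rust structure
+inside WebRender with far more state:
+
+::
+
+    // Treating any rotation/skew as non-axis-aligned is a simplification.
+    struct Transform {
+      float m11, m12, m21, m22, tx, ty;  // 2D linear part plus translation
+      bool IsAxisAligned() const { return m12 == 0.0f && m21 == 0.0f; }
+    };
+
+    struct SpatialNode {
+      Transform mToParent;   // transform relative to the parent node
+      int mCoordSystem = 0;  // CoordinateSystem 0 at the root
+    };
+
+    // Children under "simple" (axis-aligned) transforms stay in the
+    // parent's CoordinateSystem; anything else starts a new one.
+    void AssignCoordSystem(const SpatialNode& aParent, SpatialNode& aChild,
+                           int& aNextFreeId) {
+      aChild.mCoordSystem = aChild.mToParent.IsAxisAligned()
+                                ? aParent.mCoordSystem
+                                : ++aNextFreeId;
+    }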
+Staying within a single CoordinateSystem is what lets us directly
+rasterize text at its final scale, for example, optimizing away some of
+the intermediate pictures (offscreen textures).
+
+The layout code positions elements relative to their parent. Thus to position
+the element on the actual page, we need to walk the Spatial Tree all the way to
+the root and apply each transform; the result is a ``LayoutToWorldTransform``.
+
+One final step transforms from World to Device coordinates, which deals with
+DPI scaling and such.
+
+.. csv-table::
+   :header: "WebRender term", "Rough analogy"
+
+   Spatial Tree, Scenegraph -- transforms only
+   Picture Tree, Scenegraph -- drawables only (grouping)
+   Spatial Tree root node, World Space
+   Layout space, Local/Object Space
+   Picture, RenderTarget (sort of; see RenderTask below)
+   Layout-To-World transform, Local-To-World transform
+   World-To-Device transform, World-To-Clipspace transform
+
+
+The Clip Tree
+~~~~~~~~~~~~~
+
+Finally, we also have a Clip Tree, which contains Clip Shapes. For
+example, a rounded-corner div will produce a clip shape, and since
+divs can be nested, you end up with another tree. By pointing at a Clip Shape,
+visual elements will be clipped against this shape plus all parent shapes above it
+in the Clip Tree.
+
+As with CoordinateSystems, a chain of simple 2D clip shapes can be collapsed
+into something that can be handled in the vertex shader, at very little extra
+cost. More complex clips must be rasterized into a mask first, which we then
+sample from to ``discard`` in the pixel shader as needed.
+
+In summary, at the end of scene building the display list has turned into
+a picture tree, plus a spatial tree that tells us what goes where
+relative to what, plus a clip tree.
+
+RenderTask Tree
+~~~~~~~~~~~~~~~
+
+Now in a perfect world we could simply traverse the picture tree and start
+drawing things: one drawcall per picture to render its contents, plus one
+drawcall to draw the picture into its parent. However, recall that the first
+picture in our example is a "text shadow" that needs to be blurred. We can't
+just rasterize blurry text directly, so we need a number of steps or "render
+passes" to get the intended effect:
+
+.. image:: RenderingOverviewBlurTask.png
+   :align: right
+   :height: 400px
+
+- rasterize the text into an offscreen rendertarget;
+- apply one or more downscaling passes until the blur radius is reasonable;
+- apply a horizontal Gaussian blur;
+- apply a vertical Gaussian blur;
+- use the result as an input for whatever comes next, or blit it to
+  its final position on the page (or more generally, on the containing
+  parent surface/picture).
+
+In the general case, which passes we need and how many of them depends
+on how the picture is supposed to be composited (CSS filters, SVG
+filters, effects) and its parameters (very large vs. small blur
+radius, say).
+
+Thus, we walk the picture tree and build a render task tree: each high
+level abstraction like "blur me" gets broken down into the necessary
+render passes to get the effect (a sketch of this expansion follows
+below). The result is again a tree, because a render pass can have
+multiple input dependencies (e.g. blending).
+
+(Compared to games, this has echoes of the Frostbite FrameGraph, in
+that it dynamically builds up a render-pass DAG and dynamically
+allocates storage for the outputs.)
+
+If there are complicated clip shapes that need to be rasterized first,
+so that their output can be sampled as a texture for clip/discard
+operations, those tasks would also end up in this tree as
+dependencies... (I think?).
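+Returning to the blur example, here is a minimal sketch of how such a
+task chain could be assembled. The types below are hypothetical
+stand-ins (WebRender's real RenderTask is a Rust structure carrying
+sizes, targets and much more); it only illustrates expanding a
+high-level "blur" into passes:
+
+::
+
+    #include <memory>
+    #include <utility>
+    #include <vector>
+
+    struct RenderTask {
+      enum class Kind { Picture, Downscale, HBlur, VBlur };
+      Kind kind;
+      std::vector<std::shared_ptr<RenderTask>> inputs;  // dependencies
+    };
+    using TaskPtr = std::shared_ptr<RenderTask>;
+
+    // "Blur me" expands into: downscale until the radius is manageable,
+    // then a horizontal and a vertical Gaussian blur pass.
+    TaskPtr BuildBlurChain(TaskPtr aContent, float aRadius, float aMax) {
+      TaskPtr task = std::move(aContent);
+      while (aRadius > aMax) {
+        task = TaskPtr(new RenderTask{RenderTask::Kind::Downscale, {task}});
+        aRadius *= 0.5f;
+      }
+      task = TaskPtr(new RenderTask{RenderTask::Kind::HBlur, {task}});
+      task = TaskPtr(new RenderTask{RenderTask::Kind::VBlur, {task}});
+      return task;  // becomes an input of whatever composites it next
+    }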
+
+Once we have the entire tree of dependencies, we analyze it to see
+which tasks can be combined into a single pass for efficiency. We
+ping-pong rendertargets when we can, but sometimes the dependencies
+cut across more than one level of the rendertask tree, and some
+copying is necessary.
+
+Once we've figured out the passes and allocated storage for anything
+we wish to persist in the texture cache, we finally start rendering.
+
+When rasterizing the elements into the Picture's offscreen texture, we'd
+position them by walking the transform hierarchy as far up as the picture's
+transform node, resulting in a ``Layout To Picture`` transform. The picture
+would then go onto the page using a ``Picture To World`` coordinate transform.
+
+Caching
+```````
+
+Just as with layers in the software rasterizer, it is not always necessary to
+redraw absolutely everything when parts of a document change. The WebRender
+equivalent of layers is Slices -- a grouping of pictures that are expected to
+render and update together. Slices are automatically created based on
+heuristics and layout hints/flags.
+
+Implementation-wise, slices re-use a lot of the existing machinery for Pictures;
+in fact they're implemented as a "virtual picture" of sorts. The similarities
+make sense: both need to allocate offscreen textures in a cache, both will
+position and render all their children into it, and both then draw themselves
+into their parent as part of the parent's draw.
+
+If a slice isn't expected to change much, we give it a TileCacheInstance. It is
+itself made up of Tiles, where each tile will track what's in it, what's
+changing, and whether it needs to be invalidated and redrawn as a result.
+Thus the "damage" from changes can be localized to single tiles, while we
+salvage the rest of the cache. If tiles keep seeing a lot of invalidations,
+they will recursively divide themselves in a quad-tree-like structure to try to
+localize the invalidations. (And conversely, they'll recombine children if
+nothing is invalidating them "for a while".)
+
+Interning
+`````````
+
+To spot invalidated tiles, we need a fast way to compare a tile's contents from
+the previous frame with the current frame. To speed this up, we use interning;
+similar to string-interning, this means that each ``TextRun``, ``Decoration``,
+``Image`` and so on is registered in a repository (a ``DataStore``) and
+consequently referred to by its unique ID. Cache contents can then be encoded as a
+list of IDs (one such list per internable element type). Diffing is then just a
+fast list comparison (see the sketch at the end of this section).
+
+Callbacks
+`````````
+GPU text rendering assumes that the individual font-glyphs are already
+available in a texture atlas. Likewise, SVG is not rendered on
+the GPU. Both inputs are prepared during scene building: glyph
+rasterization happens via a thread pool from within Rust itself, and SVG via
+opaque callbacks (back to C++) that produce blobs.
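+To make the interning scheme from the Interning section above a little
+more concrete, here is a small sketch. The names (``DataStore``, item
+IDs) follow the text, but the shapes are hypothetical; the real stores
+are Rust code with one ``DataStore`` per internable element type:
+
+::
+
+    #include <cstdint>
+    #include <map>
+    #include <string>
+    #include <vector>
+
+    using ItemId = uint64_t;
+
+    // Interner: identical items always map to the same stable ID.
+    class DataStore {
+     public:
+      ItemId Intern(const std::string& aItem) {
+        auto it = mIds.find(aItem);
+        if (it != mIds.end()) {
+          return it->second;
+        }
+        ItemId id = mIds.size() + 1;
+        mIds.emplace(aItem, id);
+        return id;
+      }
+
+     private:
+      std::map<std::string, ItemId> mIds;
+    };
+
+    // A tile's contents reduce to a list of IDs, so the invalidation
+    // check is a plain list comparison.
+    bool TileChanged(const std::vector<ItemId>& aPrev,
+                     const std::vector<ItemId>& aCurr) {
+      return aPrev != aCurr;
+    }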
diff --git a/gfx/docs/RenderingOverviewBlurTask.png b/gfx/docs/RenderingOverviewBlurTask.png
Binary files differ
new file mode 100644
index 0000000000..baffc08f32
--- /dev/null
+++ b/gfx/docs/RenderingOverviewBlurTask.png
diff --git a/gfx/docs/RenderingOverviewDetail.png b/gfx/docs/RenderingOverviewDetail.png
Binary files differ
new file mode 100644
index 0000000000..2909a811e4
--- /dev/null
+++ b/gfx/docs/RenderingOverviewDetail.png
diff --git a/gfx/docs/RenderingOverviewSimple.png b/gfx/docs/RenderingOverviewSimple.png
Binary files differ
new file mode 100644
index 0000000000..43c0a59439
--- /dev/null
+++ b/gfx/docs/RenderingOverviewSimple.png
diff --git a/gfx/docs/RenderingOverviewTrees.png b/gfx/docs/RenderingOverviewTrees.png
Binary files differ
new file mode 100644
index 0000000000..ffdf0812fa
--- /dev/null
+++ b/gfx/docs/RenderingOverviewTrees.png
diff --git a/gfx/docs/Silk.rst b/gfx/docs/Silk.rst
new file mode 100644
index 0000000000..16e4cdfc7b
--- /dev/null
+++ b/gfx/docs/Silk.rst
@@ -0,0 +1,472 @@
+Silk Overview
+==========================
+
+.. image:: SilkArchitecture.png
+
+Architecture
+------------
+
+Our current architecture is to align three components to hardware vsync
+timers:
+
+1. Compositor
+2. RefreshDriver / Painting
+3. Input Events
+
+The flow of our rendering engine is as follows:
+
+1. A hardware vsync event occurs on an OS-specific *Hardware Vsync
+   Thread* on a per-monitor basis.
+2. The *Hardware Vsync Thread* attached to the monitor notifies the
+   ``CompositorVsyncDispatchers`` and ``VsyncDispatcher``.
+3. For every Firefox window on the specific monitor, notify a
+   ``CompositorVsyncDispatcher``. The ``CompositorVsyncDispatcher`` is
+   specific to one window.
+4. The ``CompositorVsyncDispatcher`` notifies a
+   ``CompositorWidgetVsyncObserver`` when remote compositing, or a
+   ``CompositorVsyncScheduler::Observer`` when compositing in-process.
+5. If remote compositing, a vsync notification is sent from the
+   ``CompositorWidgetVsyncObserver`` to the ``VsyncBridgeChild`` on the
+   UI process, which sends an IPDL message to the ``VsyncBridgeParent``
+   on the compositor thread of the GPU process, which then dispatches to
+   ``CompositorVsyncScheduler::Observer``.
+6. The ``VsyncDispatcher`` notifies the Chrome
+   ``RefreshTimer`` that a vsync has occurred.
+7. The ``VsyncDispatcher`` sends IPC messages to all content
+   processes to tick their respective active ``RefreshTimer``.
+8. The ``Compositor`` dispatches input events on the *Compositor
+   Thread*, then composites. Input events are only dispatched on the
+   *Compositor Thread* on b2g.
+9. The ``RefreshDriver`` paints on the *Main Thread*.
+
+Hardware Vsync
+--------------
+
+Hardware vsync events from (1) occur on a specific ``Display`` object.
+The ``Display`` object is responsible for enabling/disabling vsync
+for each connected display. For example, if two monitors are
+connected, two ``Display`` objects will be created, each listening to
+vsync events for their respective displays. We require one ``Display``
+object per monitor as each monitor may have different vsync rates. As a
+fallback solution, we have one global ``Display`` object that can
+synchronize across all connected displays. The global ``Display`` is
+useful if a window is positioned halfway between the two monitors. Each
+platform will have to implement a specific ``Display`` object to hook
+and listen to vsync events.
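+In sketch form, the per-monitor fan-out in steps 1-4 of the flow above
+might look roughly like this. The shapes are hypothetical and heavily
+simplified; the real classes carry threading and lifetime machinery
+that is omitted here:
+
+::
+
+    #include <vector>
+
+    struct TimeStamp {};  // stand-in for mozilla::TimeStamp
+
+    struct CompositorVsyncDispatcher {
+      void NotifyVsync(TimeStamp aVsync) { /* post to compositor thread */ }
+    };
+    struct VsyncDispatcher {
+      void NotifyVsync(TimeStamp aVsync) { /* tick the RefreshTimers */ }
+    };
+
+    struct Display {
+      // Invoked by the platform vsync callback for this monitor, on the
+      // Hardware Vsync Thread.
+      void OnHardwareVsync(TimeStamp aVsync) {
+        for (CompositorVsyncDispatcher* d : mPerWindowDispatchers) {
+          d->NotifyVsync(aVsync);  // one dispatcher per Firefox window
+        }
+        mVsyncDispatcher->NotifyVsync(aVsync);  // refresh driver path
+      }
+
+      std::vector<CompositorVsyncDispatcher*> mPerWindowDispatchers;
+      VsyncDispatcher* mVsyncDispatcher = nullptr;
+    };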
+As of this writing, both Firefox OS and OS X
+create their own hardware-specific *Hardware Vsync Thread* that executes
+after a vsync has occurred. OS X creates one *Hardware Vsync Thread* per
+``CVDisplayLinkRef``. We do not currently support multiple displays, so
+we use one global ``CVDisplayLinkRef`` that works across all active
+displays. On Windows, we have to create a new platform thread that
+waits for DwmFlush(), which works across all active displays. Once the
+thread wakes up from DwmFlush(), the actual vsync timestamp is retrieved
+from DwmGetCompositionTimingInfo(), which is the timestamp that is
+actually passed into the compositor and refresh driver.
+
+When a vsync occurs on a ``Display``, the *Hardware Vsync Thread*
+callback fetches all ``CompositorVsyncDispatchers`` associated with the
+``Display``. Each ``CompositorVsyncDispatcher`` is notified that a vsync
+has occurred with the vsync's timestamp. It is the responsibility of the
+``CompositorVsyncDispatcher`` to notify the ``Compositor`` that is
+awaiting vsync notifications. The ``Display`` will then notify the
+associated ``VsyncDispatcher``, which should notify all
+active ``RefreshDrivers`` to tick.
+
+All ``Display`` objects are encapsulated in a ``VsyncSource`` object.
+The ``VsyncSource`` object lives in ``gfxPlatform`` and is instantiated
+only on the parent process when ``gfxPlatform`` is created. The
+``VsyncSource`` is destroyed when ``gfxPlatform`` is destroyed. It can
+also be destroyed when the layout frame rate pref (or another pref that
+influences the frame rate) changes. This may mean we switch from hardware
+to software vsync (or vice versa) at runtime. During the switch, there
+may briefly be two vsync sources. Otherwise, there is only one
+``VsyncSource`` object throughout the entire lifetime of Firefox. Each
+platform is expected to implement its own ``VsyncSource`` to manage
+vsync events. On OS X, this is through ``CVDisplayLinkRef``. On
+Windows, it should be through ``DwmGetCompositionTimingInfo``.
+
+Compositor
+----------
+
+When the ``CompositorVsyncDispatcher`` is notified of the vsync event,
+the ``CompositorVsyncScheduler::Observer`` associated with the
+``CompositorVsyncDispatcher`` begins execution. Since the
+``CompositorVsyncDispatcher`` executes on the *Hardware Vsync Thread*
+and the ``Compositor`` composites on the ``CompositorThread``, the
+``CompositorVsyncScheduler::Observer`` posts a task to the
+``CompositorThread``. The ``CompositorBridgeParent`` then composites.
+This model, where the ``CompositorVsyncDispatcher`` notifies components
+on the *Hardware Vsync Thread* and each component then schedules a task
+on its own appropriate thread, is used everywhere.
+
+The ``CompositorVsyncScheduler::Observer`` listens to vsync events as
+needed and stops listening to vsync when composites are no longer
+scheduled or required. Every ``CompositorBridgeParent`` is tied to one
+``CompositorVsyncScheduler::Observer``, which is
+associated with the ``CompositorVsyncDispatcher``. Each
+``CompositorBridgeParent`` is associated with one widget and is created
+when a new platform window or ``nsBaseWidget`` is created. The
+``CompositorBridgeParent``, ``CompositorVsyncDispatcher``,
+``CompositorVsyncScheduler::Observer``, and ``nsBaseWidget`` all have
+the same lifetime; they are created and destroyed together.
+
+Out-of-process Compositors
+--------------------------
+
+When compositing out-of-process, this model changes slightly.
In this +case there are effectively two observers: a UI process observer +(``CompositorWidgetVsyncObserver``), and the +``CompositorVsyncScheduler::Observer`` in the GPU process. There are +also two dispatchers: the widget dispatcher in the UI process +(``CompositorVsyncDispatcher``), and the IPDL-based dispatcher in the +GPU process (``CompositorBridgeParent::NotifyVsync``). The UI process +observer and the GPU process dispatcher are linked via an IPDL protocol +called PVsyncBridge. ``PVsyncBridge`` is a top-level protocol for +sending vsync notifications to the compositor thread in the GPU process. +The compositor controls vsync observation through a separate actor, +``PCompositorWidget``, which (as a subactor for +``CompositorBridgeChild``) links the compositor thread in the GPU +process to the main thread in the UI process. + +Out-of-process compositors do not go through +``CompositorVsyncDispatcher`` directly. Instead, the +``CompositorWidgetDelegate`` in the UI process creates one, and gives it +a ``CompositorWidgetVsyncObserver``. This observer forwards +notifications to a Vsync I/O thread, where ``VsyncBridgeChild`` then +forwards the notification again to the compositor thread in the GPU +process. The notification is received by a ``VsyncBridgeParent``. The +GPU process uses the layers ID in the notification to find the correct +compositor to dispatch the notification to. + +CompositorVsyncDispatcher +------------------------- + +The ``CompositorVsyncDispatcher`` executes on the *Hardware Vsync +Thread*. It contains references to the ``nsBaseWidget`` it is associated +with and has a lifetime equal to the ``nsBaseWidget``. The +``CompositorVsyncDispatcher`` is responsible for notifying the +``CompositorBridgeParent`` that a vsync event has occurred. There can be +multiple ``CompositorVsyncDispatchers`` per ``Display``, one +``CompositorVsyncDispatcher`` per window. The only responsibility of the +``CompositorVsyncDispatcher`` is to notify components when a vsync event +has occurred, and to stop listening to vsync when no components require +vsync events. We require one ``CompositorVsyncDispatcher`` per window so +that we can handle multiple ``Displays``. When compositing in-process, +the ``CompositorVsyncDispatcher`` is attached to the CompositorWidget +for the window. When out-of-process, it is attached to the +CompositorWidgetDelegate, which forwards observer notifications over +IPDL. In the latter case, its lifetime is tied to a CompositorSession +rather than the nsIWidget. + +Multiple Displays +----------------- + +The ``VsyncSource`` has an API to switch a ``CompositorVsyncDispatcher`` +from one ``Display`` to another ``Display``. For example, when one +window either goes into full screen mode or moves from one connected +monitor to another. When one window moves to another monitor, we expect +a platform specific notification to occur. The detection of when a +window enters full screen mode or moves is not covered by Silk itself, +but the framework is built to support this use case. The expected flow +is that the OS notification occurs on ``nsIWidget``, which retrieves the +associated ``CompositorVsyncDispatcher``. The +``CompositorVsyncDispatcher`` then notifies the ``VsyncSource`` to +switch to the correct ``Display`` the ``CompositorVsyncDispatcher`` is +connected to. Because the notification works through the ``nsIWidget``, +the actual switching of the ``CompositorVsyncDispatcher`` to the correct +``Display`` should occur on the *Main Thread*. 
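+As a sketch, the switching API might look like this; the signatures are
+hypothetical and the real ``VsyncSource`` interface differs:
+
+::
+
+    #include <algorithm>
+    #include <vector>
+
+    struct CompositorVsyncDispatcher {};
+
+    struct Display {
+      std::vector<CompositorVsyncDispatcher*> mDispatchers;
+    };
+
+    struct VsyncSource {
+      // Expected to run on the Main Thread when a window moves to
+      // another monitor (or enters full screen mode there).
+      void MoveDispatcher(CompositorVsyncDispatcher* aDispatcher,
+                          Display& aFrom, Display& aTo) {
+        auto& from = aFrom.mDispatchers;
+        from.erase(std::remove(from.begin(), from.end(), aDispatcher),
+                   from.end());
+        // The dispatcher now ticks at aTo's vsync rate.
+        aTo.mDispatchers.push_back(aDispatcher);
+      }
+    };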
+The current implementation of Silk does not yet handle this
+display-switching case; that part of the framework needs to be built
+out.
+
+CompositorVsyncScheduler::Observer
+----------------------------------
+
+The ``CompositorVsyncScheduler::Observer`` handles the vsync
+notifications and interactions with the ``CompositorVsyncDispatcher``.
+When the ``Compositor`` requires a scheduled composite, it notifies the
+``CompositorVsyncScheduler::Observer`` that it needs to listen to vsync.
+The ``CompositorVsyncScheduler::Observer`` then observes/unobserves
+vsync as needed from the ``CompositorVsyncDispatcher`` to enable
+composites.
+
+GeckoTouchDispatcher
+--------------------
+
+The ``GeckoTouchDispatcher`` is a singleton that resamples touch events
+to smooth out jank while tracking a user’s finger. Because input and
+composite are linked together, the
+``CompositorVsyncScheduler::Observer`` has a reference to the
+``GeckoTouchDispatcher`` and vice versa.
+
+Input Events
+------------
+
+One large goal of Silk is to align touch events with vsync events. On
+Firefox OS, touchscreens often have a different touch scan rate than
+the display’s refresh rate. A Flame device has a touch refresh rate of
+75 Hz and a Nexus 4 has one of 100 Hz, while both devices’ display
+refresh rate is 60 Hz. When a vsync event occurs, we resample
+touch events, and then dispatch the resampled touch event to APZ. Touch
+events on Firefox OS occur on a *Touch Input Thread* whereas they are
+processed by APZ on the *APZ Controller Thread*. We use `Google
+Android’s touch
+resampling <https://web.archive.org/web/20200909082458/http://www.masonchang.com/blog/2014/8/25/androids-touch-resampling-algorithm>`__
+algorithm to resample touch events.
+
+Currently, we have a strict ordering between composites and touch
+events. When a touch event occurs on the *Touch Input Thread*, we store
+the touch event in a queue. When a vsync event occurs, the
+``CompositorVsyncDispatcher`` notifies the ``Compositor`` of a vsync
+event, which notifies the ``GeckoTouchDispatcher``. The
+``GeckoTouchDispatcher`` processes the touch event first on the *APZ
+Controller Thread*, which is the same as the *Compositor Thread* on b2g,
+then the ``Compositor`` finishes compositing. We require this strict
+ordering because if a vsync notification were dispatched to both the
+``Compositor`` and ``GeckoTouchDispatcher`` at the same time, we would
+race between processing the touch event (and thus updating the scroll
+position) and compositing. In practice, this creates very janky
+scrolling. As of this writing, we have not analyzed input events on
+desktop platforms.
+
+One slight quirk is that input events can start a composite, for example
+during a scroll when the ``Compositor`` is no longer listening to
+vsync events. In these cases, we notify the ``Compositor`` to observe
+vsync so that touch events get dispatched; since touch dispatch rides
+on vsync notifications, the queued touch events would otherwise never
+be dispatched. The ``GeckoTouchDispatcher`` handles this case by always
+forcing the ``Compositor`` to listen to vsync events while touch events
+are occurring.
+
+Widget, Compositor, CompositorVsyncDispatcher, GeckoTouchDispatcher Shutdown Procedure
+--------------------------------------------------------------------------------------
+
+When the `nsBaseWidget shuts
+down <https://hg.mozilla.org/mozilla-central/file/0df249a0e4d3/widget/nsBaseWidget.cpp#l182>`__,
+it calls nsBaseWidget::DestroyCompositor on the *Gecko Main Thread*.
+
+During nsBaseWidget::DestroyCompositor, it first destroys the
+CompositorBridgeChild. CompositorBridgeChild sends a sync IPC call to
+CompositorBridgeParent::RecvStop, which calls
+`CompositorBridgeParent::Destroy <https://hg.mozilla.org/mozilla-central/file/ab0490972e1e/gfx/layers/ipc/CompositorParent.cpp#l509>`__.
+During this time, the *main thread* is blocked on the parent process.
+CompositorBridgeParent::RecvStop runs on the *Compositor thread* and
+cleans up some resources, including setting the
+``CompositorVsyncScheduler::Observer`` to nullptr.
+CompositorBridgeParent::RecvStop also explicitly keeps the
+CompositorBridgeParent alive and posts another task to run
+CompositorBridgeParent::DeferredDestroy on the Compositor loop so that
+all IPDL code can finish executing. The
+``CompositorVsyncScheduler::Observer`` also unobserves from vsync and
+cancels any pending composite tasks. Once
+CompositorBridgeParent::RecvStop finishes, the *main thread* in the
+parent process continues shutting down the nsBaseWidget.
+
+At the same time, the *Compositor thread* is executing tasks until
+CompositorBridgeParent::DeferredDestroy runs, which flushes the
+compositor message loop. Now we have two release tasks: the nsBaseWidget
+releases a reference to the Compositor on the *main thread* during
+destruction, and CompositorBridgeParent::DeferredDestroy releases a
+reference to the CompositorBridgeParent on the *Compositor Thread*.
+Finally, the CompositorBridgeParent itself is destroyed on the *main
+thread* once both references are gone due to explicit `main thread
+destruction <https://hg.mozilla.org/mozilla-central/file/50b95032152c/gfx/layers/ipc/CompositorParent.h#l148>`__.
+
+From the ``CompositorVsyncScheduler::Observer``’s perspective, any
+access to the widget after nsBaseWidget::DestroyCompositor executes is
+invalid. Accesses to the compositor between the time
+nsBaseWidget::DestroyCompositor runs and the
+CompositorVsyncScheduler::Observer’s destructor runs aren’t safe
+either, since a hardware vsync event could occur between these two
+points. Since any tasks posted on the Compositor loop after
+CompositorBridgeParent::DeferredDestroy is posted are invalid, we make
+sure that no vsync tasks can be posted once
+CompositorBridgeParent::RecvStop executes and DeferredDestroy is posted
+on the Compositor thread. When the sync call to
+CompositorBridgeParent::RecvStop executes, we explicitly set the
+CompositorVsyncScheduler::Observer to null to prevent vsync
+notifications from occurring. If vsync notifications were allowed to
+occur, then since the ``CompositorVsyncScheduler::Observer``\ ’s vsync
+notification executes on the *hardware vsync thread*, it would post a
+task to the Compositor loop that may execute after
+CompositorBridgeParent::DeferredDestroy. Thus, we explicitly shut down
+vsync events in the ``CompositorVsyncDispatcher`` and
+``CompositorVsyncScheduler::Observer`` during nsBaseWidget::Shutdown to
+prevent any vsync tasks from executing after
+CompositorBridgeParent::DeferredDestroy.
+
+The ``CompositorVsyncDispatcher`` may be destroyed on either the *main
+thread* or *Compositor Thread*, since both the nsBaseWidget and
+``CompositorVsyncScheduler::Observer`` race to destroy on different
+threads. nsBaseWidget is destroyed on the *main thread* and releases a
+reference to the ``CompositorVsyncDispatcher`` during destruction.
+The ``CompositorVsyncScheduler::Observer`` races to be destroyed either
+during CompositorBridgeParent shutdown or from the
+``GeckoTouchDispatcher``, which is destroyed on the main thread with
+`ClearOnShutdown <https://hg.mozilla.org/mozilla-central/file/21567e9a6e40/xpcom/base/ClearOnShutdown.h#l15>`__.
+Whichever of the CompositorBridgeParent or the
+``GeckoTouchDispatcher`` is destroyed last holds the last reference
+to the ``CompositorVsyncDispatcher`` and destroys the object.
+
+Refresh Driver
+--------------
+
+The Refresh Driver is ticked from a `single active
+timer <https://hg.mozilla.org/mozilla-central/file/ab0490972e1e/layout/base/nsRefreshDriver.cpp#l11>`__.
+The assumption is that there are multiple ``RefreshDrivers`` connected
+to a single ``RefreshTimer``. There are two ``RefreshTimers``: an active
+and an inactive ``RefreshTimer``. Each Tab has its own
+``RefreshDriver``, which connects to one of the global
+``RefreshTimers``. The ``RefreshTimers`` execute on the *Main Thread*
+and tick their connected ``RefreshDrivers``. We do not want to break
+this model of multiple ``RefreshDrivers`` per set of two global
+``RefreshTimers``. Each ``RefreshDriver`` switches between the active
+and inactive ``RefreshTimer``.
+
+Instead, we create a new ``RefreshTimer``, the ``VsyncRefreshTimer``,
+which ticks based on vsync messages. We replace the current active timer
+with a ``VsyncRefreshTimer``. All tabs will then tick based on this new
+active timer. Since the ``RefreshTimer`` has the lifetime of the process,
+we only need to create a single ``VsyncDispatcher`` per
+``Display`` when Firefox starts. Even if we do not have any content
+processes, the Chrome process will still need a ``VsyncRefreshTimer``,
+thus we can associate the ``VsyncDispatcher`` with each
+``Display``.
+
+When Firefox starts, we initially create a new ``VsyncRefreshTimer`` in
+the Chrome process. The ``VsyncRefreshTimer`` will listen to vsync
+notifications from the ``VsyncDispatcher`` on the global
+``Display``. When nsRefreshDriver::Shutdown executes, it will delete the
+``VsyncRefreshTimer``. This creates a problem, as all the
+``RefreshTimers`` are currently manually memory managed whereas
+``VsyncObservers`` are ref counted. To work around this problem, we
+create a new ``RefreshDriverVsyncObserver`` as an inner class to
+``VsyncRefreshTimer``, which actually receives vsync notifications. It
+then ticks the ``RefreshDrivers`` inside ``VsyncRefreshTimer``.
+
+With Content processes, the startup process is more complicated. We
+send vsync IPC messages via the PBackground thread on the
+parent process, which allows us to send messages from the Parent
+process without waiting on the *main thread*. This sends messages from
+the Parent::\ *PBackground Thread* to the Child::\ *Main Thread*. The
+*main thread* receiving IPC messages on the content process is
+acceptable because ``RefreshDrivers`` must execute on the *main thread*.
+However, there is some amount of time required to set up the IPC
+connection upon process creation, and during this time the
+``RefreshDrivers`` must tick to set up the process. To get around this,
+we initially use software ``RefreshTimers`` that already exist during
+content process startup and swap in the ``VsyncRefreshTimer`` once the
+IPC connection is created.
+
+During nsRefreshDriver::ChooseTimer, we create an async PBackground IPC
+open request to create a ``VsyncParent`` and ``VsyncChild``; the
+software-to-vsync hand-off is sketched below.
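+A rough sketch of that hand-off, with hypothetical shapes (the real
+timers are manually managed inside nsRefreshDriver, and the swap
+involves more bookkeeping):
+
+::
+
+    #include <vector>
+
+    struct RefreshTimer {};
+    struct RefreshDriver { RefreshTimer* mTimer = nullptr; };
+
+    struct ContentProcessStartup {
+      std::vector<RefreshDriver*> mDrivers;
+      RefreshTimer mSoftwareTimer;  // ticks from a plain OS timer
+      RefreshTimer mVsyncTimer;     // ticks from vsync IPC messages
+
+      void OnStartup() {
+        // No IPC connection yet: tick everyone from the software timer.
+        for (RefreshDriver* d : mDrivers) d->mTimer = &mSoftwareTimer;
+      }
+
+      void OnPBackgroundConnected() {
+        // The VsyncChild exists now: swap every driver over to vsync.
+        for (RefreshDriver* d : mDrivers) d->mTimer = &mVsyncTimer;
+      }
+    };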
+At the same time, we create a software ``RefreshTimer`` and tick the
+``RefreshDrivers`` as normal. Once the PBackground callback is executed
+and an IPC connection exists, we take all ``RefreshDrivers`` currently
+associated with the active ``RefreshTimer`` and swap them over to the
+``VsyncRefreshTimer``. Since all interactions on the content process
+occur on the main thread, there is no need for locks. The ``VsyncParent``
+listens to vsync events through the ``VsyncRefreshTimerDispatcher`` on
+the parent side and sends vsync IPC messages to the ``VsyncChild``. The
+``VsyncChild`` notifies the ``VsyncRefreshTimer`` on the content
+process.
+
+During the shutdown process of the content process, ActorDestroy is
+called on the ``VsyncChild`` and ``VsyncParent`` due to the normal
+PBackground shutdown process. Once ActorDestroy is called, no IPC
+messages should be sent across the channel. After ActorDestroy is
+called, the IPDL machinery will delete the **VsyncParent/Child** pair.
+The ``VsyncParent``, due to being a ``VsyncObserver``, is ref counted.
+After ``VsyncParent::ActorDestroy`` is called, it unregisters itself
+from the ``VsyncDispatcher``, which holds the last reference
+to the ``VsyncParent``, and the object will be deleted.
+
+Thus the overall flow during normal execution is:
+
+1. VsyncSource::Display::VsyncDispatcher receives a vsync
+   notification from the OS in the parent process.
+2. The VsyncDispatcher notifies
+   VsyncRefreshTimer::RefreshDriverVsyncObserver that a vsync occurred,
+   on the parent process on the hardware vsync thread.
+3. The VsyncDispatcher notifies the VsyncParent on the hardware
+   vsync thread that a vsync occurred.
+4. The VsyncRefreshTimer::RefreshDriverVsyncObserver in the parent
+   process posts a task to the main thread that ticks the refresh
+   drivers.
+5. The VsyncParent posts a task to the PBackground thread to send a
+   vsync IPC message to the VsyncChild.
+6. The VsyncChild receives a vsync notification on the content process
+   on the main thread and ticks its respective RefreshDrivers.
+
+Compressing Vsync Messages
+--------------------------
+
+Vsync messages occur quite often and the *main thread* can be busy for
+long periods of time due to JavaScript. Consistently sending vsync
+messages to the refresh driver timer can flood the *main thread* with
+refresh driver ticks, causing even more delays. To avoid this problem,
+we compress vsync messages on both the parent and child processes.
+
+On the parent process, newer vsync messages update a vsync timestamp but
+do not actually queue any tasks on the *main thread*. Once the parent
+process’ *main thread* executes the refresh driver tick, it uses the
+most updated vsync timestamp to tick the refresh driver. After the
+refresh driver has ticked, one single vsync message is queued for
+another refresh driver tick task. On the content process, the IPDL
+``compress`` keyword automatically compresses IPC messages.
+
+Multiple Monitors
+-----------------
+
+In order to have multiple monitor support for the ``RefreshDrivers``, we
+have multiple active ``RefreshTimers``. Each ``RefreshTimer`` is
+associated with a specific ``Display`` via an ID and ticks when its
+respective ``Display``’s vsync occurs. We have **N RefreshTimers**, where
+N is the number of connected displays. Each ``RefreshTimer`` still has
+multiple ``RefreshDrivers``.
+
+When a tab or window changes monitors, the ``nsIWidget`` receives a
+display changed notification; the ID-based routing that follows is
+sketched below.
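+That routing might look roughly like the following sketch (hypothetical
+names; the real ``RefreshTimer`` bookkeeping differs):
+
+::
+
+    #include <cstdint>
+    #include <map>
+
+    struct TimeStamp {};
+
+    struct RefreshTimer {
+      void Tick(TimeStamp aVsync) { /* tick the attached RefreshDrivers */ }
+    };
+
+    struct RefreshTimerRouter {
+      std::map<uint32_t, RefreshTimer> mTimersByDisplayId;  // N displays
+
+      // Each vsync IPC message carries the originating display's ID.
+      void NotifyVsync(uint32_t aDisplayId, TimeStamp aVsync) {
+        auto it = mTimersByDisplayId.find(aDisplayId);
+        if (it != mTimersByDisplayId.end()) {
+          it->second.Tick(aVsync);  // only that display's drivers tick
+        }
+      }
+    };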
+Based on which display the window is on, the window switches to the
+correct ``VsyncDispatcher`` and ``CompositorVsyncDispatcher`` in the
+parent process, keyed by the display ID. Each ``TabParent`` should also
+send a notification to its child. Each ``TabChild``, given the display
+ID, switches to the correct ``RefreshTimer`` associated with that
+display ID. When a display’s vsync occurs, we send one IPC message to
+notify vsync; the message contains the display ID, so the appropriate
+``RefreshTimer`` on the content process can be ticked. There is still
+only one **VsyncParent/VsyncChild** pair, but each vsync notification
+includes a display ID, which maps to the correct ``RefreshTimer``.
+
+Object Lifetime
+---------------
+
+1. CompositorVsyncDispatcher - Lives as long as the nsBaseWidget
+   associated with the VsyncDispatcher.
+2. CompositorVsyncScheduler::Observer - Lives and dies at the same time
+   as the CompositorBridgeParent.
+3. VsyncDispatcher - Lives as long as the associated display
+   object, which is the lifetime of Firefox.
+4. VsyncSource - Lives as long as the gfxPlatform on the chrome process,
+   which is the lifetime of Firefox.
+5. VsyncParent/VsyncChild - Lives as long as the content process.
+6. RefreshTimer - Lives as long as the process.
+
+Threads
+-------
+
+All ``VsyncObservers`` are notified on the *Hardware Vsync Thread*. It
+is the responsibility of the ``VsyncObservers`` to post tasks to their
+respective threads. For example, the
+``CompositorVsyncScheduler::Observer`` will be notified on the *Hardware
+Vsync Thread*, and post a task to the *Compositor Thread* to do the
+actual composition.
+
+1. Compositor Thread - Nothing changes.
+2. Main Thread - PVsyncChild receives IPC messages on the main thread.
+   We also enable/disable vsync on the main thread.
+3. PBackground Thread - Creates a connection from the PBackground thread
+   on the parent process to the main thread in the content process.
+4. Hardware Vsync Thread - Every platform is different, but we always
+   have the concept of a hardware vsync thread. Sometimes this is
+   actually created by the host OS. On Windows, we have to create a
+   separate platform thread that blocks on DwmFlush().
diff --git a/gfx/docs/SilkArchitecture.png b/gfx/docs/SilkArchitecture.png
Binary files differ
new file mode 100644
index 0000000000..938c585e40
--- /dev/null
+++ b/gfx/docs/SilkArchitecture.png
diff --git a/gfx/docs/index.rst b/gfx/docs/index.rst
new file mode 100644
index 0000000000..240923eb02
--- /dev/null
+++ b/gfx/docs/index.rst
@@ -0,0 +1,18 @@
+Graphics
+========
+
+This collection of linked pages contains design documents for the
+Mozilla graphics architecture. The design documents live in the
+gfx/docs directory.
+
+This `wiki page <https://wiki.mozilla.org/Platform/GFX>`__ contains
+information about graphics and the graphics team at Mozilla.
+
+.. toctree::
+  :maxdepth: 1
+
+  RenderingOverview
+  LayersHistory
+  AsyncPanZoom
+  AdvancedLayers
+  Silk
+  Moz2D