Off Main Thread Painting
========================
OMTP, or ‘off main thread painting’, is the component of Gecko that
allows us to perform painting of web content off of the main thread.
This gives us more time on the main thread for javascript, layout,
display list building, and other tasks which allows us to increase our
responsiveness.
Take a look at this `blog
post `__
for an introduction.
Background
----------
Painting (or rasterization) is the last operation that happens in a
layer transaction before we forward it to the compositor. At this point,
all display items have been assigned to a layer and invalid regions have
been calculated and assigned to each layer.
The painted layer uses a content client to acquire a buffer for
painting. The main purpose of the content client is to allow us to
retain already painted content when we are scrolling a layer. We have
two main strategies for this, rotated buffer and tiling.
This is implemented with two class hierarchies. ``ContentClient`` for
rotated buffer and ``TiledContentClient`` for tiling. Additionally we
have two different painted layer implementations, ``ClientPaintedLayer``
and ``ClientTiledPaintedLayer``.
The main distinction between rotated buffer and tiling is the amount of
graphics surfaces required. Rotated buffer utilizes just a single buffer
for a frame but potentially requires painting into it multiple times.
Tiling uses multiple buffers but doesn’t require painting into the
buffers multiple times.
Once the painted layer has a surface (or surfaces with tiling) to paint
into, they are wrapped in a ``DrawTarget`` of some form and a callback
to ``FrameLayerBuilder`` is called. This callback uses the assigned
display items and invalid regions to trigger rasterization. Each
``nsDisplayItem`` has their ``Paint`` method called with the provided
``DrawTarget`` that represents the surface, and they paint into it.
High level
----------
The key abstraction that allows us to paint off of the main thread is
``DrawTargetCapture`` [1]_. ``DrawTargetCapture`` is a special
``DrawTarget`` which records all draw commands for replaying to another
draw target in the local process. This is similar to
``DrawTargetRecording``, but only holds a reference to resources instead
of copying them into the command stream. This allows the command stream
to be much more lightweight than ``DrawTargetRecording``.
OMTP works by instrumenting the content clients to use a capture target
for all painting [2]_ [3]_ [4]_ [5]_. This capture draw target records all
the operations that would normally be performed directly on the
surface’s draw target. Once we have all of the commands, we send the
capture and surface draw target to the ``PaintThread`` [6]_ where the
commands are replayed onto the surface. Once the rasterization is done,
we forward the layer transaction to the compositor.
Tiling and parallel painting
----------------------------
We can make one additional improvement if we are using tiling as our
content client backend.
When we are tiling, the screen is subdivided into a grid of equally
sized surfaces and draw commands are performed on the tiles they affect.
Each tile is independent of the others, so we’re able to parallelize
painting by using a worker thread pool and dispatching a task for each
tile individually.
This is commonly referred to as P-OMTP or parallel painting.
Main thread rasterization
-------------------------
Even with OMTP it’s still possible for the main thread to perform
rasterization. A common pattern for painting code is to create a
temporary draw target, perform drawing with it, take a snapshot, and
then draw the snapshot onto the main draw target. This is done for
blurs, box shadows, text shadows, and with the basic layer manager
fallback.
If the temporary draw target is not a draw target capture, then this
will perform rasterization on the main thread. This can be bad as it
lowers our parallelism and can cause contention with content backends,
like Direct2D, that use locking around shared resources.
To work around this, we changed the main thread painting code to use a
draw target capture for these operations and added a source surface
capture [7]_ which only resolves the painting of the draw commands when
needed on the paint thread.
There are still possible cases we can perform main thread rasterization,
but we try and address them when they come up.
Out of memory issues
--------------------
The web is very complex, and so we can sometimes have a large amount of
draw commands for a content paint. We’ve observed OOM errors for capture
command lists that have grown to be 200MiB large.
We initially tried to mitigate this by lowering the overhead of capture
command lists. We do this by filtering commands that don’t actually
change the draw target state and folding consecutive transform changes,
but that was not always enough. So we added the ability for our draw
target capture’s to flush their command lists to the surface draw target
while we are capturing on the main thread [8]_.
This is triggered by a configurable memory limit. Because this
introduces a new source of main thread rasterization we try to balance
setting this too low and suffering poor performance, or setting this too
high and suffering crashes.
Synchronization
---------------
OMTP is conceptually simple, but in practice it relies on subtle code to
ensure thread safety. This was the most arguably the most difficult part
of the project.
There are roughly four areas that are critical.
1. Compositor message ordering
Immediately after we queue the async paints to be asynchronously
completed, we have a problem. We need to forward the layer
transaction at some point, but the compositor cannot process the
transaction until all async paints have finished. If it did, it could
access unfinished painted content.
We obviously can’t block on the async paints completing as that would
beat the whole point of OMTP. We also can’t hold off on sending the
layer transaction to ``IPDL``, as we’d trigger race conditions for
messages sent after the layer transaction is built but before it is
forwarded. Reftest and other code assumes that messages sent after a
layer transaction to the compositor are processed after that layer
transaction is processed.
The solution is to forward the layer transaction to the compositor
over ``IPDL``, but flag the message channel to start postponing
messages [9]_. Then once all async paints have completed, we unflag
the message channel and all postponed messages are sent [10]_. This
allows us to keep our message ordering guarantees and not have to
worry about scheduling a runnable in the future.
2. Texture clients
The backing store for content surfaces is managed by texture client.
While async paints are executing, it’s possible for shutdown or any
number of things to happen that could cause layer manager, all
layers, all content clients, and therefore all texture clients to be
destroyed. Therefore it’s important that we keep these texture
clients alive throughout async painting. Texture clients also manage
IPC resources and must be destroyed on the main thread, so we are
careful to do that [11]_.
3. Double buffering
We currently double buffer our content painting - our content clients
only ever have zero or one texture that is available to be painted
into at any moment.
This implies that we cannot start async painting a layer tree while
previous async paints are still active as this would lead to awful
races. We also don’t support multiple nested sets of postponed IPC
messages to allow sending the first layer transaction to the
compositor, but not the second.
To prevent issues with this, we flush all active async paints before
we begin to paint a new layer transaction [12]_.
There was some initial debate about implementing triple buffering for
content painting, but we have not seen evidence it would help us
significantly.
4. Moz2D thread safety
Finally, most Moz2D objects were not thread safe. We had to insert
special locking into draw target and source surface as they have a
special copy on write relationship that must be consistent even if
they are on different threads.
Some platform specific resources like fonts needed locking added in
order to be thread safe. We also did some work to make filter nodes
work with multiple threads executing them at the same time.
Browser process
---------------
Currently only content processes are able to use OMTP.
This restriction was added because of concern about message ordering
between ``APZ`` and OMTP. It might be able to lifted in the future.
Important bugs
--------------
1. `OMTP Meta `__
2. `Enable on
Windows `__
3. `Enable on
OSX `__
4. `Enable on
Linux `__
5. `Parallel
painting `__
Code links
----------
.. [1] `DrawTargetCapture `__
.. [2] `Creating DrawTargetCapture for rotated
buffer `__
.. [3] `Dispatch DrawTargetCapture for rotated
buffer `__
.. [4] `Creating DrawTargetCapture for
tiling `__
.. [5] `Dispatch DrawTargetCapture for
tiling `__
.. [6] `PaintThread `__
.. [7] `SourceSurfaceCapture `__
.. [8] `Sync flushing draw
commands `__
.. [9] `Postponing messages for
PCompositorBridge `__
.. [10] `Releasing messages for
PCompositorBridge `__
.. [11] `Releasing texture clients on main
thread `__
.. [12] `Flushing async
paints `__