summaryrefslogtreecommitdiffstats
path: root/gfx/docs/OffMainThreadPainting.rst
blob: c5a75f602557aac449dd371a517a9239fa3ca415 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
Off Main Thread Painting
========================

OMTP, or ‘off main thread painting’, is the component of Gecko that
allows us to perform painting of web content off of the main thread.
This gives us more time on the main thread for javascript, layout,
display list building, and other tasks which allows us to increase our
responsiveness.

Take a look at this `blog
post <https://mozillagfx.wordpress.com/2017/12/05/off-main-thread-painting/>`__
for an introduction.

Background
----------

Painting (or rasterization) is the last operation that happens in a
layer transaction before we forward it to the compositor. At this point,
all display items have been assigned to a layer and invalid regions have
been calculated and assigned to each layer.

The painted layer uses a content client to acquire a buffer for
painting. The main purpose of the content client is to allow us to
retain already painted content when we are scrolling a layer. We have
two main strategies for this, rotated buffer and tiling.

This is implemented with two class hierarchies. ``ContentClient`` for
rotated buffer and ``TiledContentClient`` for tiling. Additionally we
have two different painted layer implementations, ``ClientPaintedLayer``
and ``ClientTiledPaintedLayer``.

The main distinction between rotated buffer and tiling is the amount of
graphics surfaces required. Rotated buffer utilizes just a single buffer
for a frame but potentially requires painting into it multiple times.
Tiling uses multiple buffers but doesn’t require painting into the
buffers multiple times.

Once the painted layer has a surface (or surfaces with tiling) to paint
into, they are wrapped in a ``DrawTarget`` of some form and a callback
to ``FrameLayerBuilder`` is called. This callback uses the assigned
display items and invalid regions to trigger rasterization. Each
``nsDisplayItem`` has their ``Paint`` method called with the provided
``DrawTarget`` that represents the surface, and they paint into it.

High level
----------

The key abstraction that allows us to paint off of the main thread is
``DrawTargetCapture`` [1]_. ``DrawTargetCapture`` is a special
``DrawTarget`` which records all draw commands for replaying to another
draw target in the local process. This is similar to
``DrawTargetRecording``, but only holds a reference to resources instead
of copying them into the command stream. This allows the command stream
to be much more lightweight than ``DrawTargetRecording``.

OMTP works by instrumenting the content clients to use a capture target
for all painting [2]_ [3]_ [4]_ [5]_. This capture draw target records all
the operations that would normally be performed directly on the
surface’s draw target. Once we have all of the commands, we send the
capture and surface draw target to the ``PaintThread`` [6]_ where the
commands are replayed onto the surface. Once the rasterization is done,
we forward the layer transaction to the compositor.

Tiling and parallel painting
----------------------------

We can make one additional improvement if we are using tiling as our
content client backend.

When we are tiling, the screen is subdivided into a grid of equally
sized surfaces and draw commands are performed on the tiles they affect.
Each tile is independent of the others, so we’re able to parallelize
painting by using a worker thread pool and dispatching a task for each
tile individually.

This is commonly referred to as P-OMTP or parallel painting.

Main thread rasterization
-------------------------

Even with OMTP it’s still possible for the main thread to perform
rasterization. A common pattern for painting code is to create a
temporary draw target, perform drawing with it, take a snapshot, and
then draw the snapshot onto the main draw target. This is done for
blurs, box shadows, text shadows, and with the basic layer manager
fallback.

If the temporary draw target is not a draw target capture, then this
will perform rasterization on the main thread. This can be bad as it
lowers our parallelism and can cause contention with content backends,
like Direct2D, that use locking around shared resources.

To work around this, we changed the main thread painting code to use a
draw target capture for these operations and added a source surface
capture [7]_ which only resolves the painting of the draw commands when
needed on the paint thread.

There are still possible cases we can perform main thread rasterization,
but we try and address them when they come up.

Out of memory issues
--------------------

The web is very complex, and so we can sometimes have a large amount of
draw commands for a content paint. We’ve observed OOM errors for capture
command lists that have grown to be 200MiB large.

We initially tried to mitigate this by lowering the overhead of capture
command lists. We do this by filtering commands that don’t actually
change the draw target state and folding consecutive transform changes,
but that was not always enough. So we added the ability for our draw
target capture’s to flush their command lists to the surface draw target
while we are capturing on the main thread [8]_.

This is triggered by a configurable memory limit. Because this
introduces a new source of main thread rasterization we try to balance
setting this too low and suffering poor performance, or setting this too
high and suffering crashes.

Synchronization
---------------

OMTP is conceptually simple, but in practice it relies on subtle code to
ensure thread safety. This was the most arguably the most difficult part
of the project.

There are roughly four areas that are critical.

1. Compositor message ordering

   Immediately after we queue the async paints to be asynchronously
   completed, we have a problem. We need to forward the layer
   transaction at some point, but the compositor cannot process the
   transaction until all async paints have finished. If it did, it could
   access unfinished painted content.

   We obviously can’t block on the async paints completing as that would
   beat the whole point of OMTP. We also can’t hold off on sending the
   layer transaction to ``IPDL``, as we’d trigger race conditions for
   messages sent after the layer transaction is built but before it is
   forwarded. Reftest and other code assumes that messages sent after a
   layer transaction to the compositor are processed after that layer
   transaction is processed.

   The solution is to forward the layer transaction to the compositor
   over ``IPDL``, but flag the message channel to start postponing
   messages [9]_. Then once all async paints have completed, we unflag
   the message channel and all postponed messages are sent [10]_. This
   allows us to keep our message ordering guarantees and not have to
   worry about scheduling a runnable in the future.

2. Texture clients

   The backing store for content surfaces is managed by texture client.
   While async paints are executing, it’s possible for shutdown or any
   number of things to happen that could cause layer manager, all
   layers, all content clients, and therefore all texture clients to be
   destroyed. Therefore it’s important that we keep these texture
   clients alive throughout async painting. Texture clients also manage
   IPC resources and must be destroyed on the main thread, so we are
   careful to do that [11]_.

3. Double buffering

   We currently double buffer our content painting - our content clients
   only ever have zero or one texture that is available to be painted
   into at any moment.

   This implies that we cannot start async painting a layer tree while
   previous async paints are still active as this would lead to awful
   races. We also don’t support multiple nested sets of postponed IPC
   messages to allow sending the first layer transaction to the
   compositor, but not the second.

   To prevent issues with this, we flush all active async paints before
   we begin to paint a new layer transaction [12]_.

   There was some initial debate about implementing triple buffering for
   content painting, but we have not seen evidence it would help us
   significantly.

4. Moz2D thread safety

   Finally, most Moz2D objects were not thread safe. We had to insert
   special locking into draw target and source surface as they have a
   special copy on write relationship that must be consistent even if
   they are on different threads.

   Some platform specific resources like fonts needed locking added in
   order to be thread safe. We also did some work to make filter nodes
   work with multiple threads executing them at the same time.

Browser process
---------------

Currently only content processes are able to use OMTP.

This restriction was added because of concern about message ordering
between ``APZ`` and OMTP. It might be able to lifted in the future.

Important bugs
--------------

1. `OMTP Meta <https://bugzilla.mozilla.org/show_bug.cgi?id=omtp>`__
2. `Enable on
   Windows <https://bugzilla.mozilla.org/show_bug.cgi?id=1403935>`__
3. `Enable on
   OSX <https://bugzilla.mozilla.org/show_bug.cgi?id=1422392>`__
4. `Enable on
   Linux <https://bugzilla.mozilla.org/show_bug.cgi?id=1432531>`__
5. `Parallel
   painting <https://bugzilla.mozilla.org/show_bug.cgi?id=1425056>`__

Code links
----------

.. [1]  `DrawTargetCapture <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/2d/DrawTargetCapture.h#22>`__
.. [2]  `Creating DrawTargetCapture for rotated
    buffer <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/client/ContentClient.cpp#185>`__
.. [3]  `Dispatch DrawTargetCapture for rotated
    buffer <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/client/ClientPaintedLayer.cpp#99>`__
.. [4]  `Creating DrawTargetCapture for
    tiling <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/client/TiledContentClient.cpp#714>`__
.. [5]  `Dispatch DrawTargetCapture for
    tiling <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/client/MultiTiledContentClient.cpp#288>`__
.. [6]  `PaintThread <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/PaintThread.h#53>`__
.. [7]  `SourceSurfaceCapture <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/2d/SourceSurfaceCapture.h#19>`__
.. [8] `Sync flushing draw
    commands <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/2d/DrawTargetCapture.h#165>`__
.. [9]  `Postponing messages for
    PCompositorBridge <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/ipc/CompositorBridgeChild.cpp#1319>`__
.. [10]  `Releasing messages for
    PCompositorBridge <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/ipc/CompositorBridgeChild.cpp#1303>`__
.. [11] `Releasing texture clients on main
    thread <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/ipc/CompositorBridgeChild.cpp#1170>`__
.. [12] `Flushing async
    paints <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/client/ClientLayerManager.cpp#289>`__