tools/sanitizer/docs/tsan.rst


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327

Thread Sanitizer
================

What is Thread Sanitizer?
--------------------------

Thread Sanitizer (TSan) is a fast data race detector for C/C++ and Rust
programs. It uses a compile-time instrumentation to check all non-race-free
memory access at runtime. Unlike other tools, it understands compiler-builtin
atomics and synchronization and therefore provides very accurate results
with no false positives (except if unsupported synchronization primitives
like inline assembly or memory fences are used). More information on how
TSan works can be found on `the Thread Sanitizer wiki <https://github.com/google/sanitizers/wiki/ThreadSanitizerAlgorithm>`__.

A `meta bug called tsan <https://bugzilla.mozilla.org/show_bug.cgi?id=tsan>`__
is maintained to keep track of all the bugs found with TSan.

A `blog post on hacks.mozilla.org <https://hacks.mozilla.org/2021/04/eliminating-data-races-in-firefox-a-technical-report/>`__ describes this project.

Note that unlike other sanitizers, TSan is currently **only supported on Linux**.

Downloading artifact builds
---------------------------

The easiest way to get Firefox builds with Thread Sanitizer is to download a
continuous integration TSan build of mozilla-central (updated at least daily):

-  mozilla-central optimized builds:
   `linux <https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.latest.firefox.linux64-tsan-opt/artifacts/public/build/target.tar.bz2>`__

The fuzzing team also offers a tool called ``fuzzfetch`` to download this and many
other CI builds. It makes downloading and unpacking these builds much easier and
can be used not just for fuzzing but for all purposes that require a CI build download.

You can install ``fuzzfetch`` from
`Github <https://github.com/MozillaSecurity/fuzzfetch>`__ or
`via pip <https://pypi.org/project/fuzzfetch/>`__.

Afterwards, you can run

::

   $ python -m fuzzfetch --tsan -n firefox-tsan

to get the build mentioned above unpacked into a directory called ``firefox-tsan``.

Creating Try builds
-------------------

If for some reason you can't use the pre-built binaries mentioned in the
previous section (e.g. you need to test a patch), you can either build
Firefox yourself (see the following section) or use the :ref:`try server <Pushing to Try>`
to create the customized build for you. Pushing to try requires L1 commit
access. If you don't have this access yet you can request access (see
`Becoming A Mozilla
Committer <https://www.mozilla.org/about/governance/policies/commit/>`__
and `Mozilla Commit Access
Policy <https://www.mozilla.org/about/governance/policies/commit/access-policy/>`__
for the requirements).

Using ``mach try fuzzy --full`` you can select the ``build-linux64-tsan/opt`` job
and related tests (if required).

Creating local builds on Linux
------------------------------

Build prerequisites
~~~~~~~~~~~~~~~~~~~

LLVM/Clang/Rust
^^^^^^^^^^^^^^^

The TSan instrumentation is implemented as an LLVM pass and integrated
into Clang. We strongly recommend that you use the Clang version supplied
as part of the ``mach bootstrap`` process, as we backported several required
fixes for TSan on Firefox.

Sanitizer support in Rust is genuinely experimental,
so our build system only works with a specially patched version of Rust
that we build in our CI. To install that specific version (or update to a newer
version), run the following in the root of your mozilla-central checkout:

::

    ./mach artifact toolchain --from-build linux64-rust-dev
    rm -rf ~/.mozbuild/rustc-sanitizers
    mv rustc ~/.mozbuild/rustc-sanitizers
    rustup toolchain link gecko-sanitizers ~/.mozbuild/rustc-sanitizers
    rustup override set gecko-sanitizers

``mach artifact`` will always download the ``linux64-rust-dev`` toolchain associated
with the current mozilla central commit you have checked out. The toolchain should
mostly behave like a normal rust nightly but we don't recommend using it for anything
other than building gecko, just in case. Also note that
``~/.mozbuild/rustc-sanitizers`` is just a reasonable default location -- feel
free to "install" the toolchain wherever you please.

Building Firefox
~~~~~~~~~~~~~~~~

Getting the source
^^^^^^^^^^^^^^^^^^

Using that or any later revision, all you need to do is to :ref:`get yourself
a clone of mozilla-central <Mercurial overview>`.

Adjusting the build configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create the build configuration file ``mozconfig`` with the following
content in your mozilla-central directory:

::

   # Combined .mozconfig file for TSan on Linux+Mac

   mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdir-ff-tsan

   # Enable ASan specific code and build workarounds
   ac_add_options --enable-thread-sanitizer

   # This ensures that we also instrument Rust code.
   export RUSTFLAGS="-Zsanitizer=thread"

   # rustfmt is currently missing in Rust nightly
   unset RUSTFMT

   # Current Rust Nightly has warnings
   ac_add_options --disable-warnings-as-errors

   # These are required by TSan
   ac_add_options --disable-jemalloc
   ac_add_options --disable-crashreporter
   ac_add_options --disable-elf-hack
   ac_add_options --disable-profiling

   # The Thread Sanitizer is not compatible with sandboxing
   # (see bug 1182565)
   ac_add_options --disable-sandbox

   # Keep symbols to symbolize TSan traces later
   export MOZ_DEBUG_SYMBOLS=1
   ac_add_options --enable-debug-symbols
   ac_add_options --disable-install-strip

   # Settings for an opt build (preferred)
   # The -gline-tables-only ensures that all the necessary debug information for ASan
   # is present, but the rest is stripped so the resulting binaries are smaller.
   ac_add_options --enable-optimize="-O2 -gline-tables-only"
   ac_add_options --disable-debug

   # Settings for a debug+opt build
   #ac_add_options --enable-optimize
   #ac_add_options --enable-debug


Starting the build process
^^^^^^^^^^^^^^^^^^^^^^^^^^

Now you start the build process using the regular ``./mach build``
command.

Starting Firefox
^^^^^^^^^^^^^^^^

After the build has completed, ``./mach run`` with the usual options for
running in a debugger (``gdb``, ``lldb``, ``rr``, etc.) work fine, as do
the ``--disable-e10s`` and other options.

Building only the JavaScript shell
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you want to build only the JavaScript shell instead of doing a full
Firefox build, the build script below will probably help you to do so.
Execute this script in the ``js/src/`` subdirectory and pass a directory
name as the first parameter. The build will then be created in a new
subdirectory with that name.

::

   #! /bin/sh

   if [ -z $1 ] ; then
        echo "usage: $0 <dirname>"
   elif [ -d $1 ] ; then
        echo "directory $1 already exists"
   else
        autoconf2.13
        mkdir $1
        cd $1
        CC="/path/to/mozbuild/clang" \
        CXX="/path/to/mozbuild/clang++" \
        ../configure --disable-debug --enable-optimize="-O2 -gline-tables-only" --enable-thread-sanitizer --disable-jemalloc
   fi

Thread Sanitizer and Symbols
----------------------------

Unlike Address Sanitizer, TSan requires in-process symbolizing to work
properly in the first place, as any kind of runtime suppressions will
otherwise not work.

Hence, it is required that you have a copy of ``llvm-symbolizer`` either
in your ``PATH`` or pointed to by the ``TSAN_SYMBOLIZER_PATH`` environment
variable. This binary is included in your local mozbuild directory, obtained
by ``./mach bootstrap``.


Runtime Suppressions
--------------------

TSan has the ability to suppress race reports at runtime. This can be used to
silence a race while a fix is developed as well as to permanently silence a
(benign) race that cannot be fixed.

.. warning::
       **Warning**: Many races *look* benign but are indeed not. Please read
       the :ref:`FAQ section <Frequently Asked Questions about TSan>` carefully
       and think twice before attempting to suppress a race.

The runtime Suppression list is directly baked into Firefox at compile-time and
located at `mozglue/build/TsanOptions.cpp <https://searchfox.org/mozilla-central/source/mozglue/build/TsanOptions.cpp>`__.

.. warning::
       **Important**: When adding a suppression, always make sure to include
       the bug number. If the suppression is supposed to be permanent, please
       add the string ``permanent`` in the same line as the bug number.

.. warning::
       **Important**: When adding a suppression for a *data race*, always make
       sure to include a stack frame from **each** of the two race stacks.
       Adding only one suppression for one stack can cause intermittent failures
       that are later on hard to track. One exception to this rule is when suppressing
       races on global variables. In that case, a single race entry with the name of
       the variable is sufficient.

Troubleshooting / Known Problems
--------------------------------

Known Sources of False Positives
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TSan has a number of things that can cause false positives, namely:

  * The use of memory fences (e.g. Rust Arc)
  * The use of inline assembly for synchronization
  * Uninstrumented code (e.g. external libraries) using compiler-builtins for synchronization
  * A lock order inversion involving only a single thread can cause a false positive deadlock
    report (see also https://github.com/google/sanitizers/issues/488).

If none of these four items are involved, you should *never* assume that TSan is reporting
a false positive to you without consulting TSan peers. It is very easy to misjudge a race
to be a false positive because races can be highly complex and totally non-obvious due to
compiler optimizations and the nature of parallel code.

Intermittent Broken Stacks
~~~~~~~~~~~~~~~~~~~~~~~~~~

If you intermittently see race reports where one stack is missing with a ``failed to restore the stack``
message, this can indicate that a suppression is partially covering the race you are seeing.

Any race where only one of the two stacks is matched by a runtime suppression will show up
if that particular stack fails to symbolize for some reason. The usual solution is to search
the suppressions for potential candidates and disable them temporarily to check if your race
report now becomes mostly consistent.

However, there are other reasons for broken TSan stacks, in particular if they are not intermittent.
See also the ``history_size`` parameter in the `TSan flags <https://github.com/google/sanitizers/wiki/ThreadSanitizerFlags>`__.

Intermittent Race Reports
~~~~~~~~~~~~~~~~~~~~~~~~~

Unfortunately, the TSan algorithm does not guarantee, that a race is detected 100% of the
time. Intermittent failures with TSan are (to a certain degree) to be expected and the races
involved should be filed and fixed to solve the problem.

.. _Frequently Asked Questions about TSan:

Frequently Asked Questions about TSan
-------------------------------------

Why fix data races?
~~~~~~~~~~~~~~~~~~~

Data races are undefined behavior and can cause crashes as well as correctness issues.
Compiler optimizations can cause racy code to have unpredictable and hard-to-reproduce behavior.

At Mozilla, we have already seen several dangerous races, causing random
`use-after-free crashes <https://bugzilla.mozilla.org/show_bug.cgi?id=1580288>`__,
`intermittent test failures <https://bugzilla.mozilla.org/show_bug.cgi?id=1602009>`__,
`hangs <https://bugzilla.mozilla.org/show_bug.cgi?id=1607008>`__,
`performance issues <https://bugzilla.mozilla.org/show_bug.cgi?id=1615045>`__ and
`intermittent asserts <https://bugzilla.mozilla.org/show_bug.cgi?id=1601940>`__. Such problems do
not only decrease the quality of our code and user experience, but they also waste countless hours
of developer time.

Since it is very hard to judge if a particular race could cause such a situation, we
have decided to fix all data races wherever possible, since doing so is often cheaper
than analyzing a race.

My race is benign, can we ignore it?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While it is possible to add a runtime suppression to ignore the race, we *strongly* encourage
you to not do so, for two reasons:

    1. Each suppressed race decreases the overall performance of the TSan build, as the race
       has to be symbolized each time when it occurs. Since TSan is already in itself a slow
       build, we need to keep the amount of suppressed races as low as possible.

    2. Deciding if a race is truly benign is surprisingly hard. We recommend to read
       `this blog post <http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong>`__
       and `this paper <https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf>`
       on the effects of seemingly benign races.

Valid reasons to suppress a confirmed benign race include performance problems arising from
fixing the race or cases where fixing the race would require an unreasonable amount of work.

Note that the use of atomics usually does not have the bad performance impact that developers
tend to associate with it. If you assume that e.g. using atomics for synchronization will
cause performance regressions, we suggest to perform a benchmark to confirm this. In many
cases, the difference is not measurable.

How does TSan work exactly?
~~~~~~~~~~~~~~~~~~~~~~~~~~~

More information on how TSan works can be found on `the Thread Sanitizer wiki <https://github.com/google/sanitizers/wiki/ThreadSanitizerAlgorithm>`__.