Diffstat (limited to 'tools/sanitizer/docs/tsan.rst')
-rw-r--r-- tools/sanitizer/docs/tsan.rst | 327
1 files changed, 327 insertions, 0 deletions
diff --git a/tools/sanitizer/docs/tsan.rst b/tools/sanitizer/docs/tsan.rst
new file mode 100644
index 0000000000..77fb6c89d7
--- /dev/null
+++ b/tools/sanitizer/docs/tsan.rst
@@ -0,0 +1,327 @@
+Thread Sanitizer
+================
+
+What is Thread Sanitizer?
+--------------------------
+
+Thread Sanitizer (TSan) is a fast data race detector for C/C++ and Rust
+programs. It uses compile-time instrumentation to check all non-race-free
+memory accesses at runtime. Unlike other tools, it understands compiler-builtin
+atomics and synchronization and therefore provides very accurate results
+with no false positives (except if unsupported synchronization primitives
+like inline assembly or memory fences are used). More information on how
+TSan works can be found on `the Thread Sanitizer wiki <https://github.com/google/sanitizers/wiki/ThreadSanitizerAlgorithm>`__.
+
+A `meta bug called tsan <https://bugzilla.mozilla.org/show_bug.cgi?id=tsan>`__
+is maintained to keep track of all the bugs found with TSan.
+
+A `blog post on hacks.mozilla.org <https://hacks.mozilla.org/2021/04/eliminating-data-races-in-firefox-a-technical-report/>`__ describes this project.
+
+Note that unlike other sanitizers, TSan is currently **only supported on Linux**.
+
+Downloading artifact builds
+---------------------------
+
+The easiest way to get Firefox builds with Thread Sanitizer is to download a
+continuous integration TSan build of mozilla-central (updated at least daily):
+
+- mozilla-central optimized builds:
+ `linux <https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.latest.firefox.linux64-tsan-opt/artifacts/public/build/target.tar.bz2>`__
+
+The fuzzing team also offers a tool called ``fuzzfetch`` to download this and many
+other CI builds. It makes downloading and unpacking these builds much easier and
+can be used not just for fuzzing but for all purposes that require a CI build download.
+
+You can install ``fuzzfetch`` from
+`GitHub <https://github.com/MozillaSecurity/fuzzfetch>`__ or
+`via pip <https://pypi.org/project/fuzzfetch/>`__.
+
+Afterwards, you can run
+
+::
+
+ $ python -m fuzzfetch --tsan -n firefox-tsan
+
+to get the build mentioned above unpacked into a directory called ``firefox-tsan``.
+
+Creating Try builds
+-------------------
+
+If for some reason you can't use the pre-built binaries mentioned in the
+previous section (e.g. you need to test a patch), you can either build
+Firefox yourself (see the following section) or use the :ref:`try server <Pushing to Try>`
+to create the customized build for you. Pushing to try requires L1 commit
+access. If you don't have this access yet, you can request it (see
+`Becoming A Mozilla
+Committer <https://www.mozilla.org/about/governance/policies/commit/>`__
+and `Mozilla Commit Access
+Policy <https://www.mozilla.org/about/governance/policies/commit/access-policy/>`__
+for the requirements).
+
+Using ``mach try fuzzy --full``, you can select the ``build-linux64-tsan/opt`` job
+and related tests (if required).
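+
+As a rough sketch, the job can also be preselected from the command line. This assumes
+that your version of ``mach try fuzzy`` accepts the ``-q``/``--query`` option and that a
+leading ``'`` requests an exact match; check ``./mach try fuzzy --help`` if in doubt.
+
+::
+
+ # Preselect the TSan build job instead of picking it interactively.
+ # The query string is an assumption based on the job name mentioned above.
+ ./mach try fuzzy --full -q "'build-linux64-tsan/opt"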
+
+Creating local builds on Linux
+------------------------------
+
+Build prerequisites
+~~~~~~~~~~~~~~~~~~~
+
+LLVM/Clang/Rust
+^^^^^^^^^^^^^^^
+
+The TSan instrumentation is implemented as an LLVM pass and integrated
+into Clang. We strongly recommend that you use the Clang version supplied
+as part of the ``mach bootstrap`` process, as we backported several required
+fixes for TSan on Firefox.
+
+Sanitizer support in Rust is genuinely experimental,
+so our build system only works with a specially patched version of Rust
+that we build in our CI. To install that specific version (or update to a newer
+version), run the following in the root of your mozilla-central checkout:
+
+::
+
+ ./mach artifact toolchain --from-build linux64-rust-dev
+ rm -rf ~/.mozbuild/rustc-sanitizers
+ mv rustc ~/.mozbuild/rustc-sanitizers
+ rustup toolchain link gecko-sanitizers ~/.mozbuild/rustc-sanitizers
+ rustup override set gecko-sanitizers
+
+``mach artifact`` will always download the ``linux64-rust-dev`` toolchain associated
+with the mozilla-central commit you currently have checked out. The toolchain should
+mostly behave like a normal Rust nightly, but we don't recommend using it for anything
+other than building Gecko, just in case. Also note that
+``~/.mozbuild/rustc-sanitizers`` is just a reasonable default location; feel
+free to "install" the toolchain wherever you please.
+
+Building Firefox
+~~~~~~~~~~~~~~~~
+
+Getting the source
+^^^^^^^^^^^^^^^^^^
+
+All you need to do is :ref:`get yourself
+a clone of mozilla-central <Mercurial overview>`; any recent revision should work.
+
+Adjusting the build configuration
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Create the build configuration file ``mozconfig`` with the following
+content in your mozilla-central directory:
+
+::
+
+ # .mozconfig file for TSan on Linux
+
+ mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdir-ff-tsan
+
+ # Enable TSan-specific code and build workarounds
+ ac_add_options --enable-thread-sanitizer
+
+ # This ensures that we also instrument Rust code.
+ export RUSTFLAGS="-Zsanitizer=thread"
+
+ # rustfmt is currently missing in Rust nightly
+ unset RUSTFMT
+
+ # Current Rust Nightly has warnings
+ ac_add_options --disable-warnings-as-errors
+
+ # These are required by TSan
+ ac_add_options --disable-jemalloc
+ ac_add_options --disable-crashreporter
+ ac_add_options --disable-elf-hack
+ ac_add_options --disable-profiling
+
+ # The Thread Sanitizer is not compatible with sandboxing
+ # (see bug 1182565)
+ ac_add_options --disable-sandbox
+
+ # Keep symbols to symbolize TSan traces later
+ export MOZ_DEBUG_SYMBOLS=1
+ ac_add_options --enable-debug-symbols
+ ac_add_options --disable-install-strip
+
+ # Settings for an opt build (preferred)
+ # The -gline-tables-only flag ensures that all the necessary debug information for TSan
+ # is present, but the rest is stripped so the resulting binaries are smaller.
+ ac_add_options --enable-optimize="-O2 -gline-tables-only"
+ ac_add_options --disable-debug
+
+ # Settings for a debug+opt build
+ #ac_add_options --enable-optimize
+ #ac_add_options --enable-debug
+
+
+Starting the build process
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Now start the build process using the regular ``./mach build``
+command.
+
+Starting Firefox
+^^^^^^^^^^^^^^^^
+
+After the build has completed, ``./mach run`` works as usual: the options for
+running in a debugger (``gdb``, ``lldb``, ``rr``, etc.) work fine, as do
+``--disable-e10s`` and other options.
+
+Building only the JavaScript shell
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you want to build only the JavaScript shell instead of doing a full
+Firefox build, the build script below will probably help you to do so.
+Execute this script in the ``js/src/`` subdirectory and pass a directory
+name as the first parameter. The build will then be created in a new
+subdirectory with that name.
+
+::
+
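+ #!/bin/sh
+
+ # Run from js/src/ and pass the name of the build directory to create.
+ if [ -z "$1" ] ; then
+     echo "usage: $0 <dirname>"
+     exit 1
+ elif [ -d "$1" ] ; then
+     echo "directory $1 already exists"
+     exit 1
+ else
+     autoconf2.13
+     mkdir "$1"
+     cd "$1" || exit 1
+     # Replace the placeholder paths with the Clang from your mozbuild directory.
+     CC="/path/to/mozbuild/clang" \
+     CXX="/path/to/mozbuild/clang++" \
+     ../configure --disable-debug --enable-optimize="-O2 -gline-tables-only" --enable-thread-sanitizer --disable-jemalloc
+ fi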
+
+Thread Sanitizer and Symbols
+----------------------------
+
+Unlike Address Sanitizer, TSan requires in-process symbolizing to work
+properly in the first place, as any kind of runtime suppressions will
+otherwise not work.
+
+Hence, it is required that you have a copy of ``llvm-symbolizer`` either
+in your ``PATH`` or pointed to by the ``TSAN_SYMBOLIZER_PATH`` environment
+variable. This binary is included in your local mozbuild directory, obtained
+by ``./mach bootstrap``.
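+
+As a minimal sketch, the symbolizer can be made available for the current shell
+session as shown here. The path ``~/.mozbuild/clang/bin/llvm-symbolizer`` is an
+assumption about where ``./mach bootstrap`` placed the binary; adjust it to match
+your setup.
+
+::
+
+ # Assumed location of the bootstrapped llvm-symbolizer; verify it on your machine.
+ export TSAN_SYMBOLIZER_PATH="$HOME/.mozbuild/clang/bin/llvm-symbolizer"
+
+ # Sanity check: the symbolizer must exist and be executable.
+ test -x "$TSAN_SYMBOLIZER_PATH" || echo "llvm-symbolizer not found at $TSAN_SYMBOLIZER_PATH"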
+
+
+Runtime Suppressions
+--------------------
+
+TSan has the ability to suppress race reports at runtime. This can be used to
+silence a race while a fix is developed as well as to permanently silence a
+(benign) race that cannot be fixed.
+
+.. warning::
+ Many races *look* benign but are not. Please read
+ the :ref:`FAQ section <Frequently Asked Questions about TSan>` carefully
+ and think twice before attempting to suppress a race.
+
+The runtime suppression list is baked directly into Firefox at compile time and
+located at `mozglue/build/TsanOptions.cpp <https://searchfox.org/mozilla-central/source/mozglue/build/TsanOptions.cpp>`__.
+
+.. warning::
+ **Important**: When adding a suppression, always make sure to include
+ the bug number. If the suppression is supposed to be permanent, please
+ add the string ``permanent`` in the same line as the bug number.
+
+.. warning::
+ **Important**: When adding a suppression for a *data race*, always make
+ sure to include a stack frame from **each** of the two race stacks.
+ Adding a suppression for only one stack can cause intermittent failures
+ that are hard to track down later. One exception to this rule is when suppressing
+ races on global variables. In that case, a single race entry with the name of
+ the variable is sufficient.
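+
+To try out candidate entries before editing ``TsanOptions.cpp``, TSan can also read
+suppressions from a file passed through ``TSAN_OPTIONS``. The following is only a
+sketch: the bug number is a placeholder, both function names are hypothetical, and the
+file location is arbitrary. Note that it lists one frame from each of the two race stacks.
+
+::
+
+ # Hypothetical suppression file covering BOTH stacks of a single data race.
+ cat > /tmp/tsan-suppressions.txt <<'EOF'
+ # Bug NNNNNNN (placeholder bug number)
+ race:mozilla::ExampleReader::Poll
+ race:mozilla::ExampleWriter::Update
+ EOF
+
+ # Point the TSan runtime at the file for processes started from this shell.
+ export TSAN_OPTIONS="suppressions=/tmp/tsan-suppressions.txt"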
+
+Troubleshooting / Known Problems
+--------------------------------
+
+Known Sources of False Positives
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A number of things can cause TSan to report false positives, namely:
+
+ * The use of memory fences (e.g. Rust Arc)
+ * The use of inline assembly for synchronization
+ * Uninstrumented code (e.g. external libraries) using compiler-builtins for synchronization
+ * A lock order inversion involving only a single thread can cause a false positive deadlock
+ report (see also https://github.com/google/sanitizers/issues/488).
+
+If none of these four items is involved, you should *never* assume that TSan is reporting
+a false positive to you without consulting TSan peers. It is very easy to misjudge a race
+to be a false positive because races can be highly complex and totally non-obvious due to
+compiler optimizations and the nature of parallel code.
+
+Intermittent Broken Stacks
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you intermittently see race reports where one stack is missing with a ``failed to restore the stack``
+message, this can indicate that a suppression is partially covering the race you are seeing.
+
+Any race where only one of the two stacks is matched by a runtime suppression will show up
+if that particular stack fails to symbolize for some reason. The usual solution is to search
+the suppressions for potential candidates and disable them temporarily to check if your race
+report now becomes mostly consistent.
+
+However, there are other reasons for broken TSan stacks, in particular if they are not intermittent.
+See also the ``history_size`` parameter in the `TSan flags <https://github.com/google/sanitizers/wiki/ThreadSanitizerFlags>`__.
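+
+As a hedged example, a larger history can be requested through ``TSAN_OPTIONS`` before
+starting the browser. The value ``7`` is simply the upper end of the range documented on
+the flags page; larger values keep more per-thread history at the cost of memory.
+
+::
+
+ # history_size accepts values from 0 to 7 according to the TSan flags page.
+ export TSAN_OPTIONS="history_size=7"
+ ./mach run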
+
+Intermittent Race Reports
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Unfortunately, the TSan algorithm does not guarantee that a race is detected 100% of the
+time. Intermittent failures with TSan are (to a certain degree) to be expected and the races
+involved should be filed and fixed to solve the problem.
+
+.. _Frequently Asked Questions about TSan:
+
+Frequently Asked Questions about TSan
+-------------------------------------
+
+Why fix data races?
+~~~~~~~~~~~~~~~~~~~
+
+Data races are undefined behavior and can cause crashes as well as correctness issues.
+Compiler optimizations can cause racy code to have unpredictable and hard-to-reproduce behavior.
+
+At Mozilla, we have already seen several dangerous races, causing random
+`use-after-free crashes <https://bugzilla.mozilla.org/show_bug.cgi?id=1580288>`__,
+`intermittent test failures <https://bugzilla.mozilla.org/show_bug.cgi?id=1602009>`__,
+`hangs <https://bugzilla.mozilla.org/show_bug.cgi?id=1607008>`__,
+`performance issues <https://bugzilla.mozilla.org/show_bug.cgi?id=1615045>`__ and
+`intermittent asserts <https://bugzilla.mozilla.org/show_bug.cgi?id=1601940>`__. Such problems
+not only decrease the quality of our code and user experience, they also waste countless hours
+of developer time.
+
+Because it is very hard to judge whether a particular race could cause such a situation, we
+have decided to fix all data races wherever possible; doing so is often cheaper
+than analyzing a race.
+
+My race is benign, can we ignore it?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+While it is possible to add a runtime suppression to ignore the race, we *strongly* encourage
+you not to do so, for two reasons:
+
+ 1. Each suppressed race decreases the overall performance of the TSan build, as the race
+ has to be symbolized each time it occurs. Since a TSan build is already slow,
+ we need to keep the number of suppressed races as low as possible.
+
+ 2. Deciding if a race is truly benign is surprisingly hard. We recommend reading
+ `this blog post <http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong>`__
+ and `this paper <https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf>`__
+ on the effects of seemingly benign races.
+
+Valid reasons to suppress a confirmed benign race include performance problems arising from
+fixing the race or cases where fixing the race would require an unreasonable amount of work.
+
+Note that the use of atomics usually does not have the bad performance impact that developers
+tend to associate with it. If you assume that, for example, using atomics for synchronization will
+cause performance regressions, we suggest running a benchmark to confirm this. In many
+cases, the difference is not measurable.
+
+How does TSan work exactly?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+More information on how TSan works can be found on `the Thread Sanitizer wiki <https://github.com/google/sanitizers/wiki/ThreadSanitizerAlgorithm>`__.