diff options
Diffstat (limited to 'tools/sanitizer/docs/tsan.rst')
-rw-r--r-- | tools/sanitizer/docs/tsan.rst | 327 |
1 files changed, 327 insertions, 0 deletions
diff --git a/tools/sanitizer/docs/tsan.rst b/tools/sanitizer/docs/tsan.rst new file mode 100644 index 0000000000..77fb6c89d7 --- /dev/null +++ b/tools/sanitizer/docs/tsan.rst @@ -0,0 +1,327 @@ +Thread Sanitizer +================ + +What is Thread Sanitizer? +-------------------------- + +Thread Sanitizer (TSan) is a fast data race detector for C/C++ and Rust +programs. It uses a compile-time instrumentation to check all non-race-free +memory access at runtime. Unlike other tools, it understands compiler-builtin +atomics and synchronization and therefore provides very accurate results +with no false positives (except if unsupported synchronization primitives +like inline assembly or memory fences are used). More information on how +TSan works can be found on `the Thread Sanitizer wiki <https://github.com/google/sanitizers/wiki/ThreadSanitizerAlgorithm>`__. + +A `meta bug called tsan <https://bugzilla.mozilla.org/show_bug.cgi?id=tsan>`__ +is maintained to keep track of all the bugs found with TSan. + +A `blog post on hacks.mozilla.org <https://hacks.mozilla.org/2021/04/eliminating-data-races-in-firefox-a-technical-report/>`__ describes this project. + +Note that unlike other sanitizers, TSan is currently **only supported on Linux**. + +Downloading artifact builds +--------------------------- + +The easiest way to get Firefox builds with Thread Sanitizer is to download a +continuous integration TSan build of mozilla-central (updated at least daily): + +- mozilla-central optimized builds: + `linux <https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.latest.firefox.linux64-tsan-opt/artifacts/public/build/target.tar.bz2>`__ + +The fuzzing team also offers a tool called ``fuzzfetch`` to download this and many +other CI builds. It makes downloading and unpacking these builds much easier and +can be used not just for fuzzing but for all purposes that require a CI build download. + +You can install ``fuzzfetch`` from +`Github <https://github.com/MozillaSecurity/fuzzfetch>`__ or +`via pip <https://pypi.org/project/fuzzfetch/>`__. + +Afterwards, you can run + +:: + + $ python -m fuzzfetch --tsan -n firefox-tsan + +to get the build mentioned above unpacked into a directory called ``firefox-tsan``. + +Creating Try builds +------------------- + +If for some reason you can't use the pre-built binaries mentioned in the +previous section (e.g. you need to test a patch), you can either build +Firefox yourself (see the following section) or use the :ref:`try server <Pushing to Try>` +to create the customized build for you. Pushing to try requires L1 commit +access. If you don't have this access yet you can request access (see +`Becoming A Mozilla +Committer <https://www.mozilla.org/about/governance/policies/commit/>`__ +and `Mozilla Commit Access +Policy <https://www.mozilla.org/about/governance/policies/commit/access-policy/>`__ +for the requirements). + +Using ``mach try fuzzy --full`` you can select the ``build-linux64-tsan/opt`` job +and related tests (if required). + +Creating local builds on Linux +------------------------------ + +Build prerequisites +~~~~~~~~~~~~~~~~~~~ + +LLVM/Clang/Rust +^^^^^^^^^^^^^^^ + +The TSan instrumentation is implemented as an LLVM pass and integrated +into Clang. We strongly recommend that you use the Clang version supplied +as part of the ``mach bootstrap`` process, as we backported several required +fixes for TSan on Firefox. + +Sanitizer support in Rust is genuinely experimental, +so our build system only works with a specially patched version of Rust +that we build in our CI. To install that specific version (or update to a newer +version), run the following in the root of your mozilla-central checkout: + +:: + + ./mach artifact toolchain --from-build linux64-rust-dev + rm -rf ~/.mozbuild/rustc-sanitizers + mv rustc ~/.mozbuild/rustc-sanitizers + rustup toolchain link gecko-sanitizers ~/.mozbuild/rustc-sanitizers + rustup override set gecko-sanitizers + +``mach artifact`` will always download the ``linux64-rust-dev`` toolchain associated +with the current mozilla central commit you have checked out. The toolchain should +mostly behave like a normal rust nightly but we don't recommend using it for anything +other than building gecko, just in case. Also note that +``~/.mozbuild/rustc-sanitizers`` is just a reasonable default location -- feel +free to "install" the toolchain wherever you please. + +Building Firefox +~~~~~~~~~~~~~~~~ + +Getting the source +^^^^^^^^^^^^^^^^^^ + +Using that or any later revision, all you need to do is to :ref:`get yourself +a clone of mozilla-central <Mercurial overview>`. + +Adjusting the build configuration +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Create the build configuration file ``mozconfig`` with the following +content in your mozilla-central directory: + +:: + + # Combined .mozconfig file for TSan on Linux+Mac + + mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdir-ff-tsan + + # Enable ASan specific code and build workarounds + ac_add_options --enable-thread-sanitizer + + # This ensures that we also instrument Rust code. + export RUSTFLAGS="-Zsanitizer=thread" + + # rustfmt is currently missing in Rust nightly + unset RUSTFMT + + # Current Rust Nightly has warnings + ac_add_options --disable-warnings-as-errors + + # These are required by TSan + ac_add_options --disable-jemalloc + ac_add_options --disable-crashreporter + ac_add_options --disable-elf-hack + ac_add_options --disable-profiling + + # The Thread Sanitizer is not compatible with sandboxing + # (see bug 1182565) + ac_add_options --disable-sandbox + + # Keep symbols to symbolize TSan traces later + export MOZ_DEBUG_SYMBOLS=1 + ac_add_options --enable-debug-symbols + ac_add_options --disable-install-strip + + # Settings for an opt build (preferred) + # The -gline-tables-only ensures that all the necessary debug information for ASan + # is present, but the rest is stripped so the resulting binaries are smaller. + ac_add_options --enable-optimize="-O2 -gline-tables-only" + ac_add_options --disable-debug + + # Settings for a debug+opt build + #ac_add_options --enable-optimize + #ac_add_options --enable-debug + + +Starting the build process +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Now you start the build process using the regular ``./mach build`` +command. + +Starting Firefox +^^^^^^^^^^^^^^^^ + +After the build has completed, ``./mach run`` with the usual options for +running in a debugger (``gdb``, ``lldb``, ``rr``, etc.) work fine, as do +the ``--disable-e10s`` and other options. + +Building only the JavaScript shell +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If you want to build only the JavaScript shell instead of doing a full +Firefox build, the build script below will probably help you to do so. +Execute this script in the ``js/src/`` subdirectory and pass a directory +name as the first parameter. The build will then be created in a new +subdirectory with that name. + +:: + + #! /bin/sh + + if [ -z $1 ] ; then + echo "usage: $0 <dirname>" + elif [ -d $1 ] ; then + echo "directory $1 already exists" + else + autoconf2.13 + mkdir $1 + cd $1 + CC="/path/to/mozbuild/clang" \ + CXX="/path/to/mozbuild/clang++" \ + ../configure --disable-debug --enable-optimize="-O2 -gline-tables-only" --enable-thread-sanitizer --disable-jemalloc + fi + +Thread Sanitizer and Symbols +---------------------------- + +Unlike Address Sanitizer, TSan requires in-process symbolizing to work +properly in the first place, as any kind of runtime suppressions will +otherwise not work. + +Hence, it is required that you have a copy of ``llvm-symbolizer`` either +in your ``PATH`` or pointed to by the ``TSAN_SYMBOLIZER_PATH`` environment +variable. This binary is included in your local mozbuild directory, obtained +by ``./mach bootstrap``. + + +Runtime Suppressions +-------------------- + +TSan has the ability to suppress race reports at runtime. This can be used to +silence a race while a fix is developed as well as to permanently silence a +(benign) race that cannot be fixed. + +.. warning:: + **Warning**: Many races *look* benign but are indeed not. Please read + the :ref:`FAQ section <Frequently Asked Questions about TSan>` carefully + and think twice before attempting to suppress a race. + +The runtime Suppression list is directly baked into Firefox at compile-time and +located at `mozglue/build/TsanOptions.cpp <https://searchfox.org/mozilla-central/source/mozglue/build/TsanOptions.cpp>`__. + +.. warning:: + **Important**: When adding a suppression, always make sure to include + the bug number. If the suppression is supposed to be permanent, please + add the string ``permanent`` in the same line as the bug number. + +.. warning:: + **Important**: When adding a suppression for a *data race*, always make + sure to include a stack frame from **each** of the two race stacks. + Adding only one suppression for one stack can cause intermittent failures + that are later on hard to track. One exception to this rule is when suppressing + races on global variables. In that case, a single race entry with the name of + the variable is sufficient. + +Troubleshooting / Known Problems +-------------------------------- + +Known Sources of False Positives +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +TSan has a number of things that can cause false positives, namely: + + * The use of memory fences (e.g. Rust Arc) + * The use of inline assembly for synchronization + * Uninstrumented code (e.g. external libraries) using compiler-builtins for synchronization + * A lock order inversion involving only a single thread can cause a false positive deadlock + report (see also https://github.com/google/sanitizers/issues/488). + +If none of these four items are involved, you should *never* assume that TSan is reporting +a false positive to you without consulting TSan peers. It is very easy to misjudge a race +to be a false positive because races can be highly complex and totally non-obvious due to +compiler optimizations and the nature of parallel code. + +Intermittent Broken Stacks +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If you intermittently see race reports where one stack is missing with a ``failed to restore the stack`` +message, this can indicate that a suppression is partially covering the race you are seeing. + +Any race where only one of the two stacks is matched by a runtime suppression will show up +if that particular stack fails to symbolize for some reason. The usual solution is to search +the suppressions for potential candidates and disable them temporarily to check if your race +report now becomes mostly consistent. + +However, there are other reasons for broken TSan stacks, in particular if they are not intermittent. +See also the ``history_size`` parameter in the `TSan flags <https://github.com/google/sanitizers/wiki/ThreadSanitizerFlags>`__. + +Intermittent Race Reports +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Unfortunately, the TSan algorithm does not guarantee, that a race is detected 100% of the +time. Intermittent failures with TSan are (to a certain degree) to be expected and the races +involved should be filed and fixed to solve the problem. + +.. _Frequently Asked Questions about TSan: + +Frequently Asked Questions about TSan +------------------------------------- + +Why fix data races? +~~~~~~~~~~~~~~~~~~~ + +Data races are undefined behavior and can cause crashes as well as correctness issues. +Compiler optimizations can cause racy code to have unpredictable and hard-to-reproduce behavior. + +At Mozilla, we have already seen several dangerous races, causing random +`use-after-free crashes <https://bugzilla.mozilla.org/show_bug.cgi?id=1580288>`__, +`intermittent test failures <https://bugzilla.mozilla.org/show_bug.cgi?id=1602009>`__, +`hangs <https://bugzilla.mozilla.org/show_bug.cgi?id=1607008>`__, +`performance issues <https://bugzilla.mozilla.org/show_bug.cgi?id=1615045>`__ and +`intermittent asserts <https://bugzilla.mozilla.org/show_bug.cgi?id=1601940>`__. Such problems do +not only decrease the quality of our code and user experience, but they also waste countless hours +of developer time. + +Since it is very hard to judge if a particular race could cause such a situation, we +have decided to fix all data races wherever possible, since doing so is often cheaper +than analyzing a race. + +My race is benign, can we ignore it? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +While it is possible to add a runtime suppression to ignore the race, we *strongly* encourage +you to not do so, for two reasons: + + 1. Each suppressed race decreases the overall performance of the TSan build, as the race + has to be symbolized each time when it occurs. Since TSan is already in itself a slow + build, we need to keep the amount of suppressed races as low as possible. + + 2. Deciding if a race is truly benign is surprisingly hard. We recommend to read + `this blog post <http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong>`__ + and `this paper <https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf>` + on the effects of seemingly benign races. + +Valid reasons to suppress a confirmed benign race include performance problems arising from +fixing the race or cases where fixing the race would require an unreasonable amount of work. + +Note that the use of atomics usually does not have the bad performance impact that developers +tend to associate with it. If you assume that e.g. using atomics for synchronization will +cause performance regressions, we suggest to perform a benchmark to confirm this. In many +cases, the difference is not measurable. + +How does TSan work exactly? +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +More information on how TSan works can be found on `the Thread Sanitizer wiki <https://github.com/google/sanitizers/wiki/ThreadSanitizerAlgorithm>`__. |