From 5d1646d90e1f2cceb9f0828f4b28318cd0ec7744 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Sat, 27 Apr 2024 12:05:51 +0200 Subject: Adding upstream version 5.10.209. Signed-off-by: Daniel Baumann --- Documentation/dev-tools/coccinelle.rst | 512 ++++++++++++ Documentation/dev-tools/gcov.rst | 270 +++++++ Documentation/dev-tools/gdb-kernel-debugging.rst | 179 +++++ Documentation/dev-tools/index.rst | 36 + Documentation/dev-tools/kasan.rst | 355 +++++++++ Documentation/dev-tools/kcov.rst | 334 ++++++++ Documentation/dev-tools/kcsan.rst | 316 ++++++++ Documentation/dev-tools/kgdb.rst | 940 +++++++++++++++++++++++ Documentation/dev-tools/kmemleak.rst | 259 +++++++ Documentation/dev-tools/kselftest.rst | 350 +++++++++ Documentation/dev-tools/kunit/api/index.rst | 16 + Documentation/dev-tools/kunit/api/test.rst | 11 + Documentation/dev-tools/kunit/faq.rst | 103 +++ Documentation/dev-tools/kunit/index.rst | 94 +++ Documentation/dev-tools/kunit/kunit-tool.rst | 57 ++ Documentation/dev-tools/kunit/start.rst | 237 ++++++ Documentation/dev-tools/kunit/style.rst | 205 +++++ Documentation/dev-tools/kunit/usage.rst | 617 +++++++++++++++ Documentation/dev-tools/sparse.rst | 102 +++ Documentation/dev-tools/ubsan.rst | 88 +++ 20 files changed, 5081 insertions(+) create mode 100644 Documentation/dev-tools/coccinelle.rst create mode 100644 Documentation/dev-tools/gcov.rst create mode 100644 Documentation/dev-tools/gdb-kernel-debugging.rst create mode 100644 Documentation/dev-tools/index.rst create mode 100644 Documentation/dev-tools/kasan.rst create mode 100644 Documentation/dev-tools/kcov.rst create mode 100644 Documentation/dev-tools/kcsan.rst create mode 100644 Documentation/dev-tools/kgdb.rst create mode 100644 Documentation/dev-tools/kmemleak.rst create mode 100644 Documentation/dev-tools/kselftest.rst create mode 100644 Documentation/dev-tools/kunit/api/index.rst create mode 100644 Documentation/dev-tools/kunit/api/test.rst create mode 100644 Documentation/dev-tools/kunit/faq.rst create mode 100644 Documentation/dev-tools/kunit/index.rst create mode 100644 Documentation/dev-tools/kunit/kunit-tool.rst create mode 100644 Documentation/dev-tools/kunit/start.rst create mode 100644 Documentation/dev-tools/kunit/style.rst create mode 100644 Documentation/dev-tools/kunit/usage.rst create mode 100644 Documentation/dev-tools/sparse.rst create mode 100644 Documentation/dev-tools/ubsan.rst (limited to 'Documentation/dev-tools') diff --git a/Documentation/dev-tools/coccinelle.rst b/Documentation/dev-tools/coccinelle.rst new file mode 100644 index 000000000..74c5e6aee --- /dev/null +++ b/Documentation/dev-tools/coccinelle.rst @@ -0,0 +1,512 @@ +.. Copyright 2010 Nicolas Palix +.. Copyright 2010 Julia Lawall +.. Copyright 2010 Gilles Muller + +.. highlight:: none + +.. _devtools_coccinelle: + +Coccinelle +========== + +Coccinelle is a tool for pattern matching and text transformation that has +many uses in kernel development, including the application of complex, +tree-wide patches and detection of problematic programming patterns. + +Getting Coccinelle +------------------ + +The semantic patches included in the kernel use features and options +which are provided by Coccinelle version 1.0.0-rc11 and above. +Using earlier versions will fail as the option names used by +the Coccinelle files and coccicheck have been updated. + +Coccinelle is available through the package manager +of many distributions, e.g. : + + - Debian + - Fedora + - Ubuntu + - OpenSUSE + - Arch Linux + - NetBSD + - FreeBSD + +Some distribution packages are obsolete and it is recommended +to use the latest version released from the Coccinelle homepage at +http://coccinelle.lip6.fr/ + +Or from Github at: + +https://github.com/coccinelle/coccinelle + +Once you have it, run the following commands:: + + ./autogen + ./configure + make + +as a regular user, and install it with:: + + sudo make install + +More detailed installation instructions to build from source can be +found at: + +https://github.com/coccinelle/coccinelle/blob/master/install.txt + +Supplemental documentation +-------------------------- + +For supplemental documentation refer to the wiki: + +https://bottest.wiki.kernel.org/coccicheck + +The wiki documentation always refers to the linux-next version of the script. + +For Semantic Patch Language(SmPL) grammar documentation refer to: + +http://coccinelle.lip6.fr/documentation.php + +Using Coccinelle on the Linux kernel +------------------------------------ + +A Coccinelle-specific target is defined in the top level +Makefile. This target is named ``coccicheck`` and calls the ``coccicheck`` +front-end in the ``scripts`` directory. + +Four basic modes are defined: ``patch``, ``report``, ``context``, and +``org``. The mode to use is specified by setting the MODE variable with +``MODE=``. + +- ``patch`` proposes a fix, when possible. + +- ``report`` generates a list in the following format: + file:line:column-column: message + +- ``context`` highlights lines of interest and their context in a + diff-like style. Lines of interest are indicated with ``-``. + +- ``org`` generates a report in the Org mode format of Emacs. + +Note that not all semantic patches implement all modes. For easy use +of Coccinelle, the default mode is "report". + +Two other modes provide some common combinations of these modes. + +- ``chain`` tries the previous modes in the order above until one succeeds. + +- ``rep+ctxt`` runs successively the report mode and the context mode. + It should be used with the C option (described later) + which checks the code on a file basis. + +Examples +~~~~~~~~ + +To make a report for every semantic patch, run the following command:: + + make coccicheck MODE=report + +To produce patches, run:: + + make coccicheck MODE=patch + + +The coccicheck target applies every semantic patch available in the +sub-directories of ``scripts/coccinelle`` to the entire Linux kernel. + +For each semantic patch, a commit message is proposed. It gives a +description of the problem being checked by the semantic patch, and +includes a reference to Coccinelle. + +As with any static code analyzer, Coccinelle produces false +positives. Thus, reports must be carefully checked, and patches +reviewed. + +To enable verbose messages set the V= variable, for example:: + + make coccicheck MODE=report V=1 + +Coccinelle parallelization +-------------------------- + +By default, coccicheck tries to run as parallel as possible. To change +the parallelism, set the J= variable. For example, to run across 4 CPUs:: + + make coccicheck MODE=report J=4 + +As of Coccinelle 1.0.2 Coccinelle uses Ocaml parmap for parallelization; +if support for this is detected you will benefit from parmap parallelization. + +When parmap is enabled coccicheck will enable dynamic load balancing by using +``--chunksize 1`` argument. This ensures we keep feeding threads with work +one by one, so that we avoid the situation where most work gets done by only +a few threads. With dynamic load balancing, if a thread finishes early we keep +feeding it more work. + +When parmap is enabled, if an error occurs in Coccinelle, this error +value is propagated back, and the return value of the ``make coccicheck`` +command captures this return value. + +Using Coccinelle with a single semantic patch +--------------------------------------------- + +The optional make variable COCCI can be used to check a single +semantic patch. In that case, the variable must be initialized with +the name of the semantic patch to apply. + +For instance:: + + make coccicheck COCCI= MODE=patch + +or:: + + make coccicheck COCCI= MODE=report + + +Controlling Which Files are Processed by Coccinelle +--------------------------------------------------- + +By default the entire kernel source tree is checked. + +To apply Coccinelle to a specific directory, ``M=`` can be used. +For example, to check drivers/net/wireless/ one may write:: + + make coccicheck M=drivers/net/wireless/ + +To apply Coccinelle on a file basis, instead of a directory basis, the +C variable is used by the makefile to select which files to work with. +This variable can be used to run scripts for the entire kernel, a +specific directory, or for a single file. + +For example, to check drivers/bluetooth/bfusb.c, the value 1 is +passed to the C variable to check files that make considers +need to be compiled.:: + + make C=1 CHECK=scripts/coccicheck drivers/bluetooth/bfusb.o + +The value 2 is passed to the C variable to check files regardless of +whether they need to be compiled or not.:: + + make C=2 CHECK=scripts/coccicheck drivers/bluetooth/bfusb.o + +In these modes, which work on a file basis, there is no information +about semantic patches displayed, and no commit message proposed. + +This runs every semantic patch in scripts/coccinelle by default. The +COCCI variable may additionally be used to only apply a single +semantic patch as shown in the previous section. + +The "report" mode is the default. You can select another one with the +MODE variable explained above. + +Debugging Coccinelle SmPL patches +--------------------------------- + +Using coccicheck is best as it provides in the spatch command line +include options matching the options used when we compile the kernel. +You can learn what these options are by using V=1; you could then +manually run Coccinelle with debug options added. + +Alternatively you can debug running Coccinelle against SmPL patches +by asking for stderr to be redirected to stderr. By default stderr +is redirected to /dev/null; if you'd like to capture stderr you +can specify the ``DEBUG_FILE="file.txt"`` option to coccicheck. For +instance:: + + rm -f cocci.err + make coccicheck COCCI=scripts/coccinelle/free/kfree.cocci MODE=report DEBUG_FILE=cocci.err + cat cocci.err + +You can use SPFLAGS to add debugging flags; for instance you may want to +add both --profile --show-trying to SPFLAGS when debugging. For example +you may want to use:: + + rm -f err.log + export COCCI=scripts/coccinelle/misc/irqf_oneshot.cocci + make coccicheck DEBUG_FILE="err.log" MODE=report SPFLAGS="--profile --show-trying" M=./drivers/mfd/arizona-irq.c + +err.log will now have the profiling information, while stdout will +provide some progress information as Coccinelle moves forward with +work. + +DEBUG_FILE support is only supported when using coccinelle >= 1.0.2. + +.cocciconfig support +-------------------- + +Coccinelle supports reading .cocciconfig for default Coccinelle options that +should be used every time spatch is spawned. The order of precedence for +variables for .cocciconfig is as follows: + +- Your current user's home directory is processed first +- Your directory from which spatch is called is processed next +- The directory provided with the --dir option is processed last, if used + +Since coccicheck runs through make, it naturally runs from the kernel +proper dir; as such the second rule above would be implied for picking up a +.cocciconfig when using ``make coccicheck``. + +``make coccicheck`` also supports using M= targets. If you do not supply +any M= target, it is assumed you want to target the entire kernel. +The kernel coccicheck script has:: + + if [ "$KBUILD_EXTMOD" = "" ] ; then + OPTIONS="--dir $srctree $COCCIINCLUDE" + else + OPTIONS="--dir $KBUILD_EXTMOD $COCCIINCLUDE" + fi + +KBUILD_EXTMOD is set when an explicit target with M= is used. For both cases +the spatch --dir argument is used, as such third rule applies when whether M= +is used or not, and when M= is used the target directory can have its own +.cocciconfig file. When M= is not passed as an argument to coccicheck the +target directory is the same as the directory from where spatch was called. + +If not using the kernel's coccicheck target, keep the above precedence +order logic of .cocciconfig reading. If using the kernel's coccicheck target, +override any of the kernel's .coccicheck's settings using SPFLAGS. + +We help Coccinelle when used against Linux with a set of sensible default +options for Linux with our own Linux .cocciconfig. This hints to coccinelle +that git can be used for ``git grep`` queries over coccigrep. A timeout of 200 +seconds should suffice for now. + +The options picked up by coccinelle when reading a .cocciconfig do not appear +as arguments to spatch processes running on your system. To confirm what +options will be used by Coccinelle run:: + + spatch --print-options-only + +You can override with your own preferred index option by using SPFLAGS. Take +note that when there are conflicting options Coccinelle takes precedence for +the last options passed. Using .cocciconfig is possible to use idutils, however +given the order of precedence followed by Coccinelle, since the kernel now +carries its own .cocciconfig, you will need to use SPFLAGS to use idutils if +desired. See below section "Additional flags" for more details on how to use +idutils. + +Additional flags +---------------- + +Additional flags can be passed to spatch through the SPFLAGS +variable. This works as Coccinelle respects the last flags +given to it when options are in conflict. :: + + make SPFLAGS=--use-glimpse coccicheck + +Coccinelle supports idutils as well but requires coccinelle >= 1.0.6. +When no ID file is specified coccinelle assumes your ID database file +is in the file .id-utils.index on the top level of the kernel. Coccinelle +carries a script scripts/idutils_index.sh which creates the database with:: + + mkid -i C --output .id-utils.index + +If you have another database filename you can also just symlink with this +name. :: + + make SPFLAGS=--use-idutils coccicheck + +Alternatively you can specify the database filename explicitly, for +instance:: + + make SPFLAGS="--use-idutils /full-path/to/ID" coccicheck + +See ``spatch --help`` to learn more about spatch options. + +Note that the ``--use-glimpse`` and ``--use-idutils`` options +require external tools for indexing the code. None of them is +thus active by default. However, by indexing the code with +one of these tools, and according to the cocci file used, +spatch could proceed the entire code base more quickly. + +SmPL patch specific options +--------------------------- + +SmPL patches can have their own requirements for options passed +to Coccinelle. SmPL patch-specific options can be provided by +providing them at the top of the SmPL patch, for instance:: + + // Options: --no-includes --include-headers + +SmPL patch Coccinelle requirements +---------------------------------- + +As Coccinelle features get added some more advanced SmPL patches +may require newer versions of Coccinelle. If an SmPL patch requires +a minimum version of Coccinelle, this can be specified as follows, +as an example if requiring at least Coccinelle >= 1.0.5:: + + // Requires: 1.0.5 + +Proposing new semantic patches +------------------------------ + +New semantic patches can be proposed and submitted by kernel +developers. For sake of clarity, they should be organized in the +sub-directories of ``scripts/coccinelle/``. + + +Detailed description of the ``report`` mode +------------------------------------------- + +``report`` generates a list in the following format:: + + file:line:column-column: message + +Example +~~~~~~~ + +Running:: + + make coccicheck MODE=report COCCI=scripts/coccinelle/api/err_cast.cocci + +will execute the following part of the SmPL script:: + + + @r depends on !context && !patch && (org || report)@ + expression x; + position p; + @@ + + ERR_PTR@p(PTR_ERR(x)) + + @script:python depends on report@ + p << r.p; + x << r.x; + @@ + + msg="ERR_CAST can be used with %s" % (x) + coccilib.report.print_report(p[0], msg) + + +This SmPL excerpt generates entries on the standard output, as +illustrated below:: + + /home/user/linux/crypto/ctr.c:188:9-16: ERR_CAST can be used with alg + /home/user/linux/crypto/authenc.c:619:9-16: ERR_CAST can be used with auth + /home/user/linux/crypto/xts.c:227:9-16: ERR_CAST can be used with alg + + +Detailed description of the ``patch`` mode +------------------------------------------ + +When the ``patch`` mode is available, it proposes a fix for each problem +identified. + +Example +~~~~~~~ + +Running:: + + make coccicheck MODE=patch COCCI=scripts/coccinelle/api/err_cast.cocci + +will execute the following part of the SmPL script:: + + + @ depends on !context && patch && !org && !report @ + expression x; + @@ + + - ERR_PTR(PTR_ERR(x)) + + ERR_CAST(x) + + +This SmPL excerpt generates patch hunks on the standard output, as +illustrated below:: + + diff -u -p a/crypto/ctr.c b/crypto/ctr.c + --- a/crypto/ctr.c 2010-05-26 10:49:38.000000000 +0200 + +++ b/crypto/ctr.c 2010-06-03 23:44:49.000000000 +0200 + @@ -185,7 +185,7 @@ static struct crypto_instance *crypto_ct + alg = crypto_attr_alg(tb[1], CRYPTO_ALG_TYPE_CIPHER, + CRYPTO_ALG_TYPE_MASK); + if (IS_ERR(alg)) + - return ERR_PTR(PTR_ERR(alg)); + + return ERR_CAST(alg); + + /* Block size must be >= 4 bytes. */ + err = -EINVAL; + +Detailed description of the ``context`` mode +-------------------------------------------- + +``context`` highlights lines of interest and their context +in a diff-like style. + + **NOTE**: The diff-like output generated is NOT an applicable patch. The + intent of the ``context`` mode is to highlight the important lines + (annotated with minus, ``-``) and gives some surrounding context + lines around. This output can be used with the diff mode of + Emacs to review the code. + +Example +~~~~~~~ + +Running:: + + make coccicheck MODE=context COCCI=scripts/coccinelle/api/err_cast.cocci + +will execute the following part of the SmPL script:: + + + @ depends on context && !patch && !org && !report@ + expression x; + @@ + + * ERR_PTR(PTR_ERR(x)) + + +This SmPL excerpt generates diff hunks on the standard output, as +illustrated below:: + + diff -u -p /home/user/linux/crypto/ctr.c /tmp/nothing + --- /home/user/linux/crypto/ctr.c 2010-05-26 10:49:38.000000000 +0200 + +++ /tmp/nothing + @@ -185,7 +185,6 @@ static struct crypto_instance *crypto_ct + alg = crypto_attr_alg(tb[1], CRYPTO_ALG_TYPE_CIPHER, + CRYPTO_ALG_TYPE_MASK); + if (IS_ERR(alg)) + - return ERR_PTR(PTR_ERR(alg)); + + /* Block size must be >= 4 bytes. */ + err = -EINVAL; + +Detailed description of the ``org`` mode +---------------------------------------- + +``org`` generates a report in the Org mode format of Emacs. + +Example +~~~~~~~ + +Running:: + + make coccicheck MODE=org COCCI=scripts/coccinelle/api/err_cast.cocci + +will execute the following part of the SmPL script:: + + + @r depends on !context && !patch && (org || report)@ + expression x; + position p; + @@ + + ERR_PTR@p(PTR_ERR(x)) + + @script:python depends on org@ + p << r.p; + x << r.x; + @@ + + msg="ERR_CAST can be used with %s" % (x) + msg_safe=msg.replace("[","@(").replace("]",")") + coccilib.org.print_todo(p[0], msg_safe) + + +This SmPL excerpt generates Org entries on the standard output, as +illustrated below:: + + * TODO [[view:/home/user/linux/crypto/ctr.c::face=ovl-face1::linb=188::colb=9::cole=16][ERR_CAST can be used with alg]] + * TODO [[view:/home/user/linux/crypto/authenc.c::face=ovl-face1::linb=619::colb=9::cole=16][ERR_CAST can be used with auth]] + * TODO [[view:/home/user/linux/crypto/xts.c::face=ovl-face1::linb=227::colb=9::cole=16][ERR_CAST can be used with alg]] diff --git a/Documentation/dev-tools/gcov.rst b/Documentation/dev-tools/gcov.rst new file mode 100644 index 000000000..9e989baae --- /dev/null +++ b/Documentation/dev-tools/gcov.rst @@ -0,0 +1,270 @@ +Using gcov with the Linux kernel +================================ + +gcov profiling kernel support enables the use of GCC's coverage testing +tool gcov_ with the Linux kernel. Coverage data of a running kernel +is exported in gcov-compatible format via the "gcov" debugfs directory. +To get coverage data for a specific file, change to the kernel build +directory and use gcov with the ``-o`` option as follows (requires root):: + + # cd /tmp/linux-out + # gcov -o /sys/kernel/debug/gcov/tmp/linux-out/kernel spinlock.c + +This will create source code files annotated with execution counts +in the current directory. In addition, graphical gcov front-ends such +as lcov_ can be used to automate the process of collecting data +for the entire kernel and provide coverage overviews in HTML format. + +Possible uses: + +* debugging (has this line been reached at all?) +* test improvement (how do I change my test to cover these lines?) +* minimizing kernel configurations (do I need this option if the + associated code is never run?) + +.. _gcov: https://gcc.gnu.org/onlinedocs/gcc/Gcov.html +.. _lcov: http://ltp.sourceforge.net/coverage/lcov.php + + +Preparation +----------- + +Configure the kernel with:: + + CONFIG_DEBUG_FS=y + CONFIG_GCOV_KERNEL=y + +and to get coverage data for the entire kernel:: + + CONFIG_GCOV_PROFILE_ALL=y + +Note that kernels compiled with profiling flags will be significantly +larger and run slower. Also CONFIG_GCOV_PROFILE_ALL may not be supported +on all architectures. + +Profiling data will only become accessible once debugfs has been +mounted:: + + mount -t debugfs none /sys/kernel/debug + + +Customization +------------- + +To enable profiling for specific files or directories, add a line +similar to the following to the respective kernel Makefile: + +- For a single file (e.g. main.o):: + + GCOV_PROFILE_main.o := y + +- For all files in one directory:: + + GCOV_PROFILE := y + +To exclude files from being profiled even when CONFIG_GCOV_PROFILE_ALL +is specified, use:: + + GCOV_PROFILE_main.o := n + +and:: + + GCOV_PROFILE := n + +Only files which are linked to the main kernel image or are compiled as +kernel modules are supported by this mechanism. + + +Files +----- + +The gcov kernel support creates the following files in debugfs: + +``/sys/kernel/debug/gcov`` + Parent directory for all gcov-related files. + +``/sys/kernel/debug/gcov/reset`` + Global reset file: resets all coverage data to zero when + written to. + +``/sys/kernel/debug/gcov/path/to/compile/dir/file.gcda`` + The actual gcov data file as understood by the gcov + tool. Resets file coverage data to zero when written to. + +``/sys/kernel/debug/gcov/path/to/compile/dir/file.gcno`` + Symbolic link to a static data file required by the gcov + tool. This file is generated by gcc when compiling with + option ``-ftest-coverage``. + + +Modules +------- + +Kernel modules may contain cleanup code which is only run during +module unload time. The gcov mechanism provides a means to collect +coverage data for such code by keeping a copy of the data associated +with the unloaded module. This data remains available through debugfs. +Once the module is loaded again, the associated coverage counters are +initialized with the data from its previous instantiation. + +This behavior can be deactivated by specifying the gcov_persist kernel +parameter:: + + gcov_persist=0 + +At run-time, a user can also choose to discard data for an unloaded +module by writing to its data file or the global reset file. + + +Separated build and test machines +--------------------------------- + +The gcov kernel profiling infrastructure is designed to work out-of-the +box for setups where kernels are built and run on the same machine. In +cases where the kernel runs on a separate machine, special preparations +must be made, depending on where the gcov tool is used: + +a) gcov is run on the TEST machine + + The gcov tool version on the test machine must be compatible with the + gcc version used for kernel build. Also the following files need to be + copied from build to test machine: + + from the source tree: + - all C source files + headers + + from the build tree: + - all C source files + headers + - all .gcda and .gcno files + - all links to directories + + It is important to note that these files need to be placed into the + exact same file system location on the test machine as on the build + machine. If any of the path components is symbolic link, the actual + directory needs to be used instead (due to make's CURDIR handling). + +b) gcov is run on the BUILD machine + + The following files need to be copied after each test case from test + to build machine: + + from the gcov directory in sysfs: + - all .gcda files + - all links to .gcno files + + These files can be copied to any location on the build machine. gcov + must then be called with the -o option pointing to that directory. + + Example directory setup on the build machine:: + + /tmp/linux: kernel source tree + /tmp/out: kernel build directory as specified by make O= + /tmp/coverage: location of the files copied from the test machine + + [user@build] cd /tmp/out + [user@build] gcov -o /tmp/coverage/tmp/out/init main.c + + +Note on compilers +----------------- + +GCC and LLVM gcov tools are not necessarily compatible. Use gcov_ to work with +GCC-generated .gcno and .gcda files, and use llvm-cov_ for Clang. + +.. _gcov: https://gcc.gnu.org/onlinedocs/gcc/Gcov.html +.. _llvm-cov: https://llvm.org/docs/CommandGuide/llvm-cov.html + +Build differences between GCC and Clang gcov are handled by Kconfig. It +automatically selects the appropriate gcov format depending on the detected +toolchain. + + +Troubleshooting +--------------- + +Problem + Compilation aborts during linker step. + +Cause + Profiling flags are specified for source files which are not + linked to the main kernel or which are linked by a custom + linker procedure. + +Solution + Exclude affected source files from profiling by specifying + ``GCOV_PROFILE := n`` or ``GCOV_PROFILE_basename.o := n`` in the + corresponding Makefile. + +Problem + Files copied from sysfs appear empty or incomplete. + +Cause + Due to the way seq_file works, some tools such as cp or tar + may not correctly copy files from sysfs. + +Solution + Use ``cat`` to read ``.gcda`` files and ``cp -d`` to copy links. + Alternatively use the mechanism shown in Appendix B. + + +Appendix A: gather_on_build.sh +------------------------------ + +Sample script to gather coverage meta files on the build machine +(see 6a): + +.. code-block:: sh + + #!/bin/bash + + KSRC=$1 + KOBJ=$2 + DEST=$3 + + if [ -z "$KSRC" ] || [ -z "$KOBJ" ] || [ -z "$DEST" ]; then + echo "Usage: $0 " >&2 + exit 1 + fi + + KSRC=$(cd $KSRC; printf "all:\n\t@echo \${CURDIR}\n" | make -f -) + KOBJ=$(cd $KOBJ; printf "all:\n\t@echo \${CURDIR}\n" | make -f -) + + find $KSRC $KOBJ \( -name '*.gcno' -o -name '*.[ch]' -o -type l \) -a \ + -perm /u+r,g+r | tar cfz $DEST -P -T - + + if [ $? -eq 0 ] ; then + echo "$DEST successfully created, copy to test system and unpack with:" + echo " tar xfz $DEST -P" + else + echo "Could not create file $DEST" + fi + + +Appendix B: gather_on_test.sh +----------------------------- + +Sample script to gather coverage data files on the test machine +(see 6b): + +.. code-block:: sh + + #!/bin/bash -e + + DEST=$1 + GCDA=/sys/kernel/debug/gcov + + if [ -z "$DEST" ] ; then + echo "Usage: $0 " >&2 + exit 1 + fi + + TEMPDIR=$(mktemp -d) + echo Collecting data.. + find $GCDA -type d -exec mkdir -p $TEMPDIR/\{\} \; + find $GCDA -name '*.gcda' -exec sh -c 'cat < $0 > '$TEMPDIR'/$0' {} \; + find $GCDA -name '*.gcno' -exec sh -c 'cp -d $0 '$TEMPDIR'/$0' {} \; + tar czf $DEST -C $TEMPDIR sys + rm -rf $TEMPDIR + + echo "$DEST successfully created, copy to build system and unpack with:" + echo " tar xfz $DEST" diff --git a/Documentation/dev-tools/gdb-kernel-debugging.rst b/Documentation/dev-tools/gdb-kernel-debugging.rst new file mode 100644 index 000000000..10cdd990b --- /dev/null +++ b/Documentation/dev-tools/gdb-kernel-debugging.rst @@ -0,0 +1,179 @@ +.. highlight:: none + +Debugging kernel and modules via gdb +==================================== + +The kernel debugger kgdb, hypervisors like QEMU or JTAG-based hardware +interfaces allow to debug the Linux kernel and its modules during runtime +using gdb. Gdb comes with a powerful scripting interface for python. The +kernel provides a collection of helper scripts that can simplify typical +kernel debugging steps. This is a short tutorial about how to enable and use +them. It focuses on QEMU/KVM virtual machines as target, but the examples can +be transferred to the other gdb stubs as well. + + +Requirements +------------ + +- gdb 7.2+ (recommended: 7.4+) with python support enabled (typically true + for distributions) + + +Setup +----- + +- Create a virtual Linux machine for QEMU/KVM (see www.linux-kvm.org and + www.qemu.org for more details). For cross-development, + https://landley.net/aboriginal/bin keeps a pool of machine images and + toolchains that can be helpful to start from. + +- Build the kernel with CONFIG_GDB_SCRIPTS enabled, but leave + CONFIG_DEBUG_INFO_REDUCED off. If your architecture supports + CONFIG_FRAME_POINTER, keep it enabled. + +- Install that kernel on the guest, turn off KASLR if necessary by adding + "nokaslr" to the kernel command line. + Alternatively, QEMU allows to boot the kernel directly using -kernel, + -append, -initrd command line switches. This is generally only useful if + you do not depend on modules. See QEMU documentation for more details on + this mode. In this case, you should build the kernel with + CONFIG_RANDOMIZE_BASE disabled if the architecture supports KASLR. + +- Build the gdb scripts (required on kernels v5.1 and above):: + + make scripts_gdb + +- Enable the gdb stub of QEMU/KVM, either + + - at VM startup time by appending "-s" to the QEMU command line + + or + + - during runtime by issuing "gdbserver" from the QEMU monitor + console + +- cd /path/to/linux-build + +- Start gdb: gdb vmlinux + + Note: Some distros may restrict auto-loading of gdb scripts to known safe + directories. In case gdb reports to refuse loading vmlinux-gdb.py, add:: + + add-auto-load-safe-path /path/to/linux-build + + to ~/.gdbinit. See gdb help for more details. + +- Attach to the booted guest:: + + (gdb) target remote :1234 + + +Examples of using the Linux-provided gdb helpers +------------------------------------------------ + +- Load module (and main kernel) symbols:: + + (gdb) lx-symbols + loading vmlinux + scanning for modules in /home/user/linux/build + loading @0xffffffffa0020000: /home/user/linux/build/net/netfilter/xt_tcpudp.ko + loading @0xffffffffa0016000: /home/user/linux/build/net/netfilter/xt_pkttype.ko + loading @0xffffffffa0002000: /home/user/linux/build/net/netfilter/xt_limit.ko + loading @0xffffffffa00ca000: /home/user/linux/build/net/packet/af_packet.ko + loading @0xffffffffa003c000: /home/user/linux/build/fs/fuse/fuse.ko + ... + loading @0xffffffffa0000000: /home/user/linux/build/drivers/ata/ata_generic.ko + +- Set a breakpoint on some not yet loaded module function, e.g.:: + + (gdb) b btrfs_init_sysfs + Function "btrfs_init_sysfs" not defined. + Make breakpoint pending on future shared library load? (y or [n]) y + Breakpoint 1 (btrfs_init_sysfs) pending. + +- Continue the target:: + + (gdb) c + +- Load the module on the target and watch the symbols being loaded as well as + the breakpoint hit:: + + loading @0xffffffffa0034000: /home/user/linux/build/lib/libcrc32c.ko + loading @0xffffffffa0050000: /home/user/linux/build/lib/lzo/lzo_compress.ko + loading @0xffffffffa006e000: /home/user/linux/build/lib/zlib_deflate/zlib_deflate.ko + loading @0xffffffffa01b1000: /home/user/linux/build/fs/btrfs/btrfs.ko + + Breakpoint 1, btrfs_init_sysfs () at /home/user/linux/fs/btrfs/sysfs.c:36 + 36 btrfs_kset = kset_create_and_add("btrfs", NULL, fs_kobj); + +- Dump the log buffer of the target kernel:: + + (gdb) lx-dmesg + [ 0.000000] Initializing cgroup subsys cpuset + [ 0.000000] Initializing cgroup subsys cpu + [ 0.000000] Linux version 3.8.0-rc4-dbg+ (... + [ 0.000000] Command line: root=/dev/sda2 resume=/dev/sda1 vga=0x314 + [ 0.000000] e820: BIOS-provided physical RAM map: + [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable + [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved + .... + +- Examine fields of the current task struct:: + + (gdb) p $lx_current().pid + $1 = 4998 + (gdb) p $lx_current().comm + $2 = "modprobe\000\000\000\000\000\000\000" + +- Make use of the per-cpu function for the current or a specified CPU:: + + (gdb) p $lx_per_cpu("runqueues").nr_running + $3 = 1 + (gdb) p $lx_per_cpu("runqueues", 2).nr_running + $4 = 0 + +- Dig into hrtimers using the container_of helper:: + + (gdb) set $next = $lx_per_cpu("hrtimer_bases").clock_base[0].active.next + (gdb) p *$container_of($next, "struct hrtimer", "node") + $5 = { + node = { + node = { + __rb_parent_color = 18446612133355256072, + rb_right = 0x0 , + rb_left = 0x0 + }, + expires = { + tv64 = 1835268000000 + } + }, + _softexpires = { + tv64 = 1835268000000 + }, + function = 0xffffffff81078232 , + base = 0xffff88003fd0d6f0, + state = 1, + start_pid = 0, + start_site = 0xffffffff81055c1f , + start_comm = "swapper/2\000\000\000\000\000\000" + } + + +List of commands and functions +------------------------------ + +The number of commands and convenience functions may evolve over the time, +this is just a snapshot of the initial version:: + + (gdb) apropos lx + function lx_current -- Return current task + function lx_module -- Find module by name and return the module variable + function lx_per_cpu -- Return per-cpu variable + function lx_task_by_pid -- Find Linux task by PID and return the task_struct variable + function lx_thread_info -- Calculate Linux thread_info from task variable + lx-dmesg -- Print Linux kernel log buffer + lx-lsmod -- List currently loaded modules + lx-symbols -- (Re-)load symbols of Linux kernel and currently loaded modules + +Detailed help can be obtained via "help " for commands and "help +function " for convenience functions. diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst new file mode 100644 index 000000000..f7809c7b1 --- /dev/null +++ b/Documentation/dev-tools/index.rst @@ -0,0 +1,36 @@ +================================ +Development tools for the kernel +================================ + +This document is a collection of documents about development tools that can +be used to work on the kernel. For now, the documents have been pulled +together without any significant effort to integrate them into a coherent +whole; patches welcome! + +.. class:: toc-title + + Table of contents + +.. toctree:: + :maxdepth: 2 + + coccinelle + sparse + kcov + gcov + kasan + ubsan + kmemleak + kcsan + gdb-kernel-debugging + kgdb + kselftest + kunit/index + + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst new file mode 100644 index 000000000..2b68addaa --- /dev/null +++ b/Documentation/dev-tools/kasan.rst @@ -0,0 +1,355 @@ +The Kernel Address Sanitizer (KASAN) +==================================== + +Overview +-------- + +KernelAddressSANitizer (KASAN) is a dynamic memory error detector designed to +find out-of-bound and use-after-free bugs. KASAN has two modes: generic KASAN +(similar to userspace ASan) and software tag-based KASAN (similar to userspace +HWASan). + +KASAN uses compile-time instrumentation to insert validity checks before every +memory access, and therefore requires a compiler version that supports that. + +Generic KASAN is supported in both GCC and Clang. With GCC it requires version +8.3.0 or later. Any supported Clang version is compatible, but detection of +out-of-bounds accesses for global variables is only supported since Clang 11. + +Tag-based KASAN is only supported in Clang. + +Currently generic KASAN is supported for the x86_64, arm64, xtensa, s390 and +riscv architectures, and tag-based KASAN is supported only for arm64. + +Usage +----- + +To enable KASAN configure kernel with:: + + CONFIG_KASAN = y + +and choose between CONFIG_KASAN_GENERIC (to enable generic KASAN) and +CONFIG_KASAN_SW_TAGS (to enable software tag-based KASAN). + +You also need to choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE. +Outline and inline are compiler instrumentation types. The former produces +smaller binary while the latter is 1.1 - 2 times faster. + +Both KASAN modes work with both SLUB and SLAB memory allocators. +For better bug detection and nicer reporting, enable CONFIG_STACKTRACE. + +To augment reports with last allocation and freeing stack of the physical page, +it is recommended to enable also CONFIG_PAGE_OWNER and boot with page_owner=on. + +To disable instrumentation for specific files or directories, add a line +similar to the following to the respective kernel Makefile: + +- For a single file (e.g. main.o):: + + KASAN_SANITIZE_main.o := n + +- For all files in one directory:: + + KASAN_SANITIZE := n + +Error reports +~~~~~~~~~~~~~ + +A typical out-of-bounds access generic KASAN report looks like this:: + + ================================================================== + BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [test_kasan] + Write of size 1 at addr ffff8801f44ec37b by task insmod/2760 + + CPU: 1 PID: 2760 Comm: insmod Not tainted 4.19.0-rc3+ #698 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 + Call Trace: + dump_stack+0x94/0xd8 + print_address_description+0x73/0x280 + kasan_report+0x144/0x187 + __asan_report_store1_noabort+0x17/0x20 + kmalloc_oob_right+0xa8/0xbc [test_kasan] + kmalloc_tests_init+0x16/0x700 [test_kasan] + do_one_initcall+0xa5/0x3ae + do_init_module+0x1b6/0x547 + load_module+0x75df/0x8070 + __do_sys_init_module+0x1c6/0x200 + __x64_sys_init_module+0x6e/0xb0 + do_syscall_64+0x9f/0x2c0 + entry_SYSCALL_64_after_hwframe+0x44/0xa9 + RIP: 0033:0x7f96443109da + RSP: 002b:00007ffcf0b51b08 EFLAGS: 00000202 ORIG_RAX: 00000000000000af + RAX: ffffffffffffffda RBX: 000055dc3ee521a0 RCX: 00007f96443109da + RDX: 00007f96445cff88 RSI: 0000000000057a50 RDI: 00007f9644992000 + RBP: 000055dc3ee510b0 R08: 0000000000000003 R09: 0000000000000000 + R10: 00007f964430cd0a R11: 0000000000000202 R12: 00007f96445cff88 + R13: 000055dc3ee51090 R14: 0000000000000000 R15: 0000000000000000 + + Allocated by task 2760: + save_stack+0x43/0xd0 + kasan_kmalloc+0xa7/0xd0 + kmem_cache_alloc_trace+0xe1/0x1b0 + kmalloc_oob_right+0x56/0xbc [test_kasan] + kmalloc_tests_init+0x16/0x700 [test_kasan] + do_one_initcall+0xa5/0x3ae + do_init_module+0x1b6/0x547 + load_module+0x75df/0x8070 + __do_sys_init_module+0x1c6/0x200 + __x64_sys_init_module+0x6e/0xb0 + do_syscall_64+0x9f/0x2c0 + entry_SYSCALL_64_after_hwframe+0x44/0xa9 + + Freed by task 815: + save_stack+0x43/0xd0 + __kasan_slab_free+0x135/0x190 + kasan_slab_free+0xe/0x10 + kfree+0x93/0x1a0 + umh_complete+0x6a/0xa0 + call_usermodehelper_exec_async+0x4c3/0x640 + ret_from_fork+0x35/0x40 + + The buggy address belongs to the object at ffff8801f44ec300 + which belongs to the cache kmalloc-128 of size 128 + The buggy address is located 123 bytes inside of + 128-byte region [ffff8801f44ec300, ffff8801f44ec380) + The buggy address belongs to the page: + page:ffffea0007d13b00 count:1 mapcount:0 mapping:ffff8801f7001640 index:0x0 + flags: 0x200000000000100(slab) + raw: 0200000000000100 ffffea0007d11dc0 0000001a0000001a ffff8801f7001640 + raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000 + page dumped because: kasan: bad access detected + + Memory state around the buggy address: + ffff8801f44ec200: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb + ffff8801f44ec280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc + >ffff8801f44ec300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 + ^ + ffff8801f44ec380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb + ffff8801f44ec400: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc + ================================================================== + +The header of the report provides a short summary of what kind of bug happened +and what kind of access caused it. It's followed by a stack trace of the bad +access, a stack trace of where the accessed memory was allocated (in case bad +access happens on a slab object), and a stack trace of where the object was +freed (in case of a use-after-free bug report). Next comes a description of +the accessed slab object and information about the accessed memory page. + +In the last section the report shows memory state around the accessed address. +Reading this part requires some understanding of how KASAN works. + +The state of each 8 aligned bytes of memory is encoded in one shadow byte. +Those 8 bytes can be accessible, partially accessible, freed or be a redzone. +We use the following encoding for each shadow byte: 0 means that all 8 bytes +of the corresponding memory region are accessible; number N (1 <= N <= 7) means +that the first N bytes are accessible, and other (8 - N) bytes are not; +any negative value indicates that the entire 8-byte word is inaccessible. +We use different negative values to distinguish between different kinds of +inaccessible memory like redzones or freed memory (see mm/kasan/kasan.h). + +In the report above the arrows point to the shadow byte 03, which means that +the accessed address is partially accessible. + +For tag-based KASAN this last report section shows the memory tags around the +accessed address (see Implementation details section). + + +Implementation details +---------------------- + +Generic KASAN +~~~~~~~~~~~~~ + +From a high level, our approach to memory error detection is similar to that +of kmemcheck: use shadow memory to record whether each byte of memory is safe +to access, and use compile-time instrumentation to insert checks of shadow +memory on each memory access. + +Generic KASAN dedicates 1/8th of kernel memory to its shadow memory (e.g. 16TB +to cover 128TB on x86_64) and uses direct mapping with a scale and offset to +translate a memory address to its corresponding shadow address. + +Here is the function which translates an address to its corresponding shadow +address:: + + static inline void *kasan_mem_to_shadow(const void *addr) + { + return ((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT) + + KASAN_SHADOW_OFFSET; + } + +where ``KASAN_SHADOW_SCALE_SHIFT = 3``. + +Compile-time instrumentation is used to insert memory access checks. Compiler +inserts function calls (__asan_load*(addr), __asan_store*(addr)) before each +memory access of size 1, 2, 4, 8 or 16. These functions check whether memory +access is valid or not by checking corresponding shadow memory. + +GCC 5.0 has possibility to perform inline instrumentation. Instead of making +function calls GCC directly inserts the code to check the shadow memory. +This option significantly enlarges kernel but it gives x1.1-x2 performance +boost over outline instrumented kernel. + +Generic KASAN prints up to 2 call_rcu() call stacks in reports, the last one +and the second to last. + +Software tag-based KASAN +~~~~~~~~~~~~~~~~~~~~~~~~ + +Tag-based KASAN uses the Top Byte Ignore (TBI) feature of modern arm64 CPUs to +store a pointer tag in the top byte of kernel pointers. Like generic KASAN it +uses shadow memory to store memory tags associated with each 16-byte memory +cell (therefore it dedicates 1/16th of the kernel memory for shadow memory). + +On each memory allocation tag-based KASAN generates a random tag, tags the +allocated memory with this tag, and embeds this tag into the returned pointer. +Software tag-based KASAN uses compile-time instrumentation to insert checks +before each memory access. These checks make sure that tag of the memory that +is being accessed is equal to tag of the pointer that is used to access this +memory. In case of a tag mismatch tag-based KASAN prints a bug report. + +Software tag-based KASAN also has two instrumentation modes (outline, that +emits callbacks to check memory accesses; and inline, that performs the shadow +memory checks inline). With outline instrumentation mode, a bug report is +simply printed from the function that performs the access check. With inline +instrumentation a brk instruction is emitted by the compiler, and a dedicated +brk handler is used to print bug reports. + +A potential expansion of this mode is a hardware tag-based mode, which would +use hardware memory tagging support instead of compiler instrumentation and +manual shadow memory manipulation. + +What memory accesses are sanitised by KASAN? +-------------------------------------------- + +The kernel maps memory in a number of different parts of the address +space. This poses something of a problem for KASAN, which requires +that all addresses accessed by instrumented code have a valid shadow +region. + +The range of kernel virtual addresses is large: there is not enough +real memory to support a real shadow region for every address that +could be accessed by the kernel. + +By default +~~~~~~~~~~ + +By default, architectures only map real memory over the shadow region +for the linear mapping (and potentially other small areas). For all +other areas - such as vmalloc and vmemmap space - a single read-only +page is mapped over the shadow area. This read-only shadow page +declares all memory accesses as permitted. + +This presents a problem for modules: they do not live in the linear +mapping, but in a dedicated module space. By hooking in to the module +allocator, KASAN can temporarily map real shadow memory to cover +them. This allows detection of invalid accesses to module globals, for +example. + +This also creates an incompatibility with ``VMAP_STACK``: if the stack +lives in vmalloc space, it will be shadowed by the read-only page, and +the kernel will fault when trying to set up the shadow data for stack +variables. + +CONFIG_KASAN_VMALLOC +~~~~~~~~~~~~~~~~~~~~ + +With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the +cost of greater memory usage. Currently this is only supported on x86. + +This works by hooking into vmalloc and vmap, and dynamically +allocating real shadow memory to back the mappings. + +Most mappings in vmalloc space are small, requiring less than a full +page of shadow space. Allocating a full shadow page per mapping would +therefore be wasteful. Furthermore, to ensure that different mappings +use different shadow pages, mappings would have to be aligned to +``KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE``. + +Instead, we share backing space across multiple mappings. We allocate +a backing page when a mapping in vmalloc space uses a particular page +of the shadow region. This page can be shared by other vmalloc +mappings later on. + +We hook in to the vmap infrastructure to lazily clean up unused shadow +memory. + +To avoid the difficulties around swapping mappings around, we expect +that the part of the shadow region that covers the vmalloc space will +not be covered by the early shadow page, but will be left +unmapped. This will require changes in arch-specific code. + +This allows ``VMAP_STACK`` support on x86, and can simplify support of +architectures that do not have a fixed module region. + +CONFIG_KASAN_KUNIT_TEST & CONFIG_TEST_KASAN_MODULE +-------------------------------------------------- + +``CONFIG_KASAN_KUNIT_TEST`` utilizes the KUnit Test Framework for testing. +This means each test focuses on a small unit of functionality and +there are a few ways these tests can be run. + +Each test will print the KASAN report if an error is detected and then +print the number of the test and the status of the test: + +pass:: + + ok 28 - kmalloc_double_kzfree + +or, if kmalloc failed:: + + # kmalloc_large_oob_right: ASSERTION FAILED at lib/test_kasan.c:163 + Expected ptr is not null, but is + not ok 4 - kmalloc_large_oob_right + +or, if a KASAN report was expected, but not found:: + + # kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:629 + Expected kasan_data->report_expected == kasan_data->report_found, but + kasan_data->report_expected == 1 + kasan_data->report_found == 0 + not ok 28 - kmalloc_double_kzfree + +All test statuses are tracked as they run and an overall status will +be printed at the end:: + + ok 1 - kasan + +or:: + + not ok 1 - kasan + +(1) Loadable Module +~~~~~~~~~~~~~~~~~~~~ + +With ``CONFIG_KUNIT`` enabled, ``CONFIG_KASAN_KUNIT_TEST`` can be built as +a loadable module and run on any architecture that supports KASAN +using something like insmod or modprobe. The module is called ``test_kasan``. + +(2) Built-In +~~~~~~~~~~~~~ + +With ``CONFIG_KUNIT`` built-in, ``CONFIG_KASAN_KUNIT_TEST`` can be built-in +on any architecure that supports KASAN. These and any other KUnit +tests enabled will run and print the results at boot as a late-init +call. + +(3) Using kunit_tool +~~~~~~~~~~~~~~~~~~~~~ + +With ``CONFIG_KUNIT`` and ``CONFIG_KASAN_KUNIT_TEST`` built-in, we can also +use kunit_tool to see the results of these along with other KUnit +tests in a more readable way. This will not print the KASAN reports +of tests that passed. Use `KUnit documentation `_ for more up-to-date +information on kunit_tool. + +.. _KUnit: https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html + +``CONFIG_TEST_KASAN_MODULE`` is a set of KASAN tests that could not be +converted to KUnit. These tests can be run only as a module with +``CONFIG_TEST_KASAN_MODULE`` built as a loadable module and +``CONFIG_KASAN`` built-in. The type of error expected and the +function being run is printed before the expression expected to give +an error. Then the error is printed, if found, and that test +should be interpretted to pass only if the error was the one expected +by the test. diff --git a/Documentation/dev-tools/kcov.rst b/Documentation/dev-tools/kcov.rst new file mode 100644 index 000000000..8548b0b04 --- /dev/null +++ b/Documentation/dev-tools/kcov.rst @@ -0,0 +1,334 @@ +kcov: code coverage for fuzzing +=============================== + +kcov exposes kernel code coverage information in a form suitable for coverage- +guided fuzzing (randomized testing). Coverage data of a running kernel is +exported via the "kcov" debugfs file. Coverage collection is enabled on a task +basis, and thus it can capture precise coverage of a single system call. + +Note that kcov does not aim to collect as much coverage as possible. It aims +to collect more or less stable coverage that is function of syscall inputs. +To achieve this goal it does not collect coverage in soft/hard interrupts +and instrumentation of some inherently non-deterministic parts of kernel is +disabled (e.g. scheduler, locking). + +kcov is also able to collect comparison operands from the instrumented code +(this feature currently requires that the kernel is compiled with clang). + +Prerequisites +------------- + +Configure the kernel with:: + + CONFIG_KCOV=y + +CONFIG_KCOV requires gcc 6.1.0 or later. + +If the comparison operands need to be collected, set:: + + CONFIG_KCOV_ENABLE_COMPARISONS=y + +Profiling data will only become accessible once debugfs has been mounted:: + + mount -t debugfs none /sys/kernel/debug + +Coverage collection +------------------- + +The following program demonstrates coverage collection from within a test +program using kcov: + +.. code-block:: c + + #include + #include + #include + #include + #include + #include + #include + #include + #include + #include + + #define KCOV_INIT_TRACE _IOR('c', 1, unsigned long) + #define KCOV_ENABLE _IO('c', 100) + #define KCOV_DISABLE _IO('c', 101) + #define COVER_SIZE (64<<10) + + #define KCOV_TRACE_PC 0 + #define KCOV_TRACE_CMP 1 + + int main(int argc, char **argv) + { + int fd; + unsigned long *cover, n, i; + + /* A single fd descriptor allows coverage collection on a single + * thread. + */ + fd = open("/sys/kernel/debug/kcov", O_RDWR); + if (fd == -1) + perror("open"), exit(1); + /* Setup trace mode and trace size. */ + if (ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE)) + perror("ioctl"), exit(1); + /* Mmap buffer shared between kernel- and user-space. */ + cover = (unsigned long*)mmap(NULL, COVER_SIZE * sizeof(unsigned long), + PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if ((void*)cover == MAP_FAILED) + perror("mmap"), exit(1); + /* Enable coverage collection on the current thread. */ + if (ioctl(fd, KCOV_ENABLE, KCOV_TRACE_PC)) + perror("ioctl"), exit(1); + /* Reset coverage from the tail of the ioctl() call. */ + __atomic_store_n(&cover[0], 0, __ATOMIC_RELAXED); + /* That's the target syscal call. */ + read(-1, NULL, 0); + /* Read number of PCs collected. */ + n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED); + for (i = 0; i < n; i++) + printf("0x%lx\n", cover[i + 1]); + /* Disable coverage collection for the current thread. After this call + * coverage can be enabled for a different thread. + */ + if (ioctl(fd, KCOV_DISABLE, 0)) + perror("ioctl"), exit(1); + /* Free resources. */ + if (munmap(cover, COVER_SIZE * sizeof(unsigned long))) + perror("munmap"), exit(1); + if (close(fd)) + perror("close"), exit(1); + return 0; + } + +After piping through addr2line output of the program looks as follows:: + + SyS_read + fs/read_write.c:562 + __fdget_pos + fs/file.c:774 + __fget_light + fs/file.c:746 + __fget_light + fs/file.c:750 + __fget_light + fs/file.c:760 + __fdget_pos + fs/file.c:784 + SyS_read + fs/read_write.c:562 + +If a program needs to collect coverage from several threads (independently), +it needs to open /sys/kernel/debug/kcov in each thread separately. + +The interface is fine-grained to allow efficient forking of test processes. +That is, a parent process opens /sys/kernel/debug/kcov, enables trace mode, +mmaps coverage buffer and then forks child processes in a loop. Child processes +only need to enable coverage (disable happens automatically on thread end). + +Comparison operands collection +------------------------------ + +Comparison operands collection is similar to coverage collection: + +.. code-block:: c + + /* Same includes and defines as above. */ + + /* Number of 64-bit words per record. */ + #define KCOV_WORDS_PER_CMP 4 + + /* + * The format for the types of collected comparisons. + * + * Bit 0 shows whether one of the arguments is a compile-time constant. + * Bits 1 & 2 contain log2 of the argument size, up to 8 bytes. + */ + + #define KCOV_CMP_CONST (1 << 0) + #define KCOV_CMP_SIZE(n) ((n) << 1) + #define KCOV_CMP_MASK KCOV_CMP_SIZE(3) + + int main(int argc, char **argv) + { + int fd; + uint64_t *cover, type, arg1, arg2, is_const, size; + unsigned long n, i; + + fd = open("/sys/kernel/debug/kcov", O_RDWR); + if (fd == -1) + perror("open"), exit(1); + if (ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE)) + perror("ioctl"), exit(1); + /* + * Note that the buffer pointer is of type uint64_t*, because all + * the comparison operands are promoted to uint64_t. + */ + cover = (uint64_t *)mmap(NULL, COVER_SIZE * sizeof(unsigned long), + PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if ((void*)cover == MAP_FAILED) + perror("mmap"), exit(1); + /* Note KCOV_TRACE_CMP instead of KCOV_TRACE_PC. */ + if (ioctl(fd, KCOV_ENABLE, KCOV_TRACE_CMP)) + perror("ioctl"), exit(1); + __atomic_store_n(&cover[0], 0, __ATOMIC_RELAXED); + read(-1, NULL, 0); + /* Read number of comparisons collected. */ + n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED); + for (i = 0; i < n; i++) { + type = cover[i * KCOV_WORDS_PER_CMP + 1]; + /* arg1 and arg2 - operands of the comparison. */ + arg1 = cover[i * KCOV_WORDS_PER_CMP + 2]; + arg2 = cover[i * KCOV_WORDS_PER_CMP + 3]; + /* ip - caller address. */ + ip = cover[i * KCOV_WORDS_PER_CMP + 4]; + /* size of the operands. */ + size = 1 << ((type & KCOV_CMP_MASK) >> 1); + /* is_const - true if either operand is a compile-time constant.*/ + is_const = type & KCOV_CMP_CONST; + printf("ip: 0x%lx type: 0x%lx, arg1: 0x%lx, arg2: 0x%lx, " + "size: %lu, %s\n", + ip, type, arg1, arg2, size, + is_const ? "const" : "non-const"); + } + if (ioctl(fd, KCOV_DISABLE, 0)) + perror("ioctl"), exit(1); + /* Free resources. */ + if (munmap(cover, COVER_SIZE * sizeof(unsigned long))) + perror("munmap"), exit(1); + if (close(fd)) + perror("close"), exit(1); + return 0; + } + +Note that the kcov modes (coverage collection or comparison operands) are +mutually exclusive. + +Remote coverage collection +-------------------------- + +With KCOV_ENABLE coverage is collected only for syscalls that are issued +from the current process. With KCOV_REMOTE_ENABLE it's possible to collect +coverage for arbitrary parts of the kernel code, provided that those parts +are annotated with kcov_remote_start()/kcov_remote_stop(). + +This allows to collect coverage from two types of kernel background +threads: the global ones, that are spawned during kernel boot in a limited +number of instances (e.g. one USB hub_event() worker thread is spawned per +USB HCD); and the local ones, that are spawned when a user interacts with +some kernel interface (e.g. vhost workers); as well as from soft +interrupts. + +To enable collecting coverage from a global background thread or from a +softirq, a unique global handle must be assigned and passed to the +corresponding kcov_remote_start() call. Then a userspace process can pass +a list of such handles to the KCOV_REMOTE_ENABLE ioctl in the handles +array field of the kcov_remote_arg struct. This will attach the used kcov +device to the code sections, that are referenced by those handles. + +Since there might be many local background threads spawned from different +userspace processes, we can't use a single global handle per annotation. +Instead, the userspace process passes a non-zero handle through the +common_handle field of the kcov_remote_arg struct. This common handle gets +saved to the kcov_handle field in the current task_struct and needs to be +passed to the newly spawned threads via custom annotations. Those threads +should in turn be annotated with kcov_remote_start()/kcov_remote_stop(). + +Internally kcov stores handles as u64 integers. The top byte of a handle +is used to denote the id of a subsystem that this handle belongs to, and +the lower 4 bytes are used to denote the id of a thread instance within +that subsystem. A reserved value 0 is used as a subsystem id for common +handles as they don't belong to a particular subsystem. The bytes 4-7 are +currently reserved and must be zero. In the future the number of bytes +used for the subsystem or handle ids might be increased. + +When a particular userspace proccess collects coverage via a common +handle, kcov will collect coverage for each code section that is annotated +to use the common handle obtained as kcov_handle from the current +task_struct. However non common handles allow to collect coverage +selectively from different subsystems. + +.. code-block:: c + + struct kcov_remote_arg { + __u32 trace_mode; + __u32 area_size; + __u32 num_handles; + __aligned_u64 common_handle; + __aligned_u64 handles[0]; + }; + + #define KCOV_INIT_TRACE _IOR('c', 1, unsigned long) + #define KCOV_DISABLE _IO('c', 101) + #define KCOV_REMOTE_ENABLE _IOW('c', 102, struct kcov_remote_arg) + + #define COVER_SIZE (64 << 10) + + #define KCOV_TRACE_PC 0 + + #define KCOV_SUBSYSTEM_COMMON (0x00ull << 56) + #define KCOV_SUBSYSTEM_USB (0x01ull << 56) + + #define KCOV_SUBSYSTEM_MASK (0xffull << 56) + #define KCOV_INSTANCE_MASK (0xffffffffull) + + static inline __u64 kcov_remote_handle(__u64 subsys, __u64 inst) + { + if (subsys & ~KCOV_SUBSYSTEM_MASK || inst & ~KCOV_INSTANCE_MASK) + return 0; + return subsys | inst; + } + + #define KCOV_COMMON_ID 0x42 + #define KCOV_USB_BUS_NUM 1 + + int main(int argc, char **argv) + { + int fd; + unsigned long *cover, n, i; + struct kcov_remote_arg *arg; + + fd = open("/sys/kernel/debug/kcov", O_RDWR); + if (fd == -1) + perror("open"), exit(1); + if (ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE)) + perror("ioctl"), exit(1); + cover = (unsigned long*)mmap(NULL, COVER_SIZE * sizeof(unsigned long), + PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if ((void*)cover == MAP_FAILED) + perror("mmap"), exit(1); + + /* Enable coverage collection via common handle and from USB bus #1. */ + arg = calloc(1, sizeof(*arg) + sizeof(uint64_t)); + if (!arg) + perror("calloc"), exit(1); + arg->trace_mode = KCOV_TRACE_PC; + arg->area_size = COVER_SIZE; + arg->num_handles = 1; + arg->common_handle = kcov_remote_handle(KCOV_SUBSYSTEM_COMMON, + KCOV_COMMON_ID); + arg->handles[0] = kcov_remote_handle(KCOV_SUBSYSTEM_USB, + KCOV_USB_BUS_NUM); + if (ioctl(fd, KCOV_REMOTE_ENABLE, arg)) + perror("ioctl"), free(arg), exit(1); + free(arg); + + /* + * Here the user needs to trigger execution of a kernel code section + * that is either annotated with the common handle, or to trigger some + * activity on USB bus #1. + */ + sleep(2); + + n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED); + for (i = 0; i < n; i++) + printf("0x%lx\n", cover[i + 1]); + if (ioctl(fd, KCOV_DISABLE, 0)) + perror("ioctl"), exit(1); + if (munmap(cover, COVER_SIZE * sizeof(unsigned long))) + perror("munmap"), exit(1); + if (close(fd)) + perror("close"), exit(1); + return 0; + } diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst new file mode 100644 index 000000000..be7a0b0e1 --- /dev/null +++ b/Documentation/dev-tools/kcsan.rst @@ -0,0 +1,316 @@ +The Kernel Concurrency Sanitizer (KCSAN) +======================================== + +The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector, which +relies on compile-time instrumentation, and uses a watchpoint-based sampling +approach to detect races. KCSAN's primary purpose is to detect `data races`_. + +Usage +----- + +KCSAN is supported by both GCC and Clang. With GCC we require version 11 or +later, and with Clang also require version 11 or later. + +To enable KCSAN configure the kernel with:: + + CONFIG_KCSAN = y + +KCSAN provides several other configuration options to customize behaviour (see +the respective help text in ``lib/Kconfig.kcsan`` for more info). + +Error reports +~~~~~~~~~~~~~ + +A typical data race report looks like this:: + + ================================================================== + BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode + + write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4: + kernfs_refresh_inode+0x70/0x170 + kernfs_iop_permission+0x4f/0x90 + inode_permission+0x190/0x200 + link_path_walk.part.0+0x503/0x8e0 + path_lookupat.isra.0+0x69/0x4d0 + filename_lookup+0x136/0x280 + user_path_at_empty+0x47/0x60 + vfs_statx+0x9b/0x130 + __do_sys_newlstat+0x50/0xb0 + __x64_sys_newlstat+0x37/0x50 + do_syscall_64+0x85/0x260 + entry_SYSCALL_64_after_hwframe+0x44/0xa9 + + read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6: + generic_permission+0x5b/0x2a0 + kernfs_iop_permission+0x66/0x90 + inode_permission+0x190/0x200 + link_path_walk.part.0+0x503/0x8e0 + path_lookupat.isra.0+0x69/0x4d0 + filename_lookup+0x136/0x280 + user_path_at_empty+0x47/0x60 + do_faccessat+0x11a/0x390 + __x64_sys_access+0x3c/0x50 + do_syscall_64+0x85/0x260 + entry_SYSCALL_64_after_hwframe+0x44/0xa9 + + Reported by Kernel Concurrency Sanitizer on: + CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 + ================================================================== + +The header of the report provides a short summary of the functions involved in +the race. It is followed by the access types and stack traces of the 2 threads +involved in the data race. + +The other less common type of data race report looks like this:: + + ================================================================== + BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10 + + race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0: + e1000_clean_rx_irq+0x551/0xb10 + e1000_clean+0x533/0xda0 + net_rx_action+0x329/0x900 + __do_softirq+0xdb/0x2db + irq_exit+0x9b/0xa0 + do_IRQ+0x9c/0xf0 + ret_from_intr+0x0/0x18 + default_idle+0x3f/0x220 + arch_cpu_idle+0x21/0x30 + do_idle+0x1df/0x230 + cpu_startup_entry+0x14/0x20 + rest_init+0xc5/0xcb + arch_call_rest_init+0x13/0x2b + start_kernel+0x6db/0x700 + + Reported by Kernel Concurrency Sanitizer on: + CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 + ================================================================== + +This report is generated where it was not possible to determine the other +racing thread, but a race was inferred due to the data value of the watched +memory location having changed. These can occur either due to missing +instrumentation or e.g. DMA accesses. These reports will only be generated if +``CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN=y`` (selected by default). + +Selective analysis +~~~~~~~~~~~~~~~~~~ + +It may be desirable to disable data race detection for specific accesses, +functions, compilation units, or entire subsystems. For static blacklisting, +the below options are available: + +* KCSAN understands the ``data_race(expr)`` annotation, which tells KCSAN that + any data races due to accesses in ``expr`` should be ignored and resulting + behaviour when encountering a data race is deemed safe. + +* Disabling data race detection for entire functions can be accomplished by + using the function attribute ``__no_kcsan``:: + + __no_kcsan + void foo(void) { + ... + + To dynamically limit for which functions to generate reports, see the + `DebugFS interface`_ blacklist/whitelist feature. + +* To disable data race detection for a particular compilation unit, add to the + ``Makefile``:: + + KCSAN_SANITIZE_file.o := n + +* To disable data race detection for all compilation units listed in a + ``Makefile``, add to the respective ``Makefile``:: + + KCSAN_SANITIZE := n + +Furthermore, it is possible to tell KCSAN to show or hide entire classes of +data races, depending on preferences. These can be changed via the following +Kconfig options: + +* ``CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY``: If enabled and a conflicting write + is observed via a watchpoint, but the data value of the memory location was + observed to remain unchanged, do not report the data race. + +* ``CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC``: Assume that plain aligned writes + up to word size are atomic by default. Assumes that such writes are not + subject to unsafe compiler optimizations resulting in data races. The option + causes KCSAN to not report data races due to conflicts where the only plain + accesses are aligned writes up to word size. + +DebugFS interface +~~~~~~~~~~~~~~~~~ + +The file ``/sys/kernel/debug/kcsan`` provides the following interface: + +* Reading ``/sys/kernel/debug/kcsan`` returns various runtime statistics. + +* Writing ``on`` or ``off`` to ``/sys/kernel/debug/kcsan`` allows turning KCSAN + on or off, respectively. + +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds + ``some_func_name`` to the report filter list, which (by default) blacklists + reporting data races where either one of the top stackframes are a function + in the list. + +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan`` + changes the report filtering behaviour. For example, the blacklist feature + can be used to silence frequently occurring data races; the whitelist feature + can help with reproduction and testing of fixes. + +Tuning performance +~~~~~~~~~~~~~~~~~~ + +Core parameters that affect KCSAN's overall performance and bug detection +ability are exposed as kernel command-line arguments whose defaults can also be +changed via the corresponding Kconfig options. + +* ``kcsan.skip_watch`` (``CONFIG_KCSAN_SKIP_WATCH``): Number of per-CPU memory + operations to skip, before another watchpoint is set up. Setting up + watchpoints more frequently will result in the likelihood of races to be + observed to increase. This parameter has the most significant impact on + overall system performance and race detection ability. + +* ``kcsan.udelay_task`` (``CONFIG_KCSAN_UDELAY_TASK``): For tasks, the + microsecond delay to stall execution after a watchpoint has been set up. + Larger values result in the window in which we may observe a race to + increase. + +* ``kcsan.udelay_interrupt`` (``CONFIG_KCSAN_UDELAY_INTERRUPT``): For + interrupts, the microsecond delay to stall execution after a watchpoint has + been set up. Interrupts have tighter latency requirements, and their delay + should generally be smaller than the one chosen for tasks. + +They may be tweaked at runtime via ``/sys/module/kcsan/parameters/``. + +Data Races +---------- + +In an execution, two memory accesses form a *data race* if they *conflict*, +they happen concurrently in different threads, and at least one of them is a +*plain access*; they *conflict* if both access the same memory location, and at +least one is a write. For a more thorough discussion and definition, see `"Plain +Accesses and Data Races" in the LKMM`_. + +.. _"Plain Accesses and Data Races" in the LKMM: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/explanation.txt#n1922 + +Relationship with the Linux-Kernel Memory Consistency Model (LKMM) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The LKMM defines the propagation and ordering rules of various memory +operations, which gives developers the ability to reason about concurrent code. +Ultimately this allows to determine the possible executions of concurrent code, +and if that code is free from data races. + +KCSAN is aware of *marked atomic operations* (``READ_ONCE``, ``WRITE_ONCE``, +``atomic_*``, etc.), but is oblivious of any ordering guarantees and simply +assumes that memory barriers are placed correctly. In other words, KCSAN +assumes that as long as a plain access is not observed to race with another +conflicting access, memory operations are correctly ordered. + +This means that KCSAN will not report *potential* data races due to missing +memory ordering. Developers should therefore carefully consider the required +memory ordering requirements that remain unchecked. If, however, missing +memory ordering (that is observable with a particular compiler and +architecture) leads to an observable data race (e.g. entering a critical +section erroneously), KCSAN would report the resulting data race. + +Race Detection Beyond Data Races +-------------------------------- + +For code with complex concurrency design, race-condition bugs may not always +manifest as data races. Race conditions occur if concurrently executing +operations result in unexpected system behaviour. On the other hand, data races +are defined at the C-language level. The following macros can be used to check +properties of concurrent code where bugs would not manifest as data races. + +.. kernel-doc:: include/linux/kcsan-checks.h + :functions: ASSERT_EXCLUSIVE_WRITER ASSERT_EXCLUSIVE_WRITER_SCOPED + ASSERT_EXCLUSIVE_ACCESS ASSERT_EXCLUSIVE_ACCESS_SCOPED + ASSERT_EXCLUSIVE_BITS + +Implementation Details +---------------------- + +KCSAN relies on observing that two accesses happen concurrently. Crucially, we +want to (a) increase the chances of observing races (especially for races that +manifest rarely), and (b) be able to actually observe them. We can accomplish +(a) by injecting various delays, and (b) by using address watchpoints (or +breakpoints). + +If we deliberately stall a memory access, while we have a watchpoint for its +address set up, and then observe the watchpoint to fire, two accesses to the +same address just raced. Using hardware watchpoints, this is the approach taken +in `DataCollider +`_. +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead +relies on compiler instrumentation and "soft watchpoints". + +In KCSAN, watchpoints are implemented using an efficient encoding that stores +access type, size, and address in a long; the benefits of using "soft +watchpoints" are portability and greater flexibility. KCSAN then relies on the +compiler instrumenting plain accesses. For each instrumented plain access: + +1. Check if a matching watchpoint exists; if yes, and at least one access is a + write, then we encountered a racing access. + +2. Periodically, if no matching watchpoint exists, set up a watchpoint and + stall for a small randomized delay. + +3. Also check the data value before the delay, and re-check the data value + after delay; if the values mismatch, we infer a race of unknown origin. + +To detect data races between plain and marked accesses, KCSAN also annotates +marked accesses, but only to check if a watchpoint exists; i.e. KCSAN never +sets up a watchpoint on marked accesses. By never setting up watchpoints for +marked operations, if all accesses to a variable that is accessed concurrently +are properly marked, KCSAN will never trigger a watchpoint and therefore never +report the accesses. + +Key Properties +~~~~~~~~~~~~~~ + +1. **Memory Overhead:** The overall memory overhead is only a few MiB + depending on configuration. The current implementation uses a small array of + longs to encode watchpoint information, which is negligible. + +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an + efficient watchpoint encoding that does not require acquiring any shared + locks in the fast-path. For kernel boot on a system with 8 CPUs: + + - 5.0x slow-down with the default KCSAN config; + - 2.8x slow-down from runtime fast-path overhead only (set very large + ``KCSAN_SKIP_WATCH`` and unset ``KCSAN_SKIP_WATCH_RANDOMIZE``). + +3. **Annotation Overheads:** Minimal annotations are required outside the KCSAN + runtime. As a result, maintenance overheads are minimal as the kernel + evolves. + +4. **Detects Racy Writes from Devices:** Due to checking data values upon + setting up watchpoints, racy writes from devices can also be detected. + +5. **Memory Ordering:** KCSAN is *not* explicitly aware of the LKMM's ordering + rules; this may result in missed data races (false negatives). + +6. **Analysis Accuracy:** For observed executions, due to using a sampling + strategy, the analysis is *unsound* (false negatives possible), but aims to + be complete (no false positives). + +Alternatives Considered +----------------------- + +An alternative data race detection approach for the kernel can be found in the +`Kernel Thread Sanitizer (KTSAN) `_. +KTSAN is a happens-before data race detector, which explicitly establishes the +happens-before order between memory operations, which can then be used to +determine data races as defined in `Data Races`_. + +To build a correct happens-before relation, KTSAN must be aware of all ordering +rules of the LKMM and synchronization primitives. Unfortunately, any omission +leads to large numbers of false positives, which is especially detrimental in +the context of the kernel which includes numerous custom synchronization +mechanisms. To track the happens-before relation, KTSAN's implementation +requires metadata for each memory location (shadow memory), which for each page +corresponds to 4 pages of shadow memory, and can translate into overhead of +tens of GiB on a large system. diff --git a/Documentation/dev-tools/kgdb.rst b/Documentation/dev-tools/kgdb.rst new file mode 100644 index 000000000..77b688e6a --- /dev/null +++ b/Documentation/dev-tools/kgdb.rst @@ -0,0 +1,940 @@ +================================================= +Using kgdb, kdb and the kernel debugger internals +================================================= + +:Author: Jason Wessel + +Introduction +============ + +The kernel has two different debugger front ends (kdb and kgdb) which +interface to the debug core. It is possible to use either of the +debugger front ends and dynamically transition between them if you +configure the kernel properly at compile and runtime. + +Kdb is simplistic shell-style interface which you can use on a system +console with a keyboard or serial console. You can use it to inspect +memory, registers, process lists, dmesg, and even set breakpoints to +stop in a certain location. Kdb is not a source level debugger, although +you can set breakpoints and execute some basic kernel run control. Kdb +is mainly aimed at doing some analysis to aid in development or +diagnosing kernel problems. You can access some symbols by name in +kernel built-ins or in kernel modules if the code was built with +``CONFIG_KALLSYMS``. + +Kgdb is intended to be used as a source level debugger for the Linux +kernel. It is used along with gdb to debug a Linux kernel. The +expectation is that gdb can be used to "break in" to the kernel to +inspect memory, variables and look through call stack information +similar to the way an application developer would use gdb to debug an +application. It is possible to place breakpoints in kernel code and +perform some limited execution stepping. + +Two machines are required for using kgdb. One of these machines is a +development machine and the other is the target machine. The kernel to +be debugged runs on the target machine. The development machine runs an +instance of gdb against the vmlinux file which contains the symbols (not +a boot image such as bzImage, zImage, uImage...). In gdb the developer +specifies the connection parameters and connects to kgdb. The type of +connection a developer makes with gdb depends on the availability of +kgdb I/O modules compiled as built-ins or loadable kernel modules in the +test machine's kernel. + +Compiling a kernel +================== + +- In order to enable compilation of kdb, you must first enable kgdb. + +- The kgdb test compile options are described in the kgdb test suite + chapter. + +Kernel config options for kgdb +------------------------------ + +To enable ``CONFIG_KGDB`` you should look under +:menuselection:`Kernel hacking --> Kernel debugging` and select +:menuselection:`KGDB: kernel debugger`. + +While it is not a hard requirement that you have symbols in your vmlinux +file, gdb tends not to be very useful without the symbolic data, so you +will want to turn on ``CONFIG_DEBUG_INFO`` which is called +:menuselection:`Compile the kernel with debug info` in the config menu. + +It is advised, but not required, that you turn on the +``CONFIG_FRAME_POINTER`` kernel option which is called :menuselection:`Compile +the kernel with frame pointers` in the config menu. This option inserts code +to into the compiled executable which saves the frame information in +registers or on the stack at different points which allows a debugger +such as gdb to more accurately construct stack back traces while +debugging the kernel. + +If the architecture that you are using supports the kernel option +``CONFIG_STRICT_KERNEL_RWX``, you should consider turning it off. This +option will prevent the use of software breakpoints because it marks +certain regions of the kernel's memory space as read-only. If kgdb +supports it for the architecture you are using, you can use hardware +breakpoints if you desire to run with the ``CONFIG_STRICT_KERNEL_RWX`` +option turned on, else you need to turn off this option. + +Next you should choose one of more I/O drivers to interconnect debugging +host and debugged target. Early boot debugging requires a KGDB I/O +driver that supports early debugging and the driver must be built into +the kernel directly. Kgdb I/O driver configuration takes place via +kernel or module parameters which you can learn more about in the in the +section that describes the parameter kgdboc. + +Here is an example set of ``.config`` symbols to enable or disable for kgdb:: + + # CONFIG_STRICT_KERNEL_RWX is not set + CONFIG_FRAME_POINTER=y + CONFIG_KGDB=y + CONFIG_KGDB_SERIAL_CONSOLE=y + +Kernel config options for kdb +----------------------------- + +Kdb is quite a bit more complex than the simple gdbstub sitting on top +of the kernel's debug core. Kdb must implement a shell, and also adds +some helper functions in other parts of the kernel, responsible for +printing out interesting data such as what you would see if you ran +``lsmod``, or ``ps``. In order to build kdb into the kernel you follow the +same steps as you would for kgdb. + +The main config option for kdb is ``CONFIG_KGDB_KDB`` which is called +:menuselection:`KGDB_KDB: include kdb frontend for kgdb` in the config menu. +In theory you would have already also selected an I/O driver such as the +``CONFIG_KGDB_SERIAL_CONSOLE`` interface if you plan on using kdb on a +serial port, when you were configuring kgdb. + +If you want to use a PS/2-style keyboard with kdb, you would select +``CONFIG_KDB_KEYBOARD`` which is called :menuselection:`KGDB_KDB: keyboard as +input device` in the config menu. The ``CONFIG_KDB_KEYBOARD`` option is not +used for anything in the gdb interface to kgdb. The ``CONFIG_KDB_KEYBOARD`` +option only works with kdb. + +Here is an example set of ``.config`` symbols to enable/disable kdb:: + + # CONFIG_STRICT_KERNEL_RWX is not set + CONFIG_FRAME_POINTER=y + CONFIG_KGDB=y + CONFIG_KGDB_SERIAL_CONSOLE=y + CONFIG_KGDB_KDB=y + CONFIG_KDB_KEYBOARD=y + +Kernel Debugger Boot Arguments +============================== + +This section describes the various runtime kernel parameters that affect +the configuration of the kernel debugger. The following chapter covers +using kdb and kgdb as well as providing some examples of the +configuration parameters. + +Kernel parameter: kgdboc +------------------------ + +The kgdboc driver was originally an abbreviation meant to stand for +"kgdb over console". Today it is the primary mechanism to configure how +to communicate from gdb to kgdb as well as the devices you want to use +to interact with the kdb shell. + +For kgdb/gdb, kgdboc is designed to work with a single serial port. It +is intended to cover the circumstance where you want to use a serial +console as your primary console as well as using it to perform kernel +debugging. It is also possible to use kgdb on a serial port which is not +designated as a system console. Kgdboc may be configured as a kernel +built-in or a kernel loadable module. You can only make use of +``kgdbwait`` and early debugging if you build kgdboc into the kernel as +a built-in. + +Optionally you can elect to activate kms (Kernel Mode Setting) +integration. When you use kms with kgdboc and you have a video driver +that has atomic mode setting hooks, it is possible to enter the debugger +on the graphics console. When the kernel execution is resumed, the +previous graphics mode will be restored. This integration can serve as a +useful tool to aid in diagnosing crashes or doing analysis of memory +with kdb while allowing the full graphics console applications to run. + +kgdboc arguments +~~~~~~~~~~~~~~~~ + +Usage:: + + kgdboc=[kms][[,]kbd][[,]serial_device][,baud] + +The order listed above must be observed if you use any of the optional +configurations together. + +Abbreviations: + +- kms = Kernel Mode Setting + +- kbd = Keyboard + +You can configure kgdboc to use the keyboard, and/or a serial device +depending on if you are using kdb and/or kgdb, in one of the following +scenarios. The order listed above must be observed if you use any of the +optional configurations together. Using kms + only gdb is generally not +a useful combination. + +Using loadable module or built-in +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +1. As a kernel built-in: + + Use the kernel boot argument:: + + kgdboc=,[baud] + +2. As a kernel loadable module: + + Use the command:: + + modprobe kgdboc kgdboc=,[baud] + + Here are two examples of how you might format the kgdboc string. The + first is for an x86 target using the first serial port. The second + example is for the ARM Versatile AB using the second serial port. + + 1. ``kgdboc=ttyS0,115200`` + + 2. ``kgdboc=ttyAMA1,115200`` + +Configure kgdboc at runtime with sysfs +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +At run time you can enable or disable kgdboc by echoing a parameters +into the sysfs. Here are two examples: + +1. Enable kgdboc on ttyS0:: + + echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc + +2. Disable kgdboc:: + + echo "" > /sys/module/kgdboc/parameters/kgdboc + +.. note:: + + You do not need to specify the baud if you are configuring the + console on tty which is already configured or open. + +More examples +^^^^^^^^^^^^^ + +You can configure kgdboc to use the keyboard, and/or a serial device +depending on if you are using kdb and/or kgdb, in one of the following +scenarios. + +1. kdb and kgdb over only a serial port:: + + kgdboc=[,baud] + + Example:: + + kgdboc=ttyS0,115200 + +2. kdb and kgdb with keyboard and a serial port:: + + kgdboc=kbd,[,baud] + + Example:: + + kgdboc=kbd,ttyS0,115200 + +3. kdb with a keyboard:: + + kgdboc=kbd + +4. kdb with kernel mode setting:: + + kgdboc=kms,kbd + +5. kdb with kernel mode setting and kgdb over a serial port:: + + kgdboc=kms,kbd,ttyS0,115200 + +.. note:: + + Kgdboc does not support interrupting the target via the gdb remote + protocol. You must manually send a :kbd:`SysRq-G` unless you have a proxy + that splits console output to a terminal program. A console proxy has a + separate TCP port for the debugger and a separate TCP port for the + "human" console. The proxy can take care of sending the :kbd:`SysRq-G` + for you. + +When using kgdboc with no debugger proxy, you can end up connecting the +debugger at one of two entry points. If an exception occurs after you +have loaded kgdboc, a message should print on the console stating it is +waiting for the debugger. In this case you disconnect your terminal +program and then connect the debugger in its place. If you want to +interrupt the target system and forcibly enter a debug session you have +to issue a :kbd:`Sysrq` sequence and then type the letter :kbd:`g`. Then you +disconnect the terminal session and connect gdb. Your options if you +don't like this are to hack gdb to send the :kbd:`SysRq-G` for you as well as +on the initial connect, or to use a debugger proxy that allows an +unmodified gdb to do the debugging. + +Kernel parameter: ``kgdboc_earlycon`` +------------------------------------- + +If you specify the kernel parameter ``kgdboc_earlycon`` and your serial +driver registers a boot console that supports polling (doesn't need +interrupts and implements a nonblocking read() function) kgdb will attempt +to work using the boot console until it can transition to the regular +tty driver specified by the ``kgdboc`` parameter. + +Normally there is only one boot console (especially that implements the +read() function) so just adding ``kgdboc_earlycon`` on its own is +sufficient to make this work. If you have more than one boot console you +can add the boot console's name to differentiate. Note that names that +are registered through the boot console layer and the tty layer are not +the same for the same port. + +For instance, on one board to be explicit you might do:: + + kgdboc_earlycon=qcom_geni kgdboc=ttyMSM0 + +If the only boot console on the device was "qcom_geni", you could simplify:: + + kgdboc_earlycon kgdboc=ttyMSM0 + +Kernel parameter: ``kgdbwait`` +------------------------------ + +The Kernel command line option ``kgdbwait`` makes kgdb wait for a +debugger connection during booting of a kernel. You can only use this +option if you compiled a kgdb I/O driver into the kernel and you +specified the I/O driver configuration as a kernel command line option. +The kgdbwait parameter should always follow the configuration parameter +for the kgdb I/O driver in the kernel command line else the I/O driver +will not be configured prior to asking the kernel to use it to wait. + +The kernel will stop and wait as early as the I/O driver and +architecture allows when you use this option. If you build the kgdb I/O +driver as a loadable kernel module kgdbwait will not do anything. + +Kernel parameter: ``kgdbcon`` +----------------------------- + +The ``kgdbcon`` feature allows you to see printk() messages inside gdb +while gdb is connected to the kernel. Kdb does not make use of the kgdbcon +feature. + +Kgdb supports using the gdb serial protocol to send console messages to +the debugger when the debugger is connected and running. There are two +ways to activate this feature. + +1. Activate with the kernel command line option:: + + kgdbcon + +2. Use sysfs before configuring an I/O driver:: + + echo 1 > /sys/module/kgdb/parameters/kgdb_use_con + +.. note:: + + If you do this after you configure the kgdb I/O driver, the + setting will not take effect until the next point the I/O is + reconfigured. + +.. important:: + + You cannot use kgdboc + kgdbcon on a tty that is an + active system console. An example of incorrect usage is:: + + console=ttyS0,115200 kgdboc=ttyS0 kgdbcon + +It is possible to use this option with kgdboc on a tty that is not a +system console. + +Run time parameter: ``kgdbreboot`` +---------------------------------- + +The kgdbreboot feature allows you to change how the debugger deals with +the reboot notification. You have 3 choices for the behavior. The +default behavior is always set to 0. + +.. tabularcolumns:: |p{0.4cm}|p{11.5cm}|p{5.6cm}| + +.. flat-table:: + :widths: 1 10 8 + + * - 1 + - ``echo -1 > /sys/module/debug_core/parameters/kgdbreboot`` + - Ignore the reboot notification entirely. + + * - 2 + - ``echo 0 > /sys/module/debug_core/parameters/kgdbreboot`` + - Send the detach message to any attached debugger client. + + * - 3 + - ``echo 1 > /sys/module/debug_core/parameters/kgdbreboot`` + - Enter the debugger on reboot notify. + +Kernel parameter: ``nokaslr`` +----------------------------- + +If the architecture that you are using enable KASLR by default, +you should consider turning it off. KASLR randomizes the +virtual address where the kernel image is mapped and confuse +gdb which resolve kernel symbol address from symbol table +of vmlinux. + +Using kdb +========= + +Quick start for kdb on a serial port +------------------------------------ + +This is a quick example of how to use kdb. + +1. Configure kgdboc at boot using kernel parameters:: + + console=ttyS0,115200 kgdboc=ttyS0,115200 nokaslr + + OR + + Configure kgdboc after the kernel has booted; assuming you are using + a serial port console:: + + echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc + +2. Enter the kernel debugger manually or by waiting for an oops or + fault. There are several ways you can enter the kernel debugger + manually; all involve using the :kbd:`SysRq-G`, which means you must have + enabled ``CONFIG_MAGIC_SysRq=y`` in your kernel config. + + - When logged in as root or with a super user session you can run:: + + echo g > /proc/sysrq-trigger + + - Example using minicom 2.2 + + Press: :kbd:`CTRL-A` :kbd:`f` :kbd:`g` + + - When you have telneted to a terminal server that supports sending + a remote break + + Press: :kbd:`CTRL-]` + + Type in: ``send break`` + + Press: :kbd:`Enter` :kbd:`g` + +3. From the kdb prompt you can run the ``help`` command to see a complete + list of the commands that are available. + + Some useful commands in kdb include: + + =========== ================================================================= + ``lsmod`` Shows where kernel modules are loaded + ``ps`` Displays only the active processes + ``ps A`` Shows all the processes + ``summary`` Shows kernel version info and memory usage + ``bt`` Get a backtrace of the current process using dump_stack() + ``dmesg`` View the kernel syslog buffer + ``go`` Continue the system + =========== ================================================================= + +4. When you are done using kdb you need to consider rebooting the system + or using the ``go`` command to resuming normal kernel execution. If you + have paused the kernel for a lengthy period of time, applications + that rely on timely networking or anything to do with real wall clock + time could be adversely affected, so you should take this into + consideration when using the kernel debugger. + +Quick start for kdb using a keyboard connected console +------------------------------------------------------ + +This is a quick example of how to use kdb with a keyboard. + +1. Configure kgdboc at boot using kernel parameters:: + + kgdboc=kbd + + OR + + Configure kgdboc after the kernel has booted:: + + echo kbd > /sys/module/kgdboc/parameters/kgdboc + +2. Enter the kernel debugger manually or by waiting for an oops or + fault. There are several ways you can enter the kernel debugger + manually; all involve using the :kbd:`SysRq-G`, which means you must have + enabled ``CONFIG_MAGIC_SysRq=y`` in your kernel config. + + - When logged in as root or with a super user session you can run:: + + echo g > /proc/sysrq-trigger + + - Example using a laptop keyboard: + + Press and hold down: :kbd:`Alt` + + Press and hold down: :kbd:`Fn` + + Press and release the key with the label: :kbd:`SysRq` + + Release: :kbd:`Fn` + + Press and release: :kbd:`g` + + Release: :kbd:`Alt` + + - Example using a PS/2 101-key keyboard + + Press and hold down: :kbd:`Alt` + + Press and release the key with the label: :kbd:`SysRq` + + Press and release: :kbd:`g` + + Release: :kbd:`Alt` + +3. Now type in a kdb command such as ``help``, ``dmesg``, ``bt`` or ``go`` to + continue kernel execution. + +Using kgdb / gdb +================ + +In order to use kgdb you must activate it by passing configuration +information to one of the kgdb I/O drivers. If you do not pass any +configuration information kgdb will not do anything at all. Kgdb will +only actively hook up to the kernel trap hooks if a kgdb I/O driver is +loaded and configured. If you unconfigure a kgdb I/O driver, kgdb will +unregister all the kernel hook points. + +All kgdb I/O drivers can be reconfigured at run time, if +``CONFIG_SYSFS`` and ``CONFIG_MODULES`` are enabled, by echo'ing a new +config string to ``/sys/module//parameter/