summaryrefslogtreecommitdiffstats
path: root/src/doc/rustc-dev-guide/src/debugging-support-in-rustc.md
diff options
context:
space:
mode:
Diffstat (limited to 'src/doc/rustc-dev-guide/src/debugging-support-in-rustc.md')
-rw-r--r--src/doc/rustc-dev-guide/src/debugging-support-in-rustc.md354
1 files changed, 354 insertions, 0 deletions
diff --git a/src/doc/rustc-dev-guide/src/debugging-support-in-rustc.md b/src/doc/rustc-dev-guide/src/debugging-support-in-rustc.md
new file mode 100644
index 000000000..932b23b92
--- /dev/null
+++ b/src/doc/rustc-dev-guide/src/debugging-support-in-rustc.md
@@ -0,0 +1,354 @@
+# Debugging support in the Rust compiler
+
+<!-- toc -->
+
+This document explains the state of debugging tools support in the Rust compiler (rustc).
+It gives an overview of GDB, LLDB, WinDbg/CDB,
+as well as infrastructure around Rust compiler to debug Rust code.
+If you want to learn how to debug the Rust compiler itself,
+see [Debugging the Compiler].
+
+The material is gathered from the video,
+[Tom Tromey discusses debugging support in rustc].
+
+## Preliminaries
+
+### Debuggers
+
+According to Wikipedia
+
+> A [debugger or debugging tool] is a computer program that is used to test and debug
+> other programs (the "target" program).
+
+Writing a debugger from scratch for a language requires a lot of work, especially if
+debuggers have to be supported on various platforms. GDB and LLDB, however, can be
+extended to support debugging a language. This is the path that Rust has chosen.
+This document's main goal is to document the said debuggers support in Rust compiler.
+
+### DWARF
+
+According to the [DWARF] standard website
+
+> DWARF is a debugging file format used by many compilers and debuggers to support source level
+> debugging. It addresses the requirements of a number of procedural languages,
+> such as C, C++, and Fortran, and is designed to be extensible to other languages.
+> DWARF is architecture independent and applicable to any processor or operating system.
+> It is widely used on Unix, Linux and other operating systems,
+> as well as in stand-alone environments.
+
+DWARF reader is a program that consumes the DWARF format and creates debugger compatible output.
+This program may live in the compiler itself. DWARF uses a data structure called
+Debugging Information Entry (DIE) which stores the information as "tags" to denote functions,
+variables etc., e.g., `DW_TAG_variable`, `DW_TAG_pointer_type`, `DW_TAG_subprogram` etc.
+You can also invent your own tags and attributes.
+
+### CodeView/PDB
+
+[PDB] (Program Database) is a file format created by Microsoft that contains debug information.
+PDBs can be consumed by debuggers such as WinDbg/CDB and other tools to display debug information.
+A PDB contains multiple streams that describe debug information about a specific binary such
+as types, symbols, and source files used to compile the given binary. CodeView is another
+format which defines the structure of [symbol records] and [type records] that appear within
+PDB streams.
+
+## Supported debuggers
+
+### GDB
+
+#### Rust expression parser
+
+To be able to show debug output, we need an expression parser.
+This (GDB) expression parser is written in [Bison],
+and can parse only a subset of Rust expressions.
+GDB parser was written from scratch and has no relation to any other parser,
+including that of rustc.
+
+GDB has Rust-like value and type output. It can print values and types in a way
+that look like Rust syntax in the output. Or when you print a type as [ptype] in GDB,
+it also looks like Rust source code. Checkout the documentation in the [manual for GDB/Rust].
+
+#### Parser extensions
+
+Expression parser has a couple of extensions in it to facilitate features that you cannot do
+with Rust. Some limitations are listed in the [manual for GDB/Rust]. There is some special
+code in the DWARF reader in GDB to support the extensions.
+
+A couple of examples of DWARF reader support needed are as follows:
+
+1. Enum: Needed for support for enum types.
+ The Rust compiler writes the information about enum into DWARF,
+ and GDB reads the DWARF to understand where is the tag field,
+ or if there is a tag field,
+ or if the tag slot is shared with non-zero optimization etc.
+
+2. Dissect trait objects: DWARF extension where the trait object's description in the DWARF
+ also points to a stub description of the corresponding vtable which in turn points to the
+ concrete type for which this trait object exists. This means that you can do a `print *object`
+ for that trait object, and GDB will understand how to find the correct type of the payload in
+ the trait object.
+
+**TODO**: Figure out if the following should be mentioned in the GDB-Rust document rather than
+this guide page so there is no duplication. This is regarding the following comments:
+
+[This comment by Tom](https://github.com/rust-lang/rustc-dev-guide/pull/316#discussion_r284027340)
+> gdb's Rust extensions and limitations are documented in the gdb manual:
+https://sourceware.org/gdb/onlinedocs/gdb/Rust.html -- however, this neglects to mention that
+gdb convenience variables and registers follow the gdb $ convention, and that the Rust parser
+implements the gdb @ extension.
+
+[This question by Aman](https://github.com/rust-lang/rustc-dev-guide/pull/316#discussion_r285401353)
+> @tromey do you think we should mention this part in the GDB-Rust document rather than this
+document so there is no duplication etc.?
+
+### LLDB
+
+#### Rust expression parser
+
+This expression parser is written in C++. It is a type of [Recursive Descent parser].
+It implements slightly less of the Rust language than GDB.
+LLDB has Rust-like value and type output.
+
+#### Developer notes
+
+* LLDB has a plugin architecture but that does not work for language support.
+* GDB generally works better on Linux.
+
+### WinDbg/CDB
+
+Microsoft provides [Windows Debugging Tools] such as the Windows Debugger (WinDbg) and
+the Console Debugger (CDB) which both support debugging programs written in Rust. These
+debuggers parse the debug info for a binary from the `PDB`, if available, to construct a
+visualization to serve up in the debugger.
+
+#### Natvis
+
+Both WinDbg and CDB support defining and viewing custom visualizations for any given type
+within the debugger using the Natvis framework. The Rust compiler defines a set of Natvis
+files that define custom visualizations for a subset of types in the standard libraries such
+as, `std`, `core`, and `alloc`. These Natvis files are embedded into `PDBs` generated by the
+`*-pc-windows-msvc` target triples to automatically enable these custom visualizations when
+debugging. This default can be overridden by setting the `strip` rustc flag to either `debuginfo`
+or `symbols`.
+
+Rust has support for embedding Natvis files for crates outside of the standard libraries by
+using the `#[debugger_visualizer]` attribute.
+For more details on how to embed debugger visualizers,
+please refer to the `#[debugger_visualizer]` attribute in
+[the unstable book](https://doc.rust-lang.org/unstable-book/language-features/debugger-visualizer.html).
+
+## DWARF and `rustc`
+
+[DWARF] is the standard way compilers generate debugging information that debuggers read.
+It is _the_ debugging format on macOS and Linux.
+It is a multi-language and extensible format,
+and is mostly good enough for Rust's purposes.
+Hence, the current implementation reuses DWARF's concepts.
+This is true even if some of the concepts in DWARF do not align with Rust semantically because,
+generally, there can be some kind of mapping between the two.
+
+We have some DWARF extensions that the Rust compiler emits and the debuggers understand that
+are _not_ in the DWARF standard.
+
+* Rust compiler will emit DWARF for a virtual table, and this `vtable` object will have a
+ `DW_AT_containing_type` that points to the real type. This lets debuggers dissect a trait object
+ pointer to correctly find the payload. E.g., here's such a DIE, from a test case in the gdb
+ repository:
+
+ ```asm
+ <1><1a9>: Abbrev Number: 3 (DW_TAG_structure_type)
+ <1aa> DW_AT_containing_type: <0x1b4>
+ <1ae> DW_AT_name : (indirect string, offset: 0x23d): vtable
+ <1b2> DW_AT_byte_size : 0
+ <1b3> DW_AT_alignment : 8
+ ```
+
+* The other extension is that the Rust compiler can emit a tagless discriminated union.
+ See [DWARF feature request] for this item.
+
+### Current limitations of DWARF
+
+* Traits - require a bigger change than normal to DWARF, on how to represent Traits in DWARF.
+* DWARF provides no way to differentiate between Structs and Tuples. Rust compiler emits
+fields with `__0` and debuggers look for a sequence of such names to overcome this limitation.
+For example, in this case the debugger would look at a field via `x.__0` instead of `x.0`.
+This is resolved via the Rust parser in the debugger so now you can do `x.0`.
+
+DWARF relies on debuggers to know some information about platform ABI.
+Rust does not do that all the time.
+
+## Developer notes
+
+This section is from the talk about certain aspects of development.
+
+## What is missing
+
+### Code signing for LLDB debug server on macOS
+
+According to Wikipedia, [System Integrity Protection] is
+
+> System Integrity Protection (SIP, sometimes referred to as rootless) is a security feature
+> of Apple's macOS operating system introduced in OS X El Capitan. It comprises a number of
+> mechanisms that are enforced by the kernel. A centerpiece is the protection of system-owned
+> files and directories against modifications by processes without a specific "entitlement",
+> even when executed by the root user or a user with root privileges (sudo).
+
+It prevents processes using `ptrace` syscall. If a process wants to use `ptrace` it has to be
+code signed. The certificate that signs it has to be trusted on your machine.
+
+See [Apple developer documentation for System Integrity Protection].
+
+We may need to sign up with Apple and get the keys to do this signing. Tom has looked into if
+Mozilla cannot do this because it is at the maximum number of
+keys it is allowed to sign. Tom does not know if Mozilla could get more keys.
+
+Alternatively, Tom suggests that maybe a Rust legal entity is needed to get the keys via Apple.
+This problem is not technical in nature. If we had such a key we could sign GDB as well and
+ship that.
+
+### DWARF and Traits
+
+Rust traits are not emitted into DWARF at all. The impact of this is calling a method `x.method()`
+does not work as is. The reason being that method is implemented by a trait, as opposed
+to a type. That information is not present so finding trait methods is missing.
+
+DWARF has a notion of interface types (possibly added for Java). Tom's idea was to use this
+interface type as traits.
+
+DWARF only deals with concrete names, not the reference types. So, a given implementation of a
+trait for a type would be one of these interfaces (`DW_tag_interface` type). Also, the type for
+which it is implemented would describe all the interfaces this type implements. This requires a
+DWARF extension.
+
+Issue on Github: [https://github.com/rust-lang/rust/issues/33014]
+
+## Typical process for a Debug Info change (LLVM)
+
+LLVM has Debug Info (DI) builders. This is the primary thing that Rust calls into.
+This is why we need to change LLVM first because that is emitted first and not DWARF directly.
+This is a kind of metadata that you construct and hand-off to LLVM. For the Rustc/LLVM hand-off
+some LLVM DI builder methods are called to construct representation of a type.
+
+The steps of this process are as follows:
+
+1. LLVM needs changing.
+
+ LLVM does not emit Interface types at all, so this needs to be implemented in the LLVM first.
+
+ Get sign off on LLVM maintainers that this is a good idea.
+
+2. Change the DWARF extension.
+
+3. Update the debuggers.
+
+ Update DWARF readers, expression evaluators.
+
+4. Update Rust compiler.
+
+ Change it to emit this new information.
+
+### Procedural macro stepping
+
+A deeply profound question is that how do you actually debug a procedural macro?
+What is the location you emit for a macro expansion? Consider some of the following cases -
+
+* You can emit location of the invocation of the macro.
+* You can emit the location of the definition of the macro.
+* You can emit locations of the content of the macro.
+
+RFC: [https://github.com/rust-lang/rfcs/pull/2117]
+
+Focus is to let macros decide what to do. This can be achieved by having some kind of attribute
+that lets the macro tell the compiler where the line marker should be. This affects where you
+set the breakpoints and what happens when you step it.
+
+## Source file checksums in debug info
+
+Both DWARF and CodeView (PDB) support embedding a cryptographic hash of each source file that
+contributed to the associated binary.
+
+The cryptographic hash can be used by a debugger to verify that the source file matches the
+executable. If the source file does not match, the debugger can provide a warning to the user.
+
+The hash can also be used to prove that a given source file has not been modified since it was
+used to compile an executable. Because MD5 and SHA1 both have demonstrated vulnerabilities,
+using SHA256 is recommended for this application.
+
+The Rust compiler stores the hash for each source file in the corresponding `SourceFile` in
+the `SourceMap`. The hashes of input files to external crates are stored in `rlib` metadata.
+
+A default hashing algorithm is set in the target specification. This allows the target to
+specify the best hash available, since not all targets support all hash algorithms.
+
+The hashing algorithm for a target can also be overridden with the `-Z source-file-checksum=`
+command-line option.
+
+#### DWARF 5
+DWARF version 5 supports embedding an MD5 hash to validate the source file version in use.
+DWARF 5 - Section 6.2.4.1 opcode DW_LNCT_MD5
+
+#### LLVM
+LLVM IR supports MD5 and SHA1 (and SHA256 in LLVM 11+) source file checksums in the DIFile node.
+
+[LLVM DIFile documentation](https://llvm.org/docs/LangRef.html#difile)
+
+#### Microsoft Visual C++ Compiler /ZH option
+The MSVC compiler supports embedding MD5, SHA1, or SHA256 hashes in the PDB using the `/ZH`
+compiler option.
+
+[MSVC /ZH documentation](https://docs.microsoft.com/en-us/cpp/build/reference/zh)
+
+#### Clang
+Clang always embeds an MD5 checksum, though this does not appear in documentation.
+
+## Future work
+
+#### Name mangling changes
+
+* New demangler in `libiberty` (gcc source tree).
+* New demangler in LLVM or LLDB.
+
+**TODO**: Check the location of the demangler source. [#1157](https://github.com/rust-lang/rustc-dev-guide/issues/1157)
+
+#### Reuse Rust compiler for expressions
+
+This is an important idea because debuggers by and large do not try to implement type
+inference. You need to be much more explicit when you type into the debugger than your
+actual source code. So, you cannot just copy and paste an expression from your source
+code to debugger and expect the same answer but this would be nice. This can be helped
+by using compiler.
+
+It is certainly doable but it is a large project. You certainly need a bridge to the
+debugger because the debugger alone has access to the memory. Both GDB (gcc) and LLDB (clang)
+have this feature. LLDB uses Clang to compile code to JIT and GDB can do the same with GCC.
+
+Both debuggers expression evaluation implement both a superset and a subset of Rust.
+They implement just the expression language,
+but they also add some extensions like GDB has convenience variables.
+Therefore, if you are taking this route,
+then you not only need to do this bridge,
+but may have to add some mode to let the compiler understand some extensions.
+
+[Tom Tromey discusses debugging support in rustc]: https://www.youtube.com/watch?v=elBxMRSNYr4
+[Debugging the Compiler]: compiler-debugging.md
+[debugger or debugging tool]: https://en.wikipedia.org/wiki/Debugger
+[Bison]: https://www.gnu.org/software/bison/
+[ptype]: https://ftp.gnu.org/old-gnu/Manuals/gdb/html_node/gdb_109.html
+[rust-lang/lldb wiki page]: https://github.com/rust-lang/lldb/wiki
+[DWARF]: http://dwarfstd.org
+[manual for GDB/Rust]: https://sourceware.org/gdb/onlinedocs/gdb/Rust.html
+[GDB Bugzilla]: https://sourceware.org/bugzilla/
+[Recursive Descent parser]: https://en.wikipedia.org/wiki/Recursive_descent_parser
+[System Integrity Protection]: https://en.wikipedia.org/wiki/System_Integrity_Protection
+[https://github.com/rust-dev-tools/gdb]: https://github.com/rust-dev-tools/gdb
+[DWARF feature request]: http://dwarfstd.org/ShowIssue.php?issue=180517.2
+[https://docs.python.org/3/c-api/stable.html]: https://docs.python.org/3/c-api/stable.html
+[https://github.com/rust-lang/rfcs/pull/2117]: https://github.com/rust-lang/rfcs/pull/2117
+[https://github.com/rust-lang/rust/issues/33014]: https://github.com/rust-lang/rust/issues/33014
+[https://github.com/rust-lang/rust/issues/34457]: https://github.com/rust-lang/rust/issues/34457
+[Apple developer documentation for System Integrity Protection]: https://developer.apple.com/library/archive/releasenotes/MacOSX/WhatsNewInOSX/Articles/MacOSX10_11.html#//apple_ref/doc/uid/TP40016227-SW11
+[https://github.com/rust-lang/lldb]: https://github.com/rust-lang/lldb
+[https://github.com/rust-lang/llvm-project]: https://github.com/rust-lang/llvm-project
+[PDB]: https://llvm.org/docs/PDB/index.html
+[symbol records]: https://llvm.org/docs/PDB/CodeViewSymbols.html
+[type records]: https://llvm.org/docs/PDB/CodeViewTypes.html
+[Windows Debugging Tools]: https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/