Diffstat (limited to 'src/doc/rustc-dev-guide/src/')
1 files changed, 354 insertions, 0 deletions
diff --git a/src/doc/rustc-dev-guide/src/ b/src/doc/rustc-dev-guide/src/
new file mode 100644
index 000000000..932b23b92
--- /dev/null
+++ b/src/doc/rustc-dev-guide/src/
@@ -0,0 +1,354 @@
+# Debugging support in the Rust compiler
+<!-- toc -->
+This document explains the state of debugging tools support in the Rust compiler (rustc).
+It gives an overview of GDB, LLDB, WinDbg/CDB,
+as well as the infrastructure around the Rust compiler for debugging Rust code.
+If you want to learn how to debug the Rust compiler itself,
+see [Debugging the Compiler].
+The material is gathered from the video,
+[Tom Tromey discusses debugging support in rustc].
+## Preliminaries
+### Debuggers
+According to Wikipedia:
+> A [debugger or debugging tool] is a computer program that is used to test and debug
+> other programs (the "target" program).
+Writing a debugger from scratch for a language requires a lot of work, especially if
+debuggers have to be supported on various platforms. GDB and LLDB, however, can be
+extended to support debugging a language. This is the path that Rust has chosen.
+The main goal of this document is to describe how the Rust compiler supports these debuggers.
+### DWARF
+According to the [DWARF] standard website
+> DWARF is a debugging file format used by many compilers and debuggers to support source level
+> debugging. It addresses the requirements of a number of procedural languages,
+> such as C, C++, and Fortran, and is designed to be extensible to other languages.
+> DWARF is architecture independent and applicable to any processor or operating system.
+> It is widely used on Unix, Linux and other operating systems,
+> as well as in stand-alone environments.
+A DWARF reader is a program that consumes the DWARF format and creates debugger-compatible output.
+This program may live in the compiler itself. DWARF uses a data structure called the
+Debugging Information Entry (DIE), which stores information under "tags" that denote functions,
+variables, and so on, e.g., `DW_TAG_variable`, `DW_TAG_pointer_type`, and `DW_TAG_subprogram`.
+You can also invent your own tags and attributes.
+### CodeView/PDB
+[PDB] (Program Database) is a file format created by Microsoft that contains debug information.
+PDBs can be consumed by debuggers such as WinDbg/CDB and other tools to display debug information.
+A PDB contains multiple streams that describe debug information about a specific binary such
+as types, symbols, and source files used to compile the given binary. CodeView is another
+format which defines the structure of [symbol records] and [type records] that appear within
+PDB streams.
+## Supported debuggers
+### GDB
+#### Rust expression parser
+To be able to show debug output, we need an expression parser.
+The GDB expression parser is written in [Bison],
+and can parse only a subset of Rust expressions.
+The GDB parser was written from scratch and has no relation to any other parser,
+including that of rustc.
+GDB has Rust-like value and type output. It can print values and types in a way
+that looks like Rust syntax. Likewise, when you print a type with [ptype] in GDB,
+the output looks like Rust source code. Check out the documentation in the [manual for GDB/Rust].
+#### Parser extensions
+The expression parser has a couple of extensions to facilitate features that you cannot express
+in Rust. Some limitations are listed in the [manual for GDB/Rust]. There is some special
+code in the DWARF reader in GDB to support the extensions.
+A couple of examples of DWARF reader support needed are as follows:
+1. Enum: Needed for support for enum types.
+ The Rust compiler writes the information about an enum into DWARF,
+ and GDB reads the DWARF to understand where the tag field is,
+ whether there is a tag field at all,
+ or whether the tag slot is shared via the non-zero optimization, etc.
+2. Dissect trait objects: DWARF extension where the trait object's description in the DWARF
+ also points to a stub description of the corresponding vtable which in turn points to the
+ concrete type for which this trait object exists. This means that you can do a `print *object`
+ for that trait object, and GDB will understand how to find the correct type of the payload in
+ the trait object.
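The enum case above can be made concrete with a small sketch. It is not taken from the talk; the `Shape` type is hypothetical and only illustrates why a DWARF reader cannot assume that every enum has a dedicated tag field. Some enums need a separate tag, while others reuse a forbidden value of the payload (null or zero) to encode the empty variant.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// Hypothetical enum: every bit pattern of the `u32` payload is valid in both
// variants, so the compiler has to store a separate tag next to the payload.
#[allow(dead_code)]
enum Shape {
    Circle(u32),
    Square(u32),
}

fn main() {
    // Separate tag field: the enum is larger than its payload.
    println!("Shape: {} bytes, u32: {} bytes", size_of::<Shape>(), size_of::<u32>());

    // No separate tag field: `None` is encoded as the null pointer.
    println!(
        "Option<&u8>: {} bytes, &u8: {} bytes",
        size_of::<Option<&u8>>(),
        size_of::<&u8>()
    );

    // No separate tag field: `None` is encoded as zero.
    println!(
        "Option<NonZeroU32>: {} bytes, u32: {} bytes",
        size_of::<Option<NonZeroU32>>(),
        size_of::<u32>()
    );
}
```

In the last two cases the debugger has to learn from the debug info that the "tag" lives inside the payload slot, which is the kind of information the DWARF reader support described above is meant to handle.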
+**TODO**: Figure out if the following should be mentioned in the GDB-Rust document rather than
+this guide page so there is no duplication. This is regarding the following comments:
+[This comment by Tom](
+> gdb's Rust extensions and limitations are documented in the gdb manual:
+ -- however, this neglects to mention that
+gdb convenience variables and registers follow the gdb $ convention, and that the Rust parser
+implements the gdb @ extension.
+[This question by Aman](
+> @tromey do you think we should mention this part in the GDB-Rust document rather than this
+document so there is no duplication etc.?
+### LLDB
+#### Rust expression parser
+This expression parser is written in C++. It is a type of [Recursive Descent parser].
+It implements slightly less of the Rust language than GDB's parser does.
+LLDB has Rust-like value and type output.
+#### Developer notes
+* LLDB has a plugin architecture but that does not work for language support.
+* GDB generally works better on Linux.
+### WinDbg/CDB
+Microsoft provides [Windows Debugging Tools] such as the Windows Debugger (WinDbg) and
+the Console Debugger (CDB) which both support debugging programs written in Rust. These
+debuggers parse the debug info for a binary from the `PDB`, if available, to construct a
+visualization to serve up in the debugger.
+#### Natvis
+Both WinDbg and CDB support defining and viewing custom visualizations for any given type
+within the debugger using the Natvis framework. The Rust compiler defines a set of Natvis
+files that define custom visualizations for a subset of types in the standard libraries, such
+as `std`, `core`, and `alloc`. These Natvis files are embedded into the `PDBs` generated for the
+`*-pc-windows-msvc` target triples to automatically enable these custom visualizations when
+debugging. This default can be overridden by setting the `strip` rustc flag to either `debuginfo`
+or `symbols`.
+Rust has support for embedding Natvis files for crates outside of the standard libraries by
+using the `#[debugger_visualizer]` attribute.
+For more details on how to embed debugger visualizers,
+please refer to the `#[debugger_visualizer]` attribute in
+[the unstable book](
+## DWARF and `rustc`
+[DWARF] is the standard way compilers generate debugging information that debuggers read.
+It is _the_ debugging format on macOS and Linux.
+It is a multi-language and extensible format,
+and is mostly good enough for Rust's purposes.
+Hence, the current implementation reuses DWARF's concepts.
+This is true even if some of the concepts in DWARF do not align with Rust semantically because,
+generally, there can be some kind of mapping between the two.
+We have some DWARF extensions that the Rust compiler emits and the debuggers understand that
+are _not_ in the DWARF standard.
+* The Rust compiler emits DWARF for a virtual table, and this `vtable` object has a
+ `DW_AT_containing_type` that points to the real type. This lets debuggers dissect a trait object
+ pointer to correctly find the payload. E.g., here's such a DIE, from a test case in the gdb
+ repository:
+ ```asm
+ <1><1a9>: Abbrev Number: 3 (DW_TAG_structure_type)
+ <1aa> DW_AT_containing_type: <0x1b4>
+ <1ae> DW_AT_name : (indirect string, offset: 0x23d): vtable
+ <1b2> DW_AT_byte_size : 0
+ <1b3> DW_AT_alignment : 8
+ ```
+* The other extension is that the Rust compiler can emit a tagless discriminated union.
+ See [DWARF feature request] for this item.
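To see why the vtable extension in the first item is needed, consider the following minimal sketch. The `Greet` trait and `English` type are hypothetical names chosen for illustration. A reference to a trait object is a pair of pointers, one to the data and one to the vtable, and its static type says nothing about the concrete type behind it.

```rust
use std::mem::size_of;

// Hypothetical trait and implementor, used only for illustration.
trait Greet {
    fn greet(&self) -> String;
}

struct English;

impl Greet for English {
    fn greet(&self) -> String {
        String::from("hello")
    }
}

// The static type of `object` does not mention `English` at all.
fn call(object: &dyn Greet) -> String {
    object.greet()
}

fn main() {
    let object: &dyn Greet = &English;

    // A trait object reference holds a data pointer and a vtable pointer.
    println!("&dyn Greet: {} bytes", size_of::<&dyn Greet>());
    println!("&English:   {} bytes", size_of::<&English>());
    println!("{}", call(object));
}
```

When a debugger stops inside `call`, the only route from `object` to the concrete type is through the vtable pointer, which is why a vtable DIE that points back at the real type lets it find the payload.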
+### Current limitations of DWARF
+* Traits: representing traits in DWARF requires a bigger change to DWARF than usual.
+* DWARF provides no way to differentiate between structs and tuples. The Rust compiler emits
+tuple fields with names such as `__0`, and debuggers look for a sequence of such names to overcome
+this limitation. For example, the debugger would otherwise have to access a field as `x.__0`
+instead of `x.0`. The Rust parser in the debugger resolves this, so you can now write `x.0`.
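As a source-level illustration of the struct and tuple limitation, here is a minimal sketch with two hypothetical types. In Rust source they are clearly different, but according to the description above they reach the debugger as the same kind of entry, and only the `__0`-style field names (assumed here to continue as `__1` for the second field) hint that `Pair` is a tuple struct.

```rust
// A struct with named fields: the field names are written by the programmer.
struct Named {
    first: i32,
    second: i32,
}

// A tuple struct: the fields have no names in the source, only positions.
// Per the text above, the compiler gives them names such as `__0` in the
// debug info (`__1` for the second field is an assumption of this sketch).
struct Pair(i32, i32);

fn sum_named(value: &Named) -> i32 {
    value.first + value.second
}

fn sum_pair(value: &Pair) -> i32 {
    // `.0` and `.1` are what the debugger's Rust parser lets you type.
    value.0 + value.1
}

fn main() {
    let named = Named { first: 1, second: 2 };
    let pair = Pair(3, 4);
    println!("named: {}, pair: {}", sum_named(&named), sum_pair(&pair));
}
```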
+DWARF relies on debuggers to know some information about the platform ABI.
+Rust does not do that all the time.
+## Developer notes
+This section covers certain aspects of development that were discussed in the talk.
+## What is missing
+### Code signing for LLDB debug server on macOS
+Wikipedia describes [System Integrity Protection] as follows:
+> System Integrity Protection (SIP, sometimes referred to as rootless) is a security feature
+> of Apple's macOS operating system introduced in OS X El Capitan. It comprises a number of
+> mechanisms that are enforced by the kernel. A centerpiece is the protection of system-owned
+> files and directories against modifications by processes without a specific "entitlement",
+> even when executed by the root user or a user with root privileges (sudo).
+It prevents processes from using the `ptrace` syscall. If a process wants to use `ptrace`, it has
+to be code signed. The certificate that signs it has to be trusted on your machine.
+See [Apple developer documentation for System Integrity Protection].
+We may need to sign up with Apple and get the keys to do this signing. Tom looked into this and
+found that Mozilla cannot do it, because it is at the maximum number of
+keys it is allowed to sign. Tom does not know if Mozilla could get more keys.
+Alternatively, Tom suggests that a Rust legal entity may be needed to get the keys from Apple.
+This problem is not technical in nature. If we had such a key, we could sign GDB as well and
+ship that.
+### DWARF and Traits
+Rust traits are not emitted into DWARF at all. As a result, calling a method as `x.method()`
+does not work as is when the method is implemented by a trait, as opposed
+to a type. That information is not present, so the debugger has no way to find trait methods.
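The distinction between a method implemented by a type and one implemented by a trait can be sketched as follows. The `Meters` type and `Describe` trait are hypothetical. Both calls look identical in source, but only the second depends on trait information, which the text above says is absent from DWARF.

```rust
struct Meters(f64);

// Inherent method: implemented directly on the type.
impl Meters {
    fn value(&self) -> f64 {
        self.0
    }
}

// Hypothetical trait, used only for illustration.
trait Describe {
    fn describe(&self) -> String;
}

// Trait method: implemented by the trait for the type. Resolving
// `x.describe()` requires knowing that `Describe` is implemented for `Meters`.
impl Describe for Meters {
    fn describe(&self) -> String {
        format!("{} m", self.0)
    }
}

fn main() {
    let x = Meters(2.5);
    println!("{}", x.value()); // found by looking at the type alone
    println!("{}", x.describe()); // found only through the trait implementation
}
```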
+DWARF has a notion of interface types (possibly added for Java). Tom's idea was to use this
+interface type to represent traits.
+DWARF only deals with concrete names, not the reference types. So, a given implementation of a
+trait for a type would be one of these interfaces (`DW_TAG_interface_type`). Also, the type for
+which it is implemented would describe all the interfaces this type implements. This requires a
+DWARF extension.
+Issue on GitHub: []
+## Typical process for a Debug Info change (LLVM)
+LLVM has Debug Info (DI) builders. This is the primary thing that Rust calls into.
+Rust emits LLVM debug info rather than DWARF directly, which is why LLVM needs to change first.
+Debug info is a kind of metadata that you construct and hand off to LLVM. For the rustc/LLVM
+hand-off, some LLVM DI builder methods are called to construct the representation of a type.
+The steps of this process are as follows:
+1. LLVM needs changing.
+ LLVM does not emit interface types at all, so this needs to be implemented in LLVM first.
+ Get sign-off from the LLVM maintainers that this is a good idea.
+2. Change the DWARF extension.
+3. Update the debuggers.
+ Update DWARF readers, expression evaluators.
+4. Update Rust compiler.
+ Change it to emit this new information.
+### Procedural macro stepping
+A deeply profound question is: how do you actually debug a procedural macro?
+What is the location you emit for a macro expansion? Consider the following cases:
+* You can emit the location of the invocation of the macro.
+* You can emit the location of the definition of the macro.
+* You can emit the locations of the content of the macro.
+RFC: []
+The focus is to let macros decide what to do. This can be achieved by having some kind of attribute
+that lets the macro tell the compiler where the line marker should be. This affects where you
+set breakpoints and what happens when you step through the code.
+## Source file checksums in debug info
+Both DWARF and CodeView (PDB) support embedding a cryptographic hash of each source file that
+contributed to the associated binary.
+The cryptographic hash can be used by a debugger to verify that the source file matches the
+executable. If the source file does not match, the debugger can provide a warning to the user.
+The hash can also be used to prove that a given source file has not been modified since it was
+used to compile an executable. Because MD5 and SHA1 both have demonstrated vulnerabilities,
+using SHA256 is recommended for this application.
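The verification step described above can be sketched in a few lines. This is an illustration only and not how rustc or any debugger implements it: the Rust standard library has no SHA256, so the sketch swaps in 64-bit FNV-1a, which is NOT a cryptographic hash, purely to show the compare-and-warn logic. `SourceRecord` and `source_matches` are hypothetical names.

```rust
/// 64-bit FNV-1a. This is a non-cryptographic stand-in for SHA256,
/// used here only so the sketch needs nothing outside the standard library.
fn checksum(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical per-file record, standing in for what debug info would carry.
struct SourceRecord {
    path: &'static str,
    checksum: u64,
}

/// What a debugger could do before showing a source file to the user.
fn source_matches(record: &SourceRecord, current_contents: &[u8]) -> bool {
    checksum(current_contents) == record.checksum
}

fn main() {
    // At compile time: hash the source and store the result.
    let compiled_from = b"let x = 1;\n";
    let record = SourceRecord {
        path: "src/main.rs",
        checksum: checksum(compiled_from),
    };

    // At debug time: the file on disk has been edited since the build.
    let on_disk = b"let x = 2;\n";
    if !source_matches(&record, on_disk) {
        println!("warning: {} does not match the executable", record.path);
    }
}
```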
+The Rust compiler stores the hash for each source file in the corresponding `SourceFile` in
+the `SourceMap`. The hashes of input files to external crates are stored in `rlib` metadata.
+A default hashing algorithm is set in the target specification. This allows the target to
+specify the best hash available, since not all targets support all hash algorithms.
+The hashing algorithm for a target can also be overridden with the `-Z source-file-checksum=`
+command-line option.
+#### DWARF 5
+DWARF version 5 supports embedding an MD5 hash to validate the source file version in use.
+DWARF 5 - Section opcode DW_LNCT_MD5
+#### LLVM
+LLVM IR supports MD5 and SHA1 (and SHA256 in LLVM 11+) source file checksums in the DIFile node.
+[LLVM DIFile documentation](
+#### Microsoft Visual C++ Compiler /ZH option
+The MSVC compiler supports embedding MD5, SHA1, or SHA256 hashes in the PDB using the `/ZH`
+compiler option.
+[MSVC /ZH documentation](
+#### Clang
+Clang always embeds an MD5 checksum, though this does not appear in documentation.
+## Future work
+#### Name mangling changes
+* New demangler in `libiberty` (gcc source tree).
+* New demangler in LLVM or LLDB.
+**TODO**: Check the location of the demangler source. [#1157](
+#### Reuse Rust compiler for expressions
+This is an important idea because debuggers by and large do not try to implement type
+inference. You need to be much more explicit when you type into the debugger than in your
+actual source code. So, you cannot just copy and paste an expression from your source
+code into the debugger and expect the same answer, although this would be nice. Reusing the
+compiler could help here.
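A small sketch shows how much ordinary Rust code leans on type inference. Both functions below are hypothetical and do the same thing. The first is what you would normally write; the second spells out every type that the compiler otherwise infers, which is roughly the level of explicitness an evaluator without type inference would need.

```rust
// Idiomatic source: the target type of `parse` and the collection built by
// `collect` are both inferred from the function's return type.
fn parse_inferred(input: &str) -> Vec<u32> {
    input.split_whitespace().map(|s| s.parse().unwrap()).collect()
}

// The same expression with every inferred type written out.
fn parse_explicit(input: &str) -> Vec<u32> {
    input
        .split_whitespace()
        .map(|s: &str| s.parse::<u32>().unwrap())
        .collect::<Vec<u32>>()
}

fn main() {
    let input = "1 2 3";
    println!("{:?}", parse_inferred(input));
    println!("{:?}", parse_explicit(input));
}
```

Pasting the body of `parse_inferred` into a debugger prompt gives the evaluator no return type to infer from, which is one reason reusing the compiler for expressions is attractive.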
+It is certainly doable but it is a large project. You certainly need a bridge to the
+debugger because the debugger alone has access to the memory. Both GDB (gcc) and LLDB (clang)
+have this feature. LLDB uses Clang to compile code to JIT and GDB can do the same with GCC.
+The expression evaluators of both debuggers implement both a superset and a subset of Rust.
+They implement just the expression language,
+but they also add some extensions; GDB, for example, has convenience variables.
+Therefore, if you take this route,
+you not only need to build this bridge,
+but may also have to add a mode that lets the compiler understand some of these extensions.
+[Tom Tromey discusses debugging support in rustc]:
+[Debugging the Compiler]:
+[debugger or debugging tool]:
+[rust-lang/lldb wiki page]:
+[manual for GDB/Rust]:
+[GDB Bugzilla]:
+[Recursive Descent parser]:
+[System Integrity Protection]:
+[DWARF feature request]:
+[Apple developer documentation for System Integrity Protection]:
+[symbol records]:
+[type records]:
+[Windows Debugging Tools]: