1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
|
# Debugging support in the Rust compiler
<!-- toc -->
This document explains the state of debugging tools support in the Rust compiler (rustc).
It gives an overview of GDB, LLDB, WinDbg/CDB,
as well as infrastructure around Rust compiler to debug Rust code.
If you want to learn how to debug the Rust compiler itself,
see [Debugging the Compiler].
The material is gathered from the video,
[Tom Tromey discusses debugging support in rustc].
## Preliminaries
### Debuggers
According to Wikipedia
> A [debugger or debugging tool] is a computer program that is used to test and debug
> other programs (the "target" program).
Writing a debugger from scratch for a language requires a lot of work, especially if
debuggers have to be supported on various platforms. GDB and LLDB, however, can be
extended to support debugging a language. This is the path that Rust has chosen.
This document's main goal is to document the said debuggers support in Rust compiler.
### DWARF
According to the [DWARF] standard website
> DWARF is a debugging file format used by many compilers and debuggers to support source level
> debugging. It addresses the requirements of a number of procedural languages,
> such as C, C++, and Fortran, and is designed to be extensible to other languages.
> DWARF is architecture independent and applicable to any processor or operating system.
> It is widely used on Unix, Linux and other operating systems,
> as well as in stand-alone environments.
DWARF reader is a program that consumes the DWARF format and creates debugger compatible output.
This program may live in the compiler itself. DWARF uses a data structure called
Debugging Information Entry (DIE) which stores the information as "tags" to denote functions,
variables etc., e.g., `DW_TAG_variable`, `DW_TAG_pointer_type`, `DW_TAG_subprogram` etc.
You can also invent your own tags and attributes.
### CodeView/PDB
[PDB] (Program Database) is a file format created by Microsoft that contains debug information.
PDBs can be consumed by debuggers such as WinDbg/CDB and other tools to display debug information.
A PDB contains multiple streams that describe debug information about a specific binary such
as types, symbols, and source files used to compile the given binary. CodeView is another
format which defines the structure of [symbol records] and [type records] that appear within
PDB streams.
## Supported debuggers
### GDB
#### Rust expression parser
To be able to show debug output, we need an expression parser.
This (GDB) expression parser is written in [Bison],
and can parse only a subset of Rust expressions.
GDB parser was written from scratch and has no relation to any other parser,
including that of rustc.
GDB has Rust-like value and type output. It can print values and types in a way
that look like Rust syntax in the output. Or when you print a type as [ptype] in GDB,
it also looks like Rust source code. Checkout the documentation in the [manual for GDB/Rust].
#### Parser extensions
Expression parser has a couple of extensions in it to facilitate features that you cannot do
with Rust. Some limitations are listed in the [manual for GDB/Rust]. There is some special
code in the DWARF reader in GDB to support the extensions.
A couple of examples of DWARF reader support needed are as follows:
1. Enum: Needed for support for enum types.
The Rust compiler writes the information about enum into DWARF,
and GDB reads the DWARF to understand where is the tag field,
or if there is a tag field,
or if the tag slot is shared with non-zero optimization etc.
2. Dissect trait objects: DWARF extension where the trait object's description in the DWARF
also points to a stub description of the corresponding vtable which in turn points to the
concrete type for which this trait object exists. This means that you can do a `print *object`
for that trait object, and GDB will understand how to find the correct type of the payload in
the trait object.
**TODO**: Figure out if the following should be mentioned in the GDB-Rust document rather than
this guide page so there is no duplication. This is regarding the following comments:
[This comment by Tom](https://github.com/rust-lang/rustc-dev-guide/pull/316#discussion_r284027340)
> gdb's Rust extensions and limitations are documented in the gdb manual:
https://sourceware.org/gdb/onlinedocs/gdb/Rust.html -- however, this neglects to mention that
gdb convenience variables and registers follow the gdb $ convention, and that the Rust parser
implements the gdb @ extension.
[This question by Aman](https://github.com/rust-lang/rustc-dev-guide/pull/316#discussion_r285401353)
> @tromey do you think we should mention this part in the GDB-Rust document rather than this
document so there is no duplication etc.?
### LLDB
#### Rust expression parser
This expression parser is written in C++. It is a type of [Recursive Descent parser].
It implements slightly less of the Rust language than GDB.
LLDB has Rust-like value and type output.
#### Developer notes
* LLDB has a plugin architecture but that does not work for language support.
* GDB generally works better on Linux.
### WinDbg/CDB
Microsoft provides [Windows Debugging Tools] such as the Windows Debugger (WinDbg) and
the Console Debugger (CDB) which both support debugging programs written in Rust. These
debuggers parse the debug info for a binary from the `PDB`, if available, to construct a
visualization to serve up in the debugger.
#### Natvis
Both WinDbg and CDB support defining and viewing custom visualizations for any given type
within the debugger using the Natvis framework. The Rust compiler defines a set of Natvis
files that define custom visualizations for a subset of types in the standard libraries such
as, `std`, `core`, and `alloc`. These Natvis files are embedded into `PDBs` generated by the
`*-pc-windows-msvc` target triples to automatically enable these custom visualizations when
debugging. This default can be overridden by setting the `strip` rustc flag to either `debuginfo`
or `symbols`.
Rust has support for embedding Natvis files for crates outside of the standard libraries by
using the `#[debugger_visualizer]` attribute.
For more details on how to embed debugger visualizers,
please refer to the section on the [`debugger_visualizer` attribute].
## DWARF and `rustc`
[DWARF] is the standard way compilers generate debugging information that debuggers read.
It is _the_ debugging format on macOS and Linux.
It is a multi-language and extensible format,
and is mostly good enough for Rust's purposes.
Hence, the current implementation reuses DWARF's concepts.
This is true even if some of the concepts in DWARF do not align with Rust semantically because,
generally, there can be some kind of mapping between the two.
We have some DWARF extensions that the Rust compiler emits and the debuggers understand that
are _not_ in the DWARF standard.
* Rust compiler will emit DWARF for a virtual table, and this `vtable` object will have a
`DW_AT_containing_type` that points to the real type. This lets debuggers dissect a trait object
pointer to correctly find the payload. E.g., here's such a DIE, from a test case in the gdb
repository:
```asm
<1><1a9>: Abbrev Number: 3 (DW_TAG_structure_type)
<1aa> DW_AT_containing_type: <0x1b4>
<1ae> DW_AT_name : (indirect string, offset: 0x23d): vtable
<1b2> DW_AT_byte_size : 0
<1b3> DW_AT_alignment : 8
```
* The other extension is that the Rust compiler can emit a tagless discriminated union.
See [DWARF feature request] for this item.
### Current limitations of DWARF
* Traits - require a bigger change than normal to DWARF, on how to represent Traits in DWARF.
* DWARF provides no way to differentiate between Structs and Tuples. Rust compiler emits
fields with `__0` and debuggers look for a sequence of such names to overcome this limitation.
For example, in this case the debugger would look at a field via `x.__0` instead of `x.0`.
This is resolved via the Rust parser in the debugger so now you can do `x.0`.
DWARF relies on debuggers to know some information about platform ABI.
Rust does not do that all the time.
## Developer notes
This section is from the talk about certain aspects of development.
## What is missing
### Code signing for LLDB debug server on macOS
According to Wikipedia, [System Integrity Protection] is
> System Integrity Protection (SIP, sometimes referred to as rootless) is a security feature
> of Apple's macOS operating system introduced in OS X El Capitan. It comprises a number of
> mechanisms that are enforced by the kernel. A centerpiece is the protection of system-owned
> files and directories against modifications by processes without a specific "entitlement",
> even when executed by the root user or a user with root privileges (sudo).
It prevents processes using `ptrace` syscall. If a process wants to use `ptrace` it has to be
code signed. The certificate that signs it has to be trusted on your machine.
See [Apple developer documentation for System Integrity Protection].
We may need to sign up with Apple and get the keys to do this signing. Tom has looked into if
Mozilla cannot do this because it is at the maximum number of
keys it is allowed to sign. Tom does not know if Mozilla could get more keys.
Alternatively, Tom suggests that maybe a Rust legal entity is needed to get the keys via Apple.
This problem is not technical in nature. If we had such a key we could sign GDB as well and
ship that.
### DWARF and Traits
Rust traits are not emitted into DWARF at all. The impact of this is calling a method `x.method()`
does not work as is. The reason being that method is implemented by a trait, as opposed
to a type. That information is not present so finding trait methods is missing.
DWARF has a notion of interface types (possibly added for Java). Tom's idea was to use this
interface type as traits.
DWARF only deals with concrete names, not the reference types. So, a given implementation of a
trait for a type would be one of these interfaces (`DW_tag_interface` type). Also, the type for
which it is implemented would describe all the interfaces this type implements. This requires a
DWARF extension.
Issue on Github: [https://github.com/rust-lang/rust/issues/33014]
## Typical process for a Debug Info change (LLVM)
LLVM has Debug Info (DI) builders. This is the primary thing that Rust calls into.
This is why we need to change LLVM first because that is emitted first and not DWARF directly.
This is a kind of metadata that you construct and hand-off to LLVM. For the Rustc/LLVM hand-off
some LLVM DI builder methods are called to construct representation of a type.
The steps of this process are as follows:
1. LLVM needs changing.
LLVM does not emit Interface types at all, so this needs to be implemented in the LLVM first.
Get sign off on LLVM maintainers that this is a good idea.
2. Change the DWARF extension.
3. Update the debuggers.
Update DWARF readers, expression evaluators.
4. Update Rust compiler.
Change it to emit this new information.
### Procedural macro stepping
A deeply profound question is that how do you actually debug a procedural macro?
What is the location you emit for a macro expansion? Consider some of the following cases -
* You can emit location of the invocation of the macro.
* You can emit the location of the definition of the macro.
* You can emit locations of the content of the macro.
RFC: [https://github.com/rust-lang/rfcs/pull/2117]
Focus is to let macros decide what to do. This can be achieved by having some kind of attribute
that lets the macro tell the compiler where the line marker should be. This affects where you
set the breakpoints and what happens when you step it.
## Source file checksums in debug info
Both DWARF and CodeView (PDB) support embedding a cryptographic hash of each source file that
contributed to the associated binary.
The cryptographic hash can be used by a debugger to verify that the source file matches the
executable. If the source file does not match, the debugger can provide a warning to the user.
The hash can also be used to prove that a given source file has not been modified since it was
used to compile an executable. Because MD5 and SHA1 both have demonstrated vulnerabilities,
using SHA256 is recommended for this application.
The Rust compiler stores the hash for each source file in the corresponding `SourceFile` in
the `SourceMap`. The hashes of input files to external crates are stored in `rlib` metadata.
A default hashing algorithm is set in the target specification. This allows the target to
specify the best hash available, since not all targets support all hash algorithms.
The hashing algorithm for a target can also be overridden with the `-Z source-file-checksum=`
command-line option.
#### DWARF 5
DWARF version 5 supports embedding an MD5 hash to validate the source file version in use.
DWARF 5 - Section 6.2.4.1 opcode DW_LNCT_MD5
#### LLVM
LLVM IR supports MD5 and SHA1 (and SHA256 in LLVM 11+) source file checksums in the DIFile node.
[LLVM DIFile documentation](https://llvm.org/docs/LangRef.html#difile)
#### Microsoft Visual C++ Compiler /ZH option
The MSVC compiler supports embedding MD5, SHA1, or SHA256 hashes in the PDB using the `/ZH`
compiler option.
[MSVC /ZH documentation](https://docs.microsoft.com/en-us/cpp/build/reference/zh)
#### Clang
Clang always embeds an MD5 checksum, though this does not appear in documentation.
## Future work
#### Name mangling changes
* New demangler in `libiberty` (gcc source tree).
* New demangler in LLVM or LLDB.
**TODO**: Check the location of the demangler source. [#1157](https://github.com/rust-lang/rustc-dev-guide/issues/1157)
#### Reuse Rust compiler for expressions
This is an important idea because debuggers by and large do not try to implement type
inference. You need to be much more explicit when you type into the debugger than your
actual source code. So, you cannot just copy and paste an expression from your source
code to debugger and expect the same answer but this would be nice. This can be helped
by using compiler.
It is certainly doable but it is a large project. You certainly need a bridge to the
debugger because the debugger alone has access to the memory. Both GDB (gcc) and LLDB (clang)
have this feature. LLDB uses Clang to compile code to JIT and GDB can do the same with GCC.
Both debuggers expression evaluation implement both a superset and a subset of Rust.
They implement just the expression language,
but they also add some extensions like GDB has convenience variables.
Therefore, if you are taking this route,
then you not only need to do this bridge,
but may have to add some mode to let the compiler understand some extensions.
[Tom Tromey discusses debugging support in rustc]: https://www.youtube.com/watch?v=elBxMRSNYr4
[Debugging the Compiler]: compiler-debugging.md
[debugger or debugging tool]: https://en.wikipedia.org/wiki/Debugger
[Bison]: https://www.gnu.org/software/bison/
[ptype]: https://ftp.gnu.org/old-gnu/Manuals/gdb/html_node/gdb_109.html
[rust-lang/lldb wiki page]: https://github.com/rust-lang/lldb/wiki
[DWARF]: http://dwarfstd.org
[manual for GDB/Rust]: https://sourceware.org/gdb/onlinedocs/gdb/Rust.html
[GDB Bugzilla]: https://sourceware.org/bugzilla/
[Recursive Descent parser]: https://en.wikipedia.org/wiki/Recursive_descent_parser
[System Integrity Protection]: https://en.wikipedia.org/wiki/System_Integrity_Protection
[https://github.com/rust-dev-tools/gdb]: https://github.com/rust-dev-tools/gdb
[DWARF feature request]: http://dwarfstd.org/ShowIssue.php?issue=180517.2
[https://docs.python.org/3/c-api/stable.html]: https://docs.python.org/3/c-api/stable.html
[https://github.com/rust-lang/rfcs/pull/2117]: https://github.com/rust-lang/rfcs/pull/2117
[https://github.com/rust-lang/rust/issues/33014]: https://github.com/rust-lang/rust/issues/33014
[https://github.com/rust-lang/rust/issues/34457]: https://github.com/rust-lang/rust/issues/34457
[Apple developer documentation for System Integrity Protection]: https://developer.apple.com/library/archive/releasenotes/MacOSX/WhatsNewInOSX/Articles/MacOSX10_11.html#//apple_ref/doc/uid/TP40016227-SW11
[https://github.com/rust-lang/lldb]: https://github.com/rust-lang/lldb
[https://github.com/rust-lang/llvm-project]: https://github.com/rust-lang/llvm-project
[PDB]: https://llvm.org/docs/PDB/index.html
[symbol records]: https://llvm.org/docs/PDB/CodeViewSymbols.html
[type records]: https://llvm.org/docs/PDB/CodeViewTypes.html
[Windows Debugging Tools]: https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/
[`debugger_visualizer` attribute]: https://doc.rust-lang.org/nightly/reference/attributes/debugger.html#the-debugger_visualizer-attribute
|