# Bootstrapping the compiler

<!-- toc -->

[*Bootstrapping*][boot] is the process of using a compiler to compile itself.
More accurately, it means using an older compiler to compile a newer version
of the same compiler.

This raises a chicken-and-egg paradox: where did the first compiler come from?
It must have been written in a different language. In Rust's case it was
[written in OCaml][ocaml-compiler]. However, it was abandoned long ago, and the
only way to build a modern version of rustc is with a slightly less modern
version.

This is exactly how `x.py` works: it downloads the current beta release of
rustc, then uses it to compile the new compiler.

Note that this documentation mostly covers user-facing information. See
[bootstrap/README.md][bootstrap-internals] to read about bootstrap internals.

[bootstrap-internals]: https://github.com/rust-lang/rust/blob/master/src/bootstrap/README.md

## Stages of bootstrapping

Compiling `rustc` is done in stages. Here's a diagram, adapted from Joshua Nelson's
[talk on bootstrapping][rustconf22-talk] at RustConf 2022, with detailed explanations below.

The `A`, `B`, `C`, and `D` show the ordering of the stages of bootstrapping.
<span style="background-color: lightblue; color: black">Blue</span> nodes are downloaded,
<span style="background-color: yellow; color: black">yellow</span> nodes are built with the
stage0 compiler, and
<span style="background-color: lightgreen; color: black">green</span> nodes are built with the
stage1 compiler.

[rustconf22-talk]: https://www.youtube.com/watch?v=oUIjG-y4zaA

```mermaid
graph TD
    s0c["stage0 compiler (1.63)"]:::downloaded -->|A| s0l("stage0 std (1.64)"):::with-s0c;
    s0c & s0l --- stepb[ ]:::empty;
    stepb -->|B| s0ca["stage0 compiler artifacts (1.64)"]:::with-s0c;
    s0ca -->|copy| s1c["stage1 compiler (1.64)"]:::with-s0c;
    s1c -->|C| s1l("stage1 std (1.64)"):::with-s1c;
    s1c & s1l --- stepd[ ]:::empty;
    stepd -->|D| s1ca["stage1 compiler artifacts (1.64)"]:::with-s1c;
    s1ca -->|copy| s2c["stage2 compiler"]:::with-s1c;

    classDef empty width:0px,height:0px;
    classDef downloaded fill: lightblue;
    classDef with-s0c fill: yellow;
    classDef with-s1c fill: lightgreen;
```

### Stage 0

The stage0 compiler is usually the current _beta_ `rustc` compiler
and its associated dynamic libraries,
which `x.py` will download for you.
(You can also configure `x.py` to use something else.)

The stage0 compiler is then used only to compile `src/bootstrap`, `std`, and `rustc`.
When compiling `rustc`, the stage0 compiler uses the freshly compiled `std`.
There are two concepts at play here:
a compiler (with its set of dependencies)
and its 'target' or 'object' libraries (`std` and `rustc`).
Both are staged, but in a staggered manner.

### Stage 1

The rustc source code is then compiled with the stage0 compiler to produce the stage1 compiler.

### Stage 2

We then rebuild our stage1 compiler with itself to produce the stage2 compiler.

In theory, the stage1 compiler is functionally identical to the stage2 compiler,
but in practice there are subtle differences.
In particular, the stage1 compiler itself was built by stage0
and hence not by the source in your working directory.
This means that the ABI generated by the stage0 compiler may not match the ABI that would have been
made by the stage1 compiler, which can cause problems for dynamic libraries, tests, and tools using
`rustc_private`.

Note that the `proc_macro` crate avoids this issue with a C FFI layer called `proc_macro::bridge`,
allowing it to be used with stage 1.

The `stage2` compiler is the one distributed with `rustup` and all other install methods.
However, it takes a very long time to build
because one must first build the new compiler with an older compiler
and then use that to build the new compiler with itself.
For development, you usually only want the `stage1` compiler,
which you can build with `./x.py build library`.
See [Building the compiler](./how-to-build-and-run.html#building-the-compiler).

### Stage 3

Stage 3 is optional. To sanity check our new compiler, we
can build the libraries with the stage2 compiler. The result ought
to be identical to before, unless something has broken.

### Building the stages

`x.py` tries to be helpful and pick the stage you most likely meant for each subcommand.
These defaults are as follows:

- `check`: `--stage 0`
- `doc`: `--stage 0`
- `build`: `--stage 1`
- `test`: `--stage 1`
- `dist`: `--stage 2`
- `install`: `--stage 2`
- `bench`: `--stage 2`

You can always override the stage by passing `--stage N` explicitly.
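
The defaults and the `--stage` override can be summarized as a small lookup.
The following Rust sketch only models the list above; it is not code from
`src/bootstrap`, and the function names `default_stage` and `effective_stage`
are made up for illustration.

```rust
/// Default stage for an `x.py` subcommand, mirroring the list above.
/// Returns `None` for subcommands the list does not mention.
fn default_stage(subcommand: &str) -> Option<u32> {
    match subcommand {
        "check" | "doc" => Some(0),
        "build" | "test" => Some(1),
        "dist" | "install" | "bench" => Some(2),
        _ => None,
    }
}

/// An explicit `--stage N` always wins over the default.
fn effective_stage(subcommand: &str, stage_flag: Option<u32>) -> Option<u32> {
    stage_flag.or_else(|| default_stage(subcommand))
}

fn main() {
    for sub in ["check", "build", "dist"] {
        println!("x.py {sub}: stage {:?}", effective_stage(sub, None));
    }
    println!("x.py test --stage 2: stage {:?}", effective_stage("test", Some(2)));
}
```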

For more information about stages, [see below](#understanding-stages-of-bootstrap).

## Complications of bootstrapping

Since the build system uses the current beta compiler to build the stage-1
bootstrapping compiler, the compiler source code can't use some features
until they reach beta (because otherwise the beta compiler doesn't support
them). On the other hand, for [compiler intrinsics][intrinsics] and internal
features, the features _have_ to be used. Additionally, the compiler makes
heavy use of nightly features (`#![feature(...)]`). How can we resolve this
problem?

There are two methods used:
1. The build system sets `--cfg bootstrap` when building with `stage0`, so we
can use `cfg(not(bootstrap))` to only use features when built with `stage1`.
This is useful for e.g. features that were just stabilized, which require
`#![feature(...)]` when built with `stage0`, but not for `stage1`.
2. The build system sets `RUSTC_BOOTSTRAP=1`. This special variable means to
_break the stability guarantees_ of Rust: it allows using `#![feature(...)]` with
a compiler that's not nightly. This should never be used except when
bootstrapping the compiler.
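
As a rough illustration of the first method, here is a minimal sketch of code
gated on `cfg(bootstrap)`. The function `built_with` and the feature name in
the comment are placeholders, not taken from the compiler. Outside the
compiler's own build, `--cfg bootstrap` is never passed, so only the
`not(bootstrap)` branch is compiled (recent toolchains may warn that the cfg
name is unexpected).

```rust
// In the compiler's crates, the crate-level form is the usual pattern, e.g.:
//     #![cfg_attr(bootstrap, feature(hypothetical_new_feature))]
// (`hypothetical_new_feature` is a placeholder, not a real feature name.)

/// Compiled only when the build system passes `--cfg bootstrap`,
/// i.e. when the code is being built by the stage0 compiler.
#[cfg(bootstrap)]
fn built_with() -> &'static str {
    "stage0"
}

/// Compiled in every other case, including a build by the stage1 compiler.
#[cfg(not(bootstrap))]
fn built_with() -> &'static str {
    "stage1 or later"
}

fn main() {
    println!("this crate was built with: {}", built_with());
}
```

Compiling the same file with `rustc --cfg bootstrap` selects the other branch.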

[boot]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers)
[intrinsics]: ../appendix/glossary.md#intrinsic
[ocaml-compiler]: https://github.com/rust-lang/rust/tree/ef75860a0a72f79f97216f8aaa5b388d98da6480/src/boot

## Understanding stages of bootstrap

### Overview

This is a detailed look into the separate bootstrap stages.

The convention `x.py` uses is that:

- A `--stage N` flag means to run the stage N compiler (`stageN/rustc`).
- A "stage N artifact" is a build artifact that is _produced_ by the stage N compiler.
- The stage N+1 compiler is assembled from stage N *artifacts*. This
  process is called _uplifting_.

#### Build artifacts

Anything you can build with `x.py` is a _build artifact_.
Build artifacts include, but are not limited to:

- binaries, like `stage0-rustc/rustc-main`
- shared objects, like `stage0-sysroot/rustlib/libstd-6fae108520cf72fe.so`
- [rlib] files, like `stage0-sysroot/rustlib/libstd-6fae108520cf72fe.rlib`
- HTML files generated by rustdoc, like `doc/std`

[rlib]: ../serialization.md

#### Examples

- `./x.py build --stage 0` means to build with the beta `rustc`.
- `./x.py doc --stage 0` means to document using the beta `rustdoc`.
- `./x.py test --stage 0 library/std` means to run tests on the standard library
  without building `rustc` from source ('build with stage 0, then test the
  artifacts'). If you're working on the standard library, this is normally the
  test command you want.
- `./x.py test tests/ui` means to build the stage 1 compiler and run
  `compiletest` on it. If you're working on the compiler, this is normally the
  test command you want.

#### Examples of what *not* to do

- `./x.py test --stage 0 tests/ui` is not useful: it runs tests on the
  _beta_ compiler and doesn't build `rustc` from source. Use `test tests/ui`
  instead, which builds stage 1 from source.
- `./x.py test --stage 0 compiler/rustc` builds the compiler but runs no tests:
  it's running `cargo test -p rustc`, but cargo doesn't understand Rust's
  tests. You shouldn't need to use this; use `test` instead (without arguments).
- `./x.py build --stage 0 compiler/rustc` builds the compiler, but does not build
  libstd or even libcore. Most of the time, you'll want `./x.py build
  library` instead, which allows compiling programs without needing to define
  lang items.

### Building vs. running

Note that `build --stage N compiler/rustc` **does not** build the stage N compiler:
instead it builds the stage N+1 compiler _using_ the stage N compiler.

In short, _stage 0 uses the stage0 compiler to create stage0 artifacts which
will later be uplifted to be the stage1 compiler_.

In each stage, two major steps are performed:

1. `std` is compiled by the stage N compiler.
2. That `std` is linked to programs built by the stage N compiler,
   including the stage N artifacts (stage N+1 compiler).

This is somewhat intuitive if one thinks of the stage N artifacts as "just"
another program we are building with the stage N compiler:
`build --stage N compiler/rustc` is linking the stage N artifacts to the `std`
built by the stage N compiler.
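
The naming scheme can be written down as a toy model. This is a sketch of the
convention only, with a made-up `Stage` struct and `describe_stage` function;
it does not reflect how `src/bootstrap` represents stages internally.

```rust
/// What happens during stage `n`, using the names from this chapter.
#[derive(Debug, PartialEq)]
struct Stage {
    /// The compiler that does the work in this stage.
    compiler: String,
    /// Step 1: the `std` built by that compiler.
    std: String,
    /// Step 2: the compiler artifacts built by that compiler, linked to that `std`.
    artifacts: String,
    /// The compiler assembled ("uplifted") from those artifacts.
    uplifted: String,
}

fn describe_stage(n: u32) -> Stage {
    Stage {
        compiler: format!("stage{n} compiler"),
        std: format!("stage{n} std"),
        artifacts: format!("stage{n} compiler artifacts"),
        uplifted: format!("stage{} compiler", n + 1),
    }
}

fn main() {
    for n in 0..2 {
        let s = describe_stage(n);
        println!(
            "{} builds {} and {}, which become the {}",
            s.compiler, s.std, s.artifacts, s.uplifted
        );
    }
}
```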

### Stages and `std`

Note that there are two `std` libraries in play here:
1. The library _linked_ to `stageN/rustc`, which was built by stage N-1 (stage N-1 `std`)
2. The library _used to compile programs_ with `stageN/rustc`, which was
   built by stage N (stage N `std`).

Stage N `std` is pretty much necessary for any useful work with the stage N compiler.
Without it, you can only compile programs with `#![no_core]` -- not terribly useful!
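
The off-by-one between the two libraries is easy to lose track of, so here it
is as two tiny functions. The names `linked_std_stage` and `target_std_stage`
are invented for this sketch. The `None` case assumes, as described earlier,
that the stage0 compiler is downloaded with its libraries rather than built
locally.

```rust
/// Stage whose compiler built the `std` that `stageN/rustc` is linked against.
/// `None` for stage 0: that compiler and its libraries are downloaded.
fn linked_std_stage(n: u32) -> Option<u32> {
    n.checked_sub(1)
}

/// Stage whose compiler built the `std` that `stageN/rustc` compiles programs against.
fn target_std_stage(n: u32) -> u32 {
    n
}

fn main() {
    for n in 0..3 {
        println!(
            "stage{n}/rustc: linked std built by {:?}, target std built by stage {}",
            linked_std_stage(n),
            target_std_stage(n)
        );
    }
}
```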

These need to be different because they aren't necessarily ABI-compatible:
there could be new layout optimizations, changes to MIR, or other changes
to Rust metadata on nightly that aren't present in beta.

This is also where `--keep-stage 1 library/std` comes into play. Since most
changes to the compiler don't actually change the ABI, once you've produced a
`std` in stage 1, you can probably just reuse it with a different compiler.
If the ABI hasn't changed, you're good to go, no need to spend time
recompiling that `std`.
`--keep-stage` simply assumes the previous compile is fine and copies those
artifacts into the appropriate place, skipping the cargo invocation.

### Cross-compiling rustc

*Cross-compiling* is the process of compiling code that will run on another architecture.
For instance, you might want to build an ARM version of rustc using an x86 machine.
Building stage2 `std` is different when you are cross-compiling.

This is because `x.py` uses a trick: if `HOST` and `TARGET` are the same,
it will reuse stage1 `std` for stage2! This is sound because stage1 `std`
was compiled with the stage1 compiler, i.e. a compiler using the source code
you currently have checked out. So it should be identical (and therefore ABI-compatible)
to the `std` that `stage2/rustc` would compile.

However, when cross-compiling, stage1 `std` will only run on the host.
So the stage2 compiler has to recompile `std` for the target.

(See in the table how stage2 only builds non-host `std` targets).
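
The decision described here boils down to comparing two target triples. A
minimal sketch, assuming a hypothetical `stage2_std` helper (the real logic
lives in `src/bootstrap` and is more involved):

```rust
#[derive(Debug, PartialEq)]
enum Stage2Std {
    /// Host and target match: the stage1 `std` is reused as-is.
    ReuseStage1,
    /// Cross-compiling: the stage2 compiler must build `std` for the target.
    Rebuild,
}

fn stage2_std(host: &str, target: &str) -> Stage2Std {
    if host == target {
        Stage2Std::ReuseStage1
    } else {
        Stage2Std::Rebuild
    }
}

fn main() {
    let host = "x86_64-unknown-linux-gnu";
    for target in [host, "aarch64-unknown-linux-gnu"] {
        println!("{host} -> {target}: {:?}", stage2_std(host, target));
    }
}
```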

### Why does only libstd use `cfg(bootstrap)`?

The `rustc` generated by the stage0 compiler is linked to the freshly-built
`std`, which means that for the most part only `std` needs to be cfg-gated,
so that `rustc` can use features added to std immediately after their addition,
without need for them to get into the downloaded beta.

Note this is different from any other Rust program: stage1 `rustc`
is built by the _beta_ compiler, but using the _master_ version of libstd!

The only time `rustc` uses `cfg(bootstrap)` is when it adds internal lints
that use diagnostic items. This happens very rarely.

### What is a 'sysroot'?

When you build a project with cargo, the build artifacts for dependencies
are normally stored in `target/debug/deps`. This only contains dependencies cargo
knows about; in particular, it doesn't have the standard library. Where do
`std` or `proc_macro` come from? They come from the **sysroot**, the root
of a number of directories where the compiler loads build artifacts at runtime.
The sysroot doesn't just store the standard library, though - it includes
anything that needs to be loaded at runtime. That includes (but is not limited
to):

- `libstd`/`libtest`/`libproc_macro`
- The compiler crates themselves, when using `rustc_private`. In-tree these
  are always present; out of tree, you need to install `rustc-dev` with rustup.
- `libLLVM.so`, the shared object file for the LLVM project. In-tree this is
  either built from source or downloaded from CI; out-of-tree, you need to
  install `llvm-tools-preview` with rustup.

All the artifacts listed so far are *compiler* runtime dependencies. You can
see them with `rustc --print sysroot`:

```
$ ls $(rustc --print sysroot)/lib
libchalk_derive-0685d79833dc9b2b.so  libstd-25c6acf8063a3802.so
libLLVM-11-rust-1.50.0-nightly.so    libtest-57470d2aa8f7aa83.so
librustc_driver-4f0cc9f50e53f0ba.so  libtracing_attributes-e4be92c35ab2a33b.so
librustc_macros-5f0ec4a119c6ac86.so  rustlib
```

There are also runtime dependencies for the standard library! These are in
`lib/rustlib`, not `lib/` directly.

```
$ ls $(rustc --print sysroot)/lib/rustlib/x86_64-unknown-linux-gnu/lib | head -n 5
libaddr2line-6c8e02b8fedc1e5f.rlib
libadler-9ef2480568df55af.rlib
liballoc-9c4002b5f79ba0e1.rlib
libcfg_if-512eb53291f6de7e.rlib
libcompiler_builtins-ef2408da76957905.rlib
```

`rustlib` includes libraries like `hashbrown` and `cfg_if`, which are not part
of the public API of the standard library, but are used to implement it.
`rustlib` is part of the search path for linkers, but `lib` will never be part
of the search path.
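
The same two directories can be located from a program. The sketch below
shells out to `rustc --print sysroot` and assumes `rustc` is on your `PATH`.
The helper names `sysroot` and `rustlib_lib_dir` are made up for this example,
and the target triple in `main` is only an example value.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Ask `rustc` for its sysroot, like `rustc --print sysroot` on the command line.
fn sysroot() -> std::io::Result<PathBuf> {
    let out = Command::new("rustc").args(["--print", "sysroot"]).output()?;
    let text = String::from_utf8_lossy(&out.stdout);
    Ok(PathBuf::from(text.trim()))
}

/// Directory holding the standard library and its dependencies for `target`.
fn rustlib_lib_dir(sysroot: &Path, target: &str) -> PathBuf {
    sysroot.join("lib").join("rustlib").join(target).join("lib")
}

fn main() -> std::io::Result<()> {
    let root = sysroot()?;
    // Runtime dependencies of the compiler itself.
    println!("compiler libs: {}", root.join("lib").display());
    // Runtime dependencies of the standard library for one target.
    let std_libs = rustlib_lib_dir(&root, "x86_64-unknown-linux-gnu");
    println!("std libs:      {}", std_libs.display());
    Ok(())
}
```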

#### -Z force-unstable-if-unmarked

Since `rustlib` is part of the search path, it means we have to be careful
about which crates are included in it. In particular, all crates except for
the standard library are built with the flag `-Z force-unstable-if-unmarked`,
which means that you have to use `#![feature(rustc_private)]` in order to
load them (as opposed to the standard library, which is always available).

The `-Z force-unstable-if-unmarked` flag has a variety of purposes to help
enforce that the correct crates are marked as unstable. It was introduced
primarily to allow rustc and the standard library to link to arbitrary crates
on crates.io which do not themselves use `staged_api`. `rustc` also relies on
this flag to mark all of its crates as unstable with the `rustc_private`
feature so that each crate does not need to be carefully marked with
`unstable`.

This flag is automatically applied to all of `rustc` and the standard library
by the bootstrap scripts. This is needed because the compiler and all of its
dependencies are shipped in the sysroot to all users.

This flag has the following effects:

- Marks the crate as "unstable" with the `rustc_private` feature if it is not
  itself marked as stable or unstable.
- Allows these crates to access other forced-unstable crates without any need
  for attributes. Normally a crate would need a `#![feature(rustc_private)]`
  attribute to use other unstable crates. However, that would make it
  impossible for a crate from crates.io to access its own dependencies since
  that crate won't have a `feature(rustc_private)` attribute, but *everything*
  is compiled with `-Z force-unstable-if-unmarked`.

Code which does not use `-Z force-unstable-if-unmarked` should include the
`#![feature(rustc_private)]` crate attribute to access these force-unstable
crates. This is needed for things that link `rustc`, such as `miri` or
`clippy`.

You can find more discussion about sysroots in:
- The [rustdoc PR] explaining why it uses `extern crate` for dependencies loaded from sysroot
- [Discussions about sysroot on Zulip](https://rust-lang.zulipchat.com/#narrow/stream/182449-t-compiler.2Fhelp/topic/deps.20in.20sysroot/)
- [Discussions about building rustdoc out of tree](https://rust-lang.zulipchat.com/#narrow/stream/182449-t-compiler.2Fhelp/topic/How.20to.20create.20an.20executable.20accessing.20.60rustc_private.60.3F)

[rustdoc PR]: https://github.com/rust-lang/rust/pull/76728

## Passing flags to commands invoked by `bootstrap`

`x.py` allows you to pass stage-specific flags to `rustc` and `cargo` when bootstrapping.
The `RUSTFLAGS_BOOTSTRAP` environment variable is passed as `RUSTFLAGS` to the bootstrap stage
(stage0), and `RUSTFLAGS_NOT_BOOTSTRAP` is passed when building artifacts for later stages.
`RUSTFLAGS` will work, but also affects the build of `bootstrap` itself, so it will be rare to want
to use it.
Finally, `MAGIC_EXTRA_RUSTFLAGS` bypasses the `cargo` cache to pass flags to rustc without
recompiling all dependencies.

`RUSTDOCFLAGS`, `RUSTDOCFLAGS_BOOTSTRAP`, and `RUSTDOCFLAGS_NOT_BOOTSTRAP` are analogous to
`RUSTFLAGS`, but for rustdoc.

`CARGOFLAGS` will pass arguments to cargo itself (e.g. `--timings`). `CARGOFLAGS_BOOTSTRAP` and
`CARGOFLAGS_NOT_BOOTSTRAP` work analogously to `RUSTFLAGS_BOOTSTRAP`.
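
The naming pattern shared by these variables can be sketched as follows.
`stage_specific_var` is a hypothetical helper that only illustrates which
variable name applies at which stage; it is not how `bootstrap` reads them.

```rust
/// Pick the stage-specific variant of a flags variable such as `RUSTFLAGS`.
/// Stage 0 uses the `_BOOTSTRAP` variant; later stages use `_NOT_BOOTSTRAP`.
fn stage_specific_var(base: &str, stage: u32) -> String {
    let suffix = if stage == 0 { "BOOTSTRAP" } else { "NOT_BOOTSTRAP" };
    format!("{base}_{suffix}")
}

fn main() {
    for base in ["RUSTFLAGS", "RUSTDOCFLAGS", "CARGOFLAGS"] {
        for stage in 0..2 {
            println!("{base} at stage {stage}: {}", stage_specific_var(base, stage));
        }
    }
}
```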

`--test-args` will pass arguments through to the test runner. For `tests/ui`, this is
compiletest; for unit tests and doctests this is the `libtest` runner. Most test runners accept
`--help`, which you can use to find out the options accepted by the runner.

## Environment Variables

During bootstrapping, there are a bunch of compiler-internal environment
variables that are used. If you are trying to run an intermediate version of
`rustc`, sometimes you may need to set some of these environment variables
manually. Otherwise, you get an error like the following:

```text
thread 'main' panicked at 'RUSTC_STAGE was not set: NotPresent', library/core/src/result.rs:1165:5
```

If `./stageN/bin/rustc` gives an error about environment variables, that
usually means something is quite wrong -- or you're trying to compile e.g.
`rustc` or `std` or something that depends on environment variables. In
the unlikely case that you actually need to invoke rustc in such a situation,
you can tell the bootstrap shim to print all env variables by adding `-vvv` to your `x.py` command.

Finally, bootstrap makes use of the [cc-rs crate] which has [its own
method][env-vars] of configuring C compilers and C flags via environment
variables.

[cc-rs crate]: https://github.com/rust-lang/cc-rs
[env-vars]: https://github.com/rust-lang/cc-rs#external-configuration-via-environment-variables

## Clarification of build command's stdout

This section walks through the output of an actual build command, in more
detail than the sections above.
When you run `x.py build --dry-run`, the output looks something
like the following:

```text
Building stage0 library artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Copying stage0 library from stage0 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
Building stage0 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Copying stage0 rustc from stage0 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
Assembling stage1 compiler (x86_64-unknown-linux-gnu)
Building stage1 library artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Copying stage1 library from stage1 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
Building stage1 tool rust-analyzer-proc-macro-srv (x86_64-unknown-linux-gnu)
Building rustdoc for stage1 (x86_64-unknown-linux-gnu)
```

### Building stage0 {std,compiler} artifacts

These steps use the provided (downloaded, usually) compiler to compile the
local Rust source into libraries we can use.

### Copying stage0 {std,rustc}

This copies the library and compiler artifacts from Cargo into
`stage0-sysroot/lib/rustlib/{target-triple}/lib`.

### Assembling stage1 compiler

This copies the libraries we built in "building stage0 ... artifacts" into
the stage1 compiler's lib directory. These are the host libraries that the
compiler itself uses to run. These aren't actually used by artifacts the new
compiler generates. This step also copies the rustc and rustdoc binaries we
generated into `build/$HOST/stage/bin`.

`stage1/bin/rustc` is a fully functional compiler, but it doesn't yet have
any libraries to link built binaries or libraries to. The next three steps will
provide those libraries for it; they are mostly equivalent to constructing
the `stage1/bin` compiler, so we don't go through them individually.