Diffstat (limited to 'src/doc/rustc-dev-guide/src/building/bootstrapping.md')
-rw-r--r-- src/doc/rustc-dev-guide/src/building/bootstrapping.md 415
1 files changed, 415 insertions, 0 deletions
diff --git a/src/doc/rustc-dev-guide/src/building/bootstrapping.md b/src/doc/rustc-dev-guide/src/building/bootstrapping.md
new file mode 100644
index 000000000..fd54de20c
--- /dev/null
+++ b/src/doc/rustc-dev-guide/src/building/bootstrapping.md
@@ -0,0 +1,415 @@
+# Bootstrapping the Compiler
+
+<!-- toc -->
+
+
+[*Bootstrapping*][boot] is the process of using a compiler to compile itself.
+More accurately, it means using an older compiler to compile a newer version
+of the same compiler.
+
+This raises a chicken-and-egg paradox: where did the first compiler come from?
+It must have been written in a different language. In Rust's case it was
+[written in OCaml][ocaml-compiler]. However, it was abandoned long ago, and the
+only way to build a modern version of rustc is with a slightly less modern
+version.
+
+This is exactly how `x.py` works: it downloads the current beta release of
+rustc, then uses it to compile the new compiler.
+
+## Stages of bootstrapping
+
+Compiling `rustc` is done in stages.
+
+### Stage 0
+
+The stage0 compiler is usually the current _beta_ `rustc` compiler
+and its associated dynamic libraries,
+which `x.py` will download for you.
+(You can also configure `x.py` to use something else.)
+
+The stage0 compiler is then used only to compile `rustbuild`, `std`, and `rustc`.
+When compiling `rustc`, the stage0 compiler uses the freshly compiled `std`.
+There are two concepts at play here:
+a compiler (with its set of dependencies)
+and its 'target' or 'object' libraries (`std` and `rustc`).
+Both are staged, but in a staggered manner.
+
+### Stage 1
+
+The rustc source code is then compiled with the stage0 compiler to produce the stage1 compiler.
+
+### Stage 2
+
+We then rebuild our stage1 compiler with itself to produce the stage2 compiler.
+
+In theory, the stage1 compiler is functionally identical to the stage2 compiler,
+but in practice there are subtle differences.
+In particular, the stage1 compiler itself was built by stage0
+and hence not by the source in your working directory.
+This means that the symbol names used in the compiler source
+may not match the symbol names that would have been made by the stage1 compiler,
+which can cause problems for dynamic libraries and tests.
+
+The `stage2` compiler is the one distributed with `rustup` and all other install methods.
+However, it takes a very long time to build
+because one must first build the new compiler with an older compiler
+and then use that to build the new compiler with itself.
+For development, you usually only want the `stage1` compiler,
+which you can build with `./x.py build library`.
+See [Building the Compiler](./how-to-build-and-run.html#building-the-compiler).
+
+### Stage 3
+
+Stage 3 is optional. To sanity check our new compiler, we
+can build the libraries with the stage2 compiler. The result ought
+to be identical to before, unless something has broken.
+
+### Building the stages
+
+`x.py` tries to be helpful and pick the stage you most likely meant for each subcommand.
+These defaults are as follows:
+
+- `check`: `--stage 0`
+- `doc`: `--stage 0`
+- `build`: `--stage 1`
+- `test`: `--stage 1`
+- `dist`: `--stage 2`
+- `install`: `--stage 2`
+- `bench`: `--stage 2`
+
+You can always override the stage by passing `--stage N` explicitly.
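+
+The defaults above amount to a small lookup table with an override. The
+following is a minimal sketch of that rule, not the code `x.py` actually runs;
+the function names `default_stage` and `effective_stage` are hypothetical.
+
+```rust
+/// Default `--stage` for an `x.py` subcommand, per the list above.
+/// Illustrative model only; not the lookup that bootstrap itself performs.
+fn default_stage(subcommand: &str) -> Option<u32> {
+    match subcommand {
+        "check" | "doc" => Some(0),
+        "build" | "test" => Some(1),
+        "dist" | "install" | "bench" => Some(2),
+        _ => None, // subcommands the list above does not cover
+    }
+}
+
+/// An explicit `--stage N` always wins over the default.
+fn effective_stage(subcommand: &str, explicit: Option<u32>) -> Option<u32> {
+    explicit.or_else(|| default_stage(subcommand))
+}
+
+fn main() {
+    println!("build, no flag:   {:?}", effective_stage("build", None));
+    println!("build, --stage 2: {:?}", effective_stage("build", Some(2)));
+}
+```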
+
+For more information about stages, [see below](#understanding-stages-of-bootstrap).
+
+## Complications of bootstrapping
+
+Since the build system uses the current beta compiler to build the stage-1
+bootstrapping compiler, the compiler source code can't use some features
+until they reach beta (because otherwise the beta compiler doesn't support
+them). On the other hand, for [compiler intrinsics][intrinsics] and internal
+features, the features _have_ to be used. Additionally, the compiler makes
+heavy use of nightly features (`#![feature(...)]`). How can we resolve this
+problem?
+
+There are two methods used:
+1. The build system sets `--cfg bootstrap` when building with `stage0`, so we
+can use `cfg(not(bootstrap))` to only use features when built with `stage1`.
+This is useful for e.g. features that were just stabilized, which require
+`#![feature(...)]` when built with `stage0`, but not for `stage1`.
+2. The build system sets `RUSTC_BOOTSTRAP=1`. This special variable
+_breaks the stability guarantees_ of Rust: it allows using `#![feature(...)]` with
+a compiler that's not nightly. This should never be used except when
+bootstrapping the compiler.
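+
+As a rough illustration of the first method, the sketch below gates two
+versions of a function on `cfg(bootstrap)`. It is a standalone toy program
+rather than code from the compiler, and the function name `built_by` is
+hypothetical.
+
+```rust
+// `bootstrap` is a custom cfg name. A stock toolchain never sets it, so this
+// file takes the `not(bootstrap)` branch unless built with `--cfg bootstrap`.
+
+/// Selected only when the build passes `--cfg bootstrap` (the stage0 case).
+#[cfg(bootstrap)]
+fn built_by() -> &'static str {
+    "stage0 (beta) compiler"
+}
+
+/// Selected in every other build (the stage1-and-later case).
+#[cfg(not(bootstrap))]
+fn built_by() -> &'static str {
+    "stage1 or later compiler"
+}
+
+fn main() {
+    println!("built by the {}", built_by());
+}
+```
+
+Compiling this with `rustc --cfg bootstrap` selects the first branch instead.
+Recent toolchains may warn that `bootstrap` is an unexpected `cfg` name when
+the file is built outside the compiler's own build system.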
+
+[boot]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers)
+[intrinsics]: ../appendix/glossary.md#intrinsic
+[ocaml-compiler]: https://github.com/rust-lang/rust/tree/ef75860a0a72f79f97216f8aaa5b388d98da6480/src/boot
+
+## Contributing to bootstrap
+
+When you use the bootstrap system, you'll call it through `x.py`.
+However, most of the code lives in `src/bootstrap`.
+`bootstrap` has a difficult problem: it is written in Rust, yet it is run
+before the Rust compiler is built! To work around this, there are two
+components of bootstrap: the main one written in Rust, and `bootstrap.py`.
+`bootstrap.py` is what gets run by `x.py`. It takes care of downloading the
+`stage0` compiler, which will then build the bootstrap binary written in
+Rust.
+
+Because there are two separate codebases behind `x.py`, they need to
+be kept in sync. In particular, both `bootstrap.py` and the bootstrap binary
+parse `config.toml` and read the same command line arguments. `bootstrap.py`
+keeps these in sync by setting various environment variables, and the
+programs sometimes have to add arguments that are explicitly ignored, to be
+read by the other.
+
+### Adding a setting to config.toml
+
+This section is a work in progress. In the meantime, you can see an example
+contribution [here][bootstrap-build].
+
+[bootstrap-build]: https://github.com/rust-lang/rust/pull/71994
+
+## Understanding stages of bootstrap
+
+### Overview
+
+This is a detailed look into the separate bootstrap stages.
+
+The convention `x.py` uses is that:
+
+- A `--stage N` flag means to run the stage N compiler (`stageN/rustc`).
+- A "stage N artifact" is a build artifact that is _produced_ by the stage N compiler.
+- The stage N+1 compiler is assembled from stage N *artifacts*. This
+ process is called _uplifting_.
+
+#### Build artifacts
+
+Anything you can build with `x.py` is a _build artifact_.
+Build artifacts include, but are not limited to:
+
+- binaries, like `stage0-rustc/rustc-main`
+- shared objects, like `stage0-sysroot/rustlib/libstd-6fae108520cf72fe.so`
+- [rlib] files, like `stage0-sysroot/rustlib/libstd-6fae108520cf72fe.rlib`
+- HTML files generated by rustdoc, like `doc/std`
+
+[rlib]: ../serialization.md
+
+#### Examples
+
+- `./x.py build --stage 0` means to build with the beta `rustc`.
+- `./x.py doc --stage 0` means to document using the beta `rustdoc`.
+- `./x.py test --stage 0 library/std` means to run tests on the standard library
+ without building `rustc` from source ('build with stage 0, then test the
+ artifacts'). If you're working on the standard library, this is normally the
+ test command you want.
+- `./x.py test src/test/ui` means to build the stage 1 compiler and run
+ `compiletest` on it. If you're working on the compiler, this is normally the
+ test command you want.
+
+#### Examples of what *not* to do
+
+- `./x.py test --stage 0 src/test/ui` is not useful: it runs tests on the
+ _beta_ compiler and doesn't build `rustc` from source. Use `test src/test/ui`
+ instead, which builds stage 1 from source.
+- `./x.py test --stage 0 compiler/rustc` builds the compiler but runs no tests:
+ it's running `cargo test -p rustc`, but cargo doesn't understand Rust's
+ tests. You shouldn't need this; use `test` instead (without arguments).
+- `./x.py build --stage 0 compiler/rustc` builds the compiler, but does not build
+ libstd or even libcore. Most of the time, you'll want `./x.py build
+ library` instead, which allows compiling programs without needing to define
+ lang items.
+
+### Building vs. running
+
+Note that `build --stage N compiler/rustc` **does not** build the stage N compiler:
+instead it builds the stage N+1 compiler _using_ the stage N compiler.
+
+In short, _stage 0 uses the stage0 compiler to create stage0 artifacts which
+will later be uplifted to be the stage1 compiler_.
+
+In each stage, two major steps are performed:
+
+1. `std` is compiled by the stage N compiler.
+2. That `std` is linked to programs built by the stage N compiler,
+ including the stage N artifacts (stage N+1 compiler).
+
+This is somewhat intuitive if one thinks of the stage N artifacts as "just"
+another program we are building with the stage N compiler:
+`build --stage N compiler/rustc` is linking the stage N artifacts to the `std`
+built by the stage N compiler.
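+
+The off-by-one between "building with stage N" and "getting the stage N+1
+compiler" can be modeled in a few lines. This is a toy model under the naming
+convention described above; the types `Compiler` and `Artifacts` and the
+functions `build_rustc` and `uplift` are hypothetical and do not correspond to
+bootstrap's real data structures.
+
+```rust
+/// A compiler sitting in `stageN/`.
+#[derive(Debug, PartialEq)]
+struct Compiler {
+    stage: u32,
+}
+
+/// Build artifacts, tagged with the stage of the compiler that produced them.
+#[derive(Debug, PartialEq)]
+struct Artifacts {
+    produced_by_stage: u32,
+}
+
+/// Models `build --stage N compiler/rustc`: the stage N compiler produces
+/// stage N artifacts.
+fn build_rustc(with: &Compiler) -> Artifacts {
+    Artifacts { produced_by_stage: with.stage }
+}
+
+/// Models uplifting: stage N artifacts are assembled into the stage N+1 compiler.
+fn uplift(artifacts: Artifacts) -> Compiler {
+    Compiler { stage: artifacts.produced_by_stage + 1 }
+}
+
+fn main() {
+    let stage0 = Compiler { stage: 0 }; // the downloaded beta compiler
+    let stage1 = uplift(build_rustc(&stage0));
+    let stage2 = uplift(build_rustc(&stage1));
+    println!("{:?} -> {:?} -> {:?}", stage0, stage1, stage2);
+}
+```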
+
+Here is a chart of a full build using `x.py`:
+
+<img alt="A diagram of the rustc compilation phases" src="../img/rustc_stages.svg" class="center" />
+
+Keep in mind this diagram is a simplification: for example, `rustdoc` can be built at
+different stages, and the process is a bit different when passing flags such as
+`--keep-stage`, or if there are non-host targets.
+
+### Stages and `std`
+
+Note that there are two `std` libraries in play here:
+1. The library _linked_ to `stageN/rustc`, which was built by stage N-1 (stage N-1 `std`)
+2. The library _used to compile programs_ with `stageN/rustc`, which was
+ built by stage N (stage N `std`).
+
+Stage N `std` is pretty much necessary for any useful work with the stage N compiler.
+Without it, you can only compile programs with `#![no_core]` -- not terribly useful!
+
+These need to be different because they aren't necessarily ABI-compatible:
+there could be new layout optimizations, changes to MIR, or other changes
+to Rust metadata on nightly that aren't present in beta.
+
+This is also where `--keep-stage 1 library/std` comes into play. Since most
+changes to the compiler don't actually change the ABI, once you've produced a
+`std` in stage 1, you can probably just reuse it with a different compiler.
+If the ABI hasn't changed, you're good to go, no need to spend time
+recompiling that `std`.
+`--keep-stage` simply assumes the previous compile is fine and copies those
+artifacts into the appropriate place, skipping the cargo invocation.
+
+### Cross-compiling rustc
+
+*Cross-compiling* is the process of compiling code that will run on another architecture.
+For instance, you might want to build an ARM version of rustc using an x86 machine.
+Building stage2 `std` is different when you are cross-compiling.
+
+This is because `x.py` uses a trick: if `HOST` and `TARGET` are the same,
+it will reuse stage1 `std` for stage2! This is sound because stage1 `std`
+was compiled with the stage1 compiler, i.e. a compiler using the source code
+you currently have checked out. So it should be identical (and therefore ABI-compatible)
+to the `std` that `stage2/rustc` would compile.
+
+However, when cross-compiling, stage1 `std` will only run on the host.
+So the stage2 compiler has to recompile `std` for the target.
+
+(See the Stage 2 table below, where stage2 builds `std` only for non-host targets.)
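+
+The reuse rule described in this section can be written as a single comparison.
+The sketch below only restates that rule; the enum `Stage2Std` and the function
+`stage2_std_plan` are hypothetical names, and the real decision in bootstrap
+may involve more inputs.
+
+```rust
+/// What happens to `std` at stage 2 for a given target.
+#[derive(Debug, PartialEq)]
+enum Stage2Std {
+    /// HOST == TARGET: copy the `std` already built by the stage1 compiler.
+    ReuseStage1,
+    /// Cross-compiling: the stage2 compiler must build `std` for the target.
+    Rebuild,
+}
+
+fn stage2_std_plan(host: &str, target: &str) -> Stage2Std {
+    if host == target {
+        Stage2Std::ReuseStage1
+    } else {
+        Stage2Std::Rebuild
+    }
+}
+
+fn main() {
+    let host = "x86_64-unknown-linux-gnu";
+    for target in [host, "aarch64-unknown-linux-gnu"] {
+        println!("{target}: {:?}", stage2_std_plan(host, target));
+    }
+}
+```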
+
+### Why does only libstd use `cfg(bootstrap)`?
+
+The `rustc` generated by the stage0 compiler is linked to the freshly-built
+`std`, which means that for the most part only `std` needs to be cfg-gated,
+so that `rustc` can use features added to std immediately after their addition,
+without need for them to get into the downloaded beta.
+
+Note this is different from any other Rust program: stage1 `rustc`
+is built by the _beta_ compiler, but using the _master_ version of libstd!
+
+The only time `rustc` uses `cfg(bootstrap)` is when it adds internal lints
+that use diagnostic items. This happens very rarely.
+
+### What is a 'sysroot'?
+
+When you build a project with cargo, the build artifacts for dependencies
+are normally stored in `target/debug/deps`. This only contains dependencies cargo
+knows about; in particular, it doesn't have the standard library. Where do
+`std` or `proc_macro` come from? They come from the **sysroot**, the root
+of a number of directories where the compiler loads build artifacts at runtime.
+The sysroot doesn't just store the standard library, though - it includes
+anything that needs to be loaded at runtime. That includes (but is not limited
+to):
+
+- `libstd`/`libtest`/`libproc_macro`
+- The compiler crates themselves, when using `rustc_private`. In-tree these
+ are always present; out of tree, you need to install `rustc-dev` with rustup.
+- `libLLVM.so`, the shared object file for the LLVM project. In-tree this is
+ either built from source or downloaded from CI; out-of-tree, you need to
+ install `llvm-tools-preview` with rustup.
+
+All the artifacts listed so far are *compiler* runtime dependencies. You can
+see them with `rustc --print sysroot`:
+
+```
+$ ls $(rustc --print sysroot)/lib
+libchalk_derive-0685d79833dc9b2b.so libstd-25c6acf8063a3802.so
+libLLVM-11-rust-1.50.0-nightly.so libtest-57470d2aa8f7aa83.so
+librustc_driver-4f0cc9f50e53f0ba.so libtracing_attributes-e4be92c35ab2a33b.so
+librustc_macros-5f0ec4a119c6ac86.so rustlib
+```
+
+There are also runtime dependencies for the standard library! These are in
+`lib/rustlib`, not `lib/` directly.
+
+```
+$ ls $(rustc --print sysroot)/lib/rustlib/x86_64-unknown-linux-gnu/lib | head -n 5
+libaddr2line-6c8e02b8fedc1e5f.rlib
+libadler-9ef2480568df55af.rlib
+liballoc-9c4002b5f79ba0e1.rlib
+libcfg_if-512eb53291f6de7e.rlib
+libcompiler_builtins-ef2408da76957905.rlib
+```
+
+`rustlib` includes libraries like `hashbrown` and `cfg_if`, which are not part
+of the public API of the standard library, but are used to implement it.
+`rustlib` is part of the search path for linkers, but `lib` will never be part
+of the search path.
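+
+The two directory layouts shown in the listings above can be derived from the
+sysroot path. This is a small sketch of that layout only; the helper names
+`compiler_lib_dir` and `target_lib_dir` and the sysroot path used in `main` are
+hypothetical.
+
+```rust
+use std::path::{Path, PathBuf};
+
+/// Compiler runtime dependencies live in `<sysroot>/lib`.
+fn compiler_lib_dir(sysroot: &Path) -> PathBuf {
+    sysroot.join("lib")
+}
+
+/// Standard library runtime dependencies for one target live in
+/// `<sysroot>/lib/rustlib/<target>/lib`.
+fn target_lib_dir(sysroot: &Path, target: &str) -> PathBuf {
+    sysroot.join("lib").join("rustlib").join(target).join("lib")
+}
+
+fn main() {
+    // In practice the sysroot would come from `rustc --print sysroot`.
+    let sysroot = Path::new("/hypothetical/sysroot");
+    println!("{}", compiler_lib_dir(sysroot).display());
+    println!("{}", target_lib_dir(sysroot, "x86_64-unknown-linux-gnu").display());
+}
+```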
+
+#### -Z force-unstable-if-unmarked
+
+Since `rustlib` is part of the search path, it means we have to be careful
+about which crates are included in it. In particular, all crates except for
+the standard library are built with the flag `-Z force-unstable-if-unmarked`,
+which means that you have to use `#![feature(rustc_private)]` in order to
+load them (as opposed to the standard library, which is always available).
+
+The `-Z force-unstable-if-unmarked` flag has a variety of purposes to help
+enforce that the correct crates are marked as unstable. It was introduced
+primarily to allow rustc and the standard library to link to arbitrary crates
+on crates.io which do not themselves use `staged_api`. `rustc` also relies on
+this flag to mark all of its crates as unstable with the `rustc_private`
+feature so that each crate does not need to be carefully marked with
+`unstable`.
+
+This flag is automatically applied to all of `rustc` and the standard library
+by the bootstrap scripts. This is needed because the compiler and all of its
+dependencies are shipped in the sysroot to all users.
+
+This flag has the following effects:
+
+- Marks the crate as "unstable" with the `rustc_private` feature if it is not
+ itself marked as stable or unstable.
+- Allows these crates to access other forced-unstable crates without any need
+ for attributes. Normally a crate would need a `#![feature(rustc_private)]`
+ attribute to use other unstable crates. However, that would make it
+ impossible for a crate from crates.io to access its own dependencies since
+ that crate won't have a `feature(rustc_private)` attribute, but *everything*
+ is compiled with `-Z force-unstable-if-unmarked`.
+
+Code which does not use `-Z force-unstable-if-unmarked` should include the
+`#![feature(rustc_private)]` crate attribute to access these force-unstable
+crates. This is needed for things that link `rustc`, such as `miri`, `rls`, or
+`clippy`.
+
+You can find more discussion about sysroots in:
+- The [rustdoc PR] explaining why it uses `extern crate` for dependencies loaded from sysroot
+- [Discussions about sysroot on Zulip](https://rust-lang.zulipchat.com/#narrow/stream/182449-t-compiler.2Fhelp/topic/deps.20in.20sysroot/)
+- [Discussions about building rustdoc out of tree](https://rust-lang.zulipchat.com/#narrow/stream/182449-t-compiler.2Fhelp/topic/How.20to.20create.20an.20executable.20accessing.20.60rustc_private.60.3F)
+
+[rustdoc PR]: https://github.com/rust-lang/rust/pull/76728
+
+### Directories and artifacts generated by `x.py`
+
+The following tables indicate the outputs of various stage actions:
+
+| Stage 0 Action | Output |
+|-----------------------------------------------------------|----------------------------------------------|
+| `beta` extracted | `build/HOST/stage0` |
+| `stage0` builds `bootstrap` | `build/bootstrap` |
+| `stage0` builds `test`/`std` | `build/HOST/stage0-std/TARGET` |
+| copy `stage0-std` (HOST only) | `build/HOST/stage0-sysroot/lib/rustlib/HOST` |
+| `stage0` builds `rustc` with `stage0-sysroot` | `build/HOST/stage0-rustc/HOST` |
+| copy `stage0-rustc` (except executable) | `build/HOST/stage0-sysroot/lib/rustlib/HOST` |
+| build `llvm` | `build/HOST/llvm` |
+| `stage0` builds `codegen` with `stage0-sysroot` | `build/HOST/stage0-codegen/HOST` |
+| `stage0` builds `rustdoc`, `clippy`, `miri`, with `stage0-sysroot` | `build/HOST/stage0-tools/HOST` |
+
+`--stage=0` stops here.
+
+| Stage 1 Action | Output |
+|-----------------------------------------------------|---------------------------------------|
+| copy (uplift) `stage0-rustc` executable to `stage1` | `build/HOST/stage1/bin` |
+| copy (uplift) `stage0-codegen` to `stage1` | `build/HOST/stage1/lib` |
+| copy (uplift) `stage0-sysroot` to `stage1` | `build/HOST/stage1/lib` |
+| `stage1` builds `test`/`std` | `build/HOST/stage1-std/TARGET` |
+| copy `stage1-std` (HOST only) | `build/HOST/stage1/lib/rustlib/HOST` |
+| `stage1` builds `rustc` | `build/HOST/stage1-rustc/HOST` |
+| copy `stage1-rustc` (except executable) | `build/HOST/stage1/lib/rustlib/HOST` |
+| `stage1` builds `codegen` | `build/HOST/stage1-codegen/HOST` |
+
+`--stage=1` stops here.
+
+| Stage 2 Action | Output |
+|--------------------------------------------------------|-----------------------------------------------------------------|
+| copy (uplift) `stage1-rustc` executable | `build/HOST/stage2/bin` |
+| copy (uplift) `stage1-sysroot` | `build/HOST/stage2/lib` and `build/HOST/stage2/lib/rustlib/HOST` |
+| `stage2` builds `test`/`std` (not HOST targets) | `build/HOST/stage2-std/TARGET` |
+| copy `stage2-std` (not HOST targets) | `build/HOST/stage2/lib/rustlib/TARGET` |
+| `stage2` builds `rustdoc`, `clippy`, `miri` | `build/HOST/stage2-tools/HOST` |
+| copy `rustdoc` | `build/HOST/stage2/bin` |
+
+`--stage=2` stops here.
+
+## Passing stage-specific flags to `rustc`
+
+`x.py` allows you to pass stage-specific flags to `rustc` when bootstrapping.
+The `RUSTFLAGS_BOOTSTRAP` environment variable is passed as `RUSTFLAGS` to the bootstrap stage
+(stage0), and `RUSTFLAGS_NOT_BOOTSTRAP` is passed when building artifacts for later stages.
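+
+Read literally, the rule above picks one of two variables based on the stage.
+The following sketch assumes that "later stages" means every stage other than
+stage0; the function name `rustflags_var` is hypothetical and this is not how
+bootstrap itself is implemented.
+
+```rust
+/// Name of the environment variable whose contents are used as `RUSTFLAGS`
+/// for the given stage (assumption: any stage other than 0 is a "later" stage).
+fn rustflags_var(stage: u32) -> &'static str {
+    if stage == 0 {
+        "RUSTFLAGS_BOOTSTRAP"
+    } else {
+        "RUSTFLAGS_NOT_BOOTSTRAP"
+    }
+}
+
+fn main() {
+    for stage in 0..3 {
+        let var = rustflags_var(stage);
+        // An unset variable is treated as "no extra flags" in this sketch.
+        let flags = std::env::var(var).unwrap_or_default();
+        println!("stage{stage}: {var}={flags:?}");
+    }
+}
+```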
+
+## Environment Variables
+
+During bootstrapping, a number of compiler-internal environment
+variables are used. If you are trying to run an intermediate version of
+`rustc`, you may sometimes need to set some of these environment variables
+manually. Otherwise, you get an error like the following:
+
+```text
+thread 'main' panicked at 'RUSTC_STAGE was not set: NotPresent', library/core/src/result.rs:1165:5
+```
+
+If `./stageN/bin/rustc` gives an error about environment variables, that
+usually means something is quite wrong -- or you're trying to compile e.g.
+`rustc` or `std` or something that depends on environment variables. In
+the unlikely case that you actually need to invoke rustc in such a situation,
+you can find the environment variable values by adding the following flag to
+your `x.py` command: `--on-fail=print-env`.