summaryrefslogtreecommitdiffstats
path: root/src/doc/rustc-dev-guide/src/backend/libs-and-metadata.md
blob: 5e005c965ede2f4db025a0cadb02e88220c46b7c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
# Libraries and Metadata

When the compiler sees a reference to an external crate, it needs to load some
information about that crate. This chapter gives an overview of that process,
and the supported file formats for crate libraries.

## Libraries

A crate dependency can be loaded from an `rlib`, `dylib`, or `rmeta` file. A
key point of these file formats is that they contain `rustc`-specific
[*metadata*](#metadata). This metadata allows the compiler to discover enough
information about the external crate to understand the items it contains,
which macros it exports, and *much* more.

### rlib

An `rlib` is an [archive file], which is similar to a tar file. This file
format is specific to `rustc`, and may change over time. This file contains:

* Object code, which is the result of code generation. This is used during
  regular linking. There is a separate `.o` file for each [codegen unit]. The
  codegen step can be skipped with the [`-C
  linker-plugin-lto`][linker-plugin-lto] CLI option, which means each `.o`
  file will only contain LLVM bitcode.
* [LLVM bitcode], which is a binary representation of LLVM's intermediate
  representation, which is embedded as a section in the `.o` files. This can
  be used for [Link Time Optimization] (LTO). This can be removed with the
  [`-C embed-bitcode=no`][embed-bitcode] CLI option to improve compile times
  and reduce disk space if LTO is not needed.
* `rustc` [metadata], in a file named `lib.rmeta`.
* A symbol table, which is generally a list of symbols with offsets to the
  object file that contain that symbol. This is pretty standard for archive
  files.

[archive file]: https://en.wikipedia.org/wiki/Ar_(Unix)
[LLVM bitcode]: https://llvm.org/docs/BitCodeFormat.html
[Link Time Optimization]: https://llvm.org/docs/LinkTimeOptimization.html
[codegen unit]: ../backend/codegen.md
[embed-bitcode]: https://doc.rust-lang.org/rustc/codegen-options/index.html#embed-bitcode
[linker-plugin-lto]: https://doc.rust-lang.org/rustc/codegen-options/index.html#linker-plugin-lto

### dylib

A `dylib` is a platform-specific shared library. It includes the `rustc`
[metadata] in a special link section called `.rustc` in a compressed format.

### rmeta

An `rmeta` file is custom binary format that contains the [metadata] for the
crate. This file can be used for fast "checks" of a project by skipping all
code generation (as is done with `cargo check`), collecting enough information
for documentation (as is done with `cargo doc`), or for
[pipelining](#pipelining). This file is created if the
[`--emit=metadata`][emit] CLI option is used.

`rmeta` files do not support linking, since they do not contain compiled
object files.

[emit]: https://doc.rust-lang.org/rustc/command-line-arguments.html#option-emit

## Metadata

The metadata contains a wide swath of different elements. This guide will not
go into detail of every field it contains. You are encouraged to browse the
[`CrateRoot`] definition to get a sense of the different elements it contains.
Everything about metadata encoding and decoding is in the [`rustc_metadata`]
package.

Here are a few highlights of things it contains:

* The version of the `rustc` compiler. The compiler will refuse to load files
  from any other version.
* The [Strict Version Hash](#strict-version-hash) (SVH). This helps ensure the
  correct dependency is loaded.
* The [Stable Crate Id](#stable-crate-id). This is a hash used
  to identify crates.
* Information about all the source files in the library. This can be used for
  a variety of things, such as diagnostics pointing to sources in a
  dependency.
* Information about exported macros, traits, types, and items. Generally,
  anything that's needed to be known when a path references something inside a
  crate dependency.
* Encoded [MIR]. This is optional, and only encoded if needed for code
  generation. `cargo check` skips this for performance reasons.

[`CrateRoot`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/struct.CrateRoot.html
[`rustc_metadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/index.html
[MIR]: ../mir/index.md

### Strict Version Hash

The Strict Version Hash ([SVH], also known as the "crate hash") is a 64-bit
hash that is used to ensure that the correct crate dependencies are loaded. It
is possible for a directory to contain multiple copies of the same dependency
built with different settings, or built from different sources. The crate
loader will skip any crates that have the wrong SVH.

The SVH is also used for the [incremental compilation] session filename,
though that usage is mostly historic.

The hash includes a variety of elements:

* Hashes of the HIR nodes.
* All of the upstream crate hashes.
* All of the source filenames.
* Hashes of certain command-line flags (like `-C metadata` via the [Crate
  Disambiguator](#crate-disambiguator), and all CLI options marked with
  `[TRACKED]`).

See [`compute_hir_hash`] for where the hash is actually computed.

[SVH]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/svh/struct.Svh.html
[incremental compilation]: ../queries/incremental-compilation.md
[`compute_hir_hash`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast_lowering/struct.LoweringContext.html#method.compute_hir_hash

### Stable Crate Id

The [`StableCrateId`] is a 64-bit hash used to identify different crates with
potentially the same name. It is a hash of the crate name and all the
[`-C metadata`] CLI options computed in [`StableCrateId::new`]. It is
used in a variety of places, such as symbol name mangling, crate loading, and
much more.

By default, all Rust symbols are mangled and incorporate the stable crate id.
This allows multiple versions of the same crate to be included together. Cargo
automatically generates `-C metadata` hashes based on a variety of factors,
like the package version, source, and the target kind (a lib and test can have
the same crate name, so they need to be disambiguated).

[`StableCrateId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/def_id/struct.StableCrateId.html
[`StableCrateId::new`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/def_id/struct.StableCrateId.html#method.new
[`-C metadata`]: https://doc.rust-lang.org/rustc/codegen-options/index.html#metadata

## Crate loading

Crate loading can have quite a few subtle complexities. During [name
resolution], when an external crate is referenced (via an `extern crate` or
path), the resolver uses the [`CrateLoader`] which is responsible for finding
the crate libraries and loading the [metadata] for them. After the dependency
is loaded, the `CrateLoader` will provide the information the resolver needs
to perform its job (such as expanding macros, resolving paths, etc.).

To load each external crate, the `CrateLoader` uses a [`CrateLocator`] to
actually find the correct files for one specific crate. There is some great
documentation in the [`locator`] module that goes into detail on how loading
works, and I strongly suggest reading it to get the full picture.

The location of a dependency can come from several different places. Direct
dependencies are usually passed with `--extern` flags, and the loader can look
at those directly. Direct dependencies often have references to their own
dependencies, which need to be loaded, too. These are usually found by
scanning the directories passed with the `-L` flag for any file whose metadata
contains a matching crate name and [SVH](#strict-version-hash). The loader
will also look at the [sysroot] to find dependencies.

As crates are loaded, they are kept in the [`CStore`] with the crate metadata
wrapped in the [`CrateMetadata`] struct. After resolution and expansion, the
`CStore` will make its way into the [`GlobalCtxt`] for the rest of
compilation.

[name resolution]: ../name-resolution.md
[`CrateLoader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CrateLoader.html
[`CrateLocator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/struct.CrateLocator.html
[`locator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/index.html
[`CStore`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CStore.html
[`CrateMetadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/decoder/struct.CrateMetadata.html
[`GlobalCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.GlobalCtxt.html
[sysroot]: ../building/bootstrapping.md#what-is-a-sysroot

## Pipelining

One trick to improve compile times is to start building a crate as soon as the
metadata for its dependencies is available. For a library, there is no need to
wait for the code generation of dependencies to finish. Cargo implements this
technique by telling `rustc` to emit an [`rmeta`](#rmeta) file for each
dependency as well as an [`rlib`](#rlib). As early as it can, `rustc` will
save the `rmeta` file to disk before it continues to the code generation
phase. The compiler sends a JSON message to let the build tool know that it
can start building the next crate if possible.

The [crate loading](#crate-loading) system is smart enough to know when it
sees an `rmeta` file to use that if the `rlib` is not there (or has only been
partially written).

This pipelining isn't possible for binaries, because the linking phase will
require the code generation of all its dependencies. In the future, it may be
possible to further improve this scenario by splitting linking into a separate
command (see [#64191]).

[#64191]: https://github.com/rust-lang/rust/issues/64191

[metadata]: #metadata