summaryrefslogtreecommitdiffstats
path: root/src/doc/rustc-dev-guide/src/query.md
blob: 3d60059bdbc20d9db3e0d7458fc7b4b54a21fd3c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
# Queries: demand-driven compilation

<!-- toc -->

As described in [the high-level overview of the compiler][hl], the Rust compiler
is still (as of <!-- date-check --> July 2021) transitioning from a
traditional "pass-based" setup to a "demand-driven" system. The compiler query
system is the key to rustc's demand-driven organization.
The idea is pretty simple. Instead of entirely independent passes
(parsing, type-checking, etc.), a set of function-like *queries*
compute information about the input source. For example,
there is a query called `type_of` that, given the [`DefId`] of
some item, will compute the type of that item and return it to you.

[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/def_id/struct.DefId.html
[hl]: ./compiler-src.md

Query execution is *memoized*. The first time you invoke a
query, it will go do the computation, but the next time, the result is
returned from a hashtable. Moreover, query execution fits nicely into
*incremental computation*; the idea is roughly that, when you invoke a
query, the result *may* be returned to you by loading stored data
from disk.[^incr-comp-detail]

Eventually, we want the entire compiler
control-flow to be query driven. There will effectively be one
top-level query (`compile`) that will run compilation on a crate; this
will in turn demand information about that crate, starting from the
*end*.  For example:

- The `compile` query might demand to get a list of codegen-units
  (i.e. modules that need to be compiled by LLVM).
- But computing the list of codegen-units would invoke some subquery
  that returns the list of all modules defined in the Rust source.
- That query in turn would invoke something asking for the HIR.
- This keeps going further and further back until we wind up doing the
  actual parsing.

Although this vision is not fully realized, large sections of the
compiler (for example, generating [MIR](./mir/index.md)) currently work exactly like this.

[^incr-comp-detail]: The ["Incremental Compilation in Detail](queries/incremental-compilation-in-detail.md) chapter gives a more
in-depth description of what queries are and how they work.
If you intend to write a query of your own, this is a good read.

## Invoking queries

Invoking a query is simple. The [`TyCtxt`] ("type context") struct offers a method
for each defined query. For example, to invoke the `type_of`
query, you would just do this:

```rust,ignore
let ty = tcx.type_of(some_def_id);
```

[`TyCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html

## How the compiler executes a query

So you may be wondering what happens when you invoke a query
method. The answer is that, for each query, the compiler maintains a
cache – if your query has already been executed, then, the answer is
simple: we clone the return value out of the cache and return it
(therefore, you should try to ensure that the return types of queries
are cheaply cloneable; insert an `Rc` if necessary).

### Providers

If, however, the query is *not* in the cache, then the compiler will
try to find a suitable **provider**. A provider is a function that has
been defined and linked into the compiler somewhere that contains the
code to compute the result of the query.

**Providers are defined per-crate.** The compiler maintains,
internally, a table of providers for every crate, at least
conceptually. Right now, there are really two sets: the providers for
queries about the **local crate** (that is, the one being compiled)
and providers for queries about **external crates** (that is,
dependencies of the local crate). Note that what determines the crate
that a query is targeting is not the *kind* of query, but the *key*.
For example, when you invoke `tcx.type_of(def_id)`, that could be a
local query or an external query, depending on what crate the `def_id`
is referring to (see the [`self::keys::Key`][Key] trait for more
information on how that works).

Providers always have the same signature:

```rust,ignore
fn provider<'tcx>(
    tcx: TyCtxt<'tcx>,
    key: QUERY_KEY,
) -> QUERY_RESULT {
    ...
}
```

Providers take two arguments: the `tcx` and the query key.
They return the result of the query.

###  How providers are setup

When the tcx is created, it is given the providers by its creator using
the [`Providers`][providers_struct] struct. This struct is generated by
the macros here, but it is basically a big list of function pointers:

[providers_struct]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/query/struct.Providers.html

```rust,ignore
struct Providers {
    type_of: for<'tcx> fn(TyCtxt<'tcx>, DefId) -> Ty<'tcx>,
    ...
}
```

At present, we have one copy of the struct for local crates, and one
for external crates, though the plan is that we may eventually have
one per crate.

These `Providers` structs are ultimately created and populated by
`rustc_driver`, but it does this by distributing the work
throughout the other `rustc_*` crates. This is done by invoking
various [`provide`][provide_fn] functions. These functions tend to look
something like this:

[provide_fn]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/hir/fn.provide.html

```rust,ignore
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        ..*providers
    };
}
```

That is, they take an `&mut Providers` and mutate it in place. Usually
we use the formulation above just because it looks nice, but you could
as well do `providers.type_of = type_of`, which would be equivalent.
(Here, `type_of` would be a top-level function, defined as we saw
before.) So, if we want to add a provider for some other query,
let's call it `fubar`, into the crate above, we might modify the `provide()`
function like so:

```rust,ignore
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        fubar,
        ..*providers
    };
}

fn fubar<'tcx>(tcx: TyCtxt<'tcx>, key: DefId) -> Fubar<'tcx> { ... }
```

N.B. Most of the `rustc_*` crates only provide **local
providers**. Almost all **extern providers** wind up going through the
[`rustc_metadata` crate][rustc_metadata], which loads the information
from the crate metadata. But in some cases there are crates that
provide queries for *both* local and external crates, in which case
they define both a `provide` and a `provide_extern` function, through
[`wasm_import_module_map`][wasm_import_module_map], that `rustc_driver` can invoke.

[rustc_metadata]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/index.html
[wasm_import_module_map]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/back/symbol_export/fn.wasm_import_module_map.html

## Adding a new query

How do you add a new query?
Defining a query takes place in two steps:

1. Specify the query name and its arguments.
2. Supply query providers where needed.

To specify the query name and arguments, you simply add an entry to
the big macro invocation in
[`compiler/rustc_middle/src/query/mod.rs`][query-mod], which looks something like:

[query-mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/query/index.html

```rust,ignore
rustc_queries! {
    Other {
        /// Records the type of every item.
        query type_of(key: DefId) -> Ty<'tcx> {
            cache { key.is_local() }
        }
    }

    ...
}
```

Queries are grouped into categories (`Other`, `Codegen`, `TypeChecking`, etc.).
Each group contains one or more queries.

A query definition has the following form:

```rust,ignore
query type_of(key: DefId) -> Ty<'tcx> { ... }
^^^^^ ^^^^^^^      ^^^^^     ^^^^^^^^   ^^^
|     |            |         |          |
|     |            |         |          query modifiers
|     |            |         result type
|     |            query key type
|     name of query
query keyword
```

Let's go over these elements one by one:

- **Query keyword:** indicates a start of a query definition.
- **Name of query:** the name of the query method
  (`tcx.type_of(..)`). Also used as the name of a struct
  (`ty::queries::type_of`) that will be generated to represent
  this query.
- **Query key type:** the type of the argument to this query.
  This type must implement the [`ty::query::keys::Key`][Key] trait, which
  defines (for example) how to map it to a crate, and so forth.
- **Result type of query:** the type produced by this query. This type
  should (a) not use `RefCell` or other interior mutability and (b) be
  cheaply cloneable. Interning or using `Rc` or `Arc` is recommended for
  non-trivial data types.[^steal]
- **Query modifiers:** various flags and options that customize how the
  query is processed (mostly with respect to [incremental compilation][incrcomp]).

[Key]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_query_impl/keys/trait.Key.html
[incrcomp]: queries/incremental-compilation-in-detail.html#query-modifiers

So, to add a query:

- Add an entry to `rustc_queries!` using the format above.
- Link the provider by modifying the appropriate `provide` method;
  or add a new one if needed and ensure that `rustc_driver` is invoking it.

[^steal]: The one exception to those rules is the `ty::steal::Steal` type,
which is used to cheaply modify MIR in place. See the definition
of `Steal` for more details. New uses of `Steal` should **not** be
added without alerting `@rust-lang/compiler`.

### Query structs and descriptions

For each query, the `rustc_queries` macro will generate a "query struct"
named after the query. This struct is a kind of placeholder
describing the query. Each query struct implements the
[`self::config::QueryConfig`][QueryConfig] trait, which has associated types for the
key/value of that particular query. Basically the code generated looks something
like this:

```rust,ignore
// Dummy struct representing a particular kind of query:
pub struct type_of<'tcx> { data: PhantomData<&'tcx ()> }

impl<'tcx> QueryConfig for type_of<'tcx> {
  type Key = DefId;
  type Value = Ty<'tcx>;

  const NAME: QueryName = QueryName::type_of;
  const CATEGORY: ProfileCategory = ProfileCategory::Other;
}
```

There is an additional trait that you may wish to implement called
[`self::config::QueryDescription`][QueryDescription]. This trait is
used during cycle errors to give a "human readable" name for the query,
so that we can summarize what was happening when the cycle occurred.
Implementing this trait is optional if the query key is `DefId`, but
if you *don't* implement it, you get a pretty generic error ("processing `foo`...").
You can put new impls into the `config` module. They look something like this:

[QueryConfig]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_query_system/query/config/trait.QueryConfig.html
[QueryDescription]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_query_system/query/config/trait.QueryDescription.html

```rust,ignore
impl<'tcx> QueryDescription for queries::type_of<'tcx> {
    fn describe(tcx: TyCtxt, key: DefId) -> String {
        format!("computing the type of `{}`", tcx.def_path_str(key))
    }
}
```

Another option is to add `desc` modifier:

```rust,ignore
rustc_queries! {
    Other {
        /// Records the type of every item.
        query type_of(key: DefId) -> Ty<'tcx> {
            desc { |tcx| "computing the type of `{}`", tcx.def_path_str(key) }
        }
    }
}
```

`rustc_queries` macro will generate an appropriate `impl` automatically.

## External links

Related design ideas, and tracking issues:

- Design document: [On-demand Rustc incremental design doc]
- Tracking Issue: ["Red/Green" dependency tracking in compiler]

More discussion and issues:

- [GitHub issue #42633]
- [Incremental Compilation Beta]
- [Incremental Compilation Announcement]

[On-demand Rustc incremental design doc]: https://github.com/nikomatsakis/rustc-on-demand-incremental-design-doc/blob/master/0000-rustc-on-demand-and-incremental.md
["Red/Green" dependency tracking in compiler]: https://github.com/rust-lang/rust/issues/42293
[GitHub issue #42633]: https://github.com/rust-lang/rust/issues/42633
[Incremental Compilation Beta]: https://internals.rust-lang.org/t/incremental-compilation-beta/4721
[Incremental Compilation Announcement]: https://blog.rust-lang.org/2016/09/08/incremental.html