summaryrefslogtreecommitdiffstats
path: root/src/doc/rustc-dev-guide/src/parallel-rustc.md
blob: 4aa13d781a3069317e01bad6eb1a65dc94a7db9c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
# Parallel Compilation

As of <!-- date: 2022-05 --> May 2022, The only stage of the compiler
that is already parallel is codegen. The nightly compiler implements query evaluation,
but there is still a lot of work to be done. The lack of parallelism at other stages
also represents an opportunity for improving compiler performance. One can try out the current
parallel compiler work by enabling it in the `config.toml`.

These next few sections describe where and how parallelism is currently used,
and the current status of making parallel compilation the default in `rustc`.

The underlying thread-safe data-structures used in the parallel compiler
can be found in the `rustc_data_structures::sync` module. Some of these data structures
use the `parking_lot` crate as well.

## Codegen

There are two underlying thread safe data structures used in code generation:

- `Lrc`
    -  Which is an [`Arc`][Arc] if `parallel_compiler` is true, and a [`Rc`][Rc]
       if it is not.
- `MetadataRef` -> [`OwningRef<Box<dyn Erased + Send + Sync>, [u8]>`][OwningRef]
    - This data structure is specific to `rustc`.

During [monomorphization][monomorphization] the compiler splits up all the code to
be generated into smaller chunks called _codegen units_. These are then generated by
independent instances of LLVM running in parallel. At the end, the linker
is run to combine all the codegen units together into one binary. This process
occurs in the `rustc_codegen_ssa::base` module.

## Query System

The query model has some properties that make it actually feasible to evaluate
multiple queries in parallel without too much of an effort:

- All data a query provider can access is accessed via the query context, so
  the query context can take care of synchronizing access.
- Query results are required to be immutable so they can safely be used by
  different threads concurrently.

When a query `foo` is evaluated, the cache table for `foo` is locked.

- If there already is a result, we can clone it, release the lock and
  we are done.
- If there is no cache entry and no other active query invocation computing the
  same result, we mark the key as being "in progress", release the lock and
  start evaluating.
- If there *is* another query invocation for the same key in progress, we
  release the lock, and just block the thread until the other invocation has
  computed the result we are waiting for. This cannot deadlock because, as
  mentioned before, query invocations form a DAG. Some threads will always make
  progress.

## Rustdoc

As of <!-- date: 2022-05--> May 2022, there are still a number of steps
to complete before rustdoc rendering can be made parallel. More details on
this issue can be found [here][parallel-rustdoc].

## Current Status

As of <!-- date: 2022-05 --> May 2022, work on explicitly parallelizing the
compiler has stalled. There is a lot of design and correctness work that needs
to be done.

These are the basic ideas in the effort to make `rustc` parallel:

- There are a lot of loops in the compiler that just iterate over all items in
  a crate. These can possibly be parallelized.
- We can use (a custom fork of) [`rayon`] to run tasks in parallel. The custom
  fork allows the execution of DAGs of tasks, not just trees.
- There are currently a lot of global data structures that need to be made
  thread-safe. A key strategy here has been converting interior-mutable
  data-structures (e.g. `Cell`) into their thread-safe siblings (e.g. `Mutex`).

[`rayon`]: https://crates.io/crates/rayon

As of <!-- date: 2022-05 --> May 2022, much of this effort is on hold due
to lack of manpower. We have a working prototype with promising performance
gains in many cases. However, there are two blockers:

- It's not clear what invariants need to be upheld that might not hold in the
  face of concurrency. An auditing effort was underway, but seems to have
  stalled at some point.

- There is a lot of lock contention, which actually degrades performance as the
  number of threads increases beyond 4.

Here are some resources that can be used to learn more (note that some of them
are a bit out of date):

- [This IRLO thread by Zoxc, one of the pioneers of the effort][irlo0]
- [This list of interior mutability in the compiler by nikomatsakis][imlist]
- [This IRLO thread by alexchricton about performance][irlo1]
- [This tracking issue][tracking]

[irlo0]: https://internals.rust-lang.org/t/parallelizing-rustc-using-rayon/6606
[imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md
[irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503
[tracking]: https://github.com/rust-lang/rust/issues/48685
[monomorphization]: backend/monomorph.md
[parallel-rustdoc]: https://github.com/rust-lang/rust/issues/82741
[Arc]: https://doc.rust-lang.org/std/sync/struct.Arc.html
[Rc]: https://doc.rust-lang.org/std/rc/struct.Rc.html
[OwningRef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/owning_ref/index.html