summaryrefslogtreecommitdiffstats
path: root/src/doc/rustc-dev-guide/src/parallel-rustc.md
diff options
context:
space:
mode:
Diffstat (limited to 'src/doc/rustc-dev-guide/src/parallel-rustc.md')
-rw-r--r--src/doc/rustc-dev-guide/src/parallel-rustc.md161
1 files changed, 111 insertions, 50 deletions
diff --git a/src/doc/rustc-dev-guide/src/parallel-rustc.md b/src/doc/rustc-dev-guide/src/parallel-rustc.md
index 4aa13d781..e93f51dbb 100644
--- a/src/doc/rustc-dev-guide/src/parallel-rustc.md
+++ b/src/doc/rustc-dev-guide/src/parallel-rustc.md
@@ -1,34 +1,116 @@
# Parallel Compilation
-As of <!-- date: 2022-05 --> May 2022, The only stage of the compiler
-that is already parallel is codegen. The nightly compiler implements query evaluation,
-but there is still a lot of work to be done. The lack of parallelism at other stages
-also represents an opportunity for improving compiler performance. One can try out the current
-parallel compiler work by enabling it in the `config.toml`.
+As of <!-- date-check --> August 2022, the only stage of the compiler that
+is already parallel is codegen. Some parts of the compiler already have
+parallel implementations, such as query evaluation, type check and
+monomorphization, but the general version of the compiler does not include
+these parallelization functions. **To try out the current parallel compiler**,
+one can install rustc from source code with `parallel-compiler = true` in
+the `config.toml`.
+
+The lack of parallelism at other stages (for example, macro expansion) also
+represents an opportunity for improving compiler performance.
These next few sections describe where and how parallelism is currently used,
and the current status of making parallel compilation the default in `rustc`.
-The underlying thread-safe data-structures used in the parallel compiler
-can be found in the `rustc_data_structures::sync` module. Some of these data structures
-use the `parking_lot` crate as well.
-
## Codegen
-There are two underlying thread safe data structures used in code generation:
-
-- `Lrc`
- - Which is an [`Arc`][Arc] if `parallel_compiler` is true, and a [`Rc`][Rc]
- if it is not.
-- `MetadataRef` -> [`OwningRef<Box<dyn Erased + Send + Sync>, [u8]>`][OwningRef]
- - This data structure is specific to `rustc`.
-
During [monomorphization][monomorphization] the compiler splits up all the code to
be generated into smaller chunks called _codegen units_. These are then generated by
independent instances of LLVM running in parallel. At the end, the linker
is run to combine all the codegen units together into one binary. This process
occurs in the `rustc_codegen_ssa::base` module.
+## Data Structures
+
+The underlying thread-safe data-structures used in the parallel compiler
+can be found in the `rustc_data_structures::sync` module. These data structures
+are implemented diferently depending on whether `parallel-compiler` is true.
+
+| data structure | parallel | non-parallel |
+| -------------------------------- | --------------------------------------------------- | ------------ |
+| Lrc | std::sync::Arc | std::rc::Rc |
+| Weak | std::sync::Weak | std::rc::Weak |
+| Atomic{Bool}/{Usize}/{U32}/{U64} | std::sync::atomic::Atomic{Bool}/{Usize}/{U32}/{U64} | (std::cell::Cell<bool/usize/u32/u64>) |
+| OnceCell | std::sync::OnceLock | std::cell::OnceCell |
+| Lock\<T> | (parking_lot::Mutex\<T>) | (std::cell::RefCell) |
+| RwLock\<T> | (parking_lot::RwLock\<T>) | (std::cell::RefCell) |
+| MTRef<'a, T> | &'a T | &'a mut T |
+| MTLock\<T> | (Lock\<T>) | (T) |
+| ReadGuard | parking_lot::RwLockReadGuard | std::cell::Ref |
+| MappedReadGuard | parking_lot::MappedRwLockReadGuard | std::cell::Ref |
+| WriteGuard | parking_lot::RwLockWriteGuard | std::cell::RefMut |
+| MappedWriteGuard | parking_lot::MappedRwLockWriteGuard | std::cell::RefMut |
+| LockGuard | parking_lot::MutexGuard | std::cell::RefMut |
+| MappedLockGuard | parking_lot::MappedMutexGuard | std::cell::RefMut |
+| MetadataRef | [`OwningRef<Box<dyn Erased + Send + Sync>, [u8]>`][OwningRef] | [`OwningRef<Box<dyn Erased>, [u8]>`][OwningRef] |
+
+- These thread-safe data structures interspersed during compilation can
+ cause a lot of lock contention, which actually degrades performance as the
+ number of threads increases beyond 4. This inspires us to audit the use
+ of these data structures, leading to either refactoring to reduce use of
+ shared state, or persistent documentation covering invariants, atomicity,
+ and lock orderings.
+
+- On the other hand, we still need to figure out what other invariants
+ during compilation might not hold in parallel compilation.
+
+### WorkLocal
+
+`WorkLocal` is a special data structure implemented for parallel compiler.
+It holds worker-locals values for each thread in a thread pool. You can only
+access the worker local value through the Deref impl on the thread pool it
+was constructed on. It will panic otherwise.
+
+`WorkLocal` is used to implement the `Arena` allocator in the parallel
+environment, which is critical in parallel queries. Its implementation
+is located in the `rustc-rayon-core::worker_local` module. However, in the
+non-parallel compiler, it is implemented as `(OneThread<T>)`, whose `T`
+can be accessed directly through `Deref::deref`.
+
+## Parallel Iterator
+
+The parallel iterators provided by the [`rayon`] crate are easy ways
+to implement parallelism. In the current implementation of the parallel
+compiler we use a custom [fork][rustc-rayon] of [`rayon`] to run tasks in parallel.
+
+Some iterator functions are implemented to run loops in parallel
+when `parallel-compiler` is true.
+
+| Function(Omit `Send` and `Sync`) | Introduction | Owning Module |
+| ------------------------------------------------------------ | ------------------------------------------------------------ | -------------------------- |
+| **par_iter**<T: IntoParallelIterator>(t: T) -> T::Iter | generate a parallel iterator | rustc_data_structure::sync |
+| **par_for_each_in**<T: IntoParallelIterator>(t: T, for_each: impl Fn(T::Item)) | generate a parallel iterator and run `for_each` on each element | rustc_data_structure::sync |
+| **Map::par_body_owners**(self, f: impl Fn(LocalDefId)) | run `f` on all hir owners in the crate | rustc_middle::hir::map |
+| **Map::par_for_each_module**(self, f: impl Fn(LocalDefId)) | run `f` on all modules and sub modules in the crate | rustc_middle::hir::map |
+| **ModuleItems::par_items**(&self, f: impl Fn(ItemId)) | run `f` on all items in the module | rustc_middle::hir |
+| **ModuleItems::par_trait_items**(&self, f: impl Fn(TraitItemId)) | run `f` on all trait items in the module | rustc_middle::hir |
+| **ModuleItems::par_impl_items**(&self, f: impl Fn(ImplItemId)) | run `f` on all impl items in the module | rustc_middle::hir |
+| **ModuleItems::par_foreign_items**(&self, f: impl Fn(ForeignItemId)) | run `f` on all foreign items in the module | rustc_middle::hir |
+
+There are a lot of loops in the compiler which can possibly be
+parallelized using these functions. As of <!-- date-check--> August
+2022, scenarios where the parallel iterator function has been used
+are as follows:
+
+| caller | scenario | callee |
+| ------------------------------------------------------- | ------------------------------------------------------------ | ------------------------ |
+| rustc_metadata::rmeta::encoder::prefetch_mir | Prefetch queries which will be needed later by metadata encoding | par_iter |
+| rustc_monomorphize::collector::collect_crate_mono_items | Collect monomorphized items reachable from non-generic items | par_for_each_in |
+| rustc_interface::passes::analysis | Check the validity of the match statements | Map::par_body_owners |
+| rustc_interface::passes::analysis | MIR borrow check | Map::par_body_owners |
+| rustc_typeck::check::typeck_item_bodies | Type check | Map::par_body_owners |
+| rustc_interface::passes::hir_id_validator::check_crate | Check the validity of hir | Map::par_for_each_module |
+| rustc_interface::passes::analysis | Check the validity of loops body, attributes, naked functions, unstable abi, const bodys | Map::par_for_each_module |
+| rustc_interface::passes::analysis | Liveness and intrinsic checking of MIR | Map::par_for_each_module |
+| rustc_interface::passes::analysis | Deathness checking | Map::par_for_each_module |
+| rustc_interface::passes::analysis | Privacy checking | Map::par_for_each_module |
+| rustc_lint::late::check_crate | Run per-module lints | Map::par_for_each_module |
+| rustc_typeck::check_crate | Well-formedness checking | Map::par_for_each_module |
+
+There are still many loops that have the potential to use parallel iterators.
+
## Query System
The query model has some properties that make it actually feasible to evaluate
@@ -48,44 +130,22 @@ When a query `foo` is evaluated, the cache table for `foo` is locked.
start evaluating.
- If there *is* another query invocation for the same key in progress, we
release the lock, and just block the thread until the other invocation has
- computed the result we are waiting for. This cannot deadlock because, as
- mentioned before, query invocations form a DAG. Some threads will always make
- progress.
+ computed the result we are waiting for. **Cycle error detection** in the parallel
+ compiler requires more complex logic than in single-threaded mode. When
+ worker threads in parallel queries stop making progress due to interdependence,
+ the compiler uses an extra thread *(named deadlock handler)* to detect, remove and
+ report the cycle error.
+
+Parallel query still has a lot of work to do, most of which is related to
+the previous `Data Structures` and `Parallel Iterators`. See [this tracking issue][tracking].
## Rustdoc
-As of <!-- date: 2022-05--> May 2022, there are still a number of steps
+As of <!-- date-check--> May 2022, there are still a number of steps
to complete before rustdoc rendering can be made parallel. More details on
this issue can be found [here][parallel-rustdoc].
-## Current Status
-
-As of <!-- date: 2022-05 --> May 2022, work on explicitly parallelizing the
-compiler has stalled. There is a lot of design and correctness work that needs
-to be done.
-
-These are the basic ideas in the effort to make `rustc` parallel:
-
-- There are a lot of loops in the compiler that just iterate over all items in
- a crate. These can possibly be parallelized.
-- We can use (a custom fork of) [`rayon`] to run tasks in parallel. The custom
- fork allows the execution of DAGs of tasks, not just trees.
-- There are currently a lot of global data structures that need to be made
- thread-safe. A key strategy here has been converting interior-mutable
- data-structures (e.g. `Cell`) into their thread-safe siblings (e.g. `Mutex`).
-
-[`rayon`]: https://crates.io/crates/rayon
-
-As of <!-- date: 2022-05 --> May 2022, much of this effort is on hold due
-to lack of manpower. We have a working prototype with promising performance
-gains in many cases. However, there are two blockers:
-
-- It's not clear what invariants need to be upheld that might not hold in the
- face of concurrency. An auditing effort was underway, but seems to have
- stalled at some point.
-
-- There is a lot of lock contention, which actually degrades performance as the
- number of threads increases beyond 4.
+## Resources
Here are some resources that can be used to learn more (note that some of them
are a bit out of date):
@@ -93,8 +153,9 @@ are a bit out of date):
- [This IRLO thread by Zoxc, one of the pioneers of the effort][irlo0]
- [This list of interior mutability in the compiler by nikomatsakis][imlist]
- [This IRLO thread by alexchricton about performance][irlo1]
-- [This tracking issue][tracking]
+[`rayon`]: https://crates.io/crates/rayon
+[rustc-rayon]: https://github.com/rust-lang/rustc-rayon
[irlo0]: https://internals.rust-lang.org/t/parallelizing-rustc-using-rayon/6606
[imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md
[irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503