
# Backend Agnostic Codegen

<!-- toc -->

As of <!-- date-check --> Aug 2022, `rustc_codegen_ssa` provides an
abstract interface for all backends to implement, so that codegen backends
other than LLVM (e.g. [Cranelift]) can be plugged in.

[Cranelift]: https://github.com/bytecodealliance/wasmtime/tree/HEAD/cranelift

# Refactoring of `rustc_codegen_llvm`
by Denis Merigoux, October 23rd 2018

## State of the code before the refactoring

All the code related to the compilation of MIR into LLVM IR was contained
inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most
important elements:
* the `back` folder (7,800 LOC) implements the mechanisms for creating the
  different object files and archives through LLVM, but also the communication
  mechanisms for parallel code generation;
* the `debuginfo` folder (3,200 LOC) contains all code that passes debug
  information down to LLVM;
* the `llvm` folder (2,200 LOC) defines the FFI necessary to communicate with
  LLVM using the C++ API;
* the `mir` folder (4,300 LOC) implements the actual lowering from MIR to LLVM
  IR;
* the `base.rs` file (1,300 LOC) contains some helper functions but also the
  high-level code that launches the code generation and distributes the work;
* the `builder.rs` file (1,200 LOC) contains all the functions generating
  individual LLVM IR instructions inside a basic block;
* the `common.rs` file (450 LOC) contains various helper functions and all the
  functions generating LLVM static values;
* the `type_.rs` file (300 LOC) defines most of the type translations to LLVM IR.

The goal of this refactoring is to separate, inside this crate, the code that is
specific to LLVM from the code that can be reused by other rustc backends. For
instance, the `mir` folder is almost entirely backend-agnostic, but it relies
heavily on other, LLVM-specific parts of the crate. The separation of the code
must affect neither the logic of the code nor its performance.

For these reasons, the separation process involves two transformations that
have to be done at the same time for the resulting code to compile:

1. replace all the LLVM-specific types by generics inside function signatures
   and structure definitions;
2. encapsulate all functions calling the LLVM FFI inside a set of traits that
   will define the interface between backend-agnostic code and the backend.
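The two transformations can be sketched on a toy example. Every name below (`llvm_build_br`, `LlvmBuilder`, `BranchMethods`, `lower_goto`) is a hypothetical stand-in rather than a rustc or LLVM API, and the "FFI" call is simulated by an ordinary Rust function so that the sketch compiles on its own.

```rust
// Hypothetical stand-in for an LLVM C API call reached through FFI; here it
// just records the textual instruction it would have built.
fn llvm_build_br(log: &mut Vec<String>, target: &str) {
    log.push(format!("br label %{target}"));
}

// LLVM-specific builder type.
struct LlvmBuilder {
    log: Vec<String>,
}

// Before: the lowering code names the LLVM type and calls the FFI directly.
fn lower_goto_before(bx: &mut LlvmBuilder, target: &str) {
    llvm_build_br(&mut bx.log, target);
}

// After, transformation 2: the FFI call is encapsulated in a trait method
// that is part of the backend interface.
trait BranchMethods {
    type BasicBlock;
    fn br(&mut self, dest: Self::BasicBlock);
}

impl BranchMethods for LlvmBuilder {
    type BasicBlock = &'static str;
    fn br(&mut self, dest: &'static str) {
        llvm_build_br(&mut self.log, dest);
    }
}

// After, transformation 1: the LLVM-specific types in the signature are
// replaced by a generic parameter and its associated type.
fn lower_goto<Bx: BranchMethods>(bx: &mut Bx, target: Bx::BasicBlock) {
    bx.br(target);
}

fn main() {
    let mut before = LlvmBuilder { log: Vec::new() };
    lower_goto_before(&mut before, "bb1");

    let mut after = LlvmBuilder { log: Vec::new() };
    lower_goto(&mut after, "bb1");

    // Both versions emit the same instruction.
    assert_eq!(before.log, after.log);
    println!("{}", after.log[0]);
}
```

Neither transformation compiles alone in this sketch: a generic `Bx` is useless without a trait to call through, and the trait method is only reachable from agnostic code once the signature is generic.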

While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new
traits and backend-agnostic code will be moved to `rustc_codegen_ssa` (name
suggestion by @eddyb).

## Generic types and structures

@irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a
generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This
work has been extended to all structures inside the `mir` folder and elsewhere,
as well as for LLVM's `BasicBlock` and `Type` types.
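As a hedged illustration of what parametrizing by `Value` means, the sketch below defines a hypothetical `OperandSketch<V>` that only stores values of some backend value type `V`. An LLVM-like backend instantiates `V` with a reference `&'ll LlvmValue` to an opaque object owned elsewhere, while a hypothetical other backend uses a plain numeric handle. `LlvmValue`, `LlvmContext` and `ValueId` are stand-ins, not the real rustc definitions.

```rust
// Opaque stand-in for LLVM's `Value`: real LLVM values are owned by the
// LLVM context and are only handled by reference.
struct LlvmValue {
    name: String,
}

// Stand-in for the LLVM context that owns every value for the lifetime 'll.
struct LlvmContext {
    values: Vec<LlvmValue>,
}

// Backend-agnostic structure: generic over the backend's value type `V`.
struct OperandSketch<V> {
    val: V,
}

// LLVM-like instantiation: `V = &'ll LlvmValue`.
fn llvm_operand<'ll>(cx: &'ll LlvmContext, idx: usize) -> OperandSketch<&'ll LlvmValue> {
    OperandSketch { val: &cx.values[idx] }
}

// Hypothetical other backend whose values are plain numeric handles.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ValueId(u32);

fn other_operand(id: u32) -> OperandSketch<ValueId> {
    OperandSketch { val: ValueId(id) }
}

fn main() {
    let cx = LlvmContext {
        values: vec![LlvmValue { name: "%x".to_string() }],
    };
    let a = llvm_operand(&cx, 0);
    let b = other_operand(7);
    println!("{} {:?}", a.val.name, b.val);
}
```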

The two most important structures for the LLVM codegen are `CodegenCx` and
`Builder`. They are parametrized by multiple lifetime parameters, one of which
(`'ll`) is the lifetime of the `&'ll Value` references they manipulate.

```rust,ignore
struct CodegenCx<'ll, 'tcx> {
  /* ... */
}

struct Builder<'a, 'll, 'tcx> {
  cx: &'a CodegenCx<'ll, 'tcx>,
  /* ... */
}
```

`CodegenCx` is used to compile one codegen-unit that can contain multiple
functions, whereas `Builder` is created to compile one basic block.

The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime
parameters, which correspond to the following:
* `'tcx` is the longest lifetime, which corresponds to the original `TyCtxt`
  containing the program's information;
* `'a` is the lifetime of short-lived references to a `CodegenCx` or to another
  object inside a struct;
* `'ll` is the lifetime of references to LLVM objects such as `Value` or
  `Type`.
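The nesting of these three lifetimes can be sketched with stand-in types. In this hypothetical model, `TyCtxt<'tcx>` borrows the program data, a vector of `Value`s plays the role of the LLVM objects alive for `'ll`, `CodegenCx<'ll, 'tcx>` holds both, and a short-lived `Builder<'a, 'll, 'tcx>` borrows the context for `'a`. None of these are the real rustc definitions; only their lifetime parameters mirror the description above.

```rust
// Stand-in for the type context: borrows program data for 'tcx.
struct TyCtxt<'tcx> {
    crate_name: &'tcx str,
}

// Stand-in for an LLVM object whose storage lives for 'll.
struct Value {
    name: String,
}

// One per codegen unit: refers both to the type context and to LLVM objects.
struct CodegenCx<'ll, 'tcx> {
    tcx: TyCtxt<'tcx>,
    values: &'ll [Value],
}

// Short-lived builder: borrows the codegen context for 'a.
struct Builder<'a, 'll, 'tcx> {
    cx: &'a CodegenCx<'ll, 'tcx>,
}

impl<'a, 'll, 'tcx> Builder<'a, 'll, 'tcx> {
    // The returned reference carries 'll, not 'a: it may outlive the
    // builder, but not the storage of the LLVM objects.
    fn value(&self, idx: usize) -> &'ll Value {
        &self.cx.values[idx]
    }
}

fn main() {
    let program = String::from("my_crate"); // data borrowed for 'tcx
    let llvm_values = vec![Value { name: "%0".to_string() }]; // storage for 'll
    let cx = CodegenCx {
        tcx: TyCtxt { crate_name: &program },
        values: &llvm_values,
    };

    let v = {
        let bx = Builder { cx: &cx }; // 'a ends with this block
        bx.value(0)
    };
    println!("{} {}", cx.tcx.crate_name, v.name);
}
```

Keeping `'a` separate from `'ll` is what lets a value obtained through a builder stay usable after that builder is gone.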

Although there are already many lifetime parameters in the code, making it
generic uncovered situations where the borrow checker accepted the code only
because of the special nature of the LLVM objects being manipulated (they are
extern pointers). For instance, an additional lifetime parameter had to be added
to `LocalAnalyzer` in `analyze.rs`, leading to the definition:

```rust,ignore
struct LocalAnalyzer<'mir, 'a, 'tcx> {
  /* ... */
}
```

However, the two most important structures `CodegenCx` and `Builder` are not
defined in the backend-agnostic code. Indeed, their content is highly specific
to the backend, and it makes more sense to leave their definition to the backend
implementor than to offer only a narrow extension point, such as a generic field
holding the backend's context.

## Traits and interface

Because they have to be defined by the backend, `CodegenCx` and `Builder` will
be the structures implementing all the traits defining the backend's interface.
These traits are defined in the folder `rustc_codegen_ssa/traits` and all the
backend-agnostic code is parametrized by them. For instance, let us explain how
a function in `base.rs` is parametrized:

```rust,ignore
pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    instance: Instance<'tcx>
) {
    /* ... */
}
```

In this signature, we have the two lifetime parameters explained earlier and
the master type `Bx`, which satisfies the trait `BuilderMethods` corresponding
to the interface satisfied by the `Builder` struct. The `BuilderMethods` trait
defines an associated type `Bx::CodegenCx` that itself satisfies the
`CodegenMethods` trait implemented by the struct `CodegenCx`.
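The pattern of a builder trait exposing its context type through an associated type can be reproduced in a few lines. The sketch below keeps only that shape: `BuilderMethods` has an associated type `CodegenCx` bounded by `CodegenMethods`, and a generic `codegen_instance` reaches the context through `Bx::CodegenCx`. The `declare` method and the `ToyCx`/`ToyBuilder` backend are hypothetical, and the lifetime parameters are dropped for brevity.

```rust
// Interface implemented by a backend's codegen context.
trait CodegenMethods {
    fn declare(&self, name: &str) -> String;
}

// Interface implemented by a backend's builder; the backend chooses its
// context type through the associated type.
trait BuilderMethods {
    type CodegenCx: CodegenMethods;
}

// Backend-agnostic code: only `Bx` is a parameter, and the context type
// is reached as `Bx::CodegenCx`.
fn codegen_instance<Bx: BuilderMethods>(cx: &Bx::CodegenCx, instance: &str) -> String {
    cx.declare(instance)
}

// A toy backend.
struct ToyCx;
struct ToyBuilder;

impl CodegenMethods for ToyCx {
    fn declare(&self, name: &str) -> String {
        format!("declare void @{name}()")
    }
}

impl BuilderMethods for ToyBuilder {
    type CodegenCx = ToyCx;
}

fn main() {
    // `Bx` cannot be inferred from `Bx::CodegenCx`, so it is named explicitly.
    let out = codegen_instance::<ToyBuilder>(&ToyCx, "main");
    println!("{out}");
}
```

Because `Bx` appears in the signature only through the projection `Bx::CodegenCx`, callers have to name the builder type with a turbofish.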

On the trait side, here is an example with part of the definition of
`BuilderMethods` in `traits/builder.rs`:

```rust,ignore
pub trait BuilderMethods<'a, 'tcx>:
    HasCodegen<'tcx>
    + DebugInfoBuilderMethods<'tcx>
    + ArgTypeMethods<'tcx>
    + AbiBuilderMethods<'tcx>
    + IntrinsicCallMethods<'tcx>
    + AsmBuilderMethods<'tcx>
{
    fn new_block<'b>(
        cx: &'a Self::CodegenCx,
        llfn: Self::Function,
        name: &'b str
    ) -> Self;
    /* ... */
    fn cond_br(
        &mut self,
        cond: Self::Value,
        then_llbb: Self::BasicBlock,
        else_llbb: Self::BasicBlock,
    );
    /* ... */
}
```
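To show how a backend might fill in methods such as `new_block` and `cond_br`, here is a cut-down, hypothetical mirror of the trait together with a toy backend that prints textual IR. It drops the supertraits and `'tcx`, adds an assumed `llbb` accessor for the builder's current block, and uses illustrative names (`ToyCx`, `ToyBuilder`, `codegen_if`) that are not part of rustc.

```rust
use std::cell::RefCell;

// Cut-down, hypothetical mirror of `BuilderMethods`.
trait BuilderMethods<'a>: Sized {
    type CodegenCx: 'a;
    type Function: Copy;
    type Value: Copy;
    type BasicBlock: Copy;

    fn new_block(cx: &'a Self::CodegenCx, llfn: Self::Function, name: &str) -> Self;
    fn llbb(&self) -> Self::BasicBlock;
    fn cond_br(&mut self, cond: Self::Value, then_llbb: Self::BasicBlock, else_llbb: Self::BasicBlock);
}

// Toy context: remembers block names and collects emitted text.
struct ToyCx {
    blocks: RefCell<Vec<String>>,
    out: RefCell<Vec<String>>,
}

// Toy builder: positioned in one basic block of the context.
struct ToyBuilder<'a> {
    cx: &'a ToyCx,
    bb: usize,
}

impl<'a> BuilderMethods<'a> for ToyBuilder<'a> {
    type CodegenCx = ToyCx;
    type Function = &'static str;
    type Value = u32; // virtual register number
    type BasicBlock = usize; // index into `ToyCx::blocks`

    fn new_block(cx: &'a ToyCx, llfn: &'static str, name: &str) -> Self {
        let mut blocks = cx.blocks.borrow_mut();
        blocks.push(format!("{llfn}.{name}"));
        let bb = blocks.len() - 1;
        ToyBuilder { cx, bb }
    }

    fn llbb(&self) -> usize {
        self.bb
    }

    fn cond_br(&mut self, cond: u32, then_llbb: usize, else_llbb: usize) {
        let blocks = self.cx.blocks.borrow();
        self.cx.out.borrow_mut().push(format!(
            "  br i1 %{}, label %{}, label %{}",
            cond, blocks[then_llbb], blocks[else_llbb]
        ));
    }
}

// Backend-agnostic code: emits a two-way branch for any backend.
fn codegen_if<'a, Bx: BuilderMethods<'a>>(
    cx: &'a Bx::CodegenCx,
    llfn: Bx::Function,
    cond: Bx::Value,
) {
    let mut entry = Bx::new_block(cx, llfn, "entry");
    let then_bx = Bx::new_block(cx, llfn, "then");
    let else_bx = Bx::new_block(cx, llfn, "else");
    entry.cond_br(cond, then_bx.llbb(), else_bx.llbb());
}

fn main() {
    let cx = ToyCx {
        blocks: RefCell::new(Vec::new()),
        out: RefCell::new(Vec::new()),
    };
    codegen_if::<ToyBuilder>(&cx, "f", 0);
    for line in cx.out.borrow().iter() {
        println!("{line}");
    }
}
```

With this toy backend the program prints `  br i1 %0, label %f.then, label %f.else`; `codegen_if` itself never mentions `ToyCx` or `ToyBuilder`.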

Finally, a master structure implementing the `ExtraBackendMethods` trait is
used for high-level codegen-driving functions like `codegen_crate` in
`base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`.
`ExtraBackendMethods` should be implemented by the same structure that
implements the `CodegenBackend` trait defined in
`rustc_codegen_utils/codegen_backend.rs`.
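The role of such an empty master structure can be sketched as follows: a zero-sized type implements a driver trait, and a generic, backend-agnostic `codegen_crate` is instantiated with it. Here `ExtraBackendMethods` is a hypothetical single-method stand-in (the real trait's methods differ), and `ToyBackend` and `compile_module` are illustrative names.

```rust
// Hypothetical, single-method stand-in for the driver trait.
trait ExtraBackendMethods {
    type Module;
    fn compile_module(&self, cgu_name: &str) -> Self::Module;
}

// Like `LlvmCodegenBackend`, the toy backend carries no data: it only
// selects the backend's implementations at compile time.
struct ToyBackend;

impl ExtraBackendMethods for ToyBackend {
    type Module = String;
    fn compile_module(&self, cgu_name: &str) -> String {
        format!("object file for {cgu_name}")
    }
}

// Backend-agnostic driver: iterates over the codegen units.
fn codegen_crate<B: ExtraBackendMethods>(backend: B, cgus: &[&str]) -> Vec<B::Module> {
    cgus.iter().map(|name| backend.compile_module(name)).collect()
}

fn main() {
    assert_eq!(std::mem::size_of::<ToyBackend>(), 0);
    for m in codegen_crate(ToyBackend, &["cgu0", "cgu1"]) {
        println!("{m}");
    }
}
```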

During the traitification process, certain functions have been converted from
methods of a local structure to methods of `CodegenCx` or `Builder`, and a
corresponding `self` parameter has been added. Indeed, LLVM stores information
internally that it can access when called through its API. This information
does not show up in any Rust data structure carried around when these methods
are called. However, a backend implemented in Rust will need information from
`CodegenCx` in these methods, hence the additional parameter (unused in the
LLVM implementation of the trait).
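The effect of the added `self` parameter can be illustrated with a hypothetical `type_int` method; the trait, the method and both backends below are illustrative only. The LLVM-like implementation ignores `self`, since the real LLVM backend would get its answer from state kept inside LLVM (simulated here by building the type name directly), whereas the hypothetical pure-Rust backend looks the type up in a table stored in its own context.

```rust
use std::collections::HashMap;

// Hypothetical trait method that gained a `self` parameter.
trait TypeMethods {
    type Type;
    fn type_int(&self, bits: u32) -> Self::Type;
}

// LLVM-like backend: `self` is unused because the state lives "inside LLVM".
struct LlvmLikeCx;

impl TypeMethods for LlvmLikeCx {
    type Type = String;
    fn type_int(&self, bits: u32) -> String {
        format!("i{bits}")
    }
}

// Hypothetical pure-Rust backend: integer types are interned in its context,
// so the method needs `self` to find them.
struct RustBackendCx {
    interned: HashMap<u32, usize>,
}

impl TypeMethods for RustBackendCx {
    type Type = usize; // index into the backend's type table
    fn type_int(&self, bits: u32) -> usize {
        // Panics if the type was never interned; fine for a sketch.
        self.interned[&bits]
    }
}

// Backend-agnostic caller: identical for both backends.
fn i32_type<Cx: TypeMethods>(cx: &Cx) -> Cx::Type {
    cx.type_int(32)
}

fn main() {
    let mut interned = HashMap::new();
    interned.insert(32, 0);
    let other = RustBackendCx { interned };
    println!("{} {}", i32_type(&LlvmLikeCx), i32_type(&other));
}
```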

## State of the code after the refactoring

The traits offer an API that is very similar to the API of LLVM. This is not
the best solution, since LLVM has a very particular way of doing things: when
another backend is added, the trait definitions might need to change in order
to offer more flexibility.

However, the current separation between backend-agnostic and LLVM-specific code
has allowed the reuse of a significant part of the old `rustc_codegen_llvm`.
Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the
most important elements:

* `back` folder: 3,800 (BA) vs 4,100 (LLVM);
* `mir` folder: 4,400 (BA) vs 0 (LLVM);
* `base.rs`: 1,100 (BA) vs 250 (LLVM);
* `builder.rs`: 1,400 (BA) vs 0 (LLVM);
* `common.rs`: 350 (BA) vs 350 (LLVM).

The `debuginfo` folder has been left almost untouched by the splitting and is
specific to LLVM. Only its high-level features have been traitified.

The new `traits` folder has 1,500 LOC for trait definitions alone. Overall, the
old 27,000 LOC `rustc_codegen_llvm` has been split into the new 18,500 LOC
`rustc_codegen_llvm` and the 12,000 LOC `rustc_codegen_ssa`. We can say that
this refactoring allowed the reuse of approximately 10,000 LOC that would
otherwise have had to be duplicated between the multiple backends of `rustc`.

The refactored version of `rustc`'s backend introduced no regression in the
test suite or in performance benchmarks, which is consistent with the nature
of the refactoring: it used only compile-time parametricity (no trait
objects).
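To illustrate why compile-time parametricity is expected to cost nothing at runtime, the sketch below contrasts a generic function, which the compiler monomorphizes into one copy per backend type with statically dispatched calls, with a `dyn` version whose calls go through the trait object's vtable. The `Backend` trait, its `add` method (a stand-in for a builder method) and the `Llvm` type are hypothetical.

```rust
trait Backend {
    // Stand-in for a builder method such as emitting an add instruction.
    fn add(&self, a: i64, b: i64) -> i64;
}

struct Llvm;

impl Backend for Llvm {
    fn add(&self, a: i64, b: i64) -> i64 {
        a + b
    }
}

// Static dispatch: a separate copy is generated for each `B`, so the call
// to `add` is direct and can be inlined.
fn lower_static<B: Backend>(bx: &B, n: i64) -> i64 {
    (0..n).fold(0, |acc, i| bx.add(acc, i))
}

// Dynamic dispatch: one copy serves every backend, and each call goes
// through a vtable (the style the refactoring avoided).
fn lower_dyn(bx: &dyn Backend, n: i64) -> i64 {
    (0..n).fold(0, |acc, i| bx.add(acc, i))
}

fn main() {
    println!("{} {}", lower_static(&Llvm, 10), lower_dyn(&Llvm, 10));
}
```

Both functions compute the same result; the difference lies only in how the calls are dispatched, which is why a purely generic refactoring can leave performance unchanged.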