author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-17 12:02:58 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-17 12:02:58 +0000
commit: 698f8c2f01ea549d77d7dc3338a12e04c11057b9
tree: 173a775858bd501c378080a10dca74132f05bc50 /src/doc/rustc-dev-guide/src/part-5-intro.md
parent: Initial commit.
Adding upstream version 1.64.0+dfsg1. (tag: upstream/1.64.0+dfsg1)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/doc/rustc-dev-guide/src/part-5-intro.md')
-rw-r--r-- src/doc/rustc-dev-guide/src/part-5-intro.md | 54
1 files changed, 54 insertions, 0 deletions
diff --git a/src/doc/rustc-dev-guide/src/part-5-intro.md b/src/doc/rustc-dev-guide/src/part-5-intro.md
new file mode 100644
index 000000000..4b7c25797
--- /dev/null
+++ b/src/doc/rustc-dev-guide/src/part-5-intro.md
@@ -0,0 +1,54 @@
+# From MIR to Binaries
+
+All of the preceding chapters of this guide have one thing in common: we never
+generated any executable machine code at all! With this chapter, all of that
+changes.
+
+So far, we've shown how the compiler can take raw source code in text format
+and transform it into [MIR]. We have also shown how the compiler does various
+analyses on the code to detect things like type or lifetime errors. Now, we
+will finally take the MIR and produce some executable machine code.
+
+[MIR]: ./mir/index.md
+
+> NOTE: This part of a compiler is often called the _backend_. The term is a bit
+> overloaded because in the compiler source, it usually refers to the "codegen
+> backend" (i.e. LLVM or Cranelift). Usually, when you see the word "backend"
+> in this part, we are referring to the "codegen backend".
+
+So what do we need to do?
+
+0. First, we need to collect the set of things to generate code for. In
+ particular, we need to find out which concrete types to substitute for
+ generic ones, since we need to generate code for the concrete types.
+ Generating code for the concrete types (i.e. emitting a copy of the code for
+ each concrete type) is called _monomorphization_, so the process of
+ collecting all the concrete types is called _monomorphization collection_.
+1. Next, we need to actually lower the MIR to a codegen IR
+ (usually LLVM IR) for each concrete type we collected.
+2. Finally, we need to invoke the codegen backend (LLVM or Cranelift), which
+ runs a bunch of optimization passes and generates machine code; the
+ resulting object files are then linked together into an executable binary.
+
+[codegen1]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html
+
+The code for codegen is actually a bit complex due to a few factors:
+
+- Support for multiple codegen backends (LLVM and Cranelift). We try to share as much
+ backend code between them as possible, so a lot of it is generic over the
+ codegen implementation. This means that there are often a lot of layers of
+ abstraction.
+- Codegen happens asynchronously in another thread for performance.
+- The actual codegen is done by a third-party library (either LLVM or Cranelift).
+
+Generally, the [`rustc_codegen_ssa`][ssa] crate contains backend-agnostic code
+(i.e. independent of LLVM or Cranelift), while the [`rustc_codegen_llvm`][llvm]
+crate contains code specific to LLVM codegen.
+
+[ssa]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/index.html
+[llvm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/index.html
+
+At a very high level, the entry point is
+[`rustc_codegen_ssa::base::codegen_crate`][codegen1]. This function starts the
+process discussed in the rest of this chapter.
+