summaryrefslogtreecommitdiffstats
path: root/src/cmd/compile/README.md
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-16 19:23:18 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-16 19:23:18 +0000
commit43a123c1ae6613b3efeed291fa552ecd909d3acf (patch)
treefd92518b7024bc74031f78a1cf9e454b65e73665 /src/cmd/compile/README.md
parentInitial commit. (diff)
downloadgolang-1.20-upstream.tar.xz
golang-1.20-upstream.zip
Adding upstream version 1.20.14.upstream/1.20.14upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to '')
-rw-r--r--src/cmd/compile/README.md157
1 files changed, 157 insertions, 0 deletions
diff --git a/src/cmd/compile/README.md b/src/cmd/compile/README.md
new file mode 100644
index 0000000..9c4eeeb
--- /dev/null
+++ b/src/cmd/compile/README.md
@@ -0,0 +1,157 @@
+<!---
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+-->
+
+## Introduction to the Go compiler
+
+`cmd/compile` contains the main packages that form the Go compiler. The compiler
+may be logically split in four phases, which we will briefly describe alongside
+the list of packages that contain their code.
+
+You may sometimes hear the terms "front-end" and "back-end" when referring to
+the compiler. Roughly speaking, these translate to the first two and last two
+phases we are going to list here. A third term, "middle-end", often refers to
+much of the work that happens in the second phase.
+
+Note that the `go/*` family of packages, such as `go/parser` and
+`go/types`, are mostly unused by the compiler. Since the compiler was
+initially written in C, the `go/*` packages were developed to enable
+writing tools working with Go code, such as `gofmt` and `vet`.
+However, over time the compiler's internal APIs have slowly evolved to
+be more familiar to users of the `go/*` packages.
+
+It should be clarified that the name "gc" stands for "Go compiler", and has
+little to do with uppercase "GC", which stands for garbage collection.
+
+### 1. Parsing
+
+* `cmd/compile/internal/syntax` (lexer, parser, syntax tree)
+
+In the first phase of compilation, source code is tokenized (lexical analysis),
+parsed (syntax analysis), and a syntax tree is constructed for each source
+file.
+
+Each syntax tree is an exact representation of the respective source file, with
+nodes corresponding to the various elements of the source such as expressions,
+declarations, and statements. The syntax tree also includes position information
+which is used for error reporting and the creation of debugging information.
+
+### 2. Type checking
+
+* `cmd/compile/internal/types2` (type checking)
+
+The types2 package is a port of `go/types` to use the syntax package's
+AST instead of `go/ast`.
+
+### 3. IR construction ("noding")
+
+* `cmd/compile/internal/types` (compiler types)
+* `cmd/compile/internal/ir` (compiler AST)
+* `cmd/compile/internal/typecheck` (AST transformations)
+* `cmd/compile/internal/noder` (create compiler AST)
+
+The compiler middle end uses its own AST definition and representation of Go
+types carried over from when it was written in C. All of its code is written in
+terms of these, so the next step after type checking is to convert the syntax
+and types2 representations to ir and types. This process is referred to as
+"noding."
+
+There are currently two noding implementations:
+
+1. irgen (aka "-G=3" or sometimes "noder2") is the implementation used starting
+ with Go 1.18, and
+
+2. Unified IR is another, in-development implementation (enabled with
+ `GOEXPERIMENT=unified`), which also implements import/export and inlining.
+
+Up through Go 1.18, there was a third noding implementation (just
+"noder" or "-G=0"), which directly converted the pre-type-checked
+syntax representation into IR and then invoked package typecheck's
+type checker. This implementation was removed after Go 1.18, so now
+package typecheck is only used for IR transformations.
+
+### 4. Middle end
+
+* `cmd/compile/internal/deadcode` (dead code elimination)
+* `cmd/compile/internal/inline` (function call inlining)
+* `cmd/compile/internal/devirtualize` (devirtualization of known interface method calls)
+* `cmd/compile/internal/escape` (escape analysis)
+
+Several optimization passes are performed on the IR representation:
+dead code elimination, (early) devirtualization, function call
+inlining, and escape analysis.
+
+### 5. Walk
+
+* `cmd/compile/internal/walk` (order of evaluation, desugaring)
+
+The final pass over the IR representation is "walk," which serves two purposes:
+
+1. It decomposes complex statements into individual, simpler statements,
+ introducing temporary variables and respecting order of evaluation. This step
+ is also referred to as "order."
+
+2. It desugars higher-level Go constructs into more primitive ones. For example,
+ `switch` statements are turned into binary search or jump tables, and
+ operations on maps and channels are replaced with runtime calls.
+
+### 6. Generic SSA
+
+* `cmd/compile/internal/ssa` (SSA passes and rules)
+* `cmd/compile/internal/ssagen` (converting IR to SSA)
+
+In this phase, IR is converted into Static Single Assignment (SSA) form, a
+lower-level intermediate representation with specific properties that make it
+easier to implement optimizations and to eventually generate machine code from
+it.
+
+During this conversion, function intrinsics are applied. These are special
+functions that the compiler has been taught to replace with heavily optimized
+code on a case-by-case basis.
+
+Certain nodes are also lowered into simpler components during the AST to SSA
+conversion, so that the rest of the compiler can work with them. For instance,
+the copy builtin is replaced by memory moves, and range loops are rewritten into
+for loops. Some of these currently happen before the conversion to SSA due to
+historical reasons, but the long-term plan is to move all of them here.
+
+Then, a series of machine-independent passes and rules are applied. These do not
+concern any single computer architecture, and thus run on all `GOARCH` variants.
+These passes include dead code elimination, removal of
+unneeded nil checks, and removal of unused branches. The generic rewrite rules
+mainly concern expressions, such as replacing some expressions with constant
+values, and optimizing multiplications and float operations.
+
+### 7. Generating machine code
+
+* `cmd/compile/internal/ssa` (SSA lowering and arch-specific passes)
+* `cmd/internal/obj` (machine code generation)
+
+The machine-dependent phase of the compiler begins with the "lower" pass, which
+rewrites generic values into their machine-specific variants. For example, on
+amd64 memory operands are possible, so many load-store operations may be combined.
+
+Note that the lower pass runs all machine-specific rewrite rules, and thus it
+currently applies lots of optimizations too.
+
+Once the SSA has been "lowered" and is more specific to the target architecture,
+the final code optimization passes are run. This includes yet another dead code
+elimination pass, moving values closer to their uses, the removal of local
+variables that are never read from, and register allocation.
+
+Other important pieces of work done as part of this step include stack frame
+layout, which assigns stack offsets to local variables, and pointer liveness
+analysis, which computes which on-stack pointers are live at each GC safe point.
+
+At the end of the SSA generation phase, Go functions have been transformed into
+a series of obj.Prog instructions. These are passed to the assembler
+(`cmd/internal/obj`), which turns them into machine code and writes out the
+final object file. The object file will also contain reflect data, export data,
+and debugging information.
+
+### Further reading
+
+To dig deeper into how the SSA package works, including its passes and rules,
+head to [cmd/compile/internal/ssa/README.md](internal/ssa/README.md).