Diffstat (limited to 'src/cmd/compile/README.md')
-rw-r--r-- src/cmd/compile/README.md 316
1 files changed, 316 insertions, 0 deletions
diff --git a/src/cmd/compile/README.md b/src/cmd/compile/README.md
new file mode 100644
index 0000000..9b99a1b
--- /dev/null
+++ b/src/cmd/compile/README.md
@@ -0,0 +1,316 @@
+<!---
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+-->
+
+## Introduction to the Go compiler
+
+`cmd/compile` contains the main packages that form the Go compiler. The compiler
+may be logically split into several phases, which we will briefly describe
+alongside the list of packages that contain their code.
+
+You may sometimes hear the terms "front-end" and "back-end" when referring to
+the compiler. Roughly speaking, these translate to the first two and last two
+phases we are going to list here. A third term, "middle-end", often refers to
+much of the work that happens in between.
+
+Note that the `go/*` family of packages, such as `go/parser` and
+`go/types`, are mostly unused by the compiler. Since the compiler was
+initially written in C, the `go/*` packages were developed to enable
+writing tools working with Go code, such as `gofmt` and `vet`.
+However, over time the compiler's internal APIs have slowly evolved to
+be more familiar to users of the `go/*` packages.
+
+It should be clarified that the name "gc" stands for "Go compiler", and has
+little to do with uppercase "GC", which stands for garbage collection.
+
+### 1. Parsing
+
+* `cmd/compile/internal/syntax` (lexer, parser, syntax tree)
+
+In the first phase of compilation, source code is tokenized (lexical analysis),
+parsed (syntax analysis), and a syntax tree is constructed for each source
+file.
+
+Each syntax tree is an exact representation of the respective source file, with
+nodes corresponding to the various elements of the source such as expressions,
+declarations, and statements. The syntax tree also includes position information
+which is used for error reporting and the creation of debugging information.
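+
+As a rough illustration of what a syntax tree with position information looks
+like, the sketch below uses the standard library's `go/parser` and `go/ast`
+rather than `cmd/compile/internal/syntax`, which is internal to the compiler and
+cannot be imported from outside it. The two APIs differ in their details; the
+file name `example.go`, the source string, and the helper `funcPositions` are
+made up for this example.
+
+```go
+package main
+
+import (
+	"fmt"
+	"go/ast"
+	"go/parser"
+	"go/token"
+)
+
+// src is a made-up source file used only for this illustration.
+const src = `package p
+
+func Add(a, b int) int { return a + b }
+`
+
+// funcPositions parses src and returns "name@line" for each function
+// declaration found in the resulting syntax tree.
+func funcPositions(src string) ([]string, error) {
+	fset := token.NewFileSet() // holds position information for all parsed files
+	file, err := parser.ParseFile(fset, "example.go", src, 0)
+	if err != nil {
+		return nil, err
+	}
+	var out []string
+	ast.Inspect(file, func(n ast.Node) bool {
+		if fd, ok := n.(*ast.FuncDecl); ok {
+			pos := fset.Position(fd.Pos())
+			out = append(out, fmt.Sprintf("%s@%d", fd.Name.Name, pos.Line))
+		}
+		return true // keep descending into child nodes
+	})
+	return out, nil
+}
+
+func main() {
+	decls, err := funcPositions(src)
+	if err != nil {
+		panic(err)
+	}
+	fmt.Println(decls) // prints [Add@3]
+}
+```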
+
+### 2. Type checking
+
+* `cmd/compile/internal/types2` (type checking)
+
+The types2 package is a port of `go/types` to use the syntax package's
+AST instead of `go/ast`.
+
+### 3. IR construction ("noding")
+
+* `cmd/compile/internal/types` (compiler types)
+* `cmd/compile/internal/ir` (compiler AST)
+* `cmd/compile/internal/noder` (create compiler AST)
+
+The compiler middle end uses its own AST definition and representation of Go
+types carried over from when it was written in C. All of its code is written in
+terms of these, so the next step after type checking is to convert the syntax
+and types2 representations to ir and types. This process is referred to as
+"noding."
+
+Noding uses a process called Unified IR, which builds a node representation
+using a serialized version of the typechecked code from step 2.
+Unified IR is also involved in import/export of packages and inlining.
+
+### 4. Middle end
+
+* `cmd/compile/internal/deadcode` (dead code elimination)
+* `cmd/compile/internal/inline` (function call inlining)
+* `cmd/compile/internal/devirtualize` (devirtualization of known interface method calls)
+* `cmd/compile/internal/escape` (escape analysis)
+
+Several optimization passes are performed on the IR representation:
+dead code elimination, (early) devirtualization, function call
+inlining, and escape analysis.
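+
+To watch these passes make decisions, it can help to have a tiny program to
+compile with `go build -gcflags=-m=2` (a flag described in the Tips section).
+The sketch below is one such program; the type and function names are invented
+for illustration, and the exact diagnostics depend on the compiler version, so
+none are reproduced here.
+
+```go
+package main
+
+import "fmt"
+
+type point struct{ x, y int }
+
+// newPoint returns the address of a local variable. Considered on its own,
+// p outlives the call, so escape analysis is expected to move it to the heap.
+func newPoint(x, y int) *point {
+	p := point{x, y}
+	return &p
+}
+
+// sum is small, which makes it a likely candidate for inlining. Its argument
+// is only read, so it does not escape through this function.
+func sum(p *point) int { return p.x + p.y }
+
+func main() {
+	fmt.Println(sum(newPoint(1, 2)))
+}
+```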
+
+### 5. Walk
+
+* `cmd/compile/internal/walk` (order of evaluation, desugaring)
+
+The final pass over the IR representation is "walk," which serves two purposes:
+
+1. It decomposes complex statements into individual, simpler statements,
+ introducing temporary variables and respecting order of evaluation. This step
+ is also referred to as "order."
+
+2. It desugars higher-level Go constructs into more primitive ones. For example,
+ `switch` statements are turned into binary search or jump tables, and
+ operations on maps and channels are replaced with runtime calls.
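+
+The first purpose can be pictured with a hand-written approximation. The sketch
+below is not compiler output: `ordered` only imitates, in ordinary Go, how a
+statement containing two calls might be split into simpler statements with
+temporaries while keeping the evaluation order the language requires. All names
+are hypothetical.
+
+```go
+package main
+
+import "fmt"
+
+// trace records the order in which key and val are called.
+var trace []string
+
+func key() string { trace = append(trace, "key"); return "k" }
+func val() int    { trace = append(trace, "val"); return 42 }
+
+// original is what a programmer might write.
+func original(m map[string]int) {
+	m[key()] = val()
+}
+
+// ordered is a hand-written approximation of the same statement after
+// "order": each call gets its own statement and temporary, in source order.
+func ordered(m map[string]int) {
+	tmp1 := key()
+	tmp2 := val()
+	m[tmp1] = tmp2
+}
+
+func main() {
+	a, b := map[string]int{}, map[string]int{}
+	original(a)
+	ordered(b)
+	fmt.Println(a["k"] == b["k"], trace)
+}
+```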
+
+### 6. Generic SSA
+
+* `cmd/compile/internal/ssa` (SSA passes and rules)
+* `cmd/compile/internal/ssagen` (converting IR to SSA)
+
+In this phase, IR is converted into Static Single Assignment (SSA) form, a
+lower-level intermediate representation with specific properties that make it
+easier to implement optimizations and to eventually generate machine code from
+it.
+
+During this conversion, function intrinsics are applied. These are special
+functions that the compiler has been taught to replace with heavily optimized
+code on a case-by-case basis.
+
+Certain nodes are also lowered into simpler components during the AST to SSA
+conversion, so that the rest of the compiler can work with them. For instance,
+the copy builtin is replaced by memory moves, and range loops are rewritten into
+for loops. Some of these currently happen before the conversion to SSA for
+historical reasons, but the long-term plan is to move all of them here.
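+
+As an informal picture of the range-loop case, the sketch below places a
+`range` loop next to a hand-written `for` loop with the same behavior. It is an
+approximation written in ordinary Go, not the compiler's actual rewrite, and
+the function names are made up.
+
+```go
+package main
+
+import "fmt"
+
+// sumRange uses a range loop, as a programmer would normally write it.
+func sumRange(s []int) int {
+	total := 0
+	for _, v := range s {
+		total += v
+	}
+	return total
+}
+
+// sumLowered is a hand-written approximation of the lowered form: the slice
+// and its length are evaluated once, then elements are indexed explicitly.
+func sumLowered(s []int) int {
+	total := 0
+	tmp := s
+	n := len(tmp)
+	for i := 0; i < n; i++ {
+		v := tmp[i]
+		total += v
+	}
+	return total
+}
+
+func main() {
+	s := []int{1, 2, 3, 4}
+	fmt.Println(sumRange(s), sumLowered(s))
+}
+```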
+
+Then, a series of machine-independent passes and rules are applied. These do not
+concern any single computer architecture, and thus run on all `GOARCH` variants.
+These passes include dead code elimination, removal of
+unneeded nil checks, and removal of unused branches. The generic rewrite rules
+mainly concern expressions, such as replacing some expressions with constant
+values, and optimizing multiplications and float operations.
+
+### 7. Generating machine code
+
+* `cmd/compile/internal/ssa` (SSA lowering and arch-specific passes)
+* `cmd/internal/obj` (machine code generation)
+
+The machine-dependent phase of the compiler begins with the "lower" pass, which
+rewrites generic values into their machine-specific variants. For example, on
+amd64, memory operands are possible, so many load-store operations may be combined.
+
+Note that the lower pass runs all machine-specific rewrite rules, and thus it
+currently applies lots of optimizations too.
+
+Once the SSA has been "lowered" and is more specific to the target architecture,
+the final code optimization passes are run. This includes yet another dead code
+elimination pass, moving values closer to their uses, the removal of local
+variables that are never read from, and register allocation.
+
+Other important pieces of work done as part of this step include stack frame
+layout, which assigns stack offsets to local variables, and pointer liveness
+analysis, which computes which on-stack pointers are live at each GC safe point.
+
+At the end of the SSA generation phase, Go functions have been transformed into
+a series of `obj.Prog` instructions. These are passed to the assembler
+(`cmd/internal/obj`), which turns them into machine code and writes out the
+final object file. The object file will also contain reflect data, export data,
+and debugging information.
+
+### 8. Tips
+
+#### Getting Started
+
+* If you have never contributed to the compiler before, a simple way to begin
+ can be adding a log statement or `panic("here")` to get some
+ initial insight into whatever you are investigating.
+
+* The compiler itself provides logging, debugging and visualization capabilities,
+ such as:
+ ```
+ $ go build -gcflags=-m=2 # print optimization info, including inlining, escape analysis
+ $ go build -gcflags=-d=ssa/check_bce/debug # print bounds check info
+ $ go build -gcflags=-W # print internal parse tree after type checking
+ $ GOSSAFUNC=Foo go build # generate ssa.html file for func Foo
+ $ go build -gcflags=-S # print assembly
+ $ go tool compile -bench=out.txt x.go # print timing of compiler phases
+ ```
+
+ Some flags alter the compiler behavior, such as:
+ ```
+ $ go tool compile -h file.go # panic on first compile error encountered
+ $ go build -gcflags=-d=checkptr=2 # enable additional unsafe pointer checking
+ ```
+
+ There are many additional flags. Some descriptions are available via:
+ ```
+ $ go tool compile -h # compiler flags, e.g., go build -gcflags='-m=1 -l'
+ $ go tool compile -d help # debug flags, e.g., go build -gcflags=-d=checkptr=2
+ $ go tool compile -d ssa/help # ssa flags, e.g., go build -gcflags=-d=ssa/prove/debug=2
+ ```
+
+ There are some additional details about `-gcflags` and the differences between `go build`
+ vs. `go tool compile` in a [section below](#-gcflags-and-go-build-vs-go-tool-compile).
+
+* In general, when investigating a problem in the compiler you usually want to
+ start with the simplest possible reproduction and understand exactly what is
+ happening with it.
+
+#### Testing your changes
+
+* Be sure to read the [Quickly testing your changes](https://go.dev/doc/contribute#quick_test)
+ section of the Go Contribution Guide.
+
+* Some tests live within the `cmd/compile` packages and can be run with `go test ./...` or similar,
+ but many cmd/compile tests are in the top-level
+ [test](https://github.com/golang/go/tree/master/test) directory:
+
+ ```
+ $ go test cmd/internal/testdir # all tests in 'test' dir
+ $ go test cmd/internal/testdir -run='Test/escape.*.go' # test specific files in 'test' dir
+ ```
+ For details, see the [testdir README](https://github.com/golang/go/tree/master/test#readme).
+ The `errorCheck` method in [testdir_test.go](https://github.com/golang/go/blob/master/src/cmd/internal/testdir/testdir_test.go)
+ is helpful for a description of the `ERROR` comments used in many of those tests.
+
+ In addition, the `go/types` package from the standard library and `cmd/compile/internal/types2`
+ have shared tests in `src/internal/types/testdata`, and both type checkers
+ should be checked if anything changes there.
+
+* The new [application-based coverage profiling](https://go.dev/testing/coverage/) can be used
+ with the compiler, such as:
+
+ ```
+ $ go install -cover -coverpkg=cmd/compile/... cmd/compile # build compiler with coverage instrumentation
+ $ mkdir /tmp/coverdir # pick location for coverage data
+ $ GOCOVERDIR=/tmp/coverdir go test [...] # use compiler, saving coverage data
+ $ go tool covdata textfmt -i=/tmp/coverdir -o coverage.out # convert to traditional coverage format
+ $ go tool cover -html coverage.out # view coverage via traditional tools
+ ```
+
+#### Juggling compiler versions
+
+* Many of the compiler tests use the version of the `go` command found in your PATH and
+ its corresponding `compile` binary.
+
+* If you are in a branch and your PATH includes `<go-repo>/bin`,
+ doing `go install cmd/compile` will build the compiler using the code from your
+ branch and install it to the proper location so that subsequent `go` commands
+ like `go build` or `go test ./...` will exercise your freshly built compiler.
+
+* [toolstash](https://pkg.go.dev/golang.org/x/tools/cmd/toolstash) provides a way
+ to save, run, and restore a known good copy of the Go toolchain. For example, it can be
+ a good practice to initially build your branch, save that version of
+ the toolchain, then restore the known good version of the tools to compile
+ your work-in-progress version of the compiler.
+
+ Sample setup steps:
+ ```
+ $ go install golang.org/x/tools/cmd/toolstash@latest
+ $ git clone https://go.googlesource.com/go
+ $ cd go
+ $ git checkout -b mybranch
+ $ ./src/all.bash # build and confirm good starting point
+ $ export PATH=$PWD/bin:$PATH
+ $ toolstash save # save current tools
+ ```
+ After that, your edit/compile/test cycle can be similar to:
+ ```
+ <... make edits to cmd/compile source ...>
+ $ toolstash restore && go install cmd/compile # restore known good tools to build compiler
+ <... 'go build', 'go test', etc. ...> # use freshly built compiler
+ ```
+
+* toolstash also allows comparing the installed vs. stashed copy of
+ the compiler, for example when you expect equivalent behavior after a refactor.
+ For example, to check that your changed compiler produces identical object files to
+ the stashed compiler while building the standard library:
+ ```
+ $ toolstash restore && go install cmd/compile # build latest compiler
+ $ go build -toolexec "toolstash -cmp" -a -v std # compare latest vs. saved compiler
+ ```
+
+* If versions appear to get out of sync (for example, with errors like
+ `linked object header mismatch` with version strings like
+ `devel go1.21-db3f952b1f`), you might need to do
+ `toolstash restore && go install cmd/...` to update all the tools under cmd.
+
+#### Additional helpful tools
+
+* [compilebench](https://pkg.go.dev/golang.org/x/tools/cmd/compilebench) benchmarks
+ the speed of the compiler.
+
+* [benchstat](https://pkg.go.dev/golang.org/x/perf/cmd/benchstat) is the standard tool
+ for reporting performance changes resulting from compiler modifications,
+ including whether any improvements are statistically significant:
+ ```
+ $ go test -bench=SomeBenchmarks -count=20 > new.txt # use new compiler
+ $ toolstash restore # restore old compiler
+ $ go test -bench=SomeBenchmarks -count=20 > old.txt # use old compiler
+ $ benchstat old.txt new.txt # compare old vs. new
+ ```
+
+* [bent](https://pkg.go.dev/golang.org/x/benchmarks/cmd/bent) facilitates running a
+ large set of benchmarks from various community Go projects inside a Docker container.
+
+* [perflock](https://github.com/aclements/perflock) helps obtain more consistent
+ benchmark results, including by manipulating CPU frequency scaling settings on Linux.
+
+* [view-annotated-file](https://github.com/loov/view-annotated-file) (from the community)
+ overlays inlining, bounds check, and escape info back onto the source code.
+
+* [godbolt.org](https://go.godbolt.org) is widely used to examine
+ and share assembly output from many compilers, including the Go compiler. It can also
+ [compare](https://go.godbolt.org/z/5Gs1G4bKG) assembly for different versions of
+ a function or across Go compiler versions, which can be helpful for investigations and
+ bug reports.
+
+#### -gcflags and 'go build' vs. 'go tool compile'
+
+* `-gcflags` is a go command [build flag](https://pkg.go.dev/cmd/go#hdr-Compile_packages_and_dependencies).
+ `go build -gcflags=<args>` passes the supplied `<args>` to the underlying
+ `compile` invocation(s) while still doing everything that the `go build` command
+ normally does (e.g., handling the build cache, modules, and so on). In contrast,
+ `go tool compile <args>` asks the `go` command to invoke `compile <args>` a single time
+ without involving the standard `go build` machinery. In some cases, it can be helpful to have
+ fewer moving parts by doing `go tool compile <args>`, for example when you have a
+ small standalone source file that can be compiled without any assistance from `go build`.
+ In other cases, it is more convenient to pass `-gcflags` to a build command like
+ `go build`, `go test`, or `go install`.
+
+* `-gcflags` by default applies to the packages named on the command line, but can
+ use package patterns such as `-gcflags='all=-m=1 -l'`, or multiple package patterns such as
+ `-gcflags='all=-m=1' -gcflags='fmt=-m=2'`. For details, see the
+ [cmd/go documentation](https://pkg.go.dev/cmd/go#hdr-Compile_packages_and_dependencies).
+
+### Further reading
+
+To dig deeper into how the SSA package works, including its passes and rules,
+head to [cmd/compile/internal/ssa/README.md](internal/ssa/README.md).
+
+Finally, if something in this README or the SSA README is unclear
+or if you have an idea for an improvement, feel free to leave a comment in
+[issue 30074](https://go.dev/issue/30074).