diff options
Diffstat (limited to 'src/cmd/compile/README.md')
-rw-r--r-- | src/cmd/compile/README.md | 316 |
1 files changed, 316 insertions, 0 deletions
diff --git a/src/cmd/compile/README.md b/src/cmd/compile/README.md new file mode 100644 index 0000000..9b99a1b --- /dev/null +++ b/src/cmd/compile/README.md @@ -0,0 +1,316 @@ +<!--- +// Copyright 2018 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. +--> + +## Introduction to the Go compiler + +`cmd/compile` contains the main packages that form the Go compiler. The compiler +may be logically split in four phases, which we will briefly describe alongside +the list of packages that contain their code. + +You may sometimes hear the terms "front-end" and "back-end" when referring to +the compiler. Roughly speaking, these translate to the first two and last two +phases we are going to list here. A third term, "middle-end", often refers to +much of the work that happens in the second phase. + +Note that the `go/*` family of packages, such as `go/parser` and +`go/types`, are mostly unused by the compiler. Since the compiler was +initially written in C, the `go/*` packages were developed to enable +writing tools working with Go code, such as `gofmt` and `vet`. +However, over time the compiler's internal APIs have slowly evolved to +be more familiar to users of the `go/*` packages. + +It should be clarified that the name "gc" stands for "Go compiler", and has +little to do with uppercase "GC", which stands for garbage collection. + +### 1. Parsing + +* `cmd/compile/internal/syntax` (lexer, parser, syntax tree) + +In the first phase of compilation, source code is tokenized (lexical analysis), +parsed (syntax analysis), and a syntax tree is constructed for each source +file. + +Each syntax tree is an exact representation of the respective source file, with +nodes corresponding to the various elements of the source such as expressions, +declarations, and statements. The syntax tree also includes position information +which is used for error reporting and the creation of debugging information. + +### 2. Type checking + +* `cmd/compile/internal/types2` (type checking) + +The types2 package is a port of `go/types` to use the syntax package's +AST instead of `go/ast`. + +### 3. IR construction ("noding") + +* `cmd/compile/internal/types` (compiler types) +* `cmd/compile/internal/ir` (compiler AST) +* `cmd/compile/internal/noder` (create compiler AST) + +The compiler middle end uses its own AST definition and representation of Go +types carried over from when it was written in C. All of its code is written in +terms of these, so the next step after type checking is to convert the syntax +and types2 representations to ir and types. This process is referred to as +"noding." + +Noding using a process called Unified IR, which builds a node representation +using a serialized version of the typechecked code from step 2. +Unified IR is also involved in import/export of packages and inlining. + +### 4. Middle end + +* `cmd/compile/internal/deadcode` (dead code elimination) +* `cmd/compile/internal/inline` (function call inlining) +* `cmd/compile/internal/devirtualize` (devirtualization of known interface method calls) +* `cmd/compile/internal/escape` (escape analysis) + +Several optimization passes are performed on the IR representation: +dead code elimination, (early) devirtualization, function call +inlining, and escape analysis. + +### 5. Walk + +* `cmd/compile/internal/walk` (order of evaluation, desugaring) + +The final pass over the IR representation is "walk," which serves two purposes: + +1. It decomposes complex statements into individual, simpler statements, + introducing temporary variables and respecting order of evaluation. This step + is also referred to as "order." + +2. It desugars higher-level Go constructs into more primitive ones. For example, + `switch` statements are turned into binary search or jump tables, and + operations on maps and channels are replaced with runtime calls. + +### 6. Generic SSA + +* `cmd/compile/internal/ssa` (SSA passes and rules) +* `cmd/compile/internal/ssagen` (converting IR to SSA) + +In this phase, IR is converted into Static Single Assignment (SSA) form, a +lower-level intermediate representation with specific properties that make it +easier to implement optimizations and to eventually generate machine code from +it. + +During this conversion, function intrinsics are applied. These are special +functions that the compiler has been taught to replace with heavily optimized +code on a case-by-case basis. + +Certain nodes are also lowered into simpler components during the AST to SSA +conversion, so that the rest of the compiler can work with them. For instance, +the copy builtin is replaced by memory moves, and range loops are rewritten into +for loops. Some of these currently happen before the conversion to SSA due to +historical reasons, but the long-term plan is to move all of them here. + +Then, a series of machine-independent passes and rules are applied. These do not +concern any single computer architecture, and thus run on all `GOARCH` variants. +These passes include dead code elimination, removal of +unneeded nil checks, and removal of unused branches. The generic rewrite rules +mainly concern expressions, such as replacing some expressions with constant +values, and optimizing multiplications and float operations. + +### 7. Generating machine code + +* `cmd/compile/internal/ssa` (SSA lowering and arch-specific passes) +* `cmd/internal/obj` (machine code generation) + +The machine-dependent phase of the compiler begins with the "lower" pass, which +rewrites generic values into their machine-specific variants. For example, on +amd64 memory operands are possible, so many load-store operations may be combined. + +Note that the lower pass runs all machine-specific rewrite rules, and thus it +currently applies lots of optimizations too. + +Once the SSA has been "lowered" and is more specific to the target architecture, +the final code optimization passes are run. This includes yet another dead code +elimination pass, moving values closer to their uses, the removal of local +variables that are never read from, and register allocation. + +Other important pieces of work done as part of this step include stack frame +layout, which assigns stack offsets to local variables, and pointer liveness +analysis, which computes which on-stack pointers are live at each GC safe point. + +At the end of the SSA generation phase, Go functions have been transformed into +a series of obj.Prog instructions. These are passed to the assembler +(`cmd/internal/obj`), which turns them into machine code and writes out the +final object file. The object file will also contain reflect data, export data, +and debugging information. + +### 8. Tips + +#### Getting Started + +* If you have never contributed to the compiler before, a simple way to begin + can be adding a log statement or `panic("here")` to get some + initial insight into whatever you are investigating. + +* The compiler itself provides logging, debugging and visualization capabilities, + such as: + ``` + $ go build -gcflags=-m=2 # print optimization info, including inlining, escape analysis + $ go build -gcflags=-d=ssa/check_bce/debug # print bounds check info + $ go build -gcflags=-W # print internal parse tree after type checking + $ GOSSAFUNC=Foo go build # generate ssa.html file for func Foo + $ go build -gcflags=-S # print assembly + $ go tool compile -bench=out.txt x.go # print timing of compiler phases + ``` + + Some flags alter the compiler behavior, such as: + ``` + $ go tool compile -h file.go # panic on first compile error encountered + $ go build -gcflags=-d=checkptr=2 # enable additional unsafe pointer checking + ``` + + There are many additional flags. Some descriptions are available via: + ``` + $ go tool compile -h # compiler flags, e.g., go build -gcflags='-m=1 -l' + $ go tool compile -d help # debug flags, e.g., go build -gcflags=-d=checkptr=2 + $ go tool compile -d ssa/help # ssa flags, e.g., go build -gcflags=-d=ssa/prove/debug=2 + ``` + + There are some additional details about `-gcflags` and the differences between `go build` + vs. `go tool compile` in a [section below](#-gcflags-and-go-build-vs-go-tool-compile). + +* In general, when investigating a problem in the compiler you usually want to + start with the simplest possible reproduction and understand exactly what is + happening with it. + +#### Testing your changes + +* Be sure to read the [Quickly testing your changes](https://go.dev/doc/contribute#quick_test) + section of the Go Contribution Guide. + +* Some tests live within the cmd/compile packages and can be run by `go test ./...` or similar, + but many cmd/compile tests are in the top-level + [test](https://github.com/golang/go/tree/master/test) directory: + + ``` + $ go test cmd/internal/testdir # all tests in 'test' dir + $ go test cmd/internal/testdir -run='Test/escape.*.go' # test specific files in 'test' dir + ``` + For details, see the [testdir README](https://github.com/golang/go/tree/master/test#readme). + The `errorCheck` method in [testdir_test.go](https://github.com/golang/go/blob/master/src/cmd/internal/testdir/testdir_test.go) + is helpful for a description of the `ERROR` comments used in many of those tests. + + In addition, the `go/types` package from the standard library and `cmd/compile/internal/types2` + have shared tests in `src/internal/types/testdata`, and both type checkers + should be checked if anything changes there. + +* The new [application-based coverage profiling](https://go.dev/testing/coverage/) can be used + with the compiler, such as: + + ``` + $ go install -cover -coverpkg=cmd/compile/... cmd/compile # build compiler with coverage instrumentation + $ mkdir /tmp/coverdir # pick location for coverage data + $ GOCOVERDIR=/tmp/coverdir go test [...] # use compiler, saving coverage data + $ go tool covdata textfmt -i=/tmp/coverdir -o coverage.out # convert to traditional coverage format + $ go tool cover -html coverage.out # view coverage via traditional tools + ``` + +#### Juggling compiler versions + +* Many of the compiler tests use the version of the `go` command found in your PATH and + its corresponding `compile` binary. + +* If you are in a branch and your PATH includes `<go-repo>/bin`, + doing `go install cmd/compile` will build the compiler using the code from your + branch and install it to the proper location so that subsequent `go` commands + like `go build` or `go test ./...` will exercise your freshly built compiler. + +* [toolstash](https://pkg.go.dev/golang.org/x/tools/cmd/toolstash) provides a way + to save, run, and restore a known good copy of the Go toolchain. For example, it can be + a good practice to initially build your branch, save that version of + the toolchain, then restore the known good version of the tools to compile + your work-in-progress version of the compiler. + + Sample set up steps: + ``` + $ go install golang.org/x/tools/cmd/toolstash@latest + $ git clone https://go.googlesource.com/go + $ cd go + $ git checkout -b mybranch + $ ./src/all.bash # build and confirm good starting point + $ export PATH=$PWD/bin:$PATH + $ toolstash save # save current tools + ``` + After that, your edit/compile/test cycle can be similar to: + ``` + <... make edits to cmd/compile source ...> + $ toolstash restore && go install cmd/compile # restore known good tools to build compiler + <... 'go build', 'go test', etc. ...> # use freshly built compiler + ``` + +* toolstash also allows comparing the installed vs. stashed copy of + the compiler, such as if you expect equivalent behavior after a refactor. + For example, to check that your changed compiler produces identical object files to + the stashed compiler while building the standard library: + ``` + $ toolstash restore && go install cmd/compile # build latest compiler + $ go build -toolexec "toolstash -cmp" -a -v std # compare latest vs. saved compiler + ``` + +* If versions appear to get out of sync (for example, with errors like + `linked object header mismatch` with version strings like + `devel go1.21-db3f952b1f`), you might need to do + `toolstash restore && go install cmd/...` to update all the tools under cmd. + +#### Additional helpful tools + +* [compilebench](https://pkg.go.dev/golang.org/x/tools/cmd/compilebench) benchmarks + the speed of the compiler. + +* [benchstat](https://pkg.go.dev/golang.org/x/perf/cmd/benchstat) is the standard tool + for reporting performance changes resulting from compiler modifications, + including whether any improvements are statistically significant: + ``` + $ go test -bench=SomeBenchmarks -count=20 > new.txt # use new compiler + $ toolstash restore # restore old compiler + $ go test -bench=SomeBenchmarks -count=20 > old.txt # use old compiler + $ benchstat old.txt new.txt # compare old vs. new + ``` + +* [bent](https://pkg.go.dev/golang.org/x/benchmarks/cmd/bent) facilitates running a + large set of benchmarks from various community Go projects inside a Docker container. + +* [perflock](https://github.com/aclements/perflock) helps obtain more consistent + benchmark results, including by manipulating CPU frequency scaling settings on Linux. + +* [view-annotated-file](https://github.com/loov/view-annotated-file) (from the community) + overlays inlining, bounds check, and escape info back onto the source code. + +* [godbolt.org](https://go.godbolt.org) is widely used to examine + and share assembly output from many compilers, including the Go compiler. It can also + [compare](https://go.godbolt.org/z/5Gs1G4bKG) assembly for different versions of + a function or across Go compiler versions, which can be helpful for investigations and + bug reports. + +#### -gcflags and 'go build' vs. 'go tool compile' + +* `-gcflags` is a go command [build flag](https://pkg.go.dev/cmd/go#hdr-Compile_packages_and_dependencies). + `go build -gcflags=<args>` passes the supplied `<args>` to the underlying + `compile` invocation(s) while still doing everything that the `go build` command + normally does (e.g., handling the build cache, modules, and so on). In contrast, + `go tool compile <args>` asks the `go` command to invoke `compile <args>` a single time + without involving the standard `go build` machinery. In some cases, it can be helpful to have + fewer moving parts by doing `go tool compile <args>`, such as if you have a + small standalone source file that can be compiled without any assistance from `go build`. + In other cases, it is more convenient to pass `-gcflags` to a build command like + `go build`, `go test`, or `go install`. + +* `-gcflags` by default applies to the packages named on the command line, but can + use package patterns such as `-gcflags='all=-m=1 -l'`, or multiple package patterns such as + `-gcflags='all=-m=1' -gcflags='fmt=-m=2'`. For details, see the + [cmd/go documentation](https://pkg.go.dev/cmd/go#hdr-Compile_packages_and_dependencies). + +### Further reading + +To dig deeper into how the SSA package works, including its passes and rules, +head to [cmd/compile/internal/ssa/README.md](internal/ssa/README.md). + +Finally, if something in this README or the SSA README is unclear +or if you have an idea for an improvement, feel free to leave a comment in +[issue 30074](https://go.dev/issue/30074). |