Diffstat (limited to 'src/runtime')
-rw-r--r--src/runtime/HACKING.md312
-rw-r--r--src/runtime/Makefile5
-rw-r--r--src/runtime/alg.go370
-rw-r--r--src/runtime/asm.s13
-rw-r--r--src/runtime/asm_386.s1564
-rw-r--r--src/runtime/asm_amd64.s1833
-rw-r--r--src/runtime/asm_arm.s1089
-rw-r--r--src/runtime/asm_arm64.s1313
-rw-r--r--src/runtime/asm_mips64x.s807
-rw-r--r--src/runtime/asm_mipsx.s896
-rw-r--r--src/runtime/asm_ppc64x.h25
-rw-r--r--src/runtime/asm_ppc64x.s1038
-rw-r--r--src/runtime/asm_riscv64.s821
-rw-r--r--src/runtime/asm_s390x.s918
-rw-r--r--src/runtime/asm_wasm.s473
-rw-r--r--src/runtime/atomic_arm64.s9
-rw-r--r--src/runtime/atomic_mips64x.s13
-rw-r--r--src/runtime/atomic_mipsx.s11
-rw-r--r--src/runtime/atomic_pointer.go77
-rw-r--r--src/runtime/atomic_ppc64x.s14
-rw-r--r--src/runtime/atomic_riscv64.s10
-rw-r--r--src/runtime/auxv_none.go15
-rw-r--r--src/runtime/callers_test.go311
-rw-r--r--src/runtime/cgo.go54
-rw-r--r--src/runtime/cgo/asm_386.s29
-rw-r--r--src/runtime/cgo/asm_amd64.s72
-rw-r--r--src/runtime/cgo/asm_arm.s56
-rw-r--r--src/runtime/cgo/asm_arm64.s70
-rw-r--r--src/runtime/cgo/asm_mips64x.s82
-rw-r--r--src/runtime/cgo/asm_mipsx.s75
-rw-r--r--src/runtime/cgo/asm_ppc64x.s134
-rw-r--r--src/runtime/cgo/asm_riscv64.s84
-rw-r--r--src/runtime/cgo/asm_s390x.s55
-rw-r--r--src/runtime/cgo/asm_wasm.s8
-rw-r--r--src/runtime/cgo/callbacks.go106
-rw-r--r--src/runtime/cgo/callbacks_aix.go11
-rw-r--r--src/runtime/cgo/callbacks_traceback.go17
-rw-r--r--src/runtime/cgo/cgo.go34
-rw-r--r--src/runtime/cgo/dragonfly.go19
-rw-r--r--src/runtime/cgo/freebsd.go22
-rw-r--r--src/runtime/cgo/gcc_386.S40
-rw-r--r--src/runtime/cgo/gcc_aix_ppc64.S133
-rw-r--r--src/runtime/cgo/gcc_aix_ppc64.c38
-rw-r--r--src/runtime/cgo/gcc_amd64.S48
-rw-r--r--src/runtime/cgo/gcc_android.c90
-rw-r--r--src/runtime/cgo/gcc_arm.S42
-rw-r--r--src/runtime/cgo/gcc_arm64.S82
-rw-r--r--src/runtime/cgo/gcc_context.c21
-rw-r--r--src/runtime/cgo/gcc_darwin_amd64.c66
-rw-r--r--src/runtime/cgo/gcc_darwin_arm64.c145
-rw-r--r--src/runtime/cgo/gcc_dragonfly_amd64.c71
-rw-r--r--src/runtime/cgo/gcc_fatalf.c23
-rw-r--r--src/runtime/cgo/gcc_freebsd_386.c71
-rw-r--r--src/runtime/cgo/gcc_freebsd_amd64.c79
-rw-r--r--src/runtime/cgo/gcc_freebsd_arm.c83
-rw-r--r--src/runtime/cgo/gcc_freebsd_arm64.c68
-rw-r--r--src/runtime/cgo/gcc_freebsd_sigaction.c80
-rw-r--r--src/runtime/cgo/gcc_libinit.c113
-rw-r--r--src/runtime/cgo/gcc_libinit_windows.c125
-rw-r--r--src/runtime/cgo/gcc_linux_386.c79
-rw-r--r--src/runtime/cgo/gcc_linux_amd64.c99
-rw-r--r--src/runtime/cgo/gcc_linux_arm.c74
-rw-r--r--src/runtime/cgo/gcc_linux_arm64.c96
-rw-r--r--src/runtime/cgo/gcc_linux_mips64x.c78
-rw-r--r--src/runtime/cgo/gcc_linux_mipsx.c80
-rw-r--r--src/runtime/cgo/gcc_linux_ppc64x.S138
-rw-r--r--src/runtime/cgo/gcc_linux_riscv64.c74
-rw-r--r--src/runtime/cgo/gcc_linux_s390x.c69
-rw-r--r--src/runtime/cgo/gcc_mips64x.S87
-rw-r--r--src/runtime/cgo/gcc_mipsx.S75
-rw-r--r--src/runtime/cgo/gcc_mmap.c39
-rw-r--r--src/runtime/cgo/gcc_netbsd_386.c82
-rw-r--r--src/runtime/cgo/gcc_netbsd_amd64.c83
-rw-r--r--src/runtime/cgo/gcc_netbsd_arm.c79
-rw-r--r--src/runtime/cgo/gcc_netbsd_arm64.c80
-rw-r--r--src/runtime/cgo/gcc_openbsd_386.c70
-rw-r--r--src/runtime/cgo/gcc_openbsd_amd64.c70
-rw-r--r--src/runtime/cgo/gcc_openbsd_arm.c67
-rw-r--r--src/runtime/cgo/gcc_openbsd_arm64.c67
-rw-r--r--src/runtime/cgo/gcc_ppc64x.c71
-rw-r--r--src/runtime/cgo/gcc_riscv64.S80
-rw-r--r--src/runtime/cgo/gcc_s390x.S56
-rw-r--r--src/runtime/cgo/gcc_setenv.c28
-rw-r--r--src/runtime/cgo/gcc_sigaction.c82
-rw-r--r--src/runtime/cgo/gcc_signal2_ios_arm64.c11
-rw-r--r--src/runtime/cgo/gcc_signal_ios_arm64.c213
-rw-r--r--src/runtime/cgo/gcc_signal_ios_nolldb.c12
-rw-r--r--src/runtime/cgo/gcc_solaris_amd64.c82
-rw-r--r--src/runtime/cgo/gcc_traceback.c24
-rw-r--r--src/runtime/cgo/gcc_util.c69
-rw-r--r--src/runtime/cgo/gcc_windows_386.c55
-rw-r--r--src/runtime/cgo/gcc_windows_amd64.c55
-rw-r--r--src/runtime/cgo/iscgo.go17
-rw-r--r--src/runtime/cgo/libcgo.h151
-rw-r--r--src/runtime/cgo/libcgo_unix.h15
-rw-r--r--src/runtime/cgo/libcgo_windows.h12
-rw-r--r--src/runtime/cgo/linux.go74
-rw-r--r--src/runtime/cgo/linux_syscall.c85
-rw-r--r--src/runtime/cgo/mmap.go31
-rw-r--r--src/runtime/cgo/netbsd.go21
-rw-r--r--src/runtime/cgo/openbsd.go20
-rw-r--r--src/runtime/cgo/setenv.go21
-rw-r--r--src/runtime/cgo/sigaction.go22
-rw-r--r--src/runtime/cgo/signal_ios_arm64.go10
-rw-r--r--src/runtime/cgo/signal_ios_arm64.s56
-rw-r--r--src/runtime/cgo_mmap.go67
-rw-r--r--src/runtime/cgo_ppc64x.go12
-rw-r--r--src/runtime/cgo_sigaction.go87
-rw-r--r--src/runtime/cgocall.go628
-rw-r--r--src/runtime/cgocallback.go13
-rw-r--r--src/runtime/cgocheck.go263
-rw-r--r--src/runtime/chan.go869
-rw-r--r--src/runtime/chan_test.go1213
-rw-r--r--src/runtime/chanbarrier_test.go83
-rw-r--r--src/runtime/checkptr.go83
-rw-r--r--src/runtime/checkptr_test.go54
-rw-r--r--src/runtime/closure_test.go54
-rw-r--r--src/runtime/compiler.go13
-rw-r--r--src/runtime/complex.go61
-rw-r--r--src/runtime/complex_test.go67
-rw-r--r--src/runtime/conv_wasm_test.go128
-rw-r--r--src/runtime/cpuflags.go34
-rw-r--r--src/runtime/cpuflags_amd64.go24
-rw-r--r--src/runtime/cpuflags_arm64.go17
-rw-r--r--src/runtime/cpuprof.go215
-rw-r--r--src/runtime/cputicks.go17
-rw-r--r--src/runtime/crash_cgo_test.go633
-rw-r--r--src/runtime/crash_nonunix_test.go13
-rw-r--r--src/runtime/crash_test.go817
-rw-r--r--src/runtime/crash_unix_test.go370
-rw-r--r--src/runtime/debug.go63
-rw-r--r--src/runtime/debug/debug.s9
-rw-r--r--src/runtime/debug/garbage.go175
-rw-r--r--src/runtime/debug/garbage_test.go193
-rw-r--r--src/runtime/debug/heapdump_test.go69
-rw-r--r--src/runtime/debug/mod.go114
-rw-r--r--src/runtime/debug/panic_test.go53
-rw-r--r--src/runtime/debug/stack.go30
-rw-r--r--src/runtime/debug/stack_test.go65
-rw-r--r--src/runtime/debug/stubs.go17
-rw-r--r--src/runtime/debug_test.go249
-rw-r--r--src/runtime/debugcall.go241
-rw-r--r--src/runtime/debuglog.go820
-rw-r--r--src/runtime/debuglog_off.go19
-rw-r--r--src/runtime/debuglog_on.go45
-rw-r--r--src/runtime/debuglog_test.go158
-rw-r--r--src/runtime/defer_test.go440
-rw-r--r--src/runtime/defs1_linux.go40
-rw-r--r--src/runtime/defs1_netbsd_386.go180
-rw-r--r--src/runtime/defs1_netbsd_amd64.go192
-rw-r--r--src/runtime/defs1_netbsd_arm.go185
-rw-r--r--src/runtime/defs1_netbsd_arm64.go200
-rw-r--r--src/runtime/defs1_solaris_amd64.go251
-rw-r--r--src/runtime/defs2_linux.go149
-rw-r--r--src/runtime/defs3_linux.go43
-rw-r--r--src/runtime/defs_aix.go171
-rw-r--r--src/runtime/defs_aix_ppc64.go211
-rw-r--r--src/runtime/defs_arm_linux.go124
-rw-r--r--src/runtime/defs_darwin.go164
-rw-r--r--src/runtime/defs_darwin_amd64.go372
-rw-r--r--src/runtime/defs_darwin_arm64.go239
-rw-r--r--src/runtime/defs_dragonfly.go125
-rw-r--r--src/runtime/defs_dragonfly_amd64.go204
-rw-r--r--src/runtime/defs_freebsd.go169
-rw-r--r--src/runtime/defs_freebsd_386.go265
-rw-r--r--src/runtime/defs_freebsd_amd64.go277
-rw-r--r--src/runtime/defs_freebsd_arm.go238
-rw-r--r--src/runtime/defs_freebsd_arm64.go260
-rw-r--r--src/runtime/defs_illumos_amd64.go14
-rw-r--r--src/runtime/defs_linux.go129
-rw-r--r--src/runtime/defs_linux_386.go228
-rw-r--r--src/runtime/defs_linux_amd64.go264
-rw-r--r--src/runtime/defs_linux_arm.go185
-rw-r--r--src/runtime/defs_linux_arm64.go187
-rw-r--r--src/runtime/defs_linux_mips64x.go188
-rw-r--r--src/runtime/defs_linux_mipsx.go186
-rw-r--r--src/runtime/defs_linux_ppc64.go201
-rw-r--r--src/runtime/defs_linux_ppc64le.go201
-rw-r--r--src/runtime/defs_linux_riscv64.go209
-rw-r--r--src/runtime/defs_linux_s390x.go168
-rw-r--r--src/runtime/defs_netbsd.go130
-rw-r--r--src/runtime/defs_netbsd_386.go41
-rw-r--r--src/runtime/defs_netbsd_amd64.go48
-rw-r--r--src/runtime/defs_netbsd_arm.go39
-rw-r--r--src/runtime/defs_openbsd.go145
-rw-r--r--src/runtime/defs_openbsd_386.go168
-rw-r--r--src/runtime/defs_openbsd_amd64.go193
-rw-r--r--src/runtime/defs_openbsd_arm.go176
-rw-r--r--src/runtime/defs_openbsd_arm64.go173
-rw-r--r--src/runtime/defs_openbsd_mips64.go167
-rw-r--r--src/runtime/defs_plan9_386.go64
-rw-r--r--src/runtime/defs_plan9_amd64.go81
-rw-r--r--src/runtime/defs_plan9_arm.go66
-rw-r--r--src/runtime/defs_solaris.go161
-rw-r--r--src/runtime/defs_solaris_amd64.go48
-rw-r--r--src/runtime/defs_windows.go78
-rw-r--r--src/runtime/defs_windows_386.go149
-rw-r--r--src/runtime/defs_windows_amd64.go171
-rw-r--r--src/runtime/defs_windows_arm.go154
-rw-r--r--src/runtime/duff_386.s779
-rw-r--r--src/runtime/duff_amd64.s427
-rw-r--r--src/runtime/duff_arm.s523
-rw-r--r--src/runtime/duff_arm64.s267
-rw-r--r--src/runtime/duff_mips64x.s909
-rw-r--r--src/runtime/duff_ppc64x.s141
-rw-r--r--src/runtime/duff_riscv64.s907
-rw-r--r--src/runtime/duff_s390x.s19
-rw-r--r--src/runtime/env_plan9.go122
-rw-r--r--src/runtime/env_posix.go76
-rw-r--r--src/runtime/env_test.go43
-rw-r--r--src/runtime/error.go327
-rw-r--r--src/runtime/example_test.go54
-rw-r--r--src/runtime/export_aix_test.go7
-rw-r--r--src/runtime/export_arm_test.go9
-rw-r--r--src/runtime/export_darwin_test.go13
-rw-r--r--src/runtime/export_debug_test.go199
-rw-r--r--src/runtime/export_debuglog_test.go46
-rw-r--r--src/runtime/export_futex_test.go19
-rw-r--r--src/runtime/export_linux_test.go19
-rw-r--r--src/runtime/export_mmap_test.go21
-rw-r--r--src/runtime/export_pipe2_test.go15
-rw-r--r--src/runtime/export_pipe_test.go9
-rw-r--r--src/runtime/export_solaris_test.go9
-rw-r--r--src/runtime/export_test.go1216
-rw-r--r--src/runtime/export_unix_test.go93
-rw-r--r--src/runtime/export_windows_test.go25
-rw-r--r--src/runtime/extern.go257
-rw-r--r--src/runtime/fastlog2.go27
-rw-r--r--src/runtime/fastlog2_test.go34
-rw-r--r--src/runtime/fastlog2table.go43
-rw-r--r--src/runtime/float.go53
-rw-r--r--src/runtime/funcdata.h52
-rw-r--r--src/runtime/futex_test.go88
-rw-r--r--src/runtime/gc_test.go802
-rw-r--r--src/runtime/gcinfo_test.go214
-rw-r--r--src/runtime/go_tls.h17
-rw-r--r--src/runtime/hash32.go112
-rw-r--r--src/runtime/hash64.go108
-rw-r--r--src/runtime/hash_test.go806
-rw-r--r--src/runtime/heapdump.go755
-rw-r--r--src/runtime/histogram.go172
-rw-r--r--src/runtime/histogram_test.go70
-rw-r--r--src/runtime/iface.go565
-rw-r--r--src/runtime/iface_test.go439
-rw-r--r--src/runtime/internal/atomic/asm_386.s261
-rw-r--r--src/runtime/internal/atomic/asm_amd64.s187
-rw-r--r--src/runtime/internal/atomic/asm_arm.s284
-rw-r--r--src/runtime/internal/atomic/asm_arm64.s61
-rw-r--r--src/runtime/internal/atomic/asm_mips64x.s271
-rw-r--r--src/runtime/internal/atomic/asm_mipsx.s200
-rw-r--r--src/runtime/internal/atomic/asm_ppc64x.s253
-rw-r--r--src/runtime/internal/atomic/asm_s390x.s216
-rw-r--r--src/runtime/internal/atomic/asm_wasm.s10
-rw-r--r--src/runtime/internal/atomic/atomic_386.go102
-rw-r--r--src/runtime/internal/atomic/atomic_amd64.go116
-rw-r--r--src/runtime/internal/atomic/atomic_arm.go242
-rw-r--r--src/runtime/internal/atomic/atomic_arm64.go87
-rw-r--r--src/runtime/internal/atomic/atomic_arm64.s185
-rw-r--r--src/runtime/internal/atomic/atomic_mips64x.go89
-rw-r--r--src/runtime/internal/atomic/atomic_mips64x.s57
-rw-r--r--src/runtime/internal/atomic/atomic_mipsx.go166
-rw-r--r--src/runtime/internal/atomic/atomic_mipsx.s28
-rw-r--r--src/runtime/internal/atomic/atomic_ppc64x.go89
-rw-r--r--src/runtime/internal/atomic/atomic_ppc64x.s80
-rw-r--r--src/runtime/internal/atomic/atomic_riscv64.go85
-rw-r--r--src/runtime/internal/atomic/atomic_riscv64.s258
-rw-r--r--src/runtime/internal/atomic/atomic_s390x.go122
-rw-r--r--src/runtime/internal/atomic/atomic_test.go356
-rw-r--r--src/runtime/internal/atomic/atomic_wasm.go268
-rw-r--r--src/runtime/internal/atomic/bench_test.go195
-rw-r--r--src/runtime/internal/atomic/stubs.go35
-rw-r--r--src/runtime/internal/atomic/sys_linux_arm.s144
-rw-r--r--src/runtime/internal/atomic/sys_nonlinux_arm.s79
-rw-r--r--src/runtime/internal/atomic/unaligned.go9
-rw-r--r--src/runtime/internal/math/math.go19
-rw-r--r--src/runtime/internal/math/math_test.go79
-rw-r--r--src/runtime/internal/sys/arch.go20
-rw-r--r--src/runtime/internal/sys/arch_386.go16
-rw-r--r--src/runtime/internal/sys/arch_amd64.go16
-rw-r--r--src/runtime/internal/sys/arch_arm.go16
-rw-r--r--src/runtime/internal/sys/arch_arm64.go16
-rw-r--r--src/runtime/internal/sys/arch_mips.go16
-rw-r--r--src/runtime/internal/sys/arch_mips64.go16
-rw-r--r--src/runtime/internal/sys/arch_mips64le.go16
-rw-r--r--src/runtime/internal/sys/arch_mipsle.go16
-rw-r--r--src/runtime/internal/sys/arch_ppc64.go16
-rw-r--r--src/runtime/internal/sys/arch_ppc64le.go16
-rw-r--r--src/runtime/internal/sys/arch_riscv64.go18
-rw-r--r--src/runtime/internal/sys/arch_s390x.go16
-rw-r--r--src/runtime/internal/sys/arch_wasm.go16
-rw-r--r--src/runtime/internal/sys/gengoos.go98
-rw-r--r--src/runtime/internal/sys/intrinsics.go91
-rw-r--r--src/runtime/internal/sys/intrinsics_386.s58
-rw-r--r--src/runtime/internal/sys/intrinsics_common.go143
-rw-r--r--src/runtime/internal/sys/intrinsics_stubs.go13
-rw-r--r--src/runtime/internal/sys/intrinsics_test.go38
-rw-r--r--src/runtime/internal/sys/stubs.go16
-rw-r--r--src/runtime/internal/sys/sys.go15
-rw-r--r--src/runtime/internal/sys/zgoarch_386.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_amd64.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_arm.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_arm64.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_arm64be.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_armbe.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_mips.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_mips64.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_mips64le.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_mips64p32.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_mips64p32le.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_mipsle.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_ppc.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_ppc64.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_ppc64le.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_riscv.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_riscv64.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_s390.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_s390x.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_sparc.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_sparc64.go31
-rw-r--r--src/runtime/internal/sys/zgoarch_wasm.go31
-rw-r--r--src/runtime/internal/sys/zgoos_aix.go25
-rw-r--r--src/runtime/internal/sys/zgoos_android.go25
-rw-r--r--src/runtime/internal/sys/zgoos_darwin.go26
-rw-r--r--src/runtime/internal/sys/zgoos_dragonfly.go25
-rw-r--r--src/runtime/internal/sys/zgoos_freebsd.go25
-rw-r--r--src/runtime/internal/sys/zgoos_hurd.go25
-rw-r--r--src/runtime/internal/sys/zgoos_illumos.go25
-rw-r--r--src/runtime/internal/sys/zgoos_ios.go25
-rw-r--r--src/runtime/internal/sys/zgoos_js.go25
-rw-r--r--src/runtime/internal/sys/zgoos_linux.go26
-rw-r--r--src/runtime/internal/sys/zgoos_netbsd.go25
-rw-r--r--src/runtime/internal/sys/zgoos_openbsd.go25
-rw-r--r--src/runtime/internal/sys/zgoos_plan9.go25
-rw-r--r--src/runtime/internal/sys/zgoos_solaris.go26
-rw-r--r--src/runtime/internal/sys/zgoos_windows.go25
-rw-r--r--src/runtime/internal/sys/zgoos_zos.go25
-rw-r--r--src/runtime/lfstack.go67
-rw-r--r--src/runtime/lfstack_32bit.go19
-rw-r--r--src/runtime/lfstack_64bit.go58
-rw-r--r--src/runtime/lfstack_test.go140
-rw-r--r--src/runtime/libfuzzer.go75
-rw-r--r--src/runtime/libfuzzer_amd64.s42
-rw-r--r--src/runtime/libfuzzer_arm64.s31
-rw-r--r--src/runtime/lock_futex.go245
-rw-r--r--src/runtime/lock_js.go258
-rw-r--r--src/runtime/lock_sema.go304
-rw-r--r--src/runtime/lockrank.go248
-rw-r--r--src/runtime/lockrank_off.go64
-rw-r--r--src/runtime/lockrank_on.go383
-rw-r--r--src/runtime/malloc.go1452
-rw-r--r--src/runtime/malloc_test.go455
-rw-r--r--src/runtime/map.go1384
-rw-r--r--src/runtime/map_benchmark_test.go535
-rw-r--r--src/runtime/map_fast32.go461
-rw-r--r--src/runtime/map_fast64.go469
-rw-r--r--src/runtime/map_faststr.go481
-rw-r--r--src/runtime/map_test.go1241
-rw-r--r--src/runtime/mbarrier.go327
-rw-r--r--src/runtime/mbitmap.go2026
-rw-r--r--src/runtime/mcache.go313
-rw-r--r--src/runtime/mcentral.go243
-rw-r--r--src/runtime/mcheckmark.go102
-rw-r--r--src/runtime/mem_aix.go77
-rw-r--r--src/runtime/mem_bsd.go78
-rw-r--r--src/runtime/mem_darwin.go71
-rw-r--r--src/runtime/mem_js.go85
-rw-r--r--src/runtime/mem_linux.go174
-rw-r--r--src/runtime/mem_plan9.go202
-rw-r--r--src/runtime/mem_windows.go129
-rw-r--r--src/runtime/memclr_386.s138
-rw-r--r--src/runtime/memclr_amd64.s180
-rw-r--r--src/runtime/memclr_arm.s90
-rw-r--r--src/runtime/memclr_arm64.s184
-rw-r--r--src/runtime/memclr_mips64x.s99
-rw-r--r--src/runtime/memclr_mipsx.s73
-rw-r--r--src/runtime/memclr_plan9_386.s58
-rw-r--r--src/runtime/memclr_plan9_amd64.s23
-rw-r--r--src/runtime/memclr_ppc64x.s165
-rw-r--r--src/runtime/memclr_riscv64.s46
-rw-r--r--src/runtime/memclr_s390x.s124
-rw-r--r--src/runtime/memclr_wasm.s39
-rw-r--r--src/runtime/memmove_386.s203
-rw-r--r--src/runtime/memmove_amd64.s525
-rw-r--r--src/runtime/memmove_arm.s264
-rw-r--r--src/runtime/memmove_arm64.s241
-rw-r--r--src/runtime/memmove_linux_amd64_test.go61
-rw-r--r--src/runtime/memmove_mips64x.s107
-rw-r--r--src/runtime/memmove_mipsx.s260
-rw-r--r--src/runtime/memmove_plan9_386.s137
-rw-r--r--src/runtime/memmove_plan9_amd64.s135
-rw-r--r--src/runtime/memmove_ppc64x.s172
-rw-r--r--src/runtime/memmove_riscv64.s98
-rw-r--r--src/runtime/memmove_s390x.s191
-rw-r--r--src/runtime/memmove_test.go597
-rw-r--r--src/runtime/memmove_wasm.s154
-rw-r--r--src/runtime/metrics.go510
-rw-r--r--src/runtime/metrics/description.go184
-rw-r--r--src/runtime/metrics/description_test.go115
-rw-r--r--src/runtime/metrics/doc.go143
-rw-r--r--src/runtime/metrics/example_test.go96
-rw-r--r--src/runtime/metrics/histogram.go33
-rw-r--r--src/runtime/metrics/sample.go47
-rw-r--r--src/runtime/metrics/value.go69
-rw-r--r--src/runtime/metrics_test.go258
-rw-r--r--src/runtime/mfinal.go453
-rw-r--r--src/runtime/mfinal_test.go264
-rw-r--r--src/runtime/mfixalloc.go99
-rw-r--r--src/runtime/mgc.go2336
-rw-r--r--src/runtime/mgcmark.go1549
-rw-r--r--src/runtime/mgcscavenge.go953
-rw-r--r--src/runtime/mgcscavenge_test.go460
-rw-r--r--src/runtime/mgcstack.go352
-rw-r--r--src/runtime/mgcsweep.go673
-rw-r--r--src/runtime/mgcwork.go482
-rw-r--r--src/runtime/mheap.go2047
-rw-r--r--src/runtime/mkduff.go257
-rw-r--r--src/runtime/mkfastlog2table.go52
-rw-r--r--src/runtime/mkpreempt.go569
-rw-r--r--src/runtime/mksizeclasses.go327
-rw-r--r--src/runtime/mmap.go27
-rw-r--r--src/runtime/mpagealloc.go1007
-rw-r--r--src/runtime/mpagealloc_32bit.go116
-rw-r--r--src/runtime/mpagealloc_64bit.go180
-rw-r--r--src/runtime/mpagealloc_test.go1035
-rw-r--r--src/runtime/mpagecache.go173
-rw-r--r--src/runtime/mpagecache_test.go399
-rw-r--r--src/runtime/mpallocbits.go428
-rw-r--r--src/runtime/mpallocbits_test.go551
-rw-r--r--src/runtime/mprof.go893
-rw-r--r--src/runtime/mranges.go372
-rw-r--r--src/runtime/mranges_test.go275
-rw-r--r--src/runtime/msan.go61
-rw-r--r--src/runtime/msan/msan.go33
-rw-r--r--src/runtime/msan0.go23
-rw-r--r--src/runtime/msan_amd64.s89
-rw-r--r--src/runtime/msan_arm64.s73
-rw-r--r--src/runtime/msize.go25
-rw-r--r--src/runtime/mspanset.go354
-rw-r--r--src/runtime/mstats.go980
-rw-r--r--src/runtime/mwbbuf.go290
-rw-r--r--src/runtime/nbpipe_fcntl_libc_test.go18
-rw-r--r--src/runtime/nbpipe_fcntl_unix_test.go17
-rw-r--r--src/runtime/nbpipe_pipe.go19
-rw-r--r--src/runtime/nbpipe_pipe2.go22
-rw-r--r--src/runtime/nbpipe_test.go93
-rw-r--r--src/runtime/net_plan9.go29
-rw-r--r--src/runtime/netpoll.go577
-rw-r--r--src/runtime/netpoll_aix.go225
-rw-r--r--src/runtime/netpoll_epoll.go179
-rw-r--r--src/runtime/netpoll_fake.go35
-rw-r--r--src/runtime/netpoll_kqueue.go190
-rw-r--r--src/runtime/netpoll_os_test.go28
-rw-r--r--src/runtime/netpoll_solaris.go319
-rw-r--r--src/runtime/netpoll_stub.go61
-rw-r--r--src/runtime/netpoll_windows.go156
-rw-r--r--src/runtime/norace_linux_test.go41
-rw-r--r--src/runtime/norace_test.go46
-rw-r--r--src/runtime/numcpu_freebsd_test.go15
-rw-r--r--src/runtime/os2_aix.go756
-rw-r--r--src/runtime/os2_freebsd.go14
-rw-r--r--src/runtime/os2_openbsd.go14
-rw-r--r--src/runtime/os2_plan9.go74
-rw-r--r--src/runtime/os2_solaris.go13
-rw-r--r--src/runtime/os3_plan9.go167
-rw-r--r--src/runtime/os3_solaris.go619
-rw-r--r--src/runtime/os_aix.go361
-rw-r--r--src/runtime/os_android.go15
-rw-r--r--src/runtime/os_darwin.go435
-rw-r--r--src/runtime/os_darwin_arm64.go12
-rw-r--r--src/runtime/os_dragonfly.go308
-rw-r--r--src/runtime/os_freebsd.go443
-rw-r--r--src/runtime/os_freebsd2.go20
-rw-r--r--src/runtime/os_freebsd_amd64.go24
-rw-r--r--src/runtime/os_freebsd_arm.go48
-rw-r--r--src/runtime/os_freebsd_arm64.go12
-rw-r--r--src/runtime/os_freebsd_noauxv.go11
-rw-r--r--src/runtime/os_illumos.go132
-rw-r--r--src/runtime/os_js.go155
-rw-r--r--src/runtime/os_linux.go504
-rw-r--r--src/runtime/os_linux_arm.go49
-rw-r--r--src/runtime/os_linux_arm64.go25
-rw-r--r--src/runtime/os_linux_be64.go44
-rw-r--r--src/runtime/os_linux_generic.go44
-rw-r--r--src/runtime/os_linux_mips64x.go54
-rw-r--r--src/runtime/os_linux_mipsx.go48
-rw-r--r--src/runtime/os_linux_noauxv.go11
-rw-r--r--src/runtime/os_linux_novdso.go11
-rw-r--r--src/runtime/os_linux_ppc64x.go24
-rw-r--r--src/runtime/os_linux_riscv64.go7
-rw-r--r--src/runtime/os_linux_s390x.go16
-rw-r--r--src/runtime/os_linux_x86.go10
-rw-r--r--src/runtime/os_netbsd.go396
-rw-r--r--src/runtime/os_netbsd_386.go16
-rw-r--r--src/runtime/os_netbsd_amd64.go16
-rw-r--r--src/runtime/os_netbsd_arm.go34
-rw-r--r--src/runtime/os_netbsd_arm64.go23
-rw-r--r--src/runtime/os_nonopenbsd.go17
-rw-r--r--src/runtime/os_only_solaris.go18
-rw-r--r--src/runtime/os_openbsd.go274
-rw-r--r--src/runtime/os_openbsd_arm.go23
-rw-r--r--src/runtime/os_openbsd_arm64.go12
-rw-r--r--src/runtime/os_openbsd_libc.go58
-rw-r--r--src/runtime/os_openbsd_mips64.go12
-rw-r--r--src/runtime/os_openbsd_syscall.go46
-rw-r--r--src/runtime/os_openbsd_syscall1.go15
-rw-r--r--src/runtime/os_openbsd_syscall2.go95
-rw-r--r--src/runtime/os_plan9.go514
-rw-r--r--src/runtime/os_plan9_arm.go16
-rw-r--r--src/runtime/os_solaris.go266
-rw-r--r--src/runtime/os_windows.go1349
-rw-r--r--src/runtime/os_windows_arm.go22
-rw-r--r--src/runtime/panic.go1413
-rw-r--r--src/runtime/panic32.go105
-rw-r--r--src/runtime/panic_test.go48
-rw-r--r--src/runtime/plugin.go136
-rw-r--r--src/runtime/pprof/elf.go109
-rw-r--r--src/runtime/pprof/label.go108
-rw-r--r--src/runtime/pprof/label_test.go114
-rw-r--r--src/runtime/pprof/map.go90
-rw-r--r--src/runtime/pprof/mprof_test.go176
-rw-r--r--src/runtime/pprof/pprof.go945
-rw-r--r--src/runtime/pprof/pprof_norusage.go15
-rw-r--r--src/runtime/pprof/pprof_rusage.go31
-rw-r--r--src/runtime/pprof/pprof_test.go1460
-rw-r--r--src/runtime/pprof/proto.go706
-rw-r--r--src/runtime/pprof/proto_test.go436
-rw-r--r--src/runtime/pprof/protobuf.go141
-rw-r--r--src/runtime/pprof/protomem.go93
-rw-r--r--src/runtime/pprof/protomem_test.go84
-rw-r--r--src/runtime/pprof/runtime.go41
-rw-r--r--src/runtime/pprof/runtime_test.go96
-rw-r--r--src/runtime/pprof/testdata/README9
-rw-r--r--src/runtime/pprof/testdata/mappingtest/main.go108
-rwxr-xr-xsrc/runtime/pprof/testdata/test32bin0 -> 528 bytes
-rwxr-xr-xsrc/runtime/pprof/testdata/test32bebin0 -> 520 bytes
-rwxr-xr-xsrc/runtime/pprof/testdata/test64bin0 -> 760 bytes
-rwxr-xr-xsrc/runtime/pprof/testdata/test64bebin0 -> 856 bytes
-rw-r--r--src/runtime/preempt.go456
-rw-r--r--src/runtime/preempt_386.s50
-rw-r--r--src/runtime/preempt_amd64.s85
-rw-r--r--src/runtime/preempt_arm.s84
-rw-r--r--src/runtime/preempt_arm64.s148
-rw-r--r--src/runtime/preempt_mips64x.s146
-rw-r--r--src/runtime/preempt_mipsx.s146
-rw-r--r--src/runtime/preempt_nonwindows.go13
-rw-r--r--src/runtime/preempt_ppc64x.s148
-rw-r--r--src/runtime/preempt_riscv64.s130
-rw-r--r--src/runtime/preempt_s390x.s52
-rw-r--r--src/runtime/preempt_wasm.s9
-rw-r--r--src/runtime/print.go312
-rw-r--r--src/runtime/proc.go6336
-rw-r--r--src/runtime/proc_runtime_test.go33
-rw-r--r--src/runtime/proc_test.go1090
-rw-r--r--src/runtime/profbuf.go561
-rw-r--r--src/runtime/profbuf_test.go182
-rw-r--r--src/runtime/proflabel.go40
-rw-r--r--src/runtime/race.go644
-rw-r--r--src/runtime/race/README14
-rw-r--r--src/runtime/race/doc.go9
-rw-r--r--src/runtime/race/output_test.go412
-rw-r--r--src/runtime/race/race.go15
-rw-r--r--src/runtime/race/race_darwin_amd64.sysobin0 -> 451280 bytes
-rw-r--r--src/runtime/race/race_darwin_arm64.sysobin0 -> 438936 bytes
-rw-r--r--src/runtime/race/race_freebsd_amd64.sysobin0 -> 583264 bytes
-rw-r--r--src/runtime/race/race_linux_amd64.sysobin0 -> 525176 bytes
-rw-r--r--src/runtime/race/race_linux_arm64.sysobin0 -> 505224 bytes
-rw-r--r--src/runtime/race/race_linux_ppc64le.sysobin0 -> 624648 bytes
-rw-r--r--src/runtime/race/race_linux_test.go37
-rw-r--r--src/runtime/race/race_netbsd_amd64.sysobin0 -> 609424 bytes
-rw-r--r--src/runtime/race/race_test.go250
-rw-r--r--src/runtime/race/race_unix_test.go30
-rw-r--r--src/runtime/race/race_windows_amd64.sysobin0 -> 461185 bytes
-rw-r--r--src/runtime/race/race_windows_test.go46
-rw-r--r--src/runtime/race/sched_test.go48
-rw-r--r--src/runtime/race/syso_test.go39
-rw-r--r--src/runtime/race/testdata/atomic_test.go325
-rw-r--r--src/runtime/race/testdata/cgo_test.go21
-rw-r--r--src/runtime/race/testdata/cgo_test_main.go30
-rw-r--r--src/runtime/race/testdata/chan_test.go787
-rw-r--r--src/runtime/race/testdata/comp_test.go186
-rw-r--r--src/runtime/race/testdata/finalizer_test.go68
-rw-r--r--src/runtime/race/testdata/io_test.go75
-rw-r--r--src/runtime/race/testdata/issue12225_test.go20
-rw-r--r--src/runtime/race/testdata/issue12664_test.go76
-rw-r--r--src/runtime/race/testdata/issue13264_test.go13
-rw-r--r--src/runtime/race/testdata/map_test.go335
-rw-r--r--src/runtime/race/testdata/mop_test.go2082
-rw-r--r--src/runtime/race/testdata/mutex_test.go143
-rw-r--r--src/runtime/race/testdata/pool_test.go47
-rw-r--r--src/runtime/race/testdata/reflect_test.go46
-rw-r--r--src/runtime/race/testdata/regression_test.go189
-rw-r--r--src/runtime/race/testdata/rwmutex_test.go154
-rw-r--r--src/runtime/race/testdata/select_test.go293
-rw-r--r--src/runtime/race/testdata/slice_test.go608
-rw-r--r--src/runtime/race/testdata/sync_test.go202
-rw-r--r--src/runtime/race/testdata/waitgroup_test.go360
-rw-r--r--src/runtime/race/timer_test.go33
-rw-r--r--src/runtime/race0.go44
-rw-r--r--src/runtime/race_amd64.s486
-rw-r--r--src/runtime/race_arm64.s498
-rw-r--r--src/runtime/race_ppc64le.s608
-rw-r--r--src/runtime/rand_test.go45
-rw-r--r--src/runtime/rdebug.go22
-rw-r--r--src/runtime/relax_stub.go17
-rw-r--r--src/runtime/rt0_aix_ppc64.s199
-rw-r--r--src/runtime/rt0_android_386.s27
-rw-r--r--src/runtime/rt0_android_amd64.s22
-rw-r--r--src/runtime/rt0_android_arm.s25
-rw-r--r--src/runtime/rt0_android_arm64.s26
-rw-r--r--src/runtime/rt0_darwin_amd64.s13
-rw-r--r--src/runtime/rt0_darwin_arm64.s94
-rw-r--r--src/runtime/rt0_dragonfly_amd64.s14
-rw-r--r--src/runtime/rt0_freebsd_386.s17
-rw-r--r--src/runtime/rt0_freebsd_amd64.s14
-rw-r--r--src/runtime/rt0_freebsd_arm.s11
-rw-r--r--src/runtime/rt0_freebsd_arm64.s105
-rw-r--r--src/runtime/rt0_illumos_amd64.s11
-rw-r--r--src/runtime/rt0_ios_amd64.s14
-rw-r--r--src/runtime/rt0_ios_arm64.s14
-rw-r--r--src/runtime/rt0_js_wasm.s107
-rw-r--r--src/runtime/rt0_linux_386.s17
-rw-r--r--src/runtime/rt0_linux_amd64.s11
-rw-r--r--src/runtime/rt0_linux_arm.s33
-rw-r--r--src/runtime/rt0_linux_arm64.s104
-rw-r--r--src/runtime/rt0_linux_mips64x.s39
-rw-r--r--src/runtime/rt0_linux_mipsx.s28
-rw-r--r--src/runtime/rt0_linux_ppc64.s34
-rw-r--r--src/runtime/rt0_linux_ppc64le.s174
-rw-r--r--src/runtime/rt0_linux_riscv64.s14
-rw-r--r--src/runtime/rt0_linux_s390x.s23
-rw-r--r--src/runtime/rt0_netbsd_386.s17
-rw-r--r--src/runtime/rt0_netbsd_amd64.s11
-rw-r--r--src/runtime/rt0_netbsd_arm.s11
-rw-r--r--src/runtime/rt0_netbsd_arm64.s102
-rw-r--r--src/runtime/rt0_openbsd_386.s17
-rw-r--r--src/runtime/rt0_openbsd_amd64.s11
-rw-r--r--src/runtime/rt0_openbsd_arm.s11
-rw-r--r--src/runtime/rt0_openbsd_arm64.s110
-rw-r--r--src/runtime/rt0_openbsd_mips64.s36
-rw-r--r--src/runtime/rt0_plan9_386.s21
-rw-r--r--src/runtime/rt0_plan9_amd64.s19
-rw-r--r--src/runtime/rt0_plan9_arm.s15
-rw-r--r--src/runtime/rt0_solaris_amd64.s11
-rw-r--r--src/runtime/rt0_windows_386.s47
-rw-r--r--src/runtime/rt0_windows_amd64.s43
-rw-r--r--src/runtime/rt0_windows_arm.s12
-rw-r--r--src/runtime/runtime-gdb.py606
-rw-r--r--src/runtime/runtime-gdb_test.go749
-rw-r--r--src/runtime/runtime-lldb_test.go189
-rw-r--r--src/runtime/runtime.go65
-rw-r--r--src/runtime/runtime1.go544
-rw-r--r--src/runtime/runtime2.go1105
-rw-r--r--src/runtime/runtime_linux_test.go63
-rw-r--r--src/runtime/runtime_mmap_test.go53
-rw-r--r--src/runtime/runtime_test.go364
-rw-r--r--src/runtime/runtime_unix_test.go56
-rw-r--r--src/runtime/rwmutex.go125
-rw-r--r--src/runtime/rwmutex_test.go186
-rw-r--r--src/runtime/select.go616
-rw-r--r--src/runtime/sema.go617
-rw-r--r--src/runtime/sema_test.go103
-rw-r--r--src/runtime/semasleep_test.go65
-rw-r--r--src/runtime/sigaction.go16
-rw-r--r--src/runtime/signal_386.go58
-rw-r--r--src/runtime/signal_aix_ppc64.go85
-rw-r--r--src/runtime/signal_amd64.go83
-rw-r--r--src/runtime/signal_arm.go78
-rw-r--r--src/runtime/signal_arm64.go94
-rw-r--r--src/runtime/signal_darwin.go40
-rw-r--r--src/runtime/signal_darwin_amd64.go92
-rw-r--r--src/runtime/signal_darwin_arm64.go90
-rw-r--r--src/runtime/signal_dragonfly.go41
-rw-r--r--src/runtime/signal_dragonfly_amd64.go51
-rw-r--r--src/runtime/signal_freebsd.go41
-rw-r--r--src/runtime/signal_freebsd_386.go41
-rw-r--r--src/runtime/signal_freebsd_amd64.go51
-rw-r--r--src/runtime/signal_freebsd_arm.go55
-rw-r--r--src/runtime/signal_freebsd_arm64.go66
-rw-r--r--src/runtime/signal_linux_386.go46
-rw-r--r--src/runtime/signal_linux_amd64.go56
-rw-r--r--src/runtime/signal_linux_arm.go58
-rw-r--r--src/runtime/signal_linux_arm64.go71
-rw-r--r--src/runtime/signal_linux_mips64x.go78
-rw-r--r--src/runtime/signal_linux_mipsx.go65
-rw-r--r--src/runtime/signal_linux_ppc64x.go82
-rw-r--r--src/runtime/signal_linux_riscv64.go68
-rw-r--r--src/runtime/signal_linux_s390x.go125
-rw-r--r--src/runtime/signal_mips64x.go100
-rw-r--r--src/runtime/signal_mipsx.go95
-rw-r--r--src/runtime/signal_netbsd.go41
-rw-r--r--src/runtime/signal_netbsd_386.go45
-rw-r--r--src/runtime/signal_netbsd_amd64.go55
-rw-r--r--src/runtime/signal_netbsd_arm.go55
-rw-r--r--src/runtime/signal_netbsd_arm64.go73
-rw-r--r--src/runtime/signal_openbsd.go41
-rw-r--r--src/runtime/signal_openbsd_386.go47
-rw-r--r--src/runtime/signal_openbsd_amd64.go55
-rw-r--r--src/runtime/signal_openbsd_arm.go59
-rw-r--r--src/runtime/signal_openbsd_arm64.go75
-rw-r--r--src/runtime/signal_openbsd_mips64.go78
-rw-r--r--src/runtime/signal_plan9.go57
-rw-r--r--src/runtime/signal_ppc64x.go111
-rw-r--r--src/runtime/signal_riscv64.go93
-rw-r--r--src/runtime/signal_solaris.go83
-rw-r--r--src/runtime/signal_solaris_amd64.go53
-rw-r--r--src/runtime/signal_unix.go1232
-rw-r--r--src/runtime/signal_windows.go310
-rw-r--r--src/runtime/signal_windows_test.go215
-rw-r--r--src/runtime/sigqueue.go294
-rw-r--r--src/runtime/sigqueue_note.go25
-rw-r--r--src/runtime/sigqueue_plan9.go163
-rw-r--r--src/runtime/sigtab_aix.go264
-rw-r--r--src/runtime/sigtab_linux_generic.go79
-rw-r--r--src/runtime/sigtab_linux_mipsx.go140
-rw-r--r--src/runtime/sizeclasses.go96
-rw-r--r--src/runtime/sizeof_test.go38
-rw-r--r--src/runtime/slice.go280
-rw-r--r--src/runtime/slice_test.go501
-rw-r--r--src/runtime/softfloat64.go597
-rw-r--r--src/runtime/softfloat64_test.go198
-rw-r--r--src/runtime/stack.go1335
-rw-r--r--src/runtime/stack_test.go894
-rw-r--r--src/runtime/string.go485
-rw-r--r--src/runtime/string_test.go456
-rw-r--r--src/runtime/stubs.go359
-rw-r--r--src/runtime/stubs2.go41
-rw-r--r--src/runtime/stubs3.go14
-rw-r--r--src/runtime/stubs_386.go17
-rw-r--r--src/runtime/stubs_amd64.go37
-rw-r--r--src/runtime/stubs_arm.go20
-rw-r--r--src/runtime/stubs_arm64.go9
-rw-r--r--src/runtime/stubs_linux.go19
-rw-r--r--src/runtime/stubs_mips64x.go11
-rw-r--r--src/runtime/stubs_mipsx.go11
-rw-r--r--src/runtime/stubs_nonlinux.go12
-rw-r--r--src/runtime/stubs_ppc64x.go12
-rw-r--r--src/runtime/stubs_s390x.go9
-rw-r--r--src/runtime/symtab.go1041
-rw-r--r--src/runtime/symtab_test.go252
-rw-r--r--src/runtime/sys_aix_ppc64.s315
-rw-r--r--src/runtime/sys_arm.go21
-rw-r--r--src/runtime/sys_arm64.go18
-rw-r--r--src/runtime/sys_darwin.go463
-rw-r--r--src/runtime/sys_darwin_amd64.s870
-rw-r--r--src/runtime/sys_darwin_arm64.go62
-rw-r--r--src/runtime/sys_darwin_arm64.s757
-rw-r--r--src/runtime/sys_dragonfly_amd64.s407
-rw-r--r--src/runtime/sys_freebsd_386.s472
-rw-r--r--src/runtime/sys_freebsd_amd64.s510
-rw-r--r--src/runtime/sys_freebsd_arm.s475
-rw-r--r--src/runtime/sys_freebsd_arm64.s523
-rw-r--r--src/runtime/sys_libc.go53
-rw-r--r--src/runtime/sys_linux_386.s808
-rw-r--r--src/runtime/sys_linux_amd64.s791
-rw-r--r--src/runtime/sys_linux_arm.s743
-rw-r--r--src/runtime/sys_linux_arm64.s767
-rw-r--r--src/runtime/sys_linux_mips64x.s645
-rw-r--r--src/runtime/sys_linux_mipsx.s571
-rw-r--r--src/runtime/sys_linux_ppc64x.s769
-rw-r--r--src/runtime/sys_linux_riscv64.s515
-rw-r--r--src/runtime/sys_linux_s390x.s508
-rw-r--r--src/runtime/sys_mips64x.go20
-rw-r--r--src/runtime/sys_mipsx.go20
-rw-r--r--src/runtime/sys_netbsd_386.s499
-rw-r--r--src/runtime/sys_netbsd_amd64.s468
-rw-r--r--src/runtime/sys_netbsd_arm.s441
-rw-r--r--src/runtime/sys_netbsd_arm64.s477
-rw-r--r--src/runtime/sys_nonppc64x.go10
-rw-r--r--src/runtime/sys_openbsd.go60
-rw-r--r--src/runtime/sys_openbsd1.go34
-rw-r--r--src/runtime/sys_openbsd2.go250
-rw-r--r--src/runtime/sys_openbsd3.go113
-rw-r--r--src/runtime/sys_openbsd_386.s461
-rw-r--r--src/runtime/sys_openbsd_amd64.s788
-rw-r--r--src/runtime/sys_openbsd_arm.s435
-rw-r--r--src/runtime/sys_openbsd_arm64.s700
-rw-r--r--src/runtime/sys_openbsd_mips64.s400
-rw-r--r--src/runtime/sys_plan9_386.s252
-rw-r--r--src/runtime/sys_plan9_amd64.s253
-rw-r--r--src/runtime/sys_plan9_arm.s320
-rw-r--r--src/runtime/sys_ppc64x.go22
-rw-r--r--src/runtime/sys_riscv64.go18
-rw-r--r--src/runtime/sys_s390x.go18
-rw-r--r--src/runtime/sys_solaris_amd64.s320
-rw-r--r--src/runtime/sys_wasm.go42
-rw-r--r--src/runtime/sys_wasm.s202
-rw-r--r--src/runtime/sys_windows_386.s573
-rw-r--r--src/runtime/sys_windows_amd64.s579
-rw-r--r--src/runtime/sys_windows_arm.s694
-rw-r--r--src/runtime/sys_x86.go27
-rw-r--r--src/runtime/syscall2_solaris.go43
-rw-r--r--src/runtime/syscall_aix.go226
-rw-r--r--src/runtime/syscall_solaris.go319
-rw-r--r--src/runtime/syscall_windows.go397
-rw-r--r--src/runtime/syscall_windows_test.go1223
-rw-r--r--src/runtime/testdata/testfaketime/faketime.go28
-rw-r--r--src/runtime/testdata/testprog/abort.go23
-rw-r--r--src/runtime/testdata/testprog/badtraceback.go47
-rw-r--r--src/runtime/testdata/testprog/checkptr.go51
-rw-r--r--src/runtime/testdata/testprog/crash.go66
-rw-r--r--src/runtime/testdata/testprog/deadlock.go363
-rw-r--r--src/runtime/testdata/testprog/gc.go302
-rw-r--r--src/runtime/testdata/testprog/lockosthread.go246
-rw-r--r--src/runtime/testdata/testprog/main.go35
-rw-r--r--src/runtime/testdata/testprog/map.go77
-rw-r--r--src/runtime/testdata/testprog/memprof.go51
-rw-r--r--src/runtime/testdata/testprog/misc.go15
-rw-r--r--src/runtime/testdata/testprog/numcpu_freebsd.go141
-rw-r--r--src/runtime/testdata/testprog/panicprint.go111
-rw-r--r--src/runtime/testdata/testprog/panicrace.go27
-rw-r--r--src/runtime/testdata/testprog/preempt.go71
-rw-r--r--src/runtime/testdata/testprog/signal.go29
-rw-r--r--src/runtime/testdata/testprog/sleep.go17
-rw-r--r--src/runtime/testdata/testprog/stringconcat.go20
-rw-r--r--src/runtime/testdata/testprog/syscall_windows.go70
-rw-r--r--src/runtime/testdata/testprog/syscalls.go11
-rw-r--r--src/runtime/testdata/testprog/syscalls_linux.go58
-rw-r--r--src/runtime/testdata/testprog/syscalls_none.go27
-rw-r--r--src/runtime/testdata/testprog/timeprof.go45
-rw-r--r--src/runtime/testdata/testprog/traceback_ancestors.go99
-rw-r--r--src/runtime/testdata/testprog/vdso.go54
-rw-r--r--src/runtime/testdata/testprogcgo/aprof.go53
-rw-r--r--src/runtime/testdata/testprogcgo/bigstack_windows.c46
-rw-r--r--src/runtime/testdata/testprogcgo/bigstack_windows.go27
-rw-r--r--src/runtime/testdata/testprogcgo/callback.go93
-rw-r--r--src/runtime/testdata/testprogcgo/catchpanic.go46
-rw-r--r--src/runtime/testdata/testprogcgo/cgo.go108
-rw-r--r--src/runtime/testdata/testprogcgo/crash.go45
-rw-r--r--src/runtime/testdata/testprogcgo/deadlock.go30
-rw-r--r--src/runtime/testdata/testprogcgo/dll_windows.go25
-rw-r--r--src/runtime/testdata/testprogcgo/dropm.go59
-rw-r--r--src/runtime/testdata/testprogcgo/dropm_stub.go11
-rw-r--r--src/runtime/testdata/testprogcgo/eintr.go245
-rw-r--r--src/runtime/testdata/testprogcgo/exec.go106
-rw-r--r--src/runtime/testdata/testprogcgo/lockosthread.c13
-rw-r--r--src/runtime/testdata/testprogcgo/lockosthread.go111
-rw-r--r--src/runtime/testdata/testprogcgo/main.go35
-rw-r--r--src/runtime/testdata/testprogcgo/needmdeadlock.go95
-rw-r--r--src/runtime/testdata/testprogcgo/numgoroutine.go92
-rw-r--r--src/runtime/testdata/testprogcgo/pprof.go102
-rw-r--r--src/runtime/testdata/testprogcgo/raceprof.go78
-rw-r--r--src/runtime/testdata/testprogcgo/racesig.go102
-rw-r--r--src/runtime/testdata/testprogcgo/segv.go56
-rw-r--r--src/runtime/testdata/testprogcgo/sigpanic.go28
-rw-r--r--src/runtime/testdata/testprogcgo/sigstack.go98
-rw-r--r--src/runtime/testdata/testprogcgo/stack_windows.go54
-rw-r--r--src/runtime/testdata/testprogcgo/threadpanic.go24
-rw-r--r--src/runtime/testdata/testprogcgo/threadpanic_unix.c26
-rw-r--r--src/runtime/testdata/testprogcgo/threadpanic_windows.c23
-rw-r--r--src/runtime/testdata/testprogcgo/threadpprof.go126
-rw-r--r--src/runtime/testdata/testprogcgo/threadprof.go98
-rw-r--r--src/runtime/testdata/testprogcgo/traceback.go54
-rw-r--r--src/runtime/testdata/testprogcgo/traceback_c.c65
-rw-r--r--src/runtime/testdata/testprogcgo/tracebackctxt.go107
-rw-r--r--src/runtime/testdata/testprogcgo/tracebackctxt_c.c91
-rw-r--r--src/runtime/testdata/testprogcgo/windows/win.go16
-rw-r--r--src/runtime/testdata/testprognet/main.go35
-rw-r--r--src/runtime/testdata/testprognet/net.go29
-rw-r--r--src/runtime/testdata/testprognet/signal.go26
-rw-r--r--src/runtime/testdata/testprognet/signalexec.go70
-rw-r--r--src/runtime/testdata/testwinlib/main.c57
-rw-r--r--src/runtime/testdata/testwinlib/main.go28
-rw-r--r--src/runtime/testdata/testwinlibsignal/dummy.go10
-rw-r--r--src/runtime/testdata/testwinlibsignal/main.c50
-rw-r--r--src/runtime/testdata/testwinsignal/main.go19
-rw-r--r--src/runtime/textflag.h37
-rw-r--r--src/runtime/time.go1127
-rw-r--r--src/runtime/time_fake.go100
-rw-r--r--src/runtime/time_nofake.go31
-rw-r--r--src/runtime/time_test.go97
-rw-r--r--src/runtime/timeasm.go14
-rw-r--r--src/runtime/timestub.go18
-rw-r--r--src/runtime/timestub2.go14
-rw-r--r--src/runtime/tls_arm.s98
-rw-r--r--src/runtime/tls_arm64.h48
-rw-r--r--src/runtime/tls_arm64.s58
-rw-r--r--src/runtime/tls_mips64x.s30
-rw-r--r--src/runtime/tls_mipsx.s29
-rw-r--r--src/runtime/tls_ppc64x.s51
-rw-r--r--src/runtime/tls_riscv64.s30
-rw-r--r--src/runtime/tls_s390x.s51
-rw-r--r--src/runtime/trace.go1231
-rw-r--r--src/runtime/trace/annotation.go200
-rw-r--r--src/runtime/trace/annotation_test.go156
-rw-r--r--src/runtime/trace/example_test.go39
-rw-r--r--src/runtime/trace/trace.go153
-rw-r--r--src/runtime/trace/trace_stack_test.go333
-rw-r--r--src/runtime/trace/trace_test.go591
-rw-r--r--src/runtime/traceback.go1346
-rw-r--r--src/runtime/type.go733
-rw-r--r--src/runtime/typekind.go43
-rw-r--r--src/runtime/utf8.go132
-rw-r--r--src/runtime/vdso_elf32.go80
-rw-r--r--src/runtime/vdso_elf64.go80
-rw-r--r--src/runtime/vdso_freebsd.go114
-rw-r--r--src/runtime/vdso_freebsd_arm.go21
-rw-r--r--src/runtime/vdso_freebsd_arm64.go21
-rw-r--r--src/runtime/vdso_freebsd_x86.go93
-rw-r--r--src/runtime/vdso_in_none.go13
-rw-r--r--src/runtime/vdso_linux.go293
-rw-r--r--src/runtime/vdso_linux_386.go21
-rw-r--r--src/runtime/vdso_linux_amd64.go23
-rw-r--r--src/runtime/vdso_linux_arm.go21
-rw-r--r--src/runtime/vdso_linux_arm64.go21
-rw-r--r--src/runtime/vdso_linux_mips64x.go28
-rw-r--r--src/runtime/vdso_linux_ppc64x.go25
-rw-r--r--src/runtime/vlop_386.s56
-rw-r--r--src/runtime/vlop_arm.s260
-rw-r--r--src/runtime/vlop_arm_test.go128
-rw-r--r--src/runtime/vlrt.go277
-rw-r--r--src/runtime/wincallback.go95
-rw-r--r--src/runtime/write_err.go13
-rw-r--r--src/runtime/write_err_android.go162
-rw-r--r--src/runtime/zcallback_windows.go5
-rw-r--r--src/runtime/zcallback_windows.s2012
-rw-r--r--src/runtime/zcallback_windows_arm.s4012
916 files changed, 184295 insertions, 0 deletions
diff --git a/src/runtime/HACKING.md b/src/runtime/HACKING.md
new file mode 100644
index 0000000..fbf22ee
--- /dev/null
+++ b/src/runtime/HACKING.md
@@ -0,0 +1,312 @@
+This is a living document and at times it will be out of date. It is
+intended to articulate how programming in the Go runtime differs from
+writing normal Go. It focuses on pervasive concepts rather than
+details of particular interfaces.
+
+Scheduler structures
+====================
+
+The scheduler manages three types of resources that pervade the
+runtime: Gs, Ms, and Ps. It's important to understand these even if
+you're not working on the scheduler.
+
+Gs, Ms, Ps
+----------
+
+A "G" is simply a goroutine. It's represented by type `g`. When a
+goroutine exits, its `g` object is returned to a pool of free `g`s and
+can later be reused for some other goroutine.
+
+An "M" is an OS thread that can be executing user Go code, runtime
+code, a system call, or be idle. It's represented by type `m`. There
+can be any number of Ms at a time since any number of threads may be
+blocked in system calls.
+
+Finally, a "P" represents the resources required to execute user Go
+code, such as scheduler and memory allocator state. It's represented
+by type `p`. There are exactly `GOMAXPROCS` Ps. A P can be thought of
+as a CPU in the OS scheduler and the contents of the `p` type as
+per-CPU state. This is a good place to put state that needs to be
+sharded for efficiency, but doesn't need to be per-thread or
+per-goroutine.
+
+The scheduler's job is to match up a G (the code to execute), an M
+(where to execute it), and a P (the rights and resources to execute
+it). When an M stops executing user Go code, for example by entering a
+system call, it returns its P to the idle P pool. In order to resume
+executing user Go code, for example on return from a system call, it
+must acquire a P from the idle pool.
+
+All `g`, `m`, and `p` objects are heap allocated, but are never freed,
+so their memory remains type stable. As a result, the runtime can
+avoid write barriers in the depths of the scheduler.
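+
+As a rough, hedged sketch (not the real definitions, which live in
+`runtime2.go` and carry many more fields), the relationship between the
+three types looks like this:
+
+```go
+// Heavily trimmed sketch of the scheduler types; the field names match
+// the real structs, but almost everything is omitted.
+type g struct {
+	stack        stack  // user stack bounds
+	m            *m     // current m; nil if not executing
+	sched        gobuf  // saved scheduling state (SP, PC, ...)
+	atomicstatus uint32 // _Grunnable, _Grunning, _Gwaiting, ...
+}
+
+type m struct {
+	g0   *g       // goroutine with scheduling stack (the system stack)
+	curg *g       // current user goroutine, if any
+	p    puintptr // attached P for executing Go code (0 if idle or in a syscall)
+}
+
+type p struct {
+	m      muintptr      // back-link to associated M (0 if idle)
+	mcache *mcache       // per-P allocation cache
+	runq   [256]guintptr // local queue of runnable Gs
+}
+```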
+
+User stacks and system stacks
+-----------------------------
+
+Every non-dead G has a *user stack* associated with it, which is what
+user Go code executes on. User stacks start small (e.g., 2K) and grow
+or shrink dynamically.
+
+Every M has a *system stack* associated with it (also known as the M's
+"g0" stack because it's implemented as a stub G) and, on Unix
+platforms, a *signal stack* (also known as the M's "gsignal" stack).
+System and signal stacks cannot grow, but are large enough to execute
+runtime and cgo code (8K in a pure Go binary; system-allocated in a
+cgo binary).
+
+Runtime code often temporarily switches to the system stack using
+`systemstack`, `mcall`, or `asmcgocall` to perform tasks that must not
+be preempted, that must not grow the user stack, or that switch user
+goroutines. Code running on the system stack is implicitly
+non-preemptible and the garbage collector does not scan system stacks.
+While running on the system stack, the current user stack is not used
+for execution.
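+
+For example, a runtime function that must do a small amount of
+non-preemptible work often wraps just that part in `systemstack`. A
+minimal sketch (`updateSchedulerState` is a hypothetical helper):
+
+```go
+// Sketch: run a non-preemptible chunk of work on the current M's g0
+// stack, then return to the user stack.
+func doSomethingCritical() {
+	systemstack(func() {
+		// Runs on the system stack: implicitly non-preemptible, must
+		// not grow the stack, and is not scanned by the GC.
+		updateSchedulerState() // hypothetical
+	})
+	// Back on the user goroutine's stack (if we started on one).
+}
+```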
+
+`getg()` and `getg().m.curg`
+----------------------------
+
+To get the current user `g`, use `getg().m.curg`.
+
+`getg()` alone returns the current `g`, but when executing on the
+system or signal stacks, this will return the current M's "g0" or
+"gsignal", respectively. This is usually not what you want.
+
+To determine if you're running on the user stack or the system stack,
+use `getg() == getg().m.curg`.
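+
+A common idiom, sketched here, is to check which stack the code is on
+before doing anything stack-sensitive:
+
+```go
+// Sketch: distinguish the current user G from the g being executed on.
+gp := getg()
+if gp == gp.m.curg {
+	// Running on the user stack.
+} else {
+	// Running on the g0 or gsignal stack; gp.m.curg (possibly nil) is
+	// the user goroutine this M was running, if any.
+}
+```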
+
+Error handling and reporting
+============================
+
+Errors that can reasonably be recovered from in user code should use
+`panic` like usual. However, there are some situations where `panic`
+will cause an immediate fatal error, such as when called on the system
+stack or when called during `mallocgc`.
+
+Most errors in the runtime are not recoverable. For these, use
+`throw`, which dumps the traceback and immediately terminates the
+process. In general, `throw` should be passed a string constant to
+avoid allocating in perilous situations. By convention, additional
+details are printed before `throw` using `print` or `println` and the
+messages are prefixed with "runtime:".
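+
+A typical shape for this convention, as a sketch (the condition and the
+variables are hypothetical):
+
+```go
+// Print the interesting details first, then throw with a constant
+// string so the throw itself does not allocate.
+if s == nil {
+	print("runtime: bad pointer ", hex(p), "\n")
+	throw("found bad pointer") // dumps a traceback and aborts
+}
+```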
+
+For runtime error debugging, it's useful to run with
+`GOTRACEBACK=system` or `GOTRACEBACK=crash`.
+
+Synchronization
+===============
+
+The runtime has multiple synchronization mechanisms. They differ in
+semantics and, in particular, in whether they interact with the
+goroutine scheduler or the OS scheduler.
+
+The simplest is `mutex`, which is manipulated using `lock` and
+`unlock`. This should be used to protect shared structures for short
+periods. Blocking on a `mutex` directly blocks the M, without
+interacting with the Go scheduler. This means it is safe to use from
+the lowest levels of the runtime, but also prevents any associated G
+and P from being rescheduled. `rwmutex` is similar.
+
+For one-shot notifications, use `note`, which provides `notesleep` and
+`notewakeup`. Unlike traditional UNIX `sleep`/`wakeup`, `note`s are
+race-free, so `notesleep` returns immediately if the `notewakeup` has
+already happened. A `note` can be reset after use with `noteclear`,
+which must not race with a sleep or wakeup. Like `mutex`, blocking on
+a `note` blocks the M. However, there are different ways to sleep on a
+`note`: `notesleep` also prevents rescheduling of any associated G and
+P, while `notetsleepg` acts like a blocking system call that allows
+the P to be reused to run another G. This is still less efficient than
+blocking the G directly since it consumes an M.
+
+To interact directly with the goroutine scheduler, use `gopark` and
+`goready`. `gopark` parks the current goroutine—putting it in the
+"waiting" state and removing it from the scheduler's run queue—and
+schedules another goroutine on the current M/P. `goready` puts a
+parked goroutine back in the "runnable" state and adds it to the run
+queue.
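+
+As a hedged sketch (the `waiter` type and its methods are made up for
+illustration), parking and readying typically pair a `mutex` with
+`gopark`/`goready`:
+
+```go
+// Hypothetical waiter, showing the gopark/goready pairing.
+type waiter struct {
+	lock mutex
+	gp   *g
+}
+
+// wait parks the current goroutine until wake is called.
+func (w *waiter) wait() {
+	lock(&w.lock)
+	w.gp = getg()
+	// goparkunlock releases w.lock only after the G is parked, so wake
+	// cannot observe w.gp until the park has committed.
+	goparkunlock(&w.lock, waitReasonZero, traceEvGoBlock, 1)
+}
+
+// wake makes the parked goroutine, if any, runnable again.
+func (w *waiter) wake() {
+	lock(&w.lock)
+	gp := w.gp
+	w.gp = nil
+	unlock(&w.lock)
+	if gp != nil {
+		goready(gp, 1) // back to "runnable", added to a run queue
+	}
+}
+```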
+
+In summary,
+
+<table>
+<tr><th></th><th colspan="3">Blocks</th></tr>
+<tr><th>Interface</th><th>G</th><th>M</th><th>P</th></tr>
+<tr><td>(rw)mutex</td><td>Y</td><td>Y</td><td>Y</td></tr>
+<tr><td>note</td><td>Y</td><td>Y</td><td>Y/N</td></tr>
+<tr><td>park</td><td>Y</td><td>N</td><td>N</td></tr>
+</table>
+
+Atomics
+=======
+
+The runtime uses its own atomics package at `runtime/internal/atomic`.
+This corresponds to `sync/atomic`, but functions have different names
+for historical reasons and there are a few additional functions needed
+by the runtime.
+
+In general, we think hard about the uses of atomics in the runtime and
+try to avoid unnecessary atomic operations. If access to a variable is
+sometimes protected by another synchronization mechanism, the
+already-protected accesses generally don't need to be atomic. There
+are several reasons for this:
+
+1. Using non-atomic or atomic access where appropriate makes the code
+ more self-documenting. Atomic access to a variable implies there's
+ somewhere else that may concurrently access the variable.
+
+2. Non-atomic access allows for automatic race detection. The runtime
+ doesn't currently have a race detector, but it may in the future.
+ Atomic access defeats the race detector, while non-atomic access
+ allows the race detector to check your assumptions.
+
+3. Non-atomic access may improve performance.
+
+Of course, any non-atomic access to a shared variable should be
+documented to explain how that access is protected.
+
+Some common patterns that mix atomic and non-atomic access are:
+
+* Read-mostly variables where updates are protected by a lock. Within
+  the locked region, reads do not need to be atomic, but the write
+  does. Outside the locked region, reads need to be atomic. (See the
+  sketch after this list.)
+
+* Reads that only happen during STW, where no writes can happen during
+ STW, do not need to be atomic.
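+
+As a sketch of the first pattern (all names here are hypothetical):
+writers serialize on a lock and store atomically, while readers outside
+the lock load atomically:
+
+```go
+// Assumes: import "runtime/internal/atomic" at the top of the file.
+var (
+	statLock mutex  // protects writers of stat
+	stat     uint32 // written under statLock, read lock-free elsewhere
+)
+
+// Writers cannot race each other thanks to the lock, but the store must
+// still be atomic because of the lock-free readers.
+func setStat(v uint32) {
+	lock(&statLock)
+	atomic.Store(&stat, v)
+	unlock(&statLock)
+}
+
+// Lock-free readers must use an atomic load.
+func readStat() uint32 {
+	return atomic.Load(&stat)
+}
+```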
+
+That said, the advice from the Go memory model stands: "Don't be
+[too] clever." The performance of the runtime matters, but its
+robustness matters more.
+
+Unmanaged memory
+================
+
+In general, the runtime tries to use regular heap allocation. However,
+in some cases the runtime must allocate objects outside of the garbage
+collected heap, in *unmanaged memory*. This is necessary if the
+objects are part of the memory manager itself or if they must be
+allocated in situations where the caller may not have a P.
+
+There are three mechanisms for allocating unmanaged memory:
+
+* sysAlloc obtains memory directly from the OS. This comes in whole
+  multiples of the system page size and it can be freed with sysFree.
+
+* persistentalloc combines multiple smaller allocations into a single
+ sysAlloc to avoid fragmentation. However, there is no way to free
+ persistentalloced objects (hence the name).
+
+* fixalloc is a SLAB-style allocator that allocates objects of a fixed
+ size. fixalloced objects can be freed, but this memory can only be
+ reused by the same fixalloc pool, so it can only be reused for
+ objects of the same type.
+
+In general, types that are allocated using any of these should be
+marked `//go:notinheap` (see below).
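+
+A hedged sketch, modeled loosely on how `mheap` uses `fixalloc` for
+`mspan`s (the `myNode` type and its helpers are hypothetical, and the
+one-time `fixalloc` initialization is omitted because its exact
+arguments vary between Go versions):
+
+```go
+//go:notinheap
+type myNode struct { // hypothetical runtime-internal structure
+	next *myNode // fine: points into unmanaged memory, not the GC'd heap
+	val  uintptr
+}
+
+var (
+	nodeAllocLock mutex
+	nodeAlloc     fixalloc // must be initialized once before first use
+)
+
+func allocNode() *myNode {
+	lock(&nodeAllocLock)
+	n := (*myNode)(nodeAlloc.alloc())
+	unlock(&nodeAllocLock)
+	return n
+}
+
+func freeNode(n *myNode) {
+	lock(&nodeAllocLock)
+	nodeAlloc.free(unsafe.Pointer(n))
+	unlock(&nodeAllocLock)
+}
+```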
+
+Objects that are allocated in unmanaged memory **must not** contain
+heap pointers unless the following rules are also obeyed:
+
+1. Any pointers from unmanaged memory to the heap must be garbage
+ collection roots. More specifically, any pointer must either be
+ accessible through a global variable or be added as an explicit
+ garbage collection root in `runtime.markroot`.
+
+2. If the memory is reused, the heap pointers must be zero-initialized
+ before they become visible as GC roots. Otherwise, the GC may
+ observe stale heap pointers. See "Zero-initialization versus
+ zeroing".
+
+Zero-initialization versus zeroing
+==================================
+
+There are two types of zeroing in the runtime, depending on whether
+the memory is already initialized to a type-safe state.
+
+If memory is not in a type-safe state, meaning it potentially contains
+"garbage" because it was just allocated and it is being initialized
+for first use, then it must be *zero-initialized* using
+`memclrNoHeapPointers` or non-pointer writes. This does not perform
+write barriers.
+
+If memory is already in a type-safe state and is simply being set to
+the zero value, this must be done using regular writes, `typedmemclr`,
+or `memclrHasPointers`. This performs write barriers.
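+
+A short sketch of the distinction (`p`, `size`, `t`, and `obj` are
+placeholders):
+
+```go
+// Freshly obtained memory that may contain garbage: zero-initialize it
+// without write barriers.
+memclrNoHeapPointers(p, size)
+
+// An existing, type-safe object of type t being reset to its zero
+// value: clearing must go through write barriers.
+typedmemclr(t, unsafe.Pointer(obj))
+```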
+
+Runtime-only compiler directives
+================================
+
+In addition to the "//go:" directives documented in "go doc compile",
+the compiler supports additional directives only in the runtime.
+
+go:systemstack
+--------------
+
+`go:systemstack` indicates that a function must run on the system
+stack. This is checked dynamically by a special function prologue.
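+
+For example (the function name is hypothetical):
+
+```go
+// The compiler-inserted prologue checks at run time that this function
+// is only ever called on the system stack.
+//go:systemstack
+func flushAllCaches() {
+	// ...
+}
+```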
+
+go:nowritebarrier
+-----------------
+
+`go:nowritebarrier` directs the compiler to emit an error if the
+following function contains any write barriers. (It *does not*
+suppress the generation of write barriers; it is simply an assertion.)
+
+Usually you want `go:nowritebarrierrec`. `go:nowritebarrier` is
+primarily useful in situations where it's "nice" not to have write
+barriers, but not required for correctness.
+
+go:nowritebarrierrec and go:yeswritebarrierrec
+----------------------------------------------
+
+`go:nowritebarrierrec` directs the compiler to emit an error if the
+following function or any function it calls recursively, up to a
+`go:yeswritebarrierrec`, contains a write barrier.
+
+Logically, the compiler floods the call graph starting from each
+`go:nowritebarrierrec` function and produces an error if it encounters
+a function containing a write barrier. This flood stops at
+`go:yeswritebarrierrec` functions.
+
+`go:nowritebarrierrec` is used in the implementation of the write
+barrier to prevent infinite loops.
+
+Both directives are used in the scheduler. The write barrier requires
+an active P (`getg().m.p != nil`) and scheduler code often runs
+without an active P. In this case, `go:nowritebarrierrec` is used on
+functions that release the P or may run without a P and
+`go:yeswritebarrierrec` is used when code re-acquires an active P.
+Since these are function-level annotations, code that releases or
+acquires a P may need to be split across two functions.
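+
+A sketch of how the pair is typically used (the function names are made
+up):
+
+```go
+// No write barriers are allowed here or in anything this calls,
+// because it may run without a P...
+//go:nowritebarrierrec
+func schedWithoutP() {
+	// ... acquire a P ...
+	runWithP()
+}
+
+// ...until control reaches a function that has re-acquired a P; the
+// yeswritebarrierrec annotation stops the flood here.
+//go:yeswritebarrierrec
+func runWithP() {
+	// Write barriers are permitted again.
+}
+```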
+
+go:notinheap
+------------
+
+`go:notinheap` applies to type declarations. It indicates that a type
+must never be allocated from the GC'd heap or on the stack.
+Specifically, pointers to this type must always fail the
+`runtime.inheap` check. The type may be used for global variables, or
+for objects in unmanaged memory (e.g., allocated with `sysAlloc`,
+`persistentalloc`, `fixalloc`, or from a manually-managed span).
+Specifically:
+
+1. `new(T)`, `make([]T)`, `append([]T, ...)` and implicit heap
+ allocation of T are disallowed. (Though implicit allocations are
+ disallowed in the runtime anyway.)
+
+2. A pointer to a regular type (other than `unsafe.Pointer`) cannot be
+ converted to a pointer to a `go:notinheap` type, even if they have
+ the same underlying type.
+
+3. Any type that contains a `go:notinheap` type is itself
+ `go:notinheap`. Structs and arrays are `go:notinheap` if their
+ elements are. Maps and channels of `go:notinheap` types are
+ disallowed. To keep things explicit, any type declaration where the
+ type is implicitly `go:notinheap` must be explicitly marked
+ `go:notinheap` as well.
+
+4. Write barriers on pointers to `go:notinheap` types can be omitted.
+
+The last point is the real benefit of `go:notinheap`. The runtime uses
+it for low-level internal structures to avoid write barriers in the
+scheduler and the memory allocator where they are illegal or simply
+inefficient. This mechanism is reasonably safe and does not compromise
+the readability of the runtime.
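+
+For example, the runtime's lock-free stack node in `lfstack.go` is such
+a type; the wrapper below is hypothetical and illustrates rule 3:
+
+```go
+// lfnode is allocated from unmanaged memory, so it must never end up
+// on the GC'd heap or the stack.
+//go:notinheap
+type lfnode struct {
+	next    uint64
+	pushcnt uintptr
+}
+
+// Any type containing a go:notinheap type is implicitly not-in-heap
+// and must be marked explicitly as well (rule 3).
+//go:notinheap
+type nodeBox struct { // hypothetical
+	node lfnode
+}
+```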
diff --git a/src/runtime/Makefile b/src/runtime/Makefile
new file mode 100644
index 0000000..55087de
--- /dev/null
+++ b/src/runtime/Makefile
@@ -0,0 +1,5 @@
+# Copyright 2009 The Go Authors. All rights reserved.
+# Use of this source code is governed by a BSD-style
+# license that can be found in the LICENSE file.
+
+include ../Make.dist
diff --git a/src/runtime/alg.go b/src/runtime/alg.go
new file mode 100644
index 0000000..1b3bf11
--- /dev/null
+++ b/src/runtime/alg.go
@@ -0,0 +1,370 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/cpu"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ c0 = uintptr((8-sys.PtrSize)/4*2860486313 + (sys.PtrSize-4)/4*33054211828000289)
+ c1 = uintptr((8-sys.PtrSize)/4*3267000013 + (sys.PtrSize-4)/4*23344194077549503)
+)
+
+func memhash0(p unsafe.Pointer, h uintptr) uintptr {
+ return h
+}
+
+func memhash8(p unsafe.Pointer, h uintptr) uintptr {
+ return memhash(p, h, 1)
+}
+
+func memhash16(p unsafe.Pointer, h uintptr) uintptr {
+ return memhash(p, h, 2)
+}
+
+func memhash128(p unsafe.Pointer, h uintptr) uintptr {
+ return memhash(p, h, 16)
+}
+
+//go:nosplit
+func memhash_varlen(p unsafe.Pointer, h uintptr) uintptr {
+ ptr := getclosureptr()
+ size := *(*uintptr)(unsafe.Pointer(ptr + unsafe.Sizeof(h)))
+ return memhash(p, h, size)
+}
+
+// runtime variable to check if the processor we're running on
+// actually supports the instructions used by the AES-based
+// hash implementation.
+var useAeshash bool
+
+// in asm_*.s
+func memhash(p unsafe.Pointer, h, s uintptr) uintptr
+func memhash32(p unsafe.Pointer, h uintptr) uintptr
+func memhash64(p unsafe.Pointer, h uintptr) uintptr
+func strhash(p unsafe.Pointer, h uintptr) uintptr
+
+func strhashFallback(a unsafe.Pointer, h uintptr) uintptr {
+ x := (*stringStruct)(a)
+ return memhashFallback(x.str, h, uintptr(x.len))
+}
+
+// NOTE: Because NaN != NaN, a map can contain any
+// number of (mostly useless) entries keyed with NaNs.
+// To avoid long hash chains, we assign a random number
+// as the hash value for a NaN.
+
+func f32hash(p unsafe.Pointer, h uintptr) uintptr {
+ f := *(*float32)(p)
+ switch {
+ case f == 0:
+ return c1 * (c0 ^ h) // +0, -0
+ case f != f:
+ return c1 * (c0 ^ h ^ uintptr(fastrand())) // any kind of NaN
+ default:
+ return memhash(p, h, 4)
+ }
+}
+
+func f64hash(p unsafe.Pointer, h uintptr) uintptr {
+ f := *(*float64)(p)
+ switch {
+ case f == 0:
+ return c1 * (c0 ^ h) // +0, -0
+ case f != f:
+ return c1 * (c0 ^ h ^ uintptr(fastrand())) // any kind of NaN
+ default:
+ return memhash(p, h, 8)
+ }
+}
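The NaN note above has a user-visible consequence: every NaN key gets a fresh random hash and NaN != NaN, so repeated inserts pile up entries that can never be found again. A small stand-alone illustration (ordinary user code, not runtime code):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	m := map[float64]int{}
	for i := 0; i < 3; i++ {
		m[math.NaN()] = i // each NaN key is "new": it equals nothing, not even itself
	}
	fmt.Println(len(m)) // 3

	_, ok := m[math.NaN()]
	fmt.Println(ok) // false: no lookup can ever match a NaN key
}
```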
+
+func c64hash(p unsafe.Pointer, h uintptr) uintptr {
+ x := (*[2]float32)(p)
+ return f32hash(unsafe.Pointer(&x[1]), f32hash(unsafe.Pointer(&x[0]), h))
+}
+
+func c128hash(p unsafe.Pointer, h uintptr) uintptr {
+ x := (*[2]float64)(p)
+ return f64hash(unsafe.Pointer(&x[1]), f64hash(unsafe.Pointer(&x[0]), h))
+}
+
+func interhash(p unsafe.Pointer, h uintptr) uintptr {
+ a := (*iface)(p)
+ tab := a.tab
+ if tab == nil {
+ return h
+ }
+ t := tab._type
+ if t.equal == nil {
+ // Check hashability here. We could do this check inside
+ // typehash, but we want to report the topmost type in
+ // the error text (e.g. in a struct with a field of slice type
+ // we want to report the struct, not the slice).
+ panic(errorString("hash of unhashable type " + t.string()))
+ }
+ if isDirectIface(t) {
+ return c1 * typehash(t, unsafe.Pointer(&a.data), h^c0)
+ } else {
+ return c1 * typehash(t, a.data, h^c0)
+ }
+}
+
+func nilinterhash(p unsafe.Pointer, h uintptr) uintptr {
+ a := (*eface)(p)
+ t := a._type
+ if t == nil {
+ return h
+ }
+ if t.equal == nil {
+ // See comment in interhash above.
+ panic(errorString("hash of unhashable type " + t.string()))
+ }
+ if isDirectIface(t) {
+ return c1 * typehash(t, unsafe.Pointer(&a.data), h^c0)
+ } else {
+ return c1 * typehash(t, a.data, h^c0)
+ }
+}
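The hashability check in interhash and nilinterhash is what turns a map insert with a dynamically unhashable key into a clean run-time panic rather than silent misbehavior. For example (ordinary user code):

```go
package main

func main() {
	m := map[interface{}]bool{}
	m["ok"] = true // a string key is hashable

	var k interface{} = []int{1, 2, 3}
	m[k] = true // panics: runtime error: hash of unhashable type []int
}
```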
+
+// typehash computes the hash of the object of type t at address p.
+// h is the seed.
+// This function is seldom used. For hashing, most maps use either
+// fixed functions (e.g. f32hash) or compiler-generated functions
+// (e.g. for a type like struct { x, y string }). This implementation
+// is slower but more general and is used for hashing interface types
+// (called from interhash or nilinterhash, above) or for hashing in
+// maps generated by reflect.MapOf (reflect_typehash, below).
+// Note: this function must match the compiler generated
+// functions exactly. See issue 37716.
+func typehash(t *_type, p unsafe.Pointer, h uintptr) uintptr {
+ if t.tflag&tflagRegularMemory != 0 {
+ // Handle ptr sizes specially, see issue 37086.
+ switch t.size {
+ case 4:
+ return memhash32(p, h)
+ case 8:
+ return memhash64(p, h)
+ default:
+ return memhash(p, h, t.size)
+ }
+ }
+ switch t.kind & kindMask {
+ case kindFloat32:
+ return f32hash(p, h)
+ case kindFloat64:
+ return f64hash(p, h)
+ case kindComplex64:
+ return c64hash(p, h)
+ case kindComplex128:
+ return c128hash(p, h)
+ case kindString:
+ return strhash(p, h)
+ case kindInterface:
+ i := (*interfacetype)(unsafe.Pointer(t))
+ if len(i.mhdr) == 0 {
+ return nilinterhash(p, h)
+ }
+ return interhash(p, h)
+ case kindArray:
+ a := (*arraytype)(unsafe.Pointer(t))
+ for i := uintptr(0); i < a.len; i++ {
+ h = typehash(a.elem, add(p, i*a.elem.size), h)
+ }
+ return h
+ case kindStruct:
+ s := (*structtype)(unsafe.Pointer(t))
+ memStart := uintptr(0)
+ memEnd := uintptr(0)
+ for _, f := range s.fields {
+ if memEnd > memStart && (f.name.isBlank() || f.offset() != memEnd || f.typ.tflag&tflagRegularMemory == 0) {
+ // flush any pending regular memory hashing
+ h = memhash(add(p, memStart), h, memEnd-memStart)
+ memStart = memEnd
+ }
+ if f.name.isBlank() {
+ continue
+ }
+ if f.typ.tflag&tflagRegularMemory == 0 {
+ h = typehash(f.typ, add(p, f.offset()), h)
+ continue
+ }
+ if memStart == memEnd {
+ memStart = f.offset()
+ }
+ memEnd = f.offset() + f.typ.size
+ }
+ if memEnd > memStart {
+ h = memhash(add(p, memStart), h, memEnd-memStart)
+ }
+ return h
+ default:
+ // Should never happen, as typehash should only be called
+ // with comparable types.
+ panic(errorString("hash of unhashable type " + t.string()))
+ }
+}
+
+//go:linkname reflect_typehash reflect.typehash
+func reflect_typehash(t *_type, p unsafe.Pointer, h uintptr) uintptr {
+ return typehash(t, p, h)
+}
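One way this slower path is reached from user code is through maps whose type is constructed at run time with reflect.MapOf. The sketch below only shows such a map being built and used; that the key hashing routes through typehash is taken from the comment above rather than demonstrated directly:

```go
package main

import (
	"fmt"
	"reflect"
)

type key struct{ X, Y string }

func main() {
	mt := reflect.MapOf(reflect.TypeOf(key{}), reflect.TypeOf(0)) // map[key]int
	m := reflect.MakeMap(mt)

	k := reflect.ValueOf(key{"a", "b"})
	m.SetMapIndex(k, reflect.ValueOf(1))
	fmt.Println(m.MapIndex(k)) // 1
}
```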
+
+func memequal0(p, q unsafe.Pointer) bool {
+ return true
+}
+func memequal8(p, q unsafe.Pointer) bool {
+ return *(*int8)(p) == *(*int8)(q)
+}
+func memequal16(p, q unsafe.Pointer) bool {
+ return *(*int16)(p) == *(*int16)(q)
+}
+func memequal32(p, q unsafe.Pointer) bool {
+ return *(*int32)(p) == *(*int32)(q)
+}
+func memequal64(p, q unsafe.Pointer) bool {
+ return *(*int64)(p) == *(*int64)(q)
+}
+func memequal128(p, q unsafe.Pointer) bool {
+ return *(*[2]int64)(p) == *(*[2]int64)(q)
+}
+func f32equal(p, q unsafe.Pointer) bool {
+ return *(*float32)(p) == *(*float32)(q)
+}
+func f64equal(p, q unsafe.Pointer) bool {
+ return *(*float64)(p) == *(*float64)(q)
+}
+func c64equal(p, q unsafe.Pointer) bool {
+ return *(*complex64)(p) == *(*complex64)(q)
+}
+func c128equal(p, q unsafe.Pointer) bool {
+ return *(*complex128)(p) == *(*complex128)(q)
+}
+func strequal(p, q unsafe.Pointer) bool {
+ return *(*string)(p) == *(*string)(q)
+}
+func interequal(p, q unsafe.Pointer) bool {
+ x := *(*iface)(p)
+ y := *(*iface)(q)
+ return x.tab == y.tab && ifaceeq(x.tab, x.data, y.data)
+}
+func nilinterequal(p, q unsafe.Pointer) bool {
+ x := *(*eface)(p)
+ y := *(*eface)(q)
+ return x._type == y._type && efaceeq(x._type, x.data, y.data)
+}
+func efaceeq(t *_type, x, y unsafe.Pointer) bool {
+ if t == nil {
+ return true
+ }
+ eq := t.equal
+ if eq == nil {
+ panic(errorString("comparing uncomparable type " + t.string()))
+ }
+ if isDirectIface(t) {
+ // Direct interface types are ptr, chan, map, func, and single-element structs/arrays thereof.
+ // Maps and funcs are not comparable, so they can't reach here.
+ // Ptrs, chans, and single-element items can be compared directly using ==.
+ return x == y
+ }
+ return eq(x, y)
+}
+func ifaceeq(tab *itab, x, y unsafe.Pointer) bool {
+ if tab == nil {
+ return true
+ }
+ t := tab._type
+ eq := t.equal
+ if eq == nil {
+ panic(errorString("comparing uncomparable type " + t.string()))
+ }
+ if isDirectIface(t) {
+ // See comment in efaceeq.
+ return x == y
+ }
+ return eq(x, y)
+}
+
+// Testing adapters for hash quality tests (see hash_test.go)
+func stringHash(s string, seed uintptr) uintptr {
+ return strhash(noescape(unsafe.Pointer(&s)), seed)
+}
+
+func bytesHash(b []byte, seed uintptr) uintptr {
+ s := (*slice)(unsafe.Pointer(&b))
+ return memhash(s.array, seed, uintptr(s.len))
+}
+
+func int32Hash(i uint32, seed uintptr) uintptr {
+ return memhash32(noescape(unsafe.Pointer(&i)), seed)
+}
+
+func int64Hash(i uint64, seed uintptr) uintptr {
+ return memhash64(noescape(unsafe.Pointer(&i)), seed)
+}
+
+func efaceHash(i interface{}, seed uintptr) uintptr {
+ return nilinterhash(noescape(unsafe.Pointer(&i)), seed)
+}
+
+func ifaceHash(i interface {
+ F()
+}, seed uintptr) uintptr {
+ return interhash(noescape(unsafe.Pointer(&i)), seed)
+}
+
+const hashRandomBytes = sys.PtrSize / 4 * 64
+
+// used in asm_{386,amd64,arm64}.s to seed the hash function
+var aeskeysched [hashRandomBytes]byte
+
+// used in hash{32,64}.go to seed the hash function
+var hashkey [4]uintptr
+
+func alginit() {
+ // Install AES hash algorithms if the instructions needed are present.
+ if (GOARCH == "386" || GOARCH == "amd64") &&
+ cpu.X86.HasAES && // AESENC
+ cpu.X86.HasSSSE3 && // PSHUFB
+ cpu.X86.HasSSE41 { // PINSR{D,Q}
+ initAlgAES()
+ return
+ }
+ if GOARCH == "arm64" && cpu.ARM64.HasAES {
+ initAlgAES()
+ return
+ }
+ getRandomData((*[len(hashkey) * sys.PtrSize]byte)(unsafe.Pointer(&hashkey))[:])
+ hashkey[0] |= 1 // make sure these numbers are odd
+ hashkey[1] |= 1
+ hashkey[2] |= 1
+ hashkey[3] |= 1
+}
+
+func initAlgAES() {
+ useAeshash = true
+ // Initialize with random data so hash collisions will be hard to engineer.
+ getRandomData(aeskeysched[:])
+}
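For reference, the same feature test can be reproduced from ordinary code with the external golang.org/x/sys/cpu package (used here only for illustration; alginit itself reads internal/cpu, which is not importable):

```go
package main

import (
	"fmt"
	"runtime"

	"golang.org/x/sys/cpu"
)

func main() {
	// Rough user-space analogue of the check in alginit.
	aes := false
	switch runtime.GOARCH {
	case "386", "amd64":
		aes = cpu.X86.HasAES && cpu.X86.HasSSSE3 && cpu.X86.HasSSE41 // AESENC, PSHUFB, PINSR{D,Q}
	case "arm64":
		aes = cpu.ARM64.HasAES
	}
	fmt.Println("AES-based hashing would be selected:", aes)
}
```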
+
+// Note: These routines perform the read with native endianness.
+func readUnaligned32(p unsafe.Pointer) uint32 {
+ q := (*[4]byte)(p)
+ if sys.BigEndian {
+ return uint32(q[3]) | uint32(q[2])<<8 | uint32(q[1])<<16 | uint32(q[0])<<24
+ }
+ return uint32(q[0]) | uint32(q[1])<<8 | uint32(q[2])<<16 | uint32(q[3])<<24
+}
+
+func readUnaligned64(p unsafe.Pointer) uint64 {
+ q := (*[8]byte)(p)
+ if sys.BigEndian {
+ return uint64(q[7]) | uint64(q[6])<<8 | uint64(q[5])<<16 | uint64(q[4])<<24 |
+ uint64(q[3])<<32 | uint64(q[2])<<40 | uint64(q[1])<<48 | uint64(q[0])<<56
+ }
+ return uint64(q[0]) | uint64(q[1])<<8 | uint64(q[2])<<16 | uint64(q[3])<<24 | uint64(q[4])<<32 | uint64(q[5])<<40 | uint64(q[6])<<48 | uint64(q[7])<<56
+}
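An ordinary-Go analogue of readUnaligned32, using encoding/binary with the byte order detected at run time (a sketch; the runtime versions above avoid bounds checks and interface dispatch):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// nativeEndian is detected from the in-memory layout of a known value.
var nativeEndian binary.ByteOrder = func() binary.ByteOrder {
	x := uint16(1)
	if *(*byte)(unsafe.Pointer(&x)) == 1 {
		return binary.LittleEndian
	}
	return binary.BigEndian
}()

// readUnaligned32 reads four bytes starting at b[off] in native byte
// order, regardless of alignment.
func readUnaligned32(b []byte, off int) uint32 {
	return nativeEndian.Uint32(b[off : off+4])
}

func main() {
	buf := []byte{0, 1, 2, 3, 4, 5}
	fmt.Printf("%#x\n", readUnaligned32(buf, 1)) // 0x4030201 on little-endian
}
```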
diff --git a/src/runtime/asm.s b/src/runtime/asm.s
new file mode 100644
index 0000000..27d8df9
--- /dev/null
+++ b/src/runtime/asm.s
@@ -0,0 +1,13 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// funcdata for functions with no local variables in frame.
+// Define two zero-length bitmaps, because the same index is used
+// for the local variables as for the argument frame, and assembly
+// frames have two argument bitmaps, one without results and one with results.
+DATA runtime·no_pointers_stackmap+0x00(SB)/4, $2
+DATA runtime·no_pointers_stackmap+0x04(SB)/4, $0
+GLOBL runtime·no_pointers_stackmap(SB),RODATA, $8
diff --git a/src/runtime/asm_386.s b/src/runtime/asm_386.s
new file mode 100644
index 0000000..fa3b1be
--- /dev/null
+++ b/src/runtime/asm_386.s
@@ -0,0 +1,1564 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// _rt0_386 is common startup code for most 386 systems when using
+// internal linking. This is the entry point for the program from the
+// kernel for an ordinary -buildmode=exe program. The stack holds the
+// number of arguments and the C-style argv.
+TEXT _rt0_386(SB),NOSPLIT,$8
+ MOVL 8(SP), AX // argc
+ LEAL 12(SP), BX // argv
+ MOVL AX, 0(SP)
+ MOVL BX, 4(SP)
+ JMP runtime·rt0_go(SB)
+
+// _rt0_386_lib is common startup code for most 386 systems when
+// using -buildmode=c-archive or -buildmode=c-shared. The linker will
+// arrange to invoke this function as a global constructor (for
+// c-archive) or when the shared library is loaded (for c-shared).
+// We expect argc and argv to be passed on the stack following the
+// usual C ABI.
+TEXT _rt0_386_lib(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ PUSHL BX
+ PUSHL SI
+ PUSHL DI
+
+ MOVL 8(BP), AX
+ MOVL AX, _rt0_386_lib_argc<>(SB)
+ MOVL 12(BP), AX
+ MOVL AX, _rt0_386_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ CALL runtime·libpreinit(SB)
+
+ SUBL $8, SP
+
+ // Create a new thread to do the runtime initialization.
+ MOVL _cgo_sys_thread_create(SB), AX
+ TESTL AX, AX
+ JZ nocgo
+
+ // Align stack to call C function.
+ // We moved SP to BP above, but BP was clobbered by the libpreinit call.
+ MOVL SP, BP
+ ANDL $~15, SP
+
+ MOVL $_rt0_386_lib_go(SB), BX
+ MOVL BX, 0(SP)
+ MOVL $0, 4(SP)
+
+ CALL AX
+
+ MOVL BP, SP
+
+ JMP restore
+
+nocgo:
+ MOVL $0x800000, 0(SP) // stacksize = 8192KB
+ MOVL $_rt0_386_lib_go(SB), AX
+ MOVL AX, 4(SP) // fn
+ CALL runtime·newosproc0(SB)
+
+restore:
+ ADDL $8, SP
+ POPL DI
+ POPL SI
+ POPL BX
+ POPL BP
+ RET
+
+// _rt0_386_lib_go initializes the Go runtime.
+// This is started in a separate thread by _rt0_386_lib.
+TEXT _rt0_386_lib_go(SB),NOSPLIT,$8
+ MOVL _rt0_386_lib_argc<>(SB), AX
+ MOVL AX, 0(SP)
+ MOVL _rt0_386_lib_argv<>(SB), AX
+ MOVL AX, 4(SP)
+ JMP runtime·rt0_go(SB)
+
+DATA _rt0_386_lib_argc<>(SB)/4, $0
+GLOBL _rt0_386_lib_argc<>(SB),NOPTR, $4
+DATA _rt0_386_lib_argv<>(SB)/4, $0
+GLOBL _rt0_386_lib_argv<>(SB),NOPTR, $4
+
+TEXT runtime·rt0_go(SB),NOSPLIT|NOFRAME,$0
+ // Copy arguments forward on an even stack.
+ // Users of this function jump to it, they don't call it.
+ MOVL 0(SP), AX
+ MOVL 4(SP), BX
+ SUBL $128, SP // plenty of scratch
+ ANDL $~15, SP
+ MOVL AX, 120(SP) // save argc, argv away
+ MOVL BX, 124(SP)
+
+ // set default stack bounds.
+ // _cgo_init may update stackguard.
+ MOVL $runtime·g0(SB), BP
+ LEAL (-64*1024+104)(SP), BX
+ MOVL BX, g_stackguard0(BP)
+ MOVL BX, g_stackguard1(BP)
+ MOVL BX, (g_stack+stack_lo)(BP)
+ MOVL SP, (g_stack+stack_hi)(BP)
+
+ // find out information about the processor we're on
+ // first see if CPUID instruction is supported.
+ PUSHFL
+ PUSHFL
+ XORL $(1<<21), 0(SP) // flip ID bit
+ POPFL
+ PUSHFL
+ POPL AX
+ XORL 0(SP), AX
+ POPFL // restore EFLAGS
+ TESTL $(1<<21), AX
+ JNE has_cpuid
+
+bad_proc: // show that the program requires MMX.
+ MOVL $2, 0(SP)
+ MOVL $bad_proc_msg<>(SB), 4(SP)
+ MOVL $0x3d, 8(SP)
+ CALL runtime·write(SB)
+ MOVL $1, 0(SP)
+ CALL runtime·exit(SB)
+ CALL runtime·abort(SB)
+
+has_cpuid:
+ MOVL $0, AX
+ CPUID
+ MOVL AX, SI
+ CMPL AX, $0
+ JE nocpuinfo
+
+ // Figure out how to serialize RDTSC.
+ // On Intel processors LFENCE is enough. AMD requires MFENCE.
+ // Don't know about the rest, so let's do MFENCE.
+ CMPL BX, $0x756E6547 // "Genu"
+ JNE notintel
+ CMPL DX, $0x49656E69 // "ineI"
+ JNE notintel
+ CMPL CX, $0x6C65746E // "ntel"
+ JNE notintel
+ MOVB $1, runtime·isIntel(SB)
+ MOVB $1, runtime·lfenceBeforeRdtsc(SB)
+notintel:
+
+ // Load EAX=1 cpuid flags
+ MOVL $1, AX
+ CPUID
+	MOVL CX, DI // Moving to a global variable clobbers CX when generating PIC
+ MOVL AX, runtime·processorVersionInfo(SB)
+
+ // Check for MMX support
+ TESTL $(1<<23), DX // MMX
+ JZ bad_proc
+
+nocpuinfo:
+ // if there is an _cgo_init, call it to let it
+ // initialize and to set up GS. if not,
+ // we set up GS ourselves.
+ MOVL _cgo_init(SB), AX
+ TESTL AX, AX
+ JZ needtls
+#ifdef GOOS_android
+ // arg 4: TLS base, stored in slot 0 (Android's TLS_SLOT_SELF).
+ // Compensate for tls_g (+8).
+ MOVL -8(TLS), BX
+ MOVL BX, 12(SP)
+ MOVL $runtime·tls_g(SB), 8(SP) // arg 3: &tls_g
+#else
+ MOVL $0, BX
+ MOVL BX, 12(SP) // arg 3,4: not used when using platform's TLS
+ MOVL BX, 8(SP)
+#endif
+ MOVL $setg_gcc<>(SB), BX
+ MOVL BX, 4(SP) // arg 2: setg_gcc
+ MOVL BP, 0(SP) // arg 1: g0
+ CALL AX
+
+ // update stackguard after _cgo_init
+ MOVL $runtime·g0(SB), CX
+ MOVL (g_stack+stack_lo)(CX), AX
+ ADDL $const__StackGuard, AX
+ MOVL AX, g_stackguard0(CX)
+ MOVL AX, g_stackguard1(CX)
+
+#ifndef GOOS_windows
+ // skip runtime·ldt0setup(SB) and tls test after _cgo_init for non-windows
+ JMP ok
+#endif
+needtls:
+#ifdef GOOS_plan9
+ // skip runtime·ldt0setup(SB) and tls test on Plan 9 in all cases
+ JMP ok
+#endif
+
+ // set up %gs
+ CALL ldt0setup<>(SB)
+
+ // store through it, to make sure it works
+ get_tls(BX)
+ MOVL $0x123, g(BX)
+ MOVL runtime·m0+m_tls(SB), AX
+ CMPL AX, $0x123
+ JEQ ok
+ MOVL AX, 0 // abort
+ok:
+ // set up m and g "registers"
+ get_tls(BX)
+ LEAL runtime·g0(SB), DX
+ MOVL DX, g(BX)
+ LEAL runtime·m0(SB), AX
+
+ // save m->g0 = g0
+ MOVL DX, m_g0(AX)
+ // save g0->m = m0
+ MOVL AX, g_m(DX)
+
+ CALL runtime·emptyfunc(SB) // fault if stack check is wrong
+
+ // convention is D is always cleared
+ CLD
+
+ CALL runtime·check(SB)
+
+ // saved argc, argv
+ MOVL 120(SP), AX
+ MOVL AX, 0(SP)
+ MOVL 124(SP), AX
+ MOVL AX, 4(SP)
+ CALL runtime·args(SB)
+ CALL runtime·osinit(SB)
+ CALL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ PUSHL $runtime·mainPC(SB) // entry
+ PUSHL $0 // arg size
+ CALL runtime·newproc(SB)
+ POPL AX
+ POPL AX
+
+ // start this M
+ CALL runtime·mstart(SB)
+
+ CALL runtime·abort(SB)
+ RET
+
+DATA bad_proc_msg<>+0x00(SB)/61, $"This program can only be run on processors with MMX support.\n"
+GLOBL bad_proc_msg<>(SB), RODATA, $61
+
+DATA runtime·mainPC+0(SB)/4,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$4
+
+TEXT runtime·breakpoint(SB),NOSPLIT,$0-0
+ INT $3
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT,$0-0
+ // Linux and MinGW start the FPU in extended double precision.
+ // Other operating systems use double precision.
+ // Change to double precision to match them,
+ // and to match other hardware that only has double.
+ FLDCW runtime·controlWord64(SB)
+ RET
+
+/*
+ * go-routine
+ */
+
+// void gosave(Gobuf*)
+// save state in Gobuf; setjmp
+TEXT runtime·gosave(SB), NOSPLIT, $0-4
+ MOVL buf+0(FP), AX // gobuf
+ LEAL buf+0(FP), BX // caller's SP
+ MOVL BX, gobuf_sp(AX)
+ MOVL 0(SP), BX // caller's PC
+ MOVL BX, gobuf_pc(AX)
+ MOVL $0, gobuf_ret(AX)
+ // Assert ctxt is zero. See func save.
+ MOVL gobuf_ctxt(AX), BX
+ TESTL BX, BX
+ JZ 2(PC)
+ CALL runtime·badctxt(SB)
+ get_tls(CX)
+ MOVL g(CX), BX
+ MOVL BX, gobuf_g(AX)
+ RET
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB), NOSPLIT, $8-4
+ MOVL buf+0(FP), BX // gobuf
+ MOVL gobuf_g(BX), DX
+ MOVL 0(DX), CX // make sure g != nil
+ get_tls(CX)
+ MOVL DX, g(CX)
+ MOVL gobuf_sp(BX), SP // restore SP
+ MOVL gobuf_ret(BX), AX
+ MOVL gobuf_ctxt(BX), DX
+ MOVL $0, gobuf_sp(BX) // clear to help garbage collector
+ MOVL $0, gobuf_ret(BX)
+ MOVL $0, gobuf_ctxt(BX)
+ MOVL gobuf_pc(BX), BX
+ JMP BX
+
+// func mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB), NOSPLIT, $0-4
+ MOVL fn+0(FP), DI
+
+ get_tls(DX)
+ MOVL g(DX), AX // save state in g->sched
+ MOVL 0(SP), BX // caller's PC
+ MOVL BX, (g_sched+gobuf_pc)(AX)
+ LEAL fn+0(FP), BX // caller's SP
+ MOVL BX, (g_sched+gobuf_sp)(AX)
+ MOVL AX, (g_sched+gobuf_g)(AX)
+
+ // switch to m->g0 & its stack, call fn
+ MOVL g(DX), BX
+ MOVL g_m(BX), BX
+ MOVL m_g0(BX), SI
+ CMPL SI, AX // if g == m->g0 call badmcall
+ JNE 3(PC)
+ MOVL $runtime·badmcall(SB), AX
+ JMP AX
+ MOVL SI, g(DX) // g = m->g0
+ MOVL (g_sched+gobuf_sp)(SI), SP // sp = m->g0->sched.sp
+ PUSHL AX
+ MOVL DI, DX
+ MOVL 0(DI), DI
+ CALL DI
+ POPL AX
+ MOVL $runtime·badmcall2(SB), AX
+ JMP AX
+ RET
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-4
+ MOVL fn+0(FP), DI // DI = fn
+ get_tls(CX)
+ MOVL g(CX), AX // AX = g
+ MOVL g_m(AX), BX // BX = m
+
+ CMPL AX, m_gsignal(BX)
+ JEQ noswitch
+
+ MOVL m_g0(BX), DX // DX = g0
+ CMPL AX, DX
+ JEQ noswitch
+
+ CMPL AX, m_curg(BX)
+ JNE bad
+
+ // switch stacks
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ MOVL $runtime·systemstack_switch(SB), (g_sched+gobuf_pc)(AX)
+ MOVL SP, (g_sched+gobuf_sp)(AX)
+ MOVL AX, (g_sched+gobuf_g)(AX)
+
+ // switch to g0
+ get_tls(CX)
+ MOVL DX, g(CX)
+ MOVL (g_sched+gobuf_sp)(DX), BX
+ // make it look like mstart called systemstack on g0, to stop traceback
+ SUBL $4, BX
+ MOVL $runtime·mstart(SB), DX
+ MOVL DX, 0(BX)
+ MOVL BX, SP
+
+ // call target function
+ MOVL DI, DX
+ MOVL 0(DI), DI
+ CALL DI
+
+ // switch back to g
+ get_tls(CX)
+ MOVL g(CX), AX
+ MOVL g_m(AX), BX
+ MOVL m_curg(BX), AX
+ MOVL AX, g(CX)
+ MOVL (g_sched+gobuf_sp)(AX), SP
+ MOVL $0, (g_sched+gobuf_sp)(AX)
+ RET
+
+noswitch:
+ // already on system stack; tail call the function
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVL DI, DX
+ MOVL 0(DI), DI
+ JMP DI
+
+bad:
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVL $runtime·badsystemstack(SB), AX
+ CALL AX
+ INT $3
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ get_tls(CX)
+ MOVL g(CX), BX
+ MOVL g_m(BX), BX
+ MOVL m_g0(BX), SI
+ CMPL g(CX), SI
+ JNE 3(PC)
+ CALL runtime·badmorestackg0(SB)
+ CALL runtime·abort(SB)
+
+ // Cannot grow signal stack.
+ MOVL m_gsignal(BX), SI
+ CMPL g(CX), SI
+ JNE 3(PC)
+ CALL runtime·badmorestackgsignal(SB)
+ CALL runtime·abort(SB)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ NOP SP // tell vet SP changed - stop checking offsets
+ MOVL 4(SP), DI // f's caller's PC
+ MOVL DI, (m_morebuf+gobuf_pc)(BX)
+ LEAL 8(SP), CX // f's caller's SP
+ MOVL CX, (m_morebuf+gobuf_sp)(BX)
+ get_tls(CX)
+ MOVL g(CX), SI
+ MOVL SI, (m_morebuf+gobuf_g)(BX)
+
+ // Set g->sched to context in f.
+ MOVL 0(SP), AX // f's PC
+ MOVL AX, (g_sched+gobuf_pc)(SI)
+ MOVL SI, (g_sched+gobuf_g)(SI)
+ LEAL 4(SP), AX // f's SP
+ MOVL AX, (g_sched+gobuf_sp)(SI)
+ MOVL DX, (g_sched+gobuf_ctxt)(SI)
+
+ // Call newstack on m->g0's stack.
+ MOVL m_g0(BX), BP
+ MOVL BP, g(CX)
+ MOVL (g_sched+gobuf_sp)(BP), AX
+ MOVL -4(AX), BX // fault if CALL would, before smashing SP
+ MOVL AX, SP
+ CALL runtime·newstack(SB)
+ CALL runtime·abort(SB) // crash if newstack returns
+ RET
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT,$0-0
+ MOVL $0, DX
+ JMP runtime·morestack(SB)
+
+// reflectcall: call a function with the given argument list
+// func call(argtype *_type, f *FuncVal, arg *byte, argsize, retoffset uint32).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ CMPL CX, $MAXSIZE; \
+ JA 3(PC); \
+ MOVL $NAME(SB), AX; \
+ JMP AX
+// Note: can't just "JMP NAME(SB)" - bad inlining results.
+
+TEXT ·reflectcall(SB), NOSPLIT, $0-20
+ MOVL argsize+12(FP), CX
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVL $runtime·badreflectcall(SB), AX
+ JMP AX
+
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-20; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVL argptr+8(FP), SI; \
+ MOVL argsize+12(FP), CX; \
+ MOVL SP, DI; \
+ REP;MOVSB; \
+ /* call function */ \
+ MOVL f+4(FP), DX; \
+ MOVL (DX), AX; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ CALL AX; \
+ /* copy return values back */ \
+ MOVL argtype+0(FP), DX; \
+ MOVL argptr+8(FP), DI; \
+ MOVL argsize+12(FP), CX; \
+ MOVL retoffset+16(FP), BX; \
+ MOVL SP, SI; \
+ ADDL BX, DI; \
+ ADDL BX, SI; \
+ SUBL BX, CX; \
+ CALL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $16-0
+ MOVL DX, 0(SP)
+ MOVL DI, 4(SP)
+ MOVL SI, 8(SP)
+ MOVL CX, 12(SP)
+ CALL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+TEXT runtime·procyield(SB),NOSPLIT,$0-0
+ MOVL cycles+0(FP), AX
+again:
+ PAUSE
+ SUBL $1, AX
+ JNZ again
+ RET
+
+TEXT ·publicationBarrier(SB),NOSPLIT,$0-0
+ // Stores are already ordered on x86, so this is just a
+ // compile barrier.
+ RET
+
+// void jmpdefer(fn, sp);
+// called from deferreturn.
+// 1. pop the caller
+// 2. sub 5 bytes (the length of CALL & a 32-bit displacement) from the caller's
+// return address (when building for shared libraries, subtract 16 bytes -- 5 bytes
+// for CALL & displacement to call __x86.get_pc_thunk.cx, 6 bytes for the
+// LEAL to load the offset into BX, and finally 5 for the call & displacement)
+// 3. jmp to the argument
+TEXT runtime·jmpdefer(SB), NOSPLIT, $0-8
+ MOVL fv+0(FP), DX // fn
+ MOVL argp+4(FP), BX // caller sp
+ LEAL -4(BX), SP // caller sp after CALL
+#ifdef GOBUILDMODE_shared
+ SUBL $16, (SP) // return to CALL again
+#else
+ SUBL $5, (SP) // return to CALL again
+#endif
+ MOVL 0(DX), BX
+ JMP BX // but first run the deferred function
+
+// Save state of caller into g->sched.
+TEXT gosave<>(SB),NOSPLIT,$0
+ PUSHL AX
+ PUSHL BX
+ get_tls(BX)
+ MOVL g(BX), BX
+ LEAL arg+0(FP), AX
+ MOVL AX, (g_sched+gobuf_sp)(BX)
+ MOVL -4(AX), AX
+ MOVL AX, (g_sched+gobuf_pc)(BX)
+ MOVL $0, (g_sched+gobuf_ret)(BX)
+ // Assert ctxt is zero. See func save.
+ MOVL (g_sched+gobuf_ctxt)(BX), AX
+ TESTL AX, AX
+ JZ 2(PC)
+ CALL runtime·badctxt(SB)
+ POPL BX
+ POPL AX
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-12
+ MOVL fn+0(FP), AX
+ MOVL arg+4(FP), BX
+
+ MOVL SP, DX
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already.
+ get_tls(CX)
+ MOVL g(CX), BP
+ CMPL BP, $0
+ JEQ nosave // Don't even have a G yet.
+ MOVL g_m(BP), BP
+ MOVL m_g0(BP), SI
+ MOVL g(CX), DI
+ CMPL SI, DI
+ JEQ noswitch
+ CMPL DI, m_gsignal(BP)
+ JEQ noswitch
+ CALL gosave<>(SB)
+ get_tls(CX)
+ MOVL SI, g(CX)
+ MOVL (g_sched+gobuf_sp)(SI), SP
+
+noswitch:
+ // Now on a scheduling stack (a pthread-created stack).
+ SUBL $32, SP
+ ANDL $~15, SP // alignment, perhaps unnecessary
+ MOVL DI, 8(SP) // save g
+ MOVL (g_stack+stack_hi)(DI), DI
+ SUBL DX, DI
+ MOVL DI, 4(SP) // save depth in stack (can't just save SP, as stack might be copied during a callback)
+ MOVL BX, 0(SP) // first argument in x86-32 ABI
+ CALL AX
+
+ // Restore registers, g, stack pointer.
+ get_tls(CX)
+ MOVL 8(SP), DI
+ MOVL (g_stack+stack_hi)(DI), SI
+ SUBL 4(SP), SI
+ MOVL DI, g(CX)
+ MOVL SI, SP
+
+ MOVL AX, ret+8(FP)
+ RET
+nosave:
+ // Now on a scheduling stack (a pthread-created stack).
+ SUBL $32, SP
+ ANDL $~15, SP // alignment, perhaps unnecessary
+ MOVL DX, 4(SP) // save original stack pointer
+ MOVL BX, 0(SP) // first argument in x86-32 ABI
+ CALL AX
+
+ MOVL 4(SP), CX // restore original stack pointer
+ MOVL CX, SP
+ MOVL AX, ret+8(FP)
+ RET
+
+// cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$12-12 // Frame size must match commented places below
+ NO_LOCAL_POINTERS
+
+ // If g is nil, Go did not create the current thread.
+ // Call needm to obtain one for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call through AX.
+ get_tls(CX)
+#ifdef GOOS_windows
+ MOVL $0, BP
+ CMPL CX, $0
+ JEQ 2(PC) // TODO
+#endif
+ MOVL g(CX), BP
+ CMPL BP, $0
+ JEQ needm
+ MOVL g_m(BP), BP
+ MOVL BP, savedm-4(SP) // saved copy of oldm
+ JMP havem
+needm:
+ MOVL $runtime·needm(SB), AX
+ CALL AX
+ MOVL $0, savedm-4(SP) // dropm on return
+ get_tls(CX)
+ MOVL g(CX), BP
+ MOVL g_m(BP), BP
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVL m_g0(BP), SI
+ MOVL SP, (g_sched+gobuf_sp)(SI)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 0(SP).
+ MOVL m_g0(BP), SI
+ MOVL (g_sched+gobuf_sp)(SI), AX
+ MOVL AX, 0(SP)
+ MOVL SP, (g_sched+gobuf_sp)(SI)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVL m_curg(BP), SI
+ MOVL SI, g(CX)
+ MOVL (g_sched+gobuf_sp)(SI), DI // prepare stack as DI
+ MOVL (g_sched+gobuf_pc)(SI), BP
+ MOVL BP, -4(DI) // "push" return PC on the g stack
+ // Gather our arguments into registers.
+ MOVL fn+0(FP), AX
+ MOVL frame+4(FP), BX
+ MOVL ctxt+8(FP), CX
+ LEAL -(4+12)(DI), SP // Must match declared frame size
+ MOVL AX, 0(SP)
+ MOVL BX, 4(SP)
+ MOVL CX, 8(SP)
+ CALL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ get_tls(CX)
+ MOVL g(CX), SI
+ MOVL 12(SP), BP // Must match declared frame size
+ MOVL BP, (g_sched+gobuf_pc)(SI)
+ LEAL (12+4)(SP), DI // Must match declared frame size
+ MOVL DI, (g_sched+gobuf_sp)(SI)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVL g(CX), BP
+ MOVL g_m(BP), BP
+ MOVL m_g0(BP), SI
+ MOVL SI, g(CX)
+ MOVL (g_sched+gobuf_sp)(SI), SP
+ MOVL 0(SP), AX
+ MOVL AX, (g_sched+gobuf_sp)(SI)
+
+ // If the m on entry was nil, we called needm above to borrow an m
+ // for the duration of the call. Since the call is over, return it with dropm.
+ MOVL savedm-4(SP), DX
+ CMPL DX, $0
+ JNE 3(PC)
+ MOVL $runtime·dropm(SB), AX
+ CALL AX
+
+ // Done!
+ RET
+
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-4
+ MOVL gg+0(FP), BX
+#ifdef GOOS_windows
+ CMPL BX, $0
+ JNE settls
+ MOVL $0, 0x14(FS)
+ RET
+settls:
+ MOVL g_m(BX), AX
+ LEAL m_tls(AX), AX
+ MOVL AX, 0x14(FS)
+#endif
+ get_tls(CX)
+ MOVL BX, g(CX)
+ RET
+
+// void setg_gcc(G*); set g. for use by gcc
+TEXT setg_gcc<>(SB), NOSPLIT, $0
+ get_tls(AX)
+ MOVL gg+0(FP), DX
+ MOVL DX, g(AX)
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT,$0-0
+ INT $3
+loop:
+ JMP loop
+
+// check that SP is in range [g->stack.lo, g->stack.hi)
+TEXT runtime·stackcheck(SB), NOSPLIT, $0-0
+ get_tls(CX)
+ MOVL g(CX), AX
+ CMPL (g_stack+stack_hi)(AX), SP
+ JHI 2(PC)
+ CALL runtime·abort(SB)
+ CMPL SP, (g_stack+stack_lo)(AX)
+ JHI 2(PC)
+ CALL runtime·abort(SB)
+ RET
+
+// func cputicks() int64
+TEXT runtime·cputicks(SB),NOSPLIT,$0-8
+ CMPB internal∕cpu·X86+const_offsetX86HasSSE2(SB), $1
+ JNE done
+ CMPB runtime·lfenceBeforeRdtsc(SB), $1
+ JNE mfence
+ LFENCE
+ JMP done
+mfence:
+ MFENCE
+done:
+ RDTSC
+ MOVL AX, ret_lo+0(FP)
+ MOVL DX, ret_hi+4(FP)
+ RET
+
+TEXT ldt0setup<>(SB),NOSPLIT,$16-0
+ // set up ldt 7 to point at m0.tls
+ // ldt 1 would be fine on Linux, but on OS X, 7 is as low as we can go.
+ // the entry number is just a hint. setldt will set up GS with what it used.
+ MOVL $7, 0(SP)
+ LEAL runtime·m0+m_tls(SB), AX
+ MOVL AX, 4(SP)
+ MOVL $32, 8(SP) // sizeof(tls array)
+ CALL runtime·setldt(SB)
+ RET
+
+TEXT runtime·emptyfunc(SB),0,$0-0
+ RET
+
+// hash function using AES hardware instructions
+TEXT runtime·memhash(SB),NOSPLIT,$0-16
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVL p+0(FP), AX // ptr to data
+ MOVL s+8(FP), BX // size
+ LEAL ret+12(FP), DX
+ JMP aeshashbody<>(SB)
+noaes:
+ JMP runtime·memhashFallback(SB)
+
+TEXT runtime·strhash(SB),NOSPLIT,$0-12
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVL p+0(FP), AX // ptr to string object
+ MOVL 4(AX), BX // length of string
+ MOVL (AX), AX // string data
+ LEAL ret+8(FP), DX
+ JMP aeshashbody<>(SB)
+noaes:
+ JMP runtime·strhashFallback(SB)
+
+// AX: data
+// BX: length
+// DX: address to put return value
+TEXT aeshashbody<>(SB),NOSPLIT,$0-0
+ MOVL h+4(FP), X0 // 32 bits of per-table hash seed
+ PINSRW $4, BX, X0 // 16 bits of length
+ PSHUFHW $0, X0, X0 // replace size with its low 2 bytes repeated 4 times
+ MOVO X0, X1 // save unscrambled seed
+ PXOR runtime·aeskeysched(SB), X0 // xor in per-process seed
+ AESENC X0, X0 // scramble seed
+
+ CMPL BX, $16
+ JB aes0to15
+ JE aes16
+ CMPL BX, $32
+ JBE aes17to32
+ CMPL BX, $64
+ JBE aes33to64
+ JMP aes65plus
+
+aes0to15:
+ TESTL BX, BX
+ JE aes0
+
+ ADDL $16, AX
+ TESTW $0xff0, AX
+ JE endofpage
+
+ // 16 bytes loaded at this address won't cross
+ // a page boundary, so we can load it directly.
+ MOVOU -16(AX), X1
+ ADDL BX, BX
+ PAND masks<>(SB)(BX*8), X1
+
+final1:
+ AESENC X0, X1 // scramble input, xor in seed
+ AESENC X1, X1 // scramble combo 2 times
+ AESENC X1, X1
+ MOVL X1, (DX)
+ RET
+
+endofpage:
+ // address ends in 1111xxxx. Might be up against
+ // a page boundary, so load ending at last byte.
+ // Then shift bytes down using pshufb.
+ MOVOU -32(AX)(BX*1), X1
+ ADDL BX, BX
+ PSHUFB shifts<>(SB)(BX*8), X1
+ JMP final1
+
+aes0:
+ // Return scrambled input seed
+ AESENC X0, X0
+ MOVL X0, (DX)
+ RET
+
+aes16:
+ MOVOU (AX), X1
+ JMP final1
+
+aes17to32:
+ // make second starting seed
+ PXOR runtime·aeskeysched+16(SB), X1
+ AESENC X1, X1
+
+ // load data to be hashed
+ MOVOU (AX), X2
+ MOVOU -16(AX)(BX*1), X3
+
+ // scramble 3 times
+ AESENC X0, X2
+ AESENC X1, X3
+ AESENC X2, X2
+ AESENC X3, X3
+ AESENC X2, X2
+ AESENC X3, X3
+
+ // combine results
+ PXOR X3, X2
+ MOVL X2, (DX)
+ RET
+
+aes33to64:
+ // make 3 more starting seeds
+ MOVO X1, X2
+ MOVO X1, X3
+ PXOR runtime·aeskeysched+16(SB), X1
+ PXOR runtime·aeskeysched+32(SB), X2
+ PXOR runtime·aeskeysched+48(SB), X3
+ AESENC X1, X1
+ AESENC X2, X2
+ AESENC X3, X3
+
+ MOVOU (AX), X4
+ MOVOU 16(AX), X5
+ MOVOU -32(AX)(BX*1), X6
+ MOVOU -16(AX)(BX*1), X7
+
+ AESENC X0, X4
+ AESENC X1, X5
+ AESENC X2, X6
+ AESENC X3, X7
+
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ PXOR X6, X4
+ PXOR X7, X5
+ PXOR X5, X4
+ MOVL X4, (DX)
+ RET
+
+aes65plus:
+ // make 3 more starting seeds
+ MOVO X1, X2
+ MOVO X1, X3
+ PXOR runtime·aeskeysched+16(SB), X1
+ PXOR runtime·aeskeysched+32(SB), X2
+ PXOR runtime·aeskeysched+48(SB), X3
+ AESENC X1, X1
+ AESENC X2, X2
+ AESENC X3, X3
+
+ // start with last (possibly overlapping) block
+ MOVOU -64(AX)(BX*1), X4
+ MOVOU -48(AX)(BX*1), X5
+ MOVOU -32(AX)(BX*1), X6
+ MOVOU -16(AX)(BX*1), X7
+
+ // scramble state once
+ AESENC X0, X4
+ AESENC X1, X5
+ AESENC X2, X6
+ AESENC X3, X7
+
+ // compute number of remaining 64-byte blocks
+ DECL BX
+ SHRL $6, BX
+
+aesloop:
+ // scramble state, xor in a block
+ MOVOU (AX), X0
+ MOVOU 16(AX), X1
+ MOVOU 32(AX), X2
+ MOVOU 48(AX), X3
+ AESENC X0, X4
+ AESENC X1, X5
+ AESENC X2, X6
+ AESENC X3, X7
+
+ // scramble state
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ ADDL $64, AX
+ DECL BX
+ JNE aesloop
+
+ // 2 more scrambles to finish
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ PXOR X6, X4
+ PXOR X7, X5
+ PXOR X5, X4
+ MOVL X4, (DX)
+ RET
+
+TEXT runtime·memhash32(SB),NOSPLIT,$0-12
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVL p+0(FP), AX // ptr to data
+ MOVL h+4(FP), X0 // seed
+ PINSRD $1, (AX), X0 // data
+ AESENC runtime·aeskeysched+0(SB), X0
+ AESENC runtime·aeskeysched+16(SB), X0
+ AESENC runtime·aeskeysched+32(SB), X0
+ MOVL X0, ret+8(FP)
+ RET
+noaes:
+ JMP runtime·memhash32Fallback(SB)
+
+TEXT runtime·memhash64(SB),NOSPLIT,$0-12
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVL p+0(FP), AX // ptr to data
+ MOVQ (AX), X0 // data
+ PINSRD $2, h+4(FP), X0 // seed
+ AESENC runtime·aeskeysched+0(SB), X0
+ AESENC runtime·aeskeysched+16(SB), X0
+ AESENC runtime·aeskeysched+32(SB), X0
+ MOVL X0, ret+8(FP)
+ RET
+noaes:
+ JMP runtime·memhash64Fallback(SB)
+
+// simple mask to get rid of data in the high part of the register.
+DATA masks<>+0x00(SB)/4, $0x00000000
+DATA masks<>+0x04(SB)/4, $0x00000000
+DATA masks<>+0x08(SB)/4, $0x00000000
+DATA masks<>+0x0c(SB)/4, $0x00000000
+
+DATA masks<>+0x10(SB)/4, $0x000000ff
+DATA masks<>+0x14(SB)/4, $0x00000000
+DATA masks<>+0x18(SB)/4, $0x00000000
+DATA masks<>+0x1c(SB)/4, $0x00000000
+
+DATA masks<>+0x20(SB)/4, $0x0000ffff
+DATA masks<>+0x24(SB)/4, $0x00000000
+DATA masks<>+0x28(SB)/4, $0x00000000
+DATA masks<>+0x2c(SB)/4, $0x00000000
+
+DATA masks<>+0x30(SB)/4, $0x00ffffff
+DATA masks<>+0x34(SB)/4, $0x00000000
+DATA masks<>+0x38(SB)/4, $0x00000000
+DATA masks<>+0x3c(SB)/4, $0x00000000
+
+DATA masks<>+0x40(SB)/4, $0xffffffff
+DATA masks<>+0x44(SB)/4, $0x00000000
+DATA masks<>+0x48(SB)/4, $0x00000000
+DATA masks<>+0x4c(SB)/4, $0x00000000
+
+DATA masks<>+0x50(SB)/4, $0xffffffff
+DATA masks<>+0x54(SB)/4, $0x000000ff
+DATA masks<>+0x58(SB)/4, $0x00000000
+DATA masks<>+0x5c(SB)/4, $0x00000000
+
+DATA masks<>+0x60(SB)/4, $0xffffffff
+DATA masks<>+0x64(SB)/4, $0x0000ffff
+DATA masks<>+0x68(SB)/4, $0x00000000
+DATA masks<>+0x6c(SB)/4, $0x00000000
+
+DATA masks<>+0x70(SB)/4, $0xffffffff
+DATA masks<>+0x74(SB)/4, $0x00ffffff
+DATA masks<>+0x78(SB)/4, $0x00000000
+DATA masks<>+0x7c(SB)/4, $0x00000000
+
+DATA masks<>+0x80(SB)/4, $0xffffffff
+DATA masks<>+0x84(SB)/4, $0xffffffff
+DATA masks<>+0x88(SB)/4, $0x00000000
+DATA masks<>+0x8c(SB)/4, $0x00000000
+
+DATA masks<>+0x90(SB)/4, $0xffffffff
+DATA masks<>+0x94(SB)/4, $0xffffffff
+DATA masks<>+0x98(SB)/4, $0x000000ff
+DATA masks<>+0x9c(SB)/4, $0x00000000
+
+DATA masks<>+0xa0(SB)/4, $0xffffffff
+DATA masks<>+0xa4(SB)/4, $0xffffffff
+DATA masks<>+0xa8(SB)/4, $0x0000ffff
+DATA masks<>+0xac(SB)/4, $0x00000000
+
+DATA masks<>+0xb0(SB)/4, $0xffffffff
+DATA masks<>+0xb4(SB)/4, $0xffffffff
+DATA masks<>+0xb8(SB)/4, $0x00ffffff
+DATA masks<>+0xbc(SB)/4, $0x00000000
+
+DATA masks<>+0xc0(SB)/4, $0xffffffff
+DATA masks<>+0xc4(SB)/4, $0xffffffff
+DATA masks<>+0xc8(SB)/4, $0xffffffff
+DATA masks<>+0xcc(SB)/4, $0x00000000
+
+DATA masks<>+0xd0(SB)/4, $0xffffffff
+DATA masks<>+0xd4(SB)/4, $0xffffffff
+DATA masks<>+0xd8(SB)/4, $0xffffffff
+DATA masks<>+0xdc(SB)/4, $0x000000ff
+
+DATA masks<>+0xe0(SB)/4, $0xffffffff
+DATA masks<>+0xe4(SB)/4, $0xffffffff
+DATA masks<>+0xe8(SB)/4, $0xffffffff
+DATA masks<>+0xec(SB)/4, $0x0000ffff
+
+DATA masks<>+0xf0(SB)/4, $0xffffffff
+DATA masks<>+0xf4(SB)/4, $0xffffffff
+DATA masks<>+0xf8(SB)/4, $0xffffffff
+DATA masks<>+0xfc(SB)/4, $0x00ffffff
+
+GLOBL masks<>(SB),RODATA,$256
+
+// these are arguments to pshufb. They move data down from
+// the high bytes of the register to the low bytes of the register.
+// index is how many bytes to move.
+DATA shifts<>+0x00(SB)/4, $0x00000000
+DATA shifts<>+0x04(SB)/4, $0x00000000
+DATA shifts<>+0x08(SB)/4, $0x00000000
+DATA shifts<>+0x0c(SB)/4, $0x00000000
+
+DATA shifts<>+0x10(SB)/4, $0xffffff0f
+DATA shifts<>+0x14(SB)/4, $0xffffffff
+DATA shifts<>+0x18(SB)/4, $0xffffffff
+DATA shifts<>+0x1c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x20(SB)/4, $0xffff0f0e
+DATA shifts<>+0x24(SB)/4, $0xffffffff
+DATA shifts<>+0x28(SB)/4, $0xffffffff
+DATA shifts<>+0x2c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x30(SB)/4, $0xff0f0e0d
+DATA shifts<>+0x34(SB)/4, $0xffffffff
+DATA shifts<>+0x38(SB)/4, $0xffffffff
+DATA shifts<>+0x3c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x40(SB)/4, $0x0f0e0d0c
+DATA shifts<>+0x44(SB)/4, $0xffffffff
+DATA shifts<>+0x48(SB)/4, $0xffffffff
+DATA shifts<>+0x4c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x50(SB)/4, $0x0e0d0c0b
+DATA shifts<>+0x54(SB)/4, $0xffffff0f
+DATA shifts<>+0x58(SB)/4, $0xffffffff
+DATA shifts<>+0x5c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x60(SB)/4, $0x0d0c0b0a
+DATA shifts<>+0x64(SB)/4, $0xffff0f0e
+DATA shifts<>+0x68(SB)/4, $0xffffffff
+DATA shifts<>+0x6c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x70(SB)/4, $0x0c0b0a09
+DATA shifts<>+0x74(SB)/4, $0xff0f0e0d
+DATA shifts<>+0x78(SB)/4, $0xffffffff
+DATA shifts<>+0x7c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x80(SB)/4, $0x0b0a0908
+DATA shifts<>+0x84(SB)/4, $0x0f0e0d0c
+DATA shifts<>+0x88(SB)/4, $0xffffffff
+DATA shifts<>+0x8c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x90(SB)/4, $0x0a090807
+DATA shifts<>+0x94(SB)/4, $0x0e0d0c0b
+DATA shifts<>+0x98(SB)/4, $0xffffff0f
+DATA shifts<>+0x9c(SB)/4, $0xffffffff
+
+DATA shifts<>+0xa0(SB)/4, $0x09080706
+DATA shifts<>+0xa4(SB)/4, $0x0d0c0b0a
+DATA shifts<>+0xa8(SB)/4, $0xffff0f0e
+DATA shifts<>+0xac(SB)/4, $0xffffffff
+
+DATA shifts<>+0xb0(SB)/4, $0x08070605
+DATA shifts<>+0xb4(SB)/4, $0x0c0b0a09
+DATA shifts<>+0xb8(SB)/4, $0xff0f0e0d
+DATA shifts<>+0xbc(SB)/4, $0xffffffff
+
+DATA shifts<>+0xc0(SB)/4, $0x07060504
+DATA shifts<>+0xc4(SB)/4, $0x0b0a0908
+DATA shifts<>+0xc8(SB)/4, $0x0f0e0d0c
+DATA shifts<>+0xcc(SB)/4, $0xffffffff
+
+DATA shifts<>+0xd0(SB)/4, $0x06050403
+DATA shifts<>+0xd4(SB)/4, $0x0a090807
+DATA shifts<>+0xd8(SB)/4, $0x0e0d0c0b
+DATA shifts<>+0xdc(SB)/4, $0xffffff0f
+
+DATA shifts<>+0xe0(SB)/4, $0x05040302
+DATA shifts<>+0xe4(SB)/4, $0x09080706
+DATA shifts<>+0xe8(SB)/4, $0x0d0c0b0a
+DATA shifts<>+0xec(SB)/4, $0xffff0f0e
+
+DATA shifts<>+0xf0(SB)/4, $0x04030201
+DATA shifts<>+0xf4(SB)/4, $0x08070605
+DATA shifts<>+0xf8(SB)/4, $0x0c0b0a09
+DATA shifts<>+0xfc(SB)/4, $0xff0f0e0d
+
+GLOBL shifts<>(SB),RODATA,$256
+
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+	// check that masks<>(SB) and shifts<>(SB) are aligned to 16-byte boundaries
+ MOVL $masks<>(SB), AX
+ MOVL $shifts<>(SB), BX
+ ORL BX, AX
+ TESTL $15, AX
+ SETEQ ret+0(FP)
+ RET
+
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOVL $0, AX
+ RET
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT,$0
+ get_tls(CX)
+ MOVL g(CX), AX
+ MOVL g_m(AX), AX
+ MOVL m_curg(AX), AX
+ MOVL (g_stack+stack_hi)(AX), AX
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT,$0-0
+ BYTE $0x90 // NOP
+ CALL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ BYTE $0x90 // NOP
+
+// Add a module's moduledata to the linked list of moduledata objects. This
+// is called from .init_array by a function generated in the linker and so
+// follows the platform ABI wrt register preservation -- it only touches AX,
+// CX (implicitly) and DX, but it does not follow the ABI wrt arguments:
+// instead the pointer to the moduledata is passed in AX.
+TEXT runtime·addmoduledata(SB),NOSPLIT,$0-0
+ MOVL runtime·lastmoduledatap(SB), DX
+ MOVL AX, moduledata_next(DX)
+ MOVL AX, runtime·lastmoduledatap(SB)
+ RET
+
+TEXT runtime·uint32tofloat64(SB),NOSPLIT,$8-12
+ MOVL a+0(FP), AX
+ MOVL AX, 0(SP)
+ MOVL $0, 4(SP)
+ FMOVV 0(SP), F0
+ FMOVDP F0, ret+4(FP)
+ RET
+
+TEXT runtime·float64touint32(SB),NOSPLIT,$12-12
+ FMOVD a+0(FP), F0
+ FSTCW 0(SP)
+ FLDCW runtime·controlWord64trunc(SB)
+ FMOVVP F0, 4(SP)
+ FLDCW 0(SP)
+ MOVL 4(SP), AX
+ MOVL AX, ret+8(FP)
+ RET
+
+// gcWriteBarrier performs a heap pointer write and informs the GC.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It takes two arguments:
+// - DI is the destination of the write
+// - AX is the value being written at DI
+// It clobbers FLAGS. It does not clobber any general-purpose registers,
+// but may clobber others (e.g., SSE registers).
+TEXT runtime·gcWriteBarrier(SB),NOSPLIT,$28
+ // Save the registers clobbered by the fast path. This is slightly
+ // faster than having the caller spill these.
+ MOVL CX, 20(SP)
+ MOVL BX, 24(SP)
+ // TODO: Consider passing g.m.p in as an argument so they can be shared
+ // across a sequence of write barriers.
+ get_tls(BX)
+ MOVL g(BX), BX
+ MOVL g_m(BX), BX
+ MOVL m_p(BX), BX
+ MOVL (p_wbBuf+wbBuf_next)(BX), CX
+ // Increment wbBuf.next position.
+ LEAL 8(CX), CX
+ MOVL CX, (p_wbBuf+wbBuf_next)(BX)
+ CMPL CX, (p_wbBuf+wbBuf_end)(BX)
+ // Record the write.
+ MOVL AX, -8(CX) // Record value
+ MOVL (DI), BX // TODO: This turns bad writes into bad reads.
+ MOVL BX, -4(CX) // Record *slot
+ // Is the buffer full? (flags set in CMPL above)
+ JEQ flush
+ret:
+ MOVL 20(SP), CX
+ MOVL 24(SP), BX
+ // Do the write.
+ MOVL AX, (DI)
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ MOVL DI, 0(SP) // Also first argument to wbBufFlush
+ MOVL AX, 4(SP) // Also second argument to wbBufFlush
+ // BX already saved
+ // CX already saved
+ MOVL DX, 8(SP)
+ MOVL BP, 12(SP)
+ MOVL SI, 16(SP)
+ // DI already saved
+
+ // This takes arguments DI and AX
+ CALL runtime·wbBufFlush(SB)
+
+ MOVL 0(SP), DI
+ MOVL 4(SP), AX
+ MOVL 8(SP), DX
+ MOVL 12(SP), BP
+ MOVL 16(SP), SI
+ JMP ret
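The record-then-flush pattern implemented above can be summarized in ordinary Go. This is only a conceptual sketch, not the runtime's real data structures: the actual wbBuf lives per-P, is filled by this assembly fast path, and is drained by wbBufFlush into the GC's mark work.

```go
package main

import "fmt"

// wbBuf is a toy stand-in for the per-P write barrier buffer.
type wbBuf struct {
	next int
	buf  [512]uintptr // interleaved (new value, old *slot) pairs
}

// write performs *slot = val, first recording both the value being
// written and the slot's old contents so a collector could grey them.
func (b *wbBuf) write(slot *uintptr, val uintptr) {
	b.buf[b.next] = val
	b.buf[b.next+1] = *slot
	b.next += 2
	if b.next == len(b.buf) {
		b.flush() // buffer full: hand the pointers to the GC
	}
	*slot = val
}

func (b *wbBuf) flush() {
	// The real flush passes buf[:next] to the garbage collector;
	// here we only reset the cursor.
	b.next = 0
}

func main() {
	var b wbBuf
	var slot uintptr
	b.write(&slot, 42)
	fmt.Println(slot, b.next) // 42 2
}
```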
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+TEXT runtime·panicIndex(SB),NOSPLIT,$0-8
+ MOVL AX, x+0(FP)
+ MOVL CX, y+4(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU(SB),NOSPLIT,$0-8
+ MOVL AX, x+0(FP)
+ MOVL CX, y+4(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen(SB),NOSPLIT,$0-8
+ MOVL CX, x+0(FP)
+ MOVL DX, y+4(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU(SB),NOSPLIT,$0-8
+ MOVL CX, x+0(FP)
+ MOVL DX, y+4(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap(SB),NOSPLIT,$0-8
+ MOVL CX, x+0(FP)
+ MOVL DX, y+4(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU(SB),NOSPLIT,$0-8
+ MOVL CX, x+0(FP)
+ MOVL DX, y+4(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB(SB),NOSPLIT,$0-8
+ MOVL AX, x+0(FP)
+ MOVL CX, y+4(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU(SB),NOSPLIT,$0-8
+ MOVL AX, x+0(FP)
+ MOVL CX, y+4(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen(SB),NOSPLIT,$0-8
+ MOVL DX, x+0(FP)
+ MOVL BX, y+4(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU(SB),NOSPLIT,$0-8
+ MOVL DX, x+0(FP)
+ MOVL BX, y+4(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap(SB),NOSPLIT,$0-8
+ MOVL DX, x+0(FP)
+ MOVL BX, y+4(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU(SB),NOSPLIT,$0-8
+ MOVL DX, x+0(FP)
+ MOVL BX, y+4(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B(SB),NOSPLIT,$0-8
+ MOVL CX, x+0(FP)
+ MOVL DX, y+4(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU(SB),NOSPLIT,$0-8
+ MOVL CX, x+0(FP)
+ MOVL DX, y+4(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C(SB),NOSPLIT,$0-8
+ MOVL AX, x+0(FP)
+ MOVL CX, y+4(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU(SB),NOSPLIT,$0-8
+ MOVL AX, x+0(FP)
+ MOVL CX, y+4(FP)
+ JMP runtime·goPanicSlice3CU(SB)
+
+// Extended versions for 64-bit indexes.
+TEXT runtime·panicExtendIndex(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL AX, lo+4(FP)
+ MOVL CX, y+8(FP)
+ JMP runtime·goPanicExtendIndex(SB)
+TEXT runtime·panicExtendIndexU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL AX, lo+4(FP)
+ MOVL CX, y+8(FP)
+ JMP runtime·goPanicExtendIndexU(SB)
+TEXT runtime·panicExtendSliceAlen(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL CX, lo+4(FP)
+ MOVL DX, y+8(FP)
+ JMP runtime·goPanicExtendSliceAlen(SB)
+TEXT runtime·panicExtendSliceAlenU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL CX, lo+4(FP)
+ MOVL DX, y+8(FP)
+ JMP runtime·goPanicExtendSliceAlenU(SB)
+TEXT runtime·panicExtendSliceAcap(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL CX, lo+4(FP)
+ MOVL DX, y+8(FP)
+ JMP runtime·goPanicExtendSliceAcap(SB)
+TEXT runtime·panicExtendSliceAcapU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL CX, lo+4(FP)
+ MOVL DX, y+8(FP)
+ JMP runtime·goPanicExtendSliceAcapU(SB)
+TEXT runtime·panicExtendSliceB(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL AX, lo+4(FP)
+ MOVL CX, y+8(FP)
+ JMP runtime·goPanicExtendSliceB(SB)
+TEXT runtime·panicExtendSliceBU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL AX, lo+4(FP)
+ MOVL CX, y+8(FP)
+ JMP runtime·goPanicExtendSliceBU(SB)
+TEXT runtime·panicExtendSlice3Alen(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL DX, lo+4(FP)
+ MOVL BX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3Alen(SB)
+TEXT runtime·panicExtendSlice3AlenU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL DX, lo+4(FP)
+ MOVL BX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3AlenU(SB)
+TEXT runtime·panicExtendSlice3Acap(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL DX, lo+4(FP)
+ MOVL BX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3Acap(SB)
+TEXT runtime·panicExtendSlice3AcapU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL DX, lo+4(FP)
+ MOVL BX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3AcapU(SB)
+TEXT runtime·panicExtendSlice3B(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL CX, lo+4(FP)
+ MOVL DX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3B(SB)
+TEXT runtime·panicExtendSlice3BU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL CX, lo+4(FP)
+ MOVL DX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3BU(SB)
+TEXT runtime·panicExtendSlice3C(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL AX, lo+4(FP)
+ MOVL CX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3C(SB)
+TEXT runtime·panicExtendSlice3CU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL AX, lo+4(FP)
+ MOVL CX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3CU(SB)
+
+#ifdef GOOS_android
+// Use the free TLS_SLOT_APP slot #2 on Android Q.
+// Earlier androids are set up in gcc_android.c.
+DATA runtime·tls_g+0(SB)/4, $8
+GLOBL runtime·tls_g+0(SB), NOPTR, $4
+#endif
diff --git a/src/runtime/asm_amd64.s b/src/runtime/asm_amd64.s
new file mode 100644
index 0000000..4ac8708
--- /dev/null
+++ b/src/runtime/asm_amd64.s
@@ -0,0 +1,1833 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// _rt0_amd64 is common startup code for most amd64 systems when using
+// internal linking. This is the entry point for the program from the
+// kernel for an ordinary -buildmode=exe program. The stack holds the
+// number of arguments and the C-style argv.
+TEXT _rt0_amd64(SB),NOSPLIT,$-8
+ MOVQ 0(SP), DI // argc
+ LEAQ 8(SP), SI // argv
+ JMP runtime·rt0_go(SB)
+
+// main is common startup code for most amd64 systems when using
+// external linking. The C startup code will call the symbol "main"
+// passing argc and argv in the usual C ABI registers DI and SI.
+TEXT main(SB),NOSPLIT,$-8
+ JMP runtime·rt0_go(SB)
+
+// _rt0_amd64_lib is common startup code for most amd64 systems when
+// using -buildmode=c-archive or -buildmode=c-shared. The linker will
+// arrange to invoke this function as a global constructor (for
+// c-archive) or when the shared library is loaded (for c-shared).
+// We expect argc and argv to be passed in the usual C ABI registers
+// DI and SI.
+TEXT _rt0_amd64_lib(SB),NOSPLIT,$0x50
+ // Align stack per ELF ABI requirements.
+ MOVQ SP, AX
+ ANDQ $~15, SP
+ // Save C ABI callee-saved registers, as caller may need them.
+ MOVQ BX, 0x10(SP)
+ MOVQ BP, 0x18(SP)
+ MOVQ R12, 0x20(SP)
+ MOVQ R13, 0x28(SP)
+ MOVQ R14, 0x30(SP)
+ MOVQ R15, 0x38(SP)
+ MOVQ AX, 0x40(SP)
+
+ MOVQ DI, _rt0_amd64_lib_argc<>(SB)
+ MOVQ SI, _rt0_amd64_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ CALL runtime·libpreinit(SB)
+
+ // Create a new thread to finish Go runtime initialization.
+ MOVQ _cgo_sys_thread_create(SB), AX
+ TESTQ AX, AX
+ JZ nocgo
+ MOVQ $_rt0_amd64_lib_go(SB), DI
+ MOVQ $0, SI
+ CALL AX
+ JMP restore
+
+nocgo:
+ MOVQ $0x800000, 0(SP) // stacksize
+ MOVQ $_rt0_amd64_lib_go(SB), AX
+ MOVQ AX, 8(SP) // fn
+ CALL runtime·newosproc0(SB)
+
+restore:
+ MOVQ 0x10(SP), BX
+ MOVQ 0x18(SP), BP
+ MOVQ 0x20(SP), R12
+ MOVQ 0x28(SP), R13
+ MOVQ 0x30(SP), R14
+ MOVQ 0x38(SP), R15
+ MOVQ 0x40(SP), SP
+ RET
+
+// _rt0_amd64_lib_go initializes the Go runtime.
+// This is started in a separate thread by _rt0_amd64_lib.
+TEXT _rt0_amd64_lib_go(SB),NOSPLIT,$0
+ MOVQ _rt0_amd64_lib_argc<>(SB), DI
+ MOVQ _rt0_amd64_lib_argv<>(SB), SI
+ JMP runtime·rt0_go(SB)
+
+DATA _rt0_amd64_lib_argc<>(SB)/8, $0
+GLOBL _rt0_amd64_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_amd64_lib_argv<>(SB)/8, $0
+GLOBL _rt0_amd64_lib_argv<>(SB),NOPTR, $8
+
+// Defined as ABIInternal since it does not use the stack-based Go ABI (and
+// in addition there are no calls to this entry point from Go code).
+TEXT runtime·rt0_go<ABIInternal>(SB),NOSPLIT,$0
+ // copy arguments forward on an even stack
+ MOVQ DI, AX // argc
+ MOVQ SI, BX // argv
+ SUBQ $(4*8+7), SP // 2args 2auto
+ ANDQ $~15, SP
+ MOVQ AX, 16(SP)
+ MOVQ BX, 24(SP)
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOVQ $runtime·g0(SB), DI
+ LEAQ (-64*1024+104)(SP), BX
+ MOVQ BX, g_stackguard0(DI)
+ MOVQ BX, g_stackguard1(DI)
+ MOVQ BX, (g_stack+stack_lo)(DI)
+ MOVQ SP, (g_stack+stack_hi)(DI)
+
+ // find out information about the processor we're on
+ MOVL $0, AX
+ CPUID
+ MOVL AX, SI
+ CMPL AX, $0
+ JE nocpuinfo
+
+ // Figure out how to serialize RDTSC.
+ // On Intel processors LFENCE is enough. AMD requires MFENCE.
+ // Don't know about the rest, so let's do MFENCE.
+ CMPL BX, $0x756E6547 // "Genu"
+ JNE notintel
+ CMPL DX, $0x49656E69 // "ineI"
+ JNE notintel
+ CMPL CX, $0x6C65746E // "ntel"
+ JNE notintel
+ MOVB $1, runtime·isIntel(SB)
+ MOVB $1, runtime·lfenceBeforeRdtsc(SB)
+notintel:
+
+ // Load EAX=1 cpuid flags
+ MOVL $1, AX
+ CPUID
+ MOVL AX, runtime·processorVersionInfo(SB)
+
+nocpuinfo:
+ // if there is an _cgo_init, call it.
+ MOVQ _cgo_init(SB), AX
+ TESTQ AX, AX
+ JZ needtls
+ // arg 1: g0, already in DI
+ MOVQ $setg_gcc<>(SB), SI // arg 2: setg_gcc
+#ifdef GOOS_android
+ MOVQ $runtime·tls_g(SB), DX // arg 3: &tls_g
+ // arg 4: TLS base, stored in slot 0 (Android's TLS_SLOT_SELF).
+ // Compensate for tls_g (+16).
+ MOVQ -16(TLS), CX
+#else
+ MOVQ $0, DX // arg 3, 4: not used when using platform's TLS
+ MOVQ $0, CX
+#endif
+#ifdef GOOS_windows
+ // Adjust for the Win64 calling convention.
+ MOVQ CX, R9 // arg 4
+ MOVQ DX, R8 // arg 3
+ MOVQ SI, DX // arg 2
+ MOVQ DI, CX // arg 1
+#endif
+ CALL AX
+
+ // update stackguard after _cgo_init
+ MOVQ $runtime·g0(SB), CX
+ MOVQ (g_stack+stack_lo)(CX), AX
+ ADDQ $const__StackGuard, AX
+ MOVQ AX, g_stackguard0(CX)
+ MOVQ AX, g_stackguard1(CX)
+
+#ifndef GOOS_windows
+ JMP ok
+#endif
+needtls:
+#ifdef GOOS_plan9
+ // skip TLS setup on Plan 9
+ JMP ok
+#endif
+#ifdef GOOS_solaris
+ // skip TLS setup on Solaris
+ JMP ok
+#endif
+#ifdef GOOS_illumos
+ // skip TLS setup on illumos
+ JMP ok
+#endif
+#ifdef GOOS_darwin
+ // skip TLS setup on Darwin
+ JMP ok
+#endif
+#ifdef GOOS_openbsd
+ // skip TLS setup on OpenBSD
+ JMP ok
+#endif
+
+ LEAQ runtime·m0+m_tls(SB), DI
+ CALL runtime·settls(SB)
+
+ // store through it, to make sure it works
+ get_tls(BX)
+ MOVQ $0x123, g(BX)
+ MOVQ runtime·m0+m_tls(SB), AX
+ CMPQ AX, $0x123
+ JEQ 2(PC)
+ CALL runtime·abort(SB)
+ok:
+ // set the per-goroutine and per-mach "registers"
+ get_tls(BX)
+ LEAQ runtime·g0(SB), CX
+ MOVQ CX, g(BX)
+ LEAQ runtime·m0(SB), AX
+
+ // save m->g0 = g0
+ MOVQ CX, m_g0(AX)
+ // save m0 to g0->m
+ MOVQ AX, g_m(CX)
+
+ CLD // convention is D is always left cleared
+ CALL runtime·check(SB)
+
+ MOVL 16(SP), AX // copy argc
+ MOVL AX, 0(SP)
+ MOVQ 24(SP), AX // copy argv
+ MOVQ AX, 8(SP)
+ CALL runtime·args(SB)
+ CALL runtime·osinit(SB)
+ CALL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOVQ $runtime·mainPC(SB), AX // entry
+ PUSHQ AX
+ PUSHQ $0 // arg size
+ CALL runtime·newproc(SB)
+ POPQ AX
+ POPQ AX
+
+ // start this M
+ CALL runtime·mstart(SB)
+
+ CALL runtime·abort(SB) // mstart should never return
+ RET
+
+ // Prevent dead-code elimination of debugCallV1, which is
+ // intended to be called by debuggers.
+ MOVQ $runtime·debugCallV1<ABIInternal>(SB), AX
+ RET
+
+// mainPC is a function value for runtime.main, to be passed to newproc.
+// The reference to runtime.main is made via ABIInternal, since the
+// actual function (not the ABI0 wrapper) is needed by newproc.
+DATA runtime·mainPC+0(SB)/8,$runtime·main<ABIInternal>(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
+
+TEXT runtime·breakpoint(SB),NOSPLIT,$0-0
+ BYTE $0xcc
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT,$0-0
+ // No per-thread init.
+ RET
+
+/*
+ * go-routine
+ */
+
+// func gosave(buf *gobuf)
+// save state in Gobuf; setjmp
+TEXT runtime·gosave(SB), NOSPLIT, $0-8
+ MOVQ buf+0(FP), AX // gobuf
+ LEAQ buf+0(FP), BX // caller's SP
+ MOVQ BX, gobuf_sp(AX)
+ MOVQ 0(SP), BX // caller's PC
+ MOVQ BX, gobuf_pc(AX)
+ MOVQ $0, gobuf_ret(AX)
+ MOVQ BP, gobuf_bp(AX)
+ // Assert ctxt is zero. See func save.
+ MOVQ gobuf_ctxt(AX), BX
+ TESTQ BX, BX
+ JZ 2(PC)
+ CALL runtime·badctxt(SB)
+ get_tls(CX)
+ MOVQ g(CX), BX
+ MOVQ BX, gobuf_g(AX)
+ RET
+
+// func gogo(buf *gobuf)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB), NOSPLIT, $16-8
+ MOVQ buf+0(FP), BX // gobuf
+ MOVQ gobuf_g(BX), DX
+ MOVQ 0(DX), CX // make sure g != nil
+ get_tls(CX)
+ MOVQ DX, g(CX)
+ MOVQ gobuf_sp(BX), SP // restore SP
+ MOVQ gobuf_ret(BX), AX
+ MOVQ gobuf_ctxt(BX), DX
+ MOVQ gobuf_bp(BX), BP
+ MOVQ $0, gobuf_sp(BX) // clear to help garbage collector
+ MOVQ $0, gobuf_ret(BX)
+ MOVQ $0, gobuf_ctxt(BX)
+ MOVQ $0, gobuf_bp(BX)
+ MOVQ gobuf_pc(BX), BX
+ JMP BX
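+
+// Taken together, gosave and gogo act as a setjmp/longjmp pair over a gobuf.
+// In rough pseudocode:
+//	gosave(buf): buf.sp, buf.pc, buf.bp, buf.g = callerSP, callerPC, BP, getg()
+//	gogo(buf):   setg(buf.g); SP, AX, DX, BP = buf.sp, buf.ret, buf.ctxt, buf.bp;
+//	             clear buf; jump to buf.pc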
+
+// func mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB), NOSPLIT, $0-8
+ MOVQ fn+0(FP), DI
+
+ get_tls(CX)
+ MOVQ g(CX), AX // save state in g->sched
+ MOVQ 0(SP), BX // caller's PC
+ MOVQ BX, (g_sched+gobuf_pc)(AX)
+ LEAQ fn+0(FP), BX // caller's SP
+ MOVQ BX, (g_sched+gobuf_sp)(AX)
+ MOVQ AX, (g_sched+gobuf_g)(AX)
+ MOVQ BP, (g_sched+gobuf_bp)(AX)
+
+ // switch to m->g0 & its stack, call fn
+ MOVQ g(CX), BX
+ MOVQ g_m(BX), BX
+ MOVQ m_g0(BX), SI
+ CMPQ SI, AX // if g == m->g0 call badmcall
+ JNE 3(PC)
+ MOVQ $runtime·badmcall(SB), AX
+ JMP AX
+ MOVQ SI, g(CX) // g = m->g0
+ MOVQ (g_sched+gobuf_sp)(SI), SP // sp = m->g0->sched.sp
+ PUSHQ AX
+ MOVQ DI, DX
+ MOVQ 0(DI), DI
+ CALL DI
+ POPQ AX
+ MOVQ $runtime·badmcall2(SB), AX
+ JMP AX
+ RET
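+
+// In rough pseudocode, mcall is:
+//	g.sched = {pc: callerPC, sp: callerSP, g: g, bp: BP}
+//	if g == g.m.g0 { badmcall() }
+//	switch to g0 and its sched.sp, then call fn(old g)  // fn must not return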
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ MOVQ fn+0(FP), DI // DI = fn
+ get_tls(CX)
+ MOVQ g(CX), AX // AX = g
+ MOVQ g_m(AX), BX // BX = m
+
+ CMPQ AX, m_gsignal(BX)
+ JEQ noswitch
+
+ MOVQ m_g0(BX), DX // DX = g0
+ CMPQ AX, DX
+ JEQ noswitch
+
+ CMPQ AX, m_curg(BX)
+ JNE bad
+
+ // switch stacks
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ MOVQ $runtime·systemstack_switch(SB), SI
+ MOVQ SI, (g_sched+gobuf_pc)(AX)
+ MOVQ SP, (g_sched+gobuf_sp)(AX)
+ MOVQ AX, (g_sched+gobuf_g)(AX)
+ MOVQ BP, (g_sched+gobuf_bp)(AX)
+
+ // switch to g0
+ MOVQ DX, g(CX)
+ MOVQ (g_sched+gobuf_sp)(DX), BX
+ // make it look like mstart called systemstack on g0, to stop traceback
+ SUBQ $8, BX
+ MOVQ $runtime·mstart(SB), DX
+ MOVQ DX, 0(BX)
+ MOVQ BX, SP
+
+ // call target function
+ MOVQ DI, DX
+ MOVQ 0(DI), DI
+ CALL DI
+
+ // switch back to g
+ get_tls(CX)
+ MOVQ g(CX), AX
+ MOVQ g_m(AX), BX
+ MOVQ m_curg(BX), AX
+ MOVQ AX, g(CX)
+ MOVQ (g_sched+gobuf_sp)(AX), SP
+ MOVQ $0, (g_sched+gobuf_sp)(AX)
+ RET
+
+noswitch:
+ // already on m stack; tail call the function
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVQ DI, DX
+ MOVQ 0(DI), DI
+ JMP DI
+
+bad:
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ MOVQ $runtime·badsystemstack(SB), AX
+ CALL AX
+ INT $3
+
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ get_tls(CX)
+ MOVQ g(CX), BX
+ MOVQ g_m(BX), BX
+ MOVQ m_g0(BX), SI
+ CMPQ g(CX), SI
+ JNE 3(PC)
+ CALL runtime·badmorestackg0(SB)
+ CALL runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVQ m_gsignal(BX), SI
+ CMPQ g(CX), SI
+ JNE 3(PC)
+ CALL runtime·badmorestackgsignal(SB)
+ CALL runtime·abort(SB)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ NOP SP // tell vet SP changed - stop checking offsets
+ MOVQ 8(SP), AX // f's caller's PC
+ MOVQ AX, (m_morebuf+gobuf_pc)(BX)
+ LEAQ 16(SP), AX // f's caller's SP
+ MOVQ AX, (m_morebuf+gobuf_sp)(BX)
+ get_tls(CX)
+ MOVQ g(CX), SI
+ MOVQ SI, (m_morebuf+gobuf_g)(BX)
+
+ // Set g->sched to context in f.
+ MOVQ 0(SP), AX // f's PC
+ MOVQ AX, (g_sched+gobuf_pc)(SI)
+ MOVQ SI, (g_sched+gobuf_g)(SI)
+ LEAQ 8(SP), AX // f's SP
+ MOVQ AX, (g_sched+gobuf_sp)(SI)
+ MOVQ BP, (g_sched+gobuf_bp)(SI)
+ MOVQ DX, (g_sched+gobuf_ctxt)(SI)
+
+ // Call newstack on m->g0's stack.
+ MOVQ m_g0(BX), BX
+ MOVQ BX, g(CX)
+ MOVQ (g_sched+gobuf_sp)(BX), SP
+ CALL runtime·newstack(SB)
+ CALL runtime·abort(SB) // crash if newstack returns
+ RET
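+
+// Stack layout at entry to morestack (reached by a CALL from f's prologue):
+// 0(SP) holds the PC in f to resume at, 8(SP) holds f's caller's return PC,
+// and the values SP+8 and SP+16 are f's SP and f's caller's SP, which is
+// what the loads above record into g->sched and m->morebuf.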
+
+// morestack but not preserving ctxt.
+TEXT runtime·morestack_noctxt(SB),NOSPLIT,$0
+ MOVL $0, DX
+ JMP runtime·morestack(SB)
+
+// reflectcall: call a function with the given argument list
+// func call(argtype *_type, f *FuncVal, arg *byte, argsize, retoffset uint32).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ CMPQ CX, $MAXSIZE; \
+ JA 3(PC); \
+ MOVQ $NAME(SB), AX; \
+ JMP AX
+// Note: can't just "JMP NAME(SB)" - bad inlining results.
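+// Each DISPATCH falls through when argsize exceeds MAXSIZE, so control lands
+// on the smallest runtime·callNN with NN >= argsize: for example, a 100-byte
+// argument frame dispatches to runtime·call128, which reserves a 128-byte
+// frame but copies only the 100 argument bytes.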
+
+TEXT ·reflectcall<ABIInternal>(SB), NOSPLIT, $0-32
+ MOVLQZX argsize+24(FP), CX
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVQ $runtime·badreflectcall(SB), AX
+ JMP AX
+
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-32; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVQ argptr+16(FP), SI; \
+ MOVLQZX argsize+24(FP), CX; \
+ MOVQ SP, DI; \
+ REP;MOVSB; \
+ /* call function */ \
+ MOVQ f+8(FP), DX; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ MOVQ (DX), AX; \
+ CALL AX; \
+ /* copy return values back */ \
+ MOVQ argtype+0(FP), DX; \
+ MOVQ argptr+16(FP), DI; \
+ MOVLQZX argsize+24(FP), CX; \
+ MOVLQZX retoffset+28(FP), BX; \
+ MOVQ SP, SI; \
+ ADDQ BX, DI; \
+ ADDQ BX, SI; \
+ SUBQ BX, CX; \
+ CALL callRet<>(SB); \
+ RET
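+
+// Each callNN generated below behaves roughly like:
+//	copy argsize bytes of arguments from argptr to SP
+//	call f
+//	reflectcallmove(argtype, argptr+retoffset, SP+retoffset, argsize-retoffset)
+// with the copy-back routed through callRet, which provides the stack frame
+// for the reflectcallmove call.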
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $32-0
+ NO_LOCAL_POINTERS
+ MOVQ DX, 0(SP)
+ MOVQ DI, 8(SP)
+ MOVQ SI, 16(SP)
+ MOVQ CX, 24(SP)
+ CALL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+TEXT runtime·procyield(SB),NOSPLIT,$0-0
+ MOVL cycles+0(FP), AX
+again:
+ PAUSE
+ SUBL $1, AX
+ JNZ again
+ RET
+
+
+TEXT ·publicationBarrier(SB),NOSPLIT,$0-0
+ // Stores are already ordered on x86, so this is just a
+ // compile barrier.
+ RET
+
+// func jmpdefer(fv *funcval, argp uintptr)
+// argp is a caller SP.
+// called from deferreturn.
+// 1. pop the caller
+// 2. sub 5 bytes from the caller's return address
+// 3. jmp to the argument
+TEXT runtime·jmpdefer(SB), NOSPLIT, $0-16
+ MOVQ fv+0(FP), DX // fn
+ MOVQ argp+8(FP), BX // caller sp
+ LEAQ -8(BX), SP // caller sp after CALL
+ MOVQ -8(SP), BP // restore BP as if deferreturn returned (harmless if framepointers not in use)
+ SUBQ $5, (SP) // return to CALL again
+ MOVQ 0(DX), BX
+ JMP BX // but first run the deferred function
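+
+// The magic 5 above is the length of a direct CALL instruction on amd64
+// (0xE8 plus a 32-bit offset): backing the return address up by 5 re-points
+// it at the CALL to deferreturn, so when the deferred function returns,
+// deferreturn runs again and pops the next deferred call.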
+
+// Save state of caller into g->sched. Smashes R8, R9.
+TEXT gosave<>(SB),NOSPLIT,$0
+ get_tls(R8)
+ MOVQ g(R8), R8
+ MOVQ 0(SP), R9
+ MOVQ R9, (g_sched+gobuf_pc)(R8)
+ LEAQ 8(SP), R9
+ MOVQ R9, (g_sched+gobuf_sp)(R8)
+ MOVQ $0, (g_sched+gobuf_ret)(R8)
+ MOVQ BP, (g_sched+gobuf_bp)(R8)
+ // Assert ctxt is zero. See func save.
+ MOVQ (g_sched+gobuf_ctxt)(R8), R9
+ TESTQ R9, R9
+ JZ 2(PC)
+ CALL runtime·badctxt(SB)
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-20
+ MOVQ fn+0(FP), AX
+ MOVQ arg+8(FP), BX
+
+ MOVQ SP, DX
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already.
+ get_tls(CX)
+ MOVQ g(CX), R8
+ CMPQ R8, $0
+ JEQ nosave
+ MOVQ g_m(R8), R8
+ MOVQ m_g0(R8), SI
+ MOVQ g(CX), DI
+ CMPQ SI, DI
+ JEQ nosave
+ MOVQ m_gsignal(R8), SI
+ CMPQ SI, DI
+ JEQ nosave
+
+ // Switch to system stack.
+ MOVQ m_g0(R8), SI
+ CALL gosave<>(SB)
+ MOVQ SI, g(CX)
+ MOVQ (g_sched+gobuf_sp)(SI), SP
+
+ // Now on a scheduling stack (a pthread-created stack).
+ // Make sure we have enough room for 4 stack-backed fast-call
+ // registers as per windows amd64 calling convention.
+ SUBQ $64, SP
+ ANDQ $~15, SP // alignment for gcc ABI
+ MOVQ DI, 48(SP) // save g
+ MOVQ (g_stack+stack_hi)(DI), DI
+ SUBQ DX, DI
+ MOVQ DI, 40(SP) // save depth in stack (can't just save SP, as stack might be copied during a callback)
+ MOVQ BX, DI // DI = first argument in AMD64 ABI
+ MOVQ BX, CX // CX = first argument in Win64
+ CALL AX
+
+ // Restore registers, g, stack pointer.
+ get_tls(CX)
+ MOVQ 48(SP), DI
+ MOVQ (g_stack+stack_hi)(DI), SI
+ SUBQ 40(SP), SI
+ MOVQ DI, g(CX)
+ MOVQ SI, SP
+
+ MOVL AX, ret+16(FP)
+ RET
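+
+	// The restore above recomputes SP as g.stack.hi minus the saved depth,
+	// which stays correct even if the goroutine's stack was copied (and
+	// stack.hi changed) during a callback into Go.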
+
+nosave:
+ // Running on a system stack, perhaps even without a g.
+ // Having no g can happen during thread creation or thread teardown
+ // (see needm/dropm on Solaris, for example).
+ // This code is like the above sequence but without saving/restoring g
+ // and without worrying about the stack moving out from under us
+ // (because we're on a system stack, not a goroutine stack).
+ // The above code could be used directly if already on a system stack,
+ // but then the only path through this code would be a rare case on Solaris.
+ // Using this code for all "already on system stack" calls exercises it more,
+ // which should help keep it correct.
+ SUBQ $64, SP
+ ANDQ $~15, SP
+ MOVQ $0, 48(SP) // where above code stores g, in case someone looks during debugging
+ MOVQ DX, 40(SP) // save original stack pointer
+ MOVQ BX, DI // DI = first argument in AMD64 ABI
+ MOVQ BX, CX // CX = first argument in Win64
+ CALL AX
+ MOVQ 40(SP), SI // restore original stack pointer
+ MOVQ SI, SP
+ MOVL AX, ret+16(FP)
+ RET
+
+// func cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$24-24
+ NO_LOCAL_POINTERS
+
+ // If g is nil, Go did not create the current thread.
+ // Call needm to obtain one m for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call through AX.
+ get_tls(CX)
+#ifdef GOOS_windows
+ MOVL $0, BX
+ CMPQ CX, $0
+ JEQ 2(PC)
+#endif
+ MOVQ g(CX), BX
+ CMPQ BX, $0
+ JEQ needm
+ MOVQ g_m(BX), BX
+ MOVQ BX, savedm-8(SP) // saved copy of oldm
+ JMP havem
+needm:
+ MOVQ $runtime·needm(SB), AX
+ CALL AX
+ MOVQ $0, savedm-8(SP) // dropm on return
+ get_tls(CX)
+ MOVQ g(CX), BX
+ MOVQ g_m(BX), BX
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVQ m_g0(BX), SI
+ MOVQ SP, (g_sched+gobuf_sp)(SI)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 0(SP).
+ MOVQ m_g0(BX), SI
+ MOVQ (g_sched+gobuf_sp)(SI), AX
+ MOVQ AX, 0(SP)
+ MOVQ SP, (g_sched+gobuf_sp)(SI)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVQ m_curg(BX), SI
+ MOVQ SI, g(CX)
+ MOVQ (g_sched+gobuf_sp)(SI), DI // prepare stack as DI
+ MOVQ (g_sched+gobuf_pc)(SI), BX
+ MOVQ BX, -8(DI) // "push" return PC on the g stack
+ // Gather our arguments into registers.
+ MOVQ fn+0(FP), BX
+ MOVQ frame+8(FP), CX
+ MOVQ ctxt+16(FP), DX
+ // Compute the size of the frame, including return PC and, if
+ // GOEXPERIMENT=framepointer, the saved base pointer
+ LEAQ fn+0(FP), AX
+ SUBQ SP, AX // AX is our actual frame size
+ SUBQ AX, DI // Allocate the same frame size on the g stack
+ MOVQ DI, SP
+
+ MOVQ BX, 0(SP)
+ MOVQ CX, 8(SP)
+ MOVQ DX, 16(SP)
+ CALL runtime·cgocallbackg(SB)
+
+ // Compute the size of the frame again. FP and SP have
+ // completely different values here than they did above,
+ // but only their difference matters.
+ LEAQ fn+0(FP), AX
+ SUBQ SP, AX
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ get_tls(CX)
+ MOVQ g(CX), SI
+ MOVQ SP, DI
+ ADDQ AX, DI
+ MOVQ -8(DI), BX
+ MOVQ BX, (g_sched+gobuf_pc)(SI)
+ MOVQ DI, (g_sched+gobuf_sp)(SI)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVQ g(CX), BX
+ MOVQ g_m(BX), BX
+ MOVQ m_g0(BX), SI
+ MOVQ SI, g(CX)
+ MOVQ (g_sched+gobuf_sp)(SI), SP
+ MOVQ 0(SP), AX
+ MOVQ AX, (g_sched+gobuf_sp)(SI)
+
+ // If the m on entry was nil, we called needm above to borrow an m
+ // for the duration of the call. Since the call is over, return it with dropm.
+ MOVQ savedm-8(SP), BX
+ CMPQ BX, $0
+ JNE 3(PC)
+ MOVQ $runtime·dropm(SB), AX
+ CALL AX
+
+ // Done!
+ RET
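+
+// Roughly, the whole sequence above is:
+//	if getg() == nil { needm() }          // borrow an m; dropm when done
+//	save g0.sched.sp; switch to m.curg; push curg's saved PC as a fake
+//	return address; carve out a frame the same size as our g0 frame;
+//	call cgocallbackg(fn, frame, ctxt); then undo all of the above.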
+
+// func setg(gg *g)
+// set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-8
+ MOVQ gg+0(FP), BX
+#ifdef GOOS_windows
+ CMPQ BX, $0
+ JNE settls
+ MOVQ $0, 0x28(GS)
+ RET
+settls:
+ MOVQ g_m(BX), AX
+ LEAQ m_tls(AX), AX
+ MOVQ AX, 0x28(GS)
+#endif
+ get_tls(CX)
+ MOVQ BX, g(CX)
+ RET
+
+// void setg_gcc(G*); set g called from gcc.
+TEXT setg_gcc<>(SB),NOSPLIT,$0
+ get_tls(AX)
+ MOVQ DI, g(AX)
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT,$0-0
+ INT $3
+loop:
+ JMP loop
+
+// check that SP is in range [g->stack.lo, g->stack.hi)
+TEXT runtime·stackcheck(SB), NOSPLIT, $0-0
+ get_tls(CX)
+ MOVQ g(CX), AX
+ CMPQ (g_stack+stack_hi)(AX), SP
+ JHI 2(PC)
+ CALL runtime·abort(SB)
+ CMPQ SP, (g_stack+stack_lo)(AX)
+ JHI 2(PC)
+ CALL runtime·abort(SB)
+ RET
+
+// func cputicks() int64
+TEXT runtime·cputicks(SB),NOSPLIT,$0-0
+ CMPB runtime·lfenceBeforeRdtsc(SB), $1
+ JNE mfence
+ LFENCE
+ JMP done
+mfence:
+ MFENCE
+done:
+ RDTSC
+ SHLQ $32, DX
+ ADDQ DX, AX
+ MOVQ AX, ret+0(FP)
+ RET
+
+// func memhash(p unsafe.Pointer, h, s uintptr) uintptr
+// hash function using AES hardware instructions
+TEXT runtime·memhash(SB),NOSPLIT,$0-32
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVQ p+0(FP), AX // ptr to data
+ MOVQ s+16(FP), CX // size
+ LEAQ ret+24(FP), DX
+ JMP aeshashbody<>(SB)
+noaes:
+ JMP runtime·memhashFallback(SB)
+
+// func strhash(p unsafe.Pointer, h uintptr) uintptr
+TEXT runtime·strhash(SB),NOSPLIT,$0-24
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVQ p+0(FP), AX // ptr to string struct
+ MOVQ 8(AX), CX // length of string
+ MOVQ (AX), AX // string data
+ LEAQ ret+16(FP), DX
+ JMP aeshashbody<>(SB)
+noaes:
+ JMP runtime·strhashFallback(SB)
+
+// AX: data
+// CX: length
+// DX: address to put return value
+TEXT aeshashbody<>(SB),NOSPLIT,$0-0
+ // Fill an SSE register with our seeds.
+ MOVQ h+8(FP), X0 // 64 bits of per-table hash seed
+ PINSRW $4, CX, X0 // 16 bits of length
+ PSHUFHW $0, X0, X0 // repeat length 4 times total
+ MOVO X0, X1 // save unscrambled seed
+ PXOR runtime·aeskeysched(SB), X0 // xor in per-process seed
+ AESENC X0, X0 // scramble seed
+
+ CMPQ CX, $16
+ JB aes0to15
+ JE aes16
+ CMPQ CX, $32
+ JBE aes17to32
+ CMPQ CX, $64
+ JBE aes33to64
+ CMPQ CX, $128
+ JBE aes65to128
+ JMP aes129plus
+
+aes0to15:
+ TESTQ CX, CX
+ JE aes0
+
+ ADDQ $16, AX
+ TESTW $0xff0, AX
+ JE endofpage
+
+ // 16 bytes loaded at this address won't cross
+ // a page boundary, so we can load it directly.
+ MOVOU -16(AX), X1
+ ADDQ CX, CX
+ MOVQ $masks<>(SB), AX
+ PAND (AX)(CX*8), X1
+final1:
+ PXOR X0, X1 // xor data with seed
+ AESENC X1, X1 // scramble combo 3 times
+ AESENC X1, X1
+ AESENC X1, X1
+ MOVQ X1, (DX)
+ RET
+
+endofpage:
+ // address ends in 1111xxxx. Might be up against
+ // a page boundary, so load ending at last byte.
+ // Then shift bytes down using pshufb.
+ MOVOU -32(AX)(CX*1), X1
+ ADDQ CX, CX
+ MOVQ $shifts<>(SB), AX
+ PSHUFB (AX)(CX*8), X1
+ JMP final1
+
+aes0:
+ // Return scrambled input seed
+ AESENC X0, X0
+ MOVQ X0, (DX)
+ RET
+
+aes16:
+ MOVOU (AX), X1
+ JMP final1
+
+aes17to32:
+ // make second starting seed
+ PXOR runtime·aeskeysched+16(SB), X1
+ AESENC X1, X1
+
+ // load data to be hashed
+ MOVOU (AX), X2
+ MOVOU -16(AX)(CX*1), X3
+
+ // xor with seed
+ PXOR X0, X2
+ PXOR X1, X3
+
+ // scramble 3 times
+ AESENC X2, X2
+ AESENC X3, X3
+ AESENC X2, X2
+ AESENC X3, X3
+ AESENC X2, X2
+ AESENC X3, X3
+
+ // combine results
+ PXOR X3, X2
+ MOVQ X2, (DX)
+ RET
+
+aes33to64:
+ // make 3 more starting seeds
+ MOVO X1, X2
+ MOVO X1, X3
+ PXOR runtime·aeskeysched+16(SB), X1
+ PXOR runtime·aeskeysched+32(SB), X2
+ PXOR runtime·aeskeysched+48(SB), X3
+ AESENC X1, X1
+ AESENC X2, X2
+ AESENC X3, X3
+
+ MOVOU (AX), X4
+ MOVOU 16(AX), X5
+ MOVOU -32(AX)(CX*1), X6
+ MOVOU -16(AX)(CX*1), X7
+
+ PXOR X0, X4
+ PXOR X1, X5
+ PXOR X2, X6
+ PXOR X3, X7
+
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ PXOR X6, X4
+ PXOR X7, X5
+ PXOR X5, X4
+ MOVQ X4, (DX)
+ RET
+
+aes65to128:
+ // make 7 more starting seeds
+ MOVO X1, X2
+ MOVO X1, X3
+ MOVO X1, X4
+ MOVO X1, X5
+ MOVO X1, X6
+ MOVO X1, X7
+ PXOR runtime·aeskeysched+16(SB), X1
+ PXOR runtime·aeskeysched+32(SB), X2
+ PXOR runtime·aeskeysched+48(SB), X3
+ PXOR runtime·aeskeysched+64(SB), X4
+ PXOR runtime·aeskeysched+80(SB), X5
+ PXOR runtime·aeskeysched+96(SB), X6
+ PXOR runtime·aeskeysched+112(SB), X7
+ AESENC X1, X1
+ AESENC X2, X2
+ AESENC X3, X3
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ // load data
+ MOVOU (AX), X8
+ MOVOU 16(AX), X9
+ MOVOU 32(AX), X10
+ MOVOU 48(AX), X11
+ MOVOU -64(AX)(CX*1), X12
+ MOVOU -48(AX)(CX*1), X13
+ MOVOU -32(AX)(CX*1), X14
+ MOVOU -16(AX)(CX*1), X15
+
+ // xor with seed
+ PXOR X0, X8
+ PXOR X1, X9
+ PXOR X2, X10
+ PXOR X3, X11
+ PXOR X4, X12
+ PXOR X5, X13
+ PXOR X6, X14
+ PXOR X7, X15
+
+ // scramble 3 times
+ AESENC X8, X8
+ AESENC X9, X9
+ AESENC X10, X10
+ AESENC X11, X11
+ AESENC X12, X12
+ AESENC X13, X13
+ AESENC X14, X14
+ AESENC X15, X15
+
+ AESENC X8, X8
+ AESENC X9, X9
+ AESENC X10, X10
+ AESENC X11, X11
+ AESENC X12, X12
+ AESENC X13, X13
+ AESENC X14, X14
+ AESENC X15, X15
+
+ AESENC X8, X8
+ AESENC X9, X9
+ AESENC X10, X10
+ AESENC X11, X11
+ AESENC X12, X12
+ AESENC X13, X13
+ AESENC X14, X14
+ AESENC X15, X15
+
+ // combine results
+ PXOR X12, X8
+ PXOR X13, X9
+ PXOR X14, X10
+ PXOR X15, X11
+ PXOR X10, X8
+ PXOR X11, X9
+ PXOR X9, X8
+ MOVQ X8, (DX)
+ RET
+
+aes129plus:
+ // make 7 more starting seeds
+ MOVO X1, X2
+ MOVO X1, X3
+ MOVO X1, X4
+ MOVO X1, X5
+ MOVO X1, X6
+ MOVO X1, X7
+ PXOR runtime·aeskeysched+16(SB), X1
+ PXOR runtime·aeskeysched+32(SB), X2
+ PXOR runtime·aeskeysched+48(SB), X3
+ PXOR runtime·aeskeysched+64(SB), X4
+ PXOR runtime·aeskeysched+80(SB), X5
+ PXOR runtime·aeskeysched+96(SB), X6
+ PXOR runtime·aeskeysched+112(SB), X7
+ AESENC X1, X1
+ AESENC X2, X2
+ AESENC X3, X3
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ // start with last (possibly overlapping) block
+ MOVOU -128(AX)(CX*1), X8
+ MOVOU -112(AX)(CX*1), X9
+ MOVOU -96(AX)(CX*1), X10
+ MOVOU -80(AX)(CX*1), X11
+ MOVOU -64(AX)(CX*1), X12
+ MOVOU -48(AX)(CX*1), X13
+ MOVOU -32(AX)(CX*1), X14
+ MOVOU -16(AX)(CX*1), X15
+
+ // xor in seed
+ PXOR X0, X8
+ PXOR X1, X9
+ PXOR X2, X10
+ PXOR X3, X11
+ PXOR X4, X12
+ PXOR X5, X13
+ PXOR X6, X14
+ PXOR X7, X15
+
+ // compute number of remaining 128-byte blocks
+ DECQ CX
+ SHRQ $7, CX
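+	// i.e. (len-1)/128 iterations; the trailing (possibly overlapping)
+	// 128-byte block loaded above covers whatever the loop leaves out.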
+
+aesloop:
+ // scramble state
+ AESENC X8, X8
+ AESENC X9, X9
+ AESENC X10, X10
+ AESENC X11, X11
+ AESENC X12, X12
+ AESENC X13, X13
+ AESENC X14, X14
+ AESENC X15, X15
+
+ // scramble state, xor in a block
+ MOVOU (AX), X0
+ MOVOU 16(AX), X1
+ MOVOU 32(AX), X2
+ MOVOU 48(AX), X3
+ AESENC X0, X8
+ AESENC X1, X9
+ AESENC X2, X10
+ AESENC X3, X11
+ MOVOU 64(AX), X4
+ MOVOU 80(AX), X5
+ MOVOU 96(AX), X6
+ MOVOU 112(AX), X7
+ AESENC X4, X12
+ AESENC X5, X13
+ AESENC X6, X14
+ AESENC X7, X15
+
+ ADDQ $128, AX
+ DECQ CX
+ JNE aesloop
+
+ // 3 more scrambles to finish
+ AESENC X8, X8
+ AESENC X9, X9
+ AESENC X10, X10
+ AESENC X11, X11
+ AESENC X12, X12
+ AESENC X13, X13
+ AESENC X14, X14
+ AESENC X15, X15
+ AESENC X8, X8
+ AESENC X9, X9
+ AESENC X10, X10
+ AESENC X11, X11
+ AESENC X12, X12
+ AESENC X13, X13
+ AESENC X14, X14
+ AESENC X15, X15
+ AESENC X8, X8
+ AESENC X9, X9
+ AESENC X10, X10
+ AESENC X11, X11
+ AESENC X12, X12
+ AESENC X13, X13
+ AESENC X14, X14
+ AESENC X15, X15
+
+ PXOR X12, X8
+ PXOR X13, X9
+ PXOR X14, X10
+ PXOR X15, X11
+ PXOR X10, X8
+ PXOR X11, X9
+ PXOR X9, X8
+ MOVQ X8, (DX)
+ RET
+
+// func memhash32(p unsafe.Pointer, h uintptr) uintptr
+TEXT runtime·memhash32(SB),NOSPLIT,$0-24
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVQ p+0(FP), AX // ptr to data
+ MOVQ h+8(FP), X0 // seed
+ PINSRD $2, (AX), X0 // data
+ AESENC runtime·aeskeysched+0(SB), X0
+ AESENC runtime·aeskeysched+16(SB), X0
+ AESENC runtime·aeskeysched+32(SB), X0
+ MOVQ X0, ret+16(FP)
+ RET
+noaes:
+ JMP runtime·memhash32Fallback(SB)
+
+// func memhash64(p unsafe.Pointer, h uintptr) uintptr
+TEXT runtime·memhash64(SB),NOSPLIT,$0-24
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVQ p+0(FP), AX // ptr to data
+ MOVQ h+8(FP), X0 // seed
+ PINSRQ $1, (AX), X0 // data
+ AESENC runtime·aeskeysched+0(SB), X0
+ AESENC runtime·aeskeysched+16(SB), X0
+ AESENC runtime·aeskeysched+32(SB), X0
+ MOVQ X0, ret+16(FP)
+ RET
+noaes:
+ JMP runtime·memhash64Fallback(SB)
+
+// simple mask to get rid of data in the high part of the register.
+DATA masks<>+0x00(SB)/8, $0x0000000000000000
+DATA masks<>+0x08(SB)/8, $0x0000000000000000
+DATA masks<>+0x10(SB)/8, $0x00000000000000ff
+DATA masks<>+0x18(SB)/8, $0x0000000000000000
+DATA masks<>+0x20(SB)/8, $0x000000000000ffff
+DATA masks<>+0x28(SB)/8, $0x0000000000000000
+DATA masks<>+0x30(SB)/8, $0x0000000000ffffff
+DATA masks<>+0x38(SB)/8, $0x0000000000000000
+DATA masks<>+0x40(SB)/8, $0x00000000ffffffff
+DATA masks<>+0x48(SB)/8, $0x0000000000000000
+DATA masks<>+0x50(SB)/8, $0x000000ffffffffff
+DATA masks<>+0x58(SB)/8, $0x0000000000000000
+DATA masks<>+0x60(SB)/8, $0x0000ffffffffffff
+DATA masks<>+0x68(SB)/8, $0x0000000000000000
+DATA masks<>+0x70(SB)/8, $0x00ffffffffffffff
+DATA masks<>+0x78(SB)/8, $0x0000000000000000
+DATA masks<>+0x80(SB)/8, $0xffffffffffffffff
+DATA masks<>+0x88(SB)/8, $0x0000000000000000
+DATA masks<>+0x90(SB)/8, $0xffffffffffffffff
+DATA masks<>+0x98(SB)/8, $0x00000000000000ff
+DATA masks<>+0xa0(SB)/8, $0xffffffffffffffff
+DATA masks<>+0xa8(SB)/8, $0x000000000000ffff
+DATA masks<>+0xb0(SB)/8, $0xffffffffffffffff
+DATA masks<>+0xb8(SB)/8, $0x0000000000ffffff
+DATA masks<>+0xc0(SB)/8, $0xffffffffffffffff
+DATA masks<>+0xc8(SB)/8, $0x00000000ffffffff
+DATA masks<>+0xd0(SB)/8, $0xffffffffffffffff
+DATA masks<>+0xd8(SB)/8, $0x000000ffffffffff
+DATA masks<>+0xe0(SB)/8, $0xffffffffffffffff
+DATA masks<>+0xe8(SB)/8, $0x0000ffffffffffff
+DATA masks<>+0xf0(SB)/8, $0xffffffffffffffff
+DATA masks<>+0xf8(SB)/8, $0x00ffffffffffffff
+GLOBL masks<>(SB),RODATA,$256
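+
+// Entry n (0 <= n <= 15) lives at masks<>+16*n and keeps exactly the low n
+// bytes of a 16-byte register; aes0to15 reaches it as (AX)(CX*8) with CX
+// already doubled, i.e. masks<>+16*length.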
+
+// func checkASM() bool
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+	// check that masks<>(SB) and shifts<>(SB) are 16-byte aligned
+ MOVQ $masks<>(SB), AX
+ MOVQ $shifts<>(SB), BX
+ ORQ BX, AX
+ TESTQ $15, AX
+ SETEQ ret+0(FP)
+ RET
+
+// these are arguments to pshufb. They move data down from
+// the high bytes of the register to the low bytes of the register.
+// index is how many bytes to move.
+DATA shifts<>+0x00(SB)/8, $0x0000000000000000
+DATA shifts<>+0x08(SB)/8, $0x0000000000000000
+DATA shifts<>+0x10(SB)/8, $0xffffffffffffff0f
+DATA shifts<>+0x18(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x20(SB)/8, $0xffffffffffff0f0e
+DATA shifts<>+0x28(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x30(SB)/8, $0xffffffffff0f0e0d
+DATA shifts<>+0x38(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x40(SB)/8, $0xffffffff0f0e0d0c
+DATA shifts<>+0x48(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x50(SB)/8, $0xffffff0f0e0d0c0b
+DATA shifts<>+0x58(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x60(SB)/8, $0xffff0f0e0d0c0b0a
+DATA shifts<>+0x68(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x70(SB)/8, $0xff0f0e0d0c0b0a09
+DATA shifts<>+0x78(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x80(SB)/8, $0x0f0e0d0c0b0a0908
+DATA shifts<>+0x88(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x90(SB)/8, $0x0e0d0c0b0a090807
+DATA shifts<>+0x98(SB)/8, $0xffffffffffffff0f
+DATA shifts<>+0xa0(SB)/8, $0x0d0c0b0a09080706
+DATA shifts<>+0xa8(SB)/8, $0xffffffffffff0f0e
+DATA shifts<>+0xb0(SB)/8, $0x0c0b0a0908070605
+DATA shifts<>+0xb8(SB)/8, $0xffffffffff0f0e0d
+DATA shifts<>+0xc0(SB)/8, $0x0b0a090807060504
+DATA shifts<>+0xc8(SB)/8, $0xffffffff0f0e0d0c
+DATA shifts<>+0xd0(SB)/8, $0x0a09080706050403
+DATA shifts<>+0xd8(SB)/8, $0xffffff0f0e0d0c0b
+DATA shifts<>+0xe0(SB)/8, $0x0908070605040302
+DATA shifts<>+0xe8(SB)/8, $0xffff0f0e0d0c0b0a
+DATA shifts<>+0xf0(SB)/8, $0x0807060504030201
+DATA shifts<>+0xf8(SB)/8, $0xff0f0e0d0c0b0a09
+GLOBL shifts<>(SB),RODATA,$256
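+
+// Entry n at shifts<>+16*n is a PSHUFB control that moves the top n bytes of
+// the register down to bytes 0..n-1 and zeroes the rest (control bytes with
+// the high bit set make PSHUFB write zero). It is indexed the same way as
+// masks<> above.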
+
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOVL $0, AX
+ RET
+
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT,$0
+ get_tls(CX)
+ MOVQ g(CX), AX
+ MOVQ g_m(AX), AX
+ MOVQ m_curg(AX), AX
+ MOVQ (g_stack+stack_hi)(AX), AX
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum. Defined as ABIInternal
+// so as to make it identifiable to traceback (this
+// function is used as a sentinel; traceback wants to
+// see the func PC, not a wrapper PC).
+TEXT runtime·goexit<ABIInternal>(SB),NOSPLIT,$0-0
+ BYTE $0x90 // NOP
+ CALL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ BYTE $0x90 // NOP
+
+// This is called from .init_array and follows the platform, not Go, ABI.
+TEXT runtime·addmoduledata(SB),NOSPLIT,$0-0
+ PUSHQ R15 // The access to global variables below implicitly uses R15, which is callee-save
+ MOVQ runtime·lastmoduledatap(SB), AX
+ MOVQ DI, moduledata_next(AX)
+ MOVQ DI, runtime·lastmoduledatap(SB)
+ POPQ R15
+ RET
+
+// gcWriteBarrier performs a heap pointer write and informs the GC.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It takes two arguments:
+// - DI is the destination of the write
+// - AX is the value being written at DI
+// It clobbers FLAGS. It does not clobber any general-purpose registers,
+// but may clobber others (e.g., SSE registers).
+// Defined as ABIInternal since it does not use the stack-based Go ABI.
+TEXT runtime·gcWriteBarrier<ABIInternal>(SB),NOSPLIT,$120
+ // Save the registers clobbered by the fast path. This is slightly
+ // faster than having the caller spill these.
+ MOVQ R14, 104(SP)
+ MOVQ R13, 112(SP)
+ // TODO: Consider passing g.m.p in as an argument so they can be shared
+ // across a sequence of write barriers.
+ get_tls(R13)
+ MOVQ g(R13), R13
+ MOVQ g_m(R13), R13
+ MOVQ m_p(R13), R13
+ MOVQ (p_wbBuf+wbBuf_next)(R13), R14
+ // Increment wbBuf.next position.
+ LEAQ 16(R14), R14
+ MOVQ R14, (p_wbBuf+wbBuf_next)(R13)
+ CMPQ R14, (p_wbBuf+wbBuf_end)(R13)
+ // Record the write.
+ MOVQ AX, -16(R14) // Record value
+ // Note: This turns bad pointer writes into bad
+ // pointer reads, which could be confusing. We could avoid
+ // reading from obviously bad pointers, which would
+ // take care of the vast majority of these. We could
+ // patch this up in the signal handler, or use XCHG to
+ // combine the read and the write.
+ MOVQ (DI), R13
+ MOVQ R13, -8(R14) // Record *slot
+ // Is the buffer full? (flags set in CMPQ above)
+ JEQ flush
+ret:
+ MOVQ 104(SP), R14
+ MOVQ 112(SP), R13
+ // Do the write.
+ MOVQ AX, (DI)
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ // It is possible for wbBufFlush to clobber other registers
+ // (e.g., SSE registers), but the compiler takes care of saving
+ // those in the caller if necessary. This strikes a balance
+ // with registers that are likely to be used.
+ //
+ // We don't have type information for these, but all code under
+ // here is NOSPLIT, so nothing will observe these.
+ //
+ // TODO: We could strike a different balance; e.g., saving X0
+ // and not saving GP registers that are less likely to be used.
+ MOVQ DI, 0(SP) // Also first argument to wbBufFlush
+ MOVQ AX, 8(SP) // Also second argument to wbBufFlush
+ MOVQ BX, 16(SP)
+ MOVQ CX, 24(SP)
+ MOVQ DX, 32(SP)
+ // DI already saved
+ MOVQ SI, 40(SP)
+ MOVQ BP, 48(SP)
+ MOVQ R8, 56(SP)
+ MOVQ R9, 64(SP)
+ MOVQ R10, 72(SP)
+ MOVQ R11, 80(SP)
+ MOVQ R12, 88(SP)
+ // R13 already saved
+ // R14 already saved
+ MOVQ R15, 96(SP)
+
+ // This takes arguments DI and AX
+ CALL runtime·wbBufFlush(SB)
+
+ MOVQ 0(SP), DI
+ MOVQ 8(SP), AX
+ MOVQ 16(SP), BX
+ MOVQ 24(SP), CX
+ MOVQ 32(SP), DX
+ MOVQ 40(SP), SI
+ MOVQ 48(SP), BP
+ MOVQ 56(SP), R8
+ MOVQ 64(SP), R9
+ MOVQ 72(SP), R10
+ MOVQ 80(SP), R11
+ MOVQ 88(SP), R12
+ MOVQ 96(SP), R15
+ JMP ret
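+
+// In effect, the fast path above is:
+//	buf := &getg().m.p.wbBuf
+//	buf.next += 16
+//	*(buf.next-16), *(buf.next-8) = val, *slot   // val in AX, slot in DI
+//	if buf.next == buf.end { wbBufFlush(slot, val) }
+//	*slot = val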
+
+// gcWriteBarrierCX is gcWriteBarrier, but with args in DI and CX.
+// Defined as ABIInternal since it does not use the stable Go ABI.
+TEXT runtime·gcWriteBarrierCX<ABIInternal>(SB),NOSPLIT,$0
+ XCHGQ CX, AX
+ CALL runtime·gcWriteBarrier<ABIInternal>(SB)
+ XCHGQ CX, AX
+ RET
+
+// gcWriteBarrierDX is gcWriteBarrier, but with args in DI and DX.
+// Defined as ABIInternal since it does not use the stable Go ABI.
+TEXT runtime·gcWriteBarrierDX<ABIInternal>(SB),NOSPLIT,$0
+ XCHGQ DX, AX
+ CALL runtime·gcWriteBarrier<ABIInternal>(SB)
+ XCHGQ DX, AX
+ RET
+
+// gcWriteBarrierBX is gcWriteBarrier, but with args in DI and BX.
+// Defined as ABIInternal since it does not use the stable Go ABI.
+TEXT runtime·gcWriteBarrierBX<ABIInternal>(SB),NOSPLIT,$0
+ XCHGQ BX, AX
+ CALL runtime·gcWriteBarrier<ABIInternal>(SB)
+ XCHGQ BX, AX
+ RET
+
+// gcWriteBarrierBP is gcWriteBarrier, but with args in DI and BP.
+// Defined as ABIInternal since it does not use the stable Go ABI.
+TEXT runtime·gcWriteBarrierBP<ABIInternal>(SB),NOSPLIT,$0
+ XCHGQ BP, AX
+ CALL runtime·gcWriteBarrier<ABIInternal>(SB)
+ XCHGQ BP, AX
+ RET
+
+// gcWriteBarrierSI is gcWriteBarrier, but with args in DI and SI.
+// Defined as ABIInternal since it does not use the stable Go ABI.
+TEXT runtime·gcWriteBarrierSI<ABIInternal>(SB),NOSPLIT,$0
+ XCHGQ SI, AX
+ CALL runtime·gcWriteBarrier<ABIInternal>(SB)
+ XCHGQ SI, AX
+ RET
+
+// gcWriteBarrierR8 is gcWriteBarrier, but with args in DI and R8.
+// Defined as ABIInternal since it does not use the stable Go ABI.
+TEXT runtime·gcWriteBarrierR8<ABIInternal>(SB),NOSPLIT,$0
+ XCHGQ R8, AX
+ CALL runtime·gcWriteBarrier<ABIInternal>(SB)
+ XCHGQ R8, AX
+ RET
+
+// gcWriteBarrierR9 is gcWriteBarrier, but with args in DI and R9.
+// Defined as ABIInternal since it does not use the stable Go ABI.
+TEXT runtime·gcWriteBarrierR9<ABIInternal>(SB),NOSPLIT,$0
+ XCHGQ R9, AX
+ CALL runtime·gcWriteBarrier<ABIInternal>(SB)
+ XCHGQ R9, AX
+ RET
+
+DATA debugCallFrameTooLarge<>+0x00(SB)/20, $"call frame too large"
+GLOBL debugCallFrameTooLarge<>(SB), RODATA, $20 // Size duplicated below
+
+// debugCallV1 is the entry point for debugger-injected function
+// calls on running goroutines. It informs the runtime that a
+// debug call has been injected and creates a call frame for the
+// debugger to fill in.
+//
+// To inject a function call, a debugger should:
+// 1. Check that the goroutine is in state _Grunning and that
+// there are at least 256 bytes free on the stack.
+// 2. Push the current PC on the stack (updating SP).
+// 3. Write the desired argument frame size at SP-16 (using the SP
+// after step 2).
+// 4. Save all machine registers (including flags and XMM registers)
+// so they can be restored later by the debugger.
+// 5. Set the PC to debugCallV1 and resume execution.
+//
+// If the goroutine is in state _Grunnable, then it's not generally
+// safe to inject a call because it may return out via other runtime
+// operations. Instead, the debugger should unwind the stack to find
+// the return to non-runtime code, add a temporary breakpoint there,
+// and inject the call once that breakpoint is hit.
+//
+// If the goroutine is in any other state, it's not safe to inject a call.
+//
+// This function communicates back to the debugger by setting RAX and
+// invoking INT3 to raise a breakpoint signal. See the comments in the
+// implementation for the protocol the debugger is expected to
+// follow. InjectDebugCall in the runtime tests demonstrates this protocol.
+//
+// The debugger must ensure that any pointers passed to the function
+// obey escape analysis requirements. Specifically, it must not pass
+// a stack pointer to an escaping argument. debugCallV1 cannot check
+// this invariant.
+//
+// This is ABIInternal because Go code injects its PC directly into new
+// goroutine stacks.
+TEXT runtime·debugCallV1<ABIInternal>(SB),NOSPLIT,$152-0
+ // Save all registers that may contain pointers so they can be
+ // conservatively scanned.
+ //
+ // We can't do anything that might clobber any of these
+ // registers before this.
+ MOVQ R15, r15-(14*8+8)(SP)
+ MOVQ R14, r14-(13*8+8)(SP)
+ MOVQ R13, r13-(12*8+8)(SP)
+ MOVQ R12, r12-(11*8+8)(SP)
+ MOVQ R11, r11-(10*8+8)(SP)
+ MOVQ R10, r10-(9*8+8)(SP)
+ MOVQ R9, r9-(8*8+8)(SP)
+ MOVQ R8, r8-(7*8+8)(SP)
+ MOVQ DI, di-(6*8+8)(SP)
+ MOVQ SI, si-(5*8+8)(SP)
+ MOVQ BP, bp-(4*8+8)(SP)
+ MOVQ BX, bx-(3*8+8)(SP)
+ MOVQ DX, dx-(2*8+8)(SP)
+ // Save the frame size before we clobber it. Either of the last
+ // saves could clobber this depending on whether there's a saved BP.
+ MOVQ frameSize-24(FP), DX // aka -16(RSP) before prologue
+ MOVQ CX, cx-(1*8+8)(SP)
+ MOVQ AX, ax-(0*8+8)(SP)
+
+ // Save the argument frame size.
+ MOVQ DX, frameSize-128(SP)
+
+ // Perform a safe-point check.
+ MOVQ retpc-8(FP), AX // Caller's PC
+ MOVQ AX, 0(SP)
+ CALL runtime·debugCallCheck(SB)
+ MOVQ 8(SP), AX
+ TESTQ AX, AX
+ JZ good
+ // The safety check failed. Put the reason string at the top
+ // of the stack.
+ MOVQ AX, 0(SP)
+ MOVQ 16(SP), AX
+ MOVQ AX, 8(SP)
+ // Set AX to 8 and invoke INT3. The debugger should get the
+ // reason a call can't be injected from the top of the stack
+ // and resume execution.
+ MOVQ $8, AX
+ BYTE $0xcc
+ JMP restore
+
+good:
+ // Registers are saved and it's safe to make a call.
+ // Open up a call frame, moving the stack if necessary.
+ //
+ // Once the frame is allocated, this will set AX to 0 and
+ // invoke INT3. The debugger should write the argument
+ // frame for the call at SP, push the trapping PC on the
+ // stack, set the PC to the function to call, set RCX to point
+ // to the closure (if a closure call), and resume execution.
+ //
+ // If the function returns, this will set AX to 1 and invoke
+ // INT3. The debugger can then inspect any return value saved
+ // on the stack at SP and resume execution again.
+ //
+ // If the function panics, this will set AX to 2 and invoke INT3.
+ // The interface{} value of the panic will be at SP. The debugger
+ // can inspect the panic value and resume execution again.
+#define DEBUG_CALL_DISPATCH(NAME,MAXSIZE) \
+ CMPQ AX, $MAXSIZE; \
+ JA 5(PC); \
+ MOVQ $NAME(SB), AX; \
+ MOVQ AX, 0(SP); \
+ CALL runtime·debugCallWrap(SB); \
+ JMP restore
+
+ MOVQ frameSize-128(SP), AX
+ DEBUG_CALL_DISPATCH(debugCall32<>, 32)
+ DEBUG_CALL_DISPATCH(debugCall64<>, 64)
+ DEBUG_CALL_DISPATCH(debugCall128<>, 128)
+ DEBUG_CALL_DISPATCH(debugCall256<>, 256)
+ DEBUG_CALL_DISPATCH(debugCall512<>, 512)
+ DEBUG_CALL_DISPATCH(debugCall1024<>, 1024)
+ DEBUG_CALL_DISPATCH(debugCall2048<>, 2048)
+ DEBUG_CALL_DISPATCH(debugCall4096<>, 4096)
+ DEBUG_CALL_DISPATCH(debugCall8192<>, 8192)
+ DEBUG_CALL_DISPATCH(debugCall16384<>, 16384)
+ DEBUG_CALL_DISPATCH(debugCall32768<>, 32768)
+ DEBUG_CALL_DISPATCH(debugCall65536<>, 65536)
+ // The frame size is too large. Report the error.
+ MOVQ $debugCallFrameTooLarge<>(SB), AX
+ MOVQ AX, 0(SP)
+ MOVQ $20, 8(SP) // length of debugCallFrameTooLarge string
+ MOVQ $8, AX
+ BYTE $0xcc
+ JMP restore
+
+restore:
+ // Calls and failures resume here.
+ //
+ // Set AX to 16 and invoke INT3. The debugger should restore
+ // all registers except RIP and RSP and resume execution.
+ MOVQ $16, AX
+ BYTE $0xcc
+ // We must not modify flags after this point.
+
+ // Restore pointer-containing registers, which may have been
+ // modified from the debugger's copy by stack copying.
+ MOVQ ax-(0*8+8)(SP), AX
+ MOVQ cx-(1*8+8)(SP), CX
+ MOVQ dx-(2*8+8)(SP), DX
+ MOVQ bx-(3*8+8)(SP), BX
+ MOVQ bp-(4*8+8)(SP), BP
+ MOVQ si-(5*8+8)(SP), SI
+ MOVQ di-(6*8+8)(SP), DI
+ MOVQ r8-(7*8+8)(SP), R8
+ MOVQ r9-(8*8+8)(SP), R9
+ MOVQ r10-(9*8+8)(SP), R10
+ MOVQ r11-(10*8+8)(SP), R11
+ MOVQ r12-(11*8+8)(SP), R12
+ MOVQ r13-(12*8+8)(SP), R13
+ MOVQ r14-(13*8+8)(SP), R14
+ MOVQ r15-(14*8+8)(SP), R15
+
+ RET
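+
+// Summary of the AX values used in the protocol above:
+//	0  - frame allocated; debugger writes the argument frame at SP,
+//	     sets up the call, and resumes
+//	1  - call returned; any results are on the stack at SP
+//	2  - call panicked; the panic value (an interface{}) is at SP
+//	8  - call rejected or frame too large; a reason string (ptr, len) is at SP
+//	16 - done; debugger restores all registers except RIP and RSP and resumes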
+
+// runtime.debugCallCheck assumes that functions defined with the
+// DEBUG_CALL_FN macro are safe points to inject calls.
+#define DEBUG_CALL_FN(NAME,MAXSIZE) \
+TEXT NAME(SB),WRAPPER,$MAXSIZE-0; \
+ NO_LOCAL_POINTERS; \
+ MOVQ $0, AX; \
+ BYTE $0xcc; \
+ MOVQ $1, AX; \
+ BYTE $0xcc; \
+ RET
+DEBUG_CALL_FN(debugCall32<>, 32)
+DEBUG_CALL_FN(debugCall64<>, 64)
+DEBUG_CALL_FN(debugCall128<>, 128)
+DEBUG_CALL_FN(debugCall256<>, 256)
+DEBUG_CALL_FN(debugCall512<>, 512)
+DEBUG_CALL_FN(debugCall1024<>, 1024)
+DEBUG_CALL_FN(debugCall2048<>, 2048)
+DEBUG_CALL_FN(debugCall4096<>, 4096)
+DEBUG_CALL_FN(debugCall8192<>, 8192)
+DEBUG_CALL_FN(debugCall16384<>, 16384)
+DEBUG_CALL_FN(debugCall32768<>, 32768)
+DEBUG_CALL_FN(debugCall65536<>, 65536)
+
+// func debugCallPanicked(val interface{})
+TEXT runtime·debugCallPanicked(SB),NOSPLIT,$16-16
+ // Copy the panic value to the top of stack.
+ MOVQ val_type+0(FP), AX
+ MOVQ AX, 0(SP)
+ MOVQ val_data+8(FP), AX
+ MOVQ AX, 8(SP)
+ MOVQ $2, AX
+ BYTE $0xcc
+ RET
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+// Defined as ABIInternal since they do not use the stack-based Go ABI.
+TEXT runtime·panicIndex<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ AX, x+0(FP)
+ MOVQ CX, y+8(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ AX, x+0(FP)
+ MOVQ CX, y+8(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, x+0(FP)
+ MOVQ DX, y+8(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, x+0(FP)
+ MOVQ DX, y+8(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, x+0(FP)
+ MOVQ DX, y+8(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, x+0(FP)
+ MOVQ DX, y+8(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ AX, x+0(FP)
+ MOVQ CX, y+8(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ AX, x+0(FP)
+ MOVQ CX, y+8(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ DX, x+0(FP)
+ MOVQ BX, y+8(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ DX, x+0(FP)
+ MOVQ BX, y+8(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ DX, x+0(FP)
+ MOVQ BX, y+8(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ DX, x+0(FP)
+ MOVQ BX, y+8(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, x+0(FP)
+ MOVQ DX, y+8(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, x+0(FP)
+ MOVQ DX, y+8(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ AX, x+0(FP)
+ MOVQ CX, y+8(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ AX, x+0(FP)
+ MOVQ CX, y+8(FP)
+ JMP runtime·goPanicSlice3CU(SB)
+
+#ifdef GOOS_android
+// Use the free TLS_SLOT_APP slot #2 on Android Q.
+// Earlier androids are set up in gcc_android.c.
+DATA runtime·tls_g+0(SB)/8, $16
+GLOBL runtime·tls_g+0(SB), NOPTR, $8
+#endif
+
+// The compiler and assembler's -spectre=ret mode rewrites
+// all indirect CALL AX / JMP AX instructions to be
+// CALL retpolineAX / JMP retpolineAX.
+// See https://support.google.com/faqs/answer/7625886.
+#define RETPOLINE(reg) \
+ /* CALL setup */ BYTE $0xE8; BYTE $(2+2); BYTE $0; BYTE $0; BYTE $0; \
+ /* nospec: */ \
+ /* PAUSE */ BYTE $0xF3; BYTE $0x90; \
+ /* JMP nospec */ BYTE $0xEB; BYTE $-(2+2); \
+ /* setup: */ \
+ /* MOVQ AX, 0(SP) */ BYTE $0x48|((reg&8)>>1); BYTE $0x89; \
+ BYTE $0x04|((reg&7)<<3); BYTE $0x24; \
+ /* RET */ BYTE $0xC3
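+
+// The CALL above skips the 4-byte PAUSE/JMP speculation trap (hence the
+// rel32 of 2+2) and lands on the MOVQ, which overwrites the return address
+// the CALL just pushed with the contents of the target register; the final
+// RET then jumps there, while a mispredicted speculative return spins
+// harmlessly in the PAUSE loop. The register number is folded into the REX
+// prefix and the ModRM byte.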
+
+TEXT runtime·retpolineAX(SB),NOSPLIT,$0; RETPOLINE(0)
+TEXT runtime·retpolineCX(SB),NOSPLIT,$0; RETPOLINE(1)
+TEXT runtime·retpolineDX(SB),NOSPLIT,$0; RETPOLINE(2)
+TEXT runtime·retpolineBX(SB),NOSPLIT,$0; RETPOLINE(3)
+/* SP is 4, can't happen / magic encodings */
+TEXT runtime·retpolineBP(SB),NOSPLIT,$0; RETPOLINE(5)
+TEXT runtime·retpolineSI(SB),NOSPLIT,$0; RETPOLINE(6)
+TEXT runtime·retpolineDI(SB),NOSPLIT,$0; RETPOLINE(7)
+TEXT runtime·retpolineR8(SB),NOSPLIT,$0; RETPOLINE(8)
+TEXT runtime·retpolineR9(SB),NOSPLIT,$0; RETPOLINE(9)
+TEXT runtime·retpolineR10(SB),NOSPLIT,$0; RETPOLINE(10)
+TEXT runtime·retpolineR11(SB),NOSPLIT,$0; RETPOLINE(11)
+TEXT runtime·retpolineR12(SB),NOSPLIT,$0; RETPOLINE(12)
+TEXT runtime·retpolineR13(SB),NOSPLIT,$0; RETPOLINE(13)
+TEXT runtime·retpolineR14(SB),NOSPLIT,$0; RETPOLINE(14)
+TEXT runtime·retpolineR15(SB),NOSPLIT,$0; RETPOLINE(15)
diff --git a/src/runtime/asm_arm.s b/src/runtime/asm_arm.s
new file mode 100644
index 0000000..c54b4eb
--- /dev/null
+++ b/src/runtime/asm_arm.s
@@ -0,0 +1,1089 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// _rt0_arm is common startup code for most ARM systems when using
+// internal linking. This is the entry point for the program from the
+// kernel for an ordinary -buildmode=exe program. The stack holds the
+// number of arguments and the C-style argv.
+TEXT _rt0_arm(SB),NOSPLIT|NOFRAME,$0
+ MOVW (R13), R0 // argc
+ MOVW $4(R13), R1 // argv
+ B runtime·rt0_go(SB)
+
+// main is common startup code for most ARM systems when using
+// external linking. The C startup code will call the symbol "main"
+// passing argc and argv in the usual C ABI registers R0 and R1.
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ B runtime·rt0_go(SB)
+
+// _rt0_arm_lib is common startup code for most ARM systems when
+// using -buildmode=c-archive or -buildmode=c-shared. The linker will
+// arrange to invoke this function as a global constructor (for
+// c-archive) or when the shared library is loaded (for c-shared).
+// We expect argc and argv to be passed in the usual C ABI registers
+// R0 and R1.
+TEXT _rt0_arm_lib(SB),NOSPLIT,$104
+ // Preserve callee-save registers. Raspberry Pi's dlopen(), for example,
+ // actually cares that R11 is preserved.
+ MOVW R4, 12(R13)
+ MOVW R5, 16(R13)
+ MOVW R6, 20(R13)
+ MOVW R7, 24(R13)
+ MOVW R8, 28(R13)
+ MOVW g, 32(R13)
+ MOVW R11, 36(R13)
+
+ // Skip floating point registers on GOARM < 6.
+ MOVB runtime·goarm(SB), R11
+ CMP $6, R11
+ BLT skipfpsave
+ MOVD F8, (40+8*0)(R13)
+ MOVD F9, (40+8*1)(R13)
+ MOVD F10, (40+8*2)(R13)
+ MOVD F11, (40+8*3)(R13)
+ MOVD F12, (40+8*4)(R13)
+ MOVD F13, (40+8*5)(R13)
+ MOVD F14, (40+8*6)(R13)
+ MOVD F15, (40+8*7)(R13)
+skipfpsave:
+ // Save argc/argv.
+ MOVW R0, _rt0_arm_lib_argc<>(SB)
+ MOVW R1, _rt0_arm_lib_argv<>(SB)
+
+ MOVW $0, g // Initialize g.
+
+ // Synchronous initialization.
+ CALL runtime·libpreinit(SB)
+
+ // Create a new thread to do the runtime initialization.
+ MOVW _cgo_sys_thread_create(SB), R2
+ CMP $0, R2
+ BEQ nocgo
+ MOVW $_rt0_arm_lib_go<>(SB), R0
+ MOVW $0, R1
+ BL (R2)
+ B rr
+nocgo:
+ MOVW $0x800000, R0 // stacksize = 8192KB
+ MOVW $_rt0_arm_lib_go<>(SB), R1 // fn
+ MOVW R0, 4(R13)
+ MOVW R1, 8(R13)
+ BL runtime·newosproc0(SB)
+rr:
+ // Restore callee-save registers and return.
+ MOVB runtime·goarm(SB), R11
+ CMP $6, R11
+ BLT skipfprest
+ MOVD (40+8*0)(R13), F8
+ MOVD (40+8*1)(R13), F9
+ MOVD (40+8*2)(R13), F10
+ MOVD (40+8*3)(R13), F11
+ MOVD (40+8*4)(R13), F12
+ MOVD (40+8*5)(R13), F13
+ MOVD (40+8*6)(R13), F14
+ MOVD (40+8*7)(R13), F15
+skipfprest:
+ MOVW 12(R13), R4
+ MOVW 16(R13), R5
+ MOVW 20(R13), R6
+ MOVW 24(R13), R7
+ MOVW 28(R13), R8
+ MOVW 32(R13), g
+ MOVW 36(R13), R11
+ RET
+
+// _rt0_arm_lib_go initializes the Go runtime.
+// This is started in a separate thread by _rt0_arm_lib.
+TEXT _rt0_arm_lib_go<>(SB),NOSPLIT,$8
+ MOVW _rt0_arm_lib_argc<>(SB), R0
+ MOVW _rt0_arm_lib_argv<>(SB), R1
+ B runtime·rt0_go(SB)
+
+DATA _rt0_arm_lib_argc<>(SB)/4,$0
+GLOBL _rt0_arm_lib_argc<>(SB),NOPTR,$4
+DATA _rt0_arm_lib_argv<>(SB)/4,$0
+GLOBL _rt0_arm_lib_argv<>(SB),NOPTR,$4
+
+// using NOFRAME means do not save LR on stack.
+// argc is in R0, argv is in R1.
+TEXT runtime·rt0_go(SB),NOSPLIT|NOFRAME,$0
+ MOVW $0xcafebabe, R12
+
+ // copy arguments forward on an even stack
+ // use R13 instead of SP to avoid linker rewriting the offsets
+ SUB $64, R13 // plenty of scratch
+ AND $~7, R13
+ MOVW R0, 60(R13) // save argc, argv away
+ MOVW R1, 64(R13)
+
+ // set up g register
+ // g is R10
+ MOVW $runtime·g0(SB), g
+ MOVW $runtime·m0(SB), R8
+
+ // save m->g0 = g0
+ MOVW g, m_g0(R8)
+ // save g->m = m0
+ MOVW R8, g_m(g)
+
+ // create istack out of the OS stack
+ // (1MB of system stack is available on iOS and Android)
+ MOVW $(-64*1024+104)(R13), R0
+ MOVW R0, g_stackguard0(g)
+ MOVW R0, g_stackguard1(g)
+ MOVW R0, (g_stack+stack_lo)(g)
+ MOVW R13, (g_stack+stack_hi)(g)
+
+ BL runtime·emptyfunc(SB) // fault if stack check is wrong
+
+ BL runtime·_initcgo(SB) // will clobber R0-R3
+
+ // update stackguard after _cgo_init
+ MOVW (g_stack+stack_lo)(g), R0
+ ADD $const__StackGuard, R0
+ MOVW R0, g_stackguard0(g)
+ MOVW R0, g_stackguard1(g)
+
+ BL runtime·check(SB)
+
+ // saved argc, argv
+ MOVW 60(R13), R0
+ MOVW R0, 4(R13)
+ MOVW 64(R13), R1
+ MOVW R1, 8(R13)
+ BL runtime·args(SB)
+ BL runtime·checkgoarm(SB)
+ BL runtime·osinit(SB)
+ BL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOVW $runtime·mainPC(SB), R0
+ MOVW.W R0, -4(R13)
+ MOVW $8, R0
+ MOVW.W R0, -4(R13)
+ MOVW $0, R0
+ MOVW.W R0, -4(R13) // push $0 as guard
+ BL runtime·newproc(SB)
+ MOVW $12(R13), R13 // pop args and LR
+
+ // start this M
+ BL runtime·mstart(SB)
+
+ MOVW $1234, R0
+ MOVW $1000, R1
+ MOVW R0, (R1) // fail hard
+
+DATA runtime·mainPC+0(SB)/4,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$4
+
+TEXT runtime·breakpoint(SB),NOSPLIT,$0-0
+ // gdb won't skip this breakpoint instruction automatically,
+ // so you must manually "set $pc+=4" to skip it and continue.
+#ifdef GOOS_plan9
+ WORD $0xD1200070 // undefined instruction used as armv5 breakpoint in Plan 9
+#else
+ WORD $0xe7f001f0 // undefined instruction that gdb understands is a software breakpoint
+#endif
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT,$0-0
+ // disable runfast (flush-to-zero) mode of vfp if runtime.goarm > 5
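+	// (bit 24 of FPSCR is the FZ flag; clearing it restores IEEE 754
+	// handling of denormal values)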
+ MOVB runtime·goarm(SB), R11
+ CMP $5, R11
+ BLE 4(PC)
+ WORD $0xeef1ba10 // vmrs r11, fpscr
+ BIC $(1<<24), R11
+ WORD $0xeee1ba10 // vmsr fpscr, r11
+ RET
+
+/*
+ * go-routine
+ */
+
+// void gosave(Gobuf*)
+// save state in Gobuf; setjmp
+TEXT runtime·gosave(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW buf+0(FP), R0
+ MOVW R13, gobuf_sp(R0)
+ MOVW LR, gobuf_pc(R0)
+ MOVW g, gobuf_g(R0)
+ MOVW $0, R11
+ MOVW R11, gobuf_lr(R0)
+ MOVW R11, gobuf_ret(R0)
+ // Assert ctxt is zero. See func save.
+ MOVW gobuf_ctxt(R0), R0
+ CMP R0, R11
+ B.EQ 2(PC)
+ CALL runtime·badctxt(SB)
+ RET
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB),NOSPLIT,$8-4
+ MOVW buf+0(FP), R1
+ MOVW gobuf_g(R1), R0
+ BL setg<>(SB)
+
+ // NOTE: We updated g above, and we are about to update SP.
+ // Until LR and PC are also updated, the g/SP/LR/PC quadruple
+ // are out of sync and must not be used as the basis of a traceback.
+ // Sigprof skips the traceback when SP is not within g's bounds,
+ // and when the PC is inside this function, runtime.gogo.
+ // Since we are about to update SP, until we complete runtime.gogo
+ // we must not leave this function. In particular, no calls
+ // after this point: it must be straight-line code until the
+ // final B instruction.
+ // See large comment in sigprof for more details.
+ MOVW gobuf_sp(R1), R13 // restore SP==R13
+ MOVW gobuf_lr(R1), LR
+ MOVW gobuf_ret(R1), R0
+ MOVW gobuf_ctxt(R1), R7
+ MOVW $0, R11
+ MOVW R11, gobuf_sp(R1) // clear to help garbage collector
+ MOVW R11, gobuf_ret(R1)
+ MOVW R11, gobuf_lr(R1)
+ MOVW R11, gobuf_ctxt(R1)
+ MOVW gobuf_pc(R1), R11
+ CMP R11, R11 // set condition codes for == test, needed by stack split
+ B (R11)
+
+// func mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB),NOSPLIT|NOFRAME,$0-4
+ // Save caller state in g->sched.
+ MOVW R13, (g_sched+gobuf_sp)(g)
+ MOVW LR, (g_sched+gobuf_pc)(g)
+ MOVW $0, R11
+ MOVW R11, (g_sched+gobuf_lr)(g)
+ MOVW g, (g_sched+gobuf_g)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOVW g, R1
+ MOVW g_m(g), R8
+ MOVW m_g0(R8), R0
+ BL setg<>(SB)
+ CMP g, R1
+ B.NE 2(PC)
+ B runtime·badmcall(SB)
+ MOVB runtime·iscgo(SB), R11
+ CMP $0, R11
+ BL.NE runtime·save_g(SB)
+ MOVW fn+0(FP), R0
+ MOVW (g_sched+gobuf_sp)(g), R13
+ SUB $8, R13
+ MOVW R1, 4(R13)
+ MOVW R0, R7
+ MOVW 0(R0), R0
+ BL (R0)
+ B runtime·badmcall2(SB)
+ RET
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB),NOSPLIT,$0-0
+ MOVW $0, R0
+ BL (R0) // clobber lr to ensure push {lr} is kept
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB),NOSPLIT,$0-4
+ MOVW fn+0(FP), R0 // R0 = fn
+ MOVW g_m(g), R1 // R1 = m
+
+ MOVW m_gsignal(R1), R2 // R2 = gsignal
+ CMP g, R2
+ B.EQ noswitch
+
+ MOVW m_g0(R1), R2 // R2 = g0
+ CMP g, R2
+ B.EQ noswitch
+
+ MOVW m_curg(R1), R3
+ CMP g, R3
+ B.EQ switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVW $runtime·badsystemstack(SB), R0
+ BL (R0)
+ B runtime·abort(SB)
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ MOVW $runtime·systemstack_switch(SB), R3
+ ADD $4, R3, R3 // get past push {lr}
+ MOVW R3, (g_sched+gobuf_pc)(g)
+ MOVW R13, (g_sched+gobuf_sp)(g)
+ MOVW LR, (g_sched+gobuf_lr)(g)
+ MOVW g, (g_sched+gobuf_g)(g)
+
+ // switch to g0
+ MOVW R0, R5
+ MOVW R2, R0
+ BL setg<>(SB)
+ MOVW R5, R0
+ MOVW (g_sched+gobuf_sp)(R2), R3
+ // make it look like mstart called systemstack on g0, to stop traceback
+ SUB $4, R3, R3
+ MOVW $runtime·mstart(SB), R4
+ MOVW R4, 0(R3)
+ MOVW R3, R13
+
+ // call target function
+ MOVW R0, R7
+ MOVW 0(R0), R0
+ BL (R0)
+
+ // switch back to g
+ MOVW g_m(g), R1
+ MOVW m_curg(R1), R0
+ BL setg<>(SB)
+ MOVW (g_sched+gobuf_sp)(g), R13
+ MOVW $0, R3
+ MOVW R3, (g_sched+gobuf_sp)(g)
+ RET
+
+noswitch:
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVW R0, R7
+ MOVW 0(R0), R0
+ MOVW.P 4(R13), R14 // restore LR
+ B (R0)
+
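+// The stack-switch decision above corresponds roughly to this Go sketch,
+// using the runtime's Go-level names (illustrative only; the real
+// implementation is the assembly above):
+//
+//	func systemstack(fn func()) {
+//		gp := getg()
+//		if gp == gp.m.gsignal || gp == gp.m.g0 {
+//			fn() // already on a system stack: call fn directly
+//			return
+//		}
+//		if gp != gp.m.curg {
+//			badsystemstack() // unknown g
+//		}
+//		// Save gp's sched state (posing as systemstack_switch), switch
+//		// SP to g0's stack, call fn, then switch back and clear the
+//		// saved SP.
+//	}
+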
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// R3 prolog's LR
+// using NOFRAME means do not save LR on stack.
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOVW g_m(g), R8
+ MOVW m_g0(R8), R4
+ CMP g, R4
+ BNE 3(PC)
+ BL runtime·badmorestackg0(SB)
+ B runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVW m_gsignal(R8), R4
+ CMP g, R4
+ BNE 3(PC)
+ BL runtime·badmorestackgsignal(SB)
+ B runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f.
+ MOVW R13, (g_sched+gobuf_sp)(g)
+ MOVW LR, (g_sched+gobuf_pc)(g)
+ MOVW R3, (g_sched+gobuf_lr)(g)
+ MOVW R7, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ MOVW R3, (m_morebuf+gobuf_pc)(R8) // f's caller's PC
+ MOVW R13, (m_morebuf+gobuf_sp)(R8) // f's caller's SP
+ MOVW g, (m_morebuf+gobuf_g)(R8)
+
+ // Call newstack on m->g0's stack.
+ MOVW m_g0(R8), R0
+ BL setg<>(SB)
+ MOVW (g_sched+gobuf_sp)(g), R13
+ MOVW $0, R0
+ MOVW.W R0, -4(R13) // create a call frame on g0 (saved LR)
+ BL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ RET
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW $0, R7
+ B runtime·morestack(SB)
+
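+// For orientation, the compiler-inserted prologue that reaches morestack
+// behaves roughly like this Go pseudocode (illustrative only; sp stands for
+// the hardware stack pointer):
+//
+//	// at the top of every splittable function f:
+//	if sp < gp.stackguard0 {
+//		morestack() // saves f's context in g.sched and m.morebuf,
+//		            // switches to g0 and calls newstack, which grows
+//		            // the stack and restarts f via gogo(&g.sched)
+//	}
+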
+// reflectcall: call a function with the given argument list
+// func call(argtype *_type, f *FuncVal, arg *byte, argsize, retoffset uint32).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ CMP $MAXSIZE, R0; \
+ B.HI 3(PC); \
+ MOVW $NAME(SB), R1; \
+ B (R1)
+
+TEXT ·reflectcall(SB),NOSPLIT|NOFRAME,$0-20
+ MOVW argsize+12(FP), R0
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVW $runtime·badreflectcall(SB), R1
+ B (R1)
+
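+// The DISPATCH chain above is just a size-class lookup; in Go it would read
+// roughly as follows (illustrative; dispatch, callStub, and jump are
+// stand-ins for the tail calls into the runtime·callNNN stubs above):
+//
+//	func dispatch(argsize uint32) {
+//		for _, n := range []uint32{16, 32, 64 /* ... doubling ... */, 1 << 30} {
+//			if argsize <= n {
+//				jump(callStub(n)) // tail call runtime·call<n>
+//			}
+//		}
+//		badreflectcall()
+//	}
+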
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-20; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVW argptr+8(FP), R0; \
+ MOVW argsize+12(FP), R2; \
+ ADD $4, R13, R1; \
+ CMP $0, R2; \
+ B.EQ 5(PC); \
+ MOVBU.P 1(R0), R5; \
+ MOVBU.P R5, 1(R1); \
+ SUB $1, R2, R2; \
+ B -5(PC); \
+ /* call function */ \
+ MOVW f+4(FP), R7; \
+ MOVW (R7), R0; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ BL (R0); \
+ /* copy return values back */ \
+ MOVW argtype+0(FP), R4; \
+ MOVW argptr+8(FP), R0; \
+ MOVW argsize+12(FP), R2; \
+ MOVW retoffset+16(FP), R3; \
+ ADD $4, R13, R1; \
+ ADD R3, R1; \
+ ADD R3, R0; \
+ SUB R3, R2; \
+ BL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $16-0
+ MOVW R4, 4(R13)
+ MOVW R0, 8(R13)
+ MOVW R1, 12(R13)
+ MOVW R2, 16(R13)
+ BL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+// void jmpdefer(fn, sp);
+// called from deferreturn.
+// 1. grab stored LR for caller
+// 2. sub 4 bytes to get back to BL deferreturn
+// 3. B to fn
+// TODO(rsc): Push things on stack and then use pop
+// to load all registers simultaneously, so that a profiling
+// interrupt can never see mismatched SP/LR/PC.
+// (And double-check that pop is atomic in that way.)
+TEXT runtime·jmpdefer(SB),NOSPLIT,$0-8
+ MOVW 0(R13), LR
+ MOVW $-4(LR), LR // BL deferreturn
+ MOVW fv+0(FP), R7
+ MOVW argp+4(FP), R13
+ MOVW $-4(R13), R13 // SP is 4 below argp, due to saved LR
+ MOVW 0(R7), R1
+ B (R1)
+
+// Save state of caller into g->sched. Smashes R11.
+TEXT gosave<>(SB),NOSPLIT|NOFRAME,$0
+ MOVW LR, (g_sched+gobuf_pc)(g)
+ MOVW R13, (g_sched+gobuf_sp)(g)
+ MOVW $0, R11
+ MOVW R11, (g_sched+gobuf_lr)(g)
+ MOVW R11, (g_sched+gobuf_ret)(g)
+ MOVW R11, (g_sched+gobuf_ctxt)(g)
+ // Assert ctxt is zero. See func save.
+ MOVW (g_sched+gobuf_ctxt)(g), R11
+ CMP $0, R11
+ B.EQ 2(PC)
+ CALL runtime·badctxt(SB)
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-12
+ MOVW fn+0(FP), R1
+ MOVW arg+4(FP), R0
+
+ MOVW R13, R2
+ CMP $0, g
+ BEQ nosave
+ MOVW g, R4
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already.
+ MOVW g_m(g), R8
+ MOVW m_gsignal(R8), R3
+ CMP R3, g
+ BEQ nosave
+ MOVW m_g0(R8), R3
+ CMP R3, g
+ BEQ nosave
+ BL gosave<>(SB)
+ MOVW R0, R5
+ MOVW R3, R0
+ BL setg<>(SB)
+ MOVW R5, R0
+ MOVW (g_sched+gobuf_sp)(g), R13
+
+ // Now on a scheduling stack (a pthread-created stack).
+ SUB $24, R13
+ BIC $0x7, R13 // alignment for gcc ABI
+ MOVW R4, 20(R13) // save old g
+ MOVW (g_stack+stack_hi)(R4), R4
+ SUB R2, R4
+ MOVW R4, 16(R13) // save depth in stack (can't just save SP, as stack might be copied during a callback)
+ BL (R1)
+
+ // Restore registers, g, stack pointer.
+ MOVW R0, R5
+ MOVW 20(R13), R0
+ BL setg<>(SB)
+ MOVW (g_stack+stack_hi)(g), R1
+ MOVW 16(R13), R2
+ SUB R2, R1
+ MOVW R5, R0
+ MOVW R1, R13
+
+ MOVW R0, ret+8(FP)
+ RET
+
+nosave:
+ // Running on a system stack, perhaps even without a g.
+ // Having no g can happen during thread creation or thread teardown
+ // (see needm/dropm on Solaris, for example).
+ // This code is like the above sequence but without saving/restoring g
+ // and without worrying about the stack moving out from under us
+ // (because we're on a system stack, not a goroutine stack).
+ // The above code could be used directly if already on a system stack,
+ // but then the only path through this code would be a rare case on Solaris.
+ // Using this code for all "already on system stack" calls exercises it more,
+ // which should help keep it correct.
+ SUB $24, R13
+ BIC $0x7, R13 // alignment for gcc ABI
+ // save null g in case someone looks during debugging.
+ MOVW $0, R4
+ MOVW R4, 20(R13)
+ MOVW R2, 16(R13) // Save old stack pointer.
+ BL (R1)
+ // Restore stack pointer.
+ MOVW 16(R13), R2
+ MOVW R2, R13
+ MOVW R0, ret+8(FP)
+ RET
+
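+// Summarized in Go, the stack choice above is roughly (illustrative only;
+// cCall stands in for the raw C call performed by BL (R1)):
+//
+//	func asmcgocall(fn, arg unsafe.Pointer) int32 {
+//		gp := getg()
+//		if gp == nil || gp == gp.m.gsignal || gp == gp.m.g0 {
+//			return cCall(fn, arg) // already on a system stack
+//		}
+//		// Save gp's state, switch to m.g0's stack (recording the depth
+//		// below stack.hi rather than the raw SP, since gp's stack may
+//		// move during a callback), call fn, then switch back.
+//		return cCall(fn, arg)
+//	}
+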
+// cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$12-12
+ NO_LOCAL_POINTERS
+
+ // Load m and g from thread-local storage.
+ MOVB runtime·iscgo(SB), R0
+ CMP $0, R0
+ BL.NE runtime·load_g(SB)
+
+ // If g is nil, Go did not create the current thread.
+ // Call needm to obtain one for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ CMP $0, g
+ B.EQ needm
+
+ MOVW g_m(g), R8
+ MOVW R8, savedm-4(SP)
+ B havem
+
+needm:
+ MOVW g, savedm-4(SP) // g is zero, so is m.
+ MOVW $runtime·needm(SB), R0
+ BL (R0)
+
+ // Set m->g0->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVW g_m(g), R8
+ MOVW m_g0(R8), R3
+ MOVW R13, (g_sched+gobuf_sp)(R3)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 4(R13) aka savedsp-12(SP).
+ MOVW m_g0(R8), R3
+ MOVW (g_sched+gobuf_sp)(R3), R4
+ MOVW R4, savedsp-12(SP) // must match frame size
+ MOVW R13, (g_sched+gobuf_sp)(R3)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVW m_curg(R8), R0
+ BL setg<>(SB)
+ MOVW (g_sched+gobuf_sp)(g), R4 // prepare stack as R4
+ MOVW (g_sched+gobuf_pc)(g), R5
+ MOVW R5, -(12+4)(R4) // "saved LR"; must match frame size
+ // Gather our arguments into registers.
+ MOVW fn+0(FP), R1
+ MOVW frame+4(FP), R2
+ MOVW ctxt+8(FP), R3
+ MOVW $-(12+4)(R4), R13 // switch stack; must match frame size
+ MOVW R1, 4(R13)
+ MOVW R2, 8(R13)
+ MOVW R3, 12(R13)
+ BL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOVW 0(R13), R5
+ MOVW R5, (g_sched+gobuf_pc)(g)
+ MOVW $(12+4)(R13), R4 // must match frame size
+ MOVW R4, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVW g_m(g), R8
+ MOVW m_g0(R8), R0
+ BL setg<>(SB)
+ MOVW (g_sched+gobuf_sp)(g), R13
+ MOVW savedsp-12(SP), R4 // must match frame size
+ MOVW R4, (g_sched+gobuf_sp)(g)
+
+ // If the m on entry was nil, we called needm above to borrow an m
+ // for the duration of the call. Since the call is over, return it with dropm.
+ MOVW savedm-4(SP), R6
+ CMP $0, R6
+ B.NE 3(PC)
+ MOVW $runtime·dropm(SB), R0
+ BL (R0)
+
+ // Done!
+ RET
+
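+// The m-borrowing logic above amounts to the following Go sketch
+// (illustrative; the real control flow is the assembly above):
+//
+//	func cgocallback(fn, frame unsafe.Pointer, ctxt uintptr) {
+//		if getg() == nil {
+//			needm() // borrow an m (and its g0) for this foreign thread
+//			defer dropm()
+//		}
+//		// Switch from g0 to m.curg's stack, making the pushed PC look
+//		// like cgocallback's return address, run
+//		// cgocallbackg(fn, frame, ctxt), then switch back to g0.
+//	}
+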
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW gg+0(FP), R0
+ B setg<>(SB)
+
+TEXT setg<>(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW R0, g
+
+ // Save g to thread-local storage.
+#ifdef GOOS_windows
+ B runtime·save_g(SB)
+#else
+ MOVB runtime·iscgo(SB), R0
+ CMP $0, R0
+ B.EQ 2(PC)
+ B runtime·save_g(SB)
+
+ MOVW g, R0
+ RET
+#endif
+
+TEXT runtime·emptyfunc(SB),0,$0-0
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW $0, R0
+ MOVW (R0), R1
+
+// armPublicationBarrier is a native store/store barrier for ARMv7+.
+// On earlier ARM revisions, armPublicationBarrier is a no-op.
+// This will not work on SMP ARMv6 machines, if any are in use.
+// To implement publicationBarrier in sys_$GOOS_arm.s using the native
+// instructions, use:
+//
+// TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+// B runtime·armPublicationBarrier(SB)
+//
+TEXT runtime·armPublicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ DMB MB_ST
+ RET
+
+// AES hashing not implemented for ARM
+TEXT runtime·memhash(SB),NOSPLIT|NOFRAME,$0-16
+ JMP runtime·memhashFallback(SB)
+TEXT runtime·strhash(SB),NOSPLIT|NOFRAME,$0-12
+ JMP runtime·strhashFallback(SB)
+TEXT runtime·memhash32(SB),NOSPLIT|NOFRAME,$0-12
+ JMP runtime·memhash32Fallback(SB)
+TEXT runtime·memhash64(SB),NOSPLIT|NOFRAME,$0-12
+ JMP runtime·memhash64Fallback(SB)
+
+TEXT runtime·return0(SB),NOSPLIT,$0
+ MOVW $0, R0
+ RET
+
+TEXT runtime·procyield(SB),NOSPLIT|NOFRAME,$0
+ MOVW cycles+0(FP), R1
+ MOVW $0, R0
+yieldloop:
+ WORD $0xe320f001 // YIELD (NOP pre-ARMv6K)
+ CMP R0, R1
+ B.NE 2(PC)
+ RET
+ SUB $1, R1
+ B yieldloop
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT,$8
+ // R11 and g register are clobbered by load_g. They are
+ // callee-save in the gcc calling convention, so save them here.
+ MOVW R11, saveR11-4(SP)
+ MOVW g, saveG-8(SP)
+
+ BL runtime·load_g(SB)
+ MOVW g_m(g), R0
+ MOVW m_curg(R0), R0
+ MOVW (g_stack+stack_hi)(R0), R0
+
+ MOVW saveG-8(SP), g
+ MOVW saveR11-4(SP), R11
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ MOVW R0, R0 // NOP
+ BL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ MOVW R0, R0 // NOP
+
+// x -> x/1000000, x%1000000, called from Go with args, results on stack.
+TEXT runtime·usplit(SB),NOSPLIT,$0-12
+ MOVW x+0(FP), R0
+ CALL runtime·usplitR0(SB)
+ MOVW R0, q+4(FP)
+ MOVW R1, r+8(FP)
+ RET
+
+// R0, R1 = R0/1000000, R0%1000000
+TEXT runtime·usplitR0(SB),NOSPLIT,$0
+ // magic multiply to avoid software divide without available m.
+ // see output of go tool compile -S for x/1000000.
+ MOVW R0, R3
+ MOVW $1125899907, R1
+ MULLU R1, R0, (R0, R1)
+ MOVW R0>>18, R0
+ MOVW $1000000, R1
+ MULU R0, R1
+ SUB R1, R3, R1
+ RET
+
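+// The magic constant implements an exact unsigned divide by 1000000 for any
+// 32-bit x; in Go the same computation is (shown for reference, believed
+// equivalent to the instructions above):
+//
+//	func usplit(x uint32) (q, r uint32) {
+//		q = uint32((uint64(x) * 1125899907) >> 50) // high word of the product, then >>18
+//		r = x - q*1000000
+//		return
+//	}
+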
+// This is called from .init_array and follows the platform, not Go, ABI.
+TEXT runtime·addmoduledata(SB),NOSPLIT,$0-0
+ MOVW R9, saver9-4(SP) // The access to global variables below implicitly uses R9, which is callee-save
+ MOVW R11, saver11-8(SP) // Likewise, R11 is the temp register, but callee-save in C ABI
+ MOVW runtime·lastmoduledatap(SB), R1
+ MOVW R0, moduledata_next(R1)
+ MOVW R0, runtime·lastmoduledatap(SB)
+ MOVW saver11-8(SP), R11
+ MOVW saver9-4(SP), R9
+ RET
+
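+// Aside from the R9/R11 save/restore required by the C ABI, the body above
+// is just a linked-list append; in Go terms (illustrative, with md standing
+// for the *moduledata passed in R0):
+//
+//	lastmoduledatap.next = md
+//	lastmoduledatap = md
+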
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOVW $1, R3
+ MOVB R3, ret+0(FP)
+ RET
+
+// gcWriteBarrier performs a heap pointer write and informs the GC.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It takes two arguments:
+// - R2 is the destination of the write
+// - R3 is the value being written at R2
+// It clobbers condition codes.
+// It does not clobber any other general-purpose registers,
+// but may clobber others (e.g., floating point registers).
+// The act of CALLing gcWriteBarrier will clobber R14 (LR).
+TEXT runtime·gcWriteBarrier(SB),NOSPLIT|NOFRAME,$0
+ // Save the registers clobbered by the fast path.
+ MOVM.DB.W [R0,R1], (R13)
+ MOVW g_m(g), R0
+ MOVW m_p(R0), R0
+ MOVW (p_wbBuf+wbBuf_next)(R0), R1
+ // Increment wbBuf.next position.
+ ADD $8, R1
+ MOVW R1, (p_wbBuf+wbBuf_next)(R0)
+ MOVW (p_wbBuf+wbBuf_end)(R0), R0
+ CMP R1, R0
+ // Record the write.
+ MOVW R3, -8(R1) // Record value
+ MOVW (R2), R0 // TODO: This turns bad writes into bad reads.
+ MOVW R0, -4(R1) // Record *slot
+ // Is the buffer full? (flags set in CMP above)
+ B.EQ flush
+ret:
+ MOVM.IA.W (R13), [R0,R1]
+ // Do the write.
+ MOVW R3, (R2)
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ //
+ // R0 and R1 were saved at entry.
+ // R10 is g, so preserved.
+ // R11 is linker temp, so no need to save.
+ // R13 is stack pointer.
+ // R15 is PC.
+ //
+ // This also sets up R2 and R3 as the arguments to wbBufFlush.
+ MOVM.DB.W [R2-R9,R12], (R13)
+ // Save R14 (LR) because the fast path above doesn't save it,
+ // but needs it to RET. This is after the MOVM so it appears below
+ // the arguments in the stack frame.
+ MOVM.DB.W [R14], (R13)
+
+ // This takes arguments R2 and R3.
+ CALL runtime·wbBufFlush(SB)
+
+ MOVM.IA.W (R13), [R14]
+ MOVM.IA.W (R13), [R2-R9,R12]
+ JMP ret
+
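+// The fast path above is a buffered write barrier. As a Go model
+// (illustrative only; wbBufModel and flush are stand-ins for the runtime's
+// wbBuf and runtime·wbBufFlush):
+//
+//	type wbBufModel struct {
+//		buf  []uintptr // pending pointers, handed to the GC in batches
+//		next int       // index of the next free slot
+//	}
+//
+//	func (b *wbBufModel) write(slot *uintptr, val uintptr) {
+//		b.buf[b.next] = val     // record the value being written
+//		b.buf[b.next+1] = *slot // record what the slot used to hold
+//		b.next += 2
+//		if b.next == len(b.buf) {
+//			flush(b) // spill the batch to the GC and reset next
+//		}
+//		*slot = val // finally perform the write, as the MOVW above does
+//	}
+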
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+TEXT runtime·panicIndex(SB),NOSPLIT,$0-8
+ MOVW R0, x+0(FP)
+ MOVW R1, y+4(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU(SB),NOSPLIT,$0-8
+ MOVW R0, x+0(FP)
+ MOVW R1, y+4(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB(SB),NOSPLIT,$0-8
+ MOVW R0, x+0(FP)
+ MOVW R1, y+4(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU(SB),NOSPLIT,$0-8
+ MOVW R0, x+0(FP)
+ MOVW R1, y+4(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C(SB),NOSPLIT,$0-8
+ MOVW R0, x+0(FP)
+ MOVW R1, y+4(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU(SB),NOSPLIT,$0-8
+ MOVW R0, x+0(FP)
+ MOVW R1, y+4(FP)
+ JMP runtime·goPanicSlice3CU(SB)
+
+// Extended versions for 64-bit indexes.
+TEXT runtime·panicExtendIndex(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R0, lo+4(FP)
+ MOVW R1, y+8(FP)
+ JMP runtime·goPanicExtendIndex(SB)
+TEXT runtime·panicExtendIndexU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R0, lo+4(FP)
+ MOVW R1, y+8(FP)
+ JMP runtime·goPanicExtendIndexU(SB)
+TEXT runtime·panicExtendSliceAlen(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSliceAlen(SB)
+TEXT runtime·panicExtendSliceAlenU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSliceAlenU(SB)
+TEXT runtime·panicExtendSliceAcap(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSliceAcap(SB)
+TEXT runtime·panicExtendSliceAcapU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSliceAcapU(SB)
+TEXT runtime·panicExtendSliceB(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R0, lo+4(FP)
+ MOVW R1, y+8(FP)
+ JMP runtime·goPanicExtendSliceB(SB)
+TEXT runtime·panicExtendSliceBU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R0, lo+4(FP)
+ MOVW R1, y+8(FP)
+ JMP runtime·goPanicExtendSliceBU(SB)
+TEXT runtime·panicExtendSlice3Alen(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSlice3Alen(SB)
+TEXT runtime·panicExtendSlice3AlenU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSlice3AlenU(SB)
+TEXT runtime·panicExtendSlice3Acap(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSlice3Acap(SB)
+TEXT runtime·panicExtendSlice3AcapU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSlice3AcapU(SB)
+TEXT runtime·panicExtendSlice3B(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSlice3B(SB)
+TEXT runtime·panicExtendSlice3BU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSlice3BU(SB)
+TEXT runtime·panicExtendSlice3C(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R0, lo+4(FP)
+ MOVW R1, y+8(FP)
+ JMP runtime·goPanicExtendSlice3C(SB)
+TEXT runtime·panicExtendSlice3CU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R0, lo+4(FP)
+ MOVW R1, y+8(FP)
+ JMP runtime·goPanicExtendSlice3CU(SB)
diff --git a/src/runtime/asm_arm64.s b/src/runtime/asm_arm64.s
new file mode 100644
index 0000000..a2eb8bb
--- /dev/null
+++ b/src/runtime/asm_arm64.s
@@ -0,0 +1,1313 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "tls_arm64.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+TEXT runtime·rt0_go(SB),NOSPLIT,$0
+ // SP = stack; R0 = argc; R1 = argv
+
+ SUB $32, RSP
+ MOVW R0, 8(RSP) // argc
+ MOVD R1, 16(RSP) // argv
+
+#ifdef TLS_darwin
+ // Initialize TLS.
+ MOVD ZR, g // clear g, make sure it's not junk.
+ SUB $32, RSP
+ MRS_TPIDR_R0
+ AND $~7, R0
+ MOVD R0, 16(RSP) // arg2: TLS base
+ MOVD $runtime·tls_g(SB), R2
+ MOVD R2, 8(RSP) // arg1: &tlsg
+ BL ·tlsinit(SB)
+ ADD $32, RSP
+#endif
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOVD $runtime·g0(SB), g
+ MOVD RSP, R7
+ MOVD $(-64*1024)(R7), R0
+ MOVD R0, g_stackguard0(g)
+ MOVD R0, g_stackguard1(g)
+ MOVD R0, (g_stack+stack_lo)(g)
+ MOVD R7, (g_stack+stack_hi)(g)
+
+ // if there is a _cgo_init, call it using the gcc ABI.
+ MOVD _cgo_init(SB), R12
+ CBZ R12, nocgo
+
+#ifdef GOOS_android
+ MRS_TPIDR_R0 // load TLS base pointer
+ MOVD R0, R3 // arg 3: TLS base pointer
+ MOVD $runtime·tls_g(SB), R2 // arg 2: &tls_g
+#else
+ MOVD $0, R2 // arg 2: not used when using platform's TLS
+#endif
+ MOVD $setg_gcc<>(SB), R1 // arg 1: setg
+ MOVD g, R0 // arg 0: G
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL (R12)
+ ADD $16, RSP
+
+nocgo:
+ BL runtime·save_g(SB)
+ // update stackguard after _cgo_init
+ MOVD (g_stack+stack_lo)(g), R0
+ ADD $const__StackGuard, R0
+ MOVD R0, g_stackguard0(g)
+ MOVD R0, g_stackguard1(g)
+
+ // set the per-goroutine and per-mach "registers"
+ MOVD $runtime·m0(SB), R0
+
+ // save m->g0 = g0
+ MOVD g, m_g0(R0)
+ // save m0 to g0->m
+ MOVD R0, g_m(g)
+
+ BL runtime·check(SB)
+
+ MOVW 8(RSP), R0 // copy argc
+ MOVW R0, -8(RSP)
+ MOVD 16(RSP), R0 // copy argv
+ MOVD R0, 0(RSP)
+ BL runtime·args(SB)
+ BL runtime·osinit(SB)
+ BL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOVD $runtime·mainPC(SB), R0 // entry
+ MOVD RSP, R7
+ MOVD.W $0, -8(R7)
+ MOVD.W R0, -8(R7)
+ MOVD.W $0, -8(R7)
+ MOVD.W $0, -8(R7)
+ MOVD R7, RSP
+ BL runtime·newproc(SB)
+ ADD $32, RSP
+
+ // start this M
+ BL runtime·mstart(SB)
+
+ MOVD $0, R0
+ MOVD R0, (R0) // boom
+ UNDEF
+
+DATA runtime·mainPC+0(SB)/8,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
+
+TEXT runtime·breakpoint(SB),NOSPLIT|NOFRAME,$0-0
+ BRK
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT|NOFRAME,$0-0
+ RET
+
+/*
+ * go-routine
+ */
+
+// void gosave(Gobuf*)
+// save state in Gobuf; setjmp
+TEXT runtime·gosave(SB), NOSPLIT|NOFRAME, $0-8
+ MOVD buf+0(FP), R3
+ MOVD RSP, R0
+ MOVD R0, gobuf_sp(R3)
+ MOVD R29, gobuf_bp(R3)
+ MOVD LR, gobuf_pc(R3)
+ MOVD g, gobuf_g(R3)
+ MOVD ZR, gobuf_lr(R3)
+ MOVD ZR, gobuf_ret(R3)
+ // Assert ctxt is zero. See func save.
+ MOVD gobuf_ctxt(R3), R0
+ CBZ R0, 2(PC)
+ CALL runtime·badctxt(SB)
+ RET
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB), NOSPLIT, $24-8
+ MOVD buf+0(FP), R5
+ MOVD gobuf_g(R5), g
+ BL runtime·save_g(SB)
+
+ MOVD 0(g), R4 // make sure g is not nil
+ MOVD gobuf_sp(R5), R0
+ MOVD R0, RSP
+ MOVD gobuf_bp(R5), R29
+ MOVD gobuf_lr(R5), LR
+ MOVD gobuf_ret(R5), R0
+ MOVD gobuf_ctxt(R5), R26
+ MOVD $0, gobuf_sp(R5)
+ MOVD $0, gobuf_bp(R5)
+ MOVD $0, gobuf_ret(R5)
+ MOVD $0, gobuf_lr(R5)
+ MOVD $0, gobuf_ctxt(R5)
+ CMP ZR, ZR // set condition codes for == test, needed by stack split
+ MOVD gobuf_pc(R5), R6
+ B (R6)
+
+// void mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB), NOSPLIT|NOFRAME, $0-8
+ // Save caller state in g->sched
+ MOVD RSP, R0
+ MOVD R0, (g_sched+gobuf_sp)(g)
+ MOVD R29, (g_sched+gobuf_bp)(g)
+ MOVD LR, (g_sched+gobuf_pc)(g)
+ MOVD $0, (g_sched+gobuf_lr)(g)
+ MOVD g, (g_sched+gobuf_g)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOVD g, R3
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), g
+ BL runtime·save_g(SB)
+ CMP g, R3
+ BNE 2(PC)
+ B runtime·badmcall(SB)
+ MOVD fn+0(FP), R26 // context
+ MOVD 0(R26), R4 // code pointer
+ MOVD (g_sched+gobuf_sp)(g), R0
+ MOVD R0, RSP // sp = m->g0->sched.sp
+ MOVD (g_sched+gobuf_bp)(g), R29
+ MOVD R3, -8(RSP)
+ MOVD $0, -16(RSP)
+ SUB $16, RSP
+ BL (R4)
+ B runtime·badmcall2(SB)
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ UNDEF
+ BL (LR) // make sure this function is not leaf
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ MOVD fn+0(FP), R3 // R3 = fn
+ MOVD R3, R26 // context
+ MOVD g_m(g), R4 // R4 = m
+
+ MOVD m_gsignal(R4), R5 // R5 = gsignal
+ CMP g, R5
+ BEQ noswitch
+
+ MOVD m_g0(R4), R5 // R5 = g0
+ CMP g, R5
+ BEQ noswitch
+
+ MOVD m_curg(R4), R6
+ CMP g, R6
+ BEQ switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVD $runtime·badsystemstack(SB), R3
+ BL (R3)
+ B runtime·abort(SB)
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ MOVD $runtime·systemstack_switch(SB), R6
+ ADD $8, R6 // get past prologue
+ MOVD R6, (g_sched+gobuf_pc)(g)
+ MOVD RSP, R0
+ MOVD R0, (g_sched+gobuf_sp)(g)
+ MOVD R29, (g_sched+gobuf_bp)(g)
+ MOVD $0, (g_sched+gobuf_lr)(g)
+ MOVD g, (g_sched+gobuf_g)(g)
+
+ // switch to g0
+ MOVD R5, g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R3
+ // make it look like mstart called systemstack on g0, to stop traceback
+ SUB $16, R3
+ AND $~15, R3
+ MOVD $runtime·mstart(SB), R4
+ MOVD R4, 0(R3)
+ MOVD R3, RSP
+ MOVD (g_sched+gobuf_bp)(g), R29
+
+ // call target function
+ MOVD 0(R26), R3 // code pointer
+ BL (R3)
+
+ // switch back to g
+ MOVD g_m(g), R3
+ MOVD m_curg(R3), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R0
+ MOVD R0, RSP
+ MOVD (g_sched+gobuf_bp)(g), R29
+ MOVD $0, (g_sched+gobuf_sp)(g)
+ MOVD $0, (g_sched+gobuf_bp)(g)
+ RET
+
+noswitch:
+ // already on m stack, just call directly
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVD 0(R26), R3 // code pointer
+ MOVD.P 16(RSP), R30 // restore LR
+ SUB $8, RSP, R29 // restore FP
+ B (R3)
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// Caller has already loaded:
+// R3 prolog's LR (R30)
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), R4
+ CMP g, R4
+ BNE 3(PC)
+ BL runtime·badmorestackg0(SB)
+ B runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVD m_gsignal(R8), R4
+ CMP g, R4
+ BNE 3(PC)
+ BL runtime·badmorestackgsignal(SB)
+ B runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f
+ MOVD RSP, R0
+ MOVD R0, (g_sched+gobuf_sp)(g)
+ MOVD R29, (g_sched+gobuf_bp)(g)
+ MOVD LR, (g_sched+gobuf_pc)(g)
+ MOVD R3, (g_sched+gobuf_lr)(g)
+ MOVD R26, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+	// Set m->morebuf to f's caller.
+ MOVD R3, (m_morebuf+gobuf_pc)(R8) // f's caller's PC
+ MOVD RSP, R0
+ MOVD R0, (m_morebuf+gobuf_sp)(R8) // f's caller's RSP
+ MOVD g, (m_morebuf+gobuf_g)(R8)
+
+ // Call newstack on m->g0's stack.
+ MOVD m_g0(R8), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R0
+ MOVD R0, RSP
+ MOVD (g_sched+gobuf_bp)(g), R29
+ MOVD.W $0, -16(RSP) // create a call frame on g0 (saved LR; keep 16-aligned)
+ BL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ UNDEF
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW $0, R26
+ B runtime·morestack(SB)
+
+// reflectcall: call a function with the given argument list
+// func call(argtype *_type, f *FuncVal, arg *byte, argsize, retoffset uint32).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ MOVD $MAXSIZE, R27; \
+ CMP R27, R16; \
+ BGT 3(PC); \
+ MOVD $NAME(SB), R27; \
+ B (R27)
+// Note: can't just "B NAME(SB)" - bad inlining results.
+
+TEXT ·reflectcall(SB), NOSPLIT|NOFRAME, $0-32
+ MOVWU argsize+24(FP), R16
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVD $runtime·badreflectcall(SB), R0
+ B (R0)
+
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-24; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVD arg+16(FP), R3; \
+ MOVWU argsize+24(FP), R4; \
+ ADD $8, RSP, R5; \
+ BIC $0xf, R4, R6; \
+ CBZ R6, 6(PC); \
+ /* if R6=(argsize&~15) != 0 */ \
+ ADD R6, R5, R6; \
+ /* copy 16 bytes a time */ \
+ LDP.P 16(R3), (R7, R8); \
+ STP.P (R7, R8), 16(R5); \
+ CMP R5, R6; \
+ BNE -3(PC); \
+ AND $0xf, R4, R6; \
+ CBZ R6, 6(PC); \
+ /* if R6=(argsize&15) != 0 */ \
+ ADD R6, R5, R6; \
+ /* copy 1 byte a time for the rest */ \
+ MOVBU.P 1(R3), R7; \
+ MOVBU.P R7, 1(R5); \
+ CMP R5, R6; \
+ BNE -3(PC); \
+ /* call function */ \
+ MOVD f+8(FP), R26; \
+ MOVD (R26), R0; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ BL (R0); \
+ /* copy return values back */ \
+ MOVD argtype+0(FP), R7; \
+ MOVD arg+16(FP), R3; \
+ MOVWU n+24(FP), R4; \
+ MOVWU retoffset+28(FP), R6; \
+ ADD $8, RSP, R5; \
+ ADD R6, R5; \
+ ADD R6, R3; \
+ SUB R6, R4; \
+ BL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $40-0
+ MOVD R7, 8(RSP)
+ MOVD R3, 16(RSP)
+ MOVD R5, 24(RSP)
+ MOVD R4, 32(RSP)
+ BL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+// func memhash32(p unsafe.Pointer, h uintptr) uintptr
+TEXT runtime·memhash32(SB),NOSPLIT|NOFRAME,$0-24
+ MOVB runtime·useAeshash(SB), R0
+ CBZ R0, noaes
+ MOVD p+0(FP), R0
+ MOVD h+8(FP), R1
+ MOVD $ret+16(FP), R2
+ MOVD $runtime·aeskeysched+0(SB), R3
+
+ VEOR V0.B16, V0.B16, V0.B16
+ VLD1 (R3), [V2.B16]
+ VLD1 (R0), V0.S[1]
+ VMOV R1, V0.S[0]
+
+ AESE V2.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V2.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V2.B16, V0.B16
+
+ VST1 [V0.D1], (R2)
+ RET
+noaes:
+ B runtime·memhash32Fallback(SB)
+
+// func memhash64(p unsafe.Pointer, h uintptr) uintptr
+TEXT runtime·memhash64(SB),NOSPLIT|NOFRAME,$0-24
+ MOVB runtime·useAeshash(SB), R0
+ CBZ R0, noaes
+ MOVD p+0(FP), R0
+ MOVD h+8(FP), R1
+ MOVD $ret+16(FP), R2
+ MOVD $runtime·aeskeysched+0(SB), R3
+
+ VEOR V0.B16, V0.B16, V0.B16
+ VLD1 (R3), [V2.B16]
+ VLD1 (R0), V0.D[1]
+ VMOV R1, V0.D[0]
+
+ AESE V2.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V2.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V2.B16, V0.B16
+
+ VST1 [V0.D1], (R2)
+ RET
+noaes:
+ B runtime·memhash64Fallback(SB)
+
+// func memhash(p unsafe.Pointer, h, size uintptr) uintptr
+TEXT runtime·memhash(SB),NOSPLIT|NOFRAME,$0-32
+ MOVB runtime·useAeshash(SB), R0
+ CBZ R0, noaes
+ MOVD p+0(FP), R0
+ MOVD s+16(FP), R1
+ MOVD h+8(FP), R3
+ MOVD $ret+24(FP), R2
+ B aeshashbody<>(SB)
+noaes:
+ B runtime·memhashFallback(SB)
+
+// func strhash(p unsafe.Pointer, h uintptr) uintptr
+TEXT runtime·strhash(SB),NOSPLIT|NOFRAME,$0-24
+ MOVB runtime·useAeshash(SB), R0
+ CBZ R0, noaes
+ MOVD p+0(FP), R10 // string pointer
+	LDP	(R10), (R0, R1)	// string data / length
+ MOVD h+8(FP), R3
+	MOVD	$ret+16(FP), R2	// return address
+ B aeshashbody<>(SB)
+noaes:
+ B runtime·strhashFallback(SB)
+
+// R0: data
+// R1: length
+// R2: address to put return value
+// R3: seed data
+TEXT aeshashbody<>(SB),NOSPLIT|NOFRAME,$0
+ VEOR V30.B16, V30.B16, V30.B16
+ VMOV R3, V30.D[0]
+ VMOV R1, V30.D[1] // load length into seed
+
+ MOVD $runtime·aeskeysched+0(SB), R4
+ VLD1.P 16(R4), [V0.B16]
+ AESE V30.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ CMP $16, R1
+ BLO aes0to15
+ BEQ aes16
+ CMP $32, R1
+ BLS aes17to32
+ CMP $64, R1
+ BLS aes33to64
+ CMP $128, R1
+ BLS aes65to128
+ B aes129plus
+
+aes0to15:
+ CBZ R1, aes0
+ VEOR V2.B16, V2.B16, V2.B16
+ TBZ $3, R1, less_than_8
+ VLD1.P 8(R0), V2.D[0]
+
+less_than_8:
+ TBZ $2, R1, less_than_4
+ VLD1.P 4(R0), V2.S[2]
+
+less_than_4:
+ TBZ $1, R1, less_than_2
+ VLD1.P 2(R0), V2.H[6]
+
+less_than_2:
+ TBZ $0, R1, done
+ VLD1 (R0), V2.B[14]
+done:
+ AESE V0.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V0.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V0.B16, V2.B16
+
+ VST1 [V2.D1], (R2)
+ RET
+aes0:
+ VST1 [V0.D1], (R2)
+ RET
+aes16:
+ VLD1 (R0), [V2.B16]
+ B done
+
+aes17to32:
+ // make second seed
+ VLD1 (R4), [V1.B16]
+ AESE V30.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ SUB $16, R1, R10
+ VLD1.P (R0)(R10), [V2.B16]
+ VLD1 (R0), [V3.B16]
+
+ AESE V0.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V1.B16, V3.B16
+ AESMC V3.B16, V3.B16
+
+ AESE V0.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V1.B16, V3.B16
+ AESMC V3.B16, V3.B16
+
+ AESE V0.B16, V2.B16
+ AESE V1.B16, V3.B16
+
+ VEOR V3.B16, V2.B16, V2.B16
+ VST1 [V2.D1], (R2)
+ RET
+
+aes33to64:
+ VLD1 (R4), [V1.B16, V2.B16, V3.B16]
+ AESE V30.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ AESE V30.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V30.B16, V3.B16
+ AESMC V3.B16, V3.B16
+ SUB $32, R1, R10
+
+ VLD1.P (R0)(R10), [V4.B16, V5.B16]
+ VLD1 (R0), [V6.B16, V7.B16]
+
+ AESE V0.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V1.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V2.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V3.B16, V7.B16
+ AESMC V7.B16, V7.B16
+
+ AESE V0.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V1.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V2.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V3.B16, V7.B16
+ AESMC V7.B16, V7.B16
+
+ AESE V0.B16, V4.B16
+ AESE V1.B16, V5.B16
+ AESE V2.B16, V6.B16
+ AESE V3.B16, V7.B16
+
+ VEOR V6.B16, V4.B16, V4.B16
+ VEOR V7.B16, V5.B16, V5.B16
+ VEOR V5.B16, V4.B16, V4.B16
+
+ VST1 [V4.D1], (R2)
+ RET
+
+aes65to128:
+ VLD1.P 64(R4), [V1.B16, V2.B16, V3.B16, V4.B16]
+ VLD1 (R4), [V5.B16, V6.B16, V7.B16]
+ AESE V30.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ AESE V30.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V30.B16, V3.B16
+ AESMC V3.B16, V3.B16
+ AESE V30.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V30.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V30.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V30.B16, V7.B16
+ AESMC V7.B16, V7.B16
+
+ SUB $64, R1, R10
+ VLD1.P (R0)(R10), [V8.B16, V9.B16, V10.B16, V11.B16]
+ VLD1 (R0), [V12.B16, V13.B16, V14.B16, V15.B16]
+ AESE V0.B16, V8.B16
+ AESMC V8.B16, V8.B16
+ AESE V1.B16, V9.B16
+ AESMC V9.B16, V9.B16
+ AESE V2.B16, V10.B16
+ AESMC V10.B16, V10.B16
+ AESE V3.B16, V11.B16
+ AESMC V11.B16, V11.B16
+ AESE V4.B16, V12.B16
+ AESMC V12.B16, V12.B16
+ AESE V5.B16, V13.B16
+ AESMC V13.B16, V13.B16
+ AESE V6.B16, V14.B16
+ AESMC V14.B16, V14.B16
+ AESE V7.B16, V15.B16
+ AESMC V15.B16, V15.B16
+
+ AESE V0.B16, V8.B16
+ AESMC V8.B16, V8.B16
+ AESE V1.B16, V9.B16
+ AESMC V9.B16, V9.B16
+ AESE V2.B16, V10.B16
+ AESMC V10.B16, V10.B16
+ AESE V3.B16, V11.B16
+ AESMC V11.B16, V11.B16
+ AESE V4.B16, V12.B16
+ AESMC V12.B16, V12.B16
+ AESE V5.B16, V13.B16
+ AESMC V13.B16, V13.B16
+ AESE V6.B16, V14.B16
+ AESMC V14.B16, V14.B16
+ AESE V7.B16, V15.B16
+ AESMC V15.B16, V15.B16
+
+ AESE V0.B16, V8.B16
+ AESE V1.B16, V9.B16
+ AESE V2.B16, V10.B16
+ AESE V3.B16, V11.B16
+ AESE V4.B16, V12.B16
+ AESE V5.B16, V13.B16
+ AESE V6.B16, V14.B16
+ AESE V7.B16, V15.B16
+
+ VEOR V12.B16, V8.B16, V8.B16
+ VEOR V13.B16, V9.B16, V9.B16
+ VEOR V14.B16, V10.B16, V10.B16
+ VEOR V15.B16, V11.B16, V11.B16
+ VEOR V10.B16, V8.B16, V8.B16
+ VEOR V11.B16, V9.B16, V9.B16
+ VEOR V9.B16, V8.B16, V8.B16
+
+ VST1 [V8.D1], (R2)
+ RET
+
+aes129plus:
+ PRFM (R0), PLDL1KEEP
+ VLD1.P 64(R4), [V1.B16, V2.B16, V3.B16, V4.B16]
+ VLD1 (R4), [V5.B16, V6.B16, V7.B16]
+ AESE V30.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ AESE V30.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V30.B16, V3.B16
+ AESMC V3.B16, V3.B16
+ AESE V30.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V30.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V30.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V30.B16, V7.B16
+ AESMC V7.B16, V7.B16
+ ADD R0, R1, R10
+ SUB $128, R10, R10
+ VLD1.P 64(R10), [V8.B16, V9.B16, V10.B16, V11.B16]
+ VLD1 (R10), [V12.B16, V13.B16, V14.B16, V15.B16]
+ SUB $1, R1, R1
+ LSR $7, R1, R1
+
+aesloop:
+ AESE V8.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V9.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ AESE V10.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V11.B16, V3.B16
+ AESMC V3.B16, V3.B16
+ AESE V12.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V13.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V14.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V15.B16, V7.B16
+ AESMC V7.B16, V7.B16
+
+ VLD1.P 64(R0), [V8.B16, V9.B16, V10.B16, V11.B16]
+ AESE V8.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V9.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ AESE V10.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V11.B16, V3.B16
+ AESMC V3.B16, V3.B16
+
+ VLD1.P 64(R0), [V12.B16, V13.B16, V14.B16, V15.B16]
+ AESE V12.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V13.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V14.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V15.B16, V7.B16
+ AESMC V7.B16, V7.B16
+ SUB $1, R1, R1
+ CBNZ R1, aesloop
+
+ AESE V8.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V9.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ AESE V10.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V11.B16, V3.B16
+ AESMC V3.B16, V3.B16
+ AESE V12.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V13.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V14.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V15.B16, V7.B16
+ AESMC V7.B16, V7.B16
+
+ AESE V8.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V9.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ AESE V10.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V11.B16, V3.B16
+ AESMC V3.B16, V3.B16
+ AESE V12.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V13.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V14.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V15.B16, V7.B16
+ AESMC V7.B16, V7.B16
+
+ AESE V8.B16, V0.B16
+ AESE V9.B16, V1.B16
+ AESE V10.B16, V2.B16
+ AESE V11.B16, V3.B16
+ AESE V12.B16, V4.B16
+ AESE V13.B16, V5.B16
+ AESE V14.B16, V6.B16
+ AESE V15.B16, V7.B16
+
+ VEOR V0.B16, V1.B16, V0.B16
+ VEOR V2.B16, V3.B16, V2.B16
+ VEOR V4.B16, V5.B16, V4.B16
+ VEOR V6.B16, V7.B16, V6.B16
+ VEOR V0.B16, V2.B16, V0.B16
+ VEOR V4.B16, V6.B16, V4.B16
+ VEOR V4.B16, V0.B16, V0.B16
+
+ VST1 [V0.D1], (R2)
+ RET
+
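+// aeshashbody selects one of the size-class paths above; schematically
+// (illustrative Go skeleton, where n is the length passed in R1):
+//
+//	switch {
+//	case n == 0:   // hash the length-seeded key schedule alone
+//	case n < 16:   // assemble up to 15 bytes piecewise, 3 AES rounds
+//	case n == 16:  // one full block
+//	case n <= 32:  // two possibly overlapping 16-byte blocks
+//	case n <= 64:  // four blocks
+//	case n <= 128: // eight blocks
+//	default:       // 129+: pipelined 128-byte chunks over the whole input
+//	}
+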
+TEXT runtime·procyield(SB),NOSPLIT,$0-0
+ MOVWU cycles+0(FP), R0
+again:
+ YIELD
+ SUBW $1, R0
+ CBNZ R0, again
+ RET
+
+// void jmpdefer(fv, sp);
+// called from deferreturn.
+// 1. grab stored LR for caller
+// 2. sub 4 bytes to get back to BL deferreturn
+// 3. BR to fn
+TEXT runtime·jmpdefer(SB), NOSPLIT|NOFRAME, $0-16
+ MOVD 0(RSP), R0
+ SUB $4, R0
+ MOVD R0, LR
+
+ MOVD fv+0(FP), R26
+ MOVD argp+8(FP), R0
+ MOVD R0, RSP
+ SUB $8, RSP
+ MOVD 0(R26), R3
+ B (R3)
+
+// Save state of caller into g->sched. Smashes R0.
+TEXT gosave<>(SB),NOSPLIT|NOFRAME,$0
+ MOVD LR, (g_sched+gobuf_pc)(g)
+ MOVD RSP, R0
+ MOVD R0, (g_sched+gobuf_sp)(g)
+ MOVD R29, (g_sched+gobuf_bp)(g)
+ MOVD $0, (g_sched+gobuf_lr)(g)
+ MOVD $0, (g_sched+gobuf_ret)(g)
+ // Assert ctxt is zero. See func save.
+ MOVD (g_sched+gobuf_ctxt)(g), R0
+ CBZ R0, 2(PC)
+ CALL runtime·badctxt(SB)
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-20
+ MOVD fn+0(FP), R1
+ MOVD arg+8(FP), R0
+
+ MOVD RSP, R2 // save original stack pointer
+ CBZ g, nosave
+ MOVD g, R4
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already.
+ MOVD g_m(g), R8
+ MOVD m_gsignal(R8), R3
+ CMP R3, g
+ BEQ nosave
+ MOVD m_g0(R8), R3
+ CMP R3, g
+ BEQ nosave
+
+ // Switch to system stack.
+ MOVD R0, R9 // gosave<> and save_g might clobber R0
+ BL gosave<>(SB)
+ MOVD R3, g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R0
+ MOVD R0, RSP
+ MOVD (g_sched+gobuf_bp)(g), R29
+ MOVD R9, R0
+
+ // Now on a scheduling stack (a pthread-created stack).
+ // Save room for two of our pointers /*, plus 32 bytes of callee
+ // save area that lives on the caller stack. */
+ MOVD RSP, R13
+ SUB $16, R13
+ MOVD R13, RSP
+ MOVD R4, 0(RSP) // save old g on stack
+ MOVD (g_stack+stack_hi)(R4), R4
+ SUB R2, R4
+ MOVD R4, 8(RSP) // save depth in old g stack (can't just save SP, as stack might be copied during a callback)
+ BL (R1)
+ MOVD R0, R9
+
+ // Restore g, stack pointer. R0 is errno, so don't touch it
+ MOVD 0(RSP), g
+ BL runtime·save_g(SB)
+ MOVD (g_stack+stack_hi)(g), R5
+ MOVD 8(RSP), R6
+ SUB R6, R5
+ MOVD R9, R0
+ MOVD R5, RSP
+
+ MOVW R0, ret+16(FP)
+ RET
+
+nosave:
+ // Running on a system stack, perhaps even without a g.
+ // Having no g can happen during thread creation or thread teardown
+ // (see needm/dropm on Solaris, for example).
+ // This code is like the above sequence but without saving/restoring g
+ // and without worrying about the stack moving out from under us
+ // (because we're on a system stack, not a goroutine stack).
+ // The above code could be used directly if already on a system stack,
+ // but then the only path through this code would be a rare case on Solaris.
+ // Using this code for all "already on system stack" calls exercises it more,
+ // which should help keep it correct.
+ MOVD RSP, R13
+ SUB $16, R13
+ MOVD R13, RSP
+ MOVD $0, R4
+ MOVD R4, 0(RSP) // Where above code stores g, in case someone looks during debugging.
+ MOVD R2, 8(RSP) // Save original stack pointer.
+ BL (R1)
+ // Restore stack pointer.
+ MOVD 8(RSP), R2
+ MOVD R2, RSP
+ MOVD R0, ret+16(FP)
+ RET
+
+// cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$24-24
+ NO_LOCAL_POINTERS
+
+ // Load g from thread-local storage.
+ BL runtime·load_g(SB)
+
+ // If g is nil, Go did not create the current thread.
+ // Call needm to obtain one for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ CBZ g, needm
+
+ MOVD g_m(g), R8
+ MOVD R8, savedm-8(SP)
+ B havem
+
+needm:
+ MOVD g, savedm-8(SP) // g is zero, so is m.
+ MOVD $runtime·needm(SB), R0
+ BL (R0)
+
+ // Set m->g0->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), R3
+ MOVD RSP, R0
+ MOVD R0, (g_sched+gobuf_sp)(R3)
+ MOVD R29, (g_sched+gobuf_bp)(R3)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 16(RSP) aka savedsp-16(SP).
+ // Beware that the frame size is actually 32+16.
+ MOVD m_g0(R8), R3
+ MOVD (g_sched+gobuf_sp)(R3), R4
+ MOVD R4, savedsp-16(SP)
+ MOVD RSP, R0
+ MOVD R0, (g_sched+gobuf_sp)(R3)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVD m_curg(R8), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R4 // prepare stack as R4
+ MOVD (g_sched+gobuf_pc)(g), R5
+ MOVD R5, -48(R4)
+ MOVD (g_sched+gobuf_bp)(g), R5
+ MOVD R5, -56(R4)
+ // Gather our arguments into registers.
+ MOVD fn+0(FP), R1
+ MOVD frame+8(FP), R2
+ MOVD ctxt+16(FP), R3
+ MOVD $-48(R4), R0 // maintain 16-byte SP alignment
+ MOVD R0, RSP // switch stack
+ MOVD R1, 8(RSP)
+ MOVD R2, 16(RSP)
+ MOVD R3, 24(RSP)
+ BL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOVD 0(RSP), R5
+ MOVD R5, (g_sched+gobuf_pc)(g)
+ MOVD RSP, R4
+ ADD $48, R4, R4
+ MOVD R4, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R0
+ MOVD R0, RSP
+ MOVD savedsp-16(SP), R4
+ MOVD R4, (g_sched+gobuf_sp)(g)
+
+ // If the m on entry was nil, we called needm above to borrow an m
+ // for the duration of the call. Since the call is over, return it with dropm.
+ MOVD savedm-8(SP), R6
+ CBNZ R6, droppedm
+ MOVD $runtime·dropm(SB), R0
+ BL (R0)
+droppedm:
+
+ // Done!
+ RET
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT,$24
+ // g (R28) and REGTMP (R27) might be clobbered by load_g. They
+ // are callee-save in the gcc calling convention, so save them.
+ MOVD R27, savedR27-8(SP)
+ MOVD g, saveG-16(SP)
+
+ BL runtime·load_g(SB)
+ MOVD g_m(g), R0
+ MOVD m_curg(R0), R0
+ MOVD (g_stack+stack_hi)(R0), R0
+
+ MOVD saveG-16(SP), g
+	MOVD	savedR27-8(SP), R27
+ RET
+
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-8
+ MOVD gg+0(FP), g
+ // This only happens if iscgo, so jump straight to save_g
+ BL runtime·save_g(SB)
+ RET
+
+// void setg_gcc(G*); set g called from gcc
+TEXT setg_gcc<>(SB),NOSPLIT,$8
+ MOVD R0, g
+ MOVD R27, savedR27-8(SP)
+ BL runtime·save_g(SB)
+ MOVD savedR27-8(SP), R27
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD ZR, R0
+ MOVD (R0), R0
+ UNDEF
+
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOVW $0, R0
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ MOVD R0, R0 // NOP
+ BL runtime·goexit1(SB) // does not return
+
+// This is called from .init_array and follows the platform, not Go, ABI.
+TEXT runtime·addmoduledata(SB),NOSPLIT,$0-0
+ SUB $0x10, RSP
+ MOVD R27, 8(RSP) // The access to global variables below implicitly uses R27, which is callee-save
+ MOVD runtime·lastmoduledatap(SB), R1
+ MOVD R0, moduledata_next(R1)
+ MOVD R0, runtime·lastmoduledatap(SB)
+ MOVD 8(RSP), R27
+ ADD $0x10, RSP
+ RET
+
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOVW $1, R3
+ MOVB R3, ret+0(FP)
+ RET
+
+// gcWriteBarrier performs a heap pointer write and informs the GC.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It takes two arguments:
+// - R2 is the destination of the write
+// - R3 is the value being written at R2
+// It clobbers condition codes.
+// It does not clobber any general-purpose registers,
+// but may clobber others (e.g., floating point registers)
+// The act of CALLing gcWriteBarrier will clobber R30 (LR).
+TEXT runtime·gcWriteBarrier(SB),NOSPLIT,$200
+ // Save the registers clobbered by the fast path.
+ MOVD R0, 184(RSP)
+ MOVD R1, 192(RSP)
+ MOVD g_m(g), R0
+ MOVD m_p(R0), R0
+ MOVD (p_wbBuf+wbBuf_next)(R0), R1
+ // Increment wbBuf.next position.
+ ADD $16, R1
+ MOVD R1, (p_wbBuf+wbBuf_next)(R0)
+ MOVD (p_wbBuf+wbBuf_end)(R0), R0
+ CMP R1, R0
+ // Record the write.
+ MOVD R3, -16(R1) // Record value
+ MOVD (R2), R0 // TODO: This turns bad writes into bad reads.
+ MOVD R0, -8(R1) // Record *slot
+ // Is the buffer full? (flags set in CMP above)
+ BEQ flush
+ret:
+ MOVD 184(RSP), R0
+ MOVD 192(RSP), R1
+ // Do the write.
+ MOVD R3, (R2)
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ MOVD R2, 8(RSP) // Also first argument to wbBufFlush
+ MOVD R3, 16(RSP) // Also second argument to wbBufFlush
+ // R0 already saved
+ // R1 already saved
+ MOVD R4, 24(RSP)
+ MOVD R5, 32(RSP)
+ MOVD R6, 40(RSP)
+ MOVD R7, 48(RSP)
+ MOVD R8, 56(RSP)
+ MOVD R9, 64(RSP)
+ MOVD R10, 72(RSP)
+ MOVD R11, 80(RSP)
+ MOVD R12, 88(RSP)
+ MOVD R13, 96(RSP)
+ MOVD R14, 104(RSP)
+ MOVD R15, 112(RSP)
+ // R16, R17 may be clobbered by linker trampoline
+ // R18 is unused.
+ MOVD R19, 120(RSP)
+ MOVD R20, 128(RSP)
+ MOVD R21, 136(RSP)
+ MOVD R22, 144(RSP)
+ MOVD R23, 152(RSP)
+ MOVD R24, 160(RSP)
+ MOVD R25, 168(RSP)
+ MOVD R26, 176(RSP)
+ // R27 is temp register.
+ // R28 is g.
+ // R29 is frame pointer (unused).
+ // R30 is LR, which was saved by the prologue.
+ // R31 is SP.
+
+ // This takes arguments R2 and R3.
+ CALL runtime·wbBufFlush(SB)
+
+ MOVD 8(RSP), R2
+ MOVD 16(RSP), R3
+ MOVD 24(RSP), R4
+ MOVD 32(RSP), R5
+ MOVD 40(RSP), R6
+ MOVD 48(RSP), R7
+ MOVD 56(RSP), R8
+ MOVD 64(RSP), R9
+ MOVD 72(RSP), R10
+ MOVD 80(RSP), R11
+ MOVD 88(RSP), R12
+ MOVD 96(RSP), R13
+ MOVD 104(RSP), R14
+ MOVD 112(RSP), R15
+ MOVD 120(RSP), R19
+ MOVD 128(RSP), R20
+ MOVD 136(RSP), R21
+ MOVD 144(RSP), R22
+ MOVD 152(RSP), R23
+ MOVD 160(RSP), R24
+ MOVD 168(RSP), R25
+ MOVD 176(RSP), R26
+ JMP ret
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+TEXT runtime·panicIndex(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen(SB),NOSPLIT,$0-16
+ MOVD R2, x+0(FP)
+ MOVD R3, y+8(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU(SB),NOSPLIT,$0-16
+ MOVD R2, x+0(FP)
+ MOVD R3, y+8(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap(SB),NOSPLIT,$0-16
+ MOVD R2, x+0(FP)
+ MOVD R3, y+8(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU(SB),NOSPLIT,$0-16
+ MOVD R2, x+0(FP)
+ MOVD R3, y+8(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicSlice3CU(SB)
diff --git a/src/runtime/asm_mips64x.s b/src/runtime/asm_mips64x.s
new file mode 100644
index 0000000..19781f7
--- /dev/null
+++ b/src/runtime/asm_mips64x.s
@@ -0,0 +1,807 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips64 mips64le
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+#define REGCTXT R22
+
+TEXT runtime·rt0_go(SB),NOSPLIT,$0
+ // R29 = stack; R4 = argc; R5 = argv
+
+ ADDV $-24, R29
+ MOVW R4, 8(R29) // argc
+ MOVV R5, 16(R29) // argv
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOVV $runtime·g0(SB), g
+ MOVV $(-64*1024), R23
+ ADDV R23, R29, R1
+ MOVV R1, g_stackguard0(g)
+ MOVV R1, g_stackguard1(g)
+ MOVV R1, (g_stack+stack_lo)(g)
+ MOVV R29, (g_stack+stack_hi)(g)
+
+ // if there is a _cgo_init, call it using the gcc ABI.
+ MOVV _cgo_init(SB), R25
+ BEQ R25, nocgo
+
+ MOVV R0, R7 // arg 3: not used
+ MOVV R0, R6 // arg 2: not used
+ MOVV $setg_gcc<>(SB), R5 // arg 1: setg
+ MOVV g, R4 // arg 0: G
+ JAL (R25)
+
+nocgo:
+ // update stackguard after _cgo_init
+ MOVV (g_stack+stack_lo)(g), R1
+ ADDV $const__StackGuard, R1
+ MOVV R1, g_stackguard0(g)
+ MOVV R1, g_stackguard1(g)
+
+ // set the per-goroutine and per-mach "registers"
+ MOVV $runtime·m0(SB), R1
+
+ // save m->g0 = g0
+ MOVV g, m_g0(R1)
+ // save m0 to g0->m
+ MOVV R1, g_m(g)
+
+ JAL runtime·check(SB)
+
+ // args are already prepared
+ JAL runtime·args(SB)
+ JAL runtime·osinit(SB)
+ JAL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOVV $runtime·mainPC(SB), R1 // entry
+ ADDV $-24, R29
+ MOVV R1, 16(R29)
+ MOVV R0, 8(R29)
+ MOVV R0, 0(R29)
+ JAL runtime·newproc(SB)
+ ADDV $24, R29
+
+ // start this M
+ JAL runtime·mstart(SB)
+
+ MOVV R0, 1(R0)
+ RET
+
+DATA runtime·mainPC+0(SB)/8,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
+
+TEXT runtime·breakpoint(SB),NOSPLIT|NOFRAME,$0-0
+ MOVV R0, 2(R0) // TODO: TD
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT|NOFRAME,$0-0
+ RET
+
+/*
+ * go-routine
+ */
+
+// void gosave(Gobuf*)
+// save state in Gobuf; setjmp
+TEXT runtime·gosave(SB), NOSPLIT|NOFRAME, $0-8
+ MOVV buf+0(FP), R1
+ MOVV R29, gobuf_sp(R1)
+ MOVV R31, gobuf_pc(R1)
+ MOVV g, gobuf_g(R1)
+ MOVV R0, gobuf_lr(R1)
+ MOVV R0, gobuf_ret(R1)
+ // Assert ctxt is zero. See func save.
+ MOVV gobuf_ctxt(R1), R1
+ BEQ R1, 2(PC)
+ JAL runtime·badctxt(SB)
+ RET
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB), NOSPLIT, $16-8
+ MOVV buf+0(FP), R3
+ MOVV gobuf_g(R3), g // make sure g is not nil
+ JAL runtime·save_g(SB)
+
+ MOVV 0(g), R2
+ MOVV gobuf_sp(R3), R29
+ MOVV gobuf_lr(R3), R31
+ MOVV gobuf_ret(R3), R1
+ MOVV gobuf_ctxt(R3), REGCTXT
+ MOVV R0, gobuf_sp(R3)
+ MOVV R0, gobuf_ret(R3)
+ MOVV R0, gobuf_lr(R3)
+ MOVV R0, gobuf_ctxt(R3)
+ MOVV gobuf_pc(R3), R4
+ JMP (R4)
+
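+// The Gobuf fields saved and restored above map onto the Go-side struct,
+// which looks roughly like this (see runtime2.go for the real definition):
+//
+//	type gobuf struct {
+//		sp   uintptr        // stack pointer (R29 on mips64)
+//		pc   uintptr        // resume PC
+//		g    guintptr       // the goroutine this buffer belongs to
+//		ctxt unsafe.Pointer // closure context (REGCTXT)
+//		ret  uintptr        // return value register (R1)
+//		lr   uintptr        // saved link register (R31)
+//		...
+//	}
+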
+// void mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB), NOSPLIT|NOFRAME, $0-8
+ // Save caller state in g->sched
+ MOVV R29, (g_sched+gobuf_sp)(g)
+ MOVV R31, (g_sched+gobuf_pc)(g)
+ MOVV R0, (g_sched+gobuf_lr)(g)
+ MOVV g, (g_sched+gobuf_g)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOVV g, R1
+ MOVV g_m(g), R3
+ MOVV m_g0(R3), g
+ JAL runtime·save_g(SB)
+ BNE g, R1, 2(PC)
+ JMP runtime·badmcall(SB)
+ MOVV fn+0(FP), REGCTXT // context
+ MOVV 0(REGCTXT), R4 // code pointer
+ MOVV (g_sched+gobuf_sp)(g), R29 // sp = m->g0->sched.sp
+ ADDV $-16, R29
+ MOVV R1, 8(R29)
+ MOVV R0, 0(R29)
+ JAL (R4)
+ JMP runtime·badmcall2(SB)
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ UNDEF
+ JAL (R31) // make sure this function is not leaf
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ MOVV fn+0(FP), R1 // R1 = fn
+ MOVV R1, REGCTXT // context
+ MOVV g_m(g), R2 // R2 = m
+
+ MOVV m_gsignal(R2), R3 // R3 = gsignal
+ BEQ g, R3, noswitch
+
+ MOVV m_g0(R2), R3 // R3 = g0
+ BEQ g, R3, noswitch
+
+ MOVV m_curg(R2), R4
+ BEQ g, R4, switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVV $runtime·badsystemstack(SB), R4
+ JAL (R4)
+ JAL runtime·abort(SB)
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ MOVV $runtime·systemstack_switch(SB), R4
+ ADDV $8, R4 // get past prologue
+ MOVV R4, (g_sched+gobuf_pc)(g)
+ MOVV R29, (g_sched+gobuf_sp)(g)
+ MOVV R0, (g_sched+gobuf_lr)(g)
+ MOVV g, (g_sched+gobuf_g)(g)
+
+ // switch to g0
+ MOVV R3, g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R1
+ // make it look like mstart called systemstack on g0, to stop traceback
+ ADDV $-8, R1
+ MOVV $runtime·mstart(SB), R2
+ MOVV R2, 0(R1)
+ MOVV R1, R29
+
+ // call target function
+ MOVV 0(REGCTXT), R4 // code pointer
+ JAL (R4)
+
+ // switch back to g
+ MOVV g_m(g), R1
+ MOVV m_curg(R1), g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R29
+ MOVV R0, (g_sched+gobuf_sp)(g)
+ RET
+
+noswitch:
+ // already on m stack, just call directly
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVV 0(REGCTXT), R4 // code pointer
+ MOVV 0(R29), R31 // restore LR
+ ADDV $8, R29
+ JMP (R4)
+
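+// The switch performed above is, at the Go level, roughly the following
+// (helper names here are illustrative, not real runtime functions):
+//
+//	func systemstack(fn func()) {
+//		if g == gsignal || g == g0 {
+//			fn() // already on a system stack; call directly
+//			return
+//		}
+//		saveState(&g.sched) // pretend to be systemstack_switch
+//		switchTo(g0)        // SP = g0.sched.sp
+//		fn()
+//		switchTo(m.curg)    // SP = curg.sched.sp, resume the goroutine
+//	}
+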
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// Caller has already loaded:
+// R1: framesize, R2: argsize, R3: LR
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOVV g_m(g), R7
+ MOVV m_g0(R7), R8
+ BNE g, R8, 3(PC)
+ JAL runtime·badmorestackg0(SB)
+ JAL runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVV m_gsignal(R7), R8
+ BNE g, R8, 3(PC)
+ JAL runtime·badmorestackgsignal(SB)
+ JAL runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f.
+ MOVV R29, (g_sched+gobuf_sp)(g)
+ MOVV R31, (g_sched+gobuf_pc)(g)
+ MOVV R3, (g_sched+gobuf_lr)(g)
+ MOVV REGCTXT, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ MOVV R3, (m_morebuf+gobuf_pc)(R7) // f's caller's PC
+ MOVV R29, (m_morebuf+gobuf_sp)(R7) // f's caller's SP
+ MOVV g, (m_morebuf+gobuf_g)(R7)
+
+ // Call newstack on m->g0's stack.
+ MOVV m_g0(R7), g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R29
+ // Create a stack frame on g0 to call newstack.
+ MOVV R0, -8(R29) // Zero saved LR in frame
+ ADDV $-8, R29
+ JAL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ UNDEF
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT|NOFRAME,$0-0
+ MOVV R0, REGCTXT
+ JMP runtime·morestack(SB)
+
+// reflectcall: call a function with the given argument list
+// func call(argtype *_type, f *FuncVal, arg *byte, argsize, retoffset uint32).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ MOVV $MAXSIZE, R23; \
+ SGTU R1, R23, R23; \
+ BNE R23, 3(PC); \
+ MOVV $NAME(SB), R4; \
+ JMP (R4)
+// Note: can't just "BR NAME(SB)" - bad inlining results.
+
+TEXT ·reflectcall(SB), NOSPLIT|NOFRAME, $0-32
+ MOVWU argsize+24(FP), R1
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVV $runtime·badreflectcall(SB), R4
+ JMP (R4)
+
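+// The DISPATCH chain above is just an unrolled size-class switch; in Go it
+// would read roughly:
+//
+//	switch {
+//	case argsize <= 16:
+//		call16(argtype, f, arg, argsize, retoffset)
+//	case argsize <= 32:
+//		call32(argtype, f, arg, argsize, retoffset)
+//	// ... powers of two up to 1 GB ...
+//	default:
+//		badreflectcall()
+//	}
+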
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-24; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVV arg+16(FP), R1; \
+ MOVWU argsize+24(FP), R2; \
+ MOVV R29, R3; \
+ ADDV $8, R3; \
+ ADDV R3, R2; \
+ BEQ R3, R2, 6(PC); \
+ MOVBU (R1), R4; \
+ ADDV $1, R1; \
+ MOVBU R4, (R3); \
+ ADDV $1, R3; \
+ JMP -5(PC); \
+ /* call function */ \
+ MOVV f+8(FP), REGCTXT; \
+ MOVV (REGCTXT), R4; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ JAL (R4); \
+ /* copy return values back */ \
+ MOVV argtype+0(FP), R5; \
+ MOVV arg+16(FP), R1; \
+ MOVWU n+24(FP), R2; \
+ MOVWU retoffset+28(FP), R4; \
+ ADDV $8, R29, R3; \
+ ADDV R4, R3; \
+ ADDV R4, R1; \
+ SUBVU R4, R2; \
+ JAL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $32-0
+ MOVV R5, 8(R29)
+ MOVV R1, 16(R29)
+ MOVV R3, 24(R29)
+ MOVV R2, 32(R29)
+ JAL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+TEXT runtime·procyield(SB),NOSPLIT,$0-0
+ RET
+
+// void jmpdefer(fv, sp);
+// called from deferreturn.
+// 1. grab stored LR for caller
+// 2. sub 8 bytes to get back to JAL deferreturn
+// 3. JMP to fn
+TEXT runtime·jmpdefer(SB), NOSPLIT|NOFRAME, $0-16
+ MOVV 0(R29), R31
+ ADDV $-8, R31
+
+ MOVV fv+0(FP), REGCTXT
+ MOVV argp+8(FP), R29
+ ADDV $-8, R29
+ NOR R0, R0 // prevent scheduling
+ MOVV 0(REGCTXT), R4
+ JMP (R4)
+
+// Save state of caller into g->sched. Smashes R1.
+TEXT gosave<>(SB),NOSPLIT|NOFRAME,$0
+ MOVV R31, (g_sched+gobuf_pc)(g)
+ MOVV R29, (g_sched+gobuf_sp)(g)
+ MOVV R0, (g_sched+gobuf_lr)(g)
+ MOVV R0, (g_sched+gobuf_ret)(g)
+ // Assert ctxt is zero. See func save.
+ MOVV (g_sched+gobuf_ctxt)(g), R1
+ BEQ R1, 2(PC)
+ JAL runtime·badctxt(SB)
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-20
+ MOVV fn+0(FP), R25
+ MOVV arg+8(FP), R4
+
+ MOVV R29, R3 // save original stack pointer
+ MOVV g, R2
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already.
+ MOVV g_m(g), R5
+ MOVV m_g0(R5), R6
+ BEQ R6, g, g0
+
+ JAL gosave<>(SB)
+ MOVV R6, g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R29
+
+ // Now on a scheduling stack (a pthread-created stack).
+g0:
+ // Save room for two of our pointers.
+ ADDV $-16, R29
+ MOVV R2, 0(R29) // save old g on stack
+ MOVV (g_stack+stack_hi)(R2), R2
+ SUBVU R3, R2
+ MOVV R2, 8(R29) // save depth in old g stack (can't just save SP, as stack might be copied during a callback)
+ JAL (R25)
+
+ // Restore g, stack pointer. R2 is return value.
+ MOVV 0(R29), g
+ JAL runtime·save_g(SB)
+ MOVV (g_stack+stack_hi)(g), R5
+ MOVV 8(R29), R6
+ SUBVU R6, R5
+ MOVV R5, R29
+
+ MOVW R2, ret+16(FP)
+ RET
+
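+// The "depth" bookkeeping above amounts to the following, in Go terms:
+//
+//	depth := oldg.stack.hi - sp   // before calling fn
+//	// ... fn may trigger a callback that copies the goroutine stack ...
+//	sp = oldg.stack.hi - depth    // stack.hi may have changed; depth has not
+//
+// which is why the code saves a depth rather than the raw SP.
+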
+// func cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$24-24
+ NO_LOCAL_POINTERS
+
+ // Load m and g from thread-local storage.
+ MOVB runtime·iscgo(SB), R1
+ BEQ R1, nocgo
+ JAL runtime·load_g(SB)
+nocgo:
+
+ // If g is nil, Go did not create the current thread.
+ // Call needm to obtain one for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ BEQ g, needm
+
+ MOVV g_m(g), R3
+ MOVV R3, savedm-8(SP)
+ JMP havem
+
+needm:
+ MOVV g, savedm-8(SP) // g is zero, so is m.
+ MOVV $runtime·needm(SB), R4
+ JAL (R4)
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVV g_m(g), R3
+ MOVV m_g0(R3), R1
+ MOVV R29, (g_sched+gobuf_sp)(R1)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 8(R29) aka savedsp-16(SP).
+ MOVV m_g0(R3), R1
+ MOVV (g_sched+gobuf_sp)(R1), R2
+ MOVV R2, savedsp-24(SP) // must match frame size
+ MOVV R29, (g_sched+gobuf_sp)(R1)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVV m_curg(R3), g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R2 // prepare stack as R2
+ MOVV (g_sched+gobuf_pc)(g), R4
+ MOVV R4, -(24+8)(R2) // "saved LR"; must match frame size
+ // Gather our arguments into registers.
+ MOVV fn+0(FP), R5
+ MOVV frame+8(FP), R6
+ MOVV ctxt+16(FP), R7
+ MOVV $-(24+8)(R2), R29 // switch stack; must match frame size
+ MOVV R5, 8(R29)
+ MOVV R6, 16(R29)
+ MOVV R7, 24(R29)
+ JAL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOVV 0(R29), R4
+ MOVV R4, (g_sched+gobuf_pc)(g)
+ MOVV $(24+8)(R29), R2 // must match frame size
+ MOVV R2, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVV g_m(g), R3
+ MOVV m_g0(R3), g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R29
+ MOVV savedsp-24(SP), R2 // must match frame size
+ MOVV R2, (g_sched+gobuf_sp)(g)
+
+ // If the m on entry was nil, we called needm above to borrow an m
+ // for the duration of the call. Since the call is over, return it with dropm.
+ MOVV savedm-8(SP), R3
+ BNE R3, droppedm
+ MOVV $runtime·dropm(SB), R4
+ JAL (R4)
+droppedm:
+
+ // Done!
+ RET
+
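+// Taken together, the assembly above implements roughly this Go-level flow
+// (a sketch only; the real stack switching is done by hand as shown):
+//
+//	func cgocallback(fn, frame unsafe.Pointer, ctxt uintptr) {
+//		if g == nil {         // thread not created by Go
+//			needm()       // borrow an m for the duration of the call
+//			defer dropm() // and give it back afterwards
+//		}
+//		// save g0's SP, switch to m.curg's stack, run the callback,
+//		// then switch back and restore g0.sched.sp:
+//		cgocallbackg(fn, frame, ctxt)
+//	}
+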
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-8
+ MOVV gg+0(FP), g
+ // This only happens if iscgo, so jump straight to save_g
+ JAL runtime·save_g(SB)
+ RET
+
+// void setg_gcc(G*); set g called from gcc with g in R1
+TEXT setg_gcc<>(SB),NOSPLIT,$0-0
+ MOVV R1, g
+ JAL runtime·save_g(SB)
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW (R0), R0
+ UNDEF
+
+// AES hashing not implemented for mips64
+TEXT runtime·memhash(SB),NOSPLIT|NOFRAME,$0-32
+ JMP runtime·memhashFallback(SB)
+TEXT runtime·strhash(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·strhashFallback(SB)
+TEXT runtime·memhash32(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash32Fallback(SB)
+TEXT runtime·memhash64(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash64Fallback(SB)
+
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOVW $0, R1
+ RET
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT,$16
+ // g (R30) and REGTMP (R23) might be clobbered by load_g. They
+ // are callee-save in the gcc calling convention, so save them.
+ MOVV R23, savedR23-16(SP)
+ MOVV g, savedG-8(SP)
+
+ JAL runtime·load_g(SB)
+ MOVV g_m(g), R1
+ MOVV m_curg(R1), R1
+ MOVV (g_stack+stack_hi)(R1), R2 // return value in R2
+
+ MOVV savedG-8(SP), g
+ MOVV savedR23-16(SP), R23
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ NOR R0, R0 // NOP
+ JAL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ NOR R0, R0 // NOP
+
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOVW $1, R1
+ MOVB R1, ret+0(FP)
+ RET
+
+// gcWriteBarrier performs a heap pointer write and informs the GC.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It takes two arguments:
+// - R20 is the destination of the write
+// - R21 is the value being written at R20.
+// It clobbers R23 (the linker temp register).
+// The act of CALLing gcWriteBarrier will clobber R31 (LR).
+// It does not clobber any other general-purpose registers,
+// but may clobber others (e.g., floating point registers).
+TEXT runtime·gcWriteBarrier(SB),NOSPLIT,$192
+ // Save the registers clobbered by the fast path.
+ MOVV R1, 184(R29)
+ MOVV R2, 192(R29)
+ MOVV g_m(g), R1
+ MOVV m_p(R1), R1
+ MOVV (p_wbBuf+wbBuf_next)(R1), R2
+ // Increment wbBuf.next position.
+ ADDV $16, R2
+ MOVV R2, (p_wbBuf+wbBuf_next)(R1)
+ MOVV (p_wbBuf+wbBuf_end)(R1), R1
+ MOVV R1, R23 // R23 is linker temp register
+ // Record the write.
+ MOVV R21, -16(R2) // Record value
+ MOVV (R20), R1 // TODO: This turns bad writes into bad reads.
+ MOVV R1, -8(R2) // Record *slot
+ // Is the buffer full?
+ BEQ R2, R23, flush
+ret:
+ MOVV 184(R29), R1
+ MOVV 192(R29), R2
+ // Do the write.
+ MOVV R21, (R20)
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ MOVV R20, 8(R29) // Also first argument to wbBufFlush
+ MOVV R21, 16(R29) // Also second argument to wbBufFlush
+ // R1 already saved
+ // R2 already saved
+ MOVV R3, 24(R29)
+ MOVV R4, 32(R29)
+ MOVV R5, 40(R29)
+ MOVV R6, 48(R29)
+ MOVV R7, 56(R29)
+ MOVV R8, 64(R29)
+ MOVV R9, 72(R29)
+ MOVV R10, 80(R29)
+ MOVV R11, 88(R29)
+ MOVV R12, 96(R29)
+ MOVV R13, 104(R29)
+ MOVV R14, 112(R29)
+ MOVV R15, 120(R29)
+ MOVV R16, 128(R29)
+ MOVV R17, 136(R29)
+ MOVV R18, 144(R29)
+ MOVV R19, 152(R29)
+ // R20 already saved
+ // R21 already saved.
+ MOVV R22, 160(R29)
+ // R23 is tmp register.
+ MOVV R24, 168(R29)
+ MOVV R25, 176(R29)
+ // R26 is reserved by kernel.
+ // R27 is reserved by kernel.
+ // R28 is REGSB (not modified by Go code).
+ // R29 is SP.
+ // R30 is g.
+ // R31 is LR, which was saved by the prologue.
+
+ // This takes arguments R20 and R21.
+ CALL runtime·wbBufFlush(SB)
+
+ MOVV 8(R29), R20
+ MOVV 16(R29), R21
+ MOVV 24(R29), R3
+ MOVV 32(R29), R4
+ MOVV 40(R29), R5
+ MOVV 48(R29), R6
+ MOVV 56(R29), R7
+ MOVV 64(R29), R8
+ MOVV 72(R29), R9
+ MOVV 80(R29), R10
+ MOVV 88(R29), R11
+ MOVV 96(R29), R12
+ MOVV 104(R29), R13
+ MOVV 112(R29), R14
+ MOVV 120(R29), R15
+ MOVV 128(R29), R16
+ MOVV 136(R29), R17
+ MOVV 144(R29), R18
+ MOVV 152(R29), R19
+ MOVV 160(R29), R22
+ MOVV 168(R29), R24
+ MOVV 176(R29), R25
+ JMP ret
+
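+// The fast path of gcWriteBarrier above appends a two-word entry to the
+// per-P write barrier buffer; a rough Go-level sketch of the same logic:
+//
+//	buf := &getg().m.p.ptr().wbBuf
+//	buf.next += 16                  // two 8-byte slots
+//	entry[0] = value                // the value being written (R21)
+//	entry[1] = *slot                // the old contents of the slot (*R20)
+//	if buf.next == buf.end {
+//		wbBufFlush(slot, value) // slow path: spill registers, call into Go
+//	}
+//	*slot = value                   // finally, the write itself
+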
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+TEXT runtime·panicIndex(SB),NOSPLIT,$0-16
+ MOVV R1, x+0(FP)
+ MOVV R2, y+8(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU(SB),NOSPLIT,$0-16
+ MOVV R1, x+0(FP)
+ MOVV R2, y+8(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen(SB),NOSPLIT,$0-16
+ MOVV R2, x+0(FP)
+ MOVV R3, y+8(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU(SB),NOSPLIT,$0-16
+ MOVV R2, x+0(FP)
+ MOVV R3, y+8(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap(SB),NOSPLIT,$0-16
+ MOVV R2, x+0(FP)
+ MOVV R3, y+8(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU(SB),NOSPLIT,$0-16
+ MOVV R2, x+0(FP)
+ MOVV R3, y+8(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB(SB),NOSPLIT,$0-16
+ MOVV R1, x+0(FP)
+ MOVV R2, y+8(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU(SB),NOSPLIT,$0-16
+ MOVV R1, x+0(FP)
+ MOVV R2, y+8(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen(SB),NOSPLIT,$0-16
+ MOVV R3, x+0(FP)
+ MOVV R4, y+8(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU(SB),NOSPLIT,$0-16
+ MOVV R3, x+0(FP)
+ MOVV R4, y+8(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap(SB),NOSPLIT,$0-16
+ MOVV R3, x+0(FP)
+ MOVV R4, y+8(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU(SB),NOSPLIT,$0-16
+ MOVV R3, x+0(FP)
+ MOVV R4, y+8(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B(SB),NOSPLIT,$0-16
+ MOVV R2, x+0(FP)
+ MOVV R3, y+8(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU(SB),NOSPLIT,$0-16
+ MOVV R2, x+0(FP)
+ MOVV R3, y+8(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C(SB),NOSPLIT,$0-16
+ MOVV R1, x+0(FP)
+ MOVV R2, y+8(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU(SB),NOSPLIT,$0-16
+ MOVV R1, x+0(FP)
+ MOVV R2, y+8(FP)
+ JMP runtime·goPanicSlice3CU(SB)
diff --git a/src/runtime/asm_mipsx.s b/src/runtime/asm_mipsx.s
new file mode 100644
index 0000000..ee87d81
--- /dev/null
+++ b/src/runtime/asm_mipsx.s
@@ -0,0 +1,896 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips mipsle
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+#define REGCTXT R22
+
+TEXT runtime·rt0_go(SB),NOSPLIT,$0
+ // R29 = stack; R4 = argc; R5 = argv
+
+ ADDU $-12, R29
+ MOVW R4, 4(R29) // argc
+ MOVW R5, 8(R29) // argv
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOVW $runtime·g0(SB), g
+ MOVW $(-64*1024), R23
+ ADD R23, R29, R1
+ MOVW R1, g_stackguard0(g)
+ MOVW R1, g_stackguard1(g)
+ MOVW R1, (g_stack+stack_lo)(g)
+ MOVW R29, (g_stack+stack_hi)(g)
+
+ // if there is a _cgo_init, call it using the gcc ABI.
+ MOVW _cgo_init(SB), R25
+ BEQ R25, nocgo
+ ADDU $-16, R29
+ MOVW R0, R7 // arg 3: not used
+ MOVW R0, R6 // arg 2: not used
+ MOVW $setg_gcc<>(SB), R5 // arg 1: setg
+ MOVW g, R4 // arg 0: G
+ JAL (R25)
+ ADDU $16, R29
+
+nocgo:
+ // update stackguard after _cgo_init
+ MOVW (g_stack+stack_lo)(g), R1
+ ADD $const__StackGuard, R1
+ MOVW R1, g_stackguard0(g)
+ MOVW R1, g_stackguard1(g)
+
+ // set the per-goroutine and per-mach "registers"
+ MOVW $runtime·m0(SB), R1
+
+ // save m->g0 = g0
+ MOVW g, m_g0(R1)
+ // save m0 to g0->m
+ MOVW R1, g_m(g)
+
+ JAL runtime·check(SB)
+
+ // args are already prepared
+ JAL runtime·args(SB)
+ JAL runtime·osinit(SB)
+ JAL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOVW $runtime·mainPC(SB), R1 // entry
+ ADDU $-12, R29
+ MOVW R1, 8(R29)
+ MOVW R0, 4(R29)
+ MOVW R0, 0(R29)
+ JAL runtime·newproc(SB)
+ ADDU $12, R29
+
+ // start this M
+ JAL runtime·mstart(SB)
+
+ UNDEF
+ RET
+
+DATA runtime·mainPC+0(SB)/4,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$4
+
+TEXT runtime·breakpoint(SB),NOSPLIT,$0-0
+ BREAK
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT,$0-0
+ RET
+
+/*
+ * go-routine
+ */
+
+// void gosave(Gobuf*)
+// save state in Gobuf; setjmp
+TEXT runtime·gosave(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW buf+0(FP), R1
+ MOVW R29, gobuf_sp(R1)
+ MOVW R31, gobuf_pc(R1)
+ MOVW g, gobuf_g(R1)
+ MOVW R0, gobuf_lr(R1)
+ MOVW R0, gobuf_ret(R1)
+ // Assert ctxt is zero. See func save.
+ MOVW gobuf_ctxt(R1), R1
+ BEQ R1, 2(PC)
+ JAL runtime·badctxt(SB)
+ RET
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB),NOSPLIT,$8-4
+ MOVW buf+0(FP), R3
+ MOVW gobuf_g(R3), g // make sure g is not nil
+ JAL runtime·save_g(SB)
+
+ MOVW 0(g), R2
+ MOVW gobuf_sp(R3), R29
+ MOVW gobuf_lr(R3), R31
+ MOVW gobuf_ret(R3), R1
+ MOVW gobuf_ctxt(R3), REGCTXT
+ MOVW R0, gobuf_sp(R3)
+ MOVW R0, gobuf_ret(R3)
+ MOVW R0, gobuf_lr(R3)
+ MOVW R0, gobuf_ctxt(R3)
+ MOVW gobuf_pc(R3), R4
+ JMP (R4)
+
+// void mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB),NOSPLIT|NOFRAME,$0-4
+ // Save caller state in g->sched
+ MOVW R29, (g_sched+gobuf_sp)(g)
+ MOVW R31, (g_sched+gobuf_pc)(g)
+ MOVW R0, (g_sched+gobuf_lr)(g)
+ MOVW g, (g_sched+gobuf_g)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOVW g, R1
+ MOVW g_m(g), R3
+ MOVW m_g0(R3), g
+ JAL runtime·save_g(SB)
+ BNE g, R1, 2(PC)
+ JMP runtime·badmcall(SB)
+ MOVW fn+0(FP), REGCTXT // context
+ MOVW 0(REGCTXT), R4 // code pointer
+ MOVW (g_sched+gobuf_sp)(g), R29 // sp = m->g0->sched.sp
+ ADDU $-8, R29 // make room for 1 arg and fake LR
+ MOVW R1, 4(R29)
+ MOVW R0, 0(R29)
+ JAL (R4)
+ JMP runtime·badmcall2(SB)
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB),NOSPLIT,$0-0
+ UNDEF
+ JAL (R31) // make sure this function is not leaf
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB),NOSPLIT,$0-4
+ MOVW fn+0(FP), R1 // R1 = fn
+ MOVW R1, REGCTXT // context
+ MOVW g_m(g), R2 // R2 = m
+
+ MOVW m_gsignal(R2), R3 // R3 = gsignal
+ BEQ g, R3, noswitch
+
+ MOVW m_g0(R2), R3 // R3 = g0
+ BEQ g, R3, noswitch
+
+ MOVW m_curg(R2), R4
+ BEQ g, R4, switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVW $runtime·badsystemstack(SB), R4
+ JAL (R4)
+ JAL runtime·abort(SB)
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ MOVW $runtime·systemstack_switch(SB), R4
+ ADDU $8, R4 // get past prologue
+ MOVW R4, (g_sched+gobuf_pc)(g)
+ MOVW R29, (g_sched+gobuf_sp)(g)
+ MOVW R0, (g_sched+gobuf_lr)(g)
+ MOVW g, (g_sched+gobuf_g)(g)
+
+ // switch to g0
+ MOVW R3, g
+ JAL runtime·save_g(SB)
+ MOVW (g_sched+gobuf_sp)(g), R1
+ // make it look like mstart called systemstack on g0, to stop traceback
+ ADDU $-4, R1
+ MOVW $runtime·mstart(SB), R2
+ MOVW R2, 0(R1)
+ MOVW R1, R29
+
+ // call target function
+ MOVW 0(REGCTXT), R4 // code pointer
+ JAL (R4)
+
+ // switch back to g
+ MOVW g_m(g), R1
+ MOVW m_curg(R1), g
+ JAL runtime·save_g(SB)
+ MOVW (g_sched+gobuf_sp)(g), R29
+ MOVW R0, (g_sched+gobuf_sp)(g)
+ RET
+
+noswitch:
+ // already on m stack, just call directly
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVW 0(REGCTXT), R4 // code pointer
+ MOVW 0(R29), R31 // restore LR
+ ADD $4, R29
+ JMP (R4)
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// Caller has already loaded:
+// R1: framesize, R2: argsize, R3: LR
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOVW g_m(g), R7
+ MOVW m_g0(R7), R8
+ BNE g, R8, 3(PC)
+ JAL runtime·badmorestackg0(SB)
+ JAL runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVW m_gsignal(R7), R8
+ BNE g, R8, 3(PC)
+ JAL runtime·badmorestackgsignal(SB)
+ JAL runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f.
+ MOVW R29, (g_sched+gobuf_sp)(g)
+ MOVW R31, (g_sched+gobuf_pc)(g)
+ MOVW R3, (g_sched+gobuf_lr)(g)
+ MOVW REGCTXT, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ MOVW R3, (m_morebuf+gobuf_pc)(R7) // f's caller's PC
+ MOVW R29, (m_morebuf+gobuf_sp)(R7) // f's caller's SP
+ MOVW g, (m_morebuf+gobuf_g)(R7)
+
+ // Call newstack on m->g0's stack.
+ MOVW m_g0(R7), g
+ JAL runtime·save_g(SB)
+ MOVW (g_sched+gobuf_sp)(g), R29
+ // Create a stack frame on g0 to call newstack.
+ MOVW R0, -4(R29) // Zero saved LR in frame
+ ADDU $-4, R29
+ JAL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ UNDEF
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT,$0-0
+ MOVW R0, REGCTXT
+ JMP runtime·morestack(SB)
+
+// reflectcall: call a function with the given argument list
+// func call(argtype *_type, f *FuncVal, arg *byte, argsize, retoffset uint32).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+
+#define DISPATCH(NAME,MAXSIZE) \
+ MOVW $MAXSIZE, R23; \
+ SGTU R1, R23, R23; \
+ BNE R23, 3(PC); \
+ MOVW $NAME(SB), R4; \
+ JMP (R4)
+
+TEXT ·reflectcall(SB),NOSPLIT|NOFRAME,$0-20
+ MOVW argsize+12(FP), R1
+
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVW $runtime·badreflectcall(SB), R4
+ JMP (R4)
+
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB),WRAPPER,$MAXSIZE-20; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVW arg+8(FP), R1; \
+ MOVW argsize+12(FP), R2; \
+ MOVW R29, R3; \
+ ADDU $4, R3; \
+ ADDU R3, R2; \
+ BEQ R3, R2, 6(PC); \
+ MOVBU (R1), R4; \
+ ADDU $1, R1; \
+ MOVBU R4, (R3); \
+ ADDU $1, R3; \
+ JMP -5(PC); \
+ /* call function */ \
+ MOVW f+4(FP), REGCTXT; \
+ MOVW (REGCTXT), R4; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ JAL (R4); \
+ /* copy return values back */ \
+ MOVW argtype+0(FP), R5; \
+ MOVW arg+8(FP), R1; \
+ MOVW n+12(FP), R2; \
+ MOVW retoffset+16(FP), R4; \
+ ADDU $4, R29, R3; \
+ ADDU R4, R3; \
+ ADDU R4, R1; \
+ SUBU R4, R2; \
+ JAL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $16-0
+ MOVW R5, 4(R29)
+ MOVW R1, 8(R29)
+ MOVW R3, 12(R29)
+ MOVW R2, 16(R29)
+ JAL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+TEXT runtime·procyield(SB),NOSPLIT,$0-4
+ RET
+
+// void jmpdefer(fv, sp);
+// called from deferreturn.
+// 1. grab stored LR for caller
+// 2. sub 8 bytes to get back to JAL deferreturn
+// 3. JMP to fn
+TEXT runtime·jmpdefer(SB),NOSPLIT,$0-8
+ MOVW 0(R29), R31
+ ADDU $-8, R31
+
+ MOVW fv+0(FP), REGCTXT
+ MOVW argp+4(FP), R29
+ ADDU $-4, R29
+ NOR R0, R0 // prevent scheduling
+ MOVW 0(REGCTXT), R4
+ JMP (R4)
+
+// Save state of caller into g->sched. Smashes R1.
+TEXT gosave<>(SB),NOSPLIT|NOFRAME,$0
+ MOVW R31, (g_sched+gobuf_pc)(g)
+ MOVW R29, (g_sched+gobuf_sp)(g)
+ MOVW R0, (g_sched+gobuf_lr)(g)
+ MOVW R0, (g_sched+gobuf_ret)(g)
+ // Assert ctxt is zero. See func save.
+ MOVW (g_sched+gobuf_ctxt)(g), R1
+ BEQ R1, 2(PC)
+ JAL runtime·badctxt(SB)
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-12
+ MOVW fn+0(FP), R25
+ MOVW arg+4(FP), R4
+
+ MOVW R29, R3 // save original stack pointer
+ MOVW g, R2
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already.
+ MOVW g_m(g), R5
+ MOVW m_g0(R5), R6
+ BEQ R6, g, g0
+
+ JAL gosave<>(SB)
+ MOVW R6, g
+ JAL runtime·save_g(SB)
+ MOVW (g_sched+gobuf_sp)(g), R29
+
+ // Now on a scheduling stack (a pthread-created stack).
+g0:
+ // Save room for two of our pointers and O32 frame.
+ ADDU $-24, R29
+ AND $~7, R29 // O32 ABI expects 8-byte aligned stack on function entry
+ MOVW R2, 16(R29) // save old g on stack
+ MOVW (g_stack+stack_hi)(R2), R2
+ SUBU R3, R2
+ MOVW R2, 20(R29) // save depth in old g stack (can't just save SP, as stack might be copied during a callback)
+ JAL (R25)
+
+ // Restore g, stack pointer. R2 is return value.
+ MOVW 16(R29), g
+ JAL runtime·save_g(SB)
+ MOVW (g_stack+stack_hi)(g), R5
+ MOVW 20(R29), R6
+ SUBU R6, R5
+ MOVW R5, R29
+
+ MOVW R2, ret+8(FP)
+ RET
+
+// func cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$12-12
+ NO_LOCAL_POINTERS
+
+ // Load m and g from thread-local storage.
+ MOVB runtime·iscgo(SB), R1
+ BEQ R1, nocgo
+ JAL runtime·load_g(SB)
+nocgo:
+
+ // If g is nil, Go did not create the current thread.
+ // Call needm to obtain one for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ BEQ g, needm
+
+ MOVW g_m(g), R3
+ MOVW R3, savedm-4(SP)
+ JMP havem
+
+needm:
+ MOVW g, savedm-4(SP) // g is zero, so is m.
+ MOVW $runtime·needm(SB), R4
+ JAL (R4)
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVW g_m(g), R3
+ MOVW m_g0(R3), R1
+ MOVW R29, (g_sched+gobuf_sp)(R1)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 4(R29) aka savedsp-8(SP).
+ MOVW m_g0(R3), R1
+ MOVW (g_sched+gobuf_sp)(R1), R2
+ MOVW R2, savedsp-12(SP) // must match frame size
+ MOVW R29, (g_sched+gobuf_sp)(R1)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVW m_curg(R3), g
+ JAL runtime·save_g(SB)
+ MOVW (g_sched+gobuf_sp)(g), R2 // prepare stack as R2
+ MOVW (g_sched+gobuf_pc)(g), R4
+ MOVW R4, -(12+4)(R2) // "saved LR"; must match frame size
+ // Gather our arguments into registers.
+ MOVW fn+0(FP), R5
+ MOVW frame+4(FP), R6
+ MOVW ctxt+8(FP), R7
+ MOVW $-(12+4)(R2), R29 // switch stack; must match frame size
+ MOVW R5, 4(R29)
+ MOVW R6, 8(R29)
+ MOVW R7, 12(R29)
+ JAL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOVW 0(R29), R4
+ MOVW R4, (g_sched+gobuf_pc)(g)
+ MOVW $(12+4)(R29), R2 // must match frame size
+ MOVW R2, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVW g_m(g), R3
+ MOVW m_g0(R3), g
+ JAL runtime·save_g(SB)
+ MOVW (g_sched+gobuf_sp)(g), R29
+ MOVW savedsp-12(SP), R2 // must match frame size
+ MOVW R2, (g_sched+gobuf_sp)(g)
+
+ // If the m on entry was nil, we called needm above to borrow an m
+ // for the duration of the call. Since the call is over, return it with dropm.
+ MOVW savedm-4(SP), R3
+ BNE R3, droppedm
+ MOVW $runtime·dropm(SB), R4
+ JAL (R4)
+droppedm:
+
+ // Done!
+ RET
+
+// void setg(G*); set g. for use by needm.
+// This only happens if iscgo, so jump straight to save_g
+TEXT runtime·setg(SB),NOSPLIT,$0-4
+ MOVW gg+0(FP), g
+ JAL runtime·save_g(SB)
+ RET
+
+// void setg_gcc(G*); set g in C TLS.
+// Must obey the gcc calling convention.
+TEXT setg_gcc<>(SB),NOSPLIT,$0
+ MOVW R4, g
+ JAL runtime·save_g(SB)
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT,$0-0
+ UNDEF
+
+// AES hashing not implemented for mips
+TEXT runtime·memhash(SB),NOSPLIT|NOFRAME,$0-16
+ JMP runtime·memhashFallback(SB)
+TEXT runtime·strhash(SB),NOSPLIT|NOFRAME,$0-12
+ JMP runtime·strhashFallback(SB)
+TEXT runtime·memhash32(SB),NOSPLIT|NOFRAME,$0-12
+ JMP runtime·memhash32Fallback(SB)
+TEXT runtime·memhash64(SB),NOSPLIT|NOFRAME,$0-12
+ JMP runtime·memhash64Fallback(SB)
+
+TEXT runtime·return0(SB),NOSPLIT,$0
+ MOVW $0, R1
+ RET
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT|NOFRAME,$0
+ // g (R30), R3 and REGTMP (R23) might be clobbered by load_g. R30 and R23
+ // are callee-save in the gcc calling convention, so save them.
+ MOVW R23, R8
+ MOVW g, R9
+ MOVW R31, R10 // this call frame does not save LR
+
+ JAL runtime·load_g(SB)
+ MOVW g_m(g), R1
+ MOVW m_curg(R1), R1
+ MOVW (g_stack+stack_hi)(R1), R2 // return value in R2
+
+ MOVW R8, R23
+ MOVW R9, g
+ MOVW R10, R31
+
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ NOR R0, R0 // NOP
+ JAL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ NOR R0, R0 // NOP
+
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOVW $1, R1
+ MOVB R1, ret+0(FP)
+ RET
+
+// gcWriteBarrier performs a heap pointer write and informs the GC.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It takes two arguments:
+// - R20 is the destination of the write
+// - R21 is the value being written at R20.
+// It clobbers R23 (the linker temp register).
+// The act of CALLing gcWriteBarrier will clobber R31 (LR).
+// It does not clobber any other general-purpose registers,
+// but may clobber others (e.g., floating point registers).
+TEXT runtime·gcWriteBarrier(SB),NOSPLIT,$104
+ // Save the registers clobbered by the fast path.
+ MOVW R1, 100(R29)
+ MOVW R2, 104(R29)
+ MOVW g_m(g), R1
+ MOVW m_p(R1), R1
+ MOVW (p_wbBuf+wbBuf_next)(R1), R2
+ // Increment wbBuf.next position.
+ ADD $8, R2
+ MOVW R2, (p_wbBuf+wbBuf_next)(R1)
+ MOVW (p_wbBuf+wbBuf_end)(R1), R1
+ MOVW R1, R23 // R23 is linker temp register
+ // Record the write.
+ MOVW R21, -8(R2) // Record value
+ MOVW (R20), R1 // TODO: This turns bad writes into bad reads.
+ MOVW R1, -4(R2) // Record *slot
+ // Is the buffer full?
+ BEQ R2, R23, flush
+ret:
+ MOVW 100(R29), R1
+ MOVW 104(R29), R2
+ // Do the write.
+ MOVW R21, (R20)
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ MOVW R20, 4(R29) // Also first argument to wbBufFlush
+ MOVW R21, 8(R29) // Also second argument to wbBufFlush
+ // R1 already saved
+ // R2 already saved
+ MOVW R3, 12(R29)
+ MOVW R4, 16(R29)
+ MOVW R5, 20(R29)
+ MOVW R6, 24(R29)
+ MOVW R7, 28(R29)
+ MOVW R8, 32(R29)
+ MOVW R9, 36(R29)
+ MOVW R10, 40(R29)
+ MOVW R11, 44(R29)
+ MOVW R12, 48(R29)
+ MOVW R13, 52(R29)
+ MOVW R14, 56(R29)
+ MOVW R15, 60(R29)
+ MOVW R16, 64(R29)
+ MOVW R17, 68(R29)
+ MOVW R18, 72(R29)
+ MOVW R19, 76(R29)
+ MOVW R20, 80(R29)
+ // R21 already saved
+ MOVW R22, 84(R29)
+ // R23 is tmp register.
+ MOVW R24, 88(R29)
+ MOVW R25, 92(R29)
+ // R26 is reserved by kernel.
+ // R27 is reserved by kernel.
+ MOVW R28, 96(R29)
+ // R29 is SP.
+ // R30 is g.
+ // R31 is LR, which was saved by the prologue.
+
+ // This takes arguments R20 and R21.
+ CALL runtime·wbBufFlush(SB)
+
+ MOVW 4(R29), R20
+ MOVW 8(R29), R21
+ MOVW 12(R29), R3
+ MOVW 16(R29), R4
+ MOVW 20(R29), R5
+ MOVW 24(R29), R6
+ MOVW 28(R29), R7
+ MOVW 32(R29), R8
+ MOVW 36(R29), R9
+ MOVW 40(R29), R10
+ MOVW 44(R29), R11
+ MOVW 48(R29), R12
+ MOVW 52(R29), R13
+ MOVW 56(R29), R14
+ MOVW 60(R29), R15
+ MOVW 64(R29), R16
+ MOVW 68(R29), R17
+ MOVW 72(R29), R18
+ MOVW 76(R29), R19
+ MOVW 80(R29), R20
+ MOVW 84(R29), R22
+ MOVW 88(R29), R24
+ MOVW 92(R29), R25
+ MOVW 96(R29), R28
+ JMP ret
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+TEXT runtime·panicIndex(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen(SB),NOSPLIT,$0-8
+ MOVW R3, x+0(FP)
+ MOVW R4, y+4(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU(SB),NOSPLIT,$0-8
+ MOVW R3, x+0(FP)
+ MOVW R4, y+4(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap(SB),NOSPLIT,$0-8
+ MOVW R3, x+0(FP)
+ MOVW R4, y+4(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU(SB),NOSPLIT,$0-8
+ MOVW R3, x+0(FP)
+ MOVW R4, y+4(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSlice3CU(SB)
+
+// Extended versions for 64-bit indexes.
+TEXT runtime·panicExtendIndex(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendIndex(SB)
+TEXT runtime·panicExtendIndexU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendIndexU(SB)
+TEXT runtime·panicExtendSliceAlen(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSliceAlen(SB)
+TEXT runtime·panicExtendSliceAlenU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSliceAlenU(SB)
+TEXT runtime·panicExtendSliceAcap(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSliceAcap(SB)
+TEXT runtime·panicExtendSliceAcapU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSliceAcapU(SB)
+TEXT runtime·panicExtendSliceB(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSliceB(SB)
+TEXT runtime·panicExtendSliceBU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSliceBU(SB)
+TEXT runtime·panicExtendSlice3Alen(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R3, lo+4(FP)
+ MOVW R4, y+8(FP)
+ JMP runtime·goPanicExtendSlice3Alen(SB)
+TEXT runtime·panicExtendSlice3AlenU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R3, lo+4(FP)
+ MOVW R4, y+8(FP)
+ JMP runtime·goPanicExtendSlice3AlenU(SB)
+TEXT runtime·panicExtendSlice3Acap(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R3, lo+4(FP)
+ MOVW R4, y+8(FP)
+ JMP runtime·goPanicExtendSlice3Acap(SB)
+TEXT runtime·panicExtendSlice3AcapU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R3, lo+4(FP)
+ MOVW R4, y+8(FP)
+ JMP runtime·goPanicExtendSlice3AcapU(SB)
+TEXT runtime·panicExtendSlice3B(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSlice3B(SB)
+TEXT runtime·panicExtendSlice3BU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSlice3BU(SB)
+TEXT runtime·panicExtendSlice3C(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSlice3C(SB)
+TEXT runtime·panicExtendSlice3CU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSlice3CU(SB)
diff --git a/src/runtime/asm_ppc64x.h b/src/runtime/asm_ppc64x.h
new file mode 100644
index 0000000..5e55055
--- /dev/null
+++ b/src/runtime/asm_ppc64x.h
@@ -0,0 +1,25 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// FIXED_FRAME defines the size of the fixed part of a stack frame. A stack
+// frame looks like this:
+//
+// +---------------------+
+// | local variable area |
+// +---------------------+
+// | argument area |
+// +---------------------+ <- R1+FIXED_FRAME
+// | fixed area |
+// +---------------------+ <- R1
+//
+// So a function that sets up a stack frame at all uses at least FIXED_FRAME
+// bytes of stack. This mostly affects assembly that calls other functions
+// with arguments (the arguments should be stored at FIXED_FRAME+0(R1),
+// FIXED_FRAME+8(R1) etc) and some other low-level places.
+//
+// The reason for using a constant is to make supporting PIC easier (although
+// we only support PIC on ppc64le, which has a minimum stack frame of 32 bytes,
+// and we currently always use that much; PIC on ppc64 would need to use 48).
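+//
+// For example, an assembly function calling a Go function with two 8-byte
+// arguments stores them like this before the call (illustrative only; the
+// callee name is made up):
+//
+//	MOVD	R3, FIXED_FRAME+0(R1)
+//	MOVD	R4, FIXED_FRAME+8(R1)
+//	BL	runtime·somefunc(SB)
+//
+// rt0_go in asm_ppc64x.s stores argc and argv this way.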
+
+#define FIXED_FRAME 32
diff --git a/src/runtime/asm_ppc64x.s b/src/runtime/asm_ppc64x.s
new file mode 100644
index 0000000..dc34c0e
--- /dev/null
+++ b/src/runtime/asm_ppc64x.s
@@ -0,0 +1,1038 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64 ppc64le
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+#include "asm_ppc64x.h"
+
+#ifdef GOOS_aix
+#define cgoCalleeStackSize 48
+#else
+#define cgoCalleeStackSize 32
+#endif
+
+TEXT runtime·rt0_go(SB),NOSPLIT,$0
+ // R1 = stack; R3 = argc; R4 = argv; R13 = C TLS base pointer
+
+ // initialize essential registers
+ BL runtime·reginit(SB)
+
+ SUB $(FIXED_FRAME+16), R1
+ MOVD R2, 24(R1) // stash the TOC pointer away again now we've created a new frame
+ MOVW R3, FIXED_FRAME+0(R1) // argc
+ MOVD R4, FIXED_FRAME+8(R1) // argv
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOVD $runtime·g0(SB), g
+ BL runtime·save_g(SB)
+ MOVD $(-64*1024), R31
+ ADD R31, R1, R3
+ MOVD R3, g_stackguard0(g)
+ MOVD R3, g_stackguard1(g)
+ MOVD R3, (g_stack+stack_lo)(g)
+ MOVD R1, (g_stack+stack_hi)(g)
+
+ // if there is a _cgo_init, call it using the gcc ABI.
+ MOVD _cgo_init(SB), R12
+ CMP R0, R12
+ BEQ nocgo
+#ifdef GOARCH_ppc64
+ // ppc64 uses ELF ABI v1. We must get the real entry address from
+ // the first slot of the function descriptor before the call.
+ MOVD 8(R12), R2
+ MOVD (R12), R12
+#endif
+ MOVD R12, CTR // r12 = "global function entry point"
+ MOVD R13, R5 // arg 2: TLS base pointer
+ MOVD $setg_gcc<>(SB), R4 // arg 1: setg
+ MOVD g, R3 // arg 0: G
+ // C functions expect 32 (48 for AIX) bytes of space on caller
+ // stack frame and a 16-byte aligned R1
+ MOVD R1, R14 // save current stack
+ SUB $cgoCalleeStackSize, R1 // reserve the callee area
+ RLDCR $0, R1, $~15, R1 // 16-byte align
+ BL (CTR) // may clobber R0, R3-R12
+ MOVD R14, R1 // restore stack
+#ifndef GOOS_aix
+ MOVD 24(R1), R2
+#endif
+ XOR R0, R0 // fix R0
+
+nocgo:
+ // update stackguard after _cgo_init
+ MOVD (g_stack+stack_lo)(g), R3
+ ADD $const__StackGuard, R3
+ MOVD R3, g_stackguard0(g)
+ MOVD R3, g_stackguard1(g)
+
+ // set the per-goroutine and per-mach "registers"
+ MOVD $runtime·m0(SB), R3
+
+ // save m->g0 = g0
+ MOVD g, m_g0(R3)
+ // save m0 to g0->m
+ MOVD R3, g_m(g)
+
+ BL runtime·check(SB)
+
+ // args are already prepared
+ BL runtime·args(SB)
+ BL runtime·osinit(SB)
+ BL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOVD $runtime·mainPC(SB), R3 // entry
+ MOVDU R3, -8(R1)
+ MOVDU R0, -8(R1)
+ MOVDU R0, -8(R1)
+ MOVDU R0, -8(R1)
+ MOVDU R0, -8(R1)
+ MOVDU R0, -8(R1)
+ BL runtime·newproc(SB)
+ ADD $(16+FIXED_FRAME), R1
+
+ // start this M
+ BL runtime·mstart(SB)
+
+ MOVD R0, 0(R0)
+ RET
+
+DATA runtime·mainPC+0(SB)/8,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
+
+TEXT runtime·breakpoint(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD R0, 0(R0) // TODO: TD
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT|NOFRAME,$0-0
+ RET
+
+// Any changes must be reflected in runtime/cgo/gcc_aix_ppc64.S:.crosscall_ppc64
+TEXT _cgo_reginit(SB),NOSPLIT|NOFRAME,$0-0
+ // crosscall_ppc64 and crosscall2 need to reginit, but can't
+ // get at the 'runtime.reginit' symbol.
+ BR runtime·reginit(SB)
+
+TEXT runtime·reginit(SB),NOSPLIT|NOFRAME,$0-0
+ // set R0 to zero, it's expected by the toolchain
+ XOR R0, R0
+ RET
+
+/*
+ * go-routine
+ */
+
+// void gosave(Gobuf*)
+// save state in Gobuf; setjmp
+TEXT runtime·gosave(SB), NOSPLIT|NOFRAME, $0-8
+ MOVD buf+0(FP), R3
+ MOVD R1, gobuf_sp(R3)
+ MOVD LR, R31
+ MOVD R31, gobuf_pc(R3)
+ MOVD g, gobuf_g(R3)
+ MOVD R0, gobuf_lr(R3)
+ MOVD R0, gobuf_ret(R3)
+ // Assert ctxt is zero. See func save.
+ MOVD gobuf_ctxt(R3), R3
+ CMP R0, R3
+ BEQ 2(PC)
+ BL runtime·badctxt(SB)
+ RET
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB), NOSPLIT, $16-8
+ MOVD buf+0(FP), R5
+ MOVD gobuf_g(R5), g // make sure g is not nil
+ BL runtime·save_g(SB)
+
+ MOVD 0(g), R4
+ MOVD gobuf_sp(R5), R1
+ MOVD gobuf_lr(R5), R31
+#ifndef GOOS_aix
+ MOVD 24(R1), R2 // restore R2
+#endif
+ MOVD R31, LR
+ MOVD gobuf_ret(R5), R3
+ MOVD gobuf_ctxt(R5), R11
+ MOVD R0, gobuf_sp(R5)
+ MOVD R0, gobuf_ret(R5)
+ MOVD R0, gobuf_lr(R5)
+ MOVD R0, gobuf_ctxt(R5)
+ CMP R0, R0 // set condition codes for == test, needed by stack split
+ MOVD gobuf_pc(R5), R12
+ MOVD R12, CTR
+ BR (CTR)
+
+// void mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB), NOSPLIT|NOFRAME, $0-8
+ // Save caller state in g->sched
+ MOVD R1, (g_sched+gobuf_sp)(g)
+ MOVD LR, R31
+ MOVD R31, (g_sched+gobuf_pc)(g)
+ MOVD R0, (g_sched+gobuf_lr)(g)
+ MOVD g, (g_sched+gobuf_g)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOVD g, R3
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), g
+ BL runtime·save_g(SB)
+ CMP g, R3
+ BNE 2(PC)
+ BR runtime·badmcall(SB)
+ MOVD fn+0(FP), R11 // context
+ MOVD 0(R11), R12 // code pointer
+ MOVD R12, CTR
+ MOVD (g_sched+gobuf_sp)(g), R1 // sp = m->g0->sched.sp
+ MOVDU R3, -8(R1)
+ MOVDU R0, -8(R1)
+ MOVDU R0, -8(R1)
+ MOVDU R0, -8(R1)
+ MOVDU R0, -8(R1)
+ BL (CTR)
+ MOVD 24(R1), R2
+ BR runtime·badmcall2(SB)
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ // We have several undefs here so that 16 bytes past
+ // $runtime·systemstack_switch lies within them whether or not the
+ // instructions that derive r2 from r12 are there.
+ UNDEF
+ UNDEF
+ UNDEF
+ BL (LR) // make sure this function is not leaf
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ MOVD fn+0(FP), R3 // R3 = fn
+ MOVD R3, R11 // context
+ MOVD g_m(g), R4 // R4 = m
+
+ MOVD m_gsignal(R4), R5 // R5 = gsignal
+ CMP g, R5
+ BEQ noswitch
+
+ MOVD m_g0(R4), R5 // R5 = g0
+ CMP g, R5
+ BEQ noswitch
+
+ MOVD m_curg(R4), R6
+ CMP g, R6
+ BEQ switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVD $runtime·badsystemstack(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+ BL runtime·abort(SB)
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ MOVD $runtime·systemstack_switch(SB), R6
+ ADD $16, R6 // get past prologue (including r2-setting instructions when they're there)
+ MOVD R6, (g_sched+gobuf_pc)(g)
+ MOVD R1, (g_sched+gobuf_sp)(g)
+ MOVD R0, (g_sched+gobuf_lr)(g)
+ MOVD g, (g_sched+gobuf_g)(g)
+
+ // switch to g0
+ MOVD R5, g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R3
+ // make it look like mstart called systemstack on g0, to stop traceback
+ SUB $FIXED_FRAME, R3
+ MOVD $runtime·mstart(SB), R4
+ MOVD R4, 0(R3)
+ MOVD R3, R1
+
+ // call target function
+ MOVD 0(R11), R12 // code pointer
+ MOVD R12, CTR
+ BL (CTR)
+
+ // restore TOC pointer. It seems unlikely that we will use systemstack
+ // to call a function defined in another module, but the results of
+ // doing so would be so confusing that it's worth doing this.
+ MOVD g_m(g), R3
+ MOVD m_curg(R3), g
+ MOVD (g_sched+gobuf_sp)(g), R3
+#ifndef GOOS_aix
+ MOVD 24(R3), R2
+#endif
+ // switch back to g
+ MOVD g_m(g), R3
+ MOVD m_curg(R3), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R1
+ MOVD R0, (g_sched+gobuf_sp)(g)
+ RET
+
+noswitch:
+ // already on m stack, just call directly
+ // On other arches we do a tail call here, but it appears to be
+ // impossible to tail call a function pointer in shared mode on
+ // ppc64 because the caller is responsible for restoring the TOC.
+ MOVD 0(R11), R12 // code pointer
+ MOVD R12, CTR
+ BL (CTR)
+#ifndef GOOS_aix
+ MOVD 24(R1), R2
+#endif
+ RET
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// Caller has already loaded:
+// R3: framesize, R4: argsize, R5: LR
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOVD g_m(g), R7
+ MOVD m_g0(R7), R8
+ CMP g, R8
+ BNE 3(PC)
+ BL runtime·badmorestackg0(SB)
+ BL runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVD m_gsignal(R7), R8
+ CMP g, R8
+ BNE 3(PC)
+ BL runtime·badmorestackgsignal(SB)
+ BL runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f.
+ MOVD R1, (g_sched+gobuf_sp)(g)
+ MOVD LR, R8
+ MOVD R8, (g_sched+gobuf_pc)(g)
+ MOVD R5, (g_sched+gobuf_lr)(g)
+ MOVD R11, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ MOVD R5, (m_morebuf+gobuf_pc)(R7) // f's caller's PC
+ MOVD R1, (m_morebuf+gobuf_sp)(R7) // f's caller's SP
+ MOVD g, (m_morebuf+gobuf_g)(R7)
+
+ // Call newstack on m->g0's stack.
+ MOVD m_g0(R7), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R1
+ MOVDU R0, -(FIXED_FRAME+0)(R1) // create a call frame on g0
+ BL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ UNDEF
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD R0, R11
+ BR runtime·morestack(SB)
+
+// reflectcall: call a function with the given argument list
+// func call(argtype *_type, f *FuncVal, arg *byte, argsize, retoffset uint32).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
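+// DISPATCH(NAME, MAXSIZE) branches to NAME when the size in R3 is at most
+// MAXSIZE; otherwise the BGT skips the branch and control falls through to
+// the next DISPATCH entry, so reflectcall picks the smallest call* frame that
+// fits.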
+#define DISPATCH(NAME,MAXSIZE) \
+ MOVD $MAXSIZE, R31; \
+ CMP R3, R31; \
+ BGT 4(PC); \
+ MOVD $NAME(SB), R12; \
+ MOVD R12, CTR; \
+ BR (CTR)
+// Note: can't just "BR NAME(SB)" - bad inlining results.
+
+TEXT ·reflectcall(SB), NOSPLIT|NOFRAME, $0-32
+ MOVWZ argsize+24(FP), R3
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVD $runtime·badreflectcall(SB), R12
+ MOVD R12, CTR
+ BR (CTR)
+
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-24; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVD arg+16(FP), R3; \
+ MOVWZ argsize+24(FP), R4; \
+ MOVD R1, R5; \
+ CMP R4, $8; \
+ BLT tailsetup; \
+ /* copy 8 at a time if possible */ \
+ ADD $(FIXED_FRAME-8), R5; \
+ SUB $8, R3; \
+top: \
+ MOVDU 8(R3), R7; \
+ MOVDU R7, 8(R5); \
+ SUB $8, R4; \
+ CMP R4, $8; \
+ BGE top; \
+ /* handle remaining bytes */ \
+ CMP $0, R4; \
+ BEQ callfn; \
+ ADD $7, R3; \
+ ADD $7, R5; \
+ BR tail; \
+tailsetup: \
+ CMP $0, R4; \
+ BEQ callfn; \
+ ADD $(FIXED_FRAME-1), R5; \
+ SUB $1, R3; \
+tail: \
+ MOVBU 1(R3), R6; \
+ MOVBU R6, 1(R5); \
+ SUB $1, R4; \
+ CMP $0, R4; \
+ BGT tail; \
+callfn: \
+ /* call function */ \
+ MOVD f+8(FP), R11; \
+#ifdef GOOS_aix \
+	/* AIX does not raise SIGSEGV when R11 = nil, */ \
+	/* so trigger the fault manually. */ \
+ CMP R0, R11 \
+ BNE 2(PC) \
+ MOVD R0, 0(R0) \
+#endif \
+ MOVD (R11), R12; \
+ MOVD R12, CTR; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ BL (CTR); \
+#ifndef GOOS_aix \
+ MOVD 24(R1), R2; \
+#endif \
+ /* copy return values back */ \
+ MOVD argtype+0(FP), R7; \
+ MOVD arg+16(FP), R3; \
+ MOVWZ n+24(FP), R4; \
+ MOVWZ retoffset+28(FP), R6; \
+ ADD $FIXED_FRAME, R1, R5; \
+ ADD R6, R5; \
+ ADD R6, R3; \
+ SUB R6, R4; \
+ BL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $32-0
+ MOVD R7, FIXED_FRAME+0(R1)
+ MOVD R3, FIXED_FRAME+8(R1)
+ MOVD R5, FIXED_FRAME+16(R1)
+ MOVD R4, FIXED_FRAME+24(R1)
+ BL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+TEXT runtime·procyield(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW cycles+0(FP), R7
+ // POWER does not have a pause/yield instruction equivalent.
+ // Instead, we can lower the program priority by setting the
+ // Program Priority Register prior to the wait loop and set it
+ // back to default afterwards. On Linux, the default priority is
+ // medium-low. For details, see page 837 of the ISA 3.0.
+ OR R1, R1, R1 // Set PPR priority to low
+again:
+ SUB $1, R7
+ CMP $0, R7
+ BNE again
+ OR R6, R6, R6 // Set PPR priority back to medium-low
+ RET
+
+// void jmpdefer(fv, sp);
+// called from deferreturn.
+// 1. grab stored LR for caller
+// 2. sub 8 bytes to get back to either nop or toc reload before deferreturn
+// 3. BR to fn
+// When dynamically linking Go, it is not sufficient to rewind to the BL
+// deferreturn -- we might be jumping between modules and so we need to reset
+// the TOC pointer in r2. To do this, codegen inserts MOVD 24(R1), R2 *before*
+// the BL deferreturn and jmpdefer rewinds to that.
+TEXT runtime·jmpdefer(SB), NOSPLIT|NOFRAME, $0-16
+ MOVD 0(R1), R31
+ SUB $8, R31
+ MOVD R31, LR
+
+ MOVD fv+0(FP), R11
+ MOVD argp+8(FP), R1
+ SUB $FIXED_FRAME, R1
+#ifdef GOOS_aix
+	// AIX does not raise SIGSEGV when R11 = nil,
+	// so trigger the fault manually.
+ CMP R0, R11
+ BNE 2(PC)
+ MOVD R0, 0(R0)
+#endif
+ MOVD 0(R11), R12
+ MOVD R12, CTR
+ BR (CTR)
+
+// Save state of caller into g->sched. Smashes R31.
+TEXT gosave<>(SB),NOSPLIT|NOFRAME,$0
+ MOVD LR, R31
+ MOVD R31, (g_sched+gobuf_pc)(g)
+ MOVD R1, (g_sched+gobuf_sp)(g)
+ MOVD R0, (g_sched+gobuf_lr)(g)
+ MOVD R0, (g_sched+gobuf_ret)(g)
+ // Assert ctxt is zero. See func save.
+ MOVD (g_sched+gobuf_ctxt)(g), R31
+ CMP R0, R31
+ BEQ 2(PC)
+ BL runtime·badctxt(SB)
+ RET
+
+#ifdef GOOS_aix
+#define asmcgocallSaveOffset cgoCalleeStackSize + 8
+#else
+#define asmcgocallSaveOffset cgoCalleeStackSize
+#endif
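+// asmcgocall stores the old g and stack depth at this offset above SP,
+// leaving the bottom cgoCalleeStackSize bytes (plus one extra doubleword on
+// AIX) free for the C callee.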
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-20
+ MOVD fn+0(FP), R3
+ MOVD arg+8(FP), R4
+
+ MOVD R1, R7 // save original stack pointer
+ MOVD g, R5
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already.
+ // Moreover, if it's called inside the signal handler, it must not switch
+ // to g0 as it can be in use by another syscall.
+ MOVD g_m(g), R8
+ MOVD m_gsignal(R8), R6
+ CMP R6, g
+ BEQ g0
+ MOVD m_g0(R8), R6
+ CMP R6, g
+ BEQ g0
+ BL gosave<>(SB)
+ MOVD R6, g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R1
+
+ // Now on a scheduling stack (a pthread-created stack).
+g0:
+#ifdef GOOS_aix
+ // Create a fake LR to improve backtrace.
+ MOVD $runtime·asmcgocall(SB), R6
+ MOVD R6, 16(R1)
+	// AIX also saves one argument on the stack.
+ SUB $8, R1
+#endif
+ // Save room for two of our pointers, plus the callee
+ // save area that lives on the caller stack.
+ SUB $(asmcgocallSaveOffset+16), R1
+ RLDCR $0, R1, $~15, R1 // 16-byte alignment for gcc ABI
+ MOVD R5, (asmcgocallSaveOffset+8)(R1)// save old g on stack
+ MOVD (g_stack+stack_hi)(R5), R5
+ SUB R7, R5
+ MOVD R5, asmcgocallSaveOffset(R1) // save depth in old g stack (can't just save SP, as stack might be copied during a callback)
+#ifdef GOOS_aix
+ MOVD R7, 0(R1) // Save frame pointer to allow manual backtrace with gdb
+#else
+ MOVD R0, 0(R1) // clear back chain pointer (TODO can we give it real back trace information?)
+#endif
+ // This is a "global call", so put the global entry point in r12
+ MOVD R3, R12
+
+#ifdef GOARCH_ppc64
+	// ppc64 uses ELF ABI v1. We must get the real entry address from the
+	// first slot of the function descriptor before the call.
+ // Same for AIX.
+ MOVD 8(R12), R2
+ MOVD (R12), R12
+#endif
+ MOVD R12, CTR
+ MOVD R4, R3 // arg in r3
+ BL (CTR)
+ // C code can clobber R0, so set it back to 0. F27-F31 are
+ // callee save, so we don't need to recover those.
+ XOR R0, R0
+ // Restore g, stack pointer, toc pointer.
+ // R3 is errno, so don't touch it
+ MOVD (asmcgocallSaveOffset+8)(R1), g
+ MOVD (g_stack+stack_hi)(g), R5
+ MOVD asmcgocallSaveOffset(R1), R6
+ SUB R6, R5
+#ifndef GOOS_aix
+ MOVD 24(R5), R2
+#endif
+ MOVD R5, R1
+ BL runtime·save_g(SB)
+
+ MOVW R3, ret+16(FP)
+ RET
+
+// func cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$24-24
+ NO_LOCAL_POINTERS
+
+ // Load m and g from thread-local storage.
+ MOVBZ runtime·iscgo(SB), R3
+ CMP R3, $0
+ BEQ nocgo
+ BL runtime·load_g(SB)
+nocgo:
+
+ // If g is nil, Go did not create the current thread.
+ // Call needm to obtain one for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ CMP g, $0
+ BEQ needm
+
+ MOVD g_m(g), R8
+ MOVD R8, savedm-8(SP)
+ BR havem
+
+needm:
+ MOVD g, savedm-8(SP) // g is zero, so is m.
+ MOVD $runtime·needm(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), R3
+ MOVD R1, (g_sched+gobuf_sp)(R3)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 8(R1) aka savedsp-16(SP).
+ MOVD m_g0(R8), R3
+ MOVD (g_sched+gobuf_sp)(R3), R4
+ MOVD R4, savedsp-24(SP) // must match frame size
+ MOVD R1, (g_sched+gobuf_sp)(R3)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVD m_curg(R8), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R4 // prepare stack as R4
+ MOVD (g_sched+gobuf_pc)(g), R5
+ MOVD R5, -(24+FIXED_FRAME)(R4) // "saved LR"; must match frame size
+ // Gather our arguments into registers.
+ MOVD fn+0(FP), R5
+ MOVD frame+8(FP), R6
+ MOVD ctxt+16(FP), R7
+ MOVD $-(24+FIXED_FRAME)(R4), R1 // switch stack; must match frame size
+ MOVD R5, FIXED_FRAME+0(R1)
+ MOVD R6, FIXED_FRAME+8(R1)
+ MOVD R7, FIXED_FRAME+16(R1)
+ BL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOVD 0(R1), R5
+ MOVD R5, (g_sched+gobuf_pc)(g)
+ MOVD $(24+FIXED_FRAME)(R1), R4 // must match frame size
+ MOVD R4, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R1
+ MOVD savedsp-24(SP), R4 // must match frame size
+ MOVD R4, (g_sched+gobuf_sp)(g)
+
+ // If the m on entry was nil, we called needm above to borrow an m
+ // for the duration of the call. Since the call is over, return it with dropm.
+ MOVD savedm-8(SP), R6
+ CMP R6, $0
+ BNE droppedm
+ MOVD $runtime·dropm(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+droppedm:
+
+ // Done!
+ RET
+
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-8
+ MOVD gg+0(FP), g
+ // This only happens if iscgo, so jump straight to save_g
+ BL runtime·save_g(SB)
+ RET
+
+#ifdef GOARCH_ppc64
+#ifdef GOOS_aix
+DATA setg_gcc<>+0(SB)/8, $_setg_gcc<>(SB)
+DATA setg_gcc<>+8(SB)/8, $TOC(SB)
+DATA setg_gcc<>+16(SB)/8, $0
+GLOBL setg_gcc<>(SB), NOPTR, $24
+#else
+TEXT setg_gcc<>(SB),NOSPLIT|NOFRAME,$0-0
+ DWORD $_setg_gcc<>(SB)
+ DWORD $0
+ DWORD $0
+#endif
+#endif
+
+// void setg_gcc(G*); set g in C TLS.
+// Must obey the gcc calling convention.
+#ifdef GOARCH_ppc64le
+TEXT setg_gcc<>(SB),NOSPLIT|NOFRAME,$0-0
+#else
+TEXT _setg_gcc<>(SB),NOSPLIT|NOFRAME,$0-0
+#endif
+	// The standard prologue clobbers R31, which is callee-save in
+	// the C ABI, so declare NOFRAME and save LR ourselves.
+ MOVD LR, R4
+ // Also save g and R31, since they're callee-save in C ABI
+ MOVD R31, R5
+ MOVD g, R6
+
+ MOVD R3, g
+ BL runtime·save_g(SB)
+
+ MOVD R6, g
+ MOVD R5, R31
+ MOVD R4, LR
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW (R0), R0
+ UNDEF
+
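+// 268 is the SPR number of the time base register (TB), read by cputicks
+// below.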
+#define TBR 268
+
+// int64 runtime·cputicks(void)
+TEXT runtime·cputicks(SB),NOSPLIT,$0-8
+ MOVD SPR(TBR), R3
+ MOVD R3, ret+0(FP)
+ RET
+
+// AES hashing not implemented for ppc64
+TEXT runtime·memhash(SB),NOSPLIT|NOFRAME,$0-32
+ JMP runtime·memhashFallback(SB)
+TEXT runtime·strhash(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·strhashFallback(SB)
+TEXT runtime·memhash32(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash32Fallback(SB)
+TEXT runtime·memhash64(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash64Fallback(SB)
+
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOVW $0, R3
+ RET
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+#ifdef GOOS_aix
+// On AIX, _cgo_topofstack is defined in runtime/cgo, because it must
+// be a longcall in order to prevent ld from inserting trampolines.
+TEXT __cgo_topofstack(SB),NOSPLIT|NOFRAME,$0
+#else
+TEXT _cgo_topofstack(SB),NOSPLIT|NOFRAME,$0
+#endif
+ // g (R30) and R31 are callee-save in the C ABI, so save them
+ MOVD g, R4
+ MOVD R31, R5
+ MOVD LR, R6
+
+ BL runtime·load_g(SB) // clobbers g (R30), R31
+ MOVD g_m(g), R3
+ MOVD m_curg(R3), R3
+ MOVD (g_stack+stack_hi)(R3), R3
+
+ MOVD R4, g
+ MOVD R5, R31
+ MOVD R6, LR
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+//
+// When dynamically linking Go, it can be returned to from a function
+// implemented in a different module and so needs to reload the TOC pointer
+// from the stack (although this function declares that it does not set up a
+// frame, newproc1 does in fact allocate one for goexit and saves the TOC
+// pointer in the correct place).
+// goexit+_PCQuantum is halfway through the usual global entry point prologue
+// that derives r2 from r12 which is a bit silly, but not harmful.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ MOVD 24(R1), R2
+ BL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ MOVD R0, R0 // NOP
+
+// prepGoExitFrame saves the current TOC pointer (i.e. the TOC pointer for the
+// module containing runtime) to the frame that goexit will execute in when
+// the goroutine exits. It's implemented in assembly mainly because that's the
+// easiest way to get access to R2.
+TEXT runtime·prepGoExitFrame(SB),NOSPLIT,$0-8
+ MOVD sp+0(FP), R3
+ MOVD R2, 24(R3)
+ RET
+
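+// addmoduledata appends the moduledata passed in R3 to the runtime's module
+// list (runtime.lastmoduledatap). R31 is preserved for the caller.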
+TEXT runtime·addmoduledata(SB),NOSPLIT|NOFRAME,$0-0
+ ADD $-8, R1
+ MOVD R31, 0(R1)
+ MOVD runtime·lastmoduledatap(SB), R4
+ MOVD R3, moduledata_next(R4)
+ MOVD R3, runtime·lastmoduledatap(SB)
+ MOVD 0(R1), R31
+ ADD $8, R1
+ RET
+
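+// func checkASM() bool
+// Nothing to verify at the assembly level on ppc64, so always report success.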
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOVW $1, R3
+ MOVB R3, ret+0(FP)
+ RET
+
+// gcWriteBarrier performs a heap pointer write and informs the GC.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It takes two arguments:
+// - R20 is the destination of the write
+// - R21 is the value being written at R20.
+// It clobbers condition codes.
+// It does not clobber R0 through R17 (except special registers),
+// but may clobber any other register, *including* R31.
+TEXT runtime·gcWriteBarrier(SB),NOSPLIT,$112
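+	// The 112-byte frame holds the 14 registers spilled on the flush path
+	// (R20, R21, R3-R10 and R14-R17).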
+ // The standard prologue clobbers R31.
+ // We use R18 and R19 as scratch registers.
+ MOVD g_m(g), R18
+ MOVD m_p(R18), R18
+ MOVD (p_wbBuf+wbBuf_next)(R18), R19
+ // Increment wbBuf.next position.
+ ADD $16, R19
+ MOVD R19, (p_wbBuf+wbBuf_next)(R18)
+ MOVD (p_wbBuf+wbBuf_end)(R18), R18
+ CMP R18, R19
+ // Record the write.
+ MOVD R21, -16(R19) // Record value
+ MOVD (R20), R18 // TODO: This turns bad writes into bad reads.
+ MOVD R18, -8(R19) // Record *slot
+ // Is the buffer full? (flags set in CMP above)
+ BEQ flush
+ret:
+ // Do the write.
+ MOVD R21, (R20)
+ RET
+
+flush:
+	// Save R3-R10 and R14-R17 since these were not saved by the caller.
+ // We don't save all registers on ppc64 because it takes too much space.
+ MOVD R20, (FIXED_FRAME+0)(R1) // Also first argument to wbBufFlush
+ MOVD R21, (FIXED_FRAME+8)(R1) // Also second argument to wbBufFlush
+ // R0 is always 0, so no need to spill.
+ // R1 is SP.
+ // R2 is SB.
+ MOVD R3, (FIXED_FRAME+16)(R1)
+ MOVD R4, (FIXED_FRAME+24)(R1)
+ MOVD R5, (FIXED_FRAME+32)(R1)
+ MOVD R6, (FIXED_FRAME+40)(R1)
+ MOVD R7, (FIXED_FRAME+48)(R1)
+ MOVD R8, (FIXED_FRAME+56)(R1)
+ MOVD R9, (FIXED_FRAME+64)(R1)
+ MOVD R10, (FIXED_FRAME+72)(R1)
+ // R11, R12 may be clobbered by external-linker-inserted trampoline
+ // R13 is REGTLS
+ MOVD R14, (FIXED_FRAME+80)(R1)
+ MOVD R15, (FIXED_FRAME+88)(R1)
+ MOVD R16, (FIXED_FRAME+96)(R1)
+ MOVD R17, (FIXED_FRAME+104)(R1)
+
+ // This takes arguments R20 and R21.
+ CALL runtime·wbBufFlush(SB)
+
+ MOVD (FIXED_FRAME+0)(R1), R20
+ MOVD (FIXED_FRAME+8)(R1), R21
+ MOVD (FIXED_FRAME+16)(R1), R3
+ MOVD (FIXED_FRAME+24)(R1), R4
+ MOVD (FIXED_FRAME+32)(R1), R5
+ MOVD (FIXED_FRAME+40)(R1), R6
+ MOVD (FIXED_FRAME+48)(R1), R7
+ MOVD (FIXED_FRAME+56)(R1), R8
+ MOVD (FIXED_FRAME+64)(R1), R9
+ MOVD (FIXED_FRAME+72)(R1), R10
+ MOVD (FIXED_FRAME+80)(R1), R14
+ MOVD (FIXED_FRAME+88)(R1), R15
+ MOVD (FIXED_FRAME+96)(R1), R16
+ MOVD (FIXED_FRAME+104)(R1), R17
+ JMP ret
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments are allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+TEXT runtime·panicIndex(SB),NOSPLIT,$0-16
+ MOVD R3, x+0(FP)
+ MOVD R4, y+8(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU(SB),NOSPLIT,$0-16
+ MOVD R3, x+0(FP)
+ MOVD R4, y+8(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen(SB),NOSPLIT,$0-16
+ MOVD R4, x+0(FP)
+ MOVD R5, y+8(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU(SB),NOSPLIT,$0-16
+ MOVD R4, x+0(FP)
+ MOVD R5, y+8(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap(SB),NOSPLIT,$0-16
+ MOVD R4, x+0(FP)
+ MOVD R5, y+8(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU(SB),NOSPLIT,$0-16
+ MOVD R4, x+0(FP)
+ MOVD R5, y+8(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB(SB),NOSPLIT,$0-16
+ MOVD R3, x+0(FP)
+ MOVD R4, y+8(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU(SB),NOSPLIT,$0-16
+ MOVD R3, x+0(FP)
+ MOVD R4, y+8(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen(SB),NOSPLIT,$0-16
+ MOVD R5, x+0(FP)
+ MOVD R6, y+8(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU(SB),NOSPLIT,$0-16
+ MOVD R5, x+0(FP)
+ MOVD R6, y+8(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap(SB),NOSPLIT,$0-16
+ MOVD R5, x+0(FP)
+ MOVD R6, y+8(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU(SB),NOSPLIT,$0-16
+ MOVD R5, x+0(FP)
+ MOVD R6, y+8(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B(SB),NOSPLIT,$0-16
+ MOVD R4, x+0(FP)
+ MOVD R5, y+8(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU(SB),NOSPLIT,$0-16
+ MOVD R4, x+0(FP)
+ MOVD R5, y+8(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C(SB),NOSPLIT,$0-16
+ MOVD R3, x+0(FP)
+ MOVD R4, y+8(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU(SB),NOSPLIT,$0-16
+ MOVD R3, x+0(FP)
+ MOVD R4, y+8(FP)
+ JMP runtime·goPanicSlice3CU(SB)
diff --git a/src/runtime/asm_riscv64.s b/src/runtime/asm_riscv64.s
new file mode 100644
index 0000000..01b42dc
--- /dev/null
+++ b/src/runtime/asm_riscv64.s
@@ -0,0 +1,821 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// func rt0_go()
+TEXT runtime·rt0_go(SB),NOSPLIT,$0
+ // X2 = stack; A0 = argc; A1 = argv
+ ADD $-24, X2
+ MOV A0, 8(X2) // argc
+ MOV A1, 16(X2) // argv
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOV $runtime·g0(SB), g
+ MOV $(-64*1024), T0
+ ADD T0, X2, T1
+ MOV T1, g_stackguard0(g)
+ MOV T1, g_stackguard1(g)
+ MOV T1, (g_stack+stack_lo)(g)
+ MOV X2, (g_stack+stack_hi)(g)
+
+ // if there is a _cgo_init, call it using the gcc ABI.
+ MOV _cgo_init(SB), T0
+ BEQ T0, ZERO, nocgo
+
+ MOV ZERO, A3 // arg 3: not used
+ MOV ZERO, A2 // arg 2: not used
+ MOV $setg_gcc<>(SB), A1 // arg 1: setg
+ MOV g, A0 // arg 0: G
+ JALR RA, T0
+
+nocgo:
+ // update stackguard after _cgo_init
+ MOV (g_stack+stack_lo)(g), T0
+ ADD $const__StackGuard, T0
+ MOV T0, g_stackguard0(g)
+ MOV T0, g_stackguard1(g)
+
+ // set the per-goroutine and per-mach "registers"
+ MOV $runtime·m0(SB), T0
+
+ // save m->g0 = g0
+ MOV g, m_g0(T0)
+ // save m0 to g0->m
+ MOV T0, g_m(g)
+
+ CALL runtime·check(SB)
+
+ // args are already prepared
+ CALL runtime·args(SB)
+ CALL runtime·osinit(SB)
+ CALL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOV $runtime·mainPC(SB), T0 // entry
+ ADD $-24, X2
+ MOV T0, 16(X2)
+ MOV ZERO, 8(X2)
+ MOV ZERO, 0(X2)
+ CALL runtime·newproc(SB)
+ ADD $24, X2
+
+ // start this M
+ CALL runtime·mstart(SB)
+
+ WORD $0 // crash if reached
+ RET
+
+// void setg_gcc(G*); set g called from gcc with g in A0
+TEXT setg_gcc<>(SB),NOSPLIT,$0-0
+ MOV A0, g
+ CALL runtime·save_g(SB)
+ RET
+
+// func cputicks() int64
+TEXT runtime·cputicks(SB),NOSPLIT,$0-8
+ RDTIME A0
+ MOV A0, ret+0(FP)
+ RET
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ UNDEF
+ JALR RA, ZERO // make sure this function is not leaf
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ MOV fn+0(FP), CTXT // CTXT = fn
+ MOV g_m(g), T0 // T0 = m
+
+ MOV m_gsignal(T0), T1 // T1 = gsignal
+ BEQ g, T1, noswitch
+
+ MOV m_g0(T0), T1 // T1 = g0
+ BEQ g, T1, noswitch
+
+ MOV m_curg(T0), T2
+ BEQ g, T2, switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOV $runtime·badsystemstack(SB), T1
+ JALR RA, T1
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ MOV $runtime·systemstack_switch(SB), T2
+ ADD $8, T2 // get past prologue
+ MOV T2, (g_sched+gobuf_pc)(g)
+ MOV X2, (g_sched+gobuf_sp)(g)
+ MOV ZERO, (g_sched+gobuf_lr)(g)
+ MOV g, (g_sched+gobuf_g)(g)
+
+ // switch to g0
+ MOV T1, g
+ CALL runtime·save_g(SB)
+ MOV (g_sched+gobuf_sp)(g), T0
+ // make it look like mstart called systemstack on g0, to stop traceback
+ ADD $-8, T0
+ MOV $runtime·mstart(SB), T1
+ MOV T1, 0(T0)
+ MOV T0, X2
+
+ // call target function
+ MOV 0(CTXT), T1 // code pointer
+ JALR RA, T1
+
+ // switch back to g
+ MOV g_m(g), T0
+ MOV m_curg(T0), g
+ CALL runtime·save_g(SB)
+ MOV (g_sched+gobuf_sp)(g), X2
+ MOV ZERO, (g_sched+gobuf_sp)(g)
+ RET
+
+noswitch:
+ // already on m stack, just call directly
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOV 0(CTXT), T1 // code pointer
+ ADD $8, X2
+ JMP (T1)
+
+TEXT runtime·getcallerpc(SB),NOSPLIT|NOFRAME,$0-8
+ MOV 0(X2), T0 // LR saved by caller
+ MOV T0, ret+0(FP)
+ RET
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// Caller has already loaded:
+// R1: framesize, R2: argsize, R3: LR
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+
+// func morestack()
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOV g_m(g), A0
+ MOV m_g0(A0), A1
+ BNE g, A1, 3(PC)
+ CALL runtime·badmorestackg0(SB)
+ CALL runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOV m_gsignal(A0), A1
+ BNE g, A1, 3(PC)
+ CALL runtime·badmorestackgsignal(SB)
+ CALL runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f.
+ MOV X2, (g_sched+gobuf_sp)(g)
+ MOV T0, (g_sched+gobuf_pc)(g)
+ MOV RA, (g_sched+gobuf_lr)(g)
+ MOV CTXT, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ MOV RA, (m_morebuf+gobuf_pc)(A0) // f's caller's PC
+ MOV X2, (m_morebuf+gobuf_sp)(A0) // f's caller's SP
+ MOV g, (m_morebuf+gobuf_g)(A0)
+
+ // Call newstack on m->g0's stack.
+ MOV m_g0(A0), g
+ CALL runtime·save_g(SB)
+ MOV (g_sched+gobuf_sp)(g), X2
+ // Create a stack frame on g0 to call newstack.
+ MOV ZERO, -8(X2) // Zero saved LR in frame
+ ADD $-8, X2
+ CALL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ UNDEF
+
+// func morestack_noctxt()
+TEXT runtime·morestack_noctxt(SB),NOSPLIT|NOFRAME,$0-0
+ MOV ZERO, CTXT
+ JMP runtime·morestack(SB)
+
+// AES hashing not implemented for riscv64
+TEXT runtime·memhash(SB),NOSPLIT|NOFRAME,$0-32
+ JMP runtime·memhashFallback(SB)
+TEXT runtime·strhash(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·strhashFallback(SB)
+TEXT runtime·memhash32(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash32Fallback(SB)
+TEXT runtime·memhash64(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash64Fallback(SB)
+
+// func return0()
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOV $0, A0
+ RET
+
+// restore state from Gobuf; longjmp
+
+// func gogo(buf *gobuf)
+TEXT runtime·gogo(SB), NOSPLIT, $16-8
+ MOV buf+0(FP), T0
+ MOV gobuf_g(T0), g // make sure g is not nil
+ CALL runtime·save_g(SB)
+
+ MOV (g), ZERO // make sure g is not nil
+ MOV gobuf_sp(T0), X2
+ MOV gobuf_lr(T0), RA
+ MOV gobuf_ret(T0), A0
+ MOV gobuf_ctxt(T0), CTXT
+ MOV ZERO, gobuf_sp(T0)
+ MOV ZERO, gobuf_ret(T0)
+ MOV ZERO, gobuf_lr(T0)
+ MOV ZERO, gobuf_ctxt(T0)
+ MOV gobuf_pc(T0), T0
+ JALR ZERO, T0
+
+// func jmpdefer(fv *funcval, argp uintptr)
+// called from deferreturn
+// 1. grab stored return address from the caller's frame
+// 2. sub 8 bytes to get back to JAL deferreturn
+// 3. JMP to fn
+TEXT runtime·jmpdefer(SB), NOSPLIT|NOFRAME, $0-16
+ MOV 0(X2), RA
+ ADD $-8, RA
+
+ MOV fv+0(FP), CTXT
+ MOV argp+8(FP), X2
+ ADD $-8, X2
+ MOV 0(CTXT), T0
+ JALR ZERO, T0
+
+// func procyield(cycles uint32)
+TEXT runtime·procyield(SB),NOSPLIT,$0-0
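+	// The cycle count is ignored; no pause hint is issued here, so simply return.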
+ RET
+
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+
+// func mcall(fn func(*g))
+TEXT runtime·mcall(SB), NOSPLIT|NOFRAME, $0-8
+ // Save caller state in g->sched
+ MOV X2, (g_sched+gobuf_sp)(g)
+ MOV RA, (g_sched+gobuf_pc)(g)
+ MOV ZERO, (g_sched+gobuf_lr)(g)
+ MOV g, (g_sched+gobuf_g)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOV g, T0
+ MOV g_m(g), T1
+ MOV m_g0(T1), g
+ CALL runtime·save_g(SB)
+ BNE g, T0, 2(PC)
+ JMP runtime·badmcall(SB)
+ MOV fn+0(FP), CTXT // context
+ MOV 0(CTXT), T1 // code pointer
+ MOV (g_sched+gobuf_sp)(g), X2 // sp = m->g0->sched.sp
+ ADD $-16, X2
+ MOV T0, 8(X2)
+ MOV ZERO, 0(X2)
+ JALR RA, T1
+ JMP runtime·badmcall2(SB)
+
+// func gosave(buf *gobuf)
+// save state in Gobuf; setjmp
+TEXT runtime·gosave(SB), NOSPLIT|NOFRAME, $0-8
+ MOV buf+0(FP), T1
+ MOV X2, gobuf_sp(T1)
+ MOV RA, gobuf_pc(T1)
+ MOV g, gobuf_g(T1)
+ MOV ZERO, gobuf_lr(T1)
+ MOV ZERO, gobuf_ret(T1)
+ // Assert ctxt is zero. See func save.
+ MOV gobuf_ctxt(T1), T1
+ BEQ T1, ZERO, 2(PC)
+ CALL runtime·badctxt(SB)
+ RET
+
+// Save state of caller into g->sched. Smashes X31.
+TEXT gosave<>(SB),NOSPLIT|NOFRAME,$0
+ MOV X1, (g_sched+gobuf_pc)(g)
+ MOV X2, (g_sched+gobuf_sp)(g)
+ MOV ZERO, (g_sched+gobuf_lr)(g)
+ MOV ZERO, (g_sched+gobuf_ret)(g)
+ // Assert ctxt is zero. See func save.
+ MOV (g_sched+gobuf_ctxt)(g), X31
+ BEQ ZERO, X31, 2(PC)
+ CALL runtime·badctxt(SB)
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-20
+ MOV fn+0(FP), X5
+ MOV arg+8(FP), X10
+
+ MOV X2, X8 // save original stack pointer
+ MOV g, X9
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already.
+ MOV g_m(g), X6
+ MOV m_g0(X6), X7
+ BEQ X7, g, g0
+
+ CALL gosave<>(SB)
+ MOV X7, g
+ CALL runtime·save_g(SB)
+ MOV (g_sched+gobuf_sp)(g), X2
+
+ // Now on a scheduling stack (a pthread-created stack).
+g0:
+ // Save room for two of our pointers.
+ ADD $-16, X2
+ MOV X9, 0(X2) // save old g on stack
+ MOV (g_stack+stack_hi)(X9), X9
+ SUB X8, X9, X8
+ MOV X8, 8(X2) // save depth in old g stack (can't just save SP, as stack might be copied during a callback)
+
+ JALR RA, (X5)
+
+ // Restore g, stack pointer. X10 is return value.
+ MOV 0(X2), g
+ CALL runtime·save_g(SB)
+ MOV (g_stack+stack_hi)(g), X5
+ MOV 8(X2), X6
+ SUB X6, X5, X6
+ MOV X6, X2
+
+ MOVW X10, ret+16(FP)
+ RET
+
+// func asminit()
+TEXT runtime·asminit(SB),NOSPLIT|NOFRAME,$0-0
+ RET
+
+// reflectcall: call a function with the given argument list
+// func call(argtype *_type, f *FuncVal, arg *byte, argsize, retoffset uint32).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ MOV $MAXSIZE, T1 \
+ BLTU T1, T0, 3(PC) \
+ MOV $NAME(SB), T2; \
+ JALR ZERO, T2
+// Note: can't just "BR NAME(SB)" - bad inlining results.
+
+// func call(argtype *rtype, fn, arg unsafe.Pointer, n uint32, retoffset uint32)
+TEXT reflect·call(SB), NOSPLIT, $0-0
+ JMP ·reflectcall(SB)
+
+// func reflectcall(argtype *_type, fn, arg unsafe.Pointer, argsize uint32, retoffset uint32)
+TEXT ·reflectcall(SB), NOSPLIT|NOFRAME, $0-32
+ MOVWU argsize+24(FP), T0
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOV $runtime·badreflectcall(SB), T2
+ JALR ZERO, T2
+
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-24; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOV arg+16(FP), A1; \
+ MOVWU argsize+24(FP), A2; \
+ MOV X2, A3; \
+ ADD $8, A3; \
+ ADD A3, A2; \
+ BEQ A3, A2, 6(PC); \
+ MOVBU (A1), A4; \
+ ADD $1, A1; \
+ MOVB A4, (A3); \
+ ADD $1, A3; \
+ JMP -5(PC); \
+ /* call function */ \
+ MOV f+8(FP), CTXT; \
+ MOV (CTXT), A4; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ JALR RA, A4; \
+ /* copy return values back */ \
+ MOV argtype+0(FP), A5; \
+ MOV arg+16(FP), A1; \
+ MOVWU n+24(FP), A2; \
+ MOVWU retoffset+28(FP), A4; \
+ ADD $8, X2, A3; \
+ ADD A4, A3; \
+ ADD A4, A1; \
+ SUB A4, A2; \
+ CALL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $32-0
+ MOV A5, 8(X2)
+ MOV A1, 16(X2)
+ MOV A3, 24(X2)
+ MOV A2, 32(X2)
+ CALL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT,$8
+ // g (X27) and REG_TMP (X31) might be clobbered by load_g.
+ // X27 is callee-save in the gcc calling convention, so save it.
+ MOV g, savedX27-8(SP)
+
+ CALL runtime·load_g(SB)
+ MOV g_m(g), X5
+ MOV m_curg(X5), X5
+ MOV (g_stack+stack_hi)(X5), X10 // return value in X10
+
+ MOV savedX27-8(SP), g
+ RET
+
+// func goexit(neverCallThisFunction)
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ MOV ZERO, ZERO // NOP
+ JMP runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ MOV ZERO, ZERO // NOP
+
+// func cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$24-24
+ NO_LOCAL_POINTERS
+
+ // Load m and g from thread-local storage.
+ MOVBU runtime·iscgo(SB), X5
+ BEQ ZERO, X5, nocgo
+ CALL runtime·load_g(SB)
+nocgo:
+
+ // If g is nil, Go did not create the current thread.
+ // Call needm to obtain one for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ BEQ ZERO, g, needm
+
+ MOV g_m(g), X5
+ MOV X5, savedm-8(SP)
+ JMP havem
+
+needm:
+ MOV g, savedm-8(SP) // g is zero, so is m.
+ MOV $runtime·needm(SB), X6
+ JALR RA, X6
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOV g_m(g), X5
+ MOV m_g0(X5), X6
+ MOV X2, (g_sched+gobuf_sp)(X6)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 8(X2) aka savedsp-24(SP).
+ MOV m_g0(X5), X6
+ MOV (g_sched+gobuf_sp)(X6), X7
+ MOV X7, savedsp-24(SP) // must match frame size
+ MOV X2, (g_sched+gobuf_sp)(X6)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOV m_curg(X5), g
+ CALL runtime·save_g(SB)
+ MOV (g_sched+gobuf_sp)(g), X6 // prepare stack as X6
+ MOV (g_sched+gobuf_pc)(g), X7
+ MOV X7, -(24+8)(X6) // "saved LR"; must match frame size
+ // Gather our arguments into registers.
+ MOV fn+0(FP), X7
+ MOV frame+8(FP), X8
+ MOV ctxt+16(FP), X9
+ MOV $-(24+8)(X6), X2 // switch stack; must match frame size
+ MOV X7, 8(X2)
+ MOV X8, 16(X2)
+ MOV X9, 24(X2)
+ CALL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOV 0(X2), X7
+ MOV X7, (g_sched+gobuf_pc)(g)
+ MOV $(24+8)(X2), X6 // must match frame size
+ MOV X6, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOV g_m(g), X5
+ MOV m_g0(X5), g
+ CALL runtime·save_g(SB)
+ MOV (g_sched+gobuf_sp)(g), X2
+ MOV savedsp-24(SP), X6 // must match frame size
+ MOV X6, (g_sched+gobuf_sp)(g)
+
+ // If the m on entry was nil, we called needm above to borrow an m
+ // for the duration of the call. Since the call is over, return it with dropm.
+ MOV savedm-8(SP), X5
+ BNE ZERO, X5, droppedm
+ MOV $runtime·dropm(SB), X6
+ JALR RA, X6
+droppedm:
+
+ // Done!
+ RET
+
+TEXT runtime·breakpoint(SB),NOSPLIT|NOFRAME,$0-0
+ EBREAK
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT|NOFRAME,$0-0
+ EBREAK
+ RET
+
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-8
+ MOV gg+0(FP), g
+ // This only happens if iscgo, so jump straight to save_g
+ CALL runtime·save_g(SB)
+ RET
+
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOV $1, T0
+ MOV T0, ret+0(FP)
+ RET
+
+// gcWriteBarrier performs a heap pointer write and informs the GC.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It takes two arguments:
+// - T0 is the destination of the write
+// - T1 is the value being written at T0.
+// It clobbers X31 (T6), the linker temp register (REG_TMP).
+// The act of CALLing gcWriteBarrier will clobber RA (LR).
+// It does not clobber any other general-purpose registers,
+// but may clobber others (e.g., floating point registers).
+TEXT runtime·gcWriteBarrier(SB),NOSPLIT,$216
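+	// The 216-byte frame provides the spill slots used below, at offsets
+	// 1*8 through 26*8 from X2.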
+ // Save the registers clobbered by the fast path.
+ MOV A0, 25*8(X2)
+ MOV A1, 26*8(X2)
+ MOV g_m(g), A0
+ MOV m_p(A0), A0
+ MOV (p_wbBuf+wbBuf_next)(A0), A1
+ // Increment wbBuf.next position.
+ ADD $16, A1
+ MOV A1, (p_wbBuf+wbBuf_next)(A0)
+ MOV (p_wbBuf+wbBuf_end)(A0), A0
+ MOV A0, T6 // T6 is linker temp register (REG_TMP)
+ // Record the write.
+ MOV T1, -16(A1) // Record value
+ MOV (T0), A0 // TODO: This turns bad writes into bad reads.
+ MOV A0, -8(A1) // Record *slot
+ // Is the buffer full?
+ BEQ A1, T6, flush
+ret:
+ MOV 25*8(X2), A0
+ MOV 26*8(X2), A1
+ // Do the write.
+ MOV T1, (T0)
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ MOV T0, 1*8(X2) // Also first argument to wbBufFlush
+ MOV T1, 2*8(X2) // Also second argument to wbBufFlush
+ // X0 is zero register
+ // X1 is LR, saved by prologue
+ // X2 is SP
+ MOV X3, 3*8(X2)
+ // X4 is TP
+ // X5 is first arg to wbBufFlush (T0)
+ // X6 is second arg to wbBufFlush (T1)
+ MOV X7, 4*8(X2)
+ MOV X8, 5*8(X2)
+ MOV X9, 6*8(X2)
+ // X10 already saved (A0)
+ // X11 already saved (A1)
+ MOV X12, 7*8(X2)
+ MOV X13, 8*8(X2)
+ MOV X14, 9*8(X2)
+ MOV X15, 10*8(X2)
+ MOV X16, 11*8(X2)
+ MOV X17, 12*8(X2)
+ MOV X18, 13*8(X2)
+ MOV X19, 14*8(X2)
+ MOV X20, 15*8(X2)
+ MOV X21, 16*8(X2)
+ MOV X22, 17*8(X2)
+ MOV X23, 18*8(X2)
+ MOV X24, 19*8(X2)
+ MOV X25, 20*8(X2)
+ MOV X26, 21*8(X2)
+ // X27 is g.
+ MOV X28, 22*8(X2)
+ MOV X29, 23*8(X2)
+ MOV X30, 24*8(X2)
+ // X31 is tmp register.
+
+ // This takes arguments T0 and T1.
+ CALL runtime·wbBufFlush(SB)
+
+ MOV 1*8(X2), T0
+ MOV 2*8(X2), T1
+ MOV 3*8(X2), X3
+ MOV 4*8(X2), X7
+ MOV 5*8(X2), X8
+ MOV 6*8(X2), X9
+ MOV 7*8(X2), X12
+ MOV 8*8(X2), X13
+ MOV 9*8(X2), X14
+ MOV 10*8(X2), X15
+ MOV 11*8(X2), X16
+ MOV 12*8(X2), X17
+ MOV 13*8(X2), X18
+ MOV 14*8(X2), X19
+ MOV 15*8(X2), X20
+ MOV 16*8(X2), X21
+ MOV 17*8(X2), X22
+ MOV 18*8(X2), X23
+ MOV 19*8(X2), X24
+ MOV 20*8(X2), X25
+ MOV 21*8(X2), X26
+ MOV 22*8(X2), X28
+ MOV 23*8(X2), X29
+ MOV 24*8(X2), X30
+
+ JMP ret
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments are allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+TEXT runtime·panicIndex(SB),NOSPLIT,$0-16
+ MOV T0, x+0(FP)
+ MOV T1, y+8(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU(SB),NOSPLIT,$0-16
+ MOV T0, x+0(FP)
+ MOV T1, y+8(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen(SB),NOSPLIT,$0-16
+ MOV T1, x+0(FP)
+ MOV T2, y+8(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU(SB),NOSPLIT,$0-16
+ MOV T1, x+0(FP)
+ MOV T2, y+8(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap(SB),NOSPLIT,$0-16
+ MOV T1, x+0(FP)
+ MOV T2, y+8(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU(SB),NOSPLIT,$0-16
+ MOV T1, x+0(FP)
+ MOV T2, y+8(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB(SB),NOSPLIT,$0-16
+ MOV T0, x+0(FP)
+ MOV T1, y+8(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU(SB),NOSPLIT,$0-16
+ MOV T0, x+0(FP)
+ MOV T1, y+8(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen(SB),NOSPLIT,$0-16
+ MOV T2, x+0(FP)
+ MOV T3, y+8(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU(SB),NOSPLIT,$0-16
+ MOV T2, x+0(FP)
+ MOV T3, y+8(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap(SB),NOSPLIT,$0-16
+ MOV T2, x+0(FP)
+ MOV T3, y+8(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU(SB),NOSPLIT,$0-16
+ MOV T2, x+0(FP)
+ MOV T3, y+8(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B(SB),NOSPLIT,$0-16
+ MOV T1, x+0(FP)
+ MOV T2, y+8(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU(SB),NOSPLIT,$0-16
+ MOV T1, x+0(FP)
+ MOV T2, y+8(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C(SB),NOSPLIT,$0-16
+ MOV T0, x+0(FP)
+ MOV T1, y+8(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU(SB),NOSPLIT,$0-16
+ MOV T0, x+0(FP)
+ MOV T1, y+8(FP)
+ JMP runtime·goPanicSlice3CU(SB)
+
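+// mainPC is a function value of runtime.main, passed to newproc by rt0_go above.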
+DATA runtime·mainPC+0(SB)/8,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
diff --git a/src/runtime/asm_s390x.s b/src/runtime/asm_s390x.s
new file mode 100644
index 0000000..7baef37
--- /dev/null
+++ b/src/runtime/asm_s390x.s
@@ -0,0 +1,918 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// _rt0_s390x_lib is common startup code for s390x systems when
+// using -buildmode=c-archive or -buildmode=c-shared. The linker will
+// arrange to invoke this function as a global constructor (for
+// c-archive) or when the shared library is loaded (for c-shared).
+// We expect argc and argv to be passed in the usual C ABI registers
+// R2 and R3.
+TEXT _rt0_s390x_lib(SB), NOSPLIT|NOFRAME, $0
+ MOVD R2, _rt0_s390x_lib_argc<>(SB)
+ MOVD R3, _rt0_s390x_lib_argv<>(SB)
+
+ // Save R6-R15 in the register save area of the calling function.
+ STMG R6, R15, 48(R15)
+
+ // Allocate 80 bytes on the stack.
+ MOVD $-80(R15), R15
+
+ // Save F8-F15 in our stack frame.
+ FMOVD F8, 16(R15)
+ FMOVD F9, 24(R15)
+ FMOVD F10, 32(R15)
+ FMOVD F11, 40(R15)
+ FMOVD F12, 48(R15)
+ FMOVD F13, 56(R15)
+ FMOVD F14, 64(R15)
+ FMOVD F15, 72(R15)
+
+ // Synchronous initialization.
+ MOVD $runtime·libpreinit(SB), R1
+ BL R1
+
+ // Create a new thread to finish Go runtime initialization.
+ MOVD _cgo_sys_thread_create(SB), R1
+ CMP R1, $0
+ BEQ nocgo
+ MOVD $_rt0_s390x_lib_go(SB), R2
+ MOVD $0, R3
+ BL R1
+ BR restore
+
+nocgo:
+ MOVD $0x800000, R1 // stacksize
+ MOVD R1, 0(R15)
+ MOVD $_rt0_s390x_lib_go(SB), R1
+ MOVD R1, 8(R15) // fn
+ MOVD $runtime·newosproc(SB), R1
+ BL R1
+
+restore:
+ // Restore F8-F15 from our stack frame.
+ FMOVD 16(R15), F8
+ FMOVD 24(R15), F9
+ FMOVD 32(R15), F10
+ FMOVD 40(R15), F11
+ FMOVD 48(R15), F12
+ FMOVD 56(R15), F13
+ FMOVD 64(R15), F14
+ FMOVD 72(R15), F15
+ MOVD $80(R15), R15
+
+ // Restore R6-R15.
+ LMG 48(R15), R6, R15
+ RET
+
+// _rt0_s390x_lib_go initializes the Go runtime.
+// This is started in a separate thread by _rt0_s390x_lib.
+TEXT _rt0_s390x_lib_go(SB), NOSPLIT|NOFRAME, $0
+ MOVD _rt0_s390x_lib_argc<>(SB), R2
+ MOVD _rt0_s390x_lib_argv<>(SB), R3
+ MOVD $runtime·rt0_go(SB), R1
+ BR R1
+
+DATA _rt0_s390x_lib_argc<>(SB)/8, $0
+GLOBL _rt0_s390x_lib_argc<>(SB), NOPTR, $8
+DATA _rt0_s390x_lib_argv<>(SB)/8, $0
+GLOBL _rt0_s390x_lib_argv<>(SB), NOPTR, $8
+
+TEXT runtime·rt0_go(SB),NOSPLIT,$0
+ // R2 = argc; R3 = argv; R11 = temp; R13 = g; R15 = stack pointer
+ // C TLS base pointer in AR0:AR1
+
+ // initialize essential registers
+ XOR R0, R0
+
+ SUB $24, R15
+ MOVW R2, 8(R15) // argc
+ MOVD R3, 16(R15) // argv
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOVD $runtime·g0(SB), g
+ MOVD R15, R11
+ SUB $(64*1024), R11
+ MOVD R11, g_stackguard0(g)
+ MOVD R11, g_stackguard1(g)
+ MOVD R11, (g_stack+stack_lo)(g)
+ MOVD R15, (g_stack+stack_hi)(g)
+
+ // if there is a _cgo_init, call it using the gcc ABI.
+ MOVD _cgo_init(SB), R11
+ CMPBEQ R11, $0, nocgo
+	MOVW AR0, R4 // (AR0 << 32 | AR1) is the TLS base pointer; MOVW from an AR is translated to EAR
+ SLD $32, R4, R4
+ MOVW AR1, R4 // arg 2: TLS base pointer
+ MOVD $setg_gcc<>(SB), R3 // arg 1: setg
+ MOVD g, R2 // arg 0: G
+ // C functions expect 160 bytes of space on caller stack frame
+ // and an 8-byte aligned stack pointer
+ MOVD R15, R9 // save current stack (R9 is preserved in the Linux ABI)
+ SUB $160, R15 // reserve 160 bytes
+ MOVD $~7, R6
+ AND R6, R15 // 8-byte align
+ BL R11 // this call clobbers volatile registers according to Linux ABI (R0-R5, R14)
+ MOVD R9, R15 // restore stack
+ XOR R0, R0 // zero R0
+
+nocgo:
+ // update stackguard after _cgo_init
+ MOVD (g_stack+stack_lo)(g), R2
+ ADD $const__StackGuard, R2
+ MOVD R2, g_stackguard0(g)
+ MOVD R2, g_stackguard1(g)
+
+ // set the per-goroutine and per-mach "registers"
+ MOVD $runtime·m0(SB), R2
+
+ // save m->g0 = g0
+ MOVD g, m_g0(R2)
+ // save m0 to g0->m
+ MOVD R2, g_m(g)
+
+ BL runtime·check(SB)
+
+ // argc/argv are already prepared on stack
+ BL runtime·args(SB)
+ BL runtime·osinit(SB)
+ BL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOVD $runtime·mainPC(SB), R2 // entry
+ SUB $24, R15
+ MOVD R2, 16(R15)
+ MOVD $0, 8(R15)
+ MOVD $0, 0(R15)
+ BL runtime·newproc(SB)
+ ADD $24, R15
+
+ // start this M
+ BL runtime·mstart(SB)
+
+ MOVD $0, 1(R0)
+ RET
+
+DATA runtime·mainPC+0(SB)/8,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
+
+TEXT runtime·breakpoint(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD $0, 2(R0)
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT|NOFRAME,$0-0
+ RET
+
+/*
+ * go-routine
+ */
+
+// void gosave(Gobuf*)
+// save state in Gobuf; setjmp
+TEXT runtime·gosave(SB), NOSPLIT, $-8-8
+ MOVD buf+0(FP), R3
+ MOVD R15, gobuf_sp(R3)
+ MOVD LR, gobuf_pc(R3)
+ MOVD g, gobuf_g(R3)
+ MOVD $0, gobuf_lr(R3)
+ MOVD $0, gobuf_ret(R3)
+ // Assert ctxt is zero. See func save.
+ MOVD gobuf_ctxt(R3), R3
+ CMPBEQ R3, $0, 2(PC)
+ BL runtime·badctxt(SB)
+ RET
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB), NOSPLIT, $16-8
+ MOVD buf+0(FP), R5
+ MOVD gobuf_g(R5), g // make sure g is not nil
+ BL runtime·save_g(SB)
+
+ MOVD 0(g), R4
+ MOVD gobuf_sp(R5), R15
+ MOVD gobuf_lr(R5), LR
+ MOVD gobuf_ret(R5), R3
+ MOVD gobuf_ctxt(R5), R12
+ MOVD $0, gobuf_sp(R5)
+ MOVD $0, gobuf_ret(R5)
+ MOVD $0, gobuf_lr(R5)
+ MOVD $0, gobuf_ctxt(R5)
+ CMP R0, R0 // set condition codes for == test, needed by stack split
+ MOVD gobuf_pc(R5), R6
+ BR (R6)
+
+// void mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB), NOSPLIT, $-8-8
+ // Save caller state in g->sched
+ MOVD R15, (g_sched+gobuf_sp)(g)
+ MOVD LR, (g_sched+gobuf_pc)(g)
+ MOVD $0, (g_sched+gobuf_lr)(g)
+ MOVD g, (g_sched+gobuf_g)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOVD g, R3
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), g
+ BL runtime·save_g(SB)
+ CMP g, R3
+ BNE 2(PC)
+ BR runtime·badmcall(SB)
+ MOVD fn+0(FP), R12 // context
+ MOVD 0(R12), R4 // code pointer
+ MOVD (g_sched+gobuf_sp)(g), R15 // sp = m->g0->sched.sp
+ SUB $16, R15
+ MOVD R3, 8(R15)
+ MOVD $0, 0(R15)
+ BL (R4)
+ BR runtime·badmcall2(SB)
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ UNDEF
+ BL (LR) // make sure this function is not leaf
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ MOVD fn+0(FP), R3 // R3 = fn
+ MOVD R3, R12 // context
+ MOVD g_m(g), R4 // R4 = m
+
+ MOVD m_gsignal(R4), R5 // R5 = gsignal
+ CMPBEQ g, R5, noswitch
+
+ MOVD m_g0(R4), R5 // R5 = g0
+ CMPBEQ g, R5, noswitch
+
+ MOVD m_curg(R4), R6
+ CMPBEQ g, R6, switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVD $runtime·badsystemstack(SB), R3
+ BL (R3)
+ BL runtime·abort(SB)
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ MOVD $runtime·systemstack_switch(SB), R6
+ ADD $16, R6 // get past prologue
+ MOVD R6, (g_sched+gobuf_pc)(g)
+ MOVD R15, (g_sched+gobuf_sp)(g)
+ MOVD $0, (g_sched+gobuf_lr)(g)
+ MOVD g, (g_sched+gobuf_g)(g)
+
+ // switch to g0
+ MOVD R5, g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R3
+ // make it look like mstart called systemstack on g0, to stop traceback
+ SUB $8, R3
+ MOVD $runtime·mstart(SB), R4
+ MOVD R4, 0(R3)
+ MOVD R3, R15
+
+ // call target function
+ MOVD 0(R12), R3 // code pointer
+ BL (R3)
+
+ // switch back to g
+ MOVD g_m(g), R3
+ MOVD m_curg(R3), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R15
+ MOVD $0, (g_sched+gobuf_sp)(g)
+ RET
+
+noswitch:
+ // already on m stack, just call directly
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVD 0(R12), R3 // code pointer
+ MOVD 0(R15), LR // restore LR
+ ADD $8, R15
+ BR (R3)
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// Caller has already loaded:
+// R3: framesize, R4: argsize, R5: LR
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOVD g_m(g), R7
+ MOVD m_g0(R7), R8
+ CMPBNE g, R8, 3(PC)
+ BL runtime·badmorestackg0(SB)
+ BL runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVD m_gsignal(R7), R8
+ CMP g, R8
+ BNE 3(PC)
+ BL runtime·badmorestackgsignal(SB)
+ BL runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f.
+ MOVD R15, (g_sched+gobuf_sp)(g)
+ MOVD LR, R8
+ MOVD R8, (g_sched+gobuf_pc)(g)
+ MOVD R5, (g_sched+gobuf_lr)(g)
+ MOVD R12, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ MOVD R5, (m_morebuf+gobuf_pc)(R7) // f's caller's PC
+ MOVD R15, (m_morebuf+gobuf_sp)(R7) // f's caller's SP
+ MOVD g, (m_morebuf+gobuf_g)(R7)
+
+ // Call newstack on m->g0's stack.
+ MOVD m_g0(R7), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R15
+ // Create a stack frame on g0 to call newstack.
+ MOVD $0, -8(R15) // Zero saved LR in frame
+ SUB $8, R15
+ BL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ UNDEF
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD $0, R12
+ BR runtime·morestack(SB)
+
+// reflectcall: call a function with the given argument list
+// func call(argtype *_type, f *FuncVal, arg *byte, argsize, retoffset uint32).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ MOVD $MAXSIZE, R4; \
+ CMP R3, R4; \
+ BGT 3(PC); \
+ MOVD $NAME(SB), R5; \
+ BR (R5)
+// Note: can't just "BR NAME(SB)" - bad inlining results.
+
+TEXT ·reflectcall(SB), NOSPLIT, $-8-32
+ MOVWZ argsize+24(FP), R3
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVD $runtime·badreflectcall(SB), R5
+ BR (R5)
+
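The size-class dispatch described in the comment above is easier to see in plain Go. The sketch below is illustrative only (the helper names are invented, not runtime APIs): it picks the smallest fixed-frame stub whose bound covers the argument block, which is exactly what the chain of DISPATCH lines does.

    package main

    import "fmt"

    // sizeClasses mirrors the DISPATCH table: powers of two from 16 bytes
    // up to 1<<30 bytes.
    var sizeClasses = func() []uint32 {
        var s []uint32
        for n := uint32(16); n <= 1<<30; n <<= 1 {
            s = append(s, n)
        }
        return s
    }()

    // pickStub names the fixed-frame stub that an argument block of
    // argSize bytes would be dispatched to.
    func pickStub(argSize uint32) string {
        for _, max := range sizeClasses {
            if argSize <= max {
                return fmt.Sprintf("runtime.call%d", max)
            }
        }
        return "runtime.badreflectcall"
    }

    func main() {
        fmt.Println(pickStub(24))   // runtime.call32
        fmt.Println(pickStub(4096)) // runtime.call4096
    }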
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-24; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVD arg+16(FP), R4; \
+ MOVWZ argsize+24(FP), R5; \
+ MOVD $stack-MAXSIZE(SP), R6; \
+loopArgs: /* copy 256 bytes at a time */ \
+ CMP R5, $256; \
+ BLT tailArgs; \
+ SUB $256, R5; \
+ MVC $256, 0(R4), 0(R6); \
+ MOVD $256(R4), R4; \
+ MOVD $256(R6), R6; \
+ BR loopArgs; \
+tailArgs: /* copy remaining bytes */ \
+ CMP R5, $0; \
+ BEQ callFunction; \
+ SUB $1, R5; \
+ EXRL $callfnMVC<>(SB), R5; \
+callFunction: \
+ MOVD f+8(FP), R12; \
+ MOVD (R12), R8; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ BL (R8); \
+ /* copy return values back */ \
+ MOVD argtype+0(FP), R7; \
+ MOVD arg+16(FP), R6; \
+ MOVWZ n+24(FP), R5; \
+ MOVD $stack-MAXSIZE(SP), R4; \
+ MOVWZ retoffset+28(FP), R1; \
+ ADD R1, R4; \
+ ADD R1, R6; \
+ SUB R1, R5; \
+ BL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $32-0
+ MOVD R7, 8(R15)
+ MOVD R6, 16(R15)
+ MOVD R4, 24(R15)
+ MOVD R5, 32(R15)
+ BL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+// Not a function: target for EXRL (execute relative long) instruction.
+TEXT callfnMVC<>(SB),NOSPLIT|NOFRAME,$0-0
+ MVC $1, 0(R4), 0(R6)
+
+TEXT runtime·procyield(SB),NOSPLIT,$0-0
+ RET
+
+// void jmpdefer(fv, sp);
+// called from deferreturn.
+// 1. grab stored LR for caller
+// 2. sub 6 bytes to get back to BL deferreturn (size of BRASL instruction)
+// 3. BR to fn
+TEXT runtime·jmpdefer(SB),NOSPLIT|NOFRAME,$0-16
+ MOVD 0(R15), R1
+ SUB $6, R1, LR
+
+ MOVD fv+0(FP), R12
+ MOVD argp+8(FP), R15
+ SUB $8, R15
+ MOVD 0(R12), R3
+ BR (R3)
+
+// Save state of caller into g->sched. Smashes R1.
+TEXT gosave<>(SB),NOSPLIT|NOFRAME,$0
+ MOVD LR, (g_sched+gobuf_pc)(g)
+ MOVD R15, (g_sched+gobuf_sp)(g)
+ MOVD $0, (g_sched+gobuf_lr)(g)
+ MOVD $0, (g_sched+gobuf_ret)(g)
+ // Assert ctxt is zero. See func save.
+ MOVD (g_sched+gobuf_ctxt)(g), R1
+ CMPBEQ R1, $0, 2(PC)
+ BL runtime·badctxt(SB)
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-20
+ // R2 = argc; R3 = argv; R11 = temp; R13 = g; R15 = stack pointer
+ // C TLS base pointer in AR0:AR1
+ MOVD fn+0(FP), R3
+ MOVD arg+8(FP), R4
+
+ MOVD R15, R2 // save original stack pointer
+ MOVD g, R5
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already.
+ MOVD g_m(g), R6
+ MOVD m_g0(R6), R6
+ CMPBEQ R6, g, g0
+ BL gosave<>(SB)
+ MOVD R6, g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R15
+
+ // Now on a scheduling stack (a pthread-created stack).
+g0:
+ // Save room for two of our pointers, plus 160 bytes of callee
+ // save area that lives on the caller stack.
+ SUB $176, R15
+ MOVD $~7, R6
+ AND R6, R15 // 8-byte alignment for gcc ABI
+ MOVD R5, 168(R15) // save old g on stack
+ MOVD (g_stack+stack_hi)(R5), R5
+ SUB R2, R5
+ MOVD R5, 160(R15) // save depth in old g stack (can't just save SP, as stack might be copied during a callback)
+ MOVD $0, 0(R15) // clear back chain pointer (TODO can we give it real back trace information?)
+ MOVD R4, R2 // arg in R2
+ BL R3 // can clobber: R0-R5, R14, F0-F3, F5, F7-F15
+
+ XOR R0, R0 // set R0 back to 0.
+ // Restore g, stack pointer.
+ MOVD 168(R15), g
+ BL runtime·save_g(SB)
+ MOVD (g_stack+stack_hi)(g), R5
+ MOVD 160(R15), R6
+ SUB R6, R5
+ MOVD R5, R15
+
+ MOVW R2, ret+16(FP)
+ RET
+
+// cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$24-24
+ NO_LOCAL_POINTERS
+
+ // Load m and g from thread-local storage.
+ MOVB runtime·iscgo(SB), R3
+ CMPBEQ R3, $0, nocgo
+ BL runtime·load_g(SB)
+
+nocgo:
+ // If g is nil, Go did not create the current thread.
+ // Call needm to obtain one for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ CMPBEQ g, $0, needm
+
+ MOVD g_m(g), R8
+ MOVD R8, savedm-8(SP)
+ BR havem
+
+needm:
+ MOVD g, savedm-8(SP) // g is zero, so is m.
+ MOVD $runtime·needm(SB), R3
+ BL (R3)
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), R3
+ MOVD R15, (g_sched+gobuf_sp)(R3)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 8(R1) aka savedsp-16(SP).
+ MOVD m_g0(R8), R3
+ MOVD (g_sched+gobuf_sp)(R3), R4
+ MOVD R4, savedsp-24(SP) // must match frame size
+ MOVD R15, (g_sched+gobuf_sp)(R3)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVD m_curg(R8), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R4 // prepare stack as R4
+ MOVD (g_sched+gobuf_pc)(g), R5
+ MOVD R5, -(24+8)(R4) // "saved LR"; must match frame size
+ // Gather our arguments into registers.
+ MOVD fn+0(FP), R1
+ MOVD frame+8(FP), R2
+ MOVD ctxt+16(FP), R3
+ MOVD $-(24+8)(R4), R15 // switch stack; must match frame size
+ MOVD R1, 8(R15)
+ MOVD R2, 16(R15)
+ MOVD R3, 24(R15)
+ BL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOVD 0(R15), R5
+ MOVD R5, (g_sched+gobuf_pc)(g)
+ MOVD $(24+8)(R15), R4 // must match frame size
+ MOVD R4, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R15
+ MOVD savedsp-24(SP), R4 // must match frame size
+ MOVD R4, (g_sched+gobuf_sp)(g)
+
+ // If the m on entry was nil, we called needm above to borrow an m
+ // for the duration of the call. Since the call is over, return it with dropm.
+ MOVD savedm-8(SP), R6
+ CMPBNE R6, $0, droppedm
+ MOVD $runtime·dropm(SB), R3
+ BL (R3)
+droppedm:
+
+ // Done!
+ RET
+
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-8
+ MOVD gg+0(FP), g
+ // This only happens if iscgo, so jump straight to save_g
+ BL runtime·save_g(SB)
+ RET
+
+// void setg_gcc(G*); set g in C TLS.
+// Must obey the gcc calling convention.
+TEXT setg_gcc<>(SB),NOSPLIT|NOFRAME,$0-0
+ // The standard prologue clobbers LR (R14), which is callee-save in
+ // the C ABI, so we have to use NOFRAME and save LR ourselves.
+ MOVD LR, R1
+ // Also save g, R10, and R11 since they're callee-save in C ABI
+ MOVD R10, R3
+ MOVD g, R4
+ MOVD R11, R5
+
+ MOVD R2, g
+ BL runtime·save_g(SB)
+
+ MOVD R5, R11
+ MOVD R4, g
+ MOVD R3, R10
+ MOVD R1, LR
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW (R0), R0
+ UNDEF
+
+// int64 runtime·cputicks(void)
+TEXT runtime·cputicks(SB),NOSPLIT,$0-8
+ // The TOD clock on s390 counts from the year 1900 in ~250ps intervals.
+ // This means that since about 1972 the msb has been set, making the
+ // result of a call to STORE CLOCK (stck) a negative number.
+ // We clear the msb to make it positive.
+ STCK ret+0(FP) // serialises before and after call
+ MOVD ret+0(FP), R3 // R3 will wrap to 0 in the year 2043
+ SLD $1, R3
+ SRD $1, R3
+ MOVD R3, ret+0(FP)
+ RET
+
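The dates in the cputicks comment can be sanity-checked with a little arithmetic. The standalone snippet below assumes one TOD tick is 2^-12 microseconds (about 244ps, the "~250ps" above); it is a back-of-the-envelope check, not runtime code.

    package main

    import "fmt"

    func main() {
        const tick = 1.0 / 4096e6       // seconds per TOD tick: 2^-12 µs ≈ 244ps
        const year = 365.25 * 24 * 3600 // seconds per year
        years := float64(uint64(1)<<63) * tick / year   // years until bit 0 flips
        fmt.Printf("msb has been set since ~%.0f\n", 1900+years)       // ≈ 1971
        fmt.Printf("63-bit value wraps to 0 in ~%.0f\n", 1900+2*years) // ≈ 2043
    }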
+// AES hashing not implemented for s390x
+TEXT runtime·memhash(SB),NOSPLIT|NOFRAME,$0-32
+ JMP runtime·memhashFallback(SB)
+TEXT runtime·strhash(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·strhashFallback(SB)
+TEXT runtime·memhash32(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash32Fallback(SB)
+TEXT runtime·memhash64(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash64Fallback(SB)
+
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOVW $0, R3
+ RET
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT|NOFRAME,$0
+ // g (R13), R10, R11 and LR (R14) are callee-save in the C ABI, so save them
+ MOVD g, R1
+ MOVD R10, R3
+ MOVD LR, R4
+ MOVD R11, R5
+
+ BL runtime·load_g(SB) // clobbers g (R13), R10, R11
+ MOVD g_m(g), R2
+ MOVD m_curg(R2), R2
+ MOVD (g_stack+stack_hi)(R2), R2
+
+ MOVD R1, g
+ MOVD R3, R10
+ MOVD R4, LR
+ MOVD R5, R11
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ BYTE $0x07; BYTE $0x00; // 2-byte nop
+ BL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ BYTE $0x07; BYTE $0x00; // 2-byte nop
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ // Stores are already ordered on s390x, so this is just a
+ // compile barrier.
+ RET
+
+// This is called from .init_array and follows the platform, not Go, ABI.
+// We are overly conservative. We could save only the registers we use.
+// However, since this function is only called once per loaded module,
+// performance is unimportant.
+TEXT runtime·addmoduledata(SB),NOSPLIT|NOFRAME,$0-0
+ // Save R6-R15 in the register save area of the calling function.
+ // Don't bother saving F8-F15 as we aren't doing any calls.
+ STMG R6, R15, 48(R15)
+
+ // append the argument (passed in R2, as per the ELF ABI) to the
+ // moduledata linked list.
+ MOVD runtime·lastmoduledatap(SB), R1
+ MOVD R2, moduledata_next(R1)
+ MOVD R2, runtime·lastmoduledatap(SB)
+
+ // Restore R6-R15.
+ LMG 48(R15), R6, R15
+ RET
+
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOVB $1, ret+0(FP)
+ RET
+
+// gcWriteBarrier performs a heap pointer write and informs the GC.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It takes two arguments:
+// - R2 is the destination of the write
+// - R3 is the value being written at R2.
+// It clobbers R10 (the temp register).
+// It does not clobber any other general-purpose registers,
+// but may clobber others (e.g., floating point registers).
+TEXT runtime·gcWriteBarrier(SB),NOSPLIT,$104
+ // Save the registers clobbered by the fast path.
+ MOVD R1, 96(R15)
+ MOVD R4, 104(R15)
+ MOVD g_m(g), R1
+ MOVD m_p(R1), R1
+ // Increment wbBuf.next position.
+ MOVD $16, R4
+ ADD (p_wbBuf+wbBuf_next)(R1), R4
+ MOVD R4, (p_wbBuf+wbBuf_next)(R1)
+ MOVD (p_wbBuf+wbBuf_end)(R1), R1
+ // Record the write.
+ MOVD R3, -16(R4) // Record value
+ MOVD (R2), R10 // TODO: This turns bad writes into bad reads.
+ MOVD R10, -8(R4) // Record *slot
+ // Is the buffer full?
+ CMPBEQ R4, R1, flush
+ret:
+ MOVD 96(R15), R1
+ MOVD 104(R15), R4
+ // Do the write.
+ MOVD R3, (R2)
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ STMG R2, R3, 8(R15) // set R2 and R3 as arguments for wbBufFlush
+ MOVD R0, 24(R15)
+ // R1 already saved.
+ // R4 already saved.
+ STMG R5, R12, 32(R15) // save R5 - R12
+ // R13 is g.
+ // R14 is LR.
+ // R15 is SP.
+
+ // This takes arguments R2 and R3.
+ CALL runtime·wbBufFlush(SB)
+
+ LMG 8(R15), R2, R3 // restore R2 - R3
+ MOVD 24(R15), R0 // restore R0
+ LMG 32(R15), R5, R12 // restore R5 - R12
+ JMP ret
+
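The write barrier fast path above is easier to follow as pseudo-Go. The sketch below is illustrative only: the type and helper names are invented, and the real implementation is the runtime's wbBuf plus wbBufFlush, but the steps mirror the assembly: record the new value and the old slot contents in a per-P buffer, flush when the buffer fills, then perform the actual write.

    package main

    import "fmt"

    // wbBufSketch is a toy stand-in for the per-P write barrier buffer.
    type wbBufSketch struct {
        next int       // index of the next free word
        end  int       // capacity in words
        buf  []uintptr // pairs of (new value, old *slot)
    }

    // pointerWrite performs "*slot = val" with a buffered barrier: both the
    // pointer being written and the pointer being overwritten are recorded
    // so the GC can examine them later.
    func pointerWrite(b *wbBufSketch, slot *uintptr, val uintptr) {
        b.buf[b.next] = val     // matches "Record value" above
        b.buf[b.next+1] = *slot // matches "Record *slot" above
        b.next += 2
        if b.next == b.end {
            // Slow path: hand the filled buffer to the GC, then reset.
            // (The assembly calls runtime·wbBufFlush here.)
            b.next = 0
        }
        *slot = val // finally do the write
    }

    func main() {
        b := &wbBufSketch{end: 4, buf: make([]uintptr, 4)}
        var slot uintptr = 7
        pointerWrite(b, &slot, 42)
        fmt.Println(slot, b.buf) // 42 [42 7 0 0]
    }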
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+TEXT runtime·panicIndex(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen(SB),NOSPLIT,$0-16
+ MOVD R2, x+0(FP)
+ MOVD R3, y+8(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU(SB),NOSPLIT,$0-16
+ MOVD R2, x+0(FP)
+ MOVD R3, y+8(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap(SB),NOSPLIT,$0-16
+ MOVD R2, x+0(FP)
+ MOVD R3, y+8(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU(SB),NOSPLIT,$0-16
+ MOVD R2, x+0(FP)
+ MOVD R3, y+8(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicSlice3CU(SB)
diff --git a/src/runtime/asm_wasm.s b/src/runtime/asm_wasm.s
new file mode 100644
index 0000000..fcb780f
--- /dev/null
+++ b/src/runtime/asm_wasm.s
@@ -0,0 +1,473 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+TEXT runtime·rt0_go(SB), NOSPLIT|NOFRAME, $0
+ // save m->g0 = g0
+ MOVD $runtime·g0(SB), runtime·m0+m_g0(SB)
+ // save m0 to g0->m
+ MOVD $runtime·m0(SB), runtime·g0+g_m(SB)
+ // set g to g0
+ MOVD $runtime·g0(SB), g
+ CALLNORESUME runtime·check(SB)
+ CALLNORESUME runtime·args(SB)
+ CALLNORESUME runtime·osinit(SB)
+ CALLNORESUME runtime·schedinit(SB)
+ MOVD $0, 0(SP)
+ MOVD $runtime·mainPC(SB), 8(SP)
+ CALLNORESUME runtime·newproc(SB)
+ CALL runtime·mstart(SB) // WebAssembly stack will unwind when switching to another goroutine
+ UNDEF
+
+DATA runtime·mainPC+0(SB)/8,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
+
+// func checkASM() bool
+TEXT ·checkASM(SB), NOSPLIT, $0-1
+ MOVB $1, ret+0(FP)
+ RET
+
+TEXT runtime·gogo(SB), NOSPLIT, $0-8
+ MOVD buf+0(FP), R0
+ MOVD gobuf_g(R0), g
+ MOVD gobuf_sp(R0), SP
+
+ // Put target PC at -8(SP), wasm_pc_f_loop will pick it up
+ Get SP
+ I32Const $8
+ I32Sub
+ I64Load gobuf_pc(R0)
+ I64Store $0
+
+ MOVD gobuf_ret(R0), RET0
+ MOVD gobuf_ctxt(R0), CTXT
+ // clear to help garbage collector
+ MOVD $0, gobuf_sp(R0)
+ MOVD $0, gobuf_ret(R0)
+ MOVD $0, gobuf_ctxt(R0)
+
+ I32Const $1
+ Return
+
+// func mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB), NOSPLIT, $0-8
+ // CTXT = fn
+ MOVD fn+0(FP), CTXT
+ // R1 = g.m
+ MOVD g_m(g), R1
+ // R2 = g0
+ MOVD m_g0(R1), R2
+
+ // save state in g->sched
+ MOVD 0(SP), g_sched+gobuf_pc(g) // caller's PC
+ MOVD $fn+0(FP), g_sched+gobuf_sp(g) // caller's SP
+ MOVD g, g_sched+gobuf_g(g)
+
+ // if g == g0 call badmcall
+ Get g
+ Get R2
+ I64Eq
+ If
+ JMP runtime·badmcall(SB)
+ End
+
+ // switch to g0's stack
+ I64Load (g_sched+gobuf_sp)(R2)
+ I64Const $8
+ I64Sub
+ I32WrapI64
+ Set SP
+
+ // set arg to current g
+ MOVD g, 0(SP)
+
+ // switch to g0
+ MOVD R2, g
+
+ // call fn
+ Get CTXT
+ I32WrapI64
+ I64Load $0
+ CALL
+
+ Get SP
+ I32Const $8
+ I32Add
+ Set SP
+
+ JMP runtime·badmcall2(SB)
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ // R0 = fn
+ MOVD fn+0(FP), R0
+ // R1 = g.m
+ MOVD g_m(g), R1
+ // R2 = g0
+ MOVD m_g0(R1), R2
+
+ // if g == g0
+ Get g
+ Get R2
+ I64Eq
+ If
+ // no switch:
+ MOVD R0, CTXT
+
+ Get CTXT
+ I32WrapI64
+ I64Load $0
+ JMP
+ End
+
+ // if g != m.curg
+ Get g
+ I64Load m_curg(R1)
+ I64Ne
+ If
+ CALLNORESUME runtime·badsystemstack(SB)
+ End
+
+ // switch:
+
+ // save state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ MOVD $runtime·systemstack_switch(SB), g_sched+gobuf_pc(g)
+
+ MOVD SP, g_sched+gobuf_sp(g)
+ MOVD g, g_sched+gobuf_g(g)
+
+ // switch to g0
+ MOVD R2, g
+
+ // make it look like mstart called systemstack on g0, to stop traceback
+ I64Load (g_sched+gobuf_sp)(R2)
+ I64Const $8
+ I64Sub
+ Set R3
+
+ MOVD $runtime·mstart(SB), 0(R3)
+ MOVD R3, SP
+
+ // call fn
+ MOVD R0, CTXT
+
+ Get CTXT
+ I32WrapI64
+ I64Load $0
+ CALL
+
+ // switch back to g
+ MOVD g_m(g), R1
+ MOVD m_curg(R1), R2
+ MOVD R2, g
+ MOVD g_sched+gobuf_sp(R2), SP
+ MOVD $0, g_sched+gobuf_sp(R2)
+ RET
+
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ RET
+
+// AES hashing not implemented for wasm
+TEXT runtime·memhash(SB),NOSPLIT|NOFRAME,$0-32
+ JMP runtime·memhashFallback(SB)
+TEXT runtime·strhash(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·strhashFallback(SB)
+TEXT runtime·memhash32(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash32Fallback(SB)
+TEXT runtime·memhash64(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash64Fallback(SB)
+
+TEXT runtime·return0(SB), NOSPLIT, $0-0
+ MOVD $0, RET0
+ RET
+
+TEXT runtime·jmpdefer(SB), NOSPLIT, $0-16
+ MOVD fv+0(FP), CTXT
+
+ Get CTXT
+ I64Eqz
+ If
+ CALLNORESUME runtime·sigpanic<ABIInternal>(SB)
+ End
+
+ // caller sp after CALL
+ I64Load argp+8(FP)
+ I64Const $8
+ I64Sub
+ I32WrapI64
+ Set SP
+
+ // decrease PC_B by 1 to CALL again
+ Get SP
+ I32Load16U (SP)
+ I32Const $1
+ I32Sub
+ I32Store16 $0
+
+ // but first run the deferred function
+ Get CTXT
+ I32WrapI64
+ I64Load $0
+ JMP
+
+TEXT runtime·asminit(SB), NOSPLIT, $0-0
+ // No per-thread init.
+ RET
+
+TEXT ·publicationBarrier(SB), NOSPLIT, $0-0
+ RET
+
+TEXT runtime·procyield(SB), NOSPLIT, $0-0 // FIXME
+ RET
+
+TEXT runtime·breakpoint(SB), NOSPLIT, $0-0
+ UNDEF
+
+// Called during function prolog when more stack is needed.
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB), NOSPLIT, $0-0
+ // R1 = g.m
+ MOVD g_m(g), R1
+
+ // R2 = g0
+ MOVD m_g0(R1), R2
+
+ // Cannot grow scheduler stack (m->g0).
+ Get g
+ Get R1
+ I64Eq
+ If
+ CALLNORESUME runtime·badmorestackg0(SB)
+ End
+
+ // Cannot grow signal stack (m->gsignal).
+ Get g
+ I64Load m_gsignal(R1)
+ I64Eq
+ If
+ CALLNORESUME runtime·badmorestackgsignal(SB)
+ End
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ NOP SP // tell vet SP changed - stop checking offsets
+ MOVD 8(SP), m_morebuf+gobuf_pc(R1)
+ MOVD $16(SP), m_morebuf+gobuf_sp(R1) // f's caller's SP
+ MOVD g, m_morebuf+gobuf_g(R1)
+
+ // Set g->sched to context in f.
+ MOVD 0(SP), g_sched+gobuf_pc(g)
+ MOVD g, g_sched+gobuf_g(g)
+ MOVD $8(SP), g_sched+gobuf_sp(g) // f's SP
+ MOVD CTXT, g_sched+gobuf_ctxt(g)
+
+ // Call newstack on m->g0's stack.
+ MOVD R2, g
+ MOVD g_sched+gobuf_sp(R2), SP
+ CALL runtime·newstack(SB)
+ UNDEF // crash if newstack returns
+
+// morestack but not preserving ctxt.
+TEXT runtime·morestack_noctxt(SB),NOSPLIT,$0
+ MOVD $0, CTXT
+ JMP runtime·morestack(SB)
+
+TEXT ·asmcgocall(SB), NOSPLIT, $0-0
+ UNDEF
+
+#define DISPATCH(NAME, MAXSIZE) \
+ Get R0; \
+ I64Const $MAXSIZE; \
+ I64LeU; \
+ If; \
+ JMP NAME(SB); \
+ End
+
+TEXT ·reflectcall(SB), NOSPLIT, $0-32
+ I64Load fn+8(FP)
+ I64Eqz
+ If
+ CALLNORESUME runtime·sigpanic<ABIInternal>(SB)
+ End
+
+ MOVW argsize+24(FP), R0
+
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ JMP runtime·badreflectcall(SB)
+
+#define CALLFN(NAME, MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-32; \
+ NO_LOCAL_POINTERS; \
+ MOVW argsize+24(FP), R0; \
+ \
+ Get R0; \
+ I64Eqz; \
+ Not; \
+ If; \
+ Get SP; \
+ I64Load argptr+16(FP); \
+ I32WrapI64; \
+ I64Load argsize+24(FP); \
+ I64Const $3; \
+ I64ShrU; \
+ I32WrapI64; \
+ Call runtime·wasmMove(SB); \
+ End; \
+ \
+ MOVD f+8(FP), CTXT; \
+ Get CTXT; \
+ I32WrapI64; \
+ I64Load $0; \
+ CALL; \
+ \
+ I64Load32U retoffset+28(FP); \
+ Set R0; \
+ \
+ MOVD argtype+0(FP), RET0; \
+ \
+ I64Load argptr+16(FP); \
+ Get R0; \
+ I64Add; \
+ Set RET1; \
+ \
+ Get SP; \
+ I64ExtendI32U; \
+ Get R0; \
+ I64Add; \
+ Set RET2; \
+ \
+ I64Load32U argsize+24(FP); \
+ Get R0; \
+ I64Sub; \
+ Set RET3; \
+ \
+ CALL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $32-0
+ NO_LOCAL_POINTERS
+ MOVD RET0, 0(SP)
+ MOVD RET1, 8(SP)
+ MOVD RET2, 16(SP)
+ MOVD RET3, 24(SP)
+ CALL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+TEXT runtime·goexit(SB), NOSPLIT, $0-0
+ NOP // first PC of goexit is skipped
+ CALL runtime·goexit1(SB) // does not return
+ UNDEF
+
+TEXT runtime·cgocallback(SB), NOSPLIT, $0-24
+ UNDEF
+
+// gcWriteBarrier performs a heap pointer write and informs the GC.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It has two WebAssembly parameters:
+// R0: the destination of the write (i64)
+// R1: the value being written (i64)
+TEXT runtime·gcWriteBarrier(SB), NOSPLIT, $16
+ // R3 = g.m
+ MOVD g_m(g), R3
+ // R4 = p
+ MOVD m_p(R3), R4
+ // R5 = wbBuf.next
+ MOVD p_wbBuf+wbBuf_next(R4), R5
+
+ // Record value
+ MOVD R1, 0(R5)
+ // Record *slot
+ MOVD (R0), 8(R5)
+
+ // Increment wbBuf.next
+ Get R5
+ I64Const $16
+ I64Add
+ Set R5
+ MOVD R5, p_wbBuf+wbBuf_next(R4)
+
+ Get R5
+ I64Load (p_wbBuf+wbBuf_end)(R4)
+ I64Eq
+ If
+ // Flush
+ MOVD R0, 0(SP)
+ MOVD R1, 8(SP)
+ CALLNORESUME runtime·wbBufFlush(SB)
+ End
+
+ // Do the write
+ MOVD R1, (R0)
+
+ RET
diff --git a/src/runtime/atomic_arm64.s b/src/runtime/atomic_arm64.s
new file mode 100644
index 0000000..21b4d8c
--- /dev/null
+++ b/src/runtime/atomic_arm64.s
@@ -0,0 +1,9 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ DMB $0xe // DMB ST
+ RET
diff --git a/src/runtime/atomic_mips64x.s b/src/runtime/atomic_mips64x.s
new file mode 100644
index 0000000..6f42412
--- /dev/null
+++ b/src/runtime/atomic_mips64x.s
@@ -0,0 +1,13 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips64 mips64le
+
+#include "textflag.h"
+
+#define SYNC WORD $0xf
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ SYNC
+ RET
diff --git a/src/runtime/atomic_mipsx.s b/src/runtime/atomic_mipsx.s
new file mode 100644
index 0000000..ed078a2
--- /dev/null
+++ b/src/runtime/atomic_mipsx.s
@@ -0,0 +1,11 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips mipsle
+
+#include "textflag.h"
+
+TEXT ·publicationBarrier(SB),NOSPLIT,$0
+ SYNC
+ RET
diff --git a/src/runtime/atomic_pointer.go b/src/runtime/atomic_pointer.go
new file mode 100644
index 0000000..b8f0c22
--- /dev/null
+++ b/src/runtime/atomic_pointer.go
@@ -0,0 +1,77 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// These functions cannot have go:noescape annotations,
+// because while ptr does not escape, new does.
+// If new is marked as not escaping, the compiler will make incorrect
+// escape analysis decisions about the pointer value being stored.
+
+// atomicwb performs a write barrier before an atomic pointer write.
+// The caller should guard the call with "if writeBarrier.enabled".
+//
+//go:nosplit
+func atomicwb(ptr *unsafe.Pointer, new unsafe.Pointer) {
+ slot := (*uintptr)(unsafe.Pointer(ptr))
+ if !getg().m.p.ptr().wbBuf.putFast(*slot, uintptr(new)) {
+ wbBufFlush(slot, uintptr(new))
+ }
+}
+
+// atomicstorep performs *ptr = new atomically and invokes a write barrier.
+//
+//go:nosplit
+func atomicstorep(ptr unsafe.Pointer, new unsafe.Pointer) {
+ if writeBarrier.enabled {
+ atomicwb((*unsafe.Pointer)(ptr), new)
+ }
+ atomic.StorepNoWB(noescape(ptr), new)
+}
+
+// Like above, but implemented in terms of sync/atomic's uintptr operations.
+// We cannot just call the runtime routines, because the race detector expects
+// to be able to intercept the sync/atomic forms but not the runtime forms.
+
+//go:linkname sync_atomic_StoreUintptr sync/atomic.StoreUintptr
+func sync_atomic_StoreUintptr(ptr *uintptr, new uintptr)
+
+//go:linkname sync_atomic_StorePointer sync/atomic.StorePointer
+//go:nosplit
+func sync_atomic_StorePointer(ptr *unsafe.Pointer, new unsafe.Pointer) {
+ if writeBarrier.enabled {
+ atomicwb(ptr, new)
+ }
+ sync_atomic_StoreUintptr((*uintptr)(unsafe.Pointer(ptr)), uintptr(new))
+}
+
+//go:linkname sync_atomic_SwapUintptr sync/atomic.SwapUintptr
+func sync_atomic_SwapUintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:linkname sync_atomic_SwapPointer sync/atomic.SwapPointer
+//go:nosplit
+func sync_atomic_SwapPointer(ptr *unsafe.Pointer, new unsafe.Pointer) unsafe.Pointer {
+ if writeBarrier.enabled {
+ atomicwb(ptr, new)
+ }
+ old := unsafe.Pointer(sync_atomic_SwapUintptr((*uintptr)(noescape(unsafe.Pointer(ptr))), uintptr(new)))
+ return old
+}
+
+//go:linkname sync_atomic_CompareAndSwapUintptr sync/atomic.CompareAndSwapUintptr
+func sync_atomic_CompareAndSwapUintptr(ptr *uintptr, old, new uintptr) bool
+
+//go:linkname sync_atomic_CompareAndSwapPointer sync/atomic.CompareAndSwapPointer
+//go:nosplit
+func sync_atomic_CompareAndSwapPointer(ptr *unsafe.Pointer, old, new unsafe.Pointer) bool {
+ if writeBarrier.enabled {
+ atomicwb(ptr, new)
+ }
+ return sync_atomic_CompareAndSwapUintptr((*uintptr)(noescape(unsafe.Pointer(ptr))), uintptr(old), uintptr(new))
+}
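For context, the interception above matters to ordinary user code: a program that publishes a Go pointer with sync/atomic ends up in sync_atomic_StorePointer, which applies the write barrier before the store. A minimal usage sketch (the config type and helper functions are invented for the example):

    package main

    import (
        "fmt"
        "sync/atomic"
        "unsafe"
    )

    type config struct{ name string }

    var current unsafe.Pointer // effectively a *config, published atomically

    func publish(c *config) {
        // Routed by the go:linkname above to runtime.sync_atomic_StorePointer,
        // which performs the write barrier before the store.
        atomic.StorePointer(&current, unsafe.Pointer(c))
    }

    func load() *config {
        return (*config)(atomic.LoadPointer(&current))
    }

    func main() {
        publish(&config{name: "v1"})
        fmt.Println(load().name) // v1
    }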
diff --git a/src/runtime/atomic_ppc64x.s b/src/runtime/atomic_ppc64x.s
new file mode 100644
index 0000000..57f672f
--- /dev/null
+++ b/src/runtime/atomic_ppc64x.s
@@ -0,0 +1,14 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64 ppc64le
+
+#include "textflag.h"
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ // LWSYNC is the "export" barrier recommended by Power ISA
+ // v2.07 book II, appendix B.2.2.2.
+ // LWSYNC is a load/load, load/store, and store/store barrier.
+ LWSYNC
+ RET
diff --git a/src/runtime/atomic_riscv64.s b/src/runtime/atomic_riscv64.s
new file mode 100644
index 0000000..544a7c5
--- /dev/null
+++ b/src/runtime/atomic_riscv64.s
@@ -0,0 +1,10 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// func publicationBarrier()
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ FENCE
+ RET
diff --git a/src/runtime/auxv_none.go b/src/runtime/auxv_none.go
new file mode 100644
index 0000000..3ca617b
--- /dev/null
+++ b/src/runtime/auxv_none.go
@@ -0,0 +1,15 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !linux
+// +build !darwin
+// +build !dragonfly
+// +build !freebsd
+// +build !netbsd
+// +build !solaris
+
+package runtime
+
+func sysargs(argc int32, argv **byte) {
+}
diff --git a/src/runtime/callers_test.go b/src/runtime/callers_test.go
new file mode 100644
index 0000000..3cf3fbe
--- /dev/null
+++ b/src/runtime/callers_test.go
@@ -0,0 +1,311 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "reflect"
+ "runtime"
+ "strings"
+ "testing"
+)
+
+func f1(pan bool) []uintptr {
+ return f2(pan) // line 15
+}
+
+func f2(pan bool) []uintptr {
+ return f3(pan) // line 19
+}
+
+func f3(pan bool) []uintptr {
+ if pan {
+ panic("f3") // line 24
+ }
+ ret := make([]uintptr, 20)
+ return ret[:runtime.Callers(0, ret)] // line 27
+}
+
+func testCallers(t *testing.T, pcs []uintptr, pan bool) {
+ m := make(map[string]int, len(pcs))
+ frames := runtime.CallersFrames(pcs)
+ for {
+ frame, more := frames.Next()
+ if frame.Function != "" {
+ m[frame.Function] = frame.Line
+ }
+ if !more {
+ break
+ }
+ }
+
+ var seen []string
+ for k := range m {
+ seen = append(seen, k)
+ }
+ t.Logf("functions seen: %s", strings.Join(seen, " "))
+
+ var f3Line int
+ if pan {
+ f3Line = 24
+ } else {
+ f3Line = 27
+ }
+ want := []struct {
+ name string
+ line int
+ }{
+ {"f1", 15},
+ {"f2", 19},
+ {"f3", f3Line},
+ }
+ for _, w := range want {
+ if got := m["runtime_test."+w.name]; got != w.line {
+ t.Errorf("%s is line %d, want %d", w.name, got, w.line)
+ }
+ }
+}
+
+func testCallersEqual(t *testing.T, pcs []uintptr, want []string) {
+ t.Helper()
+
+ got := make([]string, 0, len(want))
+
+ frames := runtime.CallersFrames(pcs)
+ for {
+ frame, more := frames.Next()
+ if !more || len(got) >= len(want) {
+ break
+ }
+ got = append(got, frame.Function)
+ }
+ if !reflect.DeepEqual(want, got) {
+ t.Fatalf("wanted %v, got %v", want, got)
+ }
+}
+
+func TestCallers(t *testing.T) {
+ testCallers(t, f1(false), false)
+}
+
+func TestCallersPanic(t *testing.T) {
+ // Make sure we don't have any extra frames on the stack (due to
+ // open-coded defer processing)
+ want := []string{"runtime.Callers", "runtime_test.TestCallersPanic.func1",
+ "runtime.gopanic", "runtime_test.f3", "runtime_test.f2", "runtime_test.f1",
+ "runtime_test.TestCallersPanic"}
+
+ defer func() {
+ if r := recover(); r == nil {
+ t.Fatal("did not panic")
+ }
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallers(t, pcs, true)
+ testCallersEqual(t, pcs, want)
+ }()
+ f1(true)
+}
+
+func TestCallersDoublePanic(t *testing.T) {
+ // Make sure we don't have any extra frames on the stack (due to
+ // open-coded defer processing)
+ want := []string{"runtime.Callers", "runtime_test.TestCallersDoublePanic.func1.1",
+ "runtime.gopanic", "runtime_test.TestCallersDoublePanic.func1", "runtime.gopanic", "runtime_test.TestCallersDoublePanic"}
+
+ defer func() {
+ defer func() {
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ if recover() == nil {
+ t.Fatal("did not panic")
+ }
+ testCallersEqual(t, pcs, want)
+ }()
+ if recover() == nil {
+ t.Fatal("did not panic")
+ }
+ panic(2)
+ }()
+ panic(1)
+}
+
+// Test that a defer after a successful recovery looks like it is called directly
+// from the function with the defers.
+func TestCallersAfterRecovery(t *testing.T) {
+ want := []string{"runtime.Callers", "runtime_test.TestCallersAfterRecovery.func1", "runtime_test.TestCallersAfterRecovery"}
+
+ defer func() {
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallersEqual(t, pcs, want)
+ }()
+ defer func() {
+ if recover() == nil {
+ t.Fatal("did not recover from panic")
+ }
+ }()
+ panic(1)
+}
+
+func TestCallersAbortedPanic(t *testing.T) {
+ want := []string{"runtime.Callers", "runtime_test.TestCallersAbortedPanic.func2", "runtime_test.TestCallersAbortedPanic"}
+
+ defer func() {
+ r := recover()
+ if r != nil {
+ t.Fatalf("should be no panic remaining to recover")
+ }
+ }()
+
+ defer func() {
+ // panic1 was aborted/replaced by panic2, so when panic2 was
+ // recovered, there is no remaining panic on the stack.
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallersEqual(t, pcs, want)
+ }()
+ defer func() {
+ r := recover()
+ if r != "panic2" {
+ t.Fatalf("got %v, wanted %v", r, "panic2")
+ }
+ }()
+ defer func() {
+ // panic2 aborts/replaces panic1, because it is a recursive panic
+ // that is not recovered within the defer function called by
+ // panic1's panicking sequence.
+ panic("panic2")
+ }()
+ panic("panic1")
+}
+
+func TestCallersAbortedPanic2(t *testing.T) {
+ want := []string{"runtime.Callers", "runtime_test.TestCallersAbortedPanic2.func2", "runtime_test.TestCallersAbortedPanic2"}
+ defer func() {
+ r := recover()
+ if r != nil {
+ t.Fatalf("should be no panic remaining to recover")
+ }
+ }()
+ defer func() {
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallersEqual(t, pcs, want)
+ }()
+ func() {
+ defer func() {
+ r := recover()
+ if r != "panic2" {
+ t.Fatalf("got %v, wanted %v", r, "panic2")
+ }
+ }()
+ func() {
+ defer func() {
+ // Again, panic2 aborts/replaces panic1
+ panic("panic2")
+ }()
+ panic("panic1")
+ }()
+ }()
+}
+
+func TestCallersNilPointerPanic(t *testing.T) {
+ // Make sure we don't have any extra frames on the stack (due to
+ // open-coded defer processing)
+ want := []string{"runtime.Callers", "runtime_test.TestCallersNilPointerPanic.func1",
+ "runtime.gopanic", "runtime.panicmem", "runtime.sigpanic",
+ "runtime_test.TestCallersNilPointerPanic"}
+
+ defer func() {
+ if r := recover(); r == nil {
+ t.Fatal("did not panic")
+ }
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallersEqual(t, pcs, want)
+ }()
+ var p *int
+ if *p == 3 {
+ t.Fatal("did not see nil pointer panic")
+ }
+}
+
+func TestCallersDivZeroPanic(t *testing.T) {
+ // Make sure we don't have any extra frames on the stack (due to
+ // open-coded defer processing)
+ want := []string{"runtime.Callers", "runtime_test.TestCallersDivZeroPanic.func1",
+ "runtime.gopanic", "runtime.panicdivide",
+ "runtime_test.TestCallersDivZeroPanic"}
+
+ defer func() {
+ if r := recover(); r == nil {
+ t.Fatal("did not panic")
+ }
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallersEqual(t, pcs, want)
+ }()
+ var n int
+ if 5/n == 1 {
+ t.Fatal("did not see divide-by-zero panic")
+ }
+}
+
+func TestCallersDeferNilFuncPanic(t *testing.T) {
+ // Make sure we don't have any extra frames on the stack. We cut off the check
+ // at runtime.sigpanic, because non-open-coded defers (which may be used in
+ // non-opt or race checker mode) include an extra 'deferreturn' frame (which is
+ // where the nil pointer deref happens).
+ state := 1
+ want := []string{"runtime.Callers", "runtime_test.TestCallersDeferNilFuncPanic.func1",
+ "runtime.gopanic", "runtime.panicmem", "runtime.sigpanic"}
+
+ defer func() {
+ if r := recover(); r == nil {
+ t.Fatal("did not panic")
+ }
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallersEqual(t, pcs, want)
+ if state == 1 {
+ t.Fatal("nil defer func panicked at defer time rather than function exit time")
+ }
+
+ }()
+ var f func()
+ defer f()
+ // Use the value of 'state' to make sure nil defer func f causes panic at
+ // function exit, rather than at the defer statement.
+ state = 2
+}
+
+// Same test, but forcing non-open-coded defer by putting the defer in a loop. See
+// issue #36050
+func TestCallersDeferNilFuncPanicWithLoop(t *testing.T) {
+ state := 1
+ want := []string{"runtime.Callers", "runtime_test.TestCallersDeferNilFuncPanicWithLoop.func1",
+ "runtime.gopanic", "runtime.panicmem", "runtime.sigpanic", "runtime.deferreturn", "runtime_test.TestCallersDeferNilFuncPanicWithLoop"}
+
+ defer func() {
+ if r := recover(); r == nil {
+ t.Fatal("did not panic")
+ }
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallersEqual(t, pcs, want)
+ if state == 1 {
+ t.Fatal("nil defer func panicked at defer time rather than function exit time")
+ }
+
+ }()
+
+ for i := 0; i < 1; i++ {
+ var f func()
+ defer f()
+ }
+ // Use the value of 'state' to make sure nil defer func f causes panic at
+ // function exit, rather than at the defer statement.
+ state = 2
+}
diff --git a/src/runtime/cgo.go b/src/runtime/cgo.go
new file mode 100644
index 0000000..395d54a
--- /dev/null
+++ b/src/runtime/cgo.go
@@ -0,0 +1,54 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+//go:cgo_export_static main
+
+// Filled in by runtime/cgo when linked into binary.
+
+//go:linkname _cgo_init _cgo_init
+//go:linkname _cgo_thread_start _cgo_thread_start
+//go:linkname _cgo_sys_thread_create _cgo_sys_thread_create
+//go:linkname _cgo_notify_runtime_init_done _cgo_notify_runtime_init_done
+//go:linkname _cgo_callers _cgo_callers
+//go:linkname _cgo_set_context_function _cgo_set_context_function
+//go:linkname _cgo_yield _cgo_yield
+
+var (
+ _cgo_init unsafe.Pointer
+ _cgo_thread_start unsafe.Pointer
+ _cgo_sys_thread_create unsafe.Pointer
+ _cgo_notify_runtime_init_done unsafe.Pointer
+ _cgo_callers unsafe.Pointer
+ _cgo_set_context_function unsafe.Pointer
+ _cgo_yield unsafe.Pointer
+)
+
+// iscgo is set to true by the runtime/cgo package
+var iscgo bool
+
+// cgoHasExtraM is set on startup when an extra M is created for cgo.
+// The extra M must be created before any C/C++ code calls cgocallback.
+var cgoHasExtraM bool
+
+// cgoUse is called by cgo-generated code (using go:linkname to get at
+// an unexported name). The calls serve two purposes:
+// 1) they are opaque to escape analysis, so the argument is considered to
+// escape to the heap.
+// 2) they keep the argument alive until the call site; the call is emitted after
+// the end of the (presumed) use of the argument by C.
+// cgoUse should not actually be called (see cgoAlwaysFalse).
+func cgoUse(interface{}) { throw("cgoUse should not be called") }
+
+// cgoAlwaysFalse is a boolean value that is always false.
+// The cgo-generated code says if cgoAlwaysFalse { cgoUse(p) }.
+// The compiler cannot see that cgoAlwaysFalse is always false,
+// so it emits the test and keeps the call, giving the desired
+// escape analysis result. The test is cheaper than the call.
+var cgoAlwaysFalse bool
+
+var cgo_yield = &_cgo_yield
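To make the cgoAlwaysFalse/cgoUse pattern concrete, here is a self-contained sketch of the shape of code cmd/cgo generates around a call that passes a Go pointer to C. The local stand-ins and fakeCCall are invented so the snippet runs outside the runtime; real generated wrappers reach the runtime's own cgoAlwaysFalse and cgoUse via go:linkname.

    package main

    import (
        "fmt"
        "unsafe"
    )

    // Local stand-ins for the runtime's unexported cgoAlwaysFalse and cgoUse,
    // so the call shape can be run outside the runtime.
    var cgoAlwaysFalse bool

    func cgoUse(interface{}) { panic("cgoUse should not be called") }

    // fakeCCall stands in for the _Cfunc_* wrapper that cgo would generate.
    func fakeCCall(p unsafe.Pointer) { fmt.Println("C sees", *(*int)(p)) }

    func callIntoC(p *int) {
        fakeCCall(unsafe.Pointer(p))
        // cgo-generated wrappers emit a guarded call like this so that the
        // argument is treated as escaping and stays alive until after the C
        // call; the branch is never taken because cgoAlwaysFalse is false.
        if cgoAlwaysFalse {
            cgoUse(p)
        }
    }

    func main() {
        x := 42
        callIntoC(&x)
    }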
diff --git a/src/runtime/cgo/asm_386.s b/src/runtime/cgo/asm_386.s
new file mode 100644
index 0000000..2e7e951
--- /dev/null
+++ b/src/runtime/cgo/asm_386.s
@@ -0,0 +1,29 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT,$28-16
+ MOVL BP, 24(SP)
+ MOVL BX, 20(SP)
+ MOVL SI, 16(SP)
+ MOVL DI, 12(SP)
+
+ MOVL ctxt+12(FP), AX
+ MOVL AX, 8(SP)
+ MOVL a+4(FP), AX
+ MOVL AX, 4(SP)
+ MOVL fn+0(FP), AX
+ MOVL AX, 0(SP)
+ CALL runtime·cgocallback(SB)
+
+ MOVL 12(SP), DI
+ MOVL 16(SP), SI
+ MOVL 20(SP), BX
+ MOVL 24(SP), BP
+ RET
diff --git a/src/runtime/cgo/asm_amd64.s b/src/runtime/cgo/asm_amd64.s
new file mode 100644
index 0000000..5dc8e2d
--- /dev/null
+++ b/src/runtime/cgo/asm_amd64.s
@@ -0,0 +1,72 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+// This signature is known to SWIG, so we can't change it.
+#ifndef GOOS_windows
+TEXT crosscall2(SB),NOSPLIT,$0x50-0 /* keeps stack pointer 32-byte aligned */
+#else
+TEXT crosscall2(SB),NOSPLIT,$0x110-0 /* also need to save xmm6 - xmm15 */
+#endif
+ MOVQ BX, 0x18(SP)
+ MOVQ R12, 0x28(SP)
+ MOVQ R13, 0x30(SP)
+ MOVQ R14, 0x38(SP)
+ MOVQ R15, 0x40(SP)
+
+#ifdef GOOS_windows
+ // Win64 saves RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15 and XMM6 -- XMM15.
+ MOVQ DI, 0x48(SP)
+ MOVQ SI, 0x50(SP)
+ MOVUPS X6, 0x60(SP)
+ MOVUPS X7, 0x70(SP)
+ MOVUPS X8, 0x80(SP)
+ MOVUPS X9, 0x90(SP)
+ MOVUPS X10, 0xa0(SP)
+ MOVUPS X11, 0xb0(SP)
+ MOVUPS X12, 0xc0(SP)
+ MOVUPS X13, 0xd0(SP)
+ MOVUPS X14, 0xe0(SP)
+ MOVUPS X15, 0xf0(SP)
+
+ MOVQ CX, 0x0(SP) /* fn */
+ MOVQ DX, 0x8(SP) /* arg */
+ // Skip n in R8.
+ MOVQ R9, 0x10(SP) /* ctxt */
+
+ CALL runtime·cgocallback(SB)
+
+ MOVQ 0x48(SP), DI
+ MOVQ 0x50(SP), SI
+ MOVUPS 0x60(SP), X6
+ MOVUPS 0x70(SP), X7
+ MOVUPS 0x80(SP), X8
+ MOVUPS 0x90(SP), X9
+ MOVUPS 0xa0(SP), X10
+ MOVUPS 0xb0(SP), X11
+ MOVUPS 0xc0(SP), X12
+ MOVUPS 0xd0(SP), X13
+ MOVUPS 0xe0(SP), X14
+ MOVUPS 0xf0(SP), X15
+#else
+ MOVQ DI, 0x0(SP) /* fn */
+ MOVQ SI, 0x8(SP) /* arg */
+ // Skip n in DX.
+ MOVQ CX, 0x10(SP) /* ctxt */
+
+ CALL runtime·cgocallback(SB)
+#endif
+
+ MOVQ 0x18(SP), BX
+ MOVQ 0x28(SP), R12
+ MOVQ 0x30(SP), R13
+ MOVQ 0x38(SP), R14
+ MOVQ 0x40(SP), R15
+
+ RET
diff --git a/src/runtime/cgo/asm_arm.s b/src/runtime/cgo/asm_arm.s
new file mode 100644
index 0000000..ea55e17
--- /dev/null
+++ b/src/runtime/cgo/asm_arm.s
@@ -0,0 +1,56 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ SUB $(8*9), R13 // Reserve space for the floating point registers.
+ // The C arguments arrive in R0, R1, R2, and R3. We want to
+ // pass R0, R1, and R3 to Go, so we push those on the stack.
+ // Also, save C callee-save registers R4-R12.
+ MOVM.WP [R0, R1, R3, R4, R5, R6, R7, R8, R9, g, R11, R12], (R13)
+ // Finally, save the link register R14. This also puts the
+ // arguments we pushed for cgocallback where they need to be,
+ // starting at 4(R13).
+ MOVW.W R14, -4(R13)
+
+ // Skip floating point registers on GOARM < 6.
+ MOVB runtime·goarm(SB), R11
+ CMP $6, R11
+ BLT skipfpsave
+ MOVD F8, (13*4+8*1)(R13)
+ MOVD F9, (13*4+8*2)(R13)
+ MOVD F10, (13*4+8*3)(R13)
+ MOVD F11, (13*4+8*4)(R13)
+ MOVD F12, (13*4+8*5)(R13)
+ MOVD F13, (13*4+8*6)(R13)
+ MOVD F14, (13*4+8*7)(R13)
+ MOVD F15, (13*4+8*8)(R13)
+
+skipfpsave:
+ BL runtime·load_g(SB)
+ // We set up the arguments to cgocallback when saving registers above.
+ BL runtime·cgocallback(SB)
+
+ MOVB runtime·goarm(SB), R11
+ CMP $6, R11
+ BLT skipfprest
+ MOVD (13*4+8*1)(R13), F8
+ MOVD (13*4+8*2)(R13), F9
+ MOVD (13*4+8*3)(R13), F10
+ MOVD (13*4+8*4)(R13), F11
+ MOVD (13*4+8*5)(R13), F12
+ MOVD (13*4+8*6)(R13), F13
+ MOVD (13*4+8*7)(R13), F14
+ MOVD (13*4+8*8)(R13), F15
+
+skipfprest:
+ MOVW.P 4(R13), R14
+ MOVM.IAW (R13), [R0, R1, R3, R4, R5, R6, R7, R8, R9, g, R11, R12]
+ ADD $(8*9), R13
+ MOVW R14, R15
diff --git a/src/runtime/cgo/asm_arm64.s b/src/runtime/cgo/asm_arm64.s
new file mode 100644
index 0000000..1cb25cf
--- /dev/null
+++ b/src/runtime/cgo/asm_arm64.s
@@ -0,0 +1,70 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ /*
+ * We still need to save all callee-save registers as before, and then
+ * push 3 args for fn (R0, R1, R3), skipping R2.
+ * Also note that at procedure entry in gc world, 8(RSP) will be the
+ * first arg.
+ * TODO(minux): use LDP/STP here if it matters.
+ */
+ SUB $(8*24), RSP
+ MOVD R0, (8*1)(RSP)
+ MOVD R1, (8*2)(RSP)
+ MOVD R3, (8*3)(RSP)
+ MOVD R19, (8*4)(RSP)
+ MOVD R20, (8*5)(RSP)
+ MOVD R21, (8*6)(RSP)
+ MOVD R22, (8*7)(RSP)
+ MOVD R23, (8*8)(RSP)
+ MOVD R24, (8*9)(RSP)
+ MOVD R25, (8*10)(RSP)
+ MOVD R26, (8*11)(RSP)
+ MOVD R27, (8*12)(RSP)
+ MOVD g, (8*13)(RSP)
+ MOVD R29, (8*14)(RSP)
+ MOVD R30, (8*15)(RSP)
+ FMOVD F8, (8*16)(RSP)
+ FMOVD F9, (8*17)(RSP)
+ FMOVD F10, (8*18)(RSP)
+ FMOVD F11, (8*19)(RSP)
+ FMOVD F12, (8*20)(RSP)
+ FMOVD F13, (8*21)(RSP)
+ FMOVD F14, (8*22)(RSP)
+ FMOVD F15, (8*23)(RSP)
+
+ // Initialize Go ABI environment
+ BL runtime·load_g(SB)
+
+ BL runtime·cgocallback(SB)
+
+ MOVD (8*4)(RSP), R19
+ MOVD (8*5)(RSP), R20
+ MOVD (8*6)(RSP), R21
+ MOVD (8*7)(RSP), R22
+ MOVD (8*8)(RSP), R23
+ MOVD (8*9)(RSP), R24
+ MOVD (8*10)(RSP), R25
+ MOVD (8*11)(RSP), R26
+ MOVD (8*12)(RSP), R27
+ MOVD (8*13)(RSP), g
+ MOVD (8*14)(RSP), R29
+ MOVD (8*15)(RSP), R30
+ FMOVD (8*16)(RSP), F8
+ FMOVD (8*17)(RSP), F9
+ FMOVD (8*18)(RSP), F10
+ FMOVD (8*19)(RSP), F11
+ FMOVD (8*20)(RSP), F12
+ FMOVD (8*21)(RSP), F13
+ FMOVD (8*22)(RSP), F14
+ FMOVD (8*23)(RSP), F15
+ ADD $(8*24), RSP
+ RET
diff --git a/src/runtime/cgo/asm_mips64x.s b/src/runtime/cgo/asm_mips64x.s
new file mode 100644
index 0000000..e51cdf3
--- /dev/null
+++ b/src/runtime/cgo/asm_mips64x.s
@@ -0,0 +1,82 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips64 mips64le
+
+#include "textflag.h"
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ /*
+ * We still need to save all callee-save registers as before, and then
+ * push 3 args for fn (R4, R5, R7), skipping R6.
+ * Also note that at procedure entry in gc world, 8(R29) will be the
+ * first arg.
+ */
+#ifndef GOMIPS64_softfloat
+ ADDV $(-8*23), R29
+#else
+ ADDV $(-8*15), R29
+#endif
+ MOVV R4, (8*1)(R29) // fn unsafe.Pointer
+ MOVV R5, (8*2)(R29) // a unsafe.Pointer
+ MOVV R7, (8*3)(R29) // ctxt uintptr
+ MOVV R16, (8*4)(R29)
+ MOVV R17, (8*5)(R29)
+ MOVV R18, (8*6)(R29)
+ MOVV R19, (8*7)(R29)
+ MOVV R20, (8*8)(R29)
+ MOVV R21, (8*9)(R29)
+ MOVV R22, (8*10)(R29)
+ MOVV R23, (8*11)(R29)
+ MOVV RSB, (8*12)(R29)
+ MOVV g, (8*13)(R29)
+ MOVV R31, (8*14)(R29)
+#ifndef GOMIPS64_softfloat
+ MOVD F24, (8*15)(R29)
+ MOVD F25, (8*16)(R29)
+ MOVD F26, (8*17)(R29)
+ MOVD F27, (8*18)(R29)
+ MOVD F28, (8*19)(R29)
+ MOVD F29, (8*20)(R29)
+ MOVD F30, (8*21)(R29)
+ MOVD F31, (8*22)(R29)
+#endif
+ // Initialize Go ABI environment
+ // prepare SB register = PC & 0xffffffff00000000
+ BGEZAL R0, 1(PC)
+ SRLV $32, R31, RSB
+ SLLV $32, RSB
+ JAL runtime·load_g(SB)
+
+ JAL runtime·cgocallback(SB)
+
+ MOVV (8*4)(R29), R16
+ MOVV (8*5)(R29), R17
+ MOVV (8*6)(R29), R18
+ MOVV (8*7)(R29), R19
+ MOVV (8*8)(R29), R20
+ MOVV (8*9)(R29), R21
+ MOVV (8*10)(R29), R22
+ MOVV (8*11)(R29), R23
+ MOVV (8*12)(R29), RSB
+ MOVV (8*13)(R29), g
+ MOVV (8*14)(R29), R31
+#ifndef GOMIPS64_softfloat
+ MOVD (8*15)(R29), F24
+ MOVD (8*16)(R29), F25
+ MOVD (8*17)(R29), F26
+ MOVD (8*18)(R29), F27
+ MOVD (8*19)(R29), F28
+ MOVD (8*20)(R29), F29
+ MOVD (8*21)(R29), F30
+ MOVD (8*22)(R29), F31
+ ADDV $(8*23), R29
+#else
+ ADDV $(8*15), R29
+#endif
+ RET
diff --git a/src/runtime/cgo/asm_mipsx.s b/src/runtime/cgo/asm_mipsx.s
new file mode 100644
index 0000000..1127c8b
--- /dev/null
+++ b/src/runtime/cgo/asm_mipsx.s
@@ -0,0 +1,75 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips mipsle
+
+#include "textflag.h"
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ /*
+ * We still need to save all callee-save registers as before, and then
+ * push 3 args for fn (R4, R5, R7), skipping R6.
+ * Also note that at procedure entry in gc world, 4(R29) will be the
+ * first arg.
+ */
+
+ // Space for 9 callee-saved GPRs + LR + 6 callee-saved FPRs.
+ // The O32 ABI allows us to smash the 16-byte argument area of the caller's frame.
+#ifndef GOMIPS_softfloat
+ SUBU $(4*14+8*6-16), R29
+#else
+ SUBU $(4*14-16), R29 // For soft-float, no FPR.
+#endif
+ MOVW R4, (4*1)(R29) // fn unsafe.Pointer
+ MOVW R5, (4*2)(R29) // a unsafe.Pointer
+ MOVW R7, (4*3)(R29) // ctxt uintptr
+ MOVW R16, (4*4)(R29)
+ MOVW R17, (4*5)(R29)
+ MOVW R18, (4*6)(R29)
+ MOVW R19, (4*7)(R29)
+ MOVW R20, (4*8)(R29)
+ MOVW R21, (4*9)(R29)
+ MOVW R22, (4*10)(R29)
+ MOVW R23, (4*11)(R29)
+ MOVW g, (4*12)(R29)
+ MOVW R31, (4*13)(R29)
+#ifndef GOMIPS_softfloat
+ MOVD F20, (4*14)(R29)
+ MOVD F22, (4*14+8*1)(R29)
+ MOVD F24, (4*14+8*2)(R29)
+ MOVD F26, (4*14+8*3)(R29)
+ MOVD F28, (4*14+8*4)(R29)
+ MOVD F30, (4*14+8*5)(R29)
+#endif
+ JAL runtime·load_g(SB)
+
+ JAL runtime·cgocallback(SB)
+
+ MOVW (4*4)(R29), R16
+ MOVW (4*5)(R29), R17
+ MOVW (4*6)(R29), R18
+ MOVW (4*7)(R29), R19
+ MOVW (4*8)(R29), R20
+ MOVW (4*9)(R29), R21
+ MOVW (4*10)(R29), R22
+ MOVW (4*11)(R29), R23
+ MOVW (4*12)(R29), g
+ MOVW (4*13)(R29), R31
+#ifndef GOMIPS_softfloat
+ MOVD (4*14)(R29), F20
+ MOVD (4*14+8*1)(R29), F22
+ MOVD (4*14+8*2)(R29), F24
+ MOVD (4*14+8*3)(R29), F26
+ MOVD (4*14+8*4)(R29), F28
+ MOVD (4*14+8*5)(R29), F30
+
+ ADDU $(4*14+8*6-16), R29
+#else
+ ADDU $(4*14-16), R29
+#endif
+ RET
diff --git a/src/runtime/cgo/asm_ppc64x.s b/src/runtime/cgo/asm_ppc64x.s
new file mode 100644
index 0000000..f4efc1e
--- /dev/null
+++ b/src/runtime/cgo/asm_ppc64x.s
@@ -0,0 +1,134 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64 ppc64le
+
+#include "textflag.h"
+#include "asm_ppc64x.h"
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ // Start with standard C stack frame layout and linkage
+ MOVD LR, R0
+ MOVD R0, 16(R1) // Save LR in caller's frame
+ MOVW CR, R0 // Save CR in caller's frame
+ MOVD R0, 8(R1)
+ MOVD R2, 24(R1) // Save TOC in caller's frame
+
+ BL saveregs2<>(SB)
+
+ MOVDU R1, (-288-3*8-FIXED_FRAME)(R1)
+
+ // Initialize Go ABI environment
+ BL runtime·reginit(SB)
+ BL runtime·load_g(SB)
+
+#ifdef GOARCH_ppc64
+	// ppc64 uses ELF ABI v1. We must get the real entry address from
+	// the first slot of the function descriptor before the call.
+ // Same for AIX.
+ MOVD 8(R3), R2
+ MOVD (R3), R3
+#endif
+ MOVD R3, FIXED_FRAME+0(R1) // fn unsafe.Pointer
+ MOVD R4, FIXED_FRAME+8(R1) // a unsafe.Pointer
+ // Skip R5 = n uint32
+ MOVD R6, FIXED_FRAME+16(R1) // ctxt uintptr
+ BL runtime·cgocallback(SB)
+
+ ADD $(288+3*8+FIXED_FRAME), R1
+
+ BL restoreregs2<>(SB)
+
+ MOVD 24(R1), R2
+ MOVD 8(R1), R0
+ MOVFL R0, $0xff
+ MOVD 16(R1), R0
+ MOVD R0, LR
+ RET
+
+TEXT saveregs2<>(SB),NOSPLIT|NOFRAME,$0
+ // O=-288; for R in R{14..31}; do echo "\tMOVD\t$R, $O(R1)"|sed s/R30/g/; ((O+=8)); done; for F in F{14..31}; do echo "\tFMOVD\t$F, $O(R1)"; ((O+=8)); done
+ MOVD R14, -288(R1)
+ MOVD R15, -280(R1)
+ MOVD R16, -272(R1)
+ MOVD R17, -264(R1)
+ MOVD R18, -256(R1)
+ MOVD R19, -248(R1)
+ MOVD R20, -240(R1)
+ MOVD R21, -232(R1)
+ MOVD R22, -224(R1)
+ MOVD R23, -216(R1)
+ MOVD R24, -208(R1)
+ MOVD R25, -200(R1)
+ MOVD R26, -192(R1)
+ MOVD R27, -184(R1)
+ MOVD R28, -176(R1)
+ MOVD R29, -168(R1)
+ MOVD g, -160(R1)
+ MOVD R31, -152(R1)
+ FMOVD F14, -144(R1)
+ FMOVD F15, -136(R1)
+ FMOVD F16, -128(R1)
+ FMOVD F17, -120(R1)
+ FMOVD F18, -112(R1)
+ FMOVD F19, -104(R1)
+ FMOVD F20, -96(R1)
+ FMOVD F21, -88(R1)
+ FMOVD F22, -80(R1)
+ FMOVD F23, -72(R1)
+ FMOVD F24, -64(R1)
+ FMOVD F25, -56(R1)
+ FMOVD F26, -48(R1)
+ FMOVD F27, -40(R1)
+ FMOVD F28, -32(R1)
+ FMOVD F29, -24(R1)
+ FMOVD F30, -16(R1)
+ FMOVD F31, -8(R1)
+
+ RET
+
+TEXT restoreregs2<>(SB),NOSPLIT|NOFRAME,$0
+ // O=-288; for R in R{14..31}; do echo "\tMOVD\t$O(R1), $R"|sed s/R30/g/; ((O+=8)); done; for F in F{14..31}; do echo "\tFMOVD\t$O(R1), $F"; ((O+=8)); done
+ MOVD -288(R1), R14
+ MOVD -280(R1), R15
+ MOVD -272(R1), R16
+ MOVD -264(R1), R17
+ MOVD -256(R1), R18
+ MOVD -248(R1), R19
+ MOVD -240(R1), R20
+ MOVD -232(R1), R21
+ MOVD -224(R1), R22
+ MOVD -216(R1), R23
+ MOVD -208(R1), R24
+ MOVD -200(R1), R25
+ MOVD -192(R1), R26
+ MOVD -184(R1), R27
+ MOVD -176(R1), R28
+ MOVD -168(R1), R29
+ MOVD -160(R1), g
+ MOVD -152(R1), R31
+ FMOVD -144(R1), F14
+ FMOVD -136(R1), F15
+ FMOVD -128(R1), F16
+ FMOVD -120(R1), F17
+ FMOVD -112(R1), F18
+ FMOVD -104(R1), F19
+ FMOVD -96(R1), F20
+ FMOVD -88(R1), F21
+ FMOVD -80(R1), F22
+ FMOVD -72(R1), F23
+ FMOVD -64(R1), F24
+ FMOVD -56(R1), F25
+ FMOVD -48(R1), F26
+ FMOVD -40(R1), F27
+ FMOVD -32(R1), F28
+ FMOVD -24(R1), F29
+ FMOVD -16(R1), F30
+ FMOVD -8(R1), F31
+
+ RET
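
For readers unfamiliar with ELF ABI v1, the two MOVD instructions guarded by GOARCH_ppc64 above do not jump through fn directly: the C "function pointer" refers to a function descriptor whose first slot holds the real entry address and whose second slot holds the callee's TOC. A minimal C sketch of that layout, for illustration only (the type and helper names are not part of the runtime):

#include <stdint.h>

/* ELFv1 function descriptor: what a "function pointer" actually points at. */
typedef struct {
	uintptr_t entry; /* address of the first instruction */
	uintptr_t toc;   /* TOC base the callee expects in r2 */
	uintptr_t env;   /* environment pointer, unused here */
} elfv1_func_desc;

/* Recover the real entry point and TOC from a descriptor, mirroring
   MOVD 8(R3), R2 and MOVD (R3), R3 in the assembly above. */
static uintptr_t elfv1_entry(const void *fn, uintptr_t *toc) {
	const elfv1_func_desc *d = (const elfv1_func_desc *)fn;
	*toc = d->toc;
	return d->entry;
}
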
diff --git a/src/runtime/cgo/asm_riscv64.s b/src/runtime/cgo/asm_riscv64.s
new file mode 100644
index 0000000..b4ddbb0
--- /dev/null
+++ b/src/runtime/cgo/asm_riscv64.s
@@ -0,0 +1,84 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build riscv64
+
+#include "textflag.h"
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ /*
+ * Push arguments for fn (X10, X11, X13), along with all callee-save
+ * registers. Note that at procedure entry the first argument is at
+ * 8(X2).
+ */
+ ADD $(-8*31), X2
+ MOV X10, (8*1)(X2) // fn unsafe.Pointer
+ MOV X11, (8*2)(X2) // a unsafe.Pointer
+ MOV X13, (8*3)(X2) // ctxt uintptr
+ MOV X8, (8*4)(X2)
+ MOV X9, (8*5)(X2)
+ MOV X18, (8*6)(X2)
+ MOV X19, (8*7)(X2)
+ MOV X20, (8*8)(X2)
+ MOV X21, (8*9)(X2)
+ MOV X22, (8*10)(X2)
+ MOV X23, (8*11)(X2)
+ MOV X24, (8*12)(X2)
+ MOV X25, (8*13)(X2)
+ MOV X26, (8*14)(X2)
+ MOV g, (8*15)(X2)
+ MOV X3, (8*16)(X2)
+ MOV X4, (8*17)(X2)
+ MOV X1, (8*18)(X2)
+ MOVD F8, (8*19)(X2)
+ MOVD F9, (8*20)(X2)
+ MOVD F18, (8*21)(X2)
+ MOVD F19, (8*22)(X2)
+ MOVD F20, (8*23)(X2)
+ MOVD F21, (8*24)(X2)
+ MOVD F22, (8*25)(X2)
+ MOVD F23, (8*26)(X2)
+ MOVD F24, (8*27)(X2)
+ MOVD F25, (8*28)(X2)
+ MOVD F26, (8*29)(X2)
+ MOVD F27, (8*30)(X2)
+
+ // Initialize Go ABI environment
+ CALL runtime·load_g(SB)
+ CALL runtime·cgocallback(SB)
+
+ MOV (8*4)(X2), X8
+ MOV (8*5)(X2), X9
+ MOV (8*6)(X2), X18
+ MOV (8*7)(X2), X19
+ MOV (8*8)(X2), X20
+ MOV (8*9)(X2), X21
+ MOV (8*10)(X2), X22
+ MOV (8*11)(X2), X23
+ MOV (8*12)(X2), X24
+ MOV (8*13)(X2), X25
+ MOV (8*14)(X2), X26
+ MOV (8*15)(X2), g
+ MOV (8*16)(X2), X3
+ MOV (8*17)(X2), X4
+ MOV (8*18)(X2), X1
+ MOVD (8*19)(X2), F8
+ MOVD (8*20)(X2), F9
+ MOVD (8*21)(X2), F18
+ MOVD (8*22)(X2), F19
+ MOVD (8*23)(X2), F20
+ MOVD (8*24)(X2), F21
+ MOVD (8*25)(X2), F22
+ MOVD (8*26)(X2), F23
+ MOVD (8*27)(X2), F24
+ MOVD (8*28)(X2), F25
+ MOVD (8*29)(X2), F26
+ MOVD (8*30)(X2), F27
+ ADD $(8*31), X2
+
+ RET
diff --git a/src/runtime/cgo/asm_s390x.s b/src/runtime/cgo/asm_s390x.s
new file mode 100644
index 0000000..8bf16e7
--- /dev/null
+++ b/src/runtime/cgo/asm_s390x.s
@@ -0,0 +1,55 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ // Start with standard C stack frame layout and linkage.
+
+ // Save R6-R15 in the register save area of the calling function.
+ STMG R6, R15, 48(R15)
+
+ // Allocate 96 bytes on the stack.
+ MOVD $-96(R15), R15
+
+ // Save F8-F15 in our stack frame.
+ FMOVD F8, 32(R15)
+ FMOVD F9, 40(R15)
+ FMOVD F10, 48(R15)
+ FMOVD F11, 56(R15)
+ FMOVD F12, 64(R15)
+ FMOVD F13, 72(R15)
+ FMOVD F14, 80(R15)
+ FMOVD F15, 88(R15)
+
+ // Initialize Go ABI environment.
+ BL runtime·load_g(SB)
+
+ MOVD R2, 8(R15) // fn unsafe.Pointer
+ MOVD R3, 16(R15) // a unsafe.Pointer
+ // Skip R4 = n uint32
+ MOVD R5, 24(R15) // ctxt uintptr
+ BL runtime·cgocallback(SB)
+
+ FMOVD 32(R15), F8
+ FMOVD 40(R15), F9
+ FMOVD 48(R15), F10
+ FMOVD 56(R15), F11
+ FMOVD 64(R15), F12
+ FMOVD 72(R15), F13
+ FMOVD 80(R15), F14
+ FMOVD 88(R15), F15
+
+ // De-allocate stack frame.
+ MOVD $96(R15), R15
+
+ // Restore R6-R15.
+ LMG 48(R15), R6, R15
+
+ RET
+
diff --git a/src/runtime/cgo/asm_wasm.s b/src/runtime/cgo/asm_wasm.s
new file mode 100644
index 0000000..cb140eb
--- /dev/null
+++ b/src/runtime/cgo/asm_wasm.s
@@ -0,0 +1,8 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT crosscall2(SB), NOSPLIT, $0
+ UNDEF
diff --git a/src/runtime/cgo/callbacks.go b/src/runtime/cgo/callbacks.go
new file mode 100644
index 0000000..cd8b795
--- /dev/null
+++ b/src/runtime/cgo/callbacks.go
@@ -0,0 +1,106 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package cgo
+
+import "unsafe"
+
+// These utility functions are available to be called from code
+// compiled with gcc via crosscall2.
+
+// The declaration of crosscall2 is:
+// void crosscall2(void (*fn)(void *), void *, int);
+//
+// We need to export the symbol crosscall2 in order to support
+// callbacks from shared libraries. This applies regardless of
+// linking mode.
+//
+// Compatibility note: SWIG uses crosscall2 in exactly one situation:
+// to call _cgo_panic using the pattern shown below. We need to keep
+// that pattern working. In particular, crosscall2 actually takes four
+// arguments, but it works to call it with three arguments when
+// calling _cgo_panic.
+//go:cgo_export_static crosscall2
+//go:cgo_export_dynamic crosscall2
+
+// Panic. The argument is converted into a Go string.
+
+// Call like this in code compiled with gcc:
+// struct { const char *p; } a;
+// a.p = /* string to pass to panic */;
+// crosscall2(_cgo_panic, &a, sizeof a);
+// /* The function call will not return. */
+
+// TODO: We should export a regular C function to panic, change SWIG
+// to use that instead of the above pattern, and then we can drop
+// backwards-compatibility from crosscall2 and stop exporting it.
+
+//go:linkname _runtime_cgo_panic_internal runtime._cgo_panic_internal
+func _runtime_cgo_panic_internal(p *byte)
+
+//go:linkname _cgo_panic _cgo_panic
+//go:cgo_export_static _cgo_panic
+//go:cgo_export_dynamic _cgo_panic
+func _cgo_panic(a *struct{ cstr *byte }) {
+ _runtime_cgo_panic_internal(a.cstr)
+}
+
+//go:cgo_import_static x_cgo_init
+//go:linkname x_cgo_init x_cgo_init
+//go:linkname _cgo_init _cgo_init
+var x_cgo_init byte
+var _cgo_init = &x_cgo_init
+
+//go:cgo_import_static x_cgo_thread_start
+//go:linkname x_cgo_thread_start x_cgo_thread_start
+//go:linkname _cgo_thread_start _cgo_thread_start
+var x_cgo_thread_start byte
+var _cgo_thread_start = &x_cgo_thread_start
+
+// Creates a new system thread without updating any Go state.
+//
+// This method is invoked during shared library loading to create a new OS
+// thread to perform the runtime initialization. This method is similar to
+// _cgo_sys_thread_start except that it doesn't update any Go state.
+
+//go:cgo_import_static x_cgo_sys_thread_create
+//go:linkname x_cgo_sys_thread_create x_cgo_sys_thread_create
+//go:linkname _cgo_sys_thread_create _cgo_sys_thread_create
+var x_cgo_sys_thread_create byte
+var _cgo_sys_thread_create = &x_cgo_sys_thread_create
+
+// Notifies that the runtime has been initialized.
+//
+// We currently block at every CGO entry point (via _cgo_wait_runtime_init_done)
+// to ensure that the runtime has been initialized before the CGO call is
+// executed. This is necessary for shared libraries, where we kick off runtime
+// initialization in a separate thread and return without waiting for this
+// thread to complete the init.
+
+//go:cgo_import_static x_cgo_notify_runtime_init_done
+//go:linkname x_cgo_notify_runtime_init_done x_cgo_notify_runtime_init_done
+//go:linkname _cgo_notify_runtime_init_done _cgo_notify_runtime_init_done
+var x_cgo_notify_runtime_init_done byte
+var _cgo_notify_runtime_init_done = &x_cgo_notify_runtime_init_done
+
+// Sets the traceback context function. See runtime.SetCgoTraceback.
+
+//go:cgo_import_static x_cgo_set_context_function
+//go:linkname x_cgo_set_context_function x_cgo_set_context_function
+//go:linkname _cgo_set_context_function _cgo_set_context_function
+var x_cgo_set_context_function byte
+var _cgo_set_context_function = &x_cgo_set_context_function
+
+// Calls a libc function to execute background work injected via libc
+// interceptors, such as processing pending signals under the thread
+// sanitizer.
+//
+// Left as a nil pointer if no libc interceptors are expected.
+
+//go:cgo_import_static _cgo_yield
+//go:linkname _cgo_yield _cgo_yield
+var _cgo_yield unsafe.Pointer
+
+//go:cgo_export_static _cgo_topofstack
+//go:cgo_export_dynamic _cgo_topofstack
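
The _cgo_panic calling pattern documented in the comments above can be turned into a complete, compilable C helper. This is only an illustrative sketch: the crosscall2 prototype follows the declaration given in the comment, and the _cgo_panic prototype is an assumption about how caller code would spell it, not a runtime-provided header.

extern void crosscall2(void (*fn)(void *), void *, int);
extern void _cgo_panic(void *);

/* Raise a Go panic from C code compiled with gcc. Does not return. */
static void panic_from_c(const char *msg) {
	struct { const char *p; } a;
	a.p = msg;
	crosscall2(_cgo_panic, &a, sizeof a);
}
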
diff --git a/src/runtime/cgo/callbacks_aix.go b/src/runtime/cgo/callbacks_aix.go
new file mode 100644
index 0000000..f4b6fe2
--- /dev/null
+++ b/src/runtime/cgo/callbacks_aix.go
@@ -0,0 +1,11 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package cgo
+
+// These functions must be exported in order to perform
+// longcalls from cgo programs (cf. gcc_aix_ppc64.c).
+//go:cgo_export_static __cgo_topofstack
+//go:cgo_export_static runtime.rt0_go
+//go:cgo_export_static _rt0_ppc64_aix_lib
diff --git a/src/runtime/cgo/callbacks_traceback.go b/src/runtime/cgo/callbacks_traceback.go
new file mode 100644
index 0000000..cdadf9e
--- /dev/null
+++ b/src/runtime/cgo/callbacks_traceback.go
@@ -0,0 +1,17 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build darwin linux
+
+package cgo
+
+import _ "unsafe" // for go:linkname
+
+// Calls the traceback function passed to SetCgoTraceback.
+
+//go:cgo_import_static x_cgo_callers
+//go:linkname x_cgo_callers x_cgo_callers
+//go:linkname _cgo_callers _cgo_callers
+var x_cgo_callers byte
+var _cgo_callers = &x_cgo_callers
diff --git a/src/runtime/cgo/cgo.go b/src/runtime/cgo/cgo.go
new file mode 100644
index 0000000..4d2caf6
--- /dev/null
+++ b/src/runtime/cgo/cgo.go
@@ -0,0 +1,34 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+Package cgo contains runtime support for code generated
+by the cgo tool. See the documentation for the cgo command
+for details on using cgo.
+*/
+package cgo
+
+/*
+
+#cgo darwin,!arm64 LDFLAGS: -lpthread
+#cgo darwin,arm64 LDFLAGS: -framework CoreFoundation
+#cgo dragonfly LDFLAGS: -lpthread
+#cgo freebsd LDFLAGS: -lpthread
+#cgo android LDFLAGS: -llog
+#cgo !android,linux LDFLAGS: -lpthread
+#cgo netbsd LDFLAGS: -lpthread
+#cgo openbsd LDFLAGS: -lpthread
+#cgo aix LDFLAGS: -Wl,-berok
+#cgo solaris LDFLAGS: -lxnet
+#cgo illumos LDFLAGS: -lsocket
+
+// Issue 35247.
+#cgo darwin CFLAGS: -Wno-nullability-completeness
+
+#cgo CFLAGS: -Wall -Werror
+
+#cgo solaris CPPFLAGS: -D_POSIX_PTHREAD_SEMANTICS
+
+*/
+import "C"
diff --git a/src/runtime/cgo/dragonfly.go b/src/runtime/cgo/dragonfly.go
new file mode 100644
index 0000000..d6d6918
--- /dev/null
+++ b/src/runtime/cgo/dragonfly.go
@@ -0,0 +1,19 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build dragonfly
+
+package cgo
+
+import _ "unsafe" // for go:linkname
+
+// Supply environ and __progname, because we don't
+// link against the standard DragonFly crt0.o and the
+// libc dynamic library needs them.
+
+//go:linkname _environ environ
+//go:linkname _progname __progname
+
+var _environ uintptr
+var _progname uintptr
diff --git a/src/runtime/cgo/freebsd.go b/src/runtime/cgo/freebsd.go
new file mode 100644
index 0000000..5c9ddbd
--- /dev/null
+++ b/src/runtime/cgo/freebsd.go
@@ -0,0 +1,22 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build freebsd
+
+package cgo
+
+import _ "unsafe" // for go:linkname
+
+// Supply environ and __progname, because we don't
+// link against the standard FreeBSD crt0.o and the
+// libc dynamic library needs them.
+
+//go:linkname _environ environ
+//go:linkname _progname __progname
+
+//go:cgo_export_dynamic environ
+//go:cgo_export_dynamic __progname
+
+var _environ uintptr
+var _progname uintptr
diff --git a/src/runtime/cgo/gcc_386.S b/src/runtime/cgo/gcc_386.S
new file mode 100644
index 0000000..ff55b2c
--- /dev/null
+++ b/src/runtime/cgo/gcc_386.S
@@ -0,0 +1,40 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+ * Apple still insists on underscore prefixes for C function names.
+ */
+#if defined(__APPLE__) || defined(_WIN32)
+#define EXT(s) _##s
+#else
+#define EXT(s) s
+#endif
+
+/*
+ * void crosscall_386(void (*fn)(void))
+ *
+ * Calling into the 8c tool chain, where all registers are caller save.
+ * Called from standard x86 ABI, where %ebp, %ebx, %esi,
+ * and %edi are callee-save, so they must be saved explicitly.
+ */
+.globl EXT(crosscall_386)
+EXT(crosscall_386):
+ pushl %ebp
+ movl %esp, %ebp
+ pushl %ebx
+ pushl %esi
+ pushl %edi
+
+ movl 8(%ebp), %eax /* fn */
+ call *%eax
+
+ popl %edi
+ popl %esi
+ popl %ebx
+ popl %ebp
+ ret
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",@progbits
+#endif
diff --git a/src/runtime/cgo/gcc_aix_ppc64.S b/src/runtime/cgo/gcc_aix_ppc64.S
new file mode 100644
index 0000000..a00fae2
--- /dev/null
+++ b/src/runtime/cgo/gcc_aix_ppc64.S
@@ -0,0 +1,133 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64
+// +build aix
+
+/*
+ * void crosscall_ppc64(void (*fn)(void), void *g)
+ *
+ * Calling into the gc tool chain, where all registers are caller save.
+ * Called from standard ppc64 C ABI, where r2, r14-r31, f14-f31 are
+ * callee-save, so they must be saved explicitly.
+ * AIX has a special assembly syntax and keywords that can be mixed with
+ * Linux assembly.
+ */
+ .toc
+ .csect .text[PR]
+ .globl crosscall_ppc64
+ .globl .crosscall_ppc64
+ .csect crosscall_ppc64[DS]
+crosscall_ppc64:
+ .llong .crosscall_ppc64, TOC[tc0], 0
+ .csect .text[PR]
+.crosscall_ppc64:
+ // Start with standard C stack frame layout and linkage
+ mflr 0
+ std 0, 16(1) // Save LR in caller's frame
+ std 2, 40(1) // Save TOC in caller's frame
+ bl saveregs
+ stdu 1, -296(1)
+
+ // Set up Go ABI constant registers
+ // Must match _cgo_reginit in runtime package.
+ xor 0, 0, 0
+
+ // Restore g pointer (r30 in Go ABI, which may have been clobbered by C)
+ mr 30, 4
+
+ // Call fn
+ mr 12, 3
+ mtctr 12
+ bctrl
+
+ addi 1, 1, 296
+ bl restoreregs
+ ld 2, 40(1)
+ ld 0, 16(1)
+ mtlr 0
+ blr
+
+saveregs:
+ // Save callee-save registers
+ // O=-288; for R in {14..31}; do echo "\tstd\t$R, $O(1)"; ((O+=8)); done; for F in f{14..31}; do echo "\tstfd\t$F, $O(1)"; ((O+=8)); done
+ std 14, -288(1)
+ std 15, -280(1)
+ std 16, -272(1)
+ std 17, -264(1)
+ std 18, -256(1)
+ std 19, -248(1)
+ std 20, -240(1)
+ std 21, -232(1)
+ std 22, -224(1)
+ std 23, -216(1)
+ std 24, -208(1)
+ std 25, -200(1)
+ std 26, -192(1)
+ std 27, -184(1)
+ std 28, -176(1)
+ std 29, -168(1)
+ std 30, -160(1)
+ std 31, -152(1)
+ stfd 14, -144(1)
+ stfd 15, -136(1)
+ stfd 16, -128(1)
+ stfd 17, -120(1)
+ stfd 18, -112(1)
+ stfd 19, -104(1)
+ stfd 20, -96(1)
+ stfd 21, -88(1)
+ stfd 22, -80(1)
+ stfd 23, -72(1)
+ stfd 24, -64(1)
+ stfd 25, -56(1)
+ stfd 26, -48(1)
+ stfd 27, -40(1)
+ stfd 28, -32(1)
+ stfd 29, -24(1)
+ stfd 30, -16(1)
+ stfd 31, -8(1)
+
+ blr
+
+restoreregs:
+ // O=-288; for R in {14..31}; do echo "\tld\t$R, $O(1)"; ((O+=8)); done; for F in {14..31}; do echo "\tlfd\t$F, $O(1)"; ((O+=8)); done
+ ld 14, -288(1)
+ ld 15, -280(1)
+ ld 16, -272(1)
+ ld 17, -264(1)
+ ld 18, -256(1)
+ ld 19, -248(1)
+ ld 20, -240(1)
+ ld 21, -232(1)
+ ld 22, -224(1)
+ ld 23, -216(1)
+ ld 24, -208(1)
+ ld 25, -200(1)
+ ld 26, -192(1)
+ ld 27, -184(1)
+ ld 28, -176(1)
+ ld 29, -168(1)
+ ld 30, -160(1)
+ ld 31, -152(1)
+ lfd 14, -144(1)
+ lfd 15, -136(1)
+ lfd 16, -128(1)
+ lfd 17, -120(1)
+ lfd 18, -112(1)
+ lfd 19, -104(1)
+ lfd 20, -96(1)
+ lfd 21, -88(1)
+ lfd 22, -80(1)
+ lfd 23, -72(1)
+ lfd 24, -64(1)
+ lfd 25, -56(1)
+ lfd 26, -48(1)
+ lfd 27, -40(1)
+ lfd 28, -32(1)
+ lfd 29, -24(1)
+ lfd 30, -16(1)
+ lfd 31, -8(1)
+
+ blr
diff --git a/src/runtime/cgo/gcc_aix_ppc64.c b/src/runtime/cgo/gcc_aix_ppc64.c
new file mode 100644
index 0000000..f4f50b8
--- /dev/null
+++ b/src/runtime/cgo/gcc_aix_ppc64.c
@@ -0,0 +1,38 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix
+// +build ppc64 ppc64le
+
+/*
+ * On AIX, calls to _cgo_topofstack and Go main are forced to be longcalls.
+ * Without the longcall attribute, ld might add trampolines in the middle of
+ * the .text section to reach these functions, which are normally declared in
+ * the runtime package.
+ */
+extern int __attribute__((longcall)) __cgo_topofstack(void);
+extern int __attribute__((longcall)) runtime_rt0_go(int argc, char **argv);
+extern void __attribute__((longcall)) _rt0_ppc64_aix_lib(void);
+
+int _cgo_topofstack(void) {
+ return __cgo_topofstack();
+}
+
+int main(int argc, char **argv) {
+ return runtime_rt0_go(argc, argv);
+}
+
+static void libinit(void) __attribute__ ((constructor));
+
+/*
+ * libinit aims to replace the .init_array section, which isn't available on AIX.
+ * Using __attribute__ ((constructor)) lets gcc handle this instead of
+ * adding special code in cmd/link.
+ * However, it will be called for every Go program that uses cgo.
+ * Inside _rt0_ppc64_aix_lib(), runtime.isarchive is checked in order
+ * to know whether this program is a c-archive or a simple cgo program.
+ * If it's not set, _rt0_ppc64_aix_lib() returns immediately.
+ */
+static void libinit() {
+ _rt0_ppc64_aix_lib();
+}
diff --git a/src/runtime/cgo/gcc_amd64.S b/src/runtime/cgo/gcc_amd64.S
new file mode 100644
index 0000000..17d9d47
--- /dev/null
+++ b/src/runtime/cgo/gcc_amd64.S
@@ -0,0 +1,48 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+ * Apple still insists on underscore prefixes for C function names.
+ */
+#if defined(__APPLE__)
+#define EXT(s) _##s
+#else
+#define EXT(s) s
+#endif
+
+/*
+ * void crosscall_amd64(void (*fn)(void))
+ *
+ * Calling into the 6c tool chain, where all registers are caller save.
+ * Called from standard x86-64 ABI, where %rbx, %rbp, %r12-%r15
+ * are callee-save so they must be saved explicitly.
+ * The standard x86-64 ABI passes the three arguments m, g, fn
+ * in %rdi, %rsi, %rdx.
+ */
+.globl EXT(crosscall_amd64)
+EXT(crosscall_amd64):
+ pushq %rbx
+ pushq %rbp
+ pushq %r12
+ pushq %r13
+ pushq %r14
+ pushq %r15
+
+#if defined(_WIN64)
+ call *%rcx /* fn */
+#else
+ call *%rdi /* fn */
+#endif
+
+ popq %r15
+ popq %r14
+ popq %r13
+ popq %r12
+ popq %rbp
+ popq %rbx
+ ret
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",@progbits
+#endif
diff --git a/src/runtime/cgo/gcc_android.c b/src/runtime/cgo/gcc_android.c
new file mode 100644
index 0000000..7ea2135
--- /dev/null
+++ b/src/runtime/cgo/gcc_android.c
@@ -0,0 +1,90 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <stdarg.h>
+#include <android/log.h>
+#include <pthread.h>
+#include <dlfcn.h>
+#include "libcgo.h"
+
+void
+fatalf(const char* format, ...)
+{
+ va_list ap;
+
+ // Write to both stderr and logcat.
+ //
+ // When running from an .apk, /dev/stderr and /dev/stdout
+ // redirect to /dev/null. And when running a test binary
+ // via adb shell, it's easy to miss logcat.
+
+ fprintf(stderr, "runtime/cgo: ");
+ va_start(ap, format);
+ vfprintf(stderr, format, ap);
+ va_end(ap);
+ fprintf(stderr, "\n");
+
+ va_start(ap, format);
+ __android_log_vprint(ANDROID_LOG_FATAL, "runtime/cgo", format, ap);
+ va_end(ap);
+
+ abort();
+}
+
+// Truncated to a different magic value on 32-bit; that's ok.
+#define magic1 (0x23581321345589ULL)
+
+// From https://android.googlesource.com/platform/bionic/+/refs/heads/android10-tests-release/libc/private/bionic_asm_tls.h#69.
+#define TLS_SLOT_APP 2
+
+// inittls allocates a thread-local storage slot for g.
+//
+// It finds the first available slot using pthread_key_create and uses
+// it as the offset value for runtime.tls_g.
+static void
+inittls(void **tlsg, void **tlsbase)
+{
+ pthread_key_t k;
+ int i, err;
+ void *handle, *get_ver, *off;
+
+ // Check for Android Q where we can use the free TLS_SLOT_APP slot.
+ handle = dlopen("libc.so", RTLD_LAZY);
+ if (handle == NULL) {
+ fatalf("inittls: failed to dlopen main program");
+ return;
+ }
+	// android_get_device_api_level was introduced in Android Q, so its mere presence
+ // is enough.
+ get_ver = dlsym(handle, "android_get_device_api_level");
+ dlclose(handle);
+ if (get_ver != NULL) {
+ off = (void *)(TLS_SLOT_APP*sizeof(void *));
+ // tlsg is initialized to Q's free TLS slot. Verify it while we're here.
+ if (*tlsg != off) {
+ fatalf("tlsg offset wrong, got %ld want %ld\n", *tlsg, off);
+ }
+ return;
+ }
+
+ err = pthread_key_create(&k, nil);
+ if(err != 0) {
+ fatalf("pthread_key_create failed: %d", err);
+ }
+ pthread_setspecific(k, (void*)magic1);
+ // If thread local slots are laid out as we expect, our magic word will
+ // be located at some low offset from tlsbase. However, just in case something went
+ // wrong, the search is limited to sensible offsets. PTHREAD_KEYS_MAX was the
+ // original limit, but issue 19472 made a higher limit necessary.
+ for (i=0; i<384; i++) {
+ if (*(tlsbase+i) == (void*)magic1) {
+ *tlsg = (void*)(i*sizeof(void *));
+ pthread_setspecific(k, 0);
+ return;
+ }
+ }
+ fatalf("inittls: could not find pthread key");
+}
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase) = inittls;
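
The value stored into *tlsg above is a byte offset from the thread's TLS base pointer. A small sketch of how such an offset would be applied to read the slot back (illustrative only; the Go side does this in assembly through runtime.tls_g):

#include <stddef.h>

/* Read the value stored in the discovered TLS slot, given the base of
   the thread-local slot array and the byte offset saved in *tlsg. */
static void *read_tls_slot(void **tlsbase, size_t tls_g_offset) {
	return *(void **)((char *)tlsbase + tls_g_offset);
}
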
diff --git a/src/runtime/cgo/gcc_arm.S b/src/runtime/cgo/gcc_arm.S
new file mode 100644
index 0000000..fe1c48b
--- /dev/null
+++ b/src/runtime/cgo/gcc_arm.S
@@ -0,0 +1,42 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+ * Apple still insists on underscore prefixes for C function names.
+ */
+#if defined(__APPLE__)
+#define EXT(s) _##s
+#else
+#define EXT(s) s
+#endif
+
+// Apple's ld64 wants 4-byte alignment for ARM code sections.
+// .align in both Apple as and GNU as treat n as aligning to 2**n bytes.
+.align 2
+
+/*
+ * void crosscall_arm1(void (*fn)(void), void (*setg_gcc)(void *g), void *g)
+ *
+ * Calling into the 5c tool chain, where all registers are caller save.
+ * Called from standard ARM EABI, where r4-r11 are callee-save, so they
+ * must be saved explicitly.
+ */
+.globl EXT(crosscall_arm1)
+EXT(crosscall_arm1):
+ push {r4, r5, r6, r7, r8, r9, r10, r11, ip, lr}
+ mov r4, r0
+ mov r5, r1
+ mov r0, r2
+
+ // Because the assembler might target an earlier revision of the ISA
+ // by default, we encode BLX as a .word.
+ .word 0xe12fff35 // blx r5 // setg(g)
+ .word 0xe12fff34 // blx r4 // fn()
+
+ pop {r4, r5, r6, r7, r8, r9, r10, r11, ip, pc}
+
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_arm64.S b/src/runtime/cgo/gcc_arm64.S
new file mode 100644
index 0000000..9154d2a
--- /dev/null
+++ b/src/runtime/cgo/gcc_arm64.S
@@ -0,0 +1,82 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+ * Apple still insists on underscore prefixes for C function names.
+ */
+#if defined(__APPLE__)
+#define EXT(s) _##s
+#else
+#define EXT(s) s
+#endif
+
+// Apple's ld64 wants 4-byte alignment for ARM code sections.
+// .align in both Apple as and GNU as treat n as aligning to 2**n bytes.
+.align 2
+
+/*
+ * void crosscall1(void (*fn)(void), void (*setg_gcc)(void *g), void *g)
+ *
+ * Calling into the gc tool chain, where all registers are caller save.
+ * Called from standard ARM EABI, where x19-x29 are callee-save, so they
+ * must be saved explicitly, along with x30 (LR).
+ */
+.globl EXT(crosscall1)
+EXT(crosscall1):
+ .cfi_startproc
+ stp x29, x30, [sp, #-96]!
+ .cfi_def_cfa_offset 96
+ .cfi_offset 29, -96
+ .cfi_offset 30, -88
+ mov x29, sp
+ .cfi_def_cfa_register 29
+ stp x19, x20, [sp, #80]
+ .cfi_offset 19, -16
+ .cfi_offset 20, -8
+ stp x21, x22, [sp, #64]
+ .cfi_offset 21, -32
+ .cfi_offset 22, -24
+ stp x23, x24, [sp, #48]
+ .cfi_offset 23, -48
+ .cfi_offset 24, -40
+ stp x25, x26, [sp, #32]
+ .cfi_offset 25, -64
+ .cfi_offset 26, -56
+ stp x27, x28, [sp, #16]
+ .cfi_offset 27, -80
+ .cfi_offset 28, -72
+
+ mov x19, x0
+ mov x20, x1
+ mov x0, x2
+
+ blr x20
+ blr x19
+
+ ldp x27, x28, [sp, #16]
+ .cfi_restore 27
+ .cfi_restore 28
+ ldp x25, x26, [sp, #32]
+ .cfi_restore 25
+ .cfi_restore 26
+ ldp x23, x24, [sp, #48]
+ .cfi_restore 23
+ .cfi_restore 24
+ ldp x21, x22, [sp, #64]
+ .cfi_restore 21
+ .cfi_restore 22
+ ldp x19, x20, [sp, #80]
+ .cfi_restore 19
+ .cfi_restore 20
+ ldp x29, x30, [sp], #96
+ .cfi_restore 29
+ .cfi_restore 30
+ .cfi_def_cfa 31, 0
+ ret
+ .cfi_endproc
+
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_context.c b/src/runtime/cgo/gcc_context.c
new file mode 100644
index 0000000..5fc0abb
--- /dev/null
+++ b/src/runtime/cgo/gcc_context.c
@@ -0,0 +1,21 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build cgo
+// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris windows
+
+#include "libcgo.h"
+
+// Releases the cgo traceback context.
+void _cgo_release_context(uintptr_t ctxt) {
+ void (*pfn)(struct context_arg*);
+
+ pfn = _cgo_get_context_function();
+ if (ctxt != 0 && pfn != nil) {
+ struct context_arg arg;
+
+ arg.Context = ctxt;
+ (*pfn)(&arg);
+ }
+}
diff --git a/src/runtime/cgo/gcc_darwin_amd64.c b/src/runtime/cgo/gcc_darwin_amd64.c
new file mode 100644
index 0000000..51410d5
--- /dev/null
+++ b/src/runtime/cgo/gcc_darwin_amd64.c
@@ -0,0 +1,66 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <string.h> /* for strerror */
+#include <pthread.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+
+void
+x_cgo_init(G *g)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ // Move the g pointer into the slot reserved in thread local storage.
+ // Constant must match the one in cmd/link/internal/ld/sym.go.
+ asm volatile("movq %0, %%gs:0x30" :: "r"(ts.g));
+
+ crosscall_amd64(ts.fn);
+ return nil;
+}
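
The stacklo computation in x_cgo_init above uses a common idiom: the address of a local variable approximates the current stack pointer, the default pthread stack size is subtracted from it, and 4096 bytes of slack are kept. A self-contained sketch of the same idea, under the assumption that the calling thread was created with default attributes (the helper name is illustrative):

#include <pthread.h>
#include <stdint.h>

/* Estimate the lowest usable address of the current thread's stack. */
static uintptr_t approx_stack_lo(void) {
	pthread_attr_t attr;
	size_t size = 0;

	pthread_attr_init(&attr);
	pthread_attr_getstacksize(&attr, &size); /* default stack size */
	pthread_attr_destroy(&attr);
	return (uintptr_t)&attr - size + 4096;   /* &attr approximates SP */
}
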
diff --git a/src/runtime/cgo/gcc_darwin_arm64.c b/src/runtime/cgo/gcc_darwin_arm64.c
new file mode 100644
index 0000000..a5f07f1
--- /dev/null
+++ b/src/runtime/cgo/gcc_darwin_arm64.c
@@ -0,0 +1,145 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <limits.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h> /* for strerror */
+#include <sys/param.h>
+#include <unistd.h>
+#include <stdlib.h>
+
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+#include <TargetConditionals.h>
+
+#if TARGET_OS_IPHONE
+#include <CoreFoundation/CFBundle.h>
+#include <CoreFoundation/CFString.h>
+#endif
+
+static void *threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ //fprintf(stderr, "runtime/cgo: _cgo_sys_thread_start: fn=%p, g=%p\n", ts->fn, ts->g); // debug
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ size = 0;
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+#if TARGET_OS_IPHONE
+ darwin_arm_init_thread_exception_port();
+#endif
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
+
+#if TARGET_OS_IPHONE
+
+// init_working_dir sets the current working directory to the app root.
+// By default ios/arm64 processes start in "/".
+static void
+init_working_dir()
+{
+ CFBundleRef bundle = CFBundleGetMainBundle();
+ if (bundle == NULL) {
+ fprintf(stderr, "runtime/cgo: no main bundle\n");
+ return;
+ }
+ CFURLRef url_ref = CFBundleCopyResourceURL(bundle, CFSTR("Info"), CFSTR("plist"), NULL);
+ if (url_ref == NULL) {
+ // No Info.plist found. It can happen on Corellium virtual devices.
+ return;
+ }
+ CFStringRef url_str_ref = CFURLGetString(url_ref);
+ char buf[MAXPATHLEN];
+ Boolean res = CFStringGetCString(url_str_ref, buf, sizeof(buf), kCFStringEncodingUTF8);
+ CFRelease(url_ref);
+ if (!res) {
+ fprintf(stderr, "runtime/cgo: cannot get URL string\n");
+ return;
+ }
+
+ // url is of the form "file:///path/to/Info.plist".
+ // strip it down to the working directory "/path/to".
+ int url_len = strlen(buf);
+ if (url_len < sizeof("file://")+sizeof("/Info.plist")) {
+ fprintf(stderr, "runtime/cgo: bad URL: %s\n", buf);
+ return;
+ }
+ buf[url_len-sizeof("/Info.plist")+1] = 0;
+ char *dir = &buf[0] + sizeof("file://")-1;
+
+ if (chdir(dir) != 0) {
+ fprintf(stderr, "runtime/cgo: chdir(%s) failed\n", dir);
+ }
+
+ // The test harness in go_ios_exec passes the relative working directory
+ // in the GoExecWrapperWorkingDirectory property of the app bundle.
+ CFStringRef wd_ref = CFBundleGetValueForInfoDictionaryKey(bundle, CFSTR("GoExecWrapperWorkingDirectory"));
+ if (wd_ref != NULL) {
+ if (!CFStringGetCString(wd_ref, buf, sizeof(buf), kCFStringEncodingUTF8)) {
+ fprintf(stderr, "runtime/cgo: cannot get GoExecWrapperWorkingDirectory string\n");
+ return;
+ }
+ if (chdir(buf) != 0) {
+ fprintf(stderr, "runtime/cgo: chdir(%s) failed\n", buf);
+ }
+ }
+}
+
+#endif // TARGET_OS_IPHONE
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ //fprintf(stderr, "x_cgo_init = %p\n", &x_cgo_init); // aid debugging in presence of ASLR
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+
+#if TARGET_OS_IPHONE
+ darwin_arm_init_mach_exception_handler();
+ darwin_arm_init_thread_exception_port();
+ init_working_dir();
+#endif
+}
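
The string handling in init_working_dir strips a known prefix and suffix from a file URL. A tiny standalone example of the same arithmetic, using a hypothetical path for illustration:

#include <stdio.h>
#include <string.h>

static void trim_example(void) {
	char buf[] = "file:///path/to/Info.plist";
	size_t url_len = strlen(buf);

	buf[url_len - sizeof("/Info.plist") + 1] = 0; /* drop "/Info.plist" */
	char *dir = buf + sizeof("file://") - 1;      /* skip "file://"     */
	printf("%s\n", dir);                          /* prints /path/to    */
}
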
diff --git a/src/runtime/cgo/gcc_dragonfly_amd64.c b/src/runtime/cgo/gcc_dragonfly_amd64.c
new file mode 100644
index 0000000..d25db91
--- /dev/null
+++ b/src/runtime/cgo/gcc_dragonfly_amd64.c
@@ -0,0 +1,71 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <sys/signalvar.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ SIGFILLSET(ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ /*
+ * Set specific keys.
+ */
+ setg_gcc((void*)ts.g);
+
+ crosscall_amd64(ts.fn);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_fatalf.c b/src/runtime/cgo/gcc_fatalf.c
new file mode 100644
index 0000000..597e750
--- /dev/null
+++ b/src/runtime/cgo/gcc_fatalf.c
@@ -0,0 +1,23 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix !android,linux freebsd
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include "libcgo.h"
+
+void
+fatalf(const char* format, ...)
+{
+ va_list ap;
+
+ fprintf(stderr, "runtime/cgo: ");
+ va_start(ap, format);
+ vfprintf(stderr, format, ap);
+ va_end(ap);
+ fprintf(stderr, "\n");
+ abort();
+}
diff --git a/src/runtime/cgo/gcc_freebsd_386.c b/src/runtime/cgo/gcc_freebsd_386.c
new file mode 100644
index 0000000..9097a2a
--- /dev/null
+++ b/src/runtime/cgo/gcc_freebsd_386.c
@@ -0,0 +1,71 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <sys/signalvar.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ SIGFILLSET(ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ /*
+ * Set specific keys.
+ */
+ setg_gcc((void*)ts.g);
+
+ crosscall_386(ts.fn);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_freebsd_amd64.c b/src/runtime/cgo/gcc_freebsd_amd64.c
new file mode 100644
index 0000000..514a2f8
--- /dev/null
+++ b/src/runtime/cgo/gcc_freebsd_amd64.c
@@ -0,0 +1,79 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <errno.h>
+#include <sys/signalvar.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t *attr;
+ size_t size;
+
+ // Deal with memory sanitizer/clang interaction.
+ // See gcc_linux_amd64.c for details.
+ setg_gcc = setg;
+ attr = (pthread_attr_t*)malloc(sizeof *attr);
+ if (attr == NULL) {
+ fatalf("malloc failed: %s", strerror(errno));
+ }
+ pthread_attr_init(attr);
+ pthread_attr_getstacksize(attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(attr);
+ free(attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ SIGFILLSET(ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ _cgo_tsan_acquire();
+ free(v);
+ _cgo_tsan_release();
+
+ /*
+ * Set specific keys.
+ */
+ setg_gcc((void*)ts.g);
+
+ crosscall_amd64(ts.fn);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_freebsd_arm.c b/src/runtime/cgo/gcc_freebsd_arm.c
new file mode 100644
index 0000000..74f2e0e
--- /dev/null
+++ b/src/runtime/cgo/gcc_freebsd_arm.c
@@ -0,0 +1,83 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <machine/sysarch.h>
+#include <sys/signalvar.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+#ifdef ARM_TP_ADDRESS
+// ARM_TP_ADDRESS is (ARM_VECTORS_HIGH + 0x1000) or 0xffff1000
+// and is known to runtime.read_tls_fallback. Verify it with
+// cpp.
+#if ARM_TP_ADDRESS != 0xffff1000
+#error Wrong ARM_TP_ADDRESS!
+#endif
+#endif
+
+static void *threadentry(void*);
+
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ SIGFILLSET(ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ // Not sure why the memset is necessary here,
+ // but without it, we get a bogus stack size
+ // out of pthread_attr_getstacksize. C'est la Linux.
+ memset(&attr, 0, sizeof attr);
+ pthread_attr_init(&attr);
+ size = 0;
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall_arm1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall_arm1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_freebsd_arm64.c b/src/runtime/cgo/gcc_freebsd_arm64.c
new file mode 100644
index 0000000..dd8f888
--- /dev/null
+++ b/src/runtime/cgo/gcc_freebsd_arm64.c
@@ -0,0 +1,68 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <errno.h>
+#include <sys/signalvar.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ SIGFILLSET(ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_freebsd_sigaction.c b/src/runtime/cgo/gcc_freebsd_sigaction.c
new file mode 100644
index 0000000..98b122d
--- /dev/null
+++ b/src/runtime/cgo/gcc_freebsd_sigaction.c
@@ -0,0 +1,80 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build freebsd,amd64
+
+#include <errno.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+#include <signal.h>
+
+#include "libcgo.h"
+
+// go_sigaction_t is a C version of the sigactiont struct from
+// os_freebsd.go. This definition (and its conversion to and from struct
+// sigaction) is specific to freebsd/amd64.
+typedef struct {
+ uint32_t __bits[_SIG_WORDS];
+} go_sigset_t;
+typedef struct {
+ uintptr_t handler;
+ int32_t flags;
+ go_sigset_t mask;
+} go_sigaction_t;
+
+int32_t
+x_cgo_sigaction(intptr_t signum, const go_sigaction_t *goact, go_sigaction_t *oldgoact) {
+ int32_t ret;
+ struct sigaction act;
+ struct sigaction oldact;
+ size_t i;
+
+ _cgo_tsan_acquire();
+
+ memset(&act, 0, sizeof act);
+ memset(&oldact, 0, sizeof oldact);
+
+ if (goact) {
+ if (goact->flags & SA_SIGINFO) {
+ act.sa_sigaction = (void(*)(int, siginfo_t*, void*))(goact->handler);
+ } else {
+ act.sa_handler = (void(*)(int))(goact->handler);
+ }
+ sigemptyset(&act.sa_mask);
+ for (i = 0; i < 8 * sizeof(goact->mask); i++) {
+ if (goact->mask.__bits[i/32] & ((uint32_t)(1)<<(i&31))) {
+ sigaddset(&act.sa_mask, i+1);
+ }
+ }
+ act.sa_flags = goact->flags;
+ }
+
+ ret = sigaction(signum, goact ? &act : NULL, oldgoact ? &oldact : NULL);
+ if (ret == -1) {
+ // runtime.sigaction expects _cgo_sigaction to return errno on error.
+ _cgo_tsan_release();
+ return errno;
+ }
+
+ if (oldgoact) {
+ if (oldact.sa_flags & SA_SIGINFO) {
+ oldgoact->handler = (uintptr_t)(oldact.sa_sigaction);
+ } else {
+ oldgoact->handler = (uintptr_t)(oldact.sa_handler);
+ }
+ for (i = 0 ; i < _SIG_WORDS; i++) {
+ oldgoact->mask.__bits[i] = 0;
+ }
+ for (i = 0; i < 8 * sizeof(oldgoact->mask); i++) {
+ if (sigismember(&oldact.sa_mask, i+1) == 1) {
+ oldgoact->mask.__bits[i/32] |= (uint32_t)(1)<<(i&31);
+ }
+ }
+ oldgoact->flags = oldact.sa_flags;
+ }
+
+ _cgo_tsan_release();
+ return ret;
+}
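
The two conversion loops above encode signal number sig as bit (sig-1)&31 of word (sig-1)/32 in go_sigset_t.__bits. A pair of helper sketches makes that mapping explicit (illustrative only, not part of the runtime):

#include <stdint.h>

static void go_sigset_add(uint32_t *bits, int sig) {
	bits[(sig - 1) / 32] |= (uint32_t)1 << ((sig - 1) & 31);
}

static int go_sigset_member(const uint32_t *bits, int sig) {
	return (bits[(sig - 1) / 32] >> ((sig - 1) & 31)) & 1;
}
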
diff --git a/src/runtime/cgo/gcc_libinit.c b/src/runtime/cgo/gcc_libinit.c
new file mode 100644
index 0000000..3304d95
--- /dev/null
+++ b/src/runtime/cgo/gcc_libinit.c
@@ -0,0 +1,113 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build cgo
+// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
+
+#include <pthread.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h> // strerror
+#include <time.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static pthread_cond_t runtime_init_cond = PTHREAD_COND_INITIALIZER;
+static pthread_mutex_t runtime_init_mu = PTHREAD_MUTEX_INITIALIZER;
+static int runtime_init_done;
+
+// The context function, used when tracing back C calls into Go.
+static void (*cgo_context_function)(struct context_arg*);
+
+void
+x_cgo_sys_thread_create(void* (*func)(void*), void* arg) {
+ pthread_t p;
+ int err = _cgo_try_pthread_create(&p, NULL, func, arg);
+ if (err != 0) {
+ fprintf(stderr, "pthread_create failed: %s", strerror(err));
+ abort();
+ }
+}
+
+uintptr_t
+_cgo_wait_runtime_init_done(void) {
+ void (*pfn)(struct context_arg*);
+
+ pthread_mutex_lock(&runtime_init_mu);
+ while (runtime_init_done == 0) {
+ pthread_cond_wait(&runtime_init_cond, &runtime_init_mu);
+ }
+
+ // TODO(iant): For the case of a new C thread calling into Go, such
+ // as when using -buildmode=c-archive, we know that Go runtime
+ // initialization is complete but we do not know that all Go init
+ // functions have been run. We should not fetch cgo_context_function
+ // until they have been, because that is where a call to
+ // SetCgoTraceback is likely to occur. We are going to wait for Go
+ // initialization to be complete anyhow, later, by waiting for
+ // main_init_done to be closed in cgocallbackg1. We should wait here
+ // instead. See also issue #15943.
+ pfn = cgo_context_function;
+
+ pthread_mutex_unlock(&runtime_init_mu);
+ if (pfn != nil) {
+ struct context_arg arg;
+
+ arg.Context = 0;
+ (*pfn)(&arg);
+ return arg.Context;
+ }
+ return 0;
+}
+
+void
+x_cgo_notify_runtime_init_done(void* dummy __attribute__ ((unused))) {
+ pthread_mutex_lock(&runtime_init_mu);
+ runtime_init_done = 1;
+ pthread_cond_broadcast(&runtime_init_cond);
+ pthread_mutex_unlock(&runtime_init_mu);
+}
+
+// Sets the context function to call to record the traceback context
+// when calling a Go function from C code. Called from runtime.SetCgoTraceback.
+void x_cgo_set_context_function(void (*context)(struct context_arg*)) {
+ pthread_mutex_lock(&runtime_init_mu);
+ cgo_context_function = context;
+ pthread_mutex_unlock(&runtime_init_mu);
+}
+
+// Gets the context function.
+void (*(_cgo_get_context_function(void)))(struct context_arg*) {
+ void (*ret)(struct context_arg*);
+
+ pthread_mutex_lock(&runtime_init_mu);
+ ret = cgo_context_function;
+ pthread_mutex_unlock(&runtime_init_mu);
+ return ret;
+}
+
+// _cgo_try_pthread_create retries pthread_create if it fails with
+// EAGAIN.
+int
+_cgo_try_pthread_create(pthread_t* thread, const pthread_attr_t* attr, void* (*pfn)(void*), void* arg) {
+ int tries;
+ int err;
+ struct timespec ts;
+
+ for (tries = 0; tries < 20; tries++) {
+ err = pthread_create(thread, attr, pfn, arg);
+ if (err == 0) {
+ pthread_detach(*thread);
+ return 0;
+ }
+ if (err != EAGAIN) {
+ return err;
+ }
+ ts.tv_sec = 0;
+ ts.tv_nsec = (tries + 1) * 1000 * 1000; // Milliseconds.
+ nanosleep(&ts, nil);
+ }
+ return EAGAIN;
+}
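
For context, a minimal caller of _cgo_try_pthread_create looks like the sketch below, mirroring x_cgo_sys_thread_create above. There is no pthread_join because the helper detaches the thread on success; in the worst case the retry loop sleeps 1+2+...+20 ms, roughly 210 ms, before giving up with EAGAIN. (Illustrative only; worker and spawn_worker are not runtime functions.)

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern int _cgo_try_pthread_create(pthread_t*, const pthread_attr_t*,
                                   void* (*)(void*), void*);

static void *worker(void *arg) { (void)arg; return NULL; }

static void spawn_worker(void) {
	pthread_t p;
	int err = _cgo_try_pthread_create(&p, NULL, worker, NULL);
	if (err != 0) {
		fprintf(stderr, "pthread_create failed: %s\n", strerror(err));
		abort();
	}
}
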
diff --git a/src/runtime/cgo/gcc_libinit_windows.c b/src/runtime/cgo/gcc_libinit_windows.c
new file mode 100644
index 0000000..2732248
--- /dev/null
+++ b/src/runtime/cgo/gcc_libinit_windows.c
@@ -0,0 +1,125 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build cgo
+
+#define WIN32_LEAN_AND_MEAN
+#include <windows.h>
+#include <process.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+
+#include "libcgo.h"
+
+static volatile LONG runtime_init_once_gate = 0;
+static volatile LONG runtime_init_once_done = 0;
+
+static CRITICAL_SECTION runtime_init_cs;
+
+static HANDLE runtime_init_wait;
+static int runtime_init_done;
+
+// Pre-initialize the runtime synchronization objects
+void
+_cgo_preinit_init() {
+ runtime_init_wait = CreateEvent(NULL, TRUE, FALSE, NULL);
+ if (runtime_init_wait == NULL) {
+ fprintf(stderr, "runtime: failed to create runtime initialization wait event.\n");
+ abort();
+ }
+
+ InitializeCriticalSection(&runtime_init_cs);
+}
+
+// Make sure that the preinit sequence has run.
+void
+_cgo_maybe_run_preinit() {
+ if (!InterlockedExchangeAdd(&runtime_init_once_done, 0)) {
+ if (InterlockedIncrement(&runtime_init_once_gate) == 1) {
+ _cgo_preinit_init();
+ InterlockedIncrement(&runtime_init_once_done);
+ } else {
+ // Decrement to avoid overflow.
+ InterlockedDecrement(&runtime_init_once_gate);
+ while(!InterlockedExchangeAdd(&runtime_init_once_done, 0)) {
+ Sleep(0);
+ }
+ }
+ }
+}
+
+void
+x_cgo_sys_thread_create(void (*func)(void*), void* arg) {
+ uintptr_t thandle;
+
+ thandle = _beginthread(func, 0, arg);
+ if(thandle == -1) {
+ fprintf(stderr, "runtime: failed to create new OS thread (%d)\n", errno);
+ abort();
+ }
+}
+
+int
+_cgo_is_runtime_initialized() {
+ EnterCriticalSection(&runtime_init_cs);
+ int status = runtime_init_done;
+ LeaveCriticalSection(&runtime_init_cs);
+ return status;
+}
+
+uintptr_t
+_cgo_wait_runtime_init_done(void) {
+ void (*pfn)(struct context_arg*);
+
+ _cgo_maybe_run_preinit();
+ while (!_cgo_is_runtime_initialized()) {
+ WaitForSingleObject(runtime_init_wait, INFINITE);
+ }
+ pfn = _cgo_get_context_function();
+ if (pfn != nil) {
+ struct context_arg arg;
+
+ arg.Context = 0;
+ (*pfn)(&arg);
+ return arg.Context;
+ }
+ return 0;
+}
+
+void
+x_cgo_notify_runtime_init_done(void* dummy) {
+ _cgo_maybe_run_preinit();
+
+ EnterCriticalSection(&runtime_init_cs);
+ runtime_init_done = 1;
+ LeaveCriticalSection(&runtime_init_cs);
+
+ if (!SetEvent(runtime_init_wait)) {
+ fprintf(stderr, "runtime: failed to signal runtime initialization complete.\n");
+ abort();
+ }
+}
+
+// The context function, used when tracing back C calls into Go.
+static void (*cgo_context_function)(struct context_arg*);
+
+// Sets the context function to call to record the traceback context
+// when calling a Go function from C code. Called from runtime.SetCgoTraceback.
+void x_cgo_set_context_function(void (*context)(struct context_arg*)) {
+ EnterCriticalSection(&runtime_init_cs);
+ cgo_context_function = context;
+ LeaveCriticalSection(&runtime_init_cs);
+}
+
+// Gets the context function.
+void (*(_cgo_get_context_function(void)))(struct context_arg*) {
+ void (*ret)(struct context_arg*);
+
+ EnterCriticalSection(&runtime_init_cs);
+ ret = cgo_context_function;
+ LeaveCriticalSection(&runtime_init_cs);
+ return ret;
+}
diff --git a/src/runtime/cgo/gcc_linux_386.c b/src/runtime/cgo/gcc_linux_386.c
new file mode 100644
index 0000000..70c942a
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_386.c
@@ -0,0 +1,79 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+static void (*setg_gcc)(void*);
+
+// This will be set in gcc_android.c for android-specific customization.
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase) __attribute__((common));
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ // Not sure why the memset is necessary here,
+ // but without it, we get a bogus stack size
+ // out of pthread_attr_getstacksize. C'est la Linux.
+ memset(&attr, 0, sizeof attr);
+ pthread_attr_init(&attr);
+ size = 0;
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ /*
+ * Set specific keys.
+ */
+ setg_gcc((void*)ts.g);
+
+ crosscall_386(ts.fn);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_linux_amd64.c b/src/runtime/cgo/gcc_linux_amd64.c
new file mode 100644
index 0000000..f2bf648
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_amd64.c
@@ -0,0 +1,99 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <errno.h>
+#include <string.h> // strerror
+#include <signal.h>
+#include <stdlib.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+// This will be set in gcc_android.c for android-specific customization.
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase) __attribute__((common));
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t *attr;
+ size_t size;
+
+ /* The memory sanitizer distributed with versions of clang
+ before 3.8 has a bug: if you call mmap before malloc, mmap
+ may return an address that is later overwritten by the msan
+ library. Avoid this problem by forcing a call to malloc
+ here, before we ever call malloc.
+
+ This is only required for the memory sanitizer, so it's
+ unfortunate that we always run it. It should be possible
+ to remove this when we no longer care about versions of
+ clang before 3.8. The test for this is
+ misc/cgo/testsanitizers.
+
+ GCC works hard to eliminate a seemingly unnecessary call to
+ malloc, so we actually use the memory we allocate. */
+
+ setg_gcc = setg;
+ attr = (pthread_attr_t*)malloc(sizeof *attr);
+ if (attr == NULL) {
+ fatalf("malloc failed: %s", strerror(errno));
+ }
+ pthread_attr_init(attr);
+ pthread_attr_getstacksize(attr, &size);
+ g->stacklo = (uintptr)&size - size + 4096;
+ pthread_attr_destroy(attr);
+ free(attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ _cgo_tsan_acquire();
+ free(v);
+ _cgo_tsan_release();
+
+ /*
+ * Set specific keys.
+ */
+ setg_gcc((void*)ts.g);
+
+ crosscall_amd64(ts.fn);
+ return nil;
+}
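
Note: x_cgo_init above estimates the lowest usable stack address by taking the address of a local variable, subtracting the default stack size reported by pthread_attr_getstacksize, and adding a 4096-byte margin. A small standalone illustration of that arithmetic, assuming Linux defaults; it is an approximation, not an exact bound.

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        pthread_attr_t attr;
        size_t size = 0;
        uintptr_t lo;

        pthread_attr_init(&attr);
        pthread_attr_getstacksize(&attr, &size);    /* default thread stack size */
        pthread_attr_destroy(&attr);

        /* &size is an address near the top of this thread's stack; back off by
           the stack size and leave a 4096-byte margin, as x_cgo_init does. */
        lo = (uintptr_t)&size - size + 4096;
        printf("stack size %zu, approximate stacklo %#lx\n",
               size, (unsigned long)lo);
        return 0;
    }
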
diff --git a/src/runtime/cgo/gcc_linux_arm.c b/src/runtime/cgo/gcc_linux_arm.c
new file mode 100644
index 0000000..5bc0fee
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_arm.c
@@ -0,0 +1,74 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase) __attribute__((common));
+static void (*setg_gcc)(void*);
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ // Not sure why the memset is necessary here,
+ // but without it, we get a bogus stack size
+ // out of pthread_attr_getstacksize. C'est la Linux.
+ memset(&attr, 0, sizeof attr);
+ pthread_attr_init(&attr);
+ size = 0;
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall_arm1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall_arm1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
diff --git a/src/runtime/cgo/gcc_linux_arm64.c b/src/runtime/cgo/gcc_linux_arm64.c
new file mode 100644
index 0000000..17ff274
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_arm64.c
@@ -0,0 +1,96 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <errno.h>
+#include <string.h>
+#include <signal.h>
+#include <stdlib.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase) __attribute__((common));
+static void (*setg_gcc)(void*);
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ // Not sure why the memset is necessary here,
+ // but without it, we get a bogus stack size
+ // out of pthread_attr_getstacksize. C'est la Linux.
+ memset(&attr, 0, sizeof attr);
+ pthread_attr_init(&attr);
+ size = 0;
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t *attr;
+ size_t size;
+
+ /* The memory sanitizer distributed with versions of clang
+ before 3.8 has a bug: if you call mmap before malloc, mmap
+ may return an address that is later overwritten by the msan
+ library. Avoid this problem by forcing a call to malloc
+ here, before we ever call malloc.
+
+ This is only required for the memory sanitizer, so it's
+ unfortunate that we always run it. It should be possible
+ to remove this when we no longer care about versions of
+ clang before 3.8. The test for this is
+ misc/cgo/testsanitizers.
+
+ GCC works hard to eliminate a seemingly unnecessary call to
+ malloc, so we actually use the memory we allocate. */
+
+ setg_gcc = setg;
+ attr = (pthread_attr_t*)malloc(sizeof *attr);
+ if (attr == NULL) {
+ fatalf("malloc failed: %s", strerror(errno));
+ }
+ pthread_attr_init(attr);
+ pthread_attr_getstacksize(attr, &size);
+ g->stacklo = (uintptr)&size - size + 4096;
+ pthread_attr_destroy(attr);
+ free(attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
diff --git a/src/runtime/cgo/gcc_linux_mips64x.c b/src/runtime/cgo/gcc_linux_mips64x.c
new file mode 100644
index 0000000..42837b1
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_mips64x.c
@@ -0,0 +1,78 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build cgo
+// +build linux
+// +build mips64 mips64le
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase);
+static void (*setg_gcc)(void*);
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ // Not sure why the memset is necessary here,
+ // but without it, we get a bogus stack size
+ // out of pthread_attr_getstacksize. C'est la Linux.
+ memset(&attr, 0, sizeof attr);
+ pthread_attr_init(&attr);
+ size = 0;
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
diff --git a/src/runtime/cgo/gcc_linux_mipsx.c b/src/runtime/cgo/gcc_linux_mipsx.c
new file mode 100644
index 0000000..a44ea30
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_mipsx.c
@@ -0,0 +1,80 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build cgo
+// +build linux
+// +build mips mipsle
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase);
+static void (*setg_gcc)(void*);
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ // Not sure why the memset is necessary here,
+ // but without it, we get a bogus stack size
+ // out of pthread_attr_getstacksize. C'est la Linux.
+ memset(&attr, 0, sizeof attr);
+ pthread_attr_init(&attr);
+ size = 0;
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+
+ memset(&attr, 0, sizeof attr);
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
diff --git a/src/runtime/cgo/gcc_linux_ppc64x.S b/src/runtime/cgo/gcc_linux_ppc64x.S
new file mode 100644
index 0000000..595eb38
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_ppc64x.S
@@ -0,0 +1,138 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64 ppc64le
+// +build linux
+
+/*
+ * Apple still insists on underscore prefixes for C function names.
+ */
+#if defined(__APPLE__)
+#define EXT(s) _##s
+#else
+#define EXT(s) s
+#endif
+
+/*
+ * void crosscall_ppc64(void (*fn)(void), void *g)
+ *
+ * Calling into the gc tool chain, where all registers are caller save.
+ * Called from standard ppc64 C ABI, where r2, r14-r31, f14-f31 are
+ * callee-save, so they must be saved explicitly.
+ */
+.globl EXT(crosscall_ppc64)
+EXT(crosscall_ppc64):
+ // Start with standard C stack frame layout and linkage
+ mflr %r0
+ std %r0, 16(%r1) // Save LR in caller's frame
+ std %r2, 24(%r1) // Save TOC in caller's frame
+ bl saveregs
+ stdu %r1, -296(%r1)
+
+ // Set up Go ABI constant registers
+ bl _cgo_reginit
+ nop
+
+ // Restore g pointer (r30 in Go ABI, which may have been clobbered by C)
+ mr %r30, %r4
+
+ // Call fn
+ mr %r12, %r3
+ mtctr %r3
+ bctrl
+
+ addi %r1, %r1, 296
+ bl restoreregs
+ ld %r2, 24(%r1)
+ ld %r0, 16(%r1)
+ mtlr %r0
+ blr
+
+saveregs:
+ // Save callee-save registers
+ // O=-288; for R in %r{14..31}; do echo "\tstd\t$R, $O(%r1)"; ((O+=8)); done; for F in f{14..31}; do echo "\tstfd\t$F, $O(%r1)"; ((O+=8)); done
+ std %r14, -288(%r1)
+ std %r15, -280(%r1)
+ std %r16, -272(%r1)
+ std %r17, -264(%r1)
+ std %r18, -256(%r1)
+ std %r19, -248(%r1)
+ std %r20, -240(%r1)
+ std %r21, -232(%r1)
+ std %r22, -224(%r1)
+ std %r23, -216(%r1)
+ std %r24, -208(%r1)
+ std %r25, -200(%r1)
+ std %r26, -192(%r1)
+ std %r27, -184(%r1)
+ std %r28, -176(%r1)
+ std %r29, -168(%r1)
+ std %r30, -160(%r1)
+ std %r31, -152(%r1)
+ stfd %f14, -144(%r1)
+ stfd %f15, -136(%r1)
+ stfd %f16, -128(%r1)
+ stfd %f17, -120(%r1)
+ stfd %f18, -112(%r1)
+ stfd %f19, -104(%r1)
+ stfd %f20, -96(%r1)
+ stfd %f21, -88(%r1)
+ stfd %f22, -80(%r1)
+ stfd %f23, -72(%r1)
+ stfd %f24, -64(%r1)
+ stfd %f25, -56(%r1)
+ stfd %f26, -48(%r1)
+ stfd %f27, -40(%r1)
+ stfd %f28, -32(%r1)
+ stfd %f29, -24(%r1)
+ stfd %f30, -16(%r1)
+ stfd %f31, -8(%r1)
+
+ blr
+
+restoreregs:
+ // O=-288; for R in %r{14..31}; do echo "\tld\t$R, $O(%r1)"; ((O+=8)); done; for F in %f{14..31}; do echo "\tlfd\t$F, $O(%r1)"; ((O+=8)); done
+ ld %r14, -288(%r1)
+ ld %r15, -280(%r1)
+ ld %r16, -272(%r1)
+ ld %r17, -264(%r1)
+ ld %r18, -256(%r1)
+ ld %r19, -248(%r1)
+ ld %r20, -240(%r1)
+ ld %r21, -232(%r1)
+ ld %r22, -224(%r1)
+ ld %r23, -216(%r1)
+ ld %r24, -208(%r1)
+ ld %r25, -200(%r1)
+ ld %r26, -192(%r1)
+ ld %r27, -184(%r1)
+ ld %r28, -176(%r1)
+ ld %r29, -168(%r1)
+ ld %r30, -160(%r1)
+ ld %r31, -152(%r1)
+ lfd %f14, -144(%r1)
+ lfd %f15, -136(%r1)
+ lfd %f16, -128(%r1)
+ lfd %f17, -120(%r1)
+ lfd %f18, -112(%r1)
+ lfd %f19, -104(%r1)
+ lfd %f20, -96(%r1)
+ lfd %f21, -88(%r1)
+ lfd %f22, -80(%r1)
+ lfd %f23, -72(%r1)
+ lfd %f24, -64(%r1)
+ lfd %f25, -56(%r1)
+ lfd %f26, -48(%r1)
+ lfd %f27, -40(%r1)
+ lfd %f28, -32(%r1)
+ lfd %f29, -24(%r1)
+ lfd %f30, -16(%r1)
+ lfd %f31, -8(%r1)
+
+ blr
+
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_linux_riscv64.c b/src/runtime/cgo/gcc_linux_riscv64.c
new file mode 100644
index 0000000..22b76c2
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_riscv64.c
@@ -0,0 +1,74 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase);
+static void (*setg_gcc)(void*);
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ // Not sure why the memset is necessary here,
+ // but without it, we get a bogus stack size
+ // out of pthread_attr_getstacksize. C'est la Linux.
+ memset(&attr, 0, sizeof attr);
+ pthread_attr_init(&attr);
+ size = 0;
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
diff --git a/src/runtime/cgo/gcc_linux_s390x.c b/src/runtime/cgo/gcc_linux_s390x.c
new file mode 100644
index 0000000..bb60048
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_s390x.c
@@ -0,0 +1,69 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall_s390x(void (*fn)(void), void *g);
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ // Save g for this thread in C TLS
+ setg_gcc((void*)ts.g);
+
+ crosscall_s390x(ts.fn, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_mips64x.S b/src/runtime/cgo/gcc_mips64x.S
new file mode 100644
index 0000000..908dd21
--- /dev/null
+++ b/src/runtime/cgo/gcc_mips64x.S
@@ -0,0 +1,87 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips64 mips64le
+
+/*
+ * void crosscall1(void (*fn)(void), void (*setg_gcc)(void *g), void *g)
+ *
+ * Calling into the gc tool chain, where all registers are caller save.
+ * Called from standard MIPS N64 ABI, where $16-$23, $28, $30, and $f24-$f31
+ * are callee-save, so they must be saved explicitly, along with $31 (LR).
+ */
+.globl crosscall1
+.set noat
+crosscall1:
+#ifndef __mips_soft_float
+ daddiu $29, $29, -160
+#else
+ daddiu $29, $29, -96 // For soft-float, no need to make room for FP registers
+#endif
+ sd $31, 0($29)
+ sd $16, 8($29)
+ sd $17, 16($29)
+ sd $18, 24($29)
+ sd $19, 32($29)
+ sd $20, 40($29)
+ sd $21, 48($29)
+ sd $22, 56($29)
+ sd $23, 64($29)
+ sd $28, 72($29)
+ sd $30, 80($29)
+#ifndef __mips_soft_float
+ sdc1 $f24, 88($29)
+ sdc1 $f25, 96($29)
+ sdc1 $f26, 104($29)
+ sdc1 $f27, 112($29)
+ sdc1 $f28, 120($29)
+ sdc1 $f29, 128($29)
+ sdc1 $f30, 136($29)
+ sdc1 $f31, 144($29)
+#endif
+
+ // prepare SB register = pc & 0xffffffff00000000
+ bal 1f
+1:
+ dsrl $28, $31, 32
+ dsll $28, $28, 32
+
+ move $20, $4 // save R4
+ move $4, $6
+ jalr $5 // call setg_gcc (clobbers R4)
+ jalr $20 // call fn
+
+ ld $16, 8($29)
+ ld $17, 16($29)
+ ld $18, 24($29)
+ ld $19, 32($29)
+ ld $20, 40($29)
+ ld $21, 48($29)
+ ld $22, 56($29)
+ ld $23, 64($29)
+ ld $28, 72($29)
+ ld $30, 80($29)
+#ifndef __mips_soft_float
+ ldc1 $f24, 88($29)
+ ldc1 $f25, 96($29)
+ ldc1 $f26, 104($29)
+ ldc1 $f27, 112($29)
+ ldc1 $f28, 120($29)
+ ldc1 $f29, 128($29)
+ ldc1 $f30, 136($29)
+ ldc1 $f31, 144($29)
+#endif
+ ld $31, 0($29)
+#ifndef __mips_soft_float
+ daddiu $29, $29, 160
+#else
+ daddiu $29, $29, 96
+#endif
+ jr $31
+
+.set at
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_mipsx.S b/src/runtime/cgo/gcc_mipsx.S
new file mode 100644
index 0000000..54f4b82
--- /dev/null
+++ b/src/runtime/cgo/gcc_mipsx.S
@@ -0,0 +1,75 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips mipsle
+
+/*
+ * void crosscall1(void (*fn)(void), void (*setg_gcc)(void *g), void *g)
+ *
+ * Calling into the gc tool chain, where all registers are caller save.
+ * Called from standard MIPS O32 ABI, where $16-$23, $30, and $f20-$f31
+ * are callee-save, so they must be saved explicitly, along with $31 (LR).
+ */
+.globl crosscall1
+.set noat
+crosscall1:
+#ifndef __mips_soft_float
+ addiu $29, $29, -88
+#else
+ addiu $29, $29, -40 // For soft-float, no need to make room for FP registers
+#endif
+ sw $31, 0($29)
+ sw $16, 4($29)
+ sw $17, 8($29)
+ sw $18, 12($29)
+ sw $19, 16($29)
+ sw $20, 20($29)
+ sw $21, 24($29)
+ sw $22, 28($29)
+ sw $23, 32($29)
+ sw $30, 36($29)
+
+#ifndef __mips_soft_float
+ sdc1 $f20, 40($29)
+ sdc1 $f22, 48($29)
+ sdc1 $f24, 56($29)
+ sdc1 $f26, 64($29)
+ sdc1 $f28, 72($29)
+ sdc1 $f30, 80($29)
+#endif
+ move $20, $4 // save R4
+ move $4, $6
+ jalr $5 // call setg_gcc
+ jalr $20 // call fn
+
+ lw $16, 4($29)
+ lw $17, 8($29)
+ lw $18, 12($29)
+ lw $19, 16($29)
+ lw $20, 20($29)
+ lw $21, 24($29)
+ lw $22, 28($29)
+ lw $23, 32($29)
+ lw $30, 36($29)
+#ifndef __mips_soft_float
+ ldc1 $f20, 40($29)
+ ldc1 $f22, 48($29)
+ ldc1 $f24, 56($29)
+ ldc1 $f26, 64($29)
+ ldc1 $f28, 72($29)
+ ldc1 $f30, 80($29)
+#endif
+ lw $31, 0($29)
+#ifndef __mips_soft_float
+ addiu $29, $29, 88
+#else
+ addiu $29, $29, 40
+#endif
+ jr $31
+
+.set at
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_mmap.c b/src/runtime/cgo/gcc_mmap.c
new file mode 100644
index 0000000..e6a621d
--- /dev/null
+++ b/src/runtime/cgo/gcc_mmap.c
@@ -0,0 +1,39 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux,amd64 linux,arm64
+
+#include <errno.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <sys/mman.h>
+
+#include "libcgo.h"
+
+uintptr_t
+x_cgo_mmap(void *addr, uintptr_t length, int32_t prot, int32_t flags, int32_t fd, uint32_t offset) {
+ void *p;
+
+ _cgo_tsan_acquire();
+ p = mmap(addr, length, prot, flags, fd, offset);
+ _cgo_tsan_release();
+ if (p == MAP_FAILED) {
+ /* This is what the Go code expects on failure. */
+ return (uintptr_t)errno;
+ }
+ return (uintptr_t)p;
+}
+
+void
+x_cgo_munmap(void *addr, uintptr_t length) {
+ int r;
+
+ _cgo_tsan_acquire();
+ r = munmap(addr, length);
+ _cgo_tsan_release();
+ if (r < 0) {
+ /* The Go runtime is not prepared for munmap to fail. */
+ abort();
+ }
+}
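
Note: x_cgo_mmap folds success and failure into a single uintptr_t: on failure it returns errno, otherwise the mapped address. Because errno values are small and no mapping is handed out in the first page, a caller can tell the two apart by magnitude; the Go side is assumed to apply the same small-value test. A Linux-flavored sketch of both halves of that convention, with hypothetical names.

    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/mman.h>

    /* Same convention as x_cgo_mmap above: errno on failure, address on success. */
    static uintptr_t mmap_or_errno(void *addr, size_t len, int prot, int flags,
                                   int fd, off_t off) {
        void *p = mmap(addr, len, prot, flags, fd, off);
        if (p == MAP_FAILED)
            return (uintptr_t)errno;
        return (uintptr_t)p;
    }

    int main(void) {
        uintptr_t r = mmap_or_errno(NULL, 4096, PROT_READ|PROT_WRITE,
                                    MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        /* Small results can only be error numbers: no mapping lives that low. */
        if (r < 4096)
            printf("mmap failed: errno %lu\n", (unsigned long)r);
        else
            printf("mapped at %#lx\n", (unsigned long)r);
        return 0;
    }
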
diff --git a/src/runtime/cgo/gcc_netbsd_386.c b/src/runtime/cgo/gcc_netbsd_386.c
new file mode 100644
index 0000000..5495f0f
--- /dev/null
+++ b/src/runtime/cgo/gcc_netbsd_386.c
@@ -0,0 +1,82 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+ stack_t ss;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ /*
+ * Set specific keys.
+ */
+ setg_gcc((void*)ts.g);
+
+ // On NetBSD, a new thread inherits the signal stack of the
+ // creating thread. That confuses minit, so we remove that
+ // signal stack here before calling the regular mstart. It's
+ // a bit baroque to remove a signal stack here only to add one
+ // in minit, but it's a simple change that keeps NetBSD
+ // working like other OS's. At this point all signals are
+ // blocked, so there is no race.
+ memset(&ss, 0, sizeof ss);
+ ss.ss_flags = SS_DISABLE;
+ sigaltstack(&ss, nil);
+
+ crosscall_386(ts.fn);
+ return nil;
+}
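
Note: threadentry above clears the signal stack inherited from the creating thread with SS_DISABLE before handing control to the Go runtime. A standalone sketch of that sigaltstack pattern.

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>

    /* Drop any alternate signal stack registered for the calling thread. */
    static void drop_sigaltstack(void) {
        stack_t ss;
        memset(&ss, 0, sizeof ss);
        ss.ss_flags = SS_DISABLE;
        if (sigaltstack(&ss, NULL) != 0)
            perror("sigaltstack");
    }

    int main(void) {
        stack_t cur;
        drop_sigaltstack();
        sigaltstack(NULL, &cur);
        printf("alternate stack disabled: %d\n", (cur.ss_flags & SS_DISABLE) != 0);
        return 0;
    }
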
diff --git a/src/runtime/cgo/gcc_netbsd_amd64.c b/src/runtime/cgo/gcc_netbsd_amd64.c
new file mode 100644
index 0000000..dc966fc
--- /dev/null
+++ b/src/runtime/cgo/gcc_netbsd_amd64.c
@@ -0,0 +1,83 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+ stack_t ss;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ /*
+ * Set specific keys.
+ */
+ setg_gcc((void*)ts.g);
+
+ // On NetBSD, a new thread inherits the signal stack of the
+ // creating thread. That confuses minit, so we remove that
+ // signal stack here before calling the regular mstart. It's
+ // a bit baroque to remove a signal stack here only to add one
+ // in minit, but it's a simple change that keeps NetBSD
+ // working like other OS's. At this point all signals are
+ // blocked, so there is no race.
+ memset(&ss, 0, sizeof ss);
+ ss.ss_flags = SS_DISABLE;
+ sigaltstack(&ss, nil);
+
+ crosscall_amd64(ts.fn);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_netbsd_arm.c b/src/runtime/cgo/gcc_netbsd_arm.c
new file mode 100644
index 0000000..b0c80ea
--- /dev/null
+++ b/src/runtime/cgo/gcc_netbsd_arm.c
@@ -0,0 +1,79 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall_arm1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+ stack_t ss;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ // On NetBSD, a new thread inherits the signal stack of the
+ // creating thread. That confuses minit, so we remove that
+ // signal stack here before calling the regular mstart. It's
+ // a bit baroque to remove a signal stack here only to add one
+ // in minit, but it's a simple change that keeps NetBSD
+ // working like other OS's. At this point all signals are
+ // blocked, so there is no race.
+ memset(&ss, 0, sizeof ss);
+ ss.ss_flags = SS_DISABLE;
+ sigaltstack(&ss, nil);
+
+ crosscall_arm1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_netbsd_arm64.c b/src/runtime/cgo/gcc_netbsd_arm64.c
new file mode 100644
index 0000000..694116c
--- /dev/null
+++ b/src/runtime/cgo/gcc_netbsd_arm64.c
@@ -0,0 +1,80 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+ stack_t ss;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ // On NetBSD, a new thread inherits the signal stack of the
+ // creating thread. That confuses minit, so we remove that
+ // signal stack here before calling the regular mstart. It's
+ // a bit baroque to remove a signal stack here only to add one
+ // in minit, but it's a simple change that keeps NetBSD
+ // working like other OS's. At this point all signals are
+ // blocked, so there is no race.
+ memset(&ss, 0, sizeof ss);
+ ss.ss_flags = SS_DISABLE;
+ sigaltstack(&ss, nil);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_openbsd_386.c b/src/runtime/cgo/gcc_openbsd_386.c
new file mode 100644
index 0000000..127a1b6
--- /dev/null
+++ b/src/runtime/cgo/gcc_openbsd_386.c
@@ -0,0 +1,70 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ /*
+ * Set specific keys.
+ */
+ setg_gcc((void*)ts.g);
+
+ crosscall_386(ts.fn);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_openbsd_amd64.c b/src/runtime/cgo/gcc_openbsd_amd64.c
new file mode 100644
index 0000000..34319fb
--- /dev/null
+++ b/src/runtime/cgo/gcc_openbsd_amd64.c
@@ -0,0 +1,70 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ /*
+ * Set specific keys.
+ */
+ setg_gcc((void*)ts.g);
+
+ crosscall_amd64(ts.fn);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_openbsd_arm.c b/src/runtime/cgo/gcc_openbsd_arm.c
new file mode 100644
index 0000000..9a5757f
--- /dev/null
+++ b/src/runtime/cgo/gcc_openbsd_arm.c
@@ -0,0 +1,67 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall_arm1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall_arm1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_openbsd_arm64.c b/src/runtime/cgo/gcc_openbsd_arm64.c
new file mode 100644
index 0000000..abf9f66
--- /dev/null
+++ b/src/runtime/cgo/gcc_openbsd_arm64.c
@@ -0,0 +1,67 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_ppc64x.c b/src/runtime/cgo/gcc_ppc64x.c
new file mode 100644
index 0000000..9cb6e0c
--- /dev/null
+++ b/src/runtime/cgo/gcc_ppc64x.c
@@ -0,0 +1,71 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64 ppc64le
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall_ppc64(void (*fn)(void), void *g);
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ // Save g for this thread in C TLS
+ setg_gcc((void*)ts.g);
+
+ crosscall_ppc64(ts.fn, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_riscv64.S b/src/runtime/cgo/gcc_riscv64.S
new file mode 100644
index 0000000..f429dc6
--- /dev/null
+++ b/src/runtime/cgo/gcc_riscv64.S
@@ -0,0 +1,80 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+ * void crosscall1(void (*fn)(void), void (*setg_gcc)(void *g), void *g)
+ *
+ * Calling into the gc tool chain, where all registers are caller save.
+ * Called from standard RISCV ELF psABI, where x8-x9, x18-x27, f8-f9 and
+ * f18-f27 are callee-save, so they must be saved explicitly, along with
+ * x1 (LR).
+ */
+.globl crosscall1
+crosscall1:
+ sd x1, -200(sp)
+ addi sp, sp, -200
+ sd x8, 8(sp)
+ sd x9, 16(sp)
+ sd x18, 24(sp)
+ sd x19, 32(sp)
+ sd x20, 40(sp)
+ sd x21, 48(sp)
+ sd x22, 56(sp)
+ sd x23, 64(sp)
+ sd x24, 72(sp)
+ sd x25, 80(sp)
+ sd x26, 88(sp)
+ sd x27, 96(sp)
+ fsd f8, 104(sp)
+ fsd f9, 112(sp)
+ fsd f18, 120(sp)
+ fsd f19, 128(sp)
+ fsd f20, 136(sp)
+ fsd f21, 144(sp)
+ fsd f22, 152(sp)
+ fsd f23, 160(sp)
+ fsd f24, 168(sp)
+ fsd f25, 176(sp)
+ fsd f26, 184(sp)
+ fsd f27, 192(sp)
+
+ // a0 = *fn, a1 = *setg_gcc, a2 = *g
+ mv s1, a0
+ mv s0, a1
+ mv a0, a2
+ jalr ra, s0 // call setg_gcc (clobbers X27 aka g)
+ jalr ra, s1 // call fn
+
+ ld x1, 0(sp)
+ ld x8, 8(sp)
+ ld x9, 16(sp)
+ ld x18, 24(sp)
+ ld x19, 32(sp)
+ ld x20, 40(sp)
+ ld x21, 48(sp)
+ ld x22, 56(sp)
+ ld x23, 64(sp)
+ ld x24, 72(sp)
+ ld x25, 80(sp)
+ ld x26, 88(sp)
+ ld x27, 96(sp)
+ fld f8, 104(sp)
+ fld f9, 112(sp)
+ fld f18, 120(sp)
+ fld f19, 128(sp)
+ fld f20, 136(sp)
+ fld f21, 144(sp)
+ fld f22, 152(sp)
+ fld f23, 160(sp)
+ fld f24, 168(sp)
+ fld f25, 176(sp)
+ fld f26, 184(sp)
+ fld f27, 192(sp)
+ addi sp, sp, 200
+
+ jr ra
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_s390x.S b/src/runtime/cgo/gcc_s390x.S
new file mode 100644
index 0000000..614de4b
--- /dev/null
+++ b/src/runtime/cgo/gcc_s390x.S
@@ -0,0 +1,56 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+ * void crosscall_s390x(void (*fn)(void), void *g)
+ *
+ * Calling into the go tool chain, where all registers are caller save.
+ * Called from standard s390x C ABI, where r6-r13, r15, and f8-f15 are
+ * callee-save, so they must be saved explicitly.
+ */
+.globl crosscall_s390x
+crosscall_s390x:
+ /* save r6-r15 in the register save area of the calling function */
+ stmg %r6, %r15, 48(%r15)
+
+ /* allocate 64 bytes of stack space to save f8-f15 */
+ lay %r15, -64(%r15)
+
+ /* save callee-saved floating point registers */
+ std %f8, 0(%r15)
+ std %f9, 8(%r15)
+ std %f10, 16(%r15)
+ std %f11, 24(%r15)
+ std %f12, 32(%r15)
+ std %f13, 40(%r15)
+ std %f14, 48(%r15)
+ std %f15, 56(%r15)
+
+ /* restore g pointer */
+ lgr %r13, %r3
+
+ /* call fn */
+ basr %r14, %r2
+
+ /* restore floating point registers */
+ ld %f8, 0(%r15)
+ ld %f9, 8(%r15)
+ ld %f10, 16(%r15)
+ ld %f11, 24(%r15)
+ ld %f12, 32(%r15)
+ ld %f13, 40(%r15)
+ ld %f14, 48(%r15)
+ ld %f15, 56(%r15)
+
+ /* de-allocate stack frame */
+ la %r15, 64(%r15)
+
+ /* restore general purpose registers */
+ lmg %r6, %r15, 48(%r15)
+
+ br %r14 /* restored by lmg */
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_setenv.c b/src/runtime/cgo/gcc_setenv.c
new file mode 100644
index 0000000..d4f7983
--- /dev/null
+++ b/src/runtime/cgo/gcc_setenv.c
@@ -0,0 +1,28 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build cgo
+// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
+
+#include "libcgo.h"
+
+#include <stdlib.h>
+
+/* Stub for calling setenv */
+void
+x_cgo_setenv(char **arg)
+{
+ _cgo_tsan_acquire();
+ setenv(arg[0], arg[1], 1);
+ _cgo_tsan_release();
+}
+
+/* Stub for calling unsetenv */
+void
+x_cgo_unsetenv(char **arg)
+{
+ _cgo_tsan_acquire();
+ unsetenv(arg[0]);
+ _cgo_tsan_release();
+}
diff --git a/src/runtime/cgo/gcc_sigaction.c b/src/runtime/cgo/gcc_sigaction.c
new file mode 100644
index 0000000..e510e35
--- /dev/null
+++ b/src/runtime/cgo/gcc_sigaction.c
@@ -0,0 +1,82 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux,amd64 linux,arm64
+
+#include <errno.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+#include <signal.h>
+
+#include "libcgo.h"
+
+// go_sigaction_t is a C version of the sigactiont struct from
+// defs_linux_amd64.go. This definition and its conversion to and from
+// struct sigaction are specific to linux/amd64.
+typedef struct {
+ uintptr_t handler;
+ uint64_t flags;
+ uintptr_t restorer;
+ uint64_t mask;
+} go_sigaction_t;
+
+// SA_RESTORER is part of the kernel interface.
+// This is GNU/Linux i386/amd64 specific.
+#ifndef SA_RESTORER
+#define SA_RESTORER 0x4000000
+#endif
+
+int32_t
+x_cgo_sigaction(intptr_t signum, const go_sigaction_t *goact, go_sigaction_t *oldgoact) {
+ int32_t ret;
+ struct sigaction act;
+ struct sigaction oldact;
+ size_t i;
+
+ _cgo_tsan_acquire();
+
+ memset(&act, 0, sizeof act);
+ memset(&oldact, 0, sizeof oldact);
+
+ if (goact) {
+ if (goact->flags & SA_SIGINFO) {
+ act.sa_sigaction = (void(*)(int, siginfo_t*, void*))(goact->handler);
+ } else {
+ act.sa_handler = (void(*)(int))(goact->handler);
+ }
+ sigemptyset(&act.sa_mask);
+ for (i = 0; i < 8 * sizeof(goact->mask); i++) {
+ if (goact->mask & ((uint64_t)(1)<<i)) {
+ sigaddset(&act.sa_mask, i+1);
+ }
+ }
+ act.sa_flags = goact->flags & ~SA_RESTORER;
+ }
+
+ ret = sigaction(signum, goact ? &act : NULL, oldgoact ? &oldact : NULL);
+ if (ret == -1) {
+ // runtime.rt_sigaction expects _cgo_sigaction to return errno on error.
+ _cgo_tsan_release();
+ return errno;
+ }
+
+ if (oldgoact) {
+ if (oldact.sa_flags & SA_SIGINFO) {
+ oldgoact->handler = (uintptr_t)(oldact.sa_sigaction);
+ } else {
+ oldgoact->handler = (uintptr_t)(oldact.sa_handler);
+ }
+ oldgoact->mask = 0;
+ for (i = 0; i < 8 * sizeof(oldgoact->mask); i++) {
+ if (sigismember(&oldact.sa_mask, i+1) == 1) {
+ oldgoact->mask |= (uint64_t)(1)<<i;
+ }
+ }
+ oldgoact->flags = oldact.sa_flags;
+ }
+
+ _cgo_tsan_release();
+ return ret;
+}
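
Note: x_cgo_sigaction translates between the packed 64-bit signal mask used by the Go runtime (bit i means signal i+1) and the platform sigset_t, one bit at a time. A small sketch of that conversion in both directions, with hypothetical helper names.

    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Packed mask bit i corresponds to signal i+1, as in x_cgo_sigaction above. */
    static void mask_to_sigset(uint64_t mask, sigset_t *set) {
        int i;
        sigemptyset(set);
        for (i = 0; i < 64; i++)
            if (mask & ((uint64_t)1 << i))
                sigaddset(set, i + 1);
    }

    static uint64_t sigset_to_mask(const sigset_t *set) {
        uint64_t mask = 0;
        int i;
        for (i = 0; i < 64; i++)
            if (sigismember(set, i + 1) == 1)
                mask |= (uint64_t)1 << i;
        return mask;
    }

    int main(void) {
        sigset_t set;
        uint64_t in = ((uint64_t)1 << (SIGINT - 1)) | ((uint64_t)1 << (SIGTERM - 1));
        mask_to_sigset(in, &set);
        printf("round trip ok: %d\n", sigset_to_mask(&set) == in);
        return 0;
    }
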
diff --git a/src/runtime/cgo/gcc_signal2_ios_arm64.c b/src/runtime/cgo/gcc_signal2_ios_arm64.c
new file mode 100644
index 0000000..5b8a18f
--- /dev/null
+++ b/src/runtime/cgo/gcc_signal2_ios_arm64.c
@@ -0,0 +1,11 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build lldb
+
+// Used by gcc_signal_darwin_arm64.c when doing the test build during cgo.
+// We hope that for real binaries the definition provided by Go will take precedence
+// and the linker will drop this .o file altogether, which is why this definition
+// is all by itself in its own file.
+void __attribute__((weak)) xx_cgo_panicmem(void) {}
diff --git a/src/runtime/cgo/gcc_signal_ios_arm64.c b/src/runtime/cgo/gcc_signal_ios_arm64.c
new file mode 100644
index 0000000..6519edd
--- /dev/null
+++ b/src/runtime/cgo/gcc_signal_ios_arm64.c
@@ -0,0 +1,213 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Emulation of the Unix signal SIGSEGV.
+//
+// On iOS, Go tests and apps under development are run by lldb.
+// The debugger uses a task-level exception handler to intercept signals.
+// Despite having a 'handle' mechanism like gdb, lldb will not allow a
+// SIGSEGV to pass to the running program. For Go, this means we cannot
+// generate a panic, which cannot be recovered, and so tests fail.
+//
+// We work around this by registering a thread-level mach exception handler
+// and intercepting EXC_BAD_ACCESS. The kernel offers thread handlers a
+// chance to resolve exceptions before the task handler, so we can generate
+// the panic and avoid lldb's SIGSEGV handler.
+//
+// The dist tool enables this by build flag when testing.
+
+// +build lldb
+
+#include <limits.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <signal.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <mach/arm/thread_status.h>
+#include <mach/exception_types.h>
+#include <mach/mach.h>
+#include <mach/mach_init.h>
+#include <mach/mach_port.h>
+#include <mach/thread_act.h>
+#include <mach/thread_status.h>
+
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+void xx_cgo_panicmem(void);
+uintptr_t x_cgo_panicmem = (uintptr_t)xx_cgo_panicmem;
+
+static pthread_mutex_t mach_exception_handler_port_set_mu;
+static mach_port_t mach_exception_handler_port_set = MACH_PORT_NULL;
+
+kern_return_t
+catch_exception_raise(
+ mach_port_t exception_port,
+ mach_port_t thread,
+ mach_port_t task,
+ exception_type_t exception,
+ exception_data_t code_vector,
+ mach_msg_type_number_t code_count)
+{
+ kern_return_t ret;
+ arm_unified_thread_state_t thread_state;
+ mach_msg_type_number_t state_count = ARM_UNIFIED_THREAD_STATE_COUNT;
+
+ // Returning KERN_SUCCESS intercepts the exception.
+ //
+ // Returning KERN_FAILURE lets the exception fall through to the
+ // next handler, which is the standard signal emulation code
+ // registered on the task port.
+
+ if (exception != EXC_BAD_ACCESS) {
+ return KERN_FAILURE;
+ }
+
+ ret = thread_get_state(thread, ARM_UNIFIED_THREAD_STATE, (thread_state_t)&thread_state, &state_count);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: thread_get_state failed: %d\n", ret);
+ abort();
+ }
+
+ // Bounce call to sigpanic through asm that makes it look like
+ // we call sigpanic directly from the faulting code.
+#ifdef __arm64__
+ thread_state.ts_64.__x[1] = thread_state.ts_64.__lr;
+ thread_state.ts_64.__x[2] = thread_state.ts_64.__pc;
+ thread_state.ts_64.__pc = x_cgo_panicmem;
+#else
+ thread_state.ts_32.__r[1] = thread_state.ts_32.__lr;
+ thread_state.ts_32.__r[2] = thread_state.ts_32.__pc;
+ thread_state.ts_32.__pc = x_cgo_panicmem;
+#endif
+
+ if (0) {
+ // Useful debugging logic when panicmem is broken.
+ //
+ // Sends the first SIGSEGV and lets lldb catch the
+ // second one, avoiding a loop that locks up iOS
+ // devices requiring a hard reboot.
+ fprintf(stderr, "runtime/cgo: caught exc_bad_access\n");
+ fprintf(stderr, "__lr = %llx\n", thread_state.ts_64.__lr);
+ fprintf(stderr, "__pc = %llx\n", thread_state.ts_64.__pc);
+ static int pass1 = 0;
+ if (pass1) {
+ return KERN_FAILURE;
+ }
+ pass1 = 1;
+ }
+
+ ret = thread_set_state(thread, ARM_UNIFIED_THREAD_STATE, (thread_state_t)&thread_state, state_count);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: thread_set_state failed: %d\n", ret);
+ abort();
+ }
+
+ return KERN_SUCCESS;
+}
+
+void
+darwin_arm_init_thread_exception_port()
+{
+ // Called by each new OS thread to bind its EXC_BAD_ACCESS exception
+ // to mach_exception_handler_port_set.
+ int ret;
+ mach_port_t port = MACH_PORT_NULL;
+
+ ret = mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &port);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: mach_port_allocate failed: %d\n", ret);
+ abort();
+ }
+ ret = mach_port_insert_right(
+ mach_task_self(),
+ port,
+ port,
+ MACH_MSG_TYPE_MAKE_SEND);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: mach_port_insert_right failed: %d\n", ret);
+ abort();
+ }
+
+ ret = thread_set_exception_ports(
+ mach_thread_self(),
+ EXC_MASK_BAD_ACCESS,
+ port,
+ EXCEPTION_DEFAULT,
+ THREAD_STATE_NONE);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: thread_set_exception_ports failed: %d\n", ret);
+ abort();
+ }
+
+ ret = pthread_mutex_lock(&mach_exception_handler_port_set_mu);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: pthread_mutex_lock failed: %d\n", ret);
+ abort();
+ }
+ ret = mach_port_move_member(
+ mach_task_self(),
+ port,
+ mach_exception_handler_port_set);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: mach_port_move_member failed: %d\n", ret);
+ abort();
+ }
+ ret = pthread_mutex_unlock(&mach_exception_handler_port_set_mu);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: pthread_mutex_unlock failed: %d\n", ret);
+ abort();
+ }
+}
+
+static void*
+mach_exception_handler(void *port)
+{
+ // Calls catch_exception_raise.
+ extern boolean_t exc_server();
+ mach_msg_server(exc_server, 2048, (mach_port_t)port, 0);
+ abort(); // never returns
+}
+
+void
+darwin_arm_init_mach_exception_handler()
+{
+ pthread_mutex_init(&mach_exception_handler_port_set_mu, NULL);
+
+ // Called once per process to initialize a mach port server, listening
+ // for EXC_BAD_ACCESS thread exceptions.
+ int ret;
+ pthread_t thr = NULL;
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+
+ ret = mach_port_allocate(
+ mach_task_self(),
+ MACH_PORT_RIGHT_PORT_SET,
+ &mach_exception_handler_port_set);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: mach_port_allocate failed for port_set: %d\n", ret);
+ abort();
+ }
+
+ // Block all signals to the exception handler thread
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ // Start a thread to handle exceptions.
+ uintptr_t port_set = (uintptr_t)mach_exception_handler_port_set;
+ pthread_attr_init(&attr);
+ pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
+ ret = _cgo_try_pthread_create(&thr, &attr, mach_exception_handler, (void*)port_set);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %d\n", ret);
+ abort();
+ }
+ pthread_attr_destroy(&attr);
+}
diff --git a/src/runtime/cgo/gcc_signal_ios_nolldb.c b/src/runtime/cgo/gcc_signal_ios_nolldb.c
new file mode 100644
index 0000000..cfa4025
--- /dev/null
+++ b/src/runtime/cgo/gcc_signal_ios_nolldb.c
@@ -0,0 +1,12 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !lldb
+// +build ios
+// +build arm64
+
+#include <stdint.h>
+
+void darwin_arm_init_thread_exception_port() {}
+void darwin_arm_init_mach_exception_handler() {}
diff --git a/src/runtime/cgo/gcc_solaris_amd64.c b/src/runtime/cgo/gcc_solaris_amd64.c
new file mode 100644
index 0000000..079bd12
--- /dev/null
+++ b/src/runtime/cgo/gcc_solaris_amd64.c
@@ -0,0 +1,82 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include <ucontext.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ ucontext_t ctx;
+
+ setg_gcc = setg;
+ if (getcontext(&ctx) != 0)
+ perror("runtime/cgo: getcontext failed");
+ g->stacklo = (uintptr_t)ctx.uc_stack.ss_sp;
+
+ // Solaris processes report a tiny stack when run with "ulimit -s unlimited".
+ // Correct that as best we can: assume it's at least 1 MB.
+ // See golang.org/issue/12210.
+ if(ctx.uc_stack.ss_size < 1024*1024)
+ g->stacklo -= 1024*1024 - ctx.uc_stack.ss_size;
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ void *base;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+
+ if (pthread_attr_getstack(&attr, &base, &size) != 0)
+ perror("runtime/cgo: pthread_attr_getstack failed");
+ if (size == 0) {
+ ts->g->stackhi = 2 << 20;
+ if (pthread_attr_setstack(&attr, NULL, ts->g->stackhi) != 0)
+ perror("runtime/cgo: pthread_attr_setstack failed");
+ } else {
+ ts->g->stackhi = size;
+ }
+ pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ /*
+ * Set specific keys.
+ */
+ setg_gcc((void*)ts.g);
+
+ crosscall_amd64(ts.fn);
+ return nil;
+}
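
Note: x_cgo_init above takes the stack bounds from getcontext but corrects for the artificially small size reported under "ulimit -s unlimited" by pretending the stack covers at least 1 MB. A tiny sketch of just that correction, with made-up addresses.

    #include <stdint.h>
    #include <stdio.h>

    /* Mirror of the correction above: if the reported size is under 1 MB,
       push the lower bound down so at least 1 MB appears usable. */
    static uintptr_t corrected_stacklo(uintptr_t stack_base, size_t reported) {
        uintptr_t lo = stack_base;
        if (reported < 1024*1024)
            lo -= 1024*1024 - reported;
        return lo;
    }

    int main(void) {
        /* Hypothetical values: a 64 KB stack reported with its base at 0x40100000. */
        uintptr_t lo = corrected_stacklo((uintptr_t)0x40100000u, 64*1024);
        printf("corrected stacklo: %#lx\n", (unsigned long)lo);
        return 0;
    }
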
diff --git a/src/runtime/cgo/gcc_traceback.c b/src/runtime/cgo/gcc_traceback.c
new file mode 100644
index 0000000..d86331c
--- /dev/null
+++ b/src/runtime/cgo/gcc_traceback.c
@@ -0,0 +1,24 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build cgo,darwin cgo,linux
+
+#include <stdint.h>
+#include "libcgo.h"
+
+// Call the user's traceback function and then call sigtramp.
+// The runtime signal handler will jump to this code.
+// We do it this way so that the user's traceback function will be called
+// by a C function with proper unwind info.
+void
+x_cgo_callers(uintptr_t sig, void *info, void *context, void (*cgoTraceback)(struct cgoTracebackArg*), uintptr_t* cgoCallers, void (*sigtramp)(uintptr_t, void*, void*)) {
+ struct cgoTracebackArg arg;
+
+ arg.Context = 0;
+ arg.SigContext = (uintptr_t)(context);
+ arg.Buf = cgoCallers;
+ arg.Max = 32; // must match len(runtime.cgoCallers)
+ (*cgoTraceback)(&arg);
+ sigtramp(sig, info, context);
+}
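
For context, x_cgo_callers is the C half of the runtime.SetCgoTraceback mechanism: the Go side registers a C traceback function, and the signal handler routes through the code above so that function runs under a C frame with proper unwind info. The following is only an illustrative sketch, not part of this patch; the C function demoTraceback is hypothetical, and it assumes cgo's usual pattern of passing a preamble-defined C function to SetCgoTraceback as an unsafe.Pointer.

package main

/*
#include <stdint.h>

// Mirrors struct cgoTracebackArg in runtime/cgo/libcgo.h.
struct cgoTracebackArg {
	uintptr_t  Context;
	uintptr_t  SigContext;
	uintptr_t* Buf;
	uintptr_t  Max;
};

// Hypothetical traceback function: records no C frames (a zero entry ends the list).
static void demoTraceback(void* p) {
	struct cgoTracebackArg* arg = (struct cgoTracebackArg*)p;
	if (arg->Max > 0) {
		arg->Buf[0] = 0;
	}
}
*/
import "C"

import (
	"runtime"
	"unsafe"
)

func init() {
	// Version 0; no context or symbolizer function is registered here.
	runtime.SetCgoTraceback(0, unsafe.Pointer(C.demoTraceback), nil, nil)
}

func main() {}

A real traceback function would walk the C stack (for example with libunwind) and fill Buf with PC values; x_cgo_callers above then hands the collected addresses to the Go signal handler via sigtramp.
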
diff --git a/src/runtime/cgo/gcc_util.c b/src/runtime/cgo/gcc_util.c
new file mode 100644
index 0000000..3fcb48c
--- /dev/null
+++ b/src/runtime/cgo/gcc_util.c
@@ -0,0 +1,69 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "libcgo.h"
+
+/* Stub for creating a new thread */
+void
+x_cgo_thread_start(ThreadStart *arg)
+{
+ ThreadStart *ts;
+
+ /* Make our own copy that can persist after we return. */
+ _cgo_tsan_acquire();
+ ts = malloc(sizeof *ts);
+ _cgo_tsan_release();
+ if(ts == nil) {
+ fprintf(stderr, "runtime/cgo: out of memory in thread_start\n");
+ abort();
+ }
+ *ts = *arg;
+
+ _cgo_sys_thread_start(ts); /* OS-dependent half */
+}
+
+#ifndef CGO_TSAN
+void(* const _cgo_yield)() = NULL;
+#else
+
+#include <string.h>
+
+char x_cgo_yield_strncpy_src = 0;
+char x_cgo_yield_strncpy_dst = 0;
+size_t x_cgo_yield_strncpy_n = 0;
+
+/*
+Stub for allowing libc interceptors to execute.
+
+_cgo_yield is set to NULL if we do not expect libc interceptors to exist.
+*/
+static void
+x_cgo_yield()
+{
+ /*
+ The libc function(s) we call here must form a no-op and include at least one
+ call that triggers TSAN to process pending asynchronous signals.
+
+ sleep(0) would be fine, but it's not portable C (so it would need more header
+ guards).
+ free(NULL) has a fast-path special case in TSAN, so it doesn't
+ trigger signal delivery.
+ free(malloc(0)) would work (triggering the interceptors in malloc), but
+ it also runs a bunch of user-supplied malloc hooks.
+
+ So we choose strncpy(_, _, 0): it requires an extra header,
+ but it's standard and should be very efficient.
+
+ GCC 7 has an unfortunate habit of optimizing out strncpy calls (see
+ https://golang.org/issue/21196), so the arguments here need to be global
+ variables with external linkage in order to ensure that the call traps all the
+ way down into libc.
+ */
+ strncpy(&x_cgo_yield_strncpy_dst, &x_cgo_yield_strncpy_src,
+ x_cgo_yield_strncpy_n);
+}
+
+void(* const _cgo_yield)() = &x_cgo_yield;
+
+#endif /* CGO_TSAN */
diff --git a/src/runtime/cgo/gcc_windows_386.c b/src/runtime/cgo/gcc_windows_386.c
new file mode 100644
index 0000000..60cb011
--- /dev/null
+++ b/src/runtime/cgo/gcc_windows_386.c
@@ -0,0 +1,55 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#define WIN32_LEAN_AND_MEAN
+#include <windows.h>
+#include <process.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <errno.h>
+#include "libcgo.h"
+#include "libcgo_windows.h"
+
+static void threadentry(void*);
+
+void
+x_cgo_init(G *g)
+{
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ uintptr_t thandle;
+
+ thandle = _beginthread(threadentry, 0, ts);
+ if(thandle == -1) {
+ fprintf(stderr, "runtime: failed to create new OS thread (%d)\n", errno);
+ abort();
+ }
+}
+
+static void
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ // minit queries stack bounds from the OS.
+
+ /*
+ * Set specific keys in thread local storage.
+ */
+ asm volatile (
+ "movl %0, %%fs:0x14\n" // MOVL tls0, 0x14(FS)
+ "movl %%fs:0x14, %%eax\n" // MOVL 0x14(FS), tmp
+ "movl %1, 0(%%eax)\n" // MOVL g, 0(FS)
+ :: "r"(ts.tls), "r"(ts.g) : "%eax"
+ );
+
+ crosscall_386(ts.fn);
+}
diff --git a/src/runtime/cgo/gcc_windows_amd64.c b/src/runtime/cgo/gcc_windows_amd64.c
new file mode 100644
index 0000000..0f8c817
--- /dev/null
+++ b/src/runtime/cgo/gcc_windows_amd64.c
@@ -0,0 +1,55 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#define WIN32_LEAN_AND_MEAN // the "32" is historical; this is the correct macro for 64-bit builds too
+#include <windows.h>
+#include <process.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <errno.h>
+#include "libcgo.h"
+#include "libcgo_windows.h"
+
+static void threadentry(void*);
+
+void
+x_cgo_init(G *g)
+{
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ uintptr_t thandle;
+
+ thandle = _beginthread(threadentry, 0, ts);
+ if(thandle == -1) {
+ fprintf(stderr, "runtime: failed to create new OS thread (%d)\n", errno);
+ abort();
+ }
+}
+
+static void
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ // minit queries stack bounds from the OS.
+
+ /*
+ * Set specific keys in thread local storage.
+ */
+ asm volatile (
+ "movq %0, %%gs:0x28\n" // MOVL tls0, 0x28(GS)
+ "movq %%gs:0x28, %%rax\n" // MOVQ 0x28(GS), tmp
+ "movq %1, 0(%%rax)\n" // MOVQ g, 0(GS)
+ :: "r"(ts.tls), "r"(ts.g) : "%rax"
+ );
+
+ crosscall_amd64(ts.fn);
+}
diff --git a/src/runtime/cgo/iscgo.go b/src/runtime/cgo/iscgo.go
new file mode 100644
index 0000000..e12d0f4
--- /dev/null
+++ b/src/runtime/cgo/iscgo.go
@@ -0,0 +1,17 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The runtime package contains an uninitialized definition
+// for runtime·iscgo. Override it to tell the runtime we're here.
+// There are various function pointers that should be set too,
+// but those depend on dynamic linker magic to get initialized
+// correctly, and sometimes they break. This variable is a
+// backup: it depends only on old C style static linking rules.
+
+package cgo
+
+import _ "unsafe" // for go:linkname
+
+//go:linkname _iscgo runtime.iscgo
+var _iscgo bool = true
diff --git a/src/runtime/cgo/libcgo.h b/src/runtime/cgo/libcgo.h
new file mode 100644
index 0000000..aba500a
--- /dev/null
+++ b/src/runtime/cgo/libcgo.h
@@ -0,0 +1,151 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+
+#undef nil
+#define nil ((void*)0)
+#define nelem(x) (sizeof(x)/sizeof((x)[0]))
+
+typedef uint32_t uint32;
+typedef uint64_t uint64;
+typedef uintptr_t uintptr;
+
+/*
+ * The beginning of the per-goroutine structure,
+ * as defined in ../pkg/runtime/runtime.h.
+ * Just enough to edit these two fields.
+ */
+typedef struct G G;
+struct G
+{
+ uintptr stacklo;
+ uintptr stackhi;
+};
+
+/*
+ * Arguments to the _cgo_thread_start call.
+ * Also known to ../pkg/runtime/runtime.h.
+ */
+typedef struct ThreadStart ThreadStart;
+struct ThreadStart
+{
+ G *g;
+ uintptr *tls;
+ void (*fn)(void);
+};
+
+/*
+ * Called by 5c/6c/8c world.
+ * Makes a local copy of the ThreadStart and
+ * calls _cgo_sys_thread_start(ts).
+ */
+extern void (*_cgo_thread_start)(ThreadStart *ts);
+
+/*
+ * Creates a new operating system thread without updating any Go state
+ * (OS dependent).
+ */
+extern void (*_cgo_sys_thread_create)(void* (*func)(void*), void* arg);
+
+/*
+ * Creates the new operating system thread (OS, arch dependent).
+ */
+void _cgo_sys_thread_start(ThreadStart *ts);
+
+/*
+ * Waits for the Go runtime to be initialized (OS dependent).
+ * If runtime.SetCgoTraceback is used to set a context function,
+ * calls the context function and returns the context value.
+ */
+uintptr_t _cgo_wait_runtime_init_done(void);
+
+/*
+ * Call fn in the 6c world.
+ */
+void crosscall_amd64(void (*fn)(void));
+
+/*
+ * Call fn in the 8c world.
+ */
+void crosscall_386(void (*fn)(void));
+
+/*
+ * Prints an error and then calls abort. Used on Linux and Android.
+ */
+void fatalf(const char* format, ...);
+
+/*
+ * Registers the current mach thread port for EXC_BAD_ACCESS processing.
+ */
+void darwin_arm_init_thread_exception_port(void);
+
+/*
+ * Starts a mach message server processing EXC_BAD_ACCESS.
+ */
+void darwin_arm_init_mach_exception_handler(void);
+
+/*
+ * The cgo context function. See runtime.SetCgoTraceback.
+ */
+struct context_arg {
+ uintptr_t Context;
+};
+extern void (*(_cgo_get_context_function(void)))(struct context_arg*);
+
+/*
+ * The argument for the cgo traceback callback. See runtime.SetCgoTraceback.
+ */
+struct cgoTracebackArg {
+ uintptr_t Context;
+ uintptr_t SigContext;
+ uintptr_t* Buf;
+ uintptr_t Max;
+};
+
+/*
+ * TSAN support. This is only useful when building with
+ * CGO_CFLAGS="-fsanitize=thread" CGO_LDFLAGS="-fsanitize=thread" go install
+ */
+#undef CGO_TSAN
+#if defined(__has_feature)
+# if __has_feature(thread_sanitizer)
+# define CGO_TSAN
+# endif
+#elif defined(__SANITIZE_THREAD__)
+# define CGO_TSAN
+#endif
+
+#ifdef CGO_TSAN
+
+// These must match the definitions in yesTsanProlog in cmd/cgo/out.go.
+// In general we should call _cgo_tsan_acquire when we enter C code,
+// and call _cgo_tsan_release when we return to Go code.
+// This is only necessary when calling code that might be instrumented
+// by TSAN, which mostly means system library calls that TSAN intercepts.
+// See the comment in cmd/cgo/out.go for more details.
+
+long long _cgo_sync __attribute__ ((common));
+
+extern void __tsan_acquire(void*);
+extern void __tsan_release(void*);
+
+__attribute__ ((unused))
+static void _cgo_tsan_acquire() {
+ __tsan_acquire(&_cgo_sync);
+}
+
+__attribute__ ((unused))
+static void _cgo_tsan_release() {
+ __tsan_release(&_cgo_sync);
+}
+
+#else // !defined(CGO_TSAN)
+
+#define _cgo_tsan_acquire()
+#define _cgo_tsan_release()
+
+#endif // !defined(CGO_TSAN)
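
The context_arg and cgoTracebackArg layouts above are exactly what runtime.SetCgoTraceback documents on the Go side. As a reference sketch (the package and type names below are illustrative, not taken from this header), the corresponding Go structs look like:

package cgodemo // hypothetical package, not part of the runtime

// contextArg corresponds to struct context_arg above.
type contextArg struct {
	Context uintptr // opaque value produced by the registered context function
}

// tracebackArg corresponds to struct cgoTracebackArg above.
type tracebackArg struct {
	Context    uintptr  // context value; zero means "trace the current execution point"
	SigContext uintptr  // signal context when called from a signal handler, else zero
	Buf        *uintptr // output buffer of PC values; a zero entry terminates the list
	Max        uintptr  // capacity of Buf (32 in this runtime, matching cgoCallers)
}
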
diff --git a/src/runtime/cgo/libcgo_unix.h b/src/runtime/cgo/libcgo_unix.h
new file mode 100644
index 0000000..a56a366
--- /dev/null
+++ b/src/runtime/cgo/libcgo_unix.h
@@ -0,0 +1,15 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+ * Call pthread_create, retrying on EAGAIN.
+ */
+extern int _cgo_try_pthread_create(pthread_t*, const pthread_attr_t*, void* (*)(void*), void*);
+
+/*
+ * Same as _cgo_try_pthread_create, but passing on the pthread_create function.
+ * Only defined on OpenBSD.
+ */
+extern int _cgo_openbsd_try_pthread_create(int (*)(pthread_t*, const pthread_attr_t*, void *(*pfn)(void*), void*),
+ pthread_t*, const pthread_attr_t*, void* (*)(void*), void* arg);
diff --git a/src/runtime/cgo/libcgo_windows.h b/src/runtime/cgo/libcgo_windows.h
new file mode 100644
index 0000000..0013f06
--- /dev/null
+++ b/src/runtime/cgo/libcgo_windows.h
@@ -0,0 +1,12 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Ensure there's one symbol marked __declspec(dllexport).
+// If there are no exported symbols, the unfortunate behavior of
+// the binutils linker is to also strip the relocations table,
+// resulting in a non-PIE binary. The other option is the
+// --export-all-symbols flag, but we don't need to export all symbols
+// and this may overflow the export table (#40795).
+// See https://sourceware.org/bugzilla/show_bug.cgi?id=19011
+__declspec(dllexport) int _cgo_dummy_export;
diff --git a/src/runtime/cgo/linux.go b/src/runtime/cgo/linux.go
new file mode 100644
index 0000000..76c0192
--- /dev/null
+++ b/src/runtime/cgo/linux.go
@@ -0,0 +1,74 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Linux system call wrappers that provide POSIX semantics through the
+// corresponding cgo->libc (nptl) wrappers for various system calls.
+
+// +build linux
+
+package cgo
+
+import "unsafe"
+
+// Each of the following entries is needed to ensure that the
+// syscall.syscall_linux code can conditionally call these
+// function pointers:
+//
+// 1. find the C-defined function start
+// 2. force the local byte alias to be mapped to that location
+// 3. map the Go pointer to the function to the syscall package
+
+//go:cgo_import_static _cgo_libc_setegid
+//go:linkname _cgo_libc_setegid _cgo_libc_setegid
+//go:linkname cgo_libc_setegid syscall.cgo_libc_setegid
+var _cgo_libc_setegid byte
+var cgo_libc_setegid = unsafe.Pointer(&_cgo_libc_setegid)
+
+//go:cgo_import_static _cgo_libc_seteuid
+//go:linkname _cgo_libc_seteuid _cgo_libc_seteuid
+//go:linkname cgo_libc_seteuid syscall.cgo_libc_seteuid
+var _cgo_libc_seteuid byte
+var cgo_libc_seteuid = unsafe.Pointer(&_cgo_libc_seteuid)
+
+//go:cgo_import_static _cgo_libc_setregid
+//go:linkname _cgo_libc_setregid _cgo_libc_setregid
+//go:linkname cgo_libc_setregid syscall.cgo_libc_setregid
+var _cgo_libc_setregid byte
+var cgo_libc_setregid = unsafe.Pointer(&_cgo_libc_setregid)
+
+//go:cgo_import_static _cgo_libc_setresgid
+//go:linkname _cgo_libc_setresgid _cgo_libc_setresgid
+//go:linkname cgo_libc_setresgid syscall.cgo_libc_setresgid
+var _cgo_libc_setresgid byte
+var cgo_libc_setresgid = unsafe.Pointer(&_cgo_libc_setresgid)
+
+//go:cgo_import_static _cgo_libc_setresuid
+//go:linkname _cgo_libc_setresuid _cgo_libc_setresuid
+//go:linkname cgo_libc_setresuid syscall.cgo_libc_setresuid
+var _cgo_libc_setresuid byte
+var cgo_libc_setresuid = unsafe.Pointer(&_cgo_libc_setresuid)
+
+//go:cgo_import_static _cgo_libc_setreuid
+//go:linkname _cgo_libc_setreuid _cgo_libc_setreuid
+//go:linkname cgo_libc_setreuid syscall.cgo_libc_setreuid
+var _cgo_libc_setreuid byte
+var cgo_libc_setreuid = unsafe.Pointer(&_cgo_libc_setreuid)
+
+//go:cgo_import_static _cgo_libc_setgroups
+//go:linkname _cgo_libc_setgroups _cgo_libc_setgroups
+//go:linkname cgo_libc_setgroups syscall.cgo_libc_setgroups
+var _cgo_libc_setgroups byte
+var cgo_libc_setgroups = unsafe.Pointer(&_cgo_libc_setgroups)
+
+//go:cgo_import_static _cgo_libc_setgid
+//go:linkname _cgo_libc_setgid _cgo_libc_setgid
+//go:linkname cgo_libc_setgid syscall.cgo_libc_setgid
+var _cgo_libc_setgid byte
+var cgo_libc_setgid = unsafe.Pointer(&_cgo_libc_setgid)
+
+//go:cgo_import_static _cgo_libc_setuid
+//go:linkname _cgo_libc_setuid _cgo_libc_setuid
+//go:linkname cgo_libc_setuid syscall.cgo_libc_setuid
+var _cgo_libc_setuid byte
+var cgo_libc_setuid = unsafe.Pointer(&_cgo_libc_setuid)
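
These hooks exist so that identity changes made through the syscall package apply process-wide when cgo is in use: glibc's setxid machinery broadcasts the change to every thread. A hedged sketch of the caller's view follows (uid/gid 65534 is just a placeholder, and it assumes a Go release that includes this mechanism):

package main

import (
	"fmt"
	"syscall"
)

func main() {
	// With cgo linked in, these calls are dispatched through the
	// _cgo_libc_setgid and _cgo_libc_setuid wrappers registered above,
	// so the credential change applies to all OS threads, not just the caller.
	if err := syscall.Setgid(65534); err != nil {
		fmt.Println("Setgid:", err)
	}
	if err := syscall.Setuid(65534); err != nil {
		fmt.Println("Setuid:", err)
	}
}
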
diff --git a/src/runtime/cgo/linux_syscall.c b/src/runtime/cgo/linux_syscall.c
new file mode 100644
index 0000000..59761c8
--- /dev/null
+++ b/src/runtime/cgo/linux_syscall.c
@@ -0,0 +1,85 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+
+#ifndef _GNU_SOURCE // setres[ug]id() API.
+#define _GNU_SOURCE
+#endif
+
+#include <grp.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <errno.h>
+#include "libcgo.h"
+
+/*
+ * Assumed POSIX compliant libc system call wrappers. For linux, the
+ * glibc/nptl/setxid mechanism ensures that POSIX semantics are
+ * honored for all pthreads (by default), and this in turn with cgo
+ * ensures that all Go threads launched with cgo are kept in sync for
+ * these function calls.
+ */
+
+// argset_t matches runtime/cgocall.go:argset.
+typedef struct {
+ uintptr_t* args;
+ uintptr_t retval;
+} argset_t;
+
+// libc-backed POSIX-compliant syscalls.
+
+#define SET_RETVAL(fn) \
+ uintptr_t ret = (uintptr_t) fn ; \
+ if (ret == (uintptr_t) -1) { \
+ x->retval = (uintptr_t) errno; \
+ } else \
+ x->retval = ret
+
+void
+_cgo_libc_setegid(argset_t* x) {
+ SET_RETVAL(setegid((gid_t) x->args[0]));
+}
+
+void
+_cgo_libc_seteuid(argset_t* x) {
+ SET_RETVAL(seteuid((uid_t) x->args[0]));
+}
+
+void
+_cgo_libc_setgid(argset_t* x) {
+ SET_RETVAL(setgid((gid_t) x->args[0]));
+}
+
+void
+_cgo_libc_setgroups(argset_t* x) {
+ SET_RETVAL(setgroups((size_t) x->args[0], (const gid_t *) x->args[1]));
+}
+
+void
+_cgo_libc_setregid(argset_t* x) {
+ SET_RETVAL(setregid((gid_t) x->args[0], (gid_t) x->args[1]));
+}
+
+void
+_cgo_libc_setresgid(argset_t* x) {
+ SET_RETVAL(setresgid((gid_t) x->args[0], (gid_t) x->args[1],
+ (gid_t) x->args[2]));
+}
+
+void
+_cgo_libc_setresuid(argset_t* x) {
+ SET_RETVAL(setresuid((uid_t) x->args[0], (uid_t) x->args[1],
+ (uid_t) x->args[2]));
+}
+
+void
+_cgo_libc_setreuid(argset_t* x) {
+ SET_RETVAL(setreuid((uid_t) x->args[0], (uid_t) x->args[1]));
+}
+
+void
+_cgo_libc_setuid(argset_t* x) {
+ SET_RETVAL(setuid((uid_t) x->args[0]));
+}
diff --git a/src/runtime/cgo/mmap.go b/src/runtime/cgo/mmap.go
new file mode 100644
index 0000000..00fb7fc
--- /dev/null
+++ b/src/runtime/cgo/mmap.go
@@ -0,0 +1,31 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux,amd64 linux,arm64
+
+package cgo
+
+// Import "unsafe" because we use go:linkname.
+import _ "unsafe"
+
+// When using cgo, call the C library for mmap, so that we call into
+// any sanitizer interceptors. This supports using the memory
+// sanitizer with Go programs. The memory sanitizer only applies to
+// C/C++ code; this permits that code to see the Go code as normal
+// program addresses that have been initialized.
+
+// To support interceptors that look for both mmap and munmap,
+// also call the C library for munmap.
+
+//go:cgo_import_static x_cgo_mmap
+//go:linkname x_cgo_mmap x_cgo_mmap
+//go:linkname _cgo_mmap _cgo_mmap
+var x_cgo_mmap byte
+var _cgo_mmap = &x_cgo_mmap
+
+//go:cgo_import_static x_cgo_munmap
+//go:linkname x_cgo_munmap x_cgo_munmap
+//go:linkname _cgo_munmap _cgo_munmap
+var x_cgo_munmap byte
+var _cgo_munmap = &x_cgo_munmap
diff --git a/src/runtime/cgo/netbsd.go b/src/runtime/cgo/netbsd.go
new file mode 100644
index 0000000..74d0aed
--- /dev/null
+++ b/src/runtime/cgo/netbsd.go
@@ -0,0 +1,21 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build netbsd
+
+package cgo
+
+import _ "unsafe" // for go:linkname
+
+// Supply environ and __progname, because we don't
+// link against the standard NetBSD crt0.o and the
+// libc dynamic library needs them.
+
+//go:linkname _environ environ
+//go:linkname _progname __progname
+//go:linkname ___ps_strings __ps_strings
+
+var _environ uintptr
+var _progname uintptr
+var ___ps_strings uintptr
diff --git a/src/runtime/cgo/openbsd.go b/src/runtime/cgo/openbsd.go
new file mode 100644
index 0000000..81c73bf
--- /dev/null
+++ b/src/runtime/cgo/openbsd.go
@@ -0,0 +1,20 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build openbsd
+
+package cgo
+
+import _ "unsafe" // for go:linkname
+
+// Supply __guard_local because we don't link against the standard
+// OpenBSD crt0.o and the libc dynamic library needs it.
+
+//go:linkname _guard_local __guard_local
+
+var _guard_local uintptr
+
+// This is normally marked as hidden and placed in the
+// .openbsd.randomdata section.
+//go:cgo_export_dynamic __guard_local __guard_local
diff --git a/src/runtime/cgo/setenv.go b/src/runtime/cgo/setenv.go
new file mode 100644
index 0000000..6495fcb
--- /dev/null
+++ b/src/runtime/cgo/setenv.go
@@ -0,0 +1,21 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
+
+package cgo
+
+import _ "unsafe" // for go:linkname
+
+//go:cgo_import_static x_cgo_setenv
+//go:linkname x_cgo_setenv x_cgo_setenv
+//go:linkname _cgo_setenv runtime._cgo_setenv
+var x_cgo_setenv byte
+var _cgo_setenv = &x_cgo_setenv
+
+//go:cgo_import_static x_cgo_unsetenv
+//go:linkname x_cgo_unsetenv x_cgo_unsetenv
+//go:linkname _cgo_unsetenv runtime._cgo_unsetenv
+var x_cgo_unsetenv byte
+var _cgo_unsetenv = &x_cgo_unsetenv
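
The effect of this wiring is that os.Setenv also updates the C environment, so values set from Go are visible to C code that reads the environment directly. A small illustrative program, not part of this patch:

package main

/*
#include <stdlib.h>
*/
import "C"

import (
	"fmt"
	"os"
	"unsafe"
)

func main() {
	// Routed through x_cgo_setenv (and thus C setenv) because cgo is linked in.
	if err := os.Setenv("CGO_DEMO", "visible-to-C"); err != nil {
		panic(err)
	}

	name := C.CString("CGO_DEMO")
	defer C.free(unsafe.Pointer(name))

	// C's getenv sees the value written by os.Setenv above.
	fmt.Println(C.GoString(C.getenv(name)))
}
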
diff --git a/src/runtime/cgo/sigaction.go b/src/runtime/cgo/sigaction.go
new file mode 100644
index 0000000..076fbc1
--- /dev/null
+++ b/src/runtime/cgo/sigaction.go
@@ -0,0 +1,22 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux,amd64 freebsd,amd64 linux,arm64
+
+package cgo
+
+// Import "unsafe" because we use go:linkname.
+import _ "unsafe"
+
+// When using cgo, call the C library for sigaction, so that we call into
+// any sanitizer interceptors. This supports using the memory
+// sanitizer with Go programs. The memory sanitizer only applies to
+// C/C++ code; this permits that code to see the Go runtime's existing signal
+// handlers when registering new signal handlers for the process.
+
+//go:cgo_import_static x_cgo_sigaction
+//go:linkname x_cgo_sigaction x_cgo_sigaction
+//go:linkname _cgo_sigaction _cgo_sigaction
+var x_cgo_sigaction byte
+var _cgo_sigaction = &x_cgo_sigaction
diff --git a/src/runtime/cgo/signal_ios_arm64.go b/src/runtime/cgo/signal_ios_arm64.go
new file mode 100644
index 0000000..3425c44
--- /dev/null
+++ b/src/runtime/cgo/signal_ios_arm64.go
@@ -0,0 +1,10 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package cgo
+
+import _ "unsafe"
+
+//go:cgo_export_static xx_cgo_panicmem xx_cgo_panicmem
+func xx_cgo_panicmem()
diff --git a/src/runtime/cgo/signal_ios_arm64.s b/src/runtime/cgo/signal_ios_arm64.s
new file mode 100644
index 0000000..1ae00d1
--- /dev/null
+++ b/src/runtime/cgo/signal_ios_arm64.s
@@ -0,0 +1,56 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// xx_cgo_panicmem is the entrypoint for SIGSEGV as intercepted via a
+// mach thread port as EXC_BAD_ACCESS. As the segfault may have happened
+// in C code, we first need to load_g before doing anything else.
+//
+// R1 - LR at moment of fault
+// R2 - PC at moment of fault
+TEXT xx_cgo_panicmem(SB),NOSPLIT|NOFRAME,$0
+ // If in external C code, we need to load the g register.
+ BL runtime·load_g(SB)
+ CMP $0, g
+ BNE ongothread
+
+ // On a foreign thread.
+ // TODO(crawshaw): call badsignal
+ MOVD.W $0, -16(RSP)
+ MOVW $139, R1
+ MOVW R1, 8(RSP)
+ B runtime·exit(SB)
+
+ongothread:
+ // Trigger a SIGSEGV panic.
+ //
+ // The goal is to arrange the stack so it looks like the runtime
+ // function sigpanic was called from the PC that faulted. It has
+ // to be sigpanic, as the stack unwinding code in traceback.go
+ // looks explicitly for it.
+ //
+ // To do this we call into runtime·setsigsegv, which sets the
+ // appropriate state inside the g object. We give it the faulting
+ // PC on the stack, then put it in the LR before calling sigpanic.
+
+ // Build a 32-byte stack frame for us for this call.
+ // Saved LR (none available) is at the bottom,
+ // then the PC argument for setsigsegv,
+ // then a copy of the LR for us to restore.
+ MOVD.W $0, -32(RSP)
+ MOVD R1, 8(RSP)
+ MOVD R2, 16(RSP)
+ BL runtime·setsigsegv(SB)
+ MOVD 8(RSP), R1
+ MOVD 16(RSP), R2
+
+ // Build a 16-byte stack frame for the simulated
+ // call to sigpanic, by taking 16 bytes away from the
+ // 32-byte stack frame above.
+ // The saved LR in this frame is the LR at time of fault,
+ // and the LR on entry to sigpanic is the PC at time of fault.
+ MOVD.W R1, 16(RSP)
+ MOVD R2, R30
+ B runtime·sigpanic(SB)
diff --git a/src/runtime/cgo_mmap.go b/src/runtime/cgo_mmap.go
new file mode 100644
index 0000000..d5e0cc1
--- /dev/null
+++ b/src/runtime/cgo_mmap.go
@@ -0,0 +1,67 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Support for memory sanitizer. See runtime/cgo/mmap.go.
+
+// +build linux,amd64 linux,arm64
+
+package runtime
+
+import "unsafe"
+
+// _cgo_mmap is filled in by runtime/cgo when it is linked into the
+// program, so it is only non-nil when using cgo.
+//go:linkname _cgo_mmap _cgo_mmap
+var _cgo_mmap unsafe.Pointer
+
+// _cgo_munmap is filled in by runtime/cgo when it is linked into the
+// program, so it is only non-nil when using cgo.
+//go:linkname _cgo_munmap _cgo_munmap
+var _cgo_munmap unsafe.Pointer
+
+// mmap is used to route the mmap system call through C code when using cgo, to
+// support sanitizer interceptors. Don't allow stack splits, since this function
+// (used by sysAlloc) is called in a lot of low-level parts of the runtime and
+// callers often assume it won't acquire any locks.
+//go:nosplit
+func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (unsafe.Pointer, int) {
+ if _cgo_mmap != nil {
+ // Make ret a uintptr so that writing to it in the
+ // function literal does not trigger a write barrier.
+ // A write barrier here could break because of the way
+ // that mmap uses the same value both as a pointer and
+ // an errno value.
+ var ret uintptr
+ systemstack(func() {
+ ret = callCgoMmap(addr, n, prot, flags, fd, off)
+ })
+ if ret < 4096 {
+ return nil, int(ret)
+ }
+ return unsafe.Pointer(ret), 0
+ }
+ return sysMmap(addr, n, prot, flags, fd, off)
+}
+
+func munmap(addr unsafe.Pointer, n uintptr) {
+ if _cgo_munmap != nil {
+ systemstack(func() { callCgoMunmap(addr, n) })
+ return
+ }
+ sysMunmap(addr, n)
+}
+
+// sysMmap calls the mmap system call. It is implemented in assembly.
+func sysMmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (p unsafe.Pointer, err int)
+
+// callCgoMmap calls the mmap function in the runtime/cgo package
+// using the GCC calling convention. It is implemented in assembly.
+func callCgoMmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) uintptr
+
+// sysMunmap calls the munmap system call. It is implemented in assembly.
+func sysMunmap(addr unsafe.Pointer, n uintptr)
+
+// callCgoMunmap calls the munmap function in the runtime/cgo package
+// using the GCC calling convention. It is implemented in assembly.
+func callCgoMunmap(addr unsafe.Pointer, n uintptr)
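
The ret < 4096 test in mmap above relies on the fact that the first page of the address space is never mapped, so any value smaller than a page cannot be a valid mapping address and is reinterpreted as an errno. A tiny sketch of that convention (illustrative only; the package and helper names are made up):

package mmapdemo // hypothetical package, not part of the runtime

import "unsafe"

// decodeMmapResult mirrors the convention used by runtime.mmap: addresses
// below one page (4096 bytes) are impossible, so such a return value is an errno.
func decodeMmapResult(ret uintptr) (addr unsafe.Pointer, errno int) {
	if ret < 4096 {
		return nil, int(ret) // e.g. 12 (ENOMEM) when the mapping fails
	}
	return unsafe.Pointer(ret), 0
}
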
diff --git a/src/runtime/cgo_ppc64x.go b/src/runtime/cgo_ppc64x.go
new file mode 100644
index 0000000..fb2da32
--- /dev/null
+++ b/src/runtime/cgo_ppc64x.go
@@ -0,0 +1,12 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64 ppc64le
+
+package runtime
+
+// crosscall_ppc64 calls into the runtime to set up the registers the
+// Go runtime expects and so the symbol it calls needs to be exported
+// for external linking to work.
+//go:cgo_export_static _cgo_reginit
diff --git a/src/runtime/cgo_sigaction.go b/src/runtime/cgo_sigaction.go
new file mode 100644
index 0000000..de634dc
--- /dev/null
+++ b/src/runtime/cgo_sigaction.go
@@ -0,0 +1,87 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Support for memory sanitizer. See runtime/cgo/sigaction.go.
+
+// +build linux,amd64 freebsd,amd64 linux,arm64
+
+package runtime
+
+import "unsafe"
+
+// _cgo_sigaction is filled in by runtime/cgo when it is linked into the
+// program, so it is only non-nil when using cgo.
+//go:linkname _cgo_sigaction _cgo_sigaction
+var _cgo_sigaction unsafe.Pointer
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaction(sig uint32, new, old *sigactiont) {
+ // racewalk.go avoids adding sanitizing instrumentation to package runtime,
+ // but we might be calling into instrumented C functions here,
+ // so we need the pointer parameters to be properly marked.
+ //
+ // Mark the input as having been written before the call
+ // and the output as read after.
+ if msanenabled && new != nil {
+ msanwrite(unsafe.Pointer(new), unsafe.Sizeof(*new))
+ }
+
+ if _cgo_sigaction == nil || inForkedChild {
+ sysSigaction(sig, new, old)
+ } else {
+ // We need to call _cgo_sigaction, which means we need a big enough stack
+ // for C. To complicate matters, we may be in libpreinit (before the
+ // runtime has been initialized) or in an asynchronous signal handler (with
+ // the current thread in transition between goroutines, or with the g0
+ // system stack already in use).
+
+ var ret int32
+
+ var g *g
+ if mainStarted {
+ g = getg()
+ }
+ sp := uintptr(unsafe.Pointer(&sig))
+ switch {
+ case g == nil:
+ // No g: we're on a C stack or a signal stack.
+ ret = callCgoSigaction(uintptr(sig), new, old)
+ case sp < g.stack.lo || sp >= g.stack.hi:
+ // We're no longer on g's stack, so we must be handling a signal. It's
+ // possible that we interrupted the thread during a transition between g
+ // and g0, so we should stay on the current stack to avoid corrupting g0.
+ ret = callCgoSigaction(uintptr(sig), new, old)
+ default:
+ // We're running on g's stack, so either we're not in a signal handler or
+ // the signal handler has set the correct g. If we're on gsignal or g0,
+ // systemstack will make the call directly; otherwise, it will switch to
+ // g0 to ensure we have enough room to call a libc function.
+ //
+ // The function literal that we pass to systemstack is not nosplit, but
+ // that's ok: we'll be running on a fresh, clean system stack so the stack
+ // check will always succeed anyway.
+ systemstack(func() {
+ ret = callCgoSigaction(uintptr(sig), new, old)
+ })
+ }
+
+ const EINVAL = 22
+ if ret == EINVAL {
+ // libc reserves certain signals — normally 32-33 — for pthreads, and
+ // returns EINVAL for sigaction calls on those signals. If we get EINVAL,
+ // fall back to making the syscall directly.
+ sysSigaction(sig, new, old)
+ }
+ }
+
+ if msanenabled && old != nil {
+ msanread(unsafe.Pointer(old), unsafe.Sizeof(*old))
+ }
+}
+
+// callCgoSigaction calls the sigaction function in the runtime/cgo package
+// using the GCC calling convention. It is implemented in assembly.
+//go:noescape
+func callCgoSigaction(sig uintptr, new, old *sigactiont) int32
diff --git a/src/runtime/cgocall.go b/src/runtime/cgocall.go
new file mode 100644
index 0000000..20cacd6
--- /dev/null
+++ b/src/runtime/cgocall.go
@@ -0,0 +1,628 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Cgo call and callback support.
+//
+// To call into the C function f from Go, the cgo-generated code calls
+// runtime.cgocall(_cgo_Cfunc_f, frame), where _cgo_Cfunc_f is a
+// gcc-compiled function written by cgo.
+//
+// runtime.cgocall (below) calls entersyscall so as not to block
+// other goroutines or the garbage collector, and then calls
+// runtime.asmcgocall(_cgo_Cfunc_f, frame).
+//
+// runtime.asmcgocall (in asm_$GOARCH.s) switches to the m->g0 stack
+// (assumed to be an operating system-allocated stack, so safe to run
+// gcc-compiled code on) and calls _cgo_Cfunc_f(frame).
+//
+// _cgo_Cfunc_f invokes the actual C function f with arguments
+// taken from the frame structure, records the results in the frame,
+// and returns to runtime.asmcgocall.
+//
+// After it regains control, runtime.asmcgocall switches back to the
+// original g (m->curg)'s stack and returns to runtime.cgocall.
+//
+// After it regains control, runtime.cgocall calls exitsyscall, which blocks
+// until this m can run Go code without violating the $GOMAXPROCS limit,
+// and then unlocks g from m.
+//
+// The above description skipped over the possibility of the gcc-compiled
+// function f calling back into Go. If that happens, we continue down
+// the rabbit hole during the execution of f.
+//
+// To make it possible for gcc-compiled C code to call a Go function p.GoF,
+// cgo writes a gcc-compiled function named GoF (not p.GoF, since gcc doesn't
+// know about packages). The gcc-compiled C function f calls GoF.
+//
+// GoF initializes "frame", a structure containing all of its
+// arguments and slots for p.GoF's results. It calls
+// crosscall2(_cgoexp_GoF, frame, framesize, ctxt) using the gcc ABI.
+//
+// crosscall2 (in cgo/asm_$GOARCH.s) is a four-argument adapter from
+// the gcc function call ABI to the gc function call ABI. At this
+// point we're in the Go runtime, but we're still running on m.g0's
+// stack and outside the $GOMAXPROCS limit. crosscall2 calls
+// runtime.cgocallback(_cgoexp_GoF, frame, ctxt) using the gc ABI.
+// (crosscall2's framesize argument is no longer used, but there's one
+// case where SWIG calls crosscall2 directly and expects to pass this
+// argument. See _cgo_panic.)
+//
+// runtime.cgocallback (in asm_$GOARCH.s) switches from m.g0's stack
+// to the original g (m.curg)'s stack, on which it calls
+// runtime.cgocallbackg(_cgoexp_GoF, frame, ctxt). As part of the
+// stack switch, runtime.cgocallback saves the current SP as
+// m.g0.sched.sp, so that any use of m.g0's stack during the execution
+// of the callback will be done below the existing stack frames.
+// Before overwriting m.g0.sched.sp, it pushes the old value on the
+// m.g0 stack, so that it can be restored later.
+//
+// runtime.cgocallbackg (below) is now running on a real goroutine
+// stack (not an m.g0 stack). First it calls runtime.exitsyscall, which will
+// block until the $GOMAXPROCS limit allows running this goroutine.
+// Once exitsyscall has returned, it is safe to do things like call the memory
+// allocator or invoke the Go callback function. runtime.cgocallbackg
+// first defers a function to unwind m.g0.sched.sp, so that if p.GoF
+// panics, m.g0.sched.sp will be restored to its old value: the m.g0 stack
+// and the m.curg stack will be unwound in lock step.
+// Then it calls _cgoexp_GoF(frame).
+//
+// _cgoexp_GoF, which was generated by cmd/cgo, unpacks the arguments
+// from frame, calls p.GoF, writes the results back to frame, and
+// returns. Now we start unwinding this whole process.
+//
+// runtime.cgocallbackg pops but does not execute the deferred
+// function to unwind m.g0.sched.sp, calls runtime.entersyscall, and
+// returns to runtime.cgocallback.
+//
+// After it regains control, runtime.cgocallback switches back to
+// m.g0's stack (the pointer is still in m.g0.sched.sp), restores the old
+// m.g0.sched.sp value from the stack, and returns to crosscall2.
+//
+// crosscall2 restores the callee-save registers for gcc and returns
+// to GoF, which unpacks any result values and returns to f.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// Addresses collected in a cgo backtrace when crashing.
+// Length must match arg.Max in x_cgo_callers in runtime/cgo/gcc_traceback.c.
+type cgoCallers [32]uintptr
+
+// argset matches runtime/cgo/linux_syscall.c:argset_t
+type argset struct {
+ args unsafe.Pointer
+ retval uintptr
+}
+
+// wrapper for syscall package to call cgocall for libc (cgo) calls.
+//go:linkname syscall_cgocaller syscall.cgocaller
+//go:nosplit
+//go:uintptrescapes
+func syscall_cgocaller(fn unsafe.Pointer, args ...uintptr) uintptr {
+ as := argset{args: unsafe.Pointer(&args[0])}
+ cgocall(fn, unsafe.Pointer(&as))
+ return as.retval
+}
+
+// Call from Go to C.
+//
+// This must be nosplit because it's used for syscalls on some
+// platforms. Syscalls may have untyped arguments on the stack, so
+// it's not safe to grow or scan the stack.
+//
+//go:nosplit
+func cgocall(fn, arg unsafe.Pointer) int32 {
+ if !iscgo && GOOS != "solaris" && GOOS != "illumos" && GOOS != "windows" {
+ throw("cgocall unavailable")
+ }
+
+ if fn == nil {
+ throw("cgocall nil")
+ }
+
+ if raceenabled {
+ racereleasemerge(unsafe.Pointer(&racecgosync))
+ }
+
+ mp := getg().m
+ mp.ncgocall++
+ mp.ncgo++
+
+ // Reset traceback.
+ mp.cgoCallers[0] = 0
+
+ // Announce we are entering a system call
+ // so that the scheduler knows to create another
+ // M to run goroutines while we are in the
+ // foreign code.
+ //
+ // The call to asmcgocall is guaranteed not to
+ // grow the stack and does not allocate memory,
+ // so it is safe to call while "in a system call", outside
+ // the $GOMAXPROCS accounting.
+ //
+ // fn may call back into Go code, in which case we'll exit the
+ // "system call", run the Go code (which may grow the stack),
+ // and then re-enter the "system call" reusing the PC and SP
+ // saved by entersyscall here.
+ entersyscall()
+
+ // Tell asynchronous preemption that we're entering external
+ // code. We do this after entersyscall because this may block
+ // and cause an async preemption to fail, but at this point a
+ // sync preemption will succeed (though this is not a matter
+ // of correctness).
+ osPreemptExtEnter(mp)
+
+ mp.incgo = true
+ errno := asmcgocall(fn, arg)
+
+ // Update accounting before exitsyscall because exitsyscall may
+ // reschedule us on to a different M.
+ mp.incgo = false
+ mp.ncgo--
+
+ osPreemptExtExit(mp)
+
+ exitsyscall()
+
+ // Note that raceacquire must be called only after exitsyscall has
+ // wired this M to a P.
+ if raceenabled {
+ raceacquire(unsafe.Pointer(&racecgosync))
+ }
+
+ // From the garbage collector's perspective, time can move
+ // backwards in the sequence above. If there's a callback into
+ // Go code, GC will see this function at the call to
+ // asmcgocall. When the Go call later returns to C, the
+ // syscall PC/SP is rolled back and the GC sees this function
+ // back at the call to entersyscall. Normally, fn and arg
+ // would be live at entersyscall and dead at asmcgocall, so if
+ // time moved backwards, GC would see these arguments as dead
+ // and then live. Prevent these undead arguments from crashing
+ // GC by forcing them to stay live across this time warp.
+ KeepAlive(fn)
+ KeepAlive(arg)
+ KeepAlive(mp)
+
+ return errno
+}
+
+// Call from C back to Go.
+//go:nosplit
+func cgocallbackg(fn, frame unsafe.Pointer, ctxt uintptr) {
+ gp := getg()
+ if gp != gp.m.curg {
+ println("runtime: bad g in cgocallback")
+ exit(2)
+ }
+
+ // The call from C is on gp.m's g0 stack, so we must ensure
+ // that we stay on that M. We have to do this before calling
+ // exitsyscall, since it would otherwise be free to move us to
+ // a different M. The call to unlockOSThread is in unwindm.
+ lockOSThread()
+
+ // Save the current syscall parameters, so m.syscall can be
+ // used again if the callback decides to make a syscall.
+ syscall := gp.m.syscall
+
+ // entersyscall saves the caller's SP to allow the GC to trace the Go
+ // stack. However, since we're returning to an earlier stack frame and
+ // need to pair with the entersyscall() call made by cgocall, we must
+ // save syscall* and let reentersyscall restore them.
+ savedsp := unsafe.Pointer(gp.syscallsp)
+ savedpc := gp.syscallpc
+ exitsyscall() // coming out of cgo call
+ gp.m.incgo = false
+
+ osPreemptExtExit(gp.m)
+
+ cgocallbackg1(fn, frame, ctxt)
+
+ // At this point unlockOSThread has been called.
+ // The following code must not change to a different m.
+ // This is enforced by checking incgo in the schedule function.
+
+ osPreemptExtEnter(gp.m)
+
+ gp.m.incgo = true
+ // going back to cgo call
+ reentersyscall(savedpc, uintptr(savedsp))
+
+ gp.m.syscall = syscall
+}
+
+func cgocallbackg1(fn, frame unsafe.Pointer, ctxt uintptr) {
+ gp := getg()
+ if gp.m.needextram || atomic.Load(&extraMWaiters) > 0 {
+ gp.m.needextram = false
+ systemstack(newextram)
+ }
+
+ if ctxt != 0 {
+ s := append(gp.cgoCtxt, ctxt)
+
+ // Now we need to set gp.cgoCtxt = s, but we could get
+ // a SIGPROF signal while manipulating the slice, and
+ // the SIGPROF handler could pick up gp.cgoCtxt while
+ // tracing up the stack. We need to ensure that the
+ // handler always sees a valid slice, so set the
+ // values in an order such that it always does.
+ p := (*slice)(unsafe.Pointer(&gp.cgoCtxt))
+ atomicstorep(unsafe.Pointer(&p.array), unsafe.Pointer(&s[0]))
+ p.cap = cap(s)
+ p.len = len(s)
+
+ defer func(gp *g) {
+ // Decrease the length of the slice by one, safely.
+ p := (*slice)(unsafe.Pointer(&gp.cgoCtxt))
+ p.len--
+ }(gp)
+ }
+
+ if gp.m.ncgo == 0 {
+ // The C call to Go came from a thread not currently running
+ // any Go. In the case of -buildmode=c-archive or c-shared,
+ // this call may be coming in before package initialization
+ // is complete. Wait until it is.
+ <-main_init_done
+ }
+
+ // Add entry to defer stack in case of panic.
+ restore := true
+ defer unwindm(&restore)
+
+ if raceenabled {
+ raceacquire(unsafe.Pointer(&racecgosync))
+ }
+
+ // Invoke callback. This function is generated by cmd/cgo and
+ // will unpack the argument frame and call the Go function.
+ var cb func(frame unsafe.Pointer)
+ cbFV := funcval{uintptr(fn)}
+ *(*unsafe.Pointer)(unsafe.Pointer(&cb)) = noescape(unsafe.Pointer(&cbFV))
+ cb(frame)
+
+ if raceenabled {
+ racereleasemerge(unsafe.Pointer(&racecgosync))
+ }
+
+ // Do not unwind m->g0->sched.sp.
+ // Our caller, cgocallback, will do that.
+ restore = false
+}
+
+func unwindm(restore *bool) {
+ if *restore {
+ // Restore sp saved by cgocallback during
+ // unwind of g's stack (see comment at top of file).
+ mp := acquirem()
+ sched := &mp.g0.sched
+ switch GOARCH {
+ default:
+ throw("unwindm not implemented")
+ case "386", "amd64", "arm", "ppc64", "ppc64le", "mips64", "mips64le", "s390x", "mips", "mipsle", "riscv64":
+ sched.sp = *(*uintptr)(unsafe.Pointer(sched.sp + sys.MinFrameSize))
+ case "arm64":
+ sched.sp = *(*uintptr)(unsafe.Pointer(sched.sp + 16))
+ }
+
+ // Do the accounting that cgocall will not have a chance to do
+ // during an unwind.
+ //
+ // In the case where a Go call originates from C, ncgo is 0
+ // and there is no matching cgocall to end.
+ if mp.ncgo > 0 {
+ mp.incgo = false
+ mp.ncgo--
+ osPreemptExtExit(mp)
+ }
+
+ releasem(mp)
+ }
+
+ // Undo the call to lockOSThread in cgocallbackg.
+ // We must still stay on the same m.
+ unlockOSThread()
+}
+
+// called from assembly
+func badcgocallback() {
+ throw("misaligned stack in cgocallback")
+}
+
+// called from (incomplete) assembly
+func cgounimpl() {
+ throw("cgo not implemented")
+}
+
+var racecgosync uint64 // represents possible synchronization in C code
+
+// Pointer checking for cgo code.
+
+// We want to detect all cases where a program that does not use
+// unsafe makes a cgo call passing a Go pointer to memory that
+// contains a Go pointer. Here a Go pointer is defined as a pointer
+// to memory allocated by the Go runtime. Programs that use unsafe
+// can evade this restriction easily, so we don't try to catch them.
+// The cgo program will rewrite all possibly bad pointer arguments to
+// call cgoCheckPointer, where we can catch cases of a Go pointer
+// pointing to a Go pointer.
+
+// Complicating matters, taking the address of a slice or array
+// element permits the C program to access all elements of the slice
+// or array. In that case we will see a pointer to a single element,
+// but we need to check the entire data structure.
+
+// The cgoCheckPointer call takes additional arguments indicating that
+// it was called on an address expression. An additional argument of
+// true means that it only needs to check a single element. An
+// additional argument of a slice or array means that it needs to
+// check the entire slice/array, but nothing else. Otherwise, the
+// pointer could be anything, and we check the entire heap object,
+// which is conservative but safe.
+
+// When and if we implement a moving garbage collector,
+// cgoCheckPointer will pin the pointer for the duration of the cgo
+// call. (This is necessary but not sufficient; the cgo program will
+// also have to change to pin Go pointers that cannot point to Go
+// pointers.)
+
+// cgoCheckPointer checks if the argument contains a Go pointer that
+// points to a Go pointer, and panics if it does.
+func cgoCheckPointer(ptr interface{}, arg interface{}) {
+ if debug.cgocheck == 0 {
+ return
+ }
+
+ ep := efaceOf(&ptr)
+ t := ep._type
+
+ top := true
+ if arg != nil && (t.kind&kindMask == kindPtr || t.kind&kindMask == kindUnsafePointer) {
+ p := ep.data
+ if t.kind&kindDirectIface == 0 {
+ p = *(*unsafe.Pointer)(p)
+ }
+ if p == nil || !cgoIsGoPointer(p) {
+ return
+ }
+ aep := efaceOf(&arg)
+ switch aep._type.kind & kindMask {
+ case kindBool:
+ if t.kind&kindMask == kindUnsafePointer {
+ // We don't know the type of the element.
+ break
+ }
+ pt := (*ptrtype)(unsafe.Pointer(t))
+ cgoCheckArg(pt.elem, p, true, false, cgoCheckPointerFail)
+ return
+ case kindSlice:
+ // Check the slice rather than the pointer.
+ ep = aep
+ t = ep._type
+ case kindArray:
+ // Check the array rather than the pointer.
+ // Pass top as false since we have a pointer
+ // to the array.
+ ep = aep
+ t = ep._type
+ top = false
+ default:
+ throw("can't happen")
+ }
+ }
+
+ cgoCheckArg(t, ep.data, t.kind&kindDirectIface == 0, top, cgoCheckPointerFail)
+}
+
+const cgoCheckPointerFail = "cgo argument has Go pointer to Go pointer"
+const cgoResultFail = "cgo result has Go pointer"
+
+// cgoCheckArg is the real work of cgoCheckPointer. The argument p
+// is either a pointer to the value (of type t), or the value itself,
+// depending on indir. The top parameter is whether we are at the top
+// level, where Go pointers are allowed.
+func cgoCheckArg(t *_type, p unsafe.Pointer, indir, top bool, msg string) {
+ if t.ptrdata == 0 || p == nil {
+ // If the type has no pointers there is nothing to do.
+ return
+ }
+
+ switch t.kind & kindMask {
+ default:
+ throw("can't happen")
+ case kindArray:
+ at := (*arraytype)(unsafe.Pointer(t))
+ if !indir {
+ if at.len != 1 {
+ throw("can't happen")
+ }
+ cgoCheckArg(at.elem, p, at.elem.kind&kindDirectIface == 0, top, msg)
+ return
+ }
+ for i := uintptr(0); i < at.len; i++ {
+ cgoCheckArg(at.elem, p, true, top, msg)
+ p = add(p, at.elem.size)
+ }
+ case kindChan, kindMap:
+ // These types contain internal pointers that will
+ // always be allocated in the Go heap. It's never OK
+ // to pass them to C.
+ panic(errorString(msg))
+ case kindFunc:
+ if indir {
+ p = *(*unsafe.Pointer)(p)
+ }
+ if !cgoIsGoPointer(p) {
+ return
+ }
+ panic(errorString(msg))
+ case kindInterface:
+ it := *(**_type)(p)
+ if it == nil {
+ return
+ }
+ // A type known at compile time is OK since it's
+ // constant. A type not known at compile time will be
+ // in the heap and will not be OK.
+ if inheap(uintptr(unsafe.Pointer(it))) {
+ panic(errorString(msg))
+ }
+ p = *(*unsafe.Pointer)(add(p, sys.PtrSize))
+ if !cgoIsGoPointer(p) {
+ return
+ }
+ if !top {
+ panic(errorString(msg))
+ }
+ cgoCheckArg(it, p, it.kind&kindDirectIface == 0, false, msg)
+ case kindSlice:
+ st := (*slicetype)(unsafe.Pointer(t))
+ s := (*slice)(p)
+ p = s.array
+ if p == nil || !cgoIsGoPointer(p) {
+ return
+ }
+ if !top {
+ panic(errorString(msg))
+ }
+ if st.elem.ptrdata == 0 {
+ return
+ }
+ for i := 0; i < s.cap; i++ {
+ cgoCheckArg(st.elem, p, true, false, msg)
+ p = add(p, st.elem.size)
+ }
+ case kindString:
+ ss := (*stringStruct)(p)
+ if !cgoIsGoPointer(ss.str) {
+ return
+ }
+ if !top {
+ panic(errorString(msg))
+ }
+ case kindStruct:
+ st := (*structtype)(unsafe.Pointer(t))
+ if !indir {
+ if len(st.fields) != 1 {
+ throw("can't happen")
+ }
+ cgoCheckArg(st.fields[0].typ, p, st.fields[0].typ.kind&kindDirectIface == 0, top, msg)
+ return
+ }
+ for _, f := range st.fields {
+ if f.typ.ptrdata == 0 {
+ continue
+ }
+ cgoCheckArg(f.typ, add(p, f.offset()), true, top, msg)
+ }
+ case kindPtr, kindUnsafePointer:
+ if indir {
+ p = *(*unsafe.Pointer)(p)
+ if p == nil {
+ return
+ }
+ }
+
+ if !cgoIsGoPointer(p) {
+ return
+ }
+ if !top {
+ panic(errorString(msg))
+ }
+
+ cgoCheckUnknownPointer(p, msg)
+ }
+}
+
+// cgoCheckUnknownPointer is called for an arbitrary pointer into Go
+// memory. It checks whether that Go memory contains any other
+// pointer into Go memory. If it does, we panic.
+// The return values are unused but useful to see in panic tracebacks.
+func cgoCheckUnknownPointer(p unsafe.Pointer, msg string) (base, i uintptr) {
+ if inheap(uintptr(p)) {
+ b, span, _ := findObject(uintptr(p), 0, 0)
+ base = b
+ if base == 0 {
+ return
+ }
+ hbits := heapBitsForAddr(base)
+ n := span.elemsize
+ for i = uintptr(0); i < n; i += sys.PtrSize {
+ if !hbits.morePointers() {
+ // No more possible pointers.
+ break
+ }
+ if hbits.isPointer() && cgoIsGoPointer(*(*unsafe.Pointer)(unsafe.Pointer(base + i))) {
+ panic(errorString(msg))
+ }
+ hbits = hbits.next()
+ }
+
+ return
+ }
+
+ for _, datap := range activeModules() {
+ if cgoInRange(p, datap.data, datap.edata) || cgoInRange(p, datap.bss, datap.ebss) {
+ // We have no way to know the size of the object.
+ // We have to assume that it might contain a pointer.
+ panic(errorString(msg))
+ }
+ // In the text or noptr sections, we know that the
+ // pointer does not point to a Go pointer.
+ }
+
+ return
+}
+
+// cgoIsGoPointer reports whether the pointer is a Go pointer--a
+// pointer to Go memory. We only care about Go memory that might
+// contain pointers.
+//go:nosplit
+//go:nowritebarrierrec
+func cgoIsGoPointer(p unsafe.Pointer) bool {
+ if p == nil {
+ return false
+ }
+
+ if inHeapOrStack(uintptr(p)) {
+ return true
+ }
+
+ for _, datap := range activeModules() {
+ if cgoInRange(p, datap.data, datap.edata) || cgoInRange(p, datap.bss, datap.ebss) {
+ return true
+ }
+ }
+
+ return false
+}
+
+// cgoInRange reports whether p is between start and end.
+//go:nosplit
+//go:nowritebarrierrec
+func cgoInRange(p unsafe.Pointer, start, end uintptr) bool {
+ return start <= uintptr(p) && uintptr(p) < end
+}
+
+// cgoCheckResult is called to check the result parameter of an
+// exported Go function. It panics if the result is or contains a Go
+// pointer.
+func cgoCheckResult(val interface{}) {
+ if debug.cgocheck == 0 {
+ return
+ }
+
+ ep := efaceOf(&val)
+ t := ep._type
+ cgoCheckArg(t, ep.data, t.kind&kindDirectIface == 0, false, cgoResultFail)
+}
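
The rule cgoCheckPointer enforces is the one documented for cgo: a Go pointer may be passed to C only if the Go memory it points to contains no Go pointers. A minimal program that trips the check at run time is sketched below (the C helper take is hypothetical; the check runs with the default GODEBUG=cgocheck=1):

package main

/*
static void take(void *p) { (void)p; }
*/
import "C"

import "unsafe"

type node struct {
	next *node // a Go pointer stored inside Go memory
}

func main() {
	leaf := &node{}
	bad := &node{next: leaf}

	// Panics with "cgo argument has Go pointer to Go pointer":
	// *bad is Go memory and it contains the Go pointer bad.next.
	C.take(unsafe.Pointer(bad))
}
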
diff --git a/src/runtime/cgocallback.go b/src/runtime/cgocallback.go
new file mode 100644
index 0000000..59953f1
--- /dev/null
+++ b/src/runtime/cgocallback.go
@@ -0,0 +1,13 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// These functions are called from C code via cgo/callbacks.go.
+
+// Panic.
+
+func _cgo_panic_internal(p *byte) {
+ panic(gostringnocopy(p))
+}
diff --git a/src/runtime/cgocheck.go b/src/runtime/cgocheck.go
new file mode 100644
index 0000000..516045c
--- /dev/null
+++ b/src/runtime/cgocheck.go
@@ -0,0 +1,263 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Code to check that pointer writes follow the cgo rules.
+// These functions are invoked via the write barrier when debug.cgocheck > 1.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const cgoWriteBarrierFail = "Go pointer stored into non-Go memory"
+
+// cgoCheckWriteBarrier is called whenever a pointer is stored into memory.
+// It throws if the program is storing a Go pointer into non-Go memory.
+//
+// This is called from the write barrier, so its entire call tree must
+// be nosplit.
+//
+//go:nosplit
+//go:nowritebarrier
+func cgoCheckWriteBarrier(dst *uintptr, src uintptr) {
+ if !cgoIsGoPointer(unsafe.Pointer(src)) {
+ return
+ }
+ if cgoIsGoPointer(unsafe.Pointer(dst)) {
+ return
+ }
+
+ // If we are running on the system stack then dst might be an
+ // address on the stack, which is OK.
+ g := getg()
+ if g == g.m.g0 || g == g.m.gsignal {
+ return
+ }
+
+ // Allocating memory can write to various mfixalloc structs
+ // that look like they are non-Go memory.
+ if g.m.mallocing != 0 {
+ return
+ }
+
+ // It's OK to write to memory allocated by persistentalloc.
+ // Do this check last because it is more expensive and rarely true.
+ // If it is false the expense doesn't matter since we are crashing.
+ if inPersistentAlloc(uintptr(unsafe.Pointer(dst))) {
+ return
+ }
+
+ systemstack(func() {
+ println("write of Go pointer", hex(src), "to non-Go memory", hex(uintptr(unsafe.Pointer(dst))))
+ throw(cgoWriteBarrierFail)
+ })
+}
+
+// cgoCheckMemmove is called when moving a block of memory.
+// dst and src point off bytes into the value to copy.
+// size is the number of bytes to copy.
+// It throws if the program is copying a block that contains a Go pointer
+// into non-Go memory.
+//go:nosplit
+//go:nowritebarrier
+func cgoCheckMemmove(typ *_type, dst, src unsafe.Pointer, off, size uintptr) {
+ if typ.ptrdata == 0 {
+ return
+ }
+ if !cgoIsGoPointer(src) {
+ return
+ }
+ if cgoIsGoPointer(dst) {
+ return
+ }
+ cgoCheckTypedBlock(typ, src, off, size)
+}
+
+// cgoCheckSliceCopy is called when copying n elements of a slice.
+// src and dst are pointers to the first element of the slice.
+// typ is the element type of the slice.
+// It throws if the program is copying slice elements that contain Go pointers
+// into non-Go memory.
+//go:nosplit
+//go:nowritebarrier
+func cgoCheckSliceCopy(typ *_type, dst, src unsafe.Pointer, n int) {
+ if typ.ptrdata == 0 {
+ return
+ }
+ if !cgoIsGoPointer(src) {
+ return
+ }
+ if cgoIsGoPointer(dst) {
+ return
+ }
+ p := src
+ for i := 0; i < n; i++ {
+ cgoCheckTypedBlock(typ, p, 0, typ.size)
+ p = add(p, typ.size)
+ }
+}
+
+// cgoCheckTypedBlock checks the block of memory at src, for up to size bytes,
+// and throws if it finds a Go pointer. The type of the memory is typ,
+// and src is off bytes into that type.
+//go:nosplit
+//go:nowritebarrier
+func cgoCheckTypedBlock(typ *_type, src unsafe.Pointer, off, size uintptr) {
+ // Anything past typ.ptrdata is not a pointer.
+ if typ.ptrdata <= off {
+ return
+ }
+ if ptrdataSize := typ.ptrdata - off; size > ptrdataSize {
+ size = ptrdataSize
+ }
+
+ if typ.kind&kindGCProg == 0 {
+ cgoCheckBits(src, typ.gcdata, off, size)
+ return
+ }
+
+ // The type has a GC program. Try to find GC bits somewhere else.
+ for _, datap := range activeModules() {
+ if cgoInRange(src, datap.data, datap.edata) {
+ doff := uintptr(src) - datap.data
+ cgoCheckBits(add(src, -doff), datap.gcdatamask.bytedata, off+doff, size)
+ return
+ }
+ if cgoInRange(src, datap.bss, datap.ebss) {
+ boff := uintptr(src) - datap.bss
+ cgoCheckBits(add(src, -boff), datap.gcbssmask.bytedata, off+boff, size)
+ return
+ }
+ }
+
+ s := spanOfUnchecked(uintptr(src))
+ if s.state.get() == mSpanManual {
+ // There are no heap bits for values stored on the stack.
+ // For a channel receive src might be on the stack of some
+ // other goroutine, so we can't unwind the stack even if
+ // we wanted to.
+ // We can't expand the GC program without extra storage
+ // space we can't easily get.
+ // Fortunately we have the type information.
+ systemstack(func() {
+ cgoCheckUsingType(typ, src, off, size)
+ })
+ return
+ }
+
+ // src must be in the regular heap.
+
+ hbits := heapBitsForAddr(uintptr(src))
+ for i := uintptr(0); i < off+size; i += sys.PtrSize {
+ bits := hbits.bits()
+ if i >= off && bits&bitPointer != 0 {
+ v := *(*unsafe.Pointer)(add(src, i))
+ if cgoIsGoPointer(v) {
+ throw(cgoWriteBarrierFail)
+ }
+ }
+ hbits = hbits.next()
+ }
+}
+
+// cgoCheckBits checks the block of memory at src, for up to size
+// bytes, and throws if it finds a Go pointer. The gcbits mark each
+// pointer value. The src pointer is off bytes into the gcbits.
+//go:nosplit
+//go:nowritebarrier
+func cgoCheckBits(src unsafe.Pointer, gcbits *byte, off, size uintptr) {
+ skipMask := off / sys.PtrSize / 8
+ skipBytes := skipMask * sys.PtrSize * 8
+ ptrmask := addb(gcbits, skipMask)
+ src = add(src, skipBytes)
+ off -= skipBytes
+ size += off
+ var bits uint32
+ for i := uintptr(0); i < size; i += sys.PtrSize {
+ if i&(sys.PtrSize*8-1) == 0 {
+ bits = uint32(*ptrmask)
+ ptrmask = addb(ptrmask, 1)
+ } else {
+ bits >>= 1
+ }
+ if off > 0 {
+ off -= sys.PtrSize
+ } else {
+ if bits&1 != 0 {
+ v := *(*unsafe.Pointer)(add(src, i))
+ if cgoIsGoPointer(v) {
+ throw(cgoWriteBarrierFail)
+ }
+ }
+ }
+ }
+}
+
+// cgoCheckUsingType is like cgoCheckTypedBlock, but is a last-ditch
+// fallback that looks for pointers in src using the type information.
+// We only use this when looking at a value on the stack when the type
+// uses a GC program, because otherwise it's more efficient to use the
+// GC bits. This is called on the system stack.
+//go:nowritebarrier
+//go:systemstack
+func cgoCheckUsingType(typ *_type, src unsafe.Pointer, off, size uintptr) {
+ if typ.ptrdata == 0 {
+ return
+ }
+
+ // Anything past typ.ptrdata is not a pointer.
+ if typ.ptrdata <= off {
+ return
+ }
+ if ptrdataSize := typ.ptrdata - off; size > ptrdataSize {
+ size = ptrdataSize
+ }
+
+ if typ.kind&kindGCProg == 0 {
+ cgoCheckBits(src, typ.gcdata, off, size)
+ return
+ }
+ switch typ.kind & kindMask {
+ default:
+ throw("can't happen")
+ case kindArray:
+ at := (*arraytype)(unsafe.Pointer(typ))
+ for i := uintptr(0); i < at.len; i++ {
+ if off < at.elem.size {
+ cgoCheckUsingType(at.elem, src, off, size)
+ }
+ src = add(src, at.elem.size)
+ skipped := off
+ if skipped > at.elem.size {
+ skipped = at.elem.size
+ }
+ checked := at.elem.size - skipped
+ off -= skipped
+ if size <= checked {
+ return
+ }
+ size -= checked
+ }
+ case kindStruct:
+ st := (*structtype)(unsafe.Pointer(typ))
+ for _, f := range st.fields {
+ if off < f.typ.size {
+ cgoCheckUsingType(f.typ, src, off, size)
+ }
+ src = add(src, f.typ.size)
+ skipped := off
+ if skipped > f.typ.size {
+ skipped = f.typ.size
+ }
+ checked := f.typ.size - skipped
+ off -= skipped
+ if size <= checked {
+ return
+ }
+ size -= checked
+ }
+ }
+}
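
An illustrative sketch (not code from this patch) of what these helpers exist to catch: with cgo in use and the binary run under GODEBUG=cgocheck=2, pointer writes are routed through the cgo write-barrier checks, and storing a Go pointer into C-allocated memory is reported as a fatal error ("Go pointer stored into non-Go memory"). The program below is an assumption-laden example, not part of the runtime.

package main

/*
#include <stdlib.h>
*/
import "C"

import "unsafe"

func main() {
    // C.malloc returns memory outside the Go heap. Storing a pointer to
    // Go-allocated memory into it breaks the cgo pointer rules; under
    // GODEBUG=cgocheck=2 the write barrier hands the store to the cgoCheck*
    // helpers, which throw "Go pointer stored into non-Go memory".
    p := (*unsafe.Pointer)(C.malloc(8))
    *p = unsafe.Pointer(new(int)) // fatal error under cgocheck=2
    C.free(unsafe.Pointer(p))
}
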
diff --git a/src/runtime/chan.go b/src/runtime/chan.go
new file mode 100644
index 0000000..ba56e2c
--- /dev/null
+++ b/src/runtime/chan.go
@@ -0,0 +1,869 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// This file contains the implementation of Go channels.
+
+// Invariants:
+// At least one of c.sendq and c.recvq is empty,
+// except for the case of an unbuffered channel with a single goroutine
+// blocked on it for both sending and receiving using a select statement,
+// in which case the length of c.sendq and c.recvq is limited only by the
+// size of the select statement.
+//
+// For buffered channels, also:
+// c.qcount > 0 implies that c.recvq is empty.
+// c.qcount < c.dataqsiz implies that c.sendq is empty.
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/math"
+ "unsafe"
+)
+
+const (
+ maxAlign = 8
+ hchanSize = unsafe.Sizeof(hchan{}) + uintptr(-int(unsafe.Sizeof(hchan{}))&(maxAlign-1))
+ debugChan = false
+)
+
+type hchan struct {
+ qcount uint // total data in the queue
+ dataqsiz uint // size of the circular queue
+ buf unsafe.Pointer // points to an array of dataqsiz elements
+ elemsize uint16
+ closed uint32
+ elemtype *_type // element type
+ sendx uint // send index
+ recvx uint // receive index
+ recvq waitq // list of recv waiters
+ sendq waitq // list of send waiters
+
+ // lock protects all fields in hchan, as well as several
+ // fields in sudogs blocked on this channel.
+ //
+ // Do not change another G's status while holding this lock
+ // (in particular, do not ready a G), as this can deadlock
+ // with stack shrinking.
+ lock mutex
+}
+
+type waitq struct {
+ first *sudog
+ last *sudog
+}
+
+//go:linkname reflect_makechan reflect.makechan
+func reflect_makechan(t *chantype, size int) *hchan {
+ return makechan(t, size)
+}
+
+func makechan64(t *chantype, size int64) *hchan {
+ if int64(int(size)) != size {
+ panic(plainError("makechan: size out of range"))
+ }
+
+ return makechan(t, int(size))
+}
+
+func makechan(t *chantype, size int) *hchan {
+ elem := t.elem
+
+ // compiler checks this but be safe.
+ if elem.size >= 1<<16 {
+ throw("makechan: invalid channel element type")
+ }
+ if hchanSize%maxAlign != 0 || elem.align > maxAlign {
+ throw("makechan: bad alignment")
+ }
+
+ mem, overflow := math.MulUintptr(elem.size, uintptr(size))
+ if overflow || mem > maxAlloc-hchanSize || size < 0 {
+ panic(plainError("makechan: size out of range"))
+ }
+
+ // Hchan does not contain pointers interesting for GC when elements stored in buf do not contain pointers.
+ // buf points into the same allocation, elemtype is persistent.
+ // SudoG's are referenced from their owning thread so they can't be collected.
+ // TODO(dvyukov,rlh): Rethink when collector can move allocated objects.
+ var c *hchan
+ switch {
+ case mem == 0:
+ // Queue or element size is zero.
+ c = (*hchan)(mallocgc(hchanSize, nil, true))
+ // Race detector uses this location for synchronization.
+ c.buf = c.raceaddr()
+ case elem.ptrdata == 0:
+ // Elements do not contain pointers.
+ // Allocate hchan and buf in one call.
+ c = (*hchan)(mallocgc(hchanSize+mem, nil, true))
+ c.buf = add(unsafe.Pointer(c), hchanSize)
+ default:
+ // Elements contain pointers.
+ c = new(hchan)
+ c.buf = mallocgc(mem, elem, true)
+ }
+
+ c.elemsize = uint16(elem.size)
+ c.elemtype = elem
+ c.dataqsiz = uint(size)
+ lockInit(&c.lock, lockRankHchan)
+
+ if debugChan {
+ print("makechan: chan=", c, "; elemsize=", elem.size, "; dataqsiz=", size, "\n")
+ }
+ return c
+}
+
+// chanbuf(c, i) is a pointer to the i'th slot in the buffer.
+func chanbuf(c *hchan, i uint) unsafe.Pointer {
+ return add(c.buf, uintptr(i)*uintptr(c.elemsize))
+}
+
+// full reports whether a send on c would block (that is, the channel is full).
+// It uses a single word-sized read of mutable state, so although
+// the answer is instantaneously true, the correct answer may have changed
+// by the time the calling function receives the return value.
+func full(c *hchan) bool {
+ // c.dataqsiz is immutable (never written after the channel is created)
+ // so it is safe to read at any time during channel operation.
+ if c.dataqsiz == 0 {
+ // Assumes that a pointer read is relaxed-atomic.
+ return c.recvq.first == nil
+ }
+ // Assumes that a uint read is relaxed-atomic.
+ return c.qcount == c.dataqsiz
+}
+
+// entry point for c <- x from compiled code
+//go:nosplit
+func chansend1(c *hchan, elem unsafe.Pointer) {
+ chansend(c, elem, true, getcallerpc())
+}
+
+/*
+ * generic single channel send/recv
+ * If block is false,
+ * then the protocol will not
+ * sleep but return if it could
+ * not complete.
+ *
+ * sleep can wake up with g.param == nil
+ * when a channel involved in the sleep has
+ * been closed. it is easiest to loop and re-run
+ * the operation; we'll see that it's now closed.
+ */
+func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
+ if c == nil {
+ if !block {
+ return false
+ }
+ gopark(nil, nil, waitReasonChanSendNilChan, traceEvGoStop, 2)
+ throw("unreachable")
+ }
+
+ if debugChan {
+ print("chansend: chan=", c, "\n")
+ }
+
+ if raceenabled {
+ racereadpc(c.raceaddr(), callerpc, funcPC(chansend))
+ }
+
+ // Fast path: check for failed non-blocking operation without acquiring the lock.
+ //
+ // After observing that the channel is not closed, we observe that the channel is
+ // not ready for sending. Each of these observations is a single word-sized read
+ // (first c.closed and second full()).
+ // Because a closed channel cannot transition from 'ready for sending' to
+ // 'not ready for sending', even if the channel is closed between the two observations,
+ // they imply a moment between the two when the channel was both not yet closed
+ // and not ready for sending. We behave as if we observed the channel at that moment,
+ // and report that the send cannot proceed.
+ //
+ // It is okay if the reads are reordered here: if we observe that the channel is not
+ // ready for sending and then observe that it is not closed, that implies that the
+ // channel wasn't closed during the first observation. However, nothing here
+ // guarantees forward progress. We rely on the side effects of lock release in
+ // chanrecv() and closechan() to update this thread's view of c.closed and full().
+ if !block && c.closed == 0 && full(c) {
+ return false
+ }
+
+ var t0 int64
+ if blockprofilerate > 0 {
+ t0 = cputicks()
+ }
+
+ lock(&c.lock)
+
+ if c.closed != 0 {
+ unlock(&c.lock)
+ panic(plainError("send on closed channel"))
+ }
+
+ if sg := c.recvq.dequeue(); sg != nil {
+ // Found a waiting receiver. We pass the value we want to send
+ // directly to the receiver, bypassing the channel buffer (if any).
+ send(c, sg, ep, func() { unlock(&c.lock) }, 3)
+ return true
+ }
+
+ if c.qcount < c.dataqsiz {
+ // Space is available in the channel buffer. Enqueue the element to send.
+ qp := chanbuf(c, c.sendx)
+ if raceenabled {
+ racenotify(c, c.sendx, nil)
+ }
+ typedmemmove(c.elemtype, qp, ep)
+ c.sendx++
+ if c.sendx == c.dataqsiz {
+ c.sendx = 0
+ }
+ c.qcount++
+ unlock(&c.lock)
+ return true
+ }
+
+ if !block {
+ unlock(&c.lock)
+ return false
+ }
+
+ // Block on the channel. Some receiver will complete our operation for us.
+ gp := getg()
+ mysg := acquireSudog()
+ mysg.releasetime = 0
+ if t0 != 0 {
+ mysg.releasetime = -1
+ }
+ // No stack splits between assigning elem and enqueuing mysg
+ // on gp.waiting where copystack can find it.
+ mysg.elem = ep
+ mysg.waitlink = nil
+ mysg.g = gp
+ mysg.isSelect = false
+ mysg.c = c
+ gp.waiting = mysg
+ gp.param = nil
+ c.sendq.enqueue(mysg)
+ // Signal to anyone trying to shrink our stack that we're about
+ // to park on a channel. The window between when this G's status
+ // changes and when we set gp.activeStackChans is not safe for
+ // stack shrinking.
+ atomic.Store8(&gp.parkingOnChan, 1)
+ gopark(chanparkcommit, unsafe.Pointer(&c.lock), waitReasonChanSend, traceEvGoBlockSend, 2)
+ // Ensure the value being sent is kept alive until the
+ // receiver copies it out. The sudog has a pointer to the
+ // stack object, but sudogs aren't considered roots by the
+ // stack tracer.
+ KeepAlive(ep)
+
+ // someone woke us up.
+ if mysg != gp.waiting {
+ throw("G waiting list is corrupted")
+ }
+ gp.waiting = nil
+ gp.activeStackChans = false
+ closed := !mysg.success
+ gp.param = nil
+ if mysg.releasetime > 0 {
+ blockevent(mysg.releasetime-t0, 2)
+ }
+ mysg.c = nil
+ releaseSudog(mysg)
+ if closed {
+ if c.closed == 0 {
+ throw("chansend: spurious wakeup")
+ }
+ panic(plainError("send on closed channel"))
+ }
+ return true
+}
+
+// send processes a send operation on an empty channel c.
+// The value ep sent by the sender is copied to the receiver sg.
+// The receiver is then woken up to go on its merry way.
+// Channel c must be empty and locked. send unlocks c with unlockf.
+// sg must already be dequeued from c.
+// ep must be non-nil and point to the heap or the caller's stack.
+func send(c *hchan, sg *sudog, ep unsafe.Pointer, unlockf func(), skip int) {
+ if raceenabled {
+ if c.dataqsiz == 0 {
+ racesync(c, sg)
+ } else {
+ // Pretend we go through the buffer, even though
+ // we copy directly. Note that we need to increment
+ // the head/tail locations only when raceenabled.
+ racenotify(c, c.recvx, nil)
+ racenotify(c, c.recvx, sg)
+ c.recvx++
+ if c.recvx == c.dataqsiz {
+ c.recvx = 0
+ }
+ c.sendx = c.recvx // c.sendx = (c.sendx+1) % c.dataqsiz
+ }
+ }
+ if sg.elem != nil {
+ sendDirect(c.elemtype, sg, ep)
+ sg.elem = nil
+ }
+ gp := sg.g
+ unlockf()
+ gp.param = unsafe.Pointer(sg)
+ sg.success = true
+ if sg.releasetime != 0 {
+ sg.releasetime = cputicks()
+ }
+ goready(gp, skip+1)
+}
+
+// Sends and receives on unbuffered or empty-buffered channels are the
+// only operations where one running goroutine writes to the stack of
+// another running goroutine. The GC assumes that stack writes only
+// happen when the goroutine is running and are only done by that
+// goroutine. Using a write barrier is sufficient to make up for
+// violating that assumption, but the write barrier has to work.
+// typedmemmove will call bulkBarrierPreWrite, but the target bytes
+// are not in the heap, so that will not help. We arrange to call
+// memmove and typeBitsBulkBarrier instead.
+
+func sendDirect(t *_type, sg *sudog, src unsafe.Pointer) {
+ // src is on our stack, dst is a slot on another stack.
+
+ // Once we read sg.elem out of sg, it will no longer
+ // be updated if the destination's stack gets copied (shrunk).
+ // So make sure that no preemption points can happen between read & use.
+ dst := sg.elem
+ typeBitsBulkBarrier(t, uintptr(dst), uintptr(src), t.size)
+ // No need for cgo write barrier checks because dst is always
+ // Go memory.
+ memmove(dst, src, t.size)
+}
+
+func recvDirect(t *_type, sg *sudog, dst unsafe.Pointer) {
+ // dst is on our stack or the heap, src is on another stack.
+ // The channel is locked, so src will not move during this
+ // operation.
+ src := sg.elem
+ typeBitsBulkBarrier(t, uintptr(dst), uintptr(src), t.size)
+ memmove(dst, src, t.size)
+}
+
+func closechan(c *hchan) {
+ if c == nil {
+ panic(plainError("close of nil channel"))
+ }
+
+ lock(&c.lock)
+ if c.closed != 0 {
+ unlock(&c.lock)
+ panic(plainError("close of closed channel"))
+ }
+
+ if raceenabled {
+ callerpc := getcallerpc()
+ racewritepc(c.raceaddr(), callerpc, funcPC(closechan))
+ racerelease(c.raceaddr())
+ }
+
+ c.closed = 1
+
+ var glist gList
+
+ // release all readers
+ for {
+ sg := c.recvq.dequeue()
+ if sg == nil {
+ break
+ }
+ if sg.elem != nil {
+ typedmemclr(c.elemtype, sg.elem)
+ sg.elem = nil
+ }
+ if sg.releasetime != 0 {
+ sg.releasetime = cputicks()
+ }
+ gp := sg.g
+ gp.param = unsafe.Pointer(sg)
+ sg.success = false
+ if raceenabled {
+ raceacquireg(gp, c.raceaddr())
+ }
+ glist.push(gp)
+ }
+
+ // release all writers (they will panic)
+ for {
+ sg := c.sendq.dequeue()
+ if sg == nil {
+ break
+ }
+ sg.elem = nil
+ if sg.releasetime != 0 {
+ sg.releasetime = cputicks()
+ }
+ gp := sg.g
+ gp.param = unsafe.Pointer(sg)
+ sg.success = false
+ if raceenabled {
+ raceacquireg(gp, c.raceaddr())
+ }
+ glist.push(gp)
+ }
+ unlock(&c.lock)
+
+ // Ready all Gs now that we've dropped the channel lock.
+ for !glist.empty() {
+ gp := glist.pop()
+ gp.schedlink = 0
+ goready(gp, 3)
+ }
+}
+
+// empty reports whether a read from c would block (that is, the channel is
+// empty). It uses a single atomic read of mutable state.
+func empty(c *hchan) bool {
+ // c.dataqsiz is immutable.
+ if c.dataqsiz == 0 {
+ return atomic.Loadp(unsafe.Pointer(&c.sendq.first)) == nil
+ }
+ return atomic.Loaduint(&c.qcount) == 0
+}
+
+// entry points for <- c from compiled code
+//go:nosplit
+func chanrecv1(c *hchan, elem unsafe.Pointer) {
+ chanrecv(c, elem, true)
+}
+
+//go:nosplit
+func chanrecv2(c *hchan, elem unsafe.Pointer) (received bool) {
+ _, received = chanrecv(c, elem, true)
+ return
+}
+
+// chanrecv receives on channel c and writes the received data to ep.
+// ep may be nil, in which case received data is ignored.
+// If block == false and no elements are available, returns (false, false).
+// Otherwise, if c is closed, zeros *ep and returns (true, false).
+// Otherwise, fills in *ep with an element and returns (true, true).
+// A non-nil ep must point to the heap or the caller's stack.
+func chanrecv(c *hchan, ep unsafe.Pointer, block bool) (selected, received bool) {
+ // raceenabled: don't need to check ep, as it is always on the stack
+ // or is new memory allocated by reflect.
+
+ if debugChan {
+ print("chanrecv: chan=", c, "\n")
+ }
+
+ if c == nil {
+ if !block {
+ return
+ }
+ gopark(nil, nil, waitReasonChanReceiveNilChan, traceEvGoStop, 2)
+ throw("unreachable")
+ }
+
+ // Fast path: check for failed non-blocking operation without acquiring the lock.
+ if !block && empty(c) {
+ // After observing that the channel is not ready for receiving, we observe whether the
+ // channel is closed.
+ //
+ // Reordering of these checks could lead to incorrect behavior when racing with a close.
+ // For example, if the channel was open and not empty, was closed, and then drained,
+ // reordered reads could incorrectly indicate "open and empty". To prevent reordering,
+ // we use atomic loads for both checks, and rely on emptying and closing to happen in
+ // separate critical sections under the same lock. This assumption fails when closing
+ // an unbuffered channel with a blocked send, but that is an error condition anyway.
+ if atomic.Load(&c.closed) == 0 {
+ // Because a channel cannot be reopened, the later observation of the channel
+ // being not closed implies that it was also not closed at the moment of the
+ // first observation. We behave as if we observed the channel at that moment
+ // and report that the receive cannot proceed.
+ return
+ }
+ // The channel is irreversibly closed. Re-check whether the channel has any pending data
+ // to receive, which could have arrived between the empty and closed checks above.
+ // Sequential consistency is also required here, when racing with such a send.
+ if empty(c) {
+ // The channel is irreversibly closed and empty.
+ if raceenabled {
+ raceacquire(c.raceaddr())
+ }
+ if ep != nil {
+ typedmemclr(c.elemtype, ep)
+ }
+ return true, false
+ }
+ }
+
+ var t0 int64
+ if blockprofilerate > 0 {
+ t0 = cputicks()
+ }
+
+ lock(&c.lock)
+
+ if c.closed != 0 && c.qcount == 0 {
+ if raceenabled {
+ raceacquire(c.raceaddr())
+ }
+ unlock(&c.lock)
+ if ep != nil {
+ typedmemclr(c.elemtype, ep)
+ }
+ return true, false
+ }
+
+ if sg := c.sendq.dequeue(); sg != nil {
+ // Found a waiting sender. If buffer is size 0, receive value
+ // directly from sender. Otherwise, receive from head of queue
+ // and add sender's value to the tail of the queue (both map to
+ // the same buffer slot because the queue is full).
+ recv(c, sg, ep, func() { unlock(&c.lock) }, 3)
+ return true, true
+ }
+
+ if c.qcount > 0 {
+ // Receive directly from queue
+ qp := chanbuf(c, c.recvx)
+ if raceenabled {
+ racenotify(c, c.recvx, nil)
+ }
+ if ep != nil {
+ typedmemmove(c.elemtype, ep, qp)
+ }
+ typedmemclr(c.elemtype, qp)
+ c.recvx++
+ if c.recvx == c.dataqsiz {
+ c.recvx = 0
+ }
+ c.qcount--
+ unlock(&c.lock)
+ return true, true
+ }
+
+ if !block {
+ unlock(&c.lock)
+ return false, false
+ }
+
+ // no sender available: block on this channel.
+ gp := getg()
+ mysg := acquireSudog()
+ mysg.releasetime = 0
+ if t0 != 0 {
+ mysg.releasetime = -1
+ }
+ // No stack splits between assigning elem and enqueuing mysg
+ // on gp.waiting where copystack can find it.
+ mysg.elem = ep
+ mysg.waitlink = nil
+ gp.waiting = mysg
+ mysg.g = gp
+ mysg.isSelect = false
+ mysg.c = c
+ gp.param = nil
+ c.recvq.enqueue(mysg)
+ // Signal to anyone trying to shrink our stack that we're about
+ // to park on a channel. The window between when this G's status
+ // changes and when we set gp.activeStackChans is not safe for
+ // stack shrinking.
+ atomic.Store8(&gp.parkingOnChan, 1)
+ gopark(chanparkcommit, unsafe.Pointer(&c.lock), waitReasonChanReceive, traceEvGoBlockRecv, 2)
+
+ // someone woke us up
+ if mysg != gp.waiting {
+ throw("G waiting list is corrupted")
+ }
+ gp.waiting = nil
+ gp.activeStackChans = false
+ if mysg.releasetime > 0 {
+ blockevent(mysg.releasetime-t0, 2)
+ }
+ success := mysg.success
+ gp.param = nil
+ mysg.c = nil
+ releaseSudog(mysg)
+ return true, success
+}
+
+// recv processes a receive operation on a full channel c.
+// There are 2 parts:
+// 1) The value sent by the sender sg is put into the channel
+// and the sender is woken up to go on its merry way.
+// 2) The value received by the receiver (the current G) is
+// written to ep.
+// For synchronous channels, both values are the same.
+// For asynchronous channels, the receiver gets its data from
+// the channel buffer and the sender's data is put in the
+// channel buffer.
+// Channel c must be full and locked. recv unlocks c with unlockf.
+// sg must already be dequeued from c.
+// A non-nil ep must point to the heap or the caller's stack.
+func recv(c *hchan, sg *sudog, ep unsafe.Pointer, unlockf func(), skip int) {
+ if c.dataqsiz == 0 {
+ if raceenabled {
+ racesync(c, sg)
+ }
+ if ep != nil {
+ // copy data from sender
+ recvDirect(c.elemtype, sg, ep)
+ }
+ } else {
+ // Queue is full. Take the item at the
+ // head of the queue. Make the sender enqueue
+ // its item at the tail of the queue. Since the
+ // queue is full, those are both the same slot.
+ qp := chanbuf(c, c.recvx)
+ if raceenabled {
+ racenotify(c, c.recvx, nil)
+ racenotify(c, c.recvx, sg)
+ }
+ // copy data from queue to receiver
+ if ep != nil {
+ typedmemmove(c.elemtype, ep, qp)
+ }
+ // copy data from sender to queue
+ typedmemmove(c.elemtype, qp, sg.elem)
+ c.recvx++
+ if c.recvx == c.dataqsiz {
+ c.recvx = 0
+ }
+ c.sendx = c.recvx // c.sendx = (c.sendx+1) % c.dataqsiz
+ }
+ sg.elem = nil
+ gp := sg.g
+ unlockf()
+ gp.param = unsafe.Pointer(sg)
+ sg.success = true
+ if sg.releasetime != 0 {
+ sg.releasetime = cputicks()
+ }
+ goready(gp, skip+1)
+}
+
+func chanparkcommit(gp *g, chanLock unsafe.Pointer) bool {
+ // There are unlocked sudogs that point into gp's stack. Stack
+ // copying must lock the channels of those sudogs.
+ // Set activeStackChans here instead of before we try parking
+ // because we could self-deadlock in stack growth on the
+ // channel lock.
+ gp.activeStackChans = true
+ // Mark that it's safe for stack shrinking to occur now,
+ // because any thread acquiring this G's stack for shrinking
+ // is guaranteed to observe activeStackChans after this store.
+ atomic.Store8(&gp.parkingOnChan, 0)
+ // Make sure we unlock after setting activeStackChans and
+ // unsetting parkingOnChan. The moment we unlock chanLock
+ // we risk gp getting readied by a channel operation and
+ // so gp could continue running before everything before
+ // the unlock is visible (even to gp itself).
+ unlock((*mutex)(chanLock))
+ return true
+}
+
+// compiler implements
+//
+// select {
+// case c <- v:
+// ... foo
+// default:
+// ... bar
+// }
+//
+// as
+//
+// if selectnbsend(c, v) {
+// ... foo
+// } else {
+// ... bar
+// }
+//
+func selectnbsend(c *hchan, elem unsafe.Pointer) (selected bool) {
+ return chansend(c, elem, false, getcallerpc())
+}
+
+// compiler implements
+//
+// select {
+// case v = <-c:
+// ... foo
+// default:
+// ... bar
+// }
+//
+// as
+//
+// if selectnbrecv(&v, c) {
+// ... foo
+// } else {
+// ... bar
+// }
+//
+func selectnbrecv(elem unsafe.Pointer, c *hchan) (selected bool) {
+ selected, _ = chanrecv(c, elem, false)
+ return
+}
+
+// compiler implements
+//
+// select {
+// case v, ok = <-c:
+// ... foo
+// default:
+// ... bar
+// }
+//
+// as
+//
+// if c != nil && selectnbrecv2(&v, &ok, c) {
+// ... foo
+// } else {
+// ... bar
+// }
+//
+func selectnbrecv2(elem unsafe.Pointer, received *bool, c *hchan) (selected bool) {
+ // TODO(khr): just return 2 values from this function, now that it is in Go.
+ selected, *received = chanrecv(c, elem, false)
+ return
+}
+
+//go:linkname reflect_chansend reflect.chansend
+func reflect_chansend(c *hchan, elem unsafe.Pointer, nb bool) (selected bool) {
+ return chansend(c, elem, !nb, getcallerpc())
+}
+
+//go:linkname reflect_chanrecv reflect.chanrecv
+func reflect_chanrecv(c *hchan, nb bool, elem unsafe.Pointer) (selected bool, received bool) {
+ return chanrecv(c, elem, !nb)
+}
+
+//go:linkname reflect_chanlen reflect.chanlen
+func reflect_chanlen(c *hchan) int {
+ if c == nil {
+ return 0
+ }
+ return int(c.qcount)
+}
+
+//go:linkname reflectlite_chanlen internal/reflectlite.chanlen
+func reflectlite_chanlen(c *hchan) int {
+ if c == nil {
+ return 0
+ }
+ return int(c.qcount)
+}
+
+//go:linkname reflect_chancap reflect.chancap
+func reflect_chancap(c *hchan) int {
+ if c == nil {
+ return 0
+ }
+ return int(c.dataqsiz)
+}
+
+//go:linkname reflect_chanclose reflect.chanclose
+func reflect_chanclose(c *hchan) {
+ closechan(c)
+}
+
+func (q *waitq) enqueue(sgp *sudog) {
+ sgp.next = nil
+ x := q.last
+ if x == nil {
+ sgp.prev = nil
+ q.first = sgp
+ q.last = sgp
+ return
+ }
+ sgp.prev = x
+ x.next = sgp
+ q.last = sgp
+}
+
+func (q *waitq) dequeue() *sudog {
+ for {
+ sgp := q.first
+ if sgp == nil {
+ return nil
+ }
+ y := sgp.next
+ if y == nil {
+ q.first = nil
+ q.last = nil
+ } else {
+ y.prev = nil
+ q.first = y
+ sgp.next = nil // mark as removed (see dequeueSudog)
+ }
+
+ // if a goroutine was put on this queue because of a
+ // select, there is a small window between the goroutine
+ // being woken up by a different case and it grabbing the
+ // channel locks. Once it has the lock
+ // it removes itself from the queue, so we won't see it after that.
+ // We use a flag in the G struct to tell us when someone
+ // else has won the race to signal this goroutine but the goroutine
+ // hasn't removed itself from the queue yet.
+ if sgp.isSelect && !atomic.Cas(&sgp.g.selectDone, 0, 1) {
+ continue
+ }
+
+ return sgp
+ }
+}
+
+func (c *hchan) raceaddr() unsafe.Pointer {
+ // Treat read-like and write-like operations on the channel as
+ // happening at this address. Avoid using the address of qcount
+ // or dataqsiz, because the len() and cap() builtins read
+ // those addresses, and we don't want them racing with
+ // operations like close().
+ return unsafe.Pointer(&c.buf)
+}
+
+func racesync(c *hchan, sg *sudog) {
+ racerelease(chanbuf(c, 0))
+ raceacquireg(sg.g, chanbuf(c, 0))
+ racereleaseg(sg.g, chanbuf(c, 0))
+ raceacquire(chanbuf(c, 0))
+}
+
+// Notify the race detector of a send or receive involving buffer entry idx
+// and a channel c or its communicating partner sg.
+// This function handles the special case of c.elemsize==0.
+func racenotify(c *hchan, idx uint, sg *sudog) {
+ // We could have passed the unsafe.Pointer corresponding to entry idx
+ // instead of idx itself. However, in a future version of this function,
+ // we can use idx to better handle the case of elemsize==0.
+ // A future improvement to the detector is to call TSan with c and idx:
+ // this way, Go will continue to not allocate buffer entries for channels
+ // of elemsize==0, yet the race detector can be made to handle multiple
+ // sync objects underneath the hood (one sync object per idx).
+ qp := chanbuf(c, idx)
+ // When elemsize==0, we don't allocate a full buffer for the channel.
+ // Instead of individual buffer entries, the race detector uses the
+ // c.buf as the only buffer entry. This simplification prevents us from
+ // following the memory model's happens-before rules (rules that are
+ // implemented in racereleaseacquire). Instead, we accumulate happens-before
+ // information in the synchronization object associated with c.buf.
+ if c.elemsize == 0 {
+ if sg == nil {
+ raceacquire(qp)
+ racerelease(qp)
+ } else {
+ raceacquireg(sg.g, qp)
+ racereleaseg(sg.g, qp)
+ }
+ } else {
+ if sg == nil {
+ racereleaseacquire(qp)
+ } else {
+ racereleaseacquireg(sg.g, qp)
+ }
+ }
+}
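
A brief user-level sketch of the semantics implemented above; the compiler lowers these statements onto the entry points noted in the comments (chansend1 for c <- v, chanrecv2 for the v, ok receive, closechan for close, selectnbrecv2 for a v, ok receive with a default case). The program is illustrative only.

package main

import "fmt"

func main() {
    c := make(chan int, 2) // makechan: int has no pointers, so hchan and buf share one allocation
    c <- 1                 // chansend1: buffered fast path, value copied into the circular buffer
    c <- 2
    close(c) // closechan: buffered values are still delivered after close

    v, ok := <-c       // chanrecv2
    fmt.Println(v, ok) // 1 true
    v, ok = <-c
    fmt.Println(v, ok) // 2 true
    v, ok = <-c        // closed and empty: typedmemclr zeroes the target, ok is false
    fmt.Println(v, ok) // 0 false

    // A receive with a default case is a non-blocking receive
    // (selectnbrecv2 for the v, ok form) and never parks.
    select {
    case v, ok = <-c:
        fmt.Println(v, ok) // still 0 false once closed and drained
    default:
    }
}
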
diff --git a/src/runtime/chan_test.go b/src/runtime/chan_test.go
new file mode 100644
index 0000000..756bbbe
--- /dev/null
+++ b/src/runtime/chan_test.go
@@ -0,0 +1,1213 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "internal/testenv"
+ "math"
+ "runtime"
+ "sync"
+ "sync/atomic"
+ "testing"
+ "time"
+)
+
+func TestChan(t *testing.T) {
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(4))
+ N := 200
+ if testing.Short() {
+ N = 20
+ }
+ for chanCap := 0; chanCap < N; chanCap++ {
+ {
+ // Ensure that receive from empty chan blocks.
+ c := make(chan int, chanCap)
+ recv1 := false
+ go func() {
+ _ = <-c
+ recv1 = true
+ }()
+ recv2 := false
+ go func() {
+ _, _ = <-c
+ recv2 = true
+ }()
+ time.Sleep(time.Millisecond)
+ if recv1 || recv2 {
+ t.Fatalf("chan[%d]: receive from empty chan", chanCap)
+ }
+ // Ensure that non-blocking receive does not block.
+ select {
+ case _ = <-c:
+ t.Fatalf("chan[%d]: receive from empty chan", chanCap)
+ default:
+ }
+ select {
+ case _, _ = <-c:
+ t.Fatalf("chan[%d]: receive from empty chan", chanCap)
+ default:
+ }
+ c <- 0
+ c <- 0
+ }
+
+ {
+ // Ensure that send to full chan blocks.
+ c := make(chan int, chanCap)
+ for i := 0; i < chanCap; i++ {
+ c <- i
+ }
+ sent := uint32(0)
+ go func() {
+ c <- 0
+ atomic.StoreUint32(&sent, 1)
+ }()
+ time.Sleep(time.Millisecond)
+ if atomic.LoadUint32(&sent) != 0 {
+ t.Fatalf("chan[%d]: send to full chan", chanCap)
+ }
+ // Ensure that non-blocking send does not block.
+ select {
+ case c <- 0:
+ t.Fatalf("chan[%d]: send to full chan", chanCap)
+ default:
+ }
+ <-c
+ }
+
+ {
+ // Ensure that we receive 0 from closed chan.
+ c := make(chan int, chanCap)
+ for i := 0; i < chanCap; i++ {
+ c <- i
+ }
+ close(c)
+ for i := 0; i < chanCap; i++ {
+ v := <-c
+ if v != i {
+ t.Fatalf("chan[%d]: received %v, expected %v", chanCap, v, i)
+ }
+ }
+ if v := <-c; v != 0 {
+ t.Fatalf("chan[%d]: received %v, expected %v", chanCap, v, 0)
+ }
+ if v, ok := <-c; v != 0 || ok {
+ t.Fatalf("chan[%d]: received %v/%v, expected %v/%v", chanCap, v, ok, 0, false)
+ }
+ }
+
+ {
+ // Ensure that close unblocks receive.
+ c := make(chan int, chanCap)
+ done := make(chan bool)
+ go func() {
+ v, ok := <-c
+ done <- v == 0 && ok == false
+ }()
+ time.Sleep(time.Millisecond)
+ close(c)
+ if !<-done {
+ t.Fatalf("chan[%d]: received non zero from closed chan", chanCap)
+ }
+ }
+
+ {
+ // Send 100 integers,
+ // ensure that we receive them non-corrupted in FIFO order.
+ c := make(chan int, chanCap)
+ go func() {
+ for i := 0; i < 100; i++ {
+ c <- i
+ }
+ }()
+ for i := 0; i < 100; i++ {
+ v := <-c
+ if v != i {
+ t.Fatalf("chan[%d]: received %v, expected %v", chanCap, v, i)
+ }
+ }
+
+ // Same, but using recv2.
+ go func() {
+ for i := 0; i < 100; i++ {
+ c <- i
+ }
+ }()
+ for i := 0; i < 100; i++ {
+ v, ok := <-c
+ if !ok {
+ t.Fatalf("chan[%d]: receive failed, expected %v", chanCap, i)
+ }
+ if v != i {
+ t.Fatalf("chan[%d]: received %v, expected %v", chanCap, v, i)
+ }
+ }
+
+ // Send 1000 integers in 4 goroutines,
+ // ensure that we receive what we send.
+ const P = 4
+ const L = 1000
+ for p := 0; p < P; p++ {
+ go func() {
+ for i := 0; i < L; i++ {
+ c <- i
+ }
+ }()
+ }
+ done := make(chan map[int]int)
+ for p := 0; p < P; p++ {
+ go func() {
+ recv := make(map[int]int)
+ for i := 0; i < L; i++ {
+ v := <-c
+ recv[v] = recv[v] + 1
+ }
+ done <- recv
+ }()
+ }
+ recv := make(map[int]int)
+ for p := 0; p < P; p++ {
+ for k, v := range <-done {
+ recv[k] = recv[k] + v
+ }
+ }
+ if len(recv) != L {
+ t.Fatalf("chan[%d]: received %v values, expected %v", chanCap, len(recv), L)
+ }
+ for _, v := range recv {
+ if v != P {
+ t.Fatalf("chan[%d]: received %v values, expected %v", chanCap, v, P)
+ }
+ }
+ }
+
+ {
+ // Test len/cap.
+ c := make(chan int, chanCap)
+ if len(c) != 0 || cap(c) != chanCap {
+ t.Fatalf("chan[%d]: bad len/cap, expect %v/%v, got %v/%v", chanCap, 0, chanCap, len(c), cap(c))
+ }
+ for i := 0; i < chanCap; i++ {
+ c <- i
+ }
+ if len(c) != chanCap || cap(c) != chanCap {
+ t.Fatalf("chan[%d]: bad len/cap, expect %v/%v, got %v/%v", chanCap, chanCap, chanCap, len(c), cap(c))
+ }
+ }
+
+ }
+}
+
+func TestNonblockRecvRace(t *testing.T) {
+ n := 10000
+ if testing.Short() {
+ n = 100
+ }
+ for i := 0; i < n; i++ {
+ c := make(chan int, 1)
+ c <- 1
+ go func() {
+ select {
+ case <-c:
+ default:
+ t.Error("chan is not ready")
+ }
+ }()
+ close(c)
+ <-c
+ if t.Failed() {
+ return
+ }
+ }
+}
+
+// This test checks that select acts on the state of the channels at one
+// moment in the execution, not over a smeared time window.
+// In the test, one goroutine does:
+// create c1, c2
+// make c1 ready for receiving
+// create second goroutine
+// make c2 ready for receiving
+// make c1 no longer ready for receiving (if possible)
+// The second goroutine does a non-blocking select receiving from c1 and c2.
+// From the time the second goroutine is created, at least one of c1 and c2
+// is always ready for receiving, so the select in the second goroutine must
+// always receive from one or the other. It must never execute the default case.
+func TestNonblockSelectRace(t *testing.T) {
+ n := 100000
+ if testing.Short() {
+ n = 1000
+ }
+ done := make(chan bool, 1)
+ for i := 0; i < n; i++ {
+ c1 := make(chan int, 1)
+ c2 := make(chan int, 1)
+ c1 <- 1
+ go func() {
+ select {
+ case <-c1:
+ case <-c2:
+ default:
+ done <- false
+ return
+ }
+ done <- true
+ }()
+ c2 <- 1
+ select {
+ case <-c1:
+ default:
+ }
+ if !<-done {
+ t.Fatal("no chan is ready")
+ }
+ }
+}
+
+// Same as TestNonblockSelectRace, but close(c2) replaces c2 <- 1.
+func TestNonblockSelectRace2(t *testing.T) {
+ n := 100000
+ if testing.Short() {
+ n = 1000
+ }
+ done := make(chan bool, 1)
+ for i := 0; i < n; i++ {
+ c1 := make(chan int, 1)
+ c2 := make(chan int)
+ c1 <- 1
+ go func() {
+ select {
+ case <-c1:
+ case <-c2:
+ default:
+ done <- false
+ return
+ }
+ done <- true
+ }()
+ close(c2)
+ select {
+ case <-c1:
+ default:
+ }
+ if !<-done {
+ t.Fatal("no chan is ready")
+ }
+ }
+}
+
+func TestSelfSelect(t *testing.T) {
+ // Ensure that send/recv on the same chan in select
+ // does not crash nor deadlock.
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(2))
+ for _, chanCap := range []int{0, 10} {
+ var wg sync.WaitGroup
+ wg.Add(2)
+ c := make(chan int, chanCap)
+ for p := 0; p < 2; p++ {
+ p := p
+ go func() {
+ defer wg.Done()
+ for i := 0; i < 1000; i++ {
+ if p == 0 || i%2 == 0 {
+ select {
+ case c <- p:
+ case v := <-c:
+ if chanCap == 0 && v == p {
+ t.Errorf("self receive")
+ return
+ }
+ }
+ } else {
+ select {
+ case v := <-c:
+ if chanCap == 0 && v == p {
+ t.Errorf("self receive")
+ return
+ }
+ case c <- p:
+ }
+ }
+ }
+ }()
+ }
+ wg.Wait()
+ }
+}
+
+func TestSelectStress(t *testing.T) {
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(10))
+ var c [4]chan int
+ c[0] = make(chan int)
+ c[1] = make(chan int)
+ c[2] = make(chan int, 2)
+ c[3] = make(chan int, 3)
+ N := int(1e5)
+ if testing.Short() {
+ N /= 10
+ }
+ // There are 4 goroutines that send N values on each of the chans,
+ // + 4 goroutines that receive N values on each of the chans,
+ // + 1 goroutine that sends N values on each of the chans in a single select,
+ // + 1 goroutine that receives N values on each of the chans in a single select.
+ // All these sends, receives and selects interact chaotically at runtime,
+ // but we are careful that this whole construct does not deadlock.
+ var wg sync.WaitGroup
+ wg.Add(10)
+ for k := 0; k < 4; k++ {
+ k := k
+ go func() {
+ for i := 0; i < N; i++ {
+ c[k] <- 0
+ }
+ wg.Done()
+ }()
+ go func() {
+ for i := 0; i < N; i++ {
+ <-c[k]
+ }
+ wg.Done()
+ }()
+ }
+ go func() {
+ var n [4]int
+ c1 := c
+ for i := 0; i < 4*N; i++ {
+ select {
+ case c1[3] <- 0:
+ n[3]++
+ if n[3] == N {
+ c1[3] = nil
+ }
+ case c1[2] <- 0:
+ n[2]++
+ if n[2] == N {
+ c1[2] = nil
+ }
+ case c1[0] <- 0:
+ n[0]++
+ if n[0] == N {
+ c1[0] = nil
+ }
+ case c1[1] <- 0:
+ n[1]++
+ if n[1] == N {
+ c1[1] = nil
+ }
+ }
+ }
+ wg.Done()
+ }()
+ go func() {
+ var n [4]int
+ c1 := c
+ for i := 0; i < 4*N; i++ {
+ select {
+ case <-c1[0]:
+ n[0]++
+ if n[0] == N {
+ c1[0] = nil
+ }
+ case <-c1[1]:
+ n[1]++
+ if n[1] == N {
+ c1[1] = nil
+ }
+ case <-c1[2]:
+ n[2]++
+ if n[2] == N {
+ c1[2] = nil
+ }
+ case <-c1[3]:
+ n[3]++
+ if n[3] == N {
+ c1[3] = nil
+ }
+ }
+ }
+ wg.Done()
+ }()
+ wg.Wait()
+}
+
+func TestSelectFairness(t *testing.T) {
+ const trials = 10000
+ if runtime.GOOS == "linux" && runtime.GOARCH == "ppc64le" {
+ testenv.SkipFlaky(t, 22047)
+ }
+ c1 := make(chan byte, trials+1)
+ c2 := make(chan byte, trials+1)
+ for i := 0; i < trials+1; i++ {
+ c1 <- 1
+ c2 <- 2
+ }
+ c3 := make(chan byte)
+ c4 := make(chan byte)
+ out := make(chan byte)
+ done := make(chan byte)
+ var wg sync.WaitGroup
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ for {
+ var b byte
+ select {
+ case b = <-c3:
+ case b = <-c4:
+ case b = <-c1:
+ case b = <-c2:
+ }
+ select {
+ case out <- b:
+ case <-done:
+ return
+ }
+ }
+ }()
+ cnt1, cnt2 := 0, 0
+ for i := 0; i < trials; i++ {
+ switch b := <-out; b {
+ case 1:
+ cnt1++
+ case 2:
+ cnt2++
+ default:
+ t.Fatalf("unexpected value %d on channel", b)
+ }
+ }
+ // If the select in the goroutine is fair,
+ // cnt1 and cnt2 should be about the same value.
+ // With 10,000 trials, the expected margin of error at
+ // a confidence level of six nines is 4.891676 / (2 * Sqrt(10000)).
+ r := float64(cnt1) / trials
+ e := math.Abs(r - 0.5)
+ t.Log(cnt1, cnt2, r, e)
+ if e > 4.891676/(2*math.Sqrt(trials)) {
+ t.Errorf("unfair select: in %d trials, results were %d, %d", trials, cnt1, cnt2)
+ }
+ close(done)
+ wg.Wait()
+}
+
+func TestChanSendInterface(t *testing.T) {
+ type mt struct{}
+ m := &mt{}
+ c := make(chan interface{}, 1)
+ c <- m
+ select {
+ case c <- m:
+ default:
+ }
+ select {
+ case c <- m:
+ case c <- &mt{}:
+ default:
+ }
+}
+
+func TestPseudoRandomSend(t *testing.T) {
+ n := 100
+ for _, chanCap := range []int{0, n} {
+ c := make(chan int, chanCap)
+ l := make([]int, n)
+ var m sync.Mutex
+ m.Lock()
+ go func() {
+ for i := 0; i < n; i++ {
+ runtime.Gosched()
+ l[i] = <-c
+ }
+ m.Unlock()
+ }()
+ for i := 0; i < n; i++ {
+ select {
+ case c <- 1:
+ case c <- 0:
+ }
+ }
+ m.Lock() // wait
+ n0 := 0
+ n1 := 0
+ for _, i := range l {
+ n0 += (i + 1) % 2
+ n1 += i
+ }
+ if n0 <= n/10 || n1 <= n/10 {
+ t.Errorf("Want pseudorandom, got %d zeros and %d ones (chan cap %d)", n0, n1, chanCap)
+ }
+ }
+}
+
+func TestMultiConsumer(t *testing.T) {
+ const nwork = 23
+ const niter = 271828
+
+ pn := []int{2, 3, 7, 11, 13, 17, 19, 23, 27, 31}
+
+ q := make(chan int, nwork*3)
+ r := make(chan int, nwork*3)
+
+ // workers
+ var wg sync.WaitGroup
+ for i := 0; i < nwork; i++ {
+ wg.Add(1)
+ go func(w int) {
+ for v := range q {
+ // mess with the fifo-ish nature of range
+ if pn[w%len(pn)] == v {
+ runtime.Gosched()
+ }
+ r <- v
+ }
+ wg.Done()
+ }(i)
+ }
+
+ // feeder & closer
+ expect := 0
+ go func() {
+ for i := 0; i < niter; i++ {
+ v := pn[i%len(pn)]
+ expect += v
+ q <- v
+ }
+ close(q) // no more work
+ wg.Wait() // workers done
+ close(r) // ... so there can be no more results
+ }()
+
+ // consume & check
+ n := 0
+ s := 0
+ for v := range r {
+ n++
+ s += v
+ }
+ if n != niter || s != expect {
+ t.Errorf("Expected sum %d (got %d) from %d iter (saw %d)",
+ expect, s, niter, n)
+ }
+}
+
+func TestShrinkStackDuringBlockedSend(t *testing.T) {
+ // make sure that channel operations still work when we are
+ // blocked on a channel send and we shrink the stack.
+ // NOTE: this test probably won't fail unless stack1.go:stackDebug
+ // is set to >= 1.
+ const n = 10
+ c := make(chan int)
+ done := make(chan struct{})
+
+ go func() {
+ for i := 0; i < n; i++ {
+ c <- i
+ // use lots of stack, briefly.
+ stackGrowthRecursive(20)
+ }
+ done <- struct{}{}
+ }()
+
+ for i := 0; i < n; i++ {
+ x := <-c
+ if x != i {
+ t.Errorf("bad channel read: want %d, got %d", i, x)
+ }
+ // Waste some time so sender can finish using lots of stack
+ // and block in channel send.
+ time.Sleep(1 * time.Millisecond)
+ // trigger GC which will shrink the stack of the sender.
+ runtime.GC()
+ }
+ <-done
+}
+
+func TestNoShrinkStackWhileParking(t *testing.T) {
+ // The goal of this test is to trigger a "racy sudog adjustment"
+ // throw. Basically, there's a window between when a goroutine
+ // becomes available for preemption for stack scanning (and thus,
+ // stack shrinking) but before the goroutine has fully parked on a
+ // channel. See issue 40641 for more details on the problem.
+ //
+ // The way we try to induce this failure is to set up two
+ // goroutines: a sender and a receiver that communicate across
+ // a channel. We try to set up a situation where the sender
+ // grows its stack temporarily then *fully* blocks on a channel
+ // often. Meanwhile a GC is triggered so that we try to get a
+ // mark worker to shrink the sender's stack and race with the
+ // sender parking.
+ //
+ // Unfortunately the race window here is so small that we
+ // either need a ridiculous number of iterations, or we add
+ // "usleep(1000)" to park_m, just before the unlockf call.
+ const n = 10
+ send := func(c chan<- int, done chan struct{}) {
+ for i := 0; i < n; i++ {
+ c <- i
+ // Use lots of stack briefly so that
+ // the GC is going to want to shrink us
+ // when it scans us. Make sure not to
+ // do any function calls otherwise
+ // in order to avoid us shrinking ourselves
+ // when we're preempted.
+ stackGrowthRecursive(20)
+ }
+ done <- struct{}{}
+ }
+ recv := func(c <-chan int, done chan struct{}) {
+ for i := 0; i < n; i++ {
+ // Sleep here so that the sender always
+ // fully blocks.
+ time.Sleep(10 * time.Microsecond)
+ <-c
+ }
+ done <- struct{}{}
+ }
+ for i := 0; i < n*20; i++ {
+ c := make(chan int)
+ done := make(chan struct{})
+ go recv(c, done)
+ go send(c, done)
+ // Wait a little bit before triggering
+ // the GC to make sure the sender and
+ // receiver have gotten into their groove.
+ time.Sleep(50 * time.Microsecond)
+ runtime.GC()
+ <-done
+ <-done
+ }
+}
+
+func TestSelectDuplicateChannel(t *testing.T) {
+ // This test makes sure we can queue a G on
+ // the same channel multiple times.
+ c := make(chan int)
+ d := make(chan int)
+ e := make(chan int)
+
+ // goroutine A
+ go func() {
+ select {
+ case <-c:
+ case <-c:
+ case <-d:
+ }
+ e <- 9
+ }()
+ time.Sleep(time.Millisecond) // make sure goroutine A gets queued first on c
+
+ // goroutine B
+ go func() {
+ <-c
+ }()
+ time.Sleep(time.Millisecond) // make sure goroutine B gets queued on c before continuing
+
+ d <- 7 // wake up A, it dequeues itself from c. This operation used to corrupt c.recvq.
+ <-e // A tells us it's done
+ c <- 8 // wake up B. This operation used to fail because c.recvq was corrupted (it tries to wake up an already running G instead of B)
+}
+
+var selectSink interface{}
+
+func TestSelectStackAdjust(t *testing.T) {
+ // Test that channel receive slots that contain local stack
+ // pointers are adjusted correctly by stack shrinking.
+ c := make(chan *int)
+ d := make(chan *int)
+ ready1 := make(chan bool)
+ ready2 := make(chan bool)
+
+ f := func(ready chan bool, dup bool) {
+ // Temporarily grow the stack to 10K.
+ stackGrowthRecursive((10 << 10) / (128 * 8))
+
+ // We're ready to trigger GC and stack shrink.
+ ready <- true
+
+ val := 42
+ var cx *int
+ cx = &val
+
+ var c2 chan *int
+ var d2 chan *int
+ if dup {
+ c2 = c
+ d2 = d
+ }
+
+ // Receive from d. cx won't be affected.
+ select {
+ case cx = <-c:
+ case <-c2:
+ case <-d:
+ case <-d2:
+ }
+
+ // Check that pointer in cx was adjusted correctly.
+ if cx != &val {
+ t.Error("cx no longer points to val")
+ } else if val != 42 {
+ t.Error("val changed")
+ } else {
+ *cx = 43
+ if val != 43 {
+ t.Error("changing *cx failed to change val")
+ }
+ }
+ ready <- true
+ }
+
+ go f(ready1, false)
+ go f(ready2, true)
+
+ // Let the goroutines get into the select.
+ <-ready1
+ <-ready2
+ time.Sleep(10 * time.Millisecond)
+
+ // Force concurrent GC a few times.
+ var before, after runtime.MemStats
+ runtime.ReadMemStats(&before)
+ for i := 0; i < 100; i++ {
+ selectSink = new([1 << 20]byte)
+ runtime.ReadMemStats(&after)
+ if after.NumGC-before.NumGC >= 2 {
+ goto done
+ }
+ runtime.Gosched()
+ }
+ t.Fatal("failed to trigger concurrent GC")
+done:
+ selectSink = nil
+
+ // Wake selects.
+ close(d)
+ <-ready1
+ <-ready2
+}
+
+type struct0 struct{}
+
+func BenchmarkMakeChan(b *testing.B) {
+ b.Run("Byte", func(b *testing.B) {
+ var x chan byte
+ for i := 0; i < b.N; i++ {
+ x = make(chan byte, 8)
+ }
+ close(x)
+ })
+ b.Run("Int", func(b *testing.B) {
+ var x chan int
+ for i := 0; i < b.N; i++ {
+ x = make(chan int, 8)
+ }
+ close(x)
+ })
+ b.Run("Ptr", func(b *testing.B) {
+ var x chan *byte
+ for i := 0; i < b.N; i++ {
+ x = make(chan *byte, 8)
+ }
+ close(x)
+ })
+ b.Run("Struct", func(b *testing.B) {
+ b.Run("0", func(b *testing.B) {
+ var x chan struct0
+ for i := 0; i < b.N; i++ {
+ x = make(chan struct0, 8)
+ }
+ close(x)
+ })
+ b.Run("32", func(b *testing.B) {
+ var x chan struct32
+ for i := 0; i < b.N; i++ {
+ x = make(chan struct32, 8)
+ }
+ close(x)
+ })
+ b.Run("40", func(b *testing.B) {
+ var x chan struct40
+ for i := 0; i < b.N; i++ {
+ x = make(chan struct40, 8)
+ }
+ close(x)
+ })
+ })
+}
+
+func BenchmarkChanNonblocking(b *testing.B) {
+ myc := make(chan int)
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ select {
+ case <-myc:
+ default:
+ }
+ }
+ })
+}
+
+func BenchmarkSelectUncontended(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ myc1 := make(chan int, 1)
+ myc2 := make(chan int, 1)
+ myc1 <- 0
+ for pb.Next() {
+ select {
+ case <-myc1:
+ myc2 <- 0
+ case <-myc2:
+ myc1 <- 0
+ }
+ }
+ })
+}
+
+func BenchmarkSelectSyncContended(b *testing.B) {
+ myc1 := make(chan int)
+ myc2 := make(chan int)
+ myc3 := make(chan int)
+ done := make(chan int)
+ b.RunParallel(func(pb *testing.PB) {
+ go func() {
+ for {
+ select {
+ case myc1 <- 0:
+ case myc2 <- 0:
+ case myc3 <- 0:
+ case <-done:
+ return
+ }
+ }
+ }()
+ for pb.Next() {
+ select {
+ case <-myc1:
+ case <-myc2:
+ case <-myc3:
+ }
+ }
+ })
+ close(done)
+}
+
+func BenchmarkSelectAsyncContended(b *testing.B) {
+ procs := runtime.GOMAXPROCS(0)
+ myc1 := make(chan int, procs)
+ myc2 := make(chan int, procs)
+ b.RunParallel(func(pb *testing.PB) {
+ myc1 <- 0
+ for pb.Next() {
+ select {
+ case <-myc1:
+ myc2 <- 0
+ case <-myc2:
+ myc1 <- 0
+ }
+ }
+ })
+}
+
+func BenchmarkSelectNonblock(b *testing.B) {
+ myc1 := make(chan int)
+ myc2 := make(chan int)
+ myc3 := make(chan int, 1)
+ myc4 := make(chan int, 1)
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ select {
+ case <-myc1:
+ default:
+ }
+ select {
+ case myc2 <- 0:
+ default:
+ }
+ select {
+ case <-myc3:
+ default:
+ }
+ select {
+ case myc4 <- 0:
+ default:
+ }
+ }
+ })
+}
+
+func BenchmarkChanUncontended(b *testing.B) {
+ const C = 100
+ b.RunParallel(func(pb *testing.PB) {
+ myc := make(chan int, C)
+ for pb.Next() {
+ for i := 0; i < C; i++ {
+ myc <- 0
+ }
+ for i := 0; i < C; i++ {
+ <-myc
+ }
+ }
+ })
+}
+
+func BenchmarkChanContended(b *testing.B) {
+ const C = 100
+ myc := make(chan int, C*runtime.GOMAXPROCS(0))
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ for i := 0; i < C; i++ {
+ myc <- 0
+ }
+ for i := 0; i < C; i++ {
+ <-myc
+ }
+ }
+ })
+}
+
+func benchmarkChanSync(b *testing.B, work int) {
+ const CallsPerSched = 1000
+ procs := 2
+ N := int32(b.N / CallsPerSched / procs * procs)
+ c := make(chan bool, procs)
+ myc := make(chan int)
+ for p := 0; p < procs; p++ {
+ go func() {
+ for {
+ i := atomic.AddInt32(&N, -1)
+ if i < 0 {
+ break
+ }
+ for g := 0; g < CallsPerSched; g++ {
+ if i%2 == 0 {
+ <-myc
+ localWork(work)
+ myc <- 0
+ localWork(work)
+ } else {
+ myc <- 0
+ localWork(work)
+ <-myc
+ localWork(work)
+ }
+ }
+ }
+ c <- true
+ }()
+ }
+ for p := 0; p < procs; p++ {
+ <-c
+ }
+}
+
+func BenchmarkChanSync(b *testing.B) {
+ benchmarkChanSync(b, 0)
+}
+
+func BenchmarkChanSyncWork(b *testing.B) {
+ benchmarkChanSync(b, 1000)
+}
+
+func benchmarkChanProdCons(b *testing.B, chanSize, localWork int) {
+ const CallsPerSched = 1000
+ procs := runtime.GOMAXPROCS(-1)
+ N := int32(b.N / CallsPerSched)
+ c := make(chan bool, 2*procs)
+ myc := make(chan int, chanSize)
+ for p := 0; p < procs; p++ {
+ go func() {
+ foo := 0
+ for atomic.AddInt32(&N, -1) >= 0 {
+ for g := 0; g < CallsPerSched; g++ {
+ for i := 0; i < localWork; i++ {
+ foo *= 2
+ foo /= 2
+ }
+ myc <- 1
+ }
+ }
+ myc <- 0
+ c <- foo == 42
+ }()
+ go func() {
+ foo := 0
+ for {
+ v := <-myc
+ if v == 0 {
+ break
+ }
+ for i := 0; i < localWork; i++ {
+ foo *= 2
+ foo /= 2
+ }
+ }
+ c <- foo == 42
+ }()
+ }
+ for p := 0; p < procs; p++ {
+ <-c
+ <-c
+ }
+}
+
+func BenchmarkChanProdCons0(b *testing.B) {
+ benchmarkChanProdCons(b, 0, 0)
+}
+
+func BenchmarkChanProdCons10(b *testing.B) {
+ benchmarkChanProdCons(b, 10, 0)
+}
+
+func BenchmarkChanProdCons100(b *testing.B) {
+ benchmarkChanProdCons(b, 100, 0)
+}
+
+func BenchmarkChanProdConsWork0(b *testing.B) {
+ benchmarkChanProdCons(b, 0, 100)
+}
+
+func BenchmarkChanProdConsWork10(b *testing.B) {
+ benchmarkChanProdCons(b, 10, 100)
+}
+
+func BenchmarkChanProdConsWork100(b *testing.B) {
+ benchmarkChanProdCons(b, 100, 100)
+}
+
+func BenchmarkSelectProdCons(b *testing.B) {
+ const CallsPerSched = 1000
+ procs := runtime.GOMAXPROCS(-1)
+ N := int32(b.N / CallsPerSched)
+ c := make(chan bool, 2*procs)
+ myc := make(chan int, 128)
+ myclose := make(chan bool)
+ for p := 0; p < procs; p++ {
+ go func() {
+ // Producer: sends to myc.
+ foo := 0
+ // Intended to not fire during benchmarking.
+ mytimer := time.After(time.Hour)
+ for atomic.AddInt32(&N, -1) >= 0 {
+ for g := 0; g < CallsPerSched; g++ {
+ // Model some local work.
+ for i := 0; i < 100; i++ {
+ foo *= 2
+ foo /= 2
+ }
+ select {
+ case myc <- 1:
+ case <-mytimer:
+ case <-myclose:
+ }
+ }
+ }
+ myc <- 0
+ c <- foo == 42
+ }()
+ go func() {
+ // Consumer: receives from myc.
+ foo := 0
+ // Intended to not fire during benchmarking.
+ mytimer := time.After(time.Hour)
+ loop:
+ for {
+ select {
+ case v := <-myc:
+ if v == 0 {
+ break loop
+ }
+ case <-mytimer:
+ case <-myclose:
+ }
+ // Model some local work.
+ for i := 0; i < 100; i++ {
+ foo *= 2
+ foo /= 2
+ }
+ }
+ c <- foo == 42
+ }()
+ }
+ for p := 0; p < procs; p++ {
+ <-c
+ <-c
+ }
+}
+
+func BenchmarkChanCreation(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ myc := make(chan int, 1)
+ myc <- 0
+ <-myc
+ }
+ })
+}
+
+func BenchmarkChanSem(b *testing.B) {
+ type Empty struct{}
+ myc := make(chan Empty, runtime.GOMAXPROCS(0))
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ myc <- Empty{}
+ <-myc
+ }
+ })
+}
+
+func BenchmarkChanPopular(b *testing.B) {
+ const n = 1000
+ c := make(chan bool)
+ var a []chan bool
+ var wg sync.WaitGroup
+ wg.Add(n)
+ for j := 0; j < n; j++ {
+ d := make(chan bool)
+ a = append(a, d)
+ go func() {
+ for i := 0; i < b.N; i++ {
+ select {
+ case <-c:
+ case <-d:
+ }
+ }
+ wg.Done()
+ }()
+ }
+ for i := 0; i < b.N; i++ {
+ for _, d := range a {
+ d <- true
+ }
+ }
+ wg.Wait()
+}
+
+func BenchmarkChanClosed(b *testing.B) {
+ c := make(chan struct{})
+ close(c)
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ select {
+ case <-c:
+ default:
+ b.Error("Unreachable")
+ }
+ }
+ })
+}
+
+var (
+ alwaysFalse = false
+ workSink = 0
+)
+
+func localWork(w int) {
+ foo := 0
+ for i := 0; i < w; i++ {
+ foo /= (foo + 1)
+ }
+ if alwaysFalse {
+ workSink += foo
+ }
+}
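
For reference, the 4.891676 / (2 * Sqrt(trials)) bound in TestSelectFairness is the usual normal approximation: for a fair select, cnt1/trials has standard deviation 0.5/sqrt(trials), and 4.891676 is approximately the two-sided normal quantile for a six-nines confidence level. A small illustrative calculation (assumptions as stated, not taken from the test):

package main

import (
    "fmt"
    "math"
)

func main() {
    const trials = 10000
    z := 4.891676                    // ~ inverse normal CDF at 1 - 5e-7
    sigma := 0.5 / math.Sqrt(trials) // std dev of cnt1/trials for a fair select
    fmt.Printf("allowed deviation from 0.5: %.6f\n", z*sigma) // ≈ 0.024458
}
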
diff --git a/src/runtime/chanbarrier_test.go b/src/runtime/chanbarrier_test.go
new file mode 100644
index 0000000..d479574
--- /dev/null
+++ b/src/runtime/chanbarrier_test.go
@@ -0,0 +1,83 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "sync"
+ "testing"
+)
+
+type response struct {
+}
+
+type myError struct {
+}
+
+func (myError) Error() string { return "" }
+
+func doRequest(useSelect bool) (*response, error) {
+ type async struct {
+ resp *response
+ err error
+ }
+ ch := make(chan *async, 0)
+ done := make(chan struct{}, 0)
+
+ if useSelect {
+ go func() {
+ select {
+ case ch <- &async{resp: nil, err: myError{}}:
+ case <-done:
+ }
+ }()
+ } else {
+ go func() {
+ ch <- &async{resp: nil, err: myError{}}
+ }()
+ }
+
+ r := <-ch
+ runtime.Gosched()
+ return r.resp, r.err
+}
+
+func TestChanSendSelectBarrier(t *testing.T) {
+ testChanSendBarrier(true)
+}
+
+func TestChanSendBarrier(t *testing.T) {
+ testChanSendBarrier(false)
+}
+
+func testChanSendBarrier(useSelect bool) {
+ var wg sync.WaitGroup
+ var globalMu sync.Mutex
+ outer := 100
+ inner := 100000
+ if testing.Short() || runtime.GOARCH == "wasm" {
+ outer = 10
+ inner = 1000
+ }
+ for i := 0; i < outer; i++ {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ var garbage []byte
+ for j := 0; j < inner; j++ {
+ _, err := doRequest(useSelect)
+ _, ok := err.(myError)
+ if !ok {
+ panic(1)
+ }
+ garbage = make([]byte, 1<<10)
+ }
+ globalMu.Lock()
+ global = garbage
+ globalMu.Unlock()
+ }()
+ }
+ wg.Wait()
+}
diff --git a/src/runtime/checkptr.go b/src/runtime/checkptr.go
new file mode 100644
index 0000000..59891a0
--- /dev/null
+++ b/src/runtime/checkptr.go
@@ -0,0 +1,83 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+func checkptrAlignment(p unsafe.Pointer, elem *_type, n uintptr) {
+ // Check that (*[n]elem)(p) is appropriately aligned.
+ // Note that we allow unaligned pointers if the types they point to contain
+ // no pointers themselves. See issue 37298.
+ // TODO(mdempsky): What about fieldAlign?
+ if elem.ptrdata != 0 && uintptr(p)&(uintptr(elem.align)-1) != 0 {
+ throw("checkptr: misaligned pointer conversion")
+ }
+
+ // Check that (*[n]elem)(p) doesn't straddle multiple heap objects.
+ if size := n * elem.size; size > 1 && checkptrBase(p) != checkptrBase(add(p, size-1)) {
+ throw("checkptr: converted pointer straddles multiple allocations")
+ }
+}
+
+func checkptrArithmetic(p unsafe.Pointer, originals []unsafe.Pointer) {
+ if 0 < uintptr(p) && uintptr(p) < minLegalPointer {
+ throw("checkptr: pointer arithmetic computed bad pointer value")
+ }
+
+ // Check that if the computed pointer p points into a heap
+ // object, then one of the original pointers must have pointed
+ // into the same object.
+ base := checkptrBase(p)
+ if base == 0 {
+ return
+ }
+
+ for _, original := range originals {
+ if base == checkptrBase(original) {
+ return
+ }
+ }
+
+ throw("checkptr: pointer arithmetic result points to invalid allocation")
+}
+
+// checkptrBase returns the base address for the allocation containing
+// the address p.
+//
+// Importantly, if p1 and p2 point into the same variable, then
+// checkptrBase(p1) == checkptrBase(p2). However, the converse/inverse
+// is not necessarily true as allocations can have trailing padding,
+// and multiple variables may be packed into a single allocation.
+func checkptrBase(p unsafe.Pointer) uintptr {
+ // stack
+ if gp := getg(); gp.stack.lo <= uintptr(p) && uintptr(p) < gp.stack.hi {
+ // TODO(mdempsky): Walk the stack to identify the
+ // specific stack frame or even stack object that p
+ // points into.
+ //
+ // In the meantime, use "1" as a pseudo-address to
+ // represent the stack. This is an invalid address on
+ // all platforms, so it's guaranteed to be distinct
+ // from any of the addresses we might return below.
+ return 1
+ }
+
+ // heap (must check after stack because of #35068)
+ if base, _, _ := findObject(uintptr(p), 0, 0); base != 0 {
+ return base
+ }
+
+ // data or bss
+ for _, datap := range activeModules() {
+ if datap.data <= uintptr(p) && uintptr(p) < datap.edata {
+ return datap.data
+ }
+ if datap.bss <= uintptr(p) && uintptr(p) < datap.ebss {
+ return datap.bss
+ }
+ }
+
+ return 0
+}
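
Note that checkptrAlignment only rejects a misaligned conversion when the target type itself contains pointers, which is exactly what the CheckPtrAlignmentPtr and CheckPtrAlignmentNoPtr cases in the test below distinguish. An illustrative program (not one of the testprog cases; names are made up) that trips the alignment check when built with -gcflags=all=-d=checkptr=1:

package main

import (
    "fmt"
    "unsafe"
)

type withPtr struct{ p *int } // contains a pointer, so alignment is enforced

var sink *withPtr

func main() {
    b := make([]byte, 16)
    // &b[1] is not aligned for a pointer-containing type; the compiler-inserted
    // call to checkptrAlignment throws
    // "fatal error: checkptr: misaligned pointer conversion".
    sink = (*withPtr)(unsafe.Pointer(&b[1]))
    fmt.Println(sink)
}
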
diff --git a/src/runtime/checkptr_test.go b/src/runtime/checkptr_test.go
new file mode 100644
index 0000000..194cc12
--- /dev/null
+++ b/src/runtime/checkptr_test.go
@@ -0,0 +1,54 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "internal/testenv"
+ "os/exec"
+ "strings"
+ "testing"
+)
+
+func TestCheckPtr(t *testing.T) {
+ t.Parallel()
+ testenv.MustHaveGoRun(t)
+
+ exe, err := buildTestProg(t, "testprog", "-gcflags=all=-d=checkptr=1")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ testCases := []struct {
+ cmd string
+ want string
+ }{
+ {"CheckPtrAlignmentPtr", "fatal error: checkptr: misaligned pointer conversion\n"},
+ {"CheckPtrAlignmentNoPtr", ""},
+ {"CheckPtrArithmetic", "fatal error: checkptr: pointer arithmetic result points to invalid allocation\n"},
+ {"CheckPtrArithmetic2", "fatal error: checkptr: pointer arithmetic result points to invalid allocation\n"},
+ {"CheckPtrSize", "fatal error: checkptr: converted pointer straddles multiple allocations\n"},
+ {"CheckPtrSmall", "fatal error: checkptr: pointer arithmetic computed bad pointer value\n"},
+ }
+
+ for _, tc := range testCases {
+ tc := tc
+ t.Run(tc.cmd, func(t *testing.T) {
+ t.Parallel()
+ got, err := testenv.CleanCmdEnv(exec.Command(exe, tc.cmd)).CombinedOutput()
+ if err != nil {
+ t.Log(err)
+ }
+ if tc.want == "" {
+ if len(got) > 0 {
+ t.Errorf("output:\n%s\nwant no output", got)
+ }
+ return
+ }
+ if !strings.HasPrefix(string(got), tc.want) {
+ t.Errorf("output:\n%s\n\nwant output starting with: %s", got, tc.want)
+ }
+ })
+ }
+}
diff --git a/src/runtime/closure_test.go b/src/runtime/closure_test.go
new file mode 100644
index 0000000..741c932
--- /dev/null
+++ b/src/runtime/closure_test.go
@@ -0,0 +1,54 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import "testing"
+
+var s int
+
+func BenchmarkCallClosure(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s += func(ii int) int { return 2 * ii }(i)
+ }
+}
+
+func BenchmarkCallClosure1(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ j := i
+ s += func(ii int) int { return 2*ii + j }(i)
+ }
+}
+
+var ss *int
+
+func BenchmarkCallClosure2(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ j := i
+ s += func() int {
+ ss = &j
+ return 2
+ }()
+ }
+}
+
+func addr1(x int) *int {
+ return func() *int { return &x }()
+}
+
+func BenchmarkCallClosure3(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ ss = addr1(i)
+ }
+}
+
+func addr2() (x int, p *int) {
+ return 0, func() *int { return &x }()
+}
+
+func BenchmarkCallClosure4(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ _, ss = addr2()
+ }
+}
diff --git a/src/runtime/compiler.go b/src/runtime/compiler.go
new file mode 100644
index 0000000..1ebc62d
--- /dev/null
+++ b/src/runtime/compiler.go
@@ -0,0 +1,13 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// Compiler is the name of the compiler toolchain that built the
+// running binary. Known toolchains are:
+//
+// gc Also known as cmd/compile.
+// gccgo The gccgo front end, part of the GCC compiler suite.
+//
+const Compiler = "gc"
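
Outside the runtime this constant is simply read; a minimal example:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // Prints "gc" for the standard toolchain and "gccgo" under gccgo.
        fmt.Println("compiler:", runtime.Compiler)
    }
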
diff --git a/src/runtime/complex.go b/src/runtime/complex.go
new file mode 100644
index 0000000..07c596f
--- /dev/null
+++ b/src/runtime/complex.go
@@ -0,0 +1,61 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// inf2one returns a signed 1 if f is an infinity and a signed 0 otherwise.
+// The sign of the result is the sign of f.
+func inf2one(f float64) float64 {
+ g := 0.0
+ if isInf(f) {
+ g = 1.0
+ }
+ return copysign(g, f)
+}
+
+func complex128div(n complex128, m complex128) complex128 {
+ var e, f float64 // complex(e, f) = n/m
+
+ // Algorithm for robust complex division as described in
+ // Robert L. Smith: Algorithm 116: Complex division. Commun. ACM 5(8): 435 (1962).
+ if abs(real(m)) >= abs(imag(m)) {
+ ratio := imag(m) / real(m)
+ denom := real(m) + ratio*imag(m)
+ e = (real(n) + imag(n)*ratio) / denom
+ f = (imag(n) - real(n)*ratio) / denom
+ } else {
+ ratio := real(m) / imag(m)
+ denom := imag(m) + ratio*real(m)
+ e = (real(n)*ratio + imag(n)) / denom
+ f = (imag(n)*ratio - real(n)) / denom
+ }
+
+ if isNaN(e) && isNaN(f) {
+ // Correct final result to infinities and zeros if applicable.
+ // Matches C99: ISO/IEC 9899:1999 - G.5.1 Multiplicative operators.
+
+ a, b := real(n), imag(n)
+ c, d := real(m), imag(m)
+
+ switch {
+ case m == 0 && (!isNaN(a) || !isNaN(b)):
+ e = copysign(inf, c) * a
+ f = copysign(inf, c) * b
+
+ case (isInf(a) || isInf(b)) && isFinite(c) && isFinite(d):
+ a = inf2one(a)
+ b = inf2one(b)
+ e = inf * (a*c + b*d)
+ f = inf * (b*c - a*d)
+
+ case (isInf(c) || isInf(d)) && isFinite(a) && isFinite(b):
+ c = inf2one(c)
+ d = inf2one(d)
+ e = 0 * (a*c + b*d)
+ f = 0 * (b*c - a*d)
+ }
+ }
+
+ return complex(e, f)
+}
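
As a quick sanity check of the first branch (|real(m)| >= |imag(m)|): dividing (1+1i) by (2+2i) gives ratio = 1 and denom = 4, so e = 0.5 and f = 0. The builtin operator, which the gc compiler lowers to complex128div, agrees:

    package main

    import "fmt"

    func main() {
        n, m := complex(1, 1), complex(2, 2)
        // ratio = imag(m)/real(m) = 1, denom = 4, e = 0.5, f = 0
        fmt.Println(n / m) // (0.5+0i)
    }
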
diff --git a/src/runtime/complex_test.go b/src/runtime/complex_test.go
new file mode 100644
index 0000000..f41e6a3
--- /dev/null
+++ b/src/runtime/complex_test.go
@@ -0,0 +1,67 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "math/cmplx"
+ "testing"
+)
+
+var result complex128
+
+func BenchmarkComplex128DivNormal(b *testing.B) {
+ d := 15 + 2i
+ n := 32 + 3i
+ res := 0i
+ for i := 0; i < b.N; i++ {
+ n += 0.1i
+ res += n / d
+ }
+ result = res
+}
+
+func BenchmarkComplex128DivNisNaN(b *testing.B) {
+ d := cmplx.NaN()
+ n := 32 + 3i
+ res := 0i
+ for i := 0; i < b.N; i++ {
+ n += 0.1i
+ res += n / d
+ }
+ result = res
+}
+
+func BenchmarkComplex128DivDisNaN(b *testing.B) {
+ d := 15 + 2i
+ n := cmplx.NaN()
+ res := 0i
+ for i := 0; i < b.N; i++ {
+ d += 0.1i
+ res += n / d
+ }
+ result = res
+}
+
+func BenchmarkComplex128DivNisInf(b *testing.B) {
+ d := 15 + 2i
+ n := cmplx.Inf()
+ res := 0i
+ for i := 0; i < b.N; i++ {
+ d += 0.1i
+ res += n / d
+ }
+ result = res
+}
+
+func BenchmarkComplex128DivDisInf(b *testing.B) {
+ d := cmplx.Inf()
+ n := 32 + 3i
+ res := 0i
+ for i := 0; i < b.N; i++ {
+ n += 0.1i
+ res += n / d
+ }
+ result = res
+}
diff --git a/src/runtime/conv_wasm_test.go b/src/runtime/conv_wasm_test.go
new file mode 100644
index 0000000..5054fca
--- /dev/null
+++ b/src/runtime/conv_wasm_test.go
@@ -0,0 +1,128 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "testing"
+)
+
+var res int64
+var ures uint64
+
+func TestFloatTruncation(t *testing.T) {
+ testdata := []struct {
+ input float64
+ convInt64 int64
+ convUInt64 uint64
+ overflow bool
+ }{
+ // max +- 1
+ {
+ input: 0x7fffffffffffffff,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ // For out-of-bounds conversion, the result is implementation-dependent.
+ // This test verifies the behavior of the wasm implementation.
+ {
+ input: 0x8000000000000000,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: 0x7ffffffffffffffe,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ // neg max +- 1
+ {
+ input: -0x8000000000000000,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: -0x8000000000000001,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: -0x7fffffffffffffff,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ // trunc point +- 1
+ {
+ input: 0x7ffffffffffffdff,
+ convInt64: 0x7ffffffffffffc00,
+ convUInt64: 0x7ffffffffffffc00,
+ },
+ {
+ input: 0x7ffffffffffffe00,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: 0x7ffffffffffffdfe,
+ convInt64: 0x7ffffffffffffc00,
+ convUInt64: 0x7ffffffffffffc00,
+ },
+ // neg trunc point +- 1
+ {
+ input: -0x7ffffffffffffdff,
+ convInt64: -0x7ffffffffffffc00,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: -0x7ffffffffffffe00,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: -0x7ffffffffffffdfe,
+ convInt64: -0x7ffffffffffffc00,
+ convUInt64: 0x8000000000000000,
+ },
+ // umax +- 1
+ {
+ input: 0xffffffffffffffff,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: 0x10000000000000000,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: 0xfffffffffffffffe,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ // umax trunc +- 1
+ {
+ input: 0xfffffffffffffbff,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0xfffffffffffff800,
+ },
+ {
+ input: 0xfffffffffffffc00,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: 0xfffffffffffffbfe,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0xfffffffffffff800,
+ },
+ }
+ for _, item := range testdata {
+ if got, want := int64(item.input), item.convInt64; got != want {
+ t.Errorf("int64(%f): got %x, want %x", item.input, got, want)
+ }
+ if got, want := uint64(item.input), item.convUInt64; got != want {
+ t.Errorf("uint64(%f): got %x, want %x", item.input, got, want)
+ }
+ }
+}
diff --git a/src/runtime/cpuflags.go b/src/runtime/cpuflags.go
new file mode 100644
index 0000000..5104650
--- /dev/null
+++ b/src/runtime/cpuflags.go
@@ -0,0 +1,34 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/cpu"
+ "unsafe"
+)
+
+// Offsets into internal/cpu records for use in assembly.
+const (
+ offsetX86HasAVX = unsafe.Offsetof(cpu.X86.HasAVX)
+ offsetX86HasAVX2 = unsafe.Offsetof(cpu.X86.HasAVX2)
+ offsetX86HasERMS = unsafe.Offsetof(cpu.X86.HasERMS)
+ offsetX86HasSSE2 = unsafe.Offsetof(cpu.X86.HasSSE2)
+
+ offsetARMHasIDIVA = unsafe.Offsetof(cpu.ARM.HasIDIVA)
+
+ offsetMIPS64XHasMSA = unsafe.Offsetof(cpu.MIPS64X.HasMSA)
+)
+
+var (
+ // Set in runtime.cpuinit.
+ // TODO: deprecate these; use internal/cpu directly.
+ x86HasPOPCNT bool
+ x86HasSSE41 bool
+ x86HasFMA bool
+
+ armHasVFPv4 bool
+
+ arm64HasATOMICS bool
+)
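
internal/cpu is importable only from within the standard library. Application code that wants equivalent feature bits usually goes through the public mirror golang.org/x/sys/cpu; a small sketch (the field names are from that package, not from this file):

    package main

    import (
        "fmt"

        "golang.org/x/sys/cpu"
    )

    func main() {
        // Roughly the same information the runtime caches above,
        // exposed for use outside the standard library.
        fmt.Println("AVX2:  ", cpu.X86.HasAVX2)
        fmt.Println("SSE4.1:", cpu.X86.HasSSE41)
        fmt.Println("POPCNT:", cpu.X86.HasPOPCNT)
    }
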
diff --git a/src/runtime/cpuflags_amd64.go b/src/runtime/cpuflags_amd64.go
new file mode 100644
index 0000000..8cca4bc
--- /dev/null
+++ b/src/runtime/cpuflags_amd64.go
@@ -0,0 +1,24 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/cpu"
+)
+
+var useAVXmemmove bool
+
+func init() {
+ // Mask out the stepping and reserved fields.
+ processor := processorVersionInfo & 0x0FFF3FF0
+
+ isIntelBridgeFamily := isIntel &&
+ (processor == 0x206A0 ||
+ processor == 0x206D0 ||
+ processor == 0x306A0 ||
+ processor == 0x306E0)
+
+ useAVXmemmove = cpu.X86.HasAVX && !isIntelBridgeFamily
+}
diff --git a/src/runtime/cpuflags_arm64.go b/src/runtime/cpuflags_arm64.go
new file mode 100644
index 0000000..7576bef
--- /dev/null
+++ b/src/runtime/cpuflags_arm64.go
@@ -0,0 +1,17 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/cpu"
+)
+
+var arm64UseAlignedLoads bool
+
+func init() {
+ if cpu.ARM64.IsNeoverseN1 || cpu.ARM64.IsZeus {
+ arm64UseAlignedLoads = true
+ }
+}
diff --git a/src/runtime/cpuprof.go b/src/runtime/cpuprof.go
new file mode 100644
index 0000000..9bfdfe7
--- /dev/null
+++ b/src/runtime/cpuprof.go
@@ -0,0 +1,215 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// CPU profiling.
+//
+// The signal handler for the profiling clock tick adds a new stack trace
+// to a log of recent traces. The log is read by a user goroutine that
+// turns it into formatted profile data. If the reader does not keep up
+// with the log, those writes will be recorded as a count of lost records.
+// The actual profile buffer is in profbuf.go.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const maxCPUProfStack = 64
+
+type cpuProfile struct {
+ lock mutex
+ on bool // profiling is on
+ log *profBuf // profile events written here
+
+ // extra holds extra stacks accumulated in addNonGo
+ // corresponding to profiling signals arriving on
+ // non-Go-created threads. Those stacks are written
+ // to log the next time a normal Go thread gets the
+ // signal handler.
+ // Assuming the stacks are 2 words each (we don't get
+ // a full traceback from those threads), plus one word
+ // size for framing, 100 Hz profiling would generate
+ // 300 words per second.
+ // Hopefully a normal Go thread will get the profiling
+ // signal at least once every few seconds.
+ extra [1000]uintptr
+ numExtra int
+ lostExtra uint64 // count of frames lost because extra is full
+ lostAtomic uint64 // count of frames lost because of being in atomic64 on mips/arm; updated racily
+}
+
+var cpuprof cpuProfile
+
+// SetCPUProfileRate sets the CPU profiling rate to hz samples per second.
+// If hz <= 0, SetCPUProfileRate turns off profiling.
+// If the profiler is on, the rate cannot be changed without first turning it off.
+//
+// Most clients should use the runtime/pprof package or
+// the testing package's -test.cpuprofile flag instead of calling
+// SetCPUProfileRate directly.
+func SetCPUProfileRate(hz int) {
+ // Clamp hz to something reasonable.
+ if hz < 0 {
+ hz = 0
+ }
+ if hz > 1000000 {
+ hz = 1000000
+ }
+
+ lock(&cpuprof.lock)
+ if hz > 0 {
+ if cpuprof.on || cpuprof.log != nil {
+ print("runtime: cannot set cpu profile rate until previous profile has finished.\n")
+ unlock(&cpuprof.lock)
+ return
+ }
+
+ cpuprof.on = true
+ cpuprof.log = newProfBuf(1, 1<<17, 1<<14)
+ hdr := [1]uint64{uint64(hz)}
+ cpuprof.log.write(nil, nanotime(), hdr[:], nil)
+ setcpuprofilerate(int32(hz))
+ } else if cpuprof.on {
+ setcpuprofilerate(0)
+ cpuprof.on = false
+ cpuprof.addExtra()
+ cpuprof.log.close()
+ }
+ unlock(&cpuprof.lock)
+}
+
+// add adds the stack trace to the profile.
+// It is called from signal handlers and other limited environments
+// and cannot allocate memory or acquire locks that might be
+// held at the time of the signal, nor can it use substantial amounts
+// of stack.
+//go:nowritebarrierrec
+func (p *cpuProfile) add(gp *g, stk []uintptr) {
+ // Simple cas-lock to coordinate with setcpuprofilerate.
+ for !atomic.Cas(&prof.signalLock, 0, 1) {
+ osyield()
+ }
+
+ if prof.hz != 0 { // implies cpuprof.log != nil
+ if p.numExtra > 0 || p.lostExtra > 0 || p.lostAtomic > 0 {
+ p.addExtra()
+ }
+ hdr := [1]uint64{1}
+ // Note: write "knows" that the argument is &gp.labels,
+ // because otherwise its write barrier behavior may not
+ // be correct. See the long comment there before
+ // changing the argument here.
+ cpuprof.log.write(&gp.labels, nanotime(), hdr[:], stk)
+ }
+
+ atomic.Store(&prof.signalLock, 0)
+}
+
+// addNonGo adds the non-Go stack trace to the profile.
+// It is called from a non-Go thread, so we cannot use much stack at all,
+// nor do anything that needs a g or an m.
+// In particular, we can't call cpuprof.log.write.
+// Instead, we copy the stack into cpuprof.extra,
+// which will be drained the next time a Go thread
+// gets the signal handling event.
+//go:nosplit
+//go:nowritebarrierrec
+func (p *cpuProfile) addNonGo(stk []uintptr) {
+ // Simple cas-lock to coordinate with SetCPUProfileRate.
+ // (Other calls to add or addNonGo should be blocked out
+ // by the fact that only one SIGPROF can be handled by the
+ // process at a time. If not, this lock will serialize those too.)
+ for !atomic.Cas(&prof.signalLock, 0, 1) {
+ osyield()
+ }
+
+ if cpuprof.numExtra+1+len(stk) < len(cpuprof.extra) {
+ i := cpuprof.numExtra
+ cpuprof.extra[i] = uintptr(1 + len(stk))
+ copy(cpuprof.extra[i+1:], stk)
+ cpuprof.numExtra += 1 + len(stk)
+ } else {
+ cpuprof.lostExtra++
+ }
+
+ atomic.Store(&prof.signalLock, 0)
+}
+
+// addExtra adds the "extra" profiling events,
+// queued by addNonGo, to the profile log.
+// addExtra is called either from a signal handler on a Go thread
+// or from an ordinary goroutine; either way it can use stack
+// and has a g. The world may be stopped, though.
+func (p *cpuProfile) addExtra() {
+ // Copy accumulated non-Go profile events.
+ hdr := [1]uint64{1}
+ for i := 0; i < p.numExtra; {
+ p.log.write(nil, 0, hdr[:], p.extra[i+1:i+int(p.extra[i])])
+ i += int(p.extra[i])
+ }
+ p.numExtra = 0
+
+ // Report any lost events.
+ if p.lostExtra > 0 {
+ hdr := [1]uint64{p.lostExtra}
+ lostStk := [2]uintptr{
+ funcPC(_LostExternalCode) + sys.PCQuantum,
+ funcPC(_ExternalCode) + sys.PCQuantum,
+ }
+ p.log.write(nil, 0, hdr[:], lostStk[:])
+ p.lostExtra = 0
+ }
+
+ if p.lostAtomic > 0 {
+ hdr := [1]uint64{p.lostAtomic}
+ lostStk := [2]uintptr{
+ funcPC(_LostSIGPROFDuringAtomic64) + sys.PCQuantum,
+ funcPC(_System) + sys.PCQuantum,
+ }
+ p.log.write(nil, 0, hdr[:], lostStk[:])
+ p.lostAtomic = 0
+ }
+
+}
+
+// CPUProfile panics.
+// It formerly provided raw access to chunks of
+// a pprof-format profile generated by the runtime.
+// The details of generating that format have changed,
+// so this functionality has been removed.
+//
+// Deprecated: Use the runtime/pprof package,
+// or the handlers in the net/http/pprof package,
+// or the testing package's -test.cpuprofile flag instead.
+func CPUProfile() []byte {
+ panic("CPUProfile no longer available")
+}
+
+//go:linkname runtime_pprof_runtime_cyclesPerSecond runtime/pprof.runtime_cyclesPerSecond
+func runtime_pprof_runtime_cyclesPerSecond() int64 {
+ return tickspersecond()
+}
+
+// readProfile, provided to runtime/pprof, returns the next chunk of
+// binary CPU profiling stack trace data, blocking until data is available.
+// If profiling is turned off and all the profile data accumulated while it was
+// on has been returned, readProfile returns eof=true.
+// The caller must save the returned data and tags before calling readProfile again.
+//
+//go:linkname runtime_pprof_readProfile runtime/pprof.readProfile
+func runtime_pprof_readProfile() ([]uint64, []unsafe.Pointer, bool) {
+ lock(&cpuprof.lock)
+ log := cpuprof.log
+ unlock(&cpuprof.lock)
+ data, tags, eof := log.read(profBufBlocking)
+ if len(data) == 0 && eof {
+ lock(&cpuprof.lock)
+ cpuprof.log = nil
+ unlock(&cpuprof.lock)
+ }
+ return data, tags, eof
+}
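
As the SetCPUProfileRate comment notes, most programs should not call into this file directly; the usual entry point is runtime/pprof, which sets the rate and drains the profile buffer. A minimal sketch of that recommended usage (the file name and workload are placeholders):

    package main

    import (
        "log"
        "os"
        "runtime/pprof"
    )

    func main() {
        f, err := os.Create("cpu.prof")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // StartCPUProfile turns on sampling and starts a goroutine that
        // reads the profile buffer described above via readProfile.
        if err := pprof.StartCPUProfile(f); err != nil {
            log.Fatal(err)
        }
        defer pprof.StopCPUProfile()

        work()
    }

    func work() {
        s := 0
        for i := 0; i < 1e8; i++ {
            s += i
        }
        _ = s
    }
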
diff --git a/src/runtime/cputicks.go b/src/runtime/cputicks.go
new file mode 100644
index 0000000..7beb57e
--- /dev/null
+++ b/src/runtime/cputicks.go
@@ -0,0 +1,17 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !arm
+// +build !arm64
+// +build !mips64
+// +build !mips64le
+// +build !mips
+// +build !mipsle
+// +build !wasm
+
+package runtime
+
+// careful: cputicks is not guaranteed to be monotonic! In particular, we have
+// noticed drift between cpus on certain os/arch combinations. See issue 8976.
+func cputicks() int64
diff --git a/src/runtime/crash_cgo_test.go b/src/runtime/crash_cgo_test.go
new file mode 100644
index 0000000..140c170
--- /dev/null
+++ b/src/runtime/crash_cgo_test.go
@@ -0,0 +1,633 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build cgo
+
+package runtime_test
+
+import (
+ "bytes"
+ "fmt"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "runtime"
+ "strconv"
+ "strings"
+ "testing"
+ "time"
+)
+
+func TestCgoCrashHandler(t *testing.T) {
+ t.Parallel()
+ testCrashHandler(t, true)
+}
+
+func TestCgoSignalDeadlock(t *testing.T) {
+ // Don't call t.Parallel, since too much work going on at the
+ // same time can cause the testprogcgo code to overrun its
+ // timeouts (issue #18598).
+
+ if testing.Short() && runtime.GOOS == "windows" {
+ t.Skip("Skipping in short mode") // takes up to 64 seconds
+ }
+ got := runTestProg(t, "testprogcgo", "CgoSignalDeadlock")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got:\n%s", want, got)
+ }
+}
+
+func TestCgoTraceback(t *testing.T) {
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "CgoTraceback")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got:\n%s", want, got)
+ }
+}
+
+func TestCgoCallbackGC(t *testing.T) {
+ t.Parallel()
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no pthreads on %s", runtime.GOOS)
+ }
+ if testing.Short() {
+ switch {
+ case runtime.GOOS == "dragonfly":
+ t.Skip("see golang.org/issue/11990")
+ case runtime.GOOS == "linux" && runtime.GOARCH == "arm":
+ t.Skip("too slow for arm builders")
+ case runtime.GOOS == "linux" && (runtime.GOARCH == "mips64" || runtime.GOARCH == "mips64le"):
+ t.Skip("too slow for mips64x builders")
+ }
+ }
+ got := runTestProg(t, "testprogcgo", "CgoCallbackGC")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got:\n%s", want, got)
+ }
+}
+
+func TestCgoExternalThreadPanic(t *testing.T) {
+ t.Parallel()
+ if runtime.GOOS == "plan9" {
+ t.Skipf("no pthreads on %s", runtime.GOOS)
+ }
+ got := runTestProg(t, "testprogcgo", "CgoExternalThreadPanic")
+ want := "panic: BOOM"
+ if !strings.Contains(got, want) {
+ t.Fatalf("want failure containing %q. output:\n%s\n", want, got)
+ }
+}
+
+func TestCgoExternalThreadSIGPROF(t *testing.T) {
+ t.Parallel()
+ // issue 9456.
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no pthreads on %s", runtime.GOOS)
+ }
+ if runtime.GOARCH == "ppc64" && runtime.GOOS == "linux" {
+ // TODO(austin) External linking not implemented on
+ // linux/ppc64 (issue #8912)
+ t.Skipf("no external linking on ppc64")
+ }
+
+ exe, err := buildTestProg(t, "testprogcgo", "-tags=threadprof")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ got, err := testenv.CleanCmdEnv(exec.Command(exe, "CgoExternalThreadSIGPROF")).CombinedOutput()
+ if err != nil {
+ t.Fatalf("exit status: %v\n%s", err, got)
+ }
+
+ if want := "OK\n"; string(got) != want {
+ t.Fatalf("expected %q, but got:\n%s", want, got)
+ }
+}
+
+func TestCgoExternalThreadSignal(t *testing.T) {
+ t.Parallel()
+ // issue 10139
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no pthreads on %s", runtime.GOOS)
+ }
+
+ exe, err := buildTestProg(t, "testprogcgo", "-tags=threadprof")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ got, err := testenv.CleanCmdEnv(exec.Command(exe, "CgoExternalThreadSignal")).CombinedOutput()
+ if err != nil {
+ t.Fatalf("exit status: %v\n%s", err, got)
+ }
+
+ want := []byte("OK\n")
+ if !bytes.Equal(got, want) {
+ t.Fatalf("expected %q, but got:\n%s", want, got)
+ }
+}
+
+func TestCgoDLLImports(t *testing.T) {
+ // test issue 9356
+ if runtime.GOOS != "windows" {
+ t.Skip("skipping windows specific test")
+ }
+ got := runTestProg(t, "testprogcgo", "CgoDLLImportsMain")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got %v", want, got)
+ }
+}
+
+func TestCgoExecSignalMask(t *testing.T) {
+ t.Parallel()
+ // Test issue 13164.
+ switch runtime.GOOS {
+ case "windows", "plan9":
+ t.Skipf("skipping signal mask test on %s", runtime.GOOS)
+ }
+ got := runTestProg(t, "testprogcgo", "CgoExecSignalMask", "GOTRACEBACK=system")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q, got %v", want, got)
+ }
+}
+
+func TestEnsureDropM(t *testing.T) {
+ t.Parallel()
+ // Test for issue 13881.
+ switch runtime.GOOS {
+ case "windows", "plan9":
+ t.Skipf("skipping dropm test on %s", runtime.GOOS)
+ }
+ got := runTestProg(t, "testprogcgo", "EnsureDropM")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q, got %v", want, got)
+ }
+}
+
+// Test for issue 14387.
+ // Test that a program that doesn't need any cgo pointer checking
+ // takes about the same amount of time with checking enabled as without it.
+func TestCgoCheckBytes(t *testing.T) {
+ t.Parallel()
+ // Make sure we don't count the build time as part of the run time.
+ testenv.MustHaveGoBuild(t)
+ exe, err := buildTestProg(t, "testprogcgo")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ // Try it 10 times to avoid flakiness.
+ const tries = 10
+ var tot1, tot2 time.Duration
+ for i := 0; i < tries; i++ {
+ cmd := testenv.CleanCmdEnv(exec.Command(exe, "CgoCheckBytes"))
+ cmd.Env = append(cmd.Env, "GODEBUG=cgocheck=0", fmt.Sprintf("GO_CGOCHECKBYTES_TRY=%d", i))
+
+ start := time.Now()
+ cmd.Run()
+ d1 := time.Since(start)
+
+ cmd = testenv.CleanCmdEnv(exec.Command(exe, "CgoCheckBytes"))
+ cmd.Env = append(cmd.Env, fmt.Sprintf("GO_CGOCHECKBYTES_TRY=%d", i))
+
+ start = time.Now()
+ cmd.Run()
+ d2 := time.Since(start)
+
+ if d1*20 > d2 {
+ // The slow version (d2) was less than 20 times
+ // slower than the fast version (d1), so OK.
+ return
+ }
+
+ tot1 += d1
+ tot2 += d2
+ }
+
+ t.Errorf("cgo check too slow: got %v, expected at most %v", tot2/tries, (tot1/tries)*20)
+}
+
+func TestCgoPanicDeadlock(t *testing.T) {
+ t.Parallel()
+ // test issue 14432
+ got := runTestProg(t, "testprogcgo", "CgoPanicDeadlock")
+ want := "panic: cgo error\n\n"
+ if !strings.HasPrefix(got, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, got)
+ }
+}
+
+func TestCgoCCodeSIGPROF(t *testing.T) {
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "CgoCCodeSIGPROF")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q got %v", want, got)
+ }
+}
+
+func TestCgoCrashTraceback(t *testing.T) {
+ t.Parallel()
+ switch platform := runtime.GOOS + "/" + runtime.GOARCH; platform {
+ case "darwin/amd64":
+ case "linux/amd64":
+ case "linux/ppc64le":
+ default:
+ t.Skipf("not yet supported on %s", platform)
+ }
+ got := runTestProg(t, "testprogcgo", "CrashTraceback")
+ for i := 1; i <= 3; i++ {
+ if !strings.Contains(got, fmt.Sprintf("cgo symbolizer:%d", i)) {
+ t.Errorf("missing cgo symbolizer:%d", i)
+ }
+ }
+}
+
+func TestCgoCrashTracebackGo(t *testing.T) {
+ t.Parallel()
+ switch platform := runtime.GOOS + "/" + runtime.GOARCH; platform {
+ case "darwin/amd64":
+ case "linux/amd64":
+ case "linux/ppc64le":
+ default:
+ t.Skipf("not yet supported on %s", platform)
+ }
+ got := runTestProg(t, "testprogcgo", "CrashTracebackGo")
+ for i := 1; i <= 3; i++ {
+ want := fmt.Sprintf("main.h%d", i)
+ if !strings.Contains(got, want) {
+ t.Errorf("missing %s", want)
+ }
+ }
+}
+
+func TestCgoTracebackContext(t *testing.T) {
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "TracebackContext")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q got %v", want, got)
+ }
+}
+
+func testCgoPprof(t *testing.T, buildArg, runArg, top, bottom string) {
+ t.Parallel()
+ if runtime.GOOS != "linux" || (runtime.GOARCH != "amd64" && runtime.GOARCH != "ppc64le") {
+ t.Skipf("not yet supported on %s/%s", runtime.GOOS, runtime.GOARCH)
+ }
+ testenv.MustHaveGoRun(t)
+
+ exe, err := buildTestProg(t, "testprogcgo", buildArg)
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ // pprofCgoTraceback is called whenever cgo code is executing and a signal
+ // is received. Disable signal preemption to increase the likelihood that at
+ // least one SIGPROF signal is delivered to capture a sample. See issue #37201.
+ cmd := testenv.CleanCmdEnv(exec.Command(exe, runArg))
+ cmd.Env = append(cmd.Env, "GODEBUG=asyncpreemptoff=1")
+
+ got, err := cmd.CombinedOutput()
+ if err != nil {
+ if testenv.Builder() == "linux-amd64-alpine" {
+ // See Issue 18243 and Issue 19938.
+ t.Skipf("Skipping failing test on Alpine (golang.org/issue/18243). Ignoring error: %v", err)
+ }
+ t.Fatalf("%s\n\n%v", got, err)
+ }
+ fn := strings.TrimSpace(string(got))
+ defer os.Remove(fn)
+
+ for try := 0; try < 2; try++ {
+ cmd := testenv.CleanCmdEnv(exec.Command(testenv.GoToolPath(t), "tool", "pprof", "-traces"))
+ // Check that pprof works both with and without explicit executable on command line.
+ if try == 0 {
+ cmd.Args = append(cmd.Args, exe, fn)
+ } else {
+ cmd.Args = append(cmd.Args, fn)
+ }
+
+ found := false
+ for i, e := range cmd.Env {
+ if strings.HasPrefix(e, "PPROF_TMPDIR=") {
+ cmd.Env[i] = "PPROF_TMPDIR=" + os.TempDir()
+ found = true
+ break
+ }
+ }
+ if !found {
+ cmd.Env = append(cmd.Env, "PPROF_TMPDIR="+os.TempDir())
+ }
+
+ out, err := cmd.CombinedOutput()
+ t.Logf("%s:\n%s", cmd.Args, out)
+ if err != nil {
+ t.Error(err)
+ continue
+ }
+
+ trace := findTrace(string(out), top)
+ if len(trace) == 0 {
+ t.Errorf("%s traceback missing.", top)
+ continue
+ }
+ if trace[len(trace)-1] != bottom {
+ t.Errorf("invalid traceback origin: got=%v; want=[%s ... %s]", trace, top, bottom)
+ }
+ }
+}
+
+func TestCgoPprof(t *testing.T) {
+ testCgoPprof(t, "", "CgoPprof", "cpuHog", "runtime.main")
+}
+
+func TestCgoPprofPIE(t *testing.T) {
+ testCgoPprof(t, "-buildmode=pie", "CgoPprof", "cpuHog", "runtime.main")
+}
+
+func TestCgoPprofThread(t *testing.T) {
+ testCgoPprof(t, "", "CgoPprofThread", "cpuHogThread", "cpuHogThread2")
+}
+
+func TestCgoPprofThreadNoTraceback(t *testing.T) {
+ testCgoPprof(t, "", "CgoPprofThreadNoTraceback", "cpuHogThread", "runtime._ExternalCode")
+}
+
+func TestRaceProf(t *testing.T) {
+ if (runtime.GOOS != "linux" && runtime.GOOS != "freebsd") || runtime.GOARCH != "amd64" {
+ t.Skipf("not yet supported on %s/%s", runtime.GOOS, runtime.GOARCH)
+ }
+
+ testenv.MustHaveGoRun(t)
+
+ // This test requires building various packages with -race, so
+ // it's somewhat slow.
+ if testing.Short() {
+ t.Skip("skipping test in -short mode")
+ }
+
+ exe, err := buildTestProg(t, "testprogcgo", "-race")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ got, err := testenv.CleanCmdEnv(exec.Command(exe, "CgoRaceprof")).CombinedOutput()
+ if err != nil {
+ t.Fatal(err)
+ }
+ want := "OK\n"
+ if string(got) != want {
+ t.Errorf("expected %q got %s", want, got)
+ }
+}
+
+func TestRaceSignal(t *testing.T) {
+ t.Parallel()
+ if (runtime.GOOS != "linux" && runtime.GOOS != "freebsd") || runtime.GOARCH != "amd64" {
+ t.Skipf("not yet supported on %s/%s", runtime.GOOS, runtime.GOARCH)
+ }
+
+ testenv.MustHaveGoRun(t)
+
+ // This test requires building various packages with -race, so
+ // it's somewhat slow.
+ if testing.Short() {
+ t.Skip("skipping test in -short mode")
+ }
+
+ exe, err := buildTestProg(t, "testprogcgo", "-race")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ got, err := testenv.CleanCmdEnv(exec.Command(exe, "CgoRaceSignal")).CombinedOutput()
+ if err != nil {
+ t.Logf("%s\n", got)
+ t.Fatal(err)
+ }
+ want := "OK\n"
+ if string(got) != want {
+ t.Errorf("expected %q got %s", want, got)
+ }
+}
+
+func TestCgoNumGoroutine(t *testing.T) {
+ switch runtime.GOOS {
+ case "windows", "plan9":
+ t.Skipf("skipping numgoroutine test on %s", runtime.GOOS)
+ }
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "NumGoroutine")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q got %v", want, got)
+ }
+}
+
+func TestCatchPanic(t *testing.T) {
+ t.Parallel()
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no signals on %s", runtime.GOOS)
+ case "darwin":
+ if runtime.GOARCH == "amd64" {
+ t.Skipf("crash() on darwin/amd64 doesn't raise SIGABRT")
+ }
+ }
+
+ testenv.MustHaveGoRun(t)
+
+ exe, err := buildTestProg(t, "testprogcgo")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ for _, early := range []bool{true, false} {
+ cmd := testenv.CleanCmdEnv(exec.Command(exe, "CgoCatchPanic"))
+ // Make sure a panic results in a crash.
+ cmd.Env = append(cmd.Env, "GOTRACEBACK=crash")
+ if early {
+ // Tell testprogcgo to install an early signal handler for SIGABRT
+ cmd.Env = append(cmd.Env, "CGOCATCHPANIC_EARLY_HANDLER=1")
+ }
+ if out, err := cmd.CombinedOutput(); err != nil {
+ t.Errorf("testprogcgo CgoCatchPanic failed: %v\n%s", err, out)
+ }
+ }
+}
+
+func TestCgoLockOSThreadExit(t *testing.T) {
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no pthreads on %s", runtime.GOOS)
+ }
+ t.Parallel()
+ testLockOSThreadExit(t, "testprogcgo")
+}
+
+func TestWindowsStackMemoryCgo(t *testing.T) {
+ if runtime.GOOS != "windows" {
+ t.Skip("skipping windows specific test")
+ }
+ testenv.SkipFlaky(t, 22575)
+ o := runTestProg(t, "testprogcgo", "StackMemory")
+ stackUsage, err := strconv.Atoi(o)
+ if err != nil {
+ t.Fatalf("Failed to read stack usage: %v", err)
+ }
+ if expected, got := 100<<10, stackUsage; got > expected {
+ t.Fatalf("expected < %d bytes of memory per thread, got %d", expected, got)
+ }
+}
+
+func TestSigStackSwapping(t *testing.T) {
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no sigaltstack on %s", runtime.GOOS)
+ }
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "SigStack")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q got %v", want, got)
+ }
+}
+
+func TestCgoTracebackSigpanic(t *testing.T) {
+ // Test unwinding over a sigpanic in C code without a C
+ // symbolizer. See issue #23576.
+ if runtime.GOOS == "windows" {
+ // On Windows if we get an exception in C code, we let
+ // the Windows exception handler unwind it, rather
+ // than injecting a sigpanic.
+ t.Skip("no sigpanic in C on windows")
+ }
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "TracebackSigpanic")
+ want := "runtime.sigpanic"
+ if !strings.Contains(got, want) {
+ t.Fatalf("want failure containing %q. output:\n%s\n", want, got)
+ }
+ nowant := "unexpected return pc"
+ if strings.Contains(got, nowant) {
+ t.Fatalf("failure incorrectly contains %q. output:\n%s\n", nowant, got)
+ }
+}
+
+// Test that C code called via cgo can use large Windows thread stacks
+// and call back in to Go without crashing. See issue #20975.
+//
+// See also TestBigStackCallbackSyscall.
+func TestBigStackCallbackCgo(t *testing.T) {
+ if runtime.GOOS != "windows" {
+ t.Skip("skipping windows specific test")
+ }
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "BigStack")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q got %v", want, got)
+ }
+}
+
+func nextTrace(lines []string) ([]string, []string) {
+ var trace []string
+ for n, line := range lines {
+ if strings.HasPrefix(line, "---") {
+ return trace, lines[n+1:]
+ }
+ fields := strings.Fields(strings.TrimSpace(line))
+ if len(fields) == 0 {
+ continue
+ }
+ // Last field contains the function name.
+ trace = append(trace, fields[len(fields)-1])
+ }
+ return nil, nil
+}
+
+func findTrace(text, top string) []string {
+ lines := strings.Split(text, "\n")
+ _, lines = nextTrace(lines) // Skip the header.
+ for len(lines) > 0 {
+ var t []string
+ t, lines = nextTrace(lines)
+ if len(t) == 0 {
+ continue
+ }
+ if t[0] == top {
+ return t
+ }
+ }
+ return nil
+}
+
+func TestSegv(t *testing.T) {
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no signals on %s", runtime.GOOS)
+ }
+
+ for _, test := range []string{"Segv", "SegvInCgo"} {
+ t.Run(test, func(t *testing.T) {
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", test)
+ t.Log(got)
+ if !strings.Contains(got, "SIGSEGV") {
+ t.Errorf("expected crash from signal")
+ }
+ })
+ }
+}
+
+// TestEINTR tests that we handle EINTR correctly.
+// See issue #20400 and friends.
+func TestEINTR(t *testing.T) {
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no EINTR on %s", runtime.GOOS)
+ case "linux":
+ if runtime.GOARCH == "386" {
+ // On linux-386 the Go signal handler sets
+ // a restorer function that is not preserved
+ // by the C sigaction call in the test,
+ // causing the signal handler to crash when
+ // returning to normal code. The test is not
+ // architecture-specific, so just skip on 386
+ // rather than doing a complicated workaround.
+ t.Skip("skipping on linux-386; C sigaction does not preserve Go restorer")
+ }
+ }
+
+ t.Parallel()
+ output := runTestProg(t, "testprogcgo", "EINTR")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+// Issue #42207.
+func TestNeedmDeadlock(t *testing.T) {
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no signals on %s", runtime.GOOS)
+ }
+ output := runTestProg(t, "testprogcgo", "NeedmDeadlock")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
diff --git a/src/runtime/crash_nonunix_test.go b/src/runtime/crash_nonunix_test.go
new file mode 100644
index 0000000..06c197e
--- /dev/null
+++ b/src/runtime/crash_nonunix_test.go
@@ -0,0 +1,13 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build windows plan9 js,wasm
+
+package runtime_test
+
+import "os"
+
+// sigquit is the signal to send to kill a hanging testdata program.
+// On Unix we send SIGQUIT, but on non-Unix we only have os.Kill.
+var sigquit = os.Kill
diff --git a/src/runtime/crash_test.go b/src/runtime/crash_test.go
new file mode 100644
index 0000000..e5bd797
--- /dev/null
+++ b/src/runtime/crash_test.go
@@ -0,0 +1,817 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "bytes"
+ "flag"
+ "fmt"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "regexp"
+ "runtime"
+ "strconv"
+ "strings"
+ "sync"
+ "testing"
+ "time"
+)
+
+var toRemove []string
+
+func TestMain(m *testing.M) {
+ status := m.Run()
+ for _, file := range toRemove {
+ os.RemoveAll(file)
+ }
+ os.Exit(status)
+}
+
+var testprog struct {
+ sync.Mutex
+ dir string
+ target map[string]buildexe
+}
+
+type buildexe struct {
+ exe string
+ err error
+}
+
+func runTestProg(t *testing.T, binary, name string, env ...string) string {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+
+ testenv.MustHaveGoBuild(t)
+
+ exe, err := buildTestProg(t, binary)
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ return runBuiltTestProg(t, exe, name, env...)
+}
+
+func runBuiltTestProg(t *testing.T, exe, name string, env ...string) string {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+
+ testenv.MustHaveGoBuild(t)
+
+ cmd := testenv.CleanCmdEnv(exec.Command(exe, name))
+ cmd.Env = append(cmd.Env, env...)
+ if testing.Short() {
+ cmd.Env = append(cmd.Env, "RUNTIME_TEST_SHORT=1")
+ }
+ var b bytes.Buffer
+ cmd.Stdout = &b
+ cmd.Stderr = &b
+ if err := cmd.Start(); err != nil {
+ t.Fatalf("starting %s %s: %v", exe, name, err)
+ }
+
+ // If the process doesn't complete within 1 minute,
+ // assume it is hanging and kill it to get a stack trace.
+ p := cmd.Process
+ done := make(chan bool)
+ go func() {
+ scale := 1
+ // This GOARCH/GOOS test is copied from cmd/dist/test.go.
+ // TODO(iant): Have cmd/dist update the environment variable.
+ if runtime.GOARCH == "arm" || runtime.GOOS == "windows" {
+ scale = 2
+ }
+ if s := os.Getenv("GO_TEST_TIMEOUT_SCALE"); s != "" {
+ if sc, err := strconv.Atoi(s); err == nil {
+ scale = sc
+ }
+ }
+
+ select {
+ case <-done:
+ case <-time.After(time.Duration(scale) * time.Minute):
+ p.Signal(sigquit)
+ }
+ }()
+
+ if err := cmd.Wait(); err != nil {
+ t.Logf("%s %s exit status: %v", exe, name, err)
+ }
+ close(done)
+
+ return b.String()
+}
+
+func buildTestProg(t *testing.T, binary string, flags ...string) (string, error) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+
+ testprog.Lock()
+ defer testprog.Unlock()
+ if testprog.dir == "" {
+ dir, err := os.MkdirTemp("", "go-build")
+ if err != nil {
+ t.Fatalf("failed to create temp directory: %v", err)
+ }
+ testprog.dir = dir
+ toRemove = append(toRemove, dir)
+ }
+
+ if testprog.target == nil {
+ testprog.target = make(map[string]buildexe)
+ }
+ name := binary
+ if len(flags) > 0 {
+ name += "_" + strings.Join(flags, "_")
+ }
+ target, ok := testprog.target[name]
+ if ok {
+ return target.exe, target.err
+ }
+
+ exe := filepath.Join(testprog.dir, name+".exe")
+ cmd := exec.Command(testenv.GoToolPath(t), append([]string{"build", "-o", exe}, flags...)...)
+ cmd.Dir = "testdata/" + binary
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ target.err = fmt.Errorf("building %s %v: %v\n%s", binary, flags, err, out)
+ testprog.target[name] = target
+ return "", target.err
+ }
+ target.exe = exe
+ testprog.target[name] = target
+ return exe, nil
+}
+
+func TestVDSO(t *testing.T) {
+ t.Parallel()
+ output := runTestProg(t, "testprog", "SignalInVDSO")
+ want := "success\n"
+ if output != want {
+ t.Fatalf("output:\n%s\n\nwanted:\n%s", output, want)
+ }
+}
+
+func testCrashHandler(t *testing.T, cgo bool) {
+ type crashTest struct {
+ Cgo bool
+ }
+ var output string
+ if cgo {
+ output = runTestProg(t, "testprogcgo", "Crash")
+ } else {
+ output = runTestProg(t, "testprog", "Crash")
+ }
+ want := "main: recovered done\nnew-thread: recovered done\nsecond-new-thread: recovered done\nmain-again: recovered done\n"
+ if output != want {
+ t.Fatalf("output:\n%s\n\nwanted:\n%s", output, want)
+ }
+}
+
+func TestCrashHandler(t *testing.T) {
+ testCrashHandler(t, false)
+}
+
+func testDeadlock(t *testing.T, name string) {
+ // External linking brings in cgo, which prevents deadlock detection from working.
+ testenv.MustInternalLink(t)
+
+ output := runTestProg(t, "testprog", name)
+ want := "fatal error: all goroutines are asleep - deadlock!\n"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestSimpleDeadlock(t *testing.T) {
+ testDeadlock(t, "SimpleDeadlock")
+}
+
+func TestInitDeadlock(t *testing.T) {
+ testDeadlock(t, "InitDeadlock")
+}
+
+func TestLockedDeadlock(t *testing.T) {
+ testDeadlock(t, "LockedDeadlock")
+}
+
+func TestLockedDeadlock2(t *testing.T) {
+ testDeadlock(t, "LockedDeadlock2")
+}
+
+func TestGoexitDeadlock(t *testing.T) {
+ // External linking brings in cgo, which prevents deadlock detection from working.
+ testenv.MustInternalLink(t)
+
+ output := runTestProg(t, "testprog", "GoexitDeadlock")
+ want := "no goroutines (main called runtime.Goexit) - deadlock!"
+ if !strings.Contains(output, want) {
+ t.Fatalf("output:\n%s\n\nwant output containing: %s", output, want)
+ }
+}
+
+func TestStackOverflow(t *testing.T) {
+ output := runTestProg(t, "testprog", "StackOverflow")
+ want := []string{
+ "runtime: goroutine stack exceeds 1474560-byte limit\n",
+ "fatal error: stack overflow",
+ // information about the current SP and stack bounds
+ "runtime: sp=",
+ "stack=[",
+ }
+ if !strings.HasPrefix(output, want[0]) {
+ t.Errorf("output does not start with %q", want[0])
+ }
+ for _, s := range want[1:] {
+ if !strings.Contains(output, s) {
+ t.Errorf("output does not contain %q", s)
+ }
+ }
+ if t.Failed() {
+ t.Logf("output:\n%s", output)
+ }
+}
+
+func TestThreadExhaustion(t *testing.T) {
+ output := runTestProg(t, "testprog", "ThreadExhaustion")
+ want := "runtime: program exceeds 10-thread limit\nfatal error: thread exhaustion"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestRecursivePanic(t *testing.T) {
+ output := runTestProg(t, "testprog", "RecursivePanic")
+ want := `wrap: bad
+panic: again
+
+`
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+
+}
+
+func TestRecursivePanic2(t *testing.T) {
+ output := runTestProg(t, "testprog", "RecursivePanic2")
+ want := `first panic
+second panic
+panic: third panic
+
+`
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+
+}
+
+func TestRecursivePanic3(t *testing.T) {
+ output := runTestProg(t, "testprog", "RecursivePanic3")
+ want := `panic: first panic
+
+`
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+
+}
+
+func TestRecursivePanic4(t *testing.T) {
+ output := runTestProg(t, "testprog", "RecursivePanic4")
+ want := `panic: first panic [recovered]
+ panic: second panic
+`
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+
+}
+
+func TestRecursivePanic5(t *testing.T) {
+ output := runTestProg(t, "testprog", "RecursivePanic5")
+ want := `first panic
+second panic
+panic: third panic
+`
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+
+}
+
+func TestGoexitCrash(t *testing.T) {
+ // External linking brings in cgo, which prevents deadlock detection from working.
+ testenv.MustInternalLink(t)
+
+ output := runTestProg(t, "testprog", "GoexitExit")
+ want := "no goroutines (main called runtime.Goexit) - deadlock!"
+ if !strings.Contains(output, want) {
+ t.Fatalf("output:\n%s\n\nwant output containing: %s", output, want)
+ }
+}
+
+func TestGoexitDefer(t *testing.T) {
+ c := make(chan struct{})
+ go func() {
+ defer func() {
+ r := recover()
+ if r != nil {
+ t.Errorf("non-nil recover during Goexit")
+ }
+ c <- struct{}{}
+ }()
+ runtime.Goexit()
+ }()
+ // Note: if the defer fails to run, we will get a deadlock here
+ <-c
+}
+
+func TestGoNil(t *testing.T) {
+ output := runTestProg(t, "testprog", "GoNil")
+ want := "go of nil func value"
+ if !strings.Contains(output, want) {
+ t.Fatalf("output:\n%s\n\nwant output containing: %s", output, want)
+ }
+}
+
+func TestMainGoroutineID(t *testing.T) {
+ output := runTestProg(t, "testprog", "MainGoroutineID")
+ want := "panic: test\n\ngoroutine 1 [running]:\n"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestNoHelperGoroutines(t *testing.T) {
+ output := runTestProg(t, "testprog", "NoHelperGoroutines")
+ matches := regexp.MustCompile(`goroutine [0-9]+ \[`).FindAllStringSubmatch(output, -1)
+ if len(matches) != 1 || matches[0][0] != "goroutine 1 [" {
+ t.Fatalf("want to see only goroutine 1, see:\n%s", output)
+ }
+}
+
+func TestBreakpoint(t *testing.T) {
+ output := runTestProg(t, "testprog", "Breakpoint")
+ // If runtime.Breakpoint() is inlined, then the stack trace prints
+ // "runtime.Breakpoint(...)" instead of "runtime.Breakpoint()".
+ want := "runtime.Breakpoint("
+ if !strings.Contains(output, want) {
+ t.Fatalf("output:\n%s\n\nwant output containing: %s", output, want)
+ }
+}
+
+func TestGoexitInPanic(t *testing.T) {
+ // External linking brings in cgo, which prevents deadlock detection from working.
+ testenv.MustInternalLink(t)
+
+ // see issue 8774: this code used to trigger an infinite recursion
+ output := runTestProg(t, "testprog", "GoexitInPanic")
+ want := "fatal error: no goroutines (main called runtime.Goexit) - deadlock!"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+// Issue 14965: Runtime panics should be of type runtime.Error
+func TestRuntimePanicWithRuntimeError(t *testing.T) {
+ testCases := [...]func(){
+ 0: func() {
+ var m map[uint64]bool
+ m[1234] = true
+ },
+ 1: func() {
+ ch := make(chan struct{})
+ close(ch)
+ close(ch)
+ },
+ 2: func() {
+ var ch = make(chan struct{})
+ close(ch)
+ ch <- struct{}{}
+ },
+ 3: func() {
+ var s = make([]int, 2)
+ _ = s[2]
+ },
+ 4: func() {
+ n := -1
+ _ = make(chan bool, n)
+ },
+ 5: func() {
+ close((chan bool)(nil))
+ },
+ }
+
+ for i, fn := range testCases {
+ got := panicValue(fn)
+ if _, ok := got.(runtime.Error); !ok {
+ t.Errorf("test #%d: recovered value %v(type %T) does not implement runtime.Error", i, got, got)
+ }
+ }
+}
+
+func panicValue(fn func()) (recovered interface{}) {
+ defer func() {
+ recovered = recover()
+ }()
+ fn()
+ return
+}
+
+func TestPanicAfterGoexit(t *testing.T) {
+ // an uncaught panic should still work after goexit
+ output := runTestProg(t, "testprog", "PanicAfterGoexit")
+ want := "panic: hello"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestRecoveredPanicAfterGoexit(t *testing.T) {
+ // External linking brings in cgo, which prevents deadlock detection from working.
+ testenv.MustInternalLink(t)
+
+ output := runTestProg(t, "testprog", "RecoveredPanicAfterGoexit")
+ want := "fatal error: no goroutines (main called runtime.Goexit) - deadlock!"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestRecoverBeforePanicAfterGoexit(t *testing.T) {
+ // External linking brings in cgo, which prevents deadlock detection from working.
+ testenv.MustInternalLink(t)
+
+ t.Parallel()
+ output := runTestProg(t, "testprog", "RecoverBeforePanicAfterGoexit")
+ want := "fatal error: no goroutines (main called runtime.Goexit) - deadlock!"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestRecoverBeforePanicAfterGoexit2(t *testing.T) {
+ // External linking brings in cgo, which prevents deadlock detection from working.
+ testenv.MustInternalLink(t)
+
+ t.Parallel()
+ output := runTestProg(t, "testprog", "RecoverBeforePanicAfterGoexit2")
+ want := "fatal error: no goroutines (main called runtime.Goexit) - deadlock!"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestNetpollDeadlock(t *testing.T) {
+ if os.Getenv("GO_BUILDER_NAME") == "darwin-amd64-10_12" {
+ // A suspected kernel bug in macOS 10.12 occasionally results in
+ // an apparent deadlock when dialing localhost. The errors have not
+ // been observed on newer versions of the OS, so we don't plan to work
+ // around them. See https://golang.org/issue/22019.
+ testenv.SkipFlaky(t, 22019)
+ }
+
+ t.Parallel()
+ output := runTestProg(t, "testprognet", "NetpollDeadlock")
+ want := "done\n"
+ if !strings.HasSuffix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestPanicTraceback(t *testing.T) {
+ t.Parallel()
+ output := runTestProg(t, "testprog", "PanicTraceback")
+ want := "panic: hello\n\tpanic: panic pt2\n\tpanic: panic pt1\n"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+
+ // Check functions in the traceback.
+ fns := []string{"main.pt1.func1", "panic", "main.pt2.func1", "panic", "main.pt2", "main.pt1"}
+ for _, fn := range fns {
+ re := regexp.MustCompile(`(?m)^` + regexp.QuoteMeta(fn) + `\(.*\n`)
+ idx := re.FindStringIndex(output)
+ if idx == nil {
+ t.Fatalf("expected %q function in traceback:\n%s", fn, output)
+ }
+ output = output[idx[1]:]
+ }
+}
+
+func testPanicDeadlock(t *testing.T, name string, want string) {
+ // test issue 14432
+ output := runTestProg(t, "testprog", name)
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestPanicDeadlockGosched(t *testing.T) {
+ testPanicDeadlock(t, "GoschedInPanic", "panic: errorThatGosched\n\n")
+}
+
+func TestPanicDeadlockSyscall(t *testing.T) {
+ testPanicDeadlock(t, "SyscallInPanic", "1\n2\npanic: 3\n\n")
+}
+
+func TestPanicLoop(t *testing.T) {
+ output := runTestProg(t, "testprog", "PanicLoop")
+ if want := "panic while printing panic value"; !strings.Contains(output, want) {
+ t.Errorf("output does not contain %q:\n%s", want, output)
+ }
+}
+
+func TestMemPprof(t *testing.T) {
+ testenv.MustHaveGoRun(t)
+
+ exe, err := buildTestProg(t, "testprog")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ got, err := testenv.CleanCmdEnv(exec.Command(exe, "MemProf")).CombinedOutput()
+ if err != nil {
+ t.Fatal(err)
+ }
+ fn := strings.TrimSpace(string(got))
+ defer os.Remove(fn)
+
+ for try := 0; try < 2; try++ {
+ cmd := testenv.CleanCmdEnv(exec.Command(testenv.GoToolPath(t), "tool", "pprof", "-alloc_space", "-top"))
+ // Check that pprof works both with and without explicit executable on command line.
+ if try == 0 {
+ cmd.Args = append(cmd.Args, exe, fn)
+ } else {
+ cmd.Args = append(cmd.Args, fn)
+ }
+ found := false
+ for i, e := range cmd.Env {
+ if strings.HasPrefix(e, "PPROF_TMPDIR=") {
+ cmd.Env[i] = "PPROF_TMPDIR=" + os.TempDir()
+ found = true
+ break
+ }
+ }
+ if !found {
+ cmd.Env = append(cmd.Env, "PPROF_TMPDIR="+os.TempDir())
+ }
+
+ top, err := cmd.CombinedOutput()
+ t.Logf("%s:\n%s", cmd.Args, top)
+ if err != nil {
+ t.Error(err)
+ } else if !bytes.Contains(top, []byte("MemProf")) {
+ t.Error("missing MemProf in pprof output")
+ }
+ }
+}
+
+var concurrentMapTest = flag.Bool("run_concurrent_map_tests", false, "also run flaky concurrent map tests")
+
+func TestConcurrentMapWrites(t *testing.T) {
+ if !*concurrentMapTest {
+ t.Skip("skipping without -run_concurrent_map_tests")
+ }
+ testenv.MustHaveGoRun(t)
+ output := runTestProg(t, "testprog", "concurrentMapWrites")
+ want := "fatal error: concurrent map writes"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+func TestConcurrentMapReadWrite(t *testing.T) {
+ if !*concurrentMapTest {
+ t.Skip("skipping without -run_concurrent_map_tests")
+ }
+ testenv.MustHaveGoRun(t)
+ output := runTestProg(t, "testprog", "concurrentMapReadWrite")
+ want := "fatal error: concurrent map read and map write"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+func TestConcurrentMapIterateWrite(t *testing.T) {
+ if !*concurrentMapTest {
+ t.Skip("skipping without -run_concurrent_map_tests")
+ }
+ testenv.MustHaveGoRun(t)
+ output := runTestProg(t, "testprog", "concurrentMapIterateWrite")
+ want := "fatal error: concurrent map iteration and map write"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+type point struct {
+ x, y *int
+}
+
+func (p *point) negate() {
+ *p.x = *p.x * -1
+ *p.y = *p.y * -1
+}
+
+// Test for issue #10152.
+func TestPanicInlined(t *testing.T) {
+ defer func() {
+ r := recover()
+ if r == nil {
+ t.Fatalf("recover failed")
+ }
+ buf := make([]byte, 2048)
+ n := runtime.Stack(buf, false)
+ buf = buf[:n]
+ if !bytes.Contains(buf, []byte("(*point).negate(")) {
+ t.Fatalf("expecting stack trace to contain call to (*point).negate()")
+ }
+ }()
+
+ pt := new(point)
+ pt.negate()
+}
+
+// Test for issues #3934 and #20018.
+// We want to delay exiting until a panic print is complete.
+func TestPanicRace(t *testing.T) {
+ testenv.MustHaveGoRun(t)
+
+ exe, err := buildTestProg(t, "testprog")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ // The test is intentionally racy, and in my testing does not
+ // produce the expected output about 0.05% of the time.
+ // So run the program in a loop and only fail the test if we
+ // get the wrong output ten times in a row.
+ const tries = 10
+retry:
+ for i := 0; i < tries; i++ {
+ got, err := testenv.CleanCmdEnv(exec.Command(exe, "PanicRace")).CombinedOutput()
+ if err == nil {
+ t.Logf("try %d: program exited successfully, should have failed", i+1)
+ continue
+ }
+
+ if i > 0 {
+ t.Logf("try %d:\n", i+1)
+ }
+ t.Logf("%s\n", got)
+
+ wants := []string{
+ "panic: crash",
+ "PanicRace",
+ "created by ",
+ }
+ for _, want := range wants {
+ if !bytes.Contains(got, []byte(want)) {
+ t.Logf("did not find expected string %q", want)
+ continue retry
+ }
+ }
+
+ // Test generated expected output.
+ return
+ }
+ t.Errorf("test ran %d times without producing expected output", tries)
+}
+
+func TestBadTraceback(t *testing.T) {
+ output := runTestProg(t, "testprog", "BadTraceback")
+ for _, want := range []string{
+ "runtime: unexpected return pc",
+ "called from 0xbad",
+ "00000bad", // Smashed LR in hex dump
+ "<main.badLR", // Symbolization in hex dump (badLR1 or badLR2)
+ } {
+ if !strings.Contains(output, want) {
+ t.Errorf("output does not contain %q:\n%s", want, output)
+ }
+ }
+}
+
+func TestTimePprof(t *testing.T) {
+ // Pass GOTRACEBACK for issue #41120 to try to get more
+ // information on timeout.
+ fn := runTestProg(t, "testprog", "TimeProf", "GOTRACEBACK=crash")
+ fn = strings.TrimSpace(fn)
+ defer os.Remove(fn)
+
+ cmd := testenv.CleanCmdEnv(exec.Command(testenv.GoToolPath(t), "tool", "pprof", "-top", "-nodecount=1", fn))
+ cmd.Env = append(cmd.Env, "PPROF_TMPDIR="+os.TempDir())
+ top, err := cmd.CombinedOutput()
+ t.Logf("%s", top)
+ if err != nil {
+ t.Error(err)
+ } else if bytes.Contains(top, []byte("ExternalCode")) {
+ t.Error("profiler refers to ExternalCode")
+ }
+}
+
+// Test that runtime.abort does so.
+func TestAbort(t *testing.T) {
+ // Pass GOTRACEBACK to ensure we get runtime frames.
+ output := runTestProg(t, "testprog", "Abort", "GOTRACEBACK=system")
+ if want := "runtime.abort"; !strings.Contains(output, want) {
+ t.Errorf("output does not contain %q:\n%s", want, output)
+ }
+ if strings.Contains(output, "BAD") {
+ t.Errorf("output contains BAD:\n%s", output)
+ }
+ // Check that it's a signal traceback.
+ want := "PC="
+ // For systems that use a breakpoint, check specifically for that.
+ switch runtime.GOARCH {
+ case "386", "amd64":
+ switch runtime.GOOS {
+ case "plan9":
+ want = "sys: breakpoint"
+ case "windows":
+ want = "Exception 0x80000003"
+ default:
+ want = "SIGTRAP"
+ }
+ }
+ if !strings.Contains(output, want) {
+ t.Errorf("output does not contain %q:\n%s", want, output)
+ }
+}
+
+// For TestRuntimePanic: test a panic in the runtime package without
+// involving the testing harness.
+func init() {
+ if os.Getenv("GO_TEST_RUNTIME_PANIC") == "1" {
+ defer func() {
+ if r := recover(); r != nil {
+ // We expect to crash, so exit 0
+ // to indicate failure.
+ os.Exit(0)
+ }
+ }()
+ runtime.PanicForTesting(nil, 1)
+ // We expect to crash, so exit 0 to indicate failure.
+ os.Exit(0)
+ }
+}
+
+func TestRuntimePanic(t *testing.T) {
+ testenv.MustHaveExec(t)
+ cmd := testenv.CleanCmdEnv(exec.Command(os.Args[0], "-test.run=TestRuntimePanic"))
+ cmd.Env = append(cmd.Env, "GO_TEST_RUNTIME_PANIC=1")
+ out, err := cmd.CombinedOutput()
+ t.Logf("%s", out)
+ if err == nil {
+ t.Error("child process did not fail")
+ } else if want := "runtime.unexportedPanicForTesting"; !bytes.Contains(out, []byte(want)) {
+ t.Errorf("output did not contain expected string %q", want)
+ }
+}
+
+// Test that g0 stack overflows are handled gracefully.
+func TestG0StackOverflow(t *testing.T) {
+ testenv.MustHaveExec(t)
+
+ switch runtime.GOOS {
+ case "darwin", "dragonfly", "freebsd", "linux", "netbsd", "openbsd", "android":
+ t.Skipf("g0 stack is wrong on pthread platforms (see golang.org/issue/26061)")
+ }
+
+ if os.Getenv("TEST_G0_STACK_OVERFLOW") != "1" {
+ cmd := testenv.CleanCmdEnv(exec.Command(os.Args[0], "-test.run=TestG0StackOverflow", "-test.v"))
+ cmd.Env = append(cmd.Env, "TEST_G0_STACK_OVERFLOW=1")
+ out, err := cmd.CombinedOutput()
+ // Don't check err since it's expected to crash.
+ if n := strings.Count(string(out), "morestack on g0\n"); n != 1 {
+ t.Fatalf("%s\n(exit status %v)", out, err)
+ }
+ // Check that it's a signal-style traceback.
+ if runtime.GOOS != "windows" {
+ if want := "PC="; !strings.Contains(string(out), want) {
+ t.Errorf("output does not contain %q:\n%s", want, out)
+ }
+ }
+ return
+ }
+
+ runtime.G0StackOverflow()
+}
+
+// Test that panic message is not clobbered.
+// See issue 30150.
+func TestDoublePanic(t *testing.T) {
+ output := runTestProg(t, "testprog", "DoublePanic", "GODEBUG=clobberfree=1")
+ wants := []string{"panic: XXX", "panic: YYY"}
+ for _, want := range wants {
+ if !strings.Contains(output, want) {
+ t.Errorf("output:\n%s\n\nwant output containing: %s", output, want)
+ }
+ }
+}
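
Many of the tests above follow one pattern: run a helper binary (or re-exec the test binary itself), force a crash, and assert on the combined output. A stripped-down, hypothetical version of the re-exec variant, not part of this change, looks like:

    package example

    import (
        "os"
        "os/exec"
        "strings"
        "testing"
    )

    func TestCrashesInChild(t *testing.T) {
        if os.Getenv("BE_CRASHER") == "1" {
            panic("boom") // child process: perform the crashing work
        }
        cmd := exec.Command(os.Args[0], "-test.run=TestCrashesInChild")
        cmd.Env = append(os.Environ(), "BE_CRASHER=1")
        out, err := cmd.CombinedOutput()
        if err == nil {
            t.Fatalf("child did not crash:\n%s", out)
        }
        if !strings.Contains(string(out), "panic: boom") {
            t.Fatalf("unexpected child output:\n%s", out)
        }
    }
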
diff --git a/src/runtime/crash_unix_test.go b/src/runtime/crash_unix_test.go
new file mode 100644
index 0000000..803b031
--- /dev/null
+++ b/src/runtime/crash_unix_test.go
@@ -0,0 +1,370 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
+
+package runtime_test
+
+import (
+ "bytes"
+ "internal/testenv"
+ "io"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "runtime"
+ "strings"
+ "sync"
+ "syscall"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+// sigquit is the signal to send to kill a hanging testdata program.
+// Send SIGQUIT to get a stack trace.
+var sigquit = syscall.SIGQUIT
+
+func init() {
+ if runtime.Sigisblocked(int(syscall.SIGQUIT)) {
+ // We can't use SIGQUIT to kill subprocesses because
+ // it's blocked. Use SIGKILL instead. See issue
+ // #19196 for an example of when this happens.
+ sigquit = syscall.SIGKILL
+ }
+}
+
+func TestBadOpen(t *testing.T) {
+ // Make sure we get the correct error code if open fails. The same
+ // applies to read/write/close on the resulting -1 fd. See issue 10052.
+ nonfile := []byte("/notreallyafile")
+ fd := runtime.Open(&nonfile[0], 0, 0)
+ if fd != -1 {
+ t.Errorf("open(%q)=%d, want -1", nonfile, fd)
+ }
+ var buf [32]byte
+ r := runtime.Read(-1, unsafe.Pointer(&buf[0]), int32(len(buf)))
+ if got, want := r, -int32(syscall.EBADF); got != want {
+ t.Errorf("read()=%d, want %d", got, want)
+ }
+ w := runtime.Write(^uintptr(0), unsafe.Pointer(&buf[0]), int32(len(buf)))
+ if got, want := w, -int32(syscall.EBADF); got != want {
+ t.Errorf("write()=%d, want %d", got, want)
+ }
+ c := runtime.Close(-1)
+ if c != -1 {
+ t.Errorf("close()=%d, want -1", c)
+ }
+}
+
+func TestCrashDumpsAllThreads(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+
+ switch runtime.GOOS {
+ case "darwin", "dragonfly", "freebsd", "linux", "netbsd", "openbsd", "illumos", "solaris":
+ default:
+ t.Skipf("skipping; not supported on %v", runtime.GOOS)
+ }
+
+ if runtime.GOOS == "openbsd" && runtime.GOARCH == "mips64" {
+ t.Skipf("skipping; test fails on %s/%s - see issue #42464", runtime.GOOS, runtime.GOARCH)
+ }
+
+ if runtime.Sigisblocked(int(syscall.SIGQUIT)) {
+ t.Skip("skipping; SIGQUIT is blocked, see golang.org/issue/19196")
+ }
+
+ // We don't use executeTest because we need to kill the
+ // program while it is running.
+
+ testenv.MustHaveGoBuild(t)
+
+ t.Parallel()
+
+ dir, err := os.MkdirTemp("", "go-build")
+ if err != nil {
+ t.Fatalf("failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(dir)
+
+ if err := os.WriteFile(filepath.Join(dir, "main.go"), []byte(crashDumpsAllThreadsSource), 0666); err != nil {
+ t.Fatalf("failed to create Go file: %v", err)
+ }
+
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", "a.exe", "main.go")
+ cmd.Dir = dir
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source: %v\n%s", err, out)
+ }
+
+ cmd = exec.Command(filepath.Join(dir, "a.exe"))
+ cmd = testenv.CleanCmdEnv(cmd)
+ cmd.Env = append(cmd.Env,
+ "GOTRACEBACK=crash",
+ // Set GOGC=off. Because of golang.org/issue/10958, the tight
+ // loops in the test program are not preemptible. If GC kicks
+ // in, it may lock up and prevent main from saying it's ready.
+ "GOGC=off",
+ // Set GODEBUG=asyncpreemptoff=1. If a thread is preempted
+ // when it receives SIGQUIT, it won't show the expected
+ // stack trace. See issue 35356.
+ "GODEBUG=asyncpreemptoff=1",
+ )
+
+ var outbuf bytes.Buffer
+ cmd.Stdout = &outbuf
+ cmd.Stderr = &outbuf
+
+ rp, wp, err := os.Pipe()
+ if err != nil {
+ t.Fatal(err)
+ }
+ cmd.ExtraFiles = []*os.File{wp}
+
+ if err := cmd.Start(); err != nil {
+ t.Fatalf("starting program: %v", err)
+ }
+
+ if err := wp.Close(); err != nil {
+ t.Logf("closing write pipe: %v", err)
+ }
+ if _, err := rp.Read(make([]byte, 1)); err != nil {
+ t.Fatalf("reading from pipe: %v", err)
+ }
+
+ if err := cmd.Process.Signal(syscall.SIGQUIT); err != nil {
+ t.Fatalf("signal: %v", err)
+ }
+
+ // No point in checking the error return from Wait--we expect
+ // it to fail.
+ cmd.Wait()
+
+ // We want to see a stack trace for each thread.
+ // Before https://golang.org/cl/2811 running threads would say
+ // "goroutine running on other thread; stack unavailable".
+ out = outbuf.Bytes()
+ n := bytes.Count(out, []byte("main.loop("))
+ if n != 4 {
+ t.Errorf("found %d instances of main.loop; expected 4", n)
+ t.Logf("%s", out)
+ }
+}
+
+const crashDumpsAllThreadsSource = `
+package main
+
+import (
+ "fmt"
+ "os"
+ "runtime"
+)
+
+func main() {
+ const count = 4
+ runtime.GOMAXPROCS(count + 1)
+
+ chans := make([]chan bool, count)
+ for i := range chans {
+ chans[i] = make(chan bool)
+ go loop(i, chans[i])
+ }
+
+ // Wait for all the goroutines to start executing.
+ for _, c := range chans {
+ <-c
+ }
+
+ // Tell our parent that all the goroutines are executing.
+ if _, err := os.NewFile(3, "pipe").WriteString("x"); err != nil {
+ fmt.Fprintf(os.Stderr, "write to pipe failed: %v\n", err)
+ os.Exit(2)
+ }
+
+ select {}
+}
+
+func loop(i int, c chan bool) {
+ close(c)
+ for {
+ for j := 0; j < 0x7fffffff; j++ {
+ }
+ }
+}
+`
+
+func TestPanicSystemstack(t *testing.T) {
+ // Test that GOTRACEBACK=crash prints both the system and user
+ // stack of other threads.
+
+ // The GOTRACEBACK=crash handler takes 0.1 seconds even if
+ // it's not writing a core file and potentially much longer if
+ // it is. Skip in short mode.
+ if testing.Short() {
+ t.Skip("Skipping in short mode (GOTRACEBACK=crash is slow)")
+ }
+
+ if runtime.Sigisblocked(int(syscall.SIGQUIT)) {
+ t.Skip("skipping; SIGQUIT is blocked, see golang.org/issue/19196")
+ }
+
+ t.Parallel()
+ cmd := exec.Command(os.Args[0], "testPanicSystemstackInternal")
+ cmd = testenv.CleanCmdEnv(cmd)
+ cmd.Env = append(cmd.Env, "GOTRACEBACK=crash")
+ pr, pw, err := os.Pipe()
+ if err != nil {
+ t.Fatal("creating pipe: ", err)
+ }
+ cmd.Stderr = pw
+ if err := cmd.Start(); err != nil {
+ t.Fatal("starting command: ", err)
+ }
+ defer cmd.Process.Wait()
+ defer cmd.Process.Kill()
+ if err := pw.Close(); err != nil {
+ t.Log("closing write pipe: ", err)
+ }
+ defer pr.Close()
+
+ // Wait for "x\nx\n" to indicate almost-readiness.
+ buf := make([]byte, 4)
+ _, err = io.ReadFull(pr, buf)
+ if err != nil || string(buf) != "x\nx\n" {
+ t.Fatal("subprocess failed; output:\n", string(buf))
+ }
+
+ // The child blockers print "x\n" and then block on a lock. Receiving
+ // those bytes only indicates that the child is _about to block_. Since
+ // we don't have a way to know when it is fully blocked, sleep a bit to
+ // make us less likely to lose the race and signal before the child
+ // blocks.
+ time.Sleep(100 * time.Millisecond)
+
+ // Send SIGQUIT.
+ if err := cmd.Process.Signal(syscall.SIGQUIT); err != nil {
+ t.Fatal("signaling subprocess: ", err)
+ }
+
+ // Get traceback.
+ tb, err := io.ReadAll(pr)
+ if err != nil {
+ t.Fatal("reading traceback from pipe: ", err)
+ }
+
+ // Traceback should have two testPanicSystemstackInternal's
+ // and two blockOnSystemStackInternal's.
+ if bytes.Count(tb, []byte("testPanicSystemstackInternal")) != 2 {
+ t.Fatal("traceback missing user stack:\n", string(tb))
+ } else if bytes.Count(tb, []byte("blockOnSystemStackInternal")) != 2 {
+ t.Fatal("traceback missing system stack:\n", string(tb))
+ }
+}
+
+func init() {
+ if len(os.Args) >= 2 && os.Args[1] == "testPanicSystemstackInternal" {
+ // Get two threads running on the system stack with
+ // something recognizable in the stack trace.
+ runtime.GOMAXPROCS(2)
+ go testPanicSystemstackInternal()
+ testPanicSystemstackInternal()
+ }
+}
+
+func testPanicSystemstackInternal() {
+ runtime.BlockOnSystemStack()
+ os.Exit(1) // Should be unreachable.
+}
+
+func TestSignalExitStatus(t *testing.T) {
+ testenv.MustHaveGoBuild(t)
+ exe, err := buildTestProg(t, "testprog")
+ if err != nil {
+ t.Fatal(err)
+ }
+ err = testenv.CleanCmdEnv(exec.Command(exe, "SignalExitStatus")).Run()
+ if err == nil {
+ t.Error("test program succeeded unexpectedly")
+ } else if ee, ok := err.(*exec.ExitError); !ok {
+ t.Errorf("error (%v) has type %T; expected exec.ExitError", err, err)
+ } else if ws, ok := ee.Sys().(syscall.WaitStatus); !ok {
+ t.Errorf("error.Sys (%v) has type %T; expected syscall.WaitStatus", ee.Sys(), ee.Sys())
+ } else if !ws.Signaled() || ws.Signal() != syscall.SIGTERM {
+ t.Errorf("got %v; expected SIGTERM", ee)
+ }
+}
+
+func TestSignalIgnoreSIGTRAP(t *testing.T) {
+ if runtime.GOOS == "openbsd" {
+ if bn := testenv.Builder(); strings.HasSuffix(bn, "-62") || strings.HasSuffix(bn, "-64") {
+ testenv.SkipFlaky(t, 17496)
+ }
+ }
+
+ output := runTestProg(t, "testprognet", "SignalIgnoreSIGTRAP")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+func TestSignalDuringExec(t *testing.T) {
+ switch runtime.GOOS {
+ case "darwin", "dragonfly", "freebsd", "linux", "netbsd", "openbsd":
+ default:
+ t.Skipf("skipping test on %s", runtime.GOOS)
+ }
+ output := runTestProg(t, "testprognet", "SignalDuringExec")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+func TestSignalM(t *testing.T) {
+ r, w, errno := runtime.Pipe()
+ if errno != 0 {
+ t.Fatal(syscall.Errno(errno))
+ }
+ defer func() {
+ runtime.Close(r)
+ runtime.Close(w)
+ }()
+ runtime.Closeonexec(r)
+ runtime.Closeonexec(w)
+
+ var want, got int64
+ var wg sync.WaitGroup
+ ready := make(chan *runtime.M)
+ wg.Add(1)
+ go func() {
+ runtime.LockOSThread()
+ want, got = runtime.WaitForSigusr1(r, w, func(mp *runtime.M) {
+ ready <- mp
+ })
+ runtime.UnlockOSThread()
+ wg.Done()
+ }()
+ waitingM := <-ready
+ runtime.SendSigusr1(waitingM)
+
+ timer := time.AfterFunc(time.Second, func() {
+ // Write 1 to tell WaitForSigusr1 that we timed out.
+ bw := byte(1)
+ if n := runtime.Write(uintptr(w), unsafe.Pointer(&bw), 1); n != 1 {
+ t.Errorf("pipe write failed: %d", n)
+ }
+ })
+ defer timer.Stop()
+
+ wg.Wait()
+ if got == -1 {
+ t.Fatal("signalM signal not received")
+ } else if want != got {
+ t.Fatalf("signal sent to M %d, but received on M %d", want, got)
+ }
+}
diff --git a/src/runtime/debug.go b/src/runtime/debug.go
new file mode 100644
index 0000000..f411b22
--- /dev/null
+++ b/src/runtime/debug.go
@@ -0,0 +1,63 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// GOMAXPROCS sets the maximum number of CPUs that can be executing
+// simultaneously and returns the previous setting. It defaults to
+// the value of runtime.NumCPU. If n < 1, it does not change the current setting.
+// This call will go away when the scheduler improves.
+func GOMAXPROCS(n int) int {
+ if GOARCH == "wasm" && n > 1 {
+ n = 1 // WebAssembly has no threads yet, so only one CPU is possible.
+ }
+
+ lock(&sched.lock)
+ ret := int(gomaxprocs)
+ unlock(&sched.lock)
+ if n <= 0 || n == ret {
+ return ret
+ }
+
+ stopTheWorldGC("GOMAXPROCS")
+
+ // newprocs will be processed by startTheWorld
+ newprocs = int32(n)
+
+ startTheWorldGC()
+ return ret
+}
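Editorial aside, not part of the patch: a minimal sketch of how caller code uses the GOMAXPROCS API documented above. Passing a value below 1 only queries the current setting.

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // n < 1 leaves the setting unchanged and just reports it.
        cur := runtime.GOMAXPROCS(0)
        fmt.Println("current GOMAXPROCS:", cur)

        // Temporarily run with a single P, restoring the old value on return.
        old := runtime.GOMAXPROCS(1)
        defer runtime.GOMAXPROCS(old)
    }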
+
+// NumCPU returns the number of logical CPUs usable by the current process.
+//
+// The set of available CPUs is checked by querying the operating system
+// at process startup. Changes to operating system CPU allocation after
+// process startup are not reflected.
+func NumCPU() int {
+ return int(ncpu)
+}
+
+// NumCgoCall returns the number of cgo calls made by the current process.
+func NumCgoCall() int64 {
+ var n int64
+ for mp := (*m)(atomic.Loadp(unsafe.Pointer(&allm))); mp != nil; mp = mp.alllink {
+ n += int64(mp.ncgocall)
+ }
+ return n
+}
+
+// NumGoroutine returns the number of goroutines that currently exist.
+func NumGoroutine() int {
+ return int(gcount())
+}
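Similarly illustrative (not in the patch): a tiny program that reads the three counters defined above.

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        fmt.Println("logical CPUs:    ", runtime.NumCPU())
        fmt.Println("live goroutines: ", runtime.NumGoroutine())
        fmt.Println("cgo calls so far:", runtime.NumCgoCall())
    }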
+
+//go:linkname debug_modinfo runtime/debug.modinfo
+func debug_modinfo() string {
+ return modinfo
+}
diff --git a/src/runtime/debug/debug.s b/src/runtime/debug/debug.s
new file mode 100644
index 0000000..6aae33a
--- /dev/null
+++ b/src/runtime/debug/debug.s
@@ -0,0 +1,9 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Nothing to see here.
+// This file exists so that the go command knows that parts of the
+// package are implemented in C, so that it does not instruct the
+// Go compiler to complain about extern declarations.
+// The actual implementations are in package runtime.
diff --git a/src/runtime/debug/garbage.go b/src/runtime/debug/garbage.go
new file mode 100644
index 0000000..00f92c3
--- /dev/null
+++ b/src/runtime/debug/garbage.go
@@ -0,0 +1,175 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package debug
+
+import (
+ "runtime"
+ "sort"
+ "time"
+)
+
+// GCStats collects information about recent garbage collections.
+type GCStats struct {
+ LastGC time.Time // time of last collection
+ NumGC int64 // number of garbage collections
+ PauseTotal time.Duration // total pause for all collections
+ Pause []time.Duration // pause history, most recent first
+ PauseEnd []time.Time // pause end times history, most recent first
+ PauseQuantiles []time.Duration
+}
+
+// ReadGCStats reads statistics about garbage collection into stats.
+// The number of entries in the pause history is system-dependent;
+// the stats.Pause slice will be reused if large enough, reallocated otherwise.
+// ReadGCStats may use the full capacity of the stats.Pause slice.
+// If stats.PauseQuantiles is non-empty, ReadGCStats fills it with quantiles
+// summarizing the distribution of pause time. For example, if
+// len(stats.PauseQuantiles) is 5, it will be filled with the minimum,
+// 25%, 50%, 75%, and maximum pause times.
+func ReadGCStats(stats *GCStats) {
+ // Create a buffer with space for at least two copies of the
+ // pause history tracked by the runtime. One will be returned
+ // to the caller and the other will be used as transfer buffer
+ // for end times history and as a temporary buffer for
+ // computing quantiles.
+ const maxPause = len(((*runtime.MemStats)(nil)).PauseNs)
+ if cap(stats.Pause) < 2*maxPause+3 {
+ stats.Pause = make([]time.Duration, 2*maxPause+3)
+ }
+
+ // readGCStats fills in the pause and end times histories (up to
+ // maxPause entries) and then three more: Unix ns time of last GC,
+ // number of GC, and total pause time in nanoseconds. Here we
+ // depend on the fact that time.Duration's native unit is
+ // nanoseconds, so the pauses and the total pause time do not need
+ // any conversion.
+ readGCStats(&stats.Pause)
+ n := len(stats.Pause) - 3
+ stats.LastGC = time.Unix(0, int64(stats.Pause[n]))
+ stats.NumGC = int64(stats.Pause[n+1])
+ stats.PauseTotal = stats.Pause[n+2]
+ n /= 2 // buffer holds pauses and end times
+ stats.Pause = stats.Pause[:n]
+
+ if cap(stats.PauseEnd) < maxPause {
+ stats.PauseEnd = make([]time.Time, 0, maxPause)
+ }
+ stats.PauseEnd = stats.PauseEnd[:0]
+ for _, ns := range stats.Pause[n : n+n] {
+ stats.PauseEnd = append(stats.PauseEnd, time.Unix(0, int64(ns)))
+ }
+
+ if len(stats.PauseQuantiles) > 0 {
+ if n == 0 {
+ for i := range stats.PauseQuantiles {
+ stats.PauseQuantiles[i] = 0
+ }
+ } else {
+ // There's room for a second copy of the data in stats.Pause.
+ // See the allocation at the top of the function.
+ sorted := stats.Pause[n : n+n]
+ copy(sorted, stats.Pause)
+ sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
+ nq := len(stats.PauseQuantiles) - 1
+ for i := 0; i < nq; i++ {
+ stats.PauseQuantiles[i] = sorted[len(sorted)*i/nq]
+ }
+ stats.PauseQuantiles[nq] = sorted[len(sorted)-1]
+ }
+ }
+}
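A hedged usage sketch (editorial) of the quantile behaviour described in the ReadGCStats doc comment: with five entries, PauseQuantiles is filled with the minimum, 25%, 50%, 75% and maximum pauses.

    package main

    import (
        "fmt"
        "runtime"
        "runtime/debug"
        "time"
    )

    func main() {
        runtime.GC() // make sure there is at least some pause history

        var stats debug.GCStats
        stats.PauseQuantiles = make([]time.Duration, 5)
        debug.ReadGCStats(&stats)

        fmt.Println("collections:", stats.NumGC)
        fmt.Println("total pause:", stats.PauseTotal)
        fmt.Println("quantiles:  ", stats.PauseQuantiles)
    }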
+
+// SetGCPercent sets the garbage collection target percentage:
+// a collection is triggered when the ratio of freshly allocated data
+// to live data remaining after the previous collection reaches this percentage.
+// SetGCPercent returns the previous setting.
+// The initial setting is the value of the GOGC environment variable
+// at startup, or 100 if the variable is not set.
+// A negative percentage disables garbage collection.
+func SetGCPercent(percent int) int {
+ return int(setGCPercent(int32(percent)))
+}
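The tests added later in this patch lean on nesting two SetGCPercent calls to disable collection and restore the previous value; a standalone sketch of that pattern (editorial):

    package main

    import "runtime/debug"

    func main() {
        // The inner call disables the collector and returns the old GOGC
        // value; the deferred outer call restores it on return.
        defer debug.SetGCPercent(debug.SetGCPercent(-1))

        // ... allocation-heavy section that must not be interrupted by GC ...
    }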
+
+// FreeOSMemory forces a garbage collection followed by an
+// attempt to return as much memory to the operating system
+// as possible. (Even if this is not called, the runtime gradually
+// returns memory to the operating system in a background task.)
+func FreeOSMemory() {
+ freeOSMemory()
+}
+
+// SetMaxStack sets the maximum amount of memory that
+// can be used by a single goroutine stack.
+// If any goroutine exceeds this limit while growing its stack,
+// the program crashes.
+// SetMaxStack returns the previous setting.
+// The initial setting is 1 GB on 64-bit systems, 250 MB on 32-bit systems.
+// There may be a system-imposed maximum stack limit regardless
+// of the value provided to SetMaxStack.
+//
+// SetMaxStack is useful mainly for limiting the damage done by
+// goroutines that enter an infinite recursion. It only limits future
+// stack growth.
+func SetMaxStack(bytes int) int {
+ return setMaxStack(bytes)
+}
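An illustrative use of SetMaxStack (not part of the patch); the 32 MB figure is arbitrary.

    package main

    import "runtime/debug"

    func main() {
        // Cap goroutine stacks well below the 1 GB 64-bit default so that
        // runaway recursion crashes quickly.
        old := debug.SetMaxStack(32 << 20)
        defer debug.SetMaxStack(old)

        // ... code that might recurse too deeply ...
    }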
+
+// SetMaxThreads sets the maximum number of operating system
+// threads that the Go program can use. If it attempts to use more than
+// this many, the program crashes.
+// SetMaxThreads returns the previous setting.
+// The initial setting is 10,000 threads.
+//
+// The limit controls the number of operating system threads, not the number
+// of goroutines. A Go program creates a new thread only when a goroutine
+// is ready to run but all the existing threads are blocked in system calls, cgo calls,
+// or are locked to other goroutines due to use of runtime.LockOSThread.
+//
+// SetMaxThreads is useful mainly for limiting the damage done by
+// programs that create an unbounded number of threads. The idea is
+// to take down the program before it takes down the operating system.
+func SetMaxThreads(threads int) int {
+ return setMaxThreads(threads)
+}
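An equally illustrative use of SetMaxThreads; 2000 is an arbitrary example limit.

    package main

    import (
        "fmt"
        "runtime/debug"
    )

    func main() {
        old := debug.SetMaxThreads(2000) // the default limit is 10,000
        defer debug.SetMaxThreads(old)
        fmt.Println("previous thread limit:", old)

        // ... work that might otherwise leave an unbounded number of
        // threads blocked in system calls ...
    }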
+
+// SetPanicOnFault controls the runtime's behavior when a program faults
+// at an unexpected (non-nil) address. Such faults are typically caused by
+// bugs such as runtime memory corruption, so the default response is to crash
+// the program. Programs working with memory-mapped files or unsafe
+// manipulation of memory may cause faults at non-nil addresses in less
+// dramatic situations; SetPanicOnFault allows such programs to request
+// that the runtime trigger only a panic, not a crash.
+// The runtime.Error that the runtime panics with may have an additional method:
+// Addr() uintptr
+// If that method exists, it returns the memory address which triggered the fault.
+// The results of Addr are best-effort and the veracity of the result
+// may depend on the platform.
+// SetPanicOnFault applies only to the current goroutine.
+// It returns the previous setting.
+func SetPanicOnFault(enabled bool) bool {
+ return setPanicOnFault(enabled)
+}
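panic_test.go later in this patch exercises SetPanicOnFault against a read-only mapping; as a quick editorial sketch of the recovery side, where doRiskyAccess is a hypothetical placeholder for code that may fault:

    package main

    import (
        "fmt"
        "runtime/debug"
    )

    // doRiskyAccess is a hypothetical stand-in for code that may touch
    // unmapped or read-only memory, e.g. through a memory-mapped file.
    func doRiskyAccess() {}

    func main() {
        old := debug.SetPanicOnFault(true)
        defer debug.SetPanicOnFault(old)

        defer func() {
            if r := recover(); r != nil {
                // When available, the recovered error carries the faulting
                // address via the Addr method described above.
                if a, ok := r.(interface{ Addr() uintptr }); ok {
                    fmt.Printf("fault at %#x: %v\n", a.Addr(), r)
                    return
                }
                fmt.Println("fault:", r)
            }
        }()

        doRiskyAccess()
    }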
+
+// WriteHeapDump writes a description of the heap and the objects in
+// it to the given file descriptor.
+//
+// WriteHeapDump suspends the execution of all goroutines until the heap
+// dump is completely written. Thus, the file descriptor must not be
+// connected to a pipe or socket whose other end is in the same Go
+// process; instead, use a temporary file or network socket.
+//
+// The heap dump format is defined at https://golang.org/s/go15heapdump.
+func WriteHeapDump(fd uintptr)
+
+// SetTraceback sets the amount of detail printed by the runtime in
+// the traceback it prints before exiting due to an unrecovered panic
+// or an internal runtime error.
+// The level argument takes the same values as the GOTRACEBACK
+// environment variable. For example, SetTraceback("all") ensures
+// that the program prints all goroutines when it crashes.
+// See the package runtime documentation for details.
+// If SetTraceback is called with a level lower than that of the
+// environment variable, the call is ignored.
+func SetTraceback(level string)
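A one-line illustration of SetTraceback (editorial): it has the same effect as raising GOTRACEBACK for the rest of the run, and is ignored if it asks for less detail than the environment already provides.

    package main

    import "runtime/debug"

    func main() {
        // Print all goroutines on a fatal error, as GOTRACEBACK=all would.
        debug.SetTraceback("all")

        // ... rest of the program ...
    }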
diff --git a/src/runtime/debug/garbage_test.go b/src/runtime/debug/garbage_test.go
new file mode 100644
index 0000000..69e769e
--- /dev/null
+++ b/src/runtime/debug/garbage_test.go
@@ -0,0 +1,193 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package debug_test
+
+import (
+ "internal/testenv"
+ "runtime"
+ . "runtime/debug"
+ "testing"
+ "time"
+)
+
+func TestReadGCStats(t *testing.T) {
+ defer SetGCPercent(SetGCPercent(-1))
+
+ var stats GCStats
+ var mstats runtime.MemStats
+ var min, max time.Duration
+
+ // First ReadGCStats will allocate, second should not,
+ // especially if we follow up with an explicit garbage collection.
+ stats.PauseQuantiles = make([]time.Duration, 10)
+ ReadGCStats(&stats)
+ runtime.GC()
+
+ // Assume these will return the same data: no GC during ReadGCStats.
+ ReadGCStats(&stats)
+ runtime.ReadMemStats(&mstats)
+
+ if stats.NumGC != int64(mstats.NumGC) {
+ t.Errorf("stats.NumGC = %d, but mstats.NumGC = %d", stats.NumGC, mstats.NumGC)
+ }
+ if stats.PauseTotal != time.Duration(mstats.PauseTotalNs) {
+ t.Errorf("stats.PauseTotal = %d, but mstats.PauseTotalNs = %d", stats.PauseTotal, mstats.PauseTotalNs)
+ }
+ if stats.LastGC.UnixNano() != int64(mstats.LastGC) {
+ t.Errorf("stats.LastGC.UnixNano = %d, but mstats.LastGC = %d", stats.LastGC.UnixNano(), mstats.LastGC)
+ }
+ n := int(mstats.NumGC)
+ if n > len(mstats.PauseNs) {
+ n = len(mstats.PauseNs)
+ }
+ if len(stats.Pause) != n {
+ t.Errorf("len(stats.Pause) = %d, want %d", len(stats.Pause), n)
+ } else {
+ off := (int(mstats.NumGC) + len(mstats.PauseNs) - 1) % len(mstats.PauseNs)
+ for i := 0; i < n; i++ {
+ dt := stats.Pause[i]
+ if dt != time.Duration(mstats.PauseNs[off]) {
+ t.Errorf("stats.Pause[%d] = %d, want %d", i, dt, mstats.PauseNs[off])
+ }
+ if max < dt {
+ max = dt
+ }
+ if min > dt || i == 0 {
+ min = dt
+ }
+ off = (off + len(mstats.PauseNs) - 1) % len(mstats.PauseNs)
+ }
+ }
+
+ q := stats.PauseQuantiles
+ nq := len(q)
+ if q[0] != min || q[nq-1] != max {
+ t.Errorf("stats.PauseQuantiles = [%d, ..., %d], want [%d, ..., %d]", q[0], q[nq-1], min, max)
+ }
+
+ for i := 0; i < nq-1; i++ {
+ if q[i] > q[i+1] {
+ t.Errorf("stats.PauseQuantiles[%d]=%d > stats.PauseQuantiles[%d]=%d", i, q[i], i+1, q[i+1])
+ }
+ }
+
+ // compare memory stats with gc stats:
+ if len(stats.PauseEnd) != n {
+ t.Fatalf("len(stats.PauseEnd) = %d, want %d", len(stats.PauseEnd), n)
+ }
+ off := (int(mstats.NumGC) + len(mstats.PauseEnd) - 1) % len(mstats.PauseEnd)
+ for i := 0; i < n; i++ {
+ dt := stats.PauseEnd[i]
+ if dt.UnixNano() != int64(mstats.PauseEnd[off]) {
+ t.Errorf("stats.PauseEnd[%d] = %d, want %d", i, dt.UnixNano(), mstats.PauseEnd[off])
+ }
+ off = (off + len(mstats.PauseEnd) - 1) % len(mstats.PauseEnd)
+ }
+}
+
+var big = make([]byte, 1<<20)
+
+func TestFreeOSMemory(t *testing.T) {
+ var ms1, ms2 runtime.MemStats
+
+ if big == nil {
+ t.Skip("test is not reliable when run multiple times")
+ }
+ big = nil
+ runtime.GC()
+ runtime.ReadMemStats(&ms1)
+ FreeOSMemory()
+ runtime.ReadMemStats(&ms2)
+ if ms1.HeapReleased >= ms2.HeapReleased {
+ t.Errorf("released before=%d; released after=%d; did not go up", ms1.HeapReleased, ms2.HeapReleased)
+ }
+}
+
+var (
+ setGCPercentBallast interface{}
+ setGCPercentSink interface{}
+)
+
+func TestSetGCPercent(t *testing.T) {
+ testenv.SkipFlaky(t, 20076)
+
+ // Test that the variable is being set and returned correctly.
+ old := SetGCPercent(123)
+ new := SetGCPercent(old)
+ if new != 123 {
+ t.Errorf("SetGCPercent(123); SetGCPercent(x) = %d, want 123", new)
+ }
+
+ // Test that the percentage is implemented correctly.
+ defer func() {
+ SetGCPercent(old)
+ setGCPercentBallast, setGCPercentSink = nil, nil
+ }()
+ SetGCPercent(100)
+ runtime.GC()
+ // Create 100 MB of live heap as a baseline.
+ const baseline = 100 << 20
+ var ms runtime.MemStats
+ runtime.ReadMemStats(&ms)
+ setGCPercentBallast = make([]byte, baseline-ms.Alloc)
+ runtime.GC()
+ runtime.ReadMemStats(&ms)
+ if abs64(baseline-int64(ms.Alloc)) > 10<<20 {
+ t.Fatalf("failed to set up baseline live heap; got %d MB, want %d MB", ms.Alloc>>20, baseline>>20)
+ }
+ // NextGC should be ~200 MB.
+ const thresh = 20 << 20 // TODO: Figure out why this is so noisy on some builders
+ if want := int64(2 * baseline); abs64(want-int64(ms.NextGC)) > thresh {
+ t.Errorf("NextGC = %d MB, want %d±%d MB", ms.NextGC>>20, want>>20, thresh>>20)
+ }
+ // Create some garbage, but not enough to trigger another GC.
+ for i := 0; i < int(1.2*baseline); i += 1 << 10 {
+ setGCPercentSink = make([]byte, 1<<10)
+ }
+ setGCPercentSink = nil
+ // Adjust GOGC to 50. NextGC should be ~150 MB.
+ SetGCPercent(50)
+ runtime.ReadMemStats(&ms)
+ if want := int64(1.5 * baseline); abs64(want-int64(ms.NextGC)) > thresh {
+ t.Errorf("NextGC = %d MB, want %d±%d MB", ms.NextGC>>20, want>>20, thresh>>20)
+ }
+
+ // Trigger a GC and get back to 100 MB live with GOGC=100.
+ SetGCPercent(100)
+ runtime.GC()
+ // Raise live to 120 MB.
+ setGCPercentSink = make([]byte, int(0.2*baseline))
+ // Lower GOGC to 10. This must force a GC.
+ runtime.ReadMemStats(&ms)
+ ngc1 := ms.NumGC
+ SetGCPercent(10)
+ // It may require an allocation to actually force the GC.
+ setGCPercentSink = make([]byte, 1<<20)
+ runtime.ReadMemStats(&ms)
+ ngc2 := ms.NumGC
+ if ngc1 == ngc2 {
+ t.Errorf("expected GC to run but it did not")
+ }
+}
+
+func abs64(a int64) int64 {
+ if a < 0 {
+ return -a
+ }
+ return a
+}
+
+func TestSetMaxThreadsOvf(t *testing.T) {
+ // Verify that a big threads count will not overflow the int32
+ // maxmcount variable, causing a panic (see Issue 16076).
+ //
+ // This can only happen when ints are 64 bits, since on platforms
+ // with 32 bit ints SetMaxThreads (which takes an int parameter)
+ // cannot be given anything that will overflow an int32.
+ //
+ // Call SetMaxThreads with 1<<31, but only on 64 bit systems.
+ nt := SetMaxThreads(1 << (30 + ^uint(0)>>63))
+ SetMaxThreads(nt) // restore previous value
+}
diff --git a/src/runtime/debug/heapdump_test.go b/src/runtime/debug/heapdump_test.go
new file mode 100644
index 0000000..768934d
--- /dev/null
+++ b/src/runtime/debug/heapdump_test.go
@@ -0,0 +1,69 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package debug_test
+
+import (
+ "os"
+ "runtime"
+ . "runtime/debug"
+ "testing"
+)
+
+func TestWriteHeapDumpNonempty(t *testing.T) {
+ if runtime.GOOS == "js" {
+ t.Skipf("WriteHeapDump is not available on %s.", runtime.GOOS)
+ }
+ f, err := os.CreateTemp("", "heapdumptest")
+ if err != nil {
+ t.Fatalf("TempFile failed: %v", err)
+ }
+ defer os.Remove(f.Name())
+ defer f.Close()
+ WriteHeapDump(f.Fd())
+ fi, err := f.Stat()
+ if err != nil {
+ t.Fatalf("Stat failed: %v", err)
+ }
+ const minSize = 1
+ if size := fi.Size(); size < minSize {
+ t.Fatalf("Heap dump size %d bytes, expected at least %d bytes", size, minSize)
+ }
+}
+
+type Obj struct {
+ x, y int
+}
+
+func objfin(x *Obj) {
+ //println("finalized", x)
+}
+
+func TestWriteHeapDumpFinalizers(t *testing.T) {
+ if runtime.GOOS == "js" {
+ t.Skipf("WriteHeapDump is not available on %s.", runtime.GOOS)
+ }
+ f, err := os.CreateTemp("", "heapdumptest")
+ if err != nil {
+ t.Fatalf("TempFile failed: %v", err)
+ }
+ defer os.Remove(f.Name())
+ defer f.Close()
+
+ // bug 9172: WriteHeapDump couldn't handle more than one finalizer
+ println("allocating objects")
+ x := &Obj{}
+ runtime.SetFinalizer(x, objfin)
+ y := &Obj{}
+ runtime.SetFinalizer(y, objfin)
+
+ // Trigger collection of x and y, queueing of their finalizers.
+ println("starting gc")
+ runtime.GC()
+
+ // Make sure WriteHeapDump doesn't fail with multiple queued finalizers.
+ println("starting dump")
+ WriteHeapDump(f.Fd())
+ println("done dump")
+}
diff --git a/src/runtime/debug/mod.go b/src/runtime/debug/mod.go
new file mode 100644
index 0000000..0381bdc
--- /dev/null
+++ b/src/runtime/debug/mod.go
@@ -0,0 +1,114 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package debug
+
+import (
+ "strings"
+)
+
+// exported from runtime
+func modinfo() string
+
+// ReadBuildInfo returns the build information embedded
+// in the running binary. The information is available only
+// in binaries built with module support.
+func ReadBuildInfo() (info *BuildInfo, ok bool) {
+ return readBuildInfo(modinfo())
+}
+
+// BuildInfo represents the build information read from
+// the running binary.
+type BuildInfo struct {
+ Path string // The main package path
+ Main Module // The module containing the main package
+ Deps []*Module // Module dependencies
+}
+
+// Module represents a module.
+type Module struct {
+ Path string // module path
+ Version string // module version
+ Sum string // checksum
+ Replace *Module // replaced by this module
+}
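An editorial sketch of reading this structure from a module-built binary; replaced dependencies are reported through their Replace entry.

    package main

    import (
        "fmt"
        "runtime/debug"
    )

    func main() {
        info, ok := debug.ReadBuildInfo()
        if !ok {
            fmt.Println("binary was not built with module support")
            return
        }
        fmt.Println("main package:", info.Path)
        fmt.Println("main module: ", info.Main.Path, info.Main.Version)
        for _, dep := range info.Deps {
            m := dep
            if dep.Replace != nil {
                m = dep.Replace // the module actually used
            }
            fmt.Printf("dep %s %s %s\n", m.Path, m.Version, m.Sum)
        }
    }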
+
+func readBuildInfo(data string) (*BuildInfo, bool) {
+ if len(data) < 32 {
+ return nil, false
+ }
+ data = data[16 : len(data)-16]
+
+ const (
+ pathLine = "path\t"
+ modLine = "mod\t"
+ depLine = "dep\t"
+ repLine = "=>\t"
+ )
+
+ readEntryFirstLine := func(elem []string) (Module, bool) {
+ if len(elem) != 2 && len(elem) != 3 {
+ return Module{}, false
+ }
+ sum := ""
+ if len(elem) == 3 {
+ sum = elem[2]
+ }
+ return Module{
+ Path: elem[0],
+ Version: elem[1],
+ Sum: sum,
+ }, true
+ }
+
+ var (
+ info = &BuildInfo{}
+ last *Module
+ line string
+ ok bool
+ )
+ // Reverse of cmd/go/internal/modload.PackageBuildInfo
+ for len(data) > 0 {
+ i := strings.IndexByte(data, '\n')
+ if i < 0 {
+ break
+ }
+ line, data = data[:i], data[i+1:]
+ switch {
+ case strings.HasPrefix(line, pathLine):
+ elem := line[len(pathLine):]
+ info.Path = elem
+ case strings.HasPrefix(line, modLine):
+ elem := strings.Split(line[len(modLine):], "\t")
+ last = &info.Main
+ *last, ok = readEntryFirstLine(elem)
+ if !ok {
+ return nil, false
+ }
+ case strings.HasPrefix(line, depLine):
+ elem := strings.Split(line[len(depLine):], "\t")
+ last = new(Module)
+ info.Deps = append(info.Deps, last)
+ *last, ok = readEntryFirstLine(elem)
+ if !ok {
+ return nil, false
+ }
+ case strings.HasPrefix(line, repLine):
+ elem := strings.Split(line[len(repLine):], "\t")
+ if len(elem) != 3 {
+ return nil, false
+ }
+ if last == nil {
+ return nil, false
+ }
+ last.Replace = &Module{
+ Path: elem[0],
+ Version: elem[1],
+ Sum: elem[2],
+ }
+ last = nil
+ }
+ }
+ return info, true
+}
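Editorial reading of the text this parser consumes, reconstructed only from the code above (not an official specification): 16 bytes of framing are stripped from each end, and the rest is newline-terminated, tab-separated records. A "=>" line attaches a Replace module to the immediately preceding mod or dep entry. The paths, versions and sums below are invented.

    path<TAB>example.com/cmd/tool
    mod<TAB>example.com/cmd<TAB>v1.2.3<TAB>h1:abc=
    dep<TAB>example.com/lib<TAB>v0.4.0<TAB>h1:def=
    =><TAB>example.com/lib-fork<TAB>v0.4.1<TAB>h1:ghi=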
diff --git a/src/runtime/debug/panic_test.go b/src/runtime/debug/panic_test.go
new file mode 100644
index 0000000..b67a3de
--- /dev/null
+++ b/src/runtime/debug/panic_test.go
@@ -0,0 +1,53 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix darwin dragonfly freebsd linux netbsd openbsd
+
+// TODO: test on Windows?
+
+package debug_test
+
+import (
+ "runtime"
+ "runtime/debug"
+ "syscall"
+ "testing"
+ "unsafe"
+)
+
+func TestPanicOnFault(t *testing.T) {
+ if runtime.GOARCH == "s390x" {
+ t.Skip("s390x fault addresses are missing the low order bits")
+ }
+ if runtime.GOOS == "ios" {
+ t.Skip("iOS doesn't provide fault addresses")
+ }
+ m, err := syscall.Mmap(-1, 0, 0x1000, syscall.PROT_READ /* Note: no PROT_WRITE */, syscall.MAP_SHARED|syscall.MAP_ANON)
+ if err != nil {
+ t.Fatalf("can't map anonymous memory: %s", err)
+ }
+ defer syscall.Munmap(m)
+ old := debug.SetPanicOnFault(true)
+ defer debug.SetPanicOnFault(old)
+ const lowBits = 0x3e7
+ defer func() {
+ r := recover()
+ if r == nil {
+ t.Fatalf("write did not fault")
+ }
+ type addressable interface {
+ Addr() uintptr
+ }
+ a, ok := r.(addressable)
+ if !ok {
+ t.Fatalf("fault does not contain address")
+ }
+ want := uintptr(unsafe.Pointer(&m[lowBits]))
+ got := a.Addr()
+ if got != want {
+ t.Fatalf("fault address %x, want %x", got, want)
+ }
+ }()
+ m[lowBits] = 1 // will fault
+}
diff --git a/src/runtime/debug/stack.go b/src/runtime/debug/stack.go
new file mode 100644
index 0000000..5d810af
--- /dev/null
+++ b/src/runtime/debug/stack.go
@@ -0,0 +1,30 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package debug contains facilities for programs to debug themselves while
+// they are running.
+package debug
+
+import (
+ "os"
+ "runtime"
+)
+
+// PrintStack prints to standard error the stack trace returned by runtime.Stack.
+func PrintStack() {
+ os.Stderr.Write(Stack())
+}
+
+// Stack returns a formatted stack trace of the goroutine that calls it.
+// It calls runtime.Stack with a large enough buffer to capture the entire trace.
+func Stack() []byte {
+ buf := make([]byte, 1024)
+ for {
+ n := runtime.Stack(buf, false)
+ if n < len(buf) {
+ return buf[:n]
+ }
+ buf = make([]byte, 2*len(buf))
+ }
+}
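A short editorial example of the usual pairing of Stack with recover:

    package main

    import (
        "log"
        "runtime/debug"
    )

    func main() {
        defer func() {
            if r := recover(); r != nil {
                // Log the panic value along with this goroutine's stack.
                log.Printf("recovered: %v\n%s", r, debug.Stack())
            }
        }()
        panic("boom")
    }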
diff --git a/src/runtime/debug/stack_test.go b/src/runtime/debug/stack_test.go
new file mode 100644
index 0000000..9376e82
--- /dev/null
+++ b/src/runtime/debug/stack_test.go
@@ -0,0 +1,65 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package debug_test
+
+import (
+ . "runtime/debug"
+ "strings"
+ "testing"
+)
+
+type T int
+
+func (t *T) ptrmethod() []byte {
+ return Stack()
+}
+func (t T) method() []byte {
+ return t.ptrmethod()
+}
+
+/*
+ The traceback should look something like this, modulo line numbers and hex constants.
+ Don't worry much about the base levels, but check the ones in our own package.
+
+ goroutine 10 [running]:
+ runtime/debug.Stack(0x0, 0x0, 0x0)
+ /Users/r/go/src/runtime/debug/stack.go:28 +0x80
+ runtime/debug.(*T).ptrmethod(0xc82005ee70, 0x0, 0x0, 0x0)
+ /Users/r/go/src/runtime/debug/stack_test.go:15 +0x29
+ runtime/debug.T.method(0x0, 0x0, 0x0, 0x0)
+ /Users/r/go/src/runtime/debug/stack_test.go:18 +0x32
+ runtime/debug.TestStack(0xc8201ce000)
+ /Users/r/go/src/runtime/debug/stack_test.go:37 +0x38
+ testing.tRunner(0xc8201ce000, 0x664b58)
+ /Users/r/go/src/testing/testing.go:456 +0x98
+ created by testing.RunTests
+ /Users/r/go/src/testing/testing.go:561 +0x86d
+*/
+func TestStack(t *testing.T) {
+ b := T(0).method()
+ lines := strings.Split(string(b), "\n")
+ if len(lines) < 6 {
+ t.Fatal("too few lines")
+ }
+ n := 0
+ frame := func(line, code string) {
+ check(t, lines[n], code)
+ n++
+ check(t, lines[n], line)
+ n++
+ }
+ n++
+ frame("src/runtime/debug/stack.go", "runtime/debug.Stack")
+ frame("src/runtime/debug/stack_test.go", "runtime/debug_test.(*T).ptrmethod")
+ frame("src/runtime/debug/stack_test.go", "runtime/debug_test.T.method")
+ frame("src/runtime/debug/stack_test.go", "runtime/debug_test.TestStack")
+ frame("src/testing/testing.go", "")
+}
+
+func check(t *testing.T, line, has string) {
+ if !strings.Contains(line, has) {
+ t.Errorf("expected %q in %q", has, line)
+ }
+}
diff --git a/src/runtime/debug/stubs.go b/src/runtime/debug/stubs.go
new file mode 100644
index 0000000..2cba136
--- /dev/null
+++ b/src/runtime/debug/stubs.go
@@ -0,0 +1,17 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package debug
+
+import (
+ "time"
+)
+
+// Implemented in package runtime.
+func readGCStats(*[]time.Duration)
+func freeOSMemory()
+func setMaxStack(int) int
+func setGCPercent(int32) int32
+func setPanicOnFault(bool) bool
+func setMaxThreads(int) int
diff --git a/src/runtime/debug_test.go b/src/runtime/debug_test.go
new file mode 100644
index 0000000..a0b3f84
--- /dev/null
+++ b/src/runtime/debug_test.go
@@ -0,0 +1,249 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// TODO: This test could be implemented on all (most?) UNIXes if we
+// added syscall.Tgkill more widely.
+
+// We skip all of these tests under race mode because our test thread
+// spends all of its time in the race runtime, which isn't a safe
+// point.
+
+// +build amd64
+// +build linux
+// +build !race
+
+package runtime_test
+
+import (
+ "fmt"
+ "os"
+ "regexp"
+ "runtime"
+ "runtime/debug"
+ "sync/atomic"
+ "syscall"
+ "testing"
+)
+
+func startDebugCallWorker(t *testing.T) (g *runtime.G, after func()) {
+ // This can deadlock if run under a debugger because it
+ // depends on catching SIGTRAP, which is usually swallowed by
+ // a debugger.
+ skipUnderDebugger(t)
+
+ // This can deadlock if there aren't enough threads or if a GC
+ // tries to interrupt an atomic loop (see issue #10958). We
+ // use 8 Ps so there's room for the debug call worker,
+ // something that's trying to preempt the call worker, and the
+ // goroutine that's trying to stop the call worker.
+ ogomaxprocs := runtime.GOMAXPROCS(8)
+ ogcpercent := debug.SetGCPercent(-1)
+
+ // ready is a buffered channel so debugCallWorker won't block
+ // on sending to it. This makes it less likely we'll catch
+ // debugCallWorker while it's in the runtime.
+ ready := make(chan *runtime.G, 1)
+ var stop uint32
+ done := make(chan error)
+ go debugCallWorker(ready, &stop, done)
+ g = <-ready
+ return g, func() {
+ atomic.StoreUint32(&stop, 1)
+ err := <-done
+ if err != nil {
+ t.Fatal(err)
+ }
+ runtime.GOMAXPROCS(ogomaxprocs)
+ debug.SetGCPercent(ogcpercent)
+ }
+}
+
+func debugCallWorker(ready chan<- *runtime.G, stop *uint32, done chan<- error) {
+ runtime.LockOSThread()
+ defer runtime.UnlockOSThread()
+
+ ready <- runtime.Getg()
+
+ x := 2
+ debugCallWorker2(stop, &x)
+ if x != 1 {
+ done <- fmt.Errorf("want x = 2, got %d; register pointer not adjusted?", x)
+ }
+ close(done)
+}
+
+// Don't inline this function, since we want to test adjusting
+// pointers in the arguments.
+//
+//go:noinline
+func debugCallWorker2(stop *uint32, x *int) {
+ for atomic.LoadUint32(stop) == 0 {
+ // Strongly encourage x to live in a register so we
+ // can test pointer register adjustment.
+ *x++
+ }
+ *x = 1
+}
+
+func debugCallTKill(tid int) error {
+ return syscall.Tgkill(syscall.Getpid(), tid, syscall.SIGTRAP)
+}
+
+// skipUnderDebugger skips the current test when running under a
+// debugger (specifically if this process has a tracer). This is
+// Linux-specific.
+func skipUnderDebugger(t *testing.T) {
+ pid := syscall.Getpid()
+ status, err := os.ReadFile(fmt.Sprintf("/proc/%d/status", pid))
+ if err != nil {
+ t.Logf("couldn't get proc tracer: %s", err)
+ return
+ }
+ re := regexp.MustCompile(`TracerPid:\s+([0-9]+)`)
+ sub := re.FindSubmatch(status)
+ if sub == nil {
+ t.Logf("couldn't find proc tracer PID")
+ return
+ }
+ if string(sub[1]) == "0" {
+ return
+ }
+ t.Skip("test will deadlock under a debugger")
+}
+
+func TestDebugCall(t *testing.T) {
+ g, after := startDebugCallWorker(t)
+ defer after()
+
+ // Inject a call into the debugCallWorker goroutine and test
+ // basic argument and result passing.
+ var args struct {
+ x int
+ yRet int
+ }
+ fn := func(x int) (yRet int) {
+ return x + 1
+ }
+ args.x = 42
+ if _, err := runtime.InjectDebugCall(g, fn, &args, debugCallTKill, false); err != nil {
+ t.Fatal(err)
+ }
+ if args.yRet != 43 {
+ t.Fatalf("want 43, got %d", args.yRet)
+ }
+}
+
+func TestDebugCallLarge(t *testing.T) {
+ g, after := startDebugCallWorker(t)
+ defer after()
+
+ // Inject a call with a large call frame.
+ const N = 128
+ var args struct {
+ in [N]int
+ out [N]int
+ }
+ fn := func(in [N]int) (out [N]int) {
+ for i := range in {
+ out[i] = in[i] + 1
+ }
+ return
+ }
+ var want [N]int
+ for i := range args.in {
+ args.in[i] = i
+ want[i] = i + 1
+ }
+ if _, err := runtime.InjectDebugCall(g, fn, &args, debugCallTKill, false); err != nil {
+ t.Fatal(err)
+ }
+ if want != args.out {
+ t.Fatalf("want %v, got %v", want, args.out)
+ }
+}
+
+func TestDebugCallGC(t *testing.T) {
+ g, after := startDebugCallWorker(t)
+ defer after()
+
+ // Inject a call that performs a GC.
+ if _, err := runtime.InjectDebugCall(g, runtime.GC, nil, debugCallTKill, false); err != nil {
+ t.Fatal(err)
+ }
+}
+
+func TestDebugCallGrowStack(t *testing.T) {
+ g, after := startDebugCallWorker(t)
+ defer after()
+
+ // Inject a call that grows the stack. debugCallWorker checks
+ // for stack pointer breakage.
+ if _, err := runtime.InjectDebugCall(g, func() { growStack(nil) }, nil, debugCallTKill, false); err != nil {
+ t.Fatal(err)
+ }
+}
+
+//go:nosplit
+func debugCallUnsafePointWorker(gpp **runtime.G, ready, stop *uint32) {
+ // The nosplit causes this function to not contain safe-points
+ // except at calls.
+ runtime.LockOSThread()
+ defer runtime.UnlockOSThread()
+
+ *gpp = runtime.Getg()
+
+ for atomic.LoadUint32(stop) == 0 {
+ atomic.StoreUint32(ready, 1)
+ }
+}
+
+func TestDebugCallUnsafePoint(t *testing.T) {
+ skipUnderDebugger(t)
+
+ // This can deadlock if there aren't enough threads or if a GC
+ // tries to interrupt an atomic loop (see issue #10958).
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(8))
+ defer debug.SetGCPercent(debug.SetGCPercent(-1))
+
+ // Test that the runtime refuses call injection at unsafe points.
+ var g *runtime.G
+ var ready, stop uint32
+ defer atomic.StoreUint32(&stop, 1)
+ go debugCallUnsafePointWorker(&g, &ready, &stop)
+ for atomic.LoadUint32(&ready) == 0 {
+ runtime.Gosched()
+ }
+
+ _, err := runtime.InjectDebugCall(g, func() {}, nil, debugCallTKill, true)
+ if msg := "call not at safe point"; err == nil || err.Error() != msg {
+ t.Fatalf("want %q, got %s", msg, err)
+ }
+}
+
+func TestDebugCallPanic(t *testing.T) {
+ skipUnderDebugger(t)
+
+ // This can deadlock if there aren't enough threads.
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(8))
+
+ ready := make(chan *runtime.G)
+ var stop uint32
+ defer atomic.StoreUint32(&stop, 1)
+ go func() {
+ runtime.LockOSThread()
+ defer runtime.UnlockOSThread()
+ ready <- runtime.Getg()
+ for atomic.LoadUint32(&stop) == 0 {
+ }
+ }()
+ g := <-ready
+
+ p, err := runtime.InjectDebugCall(g, func() { panic("test") }, nil, debugCallTKill, false)
+ if err != nil {
+ t.Fatal(err)
+ }
+ if ps, ok := p.(string); !ok || ps != "test" {
+ t.Fatalf("wanted panic %v, got %v", "test", p)
+ }
+}
diff --git a/src/runtime/debugcall.go b/src/runtime/debugcall.go
new file mode 100644
index 0000000..efc68a7
--- /dev/null
+++ b/src/runtime/debugcall.go
@@ -0,0 +1,241 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build amd64
+
+package runtime
+
+import "unsafe"
+
+const (
+ debugCallSystemStack = "executing on Go runtime stack"
+ debugCallUnknownFunc = "call from unknown function"
+ debugCallRuntime = "call from within the Go runtime"
+ debugCallUnsafePoint = "call not at safe point"
+)
+
+func debugCallV1()
+func debugCallPanicked(val interface{})
+
+// debugCallCheck checks whether it is safe to inject a debugger
+// function call with return PC pc. If not, it returns a string
+// explaining why.
+//
+//go:nosplit
+func debugCallCheck(pc uintptr) string {
+ // No user calls from the system stack.
+ if getg() != getg().m.curg {
+ return debugCallSystemStack
+ }
+ if sp := getcallersp(); !(getg().stack.lo < sp && sp <= getg().stack.hi) {
+ // Fast syscalls (nanotime) and racecall switch to the
+ // g0 stack without switching g. We can't safely make
+ // a call in this state. (We can't even safely
+ // systemstack.)
+ return debugCallSystemStack
+ }
+
+ // Switch to the system stack to avoid overflowing the user
+ // stack.
+ var ret string
+ systemstack(func() {
+ f := findfunc(pc)
+ if !f.valid() {
+ ret = debugCallUnknownFunc
+ return
+ }
+
+ name := funcname(f)
+
+ switch name {
+ case "debugCall32",
+ "debugCall64",
+ "debugCall128",
+ "debugCall256",
+ "debugCall512",
+ "debugCall1024",
+ "debugCall2048",
+ "debugCall4096",
+ "debugCall8192",
+ "debugCall16384",
+ "debugCall32768",
+ "debugCall65536":
+ // These functions are allowed so that the debugger can initiate multiple function calls.
+ // See: https://golang.org/cl/161137/
+ return
+ }
+
+ // Disallow calls from the runtime. We could
+ // potentially make this condition tighter (e.g., not
+ // when locks are held), but there are enough tightly
+ // coded sequences (e.g., defer handling) that it's
+ // better to play it safe.
+ if pfx := "runtime."; len(name) > len(pfx) && name[:len(pfx)] == pfx {
+ ret = debugCallRuntime
+ return
+ }
+
+ // Check that this isn't an unsafe-point.
+ if pc != f.entry {
+ pc--
+ }
+ up := pcdatavalue(f, _PCDATA_UnsafePoint, pc, nil)
+ if up != _PCDATA_UnsafePointSafe {
+ // Not at a safe point.
+ ret = debugCallUnsafePoint
+ }
+ })
+ return ret
+}
+
+// debugCallWrap starts a new goroutine to run a debug call and blocks
+// the calling goroutine. On the goroutine, it prepares to recover
+// panics from the debug call, and then calls the call dispatching
+// function at PC dispatch.
+//
+// This must be deeply nosplit because there are untyped values on the
+// stack from debugCallV1.
+//
+//go:nosplit
+func debugCallWrap(dispatch uintptr) {
+ var lockedm bool
+ var lockedExt uint32
+ callerpc := getcallerpc()
+ gp := getg()
+
+ // Create a new goroutine to execute the call on. Run this on
+ // the system stack to avoid growing our stack.
+ systemstack(func() {
+ var args struct {
+ dispatch uintptr
+ callingG *g
+ }
+ args.dispatch = dispatch
+ args.callingG = gp
+ fn := debugCallWrap1
+ newg := newproc1(*(**funcval)(unsafe.Pointer(&fn)), unsafe.Pointer(&args), int32(unsafe.Sizeof(args)), gp, callerpc)
+
+ // If the current G is locked, then transfer that
+ // locked-ness to the new goroutine.
+ if gp.lockedm != 0 {
+ // Save lock state to restore later.
+ mp := gp.m
+ if mp != gp.lockedm.ptr() {
+ throw("inconsistent lockedm")
+ }
+
+ lockedm = true
+ lockedExt = mp.lockedExt
+
+ // Transfer external lock count to internal so
+ // it can't be unlocked from the debug call.
+ mp.lockedInt++
+ mp.lockedExt = 0
+
+ mp.lockedg.set(newg)
+ newg.lockedm.set(mp)
+ gp.lockedm = 0
+ }
+
+ // Mark the calling goroutine as being at an async
+ // safe-point, since it has a few conservative frames
+ // at the bottom of the stack. This also prevents
+ // stack shrinks.
+ gp.asyncSafePoint = true
+
+ // Stash newg away so we can execute it below (mcall's
+ // closure can't capture anything).
+ gp.schedlink.set(newg)
+ })
+
+ // Switch to the new goroutine.
+ mcall(func(gp *g) {
+ // Get newg.
+ newg := gp.schedlink.ptr()
+ gp.schedlink = 0
+
+ // Park the calling goroutine.
+ gp.waitreason = waitReasonDebugCall
+ if trace.enabled {
+ traceGoPark(traceEvGoBlock, 1)
+ }
+ casgstatus(gp, _Grunning, _Gwaiting)
+ dropg()
+
+ // Directly execute the new goroutine. The debug
+ // protocol will continue on the new goroutine, so
+ // it's important we not just let the scheduler do
+ // this or it may resume a different goroutine.
+ execute(newg, true)
+ })
+
+ // We'll resume here when the call returns.
+
+ // Restore locked state.
+ if lockedm {
+ mp := gp.m
+ mp.lockedExt = lockedExt
+ mp.lockedInt--
+ mp.lockedg.set(gp)
+ gp.lockedm.set(mp)
+ }
+
+ gp.asyncSafePoint = false
+}
+
+// debugCallWrap1 is the continuation of debugCallWrap on the callee
+// goroutine.
+func debugCallWrap1(dispatch uintptr, callingG *g) {
+ // Dispatch call and trap panics.
+ debugCallWrap2(dispatch)
+
+ // Resume the caller goroutine.
+ getg().schedlink.set(callingG)
+ mcall(func(gp *g) {
+ callingG := gp.schedlink.ptr()
+ gp.schedlink = 0
+
+ // Unlock this goroutine from the M if necessary. The
+ // calling G will relock.
+ if gp.lockedm != 0 {
+ gp.lockedm = 0
+ gp.m.lockedg = 0
+ }
+
+ // Switch back to the calling goroutine. At some point
+ // the scheduler will schedule us again and we'll
+ // finish exiting.
+ if trace.enabled {
+ traceGoSched()
+ }
+ casgstatus(gp, _Grunning, _Grunnable)
+ dropg()
+ lock(&sched.lock)
+ globrunqput(gp)
+ unlock(&sched.lock)
+
+ if trace.enabled {
+ traceGoUnpark(callingG, 0)
+ }
+ casgstatus(callingG, _Gwaiting, _Grunnable)
+ execute(callingG, true)
+ })
+}
+
+func debugCallWrap2(dispatch uintptr) {
+ // Call the dispatch function and trap panics.
+ var dispatchF func()
+ dispatchFV := funcval{dispatch}
+ *(*unsafe.Pointer)(unsafe.Pointer(&dispatchF)) = noescape(unsafe.Pointer(&dispatchFV))
+
+ var ok bool
+ defer func() {
+ if !ok {
+ err := recover()
+ debugCallPanicked(err)
+ }
+ }()
+ dispatchF()
+ ok = true
+}
diff --git a/src/runtime/debuglog.go b/src/runtime/debuglog.go
new file mode 100644
index 0000000..3ce3273
--- /dev/null
+++ b/src/runtime/debuglog.go
@@ -0,0 +1,820 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file provides an internal debug logging facility. The debug
+// log is a lightweight, in-memory, per-M ring buffer. By default, the
+// runtime prints the debug log on panic.
+//
+// To print something to the debug log, call dlog to obtain a dlogger
+// and use the methods on that to add values. The values will be
+// space-separated in the output (much like println).
+//
+// This facility can be enabled by passing -tags debuglog when
+// building. Without this tag, dlog calls compile to nothing.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// debugLogBytes is the size of each per-M ring buffer. This is
+// allocated off-heap to avoid blowing up the M and hence the GC'd
+// heap size.
+const debugLogBytes = 16 << 10
+
+// debugLogStringLimit is the maximum number of bytes in a string.
+// Above this, the string will be truncated with "..(n more bytes).."
+const debugLogStringLimit = debugLogBytes / 8
+
+// dlog returns a debug logger. The caller can use methods on the
+// returned logger to add values, which will be space-separated in the
+// final output, much like println. The caller must call end() to
+// finish the message.
+//
+// dlog can be used from highly-constrained corners of the runtime: it
+// is safe to use in the signal handler, from within the write
+// barrier, from within the stack implementation, and in places that
+// must be recursively nosplit.
+//
+// This will be compiled away if built without the debuglog build tag.
+// However, argument construction may not be. If any of the arguments
+// are not literals or trivial expressions, consider protecting the
+// call with "if dlogEnabled".
+//
+//go:nosplit
+//go:nowritebarrierrec
+func dlog() *dlogger {
+ if !dlogEnabled {
+ return nil
+ }
+
+ // Get the time.
+ tick, nano := uint64(cputicks()), uint64(nanotime())
+
+ // Try to get a cached logger.
+ l := getCachedDlogger()
+
+ // If we couldn't get a cached logger, try to get one from the
+ // global pool.
+ if l == nil {
+ allp := (*uintptr)(unsafe.Pointer(&allDloggers))
+ all := (*dlogger)(unsafe.Pointer(atomic.Loaduintptr(allp)))
+ for l1 := all; l1 != nil; l1 = l1.allLink {
+ if atomic.Load(&l1.owned) == 0 && atomic.Cas(&l1.owned, 0, 1) {
+ l = l1
+ break
+ }
+ }
+ }
+
+ // If that failed, allocate a new logger.
+ if l == nil {
+ l = (*dlogger)(sysAlloc(unsafe.Sizeof(dlogger{}), nil))
+ if l == nil {
+ throw("failed to allocate debug log")
+ }
+ l.w.r.data = &l.w.data
+ l.owned = 1
+
+ // Prepend to allDloggers list.
+ headp := (*uintptr)(unsafe.Pointer(&allDloggers))
+ for {
+ head := atomic.Loaduintptr(headp)
+ l.allLink = (*dlogger)(unsafe.Pointer(head))
+ if atomic.Casuintptr(headp, head, uintptr(unsafe.Pointer(l))) {
+ break
+ }
+ }
+ }
+
+ // If the time delta is getting too high, write a new sync
+ // packet. We set the limit so we don't write more than 6
+ // bytes of delta in the record header.
+ const deltaLimit = 1<<(3*7) - 1 // ~2ms between sync packets
+ if tick-l.w.tick > deltaLimit || nano-l.w.nano > deltaLimit {
+ l.w.writeSync(tick, nano)
+ }
+
+ // Reserve space for framing header.
+ l.w.ensure(debugLogHeaderSize)
+ l.w.write += debugLogHeaderSize
+
+ // Write record header.
+ l.w.uvarint(tick - l.w.tick)
+ l.w.uvarint(nano - l.w.nano)
+ gp := getg()
+ if gp != nil && gp.m != nil && gp.m.p != 0 {
+ l.w.varint(int64(gp.m.p.ptr().id))
+ } else {
+ l.w.varint(-1)
+ }
+
+ return l
+}
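Since dlog is internal to package runtime, a call site cannot be shown as a standalone program; as an editorial fragment (the function name, message and values are invented), a debuglog-enabled build might log like this, producing space-separated output as described above:

    // Editorial sketch of a call site inside the runtime, built with
    // -tags debuglog. The chained methods append values; end() commits
    // the record.
    func exampleCallSite(id int, size uintptr, ok bool) {
        dlog().s("span swept").i(id).uptr(size).b(ok).end()
    }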
+
+// A dlogger writes to the debug log.
+//
+// To obtain a dlogger, call dlog(). When done with the dlogger, call
+// end().
+//
+//go:notinheap
+type dlogger struct {
+ w debugLogWriter
+
+ // allLink is the next dlogger in the allDloggers list.
+ allLink *dlogger
+
+ // owned indicates that this dlogger is owned by an M. This is
+ // accessed atomically.
+ owned uint32
+}
+
+// allDloggers is a list of all dloggers, linked through
+// dlogger.allLink. This is accessed atomically. This is prepend only,
+// so it doesn't need to protect against ABA races.
+var allDloggers *dlogger
+
+//go:nosplit
+func (l *dlogger) end() {
+ if !dlogEnabled {
+ return
+ }
+
+ // Fill in framing header.
+ size := l.w.write - l.w.r.end
+ if !l.w.writeFrameAt(l.w.r.end, size) {
+ throw("record too large")
+ }
+
+ // Commit the record.
+ l.w.r.end = l.w.write
+
+ // Attempt to return this logger to the cache.
+ if putCachedDlogger(l) {
+ return
+ }
+
+ // Return the logger to the global pool.
+ atomic.Store(&l.owned, 0)
+}
+
+const (
+ debugLogUnknown = 1 + iota
+ debugLogBoolTrue
+ debugLogBoolFalse
+ debugLogInt
+ debugLogUint
+ debugLogHex
+ debugLogPtr
+ debugLogString
+ debugLogConstString
+ debugLogStringOverflow
+
+ debugLogPC
+ debugLogTraceback
+)
+
+//go:nosplit
+func (l *dlogger) b(x bool) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ if x {
+ l.w.byte(debugLogBoolTrue)
+ } else {
+ l.w.byte(debugLogBoolFalse)
+ }
+ return l
+}
+
+//go:nosplit
+func (l *dlogger) i(x int) *dlogger {
+ return l.i64(int64(x))
+}
+
+//go:nosplit
+func (l *dlogger) i8(x int8) *dlogger {
+ return l.i64(int64(x))
+}
+
+//go:nosplit
+func (l *dlogger) i16(x int16) *dlogger {
+ return l.i64(int64(x))
+}
+
+//go:nosplit
+func (l *dlogger) i32(x int32) *dlogger {
+ return l.i64(int64(x))
+}
+
+//go:nosplit
+func (l *dlogger) i64(x int64) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ l.w.byte(debugLogInt)
+ l.w.varint(x)
+ return l
+}
+
+//go:nosplit
+func (l *dlogger) u(x uint) *dlogger {
+ return l.u64(uint64(x))
+}
+
+//go:nosplit
+func (l *dlogger) uptr(x uintptr) *dlogger {
+ return l.u64(uint64(x))
+}
+
+//go:nosplit
+func (l *dlogger) u8(x uint8) *dlogger {
+ return l.u64(uint64(x))
+}
+
+//go:nosplit
+func (l *dlogger) u16(x uint16) *dlogger {
+ return l.u64(uint64(x))
+}
+
+//go:nosplit
+func (l *dlogger) u32(x uint32) *dlogger {
+ return l.u64(uint64(x))
+}
+
+//go:nosplit
+func (l *dlogger) u64(x uint64) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ l.w.byte(debugLogUint)
+ l.w.uvarint(x)
+ return l
+}
+
+//go:nosplit
+func (l *dlogger) hex(x uint64) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ l.w.byte(debugLogHex)
+ l.w.uvarint(x)
+ return l
+}
+
+//go:nosplit
+func (l *dlogger) p(x interface{}) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ l.w.byte(debugLogPtr)
+ if x == nil {
+ l.w.uvarint(0)
+ } else {
+ v := efaceOf(&x)
+ switch v._type.kind & kindMask {
+ case kindChan, kindFunc, kindMap, kindPtr, kindUnsafePointer:
+ l.w.uvarint(uint64(uintptr(v.data)))
+ default:
+ throw("not a pointer type")
+ }
+ }
+ return l
+}
+
+//go:nosplit
+func (l *dlogger) s(x string) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ str := stringStructOf(&x)
+ datap := &firstmoduledata
+ if len(x) > 4 && datap.etext <= uintptr(str.str) && uintptr(str.str) < datap.end {
+ // String constants are in the rodata section, which
+ // isn't recorded in moduledata. But it has to be
+ // somewhere between etext and end.
+ l.w.byte(debugLogConstString)
+ l.w.uvarint(uint64(str.len))
+ l.w.uvarint(uint64(uintptr(str.str) - datap.etext))
+ } else {
+ l.w.byte(debugLogString)
+ var b []byte
+ bb := (*slice)(unsafe.Pointer(&b))
+ bb.array = str.str
+ bb.len, bb.cap = str.len, str.len
+ if len(b) > debugLogStringLimit {
+ b = b[:debugLogStringLimit]
+ }
+ l.w.uvarint(uint64(len(b)))
+ l.w.bytes(b)
+ if len(b) != len(x) {
+ l.w.byte(debugLogStringOverflow)
+ l.w.uvarint(uint64(len(x) - len(b)))
+ }
+ }
+ return l
+}
+
+//go:nosplit
+func (l *dlogger) pc(x uintptr) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ l.w.byte(debugLogPC)
+ l.w.uvarint(uint64(x))
+ return l
+}
+
+//go:nosplit
+func (l *dlogger) traceback(x []uintptr) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ l.w.byte(debugLogTraceback)
+ l.w.uvarint(uint64(len(x)))
+ for _, pc := range x {
+ l.w.uvarint(uint64(pc))
+ }
+ return l
+}
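+
+// A record is normally built by chaining the value methods above and
+// committing it with end(). A minimal sketch (the argument names here
+// are made up):
+//
+//	dlog().s("scavenge").i(npages).hex(base).end()
+//
+// Each call appends one field to the open record; end() fills in the
+// framing header and publishes the record to the ring buffer.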
+
+// A debugLogWriter is a ring buffer of binary debug log records.
+//
+// A log record consists of a 2-byte framing header and a sequence of
+// fields. The framing header gives the size of the record as a little
+// endian 16-bit value. Each field starts with a byte indicating its
+// type, followed by type-specific data. If the size in the framing
+// header is 0, it's a sync record consisting of two little endian
+// 64-bit values giving a new time base.
+//
+// Because this is a ring buffer, new records will eventually
+// overwrite old records. Hence, it maintains a reader that consumes
+// the log as it gets overwritten. That reader state is where an
+// actual log reader would start.
+//
+//go:notinheap
+type debugLogWriter struct {
+ write uint64
+ data debugLogBuf
+
+ // tick and nano are the time bases from the most recently
+ // written sync record.
+ tick, nano uint64
+
+ // r is a reader that consumes records as they get overwritten
+ // by the writer. It also acts as the initial reader state
+ // when printing the log.
+ r debugLogReader
+
+ // buf is a scratch buffer for encoding. This is here to
+ // reduce stack usage.
+ buf [10]byte
+}
+
+//go:notinheap
+type debugLogBuf [debugLogBytes]byte
+
+const (
+ // debugLogHeaderSize is the number of bytes in the framing
+ // header of every dlog record.
+ debugLogHeaderSize = 2
+
+ // debugLogSyncSize is the number of bytes in a sync record.
+ debugLogSyncSize = debugLogHeaderSize + 2*8
+)
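+
+// A worked example of the framing described above: a record whose
+// header, tick/nano deltas, P number, and fields together occupy 18
+// bytes stores the little-endian size 18 in its first two bytes,
+// 0x12 0x00. A sync record instead stores 0x00 0x00 followed by the two
+// 64-bit time-base values, always debugLogSyncSize (18) bytes in total.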
+
+//go:nosplit
+func (l *debugLogWriter) ensure(n uint64) {
+ for l.write+n >= l.r.begin+uint64(len(l.data)) {
+ // Consume record at begin.
+ if l.r.skip() == ^uint64(0) {
+ // Wrapped around within a record.
+ //
+ // TODO(austin): It would be better to just
+ // eat the whole buffer at this point, but we
+ // have to communicate that to the reader
+ // somehow.
+ throw("record wrapped around")
+ }
+ }
+}
+
+//go:nosplit
+func (l *debugLogWriter) writeFrameAt(pos, size uint64) bool {
+ l.data[pos%uint64(len(l.data))] = uint8(size)
+ l.data[(pos+1)%uint64(len(l.data))] = uint8(size >> 8)
+ return size <= 0xFFFF
+}
+
+//go:nosplit
+func (l *debugLogWriter) writeSync(tick, nano uint64) {
+ l.tick, l.nano = tick, nano
+ l.ensure(debugLogHeaderSize)
+ l.writeFrameAt(l.write, 0)
+ l.write += debugLogHeaderSize
+ l.writeUint64LE(tick)
+ l.writeUint64LE(nano)
+ l.r.end = l.write
+}
+
+//go:nosplit
+func (l *debugLogWriter) writeUint64LE(x uint64) {
+ var b [8]byte
+ b[0] = byte(x)
+ b[1] = byte(x >> 8)
+ b[2] = byte(x >> 16)
+ b[3] = byte(x >> 24)
+ b[4] = byte(x >> 32)
+ b[5] = byte(x >> 40)
+ b[6] = byte(x >> 48)
+ b[7] = byte(x >> 56)
+ l.bytes(b[:])
+}
+
+//go:nosplit
+func (l *debugLogWriter) byte(x byte) {
+ l.ensure(1)
+ pos := l.write
+ l.write++
+ l.data[pos%uint64(len(l.data))] = x
+}
+
+//go:nosplit
+func (l *debugLogWriter) bytes(x []byte) {
+ l.ensure(uint64(len(x)))
+ pos := l.write
+ l.write += uint64(len(x))
+ for len(x) > 0 {
+ n := copy(l.data[pos%uint64(len(l.data)):], x)
+ pos += uint64(n)
+ x = x[n:]
+ }
+}
+
+//go:nosplit
+func (l *debugLogWriter) varint(x int64) {
+ var u uint64
+ if x < 0 {
+		u = (^uint64(x) << 1) | 1 // complement x, bit 0 is 1
+	} else {
+		u = (uint64(x) << 1) // do not complement x, bit 0 is 0
+ }
+ l.uvarint(u)
+}
+
+//go:nosplit
+func (l *debugLogWriter) uvarint(u uint64) {
+ i := 0
+ for u >= 0x80 {
+ l.buf[i] = byte(u) | 0x80
+ u >>= 7
+ i++
+ }
+ l.buf[i] = byte(u)
+ i++
+ l.bytes(l.buf[:i])
+}
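+
+// Worked examples of the two encodings above (zig-zag for signed values,
+// then the same base-128 varint layout used by encoding/binary):
+//
+//	varint(0)  -> u=0 -> 0x00
+//	varint(-1) -> u=1 -> 0x01
+//	varint(1)  -> u=2 -> 0x02
+//	varint(-2) -> u=3 -> 0x03
+//	uvarint(300)       -> 0xAC 0x02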
+
+type debugLogReader struct {
+ data *debugLogBuf
+
+ // begin and end are the positions in the log of the beginning
+ // and end of the log data, modulo len(data).
+ begin, end uint64
+
+ // tick and nano are the current time base at begin.
+ tick, nano uint64
+}
+
+//go:nosplit
+func (r *debugLogReader) skip() uint64 {
+ // Read size at pos.
+ if r.begin+debugLogHeaderSize > r.end {
+ return ^uint64(0)
+ }
+ size := uint64(r.readUint16LEAt(r.begin))
+ if size == 0 {
+ // Sync packet.
+ r.tick = r.readUint64LEAt(r.begin + debugLogHeaderSize)
+ r.nano = r.readUint64LEAt(r.begin + debugLogHeaderSize + 8)
+ size = debugLogSyncSize
+ }
+ if r.begin+size > r.end {
+ return ^uint64(0)
+ }
+ r.begin += size
+ return size
+}
+
+//go:nosplit
+func (r *debugLogReader) readUint16LEAt(pos uint64) uint16 {
+ return uint16(r.data[pos%uint64(len(r.data))]) |
+ uint16(r.data[(pos+1)%uint64(len(r.data))])<<8
+}
+
+//go:nosplit
+func (r *debugLogReader) readUint64LEAt(pos uint64) uint64 {
+ var b [8]byte
+ for i := range b {
+ b[i] = r.data[pos%uint64(len(r.data))]
+ pos++
+ }
+ return uint64(b[0]) | uint64(b[1])<<8 |
+ uint64(b[2])<<16 | uint64(b[3])<<24 |
+ uint64(b[4])<<32 | uint64(b[5])<<40 |
+ uint64(b[6])<<48 | uint64(b[7])<<56
+}
+
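+// peek returns the tick of the next record, or ^uint64(0) if no
+// complete record is available. printDebugLog uses that sentinel to
+// treat a log as exhausted when merging records from all loggers.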
+func (r *debugLogReader) peek() (tick uint64) {
+ // Consume any sync records.
+ size := uint64(0)
+ for size == 0 {
+ if r.begin+debugLogHeaderSize > r.end {
+ return ^uint64(0)
+ }
+ size = uint64(r.readUint16LEAt(r.begin))
+ if size != 0 {
+ break
+ }
+ if r.begin+debugLogSyncSize > r.end {
+ return ^uint64(0)
+ }
+ // Sync packet.
+ r.tick = r.readUint64LEAt(r.begin + debugLogHeaderSize)
+ r.nano = r.readUint64LEAt(r.begin + debugLogHeaderSize + 8)
+ r.begin += debugLogSyncSize
+ }
+
+ // Peek tick delta.
+ if r.begin+size > r.end {
+ return ^uint64(0)
+ }
+ pos := r.begin + debugLogHeaderSize
+ var u uint64
+ for i := uint(0); ; i += 7 {
+ b := r.data[pos%uint64(len(r.data))]
+ pos++
+ u |= uint64(b&^0x80) << i
+ if b&0x80 == 0 {
+ break
+ }
+ }
+ if pos > r.begin+size {
+ return ^uint64(0)
+ }
+ return r.tick + u
+}
+
+func (r *debugLogReader) header() (end, tick, nano uint64, p int) {
+ // Read size. We've already skipped sync packets and checked
+ // bounds in peek.
+ size := uint64(r.readUint16LEAt(r.begin))
+ end = r.begin + size
+ r.begin += debugLogHeaderSize
+
+ // Read tick, nano, and p.
+ tick = r.uvarint() + r.tick
+ nano = r.uvarint() + r.nano
+ p = int(r.varint())
+
+ return
+}
+
+func (r *debugLogReader) uvarint() uint64 {
+ var u uint64
+ for i := uint(0); ; i += 7 {
+ b := r.data[r.begin%uint64(len(r.data))]
+ r.begin++
+ u |= uint64(b&^0x80) << i
+ if b&0x80 == 0 {
+ break
+ }
+ }
+ return u
+}
+
+func (r *debugLogReader) varint() int64 {
+ u := r.uvarint()
+ var v int64
+ if u&1 == 0 {
+ v = int64(u >> 1)
+ } else {
+ v = ^int64(u >> 1)
+ }
+ return v
+}
+
+func (r *debugLogReader) printVal() bool {
+ typ := r.data[r.begin%uint64(len(r.data))]
+ r.begin++
+
+ switch typ {
+ default:
+ print("<unknown field type ", hex(typ), " pos ", r.begin-1, " end ", r.end, ">\n")
+ return false
+
+ case debugLogUnknown:
+ print("<unknown kind>")
+
+ case debugLogBoolTrue:
+ print(true)
+
+ case debugLogBoolFalse:
+ print(false)
+
+ case debugLogInt:
+ print(r.varint())
+
+ case debugLogUint:
+ print(r.uvarint())
+
+ case debugLogHex, debugLogPtr:
+ print(hex(r.uvarint()))
+
+ case debugLogString:
+ sl := r.uvarint()
+ if r.begin+sl > r.end {
+ r.begin = r.end
+ print("<string length corrupted>")
+ break
+ }
+ for sl > 0 {
+ b := r.data[r.begin%uint64(len(r.data)):]
+ if uint64(len(b)) > sl {
+ b = b[:sl]
+ }
+ r.begin += uint64(len(b))
+ sl -= uint64(len(b))
+ gwrite(b)
+ }
+
+ case debugLogConstString:
+ len, ptr := int(r.uvarint()), uintptr(r.uvarint())
+ ptr += firstmoduledata.etext
+ str := stringStruct{
+ str: unsafe.Pointer(ptr),
+ len: len,
+ }
+ s := *(*string)(unsafe.Pointer(&str))
+ print(s)
+
+ case debugLogStringOverflow:
+ print("..(", r.uvarint(), " more bytes)..")
+
+ case debugLogPC:
+ printDebugLogPC(uintptr(r.uvarint()), false)
+
+ case debugLogTraceback:
+ n := int(r.uvarint())
+ for i := 0; i < n; i++ {
+ print("\n\t")
+ // gentraceback PCs are always return PCs.
+ // Convert them to call PCs.
+ //
+ // TODO(austin): Expand inlined frames.
+ printDebugLogPC(uintptr(r.uvarint()), true)
+ }
+ }
+
+ return true
+}
+
+// printDebugLog prints the debug log.
+func printDebugLog() {
+ if !dlogEnabled {
+ return
+ }
+
+ // This function should not panic or throw since it is used in
+	// the fatal panic path, where doing so may deadlock.
+
+ printlock()
+
+ // Get the list of all debug logs.
+ allp := (*uintptr)(unsafe.Pointer(&allDloggers))
+ all := (*dlogger)(unsafe.Pointer(atomic.Loaduintptr(allp)))
+
+ // Count the logs.
+ n := 0
+ for l := all; l != nil; l = l.allLink {
+ n++
+ }
+ if n == 0 {
+ printunlock()
+ return
+ }
+
+ // Prepare read state for all logs.
+ type readState struct {
+ debugLogReader
+ first bool
+ lost uint64
+ nextTick uint64
+ }
+ state1 := sysAlloc(unsafe.Sizeof(readState{})*uintptr(n), nil)
+ if state1 == nil {
+ println("failed to allocate read state for", n, "logs")
+ printunlock()
+ return
+ }
+ state := (*[1 << 20]readState)(state1)[:n]
+ {
+ l := all
+ for i := range state {
+ s := &state[i]
+ s.debugLogReader = l.w.r
+ s.first = true
+ s.lost = l.w.r.begin
+ s.nextTick = s.peek()
+ l = l.allLink
+ }
+ }
+
+ // Print records.
+ for {
+ // Find the next record.
+ var best struct {
+ tick uint64
+ i int
+ }
+ best.tick = ^uint64(0)
+ for i := range state {
+ if state[i].nextTick < best.tick {
+ best.tick = state[i].nextTick
+ best.i = i
+ }
+ }
+ if best.tick == ^uint64(0) {
+ break
+ }
+
+ // Print record.
+ s := &state[best.i]
+ if s.first {
+ print(">> begin log ", best.i)
+ if s.lost != 0 {
+ print("; lost first ", s.lost>>10, "KB")
+ }
+ print(" <<\n")
+ s.first = false
+ }
+
+ end, _, nano, p := s.header()
+ oldEnd := s.end
+ s.end = end
+
+ print("[")
+ var tmpbuf [21]byte
+ pnano := int64(nano) - runtimeInitTime
+ if pnano < 0 {
+ // Logged before runtimeInitTime was set.
+ pnano = 0
+ }
+ print(string(itoaDiv(tmpbuf[:], uint64(pnano), 9)))
+ print(" P ", p, "] ")
+
+ for i := 0; s.begin < s.end; i++ {
+ if i > 0 {
+ print(" ")
+ }
+ if !s.printVal() {
+ // Abort this P log.
+ print("<aborting P log>")
+ end = oldEnd
+ break
+ }
+ }
+ println()
+
+ // Move on to the next record.
+ s.begin = end
+ s.end = oldEnd
+ s.nextTick = s.peek()
+ }
+
+ printunlock()
+}
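+
+// As printed above, each log gets a ">> begin log N <<" banner (plus a
+// "lost first ...KB" note if the ring wrapped), and each record is
+// rendered roughly as
+//
+//	[<seconds since runtime init, 9 decimal places> P <p>] field field ...
+//
+// with records from all loggers merged in tick order.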
+
+// printDebugLogPC prints a single symbolized PC. If returnPC is true,
+// pc is a return PC that must first be converted to a call PC.
+func printDebugLogPC(pc uintptr, returnPC bool) {
+ fn := findfunc(pc)
+ if returnPC && (!fn.valid() || pc > fn.entry) {
+ // TODO(austin): Don't back up if the previous frame
+ // was a sigpanic.
+ pc--
+ }
+
+ print(hex(pc))
+ if !fn.valid() {
+ print(" [unknown PC]")
+ } else {
+ name := funcname(fn)
+ file, line := funcline(fn, pc)
+ print(" [", name, "+", hex(pc-fn.entry),
+ " ", file, ":", line, "]")
+ }
+}
diff --git a/src/runtime/debuglog_off.go b/src/runtime/debuglog_off.go
new file mode 100644
index 0000000..bb3e172
--- /dev/null
+++ b/src/runtime/debuglog_off.go
@@ -0,0 +1,19 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !debuglog
+
+package runtime
+
+const dlogEnabled = false
+
+type dlogPerM struct{}
+
+func getCachedDlogger() *dlogger {
+ return nil
+}
+
+func putCachedDlogger(l *dlogger) bool {
+ return false
+}
diff --git a/src/runtime/debuglog_on.go b/src/runtime/debuglog_on.go
new file mode 100644
index 0000000..3d477e8
--- /dev/null
+++ b/src/runtime/debuglog_on.go
@@ -0,0 +1,45 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build debuglog
+
+package runtime
+
+const dlogEnabled = true
+
+// dlogPerM is the per-M debug log data. This is embedded in the m
+// struct.
+type dlogPerM struct {
+ dlogCache *dlogger
+}
+
+// getCachedDlogger returns a cached dlogger if it can do so
+// efficiently, or nil otherwise. The returned dlogger will be owned.
+func getCachedDlogger() *dlogger {
+ mp := acquirem()
+ // We don't return a cached dlogger if we're running on the
+ // signal stack in case the signal arrived while in
+ // get/putCachedDlogger. (Too bad we don't have non-atomic
+ // exchange!)
+ var l *dlogger
+ if getg() != mp.gsignal {
+ l = mp.dlogCache
+ mp.dlogCache = nil
+ }
+ releasem(mp)
+ return l
+}
+
+// putCachedDlogger attempts to return l to the local cache. It
+// returns false if this fails.
+func putCachedDlogger(l *dlogger) bool {
+ mp := acquirem()
+ if getg() != mp.gsignal && mp.dlogCache == nil {
+ mp.dlogCache = l
+ releasem(mp)
+ return true
+ }
+ releasem(mp)
+ return false
+}
diff --git a/src/runtime/debuglog_test.go b/src/runtime/debuglog_test.go
new file mode 100644
index 0000000..2570e35
--- /dev/null
+++ b/src/runtime/debuglog_test.go
@@ -0,0 +1,158 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// TODO(austin): All of these tests are skipped if the debuglog build
+// tag isn't provided. That means we basically never test debuglog.
+// There are two potential ways around this:
+//
+// 1. Make these tests re-build the runtime test with the debuglog
+// build tag and re-invoke themselves.
+//
+// 2. Always build the whole debuglog infrastructure and depend on
+// linker dead-code elimination to drop it. This is easy for dlog()
+// since there won't be any calls to it. For printDebugLog, we can
+// make panic call a wrapper that calls printDebugLog if the
+// debuglog build tag is set and otherwise does nothing. Then tests
+// could call printDebugLog directly. This is the right answer in
+// principle, but currently our linker reads in all symbols
+// regardless, so this would slow down and bloat all links. If the
+// linker gets more efficient about this, we should revisit this
+// approach.
+
+package runtime_test
+
+import (
+ "bytes"
+ "fmt"
+ "regexp"
+ "runtime"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "testing"
+)
+
+func skipDebugLog(t *testing.T) {
+ if !runtime.DlogEnabled {
+ t.Skip("debug log disabled (rebuild with -tags debuglog)")
+ }
+}
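+
+// For example, to run these tests instead of skipping them, rebuild the
+// runtime test with the tag:
+//
+//	go test -tags debuglog -run TestDebugLog runtime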
+
+func dlogCanonicalize(x string) string {
+ begin := regexp.MustCompile(`(?m)^>> begin log \d+ <<\n`)
+ x = begin.ReplaceAllString(x, "")
+ prefix := regexp.MustCompile(`(?m)^\[[^]]+\]`)
+ x = prefix.ReplaceAllString(x, "[]")
+ return x
+}
+
+func TestDebugLog(t *testing.T) {
+ skipDebugLog(t)
+ runtime.ResetDebugLog()
+ runtime.Dlog().S("testing").End()
+ got := dlogCanonicalize(runtime.DumpDebugLog())
+ if want := "[] testing\n"; got != want {
+ t.Fatalf("want %q, got %q", want, got)
+ }
+}
+
+func TestDebugLogTypes(t *testing.T) {
+ skipDebugLog(t)
+ runtime.ResetDebugLog()
+ var varString = strings.Repeat("a", 4)
+ runtime.Dlog().B(true).B(false).I(-42).I16(0x7fff).U64(^uint64(0)).Hex(0xfff).P(nil).S(varString).S("const string").End()
+ got := dlogCanonicalize(runtime.DumpDebugLog())
+ if want := "[] true false -42 32767 18446744073709551615 0xfff 0x0 aaaa const string\n"; got != want {
+ t.Fatalf("want %q, got %q", want, got)
+ }
+}
+
+func TestDebugLogSym(t *testing.T) {
+ skipDebugLog(t)
+ runtime.ResetDebugLog()
+ pc, _, _, _ := runtime.Caller(0)
+ runtime.Dlog().PC(pc).End()
+ got := dlogCanonicalize(runtime.DumpDebugLog())
+ want := regexp.MustCompile(`\[\] 0x[0-9a-f]+ \[runtime_test\.TestDebugLogSym\+0x[0-9a-f]+ .*/debuglog_test\.go:[0-9]+\]\n`)
+ if !want.MatchString(got) {
+ t.Fatalf("want matching %s, got %q", want, got)
+ }
+}
+
+func TestDebugLogInterleaving(t *testing.T) {
+ skipDebugLog(t)
+ runtime.ResetDebugLog()
+ var wg sync.WaitGroup
+ done := int32(0)
+ wg.Add(1)
+ go func() {
+ // Encourage main goroutine to move around to
+ // different Ms and Ps.
+ for atomic.LoadInt32(&done) == 0 {
+ runtime.Gosched()
+ }
+ wg.Done()
+ }()
+ var want bytes.Buffer
+ for i := 0; i < 1000; i++ {
+ runtime.Dlog().I(i).End()
+ fmt.Fprintf(&want, "[] %d\n", i)
+ runtime.Gosched()
+ }
+ atomic.StoreInt32(&done, 1)
+ wg.Wait()
+
+ gotFull := runtime.DumpDebugLog()
+ got := dlogCanonicalize(gotFull)
+ if got != want.String() {
+		// Since the timestamps are useful in understanding
+ // failures of this test, we print the uncanonicalized
+ // output.
+ t.Fatalf("want %q, got (uncanonicalized) %q", want.String(), gotFull)
+ }
+}
+
+func TestDebugLogWraparound(t *testing.T) {
+ skipDebugLog(t)
+
+ // Make sure we don't switch logs so it's easier to fill one up.
+ runtime.LockOSThread()
+ defer runtime.UnlockOSThread()
+
+ runtime.ResetDebugLog()
+ var longString = strings.Repeat("a", 128)
+ var want bytes.Buffer
+ for i, j := 0, 0; j < 2*runtime.DebugLogBytes; i, j = i+1, j+len(longString) {
+ runtime.Dlog().I(i).S(longString).End()
+ fmt.Fprintf(&want, "[] %d %s\n", i, longString)
+ }
+ log := runtime.DumpDebugLog()
+
+ // Check for "lost" message.
+ lost := regexp.MustCompile(`^>> begin log \d+; lost first \d+KB <<\n`)
+ if !lost.MatchString(log) {
+ t.Fatalf("want matching %s, got %q", lost, log)
+ }
+ idx := lost.FindStringIndex(log)
+ // Strip lost message.
+ log = dlogCanonicalize(log[idx[1]:])
+
+ // Check log.
+ if !strings.HasSuffix(want.String(), log) {
+ t.Fatalf("wrong suffix:\n%s", log)
+ }
+}
+
+func TestDebugLogLongString(t *testing.T) {
+ skipDebugLog(t)
+
+ runtime.ResetDebugLog()
+ var longString = strings.Repeat("a", runtime.DebugLogStringLimit+1)
+ runtime.Dlog().S(longString).End()
+ got := dlogCanonicalize(runtime.DumpDebugLog())
+ want := "[] " + strings.Repeat("a", runtime.DebugLogStringLimit) + " ..(1 more bytes)..\n"
+ if got != want {
+ t.Fatalf("want %q, got %q", want, got)
+ }
+}
diff --git a/src/runtime/defer_test.go b/src/runtime/defer_test.go
new file mode 100644
index 0000000..9a40ea1
--- /dev/null
+++ b/src/runtime/defer_test.go
@@ -0,0 +1,440 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "reflect"
+ "runtime"
+ "testing"
+)
+
+// Make sure open-coded defer exit code is not lost, even when there is an
+// unconditional panic (hence no return from the function)
+func TestUnconditionalPanic(t *testing.T) {
+ defer func() {
+ if recover() != "testUnconditional" {
+ t.Fatal("expected unconditional panic")
+ }
+ }()
+ panic("testUnconditional")
+}
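+
+// Background for the tests in this file: since Go 1.14 the compiler
+// "open-codes" defers when it can, emitting the deferred call inline at
+// each function exit instead of pushing a _defer record at run time.
+// Defers inside loops (or beyond a small per-function limit) still take
+// the run-time path, which is why several tests below deliberately put
+// a defer in a loop, roughly:
+//
+//	for {
+//		defer f() // not open-coded: goes on the defer chain
+//		break
+//	}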
+
+var glob int = 3
+
+// Test an open-coded defer and non-open-coded defer - make sure both defers run
+// and call recover()
+func TestOpenAndNonOpenDefers(t *testing.T) {
+ for {
+ // Non-open defer because in a loop
+ defer func(n int) {
+ if recover() != "testNonOpenDefer" {
+ t.Fatal("expected testNonOpen panic")
+ }
+ }(3)
+ if glob > 2 {
+ break
+ }
+ }
+ testOpen(t, 47)
+ panic("testNonOpenDefer")
+}
+
+//go:noinline
+func testOpen(t *testing.T, arg int) {
+ defer func(n int) {
+ if recover() != "testOpenDefer" {
+ t.Fatal("expected testOpen panic")
+ }
+ }(4)
+ if arg > 2 {
+ panic("testOpenDefer")
+ }
+}
+
+// Test a non-open-coded defer and an open-coded defer - make sure both defers run
+// and call recover()
+func TestNonOpenAndOpenDefers(t *testing.T) {
+ testOpen(t, 47)
+ for {
+ // Non-open defer because in a loop
+ defer func(n int) {
+ if recover() != "testNonOpenDefer" {
+ t.Fatal("expected testNonOpen panic")
+ }
+ }(3)
+ if glob > 2 {
+ break
+ }
+ }
+ panic("testNonOpenDefer")
+}
+
+var list []int
+
+// Make sure that conditional open-coded defers are activated correctly and run in
+// the correct order.
+func TestConditionalDefers(t *testing.T) {
+ list = make([]int, 0, 10)
+
+ defer func() {
+ if recover() != "testConditional" {
+ t.Fatal("expected panic")
+ }
+ want := []int{4, 2, 1}
+ if !reflect.DeepEqual(want, list) {
+ t.Fatal(fmt.Sprintf("wanted %v, got %v", want, list))
+ }
+
+ }()
+ testConditionalDefers(8)
+}
+
+func testConditionalDefers(n int) {
+ doappend := func(i int) {
+ list = append(list, i)
+ }
+
+ defer doappend(1)
+ if n > 5 {
+ defer doappend(2)
+ if n > 8 {
+ defer doappend(3)
+ } else {
+ defer doappend(4)
+ }
+ }
+ panic("testConditional")
+}
+
+// Test that there is no compile-time or run-time error if an open-coded defer
+// call is removed by constant propagation and dead-code elimination.
+func TestDisappearingDefer(t *testing.T) {
+ switch runtime.GOOS {
+ case "invalidOS":
+ defer func() {
+ t.Fatal("Defer shouldn't run")
+ }()
+ }
+}
+
+// This tests an extra recursive panic behavior that is only specified in the
+// code. Suppose a first panic P1 happens and starts processing defer calls. If a
+// second panic P2 happens while processing defer call D in frame F, then defer
+// call processing is restarted (with some potentially new defer calls created by
+// D or its callees). If the defer processing reaches the started defer call D
+// again in the defer stack, then the original panic P1 is aborted and cannot
+// continue panic processing or be recovered. If the panic P2 does a recover at
+// some point, it will naturally remove the original panic P1 from the stack
+// (since the original panic had to be in frame F or a descendant of F).
+func TestAbortedPanic(t *testing.T) {
+ defer func() {
+ r := recover()
+ if r != nil {
+ t.Fatal(fmt.Sprintf("wanted nil recover, got %v", r))
+ }
+ }()
+ defer func() {
+ r := recover()
+ if r != "panic2" {
+ t.Fatal(fmt.Sprintf("wanted %v, got %v", "panic2", r))
+ }
+ }()
+ defer func() {
+ panic("panic2")
+ }()
+ panic("panic1")
+}
+
+// This tests that recover() does not succeed unless it is called directly from a
+// defer function that is directly called by the panic. Here, we first call it
+// from a defer function that is created by the defer function called directly by
+// the panic; that recover() must fail, and the panic is then recovered by the
+// outer defer function that the panic calls directly.
+func TestRecoverMatching(t *testing.T) {
+ defer func() {
+ r := recover()
+ if r != "panic1" {
+ t.Fatal(fmt.Sprintf("wanted %v, got %v", "panic1", r))
+ }
+ }()
+ defer func() {
+ defer func() {
+ // Shouldn't succeed, even though it is called directly
+ // from a defer function, since this defer function was
+ // not directly called by the panic.
+ r := recover()
+ if r != nil {
+ t.Fatal(fmt.Sprintf("wanted nil recover, got %v", r))
+ }
+ }()
+ }()
+ panic("panic1")
+}
+
+type nonSSAable [128]byte
+
+type bigStruct struct {
+ x, y, z, w, p, q int64
+}
+
+type containsBigStruct struct {
+ element bigStruct
+}
+
+func mknonSSAable() nonSSAable {
+ globint1++
+ return nonSSAable{0, 0, 0, 0, 5}
+}
+
+var globint1, globint2, globint3 int
+
+//go:noinline
+func sideeffect(n int64) int64 {
+ globint2++
+ return n
+}
+
+func sideeffect2(in containsBigStruct) containsBigStruct {
+ globint3++
+ return in
+}
+
+// Test that nonSSAable arguments to defer are handled correctly and only evaluated once.
+func TestNonSSAableArgs(t *testing.T) {
+ globint1 = 0
+ globint2 = 0
+ globint3 = 0
+ var save1 byte
+ var save2 int64
+ var save3 int64
+ var save4 int64
+
+ defer func() {
+ if globint1 != 1 {
+ t.Fatal(fmt.Sprintf("globint1: wanted: 1, got %v", globint1))
+ }
+ if save1 != 5 {
+ t.Fatal(fmt.Sprintf("save1: wanted: 5, got %v", save1))
+ }
+ if globint2 != 1 {
+ t.Fatal(fmt.Sprintf("globint2: wanted: 1, got %v", globint2))
+ }
+ if save2 != 2 {
+ t.Fatal(fmt.Sprintf("save2: wanted: 2, got %v", save2))
+ }
+ if save3 != 4 {
+ t.Fatal(fmt.Sprintf("save3: wanted: 4, got %v", save3))
+ }
+ if globint3 != 1 {
+ t.Fatal(fmt.Sprintf("globint3: wanted: 1, got %v", globint3))
+ }
+ if save4 != 4 {
+			t.Fatal(fmt.Sprintf("save4: wanted: 4, got %v", save4))
+ }
+ }()
+
+ // Test function returning a non-SSAable arg
+ defer func(n nonSSAable) {
+ save1 = n[4]
+ }(mknonSSAable())
+ // Test composite literal that is not SSAable
+ defer func(b bigStruct) {
+ save2 = b.y
+ }(bigStruct{1, 2, 3, 4, 5, sideeffect(6)})
+
+ // Test struct field reference that is non-SSAable
+ foo := containsBigStruct{}
+ foo.element.z = 4
+ defer func(element bigStruct) {
+ save3 = element.z
+ }(foo.element)
+ defer func(element bigStruct) {
+ save4 = element.z
+ }(sideeffect2(foo).element)
+}
+
+//go:noinline
+func doPanic() {
+ panic("Test panic")
+}
+
+func TestDeferForFuncWithNoExit(t *testing.T) {
+ cond := 1
+ defer func() {
+ if cond != 2 {
+ t.Fatal(fmt.Sprintf("cond: wanted 2, got %v", cond))
+ }
+ if recover() != "Test panic" {
+ t.Fatal("Didn't find expected panic")
+ }
+ }()
+ x := 0
+ // Force a stack copy, to make sure that the &cond pointer passed to defer
+ // function is properly updated.
+ growStackIter(&x, 1000)
+ cond = 2
+ doPanic()
+
+ // This function has no exit/return, since it ends with an infinite loop
+ for {
+ }
+}
+
+// Test case approximating issue #37664, where a recursive function (interpreter)
+// may do repeated recovers/re-panics until it reaches the frame where the panic
+// can actually be handled. The recurseFnPanicRec() function is testing that there
+// are no stale defer structs on the defer chain after the interpreter() sequence,
+// by writing a bunch of 0xffffffffs into several recursive stack frames, and then
+// doing a single panic-recover which would invoke any such stale defer structs.
+func TestDeferWithRepeatedRepanics(t *testing.T) {
+ interpreter(0, 6, 2)
+ recurseFnPanicRec(0, 10)
+ interpreter(0, 5, 1)
+ recurseFnPanicRec(0, 10)
+ interpreter(0, 6, 3)
+ recurseFnPanicRec(0, 10)
+}
+
+func interpreter(level int, maxlevel int, rec int) {
+ defer func() {
+ e := recover()
+ if e == nil {
+ return
+ }
+ if level != e.(int) {
+ //fmt.Fprintln(os.Stderr, "re-panicing, level", level)
+ panic(e)
+ }
+ //fmt.Fprintln(os.Stderr, "Recovered, level", level)
+ }()
+ if level+1 < maxlevel {
+ interpreter(level+1, maxlevel, rec)
+ } else {
+ //fmt.Fprintln(os.Stderr, "Initiating panic")
+ panic(rec)
+ }
+}
+
+func recurseFnPanicRec(level int, maxlevel int) {
+ defer func() {
+ recover()
+ }()
+ recurseFn(level, maxlevel)
+}
+
+var saveInt uint32
+
+func recurseFn(level int, maxlevel int) {
+ a := [40]uint32{0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff}
+ if level+1 < maxlevel {
+		// Make sure the array a is referenced, so it is not optimized away
+ saveInt = a[4]
+ recurseFn(level+1, maxlevel)
+ } else {
+ panic("recurseFn panic")
+ }
+}
+
+// Try to reproduce issue #37688, where a pointer to an open-coded defer struct is
+// mistakenly held, and that struct keeps a pointer to a stack-allocated defer
+// struct, and that stack-allocated struct gets overwritten or the stack gets
+// moved, so a memory error happens on GC.
+func TestIssue37688(t *testing.T) {
+ for j := 0; j < 10; j++ {
+ g2()
+ g3()
+ }
+}
+
+type foo struct {
+}
+
+//go:noinline
+func (f *foo) method1() {
+}
+
+//go:noinline
+func (f *foo) method2() {
+}
+
+func g2() {
+ var a foo
+ ap := &a
+ // The loop forces this defer to be heap-allocated and the remaining two
+ // to be stack-allocated.
+ for i := 0; i < 1; i++ {
+ defer ap.method1()
+ }
+ defer ap.method2()
+ defer ap.method1()
+ ff1(ap, 1, 2, 3, 4, 5, 6, 7, 8, 9)
+	// Try to get the stack to be moved by growing it too large, so
+ // existing stack-allocated defer becomes invalid.
+ rec1(2000)
+}
+
+func g3() {
+ // Mix up the stack layout by adding in an extra function frame
+ g2()
+}
+
+var globstruct struct {
+ a, b, c, d, e, f, g, h, i int
+}
+
+func ff1(ap *foo, a, b, c, d, e, f, g, h, i int) {
+ defer ap.method1()
+
+ // Make a defer that has a very large set of args, hence big size for the
+ // defer record for the open-coded frame (which means it won't use the
+ // defer pool)
+ defer func(ap *foo, a, b, c, d, e, f, g, h, i int) {
+ if v := recover(); v != nil {
+ }
+ globstruct.a = a
+ globstruct.b = b
+ globstruct.c = c
+ globstruct.d = d
+ globstruct.e = e
+ globstruct.f = f
+ globstruct.g = g
+ globstruct.h = h
+ }(ap, a, b, c, d, e, f, g, h, i)
+ panic("ff1 panic")
+}
+
+func rec1(max int) {
+ if max > 0 {
+ rec1(max - 1)
+ }
+}
+
+func TestIssue43921(t *testing.T) {
+ defer func() {
+ expect(t, 1, recover())
+ }()
+ func() {
+ // Prevent open-coded defers
+ for {
+ defer func() {}()
+ break
+ }
+
+ defer func() {
+ defer func() {
+ expect(t, 4, recover())
+ }()
+ panic(4)
+ }()
+ panic(1)
+
+ }()
+}
+
+func expect(t *testing.T, n int, err interface{}) {
+ if n != err {
+ t.Fatalf("have %v, want %v", err, n)
+ }
+}
diff --git a/src/runtime/defs1_linux.go b/src/runtime/defs1_linux.go
new file mode 100644
index 0000000..4085d6f
--- /dev/null
+++ b/src/runtime/defs1_linux.go
@@ -0,0 +1,40 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo -cdefs
+
+GOARCH=amd64 cgo -cdefs defs.go defs1.go >amd64/defs.h
+*/
+
+package runtime
+
+/*
+#include <ucontext.h>
+#include <fcntl.h>
+#include <asm/signal.h>
+*/
+import "C"
+
+const (
+ O_RDONLY = C.O_RDONLY
+ O_NONBLOCK = C.O_NONBLOCK
+ O_CLOEXEC = C.O_CLOEXEC
+ SA_RESTORER = C.SA_RESTORER
+)
+
+type Usigset C.__sigset_t
+type Fpxreg C.struct__libc_fpxreg
+type Xmmreg C.struct__libc_xmmreg
+type Fpstate C.struct__libc_fpstate
+type Fpxreg1 C.struct__fpxreg
+type Xmmreg1 C.struct__xmmreg
+type Fpstate1 C.struct__fpstate
+type Fpreg1 C.struct__fpreg
+type StackT C.stack_t
+type Mcontext C.mcontext_t
+type Ucontext C.ucontext_t
+type Sigcontext C.struct_sigcontext
diff --git a/src/runtime/defs1_netbsd_386.go b/src/runtime/defs1_netbsd_386.go
new file mode 100644
index 0000000..a4548e6
--- /dev/null
+++ b/src/runtime/defs1_netbsd_386.go
@@ -0,0 +1,180 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_386.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ENOSYS = 0x4e
+
+ _O_NONBLOCK = 0x4
+ _O_CLOEXEC = 0x400000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = 0x0
+ _EVFILT_WRITE = 0x1
+)
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type siginfo struct {
+ _signo int32
+ _code int32
+ _errno int32
+ _reason [20]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int32
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = int64(timediv(ns, 1e9, &ts.tv_nsec))
+}
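+
+// timediv divides ns by 1e9 and stores the remainder (the nanosecond
+// part) through its last argument, so setNsec(1500000000) leaves
+// tv_sec = 1 and tv_nsec = 500000000. It exists to avoid the compiler's
+// 64-bit division helpers, which cannot be called from //go:nosplit
+// code on 32-bit platforms like 386.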
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type mcontextt struct {
+ __gregs [19]uint32
+ __fpregs [644]byte
+ _mc_tlsbase int32
+}
+
+type ucontextt struct {
+ uc_flags uint32
+ uc_link *ucontextt
+ uc_sigmask sigset
+ uc_stack stackt
+ uc_mcontext mcontextt
+ __uc_pad [4]int32
+}
+
+type keventt struct {
+ ident uint32
+ filter uint32
+ flags uint32
+ fflags uint32
+ data int64
+ udata *byte
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_386.go
+
+const (
+ _REG_GS = 0x0
+ _REG_FS = 0x1
+ _REG_ES = 0x2
+ _REG_DS = 0x3
+ _REG_EDI = 0x4
+ _REG_ESI = 0x5
+ _REG_EBP = 0x6
+ _REG_ESP = 0x7
+ _REG_EBX = 0x8
+ _REG_EDX = 0x9
+ _REG_ECX = 0xa
+ _REG_EAX = 0xb
+ _REG_TRAPNO = 0xc
+ _REG_ERR = 0xd
+ _REG_EIP = 0xe
+ _REG_CS = 0xf
+ _REG_EFL = 0x10
+ _REG_UESP = 0x11
+ _REG_SS = 0x12
+)
diff --git a/src/runtime/defs1_netbsd_amd64.go b/src/runtime/defs1_netbsd_amd64.go
new file mode 100644
index 0000000..4b0e79e
--- /dev/null
+++ b/src/runtime/defs1_netbsd_amd64.go
@@ -0,0 +1,192 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_amd64.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ENOSYS = 0x4e
+
+ _O_NONBLOCK = 0x4
+ _O_CLOEXEC = 0x400000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = 0x0
+ _EVFILT_WRITE = 0x1
+)
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type siginfo struct {
+ _signo int32
+ _code int32
+ _errno int32
+ _pad int32
+ _reason [24]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ pad_cgo_0 [4]byte
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type mcontextt struct {
+ __gregs [26]uint64
+ _mc_tlsbase uint64
+ __fpregs [512]int8
+}
+
+type ucontextt struct {
+ uc_flags uint32
+ pad_cgo_0 [4]byte
+ uc_link *ucontextt
+ uc_sigmask sigset
+ uc_stack stackt
+ uc_mcontext mcontextt
+}
+
+type keventt struct {
+ ident uint64
+ filter uint32
+ flags uint32
+ fflags uint32
+ pad_cgo_0 [4]byte
+ data int64
+ udata *byte
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_amd64.go
+
+const (
+ _REG_RDI = 0x0
+ _REG_RSI = 0x1
+ _REG_RDX = 0x2
+ _REG_RCX = 0x3
+ _REG_R8 = 0x4
+ _REG_R9 = 0x5
+ _REG_R10 = 0x6
+ _REG_R11 = 0x7
+ _REG_R12 = 0x8
+ _REG_R13 = 0x9
+ _REG_R14 = 0xa
+ _REG_R15 = 0xb
+ _REG_RBP = 0xc
+ _REG_RBX = 0xd
+ _REG_RAX = 0xe
+ _REG_GS = 0xf
+ _REG_FS = 0x10
+ _REG_ES = 0x11
+ _REG_DS = 0x12
+ _REG_TRAPNO = 0x13
+ _REG_ERR = 0x14
+ _REG_RIP = 0x15
+ _REG_CS = 0x16
+ _REG_RFLAGS = 0x17
+ _REG_RSP = 0x18
+ _REG_SS = 0x19
+)
diff --git a/src/runtime/defs1_netbsd_arm.go b/src/runtime/defs1_netbsd_arm.go
new file mode 100644
index 0000000..2b5d599
--- /dev/null
+++ b/src/runtime/defs1_netbsd_arm.go
@@ -0,0 +1,185 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_arm.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ENOSYS = 0x4e
+
+ _O_NONBLOCK = 0x4
+ _O_CLOEXEC = 0x400000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = 0x0
+ _EVFILT_WRITE = 0x1
+)
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type siginfo struct {
+ _signo int32
+ _code int32
+ _errno int32
+ _reason uintptr
+ _reasonx [16]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int32
+ _ [4]byte // EABI
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = int64(timediv(ns, 1e9, &ts.tv_nsec))
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ _ [4]byte // EABI
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type mcontextt struct {
+ __gregs [17]uint32
+ _ [4]byte // EABI
+ __fpu [272]byte // EABI
+ _mc_tlsbase uint32
+ _ [4]byte // EABI
+}
+
+type ucontextt struct {
+ uc_flags uint32
+ uc_link *ucontextt
+ uc_sigmask sigset
+ uc_stack stackt
+ _ [4]byte // EABI
+ uc_mcontext mcontextt
+ __uc_pad [2]int32
+}
+
+type keventt struct {
+ ident uint32
+ filter uint32
+ flags uint32
+ fflags uint32
+ data int64
+ udata *byte
+ _ [4]byte // EABI
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_arm.go
+
+const (
+ _REG_R0 = 0x0
+ _REG_R1 = 0x1
+ _REG_R2 = 0x2
+ _REG_R3 = 0x3
+ _REG_R4 = 0x4
+ _REG_R5 = 0x5
+ _REG_R6 = 0x6
+ _REG_R7 = 0x7
+ _REG_R8 = 0x8
+ _REG_R9 = 0x9
+ _REG_R10 = 0xa
+ _REG_R11 = 0xb
+ _REG_R12 = 0xc
+ _REG_R13 = 0xd
+ _REG_R14 = 0xe
+ _REG_R15 = 0xf
+ _REG_CPSR = 0x10
+)
diff --git a/src/runtime/defs1_netbsd_arm64.go b/src/runtime/defs1_netbsd_arm64.go
new file mode 100644
index 0000000..740dc77
--- /dev/null
+++ b/src/runtime/defs1_netbsd_arm64.go
@@ -0,0 +1,200 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_arm.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ENOSYS = 0x4e
+
+ _O_NONBLOCK = 0x4
+ _O_CLOEXEC = 0x400000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = 0x0
+ _EVFILT_WRITE = 0x1
+)
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type siginfo struct {
+ _signo int32
+ _code int32
+ _errno int32
+ _reason uintptr
+ _reasonx [16]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ _ [4]byte // EABI
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type mcontextt struct {
+ __gregs [35]uint64
+ __fregs [4160]byte // _NFREG * 128 + 32 + 32
+ _ [8]uint64 // future use
+}
+
+type ucontextt struct {
+ uc_flags uint32
+ uc_link *ucontextt
+ uc_sigmask sigset
+ uc_stack stackt
+ _ [4]byte // EABI
+ uc_mcontext mcontextt
+ __uc_pad [2]int32
+}
+
+type keventt struct {
+ ident uint64
+ filter uint32
+ flags uint32
+ fflags uint32
+ pad_cgo_0 [4]byte
+ data int64
+ udata *byte
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_arm.go
+
+const (
+ _REG_X0 = 0
+ _REG_X1 = 1
+ _REG_X2 = 2
+ _REG_X3 = 3
+ _REG_X4 = 4
+ _REG_X5 = 5
+ _REG_X6 = 6
+ _REG_X7 = 7
+ _REG_X8 = 8
+ _REG_X9 = 9
+ _REG_X10 = 10
+ _REG_X11 = 11
+ _REG_X12 = 12
+ _REG_X13 = 13
+ _REG_X14 = 14
+ _REG_X15 = 15
+ _REG_X16 = 16
+ _REG_X17 = 17
+ _REG_X18 = 18
+ _REG_X19 = 19
+ _REG_X20 = 20
+ _REG_X21 = 21
+ _REG_X22 = 22
+ _REG_X23 = 23
+ _REG_X24 = 24
+ _REG_X25 = 25
+ _REG_X26 = 26
+ _REG_X27 = 27
+ _REG_X28 = 28
+ _REG_X29 = 29
+ _REG_X30 = 30
+ _REG_X31 = 31
+ _REG_ELR = 32
+ _REG_SPSR = 33
+ _REG_TPIDR = 34
+)
diff --git a/src/runtime/defs1_solaris_amd64.go b/src/runtime/defs1_solaris_amd64.go
new file mode 100644
index 0000000..19e8a25
--- /dev/null
+++ b/src/runtime/defs1_solaris_amd64.go
@@ -0,0 +1,251 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_solaris.go defs_solaris_amd64.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EBADF = 0x9
+ _EFAULT = 0xe
+ _EAGAIN = 0xb
+ _EBUSY = 0x10
+ _ETIME = 0x3e
+ _ETIMEDOUT = 0x91
+ _EWOULDBLOCK = 0xb
+ _EINPROGRESS = 0x96
+ _ENOSYS = 0x59
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x100
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_FREE = 0x5
+
+ _SA_SIGINFO = 0x8
+ _SA_RESTART = 0x4
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x15
+ _SIGSTOP = 0x17
+ _SIGTSTP = 0x18
+ _SIGCONT = 0x19
+ _SIGCHLD = 0x12
+ _SIGTTIN = 0x1a
+ _SIGTTOU = 0x1b
+ _SIGIO = 0x16
+ _SIGXCPU = 0x1e
+ _SIGXFSZ = 0x1f
+ _SIGVTALRM = 0x1c
+ _SIGPROF = 0x1d
+ _SIGWINCH = 0x14
+ _SIGUSR1 = 0x10
+ _SIGUSR2 = 0x11
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ __SC_PAGESIZE = 0xb
+ __SC_NPROCESSORS_ONLN = 0xf
+
+ _PTHREAD_CREATE_DETACHED = 0x40
+
+ _FORK_NOSIGCHLD = 0x1
+ _FORK_WAITPID = 0x2
+
+ _MAXHOSTNAMELEN = 0x100
+
+ _O_NONBLOCK = 0x80
+ _O_CLOEXEC = 0x800000
+ _FD_CLOEXEC = 0x1
+ _F_GETFL = 0x3
+ _F_SETFL = 0x4
+ _F_SETFD = 0x2
+
+ _POLLIN = 0x1
+ _POLLOUT = 0x4
+ _POLLHUP = 0x10
+ _POLLERR = 0x8
+
+ _PORT_SOURCE_FD = 0x4
+ _PORT_SOURCE_ALERT = 0x5
+ _PORT_ALERT_UPDATE = 0x2
+)
+
+type semt struct {
+ sem_count uint32
+ sem_type uint16
+ sem_magic uint16
+ sem_pad1 [3]uint64
+ sem_pad2 [2]uint64
+}
+
+type sigset struct {
+ __sigbits [4]uint32
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type siginfo struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ si_pad int32
+ __data [240]byte
+}
+
+type sigactiont struct {
+ sa_flags int32
+ pad_cgo_0 [4]byte
+ _funcptr [8]byte
+ sa_mask sigset
+}
+
+type fpregset struct {
+ fp_reg_set [528]byte
+}
+
+type mcontext struct {
+ gregs [28]int64
+ fpregs fpregset
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_sigmask sigset
+ uc_stack stackt
+ pad_cgo_0 [8]byte
+ uc_mcontext mcontext
+ uc_filler [5]int64
+ pad_cgo_1 [8]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type portevent struct {
+ portev_events int32
+ portev_source uint16
+ portev_pad uint16
+ portev_object uint64
+ portev_user *byte
+}
+
+type pthread uint32
+type pthreadattr struct {
+ __pthread_attrp *byte
+}
+
+type stat struct {
+ st_dev uint64
+ st_ino uint64
+ st_mode uint32
+ st_nlink uint32
+ st_uid uint32
+ st_gid uint32
+ st_rdev uint64
+ st_size int64
+ st_atim timespec
+ st_mtim timespec
+ st_ctim timespec
+ st_blksize int32
+ pad_cgo_0 [4]byte
+ st_blocks int64
+ st_fstype [16]int8
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_solaris.go defs_solaris_amd64.go
+
+const (
+ _REG_RDI = 0x8
+ _REG_RSI = 0x9
+ _REG_RDX = 0xc
+ _REG_RCX = 0xd
+ _REG_R8 = 0x7
+ _REG_R9 = 0x6
+ _REG_R10 = 0x5
+ _REG_R11 = 0x4
+ _REG_R12 = 0x3
+ _REG_R13 = 0x2
+ _REG_R14 = 0x1
+ _REG_R15 = 0x0
+ _REG_RBP = 0xa
+ _REG_RBX = 0xb
+ _REG_RAX = 0xe
+ _REG_GS = 0x17
+ _REG_FS = 0x16
+ _REG_ES = 0x18
+ _REG_DS = 0x19
+ _REG_TRAPNO = 0xf
+ _REG_ERR = 0x10
+ _REG_RIP = 0x11
+ _REG_CS = 0x12
+ _REG_RFLAGS = 0x13
+ _REG_RSP = 0x14
+ _REG_SS = 0x15
+)
diff --git a/src/runtime/defs2_linux.go b/src/runtime/defs2_linux.go
new file mode 100644
index 0000000..87e19c1
--- /dev/null
+++ b/src/runtime/defs2_linux.go
@@ -0,0 +1,149 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+ * Input to cgo -cdefs
+
+GOARCH=386 go tool cgo -cdefs defs2_linux.go >defs_linux_386.h
+
+The asm header tricks we have to use for Linux on amd64
+(see defs.c and defs1.c) don't work here, so this is yet another
+file. Sigh.
+*/
+
+package runtime
+
+/*
+#cgo CFLAGS: -I/tmp/linux/arch/x86/include -I/tmp/linux/include -D_LOOSE_KERNEL_NAMES -D__ARCH_SI_UID_T=__kernel_uid32_t
+
+#define size_t __kernel_size_t
+#define pid_t int
+#include <asm/signal.h>
+#include <asm/mman.h>
+#include <asm/sigcontext.h>
+#include <asm/ucontext.h>
+#include <asm/siginfo.h>
+#include <asm-generic/errno.h>
+#include <asm-generic/fcntl.h>
+#include <asm-generic/poll.h>
+#include <linux/eventpoll.h>
+
+// This is the sigaction structure from the Linux 2.1.68 kernel which
+// is used with the rt_sigaction system call. For 386 this is not
+// defined in any public header file.
+
+struct kernel_sigaction {
+ __sighandler_t k_sa_handler;
+ unsigned long sa_flags;
+ void (*sa_restorer) (void);
+ unsigned long long sa_mask;
+};
+*/
+import "C"
+
+const (
+ EINTR = C.EINTR
+ EAGAIN = C.EAGAIN
+ ENOMEM = C.ENOMEM
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANONYMOUS
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_FREE = C.MADV_FREE
+ MADV_HUGEPAGE = C.MADV_HUGEPAGE
+ MADV_NOHUGEPAGE = C.MADV_NOHUGEPAGE
+
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+ SA_RESTORER = C.SA_RESTORER
+ SA_SIGINFO = C.SA_SIGINFO
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGBUS = C.SIGBUS
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGUSR1 = C.SIGUSR1
+ SIGSEGV = C.SIGSEGV
+ SIGUSR2 = C.SIGUSR2
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGSTKFLT = C.SIGSTKFLT
+ SIGCHLD = C.SIGCHLD
+ SIGCONT = C.SIGCONT
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGURG = C.SIGURG
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGIO = C.SIGIO
+ SIGPWR = C.SIGPWR
+ SIGSYS = C.SIGSYS
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ O_RDONLY = C.O_RDONLY
+ O_CLOEXEC = C.O_CLOEXEC
+
+ EPOLLIN = C.POLLIN
+ EPOLLOUT = C.POLLOUT
+ EPOLLERR = C.POLLERR
+ EPOLLHUP = C.POLLHUP
+ EPOLLRDHUP = C.POLLRDHUP
+ EPOLLET = C.EPOLLET
+ EPOLL_CLOEXEC = C.EPOLL_CLOEXEC
+ EPOLL_CTL_ADD = C.EPOLL_CTL_ADD
+ EPOLL_CTL_DEL = C.EPOLL_CTL_DEL
+ EPOLL_CTL_MOD = C.EPOLL_CTL_MOD
+)
+
+type Fpreg C.struct__fpreg
+type Fpxreg C.struct__fpxreg
+type Xmmreg C.struct__xmmreg
+type Fpstate C.struct__fpstate
+type Timespec C.struct_timespec
+type Timeval C.struct_timeval
+type Sigaction C.struct_kernel_sigaction
+type Siginfo C.siginfo_t
+type StackT C.stack_t
+type Sigcontext C.struct_sigcontext
+type Ucontext C.struct_ucontext
+type Itimerval C.struct_itimerval
+type EpollEvent C.struct_epoll_event
diff --git a/src/runtime/defs3_linux.go b/src/runtime/defs3_linux.go
new file mode 100644
index 0000000..31f2191
--- /dev/null
+++ b/src/runtime/defs3_linux.go
@@ -0,0 +1,43 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo -cdefs
+
+GOARCH=ppc64 cgo -cdefs defs_linux.go defs3_linux.go > defs_linux_ppc64.h
+*/
+
+package runtime
+
+/*
+#define size_t __kernel_size_t
+#define sigset_t __sigset_t // rename the sigset_t here otherwise cgo will complain about "inconsistent definitions for C.sigset_t"
+#define _SYS_TYPES_H // avoid inclusion of sys/types.h
+#include <asm/ucontext.h>
+#include <asm-generic/fcntl.h>
+*/
+import "C"
+
+const (
+ O_RDONLY = C.O_RDONLY
+ O_CLOEXEC = C.O_CLOEXEC
+ SA_RESTORER = 0 // unused
+)
+
+type Usigset C.__sigset_t
+
+// types used in sigcontext
+type Ptregs C.struct_pt_regs
+type Gregset C.elf_gregset_t
+type FPregset C.elf_fpregset_t
+type Vreg C.elf_vrreg_t
+
+type StackT C.stack_t
+
+// PPC64 uses sigcontext in place of mcontext in ucontext.
+// see https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/include/uapi/asm/ucontext.h
+type Sigcontext C.struct_sigcontext
+type Ucontext C.struct_ucontext
diff --git a/src/runtime/defs_aix.go b/src/runtime/defs_aix.go
new file mode 100644
index 0000000..23a6cac
--- /dev/null
+++ b/src/runtime/defs_aix.go
@@ -0,0 +1,171 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo -godefs
+GOARCH=ppc64 go tool cgo -godefs defs_aix.go > defs_aix_ppc64_tmp.go
+
+This is only a helper to create defs_aix_ppc64.go
+Go runtime functions require the "linux" name of fields (ss_sp, si_addr, etc)
+However, AIX structures don't provide such names and must be modified.
+
+TODO(aix): create a script to automatise defs_aix creation.
+
+Modifications made:
+ - sigset replaced by a [4]uint64 array
+ - add sigset_all variable
+ - siginfo.si_addr uintptr instead of *byte
+ - add (*timeval) set_usec
+ - stackt.ss_sp uintptr instead of *byte
+ - stackt.ss_size uintptr instead of uint64
+ - sigcontext.sc_jmpbuf context64 instead of jmpbuf
+ - ucontext.__extctx is a uintptr because we don't need extctx struct
+ - ucontext.uc_mcontext: replace jmpbuf structure by context64 structure
+ - sigaction.sa_handler represents union field as both are uintptr
+ - tstate.* replace *byte by uintptr
+
+
+*/
+
+package runtime
+
+/*
+
+#include <sys/types.h>
+#include <sys/errno.h>
+#include <sys/time.h>
+#include <sys/signal.h>
+#include <sys/mman.h>
+#include <sys/thread.h>
+#include <sys/resource.h>
+
+#include <unistd.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <semaphore.h>
+*/
+import "C"
+
+const (
+ _EPERM = C.EPERM
+ _ENOENT = C.ENOENT
+ _EINTR = C.EINTR
+ _EAGAIN = C.EAGAIN
+ _ENOMEM = C.ENOMEM
+ _EACCES = C.EACCES
+ _EFAULT = C.EFAULT
+ _EINVAL = C.EINVAL
+ _ETIMEDOUT = C.ETIMEDOUT
+
+ _PROT_NONE = C.PROT_NONE
+ _PROT_READ = C.PROT_READ
+ _PROT_WRITE = C.PROT_WRITE
+ _PROT_EXEC = C.PROT_EXEC
+
+ _MAP_ANON = C.MAP_ANONYMOUS
+ _MAP_PRIVATE = C.MAP_PRIVATE
+ _MAP_FIXED = C.MAP_FIXED
+ _MADV_DONTNEED = C.MADV_DONTNEED
+
+ _SIGHUP = C.SIGHUP
+ _SIGINT = C.SIGINT
+ _SIGQUIT = C.SIGQUIT
+ _SIGILL = C.SIGILL
+ _SIGTRAP = C.SIGTRAP
+ _SIGABRT = C.SIGABRT
+ _SIGBUS = C.SIGBUS
+ _SIGFPE = C.SIGFPE
+ _SIGKILL = C.SIGKILL
+ _SIGUSR1 = C.SIGUSR1
+ _SIGSEGV = C.SIGSEGV
+ _SIGUSR2 = C.SIGUSR2
+ _SIGPIPE = C.SIGPIPE
+ _SIGALRM = C.SIGALRM
+ _SIGCHLD = C.SIGCHLD
+ _SIGCONT = C.SIGCONT
+ _SIGSTOP = C.SIGSTOP
+ _SIGTSTP = C.SIGTSTP
+ _SIGTTIN = C.SIGTTIN
+ _SIGTTOU = C.SIGTTOU
+ _SIGURG = C.SIGURG
+ _SIGXCPU = C.SIGXCPU
+ _SIGXFSZ = C.SIGXFSZ
+ _SIGVTALRM = C.SIGVTALRM
+ _SIGPROF = C.SIGPROF
+ _SIGWINCH = C.SIGWINCH
+ _SIGIO = C.SIGIO
+ _SIGPWR = C.SIGPWR
+ _SIGSYS = C.SIGSYS
+ _SIGTERM = C.SIGTERM
+ _SIGEMT = C.SIGEMT
+ _SIGWAITING = C.SIGWAITING
+
+ _FPE_INTDIV = C.FPE_INTDIV
+ _FPE_INTOVF = C.FPE_INTOVF
+ _FPE_FLTDIV = C.FPE_FLTDIV
+ _FPE_FLTOVF = C.FPE_FLTOVF
+ _FPE_FLTUND = C.FPE_FLTUND
+ _FPE_FLTRES = C.FPE_FLTRES
+ _FPE_FLTINV = C.FPE_FLTINV
+ _FPE_FLTSUB = C.FPE_FLTSUB
+
+ _BUS_ADRALN = C.BUS_ADRALN
+ _BUS_ADRERR = C.BUS_ADRERR
+ _BUS_OBJERR = C.BUS_OBJERR
+
+ _SEGV_MAPERR = C.SEGV_MAPERR
+ _SEGV_ACCERR = C.SEGV_ACCERR
+
+ _ITIMER_REAL = C.ITIMER_REAL
+ _ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ _ITIMER_PROF = C.ITIMER_PROF
+
+ _O_RDONLY = C.O_RDONLY
+ _O_NONBLOCK = C.O_NONBLOCK
+
+ _SS_DISABLE = C.SS_DISABLE
+ _SI_USER = C.SI_USER
+ _SIG_BLOCK = C.SIG_BLOCK
+ _SIG_UNBLOCK = C.SIG_UNBLOCK
+ _SIG_SETMASK = C.SIG_SETMASK
+
+ _SA_SIGINFO = C.SA_SIGINFO
+ _SA_RESTART = C.SA_RESTART
+ _SA_ONSTACK = C.SA_ONSTACK
+
+ _PTHREAD_CREATE_DETACHED = C.PTHREAD_CREATE_DETACHED
+
+ __SC_PAGE_SIZE = C._SC_PAGE_SIZE
+ __SC_NPROCESSORS_ONLN = C._SC_NPROCESSORS_ONLN
+
+ _F_SETFD = C.F_SETFD
+ _F_SETFL = C.F_SETFL
+ _F_GETFD = C.F_GETFD
+ _F_GETFL = C.F_GETFL
+ _FD_CLOEXEC = C.FD_CLOEXEC
+)
+
+type sigset C.sigset_t
+type siginfo C.siginfo_t
+type timespec C.struct_timespec
+type timestruc C.struct_timestruc_t
+type timeval C.struct_timeval
+type itimerval C.struct_itimerval
+
+type stackt C.stack_t
+type sigcontext C.struct_sigcontext
+type ucontext C.ucontext_t
+type _Ctype_struct___extctx uint64 // ucontext uses a pointer to this structure, but it shouldn't be used
+type jmpbuf C.struct___jmpbuf
+type context64 C.struct___context64
+type sigactiont C.struct_sigaction
+type tstate C.struct_tstate
+type rusage C.struct_rusage
+
+type pthread C.pthread_t
+type pthread_attr C.pthread_attr_t
+
+type semt C.sem_t
diff --git a/src/runtime/defs_aix_ppc64.go b/src/runtime/defs_aix_ppc64.go
new file mode 100644
index 0000000..a53fcc5
--- /dev/null
+++ b/src/runtime/defs_aix_ppc64.go
@@ -0,0 +1,211 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix
+
+package runtime
+
+const (
+ _EPERM = 0x1
+ _ENOENT = 0x2
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+ _EACCES = 0xd
+ _EFAULT = 0xe
+ _EINVAL = 0x16
+ _ETIMEDOUT = 0x4e
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x10
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x100
+ _MADV_DONTNEED = 0x4
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0xa
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0x1e
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0x1f
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGCHLD = 0x14
+ _SIGCONT = 0x13
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x10
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x22
+ _SIGPROF = 0x20
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x17
+ _SIGPWR = 0x1d
+ _SIGSYS = 0xc
+ _SIGTERM = 0xf
+ _SIGEMT = 0x7
+ _SIGWAITING = 0x27
+
+ _FPE_INTDIV = 0x14
+ _FPE_INTOVF = 0x15
+ _FPE_FLTDIV = 0x16
+ _FPE_FLTOVF = 0x17
+ _FPE_FLTUND = 0x18
+ _FPE_FLTRES = 0x19
+ _FPE_FLTINV = 0x1a
+ _FPE_FLTSUB = 0x1b
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x32
+ _SEGV_ACCERR = 0x33
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _O_RDONLY = 0x0
+ _O_NONBLOCK = 0x4
+
+ _SS_DISABLE = 0x2
+ _SI_USER = 0x0
+ _SIG_BLOCK = 0x0
+ _SIG_UNBLOCK = 0x1
+ _SIG_SETMASK = 0x2
+
+ _SA_SIGINFO = 0x100
+ _SA_RESTART = 0x8
+ _SA_ONSTACK = 0x1
+
+ _PTHREAD_CREATE_DETACHED = 0x1
+
+ __SC_PAGE_SIZE = 0x30
+ __SC_NPROCESSORS_ONLN = 0x48
+
+ _F_SETFD = 0x2
+ _F_SETFL = 0x4
+ _F_GETFD = 0x1
+ _F_GETFL = 0x3
+ _FD_CLOEXEC = 0x1
+)
+
+type sigset [4]uint64
+
+var sigset_all = sigset{^uint64(0), ^uint64(0), ^uint64(0), ^uint64(0)}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uintptr
+ si_band int64
+ si_value [2]int32 // [8]byte
+ __si_flags int32
+ __pad [3]int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ pad_cgo_0 [4]byte
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ __pad [4]int32
+ pas_cgo_0 [4]byte
+}
+
+type sigcontext struct {
+ sc_onstack int32
+ pad_cgo_0 [4]byte
+ sc_mask sigset
+ sc_uerror int32
+ sc_jmpbuf context64
+}
+
+type ucontext struct {
+ __sc_onstack int32
+ pad_cgo_0 [4]byte
+ uc_sigmask sigset
+ __sc_error int32
+ pad_cgo_1 [4]byte
+ uc_mcontext context64
+ uc_link *ucontext
+ uc_stack stackt
+ __extctx uintptr // pointer to struct __extctx but we don't use it
+ __extctx_magic int32
+ __pad int32
+}
+
+type context64 struct {
+ gpr [32]uint64
+ msr uint64
+ iar uint64
+ lr uint64
+ ctr uint64
+ cr uint32
+ xer uint32
+ fpscr uint32
+ fpscrx uint32
+ except [1]uint64
+ fpr [32]float64
+ fpeu uint8
+ fpinfo uint8
+ fpscr24_31 uint8
+ pad [1]uint8
+ excp_type int32
+}
+
+type sigactiont struct {
+ sa_handler uintptr // a union of two pointers
+ sa_mask sigset
+ sa_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type pthread uint32
+type pthread_attr *byte
+
+type semt int32
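
The sigset above is a 256-bit mask stored as [4]uint64. A minimal standalone sketch of how such a mask is indexed by signal number (local helper names, not the runtime's own routines):

package main

import "fmt"

// exampleSigset mirrors the [4]uint64 layout above: signal n occupies
// bit (n-1)%64 of word (n-1)/64, the conventional layout for such masks.
type exampleSigset [4]uint64

func (s *exampleSigset) add(sig uint32) {
	s[(sig-1)/64] |= 1 << ((sig - 1) % 64)
}

func (s *exampleSigset) has(sig uint32) bool {
	return s[(sig-1)/64]&(1<<((sig-1)%64)) != 0
}

func main() {
	var s exampleSigset
	s.add(0xb) // _SIGSEGV above
	fmt.Println(s.has(0xb), s.has(0x9)) // true false
}
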
diff --git a/src/runtime/defs_arm_linux.go b/src/runtime/defs_arm_linux.go
new file mode 100644
index 0000000..e51dd32
--- /dev/null
+++ b/src/runtime/defs_arm_linux.go
@@ -0,0 +1,124 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo.
+On a Debian Lenny arm linux distribution:
+
+cgo -cdefs defs_arm.c >arm/defs.h
+*/
+
+package runtime
+
+/*
+#cgo CFLAGS: -I/usr/src/linux-headers-2.6.26-2-versatile/include
+
+#define __ARCH_SI_UID_T int
+#include <asm/signal.h>
+#include <asm/mman.h>
+#include <asm/sigcontext.h>
+#include <asm/ucontext.h>
+#include <asm/siginfo.h>
+#include <linux/time.h>
+
+struct xsiginfo {
+ int si_signo;
+ int si_errno;
+ int si_code;
+ char _sifields[4];
+};
+
+#undef sa_handler
+#undef sa_flags
+#undef sa_restorer
+#undef sa_mask
+
+struct xsigaction {
+ void (*sa_handler)(void);
+ unsigned long sa_flags;
+ void (*sa_restorer)(void);
+ unsigned int sa_mask; // mask last for extensibility
+};
+*/
+import "C"
+
+const (
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANONYMOUS
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_DONTNEED = C.MADV_DONTNEED
+
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+ SA_RESTORER = C.SA_RESTORER
+ SA_SIGINFO = C.SA_SIGINFO
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGBUS = C.SIGBUS
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGUSR1 = C.SIGUSR1
+ SIGSEGV = C.SIGSEGV
+ SIGUSR2 = C.SIGUSR2
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGSTKFLT = C.SIGSTKFLT
+ SIGCHLD = C.SIGCHLD
+ SIGCONT = C.SIGCONT
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGURG = C.SIGURG
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGIO = C.SIGIO
+ SIGPWR = C.SIGPWR
+ SIGSYS = C.SIGSYS
+
+ FPE_INTDIV = C.FPE_INTDIV & 0xFFFF
+ FPE_INTOVF = C.FPE_INTOVF & 0xFFFF
+ FPE_FLTDIV = C.FPE_FLTDIV & 0xFFFF
+ FPE_FLTOVF = C.FPE_FLTOVF & 0xFFFF
+ FPE_FLTUND = C.FPE_FLTUND & 0xFFFF
+ FPE_FLTRES = C.FPE_FLTRES & 0xFFFF
+ FPE_FLTINV = C.FPE_FLTINV & 0xFFFF
+ FPE_FLTSUB = C.FPE_FLTSUB & 0xFFFF
+
+ BUS_ADRALN = C.BUS_ADRALN & 0xFFFF
+ BUS_ADRERR = C.BUS_ADRERR & 0xFFFF
+ BUS_OBJERR = C.BUS_OBJERR & 0xFFFF
+
+ SEGV_MAPERR = C.SEGV_MAPERR & 0xFFFF
+ SEGV_ACCERR = C.SEGV_ACCERR & 0xFFFF
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_PROF = C.ITIMER_PROF
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+)
+
+type Timespec C.struct_timespec
+type StackT C.stack_t
+type Sigcontext C.struct_sigcontext
+type Ucontext C.struct_ucontext
+type Timeval C.struct_timeval
+type Itimerval C.struct_itimerval
+type Siginfo C.struct_xsiginfo
+type Sigaction C.struct_xsigaction
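
The FPE_/BUS_/SEGV_ constants above are masked with 0xFFFF, presumably because the old ARM kernel headers OR a high-bit marker into these codes while only the low 16 bits show up in si_code. A purely illustrative sketch of that masking (the raw value is hypothetical, not taken from any header):

package main

import "fmt"

func main() {
	// Hypothetical header value with a high-bit marker set; the defs above
	// keep only the low 16 bits, which is what si_code actually carries.
	const rawFPEIntDiv = 0x30001
	fmt.Printf("%#x\n", rawFPEIntDiv&0xFFFF) // 0x1
}
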
diff --git a/src/runtime/defs_darwin.go b/src/runtime/defs_darwin.go
new file mode 100644
index 0000000..cc8c475
--- /dev/null
+++ b/src/runtime/defs_darwin.go
@@ -0,0 +1,164 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_darwin.go >defs_darwin_amd64.h
+*/
+
+package runtime
+
+/*
+#define __DARWIN_UNIX03 0
+#include <mach/mach_time.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <errno.h>
+#include <signal.h>
+#include <sys/event.h>
+#include <sys/mman.h>
+#include <pthread.h>
+#include <fcntl.h>
+*/
+import "C"
+
+const (
+ EINTR = C.EINTR
+ EFAULT = C.EFAULT
+ EAGAIN = C.EAGAIN
+ ETIMEDOUT = C.ETIMEDOUT
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANON
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_FREE = C.MADV_FREE
+ MADV_FREE_REUSABLE = C.MADV_FREE_REUSABLE
+ MADV_FREE_REUSE = C.MADV_FREE_REUSE
+
+ SA_SIGINFO = C.SA_SIGINFO
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+ SA_USERTRAMP = C.SA_USERTRAMP
+ SA_64REGSET = C.SA_64REGSET
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGEMT = C.SIGEMT
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGBUS = C.SIGBUS
+ SIGSEGV = C.SIGSEGV
+ SIGSYS = C.SIGSYS
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGTERM = C.SIGTERM
+ SIGURG = C.SIGURG
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGCONT = C.SIGCONT
+ SIGCHLD = C.SIGCHLD
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGIO = C.SIGIO
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGINFO = C.SIGINFO
+ SIGUSR1 = C.SIGUSR1
+ SIGUSR2 = C.SIGUSR2
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ EV_ADD = C.EV_ADD
+ EV_DELETE = C.EV_DELETE
+ EV_CLEAR = C.EV_CLEAR
+ EV_RECEIPT = C.EV_RECEIPT
+ EV_ERROR = C.EV_ERROR
+ EV_EOF = C.EV_EOF
+ EVFILT_READ = C.EVFILT_READ
+ EVFILT_WRITE = C.EVFILT_WRITE
+
+ PTHREAD_CREATE_DETACHED = C.PTHREAD_CREATE_DETACHED
+
+ F_SETFD = C.F_SETFD
+ F_GETFL = C.F_GETFL
+ F_SETFL = C.F_SETFL
+ FD_CLOEXEC = C.FD_CLOEXEC
+
+ O_NONBLOCK = C.O_NONBLOCK
+)
+
+type StackT C.struct_sigaltstack
+type Sighandler C.union___sigaction_u
+
+type Sigaction C.struct___sigaction // used in syscalls
+type Usigaction C.struct_sigaction // used by sigaction second argument
+type Sigset C.sigset_t
+type Sigval C.union_sigval
+type Siginfo C.siginfo_t
+type Timeval C.struct_timeval
+type Itimerval C.struct_itimerval
+type Timespec C.struct_timespec
+
+type FPControl C.struct_fp_control
+type FPStatus C.struct_fp_status
+type RegMMST C.struct_mmst_reg
+type RegXMM C.struct_xmm_reg
+
+type Regs64 C.struct_x86_thread_state64
+type FloatState64 C.struct_x86_float_state64
+type ExceptionState64 C.struct_x86_exception_state64
+type Mcontext64 C.struct_mcontext64
+
+type Regs32 C.struct_i386_thread_state
+type FloatState32 C.struct_i386_float_state
+type ExceptionState32 C.struct_i386_exception_state
+type Mcontext32 C.struct_mcontext32
+
+type Ucontext C.struct_ucontext
+
+type Kevent C.struct_kevent
+
+type Pthread C.pthread_t
+type PthreadAttr C.pthread_attr_t
+type PthreadMutex C.pthread_mutex_t
+type PthreadMutexAttr C.pthread_mutexattr_t
+type PthreadCond C.pthread_cond_t
+type PthreadCondAttr C.pthread_condattr_t
+
+type MachTimebaseInfo C.mach_timebase_info_data_t
diff --git a/src/runtime/defs_darwin_amd64.go b/src/runtime/defs_darwin_amd64.go
new file mode 100644
index 0000000..cbc26bf
--- /dev/null
+++ b/src/runtime/defs_darwin_amd64.go
@@ -0,0 +1,372 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_darwin.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ETIMEDOUT = 0x3c
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x5
+ _MADV_FREE_REUSABLE = 0x7
+ _MADV_FREE_REUSE = 0x8
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+ _SA_USERTRAMP = 0x100
+ _SA_64REGSET = 0x200
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x7
+ _FPE_INTOVF = 0x8
+ _FPE_FLTDIV = 0x1
+ _FPE_FLTOVF = 0x2
+ _FPE_FLTUND = 0x3
+ _FPE_FLTRES = 0x4
+ _FPE_FLTINV = 0x5
+ _FPE_FLTSUB = 0x6
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0x40
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+
+ _PTHREAD_CREATE_DETACHED = 0x2
+
+ _F_SETFD = 0x2
+ _F_GETFL = 0x3
+ _F_SETFL = 0x4
+ _FD_CLOEXEC = 0x1
+
+ _O_NONBLOCK = 4
+)
+
+type stackt struct {
+ ss_sp *byte
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type sigactiont struct {
+ __sigaction_u [8]byte
+ sa_tramp unsafe.Pointer
+ sa_mask uint32
+ sa_flags int32
+}
+
+type usigactiont struct {
+ __sigaction_u [8]byte
+ sa_mask uint32
+ sa_flags int32
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uint64
+ si_value [8]byte
+ si_band int64
+ __pad [7]uint64
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ pad_cgo_0 [4]byte
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type fpcontrol struct {
+ pad_cgo_0 [2]byte
+}
+
+type fpstatus struct {
+ pad_cgo_0 [2]byte
+}
+
+type regmmst struct {
+ mmst_reg [10]int8
+ mmst_rsrv [6]int8
+}
+
+type regxmm struct {
+ xmm_reg [16]int8
+}
+
+type regs64 struct {
+ rax uint64
+ rbx uint64
+ rcx uint64
+ rdx uint64
+ rdi uint64
+ rsi uint64
+ rbp uint64
+ rsp uint64
+ r8 uint64
+ r9 uint64
+ r10 uint64
+ r11 uint64
+ r12 uint64
+ r13 uint64
+ r14 uint64
+ r15 uint64
+ rip uint64
+ rflags uint64
+ cs uint64
+ fs uint64
+ gs uint64
+}
+
+type floatstate64 struct {
+ fpu_reserved [2]int32
+ fpu_fcw fpcontrol
+ fpu_fsw fpstatus
+ fpu_ftw uint8
+ fpu_rsrv1 uint8
+ fpu_fop uint16
+ fpu_ip uint32
+ fpu_cs uint16
+ fpu_rsrv2 uint16
+ fpu_dp uint32
+ fpu_ds uint16
+ fpu_rsrv3 uint16
+ fpu_mxcsr uint32
+ fpu_mxcsrmask uint32
+ fpu_stmm0 regmmst
+ fpu_stmm1 regmmst
+ fpu_stmm2 regmmst
+ fpu_stmm3 regmmst
+ fpu_stmm4 regmmst
+ fpu_stmm5 regmmst
+ fpu_stmm6 regmmst
+ fpu_stmm7 regmmst
+ fpu_xmm0 regxmm
+ fpu_xmm1 regxmm
+ fpu_xmm2 regxmm
+ fpu_xmm3 regxmm
+ fpu_xmm4 regxmm
+ fpu_xmm5 regxmm
+ fpu_xmm6 regxmm
+ fpu_xmm7 regxmm
+ fpu_xmm8 regxmm
+ fpu_xmm9 regxmm
+ fpu_xmm10 regxmm
+ fpu_xmm11 regxmm
+ fpu_xmm12 regxmm
+ fpu_xmm13 regxmm
+ fpu_xmm14 regxmm
+ fpu_xmm15 regxmm
+ fpu_rsrv4 [96]int8
+ fpu_reserved1 int32
+}
+
+type exceptionstate64 struct {
+ trapno uint16
+ cpu uint16
+ err uint32
+ faultvaddr uint64
+}
+
+type mcontext64 struct {
+ es exceptionstate64
+ ss regs64
+ fs floatstate64
+ pad_cgo_0 [4]byte
+}
+
+type regs32 struct {
+ eax uint32
+ ebx uint32
+ ecx uint32
+ edx uint32
+ edi uint32
+ esi uint32
+ ebp uint32
+ esp uint32
+ ss uint32
+ eflags uint32
+ eip uint32
+ cs uint32
+ ds uint32
+ es uint32
+ fs uint32
+ gs uint32
+}
+
+type floatstate32 struct {
+ fpu_reserved [2]int32
+ fpu_fcw fpcontrol
+ fpu_fsw fpstatus
+ fpu_ftw uint8
+ fpu_rsrv1 uint8
+ fpu_fop uint16
+ fpu_ip uint32
+ fpu_cs uint16
+ fpu_rsrv2 uint16
+ fpu_dp uint32
+ fpu_ds uint16
+ fpu_rsrv3 uint16
+ fpu_mxcsr uint32
+ fpu_mxcsrmask uint32
+ fpu_stmm0 regmmst
+ fpu_stmm1 regmmst
+ fpu_stmm2 regmmst
+ fpu_stmm3 regmmst
+ fpu_stmm4 regmmst
+ fpu_stmm5 regmmst
+ fpu_stmm6 regmmst
+ fpu_stmm7 regmmst
+ fpu_xmm0 regxmm
+ fpu_xmm1 regxmm
+ fpu_xmm2 regxmm
+ fpu_xmm3 regxmm
+ fpu_xmm4 regxmm
+ fpu_xmm5 regxmm
+ fpu_xmm6 regxmm
+ fpu_xmm7 regxmm
+ fpu_rsrv4 [224]int8
+ fpu_reserved1 int32
+}
+
+type exceptionstate32 struct {
+ trapno uint16
+ cpu uint16
+ err uint32
+ faultvaddr uint32
+}
+
+type mcontext32 struct {
+ es exceptionstate32
+ ss regs32
+ fs floatstate32
+}
+
+type ucontext struct {
+ uc_onstack int32
+ uc_sigmask uint32
+ uc_stack stackt
+ uc_link *ucontext
+ uc_mcsize uint64
+ uc_mcontext *mcontext64
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
+
+type pthread uintptr
+type pthreadattr struct {
+ X__sig int64
+ X__opaque [56]int8
+}
+type pthreadmutex struct {
+ X__sig int64
+ X__opaque [56]int8
+}
+type pthreadmutexattr struct {
+ X__sig int64
+ X__opaque [8]int8
+}
+type pthreadcond struct {
+ X__sig int64
+ X__opaque [40]int8
+}
+type pthreadcondattr struct {
+ X__sig int64
+ X__opaque [8]int8
+}
+
+type machTimebaseInfo struct {
+ numer uint32
+ denom uint32
+}
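
On a signal, the saved registers are reached through ucontext.uc_mcontext and the regs64 block inside it. A standalone sketch of that field walk with local stand-in types (not the runtime's signal-handling code):

package main

import "fmt"

// Stand-in types mirroring the ucontext -> mcontext64 -> regs64 chain above,
// just to show where the saved instruction pointer lives.
type exampleRegs struct{ rip uint64 }
type exampleMcontext struct{ ss exampleRegs }
type exampleUcontext struct{ uc_mcontext *exampleMcontext }

func main() {
	mc := exampleMcontext{ss: exampleRegs{rip: 0x401000}}
	uc := exampleUcontext{uc_mcontext: &mc}
	fmt.Printf("saved rip: %#x\n", uc.uc_mcontext.ss.rip)
}
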
diff --git a/src/runtime/defs_darwin_arm64.go b/src/runtime/defs_darwin_arm64.go
new file mode 100644
index 0000000..9076e8b
--- /dev/null
+++ b/src/runtime/defs_darwin_arm64.go
@@ -0,0 +1,239 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_darwin.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ETIMEDOUT = 0x3c
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x5
+ _MADV_FREE_REUSABLE = 0x7
+ _MADV_FREE_REUSE = 0x8
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+ _SA_USERTRAMP = 0x100
+ _SA_64REGSET = 0x200
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x7
+ _FPE_INTOVF = 0x8
+ _FPE_FLTDIV = 0x1
+ _FPE_FLTOVF = 0x2
+ _FPE_FLTUND = 0x3
+ _FPE_FLTRES = 0x4
+ _FPE_FLTINV = 0x5
+ _FPE_FLTSUB = 0x6
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0x40
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+
+ _PTHREAD_CREATE_DETACHED = 0x2
+
+ _PTHREAD_KEYS_MAX = 512
+
+ _F_SETFD = 0x2
+ _F_GETFL = 0x3
+ _F_SETFL = 0x4
+ _FD_CLOEXEC = 0x1
+
+ _O_NONBLOCK = 4
+)
+
+type stackt struct {
+ ss_sp *byte
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type sigactiont struct {
+ __sigaction_u [8]byte
+ sa_tramp unsafe.Pointer
+ sa_mask uint32
+ sa_flags int32
+}
+
+type usigactiont struct {
+ __sigaction_u [8]byte
+ sa_mask uint32
+ sa_flags int32
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr *byte
+ si_value [8]byte
+ si_band int64
+ __pad [7]uint64
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ pad_cgo_0 [4]byte
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type exceptionstate64 struct {
+ far uint64 // virtual fault addr
+ esr uint32 // exception syndrome
+ exc uint32 // number of the ARM exception taken
+}
+
+type regs64 struct {
+ x [29]uint64 // registers x0 to x28
+ fp uint64 // frame register, x29
+ lr uint64 // link register, x30
+ sp uint64 // stack pointer, x31
+ pc uint64 // program counter
+ cpsr uint32 // current program status register
+ __pad uint32
+}
+
+type neonstate64 struct {
+ v [64]uint64 // actually [32]uint128
+ fpsr uint32
+ fpcr uint32
+}
+
+type mcontext64 struct {
+ es exceptionstate64
+ ss regs64
+ ns neonstate64
+}
+
+type ucontext struct {
+ uc_onstack int32
+ uc_sigmask uint32
+ uc_stack stackt
+ uc_link *ucontext
+ uc_mcsize uint64
+ uc_mcontext *mcontext64
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
+
+type pthread uintptr
+type pthreadattr struct {
+ X__sig int64
+ X__opaque [56]int8
+}
+type pthreadmutex struct {
+ X__sig int64
+ X__opaque [56]int8
+}
+type pthreadmutexattr struct {
+ X__sig int64
+ X__opaque [8]int8
+}
+type pthreadcond struct {
+ X__sig int64
+ X__opaque [40]int8
+}
+type pthreadcondattr struct {
+ X__sig int64
+ X__opaque [8]int8
+}
+
+type machTimebaseInfo struct {
+ numer uint32
+ denom uint32
+}
+
+type pthreadkey uint64
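
machTimebaseInfo holds the numer/denom pair that scales Mach absolute-time ticks to nanoseconds. A standalone sketch of that conversion; the ratio below is illustrative rather than queried from the kernel:

package main

import "fmt"

// ticksToNanos applies the usual mach_timebase_info scaling:
// nanoseconds = ticks * numer / denom.
func ticksToNanos(ticks uint64, numer, denom uint32) uint64 {
	return ticks * uint64(numer) / uint64(denom)
}

func main() {
	// Illustrative 125/3 ratio (a 24 MHz counter); real values come from
	// mach_timebase_info at startup.
	fmt.Println(ticksToNanos(24_000_000, 125, 3)) // 1000000000, i.e. one second
}
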
diff --git a/src/runtime/defs_dragonfly.go b/src/runtime/defs_dragonfly.go
new file mode 100644
index 0000000..95014fe
--- /dev/null
+++ b/src/runtime/defs_dragonfly.go
@@ -0,0 +1,125 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_dragonfly.go >defs_dragonfly_amd64.h
+*/
+
+package runtime
+
+/*
+#include <sys/user.h>
+#include <sys/time.h>
+#include <sys/event.h>
+#include <sys/mman.h>
+#include <sys/ucontext.h>
+#include <sys/rtprio.h>
+#include <sys/signal.h>
+#include <sys/unistd.h>
+#include <errno.h>
+#include <signal.h>
+*/
+import "C"
+
+const (
+ EINTR = C.EINTR
+ EFAULT = C.EFAULT
+ EBUSY = C.EBUSY
+ EAGAIN = C.EAGAIN
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANON
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_FREE = C.MADV_FREE
+
+ SA_SIGINFO = C.SA_SIGINFO
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGEMT = C.SIGEMT
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGBUS = C.SIGBUS
+ SIGSEGV = C.SIGSEGV
+ SIGSYS = C.SIGSYS
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGTERM = C.SIGTERM
+ SIGURG = C.SIGURG
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGCONT = C.SIGCONT
+ SIGCHLD = C.SIGCHLD
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGIO = C.SIGIO
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGINFO = C.SIGINFO
+ SIGUSR1 = C.SIGUSR1
+ SIGUSR2 = C.SIGUSR2
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ EV_ADD = C.EV_ADD
+ EV_DELETE = C.EV_DELETE
+ EV_CLEAR = C.EV_CLEAR
+ EV_ERROR = C.EV_ERROR
+ EV_EOF = C.EV_EOF
+ EVFILT_READ = C.EVFILT_READ
+ EVFILT_WRITE = C.EVFILT_WRITE
+)
+
+type Rtprio C.struct_rtprio
+type Lwpparams C.struct_lwp_params
+type Sigset C.struct___sigset
+type StackT C.stack_t
+
+type Siginfo C.siginfo_t
+
+type Mcontext C.mcontext_t
+type Ucontext C.ucontext_t
+
+type Timespec C.struct_timespec
+type Timeval C.struct_timeval
+type Itimerval C.struct_itimerval
+
+type Kevent C.struct_kevent
diff --git a/src/runtime/defs_dragonfly_amd64.go b/src/runtime/defs_dragonfly_amd64.go
new file mode 100644
index 0000000..30f1b33
--- /dev/null
+++ b/src/runtime/defs_dragonfly_amd64.go
@@ -0,0 +1,204 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_dragonfly.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EBUSY = 0x10
+ _EAGAIN = 0x23
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_FREE = 0x5
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x2
+ _FPE_INTOVF = 0x1
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type rtprio struct {
+ _type uint16
+ prio uint16
+}
+
+type lwpparams struct {
+ start_func uintptr
+ arg unsafe.Pointer
+ stack uintptr
+ tid1 unsafe.Pointer // *int32
+ tid2 unsafe.Pointer // *int32
+}
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uint64
+ si_value [8]byte
+ si_band int64
+ __spare__ [7]int32
+ pad_cgo_0 [4]byte
+}
+
+type mcontext struct {
+ mc_onstack uint64
+ mc_rdi uint64
+ mc_rsi uint64
+ mc_rdx uint64
+ mc_rcx uint64
+ mc_r8 uint64
+ mc_r9 uint64
+ mc_rax uint64
+ mc_rbx uint64
+ mc_rbp uint64
+ mc_r10 uint64
+ mc_r11 uint64
+ mc_r12 uint64
+ mc_r13 uint64
+ mc_r14 uint64
+ mc_r15 uint64
+ mc_xflags uint64
+ mc_trapno uint64
+ mc_addr uint64
+ mc_flags uint64
+ mc_err uint64
+ mc_rip uint64
+ mc_cs uint64
+ mc_rflags uint64
+ mc_rsp uint64
+ mc_ss uint64
+ mc_len uint32
+ mc_fpformat uint32
+ mc_ownedfp uint32
+ mc_reserved uint32
+ mc_unused [8]uint32
+ mc_fpregs [256]int32
+}
+
+type ucontext struct {
+ uc_sigmask sigset
+ pad_cgo_0 [48]byte
+ uc_mcontext mcontext
+ uc_link *ucontext
+ uc_stack stackt
+ __spare__ [8]int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
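
keventt plus the _EV_*/_EVFILT_* constants above are what a kqueue registration is built from. A purely illustrative sketch of filling one in for a read filter (local type and values, not the runtime's netpoll code):

package main

import "fmt"

// exampleKevent mirrors the first few keventt fields above.
type exampleKevent struct {
	ident  uint64
	filter int16
	flags  uint16
}

func main() {
	const (
		evAdd      = 0x1
		evClear    = 0x20
		evfiltRead = -0x1
	)
	// Register fd 3 for edge-triggered read readiness.
	ev := exampleKevent{ident: 3, filter: evfiltRead, flags: evAdd | evClear}
	fmt.Printf("%+v\n", ev)
}
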
diff --git a/src/runtime/defs_freebsd.go b/src/runtime/defs_freebsd.go
new file mode 100644
index 0000000..e196dff
--- /dev/null
+++ b/src/runtime/defs_freebsd.go
@@ -0,0 +1,169 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_freebsd.go >defs_freebsd_amd64.h
+GOARCH=386 go tool cgo -cdefs defs_freebsd.go >defs_freebsd_386.h
+GOARCH=arm go tool cgo -cdefs defs_freebsd.go >defs_freebsd_arm.h
+*/
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <sys/time.h>
+#include <signal.h>
+#include <errno.h>
+#define _WANT_FREEBSD11_KEVENT 1
+#include <sys/event.h>
+#include <sys/mman.h>
+#include <sys/ucontext.h>
+#include <sys/umtx.h>
+#include <sys/_umtx.h>
+#include <sys/rtprio.h>
+#include <sys/thr.h>
+#include <sys/_sigset.h>
+#include <sys/unistd.h>
+#include <sys/sysctl.h>
+#include <sys/cpuset.h>
+#include <sys/param.h>
+#include <sys/vdso.h>
+*/
+import "C"
+
+// Local consts.
+const (
+ _NBBY = C.NBBY // Number of bits in a byte.
+ _CTL_MAXNAME = C.CTL_MAXNAME // Largest number of components supported.
+ _CPU_LEVEL_WHICH = C.CPU_LEVEL_WHICH // Actual mask/id for which.
+ _CPU_WHICH_PID = C.CPU_WHICH_PID // Specifies a process id.
+)
+
+const (
+ EINTR = C.EINTR
+ EFAULT = C.EFAULT
+ EAGAIN = C.EAGAIN
+ ENOSYS = C.ENOSYS
+
+ O_NONBLOCK = C.O_NONBLOCK
+ O_CLOEXEC = C.O_CLOEXEC
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANON
+ MAP_SHARED = C.MAP_SHARED
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_FREE = C.MADV_FREE
+
+ SA_SIGINFO = C.SA_SIGINFO
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+
+ CLOCK_MONOTONIC = C.CLOCK_MONOTONIC
+ CLOCK_REALTIME = C.CLOCK_REALTIME
+
+ UMTX_OP_WAIT_UINT = C.UMTX_OP_WAIT_UINT
+ UMTX_OP_WAIT_UINT_PRIVATE = C.UMTX_OP_WAIT_UINT_PRIVATE
+ UMTX_OP_WAKE = C.UMTX_OP_WAKE
+ UMTX_OP_WAKE_PRIVATE = C.UMTX_OP_WAKE_PRIVATE
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGEMT = C.SIGEMT
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGBUS = C.SIGBUS
+ SIGSEGV = C.SIGSEGV
+ SIGSYS = C.SIGSYS
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGTERM = C.SIGTERM
+ SIGURG = C.SIGURG
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGCONT = C.SIGCONT
+ SIGCHLD = C.SIGCHLD
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGIO = C.SIGIO
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGINFO = C.SIGINFO
+ SIGUSR1 = C.SIGUSR1
+ SIGUSR2 = C.SIGUSR2
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ EV_ADD = C.EV_ADD
+ EV_DELETE = C.EV_DELETE
+ EV_CLEAR = C.EV_CLEAR
+ EV_RECEIPT = C.EV_RECEIPT
+ EV_ERROR = C.EV_ERROR
+ EV_EOF = C.EV_EOF
+ EVFILT_READ = C.EVFILT_READ
+ EVFILT_WRITE = C.EVFILT_WRITE
+)
+
+type Rtprio C.struct_rtprio
+type ThrParam C.struct_thr_param
+type Sigset C.struct___sigset
+type StackT C.stack_t
+
+type Siginfo C.siginfo_t
+
+type Mcontext C.mcontext_t
+type Ucontext C.ucontext_t
+
+type Timespec C.struct_timespec
+type Timeval C.struct_timeval
+type Itimerval C.struct_itimerval
+
+type Umtx_time C.struct__umtx_time
+
+type Kevent C.struct_kevent_freebsd11
+
+type bintime C.struct_bintime
+type vdsoTimehands C.struct_vdso_timehands
+type vdsoTimekeep C.struct_vdso_timekeep
+
+const (
+ _VDSO_TK_VER_CURR = C.VDSO_TK_VER_CURR
+
+ vdsoTimehandsSize = C.sizeof_struct_vdso_timehands
+ vdsoTimekeepSize = C.sizeof_struct_vdso_timekeep
+)
diff --git a/src/runtime/defs_freebsd_386.go b/src/runtime/defs_freebsd_386.go
new file mode 100644
index 0000000..f822934
--- /dev/null
+++ b/src/runtime/defs_freebsd_386.go
@@ -0,0 +1,265 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_freebsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _NBBY = 0x8
+ _CTL_MAXNAME = 0x18
+ _CPU_LEVEL_WHICH = 0x3
+ _CPU_WHICH_PID = 0x2
+)
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ENOSYS = 0x4e
+ _ETIMEDOUT = 0x3c
+
+ _O_NONBLOCK = 0x4
+ _O_CLOEXEC = 0x100000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_SHARED = 0x1
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_FREE = 0x5
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _CLOCK_MONOTONIC = 0x4
+ _CLOCK_REALTIME = 0x0
+
+ _UMTX_OP_WAIT_UINT = 0xb
+ _UMTX_OP_WAIT_UINT_PRIVATE = 0xf
+ _UMTX_OP_WAKE = 0x3
+ _UMTX_OP_WAKE_PRIVATE = 0x10
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x2
+ _FPE_INTOVF = 0x1
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0x40
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type rtprio struct {
+ _type uint16
+ prio uint16
+}
+
+type thrparam struct {
+ start_func uintptr
+ arg unsafe.Pointer
+ stack_base uintptr
+ stack_size uintptr
+ tls_base unsafe.Pointer
+ tls_size uintptr
+ child_tid unsafe.Pointer // *int32
+ parent_tid *int32
+ flags int32
+ rtp *rtprio
+ spare [3]uintptr
+}
+
+type thread int32 // long
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uintptr
+ si_value [4]byte
+ _reason [32]byte
+}
+
+type mcontext struct {
+ mc_onstack uint32
+ mc_gs uint32
+ mc_fs uint32
+ mc_es uint32
+ mc_ds uint32
+ mc_edi uint32
+ mc_esi uint32
+ mc_ebp uint32
+ mc_isp uint32
+ mc_ebx uint32
+ mc_edx uint32
+ mc_ecx uint32
+ mc_eax uint32
+ mc_trapno uint32
+ mc_err uint32
+ mc_eip uint32
+ mc_cs uint32
+ mc_eflags uint32
+ mc_esp uint32
+ mc_ss uint32
+ mc_len uint32
+ mc_fpformat uint32
+ mc_ownedfp uint32
+ mc_flags uint32
+ mc_fpstate [128]uint32
+ mc_fsbase uint32
+ mc_gsbase uint32
+ mc_xfpustate uint32
+ mc_xfpustate_len uint32
+ mc_spare2 [4]uint32
+}
+
+type ucontext struct {
+ uc_sigmask sigset
+ uc_mcontext mcontext
+ uc_link *ucontext
+ uc_stack stackt
+ uc_flags int32
+ __spare__ [4]int32
+ pad_cgo_0 [12]byte
+}
+
+type timespec struct {
+ tv_sec int32
+ tv_nsec int32
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = timediv(ns, 1e9, &ts.tv_nsec)
+}
+
+type timeval struct {
+ tv_sec int32
+ tv_usec int32
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type umtx_time struct {
+ _timeout timespec
+ _flags uint32
+ _clockid uint32
+}
+
+type keventt struct {
+ ident uint32
+ filter int16
+ flags uint16
+ fflags uint32
+ data int32
+ udata *byte
+}
+
+type bintime struct {
+ sec int32
+ frac uint64
+}
+
+type vdsoTimehands struct {
+ algo uint32
+ gen uint32
+ scale uint64
+ offset_count uint32
+ counter_mask uint32
+ offset bintime
+ boottime bintime
+ x86_shift uint32
+ x86_hpet_idx uint32
+ res [6]uint32
+}
+
+type vdsoTimekeep struct {
+ ver uint32
+ enabled uint32
+ current uint32
+}
+
+const (
+ _VDSO_TK_VER_CURR = 0x1
+
+ vdsoTimehandsSize = 0x50
+ vdsoTimekeepSize = 0xc
+)
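
Note that setNsec on this 32-bit target goes through timediv rather than dividing ns by 1e9 directly, presumably to avoid 64-bit division on 32-bit platforms. A standalone sketch of the resulting split (plain division here, purely for clarity):

package main

import "fmt"

// splitNsec32 shows the values a timediv-style helper hands back on a
// 32-bit target: a 32-bit second count and the nanosecond remainder.
// This sketch just uses ordinary division to show the result.
func splitNsec32(ns int64) (sec, nsec int32) {
	return int32(ns / 1e9), int32(ns % 1e9)
}

func main() {
	sec, nsec := splitNsec32(1_234_567_890)
	fmt.Println(sec, nsec) // 1 234567890
}
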
diff --git a/src/runtime/defs_freebsd_amd64.go b/src/runtime/defs_freebsd_amd64.go
new file mode 100644
index 0000000..0b696cf
--- /dev/null
+++ b/src/runtime/defs_freebsd_amd64.go
@@ -0,0 +1,277 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_freebsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _NBBY = 0x8
+ _CTL_MAXNAME = 0x18
+ _CPU_LEVEL_WHICH = 0x3
+ _CPU_WHICH_PID = 0x2
+)
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ENOSYS = 0x4e
+ _ETIMEDOUT = 0x3c
+
+ _O_NONBLOCK = 0x4
+ _O_CLOEXEC = 0x100000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_SHARED = 0x1
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_FREE = 0x5
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _CLOCK_MONOTONIC = 0x4
+ _CLOCK_REALTIME = 0x0
+
+ _UMTX_OP_WAIT_UINT = 0xb
+ _UMTX_OP_WAIT_UINT_PRIVATE = 0xf
+ _UMTX_OP_WAKE = 0x3
+ _UMTX_OP_WAKE_PRIVATE = 0x10
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x2
+ _FPE_INTOVF = 0x1
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0x40
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type rtprio struct {
+ _type uint16
+ prio uint16
+}
+
+type thrparam struct {
+ start_func uintptr
+ arg unsafe.Pointer
+ stack_base uintptr
+ stack_size uintptr
+ tls_base unsafe.Pointer
+ tls_size uintptr
+ child_tid unsafe.Pointer // *int64
+ parent_tid *int64
+ flags int32
+ pad_cgo_0 [4]byte
+ rtp *rtprio
+ spare [3]uintptr
+}
+
+type thread int64 // long
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uint64
+ si_value [8]byte
+ _reason [40]byte
+}
+
+type mcontext struct {
+ mc_onstack uint64
+ mc_rdi uint64
+ mc_rsi uint64
+ mc_rdx uint64
+ mc_rcx uint64
+ mc_r8 uint64
+ mc_r9 uint64
+ mc_rax uint64
+ mc_rbx uint64
+ mc_rbp uint64
+ mc_r10 uint64
+ mc_r11 uint64
+ mc_r12 uint64
+ mc_r13 uint64
+ mc_r14 uint64
+ mc_r15 uint64
+ mc_trapno uint32
+ mc_fs uint16
+ mc_gs uint16
+ mc_addr uint64
+ mc_flags uint32
+ mc_es uint16
+ mc_ds uint16
+ mc_err uint64
+ mc_rip uint64
+ mc_cs uint64
+ mc_rflags uint64
+ mc_rsp uint64
+ mc_ss uint64
+ mc_len uint64
+ mc_fpformat uint64
+ mc_ownedfp uint64
+ mc_fpstate [64]uint64
+ mc_fsbase uint64
+ mc_gsbase uint64
+ mc_xfpustate uint64
+ mc_xfpustate_len uint64
+ mc_spare [4]uint64
+}
+
+type ucontext struct {
+ uc_sigmask sigset
+ uc_mcontext mcontext
+ uc_link *ucontext
+ uc_stack stackt
+ uc_flags int32
+ __spare__ [4]int32
+ pad_cgo_0 [12]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type umtx_time struct {
+ _timeout timespec
+ _flags uint32
+ _clockid uint32
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
+
+type bintime struct {
+ sec int64
+ frac uint64
+}
+
+type vdsoTimehands struct {
+ algo uint32
+ gen uint32
+ scale uint64
+ offset_count uint32
+ counter_mask uint32
+ offset bintime
+ boottime bintime
+ x86_shift uint32
+ x86_hpet_idx uint32
+ res [6]uint32
+}
+
+type vdsoTimekeep struct {
+ ver uint32
+ enabled uint32
+ current uint32
+ pad_cgo_0 [4]byte
+}
+
+const (
+ _VDSO_TK_VER_CURR = 0x1
+
+ vdsoTimehandsSize = 0x58
+ vdsoTimekeepSize = 0x10
+)
diff --git a/src/runtime/defs_freebsd_arm.go b/src/runtime/defs_freebsd_arm.go
new file mode 100644
index 0000000..b6f3e79
--- /dev/null
+++ b/src/runtime/defs_freebsd_arm.go
@@ -0,0 +1,238 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_freebsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _NBBY = 0x8
+ _CTL_MAXNAME = 0x18
+ _CPU_LEVEL_WHICH = 0x3
+ _CPU_WHICH_PID = 0x2
+)
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ENOSYS = 0x4e
+ _ETIMEDOUT = 0x3c
+
+ _O_NONBLOCK = 0x4
+ _O_CLOEXEC = 0x100000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_SHARED = 0x1
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_FREE = 0x5
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _CLOCK_MONOTONIC = 0x4
+ _CLOCK_REALTIME = 0x0
+
+ _UMTX_OP_WAIT_UINT = 0xb
+ _UMTX_OP_WAIT_UINT_PRIVATE = 0xf
+ _UMTX_OP_WAKE = 0x3
+ _UMTX_OP_WAKE_PRIVATE = 0x10
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x2
+ _FPE_INTOVF = 0x1
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0x40
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type rtprio struct {
+ _type uint16
+ prio uint16
+}
+
+type thrparam struct {
+ start_func uintptr
+ arg unsafe.Pointer
+ stack_base uintptr
+ stack_size uintptr
+ tls_base unsafe.Pointer
+ tls_size uintptr
+ child_tid unsafe.Pointer // *int32
+ parent_tid *int32
+ flags int32
+ rtp *rtprio
+ spare [3]uintptr
+}
+
+type thread int32 // long
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uintptr
+ si_value [4]byte
+ _reason [32]byte
+}
+
+type mcontext struct {
+ __gregs [17]uint32
+ __fpu [140]byte
+}
+
+type ucontext struct {
+ uc_sigmask sigset
+ uc_mcontext mcontext
+ uc_link *ucontext
+ uc_stack stackt
+ uc_flags int32
+ __spare__ [4]int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int32
+ pad_cgo_0 [4]byte
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = int64(timediv(ns, 1e9, &ts.tv_nsec))
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ pad_cgo_0 [4]byte
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type umtx_time struct {
+ _timeout timespec
+ _flags uint32
+ _clockid uint32
+}
+
+type keventt struct {
+ ident uint32
+ filter int16
+ flags uint16
+ fflags uint32
+ data int32
+ udata *byte
+}
+
+type bintime struct {
+ sec int64
+ frac uint64
+}
+
+type vdsoTimehands struct {
+ algo uint32
+ gen uint32
+ scale uint64
+ offset_count uint32
+ counter_mask uint32
+ offset bintime
+ boottime bintime
+ physical uint32
+ res [7]uint32
+}
+
+type vdsoTimekeep struct {
+ ver uint32
+ enabled uint32
+ current uint32
+ pad_cgo_0 [4]byte
+}
+
+const (
+ _VDSO_TK_VER_CURR = 0x1
+
+ vdsoTimehandsSize = 0x58
+ vdsoTimekeepSize = 0x10
+)
diff --git a/src/runtime/defs_freebsd_arm64.go b/src/runtime/defs_freebsd_arm64.go
new file mode 100644
index 0000000..0759a12
--- /dev/null
+++ b/src/runtime/defs_freebsd_arm64.go
@@ -0,0 +1,260 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_freebsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _NBBY = 0x8
+ _CTL_MAXNAME = 0x18
+ _CPU_LEVEL_WHICH = 0x3
+ _CPU_WHICH_PID = 0x2
+)
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ENOSYS = 0x4e
+ _ETIMEDOUT = 0x3c
+
+ _O_NONBLOCK = 0x4
+ _O_CLOEXEC = 0x100000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_SHARED = 0x1
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_FREE = 0x5
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _CLOCK_MONOTONIC = 0x4
+ _CLOCK_REALTIME = 0x0
+
+ _UMTX_OP_WAIT_UINT = 0xb
+ _UMTX_OP_WAIT_UINT_PRIVATE = 0xf
+ _UMTX_OP_WAKE = 0x3
+ _UMTX_OP_WAKE_PRIVATE = 0x10
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x2
+ _FPE_INTOVF = 0x1
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0x40
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type rtprio struct {
+ _type uint16
+ prio uint16
+}
+
+type thrparam struct {
+ start_func uintptr
+ arg unsafe.Pointer
+ stack_base uintptr
+ stack_size uintptr
+ tls_base unsafe.Pointer
+ tls_size uintptr
+ child_tid unsafe.Pointer // *int64
+ parent_tid *int64
+ flags int32
+ pad_cgo_0 [4]byte
+ rtp *rtprio
+ spare [3]uintptr
+}
+
+type thread int64 // long
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uint64
+ si_value [8]byte
+ _reason [40]byte
+}
+
+type gpregs struct {
+ gp_x [30]uint64
+ gp_lr uint64
+ gp_sp uint64
+ gp_elr uint64
+ gp_spsr uint32
+ gp_pad int32
+}
+
+type fpregs struct {
+ fp_q [64]uint64 // actually [32]uint128
+ fp_sr uint32
+ fp_cr uint32
+ fp_flags int32
+ fp_pad int32
+}
+
+type mcontext struct {
+ mc_gpregs gpregs
+ mc_fpregs fpregs
+ mc_flags int32
+ mc_pad int32
+ mc_spare [8]uint64
+}
+
+type ucontext struct {
+ uc_sigmask sigset
+ uc_mcontext mcontext
+ uc_link *ucontext
+ uc_stack stackt
+ uc_flags int32
+ __spare__ [4]int32
+ pad_cgo_0 [12]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type umtx_time struct {
+ _timeout timespec
+ _flags uint32
+ _clockid uint32
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
+
+type bintime struct {
+ sec int64
+ frac uint64
+}
+
+type vdsoTimehands struct {
+ algo uint32
+ gen uint32
+ scale uint64
+ offset_count uint32
+ counter_mask uint32
+ offset bintime
+ boottime bintime
+ physical uint32
+ res [7]uint32
+}
+
+type vdsoTimekeep struct {
+ ver uint32
+ enabled uint32
+ current uint32
+ pad_cgo_0 [4]byte
+}
+
+const (
+ _VDSO_TK_VER_CURR = 0x1
+
+ vdsoTimehandsSize = 0x58
+ vdsoTimekeepSize = 0x10
+)
diff --git a/src/runtime/defs_illumos_amd64.go b/src/runtime/defs_illumos_amd64.go
new file mode 100644
index 0000000..9c5413b
--- /dev/null
+++ b/src/runtime/defs_illumos_amd64.go
@@ -0,0 +1,14 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _RCTL_LOCAL_DENY = 0x2
+
+ _RCTL_LOCAL_MAXIMAL = 0x80000000
+
+ _RCTL_FIRST = 0x0
+ _RCTL_NEXT = 0x1
+)
diff --git a/src/runtime/defs_linux.go b/src/runtime/defs_linux.go
new file mode 100644
index 0000000..7b14063
--- /dev/null
+++ b/src/runtime/defs_linux.go
@@ -0,0 +1,129 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo -cdefs
+
+GOARCH=amd64 go tool cgo -cdefs defs_linux.go defs1_linux.go >defs_linux_amd64.h
+*/
+
+package runtime
+
+/*
+// Linux glibc and Linux kernel define different and conflicting
+// definitions for struct sigaction, struct timespec, etc.
+// We want the kernel ones, which are in the asm/* headers.
+// But then we'd get conflicts when we include the system
+// headers for things like ucontext_t, so that happens in
+// a separate file, defs1.go.
+
+#define _SYS_TYPES_H // avoid inclusion of sys/types.h
+#include <asm/posix_types.h>
+#define size_t __kernel_size_t
+#include <asm/signal.h>
+#include <asm/siginfo.h>
+#include <asm/mman.h>
+#include <asm-generic/errno.h>
+#include <asm-generic/poll.h>
+#include <linux/eventpoll.h>
+#include <linux/time.h>
+*/
+import "C"
+
+const (
+ EINTR = C.EINTR
+ EAGAIN = C.EAGAIN
+ ENOMEM = C.ENOMEM
+ ENOSYS = C.ENOSYS
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANONYMOUS
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_FREE = C.MADV_FREE
+ MADV_HUGEPAGE = C.MADV_HUGEPAGE
+ MADV_NOHUGEPAGE = C.MADV_NOHUGEPAGE
+
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+ SA_SIGINFO = C.SA_SIGINFO
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGBUS = C.SIGBUS
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGUSR1 = C.SIGUSR1
+ SIGSEGV = C.SIGSEGV
+ SIGUSR2 = C.SIGUSR2
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGSTKFLT = C.SIGSTKFLT
+ SIGCHLD = C.SIGCHLD
+ SIGCONT = C.SIGCONT
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGURG = C.SIGURG
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGIO = C.SIGIO
+ SIGPWR = C.SIGPWR
+ SIGSYS = C.SIGSYS
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ EPOLLIN = C.POLLIN
+ EPOLLOUT = C.POLLOUT
+ EPOLLERR = C.POLLERR
+ EPOLLHUP = C.POLLHUP
+ EPOLLRDHUP = C.POLLRDHUP
+ EPOLLET = C.EPOLLET
+ EPOLL_CLOEXEC = C.EPOLL_CLOEXEC
+ EPOLL_CTL_ADD = C.EPOLL_CTL_ADD
+ EPOLL_CTL_DEL = C.EPOLL_CTL_DEL
+ EPOLL_CTL_MOD = C.EPOLL_CTL_MOD
+)
+
+type Sigset C.sigset_t
+type Timespec C.struct_timespec
+type Timeval C.struct_timeval
+type Sigaction C.struct_sigaction
+type Siginfo C.siginfo_t
+type Itimerval C.struct_itimerval
+type EpollEvent C.struct_epoll_event
diff --git a/src/runtime/defs_linux_386.go b/src/runtime/defs_linux_386.go
new file mode 100644
index 0000000..64a0fbc
--- /dev/null
+++ b/src/runtime/defs_linux_386.go
@@ -0,0 +1,228 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs2_linux.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+ _ENOSYS = 0x26
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_RESTORER = 0x4000000
+ _SA_SIGINFO = 0x4
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _O_RDONLY = 0x0
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+
+ _EPOLLIN = 0x1
+ _EPOLLOUT = 0x4
+ _EPOLLERR = 0x8
+ _EPOLLHUP = 0x10
+ _EPOLLRDHUP = 0x2000
+ _EPOLLET = 0x80000000
+ _EPOLL_CLOEXEC = 0x80000
+ _EPOLL_CTL_ADD = 0x1
+ _EPOLL_CTL_DEL = 0x2
+ _EPOLL_CTL_MOD = 0x3
+
+ _AF_UNIX = 0x1
+ _SOCK_DGRAM = 0x2
+)
+
+type fpreg struct {
+ significand [4]uint16
+ exponent uint16
+}
+
+type fpxreg struct {
+ significand [4]uint16
+ exponent uint16
+ padding [3]uint16
+}
+
+type xmmreg struct {
+ element [4]uint32
+}
+
+type fpstate struct {
+ cw uint32
+ sw uint32
+ tag uint32
+ ipoff uint32
+ cssel uint32
+ dataoff uint32
+ datasel uint32
+ _st [8]fpreg
+ status uint16
+ magic uint16
+ _fxsr_env [6]uint32
+ mxcsr uint32
+ reserved uint32
+ _fxsr_st [8]fpxreg
+ _xmm [8]xmmreg
+ padding1 [44]uint32
+ anon0 [48]byte
+}
+
+type timespec struct {
+ tv_sec int32
+ tv_nsec int32
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = timediv(ns, 1e9, &ts.tv_nsec)
+}
+
+type timeval struct {
+ tv_sec int32
+ tv_usec int32
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint32
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint32
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ ss_size uintptr
+}
+
+type sigcontext struct {
+ gs uint16
+ __gsh uint16
+ fs uint16
+ __fsh uint16
+ es uint16
+ __esh uint16
+ ds uint16
+ __dsh uint16
+ edi uint32
+ esi uint32
+ ebp uint32
+ esp uint32
+ ebx uint32
+ edx uint32
+ ecx uint32
+ eax uint32
+ trapno uint32
+ err uint32
+ eip uint32
+ cs uint16
+ __csh uint16
+ eflags uint32
+ esp_at_signal uint32
+ ss uint16
+ __ssh uint16
+ fpstate *fpstate
+ oldmask uint32
+ cr2 uint32
+}
+
+type ucontext struct {
+ uc_flags uint32
+ uc_link *ucontext
+ uc_stack stackt
+ uc_mcontext sigcontext
+ uc_sigmask uint32
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type epollevent struct {
+ events uint32
+ data [8]byte // to match amd64
+}
+
+type sockaddr_un struct {
+ family uint16
+ path [108]byte
+}
diff --git a/src/runtime/defs_linux_amd64.go b/src/runtime/defs_linux_amd64.go
new file mode 100644
index 0000000..1ae18a3
--- /dev/null
+++ b/src/runtime/defs_linux_amd64.go
@@ -0,0 +1,264 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_linux.go defs1_linux.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+ _ENOSYS = 0x26
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_RESTORER = 0x4000000
+ _SA_SIGINFO = 0x4
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EPOLLIN = 0x1
+ _EPOLLOUT = 0x4
+ _EPOLLERR = 0x8
+ _EPOLLHUP = 0x10
+ _EPOLLRDHUP = 0x2000
+ _EPOLLET = 0x80000000
+ _EPOLL_CLOEXEC = 0x80000
+ _EPOLL_CTL_ADD = 0x1
+ _EPOLL_CTL_DEL = 0x2
+ _EPOLL_CTL_MOD = 0x3
+
+ _AF_UNIX = 0x1
+ _SOCK_DGRAM = 0x2
+)
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint64
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type epollevent struct {
+ events uint32
+ data [8]byte // unaligned uintptr
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_linux.go defs1_linux.go
+
+const (
+ _O_RDONLY = 0x0
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+)
+
+type usigset struct {
+ __val [16]uint64
+}
+
+type fpxreg struct {
+ significand [4]uint16
+ exponent uint16
+ padding [3]uint16
+}
+
+type xmmreg struct {
+ element [4]uint32
+}
+
+type fpstate struct {
+ cwd uint16
+ swd uint16
+ ftw uint16
+ fop uint16
+ rip uint64
+ rdp uint64
+ mxcsr uint32
+ mxcr_mask uint32
+ _st [8]fpxreg
+ _xmm [16]xmmreg
+ padding [24]uint32
+}
+
+type fpxreg1 struct {
+ significand [4]uint16
+ exponent uint16
+ padding [3]uint16
+}
+
+type xmmreg1 struct {
+ element [4]uint32
+}
+
+type fpstate1 struct {
+ cwd uint16
+ swd uint16
+ ftw uint16
+ fop uint16
+ rip uint64
+ rdp uint64
+ mxcsr uint32
+ mxcr_mask uint32
+ _st [8]fpxreg1
+ _xmm [16]xmmreg1
+ padding [24]uint32
+}
+
+type fpreg1 struct {
+ significand [4]uint16
+ exponent uint16
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ pad_cgo_0 [4]byte
+ ss_size uintptr
+}
+
+type mcontext struct {
+ gregs [23]uint64
+ fpregs *fpstate
+ __reserved1 [8]uint64
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_mcontext mcontext
+ uc_sigmask usigset
+ __fpregs_mem fpstate
+}
+
+type sigcontext struct {
+ r8 uint64
+ r9 uint64
+ r10 uint64
+ r11 uint64
+ r12 uint64
+ r13 uint64
+ r14 uint64
+ r15 uint64
+ rdi uint64
+ rsi uint64
+ rbp uint64
+ rbx uint64
+ rdx uint64
+ rax uint64
+ rcx uint64
+ rsp uint64
+ rip uint64
+ eflags uint64
+ cs uint16
+ gs uint16
+ fs uint16
+ __pad0 uint16
+ err uint64
+ trapno uint64
+ oldmask uint64
+ cr2 uint64
+ fpstate *fpstate1
+ __reserved1 [8]uint64
+}
+
+type sockaddr_un struct {
+ family uint16
+ path [108]byte
+}
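
On amd64 the kernel's epoll_event struct is packed, so the 64-bit payload is declared as a byte array rather than a uint64 field. Callers store a pointer-sized value into it by reinterpreting the bytes; below is a hedged, self-contained sketch of that pattern, with the struct redeclared locally so the example compiles on its own.

package main

import (
	"fmt"
	"unsafe"
)

// Local mirror of the epollevent layout above; only the field shapes matter.
type epollevent struct {
	events uint32
	data   [8]byte // unaligned 64-bit payload
}

func main() {
	var ev epollevent
	// Store a pointer-sized value into the byte array, then read it back.
	*(*uintptr)(unsafe.Pointer(&ev.data)) = 42
	got := *(*uintptr)(unsafe.Pointer(&ev.data))
	fmt.Println(got) // 42
}
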
diff --git a/src/runtime/defs_linux_arm.go b/src/runtime/defs_linux_arm.go
new file mode 100644
index 0000000..5bc0916
--- /dev/null
+++ b/src/runtime/defs_linux_arm.go
@@ -0,0 +1,185 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// Constants
+const (
+ _EINTR = 0x4
+ _ENOMEM = 0xc
+ _EAGAIN = 0xb
+ _ENOSYS = 0x26
+
+ _PROT_NONE = 0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_RESTORER = 0 // unused on ARM
+ _SA_SIGINFO = 0x4
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+ _ITIMER_REAL = 0
+ _ITIMER_PROF = 0x2
+ _ITIMER_VIRTUAL = 0x1
+ _O_RDONLY = 0
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+
+ _EPOLLIN = 0x1
+ _EPOLLOUT = 0x4
+ _EPOLLERR = 0x8
+ _EPOLLHUP = 0x10
+ _EPOLLRDHUP = 0x2000
+ _EPOLLET = 0x80000000
+ _EPOLL_CLOEXEC = 0x80000
+ _EPOLL_CTL_ADD = 0x1
+ _EPOLL_CTL_DEL = 0x2
+ _EPOLL_CTL_MOD = 0x3
+
+ _AF_UNIX = 0x1
+ _SOCK_DGRAM = 0x2
+)
+
+type timespec struct {
+ tv_sec int32
+ tv_nsec int32
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = timediv(ns, 1e9, &ts.tv_nsec)
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ ss_size uintptr
+}
+
+type sigcontext struct {
+ trap_no uint32
+ error_code uint32
+ oldmask uint32
+ r0 uint32
+ r1 uint32
+ r2 uint32
+ r3 uint32
+ r4 uint32
+ r5 uint32
+ r6 uint32
+ r7 uint32
+ r8 uint32
+ r9 uint32
+ r10 uint32
+ fp uint32
+ ip uint32
+ sp uint32
+ lr uint32
+ pc uint32
+ cpsr uint32
+ fault_address uint32
+}
+
+type ucontext struct {
+ uc_flags uint32
+ uc_link *ucontext
+ uc_stack stackt
+ uc_mcontext sigcontext
+ uc_sigmask uint32
+ __unused [31]int32
+ uc_regspace [128]uint32
+}
+
+type timeval struct {
+ tv_sec int32
+ tv_usec int32
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint32
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint32
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type epollevent struct {
+ events uint32
+ _pad uint32
+ data [8]byte // to match amd64
+}
+
+type sockaddr_un struct {
+ family uint16
+ path [108]byte
+}
diff --git a/src/runtime/defs_linux_arm64.go b/src/runtime/defs_linux_arm64.go
new file mode 100644
index 0000000..0690cd3
--- /dev/null
+++ b/src/runtime/defs_linux_arm64.go
@@ -0,0 +1,187 @@
+// Created by cgo -cdefs and converted (by hand) to Go
+// ../cmd/cgo/cgo -cdefs defs_linux.go defs1_linux.go defs2_linux.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+ _ENOSYS = 0x26
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_RESTORER = 0x0 // Only used on Intel
+ _SA_SIGINFO = 0x4
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EPOLLIN = 0x1
+ _EPOLLOUT = 0x4
+ _EPOLLERR = 0x8
+ _EPOLLHUP = 0x10
+ _EPOLLRDHUP = 0x2000
+ _EPOLLET = 0x80000000
+ _EPOLL_CLOEXEC = 0x80000
+ _EPOLL_CTL_ADD = 0x1
+ _EPOLL_CTL_DEL = 0x2
+ _EPOLL_CTL_MOD = 0x3
+
+ _AF_UNIX = 0x1
+ _SOCK_DGRAM = 0x2
+)
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint64
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type epollevent struct {
+ events uint32
+ _pad uint32
+ data [8]byte // to match amd64
+}
+
+// Created by cgo -cdefs and then converted to Go by hand
+// ../cmd/cgo/cgo -cdefs defs_linux.go defs1_linux.go defs2_linux.go
+
+const (
+ _O_RDONLY = 0x0
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+)
+
+type usigset struct {
+ __val [16]uint64
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ pad_cgo_0 [4]byte
+ ss_size uintptr
+}
+
+type sigcontext struct {
+ fault_address uint64
+ /* AArch64 registers */
+ regs [31]uint64
+ sp uint64
+ pc uint64
+ pstate uint64
+ _pad [8]byte // __attribute__((__aligned__(16)))
+ __reserved [4096]byte
+}
+
+type sockaddr_un struct {
+ family uint16
+ path [108]byte
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_sigmask uint64
+ _pad [(1024 - 64) / 8]byte
+ _pad2 [8]byte // sigcontext must be 16-byte aligned
+ uc_mcontext sigcontext
+}
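
The arm64 ucontext above pads uc_sigmask out to the 1024-bit userspace sigset ((1024-64)/8 = 120 bytes) and then adds 8 more bytes so uc_mcontext lands on a 16-byte boundary. A quick self-contained check of that arithmetic, using simplified stand-in types with the same field sizes:

package main

import (
	"fmt"
	"unsafe"
)

// Stand-ins with the same field sizes as the arm64 definitions above.
type stackt struct {
	ss_sp     *byte
	ss_flags  int32
	pad_cgo_0 [4]byte
	ss_size   uintptr
}

type sigcontext struct {
	fault_address uint64
	regs          [31]uint64
	sp            uint64
	pc            uint64
	pstate        uint64
	_pad          [8]byte
	__reserved    [4096]byte
}

type ucontext struct {
	uc_flags    uint64
	uc_link     *ucontext
	uc_stack    stackt
	uc_sigmask  uint64
	_pad        [(1024 - 64) / 8]byte
	_pad2       [8]byte
	uc_mcontext sigcontext
}

func main() {
	off := unsafe.Offsetof(ucontext{}.uc_mcontext)
	fmt.Println(off, off%16) // on a 64-bit build this prints "176 0": the offset is a multiple of 16
}
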
diff --git a/src/runtime/defs_linux_mips64x.go b/src/runtime/defs_linux_mips64x.go
new file mode 100644
index 0000000..1fb423b
--- /dev/null
+++ b/src/runtime/defs_linux_mips64x.go
@@ -0,0 +1,188 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips64 mips64le
+// +build linux
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+ _ENOSYS = 0x59
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x800
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_SIGINFO = 0x8
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGUSR1 = 0x10
+ _SIGUSR2 = 0x11
+ _SIGCHLD = 0x12
+ _SIGPWR = 0x13
+ _SIGWINCH = 0x14
+ _SIGURG = 0x15
+ _SIGIO = 0x16
+ _SIGSTOP = 0x17
+ _SIGTSTP = 0x18
+ _SIGCONT = 0x19
+ _SIGTTIN = 0x1a
+ _SIGTTOU = 0x1b
+ _SIGVTALRM = 0x1c
+ _SIGPROF = 0x1d
+ _SIGXCPU = 0x1e
+ _SIGXFSZ = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EPOLLIN = 0x1
+ _EPOLLOUT = 0x4
+ _EPOLLERR = 0x8
+ _EPOLLHUP = 0x10
+ _EPOLLRDHUP = 0x2000
+ _EPOLLET = 0x80000000
+ _EPOLL_CLOEXEC = 0x80000
+ _EPOLL_CTL_ADD = 0x1
+ _EPOLL_CTL_DEL = 0x2
+ _EPOLL_CTL_MOD = 0x3
+)
+
+//struct Sigset {
+// uint64 sig[1];
+//};
+//typedef uint64 Sigset;
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type sigactiont struct {
+ sa_flags uint32
+ sa_handler uintptr
+ sa_mask [2]uint64
+ // The Linux header does not have an sa_restorer field,
+ // but it is used in setsig(); it does no harm to put it here.
+ sa_restorer uintptr
+}
+
+type siginfo struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ __pad0 [1]int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type epollevent struct {
+ events uint32
+ pad_cgo_0 [4]byte
+ data [8]byte // unaligned uintptr
+}
+
+const (
+ _O_RDONLY = 0x0
+ _O_NONBLOCK = 0x80
+ _O_CLOEXEC = 0x80000
+ _SA_RESTORER = 0
+)
+
+type stackt struct {
+ ss_sp *byte
+ ss_size uintptr
+ ss_flags int32
+}
+
+type sigcontext struct {
+ sc_regs [32]uint64
+ sc_fpregs [32]uint64
+ sc_mdhi uint64
+ sc_hi1 uint64
+ sc_hi2 uint64
+ sc_hi3 uint64
+ sc_mdlo uint64
+ sc_lo1 uint64
+ sc_lo2 uint64
+ sc_lo3 uint64
+ sc_pc uint64
+ sc_fpc_csr uint32
+ sc_used_math uint32
+ sc_dsp uint32
+ sc_reserved uint32
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_mcontext sigcontext
+ uc_sigmask uint64
+}
diff --git a/src/runtime/defs_linux_mipsx.go b/src/runtime/defs_linux_mipsx.go
new file mode 100644
index 0000000..9315ba9
--- /dev/null
+++ b/src/runtime/defs_linux_mipsx.go
@@ -0,0 +1,186 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips mipsle
+// +build linux
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+ _ENOSYS = 0x59
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x800
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_SIGINFO = 0x8
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGUSR1 = 0x10
+ _SIGUSR2 = 0x11
+ _SIGCHLD = 0x12
+ _SIGPWR = 0x13
+ _SIGWINCH = 0x14
+ _SIGURG = 0x15
+ _SIGIO = 0x16
+ _SIGSTOP = 0x17
+ _SIGTSTP = 0x18
+ _SIGCONT = 0x19
+ _SIGTTIN = 0x1a
+ _SIGTTOU = 0x1b
+ _SIGVTALRM = 0x1c
+ _SIGPROF = 0x1d
+ _SIGXCPU = 0x1e
+ _SIGXFSZ = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EPOLLIN = 0x1
+ _EPOLLOUT = 0x4
+ _EPOLLERR = 0x8
+ _EPOLLHUP = 0x10
+ _EPOLLRDHUP = 0x2000
+ _EPOLLET = 0x80000000
+ _EPOLL_CLOEXEC = 0x80000
+ _EPOLL_CTL_ADD = 0x1
+ _EPOLL_CTL_DEL = 0x2
+ _EPOLL_CTL_MOD = 0x3
+)
+
+type timespec struct {
+ tv_sec int32
+ tv_nsec int32
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = timediv(ns, 1e9, &ts.tv_nsec)
+}
+
+type timeval struct {
+ tv_sec int32
+ tv_usec int32
+}
+
+//go:nosplit
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type sigactiont struct {
+ sa_flags uint32
+ sa_handler uintptr
+ sa_mask [4]uint32
+ // The Linux header does not have an sa_restorer field,
+ // but it is used in setsig(); it does no harm to put it here.
+ sa_restorer uintptr
+}
+
+type siginfo struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint32
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type epollevent struct {
+ events uint32
+ pad_cgo_0 [4]byte
+ data uint64
+}
+
+const (
+ _O_RDONLY = 0x0
+ _O_NONBLOCK = 0x80
+ _O_CLOEXEC = 0x80000
+ _SA_RESTORER = 0
+)
+
+type stackt struct {
+ ss_sp *byte
+ ss_size uintptr
+ ss_flags int32
+}
+
+type sigcontext struct {
+ sc_regmask uint32
+ sc_status uint32
+ sc_pc uint64
+ sc_regs [32]uint64
+ sc_fpregs [32]uint64
+ sc_acx uint32
+ sc_fpc_csr uint32
+ sc_fpc_eir uint32
+ sc_used_math uint32
+ sc_dsp uint32
+ sc_mdhi uint64
+ sc_mdlo uint64
+ sc_hi1 uint32
+ sc_lo1 uint32
+ sc_hi2 uint32
+ sc_lo2 uint32
+ sc_hi3 uint32
+ sc_lo3 uint32
+}
+
+type ucontext struct {
+ uc_flags uint32
+ uc_link *ucontext
+ uc_stack stackt
+ Pad_cgo_0 [4]byte
+ uc_mcontext sigcontext
+ uc_sigmask [4]uint32
+}
diff --git a/src/runtime/defs_linux_ppc64.go b/src/runtime/defs_linux_ppc64.go
new file mode 100644
index 0000000..90b1dc1
--- /dev/null
+++ b/src/runtime/defs_linux_ppc64.go
@@ -0,0 +1,201 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_linux.go defs3_linux.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+ _ENOSYS = 0x26
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_SIGINFO = 0x4
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EPOLLIN = 0x1
+ _EPOLLOUT = 0x4
+ _EPOLLERR = 0x8
+ _EPOLLHUP = 0x10
+ _EPOLLRDHUP = 0x2000
+ _EPOLLET = 0x80000000
+ _EPOLL_CLOEXEC = 0x80000
+ _EPOLL_CTL_ADD = 0x1
+ _EPOLL_CTL_DEL = 0x2
+ _EPOLL_CTL_MOD = 0x3
+)
+
+//struct Sigset {
+// uint64 sig[1];
+//};
+//typedef uint64 Sigset;
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint64
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type epollevent struct {
+ events uint32
+ pad_cgo_0 [4]byte
+ data [8]byte // unaligned uintptr
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_linux.go defs3_linux.go
+
+const (
+ _O_RDONLY = 0x0
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+ _SA_RESTORER = 0
+)
+
+type ptregs struct {
+ gpr [32]uint64
+ nip uint64
+ msr uint64
+ orig_gpr3 uint64
+ ctr uint64
+ link uint64
+ xer uint64
+ ccr uint64
+ softe uint64
+ trap uint64
+ dar uint64
+ dsisr uint64
+ result uint64
+}
+
+type vreg struct {
+ u [4]uint32
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ pad_cgo_0 [4]byte
+ ss_size uintptr
+}
+
+type sigcontext struct {
+ _unused [4]uint64
+ signal int32
+ _pad0 int32
+ handler uint64
+ oldmask uint64
+ regs *ptregs
+ gp_regs [48]uint64
+ fp_regs [33]float64
+ v_regs *vreg
+ vmx_reserve [101]int64
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_sigmask uint64
+ __unused [15]uint64
+ uc_mcontext sigcontext
+}
diff --git a/src/runtime/defs_linux_ppc64le.go b/src/runtime/defs_linux_ppc64le.go
new file mode 100644
index 0000000..90b1dc1
--- /dev/null
+++ b/src/runtime/defs_linux_ppc64le.go
@@ -0,0 +1,201 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_linux.go defs3_linux.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+ _ENOSYS = 0x26
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_SIGINFO = 0x4
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EPOLLIN = 0x1
+ _EPOLLOUT = 0x4
+ _EPOLLERR = 0x8
+ _EPOLLHUP = 0x10
+ _EPOLLRDHUP = 0x2000
+ _EPOLLET = 0x80000000
+ _EPOLL_CLOEXEC = 0x80000
+ _EPOLL_CTL_ADD = 0x1
+ _EPOLL_CTL_DEL = 0x2
+ _EPOLL_CTL_MOD = 0x3
+)
+
+//struct Sigset {
+// uint64 sig[1];
+//};
+//typedef uint64 Sigset;
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint64
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type epollevent struct {
+ events uint32
+ pad_cgo_0 [4]byte
+ data [8]byte // unaligned uintptr
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_linux.go defs3_linux.go
+
+const (
+ _O_RDONLY = 0x0
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+ _SA_RESTORER = 0
+)
+
+type ptregs struct {
+ gpr [32]uint64
+ nip uint64
+ msr uint64
+ orig_gpr3 uint64
+ ctr uint64
+ link uint64
+ xer uint64
+ ccr uint64
+ softe uint64
+ trap uint64
+ dar uint64
+ dsisr uint64
+ result uint64
+}
+
+type vreg struct {
+ u [4]uint32
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ pad_cgo_0 [4]byte
+ ss_size uintptr
+}
+
+type sigcontext struct {
+ _unused [4]uint64
+ signal int32
+ _pad0 int32
+ handler uint64
+ oldmask uint64
+ regs *ptregs
+ gp_regs [48]uint64
+ fp_regs [33]float64
+ v_regs *vreg
+ vmx_reserve [101]int64
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_sigmask uint64
+ __unused [15]uint64
+ uc_mcontext sigcontext
+}
diff --git a/src/runtime/defs_linux_riscv64.go b/src/runtime/defs_linux_riscv64.go
new file mode 100644
index 0000000..60da0fa
--- /dev/null
+++ b/src/runtime/defs_linux_riscv64.go
@@ -0,0 +1,209 @@
+// Generated using cgo, then manually converted to the naming and code
+// conventions of the Go runtime.
+// go tool cgo -godefs defs_linux.go defs1_linux.go defs2_linux.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+ _ENOSYS = 0x26
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_RESTORER = 0x0
+ _SA_SIGINFO = 0x4
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EPOLLIN = 0x1
+ _EPOLLOUT = 0x4
+ _EPOLLERR = 0x8
+ _EPOLLHUP = 0x10
+ _EPOLLRDHUP = 0x2000
+ _EPOLLET = 0x80000000
+ _EPOLL_CLOEXEC = 0x80000
+ _EPOLL_CTL_ADD = 0x1
+ _EPOLL_CTL_DEL = 0x2
+ _EPOLL_CTL_MOD = 0x3
+)
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint64
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type epollevent struct {
+ events uint32
+ pad_cgo_0 [4]byte
+ data [8]byte // unaligned uintptr
+}
+
+const (
+ _O_RDONLY = 0x0
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+)
+
+type user_regs_struct struct {
+ pc uint64
+ ra uint64
+ sp uint64
+ gp uint64
+ tp uint64
+ t0 uint64
+ t1 uint64
+ t2 uint64
+ s0 uint64
+ s1 uint64
+ a0 uint64
+ a1 uint64
+ a2 uint64
+ a3 uint64
+ a4 uint64
+ a5 uint64
+ a6 uint64
+ a7 uint64
+ s2 uint64
+ s3 uint64
+ s4 uint64
+ s5 uint64
+ s6 uint64
+ s7 uint64
+ s8 uint64
+ s9 uint64
+ s10 uint64
+ s11 uint64
+ t3 uint64
+ t4 uint64
+ t5 uint64
+ t6 uint64
+}
+
+type user_fpregs_struct struct {
+ f [528]byte
+}
+
+type usigset struct {
+ us_x__val [16]uint64
+}
+
+type sigcontext struct {
+ sc_regs user_regs_struct
+ sc_fpregs user_fpregs_struct
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ ss_size uintptr
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_sigmask usigset
+ uc_x__unused [0]uint8
+ uc_pad_cgo_0 [8]byte
+ uc_mcontext sigcontext
+}
diff --git a/src/runtime/defs_linux_s390x.go b/src/runtime/defs_linux_s390x.go
new file mode 100644
index 0000000..fa289d5
--- /dev/null
+++ b/src/runtime/defs_linux_s390x.go
@@ -0,0 +1,168 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+ _ENOSYS = 0x26
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_SIGINFO = 0x4
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EPOLLIN = 0x1
+ _EPOLLOUT = 0x4
+ _EPOLLERR = 0x8
+ _EPOLLHUP = 0x10
+ _EPOLLRDHUP = 0x2000
+ _EPOLLET = 0x80000000
+ _EPOLL_CLOEXEC = 0x80000
+ _EPOLL_CTL_ADD = 0x1
+ _EPOLL_CTL_DEL = 0x2
+ _EPOLL_CTL_MOD = 0x3
+)
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint64
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type epollevent struct {
+ events uint32
+ pad_cgo_0 [4]byte
+ data [8]byte // unaligned uintptr
+}
+
+const (
+ _O_RDONLY = 0x0
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+ _SA_RESTORER = 0
+)
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ ss_size uintptr
+}
+
+type sigcontext struct {
+ psw_mask uint64
+ psw_addr uint64
+ gregs [16]uint64
+ aregs [16]uint32
+ fpc uint32
+ fpregs [16]uint64
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_mcontext sigcontext
+ uc_sigmask uint64
+}
diff --git a/src/runtime/defs_netbsd.go b/src/runtime/defs_netbsd.go
new file mode 100644
index 0000000..3f5ce5a
--- /dev/null
+++ b/src/runtime/defs_netbsd.go
@@ -0,0 +1,130 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_netbsd.go defs_netbsd_amd64.go >defs_netbsd_amd64.h
+GOARCH=386 go tool cgo -cdefs defs_netbsd.go defs_netbsd_386.go >defs_netbsd_386.h
+GOARCH=arm go tool cgo -cdefs defs_netbsd.go defs_netbsd_arm.go >defs_netbsd_arm.h
+*/
+
+// +godefs map __fpregset_t [644]byte
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <sys/signal.h>
+#include <sys/event.h>
+#include <sys/time.h>
+#include <sys/ucontext.h>
+#include <sys/unistd.h>
+#include <errno.h>
+#include <signal.h>
+*/
+import "C"
+
+const (
+ EINTR = C.EINTR
+ EFAULT = C.EFAULT
+ EAGAIN = C.EAGAIN
+ ENOSYS = C.ENOSYS
+
+ O_NONBLOCK = C.O_NONBLOCK
+ O_CLOEXEC = C.O_CLOEXEC
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANON
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_FREE = C.MADV_FREE
+
+ SA_SIGINFO = C.SA_SIGINFO
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGEMT = C.SIGEMT
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGBUS = C.SIGBUS
+ SIGSEGV = C.SIGSEGV
+ SIGSYS = C.SIGSYS
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGTERM = C.SIGTERM
+ SIGURG = C.SIGURG
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGCONT = C.SIGCONT
+ SIGCHLD = C.SIGCHLD
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGIO = C.SIGIO
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGINFO = C.SIGINFO
+ SIGUSR1 = C.SIGUSR1
+ SIGUSR2 = C.SIGUSR2
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ EV_ADD = C.EV_ADD
+ EV_DELETE = C.EV_DELETE
+ EV_CLEAR = C.EV_CLEAR
+ EV_RECEIPT = 0
+ EV_ERROR = C.EV_ERROR
+ EV_EOF = C.EV_EOF
+ EVFILT_READ = C.EVFILT_READ
+ EVFILT_WRITE = C.EVFILT_WRITE
+)
+
+type Sigset C.sigset_t
+type Siginfo C.struct__ksiginfo
+
+type StackT C.stack_t
+
+type Timespec C.struct_timespec
+type Timeval C.struct_timeval
+type Itimerval C.struct_itimerval
+
+type McontextT C.mcontext_t
+type UcontextT C.ucontext_t
+
+type Kevent C.struct_kevent
diff --git a/src/runtime/defs_netbsd_386.go b/src/runtime/defs_netbsd_386.go
new file mode 100644
index 0000000..c26f246
--- /dev/null
+++ b/src/runtime/defs_netbsd_386.go
@@ -0,0 +1,41 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo.
+
+GOARCH=386 go tool cgo -cdefs defs_netbsd.go defs_netbsd_386.go >defs_netbsd_386.h
+*/
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <machine/mcontext.h>
+*/
+import "C"
+
+const (
+ REG_GS = C._REG_GS
+ REG_FS = C._REG_FS
+ REG_ES = C._REG_ES
+ REG_DS = C._REG_DS
+ REG_EDI = C._REG_EDI
+ REG_ESI = C._REG_ESI
+ REG_EBP = C._REG_EBP
+ REG_ESP = C._REG_ESP
+ REG_EBX = C._REG_EBX
+ REG_EDX = C._REG_EDX
+ REG_ECX = C._REG_ECX
+ REG_EAX = C._REG_EAX
+ REG_TRAPNO = C._REG_TRAPNO
+ REG_ERR = C._REG_ERR
+ REG_EIP = C._REG_EIP
+ REG_CS = C._REG_CS
+ REG_EFL = C._REG_EFL
+ REG_UESP = C._REG_UESP
+ REG_SS = C._REG_SS
+)
diff --git a/src/runtime/defs_netbsd_amd64.go b/src/runtime/defs_netbsd_amd64.go
new file mode 100644
index 0000000..f18a7b1
--- /dev/null
+++ b/src/runtime/defs_netbsd_amd64.go
@@ -0,0 +1,48 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_netbsd.go defs_netbsd_amd64.go >defs_netbsd_amd64.h
+*/
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <machine/mcontext.h>
+*/
+import "C"
+
+const (
+ REG_RDI = C._REG_RDI
+ REG_RSI = C._REG_RSI
+ REG_RDX = C._REG_RDX
+ REG_RCX = C._REG_RCX
+ REG_R8 = C._REG_R8
+ REG_R9 = C._REG_R9
+ REG_R10 = C._REG_R10
+ REG_R11 = C._REG_R11
+ REG_R12 = C._REG_R12
+ REG_R13 = C._REG_R13
+ REG_R14 = C._REG_R14
+ REG_R15 = C._REG_R15
+ REG_RBP = C._REG_RBP
+ REG_RBX = C._REG_RBX
+ REG_RAX = C._REG_RAX
+ REG_GS = C._REG_GS
+ REG_FS = C._REG_FS
+ REG_ES = C._REG_ES
+ REG_DS = C._REG_DS
+ REG_TRAPNO = C._REG_TRAPNO
+ REG_ERR = C._REG_ERR
+ REG_RIP = C._REG_RIP
+ REG_CS = C._REG_CS
+ REG_RFLAGS = C._REG_RFLAGS
+ REG_RSP = C._REG_RSP
+ REG_SS = C._REG_SS
+)
diff --git a/src/runtime/defs_netbsd_arm.go b/src/runtime/defs_netbsd_arm.go
new file mode 100644
index 0000000..cb0dce6
--- /dev/null
+++ b/src/runtime/defs_netbsd_arm.go
@@ -0,0 +1,39 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo.
+
+GOARCH=arm go tool cgo -cdefs defs_netbsd.go defs_netbsd_arm.go >defs_netbsd_arm.h
+*/
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <machine/mcontext.h>
+*/
+import "C"
+
+const (
+ REG_R0 = C._REG_R0
+ REG_R1 = C._REG_R1
+ REG_R2 = C._REG_R2
+ REG_R3 = C._REG_R3
+ REG_R4 = C._REG_R4
+ REG_R5 = C._REG_R5
+ REG_R6 = C._REG_R6
+ REG_R7 = C._REG_R7
+ REG_R8 = C._REG_R8
+ REG_R9 = C._REG_R9
+ REG_R10 = C._REG_R10
+ REG_R11 = C._REG_R11
+ REG_R12 = C._REG_R12
+ REG_R13 = C._REG_R13
+ REG_R14 = C._REG_R14
+ REG_R15 = C._REG_R15
+ REG_CPSR = C._REG_CPSR
+)
diff --git a/src/runtime/defs_openbsd.go b/src/runtime/defs_openbsd.go
new file mode 100644
index 0000000..ff7e21c
--- /dev/null
+++ b/src/runtime/defs_openbsd.go
@@ -0,0 +1,145 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -godefs defs_openbsd.go
+GOARCH=386 go tool cgo -godefs defs_openbsd.go
+GOARCH=arm go tool cgo -godefs defs_openbsd.go
+GOARCH=arm64 go tool cgo -godefs defs_openbsd.go
+GOARCH=mips64 go tool cgo -godefs defs_openbsd.go
+*/
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <sys/event.h>
+#include <sys/mman.h>
+#include <sys/time.h>
+#include <sys/unistd.h>
+#include <sys/signal.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <signal.h>
+*/
+import "C"
+
+const (
+ EINTR = C.EINTR
+ EFAULT = C.EFAULT
+ EAGAIN = C.EAGAIN
+ ENOSYS = C.ENOSYS
+
+ O_NONBLOCK = C.O_NONBLOCK
+ O_CLOEXEC = C.O_CLOEXEC
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANON
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+ MAP_STACK = C.MAP_STACK
+
+ MADV_FREE = C.MADV_FREE
+
+ SA_SIGINFO = C.SA_SIGINFO
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+
+ PTHREAD_CREATE_DETACHED = C.PTHREAD_CREATE_DETACHED
+
+ F_SETFD = C.F_SETFD
+ F_GETFL = C.F_GETFL
+ F_SETFL = C.F_SETFL
+ FD_CLOEXEC = C.FD_CLOEXEC
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGEMT = C.SIGEMT
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGBUS = C.SIGBUS
+ SIGSEGV = C.SIGSEGV
+ SIGSYS = C.SIGSYS
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGTERM = C.SIGTERM
+ SIGURG = C.SIGURG
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGCONT = C.SIGCONT
+ SIGCHLD = C.SIGCHLD
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGIO = C.SIGIO
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGINFO = C.SIGINFO
+ SIGUSR1 = C.SIGUSR1
+ SIGUSR2 = C.SIGUSR2
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ EV_ADD = C.EV_ADD
+ EV_DELETE = C.EV_DELETE
+ EV_CLEAR = C.EV_CLEAR
+ EV_ERROR = C.EV_ERROR
+ EV_EOF = C.EV_EOF
+ EVFILT_READ = C.EVFILT_READ
+ EVFILT_WRITE = C.EVFILT_WRITE
+)
+
+type TforkT C.struct___tfork
+
+type Sigcontext C.struct_sigcontext
+type Siginfo C.siginfo_t
+type Sigset C.sigset_t
+type Sigval C.union_sigval
+
+type StackT C.stack_t
+
+type Timespec C.struct_timespec
+type Timeval C.struct_timeval
+type Itimerval C.struct_itimerval
+
+type KeventT C.struct_kevent
+
+type Pthread C.pthread_t
+type PthreadAttr C.pthread_attr_t
+type PthreadCond C.pthread_cond_t
+type PthreadCondAttr C.pthread_condattr_t
+type PthreadMutex C.pthread_mutex_t
+type PthreadMutexAttr C.pthread_mutexattr_t
diff --git a/src/runtime/defs_openbsd_386.go b/src/runtime/defs_openbsd_386.go
new file mode 100644
index 0000000..35f2e53
--- /dev/null
+++ b/src/runtime/defs_openbsd_386.go
@@ -0,0 +1,168 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_openbsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ENOSYS = 0x4e
+
+ _O_NONBLOCK = 0x4
+ _O_CLOEXEC = 0x10000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+ _MAP_STACK = 0x4000
+
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type tforkt struct {
+ tf_tcb unsafe.Pointer
+ tf_tid *int32
+ tf_stack uintptr
+}
+
+type sigcontext struct {
+ sc_gs uint32
+ sc_fs uint32
+ sc_es uint32
+ sc_ds uint32
+ sc_edi uint32
+ sc_esi uint32
+ sc_ebp uint32
+ sc_ebx uint32
+ sc_edx uint32
+ sc_ecx uint32
+ sc_eax uint32
+ sc_eip uint32
+ sc_cs uint32
+ sc_eflags uint32
+ sc_esp uint32
+ sc_ss uint32
+ __sc_unused uint32
+ sc_mask uint32
+ sc_trapno uint32
+ sc_err uint32
+ sc_fpstate unsafe.Pointer
+}
+
+type siginfo struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ _data [116]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int32
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = int64(timediv(ns, 1e9, &ts.tv_nsec))
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type keventt struct {
+ ident uint32
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
diff --git a/src/runtime/defs_openbsd_amd64.go b/src/runtime/defs_openbsd_amd64.go
new file mode 100644
index 0000000..46f1245
--- /dev/null
+++ b/src/runtime/defs_openbsd_amd64.go
@@ -0,0 +1,193 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_openbsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ENOSYS = 0x4e
+
+ _O_NONBLOCK = 0x4
+ _O_CLOEXEC = 0x10000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+ _MAP_STACK = 0x4000
+
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _PTHREAD_CREATE_DETACHED = 0x1
+
+ _F_SETFD = 0x2
+ _F_GETFL = 0x3
+ _F_SETFL = 0x4
+ _FD_CLOEXEC = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type tforkt struct {
+ tf_tcb unsafe.Pointer
+ tf_tid *int32
+ tf_stack uintptr
+}
+
+type sigcontext struct {
+ sc_rdi uint64
+ sc_rsi uint64
+ sc_rdx uint64
+ sc_rcx uint64
+ sc_r8 uint64
+ sc_r9 uint64
+ sc_r10 uint64
+ sc_r11 uint64
+ sc_r12 uint64
+ sc_r13 uint64
+ sc_r14 uint64
+ sc_r15 uint64
+ sc_rbp uint64
+ sc_rbx uint64
+ sc_rax uint64
+ sc_gs uint64
+ sc_fs uint64
+ sc_es uint64
+ sc_ds uint64
+ sc_trapno uint64
+ sc_err uint64
+ sc_rip uint64
+ sc_cs uint64
+ sc_rflags uint64
+ sc_rsp uint64
+ sc_ss uint64
+ sc_fpstate unsafe.Pointer
+ __sc_unused int32
+ sc_mask int32
+}
+
+type siginfo struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ pad_cgo_0 [4]byte
+ _data [120]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
+
+type pthread uintptr
+type pthreadattr uintptr
+type pthreadcond uintptr
+type pthreadcondattr uintptr
+type pthreadmutex uintptr
+type pthreadmutexattr uintptr
diff --git a/src/runtime/defs_openbsd_arm.go b/src/runtime/defs_openbsd_arm.go
new file mode 100644
index 0000000..170bb38
--- /dev/null
+++ b/src/runtime/defs_openbsd_arm.go
@@ -0,0 +1,176 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_openbsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ENOSYS = 0x4e
+
+ _O_NONBLOCK = 0x4
+ _O_CLOEXEC = 0x10000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+ _MAP_STACK = 0x4000
+
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type tforkt struct {
+ tf_tcb unsafe.Pointer
+ tf_tid *int32
+ tf_stack uintptr
+}
+
+type sigcontext struct {
+ __sc_unused int32
+ sc_mask int32
+
+ sc_spsr uint32
+ sc_r0 uint32
+ sc_r1 uint32
+ sc_r2 uint32
+ sc_r3 uint32
+ sc_r4 uint32
+ sc_r5 uint32
+ sc_r6 uint32
+ sc_r7 uint32
+ sc_r8 uint32
+ sc_r9 uint32
+ sc_r10 uint32
+ sc_r11 uint32
+ sc_r12 uint32
+ sc_usr_sp uint32
+ sc_usr_lr uint32
+ sc_svc_lr uint32
+ sc_pc uint32
+ sc_fpused uint32
+ sc_fpscr uint32
+ sc_fpreg [32]uint64
+}
+
+type siginfo struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ pad_cgo_0 [4]byte
+ _data [120]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int32
+ pad_cgo_0 [4]byte
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = int64(timediv(ns, 1e9, &ts.tv_nsec))
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ pad_cgo_0 [4]byte
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type keventt struct {
+ ident uint32
+ filter int16
+ flags uint16
+ fflags uint32
+ pad_cgo_0 [4]byte
+ data int64
+ udata *byte
+ pad_cgo_1 [4]byte
+}
diff --git a/src/runtime/defs_openbsd_arm64.go b/src/runtime/defs_openbsd_arm64.go
new file mode 100644
index 0000000..d2b947f
--- /dev/null
+++ b/src/runtime/defs_openbsd_arm64.go
@@ -0,0 +1,173 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ENOSYS = 0x4e
+
+ _O_NONBLOCK = 0x4
+ _O_CLOEXEC = 0x10000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+ _MAP_STACK = 0x4000
+
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _PTHREAD_CREATE_DETACHED = 0x1
+
+ _F_SETFD = 0x2
+ _F_GETFL = 0x3
+ _F_SETFL = 0x4
+ _FD_CLOEXEC = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type tforkt struct {
+ tf_tcb unsafe.Pointer
+ tf_tid *int32
+ tf_stack uintptr
+}
+
+type sigcontext struct {
+ __sc_unused int32
+ sc_mask int32
+ sc_sp uintptr
+ sc_lr uintptr
+ sc_elr uintptr
+ sc_spsr uintptr
+ sc_x [30]uintptr
+ sc_cookie int64
+}
+
+type siginfo struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ pad_cgo_0 [4]byte
+ _data [120]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
+
+type pthread uintptr
+type pthreadattr uintptr
+type pthreadcond uintptr
+type pthreadcondattr uintptr
+type pthreadmutex uintptr
+type pthreadmutexattr uintptr
diff --git a/src/runtime/defs_openbsd_mips64.go b/src/runtime/defs_openbsd_mips64.go
new file mode 100644
index 0000000..28d70b7
--- /dev/null
+++ b/src/runtime/defs_openbsd_mips64.go
@@ -0,0 +1,167 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Generated from:
+//
+// GOARCH=mips64 go tool cgo -godefs defs_openbsd.go
+//
+// Then converted to the form used by the runtime.
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ENOSYS = 0x4e
+
+ _O_NONBLOCK = 0x4
+ _O_CLOEXEC = 0x10000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+ _MAP_STACK = 0x4000
+
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type tforkt struct {
+ tf_tcb unsafe.Pointer
+ tf_tid *int32
+ tf_stack uintptr
+}
+
+type sigcontext struct {
+ sc_cookie uint64
+ sc_mask uint64
+ sc_pc uint64
+ sc_regs [32]uint64
+ mullo uint64
+ mulhi uint64
+ sc_fpregs [33]uint64
+ sc_fpused uint64
+ sc_fpc_eir uint64
+ _xxx [8]int64
+}
+
+type siginfo struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ pad_cgo_0 [4]byte
+ _data [120]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
diff --git a/src/runtime/defs_plan9_386.go b/src/runtime/defs_plan9_386.go
new file mode 100644
index 0000000..49129b3
--- /dev/null
+++ b/src/runtime/defs_plan9_386.go
@@ -0,0 +1,64 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const _PAGESIZE = 0x1000
+
+type ureg struct {
+ di uint32 /* general registers */
+ si uint32 /* ... */
+ bp uint32 /* ... */
+ nsp uint32
+ bx uint32 /* ... */
+ dx uint32 /* ... */
+ cx uint32 /* ... */
+ ax uint32 /* ... */
+ gs uint32 /* data segments */
+ fs uint32 /* ... */
+ es uint32 /* ... */
+ ds uint32 /* ... */
+ trap uint32 /* trap _type */
+ ecode uint32 /* error code (or zero) */
+ pc uint32 /* pc */
+ cs uint32 /* old context */
+ flags uint32 /* old flags */
+ sp uint32
+ ss uint32 /* old stack segment */
+}
+
+type sigctxt struct {
+ u *ureg
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uintptr { return uintptr(c.u.pc) }
+
+func (c *sigctxt) sp() uintptr { return uintptr(c.u.sp) }
+func (c *sigctxt) lr() uintptr { return uintptr(0) }
+
+func (c *sigctxt) setpc(x uintptr) { c.u.pc = uint32(x) }
+func (c *sigctxt) setsp(x uintptr) { c.u.sp = uint32(x) }
+func (c *sigctxt) setlr(x uintptr) {}
+
+func (c *sigctxt) savelr(x uintptr) {}
+
+func dumpregs(u *ureg) {
+ print("ax ", hex(u.ax), "\n")
+ print("bx ", hex(u.bx), "\n")
+ print("cx ", hex(u.cx), "\n")
+ print("dx ", hex(u.dx), "\n")
+ print("di ", hex(u.di), "\n")
+ print("si ", hex(u.si), "\n")
+ print("bp ", hex(u.bp), "\n")
+ print("sp ", hex(u.sp), "\n")
+ print("pc ", hex(u.pc), "\n")
+ print("flags ", hex(u.flags), "\n")
+ print("cs ", hex(u.cs), "\n")
+ print("fs ", hex(u.fs), "\n")
+ print("gs ", hex(u.gs), "\n")
+}
+
+func sigpanictramp() {}
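
The sigctxt wrappers above give the rest of the runtime a uniform way to read and rewrite the saved pc/sp of a faulting context, for example to resume execution in a panic helper. A minimal illustration of the pattern, using a trimmed copy of the 386 ureg; the addresses are made up.

package main

import "fmt"

// Trimmed copy of the plan9/386 ureg, keeping only the fields used here.
type ureg struct {
	pc uint32
	sp uint32
}

type sigctxt struct {
	u *ureg
}

func (c *sigctxt) pc() uintptr     { return uintptr(c.u.pc) }
func (c *sigctxt) setpc(x uintptr) { c.u.pc = uint32(x) }

func main() {
	c := &sigctxt{u: &ureg{pc: 0x1000, sp: 0x2000}}
	// On a fault the runtime can redirect the saved pc so the goroutine
	// resumes in a panic helper instead of the faulting instruction.
	c.setpc(0x3000) // made-up target address
	fmt.Printf("pc now %#x\n", c.pc())
}
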
diff --git a/src/runtime/defs_plan9_amd64.go b/src/runtime/defs_plan9_amd64.go
new file mode 100644
index 0000000..0099563
--- /dev/null
+++ b/src/runtime/defs_plan9_amd64.go
@@ -0,0 +1,81 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const _PAGESIZE = 0x1000
+
+type ureg struct {
+ ax uint64
+ bx uint64
+ cx uint64
+ dx uint64
+ si uint64
+ di uint64
+ bp uint64
+ r8 uint64
+ r9 uint64
+ r10 uint64
+ r11 uint64
+ r12 uint64
+ r13 uint64
+ r14 uint64
+ r15 uint64
+
+ ds uint16
+ es uint16
+ fs uint16
+ gs uint16
+
+ _type uint64
+ error uint64 /* error code (or zero) */
+ ip uint64 /* pc */
+ cs uint64 /* old context */
+ flags uint64 /* old flags */
+ sp uint64 /* sp */
+ ss uint64 /* old stack segment */
+}
+
+type sigctxt struct {
+ u *ureg
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uintptr { return uintptr(c.u.ip) }
+
+func (c *sigctxt) sp() uintptr { return uintptr(c.u.sp) }
+func (c *sigctxt) lr() uintptr { return uintptr(0) }
+
+func (c *sigctxt) setpc(x uintptr) { c.u.ip = uint64(x) }
+func (c *sigctxt) setsp(x uintptr) { c.u.sp = uint64(x) }
+func (c *sigctxt) setlr(x uintptr) {}
+
+func (c *sigctxt) savelr(x uintptr) {}
+
+func dumpregs(u *ureg) {
+ print("ax ", hex(u.ax), "\n")
+ print("bx ", hex(u.bx), "\n")
+ print("cx ", hex(u.cx), "\n")
+ print("dx ", hex(u.dx), "\n")
+ print("di ", hex(u.di), "\n")
+ print("si ", hex(u.si), "\n")
+ print("bp ", hex(u.bp), "\n")
+ print("sp ", hex(u.sp), "\n")
+ print("r8 ", hex(u.r8), "\n")
+ print("r9 ", hex(u.r9), "\n")
+ print("r10 ", hex(u.r10), "\n")
+ print("r11 ", hex(u.r11), "\n")
+ print("r12 ", hex(u.r12), "\n")
+ print("r13 ", hex(u.r13), "\n")
+ print("r14 ", hex(u.r14), "\n")
+ print("r15 ", hex(u.r15), "\n")
+ print("ip ", hex(u.ip), "\n")
+ print("flags ", hex(u.flags), "\n")
+ print("cs ", hex(u.cs), "\n")
+ print("fs ", hex(u.fs), "\n")
+ print("gs ", hex(u.gs), "\n")
+}
+
+func sigpanictramp() {}
diff --git a/src/runtime/defs_plan9_arm.go b/src/runtime/defs_plan9_arm.go
new file mode 100644
index 0000000..1adc16e
--- /dev/null
+++ b/src/runtime/defs_plan9_arm.go
@@ -0,0 +1,66 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const _PAGESIZE = 0x1000
+
+type ureg struct {
+ r0 uint32 /* general registers */
+ r1 uint32 /* ... */
+ r2 uint32 /* ... */
+ r3 uint32 /* ... */
+ r4 uint32 /* ... */
+ r5 uint32 /* ... */
+ r6 uint32 /* ... */
+ r7 uint32 /* ... */
+ r8 uint32 /* ... */
+ r9 uint32 /* ... */
+ r10 uint32 /* ... */
+ r11 uint32 /* ... */
+ r12 uint32 /* ... */
+ sp uint32
+ link uint32 /* ... */
+ trap uint32 /* trap type */
+ psr uint32
+ pc uint32 /* interrupted addr */
+}
+
+type sigctxt struct {
+ u *ureg
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uintptr { return uintptr(c.u.pc) }
+
+func (c *sigctxt) sp() uintptr { return uintptr(c.u.sp) }
+func (c *sigctxt) lr() uintptr { return uintptr(c.u.link) }
+
+func (c *sigctxt) setpc(x uintptr) { c.u.pc = uint32(x) }
+func (c *sigctxt) setsp(x uintptr) { c.u.sp = uint32(x) }
+func (c *sigctxt) setlr(x uintptr) { c.u.link = uint32(x) }
+func (c *sigctxt) savelr(x uintptr) { c.u.r0 = uint32(x) }
+
+func dumpregs(u *ureg) {
+ print("r0 ", hex(u.r0), "\n")
+ print("r1 ", hex(u.r1), "\n")
+ print("r2 ", hex(u.r2), "\n")
+ print("r3 ", hex(u.r3), "\n")
+ print("r4 ", hex(u.r4), "\n")
+ print("r5 ", hex(u.r5), "\n")
+ print("r6 ", hex(u.r6), "\n")
+ print("r7 ", hex(u.r7), "\n")
+ print("r8 ", hex(u.r8), "\n")
+ print("r9 ", hex(u.r9), "\n")
+ print("r10 ", hex(u.r10), "\n")
+ print("r11 ", hex(u.r11), "\n")
+ print("r12 ", hex(u.r12), "\n")
+ print("sp ", hex(u.sp), "\n")
+ print("link ", hex(u.link), "\n")
+ print("pc ", hex(u.pc), "\n")
+ print("psr ", hex(u.psr), "\n")
+}
+
+func sigpanictramp()
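
All three Plan 9 variants above expose the same small sigctxt surface: pc/sp/lr accessors plus their setters and savelr. That uniform interface lets architecture-independent signal code redirect a faulting goroutine without knowing the register layout; on 386 and amd64 setlr and savelr are no-ops, while the arm version saves into r0. A hedged sketch of the kind of caller this interface is designed for (illustrative only, not runtime code):

	// Illustrative only: point the interrupted PC at a handler, preserving
	// the old PC the way savelr does on the link-register architecture.
	func redirectSketch(c *sigctxt, handler uintptr) {
		old := c.pc()
		c.savelr(old) // no-op on 386/amd64, writes r0 on arm
		c.setpc(handler)
	}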
diff --git a/src/runtime/defs_solaris.go b/src/runtime/defs_solaris.go
new file mode 100644
index 0000000..22df590
--- /dev/null
+++ b/src/runtime/defs_solaris.go
@@ -0,0 +1,161 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_solaris.go >defs_solaris_amd64.h
+*/
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <sys/select.h>
+#include <sys/siginfo.h>
+#include <sys/signal.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/ucontext.h>
+#include <sys/regset.h>
+#include <sys/unistd.h>
+#include <sys/fork.h>
+#include <sys/port.h>
+#include <semaphore.h>
+#include <errno.h>
+#include <signal.h>
+#include <pthread.h>
+#include <netdb.h>
+*/
+import "C"
+
+const (
+ EINTR = C.EINTR
+ EBADF = C.EBADF
+ EFAULT = C.EFAULT
+ EAGAIN = C.EAGAIN
+ EBUSY = C.EBUSY
+ ETIME = C.ETIME
+ ETIMEDOUT = C.ETIMEDOUT
+ EWOULDBLOCK = C.EWOULDBLOCK
+ EINPROGRESS = C.EINPROGRESS
+ ENOSYS = C.ENOSYS
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANON
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_FREE = C.MADV_FREE
+
+ SA_SIGINFO = C.SA_SIGINFO
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGEMT = C.SIGEMT
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGBUS = C.SIGBUS
+ SIGSEGV = C.SIGSEGV
+ SIGSYS = C.SIGSYS
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGTERM = C.SIGTERM
+ SIGURG = C.SIGURG
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGCONT = C.SIGCONT
+ SIGCHLD = C.SIGCHLD
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGIO = C.SIGIO
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGUSR1 = C.SIGUSR1
+ SIGUSR2 = C.SIGUSR2
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ _SC_NPROCESSORS_ONLN = C._SC_NPROCESSORS_ONLN
+
+ PTHREAD_CREATE_DETACHED = C.PTHREAD_CREATE_DETACHED
+
+ FORK_NOSIGCHLD = C.FORK_NOSIGCHLD
+ FORK_WAITPID = C.FORK_WAITPID
+
+ MAXHOSTNAMELEN = C.MAXHOSTNAMELEN
+
+ O_NONBLOCK = C.O_NONBLOCK
+ O_CLOEXEC = C.O_CLOEXEC
+ FD_CLOEXEC = C.FD_CLOEXEC
+ F_GETFL = C.F_GETFL
+ F_SETFL = C.F_SETFL
+ F_SETFD = C.F_SETFD
+
+ POLLIN = C.POLLIN
+ POLLOUT = C.POLLOUT
+ POLLHUP = C.POLLHUP
+ POLLERR = C.POLLERR
+
+ PORT_SOURCE_FD = C.PORT_SOURCE_FD
+ PORT_SOURCE_ALERT = C.PORT_SOURCE_ALERT
+ PORT_ALERT_UPDATE = C.PORT_ALERT_UPDATE
+)
+
+type SemT C.sem_t
+
+type Sigset C.sigset_t
+type StackT C.stack_t
+
+type Siginfo C.siginfo_t
+type Sigaction C.struct_sigaction
+
+type Fpregset C.fpregset_t
+type Mcontext C.mcontext_t
+type Ucontext C.ucontext_t
+
+type Timespec C.struct_timespec
+type Timeval C.struct_timeval
+type Itimerval C.struct_itimerval
+
+type PortEvent C.port_event_t
+type Pthread C.pthread_t
+type PthreadAttr C.pthread_attr_t
+
+// depends on Timespec, must appear below
+type Stat C.struct_stat
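
defs_solaris.go is a cgo input rather than code the runtime compiles: the +build ignore tag keeps it out of normal builds, and the command in its header resolves every C.* reference into a concrete value. The before/after of that workflow is visible later in this same diff, for example:

	// cgo input (defs_windows.go, below):        SIGINT  = C.SIGINT
	// converted output (defs_windows_386.go):    _SIGINT = 0x2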
diff --git a/src/runtime/defs_solaris_amd64.go b/src/runtime/defs_solaris_amd64.go
new file mode 100644
index 0000000..0493178
--- /dev/null
+++ b/src/runtime/defs_solaris_amd64.go
@@ -0,0 +1,48 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_solaris.go defs_solaris_amd64.go >defs_solaris_amd64.h
+*/
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <sys/regset.h>
+*/
+import "C"
+
+const (
+ REG_RDI = C.REG_RDI
+ REG_RSI = C.REG_RSI
+ REG_RDX = C.REG_RDX
+ REG_RCX = C.REG_RCX
+ REG_R8 = C.REG_R8
+ REG_R9 = C.REG_R9
+ REG_R10 = C.REG_R10
+ REG_R11 = C.REG_R11
+ REG_R12 = C.REG_R12
+ REG_R13 = C.REG_R13
+ REG_R14 = C.REG_R14
+ REG_R15 = C.REG_R15
+ REG_RBP = C.REG_RBP
+ REG_RBX = C.REG_RBX
+ REG_RAX = C.REG_RAX
+ REG_GS = C.REG_GS
+ REG_FS = C.REG_FS
+ REG_ES = C.REG_ES
+ REG_DS = C.REG_DS
+ REG_TRAPNO = C.REG_TRAPNO
+ REG_ERR = C.REG_ERR
+ REG_RIP = C.REG_RIP
+ REG_CS = C.REG_CS
+ REG_RFLAGS = C.REG_RFL
+ REG_RSP = C.REG_RSP
+ REG_SS = C.REG_SS
+)
diff --git a/src/runtime/defs_windows.go b/src/runtime/defs_windows.go
new file mode 100644
index 0000000..43f358d
--- /dev/null
+++ b/src/runtime/defs_windows.go
@@ -0,0 +1,78 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_windows.go > defs_windows_amd64.h
+GOARCH=386 go tool cgo -cdefs defs_windows.go > defs_windows_386.h
+*/
+
+package runtime
+
+/*
+#include <signal.h>
+#include <stdarg.h>
+#include <windef.h>
+#include <winbase.h>
+#include <wincon.h>
+
+#ifndef _X86_
+typedef struct {} FLOATING_SAVE_AREA;
+#endif
+#ifndef _AMD64_
+typedef struct {} M128A;
+#endif
+*/
+import "C"
+
+const (
+ PROT_NONE = 0
+ PROT_READ = 1
+ PROT_WRITE = 2
+ PROT_EXEC = 4
+
+ MAP_ANON = 1
+ MAP_PRIVATE = 2
+
+ DUPLICATE_SAME_ACCESS = C.DUPLICATE_SAME_ACCESS
+ THREAD_PRIORITY_HIGHEST = C.THREAD_PRIORITY_HIGHEST
+
+ SIGINT = C.SIGINT
+ SIGTERM = C.SIGTERM
+ CTRL_C_EVENT = C.CTRL_C_EVENT
+ CTRL_BREAK_EVENT = C.CTRL_BREAK_EVENT
+ CTRL_CLOSE_EVENT = C.CTRL_CLOSE_EVENT
+ CTRL_LOGOFF_EVENT = C.CTRL_LOGOFF_EVENT
+ CTRL_SHUTDOWN_EVENT = C.CTRL_SHUTDOWN_EVENT
+
+ CONTEXT_CONTROL = C.CONTEXT_CONTROL
+ CONTEXT_FULL = C.CONTEXT_FULL
+
+ EXCEPTION_ACCESS_VIOLATION = C.STATUS_ACCESS_VIOLATION
+ EXCEPTION_BREAKPOINT = C.STATUS_BREAKPOINT
+ EXCEPTION_FLT_DENORMAL_OPERAND = C.STATUS_FLOAT_DENORMAL_OPERAND
+ EXCEPTION_FLT_DIVIDE_BY_ZERO = C.STATUS_FLOAT_DIVIDE_BY_ZERO
+ EXCEPTION_FLT_INEXACT_RESULT = C.STATUS_FLOAT_INEXACT_RESULT
+ EXCEPTION_FLT_OVERFLOW = C.STATUS_FLOAT_OVERFLOW
+ EXCEPTION_FLT_UNDERFLOW = C.STATUS_FLOAT_UNDERFLOW
+ EXCEPTION_INT_DIVIDE_BY_ZERO = C.STATUS_INTEGER_DIVIDE_BY_ZERO
+ EXCEPTION_INT_OVERFLOW = C.STATUS_INTEGER_OVERFLOW
+
+ INFINITE = C.INFINITE
+ WAIT_TIMEOUT = C.WAIT_TIMEOUT
+
+ EXCEPTION_CONTINUE_EXECUTION = C.EXCEPTION_CONTINUE_EXECUTION
+ EXCEPTION_CONTINUE_SEARCH = C.EXCEPTION_CONTINUE_SEARCH
+)
+
+type SystemInfo C.SYSTEM_INFO
+type ExceptionRecord C.EXCEPTION_RECORD
+type FloatingSaveArea C.FLOATING_SAVE_AREA
+type M128a C.M128A
+type Context C.CONTEXT
+type Overlapped C.OVERLAPPED
+type MemoryBasicInformation C.MEMORY_BASIC_INFORMATION
diff --git a/src/runtime/defs_windows_386.go b/src/runtime/defs_windows_386.go
new file mode 100644
index 0000000..3c5057b
--- /dev/null
+++ b/src/runtime/defs_windows_386.go
@@ -0,0 +1,149 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_windows.go
+
+package runtime
+
+const (
+ _PROT_NONE = 0
+ _PROT_READ = 1
+ _PROT_WRITE = 2
+ _PROT_EXEC = 4
+
+ _MAP_ANON = 1
+ _MAP_PRIVATE = 2
+
+ _DUPLICATE_SAME_ACCESS = 0x2
+ _THREAD_PRIORITY_HIGHEST = 0x2
+
+ _SIGINT = 0x2
+ _SIGTERM = 0xF
+ _CTRL_C_EVENT = 0x0
+ _CTRL_BREAK_EVENT = 0x1
+ _CTRL_CLOSE_EVENT = 0x2
+ _CTRL_LOGOFF_EVENT = 0x5
+ _CTRL_SHUTDOWN_EVENT = 0x6
+
+ _CONTEXT_CONTROL = 0x10001
+ _CONTEXT_FULL = 0x10007
+
+ _EXCEPTION_ACCESS_VIOLATION = 0xc0000005
+ _EXCEPTION_BREAKPOINT = 0x80000003
+ _EXCEPTION_FLT_DENORMAL_OPERAND = 0xc000008d
+ _EXCEPTION_FLT_DIVIDE_BY_ZERO = 0xc000008e
+ _EXCEPTION_FLT_INEXACT_RESULT = 0xc000008f
+ _EXCEPTION_FLT_OVERFLOW = 0xc0000091
+ _EXCEPTION_FLT_UNDERFLOW = 0xc0000093
+ _EXCEPTION_INT_DIVIDE_BY_ZERO = 0xc0000094
+ _EXCEPTION_INT_OVERFLOW = 0xc0000095
+
+ _INFINITE = 0xffffffff
+ _WAIT_TIMEOUT = 0x102
+
+ _EXCEPTION_CONTINUE_EXECUTION = -0x1
+ _EXCEPTION_CONTINUE_SEARCH = 0x0
+)
+
+type systeminfo struct {
+ anon0 [4]byte
+ dwpagesize uint32
+ lpminimumapplicationaddress *byte
+ lpmaximumapplicationaddress *byte
+ dwactiveprocessormask uint32
+ dwnumberofprocessors uint32
+ dwprocessortype uint32
+ dwallocationgranularity uint32
+ wprocessorlevel uint16
+ wprocessorrevision uint16
+}
+
+type exceptionrecord struct {
+ exceptioncode uint32
+ exceptionflags uint32
+ exceptionrecord *exceptionrecord
+ exceptionaddress *byte
+ numberparameters uint32
+ exceptioninformation [15]uint32
+}
+
+type floatingsavearea struct {
+ controlword uint32
+ statusword uint32
+ tagword uint32
+ erroroffset uint32
+ errorselector uint32
+ dataoffset uint32
+ dataselector uint32
+ registerarea [80]uint8
+ cr0npxstate uint32
+}
+
+type context struct {
+ contextflags uint32
+ dr0 uint32
+ dr1 uint32
+ dr2 uint32
+ dr3 uint32
+ dr6 uint32
+ dr7 uint32
+ floatsave floatingsavearea
+ seggs uint32
+ segfs uint32
+ seges uint32
+ segds uint32
+ edi uint32
+ esi uint32
+ ebx uint32
+ edx uint32
+ ecx uint32
+ eax uint32
+ ebp uint32
+ eip uint32
+ segcs uint32
+ eflags uint32
+ esp uint32
+ segss uint32
+ extendedregisters [512]uint8
+}
+
+func (c *context) ip() uintptr { return uintptr(c.eip) }
+func (c *context) sp() uintptr { return uintptr(c.esp) }
+
+// 386 does not have a link register, so this returns 0.
+func (c *context) lr() uintptr { return 0 }
+func (c *context) set_lr(x uintptr) {}
+
+func (c *context) set_ip(x uintptr) { c.eip = uint32(x) }
+func (c *context) set_sp(x uintptr) { c.esp = uint32(x) }
+
+func dumpregs(r *context) {
+ print("eax ", hex(r.eax), "\n")
+ print("ebx ", hex(r.ebx), "\n")
+ print("ecx ", hex(r.ecx), "\n")
+ print("edx ", hex(r.edx), "\n")
+ print("edi ", hex(r.edi), "\n")
+ print("esi ", hex(r.esi), "\n")
+ print("ebp ", hex(r.ebp), "\n")
+ print("esp ", hex(r.esp), "\n")
+ print("eip ", hex(r.eip), "\n")
+ print("eflags ", hex(r.eflags), "\n")
+ print("cs ", hex(r.segcs), "\n")
+ print("fs ", hex(r.segfs), "\n")
+ print("gs ", hex(r.seggs), "\n")
+}
+
+type overlapped struct {
+ internal uint32
+ internalhigh uint32
+ anon0 [8]byte
+ hevent *byte
+}
+
+type memoryBasicInformation struct {
+ baseAddress uintptr
+ allocationBase uintptr
+ allocationProtect uint32
+ regionSize uintptr
+ state uint32
+ protect uint32
+ type_ uint32
+}
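
The _EXCEPTION_* values above are Windows NTSTATUS codes; they exist so the exception handler can classify a fault by comparing the exception record's exceptioncode against them. A minimal sketch of that kind of check, using only constants defined in this file (the function name is illustrative, not runtime code):

	// Illustrative only: recognize integer-arithmetic faults by NTSTATUS code.
	func isIntegerFaultSketch(code uint32) bool {
		switch code {
		case _EXCEPTION_INT_DIVIDE_BY_ZERO, _EXCEPTION_INT_OVERFLOW:
			return true
		}
		return false
	}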
diff --git a/src/runtime/defs_windows_amd64.go b/src/runtime/defs_windows_amd64.go
new file mode 100644
index 0000000..ebb1506
--- /dev/null
+++ b/src/runtime/defs_windows_amd64.go
@@ -0,0 +1,171 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_windows.go
+
+package runtime
+
+const (
+ _PROT_NONE = 0
+ _PROT_READ = 1
+ _PROT_WRITE = 2
+ _PROT_EXEC = 4
+
+ _MAP_ANON = 1
+ _MAP_PRIVATE = 2
+
+ _DUPLICATE_SAME_ACCESS = 0x2
+ _THREAD_PRIORITY_HIGHEST = 0x2
+
+ _SIGINT = 0x2
+ _SIGTERM = 0xF
+ _CTRL_C_EVENT = 0x0
+ _CTRL_BREAK_EVENT = 0x1
+ _CTRL_CLOSE_EVENT = 0x2
+ _CTRL_LOGOFF_EVENT = 0x5
+ _CTRL_SHUTDOWN_EVENT = 0x6
+
+ _CONTEXT_CONTROL = 0x100001
+ _CONTEXT_FULL = 0x10000b
+
+ _EXCEPTION_ACCESS_VIOLATION = 0xc0000005
+ _EXCEPTION_BREAKPOINT = 0x80000003
+ _EXCEPTION_FLT_DENORMAL_OPERAND = 0xc000008d
+ _EXCEPTION_FLT_DIVIDE_BY_ZERO = 0xc000008e
+ _EXCEPTION_FLT_INEXACT_RESULT = 0xc000008f
+ _EXCEPTION_FLT_OVERFLOW = 0xc0000091
+ _EXCEPTION_FLT_UNDERFLOW = 0xc0000093
+ _EXCEPTION_INT_DIVIDE_BY_ZERO = 0xc0000094
+ _EXCEPTION_INT_OVERFLOW = 0xc0000095
+
+ _INFINITE = 0xffffffff
+ _WAIT_TIMEOUT = 0x102
+
+ _EXCEPTION_CONTINUE_EXECUTION = -0x1
+ _EXCEPTION_CONTINUE_SEARCH = 0x0
+)
+
+type systeminfo struct {
+ anon0 [4]byte
+ dwpagesize uint32
+ lpminimumapplicationaddress *byte
+ lpmaximumapplicationaddress *byte
+ dwactiveprocessormask uint64
+ dwnumberofprocessors uint32
+ dwprocessortype uint32
+ dwallocationgranularity uint32
+ wprocessorlevel uint16
+ wprocessorrevision uint16
+}
+
+type exceptionrecord struct {
+ exceptioncode uint32
+ exceptionflags uint32
+ exceptionrecord *exceptionrecord
+ exceptionaddress *byte
+ numberparameters uint32
+ pad_cgo_0 [4]byte
+ exceptioninformation [15]uint64
+}
+
+type m128a struct {
+ low uint64
+ high int64
+}
+
+type context struct {
+ p1home uint64
+ p2home uint64
+ p3home uint64
+ p4home uint64
+ p5home uint64
+ p6home uint64
+ contextflags uint32
+ mxcsr uint32
+ segcs uint16
+ segds uint16
+ seges uint16
+ segfs uint16
+ seggs uint16
+ segss uint16
+ eflags uint32
+ dr0 uint64
+ dr1 uint64
+ dr2 uint64
+ dr3 uint64
+ dr6 uint64
+ dr7 uint64
+ rax uint64
+ rcx uint64
+ rdx uint64
+ rbx uint64
+ rsp uint64
+ rbp uint64
+ rsi uint64
+ rdi uint64
+ r8 uint64
+ r9 uint64
+ r10 uint64
+ r11 uint64
+ r12 uint64
+ r13 uint64
+ r14 uint64
+ r15 uint64
+ rip uint64
+ anon0 [512]byte
+ vectorregister [26]m128a
+ vectorcontrol uint64
+ debugcontrol uint64
+ lastbranchtorip uint64
+ lastbranchfromrip uint64
+ lastexceptiontorip uint64
+ lastexceptionfromrip uint64
+}
+
+func (c *context) ip() uintptr { return uintptr(c.rip) }
+func (c *context) sp() uintptr { return uintptr(c.rsp) }
+
+// Amd64 does not have a link register, so this returns 0.
+func (c *context) lr() uintptr { return 0 }
+func (c *context) set_lr(x uintptr) {}
+
+func (c *context) set_ip(x uintptr) { c.rip = uint64(x) }
+func (c *context) set_sp(x uintptr) { c.rsp = uint64(x) }
+
+func dumpregs(r *context) {
+ print("rax ", hex(r.rax), "\n")
+ print("rbx ", hex(r.rbx), "\n")
+ print("rcx ", hex(r.rcx), "\n")
+ print("rdi ", hex(r.rdi), "\n")
+ print("rsi ", hex(r.rsi), "\n")
+ print("rbp ", hex(r.rbp), "\n")
+ print("rsp ", hex(r.rsp), "\n")
+ print("r8 ", hex(r.r8), "\n")
+ print("r9 ", hex(r.r9), "\n")
+ print("r10 ", hex(r.r10), "\n")
+ print("r11 ", hex(r.r11), "\n")
+ print("r12 ", hex(r.r12), "\n")
+ print("r13 ", hex(r.r13), "\n")
+ print("r14 ", hex(r.r14), "\n")
+ print("r15 ", hex(r.r15), "\n")
+ print("rip ", hex(r.rip), "\n")
+ print("rflags ", hex(r.eflags), "\n")
+ print("cs ", hex(r.segcs), "\n")
+ print("fs ", hex(r.segfs), "\n")
+ print("gs ", hex(r.seggs), "\n")
+}
+
+type overlapped struct {
+ internal uint64
+ internalhigh uint64
+ anon0 [8]byte
+ hevent *byte
+}
+
+type memoryBasicInformation struct {
+ baseAddress uintptr
+ allocationBase uintptr
+ allocationProtect uint32
+ regionSize uintptr
+ state uint32
+ protect uint32
+ type_ uint32
+}
diff --git a/src/runtime/defs_windows_arm.go b/src/runtime/defs_windows_arm.go
new file mode 100644
index 0000000..b275b05
--- /dev/null
+++ b/src/runtime/defs_windows_arm.go
@@ -0,0 +1,154 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _PROT_NONE = 0
+ _PROT_READ = 1
+ _PROT_WRITE = 2
+ _PROT_EXEC = 4
+
+ _MAP_ANON = 1
+ _MAP_PRIVATE = 2
+
+ _DUPLICATE_SAME_ACCESS = 0x2
+ _THREAD_PRIORITY_HIGHEST = 0x2
+
+ _SIGINT = 0x2
+ _SIGTERM = 0xF
+ _CTRL_C_EVENT = 0x0
+ _CTRL_BREAK_EVENT = 0x1
+ _CTRL_CLOSE_EVENT = 0x2
+ _CTRL_LOGOFF_EVENT = 0x5
+ _CTRL_SHUTDOWN_EVENT = 0x6
+
+ _CONTEXT_CONTROL = 0x10001
+ _CONTEXT_FULL = 0x10007
+
+ _EXCEPTION_ACCESS_VIOLATION = 0xc0000005
+ _EXCEPTION_BREAKPOINT = 0x80000003
+ _EXCEPTION_FLT_DENORMAL_OPERAND = 0xc000008d
+ _EXCEPTION_FLT_DIVIDE_BY_ZERO = 0xc000008e
+ _EXCEPTION_FLT_INEXACT_RESULT = 0xc000008f
+ _EXCEPTION_FLT_OVERFLOW = 0xc0000091
+ _EXCEPTION_FLT_UNDERFLOW = 0xc0000093
+ _EXCEPTION_INT_DIVIDE_BY_ZERO = 0xc0000094
+ _EXCEPTION_INT_OVERFLOW = 0xc0000095
+
+ _INFINITE = 0xffffffff
+ _WAIT_TIMEOUT = 0x102
+
+ _EXCEPTION_CONTINUE_EXECUTION = -0x1
+ _EXCEPTION_CONTINUE_SEARCH = 0x0
+)
+
+type systeminfo struct {
+ anon0 [4]byte
+ dwpagesize uint32
+ lpminimumapplicationaddress *byte
+ lpmaximumapplicationaddress *byte
+ dwactiveprocessormask uint32
+ dwnumberofprocessors uint32
+ dwprocessortype uint32
+ dwallocationgranularity uint32
+ wprocessorlevel uint16
+ wprocessorrevision uint16
+}
+
+type exceptionrecord struct {
+ exceptioncode uint32
+ exceptionflags uint32
+ exceptionrecord *exceptionrecord
+ exceptionaddress *byte
+ numberparameters uint32
+ exceptioninformation [15]uint32
+}
+
+type neon128 struct {
+ low uint64
+ high int64
+}
+
+type context struct {
+ contextflags uint32
+ r0 uint32
+ r1 uint32
+ r2 uint32
+ r3 uint32
+ r4 uint32
+ r5 uint32
+ r6 uint32
+ r7 uint32
+ r8 uint32
+ r9 uint32
+ r10 uint32
+ r11 uint32
+ r12 uint32
+
+ spr uint32
+ lrr uint32
+ pc uint32
+ cpsr uint32
+
+ fpscr uint32
+ padding uint32
+
+ floatNeon [16]neon128
+
+ bvr [8]uint32
+ bcr [8]uint32
+ wvr [1]uint32
+ wcr [1]uint32
+ padding2 [2]uint32
+}
+
+func (c *context) ip() uintptr { return uintptr(c.pc) }
+func (c *context) sp() uintptr { return uintptr(c.spr) }
+func (c *context) lr() uintptr { return uintptr(c.lrr) }
+
+func (c *context) set_ip(x uintptr) { c.pc = uint32(x) }
+func (c *context) set_sp(x uintptr) { c.spr = uint32(x) }
+func (c *context) set_lr(x uintptr) { c.lrr = uint32(x) }
+
+func dumpregs(r *context) {
+ print("r0 ", hex(r.r0), "\n")
+ print("r1 ", hex(r.r1), "\n")
+ print("r2 ", hex(r.r2), "\n")
+ print("r3 ", hex(r.r3), "\n")
+ print("r4 ", hex(r.r4), "\n")
+ print("r5 ", hex(r.r5), "\n")
+ print("r6 ", hex(r.r6), "\n")
+ print("r7 ", hex(r.r7), "\n")
+ print("r8 ", hex(r.r8), "\n")
+ print("r9 ", hex(r.r9), "\n")
+ print("r10 ", hex(r.r10), "\n")
+ print("r11 ", hex(r.r11), "\n")
+ print("r12 ", hex(r.r12), "\n")
+ print("sp ", hex(r.spr), "\n")
+ print("lr ", hex(r.lrr), "\n")
+ print("pc ", hex(r.pc), "\n")
+ print("cpsr ", hex(r.cpsr), "\n")
+}
+
+type overlapped struct {
+ internal uint32
+ internalhigh uint32
+ anon0 [8]byte
+ hevent *byte
+}
+
+type memoryBasicInformation struct {
+ baseAddress uintptr
+ allocationBase uintptr
+ allocationProtect uint32
+ regionSize uintptr
+ state uint32
+ protect uint32
+ type_ uint32
+}
+
+func stackcheck() {
+ // TODO: not implemented on ARM
+}
diff --git a/src/runtime/duff_386.s b/src/runtime/duff_386.s
new file mode 100644
index 0000000..ab01430
--- /dev/null
+++ b/src/runtime/duff_386.s
@@ -0,0 +1,779 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+#include "textflag.h"
+
+TEXT runtime·duffzero(SB), NOSPLIT, $0-0
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ RET
+
+TEXT runtime·duffcopy(SB), NOSPLIT, $0-0
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ RET
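
duffzero and duffcopy are deliberately straight-line code: there is no loop, only 128 unrolled 4-byte stores (and 128 load/store pairs in duffcopy), and the compiler calls into the body at a computed offset so that exactly the needed number of operations run before RET. A conceptual sketch of that idea in Go, assuming at most 128 words (this is not the real compiler logic, which encodes the skip as a call offset):

	// Conceptual sketch only: zero dst by "entering" a fixed unrolled body
	// partway through, skipping the stores that are not needed.
	func conceptualDuffzero(dst []uint32) {
		const unrolled = 128        // the 386 duffzero above holds 128 STOSL instructions
		skip := unrolled - len(dst) // assumes len(dst) <= 128; larger clears go elsewhere
		for i := skip; i < unrolled; i++ {
			dst[i-skip] = 0
		}
	}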
diff --git a/src/runtime/duff_amd64.s b/src/runtime/duff_amd64.s
new file mode 100644
index 0000000..2ff5bf6
--- /dev/null
+++ b/src/runtime/duff_amd64.s
@@ -0,0 +1,427 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+#include "textflag.h"
+
+TEXT runtime·duffzero<ABIInternal>(SB), NOSPLIT, $0-0
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X0,(DI)
+ MOVUPS X0,16(DI)
+ MOVUPS X0,32(DI)
+ MOVUPS X0,48(DI)
+ LEAQ 64(DI),DI
+
+ RET
+
+TEXT runtime·duffcopy<ABIInternal>(SB), NOSPLIT, $0-0
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ RET
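
The amd64 variants do the same job in wider units: each duffzero block above is four 16-byte MOVUPS stores plus a 64-byte pointer bump, so its 16 blocks clear at most 1024 bytes (16 × 64), and the 64 load/store pairs in duffcopy move the same 1024 bytes 16 bytes at a time.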
diff --git a/src/runtime/duff_arm.s b/src/runtime/duff_arm.s
new file mode 100644
index 0000000..ba8235b
--- /dev/null
+++ b/src/runtime/duff_arm.s
@@ -0,0 +1,523 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+#include "textflag.h"
+
+TEXT runtime·duffzero(SB), NOSPLIT, $0-0
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ RET
+
+TEXT runtime·duffcopy(SB), NOSPLIT, $0-0
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ RET
diff --git a/src/runtime/duff_arm64.s b/src/runtime/duff_arm64.s
new file mode 100644
index 0000000..128b076
--- /dev/null
+++ b/src/runtime/duff_arm64.s
@@ -0,0 +1,267 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+#include "textflag.h"
+
+TEXT runtime·duffzero(SB), NOSPLIT|NOFRAME, $0-0
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP (ZR, ZR), (R20)
+ RET
+
+TEXT runtime·duffcopy(SB), NOSPLIT|NOFRAME, $0-0
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ RET
diff --git a/src/runtime/duff_mips64x.s b/src/runtime/duff_mips64x.s
new file mode 100644
index 0000000..c4e04cc
--- /dev/null
+++ b/src/runtime/duff_mips64x.s
@@ -0,0 +1,909 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+// +build mips64 mips64le
+
+#include "textflag.h"
+
+TEXT runtime·duffzero(SB), NOSPLIT|NOFRAME, $0-0
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ RET
+
+TEXT runtime·duffcopy(SB), NOSPLIT|NOFRAME, $0-0
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ RET
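
The duffzero and duffcopy bodies above are Duff's-device style routines: each small instruction group zeroes or copies one 8-byte word and advances the pointer(s), and the compiler enters the routine at a computed offset so that only as many groups as needed run before the final RET. Per the header comment they are emitted by mkduff.go. A rough, hypothetical sketch of that kind of generator (not the actual mkduff.go) could look like this; the 128 word-sized groups match the duffcopy body above:

package main

import (
	"fmt"
	"os"
	"strings"
)

// emitDuffCopyMIPS64 writes a duffcopy-style body for mips64x: each
// 4-instruction group moves one 8-byte word from (R1) to (R2) and
// advances both pointers. Illustrative only; the real generator is
// src/runtime/mkduff.go.
func emitDuffCopyMIPS64(w *strings.Builder, groups int) {
	fmt.Fprintln(w, "TEXT runtime·duffcopy(SB), NOSPLIT|NOFRAME, $0-0")
	for i := 0; i < groups; i++ {
		fmt.Fprintln(w, "\tMOVV\t(R1), R23")
		fmt.Fprintln(w, "\tADDV\t$8, R1")
		fmt.Fprintln(w, "\tMOVV\tR23, (R2)")
		fmt.Fprintln(w, "\tADDV\t$8, R2")
		fmt.Fprintln(w)
	}
	fmt.Fprintln(w, "\tRET")
}

func main() {
	var b strings.Builder
	emitDuffCopyMIPS64(&b, 128) // the generated mips64x body has 128 groups
	os.Stdout.WriteString(b.String())
}
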
diff --git a/src/runtime/duff_ppc64x.s b/src/runtime/duff_ppc64x.s
new file mode 100644
index 0000000..d6b89ba
--- /dev/null
+++ b/src/runtime/duff_ppc64x.s
@@ -0,0 +1,141 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+// +build ppc64 ppc64le
+
+#include "textflag.h"
+
+TEXT runtime·duffzero(SB), NOSPLIT|NOFRAME, $0-0
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ MOVDU R0, 8(R3)
+ RET
+
+TEXT runtime·duffcopy(SB), NOSPLIT|NOFRAME, $0-0
+ UNDEF
diff --git a/src/runtime/duff_riscv64.s b/src/runtime/duff_riscv64.s
new file mode 100644
index 0000000..f7bd3f3
--- /dev/null
+++ b/src/runtime/duff_riscv64.s
@@ -0,0 +1,907 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+#include "textflag.h"
+
+TEXT runtime·duffzero(SB), NOSPLIT|NOFRAME, $0-0
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ MOV ZERO, (X10)
+ ADD $8, X10
+ RET
+
+TEXT runtime·duffcopy(SB), NOSPLIT|NOFRAME, $0-0
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ MOV (X10), X31
+ ADD $8, X10
+ MOV X31, (X11)
+ ADD $8, X11
+
+ RET
diff --git a/src/runtime/duff_s390x.s b/src/runtime/duff_s390x.s
new file mode 100644
index 0000000..95d492a
--- /dev/null
+++ b/src/runtime/duff_s390x.s
@@ -0,0 +1,19 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// s390x can copy/zero 1-256 bytes with a single instruction,
+// so there's no need for these, except to satisfy the prototypes
+// in stubs.go.
+
+TEXT runtime·duffzero(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD $0, 2(R0)
+ RET
+
+TEXT runtime·duffcopy(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD $0, 2(R0)
+ RET
diff --git a/src/runtime/env_plan9.go b/src/runtime/env_plan9.go
new file mode 100644
index 0000000..f1ac476
--- /dev/null
+++ b/src/runtime/env_plan9.go
@@ -0,0 +1,122 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+const (
+ // Plan 9 environment device
+ envDir = "/env/"
+ // size of buffer to read from a directory
+ dirBufSize = 4096
+ // size of buffer to read an environment variable (may grow)
+ envBufSize = 128
+ // offset of the name field in a 9P directory entry - see syscall.UnmarshalDir()
+ nameOffset = 39
+)
+
+// Goenvs caches the Plan 9 environment variables at start of execution into
+// string array envs, to supply the initial contents for os.Environ.
+// Subsequent calls to os.Setenv will change this cache, without writing back
+// to the (possibly shared) Plan 9 environment, so that Setenv and Getenv
+// conform to the same Posix semantics as on other operating systems.
+// For Plan 9 shared environment semantics, instead of Getenv(key) and
+// Setenv(key, value), one can use os.ReadFile("/env/" + key) and
+// os.WriteFile("/env/" + key, value, 0666) respectively.
+//go:nosplit
+func goenvs() {
+ buf := make([]byte, envBufSize)
+ copy(buf, envDir)
+ dirfd := open(&buf[0], _OREAD, 0)
+ if dirfd < 0 {
+ return
+ }
+ defer closefd(dirfd)
+ dofiles(dirfd, func(name []byte) {
+ name = append(name, 0)
+ buf = buf[:len(envDir)]
+ copy(buf, envDir)
+ buf = append(buf, name...)
+ fd := open(&buf[0], _OREAD, 0)
+ if fd < 0 {
+ return
+ }
+ defer closefd(fd)
+ n := len(buf)
+ r := 0
+ for {
+ r = int(pread(fd, unsafe.Pointer(&buf[0]), int32(n), 0))
+ if r < n {
+ break
+ }
+ n = int(seek(fd, 0, 2)) + 1
+ if len(buf) < n {
+ buf = make([]byte, n)
+ }
+ }
+ if r <= 0 {
+ r = 0
+ } else if buf[r-1] == 0 {
+ r--
+ }
+ name[len(name)-1] = '='
+ env := make([]byte, len(name)+r)
+ copy(env, name)
+ copy(env[len(name):], buf[:r])
+ envs = append(envs, string(env))
+ })
+}
+
+// Dofiles reads the directory opened with file descriptor fd, applying function f
+// to each filename in it.
+//go:nosplit
+func dofiles(dirfd int32, f func([]byte)) {
+ dirbuf := new([dirBufSize]byte)
+
+ var off int64 = 0
+ for {
+ n := pread(dirfd, unsafe.Pointer(&dirbuf[0]), int32(dirBufSize), off)
+ if n <= 0 {
+ return
+ }
+ for b := dirbuf[:n]; len(b) > 0; {
+ var name []byte
+ name, b = gdirname(b)
+ if name == nil {
+ return
+ }
+ f(name)
+ }
+ off += int64(n)
+ }
+}
+
+// Gdirname returns the first filename from a buffer of directory entries,
+// and a slice containing the remaining directory entries.
+// If the buffer doesn't start with a valid directory entry, the returned name is nil.
+//go:nosplit
+func gdirname(buf []byte) (name []byte, rest []byte) {
+ if 2+nameOffset+2 > len(buf) {
+ return
+ }
+ entryLen, buf := gbit16(buf)
+ if entryLen > len(buf) {
+ return
+ }
+ n, b := gbit16(buf[nameOffset:])
+ if n > len(b) {
+ return
+ }
+ name = b[:n]
+ rest = buf[entryLen:]
+ return
+}
+
+// Gbit16 reads a 16-bit little-endian binary number from b and returns it
+// with the remaining slice of b.
+//go:nosplit
+func gbit16(b []byte) (int, []byte) {
+ return int(b[0]) | int(b[1])<<8, b[2:]
+}
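
The Goenvs comment above describes the trade-off: os.Getenv and os.Setenv work on a per-process cache seeded from /env at startup, while the live (possibly shared) Plan 9 environment stays reachable through the /env file system. A minimal sketch of the file-based access that the comment suggests, assuming a Plan 9 system with /env mounted as usual and with error and NUL handling kept simple:

package main

import (
	"fmt"
	"os"
)

// readEnvShared reads a Plan 9 environment variable directly from the
// /env device, bypassing the runtime's startup cache described above.
func readEnvShared(key string) (string, error) {
	b, err := os.ReadFile("/env/" + key)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// writeEnvShared writes a variable back to the shared environment,
// which os.Setenv deliberately does not do on Plan 9.
func writeEnvShared(key, value string) error {
	return os.WriteFile("/env/"+key, []byte(value), 0666)
}

func main() {
	fmt.Println(os.Getenv("home"))   // cached copy taken at process start
	v, err := readEnvShared("home")  // current value in the shared /env
	fmt.Println(v, err)
	_ = writeEnvShared("example", "1") // visible to other processes in the namespace
}
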
diff --git a/src/runtime/env_posix.go b/src/runtime/env_posix.go
new file mode 100644
index 0000000..af353bb
--- /dev/null
+++ b/src/runtime/env_posix.go
@@ -0,0 +1,76 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix darwin dragonfly freebsd js,wasm linux netbsd openbsd solaris windows plan9
+
+package runtime
+
+import "unsafe"
+
+func gogetenv(key string) string {
+ env := environ()
+ if env == nil {
+ throw("getenv before env init")
+ }
+ for _, s := range env {
+ if len(s) > len(key) && s[len(key)] == '=' && envKeyEqual(s[:len(key)], key) {
+ return s[len(key)+1:]
+ }
+ }
+ return ""
+}
+
+// envKeyEqual reports whether a == b, with ASCII-only case insensitivity
+// on Windows. The two strings must have the same length.
+func envKeyEqual(a, b string) bool {
+ if GOOS == "windows" { // case insensitive
+ for i := 0; i < len(a); i++ {
+ ca, cb := a[i], b[i]
+ if ca == cb || lowerASCII(ca) == lowerASCII(cb) {
+ continue
+ }
+ return false
+ }
+ return true
+ }
+ return a == b
+}
+
+func lowerASCII(c byte) byte {
+ if 'A' <= c && c <= 'Z' {
+ return c + ('a' - 'A')
+ }
+ return c
+}
+
+var _cgo_setenv unsafe.Pointer // pointer to C function
+var _cgo_unsetenv unsafe.Pointer // pointer to C function
+
+// Update the C environment if cgo is loaded.
+// Called from syscall.Setenv.
+//go:linkname syscall_setenv_c syscall.setenv_c
+func syscall_setenv_c(k string, v string) {
+ if _cgo_setenv == nil {
+ return
+ }
+ arg := [2]unsafe.Pointer{cstring(k), cstring(v)}
+ asmcgocall(_cgo_setenv, unsafe.Pointer(&arg))
+}
+
+// Update the C environment if cgo is loaded.
+// Called from syscall.unsetenv.
+//go:linkname syscall_unsetenv_c syscall.unsetenv_c
+func syscall_unsetenv_c(k string) {
+ if _cgo_unsetenv == nil {
+ return
+ }
+ arg := [1]unsafe.Pointer{cstring(k)}
+ asmcgocall(_cgo_unsetenv, unsafe.Pointer(&arg))
+}
+
+func cstring(s string) unsafe.Pointer {
+ p := make([]byte, len(s)+1)
+ copy(p, s)
+ return unsafe.Pointer(&p[0])
+}
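
gogetenv scans the environ() slice for "KEY=value" entries, and envKeyEqual is what makes lookups ASCII case-insensitive when GOOS is windows while every other platform compares keys exactly. A small self-contained sketch with the same semantics (helper names here are invented; both strings are assumed to have equal length, as in the runtime):

package main

import "fmt"

func lowerASCII(c byte) byte {
	if 'A' <= c && c <= 'Z' {
		return c + ('a' - 'A')
	}
	return c
}

// keyEqual mirrors runtime.envKeyEqual: exact comparison by default,
// ASCII-only case-insensitive comparison when windows is true.
func keyEqual(a, b string, windows bool) bool {
	if !windows {
		return a == b
	}
	for i := 0; i < len(a); i++ {
		if lowerASCII(a[i]) != lowerASCII(b[i]) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(keyEqual("Path", "PATH", true))  // true: Windows-style lookup
	fmt.Println(keyEqual("Path", "PATH", false)) // false: exact match elsewhere
}
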
diff --git a/src/runtime/env_test.go b/src/runtime/env_test.go
new file mode 100644
index 0000000..c009d0f
--- /dev/null
+++ b/src/runtime/env_test.go
@@ -0,0 +1,43 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "syscall"
+ "testing"
+)
+
+func TestFixedGOROOT(t *testing.T) {
+ // Restore both the real GOROOT environment variable, and runtime's copies:
+ if orig, ok := syscall.Getenv("GOROOT"); ok {
+ defer syscall.Setenv("GOROOT", orig)
+ } else {
+ defer syscall.Unsetenv("GOROOT")
+ }
+ envs := runtime.Envs()
+ oldenvs := append([]string{}, envs...)
+ defer runtime.SetEnvs(oldenvs)
+
+ // attempt to reuse existing envs backing array.
+ want := runtime.GOROOT()
+ runtime.SetEnvs(append(envs[:0], "GOROOT="+want))
+
+ if got := runtime.GOROOT(); got != want {
+ t.Errorf(`initial runtime.GOROOT()=%q, want %q`, got, want)
+ }
+ if err := syscall.Setenv("GOROOT", "/os"); err != nil {
+ t.Fatal(err)
+ }
+ if got := runtime.GOROOT(); got != want {
+ t.Errorf(`after setenv runtime.GOROOT()=%q, want %q`, got, want)
+ }
+ if err := syscall.Unsetenv("GOROOT"); err != nil {
+ t.Fatal(err)
+ }
+ if got := runtime.GOROOT(); got != want {
+ t.Errorf(`after unsetenv runtime.GOROOT()=%q, want %q`, got, want)
+ }
+}
diff --git a/src/runtime/error.go b/src/runtime/error.go
new file mode 100644
index 0000000..9e6cdf3
--- /dev/null
+++ b/src/runtime/error.go
@@ -0,0 +1,327 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "internal/bytealg"
+
+// The Error interface identifies a run time error.
+type Error interface {
+ error
+
+ // RuntimeError is a no-op function but
+ // serves to distinguish types that are run time
+ // errors from ordinary errors: a type is a
+ // run time error if it has a RuntimeError method.
+ RuntimeError()
+}
+
+// A TypeAssertionError explains a failed type assertion.
+type TypeAssertionError struct {
+ _interface *_type
+ concrete *_type
+ asserted *_type
+ missingMethod string // one method needed by Interface, missing from Concrete
+}
+
+func (*TypeAssertionError) RuntimeError() {}
+
+func (e *TypeAssertionError) Error() string {
+ inter := "interface"
+ if e._interface != nil {
+ inter = e._interface.string()
+ }
+ as := e.asserted.string()
+ if e.concrete == nil {
+ return "interface conversion: " + inter + " is nil, not " + as
+ }
+ cs := e.concrete.string()
+ if e.missingMethod == "" {
+ msg := "interface conversion: " + inter + " is " + cs + ", not " + as
+ if cs == as {
+ // provide slightly clearer error message
+ if e.concrete.pkgpath() != e.asserted.pkgpath() {
+ msg += " (types from different packages)"
+ } else {
+ msg += " (types from different scopes)"
+ }
+ }
+ return msg
+ }
+ return "interface conversion: " + cs + " is not " + as +
+ ": missing method " + e.missingMethod
+}
+
+//go:nosplit
+// itoa converts val to a decimal representation. The result is
+// written somewhere within buf and the location of the result is returned.
+// buf must be at least 20 bytes.
+func itoa(buf []byte, val uint64) []byte {
+ i := len(buf) - 1
+ for val >= 10 {
+ buf[i] = byte(val%10 + '0')
+ i--
+ val /= 10
+ }
+ buf[i] = byte(val + '0')
+ return buf[i:]
+}
+
+// An errorString represents a runtime error described by a single string.
+type errorString string
+
+func (e errorString) RuntimeError() {}
+
+func (e errorString) Error() string {
+ return "runtime error: " + string(e)
+}
+
+type errorAddressString struct {
+ msg string // error message
+ addr uintptr // memory address where the error occurred
+}
+
+func (e errorAddressString) RuntimeError() {}
+
+func (e errorAddressString) Error() string {
+ return "runtime error: " + e.msg
+}
+
+// Addr returns the memory address where a fault occurred.
+// The address provided is best-effort.
+// The veracity of the result may depend on the platform.
+// Errors providing this method will only be returned as
+// a result of using runtime/debug.SetPanicOnFault.
+func (e errorAddressString) Addr() uintptr {
+ return e.addr
+}
+
+// plainError represents a runtime error described by a string without
+// the prefix "runtime error: " after invoking errorString.Error().
+// See Issue #14965.
+type plainError string
+
+func (e plainError) RuntimeError() {}
+
+func (e plainError) Error() string {
+ return string(e)
+}
+
+// A boundsError represents an indexing or slicing operation gone wrong.
+type boundsError struct {
+ x int64
+ y int
+ // Values in an index or slice expression can be signed or unsigned.
+ // That means we'd need 65 bits to encode all possible indexes, from -2^63 to 2^64-1.
+ // Instead, we keep track of whether x should be interpreted as signed or unsigned.
+ // y is known to be nonnegative and to fit in an int.
+ signed bool
+ code boundsErrorCode
+}
+
+type boundsErrorCode uint8
+
+const (
+ boundsIndex boundsErrorCode = iota // s[x], 0 <= x < len(s) failed
+
+ boundsSliceAlen // s[?:x], 0 <= x <= len(s) failed
+ boundsSliceAcap // s[?:x], 0 <= x <= cap(s) failed
+ boundsSliceB // s[x:y], 0 <= x <= y failed (but boundsSliceA didn't happen)
+
+ boundsSlice3Alen // s[?:?:x], 0 <= x <= len(s) failed
+ boundsSlice3Acap // s[?:?:x], 0 <= x <= cap(s) failed
+ boundsSlice3B // s[?:x:y], 0 <= x <= y failed (but boundsSlice3A didn't happen)
+ boundsSlice3C // s[x:y:?], 0 <= x <= y failed (but boundsSlice3A/B didn't happen)
+
+ // Note: in the above, len(s) and cap(s) are stored in y
+)
+
+// boundsErrorFmts provide error text for various out-of-bounds panics.
+// Note: if you change these strings, you should adjust the size of the buffer
+// in boundsError.Error below as well.
+var boundsErrorFmts = [...]string{
+ boundsIndex: "index out of range [%x] with length %y",
+ boundsSliceAlen: "slice bounds out of range [:%x] with length %y",
+ boundsSliceAcap: "slice bounds out of range [:%x] with capacity %y",
+ boundsSliceB: "slice bounds out of range [%x:%y]",
+ boundsSlice3Alen: "slice bounds out of range [::%x] with length %y",
+ boundsSlice3Acap: "slice bounds out of range [::%x] with capacity %y",
+ boundsSlice3B: "slice bounds out of range [:%x:%y]",
+ boundsSlice3C: "slice bounds out of range [%x:%y:]",
+}
+
+// boundsNegErrorFmts are overriding formats if x is negative. In this case there's no need to report y.
+var boundsNegErrorFmts = [...]string{
+ boundsIndex: "index out of range [%x]",
+ boundsSliceAlen: "slice bounds out of range [:%x]",
+ boundsSliceAcap: "slice bounds out of range [:%x]",
+ boundsSliceB: "slice bounds out of range [%x:]",
+ boundsSlice3Alen: "slice bounds out of range [::%x]",
+ boundsSlice3Acap: "slice bounds out of range [::%x]",
+ boundsSlice3B: "slice bounds out of range [:%x:]",
+ boundsSlice3C: "slice bounds out of range [%x::]",
+}
+
+func (e boundsError) RuntimeError() {}
+
+func appendIntStr(b []byte, v int64, signed bool) []byte {
+ if signed && v < 0 {
+ b = append(b, '-')
+ v = -v
+ }
+ var buf [20]byte
+ b = append(b, itoa(buf[:], uint64(v))...)
+ return b
+}
+
+func (e boundsError) Error() string {
+ fmt := boundsErrorFmts[e.code]
+ if e.signed && e.x < 0 {
+ fmt = boundsNegErrorFmts[e.code]
+ }
+ // max message length is 99: "runtime error: slice bounds out of range [::%x] with capacity %y"
+ // x can be at most 20 characters. y can be at most 19.
+ b := make([]byte, 0, 100)
+ b = append(b, "runtime error: "...)
+ for i := 0; i < len(fmt); i++ {
+ c := fmt[i]
+ if c != '%' {
+ b = append(b, c)
+ continue
+ }
+ i++
+ switch fmt[i] {
+ case 'x':
+ b = appendIntStr(b, e.x, e.signed)
+ case 'y':
+ b = appendIntStr(b, int64(e.y), true)
+ }
+ }
+ return string(b)
+}
+
+type stringer interface {
+ String() string
+}
+
+// printany prints an argument passed to panic.
+// If panic is called with a value that has a String or Error method,
+// it has already been converted into a string by preprintpanics.
+func printany(i interface{}) {
+ switch v := i.(type) {
+ case nil:
+ print("nil")
+ case bool:
+ print(v)
+ case int:
+ print(v)
+ case int8:
+ print(v)
+ case int16:
+ print(v)
+ case int32:
+ print(v)
+ case int64:
+ print(v)
+ case uint:
+ print(v)
+ case uint8:
+ print(v)
+ case uint16:
+ print(v)
+ case uint32:
+ print(v)
+ case uint64:
+ print(v)
+ case uintptr:
+ print(v)
+ case float32:
+ print(v)
+ case float64:
+ print(v)
+ case complex64:
+ print(v)
+ case complex128:
+ print(v)
+ case string:
+ print(v)
+ default:
+ printanycustomtype(i)
+ }
+}
+
+func printanycustomtype(i interface{}) {
+ eface := efaceOf(&i)
+ typestring := eface._type.string()
+
+ switch eface._type.kind {
+ case kindString:
+ print(typestring, `("`, *(*string)(eface.data), `")`)
+ case kindBool:
+ print(typestring, "(", *(*bool)(eface.data), ")")
+ case kindInt:
+ print(typestring, "(", *(*int)(eface.data), ")")
+ case kindInt8:
+ print(typestring, "(", *(*int8)(eface.data), ")")
+ case kindInt16:
+ print(typestring, "(", *(*int16)(eface.data), ")")
+ case kindInt32:
+ print(typestring, "(", *(*int32)(eface.data), ")")
+ case kindInt64:
+ print(typestring, "(", *(*int64)(eface.data), ")")
+ case kindUint:
+ print(typestring, "(", *(*uint)(eface.data), ")")
+ case kindUint8:
+ print(typestring, "(", *(*uint8)(eface.data), ")")
+ case kindUint16:
+ print(typestring, "(", *(*uint16)(eface.data), ")")
+ case kindUint32:
+ print(typestring, "(", *(*uint32)(eface.data), ")")
+ case kindUint64:
+ print(typestring, "(", *(*uint64)(eface.data), ")")
+ case kindUintptr:
+ print(typestring, "(", *(*uintptr)(eface.data), ")")
+ case kindFloat32:
+ print(typestring, "(", *(*float32)(eface.data), ")")
+ case kindFloat64:
+ print(typestring, "(", *(*float64)(eface.data), ")")
+ case kindComplex64:
+ print(typestring, *(*complex64)(eface.data))
+ case kindComplex128:
+ print(typestring, *(*complex128)(eface.data))
+ default:
+ print("(", typestring, ") ", eface.data)
+ }
+}
+
+// panicwrap generates a panic for a call to a wrapped value method
+// with a nil pointer receiver.
+//
+// It is called from the generated wrapper code.
+func panicwrap() {
+ pc := getcallerpc()
+ name := funcname(findfunc(pc))
+ // name is something like "main.(*T).F".
+ // We want to extract pkg ("main"), typ ("T"), and meth ("F").
+ // Do it by finding the parens.
+ i := bytealg.IndexByteString(name, '(')
+ if i < 0 {
+ throw("panicwrap: no ( in " + name)
+ }
+ pkg := name[:i-1]
+ if i+2 >= len(name) || name[i-1:i+2] != ".(*" {
+ throw("panicwrap: unexpected string after package name: " + name)
+ }
+ name = name[i+2:]
+ i = bytealg.IndexByteString(name, ')')
+ if i < 0 {
+ throw("panicwrap: no ) in " + name)
+ }
+ if i+2 >= len(name) || name[i:i+2] != ")." {
+ throw("panicwrap: unexpected string after type name: " + name)
+ }
+ typ := name[:i]
+ meth := name[i+2:]
+ panic(plainError("value method " + pkg + "." + typ + "." + meth + " called using nil *" + typ + " pointer"))
+}
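
boundsError.Error assembles the panic text by hand from boundsErrorFmts (or boundsNegErrorFmts when the index is negative), substituting %x and %y via appendIntStr rather than using the fmt package, which is unavailable inside the runtime. The messages it builds are the familiar ones; a small ordinary program that exercises the boundsIndex case (the output in the comment shows the expected form of the message, not output captured from this exact build):

package main

import "fmt"

func main() {
	defer func() {
		// boundsIndex formats as "index out of range [%x] with length %y",
		// so this is expected to print:
		//   runtime error: index out of range [5] with length 3
		fmt.Println(recover())
	}()
	s := []int{1, 2, 3}
	i := len(s) + 2 // computed at run time so the bounds check stays in place
	_ = s[i]
}
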
diff --git a/src/runtime/example_test.go b/src/runtime/example_test.go
new file mode 100644
index 0000000..e4912a5
--- /dev/null
+++ b/src/runtime/example_test.go
@@ -0,0 +1,54 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "runtime"
+ "strings"
+)
+
+func ExampleFrames() {
+ c := func() {
+ // Ask runtime.Callers for up to 10 pcs, including runtime.Callers itself.
+ pc := make([]uintptr, 10)
+ n := runtime.Callers(0, pc)
+ if n == 0 {
+ // No pcs available. Stop now.
+ // This can happen if the first argument to runtime.Callers is large.
+ return
+ }
+
+ pc = pc[:n] // pass only valid pcs to runtime.CallersFrames
+ frames := runtime.CallersFrames(pc)
+
+ // Loop to get frames.
+ // A fixed number of pcs can expand to an indefinite number of Frames.
+ for {
+ frame, more := frames.Next()
+ // To keep this example's output stable
+ // even if there are changes in the testing package,
+ // stop unwinding when we leave package runtime.
+ if !strings.Contains(frame.File, "runtime/") {
+ break
+ }
+ fmt.Printf("- more:%v | %s\n", more, frame.Function)
+ if !more {
+ break
+ }
+ }
+ }
+
+ b := func() { c() }
+ a := func() { b() }
+
+ a()
+ // Output:
+ // - more:true | runtime.Callers
+ // - more:true | runtime_test.ExampleFrames.func1
+ // - more:true | runtime_test.ExampleFrames.func2
+ // - more:true | runtime_test.ExampleFrames.func3
+ // - more:true | runtime_test.ExampleFrames
+}
diff --git a/src/runtime/export_aix_test.go b/src/runtime/export_aix_test.go
new file mode 100644
index 0000000..162552d
--- /dev/null
+++ b/src/runtime/export_aix_test.go
@@ -0,0 +1,7 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var Fcntl = syscall_fcntl1
diff --git a/src/runtime/export_arm_test.go b/src/runtime/export_arm_test.go
new file mode 100644
index 0000000..b8a89fc
--- /dev/null
+++ b/src/runtime/export_arm_test.go
@@ -0,0 +1,9 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Export guts for testing.
+
+package runtime
+
+var Usplit = usplit
diff --git a/src/runtime/export_darwin_test.go b/src/runtime/export_darwin_test.go
new file mode 100644
index 0000000..e9b6eb3
--- /dev/null
+++ b/src/runtime/export_darwin_test.go
@@ -0,0 +1,13 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+func Fcntl(fd, cmd, arg uintptr) (uintptr, uintptr) {
+ r := fcntl(int32(fd), int32(cmd), int32(arg))
+ if r < 0 {
+ return ^uintptr(0), uintptr(-r)
+ }
+ return uintptr(r), 0
+}
diff --git a/src/runtime/export_debug_test.go b/src/runtime/export_debug_test.go
new file mode 100644
index 0000000..ed4242e
--- /dev/null
+++ b/src/runtime/export_debug_test.go
@@ -0,0 +1,199 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build amd64
+// +build linux
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// InjectDebugCall injects a debugger call to fn into g. args must be
+// a pointer to a valid call frame (including arguments and return
+// space) for fn, or nil. tkill must be a function that will send
+// SIGTRAP to thread ID tid. gp must be locked to its OS thread and
+// running.
+//
+// On success, InjectDebugCall returns the panic value of fn or nil.
+// If fn did not panic, its results will be available in args.
+func InjectDebugCall(gp *g, fn, args interface{}, tkill func(tid int) error, returnOnUnsafePoint bool) (interface{}, error) {
+ if gp.lockedm == 0 {
+ return nil, plainError("goroutine not locked to thread")
+ }
+
+ tid := int(gp.lockedm.ptr().procid)
+ if tid == 0 {
+ return nil, plainError("missing tid")
+ }
+
+ f := efaceOf(&fn)
+ if f._type == nil || f._type.kind&kindMask != kindFunc {
+ return nil, plainError("fn must be a function")
+ }
+ fv := (*funcval)(f.data)
+
+ a := efaceOf(&args)
+ if a._type != nil && a._type.kind&kindMask != kindPtr {
+ return nil, plainError("args must be a pointer or nil")
+ }
+ argp := a.data
+ var argSize uintptr
+ if argp != nil {
+ argSize = (*ptrtype)(unsafe.Pointer(a._type)).elem.size
+ }
+
+ h := new(debugCallHandler)
+ h.gp = gp
+ // gp may not be running right now, but we can still get the M
+ // it will run on since it's locked.
+ h.mp = gp.lockedm.ptr()
+ h.fv, h.argp, h.argSize = fv, argp, argSize
+ h.handleF = h.handle // Avoid allocating closure during signal
+
+ defer func() { testSigtrap = nil }()
+ for i := 0; ; i++ {
+ testSigtrap = h.inject
+ noteclear(&h.done)
+ h.err = ""
+
+ if err := tkill(tid); err != nil {
+ return nil, err
+ }
+ // Wait for completion.
+ notetsleepg(&h.done, -1)
+ if h.err != "" {
+ switch h.err {
+ case "call not at safe point":
+ if returnOnUnsafePoint {
+ // This is for TestDebugCallUnsafePoint.
+ return nil, h.err
+ }
+ fallthrough
+ case "retry _Grunnable", "executing on Go runtime stack", "call from within the Go runtime":
+ // These are transient states. Try to get out of them.
+ if i < 100 {
+ usleep(100)
+ Gosched()
+ continue
+ }
+ }
+ return nil, h.err
+ }
+ return h.panic, nil
+ }
+}
+
+type debugCallHandler struct {
+ gp *g
+ mp *m
+ fv *funcval
+ argp unsafe.Pointer
+ argSize uintptr
+ panic interface{}
+
+ handleF func(info *siginfo, ctxt *sigctxt, gp2 *g) bool
+
+ err plainError
+ done note
+ savedRegs sigcontext
+ savedFP fpstate1
+}
+
+func (h *debugCallHandler) inject(info *siginfo, ctxt *sigctxt, gp2 *g) bool {
+ switch h.gp.atomicstatus {
+ case _Grunning:
+ if getg().m != h.mp {
+ println("trap on wrong M", getg().m, h.mp)
+ return false
+ }
+ // Push current PC on the stack.
+ rsp := ctxt.rsp() - sys.PtrSize
+ *(*uint64)(unsafe.Pointer(uintptr(rsp))) = ctxt.rip()
+ ctxt.set_rsp(rsp)
+ // Write the argument frame size.
+ *(*uintptr)(unsafe.Pointer(uintptr(rsp - 16))) = h.argSize
+ // Save current registers.
+ h.savedRegs = *ctxt.regs()
+ h.savedFP = *h.savedRegs.fpstate
+ h.savedRegs.fpstate = nil
+ // Set PC to debugCallV1.
+ ctxt.set_rip(uint64(funcPC(debugCallV1)))
+ // Call injected. Switch to the debugCall protocol.
+ testSigtrap = h.handleF
+ case _Grunnable:
+ // Ask InjectDebugCall to pause for a bit and then try
+ // again to interrupt this goroutine.
+ h.err = plainError("retry _Grunnable")
+ notewakeup(&h.done)
+ default:
+ h.err = plainError("goroutine in unexpected state at call inject")
+ notewakeup(&h.done)
+ }
+ // Resume execution.
+ return true
+}
+
+func (h *debugCallHandler) handle(info *siginfo, ctxt *sigctxt, gp2 *g) bool {
+ // Sanity check.
+ if getg().m != h.mp {
+ println("trap on wrong M", getg().m, h.mp)
+ return false
+ }
+ f := findfunc(uintptr(ctxt.rip()))
+ if !(hasPrefix(funcname(f), "runtime.debugCall") || hasPrefix(funcname(f), "debugCall")) {
+ println("trap in unknown function", funcname(f))
+ return false
+ }
+ if *(*byte)(unsafe.Pointer(uintptr(ctxt.rip() - 1))) != 0xcc {
+ println("trap at non-INT3 instruction pc =", hex(ctxt.rip()))
+ return false
+ }
+
+ switch status := ctxt.rax(); status {
+ case 0:
+ // Frame is ready. Copy the arguments to the frame.
+ sp := ctxt.rsp()
+ memmove(unsafe.Pointer(uintptr(sp)), h.argp, h.argSize)
+ // Push return PC.
+ sp -= sys.PtrSize
+ ctxt.set_rsp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = ctxt.rip()
+ // Set PC to call and context register.
+ ctxt.set_rip(uint64(h.fv.fn))
+ ctxt.regs().rcx = uint64(uintptr(unsafe.Pointer(h.fv)))
+ case 1:
+ // Function returned. Copy frame back out.
+ sp := ctxt.rsp()
+ memmove(h.argp, unsafe.Pointer(uintptr(sp)), h.argSize)
+ case 2:
+ // Function panicked. Copy panic out.
+ sp := ctxt.rsp()
+ memmove(unsafe.Pointer(&h.panic), unsafe.Pointer(uintptr(sp)), 2*sys.PtrSize)
+ case 8:
+ // Call isn't safe. Get the reason.
+ sp := ctxt.rsp()
+ reason := *(*string)(unsafe.Pointer(uintptr(sp)))
+ h.err = plainError(reason)
+ // Don't wake h.done. We need to transition to status 16 first.
+ case 16:
+ // Restore all registers except RIP and RSP.
+ rip, rsp := ctxt.rip(), ctxt.rsp()
+ fp := ctxt.regs().fpstate
+ *ctxt.regs() = h.savedRegs
+ ctxt.regs().fpstate = fp
+ *fp = h.savedFP
+ ctxt.set_rip(rip)
+ ctxt.set_rsp(rsp)
+ // Done
+ notewakeup(&h.done)
+ default:
+ h.err = plainError("unexpected debugCallV1 status")
+ notewakeup(&h.done)
+ }
+ // Resume execution.
+ return true
+}
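
The handle method above is one side of the debugCallV1 handshake: the injected call reports its state in RAX and the handler reacts to it. Reading the switch, the statuses are roughly: 0 means the frame is ready (copy arguments in, push the return PC, jump to the target), 1 means the function returned (copy the frame back out), 2 means it panicked, 8 means the call site is not safe (read the reason string), and 16 means restore registers and finish. A hypothetical set of named constants summarizing that mapping; the names are invented for readability and the runtime itself uses the raw numbers:

package debugcall

// Status values observed in RAX during the debugCallV1 handshake, as
// handled by debugCallHandler.handle above. Constant names are
// illustrative only.
const (
	statusFrameReady  = 0  // copy args into the frame, push return PC, jump to fn
	statusReturned    = 1  // fn returned; copy the result frame back out
	statusPanicked    = 2  // fn panicked; copy the panic value out
	statusUnsafePoint = 8  // call injected at an unsafe point; read the reason string
	statusRestoreRegs = 16 // restore saved registers and wake the waiter
)
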
diff --git a/src/runtime/export_debuglog_test.go b/src/runtime/export_debuglog_test.go
new file mode 100644
index 0000000..8cd943b
--- /dev/null
+++ b/src/runtime/export_debuglog_test.go
@@ -0,0 +1,46 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Export debuglog guts for testing.
+
+package runtime
+
+const DlogEnabled = dlogEnabled
+
+const DebugLogBytes = debugLogBytes
+
+const DebugLogStringLimit = debugLogStringLimit
+
+var Dlog = dlog
+
+func (l *dlogger) End() { l.end() }
+func (l *dlogger) B(x bool) *dlogger { return l.b(x) }
+func (l *dlogger) I(x int) *dlogger { return l.i(x) }
+func (l *dlogger) I16(x int16) *dlogger { return l.i16(x) }
+func (l *dlogger) U64(x uint64) *dlogger { return l.u64(x) }
+func (l *dlogger) Hex(x uint64) *dlogger { return l.hex(x) }
+func (l *dlogger) P(x interface{}) *dlogger { return l.p(x) }
+func (l *dlogger) S(x string) *dlogger { return l.s(x) }
+func (l *dlogger) PC(x uintptr) *dlogger { return l.pc(x) }
+
+func DumpDebugLog() string {
+ g := getg()
+ g.writebuf = make([]byte, 0, 1<<20)
+ printDebugLog()
+ buf := g.writebuf
+ g.writebuf = nil
+
+ return string(buf)
+}
+
+func ResetDebugLog() {
+ stopTheWorld("ResetDebugLog")
+ for l := allDloggers; l != nil; l = l.allLink {
+ l.w.write = 0
+ l.w.tick, l.w.nano = 0, 0
+ l.w.r.begin, l.w.r.end = 0, 0
+ l.w.r.tick, l.w.r.nano = 0, 0
+ }
+ startTheWorld()
+}
diff --git a/src/runtime/export_futex_test.go b/src/runtime/export_futex_test.go
new file mode 100644
index 0000000..a727a93
--- /dev/null
+++ b/src/runtime/export_futex_test.go
@@ -0,0 +1,19 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build dragonfly freebsd linux
+
+package runtime
+
+var Futexwakeup = futexwakeup
+
+//go:nosplit
+func Futexsleep(addr *uint32, val uint32, ns int64) {
+ // Temporarily disable preemption so that a preemption signal
+ // doesn't interrupt the system call.
+ poff := debug.asyncpreemptoff
+ debug.asyncpreemptoff = 1
+ futexsleep(addr, val, ns)
+ debug.asyncpreemptoff = poff
+}
diff --git a/src/runtime/export_linux_test.go b/src/runtime/export_linux_test.go
new file mode 100644
index 0000000..b7c901f
--- /dev/null
+++ b/src/runtime/export_linux_test.go
@@ -0,0 +1,19 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Export guts for testing.
+
+package runtime
+
+import "unsafe"
+
+var NewOSProc0 = newosproc0
+var Mincore = mincore
+var Add = add
+
+type EpollEvent epollevent
+
+func Epollctl(epfd, op, fd int32, ev unsafe.Pointer) int32 {
+ return epollctl(epfd, op, fd, (*epollevent)(ev))
+}
diff --git a/src/runtime/export_mmap_test.go b/src/runtime/export_mmap_test.go
new file mode 100644
index 0000000..aeaf37f
--- /dev/null
+++ b/src/runtime/export_mmap_test.go
@@ -0,0 +1,21 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
+
+// Export guts for testing.
+
+package runtime
+
+var Mmap = mmap
+var Munmap = munmap
+
+const ENOMEM = _ENOMEM
+const MAP_ANON = _MAP_ANON
+const MAP_PRIVATE = _MAP_PRIVATE
+const MAP_FIXED = _MAP_FIXED
+
+func GetPhysPageSize() uintptr {
+ return physPageSize
+}
diff --git a/src/runtime/export_pipe2_test.go b/src/runtime/export_pipe2_test.go
new file mode 100644
index 0000000..9d580d3
--- /dev/null
+++ b/src/runtime/export_pipe2_test.go
@@ -0,0 +1,15 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build freebsd linux netbsd openbsd solaris
+
+package runtime
+
+func Pipe() (r, w int32, errno int32) {
+ r, w, errno = pipe2(0)
+ if errno == _ENOSYS {
+ return pipe()
+ }
+ return r, w, errno
+}
diff --git a/src/runtime/export_pipe_test.go b/src/runtime/export_pipe_test.go
new file mode 100644
index 0000000..8f66770
--- /dev/null
+++ b/src/runtime/export_pipe_test.go
@@ -0,0 +1,9 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix darwin dragonfly
+
+package runtime
+
+var Pipe = pipe
diff --git a/src/runtime/export_solaris_test.go b/src/runtime/export_solaris_test.go
new file mode 100644
index 0000000..e865c77
--- /dev/null
+++ b/src/runtime/export_solaris_test.go
@@ -0,0 +1,9 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+func Fcntl(fd, cmd, arg uintptr) (uintptr, uintptr) {
+ return sysvicall3Err(&libc_fcntl, fd, cmd, arg)
+}
diff --git a/src/runtime/export_test.go b/src/runtime/export_test.go
new file mode 100644
index 0000000..22fef31
--- /dev/null
+++ b/src/runtime/export_test.go
@@ -0,0 +1,1216 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Export guts for testing.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+var Fadd64 = fadd64
+var Fsub64 = fsub64
+var Fmul64 = fmul64
+var Fdiv64 = fdiv64
+var F64to32 = f64to32
+var F32to64 = f32to64
+var Fcmp64 = fcmp64
+var Fintto64 = fintto64
+var F64toint = f64toint
+
+var Entersyscall = entersyscall
+var Exitsyscall = exitsyscall
+var LockedOSThread = lockedOSThread
+var Xadduintptr = atomic.Xadduintptr
+
+var FuncPC = funcPC
+
+var Fastlog2 = fastlog2
+
+var Atoi = atoi
+var Atoi32 = atoi32
+
+var Nanotime = nanotime
+var NetpollBreak = netpollBreak
+var Usleep = usleep
+
+var PhysPageSize = physPageSize
+var PhysHugePageSize = physHugePageSize
+
+var NetpollGenericInit = netpollGenericInit
+
+var Memmove = memmove
+var MemclrNoHeapPointers = memclrNoHeapPointers
+
+const PreemptMSupported = preemptMSupported
+
+type LFNode struct {
+ Next uint64
+ Pushcnt uintptr
+}
+
+func LFStackPush(head *uint64, node *LFNode) {
+ (*lfstack)(head).push((*lfnode)(unsafe.Pointer(node)))
+}
+
+func LFStackPop(head *uint64) *LFNode {
+ return (*LFNode)(unsafe.Pointer((*lfstack)(head).pop()))
+}
+
+func Netpoll(delta int64) {
+ systemstack(func() {
+ netpoll(delta)
+ })
+}
+
+func GCMask(x interface{}) (ret []byte) {
+ systemstack(func() {
+ ret = getgcmask(x)
+ })
+ return
+}
+
+func RunSchedLocalQueueTest() {
+ _p_ := new(p)
+ gs := make([]g, len(_p_.runq))
+ for i := 0; i < len(_p_.runq); i++ {
+ if g, _ := runqget(_p_); g != nil {
+ throw("runq is not empty initially")
+ }
+ for j := 0; j < i; j++ {
+ runqput(_p_, &gs[i], false)
+ }
+ for j := 0; j < i; j++ {
+ if g, _ := runqget(_p_); g != &gs[i] {
+ print("bad element at iter ", i, "/", j, "\n")
+ throw("bad element")
+ }
+ }
+ if g, _ := runqget(_p_); g != nil {
+ throw("runq is not empty afterwards")
+ }
+ }
+}
+
+func RunSchedLocalQueueStealTest() {
+ p1 := new(p)
+ p2 := new(p)
+ gs := make([]g, len(p1.runq))
+ for i := 0; i < len(p1.runq); i++ {
+ for j := 0; j < i; j++ {
+ gs[j].sig = 0
+ runqput(p1, &gs[j], false)
+ }
+ gp := runqsteal(p2, p1, true)
+ s := 0
+ if gp != nil {
+ s++
+ gp.sig++
+ }
+ for {
+ gp, _ = runqget(p2)
+ if gp == nil {
+ break
+ }
+ s++
+ gp.sig++
+ }
+ for {
+ gp, _ = runqget(p1)
+ if gp == nil {
+ break
+ }
+ gp.sig++
+ }
+ for j := 0; j < i; j++ {
+ if gs[j].sig != 1 {
+ print("bad element ", j, "(", gs[j].sig, ") at iter ", i, "\n")
+ throw("bad element")
+ }
+ }
+ if s != i/2 && s != i/2+1 {
+ print("bad steal ", s, ", want ", i/2, " or ", i/2+1, ", iter ", i, "\n")
+ throw("bad steal")
+ }
+ }
+}
+
+func RunSchedLocalQueueEmptyTest(iters int) {
+ // Test that runq is not spuriously reported as empty.
+ // Runq emptiness affects scheduling decisions and spurious emptiness
+ // can lead to underutilization (both runnable Gs and idle Ps coexist
+ // for arbitrary long time).
+ done := make(chan bool, 1)
+ p := new(p)
+ gs := make([]g, 2)
+ ready := new(uint32)
+ for i := 0; i < iters; i++ {
+ *ready = 0
+ next0 := (i & 1) == 0
+ next1 := (i & 2) == 0
+ runqput(p, &gs[0], next0)
+ go func() {
+ for atomic.Xadd(ready, 1); atomic.Load(ready) != 2; {
+ }
+ if runqempty(p) {
+ println("next:", next0, next1)
+ throw("queue is empty")
+ }
+ done <- true
+ }()
+ for atomic.Xadd(ready, 1); atomic.Load(ready) != 2; {
+ }
+ runqput(p, &gs[1], next1)
+ runqget(p)
+ <-done
+ runqget(p)
+ }
+}
+
+var (
+ StringHash = stringHash
+ BytesHash = bytesHash
+ Int32Hash = int32Hash
+ Int64Hash = int64Hash
+ MemHash = memhash
+ MemHash32 = memhash32
+ MemHash64 = memhash64
+ EfaceHash = efaceHash
+ IfaceHash = ifaceHash
+)
+
+var UseAeshash = &useAeshash
+
+func MemclrBytes(b []byte) {
+ s := (*slice)(unsafe.Pointer(&b))
+ memclrNoHeapPointers(s.array, uintptr(s.len))
+}
+
+var HashLoad = &hashLoad
+
+// entry point for testing
+func GostringW(w []uint16) (s string) {
+ systemstack(func() {
+ s = gostringw(&w[0])
+ })
+ return
+}
+
+type Uintreg sys.Uintreg
+
+var Open = open
+var Close = closefd
+var Read = read
+var Write = write
+
+func Envs() []string { return envs }
+func SetEnvs(e []string) { envs = e }
+
+var BigEndian = sys.BigEndian
+
+// For benchmarking.
+
+func BenchSetType(n int, x interface{}) {
+ e := *efaceOf(&x)
+ t := e._type
+ var size uintptr
+ var p unsafe.Pointer
+ switch t.kind & kindMask {
+ case kindPtr:
+ t = (*ptrtype)(unsafe.Pointer(t)).elem
+ size = t.size
+ p = e.data
+ case kindSlice:
+ slice := *(*struct {
+ ptr unsafe.Pointer
+ len, cap uintptr
+ })(e.data)
+ t = (*slicetype)(unsafe.Pointer(t)).elem
+ size = t.size * slice.len
+ p = slice.ptr
+ }
+ allocSize := roundupsize(size)
+ systemstack(func() {
+ for i := 0; i < n; i++ {
+ heapBitsSetType(uintptr(p), allocSize, size, t)
+ }
+ })
+}
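
For context, the SetType benchmarks added later in this patch (gc_test.go) drive BenchSetType through a small test-side helper that is not shown in this hunk. A minimal sketch of such a helper, with the name and the per-op size accounting assumed rather than taken from the patch:

	func benchSetType(b *testing.B, x interface{}) {
		// Bytes per op: a pointer counts its element's size, a slice
		// counts element size times length, mirroring the two cases
		// BenchSetType itself distinguishes.
		v := reflect.ValueOf(x)
		t := v.Type()
		switch t.Kind() {
		case reflect.Ptr:
			b.SetBytes(int64(t.Elem().Size()))
		case reflect.Slice:
			b.SetBytes(int64(t.Elem().Size()) * int64(v.Len()))
		}
		b.ResetTimer()
		runtime.BenchSetType(b.N, x)
	}
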
+
+const PtrSize = sys.PtrSize
+
+var ForceGCPeriod = &forcegcperiod
+
+// SetTracebackEnv is like runtime/debug.SetTraceback, but it raises
+// the "environment" traceback level, so later calls to
+// debug.SetTraceback (e.g., from testing timeouts) can't lower it.
+func SetTracebackEnv(level string) {
+ setTraceback(level)
+ traceback_env = traceback_cache
+}
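
A typical use is to raise the traceback level once for a whole test binary. A minimal sketch, assuming the usual TestMain wiring (not part of this patch):

	func TestMain(m *testing.M) {
		// Raise the "environment" level so later debug.SetTraceback
		// calls cannot lower it below "system".
		runtime.SetTracebackEnv("system")
		os.Exit(m.Run())
	}
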
+
+var ReadUnaligned32 = readUnaligned32
+var ReadUnaligned64 = readUnaligned64
+
+func CountPagesInUse() (pagesInUse, counted uintptr) {
+ stopTheWorld("CountPagesInUse")
+
+ pagesInUse = uintptr(mheap_.pagesInUse)
+
+ for _, s := range mheap_.allspans {
+ if s.state.get() == mSpanInUse {
+ counted += s.npages
+ }
+ }
+
+ startTheWorld()
+
+ return
+}
+
+func Fastrand() uint32 { return fastrand() }
+func Fastrandn(n uint32) uint32 { return fastrandn(n) }
+
+type ProfBuf profBuf
+
+func NewProfBuf(hdrsize, bufwords, tags int) *ProfBuf {
+ return (*ProfBuf)(newProfBuf(hdrsize, bufwords, tags))
+}
+
+func (p *ProfBuf) Write(tag *unsafe.Pointer, now int64, hdr []uint64, stk []uintptr) {
+ (*profBuf)(p).write(tag, now, hdr, stk)
+}
+
+const (
+ ProfBufBlocking = profBufBlocking
+ ProfBufNonBlocking = profBufNonBlocking
+)
+
+func (p *ProfBuf) Read(mode profBufReadMode) ([]uint64, []unsafe.Pointer, bool) {
+ return (*profBuf)(p).read(profBufReadMode(mode))
+}
+
+func (p *ProfBuf) Close() {
+ (*profBuf)(p).close()
+}
+
+func ReadMetricsSlow(memStats *MemStats, samplesp unsafe.Pointer, len, cap int) {
+ stopTheWorld("ReadMetricsSlow")
+
+ // Initialize the metrics beforehand because this could
+ // allocate and skew the stats.
+ semacquire(&metricsSema)
+ initMetrics()
+ semrelease(&metricsSema)
+
+ systemstack(func() {
+ // Read memstats first. It's going to flush
+ // the mcaches which readMetrics does not do, so
+ // going the other way around may result in
+ // inconsistent statistics.
+ readmemstats_m(memStats)
+ })
+
+ // Read metrics off the system stack.
+ //
+ // The only part of readMetrics that could allocate
+ // and skew the stats is initMetrics.
+ readMetrics(samplesp, len, cap)
+
+ startTheWorld()
+}
+
+// ReadMemStatsSlow returns both the runtime-computed MemStats and
+// MemStats accumulated by scanning the heap.
+func ReadMemStatsSlow() (base, slow MemStats) {
+ stopTheWorld("ReadMemStatsSlow")
+
+ // Run on the system stack to avoid stack growth allocation.
+ systemstack(func() {
+ // Make sure stats don't change.
+ getg().m.mallocing++
+
+ readmemstats_m(&base)
+
+ // Initialize slow from base and zero the fields we're
+ // recomputing.
+ slow = base
+ slow.Alloc = 0
+ slow.TotalAlloc = 0
+ slow.Mallocs = 0
+ slow.Frees = 0
+ slow.HeapReleased = 0
+ var bySize [_NumSizeClasses]struct {
+ Mallocs, Frees uint64
+ }
+
+ // Add up current allocations in spans.
+ for _, s := range mheap_.allspans {
+ if s.state.get() != mSpanInUse {
+ continue
+ }
+ if sizeclass := s.spanclass.sizeclass(); sizeclass == 0 {
+ slow.Mallocs++
+ slow.Alloc += uint64(s.elemsize)
+ } else {
+ slow.Mallocs += uint64(s.allocCount)
+ slow.Alloc += uint64(s.allocCount) * uint64(s.elemsize)
+ bySize[sizeclass].Mallocs += uint64(s.allocCount)
+ }
+ }
+
+ // Add in frees by just reading the stats for those directly.
+ var m heapStatsDelta
+ memstats.heapStats.unsafeRead(&m)
+
+ // Collect per-sizeclass free stats.
+ var smallFree uint64
+ for i := 0; i < _NumSizeClasses; i++ {
+ slow.Frees += uint64(m.smallFreeCount[i])
+ bySize[i].Frees += uint64(m.smallFreeCount[i])
+ bySize[i].Mallocs += uint64(m.smallFreeCount[i])
+ smallFree += uint64(m.smallFreeCount[i]) * uint64(class_to_size[i])
+ }
+ slow.Frees += memstats.tinyallocs + uint64(m.largeFreeCount)
+ slow.Mallocs += slow.Frees
+
+ slow.TotalAlloc = slow.Alloc + uint64(m.largeFree) + smallFree
+
+ for i := range slow.BySize {
+ slow.BySize[i].Mallocs = bySize[i].Mallocs
+ slow.BySize[i].Frees = bySize[i].Frees
+ }
+
+ for i := mheap_.pages.start; i < mheap_.pages.end; i++ {
+ chunk := mheap_.pages.tryChunkOf(i)
+ if chunk == nil {
+ continue
+ }
+ pg := chunk.scavenged.popcntRange(0, pallocChunkPages)
+ slow.HeapReleased += uint64(pg) * pageSize
+ }
+ for _, p := range allp {
+ pg := sys.OnesCount64(p.pcache.scav)
+ slow.HeapReleased += uint64(pg) * pageSize
+ }
+
+ // Unused space in the current arena also counts as released space.
+ slow.HeapReleased += uint64(mheap_.curArena.end - mheap_.curArena.base)
+
+ getg().m.mallocing--
+ })
+
+ startTheWorld()
+ return
+}
+
+// BlockOnSystemStack switches to the system stack, prints "x\n" to
+// stderr, and blocks in a stack containing
+// "runtime.blockOnSystemStackInternal".
+func BlockOnSystemStack() {
+ systemstack(blockOnSystemStackInternal)
+}
+
+func blockOnSystemStackInternal() {
+ print("x\n")
+ lock(&deadlock)
+ lock(&deadlock)
+}
+
+type RWMutex struct {
+ rw rwmutex
+}
+
+func (rw *RWMutex) RLock() {
+ rw.rw.rlock()
+}
+
+func (rw *RWMutex) RUnlock() {
+ rw.rw.runlock()
+}
+
+func (rw *RWMutex) Lock() {
+ rw.rw.lock()
+}
+
+func (rw *RWMutex) Unlock() {
+ rw.rw.unlock()
+}
+
+const RuntimeHmapSize = unsafe.Sizeof(hmap{})
+
+func MapBucketsCount(m map[int]int) int {
+ h := *(**hmap)(unsafe.Pointer(&m))
+ return 1 << h.B
+}
+
+func MapBucketsPointerIsNil(m map[int]int) bool {
+ h := *(**hmap)(unsafe.Pointer(&m))
+ return h.buckets == nil
+}
+
+func LockOSCounts() (external, internal uint32) {
+ g := getg()
+ if g.m.lockedExt+g.m.lockedInt == 0 {
+ if g.lockedm != 0 {
+ panic("lockedm on non-locked goroutine")
+ }
+ } else {
+ if g.lockedm == 0 {
+ panic("nil lockedm on locked goroutine")
+ }
+ }
+ return g.m.lockedExt, g.m.lockedInt
+}
+
+//go:noinline
+func TracebackSystemstack(stk []uintptr, i int) int {
+ if i == 0 {
+ pc, sp := getcallerpc(), getcallersp()
+ return gentraceback(pc, sp, 0, getg(), 0, &stk[0], len(stk), nil, nil, _TraceJumpStack)
+ }
+ n := 0
+ systemstack(func() {
+ n = TracebackSystemstack(stk, i-1)
+ })
+ return n
+}
+
+func KeepNArenaHints(n int) {
+ hint := mheap_.arenaHints
+ for i := 1; i < n; i++ {
+ hint = hint.next
+ if hint == nil {
+ return
+ }
+ }
+ hint.next = nil
+}
+
+// MapNextArenaHint reserves a page at the next arena growth hint,
+// preventing the arena from growing there, and returns the range of
+// addresses that are no longer viable.
+func MapNextArenaHint() (start, end uintptr) {
+ hint := mheap_.arenaHints
+ addr := hint.addr
+ if hint.down {
+ start, end = addr-heapArenaBytes, addr
+ addr -= physPageSize
+ } else {
+ start, end = addr, addr+heapArenaBytes
+ }
+ sysReserve(unsafe.Pointer(addr), physPageSize)
+ return
+}
+
+func GetNextArenaHint() uintptr {
+ return mheap_.arenaHints.addr
+}
+
+type G = g
+
+type Sudog = sudog
+
+func Getg() *G {
+ return getg()
+}
+
+//go:noinline
+func PanicForTesting(b []byte, i int) byte {
+ return unexportedPanicForTesting(b, i)
+}
+
+//go:noinline
+func unexportedPanicForTesting(b []byte, i int) byte {
+ return b[i]
+}
+
+func G0StackOverflow() {
+ systemstack(func() {
+ stackOverflow(nil)
+ })
+}
+
+func stackOverflow(x *byte) {
+ var buf [256]byte
+ stackOverflow(&buf[0])
+}
+
+func MapTombstoneCheck(m map[int]int) {
+ // Make sure emptyOne and emptyRest are distributed correctly.
+ // We should have a series of filled and emptyOne cells, followed by
+ // a series of emptyRest cells.
+ h := *(**hmap)(unsafe.Pointer(&m))
+ i := interface{}(m)
+ t := *(**maptype)(unsafe.Pointer(&i))
+
+ for x := 0; x < 1<<h.B; x++ {
+ b0 := (*bmap)(add(h.buckets, uintptr(x)*uintptr(t.bucketsize)))
+ n := 0
+ for b := b0; b != nil; b = b.overflow(t) {
+ for i := 0; i < bucketCnt; i++ {
+ if b.tophash[i] != emptyRest {
+ n++
+ }
+ }
+ }
+ k := 0
+ for b := b0; b != nil; b = b.overflow(t) {
+ for i := 0; i < bucketCnt; i++ {
+ if k < n && b.tophash[i] == emptyRest {
+ panic("early emptyRest")
+ }
+ if k >= n && b.tophash[i] != emptyRest {
+ panic("late non-emptyRest")
+ }
+ if k == n-1 && b.tophash[i] == emptyOne {
+ panic("last non-emptyRest entry is emptyOne")
+ }
+ k++
+ }
+ }
+ }
+}
+
+func RunGetgThreadSwitchTest() {
+ // Test that getg works correctly with thread switch.
+ // With gccgo, if we generate getg inlined, the backend
+ // may cache the address of the TLS variable, which
+ // will become invalid after a thread switch. This test
+ // checks that the bad caching doesn't happen.
+
+ ch := make(chan int)
+ go func(ch chan int) {
+ ch <- 5
+ LockOSThread()
+ }(ch)
+
+ g1 := getg()
+
+ // Block on a receive. This is likely to get us a thread
+ // switch. If we yield to the sender goroutine, it will
+ // lock the thread, forcing us to resume on a different
+ // thread.
+ <-ch
+
+ g2 := getg()
+ if g1 != g2 {
+ panic("g1 != g2")
+ }
+
+ // Also test getg after some control flow, as the
+ // backend is sensitive to control flow.
+ g3 := getg()
+ if g1 != g3 {
+ panic("g1 != g3")
+ }
+}
+
+const (
+ PageSize = pageSize
+ PallocChunkPages = pallocChunkPages
+ PageAlloc64Bit = pageAlloc64Bit
+ PallocSumBytes = pallocSumBytes
+)
+
+// Expose pallocSum for testing.
+type PallocSum pallocSum
+
+func PackPallocSum(start, max, end uint) PallocSum { return PallocSum(packPallocSum(start, max, end)) }
+func (m PallocSum) Start() uint { return pallocSum(m).start() }
+func (m PallocSum) Max() uint { return pallocSum(m).max() }
+func (m PallocSum) End() uint { return pallocSum(m).end() }
+
+// Expose pallocBits for testing.
+type PallocBits pallocBits
+
+func (b *PallocBits) Find(npages uintptr, searchIdx uint) (uint, uint) {
+ return (*pallocBits)(b).find(npages, searchIdx)
+}
+func (b *PallocBits) AllocRange(i, n uint) { (*pallocBits)(b).allocRange(i, n) }
+func (b *PallocBits) Free(i, n uint) { (*pallocBits)(b).free(i, n) }
+func (b *PallocBits) Summarize() PallocSum { return PallocSum((*pallocBits)(b).summarize()) }
+func (b *PallocBits) PopcntRange(i, n uint) uint { return (*pageBits)(b).popcntRange(i, n) }
+
+// SummarizeSlow is a slow but more obviously correct implementation
+// of (*pallocBits).summarize. Used for testing.
+func SummarizeSlow(b *PallocBits) PallocSum {
+ var start, max, end uint
+
+ const N = uint(len(b)) * 64
+ for start < N && (*pageBits)(b).get(start) == 0 {
+ start++
+ }
+ for end < N && (*pageBits)(b).get(N-end-1) == 0 {
+ end++
+ }
+ run := uint(0)
+ for i := uint(0); i < N; i++ {
+ if (*pageBits)(b).get(i) == 0 {
+ run++
+ } else {
+ run = 0
+ }
+ if run > max {
+ max = run
+ }
+ }
+ return PackPallocSum(start, max, end)
+}
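
A test can cross-check the optimized summarizer against this slow reference using only the wrappers defined above. A sketch, inside a runtime_test test function with `t *testing.T`; the allocated range is illustrative:

	b := new(runtime.PallocBits)
	// Allocate a run in the middle: Start/End count the free pages on
	// either side, Max is the largest free run anywhere in the bitmap.
	b.AllocRange(10, 100)
	got, want := b.Summarize(), runtime.SummarizeSlow(b)
	if got.Start() != want.Start() || got.Max() != want.Max() || got.End() != want.End() {
		t.Fatalf("Summarize = (%d,%d,%d), SummarizeSlow = (%d,%d,%d)",
			got.Start(), got.Max(), got.End(),
			want.Start(), want.Max(), want.End())
	}
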
+
+// Expose non-trivial helpers for testing.
+func FindBitRange64(c uint64, n uint) uint { return findBitRange64(c, n) }
+
+// Given two PallocBits, returns a set of bit ranges where
+// they differ.
+func DiffPallocBits(a, b *PallocBits) []BitRange {
+ ba := (*pageBits)(a)
+ bb := (*pageBits)(b)
+
+ var d []BitRange
+ base, size := uint(0), uint(0)
+ for i := uint(0); i < uint(len(ba))*64; i++ {
+ if ba.get(i) != bb.get(i) {
+ if size == 0 {
+ base = i
+ }
+ size++
+ } else {
+ if size != 0 {
+ d = append(d, BitRange{base, size})
+ }
+ size = 0
+ }
+ }
+ if size != 0 {
+ d = append(d, BitRange{base, size})
+ }
+ return d
+}
+
+// StringifyPallocBits gets the bits in the bit range r from b,
+// and returns a string containing the bits as ASCII 0 and 1
+// characters.
+func StringifyPallocBits(b *PallocBits, r BitRange) string {
+ str := ""
+ for j := r.I; j < r.I+r.N; j++ {
+ if (*pageBits)(b).get(j) != 0 {
+ str += "1"
+ } else {
+ str += "0"
+ }
+ }
+ return str
+}
+
+// Expose pallocData for testing.
+type PallocData pallocData
+
+func (d *PallocData) FindScavengeCandidate(searchIdx uint, min, max uintptr) (uint, uint) {
+ return (*pallocData)(d).findScavengeCandidate(searchIdx, min, max)
+}
+func (d *PallocData) AllocRange(i, n uint) { (*pallocData)(d).allocRange(i, n) }
+func (d *PallocData) ScavengedSetRange(i, n uint) {
+ (*pallocData)(d).scavenged.setRange(i, n)
+}
+func (d *PallocData) PallocBits() *PallocBits {
+ return (*PallocBits)(&(*pallocData)(d).pallocBits)
+}
+func (d *PallocData) Scavenged() *PallocBits {
+ return (*PallocBits)(&(*pallocData)(d).scavenged)
+}
+
+// Expose fillAligned for testing.
+func FillAligned(x uint64, m uint) uint64 { return fillAligned(x, m) }
+
+// Expose pageCache for testing.
+type PageCache pageCache
+
+const PageCachePages = pageCachePages
+
+func NewPageCache(base uintptr, cache, scav uint64) PageCache {
+ return PageCache(pageCache{base: base, cache: cache, scav: scav})
+}
+func (c *PageCache) Empty() bool { return (*pageCache)(c).empty() }
+func (c *PageCache) Base() uintptr { return (*pageCache)(c).base }
+func (c *PageCache) Cache() uint64 { return (*pageCache)(c).cache }
+func (c *PageCache) Scav() uint64 { return (*pageCache)(c).scav }
+func (c *PageCache) Alloc(npages uintptr) (uintptr, uintptr) {
+ return (*pageCache)(c).alloc(npages)
+}
+func (c *PageCache) Flush(s *PageAlloc) {
+ cp := (*pageCache)(c)
+ sp := (*pageAlloc)(s)
+
+ systemstack(func() {
+ // None of the tests need any higher-level locking, so we just
+ // take the lock internally.
+ lock(sp.mheapLock)
+ cp.flush(sp)
+ unlock(sp.mheapLock)
+ })
+}
+
+// Expose chunk index type.
+type ChunkIdx chunkIdx
+
+// Expose pageAlloc for testing. Note that because pageAlloc is
+// not in the heap, neither is PageAlloc.
+type PageAlloc pageAlloc
+
+func (p *PageAlloc) Alloc(npages uintptr) (uintptr, uintptr) {
+ pp := (*pageAlloc)(p)
+
+ var addr, scav uintptr
+ systemstack(func() {
+ // None of the tests need any higher-level locking, so we just
+ // take the lock internally.
+ lock(pp.mheapLock)
+ addr, scav = pp.alloc(npages)
+ unlock(pp.mheapLock)
+ })
+ return addr, scav
+}
+func (p *PageAlloc) AllocToCache() PageCache {
+ pp := (*pageAlloc)(p)
+
+ var c PageCache
+ systemstack(func() {
+ // None of the tests need any higher-level locking, so we just
+ // take the lock internally.
+ lock(pp.mheapLock)
+ c = PageCache(pp.allocToCache())
+ unlock(pp.mheapLock)
+ })
+ return c
+}
+func (p *PageAlloc) Free(base, npages uintptr) {
+ pp := (*pageAlloc)(p)
+
+ systemstack(func() {
+ // None of the tests need any higher-level locking, so we just
+ // take the lock internally.
+ lock(pp.mheapLock)
+ pp.free(base, npages)
+ unlock(pp.mheapLock)
+ })
+}
+func (p *PageAlloc) Bounds() (ChunkIdx, ChunkIdx) {
+ return ChunkIdx((*pageAlloc)(p).start), ChunkIdx((*pageAlloc)(p).end)
+}
+func (p *PageAlloc) Scavenge(nbytes uintptr, mayUnlock bool) (r uintptr) {
+ pp := (*pageAlloc)(p)
+ systemstack(func() {
+ // None of the tests need any higher-level locking, so we just
+ // take the lock internally.
+ lock(pp.mheapLock)
+ r = pp.scavenge(nbytes, mayUnlock)
+ unlock(pp.mheapLock)
+ })
+ return
+}
+func (p *PageAlloc) InUse() []AddrRange {
+ ranges := make([]AddrRange, 0, len(p.inUse.ranges))
+ for _, r := range p.inUse.ranges {
+ ranges = append(ranges, AddrRange{r})
+ }
+ return ranges
+}
+
+// Returns nil if the PallocData's L2 is missing.
+func (p *PageAlloc) PallocData(i ChunkIdx) *PallocData {
+ ci := chunkIdx(i)
+ return (*PallocData)((*pageAlloc)(p).tryChunkOf(ci))
+}
+
+// AddrRange is a wrapper around addrRange for testing.
+type AddrRange struct {
+ addrRange
+}
+
+// MakeAddrRange creates a new address range.
+func MakeAddrRange(base, limit uintptr) AddrRange {
+ return AddrRange{makeAddrRange(base, limit)}
+}
+
+// Base returns the virtual base address of the address range.
+func (a AddrRange) Base() uintptr {
+ return a.addrRange.base.addr()
+}
+
+// Limit returns the virtual address of the limit of the address range.
+func (a AddrRange) Limit() uintptr {
+ return a.addrRange.limit.addr()
+}
+
+// Equals returns true if the two address ranges are exactly equal.
+func (a AddrRange) Equals(b AddrRange) bool {
+ return a == b
+}
+
+// Size returns the size in bytes of the address range.
+func (a AddrRange) Size() uintptr {
+ return a.addrRange.size()
+}
+
+// AddrRanges is a wrapper around addrRanges for testing.
+type AddrRanges struct {
+ addrRanges
+ mutable bool
+}
+
+// NewAddrRanges creates a new empty addrRanges.
+//
+// Note that this initializes addrRanges just like in the
+// runtime, so its memory is persistentalloc'd. Call this
+// function sparingly since the memory it allocates is
+// leaked.
+//
+// This AddrRanges is mutable, so we can test methods like
+// Add.
+func NewAddrRanges() AddrRanges {
+ r := addrRanges{}
+ r.init(new(sysMemStat))
+ return AddrRanges{r, true}
+}
+
+// MakeAddrRanges creates a new addrRanges populated with
+// the ranges in a.
+//
+// The returned AddrRanges is immutable, so methods like
+// Add will fail.
+func MakeAddrRanges(a ...AddrRange) AddrRanges {
+ // Methods that manipulate the backing store of addrRanges.ranges should
+ // not be used on the result from this function (e.g. add) since they may
+ // trigger reallocation. That would normally be fine, except the new
+ // backing store won't come from the heap, but from persistentalloc, so
+ // we'll leak some memory implicitly.
+ ranges := make([]addrRange, 0, len(a))
+ total := uintptr(0)
+ for _, r := range a {
+ ranges = append(ranges, r.addrRange)
+ total += r.Size()
+ }
+ return AddrRanges{addrRanges{
+ ranges: ranges,
+ totalBytes: total,
+ sysStat: new(sysMemStat),
+ }, false}
+}
+
+// Ranges returns a copy of the ranges described by the
+// addrRanges.
+func (a *AddrRanges) Ranges() []AddrRange {
+ result := make([]AddrRange, 0, len(a.addrRanges.ranges))
+ for _, r := range a.addrRanges.ranges {
+ result = append(result, AddrRange{r})
+ }
+ return result
+}
+
+// FindSucc returns the successor to base. See addrRanges.findSucc
+// for more details.
+func (a *AddrRanges) FindSucc(base uintptr) int {
+ return a.findSucc(base)
+}
+
+// Add adds a new AddrRange to the AddrRanges.
+//
+// The AddrRange must be mutable (i.e. created by NewAddrRanges),
+// otherwise this method will throw.
+func (a *AddrRanges) Add(r AddrRange) {
+ if !a.mutable {
+ throw("attempt to mutate immutable AddrRanges")
+ }
+ a.add(r.addrRange)
+}
+
+// TotalBytes returns the totalBytes field of the addrRanges.
+func (a *AddrRanges) TotalBytes() uintptr {
+ return a.addrRanges.totalBytes
+}
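
A sketch of how the mutable variant composes, inside a runtime_test test; the addresses are arbitrary illustrative values:

	a := runtime.NewAddrRanges()
	a.Add(runtime.MakeAddrRange(0x1000, 0x2000))
	a.Add(runtime.MakeAddrRange(0x3000, 0x4000))
	// FindSucc reports the index of the first range whose base lies
	// above the queried address; here that is the second range.
	if i := a.FindSucc(0x2800); i != 1 {
		t.Errorf("FindSucc = %d, want 1", i)
	}
	if got := a.TotalBytes(); got != 0x2000 {
		t.Errorf("TotalBytes = %#x, want 0x2000", got)
	}
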
+
+// BitRange represents a range over a bitmap.
+type BitRange struct {
+ I, N uint // bit index and length in bits
+}
+
+// NewPageAlloc creates a new page allocator for testing and
+// initializes it with the scav and chunks maps. Each key in these maps
+// represents a chunk index and each value is a series of bit ranges to
+// set within each bitmap's chunk.
+//
+// The initialization of the pageAlloc preserves the invariant that if a
+// scavenged bit is set the alloc bit is necessarily unset, so some
+// of the bits described by scav may be cleared in the final bitmap if
+// ranges in chunks overlap with them.
+//
+// scav is optional, and if nil, the scavenged bitmap will be cleared
+// (as opposed to all 1s, which it usually is). Furthermore, every
+// chunk index in scav must appear in chunks; ones that do not are
+// ignored.
+func NewPageAlloc(chunks, scav map[ChunkIdx][]BitRange) *PageAlloc {
+ p := new(pageAlloc)
+
+ // We've got an entry, so initialize the pageAlloc.
+ p.init(new(mutex), nil)
+ lockInit(p.mheapLock, lockRankMheap)
+ p.test = true
+
+ for i, init := range chunks {
+ addr := chunkBase(chunkIdx(i))
+
+ // Mark the chunk's existence in the pageAlloc.
+ systemstack(func() {
+ lock(p.mheapLock)
+ p.grow(addr, pallocChunkBytes)
+ unlock(p.mheapLock)
+ })
+
+ // Initialize the bitmap and update pageAlloc metadata.
+ chunk := p.chunkOf(chunkIndex(addr))
+
+ // Clear all the scavenged bits, which p.grow sets.
+ chunk.scavenged.clearRange(0, pallocChunkPages)
+
+ // Apply scavenge state if applicable.
+ if scav != nil {
+ if scvg, ok := scav[i]; ok {
+ for _, s := range scvg {
+ // Ignore the case of s.N == 0. setRange doesn't handle
+ // it and it's a no-op anyway.
+ if s.N != 0 {
+ chunk.scavenged.setRange(s.I, s.N)
+ }
+ }
+ }
+ }
+
+ // Apply alloc state.
+ for _, s := range init {
+ // Ignore the case of s.N == 0. allocRange doesn't handle
+ // it and it's a no-op anyway.
+ if s.N != 0 {
+ chunk.allocRange(s.I, s.N)
+ }
+ }
+
+ // Update heap metadata for the allocRange calls above.
+ systemstack(func() {
+ lock(p.mheapLock)
+ p.update(addr, pallocChunkPages, false, false)
+ unlock(p.mheapLock)
+ })
+ }
+
+ systemstack(func() {
+ lock(p.mheapLock)
+ p.scavengeStartGen()
+ unlock(p.mheapLock)
+ })
+
+ return (*PageAlloc)(p)
+}
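
A sketch of how a test might build a fixture from these maps; the bit ranges are illustrative, and BaseChunkIdx, BitRange, PageBase, Alloc, and FreePageAlloc are all defined in this file:

	// One chunk: pages [0, 64) already allocated, pages [128, 192)
	// marked scavenged, everything else free and unscavenged.
	chunks := map[runtime.ChunkIdx][]runtime.BitRange{
		runtime.BaseChunkIdx: {{I: 0, N: 64}},
	}
	scav := map[runtime.ChunkIdx][]runtime.BitRange{
		runtime.BaseChunkIdx: {{I: 128, N: 64}},
	}
	p := runtime.NewPageAlloc(chunks, scav)
	defer runtime.FreePageAlloc(p)

	// The lowest free page is page 64 of the base chunk.
	if addr, _ := p.Alloc(1); addr != runtime.PageBase(runtime.BaseChunkIdx, 64) {
		t.Errorf("unexpected allocation address %#x", addr)
	}
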
+
+// FreePageAlloc releases hard OS resources owned by the pageAlloc. Once this
+// is called the pageAlloc may no longer be used. The object itself will be
+// collected by the garbage collector once it is no longer live.
+func FreePageAlloc(pp *PageAlloc) {
+ p := (*pageAlloc)(pp)
+
+ // Free all the mapped space for the summary levels.
+ if pageAlloc64Bit != 0 {
+ for l := 0; l < summaryLevels; l++ {
+ sysFree(unsafe.Pointer(&p.summary[l][0]), uintptr(cap(p.summary[l]))*pallocSumBytes, nil)
+ }
+ } else {
+ resSize := uintptr(0)
+ for _, s := range p.summary {
+ resSize += uintptr(cap(s)) * pallocSumBytes
+ }
+ sysFree(unsafe.Pointer(&p.summary[0][0]), alignUp(resSize, physPageSize), nil)
+ }
+
+ // Free the mapped space for chunks.
+ for i := range p.chunks {
+ if x := p.chunks[i]; x != nil {
+ p.chunks[i] = nil
+ // This memory comes from sysAlloc and will always be page-aligned.
+ sysFree(unsafe.Pointer(x), unsafe.Sizeof(*p.chunks[0]), nil)
+ }
+ }
+}
+
+// BaseChunkIdx is a convenient chunkIdx value which works on both
+// 64-bit and 32-bit platforms, allowing the tests to share code
+// between the two.
+//
+// This should not be higher than 0x100*pallocChunkBytes to support
+// mips and mipsle, which only have 31-bit address spaces.
+var BaseChunkIdx = ChunkIdx(chunkIndex(((0xc000*pageAlloc64Bit + 0x100*pageAlloc32Bit) * pallocChunkBytes) + arenaBaseOffset*sys.GoosAix))
+
+// PageBase returns an address given a chunk index and a page index
+// relative to that chunk.
+func PageBase(c ChunkIdx, pageIdx uint) uintptr {
+ return chunkBase(chunkIdx(c)) + uintptr(pageIdx)*pageSize
+}
+
+type BitsMismatch struct {
+ Base uintptr
+ Got, Want uint64
+}
+
+func CheckScavengedBitsCleared(mismatches []BitsMismatch) (n int, ok bool) {
+ ok = true
+
+ // Run on the system stack to avoid stack growth allocation.
+ systemstack(func() {
+ getg().m.mallocing++
+
+ // Lock so that we can safely access the bitmap.
+ lock(&mheap_.lock)
+ chunkLoop:
+ for i := mheap_.pages.start; i < mheap_.pages.end; i++ {
+ chunk := mheap_.pages.tryChunkOf(i)
+ if chunk == nil {
+ continue
+ }
+ for j := 0; j < pallocChunkPages/64; j++ {
+ // Run over each 64-bit bitmap section and ensure
+ // scavenged is being cleared properly on allocation.
+ // If a used bit and scavenged bit are both set, that's
+ // an error, and could indicate a larger problem, or
+ // an accounting problem.
+ want := chunk.scavenged[j] &^ chunk.pallocBits[j]
+ got := chunk.scavenged[j]
+ if want != got {
+ ok = false
+ if n >= len(mismatches) {
+ break chunkLoop
+ }
+ mismatches[n] = BitsMismatch{
+ Base: chunkBase(i) + uintptr(j)*64*pageSize,
+ Got: got,
+ Want: want,
+ }
+ n++
+ }
+ }
+ }
+ unlock(&mheap_.lock)
+
+ getg().m.mallocing--
+ })
+ return
+}
+
+func PageCachePagesLeaked() (leaked uintptr) {
+ stopTheWorld("PageCachePagesLeaked")
+
+ // Walk over destroyed Ps and look for unflushed caches.
+ deadp := allp[len(allp):cap(allp)]
+ for _, p := range deadp {
+ // Since we're going past len(allp) we may see nil Ps.
+ // Just ignore them.
+ if p != nil {
+ leaked += uintptr(sys.OnesCount64(p.pcache.cache))
+ }
+ }
+
+ startTheWorld()
+ return
+}
+
+var Semacquire = semacquire
+var Semrelease1 = semrelease1
+
+func SemNwait(addr *uint32) uint32 {
+ root := semroot(addr)
+ return atomic.Load(&root.nwait)
+}
+
+// MapHashCheck computes the hash of the key k for the map m, twice.
+// Method 1 uses the built-in hasher for the map.
+// Method 2 uses the typehash function (the one used by reflect).
+// Returns the two hash values, which should always be equal.
+func MapHashCheck(m interface{}, k interface{}) (uintptr, uintptr) {
+ // Unpack m.
+ mt := (*maptype)(unsafe.Pointer(efaceOf(&m)._type))
+ mh := (*hmap)(efaceOf(&m).data)
+
+ // Unpack k.
+ kt := efaceOf(&k)._type
+ var p unsafe.Pointer
+ if isDirectIface(kt) {
+ q := efaceOf(&k).data
+ p = unsafe.Pointer(&q)
+ } else {
+ p = efaceOf(&k).data
+ }
+
+ // Compute the hash functions.
+ x := mt.hasher(noescape(p), uintptr(mh.hash0))
+ y := typehash(kt, noescape(p), uintptr(mh.hash0))
+ return x, y
+}
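
For example, a test could assert that both hash paths agree for an arbitrary key (sketch, inside a test function):

	m := map[string]int{}
	if h1, h2 := runtime.MapHashCheck(m, "runtime"); h1 != h2 {
		t.Errorf("map hasher and typehash disagree: %#x != %#x", h1, h2)
	}
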
+
+// mspan wrapper for testing.
+//go:notinheap
+type MSpan mspan
+
+// Allocate an mspan for testing.
+func AllocMSpan() *MSpan {
+ var s *mspan
+ systemstack(func() {
+ lock(&mheap_.lock)
+ s = (*mspan)(mheap_.spanalloc.alloc())
+ unlock(&mheap_.lock)
+ })
+ return (*MSpan)(s)
+}
+
+// Free an allocated mspan.
+func FreeMSpan(s *MSpan) {
+ systemstack(func() {
+ lock(&mheap_.lock)
+ mheap_.spanalloc.free(unsafe.Pointer(s))
+ unlock(&mheap_.lock)
+ })
+}
+
+func MSpanCountAlloc(ms *MSpan, bits []byte) int {
+ s := (*mspan)(ms)
+ s.nelems = uintptr(len(bits) * 8)
+ s.gcmarkBits = (*gcBits)(unsafe.Pointer(&bits[0]))
+ result := s.countAlloc()
+ s.gcmarkBits = nil
+ return result
+}
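
A sketch of how a test can drive this with an arbitrary mark bitmap; the bit pattern below is illustrative:

	s := runtime.AllocMSpan()
	defer runtime.FreeMSpan(s)
	// 32 bytes of mark bits describe 256 objects; three set bits mean
	// three objects should be counted as allocated.
	bits := make([]byte, 32)
	bits[0] = 0b101
	bits[7] = 0x80
	if n := runtime.MSpanCountAlloc(s, bits); n != 3 {
		t.Errorf("MSpanCountAlloc = %d, want 3", n)
	}
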
+
+const (
+ TimeHistSubBucketBits = timeHistSubBucketBits
+ TimeHistNumSubBuckets = timeHistNumSubBuckets
+ TimeHistNumSuperBuckets = timeHistNumSuperBuckets
+)
+
+type TimeHistogram timeHistogram
+
+// Count returns the count for the given bucket and subBucket indices.
+// It returns true if the bucket was valid; otherwise it returns the
+// count for the underflow bucket and false.
+func (th *TimeHistogram) Count(bucket, subBucket uint) (uint64, bool) {
+ t := (*timeHistogram)(th)
+ i := bucket*TimeHistNumSubBuckets + subBucket
+ if i >= uint(len(t.counts)) {
+ return t.underflow, false
+ }
+ return t.counts[i], true
+}
+
+func (th *TimeHistogram) Record(duration int64) {
+ (*timeHistogram)(th).record(duration)
+}
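
A sketch of how a test can exercise the histogram. The histogram is kept in a package-level variable so its 64-bit counters stay 8-byte aligned on 32-bit platforms; the recorded duration is illustrative:

	// Package-level to guarantee 8-byte alignment of the counters.
	var hist runtime.TimeHistogram

	func TestTimeHistogramSketch(t *testing.T) {
		hist.Record(2500) // 2.5µs, expressed in nanoseconds
		var total uint64
		for bucket := uint(0); bucket < runtime.TimeHistNumSuperBuckets; bucket++ {
			for sub := uint(0); sub < runtime.TimeHistNumSubBuckets; sub++ {
				c, ok := hist.Count(bucket, sub)
				if !ok {
					t.Fatalf("bucket (%d, %d) unexpectedly out of range", bucket, sub)
				}
				total += c
			}
		}
		if total != 1 {
			t.Errorf("recorded 1 duration, counted %d", total)
		}
	}
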
diff --git a/src/runtime/export_unix_test.go b/src/runtime/export_unix_test.go
new file mode 100644
index 0000000..307c63f
--- /dev/null
+++ b/src/runtime/export_unix_test.go
@@ -0,0 +1,93 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
+
+package runtime
+
+import "unsafe"
+
+var NonblockingPipe = nonblockingPipe
+var SetNonblock = setNonblock
+var Closeonexec = closeonexec
+
+func sigismember(mask *sigset, i int) bool {
+ clear := *mask
+ sigdelset(&clear, i)
+ return clear != *mask
+}
+
+func Sigisblocked(i int) bool {
+ var sigmask sigset
+ sigprocmask(_SIG_SETMASK, nil, &sigmask)
+ return sigismember(&sigmask, i)
+}
+
+type M = m
+
+var waitForSigusr1 struct {
+ rdpipe int32
+ wrpipe int32
+ mID int64
+}
+
+// WaitForSigusr1 blocks until a SIGUSR1 is received. It calls ready
+// when it is set up to receive SIGUSR1. The ready function should
+// cause a SIGUSR1 to be sent. The r and w arguments are a pipe that
+// the signal handler can use to report when the signal is received.
+//
+// Once SIGUSR1 is received, it returns the ID of the current M and
+// the ID of the M the SIGUSR1 was received on. If the caller writes
+// a non-zero byte to w, WaitForSigusr1 returns immediately with -1, -1.
+func WaitForSigusr1(r, w int32, ready func(mp *M)) (int64, int64) {
+ lockOSThread()
+ // Make sure we can receive SIGUSR1.
+ unblocksig(_SIGUSR1)
+
+ waitForSigusr1.rdpipe = r
+ waitForSigusr1.wrpipe = w
+
+ mp := getg().m
+ testSigusr1 = waitForSigusr1Callback
+ ready(mp)
+
+ // Wait for the signal. We use a pipe rather than a note
+ // because write is always async-signal-safe.
+ entersyscallblock()
+ var b byte
+ read(waitForSigusr1.rdpipe, noescape(unsafe.Pointer(&b)), 1)
+ exitsyscall()
+
+ gotM := waitForSigusr1.mID
+ testSigusr1 = nil
+
+ unlockOSThread()
+
+ if b != 0 {
+ // timeout signal from caller
+ return -1, -1
+ }
+ return mp.id, gotM
+}
+
+// waitForSigusr1Callback is called from the signal handler during
+// WaitForSigusr1. It must not have write barriers because there may
+// not be a P.
+//
+//go:nowritebarrierrec
+func waitForSigusr1Callback(gp *g) bool {
+ if gp == nil || gp.m == nil {
+ waitForSigusr1.mID = -1
+ } else {
+ waitForSigusr1.mID = gp.m.id
+ }
+ b := byte(0)
+ write(uintptr(waitForSigusr1.wrpipe), noescape(unsafe.Pointer(&b)), 1)
+ return true
+}
+
+// SendSigusr1 sends SIGUSR1 to mp.
+func SendSigusr1(mp *M) {
+ signalM(mp, _SIGUSR1)
+}
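
A sketch of the intended calling pattern from a test, with error handling trimmed; NonblockingPipe, Close, M, WaitForSigusr1, and SendSigusr1 are the exports above:

	r, w, errno := runtime.NonblockingPipe()
	if errno != 0 {
		t.Skip("pipe failed")
	}
	defer func() { runtime.Close(r); runtime.Close(w) }()

	ready := make(chan *runtime.M)
	done := make(chan struct{})
	go func() {
		runtime.LockOSThread()
		// Blocks until SendSigusr1 below delivers the signal; the two
		// results are the waiting M's ID and the receiving M's ID.
		waiter, receiver := runtime.WaitForSigusr1(r, w, func(mp *runtime.M) {
			ready <- mp
		})
		runtime.UnlockOSThread()
		_, _ = waiter, receiver
		close(done)
	}()
	runtime.SendSigusr1(<-ready)
	<-done
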
diff --git a/src/runtime/export_windows_test.go b/src/runtime/export_windows_test.go
new file mode 100644
index 0000000..536b398
--- /dev/null
+++ b/src/runtime/export_windows_test.go
@@ -0,0 +1,25 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Export guts for testing.
+
+package runtime
+
+import "unsafe"
+
+var (
+ TestingWER = &testingWER
+ OsYield = osyield
+ TimeBeginPeriodRetValue = &timeBeginPeriodRetValue
+)
+
+func NumberOfProcessors() int32 {
+ var info systeminfo
+ stdcall1(_GetSystemInfo, uintptr(unsafe.Pointer(&info)))
+ return int32(info.dwnumberofprocessors)
+}
+
+func LoadLibraryExStatus() (useEx, haveEx, haveFlags bool) {
+ return useLoadLibraryEx, _LoadLibraryExW != nil, _AddDllDirectory != nil
+}
diff --git a/src/runtime/extern.go b/src/runtime/extern.go
new file mode 100644
index 0000000..dacdf4f
--- /dev/null
+++ b/src/runtime/extern.go
@@ -0,0 +1,257 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+Package runtime contains operations that interact with Go's runtime system,
+such as functions to control goroutines. It also includes the low-level type information
+used by the reflect package; see reflect's documentation for the programmable
+interface to the run-time type system.
+
+Environment Variables
+
+The following environment variables ($name or %name%, depending on the host
+operating system) control the run-time behavior of Go programs. The meanings
+and use may change from release to release.
+
+The GOGC variable sets the initial garbage collection target percentage.
+A collection is triggered when the ratio of freshly allocated data to live data
+remaining after the previous collection reaches this percentage. The default
+is GOGC=100. Setting GOGC=off disables the garbage collector entirely.
+The runtime/debug package's SetGCPercent function allows changing this
+percentage at run time. See https://golang.org/pkg/runtime/debug/#SetGCPercent.
+
+The GODEBUG variable controls debugging variables within the runtime.
+It is a comma-separated list of name=val pairs setting these named variables:
+
+ allocfreetrace: setting allocfreetrace=1 causes every allocation to be
+ profiled and a stack trace printed on each object's allocation and free.
+
+ clobberfree: setting clobberfree=1 causes the garbage collector to
+ clobber the memory content of an object with bad content when it frees
+ the object.
+
+ cgocheck: setting cgocheck=0 disables all checks for packages
+ using cgo to incorrectly pass Go pointers to non-Go code.
+ Setting cgocheck=1 (the default) enables relatively cheap
+ checks that may miss some errors. Setting cgocheck=2 enables
+ expensive checks that should not miss any errors, but will
+ cause your program to run slower.
+
+ efence: setting efence=1 causes the allocator to run in a mode
+ where each object is allocated on a unique page and addresses are
+ never recycled.
+
+ gccheckmark: setting gccheckmark=1 enables verification of the
+ garbage collector's concurrent mark phase by performing a
+ second mark pass while the world is stopped. If the second
+ pass finds a reachable object that was not found by concurrent
+ mark, the garbage collector will panic.
+
+ gcpacertrace: setting gcpacertrace=1 causes the garbage collector to
+ print information about the internal state of the concurrent pacer.
+
+ gcshrinkstackoff: setting gcshrinkstackoff=1 disables moving goroutines
+ onto smaller stacks. In this mode, a goroutine's stack can only grow.
+
+ gcstoptheworld: setting gcstoptheworld=1 disables concurrent garbage collection,
+ making every garbage collection a stop-the-world event. Setting gcstoptheworld=2
+ also disables concurrent sweeping after the garbage collection finishes.
+
+ gctrace: setting gctrace=1 causes the garbage collector to emit a single line to standard
+ error at each collection, summarizing the amount of memory collected and the
+ length of the pause. The format of this line is subject to change.
+ Currently, it is:
+ gc # @#s #%: #+#+# ms clock, #+#/#/#+# ms cpu, #->#-># MB, # MB goal, # P
+ where the fields are as follows:
+ gc # the GC number, incremented at each GC
+ @#s time in seconds since program start
+ #% percentage of time spent in GC since program start
+ #+...+# wall-clock/CPU times for the phases of the GC
+ #->#-># MB heap size at GC start, at GC end, and live heap
+ # MB goal goal heap size
+ # P number of processors used
+ The phases are stop-the-world (STW) sweep termination, concurrent
+ mark and scan, and STW mark termination. The CPU times
+ for mark/scan are broken down in to assist time (GC performed in
+ line with allocation), background GC time, and idle GC time.
+ If the line ends with "(forced)", this GC was forced by a
+ runtime.GC() call.
+
+ inittrace: setting inittrace=1 causes the runtime to emit a single line to standard
+ error for each package with init work, summarizing the execution time and memory
+ allocation. No information is printed for inits executed as part of plugin loading
+ and for packages without both user-defined and compiler-generated init work.
+ The format of this line is subject to change. Currently, it is:
+ init # @#ms, # ms clock, # bytes, # allocs
+ where the fields are as follows:
+ init # the package name
+ @# ms time in milliseconds when the init started since program start
+ # clock wall-clock time for package initialization work
+ # bytes memory allocated on the heap
+ # allocs number of heap allocations
+
+ madvdontneed: setting madvdontneed=0 will use MADV_FREE
+ instead of MADV_DONTNEED on Linux when returning memory to the
+ kernel. This is more efficient, but means RSS numbers will
+ drop only when the OS is under memory pressure.
+
+ memprofilerate: setting memprofilerate=X will update the value of runtime.MemProfileRate.
+ When set to 0 memory profiling is disabled. Refer to the description of
+ MemProfileRate for the default value.
+
+ invalidptr: invalidptr=1 (the default) causes the garbage collector and stack
+ copier to crash the program if an invalid pointer value (for example, 1)
+ is found in a pointer-typed location. Setting invalidptr=0 disables this check.
+ This should only be used as a temporary workaround to diagnose buggy code.
+ The real fix is to not store integers in pointer-typed locations.
+
+ sbrk: setting sbrk=1 replaces the memory allocator and garbage collector
+ with a trivial allocator that obtains memory from the operating system and
+ never reclaims any memory.
+
+ scavenge: scavenge=1 enables the debugging mode of the heap scavenger.
+
+ scavtrace: setting scavtrace=1 causes the runtime to emit a single line to standard
+ error, roughly once per GC cycle, summarizing the amount of work done by the
+ scavenger as well as the total amount of memory returned to the operating system
+ and an estimate of physical memory utilization. The format of this line is subject
+ to change, but currently it is:
+ scav # # KiB work, # KiB total, #% util
+ where the fields are as follows:
+ scav # the scavenge cycle number
+ # KiB work the amount of memory returned to the OS since the last line
+ # KiB total the total amount of memory returned to the OS
+ #% util the fraction of all unscavenged memory which is in-use
+ If the line ends with "(forced)", then scavenging was forced by a
+ debug.FreeOSMemory() call.
+
+ scheddetail: setting schedtrace=X and scheddetail=1 causes the scheduler to emit
+ detailed multiline info every X milliseconds, describing state of the scheduler,
+ processors, threads and goroutines.
+
+ schedtrace: setting schedtrace=X causes the scheduler to emit a single line to standard
+ error every X milliseconds, summarizing the scheduler state.
+
+ tracebackancestors: setting tracebackancestors=N extends tracebacks with the stacks at
+ which goroutines were created, where N limits the number of ancestor goroutines to
+ report. This also extends the information returned by runtime.Stack. Ancestors' goroutine
+ IDs will refer to the ID of the goroutine at the time of creation; it's possible for this
+ ID to be reused for another goroutine. Setting N to 0 will report no ancestry information.
+
+ asyncpreemptoff: asyncpreemptoff=1 disables signal-based
+ asynchronous goroutine preemption. This makes some loops
+ non-preemptible for long periods, which may delay GC and
+ goroutine scheduling. This is useful for debugging GC issues
+ because it also disables the conservative stack scanning used
+ for asynchronously preempted goroutines.
+
+The net, net/http, and crypto/tls packages also refer to debugging variables in GODEBUG.
+See the documentation for those packages for details.
+
+The GOMAXPROCS variable limits the number of operating system threads that
+can execute user-level Go code simultaneously. There is no limit to the number of threads
+that can be blocked in system calls on behalf of Go code; those do not count against
+the GOMAXPROCS limit. This package's GOMAXPROCS function queries and changes
+the limit.
+
+The GORACE variable configures the race detector, for programs built using -race.
+See https://golang.org/doc/articles/race_detector.html for details.
+
+The GOTRACEBACK variable controls the amount of output generated when a Go
+program fails due to an unrecovered panic or an unexpected runtime condition.
+By default, a failure prints a stack trace for the current goroutine,
+eliding functions internal to the run-time system, and then exits with exit code 2.
+The failure prints stack traces for all goroutines if there is no current goroutine
+or the failure is internal to the run-time.
+GOTRACEBACK=none omits the goroutine stack traces entirely.
+GOTRACEBACK=single (the default) behaves as described above.
+GOTRACEBACK=all adds stack traces for all user-created goroutines.
+GOTRACEBACK=system is like ``all'' but adds stack frames for run-time functions
+and shows goroutines created internally by the run-time.
+GOTRACEBACK=crash is like ``system'' but crashes in an operating system-specific
+manner instead of exiting. For example, on Unix systems, the crash raises
+SIGABRT to trigger a core dump.
+For historical reasons, the GOTRACEBACK settings 0, 1, and 2 are synonyms for
+none, all, and system, respectively.
+The runtime/debug package's SetTraceback function allows increasing the
+amount of output at run time, but it cannot reduce the amount below that
+specified by the environment variable.
+See https://golang.org/pkg/runtime/debug/#SetTraceback.
+
+The GOARCH, GOOS, GOPATH, and GOROOT environment variables complete
+the set of Go environment variables. They influence the building of Go programs
+(see https://golang.org/cmd/go and https://golang.org/pkg/go/build).
+GOARCH, GOOS, and GOROOT are recorded at compile time and made available by
+constants or functions in this package, but they do not influence the execution
+of the run-time system.
+*/
+package runtime
+
+import "runtime/internal/sys"
+
+// Caller reports file and line number information about function invocations on
+// the calling goroutine's stack. The argument skip is the number of stack frames
+// to ascend, with 0 identifying the caller of Caller. (For historical reasons the
+// meaning of skip differs between Caller and Callers.) The return values report the
+// program counter, file name, and line number within the file of the corresponding
+// call. The boolean ok is false if it was not possible to recover the information.
+func Caller(skip int) (pc uintptr, file string, line int, ok bool) {
+ rpc := make([]uintptr, 1)
+ n := callers(skip+1, rpc[:])
+ if n < 1 {
+ return
+ }
+ frame, _ := CallersFrames(rpc).Next()
+ return frame.PC, frame.File, frame.Line, frame.PC != 0
+}
+
+// Callers fills the slice pc with the return program counters of function invocations
+// on the calling goroutine's stack. The argument skip is the number of stack frames
+// to skip before recording in pc, with 0 identifying the frame for Callers itself and
+// 1 identifying the caller of Callers.
+// It returns the number of entries written to pc.
+//
+// To translate these PCs into symbolic information such as function
+// names and line numbers, use CallersFrames. CallersFrames accounts
+// for inlined functions and adjusts the return program counters into
+// call program counters. Iterating over the returned slice of PCs
+// directly is discouraged, as is using FuncForPC on any of the
+// returned PCs, since these cannot account for inlining or return
+// program counter adjustment.
+func Callers(skip int, pc []uintptr) int {
+ // runtime.callers uses pc.array==nil as a signal
+ // to print a stack trace. Pick off 0-length pc here
+ // so that we don't let a nil pc slice get to it.
+ if len(pc) == 0 {
+ return 0
+ }
+ return callers(skip, pc)
+}
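
The documented pattern, for reference, as a short self-contained example:

	pc := make([]uintptr, 16)
	n := runtime.Callers(1, pc) // 1 skips the frame for runtime.Callers itself
	frames := runtime.CallersFrames(pc[:n])
	for {
		frame, more := frames.Next()
		fmt.Printf("%s\n\t%s:%d\n", frame.Function, frame.File, frame.Line)
		if !more {
			break
		}
	}
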
+
+// GOROOT returns the root of the Go tree. It uses the
+// GOROOT environment variable, if set at process start,
+// or else the root used during the Go build.
+func GOROOT() string {
+ s := gogetenv("GOROOT")
+ if s != "" {
+ return s
+ }
+ return sys.DefaultGoroot
+}
+
+// Version returns the Go tree's version string.
+// It is either the commit hash and date at the time of the build or,
+// when possible, a release tag like "go1.3".
+func Version() string {
+ return sys.TheVersion
+}
+
+// GOOS is the running program's operating system target:
+// one of darwin, freebsd, linux, and so on.
+// To view possible combinations of GOOS and GOARCH, run "go tool dist list".
+const GOOS string = sys.GOOS
+
+// GOARCH is the running program's architecture target:
+// one of 386, amd64, arm, s390x, and so on.
+const GOARCH string = sys.GOARCH
diff --git a/src/runtime/fastlog2.go b/src/runtime/fastlog2.go
new file mode 100644
index 0000000..1f251bf
--- /dev/null
+++ b/src/runtime/fastlog2.go
@@ -0,0 +1,27 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// fastlog2 implements a fast approximation to the base 2 log of a
+// float64. This is used to compute a geometric distribution for heap
+// sampling, without introducing dependencies into package math. This
+// uses a very rough approximation using the float64 exponent and the
+// first 25 bits of the mantissa. The top 5 bits of the mantissa are
+// used to load limits from a table of constants and the rest are used
+// to scale linearly between them.
+func fastlog2(x float64) float64 {
+ const fastlogScaleBits = 20
+ const fastlogScaleRatio = 1.0 / (1 << fastlogScaleBits)
+
+ xBits := float64bits(x)
+ // Extract the exponent from the IEEE float64, and index a constant
+ // table with the top fastlogNumBits (5) bits of the mantissa.
+ xExp := int64((xBits>>52)&0x7FF) - 1023
+ xManIndex := (xBits >> (52 - fastlogNumBits)) % (1 << fastlogNumBits)
+ xManScale := (xBits >> (52 - fastlogNumBits - fastlogScaleBits)) % (1 << fastlogScaleBits)
+
+ low, high := fastlog2Table[xManIndex], fastlog2Table[xManIndex+1]
+ return float64(xExp) + low + (high-low)*float64(xManScale)*fastlogScaleRatio
+}
diff --git a/src/runtime/fastlog2_test.go b/src/runtime/fastlog2_test.go
new file mode 100644
index 0000000..ae0f40b
--- /dev/null
+++ b/src/runtime/fastlog2_test.go
@@ -0,0 +1,34 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "math"
+ "runtime"
+ "testing"
+)
+
+func TestFastLog2(t *testing.T) {
+ // Compute the Euclidean distance between math.Log2 and the FastLog2
+ // implementation over the range of interest for heap sampling.
+ const randomBitCount = 26
+ var e float64
+
+ inc := 1
+ if testing.Short() {
+ // Check 1K total values, down from 64M.
+ inc = 1 << 16
+ }
+ for i := 1; i < 1<<randomBitCount; i += inc {
+ l, fl := math.Log2(float64(i)), runtime.Fastlog2(float64(i))
+ d := l - fl
+ e += d * d
+ }
+ e = math.Sqrt(e)
+
+ if e > 1.0 {
+ t.Fatalf("imprecision on fastlog2 implementation, want <=1.0, got %f", e)
+ }
+}
diff --git a/src/runtime/fastlog2table.go b/src/runtime/fastlog2table.go
new file mode 100644
index 0000000..6ba4a7d
--- /dev/null
+++ b/src/runtime/fastlog2table.go
@@ -0,0 +1,43 @@
+// Code generated by mkfastlog2table.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkfastlog2table.go for comments.
+
+package runtime
+
+const fastlogNumBits = 5
+
+var fastlog2Table = [1<<fastlogNumBits + 1]float64{
+ 0,
+ 0.0443941193584535,
+ 0.08746284125033943,
+ 0.12928301694496647,
+ 0.16992500144231248,
+ 0.2094533656289499,
+ 0.24792751344358555,
+ 0.28540221886224837,
+ 0.3219280948873623,
+ 0.3575520046180837,
+ 0.39231742277876036,
+ 0.4262647547020979,
+ 0.4594316186372973,
+ 0.4918530963296748,
+ 0.5235619560570128,
+ 0.5545888516776374,
+ 0.5849625007211563,
+ 0.6147098441152082,
+ 0.6438561897747247,
+ 0.6724253419714956,
+ 0.7004397181410922,
+ 0.7279204545631992,
+ 0.7548875021634686,
+ 0.7813597135246596,
+ 0.8073549220576042,
+ 0.8328900141647417,
+ 0.8579809951275721,
+ 0.8826430493618412,
+ 0.9068905956085185,
+ 0.9307373375628862,
+ 0.9541963103868752,
+ 0.9772799234999164,
+ 1,
+}
diff --git a/src/runtime/float.go b/src/runtime/float.go
new file mode 100644
index 0000000..459e58d
--- /dev/null
+++ b/src/runtime/float.go
@@ -0,0 +1,53 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+var inf = float64frombits(0x7FF0000000000000)
+
+// isNaN reports whether f is an IEEE 754 ``not-a-number'' value.
+func isNaN(f float64) (is bool) {
+ // IEEE 754 says that only NaNs satisfy f != f.
+ return f != f
+}
+
+// isFinite reports whether f is neither NaN nor an infinity.
+func isFinite(f float64) bool {
+ return !isNaN(f - f)
+}
+
+// isInf reports whether f is an infinity.
+func isInf(f float64) bool {
+ return !isNaN(f) && !isFinite(f)
+}
+
+// abs returns the absolute value of x.
+//
+// Special cases are:
+//	abs(±Inf) = +Inf
+//	abs(NaN) = NaN
+func abs(x float64) float64 {
+ const sign = 1 << 63
+ return float64frombits(float64bits(x) &^ sign)
+}
+
+// copysign returns a value with the magnitude
+// of x and the sign of y.
+func copysign(x, y float64) float64 {
+ const sign = 1 << 63
+ return float64frombits(float64bits(x)&^sign | float64bits(y)&sign)
+}
+
+// float64bits returns the IEEE 754 binary representation of f.
+func float64bits(f float64) uint64 {
+ return *(*uint64)(unsafe.Pointer(&f))
+}
+
+// float64frombits returns the floating-point number corresponding
+// to the IEEE 754 binary representation b.
+func float64frombits(b uint64) float64 {
+ return *(*float64)(unsafe.Pointer(&b))
+}
diff --git a/src/runtime/funcdata.h b/src/runtime/funcdata.h
new file mode 100644
index 0000000..798dbac
--- /dev/null
+++ b/src/runtime/funcdata.h
@@ -0,0 +1,52 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file defines the IDs for PCDATA and FUNCDATA instructions
+// in Go binaries. It is included by assembly sources, so it must
+// be written using #defines.
+//
+// These must agree with symtab.go and ../cmd/internal/objabi/funcdata.go.
+
+#define PCDATA_UnsafePoint 0
+#define PCDATA_StackMapIndex 1
+#define PCDATA_InlTreeIndex 2
+
+#define FUNCDATA_ArgsPointerMaps 0 /* garbage collector blocks */
+#define FUNCDATA_LocalsPointerMaps 1
+#define FUNCDATA_StackObjects 2
+#define FUNCDATA_InlTree 3
+#define FUNCDATA_OpenCodedDeferInfo 4 /* info for func with open-coded defers */
+
+// Pseudo-assembly statements.
+
+// GO_ARGS, GO_RESULTS_INITIALIZED, and NO_LOCAL_POINTERS are macros
+// that communicate to the runtime information about the location and liveness
+// of pointers in an assembly function's arguments, results, and stack frame.
+// This communication is only required in assembly functions that make calls
+// to other functions that might be preempted or grow the stack.
+// NOSPLIT functions that make no calls do not need to use these macros.
+
+// GO_ARGS indicates that the Go prototype for this assembly function
+// defines the pointer map for the function's arguments.
+// GO_ARGS should be the first instruction in a function that uses it.
+// It can be omitted if there are no arguments at all.
+// GO_ARGS is inserted implicitly by the linker for any function whose
+// name starts with a middle-dot and that also has a Go prototype; it
+// is therefore usually not necessary to write explicitly.
+#define GO_ARGS FUNCDATA $FUNCDATA_ArgsPointerMaps, go_args_stackmap(SB)
+
+// GO_RESULTS_INITIALIZED indicates that the assembly function
+// has initialized the stack space for its results and that those results
+// should be considered live for the remainder of the function.
+#define GO_RESULTS_INITIALIZED PCDATA $PCDATA_StackMapIndex, $1
+
+// NO_LOCAL_POINTERS indicates that the assembly function stores
+// no pointers to heap objects in its local stack variables.
+#define NO_LOCAL_POINTERS FUNCDATA $FUNCDATA_LocalsPointerMaps, runtime·no_pointers_stackmap(SB)
+
+// ArgsSizeUnknown is set in Func.argsize to mark all functions
+// whose argument size is unknown (C vararg functions, and
+// assembly code without an explicit specification).
+// This value is generated by the compiler, assembler, or linker.
+#define ArgsSizeUnknown 0x80000000
diff --git a/src/runtime/futex_test.go b/src/runtime/futex_test.go
new file mode 100644
index 0000000..3051bd5
--- /dev/null
+++ b/src/runtime/futex_test.go
@@ -0,0 +1,88 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Futex is only available on DragonFly BSD, FreeBSD and Linux.
+// The race detector emits calls to split stack functions so it breaks
+// the test.
+
+// +build dragonfly freebsd linux
+// +build !race
+
+package runtime_test
+
+import (
+ "runtime"
+ "sync"
+ "sync/atomic"
+ "testing"
+ "time"
+)
+
+type futexsleepTest struct {
+ mtx uint32
+ ns int64
+ msg string
+ ch chan *futexsleepTest
+}
+
+var futexsleepTests = []futexsleepTest{
+ beforeY2038: {mtx: 0, ns: 86400 * 1e9, msg: "before the year 2038"},
+ afterY2038: {mtx: 0, ns: (1<<31 + 100) * 1e9, msg: "after the year 2038"},
+}
+
+const (
+ beforeY2038 = iota
+ afterY2038
+)
+
+func TestFutexsleep(t *testing.T) {
+ if runtime.GOMAXPROCS(0) > 1 {
+ // futexsleep doesn't handle EINTR or other signals,
+ // so spurious wakeups may happen.
+ t.Skip("skipping; GOMAXPROCS>1")
+ }
+
+ start := time.Now()
+ var wg sync.WaitGroup
+ for i := range futexsleepTests {
+ tt := &futexsleepTests[i]
+ tt.mtx = 0
+ tt.ch = make(chan *futexsleepTest, 1)
+ wg.Add(1)
+ go func(tt *futexsleepTest) {
+ runtime.Entersyscall()
+ runtime.Futexsleep(&tt.mtx, 0, tt.ns)
+ runtime.Exitsyscall()
+ tt.ch <- tt
+ wg.Done()
+ }(tt)
+ }
+loop:
+ for {
+ select {
+ case tt := <-futexsleepTests[beforeY2038].ch:
+ t.Errorf("futexsleep test %q finished early after %s", tt.msg, time.Since(start))
+ break loop
+ case tt := <-futexsleepTests[afterY2038].ch:
+ // Looks like FreeBSD 10 kernel has changed
+ // the semantics of timedwait on userspace
+ // mutex to make broken stuff look broken.
+ switch {
+ case runtime.GOOS == "freebsd" && runtime.GOARCH == "386":
+ t.Log("freebsd/386 may not work correctly after the year 2038, see golang.org/issue/7194")
+ default:
+ t.Errorf("futexsleep test %q finished early after %s", tt.msg, time.Since(start))
+ break loop
+ }
+ case <-time.After(time.Second):
+ break loop
+ }
+ }
+ for i := range futexsleepTests {
+ tt := &futexsleepTests[i]
+ atomic.StoreUint32(&tt.mtx, 1)
+ runtime.Futexwakeup(&tt.mtx, 1)
+ }
+ wg.Wait()
+}
diff --git a/src/runtime/gc_test.go b/src/runtime/gc_test.go
new file mode 100644
index 0000000..7870f31
--- /dev/null
+++ b/src/runtime/gc_test.go
@@ -0,0 +1,802 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "math/rand"
+ "os"
+ "reflect"
+ "runtime"
+ "runtime/debug"
+ "sort"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+func TestGcSys(t *testing.T) {
+ if os.Getenv("GOGC") == "off" {
+ t.Skip("skipping test; GOGC=off in environment")
+ }
+ got := runTestProg(t, "testprog", "GCSys")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got %q", want, got)
+ }
+}
+
+func TestGcDeepNesting(t *testing.T) {
+ type T [2][2][2][2][2][2][2][2][2][2]*int
+ a := new(T)
+
+ // Prevent the compiler from applying escape analysis.
+ // This makes sure new(T) is allocated on the heap, not on the stack.
+ t.Logf("%p", a)
+
+ a[0][0][0][0][0][0][0][0][0][0] = new(int)
+ *a[0][0][0][0][0][0][0][0][0][0] = 13
+ runtime.GC()
+ if *a[0][0][0][0][0][0][0][0][0][0] != 13 {
+ t.Fail()
+ }
+}
+
+func TestGcMapIndirection(t *testing.T) {
+ defer debug.SetGCPercent(debug.SetGCPercent(1))
+ runtime.GC()
+ type T struct {
+ a [256]int
+ }
+ m := make(map[T]T)
+ for i := 0; i < 2000; i++ {
+ var a T
+ a.a[0] = i
+ m[a] = T{}
+ }
+}
+
+func TestGcArraySlice(t *testing.T) {
+ type X struct {
+ buf [1]byte
+ nextbuf []byte
+ next *X
+ }
+ var head *X
+ for i := 0; i < 10; i++ {
+ p := &X{}
+ p.buf[0] = 42
+ p.next = head
+ if head != nil {
+ p.nextbuf = head.buf[:]
+ }
+ head = p
+ runtime.GC()
+ }
+ for p := head; p != nil; p = p.next {
+ if p.buf[0] != 42 {
+ t.Fatal("corrupted heap")
+ }
+ }
+}
+
+func TestGcRescan(t *testing.T) {
+ type X struct {
+ c chan error
+ nextx *X
+ }
+ type Y struct {
+ X
+ nexty *Y
+ p *int
+ }
+ var head *Y
+ for i := 0; i < 10; i++ {
+ p := &Y{}
+ p.c = make(chan error)
+ if head != nil {
+ p.nextx = &head.X
+ }
+ p.nexty = head
+ p.p = new(int)
+ *p.p = 42
+ head = p
+ runtime.GC()
+ }
+ for p := head; p != nil; p = p.nexty {
+ if *p.p != 42 {
+ t.Fatal("corrupted heap")
+ }
+ }
+}
+
+func TestGcLastTime(t *testing.T) {
+ ms := new(runtime.MemStats)
+ t0 := time.Now().UnixNano()
+ runtime.GC()
+ t1 := time.Now().UnixNano()
+ runtime.ReadMemStats(ms)
+ last := int64(ms.LastGC)
+ if t0 > last || last > t1 {
+ t.Fatalf("bad last GC time: got %v, want [%v, %v]", last, t0, t1)
+ }
+ pause := ms.PauseNs[(ms.NumGC+255)%256]
+ // Due to timer granularity, pause can actually be 0 on Windows
+ // or in virtualized environments.
+ if pause == 0 {
+ t.Logf("last GC pause was 0")
+ } else if pause > 10e9 {
+ t.Logf("bad last GC pause: got %v, want [0, 10e9]", pause)
+ }
+}
+
+var hugeSink interface{}
+
+func TestHugeGCInfo(t *testing.T) {
+ // The test ensures that the compiler can chew through these huge types even on the weakest machines.
+ // The types are not allocated at runtime.
+ if hugeSink != nil {
+ // 400MB on 32-bit builds, 4TB on 64-bit builds.
+ const n = (400 << 20) + (unsafe.Sizeof(uintptr(0))-4)<<40
+ hugeSink = new([n]*byte)
+ hugeSink = new([n]uintptr)
+ hugeSink = new(struct {
+ x float64
+ y [n]*byte
+ z []string
+ })
+ hugeSink = new(struct {
+ x float64
+ y [n]uintptr
+ z []string
+ })
+ }
+}
+
+func TestPeriodicGC(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no sysmon on wasm yet")
+ }
+
+ // Make sure we're not in the middle of a GC.
+ runtime.GC()
+
+ var ms1, ms2 runtime.MemStats
+ runtime.ReadMemStats(&ms1)
+
+ // Make periodic GC run continuously.
+ orig := *runtime.ForceGCPeriod
+ *runtime.ForceGCPeriod = 0
+
+ // Let some periodic GCs happen. In a heavily loaded system,
+ // it's possible these will be delayed, so this is designed to
+ // succeed quickly if things are working, but to give it some
+ // slack if things are slow.
+ var numGCs uint32
+ const want = 2
+ for i := 0; i < 200 && numGCs < want; i++ {
+ time.Sleep(5 * time.Millisecond)
+
+ // Test that periodic GC actually happened.
+ runtime.ReadMemStats(&ms2)
+ numGCs = ms2.NumGC - ms1.NumGC
+ }
+ *runtime.ForceGCPeriod = orig
+
+ if numGCs < want {
+ t.Fatalf("no periodic GC: got %v GCs, want >= 2", numGCs)
+ }
+}
+
+func TestGcZombieReporting(t *testing.T) {
+ // This test is somewhat sensitive to how the allocator works.
+ got := runTestProg(t, "testprog", "GCZombie")
+ want := "found pointer to free object"
+ if !strings.Contains(got, want) {
+ t.Fatalf("expected %q in output, but got %q", want, got)
+ }
+}
+
+func BenchmarkSetTypePtr(b *testing.B) {
+ benchSetType(b, new(*byte))
+}
+
+func BenchmarkSetTypePtr8(b *testing.B) {
+ benchSetType(b, new([8]*byte))
+}
+
+func BenchmarkSetTypePtr16(b *testing.B) {
+ benchSetType(b, new([16]*byte))
+}
+
+func BenchmarkSetTypePtr32(b *testing.B) {
+ benchSetType(b, new([32]*byte))
+}
+
+func BenchmarkSetTypePtr64(b *testing.B) {
+ benchSetType(b, new([64]*byte))
+}
+
+func BenchmarkSetTypePtr126(b *testing.B) {
+ benchSetType(b, new([126]*byte))
+}
+
+func BenchmarkSetTypePtr128(b *testing.B) {
+ benchSetType(b, new([128]*byte))
+}
+
+func BenchmarkSetTypePtrSlice(b *testing.B) {
+ benchSetType(b, make([]*byte, 1<<10))
+}
+
+type Node1 struct {
+ Value [1]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode1(b *testing.B) {
+ benchSetType(b, new(Node1))
+}
+
+func BenchmarkSetTypeNode1Slice(b *testing.B) {
+ benchSetType(b, make([]Node1, 32))
+}
+
+type Node8 struct {
+ Value [8]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode8(b *testing.B) {
+ benchSetType(b, new(Node8))
+}
+
+func BenchmarkSetTypeNode8Slice(b *testing.B) {
+ benchSetType(b, make([]Node8, 32))
+}
+
+type Node64 struct {
+ Value [64]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode64(b *testing.B) {
+ benchSetType(b, new(Node64))
+}
+
+func BenchmarkSetTypeNode64Slice(b *testing.B) {
+ benchSetType(b, make([]Node64, 32))
+}
+
+type Node64Dead struct {
+ Left, Right *byte
+ Value [64]uintptr
+}
+
+func BenchmarkSetTypeNode64Dead(b *testing.B) {
+ benchSetType(b, new(Node64Dead))
+}
+
+func BenchmarkSetTypeNode64DeadSlice(b *testing.B) {
+ benchSetType(b, make([]Node64Dead, 32))
+}
+
+type Node124 struct {
+ Value [124]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode124(b *testing.B) {
+ benchSetType(b, new(Node124))
+}
+
+func BenchmarkSetTypeNode124Slice(b *testing.B) {
+ benchSetType(b, make([]Node124, 32))
+}
+
+type Node126 struct {
+ Value [126]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode126(b *testing.B) {
+ benchSetType(b, new(Node126))
+}
+
+func BenchmarkSetTypeNode126Slice(b *testing.B) {
+ benchSetType(b, make([]Node126, 32))
+}
+
+type Node128 struct {
+ Value [128]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode128(b *testing.B) {
+ benchSetType(b, new(Node128))
+}
+
+func BenchmarkSetTypeNode128Slice(b *testing.B) {
+ benchSetType(b, make([]Node128, 32))
+}
+
+type Node130 struct {
+ Value [130]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode130(b *testing.B) {
+ benchSetType(b, new(Node130))
+}
+
+func BenchmarkSetTypeNode130Slice(b *testing.B) {
+ benchSetType(b, make([]Node130, 32))
+}
+
+type Node1024 struct {
+ Value [1024]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode1024(b *testing.B) {
+ benchSetType(b, new(Node1024))
+}
+
+func BenchmarkSetTypeNode1024Slice(b *testing.B) {
+ benchSetType(b, make([]Node1024, 32))
+}
+
+func benchSetType(b *testing.B, x interface{}) {
+ v := reflect.ValueOf(x)
+ t := v.Type()
+ switch t.Kind() {
+ case reflect.Ptr:
+ b.SetBytes(int64(t.Elem().Size()))
+ case reflect.Slice:
+ b.SetBytes(int64(t.Elem().Size()) * int64(v.Len()))
+ }
+ b.ResetTimer()
+ runtime.BenchSetType(b.N, x)
+}
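+
+// b.SetBytes above records the size of the value being typed (the element
+// size, multiplied by the length for slices), so the testing package also
+// reports these benchmarks as MB/s relative to that payload, making the
+// Node* and slice variants roughly comparable.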
+
+func BenchmarkAllocation(b *testing.B) {
+ type T struct {
+ x, y *byte
+ }
+ ngo := runtime.GOMAXPROCS(0)
+ work := make(chan bool, b.N+ngo)
+ result := make(chan *T)
+ for i := 0; i < b.N; i++ {
+ work <- true
+ }
+ for i := 0; i < ngo; i++ {
+ work <- false
+ }
+ for i := 0; i < ngo; i++ {
+ go func() {
+ var x *T
+ for <-work {
+ for i := 0; i < 1000; i++ {
+ x = &T{}
+ }
+ }
+ result <- x
+ }()
+ }
+ for i := 0; i < ngo; i++ {
+ <-result
+ }
+}
+
+func TestPrintGC(t *testing.T) {
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(2))
+ done := make(chan bool)
+ go func() {
+ for {
+ select {
+ case <-done:
+ return
+ default:
+ runtime.GC()
+ }
+ }
+ }()
+ for i := 0; i < 1e4; i++ {
+ func() {
+ defer print("")
+ }()
+ }
+ close(done)
+}
+
+func testTypeSwitch(x interface{}) error {
+ switch y := x.(type) {
+ case nil:
+ // ok
+ case error:
+ return y
+ }
+ return nil
+}
+
+func testAssert(x interface{}) error {
+ if y, ok := x.(error); ok {
+ return y
+ }
+ return nil
+}
+
+func testAssertVar(x interface{}) error {
+ var y, ok = x.(error)
+ if ok {
+ return y
+ }
+ return nil
+}
+
+var a bool
+
+//go:noinline
+func testIfaceEqual(x interface{}) {
+ if x == "abc" {
+ a = true
+ }
+}
+
+func TestPageAccounting(t *testing.T) {
+ // Grow the heap in small increments. This used to drop the
+ // pages-in-use count below zero because of a rounding
+ // mismatch (golang.org/issue/15022).
+ const blockSize = 64 << 10
+ blocks := make([]*[blockSize]byte, (64<<20)/blockSize)
+ for i := range blocks {
+ blocks[i] = new([blockSize]byte)
+ }
+
+ // Check that the running page count matches reality.
+ pagesInUse, counted := runtime.CountPagesInUse()
+ if pagesInUse != counted {
+ t.Fatalf("mheap_.pagesInUse is %d, but direct count is %d", pagesInUse, counted)
+ }
+}
+
+func TestReadMemStats(t *testing.T) {
+ base, slow := runtime.ReadMemStatsSlow()
+ if base != slow {
+ logDiff(t, "MemStats", reflect.ValueOf(base), reflect.ValueOf(slow))
+ t.Fatal("memstats mismatch")
+ }
+}
+
+func logDiff(t *testing.T, prefix string, got, want reflect.Value) {
+ typ := got.Type()
+ switch typ.Kind() {
+ case reflect.Array, reflect.Slice:
+ if got.Len() != want.Len() {
+ t.Logf("len(%s): got %v, want %v", prefix, got, want)
+ return
+ }
+ for i := 0; i < got.Len(); i++ {
+ logDiff(t, fmt.Sprintf("%s[%d]", prefix, i), got.Index(i), want.Index(i))
+ }
+ case reflect.Struct:
+ for i := 0; i < typ.NumField(); i++ {
+ gf, wf := got.Field(i), want.Field(i)
+ logDiff(t, prefix+"."+typ.Field(i).Name, gf, wf)
+ }
+ case reflect.Map:
+ t.Fatal("not implemented: logDiff for map")
+ default:
+ if got.Interface() != want.Interface() {
+ t.Logf("%s: got %v, want %v", prefix, got, want)
+ }
+ }
+}
+
+func BenchmarkReadMemStats(b *testing.B) {
+ var ms runtime.MemStats
+ const heapSize = 100 << 20
+ x := make([]*[1024]byte, heapSize/1024)
+ for i := range x {
+ x[i] = new([1024]byte)
+ }
+ hugeSink = x
+
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ runtime.ReadMemStats(&ms)
+ }
+
+ hugeSink = nil
+}
+
+func applyGCLoad(b *testing.B) func() {
+ // We'll apply load to the runtime with maxProcs-1 goroutines
+ // and use one more to actually benchmark. It doesn't make sense
+ // to try to run this test with only 1 P (that's what
+ // BenchmarkReadMemStats is for).
+ maxProcs := runtime.GOMAXPROCS(-1)
+ if maxProcs == 1 {
+ b.Skip("This benchmark can only be run with GOMAXPROCS > 1")
+ }
+
+ // Code to build a big tree with lots of pointers.
+ type node struct {
+ children [16]*node
+ }
+ var buildTree func(depth int) *node
+ buildTree = func(depth int) *node {
+ tree := new(node)
+ if depth != 0 {
+ for i := range tree.children {
+ tree.children[i] = buildTree(depth - 1)
+ }
+ }
+ return tree
+ }
+
+ // Keep the GC busy by continuously generating large trees.
+ done := make(chan struct{})
+ var wg sync.WaitGroup
+ for i := 0; i < maxProcs-1; i++ {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ var hold *node
+ loop:
+ for {
+ hold = buildTree(5)
+ select {
+ case <-done:
+ break loop
+ default:
+ }
+ }
+ runtime.KeepAlive(hold)
+ }()
+ }
+ return func() {
+ close(done)
+ wg.Wait()
+ }
+}
+
+func BenchmarkReadMemStatsLatency(b *testing.B) {
+ stop := applyGCLoad(b)
+
+ // Accumulate the measured latencies here.
+ latencies := make([]time.Duration, 0, 1024)
+
+ // Hit ReadMemStats continuously for b.N iterations,
+ // measuring the latency of each call.
+ b.ResetTimer()
+ var ms runtime.MemStats
+ for i := 0; i < b.N; i++ {
+ // Sleep for a bit, otherwise we're just going to keep
+ // stopping the world and no one will get to do anything.
+ time.Sleep(100 * time.Millisecond)
+ start := time.Now()
+ runtime.ReadMemStats(&ms)
+ latencies = append(latencies, time.Since(start))
+ }
+ // Make sure to stop the timer before we wait! The load created above
+ // is very heavy-weight and not easy to stop, so we could end up
+ // confusing the benchmarking framework for small b.N.
+ b.StopTimer()
+ stop()
+
+ // Disable the default */op metrics.
+ // ns/op doesn't mean anything because it's an average, but we
+ // have a sleep in our b.N loop above which skews this significantly.
+ b.ReportMetric(0, "ns/op")
+ b.ReportMetric(0, "B/op")
+ b.ReportMetric(0, "allocs/op")
+
+ // Sort latencies then report percentiles.
+ sort.Slice(latencies, func(i, j int) bool {
+ return latencies[i] < latencies[j]
+ })
+ b.ReportMetric(float64(latencies[len(latencies)*50/100]), "p50-ns")
+ b.ReportMetric(float64(latencies[len(latencies)*90/100]), "p90-ns")
+ b.ReportMetric(float64(latencies[len(latencies)*99/100]), "p99-ns")
+}
+
+func TestUserForcedGC(t *testing.T) {
+ // Test that runtime.GC() triggers a GC even if GOGC=off.
+ defer debug.SetGCPercent(debug.SetGCPercent(-1))
+
+ var ms1, ms2 runtime.MemStats
+ runtime.ReadMemStats(&ms1)
+ runtime.GC()
+ runtime.ReadMemStats(&ms2)
+ if ms1.NumGC == ms2.NumGC {
+ t.Fatalf("runtime.GC() did not trigger GC")
+ }
+ if ms1.NumForcedGC == ms2.NumForcedGC {
+ t.Fatalf("runtime.GC() was not accounted in NumForcedGC")
+ }
+}
+
+func writeBarrierBenchmark(b *testing.B, f func()) {
+ runtime.GC()
+ var ms runtime.MemStats
+ runtime.ReadMemStats(&ms)
+ //b.Logf("heap size: %d MB", ms.HeapAlloc>>20)
+
+ // Keep GC running continuously during the benchmark, which in
+ // turn keeps the write barrier on continuously.
+ var stop uint32
+ done := make(chan bool)
+ go func() {
+ for atomic.LoadUint32(&stop) == 0 {
+ runtime.GC()
+ }
+ close(done)
+ }()
+ defer func() {
+ atomic.StoreUint32(&stop, 1)
+ <-done
+ }()
+
+ b.ResetTimer()
+ f()
+ b.StopTimer()
+}
+
+func BenchmarkWriteBarrier(b *testing.B) {
+ if runtime.GOMAXPROCS(-1) < 2 {
+ // We don't want GC to take our time.
+ b.Skip("need GOMAXPROCS >= 2")
+ }
+
+ // Construct a large tree both so the GC runs for a while and
+ // so we have a data structure to manipulate the pointers of.
+ type node struct {
+ l, r *node
+ }
+ var wbRoots []*node
+ var mkTree func(level int) *node
+ mkTree = func(level int) *node {
+ if level == 0 {
+ return nil
+ }
+ n := &node{mkTree(level - 1), mkTree(level - 1)}
+ if level == 10 {
+ // Seed GC with enough early pointers so it
+ // doesn't start termination barriers when it
+ // only has the top of the tree.
+ wbRoots = append(wbRoots, n)
+ }
+ return n
+ }
+ const depth = 22 // 64 MB
+ root := mkTree(depth)
+
+ writeBarrierBenchmark(b, func() {
+ var stack [depth]*node
+ tos := -1
+
+ // There are two write barriers per iteration, so i+=2.
+ for i := 0; i < b.N; i += 2 {
+ if tos == -1 {
+ stack[0] = root
+ tos = 0
+ }
+
+ // Perform one step of reversing the tree.
+ n := stack[tos]
+ if n.l == nil {
+ tos--
+ } else {
+ n.l, n.r = n.r, n.l
+ stack[tos] = n.l
+ stack[tos+1] = n.r
+ tos++
+ }
+
+ if i%(1<<12) == 0 {
+ // Avoid non-preemptible loops (see issue #10958).
+ runtime.Gosched()
+ }
+ }
+ })
+
+ runtime.KeepAlive(wbRoots)
+}
+
+func BenchmarkBulkWriteBarrier(b *testing.B) {
+ if runtime.GOMAXPROCS(-1) < 2 {
+ // We don't want GC to take our time.
+ b.Skip("need GOMAXPROCS >= 2")
+ }
+
+ // Construct a large set of objects we can copy around.
+ const heapSize = 64 << 20
+ type obj [16]*byte
+ ptrs := make([]*obj, heapSize/unsafe.Sizeof(obj{}))
+ for i := range ptrs {
+ ptrs[i] = new(obj)
+ }
+
+ writeBarrierBenchmark(b, func() {
+ const blockSize = 1024
+ var pos int
+ for i := 0; i < b.N; i += blockSize {
+ // Rotate block.
+ block := ptrs[pos : pos+blockSize]
+ first := block[0]
+ copy(block, block[1:])
+ block[blockSize-1] = first
+
+ pos += blockSize
+ if pos+blockSize > len(ptrs) {
+ pos = 0
+ }
+
+ runtime.Gosched()
+ }
+ })
+
+ runtime.KeepAlive(ptrs)
+}
+
+func BenchmarkScanStackNoLocals(b *testing.B) {
+ var ready sync.WaitGroup
+ teardown := make(chan bool)
+ for j := 0; j < 10; j++ {
+ ready.Add(1)
+ go func() {
+ x := 100000
+ countpwg(&x, &ready, teardown)
+ }()
+ }
+ ready.Wait()
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ b.StartTimer()
+ runtime.GC()
+ runtime.GC()
+ b.StopTimer()
+ }
+ close(teardown)
+}
+
+func BenchmarkMSpanCountAlloc(b *testing.B) {
+ // Allocate one dummy mspan for the whole benchmark.
+ s := runtime.AllocMSpan()
+ defer runtime.FreeMSpan(s)
+
+ // n is the number of bytes to benchmark against.
+ // n must always be a multiple of 8, since gcBits is
+ // always rounded up to a multiple of 8 bytes.
+ for _, n := range []int{8, 16, 32, 64, 128} {
+ b.Run(fmt.Sprintf("bits=%d", n*8), func(b *testing.B) {
+ // Initialize a new byte slice with pseudo-random data.
+ bits := make([]byte, n)
+ rand.Read(bits)
+
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ runtime.MSpanCountAlloc(s, bits)
+ }
+ })
+ }
+}
+
+func countpwg(n *int, ready *sync.WaitGroup, teardown chan bool) {
+ if *n == 0 {
+ ready.Done()
+ <-teardown
+ return
+ }
+ *n--
+ countpwg(n, ready, teardown)
+}
diff --git a/src/runtime/gcinfo_test.go b/src/runtime/gcinfo_test.go
new file mode 100644
index 0000000..0808b41
--- /dev/null
+++ b/src/runtime/gcinfo_test.go
@@ -0,0 +1,214 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "bytes"
+ "runtime"
+ "testing"
+)
+
+const (
+ typeScalar = 0
+ typePointer = 1
+)
+
+// TestGCInfo tests that various objects in heap, data and bss receive correct GC pointer type info.
+func TestGCInfo(t *testing.T) {
+ verifyGCInfo(t, "bss Ptr", &bssPtr, infoPtr)
+ verifyGCInfo(t, "bss ScalarPtr", &bssScalarPtr, infoScalarPtr)
+ verifyGCInfo(t, "bss PtrScalar", &bssPtrScalar, infoPtrScalar)
+ verifyGCInfo(t, "bss BigStruct", &bssBigStruct, infoBigStruct())
+ verifyGCInfo(t, "bss string", &bssString, infoString)
+ verifyGCInfo(t, "bss slice", &bssSlice, infoSlice)
+ verifyGCInfo(t, "bss eface", &bssEface, infoEface)
+ verifyGCInfo(t, "bss iface", &bssIface, infoIface)
+
+ verifyGCInfo(t, "data Ptr", &dataPtr, infoPtr)
+ verifyGCInfo(t, "data ScalarPtr", &dataScalarPtr, infoScalarPtr)
+ verifyGCInfo(t, "data PtrScalar", &dataPtrScalar, infoPtrScalar)
+ verifyGCInfo(t, "data BigStruct", &dataBigStruct, infoBigStruct())
+ verifyGCInfo(t, "data string", &dataString, infoString)
+ verifyGCInfo(t, "data slice", &dataSlice, infoSlice)
+ verifyGCInfo(t, "data eface", &dataEface, infoEface)
+ verifyGCInfo(t, "data iface", &dataIface, infoIface)
+
+ {
+ var x Ptr
+ verifyGCInfo(t, "stack Ptr", &x, infoPtr)
+ runtime.KeepAlive(x)
+ }
+ {
+ var x ScalarPtr
+ verifyGCInfo(t, "stack ScalarPtr", &x, infoScalarPtr)
+ runtime.KeepAlive(x)
+ }
+ {
+ var x PtrScalar
+ verifyGCInfo(t, "stack PtrScalar", &x, infoPtrScalar)
+ runtime.KeepAlive(x)
+ }
+ {
+ var x BigStruct
+ verifyGCInfo(t, "stack BigStruct", &x, infoBigStruct())
+ runtime.KeepAlive(x)
+ }
+ {
+ var x string
+ verifyGCInfo(t, "stack string", &x, infoString)
+ runtime.KeepAlive(x)
+ }
+ {
+ var x []string
+ verifyGCInfo(t, "stack slice", &x, infoSlice)
+ runtime.KeepAlive(x)
+ }
+ {
+ var x interface{}
+ verifyGCInfo(t, "stack eface", &x, infoEface)
+ runtime.KeepAlive(x)
+ }
+ {
+ var x Iface
+ verifyGCInfo(t, "stack iface", &x, infoIface)
+ runtime.KeepAlive(x)
+ }
+
+ for i := 0; i < 10; i++ {
+ verifyGCInfo(t, "heap Ptr", escape(new(Ptr)), trimDead(infoPtr))
+ verifyGCInfo(t, "heap PtrSlice", escape(&make([]*byte, 10)[0]), trimDead(infoPtr10))
+ verifyGCInfo(t, "heap ScalarPtr", escape(new(ScalarPtr)), trimDead(infoScalarPtr))
+ verifyGCInfo(t, "heap ScalarPtrSlice", escape(&make([]ScalarPtr, 4)[0]), trimDead(infoScalarPtr4))
+ verifyGCInfo(t, "heap PtrScalar", escape(new(PtrScalar)), trimDead(infoPtrScalar))
+ verifyGCInfo(t, "heap BigStruct", escape(new(BigStruct)), trimDead(infoBigStruct()))
+ verifyGCInfo(t, "heap string", escape(new(string)), trimDead(infoString))
+ verifyGCInfo(t, "heap eface", escape(new(interface{})), trimDead(infoEface))
+ verifyGCInfo(t, "heap iface", escape(new(Iface)), trimDead(infoIface))
+ }
+}
+
+func verifyGCInfo(t *testing.T, name string, p interface{}, mask0 []byte) {
+ mask := runtime.GCMask(p)
+ if !bytes.Equal(mask, mask0) {
+ t.Errorf("bad GC program for %v:\nwant %+v\ngot %+v", name, mask0, mask)
+ return
+ }
+}
+
+func trimDead(mask []byte) []byte {
+ for len(mask) > 0 && mask[len(mask)-1] == typeScalar {
+ mask = mask[:len(mask)-1]
+ }
+ return mask
+}
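+
+// For example, infoString is {typePointer, typeScalar}, so trimDead(infoString)
+// yields {typePointer}; the heap cases in TestGCInfo compare against trimmed
+// masks because the mask GCMask reports for a heap object stops after its
+// last pointer word.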
+
+var gcinfoSink interface{}
+
+func escape(p interface{}) interface{} {
+ gcinfoSink = p
+ return p
+}
+
+var infoPtr = []byte{typePointer}
+
+type Ptr struct {
+ *byte
+}
+
+var infoPtr10 = []byte{typePointer, typePointer, typePointer, typePointer, typePointer, typePointer, typePointer, typePointer, typePointer, typePointer}
+
+type ScalarPtr struct {
+ q int
+ w *int
+ e int
+ r *int
+ t int
+ y *int
+}
+
+var infoScalarPtr = []byte{typeScalar, typePointer, typeScalar, typePointer, typeScalar, typePointer}
+
+var infoScalarPtr4 = append(append(append(append([]byte(nil), infoScalarPtr...), infoScalarPtr...), infoScalarPtr...), infoScalarPtr...)
+
+type PtrScalar struct {
+ q *int
+ w int
+ e *int
+ r int
+ t *int
+ y int
+}
+
+var infoPtrScalar = []byte{typePointer, typeScalar, typePointer, typeScalar, typePointer, typeScalar}
+
+type BigStruct struct {
+ q *int
+ w byte
+ e [17]byte
+ r []byte
+ t int
+ y uint16
+ u uint64
+ i string
+}
+
+func infoBigStruct() []byte {
+ switch runtime.GOARCH {
+ case "386", "arm", "mips", "mipsle":
+ return []byte{
+ typePointer, // q *int
+ typeScalar, typeScalar, typeScalar, typeScalar, typeScalar, // w byte; e [17]byte
+ typePointer, typeScalar, typeScalar, // r []byte
+ typeScalar, typeScalar, typeScalar, typeScalar, // t int; y uint16; u uint64
+ typePointer, typeScalar, // i string
+ }
+ case "arm64", "amd64", "mips64", "mips64le", "ppc64", "ppc64le", "riscv64", "s390x", "wasm":
+ return []byte{
+ typePointer, // q *int
+ typeScalar, typeScalar, typeScalar, // w byte; e [17]byte
+ typePointer, typeScalar, typeScalar, // r []byte
+ typeScalar, typeScalar, typeScalar, // t int; y uint16; u uint64
+ typePointer, typeScalar, // i string
+ }
+ default:
+ panic("unknown arch")
+ }
+}
+
+type Iface interface {
+ f()
+}
+
+type IfaceImpl int
+
+func (IfaceImpl) f() {
+}
+
+var (
+ // BSS
+ bssPtr Ptr
+ bssScalarPtr ScalarPtr
+ bssPtrScalar PtrScalar
+ bssBigStruct BigStruct
+ bssString string
+ bssSlice []string
+ bssEface interface{}
+ bssIface Iface
+
+ // DATA
+ dataPtr = Ptr{new(byte)}
+ dataScalarPtr = ScalarPtr{q: 1}
+ dataPtrScalar = PtrScalar{w: 1}
+ dataBigStruct = BigStruct{w: 1}
+ dataString = "foo"
+ dataSlice = []string{"foo"}
+ dataEface interface{} = 42
+ dataIface Iface = IfaceImpl(42)
+
+ infoString = []byte{typePointer, typeScalar}
+ infoSlice = []byte{typePointer, typeScalar, typeScalar}
+ infoEface = []byte{typeScalar, typePointer}
+ infoIface = []byte{typeScalar, typePointer}
+)
diff --git a/src/runtime/go_tls.h b/src/runtime/go_tls.h
new file mode 100644
index 0000000..a47e798
--- /dev/null
+++ b/src/runtime/go_tls.h
@@ -0,0 +1,17 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#ifdef GOARCH_arm
+#define LR R14
+#endif
+
+#ifdef GOARCH_amd64
+#define get_tls(r) MOVQ TLS, r
+#define g(r) 0(r)(TLS*1)
+#endif
+
+#ifdef GOARCH_386
+#define get_tls(r) MOVL TLS, r
+#define g(r) 0(r)(TLS*1)
+#endif
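+
+// Illustrative use from Go assembly (amd64 shown; 386 is analogous with MOVL):
+//	get_tls(CX)
+//	MOVQ	g(CX), AX	// AX = current g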
diff --git a/src/runtime/hash32.go b/src/runtime/hash32.go
new file mode 100644
index 0000000..966f70e
--- /dev/null
+++ b/src/runtime/hash32.go
@@ -0,0 +1,112 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Hashing algorithm inspired by
+// xxhash: https://code.google.com/p/xxhash/
+// cityhash: https://code.google.com/p/cityhash/
+
+// +build 386 arm mips mipsle
+
+package runtime
+
+import "unsafe"
+
+const (
+ // Constants for multiplication: four random odd 32-bit numbers.
+ m1 = 3168982561
+ m2 = 3339683297
+ m3 = 832293441
+ m4 = 2336365089
+)
+
+func memhashFallback(p unsafe.Pointer, seed, s uintptr) uintptr {
+ h := uint32(seed + s*hashkey[0])
+tail:
+ switch {
+ case s == 0:
+ case s < 4:
+ h ^= uint32(*(*byte)(p))
+ h ^= uint32(*(*byte)(add(p, s>>1))) << 8
+ h ^= uint32(*(*byte)(add(p, s-1))) << 16
+ h = rotl_15(h*m1) * m2
+ case s == 4:
+ h ^= readUnaligned32(p)
+ h = rotl_15(h*m1) * m2
+ case s <= 8:
+ h ^= readUnaligned32(p)
+ h = rotl_15(h*m1) * m2
+ h ^= readUnaligned32(add(p, s-4))
+ h = rotl_15(h*m1) * m2
+ case s <= 16:
+ h ^= readUnaligned32(p)
+ h = rotl_15(h*m1) * m2
+ h ^= readUnaligned32(add(p, 4))
+ h = rotl_15(h*m1) * m2
+ h ^= readUnaligned32(add(p, s-8))
+ h = rotl_15(h*m1) * m2
+ h ^= readUnaligned32(add(p, s-4))
+ h = rotl_15(h*m1) * m2
+ default:
+ v1 := h
+ v2 := uint32(seed * hashkey[1])
+ v3 := uint32(seed * hashkey[2])
+ v4 := uint32(seed * hashkey[3])
+ for s >= 16 {
+ v1 ^= readUnaligned32(p)
+ v1 = rotl_15(v1*m1) * m2
+ p = add(p, 4)
+ v2 ^= readUnaligned32(p)
+ v2 = rotl_15(v2*m2) * m3
+ p = add(p, 4)
+ v3 ^= readUnaligned32(p)
+ v3 = rotl_15(v3*m3) * m4
+ p = add(p, 4)
+ v4 ^= readUnaligned32(p)
+ v4 = rotl_15(v4*m4) * m1
+ p = add(p, 4)
+ s -= 16
+ }
+ h = v1 ^ v2 ^ v3 ^ v4
+ goto tail
+ }
+ h ^= h >> 17
+ h *= m3
+ h ^= h >> 13
+ h *= m4
+ h ^= h >> 16
+ return uintptr(h)
+}
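+
+// Structure of memhashFallback above: inputs of 16 bytes or more are consumed
+// 16 bytes per iteration across four independent 32-bit lanes (v1..v4); the
+// lanes are then folded into h and the remaining 0-15 bytes are absorbed by
+// jumping back to the tail switch.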
+
+func memhash32Fallback(p unsafe.Pointer, seed uintptr) uintptr {
+ h := uint32(seed + 4*hashkey[0])
+ h ^= readUnaligned32(p)
+ h = rotl_15(h*m1) * m2
+ h ^= h >> 17
+ h *= m3
+ h ^= h >> 13
+ h *= m4
+ h ^= h >> 16
+ return uintptr(h)
+}
+
+func memhash64Fallback(p unsafe.Pointer, seed uintptr) uintptr {
+ h := uint32(seed + 8*hashkey[0])
+ h ^= readUnaligned32(p)
+ h = rotl_15(h*m1) * m2
+ h ^= readUnaligned32(add(p, 4))
+ h = rotl_15(h*m1) * m2
+ h ^= h >> 17
+ h *= m3
+ h ^= h >> 13
+ h *= m4
+ h ^= h >> 16
+ return uintptr(h)
+}
+
+// Note: in order to get the compiler to issue rotl instructions, we
+// need to constant fold the shift amount by hand.
+// TODO: convince the compiler to issue rotl instructions after inlining.
+func rotl_15(x uint32) uint32 {
+ return (x << 15) | (x >> (32 - 15))
+}
diff --git a/src/runtime/hash64.go b/src/runtime/hash64.go
new file mode 100644
index 0000000..d128382
--- /dev/null
+++ b/src/runtime/hash64.go
@@ -0,0 +1,108 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Hashing algorithm inspired by
+// xxhash: https://code.google.com/p/xxhash/
+// cityhash: https://code.google.com/p/cityhash/
+
+// +build amd64 arm64 mips64 mips64le ppc64 ppc64le riscv64 s390x wasm
+
+package runtime
+
+import "unsafe"
+
+const (
+ // Constants for multiplication: four random odd 64-bit numbers.
+ m1 = 16877499708836156737
+ m2 = 2820277070424839065
+ m3 = 9497967016996688599
+ m4 = 15839092249703872147
+)
+
+func memhashFallback(p unsafe.Pointer, seed, s uintptr) uintptr {
+ h := uint64(seed + s*hashkey[0])
+tail:
+ switch {
+ case s == 0:
+ case s < 4:
+ h ^= uint64(*(*byte)(p))
+ h ^= uint64(*(*byte)(add(p, s>>1))) << 8
+ h ^= uint64(*(*byte)(add(p, s-1))) << 16
+ h = rotl_31(h*m1) * m2
+ case s <= 8:
+ h ^= uint64(readUnaligned32(p))
+ h ^= uint64(readUnaligned32(add(p, s-4))) << 32
+ h = rotl_31(h*m1) * m2
+ case s <= 16:
+ h ^= readUnaligned64(p)
+ h = rotl_31(h*m1) * m2
+ h ^= readUnaligned64(add(p, s-8))
+ h = rotl_31(h*m1) * m2
+ case s <= 32:
+ h ^= readUnaligned64(p)
+ h = rotl_31(h*m1) * m2
+ h ^= readUnaligned64(add(p, 8))
+ h = rotl_31(h*m1) * m2
+ h ^= readUnaligned64(add(p, s-16))
+ h = rotl_31(h*m1) * m2
+ h ^= readUnaligned64(add(p, s-8))
+ h = rotl_31(h*m1) * m2
+ default:
+ v1 := h
+ v2 := uint64(seed * hashkey[1])
+ v3 := uint64(seed * hashkey[2])
+ v4 := uint64(seed * hashkey[3])
+ for s >= 32 {
+ v1 ^= readUnaligned64(p)
+ v1 = rotl_31(v1*m1) * m2
+ p = add(p, 8)
+ v2 ^= readUnaligned64(p)
+ v2 = rotl_31(v2*m2) * m3
+ p = add(p, 8)
+ v3 ^= readUnaligned64(p)
+ v3 = rotl_31(v3*m3) * m4
+ p = add(p, 8)
+ v4 ^= readUnaligned64(p)
+ v4 = rotl_31(v4*m4) * m1
+ p = add(p, 8)
+ s -= 32
+ }
+ h = v1 ^ v2 ^ v3 ^ v4
+ goto tail
+ }
+
+ h ^= h >> 29
+ h *= m3
+ h ^= h >> 32
+ return uintptr(h)
+}
+
+func memhash32Fallback(p unsafe.Pointer, seed uintptr) uintptr {
+ h := uint64(seed + 4*hashkey[0])
+ v := uint64(readUnaligned32(p))
+ h ^= v
+ h ^= v << 32
+ h = rotl_31(h*m1) * m2
+ h ^= h >> 29
+ h *= m3
+ h ^= h >> 32
+ return uintptr(h)
+}
+
+func memhash64Fallback(p unsafe.Pointer, seed uintptr) uintptr {
+ h := uint64(seed + 8*hashkey[0])
+ h ^= uint64(readUnaligned32(p)) | uint64(readUnaligned32(add(p, 4)))<<32
+ h = rotl_31(h*m1) * m2
+ h ^= h >> 29
+ h *= m3
+ h ^= h >> 32
+ return uintptr(h)
+}
+
+// Note: in order to get the compiler to issue rotl instructions, we
+// need to constant fold the shift amount by hand.
+// TODO: convince the compiler to issue rotl instructions after inlining.
+func rotl_31(x uint64) uint64 {
+ return (x << 31) | (x >> (64 - 31))
+}
diff --git a/src/runtime/hash_test.go b/src/runtime/hash_test.go
new file mode 100644
index 0000000..5023835
--- /dev/null
+++ b/src/runtime/hash_test.go
@@ -0,0 +1,806 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "math"
+ "math/rand"
+ "reflect"
+ . "runtime"
+ "strings"
+ "testing"
+ "unsafe"
+)
+
+func TestMemHash32Equality(t *testing.T) {
+ if *UseAeshash {
+ t.Skip("skipping since AES hash implementation is used")
+ }
+ var b [4]byte
+ r := rand.New(rand.NewSource(1234))
+ seed := uintptr(r.Uint64())
+ for i := 0; i < 100; i++ {
+ randBytes(r, b[:])
+ got := MemHash32(unsafe.Pointer(&b), seed)
+ want := MemHash(unsafe.Pointer(&b), seed, 4)
+ if got != want {
+ t.Errorf("MemHash32(%x, %v) = %v; want %v", b, seed, got, want)
+ }
+ }
+}
+
+func TestMemHash64Equality(t *testing.T) {
+ if *UseAeshash {
+ t.Skip("skipping since AES hash implementation is used")
+ }
+ var b [8]byte
+ r := rand.New(rand.NewSource(1234))
+ seed := uintptr(r.Uint64())
+ for i := 0; i < 100; i++ {
+ randBytes(r, b[:])
+ got := MemHash64(unsafe.Pointer(&b), seed)
+ want := MemHash(unsafe.Pointer(&b), seed, 8)
+ if got != want {
+ t.Errorf("MemHash64(%x, %v) = %v; want %v", b, seed, got, want)
+ }
+ }
+}
+
+func TestCompilerVsRuntimeHash(t *testing.T) {
+ // Test to make sure the compiler's hash function and the runtime's hash function agree.
+ // See issue 37716.
+ for _, m := range []interface{}{
+ map[bool]int{},
+ map[int8]int{},
+ map[uint8]int{},
+ map[int16]int{},
+ map[uint16]int{},
+ map[int32]int{},
+ map[uint32]int{},
+ map[int64]int{},
+ map[uint64]int{},
+ map[int]int{},
+ map[uint]int{},
+ map[uintptr]int{},
+ map[*byte]int{},
+ map[chan int]int{},
+ map[unsafe.Pointer]int{},
+ map[float32]int{},
+ map[float64]int{},
+ map[complex64]int{},
+ map[complex128]int{},
+ map[string]int{},
+ //map[interface{}]int{},
+ //map[interface{F()}]int{},
+ map[[8]uint64]int{},
+ map[[8]string]int{},
+ map[struct{ a, b, c, d int32 }]int{}, // Note: tests AMEM128
+ map[struct{ a, b, _, d int32 }]int{},
+ map[struct {
+ a, b int32
+ c float32
+ d, e [8]byte
+ }]int{},
+ map[struct {
+ a int16
+ b int64
+ }]int{},
+ } {
+ k := reflect.New(reflect.TypeOf(m).Key()).Elem().Interface() // the zero key
+ x, y := MapHashCheck(m, k)
+ if x != y {
+ t.Errorf("hashes did not match (%x vs %x) for map %T", x, y, m)
+ }
+ }
+}
+
+// Smhasher is a torture test for hash functions.
+// https://code.google.com/p/smhasher/
+// This code is a port of some of the Smhasher tests to Go.
+//
+// The current AES hash function passes Smhasher. Our fallback
+// hash functions don't, so we only enable the difficult tests when
+// we know the AES implementation is available.
+
+// Sanity checks.
+// hash should not depend on values outside key.
+// hash should not depend on alignment.
+func TestSmhasherSanity(t *testing.T) {
+ r := rand.New(rand.NewSource(1234))
+ const REP = 10
+ const KEYMAX = 128
+ const PAD = 16
+ const OFFMAX = 16
+ for k := 0; k < REP; k++ {
+ for n := 0; n < KEYMAX; n++ {
+ for i := 0; i < OFFMAX; i++ {
+ var b [KEYMAX + OFFMAX + 2*PAD]byte
+ var c [KEYMAX + OFFMAX + 2*PAD]byte
+ randBytes(r, b[:])
+ randBytes(r, c[:])
+ copy(c[PAD+i:PAD+i+n], b[PAD:PAD+n])
+ if BytesHash(b[PAD:PAD+n], 0) != BytesHash(c[PAD+i:PAD+i+n], 0) {
+ t.Errorf("hash depends on bytes outside key")
+ }
+ }
+ }
+ }
+}
+
+type HashSet struct {
+ m map[uintptr]struct{} // set of hashes added
+ n int // number of hashes added
+}
+
+func newHashSet() *HashSet {
+ return &HashSet{make(map[uintptr]struct{}), 0}
+}
+func (s *HashSet) add(h uintptr) {
+ s.m[h] = struct{}{}
+ s.n++
+}
+func (s *HashSet) addS(x string) {
+ s.add(StringHash(x, 0))
+}
+func (s *HashSet) addB(x []byte) {
+ s.add(BytesHash(x, 0))
+}
+func (s *HashSet) addS_seed(x string, seed uintptr) {
+ s.add(StringHash(x, seed))
+}
+func (s *HashSet) check(t *testing.T) {
+ const SLOP = 50.0
+ collisions := s.n - len(s.m)
+ pairs := int64(s.n) * int64(s.n-1) / 2
+ expected := float64(pairs) / math.Pow(2.0, float64(hashSize))
+ stddev := math.Sqrt(expected)
+ if float64(collisions) > expected+SLOP*(3*stddev+1) {
+ t.Errorf("unexpected number of collisions: got=%d mean=%f stddev=%f threshold=%f", collisions, expected, stddev, expected+SLOP*(3*stddev+1))
+ }
+}
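+
+// For intuition (illustrative numbers only): hashing n = 65536 distinct
+// inputs gives about 2.1e9 pairs, so with a 64-bit hash the expected number
+// of colliding pairs is about 2.1e9/2^64 ~= 1e-10 and the threshold above
+// allows roughly SLOP = 50 collisions; with a 32-bit hash the expectation is
+// about 0.5 and the threshold is roughly 150.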
+
+// Appending zeros to a string must produce distinct hashes.
+func TestSmhasherAppendedZeros(t *testing.T) {
+ s := "hello" + strings.Repeat("\x00", 256)
+ h := newHashSet()
+ for i := 0; i <= len(s); i++ {
+ h.addS(s[:i])
+ }
+ h.check(t)
+}
+
+// All 0-3 byte strings have distinct hashes.
+func TestSmhasherSmallKeys(t *testing.T) {
+ h := newHashSet()
+ var b [3]byte
+ for i := 0; i < 256; i++ {
+ b[0] = byte(i)
+ h.addB(b[:1])
+ for j := 0; j < 256; j++ {
+ b[1] = byte(j)
+ h.addB(b[:2])
+ if !testing.Short() {
+ for k := 0; k < 256; k++ {
+ b[2] = byte(k)
+ h.addB(b[:3])
+ }
+ }
+ }
+ }
+ h.check(t)
+}
+
+// Different length strings of all zeros have distinct hashes.
+func TestSmhasherZeros(t *testing.T) {
+ N := 256 * 1024
+ if testing.Short() {
+ N = 1024
+ }
+ h := newHashSet()
+ b := make([]byte, N)
+ for i := 0; i <= N; i++ {
+ h.addB(b[:i])
+ }
+ h.check(t)
+}
+
+// Strings with up to two nonzero bytes all have distinct hashes.
+func TestSmhasherTwoNonzero(t *testing.T) {
+ if GOARCH == "wasm" {
+ t.Skip("Too slow on wasm")
+ }
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ h := newHashSet()
+ for n := 2; n <= 16; n++ {
+ twoNonZero(h, n)
+ }
+ h.check(t)
+}
+func twoNonZero(h *HashSet, n int) {
+ b := make([]byte, n)
+
+ // all zero
+ h.addB(b)
+
+ // one non-zero byte
+ for i := 0; i < n; i++ {
+ for x := 1; x < 256; x++ {
+ b[i] = byte(x)
+ h.addB(b)
+ b[i] = 0
+ }
+ }
+
+ // two non-zero bytes
+ for i := 0; i < n; i++ {
+ for x := 1; x < 256; x++ {
+ b[i] = byte(x)
+ for j := i + 1; j < n; j++ {
+ for y := 1; y < 256; y++ {
+ b[j] = byte(y)
+ h.addB(b)
+ b[j] = 0
+ }
+ }
+ b[i] = 0
+ }
+ }
+}
+
+// Test strings with repeats, like "abcdabcdabcdabcd..."
+func TestSmhasherCyclic(t *testing.T) {
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ r := rand.New(rand.NewSource(1234))
+ const REPEAT = 8
+ const N = 1000000
+ for n := 4; n <= 12; n++ {
+ h := newHashSet()
+ b := make([]byte, REPEAT*n)
+ for i := 0; i < N; i++ {
+ b[0] = byte(i * 79 % 97)
+ b[1] = byte(i * 43 % 137)
+ b[2] = byte(i * 151 % 197)
+ b[3] = byte(i * 199 % 251)
+ randBytes(r, b[4:n])
+ for j := n; j < n*REPEAT; j++ {
+ b[j] = b[j-n]
+ }
+ h.addB(b)
+ }
+ h.check(t)
+ }
+}
+
+// Test strings with only a few bits set
+func TestSmhasherSparse(t *testing.T) {
+ if GOARCH == "wasm" {
+ t.Skip("Too slow on wasm")
+ }
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ sparse(t, 32, 6)
+ sparse(t, 40, 6)
+ sparse(t, 48, 5)
+ sparse(t, 56, 5)
+ sparse(t, 64, 5)
+ sparse(t, 96, 4)
+ sparse(t, 256, 3)
+ sparse(t, 2048, 2)
+}
+func sparse(t *testing.T, n int, k int) {
+ b := make([]byte, n/8)
+ h := newHashSet()
+ setbits(h, b, 0, k)
+ h.check(t)
+}
+
+// set up to k bits at index i and greater
+func setbits(h *HashSet, b []byte, i int, k int) {
+ h.addB(b)
+ if k == 0 {
+ return
+ }
+ for j := i; j < len(b)*8; j++ {
+ b[j/8] |= byte(1 << uint(j&7))
+ setbits(h, b, j+1, k-1)
+ b[j/8] &= byte(^(1 << uint(j&7)))
+ }
+}
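+
+// For example, sparse(t, 32, 6) hashes every 32-bit key with at most six bits
+// set: the sum of C(32,i) for i = 0..6, about 1.1 million distinct keys.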
+
+// Test all possible combinations of n blocks from the set s.
+// "permutation" is a bad name here, but it is what Smhasher uses.
+func TestSmhasherPermutation(t *testing.T) {
+ if GOARCH == "wasm" {
+ t.Skip("Too slow on wasm")
+ }
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ permutation(t, []uint32{0, 1, 2, 3, 4, 5, 6, 7}, 8)
+ permutation(t, []uint32{0, 1 << 29, 2 << 29, 3 << 29, 4 << 29, 5 << 29, 6 << 29, 7 << 29}, 8)
+ permutation(t, []uint32{0, 1}, 20)
+ permutation(t, []uint32{0, 1 << 31}, 20)
+ permutation(t, []uint32{0, 1, 2, 3, 4, 5, 6, 7, 1 << 29, 2 << 29, 3 << 29, 4 << 29, 5 << 29, 6 << 29, 7 << 29}, 6)
+}
+func permutation(t *testing.T, s []uint32, n int) {
+ b := make([]byte, n*4)
+ h := newHashSet()
+ genPerm(h, b, s, 0)
+ h.check(t)
+}
+func genPerm(h *HashSet, b []byte, s []uint32, n int) {
+ h.addB(b[:n])
+ if n == len(b) {
+ return
+ }
+ for _, v := range s {
+ b[n] = byte(v)
+ b[n+1] = byte(v >> 8)
+ b[n+2] = byte(v >> 16)
+ b[n+3] = byte(v >> 24)
+ genPerm(h, b, s, n+4)
+ }
+}
+
+type Key interface {
+ clear() // set bits all to 0
+ random(r *rand.Rand) // set key to something random
+ bits() int // how many bits key has
+ flipBit(i int) // flip bit i of the key
+ hash() uintptr // hash the key
+ name() string // for error reporting
+}
+
+type BytesKey struct {
+ b []byte
+}
+
+func (k *BytesKey) clear() {
+ for i := range k.b {
+ k.b[i] = 0
+ }
+}
+func (k *BytesKey) random(r *rand.Rand) {
+ randBytes(r, k.b)
+}
+func (k *BytesKey) bits() int {
+ return len(k.b) * 8
+}
+func (k *BytesKey) flipBit(i int) {
+ k.b[i>>3] ^= byte(1 << uint(i&7))
+}
+func (k *BytesKey) hash() uintptr {
+ return BytesHash(k.b, 0)
+}
+func (k *BytesKey) name() string {
+ return fmt.Sprintf("bytes%d", len(k.b))
+}
+
+type Int32Key struct {
+ i uint32
+}
+
+func (k *Int32Key) clear() {
+ k.i = 0
+}
+func (k *Int32Key) random(r *rand.Rand) {
+ k.i = r.Uint32()
+}
+func (k *Int32Key) bits() int {
+ return 32
+}
+func (k *Int32Key) flipBit(i int) {
+ k.i ^= 1 << uint(i)
+}
+func (k *Int32Key) hash() uintptr {
+ return Int32Hash(k.i, 0)
+}
+func (k *Int32Key) name() string {
+ return "int32"
+}
+
+type Int64Key struct {
+ i uint64
+}
+
+func (k *Int64Key) clear() {
+ k.i = 0
+}
+func (k *Int64Key) random(r *rand.Rand) {
+ k.i = uint64(r.Uint32()) + uint64(r.Uint32())<<32
+}
+func (k *Int64Key) bits() int {
+ return 64
+}
+func (k *Int64Key) flipBit(i int) {
+ k.i ^= 1 << uint(i)
+}
+func (k *Int64Key) hash() uintptr {
+ return Int64Hash(k.i, 0)
+}
+func (k *Int64Key) name() string {
+ return "int64"
+}
+
+type EfaceKey struct {
+ i interface{}
+}
+
+func (k *EfaceKey) clear() {
+ k.i = nil
+}
+func (k *EfaceKey) random(r *rand.Rand) {
+ k.i = uint64(r.Int63())
+}
+func (k *EfaceKey) bits() int {
+ // use 64 bits. This tests inlined interfaces
+ // on 64-bit targets and indirect interfaces on
+ // 32-bit targets.
+ return 64
+}
+func (k *EfaceKey) flipBit(i int) {
+ k.i = k.i.(uint64) ^ uint64(1)<<uint(i)
+}
+func (k *EfaceKey) hash() uintptr {
+ return EfaceHash(k.i, 0)
+}
+func (k *EfaceKey) name() string {
+ return "Eface"
+}
+
+type IfaceKey struct {
+ i interface {
+ F()
+ }
+}
+type fInter uint64
+
+func (x fInter) F() {
+}
+
+func (k *IfaceKey) clear() {
+ k.i = nil
+}
+func (k *IfaceKey) random(r *rand.Rand) {
+ k.i = fInter(r.Int63())
+}
+func (k *IfaceKey) bits() int {
+ // use 64 bits. This tests inlined interfaces
+ // on 64-bit targets and indirect interfaces on
+ // 32-bit targets.
+ return 64
+}
+func (k *IfaceKey) flipBit(i int) {
+ k.i = k.i.(fInter) ^ fInter(1)<<uint(i)
+}
+func (k *IfaceKey) hash() uintptr {
+ return IfaceHash(k.i, 0)
+}
+func (k *IfaceKey) name() string {
+ return "Iface"
+}
+
+// Flipping a single bit of a key should flip each output bit with 50% probability.
+func TestSmhasherAvalanche(t *testing.T) {
+ if GOARCH == "wasm" {
+ t.Skip("Too slow on wasm")
+ }
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ avalancheTest1(t, &BytesKey{make([]byte, 2)})
+ avalancheTest1(t, &BytesKey{make([]byte, 4)})
+ avalancheTest1(t, &BytesKey{make([]byte, 8)})
+ avalancheTest1(t, &BytesKey{make([]byte, 16)})
+ avalancheTest1(t, &BytesKey{make([]byte, 32)})
+ avalancheTest1(t, &BytesKey{make([]byte, 200)})
+ avalancheTest1(t, &Int32Key{})
+ avalancheTest1(t, &Int64Key{})
+ avalancheTest1(t, &EfaceKey{})
+ avalancheTest1(t, &IfaceKey{})
+}
+func avalancheTest1(t *testing.T, k Key) {
+ const REP = 100000
+ r := rand.New(rand.NewSource(1234))
+ n := k.bits()
+
+ // grid[i][j] is a count of whether flipping
+ // input bit i affects output bit j.
+ grid := make([][hashSize]int, n)
+
+ for z := 0; z < REP; z++ {
+ // pick a random key, hash it
+ k.random(r)
+ h := k.hash()
+
+ // flip each bit, hash & compare the results
+ for i := 0; i < n; i++ {
+ k.flipBit(i)
+ d := h ^ k.hash()
+ k.flipBit(i)
+
+ // record the effects of that bit flip
+ g := &grid[i]
+ for j := 0; j < hashSize; j++ {
+ g[j] += int(d & 1)
+ d >>= 1
+ }
+ }
+ }
+
+ // Each entry in the grid should be about REP/2.
+ // More precisely, we did N = k.bits() * hashSize experiments where
+ // each is the sum of REP coin flips. We want to find bounds on the
+ // sum of coin flips such that a truly random experiment would have
+ // all sums inside those bounds with 99% probability.
+ N := n * hashSize
+ var c float64
+ // find c such that Prob(mean-c*stddev < x < mean+c*stddev)^N > .9999
+ for c = 0.0; math.Pow(math.Erf(c/math.Sqrt(2)), float64(N)) < .9999; c += .1 {
+ }
+ c *= 4.0 // allowed slack - we don't need to be perfectly random
+ mean := .5 * REP
+ stddev := .5 * math.Sqrt(REP)
+ low := int(mean - c*stddev)
+ high := int(mean + c*stddev)
+ for i := 0; i < n; i++ {
+ for j := 0; j < hashSize; j++ {
+ x := grid[i][j]
+ if x < low || x > high {
+ t.Errorf("bad bias for %s bit %d -> bit %d: %d/%d\n", k.name(), i, j, x, REP)
+ }
+ }
+ }
+}
+
+// All bit rotations of a set of distinct keys
+func TestSmhasherWindowed(t *testing.T) {
+ t.Logf("32 bit keys")
+ windowed(t, &Int32Key{})
+ t.Logf("64 bit keys")
+ windowed(t, &Int64Key{})
+ t.Logf("string keys")
+ windowed(t, &BytesKey{make([]byte, 128)})
+}
+func windowed(t *testing.T, k Key) {
+ if GOARCH == "wasm" {
+ t.Skip("Too slow on wasm")
+ }
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ const BITS = 16
+
+ for r := 0; r < k.bits(); r++ {
+ h := newHashSet()
+ for i := 0; i < 1<<BITS; i++ {
+ k.clear()
+ for j := 0; j < BITS; j++ {
+ if i>>uint(j)&1 != 0 {
+ k.flipBit((j + r) % k.bits())
+ }
+ }
+ h.add(k.hash())
+ }
+ h.check(t)
+ }
+}
+
+// All keys of the form prefix + [A-Za-z0-9]*N + suffix.
+func TestSmhasherText(t *testing.T) {
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ text(t, "Foo", "Bar")
+ text(t, "FooBar", "")
+ text(t, "", "FooBar")
+}
+func text(t *testing.T, prefix, suffix string) {
+ const N = 4
+ const S = "ABCDEFGHIJKLMNOPQRSTabcdefghijklmnopqrst0123456789"
+ const L = len(S)
+ b := make([]byte, len(prefix)+N+len(suffix))
+ copy(b, prefix)
+ copy(b[len(prefix)+N:], suffix)
+ h := newHashSet()
+ c := b[len(prefix):]
+ for i := 0; i < L; i++ {
+ c[0] = S[i]
+ for j := 0; j < L; j++ {
+ c[1] = S[j]
+ for k := 0; k < L; k++ {
+ c[2] = S[k]
+ for x := 0; x < L; x++ {
+ c[3] = S[x]
+ h.addB(b)
+ }
+ }
+ }
+ }
+ h.check(t)
+}
+
+// Make sure different seed values generate different hashes.
+func TestSmhasherSeed(t *testing.T) {
+ h := newHashSet()
+ const N = 100000
+ s := "hello"
+ for i := 0; i < N; i++ {
+ h.addS_seed(s, uintptr(i))
+ }
+ h.check(t)
+}
+
+// size of the hash output (32 or 64 bits)
+const hashSize = 32 + int(^uintptr(0)>>63<<5)
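+// (^uintptr(0)>>63 is 1 when uintptr is 64 bits wide and 0 when it is 32 bits
+// wide, so the <<5 adds 32 only on 64-bit platforms.)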
+
+func randBytes(r *rand.Rand, b []byte) {
+ for i := range b {
+ b[i] = byte(r.Uint32())
+ }
+}
+
+func benchmarkHash(b *testing.B, n int) {
+ s := strings.Repeat("A", n)
+
+ for i := 0; i < b.N; i++ {
+ StringHash(s, 0)
+ }
+ b.SetBytes(int64(n))
+}
+
+func BenchmarkHash5(b *testing.B) { benchmarkHash(b, 5) }
+func BenchmarkHash16(b *testing.B) { benchmarkHash(b, 16) }
+func BenchmarkHash64(b *testing.B) { benchmarkHash(b, 64) }
+func BenchmarkHash1024(b *testing.B) { benchmarkHash(b, 1024) }
+func BenchmarkHash65536(b *testing.B) { benchmarkHash(b, 65536) }
+
+func TestArrayHash(t *testing.T) {
+ // Make sure that "" in arrays hash correctly. The hash
+ // should at least scramble the input seed so that, e.g.,
+ // {"","foo"} and {"foo",""} have different hashes.
+
+ // If the hash is bad, then all (8 choose 4) = 70 keys
+ // have the same hash. If so, we allocate 70/8 = 8
+ // overflow buckets. If the hash is good we don't
+ // normally allocate any overflow buckets, and the
+ // probability of even one or two overflows goes down rapidly.
+ // (There is always 1 allocation of the bucket array. The map
+ // header is allocated on the stack.)
+ f := func() {
+ // Make the key type at most 128 bytes. Otherwise,
+ // we get an allocation per key.
+ type key [8]string
+ m := make(map[key]bool, 70)
+
+ // fill m with keys that have 4 "foo"s and 4 ""s.
+ for i := 0; i < 256; i++ {
+ var k key
+ cnt := 0
+ for j := uint(0); j < 8; j++ {
+ if i>>j&1 != 0 {
+ k[j] = "foo"
+ cnt++
+ }
+ }
+ if cnt == 4 {
+ m[k] = true
+ }
+ }
+ if len(m) != 70 {
+ t.Errorf("bad test: (8 choose 4) should be 70, not %d", len(m))
+ }
+ }
+ if n := testing.AllocsPerRun(10, f); n > 6 {
+ t.Errorf("too many allocs %f - hash not balanced", n)
+ }
+}
+func TestStructHash(t *testing.T) {
+ // See the comment in TestArrayHash.
+ f := func() {
+ type key struct {
+ a, b, c, d, e, f, g, h string
+ }
+ m := make(map[key]bool, 70)
+
+ // fill m with keys that have 4 "foo"s and 4 ""s.
+ for i := 0; i < 256; i++ {
+ var k key
+ cnt := 0
+ if i&1 != 0 {
+ k.a = "foo"
+ cnt++
+ }
+ if i&2 != 0 {
+ k.b = "foo"
+ cnt++
+ }
+ if i&4 != 0 {
+ k.c = "foo"
+ cnt++
+ }
+ if i&8 != 0 {
+ k.d = "foo"
+ cnt++
+ }
+ if i&16 != 0 {
+ k.e = "foo"
+ cnt++
+ }
+ if i&32 != 0 {
+ k.f = "foo"
+ cnt++
+ }
+ if i&64 != 0 {
+ k.g = "foo"
+ cnt++
+ }
+ if i&128 != 0 {
+ k.h = "foo"
+ cnt++
+ }
+ if cnt == 4 {
+ m[k] = true
+ }
+ }
+ if len(m) != 70 {
+ t.Errorf("bad test: (8 choose 4) should be 70, not %d", len(m))
+ }
+ }
+ if n := testing.AllocsPerRun(10, f); n > 6 {
+ t.Errorf("too many allocs %f - hash not balanced", n)
+ }
+}
+
+var sink uint64
+
+func BenchmarkAlignedLoad(b *testing.B) {
+ var buf [16]byte
+ p := unsafe.Pointer(&buf[0])
+ var s uint64
+ for i := 0; i < b.N; i++ {
+ s += ReadUnaligned64(p)
+ }
+ sink = s
+}
+
+func BenchmarkUnalignedLoad(b *testing.B) {
+ var buf [16]byte
+ p := unsafe.Pointer(&buf[1])
+ var s uint64
+ for i := 0; i < b.N; i++ {
+ s += ReadUnaligned64(p)
+ }
+ sink = s
+}
+
+func TestCollisions(t *testing.T) {
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ for i := 0; i < 16; i++ {
+ for j := 0; j < 16; j++ {
+ if j == i {
+ continue
+ }
+ var a [16]byte
+ m := make(map[uint16]struct{}, 1<<16)
+ for n := 0; n < 1<<16; n++ {
+ a[i] = byte(n)
+ a[j] = byte(n >> 8)
+ m[uint16(BytesHash(a[:], 0))] = struct{}{}
+ }
+ if len(m) <= 1<<15 {
+ t.Errorf("too many collisions i=%d j=%d outputs=%d out of 65536\n", i, j, len(m))
+ }
+ }
+ }
+}
diff --git a/src/runtime/heapdump.go b/src/runtime/heapdump.go
new file mode 100644
index 0000000..2d53157
--- /dev/null
+++ b/src/runtime/heapdump.go
@@ -0,0 +1,755 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Implementation of runtime/debug.WriteHeapDump. Writes all
+// objects in the heap plus additional info (roots, threads,
+// finalizers, etc.) to a file.
+
+// The format of the dumped file is described at
+// https://golang.org/s/go15heapdump.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+//go:linkname runtime_debug_WriteHeapDump runtime/debug.WriteHeapDump
+func runtime_debug_WriteHeapDump(fd uintptr) {
+ stopTheWorld("write heap dump")
+
+ // Keep m on this G's stack instead of the system stack.
+ // Both readmemstats_m and writeheapdump_m have pretty large
+ // peak stack depths and we risk blowing the system stack.
+ // This is safe because the world is stopped, so we don't
+ // need to worry about anyone shrinking and therefore moving
+ // our stack.
+ var m MemStats
+ systemstack(func() {
+ // Call readmemstats_m here instead of deeper in
+ // writeheapdump_m because we might blow the system stack
+ // otherwise.
+ readmemstats_m(&m)
+ writeheapdump_m(fd, &m)
+ })
+
+ startTheWorld()
+}
+
+const (
+ fieldKindEol = 0
+ fieldKindPtr = 1
+ fieldKindIface = 2
+ fieldKindEface = 3
+ tagEOF = 0
+ tagObject = 1
+ tagOtherRoot = 2
+ tagType = 3
+ tagGoroutine = 4
+ tagStackFrame = 5
+ tagParams = 6
+ tagFinalizer = 7
+ tagItab = 8
+ tagOSThread = 9
+ tagMemStats = 10
+ tagQueuedFinalizer = 11
+ tagData = 12
+ tagBSS = 13
+ tagDefer = 14
+ tagPanic = 15
+ tagMemProf = 16
+ tagAllocSample = 17
+)
+
+var dumpfd uintptr // fd to write the dump to.
+var tmpbuf []byte
+
+// buffer of pending write data
+const (
+ bufSize = 4096
+)
+
+var buf [bufSize]byte
+var nbuf uintptr
+
+func dwrite(data unsafe.Pointer, len uintptr) {
+ if len == 0 {
+ return
+ }
+ if nbuf+len <= bufSize {
+ copy(buf[nbuf:], (*[bufSize]byte)(data)[:len])
+ nbuf += len
+ return
+ }
+
+ write(dumpfd, unsafe.Pointer(&buf), int32(nbuf))
+ if len >= bufSize {
+ write(dumpfd, data, int32(len))
+ nbuf = 0
+ } else {
+ copy(buf[:], (*[bufSize]byte)(data)[:len])
+ nbuf = len
+ }
+}
+
+func dwritebyte(b byte) {
+ dwrite(unsafe.Pointer(&b), 1)
+}
+
+func flush() {
+ write(dumpfd, unsafe.Pointer(&buf), int32(nbuf))
+ nbuf = 0
+}
+
+// Cache of types that have been serialized already.
+// We use a type's hash field to pick a bucket.
+// Inside a bucket, we keep a list of types that
+// have been serialized so far, most recently used first.
+// Note: when a bucket overflows we may end up
+// serializing a type more than once. That's ok.
+const (
+ typeCacheBuckets = 256
+ typeCacheAssoc = 4
+)
+
+type typeCacheBucket struct {
+ t [typeCacheAssoc]*_type
+}
+
+var typecache [typeCacheBuckets]typeCacheBucket
+
+// dump a uint64 in a varint format parseable by encoding/binary
+func dumpint(v uint64) {
+ var buf [10]byte
+ var n int
+ for v >= 0x80 {
+ buf[n] = byte(v | 0x80)
+ n++
+ v >>= 7
+ }
+ buf[n] = byte(v)
+ n++
+ dwrite(unsafe.Pointer(&buf), uintptr(n))
+}
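+
+// For illustration (a reader-side sketch, not used by the runtime): a dump
+// consumer could decode these values with encoding/binary, e.g.
+//
+//	v, n := binary.Uvarint(buf) // n > 0 on success
+//
+// since dumpint emits the same base-128 little-endian varint encoding that
+// binary.PutUvarint produces.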
+
+func dumpbool(b bool) {
+ if b {
+ dumpint(1)
+ } else {
+ dumpint(0)
+ }
+}
+
+// dump varint uint64 length followed by memory contents
+func dumpmemrange(data unsafe.Pointer, len uintptr) {
+ dumpint(uint64(len))
+ dwrite(data, len)
+}
+
+func dumpslice(b []byte) {
+ dumpint(uint64(len(b)))
+ if len(b) > 0 {
+ dwrite(unsafe.Pointer(&b[0]), uintptr(len(b)))
+ }
+}
+
+func dumpstr(s string) {
+ sp := stringStructOf(&s)
+ dumpmemrange(sp.str, uintptr(sp.len))
+}
+
+// dump information for a type
+func dumptype(t *_type) {
+ if t == nil {
+ return
+ }
+
+ // If we've definitely serialized the type before,
+ // no need to do it again.
+ b := &typecache[t.hash&(typeCacheBuckets-1)]
+ if t == b.t[0] {
+ return
+ }
+ for i := 1; i < typeCacheAssoc; i++ {
+ if t == b.t[i] {
+ // Move-to-front
+ for j := i; j > 0; j-- {
+ b.t[j] = b.t[j-1]
+ }
+ b.t[0] = t
+ return
+ }
+ }
+
+ // Might not have been dumped yet. Dump it and
+ // remember we did so.
+ for j := typeCacheAssoc - 1; j > 0; j-- {
+ b.t[j] = b.t[j-1]
+ }
+ b.t[0] = t
+
+ // dump the type
+ dumpint(tagType)
+ dumpint(uint64(uintptr(unsafe.Pointer(t))))
+ dumpint(uint64(t.size))
+ if x := t.uncommon(); x == nil || t.nameOff(x.pkgpath).name() == "" {
+ dumpstr(t.string())
+ } else {
+ pkgpathstr := t.nameOff(x.pkgpath).name()
+ pkgpath := stringStructOf(&pkgpathstr)
+ namestr := t.name()
+ name := stringStructOf(&namestr)
+ dumpint(uint64(uintptr(pkgpath.len) + 1 + uintptr(name.len)))
+ dwrite(pkgpath.str, uintptr(pkgpath.len))
+ dwritebyte('.')
+ dwrite(name.str, uintptr(name.len))
+ }
+ dumpbool(t.kind&kindDirectIface == 0 || t.ptrdata != 0)
+}
+
+// dump an object
+func dumpobj(obj unsafe.Pointer, size uintptr, bv bitvector) {
+ dumpint(tagObject)
+ dumpint(uint64(uintptr(obj)))
+ dumpmemrange(obj, size)
+ dumpfields(bv)
+}
+
+func dumpotherroot(description string, to unsafe.Pointer) {
+ dumpint(tagOtherRoot)
+ dumpstr(description)
+ dumpint(uint64(uintptr(to)))
+}
+
+func dumpfinalizer(obj unsafe.Pointer, fn *funcval, fint *_type, ot *ptrtype) {
+ dumpint(tagFinalizer)
+ dumpint(uint64(uintptr(obj)))
+ dumpint(uint64(uintptr(unsafe.Pointer(fn))))
+ dumpint(uint64(uintptr(unsafe.Pointer(fn.fn))))
+ dumpint(uint64(uintptr(unsafe.Pointer(fint))))
+ dumpint(uint64(uintptr(unsafe.Pointer(ot))))
+}
+
+type childInfo struct {
+ // Information passed up from the callee frame about
+ // the layout of the outargs region.
+ argoff uintptr // where the arguments start in the frame
+ arglen uintptr // size of args region
+ args bitvector // if args.n >= 0, pointer map of args region
+ sp *uint8 // callee sp
+ depth uintptr // depth in call stack (0 == most recent)
+}
+
+// dump kinds & offsets of interesting fields in bv
+func dumpbv(cbv *bitvector, offset uintptr) {
+ for i := uintptr(0); i < uintptr(cbv.n); i++ {
+ if cbv.ptrbit(i) == 1 {
+ dumpint(fieldKindPtr)
+ dumpint(uint64(offset + i*sys.PtrSize))
+ }
+ }
+}
+
+func dumpframe(s *stkframe, arg unsafe.Pointer) bool {
+ child := (*childInfo)(arg)
+ f := s.fn
+
+ // Figure out what we can about our stack map
+ pc := s.pc
+ pcdata := int32(-1) // Use the entry map at function entry
+ if pc != f.entry {
+ pc--
+ pcdata = pcdatavalue(f, _PCDATA_StackMapIndex, pc, nil)
+ }
+ if pcdata == -1 {
+ // We do not have a valid pcdata value but there might be a
+ // stackmap for this function. It is likely that we are looking
+ // at the function prologue, assume so and hope for the best.
+ pcdata = 0
+ }
+ stkmap := (*stackmap)(funcdata(f, _FUNCDATA_LocalsPointerMaps))
+
+ var bv bitvector
+ if stkmap != nil && stkmap.n > 0 {
+ bv = stackmapdata(stkmap, pcdata)
+ } else {
+ bv.n = -1
+ }
+
+ // Dump main body of stack frame.
+ dumpint(tagStackFrame)
+ dumpint(uint64(s.sp)) // lowest address in frame
+ dumpint(uint64(child.depth)) // # of frames deep on the stack
+ dumpint(uint64(uintptr(unsafe.Pointer(child.sp)))) // sp of child, or 0 if bottom of stack
+ dumpmemrange(unsafe.Pointer(s.sp), s.fp-s.sp) // frame contents
+ dumpint(uint64(f.entry))
+ dumpint(uint64(s.pc))
+ dumpint(uint64(s.continpc))
+ name := funcname(f)
+ if name == "" {
+ name = "unknown function"
+ }
+ dumpstr(name)
+
+ // Dump fields in the outargs section
+ if child.args.n >= 0 {
+ dumpbv(&child.args, child.argoff)
+ } else {
+ // conservative - everything might be a pointer
+ for off := child.argoff; off < child.argoff+child.arglen; off += sys.PtrSize {
+ dumpint(fieldKindPtr)
+ dumpint(uint64(off))
+ }
+ }
+
+ // Dump fields in the local vars section
+ if stkmap == nil {
+ // No locals information, dump everything.
+ for off := child.arglen; off < s.varp-s.sp; off += sys.PtrSize {
+ dumpint(fieldKindPtr)
+ dumpint(uint64(off))
+ }
+ } else if stkmap.n < 0 {
+ // Locals size information, dump just the locals.
+ size := uintptr(-stkmap.n)
+ for off := s.varp - size - s.sp; off < s.varp-s.sp; off += sys.PtrSize {
+ dumpint(fieldKindPtr)
+ dumpint(uint64(off))
+ }
+ } else if stkmap.n > 0 {
+ // Locals bitmap information, scan just the pointers in
+ // locals.
+ dumpbv(&bv, s.varp-uintptr(bv.n)*sys.PtrSize-s.sp)
+ }
+ dumpint(fieldKindEol)
+
+ // Record arg info for parent.
+ child.argoff = s.argp - s.fp
+ child.arglen = s.arglen
+ child.sp = (*uint8)(unsafe.Pointer(s.sp))
+ child.depth++
+ stkmap = (*stackmap)(funcdata(f, _FUNCDATA_ArgsPointerMaps))
+ if stkmap != nil {
+ child.args = stackmapdata(stkmap, pcdata)
+ } else {
+ child.args.n = -1
+ }
+ return true
+}
+
+func dumpgoroutine(gp *g) {
+ var sp, pc, lr uintptr
+ if gp.syscallsp != 0 {
+ sp = gp.syscallsp
+ pc = gp.syscallpc
+ lr = 0
+ } else {
+ sp = gp.sched.sp
+ pc = gp.sched.pc
+ lr = gp.sched.lr
+ }
+
+ dumpint(tagGoroutine)
+ dumpint(uint64(uintptr(unsafe.Pointer(gp))))
+ dumpint(uint64(sp))
+ dumpint(uint64(gp.goid))
+ dumpint(uint64(gp.gopc))
+ dumpint(uint64(readgstatus(gp)))
+ dumpbool(isSystemGoroutine(gp, false))
+ dumpbool(false) // isbackground
+ dumpint(uint64(gp.waitsince))
+ dumpstr(gp.waitreason.String())
+ dumpint(uint64(uintptr(gp.sched.ctxt)))
+ dumpint(uint64(uintptr(unsafe.Pointer(gp.m))))
+ dumpint(uint64(uintptr(unsafe.Pointer(gp._defer))))
+ dumpint(uint64(uintptr(unsafe.Pointer(gp._panic))))
+
+ // dump stack
+ var child childInfo
+ child.args.n = -1
+ child.arglen = 0
+ child.sp = nil
+ child.depth = 0
+ gentraceback(pc, sp, lr, gp, 0, nil, 0x7fffffff, dumpframe, noescape(unsafe.Pointer(&child)), 0)
+
+ // dump defer & panic records
+ for d := gp._defer; d != nil; d = d.link {
+ dumpint(tagDefer)
+ dumpint(uint64(uintptr(unsafe.Pointer(d))))
+ dumpint(uint64(uintptr(unsafe.Pointer(gp))))
+ dumpint(uint64(d.sp))
+ dumpint(uint64(d.pc))
+ dumpint(uint64(uintptr(unsafe.Pointer(d.fn))))
+ if d.fn == nil {
+ // d.fn can be nil for open-coded defers
+ dumpint(uint64(0))
+ } else {
+ dumpint(uint64(uintptr(unsafe.Pointer(d.fn.fn))))
+ }
+ dumpint(uint64(uintptr(unsafe.Pointer(d.link))))
+ }
+ for p := gp._panic; p != nil; p = p.link {
+ dumpint(tagPanic)
+ dumpint(uint64(uintptr(unsafe.Pointer(p))))
+ dumpint(uint64(uintptr(unsafe.Pointer(gp))))
+ eface := efaceOf(&p.arg)
+ dumpint(uint64(uintptr(unsafe.Pointer(eface._type))))
+ dumpint(uint64(uintptr(unsafe.Pointer(eface.data))))
+ dumpint(0) // was p->defer, no longer recorded
+ dumpint(uint64(uintptr(unsafe.Pointer(p.link))))
+ }
+}
+
+func dumpgs() {
+ // goroutines & stacks
+ for i := 0; uintptr(i) < allglen; i++ {
+ gp := allgs[i]
+ status := readgstatus(gp) // The world is stopped so gp will not be in a scan state.
+ switch status {
+ default:
+ print("runtime: unexpected G.status ", hex(status), "\n")
+ throw("dumpgs in STW - bad status")
+ case _Gdead:
+ // ok
+ case _Grunnable,
+ _Gsyscall,
+ _Gwaiting:
+ dumpgoroutine(gp)
+ }
+ }
+}
+
+func finq_callback(fn *funcval, obj unsafe.Pointer, nret uintptr, fint *_type, ot *ptrtype) {
+ dumpint(tagQueuedFinalizer)
+ dumpint(uint64(uintptr(obj)))
+ dumpint(uint64(uintptr(unsafe.Pointer(fn))))
+ dumpint(uint64(uintptr(unsafe.Pointer(fn.fn))))
+ dumpint(uint64(uintptr(unsafe.Pointer(fint))))
+ dumpint(uint64(uintptr(unsafe.Pointer(ot))))
+}
+
+func dumproots() {
+ // To protect mheap_.allspans.
+ assertWorldStopped()
+
+ // TODO(mwhudson): dump datamask etc from all objects
+ // data segment
+ dumpint(tagData)
+ dumpint(uint64(firstmoduledata.data))
+ dumpmemrange(unsafe.Pointer(firstmoduledata.data), firstmoduledata.edata-firstmoduledata.data)
+ dumpfields(firstmoduledata.gcdatamask)
+
+ // bss segment
+ dumpint(tagBSS)
+ dumpint(uint64(firstmoduledata.bss))
+ dumpmemrange(unsafe.Pointer(firstmoduledata.bss), firstmoduledata.ebss-firstmoduledata.bss)
+ dumpfields(firstmoduledata.gcbssmask)
+
+ // mspan.types
+ for _, s := range mheap_.allspans {
+ if s.state.get() == mSpanInUse {
+ // Finalizers
+ for sp := s.specials; sp != nil; sp = sp.next {
+ if sp.kind != _KindSpecialFinalizer {
+ continue
+ }
+ spf := (*specialfinalizer)(unsafe.Pointer(sp))
+ p := unsafe.Pointer(s.base() + uintptr(spf.special.offset))
+ dumpfinalizer(p, spf.fn, spf.fint, spf.ot)
+ }
+ }
+ }
+
+ // Finalizer queue
+ iterate_finq(finq_callback)
+}
+
+// Bit vector of free marks.
+// Needs to be as big as the largest number of objects per span.
+var freemark [_PageSize / 8]bool
+
+func dumpobjs() {
+ // To protect mheap_.allspans.
+ assertWorldStopped()
+
+ for _, s := range mheap_.allspans {
+ if s.state.get() != mSpanInUse {
+ continue
+ }
+ p := s.base()
+ size := s.elemsize
+ n := (s.npages << _PageShift) / size
+ if n > uintptr(len(freemark)) {
+ throw("freemark array doesn't have enough entries")
+ }
+
+ for freeIndex := uintptr(0); freeIndex < s.nelems; freeIndex++ {
+ if s.isFree(freeIndex) {
+ freemark[freeIndex] = true
+ }
+ }
+
+ for j := uintptr(0); j < n; j, p = j+1, p+size {
+ if freemark[j] {
+ freemark[j] = false
+ continue
+ }
+ dumpobj(unsafe.Pointer(p), size, makeheapobjbv(p, size))
+ }
+ }
+}
+
+func dumpparams() {
+ dumpint(tagParams)
+ x := uintptr(1)
+ if *(*byte)(unsafe.Pointer(&x)) == 1 {
+ dumpbool(false) // little-endian ptrs
+ } else {
+ dumpbool(true) // big-endian ptrs
+ }
+ dumpint(sys.PtrSize)
+ var arenaStart, arenaEnd uintptr
+ for i1 := range mheap_.arenas {
+ if mheap_.arenas[i1] == nil {
+ continue
+ }
+ for i, ha := range mheap_.arenas[i1] {
+ if ha == nil {
+ continue
+ }
+ base := arenaBase(arenaIdx(i1)<<arenaL1Shift | arenaIdx(i))
+ if arenaStart == 0 || base < arenaStart {
+ arenaStart = base
+ }
+ if base+heapArenaBytes > arenaEnd {
+ arenaEnd = base + heapArenaBytes
+ }
+ }
+ }
+ dumpint(uint64(arenaStart))
+ dumpint(uint64(arenaEnd))
+ dumpstr(sys.GOARCH)
+ dumpstr(sys.Goexperiment)
+ dumpint(uint64(ncpu))
+}
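dumpparams detects pointer endianness by storing 1 into a word and inspecting its first byte. The same trick, stripped down to a standalone illustration (not runtime code):

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	// Write 1 into a word; on a little-endian machine the low-order
	// byte (the 1) is the first byte in memory.
	x := uintptr(1)
	littleEndian := *(*byte)(unsafe.Pointer(&x)) == 1
	fmt.Println("little-endian:", littleEndian)
}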
+
+func itab_callback(tab *itab) {
+ t := tab._type
+ dumptype(t)
+ dumpint(tagItab)
+ dumpint(uint64(uintptr(unsafe.Pointer(tab))))
+ dumpint(uint64(uintptr(unsafe.Pointer(t))))
+}
+
+func dumpitabs() {
+ iterate_itabs(itab_callback)
+}
+
+func dumpms() {
+ for mp := allm; mp != nil; mp = mp.alllink {
+ dumpint(tagOSThread)
+ dumpint(uint64(uintptr(unsafe.Pointer(mp))))
+ dumpint(uint64(mp.id))
+ dumpint(mp.procid)
+ }
+}
+
+//go:systemstack
+func dumpmemstats(m *MemStats) {
+ assertWorldStopped()
+
+ // These ints should match the fields of the exported
+ // MemStats structure one for one, and must be dumped
+ // in the same order as those fields.
+ dumpint(tagMemStats)
+ dumpint(m.Alloc)
+ dumpint(m.TotalAlloc)
+ dumpint(m.Sys)
+ dumpint(m.Lookups)
+ dumpint(m.Mallocs)
+ dumpint(m.Frees)
+ dumpint(m.HeapAlloc)
+ dumpint(m.HeapSys)
+ dumpint(m.HeapIdle)
+ dumpint(m.HeapInuse)
+ dumpint(m.HeapReleased)
+ dumpint(m.HeapObjects)
+ dumpint(m.StackInuse)
+ dumpint(m.StackSys)
+ dumpint(m.MSpanInuse)
+ dumpint(m.MSpanSys)
+ dumpint(m.MCacheInuse)
+ dumpint(m.MCacheSys)
+ dumpint(m.BuckHashSys)
+ dumpint(m.GCSys)
+ dumpint(m.OtherSys)
+ dumpint(m.NextGC)
+ dumpint(m.LastGC)
+ dumpint(m.PauseTotalNs)
+ for i := 0; i < 256; i++ {
+ dumpint(m.PauseNs[i])
+ }
+ dumpint(uint64(m.NumGC))
+}
+
+func dumpmemprof_callback(b *bucket, nstk uintptr, pstk *uintptr, size, allocs, frees uintptr) {
+ stk := (*[100000]uintptr)(unsafe.Pointer(pstk))
+ dumpint(tagMemProf)
+ dumpint(uint64(uintptr(unsafe.Pointer(b))))
+ dumpint(uint64(size))
+ dumpint(uint64(nstk))
+ for i := uintptr(0); i < nstk; i++ {
+ pc := stk[i]
+ f := findfunc(pc)
+ if !f.valid() {
+ var buf [64]byte
+ n := len(buf)
+ n--
+ buf[n] = ')'
+ if pc == 0 {
+ n--
+ buf[n] = '0'
+ } else {
+ for pc > 0 {
+ n--
+ buf[n] = "0123456789abcdef"[pc&15]
+ pc >>= 4
+ }
+ }
+ n--
+ buf[n] = 'x'
+ n--
+ buf[n] = '0'
+ n--
+ buf[n] = '('
+ dumpslice(buf[n:])
+ dumpstr("?")
+ dumpint(0)
+ } else {
+ dumpstr(funcname(f))
+ if i > 0 && pc > f.entry {
+ pc--
+ }
+ file, line := funcline(f, pc)
+ dumpstr(file)
+ dumpint(uint64(line))
+ }
+ }
+ dumpint(uint64(allocs))
+ dumpint(uint64(frees))
+}
+
+func dumpmemprof() {
+ // To protect mheap_.allspans.
+ assertWorldStopped()
+
+ iterate_memprof(dumpmemprof_callback)
+ for _, s := range mheap_.allspans {
+ if s.state.get() != mSpanInUse {
+ continue
+ }
+ for sp := s.specials; sp != nil; sp = sp.next {
+ if sp.kind != _KindSpecialProfile {
+ continue
+ }
+ spp := (*specialprofile)(unsafe.Pointer(sp))
+ p := s.base() + uintptr(spp.special.offset)
+ dumpint(tagAllocSample)
+ dumpint(uint64(p))
+ dumpint(uint64(uintptr(unsafe.Pointer(spp.b))))
+ }
+ }
+}
+
+var dumphdr = []byte("go1.7 heap dump\n")
+
+func mdump(m *MemStats) {
+ assertWorldStopped()
+
+ // make sure we're done sweeping
+ for _, s := range mheap_.allspans {
+ if s.state.get() == mSpanInUse {
+ s.ensureSwept()
+ }
+ }
+ memclrNoHeapPointers(unsafe.Pointer(&typecache), unsafe.Sizeof(typecache))
+ dwrite(unsafe.Pointer(&dumphdr[0]), uintptr(len(dumphdr)))
+ dumpparams()
+ dumpitabs()
+ dumpobjs()
+ dumpgs()
+ dumpms()
+ dumproots()
+ dumpmemstats(m)
+ dumpmemprof()
+ dumpint(tagEOF)
+ flush()
+}
+
+func writeheapdump_m(fd uintptr, m *MemStats) {
+ assertWorldStopped()
+
+ _g_ := getg()
+ casgstatus(_g_.m.curg, _Grunning, _Gwaiting)
+ _g_.waitreason = waitReasonDumpingHeap
+
+ // Update stats so we can dump them.
+ // As a side effect, flushes all the mcaches so the mspan.freelist
+ // lists contain all the free objects.
+ updatememstats()
+
+ // Set dump file.
+ dumpfd = fd
+
+ // Call dump routine.
+ mdump(m)
+
+ // Reset dump file.
+ dumpfd = 0
+ if tmpbuf != nil {
+ sysFree(unsafe.Pointer(&tmpbuf[0]), uintptr(len(tmpbuf)), &memstats.other_sys)
+ tmpbuf = nil
+ }
+
+ casgstatus(_g_.m.curg, _Gwaiting, _Grunning)
+}
+
+// dumpint() the kind & offset of each field in an object.
+func dumpfields(bv bitvector) {
+ dumpbv(&bv, 0)
+ dumpint(fieldKindEol)
+}
+
+func makeheapobjbv(p uintptr, size uintptr) bitvector {
+ // Extend the temp buffer if necessary.
+ nptr := size / sys.PtrSize
+ if uintptr(len(tmpbuf)) < nptr/8+1 {
+ if tmpbuf != nil {
+ sysFree(unsafe.Pointer(&tmpbuf[0]), uintptr(len(tmpbuf)), &memstats.other_sys)
+ }
+ n := nptr/8 + 1
+ p := sysAlloc(n, &memstats.other_sys)
+ if p == nil {
+ throw("heapdump: out of memory")
+ }
+ tmpbuf = (*[1 << 30]byte)(p)[:n]
+ }
+ // Convert heap bitmap to pointer bitmap.
+ for i := uintptr(0); i < nptr/8+1; i++ {
+ tmpbuf[i] = 0
+ }
+ i := uintptr(0)
+ hbits := heapBitsForAddr(p)
+ for ; i < nptr; i++ {
+ if !hbits.morePointers() {
+ break // end of object
+ }
+ if hbits.isPointer() {
+ tmpbuf[i/8] |= 1 << (i % 8)
+ }
+ hbits = hbits.next()
+ }
+ return bitvector{int32(i), &tmpbuf[0]}
+}
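makeheapobjbv packs one bit per pointer-sized word into tmpbuf, eight bits to a byte. A standalone sketch of just that packing rule (illustration only; a plain bool slice stands in for the runtime's heapBits iterator):

package main

import "fmt"

// packBits packs one bit per word into a byte slice, least-significant
// bit first within each byte, mirroring tmpbuf[i/8] |= 1 << (i % 8).
func packBits(isPtr []bool) []byte {
	buf := make([]byte, len(isPtr)/8+1)
	for i, p := range isPtr {
		if p {
			buf[i/8] |= 1 << (i % 8)
		}
	}
	return buf
}

func main() {
	// Words 0 and 3 hold pointers; the rest are scalars.
	fmt.Printf("%08b\n", packBits([]bool{true, false, false, true})) // [00001001]
}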
diff --git a/src/runtime/histogram.go b/src/runtime/histogram.go
new file mode 100644
index 0000000..da4910d
--- /dev/null
+++ b/src/runtime/histogram.go
@@ -0,0 +1,172 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ // For the time histogram type, we use an HDR histogram.
+ // Values are placed in super-buckets based solely on the most
+ // significant set bit. Thus, super-buckets are power-of-2 sized.
+ // Values are then placed into sub-buckets based on the value of
+ // the next timeHistSubBucketBits most significant bits. Thus,
+ // sub-buckets are linear within a super-bucket.
+ //
+ // Therefore, the number of sub-buckets (timeHistNumSubBuckets)
+ // defines the error. This error may be computed as
+ // 1/timeHistNumSubBuckets*100%. For example, for 16 sub-buckets
+ // per super-bucket the error is approximately 6%.
+ //
+ // The number of super-buckets (timeHistNumSuperBuckets), on the
+ // other hand, defines the range. To reserve room for sub-buckets,
+ // bit timeHistSubBucketBits is the first bit considered for
+ // super-buckets, so super-bucket indices are adjusted accordingly.
+ //
+ // As an example, consider 45 super-buckets with 16 sub-buckets.
+ //
+ // 00110
+ // ^----
+ // │ ^
+ // │ └---- Lowest 4 bits -> sub-bucket 6
+ // └------- Bit 4 unset -> super-bucket 0
+ //
+ // 10110
+ // ^----
+ // │ ^
+ // │ └---- Next 4 bits -> sub-bucket 6
+ // └------- Bit 4 set -> super-bucket 1
+ //
+ // 100010
+ // ^----^
+ // │ ^ └-- Lower bits ignored
+ // │ └---- Next 4 bits -> sub-bucket 1
+ // └------- Bit 5 set -> super-bucket 2
+ //
+ // Following this pattern, the last super-bucket (index 44) has bit 47 set. We don't
+ // have any buckets for higher values, so the highest sub-bucket will
+ // contain values of up to 2^48-1 nanoseconds, or approx. 3 days. This range is
+ // more than enough to handle durations produced by the runtime.
+ timeHistSubBucketBits = 4
+ timeHistNumSubBuckets = 1 << timeHistSubBucketBits
+ timeHistNumSuperBuckets = 45
+ timeHistTotalBuckets = timeHistNumSuperBuckets*timeHistNumSubBuckets + 1
+)
+
+// timeHistogram represents a distribution of durations in
+// nanoseconds.
+//
+// The accuracy and range of the histogram is defined by the
+// timeHistSubBucketBits and timeHistNumSuperBuckets constants.
+//
+// It is an HDR histogram with exponentially-distributed
+// buckets and linearly distributed sub-buckets.
+//
+// Counts in the histogram are updated atomically, so it is safe
+// for concurrent use. It is also safe to read all the values
+// atomically.
+type timeHistogram struct {
+ counts [timeHistNumSuperBuckets * timeHistNumSubBuckets]uint64
+
+ // underflow counts all the times we got a negative duration
+ // sample. Because of how time works on some platforms, it's
+ // possible to measure negative durations. We could ignore them,
+ // but we record them anyway because it's better to have some
+ // signal that it's happening than just missing samples.
+ underflow uint64
+}
+
+// record adds the given duration to the distribution.
+func (h *timeHistogram) record(duration int64) {
+ if duration < 0 {
+ atomic.Xadd64(&h.underflow, 1)
+ return
+ }
+ // The index of the exponential bucket is just the index
+ // of the highest set bit adjusted for how many bits we
+ // use for the subbucket. Note that it's timeHistSubBucketBits-1
+ // because we use the 0th bucket to hold values < timeHistNumSubBuckets.
+ var superBucket, subBucket uint
+ if duration >= timeHistNumSubBuckets {
+ // At this point, we know the duration value will always be
+ // at least timeHistSubBucketBits bits long.
+ superBucket = uint(sys.Len64(uint64(duration))) - timeHistSubBucketBits
+ if superBucket*timeHistNumSubBuckets >= uint(len(h.counts)) {
+ // The bucket index we got is larger than what we support, so
+ // include this count in the highest bucket, which extends to
+ // infinity.
+ superBucket = timeHistNumSuperBuckets - 1
+ subBucket = timeHistNumSubBuckets - 1
+ } else {
+ // The linear subbucket index is just the timeHistSubBucketBits
+ // bits after the top bit. To extract that value, shift down
+ // the duration such that we leave the top bit and the next bits
+ // intact, then extract the index.
+ subBucket = uint((duration >> (superBucket - 1)) % timeHistNumSubBuckets)
+ }
+ } else {
+ subBucket = uint(duration)
+ }
+ atomic.Xadd64(&h.counts[superBucket*timeHistNumSubBuckets+subBucket], 1)
+}
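A rough, self-contained sketch of the placement rule that record implements, using math/bits.Len64 in place of the runtime-internal sys.Len64 and local constant names (illustration of the scheme only, not the runtime's code path). It reproduces the three worked examples from the comment above:

package main

import (
	"fmt"
	"math/bits"
)

const (
	subBucketBits   = 4
	numSubBuckets   = 1 << subBucketBits
	numSuperBuckets = 45
)

// bucketOf maps a duration to its (super-bucket, sub-bucket) pair: the
// super-bucket comes from the position of the highest set bit, and the
// sub-bucket from the next subBucketBits bits below it.
func bucketOf(d uint64) (super, sub uint) {
	if d < numSubBuckets {
		return 0, uint(d)
	}
	super = uint(bits.Len64(d)) - subBucketBits
	if super >= numSuperBuckets {
		// Out of range: fold into the highest bucket, which extends to infinity.
		return numSuperBuckets - 1, numSubBuckets - 1
	}
	sub = uint((d >> (super - 1)) % numSubBuckets)
	return super, sub
}

func main() {
	fmt.Println(bucketOf(0b00110))  // 0 6  (first diagram example)
	fmt.Println(bucketOf(0b10110))  // 1 6  (second example)
	fmt.Println(bucketOf(0b100010)) // 2 1  (third example)
}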
+
+const (
+ fInf = 0x7FF0000000000000
+ fNegInf = 0xFFF0000000000000
+)
+
+func float64Inf() float64 {
+ inf := uint64(fInf)
+ return *(*float64)(unsafe.Pointer(&inf))
+}
+
+func float64NegInf() float64 {
+ inf := uint64(fNegInf)
+ return *(*float64)(unsafe.Pointer(&inf))
+}
+
+// timeHistogramMetricsBuckets generates a slice of boundaries for
+// the timeHistogram. These boundaries are represented in seconds,
+// not nanoseconds like the timeHistogram represents durations.
+func timeHistogramMetricsBuckets() []float64 {
+ b := make([]float64, timeHistTotalBuckets+1)
+ b[0] = float64NegInf()
+ for i := 0; i < timeHistNumSuperBuckets; i++ {
+ superBucketMin := uint64(0)
+ // The (inclusive) minimum for the first non-negative bucket is 0.
+ if i > 0 {
+ // The minimum for the second bucket will be
+ // 1 << timeHistSubBucketBits, indicating that all
+ // sub-buckets are represented by the next timeHistSubBucketBits
+ // bits.
+ // Thereafter, we shift up by 1 each time, so we can represent
+ // this pattern as (i-1)+timeHistSubBucketBits.
+ superBucketMin = uint64(1) << uint(i-1+timeHistSubBucketBits)
+ }
+ // subBucketShift is the amount that we need to shift the sub-bucket
+ // index to combine it with the bucketMin.
+ subBucketShift := uint(0)
+ if i > 1 {
+ // The first two super buckets are exact with respect to integers,
+ // so we'll never have to shift the sub-bucket index. Thereafter,
+ // we shift up by 1 with each subsequent bucket.
+ subBucketShift = uint(i - 2)
+ }
+ for j := 0; j < timeHistNumSubBuckets; j++ {
+ // j is the sub-bucket index. By shifting the index into position to
+ // combine with the bucket minimum, we obtain the minimum value for that
+ // sub-bucket.
+ subBucketMin := superBucketMin + (uint64(j) << subBucketShift)
+
+ // Convert the subBucketMin which is in nanoseconds to a float64 seconds value.
+ // These values will all be exactly representable by a float64.
+ b[i*timeHistNumSubBuckets+j+1] = float64(subBucketMin) / 1e9
+ }
+ }
+ b[len(b)-1] = float64Inf()
+ return b
+}
diff --git a/src/runtime/histogram_test.go b/src/runtime/histogram_test.go
new file mode 100644
index 0000000..dbc64fa
--- /dev/null
+++ b/src/runtime/histogram_test.go
@@ -0,0 +1,70 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "math"
+ . "runtime"
+ "testing"
+)
+
+var dummyTimeHistogram TimeHistogram
+
+func TestTimeHistogram(t *testing.T) {
+ // We need to use a global dummy because this
+ // could get stack-allocated with a non-8-byte alignment.
+ // The result of this bad alignment is a segfault on
+ // 32-bit platforms when calling Record.
+ h := &dummyTimeHistogram
+
+ // Record exactly one sample in each bucket.
+ for i := 0; i < TimeHistNumSuperBuckets; i++ {
+ var base int64
+ if i > 0 {
+ base = int64(1) << (i + TimeHistSubBucketBits - 1)
+ }
+ for j := 0; j < TimeHistNumSubBuckets; j++ {
+ v := int64(j)
+ if i > 0 {
+ v <<= i - 1
+ }
+ h.Record(base + v)
+ }
+ }
+ // Hit the underflow bucket.
+ h.Record(int64(-1))
+
+ // Check to make sure there's exactly one count in each
+ // bucket.
+ for i := uint(0); i < TimeHistNumSuperBuckets; i++ {
+ for j := uint(0); j < TimeHistNumSubBuckets; j++ {
+ c, ok := h.Count(i, j)
+ if !ok {
+ t.Errorf("hit underflow bucket unexpectedly: (%d, %d)", i, j)
+ } else if c != 1 {
+ t.Errorf("bucket (%d, %d) has count that is not 1: %d", i, j, c)
+ }
+ }
+ }
+ c, ok := h.Count(TimeHistNumSuperBuckets, 0)
+ if ok {
+ t.Errorf("expected to hit underflow bucket: (%d, %d)", TimeHistNumSuperBuckets, 0)
+ }
+ if c != 1 {
+ t.Errorf("underflow bucket has count that is not 1: %d", c)
+ }
+
+ // Check overflow behavior.
+ // By hitting a high value, we should just be adding into the highest bucket.
+ h.Record(math.MaxInt64)
+ c, ok = h.Count(TimeHistNumSuperBuckets-1, TimeHistNumSubBuckets-1)
+ if !ok {
+ t.Error("hit underflow bucket in highest bucket unexpectedly")
+ } else if c != 2 {
+ t.Errorf("highest has count that is not 2: %d", c)
+ }
+
+ dummyTimeHistogram = TimeHistogram{}
+}
diff --git a/src/runtime/iface.go b/src/runtime/iface.go
new file mode 100644
index 0000000..0504b89
--- /dev/null
+++ b/src/runtime/iface.go
@@ -0,0 +1,565 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const itabInitSize = 512
+
+var (
+ itabLock mutex // lock for accessing itab table
+ itabTable = &itabTableInit // pointer to current table
+ itabTableInit = itabTableType{size: itabInitSize} // starter table
+)
+
+// Note: change the formula in the mallocgc call in itabAdd if you change these fields.
+type itabTableType struct {
+ size uintptr // length of entries array. Always a power of 2.
+ count uintptr // current number of filled entries.
+ entries [itabInitSize]*itab // really [size] large
+}
+
+func itabHashFunc(inter *interfacetype, typ *_type) uintptr {
+ // compiler has provided some good hash codes for us.
+ return uintptr(inter.typ.hash ^ typ.hash)
+}
+
+func getitab(inter *interfacetype, typ *_type, canfail bool) *itab {
+ if len(inter.mhdr) == 0 {
+ throw("internal error - misuse of itab")
+ }
+
+ // easy case
+ if typ.tflag&tflagUncommon == 0 {
+ if canfail {
+ return nil
+ }
+ name := inter.typ.nameOff(inter.mhdr[0].name)
+ panic(&TypeAssertionError{nil, typ, &inter.typ, name.name()})
+ }
+
+ var m *itab
+
+ // First, look in the existing table to see if we can find the itab we need.
+ // This is by far the most common case, so do it without locks.
+ // Use atomic to ensure we see any previous writes done by the thread
+ // that updates the itabTable field (with atomic.Storep in itabAdd).
+ t := (*itabTableType)(atomic.Loadp(unsafe.Pointer(&itabTable)))
+ if m = t.find(inter, typ); m != nil {
+ goto finish
+ }
+
+ // Not found. Grab the lock and try again.
+ lock(&itabLock)
+ if m = itabTable.find(inter, typ); m != nil {
+ unlock(&itabLock)
+ goto finish
+ }
+
+ // Entry doesn't exist yet. Make a new entry & add it.
+ m = (*itab)(persistentalloc(unsafe.Sizeof(itab{})+uintptr(len(inter.mhdr)-1)*sys.PtrSize, 0, &memstats.other_sys))
+ m.inter = inter
+ m._type = typ
+ // The hash is used in type switches. However, the compiler statically generates itab's
+ // for all interface/type pairs used in switches (which are added to itabTable
+ // in itabsinit). The dynamically-generated itab's never participate in type switches,
+ // and thus the hash is irrelevant.
+ // Note: m.hash is _not_ the hash used for the runtime itabTable hash table.
+ m.hash = 0
+ m.init()
+ itabAdd(m)
+ unlock(&itabLock)
+finish:
+ if m.fun[0] != 0 {
+ return m
+ }
+ if canfail {
+ return nil
+ }
+ // this can only happen if the conversion
+ // was already done once using the , ok form
+ // and we have a cached negative result.
+ // The cached result doesn't record which
+ // interface function was missing, so initialize
+ // the itab again to get the missing function name.
+ panic(&TypeAssertionError{concrete: typ, asserted: &inter.typ, missingMethod: m.init()})
+}
+
+// find finds the given interface/type pair in t.
+// Returns nil if the given interface/type pair isn't present.
+func (t *itabTableType) find(inter *interfacetype, typ *_type) *itab {
+ // Implemented using quadratic probing.
+ // Probe sequence is h(i) = h0 + i*(i+1)/2 mod 2^k.
+ // We're guaranteed to hit all table entries using this probe sequence.
+ mask := t.size - 1
+ h := itabHashFunc(inter, typ) & mask
+ for i := uintptr(1); ; i++ {
+ p := (**itab)(add(unsafe.Pointer(&t.entries), h*sys.PtrSize))
+ // Use atomic read here so if we see m != nil, we also see
+ // the initializations of the fields of m.
+ // m := *p
+ m := (*itab)(atomic.Loadp(unsafe.Pointer(p)))
+ if m == nil {
+ return nil
+ }
+ if m.inter == inter && m._type == typ {
+ return m
+ }
+ h += i
+ h &= mask
+ }
+}
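The probe sequence used by find (and by add below) is the classic triangular-number sequence h(i) = h0 + i*(i+1)/2, computed incrementally as h += i. On a power-of-two table it is guaranteed to visit every slot before repeating one. A small self-contained check of that property (illustration only, not runtime code):

package main

import "fmt"

// visitsAll reports whether the probe sequence h(i) = h0 + i*(i+1)/2 mod size
// touches every slot of a power-of-two sized table before revisiting any slot.
func visitsAll(h0, size uint) bool {
	mask := size - 1
	seen := make(map[uint]bool)
	h := h0 & mask
	for i := uint(1); uint(len(seen)) < size; i++ {
		if seen[h] {
			return false // revisited a slot before covering the table
		}
		seen[h] = true
		h = (h + i) & mask // same incremental step as in find/add
	}
	return true
}

func main() {
	fmt.Println(visitsAll(7, 512)) // true: all 512 slots are reached
}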
+
+// itabAdd adds the given itab to the itab hash table.
+// itabLock must be held.
+func itabAdd(m *itab) {
+ // Bugs can lead to calling this while mallocing is set,
+ // typically because this is called while panicking.
+ // Crash reliably, rather than only when we need to grow
+ // the hash table.
+ if getg().m.mallocing != 0 {
+ throw("malloc deadlock")
+ }
+
+ t := itabTable
+ if t.count >= 3*(t.size/4) { // 75% load factor
+ // Grow hash table.
+ // t2 = new(itabTableType) + some additional entries
+ // We lie and tell malloc we want pointer-free memory because
+ // all the pointed-to values are not in the heap.
+ t2 := (*itabTableType)(mallocgc((2+2*t.size)*sys.PtrSize, nil, true))
+ t2.size = t.size * 2
+
+ // Copy over entries.
+ // Note: while copying, other threads may look for an itab and
+ // fail to find it. That's ok, they will then try to get the itab lock
+ // and as a consequence wait until this copying is complete.
+ iterate_itabs(t2.add)
+ if t2.count != t.count {
+ throw("mismatched count during itab table copy")
+ }
+ // Publish new hash table. Use an atomic write: see comment in getitab.
+ atomicstorep(unsafe.Pointer(&itabTable), unsafe.Pointer(t2))
+ // Adopt the new table as our own.
+ t = itabTable
+ // Note: the old table can be GC'ed here.
+ }
+ t.add(m)
+}
+
+// add adds the given itab to itab table t.
+// itabLock must be held.
+func (t *itabTableType) add(m *itab) {
+ // See comment in find about the probe sequence.
+ // Insert new itab in the first empty spot in the probe sequence.
+ mask := t.size - 1
+ h := itabHashFunc(m.inter, m._type) & mask
+ for i := uintptr(1); ; i++ {
+ p := (**itab)(add(unsafe.Pointer(&t.entries), h*sys.PtrSize))
+ m2 := *p
+ if m2 == m {
+ // A given itab may be used in more than one module
+ // and thanks to the way global symbol resolution works, the
+ // pointed-to itab may already have been inserted into the
+ // global 'hash'.
+ return
+ }
+ if m2 == nil {
+ // Use atomic write here so if a reader sees m, it also
+ // sees the correctly initialized fields of m.
+ // NoWB is ok because m is not in heap memory.
+ // *p = m
+ atomic.StorepNoWB(unsafe.Pointer(p), unsafe.Pointer(m))
+ t.count++
+ return
+ }
+ h += i
+ h &= mask
+ }
+}
+
+// init fills in the m.fun array with all the code pointers for
+// the m.inter/m._type pair. If the type does not implement the interface,
+// it sets m.fun[0] to 0 and returns the name of an interface function that is missing.
+// It is ok to call this multiple times on the same m, even concurrently.
+func (m *itab) init() string {
+ inter := m.inter
+ typ := m._type
+ x := typ.uncommon()
+
+ // Both inter and typ have their methods sorted by name,
+ // and interface method names are unique,
+ // so we can iterate over both in lock step;
+ // the loop is O(ni+nt), not O(ni*nt).
+ ni := len(inter.mhdr)
+ nt := int(x.mcount)
+ xmhdr := (*[1 << 16]method)(add(unsafe.Pointer(x), uintptr(x.moff)))[:nt:nt]
+ j := 0
+ methods := (*[1 << 16]unsafe.Pointer)(unsafe.Pointer(&m.fun[0]))[:ni:ni]
+ var fun0 unsafe.Pointer
+imethods:
+ for k := 0; k < ni; k++ {
+ i := &inter.mhdr[k]
+ itype := inter.typ.typeOff(i.ityp)
+ name := inter.typ.nameOff(i.name)
+ iname := name.name()
+ ipkg := name.pkgPath()
+ if ipkg == "" {
+ ipkg = inter.pkgpath.name()
+ }
+ for ; j < nt; j++ {
+ t := &xmhdr[j]
+ tname := typ.nameOff(t.name)
+ if typ.typeOff(t.mtyp) == itype && tname.name() == iname {
+ pkgPath := tname.pkgPath()
+ if pkgPath == "" {
+ pkgPath = typ.nameOff(x.pkgpath).name()
+ }
+ if tname.isExported() || pkgPath == ipkg {
+ if m != nil {
+ ifn := typ.textOff(t.ifn)
+ if k == 0 {
+ fun0 = ifn // we'll set m.fun[0] at the end
+ } else {
+ methods[k] = ifn
+ }
+ }
+ continue imethods
+ }
+ }
+ }
+ // didn't find method
+ m.fun[0] = 0
+ return iname
+ }
+ m.fun[0] = uintptr(fun0)
+ return ""
+}
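Because both method lists are sorted by name, init resolves every interface method in a single merged pass. A simplified, self-contained sketch of that lock-step scan over plain name lists (package paths, type offsets, and function pointers omitted; not the runtime's code):

package main

import "fmt"

// matchMethods mirrors the lock-step scan in itab.init: a single forward
// pass over the concrete type's sorted methods resolves every interface
// method, giving O(ni+nt) instead of O(ni*nt). It returns the first
// missing method name, or "" if all are present.
func matchMethods(ifaceMethods, typeMethods []string) string {
	j := 0
	for _, want := range ifaceMethods {
		for j < len(typeMethods) && typeMethods[j] < want {
			j++
		}
		if j == len(typeMethods) || typeMethods[j] != want {
			return want // missing method
		}
	}
	return ""
}

func main() {
	fmt.Println(matchMethods([]string{"Read", "Write"}, []string{"Close", "Read", "Write"})) // ""
	fmt.Println(matchMethods([]string{"Read", "Seek"}, []string{"Close", "Read", "Write"}))  // "Seek"
}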
+
+func itabsinit() {
+ lockInit(&itabLock, lockRankItab)
+ lock(&itabLock)
+ for _, md := range activeModules() {
+ for _, i := range md.itablinks {
+ itabAdd(i)
+ }
+ }
+ unlock(&itabLock)
+}
+
+// panicdottypeE is called when doing an e.(T) conversion and the conversion fails.
+// have = the dynamic type we have.
+// want = the static type we're trying to convert to.
+// iface = the static type we're converting from.
+func panicdottypeE(have, want, iface *_type) {
+ panic(&TypeAssertionError{iface, have, want, ""})
+}
+
+// panicdottypeI is called when doing an i.(T) conversion and the conversion fails.
+// Same args as panicdottypeE, but "have" is the dynamic itab we have.
+func panicdottypeI(have *itab, want, iface *_type) {
+ var t *_type
+ if have != nil {
+ t = have._type
+ }
+ panicdottypeE(t, want, iface)
+}
+
+// panicnildottype is called when doing a i.(T) conversion and the interface i is nil.
+// want = the static type we're trying to convert to.
+func panicnildottype(want *_type) {
+ panic(&TypeAssertionError{nil, nil, want, ""})
+ // TODO: Add the static type we're converting from as well.
+ // It might generate a better error message.
+ // Just to match other nil conversion errors, we don't for now.
+}
+
+// The specialized convTx routines need a type descriptor to use when calling mallocgc.
+// We don't need the type to be exact, just to have the correct size, alignment, and pointer-ness.
+// However, when debugging, it'd be nice to have some indication in mallocgc where the types came from,
+// so we use named types here.
+// We then construct interface values of these types,
+// and then extract the type word to use as needed.
+type (
+ uint16InterfacePtr uint16
+ uint32InterfacePtr uint32
+ uint64InterfacePtr uint64
+ stringInterfacePtr string
+ sliceInterfacePtr []byte
+)
+
+var (
+ uint16Eface interface{} = uint16InterfacePtr(0)
+ uint32Eface interface{} = uint32InterfacePtr(0)
+ uint64Eface interface{} = uint64InterfacePtr(0)
+ stringEface interface{} = stringInterfacePtr("")
+ sliceEface interface{} = sliceInterfacePtr(nil)
+
+ uint16Type *_type = efaceOf(&uint16Eface)._type
+ uint32Type *_type = efaceOf(&uint32Eface)._type
+ uint64Type *_type = efaceOf(&uint64Eface)._type
+ stringType *_type = efaceOf(&stringEface)._type
+ sliceType *_type = efaceOf(&sliceEface)._type
+)
+
+// The conv and assert functions below do very similar things.
+// The convXXX functions are guaranteed by the compiler to succeed.
+// The assertXXX functions may fail (either panicking or returning false,
+// depending on whether they are 1-result or 2-result).
+// The convXXX functions succeed on a nil input, whereas the assertXXX
+// functions fail on a nil input.
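From the user's side the split plays out as follows (ordinary Go, shown only to illustrate the behavior these helpers back; which exact helper the compiler picks for each line is its own business):

package main

import "fmt"

func main() {
	var x uint64 = 42
	var e interface{} = x // conversion: always succeeds (a convXXX-style path)

	// 2-result assertion: never panics, reports success via ok.
	if s, ok := e.(fmt.Stringer); !ok {
		fmt.Println("not a Stringer:", s == nil) // not a Stringer: true
	}

	// 1-result assertion: panics when the dynamic type doesn't match.
	defer func() { fmt.Println("recovered:", recover() != nil) }() // recovered: true
	_ = e.(fmt.Stringer)
}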
+
+func convT2E(t *_type, elem unsafe.Pointer) (e eface) {
+ if raceenabled {
+ raceReadObjectPC(t, elem, getcallerpc(), funcPC(convT2E))
+ }
+ if msanenabled {
+ msanread(elem, t.size)
+ }
+ x := mallocgc(t.size, t, true)
+ // TODO: We allocate a zeroed object only to overwrite it with actual data.
+ // Figure out how to avoid zeroing. Also below in convT2Eslice, convT2I, convT2Islice.
+ typedmemmove(t, x, elem)
+ e._type = t
+ e.data = x
+ return
+}
+
+func convT16(val uint16) (x unsafe.Pointer) {
+ if val < uint16(len(staticuint64s)) {
+ x = unsafe.Pointer(&staticuint64s[val])
+ if sys.BigEndian {
+ x = add(x, 6)
+ }
+ } else {
+ x = mallocgc(2, uint16Type, false)
+ *(*uint16)(x) = val
+ }
+ return
+}
+
+func convT32(val uint32) (x unsafe.Pointer) {
+ if val < uint32(len(staticuint64s)) {
+ x = unsafe.Pointer(&staticuint64s[val])
+ if sys.BigEndian {
+ x = add(x, 4)
+ }
+ } else {
+ x = mallocgc(4, uint32Type, false)
+ *(*uint32)(x) = val
+ }
+ return
+}
+
+func convT64(val uint64) (x unsafe.Pointer) {
+ if val < uint64(len(staticuint64s)) {
+ x = unsafe.Pointer(&staticuint64s[val])
+ } else {
+ x = mallocgc(8, uint64Type, false)
+ *(*uint64)(x) = val
+ }
+ return
+}
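convT16, convT32, and convT64 avoid an allocation for values below 256 by pointing the interface's data word into the shared staticuint64s table (with a byte offset on big-endian targets so the pointer lands on the correctly sized tail of the 8-byte slot). The effect is observable from ordinary Go with the gc compiler, e.g.:

package main

import (
	"fmt"
	"testing"
)

var (
	sink  interface{}
	seven uint64 = 7   // variables, not constants, so the conversion isn't folded away
	big   uint64 = 700
)

func main() {
	small := testing.AllocsPerRun(100, func() { sink = seven }) // < 256: served from staticuint64s
	large := testing.AllocsPerRun(100, func() { sink = big })   // >= 256: needs mallocgc
	fmt.Println(small, large) // expected with the gc compiler: 0 1
}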
+
+func convTstring(val string) (x unsafe.Pointer) {
+ if val == "" {
+ x = unsafe.Pointer(&zeroVal[0])
+ } else {
+ x = mallocgc(unsafe.Sizeof(val), stringType, true)
+ *(*string)(x) = val
+ }
+ return
+}
+
+func convTslice(val []byte) (x unsafe.Pointer) {
+ // Note: this must work for any element type, not just byte.
+ if (*slice)(unsafe.Pointer(&val)).array == nil {
+ x = unsafe.Pointer(&zeroVal[0])
+ } else {
+ x = mallocgc(unsafe.Sizeof(val), sliceType, true)
+ *(*[]byte)(x) = val
+ }
+ return
+}
+
+func convT2Enoptr(t *_type, elem unsafe.Pointer) (e eface) {
+ if raceenabled {
+ raceReadObjectPC(t, elem, getcallerpc(), funcPC(convT2Enoptr))
+ }
+ if msanenabled {
+ msanread(elem, t.size)
+ }
+ x := mallocgc(t.size, t, false)
+ memmove(x, elem, t.size)
+ e._type = t
+ e.data = x
+ return
+}
+
+func convT2I(tab *itab, elem unsafe.Pointer) (i iface) {
+ t := tab._type
+ if raceenabled {
+ raceReadObjectPC(t, elem, getcallerpc(), funcPC(convT2I))
+ }
+ if msanenabled {
+ msanread(elem, t.size)
+ }
+ x := mallocgc(t.size, t, true)
+ typedmemmove(t, x, elem)
+ i.tab = tab
+ i.data = x
+ return
+}
+
+func convT2Inoptr(tab *itab, elem unsafe.Pointer) (i iface) {
+ t := tab._type
+ if raceenabled {
+ raceReadObjectPC(t, elem, getcallerpc(), funcPC(convT2Inoptr))
+ }
+ if msanenabled {
+ msanread(elem, t.size)
+ }
+ x := mallocgc(t.size, t, false)
+ memmove(x, elem, t.size)
+ i.tab = tab
+ i.data = x
+ return
+}
+
+func convI2I(inter *interfacetype, i iface) (r iface) {
+ tab := i.tab
+ if tab == nil {
+ return
+ }
+ if tab.inter == inter {
+ r.tab = tab
+ r.data = i.data
+ return
+ }
+ r.tab = getitab(inter, tab._type, false)
+ r.data = i.data
+ return
+}
+
+func assertI2I(inter *interfacetype, i iface) (r iface) {
+ tab := i.tab
+ if tab == nil {
+ // explicit conversions require non-nil interface value.
+ panic(&TypeAssertionError{nil, nil, &inter.typ, ""})
+ }
+ if tab.inter == inter {
+ r.tab = tab
+ r.data = i.data
+ return
+ }
+ r.tab = getitab(inter, tab._type, false)
+ r.data = i.data
+ return
+}
+
+func assertI2I2(inter *interfacetype, i iface) (r iface, b bool) {
+ tab := i.tab
+ if tab == nil {
+ return
+ }
+ if tab.inter != inter {
+ tab = getitab(inter, tab._type, true)
+ if tab == nil {
+ return
+ }
+ }
+ r.tab = tab
+ r.data = i.data
+ b = true
+ return
+}
+
+func assertE2I(inter *interfacetype, e eface) (r iface) {
+ t := e._type
+ if t == nil {
+ // explicit conversions require non-nil interface value.
+ panic(&TypeAssertionError{nil, nil, &inter.typ, ""})
+ }
+ r.tab = getitab(inter, t, false)
+ r.data = e.data
+ return
+}
+
+func assertE2I2(inter *interfacetype, e eface) (r iface, b bool) {
+ t := e._type
+ if t == nil {
+ return
+ }
+ tab := getitab(inter, t, true)
+ if tab == nil {
+ return
+ }
+ r.tab = tab
+ r.data = e.data
+ b = true
+ return
+}
+
+//go:linkname reflect_ifaceE2I reflect.ifaceE2I
+func reflect_ifaceE2I(inter *interfacetype, e eface, dst *iface) {
+ *dst = assertE2I(inter, e)
+}
+
+//go:linkname reflectlite_ifaceE2I internal/reflectlite.ifaceE2I
+func reflectlite_ifaceE2I(inter *interfacetype, e eface, dst *iface) {
+ *dst = assertE2I(inter, e)
+}
+
+func iterate_itabs(fn func(*itab)) {
+ // Note: only runs during stop the world or with itabLock held,
+ // so no other locks/atomics needed.
+ t := itabTable
+ for i := uintptr(0); i < t.size; i++ {
+ m := *(**itab)(add(unsafe.Pointer(&t.entries), i*sys.PtrSize))
+ if m != nil {
+ fn(m)
+ }
+ }
+}
+
+// staticuint64s is used to avoid allocating in convTx for small integer values.
+var staticuint64s = [...]uint64{
+ 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+ 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+ 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+ 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
+ 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
+ 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
+ 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
+ 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f,
+ 0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47,
+ 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f,
+ 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57,
+ 0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f,
+ 0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67,
+ 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f,
+ 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77,
+ 0x78, 0x79, 0x7a, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f,
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f,
+ 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
+ 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
+ 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
+ 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
+ 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
+ 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
+ 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
+ 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
+ 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
+ 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
+ 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
+ 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff,
+}
diff --git a/src/runtime/iface_test.go b/src/runtime/iface_test.go
new file mode 100644
index 0000000..4fab6c9
--- /dev/null
+++ b/src/runtime/iface_test.go
@@ -0,0 +1,439 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "testing"
+)
+
+type I1 interface {
+ Method1()
+}
+
+type I2 interface {
+ Method1()
+ Method2()
+}
+
+type TS uint16
+type TM uintptr
+type TL [2]uintptr
+
+func (TS) Method1() {}
+func (TS) Method2() {}
+func (TM) Method1() {}
+func (TM) Method2() {}
+func (TL) Method1() {}
+func (TL) Method2() {}
+
+type T8 uint8
+type T16 uint16
+type T32 uint32
+type T64 uint64
+type Tstr string
+type Tslice []byte
+
+func (T8) Method1() {}
+func (T16) Method1() {}
+func (T32) Method1() {}
+func (T64) Method1() {}
+func (Tstr) Method1() {}
+func (Tslice) Method1() {}
+
+var (
+ e interface{}
+ e_ interface{}
+ i1 I1
+ i2 I2
+ ts TS
+ tm TM
+ tl TL
+ ok bool
+)
+
+// Issue 9370
+func TestCmpIfaceConcreteAlloc(t *testing.T) {
+ if runtime.Compiler != "gc" {
+ t.Skip("skipping on non-gc compiler")
+ }
+
+ n := testing.AllocsPerRun(1, func() {
+ _ = e == ts
+ _ = i1 == ts
+ _ = e == 1
+ })
+
+ if n > 0 {
+ t.Fatalf("iface cmp allocs=%v; want 0", n)
+ }
+}
+
+func BenchmarkEqEfaceConcrete(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ _ = e == ts
+ }
+}
+
+func BenchmarkEqIfaceConcrete(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ _ = i1 == ts
+ }
+}
+
+func BenchmarkNeEfaceConcrete(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ _ = e != ts
+ }
+}
+
+func BenchmarkNeIfaceConcrete(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ _ = i1 != ts
+ }
+}
+
+func BenchmarkConvT2EByteSized(b *testing.B) {
+ b.Run("bool", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = yes
+ }
+ })
+ b.Run("uint8", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = eight8
+ }
+ })
+}
+
+func BenchmarkConvT2ESmall(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = ts
+ }
+}
+
+func BenchmarkConvT2EUintptr(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = tm
+ }
+}
+
+func BenchmarkConvT2ELarge(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = tl
+ }
+}
+
+func BenchmarkConvT2ISmall(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ i1 = ts
+ }
+}
+
+func BenchmarkConvT2IUintptr(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ i1 = tm
+ }
+}
+
+func BenchmarkConvT2ILarge(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ i1 = tl
+ }
+}
+
+func BenchmarkConvI2E(b *testing.B) {
+ i2 = tm
+ for i := 0; i < b.N; i++ {
+ e = i2
+ }
+}
+
+func BenchmarkConvI2I(b *testing.B) {
+ i2 = tm
+ for i := 0; i < b.N; i++ {
+ i1 = i2
+ }
+}
+
+func BenchmarkAssertE2T(b *testing.B) {
+ e = tm
+ for i := 0; i < b.N; i++ {
+ tm = e.(TM)
+ }
+}
+
+func BenchmarkAssertE2TLarge(b *testing.B) {
+ e = tl
+ for i := 0; i < b.N; i++ {
+ tl = e.(TL)
+ }
+}
+
+func BenchmarkAssertE2I(b *testing.B) {
+ e = tm
+ for i := 0; i < b.N; i++ {
+ i1 = e.(I1)
+ }
+}
+
+func BenchmarkAssertI2T(b *testing.B) {
+ i1 = tm
+ for i := 0; i < b.N; i++ {
+ tm = i1.(TM)
+ }
+}
+
+func BenchmarkAssertI2I(b *testing.B) {
+ i1 = tm
+ for i := 0; i < b.N; i++ {
+ i2 = i1.(I2)
+ }
+}
+
+func BenchmarkAssertI2E(b *testing.B) {
+ i1 = tm
+ for i := 0; i < b.N; i++ {
+ e = i1.(interface{})
+ }
+}
+
+func BenchmarkAssertE2E(b *testing.B) {
+ e = tm
+ for i := 0; i < b.N; i++ {
+ e_ = e
+ }
+}
+
+func BenchmarkAssertE2T2(b *testing.B) {
+ e = tm
+ for i := 0; i < b.N; i++ {
+ tm, ok = e.(TM)
+ }
+}
+
+func BenchmarkAssertE2T2Blank(b *testing.B) {
+ e = tm
+ for i := 0; i < b.N; i++ {
+ _, ok = e.(TM)
+ }
+}
+
+func BenchmarkAssertI2E2(b *testing.B) {
+ i1 = tm
+ for i := 0; i < b.N; i++ {
+ e, ok = i1.(interface{})
+ }
+}
+
+func BenchmarkAssertI2E2Blank(b *testing.B) {
+ i1 = tm
+ for i := 0; i < b.N; i++ {
+ _, ok = i1.(interface{})
+ }
+}
+
+func BenchmarkAssertE2E2(b *testing.B) {
+ e = tm
+ for i := 0; i < b.N; i++ {
+ e_, ok = e.(interface{})
+ }
+}
+
+func BenchmarkAssertE2E2Blank(b *testing.B) {
+ e = tm
+ for i := 0; i < b.N; i++ {
+ _, ok = e.(interface{})
+ }
+}
+
+func TestNonEscapingConvT2E(t *testing.T) {
+ m := make(map[interface{}]bool)
+ m[42] = true
+ if !m[42] {
+ t.Fatalf("42 is not present in the map")
+ }
+ if m[0] {
+ t.Fatalf("0 is present in the map")
+ }
+
+ n := testing.AllocsPerRun(1000, func() {
+ if m[0] {
+ t.Fatalf("0 is present in the map")
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func TestNonEscapingConvT2I(t *testing.T) {
+ m := make(map[I1]bool)
+ m[TM(42)] = true
+ if !m[TM(42)] {
+ t.Fatalf("42 is not present in the map")
+ }
+ if m[TM(0)] {
+ t.Fatalf("0 is present in the map")
+ }
+
+ n := testing.AllocsPerRun(1000, func() {
+ if m[TM(0)] {
+ t.Fatalf("0 is present in the map")
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func TestZeroConvT2x(t *testing.T) {
+ tests := []struct {
+ name string
+ fn func()
+ }{
+ {name: "E8", fn: func() { e = eight8 }}, // any byte-sized value does not allocate
+ {name: "E16", fn: func() { e = zero16 }}, // zero values do not allocate
+ {name: "E32", fn: func() { e = zero32 }},
+ {name: "E64", fn: func() { e = zero64 }},
+ {name: "Estr", fn: func() { e = zerostr }},
+ {name: "Eslice", fn: func() { e = zeroslice }},
+ {name: "Econstflt", fn: func() { e = 99.0 }}, // constants do not allocate
+ {name: "Econststr", fn: func() { e = "change" }},
+ {name: "I8", fn: func() { i1 = eight8I }},
+ {name: "I16", fn: func() { i1 = zero16I }},
+ {name: "I32", fn: func() { i1 = zero32I }},
+ {name: "I64", fn: func() { i1 = zero64I }},
+ {name: "Istr", fn: func() { i1 = zerostrI }},
+ {name: "Islice", fn: func() { i1 = zerosliceI }},
+ }
+
+ for _, test := range tests {
+ t.Run(test.name, func(t *testing.T) {
+ n := testing.AllocsPerRun(1000, test.fn)
+ if n != 0 {
+ t.Errorf("want zero allocs, got %v", n)
+ }
+ })
+ }
+}
+
+var (
+ eight8 uint8 = 8
+ eight8I T8 = 8
+ yes bool = true
+
+ zero16 uint16 = 0
+ zero16I T16 = 0
+ one16 uint16 = 1
+ thousand16 uint16 = 1000
+
+ zero32 uint32 = 0
+ zero32I T32 = 0
+ one32 uint32 = 1
+ thousand32 uint32 = 1000
+
+ zero64 uint64 = 0
+ zero64I T64 = 0
+ one64 uint64 = 1
+ thousand64 uint64 = 1000
+
+ zerostr string = ""
+ zerostrI Tstr = ""
+ nzstr string = "abc"
+
+ zeroslice []byte = nil
+ zerosliceI Tslice = nil
+ nzslice []byte = []byte("abc")
+
+ zerobig [512]byte
+ nzbig [512]byte = [512]byte{511: 1}
+)
+
+func BenchmarkConvT2Ezero(b *testing.B) {
+ b.Run("zero", func(b *testing.B) {
+ b.Run("16", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = zero16
+ }
+ })
+ b.Run("32", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = zero32
+ }
+ })
+ b.Run("64", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = zero64
+ }
+ })
+ b.Run("str", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = zerostr
+ }
+ })
+ b.Run("slice", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = zeroslice
+ }
+ })
+ b.Run("big", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = zerobig
+ }
+ })
+ })
+ b.Run("nonzero", func(b *testing.B) {
+ b.Run("str", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = nzstr
+ }
+ })
+ b.Run("slice", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = nzslice
+ }
+ })
+ b.Run("big", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = nzbig
+ }
+ })
+ })
+ b.Run("smallint", func(b *testing.B) {
+ b.Run("16", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = one16
+ }
+ })
+ b.Run("32", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = one32
+ }
+ })
+ b.Run("64", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = one64
+ }
+ })
+ })
+ b.Run("largeint", func(b *testing.B) {
+ b.Run("16", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = thousand16
+ }
+ })
+ b.Run("32", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = thousand32
+ }
+ })
+ b.Run("64", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = thousand64
+ }
+ })
+ })
+}
diff --git a/src/runtime/internal/atomic/asm_386.s b/src/runtime/internal/atomic/asm_386.s
new file mode 100644
index 0000000..d82faef
--- /dev/null
+++ b/src/runtime/internal/atomic/asm_386.s
@@ -0,0 +1,261 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "funcdata.h"
+
+// bool Cas(int32 *val, int32 old, int32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// }else
+// return 0;
+TEXT ·Cas(SB), NOSPLIT, $0-13
+ MOVL ptr+0(FP), BX
+ MOVL old+4(FP), AX
+ MOVL new+8(FP), CX
+ LOCK
+ CMPXCHGL CX, 0(BX)
+ SETEQ ret+12(FP)
+ RET
+
+TEXT ·Casuintptr(SB), NOSPLIT, $0-13
+ JMP ·Cas(SB)
+
+TEXT ·CasRel(SB), NOSPLIT, $0-13
+ JMP ·Cas(SB)
+
+TEXT ·Loaduintptr(SB), NOSPLIT, $0-8
+ JMP ·Load(SB)
+
+TEXT ·Loaduint(SB), NOSPLIT, $0-8
+ JMP ·Load(SB)
+
+TEXT ·Storeuintptr(SB), NOSPLIT, $0-8
+ JMP ·Store(SB)
+
+TEXT ·Xadduintptr(SB), NOSPLIT, $0-12
+ JMP ·Xadd(SB)
+
+TEXT ·Loadint64(SB), NOSPLIT, $0-12
+ JMP ·Load64(SB)
+
+TEXT ·Xaddint64(SB), NOSPLIT, $0-20
+ JMP ·Xadd64(SB)
+
+// bool ·Cas64(uint64 *val, uint64 old, uint64 new)
+// Atomically:
+ // if(*val == old){
+// *val = new;
+// return 1;
+// } else {
+// return 0;
+// }
+TEXT ·Cas64(SB), NOSPLIT, $0-21
+ NO_LOCAL_POINTERS
+ MOVL ptr+0(FP), BP
+ TESTL $7, BP
+ JZ 2(PC)
+ CALL ·panicUnaligned(SB)
+ MOVL old_lo+4(FP), AX
+ MOVL old_hi+8(FP), DX
+ MOVL new_lo+12(FP), BX
+ MOVL new_hi+16(FP), CX
+ LOCK
+ CMPXCHG8B 0(BP)
+ SETEQ ret+20(FP)
+ RET
+
+// bool Casp1(void **p, void *old, void *new)
+// Atomically:
+// if(*p == old){
+// *p = new;
+// return 1;
+// }else
+// return 0;
+TEXT ·Casp1(SB), NOSPLIT, $0-13
+ MOVL ptr+0(FP), BX
+ MOVL old+4(FP), AX
+ MOVL new+8(FP), CX
+ LOCK
+ CMPXCHGL CX, 0(BX)
+ SETEQ ret+12(FP)
+ RET
+
+// uint32 Xadd(uint32 volatile *val, int32 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT ·Xadd(SB), NOSPLIT, $0-12
+ MOVL ptr+0(FP), BX
+ MOVL delta+4(FP), AX
+ MOVL AX, CX
+ LOCK
+ XADDL AX, 0(BX)
+ ADDL CX, AX
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT ·Xadd64(SB), NOSPLIT, $0-20
+ NO_LOCAL_POINTERS
+ // no XADDQ so use CMPXCHG8B loop
+ MOVL ptr+0(FP), BP
+ TESTL $7, BP
+ JZ 2(PC)
+ CALL ·panicUnaligned(SB)
+ // DI:SI = delta
+ MOVL delta_lo+4(FP), SI
+ MOVL delta_hi+8(FP), DI
+ // DX:AX = *addr
+ MOVL 0(BP), AX
+ MOVL 4(BP), DX
+addloop:
+ // CX:BX = DX:AX (*addr) + DI:SI (delta)
+ MOVL AX, BX
+ MOVL DX, CX
+ ADDL SI, BX
+ ADCL DI, CX
+
+ // if *addr == DX:AX {
+ // *addr = CX:BX
+ // } else {
+ // DX:AX = *addr
+ // }
+ // all in one instruction
+ LOCK
+ CMPXCHG8B 0(BP)
+
+ JNZ addloop
+
+ // success
+ // return CX:BX
+ MOVL BX, ret_lo+12(FP)
+ MOVL CX, ret_hi+16(FP)
+ RET
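The Xadd64 loop above is the standard compare-and-swap retry pattern: read the old value, compute old+delta, and retry CMPXCHG8B until no other writer intervened. The same idea expressed in Go with sync/atomic (illustration of the pattern only; the runtime itself cannot use sync/atomic and relies on the assembly above):

package main

import (
	"fmt"
	"sync/atomic"
)

// xadd64 atomically adds delta to *addr and returns the new value,
// using a CAS retry loop analogous to the CMPXCHG8B loop in Xadd64.
// On 32-bit platforms the uint64 must be 8-byte aligned.
func xadd64(addr *uint64, delta uint64) uint64 {
	for {
		old := atomic.LoadUint64(addr)
		updated := old + delta
		if atomic.CompareAndSwapUint64(addr, old, updated) {
			return updated
		}
		// Another writer got in between the load and the CAS; retry.
	}
}

func main() {
	var x uint64 = 40
	fmt.Println(xadd64(&x, 2)) // 42
}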
+
+TEXT ·Xchg(SB), NOSPLIT, $0-12
+ MOVL ptr+0(FP), BX
+ MOVL new+4(FP), AX
+ XCHGL AX, 0(BX)
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT ·Xchguintptr(SB), NOSPLIT, $0-12
+ JMP ·Xchg(SB)
+
+TEXT ·Xchg64(SB),NOSPLIT,$0-20
+ NO_LOCAL_POINTERS
+ // no XCHGQ so use CMPXCHG8B loop
+ MOVL ptr+0(FP), BP
+ TESTL $7, BP
+ JZ 2(PC)
+ CALL ·panicUnaligned(SB)
+ // CX:BX = new
+ MOVL new_lo+4(FP), BX
+ MOVL new_hi+8(FP), CX
+ // DX:AX = *addr
+ MOVL 0(BP), AX
+ MOVL 4(BP), DX
+swaploop:
+ // if *addr == DX:AX
+ // *addr = CX:BX
+ // else
+ // DX:AX = *addr
+ // all in one instruction
+ LOCK
+ CMPXCHG8B 0(BP)
+ JNZ swaploop
+
+ // success
+ // return DX:AX
+ MOVL AX, ret_lo+12(FP)
+ MOVL DX, ret_hi+16(FP)
+ RET
+
+TEXT ·StorepNoWB(SB), NOSPLIT, $0-8
+ MOVL ptr+0(FP), BX
+ MOVL val+4(FP), AX
+ XCHGL AX, 0(BX)
+ RET
+
+TEXT ·Store(SB), NOSPLIT, $0-8
+ MOVL ptr+0(FP), BX
+ MOVL val+4(FP), AX
+ XCHGL AX, 0(BX)
+ RET
+
+TEXT ·StoreRel(SB), NOSPLIT, $0-8
+ JMP ·Store(SB)
+
+TEXT runtime∕internal∕atomic·StoreReluintptr(SB), NOSPLIT, $0-8
+ JMP runtime∕internal∕atomic·Store(SB)
+
+// uint64 atomicload64(uint64 volatile* addr);
+TEXT ·Load64(SB), NOSPLIT, $0-12
+ NO_LOCAL_POINTERS
+ MOVL ptr+0(FP), AX
+ TESTL $7, AX
+ JZ 2(PC)
+ CALL ·panicUnaligned(SB)
+ MOVQ (AX), M0
+ MOVQ M0, ret+4(FP)
+ EMMS
+ RET
+
+// void ·Store64(uint64 volatile* addr, uint64 v);
+TEXT ·Store64(SB), NOSPLIT, $0-12
+ NO_LOCAL_POINTERS
+ MOVL ptr+0(FP), AX
+ TESTL $7, AX
+ JZ 2(PC)
+ CALL ·panicUnaligned(SB)
+ // MOVQ and EMMS were introduced on the Pentium MMX.
+ MOVQ val+4(FP), M0
+ MOVQ M0, (AX)
+ EMMS
+ // This is essentially a no-op, but it provides required memory fencing.
+ // It can be replaced with MFENCE, but MFENCE was introduced only on the Pentium4 (SSE2).
+ XORL AX, AX
+ LOCK
+ XADDL AX, (SP)
+ RET
+
+// void ·Or8(byte volatile*, byte);
+TEXT ·Or8(SB), NOSPLIT, $0-5
+ MOVL ptr+0(FP), AX
+ MOVB val+4(FP), BX
+ LOCK
+ ORB BX, (AX)
+ RET
+
+// void ·And8(byte volatile*, byte);
+TEXT ·And8(SB), NOSPLIT, $0-5
+ MOVL ptr+0(FP), AX
+ MOVB val+4(FP), BX
+ LOCK
+ ANDB BX, (AX)
+ RET
+
+TEXT ·Store8(SB), NOSPLIT, $0-5
+ MOVL ptr+0(FP), BX
+ MOVB val+4(FP), AX
+ XCHGB AX, 0(BX)
+ RET
+
+// func Or(addr *uint32, v uint32)
+TEXT ·Or(SB), NOSPLIT, $0-8
+ MOVL ptr+0(FP), AX
+ MOVL val+4(FP), BX
+ LOCK
+ ORL BX, (AX)
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT ·And(SB), NOSPLIT, $0-8
+ MOVL ptr+0(FP), AX
+ MOVL val+4(FP), BX
+ LOCK
+ ANDL BX, (AX)
+ RET
diff --git a/src/runtime/internal/atomic/asm_amd64.s b/src/runtime/internal/atomic/asm_amd64.s
new file mode 100644
index 0000000..2cf7c55
--- /dev/null
+++ b/src/runtime/internal/atomic/asm_amd64.s
@@ -0,0 +1,187 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Note: some of these functions are semantically inlined
+// by the compiler (in src/cmd/compile/internal/gc/ssa.go).
+
+#include "textflag.h"
+
+// bool Cas(int32 *val, int32 old, int32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT runtime∕internal∕atomic·Cas(SB),NOSPLIT,$0-17
+ MOVQ ptr+0(FP), BX
+ MOVL old+8(FP), AX
+ MOVL new+12(FP), CX
+ LOCK
+ CMPXCHGL CX, 0(BX)
+ SETEQ ret+16(FP)
+ RET
+
+// bool runtime∕internal∕atomic·Cas64(uint64 *val, uint64 old, uint64 new)
+// Atomically:
+ // if(*val == old){
+// *val = new;
+// return 1;
+// } else {
+// return 0;
+// }
+TEXT runtime∕internal∕atomic·Cas64(SB), NOSPLIT, $0-25
+ MOVQ ptr+0(FP), BX
+ MOVQ old+8(FP), AX
+ MOVQ new+16(FP), CX
+ LOCK
+ CMPXCHGQ CX, 0(BX)
+ SETEQ ret+24(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·Casuintptr(SB), NOSPLIT, $0-25
+ JMP runtime∕internal∕atomic·Cas64(SB)
+
+TEXT runtime∕internal∕atomic·CasRel(SB), NOSPLIT, $0-17
+ JMP runtime∕internal∕atomic·Cas(SB)
+
+TEXT runtime∕internal∕atomic·Loaduintptr(SB), NOSPLIT, $0-16
+ JMP runtime∕internal∕atomic·Load64(SB)
+
+TEXT runtime∕internal∕atomic·Loaduint(SB), NOSPLIT, $0-16
+ JMP runtime∕internal∕atomic·Load64(SB)
+
+TEXT runtime∕internal∕atomic·Storeuintptr(SB), NOSPLIT, $0-16
+ JMP runtime∕internal∕atomic·Store64(SB)
+
+TEXT runtime∕internal∕atomic·Loadint64(SB), NOSPLIT, $0-16
+ JMP runtime∕internal∕atomic·Load64(SB)
+
+TEXT runtime∕internal∕atomic·Xaddint64(SB), NOSPLIT, $0-24
+ JMP runtime∕internal∕atomic·Xadd64(SB)
+
+// bool Casp1(void **val, void *old, void *new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT runtime∕internal∕atomic·Casp1(SB), NOSPLIT, $0-25
+ MOVQ ptr+0(FP), BX
+ MOVQ old+8(FP), AX
+ MOVQ new+16(FP), CX
+ LOCK
+ CMPXCHGQ CX, 0(BX)
+ SETEQ ret+24(FP)
+ RET
+
+// uint32 Xadd(uint32 volatile *val, int32 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT runtime∕internal∕atomic·Xadd(SB), NOSPLIT, $0-20
+ MOVQ ptr+0(FP), BX
+ MOVL delta+8(FP), AX
+ MOVL AX, CX
+ LOCK
+ XADDL AX, 0(BX)
+ ADDL CX, AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·Xadd64(SB), NOSPLIT, $0-24
+ MOVQ ptr+0(FP), BX
+ MOVQ delta+8(FP), AX
+ MOVQ AX, CX
+ LOCK
+ XADDQ AX, 0(BX)
+ ADDQ CX, AX
+ MOVQ AX, ret+16(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·Xadduintptr(SB), NOSPLIT, $0-24
+ JMP runtime∕internal∕atomic·Xadd64(SB)
+
+TEXT runtime∕internal∕atomic·Xchg(SB), NOSPLIT, $0-20
+ MOVQ ptr+0(FP), BX
+ MOVL new+8(FP), AX
+ XCHGL AX, 0(BX)
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·Xchg64(SB), NOSPLIT, $0-24
+ MOVQ ptr+0(FP), BX
+ MOVQ new+8(FP), AX
+ XCHGQ AX, 0(BX)
+ MOVQ AX, ret+16(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·Xchguintptr(SB), NOSPLIT, $0-24
+ JMP runtime∕internal∕atomic·Xchg64(SB)
+
+TEXT runtime∕internal∕atomic·StorepNoWB(SB), NOSPLIT, $0-16
+ MOVQ ptr+0(FP), BX
+ MOVQ val+8(FP), AX
+ XCHGQ AX, 0(BX)
+ RET
+
+TEXT runtime∕internal∕atomic·Store(SB), NOSPLIT, $0-12
+ MOVQ ptr+0(FP), BX
+ MOVL val+8(FP), AX
+ XCHGL AX, 0(BX)
+ RET
+
+TEXT runtime∕internal∕atomic·StoreRel(SB), NOSPLIT, $0-12
+ JMP runtime∕internal∕atomic·Store(SB)
+
+TEXT runtime∕internal∕atomic·StoreRel64(SB), NOSPLIT, $0-16
+ JMP runtime∕internal∕atomic·Store64(SB)
+
+TEXT runtime∕internal∕atomic·StoreReluintptr(SB), NOSPLIT, $0-16
+ JMP runtime∕internal∕atomic·Store64(SB)
+
+TEXT runtime∕internal∕atomic·Store8(SB), NOSPLIT, $0-9
+ MOVQ ptr+0(FP), BX
+ MOVB val+8(FP), AX
+ XCHGB AX, 0(BX)
+ RET
+
+TEXT runtime∕internal∕atomic·Store64(SB), NOSPLIT, $0-16
+ MOVQ ptr+0(FP), BX
+ MOVQ val+8(FP), AX
+ XCHGQ AX, 0(BX)
+ RET
+
+// void runtime∕internal∕atomic·Or8(byte volatile*, byte);
+TEXT runtime∕internal∕atomic·Or8(SB), NOSPLIT, $0-9
+ MOVQ ptr+0(FP), AX
+ MOVB val+8(FP), BX
+ LOCK
+ ORB BX, (AX)
+ RET
+
+// void runtime∕internal∕atomic·And8(byte volatile*, byte);
+TEXT runtime∕internal∕atomic·And8(SB), NOSPLIT, $0-9
+ MOVQ ptr+0(FP), AX
+ MOVB val+8(FP), BX
+ LOCK
+ ANDB BX, (AX)
+ RET
+
+// func Or(addr *uint32, v uint32)
+TEXT runtime∕internal∕atomic·Or(SB), NOSPLIT, $0-12
+ MOVQ ptr+0(FP), AX
+ MOVL val+8(FP), BX
+ LOCK
+ ORL BX, (AX)
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT runtime∕internal∕atomic·And(SB), NOSPLIT, $0-12
+ MOVQ ptr+0(FP), AX
+ MOVL val+8(FP), BX
+ LOCK
+ ANDL BX, (AX)
+ RET
diff --git a/src/runtime/internal/atomic/asm_arm.s b/src/runtime/internal/atomic/asm_arm.s
new file mode 100644
index 0000000..274925e
--- /dev/null
+++ b/src/runtime/internal/atomic/asm_arm.s
@@ -0,0 +1,284 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "funcdata.h"
+
+// bool armcas(int32 *val, int32 old, int32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// }else
+// return 0;
+//
+// To implement ·cas in sys_$GOOS_arm.s
+// using the native instructions, use:
+//
+// TEXT ·cas(SB),NOSPLIT,$0
+// B ·armcas(SB)
+//
+TEXT ·armcas(SB),NOSPLIT,$0-13
+ MOVW ptr+0(FP), R1
+ MOVW old+4(FP), R2
+ MOVW new+8(FP), R3
+casl:
+ LDREX (R1), R0
+ CMP R0, R2
+ BNE casfail
+
+ MOVB runtime·goarm(SB), R8
+ CMP $7, R8
+ BLT 2(PC)
+ DMB MB_ISHST
+
+ STREX R3, (R1), R0
+ CMP $0, R0
+ BNE casl
+ MOVW $1, R0
+
+ CMP $7, R8
+ BLT 2(PC)
+ DMB MB_ISH
+
+ MOVB R0, ret+12(FP)
+ RET
+casfail:
+ MOVW $0, R0
+ MOVB R0, ret+12(FP)
+ RET
+
+// stubs
+
+TEXT ·Loadp(SB),NOSPLIT|NOFRAME,$0-8
+ B ·Load(SB)
+
+TEXT ·LoadAcq(SB),NOSPLIT|NOFRAME,$0-8
+ B ·Load(SB)
+
+TEXT ·LoadAcquintptr(SB),NOSPLIT|NOFRAME,$0-8
+ B ·Load(SB)
+
+TEXT ·Casuintptr(SB),NOSPLIT,$0-13
+ B ·Cas(SB)
+
+TEXT ·Casp1(SB),NOSPLIT,$0-13
+ B ·Cas(SB)
+
+TEXT ·CasRel(SB),NOSPLIT,$0-13
+ B ·Cas(SB)
+
+TEXT ·Loaduintptr(SB),NOSPLIT,$0-8
+ B ·Load(SB)
+
+TEXT ·Loaduint(SB),NOSPLIT,$0-8
+ B ·Load(SB)
+
+TEXT ·Storeuintptr(SB),NOSPLIT,$0-8
+ B ·Store(SB)
+
+TEXT ·StorepNoWB(SB),NOSPLIT,$0-8
+ B ·Store(SB)
+
+TEXT ·StoreRel(SB),NOSPLIT,$0-8
+ B ·Store(SB)
+
+TEXT ·StoreReluintptr(SB),NOSPLIT,$0-8
+ B ·Store(SB)
+
+TEXT ·Xadduintptr(SB),NOSPLIT,$0-12
+ B ·Xadd(SB)
+
+TEXT ·Loadint64(SB),NOSPLIT,$0-12
+ B ·Load64(SB)
+
+TEXT ·Xaddint64(SB),NOSPLIT,$0-20
+ B ·Xadd64(SB)
+
+// 64-bit atomics
+// The native ARM implementations use LDREXD/STREXD, which are
+// available on ARMv6k or later. We use them only on ARMv7.
+// On older ARM, we use Go implementations which simulate 64-bit
+// atomics with locks.
+
+TEXT armCas64<>(SB),NOSPLIT,$0-21
+ // addr is already in R1
+ MOVW old_lo+4(FP), R2
+ MOVW old_hi+8(FP), R3
+ MOVW new_lo+12(FP), R4
+ MOVW new_hi+16(FP), R5
+cas64loop:
+ LDREXD (R1), R6 // loads R6 and R7
+ CMP R2, R6
+ BNE cas64fail
+ CMP R3, R7
+ BNE cas64fail
+
+ DMB MB_ISHST
+
+ STREXD R4, (R1), R0 // stores R4 and R5
+ CMP $0, R0
+ BNE cas64loop
+ MOVW $1, R0
+
+ DMB MB_ISH
+
+ MOVBU R0, swapped+20(FP)
+ RET
+cas64fail:
+ MOVW $0, R0
+ MOVBU R0, swapped+20(FP)
+ RET
+
+TEXT armXadd64<>(SB),NOSPLIT,$0-20
+ // addr is already in R1
+ MOVW delta_lo+4(FP), R2
+ MOVW delta_hi+8(FP), R3
+
+add64loop:
+ LDREXD (R1), R4 // loads R4 and R5
+ ADD.S R2, R4
+ ADC R3, R5
+
+ DMB MB_ISHST
+
+ STREXD R4, (R1), R0 // stores R4 and R5
+ CMP $0, R0
+ BNE add64loop
+
+ DMB MB_ISH
+
+ MOVW R4, new_lo+12(FP)
+ MOVW R5, new_hi+16(FP)
+ RET
+
+TEXT armXchg64<>(SB),NOSPLIT,$0-20
+ // addr is already in R1
+ MOVW new_lo+4(FP), R2
+ MOVW new_hi+8(FP), R3
+
+swap64loop:
+ LDREXD (R1), R4 // loads R4 and R5
+
+ DMB MB_ISHST
+
+ STREXD R2, (R1), R0 // stores R2 and R3
+ CMP $0, R0
+ BNE swap64loop
+
+ DMB MB_ISH
+
+ MOVW R4, old_lo+12(FP)
+ MOVW R5, old_hi+16(FP)
+ RET
+
+TEXT armLoad64<>(SB),NOSPLIT,$0-12
+ // addr is already in R1
+
+ LDREXD (R1), R2 // loads R2 and R3
+ DMB MB_ISH
+
+ MOVW R2, val_lo+4(FP)
+ MOVW R3, val_hi+8(FP)
+ RET
+
+TEXT armStore64<>(SB),NOSPLIT,$0-12
+ // addr is already in R1
+ MOVW val_lo+4(FP), R2
+ MOVW val_hi+8(FP), R3
+
+store64loop:
+ LDREXD (R1), R4 // loads R4 and R5
+
+ DMB MB_ISHST
+
+ STREXD R2, (R1), R0 // stores R2 and R3
+ CMP $0, R0
+ BNE store64loop
+
+ DMB MB_ISH
+ RET
+
+// The following functions all panic if their address argument isn't
+// 8-byte aligned. Since we're calling back into Go code to do this,
+// we have to cooperate with stack unwinding. In the normal case, the
+// functions tail-call into the appropriate implementation, which
+// means they must not open a frame. Hence, when they go down the
+// panic path, at that point they push the LR to create a real frame
+// (they don't need to pop it because panic won't return).
+
+TEXT ·Cas64(SB),NOSPLIT,$-4-21
+ NO_LOCAL_POINTERS
+ MOVW addr+0(FP), R1
+ // make unaligned atomic access panic
+ AND.S $7, R1, R2
+ BEQ 3(PC)
+ MOVW.W R14, -4(R13) // prepare a real frame
+ BL ·panicUnaligned(SB)
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ JMP armCas64<>(SB)
+ JMP ·goCas64(SB)
+
+TEXT ·Xadd64(SB),NOSPLIT,$-4-20
+ NO_LOCAL_POINTERS
+ MOVW addr+0(FP), R1
+ // make unaligned atomic access panic
+ AND.S $7, R1, R2
+ BEQ 3(PC)
+ MOVW.W R14, -4(R13) // prepare a real frame
+ BL ·panicUnaligned(SB)
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ JMP armXadd64<>(SB)
+ JMP ·goXadd64(SB)
+
+TEXT ·Xchg64(SB),NOSPLIT,$-4-20
+ NO_LOCAL_POINTERS
+ MOVW addr+0(FP), R1
+ // make unaligned atomic access panic
+ AND.S $7, R1, R2
+ BEQ 3(PC)
+ MOVW.W R14, -4(R13) // prepare a real frame
+ BL ·panicUnaligned(SB)
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ JMP armXchg64<>(SB)
+ JMP ·goXchg64(SB)
+
+TEXT ·Load64(SB),NOSPLIT,$-4-12
+ NO_LOCAL_POINTERS
+ MOVW addr+0(FP), R1
+ // make unaligned atomic access panic
+ AND.S $7, R1, R2
+ BEQ 3(PC)
+ MOVW.W R14, -4(R13) // prepare a real frame
+ BL ·panicUnaligned(SB)
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ JMP armLoad64<>(SB)
+ JMP ·goLoad64(SB)
+
+TEXT ·Store64(SB),NOSPLIT,$-4-12
+ NO_LOCAL_POINTERS
+ MOVW addr+0(FP), R1
+ // make unaligned atomic access panic
+ AND.S $7, R1, R2
+ BEQ 3(PC)
+ MOVW.W R14, -4(R13) // prepare a real frame
+ BL ·panicUnaligned(SB)
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ JMP armStore64<>(SB)
+ JMP ·goStore64(SB)
diff --git a/src/runtime/internal/atomic/asm_arm64.s b/src/runtime/internal/atomic/asm_arm64.s
new file mode 100644
index 0000000..8336a85
--- /dev/null
+++ b/src/runtime/internal/atomic/asm_arm64.s
@@ -0,0 +1,61 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// bool Cas(uint32 *ptr, uint32 old, uint32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT runtime∕internal∕atomic·Cas(SB), NOSPLIT, $0-17
+ MOVD ptr+0(FP), R0
+ MOVW old+8(FP), R1
+ MOVW new+12(FP), R2
+again:
+ LDAXRW (R0), R3
+ CMPW R1, R3
+ BNE ok
+ STLXRW R2, (R0), R3
+ CBNZ R3, again
+ok:
+ CSET EQ, R0
+ MOVB R0, ret+16(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·Casuintptr(SB), NOSPLIT, $0-25
+ B runtime∕internal∕atomic·Cas64(SB)
+
+TEXT runtime∕internal∕atomic·CasRel(SB), NOSPLIT, $0-17
+ B runtime∕internal∕atomic·Cas(SB)
+
+TEXT runtime∕internal∕atomic·Loaduintptr(SB), NOSPLIT, $0-16
+ B runtime∕internal∕atomic·Load64(SB)
+
+TEXT runtime∕internal∕atomic·Loaduint(SB), NOSPLIT, $0-16
+ B runtime∕internal∕atomic·Load64(SB)
+
+TEXT runtime∕internal∕atomic·Storeuintptr(SB), NOSPLIT, $0-16
+ B runtime∕internal∕atomic·Store64(SB)
+
+TEXT runtime∕internal∕atomic·Xadduintptr(SB), NOSPLIT, $0-24
+ B runtime∕internal∕atomic·Xadd64(SB)
+
+TEXT runtime∕internal∕atomic·Loadint64(SB), NOSPLIT, $0-16
+ B runtime∕internal∕atomic·Load64(SB)
+
+TEXT runtime∕internal∕atomic·Xaddint64(SB), NOSPLIT, $0-24
+ B runtime∕internal∕atomic·Xadd64(SB)
+
+// bool Casp1(void **val, void *old, void *new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT runtime∕internal∕atomic·Casp1(SB), NOSPLIT, $0-25
+ B runtime∕internal∕atomic·Cas64(SB)
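
The LDAXRW/STLXRW pair in Cas above is a load-acquire/store-release exclusive loop: the store fails and the loop retries if another CPU wrote the location in between, which is the guarantee compare-and-swap callers build on. A small usage sketch against the public sync/atomic API, showing the caller-side retry loop such a CAS enables; addCapped is an illustrative helper, not part of the runtime:

// addCapped atomically adds delta to *p without letting it exceed limit,
// recomputing the target value and retrying whenever the CAS loses a race.
package main

import (
	"fmt"
	"sync/atomic"
)

func addCapped(p *uint32, delta, limit uint32) uint32 {
	for {
		old := atomic.LoadUint32(p)
		next := old + delta
		if next > limit {
			next = limit
		}
		if atomic.CompareAndSwapUint32(p, old, next) {
			return next
		}
		// Lost the race to another writer; reload and retry.
	}
}

func main() {
	var n uint32
	fmt.Println(addCapped(&n, 7, 10)) // 7
	fmt.Println(addCapped(&n, 7, 10)) // 10
}
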
diff --git a/src/runtime/internal/atomic/asm_mips64x.s b/src/runtime/internal/atomic/asm_mips64x.s
new file mode 100644
index 0000000..a515683
--- /dev/null
+++ b/src/runtime/internal/atomic/asm_mips64x.s
@@ -0,0 +1,271 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips64 mips64le
+
+#include "textflag.h"
+
+// bool cas(uint32 *ptr, uint32 old, uint32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT ·Cas(SB), NOSPLIT, $0-17
+ MOVV ptr+0(FP), R1
+ MOVW old+8(FP), R2
+ MOVW new+12(FP), R5
+ SYNC
+cas_again:
+ MOVV R5, R3
+ LL (R1), R4
+ BNE R2, R4, cas_fail
+ SC R3, (R1)
+ BEQ R3, cas_again
+ MOVV $1, R1
+ MOVB R1, ret+16(FP)
+ SYNC
+ RET
+cas_fail:
+ MOVV $0, R1
+ JMP -4(PC)
+
+// bool cas64(uint64 *ptr, uint64 old, uint64 new)
+// Atomically:
+//	if(*val == old){
+// *val = new;
+// return 1;
+// } else {
+// return 0;
+// }
+TEXT ·Cas64(SB), NOSPLIT, $0-25
+ MOVV ptr+0(FP), R1
+ MOVV old+8(FP), R2
+ MOVV new+16(FP), R5
+ SYNC
+cas64_again:
+ MOVV R5, R3
+ LLV (R1), R4
+ BNE R2, R4, cas64_fail
+ SCV R3, (R1)
+ BEQ R3, cas64_again
+ MOVV $1, R1
+ MOVB R1, ret+24(FP)
+ SYNC
+ RET
+cas64_fail:
+ MOVV $0, R1
+ JMP -4(PC)
+
+TEXT ·Casuintptr(SB), NOSPLIT, $0-25
+ JMP ·Cas64(SB)
+
+TEXT ·CasRel(SB), NOSPLIT, $0-17
+ JMP ·Cas(SB)
+
+TEXT ·Loaduintptr(SB), NOSPLIT|NOFRAME, $0-16
+ JMP ·Load64(SB)
+
+TEXT ·Loaduint(SB), NOSPLIT|NOFRAME, $0-16
+ JMP ·Load64(SB)
+
+TEXT ·Storeuintptr(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·Xadduintptr(SB), NOSPLIT, $0-24
+ JMP ·Xadd64(SB)
+
+TEXT ·Loadint64(SB), NOSPLIT, $0-16
+ JMP ·Load64(SB)
+
+TEXT ·Xaddint64(SB), NOSPLIT, $0-24
+ JMP ·Xadd64(SB)
+
+// bool casp(void **val, void *old, void *new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT ·Casp1(SB), NOSPLIT, $0-25
+ JMP runtime∕internal∕atomic·Cas64(SB)
+
+// uint32 xadd(uint32 volatile *ptr, int32 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT ·Xadd(SB), NOSPLIT, $0-20
+ MOVV ptr+0(FP), R2
+ MOVW delta+8(FP), R3
+ SYNC
+ LL (R2), R1
+ ADDU R1, R3, R4
+ MOVV R4, R1
+ SC R4, (R2)
+ BEQ R4, -4(PC)
+ MOVW R1, ret+16(FP)
+ SYNC
+ RET
+
+TEXT ·Xadd64(SB), NOSPLIT, $0-24
+ MOVV ptr+0(FP), R2
+ MOVV delta+8(FP), R3
+ SYNC
+ LLV (R2), R1
+ ADDVU R1, R3, R4
+ MOVV R4, R1
+ SCV R4, (R2)
+ BEQ R4, -4(PC)
+ MOVV R1, ret+16(FP)
+ SYNC
+ RET
+
+TEXT ·Xchg(SB), NOSPLIT, $0-20
+ MOVV ptr+0(FP), R2
+ MOVW new+8(FP), R5
+
+ SYNC
+ MOVV R5, R3
+ LL (R2), R1
+ SC R3, (R2)
+ BEQ R3, -3(PC)
+ MOVW R1, ret+16(FP)
+ SYNC
+ RET
+
+TEXT ·Xchg64(SB), NOSPLIT, $0-24
+ MOVV ptr+0(FP), R2
+ MOVV new+8(FP), R5
+
+ SYNC
+ MOVV R5, R3
+ LLV (R2), R1
+ SCV R3, (R2)
+ BEQ R3, -3(PC)
+ MOVV R1, ret+16(FP)
+ SYNC
+ RET
+
+TEXT ·Xchguintptr(SB), NOSPLIT, $0-24
+ JMP ·Xchg64(SB)
+
+TEXT ·StorepNoWB(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·StoreRel(SB), NOSPLIT, $0-12
+ JMP ·Store(SB)
+
+TEXT ·StoreRel64(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·StoreReluintptr(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·Store(SB), NOSPLIT, $0-12
+ MOVV ptr+0(FP), R1
+ MOVW val+8(FP), R2
+ SYNC
+ MOVW R2, 0(R1)
+ SYNC
+ RET
+
+TEXT ·Store8(SB), NOSPLIT, $0-9
+ MOVV ptr+0(FP), R1
+ MOVB val+8(FP), R2
+ SYNC
+ MOVB R2, 0(R1)
+ SYNC
+ RET
+
+TEXT ·Store64(SB), NOSPLIT, $0-16
+ MOVV ptr+0(FP), R1
+ MOVV val+8(FP), R2
+ SYNC
+ MOVV R2, 0(R1)
+ SYNC
+ RET
+
+// void Or8(byte volatile*, byte);
+TEXT ·Or8(SB), NOSPLIT, $0-9
+ MOVV ptr+0(FP), R1
+ MOVBU val+8(FP), R2
+ // Align ptr down to 4 bytes so we can use 32-bit load/store.
+ MOVV $~3, R3
+ AND R1, R3
+ // Compute val shift.
+#ifdef GOARCH_mips64
+ // Big endian. ptr = ptr ^ 3
+ XOR $3, R1
+#endif
+ // R4 = ((ptr & 3) * 8)
+ AND $3, R1, R4
+ SLLV $3, R4
+ // Shift val for aligned ptr. R2 = val << R4
+ SLLV R4, R2
+
+ SYNC
+ LL (R3), R4
+ OR R2, R4
+ SC R4, (R3)
+ BEQ R4, -4(PC)
+ SYNC
+ RET
+
+// void And8(byte volatile*, byte);
+TEXT ·And8(SB), NOSPLIT, $0-9
+ MOVV ptr+0(FP), R1
+ MOVBU val+8(FP), R2
+ // Align ptr down to 4 bytes so we can use 32-bit load/store.
+ MOVV $~3, R3
+ AND R1, R3
+ // Compute val shift.
+#ifdef GOARCH_mips64
+ // Big endian. ptr = ptr ^ 3
+ XOR $3, R1
+#endif
+ // R4 = ((ptr & 3) * 8)
+ AND $3, R1, R4
+ SLLV $3, R4
+ // Shift val for aligned ptr. R2 = val << R4 | ^(0xFF << R4)
+ MOVV $0xFF, R5
+ SLLV R4, R2
+ SLLV R4, R5
+ NOR R0, R5
+ OR R5, R2
+
+ SYNC
+ LL (R3), R4
+ AND R2, R4
+ SC R4, (R3)
+ BEQ R4, -4(PC)
+ SYNC
+ RET
+
+// func Or(addr *uint32, v uint32)
+TEXT ·Or(SB), NOSPLIT, $0-12
+ MOVV ptr+0(FP), R1
+ MOVW val+8(FP), R2
+
+ SYNC
+ LL (R1), R3
+ OR R2, R3
+ SC R3, (R1)
+ BEQ R3, -4(PC)
+ SYNC
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT ·And(SB), NOSPLIT, $0-12
+ MOVV ptr+0(FP), R1
+ MOVW val+8(FP), R2
+
+ SYNC
+ LL (R1), R3
+ AND R2, R3
+ SC R3, (R1)
+ BEQ R3, -4(PC)
+ SYNC
+ RET
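
Or8 and And8 above emulate a byte-wide read-modify-write with a 32-bit LL/SC on the containing word: align the pointer down to 4 bytes, compute the byte's bit offset within that word (XORing the low address bits with 3 on big-endian mips64), and shift the value or mask into place. The same arithmetic in plain Go, as a sketch; the bigEndian constant and the sample byte values are illustrative assumptions:

// byteShift maps a byte address to its containing 32-bit word and the shift
// that places that byte within the word, mirroring the Or8/And8 setup above.
package main

import "fmt"

const bigEndian = false // stand-in for the GOARCH_mips64 #ifdef above

func byteShift(addr uintptr) (wordAddr uintptr, shift uint) {
	wordAddr = addr &^ 3 // align down to the containing word
	b := addr & 3
	if bigEndian {
		b ^= 3 // on big-endian, byte 0 is the most significant byte
	}
	return wordAddr, uint(b * 8)
}

func main() {
	for a := uintptr(0x1000); a < 0x1004; a++ {
		w, s := byteShift(a)
		orMask := uint32(0x0F) << s                       // what Or8 ORs in
		andMask := uint32(0xF0)<<s | ^(uint32(0xFF) << s) // what And8 ANDs with
		fmt.Printf("addr %#x -> word %#x shift %2d or %#08x and %#08x\n",
			a, w, s, orMask, andMask)
	}
}
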
diff --git a/src/runtime/internal/atomic/asm_mipsx.s b/src/runtime/internal/atomic/asm_mipsx.s
new file mode 100644
index 0000000..2b2cfab
--- /dev/null
+++ b/src/runtime/internal/atomic/asm_mipsx.s
@@ -0,0 +1,200 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips mipsle
+
+#include "textflag.h"
+
+TEXT ·Cas(SB),NOSPLIT,$0-13
+ MOVW ptr+0(FP), R1
+ MOVW old+4(FP), R2
+ MOVW new+8(FP), R5
+ SYNC
+try_cas:
+ MOVW R5, R3
+ LL (R1), R4 // R4 = *R1
+ BNE R2, R4, cas_fail
+ SC R3, (R1) // *R1 = R3
+ BEQ R3, try_cas
+ SYNC
+ MOVB R3, ret+12(FP)
+ RET
+cas_fail:
+ MOVB R0, ret+12(FP)
+ RET
+
+TEXT ·Store(SB),NOSPLIT,$0-8
+ MOVW ptr+0(FP), R1
+ MOVW val+4(FP), R2
+ SYNC
+ MOVW R2, 0(R1)
+ SYNC
+ RET
+
+TEXT ·Store8(SB),NOSPLIT,$0-5
+ MOVW ptr+0(FP), R1
+ MOVB val+4(FP), R2
+ SYNC
+ MOVB R2, 0(R1)
+ SYNC
+ RET
+
+TEXT ·Load(SB),NOSPLIT,$0-8
+ MOVW ptr+0(FP), R1
+ SYNC
+ MOVW 0(R1), R1
+ SYNC
+ MOVW R1, ret+4(FP)
+ RET
+
+TEXT ·Load8(SB),NOSPLIT,$0-5
+ MOVW ptr+0(FP), R1
+ SYNC
+ MOVB 0(R1), R1
+ SYNC
+ MOVB R1, ret+4(FP)
+ RET
+
+TEXT ·Xadd(SB),NOSPLIT,$0-12
+ MOVW ptr+0(FP), R2
+ MOVW delta+4(FP), R3
+ SYNC
+try_xadd:
+ LL (R2), R1 // R1 = *R2
+ ADDU R1, R3, R4
+ MOVW R4, R1
+ SC R4, (R2) // *R2 = R4
+ BEQ R4, try_xadd
+ SYNC
+ MOVW R1, ret+8(FP)
+ RET
+
+TEXT ·Xchg(SB),NOSPLIT,$0-12
+ MOVW ptr+0(FP), R2
+ MOVW new+4(FP), R5
+ SYNC
+try_xchg:
+ MOVW R5, R3
+ LL (R2), R1 // R1 = *R2
+ SC R3, (R2) // *R2 = R3
+ BEQ R3, try_xchg
+ SYNC
+ MOVW R1, ret+8(FP)
+ RET
+
+TEXT ·Casuintptr(SB),NOSPLIT,$0-13
+ JMP ·Cas(SB)
+
+TEXT ·CasRel(SB),NOSPLIT,$0-13
+ JMP ·Cas(SB)
+
+TEXT ·Loaduintptr(SB),NOSPLIT,$0-8
+ JMP ·Load(SB)
+
+TEXT ·Loaduint(SB),NOSPLIT,$0-8
+ JMP ·Load(SB)
+
+TEXT ·Loadp(SB),NOSPLIT,$-0-8
+ JMP ·Load(SB)
+
+TEXT ·Storeuintptr(SB),NOSPLIT,$0-8
+ JMP ·Store(SB)
+
+TEXT ·Xadduintptr(SB),NOSPLIT,$0-12
+ JMP ·Xadd(SB)
+
+TEXT ·Loadint64(SB),NOSPLIT,$0-12
+ JMP ·Load64(SB)
+
+TEXT ·Xaddint64(SB),NOSPLIT,$0-20
+ JMP ·Xadd64(SB)
+
+TEXT ·Casp1(SB),NOSPLIT,$0-13
+ JMP ·Cas(SB)
+
+TEXT ·Xchguintptr(SB),NOSPLIT,$0-12
+ JMP ·Xchg(SB)
+
+TEXT ·StorepNoWB(SB),NOSPLIT,$0-8
+ JMP ·Store(SB)
+
+TEXT ·StoreRel(SB),NOSPLIT,$0-8
+ JMP ·Store(SB)
+
+TEXT ·StoreReluintptr(SB),NOSPLIT,$0-8
+ JMP ·Store(SB)
+
+// void Or8(byte volatile*, byte);
+TEXT ·Or8(SB),NOSPLIT,$0-5
+ MOVW ptr+0(FP), R1
+ MOVBU val+4(FP), R2
+ MOVW $~3, R3 // Align ptr down to 4 bytes so we can use 32-bit load/store.
+ AND R1, R3
+#ifdef GOARCH_mips
+ // Big endian. ptr = ptr ^ 3
+ XOR $3, R1
+#endif
+ AND $3, R1, R4 // R4 = ((ptr & 3) * 8)
+ SLL $3, R4
+ SLL R4, R2, R2 // Shift val for aligned ptr. R2 = val << R4
+ SYNC
+try_or8:
+ LL (R3), R4 // R4 = *R3
+ OR R2, R4
+ SC R4, (R3) // *R3 = R4
+ BEQ R4, try_or8
+ SYNC
+ RET
+
+// void And8(byte volatile*, byte);
+TEXT ·And8(SB),NOSPLIT,$0-5
+ MOVW ptr+0(FP), R1
+ MOVBU val+4(FP), R2
+ MOVW $~3, R3
+ AND R1, R3
+#ifdef GOARCH_mips
+ // Big endian. ptr = ptr ^ 3
+ XOR $3, R1
+#endif
+ AND $3, R1, R4 // R4 = ((ptr & 3) * 8)
+ SLL $3, R4
+ MOVW $0xFF, R5
+ SLL R4, R2
+ SLL R4, R5
+ NOR R0, R5
+ OR R5, R2 // Shift val for aligned ptr. R2 = val << R4 | ^(0xFF << R4)
+ SYNC
+try_and8:
+ LL (R3), R4 // R4 = *R3
+ AND R2, R4
+ SC R4, (R3) // *R3 = R4
+ BEQ R4, try_and8
+ SYNC
+ RET
+
+// func Or(addr *uint32, v uint32)
+TEXT ·Or(SB), NOSPLIT, $0-8
+ MOVW ptr+0(FP), R1
+ MOVW val+4(FP), R2
+
+ SYNC
+ LL (R1), R3
+ OR R2, R3
+ SC R3, (R1)
+ BEQ R3, -4(PC)
+ SYNC
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT ·And(SB), NOSPLIT, $0-8
+ MOVW ptr+0(FP), R1
+ MOVW val+4(FP), R2
+
+ SYNC
+ LL (R1), R3
+ AND R2, R3
+ SC R3, (R1)
+ BEQ R3, -4(PC)
+ SYNC
+ RET
diff --git a/src/runtime/internal/atomic/asm_ppc64x.s b/src/runtime/internal/atomic/asm_ppc64x.s
new file mode 100644
index 0000000..bb009ab
--- /dev/null
+++ b/src/runtime/internal/atomic/asm_ppc64x.s
@@ -0,0 +1,253 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64 ppc64le
+
+#include "textflag.h"
+
+// bool cas(uint32 *ptr, uint32 old, uint32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT runtime∕internal∕atomic·Cas(SB), NOSPLIT, $0-17
+ MOVD ptr+0(FP), R3
+ MOVWZ old+8(FP), R4
+ MOVWZ new+12(FP), R5
+ LWSYNC
+cas_again:
+ LWAR (R3), R6
+ CMPW R6, R4
+ BNE cas_fail
+ STWCCC R5, (R3)
+ BNE cas_again
+ MOVD $1, R3
+ LWSYNC
+ MOVB R3, ret+16(FP)
+ RET
+cas_fail:
+ MOVB R0, ret+16(FP)
+ RET
+
+// bool runtime∕internal∕atomic·Cas64(uint64 *ptr, uint64 old, uint64 new)
+// Atomically:
+//	if(*val == old){
+// *val = new;
+// return 1;
+// } else {
+// return 0;
+// }
+TEXT runtime∕internal∕atomic·Cas64(SB), NOSPLIT, $0-25
+ MOVD ptr+0(FP), R3
+ MOVD old+8(FP), R4
+ MOVD new+16(FP), R5
+ LWSYNC
+cas64_again:
+ LDAR (R3), R6
+ CMP R6, R4
+ BNE cas64_fail
+ STDCCC R5, (R3)
+ BNE cas64_again
+ MOVD $1, R3
+ LWSYNC
+ MOVB R3, ret+24(FP)
+ RET
+cas64_fail:
+ MOVB R0, ret+24(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·CasRel(SB), NOSPLIT, $0-17
+ MOVD ptr+0(FP), R3
+ MOVWZ old+8(FP), R4
+ MOVWZ new+12(FP), R5
+ LWSYNC
+cas_again:
+ LWAR (R3), $0, R6 // 0 = Mutex release hint
+ CMPW R6, R4
+ BNE cas_fail
+ STWCCC R5, (R3)
+ BNE cas_again
+ MOVD $1, R3
+ MOVB R3, ret+16(FP)
+ RET
+cas_fail:
+ MOVB R0, ret+16(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·Casuintptr(SB), NOSPLIT, $0-25
+ BR runtime∕internal∕atomic·Cas64(SB)
+
+TEXT runtime∕internal∕atomic·Loaduintptr(SB), NOSPLIT|NOFRAME, $0-16
+ BR runtime∕internal∕atomic·Load64(SB)
+
+TEXT runtime∕internal∕atomic·LoadAcquintptr(SB), NOSPLIT|NOFRAME, $0-16
+ BR runtime∕internal∕atomic·LoadAcq64(SB)
+
+TEXT runtime∕internal∕atomic·Loaduint(SB), NOSPLIT|NOFRAME, $0-16
+ BR runtime∕internal∕atomic·Load64(SB)
+
+TEXT runtime∕internal∕atomic·Storeuintptr(SB), NOSPLIT, $0-16
+ BR runtime∕internal∕atomic·Store64(SB)
+
+TEXT runtime∕internal∕atomic·StoreReluintptr(SB), NOSPLIT, $0-16
+ BR runtime∕internal∕atomic·StoreRel64(SB)
+
+TEXT runtime∕internal∕atomic·Xadduintptr(SB), NOSPLIT, $0-24
+ BR runtime∕internal∕atomic·Xadd64(SB)
+
+TEXT runtime∕internal∕atomic·Loadint64(SB), NOSPLIT, $0-16
+ BR runtime∕internal∕atomic·Load64(SB)
+
+TEXT runtime∕internal∕atomic·Xaddint64(SB), NOSPLIT, $0-24
+ BR runtime∕internal∕atomic·Xadd64(SB)
+
+// bool casp(void **val, void *old, void *new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT runtime∕internal∕atomic·Casp1(SB), NOSPLIT, $0-25
+ BR runtime∕internal∕atomic·Cas64(SB)
+
+// uint32 xadd(uint32 volatile *ptr, int32 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT runtime∕internal∕atomic·Xadd(SB), NOSPLIT, $0-20
+ MOVD ptr+0(FP), R4
+ MOVW delta+8(FP), R5
+ LWSYNC
+ LWAR (R4), R3
+ ADD R5, R3
+ STWCCC R3, (R4)
+ BNE -3(PC)
+ MOVW R3, ret+16(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·Xadd64(SB), NOSPLIT, $0-24
+ MOVD ptr+0(FP), R4
+ MOVD delta+8(FP), R5
+ LWSYNC
+ LDAR (R4), R3
+ ADD R5, R3
+ STDCCC R3, (R4)
+ BNE -3(PC)
+ MOVD R3, ret+16(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·Xchg(SB), NOSPLIT, $0-20
+ MOVD ptr+0(FP), R4
+ MOVW new+8(FP), R5
+ LWSYNC
+ LWAR (R4), R3
+ STWCCC R5, (R4)
+ BNE -2(PC)
+ ISYNC
+ MOVW R3, ret+16(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·Xchg64(SB), NOSPLIT, $0-24
+ MOVD ptr+0(FP), R4
+ MOVD new+8(FP), R5
+ LWSYNC
+ LDAR (R4), R3
+ STDCCC R5, (R4)
+ BNE -2(PC)
+ ISYNC
+ MOVD R3, ret+16(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·Xchguintptr(SB), NOSPLIT, $0-24
+ BR runtime∕internal∕atomic·Xchg64(SB)
+
+
+TEXT runtime∕internal∕atomic·StorepNoWB(SB), NOSPLIT, $0-16
+ BR runtime∕internal∕atomic·Store64(SB)
+
+TEXT runtime∕internal∕atomic·Store(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R3
+ MOVW val+8(FP), R4
+ SYNC
+ MOVW R4, 0(R3)
+ RET
+
+TEXT runtime∕internal∕atomic·Store8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R3
+ MOVB val+8(FP), R4
+ SYNC
+ MOVB R4, 0(R3)
+ RET
+
+TEXT runtime∕internal∕atomic·Store64(SB), NOSPLIT, $0-16
+ MOVD ptr+0(FP), R3
+ MOVD val+8(FP), R4
+ SYNC
+ MOVD R4, 0(R3)
+ RET
+
+TEXT runtime∕internal∕atomic·StoreRel(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R3
+ MOVW val+8(FP), R4
+ LWSYNC
+ MOVW R4, 0(R3)
+ RET
+
+TEXT runtime∕internal∕atomic·StoreRel64(SB), NOSPLIT, $0-16
+ MOVD ptr+0(FP), R3
+ MOVD val+8(FP), R4
+ LWSYNC
+ MOVD R4, 0(R3)
+ RET
+
+// void runtime∕internal∕atomic·Or8(byte volatile*, byte);
+TEXT runtime∕internal∕atomic·Or8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R3
+ MOVBZ val+8(FP), R4
+ LWSYNC
+again:
+ LBAR (R3), R6
+ OR R4, R6
+ STBCCC R6, (R3)
+ BNE again
+ RET
+
+// void runtime∕internal∕atomic·And8(byte volatile*, byte);
+TEXT runtime∕internal∕atomic·And8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R3
+ MOVBZ val+8(FP), R4
+ LWSYNC
+again:
+ LBAR (R3), R6
+ AND R4, R6
+ STBCCC R6, (R3)
+ BNE again
+ RET
+
+// func Or(addr *uint32, v uint32)
+TEXT runtime∕internal∕atomic·Or(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R3
+ MOVW val+8(FP), R4
+ LWSYNC
+again:
+ LWAR (R3), R6
+ OR R4, R6
+ STWCCC R6, (R3)
+ BNE again
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT runtime∕internal∕atomic·And(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R3
+ MOVW val+8(FP), R4
+ LWSYNC
+again:
+	LWAR	(R3), R6
+ AND R4, R6
+ STWCCC R6, (R3)
+ BNE again
+ RET
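
On POWER's weakly ordered memory model, the LWSYNC issued before each of these updates (and the ISYNC after the exchanges) is what orders them against the ordinary loads and stores around them. The pattern that ordering exists to support is plain publish/consume; here it is against the public sync/atomic API, whose operations are at least as strong as the acquire/release pairs used here:

// Publish/consume: write the payload, then store the flag atomically; the
// reader loads the flag atomically before touching the payload, so the
// payload write is guaranteed to be visible once the flag is observed.
package main

import (
	"fmt"
	"sync/atomic"
)

var (
	payload int32
	ready   uint32
)

func producer() {
	payload = 42                  // plain write, published by the store below
	atomic.StoreUint32(&ready, 1) // release the payload to readers
}

func consumer() {
	for atomic.LoadUint32(&ready) == 0 {
		// spin until the flag is set
	}
	fmt.Println(payload) // 42
}

func main() {
	go producer()
	consumer()
}
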
diff --git a/src/runtime/internal/atomic/asm_s390x.s b/src/runtime/internal/atomic/asm_s390x.s
new file mode 100644
index 0000000..daf1f3c
--- /dev/null
+++ b/src/runtime/internal/atomic/asm_s390x.s
@@ -0,0 +1,216 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// func Store(ptr *uint32, val uint32)
+TEXT ·Store(SB), NOSPLIT, $0
+ MOVD ptr+0(FP), R2
+ MOVWZ val+8(FP), R3
+ MOVW R3, 0(R2)
+ SYNC
+ RET
+
+// func Store8(ptr *uint8, val uint8)
+TEXT ·Store8(SB), NOSPLIT, $0
+ MOVD ptr+0(FP), R2
+ MOVB val+8(FP), R3
+ MOVB R3, 0(R2)
+ SYNC
+ RET
+
+// func Store64(ptr *uint64, val uint64)
+TEXT ·Store64(SB), NOSPLIT, $0
+ MOVD ptr+0(FP), R2
+ MOVD val+8(FP), R3
+ MOVD R3, 0(R2)
+ SYNC
+ RET
+
+// func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+TEXT ·StorepNoWB(SB), NOSPLIT, $0
+ MOVD ptr+0(FP), R2
+ MOVD val+8(FP), R3
+ MOVD R3, 0(R2)
+ SYNC
+ RET
+
+// func Cas(ptr *uint32, old, new uint32) bool
+// Atomically:
+// if *ptr == old {
+//	*ptr = new
+// return 1
+// } else {
+// return 0
+// }
+TEXT ·Cas(SB), NOSPLIT, $0-17
+ MOVD ptr+0(FP), R3
+ MOVWZ old+8(FP), R4
+ MOVWZ new+12(FP), R5
+ CS R4, R5, 0(R3) // if (R4 == 0(R3)) then 0(R3)= R5
+ BNE cas_fail
+ MOVB $1, ret+16(FP)
+ RET
+cas_fail:
+ MOVB $0, ret+16(FP)
+ RET
+
+// func Cas64(ptr *uint64, old, new uint64) bool
+// Atomically:
+// if *ptr == old {
+// *ptr = new
+// return 1
+// } else {
+// return 0
+// }
+TEXT ·Cas64(SB), NOSPLIT, $0-25
+ MOVD ptr+0(FP), R3
+ MOVD old+8(FP), R4
+ MOVD new+16(FP), R5
+ CSG R4, R5, 0(R3) // if (R4 == 0(R3)) then 0(R3)= R5
+ BNE cas64_fail
+ MOVB $1, ret+24(FP)
+ RET
+cas64_fail:
+ MOVB $0, ret+24(FP)
+ RET
+
+// func Casuintptr(ptr *uintptr, old, new uintptr) bool
+TEXT ·Casuintptr(SB), NOSPLIT, $0-25
+ BR ·Cas64(SB)
+
+// func CasRel(ptr *uint32, old, new uint32) bool
+TEXT ·CasRel(SB), NOSPLIT, $0-17
+ BR ·Cas(SB)
+
+// func Loaduintptr(ptr *uintptr) uintptr
+TEXT ·Loaduintptr(SB), NOSPLIT, $0-16
+ BR ·Load64(SB)
+
+// func Loaduint(ptr *uint) uint
+TEXT ·Loaduint(SB), NOSPLIT, $0-16
+ BR ·Load64(SB)
+
+// func Storeuintptr(ptr *uintptr, new uintptr)
+TEXT ·Storeuintptr(SB), NOSPLIT, $0-16
+ BR ·Store64(SB)
+
+// func Loadint64(ptr *int64) int64
+TEXT ·Loadint64(SB), NOSPLIT, $0-16
+ BR ·Load64(SB)
+
+// func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+TEXT ·Xadduintptr(SB), NOSPLIT, $0-24
+ BR ·Xadd64(SB)
+
+// func Xaddint64(ptr *int64, delta int64) int64
+TEXT ·Xaddint64(SB), NOSPLIT, $0-24
+ BR ·Xadd64(SB)
+
+// func Casp1(ptr *unsafe.Pointer, old, new unsafe.Pointer) bool
+// Atomically:
+// if *ptr == old {
+// *ptr = new
+// return 1
+// } else {
+// return 0
+// }
+TEXT ·Casp1(SB), NOSPLIT, $0-25
+ BR ·Cas64(SB)
+
+// func Xadd(ptr *uint32, delta int32) uint32
+// Atomically:
+// *ptr += delta
+// return *ptr
+TEXT ·Xadd(SB), NOSPLIT, $0-20
+ MOVD ptr+0(FP), R4
+ MOVW delta+8(FP), R5
+ MOVW (R4), R3
+repeat:
+ ADD R5, R3, R6
+ CS R3, R6, (R4) // if R3==(R4) then (R4)=R6 else R3=(R4)
+ BNE repeat
+ MOVW R6, ret+16(FP)
+ RET
+
+// func Xadd64(ptr *uint64, delta int64) uint64
+TEXT ·Xadd64(SB), NOSPLIT, $0-24
+ MOVD ptr+0(FP), R4
+ MOVD delta+8(FP), R5
+ MOVD (R4), R3
+repeat:
+ ADD R5, R3, R6
+ CSG R3, R6, (R4) // if R3==(R4) then (R4)=R6 else R3=(R4)
+ BNE repeat
+ MOVD R6, ret+16(FP)
+ RET
+
+// func Xchg(ptr *uint32, new uint32) uint32
+TEXT ·Xchg(SB), NOSPLIT, $0-20
+ MOVD ptr+0(FP), R4
+ MOVW new+8(FP), R3
+ MOVW (R4), R6
+repeat:
+ CS R6, R3, (R4) // if R6==(R4) then (R4)=R3 else R6=(R4)
+ BNE repeat
+ MOVW R6, ret+16(FP)
+ RET
+
+// func Xchg64(ptr *uint64, new uint64) uint64
+TEXT ·Xchg64(SB), NOSPLIT, $0-24
+ MOVD ptr+0(FP), R4
+ MOVD new+8(FP), R3
+ MOVD (R4), R6
+repeat:
+ CSG R6, R3, (R4) // if R6==(R4) then (R4)=R3 else R6=(R4)
+ BNE repeat
+ MOVD R6, ret+16(FP)
+ RET
+
+// func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+TEXT ·Xchguintptr(SB), NOSPLIT, $0-24
+ BR ·Xchg64(SB)
+
+// func Or8(addr *uint8, v uint8)
+TEXT ·Or8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R3
+ MOVBZ val+8(FP), R4
+ // We don't have atomic operations that work on individual bytes so we
+ // need to align addr down to a word boundary and create a mask
+ // containing v to OR with the entire word atomically.
+ MOVD $(3<<3), R5
+ RXSBG $59, $60, $3, R3, R5 // R5 = 24 - ((addr % 4) * 8) = ((addr & 3) << 3) ^ (3 << 3)
+ ANDW $~3, R3 // R3 = floor(addr, 4) = addr &^ 3
+ SLW R5, R4 // R4 = uint32(v) << R5
+ LAO R4, R6, 0(R3) // R6 = *R3; *R3 |= R4; (atomic)
+ RET
+
+// func And8(addr *uint8, v uint8)
+TEXT ·And8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R3
+ MOVBZ val+8(FP), R4
+ // We don't have atomic operations that work on individual bytes so we
+ // need to align addr down to a word boundary and create a mask
+ // containing v to AND with the entire word atomically.
+ ORW $~0xff, R4 // R4 = uint32(v) | 0xffffff00
+ MOVD $(3<<3), R5
+ RXSBG $59, $60, $3, R3, R5 // R5 = 24 - ((addr % 4) * 8) = ((addr & 3) << 3) ^ (3 << 3)
+ ANDW $~3, R3 // R3 = floor(addr, 4) = addr &^ 3
+ RLL R5, R4, R4 // R4 = rotl(R4, R5)
+ LAN R4, R6, 0(R3) // R6 = *R3; *R3 &= R4; (atomic)
+ RET
+
+// func Or(addr *uint32, v uint32)
+TEXT ·Or(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R3
+ MOVW val+8(FP), R4
+ LAO R4, R6, 0(R3) // R6 = *R3; *R3 |= R4; (atomic)
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT ·And(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R3
+ MOVW val+8(FP), R4
+ LAN R4, R6, 0(R3) // R6 = *R3; *R3 &= R4; (atomic)
+ RET
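
The Or8/And8 comments above rely on the identity that 24 - ((addr % 4) * 8), the byte's shift amount within its big-endian word, equals ((addr & 3) << 3) ^ (3 << 3), which is what the RXSBG instruction computes into R5. A few lines of Go checking that identity for all four byte positions:

// Verify the shift identity used by the RXSBG comments above for addr&3 in 0..3.
package main

import "fmt"

func main() {
	for a := uintptr(0); a < 4; a++ {
		lhs := 24 - (a&3)*8
		rhs := ((a & 3) << 3) ^ (3 << 3)
		fmt.Printf("addr&3=%d  24-8*(addr&3)=%2d  ((addr&3)<<3)^24=%2d  equal=%v\n",
			a, lhs, rhs, lhs == rhs)
	}
}
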
diff --git a/src/runtime/internal/atomic/asm_wasm.s b/src/runtime/internal/atomic/asm_wasm.s
new file mode 100644
index 0000000..7c33cb1
--- /dev/null
+++ b/src/runtime/internal/atomic/asm_wasm.s
@@ -0,0 +1,10 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT runtime∕internal∕atomic·StorepNoWB(SB), NOSPLIT, $0-16
+ MOVD ptr+0(FP), R0
+ MOVD val+8(FP), 0(R0)
+ RET
diff --git a/src/runtime/internal/atomic/atomic_386.go b/src/runtime/internal/atomic/atomic_386.go
new file mode 100644
index 0000000..1bfcb11
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_386.go
@@ -0,0 +1,102 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build 386
+
+package atomic
+
+import "unsafe"
+
+// Export some functions via linkname to assembly in sync/atomic.
+//go:linkname Load
+//go:linkname Loadp
+
+//go:nosplit
+//go:noinline
+func Load(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer {
+ return *(*unsafe.Pointer)(ptr)
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcq(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcquintptr(ptr *uintptr) uintptr {
+ return *ptr
+}
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Load64(ptr *uint64) uint64
+
+//go:nosplit
+//go:noinline
+func Load8(ptr *uint8) uint8 {
+ return *ptr
+}
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+// NOTE: Do not add atomicxor8 (XOR is not idempotent).
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
diff --git a/src/runtime/internal/atomic/atomic_amd64.go b/src/runtime/internal/atomic/atomic_amd64.go
new file mode 100644
index 0000000..e36eb83
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_amd64.go
@@ -0,0 +1,116 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package atomic
+
+import "unsafe"
+
+// Export some functions via linkname to assembly in sync/atomic.
+//go:linkname Load
+//go:linkname Loadp
+//go:linkname Load64
+
+//go:nosplit
+//go:noinline
+func Load(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer {
+ return *(*unsafe.Pointer)(ptr)
+}
+
+//go:nosplit
+//go:noinline
+func Load64(ptr *uint64) uint64 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcq(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcq64(ptr *uint64) uint64 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcquintptr(ptr *uintptr) uintptr {
+ return *ptr
+}
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:nosplit
+//go:noinline
+func Load8(ptr *uint8) uint8 {
+ return *ptr
+}
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+// NOTE: Do not add atomicxor8 (XOR is not idempotent).
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreRel64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
+
+// StorepNoWB performs *ptr = val atomically and without a write
+// barrier.
+//
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
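
On amd64 the loads above are ordinary dereferences (marked nosplit/noinline) because x86 loads already provide acquire ordering; the rest of the file is the declaration surface backed by asm_amd64.s. Ordinary code reaches the same functionality through sync/atomic. A small example of the pointer pair, which plays the role Loadp and StorepNoWB play inside the runtime; the config type and the helpers are illustrative:

// Atomically publish and read a pointer-sized value, the pattern behind
// Loadp/StorepNoWB, using the public sync/atomic pointer operations.
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

type config struct{ limit int }

var current unsafe.Pointer // always a *config; accessed only atomically

func publish(c *config) { atomic.StorePointer(&current, unsafe.Pointer(c)) }

func load() *config { return (*config)(atomic.LoadPointer(&current)) }

func main() {
	publish(&config{limit: 10})
	fmt.Println(load().limit) // 10
}
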
diff --git a/src/runtime/internal/atomic/atomic_arm.go b/src/runtime/internal/atomic/atomic_arm.go
new file mode 100644
index 0000000..546b3d6
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_arm.go
@@ -0,0 +1,242 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build arm
+
+package atomic
+
+import (
+ "internal/cpu"
+ "unsafe"
+)
+
+// Export some functions via linkname to assembly in sync/atomic.
+//go:linkname Xchg
+//go:linkname Xchguintptr
+
+type spinlock struct {
+ v uint32
+}
+
+//go:nosplit
+func (l *spinlock) lock() {
+ for {
+ if Cas(&l.v, 0, 1) {
+ return
+ }
+ }
+}
+
+//go:nosplit
+func (l *spinlock) unlock() {
+ Store(&l.v, 0)
+}
+
+var locktab [57]struct {
+ l spinlock
+ pad [cpu.CacheLinePadSize - unsafe.Sizeof(spinlock{})]byte
+}
+
+func addrLock(addr *uint64) *spinlock {
+ return &locktab[(uintptr(unsafe.Pointer(addr))>>3)%uintptr(len(locktab))].l
+}
+
+// Atomic add and return new value.
+//go:nosplit
+func Xadd(val *uint32, delta int32) uint32 {
+ for {
+ oval := *val
+ nval := oval + uint32(delta)
+ if Cas(val, oval, nval) {
+ return nval
+ }
+ }
+}
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:nosplit
+func Xchg(addr *uint32, v uint32) uint32 {
+ for {
+ old := *addr
+ if Cas(addr, old, v) {
+ return old
+ }
+ }
+}
+
+//go:nosplit
+func Xchguintptr(addr *uintptr, v uintptr) uintptr {
+ return uintptr(Xchg((*uint32)(unsafe.Pointer(addr)), uint32(v)))
+}
+
+// Not noescape -- it installs a pointer to addr.
+func StorepNoWB(addr unsafe.Pointer, v unsafe.Pointer)
+
+//go:noescape
+func Store(addr *uint32, v uint32)
+
+//go:noescape
+func StoreRel(addr *uint32, v uint32)
+
+//go:noescape
+func StoreReluintptr(addr *uintptr, v uintptr)
+
+//go:nosplit
+func goCas64(addr *uint64, old, new uint64) bool {
+ if uintptr(unsafe.Pointer(addr))&7 != 0 {
+ *(*int)(nil) = 0 // crash on unaligned uint64
+ }
+ _ = *addr // if nil, fault before taking the lock
+ var ok bool
+ addrLock(addr).lock()
+ if *addr == old {
+ *addr = new
+ ok = true
+ }
+ addrLock(addr).unlock()
+ return ok
+}
+
+//go:nosplit
+func goXadd64(addr *uint64, delta int64) uint64 {
+ if uintptr(unsafe.Pointer(addr))&7 != 0 {
+ *(*int)(nil) = 0 // crash on unaligned uint64
+ }
+ _ = *addr // if nil, fault before taking the lock
+ var r uint64
+ addrLock(addr).lock()
+ r = *addr + uint64(delta)
+ *addr = r
+ addrLock(addr).unlock()
+ return r
+}
+
+//go:nosplit
+func goXchg64(addr *uint64, v uint64) uint64 {
+ if uintptr(unsafe.Pointer(addr))&7 != 0 {
+ *(*int)(nil) = 0 // crash on unaligned uint64
+ }
+ _ = *addr // if nil, fault before taking the lock
+ var r uint64
+ addrLock(addr).lock()
+ r = *addr
+ *addr = v
+ addrLock(addr).unlock()
+ return r
+}
+
+//go:nosplit
+func goLoad64(addr *uint64) uint64 {
+ if uintptr(unsafe.Pointer(addr))&7 != 0 {
+ *(*int)(nil) = 0 // crash on unaligned uint64
+ }
+ _ = *addr // if nil, fault before taking the lock
+ var r uint64
+ addrLock(addr).lock()
+ r = *addr
+ addrLock(addr).unlock()
+ return r
+}
+
+//go:nosplit
+func goStore64(addr *uint64, v uint64) {
+ if uintptr(unsafe.Pointer(addr))&7 != 0 {
+ *(*int)(nil) = 0 // crash on unaligned uint64
+ }
+ _ = *addr // if nil, fault before taking the lock
+ addrLock(addr).lock()
+ *addr = v
+ addrLock(addr).unlock()
+}
+
+//go:nosplit
+func Or8(addr *uint8, v uint8) {
+ // Align down to 4 bytes and use 32-bit CAS.
+ uaddr := uintptr(unsafe.Pointer(addr))
+ addr32 := (*uint32)(unsafe.Pointer(uaddr &^ 3))
+ word := uint32(v) << ((uaddr & 3) * 8) // little endian
+ for {
+ old := *addr32
+ if Cas(addr32, old, old|word) {
+ return
+ }
+ }
+}
+
+//go:nosplit
+func And8(addr *uint8, v uint8) {
+ // Align down to 4 bytes and use 32-bit CAS.
+ uaddr := uintptr(unsafe.Pointer(addr))
+ addr32 := (*uint32)(unsafe.Pointer(uaddr &^ 3))
+ word := uint32(v) << ((uaddr & 3) * 8) // little endian
+ mask := uint32(0xFF) << ((uaddr & 3) * 8) // little endian
+ word |= ^mask
+ for {
+ old := *addr32
+ if Cas(addr32, old, old&word) {
+ return
+ }
+ }
+}
+
+//go:nosplit
+func Or(addr *uint32, v uint32) {
+ for {
+ old := *addr
+ if Cas(addr, old, old|v) {
+ return
+ }
+ }
+}
+
+//go:nosplit
+func And(addr *uint32, v uint32) {
+ for {
+ old := *addr
+ if Cas(addr, old, old&v) {
+ return
+ }
+ }
+}
+
+//go:nosplit
+func armcas(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Load(addr *uint32) uint32
+
+// NO go:noescape annotation; *addr escapes if result escapes (#31525)
+func Loadp(addr unsafe.Pointer) unsafe.Pointer
+
+//go:noescape
+func Load8(addr *uint8) uint8
+
+//go:noescape
+func LoadAcq(addr *uint32) uint32
+
+//go:noescape
+func LoadAcquintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func Cas64(addr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(addr *uint32, old, new uint32) bool
+
+//go:noescape
+func Xadd64(addr *uint64, delta int64) uint64
+
+//go:noescape
+func Xchg64(addr *uint64, v uint64) uint64
+
+//go:noescape
+func Load64(addr *uint64) uint64
+
+//go:noescape
+func Store8(addr *uint8, v uint8)
+
+//go:noescape
+func Store64(addr *uint64, v uint64)
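
The goCas64/goXadd64/goXchg64/goLoad64/goStore64 helpers above (the targets of the ARM stubs earlier in this change) serialize through addrLock, which hashes the variable's address into a fixed table of cache-line-padded spinlocks so that unrelated 64-bit variables rarely contend on the same lock. The striping idea in isolation, sketched with sync.Mutex in place of the runtime spinlock; the table size and the >>3 shift follow the code above, everything else is illustrative:

// Lock striping: pick one of a fixed set of locks based on the address of
// the variable being protected, so independent variables use independent locks.
package main

import (
	"fmt"
	"sync"
	"unsafe"
)

var locktab [57]sync.Mutex

func addrLock(addr *uint64) *sync.Mutex {
	return &locktab[(uintptr(unsafe.Pointer(addr))>>3)%uintptr(len(locktab))]
}

func xadd64(addr *uint64, delta int64) uint64 {
	l := addrLock(addr)
	l.Lock()
	*addr += uint64(delta)
	r := *addr
	l.Unlock()
	return r
}

func main() {
	var x uint64
	fmt.Println(xadd64(&x, 5), xadd64(&x, -2)) // 5 3
}
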
diff --git a/src/runtime/internal/atomic/atomic_arm64.go b/src/runtime/internal/atomic/atomic_arm64.go
new file mode 100644
index 0000000..d49bee8
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_arm64.go
@@ -0,0 +1,87 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build arm64
+
+package atomic
+
+import "unsafe"
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Load(ptr *uint32) uint32
+
+//go:noescape
+func Load8(ptr *uint8) uint8
+
+//go:noescape
+func Load64(ptr *uint64) uint64
+
+// NO go:noescape annotation; *ptr escapes if result escapes (#31525)
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer
+
+//go:noescape
+func LoadAcq(addr *uint32) uint32
+
+//go:noescape
+func LoadAcq64(ptr *uint64) uint64
+
+//go:noescape
+func LoadAcquintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreRel64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
diff --git a/src/runtime/internal/atomic/atomic_arm64.s b/src/runtime/internal/atomic/atomic_arm64.s
new file mode 100644
index 0000000..0cf3c40
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_arm64.s
@@ -0,0 +1,185 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// uint32 runtime∕internal∕atomic·Load(uint32 volatile* addr)
+TEXT ·Load(SB),NOSPLIT,$0-12
+ MOVD ptr+0(FP), R0
+ LDARW (R0), R0
+ MOVW R0, ret+8(FP)
+ RET
+
+// uint8 runtime∕internal∕atomic·Load8(uint8 volatile* addr)
+TEXT ·Load8(SB),NOSPLIT,$0-9
+ MOVD ptr+0(FP), R0
+ LDARB (R0), R0
+ MOVB R0, ret+8(FP)
+ RET
+
+// uint64 runtime∕internal∕atomic·Load64(uint64 volatile* addr)
+TEXT ·Load64(SB),NOSPLIT,$0-16
+ MOVD ptr+0(FP), R0
+ LDAR (R0), R0
+ MOVD R0, ret+8(FP)
+ RET
+
+// void *runtime∕internal∕atomic·Loadp(void *volatile *addr)
+TEXT ·Loadp(SB),NOSPLIT,$0-16
+ MOVD ptr+0(FP), R0
+ LDAR (R0), R0
+ MOVD R0, ret+8(FP)
+ RET
+
+// uint32 runtime∕internal∕atomic·LoadAcq(uint32 volatile* addr)
+TEXT ·LoadAcq(SB),NOSPLIT,$0-12
+ B ·Load(SB)
+
+// uint64 runtime∕internal∕atomic·LoadAcq64(uint64 volatile* addr)
+TEXT ·LoadAcq64(SB),NOSPLIT,$0-16
+ B ·Load64(SB)
+
+// uintptr runtime∕internal∕atomic·LoadAcquintptr(uintptr volatile* addr)
+TEXT ·LoadAcquintptr(SB),NOSPLIT,$0-16
+ B ·Load64(SB)
+
+TEXT runtime∕internal∕atomic·StorepNoWB(SB), NOSPLIT, $0-16
+ B runtime∕internal∕atomic·Store64(SB)
+
+TEXT runtime∕internal∕atomic·StoreRel(SB), NOSPLIT, $0-12
+ B runtime∕internal∕atomic·Store(SB)
+
+TEXT runtime∕internal∕atomic·StoreRel64(SB), NOSPLIT, $0-16
+ B runtime∕internal∕atomic·Store64(SB)
+
+TEXT runtime∕internal∕atomic·StoreReluintptr(SB), NOSPLIT, $0-16
+ B runtime∕internal∕atomic·Store64(SB)
+
+TEXT runtime∕internal∕atomic·Store(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R0
+ MOVW val+8(FP), R1
+ STLRW R1, (R0)
+ RET
+
+TEXT runtime∕internal∕atomic·Store8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R0
+ MOVB val+8(FP), R1
+ STLRB R1, (R0)
+ RET
+
+TEXT runtime∕internal∕atomic·Store64(SB), NOSPLIT, $0-16
+ MOVD ptr+0(FP), R0
+ MOVD val+8(FP), R1
+ STLR R1, (R0)
+ RET
+
+TEXT runtime∕internal∕atomic·Xchg(SB), NOSPLIT, $0-20
+ MOVD ptr+0(FP), R0
+ MOVW new+8(FP), R1
+again:
+ LDAXRW (R0), R2
+ STLXRW R1, (R0), R3
+ CBNZ R3, again
+ MOVW R2, ret+16(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·Xchg64(SB), NOSPLIT, $0-24
+ MOVD ptr+0(FP), R0
+ MOVD new+8(FP), R1
+again:
+ LDAXR (R0), R2
+ STLXR R1, (R0), R3
+ CBNZ R3, again
+ MOVD R2, ret+16(FP)
+ RET
+
+// bool runtime∕internal∕atomic·Cas64(uint64 *ptr, uint64 old, uint64 new)
+// Atomically:
+//	if(*val == old){
+// *val = new;
+// return 1;
+// } else {
+// return 0;
+// }
+TEXT runtime∕internal∕atomic·Cas64(SB), NOSPLIT, $0-25
+ MOVD ptr+0(FP), R0
+ MOVD old+8(FP), R1
+ MOVD new+16(FP), R2
+again:
+ LDAXR (R0), R3
+ CMP R1, R3
+ BNE ok
+ STLXR R2, (R0), R3
+ CBNZ R3, again
+ok:
+ CSET EQ, R0
+ MOVB R0, ret+24(FP)
+ RET
+
+// uint32 xadd(uint32 volatile *ptr, int32 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT runtime∕internal∕atomic·Xadd(SB), NOSPLIT, $0-20
+ MOVD ptr+0(FP), R0
+ MOVW delta+8(FP), R1
+again:
+ LDAXRW (R0), R2
+ ADDW R2, R1, R2
+ STLXRW R2, (R0), R3
+ CBNZ R3, again
+ MOVW R2, ret+16(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·Xadd64(SB), NOSPLIT, $0-24
+ MOVD ptr+0(FP), R0
+ MOVD delta+8(FP), R1
+again:
+ LDAXR (R0), R2
+ ADD R2, R1, R2
+ STLXR R2, (R0), R3
+ CBNZ R3, again
+ MOVD R2, ret+16(FP)
+ RET
+
+TEXT runtime∕internal∕atomic·Xchguintptr(SB), NOSPLIT, $0-24
+ B runtime∕internal∕atomic·Xchg64(SB)
+
+TEXT ·And8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R0
+ MOVB val+8(FP), R1
+ LDAXRB (R0), R2
+ AND R1, R2
+ STLXRB R2, (R0), R3
+ CBNZ R3, -3(PC)
+ RET
+
+TEXT ·Or8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R0
+ MOVB val+8(FP), R1
+ LDAXRB (R0), R2
+ ORR R1, R2
+ STLXRB R2, (R0), R3
+ CBNZ R3, -3(PC)
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT ·And(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R0
+ MOVW val+8(FP), R1
+ LDAXRW (R0), R2
+ AND R1, R2
+ STLXRW R2, (R0), R3
+ CBNZ R3, -3(PC)
+ RET
+
+// func Or(addr *uint32, v uint32)
+TEXT ·Or(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R0
+ MOVW val+8(FP), R1
+ LDAXRW (R0), R2
+ ORR R1, R2
+ STLXRW R2, (R0), R3
+ CBNZ R3, -3(PC)
+ RET
diff --git a/src/runtime/internal/atomic/atomic_mips64x.go b/src/runtime/internal/atomic/atomic_mips64x.go
new file mode 100644
index 0000000..b0109d7
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_mips64x.go
@@ -0,0 +1,89 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips64 mips64le
+
+package atomic
+
+import "unsafe"
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Load(ptr *uint32) uint32
+
+//go:noescape
+func Load8(ptr *uint8) uint8
+
+//go:noescape
+func Load64(ptr *uint64) uint64
+
+// NO go:noescape annotation; *ptr escapes if result escapes (#31525)
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer
+
+//go:noescape
+func LoadAcq(ptr *uint32) uint32
+
+//go:noescape
+func LoadAcq64(ptr *uint64) uint64
+
+//go:noescape
+func LoadAcquintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+// NOTE: Do not add atomicxor8 (XOR is not idempotent).
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreRel64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
diff --git a/src/runtime/internal/atomic/atomic_mips64x.s b/src/runtime/internal/atomic/atomic_mips64x.s
new file mode 100644
index 0000000..125c0c2
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_mips64x.s
@@ -0,0 +1,57 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips64 mips64le
+
+#include "textflag.h"
+
+#define SYNC WORD $0xf
+
+// uint32 runtime∕internal∕atomic·Load(uint32 volatile* ptr)
+TEXT ·Load(SB),NOSPLIT|NOFRAME,$0-12
+ MOVV ptr+0(FP), R1
+ SYNC
+ MOVWU 0(R1), R1
+ SYNC
+ MOVW R1, ret+8(FP)
+ RET
+
+// uint8 runtime∕internal∕atomic·Load8(uint8 volatile* ptr)
+TEXT ·Load8(SB),NOSPLIT|NOFRAME,$0-9
+ MOVV ptr+0(FP), R1
+ SYNC
+ MOVBU 0(R1), R1
+ SYNC
+ MOVB R1, ret+8(FP)
+ RET
+
+// uint64 runtime∕internal∕atomic·Load64(uint64 volatile* ptr)
+TEXT ·Load64(SB),NOSPLIT|NOFRAME,$0-16
+ MOVV ptr+0(FP), R1
+ SYNC
+ MOVV 0(R1), R1
+ SYNC
+ MOVV R1, ret+8(FP)
+ RET
+
+// void *runtime∕internal∕atomic·Loadp(void *volatile *ptr)
+TEXT ·Loadp(SB),NOSPLIT|NOFRAME,$0-16
+ MOVV ptr+0(FP), R1
+ SYNC
+ MOVV 0(R1), R1
+ SYNC
+ MOVV R1, ret+8(FP)
+ RET
+
+// uint32 runtime∕internal∕atomic·LoadAcq(uint32 volatile* ptr)
+TEXT ·LoadAcq(SB),NOSPLIT|NOFRAME,$0-12
+ JMP atomic·Load(SB)
+
+// uint64 runtime∕internal∕atomic·LoadAcq64(uint64 volatile* ptr)
+TEXT ·LoadAcq64(SB),NOSPLIT|NOFRAME,$0-16
+ JMP atomic·Load64(SB)
+
+// uintptr runtime∕internal∕atomic·LoadAcquintptr(uintptr volatile* ptr)
+TEXT ·LoadAcquintptr(SB),NOSPLIT|NOFRAME,$0-16
+ JMP atomic·Load64(SB)
diff --git a/src/runtime/internal/atomic/atomic_mipsx.go b/src/runtime/internal/atomic/atomic_mipsx.go
new file mode 100644
index 0000000..1336b50
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_mipsx.go
@@ -0,0 +1,166 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips mipsle
+
+// Export some functions via linkname to assembly in sync/atomic.
+//go:linkname Xadd64
+//go:linkname Xchg64
+//go:linkname Cas64
+//go:linkname Load64
+//go:linkname Store64
+
+package atomic
+
+import (
+ "internal/cpu"
+ "unsafe"
+)
+
+// TODO implement lock striping
+var lock struct {
+ state uint32
+ pad [cpu.CacheLinePadSize - 4]byte
+}
+
+//go:noescape
+func spinLock(state *uint32)
+
+//go:noescape
+func spinUnlock(state *uint32)
+
+//go:nosplit
+func lockAndCheck(addr *uint64) {
+ // ensure 8-byte alignment
+ if uintptr(unsafe.Pointer(addr))&7 != 0 {
+ panicUnaligned()
+ }
+ // force dereference before taking lock
+ _ = *addr
+
+ spinLock(&lock.state)
+}
+
+//go:nosplit
+func unlock() {
+ spinUnlock(&lock.state)
+}
+
+//go:nosplit
+func unlockNoFence() {
+ lock.state = 0
+}
+
+//go:nosplit
+func Xadd64(addr *uint64, delta int64) (new uint64) {
+ lockAndCheck(addr)
+
+ new = *addr + uint64(delta)
+ *addr = new
+
+ unlock()
+ return
+}
+
+//go:nosplit
+func Xchg64(addr *uint64, new uint64) (old uint64) {
+ lockAndCheck(addr)
+
+ old = *addr
+ *addr = new
+
+ unlock()
+ return
+}
+
+//go:nosplit
+func Cas64(addr *uint64, old, new uint64) (swapped bool) {
+ lockAndCheck(addr)
+
+ if (*addr) == old {
+ *addr = new
+ unlock()
+ return true
+ }
+
+ unlockNoFence()
+ return false
+}
+
+//go:nosplit
+func Load64(addr *uint64) (val uint64) {
+ lockAndCheck(addr)
+
+ val = *addr
+
+ unlock()
+ return
+}
+
+//go:nosplit
+func Store64(addr *uint64, val uint64) {
+ lockAndCheck(addr)
+
+ *addr = val
+
+ unlock()
+ return
+}
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Load(ptr *uint32) uint32
+
+//go:noescape
+func Load8(ptr *uint8) uint8
+
+// NO go:noescape annotation; *ptr escapes if result escapes (#31525)
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer
+
+//go:noescape
+func LoadAcq(ptr *uint32) uint32
+
+//go:noescape
+func LoadAcquintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
+
+//go:noescape
+func CasRel(addr *uint32, old, new uint32) bool
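
lockAndCheck above is deliberate about ordering: it checks 8-byte alignment and dereferences the pointer before taking the spinlock, so a nil or unaligned address faults or panics without leaving the lock held. A minimal sketch of that discipline, with a sync.Mutex standing in for the runtime's spinlock:

// load64 faults on a bad pointer before acquiring the lock, never while
// holding it, mirroring the lockAndCheck ordering above.
package main

import (
	"fmt"
	"sync"
)

var mu sync.Mutex

func load64(addr *uint64) uint64 {
	_ = *addr // fault here, not while holding mu
	mu.Lock()
	v := *addr
	mu.Unlock()
	return v
}

func main() {
	x := uint64(42)
	fmt.Println(load64(&x)) // 42
}
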
diff --git a/src/runtime/internal/atomic/atomic_mipsx.s b/src/runtime/internal/atomic/atomic_mipsx.s
new file mode 100644
index 0000000..aeebc8f
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_mipsx.s
@@ -0,0 +1,28 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips mipsle
+
+#include "textflag.h"
+
+TEXT ·spinLock(SB),NOSPLIT,$0-4
+ MOVW state+0(FP), R1
+ MOVW $1, R2
+ SYNC
+try_lock:
+ MOVW R2, R3
+check_again:
+ LL (R1), R4
+ BNE R4, check_again
+ SC R3, (R1)
+ BEQ R3, try_lock
+ SYNC
+ RET
+
+TEXT ·spinUnlock(SB),NOSPLIT,$0-4
+ MOVW state+0(FP), R1
+ SYNC
+ MOVW R0, (R1)
+ SYNC
+ RET
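
spinLock and spinUnlock above are the LL/SC form of a simple spinlock (spin until the word is 0, then atomically set it to 1), the primitive atomic_mipsx.go builds its 64-bit emulation on, and the same shape as the spinlock type in atomic_arm.go earlier in this change. The equivalent against the public sync/atomic API, with a Gosched call added because user code, unlike the runtime, should yield while it spins:

// A user-space spinlock: acquire by CASing 0 to 1, release by storing 0.
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

type spinlock struct{ v uint32 }

func (l *spinlock) lock() {
	for !atomic.CompareAndSwapUint32(&l.v, 0, 1) {
		runtime.Gosched() // yield instead of burning the CPU
	}
}

func (l *spinlock) unlock() { atomic.StoreUint32(&l.v, 0) }

func main() {
	var l spinlock
	l.lock()
	l.unlock()
	fmt.Println("locked and unlocked")
}
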
diff --git a/src/runtime/internal/atomic/atomic_ppc64x.go b/src/runtime/internal/atomic/atomic_ppc64x.go
new file mode 100644
index 0000000..e4b109f
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_ppc64x.go
@@ -0,0 +1,89 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64 ppc64le
+
+package atomic
+
+import "unsafe"
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Load(ptr *uint32) uint32
+
+//go:noescape
+func Load8(ptr *uint8) uint8
+
+//go:noescape
+func Load64(ptr *uint64) uint64
+
+// NO go:noescape annotation; *ptr escapes if result escapes (#31525)
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer
+
+//go:noescape
+func LoadAcq(ptr *uint32) uint32
+
+//go:noescape
+func LoadAcq64(ptr *uint64) uint64
+
+//go:noescape
+func LoadAcquintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+// NOTE: Do not add atomicxor8 (XOR is not idempotent).
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreRel64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
diff --git a/src/runtime/internal/atomic/atomic_ppc64x.s b/src/runtime/internal/atomic/atomic_ppc64x.s
new file mode 100644
index 0000000..b79cdbc
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_ppc64x.s
@@ -0,0 +1,80 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64 ppc64le
+
+#include "textflag.h"
+
+
+// For more details about how various memory models are
+// enforced on POWER, the following paper provides more
+// details about how they enforce C/C++ like models. This
+// gives context about why the strange looking code
+// sequences below work.
+//
+// http://www.rdrop.com/users/paulmck/scalability/paper/N2745r.2011.03.04a.html
+
+// uint32 runtime∕internal∕atomic·Load(uint32 volatile* ptr)
+TEXT ·Load(SB),NOSPLIT|NOFRAME,$-8-12
+ MOVD ptr+0(FP), R3
+ SYNC
+ MOVWZ 0(R3), R3
+ CMPW R3, R3, CR7
+ BC 4, 30, 1(PC) // bne- cr7,0x4
+ ISYNC
+ MOVW R3, ret+8(FP)
+ RET
+
+// uint8 runtime∕internal∕atomic·Load8(uint8 volatile* ptr)
+TEXT ·Load8(SB),NOSPLIT|NOFRAME,$-8-9
+ MOVD ptr+0(FP), R3
+ SYNC
+ MOVBZ 0(R3), R3
+ CMP R3, R3, CR7
+ BC 4, 30, 1(PC) // bne- cr7,0x4
+ ISYNC
+ MOVB R3, ret+8(FP)
+ RET
+
+// uint64 runtime∕internal∕atomic·Load64(uint64 volatile* ptr)
+TEXT ·Load64(SB),NOSPLIT|NOFRAME,$-8-16
+ MOVD ptr+0(FP), R3
+ SYNC
+ MOVD 0(R3), R3
+ CMP R3, R3, CR7
+ BC 4, 30, 1(PC) // bne- cr7,0x4
+ ISYNC
+ MOVD R3, ret+8(FP)
+ RET
+
+// void *runtime∕internal∕atomic·Loadp(void *volatile *ptr)
+TEXT ·Loadp(SB),NOSPLIT|NOFRAME,$-8-16
+ MOVD ptr+0(FP), R3
+ SYNC
+ MOVD 0(R3), R3
+ CMP R3, R3, CR7
+ BC 4, 30, 1(PC) // bne- cr7,0x4
+ ISYNC
+ MOVD R3, ret+8(FP)
+ RET
+
+// uint32 runtime∕internal∕atomic·LoadAcq(uint32 volatile* ptr)
+TEXT ·LoadAcq(SB),NOSPLIT|NOFRAME,$-8-12
+ MOVD ptr+0(FP), R3
+ MOVWZ 0(R3), R3
+ CMPW R3, R3, CR7
+ BC 4, 30, 1(PC) // bne- cr7, 0x4
+ ISYNC
+ MOVW R3, ret+8(FP)
+ RET
+
+// uint64 runtime∕internal∕atomic·LoadAcq64(uint64 volatile* ptr)
+TEXT ·LoadAcq64(SB),NOSPLIT|NOFRAME,$-8-16
+ MOVD ptr+0(FP), R3
+ MOVD 0(R3), R3
+ CMP R3, R3, CR7
+ BC 4, 30, 1(PC) // bne- cr7, 0x4
+ ISYNC
+ MOVD R3, ret+8(FP)
+ RET
diff --git a/src/runtime/internal/atomic/atomic_riscv64.go b/src/runtime/internal/atomic/atomic_riscv64.go
new file mode 100644
index 0000000..8f24d61
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_riscv64.go
@@ -0,0 +1,85 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package atomic
+
+import "unsafe"
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Load(ptr *uint32) uint32
+
+//go:noescape
+func Load8(ptr *uint8) uint8
+
+//go:noescape
+func Load64(ptr *uint64) uint64
+
+// NO go:noescape annotation; *ptr escapes if result escapes (#31525)
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer
+
+//go:noescape
+func LoadAcq(ptr *uint32) uint32
+
+//go:noescape
+func LoadAcq64(ptr *uint64) uint64
+
+//go:noescape
+func LoadAcquintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreRel64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
diff --git a/src/runtime/internal/atomic/atomic_riscv64.s b/src/runtime/internal/atomic/atomic_riscv64.s
new file mode 100644
index 0000000..74c896c
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_riscv64.s
@@ -0,0 +1,258 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// RISC-V's atomic operations have two bits, aq ("acquire") and rl ("release"),
+// which may be toggled on and off. Their precise semantics are defined in
+// section 6.3 of the specification, but the basic idea is as follows:
+//
+// - If neither aq nor rl is set, the CPU may reorder the atomic arbitrarily.
+// It guarantees only that it will execute atomically.
+//
+// - If aq is set, the CPU may move the instruction backward, but not forward.
+//
+// - If rl is set, the CPU may move the instruction forward, but not backward.
+//
+// - If both are set, the CPU may not reorder the instruction at all.
+//
+// These four modes correspond to other well-known memory models on other CPUs.
+// On ARM, aq corresponds to a dmb ishst, aq+rl corresponds to a dmb ish. On
+// Intel, aq corresponds to an lfence, rl to an sfence, and aq+rl to an mfence
+// (or a lock prefix).
+//
+// Go's memory model requires that
+// - if a read happens after a write, the read must observe the write, and
+// that
+// - if a read happens concurrently with a write, the read may observe the
+// write.
+// aq is sufficient to guarantee this, so that's what we use here. (This jibes
+// with ARM, which uses dmb ishst.)
+
+#include "textflag.h"
+
+// Atomically:
+//	if(*val == old){
+// *val = new;
+// return 1;
+// } else {
+// return 0;
+// }
+
+TEXT ·Cas(SB), NOSPLIT, $0-17
+ MOV ptr+0(FP), A0
+ MOVW old+8(FP), A1
+ MOVW new+12(FP), A2
+cas_again:
+ LRW (A0), A3
+ BNE A3, A1, cas_fail
+ SCW A2, (A0), A4
+ BNE A4, ZERO, cas_again
+ MOV $1, A0
+ MOVB A0, ret+16(FP)
+ RET
+cas_fail:
+ MOV $0, A0
+	MOVB	A0, ret+16(FP)	// bool result is a single byte
+ RET
+
+// func Cas64(ptr *uint64, old, new uint64) bool
+TEXT ·Cas64(SB), NOSPLIT, $0-25
+ MOV ptr+0(FP), A0
+ MOV old+8(FP), A1
+ MOV new+16(FP), A2
+cas_again:
+ LRD (A0), A3
+ BNE A3, A1, cas_fail
+ SCD A2, (A0), A4
+ BNE A4, ZERO, cas_again
+ MOV $1, A0
+ MOVB A0, ret+24(FP)
+ RET
+cas_fail:
+ MOVB ZERO, ret+24(FP)
+ RET
+
+// func Load(ptr *uint32) uint32
+TEXT ·Load(SB),NOSPLIT|NOFRAME,$0-12
+ MOV ptr+0(FP), A0
+ LRW (A0), A0
+ MOVW A0, ret+8(FP)
+ RET
+
+// func Load8(ptr *uint8) uint8
+TEXT ·Load8(SB),NOSPLIT|NOFRAME,$0-9
+ MOV ptr+0(FP), A0
+ FENCE
+ MOVBU (A0), A1
+ FENCE
+ MOVB A1, ret+8(FP)
+ RET
+
+// func Load64(ptr *uint64) uint64
+TEXT ·Load64(SB),NOSPLIT|NOFRAME,$0-16
+ MOV ptr+0(FP), A0
+ LRD (A0), A0
+ MOV A0, ret+8(FP)
+ RET
+
+// func Store(ptr *uint32, val uint32)
+TEXT ·Store(SB), NOSPLIT, $0-12
+ MOV ptr+0(FP), A0
+ MOVW val+8(FP), A1
+ AMOSWAPW A1, (A0), ZERO
+ RET
+
+// func Store8(ptr *uint8, val uint8)
+TEXT ·Store8(SB), NOSPLIT, $0-9
+ MOV ptr+0(FP), A0
+ MOVBU val+8(FP), A1
+ FENCE
+ MOVB A1, (A0)
+ FENCE
+ RET
+
+// func Store64(ptr *uint64, val uint64)
+TEXT ·Store64(SB), NOSPLIT, $0-16
+ MOV ptr+0(FP), A0
+ MOV val+8(FP), A1
+ AMOSWAPD A1, (A0), ZERO
+ RET
+
+TEXT ·Casp1(SB), NOSPLIT, $0-25
+ JMP ·Cas64(SB)
+
+TEXT ·Casuintptr(SB),NOSPLIT,$0-25
+ JMP ·Cas64(SB)
+
+TEXT ·CasRel(SB), NOSPLIT, $0-17
+ JMP ·Cas(SB)
+
+TEXT ·Loaduintptr(SB),NOSPLIT,$0-16
+ JMP ·Load64(SB)
+
+TEXT ·Storeuintptr(SB),NOSPLIT,$0-16
+ JMP ·Store64(SB)
+
+TEXT ·Loaduint(SB),NOSPLIT,$0-16
+ JMP ·Loaduintptr(SB)
+
+TEXT ·Loadint64(SB),NOSPLIT,$0-16
+ JMP ·Loaduintptr(SB)
+
+TEXT ·Xaddint64(SB),NOSPLIT,$0-24
+ MOV ptr+0(FP), A0
+ MOV delta+8(FP), A1
+ AMOADDD A1, (A0), A0
+ ADD A0, A1, A0
+	MOV	A0, ret+16(FP)	// int64 result: store all 8 bytes
+ RET
+
+TEXT ·LoadAcq(SB),NOSPLIT|NOFRAME,$0-12
+ JMP ·Load(SB)
+
+TEXT ·LoadAcq64(SB),NOSPLIT|NOFRAME,$0-16
+ JMP ·Load64(SB)
+
+TEXT ·LoadAcquintptr(SB),NOSPLIT|NOFRAME,$0-16
+ JMP ·Load64(SB)
+
+// func Loadp(ptr unsafe.Pointer) unsafe.Pointer
+TEXT ·Loadp(SB),NOSPLIT,$0-16
+ JMP ·Load64(SB)
+
+// func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+TEXT ·StorepNoWB(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·StoreRel(SB), NOSPLIT, $0-12
+ JMP ·Store(SB)
+
+TEXT ·StoreRel64(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·StoreReluintptr(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+// func Xchg(ptr *uint32, new uint32) uint32
+TEXT ·Xchg(SB), NOSPLIT, $0-20
+ MOV ptr+0(FP), A0
+ MOVW new+8(FP), A1
+ AMOSWAPW A1, (A0), A1
+ MOVW A1, ret+16(FP)
+ RET
+
+// func Xchg64(ptr *uint64, new uint64) uint64
+TEXT ·Xchg64(SB), NOSPLIT, $0-24
+ MOV ptr+0(FP), A0
+ MOV new+8(FP), A1
+ AMOSWAPD A1, (A0), A1
+ MOV A1, ret+16(FP)
+ RET
+
+// Atomically:
+// *val += delta;
+// return *val;
+
+// func Xadd(ptr *uint32, delta int32) uint32
+TEXT ·Xadd(SB), NOSPLIT, $0-20
+ MOV ptr+0(FP), A0
+ MOVW delta+8(FP), A1
+ AMOADDW A1, (A0), A2
+	ADD	A2, A1, A0
+ MOVW A0, ret+16(FP)
+ RET
+
+// func Xadd64(ptr *uint64, delta int64) uint64
+TEXT ·Xadd64(SB), NOSPLIT, $0-24
+ MOV ptr+0(FP), A0
+ MOV delta+8(FP), A1
+ AMOADDD A1, (A0), A2
+ ADD A2, A1, A0
+ MOV A0, ret+16(FP)
+ RET
+
+// func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+TEXT ·Xadduintptr(SB), NOSPLIT, $0-24
+ JMP ·Xadd64(SB)
+
+// func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+TEXT ·Xchguintptr(SB), NOSPLIT, $0-24
+ JMP ·Xchg64(SB)
+
+// func And8(ptr *uint8, val uint8)
+TEXT ·And8(SB), NOSPLIT, $0-9
+ MOV ptr+0(FP), A0
+ MOVBU val+8(FP), A1
+ AND $3, A0, A2
+ AND $-4, A0
+ SLL $3, A2
+ XOR $255, A1
+ SLL A2, A1
+ XOR $-1, A1
+ AMOANDW A1, (A0), ZERO
+ RET
+
+// func Or8(ptr *uint8, val uint8)
+TEXT ·Or8(SB), NOSPLIT, $0-9
+ MOV ptr+0(FP), A0
+ MOVBU val+8(FP), A1
+ AND $3, A0, A2
+ AND $-4, A0
+ SLL $3, A2
+ SLL A2, A1
+ AMOORW A1, (A0), ZERO
+ RET
+
+// func And(ptr *uint32, val uint32)
+TEXT ·And(SB), NOSPLIT, $0-12
+ MOV ptr+0(FP), A0
+ MOVW val+8(FP), A1
+ AMOANDW A1, (A0), ZERO
+ RET
+
+// func Or(ptr *uint32, val uint32)
+TEXT ·Or(SB), NOSPLIT, $0-12
+ MOV ptr+0(FP), A0
+ MOVW val+8(FP), A1
+ AMOORW A1, (A0), ZERO
+ RET
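
A note on And8/Or8 above: RISC-V has no byte-wide AMO instructions, so both routines align the pointer down to a 4-byte boundary and widen the operand into a 32-bit mask whose other three bytes are the identity for the operation (0xff for AND, 0x00 for OR). Below is a minimal Go sketch of that mask construction, assuming little-endian byte numbering within the word as on riscv64; the function names are illustrative and not part of the patch.

package atomicsketch // illustrative sketch, not part of the patch

// and8Mask mirrors the register arithmetic in ·And8: the word-wide AMOANDW
// must leave the three untouched bytes intact, so those bytes get 0xff.
func and8Mask(ptr uintptr, val uint8) (aligned uintptr, mask uint32) {
	shift := (ptr & 3) * 8          // bit offset of the target byte in its word
	aligned = ptr &^ 3              // word-aligned address for the AMO
	mask = ^(uint32(^val) << shift) // val in place, 1-bits everywhere else
	return aligned, mask
}

// or8Mask is the simpler Or8 case: untouched bytes get 0, the identity for OR.
func or8Mask(ptr uintptr, val uint8) (aligned uintptr, mask uint32) {
	return ptr &^ 3, uint32(val) << ((ptr & 3) * 8)
}
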
diff --git a/src/runtime/internal/atomic/atomic_s390x.go b/src/runtime/internal/atomic/atomic_s390x.go
new file mode 100644
index 0000000..a058d60
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_s390x.go
@@ -0,0 +1,122 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package atomic
+
+import "unsafe"
+
+// Export some functions via linkname to assembly in sync/atomic.
+//go:linkname Load
+//go:linkname Loadp
+//go:linkname Load64
+
+//go:nosplit
+//go:noinline
+func Load(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer {
+ return *(*unsafe.Pointer)(ptr)
+}
+
+//go:nosplit
+//go:noinline
+func Load8(ptr *uint8) uint8 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Load64(ptr *uint64) uint64 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcq(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcq64(ptr *uint64) uint64 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcquintptr(ptr *uintptr) uintptr {
+ return *ptr
+}
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+
+//go:nosplit
+//go:noinline
+func StoreRel(ptr *uint32, val uint32) {
+ *ptr = val
+}
+
+//go:nosplit
+//go:noinline
+func StoreRel64(ptr *uint64, val uint64) {
+ *ptr = val
+}
+
+//go:nosplit
+//go:noinline
+func StoreReluintptr(ptr *uintptr, val uintptr) {
+ *ptr = val
+}
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+// NOTE: Do not add atomicxor8 (XOR is not idempotent).
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
diff --git a/src/runtime/internal/atomic/atomic_test.go b/src/runtime/internal/atomic/atomic_test.go
new file mode 100644
index 0000000..c9c2eba
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_test.go
@@ -0,0 +1,356 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package atomic_test
+
+import (
+ "runtime"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "testing"
+ "unsafe"
+)
+
+func runParallel(N, iter int, f func()) {
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(int(N)))
+ done := make(chan bool)
+ for i := 0; i < N; i++ {
+ go func() {
+ for j := 0; j < iter; j++ {
+ f()
+ }
+ done <- true
+ }()
+ }
+ for i := 0; i < N; i++ {
+ <-done
+ }
+}
+
+func TestXadduintptr(t *testing.T) {
+ N := 20
+ iter := 100000
+ if testing.Short() {
+ N = 10
+ iter = 10000
+ }
+ inc := uintptr(100)
+ total := uintptr(0)
+ runParallel(N, iter, func() {
+ atomic.Xadduintptr(&total, inc)
+ })
+ if want := uintptr(N*iter) * inc; want != total {
+ t.Fatalf("xadduintpr error, want %d, got %d", want, total)
+ }
+ total = 0
+ runParallel(N, iter, func() {
+ atomic.Xadduintptr(&total, inc)
+ atomic.Xadduintptr(&total, uintptr(-int64(inc)))
+ })
+ if total != 0 {
+ t.Fatalf("xadduintpr total error, want %d, got %d", 0, total)
+ }
+}
+
+// Tests that xadduintptr correctly updates 64-bit values. The place where
+// we actually do so is mstats.go, functions mSysStat{Inc,Dec}.
+func TestXadduintptrOnUint64(t *testing.T) {
+ if sys.BigEndian {
+ // On big endian architectures, we never use xadduintptr to update
+ // 64-bit values and hence we skip the test. (Note that functions
+ // mSysStat{Inc,Dec} in mstats.go have explicit checks for
+ // big-endianness.)
+ t.Skip("skip xadduintptr on big endian architecture")
+ }
+ const inc = 100
+ val := uint64(0)
+ atomic.Xadduintptr((*uintptr)(unsafe.Pointer(&val)), inc)
+ if inc != val {
+ t.Fatalf("xadduintptr should increase lower-order bits, want %d, got %d", inc, val)
+ }
+}
+
+func shouldPanic(t *testing.T, name string, f func()) {
+ defer func() {
+ // Check that all GC maps are sane.
+ runtime.GC()
+
+ err := recover()
+ want := "unaligned 64-bit atomic operation"
+ if err == nil {
+ t.Errorf("%s did not panic", name)
+ } else if s, _ := err.(string); s != want {
+ t.Errorf("%s: wanted panic %q, got %q", name, want, err)
+ }
+ }()
+ f()
+}
+
+// Variant of sync/atomic's TestUnaligned64:
+func TestUnaligned64(t *testing.T) {
+ // Unaligned 64-bit atomics on 32-bit systems are
+ // a continual source of pain. Test that on 32-bit systems they crash
+ // instead of failing silently.
+
+ if unsafe.Sizeof(int(0)) != 4 {
+ t.Skip("test only runs on 32-bit systems")
+ }
+
+ x := make([]uint32, 4)
+ u := unsafe.Pointer(uintptr(unsafe.Pointer(&x[0])) | 4) // force a 4-byte-aligned, 8-byte-misaligned address
+
+ up64 := (*uint64)(u) // misaligned
+ p64 := (*int64)(u) // misaligned
+
+ shouldPanic(t, "Load64", func() { atomic.Load64(up64) })
+ shouldPanic(t, "Loadint64", func() { atomic.Loadint64(p64) })
+ shouldPanic(t, "Store64", func() { atomic.Store64(up64, 0) })
+ shouldPanic(t, "Xadd64", func() { atomic.Xadd64(up64, 1) })
+ shouldPanic(t, "Xchg64", func() { atomic.Xchg64(up64, 1) })
+ shouldPanic(t, "Cas64", func() { atomic.Cas64(up64, 1, 2) })
+}
+
+func TestAnd8(t *testing.T) {
+ // Basic sanity check.
+ x := uint8(0xff)
+ for i := uint8(0); i < 8; i++ {
+ atomic.And8(&x, ^(1 << i))
+ if r := uint8(0xff) << (i + 1); x != r {
+ t.Fatalf("clearing bit %#x: want %#x, got %#x", uint8(1<<i), r, x)
+ }
+ }
+
+ // Set every bit in array to 1.
+ a := make([]uint8, 1<<12)
+ for i := range a {
+ a[i] = 0xff
+ }
+
+ // Clear array bit-by-bit in different goroutines.
+ done := make(chan bool)
+ for i := 0; i < 8; i++ {
+ m := ^uint8(1 << i)
+ go func() {
+ for i := range a {
+ atomic.And8(&a[i], m)
+ }
+ done <- true
+ }()
+ }
+ for i := 0; i < 8; i++ {
+ <-done
+ }
+
+ // Check that the array has been totally cleared.
+ for i, v := range a {
+ if v != 0 {
+ t.Fatalf("a[%v] not cleared: want %#x, got %#x", i, uint8(0), v)
+ }
+ }
+}
+
+func TestAnd(t *testing.T) {
+ // Basic sanity check.
+ x := uint32(0xffffffff)
+ for i := uint32(0); i < 32; i++ {
+ atomic.And(&x, ^(1 << i))
+ if r := uint32(0xffffffff) << (i + 1); x != r {
+ t.Fatalf("clearing bit %#x: want %#x, got %#x", uint32(1<<i), r, x)
+ }
+ }
+
+ // Set every bit in array to 1.
+ a := make([]uint32, 1<<12)
+ for i := range a {
+ a[i] = 0xffffffff
+ }
+
+ // Clear array bit-by-bit in different goroutines.
+ done := make(chan bool)
+ for i := 0; i < 32; i++ {
+ m := ^uint32(1 << i)
+ go func() {
+ for i := range a {
+ atomic.And(&a[i], m)
+ }
+ done <- true
+ }()
+ }
+ for i := 0; i < 32; i++ {
+ <-done
+ }
+
+ // Check that the array has been totally cleared.
+ for i, v := range a {
+ if v != 0 {
+ t.Fatalf("a[%v] not cleared: want %#x, got %#x", i, uint32(0), v)
+ }
+ }
+}
+
+func TestOr8(t *testing.T) {
+ // Basic sanity check.
+ x := uint8(0)
+ for i := uint8(0); i < 8; i++ {
+ atomic.Or8(&x, 1<<i)
+ if r := (uint8(1) << (i + 1)) - 1; x != r {
+ t.Fatalf("setting bit %#x: want %#x, got %#x", uint8(1)<<i, r, x)
+ }
+ }
+
+ // Start with every bit in array set to 0.
+ a := make([]uint8, 1<<12)
+
+ // Set every bit in array bit-by-bit in different goroutines.
+ done := make(chan bool)
+ for i := 0; i < 8; i++ {
+ m := uint8(1 << i)
+ go func() {
+ for i := range a {
+ atomic.Or8(&a[i], m)
+ }
+ done <- true
+ }()
+ }
+ for i := 0; i < 8; i++ {
+ <-done
+ }
+
+ // Check that the array has been totally set.
+ for i, v := range a {
+ if v != 0xff {
+ t.Fatalf("a[%v] not fully set: want %#x, got %#x", i, uint8(0xff), v)
+ }
+ }
+}
+
+func TestOr(t *testing.T) {
+ // Basic sanity check.
+ x := uint32(0)
+ for i := uint32(0); i < 32; i++ {
+ atomic.Or(&x, 1<<i)
+ if r := (uint32(1) << (i + 1)) - 1; x != r {
+ t.Fatalf("setting bit %#x: want %#x, got %#x", uint32(1)<<i, r, x)
+ }
+ }
+
+ // Start with every bit in array set to 0.
+ a := make([]uint32, 1<<12)
+
+ // Set every bit in array bit-by-bit in different goroutines.
+ done := make(chan bool)
+ for i := 0; i < 32; i++ {
+ m := uint32(1 << i)
+ go func() {
+ for i := range a {
+ atomic.Or(&a[i], m)
+ }
+ done <- true
+ }()
+ }
+ for i := 0; i < 32; i++ {
+ <-done
+ }
+
+ // Check that the array has been totally set.
+ for i, v := range a {
+ if v != 0xffffffff {
+ t.Fatalf("a[%v] not fully set: want %#x, got %#x", i, uint32(0xffffffff), v)
+ }
+ }
+}
+
+func TestBitwiseContended8(t *testing.T) {
+ // Start with every bit in array set to 0.
+ a := make([]uint8, 16)
+
+ // Iterations to try.
+ N := 1 << 16
+ if testing.Short() {
+ N = 1 << 10
+ }
+
+ // Set and then clear every bit in the array bit-by-bit in different goroutines.
+ done := make(chan bool)
+ for i := 0; i < 8; i++ {
+ m := uint8(1 << i)
+ go func() {
+ for n := 0; n < N; n++ {
+ for i := range a {
+ atomic.Or8(&a[i], m)
+ if atomic.Load8(&a[i])&m != m {
+ t.Errorf("a[%v] bit %#x not set", i, m)
+ }
+ atomic.And8(&a[i], ^m)
+ if atomic.Load8(&a[i])&m != 0 {
+ t.Errorf("a[%v] bit %#x not clear", i, m)
+ }
+ }
+ }
+ done <- true
+ }()
+ }
+ for i := 0; i < 8; i++ {
+ <-done
+ }
+
+ // Check that the array has been totally cleared.
+ for i, v := range a {
+ if v != 0 {
+ t.Fatalf("a[%v] not cleared: want %#x, got %#x", i, uint8(0), v)
+ }
+ }
+}
+
+func TestBitwiseContended(t *testing.T) {
+ // Start with every bit in array set to 0.
+ a := make([]uint32, 16)
+
+ // Iterations to try.
+ N := 1 << 16
+ if testing.Short() {
+ N = 1 << 10
+ }
+
+ // Set and then clear every bit in the array bit-by-bit in different goroutines.
+ done := make(chan bool)
+ for i := 0; i < 32; i++ {
+ m := uint32(1 << i)
+ go func() {
+ for n := 0; n < N; n++ {
+ for i := range a {
+ atomic.Or(&a[i], m)
+ if atomic.Load(&a[i])&m != m {
+ t.Errorf("a[%v] bit %#x not set", i, m)
+ }
+ atomic.And(&a[i], ^m)
+ if atomic.Load(&a[i])&m != 0 {
+ t.Errorf("a[%v] bit %#x not clear", i, m)
+ }
+ }
+ }
+ done <- true
+ }()
+ }
+ for i := 0; i < 32; i++ {
+ <-done
+ }
+
+ // Check that the array has been totally cleared.
+ for i, v := range a {
+ if v != 0 {
+ t.Fatalf("a[%v] not cleared: want %#x, got %#x", i, uint32(0), v)
+ }
+ }
+}
+
+func TestStorepNoWB(t *testing.T) {
+ var p [2]*int
+ for i := range p {
+ atomic.StorepNoWB(unsafe.Pointer(&p[i]), unsafe.Pointer(new(int)))
+ }
+ if p[0] == p[1] {
+ t.Error("Bad escape analysis of StorepNoWB")
+ }
+}
diff --git a/src/runtime/internal/atomic/atomic_wasm.go b/src/runtime/internal/atomic/atomic_wasm.go
new file mode 100644
index 0000000..b05d98e
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_wasm.go
@@ -0,0 +1,268 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// TODO(neelance): implement with actual atomic operations as soon as threads are available
+// See https://github.com/WebAssembly/design/issues/1073
+
+// Export some functions via linkname to assembly in sync/atomic.
+//go:linkname Load
+//go:linkname Loadp
+//go:linkname Load64
+//go:linkname Loaduintptr
+//go:linkname Xadd
+//go:linkname Xadd64
+//go:linkname Xadduintptr
+//go:linkname Xchg
+//go:linkname Xchg64
+//go:linkname Xchguintptr
+//go:linkname Cas
+//go:linkname Cas64
+//go:linkname Casuintptr
+//go:linkname Store
+//go:linkname Store64
+//go:linkname Storeuintptr
+
+package atomic
+
+import "unsafe"
+
+//go:nosplit
+//go:noinline
+func Load(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer {
+ return *(*unsafe.Pointer)(ptr)
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcq(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcq64(ptr *uint64) uint64 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcquintptr(ptr *uintptr) uintptr {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Load8(ptr *uint8) uint8 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Load64(ptr *uint64) uint64 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Xadd(ptr *uint32, delta int32) uint32 {
+ new := *ptr + uint32(delta)
+ *ptr = new
+ return new
+}
+
+//go:nosplit
+//go:noinline
+func Xadd64(ptr *uint64, delta int64) uint64 {
+ new := *ptr + uint64(delta)
+ *ptr = new
+ return new
+}
+
+//go:nosplit
+//go:noinline
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr {
+ new := *ptr + delta
+ *ptr = new
+ return new
+}
+
+//go:nosplit
+//go:noinline
+func Xchg(ptr *uint32, new uint32) uint32 {
+ old := *ptr
+ *ptr = new
+ return old
+}
+
+//go:nosplit
+//go:noinline
+func Xchg64(ptr *uint64, new uint64) uint64 {
+ old := *ptr
+ *ptr = new
+ return old
+}
+
+//go:nosplit
+//go:noinline
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr {
+ old := *ptr
+ *ptr = new
+ return old
+}
+
+//go:nosplit
+//go:noinline
+func And8(ptr *uint8, val uint8) {
+ *ptr = *ptr & val
+}
+
+//go:nosplit
+//go:noinline
+func Or8(ptr *uint8, val uint8) {
+ *ptr = *ptr | val
+}
+
+// NOTE: Do not add atomicxor8 (XOR is not idempotent).
+
+//go:nosplit
+//go:noinline
+func And(ptr *uint32, val uint32) {
+ *ptr = *ptr & val
+}
+
+//go:nosplit
+//go:noinline
+func Or(ptr *uint32, val uint32) {
+ *ptr = *ptr | val
+}
+
+//go:nosplit
+//go:noinline
+func Cas64(ptr *uint64, old, new uint64) bool {
+ if *ptr == old {
+ *ptr = new
+ return true
+ }
+ return false
+}
+
+//go:nosplit
+//go:noinline
+func Store(ptr *uint32, val uint32) {
+ *ptr = val
+}
+
+//go:nosplit
+//go:noinline
+func StoreRel(ptr *uint32, val uint32) {
+ *ptr = val
+}
+
+//go:nosplit
+//go:noinline
+func StoreRel64(ptr *uint64, val uint64) {
+ *ptr = val
+}
+
+//go:nosplit
+//go:noinline
+func StoreReluintptr(ptr *uintptr, val uintptr) {
+ *ptr = val
+}
+
+//go:nosplit
+//go:noinline
+func Store8(ptr *uint8, val uint8) {
+ *ptr = val
+}
+
+//go:nosplit
+//go:noinline
+func Store64(ptr *uint64, val uint64) {
+ *ptr = val
+}
+
+// StorepNoWB performs *ptr = val atomically and without a write
+// barrier.
+//
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+
+//go:nosplit
+//go:noinline
+func Cas(ptr *uint32, old, new uint32) bool {
+ if *ptr == old {
+ *ptr = new
+ return true
+ }
+ return false
+}
+
+//go:nosplit
+//go:noinline
+func Casp1(ptr *unsafe.Pointer, old, new unsafe.Pointer) bool {
+ if *ptr == old {
+ *ptr = new
+ return true
+ }
+ return false
+}
+
+//go:nosplit
+//go:noinline
+func Casuintptr(ptr *uintptr, old, new uintptr) bool {
+ if *ptr == old {
+ *ptr = new
+ return true
+ }
+ return false
+}
+
+//go:nosplit
+//go:noinline
+func CasRel(ptr *uint32, old, new uint32) bool {
+ if *ptr == old {
+ *ptr = new
+ return true
+ }
+ return false
+}
+
+//go:nosplit
+//go:noinline
+func Storeuintptr(ptr *uintptr, new uintptr) {
+ *ptr = new
+}
+
+//go:nosplit
+//go:noinline
+func Loaduintptr(ptr *uintptr) uintptr {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Loaduint(ptr *uint) uint {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Loadint64(ptr *int64) int64 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Xaddint64(ptr *int64, delta int64) int64 {
+ new := *ptr + delta
+ *ptr = new
+ return new
+}
diff --git a/src/runtime/internal/atomic/bench_test.go b/src/runtime/internal/atomic/bench_test.go
new file mode 100644
index 0000000..2476c06
--- /dev/null
+++ b/src/runtime/internal/atomic/bench_test.go
@@ -0,0 +1,195 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package atomic_test
+
+import (
+ "runtime/internal/atomic"
+ "testing"
+)
+
+var sink interface{}
+
+func BenchmarkAtomicLoad64(b *testing.B) {
+ var x uint64
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ _ = atomic.Load64(&x)
+ }
+}
+
+func BenchmarkAtomicStore64(b *testing.B) {
+ var x uint64
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ atomic.Store64(&x, 0)
+ }
+}
+
+func BenchmarkAtomicLoad(b *testing.B) {
+ var x uint32
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ _ = atomic.Load(&x)
+ }
+}
+
+func BenchmarkAtomicStore(b *testing.B) {
+ var x uint32
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ atomic.Store(&x, 0)
+ }
+}
+
+func BenchmarkAnd8(b *testing.B) {
+ var x [512]uint8 // give byte its own cache line
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ atomic.And8(&x[255], uint8(i))
+ }
+}
+
+func BenchmarkAnd(b *testing.B) {
+ var x [128]uint32 // give x its own cache line
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ atomic.And(&x[63], uint32(i))
+ }
+}
+
+func BenchmarkAnd8Parallel(b *testing.B) {
+ var x [512]uint8 // give byte its own cache line
+ sink = &x
+ b.RunParallel(func(pb *testing.PB) {
+ i := uint8(0)
+ for pb.Next() {
+ atomic.And8(&x[255], i)
+ i++
+ }
+ })
+}
+
+func BenchmarkAndParallel(b *testing.B) {
+ var x [128]uint32 // give x its own cache line
+ sink = &x
+ b.RunParallel(func(pb *testing.PB) {
+ i := uint32(0)
+ for pb.Next() {
+ atomic.And(&x[63], i)
+ i++
+ }
+ })
+}
+
+func BenchmarkOr8(b *testing.B) {
+ var x [512]uint8 // give byte its own cache line
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ atomic.Or8(&x[255], uint8(i))
+ }
+}
+
+func BenchmarkOr(b *testing.B) {
+ var x [128]uint32 // give x its own cache line
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ atomic.Or(&x[63], uint32(i))
+ }
+}
+
+func BenchmarkOr8Parallel(b *testing.B) {
+ var x [512]uint8 // give byte its own cache line
+ sink = &x
+ b.RunParallel(func(pb *testing.PB) {
+ i := uint8(0)
+ for pb.Next() {
+ atomic.Or8(&x[255], i)
+ i++
+ }
+ })
+}
+
+func BenchmarkOrParallel(b *testing.B) {
+ var x [128]uint32 // give x its own cache line
+ sink = &x
+ b.RunParallel(func(pb *testing.PB) {
+ i := uint32(0)
+ for pb.Next() {
+ atomic.Or(&x[63], i)
+ i++
+ }
+ })
+}
+
+func BenchmarkXadd(b *testing.B) {
+ var x uint32
+ ptr := &x
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ atomic.Xadd(ptr, 1)
+ }
+ })
+}
+
+func BenchmarkXadd64(b *testing.B) {
+ var x uint64
+ ptr := &x
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ atomic.Xadd64(ptr, 1)
+ }
+ })
+}
+
+func BenchmarkCas(b *testing.B) {
+ var x uint32
+ x = 1
+ ptr := &x
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ atomic.Cas(ptr, 1, 0)
+ atomic.Cas(ptr, 0, 1)
+ }
+ })
+}
+
+func BenchmarkCas64(b *testing.B) {
+ var x uint64
+ x = 1
+ ptr := &x
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ atomic.Cas64(ptr, 1, 0)
+ atomic.Cas64(ptr, 0, 1)
+ }
+ })
+}
+func BenchmarkXchg(b *testing.B) {
+ var x uint32
+ x = 1
+ ptr := &x
+ b.RunParallel(func(pb *testing.PB) {
+ var y uint32
+ y = 1
+ for pb.Next() {
+ y = atomic.Xchg(ptr, y)
+ y += 1
+ }
+ })
+}
+
+func BenchmarkXchg64(b *testing.B) {
+ var x uint64
+ x = 1
+ ptr := &x
+ b.RunParallel(func(pb *testing.PB) {
+ var y uint64
+ y = 1
+ for pb.Next() {
+ y = atomic.Xchg64(ptr, y)
+ y += 1
+ }
+ })
+}
diff --git a/src/runtime/internal/atomic/stubs.go b/src/runtime/internal/atomic/stubs.go
new file mode 100644
index 0000000..62e30d1
--- /dev/null
+++ b/src/runtime/internal/atomic/stubs.go
@@ -0,0 +1,35 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !wasm
+
+package atomic
+
+import "unsafe"
+
+//go:noescape
+func Cas(ptr *uint32, old, new uint32) bool
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func Casp1(ptr *unsafe.Pointer, old, new unsafe.Pointer) bool
+
+//go:noescape
+func Casuintptr(ptr *uintptr, old, new uintptr) bool
+
+//go:noescape
+func Storeuintptr(ptr *uintptr, new uintptr)
+
+//go:noescape
+func Loaduintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func Loaduint(ptr *uint) uint
+
+// TODO(matloob): Should these functions have the go:noescape annotation?
+
+//go:noescape
+func Loadint64(ptr *int64) int64
+
+//go:noescape
+func Xaddint64(ptr *int64, delta int64) int64
diff --git a/src/runtime/internal/atomic/sys_linux_arm.s b/src/runtime/internal/atomic/sys_linux_arm.s
new file mode 100644
index 0000000..192be4b
--- /dev/null
+++ b/src/runtime/internal/atomic/sys_linux_arm.s
@@ -0,0 +1,144 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// Linux/ARM atomic operations.
+
+// Because there is so much variation in ARM devices,
+// the Linux kernel provides an appropriate compare-and-swap
+// implementation at address 0xffff0fc0. Caller sets:
+// R0 = old value
+// R1 = new value
+// R2 = addr
+// LR = return address
+// The function returns with CS true if the swap happened.
+// http://lxr.linux.no/linux+v2.6.37.2/arch/arm/kernel/entry-armv.S#L850
+// On older kernels (before 2.6.24) the function can incorrectly
+// report a conflict, so we have to double-check the compare ourselves
+// and retry if necessary.
+//
+// https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b49c0f24cf6744a3f4fd09289fe7cade349dead5
+//
+TEXT cas<>(SB),NOSPLIT,$0
+ MOVW $0xffff0fc0, R15 // R15 is hardware PC.
+
+TEXT runtime∕internal∕atomic·Cas(SB),NOSPLIT|NOFRAME,$0
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ JMP ·armcas(SB)
+ JMP kernelcas<>(SB)
+
+TEXT kernelcas<>(SB),NOSPLIT,$0
+ MOVW ptr+0(FP), R2
+ // trigger potential paging fault here,
+ // because we don't know how to traceback through __kuser_cmpxchg
+ MOVW (R2), R0
+ MOVW old+4(FP), R0
+loop:
+ MOVW new+8(FP), R1
+ BL cas<>(SB)
+ BCC check
+ MOVW $1, R0
+ MOVB R0, ret+12(FP)
+ RET
+check:
+ // Kernel lies; double-check.
+ MOVW ptr+0(FP), R2
+ MOVW old+4(FP), R0
+ MOVW 0(R2), R3
+ CMP R0, R3
+ BEQ loop
+ MOVW $0, R0
+ MOVB R0, ret+12(FP)
+ RET
+
+// As for cas, memory barriers are complicated on ARM, but the kernel
+// provides a user helper. ARMv5 does not support SMP and has no
+// memory barrier instruction at all. ARMv6 added SMP support and has
+// a memory barrier, but it requires writing to a coprocessor
+// register. ARMv7 introduced the DMB instruction, but it's expensive
+// even on single-core devices. The kernel helper takes care of all of
+// this for us.
+
+// Use kernel helper version of memory_barrier, when compiled with GOARM < 7.
+TEXT memory_barrier<>(SB),NOSPLIT|NOFRAME,$0
+ MOVW $0xffff0fa0, R15 // R15 is hardware PC.
+
+TEXT ·Load(SB),NOSPLIT,$0-8
+ MOVW addr+0(FP), R0
+ MOVW (R0), R1
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BGE native_barrier
+ BL memory_barrier<>(SB)
+ B end
+native_barrier:
+ DMB MB_ISH
+end:
+ MOVW R1, ret+4(FP)
+ RET
+
+TEXT ·Store(SB),NOSPLIT,$0-8
+ MOVW addr+0(FP), R1
+ MOVW v+4(FP), R2
+
+ MOVB runtime·goarm(SB), R8
+ CMP $7, R8
+ BGE native_barrier
+ BL memory_barrier<>(SB)
+ B store
+native_barrier:
+ DMB MB_ISH
+
+store:
+ MOVW R2, (R1)
+
+ CMP $7, R8
+ BGE native_barrier2
+ BL memory_barrier<>(SB)
+ RET
+native_barrier2:
+ DMB MB_ISH
+ RET
+
+TEXT ·Load8(SB),NOSPLIT,$0-5
+ MOVW addr+0(FP), R0
+ MOVB (R0), R1
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BGE native_barrier
+ BL memory_barrier<>(SB)
+ B end
+native_barrier:
+ DMB MB_ISH
+end:
+ MOVB R1, ret+4(FP)
+ RET
+
+TEXT ·Store8(SB),NOSPLIT,$0-5
+ MOVW addr+0(FP), R1
+ MOVB v+4(FP), R2
+
+ MOVB runtime·goarm(SB), R8
+ CMP $7, R8
+ BGE native_barrier
+ BL memory_barrier<>(SB)
+ B store
+native_barrier:
+ DMB MB_ISH
+
+store:
+ MOVB R2, (R1)
+
+ CMP $7, R8
+ BGE native_barrier2
+ BL memory_barrier<>(SB)
+ RET
+native_barrier2:
+ DMB MB_ISH
+ RET
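
For readers who do not follow ARM assembly, the sketch below gives a rough Go-level rendering of the double-check retry loop that kernelcas<> above implements. kuserCmpxchg is a hypothetical stand-in for the kernel helper at 0xffff0fc0, simulated here in plain Go, so this shows the control flow only.

package main

import "fmt"

// kuserCmpxchg simulates the __kuser_cmpxchg kernel helper: it reports
// whether the swap happened. Real pre-2.6.24 kernels could report a
// spurious failure, which is why the caller double-checks below.
var kuserCmpxchg = func(old, new uint32, ptr *uint32) bool {
	if *ptr == old {
		*ptr = new
		return true
	}
	return false
}

// kernelCas mirrors kernelcas<>: retry while the helper reports failure
// but the value still equals old, since that failure may be spurious.
func kernelCas(ptr *uint32, old, new uint32) bool {
	for {
		if kuserCmpxchg(old, new, ptr) {
			return true // swap happened
		}
		if *ptr != old {
			return false // genuine conflict
		}
		// value still equals old: the failure was spurious, retry
	}
}

func main() {
	x := uint32(1)
	fmt.Println(kernelCas(&x, 1, 2), x) // true 2
}
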
diff --git a/src/runtime/internal/atomic/sys_nonlinux_arm.s b/src/runtime/internal/atomic/sys_nonlinux_arm.s
new file mode 100644
index 0000000..57568b2
--- /dev/null
+++ b/src/runtime/internal/atomic/sys_nonlinux_arm.s
@@ -0,0 +1,79 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !linux,arm
+
+#include "textflag.h"
+
+// TODO(minux): this is only valid for ARMv6+
+// bool armcas(int32 *val, int32 old, int32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// }else
+// return 0;
+TEXT ·Cas(SB),NOSPLIT,$0
+ JMP ·armcas(SB)
+
+// Non-linux OSes support only single processor machines before ARMv7.
+// So we don't need memory barriers if goarm < 7. And we fail loudly at
+// startup (runtime.checkgoarm) if it is a multi-processor but goarm < 7.
+
+TEXT ·Load(SB),NOSPLIT|NOFRAME,$0-8
+ MOVW addr+0(FP), R0
+ MOVW (R0), R1
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ DMB MB_ISH
+
+ MOVW R1, ret+4(FP)
+ RET
+
+TEXT ·Store(SB),NOSPLIT,$0-8
+ MOVW addr+0(FP), R1
+ MOVW v+4(FP), R2
+
+ MOVB runtime·goarm(SB), R8
+ CMP $7, R8
+ BLT 2(PC)
+ DMB MB_ISH
+
+ MOVW R2, (R1)
+
+ CMP $7, R8
+ BLT 2(PC)
+ DMB MB_ISH
+ RET
+
+TEXT ·Load8(SB),NOSPLIT|NOFRAME,$0-5
+ MOVW addr+0(FP), R0
+ MOVB (R0), R1
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ DMB MB_ISH
+
+ MOVB R1, ret+4(FP)
+ RET
+
+TEXT ·Store8(SB),NOSPLIT,$0-5
+ MOVW addr+0(FP), R1
+ MOVB v+4(FP), R2
+
+ MOVB runtime·goarm(SB), R8
+ CMP $7, R8
+ BLT 2(PC)
+ DMB MB_ISH
+
+ MOVB R2, (R1)
+
+ CMP $7, R8
+ BLT 2(PC)
+ DMB MB_ISH
+ RET
+
diff --git a/src/runtime/internal/atomic/unaligned.go b/src/runtime/internal/atomic/unaligned.go
new file mode 100644
index 0000000..a859de4
--- /dev/null
+++ b/src/runtime/internal/atomic/unaligned.go
@@ -0,0 +1,9 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package atomic
+
+func panicUnaligned() {
+ panic("unaligned 64-bit atomic operation")
+}
diff --git a/src/runtime/internal/math/math.go b/src/runtime/internal/math/math.go
new file mode 100644
index 0000000..5385f5d
--- /dev/null
+++ b/src/runtime/internal/math/math.go
@@ -0,0 +1,19 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package math
+
+import "runtime/internal/sys"
+
+const MaxUintptr = ^uintptr(0)
+
+// MulUintptr returns a * b and whether the multiplication overflowed.
+// On supported platforms this is an intrinsic lowered by the compiler.
+func MulUintptr(a, b uintptr) (uintptr, bool) {
+ if a|b < 1<<(4*sys.PtrSize) || a == 0 {
+ return a * b, false
+ }
+ overflow := b > MaxUintptr/a
+ return a * b, overflow
+}
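
The fast path in MulUintptr works because a|b < 1<<(4*sys.PtrSize) implies both operands are below half the pointer width (2^32 on 64-bit), so their product cannot exceed MaxUintptr; only larger operands pay for the division. Here is a small usage sketch against the API above, with hypothetical sizes; note that runtime/internal/math is an internal package, importable only from within the runtime tree.

package main

import (
	"fmt"
	"runtime/internal/math" // internal: only importable from within the runtime tree
)

func main() {
	const elemSize = 8      // hypothetical element size in bytes
	n := ^uintptr(0) / 4    // hypothetical element count
	if size, overflow := math.MulUintptr(elemSize, n); overflow {
		fmt.Println("allocation size overflows uintptr")
	} else {
		fmt.Println("allocation size:", size)
	}
}
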
diff --git a/src/runtime/internal/math/math_test.go b/src/runtime/internal/math/math_test.go
new file mode 100644
index 0000000..303eb63
--- /dev/null
+++ b/src/runtime/internal/math/math_test.go
@@ -0,0 +1,79 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package math_test
+
+import (
+ . "runtime/internal/math"
+ "testing"
+)
+
+const (
+ UintptrSize = 32 << (^uintptr(0) >> 63)
+)
+
+type mulUintptrTest struct {
+ a uintptr
+ b uintptr
+ overflow bool
+}
+
+var mulUintptrTests = []mulUintptrTest{
+ {0, 0, false},
+ {1000, 1000, false},
+ {MaxUintptr, 0, false},
+ {MaxUintptr, 1, false},
+ {MaxUintptr / 2, 2, false},
+ {MaxUintptr / 2, 3, true},
+ {MaxUintptr, 10, true},
+ {MaxUintptr, 100, true},
+ {MaxUintptr / 100, 100, false},
+ {MaxUintptr / 1000, 1001, true},
+ {1<<(UintptrSize/2) - 1, 1<<(UintptrSize/2) - 1, false},
+ {1 << (UintptrSize / 2), 1 << (UintptrSize / 2), true},
+ {MaxUintptr >> 32, MaxUintptr >> 32, false},
+ {MaxUintptr, MaxUintptr, true},
+}
+
+func TestMulUintptr(t *testing.T) {
+ for _, test := range mulUintptrTests {
+ a, b := test.a, test.b
+ for i := 0; i < 2; i++ {
+ mul, overflow := MulUintptr(a, b)
+ if mul != a*b || overflow != test.overflow {
+ t.Errorf("MulUintptr(%v, %v) = %v, %v want %v, %v",
+ a, b, mul, overflow, a*b, test.overflow)
+ }
+ a, b = b, a
+ }
+ }
+}
+
+var SinkUintptr uintptr
+var SinkBool bool
+
+var x, y uintptr
+
+func BenchmarkMulUintptr(b *testing.B) {
+ x, y = 1, 2
+ b.Run("small", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var overflow bool
+ SinkUintptr, overflow = MulUintptr(x, y)
+ if overflow {
+ SinkUintptr = 0
+ }
+ }
+ })
+ x, y = MaxUintptr, MaxUintptr-1
+ b.Run("large", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var overflow bool
+ SinkUintptr, overflow = MulUintptr(x, y)
+ if overflow {
+ SinkUintptr = 0
+ }
+ }
+ })
+}
diff --git a/src/runtime/internal/sys/arch.go b/src/runtime/internal/sys/arch.go
new file mode 100644
index 0000000..13c00cf
--- /dev/null
+++ b/src/runtime/internal/sys/arch.go
@@ -0,0 +1,20 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+type ArchFamilyType int
+
+const (
+ AMD64 ArchFamilyType = iota
+ ARM
+ ARM64
+ I386
+ MIPS
+ MIPS64
+ PPC64
+ RISCV64
+ S390X
+ WASM
+)
diff --git a/src/runtime/internal/sys/arch_386.go b/src/runtime/internal/sys/arch_386.go
new file mode 100644
index 0000000..b51f70a
--- /dev/null
+++ b/src/runtime/internal/sys/arch_386.go
@@ -0,0 +1,16 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+const (
+ ArchFamily = I386
+ BigEndian = false
+ DefaultPhysPageSize = 4096
+ PCQuantum = 1
+ Int64Align = 4
+ MinFrameSize = 0
+)
+
+type Uintreg uint32
diff --git a/src/runtime/internal/sys/arch_amd64.go b/src/runtime/internal/sys/arch_amd64.go
new file mode 100644
index 0000000..3d6776e
--- /dev/null
+++ b/src/runtime/internal/sys/arch_amd64.go
@@ -0,0 +1,16 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+const (
+ ArchFamily = AMD64
+ BigEndian = false
+ DefaultPhysPageSize = 4096
+ PCQuantum = 1
+ Int64Align = 8
+ MinFrameSize = 0
+)
+
+type Uintreg uint64
diff --git a/src/runtime/internal/sys/arch_arm.go b/src/runtime/internal/sys/arch_arm.go
new file mode 100644
index 0000000..97960d6
--- /dev/null
+++ b/src/runtime/internal/sys/arch_arm.go
@@ -0,0 +1,16 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+const (
+ ArchFamily = ARM
+ BigEndian = false
+ DefaultPhysPageSize = 65536
+ PCQuantum = 4
+ Int64Align = 4
+ MinFrameSize = 4
+)
+
+type Uintreg uint32
diff --git a/src/runtime/internal/sys/arch_arm64.go b/src/runtime/internal/sys/arch_arm64.go
new file mode 100644
index 0000000..911a948
--- /dev/null
+++ b/src/runtime/internal/sys/arch_arm64.go
@@ -0,0 +1,16 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+const (
+ ArchFamily = ARM64
+ BigEndian = false
+ DefaultPhysPageSize = 65536
+ PCQuantum = 4
+ Int64Align = 8
+ MinFrameSize = 8
+)
+
+type Uintreg uint64
diff --git a/src/runtime/internal/sys/arch_mips.go b/src/runtime/internal/sys/arch_mips.go
new file mode 100644
index 0000000..75cdb2e
--- /dev/null
+++ b/src/runtime/internal/sys/arch_mips.go
@@ -0,0 +1,16 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+const (
+ ArchFamily = MIPS
+ BigEndian = true
+ DefaultPhysPageSize = 65536
+ PCQuantum = 4
+ Int64Align = 4
+ MinFrameSize = 4
+)
+
+type Uintreg uint32
diff --git a/src/runtime/internal/sys/arch_mips64.go b/src/runtime/internal/sys/arch_mips64.go
new file mode 100644
index 0000000..494291a
--- /dev/null
+++ b/src/runtime/internal/sys/arch_mips64.go
@@ -0,0 +1,16 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+const (
+ ArchFamily = MIPS64
+ BigEndian = true
+ DefaultPhysPageSize = 16384
+ PCQuantum = 4
+ Int64Align = 8
+ MinFrameSize = 8
+)
+
+type Uintreg uint64
diff --git a/src/runtime/internal/sys/arch_mips64le.go b/src/runtime/internal/sys/arch_mips64le.go
new file mode 100644
index 0000000..d36d120
--- /dev/null
+++ b/src/runtime/internal/sys/arch_mips64le.go
@@ -0,0 +1,16 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+const (
+ ArchFamily = MIPS64
+ BigEndian = false
+ DefaultPhysPageSize = 16384
+ PCQuantum = 4
+ Int64Align = 8
+ MinFrameSize = 8
+)
+
+type Uintreg uint64
diff --git a/src/runtime/internal/sys/arch_mipsle.go b/src/runtime/internal/sys/arch_mipsle.go
new file mode 100644
index 0000000..323bf82
--- /dev/null
+++ b/src/runtime/internal/sys/arch_mipsle.go
@@ -0,0 +1,16 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+const (
+ ArchFamily = MIPS
+ BigEndian = false
+ DefaultPhysPageSize = 65536
+ PCQuantum = 4
+ Int64Align = 4
+ MinFrameSize = 4
+)
+
+type Uintreg uint32
diff --git a/src/runtime/internal/sys/arch_ppc64.go b/src/runtime/internal/sys/arch_ppc64.go
new file mode 100644
index 0000000..da1fe3d
--- /dev/null
+++ b/src/runtime/internal/sys/arch_ppc64.go
@@ -0,0 +1,16 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+const (
+ ArchFamily = PPC64
+ BigEndian = true
+ DefaultPhysPageSize = 65536
+ PCQuantum = 4
+ Int64Align = 8
+ MinFrameSize = 32
+)
+
+type Uintreg uint64
diff --git a/src/runtime/internal/sys/arch_ppc64le.go b/src/runtime/internal/sys/arch_ppc64le.go
new file mode 100644
index 0000000..6059799
--- /dev/null
+++ b/src/runtime/internal/sys/arch_ppc64le.go
@@ -0,0 +1,16 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+const (
+ ArchFamily = PPC64
+ BigEndian = false
+ DefaultPhysPageSize = 65536
+ PCQuantum = 4
+ Int64Align = 8
+ MinFrameSize = 32
+)
+
+type Uintreg uint64
diff --git a/src/runtime/internal/sys/arch_riscv64.go b/src/runtime/internal/sys/arch_riscv64.go
new file mode 100644
index 0000000..7cdcc8f
--- /dev/null
+++ b/src/runtime/internal/sys/arch_riscv64.go
@@ -0,0 +1,18 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+const (
+ ArchFamily = RISCV64
+ BigEndian = false
+ CacheLineSize = 64
+ DefaultPhysPageSize = 4096
+ PCQuantum = 4
+ Int64Align = 8
+ HugePageSize = 1 << 21
+ MinFrameSize = 8
+)
+
+type Uintreg uint64
diff --git a/src/runtime/internal/sys/arch_s390x.go b/src/runtime/internal/sys/arch_s390x.go
new file mode 100644
index 0000000..12cb8a0
--- /dev/null
+++ b/src/runtime/internal/sys/arch_s390x.go
@@ -0,0 +1,16 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+const (
+ ArchFamily = S390X
+ BigEndian = true
+ DefaultPhysPageSize = 4096
+ PCQuantum = 2
+ Int64Align = 8
+ MinFrameSize = 8
+)
+
+type Uintreg uint64
diff --git a/src/runtime/internal/sys/arch_wasm.go b/src/runtime/internal/sys/arch_wasm.go
new file mode 100644
index 0000000..eb825df
--- /dev/null
+++ b/src/runtime/internal/sys/arch_wasm.go
@@ -0,0 +1,16 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+const (
+ ArchFamily = WASM
+ BigEndian = false
+ DefaultPhysPageSize = 65536
+ PCQuantum = 1
+ Int64Align = 8
+ MinFrameSize = 0
+)
+
+type Uintreg uint64
diff --git a/src/runtime/internal/sys/gengoos.go b/src/runtime/internal/sys/gengoos.go
new file mode 100644
index 0000000..9bbc48d
--- /dev/null
+++ b/src/runtime/internal/sys/gengoos.go
@@ -0,0 +1,98 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "log"
+ "os"
+ "strconv"
+ "strings"
+)
+
+var gooses, goarches []string
+
+func main() {
+ data, err := os.ReadFile("../../../go/build/syslist.go")
+ if err != nil {
+ log.Fatal(err)
+ }
+ const (
+ goosPrefix = `const goosList = `
+ goarchPrefix = `const goarchList = `
+ )
+ for _, line := range strings.Split(string(data), "\n") {
+ if strings.HasPrefix(line, goosPrefix) {
+ text, err := strconv.Unquote(strings.TrimPrefix(line, goosPrefix))
+ if err != nil {
+ log.Fatalf("parsing goosList: %v", err)
+ }
+ gooses = strings.Fields(text)
+ }
+ if strings.HasPrefix(line, goarchPrefix) {
+ text, err := strconv.Unquote(strings.TrimPrefix(line, goarchPrefix))
+ if err != nil {
+ log.Fatalf("parsing goarchList: %v", err)
+ }
+ goarches = strings.Fields(text)
+ }
+ }
+
+ for _, target := range gooses {
+ if target == "nacl" {
+ continue
+ }
+ var buf bytes.Buffer
+ fmt.Fprintf(&buf, "// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.\n\n")
+ if target == "linux" {
+ fmt.Fprintf(&buf, "// +build !android\n") // must explicitly exclude android for linux
+ }
+ if target == "solaris" {
+ fmt.Fprintf(&buf, "// +build !illumos\n") // must explicitly exclude illumos for solaris
+ }
+ if target == "darwin" {
+ fmt.Fprintf(&buf, "// +build !ios\n") // must explicitly exclude ios for darwin
+ }
+ fmt.Fprintf(&buf, "// +build %s\n\n", target) // must explicitly include target for bootstrapping purposes
+ fmt.Fprintf(&buf, "package sys\n\n")
+ fmt.Fprintf(&buf, "const GOOS = `%s`\n\n", target)
+ for _, goos := range gooses {
+ value := 0
+ if goos == target {
+ value = 1
+ }
+ fmt.Fprintf(&buf, "const Goos%s = %d\n", strings.Title(goos), value)
+ }
+ err := os.WriteFile("zgoos_"+target+".go", buf.Bytes(), 0666)
+ if err != nil {
+ log.Fatal(err)
+ }
+ }
+
+ for _, target := range goarches {
+ if target == "amd64p32" {
+ continue
+ }
+ var buf bytes.Buffer
+ fmt.Fprintf(&buf, "// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.\n\n")
+ fmt.Fprintf(&buf, "// +build %s\n\n", target) // must explicitly include target for bootstrapping purposes
+ fmt.Fprintf(&buf, "package sys\n\n")
+ fmt.Fprintf(&buf, "const GOARCH = `%s`\n\n", target)
+ for _, goarch := range goarches {
+ value := 0
+ if goarch == target {
+ value = 1
+ }
+ fmt.Fprintf(&buf, "const Goarch%s = %d\n", strings.Title(goarch), value)
+ }
+ err := os.WriteFile("zgoarch_"+target+".go", buf.Bytes(), 0666)
+ if err != nil {
+ log.Fatal(err)
+ }
+ }
+}
diff --git a/src/runtime/internal/sys/intrinsics.go b/src/runtime/internal/sys/intrinsics.go
new file mode 100644
index 0000000..3c88982
--- /dev/null
+++ b/src/runtime/internal/sys/intrinsics.go
@@ -0,0 +1,91 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !386
+
+// TODO finish intrinsifying 386, deadcode the assembly, remove build tags, merge w/ intrinsics_common
+// TODO replace all uses of CtzXX with TrailingZerosXX; they are the same.
+
+package sys
+
+// Using techniques from http://supertech.csail.mit.edu/papers/debruijn.pdf
+
+const deBruijn64ctz = 0x0218a392cd3d5dbf
+
+var deBruijnIdx64ctz = [64]byte{
+ 0, 1, 2, 7, 3, 13, 8, 19,
+ 4, 25, 14, 28, 9, 34, 20, 40,
+ 5, 17, 26, 38, 15, 46, 29, 48,
+ 10, 31, 35, 54, 21, 50, 41, 57,
+ 63, 6, 12, 18, 24, 27, 33, 39,
+ 16, 37, 45, 47, 30, 53, 49, 56,
+ 62, 11, 23, 32, 36, 44, 52, 55,
+ 61, 22, 43, 51, 60, 42, 59, 58,
+}
+
+const deBruijn32ctz = 0x04653adf
+
+var deBruijnIdx32ctz = [32]byte{
+ 0, 1, 2, 6, 3, 11, 7, 16,
+ 4, 14, 12, 21, 8, 23, 17, 26,
+ 31, 5, 10, 15, 13, 20, 22, 25,
+ 30, 9, 19, 24, 29, 18, 28, 27,
+}
+
+// Ctz64 counts trailing (low-order) zeroes,
+// and if all are zero, then 64.
+func Ctz64(x uint64) int {
+ x &= -x // isolate low-order bit
+ y := x * deBruijn64ctz >> 58 // extract part of deBruijn sequence
+ i := int(deBruijnIdx64ctz[y]) // convert to bit index
+ z := int((x - 1) >> 57 & 64) // adjustment if zero
+ return i + z
+}
+
+// Ctz32 counts trailing (low-order) zeroes,
+// and if all are zero, then 32.
+func Ctz32(x uint32) int {
+ x &= -x // isolate low-order bit
+ y := x * deBruijn32ctz >> 27 // extract part of deBruijn sequence
+ i := int(deBruijnIdx32ctz[y]) // convert to bit index
+ z := int((x - 1) >> 26 & 32) // adjustment if zero
+ return i + z
+}
+
+// Ctz8 returns the number of trailing zero bits in x; the result is 8 for x == 0.
+func Ctz8(x uint8) int {
+ return int(ntz8tab[x])
+}
+
+// Bswap64 returns its input with byte order reversed
+// 0x0102030405060708 -> 0x0807060504030201
+func Bswap64(x uint64) uint64 {
+ c8 := uint64(0x00ff00ff00ff00ff)
+ a := x >> 8 & c8
+ b := (x & c8) << 8
+ x = a | b
+ c16 := uint64(0x0000ffff0000ffff)
+ a = x >> 16 & c16
+ b = (x & c16) << 16
+ x = a | b
+ c32 := uint64(0x00000000ffffffff)
+ a = x >> 32 & c32
+ b = (x & c32) << 32
+ x = a | b
+ return x
+}
+
+// Bswap32 returns its input with byte order reversed
+// 0x01020304 -> 0x04030201
+func Bswap32(x uint32) uint32 {
+ c8 := uint32(0x00ff00ff)
+ a := x >> 8 & c8
+ b := (x & c8) << 8
+ x = a | b
+ c16 := uint32(0x0000ffff)
+ a = x >> 16 & c16
+ b = (x & c16) << 16
+ x = a | b
+ return x
+}
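
A worked example of the de Bruijn trick used by Ctz64/Ctz32 above: isolating the lowest set bit gives a power of two, and multiplying the de Bruijn constant by it left-shifts the constant so that a unique 6-bit window lands in the top bits, which then indexes the precomputed table. The sketch below re-derives Ctz64(8) and checks it against math/bits; the index table is internal to this package, so only the window computation is repeated here.

package main

import (
	"fmt"
	"math/bits"
)

func main() {
	const deBruijn64ctz = 0x0218a392cd3d5dbf // same constant as above
	x := uint64(8)                           // 0b1000: three trailing zeros
	lsb := x & -x                            // isolate the lowest set bit
	window := lsb * deBruijn64ctz >> 58      // unique 6-bit window
	fmt.Println(window)                      // 4, and deBruijnIdx64ctz[4] == 3
	fmt.Println(bits.TrailingZeros64(x))     // 3
}
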
diff --git a/src/runtime/internal/sys/intrinsics_386.s b/src/runtime/internal/sys/intrinsics_386.s
new file mode 100644
index 0000000..784b246
--- /dev/null
+++ b/src/runtime/internal/sys/intrinsics_386.s
@@ -0,0 +1,58 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT runtime∕internal∕sys·Ctz64(SB), NOSPLIT, $0-12
+ // Try low 32 bits.
+ MOVL x_lo+0(FP), AX
+ BSFL AX, AX
+ JZ tryhigh
+ MOVL AX, ret+8(FP)
+ RET
+
+tryhigh:
+ // Try high 32 bits.
+ MOVL x_hi+4(FP), AX
+ BSFL AX, AX
+ JZ none
+ ADDL $32, AX
+ MOVL AX, ret+8(FP)
+ RET
+
+none:
+ // No bits are set.
+ MOVL $64, ret+8(FP)
+ RET
+
+TEXT runtime∕internal∕sys·Ctz32(SB), NOSPLIT, $0-8
+ MOVL x+0(FP), AX
+ BSFL AX, AX
+ JNZ 2(PC)
+ MOVL $32, AX
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime∕internal∕sys·Ctz8(SB), NOSPLIT, $0-8
+ MOVBLZX x+0(FP), AX
+ BSFL AX, AX
+ JNZ 2(PC)
+ MOVL $8, AX
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime∕internal∕sys·Bswap64(SB), NOSPLIT, $0-16
+ MOVL x_lo+0(FP), AX
+ MOVL x_hi+4(FP), BX
+ BSWAPL AX
+ BSWAPL BX
+ MOVL BX, ret_lo+8(FP)
+ MOVL AX, ret_hi+12(FP)
+ RET
+
+TEXT runtime∕internal∕sys·Bswap32(SB), NOSPLIT, $0-8
+ MOVL x+0(FP), AX
+ BSWAPL AX
+ MOVL AX, ret+4(FP)
+ RET
diff --git a/src/runtime/internal/sys/intrinsics_common.go b/src/runtime/internal/sys/intrinsics_common.go
new file mode 100644
index 0000000..818d75e
--- /dev/null
+++ b/src/runtime/internal/sys/intrinsics_common.go
@@ -0,0 +1,143 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+// Copied from math/bits to avoid dependence.
+
+var len8tab = [256]uint8{
+ 0x00, 0x01, 0x02, 0x02, 0x03, 0x03, 0x03, 0x03, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
+ 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
+ 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
+ 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
+ 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
+ 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
+ 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
+ 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
+ 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
+ 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
+ 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
+ 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
+ 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
+ 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
+ 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
+ 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
+}
+
+var ntz8tab = [256]uint8{
+ 0x08, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x05, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x06, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x05, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x07, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x05, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x06, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x05, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00,
+}
+
+// Len64 returns the minimum number of bits required to represent x; the result is 0 for x == 0.
+func Len64(x uint64) (n int) {
+ if x >= 1<<32 {
+ x >>= 32
+ n = 32
+ }
+ if x >= 1<<16 {
+ x >>= 16
+ n += 16
+ }
+ if x >= 1<<8 {
+ x >>= 8
+ n += 8
+ }
+ return n + int(len8tab[x])
+}
+
+// --- OnesCount ---
+
+const m0 = 0x5555555555555555 // 01010101 ...
+const m1 = 0x3333333333333333 // 00110011 ...
+const m2 = 0x0f0f0f0f0f0f0f0f // 00001111 ...
+
+// OnesCount64 returns the number of one bits ("population count") in x.
+func OnesCount64(x uint64) int {
+ // Implementation: Parallel summing of adjacent bits.
+ // See "Hacker's Delight", Chap. 5: Counting Bits.
+ // The following pattern shows the general approach:
+ //
+ // x = x>>1&(m0&m) + x&(m0&m)
+ // x = x>>2&(m1&m) + x&(m1&m)
+ // x = x>>4&(m2&m) + x&(m2&m)
+ // x = x>>8&(m3&m) + x&(m3&m)
+ // x = x>>16&(m4&m) + x&(m4&m)
+ // x = x>>32&(m5&m) + x&(m5&m)
+ // return int(x)
+ //
+ // Masking (& operations) can be left away when there's no
+ // danger that a field's sum will carry over into the next
+ // field: Since the result cannot be > 64, 8 bits is enough
+ // and we can ignore the masks for the shifts by 8 and up.
+ // Per "Hacker's Delight", the first line can be simplified
+ // more, but it saves at best one instruction, so we leave
+ // it alone for clarity.
+ const m = 1<<64 - 1
+ x = x>>1&(m0&m) + x&(m0&m)
+ x = x>>2&(m1&m) + x&(m1&m)
+ x = (x>>4 + x) & (m2 & m)
+ x += x >> 8
+ x += x >> 16
+ x += x >> 32
+ return int(x) & (1<<7 - 1)
+}
+
+var deBruijn64tab = [64]byte{
+ 0, 1, 56, 2, 57, 49, 28, 3, 61, 58, 42, 50, 38, 29, 17, 4,
+ 62, 47, 59, 36, 45, 43, 51, 22, 53, 39, 33, 30, 24, 18, 12, 5,
+ 63, 55, 48, 27, 60, 41, 37, 16, 46, 35, 44, 21, 52, 32, 23, 11,
+ 54, 26, 40, 15, 34, 20, 31, 10, 25, 14, 19, 9, 13, 8, 7, 6,
+}
+
+const deBruijn64 = 0x03f79d71b4ca8b09
+
+// TrailingZeros64 returns the number of trailing zero bits in x; the result is 64 for x == 0.
+func TrailingZeros64(x uint64) int {
+ if x == 0 {
+ return 64
+ }
+ // If popcount is fast, replace code below with return popcount(^x & (x - 1)).
+ //
+ // x & -x leaves only the right-most bit set in the word. Let k be the
+ // index of that bit. Since only a single bit is set, the value is two
+ // to the power of k. Multiplying by a power of two is equivalent to
+ // left shifting, in this case by k bits. The de Bruijn (64 bit) constant
+ // is such that all six bit, consecutive substrings are distinct.
+ // Therefore, if we have a left shifted version of this constant we can
+ // find by how many bits it was shifted by looking at which six bit
+ // substring ended up at the top of the word.
+ // (Knuth, volume 4, section 7.3.1)
+ return int(deBruijn64tab[(x&-x)*deBruijn64>>(64-6)])
+}
+
+// LeadingZeros64 returns the number of leading zero bits in x; the result is 64 for x == 0.
+func LeadingZeros64(x uint64) int { return 64 - Len64(x) }
+
+// LeadingZeros8 returns the number of leading zero bits in x; the result is 8 for x == 0.
+func LeadingZeros8(x uint8) int { return 8 - Len8(x) }
+
+// TrailingZeros8 returns the number of trailing zero bits in x; the result is 8 for x == 0.
+func TrailingZeros8(x uint8) int {
+ return int(ntz8tab[x])
+}
+
+// Len8 returns the minimum number of bits required to represent x; the result is 0 for x == 0.
+func Len8(x uint8) int {
+ return int(len8tab[x])
+}
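
To make the parallel bit-summing described in OnesCount64 above concrete: the first fold replaces every 2-bit pair with the number of set bits in that pair, and the later folds merge adjacent fields. A tiny illustration of the first step, checked against math/bits:

package main

import (
	"fmt"
	"math/bits"
)

func main() {
	const m0 = 0x5555555555555555 // 0101... as in OnesCount64 above
	x := uint64(0b1101)
	pairs := x>>1&m0 + x&m0
	fmt.Printf("%04b\n", pairs)      // 1001: pair counts 10 (two bits) and 01 (one bit)
	fmt.Println(bits.OnesCount64(x)) // 3
}
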
diff --git a/src/runtime/internal/sys/intrinsics_stubs.go b/src/runtime/internal/sys/intrinsics_stubs.go
new file mode 100644
index 0000000..9cbf482
--- /dev/null
+++ b/src/runtime/internal/sys/intrinsics_stubs.go
@@ -0,0 +1,13 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build 386
+
+package sys
+
+func Ctz64(x uint64) int
+func Ctz32(x uint32) int
+func Ctz8(x uint8) int
+func Bswap64(x uint64) uint64
+func Bswap32(x uint32) uint32
diff --git a/src/runtime/internal/sys/intrinsics_test.go b/src/runtime/internal/sys/intrinsics_test.go
new file mode 100644
index 0000000..0444183
--- /dev/null
+++ b/src/runtime/internal/sys/intrinsics_test.go
@@ -0,0 +1,38 @@
+package sys_test
+
+import (
+ "runtime/internal/sys"
+ "testing"
+)
+
+func TestCtz64(t *testing.T) {
+ for i := 0; i <= 64; i++ {
+ x := uint64(5) << uint(i)
+ if got := sys.Ctz64(x); got != i {
+ t.Errorf("Ctz64(%d)=%d, want %d", x, got, i)
+ }
+ }
+}
+func TestCtz32(t *testing.T) {
+ for i := 0; i <= 32; i++ {
+ x := uint32(5) << uint(i)
+ if got := sys.Ctz32(x); got != i {
+ t.Errorf("Ctz32(%d)=%d, want %d", x, got, i)
+ }
+ }
+}
+
+func TestBswap64(t *testing.T) {
+ x := uint64(0x1122334455667788)
+ y := sys.Bswap64(x)
+ if y != 0x8877665544332211 {
+ t.Errorf("Bswap(%x)=%x, want 0x8877665544332211", x, y)
+ }
+}
+func TestBswap32(t *testing.T) {
+ x := uint32(0x11223344)
+ y := sys.Bswap32(x)
+ if y != 0x44332211 {
+ t.Errorf("Bswap(%x)=%x, want 0x44332211", x, y)
+ }
+}
diff --git a/src/runtime/internal/sys/stubs.go b/src/runtime/internal/sys/stubs.go
new file mode 100644
index 0000000..10b0173
--- /dev/null
+++ b/src/runtime/internal/sys/stubs.go
@@ -0,0 +1,16 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+// Declarations for runtime services implemented in C or assembly.
+
+const PtrSize = 4 << (^uintptr(0) >> 63) // unsafe.Sizeof(uintptr(0)) but an ideal const
+const RegSize = 4 << (^Uintreg(0) >> 63) // unsafe.Sizeof(uintreg(0)) but an ideal const
+const SpAlign = 1*(1-GoarchArm64) + 16*GoarchArm64 // SP alignment: 1 normally, 16 for ARM64
+
+var DefaultGoroot string // set at link time
+
+// AIX requires a larger stack for syscalls.
+const StackGuardMultiplier = StackGuardMultiplierDefault*(1-GoosAix) + 2*GoosAix
diff --git a/src/runtime/internal/sys/sys.go b/src/runtime/internal/sys/sys.go
new file mode 100644
index 0000000..9d9ac45
--- /dev/null
+++ b/src/runtime/internal/sys/sys.go
@@ -0,0 +1,15 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package sys contains system-, configuration-, and architecture-specific
+// constants used by the runtime.
+package sys
+
+// The next line makes 'go generate' write the zgo*.go files with
+// per-OS and per-arch information, including constants
+// named Goos$GOOS and Goarch$GOARCH for every
+// known GOOS and GOARCH. The constant is 1 on the
+// current system, 0 otherwise; multiplying by them is
+// useful for defining GOOS- or GOARCH-specific constants.
+//go:generate go run gengoos.go
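
The generated Goos*/Goarch* constants described above are 1 on the current platform and 0 everywhere else, so platform-specific values can be chosen with plain constant arithmetic instead of build tags; SpAlign and StackGuardMultiplier in stubs.go earlier in this patch use exactly this idiom. A minimal sketch with an illustrative constant name:

package sys

// exampleAlign is illustrative only: it evaluates to 16 when GOARCH=arm64
// (GoarchArm64 == 1) and to 8 on every other architecture (GoarchArm64 == 0).
const exampleAlign = 8*(1-GoarchArm64) + 16*GoarchArm64
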
diff --git a/src/runtime/internal/sys/zgoarch_386.go b/src/runtime/internal/sys/zgoarch_386.go
new file mode 100644
index 0000000..c286d0d
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_386.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build 386
+
+package sys
+
+const GOARCH = `386`
+
+const Goarch386 = 1
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_amd64.go b/src/runtime/internal/sys/zgoarch_amd64.go
new file mode 100644
index 0000000..d21c1d7
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_amd64.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build amd64
+
+package sys
+
+const GOARCH = `amd64`
+
+const Goarch386 = 0
+const GoarchAmd64 = 1
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_arm.go b/src/runtime/internal/sys/zgoarch_arm.go
new file mode 100644
index 0000000..9085fb0
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_arm.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build arm
+
+package sys
+
+const GOARCH = `arm`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 1
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_arm64.go b/src/runtime/internal/sys/zgoarch_arm64.go
new file mode 100644
index 0000000..ed7ef2e
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_arm64.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build arm64
+
+package sys
+
+const GOARCH = `arm64`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 1
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_arm64be.go b/src/runtime/internal/sys/zgoarch_arm64be.go
new file mode 100644
index 0000000..faf3111
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_arm64be.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build arm64be
+
+package sys
+
+const GOARCH = `arm64be`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 1
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_armbe.go b/src/runtime/internal/sys/zgoarch_armbe.go
new file mode 100644
index 0000000..cb28301
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_armbe.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build armbe
+
+package sys
+
+const GOARCH = `armbe`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 1
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_mips.go b/src/runtime/internal/sys/zgoarch_mips.go
new file mode 100644
index 0000000..315dea1
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_mips.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build mips
+
+package sys
+
+const GOARCH = `mips`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 1
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_mips64.go b/src/runtime/internal/sys/zgoarch_mips64.go
new file mode 100644
index 0000000..5258cbf
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_mips64.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build mips64
+
+package sys
+
+const GOARCH = `mips64`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 1
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_mips64le.go b/src/runtime/internal/sys/zgoarch_mips64le.go
new file mode 100644
index 0000000..1721698
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_mips64le.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build mips64le
+
+package sys
+
+const GOARCH = `mips64le`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 1
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_mips64p32.go b/src/runtime/internal/sys/zgoarch_mips64p32.go
new file mode 100644
index 0000000..44c4624
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_mips64p32.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build mips64p32
+
+package sys
+
+const GOARCH = `mips64p32`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 1
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_mips64p32le.go b/src/runtime/internal/sys/zgoarch_mips64p32le.go
new file mode 100644
index 0000000..eb63225
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_mips64p32le.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build mips64p32le
+
+package sys
+
+const GOARCH = `mips64p32le`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 1
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_mipsle.go b/src/runtime/internal/sys/zgoarch_mipsle.go
new file mode 100644
index 0000000..e0ebfbf
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_mipsle.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build mipsle
+
+package sys
+
+const GOARCH = `mipsle`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 1
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_ppc.go b/src/runtime/internal/sys/zgoarch_ppc.go
new file mode 100644
index 0000000..ef26aa3
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_ppc.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build ppc
+
+package sys
+
+const GOARCH = `ppc`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 1
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_ppc64.go b/src/runtime/internal/sys/zgoarch_ppc64.go
new file mode 100644
index 0000000..32c2d46
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_ppc64.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build ppc64
+
+package sys
+
+const GOARCH = `ppc64`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 1
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_ppc64le.go b/src/runtime/internal/sys/zgoarch_ppc64le.go
new file mode 100644
index 0000000..3a6e567
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_ppc64le.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build ppc64le
+
+package sys
+
+const GOARCH = `ppc64le`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 1
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_riscv.go b/src/runtime/internal/sys/zgoarch_riscv.go
new file mode 100644
index 0000000..d8f6b49
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_riscv.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build riscv
+
+package sys
+
+const GOARCH = `riscv`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 1
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_riscv64.go b/src/runtime/internal/sys/zgoarch_riscv64.go
new file mode 100644
index 0000000..0ba843b
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_riscv64.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build riscv64
+
+package sys
+
+const GOARCH = `riscv64`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 1
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_s390.go b/src/runtime/internal/sys/zgoarch_s390.go
new file mode 100644
index 0000000..20a1b23
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_s390.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build s390
+
+package sys
+
+const GOARCH = `s390`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 1
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_s390x.go b/src/runtime/internal/sys/zgoarch_s390x.go
new file mode 100644
index 0000000..ffdda0c
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_s390x.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build s390x
+
+package sys
+
+const GOARCH = `s390x`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 1
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_sparc.go b/src/runtime/internal/sys/zgoarch_sparc.go
new file mode 100644
index 0000000..b494951
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_sparc.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build sparc
+
+package sys
+
+const GOARCH = `sparc`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 1
+const GoarchSparc64 = 0
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_sparc64.go b/src/runtime/internal/sys/zgoarch_sparc64.go
new file mode 100644
index 0000000..0f6df41
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_sparc64.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build sparc64
+
+package sys
+
+const GOARCH = `sparc64`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 1
+const GoarchWasm = 0
diff --git a/src/runtime/internal/sys/zgoarch_wasm.go b/src/runtime/internal/sys/zgoarch_wasm.go
new file mode 100644
index 0000000..e69afb0
--- /dev/null
+++ b/src/runtime/internal/sys/zgoarch_wasm.go
@@ -0,0 +1,31 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build wasm
+
+package sys
+
+const GOARCH = `wasm`
+
+const Goarch386 = 0
+const GoarchAmd64 = 0
+const GoarchAmd64p32 = 0
+const GoarchArm = 0
+const GoarchArmbe = 0
+const GoarchArm64 = 0
+const GoarchArm64be = 0
+const GoarchPpc64 = 0
+const GoarchPpc64le = 0
+const GoarchMips = 0
+const GoarchMipsle = 0
+const GoarchMips64 = 0
+const GoarchMips64le = 0
+const GoarchMips64p32 = 0
+const GoarchMips64p32le = 0
+const GoarchPpc = 0
+const GoarchRiscv = 0
+const GoarchRiscv64 = 0
+const GoarchS390 = 0
+const GoarchS390x = 0
+const GoarchSparc = 0
+const GoarchSparc64 = 0
+const GoarchWasm = 1
diff --git a/src/runtime/internal/sys/zgoos_aix.go b/src/runtime/internal/sys/zgoos_aix.go
new file mode 100644
index 0000000..0631d02
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_aix.go
@@ -0,0 +1,25 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build aix
+
+package sys
+
+const GOOS = `aix`
+
+const GoosAix = 1
+const GoosAndroid = 0
+const GoosDarwin = 0
+const GoosDragonfly = 0
+const GoosFreebsd = 0
+const GoosHurd = 0
+const GoosIllumos = 0
+const GoosIos = 0
+const GoosJs = 0
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 0
+const GoosPlan9 = 0
+const GoosSolaris = 0
+const GoosWindows = 0
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_android.go b/src/runtime/internal/sys/zgoos_android.go
new file mode 100644
index 0000000..d356a40
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_android.go
@@ -0,0 +1,25 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build android
+
+package sys
+
+const GOOS = `android`
+
+const GoosAix = 0
+const GoosAndroid = 1
+const GoosDarwin = 0
+const GoosDragonfly = 0
+const GoosFreebsd = 0
+const GoosHurd = 0
+const GoosIllumos = 0
+const GoosIos = 0
+const GoosJs = 0
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 0
+const GoosPlan9 = 0
+const GoosSolaris = 0
+const GoosWindows = 0
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_darwin.go b/src/runtime/internal/sys/zgoos_darwin.go
new file mode 100644
index 0000000..6aa2db7
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_darwin.go
@@ -0,0 +1,26 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build !ios
+// +build darwin
+
+package sys
+
+const GOOS = `darwin`
+
+const GoosAix = 0
+const GoosAndroid = 0
+const GoosDarwin = 1
+const GoosDragonfly = 0
+const GoosFreebsd = 0
+const GoosHurd = 0
+const GoosIllumos = 0
+const GoosIos = 0
+const GoosJs = 0
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 0
+const GoosPlan9 = 0
+const GoosSolaris = 0
+const GoosWindows = 0
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_dragonfly.go b/src/runtime/internal/sys/zgoos_dragonfly.go
new file mode 100644
index 0000000..88ee117
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_dragonfly.go
@@ -0,0 +1,25 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build dragonfly
+
+package sys
+
+const GOOS = `dragonfly`
+
+const GoosAix = 0
+const GoosAndroid = 0
+const GoosDarwin = 0
+const GoosDragonfly = 1
+const GoosFreebsd = 0
+const GoosHurd = 0
+const GoosIllumos = 0
+const GoosIos = 0
+const GoosJs = 0
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 0
+const GoosPlan9 = 0
+const GoosSolaris = 0
+const GoosWindows = 0
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_freebsd.go b/src/runtime/internal/sys/zgoos_freebsd.go
new file mode 100644
index 0000000..8de2ec0
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_freebsd.go
@@ -0,0 +1,25 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build freebsd
+
+package sys
+
+const GOOS = `freebsd`
+
+const GoosAix = 0
+const GoosAndroid = 0
+const GoosDarwin = 0
+const GoosDragonfly = 0
+const GoosFreebsd = 1
+const GoosHurd = 0
+const GoosIllumos = 0
+const GoosIos = 0
+const GoosJs = 0
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 0
+const GoosPlan9 = 0
+const GoosSolaris = 0
+const GoosWindows = 0
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_hurd.go b/src/runtime/internal/sys/zgoos_hurd.go
new file mode 100644
index 0000000..183ccb0
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_hurd.go
@@ -0,0 +1,25 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build hurd
+
+package sys
+
+const GOOS = `hurd`
+
+const GoosAix = 0
+const GoosAndroid = 0
+const GoosDarwin = 0
+const GoosDragonfly = 0
+const GoosFreebsd = 0
+const GoosHurd = 1
+const GoosIllumos = 0
+const GoosIos = 0
+const GoosJs = 0
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 0
+const GoosPlan9 = 0
+const GoosSolaris = 0
+const GoosWindows = 0
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_illumos.go b/src/runtime/internal/sys/zgoos_illumos.go
new file mode 100644
index 0000000..d04134e
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_illumos.go
@@ -0,0 +1,25 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build illumos
+
+package sys
+
+const GOOS = `illumos`
+
+const GoosAix = 0
+const GoosAndroid = 0
+const GoosDarwin = 0
+const GoosDragonfly = 0
+const GoosFreebsd = 0
+const GoosHurd = 0
+const GoosIllumos = 1
+const GoosIos = 0
+const GoosJs = 0
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 0
+const GoosPlan9 = 0
+const GoosSolaris = 0
+const GoosWindows = 0
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_ios.go b/src/runtime/internal/sys/zgoos_ios.go
new file mode 100644
index 0000000..cf6e9d6
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_ios.go
@@ -0,0 +1,25 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build ios
+
+package sys
+
+const GOOS = `ios`
+
+const GoosAix = 0
+const GoosAndroid = 0
+const GoosDarwin = 0
+const GoosDragonfly = 0
+const GoosFreebsd = 0
+const GoosHurd = 0
+const GoosIllumos = 0
+const GoosIos = 1
+const GoosJs = 0
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 0
+const GoosPlan9 = 0
+const GoosSolaris = 0
+const GoosWindows = 0
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_js.go b/src/runtime/internal/sys/zgoos_js.go
new file mode 100644
index 0000000..1d9279a
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_js.go
@@ -0,0 +1,25 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build js
+
+package sys
+
+const GOOS = `js`
+
+const GoosAix = 0
+const GoosAndroid = 0
+const GoosDarwin = 0
+const GoosDragonfly = 0
+const GoosFreebsd = 0
+const GoosHurd = 0
+const GoosIllumos = 0
+const GoosIos = 0
+const GoosJs = 1
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 0
+const GoosPlan9 = 0
+const GoosSolaris = 0
+const GoosWindows = 0
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_linux.go b/src/runtime/internal/sys/zgoos_linux.go
new file mode 100644
index 0000000..0f718d7
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_linux.go
@@ -0,0 +1,26 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build !android
+// +build linux
+
+package sys
+
+const GOOS = `linux`
+
+const GoosAix = 0
+const GoosAndroid = 0
+const GoosDarwin = 0
+const GoosDragonfly = 0
+const GoosFreebsd = 0
+const GoosHurd = 0
+const GoosIllumos = 0
+const GoosIos = 0
+const GoosJs = 0
+const GoosLinux = 1
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 0
+const GoosPlan9 = 0
+const GoosSolaris = 0
+const GoosWindows = 0
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_netbsd.go b/src/runtime/internal/sys/zgoos_netbsd.go
new file mode 100644
index 0000000..2ae149f
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_netbsd.go
@@ -0,0 +1,25 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build netbsd
+
+package sys
+
+const GOOS = `netbsd`
+
+const GoosAix = 0
+const GoosAndroid = 0
+const GoosDarwin = 0
+const GoosDragonfly = 0
+const GoosFreebsd = 0
+const GoosHurd = 0
+const GoosIllumos = 0
+const GoosIos = 0
+const GoosJs = 0
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 1
+const GoosOpenbsd = 0
+const GoosPlan9 = 0
+const GoosSolaris = 0
+const GoosWindows = 0
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_openbsd.go b/src/runtime/internal/sys/zgoos_openbsd.go
new file mode 100644
index 0000000..7d4d61e
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_openbsd.go
@@ -0,0 +1,25 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build openbsd
+
+package sys
+
+const GOOS = `openbsd`
+
+const GoosAix = 0
+const GoosAndroid = 0
+const GoosDarwin = 0
+const GoosDragonfly = 0
+const GoosFreebsd = 0
+const GoosHurd = 0
+const GoosIllumos = 0
+const GoosIos = 0
+const GoosJs = 0
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 1
+const GoosPlan9 = 0
+const GoosSolaris = 0
+const GoosWindows = 0
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_plan9.go b/src/runtime/internal/sys/zgoos_plan9.go
new file mode 100644
index 0000000..30aec46
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_plan9.go
@@ -0,0 +1,25 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build plan9
+
+package sys
+
+const GOOS = `plan9`
+
+const GoosAix = 0
+const GoosAndroid = 0
+const GoosDarwin = 0
+const GoosDragonfly = 0
+const GoosFreebsd = 0
+const GoosHurd = 0
+const GoosIllumos = 0
+const GoosIos = 0
+const GoosJs = 0
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 0
+const GoosPlan9 = 1
+const GoosSolaris = 0
+const GoosWindows = 0
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_solaris.go b/src/runtime/internal/sys/zgoos_solaris.go
new file mode 100644
index 0000000..4bb8c99
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_solaris.go
@@ -0,0 +1,26 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build !illumos
+// +build solaris
+
+package sys
+
+const GOOS = `solaris`
+
+const GoosAix = 0
+const GoosAndroid = 0
+const GoosDarwin = 0
+const GoosDragonfly = 0
+const GoosFreebsd = 0
+const GoosHurd = 0
+const GoosIllumos = 0
+const GoosIos = 0
+const GoosJs = 0
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 0
+const GoosPlan9 = 0
+const GoosSolaris = 1
+const GoosWindows = 0
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_windows.go b/src/runtime/internal/sys/zgoos_windows.go
new file mode 100644
index 0000000..d1f4290
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_windows.go
@@ -0,0 +1,25 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build windows
+
+package sys
+
+const GOOS = `windows`
+
+const GoosAix = 0
+const GoosAndroid = 0
+const GoosDarwin = 0
+const GoosDragonfly = 0
+const GoosFreebsd = 0
+const GoosHurd = 0
+const GoosIllumos = 0
+const GoosIos = 0
+const GoosJs = 0
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 0
+const GoosPlan9 = 0
+const GoosSolaris = 0
+const GoosWindows = 1
+const GoosZos = 0
diff --git a/src/runtime/internal/sys/zgoos_zos.go b/src/runtime/internal/sys/zgoos_zos.go
new file mode 100644
index 0000000..d22be46
--- /dev/null
+++ b/src/runtime/internal/sys/zgoos_zos.go
@@ -0,0 +1,25 @@
+// Code generated by gengoos.go using 'go generate'. DO NOT EDIT.
+
+// +build zos
+
+package sys
+
+const GOOS = `zos`
+
+const GoosAix = 0
+const GoosAndroid = 0
+const GoosDarwin = 0
+const GoosDragonfly = 0
+const GoosFreebsd = 0
+const GoosHurd = 0
+const GoosIllumos = 0
+const GoosIos = 0
+const GoosJs = 0
+const GoosLinux = 0
+const GoosNacl = 0
+const GoosNetbsd = 0
+const GoosOpenbsd = 0
+const GoosPlan9 = 0
+const GoosSolaris = 0
+const GoosWindows = 0
+const GoosZos = 1
diff --git a/src/runtime/lfstack.go b/src/runtime/lfstack.go
new file mode 100644
index 0000000..406561a
--- /dev/null
+++ b/src/runtime/lfstack.go
@@ -0,0 +1,67 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Lock-free stack.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// lfstack is the head of a lock-free stack.
+//
+// The zero value of lfstack is an empty list.
+//
+// This stack is intrusive. Nodes must embed lfnode as the first field.
+//
+// The stack does not keep GC-visible pointers to nodes, so the caller
+// is responsible for ensuring the nodes are not garbage collected
+// (typically by allocating them from manually-managed memory).
+type lfstack uint64
+
+func (head *lfstack) push(node *lfnode) {
+ node.pushcnt++
+ new := lfstackPack(node, node.pushcnt)
+ if node1 := lfstackUnpack(new); node1 != node {
+ print("runtime: lfstack.push invalid packing: node=", node, " cnt=", hex(node.pushcnt), " packed=", hex(new), " -> node=", node1, "\n")
+ throw("lfstack.push")
+ }
+ for {
+ old := atomic.Load64((*uint64)(head))
+ node.next = old
+ if atomic.Cas64((*uint64)(head), old, new) {
+ break
+ }
+ }
+}
+
+func (head *lfstack) pop() unsafe.Pointer {
+ for {
+ old := atomic.Load64((*uint64)(head))
+ if old == 0 {
+ return nil
+ }
+ node := lfstackUnpack(old)
+ next := atomic.Load64(&node.next)
+ if atomic.Cas64((*uint64)(head), old, next) {
+ return unsafe.Pointer(node)
+ }
+ }
+}
+
+func (head *lfstack) empty() bool {
+ return atomic.Load64((*uint64)(head)) == 0
+}
+
+// lfnodeValidate panics if node is not a valid address for use with
+// lfstack.push. This only needs to be called when node is allocated.
+func lfnodeValidate(node *lfnode) {
+ if lfstackUnpack(lfstackPack(node, ^uintptr(0))) != node {
+ printlock()
+ println("runtime: bad lfnode address", hex(uintptr(unsafe.Pointer(node))))
+ throw("bad lfnode address")
+ }
+}
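
A self-contained sketch of the same push/pop CAS loops, written against a fixed arena of nodes so it runs outside the runtime. The head word packs a node identity together with a counter, standing in for the pointer packing done by lfstackPack below; bumping the counter on every push is what defends the single-word CAS against ABA reuse. This illustrates the design only, it is not the runtime's implementation:

package main

import (
	"fmt"
	"sync/atomic"
)

// node plays the role of a type embedding lfnode: next holds the packed
// head value of the node below it on the stack.
type node struct {
	next uint64
	val  int64
}

// A small fixed arena stands in for manually managed memory; nodes are
// identified by index+1 so that a packed value of 0 still means "empty".
var arena [16]node

func pack(idx, cnt uint64) uint64 { return idx<<32 | cnt&0xffffffff }
func unpackIdx(v uint64) uint64   { return v >> 32 }

func push(head *uint64, idx uint64, cnt *uint64) {
	*cnt++ // bump the counter so a reused slot never repacks to an old value
	packed := pack(idx+1, *cnt)
	for {
		old := atomic.LoadUint64(head)
		atomic.StoreUint64(&arena[idx].next, old)
		if atomic.CompareAndSwapUint64(head, old, packed) {
			return
		}
	}
}

func pop(head *uint64) *node {
	for {
		old := atomic.LoadUint64(head)
		if old == 0 {
			return nil
		}
		n := &arena[unpackIdx(old)-1]
		next := atomic.LoadUint64(&n.next)
		if atomic.CompareAndSwapUint64(head, old, next) {
			return n
		}
	}
}

func main() {
	var head, cnt uint64
	arena[0].val, arena[1].val = 1, 2
	push(&head, 0, &cnt)
	push(&head, 1, &cnt)
	fmt.Println(pop(&head).val, pop(&head).val, pop(&head)) // 2 1 <nil>
}
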
diff --git a/src/runtime/lfstack_32bit.go b/src/runtime/lfstack_32bit.go
new file mode 100644
index 0000000..f07ff1c
--- /dev/null
+++ b/src/runtime/lfstack_32bit.go
@@ -0,0 +1,19 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build 386 arm mips mipsle
+
+package runtime
+
+import "unsafe"
+
+// On 32-bit systems, the stored uint64 has a 32-bit pointer and 32-bit count.
+
+func lfstackPack(node *lfnode, cnt uintptr) uint64 {
+ return uint64(uintptr(unsafe.Pointer(node)))<<32 | uint64(cnt)
+}
+
+func lfstackUnpack(val uint64) *lfnode {
+ return (*lfnode)(unsafe.Pointer(uintptr(val >> 32)))
+}
diff --git a/src/runtime/lfstack_64bit.go b/src/runtime/lfstack_64bit.go
new file mode 100644
index 0000000..9d821b9
--- /dev/null
+++ b/src/runtime/lfstack_64bit.go
@@ -0,0 +1,58 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build amd64 arm64 mips64 mips64le ppc64 ppc64le riscv64 s390x wasm
+
+package runtime
+
+import "unsafe"
+
+const (
+ // addrBits is the number of bits needed to represent a virtual address.
+ //
+ // See heapAddrBits for a table of address space sizes on
+ // various architectures. 48 bits is enough for all
+ // architectures except s390x.
+ //
+ // On AMD64, virtual addresses are 48-bit (or 57-bit) numbers sign extended to 64.
+ // We shift the address left 16 to eliminate the sign extended part and make
+ // room in the bottom for the count.
+ //
+ // On s390x, virtual addresses are 64-bit. There's not much we
+ // can do about this, so we just hope that the kernel doesn't
+ // get to really high addresses and panic if it does.
+ addrBits = 48
+
+ // In addition to the 16 bits taken from the top, we can take 3 from the
+ // bottom, because node must be pointer-aligned, giving a total of 19 bits
+ // of count.
+ cntBits = 64 - addrBits + 3
+
+ // On AIX, 64-bit addresses are split into 36-bit segment number and 28-bit
+ // offset in segment. Segment numbers in the range 0x0A0000000-0x0AFFFFFFF(LSA)
+ // are available for mmap.
+ // We assume all lfnode addresses are from memory allocated with mmap.
+ // We use one bit to distinguish between the two ranges.
+ aixAddrBits = 57
+ aixCntBits = 64 - aixAddrBits + 3
+)
+
+func lfstackPack(node *lfnode, cnt uintptr) uint64 {
+ if GOARCH == "ppc64" && GOOS == "aix" {
+ return uint64(uintptr(unsafe.Pointer(node)))<<(64-aixAddrBits) | uint64(cnt&(1<<aixCntBits-1))
+ }
+ return uint64(uintptr(unsafe.Pointer(node)))<<(64-addrBits) | uint64(cnt&(1<<cntBits-1))
+}
+
+func lfstackUnpack(val uint64) *lfnode {
+ if GOARCH == "amd64" {
+ // amd64 systems can place the stack above the VA hole, so we need to sign extend
+ // val before unpacking.
+ return (*lfnode)(unsafe.Pointer(uintptr(int64(val) >> cntBits << 3)))
+ }
+ if GOARCH == "ppc64" && GOOS == "aix" {
+ return (*lfnode)(unsafe.Pointer(uintptr((val >> aixCntBits << 3) | 0xa<<56)))
+ }
+ return (*lfnode)(unsafe.Pointer(uintptr(val >> cntBits << 3)))
+}
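
A worked example of the 48-bit packing above for the common case (not amd64 and not aix/ppc64). The address value is made up, but note that its low three bits are zero, which is what lets unpacking shift them back in:

package main

import "fmt"

const (
	addrBits = 48
	cntBits  = 64 - addrBits + 3 // 19 bits of counter
)

// pack and unpack mirror lfstackPack/lfstackUnpack for the plain case:
// shift the address up 16 bits and keep the low 19 bits for the count;
// unpacking shifts the count out and restores the 3 alignment bits.
func pack(addr, cnt uint64) uint64 { return addr<<(64-addrBits) | cnt&(1<<cntBits-1) }

func unpack(val uint64) (addr, cnt uint64) {
	return val >> cntBits << 3, val & (1<<cntBits - 1)
}

func main() {
	addr := uint64(0x0000_7f12_3456_78c0) // fits in 48 bits, 8-byte aligned
	packed := pack(addr, 5)
	a, c := unpack(packed)
	fmt.Printf("%#x -> %#x, count %d\n", packed, a, c) // a == addr, c == 5
}
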
diff --git a/src/runtime/lfstack_test.go b/src/runtime/lfstack_test.go
new file mode 100644
index 0000000..fb4b459
--- /dev/null
+++ b/src/runtime/lfstack_test.go
@@ -0,0 +1,140 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "math/rand"
+ . "runtime"
+ "testing"
+ "unsafe"
+)
+
+type MyNode struct {
+ LFNode
+ data int
+}
+
+func fromMyNode(node *MyNode) *LFNode {
+ return (*LFNode)(unsafe.Pointer(node))
+}
+
+func toMyNode(node *LFNode) *MyNode {
+ return (*MyNode)(unsafe.Pointer(node))
+}
+
+var global interface{}
+
+func TestLFStack(t *testing.T) {
+ stack := new(uint64)
+ global = stack // force heap allocation
+
+ // Need to keep additional references to nodes; the stack is not all that type-safe.
+ var nodes []*MyNode
+
+ // Check the stack is initially empty.
+ if LFStackPop(stack) != nil {
+ t.Fatalf("stack is not empty")
+ }
+
+ // Push one element.
+ node := &MyNode{data: 42}
+ nodes = append(nodes, node)
+ LFStackPush(stack, fromMyNode(node))
+
+ // Push another.
+ node = &MyNode{data: 43}
+ nodes = append(nodes, node)
+ LFStackPush(stack, fromMyNode(node))
+
+ // Pop one element.
+ node = toMyNode(LFStackPop(stack))
+ if node == nil {
+ t.Fatalf("stack is empty")
+ }
+ if node.data != 43 {
+ t.Fatalf("no lifo")
+ }
+
+ // Pop another.
+ node = toMyNode(LFStackPop(stack))
+ if node == nil {
+ t.Fatalf("stack is empty")
+ }
+ if node.data != 42 {
+ t.Fatalf("no lifo")
+ }
+
+ // Check the stack is empty again.
+ if LFStackPop(stack) != nil {
+ t.Fatalf("stack is not empty")
+ }
+ if *stack != 0 {
+ t.Fatalf("stack is not empty")
+ }
+}
+
+var stress []*MyNode
+
+func TestLFStackStress(t *testing.T) {
+ const K = 100
+ P := 4 * GOMAXPROCS(-1)
+ N := 100000
+ if testing.Short() {
+ N /= 10
+ }
+ // Create 2 stacks.
+ stacks := [2]*uint64{new(uint64), new(uint64)}
+ // Need to keep additional references to nodes;
+ // the lock-free stack is not type-safe.
+ stress = nil
+ // Push K elements randomly onto the stacks.
+ sum := 0
+ for i := 0; i < K; i++ {
+ sum += i
+ node := &MyNode{data: i}
+ stress = append(stress, node)
+ LFStackPush(stacks[i%2], fromMyNode(node))
+ }
+ c := make(chan bool, P)
+ for p := 0; p < P; p++ {
+ go func() {
+ r := rand.New(rand.NewSource(rand.Int63()))
+ // Pop a node from a random stack, then push it onto a random stack.
+ for i := 0; i < N; i++ {
+ node := toMyNode(LFStackPop(stacks[r.Intn(2)]))
+ if node != nil {
+ LFStackPush(stacks[r.Intn(2)], fromMyNode(node))
+ }
+ }
+ c <- true
+ }()
+ }
+ for i := 0; i < P; i++ {
+ <-c
+ }
+ // Pop all elements from both stacks, and verify that nothing was lost.
+ sum2 := 0
+ cnt := 0
+ for i := 0; i < 2; i++ {
+ for {
+ node := toMyNode(LFStackPop(stacks[i]))
+ if node == nil {
+ break
+ }
+ cnt++
+ sum2 += node.data
+ node.Next = 0
+ }
+ }
+ if cnt != K {
+ t.Fatalf("Wrong number of nodes %d/%d", cnt, K)
+ }
+ if sum2 != sum {
+ t.Fatalf("Wrong sum %d/%d", sum2, sum)
+ }
+
+ // Let nodes be collected now.
+ stress = nil
+}
diff --git a/src/runtime/libfuzzer.go b/src/runtime/libfuzzer.go
new file mode 100644
index 0000000..0161955
--- /dev/null
+++ b/src/runtime/libfuzzer.go
@@ -0,0 +1,75 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build libfuzzer
+
+package runtime
+
+import _ "unsafe" // for go:linkname
+
+func libfuzzerCall(fn *byte, arg0, arg1 uintptr)
+
+func libfuzzerTraceCmp1(arg0, arg1 uint8) {
+ libfuzzerCall(&__sanitizer_cov_trace_cmp1, uintptr(arg0), uintptr(arg1))
+}
+
+func libfuzzerTraceCmp2(arg0, arg1 uint16) {
+ libfuzzerCall(&__sanitizer_cov_trace_cmp2, uintptr(arg0), uintptr(arg1))
+}
+
+func libfuzzerTraceCmp4(arg0, arg1 uint32) {
+ libfuzzerCall(&__sanitizer_cov_trace_cmp4, uintptr(arg0), uintptr(arg1))
+}
+
+func libfuzzerTraceCmp8(arg0, arg1 uint64) {
+ libfuzzerCall(&__sanitizer_cov_trace_cmp8, uintptr(arg0), uintptr(arg1))
+}
+
+func libfuzzerTraceConstCmp1(arg0, arg1 uint8) {
+ libfuzzerCall(&__sanitizer_cov_trace_const_cmp1, uintptr(arg0), uintptr(arg1))
+}
+
+func libfuzzerTraceConstCmp2(arg0, arg1 uint16) {
+ libfuzzerCall(&__sanitizer_cov_trace_const_cmp2, uintptr(arg0), uintptr(arg1))
+}
+
+func libfuzzerTraceConstCmp4(arg0, arg1 uint32) {
+ libfuzzerCall(&__sanitizer_cov_trace_const_cmp4, uintptr(arg0), uintptr(arg1))
+}
+
+func libfuzzerTraceConstCmp8(arg0, arg1 uint64) {
+ libfuzzerCall(&__sanitizer_cov_trace_const_cmp8, uintptr(arg0), uintptr(arg1))
+}
+
+//go:linkname __sanitizer_cov_trace_cmp1 __sanitizer_cov_trace_cmp1
+//go:cgo_import_static __sanitizer_cov_trace_cmp1
+var __sanitizer_cov_trace_cmp1 byte
+
+//go:linkname __sanitizer_cov_trace_cmp2 __sanitizer_cov_trace_cmp2
+//go:cgo_import_static __sanitizer_cov_trace_cmp2
+var __sanitizer_cov_trace_cmp2 byte
+
+//go:linkname __sanitizer_cov_trace_cmp4 __sanitizer_cov_trace_cmp4
+//go:cgo_import_static __sanitizer_cov_trace_cmp4
+var __sanitizer_cov_trace_cmp4 byte
+
+//go:linkname __sanitizer_cov_trace_cmp8 __sanitizer_cov_trace_cmp8
+//go:cgo_import_static __sanitizer_cov_trace_cmp8
+var __sanitizer_cov_trace_cmp8 byte
+
+//go:linkname __sanitizer_cov_trace_const_cmp1 __sanitizer_cov_trace_const_cmp1
+//go:cgo_import_static __sanitizer_cov_trace_const_cmp1
+var __sanitizer_cov_trace_const_cmp1 byte
+
+//go:linkname __sanitizer_cov_trace_const_cmp2 __sanitizer_cov_trace_const_cmp2
+//go:cgo_import_static __sanitizer_cov_trace_const_cmp2
+var __sanitizer_cov_trace_const_cmp2 byte
+
+//go:linkname __sanitizer_cov_trace_const_cmp4 __sanitizer_cov_trace_const_cmp4
+//go:cgo_import_static __sanitizer_cov_trace_const_cmp4
+var __sanitizer_cov_trace_const_cmp4 byte
+
+//go:linkname __sanitizer_cov_trace_const_cmp8 __sanitizer_cov_trace_const_cmp8
+//go:cgo_import_static __sanitizer_cov_trace_const_cmp8
+var __sanitizer_cov_trace_const_cmp8 byte
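
For context, a hedged sketch of what hooks of this shape are for: with libFuzzer instrumentation enabled, comparison operands are routed to trace-cmp hooks so the fuzzer can steer its inputs toward flipping the comparison. The stand-in below is runnable on its own and only imitates the call pattern; it is not the real instrumentation, and traceCmp8 is a made-up name:

package main

import "fmt"

// traceCmp8 imitates the role of libfuzzerTraceCmp8: report both operands
// of a comparison to whoever is observing (here, just stdout).
func traceCmp8(a, b uint64) { fmt.Printf("cmp8 %d %d\n", a, b) }

func check(secret, guess uint64) bool {
	// Under instrumentation a call like this is inserted automatically
	// before the comparison; here it is written out by hand.
	traceCmp8(secret, guess)
	return secret == guess
}

func main() {
	fmt.Println(check(42, 7))
}
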
diff --git a/src/runtime/libfuzzer_amd64.s b/src/runtime/libfuzzer_amd64.s
new file mode 100644
index 0000000..890fde3
--- /dev/null
+++ b/src/runtime/libfuzzer_amd64.s
@@ -0,0 +1,42 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build libfuzzer
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// Based on race_amd64.s; see commentary there.
+
+#ifdef GOOS_windows
+#define RARG0 CX
+#define RARG1 DX
+#else
+#define RARG0 DI
+#define RARG1 SI
+#endif
+
+// void runtime·libfuzzerCall(fn, arg0, arg1 uintptr)
+// Calls C function fn from libFuzzer and passes 2 arguments to it.
+TEXT runtime·libfuzzerCall(SB), NOSPLIT, $0-24
+ MOVQ fn+0(FP), AX
+ MOVQ arg0+8(FP), RARG0
+ MOVQ arg1+16(FP), RARG1
+
+ get_tls(R12)
+ MOVQ g(R12), R14
+ MOVQ g_m(R14), R13
+
+ // Switch to g0 stack.
+ MOVQ SP, R12 // callee-saved, preserved across the CALL
+ MOVQ m_g0(R13), R10
+ CMPQ R10, R14
+ JE call // already on g0
+ MOVQ (g_sched+gobuf_sp)(R10), SP
+call:
+ ANDQ $~15, SP // alignment for gcc ABI
+ CALL AX
+ MOVQ R12, SP
+ RET
diff --git a/src/runtime/libfuzzer_arm64.s b/src/runtime/libfuzzer_arm64.s
new file mode 100644
index 0000000..121673e
--- /dev/null
+++ b/src/runtime/libfuzzer_arm64.s
@@ -0,0 +1,31 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build libfuzzer
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// Based on race_arm64.s; see commentary there.
+
+// func runtime·libfuzzerCall(fn, arg0, arg1 uintptr)
+// Calls C function fn from libFuzzer and passes 2 arguments to it.
+TEXT runtime·libfuzzerCall(SB), NOSPLIT, $0-24
+ MOVD fn+0(FP), R9
+ MOVD arg0+8(FP), R0
+ MOVD arg1+16(FP), R1
+
+ MOVD g_m(g), R10
+
+ // Switch to g0 stack.
+ MOVD RSP, R19 // callee-saved, preserved across the CALL
+ MOVD m_g0(R10), R11
+ CMP R11, g
+ BEQ call // already on g0
+ MOVD (g_sched+gobuf_sp)(R11), R12
+ MOVD R12, RSP
+call:
+ BL R9
+ MOVD R19, RSP
+ RET
diff --git a/src/runtime/lock_futex.go b/src/runtime/lock_futex.go
new file mode 100644
index 0000000..91467fd
--- /dev/null
+++ b/src/runtime/lock_futex.go
@@ -0,0 +1,245 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build dragonfly freebsd linux
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// This implementation depends on OS-specific implementations of
+//
+// futexsleep(addr *uint32, val uint32, ns int64)
+// Atomically,
+// if *addr == val { sleep }
+// Might be woken up spuriously; that's allowed.
+// Don't sleep longer than ns; ns < 0 means forever.
+//
+// futexwakeup(addr *uint32, cnt uint32)
+// If any procs are sleeping on addr, wake up at most cnt.
+
+const (
+ mutex_unlocked = 0
+ mutex_locked = 1
+ mutex_sleeping = 2
+
+ active_spin = 4
+ active_spin_cnt = 30
+ passive_spin = 1
+)
+
+// Possible lock states are mutex_unlocked, mutex_locked and mutex_sleeping.
+// mutex_sleeping means that there is presumably at least one sleeping thread.
+// Note that there can be spinning threads during all states - they do not
+// affect mutex's state.
+
+// We use the uintptr mutex.key and note.key as a uint32.
+//go:nosplit
+func key32(p *uintptr) *uint32 {
+ return (*uint32)(unsafe.Pointer(p))
+}
+
+func lock(l *mutex) {
+ lockWithRank(l, getLockRank(l))
+}
+
+func lock2(l *mutex) {
+ gp := getg()
+
+ if gp.m.locks < 0 {
+ throw("runtime·lock: lock count")
+ }
+ gp.m.locks++
+
+ // Speculative grab for lock.
+ v := atomic.Xchg(key32(&l.key), mutex_locked)
+ if v == mutex_unlocked {
+ return
+ }
+
+ // wait is either MUTEX_LOCKED or MUTEX_SLEEPING
+ // depending on whether there is a thread sleeping
+ // on this mutex. If we ever change l->key from
+ // MUTEX_SLEEPING to some other value, we must be
+ // careful to change it back to MUTEX_SLEEPING before
+ // returning, to ensure that the sleeping thread gets
+ // its wakeup call.
+ wait := v
+
+ // On uniprocessors, no point spinning.
+ // On multiprocessors, spin for ACTIVE_SPIN attempts.
+ spin := 0
+ if ncpu > 1 {
+ spin = active_spin
+ }
+ for {
+ // Try for lock, spinning.
+ for i := 0; i < spin; i++ {
+ for l.key == mutex_unlocked {
+ if atomic.Cas(key32(&l.key), mutex_unlocked, wait) {
+ return
+ }
+ }
+ procyield(active_spin_cnt)
+ }
+
+ // Try for lock, rescheduling.
+ for i := 0; i < passive_spin; i++ {
+ for l.key == mutex_unlocked {
+ if atomic.Cas(key32(&l.key), mutex_unlocked, wait) {
+ return
+ }
+ }
+ osyield()
+ }
+
+ // Sleep.
+ v = atomic.Xchg(key32(&l.key), mutex_sleeping)
+ if v == mutex_unlocked {
+ return
+ }
+ wait = mutex_sleeping
+ futexsleep(key32(&l.key), mutex_sleeping, -1)
+ }
+}
+
+func unlock(l *mutex) {
+ unlockWithRank(l)
+}
+
+func unlock2(l *mutex) {
+ v := atomic.Xchg(key32(&l.key), mutex_unlocked)
+ if v == mutex_unlocked {
+ throw("unlock of unlocked lock")
+ }
+ if v == mutex_sleeping {
+ futexwakeup(key32(&l.key), 1)
+ }
+
+ gp := getg()
+ gp.m.locks--
+ if gp.m.locks < 0 {
+ throw("runtime·unlock: lock count")
+ }
+ if gp.m.locks == 0 && gp.preempt { // restore the preemption request in case we've cleared it in newstack
+ gp.stackguard0 = stackPreempt
+ }
+}
+
+// One-time notifications.
+func noteclear(n *note) {
+ n.key = 0
+}
+
+func notewakeup(n *note) {
+ old := atomic.Xchg(key32(&n.key), 1)
+ if old != 0 {
+ print("notewakeup - double wakeup (", old, ")\n")
+ throw("notewakeup - double wakeup")
+ }
+ futexwakeup(key32(&n.key), 1)
+}
+
+func notesleep(n *note) {
+ gp := getg()
+ if gp != gp.m.g0 {
+ throw("notesleep not on g0")
+ }
+ ns := int64(-1)
+ if *cgo_yield != nil {
+ // Sleep for an arbitrary-but-moderate interval to poll libc interceptors.
+ ns = 10e6
+ }
+ for atomic.Load(key32(&n.key)) == 0 {
+ gp.m.blocked = true
+ futexsleep(key32(&n.key), 0, ns)
+ if *cgo_yield != nil {
+ asmcgocall(*cgo_yield, nil)
+ }
+ gp.m.blocked = false
+ }
+}
+
+// May run with m.p==nil if called from notetsleep, so write barriers
+// are not allowed.
+//
+//go:nosplit
+//go:nowritebarrier
+func notetsleep_internal(n *note, ns int64) bool {
+ gp := getg()
+
+ if ns < 0 {
+ if *cgo_yield != nil {
+ // Sleep for an arbitrary-but-moderate interval to poll libc interceptors.
+ ns = 10e6
+ }
+ for atomic.Load(key32(&n.key)) == 0 {
+ gp.m.blocked = true
+ futexsleep(key32(&n.key), 0, ns)
+ if *cgo_yield != nil {
+ asmcgocall(*cgo_yield, nil)
+ }
+ gp.m.blocked = false
+ }
+ return true
+ }
+
+ if atomic.Load(key32(&n.key)) != 0 {
+ return true
+ }
+
+ deadline := nanotime() + ns
+ for {
+ if *cgo_yield != nil && ns > 10e6 {
+ ns = 10e6
+ }
+ gp.m.blocked = true
+ futexsleep(key32(&n.key), 0, ns)
+ if *cgo_yield != nil {
+ asmcgocall(*cgo_yield, nil)
+ }
+ gp.m.blocked = false
+ if atomic.Load(key32(&n.key)) != 0 {
+ break
+ }
+ now := nanotime()
+ if now >= deadline {
+ break
+ }
+ ns = deadline - now
+ }
+ return atomic.Load(key32(&n.key)) != 0
+}
+
+func notetsleep(n *note, ns int64) bool {
+ gp := getg()
+ if gp != gp.m.g0 && gp.m.preemptoff != "" {
+ throw("notetsleep not on g0")
+ }
+
+ return notetsleep_internal(n, ns)
+}
+
+// same as runtime·notetsleep, but called on user g (not g0)
+// calls only nosplit functions between entersyscallblock/exitsyscall
+func notetsleepg(n *note, ns int64) bool {
+ gp := getg()
+ if gp == gp.m.g0 {
+ throw("notetsleepg on g0")
+ }
+
+ entersyscallblock()
+ ok := notetsleep_internal(n, ns)
+ exitsyscall()
+ return ok
+}
+
+func beforeIdle(int64) (*g, bool) {
+ return nil, false
+}
+
+func checkTimeouts() {}
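
A stand-alone, Linux-only sketch of the same three-state futex mutex, without the spinning phases of lock2 above. It calls the futex syscall directly; the FUTEX_WAIT/FUTEX_WAKE constants and the small demo in main are assumptions added for the example, everything else follows the state machine documented in this file:

//go:build linux

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"syscall"
	"unsafe"
)

const (
	unlocked = 0
	locked   = 1
	sleeping = 2

	futexWait = 0 // FUTEX_WAIT
	futexWake = 1 // FUTEX_WAKE
)

// futex wraps the raw syscall; errors (e.g. EAGAIN when *addr != val)
// are ignored because the callers always re-check the lock word.
func futex(addr *uint32, op, val int) {
	syscall.Syscall6(syscall.SYS_FUTEX, uintptr(unsafe.Pointer(addr)), uintptr(op), uintptr(val), 0, 0, 0)
}

type mutex struct{ key uint32 }

func (m *mutex) lock() {
	// Speculative grab, as in lock2.
	if atomic.SwapUint32(&m.key, locked) == unlocked {
		return
	}
	// Contended: advertise a (possible) sleeper and sleep until a swap
	// observes the lock as free.
	for atomic.SwapUint32(&m.key, sleeping) != unlocked {
		futex(&m.key, futexWait, sleeping)
	}
}

func (m *mutex) unlock() {
	switch atomic.SwapUint32(&m.key, unlocked) {
	case unlocked:
		panic("unlock of unlocked lock")
	case sleeping:
		futex(&m.key, futexWake, 1) // wake at most one sleeper
	}
}

func main() {
	var m mutex
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 10000; j++ {
				m.lock()
				counter++
				m.unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println(counter) // 40000
}
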
diff --git a/src/runtime/lock_js.go b/src/runtime/lock_js.go
new file mode 100644
index 0000000..14bdc76
--- /dev/null
+++ b/src/runtime/lock_js.go
@@ -0,0 +1,258 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build js,wasm
+
+package runtime
+
+import (
+ _ "unsafe"
+)
+
+// js/wasm has no support for threads yet. There is no preemption.
+
+const (
+ mutex_unlocked = 0
+ mutex_locked = 1
+
+ note_cleared = 0
+ note_woken = 1
+ note_timeout = 2
+
+ active_spin = 4
+ active_spin_cnt = 30
+ passive_spin = 1
+)
+
+func lock(l *mutex) {
+ lockWithRank(l, getLockRank(l))
+}
+
+func lock2(l *mutex) {
+ if l.key == mutex_locked {
+ // js/wasm is single-threaded so we should never
+ // observe this.
+ throw("self deadlock")
+ }
+ gp := getg()
+ if gp.m.locks < 0 {
+ throw("lock count")
+ }
+ gp.m.locks++
+ l.key = mutex_locked
+}
+
+func unlock(l *mutex) {
+ unlockWithRank(l)
+}
+
+func unlock2(l *mutex) {
+ if l.key == mutex_unlocked {
+ throw("unlock of unlocked lock")
+ }
+ gp := getg()
+ gp.m.locks--
+ if gp.m.locks < 0 {
+ throw("lock count")
+ }
+ l.key = mutex_unlocked
+}
+
+// One-time notifications.
+
+type noteWithTimeout struct {
+ gp *g
+ deadline int64
+}
+
+var (
+ notes = make(map[*note]*g)
+ notesWithTimeout = make(map[*note]noteWithTimeout)
+)
+
+func noteclear(n *note) {
+ n.key = note_cleared
+}
+
+func notewakeup(n *note) {
+ // gp := getg()
+ if n.key == note_woken {
+ throw("notewakeup - double wakeup")
+ }
+ cleared := n.key == note_cleared
+ n.key = note_woken
+ if cleared {
+ goready(notes[n], 1)
+ }
+}
+
+func notesleep(n *note) {
+ throw("notesleep not supported by js")
+}
+
+func notetsleep(n *note, ns int64) bool {
+ throw("notetsleep not supported by js")
+ return false
+}
+
+// same as runtime·notetsleep, but called on user g (not g0)
+func notetsleepg(n *note, ns int64) bool {
+ gp := getg()
+ if gp == gp.m.g0 {
+ throw("notetsleepg on g0")
+ }
+
+ if ns >= 0 {
+ deadline := nanotime() + ns
+ delay := ns/1000000 + 1 // round up
+ if delay > 1<<31-1 {
+ delay = 1<<31 - 1 // cap to max int32
+ }
+
+ id := scheduleTimeoutEvent(delay)
+ mp := acquirem()
+ notes[n] = gp
+ notesWithTimeout[n] = noteWithTimeout{gp: gp, deadline: deadline}
+ releasem(mp)
+
+ gopark(nil, nil, waitReasonSleep, traceEvNone, 1)
+
+ clearTimeoutEvent(id) // note might have woken early, clear timeout
+ clearIdleID()
+
+ mp = acquirem()
+ delete(notes, n)
+ delete(notesWithTimeout, n)
+ releasem(mp)
+
+ return n.key == note_woken
+ }
+
+ for n.key != note_woken {
+ mp := acquirem()
+ notes[n] = gp
+ releasem(mp)
+
+ gopark(nil, nil, waitReasonZero, traceEvNone, 1)
+
+ mp = acquirem()
+ delete(notes, n)
+ releasem(mp)
+ }
+ return true
+}
+
+// checkTimeouts resumes goroutines that are waiting on a note which has reached its deadline.
+func checkTimeouts() {
+ now := nanotime()
+ for n, nt := range notesWithTimeout {
+ if n.key == note_cleared && now >= nt.deadline {
+ n.key = note_timeout
+ goready(nt.gp, 1)
+ }
+ }
+}
+
+// events is a stack of calls from JavaScript into Go.
+var events []*event
+
+type event struct {
+ // g was the active goroutine when the call from JavaScript occurred.
+ // It needs to be active when returning to JavaScript.
+ gp *g
+ // returned reports whether the event handler has returned.
+ // When all goroutines are idle and the event handler has returned,
+ // then g gets resumed and returns the execution to JavaScript.
+ returned bool
+}
+
+// The timeout event started by beforeIdle.
+var idleID int32
+
+// beforeIdle gets called by the scheduler if no goroutine is awake.
+// If we are not already handling an event, then we pause for an async event.
+// If an event handler returned, we resume it and it will pause the execution.
+// beforeIdle either returns the specific goroutine to schedule next or
+// indicates with otherReady that some goroutine became ready.
+func beforeIdle(delay int64) (gp *g, otherReady bool) {
+ if delay > 0 {
+ clearIdleID()
+ if delay < 1e6 {
+ delay = 1
+ } else if delay < 1e15 {
+ delay = delay / 1e6
+ } else {
+ // An arbitrary cap on how long to wait for a timer.
+ // 1e9 ms == ~11.5 days.
+ delay = 1e9
+ }
+ idleID = scheduleTimeoutEvent(delay)
+ }
+
+ if len(events) == 0 {
+ go handleAsyncEvent()
+ return nil, true
+ }
+
+ e := events[len(events)-1]
+ if e.returned {
+ return e.gp, false
+ }
+ return nil, false
+}
+
+func handleAsyncEvent() {
+ pause(getcallersp() - 16)
+}
+
+// clearIdleID clears our record of the timeout started by beforeIdle.
+func clearIdleID() {
+ if idleID != 0 {
+ clearTimeoutEvent(idleID)
+ idleID = 0
+ }
+}
+
+// pause sets SP to newsp and pauses the execution of Go's WebAssembly code until an event is triggered.
+func pause(newsp uintptr)
+
+// scheduleTimeoutEvent tells the WebAssembly environment to trigger an event after ms milliseconds.
+// It returns a timer id that can be used with clearTimeoutEvent.
+func scheduleTimeoutEvent(ms int64) int32
+
+// clearTimeoutEvent clears a timeout event scheduled by scheduleTimeoutEvent.
+func clearTimeoutEvent(id int32)
+
+// handleEvent gets invoked on a call from JavaScript into Go. It calls the event handler of the syscall/js package
+// and then parks the handler goroutine to allow other goroutines to run before giving execution back to JavaScript.
+// When no other goroutine is awake any more, beforeIdle resumes the handler goroutine. Now that the same goroutine
+// is running as was running when the call came in from JavaScript, execution can be safely passed back to JavaScript.
+func handleEvent() {
+ e := &event{
+ gp: getg(),
+ returned: false,
+ }
+ events = append(events, e)
+
+ eventHandler()
+
+ clearIdleID()
+
+ // wait until all goroutines are idle
+ e.returned = true
+ gopark(nil, nil, waitReasonZero, traceEvNone, 1)
+
+ events[len(events)-1] = nil
+ events = events[:len(events)-1]
+
+ // return execution to JavaScript
+ pause(getcallersp() - 16)
+}
+
+var eventHandler func()
+
+//go:linkname setEventHandler syscall/js.setEventHandler
+func setEventHandler(fn func()) {
+ eventHandler = fn
+}
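
The timeout paths above (notetsleepg and beforeIdle) both turn a nanosecond wait into the millisecond argument of scheduleTimeoutEvent. A tiny stand-alone version of notetsleepg's conversion, with the same rounding and int32 cap (the function name is made up):

package main

import "fmt"

// timeoutMillis mirrors the conversion in notetsleepg: always add one
// millisecond so a short wait is never rounded down to zero, and cap at
// the maximum int32, matching the cap in the code above.
func timeoutMillis(ns int64) int64 {
	delay := ns/1000000 + 1
	if delay > 1<<31-1 {
		delay = 1<<31 - 1
	}
	return delay
}

func main() {
	fmt.Println(timeoutMillis(1))         // 1
	fmt.Println(timeoutMillis(2_500_000)) // 3
	fmt.Println(timeoutMillis(1 << 62))   // 2147483647
}
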
diff --git a/src/runtime/lock_sema.go b/src/runtime/lock_sema.go
new file mode 100644
index 0000000..671e524
--- /dev/null
+++ b/src/runtime/lock_sema.go
@@ -0,0 +1,304 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix darwin netbsd openbsd plan9 solaris windows
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// This implementation depends on OS-specific implementations of
+//
+// func semacreate(mp *m)
+// Create a semaphore for mp, if it does not already have one.
+//
+// func semasleep(ns int64) int32
+// If ns < 0, acquire m's semaphore and return 0.
+// If ns >= 0, try to acquire m's semaphore for at most ns nanoseconds.
+// Return 0 if the semaphore was acquired, -1 if interrupted or timed out.
+//
+// func semawakeup(mp *m)
+// Wake up mp, which is or will soon be sleeping on its semaphore.
+//
+const (
+ locked uintptr = 1
+
+ active_spin = 4
+ active_spin_cnt = 30
+ passive_spin = 1
+)
+
+func lock(l *mutex) {
+ lockWithRank(l, getLockRank(l))
+}
+
+func lock2(l *mutex) {
+ gp := getg()
+ if gp.m.locks < 0 {
+ throw("runtime·lock: lock count")
+ }
+ gp.m.locks++
+
+ // Speculative grab for lock.
+ if atomic.Casuintptr(&l.key, 0, locked) {
+ return
+ }
+ semacreate(gp.m)
+
+ // On uniprocessors, no point spinning.
+ // On multiprocessors, spin for ACTIVE_SPIN attempts.
+ spin := 0
+ if ncpu > 1 {
+ spin = active_spin
+ }
+Loop:
+ for i := 0; ; i++ {
+ v := atomic.Loaduintptr(&l.key)
+ if v&locked == 0 {
+ // Unlocked. Try to lock.
+ if atomic.Casuintptr(&l.key, v, v|locked) {
+ return
+ }
+ i = 0
+ }
+ if i < spin {
+ procyield(active_spin_cnt)
+ } else if i < spin+passive_spin {
+ osyield()
+ } else {
+ // Someone else has it.
+ // l->waitm points to a linked list of M's waiting
+ // for this lock, chained through m->nextwaitm.
+ // Queue this M.
+ for {
+ gp.m.nextwaitm = muintptr(v &^ locked)
+ if atomic.Casuintptr(&l.key, v, uintptr(unsafe.Pointer(gp.m))|locked) {
+ break
+ }
+ v = atomic.Loaduintptr(&l.key)
+ if v&locked == 0 {
+ continue Loop
+ }
+ }
+ if v&locked != 0 {
+ // Queued. Wait.
+ semasleep(-1)
+ i = 0
+ }
+ }
+ }
+}
+
+func unlock(l *mutex) {
+ unlockWithRank(l)
+}
+
+//go:nowritebarrier
+// We might not be holding a p in this code.
+func unlock2(l *mutex) {
+ gp := getg()
+ var mp *m
+ for {
+ v := atomic.Loaduintptr(&l.key)
+ if v == locked {
+ if atomic.Casuintptr(&l.key, locked, 0) {
+ break
+ }
+ } else {
+ // Other M's are waiting for the lock.
+ // Dequeue an M.
+ mp = muintptr(v &^ locked).ptr()
+ if atomic.Casuintptr(&l.key, v, uintptr(mp.nextwaitm)) {
+ // Dequeued an M. Wake it.
+ semawakeup(mp)
+ break
+ }
+ }
+ }
+ gp.m.locks--
+ if gp.m.locks < 0 {
+ throw("runtime·unlock: lock count")
+ }
+ if gp.m.locks == 0 && gp.preempt { // restore the preemption request in case we've cleared it in newstack
+ gp.stackguard0 = stackPreempt
+ }
+}
+
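+// A minimal usage sketch for the one-time notifications defined below,
+// assuming a hypothetical helper doWork. notetsleepg is the variant that may
+// be called on a user goroutine; notesleep is the g0-only equivalent.
+//
+//	var done note
+//	noteclear(&done)
+//	go func() {
+//		doWork() // hypothetical
+//		notewakeup(&done)
+//	}()
+//	notetsleepg(&done, -1) // returns once notewakeup has been called
+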
+// One-time notifications.
+func noteclear(n *note) {
+ if GOOS == "aix" {
+ // On AIX, semaphores might not synchronize the memory in some
+ // rare cases. See issue #30189.
+ atomic.Storeuintptr(&n.key, 0)
+ } else {
+ n.key = 0
+ }
+}
+
+func notewakeup(n *note) {
+ var v uintptr
+ for {
+ v = atomic.Loaduintptr(&n.key)
+ if atomic.Casuintptr(&n.key, v, locked) {
+ break
+ }
+ }
+
+ // Successfully set waitm to locked.
+ // What was it before?
+ switch {
+ case v == 0:
+ // Nothing was waiting. Done.
+ case v == locked:
+ // Two notewakeups! Not allowed.
+ throw("notewakeup - double wakeup")
+ default:
+ // Must be the waiting m. Wake it up.
+ semawakeup((*m)(unsafe.Pointer(v)))
+ }
+}
+
+func notesleep(n *note) {
+ gp := getg()
+ if gp != gp.m.g0 {
+ throw("notesleep not on g0")
+ }
+ semacreate(gp.m)
+ if !atomic.Casuintptr(&n.key, 0, uintptr(unsafe.Pointer(gp.m))) {
+ // Must be locked (got wakeup).
+ if n.key != locked {
+ throw("notesleep - waitm out of sync")
+ }
+ return
+ }
+ // Queued. Sleep.
+ gp.m.blocked = true
+ if *cgo_yield == nil {
+ semasleep(-1)
+ } else {
+ // Sleep for an arbitrary-but-moderate interval to poll libc interceptors.
+ const ns = 10e6
+ for atomic.Loaduintptr(&n.key) == 0 {
+ semasleep(ns)
+ asmcgocall(*cgo_yield, nil)
+ }
+ }
+ gp.m.blocked = false
+}
+
+//go:nosplit
+func notetsleep_internal(n *note, ns int64, gp *g, deadline int64) bool {
+ // gp and deadline are logically local variables, but they are written
+ // as parameters so that the stack space they require is charged
+ // to the caller.
+ // This reduces the nosplit footprint of notetsleep_internal.
+ gp = getg()
+
+ // Register for wakeup on n->waitm.
+ if !atomic.Casuintptr(&n.key, 0, uintptr(unsafe.Pointer(gp.m))) {
+ // Must be locked (got wakeup).
+ if n.key != locked {
+ throw("notetsleep - waitm out of sync")
+ }
+ return true
+ }
+ if ns < 0 {
+ // Queued. Sleep.
+ gp.m.blocked = true
+ if *cgo_yield == nil {
+ semasleep(-1)
+ } else {
+ // Sleep in arbitrary-but-moderate intervals to poll libc interceptors.
+ const ns = 10e6
+ for semasleep(ns) < 0 {
+ asmcgocall(*cgo_yield, nil)
+ }
+ }
+ gp.m.blocked = false
+ return true
+ }
+
+ deadline = nanotime() + ns
+ for {
+ // Registered. Sleep.
+ gp.m.blocked = true
+ if *cgo_yield != nil && ns > 10e6 {
+ ns = 10e6
+ }
+ if semasleep(ns) >= 0 {
+ gp.m.blocked = false
+ // Acquired semaphore, semawakeup unregistered us.
+ // Done.
+ return true
+ }
+ if *cgo_yield != nil {
+ asmcgocall(*cgo_yield, nil)
+ }
+ gp.m.blocked = false
+ // Interrupted or timed out. Still registered. Semaphore not acquired.
+ ns = deadline - nanotime()
+ if ns <= 0 {
+ break
+ }
+ // Deadline hasn't arrived. Keep sleeping.
+ }
+
+ // Deadline arrived. Still registered. Semaphore not acquired.
+ // Want to give up and return, but have to unregister first,
+ // so that any notewakeup racing with the return does not
+ // try to grant us the semaphore when we don't expect it.
+ for {
+ v := atomic.Loaduintptr(&n.key)
+ switch v {
+ case uintptr(unsafe.Pointer(gp.m)):
+ // No wakeup yet; unregister if possible.
+ if atomic.Casuintptr(&n.key, v, 0) {
+ return false
+ }
+ case locked:
+ // Wakeup happened so semaphore is available.
+ // Grab it to avoid getting out of sync.
+ gp.m.blocked = true
+ if semasleep(-1) < 0 {
+ throw("runtime: unable to acquire - semaphore out of sync")
+ }
+ gp.m.blocked = false
+ return true
+ default:
+ throw("runtime: unexpected waitm - semaphore out of sync")
+ }
+ }
+}
+
+func notetsleep(n *note, ns int64) bool {
+ gp := getg()
+ if gp != gp.m.g0 {
+ throw("notetsleep not on g0")
+ }
+ semacreate(gp.m)
+ return notetsleep_internal(n, ns, nil, 0)
+}
+
+// same as runtime·notetsleep, but called on user g (not g0)
+// calls only nosplit functions between entersyscallblock/exitsyscall
+func notetsleepg(n *note, ns int64) bool {
+ gp := getg()
+ if gp == gp.m.g0 {
+ throw("notetsleepg on g0")
+ }
+ semacreate(gp.m)
+ entersyscallblock()
+ ok := notetsleep_internal(n, ns, nil, 0)
+ exitsyscall()
+ return ok
+}
+
+func beforeIdle(int64) (*g, bool) {
+ return nil, false
+}
+
+func checkTimeouts() {}
diff --git a/src/runtime/lockrank.go b/src/runtime/lockrank.go
new file mode 100644
index 0000000..b3c01ba
--- /dev/null
+++ b/src/runtime/lockrank.go
@@ -0,0 +1,248 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file records the static ranks of the locks in the runtime. If a lock
+// is not given a rank, then it is assumed to be a leaf lock, which means no other
+// lock can be acquired while it is held. Therefore, leaf locks do not need to be
+// given an explicit rank. We list all of the architecture-independent leaf locks
+// for documentation purposes, but don't list any of the architecture-dependent
+// locks (which are all leaf locks). debugLock is ignored for ranking, since it is used
+// when printing out lock ranking errors.
+//
+// lockInit(l *mutex, rank lockRank) is used to set the rank of a lock before it is used.
+// If there is no clear place to initialize a lock, then the rank of a lock can be
+// specified during the lock call itself via lockWithRank(l *mutex, rank lockRank).
+//
+// Besides the static lock ranking (which is a total ordering of the locks), we
+// also represent and enforce the actual partial order among the locks in the
+// arcs[] array below. That is, if it is possible that lock B can be acquired when
+// lock A is the previous acquired lock that is still held, then there should be
+// an entry for A in arcs[B][]. We will currently fail not only if the total order
+// (the lock ranking) is violated, but also if there is a missing entry in the
+// partial order.
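+//
+// A minimal sketch of how this is used elsewhere in the runtime (the
+// gcBitsArenas example appears in mallocinit later in this patch; only its
+// placement here is illustrative):
+//
+//	lockInit(&gcBitsArenas.lock, lockRankGcBitsArenas) // set the rank once, before first use
+//	lock(&gcBitsArenas.lock)                           // rank is checked against locks already held
+//
+// Reading the partial order: the entry
+//
+//	lockRankAllp: {lockRankSysmon, lockRankSched},
+//
+// means that only the sysmon or sched lock may be the most recently acquired
+// lock still held when allp is acquired.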
+
+package runtime
+
+type lockRank int
+
+// Constants representing the lock rank of the architecture-independent locks in
+// the runtime. Locks with lower rank must be taken before locks with higher
+// rank.
+const (
+ lockRankDummy lockRank = iota
+
+ // Locks held above sched
+ lockRankSysmon
+ lockRankScavenge
+ lockRankForcegc
+ lockRankSweepWaiters
+ lockRankAssistQueue
+ lockRankCpuprof
+ lockRankSweep
+
+ lockRankPollDesc
+ lockRankSched
+ lockRankDeadlock
+ lockRankPanic
+ lockRankAllg
+ lockRankAllp
+
+ lockRankTimers // Multiple timers locked simultaneously in destroy()
+ lockRankItab
+ lockRankReflectOffs
+ lockRankHchan // Multiple hchans acquired in lock order in syncadjustsudogs()
+ lockRankFin
+ lockRankNotifyList
+ lockRankTraceBuf
+ lockRankTraceStrings
+ lockRankMspanSpecial
+ lockRankProf
+ lockRankGcBitsArenas
+ lockRankRoot
+ lockRankTrace
+ lockRankTraceStackTab
+ lockRankNetpollInit
+
+ lockRankRwmutexW
+ lockRankRwmutexR
+
+ lockRankSpanSetSpine
+ lockRankGscan
+ lockRankStackpool
+ lockRankStackLarge
+ lockRankDefer
+ lockRankSudog
+
+ // Memory-related non-leaf locks
+ lockRankWbufSpans
+ lockRankMheap
+ lockRankMheapSpecial
+
+ // Memory-related leaf locks
+ lockRankGlobalAlloc
+
+ // Other leaf locks
+ lockRankGFree
+ // Generally, hchan must be acquired before gscan. But in one specific
+ // case (in syncadjustsudogs from markroot after the g has been suspended
+ // by suspendG), we allow gscan to be acquired, and then an hchan lock. To
+ // allow this case, we get this lockRankHchanLeaf rank in
+ // syncadjustsudogs(), rather than lockRankHchan. By using this special
+ // rank, we don't allow any further locks to be acquired other than more
+ // hchan locks.
+ lockRankHchanLeaf
+
+ // Leaf locks with no dependencies, so these constants are not actually used anywhere.
+ // There are other architecture-dependent leaf locks as well.
+ lockRankNewmHandoff
+ lockRankDebugPtrmask
+ lockRankFaketimeState
+ lockRankTicks
+ lockRankRaceFini
+ lockRankPollCache
+ lockRankDebug
+)
+
+// lockRankLeafRank is the rank of a lock that does not have a declared rank, and hence is
+// a leaf lock.
+const lockRankLeafRank lockRank = 1000
+
+// lockNames gives the names associated with each of the above ranks
+var lockNames = []string{
+ lockRankDummy: "",
+
+ lockRankSysmon: "sysmon",
+ lockRankScavenge: "scavenge",
+ lockRankForcegc: "forcegc",
+ lockRankSweepWaiters: "sweepWaiters",
+ lockRankAssistQueue: "assistQueue",
+ lockRankCpuprof: "cpuprof",
+ lockRankSweep: "sweep",
+
+ lockRankPollDesc: "pollDesc",
+ lockRankSched: "sched",
+ lockRankDeadlock: "deadlock",
+ lockRankPanic: "panic",
+ lockRankAllg: "allg",
+ lockRankAllp: "allp",
+
+ lockRankTimers: "timers",
+ lockRankItab: "itab",
+ lockRankReflectOffs: "reflectOffs",
+
+ lockRankHchan: "hchan",
+ lockRankFin: "fin",
+ lockRankNotifyList: "notifyList",
+ lockRankTraceBuf: "traceBuf",
+ lockRankTraceStrings: "traceStrings",
+ lockRankMspanSpecial: "mspanSpecial",
+ lockRankProf: "prof",
+ lockRankGcBitsArenas: "gcBitsArenas",
+ lockRankRoot: "root",
+ lockRankTrace: "trace",
+ lockRankTraceStackTab: "traceStackTab",
+ lockRankNetpollInit: "netpollInit",
+
+ lockRankRwmutexW: "rwmutexW",
+ lockRankRwmutexR: "rwmutexR",
+
+ lockRankSpanSetSpine: "spanSetSpine",
+ lockRankGscan: "gscan",
+ lockRankStackpool: "stackpool",
+ lockRankStackLarge: "stackLarge",
+ lockRankDefer: "defer",
+ lockRankSudog: "sudog",
+
+ lockRankWbufSpans: "wbufSpans",
+ lockRankMheap: "mheap",
+ lockRankMheapSpecial: "mheapSpecial",
+
+ lockRankGlobalAlloc: "globalAlloc.mutex",
+
+ lockRankGFree: "gFree",
+ lockRankHchanLeaf: "hchanLeaf",
+
+ lockRankNewmHandoff: "newmHandoff.lock",
+ lockRankDebugPtrmask: "debugPtrmask.lock",
+ lockRankFaketimeState: "faketimeState.lock",
+ lockRankTicks: "ticks.lock",
+ lockRankRaceFini: "raceFiniLock",
+ lockRankPollCache: "pollCache.lock",
+ lockRankDebug: "debugLock",
+}
+
+func (rank lockRank) String() string {
+ if rank == 0 {
+ return "UNKNOWN"
+ }
+ if rank == lockRankLeafRank {
+ return "LEAF"
+ }
+ return lockNames[rank]
+}
+
+// lockPartialOrder is a partial order among the various lock types, listing the
+// immediate ordering that has actually been observed in the runtime. Each entry
+// (which corresponds to a particular lock rank) specifies the list of locks
+// that can already be held immediately "above" it.
+//
+// So, for example, the lockRankSched entry shows that all the locks preceding
+// it in rank can actually be held. The allp lock shows that only the sysmon or
+// sched lock can be held immediately above it when it is acquired.
+var lockPartialOrder [][]lockRank = [][]lockRank{
+ lockRankDummy: {},
+ lockRankSysmon: {},
+ lockRankScavenge: {lockRankSysmon},
+ lockRankForcegc: {lockRankSysmon},
+ lockRankSweepWaiters: {},
+ lockRankAssistQueue: {},
+ lockRankCpuprof: {},
+ lockRankSweep: {},
+ lockRankPollDesc: {},
+ lockRankSched: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankCpuprof, lockRankSweep, lockRankPollDesc},
+ lockRankDeadlock: {lockRankDeadlock},
+ lockRankPanic: {lockRankDeadlock},
+ lockRankAllg: {lockRankSysmon, lockRankSched, lockRankPanic},
+ lockRankAllp: {lockRankSysmon, lockRankSched},
+ lockRankTimers: {lockRankSysmon, lockRankScavenge, lockRankSched, lockRankAllp, lockRankPollDesc, lockRankTimers},
+ lockRankItab: {},
+ lockRankReflectOffs: {lockRankItab},
+ lockRankHchan: {lockRankScavenge, lockRankSweep, lockRankHchan},
+ lockRankFin: {lockRankSysmon, lockRankScavenge, lockRankSched, lockRankAllg, lockRankTimers, lockRankHchan},
+ lockRankNotifyList: {},
+ lockRankTraceBuf: {lockRankSysmon, lockRankScavenge},
+ lockRankTraceStrings: {lockRankTraceBuf},
+ lockRankMspanSpecial: {lockRankSysmon, lockRankScavenge, lockRankAssistQueue, lockRankCpuprof, lockRankSweep, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankHchan, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings},
+ lockRankProf: {lockRankSysmon, lockRankScavenge, lockRankAssistQueue, lockRankCpuprof, lockRankSweep, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankHchan},
+ lockRankGcBitsArenas: {lockRankSysmon, lockRankScavenge, lockRankAssistQueue, lockRankCpuprof, lockRankSched, lockRankAllg, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankHchan},
+ lockRankRoot: {},
+ lockRankTrace: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankAssistQueue, lockRankSched, lockRankHchan, lockRankTraceBuf, lockRankTraceStrings, lockRankRoot, lockRankSweep},
+ lockRankTraceStackTab: {lockRankScavenge, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankSched, lockRankAllg, lockRankTimers, lockRankHchan, lockRankFin, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankRoot, lockRankTrace},
+ lockRankNetpollInit: {lockRankTimers},
+
+ lockRankRwmutexW: {},
+ lockRankRwmutexR: {lockRankSysmon, lockRankRwmutexW},
+
+ lockRankSpanSetSpine: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankAssistQueue, lockRankCpuprof, lockRankSweep, lockRankPollDesc, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankHchan},
+ lockRankGscan: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankCpuprof, lockRankSweep, lockRankSched, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankHchan, lockRankFin, lockRankTraceBuf, lockRankTraceStrings, lockRankRoot, lockRankNotifyList, lockRankProf, lockRankGcBitsArenas, lockRankTrace, lockRankTraceStackTab, lockRankNetpollInit, lockRankSpanSetSpine},
+ lockRankStackpool: {lockRankSysmon, lockRankScavenge, lockRankSweepWaiters, lockRankAssistQueue, lockRankCpuprof, lockRankSweep, lockRankSched, lockRankPollDesc, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankHchan, lockRankFin, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankProf, lockRankGcBitsArenas, lockRankRoot, lockRankTrace, lockRankTraceStackTab, lockRankNetpollInit, lockRankRwmutexR, lockRankSpanSetSpine, lockRankGscan},
+ lockRankStackLarge: {lockRankSysmon, lockRankAssistQueue, lockRankSched, lockRankItab, lockRankHchan, lockRankProf, lockRankGcBitsArenas, lockRankRoot, lockRankSpanSetSpine, lockRankGscan},
+ lockRankDefer: {},
+ lockRankSudog: {lockRankNotifyList, lockRankHchan},
+ lockRankWbufSpans: {lockRankSysmon, lockRankScavenge, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankSched, lockRankAllg, lockRankPollDesc, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankHchan, lockRankFin, lockRankNotifyList, lockRankTraceStrings, lockRankMspanSpecial, lockRankProf, lockRankRoot, lockRankGscan, lockRankDefer, lockRankSudog},
+ lockRankMheap: {lockRankSysmon, lockRankScavenge, lockRankSweepWaiters, lockRankAssistQueue, lockRankCpuprof, lockRankSweep, lockRankSched, lockRankAllg, lockRankAllp, lockRankFin, lockRankPollDesc, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankHchan, lockRankMspanSpecial, lockRankProf, lockRankGcBitsArenas, lockRankRoot, lockRankGscan, lockRankStackpool, lockRankStackLarge, lockRankDefer, lockRankSudog, lockRankWbufSpans, lockRankSpanSetSpine},
+ lockRankMheapSpecial: {lockRankSysmon, lockRankScavenge, lockRankAssistQueue, lockRankCpuprof, lockRankSweep, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankItab, lockRankReflectOffs, lockRankNotifyList, lockRankTraceBuf, lockRankTraceStrings, lockRankHchan},
+ lockRankGlobalAlloc: {lockRankProf, lockRankSpanSetSpine, lockRankMheap, lockRankMheapSpecial},
+
+ lockRankGFree: {lockRankSched},
+ lockRankHchanLeaf: {lockRankGscan, lockRankHchanLeaf},
+
+ lockRankNewmHandoff: {},
+ lockRankDebugPtrmask: {},
+ lockRankFaketimeState: {},
+ lockRankTicks: {},
+ lockRankRaceFini: {},
+ lockRankPollCache: {},
+ lockRankDebug: {},
+}
diff --git a/src/runtime/lockrank_off.go b/src/runtime/lockrank_off.go
new file mode 100644
index 0000000..7dcd8f5
--- /dev/null
+++ b/src/runtime/lockrank_off.go
@@ -0,0 +1,64 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !goexperiment.staticlockranking
+
+package runtime
+
+// lockRankStruct is embedded in mutex, but is empty when static lock ranking is
+// disabled (the default).
+type lockRankStruct struct {
+}
+
+func lockInit(l *mutex, rank lockRank) {
+}
+
+func getLockRank(l *mutex) lockRank {
+ return 0
+}
+
+func lockWithRank(l *mutex, rank lockRank) {
+ lock2(l)
+}
+
+// This function may be called in nosplit context and thus must be nosplit.
+//go:nosplit
+func acquireLockRank(rank lockRank) {
+}
+
+func unlockWithRank(l *mutex) {
+ unlock2(l)
+}
+
+// This function may be called in nosplit context and thus must be nosplit.
+//go:nosplit
+func releaseLockRank(rank lockRank) {
+}
+
+func lockWithRankMayAcquire(l *mutex, rank lockRank) {
+}
+
+//go:nosplit
+func assertLockHeld(l *mutex) {
+}
+
+//go:nosplit
+func assertRankHeld(r lockRank) {
+}
+
+//go:nosplit
+func worldStopped() {
+}
+
+//go:nosplit
+func worldStarted() {
+}
+
+//go:nosplit
+func assertWorldStopped() {
+}
+
+//go:nosplit
+func assertWorldStoppedOrLockHeld(l *mutex) {
+}
diff --git a/src/runtime/lockrank_on.go b/src/runtime/lockrank_on.go
new file mode 100644
index 0000000..88ac95a
--- /dev/null
+++ b/src/runtime/lockrank_on.go
@@ -0,0 +1,383 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build goexperiment.staticlockranking
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// worldIsStopped is accessed atomically to track world-stops. 1 == world
+// stopped.
+var worldIsStopped uint32
+
+// lockRankStruct is embedded in mutex
+type lockRankStruct struct {
+ // static lock ranking of the lock
+ rank lockRank
+ // pad field to make sure lockRankStruct is a multiple of 8 bytes, even on
+ // 32-bit systems.
+ pad int
+}
+
+// init checks that the partial order in lockPartialOrder fits within the total
+// order determined by the order of the lockRank constants.
+func init() {
+ for rank, list := range lockPartialOrder {
+ for _, entry := range list {
+ if entry > lockRank(rank) {
+ println("lockPartial order row", lockRank(rank).String(), "entry", entry.String())
+ throw("lockPartialOrder table is inconsistent with total lock ranking order")
+ }
+ }
+ }
+}
+
+func lockInit(l *mutex, rank lockRank) {
+ l.rank = rank
+}
+
+func getLockRank(l *mutex) lockRank {
+ return l.rank
+}
+
+// lockWithRank is like lock(l), but allows the caller to specify a lock rank
+// when acquiring a non-static lock.
+//
+// Note that we need to be careful about stack splits:
+//
+// This function is not nosplit, thus it may split at function entry. This may
+// introduce a new edge in the lock order, but it is no different from any
+// other (nosplit) call before this call (including the call to lock() itself).
+//
+// However, we switch to the systemstack to record the lock held to ensure that
+// we record an accurate lock ordering. e.g., without systemstack, a stack
+// split on entry to lock2() would record stack split locks as taken after l,
+// even though l is not actually locked yet.
+func lockWithRank(l *mutex, rank lockRank) {
+ if l == &debuglock || l == &paniclk {
+ // debuglock is only used for println/printlock(). Don't do lock
+ // rank recording for it, since print/println are used when
+ // printing out a lock ordering problem below.
+ //
+ // paniclk has an ordering problem, since it can be acquired
+ // during a panic with any other locks held (especially if the
+ // panic is because of a directed segv), and yet also allg is
+ // acquired after paniclk in tracebackothers(). This is a genuine
+ // problem, so for now we don't do lock rank recording for paniclk
+ // either.
+ lock2(l)
+ return
+ }
+ if rank == 0 {
+ rank = lockRankLeafRank
+ }
+ gp := getg()
+ // Log the new class.
+ systemstack(func() {
+ i := gp.m.locksHeldLen
+ if i >= len(gp.m.locksHeld) {
+ throw("too many locks held concurrently for rank checking")
+ }
+ gp.m.locksHeld[i].rank = rank
+ gp.m.locksHeld[i].lockAddr = uintptr(unsafe.Pointer(l))
+ gp.m.locksHeldLen++
+
+ // i is the index of the lock being acquired
+ if i > 0 {
+ checkRanks(gp, gp.m.locksHeld[i-1].rank, rank)
+ }
+ lock2(l)
+ })
+}
+
+// nosplit to ensure it can be called in as many contexts as possible.
+//go:nosplit
+func printHeldLocks(gp *g) {
+ if gp.m.locksHeldLen == 0 {
+ println("<none>")
+ return
+ }
+
+ for j, held := range gp.m.locksHeld[:gp.m.locksHeldLen] {
+ println(j, ":", held.rank.String(), held.rank, unsafe.Pointer(gp.m.locksHeld[j].lockAddr))
+ }
+}
+
+// acquireLockRank acquires a rank which is not associated with a mutex lock
+//
+// This function may be called in nosplit context and thus must be nosplit.
+//go:nosplit
+func acquireLockRank(rank lockRank) {
+ gp := getg()
+ // Log the new class. See comment on lockWithRank.
+ systemstack(func() {
+ i := gp.m.locksHeldLen
+ if i >= len(gp.m.locksHeld) {
+ throw("too many locks held concurrently for rank checking")
+ }
+ gp.m.locksHeld[i].rank = rank
+ gp.m.locksHeld[i].lockAddr = 0
+ gp.m.locksHeldLen++
+
+ // i is the index of the lock being acquired
+ if i > 0 {
+ checkRanks(gp, gp.m.locksHeld[i-1].rank, rank)
+ }
+ })
+}
+
+// checkRanks checks if goroutine g, which has most recently acquired a lock
+// with rank 'prevRank', can now acquire a lock with rank 'rank'.
+//
+//go:systemstack
+func checkRanks(gp *g, prevRank, rank lockRank) {
+ rankOK := false
+ if rank < prevRank {
+ // If rank < prevRank, then we definitely have a rank error
+ rankOK = false
+ } else if rank == lockRankLeafRank {
+ // If new lock is a leaf lock, then the preceding lock can
+ // be anything except another leaf lock.
+ rankOK = prevRank < lockRankLeafRank
+ } else {
+ // We've now verified the total lock ranking, but we
+ // also enforce the partial ordering specified by
+ // lockPartialOrder as well. Two locks with the same rank
+ // can only be acquired at the same time if explicitly
+ // listed in the lockPartialOrder table.
+ list := lockPartialOrder[rank]
+ for _, entry := range list {
+ if entry == prevRank {
+ rankOK = true
+ break
+ }
+ }
+ }
+ if !rankOK {
+ printlock()
+ println(gp.m.procid, " ======")
+ printHeldLocks(gp)
+ throw("lock ordering problem")
+ }
+}
+
+// See comment on lockWithRank regarding stack splitting.
+func unlockWithRank(l *mutex) {
+ if l == &debuglock || l == &paniclk {
+ // See comment at beginning of lockWithRank.
+ unlock2(l)
+ return
+ }
+ gp := getg()
+ systemstack(func() {
+ found := false
+ for i := gp.m.locksHeldLen - 1; i >= 0; i-- {
+ if gp.m.locksHeld[i].lockAddr == uintptr(unsafe.Pointer(l)) {
+ found = true
+ copy(gp.m.locksHeld[i:gp.m.locksHeldLen-1], gp.m.locksHeld[i+1:gp.m.locksHeldLen])
+ gp.m.locksHeldLen--
+ break
+ }
+ }
+ if !found {
+ println(gp.m.procid, ":", l.rank.String(), l.rank, l)
+ throw("unlock without matching lock acquire")
+ }
+ unlock2(l)
+ })
+}
+
+// releaseLockRank releases a rank which is not associated with a mutex lock
+//
+// This function may be called in nosplit context and thus must be nosplit.
+//go:nosplit
+func releaseLockRank(rank lockRank) {
+ gp := getg()
+ systemstack(func() {
+ found := false
+ for i := gp.m.locksHeldLen - 1; i >= 0; i-- {
+ if gp.m.locksHeld[i].rank == rank && gp.m.locksHeld[i].lockAddr == 0 {
+ found = true
+ copy(gp.m.locksHeld[i:gp.m.locksHeldLen-1], gp.m.locksHeld[i+1:gp.m.locksHeldLen])
+ gp.m.locksHeldLen--
+ break
+ }
+ }
+ if !found {
+ println(gp.m.procid, ":", rank.String(), rank)
+ throw("lockRank release without matching lockRank acquire")
+ }
+ })
+}
+
+// See comment on lockWithRank regarding stack splitting.
+func lockWithRankMayAcquire(l *mutex, rank lockRank) {
+ gp := getg()
+ if gp.m.locksHeldLen == 0 {
+ // No possibility of a lock ordering problem if no other locks are held
+ return
+ }
+
+ systemstack(func() {
+ i := gp.m.locksHeldLen
+ if i >= len(gp.m.locksHeld) {
+ throw("too many locks held concurrently for rank checking")
+ }
+ // Temporarily add this lock to the locksHeld list, so
+ // checkRanks() will print out list, including this lock, if there
+ // is a lock ordering problem.
+ gp.m.locksHeld[i].rank = rank
+ gp.m.locksHeld[i].lockAddr = uintptr(unsafe.Pointer(l))
+ gp.m.locksHeldLen++
+ checkRanks(gp, gp.m.locksHeld[i-1].rank, rank)
+ gp.m.locksHeldLen--
+ })
+}
+
+// nosplit to ensure it can be called in as many contexts as possible.
+//go:nosplit
+func checkLockHeld(gp *g, l *mutex) bool {
+ for i := gp.m.locksHeldLen - 1; i >= 0; i-- {
+ if gp.m.locksHeld[i].lockAddr == uintptr(unsafe.Pointer(l)) {
+ return true
+ }
+ }
+ return false
+}
+
+// assertLockHeld throws if l is not held by the caller.
+//
+// nosplit to ensure it can be called in as many contexts as possible.
+//go:nosplit
+func assertLockHeld(l *mutex) {
+ gp := getg()
+
+ held := checkLockHeld(gp, l)
+ if held {
+ return
+ }
+
+ // Crash from system stack to avoid splits that may cause
+ // additional issues.
+ systemstack(func() {
+ printlock()
+ print("caller requires lock ", l, " (rank ", l.rank.String(), "), holding:\n")
+ printHeldLocks(gp)
+ throw("not holding required lock!")
+ })
+}
+
+// assertRankHeld throws if a mutex with rank r is not held by the caller.
+//
+// This is less precise than assertLockHeld, but can be used in places where a
+// pointer to the exact mutex is not available.
+//
+// nosplit to ensure it can be called in as many contexts as possible.
+//go:nosplit
+func assertRankHeld(r lockRank) {
+ gp := getg()
+
+ for i := gp.m.locksHeldLen - 1; i >= 0; i-- {
+ if gp.m.locksHeld[i].rank == r {
+ return
+ }
+ }
+
+ // Crash from system stack to avoid splits that may cause
+ // additional issues.
+ systemstack(func() {
+ printlock()
+ print("caller requires lock with rank ", r.String(), "), holding:\n")
+ printHeldLocks(gp)
+ throw("not holding required lock!")
+ })
+}
+
+// worldStopped notes that the world is stopped.
+//
+// Caller must hold worldsema.
+//
+// nosplit to ensure it can be called in as many contexts as possible.
+//go:nosplit
+func worldStopped() {
+ if stopped := atomic.Xadd(&worldIsStopped, 1); stopped != 1 {
+ systemstack(func() {
+ print("world stop count=", stopped, "\n")
+ throw("recursive world stop")
+ })
+ }
+}
+
+// worldStarted notes that the world is starting.
+//
+// Caller must hold worldsema.
+//
+// nosplit to ensure it can be called in as many contexts as possible.
+//go:nosplit
+func worldStarted() {
+ if stopped := atomic.Xadd(&worldIsStopped, -1); stopped != 0 {
+ systemstack(func() {
+ print("world stop count=", stopped, "\n")
+ throw("released non-stopped world stop")
+ })
+ }
+}
+
+// nosplit to ensure it can be called in as many contexts as possible.
+//go:nosplit
+func checkWorldStopped() bool {
+ stopped := atomic.Load(&worldIsStopped)
+ if stopped > 1 {
+ systemstack(func() {
+ print("inconsistent world stop count=", stopped, "\n")
+ throw("inconsistent world stop count")
+ })
+ }
+
+ return stopped == 1
+}
+
+// assertWorldStopped throws if the world is not stopped. It does not check
+// which M stopped the world.
+//
+// nosplit to ensure it can be called in as many contexts as possible.
+//go:nosplit
+func assertWorldStopped() {
+ if checkWorldStopped() {
+ return
+ }
+
+ throw("world not stopped")
+}
+
+// assertWorldStoppedOrLockHeld throws if the world is not stopped and the
+// passed lock is not held.
+//
+// nosplit to ensure it can be called in as many contexts as possible.
+//go:nosplit
+func assertWorldStoppedOrLockHeld(l *mutex) {
+ if checkWorldStopped() {
+ return
+ }
+
+ gp := getg()
+ held := checkLockHeld(gp, l)
+ if held {
+ return
+ }
+
+ // Crash from system stack to avoid splits that may cause
+ // additional issues.
+ systemstack(func() {
+ printlock()
+ print("caller requires world stop or lock ", l, " (rank ", l.rank.String(), "), holding:\n")
+ println("<no world stop>")
+ printHeldLocks(gp)
+ throw("no world stop or required lock!")
+ })
+}
diff --git a/src/runtime/malloc.go b/src/runtime/malloc.go
new file mode 100644
index 0000000..f20ded5
--- /dev/null
+++ b/src/runtime/malloc.go
@@ -0,0 +1,1452 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Memory allocator.
+//
+// This was originally based on tcmalloc, but has diverged quite a bit.
+// http://goog-perftools.sourceforge.net/doc/tcmalloc.html
+
+// The main allocator works in runs of pages.
+// Small allocation sizes (up to and including 32 kB) are
+// rounded to one of about 70 size classes, each of which
+// has its own free set of objects of exactly that size.
+// Any free page of memory can be split into a set of objects
+// of one size class, which are then managed using a free bitmap.
+//
+// The allocator's data structures are:
+//
+// fixalloc: a free-list allocator for fixed-size off-heap objects,
+// used to manage storage used by the allocator.
+// mheap: the malloc heap, managed at page (8192-byte) granularity.
+// mspan: a run of in-use pages managed by the mheap.
+// mcentral: collects all spans of a given size class.
+// mcache: a per-P cache of mspans with free space.
+// mstats: allocation statistics.
+//
+// Allocating a small object proceeds up a hierarchy of caches:
+//
+// 1. Round the size up to one of the small size classes
+// and look in the corresponding mspan in this P's mcache.
+// Scan the mspan's free bitmap to find a free slot.
+// If there is a free slot, allocate it.
+// This can all be done without acquiring a lock.
+//
+// 2. If the mspan has no free slots, obtain a new mspan
+// from the mcentral's list of mspans of the required size
+// class that have free space.
+// Obtaining a whole span amortizes the cost of locking
+// the mcentral.
+//
+// 3. If the mcentral's mspan list is empty, obtain a run
+// of pages from the mheap to use for the mspan.
+//
+// 4. If the mheap is empty or has no page runs large enough,
+// allocate a new group of pages (at least 1MB) from the
+// operating system. Allocating a large run of pages
+// amortizes the cost of talking to the operating system.
+//
+// Sweeping an mspan and freeing objects on it proceeds up a similar
+// hierarchy:
+//
+// 1. If the mspan is being swept in response to allocation, it
+// is returned to the mcache to satisfy the allocation.
+//
+// 2. Otherwise, if the mspan still has allocated objects in it,
+// it is placed on the mcentral free list for the mspan's size
+// class.
+//
+// 3. Otherwise, if all objects in the mspan are free, the mspan's
+// pages are returned to the mheap and the mspan is now dead.
+//
+// Allocating and freeing a large object uses the mheap
+// directly, bypassing the mcache and mcentral.
+//
+// If mspan.needzero is false, then free object slots in the mspan are
+// already zeroed. Otherwise if needzero is true, objects are zeroed as
+// they are allocated. There are various benefits to delaying zeroing
+// this way:
+//
+// 1. Stack frame allocation can avoid zeroing altogether.
+//
+// 2. It exhibits better temporal locality, since the program is
+// probably about to write to the memory.
+//
+// 3. We don't zero pages that never get reused.
+
+// Virtual memory layout
+//
+// The heap consists of a set of arenas, which are 64MB on 64-bit and
+// 4MB on 32-bit (heapArenaBytes). Each arena's start address is also
+// aligned to the arena size.
+//
+// Each arena has an associated heapArena object that stores the
+// metadata for that arena: the heap bitmap for all words in the arena
+// and the span map for all pages in the arena. heapArena objects are
+// themselves allocated off-heap.
+//
+// Since arenas are aligned, the address space can be viewed as a
+// series of arena frames. The arena map (mheap_.arenas) maps from
+// arena frame number to *heapArena, or nil for parts of the address
+// space not backed by the Go heap. The arena map is structured as a
+// two-level array consisting of a "L1" arena map and many "L2" arena
+// maps; however, since arenas are large, on many architectures, the
+// arena map consists of a single, large L2 map.
+//
+// The arena map covers the entire possible address space, allowing
+// the Go heap to use any part of the address space. The allocator
+// attempts to keep arenas contiguous so that large spans (and hence
+// large objects) can cross arenas.
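+//
+// A minimal sketch of the small-object allocation path described above, in
+// schematic form. The variables and helpers (c, span, sizeClassFor,
+// findFreeSlot, refillFromCentral, growFromOS) are hypothetical; the real
+// logic lives in mallocgc and in the mcache, mcentral and mheap types.
+//
+//	spc := sizeClassFor(size)         // 1. round the size up to a size class
+//	span := c.alloc[spc]              //    and look in this P's mcache
+//	v := findFreeSlot(span)           //    scan the span's free bitmap
+//	if v == 0 {
+//		span = refillFromCentral(spc) // 2. get a span with free space from mcentral
+//		if span == nil {
+//			span = growFromOS(spc)    // 3./4. grow the heap via the mheap / OS
+//		}
+//		v = findFreeSlot(span)
+//	}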
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/math"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ debugMalloc = false
+
+ maxTinySize = _TinySize
+ tinySizeClass = _TinySizeClass
+ maxSmallSize = _MaxSmallSize
+
+ pageShift = _PageShift
+ pageSize = _PageSize
+ pageMask = _PageMask
+ // By construction, single page spans of the smallest object class
+ // have the most objects per span.
+ maxObjsPerSpan = pageSize / 8
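+ // For example, with 8192-byte pages and a smallest size class of 8 bytes,
+ // maxObjsPerSpan is 8192/8 = 1024.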
+
+ concurrentSweep = _ConcurrentSweep
+
+ _PageSize = 1 << _PageShift
+ _PageMask = _PageSize - 1
+
+ // _64bit = 1 on 64-bit systems, 0 on 32-bit systems
+ _64bit = 1 << (^uintptr(0) >> 63) / 2
+
+ // Tiny allocator parameters, see "Tiny allocator" comment in malloc.go.
+ _TinySize = 16
+ _TinySizeClass = int8(2)
+
+ _FixAllocChunk = 16 << 10 // Chunk size for FixAlloc
+
+ // Per-P, per order stack segment cache size.
+ _StackCacheSize = 32 * 1024
+
+ // Number of orders that get caching. Order 0 is FixedStack
+ // and each successive order is twice as large.
+ // We want to cache 2KB, 4KB, 8KB, and 16KB stacks. Larger stacks
+ // will be allocated directly.
+ // Since FixedStack is different on different systems, we
+ // must vary NumStackOrders to keep the same maximum cached size.
+ // OS | FixedStack | NumStackOrders
+ // -----------------+------------+---------------
+ // linux/darwin/bsd | 2KB | 4
+ // windows/32 | 4KB | 3
+ // windows/64 | 8KB | 2
+ // plan9 | 4KB | 3
+ _NumStackOrders = 4 - sys.PtrSize/4*sys.GoosWindows - 1*sys.GoosPlan9
+
+ // heapAddrBits is the number of bits in a heap address. On
+ // amd64, addresses are sign-extended beyond heapAddrBits. On
+ // other arches, they are zero-extended.
+ //
+ // On most 64-bit platforms, we limit this to 48 bits based on a
+ // combination of hardware and OS limitations.
+ //
+ // amd64 hardware limits addresses to 48 bits, sign-extended
+ // to 64 bits. Addresses where the top 16 bits are not either
+ // all 0 or all 1 are "non-canonical" and invalid. Because of
+ // these "negative" addresses, we offset addresses by 1<<47
+ // (arenaBaseOffset) on amd64 before computing indexes into
+ // the heap arenas index. In 2017, amd64 hardware added
+ // support for 57 bit addresses; however, currently only Linux
+ // supports this extension and the kernel will never choose an
+ // address above 1<<47 unless mmap is called with a hint
+ // address above 1<<47 (which we never do).
+ //
+ // arm64 hardware (as of ARMv8) limits user addresses to 48
+ // bits, in the range [0, 1<<48).
+ //
+ // ppc64, mips64, and s390x support arbitrary 64 bit addresses
+ // in hardware. On Linux, Go leans on stricter OS limits. Based
+ // on Linux's processor.h, the user address space is limited as
+ // follows on 64-bit architectures:
+ //
+ // Architecture Name Maximum Value (exclusive)
+ // ---------------------------------------------------------------------
+ // amd64 TASK_SIZE_MAX 0x007ffffffff000 (47 bit addresses)
+ // arm64 TASK_SIZE_64 0x01000000000000 (48 bit addresses)
+ // ppc64{,le} TASK_SIZE_USER64 0x00400000000000 (46 bit addresses)
+ // mips64{,le} TASK_SIZE64 0x00010000000000 (40 bit addresses)
+ // s390x TASK_SIZE 1<<64 (64 bit addresses)
+ //
+ // These limits may increase over time, but are currently at
+ // most 48 bits except on s390x. On all architectures, Linux
+ // starts placing mmap'd regions at addresses that are
+ // significantly below 48 bits, so even if it's possible to
+ // exceed Go's 48 bit limit, it's extremely unlikely in
+ // practice.
+ //
+ // On 32-bit platforms, we accept the full 32-bit address
+ // space because doing so is cheap.
+ // mips32 only has access to the low 2GB of virtual memory, so
+ // we further limit it to 31 bits.
+ //
+ // On ios/arm64, although 64-bit pointers are presumably
+ // available, pointers are truncated to 33 bits. Furthermore,
+ // only the top 4 GiB of the address space are actually available
+ // to the application, but we allow the whole 33 bits anyway for
+ // simplicity.
+ // TODO(mknyszek): Consider limiting it to 32 bits and using
+ // arenaBaseOffset to offset into the top 4 GiB.
+ //
+ // WebAssembly currently has a limit of 4GB linear memory.
+ heapAddrBits = (_64bit*(1-sys.GoarchWasm)*(1-sys.GoosIos*sys.GoarchArm64))*48 + (1-_64bit+sys.GoarchWasm)*(32-(sys.GoarchMips+sys.GoarchMipsle)) + 33*sys.GoosIos*sys.GoarchArm64
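+ // For illustration, the expression above evaluates to 48 on typical
+ // 64-bit platforms (all special-case terms zero), 32 on 32-bit platforms
+ // and on wasm, 31 on mips/mipsle, and 33 on ios/arm64.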
+
+ // maxAlloc is the maximum size of an allocation. On 64-bit,
+ // it's theoretically possible to allocate 1<<heapAddrBits bytes. On
+ // 32-bit, however, this is one less than 1<<32 because the
+ // number of bytes in the address space doesn't actually fit
+ // in a uintptr.
+ maxAlloc = (1 << heapAddrBits) - (1-_64bit)*1
+
+ // The number of bits in a heap address, the size of heap
+ // arenas, and the L1 and L2 arena map sizes are related by
+ //
+ // (1 << addr bits) = arena size * L1 entries * L2 entries
+ //
+ // Currently, we balance these as follows:
+ //
+ // Platform Addr bits Arena size L1 entries L2 entries
+ // -------------- --------- ---------- ---------- -----------
+ // */64-bit 48 64MB 1 4M (32MB)
+ // windows/64-bit 48 4MB 64 1M (8MB)
+ // */32-bit 32 4MB 1 1024 (4KB)
+ // */mips(le) 31 4MB 1 512 (2KB)
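+ //
+ // For illustration, the identity holds for the first two rows above:
+ // 2^48 = 64MB * 1 * 4M = 2^26 * 2^0 * 2^22, and
+ // 2^48 = 4MB * 64 * 1M = 2^22 * 2^6 * 2^20.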
+
+ // heapArenaBytes is the size of a heap arena. The heap
+ // consists of mappings of size heapArenaBytes, aligned to
+ // heapArenaBytes. The initial heap mapping is one arena.
+ //
+ // This is currently 64MB on 64-bit non-Windows and 4MB on
+ // 32-bit and on Windows. We use smaller arenas on Windows
+ // because all committed memory is charged to the process,
+ // even if it's not touched. Hence, for processes with small
+ // heaps, the mapped arena space needs to be commensurate.
+ // This is particularly important with the race detector,
+ // since it significantly amplifies the cost of committed
+ // memory.
+ heapArenaBytes = 1 << logHeapArenaBytes
+
+ // logHeapArenaBytes is log_2 of heapArenaBytes. For clarity,
+ // prefer using heapArenaBytes where possible (we need the
+ // constant to compute some other constants).
+ logHeapArenaBytes = (6+20)*(_64bit*(1-sys.GoosWindows)*(1-sys.GoarchWasm)) + (2+20)*(_64bit*sys.GoosWindows) + (2+20)*(1-_64bit) + (2+20)*sys.GoarchWasm
+
+ // heapArenaBitmapBytes is the size of each heap arena's bitmap.
+ heapArenaBitmapBytes = heapArenaBytes / (sys.PtrSize * 8 / 2)
+
+ pagesPerArena = heapArenaBytes / pageSize
+
+ // arenaL1Bits is the number of bits of the arena number
+ // covered by the first level arena map.
+ //
+ // This number should be small, since the first level arena
+ // map requires PtrSize*(1<<arenaL1Bits) of space in the
+ // binary's BSS. It can be zero, in which case the first level
+ // index is effectively unused. There is a performance benefit
+ // to this, since the generated code can be more efficient,
+ // but it comes at the cost of having a large L2 mapping.
+ //
+ // We use the L1 map on 64-bit Windows because the arena size
+ // is small, but the address space is still 48 bits, and
+ // there's a high cost to having a large L2.
+ arenaL1Bits = 6 * (_64bit * sys.GoosWindows)
+
+ // arenaL2Bits is the number of bits of the arena number
+ // covered by the second level arena index.
+ //
+ // The size of each arena map allocation is proportional to
+ // 1<<arenaL2Bits, so it's important that this not be too
+ // large. 48 bits leads to 32MB arena index allocations, which
+ // is about the practical threshold.
+ arenaL2Bits = heapAddrBits - logHeapArenaBytes - arenaL1Bits
+
+ // arenaL1Shift is the number of bits to shift an arena frame
+ // number by to compute an index into the first level arena map.
+ arenaL1Shift = arenaL2Bits
+
+ // arenaBits is the total bits in a combined arena map index.
+ // This is split between the index into the L1 arena map and
+ // the L2 arena map.
+ arenaBits = arenaL1Bits + arenaL2Bits
+
+ // arenaBaseOffset is the pointer value that corresponds to
+ // index 0 in the heap arena map.
+ //
+ // On amd64, the address space is 48 bits, sign extended to 64
+ // bits. This offset lets us handle "negative" addresses (or
+ // high addresses if viewed as unsigned).
+ //
+ // On aix/ppc64, this offset allows keeping heapAddrBits at 48.
+ // Otherwise, it would have to be 60 in order to handle mmap addresses
+ // (in the range 0x0a00000000000000 - 0x0afffffffffffff). But in that
+ // case, the memory reserved in (s *pageAlloc).init for chunks
+ // causes significant slowdowns.
+ //
+ // On other platforms, the user address space is contiguous
+ // and starts at 0, so no offset is necessary.
+ arenaBaseOffset = 0xffff800000000000*sys.GoarchAmd64 + 0x0a00000000000000*sys.GoosAix
+ // A typed version of this constant that will make it into DWARF (for viewcore).
+ arenaBaseOffsetUintptr = uintptr(arenaBaseOffset)
+
+ // Max number of threads to run garbage collection.
+ // 2, 3, and 4 are all plausible maximums depending
+ // on the hardware details of the machine. The garbage
+ // collector scales well to 32 cpus.
+ _MaxGcproc = 32
+
+ // minLegalPointer is the smallest possible legal pointer.
+ // This is the smallest possible architectural page size,
+ // since we assume that the first page is never mapped.
+ //
+ // This should agree with minZeroPage in the compiler.
+ minLegalPointer uintptr = 4096
+)
+
+// physPageSize is the size in bytes of the OS's physical pages.
+// Mapping and unmapping operations must be done at multiples of
+// physPageSize.
+//
+// This must be set by the OS init code (typically in osinit) before
+// mallocinit.
+var physPageSize uintptr
+
+// physHugePageSize is the size in bytes of the OS's default physical huge
+// page size whose allocation is opaque to the application. It is assumed
+// and verified to be a power of two.
+//
+// If set, this must be set by the OS init code (typically in osinit) before
+// mallocinit. However, setting it at all is optional, and leaving the default
+// value is always safe (though potentially less efficient).
+//
+// Since physHugePageSize is always assumed to be a power of two,
+// physHugePageShift is defined as physHugePageSize == 1 << physHugePageShift.
+// The purpose of physHugePageShift is to avoid doing divisions in
+// performance critical functions.
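+//
+// For example, with 2MB huge pages physHugePageShift is 21, so
+// x/physHugePageSize can instead be computed as x>>physHugePageShift.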
+var (
+ physHugePageSize uintptr
+ physHugePageShift uint
+)
+
+// OS memory management abstraction layer
+//
+// Regions of the address space managed by the runtime may be in one of four
+// states at any given time:
+// 1) None - Unreserved and unmapped, the default state of any region.
+// 2) Reserved - Owned by the runtime, but accessing it would cause a fault.
+// Does not count against the process' memory footprint.
+// 3) Prepared - Reserved, intended not to be backed by physical memory (though
+// an OS may implement this lazily). Can transition efficiently to
+// Ready. Accessing memory in such a region is undefined (may
+// fault, may give back unexpected zeroes, etc.).
+// 4) Ready - may be accessed safely.
+//
+// This set of states is more than is strictly necessary to support all the
+// currently supported platforms. One could get by with just None, Reserved, and
+// Ready. However, the Prepared state gives us flexibility for performance
+// purposes. For example, on POSIX-y operating systems, Reserved is usually a
+// private anonymous mmap'd region with PROT_NONE set, and to transition
+// to Ready would require setting PROT_READ|PROT_WRITE. However, the
+// underspecification of Prepared lets us use just MADV_FREE to transition from
+// Ready to Prepared. Thus, with the Prepared state, we can set the permission
+// bits just once early on, and we can efficiently tell the OS that it's free to
+// take pages away from us when we don't strictly need them.
+//
+// For each OS there is a common set of helpers defined that transition
+// memory regions between these states. The helpers are as follows:
+//
+// sysAlloc transitions an OS-chosen region of memory from None to Ready.
+// More specifically, it obtains a large chunk of zeroed memory from the
+// operating system, typically on the order of a hundred kilobytes
+// or a megabyte. This memory is always immediately available for use.
+//
+// sysFree transitions a memory region from any state to None. Therefore, it
+// returns memory unconditionally. It is used if an out-of-memory error has been
+// detected midway through an allocation or to carve out an aligned section of
+// the address space. It is okay if sysFree is a no-op only if sysReserve always
+// returns a memory region aligned to the heap allocator's alignment
+// restrictions.
+//
+// sysReserve transitions a memory region from None to Reserved. It reserves
+// address space in such a way that it would cause a fatal fault upon access
+// (either via permissions or not committing the memory). Such a reservation is
+// thus never backed by physical memory.
+// If the pointer passed to it is non-nil, the caller wants the
+// reservation there, but sysReserve can still choose another
+// location if that one is unavailable.
+// NOTE: sysReserve returns OS-aligned memory, but the heap allocator
+// may use larger alignment, so the caller must be careful to realign the
+// memory obtained by sysReserve.
+//
+// sysMap transitions a memory region from Reserved to Prepared. It ensures the
+// memory region can be efficiently transitioned to Ready.
+//
+// sysUsed transitions a memory region from Prepared to Ready. It notifies the
+// operating system that the memory region is needed and ensures that the region
+// may be safely accessed. This is typically a no-op on systems that don't have
+// an explicit commit step and hard over-commit limits, but is critical on
+// Windows, for example.
+//
+// sysUnused transitions a memory region from Ready to Prepared. It notifies the
+// operating system that the physical pages backing this memory region are no
+// longer needed and can be reused for other purposes. The contents of a
+// sysUnused memory region are considered forfeit and the region must not be
+// accessed again until sysUsed is called.
+//
+// sysFault transitions a memory region from Ready or Prepared to Reserved. It
+// marks a region such that it will always fault if accessed. Used only for
+// debugging the runtime.
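+//
+// A minimal sketch of a region's typical lifecycle in terms of these helpers,
+// with hypothetical v and n (the real transitions are driven by the allocator
+// and scavenger; see sysAlloc below for the Reserved -> Prepared path):
+//
+//	v := sysReserve(nil, n)           // None     -> Reserved
+//	sysMap(v, n, &memstats.heap_sys)  // Reserved -> Prepared
+//	sysUsed(v, n)                     // Prepared -> Ready (safe to access)
+//	// ... memory in use ...
+//	sysUnused(v, n)                   // Ready    -> Prepared (OS may reclaim the pages)
+//	sysFree(v, n, &memstats.heap_sys) // any state -> None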
+
+func mallocinit() {
+ if class_to_size[_TinySizeClass] != _TinySize {
+ throw("bad TinySizeClass")
+ }
+
+ testdefersizes()
+
+ if heapArenaBitmapBytes&(heapArenaBitmapBytes-1) != 0 {
+ // heapBits expects modular arithmetic on bitmap
+ // addresses to work.
+ throw("heapArenaBitmapBytes not a power of 2")
+ }
+
+ // Copy class sizes out for statistics table.
+ for i := range class_to_size {
+ memstats.by_size[i].size = uint32(class_to_size[i])
+ }
+
+ // Check physPageSize.
+ if physPageSize == 0 {
+ // The OS init code failed to fetch the physical page size.
+ throw("failed to get system page size")
+ }
+ if physPageSize > maxPhysPageSize {
+ print("system page size (", physPageSize, ") is larger than maximum page size (", maxPhysPageSize, ")\n")
+ throw("bad system page size")
+ }
+ if physPageSize < minPhysPageSize {
+ print("system page size (", physPageSize, ") is smaller than minimum page size (", minPhysPageSize, ")\n")
+ throw("bad system page size")
+ }
+ if physPageSize&(physPageSize-1) != 0 {
+ print("system page size (", physPageSize, ") must be a power of 2\n")
+ throw("bad system page size")
+ }
+ if physHugePageSize&(physHugePageSize-1) != 0 {
+ print("system huge page size (", physHugePageSize, ") must be a power of 2\n")
+ throw("bad system huge page size")
+ }
+ if physHugePageSize > maxPhysHugePageSize {
+ // physHugePageSize is greater than the maximum supported huge page size.
+ // Don't throw here, like in the other cases, since a system configured
+ // in this way isn't wrong; we just don't have the code to support them.
+ // Instead, silently set the huge page size to zero.
+ physHugePageSize = 0
+ }
+ if physHugePageSize != 0 {
+ // Since physHugePageSize is a power of 2, it suffices to increase
+ // physHugePageShift until 1<<physHugePageShift == physHugePageSize.
+ for 1<<physHugePageShift != physHugePageSize {
+ physHugePageShift++
+ }
+ }
+ if pagesPerArena%pagesPerSpanRoot != 0 {
+ print("pagesPerArena (", pagesPerArena, ") is not divisible by pagesPerSpanRoot (", pagesPerSpanRoot, ")\n")
+ throw("bad pagesPerSpanRoot")
+ }
+ if pagesPerArena%pagesPerReclaimerChunk != 0 {
+ print("pagesPerArena (", pagesPerArena, ") is not divisible by pagesPerReclaimerChunk (", pagesPerReclaimerChunk, ")\n")
+ throw("bad pagesPerReclaimerChunk")
+ }
+
+ // Initialize the heap.
+ mheap_.init()
+ mcache0 = allocmcache()
+ lockInit(&gcBitsArenas.lock, lockRankGcBitsArenas)
+ lockInit(&proflock, lockRankProf)
+ lockInit(&globalAlloc.mutex, lockRankGlobalAlloc)
+
+ // Create initial arena growth hints.
+ if sys.PtrSize == 8 {
+ // On a 64-bit machine, we pick the following hints
+ // because:
+ //
+ // 1. Starting from the middle of the address space
+ // makes it easier to grow out a contiguous range
+ // without running in to some other mapping.
+ //
+ // 2. This makes Go heap addresses more easily
+ // recognizable when debugging.
+ //
+ // 3. Stack scanning in gccgo is still conservative,
+ // so it's important that addresses be distinguishable
+ // from other data.
+ //
+ // Starting at 0x00c0 means that the valid memory addresses
+ // will begin 0x00c0, 0x00c1, ...
+ // In little-endian, that's c0 00, c1 00, ... None of those are valid
+ // UTF-8 sequences, and they are otherwise as far away from
+ // ff (likely a common byte) as possible. If that fails, we try other 0xXXc0
+ // addresses. An earlier attempt to use 0x11f8 caused out of memory errors
+ // on OS X during thread allocations. 0x00c0 causes conflicts with
+ // AddressSanitizer which reserves all memory up to 0x0100.
+ // These choices reduce the odds of a conservative garbage collector
+ // not collecting memory because some non-pointer block of memory
+ // had a bit pattern that matched a memory address.
+ //
+ // However, on arm64, we ignore all this advice above and slam the
+ // allocation at 0x40 << 32 because when using 4k pages with 3-level
+ // translation buffers, the user address space is limited to 39 bits.
+ // On ios/arm64, the address space is even smaller.
+ //
+ // On AIX, mmap starts at 0x0A00000000000000 for 64-bit
+ // processes.
+ for i := 0x7f; i >= 0; i-- {
+ var p uintptr
+ switch {
+ case raceenabled:
+ // The TSAN runtime requires the heap
+ // to be in the range [0x00c000000000,
+ // 0x00e000000000).
+ p = uintptr(i)<<32 | uintptrMask&(0x00c0<<32)
+ if p >= uintptrMask&0x00e000000000 {
+ continue
+ }
+ case GOARCH == "arm64" && GOOS == "ios":
+ p = uintptr(i)<<40 | uintptrMask&(0x0013<<28)
+ case GOARCH == "arm64":
+ p = uintptr(i)<<40 | uintptrMask&(0x0040<<32)
+ case GOOS == "aix":
+ if i == 0 {
+ // We don't use addresses directly after 0x0A00000000000000
+ // to avoid collisions with others mmaps done by non-go programs.
+ continue
+ }
+ p = uintptr(i)<<40 | uintptrMask&(0xa0<<52)
+ default:
+ p = uintptr(i)<<40 | uintptrMask&(0x00c0<<32)
+ }
+ hint := (*arenaHint)(mheap_.arenaHintAlloc.alloc())
+ hint.addr = p
+ hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
+ }
+ } else {
+ // On a 32-bit machine, we're much more concerned
+ // about keeping the usable heap contiguous.
+ // Hence:
+ //
+ // 1. We reserve space for all heapArenas up front so
+ // they don't get interleaved with the heap. They're
+ // ~258MB, so this isn't too bad. (We could reserve a
+ // smaller amount of space up front if this is a
+ // problem.)
+ //
+ // 2. We hint the heap to start right above the end of
+ // the binary so we have the best chance of keeping it
+ // contiguous.
+ //
+ // 3. We try to stake out a reasonably large initial
+ // heap reservation.
+
+ const arenaMetaSize = (1 << arenaBits) * unsafe.Sizeof(heapArena{})
+ meta := uintptr(sysReserve(nil, arenaMetaSize))
+ if meta != 0 {
+ mheap_.heapArenaAlloc.init(meta, arenaMetaSize)
+ }
+
+ // We want to start the arena low, but if we're linked
+ // against C code, it's possible global constructors
+ // have called malloc and adjusted the process' brk.
+ // Query the brk so we can avoid trying to map the
+ // region over it (which will cause the kernel to put
+ // the region somewhere else, likely at a high
+ // address).
+ procBrk := sbrk0()
+
+ // If we ask for the end of the data segment but the
+ // operating system requires a little more space
+ // before we can start allocating, it will give out a
+ // slightly higher pointer. Except QEMU, which is
+ // buggy, as usual: it won't adjust the pointer
+ // upward. So adjust it upward a little bit ourselves:
+ // 1/4 MB to get away from the running binary image.
+ p := firstmoduledata.end
+ if p < procBrk {
+ p = procBrk
+ }
+ if mheap_.heapArenaAlloc.next <= p && p < mheap_.heapArenaAlloc.end {
+ p = mheap_.heapArenaAlloc.end
+ }
+ p = alignUp(p+(256<<10), heapArenaBytes)
+ // Because we're worried about fragmentation on
+ // 32-bit, we try to make a large initial reservation.
+ arenaSizes := []uintptr{
+ 512 << 20,
+ 256 << 20,
+ 128 << 20,
+ }
+ for _, arenaSize := range arenaSizes {
+ a, size := sysReserveAligned(unsafe.Pointer(p), arenaSize, heapArenaBytes)
+ if a != nil {
+ mheap_.arena.init(uintptr(a), size)
+ p = mheap_.arena.end // For hint below
+ break
+ }
+ }
+ hint := (*arenaHint)(mheap_.arenaHintAlloc.alloc())
+ hint.addr = p
+ hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
+ }
+}
+
+// sysAlloc allocates heap arena space for at least n bytes. The
+// returned pointer is always heapArenaBytes-aligned and backed by
+// h.arenas metadata. The returned size is always a multiple of
+// heapArenaBytes. sysAlloc returns nil on failure.
+// There is no corresponding free function.
+//
+// sysAlloc returns a memory region in the Prepared state. This region must
+// be transitioned to Ready before use.
+//
+// h must be locked.
+func (h *mheap) sysAlloc(n uintptr) (v unsafe.Pointer, size uintptr) {
+ assertLockHeld(&h.lock)
+
+ n = alignUp(n, heapArenaBytes)
+
+ // First, try the arena pre-reservation.
+ v = h.arena.alloc(n, heapArenaBytes, &memstats.heap_sys)
+ if v != nil {
+ size = n
+ goto mapped
+ }
+
+ // Try to grow the heap at a hint address.
+ for h.arenaHints != nil {
+ hint := h.arenaHints
+ p := hint.addr
+ if hint.down {
+ p -= n
+ }
+ if p+n < p {
+ // We can't use this, so don't ask.
+ v = nil
+ } else if arenaIndex(p+n-1) >= 1<<arenaBits {
+ // Outside addressable heap. Can't use.
+ v = nil
+ } else {
+ v = sysReserve(unsafe.Pointer(p), n)
+ }
+ if p == uintptr(v) {
+ // Success. Update the hint.
+ if !hint.down {
+ p += n
+ }
+ hint.addr = p
+ size = n
+ break
+ }
+ // Failed. Discard this hint and try the next.
+ //
+ // TODO: This would be cleaner if sysReserve could be
+ // told to only return the requested address. In
+ // particular, this is already how Windows behaves, so
+ // it would simplify things there.
+ if v != nil {
+ sysFree(v, n, nil)
+ }
+ h.arenaHints = hint.next
+ h.arenaHintAlloc.free(unsafe.Pointer(hint))
+ }
+
+ if size == 0 {
+ if raceenabled {
+ // The race detector assumes the heap lives in
+ // [0x00c000000000, 0x00e000000000), but we
+ // just ran out of hints in this region. Give
+ // a nice failure.
+ throw("too many address space collisions for -race mode")
+ }
+
+ // All of the hints failed, so we'll take any
+ // (sufficiently aligned) address the kernel will give
+ // us.
+ v, size = sysReserveAligned(nil, n, heapArenaBytes)
+ if v == nil {
+ return nil, 0
+ }
+
+ // Create new hints for extending this region.
+ hint := (*arenaHint)(h.arenaHintAlloc.alloc())
+ hint.addr, hint.down = uintptr(v), true
+ hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
+ hint = (*arenaHint)(h.arenaHintAlloc.alloc())
+ hint.addr = uintptr(v) + size
+ hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
+ }
+
+ // Check for bad pointers or pointers we can't use.
+ {
+ var bad string
+ p := uintptr(v)
+ if p+size < p {
+ bad = "region exceeds uintptr range"
+ } else if arenaIndex(p) >= 1<<arenaBits {
+ bad = "base outside usable address space"
+ } else if arenaIndex(p+size-1) >= 1<<arenaBits {
+ bad = "end outside usable address space"
+ }
+ if bad != "" {
+ // This should be impossible on most architectures,
+ // but it would be really confusing to debug.
+ print("runtime: memory allocated by OS [", hex(p), ", ", hex(p+size), ") not in usable address space: ", bad, "\n")
+ throw("memory reservation exceeds address space limit")
+ }
+ }
+
+ if uintptr(v)&(heapArenaBytes-1) != 0 {
+ throw("misrounded allocation in sysAlloc")
+ }
+
+ // Transition from Reserved to Prepared.
+ sysMap(v, size, &memstats.heap_sys)
+
+mapped:
+ // Create arena metadata.
+ for ri := arenaIndex(uintptr(v)); ri <= arenaIndex(uintptr(v)+size-1); ri++ {
+ l2 := h.arenas[ri.l1()]
+ if l2 == nil {
+ // Allocate an L2 arena map.
+ l2 = (*[1 << arenaL2Bits]*heapArena)(persistentalloc(unsafe.Sizeof(*l2), sys.PtrSize, nil))
+ if l2 == nil {
+ throw("out of memory allocating heap arena map")
+ }
+ atomic.StorepNoWB(unsafe.Pointer(&h.arenas[ri.l1()]), unsafe.Pointer(l2))
+ }
+
+ if l2[ri.l2()] != nil {
+ throw("arena already initialized")
+ }
+ var r *heapArena
+ r = (*heapArena)(h.heapArenaAlloc.alloc(unsafe.Sizeof(*r), sys.PtrSize, &memstats.gcMiscSys))
+ if r == nil {
+ r = (*heapArena)(persistentalloc(unsafe.Sizeof(*r), sys.PtrSize, &memstats.gcMiscSys))
+ if r == nil {
+ throw("out of memory allocating heap arena metadata")
+ }
+ }
+
+ // Add the arena to the arenas list.
+ if len(h.allArenas) == cap(h.allArenas) {
+ size := 2 * uintptr(cap(h.allArenas)) * sys.PtrSize
+ if size == 0 {
+ size = physPageSize
+ }
+ newArray := (*notInHeap)(persistentalloc(size, sys.PtrSize, &memstats.gcMiscSys))
+ if newArray == nil {
+ throw("out of memory allocating allArenas")
+ }
+ oldSlice := h.allArenas
+ *(*notInHeapSlice)(unsafe.Pointer(&h.allArenas)) = notInHeapSlice{newArray, len(h.allArenas), int(size / sys.PtrSize)}
+ copy(h.allArenas, oldSlice)
+ // Do not free the old backing array because
+ // there may be concurrent readers. Since we
+ // double the array each time, this can lead
+ // to at most 2x waste.
+ }
+ h.allArenas = h.allArenas[:len(h.allArenas)+1]
+ h.allArenas[len(h.allArenas)-1] = ri
+
+ // Store atomically just in case an object from the
+ // new heap arena becomes visible before the heap lock
+ // is released (which shouldn't happen, but there's
+ // little downside to this).
+ atomic.StorepNoWB(unsafe.Pointer(&l2[ri.l2()]), unsafe.Pointer(r))
+ }
+
+ // Tell the race detector about the new heap memory.
+ if raceenabled {
+ racemapshadow(v, size)
+ }
+
+ return
+}
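+
+// Editorial note (illustrative sketch, not part of the original change): the
+// Reserved/Prepared/Ready flow described above can be pictured from a caller's
+// perspective. Assuming the caller already holds h.lock and wants n bytes it
+// can actually touch:
+//
+//	v, size := h.sysAlloc(n) // address space is now Reserved -> Prepared
+//	if v == nil {
+//		// out of address space or hints exhausted
+//	}
+//	sysUsed(v, size) // Prepared -> Ready; only now may the memory be accessed
+//
+// The real call sites in mheap differ in detail; this is only a sketch of the
+// state transitions named in the comment above.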
+
+// sysReserveAligned is like sysReserve, but the returned pointer is
+// aligned to align bytes. It may reserve either n or n+align bytes,
+// so it returns the size that was reserved.
+func sysReserveAligned(v unsafe.Pointer, size, align uintptr) (unsafe.Pointer, uintptr) {
+ // Since the alignment is rather large in uses of this
+ // function, we're not likely to get it by chance, so we ask
+ // for a larger region and remove the parts we don't need.
+ retries := 0
+retry:
+ p := uintptr(sysReserve(v, size+align))
+ switch {
+ case p == 0:
+ return nil, 0
+ case p&(align-1) == 0:
+ // We got lucky and got an aligned region, so we can
+ // use the whole thing.
+ return unsafe.Pointer(p), size + align
+ case GOOS == "windows":
+ // On Windows we can't release pieces of a
+ // reservation, so we release the whole thing and
+ // re-reserve the aligned sub-region. This may race,
+ // so we may have to try again.
+ sysFree(unsafe.Pointer(p), size+align, nil)
+ p = alignUp(p, align)
+ p2 := sysReserve(unsafe.Pointer(p), size)
+ if p != uintptr(p2) {
+ // Must have raced. Try again.
+ sysFree(p2, size, nil)
+ if retries++; retries == 100 {
+ throw("failed to allocate aligned heap memory; too many retries")
+ }
+ goto retry
+ }
+ // Success.
+ return p2, size
+ default:
+ // Trim off the unaligned parts.
+ pAligned := alignUp(p, align)
+ sysFree(unsafe.Pointer(p), pAligned-p, nil)
+ end := pAligned + size
+ endLen := (p + size + align) - end
+ if endLen > 0 {
+ sysFree(unsafe.Pointer(end), endLen, nil)
+ }
+ return unsafe.Pointer(pAligned), size
+ }
+}
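+
+// Editorial example (illustrative only, with made-up numbers): in the default
+// branch above, suppose align = 8, size = 64 and sysReserve returns p = 13.
+// Then pAligned = 16, the 3 bytes [13, 16) are freed from the front, end = 80,
+// and endLen = (13+64+8) - 80 = 5 bytes are freed from the back, leaving the
+// aligned region [16, 80) of exactly size bytes reserved.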
+
+// base address for all 0-byte allocations
+var zerobase uintptr
+
+// nextFreeFast returns the next free object if one is quickly available.
+// Otherwise it returns 0.
+func nextFreeFast(s *mspan) gclinkptr {
+ theBit := sys.Ctz64(s.allocCache) // Is there a free object in the allocCache?
+ if theBit < 64 {
+ result := s.freeindex + uintptr(theBit)
+ if result < s.nelems {
+ freeidx := result + 1
+ if freeidx%64 == 0 && freeidx != s.nelems {
+ return 0
+ }
+ s.allocCache >>= uint(theBit + 1)
+ s.freeindex = freeidx
+ s.allocCount++
+ return gclinkptr(result*s.elemsize + s.base())
+ }
+ }
+ return 0
+}
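+
+// Editorial example (illustrative only): if s.freeindex is 32 and the lowest
+// set bit of s.allocCache is bit 3, Ctz64 returns 3, so the object at index
+// 35 is handed out (35*elemsize past s.base()), freeindex advances to 36, and
+// allocCache is shifted right by 4 so that bit 0 again corresponds to freeindex.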
+
+// nextFree returns the next free object from the cached span if one is available.
+// Otherwise it refills the cache with a span that has an available object and
+// returns that object along with a flag indicating that this was a heavyweight
+// allocation. If it was, the caller must determine whether a new GC cycle
+// needs to be started or, if the GC is active, whether this goroutine needs
+// to assist the GC.
+//
+// Must run in a non-preemptible context since otherwise the owner of
+// c could change.
+func (c *mcache) nextFree(spc spanClass) (v gclinkptr, s *mspan, shouldhelpgc bool) {
+ s = c.alloc[spc]
+ shouldhelpgc = false
+ freeIndex := s.nextFreeIndex()
+ if freeIndex == s.nelems {
+ // The span is full.
+ if uintptr(s.allocCount) != s.nelems {
+ println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
+ throw("s.allocCount != s.nelems && freeIndex == s.nelems")
+ }
+ c.refill(spc)
+ shouldhelpgc = true
+ s = c.alloc[spc]
+
+ freeIndex = s.nextFreeIndex()
+ }
+
+ if freeIndex >= s.nelems {
+ throw("freeIndex is not valid")
+ }
+
+ v = gclinkptr(freeIndex*s.elemsize + s.base())
+ s.allocCount++
+ if uintptr(s.allocCount) > s.nelems {
+ println("s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
+ throw("s.allocCount > s.nelems")
+ }
+ return
+}
+
+// Allocate an object of size bytes.
+// Small objects are allocated from the per-P cache's free lists.
+// Large objects (> 32 kB) are allocated straight from the heap.
+func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
+ if gcphase == _GCmarktermination {
+ throw("mallocgc called with gcphase == _GCmarktermination")
+ }
+
+ if size == 0 {
+ return unsafe.Pointer(&zerobase)
+ }
+
+ if debug.malloc {
+ if debug.sbrk != 0 {
+ align := uintptr(16)
+ if typ != nil {
+ // TODO(austin): This should be just
+ // align = uintptr(typ.align)
+ // but that's only 4 on 32-bit platforms,
+ // even if there's a uint64 field in typ (see #599).
+ // This causes 64-bit atomic accesses to panic.
+ // Hence, we use stricter alignment that matches
+ // the normal allocator better.
+ if size&7 == 0 {
+ align = 8
+ } else if size&3 == 0 {
+ align = 4
+ } else if size&1 == 0 {
+ align = 2
+ } else {
+ align = 1
+ }
+ }
+ return persistentalloc(size, align, &memstats.other_sys)
+ }
+
+ if inittrace.active && inittrace.id == getg().goid {
+			// Init functions are executed sequentially in a single goroutine.
+ inittrace.allocs += 1
+ }
+ }
+
+ // assistG is the G to charge for this allocation, or nil if
+ // GC is not currently active.
+ var assistG *g
+ if gcBlackenEnabled != 0 {
+ // Charge the current user G for this allocation.
+ assistG = getg()
+ if assistG.m.curg != nil {
+ assistG = assistG.m.curg
+ }
+ // Charge the allocation against the G. We'll account
+ // for internal fragmentation at the end of mallocgc.
+ assistG.gcAssistBytes -= int64(size)
+
+ if assistG.gcAssistBytes < 0 {
+ // This G is in debt. Assist the GC to correct
+ // this before allocating. This must happen
+ // before disabling preemption.
+ gcAssistAlloc(assistG)
+ }
+ }
+
+ // Set mp.mallocing to keep from being preempted by GC.
+ mp := acquirem()
+ if mp.mallocing != 0 {
+ throw("malloc deadlock")
+ }
+ if mp.gsignal == getg() {
+ throw("malloc during signal")
+ }
+ mp.mallocing = 1
+
+ shouldhelpgc := false
+ dataSize := size
+ c := getMCache()
+ if c == nil {
+ throw("mallocgc called without a P or outside bootstrapping")
+ }
+ var span *mspan
+ var x unsafe.Pointer
+ noscan := typ == nil || typ.ptrdata == 0
+ if size <= maxSmallSize {
+ if noscan && size < maxTinySize {
+ // Tiny allocator.
+ //
+ // Tiny allocator combines several tiny allocation requests
+ // into a single memory block. The resulting memory block
+ // is freed when all subobjects are unreachable. The subobjects
+			// must be noscan (have no pointers); this ensures that
+			// the amount of potentially wasted memory is bounded.
+			//
+			// The size of the memory block used for combining (maxTinySize) is tunable.
+			// The current setting is 16 bytes, which corresponds to 2x worst-case memory
+			// wastage (when all but one of the subobjects are unreachable).
+			// 8 bytes would result in no wastage at all, but provides fewer
+			// opportunities for combining.
+			// 32 bytes provides more opportunities for combining,
+			// but can lead to 4x worst-case wastage.
+			// The best-case saving is 8x regardless of block size.
+ //
+ // Objects obtained from tiny allocator must not be freed explicitly.
+ // So when an object will be freed explicitly, we ensure that
+ // its size >= maxTinySize.
+ //
+			// SetFinalizer has a special case for objects potentially coming
+			// from the tiny allocator; in that case it allows finalizers to be
+			// set for an inner byte of a memory block.
+			//
+			// The main targets of the tiny allocator are small strings and
+			// standalone escaping variables. On a json benchmark
+			// the allocator reduces the number of allocations by ~12% and
+			// reduces the heap size by ~20%.
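+			//
+			// Editorial example (illustrative only): with maxTinySize = 16,
+			// a 5-byte noscan allocation followed by an 8-byte one can share
+			// one block: the first lands at offset 0 (tinyoffset becomes 5),
+			// the second is aligned up to offset 8 by the rules below and
+			// still fits, so tinyoffset becomes 16 and no new block is needed.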
+ off := c.tinyoffset
+ // Align tiny pointer for required (conservative) alignment.
+ if size&7 == 0 {
+ off = alignUp(off, 8)
+ } else if sys.PtrSize == 4 && size == 12 {
+ // Conservatively align 12-byte objects to 8 bytes on 32-bit
+				// systems so that objects whose first field is a 64-bit
+				// value are aligned to 8 bytes and do not cause a fault on
+ // atomic access. See issue 37262.
+ // TODO(mknyszek): Remove this workaround if/when issue 36606
+ // is resolved.
+ off = alignUp(off, 8)
+ } else if size&3 == 0 {
+ off = alignUp(off, 4)
+ } else if size&1 == 0 {
+ off = alignUp(off, 2)
+ }
+ if off+size <= maxTinySize && c.tiny != 0 {
+ // The object fits into existing tiny block.
+ x = unsafe.Pointer(c.tiny + off)
+ c.tinyoffset = off + size
+ c.tinyAllocs++
+ mp.mallocing = 0
+ releasem(mp)
+ return x
+ }
+ // Allocate a new maxTinySize block.
+ span = c.alloc[tinySpanClass]
+ v := nextFreeFast(span)
+ if v == 0 {
+ v, span, shouldhelpgc = c.nextFree(tinySpanClass)
+ }
+ x = unsafe.Pointer(v)
+ (*[2]uint64)(x)[0] = 0
+ (*[2]uint64)(x)[1] = 0
+ // See if we need to replace the existing tiny block with the new one
+ // based on amount of remaining free space.
+ if size < c.tinyoffset || c.tiny == 0 {
+ c.tiny = uintptr(x)
+ c.tinyoffset = size
+ }
+ size = maxTinySize
+ } else {
+ var sizeclass uint8
+ if size <= smallSizeMax-8 {
+ sizeclass = size_to_class8[divRoundUp(size, smallSizeDiv)]
+ } else {
+ sizeclass = size_to_class128[divRoundUp(size-smallSizeMax, largeSizeDiv)]
+ }
+ size = uintptr(class_to_size[sizeclass])
+ spc := makeSpanClass(sizeclass, noscan)
+ span = c.alloc[spc]
+ v := nextFreeFast(span)
+ if v == 0 {
+ v, span, shouldhelpgc = c.nextFree(spc)
+ }
+ x = unsafe.Pointer(v)
+ if needzero && span.needzero != 0 {
+ memclrNoHeapPointers(unsafe.Pointer(v), size)
+ }
+ }
+ } else {
+ shouldhelpgc = true
+ span = c.allocLarge(size, needzero, noscan)
+ span.freeindex = 1
+ span.allocCount = 1
+ x = unsafe.Pointer(span.base())
+ size = span.elemsize
+ }
+
+ var scanSize uintptr
+ if !noscan {
+ // If allocating a defer+arg block, now that we've picked a malloc size
+ // large enough to hold everything, cut the "asked for" size down to
+ // just the defer header, so that the GC bitmap will record the arg block
+ // as containing nothing at all (as if it were unused space at the end of
+ // a malloc block caused by size rounding).
+ // The defer arg areas are scanned as part of scanstack.
+ if typ == deferType {
+ dataSize = unsafe.Sizeof(_defer{})
+ }
+ heapBitsSetType(uintptr(x), size, dataSize, typ)
+ if dataSize > typ.size {
+ // Array allocation. If there are any
+ // pointers, GC has to scan to the last
+ // element.
+ if typ.ptrdata != 0 {
+ scanSize = dataSize - typ.size + typ.ptrdata
+ }
+ } else {
+ scanSize = typ.ptrdata
+ }
+ c.scanAlloc += scanSize
+ }
+
+ // Ensure that the stores above that initialize x to
+ // type-safe memory and set the heap bits occur before
+ // the caller can make x observable to the garbage
+ // collector. Otherwise, on weakly ordered machines,
+ // the garbage collector could follow a pointer to x,
+ // but see uninitialized memory or stale heap bits.
+ publicationBarrier()
+
+ // Allocate black during GC.
+ // All slots hold nil so no scanning is needed.
+ // This may be racing with GC so do it atomically if there can be
+ // a race marking the bit.
+ if gcphase != _GCoff {
+ gcmarknewobject(span, uintptr(x), size, scanSize)
+ }
+
+ if raceenabled {
+ racemalloc(x, size)
+ }
+
+ if msanenabled {
+ msanmalloc(x, size)
+ }
+
+ mp.mallocing = 0
+ releasem(mp)
+
+ if debug.malloc {
+ if debug.allocfreetrace != 0 {
+ tracealloc(x, size, typ)
+ }
+
+ if inittrace.active && inittrace.id == getg().goid {
+			// Init functions are executed sequentially in a single goroutine.
+ inittrace.bytes += uint64(size)
+ }
+ }
+
+ if rate := MemProfileRate; rate > 0 {
+ if rate != 1 && size < c.nextSample {
+ c.nextSample -= size
+ } else {
+ mp := acquirem()
+ profilealloc(mp, x, size)
+ releasem(mp)
+ }
+ }
+
+ if assistG != nil {
+ // Account for internal fragmentation in the assist
+ // debt now that we know it.
+ assistG.gcAssistBytes -= int64(size - dataSize)
+ }
+
+ if shouldhelpgc {
+ if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
+ gcStart(t)
+ }
+ }
+
+ return x
+}
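+
+// Editorial note (illustrative only): for a 33-byte noscan request, the small
+// path above rounds the size up to its size class (48 bytes with the class
+// tables in this version), hands out a slot from the matching span in the
+// per-P mcache, and only falls back to refilling from mcentral/mheap via
+// nextFree when that span is exhausted.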
+
+// newobject is the implementation of the new builtin.
+// The compiler (both frontend and SSA backend) knows the signature
+// of this function.
+func newobject(typ *_type) unsafe.Pointer {
+ return mallocgc(typ.size, typ, true)
+}
+
+//go:linkname reflect_unsafe_New reflect.unsafe_New
+func reflect_unsafe_New(typ *_type) unsafe.Pointer {
+ return mallocgc(typ.size, typ, true)
+}
+
+//go:linkname reflectlite_unsafe_New internal/reflectlite.unsafe_New
+func reflectlite_unsafe_New(typ *_type) unsafe.Pointer {
+ return mallocgc(typ.size, typ, true)
+}
+
+// newarray allocates an array of n elements of type typ.
+func newarray(typ *_type, n int) unsafe.Pointer {
+ if n == 1 {
+ return mallocgc(typ.size, typ, true)
+ }
+ mem, overflow := math.MulUintptr(typ.size, uintptr(n))
+ if overflow || mem > maxAlloc || n < 0 {
+ panic(plainError("runtime: allocation size out of range"))
+ }
+ return mallocgc(mem, typ, true)
+}
+
+//go:linkname reflect_unsafe_NewArray reflect.unsafe_NewArray
+func reflect_unsafe_NewArray(typ *_type, n int) unsafe.Pointer {
+ return newarray(typ, n)
+}
+
+func profilealloc(mp *m, x unsafe.Pointer, size uintptr) {
+ c := getMCache()
+ if c == nil {
+ throw("profilealloc called without a P or outside bootstrapping")
+ }
+ c.nextSample = nextSample()
+ mProf_Malloc(x, size)
+}
+
+// nextSample returns the next sampling point for heap profiling. The goal is
+// to sample allocations on average every MemProfileRate bytes, but with a
+// completely random distribution over the allocation timeline; this
+// corresponds to a Poisson process with parameter MemProfileRate. In Poisson
+// processes, the distance between two samples follows an exponential
+// distribution with mean MemProfileRate, so the best return value is a
+// random number drawn from an exponential distribution with that mean.
+func nextSample() uintptr {
+ if MemProfileRate == 1 {
+		// Callers assign our return value to
+		// mcache.nextSample, but nextSample is not used
+		// when the rate is 1. So avoid the math below and
+		// just return something.
+ return 0
+ }
+ if GOOS == "plan9" {
+ // Plan 9 doesn't support floating point in note handler.
+ if g := getg(); g == g.m.gsignal {
+ return nextSampleNoFP()
+ }
+ }
+
+ return uintptr(fastexprand(MemProfileRate))
+}
+
+// fastexprand returns a random number from an exponential distribution with
+// the specified mean.
+func fastexprand(mean int) int32 {
+ // Avoid overflow. Maximum possible step is
+ // -ln(1/(1<<randomBitCount)) * mean, approximately 20 * mean.
+ switch {
+ case mean > 0x7000000:
+ mean = 0x7000000
+ case mean == 0:
+ return 0
+ }
+
+	// Take a random sample of an exponential distribution with the given mean.
+	// Its probability density function is (1/mean)*exp(-x/mean), so the CDF is
+	// p = 1 - exp(-x/mean). Setting
+	// q = 1 - p == exp(-x/mean) gives
+	// log_e(q) = -x/mean
+	// x = -log_e(q) * mean
+	// x = log_2(q) * (-log_e(2)) * mean ; using log_2 for efficiency
+ const randomBitCount = 26
+ q := fastrand()%(1<<randomBitCount) + 1
+ qlog := fastlog2(float64(q)) - randomBitCount
+ if qlog > 0 {
+ qlog = 0
+ }
+ const minusLog2 = -0.6931471805599453 // -ln(2)
+ return int32(qlog*(minusLog2*float64(mean))) + 1
+}
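+
+// Editorial example (illustrative only): with the default MemProfileRate of
+// 512 KiB, a draw of q = 1<<25 gives qlog = 25 - 26 = -1, so fastexprand
+// returns roughly ln(2) * 524288, about 363409 bytes, which matches
+// -ln(0.5) * mean for a uniform sample of 0.5.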
+
+// nextSampleNoFP is similar to nextSample, but uses older,
+// simpler code to avoid floating point.
+func nextSampleNoFP() uintptr {
+ // Set first allocation sample size.
+ rate := MemProfileRate
+ if rate > 0x3fffffff { // make 2*rate not overflow
+ rate = 0x3fffffff
+ }
+ if rate != 0 {
+ return uintptr(fastrand() % uint32(2*rate))
+ }
+ return 0
+}
+
+type persistentAlloc struct {
+ base *notInHeap
+ off uintptr
+}
+
+var globalAlloc struct {
+ mutex
+ persistentAlloc
+}
+
+// persistentChunkSize is the number of bytes we allocate when we grow
+// a persistentAlloc.
+const persistentChunkSize = 256 << 10
+
+// persistentChunks is a list of all the persistent chunks we have
+// allocated. The list is maintained through the first word in the
+// persistent chunk. This is updated atomically.
+var persistentChunks *notInHeap
+
+// Wrapper around sysAlloc that can allocate small chunks.
+// There is no associated free operation.
+// Intended for things like function/type/debug-related persistent data.
+// If align is 0, uses default align (currently 8).
+// The returned memory will be zeroed.
+//
+// Consider marking persistentalloc'd types go:notinheap.
+func persistentalloc(size, align uintptr, sysStat *sysMemStat) unsafe.Pointer {
+ var p *notInHeap
+ systemstack(func() {
+ p = persistentalloc1(size, align, sysStat)
+ })
+ return unsafe.Pointer(p)
+}
+
+// Must run on system stack because stack growth can (re)invoke it.
+// See issue 9174.
+//go:systemstack
+func persistentalloc1(size, align uintptr, sysStat *sysMemStat) *notInHeap {
+ const (
+ maxBlock = 64 << 10 // VM reservation granularity is 64K on windows
+ )
+
+ if size == 0 {
+ throw("persistentalloc: size == 0")
+ }
+ if align != 0 {
+ if align&(align-1) != 0 {
+ throw("persistentalloc: align is not a power of 2")
+ }
+ if align > _PageSize {
+ throw("persistentalloc: align is too large")
+ }
+ } else {
+ align = 8
+ }
+
+ if size >= maxBlock {
+ return (*notInHeap)(sysAlloc(size, sysStat))
+ }
+
+ mp := acquirem()
+ var persistent *persistentAlloc
+ if mp != nil && mp.p != 0 {
+ persistent = &mp.p.ptr().palloc
+ } else {
+ lock(&globalAlloc.mutex)
+ persistent = &globalAlloc.persistentAlloc
+ }
+ persistent.off = alignUp(persistent.off, align)
+ if persistent.off+size > persistentChunkSize || persistent.base == nil {
+ persistent.base = (*notInHeap)(sysAlloc(persistentChunkSize, &memstats.other_sys))
+ if persistent.base == nil {
+ if persistent == &globalAlloc.persistentAlloc {
+ unlock(&globalAlloc.mutex)
+ }
+ throw("runtime: cannot allocate memory")
+ }
+
+ // Add the new chunk to the persistentChunks list.
+ for {
+ chunks := uintptr(unsafe.Pointer(persistentChunks))
+ *(*uintptr)(unsafe.Pointer(persistent.base)) = chunks
+ if atomic.Casuintptr((*uintptr)(unsafe.Pointer(&persistentChunks)), chunks, uintptr(unsafe.Pointer(persistent.base))) {
+ break
+ }
+ }
+ persistent.off = alignUp(sys.PtrSize, align)
+ }
+ p := persistent.base.add(persistent.off)
+ persistent.off += size
+ releasem(mp)
+ if persistent == &globalAlloc.persistentAlloc {
+ unlock(&globalAlloc.mutex)
+ }
+
+ if sysStat != &memstats.other_sys {
+ sysStat.add(int64(size))
+ memstats.other_sys.add(-int64(size))
+ }
+ return p
+}
+
+// inPersistentAlloc reports whether p points to memory allocated by
+// persistentalloc. This must be nosplit because it is called by the
+// cgo checker code, which is called by the write barrier code.
+//go:nosplit
+func inPersistentAlloc(p uintptr) bool {
+ chunk := atomic.Loaduintptr((*uintptr)(unsafe.Pointer(&persistentChunks)))
+ for chunk != 0 {
+ if p >= chunk && p < chunk+persistentChunkSize {
+ return true
+ }
+ chunk = *(*uintptr)(unsafe.Pointer(chunk))
+ }
+ return false
+}
+
+// linearAlloc is a simple linear allocator that pre-reserves a region
+// of memory and then maps that region into the Ready state as needed. The
+// caller is responsible for locking.
+type linearAlloc struct {
+ next uintptr // next free byte
+ mapped uintptr // one byte past end of mapped space
+ end uintptr // end of reserved space
+}
+
+func (l *linearAlloc) init(base, size uintptr) {
+ if base+size < base {
+ // Chop off the last byte. The runtime isn't prepared
+ // to deal with situations where the bounds could overflow.
+ // Leave that memory reserved, though, so we don't map it
+ // later.
+ size -= 1
+ }
+ l.next, l.mapped = base, base
+ l.end = base + size
+}
+
+func (l *linearAlloc) alloc(size, align uintptr, sysStat *sysMemStat) unsafe.Pointer {
+ p := alignUp(l.next, align)
+ if p+size > l.end {
+ return nil
+ }
+ l.next = p + size
+ if pEnd := alignUp(l.next-1, physPageSize); pEnd > l.mapped {
+ // Transition from Reserved to Prepared to Ready.
+ sysMap(unsafe.Pointer(l.mapped), pEnd-l.mapped, sysStat)
+ sysUsed(unsafe.Pointer(l.mapped), pEnd-l.mapped)
+ l.mapped = pEnd
+ }
+ return unsafe.Pointer(p)
+}
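+
+// Editorial example (illustrative only, assuming physPageSize = 4096): for a
+// linearAlloc with base 4096, alloc(100, 8) returns 4096, sets next to 4196
+// and maps/readies the page [4096, 8192); a following alloc(5000, 8) returns
+// 4200 and extends the mapped prefix to [4096, 12288). Only whole pages that
+// l.next has crossed are ever transitioned to Ready.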
+
+// notInHeap is off-heap memory allocated by a lower-level allocator
+// like sysAlloc or persistentAlloc.
+//
+// In general, it's better to use real types marked as go:notinheap,
+// but this serves as a generic type for situations where that isn't
+// possible (like in the allocators).
+//
+// TODO: Use this as the return type of sysAlloc, persistentAlloc, etc?
+//
+//go:notinheap
+type notInHeap struct{}
+
+func (p *notInHeap) add(bytes uintptr) *notInHeap {
+ return (*notInHeap)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + bytes))
+}
diff --git a/src/runtime/malloc_test.go b/src/runtime/malloc_test.go
new file mode 100644
index 0000000..4ba94d0
--- /dev/null
+++ b/src/runtime/malloc_test.go
@@ -0,0 +1,455 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "flag"
+ "fmt"
+ "internal/race"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "reflect"
+ "runtime"
+ . "runtime"
+ "strings"
+ "sync/atomic"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+var testMemStatsCount int
+
+func TestMemStats(t *testing.T) {
+ testMemStatsCount++
+
+ // Make sure there's at least one forced GC.
+ GC()
+
+ // Test that MemStats has sane values.
+ st := new(MemStats)
+ ReadMemStats(st)
+
+ nz := func(x interface{}) error {
+ if x != reflect.Zero(reflect.TypeOf(x)).Interface() {
+ return nil
+ }
+ return fmt.Errorf("zero value")
+ }
+ le := func(thresh float64) func(interface{}) error {
+ return func(x interface{}) error {
+ // These sanity tests aren't necessarily valid
+ // with high -test.count values, so only run
+ // them once.
+ if testMemStatsCount > 1 {
+ return nil
+ }
+
+ if reflect.ValueOf(x).Convert(reflect.TypeOf(thresh)).Float() < thresh {
+ return nil
+ }
+ return fmt.Errorf("insanely high value (overflow?); want <= %v", thresh)
+ }
+ }
+ eq := func(x interface{}) func(interface{}) error {
+ return func(y interface{}) error {
+ if x == y {
+ return nil
+ }
+ return fmt.Errorf("want %v", x)
+ }
+ }
+	// Of the uint fields, HeapReleased and HeapIdle can be 0.
+ // PauseTotalNs can be 0 if timer resolution is poor.
+ fields := map[string][]func(interface{}) error{
+ "Alloc": {nz, le(1e10)}, "TotalAlloc": {nz, le(1e11)}, "Sys": {nz, le(1e10)},
+ "Lookups": {eq(uint64(0))}, "Mallocs": {nz, le(1e10)}, "Frees": {nz, le(1e10)},
+ "HeapAlloc": {nz, le(1e10)}, "HeapSys": {nz, le(1e10)}, "HeapIdle": {le(1e10)},
+ "HeapInuse": {nz, le(1e10)}, "HeapReleased": {le(1e10)}, "HeapObjects": {nz, le(1e10)},
+ "StackInuse": {nz, le(1e10)}, "StackSys": {nz, le(1e10)},
+ "MSpanInuse": {nz, le(1e10)}, "MSpanSys": {nz, le(1e10)},
+ "MCacheInuse": {nz, le(1e10)}, "MCacheSys": {nz, le(1e10)},
+ "BuckHashSys": {nz, le(1e10)}, "GCSys": {nz, le(1e10)}, "OtherSys": {nz, le(1e10)},
+ "NextGC": {nz, le(1e10)}, "LastGC": {nz},
+ "PauseTotalNs": {le(1e11)}, "PauseNs": nil, "PauseEnd": nil,
+ "NumGC": {nz, le(1e9)}, "NumForcedGC": {nz, le(1e9)},
+ "GCCPUFraction": {le(0.99)}, "EnableGC": {eq(true)}, "DebugGC": {eq(false)},
+ "BySize": nil,
+ }
+
+ rst := reflect.ValueOf(st).Elem()
+ for i := 0; i < rst.Type().NumField(); i++ {
+ name, val := rst.Type().Field(i).Name, rst.Field(i).Interface()
+ checks, ok := fields[name]
+ if !ok {
+ t.Errorf("unknown MemStats field %s", name)
+ continue
+ }
+ for _, check := range checks {
+ if err := check(val); err != nil {
+ t.Errorf("%s = %v: %s", name, val, err)
+ }
+ }
+ }
+
+ if st.Sys != st.HeapSys+st.StackSys+st.MSpanSys+st.MCacheSys+
+ st.BuckHashSys+st.GCSys+st.OtherSys {
+ t.Fatalf("Bad sys value: %+v", *st)
+ }
+
+ if st.HeapIdle+st.HeapInuse != st.HeapSys {
+ t.Fatalf("HeapIdle(%d) + HeapInuse(%d) should be equal to HeapSys(%d), but isn't.", st.HeapIdle, st.HeapInuse, st.HeapSys)
+ }
+
+ if lpe := st.PauseEnd[int(st.NumGC+255)%len(st.PauseEnd)]; st.LastGC != lpe {
+ t.Fatalf("LastGC(%d) != last PauseEnd(%d)", st.LastGC, lpe)
+ }
+
+ var pauseTotal uint64
+ for _, pause := range st.PauseNs {
+ pauseTotal += pause
+ }
+ if int(st.NumGC) < len(st.PauseNs) {
+ // We have all pauses, so this should be exact.
+ if st.PauseTotalNs != pauseTotal {
+ t.Fatalf("PauseTotalNs(%d) != sum PauseNs(%d)", st.PauseTotalNs, pauseTotal)
+ }
+ for i := int(st.NumGC); i < len(st.PauseNs); i++ {
+ if st.PauseNs[i] != 0 {
+ t.Fatalf("Non-zero PauseNs[%d]: %+v", i, st)
+ }
+ if st.PauseEnd[i] != 0 {
+ t.Fatalf("Non-zero PauseEnd[%d]: %+v", i, st)
+ }
+ }
+ } else {
+ if st.PauseTotalNs < pauseTotal {
+ t.Fatalf("PauseTotalNs(%d) < sum PauseNs(%d)", st.PauseTotalNs, pauseTotal)
+ }
+ }
+
+ if st.NumForcedGC > st.NumGC {
+ t.Fatalf("NumForcedGC(%d) > NumGC(%d)", st.NumForcedGC, st.NumGC)
+ }
+}
+
+func TestStringConcatenationAllocs(t *testing.T) {
+ n := testing.AllocsPerRun(1e3, func() {
+ b := make([]byte, 10)
+ for i := 0; i < 10; i++ {
+ b[i] = byte(i) + '0'
+ }
+ s := "foo" + string(b)
+ if want := "foo0123456789"; s != want {
+ t.Fatalf("want %v, got %v", want, s)
+ }
+ })
+ // Only string concatenation allocates.
+ if n != 1 {
+ t.Fatalf("want 1 allocation, got %v", n)
+ }
+}
+
+func TestTinyAlloc(t *testing.T) {
+ const N = 16
+ var v [N]unsafe.Pointer
+ for i := range v {
+ v[i] = unsafe.Pointer(new(byte))
+ }
+
+ chunks := make(map[uintptr]bool, N)
+ for _, p := range v {
+ chunks[uintptr(p)&^7] = true
+ }
+
+ if len(chunks) == N {
+ t.Fatal("no bytes allocated within the same 8-byte chunk")
+ }
+}
+
+var (
+ tinyByteSink *byte
+ tinyUint32Sink *uint32
+ tinyObj12Sink *obj12
+)
+
+type obj12 struct {
+ a uint64
+ b uint32
+}
+
+func TestTinyAllocIssue37262(t *testing.T) {
+ // Try to cause an alignment access fault
+ // by atomically accessing the first 64-bit
+ // value of a tiny-allocated object.
+ // See issue 37262 for details.
+
+ // GC twice, once to reach a stable heap state
+ // and again to make sure we finish the sweep phase.
+ runtime.GC()
+ runtime.GC()
+
+ // Make 1-byte allocations until we get a fresh tiny slot.
+ aligned := false
+ for i := 0; i < 16; i++ {
+ tinyByteSink = new(byte)
+ if uintptr(unsafe.Pointer(tinyByteSink))&0xf == 0xf {
+ aligned = true
+ break
+ }
+ }
+ if !aligned {
+ t.Fatal("unable to get a fresh tiny slot")
+ }
+
+ // Create a 4-byte object so that the current
+ // tiny slot is partially filled.
+ tinyUint32Sink = new(uint32)
+
+ // Create a 12-byte object, which fits into the
+	// tiny slot. If it actually gets placed there,
+ // then the field "a" will be improperly aligned
+ // for atomic access on 32-bit architectures.
+ // This won't be true if issue 36606 gets resolved.
+ tinyObj12Sink = new(obj12)
+
+ // Try to atomically access "x.a".
+ atomic.StoreUint64(&tinyObj12Sink.a, 10)
+
+ // Clear the sinks.
+ tinyByteSink = nil
+ tinyUint32Sink = nil
+ tinyObj12Sink = nil
+}
+
+func TestPageCacheLeak(t *testing.T) {
+ defer GOMAXPROCS(GOMAXPROCS(1))
+ leaked := PageCachePagesLeaked()
+ if leaked != 0 {
+ t.Fatalf("found %d leaked pages in page caches", leaked)
+ }
+}
+
+func TestPhysicalMemoryUtilization(t *testing.T) {
+ got := runTestProg(t, "testprog", "GCPhys")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got %q", want, got)
+ }
+}
+
+func TestScavengedBitsCleared(t *testing.T) {
+ var mismatches [128]BitsMismatch
+ if n, ok := CheckScavengedBitsCleared(mismatches[:]); !ok {
+ t.Errorf("uncleared scavenged bits")
+ for _, m := range mismatches[:n] {
+ t.Logf("\t@ address 0x%x", m.Base)
+ t.Logf("\t| got: %064b", m.Got)
+ t.Logf("\t| want: %064b", m.Want)
+ }
+ t.FailNow()
+ }
+}
+
+type acLink struct {
+ x [1 << 20]byte
+}
+
+var arenaCollisionSink []*acLink
+
+func TestArenaCollision(t *testing.T) {
+ testenv.MustHaveExec(t)
+
+ // Test that mheap.sysAlloc handles collisions with other
+ // memory mappings.
+ if os.Getenv("TEST_ARENA_COLLISION") != "1" {
+ cmd := testenv.CleanCmdEnv(exec.Command(os.Args[0], "-test.run=TestArenaCollision", "-test.v"))
+ cmd.Env = append(cmd.Env, "TEST_ARENA_COLLISION=1")
+ out, err := cmd.CombinedOutput()
+ if race.Enabled {
+ // This test runs the runtime out of hint
+ // addresses, so it will start mapping the
+ // heap wherever it can. The race detector
+ // doesn't support this, so look for the
+ // expected failure.
+ if want := "too many address space collisions"; !strings.Contains(string(out), want) {
+ t.Fatalf("want %q, got:\n%s", want, string(out))
+ }
+ } else if !strings.Contains(string(out), "PASS\n") || err != nil {
+ t.Fatalf("%s\n(exit status %v)", string(out), err)
+ }
+ return
+ }
+ disallowed := [][2]uintptr{}
+ // Drop all but the next 3 hints. 64-bit has a lot of hints,
+ // so it would take a lot of memory to go through all of them.
+ KeepNArenaHints(3)
+ // Consume these 3 hints and force the runtime to find some
+ // fallback hints.
+ for i := 0; i < 5; i++ {
+ // Reserve memory at the next hint so it can't be used
+ // for the heap.
+ start, end := MapNextArenaHint()
+ disallowed = append(disallowed, [2]uintptr{start, end})
+ // Allocate until the runtime tries to use the hint we
+ // just mapped over.
+ hint := GetNextArenaHint()
+ for GetNextArenaHint() == hint {
+ ac := new(acLink)
+ arenaCollisionSink = append(arenaCollisionSink, ac)
+ // The allocation must not have fallen into
+ // one of the reserved regions.
+ p := uintptr(unsafe.Pointer(ac))
+ for _, d := range disallowed {
+ if d[0] <= p && p < d[1] {
+ t.Fatalf("allocation %#x in reserved region [%#x, %#x)", p, d[0], d[1])
+ }
+ }
+ }
+ }
+}
+
+var mallocSink uintptr
+
+func BenchmarkMalloc8(b *testing.B) {
+ var x uintptr
+ for i := 0; i < b.N; i++ {
+ p := new(int64)
+ x ^= uintptr(unsafe.Pointer(p))
+ }
+ mallocSink = x
+}
+
+func BenchmarkMalloc16(b *testing.B) {
+ var x uintptr
+ for i := 0; i < b.N; i++ {
+ p := new([2]int64)
+ x ^= uintptr(unsafe.Pointer(p))
+ }
+ mallocSink = x
+}
+
+func BenchmarkMallocTypeInfo8(b *testing.B) {
+ var x uintptr
+ for i := 0; i < b.N; i++ {
+ p := new(struct {
+ p [8 / unsafe.Sizeof(uintptr(0))]*int
+ })
+ x ^= uintptr(unsafe.Pointer(p))
+ }
+ mallocSink = x
+}
+
+func BenchmarkMallocTypeInfo16(b *testing.B) {
+ var x uintptr
+ for i := 0; i < b.N; i++ {
+ p := new(struct {
+ p [16 / unsafe.Sizeof(uintptr(0))]*int
+ })
+ x ^= uintptr(unsafe.Pointer(p))
+ }
+ mallocSink = x
+}
+
+type LargeStruct struct {
+ x [16][]byte
+}
+
+func BenchmarkMallocLargeStruct(b *testing.B) {
+ var x uintptr
+ for i := 0; i < b.N; i++ {
+ p := make([]LargeStruct, 2)
+ x ^= uintptr(unsafe.Pointer(&p[0]))
+ }
+ mallocSink = x
+}
+
+var n = flag.Int("n", 1000, "number of goroutines")
+
+func BenchmarkGoroutineSelect(b *testing.B) {
+ quit := make(chan struct{})
+ read := func(ch chan struct{}) {
+ for {
+ select {
+ case _, ok := <-ch:
+ if !ok {
+ return
+ }
+ case <-quit:
+ return
+ }
+ }
+ }
+ benchHelper(b, *n, read)
+}
+
+func BenchmarkGoroutineBlocking(b *testing.B) {
+ read := func(ch chan struct{}) {
+ for {
+ if _, ok := <-ch; !ok {
+ return
+ }
+ }
+ }
+ benchHelper(b, *n, read)
+}
+
+func BenchmarkGoroutineForRange(b *testing.B) {
+ read := func(ch chan struct{}) {
+ for range ch {
+ }
+ }
+ benchHelper(b, *n, read)
+}
+
+func benchHelper(b *testing.B, n int, read func(chan struct{})) {
+ m := make([]chan struct{}, n)
+ for i := range m {
+ m[i] = make(chan struct{}, 1)
+ go read(m[i])
+ }
+ b.StopTimer()
+ b.ResetTimer()
+ GC()
+
+ for i := 0; i < b.N; i++ {
+ for _, ch := range m {
+ if ch != nil {
+ ch <- struct{}{}
+ }
+ }
+ time.Sleep(10 * time.Millisecond)
+ b.StartTimer()
+ GC()
+ b.StopTimer()
+ }
+
+ for _, ch := range m {
+ close(ch)
+ }
+ time.Sleep(10 * time.Millisecond)
+}
+
+func BenchmarkGoroutineIdle(b *testing.B) {
+ quit := make(chan struct{})
+ fn := func() {
+ <-quit
+ }
+ for i := 0; i < *n; i++ {
+ go fn()
+ }
+
+ GC()
+ b.ResetTimer()
+
+ for i := 0; i < b.N; i++ {
+ GC()
+ }
+
+ b.StopTimer()
+ close(quit)
+ time.Sleep(10 * time.Millisecond)
+}
diff --git a/src/runtime/map.go b/src/runtime/map.go
new file mode 100644
index 0000000..0beff57
--- /dev/null
+++ b/src/runtime/map.go
@@ -0,0 +1,1384 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// This file contains the implementation of Go's map type.
+//
+// A map is just a hash table. The data is arranged
+// into an array of buckets. Each bucket contains up to
+// 8 key/elem pairs. The low-order bits of the hash are
+// used to select a bucket. Each bucket contains a few
+// high-order bits of each hash to distinguish the entries
+// within a single bucket.
+//
+// If more than 8 keys hash to a bucket, we chain on
+// extra buckets.
+//
+// When the hashtable grows, we allocate a new array
+// of buckets twice as big. Buckets are incrementally
+// copied from the old bucket array to the new bucket array.
+//
+// Map iterators walk through the array of buckets and
+// return the keys in walk order (bucket #, then overflow
+// chain order, then bucket index). To maintain iteration
+// semantics, we never move keys within their bucket (if
+// we did, keys might be returned 0 or 2 times). When
+// growing the table, iterators keep iterating through the
+// old table and must check the new table if the bucket
+// they are iterating through has been moved ("evacuated")
+// to the new table.
+
+// Picking loadFactor: too large and we have lots of overflow
+// buckets, too small and we waste a lot of space. I wrote
+// a simple program to check some stats for different loads:
+// (64-bit, 8 byte keys and elems)
+// loadFactor %overflow bytes/entry hitprobe missprobe
+// 4.00 2.13 20.77 3.00 4.00
+// 4.50 4.05 17.30 3.25 4.50
+// 5.00 6.85 14.77 3.50 5.00
+// 5.50 10.55 12.94 3.75 5.50
+// 6.00 15.27 11.67 4.00 6.00
+// 6.50 20.90 10.79 4.25 6.50
+// 7.00 27.14 10.15 4.50 7.00
+// 7.50 34.03 9.73 4.75 7.50
+// 8.00 41.10 9.40 5.00 8.00
+//
+// %overflow = percentage of buckets which have an overflow bucket
+// bytes/entry = overhead bytes used per key/elem pair
+// hitprobe = # of entries to check when looking up a present key
+// missprobe = # of entries to check when looking up an absent key
+//
+// Keep in mind this data is for maximally loaded tables, i.e. just
+// before the table grows. Typical tables will be somewhat less loaded.
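+
+// Editorial note (illustrative only): the chosen loadFactor of 6.5 appears
+// below as loadFactorNum/loadFactorDen = 13/2, so a map with 2^B buckets
+// triggers growth once its count exceeds 13*(2^B)/2; for example, a
+// 16-bucket map (B = 4) grows when asked to hold more than 104 entries.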
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/math"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ // Maximum number of key/elem pairs a bucket can hold.
+ bucketCntBits = 3
+ bucketCnt = 1 << bucketCntBits
+
+ // Maximum average load of a bucket that triggers growth is 6.5.
+ // Represent as loadFactorNum/loadFactorDen, to allow integer math.
+ loadFactorNum = 13
+ loadFactorDen = 2
+
+ // Maximum key or elem size to keep inline (instead of mallocing per element).
+ // Must fit in a uint8.
+ // Fast versions cannot handle big elems - the cutoff size for
+	// fast versions in cmd/compile/internal/gc/walk.go must be at most this value.
+ maxKeySize = 128
+ maxElemSize = 128
+
+ // data offset should be the size of the bmap struct, but needs to be
+ // aligned correctly. For amd64p32 this means 64-bit alignment
+ // even though pointers are 32 bit.
+ dataOffset = unsafe.Offsetof(struct {
+ b bmap
+ v int64
+ }{}.v)
+
+ // Possible tophash values. We reserve a few possibilities for special marks.
+ // Each bucket (including its overflow buckets, if any) will have either all or none of its
+ // entries in the evacuated* states (except during the evacuate() method, which only happens
+ // during map writes and thus no one else can observe the map during that time).
+ emptyRest = 0 // this cell is empty, and there are no more non-empty cells at higher indexes or overflows.
+ emptyOne = 1 // this cell is empty
+ evacuatedX = 2 // key/elem is valid. Entry has been evacuated to first half of larger table.
+ evacuatedY = 3 // same as above, but evacuated to second half of larger table.
+ evacuatedEmpty = 4 // cell is empty, bucket is evacuated.
+ minTopHash = 5 // minimum tophash for a normal filled cell.
+
+ // flags
+ iterator = 1 // there may be an iterator using buckets
+ oldIterator = 2 // there may be an iterator using oldbuckets
+ hashWriting = 4 // a goroutine is writing to the map
+ sameSizeGrow = 8 // the current map growth is to a new map of the same size
+
+ // sentinel bucket ID for iterator checks
+ noCheck = 1<<(8*sys.PtrSize) - 1
+)
+
+// isEmpty reports whether the given tophash array entry represents an empty bucket entry.
+func isEmpty(x uint8) bool {
+ return x <= emptyOne
+}
+
+// A header for a Go map.
+type hmap struct {
+ // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
+ // Make sure this stays in sync with the compiler's definition.
+ count int // # live cells == size of map. Must be first (used by len() builtin)
+ flags uint8
+ B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
+ noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
+ hash0 uint32 // hash seed
+
+ buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
+ oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
+ nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated)
+
+ extra *mapextra // optional fields
+}
+
+// mapextra holds fields that are not present on all maps.
+type mapextra struct {
+ // If both key and elem do not contain pointers and are inline, then we mark bucket
+ // type as containing no pointers. This avoids scanning such maps.
+ // However, bmap.overflow is a pointer. In order to keep overflow buckets
+ // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
+ // overflow and oldoverflow are only used if key and elem do not contain pointers.
+ // overflow contains overflow buckets for hmap.buckets.
+ // oldoverflow contains overflow buckets for hmap.oldbuckets.
+	// The indirection allows storing a pointer to the slice in hiter.
+ overflow *[]*bmap
+ oldoverflow *[]*bmap
+
+ // nextOverflow holds a pointer to a free overflow bucket.
+ nextOverflow *bmap
+}
+
+// A bucket for a Go map.
+type bmap struct {
+ // tophash generally contains the top byte of the hash value
+ // for each key in this bucket. If tophash[0] < minTopHash,
+ // tophash[0] is a bucket evacuation state instead.
+ tophash [bucketCnt]uint8
+ // Followed by bucketCnt keys and then bucketCnt elems.
+ // NOTE: packing all the keys together and then all the elems together makes the
+ // code a bit more complicated than alternating key/elem/key/elem/... but it allows
+ // us to eliminate padding which would be needed for, e.g., map[int64]int8.
+ // Followed by an overflow pointer.
+}
+
+// A hash iteration structure.
+// If you modify hiter, also change cmd/compile/internal/gc/reflect.go to indicate
+// the layout of this structure.
+type hiter struct {
+ key unsafe.Pointer // Must be in first position. Write nil to indicate iteration end (see cmd/compile/internal/gc/range.go).
+ elem unsafe.Pointer // Must be in second position (see cmd/compile/internal/gc/range.go).
+ t *maptype
+ h *hmap
+ buckets unsafe.Pointer // bucket ptr at hash_iter initialization time
+ bptr *bmap // current bucket
+ overflow *[]*bmap // keeps overflow buckets of hmap.buckets alive
+ oldoverflow *[]*bmap // keeps overflow buckets of hmap.oldbuckets alive
+ startBucket uintptr // bucket iteration started at
+ offset uint8 // intra-bucket offset to start from during iteration (should be big enough to hold bucketCnt-1)
+ wrapped bool // already wrapped around from end of bucket array to beginning
+ B uint8
+ i uint8
+ bucket uintptr
+ checkBucket uintptr
+}
+
+// bucketShift returns 1<<b, optimized for code generation.
+func bucketShift(b uint8) uintptr {
+ // Masking the shift amount allows overflow checks to be elided.
+ return uintptr(1) << (b & (sys.PtrSize*8 - 1))
+}
+
+// bucketMask returns 1<<b - 1, optimized for code generation.
+func bucketMask(b uint8) uintptr {
+ return bucketShift(b) - 1
+}
+
+// tophash calculates the tophash value for hash.
+func tophash(hash uintptr) uint8 {
+ top := uint8(hash >> (sys.PtrSize*8 - 8))
+ if top < minTopHash {
+ top += minTopHash
+ }
+ return top
+}
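+
+// Editorial example (illustrative only): on a 64-bit system tophash keeps
+// only hash >> 56. A hash whose top byte is 0x03 would collide with the
+// reserved markers above (0..4), so it is stored as 0x03 + minTopHash = 0x08;
+// a top byte of 0xab is stored unchanged.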
+
+func evacuated(b *bmap) bool {
+ h := b.tophash[0]
+ return h > emptyOne && h < minTopHash
+}
+
+func (b *bmap) overflow(t *maptype) *bmap {
+ return *(**bmap)(add(unsafe.Pointer(b), uintptr(t.bucketsize)-sys.PtrSize))
+}
+
+func (b *bmap) setoverflow(t *maptype, ovf *bmap) {
+ *(**bmap)(add(unsafe.Pointer(b), uintptr(t.bucketsize)-sys.PtrSize)) = ovf
+}
+
+func (b *bmap) keys() unsafe.Pointer {
+ return add(unsafe.Pointer(b), dataOffset)
+}
+
+// incrnoverflow increments h.noverflow.
+// noverflow counts the number of overflow buckets.
+// This is used to trigger same-size map growth.
+// See also tooManyOverflowBuckets.
+// To keep hmap small, noverflow is a uint16.
+// When there are few buckets, noverflow is an exact count.
+// When there are many buckets, noverflow is an approximate count.
+func (h *hmap) incrnoverflow() {
+ // We trigger same-size map growth if there are
+ // as many overflow buckets as buckets.
+ // We need to be able to count to 1<<h.B.
+ if h.B < 16 {
+ h.noverflow++
+ return
+ }
+ // Increment with probability 1/(1<<(h.B-15)).
+ // When we reach 1<<15 - 1, we will have approximately
+ // as many overflow buckets as buckets.
+ mask := uint32(1)<<(h.B-15) - 1
+ // Example: if h.B == 18, then mask == 7,
+ // and fastrand & 7 == 0 with probability 1/8.
+ if fastrand()&mask == 0 {
+ h.noverflow++
+ }
+}
+
+func (h *hmap) newoverflow(t *maptype, b *bmap) *bmap {
+ var ovf *bmap
+ if h.extra != nil && h.extra.nextOverflow != nil {
+ // We have preallocated overflow buckets available.
+ // See makeBucketArray for more details.
+ ovf = h.extra.nextOverflow
+ if ovf.overflow(t) == nil {
+ // We're not at the end of the preallocated overflow buckets. Bump the pointer.
+ h.extra.nextOverflow = (*bmap)(add(unsafe.Pointer(ovf), uintptr(t.bucketsize)))
+ } else {
+ // This is the last preallocated overflow bucket.
+ // Reset the overflow pointer on this bucket,
+ // which was set to a non-nil sentinel value.
+ ovf.setoverflow(t, nil)
+ h.extra.nextOverflow = nil
+ }
+ } else {
+ ovf = (*bmap)(newobject(t.bucket))
+ }
+ h.incrnoverflow()
+ if t.bucket.ptrdata == 0 {
+ h.createOverflow()
+ *h.extra.overflow = append(*h.extra.overflow, ovf)
+ }
+ b.setoverflow(t, ovf)
+ return ovf
+}
+
+func (h *hmap) createOverflow() {
+ if h.extra == nil {
+ h.extra = new(mapextra)
+ }
+ if h.extra.overflow == nil {
+ h.extra.overflow = new([]*bmap)
+ }
+}
+
+func makemap64(t *maptype, hint int64, h *hmap) *hmap {
+ if int64(int(hint)) != hint {
+ hint = 0
+ }
+ return makemap(t, int(hint), h)
+}
+
+// makemap_small implements Go map creation for make(map[k]v) and
+// make(map[k]v, hint) when hint is known to be at most bucketCnt
+// at compile time and the map needs to be allocated on the heap.
+func makemap_small() *hmap {
+ h := new(hmap)
+ h.hash0 = fastrand()
+ return h
+}
+
+// makemap implements Go map creation for make(map[k]v, hint).
+// If the compiler has determined that the map or the first bucket
+// can be created on the stack, h and/or bucket may be non-nil.
+// If h != nil, the map can be created directly in h.
+// If h.buckets != nil, bucket pointed to can be used as the first bucket.
+func makemap(t *maptype, hint int, h *hmap) *hmap {
+ mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
+ if overflow || mem > maxAlloc {
+ hint = 0
+ }
+
+ // initialize Hmap
+ if h == nil {
+ h = new(hmap)
+ }
+ h.hash0 = fastrand()
+
+ // Find the size parameter B which will hold the requested # of elements.
+ // For hint < 0 overLoadFactor returns false since hint < bucketCnt.
+ B := uint8(0)
+ for overLoadFactor(hint, B) {
+ B++
+ }
+ h.B = B
+
+ // allocate initial hash table
+ // if B == 0, the buckets field is allocated lazily later (in mapassign)
+	// If hint is large, zeroing this memory could take a while.
+ if h.B != 0 {
+ var nextOverflow *bmap
+ h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
+ if nextOverflow != nil {
+ h.extra = new(mapextra)
+ h.extra.nextOverflow = nextOverflow
+ }
+ }
+
+ return h
+}
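+
+// Editorial example (illustrative only): make(map[int]int, 100) arrives here
+// with hint = 100. overLoadFactor is true for B = 0..3 (at B = 3 there are 8
+// buckets and 100 > 6.5*8 = 52) and false at B = 4 (100 <= 6.5*16 = 104), so
+// the map starts with 2^4 = 16 buckets.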
+
+// makeBucketArray initializes a backing array for map buckets.
+// 1<<b is the minimum number of buckets to allocate.
+// dirtyalloc should either be nil or a bucket array previously
+// allocated by makeBucketArray with the same t and b parameters.
+// If dirtyalloc is nil, a new backing array will be allocated;
+// otherwise dirtyalloc will be cleared and reused as the backing array.
+func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
+ base := bucketShift(b)
+ nbuckets := base
+ // For small b, overflow buckets are unlikely.
+ // Avoid the overhead of the calculation.
+ if b >= 4 {
+ // Add on the estimated number of overflow buckets
+ // required to insert the median number of elements
+ // used with this value of b.
+ nbuckets += bucketShift(b - 4)
+ sz := t.bucket.size * nbuckets
+ up := roundupsize(sz)
+ if up != sz {
+ nbuckets = up / t.bucket.size
+ }
+ }
+
+ if dirtyalloc == nil {
+ buckets = newarray(t.bucket, int(nbuckets))
+ } else {
+ // dirtyalloc was previously generated by
+ // the above newarray(t.bucket, int(nbuckets))
+ // but may not be empty.
+ buckets = dirtyalloc
+ size := t.bucket.size * nbuckets
+ if t.bucket.ptrdata != 0 {
+ memclrHasPointers(buckets, size)
+ } else {
+ memclrNoHeapPointers(buckets, size)
+ }
+ }
+
+ if base != nbuckets {
+ // We preallocated some overflow buckets.
+ // To keep the overhead of tracking these overflow buckets to a minimum,
+ // we use the convention that if a preallocated overflow bucket's overflow
+ // pointer is nil, then there are more available by bumping the pointer.
+ // We need a safe non-nil pointer for the last overflow bucket; just use buckets.
+ nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
+ last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
+ last.setoverflow(t, (*bmap)(buckets))
+ }
+ return buckets, nextOverflow
+}
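+
+// Editorial note (illustrative only): for b = 5 the code above starts from
+// base = 32 buckets and adds bucketShift(5-4) = 2 estimated overflow buckets;
+// roundupsize may then bump nbuckets a little so the allocation fills a whole
+// size class. Since base != nbuckets, those trailing buckets become the
+// preallocated overflow list, terminated by the non-nil sentinel described in
+// the comment above.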
+
+// mapaccess1 returns a pointer to h[key]. Never returns nil, instead
+// it will return a reference to the zero object for the elem type if
+// the key is not in the map.
+// NOTE: The returned pointer may keep the whole map live, so don't
+// hold onto it for very long.
+func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ pc := funcPC(mapaccess1)
+ racereadpc(unsafe.Pointer(h), callerpc, pc)
+ raceReadObjectPC(t.key, key, callerpc, pc)
+ }
+ if msanenabled && h != nil {
+ msanread(key, t.key.size)
+ }
+ if h == nil || h.count == 0 {
+ if t.hashMightPanic() {
+ t.hasher(key, 0) // see issue 23734
+ }
+ return unsafe.Pointer(&zeroVal[0])
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map read and map write")
+ }
+ hash := t.hasher(key, uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ top := tophash(hash)
+bucketloop:
+ for ; b != nil; b = b.overflow(t) {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if b.tophash[i] != top {
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
+ if t.indirectkey() {
+ k = *((*unsafe.Pointer)(k))
+ }
+ if t.key.equal(key, k) {
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
+ if t.indirectelem() {
+ e = *((*unsafe.Pointer)(e))
+ }
+ return e
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0])
+}
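+
+// Editorial note (illustrative only): for a map with B = 4, the lookup above
+// picks one of 16 buckets with hash & bucketMask(4) = hash & 15, compares the
+// one-byte tophash of each of the 8 slots before doing a full key comparison,
+// and walks the overflow chain only if the bucket's slots are exhausted
+// without hitting emptyRest.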
+
+func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ pc := funcPC(mapaccess2)
+ racereadpc(unsafe.Pointer(h), callerpc, pc)
+ raceReadObjectPC(t.key, key, callerpc, pc)
+ }
+ if msanenabled && h != nil {
+ msanread(key, t.key.size)
+ }
+ if h == nil || h.count == 0 {
+ if t.hashMightPanic() {
+ t.hasher(key, 0) // see issue 23734
+ }
+ return unsafe.Pointer(&zeroVal[0]), false
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map read and map write")
+ }
+ hash := t.hasher(key, uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ top := tophash(hash)
+bucketloop:
+ for ; b != nil; b = b.overflow(t) {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if b.tophash[i] != top {
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
+ if t.indirectkey() {
+ k = *((*unsafe.Pointer)(k))
+ }
+ if t.key.equal(key, k) {
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
+ if t.indirectelem() {
+ e = *((*unsafe.Pointer)(e))
+ }
+ return e, true
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0]), false
+}
+
+// mapaccessK returns both key and elem. Used by the map iterator.
+func mapaccessK(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, unsafe.Pointer) {
+ if h == nil || h.count == 0 {
+ return nil, nil
+ }
+ hash := t.hasher(key, uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ top := tophash(hash)
+bucketloop:
+ for ; b != nil; b = b.overflow(t) {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if b.tophash[i] != top {
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
+ if t.indirectkey() {
+ k = *((*unsafe.Pointer)(k))
+ }
+ if t.key.equal(key, k) {
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
+ if t.indirectelem() {
+ e = *((*unsafe.Pointer)(e))
+ }
+ return k, e
+ }
+ }
+ }
+ return nil, nil
+}
+
+func mapaccess1_fat(t *maptype, h *hmap, key, zero unsafe.Pointer) unsafe.Pointer {
+ e := mapaccess1(t, h, key)
+ if e == unsafe.Pointer(&zeroVal[0]) {
+ return zero
+ }
+ return e
+}
+
+func mapaccess2_fat(t *maptype, h *hmap, key, zero unsafe.Pointer) (unsafe.Pointer, bool) {
+ e := mapaccess1(t, h, key)
+ if e == unsafe.Pointer(&zeroVal[0]) {
+ return zero, false
+ }
+ return e, true
+}
+
+// Like mapaccess, but allocates a slot for the key if it is not present in the map.
+func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
+ if h == nil {
+ panic(plainError("assignment to entry in nil map"))
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ pc := funcPC(mapassign)
+ racewritepc(unsafe.Pointer(h), callerpc, pc)
+ raceReadObjectPC(t.key, key, callerpc, pc)
+ }
+ if msanenabled {
+ msanread(key, t.key.size)
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map writes")
+ }
+ hash := t.hasher(key, uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher, since t.hasher may panic,
+ // in which case we have not actually done a write.
+ h.flags ^= hashWriting
+
+ if h.buckets == nil {
+ h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
+ }
+
+again:
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
+ top := tophash(hash)
+
+ var inserti *uint8
+ var insertk unsafe.Pointer
+ var elem unsafe.Pointer
+bucketloop:
+ for {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if b.tophash[i] != top {
+ if isEmpty(b.tophash[i]) && inserti == nil {
+ inserti = &b.tophash[i]
+ insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
+ elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
+ }
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
+ if t.indirectkey() {
+ k = *((*unsafe.Pointer)(k))
+ }
+ if !t.key.equal(key, k) {
+ continue
+ }
+ // already have a mapping for key. Update it.
+ if t.needkeyupdate() {
+ typedmemmove(t.key, k, key)
+ }
+ elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
+ goto done
+ }
+ ovf := b.overflow(t)
+ if ovf == nil {
+ break
+ }
+ b = ovf
+ }
+
+ // Did not find mapping for key. Allocate new cell & add entry.
+
+ // If we hit the max load factor or we have too many overflow buckets,
+ // and we're not already in the middle of growing, start growing.
+ if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
+ hashGrow(t, h)
+ goto again // Growing the table invalidates everything, so try again
+ }
+
+ if inserti == nil {
+ // The current bucket and all the overflow buckets connected to it are full, allocate a new one.
+ newb := h.newoverflow(t, b)
+ inserti = &newb.tophash[0]
+ insertk = add(unsafe.Pointer(newb), dataOffset)
+ elem = add(insertk, bucketCnt*uintptr(t.keysize))
+ }
+
+ // store new key/elem at insert position
+ if t.indirectkey() {
+ kmem := newobject(t.key)
+ *(*unsafe.Pointer)(insertk) = kmem
+ insertk = kmem
+ }
+ if t.indirectelem() {
+ vmem := newobject(t.elem)
+ *(*unsafe.Pointer)(elem) = vmem
+ }
+ typedmemmove(t.key, insertk, key)
+ *inserti = top
+ h.count++
+
+done:
+ if h.flags&hashWriting == 0 {
+ throw("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+ if t.indirectelem() {
+ elem = *((*unsafe.Pointer)(elem))
+ }
+ return elem
+}
+
+func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ pc := funcPC(mapdelete)
+ racewritepc(unsafe.Pointer(h), callerpc, pc)
+ raceReadObjectPC(t.key, key, callerpc, pc)
+ }
+ if msanenabled && h != nil {
+ msanread(key, t.key.size)
+ }
+ if h == nil || h.count == 0 {
+ if t.hashMightPanic() {
+ t.hasher(key, 0) // see issue 23734
+ }
+ return
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map writes")
+ }
+
+ hash := t.hasher(key, uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher, since t.hasher may panic,
+ // in which case we have not actually done a write (delete).
+ h.flags ^= hashWriting
+
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
+ bOrig := b
+ top := tophash(hash)
+search:
+ for ; b != nil; b = b.overflow(t) {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if b.tophash[i] != top {
+ if b.tophash[i] == emptyRest {
+ break search
+ }
+ continue
+ }
+ k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
+ k2 := k
+ if t.indirectkey() {
+ k2 = *((*unsafe.Pointer)(k2))
+ }
+ if !t.key.equal(key, k2) {
+ continue
+ }
+ // Only clear key if there are pointers in it.
+ if t.indirectkey() {
+ *(*unsafe.Pointer)(k) = nil
+ } else if t.key.ptrdata != 0 {
+ memclrHasPointers(k, t.key.size)
+ }
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
+ if t.indirectelem() {
+ *(*unsafe.Pointer)(e) = nil
+ } else if t.elem.ptrdata != 0 {
+ memclrHasPointers(e, t.elem.size)
+ } else {
+ memclrNoHeapPointers(e, t.elem.size)
+ }
+ b.tophash[i] = emptyOne
+ // If the bucket now ends in a bunch of emptyOne states,
+ // change those to emptyRest states.
+ // It would be nice to make this a separate function, but
+ // for loops are not currently inlineable.
+ if i == bucketCnt-1 {
+ if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
+ goto notLast
+ }
+ } else {
+ if b.tophash[i+1] != emptyRest {
+ goto notLast
+ }
+ }
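+ // Walk backwards through any trailing emptyOne slots (crossing back into
+ // earlier overflow buckets if needed) and convert them to emptyRest.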
+ for {
+ b.tophash[i] = emptyRest
+ if i == 0 {
+ if b == bOrig {
+ break // beginning of initial bucket, we're done.
+ }
+ // Find previous bucket, continue at its last entry.
+ c := b
+ for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
+ }
+ i = bucketCnt - 1
+ } else {
+ i--
+ }
+ if b.tophash[i] != emptyOne {
+ break
+ }
+ }
+ notLast:
+ h.count--
+ // Reset the hash seed to make it more difficult for attackers to
+ // repeatedly trigger hash collisions. See issue 25237.
+ if h.count == 0 {
+ h.hash0 = fastrand()
+ }
+ break search
+ }
+ }
+
+ if h.flags&hashWriting == 0 {
+ throw("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+}
+
+// mapiterinit initializes the hiter struct used for ranging over maps.
+// The hiter struct pointed to by 'it' is allocated on the stack
+// by the compiler's order pass or on the heap by reflect_mapiterinit.
+// Both need to have a zeroed hiter since the struct contains pointers.
+func mapiterinit(t *maptype, h *hmap, it *hiter) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiterinit))
+ }
+
+ if h == nil || h.count == 0 {
+ return
+ }
+
+ if unsafe.Sizeof(hiter{})/sys.PtrSize != 12 {
+ throw("hash_iter size incorrect") // see cmd/compile/internal/gc/reflect.go
+ }
+ it.t = t
+ it.h = h
+
+ // grab snapshot of bucket state
+ it.B = h.B
+ it.buckets = h.buckets
+ if t.bucket.ptrdata == 0 {
+ // Allocate the current slice and remember pointers to both current and old.
+ // This preserves all relevant overflow buckets alive even if
+ // the table grows and/or overflow buckets are added to the table
+ // while we are iterating.
+ h.createOverflow()
+ it.overflow = h.extra.overflow
+ it.oldoverflow = h.extra.oldoverflow
+ }
+
+ // decide where to start
+ r := uintptr(fastrand())
+ if h.B > 31-bucketCntBits {
+ r += uintptr(fastrand()) << 31
+ }
+ it.startBucket = r & bucketMask(h.B)
+ it.offset = uint8(r >> h.B & (bucketCnt - 1))
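+ // Starting at a random bucket and a random offset within each bucket is what
+ // makes map iteration order unspecified from one iteration to the next.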
+
+ // iterator state
+ it.bucket = it.startBucket
+
+ // Remember we have an iterator.
+ // Can run concurrently with another mapiterinit().
+ if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator {
+ atomic.Or8(&h.flags, iterator|oldIterator)
+ }
+
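+ // Load the first key/elem pair (if any) so a range loop can begin immediately.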
+ mapiternext(it)
+}
+
+func mapiternext(it *hiter) {
+ h := it.h
+ if raceenabled {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiternext))
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map iteration and map write")
+ }
+ t := it.t
+ bucket := it.bucket
+ b := it.bptr
+ i := it.i
+ checkBucket := it.checkBucket
+
+next:
+ if b == nil {
+ if bucket == it.startBucket && it.wrapped {
+ // end of iteration
+ it.key = nil
+ it.elem = nil
+ return
+ }
+ if h.growing() && it.B == h.B {
+ // Iterator was started in the middle of a grow, and the grow isn't done yet.
+ // If the bucket we're looking at hasn't been filled in yet (i.e. the old
+ // bucket hasn't been evacuated) then we need to iterate through the old
+ // bucket and only return the ones that will be migrated to this bucket.
+ oldbucket := bucket & it.h.oldbucketmask()
+ b = (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
+ if !evacuated(b) {
+ checkBucket = bucket
+ } else {
+ b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
+ checkBucket = noCheck
+ }
+ } else {
+ b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
+ checkBucket = noCheck
+ }
+ bucket++
+ if bucket == bucketShift(it.B) {
+ bucket = 0
+ it.wrapped = true
+ }
+ i = 0
+ }
+ for ; i < bucketCnt; i++ {
+ offi := (i + it.offset) & (bucketCnt - 1)
+ if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty {
+ // TODO: emptyRest is hard to use here, as we start iterating
+ // in the middle of a bucket. It's feasible, just tricky.
+ continue
+ }
+ k := add(unsafe.Pointer(b), dataOffset+uintptr(offi)*uintptr(t.keysize))
+ if t.indirectkey() {
+ k = *((*unsafe.Pointer)(k))
+ }
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+uintptr(offi)*uintptr(t.elemsize))
+ if checkBucket != noCheck && !h.sameSizeGrow() {
+ // Special case: iterator was started during a grow to a larger size
+ // and the grow is not done yet. We're working on a bucket whose
+ // oldbucket has not been evacuated yet. Or at least, it wasn't
+ // evacuated when we started the bucket. So we're iterating
+ // through the oldbucket, skipping any keys that will go
+ // to the other new bucket (each oldbucket expands to two
+ // buckets during a grow).
+ if t.reflexivekey() || t.key.equal(k, k) {
+ // If the item in the oldbucket is not destined for
+ // the current new bucket in the iteration, skip it.
+ hash := t.hasher(k, uintptr(h.hash0))
+ if hash&bucketMask(it.B) != checkBucket {
+ continue
+ }
+ } else {
+ // Hash isn't repeatable if k != k (NaNs). We need a
+ // repeatable and randomish choice of which direction
+ // to send NaNs during evacuation. We'll use the low
+ // bit of tophash to decide which way NaNs go.
+ // NOTE: this case is why we need two evacuate tophash
+ // values, evacuatedX and evacuatedY, that differ in
+ // their low bit.
+ if checkBucket>>(it.B-1) != uintptr(b.tophash[offi]&1) {
+ continue
+ }
+ }
+ }
+ if (b.tophash[offi] != evacuatedX && b.tophash[offi] != evacuatedY) ||
+ !(t.reflexivekey() || t.key.equal(k, k)) {
+ // This is the golden data; we can return it.
+ // OR
+ // key!=key, so the entry can't be deleted or updated, and we can just return it.
+ // That's lucky for us because when key!=key we can't look it up successfully.
+ it.key = k
+ if t.indirectelem() {
+ e = *((*unsafe.Pointer)(e))
+ }
+ it.elem = e
+ } else {
+ // The hash table has grown since the iterator was started.
+ // The golden data for this key is now somewhere else.
+ // Check the current hash table for the data.
+ // This code handles the case where the key
+ // has been deleted, updated, or deleted and reinserted.
+ // NOTE: we need to regrab the key as it has potentially been
+ // updated to an equal() but not identical key (e.g. +0.0 vs -0.0).
+ rk, re := mapaccessK(t, h, k)
+ if rk == nil {
+ continue // key has been deleted
+ }
+ it.key = rk
+ it.elem = re
+ }
+ it.bucket = bucket
+ if it.bptr != b { // avoid unnecessary write barrier; see issue 14921
+ it.bptr = b
+ }
+ it.i = i + 1
+ it.checkBucket = checkBucket
+ return
+ }
+ b = b.overflow(t)
+ i = 0
+ goto next
+}
+
+// mapclear deletes all keys from a map.
+func mapclear(t *maptype, h *hmap) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ pc := funcPC(mapclear)
+ racewritepc(unsafe.Pointer(h), callerpc, pc)
+ }
+
+ if h == nil || h.count == 0 {
+ return
+ }
+
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map writes")
+ }
+
+ h.flags ^= hashWriting
+
+ h.flags &^= sameSizeGrow
+ h.oldbuckets = nil
+ h.nevacuate = 0
+ h.noverflow = 0
+ h.count = 0
+
+ // Reset the hash seed to make it more difficult for attackers to
+ // repeatedly trigger hash collisions. See issue 25237.
+ h.hash0 = fastrand()
+
+ // Keep the mapextra allocation but clear any extra information.
+ if h.extra != nil {
+ *h.extra = mapextra{}
+ }
+
+ // makeBucketArray clears the memory pointed to by h.buckets
+ // and recovers any overflow buckets by generating them
+ // as if h.buckets was newly alloced.
+ _, nextOverflow := makeBucketArray(t, h.B, h.buckets)
+ if nextOverflow != nil {
+ // If overflow buckets are created then h.extra
+ // will have been allocated during initial bucket creation.
+ h.extra.nextOverflow = nextOverflow
+ }
+
+ if h.flags&hashWriting == 0 {
+ throw("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+}
+
+func hashGrow(t *maptype, h *hmap) {
+ // If we've hit the load factor, get bigger.
+ // Otherwise, there are too many overflow buckets,
+ // so keep the same number of buckets and "grow" laterally.
+ bigger := uint8(1)
+ if !overLoadFactor(h.count+1, h.B) {
+ bigger = 0
+ h.flags |= sameSizeGrow
+ }
+ oldbuckets := h.buckets
+ newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)
+
+ flags := h.flags &^ (iterator | oldIterator)
+ if h.flags&iterator != 0 {
+ flags |= oldIterator
+ }
+ // commit the grow (atomic wrt gc)
+ h.B += bigger
+ h.flags = flags
+ h.oldbuckets = oldbuckets
+ h.buckets = newbuckets
+ h.nevacuate = 0
+ h.noverflow = 0
+
+ if h.extra != nil && h.extra.overflow != nil {
+ // Promote current overflow buckets to the old generation.
+ if h.extra.oldoverflow != nil {
+ throw("oldoverflow is not nil")
+ }
+ h.extra.oldoverflow = h.extra.overflow
+ h.extra.overflow = nil
+ }
+ if nextOverflow != nil {
+ if h.extra == nil {
+ h.extra = new(mapextra)
+ }
+ h.extra.nextOverflow = nextOverflow
+ }
+
+ // the actual copying of the hash table data is done incrementally
+ // by growWork() and evacuate().
+}
+
+// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
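+// With the load factor constants defined earlier in this file
+// (loadFactorNum/loadFactorDen, i.e. 6.5), growth is triggered once the map
+// would average more than 6.5 entries per bucket.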
+func overLoadFactor(count int, B uint8) bool {
+ return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
+}
+
+// tooManyOverflowBuckets reports whether noverflow (the approximate number of
+// overflow buckets) is too many for a map with 1<<B buckets.
+// Note that most of these overflow buckets must be in sparse use;
+// if use were dense, then we'd have already triggered regular map growth.
+func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
+ // If the threshold is too low, we do extraneous work.
+ // If the threshold is too high, maps that grow and shrink can hold on to lots of unused memory.
+ // "too many" means (approximately) as many overflow buckets as regular buckets.
+ // See incrnoverflow for more details.
+ if B > 15 {
+ B = 15
+ }
+ // The compiler doesn't see here that B < 16; mask B to generate shorter shift code.
+ return noverflow >= uint16(1)<<(B&15)
+}
+
+// growing reports whether h is growing. The growth may be to the same size or bigger.
+func (h *hmap) growing() bool {
+ return h.oldbuckets != nil
+}
+
+// sameSizeGrow reports whether the current growth is to a map of the same size.
+func (h *hmap) sameSizeGrow() bool {
+ return h.flags&sameSizeGrow != 0
+}
+
+// noldbuckets calculates the number of buckets prior to the current map growth.
+func (h *hmap) noldbuckets() uintptr {
+ oldB := h.B
+ if !h.sameSizeGrow() {
+ oldB--
+ }
+ return bucketShift(oldB)
+}
+
+// oldbucketmask provides a mask that can be applied to calculate n % noldbuckets().
+func (h *hmap) oldbucketmask() uintptr {
+ return h.noldbuckets() - 1
+}
+
+func growWork(t *maptype, h *hmap, bucket uintptr) {
+ // make sure we evacuate the oldbucket corresponding
+ // to the bucket we're about to use
+ evacuate(t, h, bucket&h.oldbucketmask())
+
+ // evacuate one more oldbucket to make progress on growing
+ if h.growing() {
+ evacuate(t, h, h.nevacuate)
+ }
+}
+
+func bucketEvacuated(t *maptype, h *hmap, bucket uintptr) bool {
+ b := (*bmap)(add(h.oldbuckets, bucket*uintptr(t.bucketsize)))
+ return evacuated(b)
+}
+
+// evacDst is an evacuation destination.
+type evacDst struct {
+ b *bmap // current destination bucket
+ i int // key/elem index into b
+ k unsafe.Pointer // pointer to current key storage
+ e unsafe.Pointer // pointer to current elem storage
+}
+
+func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
+ b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
+ newbit := h.noldbuckets()
+ if !evacuated(b) {
+ // TODO: reuse overflow buckets instead of using new ones, if there
+ // is no iterator using the old buckets. (If !oldIterator.)
+
+ // xy contains the x and y (low and high) evacuation destinations.
+ var xy [2]evacDst
+ x := &xy[0]
+ x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
+ x.k = add(unsafe.Pointer(x.b), dataOffset)
+ x.e = add(x.k, bucketCnt*uintptr(t.keysize))
+
+ if !h.sameSizeGrow() {
+ // Only calculate y pointers if we're growing bigger.
+ // Otherwise GC can see bad pointers.
+ y := &xy[1]
+ y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
+ y.k = add(unsafe.Pointer(y.b), dataOffset)
+ y.e = add(y.k, bucketCnt*uintptr(t.keysize))
+ }
+
+ for ; b != nil; b = b.overflow(t) {
+ k := add(unsafe.Pointer(b), dataOffset)
+ e := add(k, bucketCnt*uintptr(t.keysize))
+ for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
+ top := b.tophash[i]
+ if isEmpty(top) {
+ b.tophash[i] = evacuatedEmpty
+ continue
+ }
+ if top < minTopHash {
+ throw("bad map state")
+ }
+ k2 := k
+ if t.indirectkey() {
+ k2 = *((*unsafe.Pointer)(k2))
+ }
+ var useY uint8
+ if !h.sameSizeGrow() {
+ // Compute hash to make our evacuation decision (whether we need
+ // to send this key/elem to bucket x or bucket y).
+ hash := t.hasher(k2, uintptr(h.hash0))
+ if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {
+ // If key != key (NaNs), then the hash could be (and probably
+ // will be) entirely different from the old hash. Moreover,
+ // it isn't reproducible. Reproducibility is required in the
+ // presence of iterators, as our evacuation decision must
+ // match whatever decision the iterator made.
+ // Fortunately, we have the freedom to send these keys either
+ // way. Also, tophash is meaningless for these kinds of keys.
+ // We let the low bit of tophash drive the evacuation decision.
+ // We recompute a new random tophash for the next level so
+ // these keys will get evenly distributed across all buckets
+ // after multiple grows.
+ useY = top & 1
+ top = tophash(hash)
+ } else {
+ if hash&newbit != 0 {
+ useY = 1
+ }
+ }
+ }
+
+ if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
+ throw("bad evacuatedN")
+ }
+
+ b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
+ dst := &xy[useY] // evacuation destination
+
+ if dst.i == bucketCnt {
+ dst.b = h.newoverflow(t, dst.b)
+ dst.i = 0
+ dst.k = add(unsafe.Pointer(dst.b), dataOffset)
+ dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
+ }
+ dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
+ if t.indirectkey() {
+ *(*unsafe.Pointer)(dst.k) = k2 // copy pointer
+ } else {
+ typedmemmove(t.key, dst.k, k) // copy key
+ }
+ if t.indirectelem() {
+ *(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
+ } else {
+ typedmemmove(t.elem, dst.e, e)
+ }
+ dst.i++
+ // These updates might push these pointers past the end of the
+ // key or elem arrays. That's ok, as we have the overflow pointer
+ // at the end of the bucket to protect against pointing past the
+ // end of the bucket.
+ dst.k = add(dst.k, uintptr(t.keysize))
+ dst.e = add(dst.e, uintptr(t.elemsize))
+ }
+ }
+ // Unlink the overflow buckets & clear key/elem to help GC.
+ if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
+ b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
+ // Preserve b.tophash because the evacuation
+ // state is maintained there.
+ ptr := add(b, dataOffset)
+ n := uintptr(t.bucketsize) - dataOffset
+ memclrHasPointers(ptr, n)
+ }
+ }
+
+ if oldbucket == h.nevacuate {
+ advanceEvacuationMark(h, t, newbit)
+ }
+}
+
+func advanceEvacuationMark(h *hmap, t *maptype, newbit uintptr) {
+ h.nevacuate++
+ // Experiments suggest that 1024 is overkill by at least an order of magnitude.
+ // Put it in there as a safeguard anyway, to ensure O(1) behavior.
+ stop := h.nevacuate + 1024
+ if stop > newbit {
+ stop = newbit
+ }
+ for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {
+ h.nevacuate++
+ }
+ if h.nevacuate == newbit { // newbit == # of oldbuckets
+ // Growing is all done. Free old main bucket array.
+ h.oldbuckets = nil
+ // Can discard old overflow buckets as well.
+ // If they are still referenced by an iterator,
+ // then the iterator holds a pointer to the slice.
+ if h.extra != nil {
+ h.extra.oldoverflow = nil
+ }
+ h.flags &^= sameSizeGrow
+ }
+}
+
+// Reflect stubs. Called from ../reflect/asm_*.s
+
+//go:linkname reflect_makemap reflect.makemap
+func reflect_makemap(t *maptype, cap int) *hmap {
+ // Check invariants and reflect's math.
+ if t.key.equal == nil {
+ throw("runtime.reflect_makemap: unsupported map key type")
+ }
+ if t.key.size > maxKeySize && (!t.indirectkey() || t.keysize != uint8(sys.PtrSize)) ||
+ t.key.size <= maxKeySize && (t.indirectkey() || t.keysize != uint8(t.key.size)) {
+ throw("key size wrong")
+ }
+ if t.elem.size > maxElemSize && (!t.indirectelem() || t.elemsize != uint8(sys.PtrSize)) ||
+ t.elem.size <= maxElemSize && (t.indirectelem() || t.elemsize != uint8(t.elem.size)) {
+ throw("elem size wrong")
+ }
+ if t.key.align > bucketCnt {
+ throw("key align too big")
+ }
+ if t.elem.align > bucketCnt {
+ throw("elem align too big")
+ }
+ if t.key.size%uintptr(t.key.align) != 0 {
+ throw("key size not a multiple of key align")
+ }
+ if t.elem.size%uintptr(t.elem.align) != 0 {
+ throw("elem size not a multiple of elem align")
+ }
+ if bucketCnt < 8 {
+ throw("bucketsize too small for proper alignment")
+ }
+ if dataOffset%uintptr(t.key.align) != 0 {
+ throw("need padding in bucket (key)")
+ }
+ if dataOffset%uintptr(t.elem.align) != 0 {
+ throw("need padding in bucket (elem)")
+ }
+
+ return makemap(t, cap, nil)
+}
+
+//go:linkname reflect_mapaccess reflect.mapaccess
+func reflect_mapaccess(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
+ elem, ok := mapaccess2(t, h, key)
+ if !ok {
+ // reflect wants nil for a missing element
+ elem = nil
+ }
+ return elem
+}
+
+//go:linkname reflect_mapassign reflect.mapassign
+func reflect_mapassign(t *maptype, h *hmap, key unsafe.Pointer, elem unsafe.Pointer) {
+ p := mapassign(t, h, key)
+ typedmemmove(t.elem, p, elem)
+}
+
+//go:linkname reflect_mapdelete reflect.mapdelete
+func reflect_mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
+ mapdelete(t, h, key)
+}
+
+//go:linkname reflect_mapiterinit reflect.mapiterinit
+func reflect_mapiterinit(t *maptype, h *hmap) *hiter {
+ it := new(hiter)
+ mapiterinit(t, h, it)
+ return it
+}
+
+//go:linkname reflect_mapiternext reflect.mapiternext
+func reflect_mapiternext(it *hiter) {
+ mapiternext(it)
+}
+
+//go:linkname reflect_mapiterkey reflect.mapiterkey
+func reflect_mapiterkey(it *hiter) unsafe.Pointer {
+ return it.key
+}
+
+//go:linkname reflect_mapiterelem reflect.mapiterelem
+func reflect_mapiterelem(it *hiter) unsafe.Pointer {
+ return it.elem
+}
+
+//go:linkname reflect_maplen reflect.maplen
+func reflect_maplen(h *hmap) int {
+ if h == nil {
+ return 0
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, funcPC(reflect_maplen))
+ }
+ return h.count
+}
+
+//go:linkname reflectlite_maplen internal/reflectlite.maplen
+func reflectlite_maplen(h *hmap) int {
+ if h == nil {
+ return 0
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, funcPC(reflect_maplen))
+ }
+ return h.count
+}
+
+const maxZero = 1024 // must match the value in reflect/value.go:maxZero and cmd/compile/internal/gc/walk.go:zeroValSize
+var zeroVal [maxZero]byte
diff --git a/src/runtime/map_benchmark_test.go b/src/runtime/map_benchmark_test.go
new file mode 100644
index 0000000..d0becc9
--- /dev/null
+++ b/src/runtime/map_benchmark_test.go
@@ -0,0 +1,535 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "math/rand"
+ "strconv"
+ "strings"
+ "testing"
+)
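+
+// These benchmarks exercise hashing, lookup, insertion, iteration, and map
+// clearing across a range of key types and sizes, including the key widths
+// handled by the runtime's specialized fast paths (32-bit, 64-bit, and string keys).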
+
+const size = 10
+
+func BenchmarkHashStringSpeed(b *testing.B) {
+ strings := make([]string, size)
+ for i := 0; i < size; i++ {
+ strings[i] = fmt.Sprintf("string#%d", i)
+ }
+ sum := 0
+ m := make(map[string]int, size)
+ for i := 0; i < size; i++ {
+ m[strings[i]] = 0
+ }
+ idx := 0
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ sum += m[strings[idx]]
+ idx++
+ if idx == size {
+ idx = 0
+ }
+ }
+}
+
+type chunk [17]byte
+
+func BenchmarkHashBytesSpeed(b *testing.B) {
+ // a bunch of chunks, each with a different alignment mod 16
+ var chunks [size]chunk
+ // initialize each to a different value
+ for i := 0; i < size; i++ {
+ chunks[i][0] = byte(i)
+ }
+ // put into a map
+ m := make(map[chunk]int, size)
+ for i, c := range chunks {
+ m[c] = i
+ }
+ idx := 0
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ if m[chunks[idx]] != idx {
+ b.Error("bad map entry for chunk")
+ }
+ idx++
+ if idx == size {
+ idx = 0
+ }
+ }
+}
+
+func BenchmarkHashInt32Speed(b *testing.B) {
+ ints := make([]int32, size)
+ for i := 0; i < size; i++ {
+ ints[i] = int32(i)
+ }
+ sum := 0
+ m := make(map[int32]int, size)
+ for i := 0; i < size; i++ {
+ m[ints[i]] = 0
+ }
+ idx := 0
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ sum += m[ints[idx]]
+ idx++
+ if idx == size {
+ idx = 0
+ }
+ }
+}
+
+func BenchmarkHashInt64Speed(b *testing.B) {
+ ints := make([]int64, size)
+ for i := 0; i < size; i++ {
+ ints[i] = int64(i)
+ }
+ sum := 0
+ m := make(map[int64]int, size)
+ for i := 0; i < size; i++ {
+ m[ints[i]] = 0
+ }
+ idx := 0
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ sum += m[ints[idx]]
+ idx++
+ if idx == size {
+ idx = 0
+ }
+ }
+}
+func BenchmarkHashStringArraySpeed(b *testing.B) {
+ stringpairs := make([][2]string, size)
+ for i := 0; i < size; i++ {
+ for j := 0; j < 2; j++ {
+ stringpairs[i][j] = fmt.Sprintf("string#%d/%d", i, j)
+ }
+ }
+ sum := 0
+ m := make(map[[2]string]int, size)
+ for i := 0; i < size; i++ {
+ m[stringpairs[i]] = 0
+ }
+ idx := 0
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ sum += m[stringpairs[idx]]
+ idx++
+ if idx == size {
+ idx = 0
+ }
+ }
+}
+
+func BenchmarkMegMap(b *testing.B) {
+ m := make(map[string]bool)
+ for suffix := 'A'; suffix <= 'G'; suffix++ {
+ m[strings.Repeat("X", 1<<20-1)+fmt.Sprint(suffix)] = true
+ }
+ key := strings.Repeat("X", 1<<20-1) + "k"
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _, _ = m[key]
+ }
+}
+
+func BenchmarkMegOneMap(b *testing.B) {
+ m := make(map[string]bool)
+ m[strings.Repeat("X", 1<<20)] = true
+ key := strings.Repeat("Y", 1<<20)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _, _ = m[key]
+ }
+}
+
+func BenchmarkMegEqMap(b *testing.B) {
+ m := make(map[string]bool)
+ key1 := strings.Repeat("X", 1<<20)
+ key2 := strings.Repeat("X", 1<<20) // equal but different instance
+ m[key1] = true
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _, _ = m[key2]
+ }
+}
+
+func BenchmarkMegEmptyMap(b *testing.B) {
+ m := make(map[string]bool)
+ key := strings.Repeat("X", 1<<20)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _, _ = m[key]
+ }
+}
+
+func BenchmarkSmallStrMap(b *testing.B) {
+ m := make(map[string]bool)
+ for suffix := 'A'; suffix <= 'G'; suffix++ {
+ m[fmt.Sprint(suffix)] = true
+ }
+ key := "k"
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _, _ = m[key]
+ }
+}
+
+func BenchmarkMapStringKeysEight_16(b *testing.B) { benchmarkMapStringKeysEight(b, 16) }
+func BenchmarkMapStringKeysEight_32(b *testing.B) { benchmarkMapStringKeysEight(b, 32) }
+func BenchmarkMapStringKeysEight_64(b *testing.B) { benchmarkMapStringKeysEight(b, 64) }
+func BenchmarkMapStringKeysEight_1M(b *testing.B) { benchmarkMapStringKeysEight(b, 1<<20) }
+
+func benchmarkMapStringKeysEight(b *testing.B, keySize int) {
+ m := make(map[string]bool)
+ for i := 0; i < 8; i++ {
+ m[strings.Repeat("K", i+1)] = true
+ }
+ key := strings.Repeat("K", keySize)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _ = m[key]
+ }
+}
+
+func BenchmarkIntMap(b *testing.B) {
+ m := make(map[int]bool)
+ for i := 0; i < 8; i++ {
+ m[i] = true
+ }
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _, _ = m[7]
+ }
+}
+
+func BenchmarkMapFirst(b *testing.B) {
+ for n := 1; n <= 16; n++ {
+ b.Run(fmt.Sprintf("%d", n), func(b *testing.B) {
+ m := make(map[int]bool)
+ for i := 0; i < n; i++ {
+ m[i] = true
+ }
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _ = m[0]
+ }
+ })
+ }
+}
+func BenchmarkMapMid(b *testing.B) {
+ for n := 1; n <= 16; n++ {
+ b.Run(fmt.Sprintf("%d", n), func(b *testing.B) {
+ m := make(map[int]bool)
+ for i := 0; i < n; i++ {
+ m[i] = true
+ }
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _ = m[n>>1]
+ }
+ })
+ }
+}
+func BenchmarkMapLast(b *testing.B) {
+ for n := 1; n <= 16; n++ {
+ b.Run(fmt.Sprintf("%d", n), func(b *testing.B) {
+ m := make(map[int]bool)
+ for i := 0; i < n; i++ {
+ m[i] = true
+ }
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _ = m[n-1]
+ }
+ })
+ }
+}
+
+func BenchmarkMapCycle(b *testing.B) {
+ // Arrange map entries to be a permutation, so that
+ // we hit all entries, and one lookup is data dependent
+ // on the previous lookup.
+ const N = 3127
+ p := rand.New(rand.NewSource(1)).Perm(N)
+ m := map[int]int{}
+ for i := 0; i < N; i++ {
+ m[i] = p[i]
+ }
+ b.ResetTimer()
+ j := 0
+ for i := 0; i < b.N; i++ {
+ j = m[j]
+ }
+ sink = uint64(j)
+}
+
+// Accessing the same keys in a row.
+func benchmarkRepeatedLookup(b *testing.B, lookupKeySize int) {
+ m := make(map[string]bool)
+ // At least bigger than a single bucket:
+ for i := 0; i < 64; i++ {
+ m[fmt.Sprintf("some key %d", i)] = true
+ }
+ base := strings.Repeat("x", lookupKeySize-1)
+ key1 := base + "1"
+ key2 := base + "2"
+ b.ResetTimer()
+ for i := 0; i < b.N/4; i++ {
+ _ = m[key1]
+ _ = m[key1]
+ _ = m[key2]
+ _ = m[key2]
+ }
+}
+
+func BenchmarkRepeatedLookupStrMapKey32(b *testing.B) { benchmarkRepeatedLookup(b, 32) }
+func BenchmarkRepeatedLookupStrMapKey1M(b *testing.B) { benchmarkRepeatedLookup(b, 1<<20) }
+
+func BenchmarkMakeMap(b *testing.B) {
+ b.Run("[Byte]Byte", func(b *testing.B) {
+ var m map[byte]byte
+ for i := 0; i < b.N; i++ {
+ m = make(map[byte]byte, 10)
+ }
+ hugeSink = m
+ })
+ b.Run("[Int]Int", func(b *testing.B) {
+ var m map[int]int
+ for i := 0; i < b.N; i++ {
+ m = make(map[int]int, 10)
+ }
+ hugeSink = m
+ })
+}
+
+func BenchmarkNewEmptyMap(b *testing.B) {
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ _ = make(map[int]int)
+ }
+}
+
+func BenchmarkNewSmallMap(b *testing.B) {
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ m := make(map[int]int)
+ m[0] = 0
+ m[1] = 1
+ }
+}
+
+func BenchmarkMapIter(b *testing.B) {
+ m := make(map[int]bool)
+ for i := 0; i < 8; i++ {
+ m[i] = true
+ }
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ for range m {
+ }
+ }
+}
+
+func BenchmarkMapIterEmpty(b *testing.B) {
+ m := make(map[int]bool)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ for range m {
+ }
+ }
+}
+
+func BenchmarkSameLengthMap(b *testing.B) {
+ // long strings, same length, differ in first few
+ // and last few bytes.
+ m := make(map[string]bool)
+ s1 := "foo" + strings.Repeat("-", 100) + "bar"
+ s2 := "goo" + strings.Repeat("-", 100) + "ber"
+ m[s1] = true
+ m[s2] = true
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _ = m[s1]
+ }
+}
+
+type BigKey [3]int64
+
+func BenchmarkBigKeyMap(b *testing.B) {
+ m := make(map[BigKey]bool)
+ k := BigKey{3, 4, 5}
+ m[k] = true
+ for i := 0; i < b.N; i++ {
+ _ = m[k]
+ }
+}
+
+type BigVal [3]int64
+
+func BenchmarkBigValMap(b *testing.B) {
+ m := make(map[BigKey]BigVal)
+ k := BigKey{3, 4, 5}
+ m[k] = BigVal{6, 7, 8}
+ for i := 0; i < b.N; i++ {
+ _ = m[k]
+ }
+}
+
+func BenchmarkSmallKeyMap(b *testing.B) {
+ m := make(map[int16]bool)
+ m[5] = true
+ for i := 0; i < b.N; i++ {
+ _ = m[5]
+ }
+}
+
+func BenchmarkMapPopulate(b *testing.B) {
+ for size := 1; size < 1000000; size *= 10 {
+ b.Run(strconv.Itoa(size), func(b *testing.B) {
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ m := make(map[int]bool)
+ for j := 0; j < size; j++ {
+ m[j] = true
+ }
+ }
+ })
+ }
+}
+
+type ComplexAlgKey struct {
+ a, b, c int64
+ _ int
+ d int32
+ _ int
+ e string
+ _ int
+ f, g, h int64
+}
+
+func BenchmarkComplexAlgMap(b *testing.B) {
+ m := make(map[ComplexAlgKey]bool)
+ var k ComplexAlgKey
+ m[k] = true
+ for i := 0; i < b.N; i++ {
+ _ = m[k]
+ }
+}
+
+func BenchmarkGoMapClear(b *testing.B) {
+ b.Run("Reflexive", func(b *testing.B) {
+ for size := 1; size < 100000; size *= 10 {
+ b.Run(strconv.Itoa(size), func(b *testing.B) {
+ m := make(map[int]int, size)
+ for i := 0; i < b.N; i++ {
+ m[0] = size // Add one element so len(m) != 0 avoiding fast paths.
+ for k := range m {
+ delete(m, k)
+ }
+ }
+ })
+ }
+ })
+ b.Run("NonReflexive", func(b *testing.B) {
+ for size := 1; size < 100000; size *= 10 {
+ b.Run(strconv.Itoa(size), func(b *testing.B) {
+ m := make(map[float64]int, size)
+ for i := 0; i < b.N; i++ {
+ m[1.0] = size // Add one element so len(m) != 0 avoiding fast paths.
+ for k := range m {
+ delete(m, k)
+ }
+ }
+ })
+ }
+ })
+}
+
+func BenchmarkMapStringConversion(b *testing.B) {
+ for _, length := range []int{32, 64} {
+ b.Run(strconv.Itoa(length), func(b *testing.B) {
+ bytes := make([]byte, length)
+ b.Run("simple", func(b *testing.B) {
+ b.ReportAllocs()
+ m := make(map[string]int)
+ m[string(bytes)] = 0
+ for i := 0; i < b.N; i++ {
+ _ = m[string(bytes)]
+ }
+ })
+ b.Run("struct", func(b *testing.B) {
+ b.ReportAllocs()
+ type stringstruct struct{ s string }
+ m := make(map[stringstruct]int)
+ m[stringstruct{string(bytes)}] = 0
+ for i := 0; i < b.N; i++ {
+ _ = m[stringstruct{string(bytes)}]
+ }
+ })
+ b.Run("array", func(b *testing.B) {
+ b.ReportAllocs()
+ type stringarray [1]string
+ m := make(map[stringarray]int)
+ m[stringarray{string(bytes)}] = 0
+ for i := 0; i < b.N; i++ {
+ _ = m[stringarray{string(bytes)}]
+ }
+ })
+ })
+ }
+}
+
+var BoolSink bool
+
+func BenchmarkMapInterfaceString(b *testing.B) {
+ m := map[interface{}]bool{}
+
+ for i := 0; i < 100; i++ {
+ m[fmt.Sprintf("%d", i)] = true
+ }
+
+ key := (interface{})("A")
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ BoolSink = m[key]
+ }
+}
+func BenchmarkMapInterfacePtr(b *testing.B) {
+ m := map[interface{}]bool{}
+
+ for i := 0; i < 100; i++ {
+ i := i
+ m[&i] = true
+ }
+
+ key := new(int)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ BoolSink = m[key]
+ }
+}
+
+var (
+ hintLessThan8 = 7
+ hintGreaterThan8 = 32
+)
+
+func BenchmarkNewEmptyMapHintLessThan8(b *testing.B) {
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ _ = make(map[int]int, hintLessThan8)
+ }
+}
+
+func BenchmarkNewEmptyMapHintGreaterThan8(b *testing.B) {
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ _ = make(map[int]int, hintGreaterThan8)
+ }
+}
diff --git a/src/runtime/map_fast32.go b/src/runtime/map_fast32.go
new file mode 100644
index 0000000..8d52dad
--- /dev/null
+++ b/src/runtime/map_fast32.go
@@ -0,0 +1,461 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
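+
+// The functions in this file are specializations of the generic map routines in
+// map.go for maps whose keys are 4 bytes wide (e.g. uint32/int32, and pointer
+// keys on 32-bit platforms). The compiler selects these variants for suitable
+// key types, which lets them compare keys directly instead of calling t.key.equal.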
+
+func mapaccess1_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapaccess1_fast32))
+ }
+ if h == nil || h.count == 0 {
+ return unsafe.Pointer(&zeroVal[0])
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map read and map write")
+ }
+ var b *bmap
+ if h.B == 0 {
+ // One-bucket table. No need to hash.
+ b = (*bmap)(h.buckets)
+ } else {
+ hash := t.hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b = (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ }
+ for ; b != nil; b = b.overflow(t) {
+ for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 4) {
+ if *(*uint32)(k) == key && !isEmpty(b.tophash[i]) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*4+i*uintptr(t.elemsize))
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0])
+}
+
+func mapaccess2_fast32(t *maptype, h *hmap, key uint32) (unsafe.Pointer, bool) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapaccess2_fast32))
+ }
+ if h == nil || h.count == 0 {
+ return unsafe.Pointer(&zeroVal[0]), false
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map read and map write")
+ }
+ var b *bmap
+ if h.B == 0 {
+ // One-bucket table. No need to hash.
+ b = (*bmap)(h.buckets)
+ } else {
+ hash := t.hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b = (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ }
+ for ; b != nil; b = b.overflow(t) {
+ for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 4) {
+ if *(*uint32)(k) == key && !isEmpty(b.tophash[i]) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*4+i*uintptr(t.elemsize)), true
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0]), false
+}
+
+func mapassign_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer {
+ if h == nil {
+ panic(plainError("assignment to entry in nil map"))
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, funcPC(mapassign_fast32))
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map writes")
+ }
+ hash := t.hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapassign.
+ h.flags ^= hashWriting
+
+ if h.buckets == nil {
+ h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
+ }
+
+again:
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_fast32(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
+
+ var insertb *bmap
+ var inserti uintptr
+ var insertk unsafe.Pointer
+
+bucketloop:
+ for {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if isEmpty(b.tophash[i]) {
+ if insertb == nil {
+ inserti = i
+ insertb = b
+ }
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := *((*uint32)(add(unsafe.Pointer(b), dataOffset+i*4)))
+ if k != key {
+ continue
+ }
+ inserti = i
+ insertb = b
+ goto done
+ }
+ ovf := b.overflow(t)
+ if ovf == nil {
+ break
+ }
+ b = ovf
+ }
+
+ // Did not find mapping for key. Allocate new cell & add entry.
+
+ // If we hit the max load factor or we have too many overflow buckets,
+ // and we're not already in the middle of growing, start growing.
+ if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
+ hashGrow(t, h)
+ goto again // Growing the table invalidates everything, so try again
+ }
+
+ if insertb == nil {
+ // The current bucket and all the overflow buckets connected to it are full; allocate a new one.
+ insertb = h.newoverflow(t, b)
+ inserti = 0 // not necessary, but avoids needlessly spilling inserti
+ }
+ insertb.tophash[inserti&(bucketCnt-1)] = tophash(hash) // mask inserti to avoid bounds checks
+
+ insertk = add(unsafe.Pointer(insertb), dataOffset+inserti*4)
+ // store new key at insert position
+ *(*uint32)(insertk) = key
+
+ h.count++
+
+done:
+ elem := add(unsafe.Pointer(insertb), dataOffset+bucketCnt*4+inserti*uintptr(t.elemsize))
+ if h.flags&hashWriting == 0 {
+ throw("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+ return elem
+}
+
+func mapassign_fast32ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
+ if h == nil {
+ panic(plainError("assignment to entry in nil map"))
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, funcPC(mapassign_fast32))
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map writes")
+ }
+ hash := t.hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapassign.
+ h.flags ^= hashWriting
+
+ if h.buckets == nil {
+ h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
+ }
+
+again:
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_fast32(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
+
+ var insertb *bmap
+ var inserti uintptr
+ var insertk unsafe.Pointer
+
+bucketloop:
+ for {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if isEmpty(b.tophash[i]) {
+ if insertb == nil {
+ inserti = i
+ insertb = b
+ }
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := *((*unsafe.Pointer)(add(unsafe.Pointer(b), dataOffset+i*4)))
+ if k != key {
+ continue
+ }
+ inserti = i
+ insertb = b
+ goto done
+ }
+ ovf := b.overflow(t)
+ if ovf == nil {
+ break
+ }
+ b = ovf
+ }
+
+ // Did not find mapping for key. Allocate new cell & add entry.
+
+ // If we hit the max load factor or we have too many overflow buckets,
+ // and we're not already in the middle of growing, start growing.
+ if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
+ hashGrow(t, h)
+ goto again // Growing the table invalidates everything, so try again
+ }
+
+ if insertb == nil {
+ // The current bucket and all the overflow buckets connected to it are full; allocate a new one.
+ insertb = h.newoverflow(t, b)
+ inserti = 0 // not necessary, but avoids needlessly spilling inserti
+ }
+ insertb.tophash[inserti&(bucketCnt-1)] = tophash(hash) // mask inserti to avoid bounds checks
+
+ insertk = add(unsafe.Pointer(insertb), dataOffset+inserti*4)
+ // store new key at insert position
+ *(*unsafe.Pointer)(insertk) = key
+
+ h.count++
+
+done:
+ elem := add(unsafe.Pointer(insertb), dataOffset+bucketCnt*4+inserti*uintptr(t.elemsize))
+ if h.flags&hashWriting == 0 {
+ throw("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+ return elem
+}
+
+func mapdelete_fast32(t *maptype, h *hmap, key uint32) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, funcPC(mapdelete_fast32))
+ }
+ if h == nil || h.count == 0 {
+ return
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map writes")
+ }
+
+ hash := t.hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapdelete
+ h.flags ^= hashWriting
+
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_fast32(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
+ bOrig := b
+search:
+ for ; b != nil; b = b.overflow(t) {
+ for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 4) {
+ if key != *(*uint32)(k) || isEmpty(b.tophash[i]) {
+ continue
+ }
+ // Only clear key if there are pointers in it.
+ // This can only happen if pointers are 32 bits
+ // wide, as 64 bit pointers do not fit into a 32 bit key.
+ if sys.PtrSize == 4 && t.key.ptrdata != 0 {
+ // The key must be a pointer as we checked pointers are
+ // 32 bits wide and the key is 32 bits wide also.
+ *(*unsafe.Pointer)(k) = nil
+ }
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*4+i*uintptr(t.elemsize))
+ if t.elem.ptrdata != 0 {
+ memclrHasPointers(e, t.elem.size)
+ } else {
+ memclrNoHeapPointers(e, t.elem.size)
+ }
+ b.tophash[i] = emptyOne
+ // If the bucket now ends in a bunch of emptyOne states,
+ // change those to emptyRest states.
+ if i == bucketCnt-1 {
+ if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
+ goto notLast
+ }
+ } else {
+ if b.tophash[i+1] != emptyRest {
+ goto notLast
+ }
+ }
+ for {
+ b.tophash[i] = emptyRest
+ if i == 0 {
+ if b == bOrig {
+ break // beginning of initial bucket, we're done.
+ }
+ // Find previous bucket, continue at its last entry.
+ c := b
+ for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
+ }
+ i = bucketCnt - 1
+ } else {
+ i--
+ }
+ if b.tophash[i] != emptyOne {
+ break
+ }
+ }
+ notLast:
+ h.count--
+ // Reset the hash seed to make it more difficult for attackers to
+ // repeatedly trigger hash collisions. See issue 25237.
+ if h.count == 0 {
+ h.hash0 = fastrand()
+ }
+ break search
+ }
+ }
+
+ if h.flags&hashWriting == 0 {
+ throw("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+}
+
+func growWork_fast32(t *maptype, h *hmap, bucket uintptr) {
+ // make sure we evacuate the oldbucket corresponding
+ // to the bucket we're about to use
+ evacuate_fast32(t, h, bucket&h.oldbucketmask())
+
+ // evacuate one more oldbucket to make progress on growing
+ if h.growing() {
+ evacuate_fast32(t, h, h.nevacuate)
+ }
+}
+
+func evacuate_fast32(t *maptype, h *hmap, oldbucket uintptr) {
+ b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
+ newbit := h.noldbuckets()
+ if !evacuated(b) {
+ // TODO: reuse overflow buckets instead of using new ones, if there
+ // is no iterator using the old buckets. (If !oldIterator.)
+
+ // xy contains the x and y (low and high) evacuation destinations.
+ var xy [2]evacDst
+ x := &xy[0]
+ x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
+ x.k = add(unsafe.Pointer(x.b), dataOffset)
+ x.e = add(x.k, bucketCnt*4)
+
+ if !h.sameSizeGrow() {
+ // Only calculate y pointers if we're growing bigger.
+ // Otherwise GC can see bad pointers.
+ y := &xy[1]
+ y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
+ y.k = add(unsafe.Pointer(y.b), dataOffset)
+ y.e = add(y.k, bucketCnt*4)
+ }
+
+ for ; b != nil; b = b.overflow(t) {
+ k := add(unsafe.Pointer(b), dataOffset)
+ e := add(k, bucketCnt*4)
+ for i := 0; i < bucketCnt; i, k, e = i+1, add(k, 4), add(e, uintptr(t.elemsize)) {
+ top := b.tophash[i]
+ if isEmpty(top) {
+ b.tophash[i] = evacuatedEmpty
+ continue
+ }
+ if top < minTopHash {
+ throw("bad map state")
+ }
+ var useY uint8
+ if !h.sameSizeGrow() {
+ // Compute hash to make our evacuation decision (whether we need
+ // to send this key/elem to bucket x or bucket y).
+ hash := t.hasher(k, uintptr(h.hash0))
+ if hash&newbit != 0 {
+ useY = 1
+ }
+ }
+
+ b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY, enforced in makemap
+ dst := &xy[useY] // evacuation destination
+
+ if dst.i == bucketCnt {
+ dst.b = h.newoverflow(t, dst.b)
+ dst.i = 0
+ dst.k = add(unsafe.Pointer(dst.b), dataOffset)
+ dst.e = add(dst.k, bucketCnt*4)
+ }
+ dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
+
+ // Copy key.
+ if sys.PtrSize == 4 && t.key.ptrdata != 0 && writeBarrier.enabled {
+ // Write with a write barrier.
+ *(*unsafe.Pointer)(dst.k) = *(*unsafe.Pointer)(k)
+ } else {
+ *(*uint32)(dst.k) = *(*uint32)(k)
+ }
+
+ typedmemmove(t.elem, dst.e, e)
+ dst.i++
+ // These updates might push these pointers past the end of the
+ // key or elem arrays. That's ok, as we have the overflow pointer
+ // at the end of the bucket to protect against pointing past the
+ // end of the bucket.
+ dst.k = add(dst.k, 4)
+ dst.e = add(dst.e, uintptr(t.elemsize))
+ }
+ }
+ // Unlink the overflow buckets & clear key/elem to help GC.
+ if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
+ b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
+ // Preserve b.tophash because the evacuation
+ // state is maintained there.
+ ptr := add(b, dataOffset)
+ n := uintptr(t.bucketsize) - dataOffset
+ memclrHasPointers(ptr, n)
+ }
+ }
+
+ if oldbucket == h.nevacuate {
+ advanceEvacuationMark(h, t, newbit)
+ }
+}
diff --git a/src/runtime/map_fast64.go b/src/runtime/map_fast64.go
new file mode 100644
index 0000000..f1368dc
--- /dev/null
+++ b/src/runtime/map_fast64.go
@@ -0,0 +1,469 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
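+
+// This file mirrors map_fast32.go for maps whose keys are 8 bytes wide
+// (e.g. uint64/int64, and pointer keys on 64-bit platforms); the compiler
+// selects these variants for suitable key types.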
+
+func mapaccess1_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapaccess1_fast64))
+ }
+ if h == nil || h.count == 0 {
+ return unsafe.Pointer(&zeroVal[0])
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map read and map write")
+ }
+ var b *bmap
+ if h.B == 0 {
+ // One-bucket table. No need to hash.
+ b = (*bmap)(h.buckets)
+ } else {
+ hash := t.hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b = (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ }
+ for ; b != nil; b = b.overflow(t) {
+ for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 8) {
+ if *(*uint64)(k) == key && !isEmpty(b.tophash[i]) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*8+i*uintptr(t.elemsize))
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0])
+}
+
+func mapaccess2_fast64(t *maptype, h *hmap, key uint64) (unsafe.Pointer, bool) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapaccess2_fast64))
+ }
+ if h == nil || h.count == 0 {
+ return unsafe.Pointer(&zeroVal[0]), false
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map read and map write")
+ }
+ var b *bmap
+ if h.B == 0 {
+ // One-bucket table. No need to hash.
+ b = (*bmap)(h.buckets)
+ } else {
+ hash := t.hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b = (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ }
+ for ; b != nil; b = b.overflow(t) {
+ for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 8) {
+ if *(*uint64)(k) == key && !isEmpty(b.tophash[i]) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*8+i*uintptr(t.elemsize)), true
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0]), false
+}
+
+func mapassign_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer {
+ if h == nil {
+ panic(plainError("assignment to entry in nil map"))
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, funcPC(mapassign_fast64))
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map writes")
+ }
+ hash := t.hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapassign.
+ h.flags ^= hashWriting
+
+ if h.buckets == nil {
+ h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
+ }
+
+again:
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_fast64(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
+
+ var insertb *bmap
+ var inserti uintptr
+ var insertk unsafe.Pointer
+
+bucketloop:
+ for {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if isEmpty(b.tophash[i]) {
+ if insertb == nil {
+ insertb = b
+ inserti = i
+ }
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := *((*uint64)(add(unsafe.Pointer(b), dataOffset+i*8)))
+ if k != key {
+ continue
+ }
+ insertb = b
+ inserti = i
+ goto done
+ }
+ ovf := b.overflow(t)
+ if ovf == nil {
+ break
+ }
+ b = ovf
+ }
+
+ // Did not find mapping for key. Allocate new cell & add entry.
+
+ // If we hit the max load factor or we have too many overflow buckets,
+ // and we're not already in the middle of growing, start growing.
+ if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
+ hashGrow(t, h)
+ goto again // Growing the table invalidates everything, so try again
+ }
+
+ if insertb == nil {
+ // The current bucket and all the overflow buckets connected to it are full; allocate a new one.
+ insertb = h.newoverflow(t, b)
+ inserti = 0 // not necessary, but avoids needlessly spilling inserti
+ }
+ insertb.tophash[inserti&(bucketCnt-1)] = tophash(hash) // mask inserti to avoid bounds checks
+
+ insertk = add(unsafe.Pointer(insertb), dataOffset+inserti*8)
+ // store new key at insert position
+ *(*uint64)(insertk) = key
+
+ h.count++
+
+done:
+ elem := add(unsafe.Pointer(insertb), dataOffset+bucketCnt*8+inserti*uintptr(t.elemsize))
+ if h.flags&hashWriting == 0 {
+ throw("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+ return elem
+}
+
+func mapassign_fast64ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
+ if h == nil {
+ panic(plainError("assignment to entry in nil map"))
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, funcPC(mapassign_fast64))
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map writes")
+ }
+ hash := t.hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapassign.
+ h.flags ^= hashWriting
+
+ if h.buckets == nil {
+ h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
+ }
+
+again:
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_fast64(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
+
+ var insertb *bmap
+ var inserti uintptr
+ var insertk unsafe.Pointer
+
+bucketloop:
+ for {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if isEmpty(b.tophash[i]) {
+ if insertb == nil {
+ insertb = b
+ inserti = i
+ }
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := *((*unsafe.Pointer)(add(unsafe.Pointer(b), dataOffset+i*8)))
+ if k != key {
+ continue
+ }
+ insertb = b
+ inserti = i
+ goto done
+ }
+ ovf := b.overflow(t)
+ if ovf == nil {
+ break
+ }
+ b = ovf
+ }
+
+ // Did not find mapping for key. Allocate new cell & add entry.
+
+ // If we hit the max load factor or we have too many overflow buckets,
+ // and we're not already in the middle of growing, start growing.
+ if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
+ hashGrow(t, h)
+ goto again // Growing the table invalidates everything, so try again
+ }
+
+ if insertb == nil {
+ // The current bucket and all the overflow buckets connected to it are full; allocate a new one.
+ insertb = h.newoverflow(t, b)
+ inserti = 0 // not necessary, but avoids needlessly spilling inserti
+ }
+ insertb.tophash[inserti&(bucketCnt-1)] = tophash(hash) // mask inserti to avoid bounds checks
+
+ insertk = add(unsafe.Pointer(insertb), dataOffset+inserti*8)
+ // store new key at insert position
+ *(*unsafe.Pointer)(insertk) = key
+
+ h.count++
+
+done:
+ elem := add(unsafe.Pointer(insertb), dataOffset+bucketCnt*8+inserti*uintptr(t.elemsize))
+ if h.flags&hashWriting == 0 {
+ throw("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+ return elem
+}
+
+func mapdelete_fast64(t *maptype, h *hmap, key uint64) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, funcPC(mapdelete_fast64))
+ }
+ if h == nil || h.count == 0 {
+ return
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map writes")
+ }
+
+ hash := t.hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapdelete
+ h.flags ^= hashWriting
+
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_fast64(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
+ bOrig := b
+search:
+ for ; b != nil; b = b.overflow(t) {
+ for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 8) {
+ if key != *(*uint64)(k) || isEmpty(b.tophash[i]) {
+ continue
+ }
+ // Only clear key if there are pointers in it.
+ if t.key.ptrdata != 0 {
+ if sys.PtrSize == 8 {
+ *(*unsafe.Pointer)(k) = nil
+ } else {
+ // There are three ways to squeeze one or more 32 bit pointers into 64 bits.
+ // Just call memclrHasPointers instead of trying to handle all cases here.
+ memclrHasPointers(k, 8)
+ }
+ }
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*8+i*uintptr(t.elemsize))
+ if t.elem.ptrdata != 0 {
+ memclrHasPointers(e, t.elem.size)
+ } else {
+ memclrNoHeapPointers(e, t.elem.size)
+ }
+ b.tophash[i] = emptyOne
+ // If the bucket now ends in a bunch of emptyOne states,
+ // change those to emptyRest states.
+ if i == bucketCnt-1 {
+ if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
+ goto notLast
+ }
+ } else {
+ if b.tophash[i+1] != emptyRest {
+ goto notLast
+ }
+ }
+ for {
+ b.tophash[i] = emptyRest
+ if i == 0 {
+ if b == bOrig {
+ break // beginning of initial bucket, we're done.
+ }
+ // Find previous bucket, continue at its last entry.
+ c := b
+ for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
+ }
+ i = bucketCnt - 1
+ } else {
+ i--
+ }
+ if b.tophash[i] != emptyOne {
+ break
+ }
+ }
+ notLast:
+ h.count--
+ // Reset the hash seed to make it more difficult for attackers to
+ // repeatedly trigger hash collisions. See issue 25237.
+ if h.count == 0 {
+ h.hash0 = fastrand()
+ }
+ break search
+ }
+ }
+
+ if h.flags&hashWriting == 0 {
+ throw("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+}
+
+func growWork_fast64(t *maptype, h *hmap, bucket uintptr) {
+ // make sure we evacuate the oldbucket corresponding
+ // to the bucket we're about to use
+ evacuate_fast64(t, h, bucket&h.oldbucketmask())
+
+ // evacuate one more oldbucket to make progress on growing
+ if h.growing() {
+ evacuate_fast64(t, h, h.nevacuate)
+ }
+}
+
+func evacuate_fast64(t *maptype, h *hmap, oldbucket uintptr) {
+ b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
+ newbit := h.noldbuckets()
+ if !evacuated(b) {
+ // TODO: reuse overflow buckets instead of using new ones, if there
+ // is no iterator using the old buckets. (If !oldIterator.)
+
+ // xy contains the x and y (low and high) evacuation destinations.
+ var xy [2]evacDst
+ x := &xy[0]
+ x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
+ x.k = add(unsafe.Pointer(x.b), dataOffset)
+ x.e = add(x.k, bucketCnt*8)
+
+ if !h.sameSizeGrow() {
+ // Only calculate y pointers if we're growing bigger.
+ // Otherwise GC can see bad pointers.
+ y := &xy[1]
+ y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
+ y.k = add(unsafe.Pointer(y.b), dataOffset)
+ y.e = add(y.k, bucketCnt*8)
+ }
+
+ for ; b != nil; b = b.overflow(t) {
+ k := add(unsafe.Pointer(b), dataOffset)
+ e := add(k, bucketCnt*8)
+ for i := 0; i < bucketCnt; i, k, e = i+1, add(k, 8), add(e, uintptr(t.elemsize)) {
+ top := b.tophash[i]
+ if isEmpty(top) {
+ b.tophash[i] = evacuatedEmpty
+ continue
+ }
+ if top < minTopHash {
+ throw("bad map state")
+ }
+ var useY uint8
+ if !h.sameSizeGrow() {
+ // Compute hash to make our evacuation decision (whether we need
+ // to send this key/elem to bucket x or bucket y).
+ hash := t.hasher(k, uintptr(h.hash0))
+ if hash&newbit != 0 {
+ useY = 1
+ }
+ }
+
+ b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY, enforced in makemap
+ dst := &xy[useY] // evacuation destination
+
+ if dst.i == bucketCnt {
+ dst.b = h.newoverflow(t, dst.b)
+ dst.i = 0
+ dst.k = add(unsafe.Pointer(dst.b), dataOffset)
+ dst.e = add(dst.k, bucketCnt*8)
+ }
+ dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
+
+ // Copy key.
+ if t.key.ptrdata != 0 && writeBarrier.enabled {
+ if sys.PtrSize == 8 {
+ // Write with a write barrier.
+ *(*unsafe.Pointer)(dst.k) = *(*unsafe.Pointer)(k)
+ } else {
+ // There are three ways to squeeze at least one 32 bit pointer into 64 bits.
+ // Give up and call typedmemmove.
+ typedmemmove(t.key, dst.k, k)
+ }
+ } else {
+ *(*uint64)(dst.k) = *(*uint64)(k)
+ }
+
+ typedmemmove(t.elem, dst.e, e)
+ dst.i++
+ // These updates might push these pointers past the end of the
+ // key or elem arrays. That's ok, as we have the overflow pointer
+ // at the end of the bucket to protect against pointing past the
+ // end of the bucket.
+ dst.k = add(dst.k, 8)
+ dst.e = add(dst.e, uintptr(t.elemsize))
+ }
+ }
+ // Unlink the overflow buckets & clear key/elem to help GC.
+ if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
+ b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
+ // Preserve b.tophash because the evacuation
+ // state is maintained there.
+ ptr := add(b, dataOffset)
+ n := uintptr(t.bucketsize) - dataOffset
+ memclrHasPointers(ptr, n)
+ }
+ }
+
+ if oldbucket == h.nevacuate {
+ advanceEvacuationMark(h, t, newbit)
+ }
+}
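
For reference, a minimal sketch (not part of this change) of the x/y decision evacuate_fast64 makes during a size-doubling grow: each entry either stays at its old bucket index (destination x) or moves to old index plus the old bucket count (destination y), chosen by the hash bit that has just become significant. The bucket counts and hash values below are made up; only the hash & newbit test mirrors the real code.

	package main

	import "fmt"

	func main() {
		const oldBuckets = 4      // 2^B before the grow
		const newbit = oldBuckets // the hash bit that becomes significant after doubling
		hashes := []uintptr{0x3, 0x7, 0xb, 0xf}

		for _, h := range hashes {
			oldbucket := h & (oldBuckets - 1) // bucket index before growing
			useY := h&newbit != 0             // does the newly significant bit pick the high half?
			newbucket := oldbucket
			if useY {
				newbucket += oldBuckets // destination y: old index + old bucket count
			}
			fmt.Printf("hash %#x: old bucket %d -> new bucket %d\n", h, oldbucket, newbucket)
		}
	}
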
diff --git a/src/runtime/map_faststr.go b/src/runtime/map_faststr.go
new file mode 100644
index 0000000..2d1ac76
--- /dev/null
+++ b/src/runtime/map_faststr.go
@@ -0,0 +1,481 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+func mapaccess1_faststr(t *maptype, h *hmap, ky string) unsafe.Pointer {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapaccess1_faststr))
+ }
+ if h == nil || h.count == 0 {
+ return unsafe.Pointer(&zeroVal[0])
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map read and map write")
+ }
+ key := stringStructOf(&ky)
+ if h.B == 0 {
+ // One-bucket table.
+ b := (*bmap)(h.buckets)
+ if key.len < 32 {
+ // short key, doing lots of comparisons is ok
+ for i, kptr := uintptr(0), b.keys(); i < bucketCnt; i, kptr = i+1, add(kptr, 2*sys.PtrSize) {
+ k := (*stringStruct)(kptr)
+ if k.len != key.len || isEmpty(b.tophash[i]) {
+ if b.tophash[i] == emptyRest {
+ break
+ }
+ continue
+ }
+ if k.str == key.str || memequal(k.str, key.str, uintptr(key.len)) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*sys.PtrSize+i*uintptr(t.elemsize))
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0])
+ }
+ // long key, try not to do more comparisons than necessary
+ keymaybe := uintptr(bucketCnt)
+ for i, kptr := uintptr(0), b.keys(); i < bucketCnt; i, kptr = i+1, add(kptr, 2*sys.PtrSize) {
+ k := (*stringStruct)(kptr)
+ if k.len != key.len || isEmpty(b.tophash[i]) {
+ if b.tophash[i] == emptyRest {
+ break
+ }
+ continue
+ }
+ if k.str == key.str {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*sys.PtrSize+i*uintptr(t.elemsize))
+ }
+ // check first 4 bytes
+ if *((*[4]byte)(key.str)) != *((*[4]byte)(k.str)) {
+ continue
+ }
+ // check last 4 bytes
+ if *((*[4]byte)(add(key.str, uintptr(key.len)-4))) != *((*[4]byte)(add(k.str, uintptr(key.len)-4))) {
+ continue
+ }
+ if keymaybe != bucketCnt {
+ // Two keys are potential matches. Use hash to distinguish them.
+ goto dohash
+ }
+ keymaybe = i
+ }
+ if keymaybe != bucketCnt {
+ k := (*stringStruct)(add(unsafe.Pointer(b), dataOffset+keymaybe*2*sys.PtrSize))
+ if memequal(k.str, key.str, uintptr(key.len)) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*sys.PtrSize+keymaybe*uintptr(t.elemsize))
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0])
+ }
+dohash:
+ hash := t.hasher(noescape(unsafe.Pointer(&ky)), uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ top := tophash(hash)
+ for ; b != nil; b = b.overflow(t) {
+ for i, kptr := uintptr(0), b.keys(); i < bucketCnt; i, kptr = i+1, add(kptr, 2*sys.PtrSize) {
+ k := (*stringStruct)(kptr)
+ if k.len != key.len || b.tophash[i] != top {
+ continue
+ }
+ if k.str == key.str || memequal(k.str, key.str, uintptr(key.len)) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*sys.PtrSize+i*uintptr(t.elemsize))
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0])
+}
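
The long-key branch above is worth spelling out: before paying for memequal on a 32-byte-or-longer key, the code rejects candidates by length and by their first and last four bytes, and only falls back to hashing when two candidates survive. A rough sketch of the quick-reject idea, using ordinary string slicing in place of the unsafe pointer arithmetic (the helper name and examples are illustrative, not part of this change):

	package main

	import (
		"fmt"
		"strings"
	)

	// probablyEqual reports whether a full byte comparison of k and key is still
	// worth doing after the cheap rejects used above: equal lengths, equal first
	// four bytes, equal last four bytes. Keys are assumed to be at least 4 bytes.
	func probablyEqual(k, key string) bool {
		if len(k) != len(key) {
			return false
		}
		if k[:4] != key[:4] {
			return false
		}
		if k[len(k)-4:] != key[len(key)-4:] {
			return false
		}
		return true
	}

	func main() {
		long := strings.Repeat("x", 40)
		fmt.Println(probablyEqual(long, long))                    // true: worth a full comparison
		fmt.Println(probablyEqual("y"+long[1:], long))            // false: first 4 bytes differ
		fmt.Println(probablyEqual(long[:39]+"y", long))           // false: last 4 bytes differ
		fmt.Println(probablyEqual(strings.Repeat("x", 39), long)) // false: lengths differ
	}
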
+
+func mapaccess2_faststr(t *maptype, h *hmap, ky string) (unsafe.Pointer, bool) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapaccess2_faststr))
+ }
+ if h == nil || h.count == 0 {
+ return unsafe.Pointer(&zeroVal[0]), false
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map read and map write")
+ }
+ key := stringStructOf(&ky)
+ if h.B == 0 {
+ // One-bucket table.
+ b := (*bmap)(h.buckets)
+ if key.len < 32 {
+ // short key, doing lots of comparisons is ok
+ for i, kptr := uintptr(0), b.keys(); i < bucketCnt; i, kptr = i+1, add(kptr, 2*sys.PtrSize) {
+ k := (*stringStruct)(kptr)
+ if k.len != key.len || isEmpty(b.tophash[i]) {
+ if b.tophash[i] == emptyRest {
+ break
+ }
+ continue
+ }
+ if k.str == key.str || memequal(k.str, key.str, uintptr(key.len)) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*sys.PtrSize+i*uintptr(t.elemsize)), true
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0]), false
+ }
+ // long key, try not to do more comparisons than necessary
+ keymaybe := uintptr(bucketCnt)
+ for i, kptr := uintptr(0), b.keys(); i < bucketCnt; i, kptr = i+1, add(kptr, 2*sys.PtrSize) {
+ k := (*stringStruct)(kptr)
+ if k.len != key.len || isEmpty(b.tophash[i]) {
+ if b.tophash[i] == emptyRest {
+ break
+ }
+ continue
+ }
+ if k.str == key.str {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*sys.PtrSize+i*uintptr(t.elemsize)), true
+ }
+ // check first 4 bytes
+ if *((*[4]byte)(key.str)) != *((*[4]byte)(k.str)) {
+ continue
+ }
+ // check last 4 bytes
+ if *((*[4]byte)(add(key.str, uintptr(key.len)-4))) != *((*[4]byte)(add(k.str, uintptr(key.len)-4))) {
+ continue
+ }
+ if keymaybe != bucketCnt {
+ // Two keys are potential matches. Use hash to distinguish them.
+ goto dohash
+ }
+ keymaybe = i
+ }
+ if keymaybe != bucketCnt {
+ k := (*stringStruct)(add(unsafe.Pointer(b), dataOffset+keymaybe*2*sys.PtrSize))
+ if memequal(k.str, key.str, uintptr(key.len)) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*sys.PtrSize+keymaybe*uintptr(t.elemsize)), true
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0]), false
+ }
+dohash:
+ hash := t.hasher(noescape(unsafe.Pointer(&ky)), uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ top := tophash(hash)
+ for ; b != nil; b = b.overflow(t) {
+ for i, kptr := uintptr(0), b.keys(); i < bucketCnt; i, kptr = i+1, add(kptr, 2*sys.PtrSize) {
+ k := (*stringStruct)(kptr)
+ if k.len != key.len || b.tophash[i] != top {
+ continue
+ }
+ if k.str == key.str || memequal(k.str, key.str, uintptr(key.len)) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*sys.PtrSize+i*uintptr(t.elemsize)), true
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0]), false
+}
+
+func mapassign_faststr(t *maptype, h *hmap, s string) unsafe.Pointer {
+ if h == nil {
+ panic(plainError("assignment to entry in nil map"))
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, funcPC(mapassign_faststr))
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map writes")
+ }
+ key := stringStructOf(&s)
+ hash := t.hasher(noescape(unsafe.Pointer(&s)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapassign.
+ h.flags ^= hashWriting
+
+ if h.buckets == nil {
+ h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
+ }
+
+again:
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_faststr(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
+ top := tophash(hash)
+
+ var insertb *bmap
+ var inserti uintptr
+ var insertk unsafe.Pointer
+
+bucketloop:
+ for {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if b.tophash[i] != top {
+ if isEmpty(b.tophash[i]) && insertb == nil {
+ insertb = b
+ inserti = i
+ }
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := (*stringStruct)(add(unsafe.Pointer(b), dataOffset+i*2*sys.PtrSize))
+ if k.len != key.len {
+ continue
+ }
+ if k.str != key.str && !memequal(k.str, key.str, uintptr(key.len)) {
+ continue
+ }
+ // already have a mapping for key. Update it.
+ inserti = i
+ insertb = b
+ goto done
+ }
+ ovf := b.overflow(t)
+ if ovf == nil {
+ break
+ }
+ b = ovf
+ }
+
+ // Did not find mapping for key. Allocate new cell & add entry.
+
+ // If we hit the max load factor or we have too many overflow buckets,
+ // and we're not already in the middle of growing, start growing.
+ if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
+ hashGrow(t, h)
+ goto again // Growing the table invalidates everything, so try again
+ }
+
+ if insertb == nil {
+ // The current bucket and all the overflow buckets connected to it are full, allocate a new one.
+ insertb = h.newoverflow(t, b)
+ inserti = 0 // not necessary, but avoids needlessly spilling inserti
+ }
+ insertb.tophash[inserti&(bucketCnt-1)] = top // mask inserti to avoid bounds checks
+
+ insertk = add(unsafe.Pointer(insertb), dataOffset+inserti*2*sys.PtrSize)
+ // store new key at insert position
+ *((*stringStruct)(insertk)) = *key
+ h.count++
+
+done:
+ elem := add(unsafe.Pointer(insertb), dataOffset+bucketCnt*2*sys.PtrSize+inserti*uintptr(t.elemsize))
+ if h.flags&hashWriting == 0 {
+ throw("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+ return elem
+}
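
Note how mapassign_faststr toggles hashWriting with XOR and re-checks the bit just before clearing it: if a second writer sneaks in, the two toggles cancel out, so whichever writer finishes first sees the bit unset and throws. A tiny sketch of that detection trick (the flag value mirrors hashWriting in map.go; the interleaving is simulated sequentially here):

	package main

	import "fmt"

	const hashWriting = 4 // set while a goroutine is writing to the map (value mirrors map.go)

	func main() {
		var flags uint8

		flags ^= hashWriting // writer A starts: 0 -> 4
		flags ^= hashWriting // writer B starts before A is done: 4 -> 0

		// Writer A finishes and re-checks the flag, as the code above does
		// just before clearing it.
		if flags&hashWriting == 0 {
			fmt.Println("concurrent map writes detected")
		}
	}
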
+
+func mapdelete_faststr(t *maptype, h *hmap, ky string) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, funcPC(mapdelete_faststr))
+ }
+ if h == nil || h.count == 0 {
+ return
+ }
+ if h.flags&hashWriting != 0 {
+ throw("concurrent map writes")
+ }
+
+ key := stringStructOf(&ky)
+ hash := t.hasher(noescape(unsafe.Pointer(&ky)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapdelete
+ h.flags ^= hashWriting
+
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_faststr(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
+ bOrig := b
+ top := tophash(hash)
+search:
+ for ; b != nil; b = b.overflow(t) {
+ for i, kptr := uintptr(0), b.keys(); i < bucketCnt; i, kptr = i+1, add(kptr, 2*sys.PtrSize) {
+ k := (*stringStruct)(kptr)
+ if k.len != key.len || b.tophash[i] != top {
+ continue
+ }
+ if k.str != key.str && !memequal(k.str, key.str, uintptr(key.len)) {
+ continue
+ }
+ // Clear key's pointer.
+ k.str = nil
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*2*sys.PtrSize+i*uintptr(t.elemsize))
+ if t.elem.ptrdata != 0 {
+ memclrHasPointers(e, t.elem.size)
+ } else {
+ memclrNoHeapPointers(e, t.elem.size)
+ }
+ b.tophash[i] = emptyOne
+ // If the bucket now ends in a bunch of emptyOne states,
+ // change those to emptyRest states.
+ if i == bucketCnt-1 {
+ if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
+ goto notLast
+ }
+ } else {
+ if b.tophash[i+1] != emptyRest {
+ goto notLast
+ }
+ }
+ for {
+ b.tophash[i] = emptyRest
+ if i == 0 {
+ if b == bOrig {
+ break // beginning of initial bucket, we're done.
+ }
+ // Find previous bucket, continue at its last entry.
+ c := b
+ for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
+ }
+ i = bucketCnt - 1
+ } else {
+ i--
+ }
+ if b.tophash[i] != emptyOne {
+ break
+ }
+ }
+ notLast:
+ h.count--
+ // Reset the hash seed to make it more difficult for attackers to
+ // repeatedly trigger hash collisions. See issue 25237.
+ if h.count == 0 {
+ h.hash0 = fastrand()
+ }
+ break search
+ }
+ }
+
+ if h.flags&hashWriting == 0 {
+ throw("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+}
+
+func growWork_faststr(t *maptype, h *hmap, bucket uintptr) {
+ // make sure we evacuate the oldbucket corresponding
+ // to the bucket we're about to use
+ evacuate_faststr(t, h, bucket&h.oldbucketmask())
+
+ // evacuate one more oldbucket to make progress on growing
+ if h.growing() {
+ evacuate_faststr(t, h, h.nevacuate)
+ }
+}
+
+func evacuate_faststr(t *maptype, h *hmap, oldbucket uintptr) {
+ b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
+ newbit := h.noldbuckets()
+ if !evacuated(b) {
+ // TODO: reuse overflow buckets instead of using new ones, if there
+ // is no iterator using the old buckets. (If !oldIterator.)
+
+ // xy contains the x and y (low and high) evacuation destinations.
+ var xy [2]evacDst
+ x := &xy[0]
+ x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
+ x.k = add(unsafe.Pointer(x.b), dataOffset)
+ x.e = add(x.k, bucketCnt*2*sys.PtrSize)
+
+ if !h.sameSizeGrow() {
+ // Only calculate y pointers if we're growing bigger.
+ // Otherwise GC can see bad pointers.
+ y := &xy[1]
+ y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
+ y.k = add(unsafe.Pointer(y.b), dataOffset)
+ y.e = add(y.k, bucketCnt*2*sys.PtrSize)
+ }
+
+ for ; b != nil; b = b.overflow(t) {
+ k := add(unsafe.Pointer(b), dataOffset)
+ e := add(k, bucketCnt*2*sys.PtrSize)
+ for i := 0; i < bucketCnt; i, k, e = i+1, add(k, 2*sys.PtrSize), add(e, uintptr(t.elemsize)) {
+ top := b.tophash[i]
+ if isEmpty(top) {
+ b.tophash[i] = evacuatedEmpty
+ continue
+ }
+ if top < minTopHash {
+ throw("bad map state")
+ }
+ var useY uint8
+ if !h.sameSizeGrow() {
+ // Compute hash to make our evacuation decision (whether we need
+ // to send this key/elem to bucket x or bucket y).
+ hash := t.hasher(k, uintptr(h.hash0))
+ if hash&newbit != 0 {
+ useY = 1
+ }
+ }
+
+ b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY, enforced in makemap
+ dst := &xy[useY] // evacuation destination
+
+ if dst.i == bucketCnt {
+ dst.b = h.newoverflow(t, dst.b)
+ dst.i = 0
+ dst.k = add(unsafe.Pointer(dst.b), dataOffset)
+ dst.e = add(dst.k, bucketCnt*2*sys.PtrSize)
+ }
+ dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
+
+ // Copy key.
+ *(*string)(dst.k) = *(*string)(k)
+
+ typedmemmove(t.elem, dst.e, e)
+ dst.i++
+ // These updates might push these pointers past the end of the
+ // key or elem arrays. That's ok, as we have the overflow pointer
+ // at the end of the bucket to protect against pointing past the
+ // end of the bucket.
+ dst.k = add(dst.k, 2*sys.PtrSize)
+ dst.e = add(dst.e, uintptr(t.elemsize))
+ }
+ }
+ // Unlink the overflow buckets & clear key/elem to help GC.
+ if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
+ b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
+ // Preserve b.tophash because the evacuation
+ // state is maintained there.
+ ptr := add(b, dataOffset)
+ n := uintptr(t.bucketsize) - dataOffset
+ memclrHasPointers(ptr, n)
+ }
+ }
+
+ if oldbucket == h.nevacuate {
+ advanceEvacuationMark(h, t, newbit)
+ }
+}
diff --git a/src/runtime/map_test.go b/src/runtime/map_test.go
new file mode 100644
index 0000000..302b3c2
--- /dev/null
+++ b/src/runtime/map_test.go
@@ -0,0 +1,1241 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "math"
+ "reflect"
+ "runtime"
+ "runtime/internal/sys"
+ "sort"
+ "strconv"
+ "strings"
+ "sync"
+ "testing"
+)
+
+func TestHmapSize(t *testing.T) {
+ // The structure of hmap is defined in runtime/map.go
+ // and in cmd/compile/internal/gc/reflect.go and must be in sync.
+	// The size of hmap should be 48 bytes on 64-bit and 28 bytes on 32-bit platforms.
+ var hmapSize = uintptr(8 + 5*sys.PtrSize)
+ if runtime.RuntimeHmapSize != hmapSize {
+ t.Errorf("sizeof(runtime.hmap{})==%d, want %d", runtime.RuntimeHmapSize, hmapSize)
+ }
+
+}
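
As a quick cross-check of the constant above, and assuming hmap's layout of eight bytes of small scalar fields (flags, B, noverflow, hash0) plus five pointer-sized fields (count, buckets, oldbuckets, nevacuate, extra), the expected sizes work out as follows. This is only a sketch of the arithmetic, not a second source of truth:

	package main

	import "fmt"

	func main() {
		for _, ptrSize := range []uintptr{4, 8} {
			fmt.Printf("PtrSize %d: 8 + 5*%d = %d bytes\n", ptrSize, ptrSize, 8+5*ptrSize)
		}
		// PtrSize 4: 8 + 5*4 = 28 bytes
		// PtrSize 8: 8 + 5*8 = 48 bytes
	}
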
+
+// negative zero is a good test because:
+// 1) 0 and -0 are equal, yet have distinct representations.
+// 2) 0 is represented as all zeros, -0 isn't.
+// I'm not sure the language spec actually requires this behavior,
+// but it's what the current map implementation does.
+func TestNegativeZero(t *testing.T) {
+ m := make(map[float64]bool, 0)
+
+ m[+0.0] = true
+ m[math.Copysign(0.0, -1.0)] = true // should overwrite +0 entry
+
+ if len(m) != 1 {
+ t.Error("length wrong")
+ }
+
+ for k := range m {
+ if math.Copysign(1.0, k) > 0 {
+ t.Error("wrong sign")
+ }
+ }
+
+ m = make(map[float64]bool, 0)
+ m[math.Copysign(0.0, -1.0)] = true
+ m[+0.0] = true // should overwrite -0.0 entry
+
+ if len(m) != 1 {
+ t.Error("length wrong")
+ }
+
+ for k := range m {
+ if math.Copysign(1.0, k) < 0 {
+ t.Error("wrong sign")
+ }
+ }
+}
+
+func testMapNan(t *testing.T, m map[float64]int) {
+ if len(m) != 3 {
+ t.Error("length wrong")
+ }
+ s := 0
+ for k, v := range m {
+ if k == k {
+ t.Error("nan disappeared")
+ }
+ if (v & (v - 1)) != 0 {
+ t.Error("value wrong")
+ }
+ s |= v
+ }
+ if s != 7 {
+ t.Error("values wrong")
+ }
+}
+
+// nan is a good test because nan != nan, and nan has
+// a randomized hash value.
+func TestMapAssignmentNan(t *testing.T) {
+ m := make(map[float64]int, 0)
+ nan := math.NaN()
+
+ // Test assignment.
+ m[nan] = 1
+ m[nan] = 2
+ m[nan] = 4
+ testMapNan(t, m)
+}
+
+// nan is a good test because nan != nan, and nan has
+// a randomized hash value.
+func TestMapOperatorAssignmentNan(t *testing.T) {
+ m := make(map[float64]int, 0)
+ nan := math.NaN()
+
+ // Test assignment operations.
+ m[nan] += 1
+ m[nan] += 2
+ m[nan] += 4
+ testMapNan(t, m)
+}
+
+func TestMapOperatorAssignment(t *testing.T) {
+ m := make(map[int]int, 0)
+
+ // "m[k] op= x" is rewritten into "m[k] = m[k] op x"
+ // differently when op is / or % than when it isn't.
+ // Simple test to make sure they all work as expected.
+ m[0] = 12345
+ m[0] += 67890
+ m[0] /= 123
+ m[0] %= 456
+
+ const want = (12345 + 67890) / 123 % 456
+ if got := m[0]; got != want {
+ t.Errorf("got %d, want %d", got, want)
+ }
+}
+
+var sinkAppend bool
+
+func TestMapAppendAssignment(t *testing.T) {
+ m := make(map[int][]int, 0)
+
+ m[0] = nil
+ m[0] = append(m[0], 12345)
+ m[0] = append(m[0], 67890)
+ sinkAppend, m[0] = !sinkAppend, append(m[0], 123, 456)
+ a := []int{7, 8, 9, 0}
+ m[0] = append(m[0], a...)
+
+ want := []int{12345, 67890, 123, 456, 7, 8, 9, 0}
+ if got := m[0]; !reflect.DeepEqual(got, want) {
+ t.Errorf("got %v, want %v", got, want)
+ }
+}
+
+// Maps aren't actually copied on assignment.
+func TestAlias(t *testing.T) {
+ m := make(map[int]int, 0)
+ m[0] = 5
+ n := m
+ n[0] = 6
+ if m[0] != 6 {
+ t.Error("alias didn't work")
+ }
+}
+
+func TestGrowWithNaN(t *testing.T) {
+ m := make(map[float64]int, 4)
+ nan := math.NaN()
+
+ // Use both assignment and assignment operations as they may
+ // behave differently.
+ m[nan] = 1
+ m[nan] = 2
+ m[nan] += 4
+
+ cnt := 0
+ s := 0
+ growflag := true
+ for k, v := range m {
+ if growflag {
+ // force a hashtable resize
+ for i := 0; i < 50; i++ {
+ m[float64(i)] = i
+ }
+ for i := 50; i < 100; i++ {
+ m[float64(i)] += i
+ }
+ growflag = false
+ }
+ if k != k {
+ cnt++
+ s |= v
+ }
+ }
+ if cnt != 3 {
+ t.Error("NaN keys lost during grow")
+ }
+ if s != 7 {
+ t.Error("NaN values lost during grow")
+ }
+}
+
+type FloatInt struct {
+ x float64
+ y int
+}
+
+func TestGrowWithNegativeZero(t *testing.T) {
+ negzero := math.Copysign(0.0, -1.0)
+ m := make(map[FloatInt]int, 4)
+ m[FloatInt{0.0, 0}] = 1
+ m[FloatInt{0.0, 1}] += 2
+ m[FloatInt{0.0, 2}] += 4
+ m[FloatInt{0.0, 3}] = 8
+ growflag := true
+ s := 0
+ cnt := 0
+ negcnt := 0
+ // The first iteration should return the +0 key.
+ // The subsequent iterations should return the -0 key.
+ // I'm not really sure this is required by the spec,
+ // but it makes sense.
+ // TODO: are we allowed to get the first entry returned again???
+ for k, v := range m {
+ if v == 0 {
+ continue
+ } // ignore entries added to grow table
+ cnt++
+ if math.Copysign(1.0, k.x) < 0 {
+ if v&16 == 0 {
+ t.Error("key/value not updated together 1")
+ }
+ negcnt++
+ s |= v & 15
+ } else {
+ if v&16 == 16 {
+ t.Error("key/value not updated together 2", k, v)
+ }
+ s |= v
+ }
+ if growflag {
+ // force a hashtable resize
+ for i := 0; i < 100; i++ {
+ m[FloatInt{3.0, i}] = 0
+ }
+ // then change all the entries
+ // to negative zero
+ m[FloatInt{negzero, 0}] = 1 | 16
+ m[FloatInt{negzero, 1}] = 2 | 16
+ m[FloatInt{negzero, 2}] = 4 | 16
+ m[FloatInt{negzero, 3}] = 8 | 16
+ growflag = false
+ }
+ }
+ if s != 15 {
+ t.Error("entry missing", s)
+ }
+ if cnt != 4 {
+ t.Error("wrong number of entries returned by iterator", cnt)
+ }
+ if negcnt != 3 {
+ t.Error("update to negzero missed by iteration", negcnt)
+ }
+}
+
+func TestIterGrowAndDelete(t *testing.T) {
+ m := make(map[int]int, 4)
+ for i := 0; i < 100; i++ {
+ m[i] = i
+ }
+ growflag := true
+ for k := range m {
+ if growflag {
+ // grow the table
+ for i := 100; i < 1000; i++ {
+ m[i] = i
+ }
+ // delete all odd keys
+ for i := 1; i < 1000; i += 2 {
+ delete(m, i)
+ }
+ growflag = false
+ } else {
+ if k&1 == 1 {
+ t.Error("odd value returned")
+ }
+ }
+ }
+}
+
+// make sure old bucket arrays don't get GCd while
+// an iterator is still using them.
+func TestIterGrowWithGC(t *testing.T) {
+ m := make(map[int]int, 4)
+ for i := 0; i < 8; i++ {
+ m[i] = i
+ }
+ for i := 8; i < 16; i++ {
+ m[i] += i
+ }
+ growflag := true
+ bitmask := 0
+ for k := range m {
+ if k < 16 {
+ bitmask |= 1 << uint(k)
+ }
+ if growflag {
+ // grow the table
+ for i := 100; i < 1000; i++ {
+ m[i] = i
+ }
+ // trigger a gc
+ runtime.GC()
+ growflag = false
+ }
+ }
+ if bitmask != 1<<16-1 {
+ t.Error("missing key", bitmask)
+ }
+}
+
+func testConcurrentReadsAfterGrowth(t *testing.T, useReflect bool) {
+ t.Parallel()
+ if runtime.GOMAXPROCS(-1) == 1 {
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(16))
+ }
+ numLoop := 10
+ numGrowStep := 250
+ numReader := 16
+ if testing.Short() {
+ numLoop, numGrowStep = 2, 100
+ }
+ for i := 0; i < numLoop; i++ {
+ m := make(map[int]int, 0)
+ for gs := 0; gs < numGrowStep; gs++ {
+ m[gs] = gs
+ var wg sync.WaitGroup
+ wg.Add(numReader * 2)
+ for nr := 0; nr < numReader; nr++ {
+ go func() {
+ defer wg.Done()
+ for range m {
+ }
+ }()
+ go func() {
+ defer wg.Done()
+ for key := 0; key < gs; key++ {
+ _ = m[key]
+ }
+ }()
+ if useReflect {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ mv := reflect.ValueOf(m)
+ keys := mv.MapKeys()
+ for _, k := range keys {
+ mv.MapIndex(k)
+ }
+ }()
+ }
+ }
+ wg.Wait()
+ }
+ }
+}
+
+func TestConcurrentReadsAfterGrowth(t *testing.T) {
+ testConcurrentReadsAfterGrowth(t, false)
+}
+
+func TestConcurrentReadsAfterGrowthReflect(t *testing.T) {
+ testConcurrentReadsAfterGrowth(t, true)
+}
+
+func TestBigItems(t *testing.T) {
+ var key [256]string
+ for i := 0; i < 256; i++ {
+ key[i] = "foo"
+ }
+ m := make(map[[256]string][256]string, 4)
+ for i := 0; i < 100; i++ {
+ key[37] = fmt.Sprintf("string%02d", i)
+ m[key] = key
+ }
+ var keys [100]string
+ var values [100]string
+ i := 0
+ for k, v := range m {
+ keys[i] = k[37]
+ values[i] = v[37]
+ i++
+ }
+ sort.Strings(keys[:])
+ sort.Strings(values[:])
+ for i := 0; i < 100; i++ {
+ if keys[i] != fmt.Sprintf("string%02d", i) {
+ t.Errorf("#%d: missing key: %v", i, keys[i])
+ }
+ if values[i] != fmt.Sprintf("string%02d", i) {
+ t.Errorf("#%d: missing value: %v", i, values[i])
+ }
+ }
+}
+
+func TestMapHugeZero(t *testing.T) {
+ type T [4000]byte
+ m := map[int]T{}
+ x := m[0]
+ if x != (T{}) {
+ t.Errorf("map value not zero")
+ }
+ y, ok := m[0]
+ if ok {
+ t.Errorf("map value should be missing")
+ }
+ if y != (T{}) {
+ t.Errorf("map value not zero")
+ }
+}
+
+type empty struct {
+}
+
+func TestEmptyKeyAndValue(t *testing.T) {
+ a := make(map[int]empty, 4)
+ b := make(map[empty]int, 4)
+ c := make(map[empty]empty, 4)
+ a[0] = empty{}
+ b[empty{}] = 0
+ b[empty{}] = 1
+ c[empty{}] = empty{}
+
+ if len(a) != 1 {
+ t.Errorf("empty value insert problem")
+ }
+ if b[empty{}] != 1 {
+ t.Errorf("empty key returned wrong value")
+ }
+}
+
+// Tests a map with a single bucket, with same-length short keys
+// ("quick keys") as well as long keys.
+func TestSingleBucketMapStringKeys_DupLen(t *testing.T) {
+ testMapLookups(t, map[string]string{
+ "x": "x1val",
+ "xx": "x2val",
+ "foo": "fooval",
+ "bar": "barval", // same key length as "foo"
+ "xxxx": "x4val",
+ strings.Repeat("x", 128): "longval1",
+ strings.Repeat("y", 128): "longval2",
+ })
+}
+
+// Tests a map with a single bucket, with all keys having different lengths.
+func TestSingleBucketMapStringKeys_NoDupLen(t *testing.T) {
+ testMapLookups(t, map[string]string{
+ "x": "x1val",
+ "xx": "x2val",
+ "foo": "fooval",
+ "xxxx": "x4val",
+ "xxxxx": "x5val",
+ "xxxxxx": "x6val",
+ strings.Repeat("x", 128): "longval",
+ })
+}
+
+func testMapLookups(t *testing.T, m map[string]string) {
+ for k, v := range m {
+ if m[k] != v {
+ t.Fatalf("m[%q] = %q; want %q", k, m[k], v)
+ }
+ }
+}
+
+// Tests whether the iterator returns the right elements when
+// started in the middle of a grow, when the keys are NaNs.
+func TestMapNanGrowIterator(t *testing.T) {
+ m := make(map[float64]int)
+ nan := math.NaN()
+ const nBuckets = 16
+ // To fill nBuckets buckets takes LOAD * nBuckets keys.
+ nKeys := int(nBuckets * *runtime.HashLoad)
+
+ // Get map to full point with nan keys.
+ for i := 0; i < nKeys; i++ {
+ m[nan] = i
+ }
+ // Trigger grow
+ m[1.0] = 1
+ delete(m, 1.0)
+
+ // Run iterator
+ found := make(map[int]struct{})
+ for _, v := range m {
+ if v != -1 {
+ if _, repeat := found[v]; repeat {
+ t.Fatalf("repeat of value %d", v)
+ }
+ found[v] = struct{}{}
+ }
+ if len(found) == nKeys/2 {
+ // Halfway through iteration, finish grow.
+ for i := 0; i < nBuckets; i++ {
+ delete(m, 1.0)
+ }
+ }
+ }
+ if len(found) != nKeys {
+ t.Fatalf("missing value")
+ }
+}
+
+func TestMapIterOrder(t *testing.T) {
+ for _, n := range [...]int{3, 7, 9, 15} {
+ for i := 0; i < 1000; i++ {
+ // Make m be {0: true, 1: true, ..., n-1: true}.
+ m := make(map[int]bool)
+ for i := 0; i < n; i++ {
+ m[i] = true
+ }
+ // Check that iterating over the map produces at least two different orderings.
+ ord := func() []int {
+ var s []int
+ for key := range m {
+ s = append(s, key)
+ }
+ return s
+ }
+ first := ord()
+ ok := false
+ for try := 0; try < 100; try++ {
+ if !reflect.DeepEqual(first, ord()) {
+ ok = true
+ break
+ }
+ }
+ if !ok {
+ t.Errorf("Map with n=%d elements had consistent iteration order: %v", n, first)
+ break
+ }
+ }
+ }
+}
+
+// Issue 8410
+func TestMapSparseIterOrder(t *testing.T) {
+ // Run several rounds to increase the probability
+ // of failure. One is not enough.
+NextRound:
+ for round := 0; round < 10; round++ {
+ m := make(map[int]bool)
+ // Add 1000 items, remove 980.
+ for i := 0; i < 1000; i++ {
+ m[i] = true
+ }
+ for i := 20; i < 1000; i++ {
+ delete(m, i)
+ }
+
+ var first []int
+ for i := range m {
+ first = append(first, i)
+ }
+
+ // 800 chances to get a different iteration order.
+ // See bug 8736 for why we need so many tries.
+ for n := 0; n < 800; n++ {
+ idx := 0
+ for i := range m {
+ if i != first[idx] {
+ // iteration order changed.
+ continue NextRound
+ }
+ idx++
+ }
+ }
+ t.Fatalf("constant iteration order on round %d: %v", round, first)
+ }
+}
+
+func TestMapStringBytesLookup(t *testing.T) {
+ // Use large string keys to avoid small-allocation coalescing,
+ // which can cause AllocsPerRun to report lower counts than it should.
+ m := map[string]int{
+ "1000000000000000000000000000000000000000000000000": 1,
+ "2000000000000000000000000000000000000000000000000": 2,
+ }
+ buf := []byte("1000000000000000000000000000000000000000000000000")
+ if x := m[string(buf)]; x != 1 {
+ t.Errorf(`m[string([]byte("1"))] = %d, want 1`, x)
+ }
+ buf[0] = '2'
+ if x := m[string(buf)]; x != 2 {
+ t.Errorf(`m[string([]byte("2"))] = %d, want 2`, x)
+ }
+
+ var x int
+ n := testing.AllocsPerRun(100, func() {
+ x += m[string(buf)]
+ })
+ if n != 0 {
+ t.Errorf("AllocsPerRun for m[string(buf)] = %v, want 0", n)
+ }
+
+ x = 0
+ n = testing.AllocsPerRun(100, func() {
+ y, ok := m[string(buf)]
+ if !ok {
+ panic("!ok")
+ }
+ x += y
+ })
+ if n != 0 {
+ t.Errorf("AllocsPerRun for x,ok = m[string(buf)] = %v, want 0", n)
+ }
+}
+
+func TestMapLargeKeyNoPointer(t *testing.T) {
+ const (
+ I = 1000
+ N = 64
+ )
+ type T [N]int
+ m := make(map[T]int)
+ for i := 0; i < I; i++ {
+ var v T
+ for j := 0; j < N; j++ {
+ v[j] = i + j
+ }
+ m[v] = i
+ }
+ runtime.GC()
+ for i := 0; i < I; i++ {
+ var v T
+ for j := 0; j < N; j++ {
+ v[j] = i + j
+ }
+ if m[v] != i {
+ t.Fatalf("corrupted map: want %+v, got %+v", i, m[v])
+ }
+ }
+}
+
+func TestMapLargeValNoPointer(t *testing.T) {
+ const (
+ I = 1000
+ N = 64
+ )
+ type T [N]int
+ m := make(map[int]T)
+ for i := 0; i < I; i++ {
+ var v T
+ for j := 0; j < N; j++ {
+ v[j] = i + j
+ }
+ m[i] = v
+ }
+ runtime.GC()
+ for i := 0; i < I; i++ {
+ var v T
+ for j := 0; j < N; j++ {
+ v[j] = i + j
+ }
+ v1 := m[i]
+ for j := 0; j < N; j++ {
+ if v1[j] != v[j] {
+ t.Fatalf("corrupted map: want %+v, got %+v", v, v1)
+ }
+ }
+ }
+}
+
+// Test that making a map with a large or invalid hint
+// doesn't panic. (Issue 19926).
+func TestIgnoreBogusMapHint(t *testing.T) {
+ for _, hint := range []int64{-1, 1 << 62} {
+ _ = make(map[int]int, hint)
+ }
+}
+
+var mapSink map[int]int
+
+var mapBucketTests = [...]struct {
+ n int // n is the number of map elements
+ noescape int // number of expected buckets for non-escaping map
+ escape int // number of expected buckets for escaping map
+}{
+ {-(1 << 30), 1, 1},
+ {-1, 1, 1},
+ {0, 1, 1},
+ {1, 1, 1},
+ {8, 1, 1},
+ {9, 2, 2},
+ {13, 2, 2},
+ {14, 4, 4},
+ {26, 4, 4},
+}
+
+func TestMapBuckets(t *testing.T) {
+ // Test that maps of different sizes have the right number of buckets.
+ // Non-escaping maps with small buckets (like map[int]int) never
+ // have a nil bucket pointer due to starting with preallocated buckets
+ // on the stack. Escaping maps start with a non-nil bucket pointer if
+ // hint size is above bucketCnt and thereby have more than one bucket.
+ // These tests depend on bucketCnt and loadFactor* in map.go.
+ t.Run("mapliteral", func(t *testing.T) {
+ for _, tt := range mapBucketTests {
+ localMap := map[int]int{}
+ if runtime.MapBucketsPointerIsNil(localMap) {
+ t.Errorf("no escape: buckets pointer is nil for non-escaping map")
+ }
+ for i := 0; i < tt.n; i++ {
+ localMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(localMap); got != tt.noescape {
+ t.Errorf("no escape: n=%d want %d buckets, got %d", tt.n, tt.noescape, got)
+ }
+ escapingMap := map[int]int{}
+ if count := runtime.MapBucketsCount(escapingMap); count > 1 && runtime.MapBucketsPointerIsNil(escapingMap) {
+ t.Errorf("escape: buckets pointer is nil for n=%d buckets", count)
+ }
+ for i := 0; i < tt.n; i++ {
+ escapingMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(escapingMap); got != tt.escape {
+ t.Errorf("escape n=%d want %d buckets, got %d", tt.n, tt.escape, got)
+ }
+ mapSink = escapingMap
+ }
+ })
+ t.Run("nohint", func(t *testing.T) {
+ for _, tt := range mapBucketTests {
+ localMap := make(map[int]int)
+ if runtime.MapBucketsPointerIsNil(localMap) {
+ t.Errorf("no escape: buckets pointer is nil for non-escaping map")
+ }
+ for i := 0; i < tt.n; i++ {
+ localMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(localMap); got != tt.noescape {
+ t.Errorf("no escape: n=%d want %d buckets, got %d", tt.n, tt.noescape, got)
+ }
+ escapingMap := make(map[int]int)
+ if count := runtime.MapBucketsCount(escapingMap); count > 1 && runtime.MapBucketsPointerIsNil(escapingMap) {
+ t.Errorf("escape: buckets pointer is nil for n=%d buckets", count)
+ }
+ for i := 0; i < tt.n; i++ {
+ escapingMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(escapingMap); got != tt.escape {
+ t.Errorf("escape: n=%d want %d buckets, got %d", tt.n, tt.escape, got)
+ }
+ mapSink = escapingMap
+ }
+ })
+ t.Run("makemap", func(t *testing.T) {
+ for _, tt := range mapBucketTests {
+ localMap := make(map[int]int, tt.n)
+ if runtime.MapBucketsPointerIsNil(localMap) {
+ t.Errorf("no escape: buckets pointer is nil for non-escaping map")
+ }
+ for i := 0; i < tt.n; i++ {
+ localMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(localMap); got != tt.noescape {
+ t.Errorf("no escape: n=%d want %d buckets, got %d", tt.n, tt.noescape, got)
+ }
+ escapingMap := make(map[int]int, tt.n)
+ if count := runtime.MapBucketsCount(escapingMap); count > 1 && runtime.MapBucketsPointerIsNil(escapingMap) {
+ t.Errorf("escape: buckets pointer is nil for n=%d buckets", count)
+ }
+ for i := 0; i < tt.n; i++ {
+ escapingMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(escapingMap); got != tt.escape {
+ t.Errorf("escape: n=%d want %d buckets, got %d", tt.n, tt.escape, got)
+ }
+ mapSink = escapingMap
+ }
+ })
+ t.Run("makemap64", func(t *testing.T) {
+ for _, tt := range mapBucketTests {
+ localMap := make(map[int]int, int64(tt.n))
+ if runtime.MapBucketsPointerIsNil(localMap) {
+ t.Errorf("no escape: buckets pointer is nil for non-escaping map")
+ }
+ for i := 0; i < tt.n; i++ {
+ localMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(localMap); got != tt.noescape {
+ t.Errorf("no escape: n=%d want %d buckets, got %d", tt.n, tt.noescape, got)
+ }
+ escapingMap := make(map[int]int, tt.n)
+ if count := runtime.MapBucketsCount(escapingMap); count > 1 && runtime.MapBucketsPointerIsNil(escapingMap) {
+ t.Errorf("escape: buckets pointer is nil for n=%d buckets", count)
+ }
+ for i := 0; i < tt.n; i++ {
+ escapingMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(escapingMap); got != tt.escape {
+ t.Errorf("escape: n=%d want %d buckets, got %d", tt.n, tt.escape, got)
+ }
+ mapSink = escapingMap
+ }
+ })
+
+}
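
The escape column in mapBucketTests above follows directly from the load factor: makemap keeps doubling B until the hint no longer exceeds roughly 6.5 entries per bucket. A small sketch reproducing that column (bucketCnt = 8 and the 13/2 ratio mirror map.go; the helper itself is made up for illustration):

	package main

	import "fmt"

	const bucketCnt = 8

	// overLoadFactor reports whether count elements spread over 2^B buckets
	// exceed an average of about 6.5 entries per bucket.
	func overLoadFactor(count int, B uint8) bool {
		return count > bucketCnt && uint64(count) > 13*(uint64(1)<<B)/2
	}

	func bucketsForHint(hint int) int {
		B := uint8(0)
		for overLoadFactor(hint, B) {
			B++
		}
		return 1 << B
	}

	func main() {
		for _, hint := range []int{0, 8, 9, 13, 14, 26} {
			fmt.Printf("hint %2d -> %d bucket(s)\n", hint, bucketsForHint(hint))
		}
		// Matches the escape column above: 1, 1, 2, 2, 4, 4.
	}
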
+
+func benchmarkMapPop(b *testing.B, n int) {
+ m := map[int]int{}
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < n; j++ {
+ m[j] = j
+ }
+ for j := 0; j < n; j++ {
+ // Use iterator to pop an element.
+ // We want this to be fast, see issue 8412.
+ for k := range m {
+ delete(m, k)
+ break
+ }
+ }
+ }
+}
+
+func BenchmarkMapPop100(b *testing.B) { benchmarkMapPop(b, 100) }
+func BenchmarkMapPop1000(b *testing.B) { benchmarkMapPop(b, 1000) }
+func BenchmarkMapPop10000(b *testing.B) { benchmarkMapPop(b, 10000) }
+
+var testNonEscapingMapVariable int = 8
+
+func TestNonEscapingMap(t *testing.T) {
+ n := testing.AllocsPerRun(1000, func() {
+ m := map[int]int{}
+ m[0] = 0
+ })
+ if n != 0 {
+ t.Fatalf("mapliteral: want 0 allocs, got %v", n)
+ }
+ n = testing.AllocsPerRun(1000, func() {
+ m := make(map[int]int)
+ m[0] = 0
+ })
+ if n != 0 {
+ t.Fatalf("no hint: want 0 allocs, got %v", n)
+ }
+ n = testing.AllocsPerRun(1000, func() {
+ m := make(map[int]int, 8)
+ m[0] = 0
+ })
+ if n != 0 {
+ t.Fatalf("with small hint: want 0 allocs, got %v", n)
+ }
+ n = testing.AllocsPerRun(1000, func() {
+ m := make(map[int]int, testNonEscapingMapVariable)
+ m[0] = 0
+ })
+ if n != 0 {
+ t.Fatalf("with variable hint: want 0 allocs, got %v", n)
+ }
+
+}
+
+func benchmarkMapAssignInt32(b *testing.B, n int) {
+ a := make(map[int32]int)
+ for i := 0; i < b.N; i++ {
+ a[int32(i&(n-1))] = i
+ }
+}
+
+func benchmarkMapOperatorAssignInt32(b *testing.B, n int) {
+ a := make(map[int32]int)
+ for i := 0; i < b.N; i++ {
+ a[int32(i&(n-1))] += i
+ }
+}
+
+func benchmarkMapAppendAssignInt32(b *testing.B, n int) {
+ a := make(map[int32][]int)
+ b.ReportAllocs()
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ key := int32(i & (n - 1))
+ a[key] = append(a[key], i)
+ }
+}
+
+func benchmarkMapDeleteInt32(b *testing.B, n int) {
+ a := make(map[int32]int, n)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ if len(a) == 0 {
+ b.StopTimer()
+ for j := i; j < i+n; j++ {
+ a[int32(j)] = j
+ }
+ b.StartTimer()
+ }
+ delete(a, int32(i))
+ }
+}
+
+func benchmarkMapAssignInt64(b *testing.B, n int) {
+ a := make(map[int64]int)
+ for i := 0; i < b.N; i++ {
+ a[int64(i&(n-1))] = i
+ }
+}
+
+func benchmarkMapOperatorAssignInt64(b *testing.B, n int) {
+ a := make(map[int64]int)
+ for i := 0; i < b.N; i++ {
+ a[int64(i&(n-1))] += i
+ }
+}
+
+func benchmarkMapAppendAssignInt64(b *testing.B, n int) {
+ a := make(map[int64][]int)
+ b.ReportAllocs()
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ key := int64(i & (n - 1))
+ a[key] = append(a[key], i)
+ }
+}
+
+func benchmarkMapDeleteInt64(b *testing.B, n int) {
+ a := make(map[int64]int, n)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ if len(a) == 0 {
+ b.StopTimer()
+ for j := i; j < i+n; j++ {
+ a[int64(j)] = j
+ }
+ b.StartTimer()
+ }
+ delete(a, int64(i))
+ }
+}
+
+func benchmarkMapAssignStr(b *testing.B, n int) {
+ k := make([]string, n)
+ for i := 0; i < len(k); i++ {
+ k[i] = strconv.Itoa(i)
+ }
+ b.ResetTimer()
+ a := make(map[string]int)
+ for i := 0; i < b.N; i++ {
+ a[k[i&(n-1)]] = i
+ }
+}
+
+func benchmarkMapOperatorAssignStr(b *testing.B, n int) {
+ k := make([]string, n)
+ for i := 0; i < len(k); i++ {
+ k[i] = strconv.Itoa(i)
+ }
+ b.ResetTimer()
+ a := make(map[string]string)
+ for i := 0; i < b.N; i++ {
+ key := k[i&(n-1)]
+ a[key] += key
+ }
+}
+
+func benchmarkMapAppendAssignStr(b *testing.B, n int) {
+ k := make([]string, n)
+ for i := 0; i < len(k); i++ {
+ k[i] = strconv.Itoa(i)
+ }
+ a := make(map[string][]string)
+ b.ReportAllocs()
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ key := k[i&(n-1)]
+ a[key] = append(a[key], key)
+ }
+}
+
+func benchmarkMapDeleteStr(b *testing.B, n int) {
+ i2s := make([]string, n)
+ for i := 0; i < n; i++ {
+ i2s[i] = strconv.Itoa(i)
+ }
+ a := make(map[string]int, n)
+ b.ResetTimer()
+ k := 0
+ for i := 0; i < b.N; i++ {
+ if len(a) == 0 {
+ b.StopTimer()
+ for j := 0; j < n; j++ {
+ a[i2s[j]] = j
+ }
+ k = i
+ b.StartTimer()
+ }
+ delete(a, i2s[i-k])
+ }
+}
+
+func benchmarkMapDeletePointer(b *testing.B, n int) {
+ i2p := make([]*int, n)
+ for i := 0; i < n; i++ {
+ i2p[i] = new(int)
+ }
+ a := make(map[*int]int, n)
+ b.ResetTimer()
+ k := 0
+ for i := 0; i < b.N; i++ {
+ if len(a) == 0 {
+ b.StopTimer()
+ for j := 0; j < n; j++ {
+ a[i2p[j]] = j
+ }
+ k = i
+ b.StartTimer()
+ }
+ delete(a, i2p[i-k])
+ }
+}
+
+func runWith(f func(*testing.B, int), v ...int) func(*testing.B) {
+ return func(b *testing.B) {
+ for _, n := range v {
+ b.Run(strconv.Itoa(n), func(b *testing.B) { f(b, n) })
+ }
+ }
+}
+
+func BenchmarkMapAssign(b *testing.B) {
+ b.Run("Int32", runWith(benchmarkMapAssignInt32, 1<<8, 1<<16))
+ b.Run("Int64", runWith(benchmarkMapAssignInt64, 1<<8, 1<<16))
+ b.Run("Str", runWith(benchmarkMapAssignStr, 1<<8, 1<<16))
+}
+
+func BenchmarkMapOperatorAssign(b *testing.B) {
+ b.Run("Int32", runWith(benchmarkMapOperatorAssignInt32, 1<<8, 1<<16))
+ b.Run("Int64", runWith(benchmarkMapOperatorAssignInt64, 1<<8, 1<<16))
+ b.Run("Str", runWith(benchmarkMapOperatorAssignStr, 1<<8, 1<<16))
+}
+
+func BenchmarkMapAppendAssign(b *testing.B) {
+ b.Run("Int32", runWith(benchmarkMapAppendAssignInt32, 1<<8, 1<<16))
+ b.Run("Int64", runWith(benchmarkMapAppendAssignInt64, 1<<8, 1<<16))
+ b.Run("Str", runWith(benchmarkMapAppendAssignStr, 1<<8, 1<<16))
+}
+
+func BenchmarkMapDelete(b *testing.B) {
+ b.Run("Int32", runWith(benchmarkMapDeleteInt32, 100, 1000, 10000))
+ b.Run("Int64", runWith(benchmarkMapDeleteInt64, 100, 1000, 10000))
+ b.Run("Str", runWith(benchmarkMapDeleteStr, 100, 1000, 10000))
+ b.Run("Pointer", runWith(benchmarkMapDeletePointer, 100, 1000, 10000))
+}
+
+func TestDeferDeleteSlow(t *testing.T) {
+ ks := []complex128{0, 1, 2, 3}
+
+ m := make(map[interface{}]int)
+ for i, k := range ks {
+ m[k] = i
+ }
+ if len(m) != len(ks) {
+ t.Errorf("want %d elements, got %d", len(ks), len(m))
+ }
+
+ func() {
+ for _, k := range ks {
+ defer delete(m, k)
+ }
+ }()
+ if len(m) != 0 {
+ t.Errorf("want 0 elements, got %d", len(m))
+ }
+}
+
+// TestIncrementAfterDeleteValueInt and the following tests exercise Issue 25936.
+// Value types int, int32, and int64 are affected. Value type string
+// works as expected.
+func TestIncrementAfterDeleteValueInt(t *testing.T) {
+ const key1 = 12
+ const key2 = 13
+
+ m := make(map[int]int)
+ m[key1] = 99
+ delete(m, key1)
+ m[key2]++
+ if n2 := m[key2]; n2 != 1 {
+ t.Errorf("incremented 0 to %d", n2)
+ }
+}
+
+func TestIncrementAfterDeleteValueInt32(t *testing.T) {
+ const key1 = 12
+ const key2 = 13
+
+ m := make(map[int]int32)
+ m[key1] = 99
+ delete(m, key1)
+ m[key2]++
+ if n2 := m[key2]; n2 != 1 {
+ t.Errorf("incremented 0 to %d", n2)
+ }
+}
+
+func TestIncrementAfterDeleteValueInt64(t *testing.T) {
+ const key1 = 12
+ const key2 = 13
+
+ m := make(map[int]int64)
+ m[key1] = 99
+ delete(m, key1)
+ m[key2]++
+ if n2 := m[key2]; n2 != 1 {
+ t.Errorf("incremented 0 to %d", n2)
+ }
+}
+
+func TestIncrementAfterDeleteKeyStringValueInt(t *testing.T) {
+ const key1 = ""
+ const key2 = "x"
+
+ m := make(map[string]int)
+ m[key1] = 99
+ delete(m, key1)
+ m[key2] += 1
+ if n2 := m[key2]; n2 != 1 {
+ t.Errorf("incremented 0 to %d", n2)
+ }
+}
+
+func TestIncrementAfterDeleteKeyValueString(t *testing.T) {
+ const key1 = ""
+ const key2 = "x"
+
+ m := make(map[string]string)
+ m[key1] = "99"
+ delete(m, key1)
+ m[key2] += "1"
+ if n2 := m[key2]; n2 != "1" {
+ t.Errorf("appended '1' to empty (nil) string, got %s", n2)
+ }
+}
+
+// TestIncrementAfterBulkClearKeyStringValueInt tests that map bulk
+// deletion (mapclear) still works as expected. Note that it was not
+// affected by Issue 25936.
+func TestIncrementAfterBulkClearKeyStringValueInt(t *testing.T) {
+ const key1 = ""
+ const key2 = "x"
+
+ m := make(map[string]int)
+ m[key1] = 99
+ for k := range m {
+ delete(m, k)
+ }
+ m[key2]++
+ if n2 := m[key2]; n2 != 1 {
+ t.Errorf("incremented 0 to %d", n2)
+ }
+}
+
+func TestMapTombstones(t *testing.T) {
+ m := map[int]int{}
+ const N = 10000
+ // Fill a map.
+ for i := 0; i < N; i++ {
+ m[i] = i
+ }
+ runtime.MapTombstoneCheck(m)
+ // Delete half of the entries.
+ for i := 0; i < N; i += 2 {
+ delete(m, i)
+ }
+ runtime.MapTombstoneCheck(m)
+ // Add new entries to fill in holes.
+ for i := N; i < 3*N/2; i++ {
+ m[i] = i
+ }
+ runtime.MapTombstoneCheck(m)
+ // Delete everything.
+ for i := 0; i < 3*N/2; i++ {
+ delete(m, i)
+ }
+ runtime.MapTombstoneCheck(m)
+}
+
+type canString int
+
+func (c canString) String() string {
+ return fmt.Sprintf("%d", int(c))
+}
+
+func TestMapInterfaceKey(t *testing.T) {
+ // Test all the special cases in runtime.typehash.
+ type GrabBag struct {
+ f32 float32
+ f64 float64
+ c64 complex64
+ c128 complex128
+ s string
+ i0 interface{}
+ i1 interface {
+ String() string
+ }
+ a [4]string
+ }
+
+ m := map[interface{}]bool{}
+ // Put a bunch of data in m, so that a bad hash is likely to
+ // lead to a bad bucket, which will lead to a missed lookup.
+ for i := 0; i < 1000; i++ {
+ m[i] = true
+ }
+ m[GrabBag{f32: 1.0}] = true
+ if !m[GrabBag{f32: 1.0}] {
+ panic("f32 not found")
+ }
+ m[GrabBag{f64: 1.0}] = true
+ if !m[GrabBag{f64: 1.0}] {
+ panic("f64 not found")
+ }
+ m[GrabBag{c64: 1.0i}] = true
+ if !m[GrabBag{c64: 1.0i}] {
+ panic("c64 not found")
+ }
+ m[GrabBag{c128: 1.0i}] = true
+ if !m[GrabBag{c128: 1.0i}] {
+ panic("c128 not found")
+ }
+ m[GrabBag{s: "foo"}] = true
+ if !m[GrabBag{s: "foo"}] {
+ panic("string not found")
+ }
+ m[GrabBag{i0: "foo"}] = true
+ if !m[GrabBag{i0: "foo"}] {
+ panic("interface{} not found")
+ }
+ m[GrabBag{i1: canString(5)}] = true
+ if !m[GrabBag{i1: canString(5)}] {
+ panic("interface{String() string} not found")
+ }
+ m[GrabBag{a: [4]string{"foo", "bar", "baz", "bop"}}] = true
+ if !m[GrabBag{a: [4]string{"foo", "bar", "baz", "bop"}}] {
+ panic("array not found")
+ }
+}
diff --git a/src/runtime/mbarrier.go b/src/runtime/mbarrier.go
new file mode 100644
index 0000000..2b5affc
--- /dev/null
+++ b/src/runtime/mbarrier.go
@@ -0,0 +1,327 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Garbage collector: write barriers.
+//
+// For the concurrent garbage collector, the Go compiler implements
+// updates to pointer-valued fields that may be in heap objects by
+// emitting calls to write barriers. The main write barrier for
+// individual pointer writes is gcWriteBarrier and is implemented in
+// assembly. This file contains write barrier entry points for bulk
+// operations. See also mwbbuf.go.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// Go uses a hybrid barrier that combines a Yuasa-style deletion
+// barrier—which shades the object whose reference is being
+// overwritten—with Dijkstra insertion barrier—which shades the object
+// whose reference is being written. The insertion part of the barrier
+// is necessary while the calling goroutine's stack is grey. In
+// pseudocode, the barrier is:
+//
+// writePointer(slot, ptr):
+// shade(*slot)
+// if current stack is grey:
+// shade(ptr)
+// *slot = ptr
+//
+// slot is the destination in Go code.
+// ptr is the value that goes into the slot in Go code.
+//
+// Shade indicates that it has seen a white pointer by adding the referent
+// to wbuf as well as marking it.
+//
+// The two shades and the condition work together to prevent a mutator
+// from hiding an object from the garbage collector:
+//
+// 1. shade(*slot) prevents a mutator from hiding an object by moving
+// the sole pointer to it from the heap to its stack. If it attempts
+// to unlink an object from the heap, this will shade it.
+//
+// 2. shade(ptr) prevents a mutator from hiding an object by moving
+// the sole pointer to it from its stack into a black object in the
+// heap. If it attempts to install the pointer into a black object,
+// this will shade it.
+//
+// 3. Once a goroutine's stack is black, the shade(ptr) becomes
+// unnecessary. shade(ptr) prevents hiding an object by moving it from
+// the stack to the heap, but this requires first having a pointer
+// hidden on the stack. Immediately after a stack is scanned, it only
+// points to shaded objects, so it's not hiding anything, and the
+// shade(*slot) prevents it from hiding any other pointers on its
+// stack.
+//
+// For a detailed description of this barrier and proof of
+// correctness, see https://github.com/golang/proposal/blob/master/design/17503-eliminate-rescan.md
+//
+//
+//
+// Dealing with memory ordering:
+//
+// Both the Yuasa and Dijkstra barriers can be made conditional on the
+// color of the object containing the slot. We chose not to make these
+// conditional because the cost of ensuring that the object holding
+// the slot doesn't concurrently change color without the mutator
+// noticing seems prohibitive.
+//
+// Consider the following example where the mutator writes into
+// a slot and then loads the slot's mark bit while the GC thread
+// writes to the slot's mark bit and then as part of scanning reads
+// the slot.
+//
+// Initially both [slot] and [slotmark] are 0 (nil)
+// Mutator thread GC thread
+// st [slot], ptr st [slotmark], 1
+//
+// ld r1, [slotmark] ld r2, [slot]
+//
+// Without an expensive memory barrier between the st and the ld, the final
+// result on most HW (including 386/amd64) can be r1==r2==0. This is a classic
+// example of what can happen when loads are allowed to be reordered with older
+// stores (avoiding such reorderings lies at the heart of the classic
+// Peterson/Dekker algorithms for mutual exclusion). Rather than require memory
+// barriers, which will slow down both the mutator and the GC, we always grey
+// the ptr object regardless of the slot's color.
+//
+// Another place where we intentionally omit memory barriers is when
+// accessing mheap_.arena_used to check if a pointer points into the
+// heap. On relaxed memory machines, it's possible for a mutator to
+// extend the size of the heap by updating arena_used, allocate an
+// object from this new region, and publish a pointer to that object,
+// but for tracing running on another processor to observe the pointer
+// but use the old value of arena_used. In this case, tracing will not
+// mark the object, even though it's reachable. However, the mutator
+// is guaranteed to execute a write barrier when it publishes the
+// pointer, so it will take care of marking the object. A general
+// consequence of this is that the garbage collector may cache the
+// value of mheap_.arena_used. (See issue #9984.)
+//
+//
+// Stack writes:
+//
+// The compiler omits write barriers for writes to the current frame,
+// but if a stack pointer has been passed down the call stack, the
+// compiler will generate a write barrier for writes through that
+// pointer (because it doesn't know it's not a heap pointer).
+//
+// One might be tempted to ignore the write barrier if slot points
+// into the stack. Don't do it! Mark termination only re-scans
+// frames that have potentially been active since the concurrent scan,
+// so it depends on write barriers to track changes to pointers in
+// stack frames that have not been active.
+//
+//
+// Global writes:
+//
+// The Go garbage collector requires write barriers when heap pointers
+// are stored in globals. Many garbage collectors ignore writes to
+// globals and instead pick up global -> heap pointers during
+// termination. This increases pause time, so we instead rely on write
+// barriers for writes to globals so that we don't have to rescan
+// globals during mark termination.
+//
+//
+// Publication ordering:
+//
+// The write barrier is *pre-publication*, meaning that the write
+// barrier happens prior to the *slot = ptr write that may make ptr
+// reachable by some goroutine that currently cannot reach it.
+//
+//
+// Signal handler pointer writes:
+//
+// In general, the signal handler cannot safely invoke the write
+// barrier because it may run without a P or even during the write
+// barrier.
+//
+// There is exactly one exception: profbuf.go omits a barrier during
+// signal handler profile logging. That's safe only because of the
+// deletion barrier. See profbuf.go for a detailed argument. If we
+// remove the deletion barrier, we'll have to work out a new way to
+// handle the profile logging.
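
To make the pseudocode above concrete, here is a toy Go rendering of the hybrid barrier over an explicit object graph. It is only an illustration, not part of this change: the colors, shade, and the stack-is-grey flag are simplified stand-ins, and the real barrier operates on heap words and the write barrier buffer rather than structs like these.

	package main

	import "fmt"

	type color int

	const (
		white color = iota // not yet seen by the collector
		grey               // seen, but referents not yet scanned
		black              // seen and fully scanned
	)

	type object struct {
		name string
		col  color
		ref  *object
	}

	// shade records that the collector has seen obj, marking it grey if it was
	// still white, mirroring "shade" in the comment above.
	func shade(obj *object) {
		if obj != nil && obj.col == white {
			obj.col = grey
		}
	}

	// writePointer is the hybrid barrier: shade the old referent (the Yuasa-style
	// deletion half) and, while the writer's stack is still grey, also shade the
	// new referent (the Dijkstra-style insertion half), then perform the write.
	func writePointer(slot **object, ptr *object, stackIsGrey bool) {
		shade(*slot)
		if stackIsGrey {
			shade(ptr)
		}
		*slot = ptr
	}

	func main() {
		oldObj := &object{name: "old", col: white}
		newObj := &object{name: "new", col: white}
		holder := &object{name: "holder", col: black, ref: oldObj}

		writePointer(&holder.ref, newObj, true)
		fmt.Println(oldObj.col == grey, newObj.col == grey) // true true: neither object can be hidden
	}
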
+
+// typedmemmove copies a value of type t to dst from src.
+// Must be nosplit, see #16026.
+//
+// TODO: Perfect for go:nosplitrec since we can't have a safe point
+// anywhere in the bulk barrier or memmove.
+//
+//go:nosplit
+func typedmemmove(typ *_type, dst, src unsafe.Pointer) {
+ if dst == src {
+ return
+ }
+ if writeBarrier.needed && typ.ptrdata != 0 {
+ bulkBarrierPreWrite(uintptr(dst), uintptr(src), typ.ptrdata)
+ }
+ // There's a race here: if some other goroutine can write to
+ // src, it may change some pointer in src after we've
+ // performed the write barrier but before we perform the
+	// memory copy. This is safe because the write performed by that
+ // other goroutine must also be accompanied by a write
+ // barrier, so at worst we've unnecessarily greyed the old
+ // pointer that was in src.
+ memmove(dst, src, typ.size)
+ if writeBarrier.cgo {
+ cgoCheckMemmove(typ, dst, src, 0, typ.size)
+ }
+}
+
+//go:linkname reflect_typedmemmove reflect.typedmemmove
+func reflect_typedmemmove(typ *_type, dst, src unsafe.Pointer) {
+ if raceenabled {
+ raceWriteObjectPC(typ, dst, getcallerpc(), funcPC(reflect_typedmemmove))
+ raceReadObjectPC(typ, src, getcallerpc(), funcPC(reflect_typedmemmove))
+ }
+ if msanenabled {
+ msanwrite(dst, typ.size)
+ msanread(src, typ.size)
+ }
+ typedmemmove(typ, dst, src)
+}
+
+//go:linkname reflectlite_typedmemmove internal/reflectlite.typedmemmove
+func reflectlite_typedmemmove(typ *_type, dst, src unsafe.Pointer) {
+ reflect_typedmemmove(typ, dst, src)
+}
+
+// typedmemmovepartial is like typedmemmove but assumes that
+// dst and src point off bytes into the value and only copies size bytes.
+// off must be a multiple of sys.PtrSize.
+//go:linkname reflect_typedmemmovepartial reflect.typedmemmovepartial
+func reflect_typedmemmovepartial(typ *_type, dst, src unsafe.Pointer, off, size uintptr) {
+ if writeBarrier.needed && typ.ptrdata > off && size >= sys.PtrSize {
+ if off&(sys.PtrSize-1) != 0 {
+ panic("reflect: internal error: misaligned offset")
+ }
+ pwsize := alignDown(size, sys.PtrSize)
+ if poff := typ.ptrdata - off; pwsize > poff {
+ pwsize = poff
+ }
+ bulkBarrierPreWrite(uintptr(dst), uintptr(src), pwsize)
+ }
+
+ memmove(dst, src, size)
+ if writeBarrier.cgo {
+ cgoCheckMemmove(typ, dst, src, off, size)
+ }
+}
+
+// reflectcallmove is invoked by reflectcall to copy the return values
+// out of the stack and into the heap, invoking the necessary write
+// barriers. dst, src, and size describe the return value area to
+// copy. typ describes the entire frame (not just the return values).
+// typ may be nil, which indicates write barriers are not needed.
+//
+// It must be nosplit and must only call nosplit functions because the
+// stack map of reflectcall is wrong.
+//
+//go:nosplit
+func reflectcallmove(typ *_type, dst, src unsafe.Pointer, size uintptr) {
+ if writeBarrier.needed && typ != nil && typ.ptrdata != 0 && size >= sys.PtrSize {
+ bulkBarrierPreWrite(uintptr(dst), uintptr(src), size)
+ }
+ memmove(dst, src, size)
+}
+
+//go:nosplit
+func typedslicecopy(typ *_type, dstPtr unsafe.Pointer, dstLen int, srcPtr unsafe.Pointer, srcLen int) int {
+ n := dstLen
+ if n > srcLen {
+ n = srcLen
+ }
+ if n == 0 {
+ return 0
+ }
+
+ // The compiler emits calls to typedslicecopy before
+ // instrumentation runs, so unlike the other copying and
+ // assignment operations, it's not instrumented in the calling
+ // code and needs its own instrumentation.
+ if raceenabled {
+ callerpc := getcallerpc()
+ pc := funcPC(slicecopy)
+ racewriterangepc(dstPtr, uintptr(n)*typ.size, callerpc, pc)
+ racereadrangepc(srcPtr, uintptr(n)*typ.size, callerpc, pc)
+ }
+ if msanenabled {
+ msanwrite(dstPtr, uintptr(n)*typ.size)
+ msanread(srcPtr, uintptr(n)*typ.size)
+ }
+
+ if writeBarrier.cgo {
+ cgoCheckSliceCopy(typ, dstPtr, srcPtr, n)
+ }
+
+ if dstPtr == srcPtr {
+ return n
+ }
+
+ // Note: No point in checking typ.ptrdata here:
+ // compiler only emits calls to typedslicecopy for types with pointers,
+ // and growslice and reflect_typedslicecopy check for pointers
+ // before calling typedslicecopy.
+ size := uintptr(n) * typ.size
+ if writeBarrier.needed {
+ pwsize := size - typ.size + typ.ptrdata
+ bulkBarrierPreWrite(uintptr(dstPtr), uintptr(srcPtr), pwsize)
+ }
+ // See typedmemmove for a discussion of the race between the
+ // barrier and memmove.
+ memmove(dstPtr, srcPtr, size)
+ return n
+}
+
+//go:linkname reflect_typedslicecopy reflect.typedslicecopy
+func reflect_typedslicecopy(elemType *_type, dst, src slice) int {
+ if elemType.ptrdata == 0 {
+ return slicecopy(dst.array, dst.len, src.array, src.len, elemType.size)
+ }
+ return typedslicecopy(elemType, dst.array, dst.len, src.array, src.len)
+}
+
+// typedmemclr clears the typed memory at ptr with type typ. The
+// memory at ptr must already be initialized (and hence in type-safe
+// state). If the memory is being initialized for the first time, see
+// memclrNoHeapPointers.
+//
+// If the caller knows that typ has pointers, it can alternatively
+// call memclrHasPointers.
+//
+//go:nosplit
+func typedmemclr(typ *_type, ptr unsafe.Pointer) {
+ if writeBarrier.needed && typ.ptrdata != 0 {
+ bulkBarrierPreWrite(uintptr(ptr), 0, typ.ptrdata)
+ }
+ memclrNoHeapPointers(ptr, typ.size)
+}
+
+//go:linkname reflect_typedmemclr reflect.typedmemclr
+func reflect_typedmemclr(typ *_type, ptr unsafe.Pointer) {
+ typedmemclr(typ, ptr)
+}
+
+//go:linkname reflect_typedmemclrpartial reflect.typedmemclrpartial
+func reflect_typedmemclrpartial(typ *_type, ptr unsafe.Pointer, off, size uintptr) {
+ if writeBarrier.needed && typ.ptrdata != 0 {
+ bulkBarrierPreWrite(uintptr(ptr), 0, size)
+ }
+ memclrNoHeapPointers(ptr, size)
+}
+
+// memclrHasPointers clears n bytes of typed memory starting at ptr.
+// The caller must ensure that the type of the object at ptr has
+// pointers, usually by checking typ.ptrdata. However, ptr
+// does not have to point to the start of the allocation.
+//
+//go:nosplit
+func memclrHasPointers(ptr unsafe.Pointer, n uintptr) {
+ bulkBarrierPreWrite(uintptr(ptr), 0, n)
+ memclrNoHeapPointers(ptr, n)
+}
diff --git a/src/runtime/mbitmap.go b/src/runtime/mbitmap.go
new file mode 100644
index 0000000..fbfaae0
--- /dev/null
+++ b/src/runtime/mbitmap.go
@@ -0,0 +1,2026 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Garbage collector: type and heap bitmaps.
+//
+// Stack, data, and bss bitmaps
+//
+// Stack frames and global variables in the data and bss sections are
+// described by bitmaps with 1 bit per pointer-sized word. A "1" bit
+// means the word is a live pointer to be visited by the GC (referred to
+// as "pointer"). A "0" bit means the word should be ignored by GC
+// (referred to as "scalar", though it could be a dead pointer value).
+//
+// Heap bitmap
+//
+// The heap bitmap comprises 2 bits for each pointer-sized word in the heap,
+// stored in the heapArena metadata backing each heap arena.
+// That is, if ha is the heapArena for the arena starting at start,
+// then ha.bitmap[0] holds the 2-bit entries for the four words start
+// through start+3*ptrSize, ha.bitmap[1] holds the entries for
+// start+4*ptrSize through start+7*ptrSize, and so on.
+//
+// In each 2-bit entry, the lower bit is a pointer/scalar bit, just
+// like in the stack/data bitmaps described above. The upper bit
+// indicates scan/dead: a "1" value ("scan") indicates that there may
+// be pointers in later words of the allocation, and a "0" value
+// ("dead") indicates there are no more pointers in the allocation. If
+// the upper bit is 0, the lower bit must also be 0, and this
+// indicates scanning can ignore the rest of the allocation.
+//
+// The 2-bit entries are split when written into the byte, so that the top half
+// of the byte contains 4 high (scan) bits and the bottom half contains 4 low
+// (pointer) bits. This form allows a copy from the 1-bit to the 4-bit form to
+// keep the pointer bits contiguous, instead of having to space them out.
+//
+// The code makes use of the fact that the zero value for a heap
+// bitmap means scalar/dead. This property must be preserved when
+// modifying the encoding.
+//
+// The bitmap for noscan spans is not maintained. Code must ensure
+// that an object is scannable before consulting its bitmap by
+// either checking the noscan bit in the span or consulting its
+// type's information.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ bitPointer = 1 << 0
+ bitScan = 1 << 4
+
+ heapBitsShift = 1 // shift offset between successive bitPointer or bitScan entries
+ wordsPerBitmapByte = 8 / 2 // heap words described by one bitmap byte
+
+ // all scan/pointer bits in a byte
+ bitScanAll = bitScan | bitScan<<heapBitsShift | bitScan<<(2*heapBitsShift) | bitScan<<(3*heapBitsShift)
+ bitPointerAll = bitPointer | bitPointer<<heapBitsShift | bitPointer<<(2*heapBitsShift) | bitPointer<<(3*heapBitsShift)
+)
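+
+// A minimal illustrative sketch (exampleBitmapIndex is a hypothetical name,
+// not part of the runtime API): given the constants above, the 2-bit entry
+// for the i'th pointer-sized word of an arena lives in bitmap byte i/4 at
+// in-byte shift i%4, with the pointer bit at bitPointer<<shift and the scan
+// bit at bitScan<<shift. heapBitsForAddr and heapBits.bits below perform the
+// same arithmetic directly on addresses.
+func exampleBitmapIndex(wordIdx uintptr) (byteIdx uintptr, shift uint32) {
+ byteIdx = wordIdx / wordsPerBitmapByte // four words per bitmap byte
+ shift = uint32(wordIdx%wordsPerBitmapByte) * heapBitsShift // 0, 1, 2, or 3
+ return
+}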
+
+// addb returns the byte pointer p+n.
+//go:nowritebarrier
+//go:nosplit
+func addb(p *byte, n uintptr) *byte {
+ // Note: wrote out full expression instead of calling add(p, n)
+ // to reduce the number of temporaries generated by the
+ // compiler for this trivial expression during inlining.
+ return (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + n))
+}
+
+// subtractb returns the byte pointer p-n.
+//go:nowritebarrier
+//go:nosplit
+func subtractb(p *byte, n uintptr) *byte {
+ // Note: wrote out full expression instead of calling add(p, -n)
+ // to reduce the number of temporaries generated by the
+ // compiler for this trivial expression during inlining.
+ return (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) - n))
+}
+
+// add1 returns the byte pointer p+1.
+//go:nowritebarrier
+//go:nosplit
+func add1(p *byte) *byte {
+ // Note: wrote out full expression instead of calling addb(p, 1)
+ // to reduce the number of temporaries generated by the
+ // compiler for this trivial expression during inlining.
+ return (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + 1))
+}
+
+// subtract1 returns the byte pointer p-1.
+//go:nowritebarrier
+//
+// nosplit because it is used during write barriers and must not be preempted.
+//go:nosplit
+func subtract1(p *byte) *byte {
+ // Note: wrote out full expression instead of calling subtractb(p, 1)
+ // to reduce the number of temporaries generated by the
+ // compiler for this trivial expression during inlining.
+ return (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) - 1))
+}
+
+// heapBits provides access to the bitmap bits for a single heap word.
+// The methods on heapBits take value receivers so that the compiler
+// can more easily inline calls to those methods and registerize the
+// struct fields independently.
+type heapBits struct {
+ bitp *uint8
+ shift uint32
+ arena uint32 // Index of heap arena containing bitp
+ last *uint8 // Last byte of the arena's bitmap
+}
+
+// Make the compiler check that heapBits.arena is large enough to hold
+// the maximum arena frame number.
+var _ = heapBits{arena: (1<<heapAddrBits)/heapArenaBytes - 1}
+
+// markBits provides access to the mark bit for an object in the heap.
+// bytep points to the byte holding the mark bit.
+// mask is a byte with a single bit set that can be &ed with *bytep
+// to see if the bit has been set.
+// *m.bytep&m.mask != 0 indicates the mark bit is set.
+// index can be used along with span information to generate
+// the address of the object in the heap.
+// We maintain one set of mark bits for allocation and one for
+// marking purposes.
+type markBits struct {
+ bytep *uint8
+ mask uint8
+ index uintptr
+}
+
+//go:nosplit
+func (s *mspan) allocBitsForIndex(allocBitIndex uintptr) markBits {
+ bytep, mask := s.allocBits.bitp(allocBitIndex)
+ return markBits{bytep, mask, allocBitIndex}
+}
+
+// refillAllocCache takes 8 bytes of s.allocBits starting at whichByte
+// and negates them so that ctz (count trailing zeros) instructions
+// can be used. It then places these 8 bytes into the cached 64-bit
+// s.allocCache.
+func (s *mspan) refillAllocCache(whichByte uintptr) {
+ bytes := (*[8]uint8)(unsafe.Pointer(s.allocBits.bytep(whichByte)))
+ aCache := uint64(0)
+ aCache |= uint64(bytes[0])
+ aCache |= uint64(bytes[1]) << (1 * 8)
+ aCache |= uint64(bytes[2]) << (2 * 8)
+ aCache |= uint64(bytes[3]) << (3 * 8)
+ aCache |= uint64(bytes[4]) << (4 * 8)
+ aCache |= uint64(bytes[5]) << (5 * 8)
+ aCache |= uint64(bytes[6]) << (6 * 8)
+ aCache |= uint64(bytes[7]) << (7 * 8)
+ s.allocCache = ^aCache
+}
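+
+// A minimal illustrative sketch (exampleFirstFreeFromAllocByte is a
+// hypothetical name): it shows why refillAllocCache negates the alloc bits.
+// A free slot (alloc bit 0) becomes a 1 in the cache, so count-trailing-zeros
+// skips exactly the allocated slots; with alloc bits 0b00000111 the result is
+// 3, the index of the first free object in that byte.
+func exampleFirstFreeFromAllocByte(allocByte uint8) int {
+ cache := ^uint64(allocByte) // negate: free slots become 1 bits
+ return sys.Ctz64(cache) // lowest 1 bit = first free slot
+}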
+
+// nextFreeIndex returns the index of the next free object in s at
+// or after s.freeindex.
+// There are hardware instructions that can be used to make this
+// faster if profiling warrants it.
+func (s *mspan) nextFreeIndex() uintptr {
+ sfreeindex := s.freeindex
+ snelems := s.nelems
+ if sfreeindex == snelems {
+ return sfreeindex
+ }
+ if sfreeindex > snelems {
+ throw("s.freeindex > s.nelems")
+ }
+
+ aCache := s.allocCache
+
+ bitIndex := sys.Ctz64(aCache)
+ for bitIndex == 64 {
+ // Move index to start of next cached bits.
+ sfreeindex = (sfreeindex + 64) &^ (64 - 1)
+ if sfreeindex >= snelems {
+ s.freeindex = snelems
+ return snelems
+ }
+ whichByte := sfreeindex / 8
+ // Refill s.allocCache with the next 64 alloc bits.
+ s.refillAllocCache(whichByte)
+ aCache = s.allocCache
+ bitIndex = sys.Ctz64(aCache)
+ // nothing available in cached bits
+ // grab the next 8 bytes and try again.
+ }
+ result := sfreeindex + uintptr(bitIndex)
+ if result >= snelems {
+ s.freeindex = snelems
+ return snelems
+ }
+
+ s.allocCache >>= uint(bitIndex + 1)
+ sfreeindex = result + 1
+
+ if sfreeindex%64 == 0 && sfreeindex != snelems {
+ // We just incremented s.freeindex so it isn't 0.
+ // As each 1 in s.allocCache was encountered and used for allocation
+ // it was shifted away. At this point s.allocCache contains all 0s.
+ // Refill s.allocCache so that it corresponds
+ // to the bits at s.allocBits starting at s.freeindex.
+ whichByte := sfreeindex / 8
+ s.refillAllocCache(whichByte)
+ }
+ s.freeindex = sfreeindex
+ return result
+}
+
+// isFree reports whether the index'th object in s is unallocated.
+//
+// The caller must ensure s.state is mSpanInUse, and there must have
+// been no preemption points since ensuring this (which could allow a
+// GC transition, which would allow the state to change).
+func (s *mspan) isFree(index uintptr) bool {
+ if index < s.freeindex {
+ return false
+ }
+ bytep, mask := s.allocBits.bitp(index)
+ return *bytep&mask == 0
+}
+
+func (s *mspan) objIndex(p uintptr) uintptr {
+ byteOffset := p - s.base()
+ if byteOffset == 0 {
+ return 0
+ }
+ if s.baseMask != 0 {
+ // s.baseMask is non-0, elemsize is a power of two, so shift by s.divShift
+ return byteOffset >> s.divShift
+ }
+ return uintptr(((uint64(byteOffset) >> s.divShift) * uint64(s.divMul)) >> s.divShift2)
+}
+
+func markBitsForAddr(p uintptr) markBits {
+ s := spanOf(p)
+ objIndex := s.objIndex(p)
+ return s.markBitsForIndex(objIndex)
+}
+
+func (s *mspan) markBitsForIndex(objIndex uintptr) markBits {
+ bytep, mask := s.gcmarkBits.bitp(objIndex)
+ return markBits{bytep, mask, objIndex}
+}
+
+func (s *mspan) markBitsForBase() markBits {
+ return markBits{(*uint8)(s.gcmarkBits), uint8(1), 0}
+}
+
+// isMarked reports whether mark bit m is set.
+func (m markBits) isMarked() bool {
+ return *m.bytep&m.mask != 0
+}
+
+// setMarked sets the marked bit in the markbits, atomically.
+func (m markBits) setMarked() {
+ // Might be racing with other updates, so use atomic update always.
+ // We used to be clever here and use a non-atomic update in certain
+ // cases, but it's not worth the risk.
+ atomic.Or8(m.bytep, m.mask)
+}
+
+// setMarkedNonAtomic sets the marked bit in the markbits, non-atomically.
+func (m markBits) setMarkedNonAtomic() {
+ *m.bytep |= m.mask
+}
+
+// clearMarked clears the marked bit in the markbits, atomically.
+func (m markBits) clearMarked() {
+ // Might be racing with other updates, so use atomic update always.
+ // We used to be clever here and use a non-atomic update in certain
+ // cases, but it's not worth the risk.
+ atomic.And8(m.bytep, ^m.mask)
+}
+
+// markBitsForSpan returns the markBits for the span base address base.
+func markBitsForSpan(base uintptr) (mbits markBits) {
+ mbits = markBitsForAddr(base)
+ if mbits.mask != 1 {
+ throw("markBitsForSpan: unaligned start")
+ }
+ return mbits
+}
+
+// advance advances the markBits to the next object in the span.
+func (m *markBits) advance() {
+ if m.mask == 1<<7 {
+ m.bytep = (*uint8)(unsafe.Pointer(uintptr(unsafe.Pointer(m.bytep)) + 1))
+ m.mask = 1
+ } else {
+ m.mask = m.mask << 1
+ }
+ m.index++
+}
+
+// heapBitsForAddr returns the heapBits for the address addr.
+// The caller must ensure addr is in an allocated span.
+// In particular, be careful not to point past the end of an object.
+//
+// nosplit because it is used during write barriers and must not be preempted.
+//go:nosplit
+func heapBitsForAddr(addr uintptr) (h heapBits) {
+ // 2 bits per word, 4 pairs per byte, and a mask is hard coded.
+ arena := arenaIndex(addr)
+ ha := mheap_.arenas[arena.l1()][arena.l2()]
+ // The compiler uses a load for nil checking ha, but in this
+ // case we'll almost never hit that cache line again, so it
+ // makes more sense to do a value check.
+ if ha == nil {
+ // addr is not in the heap. Return nil heapBits, which
+ // we expect to crash in the caller.
+ return
+ }
+ h.bitp = &ha.bitmap[(addr/(sys.PtrSize*4))%heapArenaBitmapBytes]
+ h.shift = uint32((addr / sys.PtrSize) & 3)
+ h.arena = uint32(arena)
+ h.last = &ha.bitmap[len(ha.bitmap)-1]
+ return
+}
+
+// badPointer throws bad pointer in heap panic.
+func badPointer(s *mspan, p, refBase, refOff uintptr) {
+ // Typically this indicates an incorrect use
+ // of unsafe or cgo to store a bad pointer in
+ // the Go heap. It may also indicate a runtime
+ // bug.
+ //
+ // TODO(austin): We could be more aggressive
+ // and detect pointers to unallocated objects
+ // in allocated spans.
+ printlock()
+ print("runtime: pointer ", hex(p))
+ state := s.state.get()
+ if state != mSpanInUse {
+ print(" to unallocated span")
+ } else {
+ print(" to unused region of span")
+ }
+ print(" span.base()=", hex(s.base()), " span.limit=", hex(s.limit), " span.state=", state, "\n")
+ if refBase != 0 {
+ print("runtime: found in object at *(", hex(refBase), "+", hex(refOff), ")\n")
+ gcDumpObject("object", refBase, refOff)
+ }
+ getg().m.traceback = 2
+ throw("found bad pointer in Go heap (incorrect use of unsafe or cgo?)")
+}
+
+// findObject returns the base address for the heap object containing
+// the address p, the object's span, and the index of the object in s.
+// If p does not point into a heap object, it returns base == 0.
+//
+// If p is an invalid heap pointer and debug.invalidptr != 0,
+// findObject panics.
+//
+// refBase and refOff optionally give the base address of the object
+// in which the pointer p was found and the byte offset at which it
+// was found. These are used for error reporting.
+//
+// It is nosplit so it is safe for p to be a pointer to the current goroutine's stack.
+// Since p is a uintptr, it would not be adjusted if the stack were to move.
+//go:nosplit
+func findObject(p, refBase, refOff uintptr) (base uintptr, s *mspan, objIndex uintptr) {
+ s = spanOf(p)
+ // If s is nil, the virtual address has never been part of the heap.
+ // This pointer may be to some mmap'd region, so we allow it.
+ if s == nil {
+ return
+ }
+ // If p is a bad pointer, it may not be in s's bounds.
+ //
+ // Check s.state to synchronize with span initialization
+ // before checking other fields. See also spanOfHeap.
+ if state := s.state.get(); state != mSpanInUse || p < s.base() || p >= s.limit {
+ // Pointers into stacks are also ok, the runtime manages these explicitly.
+ if state == mSpanManual {
+ return
+ }
+ // The following ensures that we are rigorous about what data
+ // structures hold valid pointers.
+ if debug.invalidptr != 0 {
+ badPointer(s, p, refBase, refOff)
+ }
+ return
+ }
+ // If this span holds objects of a power of 2 size, just mask off the bits to
+ // the interior of the object. Otherwise use the size to get the base.
+ if s.baseMask != 0 {
+ // optimize for power of 2 sized objects.
+ base = s.base()
+ base = base + (p-base)&uintptr(s.baseMask)
+ objIndex = (base - s.base()) >> s.divShift
+ // base = p & s.baseMask is faster for small spans,
+ // but doesn't work for large spans.
+ // Overall, it's faster to use the more general computation above.
+ } else {
+ base = s.base()
+ if p-base >= s.elemsize {
+ // n := (p - base) / s.elemsize, using division by multiplication
+ objIndex = uintptr(p-base) >> s.divShift * uintptr(s.divMul) >> s.divShift2
+ base += objIndex * s.elemsize
+ }
+ }
+ return
+}
+
+// next returns the heapBits describing the next pointer-sized word in memory.
+// That is, if h describes address p, h.next() describes p+ptrSize.
+// Note that next does not modify h. The caller must record the result.
+//
+// nosplit because it is used during write barriers and must not be preempted.
+//go:nosplit
+func (h heapBits) next() heapBits {
+ if h.shift < 3*heapBitsShift {
+ h.shift += heapBitsShift
+ } else if h.bitp != h.last {
+ h.bitp, h.shift = add1(h.bitp), 0
+ } else {
+ // Move to the next arena.
+ return h.nextArena()
+ }
+ return h
+}
+
+// nextArena advances h to the beginning of the next heap arena.
+//
+// This is a slow-path helper to next. gc's inliner knows that
+// heapBits.next can be inlined even though it calls this. This is
+// marked noinline so it doesn't get inlined into next and cause next
+// to be too big to inline.
+//
+//go:nosplit
+//go:noinline
+func (h heapBits) nextArena() heapBits {
+ h.arena++
+ ai := arenaIdx(h.arena)
+ l2 := mheap_.arenas[ai.l1()]
+ if l2 == nil {
+ // We just passed the end of the object, which
+ // was also the end of the heap. Poison h. It
+ // should never be dereferenced at this point.
+ return heapBits{}
+ }
+ ha := l2[ai.l2()]
+ if ha == nil {
+ return heapBits{}
+ }
+ h.bitp, h.shift = &ha.bitmap[0], 0
+ h.last = &ha.bitmap[len(ha.bitmap)-1]
+ return h
+}
+
+// forward returns the heapBits describing n pointer-sized words ahead of h in memory.
+// That is, if h describes address p, h.forward(n) describes p+n*ptrSize.
+// h.forward(1) is equivalent to h.next(), just slower.
+// Note that forward does not modify h. The caller must record the result.
+//go:nosplit
+func (h heapBits) forward(n uintptr) heapBits {
+ n += uintptr(h.shift) / heapBitsShift
+ nbitp := uintptr(unsafe.Pointer(h.bitp)) + n/4
+ h.shift = uint32(n%4) * heapBitsShift
+ if nbitp <= uintptr(unsafe.Pointer(h.last)) {
+ h.bitp = (*uint8)(unsafe.Pointer(nbitp))
+ return h
+ }
+
+ // We're in a new heap arena.
+ past := nbitp - (uintptr(unsafe.Pointer(h.last)) + 1)
+ h.arena += 1 + uint32(past/heapArenaBitmapBytes)
+ ai := arenaIdx(h.arena)
+ if l2 := mheap_.arenas[ai.l1()]; l2 != nil && l2[ai.l2()] != nil {
+ a := l2[ai.l2()]
+ h.bitp = &a.bitmap[past%heapArenaBitmapBytes]
+ h.last = &a.bitmap[len(a.bitmap)-1]
+ } else {
+ h.bitp, h.last = nil, nil
+ }
+ return h
+}
+
+// forwardOrBoundary is like forward, but stops at boundaries between
+// contiguous sections of the bitmap. It returns the number of words
+// advanced over, which will be <= n.
+func (h heapBits) forwardOrBoundary(n uintptr) (heapBits, uintptr) {
+ maxn := 4 * ((uintptr(unsafe.Pointer(h.last)) + 1) - uintptr(unsafe.Pointer(h.bitp)))
+ if n > maxn {
+ n = maxn
+ }
+ return h.forward(n), n
+}
+
+// bits returns the heap bits for the current word.
+// The caller can test morePointers and isPointer by &-ing with bitScan and bitPointer.
+// The result includes in its higher bits the bits for subsequent words
+// described by the same bitmap byte.
+//
+// nosplit because it is used during write barriers and must not be preempted.
+//go:nosplit
+func (h heapBits) bits() uint32 {
+ // The (shift & 31) eliminates a test and conditional branch
+ // from the generated code.
+ return uint32(*h.bitp) >> (h.shift & 31)
+}
+
+// morePointers reports whether this word or any later word in this object
+// may still contain pointers, i.e. whether the scan bit is set.
+// h must not describe the second word of the object.
+func (h heapBits) morePointers() bool {
+ return h.bits()&bitScan != 0
+}
+
+// isPointer reports whether the heap bits describe a pointer word.
+//
+// nosplit because it is used during write barriers and must not be preempted.
+//go:nosplit
+func (h heapBits) isPointer() bool {
+ return h.bits()&bitPointer != 0
+}
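+
+// A minimal illustrative sketch (exampleDecodeBitmapByte is a hypothetical
+// name): decoding one heap bitmap byte directly. For the word at in-byte
+// shift s (0 through 3), bit s is its pointer bit and bit 4+s is its scan
+// bit, which is what bits, isPointer and morePointers compute above.
+func exampleDecodeBitmapByte(b uint8, s uint) (isPtr, scan bool) {
+ return b&(bitPointer<<s) != 0, b&(bitScan<<s) != 0
+}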
+
+// bulkBarrierPreWrite executes a write barrier
+// for every pointer slot in the memory range [src, src+size),
+// using pointer/scalar information from [dst, dst+size).
+// This executes the write barriers necessary before a memmove.
+// src, dst, and size must be pointer-aligned.
+// The range [dst, dst+size) must lie within a single object.
+// It does not perform the actual writes.
+//
+// As a special case, src == 0 indicates that this is being used for a
+// memclr. bulkBarrierPreWrite will pass 0 for the src of each write
+// barrier.
+//
+// Callers should call bulkBarrierPreWrite immediately before
+// calling memmove(dst, src, size). This function is marked nosplit
+// to avoid being preempted; the GC must not stop the goroutine
+// between the memmove and the execution of the barriers.
+// The caller is also responsible for cgo pointer checks if this
+// may be writing Go pointers into non-Go memory.
+//
+// The pointer bitmap is not maintained for allocations containing
+// no pointers at all; any caller of bulkBarrierPreWrite must first
+// make sure the underlying allocation contains pointers, usually
+// by checking typ.ptrdata.
+//
+// Callers must perform cgo checks if writeBarrier.cgo.
+//
+//go:nosplit
+func bulkBarrierPreWrite(dst, src, size uintptr) {
+ if (dst|src|size)&(sys.PtrSize-1) != 0 {
+ throw("bulkBarrierPreWrite: unaligned arguments")
+ }
+ if !writeBarrier.needed {
+ return
+ }
+ if s := spanOf(dst); s == nil {
+ // If dst is a global, use the data or BSS bitmaps to
+ // execute write barriers.
+ for _, datap := range activeModules() {
+ if datap.data <= dst && dst < datap.edata {
+ bulkBarrierBitmap(dst, src, size, dst-datap.data, datap.gcdatamask.bytedata)
+ return
+ }
+ }
+ for _, datap := range activeModules() {
+ if datap.bss <= dst && dst < datap.ebss {
+ bulkBarrierBitmap(dst, src, size, dst-datap.bss, datap.gcbssmask.bytedata)
+ return
+ }
+ }
+ return
+ } else if s.state.get() != mSpanInUse || dst < s.base() || s.limit <= dst {
+ // dst was heap memory at some point, but isn't now.
+ // It can't be a global. It must be either our stack,
+ // or in the case of direct channel sends, it could be
+ // another stack. Either way, no need for barriers.
+ // This will also catch the case where dst is in a freed span,
+ // though that should never happen.
+ return
+ }
+
+ buf := &getg().m.p.ptr().wbBuf
+ h := heapBitsForAddr(dst)
+ if src == 0 {
+ for i := uintptr(0); i < size; i += sys.PtrSize {
+ if h.isPointer() {
+ dstx := (*uintptr)(unsafe.Pointer(dst + i))
+ if !buf.putFast(*dstx, 0) {
+ wbBufFlush(nil, 0)
+ }
+ }
+ h = h.next()
+ }
+ } else {
+ for i := uintptr(0); i < size; i += sys.PtrSize {
+ if h.isPointer() {
+ dstx := (*uintptr)(unsafe.Pointer(dst + i))
+ srcx := (*uintptr)(unsafe.Pointer(src + i))
+ if !buf.putFast(*dstx, *srcx) {
+ wbBufFlush(nil, 0)
+ }
+ }
+ h = h.next()
+ }
+ }
+}
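+
+// A minimal illustrative sketch (exampleBarrieredMove is a hypothetical
+// name): the calling pattern the doc comment above requires. Barriers are
+// issued for the pointer-containing prefix of the destination immediately
+// before the memmove; the real caller, typedmemmove, additionally marks
+// itself nosplit so the GC cannot stop it between the two steps.
+func exampleBarrieredMove(typ *_type, dst, src unsafe.Pointer) {
+ if writeBarrier.needed && typ.ptrdata != 0 {
+ // Only the first typ.ptrdata bytes can contain pointers.
+ bulkBarrierPreWrite(uintptr(dst), uintptr(src), typ.ptrdata)
+ }
+ memmove(dst, src, typ.size)
+}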
+
+// bulkBarrierPreWriteSrcOnly is like bulkBarrierPreWrite but
+// does not execute write barriers for [dst, dst+size).
+//
+// In addition to the requirements of bulkBarrierPreWrite
+// callers need to ensure [dst, dst+size) is zeroed.
+//
+// This is used for special cases where e.g. dst was just
+// created and zeroed with malloc.
+//go:nosplit
+func bulkBarrierPreWriteSrcOnly(dst, src, size uintptr) {
+ if (dst|src|size)&(sys.PtrSize-1) != 0 {
+ throw("bulkBarrierPreWrite: unaligned arguments")
+ }
+ if !writeBarrier.needed {
+ return
+ }
+ buf := &getg().m.p.ptr().wbBuf
+ h := heapBitsForAddr(dst)
+ for i := uintptr(0); i < size; i += sys.PtrSize {
+ if h.isPointer() {
+ srcx := (*uintptr)(unsafe.Pointer(src + i))
+ if !buf.putFast(0, *srcx) {
+ wbBufFlush(nil, 0)
+ }
+ }
+ h = h.next()
+ }
+}
+
+// bulkBarrierBitmap executes write barriers for copying from [src,
+// src+size) to [dst, dst+size) using a 1-bit pointer bitmap. src is
+// assumed to start maskOffset bytes into the data covered by the
+// bitmap in bits (which may not be a multiple of 8).
+//
+// This is used by bulkBarrierPreWrite for writes to data and BSS.
+//
+//go:nosplit
+func bulkBarrierBitmap(dst, src, size, maskOffset uintptr, bits *uint8) {
+ word := maskOffset / sys.PtrSize
+ bits = addb(bits, word/8)
+ mask := uint8(1) << (word % 8)
+
+ buf := &getg().m.p.ptr().wbBuf
+ for i := uintptr(0); i < size; i += sys.PtrSize {
+ if mask == 0 {
+ bits = addb(bits, 1)
+ if *bits == 0 {
+ // Skip 8 words.
+ i += 7 * sys.PtrSize
+ continue
+ }
+ mask = 1
+ }
+ if *bits&mask != 0 {
+ dstx := (*uintptr)(unsafe.Pointer(dst + i))
+ if src == 0 {
+ if !buf.putFast(*dstx, 0) {
+ wbBufFlush(nil, 0)
+ }
+ } else {
+ srcx := (*uintptr)(unsafe.Pointer(src + i))
+ if !buf.putFast(*dstx, *srcx) {
+ wbBufFlush(nil, 0)
+ }
+ }
+ }
+ mask <<= 1
+ }
+}
+
+// typeBitsBulkBarrier executes a write barrier for every
+// pointer that would be copied from [src, src+size) to [dst,
+// dst+size) by a memmove using the type bitmap to locate those
+// pointer slots.
+//
+// The type typ must correspond exactly to [src, src+size) and [dst, dst+size).
+// dst, src, and size must be pointer-aligned.
+// The type typ must have a plain bitmap, not a GC program.
+// The only use of this function is in channel sends, and the
+// 64 kB channel element limit takes care of this for us.
+//
+// Must not be preempted because it typically runs right before memmove,
+// and the GC must observe the barriers and the memmove as one atomic action.
+//
+// Callers must perform cgo checks if writeBarrier.cgo.
+//
+//go:nosplit
+func typeBitsBulkBarrier(typ *_type, dst, src, size uintptr) {
+ if typ == nil {
+ throw("runtime: typeBitsBulkBarrier without type")
+ }
+ if typ.size != size {
+ println("runtime: typeBitsBulkBarrier with type ", typ.string(), " of size ", typ.size, " but memory size", size)
+ throw("runtime: invalid typeBitsBulkBarrier")
+ }
+ if typ.kind&kindGCProg != 0 {
+ println("runtime: typeBitsBulkBarrier with type ", typ.string(), " with GC prog")
+ throw("runtime: invalid typeBitsBulkBarrier")
+ }
+ if !writeBarrier.needed {
+ return
+ }
+ ptrmask := typ.gcdata
+ buf := &getg().m.p.ptr().wbBuf
+ var bits uint32
+ for i := uintptr(0); i < typ.ptrdata; i += sys.PtrSize {
+ if i&(sys.PtrSize*8-1) == 0 {
+ bits = uint32(*ptrmask)
+ ptrmask = addb(ptrmask, 1)
+ } else {
+ bits = bits >> 1
+ }
+ if bits&1 != 0 {
+ dstx := (*uintptr)(unsafe.Pointer(dst + i))
+ srcx := (*uintptr)(unsafe.Pointer(src + i))
+ if !buf.putFast(*dstx, *srcx) {
+ wbBufFlush(nil, 0)
+ }
+ }
+ }
+}
+
+// The methods operating on spans all require that h has been returned
+// by heapBitsForSpan and that size, n, total are the span layout description
+// returned by the mspan's layout method.
+// If total > size*n, it means that there is extra leftover memory in the span,
+// usually due to rounding.
+//
+// TODO(rsc): Perhaps introduce a different heapBitsSpan type.
+
+// initSpan initializes the heap bitmap for a span.
+// If this is a span of pointer-sized objects, it initializes all
+// words to pointer/scan.
+// Otherwise, it initializes all words to scalar/dead.
+func (h heapBits) initSpan(s *mspan) {
+ // Clear bits corresponding to objects.
+ nw := (s.npages << _PageShift) / sys.PtrSize
+ if nw%wordsPerBitmapByte != 0 {
+ throw("initSpan: unaligned length")
+ }
+ if h.shift != 0 {
+ throw("initSpan: unaligned base")
+ }
+ isPtrs := sys.PtrSize == 8 && s.elemsize == sys.PtrSize
+ for nw > 0 {
+ hNext, anw := h.forwardOrBoundary(nw)
+ nbyte := anw / wordsPerBitmapByte
+ if isPtrs {
+ bitp := h.bitp
+ for i := uintptr(0); i < nbyte; i++ {
+ *bitp = bitPointerAll | bitScanAll
+ bitp = add1(bitp)
+ }
+ } else {
+ memclrNoHeapPointers(unsafe.Pointer(h.bitp), nbyte)
+ }
+ h = hNext
+ nw -= anw
+ }
+}
+
+// countAlloc returns the number of objects allocated in span s by
+// scanning the allocation bitmap.
+func (s *mspan) countAlloc() int {
+ count := 0
+ bytes := divRoundUp(s.nelems, 8)
+ // Iterate over each 8-byte chunk and count allocations
+ // with an intrinsic. Note that newMarkBits guarantees that
+ // gcmarkBits will be 8-byte aligned, so we don't have to
+ // worry about edge cases; irrelevant bits will simply be zero.
+ for i := uintptr(0); i < bytes; i += 8 {
+ // Extract 64 bits from the byte pointer and count the set bits with OnesCount64.
+ // Note that the unsafe cast here doesn't preserve endianness,
+ // but that's OK. We only care about how many bits are 1, not
+ // about the order we discover them in.
+ mrkBits := *(*uint64)(unsafe.Pointer(s.gcmarkBits.bytep(i)))
+ count += sys.OnesCount64(mrkBits)
+ }
+ return count
+}
+
+// heapBitsSetType records that the new allocation [x, x+size)
+// holds in [x, x+dataSize) one or more values of type typ.
+// (The number of values is given by dataSize / typ.size.)
+// If dataSize < size, the fragment [x+dataSize, x+size) is
+// recorded as non-pointer data.
+// It is known that the type has pointers somewhere;
+// malloc does not call heapBitsSetType when there are no pointers,
+// because all free objects are marked as noscan during
+// heapBitsSweepSpan.
+//
+// There can only be one allocation from a given span active at a time,
+// and the bitmap for a span always falls on byte boundaries,
+// so there are no write-write races for access to the heap bitmap.
+// Hence, heapBitsSetType can access the bitmap without atomics.
+//
+// There can be read-write races between heapBitsSetType and things
+// that read the heap bitmap like scanobject. However, since
+// heapBitsSetType is only used for objects that have not yet been
+// made reachable, readers will ignore bits being modified by this
+// function. This does mean this function cannot transiently modify
+// bits that belong to neighboring objects. Also, on weakly-ordered
+// machines, callers must execute a store/store (publication) barrier
+// between calling this function and making the object reachable.
+func heapBitsSetType(x, size, dataSize uintptr, typ *_type) {
+ const doubleCheck = false // slow but helpful; enable to test modifications to this code
+
+ const (
+ mask1 = bitPointer | bitScan // 00010001
+ mask2 = bitPointer | bitScan | mask1<<heapBitsShift // 00110011
+ mask3 = bitPointer | bitScan | mask2<<heapBitsShift // 01110111
+ )
+
+ // dataSize is always size rounded up to the next malloc size class,
+ // except in the case of allocating a defer block, in which case
+ // size is sizeof(_defer{}) (at least 6 words) and dataSize may be
+ // arbitrarily larger.
+ //
+ // The checks for size == sys.PtrSize and size == 2*sys.PtrSize can therefore
+ // assume that dataSize == size without checking it explicitly.
+
+ if sys.PtrSize == 8 && size == sys.PtrSize {
+ // It's one word and it has pointers, it must be a pointer.
+ // Since all allocated one-word objects are pointers
+ // (non-pointers are aggregated into tinySize allocations),
+ // initSpan sets the pointer bits for us. Nothing to do here.
+ if doubleCheck {
+ h := heapBitsForAddr(x)
+ if !h.isPointer() {
+ throw("heapBitsSetType: pointer bit missing")
+ }
+ if !h.morePointers() {
+ throw("heapBitsSetType: scan bit missing")
+ }
+ }
+ return
+ }
+
+ h := heapBitsForAddr(x)
+ ptrmask := typ.gcdata // start of 1-bit pointer mask (or GC program, handled below)
+
+ // 2-word objects only have 4 bitmap bits and 3-word objects only have 6 bitmap bits.
+ // Therefore, these objects share a heap bitmap byte with the objects next to them.
+ // These are called out as a special case primarily so the code below can assume all
+ // objects are at least 4 words long and that their bitmaps start either at the beginning
+ // of a bitmap byte, or half-way in (h.shift of 0 and 2 respectively).
+
+ if size == 2*sys.PtrSize {
+ if typ.size == sys.PtrSize {
+ // We're allocating a block big enough to hold two pointers.
+ // On 64-bit, that means the actual object must be two pointers,
+ // or else we'd have used the one-pointer-sized block.
+ // On 32-bit, however, this is the 8-byte block, the smallest one.
+ // So it could be that we're allocating one pointer and this was
+ // just the smallest block available. Distinguish by checking dataSize.
+ // (In general the number of instances of typ being allocated is
+ // dataSize/typ.size.)
+ if sys.PtrSize == 4 && dataSize == sys.PtrSize {
+ // 1 pointer object. On 32-bit machines clear the bit for the
+ // unused second word.
+ *h.bitp &^= (bitPointer | bitScan | (bitPointer|bitScan)<<heapBitsShift) << h.shift
+ *h.bitp |= (bitPointer | bitScan) << h.shift
+ } else {
+ // 2-element array of pointer.
+ *h.bitp |= (bitPointer | bitScan | (bitPointer|bitScan)<<heapBitsShift) << h.shift
+ }
+ return
+ }
+ // Otherwise typ.size must be 2*sys.PtrSize,
+ // and typ.kind&kindGCProg == 0.
+ if doubleCheck {
+ if typ.size != 2*sys.PtrSize || typ.kind&kindGCProg != 0 {
+ print("runtime: heapBitsSetType size=", size, " but typ.size=", typ.size, " gcprog=", typ.kind&kindGCProg != 0, "\n")
+ throw("heapBitsSetType")
+ }
+ }
+ b := uint32(*ptrmask)
+ hb := b & 3
+ hb |= bitScanAll & ((bitScan << (typ.ptrdata / sys.PtrSize)) - 1)
+ // Clear the bits for this object so we can set the
+ // appropriate ones.
+ *h.bitp &^= (bitPointer | bitScan | ((bitPointer | bitScan) << heapBitsShift)) << h.shift
+ *h.bitp |= uint8(hb << h.shift)
+ return
+ } else if size == 3*sys.PtrSize {
+ b := uint8(*ptrmask)
+ if doubleCheck {
+ if b == 0 {
+ println("runtime: invalid type ", typ.string())
+ throw("heapBitsSetType: called with non-pointer type")
+ }
+ if sys.PtrSize != 8 {
+ throw("heapBitsSetType: unexpected 3 pointer wide size class on 32 bit")
+ }
+ if typ.kind&kindGCProg != 0 {
+ throw("heapBitsSetType: unexpected GC prog for 3 pointer wide size class")
+ }
+ if typ.size == 2*sys.PtrSize {
+ print("runtime: heapBitsSetType size=", size, " but typ.size=", typ.size, "\n")
+ throw("heapBitsSetType: inconsistent object sizes")
+ }
+ }
+ if typ.size == sys.PtrSize {
+ // The type contains a pointer otherwise heapBitsSetType wouldn't have been called.
+ // Since the type is only 1 pointer wide and contains a pointer, its gcdata must be exactly 1.
+ if doubleCheck && *typ.gcdata != 1 {
+ print("runtime: heapBitsSetType size=", size, " typ.size=", typ.size, "but *typ.gcdata", *typ.gcdata, "\n")
+ throw("heapBitsSetType: unexpected gcdata for 1 pointer wide type size in 3 pointer wide size class")
+ }
+ // 3 element array of pointers. Unrolling ptrmask 3 times into p yields 00000111.
+ b = 7
+ }
+
+ hb := b & 7
+ // Set bitScan bits for all pointers.
+ hb |= hb << wordsPerBitmapByte
+ // First bitScan bit is always set since the type contains pointers.
+ hb |= bitScan
+ // Second bitScan bit needs to also be set if the third bitScan bit is set.
+ hb |= hb & (bitScan << (2 * heapBitsShift)) >> 1
+
+ // For h.shift > 1 heap bits cross a byte boundary and need to be written part
+ // to h.bitp and part to the next h.bitp.
+ switch h.shift {
+ case 0:
+ *h.bitp &^= mask3 << 0
+ *h.bitp |= hb << 0
+ case 1:
+ *h.bitp &^= mask3 << 1
+ *h.bitp |= hb << 1
+ case 2:
+ *h.bitp &^= mask2 << 2
+ *h.bitp |= (hb & mask2) << 2
+ // Two words written to the first byte.
+ // Advance two words to get to the next byte.
+ h = h.next().next()
+ *h.bitp &^= mask1
+ *h.bitp |= (hb >> 2) & mask1
+ case 3:
+ *h.bitp &^= mask1 << 3
+ *h.bitp |= (hb & mask1) << 3
+ // One word written to the first byte.
+ // Advance one word to get to the next byte.
+ h = h.next()
+ *h.bitp &^= mask2
+ *h.bitp |= (hb >> 1) & mask2
+ }
+ return
+ }
+
+ // Copy from 1-bit ptrmask into 2-bit bitmap.
+ // The basic approach is to use a single uintptr as a bit buffer,
+ // alternating between reloading the buffer and writing bitmap bytes.
+ // In general, one load can supply two bitmap byte writes.
+ // This is a lot of lines of code, but it compiles into relatively few
+ // machine instructions.
+
+ outOfPlace := false
+ if arenaIndex(x+size-1) != arenaIdx(h.arena) || (doubleCheck && fastrand()%2 == 0) {
+ // This object spans heap arenas, so the bitmap may be
+ // discontiguous. Unroll it into the object instead
+ // and then copy it out.
+ //
+ // In doubleCheck mode, we randomly do this anyway to
+ // stress test the bitmap copying path.
+ outOfPlace = true
+ h.bitp = (*uint8)(unsafe.Pointer(x))
+ h.last = nil
+ }
+
+ var (
+ // Ptrmask input.
+ p *byte // last ptrmask byte read
+ b uintptr // ptrmask bits already loaded
+ nb uintptr // number of bits in b at next read
+ endp *byte // final ptrmask byte to read (then repeat)
+ endnb uintptr // number of valid bits in *endp
+ pbits uintptr // alternate source of bits
+
+ // Heap bitmap output.
+ w uintptr // words processed
+ nw uintptr // number of words to process
+ hbitp *byte // next heap bitmap byte to write
+ hb uintptr // bits being prepared for *hbitp
+ )
+
+ hbitp = h.bitp
+
+ // Handle GC program. Delayed until this part of the code
+ // so that we can use the same double-checking mechanism
+ // as the 1-bit case. Nothing above could have encountered
+ // GC programs: the cases were all too small.
+ if typ.kind&kindGCProg != 0 {
+ heapBitsSetTypeGCProg(h, typ.ptrdata, typ.size, dataSize, size, addb(typ.gcdata, 4))
+ if doubleCheck {
+ // Double-check the heap bits written by GC program
+ // by running the GC program to create a 1-bit pointer mask
+ // and then jumping to the double-check code below.
+ // This doesn't catch bugs shared between the 1-bit and 4-bit
+ // GC program execution, but it does catch mistakes specific
+ // to just one of those and bugs in heapBitsSetTypeGCProg's
+ // implementation of arrays.
+ lock(&debugPtrmask.lock)
+ if debugPtrmask.data == nil {
+ debugPtrmask.data = (*byte)(persistentalloc(1<<20, 1, &memstats.other_sys))
+ }
+ ptrmask = debugPtrmask.data
+ runGCProg(addb(typ.gcdata, 4), nil, ptrmask, 1)
+ }
+ goto Phase4
+ }
+
+ // Note about sizes:
+ //
+ // typ.size is the number of words in the object,
+ // and typ.ptrdata is the number of words in the prefix
+ // of the object that contains pointers. That is, the final
+ // typ.size - typ.ptrdata words contain no pointers.
+ // This allows optimization of a common pattern where
+ // an object has a small header followed by a large scalar
+ // buffer. If we know the pointers are over, we don't have
+ // to scan the buffer's heap bitmap at all.
+ // The 1-bit ptrmasks are sized to contain only bits for
+ // the typ.ptrdata prefix, zero padded out to a full byte
+ // of bitmap. This code sets nw (below) so that heap bitmap
+ // bits are only written for the typ.ptrdata prefix; if there is
+ // more room in the allocated object, the next heap bitmap
+ // entry is a 00, indicating that there are no more pointers
+ // to scan. So only the ptrmask for the ptrdata bytes is needed.
+ //
+ // Replicated copies are not as nice: if there is an array of
+ // objects with scalar tails, all but the last tail do have to
+ // be initialized, because there is no way to say "skip forward".
+ // However, because of the possibility of a repeated type with
+ // size not a multiple of 4 pointers (one heap bitmap byte),
+ // the code already must handle the last ptrmask byte specially
+ // by treating it as containing only the bits for endnb pointers,
+ // where endnb <= 4. We represent large scalar tails that must
+ // be expanded in the replication by setting endnb larger than 4.
+ // This will have the effect of reading many bits out of b,
+ // but once the real bits are shifted out, b will supply as many
+ // zero bits as we try to read, which is exactly what we need.
+
+ p = ptrmask
+ if typ.size < dataSize {
+ // Filling in bits for an array of typ.
+ // Set up for repetition of ptrmask during main loop.
+ // Note that ptrmask describes only the typ.ptrdata prefix of the type.
+ const maxBits = sys.PtrSize*8 - 7
+ if typ.ptrdata/sys.PtrSize <= maxBits {
+ // Entire ptrmask fits in uintptr with room for a byte fragment.
+ // Load into pbits and never read from ptrmask again.
+ // This is especially important when the ptrmask has
+ // fewer than 8 bits in it; otherwise the reload in the middle
+ // of the Phase 2 loop would itself need to loop to gather
+ // at least 8 bits.
+
+ // Accumulate ptrmask into b.
+ // ptrmask is sized to describe only typ.ptrdata, but we record
+ // it as describing typ.size bytes, since all the high bits are zero.
+ nb = typ.ptrdata / sys.PtrSize
+ for i := uintptr(0); i < nb; i += 8 {
+ b |= uintptr(*p) << i
+ p = add1(p)
+ }
+ nb = typ.size / sys.PtrSize
+
+ // Replicate ptrmask to fill entire pbits uintptr.
+ // Doubling and truncating is fewer steps than
+ // iterating by nb each time. (nb could be 1.)
+ // Since we loaded typ.ptrdata/sys.PtrSize bits
+ // but are pretending to have typ.size/sys.PtrSize,
+ // there might be no replication necessary/possible.
+ pbits = b
+ endnb = nb
+ if nb+nb <= maxBits {
+ for endnb <= sys.PtrSize*8 {
+ pbits |= pbits << endnb
+ endnb += endnb
+ }
+ // Truncate to a multiple of original ptrmask.
+ // Because nb+nb <= maxBits, nb fits in a byte.
+ // Byte division is cheaper than uintptr division.
+ endnb = uintptr(maxBits/byte(nb)) * nb
+ pbits &= 1<<endnb - 1
+ b = pbits
+ nb = endnb
+ }
+
+ // Clear p and endp as sentinel for using pbits.
+ // Checked during Phase 2 loop.
+ p = nil
+ endp = nil
+ } else {
+ // Ptrmask is larger. Read it multiple times.
+ n := (typ.ptrdata/sys.PtrSize+7)/8 - 1
+ endp = addb(ptrmask, n)
+ endnb = typ.size/sys.PtrSize - n*8
+ }
+ }
+ if p != nil {
+ b = uintptr(*p)
+ p = add1(p)
+ nb = 8
+ }
+
+ if typ.size == dataSize {
+ // Single entry: can stop once we reach the non-pointer data.
+ nw = typ.ptrdata / sys.PtrSize
+ } else {
+ // Repeated instances of typ in an array.
+ // Have to process first N-1 entries in full, but can stop
+ // once we reach the non-pointer data in the final entry.
+ nw = ((dataSize/typ.size-1)*typ.size + typ.ptrdata) / sys.PtrSize
+ }
+ if nw == 0 {
+ // No pointers! Caller was supposed to check.
+ println("runtime: invalid type ", typ.string())
+ throw("heapBitsSetType: called with non-pointer type")
+ return
+ }
+
+ // Phase 1: Special case for leading byte (shift==0) or half-byte (shift==2).
+ // The leading byte is special because it contains the bits for word 1,
+ // which does not have the scan bit set.
+ // The leading half-byte is special because it's a half a byte,
+ // so we have to be careful with the bits already there.
+ switch {
+ default:
+ throw("heapBitsSetType: unexpected shift")
+
+ case h.shift == 0:
+ // Ptrmask and heap bitmap are aligned.
+ //
+ // This is a fast path for small objects.
+ //
+ // The first byte we write out covers the first four
+ // words of the object. The scan/dead bit on the first
+ // word must be set to scan since there are pointers
+ // somewhere in the object.
+ // In all following words, we set the scan/dead
+ // appropriately to indicate that the object continues
+ // to the next 2-bit entry in the bitmap.
+ //
+ // We set four bits at a time here, but if the object
+ // is fewer than four words, phase 3 will clear
+ // unnecessary bits.
+ hb = b & bitPointerAll
+ hb |= bitScanAll
+ if w += 4; w >= nw {
+ goto Phase3
+ }
+ *hbitp = uint8(hb)
+ hbitp = add1(hbitp)
+ b >>= 4
+ nb -= 4
+
+ case h.shift == 2:
+ // Ptrmask and heap bitmap are misaligned.
+ //
+ // On 32-bit architectures only the 6-word object that corresponds
+ // to the 24-byte size class can start with an h.shift of 2 here, since
+ // all other non-16-byte-aligned size classes have been handled by
+ // special code paths at the beginning of heapBitsSetType on 32 bit.
+ //
+ // Many size classes are only 16-byte aligned. On 64-bit architectures
+ // this results in a heap bitmap position starting with an h.shift of 2.
+ //
+ // The bits for the first two words are in a byte shared
+ // with another object, so we must be careful with the bits
+ // already there.
+ //
+ // We took care of 1-word, 2-word, and 3-word objects above,
+ // so this is at least a 6-word object.
+ hb = (b & (bitPointer | bitPointer<<heapBitsShift)) << (2 * heapBitsShift)
+ hb |= bitScan << (2 * heapBitsShift)
+ if nw > 1 {
+ hb |= bitScan << (3 * heapBitsShift)
+ }
+ b >>= 2
+ nb -= 2
+ *hbitp &^= uint8((bitPointer | bitScan | ((bitPointer | bitScan) << heapBitsShift)) << (2 * heapBitsShift))
+ *hbitp |= uint8(hb)
+ hbitp = add1(hbitp)
+ if w += 2; w >= nw {
+ // We know that there is more data, because we handled 2-word and 3-word objects above.
+ // This must be at least a 6-word object. If we're out of pointer words,
+ // mark no scan in next bitmap byte and finish.
+ hb = 0
+ w += 4
+ goto Phase3
+ }
+ }
+
+ // Phase 2: Full bytes in bitmap, up to but not including write to last byte (full or partial) in bitmap.
+ // The loop computes the bits for that last write but does not execute the write;
+ // it leaves the bits in hb for processing by phase 3.
+ // To avoid repeated adjustment of nb, we subtract out the 4 bits we're going to
+ // use in the first half of the loop right now, and then we only adjust nb explicitly
+ // if the 8 bits used by each iteration isn't balanced by 8 bits loaded mid-loop.
+ nb -= 4
+ for {
+ // Emit bitmap byte.
+ // b has at least nb+4 bits, with one exception:
+ // if w+4 >= nw, then b has only nw-w bits,
+ // but we'll stop at the break and then truncate
+ // appropriately in Phase 3.
+ hb = b & bitPointerAll
+ hb |= bitScanAll
+ if w += 4; w >= nw {
+ break
+ }
+ *hbitp = uint8(hb)
+ hbitp = add1(hbitp)
+ b >>= 4
+
+ // Load more bits. b has nb right now.
+ if p != endp {
+ // Fast path: keep reading from ptrmask.
+ // nb unmodified: we just loaded 8 bits,
+ // and the next iteration will consume 8 bits,
+ // leaving us with the same nb the next time we're here.
+ if nb < 8 {
+ b |= uintptr(*p) << nb
+ p = add1(p)
+ } else {
+ // Reduce the number of bits in b.
+ // This is important if we skipped
+ // over a scalar tail, since nb could
+ // be larger than the bit width of b.
+ nb -= 8
+ }
+ } else if p == nil {
+ // Almost as fast path: track bit count and refill from pbits.
+ // For short repetitions.
+ if nb < 8 {
+ b |= pbits << nb
+ nb += endnb
+ }
+ nb -= 8 // for next iteration
+ } else {
+ // Slow path: reached end of ptrmask.
+ // Process final partial byte and rewind to start.
+ b |= uintptr(*p) << nb
+ nb += endnb
+ if nb < 8 {
+ b |= uintptr(*ptrmask) << nb
+ p = add1(ptrmask)
+ } else {
+ nb -= 8
+ p = ptrmask
+ }
+ }
+
+ // Emit bitmap byte.
+ hb = b & bitPointerAll
+ hb |= bitScanAll
+ if w += 4; w >= nw {
+ break
+ }
+ *hbitp = uint8(hb)
+ hbitp = add1(hbitp)
+ b >>= 4
+ }
+
+Phase3:
+ // Phase 3: Write last byte or partial byte and zero the rest of the bitmap entries.
+ if w > nw {
+ // Counting the 4 entries in hb not yet written to memory,
+ // there are more entries than possible pointer slots.
+ // Discard the excess entries (can't be more than 3).
+ mask := uintptr(1)<<(4-(w-nw)) - 1
+ hb &= mask | mask<<4 // apply mask to both pointer bits and scan bits
+ }
+
+ // Change nw from counting possibly-pointer words to total words in allocation.
+ nw = size / sys.PtrSize
+
+ // Write whole bitmap bytes.
+ // The first is hb, the rest are zero.
+ if w <= nw {
+ *hbitp = uint8(hb)
+ hbitp = add1(hbitp)
+ hb = 0 // for possible final half-byte below
+ for w += 4; w <= nw; w += 4 {
+ *hbitp = 0
+ hbitp = add1(hbitp)
+ }
+ }
+
+ // Write final partial bitmap byte if any.
+ // We know w > nw, or else we'd still be in the loop above.
+ // It can be bigger only due to the 4 entries in hb that it counts.
+ // If w == nw+4 then there's nothing left to do: we wrote all nw entries
+ // and can discard the 4 sitting in hb.
+ // But if w == nw+2, we need to write first two in hb.
+ // The byte is shared with the next object, so be careful with
+ // existing bits.
+ if w == nw+2 {
+ *hbitp = *hbitp&^(bitPointer|bitScan|(bitPointer|bitScan)<<heapBitsShift) | uint8(hb)
+ }
+
+Phase4:
+ // Phase 4: Copy unrolled bitmap to per-arena bitmaps, if necessary.
+ if outOfPlace {
+ // TODO: We could probably make this faster by
+ // handling [x+dataSize, x+size) specially.
+ h := heapBitsForAddr(x)
+ // cnw is the number of heap words, or bit pairs
+ // remaining (like nw above).
+ cnw := size / sys.PtrSize
+ src := (*uint8)(unsafe.Pointer(x))
+ // We know the first and last byte of the bitmap are
+ // not the same, but a small object that spans arenas
+ // may still share bitmap bytes with neighboring
+ // objects.
+ //
+ // Handle the first byte specially if it's shared. See
+ // Phase 1 for why this is the only special case we need.
+ if doubleCheck {
+ if !(h.shift == 0 || h.shift == 2) {
+ print("x=", x, " size=", size, " cnw=", h.shift, "\n")
+ throw("bad start shift")
+ }
+ }
+ if h.shift == 2 {
+ *h.bitp = *h.bitp&^((bitPointer|bitScan|(bitPointer|bitScan)<<heapBitsShift)<<(2*heapBitsShift)) | *src
+ h = h.next().next()
+ cnw -= 2
+ src = addb(src, 1)
+ }
+ // We're now byte aligned. Copy out to per-arena
+ // bitmaps until the last byte (which may again be
+ // partial).
+ for cnw >= 4 {
+ // This loop processes four words at a time,
+ // so round cnw down accordingly.
+ hNext, words := h.forwardOrBoundary(cnw / 4 * 4)
+
+ // n is the number of bitmap bytes to copy.
+ n := words / 4
+ memmove(unsafe.Pointer(h.bitp), unsafe.Pointer(src), n)
+ cnw -= words
+ h = hNext
+ src = addb(src, n)
+ }
+ if doubleCheck && h.shift != 0 {
+ print("cnw=", cnw, " h.shift=", h.shift, "\n")
+ throw("bad shift after block copy")
+ }
+ // Handle the last byte if it's shared.
+ if cnw == 2 {
+ *h.bitp = *h.bitp&^(bitPointer|bitScan|(bitPointer|bitScan)<<heapBitsShift) | *src
+ src = addb(src, 1)
+ h = h.next().next()
+ }
+ if doubleCheck {
+ if uintptr(unsafe.Pointer(src)) > x+size {
+ throw("copy exceeded object size")
+ }
+ if !(cnw == 0 || cnw == 2) {
+ print("x=", x, " size=", size, " cnw=", cnw, "\n")
+ throw("bad number of remaining words")
+ }
+ // Set up hbitp so doubleCheck code below can check it.
+ hbitp = h.bitp
+ }
+ // Zero the object where we wrote the bitmap.
+ memclrNoHeapPointers(unsafe.Pointer(x), uintptr(unsafe.Pointer(src))-x)
+ }
+
+ // Double check the whole bitmap.
+ if doubleCheck {
+ // x+size may not point to the heap, so back up one
+ // word and then advance it the way we do above.
+ end := heapBitsForAddr(x + size - sys.PtrSize)
+ if outOfPlace {
+ // In out-of-place copying, we just advance
+ // using next.
+ end = end.next()
+ } else {
+ // Don't use next because that may advance to
+ // the next arena and the in-place logic
+ // doesn't do that.
+ end.shift += heapBitsShift
+ if end.shift == 4*heapBitsShift {
+ end.bitp, end.shift = add1(end.bitp), 0
+ }
+ }
+ if typ.kind&kindGCProg == 0 && (hbitp != end.bitp || (w == nw+2) != (end.shift == 2)) {
+ println("ended at wrong bitmap byte for", typ.string(), "x", dataSize/typ.size)
+ print("typ.size=", typ.size, " typ.ptrdata=", typ.ptrdata, " dataSize=", dataSize, " size=", size, "\n")
+ print("w=", w, " nw=", nw, " b=", hex(b), " nb=", nb, " hb=", hex(hb), "\n")
+ h0 := heapBitsForAddr(x)
+ print("initial bits h0.bitp=", h0.bitp, " h0.shift=", h0.shift, "\n")
+ print("ended at hbitp=", hbitp, " but next starts at bitp=", end.bitp, " shift=", end.shift, "\n")
+ throw("bad heapBitsSetType")
+ }
+
+ // Double-check that bits to be written were written correctly.
+ // Does not check that other bits were not written, unfortunately.
+ h := heapBitsForAddr(x)
+ nptr := typ.ptrdata / sys.PtrSize
+ ndata := typ.size / sys.PtrSize
+ count := dataSize / typ.size
+ totalptr := ((count-1)*typ.size + typ.ptrdata) / sys.PtrSize
+ for i := uintptr(0); i < size/sys.PtrSize; i++ {
+ j := i % ndata
+ var have, want uint8
+ have = (*h.bitp >> h.shift) & (bitPointer | bitScan)
+ if i >= totalptr {
+ if typ.kind&kindGCProg != 0 && i < (totalptr+3)/4*4 {
+ // heapBitsSetTypeGCProg always fills
+ // in full nibbles of bitScan.
+ want = bitScan
+ }
+ } else {
+ if j < nptr && (*addb(ptrmask, j/8)>>(j%8))&1 != 0 {
+ want |= bitPointer
+ }
+ want |= bitScan
+ }
+ if have != want {
+ println("mismatch writing bits for", typ.string(), "x", dataSize/typ.size)
+ print("typ.size=", typ.size, " typ.ptrdata=", typ.ptrdata, " dataSize=", dataSize, " size=", size, "\n")
+ print("kindGCProg=", typ.kind&kindGCProg != 0, " outOfPlace=", outOfPlace, "\n")
+ print("w=", w, " nw=", nw, " b=", hex(b), " nb=", nb, " hb=", hex(hb), "\n")
+ h0 := heapBitsForAddr(x)
+ print("initial bits h0.bitp=", h0.bitp, " h0.shift=", h0.shift, "\n")
+ print("current bits h.bitp=", h.bitp, " h.shift=", h.shift, " *h.bitp=", hex(*h.bitp), "\n")
+ print("ptrmask=", ptrmask, " p=", p, " endp=", endp, " endnb=", endnb, " pbits=", hex(pbits), " b=", hex(b), " nb=", nb, "\n")
+ println("at word", i, "offset", i*sys.PtrSize, "have", hex(have), "want", hex(want))
+ if typ.kind&kindGCProg != 0 {
+ println("GC program:")
+ dumpGCProg(addb(typ.gcdata, 4))
+ }
+ throw("bad heapBitsSetType")
+ }
+ h = h.next()
+ }
+ if ptrmask == debugPtrmask.data {
+ unlock(&debugPtrmask.lock)
+ }
+ }
+}
+
+var debugPtrmask struct {
+ lock mutex
+ data *byte
+}
+
+// heapBitsSetTypeGCProg implements heapBitsSetType using a GC program.
+// progSize is the size of the memory described by the program.
+// elemSize is the size of the element that the GC program describes (a prefix of).
+// dataSize is the total size of the intended data, a multiple of elemSize.
+// allocSize is the total size of the allocated memory.
+//
+// GC programs are only used for large allocations.
+// heapBitsSetType requires that allocSize is a multiple of 4 words,
+// so that the relevant bitmap bytes are not shared with surrounding
+// objects.
+func heapBitsSetTypeGCProg(h heapBits, progSize, elemSize, dataSize, allocSize uintptr, prog *byte) {
+ if sys.PtrSize == 8 && allocSize%(4*sys.PtrSize) != 0 {
+ // Alignment will be wrong.
+ throw("heapBitsSetTypeGCProg: small allocation")
+ }
+ var totalBits uintptr
+ if elemSize == dataSize {
+ totalBits = runGCProg(prog, nil, h.bitp, 2)
+ if totalBits*sys.PtrSize != progSize {
+ println("runtime: heapBitsSetTypeGCProg: total bits", totalBits, "but progSize", progSize)
+ throw("heapBitsSetTypeGCProg: unexpected bit count")
+ }
+ } else {
+ count := dataSize / elemSize
+
+ // Piece together program trailer to run after prog that does:
+ // literal(0)
+ // repeat(1, elemSize-progSize-1) // zeros to fill element size
+ // repeat(elemSize, count-1) // repeat that element for count
+ // This zero-pads the data remaining in the first element and then
+ // repeats that first element to fill the array.
+ var trailer [40]byte // 3 varints (max 10 each) + some bytes
+ i := 0
+ if n := elemSize/sys.PtrSize - progSize/sys.PtrSize; n > 0 {
+ // literal(0)
+ trailer[i] = 0x01
+ i++
+ trailer[i] = 0
+ i++
+ if n > 1 {
+ // repeat(1, n-1)
+ trailer[i] = 0x81
+ i++
+ n--
+ for ; n >= 0x80; n >>= 7 {
+ trailer[i] = byte(n | 0x80)
+ i++
+ }
+ trailer[i] = byte(n)
+ i++
+ }
+ }
+ // repeat(elemSize/ptrSize, count-1)
+ trailer[i] = 0x80
+ i++
+ n := elemSize / sys.PtrSize
+ for ; n >= 0x80; n >>= 7 {
+ trailer[i] = byte(n | 0x80)
+ i++
+ }
+ trailer[i] = byte(n)
+ i++
+ n = count - 1
+ for ; n >= 0x80; n >>= 7 {
+ trailer[i] = byte(n | 0x80)
+ i++
+ }
+ trailer[i] = byte(n)
+ i++
+ trailer[i] = 0
+ i++
+
+ runGCProg(prog, &trailer[0], h.bitp, 2)
+
+ // Even though we filled in the full array just now,
+ // record that we only filled in up to the ptrdata of the
+ // last element. This will cause the code below to
+ // memclr the dead section of the final array element,
+ // so that scanobject can stop early in the final element.
+ totalBits = (elemSize*(count-1) + progSize) / sys.PtrSize
+ }
+ endProg := unsafe.Pointer(addb(h.bitp, (totalBits+3)/4))
+ endAlloc := unsafe.Pointer(addb(h.bitp, allocSize/sys.PtrSize/wordsPerBitmapByte))
+ memclrNoHeapPointers(endProg, uintptr(endAlloc)-uintptr(endProg))
+}
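+
+// A minimal illustrative sketch (exampleAppendVarint is a hypothetical name):
+// the trailer assembled above and the repeat operands read by runGCProg use
+// the same varint encoding, seven bits per byte, least-significant group
+// first, with the high bit set on every byte except the last.
+func exampleAppendVarint(dst []byte, n uintptr) []byte {
+ for ; n >= 0x80; n >>= 7 {
+ dst = append(dst, byte(n|0x80))
+ }
+ return append(dst, byte(n))
+}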
+
+// progToPointerMask returns the 1-bit pointer mask output by the GC program prog.
+// size is the size of the region described by prog, in bytes.
+// The resulting bitvector will have no more than size/sys.PtrSize bits.
+func progToPointerMask(prog *byte, size uintptr) bitvector {
+ n := (size/sys.PtrSize + 7) / 8
+ x := (*[1 << 30]byte)(persistentalloc(n+1, 1, &memstats.buckhash_sys))[:n+1]
+ x[len(x)-1] = 0xa1 // overflow check sentinel
+ n = runGCProg(prog, nil, &x[0], 1)
+ if x[len(x)-1] != 0xa1 {
+ throw("progToPointerMask: overflow")
+ }
+ return bitvector{int32(n), &x[0]}
+}
+
+// Packed GC pointer bitmaps, aka GC programs.
+//
+// For large types containing arrays, the type information has a
+// natural repetition that can be encoded to save space in the
+// binary and in the memory representation of the type information.
+//
+// The encoding is a simple Lempel-Ziv style bytecode machine
+// with the following instructions:
+//
+// 00000000: stop
+// 0nnnnnnn: emit n bits copied from the next (n+7)/8 bytes
+// 10000000 n c: repeat the previous n bits c times; n, c are varints
+// 1nnnnnnn c: repeat the previous n bits c times; c is a varint
+
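+// A minimal illustrative sketch (exampleGCProg is a hypothetical variable):
+// a hand-assembled program in the encoding above for an element laid out as
+// four repetitions of [pointer, scalar]. It emits the two literal bits 01,
+// repeats the previous two bits three more times, and stops; run to a 1-bit
+// mask it yields the byte 0b01010101.
+var exampleGCProg = [...]byte{
+ 0x02, 0x01, // emit 2 bits from the next byte: word 0 pointer, word 1 scalar
+ 0x82, 0x03, // repeat the previous 2 bits 3 times (count is a varint)
+ 0x00, // stop
+}
+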
+// runGCProg executes the GC program prog, and then trailer if non-nil,
+// writing to dst with entries of the given size.
+// If size == 1, dst is a 1-bit pointer mask laid out moving forward from dst.
+// If size == 2, dst is the 2-bit heap bitmap, and writes also move forward
+// from dst. In this case, the caller guarantees
+// that only whole bytes in dst need to be written.
+//
+// runGCProg returns the number of 1- or 2-bit entries written to memory.
+func runGCProg(prog, trailer, dst *byte, size int) uintptr {
+ dstStart := dst
+
+ // Bits waiting to be written to memory.
+ var bits uintptr
+ var nbits uintptr
+
+ p := prog
+Run:
+ for {
+ // Flush accumulated full bytes.
+ // The rest of the loop assumes that nbits <= 7.
+ for ; nbits >= 8; nbits -= 8 {
+ if size == 1 {
+ *dst = uint8(bits)
+ dst = add1(dst)
+ bits >>= 8
+ } else {
+ v := bits&bitPointerAll | bitScanAll
+ *dst = uint8(v)
+ dst = add1(dst)
+ bits >>= 4
+ v = bits&bitPointerAll | bitScanAll
+ *dst = uint8(v)
+ dst = add1(dst)
+ bits >>= 4
+ }
+ }
+
+ // Process one instruction.
+ inst := uintptr(*p)
+ p = add1(p)
+ n := inst & 0x7F
+ if inst&0x80 == 0 {
+ // Literal bits; n == 0 means end of program.
+ if n == 0 {
+ // Program is over; continue in trailer if present.
+ if trailer != nil {
+ p = trailer
+ trailer = nil
+ continue
+ }
+ break Run
+ }
+ nbyte := n / 8
+ for i := uintptr(0); i < nbyte; i++ {
+ bits |= uintptr(*p) << nbits
+ p = add1(p)
+ if size == 1 {
+ *dst = uint8(bits)
+ dst = add1(dst)
+ bits >>= 8
+ } else {
+ v := bits&0xf | bitScanAll
+ *dst = uint8(v)
+ dst = add1(dst)
+ bits >>= 4
+ v = bits&0xf | bitScanAll
+ *dst = uint8(v)
+ dst = add1(dst)
+ bits >>= 4
+ }
+ }
+ if n %= 8; n > 0 {
+ bits |= uintptr(*p) << nbits
+ p = add1(p)
+ nbits += n
+ }
+ continue Run
+ }
+
+ // Repeat. If n == 0, it is encoded in a varint in the next bytes.
+ if n == 0 {
+ for off := uint(0); ; off += 7 {
+ x := uintptr(*p)
+ p = add1(p)
+ n |= (x & 0x7F) << off
+ if x&0x80 == 0 {
+ break
+ }
+ }
+ }
+
+ // Count is encoded in a varint in the next bytes.
+ c := uintptr(0)
+ for off := uint(0); ; off += 7 {
+ x := uintptr(*p)
+ p = add1(p)
+ c |= (x & 0x7F) << off
+ if x&0x80 == 0 {
+ break
+ }
+ }
+ c *= n // now total number of bits to copy
+
+ // If the number of bits being repeated is small, load them
+ // into a register and use that register for the entire loop
+ // instead of repeatedly reading from memory.
+ // Handling fewer than 8 bits here makes the general loop simpler.
+ // The cutoff is sys.PtrSize*8 - 7 to guarantee that when we add
+ // the pattern to a bit buffer holding at most 7 bits (a partial byte)
+ // it will not overflow.
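+ // For example (illustrative), on a 64-bit system maxBits is 57, so a
+ // 3-bit pattern that must be repeated many times is first replicated
+ // into 19 copies filling 57 bits, and each pass of the flush loop
+ // below then emits 57 bits at a time instead of 3.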
+ src := dst
+ const maxBits = sys.PtrSize*8 - 7
+ if n <= maxBits {
+ // Start with bits in output buffer.
+ pattern := bits
+ npattern := nbits
+
+ // If we need more bits, fetch them from memory.
+ if size == 1 {
+ src = subtract1(src)
+ for npattern < n {
+ pattern <<= 8
+ pattern |= uintptr(*src)
+ src = subtract1(src)
+ npattern += 8
+ }
+ } else {
+ src = subtract1(src)
+ for npattern < n {
+ pattern <<= 4
+ pattern |= uintptr(*src) & 0xf
+ src = subtract1(src)
+ npattern += 4
+ }
+ }
+
+ // We started with the whole bit output buffer,
+ // and then we loaded bits from whole bytes.
+ // Either way, we might now have too many instead of too few.
+ // Discard the extra.
+ if npattern > n {
+ pattern >>= npattern - n
+ npattern = n
+ }
+
+ // Replicate pattern to at most maxBits.
+ if npattern == 1 {
+ // One bit being repeated.
+ // If the bit is 1, make the pattern all 1s.
+ // If the bit is 0, the pattern is already all 0s,
+ // but we can claim that the number of bits
+ // in the word is equal to the number we need (c),
+ // because right shift of bits will zero fill.
+ if pattern == 1 {
+ pattern = 1<<maxBits - 1
+ npattern = maxBits
+ } else {
+ npattern = c
+ }
+ } else {
+ b := pattern
+ nb := npattern
+ if nb+nb <= maxBits {
+ // Double pattern until the whole uintptr is filled.
+ for nb <= sys.PtrSize*8 {
+ b |= b << nb
+ nb += nb
+ }
+ // Trim away incomplete copy of original pattern in high bits.
+ // TODO(rsc): Replace with table lookup or loop on systems without divide?
+ nb = maxBits / npattern * npattern
+ b &= 1<<nb - 1
+ pattern = b
+ npattern = nb
+ }
+ }
+
+ // Add pattern to bit buffer and flush bit buffer, c/npattern times.
+ // Since pattern contains >8 bits, there will be full bytes to flush
+ // on each iteration.
+ for ; c >= npattern; c -= npattern {
+ bits |= pattern << nbits
+ nbits += npattern
+ if size == 1 {
+ for nbits >= 8 {
+ *dst = uint8(bits)
+ dst = add1(dst)
+ bits >>= 8
+ nbits -= 8
+ }
+ } else {
+ for nbits >= 4 {
+ *dst = uint8(bits&0xf | bitScanAll)
+ dst = add1(dst)
+ bits >>= 4
+ nbits -= 4
+ }
+ }
+ }
+
+ // Add final fragment to bit buffer.
+ if c > 0 {
+ pattern &= 1<<c - 1
+ bits |= pattern << nbits
+ nbits += c
+ }
+ continue Run
+ }
+
+ // Repeat; n too large to fit in a register.
+ // Since nbits <= 7, we know the first few bytes of repeated data
+ // are already written to memory.
+ off := n - nbits // n > nbits because n > maxBits and nbits <= 7
+ if size == 1 {
+ // Leading src fragment.
+ src = subtractb(src, (off+7)/8)
+ if frag := off & 7; frag != 0 {
+ bits |= uintptr(*src) >> (8 - frag) << nbits
+ src = add1(src)
+ nbits += frag
+ c -= frag
+ }
+ // Main loop: load one byte, write another.
+ // The bits are rotating through the bit buffer.
+ for i := c / 8; i > 0; i-- {
+ bits |= uintptr(*src) << nbits
+ src = add1(src)
+ *dst = uint8(bits)
+ dst = add1(dst)
+ bits >>= 8
+ }
+ // Final src fragment.
+ if c %= 8; c > 0 {
+ bits |= (uintptr(*src) & (1<<c - 1)) << nbits
+ nbits += c
+ }
+ } else {
+ // Leading src fragment.
+ src = subtractb(src, (off+3)/4)
+ if frag := off & 3; frag != 0 {
+ bits |= (uintptr(*src) & 0xf) >> (4 - frag) << nbits
+ src = add1(src)
+ nbits += frag
+ c -= frag
+ }
+ // Main loop: load one byte, write another.
+ // The bits are rotating through the bit buffer.
+ for i := c / 4; i > 0; i-- {
+ bits |= (uintptr(*src) & 0xf) << nbits
+ src = add1(src)
+ *dst = uint8(bits&0xf | bitScanAll)
+ dst = add1(dst)
+ bits >>= 4
+ }
+ // Final src fragment.
+ if c %= 4; c > 0 {
+ bits |= (uintptr(*src) & (1<<c - 1)) << nbits
+ nbits += c
+ }
+ }
+ }
+
+ // Write any final bits out, using full-byte writes, even for the final byte.
+ var totalBits uintptr
+ if size == 1 {
+ totalBits = (uintptr(unsafe.Pointer(dst))-uintptr(unsafe.Pointer(dstStart)))*8 + nbits
+ nbits += -nbits & 7
+ for ; nbits > 0; nbits -= 8 {
+ *dst = uint8(bits)
+ dst = add1(dst)
+ bits >>= 8
+ }
+ } else {
+ totalBits = (uintptr(unsafe.Pointer(dst))-uintptr(unsafe.Pointer(dstStart)))*4 + nbits
+ nbits += -nbits & 3
+ for ; nbits > 0; nbits -= 4 {
+ v := bits&0xf | bitScanAll
+ *dst = uint8(v)
+ dst = add1(dst)
+ bits >>= 4
+ }
+ }
+ return totalBits
+}
+
+// materializeGCProg allocates space for the (1-bit) pointer bitmask
+// for an object of size ptrdata. Then it fills that space with the
+// pointer bitmask specified by the program prog.
+// The bitmask starts at s.startAddr.
+// The result must be deallocated with dematerializeGCProg.
+func materializeGCProg(ptrdata uintptr, prog *byte) *mspan {
+ // Each word of ptrdata needs one bit in the bitmap.
+ bitmapBytes := divRoundUp(ptrdata, 8*sys.PtrSize)
+ // Compute the number of pages needed for bitmapBytes.
+ pages := divRoundUp(bitmapBytes, pageSize)
+ s := mheap_.allocManual(pages, spanAllocPtrScalarBits)
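+ // The encoded program is prefixed by its length as a 4-byte value;
+ // addb(prog, 4) skips that prefix so only the program bytes run.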
+ runGCProg(addb(prog, 4), nil, (*byte)(unsafe.Pointer(s.startAddr)), 1)
+ return s
+}
+
+func dematerializeGCProg(s *mspan) {
+ mheap_.freeManual(s, spanAllocPtrScalarBits)
+}
+
+func dumpGCProg(p *byte) {
+ nptr := 0
+ for {
+ x := *p
+ p = add1(p)
+ if x == 0 {
+ print("\t", nptr, " end\n")
+ break
+ }
+ if x&0x80 == 0 {
+ print("\t", nptr, " lit ", x, ":")
+ n := int(x+7) / 8
+ for i := 0; i < n; i++ {
+ print(" ", hex(*p))
+ p = add1(p)
+ }
+ print("\n")
+ nptr += int(x)
+ } else {
+ nbit := int(x &^ 0x80)
+ if nbit == 0 {
+ for nb := uint(0); ; nb += 7 {
+ x := *p
+ p = add1(p)
+ nbit |= int(x&0x7f) << nb
+ if x&0x80 == 0 {
+ break
+ }
+ }
+ }
+ count := 0
+ for nb := uint(0); ; nb += 7 {
+ x := *p
+ p = add1(p)
+ count |= int(x&0x7f) << nb
+ if x&0x80 == 0 {
+ break
+ }
+ }
+ print("\t", nptr, " repeat ", nbit, " × ", count, "\n")
+ nptr += nbit * count
+ }
+ }
+}
+
+// Testing.
+
+func getgcmaskcb(frame *stkframe, ctxt unsafe.Pointer) bool {
+ target := (*stkframe)(ctxt)
+ if frame.sp <= target.sp && target.sp < frame.varp {
+ *target = *frame
+ return false
+ }
+ return true
+}
+
+// gcbits returns the GC type info for x, for testing.
+// The result is the bitmap entries (0 or 1), one entry per byte.
+//go:linkname reflect_gcbits reflect.gcbits
+func reflect_gcbits(x interface{}) []byte {
+ ret := getgcmask(x)
+ typ := (*ptrtype)(unsafe.Pointer(efaceOf(&x)._type)).elem
+ nptr := typ.ptrdata / sys.PtrSize
+ for uintptr(len(ret)) > nptr && ret[len(ret)-1] == 0 {
+ ret = ret[:len(ret)-1]
+ }
+ return ret
+}
+
+// Returns GC type info for the pointer stored in ep for testing.
+// If ep points to the stack, only static live information will be returned
+// (i.e. not for objects which are only dynamically live stack objects).
+func getgcmask(ep interface{}) (mask []byte) {
+ e := *efaceOf(&ep)
+ p := e.data
+ t := e._type
+ // data or bss
+ for _, datap := range activeModules() {
+ // data
+ if datap.data <= uintptr(p) && uintptr(p) < datap.edata {
+ bitmap := datap.gcdatamask.bytedata
+ n := (*ptrtype)(unsafe.Pointer(t)).elem.size
+ mask = make([]byte, n/sys.PtrSize)
+ for i := uintptr(0); i < n; i += sys.PtrSize {
+ off := (uintptr(p) + i - datap.data) / sys.PtrSize
+ mask[i/sys.PtrSize] = (*addb(bitmap, off/8) >> (off % 8)) & 1
+ }
+ return
+ }
+
+ // bss
+ if datap.bss <= uintptr(p) && uintptr(p) < datap.ebss {
+ bitmap := datap.gcbssmask.bytedata
+ n := (*ptrtype)(unsafe.Pointer(t)).elem.size
+ mask = make([]byte, n/sys.PtrSize)
+ for i := uintptr(0); i < n; i += sys.PtrSize {
+ off := (uintptr(p) + i - datap.bss) / sys.PtrSize
+ mask[i/sys.PtrSize] = (*addb(bitmap, off/8) >> (off % 8)) & 1
+ }
+ return
+ }
+ }
+
+ // heap
+ if base, s, _ := findObject(uintptr(p), 0, 0); base != 0 {
+ hbits := heapBitsForAddr(base)
+ n := s.elemsize
+ mask = make([]byte, n/sys.PtrSize)
+ for i := uintptr(0); i < n; i += sys.PtrSize {
+ if hbits.isPointer() {
+ mask[i/sys.PtrSize] = 1
+ }
+ if !hbits.morePointers() {
+ mask = mask[:i/sys.PtrSize]
+ break
+ }
+ hbits = hbits.next()
+ }
+ return
+ }
+
+ // stack
+ if _g_ := getg(); _g_.m.curg.stack.lo <= uintptr(p) && uintptr(p) < _g_.m.curg.stack.hi {
+ var frame stkframe
+ frame.sp = uintptr(p)
+ _g_ := getg()
+ gentraceback(_g_.m.curg.sched.pc, _g_.m.curg.sched.sp, 0, _g_.m.curg, 0, nil, 1000, getgcmaskcb, noescape(unsafe.Pointer(&frame)), 0)
+ if frame.fn.valid() {
+ locals, _, _ := getStackMap(&frame, nil, false)
+ if locals.n == 0 {
+ return
+ }
+ size := uintptr(locals.n) * sys.PtrSize
+ n := (*ptrtype)(unsafe.Pointer(t)).elem.size
+ mask = make([]byte, n/sys.PtrSize)
+ for i := uintptr(0); i < n; i += sys.PtrSize {
+ off := (uintptr(p) + i - frame.varp + size) / sys.PtrSize
+ mask[i/sys.PtrSize] = locals.ptrbit(off)
+ }
+ }
+ return
+ }
+
+ // otherwise, not something the GC knows about.
+ // possibly read-only data, like malloc(0).
+ // must not have pointers
+ return
+}
diff --git a/src/runtime/mcache.go b/src/runtime/mcache.go
new file mode 100644
index 0000000..bb7475b
--- /dev/null
+++ b/src/runtime/mcache.go
@@ -0,0 +1,313 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// Per-thread (in Go, per-P) cache for small objects.
+// This includes a small object cache and local allocation stats.
+// No locking needed because it is per-thread (per-P).
+//
+// mcaches are allocated from non-GC'd memory, so any heap pointers
+// must be specially handled.
+//
+//go:notinheap
+type mcache struct {
+ // The following members are accessed on every malloc,
+ // so they are grouped here for better caching.
+ nextSample uintptr // trigger heap sample after allocating this many bytes
+ scanAlloc uintptr // bytes of scannable heap allocated
+
+ // Allocator cache for tiny objects w/o pointers.
+ // See "Tiny allocator" comment in malloc.go.
+
+ // tiny points to the beginning of the current tiny block, or
+ // nil if there is no current tiny block.
+ //
+ // tiny is a heap pointer. Since mcache is in non-GC'd memory,
+ // we handle it by clearing it in releaseAll during mark
+ // termination.
+ //
+ // tinyAllocs is the number of tiny allocations performed
+ // by the P that owns this mcache.
+ tiny uintptr
+ tinyoffset uintptr
+ tinyAllocs uintptr
+
+ // The rest is not accessed on every malloc.
+
+ alloc [numSpanClasses]*mspan // spans to allocate from, indexed by spanClass
+
+ stackcache [_NumStackOrders]stackfreelist
+
+ // flushGen indicates the sweepgen during which this mcache
+ // was last flushed. If flushGen != mheap_.sweepgen, the spans
+ // in this mcache are stale and need to be flushed so they
+ // can be swept. This is done in acquirep.
+ flushGen uint32
+}
+
+// A gclink is a node in a linked list of blocks, like mlink,
+// but it is opaque to the garbage collector.
+// The GC does not trace the pointers during collection,
+// and the compiler does not emit write barriers for assignments
+// of gclinkptr values. Code should store references to gclinks
+// as gclinkptr, not as *gclink.
+type gclink struct {
+ next gclinkptr
+}
+
+// A gclinkptr is a pointer to a gclink, but it is opaque
+// to the garbage collector.
+type gclinkptr uintptr
+
+// ptr returns the *gclink form of p.
+// The result should be used for accessing fields, not stored
+// in other data structures.
+func (p gclinkptr) ptr() *gclink {
+ return (*gclink)(unsafe.Pointer(p))
+}
+
+type stackfreelist struct {
+ list gclinkptr // linked list of free stacks
+ size uintptr // total size of stacks in list
+}
+
+// dummy mspan that contains no free objects.
+var emptymspan mspan
+
+func allocmcache() *mcache {
+ var c *mcache
+ systemstack(func() {
+ lock(&mheap_.lock)
+ c = (*mcache)(mheap_.cachealloc.alloc())
+ c.flushGen = mheap_.sweepgen
+ unlock(&mheap_.lock)
+ })
+ for i := range c.alloc {
+ c.alloc[i] = &emptymspan
+ }
+ c.nextSample = nextSample()
+ return c
+}
+
+// freemcache releases resources associated with this
+// mcache and puts the object onto a free list.
+//
+// In some cases there is no way to simply release
+// resources, such as statistics, so those are flushed
+// back to the global state (by releaseAll) instead.
+func freemcache(c *mcache) {
+ systemstack(func() {
+ c.releaseAll()
+ stackcache_clear(c)
+
+ // NOTE(rsc,rlh): If gcworkbuffree comes back, we need to coordinate
+ // with the stealing of gcworkbufs during garbage collection to avoid
+ // a race where the workbuf is double-freed.
+ // gcworkbuffree(c.gcworkbuf)
+
+ lock(&mheap_.lock)
+ mheap_.cachealloc.free(unsafe.Pointer(c))
+ unlock(&mheap_.lock)
+ })
+}
+
+// getMCache is a convenience function which tries to obtain an mcache.
+//
+// Returns nil if we don't have a P and we're not bootstrapping. The caller's
+// P must not change, so we must be in a non-preemptible state.
+func getMCache() *mcache {
+ // Grab the mcache, since that's where stats live.
+ pp := getg().m.p.ptr()
+ var c *mcache
+ if pp == nil {
+ // We will be called without a P while bootstrapping,
+ // in which case we use mcache0, which is set in mallocinit.
+ // mcache0 is cleared when bootstrapping is complete,
+ // by procresize.
+ c = mcache0
+ } else {
+ c = pp.mcache
+ }
+ return c
+}
+
+// refill acquires a new span of span class spc for c. This span will
+// have at least one free object. The current span in c must be full.
+//
+// Must run in a non-preemptible context since otherwise the owner of
+// c could change.
+func (c *mcache) refill(spc spanClass) {
+ // Return the current cached span to the central lists.
+ s := c.alloc[spc]
+
+ if uintptr(s.allocCount) != s.nelems {
+ throw("refill of span with free space remaining")
+ }
+ if s != &emptymspan {
+ // Mark this span as no longer cached.
+ if s.sweepgen != mheap_.sweepgen+3 {
+ throw("bad sweepgen in refill")
+ }
+ mheap_.central[spc].mcentral.uncacheSpan(s)
+ }
+
+ // Get a new cached span from the central lists.
+ s = mheap_.central[spc].mcentral.cacheSpan()
+ if s == nil {
+ throw("out of memory")
+ }
+
+ if uintptr(s.allocCount) == s.nelems {
+ throw("span has no free space")
+ }
+
+ // Indicate that this span is cached and prevent asynchronous
+ // sweeping in the next sweep phase.
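+ // (In the sweepgen encoding, sweepgen+3 marks a span that was swept
+ // and then cached and is still cached; sweepgen+1 would mean it was
+ // cached before sweeping began, which uncacheSpan treats as stale.)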
+ s.sweepgen = mheap_.sweepgen + 3
+
+ // Assume all objects from this span will be allocated in the
+ // mcache. If it gets uncached, we'll adjust this.
+ stats := memstats.heapStats.acquire()
+ atomic.Xadduintptr(&stats.smallAllocCount[spc.sizeclass()], uintptr(s.nelems)-uintptr(s.allocCount))
+ memstats.heapStats.release()
+
+ // Update heap_live with the same assumption.
+ usedBytes := uintptr(s.allocCount) * s.elemsize
+ atomic.Xadd64(&memstats.heap_live, int64(s.npages*pageSize)-int64(usedBytes))
+
+ // Flush tinyAllocs.
+ if spc == tinySpanClass {
+ atomic.Xadd64(&memstats.tinyallocs, int64(c.tinyAllocs))
+ c.tinyAllocs = 0
+ }
+
+ // While we're here, flush scanAlloc, since we have to call
+ // revise anyway.
+ atomic.Xadd64(&memstats.heap_scan, int64(c.scanAlloc))
+ c.scanAlloc = 0
+
+ if trace.enabled {
+ // heap_live changed.
+ traceHeapAlloc()
+ }
+ if gcBlackenEnabled != 0 {
+ // heap_live and heap_scan changed.
+ gcController.revise()
+ }
+
+ c.alloc[spc] = s
+}
+
+// allocLarge allocates a span for a large object.
+func (c *mcache) allocLarge(size uintptr, needzero bool, noscan bool) *mspan {
+ if size+_PageSize < size {
+ throw("out of memory")
+ }
+ npages := size >> _PageShift
+ if size&_PageMask != 0 {
+ npages++
+ }
+
+ // Deduct credit for this span allocation and sweep if
+ // necessary. mHeap_Alloc will also sweep npages, so this only
+ // pays the debt down to npages pages.
+ deductSweepCredit(npages*_PageSize, npages)
+
+ spc := makeSpanClass(0, noscan)
+ s := mheap_.alloc(npages, spc, needzero)
+ if s == nil {
+ throw("out of memory")
+ }
+ stats := memstats.heapStats.acquire()
+ atomic.Xadduintptr(&stats.largeAlloc, npages*pageSize)
+ atomic.Xadduintptr(&stats.largeAllocCount, 1)
+ memstats.heapStats.release()
+
+ // Update heap_live and revise pacing if needed.
+ atomic.Xadd64(&memstats.heap_live, int64(npages*pageSize))
+ if trace.enabled {
+ // Trace that a heap alloc occurred because heap_live changed.
+ traceHeapAlloc()
+ }
+ if gcBlackenEnabled != 0 {
+ gcController.revise()
+ }
+
+ // Put the large span in the mcentral swept list so that it's
+ // visible to the background sweeper.
+ mheap_.central[spc].mcentral.fullSwept(mheap_.sweepgen).push(s)
+ s.limit = s.base() + size
+ heapBitsForAddr(s.base()).initSpan(s)
+ return s
+}
+
+func (c *mcache) releaseAll() {
+ // Take this opportunity to flush scanAlloc.
+ atomic.Xadd64(&memstats.heap_scan, int64(c.scanAlloc))
+ c.scanAlloc = 0
+
+ sg := mheap_.sweepgen
+ for i := range c.alloc {
+ s := c.alloc[i]
+ if s != &emptymspan {
+ // Adjust smallAllocCount in case the span wasn't fully allocated.
+ n := uintptr(s.nelems) - uintptr(s.allocCount)
+ stats := memstats.heapStats.acquire()
+ atomic.Xadduintptr(&stats.smallAllocCount[spanClass(i).sizeclass()], -n)
+ memstats.heapStats.release()
+ if s.sweepgen != sg+1 {
+ // refill conservatively counted unallocated slots in heap_live.
+ // Undo this.
+ //
+ // If this span was cached before sweep, then
+ // heap_live was totally recomputed since
+ // caching this span, so we don't do this for
+ // stale spans.
+ atomic.Xadd64(&memstats.heap_live, -int64(n)*int64(s.elemsize))
+ }
+ // Release the span to the mcentral.
+ mheap_.central[i].mcentral.uncacheSpan(s)
+ c.alloc[i] = &emptymspan
+ }
+ }
+ // Clear tinyalloc pool.
+ c.tiny = 0
+ c.tinyoffset = 0
+ atomic.Xadd64(&memstats.tinyallocs, int64(c.tinyAllocs))
+ c.tinyAllocs = 0
+
+ // Update heap_scan and possibly heap_live.
+ if gcBlackenEnabled != 0 {
+ gcController.revise()
+ }
+}
+
+// prepareForSweep flushes c if the system has entered a new sweep phase
+// since c was populated. This must happen between the sweep phase
+// starting and the first allocation from c.
+func (c *mcache) prepareForSweep() {
+ // Alternatively, instead of making sure we do this on every P
+ // between starting the world and allocating on that P, we
+ // could leave allocate-black on, allow allocation to continue
+ // as usual, use a ragged barrier at the beginning of sweep to
+ // ensure all cached spans are swept, and then disable
+ // allocate-black. However, with this approach it's difficult
+ // to avoid spilling mark bits into the *next* GC cycle.
+ sg := mheap_.sweepgen
+ if c.flushGen == sg {
+ return
+ } else if c.flushGen != sg-2 {
+ println("bad flushGen", c.flushGen, "in prepareForSweep; sweepgen", sg)
+ throw("bad flushGen")
+ }
+ c.releaseAll()
+ stackcache_clear(c)
+ atomic.Store(&c.flushGen, mheap_.sweepgen) // Synchronizes with gcStart
+}
diff --git a/src/runtime/mcentral.go b/src/runtime/mcentral.go
new file mode 100644
index 0000000..cd20dec
--- /dev/null
+++ b/src/runtime/mcentral.go
@@ -0,0 +1,243 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Central free lists.
+//
+// See malloc.go for an overview.
+//
+// The mcentral doesn't actually contain the list of free objects; the mspan does.
+// Each mcentral holds two sets of mspans: those with free objects (c.partial)
+// and those that are completely allocated (c.full).
+
+package runtime
+
+import "runtime/internal/atomic"
+
+// Central list of free objects of a given size.
+//
+//go:notinheap
+type mcentral struct {
+ spanclass spanClass
+
+ // partial and full contain two mspan sets: one of swept in-use
+ // spans, and one of unswept in-use spans. These two trade
+ // roles on each GC cycle. The unswept set is drained either by
+ // allocation or by the background sweeper in every GC cycle,
+ // so only two roles are necessary.
+ //
+ // sweepgen is increased by 2 on each GC cycle, so the swept
+ // spans are in partial[sweepgen/2%2] and the unswept spans are in
+ // partial[1-sweepgen/2%2]. Sweeping pops spans from the
+ // unswept set and pushes spans that are still in-use on the
+ // swept set. Likewise, allocating an in-use span pushes it
+ // on the swept set.
+ //
+ // Some parts of the sweeper can sweep arbitrary spans, and hence
+ // can't remove them from the unswept set, but will add the span
+ // to the appropriate swept list. As a result, the parts of the
+ // sweeper and mcentral that do consume from the unswept list may
+ // encounter swept spans, and these should be ignored.
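+ //
+ // For example (illustrative): when sweepgen is 6, swept spans live in
+ // partial[(6/2)%2] == partial[1] and unswept spans in partial[0]; once
+ // sweepgen advances to 8, the two sets trade roles.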
+ partial [2]spanSet // list of spans with a free object
+ full [2]spanSet // list of spans with no free objects
+}
+
+// Initialize a single central free list.
+func (c *mcentral) init(spc spanClass) {
+ c.spanclass = spc
+ lockInit(&c.partial[0].spineLock, lockRankSpanSetSpine)
+ lockInit(&c.partial[1].spineLock, lockRankSpanSetSpine)
+ lockInit(&c.full[0].spineLock, lockRankSpanSetSpine)
+ lockInit(&c.full[1].spineLock, lockRankSpanSetSpine)
+}
+
+// partialUnswept returns the spanSet which holds partially-filled
+// unswept spans for this sweepgen.
+func (c *mcentral) partialUnswept(sweepgen uint32) *spanSet {
+ return &c.partial[1-sweepgen/2%2]
+}
+
+// partialSwept returns the spanSet which holds partially-filled
+// swept spans for this sweepgen.
+func (c *mcentral) partialSwept(sweepgen uint32) *spanSet {
+ return &c.partial[sweepgen/2%2]
+}
+
+// fullUnswept returns the spanSet which holds unswept spans without any
+// free slots for this sweepgen.
+func (c *mcentral) fullUnswept(sweepgen uint32) *spanSet {
+ return &c.full[1-sweepgen/2%2]
+}
+
+// fullSwept returns the spanSet which holds swept spans without any
+// free slots for this sweepgen.
+func (c *mcentral) fullSwept(sweepgen uint32) *spanSet {
+ return &c.full[sweepgen/2%2]
+}
+
+// Allocate a span to use in an mcache.
+func (c *mcentral) cacheSpan() *mspan {
+ // Deduct credit for this span allocation and sweep if necessary.
+ spanBytes := uintptr(class_to_allocnpages[c.spanclass.sizeclass()]) * _PageSize
+ deductSweepCredit(spanBytes, 0)
+
+ sg := mheap_.sweepgen
+
+ traceDone := false
+ if trace.enabled {
+ traceGCSweepStart()
+ }
+
+ // If we sweep spanBudget spans without finding any free
+ // space, just allocate a fresh span. This limits the amount
+ // of time we can spend trying to find free space and
+ // amortizes the cost of small object sweeping over the
+ // benefit of having a full free span to allocate from. By
+ // setting this to 100, we limit the space overhead to 1%.
+ //
+ // TODO(austin,mknyszek): This still has bad worst-case
+ // throughput. For example, this could find just one free slot
+ // on the 100th swept span. That limits allocation latency, but
+ // still has very poor throughput. We could instead keep a
+ // running free-to-used budget and switch to fresh span
+ // allocation if the budget runs low.
+ spanBudget := 100
+
+ var s *mspan
+
+ // Try partial swept spans first.
+ if s = c.partialSwept(sg).pop(); s != nil {
+ goto havespan
+ }
+
+ // Now try partial unswept spans.
+ for ; spanBudget >= 0; spanBudget-- {
+ s = c.partialUnswept(sg).pop()
+ if s == nil {
+ break
+ }
+ if atomic.Load(&s.sweepgen) == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
+ // We got ownership of the span, so let's sweep it and use it.
+ s.sweep(true)
+ goto havespan
+ }
+ // We failed to get ownership of the span, which means it's being or
+ // has been swept by an asynchronous sweeper that just couldn't remove it
+ // from the unswept list. That sweeper took ownership of the span and
+ // responsibility for either freeing it to the heap or putting it on the
+ // right swept list. Either way, we should just ignore it (and it's unsafe
+ // for us to do anything else).
+ }
+ // Now try full unswept spans, sweeping them and putting them into the
+ // right list if we fail to get a span.
+ for ; spanBudget >= 0; spanBudget-- {
+ s = c.fullUnswept(sg).pop()
+ if s == nil {
+ break
+ }
+ if atomic.Load(&s.sweepgen) == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
+ // We got ownership of the span, so let's sweep it.
+ s.sweep(true)
+ // Check if there's any free space.
+ freeIndex := s.nextFreeIndex()
+ if freeIndex != s.nelems {
+ s.freeindex = freeIndex
+ goto havespan
+ }
+ // Add it to the swept list, because sweeping didn't give us any free space.
+ c.fullSwept(sg).push(s)
+ }
+ // See comment for partial unswept spans.
+ }
+ if trace.enabled {
+ traceGCSweepDone()
+ traceDone = true
+ }
+
+ // We failed to get a span from the mcentral so get one from mheap.
+ s = c.grow()
+ if s == nil {
+ return nil
+ }
+
+ // At this point s is a span that should have free slots.
+havespan:
+ if trace.enabled && !traceDone {
+ traceGCSweepDone()
+ }
+ n := int(s.nelems) - int(s.allocCount)
+ if n == 0 || s.freeindex == s.nelems || uintptr(s.allocCount) == s.nelems {
+ throw("span has no free objects")
+ }
+ freeByteBase := s.freeindex &^ (64 - 1)
+ whichByte := freeByteBase / 8
+ // Init alloc bits cache.
+ s.refillAllocCache(whichByte)
+
+ // Adjust the allocCache so that s.freeindex corresponds to the low bit in
+ // s.allocCache.
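+ // For example (illustrative), if s.freeindex is 70, the cache is
+ // refilled starting at object 64 and then shifted right by 6, so bit 0
+ // of s.allocCache corresponds to object 70.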
+ s.allocCache >>= s.freeindex % 64
+
+ return s
+}
+
+// Return span from an mcache.
+//
+// s must have a span class corresponding to this
+// mcentral and it must not be empty.
+func (c *mcentral) uncacheSpan(s *mspan) {
+ if s.allocCount == 0 {
+ throw("uncaching span but s.allocCount == 0")
+ }
+
+ sg := mheap_.sweepgen
+ stale := s.sweepgen == sg+1
+
+ // Fix up sweepgen.
+ if stale {
+ // Span was cached before sweep began. It's our
+ // responsibility to sweep it.
+ //
+ // Set sweepgen to indicate it's not cached but needs
+ // sweeping and can't be allocated from. sweep will
+ // set s.sweepgen to indicate s is swept.
+ atomic.Store(&s.sweepgen, sg-1)
+ } else {
+ // Indicate that s is no longer cached.
+ atomic.Store(&s.sweepgen, sg)
+ }
+
+ // Put the span in the appropriate place.
+ if stale {
+ // It's stale, so just sweep it. Sweeping will put it on
+ // the right list.
+ s.sweep(false)
+ } else {
+ if int(s.nelems)-int(s.allocCount) > 0 {
+ // Put it back on the partial swept list.
+ c.partialSwept(sg).push(s)
+ } else {
+ // There's no free space and it's not stale, so put it on the
+ // full swept list.
+ c.fullSwept(sg).push(s)
+ }
+ }
+}
+
+// grow allocates a new empty span from the heap and initializes it for c's size class.
+func (c *mcentral) grow() *mspan {
+ npages := uintptr(class_to_allocnpages[c.spanclass.sizeclass()])
+ size := uintptr(class_to_size[c.spanclass.sizeclass()])
+
+ s := mheap_.alloc(npages, c.spanclass, true)
+ if s == nil {
+ return nil
+ }
+
+ // Use division by multiplication and shifts to quickly compute:
+ // n := (npages << _PageShift) / size
+ n := (npages << _PageShift) >> s.divShift * uintptr(s.divMul) >> s.divShift2
+ s.limit = s.base() + size*n
+ heapBitsForAddr(s.base()).initSpan(s)
+ return s
+}
diff --git a/src/runtime/mcheckmark.go b/src/runtime/mcheckmark.go
new file mode 100644
index 0000000..ba80ac1
--- /dev/null
+++ b/src/runtime/mcheckmark.go
@@ -0,0 +1,102 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// GC checkmarks
+//
+// In a concurrent garbage collector, one worries about failing to mark
+// a live object due to mutations without write barriers or bugs in the
+// collector implementation. As a sanity check, the GC has a 'checkmark'
+// mode that retraverses the object graph with the world stopped, to make
+// sure that everything that should be marked is marked.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// A checkmarksMap stores the GC marks in "checkmarks" mode. It is a
+// per-arena bitmap with a bit for every word in the arena. The mark
+// is stored on the bit corresponding to the first word of the marked
+// allocation.
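+//
+// For the word at address a, the mark lives in bit (a/sys.PtrSize)%8 of
+// byte (a/sys.PtrSize/8) modulo the map length (see setCheckmark).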
+//
+//go:notinheap
+type checkmarksMap [heapArenaBytes / sys.PtrSize / 8]uint8
+
+// If useCheckmark is true, marking of an object uses the checkmark
+// bits instead of the standard mark bits.
+var useCheckmark = false
+
+// startCheckmarks prepares for the checkmarks phase.
+//
+// The world must be stopped.
+func startCheckmarks() {
+ assertWorldStopped()
+
+ // Clear all checkmarks.
+ for _, ai := range mheap_.allArenas {
+ arena := mheap_.arenas[ai.l1()][ai.l2()]
+ bitmap := arena.checkmarks
+
+ if bitmap == nil {
+ // Allocate bitmap on first use.
+ bitmap = (*checkmarksMap)(persistentalloc(unsafe.Sizeof(*bitmap), 0, &memstats.gcMiscSys))
+ if bitmap == nil {
+ throw("out of memory allocating checkmarks bitmap")
+ }
+ arena.checkmarks = bitmap
+ } else {
+ // Otherwise clear the existing bitmap.
+ for i := range bitmap {
+ bitmap[i] = 0
+ }
+ }
+ }
+ // Enable checkmarking.
+ useCheckmark = true
+}
+
+// endCheckmarks ends the checkmarks phase.
+func endCheckmarks() {
+ if gcMarkWorkAvailable(nil) {
+ throw("GC work not flushed")
+ }
+ useCheckmark = false
+}
+
+// setCheckmark throws if marking object is a checkmarks violation,
+// and otherwise sets obj's checkmark. It returns true if obj was
+// already checkmarked.
+func setCheckmark(obj, base, off uintptr, mbits markBits) bool {
+ if !mbits.isMarked() {
+ printlock()
+ print("runtime: checkmarks found unexpected unmarked object obj=", hex(obj), "\n")
+ print("runtime: found obj at *(", hex(base), "+", hex(off), ")\n")
+
+ // Dump the source (base) object
+ gcDumpObject("base", base, off)
+
+ // Dump the object
+ gcDumpObject("obj", obj, ^uintptr(0))
+
+ getg().m.traceback = 2
+ throw("checkmark found unmarked object")
+ }
+
+ ai := arenaIndex(obj)
+ arena := mheap_.arenas[ai.l1()][ai.l2()]
+ arenaWord := (obj / sys.PtrSize / 8) % uintptr(len(arena.checkmarks))
+ mask := byte(1 << ((obj / sys.PtrSize) % 8))
+ bytep := &arena.checkmarks[arenaWord]
+
+ if atomic.Load8(bytep)&mask != 0 {
+ // Already checkmarked.
+ return true
+ }
+
+ atomic.Or8(bytep, mask)
+ return false
+}
diff --git a/src/runtime/mem_aix.go b/src/runtime/mem_aix.go
new file mode 100644
index 0000000..957aa4d
--- /dev/null
+++ b/src/runtime/mem_aix.go
@@ -0,0 +1,77 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+// Don't split the stack as this method may be invoked without a valid G, which
+// prevents us from allocating more stack.
+//go:nosplit
+func sysAlloc(n uintptr, sysStat *sysMemStat) unsafe.Pointer {
+ p, err := mmap(nil, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ if err == _EACCES {
+ print("runtime: mmap: access denied\n")
+ exit(2)
+ }
+ if err == _EAGAIN {
+ print("runtime: mmap: too much locked memory (check 'ulimit -l').\n")
+ exit(2)
+ }
+ return nil
+ }
+ sysStat.add(int64(n))
+ return p
+}
+
+func sysUnused(v unsafe.Pointer, n uintptr) {
+ madvise(v, n, _MADV_DONTNEED)
+}
+
+func sysUsed(v unsafe.Pointer, n uintptr) {
+}
+
+func sysHugePage(v unsafe.Pointer, n uintptr) {
+}
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//go:nosplit
+func sysFree(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(-int64(n))
+ munmap(v, n)
+}
+
+func sysFault(v unsafe.Pointer, n uintptr) {
+ mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE|_MAP_FIXED, -1, 0)
+}
+
+func sysReserve(v unsafe.Pointer, n uintptr) unsafe.Pointer {
+ p, err := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ return nil
+ }
+ return p
+}
+
+func sysMap(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(int64(n))
+
+ // AIX does not allow mapping a range that is already mapped.
+ // So, call mprotect to change permissions.
+ // Note that sysMap is always called with a non-nil pointer
+ // since it transitions a Reserved memory region to Prepared,
+ // so mprotect is always possible.
+ _, err := mprotect(v, n, _PROT_READ|_PROT_WRITE)
+ if err == _ENOMEM {
+ throw("runtime: out of memory")
+ }
+ if err != 0 {
+ throw("runtime: cannot map pages in arena address space")
+ }
+}
diff --git a/src/runtime/mem_bsd.go b/src/runtime/mem_bsd.go
new file mode 100644
index 0000000..bc67201
--- /dev/null
+++ b/src/runtime/mem_bsd.go
@@ -0,0 +1,78 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build dragonfly freebsd netbsd openbsd solaris
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//go:nosplit
+func sysAlloc(n uintptr, sysStat *sysMemStat) unsafe.Pointer {
+ v, err := mmap(nil, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ return nil
+ }
+ sysStat.add(int64(n))
+ return v
+}
+
+func sysUnused(v unsafe.Pointer, n uintptr) {
+ madvise(v, n, _MADV_FREE)
+}
+
+func sysUsed(v unsafe.Pointer, n uintptr) {
+}
+
+func sysHugePage(v unsafe.Pointer, n uintptr) {
+}
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//go:nosplit
+func sysFree(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(-int64(n))
+ munmap(v, n)
+}
+
+func sysFault(v unsafe.Pointer, n uintptr) {
+ mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE|_MAP_FIXED, -1, 0)
+}
+
+// Indicates not to reserve swap space for the mapping.
+const _sunosMAP_NORESERVE = 0x40
+
+func sysReserve(v unsafe.Pointer, n uintptr) unsafe.Pointer {
+ flags := int32(_MAP_ANON | _MAP_PRIVATE)
+ if GOOS == "solaris" || GOOS == "illumos" {
+ // Be explicit that we don't want to reserve swap space
+ // for PROT_NONE anonymous mappings. This avoids an issue
+ // wherein large mappings can cause fork to fail.
+ flags |= _sunosMAP_NORESERVE
+ }
+ p, err := mmap(v, n, _PROT_NONE, flags, -1, 0)
+ if err != 0 {
+ return nil
+ }
+ return p
+}
+
+const _sunosEAGAIN = 11
+const _ENOMEM = 12
+
+func sysMap(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(int64(n))
+
+ p, err := mmap(v, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_FIXED|_MAP_PRIVATE, -1, 0)
+ if err == _ENOMEM || ((GOOS == "solaris" || GOOS == "illumos") && err == _sunosEAGAIN) {
+ throw("runtime: out of memory")
+ }
+ if p != v || err != 0 {
+ throw("runtime: cannot map pages in arena address space")
+ }
+}
diff --git a/src/runtime/mem_darwin.go b/src/runtime/mem_darwin.go
new file mode 100644
index 0000000..7fccd2b
--- /dev/null
+++ b/src/runtime/mem_darwin.go
@@ -0,0 +1,71 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//go:nosplit
+func sysAlloc(n uintptr, sysStat *sysMemStat) unsafe.Pointer {
+ v, err := mmap(nil, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ return nil
+ }
+ sysStat.add(int64(n))
+ return v
+}
+
+func sysUnused(v unsafe.Pointer, n uintptr) {
+ // MADV_FREE_REUSABLE is like MADV_FREE except it also propagates
+ // accounting information about the process to task_info.
+ madvise(v, n, _MADV_FREE_REUSABLE)
+}
+
+func sysUsed(v unsafe.Pointer, n uintptr) {
+ // MADV_FREE_REUSE is necessary to keep the kernel's accounting
+ // accurate. If called on any memory region that hasn't been
+ // MADV_FREE_REUSABLE'd, it's a no-op.
+ madvise(v, n, _MADV_FREE_REUSE)
+}
+
+func sysHugePage(v unsafe.Pointer, n uintptr) {
+}
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//go:nosplit
+func sysFree(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(-int64(n))
+ munmap(v, n)
+}
+
+func sysFault(v unsafe.Pointer, n uintptr) {
+ mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE|_MAP_FIXED, -1, 0)
+}
+
+func sysReserve(v unsafe.Pointer, n uintptr) unsafe.Pointer {
+ p, err := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ return nil
+ }
+ return p
+}
+
+const _ENOMEM = 12
+
+func sysMap(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(int64(n))
+
+ p, err := mmap(v, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_FIXED|_MAP_PRIVATE, -1, 0)
+ if err == _ENOMEM {
+ throw("runtime: out of memory")
+ }
+ if p != v || err != 0 {
+ throw("runtime: cannot map pages in arena address space")
+ }
+}
diff --git a/src/runtime/mem_js.go b/src/runtime/mem_js.go
new file mode 100644
index 0000000..957ed36
--- /dev/null
+++ b/src/runtime/mem_js.go
@@ -0,0 +1,85 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build js,wasm
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//go:nosplit
+func sysAlloc(n uintptr, sysStat *sysMemStat) unsafe.Pointer {
+ p := sysReserve(nil, n)
+ sysMap(p, n, sysStat)
+ return p
+}
+
+func sysUnused(v unsafe.Pointer, n uintptr) {
+}
+
+func sysUsed(v unsafe.Pointer, n uintptr) {
+}
+
+func sysHugePage(v unsafe.Pointer, n uintptr) {
+}
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//go:nosplit
+func sysFree(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(-int64(n))
+}
+
+func sysFault(v unsafe.Pointer, n uintptr) {
+}
+
+var reserveEnd uintptr
+
+func sysReserve(v unsafe.Pointer, n uintptr) unsafe.Pointer {
+ // TODO(neelance): maybe unify with mem_plan9.go, depending on how https://github.com/WebAssembly/design/blob/master/FutureFeatures.md#finer-grained-control-over-memory turns out
+
+ if v != nil {
+ // The address space of WebAssembly's linear memory is contiguous,
+ // so requesting specific addresses is not supported. We could use
+ // a different address, but then mheap.sysAlloc discards the result
+ // right away and we don't reuse chunks passed to sysFree.
+ return nil
+ }
+
+ // Round up the initial reserveEnd to 64 KiB so that
+ // reservations are always aligned to the page size.
+ initReserveEnd := alignUp(lastmoduledatap.end, physPageSize)
+ if reserveEnd < initReserveEnd {
+ reserveEnd = initReserveEnd
+ }
+ v = unsafe.Pointer(reserveEnd)
+ reserveEnd += alignUp(n, physPageSize)
+
+ current := currentMemory()
+ // reserveEnd is always at a page boundary.
+ needed := int32(reserveEnd / physPageSize)
+ if current < needed {
+ if growMemory(needed-current) == -1 {
+ return nil
+ }
+ resetMemoryDataView()
+ }
+
+ return v
+}
+
+func currentMemory() int32
+func growMemory(pages int32) int32
+
+// resetMemoryDataView signals the JS front-end that WebAssembly's memory.grow instruction has been used.
+// This allows the front-end to replace the old DataView object with a new one.
+func resetMemoryDataView()
+
+func sysMap(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(int64(n))
+}
diff --git a/src/runtime/mem_linux.go b/src/runtime/mem_linux.go
new file mode 100644
index 0000000..3436851
--- /dev/null
+++ b/src/runtime/mem_linux.go
@@ -0,0 +1,174 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const (
+ _EACCES = 13
+ _EINVAL = 22
+)
+
+// Don't split the stack as this method may be invoked without a valid G, which
+// prevents us from allocating more stack.
+//go:nosplit
+func sysAlloc(n uintptr, sysStat *sysMemStat) unsafe.Pointer {
+ p, err := mmap(nil, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ if err == _EACCES {
+ print("runtime: mmap: access denied\n")
+ exit(2)
+ }
+ if err == _EAGAIN {
+ print("runtime: mmap: too much locked memory (check 'ulimit -l').\n")
+ exit(2)
+ }
+ return nil
+ }
+ sysStat.add(int64(n))
+ return p
+}
+
+var adviseUnused = uint32(_MADV_FREE)
+
+func sysUnused(v unsafe.Pointer, n uintptr) {
+ // By default, Linux's "transparent huge page" support will
+ // merge pages into a huge page if there's even a single
+ // present regular page, undoing the effects of madvise(adviseUnused)
+ // below. On amd64, that means khugepaged can turn a single
+ // 4KB page to 2MB, bloating the process's RSS by as much as
+ // 512X. (See issue #8832 and Linux kernel bug
+ // https://bugzilla.kernel.org/show_bug.cgi?id=93111)
+ //
+ // To work around this, we explicitly disable transparent huge
+ // pages when we release pages of the heap. However, we have
+ // to do this carefully because changing this flag tends to
+ // split the VMA (memory mapping) containing v in to three
+ // VMAs in order to track the different values of the
+ // MADV_NOHUGEPAGE flag in the different regions. There's a
+ // default limit of 65530 VMAs per address space (sysctl
+ // vm.max_map_count), so we must be careful not to create too
+ // many VMAs (see issue #12233).
+ //
+ // Since huge pages are huge, there's little use in adjusting
+ // the MADV_NOHUGEPAGE flag on a fine granularity, so we avoid
+ // exploding the number of VMAs by only adjusting the
+ // MADV_NOHUGEPAGE flag on a large granularity. This still
+ // gets most of the benefit of huge pages while keeping the
+ // number of VMAs under control. With hugePageSize = 2MB, even
+ // a pessimal heap can reach 128GB before running out of VMAs.
+ if physHugePageSize != 0 {
+ // If it's a large allocation, we want to leave huge
+ // pages enabled. Hence, we only adjust the huge page
+ // flag on the huge pages containing v and v+n-1, and
+ // only if those aren't aligned.
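+ // For example (illustrative), with 2MB huge pages, releasing the
+ // range [0x7ff000, 0xa01000) marks the huge pages starting at
+ // 0x600000 and 0xa00000 as NOHUGEPAGE, since the range's endpoints
+ // fall inside those two pages.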
+ var head, tail uintptr
+ if uintptr(v)&(physHugePageSize-1) != 0 {
+ // Compute huge page containing v.
+ head = alignDown(uintptr(v), physHugePageSize)
+ }
+ if (uintptr(v)+n)&(physHugePageSize-1) != 0 {
+ // Compute huge page containing v+n-1.
+ tail = alignDown(uintptr(v)+n-1, physHugePageSize)
+ }
+
+ // Note that madvise will return EINVAL if the flag is
+ // already set, which is quite likely. We ignore
+ // errors.
+ if head != 0 && head+physHugePageSize == tail {
+ // head and tail are different but adjacent,
+ // so do this in one call.
+ madvise(unsafe.Pointer(head), 2*physHugePageSize, _MADV_NOHUGEPAGE)
+ } else {
+ // Advise the huge pages containing v and v+n-1.
+ if head != 0 {
+ madvise(unsafe.Pointer(head), physHugePageSize, _MADV_NOHUGEPAGE)
+ }
+ if tail != 0 && tail != head {
+ madvise(unsafe.Pointer(tail), physHugePageSize, _MADV_NOHUGEPAGE)
+ }
+ }
+ }
+
+ if uintptr(v)&(physPageSize-1) != 0 || n&(physPageSize-1) != 0 {
+ // madvise will round this to any physical page
+ // *covered* by this range, so an unaligned madvise
+ // will release more memory than intended.
+ throw("unaligned sysUnused")
+ }
+
+ var advise uint32
+ if debug.madvdontneed != 0 {
+ advise = _MADV_DONTNEED
+ } else {
+ advise = atomic.Load(&adviseUnused)
+ }
+ if errno := madvise(v, n, int32(advise)); advise == _MADV_FREE && errno != 0 {
+ // MADV_FREE was added in Linux 4.5. Fall back to MADV_DONTNEED if it is
+ // not supported.
+ atomic.Store(&adviseUnused, _MADV_DONTNEED)
+ madvise(v, n, _MADV_DONTNEED)
+ }
+}
+
+func sysUsed(v unsafe.Pointer, n uintptr) {
+ // Partially undo the NOHUGEPAGE marks from sysUnused
+ // for whole huge pages between v and v+n. This may
+ // leave huge pages off at the end points v and v+n
+ // even though allocations may cover these entire huge
+ // pages. We could detect this and undo NOHUGEPAGE on
+ // the end points as well, but it's probably not worth
+ // the cost because when neighboring allocations are
+ // freed sysUnused will just set NOHUGEPAGE again.
+ sysHugePage(v, n)
+}
+
+func sysHugePage(v unsafe.Pointer, n uintptr) {
+ if physHugePageSize != 0 {
+ // Round v up to a huge page boundary.
+ beg := alignUp(uintptr(v), physHugePageSize)
+ // Round v+n down to a huge page boundary.
+ end := alignDown(uintptr(v)+n, physHugePageSize)
+
+ if beg < end {
+ madvise(unsafe.Pointer(beg), end-beg, _MADV_HUGEPAGE)
+ }
+ }
+}
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//go:nosplit
+func sysFree(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(-int64(n))
+ munmap(v, n)
+}
+
+func sysFault(v unsafe.Pointer, n uintptr) {
+ mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE|_MAP_FIXED, -1, 0)
+}
+
+func sysReserve(v unsafe.Pointer, n uintptr) unsafe.Pointer {
+ p, err := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ return nil
+ }
+ return p
+}
+
+func sysMap(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(int64(n))
+
+ p, err := mmap(v, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_FIXED|_MAP_PRIVATE, -1, 0)
+ if err == _ENOMEM {
+ throw("runtime: out of memory")
+ }
+ if p != v || err != 0 {
+ throw("runtime: cannot map pages in arena address space")
+ }
+}
diff --git a/src/runtime/mem_plan9.go b/src/runtime/mem_plan9.go
new file mode 100644
index 0000000..53d8e6d
--- /dev/null
+++ b/src/runtime/mem_plan9.go
@@ -0,0 +1,202 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+const memDebug = false
+
+var bloc uintptr
+var blocMax uintptr
+var memlock mutex
+
+type memHdr struct {
+ next memHdrPtr
+ size uintptr
+}
+
+var memFreelist memHdrPtr // sorted in ascending order
+
+type memHdrPtr uintptr
+
+func (p memHdrPtr) ptr() *memHdr { return (*memHdr)(unsafe.Pointer(p)) }
+func (p *memHdrPtr) set(x *memHdr) { *p = memHdrPtr(unsafe.Pointer(x)) }
+
+func memAlloc(n uintptr) unsafe.Pointer {
+ n = memRound(n)
+ var prevp *memHdr
+ for p := memFreelist.ptr(); p != nil; p = p.next.ptr() {
+ if p.size >= n {
+ if p.size == n {
+ if prevp != nil {
+ prevp.next = p.next
+ } else {
+ memFreelist = p.next
+ }
+ } else {
+ p.size -= n
+ p = (*memHdr)(add(unsafe.Pointer(p), p.size))
+ }
+ *p = memHdr{}
+ return unsafe.Pointer(p)
+ }
+ prevp = p
+ }
+ return sbrk(n)
+}
+
+func memFree(ap unsafe.Pointer, n uintptr) {
+ n = memRound(n)
+ memclrNoHeapPointers(ap, n)
+ bp := (*memHdr)(ap)
+ bp.size = n
+ bpn := uintptr(ap)
+ if memFreelist == 0 {
+ bp.next = 0
+ memFreelist.set(bp)
+ return
+ }
+ p := memFreelist.ptr()
+ if bpn < uintptr(unsafe.Pointer(p)) {
+ memFreelist.set(bp)
+ if bpn+bp.size == uintptr(unsafe.Pointer(p)) {
+ bp.size += p.size
+ bp.next = p.next
+ *p = memHdr{}
+ } else {
+ bp.next.set(p)
+ }
+ return
+ }
+ for ; p.next != 0; p = p.next.ptr() {
+ if bpn > uintptr(unsafe.Pointer(p)) && bpn < uintptr(unsafe.Pointer(p.next)) {
+ break
+ }
+ }
+ if bpn+bp.size == uintptr(unsafe.Pointer(p.next)) {
+ bp.size += p.next.ptr().size
+ bp.next = p.next.ptr().next
+ *p.next.ptr() = memHdr{}
+ } else {
+ bp.next = p.next
+ }
+ if uintptr(unsafe.Pointer(p))+p.size == bpn {
+ p.size += bp.size
+ p.next = bp.next
+ *bp = memHdr{}
+ } else {
+ p.next.set(bp)
+ }
+}
+
+func memCheck() {
+ if !memDebug {
+ return
+ }
+ for p := memFreelist.ptr(); p != nil && p.next != 0; p = p.next.ptr() {
+ if uintptr(unsafe.Pointer(p)) == uintptr(unsafe.Pointer(p.next)) {
+ print("runtime: ", unsafe.Pointer(p), " == ", unsafe.Pointer(p.next), "\n")
+ throw("mem: infinite loop")
+ }
+ if uintptr(unsafe.Pointer(p)) > uintptr(unsafe.Pointer(p.next)) {
+ print("runtime: ", unsafe.Pointer(p), " > ", unsafe.Pointer(p.next), "\n")
+ throw("mem: unordered list")
+ }
+ if uintptr(unsafe.Pointer(p))+p.size > uintptr(unsafe.Pointer(p.next)) {
+ print("runtime: ", unsafe.Pointer(p), "+", p.size, " > ", unsafe.Pointer(p.next), "\n")
+ throw("mem: overlapping blocks")
+ }
+ for b := add(unsafe.Pointer(p), unsafe.Sizeof(memHdr{})); uintptr(b) < uintptr(unsafe.Pointer(p))+p.size; b = add(b, 1) {
+ if *(*byte)(b) != 0 {
+ print("runtime: value at addr ", b, " with offset ", uintptr(b)-uintptr(unsafe.Pointer(p)), " in block ", p, " of size ", p.size, " is not zero\n")
+ throw("mem: uninitialised memory")
+ }
+ }
+ }
+}
+
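+// memRound rounds p up to a multiple of the Plan 9 page size.
+// For example, with a 4096-byte page, memRound(1) == 4096 and
+// memRound(8192) == 8192.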
+func memRound(p uintptr) uintptr {
+ return (p + _PAGESIZE - 1) &^ (_PAGESIZE - 1)
+}
+
+func initBloc() {
+ bloc = memRound(firstmoduledata.end)
+ blocMax = bloc
+}
+
+func sbrk(n uintptr) unsafe.Pointer {
+ // Plan 9 sbrk from /sys/src/libc/9sys/sbrk.c
+ bl := bloc
+ n = memRound(n)
+ if bl+n > blocMax {
+ if brk_(unsafe.Pointer(bl+n)) < 0 {
+ return nil
+ }
+ blocMax = bl + n
+ }
+ bloc += n
+ return unsafe.Pointer(bl)
+}
+
+func sysAlloc(n uintptr, sysStat *sysMemStat) unsafe.Pointer {
+ lock(&memlock)
+ p := memAlloc(n)
+ memCheck()
+ unlock(&memlock)
+ if p != nil {
+ sysStat.add(int64(n))
+ }
+ return p
+}
+
+func sysFree(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(-int64(n))
+ lock(&memlock)
+ if uintptr(v)+n == bloc {
+ // Address range being freed is at the end of memory,
+ // so record a new lower value for end of memory.
+ // Can't actually shrink address space because segment is shared.
+ memclrNoHeapPointers(v, n)
+ bloc -= n
+ } else {
+ memFree(v, n)
+ memCheck()
+ }
+ unlock(&memlock)
+}
+
+func sysUnused(v unsafe.Pointer, n uintptr) {
+}
+
+func sysUsed(v unsafe.Pointer, n uintptr) {
+}
+
+func sysHugePage(v unsafe.Pointer, n uintptr) {
+}
+
+func sysMap(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ // sysReserve has already allocated all heap memory,
+ // but has not adjusted stats.
+ sysStat.add(int64(n))
+}
+
+func sysFault(v unsafe.Pointer, n uintptr) {
+}
+
+func sysReserve(v unsafe.Pointer, n uintptr) unsafe.Pointer {
+ lock(&memlock)
+ var p unsafe.Pointer
+ if uintptr(v) == bloc {
+ // Address hint is the current end of memory,
+ // so try to extend the address space.
+ p = sbrk(n)
+ }
+ if p == nil && v == nil {
+ p = memAlloc(n)
+ memCheck()
+ }
+ unlock(&memlock)
+ return p
+}
diff --git a/src/runtime/mem_windows.go b/src/runtime/mem_windows.go
new file mode 100644
index 0000000..3a805b9
--- /dev/null
+++ b/src/runtime/mem_windows.go
@@ -0,0 +1,129 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+const (
+ _MEM_COMMIT = 0x1000
+ _MEM_RESERVE = 0x2000
+ _MEM_DECOMMIT = 0x4000
+ _MEM_RELEASE = 0x8000
+
+ _PAGE_READWRITE = 0x0004
+ _PAGE_NOACCESS = 0x0001
+
+ _ERROR_NOT_ENOUGH_MEMORY = 8
+ _ERROR_COMMITMENT_LIMIT = 1455
+)
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//go:nosplit
+func sysAlloc(n uintptr, sysStat *sysMemStat) unsafe.Pointer {
+ sysStat.add(int64(n))
+ return unsafe.Pointer(stdcall4(_VirtualAlloc, 0, n, _MEM_COMMIT|_MEM_RESERVE, _PAGE_READWRITE))
+}
+
+func sysUnused(v unsafe.Pointer, n uintptr) {
+ r := stdcall3(_VirtualFree, uintptr(v), n, _MEM_DECOMMIT)
+ if r != 0 {
+ return
+ }
+
+ // Decommit failed. Usual reason is that we've merged memory from two different
+ // VirtualAlloc calls, and Windows will only let each VirtualFree handle pages from
+ // a single VirtualAlloc. It is okay to specify a subset of the pages from a single alloc,
+ // just not pages from multiple allocs. This is a rare case, arising only when we're
+ // trying to give memory back to the operating system, which happens on a time
+ // scale of minutes. It doesn't have to be terribly fast. Instead of extra bookkeeping
+ // on all our VirtualAlloc calls, try freeing successively smaller pieces until
+ // we manage to free something, and then repeat. This ends up being O(n log n)
+ // in the worst case, but that's fast enough.
+ for n > 0 {
+ small := n
+ for small >= 4096 && stdcall3(_VirtualFree, uintptr(v), small, _MEM_DECOMMIT) == 0 {
+ small /= 2
+ small &^= 4096 - 1
+ }
+ if small < 4096 {
+ print("runtime: VirtualFree of ", small, " bytes failed with errno=", getlasterror(), "\n")
+ throw("runtime: failed to decommit pages")
+ }
+ v = add(v, small)
+ n -= small
+ }
+}
+
+func sysUsed(v unsafe.Pointer, n uintptr) {
+ p := stdcall4(_VirtualAlloc, uintptr(v), n, _MEM_COMMIT, _PAGE_READWRITE)
+ if p == uintptr(v) {
+ return
+ }
+
+ // Commit failed. See SysUnused.
+ // Hold on to n here so we can give back a better error message
+ // for certain cases.
+ k := n
+ for k > 0 {
+ small := k
+ for small >= 4096 && stdcall4(_VirtualAlloc, uintptr(v), small, _MEM_COMMIT, _PAGE_READWRITE) == 0 {
+ small /= 2
+ small &^= 4096 - 1
+ }
+ if small < 4096 {
+ errno := getlasterror()
+ switch errno {
+ case _ERROR_NOT_ENOUGH_MEMORY, _ERROR_COMMITMENT_LIMIT:
+ print("runtime: VirtualAlloc of ", n, " bytes failed with errno=", errno, "\n")
+ throw("out of memory")
+ default:
+ print("runtime: VirtualAlloc of ", small, " bytes failed with errno=", errno, "\n")
+ throw("runtime: failed to commit pages")
+ }
+ }
+ v = add(v, small)
+ k -= small
+ }
+}
+
+func sysHugePage(v unsafe.Pointer, n uintptr) {
+}
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//go:nosplit
+func sysFree(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(-int64(n))
+ r := stdcall3(_VirtualFree, uintptr(v), 0, _MEM_RELEASE)
+ if r == 0 {
+ print("runtime: VirtualFree of ", n, " bytes failed with errno=", getlasterror(), "\n")
+ throw("runtime: failed to release pages")
+ }
+}
+
+func sysFault(v unsafe.Pointer, n uintptr) {
+ // SysUnused makes the memory inaccessible and prevents its reuse
+ sysUnused(v, n)
+}
+
+func sysReserve(v unsafe.Pointer, n uintptr) unsafe.Pointer {
+ // v is just a hint.
+ // First try at v.
+ // This will fail if any of [v, v+n) is already reserved.
+ v = unsafe.Pointer(stdcall4(_VirtualAlloc, uintptr(v), n, _MEM_RESERVE, _PAGE_READWRITE))
+ if v != nil {
+ return v
+ }
+
+ // Next let the kernel choose the address.
+ return unsafe.Pointer(stdcall4(_VirtualAlloc, 0, n, _MEM_RESERVE, _PAGE_READWRITE))
+}
+
+func sysMap(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(int64(n))
+}
diff --git a/src/runtime/memclr_386.s b/src/runtime/memclr_386.s
new file mode 100644
index 0000000..5e090ef
--- /dev/null
+++ b/src/runtime/memclr_386.s
@@ -0,0 +1,138 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// NOTE: Windows externalthreadhandler expects memclr to preserve DX.
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB), NOSPLIT, $0-8
+ MOVL ptr+0(FP), DI
+ MOVL n+4(FP), BX
+ XORL AX, AX
+
+ // MOVOU seems always faster than REP STOSL.
+tail:
+ // BSR+branch table make almost all memmove/memclr benchmarks worse. Not worth doing.
+ TESTL BX, BX
+ JEQ _0
+ CMPL BX, $2
+ JBE _1or2
+ CMPL BX, $4
+ JB _3
+ JE _4
+ CMPL BX, $8
+ JBE _5through8
+ CMPL BX, $16
+ JBE _9through16
+ CMPB internal∕cpu·X86+const_offsetX86HasSSE2(SB), $1
+ JNE nosse2
+ PXOR X0, X0
+ CMPL BX, $32
+ JBE _17through32
+ CMPL BX, $64
+ JBE _33through64
+ CMPL BX, $128
+ JBE _65through128
+ CMPL BX, $256
+ JBE _129through256
+
+loop:
+ MOVOU X0, 0(DI)
+ MOVOU X0, 16(DI)
+ MOVOU X0, 32(DI)
+ MOVOU X0, 48(DI)
+ MOVOU X0, 64(DI)
+ MOVOU X0, 80(DI)
+ MOVOU X0, 96(DI)
+ MOVOU X0, 112(DI)
+ MOVOU X0, 128(DI)
+ MOVOU X0, 144(DI)
+ MOVOU X0, 160(DI)
+ MOVOU X0, 176(DI)
+ MOVOU X0, 192(DI)
+ MOVOU X0, 208(DI)
+ MOVOU X0, 224(DI)
+ MOVOU X0, 240(DI)
+ SUBL $256, BX
+ ADDL $256, DI
+ CMPL BX, $256
+ JAE loop
+ JMP tail
+
+_1or2:
+ MOVB AX, (DI)
+ MOVB AX, -1(DI)(BX*1)
+ RET
+_0:
+ RET
+_3:
+ MOVW AX, (DI)
+ MOVB AX, 2(DI)
+ RET
+_4:
+ // We need a separate case for 4 to make sure we clear pointers atomically.
+ MOVL AX, (DI)
+ RET
+_5through8:
+ MOVL AX, (DI)
+ MOVL AX, -4(DI)(BX*1)
+ RET
+_9through16:
+ MOVL AX, (DI)
+ MOVL AX, 4(DI)
+ MOVL AX, -8(DI)(BX*1)
+ MOVL AX, -4(DI)(BX*1)
+ RET
+_17through32:
+ MOVOU X0, (DI)
+ MOVOU X0, -16(DI)(BX*1)
+ RET
+_33through64:
+ MOVOU X0, (DI)
+ MOVOU X0, 16(DI)
+ MOVOU X0, -32(DI)(BX*1)
+ MOVOU X0, -16(DI)(BX*1)
+ RET
+_65through128:
+ MOVOU X0, (DI)
+ MOVOU X0, 16(DI)
+ MOVOU X0, 32(DI)
+ MOVOU X0, 48(DI)
+ MOVOU X0, -64(DI)(BX*1)
+ MOVOU X0, -48(DI)(BX*1)
+ MOVOU X0, -32(DI)(BX*1)
+ MOVOU X0, -16(DI)(BX*1)
+ RET
+_129through256:
+ MOVOU X0, (DI)
+ MOVOU X0, 16(DI)
+ MOVOU X0, 32(DI)
+ MOVOU X0, 48(DI)
+ MOVOU X0, 64(DI)
+ MOVOU X0, 80(DI)
+ MOVOU X0, 96(DI)
+ MOVOU X0, 112(DI)
+ MOVOU X0, -128(DI)(BX*1)
+ MOVOU X0, -112(DI)(BX*1)
+ MOVOU X0, -96(DI)(BX*1)
+ MOVOU X0, -80(DI)(BX*1)
+ MOVOU X0, -64(DI)(BX*1)
+ MOVOU X0, -48(DI)(BX*1)
+ MOVOU X0, -32(DI)(BX*1)
+ MOVOU X0, -16(DI)(BX*1)
+ RET
+nosse2:
+ MOVL BX, CX
+ SHRL $2, CX
+ REP
+ STOSL
+ ANDL $3, BX
+ JNE tail
+ RET
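
The small-size cases in memclr_386.s avoid loops by issuing two stores that may overlap: one anchored at the start of the buffer and one ending at its last byte. A minimal Go sketch of that idea for the _5through8 case, using encoding/binary to emulate the two MOVL stores (illustrative only, not the runtime's code path):

package sketch

import "encoding/binary"

// clear5through8 zeroes b when 5 <= len(b) <= 8 using two possibly
// overlapping 4-byte stores, mirroring MOVL AX, (DI) and MOVL AX, -4(DI)(BX*1).
func clear5through8(b []byte) {
	n := len(b)
	binary.LittleEndian.PutUint32(b[:4], 0)
	binary.LittleEndian.PutUint32(b[n-4:], 0)
}
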
diff --git a/src/runtime/memclr_amd64.s b/src/runtime/memclr_amd64.s
new file mode 100644
index 0000000..37fe974
--- /dev/null
+++ b/src/runtime/memclr_amd64.s
@@ -0,0 +1,180 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// NOTE: Windows externalthreadhandler expects memclr to preserve DX.
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB), NOSPLIT, $0-16
+ MOVQ ptr+0(FP), DI
+ MOVQ n+8(FP), BX
+ XORQ AX, AX
+
+ // MOVOU seems always faster than REP STOSQ.
+tail:
+ // BSR+branch table make almost all memmove/memclr benchmarks worse. Not worth doing.
+ TESTQ BX, BX
+ JEQ _0
+ CMPQ BX, $2
+ JBE _1or2
+ CMPQ BX, $4
+ JBE _3or4
+ CMPQ BX, $8
+ JB _5through7
+ JE _8
+ CMPQ BX, $16
+ JBE _9through16
+ PXOR X0, X0
+ CMPQ BX, $32
+ JBE _17through32
+ CMPQ BX, $64
+ JBE _33through64
+ CMPQ BX, $128
+ JBE _65through128
+ CMPQ BX, $256
+ JBE _129through256
+ CMPB internal∕cpu·X86+const_offsetX86HasAVX2(SB), $1
+ JE loop_preheader_avx2
+ // TODO: for really big clears, use MOVNTDQ, even without AVX2.
+
+loop:
+ MOVOU X0, 0(DI)
+ MOVOU X0, 16(DI)
+ MOVOU X0, 32(DI)
+ MOVOU X0, 48(DI)
+ MOVOU X0, 64(DI)
+ MOVOU X0, 80(DI)
+ MOVOU X0, 96(DI)
+ MOVOU X0, 112(DI)
+ MOVOU X0, 128(DI)
+ MOVOU X0, 144(DI)
+ MOVOU X0, 160(DI)
+ MOVOU X0, 176(DI)
+ MOVOU X0, 192(DI)
+ MOVOU X0, 208(DI)
+ MOVOU X0, 224(DI)
+ MOVOU X0, 240(DI)
+ SUBQ $256, BX
+ ADDQ $256, DI
+ CMPQ BX, $256
+ JAE loop
+ JMP tail
+
+loop_preheader_avx2:
+ VPXOR Y0, Y0, Y0
+ // For smaller sizes MOVNTDQ may be faster or slower depending on hardware.
+ // For larger sizes it is always faster, even on dual Xeons with 30M cache.
+ // TODO take into account actual LLC size. E. g. glibc uses LLC size/2.
+ CMPQ BX, $0x2000000
+ JAE loop_preheader_avx2_huge
+loop_avx2:
+ VMOVDQU Y0, 0(DI)
+ VMOVDQU Y0, 32(DI)
+ VMOVDQU Y0, 64(DI)
+ VMOVDQU Y0, 96(DI)
+ SUBQ $128, BX
+ ADDQ $128, DI
+ CMPQ BX, $128
+ JAE loop_avx2
+ VMOVDQU Y0, -32(DI)(BX*1)
+ VMOVDQU Y0, -64(DI)(BX*1)
+ VMOVDQU Y0, -96(DI)(BX*1)
+ VMOVDQU Y0, -128(DI)(BX*1)
+ VZEROUPPER
+ RET
+loop_preheader_avx2_huge:
+ // Align to 32 byte boundary
+ VMOVDQU Y0, 0(DI)
+ MOVQ DI, SI
+ ADDQ $32, DI
+ ANDQ $~31, DI
+ SUBQ DI, SI
+ ADDQ SI, BX
+loop_avx2_huge:
+ VMOVNTDQ Y0, 0(DI)
+ VMOVNTDQ Y0, 32(DI)
+ VMOVNTDQ Y0, 64(DI)
+ VMOVNTDQ Y0, 96(DI)
+ SUBQ $128, BX
+ ADDQ $128, DI
+ CMPQ BX, $128
+ JAE loop_avx2_huge
+ // In the description of MOVNTDQ in [1]
+ // "... fencing operation implemented with the SFENCE or MFENCE instruction
+ // should be used in conjunction with MOVNTDQ instructions..."
+ // [1] 64-ia-32-architectures-software-developer-manual-325462.pdf
+ SFENCE
+ VMOVDQU Y0, -32(DI)(BX*1)
+ VMOVDQU Y0, -64(DI)(BX*1)
+ VMOVDQU Y0, -96(DI)(BX*1)
+ VMOVDQU Y0, -128(DI)(BX*1)
+ VZEROUPPER
+ RET
+
+_1or2:
+ MOVB AX, (DI)
+ MOVB AX, -1(DI)(BX*1)
+ RET
+_0:
+ RET
+_3or4:
+ MOVW AX, (DI)
+ MOVW AX, -2(DI)(BX*1)
+ RET
+_5through7:
+ MOVL AX, (DI)
+ MOVL AX, -4(DI)(BX*1)
+ RET
+_8:
+ // We need a separate case for 8 to make sure we clear pointers atomically.
+ MOVQ AX, (DI)
+ RET
+_9through16:
+ MOVQ AX, (DI)
+ MOVQ AX, -8(DI)(BX*1)
+ RET
+_17through32:
+ MOVOU X0, (DI)
+ MOVOU X0, -16(DI)(BX*1)
+ RET
+_33through64:
+ MOVOU X0, (DI)
+ MOVOU X0, 16(DI)
+ MOVOU X0, -32(DI)(BX*1)
+ MOVOU X0, -16(DI)(BX*1)
+ RET
+_65through128:
+ MOVOU X0, (DI)
+ MOVOU X0, 16(DI)
+ MOVOU X0, 32(DI)
+ MOVOU X0, 48(DI)
+ MOVOU X0, -64(DI)(BX*1)
+ MOVOU X0, -48(DI)(BX*1)
+ MOVOU X0, -32(DI)(BX*1)
+ MOVOU X0, -16(DI)(BX*1)
+ RET
+_129through256:
+ MOVOU X0, (DI)
+ MOVOU X0, 16(DI)
+ MOVOU X0, 32(DI)
+ MOVOU X0, 48(DI)
+ MOVOU X0, 64(DI)
+ MOVOU X0, 80(DI)
+ MOVOU X0, 96(DI)
+ MOVOU X0, 112(DI)
+ MOVOU X0, -128(DI)(BX*1)
+ MOVOU X0, -112(DI)(BX*1)
+ MOVOU X0, -96(DI)(BX*1)
+ MOVOU X0, -80(DI)(BX*1)
+ MOVOU X0, -64(DI)(BX*1)
+ MOVOU X0, -48(DI)(BX*1)
+ MOVOU X0, -32(DI)(BX*1)
+ MOVOU X0, -16(DI)(BX*1)
+ RET
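
In loop_preheader_avx2_huge above, one unaligned 32-byte store covers the head, after which the destination pointer is rounded up to a 32-byte boundary and the count shrinks by the bytes skipped. A small Go sketch of that arithmetic; alignUp32 is an illustrative helper, not a runtime function.

package sketch

// alignUp32 mirrors the ADDQ $32 / ANDQ $~31 / SUBQ / ADDQ sequence in
// loop_preheader_avx2_huge: skip past the head that has already been written
// and continue at the next 32-byte boundary, shrinking n by the bytes skipped.
func alignUp32(p, n uintptr) (alignedP, remaining uintptr) {
	alignedP = (p + 32) &^ 31
	remaining = n - (alignedP - p)
	return
}
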
diff --git a/src/runtime/memclr_arm.s b/src/runtime/memclr_arm.s
new file mode 100644
index 0000000..f113a1a
--- /dev/null
+++ b/src/runtime/memclr_arm.s
@@ -0,0 +1,90 @@
+// Inferno's libkern/memset-arm.s
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/memset-arm.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+#include "textflag.h"
+
+#define TO R8
+#define TOE R11
+#define N R12
+#define TMP R12 /* N and TMP don't overlap */
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB),NOSPLIT,$0-8
+ MOVW ptr+0(FP), TO
+ MOVW n+4(FP), N
+ MOVW $0, R0
+
+ ADD N, TO, TOE /* to end pointer */
+
+ CMP $4, N /* need at least 4 bytes to copy */
+ BLT _1tail
+
+_4align: /* align on 4 */
+ AND.S $3, TO, TMP
+ BEQ _4aligned
+
+ MOVBU.P R0, 1(TO) /* implicit write back */
+ B _4align
+
+_4aligned:
+ SUB $31, TOE, TMP /* do 32-byte chunks if possible */
+ CMP TMP, TO
+ BHS _4tail
+
+ MOVW R0, R1 /* replicate */
+ MOVW R0, R2
+ MOVW R0, R3
+ MOVW R0, R4
+ MOVW R0, R5
+ MOVW R0, R6
+ MOVW R0, R7
+
+_f32loop:
+ CMP TMP, TO
+ BHS _4tail
+
+ MOVM.IA.W [R0-R7], (TO)
+ B _f32loop
+
+_4tail:
+ SUB $3, TOE, TMP /* do remaining words if possible */
+_4loop:
+ CMP TMP, TO
+ BHS _1tail
+
+ MOVW.P R0, 4(TO) /* implicit write back */
+ B _4loop
+
+_1tail:
+ CMP TO, TOE
+ BEQ _return
+
+ MOVBU.P R0, 1(TO) /* implicit write back */
+ B _1tail
+
+_return:
+ RET
diff --git a/src/runtime/memclr_arm64.s b/src/runtime/memclr_arm64.s
new file mode 100644
index 0000000..bef7765
--- /dev/null
+++ b/src/runtime/memclr_arm64.s
@@ -0,0 +1,184 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB),NOSPLIT,$0-16
+ MOVD ptr+0(FP), R0
+ MOVD n+8(FP), R1
+
+ CMP $16, R1
+ // If n is equal to 16 bytes, use zero_exact_16 to zero
+ BEQ zero_exact_16
+
+ // If n is greater than 16 bytes, use zero_by_16 to zero
+ BHI zero_by_16
+
+ // n is less than 16 bytes
+ ADD R1, R0, R7
+ TBZ $3, R1, less_than_8
+ MOVD ZR, (R0)
+ MOVD ZR, -8(R7)
+ RET
+
+less_than_8:
+ TBZ $2, R1, less_than_4
+ MOVW ZR, (R0)
+ MOVW ZR, -4(R7)
+ RET
+
+less_than_4:
+ CBZ R1, ending
+ MOVB ZR, (R0)
+ TBZ $1, R1, ending
+ MOVH ZR, -2(R7)
+
+ending:
+ RET
+
+zero_exact_16:
+ // n is exactly 16 bytes
+ STP (ZR, ZR), (R0)
+ RET
+
+zero_by_16:
+ // n greater than 16 bytes, check if the start address is aligned
+ NEG R0, R4
+ ANDS $15, R4, R4
+ // Try zeroing using zva if the start address is aligned with 16
+ BEQ try_zva
+
+ // Non-aligned store
+ STP (ZR, ZR), (R0)
+ // Make the destination aligned
+ SUB R4, R1, R1
+ ADD R4, R0, R0
+ B try_zva
+
+tail_maybe_long:
+ CMP $64, R1
+ BHS no_zva
+
+tail63:
+ ANDS $48, R1, R3
+ BEQ last16
+ CMPW $32, R3
+ BEQ last48
+ BLT last32
+ STP.P (ZR, ZR), 16(R0)
+last48:
+ STP.P (ZR, ZR), 16(R0)
+last32:
+ STP.P (ZR, ZR), 16(R0)
+ // The last store length is at most 16, so it is safe to use
+ // stp to write last 16 bytes
+last16:
+ ANDS $15, R1, R1
+ CBZ R1, last_end
+ ADD R1, R0, R0
+ STP (ZR, ZR), -16(R0)
+last_end:
+ RET
+
+no_zva:
+ SUB $16, R0, R0
+ SUB $64, R1, R1
+
+loop_64:
+ STP (ZR, ZR), 16(R0)
+ STP (ZR, ZR), 32(R0)
+ STP (ZR, ZR), 48(R0)
+ STP.W (ZR, ZR), 64(R0)
+ SUBS $64, R1, R1
+ BGE loop_64
+ ANDS $63, R1, ZR
+ ADD $16, R0, R0
+ BNE tail63
+ RET
+
+try_zva:
+ // Try using the ZVA feature to zero entire cache lines
+ // It is not meaningful to use ZVA if the block size is less than 64,
+ // so make sure that n is greater than or equal to 64
+ CMP $63, R1
+ BLE tail63
+
+ CMP $128, R1
+ // Ensure n is at least 128 bytes, so that there is enough to copy after
+ // alignment.
+ BLT no_zva
+ // Check if ZVA is allowed from user code, and if so get the block size
+ MOVW block_size<>(SB), R5
+ TBNZ $31, R5, no_zva
+ CBNZ R5, zero_by_line
+ // DCZID_EL0 bit assignments
+ // [63:5] Reserved
+ // [4] DZP, if bit set DC ZVA instruction is prohibited, else permitted
+ // [3:0] log2 of the block size in words, eg. if it returns 0x4 then block size is 16 words
+ MRS DCZID_EL0, R3
+ TBZ $4, R3, init
+ // ZVA not available
+ MOVW $~0, R5
+ MOVW R5, block_size<>(SB)
+ B no_zva
+
+init:
+ MOVW $4, R9
+ ANDW $15, R3, R5
+ LSLW R5, R9, R5
+ MOVW R5, block_size<>(SB)
+
+ ANDS $63, R5, R9
+ // Block size is less than 64.
+ BNE no_zva
+
+zero_by_line:
+ CMP R5, R1
+ // Not enough memory to reach alignment
+ BLO no_zva
+ SUB $1, R5, R6
+ NEG R0, R4
+ ANDS R6, R4, R4
+ // Already aligned
+ BEQ aligned
+
+ // check there is enough to copy after alignment
+ SUB R4, R1, R3
+
+ // Check that the remaining length to ZVA after alignment
+ // is greater than 64.
+ CMP $64, R3
+ CCMP GE, R3, R5, $10 // condition code GE, NZCV=0b1010
+ BLT no_zva
+
+ // We now have at least 64 bytes to zero, update n
+ MOVD R3, R1
+
+loop_zva_prolog:
+ STP (ZR, ZR), (R0)
+ STP (ZR, ZR), 16(R0)
+ STP (ZR, ZR), 32(R0)
+ SUBS $64, R4, R4
+ STP (ZR, ZR), 48(R0)
+ ADD $64, R0, R0
+ BGE loop_zva_prolog
+
+ ADD R4, R0, R0
+
+aligned:
+ SUB R5, R1, R1
+
+loop_zva:
+ WORD $0xd50b7420 // DC ZVA, R0
+ ADD R5, R0, R0
+ SUBS R5, R1, R1
+ BHS loop_zva
+ ANDS R6, R1, R1
+ BNE tail_maybe_long
+ RET
+
+GLOBL block_size<>(SB), NOPTR, $8
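
The try_zva path above reads DCZID_EL0 once and caches the decoded block size in block_size<>. A hedged Go sketch of the decode the init block performs; the register value would come from the MRS instruction, so here it is just a parameter.

package sketch

// zvaBlockSize decodes a DCZID_EL0 value the way the init block above does:
// bit 4 (DZP) set means DC ZVA is prohibited, and bits [3:0] hold log2 of the
// block size in 4-byte words, so the size in bytes is 4 << (dczid & 0xf)
// (0x4 therefore means 16 words, i.e. 64 bytes).
func zvaBlockSize(dczid uint64) (bytes uint64, ok bool) {
	if dczid&(1<<4) != 0 {
		return 0, false // DC ZVA not permitted from user code
	}
	return 4 << (dczid & 0xf), true
}
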
diff --git a/src/runtime/memclr_mips64x.s b/src/runtime/memclr_mips64x.s
new file mode 100644
index 0000000..d7a3251
--- /dev/null
+++ b/src/runtime/memclr_mips64x.s
@@ -0,0 +1,99 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips64 mips64le
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB),NOSPLIT,$0-16
+ MOVV ptr+0(FP), R1
+ MOVV n+8(FP), R2
+ ADDV R1, R2, R4
+
+ // if less than 16 bytes or no MSA, do words check
+ SGTU $16, R2, R3
+ BNE R3, no_msa
+ MOVBU internal∕cpu·MIPS64X+const_offsetMIPS64XHasMSA(SB), R3
+ BEQ R3, R0, no_msa
+
+ VMOVB $0, W0
+
+ SGTU $128, R2, R3
+ BEQ R3, msa_large
+
+ AND $15, R2, R5
+ XOR R2, R5, R6
+ ADDVU R1, R6
+
+msa_small:
+ VMOVB W0, (R1)
+ ADDVU $16, R1
+ SGTU R6, R1, R3
+ BNE R3, R0, msa_small
+ BEQ R5, R0, done
+ VMOVB W0, -16(R4)
+ JMP done
+
+msa_large:
+ AND $127, R2, R5
+ XOR R2, R5, R6
+ ADDVU R1, R6
+
+msa_large_loop:
+ VMOVB W0, (R1)
+ VMOVB W0, 16(R1)
+ VMOVB W0, 32(R1)
+ VMOVB W0, 48(R1)
+ VMOVB W0, 64(R1)
+ VMOVB W0, 80(R1)
+ VMOVB W0, 96(R1)
+ VMOVB W0, 112(R1)
+
+ ADDVU $128, R1
+ SGTU R6, R1, R3
+ BNE R3, R0, msa_large_loop
+ BEQ R5, R0, done
+ VMOVB W0, -128(R4)
+ VMOVB W0, -112(R4)
+ VMOVB W0, -96(R4)
+ VMOVB W0, -80(R4)
+ VMOVB W0, -64(R4)
+ VMOVB W0, -48(R4)
+ VMOVB W0, -32(R4)
+ VMOVB W0, -16(R4)
+ JMP done
+
+no_msa:
+ // if less than 8 bytes, do one byte at a time
+ SGTU $8, R2, R3
+ BNE R3, out
+
+ // do one byte at a time until 8-aligned
+ AND $7, R1, R3
+ BEQ R3, words
+ MOVB R0, (R1)
+ ADDV $1, R1
+ JMP -4(PC)
+
+words:
+ // do 8 bytes at a time if there is room
+ ADDV $-7, R4, R2
+
+ SGTU R2, R1, R3
+ BEQ R3, out
+ MOVV R0, (R1)
+ ADDV $8, R1
+ JMP -4(PC)
+
+out:
+ BEQ R1, R4, done
+ MOVB R0, (R1)
+ ADDV $1, R1
+ JMP -3(PC)
+done:
+ RET
diff --git a/src/runtime/memclr_mipsx.s b/src/runtime/memclr_mipsx.s
new file mode 100644
index 0000000..eb2a8a7
--- /dev/null
+++ b/src/runtime/memclr_mipsx.s
@@ -0,0 +1,73 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips mipsle
+
+#include "textflag.h"
+
+#ifdef GOARCH_mips
+#define MOVWHI MOVWL
+#define MOVWLO MOVWR
+#else
+#define MOVWHI MOVWR
+#define MOVWLO MOVWL
+#endif
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB),NOSPLIT,$0-8
+ MOVW n+4(FP), R2
+ MOVW ptr+0(FP), R1
+
+ SGTU $4, R2, R3
+ ADDU R2, R1, R4
+ BNE R3, small_zero
+
+ptr_align:
+ AND $3, R1, R3
+ BEQ R3, setup
+ SUBU R1, R0, R3
+ AND $3, R3 // R3 contains number of bytes needed to align ptr
+ MOVWHI R0, 0(R1) // MOVWHI will write zeros up to next word boundary
+ SUBU R3, R2
+ ADDU R3, R1
+
+setup:
+ AND $31, R2, R6
+ AND $3, R2, R5
+ SUBU R6, R4, R6 // end pointer for 32-byte chunks
+ SUBU R5, R4, R5 // end pointer for 4-byte chunks
+
+large:
+ BEQ R1, R6, words
+ MOVW R0, 0(R1)
+ MOVW R0, 4(R1)
+ MOVW R0, 8(R1)
+ MOVW R0, 12(R1)
+ MOVW R0, 16(R1)
+ MOVW R0, 20(R1)
+ MOVW R0, 24(R1)
+ MOVW R0, 28(R1)
+ ADDU $32, R1
+ JMP large
+
+words:
+ BEQ R1, R5, tail
+ MOVW R0, 0(R1)
+ ADDU $4, R1
+ JMP words
+
+tail:
+ BEQ R1, R4, ret
+ MOVWLO R0, -1(R4)
+
+ret:
+ RET
+
+small_zero:
+ BEQ R1, R4, ret
+ MOVB R0, 0(R1)
+ ADDU $1, R1
+ JMP small_zero
diff --git a/src/runtime/memclr_plan9_386.s b/src/runtime/memclr_plan9_386.s
new file mode 100644
index 0000000..54701a9
--- /dev/null
+++ b/src/runtime/memclr_plan9_386.s
@@ -0,0 +1,58 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB), NOSPLIT, $0-8
+ MOVL ptr+0(FP), DI
+ MOVL n+4(FP), BX
+ XORL AX, AX
+
+tail:
+ TESTL BX, BX
+ JEQ _0
+ CMPL BX, $2
+ JBE _1or2
+ CMPL BX, $4
+ JB _3
+ JE _4
+ CMPL BX, $8
+ JBE _5through8
+ CMPL BX, $16
+ JBE _9through16
+ MOVL BX, CX
+ SHRL $2, CX
+ REP
+ STOSL
+ ANDL $3, BX
+ JNE tail
+ RET
+
+_1or2:
+ MOVB AX, (DI)
+ MOVB AX, -1(DI)(BX*1)
+ RET
+_0:
+ RET
+_3:
+ MOVW AX, (DI)
+ MOVB AX, 2(DI)
+ RET
+_4:
+ // We need a separate case for 4 to make sure we clear pointers atomically.
+ MOVL AX, (DI)
+ RET
+_5through8:
+ MOVL AX, (DI)
+ MOVL AX, -4(DI)(BX*1)
+ RET
+_9through16:
+ MOVL AX, (DI)
+ MOVL AX, 4(DI)
+ MOVL AX, -8(DI)(BX*1)
+ MOVL AX, -4(DI)(BX*1)
+ RET
diff --git a/src/runtime/memclr_plan9_amd64.s b/src/runtime/memclr_plan9_amd64.s
new file mode 100644
index 0000000..8c6a1cc
--- /dev/null
+++ b/src/runtime/memclr_plan9_amd64.s
@@ -0,0 +1,23 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB),NOSPLIT,$0-16
+ MOVQ ptr+0(FP), DI
+ MOVQ n+8(FP), CX
+ MOVQ CX, BX
+ ANDQ $7, BX
+ SHRQ $3, CX
+ MOVQ $0, AX
+ CLD
+ REP
+ STOSQ
+ MOVQ BX, CX
+ REP
+ STOSB
+ RET
diff --git a/src/runtime/memclr_ppc64x.s b/src/runtime/memclr_ppc64x.s
new file mode 100644
index 0000000..7512620
--- /dev/null
+++ b/src/runtime/memclr_ppc64x.s
@@ -0,0 +1,165 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64 ppc64le
+
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB), NOSPLIT|NOFRAME, $0-16
+ MOVD ptr+0(FP), R3
+ MOVD n+8(FP), R4
+
+ // Determine if there are doublewords to clear
+check:
+ ANDCC $7, R4, R5 // R5: leftover bytes to clear
+ SRD $3, R4, R6 // R6: double words to clear
+ CMP R6, $0, CR1 // CR1[EQ] set if no double words
+
+ BC 12, 6, nozerolarge // only single bytes
+ CMP R4, $512
+ BLT under512 // special case for < 512
+ ANDCC $127, R3, R8 // check for 128 alignment of address
+ BEQ zero512setup
+
+ ANDCC $7, R3, R15
+ BEQ zero512xsetup // at least 8 byte aligned
+
+ // zero bytes up to 8 byte alignment
+
+ ANDCC $1, R3, R15 // check for byte alignment
+ BEQ byte2
+ MOVB R0, 0(R3) // zero 1 byte
+ ADD $1, R3 // bump ptr by 1
+ ADD $-1, R4
+
+byte2:
+ ANDCC $2, R3, R15 // check for 2 byte alignment
+ BEQ byte4
+ MOVH R0, 0(R3) // zero 2 bytes
+ ADD $2, R3 // bump ptr by 2
+ ADD $-2, R4
+
+byte4:
+ ANDCC $4, R3, R15 // check for 4 byte alignment
+ BEQ zero512xsetup
+ MOVW R0, 0(R3) // zero 4 bytes
+ ADD $4, R3 // bump ptr by 4
+ ADD $-4, R4
+ BR zero512xsetup // ptr should now be 8 byte aligned
+
+under512:
+ MOVD R6, CTR // R6 = number of double words
+ SRDCC $2, R6, R7 // 32 byte chunks?
+ BNE zero32setup
+
+ // Clear double words
+
+zero8:
+ MOVD R0, 0(R3) // double word
+ ADD $8, R3
+ ADD $-8, R4
+ BC 16, 0, zero8 // dec ctr, br zero8 if ctr not 0
+ BR nozerolarge // handle leftovers
+
+ // Prepare to clear 32 bytes at a time.
+
+zero32setup:
+ DCBTST (R3) // prepare data cache
+ XXLXOR VS32, VS32, VS32 // clear VS32 (V0)
+ MOVD R7, CTR // number of 32 byte chunks
+ MOVD $16, R8
+
+zero32:
+ STXVD2X VS32, (R3+R0) // store 16 bytes
+ STXVD2X VS32, (R3+R8)
+ ADD $32, R3
+ ADD $-32, R4
+ BC 16, 0, zero32 // dec ctr, br zero32 if ctr not 0
+ RLDCLCC $61, R4, $3, R6 // remaining doublewords
+ BEQ nozerolarge
+ MOVD R6, CTR // set up the CTR for doublewords
+ BR zero8
+
+nozerolarge:
+ ANDCC $7, R4, R5 // any remaining bytes
+ BC 4, 1, LR // ble lr
+
+zerotail:
+ MOVD R5, CTR // set up to clear tail bytes
+
+zerotailloop:
+ MOVB R0, 0(R3) // clear single bytes
+ ADD $1, R3
+ BC 16, 0, zerotailloop // dec ctr, br zerotailloop if ctr not 0
+ RET
+
+zero512xsetup: // 512 chunk with extra needed
+ ANDCC $8, R3, R11 // 8 byte alignment?
+ BEQ zero512setup16
+ MOVD R0, 0(R3) // clear 8 bytes
+ ADD $8, R3 // update ptr to next 8
+ ADD $-8, R4 // dec count by 8
+
+zero512setup16:
+ ANDCC $127, R3, R14 // < 128 byte alignment
+ BEQ zero512setup // handle 128 byte alignment
+ MOVD $128, R15
+ SUB R14, R15, R14 // find increment to 128 alignment
+ SRD $4, R14, R15 // number of 16 byte chunks
+
+zero512presetup:
+ MOVD R15, CTR // loop counter of 16 bytes
+ XXLXOR VS32, VS32, VS32 // clear VS32 (V0)
+
+zero512preloop: // clear up to 128 alignment
+ STXVD2X VS32, (R3+R0) // clear 16 bytes
+ ADD $16, R3 // update ptr
+ ADD $-16, R4 // dec count
+ BC 16, 0, zero512preloop
+
+zero512setup: // setup for dcbz loop
+ CMP R4, $512 // check if at least 512
+ BLT remain
+ SRD $9, R4, R8 // loop count for 512 chunks
+ MOVD R8, CTR // set up counter
+ MOVD $128, R9 // index regs for 128 bytes
+ MOVD $256, R10
+ MOVD $384, R11
+
+zero512:
+ DCBZ (R3+R0) // clear first chunk
+ DCBZ (R3+R9) // clear second chunk
+ DCBZ (R3+R10) // clear third chunk
+ DCBZ (R3+R11) // clear fourth chunk
+ ADD $512, R3
+ ADD $-512, R4
+ BC 16, 0, zero512
+
+remain:
+ CMP R4, $128 // check if 128 byte chunks left
+ BLT smaller
+ DCBZ (R3+R0) // clear 128
+ ADD $128, R3
+ ADD $-128, R4
+ BR remain
+
+smaller:
+ ANDCC $127, R4, R7 // find leftovers
+ BEQ done
+ CMP R7, $64 // more than 64, do 32 at a time
+ BLT zero8setup // less than 64, do 8 at a time
+ SRD $5, R7, R7 // set up counter for 32
+ BR zero32setup
+
+zero8setup:
+ SRDCC $3, R7, R7 // less than 8 bytes
+ BEQ nozerolarge
+ MOVD R7, CTR
+ BR zero8
+
+done:
+ RET
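
Once the pointer is 128-byte aligned, the ppc64x clear above works through n in stages: 512-byte groups of four DCBZ blocks, single 128-byte DCBZ blocks, 32-byte vector stores, doublewords, and bytes. The Go sketch below shows a simplified decomposition of n into those stage counts; the assembly folds a few stages together (for example, a leftover under 64 bytes skips the 32-byte stage), so treat this as an approximation.

package sketch

// stageCounts gives an approximate split of n across the clearing stages
// used above for an already 128-byte-aligned pointer.
func stageCounts(n uintptr) (c512, c128, c32, c8, c1 uintptr) {
	c512, n = n/512, n%512
	c128, n = n/128, n%128
	c32, n = n/32, n%32
	c8, c1 = n/8, n%8
	return
}
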
diff --git a/src/runtime/memclr_riscv64.s b/src/runtime/memclr_riscv64.s
new file mode 100644
index 0000000..54ddaa4
--- /dev/null
+++ b/src/runtime/memclr_riscv64.s
@@ -0,0 +1,46 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// void runtime·memclrNoHeapPointers(void*, uintptr)
+TEXT runtime·memclrNoHeapPointers(SB),NOSPLIT,$0-16
+ MOV ptr+0(FP), T1
+ MOV n+8(FP), T2
+ ADD T1, T2, T4
+
+ // If less than eight bytes, do one byte at a time.
+ SLTU $8, T2, T3
+ BNE T3, ZERO, outcheck
+
+ // Do one byte at a time until eight-aligned.
+ JMP aligncheck
+align:
+ MOVB ZERO, (T1)
+ ADD $1, T1
+aligncheck:
+ AND $7, T1, T3
+ BNE T3, ZERO, align
+
+ // Do eight bytes at a time as long as there is room.
+ ADD $-7, T4, T5
+ JMP wordscheck
+words:
+ MOV ZERO, (T1)
+ ADD $8, T1
+wordscheck:
+ SLTU T5, T1, T3
+ BNE T3, ZERO, words
+
+ JMP outcheck
+out:
+ MOVB ZERO, (T1)
+ ADD $1, T1
+outcheck:
+ BNE T1, T4, out
+
+done:
+ RET
diff --git a/src/runtime/memclr_s390x.s b/src/runtime/memclr_s390x.s
new file mode 100644
index 0000000..fa657ef
--- /dev/null
+++ b/src/runtime/memclr_s390x.s
@@ -0,0 +1,124 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB),NOSPLIT|NOFRAME,$0-16
+ MOVD ptr+0(FP), R4
+ MOVD n+8(FP), R5
+
+start:
+ CMPBLE R5, $3, clear0to3
+ CMPBLE R5, $7, clear4to7
+ CMPBLE R5, $11, clear8to11
+ CMPBLE R5, $15, clear12to15
+ CMP R5, $32
+ BGE clearmt32
+ MOVD $0, 0(R4)
+ MOVD $0, 8(R4)
+ ADD $16, R4
+ SUB $16, R5
+ BR start
+
+clear0to3:
+ CMPBEQ R5, $0, done
+ CMPBNE R5, $1, clear2
+ MOVB $0, 0(R4)
+ RET
+clear2:
+ CMPBNE R5, $2, clear3
+ MOVH $0, 0(R4)
+ RET
+clear3:
+ MOVH $0, 0(R4)
+ MOVB $0, 2(R4)
+ RET
+
+clear4to7:
+ CMPBNE R5, $4, clear5
+ MOVW $0, 0(R4)
+ RET
+clear5:
+ CMPBNE R5, $5, clear6
+ MOVW $0, 0(R4)
+ MOVB $0, 4(R4)
+ RET
+clear6:
+ CMPBNE R5, $6, clear7
+ MOVW $0, 0(R4)
+ MOVH $0, 4(R4)
+ RET
+clear7:
+ MOVW $0, 0(R4)
+ MOVH $0, 4(R4)
+ MOVB $0, 6(R4)
+ RET
+
+clear8to11:
+ CMPBNE R5, $8, clear9
+ MOVD $0, 0(R4)
+ RET
+clear9:
+ CMPBNE R5, $9, clear10
+ MOVD $0, 0(R4)
+ MOVB $0, 8(R4)
+ RET
+clear10:
+ CMPBNE R5, $10, clear11
+ MOVD $0, 0(R4)
+ MOVH $0, 8(R4)
+ RET
+clear11:
+ MOVD $0, 0(R4)
+ MOVH $0, 8(R4)
+ MOVB $0, 10(R4)
+ RET
+
+clear12to15:
+ CMPBNE R5, $12, clear13
+ MOVD $0, 0(R4)
+ MOVW $0, 8(R4)
+ RET
+clear13:
+ CMPBNE R5, $13, clear14
+ MOVD $0, 0(R4)
+ MOVW $0, 8(R4)
+ MOVB $0, 12(R4)
+ RET
+clear14:
+ CMPBNE R5, $14, clear15
+ MOVD $0, 0(R4)
+ MOVW $0, 8(R4)
+ MOVH $0, 12(R4)
+ RET
+clear15:
+ MOVD $0, 0(R4)
+ MOVW $0, 8(R4)
+ MOVH $0, 12(R4)
+ MOVB $0, 14(R4)
+ RET
+
+clearmt32:
+ CMP R5, $256
+ BLT clearlt256
+ XC $256, 0(R4), 0(R4)
+ ADD $256, R4
+ ADD $-256, R5
+ BR clearmt32
+clearlt256:
+ CMPBEQ R5, $0, done
+ ADD $-1, R5
+ EXRL $memclr_exrl_xc<>(SB), R5
+done:
+ RET
+
+// DO NOT CALL - target for exrl (execute relative long) instruction.
+TEXT memclr_exrl_xc<>(SB),NOSPLIT|NOFRAME,$0-0
+ XC $1, 0(R4), 0(R4)
+ MOVD $0, 0(R0)
+ RET
+
diff --git a/src/runtime/memclr_wasm.s b/src/runtime/memclr_wasm.s
new file mode 100644
index 0000000..5a05304
--- /dev/null
+++ b/src/runtime/memclr_wasm.s
@@ -0,0 +1,39 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB), NOSPLIT, $0-16
+ MOVD ptr+0(FP), R0
+ MOVD n+8(FP), R1
+
+loop:
+ Loop
+ Get R1
+ I64Eqz
+ If
+ RET
+ End
+
+ Get R0
+ I32WrapI64
+ I64Const $0
+ I64Store8 $0
+
+ Get R0
+ I64Const $1
+ I64Add
+ Set R0
+
+ Get R1
+ I64Const $1
+ I64Sub
+ Set R1
+
+ Br loop
+ End
+ UNDEF
diff --git a/src/runtime/memmove_386.s b/src/runtime/memmove_386.s
new file mode 100644
index 0000000..d99546c
--- /dev/null
+++ b/src/runtime/memmove_386.s
@@ -0,0 +1,203 @@
+// Inferno's libkern/memmove-386.s
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/memmove-386.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+// +build !plan9
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT, $0-12
+ MOVL to+0(FP), DI
+ MOVL from+4(FP), SI
+ MOVL n+8(FP), BX
+
+ // REP instructions have a high startup cost, so we handle small sizes
+ // with some straightline code. The REP MOVSL instruction is really fast
+ // for large sizes. The cutover is approximately 1K. We implement up to
+ // 128 because that is the maximum SSE register load (loading all data
+ // into registers lets us ignore copy direction).
+tail:
+ // BSR+branch table make almost all memmove/memclr benchmarks worse. Not worth doing.
+ TESTL BX, BX
+ JEQ move_0
+ CMPL BX, $2
+ JBE move_1or2
+ CMPL BX, $4
+ JB move_3
+ JE move_4
+ CMPL BX, $8
+ JBE move_5through8
+ CMPL BX, $16
+ JBE move_9through16
+ CMPB internal∕cpu·X86+const_offsetX86HasSSE2(SB), $1
+ JNE nosse2
+ CMPL BX, $32
+ JBE move_17through32
+ CMPL BX, $64
+ JBE move_33through64
+ CMPL BX, $128
+ JBE move_65through128
+
+nosse2:
+/*
+ * check and set for backwards
+ */
+ CMPL SI, DI
+ JLS back
+
+/*
+ * forward copy loop
+ */
+forward:
+ // If REP MOVSB isn't fast, don't use it
+ CMPB internal∕cpu·X86+const_offsetX86HasERMS(SB), $1 // enhanced REP MOVSB/STOSB
+ JNE fwdBy4
+
+ // Check alignment
+ MOVL SI, AX
+ ORL DI, AX
+ TESTL $3, AX
+ JEQ fwdBy4
+
+ // Do 1 byte at a time
+ MOVL BX, CX
+ REP; MOVSB
+ RET
+
+fwdBy4:
+ // Do 4 bytes at a time
+ MOVL BX, CX
+ SHRL $2, CX
+ ANDL $3, BX
+ REP; MOVSL
+ JMP tail
+
+/*
+ * check overlap
+ */
+back:
+ MOVL SI, CX
+ ADDL BX, CX
+ CMPL CX, DI
+ JLS forward
+/*
+ * whole thing backwards has
+ * adjusted addresses
+ */
+
+ ADDL BX, DI
+ ADDL BX, SI
+ STD
+
+/*
+ * copy
+ */
+ MOVL BX, CX
+ SHRL $2, CX
+ ANDL $3, BX
+
+ SUBL $4, DI
+ SUBL $4, SI
+ REP; MOVSL
+
+ CLD
+ ADDL $4, DI
+ ADDL $4, SI
+ SUBL BX, DI
+ SUBL BX, SI
+ JMP tail
+
+move_1or2:
+ MOVB (SI), AX
+ MOVB -1(SI)(BX*1), CX
+ MOVB AX, (DI)
+ MOVB CX, -1(DI)(BX*1)
+ RET
+move_0:
+ RET
+move_3:
+ MOVW (SI), AX
+ MOVB 2(SI), CX
+ MOVW AX, (DI)
+ MOVB CX, 2(DI)
+ RET
+move_4:
+ // We need a separate case for 4 to make sure we write pointers atomically.
+ MOVL (SI), AX
+ MOVL AX, (DI)
+ RET
+move_5through8:
+ MOVL (SI), AX
+ MOVL -4(SI)(BX*1), CX
+ MOVL AX, (DI)
+ MOVL CX, -4(DI)(BX*1)
+ RET
+move_9through16:
+ MOVL (SI), AX
+ MOVL 4(SI), CX
+ MOVL -8(SI)(BX*1), DX
+ MOVL -4(SI)(BX*1), BP
+ MOVL AX, (DI)
+ MOVL CX, 4(DI)
+ MOVL DX, -8(DI)(BX*1)
+ MOVL BP, -4(DI)(BX*1)
+ RET
+move_17through32:
+ MOVOU (SI), X0
+ MOVOU -16(SI)(BX*1), X1
+ MOVOU X0, (DI)
+ MOVOU X1, -16(DI)(BX*1)
+ RET
+move_33through64:
+ MOVOU (SI), X0
+ MOVOU 16(SI), X1
+ MOVOU -32(SI)(BX*1), X2
+ MOVOU -16(SI)(BX*1), X3
+ MOVOU X0, (DI)
+ MOVOU X1, 16(DI)
+ MOVOU X2, -32(DI)(BX*1)
+ MOVOU X3, -16(DI)(BX*1)
+ RET
+move_65through128:
+ MOVOU (SI), X0
+ MOVOU 16(SI), X1
+ MOVOU 32(SI), X2
+ MOVOU 48(SI), X3
+ MOVOU -64(SI)(BX*1), X4
+ MOVOU -48(SI)(BX*1), X5
+ MOVOU -32(SI)(BX*1), X6
+ MOVOU -16(SI)(BX*1), X7
+ MOVOU X0, (DI)
+ MOVOU X1, 16(DI)
+ MOVOU X2, 32(DI)
+ MOVOU X3, 48(DI)
+ MOVOU X4, -64(DI)(BX*1)
+ MOVOU X5, -48(DI)(BX*1)
+ MOVOU X6, -32(DI)(BX*1)
+ MOVOU X7, -16(DI)(BX*1)
+ RET
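
The back/forward checks in memmove_386.s decide the copy direction: a backward copy is only needed when the destination starts inside the source range, i.e. src <= dst < src+n; otherwise the forward path is safe even if the regions touch. A compact Go sketch of that decision, returning a label rather than copying anything:

package sketch

import "unsafe"

// chooseDirection mirrors the CMPL SI, DI / "check overlap" logic above:
// backward only when dst lands inside [src, src+n), forward in every other case.
func chooseDirection(dst, src unsafe.Pointer, n uintptr) string {
	d, s := uintptr(dst), uintptr(src)
	if s > d || s+n <= d {
		return "forward"
	}
	return "backward"
}
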
diff --git a/src/runtime/memmove_amd64.s b/src/runtime/memmove_amd64.s
new file mode 100644
index 0000000..d91641a
--- /dev/null
+++ b/src/runtime/memmove_amd64.s
@@ -0,0 +1,525 @@
+// Derived from Inferno's libkern/memmove-386.s (adapted for amd64)
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/memmove-386.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+// +build !plan9
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT, $0-24
+
+ MOVQ to+0(FP), DI
+ MOVQ from+8(FP), SI
+ MOVQ n+16(FP), BX
+
+ // REP instructions have a high startup cost, so we handle small sizes
+ // with some straightline code. The REP MOVSQ instruction is really fast
+ // for large sizes. The cutover is approximately 2K.
+tail:
+ // move_129through256 or smaller work whether or not the source and the
+ // destination memory regions overlap because they load all data into
+ // registers before writing it back. move_256through2048 on the other
+ // hand can be used only when the memory regions don't overlap or the copy
+ // direction is forward.
+ //
+ // BSR+branch table make almost all memmove/memclr benchmarks worse. Not worth doing.
+ TESTQ BX, BX
+ JEQ move_0
+ CMPQ BX, $2
+ JBE move_1or2
+ CMPQ BX, $4
+ JB move_3
+ JBE move_4
+ CMPQ BX, $8
+ JB move_5through7
+ JE move_8
+ CMPQ BX, $16
+ JBE move_9through16
+ CMPQ BX, $32
+ JBE move_17through32
+ CMPQ BX, $64
+ JBE move_33through64
+ CMPQ BX, $128
+ JBE move_65through128
+ CMPQ BX, $256
+ JBE move_129through256
+
+ TESTB $1, runtime·useAVXmemmove(SB)
+ JNZ avxUnaligned
+
+/*
+ * check and set for backwards
+ */
+ CMPQ SI, DI
+ JLS back
+
+/*
+ * forward copy loop
+ */
+forward:
+ CMPQ BX, $2048
+ JLS move_256through2048
+
+ // If REP MOVSB isn't fast, don't use it
+ CMPB internal∕cpu·X86+const_offsetX86HasERMS(SB), $1 // enhanced REP MOVSB/STOSB
+ JNE fwdBy8
+
+ // Check alignment
+ MOVL SI, AX
+ ORL DI, AX
+ TESTL $7, AX
+ JEQ fwdBy8
+
+ // Do 1 byte at a time
+ MOVQ BX, CX
+ REP; MOVSB
+ RET
+
+fwdBy8:
+ // Do 8 bytes at a time
+ MOVQ BX, CX
+ SHRQ $3, CX
+ ANDQ $7, BX
+ REP; MOVSQ
+ JMP tail
+
+back:
+/*
+ * check overlap
+ */
+ MOVQ SI, CX
+ ADDQ BX, CX
+ CMPQ CX, DI
+ JLS forward
+/*
+ * whole thing backwards has
+ * adjusted addresses
+ */
+ ADDQ BX, DI
+ ADDQ BX, SI
+ STD
+
+/*
+ * copy
+ */
+ MOVQ BX, CX
+ SHRQ $3, CX
+ ANDQ $7, BX
+
+ SUBQ $8, DI
+ SUBQ $8, SI
+ REP; MOVSQ
+
+ CLD
+ ADDQ $8, DI
+ ADDQ $8, SI
+ SUBQ BX, DI
+ SUBQ BX, SI
+ JMP tail
+
+move_1or2:
+ MOVB (SI), AX
+ MOVB -1(SI)(BX*1), CX
+ MOVB AX, (DI)
+ MOVB CX, -1(DI)(BX*1)
+ RET
+move_0:
+ RET
+move_4:
+ MOVL (SI), AX
+ MOVL AX, (DI)
+ RET
+move_3:
+ MOVW (SI), AX
+ MOVB 2(SI), CX
+ MOVW AX, (DI)
+ MOVB CX, 2(DI)
+ RET
+move_5through7:
+ MOVL (SI), AX
+ MOVL -4(SI)(BX*1), CX
+ MOVL AX, (DI)
+ MOVL CX, -4(DI)(BX*1)
+ RET
+move_8:
+ // We need a separate case for 8 to make sure we write pointers atomically.
+ MOVQ (SI), AX
+ MOVQ AX, (DI)
+ RET
+move_9through16:
+ MOVQ (SI), AX
+ MOVQ -8(SI)(BX*1), CX
+ MOVQ AX, (DI)
+ MOVQ CX, -8(DI)(BX*1)
+ RET
+move_17through32:
+ MOVOU (SI), X0
+ MOVOU -16(SI)(BX*1), X1
+ MOVOU X0, (DI)
+ MOVOU X1, -16(DI)(BX*1)
+ RET
+move_33through64:
+ MOVOU (SI), X0
+ MOVOU 16(SI), X1
+ MOVOU -32(SI)(BX*1), X2
+ MOVOU -16(SI)(BX*1), X3
+ MOVOU X0, (DI)
+ MOVOU X1, 16(DI)
+ MOVOU X2, -32(DI)(BX*1)
+ MOVOU X3, -16(DI)(BX*1)
+ RET
+move_65through128:
+ MOVOU (SI), X0
+ MOVOU 16(SI), X1
+ MOVOU 32(SI), X2
+ MOVOU 48(SI), X3
+ MOVOU -64(SI)(BX*1), X4
+ MOVOU -48(SI)(BX*1), X5
+ MOVOU -32(SI)(BX*1), X6
+ MOVOU -16(SI)(BX*1), X7
+ MOVOU X0, (DI)
+ MOVOU X1, 16(DI)
+ MOVOU X2, 32(DI)
+ MOVOU X3, 48(DI)
+ MOVOU X4, -64(DI)(BX*1)
+ MOVOU X5, -48(DI)(BX*1)
+ MOVOU X6, -32(DI)(BX*1)
+ MOVOU X7, -16(DI)(BX*1)
+ RET
+move_129through256:
+ MOVOU (SI), X0
+ MOVOU 16(SI), X1
+ MOVOU 32(SI), X2
+ MOVOU 48(SI), X3
+ MOVOU 64(SI), X4
+ MOVOU 80(SI), X5
+ MOVOU 96(SI), X6
+ MOVOU 112(SI), X7
+ MOVOU -128(SI)(BX*1), X8
+ MOVOU -112(SI)(BX*1), X9
+ MOVOU -96(SI)(BX*1), X10
+ MOVOU -80(SI)(BX*1), X11
+ MOVOU -64(SI)(BX*1), X12
+ MOVOU -48(SI)(BX*1), X13
+ MOVOU -32(SI)(BX*1), X14
+ MOVOU -16(SI)(BX*1), X15
+ MOVOU X0, (DI)
+ MOVOU X1, 16(DI)
+ MOVOU X2, 32(DI)
+ MOVOU X3, 48(DI)
+ MOVOU X4, 64(DI)
+ MOVOU X5, 80(DI)
+ MOVOU X6, 96(DI)
+ MOVOU X7, 112(DI)
+ MOVOU X8, -128(DI)(BX*1)
+ MOVOU X9, -112(DI)(BX*1)
+ MOVOU X10, -96(DI)(BX*1)
+ MOVOU X11, -80(DI)(BX*1)
+ MOVOU X12, -64(DI)(BX*1)
+ MOVOU X13, -48(DI)(BX*1)
+ MOVOU X14, -32(DI)(BX*1)
+ MOVOU X15, -16(DI)(BX*1)
+ RET
+move_256through2048:
+ SUBQ $256, BX
+ MOVOU (SI), X0
+ MOVOU 16(SI), X1
+ MOVOU 32(SI), X2
+ MOVOU 48(SI), X3
+ MOVOU 64(SI), X4
+ MOVOU 80(SI), X5
+ MOVOU 96(SI), X6
+ MOVOU 112(SI), X7
+ MOVOU 128(SI), X8
+ MOVOU 144(SI), X9
+ MOVOU 160(SI), X10
+ MOVOU 176(SI), X11
+ MOVOU 192(SI), X12
+ MOVOU 208(SI), X13
+ MOVOU 224(SI), X14
+ MOVOU 240(SI), X15
+ MOVOU X0, (DI)
+ MOVOU X1, 16(DI)
+ MOVOU X2, 32(DI)
+ MOVOU X3, 48(DI)
+ MOVOU X4, 64(DI)
+ MOVOU X5, 80(DI)
+ MOVOU X6, 96(DI)
+ MOVOU X7, 112(DI)
+ MOVOU X8, 128(DI)
+ MOVOU X9, 144(DI)
+ MOVOU X10, 160(DI)
+ MOVOU X11, 176(DI)
+ MOVOU X12, 192(DI)
+ MOVOU X13, 208(DI)
+ MOVOU X14, 224(DI)
+ MOVOU X15, 240(DI)
+ CMPQ BX, $256
+ LEAQ 256(SI), SI
+ LEAQ 256(DI), DI
+ JGE move_256through2048
+ JMP tail
+
+avxUnaligned:
+	// There are two implementations of the move algorithm:
+	// one for non-overlapping memory regions, which copies forward,
+	// and one for overlapping regions, which copies backward.
+ MOVQ DI, CX
+ SUBQ SI, CX
+ // Now CX contains distance between SRC and DEST
+ CMPQ CX, BX
+	// If the distance is less than the region length, the regions overlap
+ JC copy_backward
+
+ // Non-temporal copy would be better for big sizes.
+ CMPQ BX, $0x100000
+ JAE gobble_big_data_fwd
+
+ // Memory layout on the source side
+ // SI CX
+ // |<---------BX before correction--------->|
+ // | |<--BX corrected-->| |
+ // | | |<--- AX --->|
+ // |<-R11->| |<-128 bytes->|
+ // +----------------------------------------+
+ // | Head | Body | Tail |
+ // +-------+------------------+-------------+
+ // ^ ^ ^
+ // | | |
+ // Save head into Y4 Save tail into X5..X12
+ // |
+ // SI+R11, where R11 = ((DI & -32) + 32) - DI
+ // Algorithm:
+ // 1. Unaligned save of the tail's 128 bytes
+ // 2. Unaligned save of the head's 32 bytes
+ // 3. Destination-aligned copying of body (128 bytes per iteration)
+ // 4. Put head on the new place
+ // 5. Put the tail on the new place
+	// For small sizes it can be important to satisfy the processor's pipeline,
+	// because the cost of copying the unaligned head and tail is comparable
+	// with the cost of the main loop, so the code is slightly interleaved here.
+	// A cleaner implementation of the same algorithm exists for bigger sizes,
+	// where the cost of copying the unaligned parts is negligible.
+	// You can see it after the gobble_big_data_fwd label.
+ LEAQ (SI)(BX*1), CX
+ MOVQ DI, R10
+	// CX points to the end of the buffer, so we need to go back slightly. We will use negative offsets there.
+ MOVOU -0x80(CX), X5
+ MOVOU -0x70(CX), X6
+ MOVQ $0x80, AX
+ // Align destination address
+ ANDQ $-32, DI
+ ADDQ $32, DI
+ // Continue tail saving.
+ MOVOU -0x60(CX), X7
+ MOVOU -0x50(CX), X8
+ // Make R11 delta between aligned and unaligned destination addresses.
+ MOVQ DI, R11
+ SUBQ R10, R11
+ // Continue tail saving.
+ MOVOU -0x40(CX), X9
+ MOVOU -0x30(CX), X10
+	// Adjust the bytes-to-copy value, since the unaligned head has now been prepared for copying.
+ SUBQ R11, BX
+ // Continue tail saving.
+ MOVOU -0x20(CX), X11
+ MOVOU -0x10(CX), X12
+ // The tail will be put on its place after main body copying.
+ // It's time for the unaligned heading part.
+ VMOVDQU (SI), Y4
+ // Adjust source address to point past head.
+ ADDQ R11, SI
+ SUBQ AX, BX
+ // Aligned memory copying there
+gobble_128_loop:
+ VMOVDQU (SI), Y0
+ VMOVDQU 0x20(SI), Y1
+ VMOVDQU 0x40(SI), Y2
+ VMOVDQU 0x60(SI), Y3
+ ADDQ AX, SI
+ VMOVDQA Y0, (DI)
+ VMOVDQA Y1, 0x20(DI)
+ VMOVDQA Y2, 0x40(DI)
+ VMOVDQA Y3, 0x60(DI)
+ ADDQ AX, DI
+ SUBQ AX, BX
+ JA gobble_128_loop
+ // Now we can store unaligned parts.
+ ADDQ AX, BX
+ ADDQ DI, BX
+ VMOVDQU Y4, (R10)
+ VZEROUPPER
+ MOVOU X5, -0x80(BX)
+ MOVOU X6, -0x70(BX)
+ MOVOU X7, -0x60(BX)
+ MOVOU X8, -0x50(BX)
+ MOVOU X9, -0x40(BX)
+ MOVOU X10, -0x30(BX)
+ MOVOU X11, -0x20(BX)
+ MOVOU X12, -0x10(BX)
+ RET
+
+gobble_big_data_fwd:
+ // There is forward copying for big regions.
+ // It uses non-temporal mov instructions.
+ // Details of this algorithm are commented previously for small sizes.
+ LEAQ (SI)(BX*1), CX
+ MOVOU -0x80(SI)(BX*1), X5
+ MOVOU -0x70(CX), X6
+ MOVOU -0x60(CX), X7
+ MOVOU -0x50(CX), X8
+ MOVOU -0x40(CX), X9
+ MOVOU -0x30(CX), X10
+ MOVOU -0x20(CX), X11
+ MOVOU -0x10(CX), X12
+ VMOVDQU (SI), Y4
+ MOVQ DI, R8
+ ANDQ $-32, DI
+ ADDQ $32, DI
+ MOVQ DI, R10
+ SUBQ R8, R10
+ SUBQ R10, BX
+ ADDQ R10, SI
+ LEAQ (DI)(BX*1), CX
+ SUBQ $0x80, BX
+gobble_mem_fwd_loop:
+ PREFETCHNTA 0x1C0(SI)
+ PREFETCHNTA 0x280(SI)
+ // Prefetch values were chosen empirically.
+ // Approach for prefetch usage as in 7.6.6 of [1]
+ // [1] 64-ia-32-architectures-optimization-manual.pdf
+ // https://www.intel.ru/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
+ VMOVDQU (SI), Y0
+ VMOVDQU 0x20(SI), Y1
+ VMOVDQU 0x40(SI), Y2
+ VMOVDQU 0x60(SI), Y3
+ ADDQ $0x80, SI
+ VMOVNTDQ Y0, (DI)
+ VMOVNTDQ Y1, 0x20(DI)
+ VMOVNTDQ Y2, 0x40(DI)
+ VMOVNTDQ Y3, 0x60(DI)
+ ADDQ $0x80, DI
+ SUBQ $0x80, BX
+ JA gobble_mem_fwd_loop
+ // NT instructions don't follow the normal cache-coherency rules.
+	// We need an SFENCE here to make the copied data visible in a timely manner.
+ SFENCE
+ VMOVDQU Y4, (R8)
+ VZEROUPPER
+ MOVOU X5, -0x80(CX)
+ MOVOU X6, -0x70(CX)
+ MOVOU X7, -0x60(CX)
+ MOVOU X8, -0x50(CX)
+ MOVOU X9, -0x40(CX)
+ MOVOU X10, -0x30(CX)
+ MOVOU X11, -0x20(CX)
+ MOVOU X12, -0x10(CX)
+ RET
+
+copy_backward:
+ MOVQ DI, AX
+ // Backward copying is about the same as the forward one.
+ // Firstly we load unaligned tail in the beginning of region.
+ MOVOU (SI), X5
+ MOVOU 0x10(SI), X6
+ ADDQ BX, DI
+ MOVOU 0x20(SI), X7
+ MOVOU 0x30(SI), X8
+ LEAQ -0x20(DI), R10
+ MOVQ DI, R11
+ MOVOU 0x40(SI), X9
+ MOVOU 0x50(SI), X10
+ ANDQ $0x1F, R11
+ MOVOU 0x60(SI), X11
+ MOVOU 0x70(SI), X12
+ XORQ R11, DI
+ // Let's point SI to the end of region
+ ADDQ BX, SI
+ // and load unaligned head into X4.
+ VMOVDQU -0x20(SI), Y4
+ SUBQ R11, SI
+ SUBQ R11, BX
+ // If there is enough data for non-temporal moves go to special loop
+ CMPQ BX, $0x100000
+ JA gobble_big_data_bwd
+ SUBQ $0x80, BX
+gobble_mem_bwd_loop:
+ VMOVDQU -0x20(SI), Y0
+ VMOVDQU -0x40(SI), Y1
+ VMOVDQU -0x60(SI), Y2
+ VMOVDQU -0x80(SI), Y3
+ SUBQ $0x80, SI
+ VMOVDQA Y0, -0x20(DI)
+ VMOVDQA Y1, -0x40(DI)
+ VMOVDQA Y2, -0x60(DI)
+ VMOVDQA Y3, -0x80(DI)
+ SUBQ $0x80, DI
+ SUBQ $0x80, BX
+ JA gobble_mem_bwd_loop
+ // Let's store unaligned data
+ VMOVDQU Y4, (R10)
+ VZEROUPPER
+ MOVOU X5, (AX)
+ MOVOU X6, 0x10(AX)
+ MOVOU X7, 0x20(AX)
+ MOVOU X8, 0x30(AX)
+ MOVOU X9, 0x40(AX)
+ MOVOU X10, 0x50(AX)
+ MOVOU X11, 0x60(AX)
+ MOVOU X12, 0x70(AX)
+ RET
+
+gobble_big_data_bwd:
+ SUBQ $0x80, BX
+gobble_big_mem_bwd_loop:
+ PREFETCHNTA -0x1C0(SI)
+ PREFETCHNTA -0x280(SI)
+ VMOVDQU -0x20(SI), Y0
+ VMOVDQU -0x40(SI), Y1
+ VMOVDQU -0x60(SI), Y2
+ VMOVDQU -0x80(SI), Y3
+ SUBQ $0x80, SI
+ VMOVNTDQ Y0, -0x20(DI)
+ VMOVNTDQ Y1, -0x40(DI)
+ VMOVNTDQ Y2, -0x60(DI)
+ VMOVNTDQ Y3, -0x80(DI)
+ SUBQ $0x80, DI
+ SUBQ $0x80, BX
+ JA gobble_big_mem_bwd_loop
+ SFENCE
+ VMOVDQU Y4, (R10)
+ VZEROUPPER
+ MOVOU X5, (AX)
+ MOVOU X6, 0x10(AX)
+ MOVOU X7, 0x20(AX)
+ MOVOU X8, 0x30(AX)
+ MOVOU X9, 0x40(AX)
+ MOVOU X10, 0x50(AX)
+ MOVOU X11, 0x60(AX)
+ MOVOU X12, 0x70(AX)
+ RET
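
The avxUnaligned path above follows the five steps listed before gobble_128_loop: save the unaligned tail and head into registers, copy the destination-aligned body, then drop head and tail into place. The Go sketch below mirrors that ordering with plain copy standing in for the vector moves; boundaries are simplified (head fixed at 32 bytes, tail at 128, whereas the assembly sizes the head by the destination's 32-byte misalignment), and it assumes len(dst) == len(src) >= 256 with non-overlapping slices.

package sketch

// headBodyTail mirrors the step ordering documented before gobble_128_loop.
func headBodyTail(dst, src []byte) {
	n := len(src)
	var tail [128]byte
	copy(tail[:], src[n-128:]) // 1. unaligned save of the tail (X5..X12)
	var head [32]byte
	copy(head[:], src[:32]) // 2. unaligned save of the head (Y4)
	copy(dst[32:n-128], src[32:n-128]) // 3. aligned body copy, 128 bytes per iteration
	copy(dst[:32], head[:])    // 4. put the head in place
	copy(dst[n-128:], tail[:]) // 5. put the tail in place
}
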
diff --git a/src/runtime/memmove_arm.s b/src/runtime/memmove_arm.s
new file mode 100644
index 0000000..43d53fa
--- /dev/null
+++ b/src/runtime/memmove_arm.s
@@ -0,0 +1,264 @@
+// Inferno's libkern/memmove-arm.s
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/memmove-arm.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+#include "textflag.h"
+
+// TE or TS are spilled to the stack during bulk register moves.
+#define TS R0
+#define TE R8
+
+// Warning: the linker will use R11 to synthesize certain instructions. Please
+// take care and double check with objdump.
+#define FROM R11
+#define N R12
+#define TMP R12 /* N and TMP don't overlap */
+#define TMP1 R5
+
+#define RSHIFT R5
+#define LSHIFT R6
+#define OFFSET R7
+
+#define BR0 R0 /* shared with TS */
+#define BW0 R1
+#define BR1 R1
+#define BW1 R2
+#define BR2 R2
+#define BW2 R3
+#define BR3 R3
+#define BW3 R4
+
+#define FW0 R1
+#define FR0 R2
+#define FW1 R2
+#define FR1 R3
+#define FW2 R3
+#define FR2 R4
+#define FW3 R4
+#define FR3 R8 /* shared with TE */
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT, $4-12
+_memmove:
+ MOVW to+0(FP), TS
+ MOVW from+4(FP), FROM
+ MOVW n+8(FP), N
+
+ ADD N, TS, TE /* to end pointer */
+
+ CMP FROM, TS
+ BLS _forward
+
+_back:
+ ADD N, FROM /* from end pointer */
+ CMP $4, N /* need at least 4 bytes to copy */
+ BLT _b1tail
+
+_b4align: /* align destination on 4 */
+ AND.S $3, TE, TMP
+ BEQ _b4aligned
+
+ MOVBU.W -1(FROM), TMP /* pre-indexed */
+ MOVBU.W TMP, -1(TE) /* pre-indexed */
+ B _b4align
+
+_b4aligned: /* is source now aligned? */
+ AND.S $3, FROM, TMP
+ BNE _bunaligned
+
+ ADD $31, TS, TMP /* do 32-byte chunks if possible */
+ MOVW TS, savedts-4(SP)
+_b32loop:
+ CMP TMP, TE
+ BLS _b4tail
+
+ MOVM.DB.W (FROM), [R0-R7]
+ MOVM.DB.W [R0-R7], (TE)
+ B _b32loop
+
+_b4tail: /* do remaining words if possible */
+ MOVW savedts-4(SP), TS
+ ADD $3, TS, TMP
+_b4loop:
+ CMP TMP, TE
+ BLS _b1tail
+
+ MOVW.W -4(FROM), TMP1 /* pre-indexed */
+ MOVW.W TMP1, -4(TE) /* pre-indexed */
+ B _b4loop
+
+_b1tail: /* remaining bytes */
+ CMP TE, TS
+ BEQ _return
+
+ MOVBU.W -1(FROM), TMP /* pre-indexed */
+ MOVBU.W TMP, -1(TE) /* pre-indexed */
+ B _b1tail
+
+_forward:
+ CMP $4, N /* need at least 4 bytes to copy */
+ BLT _f1tail
+
+_f4align: /* align destination on 4 */
+ AND.S $3, TS, TMP
+ BEQ _f4aligned
+
+ MOVBU.P 1(FROM), TMP /* implicit write back */
+ MOVBU.P TMP, 1(TS) /* implicit write back */
+ B _f4align
+
+_f4aligned: /* is source now aligned? */
+ AND.S $3, FROM, TMP
+ BNE _funaligned
+
+ SUB $31, TE, TMP /* do 32-byte chunks if possible */
+ MOVW TE, savedte-4(SP)
+_f32loop:
+ CMP TMP, TS
+ BHS _f4tail
+
+ MOVM.IA.W (FROM), [R1-R8]
+ MOVM.IA.W [R1-R8], (TS)
+ B _f32loop
+
+_f4tail:
+ MOVW savedte-4(SP), TE
+ SUB $3, TE, TMP /* do remaining words if possible */
+_f4loop:
+ CMP TMP, TS
+ BHS _f1tail
+
+ MOVW.P 4(FROM), TMP1 /* implicit write back */
+ MOVW.P TMP1, 4(TS) /* implicit write back */
+ B _f4loop
+
+_f1tail:
+ CMP TS, TE
+ BEQ _return
+
+ MOVBU.P 1(FROM), TMP /* implicit write back */
+ MOVBU.P TMP, 1(TS) /* implicit write back */
+ B _f1tail
+
+_return:
+ MOVW to+0(FP), R0
+ RET
+
+_bunaligned:
+ CMP $2, TMP /* is TMP < 2 ? */
+
+ MOVW.LT $8, RSHIFT /* (R(n)<<24)|(R(n-1)>>8) */
+ MOVW.LT $24, LSHIFT
+ MOVW.LT $1, OFFSET
+
+ MOVW.EQ $16, RSHIFT /* (R(n)<<16)|(R(n-1)>>16) */
+ MOVW.EQ $16, LSHIFT
+ MOVW.EQ $2, OFFSET
+
+ MOVW.GT $24, RSHIFT /* (R(n)<<8)|(R(n-1)>>24) */
+ MOVW.GT $8, LSHIFT
+ MOVW.GT $3, OFFSET
+
+ ADD $16, TS, TMP /* do 16-byte chunks if possible */
+ CMP TMP, TE
+ BLS _b1tail
+
+ BIC $3, FROM /* align source */
+ MOVW TS, savedts-4(SP)
+ MOVW (FROM), BR0 /* prime first block register */
+
+_bu16loop:
+ CMP TMP, TE
+ BLS _bu1tail
+
+ MOVW BR0<<LSHIFT, BW3
+ MOVM.DB.W (FROM), [BR0-BR3]
+ ORR BR3>>RSHIFT, BW3
+
+ MOVW BR3<<LSHIFT, BW2
+ ORR BR2>>RSHIFT, BW2
+
+ MOVW BR2<<LSHIFT, BW1
+ ORR BR1>>RSHIFT, BW1
+
+ MOVW BR1<<LSHIFT, BW0
+ ORR BR0>>RSHIFT, BW0
+
+ MOVM.DB.W [BW0-BW3], (TE)
+ B _bu16loop
+
+_bu1tail:
+ MOVW savedts-4(SP), TS
+ ADD OFFSET, FROM
+ B _b1tail
+
+_funaligned:
+ CMP $2, TMP
+
+ MOVW.LT $8, RSHIFT /* (R(n+1)<<24)|(R(n)>>8) */
+ MOVW.LT $24, LSHIFT
+ MOVW.LT $3, OFFSET
+
+ MOVW.EQ $16, RSHIFT /* (R(n+1)<<16)|(R(n)>>16) */
+ MOVW.EQ $16, LSHIFT
+ MOVW.EQ $2, OFFSET
+
+ MOVW.GT $24, RSHIFT /* (R(n+1)<<8)|(R(n)>>24) */
+ MOVW.GT $8, LSHIFT
+ MOVW.GT $1, OFFSET
+
+ SUB $16, TE, TMP /* do 16-byte chunks if possible */
+ CMP TMP, TS
+ BHS _f1tail
+
+ BIC $3, FROM /* align source */
+ MOVW TE, savedte-4(SP)
+ MOVW.P 4(FROM), FR3 /* prime last block register, implicit write back */
+
+_fu16loop:
+ CMP TMP, TS
+ BHS _fu1tail
+
+ MOVW FR3>>RSHIFT, FW0
+ MOVM.IA.W (FROM), [FR0,FR1,FR2,FR3]
+ ORR FR0<<LSHIFT, FW0
+
+ MOVW FR0>>RSHIFT, FW1
+ ORR FR1<<LSHIFT, FW1
+
+ MOVW FR1>>RSHIFT, FW2
+ ORR FR2<<LSHIFT, FW2
+
+ MOVW FR2>>RSHIFT, FW3
+ ORR FR3<<LSHIFT, FW3
+
+ MOVM.IA.W [FW0,FW1,FW2,FW3], (TS)
+ B _fu16loop
+
+_fu1tail:
+ MOVW savedte-4(SP), TE
+ SUB OFFSET, FROM
+ B _f1tail
diff --git a/src/runtime/memmove_arm64.s b/src/runtime/memmove_arm64.s
new file mode 100644
index 0000000..43d2762
--- /dev/null
+++ b/src/runtime/memmove_arm64.s
@@ -0,0 +1,241 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// Register map
+//
+// dstin R0
+// src R1
+// count R2
+// dst R3 (same as R0, but gets modified in unaligned cases)
+// srcend R4
+// dstend R5
+// data R6-R17
+// tmp1 R14
+
+// Copies are split into 3 main cases: small copies of up to 32 bytes, medium
+// copies of up to 128 bytes, and large copies. The overhead of the overlap
+// check is negligible since it is only required for large copies.
+//
+// Large copies use a software pipelined loop processing 64 bytes per iteration.
+// The destination pointer is 16-byte aligned to minimize unaligned accesses.
+// The loop tail is handled by always copying 64 bytes from the end.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT|NOFRAME, $0-24
+ MOVD to+0(FP), R0
+ MOVD from+8(FP), R1
+ MOVD n+16(FP), R2
+ CBZ R2, copy0
+
+ // Small copies: 1..16 bytes
+ CMP $16, R2
+ BLE copy16
+
+ // Large copies
+ CMP $128, R2
+ BHI copy_long
+ CMP $32, R2
+ BHI copy32_128
+
+ // Small copies: 17..32 bytes.
+ LDP (R1), (R6, R7)
+ ADD R1, R2, R4 // R4 points just past the last source byte
+ LDP -16(R4), (R12, R13)
+ STP (R6, R7), (R0)
+ ADD R0, R2, R5 // R5 points just past the last destination byte
+ STP (R12, R13), -16(R5)
+ RET
+
+// Small copies: 1..16 bytes.
+copy16:
+ ADD R1, R2, R4 // R4 points just past the last source byte
+ ADD R0, R2, R5 // R5 points just past the last destination byte
+ CMP $8, R2
+ BLT copy7
+ MOVD (R1), R6
+ MOVD -8(R4), R7
+ MOVD R6, (R0)
+ MOVD R7, -8(R5)
+ RET
+
+copy7:
+ TBZ $2, R2, copy3
+ MOVWU (R1), R6
+ MOVWU -4(R4), R7
+ MOVW R6, (R0)
+ MOVW R7, -4(R5)
+ RET
+
+copy3:
+ TBZ $1, R2, copy1
+ MOVHU (R1), R6
+ MOVHU -2(R4), R7
+ MOVH R6, (R0)
+ MOVH R7, -2(R5)
+ RET
+
+copy1:
+ MOVBU (R1), R6
+ MOVB R6, (R0)
+
+copy0:
+ RET
+
+ // Medium copies: 33..128 bytes.
+copy32_128:
+ ADD R1, R2, R4 // R4 points just past the last source byte
+ ADD R0, R2, R5 // R5 points just past the last destination byte
+ LDP (R1), (R6, R7)
+ LDP 16(R1), (R8, R9)
+ LDP -32(R4), (R10, R11)
+ LDP -16(R4), (R12, R13)
+ CMP $64, R2
+ BHI copy128
+ STP (R6, R7), (R0)
+ STP (R8, R9), 16(R0)
+ STP (R10, R11), -32(R5)
+ STP (R12, R13), -16(R5)
+ RET
+
+ // Copy 65..128 bytes.
+copy128:
+ LDP 32(R1), (R14, R15)
+ LDP 48(R1), (R16, R17)
+ CMP $96, R2
+ BLS copy96
+ LDP -64(R4), (R2, R3)
+ LDP -48(R4), (R1, R4)
+ STP (R2, R3), -64(R5)
+ STP (R1, R4), -48(R5)
+
+copy96:
+ STP (R6, R7), (R0)
+ STP (R8, R9), 16(R0)
+ STP (R14, R15), 32(R0)
+ STP (R16, R17), 48(R0)
+ STP (R10, R11), -32(R5)
+ STP (R12, R13), -16(R5)
+ RET
+
+ // Copy more than 128 bytes.
+copy_long:
+ ADD R1, R2, R4 // R4 points just past the last source byte
+ ADD R0, R2, R5 // R5 points just past the last destination byte
+ MOVD ZR, R7
+ MOVD ZR, R8
+
+ CMP $1024, R2
+ BLT backward_check
+ // feature detect to decide how to align
+ MOVBU runtime·arm64UseAlignedLoads(SB), R6
+ CBNZ R6, use_aligned_loads
+ MOVD R0, R7
+ MOVD R5, R8
+ B backward_check
+use_aligned_loads:
+ MOVD R1, R7
+ MOVD R4, R8
+ // R7 and R8 are used here for the realignment calculation. In
+ // the use_aligned_loads case, R7 is the src pointer and R8 is
+ // srcend pointer, which is used in the backward copy case.
+ // When doing aligned stores, R7 is the dst pointer and R8 is
+ // the dstend pointer.
+
+backward_check:
+ // Use backward copy if there is an overlap.
+ SUB R1, R0, R14
+ CBZ R14, copy0
+ CMP R2, R14
+ BCC copy_long_backward
+
+ // Copy 16 bytes and then align src (R1) or dst (R0) to 16-byte alignment.
+ LDP (R1), (R12, R13) // Load A
+ AND $15, R7, R14 // Calculate the realignment offset
+ SUB R14, R1, R1
+ SUB R14, R0, R3 // move dst back same amount as src
+ ADD R14, R2, R2
+ LDP 16(R1), (R6, R7) // Load B
+ STP (R12, R13), (R0) // Store A
+ LDP 32(R1), (R8, R9) // Load C
+ LDP 48(R1), (R10, R11) // Load D
+ LDP.W 64(R1), (R12, R13) // Load E
+ // 80 bytes have been loaded; if less than 80+64 bytes remain, copy from the end
+ SUBS $144, R2, R2
+ BLS copy64_from_end
+
+loop64:
+ STP (R6, R7), 16(R3) // Store B
+ LDP 16(R1), (R6, R7) // Load B (next iteration)
+ STP (R8, R9), 32(R3) // Store C
+ LDP 32(R1), (R8, R9) // Load C
+ STP (R10, R11), 48(R3) // Store D
+ LDP 48(R1), (R10, R11) // Load D
+ STP.W (R12, R13), 64(R3) // Store E
+ LDP.W 64(R1), (R12, R13) // Load E
+ SUBS $64, R2, R2
+ BHI loop64
+
+ // Write the last iteration and copy 64 bytes from the end.
+copy64_from_end:
+ LDP -64(R4), (R14, R15) // Load F
+ STP (R6, R7), 16(R3) // Store B
+ LDP -48(R4), (R6, R7) // Load G
+ STP (R8, R9), 32(R3) // Store C
+ LDP -32(R4), (R8, R9) // Load H
+ STP (R10, R11), 48(R3) // Store D
+ LDP -16(R4), (R10, R11) // Load I
+ STP (R12, R13), 64(R3) // Store E
+ STP (R14, R15), -64(R5) // Store F
+ STP (R6, R7), -48(R5) // Store G
+ STP (R8, R9), -32(R5) // Store H
+ STP (R10, R11), -16(R5) // Store I
+ RET
+
+ // Large backward copy for overlapping copies.
+ // Copy 16 bytes and then align srcend (R4) or dstend (R5) to 16-byte alignment.
+copy_long_backward:
+ LDP -16(R4), (R12, R13)
+ AND $15, R8, R14
+ SUB R14, R4, R4
+ SUB R14, R2, R2
+ LDP -16(R4), (R6, R7)
+ STP (R12, R13), -16(R5)
+ LDP -32(R4), (R8, R9)
+ LDP -48(R4), (R10, R11)
+ LDP.W -64(R4), (R12, R13)
+ SUB R14, R5, R5
+ SUBS $128, R2, R2
+ BLS copy64_from_start
+
+loop64_backward:
+ STP (R6, R7), -16(R5)
+ LDP -16(R4), (R6, R7)
+ STP (R8, R9), -32(R5)
+ LDP -32(R4), (R8, R9)
+ STP (R10, R11), -48(R5)
+ LDP -48(R4), (R10, R11)
+ STP.W (R12, R13), -64(R5)
+ LDP.W -64(R4), (R12, R13)
+ SUBS $64, R2, R2
+ BHI loop64_backward
+
+ // Write the last iteration and copy 64 bytes from the start.
+copy64_from_start:
+ LDP 48(R1), (R2, R3)
+ STP (R6, R7), -16(R5)
+ LDP 32(R1), (R6, R7)
+ STP (R8, R9), -32(R5)
+ LDP 16(R1), (R8, R9)
+ STP (R10, R11), -48(R5)
+ LDP (R1), (R10, R11)
+ STP (R12, R13), -64(R5)
+ STP (R2, R3), 48(R0)
+ STP (R6, R7), 32(R0)
+ STP (R8, R9), 16(R0)
+ STP (R10, R11), (R0)
+ RET
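
In the arm64 forward path above, the prologue before loop64 copies 16 bytes unconditionally and then steps both pointers back by the chosen pointer's 16-byte misalignment, so later LDP/STP pairs are aligned while the re-copied bytes were already written correctly. A Go sketch of just that pointer adjustment (values only, no copying); whether src or dst is the one being aligned depends on arm64UseAlignedLoads.

package sketch

// realign mirrors the prologue before loop64: after the first 16 bytes are
// stored, pull src and dst back by the misalignment m and grow n by m,
// accepting that those m bytes are copied a second time.
func realign(dst, src, n uintptr) (newDst, newSrc, newN uintptr) {
	m := src & 15
	return dst - m, src - m, n + m
}
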
diff --git a/src/runtime/memmove_linux_amd64_test.go b/src/runtime/memmove_linux_amd64_test.go
new file mode 100644
index 0000000..b3ccd90
--- /dev/null
+++ b/src/runtime/memmove_linux_amd64_test.go
@@ -0,0 +1,61 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "os"
+ "reflect"
+ "syscall"
+ "testing"
+ "unsafe"
+)
+
+// TestMemmoveOverflow maps 3GB of memory and calls memmove on
+// the corresponding slice.
+func TestMemmoveOverflow(t *testing.T) {
+ t.Parallel()
+ // Create a temporary file.
+ tmp, err := os.CreateTemp("", "go-memmovetest")
+ if err != nil {
+ t.Fatal(err)
+ }
+ _, err = tmp.Write(make([]byte, 65536))
+ if err != nil {
+ t.Fatal(err)
+ }
+ defer os.Remove(tmp.Name())
+ defer tmp.Close()
+
+ // Set up mappings.
+ base, _, errno := syscall.Syscall6(syscall.SYS_MMAP,
+ 0xa0<<32, 3<<30, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_PRIVATE|syscall.MAP_ANONYMOUS, ^uintptr(0), 0)
+ if errno != 0 {
+ t.Skipf("could not create memory mapping: %s", errno)
+ }
+ syscall.Syscall(syscall.SYS_MUNMAP, base, 3<<30, 0)
+
+ for off := uintptr(0); off < 3<<30; off += 65536 {
+ _, _, errno := syscall.Syscall6(syscall.SYS_MMAP,
+ base+off, 65536, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED|syscall.MAP_FIXED, tmp.Fd(), 0)
+ if errno != 0 {
+ t.Skipf("could not map a page at requested 0x%x: %s", base+off, errno)
+ }
+ defer syscall.Syscall(syscall.SYS_MUNMAP, base+off, 65536, 0)
+ }
+
+ var s []byte
+ sp := (*reflect.SliceHeader)(unsafe.Pointer(&s))
+ sp.Data = base
+ sp.Len, sp.Cap = 3<<30, 3<<30
+
+ n := copy(s[1:], s)
+ if n != 3<<30-1 {
+ t.Fatalf("copied %d bytes, expected %d", n, 3<<30-1)
+ }
+ n = copy(s, s[1:])
+ if n != 3<<30-1 {
+ t.Fatalf("copied %d bytes, expected %d", n, 3<<30-1)
+ }
+}
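
A note on the 3<<30 length used above: it is larger than what fits in a signed 32-bit integer, so the copy lengths in this test presumably exercise memmove's 64-bit length handling rather than any path that could truncate to 32 bits. The arithmetic, checked in a couple of lines (the constant and comparison below are the only assumptions):

    package main

    import (
        "fmt"
        "math"
    )

    func main() {
        const n int64 = 3 << 30        // 3 GiB, the mapping size in TestMemmoveOverflow
        fmt.Println(n)                 // 3221225472
        fmt.Println(n > math.MaxInt32) // true: the length does not fit in an int32
    }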
diff --git a/src/runtime/memmove_mips64x.s b/src/runtime/memmove_mips64x.s
new file mode 100644
index 0000000..8a1b88a
--- /dev/null
+++ b/src/runtime/memmove_mips64x.s
@@ -0,0 +1,107 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips64 mips64le
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT|NOFRAME, $0-24
+ MOVV to+0(FP), R1
+ MOVV from+8(FP), R2
+ MOVV n+16(FP), R3
+ BNE R3, check
+ RET
+
+check:
+ SGTU R1, R2, R4
+ BNE R4, backward
+
+ ADDV R1, R3, R6 // end pointer
+
+ // if the two pointers are not of same alignments, do byte copying
+ SUBVU R2, R1, R4
+ AND $7, R4
+ BNE R4, out
+
+ // if less than 8 bytes, do byte copying
+ SGTU $8, R3, R4
+ BNE R4, out
+
+ // do one byte at a time until 8-aligned
+ AND $7, R1, R5
+ BEQ R5, words
+ MOVB (R2), R4
+ ADDV $1, R2
+ MOVB R4, (R1)
+ ADDV $1, R1
+ JMP -6(PC)
+
+words:
+ // do 8 bytes at a time if there is room
+ ADDV $-7, R6, R3 // R3 is end pointer-7
+
+ SGTU R3, R1, R5
+ BEQ R5, out
+ MOVV (R2), R4
+ ADDV $8, R2
+ MOVV R4, (R1)
+ ADDV $8, R1
+ JMP -6(PC)
+
+out:
+ BEQ R1, R6, done
+ MOVB (R2), R4
+ ADDV $1, R2
+ MOVB R4, (R1)
+ ADDV $1, R1
+ JMP -5(PC)
+done:
+ RET
+
+backward:
+ ADDV R3, R2 // from-end pointer
+ ADDV R1, R3, R6 // to-end pointer
+
+ // if the two pointers are not of same alignments, do byte copying
+ SUBVU R6, R2, R4
+ AND $7, R4
+ BNE R4, out1
+
+ // if less than 8 bytes, do byte copying
+ SGTU $8, R3, R4
+ BNE R4, out1
+
+ // do one byte at a time until 8-aligned
+ AND $7, R6, R5
+ BEQ R5, words1
+ ADDV $-1, R2
+ MOVB (R2), R4
+ ADDV $-1, R6
+ MOVB R4, (R6)
+ JMP -6(PC)
+
+words1:
+ // do 8 bytes at a time if there is room
+ ADDV $7, R1, R3 // R3 is start pointer+7
+
+ SGTU R6, R3, R5
+ BEQ R5, out1
+ ADDV $-8, R2
+ MOVV (R2), R4
+ ADDV $-8, R6
+ MOVV R4, (R6)
+ JMP -6(PC)
+
+out1:
+ BEQ R1, R6, done1
+ ADDV $-1, R2
+ MOVB (R2), R4
+ ADDV $-1, R6
+ MOVB R4, (R6)
+ JMP -5(PC)
+done1:
+ RET
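
The mips64 code above only switches to 8-byte moves when src and dst have the same alignment modulo 8; otherwise a single byte-align pass could not align both pointers at once, and it falls back to byte copying. A loose Go rendering of the forward path, as a sketch only (the slice and unsafe plumbing here is mine; the real code works on raw pointers):

    package main

    import (
        "fmt"
        "unsafe"
    )

    // addr returns the address of the first element; used only for the
    // alignment checks in this illustration.
    func addr(b []byte) uintptr { return uintptr(unsafe.Pointer(&b[0])) }

    // copyForward copies src into dst (equal lengths assumed) the way the
    // assembly's forward path does.
    func copyForward(dst, src []byte) {
        n := len(src)
        if n == 0 {
            return
        }
        i := 0
        if n >= 8 && (addr(dst)-addr(src))&7 == 0 {
            // Byte copy until dst (and therefore src) is 8-byte aligned.
            for (addr(dst)+uintptr(i))&7 != 0 {
                dst[i] = src[i]
                i++
            }
            // Then 8 bytes at a time while at least 8 bytes remain.
            for ; n-i >= 8; i += 8 {
                copy(dst[i:i+8], src[i:i+8]) // stands in for one MOVV load + store
            }
        }
        // Tail, or the whole buffer when the alignments differ: byte by byte.
        for ; i < n; i++ {
            dst[i] = src[i]
        }
    }

    func main() {
        src := []byte("hello, runtime memmove")
        dst := make([]byte, len(src))
        copyForward(dst, src)
        fmt.Println(string(dst))
    }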
diff --git a/src/runtime/memmove_mipsx.s b/src/runtime/memmove_mipsx.s
new file mode 100644
index 0000000..6c86558
--- /dev/null
+++ b/src/runtime/memmove_mipsx.s
@@ -0,0 +1,260 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips mipsle
+
+#include "textflag.h"
+
+#ifdef GOARCH_mips
+#define MOVWHI MOVWL
+#define MOVWLO MOVWR
+#else
+#define MOVWHI MOVWR
+#define MOVWLO MOVWL
+#endif
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB),NOSPLIT,$-0-12
+ MOVW n+8(FP), R3
+ MOVW from+4(FP), R2
+ MOVW to+0(FP), R1
+
+ ADDU R3, R2, R4 // end pointer for source
+ ADDU R3, R1, R5 // end pointer for destination
+
+ // if destination is ahead of source, start at the end of the buffer and go backward.
+ SGTU R1, R2, R6
+ BNE R6, backward
+
+ // if less than 4 bytes, use byte by byte copying
+ SGTU $4, R3, R6
+ BNE R6, f_small_copy
+
+ // align destination to 4 bytes
+ AND $3, R1, R6
+ BEQ R6, f_dest_aligned
+ SUBU R1, R0, R6
+ AND $3, R6
+ MOVWHI 0(R2), R7
+ SUBU R6, R3
+ MOVWLO 3(R2), R7
+ ADDU R6, R2
+ MOVWHI R7, 0(R1)
+ ADDU R6, R1
+
+f_dest_aligned:
+ AND $31, R3, R7
+ AND $3, R3, R6
+ SUBU R7, R5, R7 // end pointer for 32-byte chunks
+ SUBU R6, R5, R6 // end pointer for 4-byte chunks
+
+ // if source is not aligned, use unaligned reads
+ AND $3, R2, R8
+ BNE R8, f_large_ua
+
+f_large:
+ BEQ R1, R7, f_words
+ ADDU $32, R1
+ MOVW 0(R2), R8
+ MOVW 4(R2), R9
+ MOVW 8(R2), R10
+ MOVW 12(R2), R11
+ MOVW 16(R2), R12
+ MOVW 20(R2), R13
+ MOVW 24(R2), R14
+ MOVW 28(R2), R15
+ ADDU $32, R2
+ MOVW R8, -32(R1)
+ MOVW R9, -28(R1)
+ MOVW R10, -24(R1)
+ MOVW R11, -20(R1)
+ MOVW R12, -16(R1)
+ MOVW R13, -12(R1)
+ MOVW R14, -8(R1)
+ MOVW R15, -4(R1)
+ JMP f_large
+
+f_words:
+ BEQ R1, R6, f_tail
+ ADDU $4, R1
+ MOVW 0(R2), R8
+ ADDU $4, R2
+ MOVW R8, -4(R1)
+ JMP f_words
+
+f_tail:
+ BEQ R1, R5, ret
+ MOVWLO -1(R4), R8
+ MOVWLO R8, -1(R5)
+
+ret:
+ RET
+
+f_large_ua:
+ BEQ R1, R7, f_words_ua
+ ADDU $32, R1
+ MOVWHI 0(R2), R8
+ MOVWHI 4(R2), R9
+ MOVWHI 8(R2), R10
+ MOVWHI 12(R2), R11
+ MOVWHI 16(R2), R12
+ MOVWHI 20(R2), R13
+ MOVWHI 24(R2), R14
+ MOVWHI 28(R2), R15
+ MOVWLO 3(R2), R8
+ MOVWLO 7(R2), R9
+ MOVWLO 11(R2), R10
+ MOVWLO 15(R2), R11
+ MOVWLO 19(R2), R12
+ MOVWLO 23(R2), R13
+ MOVWLO 27(R2), R14
+ MOVWLO 31(R2), R15
+ ADDU $32, R2
+ MOVW R8, -32(R1)
+ MOVW R9, -28(R1)
+ MOVW R10, -24(R1)
+ MOVW R11, -20(R1)
+ MOVW R12, -16(R1)
+ MOVW R13, -12(R1)
+ MOVW R14, -8(R1)
+ MOVW R15, -4(R1)
+ JMP f_large_ua
+
+f_words_ua:
+ BEQ R1, R6, f_tail_ua
+ MOVWHI 0(R2), R8
+ ADDU $4, R1
+ MOVWLO 3(R2), R8
+ ADDU $4, R2
+ MOVW R8, -4(R1)
+ JMP f_words_ua
+
+f_tail_ua:
+ BEQ R1, R5, ret
+ MOVWHI -4(R4), R8
+ MOVWLO -1(R4), R8
+ MOVWLO R8, -1(R5)
+ JMP ret
+
+f_small_copy:
+ BEQ R1, R5, ret
+ ADDU $1, R1
+ MOVB 0(R2), R6
+ ADDU $1, R2
+ MOVB R6, -1(R1)
+ JMP f_small_copy
+
+backward:
+ SGTU $4, R3, R6
+ BNE R6, b_small_copy
+
+ AND $3, R5, R6
+ BEQ R6, b_dest_aligned
+ MOVWHI -4(R4), R7
+ SUBU R6, R3
+ MOVWLO -1(R4), R7
+ SUBU R6, R4
+ MOVWLO R7, -1(R5)
+ SUBU R6, R5
+
+b_dest_aligned:
+ AND $31, R3, R7
+ AND $3, R3, R6
+ ADDU R7, R1, R7
+ ADDU R6, R1, R6
+
+ AND $3, R4, R8
+ BNE R8, b_large_ua
+
+b_large:
+ BEQ R5, R7, b_words
+ ADDU $-32, R5
+ MOVW -4(R4), R8
+ MOVW -8(R4), R9
+ MOVW -12(R4), R10
+ MOVW -16(R4), R11
+ MOVW -20(R4), R12
+ MOVW -24(R4), R13
+ MOVW -28(R4), R14
+ MOVW -32(R4), R15
+ ADDU $-32, R4
+ MOVW R8, 28(R5)
+ MOVW R9, 24(R5)
+ MOVW R10, 20(R5)
+ MOVW R11, 16(R5)
+ MOVW R12, 12(R5)
+ MOVW R13, 8(R5)
+ MOVW R14, 4(R5)
+ MOVW R15, 0(R5)
+ JMP b_large
+
+b_words:
+ BEQ R5, R6, b_tail
+ ADDU $-4, R5
+ MOVW -4(R4), R8
+ ADDU $-4, R4
+ MOVW R8, 0(R5)
+ JMP b_words
+
+b_tail:
+ BEQ R5, R1, ret
+ MOVWHI 0(R2), R8 // R2 and R1 have the same alignment so we don't need to load a whole word
+ MOVWHI R8, 0(R1)
+ JMP ret
+
+b_large_ua:
+ BEQ R5, R7, b_words_ua
+ ADDU $-32, R5
+ MOVWHI -4(R4), R8
+ MOVWHI -8(R4), R9
+ MOVWHI -12(R4), R10
+ MOVWHI -16(R4), R11
+ MOVWHI -20(R4), R12
+ MOVWHI -24(R4), R13
+ MOVWHI -28(R4), R14
+ MOVWHI -32(R4), R15
+ MOVWLO -1(R4), R8
+ MOVWLO -5(R4), R9
+ MOVWLO -9(R4), R10
+ MOVWLO -13(R4), R11
+ MOVWLO -17(R4), R12
+ MOVWLO -21(R4), R13
+ MOVWLO -25(R4), R14
+ MOVWLO -29(R4), R15
+ ADDU $-32, R4
+ MOVW R8, 28(R5)
+ MOVW R9, 24(R5)
+ MOVW R10, 20(R5)
+ MOVW R11, 16(R5)
+ MOVW R12, 12(R5)
+ MOVW R13, 8(R5)
+ MOVW R14, 4(R5)
+ MOVW R15, 0(R5)
+ JMP b_large_ua
+
+b_words_ua:
+ BEQ R5, R6, b_tail_ua
+ MOVWHI -4(R4), R8
+ ADDU $-4, R5
+ MOVWLO -1(R4), R8
+ ADDU $-4, R4
+ MOVW R8, 0(R5)
+ JMP b_words_ua
+
+b_tail_ua:
+ BEQ R5, R1, ret
+ MOVWHI (R2), R8
+ MOVWLO 3(R2), R8
+ MOVWHI R8, 0(R1)
+ JMP ret
+
+b_small_copy:
+ BEQ R5, R1, ret
+ ADDU $-1, R5
+ MOVB -1(R4), R6
+ ADDU $-1, R4
+ MOVB R6, 0(R5)
+ JMP b_small_copy
diff --git a/src/runtime/memmove_plan9_386.s b/src/runtime/memmove_plan9_386.s
new file mode 100644
index 0000000..cfce0e9
--- /dev/null
+++ b/src/runtime/memmove_plan9_386.s
@@ -0,0 +1,137 @@
+// Inferno's libkern/memmove-386.s
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/memmove-386.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT, $0-12
+ MOVL to+0(FP), DI
+ MOVL from+4(FP), SI
+ MOVL n+8(FP), BX
+
+ // REP instructions have a high startup cost, so we handle small sizes
+ // with some straightline code. The REP MOVSL instruction is really fast
+ // for large sizes. The cutover is approximately 1K.
+tail:
+ TESTL BX, BX
+ JEQ move_0
+ CMPL BX, $2
+ JBE move_1or2
+ CMPL BX, $4
+ JB move_3
+ JE move_4
+ CMPL BX, $8
+ JBE move_5through8
+ CMPL BX, $16
+ JBE move_9through16
+
+/*
+ * check and set for backwards
+ */
+ CMPL SI, DI
+ JLS back
+
+/*
+ * forward copy loop
+ */
+forward:
+ MOVL BX, CX
+ SHRL $2, CX
+ ANDL $3, BX
+
+ REP; MOVSL
+ JMP tail
+/*
+ * check overlap
+ */
+back:
+ MOVL SI, CX
+ ADDL BX, CX
+ CMPL CX, DI
+ JLS forward
+/*
+ * whole thing backwards has
+ * adjusted addresses
+ */
+
+ ADDL BX, DI
+ ADDL BX, SI
+ STD
+
+/*
+ * copy
+ */
+ MOVL BX, CX
+ SHRL $2, CX
+ ANDL $3, BX
+
+ SUBL $4, DI
+ SUBL $4, SI
+ REP; MOVSL
+
+ CLD
+ ADDL $4, DI
+ ADDL $4, SI
+ SUBL BX, DI
+ SUBL BX, SI
+ JMP tail
+
+move_1or2:
+ MOVB (SI), AX
+ MOVB -1(SI)(BX*1), CX
+ MOVB AX, (DI)
+ MOVB CX, -1(DI)(BX*1)
+ RET
+move_0:
+ RET
+move_3:
+ MOVW (SI), AX
+ MOVB 2(SI), CX
+ MOVW AX, (DI)
+ MOVB CX, 2(DI)
+ RET
+move_4:
+ // We need a separate case for 4 to make sure we write pointers atomically.
+ MOVL (SI), AX
+ MOVL AX, (DI)
+ RET
+move_5through8:
+ MOVL (SI), AX
+ MOVL -4(SI)(BX*1), CX
+ MOVL AX, (DI)
+ MOVL CX, -4(DI)(BX*1)
+ RET
+move_9through16:
+ MOVL (SI), AX
+ MOVL 4(SI), CX
+ MOVL -8(SI)(BX*1), DX
+ MOVL -4(SI)(BX*1), BP
+ MOVL AX, (DI)
+ MOVL CX, 4(DI)
+ MOVL DX, -8(DI)(BX*1)
+ MOVL BP, -4(DI)(BX*1)
+ RET
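
The small-size cases above (move_5through8, move_9through16) use a pair of overlapping accesses instead of a loop: for a length n in (w, 2w], loading the first w bytes and the last w bytes covers the whole range, and both loads happen before either store, so the trick also tolerates overlapping buffers. A minimal Go sketch of the 4-byte variant; the function name is mine and encoding/binary is just a convenient stand-in for the 32-bit loads and stores:

    package main

    import (
        "encoding/binary"
        "fmt"
    )

    // copy5to8 copies n bytes (5 <= n <= 8) with two 4-byte accesses,
    // mirroring move_5through8: both values are loaded first, as with
    // MOVL (SI) and MOVL -4(SI)(BX*1), then both are stored.
    func copy5to8(dst, src []byte, n int) {
        head := binary.LittleEndian.Uint32(src[:4])
        tail := binary.LittleEndian.Uint32(src[n-4 : n])
        binary.LittleEndian.PutUint32(dst[:4], head)
        binary.LittleEndian.PutUint32(dst[n-4:n], tail)
    }

    func main() {
        src := []byte{1, 2, 3, 4, 5, 6, 7}
        dst := make([]byte, len(src))
        copy5to8(dst, src, len(src))
        fmt.Println(dst) // [1 2 3 4 5 6 7]
    }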
diff --git a/src/runtime/memmove_plan9_amd64.s b/src/runtime/memmove_plan9_amd64.s
new file mode 100644
index 0000000..217aa60
--- /dev/null
+++ b/src/runtime/memmove_plan9_amd64.s
@@ -0,0 +1,135 @@
+// Derived from Inferno's libkern/memmove-386.s (adapted for amd64)
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/memmove-386.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT, $0-24
+
+ MOVQ to+0(FP), DI
+ MOVQ from+8(FP), SI
+ MOVQ n+16(FP), BX
+
+ // REP instructions have a high startup cost, so we handle small sizes
+ // with some straightline code. The REP MOVSQ instruction is really fast
+ // for large sizes. The cutover is approximately 1K.
+tail:
+ TESTQ BX, BX
+ JEQ move_0
+ CMPQ BX, $2
+ JBE move_1or2
+ CMPQ BX, $4
+ JBE move_3or4
+ CMPQ BX, $8
+ JB move_5through7
+ JE move_8
+ CMPQ BX, $16
+ JBE move_9through16
+
+/*
+ * check and set for backwards
+ */
+ CMPQ SI, DI
+ JLS back
+
+/*
+ * forward copy loop
+ */
+forward:
+ MOVQ BX, CX
+ SHRQ $3, CX
+ ANDQ $7, BX
+
+ REP; MOVSQ
+ JMP tail
+
+back:
+/*
+ * check overlap
+ */
+ MOVQ SI, CX
+ ADDQ BX, CX
+ CMPQ CX, DI
+ JLS forward
+
+/*
+ * whole thing backwards has
+ * adjusted addresses
+ */
+ ADDQ BX, DI
+ ADDQ BX, SI
+ STD
+
+/*
+ * copy
+ */
+ MOVQ BX, CX
+ SHRQ $3, CX
+ ANDQ $7, BX
+
+ SUBQ $8, DI
+ SUBQ $8, SI
+ REP; MOVSQ
+
+ CLD
+ ADDQ $8, DI
+ ADDQ $8, SI
+ SUBQ BX, DI
+ SUBQ BX, SI
+ JMP tail
+
+move_1or2:
+ MOVB (SI), AX
+ MOVB -1(SI)(BX*1), CX
+ MOVB AX, (DI)
+ MOVB CX, -1(DI)(BX*1)
+ RET
+move_0:
+ RET
+move_3or4:
+ MOVW (SI), AX
+ MOVW -2(SI)(BX*1), CX
+ MOVW AX, (DI)
+ MOVW CX, -2(DI)(BX*1)
+ RET
+move_5through7:
+ MOVL (SI), AX
+ MOVL -4(SI)(BX*1), CX
+ MOVL AX, (DI)
+ MOVL CX, -4(DI)(BX*1)
+ RET
+move_8:
+ // We need a separate case for 8 to make sure we write pointers atomically.
+ MOVQ (SI), AX
+ MOVQ AX, (DI)
+ RET
+move_9through16:
+ MOVQ (SI), AX
+ MOVQ -8(SI)(BX*1), CX
+ MOVQ AX, (DI)
+ MOVQ CX, -8(DI)(BX*1)
+ RET
diff --git a/src/runtime/memmove_ppc64x.s b/src/runtime/memmove_ppc64x.s
new file mode 100644
index 0000000..edc6452
--- /dev/null
+++ b/src/runtime/memmove_ppc64x.s
@@ -0,0 +1,172 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64 ppc64le
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+
+// target address
+#define TGT R3
+// source address
+#define SRC R4
+// length to move
+#define LEN R5
+// number of doublewords
+#define DWORDS R6
+// number of bytes < 8
+#define BYTES R7
+// const 16 used as index
+#define IDX16 R8
+// temp used for copies, etc.
+#define TMP R9
+// number of 32 byte chunks
+#define QWORDS R10
+
+TEXT runtime·memmove(SB), NOSPLIT|NOFRAME, $0-24
+ MOVD to+0(FP), TGT
+ MOVD from+8(FP), SRC
+ MOVD n+16(FP), LEN
+
+ // Determine if there are doublewords to
+ // copy so a more efficient move can be done
+check:
+ ANDCC $7, LEN, BYTES // R7: bytes to copy
+ SRD $3, LEN, DWORDS // R6: double words to copy
+ MOVFL CR0, CR3 // save CR from ANDCC
+ CMP DWORDS, $0, CR1 // CR1[EQ] set if no double words to copy
+
+ // Determine overlap by subtracting dest - src and comparing against the
+ // length. This catches the cases where src and dest are in different types
+ // of storage such as stack and static to avoid doing a backward move when not
+ // necessary.
+
+ SUB SRC, TGT, TMP // dest - src
+ CMPU TMP, LEN, CR2 // < len?
+ BC 12, 8, backward // BLT CR2 backward
+
+ // Copying forward if no overlap.
+
+ BC 12, 6, checkbytes // BEQ CR1, checkbytes
+ SRDCC $2, DWORDS, QWORDS // 32 byte chunks?
+ BEQ lt32gt8 // < 32 bytes
+
+ // Prepare for moves of 32 bytes at a time.
+
+forward32setup:
+ DCBTST (TGT) // prepare data cache
+ DCBT (SRC)
+ MOVD QWORDS, CTR // Number of 32 byte chunks
+ MOVD $16, IDX16 // 16 for index
+
+forward32:
+ LXVD2X (R0)(SRC), VS32 // load 16 bytes
+ LXVD2X (IDX16)(SRC), VS33 // load 16 bytes
+ ADD $32, SRC
+ STXVD2X VS32, (R0)(TGT) // store 16 bytes
+ STXVD2X VS33, (IDX16)(TGT)
+ ADD $32,TGT // bump up for next set
+ BC 16, 0, forward32 // continue
+ ANDCC $3, DWORDS // remaining doublewords
+ BEQ checkbytes // only bytes remain
+
+lt32gt8:
+ // At this point >= 8 and < 32
+ // Move 16 bytes if possible
+ CMP DWORDS, $2
+ BLT lt16
+ LXVD2X (R0)(SRC), VS32
+ ADD $-2, DWORDS
+ STXVD2X VS32, (R0)(TGT)
+ ADD $16, SRC
+ ADD $16, TGT
+
+lt16: // Move 8 bytes if possible
+ CMP DWORDS, $1
+ BLT checkbytes
+ MOVD 0(SRC), TMP
+ ADD $8, SRC
+ MOVD TMP, 0(TGT)
+ ADD $8, TGT
+checkbytes:
+ BC 12, 14, LR // BEQ lr
+lt8: // Move word if possible
+ CMP BYTES, $4
+ BLT lt4
+ MOVWZ 0(SRC), TMP
+ ADD $-4, BYTES
+ MOVW TMP, 0(TGT)
+ ADD $4, SRC
+ ADD $4, TGT
+lt4: // Move halfword if possible
+ CMP BYTES, $2
+ BLT lt2
+ MOVHZ 0(SRC), TMP
+ ADD $-2, BYTES
+ MOVH TMP, 0(TGT)
+ ADD $2, SRC
+ ADD $2, TGT
+lt2: // Move last byte if 1 left
+ CMP BYTES, $1
+ BC 12, 0, LR // ble lr
+ MOVBZ 0(SRC), TMP
+ MOVBZ TMP, 0(TGT)
+ RET
+
+backward:
+ // Copying backwards proceeds by copying R7 bytes then copying R6 double words.
+ // R3 and R4 are advanced to the end of the destination/source buffers
+ // respectively and moved back as we copy.
+
+ ADD LEN, SRC, SRC // end of source
+ ADD TGT, LEN, TGT // end of dest
+
+ BEQ nobackwardtail // earlier condition
+
+ MOVD BYTES, CTR // bytes to move
+
+backwardtailloop:
+ MOVBZ -1(SRC), TMP // point to last byte
+ SUB $1,SRC
+ MOVBZ TMP, -1(TGT)
+ SUB $1,TGT
+ BC 16, 0, backwardtailloop // bndz
+
+nobackwardtail:
+ BC 4, 5, LR // ble CR1 lr
+
+backwardlarge:
+ MOVD DWORDS, CTR
+ SUB TGT, SRC, TMP // Use vsx if moving
+ CMP TMP, $32 // at least 32 byte chunks
+ BLT backwardlargeloop // and distance >= 32
+ SRDCC $2,DWORDS,QWORDS // 32 byte chunks
+ BNE backward32setup
+
+backwardlargeloop:
+ MOVD -8(SRC), TMP
+ SUB $8,SRC
+ MOVD TMP, -8(TGT)
+ SUB $8,TGT
+ BC 16, 0, backwardlargeloop // bndz
+ RET
+
+backward32setup:
+ MOVD QWORDS, CTR // set up loop ctr
+ MOVD $16, IDX16 // 32 bytes at a time
+
+backward32loop:
+ SUB $32, TGT
+ SUB $32, SRC
+ LXVD2X (R0)(TGT), VS32 // load 16 bytes
+ LXVD2X (IDX16)(TGT), VS33
+ STXVD2X VS32, (R0)(SRC) // store 16 bytes
+ STXVD2X VS33, (IDX16)(SRC)
+ BC 16, 0, backward32loop // bndz
+ BC 4, 5, LR // ble CR1 lr
+ MOVD DWORDS, CTR
+ BR backwardlargeloop
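
The ppc64x code above splits the length with shifts and masks before choosing a loop: BYTES = LEN & 7 (ANDCC $7), DWORDS = LEN >> 3 (SRD $3), and QWORDS = DWORDS >> 2 (SRDCC $2), with DWORDS & 3 doublewords left for the 8-byte loop after the 32-byte vector loop. The same arithmetic, checked in a few lines of Go (the example length is arbitrary):

    package main

    import "fmt"

    func main() {
        const n = 1234          // length in bytes, arbitrary example
        bytes := n & 7          // tail bytes, moved as word/halfword/byte
        dwords := n >> 3        // 8-byte doublewords
        qwords := dwords >> 2   // 32-byte chunks for the LXVD2X/STXVD2X loop
        remDwords := dwords & 3 // doublewords left after the 32-byte loop

        fmt.Println(qwords, remDwords, bytes)         // 38 2 2
        fmt.Println(qwords*32+remDwords*8+bytes == n) // true
    }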
diff --git a/src/runtime/memmove_riscv64.s b/src/runtime/memmove_riscv64.s
new file mode 100644
index 0000000..5dec8d0
--- /dev/null
+++ b/src/runtime/memmove_riscv64.s
@@ -0,0 +1,98 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// void runtime·memmove(void*, void*, uintptr)
+TEXT runtime·memmove(SB),NOSPLIT,$-0-24
+ MOV to+0(FP), T0
+ MOV from+8(FP), T1
+ MOV n+16(FP), T2
+ ADD T1, T2, T5
+
+ // If the destination is ahead of the source, start at the end of the
+ // buffer and go backward.
+ BLTU T1, T0, b
+
+ // If less than eight bytes, do one byte at a time.
+ SLTU $8, T2, T3
+ BNE T3, ZERO, f_outcheck
+
+ // Do one byte at a time until from is eight-aligned.
+ JMP f_aligncheck
+f_align:
+ MOVB (T1), T3
+ MOVB T3, (T0)
+ ADD $1, T0
+ ADD $1, T1
+f_aligncheck:
+ AND $7, T1, T3
+ BNE T3, ZERO, f_align
+
+ // Do eight bytes at a time as long as there is room.
+ ADD $-7, T5, T6
+ JMP f_wordscheck
+f_words:
+ MOV (T1), T3
+ MOV T3, (T0)
+ ADD $8, T0
+ ADD $8, T1
+f_wordscheck:
+ SLTU T6, T1, T3
+ BNE T3, ZERO, f_words
+
+ // Finish off the remaining partial word.
+ JMP f_outcheck
+f_out:
+ MOVB (T1), T3
+ MOVB T3, (T0)
+ ADD $1, T0
+ ADD $1, T1
+f_outcheck:
+ BNE T1, T5, f_out
+
+ RET
+
+b:
+ ADD T0, T2, T4
+ // If less than eight bytes, do one byte at a time.
+ SLTU $8, T2, T3
+ BNE T3, ZERO, b_outcheck
+
+ // Do one byte at a time until from+n is eight-aligned.
+ JMP b_aligncheck
+b_align:
+ ADD $-1, T4
+ ADD $-1, T5
+ MOVB (T5), T3
+ MOVB T3, (T4)
+b_aligncheck:
+ AND $7, T5, T3
+ BNE T3, ZERO, b_align
+
+ // Do eight bytes at a time as long as there is room.
+ ADD $7, T1, T6
+ JMP b_wordscheck
+b_words:
+ ADD $-8, T4
+ ADD $-8, T5
+ MOV (T5), T3
+ MOV T3, (T4)
+b_wordscheck:
+ SLTU T5, T6, T3
+ BNE T3, ZERO, b_words
+
+ // Finish off the remaining partial word.
+ JMP b_outcheck
+b_out:
+ ADD $-1, T4
+ ADD $-1, T5
+ MOVB (T5), T3
+ MOVB T3, (T4)
+b_outcheck:
+ BNE T5, T1, b_out
+
+ RET
diff --git a/src/runtime/memmove_s390x.s b/src/runtime/memmove_s390x.s
new file mode 100644
index 0000000..f4c2b87
--- /dev/null
+++ b/src/runtime/memmove_s390x.s
@@ -0,0 +1,191 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB),NOSPLIT|NOFRAME,$0-24
+ MOVD to+0(FP), R6
+ MOVD from+8(FP), R4
+ MOVD n+16(FP), R5
+
+ CMPBEQ R6, R4, done
+
+start:
+ CMPBLE R5, $3, move0to3
+ CMPBLE R5, $7, move4to7
+ CMPBLE R5, $11, move8to11
+ CMPBLE R5, $15, move12to15
+ CMPBNE R5, $16, movemt16
+ MOVD 0(R4), R7
+ MOVD 8(R4), R8
+ MOVD R7, 0(R6)
+ MOVD R8, 8(R6)
+ RET
+
+movemt16:
+ CMPBGT R4, R6, forwards
+ ADD R5, R4, R7
+ CMPBLE R7, R6, forwards
+ ADD R5, R6, R8
+backwards:
+ MOVD -8(R7), R3
+ MOVD R3, -8(R8)
+ MOVD -16(R7), R3
+ MOVD R3, -16(R8)
+ ADD $-16, R5
+ ADD $-16, R7
+ ADD $-16, R8
+ CMP R5, $16
+ BGE backwards
+ BR start
+
+forwards:
+ CMPBGT R5, $64, forwards_fast
+ MOVD 0(R4), R3
+ MOVD R3, 0(R6)
+ MOVD 8(R4), R3
+ MOVD R3, 8(R6)
+ ADD $16, R4
+ ADD $16, R6
+ ADD $-16, R5
+ CMP R5, $16
+ BGE forwards
+ BR start
+
+forwards_fast:
+ CMP R5, $256
+ BLE forwards_small
+ MVC $256, 0(R4), 0(R6)
+ ADD $256, R4
+ ADD $256, R6
+ ADD $-256, R5
+ BR forwards_fast
+
+forwards_small:
+ CMPBEQ R5, $0, done
+ ADD $-1, R5
+ EXRL $memmove_exrl_mvc<>(SB), R5
+ RET
+
+move0to3:
+ CMPBEQ R5, $0, done
+move1:
+ CMPBNE R5, $1, move2
+ MOVB 0(R4), R3
+ MOVB R3, 0(R6)
+ RET
+move2:
+ CMPBNE R5, $2, move3
+ MOVH 0(R4), R3
+ MOVH R3, 0(R6)
+ RET
+move3:
+ MOVH 0(R4), R3
+ MOVB 2(R4), R7
+ MOVH R3, 0(R6)
+ MOVB R7, 2(R6)
+ RET
+
+move4to7:
+ CMPBNE R5, $4, move5
+ MOVW 0(R4), R3
+ MOVW R3, 0(R6)
+ RET
+move5:
+ CMPBNE R5, $5, move6
+ MOVW 0(R4), R3
+ MOVB 4(R4), R7
+ MOVW R3, 0(R6)
+ MOVB R7, 4(R6)
+ RET
+move6:
+ CMPBNE R5, $6, move7
+ MOVW 0(R4), R3
+ MOVH 4(R4), R7
+ MOVW R3, 0(R6)
+ MOVH R7, 4(R6)
+ RET
+move7:
+ MOVW 0(R4), R3
+ MOVH 4(R4), R7
+ MOVB 6(R4), R8
+ MOVW R3, 0(R6)
+ MOVH R7, 4(R6)
+ MOVB R8, 6(R6)
+ RET
+
+move8to11:
+ CMPBNE R5, $8, move9
+ MOVD 0(R4), R3
+ MOVD R3, 0(R6)
+ RET
+move9:
+ CMPBNE R5, $9, move10
+ MOVD 0(R4), R3
+ MOVB 8(R4), R7
+ MOVD R3, 0(R6)
+ MOVB R7, 8(R6)
+ RET
+move10:
+ CMPBNE R5, $10, move11
+ MOVD 0(R4), R3
+ MOVH 8(R4), R7
+ MOVD R3, 0(R6)
+ MOVH R7, 8(R6)
+ RET
+move11:
+ MOVD 0(R4), R3
+ MOVH 8(R4), R7
+ MOVB 10(R4), R8
+ MOVD R3, 0(R6)
+ MOVH R7, 8(R6)
+ MOVB R8, 10(R6)
+ RET
+
+move12to15:
+ CMPBNE R5, $12, move13
+ MOVD 0(R4), R3
+ MOVW 8(R4), R7
+ MOVD R3, 0(R6)
+ MOVW R7, 8(R6)
+ RET
+move13:
+ CMPBNE R5, $13, move14
+ MOVD 0(R4), R3
+ MOVW 8(R4), R7
+ MOVB 12(R4), R8
+ MOVD R3, 0(R6)
+ MOVW R7, 8(R6)
+ MOVB R8, 12(R6)
+ RET
+move14:
+ CMPBNE R5, $14, move15
+ MOVD 0(R4), R3
+ MOVW 8(R4), R7
+ MOVH 12(R4), R8
+ MOVD R3, 0(R6)
+ MOVW R7, 8(R6)
+ MOVH R8, 12(R6)
+ RET
+move15:
+ MOVD 0(R4), R3
+ MOVW 8(R4), R7
+ MOVH 12(R4), R8
+ MOVB 14(R4), R10
+ MOVD R3, 0(R6)
+ MOVW R7, 8(R6)
+ MOVH R8, 12(R6)
+ MOVB R10, 14(R6)
+done:
+ RET
+
+// DO NOT CALL - target for exrl (execute relative long) instruction.
+TEXT memmove_exrl_mvc<>(SB),NOSPLIT|NOFRAME,$0-0
+ MVC $1, 0(R4), 0(R6)
+ MOVD R0, 0(R0)
+ RET
+
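For context on the s390x code above: MVC copies a block of up to 256 bytes with the length encoded in the instruction itself, so forwards_fast loops over fixed 256-byte blocks and forwards_small finishes with a single MVC whose length is supplied at run time via EXRL, which is why the memmove_exrl_mvc stub above is marked DO NOT CALL. A Go sketch of that chunking structure only, with names of my own choosing:

    package main

    import "fmt"

    // forwardChunked copies src to dst in fixed 256-byte blocks and then
    // one variable-length remainder, mirroring forwards_fast/forwards_small.
    func forwardChunked(dst, src []byte) {
        n := len(src)
        off := 0
        for n-off > 256 {
            copy(dst[off:off+256], src[off:off+256]) // one MVC $256 in the assembly
            off += 256
        }
        if n-off > 0 {
            copy(dst[off:n], src[off:n]) // the EXRL-executed MVC with the remaining length
        }
    }

    func main() {
        src := make([]byte, 1000)
        for i := range src {
            src[i] = byte(i)
        }
        dst := make([]byte, len(src))
        forwardChunked(dst, src)
        fmt.Println(dst[999] == src[999]) // true
    }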
diff --git a/src/runtime/memmove_test.go b/src/runtime/memmove_test.go
new file mode 100644
index 0000000..7c9d2ad
--- /dev/null
+++ b/src/runtime/memmove_test.go
@@ -0,0 +1,597 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "crypto/rand"
+ "encoding/binary"
+ "fmt"
+ "internal/race"
+ "internal/testenv"
+ . "runtime"
+ "sync/atomic"
+ "testing"
+ "unsafe"
+)
+
+func TestMemmove(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ t.Parallel()
+ size := 256
+ if testing.Short() {
+ size = 128 + 16
+ }
+ src := make([]byte, size)
+ dst := make([]byte, size)
+ for i := 0; i < size; i++ {
+ src[i] = byte(128 + (i & 127))
+ }
+ for i := 0; i < size; i++ {
+ dst[i] = byte(i & 127)
+ }
+ for n := 0; n <= size; n++ {
+ for x := 0; x <= size-n; x++ { // offset in src
+ for y := 0; y <= size-n; y++ { // offset in dst
+ copy(dst[y:y+n], src[x:x+n])
+ for i := 0; i < y; i++ {
+ if dst[i] != byte(i&127) {
+ t.Fatalf("prefix dst[%d] = %d", i, dst[i])
+ }
+ }
+ for i := y; i < y+n; i++ {
+ if dst[i] != byte(128+((i-y+x)&127)) {
+ t.Fatalf("copied dst[%d] = %d", i, dst[i])
+ }
+ dst[i] = byte(i & 127) // reset dst
+ }
+ for i := y + n; i < size; i++ {
+ if dst[i] != byte(i&127) {
+ t.Fatalf("suffix dst[%d] = %d", i, dst[i])
+ }
+ }
+ }
+ }
+ }
+}
+
+func TestMemmoveAlias(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ t.Parallel()
+ size := 256
+ if testing.Short() {
+ size = 128 + 16
+ }
+ buf := make([]byte, size)
+ for i := 0; i < size; i++ {
+ buf[i] = byte(i)
+ }
+ for n := 0; n <= size; n++ {
+ for x := 0; x <= size-n; x++ { // src offset
+ for y := 0; y <= size-n; y++ { // dst offset
+ copy(buf[y:y+n], buf[x:x+n])
+ for i := 0; i < y; i++ {
+ if buf[i] != byte(i) {
+ t.Fatalf("prefix buf[%d] = %d", i, buf[i])
+ }
+ }
+ for i := y; i < y+n; i++ {
+ if buf[i] != byte(i-y+x) {
+ t.Fatalf("copied buf[%d] = %d", i, buf[i])
+ }
+ buf[i] = byte(i) // reset buf
+ }
+ for i := y + n; i < size; i++ {
+ if buf[i] != byte(i) {
+ t.Fatalf("suffix buf[%d] = %d", i, buf[i])
+ }
+ }
+ }
+ }
+ }
+}
+
+func TestMemmoveLarge0x180000(t *testing.T) {
+ if testing.Short() && testenv.Builder() == "" {
+ t.Skip("-short")
+ }
+
+ t.Parallel()
+ if race.Enabled {
+ t.Skip("skipping large memmove test under race detector")
+ }
+ testSize(t, 0x180000)
+}
+
+func TestMemmoveOverlapLarge0x120000(t *testing.T) {
+ if testing.Short() && testenv.Builder() == "" {
+ t.Skip("-short")
+ }
+
+ t.Parallel()
+ if race.Enabled {
+ t.Skip("skipping large memmove test under race detector")
+ }
+ testOverlap(t, 0x120000)
+}
+
+func testSize(t *testing.T, size int) {
+ src := make([]byte, size)
+ dst := make([]byte, size)
+ _, _ = rand.Read(src)
+ _, _ = rand.Read(dst)
+
+ ref := make([]byte, size)
+ copyref(ref, dst)
+
+ for n := size - 50; n > 1; n >>= 1 {
+ for x := 0; x <= size-n; x = x*7 + 1 { // offset in src
+ for y := 0; y <= size-n; y = y*9 + 1 { // offset in dst
+ copy(dst[y:y+n], src[x:x+n])
+ copyref(ref[y:y+n], src[x:x+n])
+ p := cmpb(dst, ref)
+ if p >= 0 {
+ t.Fatalf("Copy failed, copying from src[%d:%d] to dst[%d:%d].\nOffset %d is different, %v != %v", x, x+n, y, y+n, p, dst[p], ref[p])
+ }
+ }
+ }
+ }
+}
+
+func testOverlap(t *testing.T, size int) {
+ src := make([]byte, size)
+ test := make([]byte, size)
+ ref := make([]byte, size)
+ _, _ = rand.Read(src)
+
+ for n := size - 50; n > 1; n >>= 1 {
+ for x := 0; x <= size-n; x = x*7 + 1 { // offset in src
+ for y := 0; y <= size-n; y = y*9 + 1 { // offset in dst
+ // Reset input
+ copyref(test, src)
+ copyref(ref, src)
+ copy(test[y:y+n], test[x:x+n])
+ if y <= x {
+ copyref(ref[y:y+n], ref[x:x+n])
+ } else {
+ copybw(ref[y:y+n], ref[x:x+n])
+ }
+ p := cmpb(test, ref)
+ if p >= 0 {
+ t.Fatalf("Copy failed, copying from src[%d:%d] to dst[%d:%d].\nOffset %d is different, %v != %v", x, x+n, y, y+n, p, test[p], ref[p])
+ }
+ }
+ }
+ }
+
+}
+
+// Forward copy.
+func copyref(dst, src []byte) {
+ for i, v := range src {
+ dst[i] = v
+ }
+}
+
+// Backwards copy
+func copybw(dst, src []byte) {
+ if len(src) == 0 {
+ return
+ }
+ for i := len(src) - 1; i >= 0; i-- {
+ dst[i] = src[i]
+ }
+}
+
+// Returns offset of difference
+func matchLen(a, b []byte, max int) int {
+ a = a[:max]
+ b = b[:max]
+ for i, av := range a {
+ if b[i] != av {
+ return i
+ }
+ }
+ return max
+}
+
+func cmpb(a, b []byte) int {
+ l := matchLen(a, b, len(a))
+ if l == len(a) {
+ return -1
+ }
+ return l
+}
+
+// Ensure that memmove writes pointers atomically, so the GC won't
+// observe a partially updated pointer.
+func TestMemmoveAtomicity(t *testing.T) {
+ if race.Enabled {
+ t.Skip("skip under the race detector -- this test is intentionally racy")
+ }
+
+ var x int
+
+ for _, backward := range []bool{true, false} {
+ for _, n := range []int{3, 4, 5, 6, 7, 8, 9, 10, 15, 25, 49} {
+ n := n
+
+ // test copying [N]*int.
+ sz := uintptr(n * PtrSize)
+ name := fmt.Sprint(sz)
+ if backward {
+ name += "-backward"
+ } else {
+ name += "-forward"
+ }
+ t.Run(name, func(t *testing.T) {
+ // Use overlapping src and dst to force forward/backward copy.
+ var s [100]*int
+ src := s[n-1 : 2*n-1]
+ dst := s[:n]
+ if backward {
+ src, dst = dst, src
+ }
+ for i := range src {
+ src[i] = &x
+ }
+ for i := range dst {
+ dst[i] = nil
+ }
+
+ var ready uint32
+ go func() {
+ sp := unsafe.Pointer(&src[0])
+ dp := unsafe.Pointer(&dst[0])
+ atomic.StoreUint32(&ready, 1)
+ for i := 0; i < 10000; i++ {
+ Memmove(dp, sp, sz)
+ MemclrNoHeapPointers(dp, sz)
+ }
+ atomic.StoreUint32(&ready, 2)
+ }()
+
+ for atomic.LoadUint32(&ready) == 0 {
+ Gosched()
+ }
+
+ for atomic.LoadUint32(&ready) != 2 {
+ for i := range dst {
+ p := dst[i]
+ if p != nil && p != &x {
+ t.Fatalf("got partially updated pointer %p at dst[%d], want either nil or %p", p, i, &x)
+ }
+ }
+ }
+ })
+ }
+ }
+}
+
+func benchmarkSizes(b *testing.B, sizes []int, fn func(b *testing.B, n int)) {
+ for _, n := range sizes {
+ b.Run(fmt.Sprint(n), func(b *testing.B) {
+ b.SetBytes(int64(n))
+ fn(b, n)
+ })
+ }
+}
+
+var bufSizes = []int{
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+ 32, 64, 128, 256, 512, 1024, 2048, 4096,
+}
+var bufSizesOverlap = []int{
+ 32, 64, 128, 256, 512, 1024, 2048, 4096,
+}
+
+func BenchmarkMemmove(b *testing.B) {
+ benchmarkSizes(b, bufSizes, func(b *testing.B, n int) {
+ x := make([]byte, n)
+ y := make([]byte, n)
+ for i := 0; i < b.N; i++ {
+ copy(x, y)
+ }
+ })
+}
+
+func BenchmarkMemmoveOverlap(b *testing.B) {
+ benchmarkSizes(b, bufSizesOverlap, func(b *testing.B, n int) {
+ x := make([]byte, n+16)
+ for i := 0; i < b.N; i++ {
+ copy(x[16:n+16], x[:n])
+ }
+ })
+}
+
+func BenchmarkMemmoveUnalignedDst(b *testing.B) {
+ benchmarkSizes(b, bufSizes, func(b *testing.B, n int) {
+ x := make([]byte, n+1)
+ y := make([]byte, n)
+ for i := 0; i < b.N; i++ {
+ copy(x[1:], y)
+ }
+ })
+}
+
+func BenchmarkMemmoveUnalignedDstOverlap(b *testing.B) {
+ benchmarkSizes(b, bufSizesOverlap, func(b *testing.B, n int) {
+ x := make([]byte, n+16)
+ for i := 0; i < b.N; i++ {
+ copy(x[16:n+16], x[1:n+1])
+ }
+ })
+}
+
+func BenchmarkMemmoveUnalignedSrc(b *testing.B) {
+ benchmarkSizes(b, bufSizes, func(b *testing.B, n int) {
+ x := make([]byte, n)
+ y := make([]byte, n+1)
+ for i := 0; i < b.N; i++ {
+ copy(x, y[1:])
+ }
+ })
+}
+
+func BenchmarkMemmoveUnalignedSrcOverlap(b *testing.B) {
+ benchmarkSizes(b, bufSizesOverlap, func(b *testing.B, n int) {
+ x := make([]byte, n+1)
+ for i := 0; i < b.N; i++ {
+ copy(x[1:n+1], x[:n])
+ }
+ })
+}
+
+func TestMemclr(t *testing.T) {
+ size := 512
+ if testing.Short() {
+ size = 128 + 16
+ }
+ mem := make([]byte, size)
+ for i := 0; i < size; i++ {
+ mem[i] = 0xee
+ }
+ for n := 0; n < size; n++ {
+ for x := 0; x <= size-n; x++ { // offset in mem
+ MemclrBytes(mem[x : x+n])
+ for i := 0; i < x; i++ {
+ if mem[i] != 0xee {
+ t.Fatalf("overwrite prefix mem[%d] = %d", i, mem[i])
+ }
+ }
+ for i := x; i < x+n; i++ {
+ if mem[i] != 0 {
+ t.Fatalf("failed clear mem[%d] = %d", i, mem[i])
+ }
+ mem[i] = 0xee
+ }
+ for i := x + n; i < size; i++ {
+ if mem[i] != 0xee {
+ t.Fatalf("overwrite suffix mem[%d] = %d", i, mem[i])
+ }
+ }
+ }
+ }
+}
+
+func BenchmarkMemclr(b *testing.B) {
+ for _, n := range []int{5, 16, 64, 256, 4096, 65536} {
+ x := make([]byte, n)
+ b.Run(fmt.Sprint(n), func(b *testing.B) {
+ b.SetBytes(int64(n))
+ for i := 0; i < b.N; i++ {
+ MemclrBytes(x)
+ }
+ })
+ }
+ for _, m := range []int{1, 4, 8, 16, 64} {
+ x := make([]byte, m<<20)
+ b.Run(fmt.Sprint(m, "M"), func(b *testing.B) {
+ b.SetBytes(int64(m << 20))
+ for i := 0; i < b.N; i++ {
+ MemclrBytes(x)
+ }
+ })
+ }
+}
+
+func BenchmarkGoMemclr(b *testing.B) {
+ benchmarkSizes(b, []int{5, 16, 64, 256}, func(b *testing.B, n int) {
+ x := make([]byte, n)
+ for i := 0; i < b.N; i++ {
+ for j := range x {
+ x[j] = 0
+ }
+ }
+ })
+}
+
+func BenchmarkClearFat8(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x [8 / 4]uint32
+ _ = x
+ }
+}
+func BenchmarkClearFat12(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x [12 / 4]uint32
+ _ = x
+ }
+}
+func BenchmarkClearFat16(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x [16 / 4]uint32
+ _ = x
+ }
+}
+func BenchmarkClearFat24(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x [24 / 4]uint32
+ _ = x
+ }
+}
+func BenchmarkClearFat32(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x [32 / 4]uint32
+ _ = x
+ }
+}
+func BenchmarkClearFat40(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x [40 / 4]uint32
+ _ = x
+ }
+}
+func BenchmarkClearFat48(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x [48 / 4]uint32
+ _ = x
+ }
+}
+func BenchmarkClearFat56(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x [56 / 4]uint32
+ _ = x
+ }
+}
+func BenchmarkClearFat64(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x [64 / 4]uint32
+ _ = x
+ }
+}
+func BenchmarkClearFat128(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x [128 / 4]uint32
+ _ = x
+ }
+}
+func BenchmarkClearFat256(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x [256 / 4]uint32
+ _ = x
+ }
+}
+func BenchmarkClearFat512(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x [512 / 4]uint32
+ _ = x
+ }
+}
+func BenchmarkClearFat1024(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x [1024 / 4]uint32
+ _ = x
+ }
+}
+
+func BenchmarkCopyFat8(b *testing.B) {
+ var x [8 / 4]uint32
+ for i := 0; i < b.N; i++ {
+ y := x
+ _ = y
+ }
+}
+func BenchmarkCopyFat12(b *testing.B) {
+ var x [12 / 4]uint32
+ for i := 0; i < b.N; i++ {
+ y := x
+ _ = y
+ }
+}
+func BenchmarkCopyFat16(b *testing.B) {
+ var x [16 / 4]uint32
+ for i := 0; i < b.N; i++ {
+ y := x
+ _ = y
+ }
+}
+func BenchmarkCopyFat24(b *testing.B) {
+ var x [24 / 4]uint32
+ for i := 0; i < b.N; i++ {
+ y := x
+ _ = y
+ }
+}
+func BenchmarkCopyFat32(b *testing.B) {
+ var x [32 / 4]uint32
+ for i := 0; i < b.N; i++ {
+ y := x
+ _ = y
+ }
+}
+func BenchmarkCopyFat64(b *testing.B) {
+ var x [64 / 4]uint32
+ for i := 0; i < b.N; i++ {
+ y := x
+ _ = y
+ }
+}
+func BenchmarkCopyFat128(b *testing.B) {
+ var x [128 / 4]uint32
+ for i := 0; i < b.N; i++ {
+ y := x
+ _ = y
+ }
+}
+func BenchmarkCopyFat256(b *testing.B) {
+ var x [256 / 4]uint32
+ for i := 0; i < b.N; i++ {
+ y := x
+ _ = y
+ }
+}
+func BenchmarkCopyFat512(b *testing.B) {
+ var x [512 / 4]uint32
+ for i := 0; i < b.N; i++ {
+ y := x
+ _ = y
+ }
+}
+func BenchmarkCopyFat520(b *testing.B) {
+ var x [520 / 4]uint32
+ for i := 0; i < b.N; i++ {
+ y := x
+ _ = y
+ }
+}
+func BenchmarkCopyFat1024(b *testing.B) {
+ var x [1024 / 4]uint32
+ for i := 0; i < b.N; i++ {
+ y := x
+ _ = y
+ }
+}
+
+// BenchmarkIssue18740 ensures that memmove uses 4 and 8 byte load/store to move 4 and 8 bytes.
+// It used to do 2 2-byte load/stores, which led to a pipeline stall
+// when we try to read the result with one 4-byte load.
+func BenchmarkIssue18740(b *testing.B) {
+ benchmarks := []struct {
+ name string
+ nbyte int
+ f func([]byte) uint64
+ }{
+ {"2byte", 2, func(buf []byte) uint64 { return uint64(binary.LittleEndian.Uint16(buf)) }},
+ {"4byte", 4, func(buf []byte) uint64 { return uint64(binary.LittleEndian.Uint32(buf)) }},
+ {"8byte", 8, func(buf []byte) uint64 { return binary.LittleEndian.Uint64(buf) }},
+ }
+
+ var g [4096]byte
+ for _, bm := range benchmarks {
+ buf := make([]byte, bm.nbyte)
+ b.Run(bm.name, func(b *testing.B) {
+ for j := 0; j < b.N; j++ {
+ for i := 0; i < 4096; i += bm.nbyte {
+ copy(buf[:], g[i:])
+ sink += bm.f(buf[:])
+ }
+ }
+ })
+ }
+}
diff --git a/src/runtime/memmove_wasm.s b/src/runtime/memmove_wasm.s
new file mode 100644
index 0000000..8525fea
--- /dev/null
+++ b/src/runtime/memmove_wasm.s
@@ -0,0 +1,154 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT, $0-24
+ MOVD to+0(FP), R0
+ MOVD from+8(FP), R1
+ MOVD n+16(FP), R2
+
+ Get R0
+ Get R1
+ I64LtU
+ If // forward
+exit_forward_64:
+ Block
+loop_forward_64:
+ Loop
+ Get R2
+ I64Const $8
+ I64LtU
+ BrIf exit_forward_64
+
+ MOVD 0(R1), 0(R0)
+
+ Get R0
+ I64Const $8
+ I64Add
+ Set R0
+
+ Get R1
+ I64Const $8
+ I64Add
+ Set R1
+
+ Get R2
+ I64Const $8
+ I64Sub
+ Set R2
+
+ Br loop_forward_64
+ End
+ End
+
+loop_forward_8:
+ Loop
+ Get R2
+ I64Eqz
+ If
+ RET
+ End
+
+ Get R0
+ I32WrapI64
+ I64Load8U (R1)
+ I64Store8 $0
+
+ Get R0
+ I64Const $1
+ I64Add
+ Set R0
+
+ Get R1
+ I64Const $1
+ I64Add
+ Set R1
+
+ Get R2
+ I64Const $1
+ I64Sub
+ Set R2
+
+ Br loop_forward_8
+ End
+
+ Else
+ // backward
+ Get R0
+ Get R2
+ I64Add
+ Set R0
+
+ Get R1
+ Get R2
+ I64Add
+ Set R1
+
+exit_backward_64:
+ Block
+loop_backward_64:
+ Loop
+ Get R2
+ I64Const $8
+ I64LtU
+ BrIf exit_backward_64
+
+ Get R0
+ I64Const $8
+ I64Sub
+ Set R0
+
+ Get R1
+ I64Const $8
+ I64Sub
+ Set R1
+
+ Get R2
+ I64Const $8
+ I64Sub
+ Set R2
+
+ MOVD 0(R1), 0(R0)
+
+ Br loop_backward_64
+ End
+ End
+
+loop_backward_8:
+ Loop
+ Get R2
+ I64Eqz
+ If
+ RET
+ End
+
+ Get R0
+ I64Const $1
+ I64Sub
+ Set R0
+
+ Get R1
+ I64Const $1
+ I64Sub
+ Set R1
+
+ Get R2
+ I64Const $1
+ I64Sub
+ Set R2
+
+ Get R0
+ I32WrapI64
+ I64Load8U (R1)
+ I64Store8 $0
+
+ Br loop_backward_8
+ End
+ End
+
+ UNDEF
diff --git a/src/runtime/metrics.go b/src/runtime/metrics.go
new file mode 100644
index 0000000..3e8dbda
--- /dev/null
+++ b/src/runtime/metrics.go
@@ -0,0 +1,510 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// Metrics implementation exported to runtime/metrics.
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+var (
+ // metrics is a map of runtime/metrics keys to
+ // data used by the runtime to sample each metric's
+ // value.
+ metricsSema uint32 = 1
+ metricsInit bool
+ metrics map[string]metricData
+
+ sizeClassBuckets []float64
+ timeHistBuckets []float64
+)
+
+type metricData struct {
+ // deps is the set of runtime statistics that this metric
+ // depends on. Before compute is called, the statAggregate
+ // which will be passed must ensure() these dependencies.
+ deps statDepSet
+
+ // compute is a function that populates a metricValue
+ // given a populated statAggregate structure.
+ compute func(in *statAggregate, out *metricValue)
+}
+
+// initMetrics initializes the metrics map if it hasn't been yet.
+//
+// metricsSema must be held.
+func initMetrics() {
+ if metricsInit {
+ return
+ }
+
+ sizeClassBuckets = make([]float64, _NumSizeClasses, _NumSizeClasses+1)
+ // Skip size class 0 which is a stand-in for large objects, but large
+ // objects are tracked separately (and they actually get placed in
+ // the last bucket, not the first).
+ sizeClassBuckets[0] = 1 // The smallest allocation is 1 byte in size.
+ for i := 1; i < _NumSizeClasses; i++ {
+ // Size classes have an inclusive upper-bound
+ // and exclusive lower bound (e.g. 48-byte size class is
+ // (32, 48]) whereas we want an inclusive lower-bound
+ // and exclusive upper-bound (e.g. 48-byte size class is
+ // [33, 49). We can achieve this by shifting all bucket
+ // boundaries up by 1.
+ //
+ // Also, a float64 can precisely represent integers with
+ // value up to 2^53 and size classes are relatively small
+ // (nowhere near 2^48 even) so this will give us exact
+ // boundaries.
+ sizeClassBuckets[i] = float64(class_to_size[i] + 1)
+ }
+ sizeClassBuckets = append(sizeClassBuckets, float64Inf())
+
+ timeHistBuckets = timeHistogramMetricsBuckets()
+ metrics = map[string]metricData{
+ "/gc/cycles/automatic:gc-cycles": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.gcCyclesDone - in.sysStats.gcCyclesForced
+ },
+ },
+ "/gc/cycles/forced:gc-cycles": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.gcCyclesForced
+ },
+ },
+ "/gc/cycles/total:gc-cycles": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.gcCyclesDone
+ },
+ },
+ "/gc/heap/allocs-by-size:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ hist := out.float64HistOrInit(sizeClassBuckets)
+ hist.counts[len(hist.counts)-1] = uint64(in.heapStats.largeAllocCount)
+ // Cut off the first index which is ostensibly for size class 0,
+ // but large objects are tracked separately so it's actually unused.
+ for i, count := range in.heapStats.smallAllocCount[1:] {
+ hist.counts[i] = uint64(count)
+ }
+ },
+ },
+ "/gc/heap/frees-by-size:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ hist := out.float64HistOrInit(sizeClassBuckets)
+ hist.counts[len(hist.counts)-1] = uint64(in.heapStats.largeFreeCount)
+ // Cut off the first index which is ostensibly for size class 0,
+ // but large objects are tracked separately so it's actually unused.
+ for i, count := range in.heapStats.smallFreeCount[1:] {
+ hist.counts[i] = uint64(count)
+ }
+ },
+ },
+ "/gc/heap/goal:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.heapGoal
+ },
+ },
+ "/gc/heap/objects:objects": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.heapStats.numObjects
+ },
+ },
+ "/gc/pauses:seconds": {
+ compute: func(_ *statAggregate, out *metricValue) {
+ hist := out.float64HistOrInit(timeHistBuckets)
+ // The bottom-most bucket, containing negative values, is tracked
+ // separately as underflow, so fill that in manually and then
+ // iterate over the rest.
+ hist.counts[0] = atomic.Load64(&memstats.gcPauseDist.underflow)
+ for i := range memstats.gcPauseDist.counts {
+ hist.counts[i+1] = atomic.Load64(&memstats.gcPauseDist.counts[i])
+ }
+ },
+ },
+ "/memory/classes/heap/free:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(in.heapStats.committed - in.heapStats.inHeap -
+ in.heapStats.inStacks - in.heapStats.inWorkBufs -
+ in.heapStats.inPtrScalarBits)
+ },
+ },
+ "/memory/classes/heap/objects:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.heapStats.inObjects
+ },
+ },
+ "/memory/classes/heap/released:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(in.heapStats.released)
+ },
+ },
+ "/memory/classes/heap/stacks:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(in.heapStats.inStacks)
+ },
+ },
+ "/memory/classes/heap/unused:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(in.heapStats.inHeap) - in.heapStats.inObjects
+ },
+ },
+ "/memory/classes/metadata/mcache/free:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.mCacheSys - in.sysStats.mCacheInUse
+ },
+ },
+ "/memory/classes/metadata/mcache/inuse:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.mCacheInUse
+ },
+ },
+ "/memory/classes/metadata/mspan/free:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.mSpanSys - in.sysStats.mSpanInUse
+ },
+ },
+ "/memory/classes/metadata/mspan/inuse:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.mSpanInUse
+ },
+ },
+ "/memory/classes/metadata/other:bytes": {
+ deps: makeStatDepSet(heapStatsDep, sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(in.heapStats.inWorkBufs+in.heapStats.inPtrScalarBits) + in.sysStats.gcMiscSys
+ },
+ },
+ "/memory/classes/os-stacks:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.stacksSys
+ },
+ },
+ "/memory/classes/other:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.otherSys
+ },
+ },
+ "/memory/classes/profiling/buckets:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.buckHashSys
+ },
+ },
+ "/memory/classes/total:bytes": {
+ deps: makeStatDepSet(heapStatsDep, sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(in.heapStats.committed+in.heapStats.released) +
+ in.sysStats.stacksSys + in.sysStats.mSpanSys +
+ in.sysStats.mCacheSys + in.sysStats.buckHashSys +
+ in.sysStats.gcMiscSys + in.sysStats.otherSys
+ },
+ },
+ "/sched/goroutines:goroutines": {
+ compute: func(_ *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(gcount())
+ },
+ },
+ }
+ metricsInit = true
+}
+
+// statDep is a dependency on a group of statistics
+// that a metric might have.
+type statDep uint
+
+const (
+ heapStatsDep statDep = iota // corresponds to heapStatsAggregate
+ sysStatsDep // corresponds to sysStatsAggregate
+ numStatsDeps
+)
+
+// statDepSet represents a set of statDeps.
+//
+// Under the hood, it's a bitmap.
+type statDepSet [1]uint64
+
+// makeStatDepSet creates a new statDepSet from a list of statDeps.
+func makeStatDepSet(deps ...statDep) statDepSet {
+ var s statDepSet
+ for _, d := range deps {
+ s[d/64] |= 1 << (d % 64)
+ }
+ return s
+}
+
+// difference returns the set difference of s from b as a new set.
+func (s statDepSet) difference(b statDepSet) statDepSet {
+ var c statDepSet
+ for i := range s {
+ c[i] = s[i] &^ b[i]
+ }
+ return c
+}
+
+// union returns the union of the two sets as a new set.
+func (s statDepSet) union(b statDepSet) statDepSet {
+ var c statDepSet
+ for i := range s {
+ c[i] = s[i] | b[i]
+ }
+ return c
+}
+
+// empty returns true if there are no dependencies in the set.
+func (s *statDepSet) empty() bool {
+ for _, c := range s {
+ if c != 0 {
+ return false
+ }
+ }
+ return true
+}
+
+// has returns true if the set contains a given statDep.
+func (s *statDepSet) has(d statDep) bool {
+ return s[d/64]&(1<<(d%64)) != 0
+}
+
+// heapStatsAggregate represents memory stats obtained from the
+// runtime. This set of stats is grouped together because they
+// depend on each other in some way to make sense of the runtime's
+// current heap memory use. They're also sharded across Ps, so it
+// makes sense to grab them all at once.
+type heapStatsAggregate struct {
+ heapStatsDelta
+
+ // Derived from values in heapStatsDelta.
+
+ // inObjects is the bytes of memory occupied by objects.
+ inObjects uint64
+
+ // numObjects is the number of live objects in the heap.
+ numObjects uint64
+}
+
+// compute populates the heapStatsAggregate with values from the runtime.
+func (a *heapStatsAggregate) compute() {
+ memstats.heapStats.read(&a.heapStatsDelta)
+
+ // Calculate derived stats.
+ a.inObjects = uint64(a.largeAlloc - a.largeFree)
+ a.numObjects = uint64(a.largeAllocCount - a.largeFreeCount)
+ for i := range a.smallAllocCount {
+ n := uint64(a.smallAllocCount[i] - a.smallFreeCount[i])
+ a.inObjects += n * uint64(class_to_size[i])
+ a.numObjects += n
+ }
+}
+
+// sysStatsAggregate represents system memory stats obtained
+// from the runtime. This set of stats is grouped together because
+// they're all relatively cheap to acquire and generally independent
+// of one another and other runtime memory stats. The fact that they
+// may be acquired at different times, especially with respect to
+// heapStatsAggregate, means there could be some skew, but because
+// these stats are independent, there's no real consistency issue here.
+type sysStatsAggregate struct {
+ stacksSys uint64
+ mSpanSys uint64
+ mSpanInUse uint64
+ mCacheSys uint64
+ mCacheInUse uint64
+ buckHashSys uint64
+ gcMiscSys uint64
+ otherSys uint64
+ heapGoal uint64
+ gcCyclesDone uint64
+ gcCyclesForced uint64
+}
+
+// compute populates the sysStatsAggregate with values from the runtime.
+func (a *sysStatsAggregate) compute() {
+ a.stacksSys = memstats.stacks_sys.load()
+ a.buckHashSys = memstats.buckhash_sys.load()
+ a.gcMiscSys = memstats.gcMiscSys.load()
+ a.otherSys = memstats.other_sys.load()
+ a.heapGoal = atomic.Load64(&memstats.next_gc)
+ a.gcCyclesDone = uint64(memstats.numgc)
+ a.gcCyclesForced = uint64(memstats.numforcedgc)
+
+ systemstack(func() {
+ lock(&mheap_.lock)
+ a.mSpanSys = memstats.mspan_sys.load()
+ a.mSpanInUse = uint64(mheap_.spanalloc.inuse)
+ a.mCacheSys = memstats.mcache_sys.load()
+ a.mCacheInUse = uint64(mheap_.cachealloc.inuse)
+ unlock(&mheap_.lock)
+ })
+}
+
+// statAggregate is the main driver of the metrics implementation.
+//
+// It contains multiple aggregates of runtime statistics, as well
+// as a set of these aggregates that it has populated. The aggregates
+// are populated lazily by its ensure method.
+type statAggregate struct {
+ ensured statDepSet
+ heapStats heapStatsAggregate
+ sysStats sysStatsAggregate
+}
+
+// ensure populates statistics aggregates determined by deps if they
+// haven't yet been populated.
+func (a *statAggregate) ensure(deps *statDepSet) {
+ missing := deps.difference(a.ensured)
+ if missing.empty() {
+ return
+ }
+ for i := statDep(0); i < numStatsDeps; i++ {
+ if !missing.has(i) {
+ continue
+ }
+ switch i {
+ case heapStatsDep:
+ a.heapStats.compute()
+ case sysStatsDep:
+ a.sysStats.compute()
+ }
+ }
+ a.ensured = a.ensured.union(missing)
+}
+
+// metricKind is a runtime copy of runtime/metrics.ValueKind and
+// must be kept structurally identical to that type.
+type metricKind int
+
+const (
+ // These values must be kept identical to their corresponding Kind* values
+ // in the runtime/metrics package.
+ metricKindBad metricKind = iota
+ metricKindUint64
+ metricKindFloat64
+ metricKindFloat64Histogram
+)
+
+// metricSample is a runtime copy of runtime/metrics.Sample and
+// must be kept structurally identical to that type.
+type metricSample struct {
+ name string
+ value metricValue
+}
+
+// metricValue is a runtime copy of runtime/metrics.Value and
+// must be kept structurally identical to that type.
+type metricValue struct {
+ kind metricKind
+ scalar uint64 // contains scalar values for scalar Kinds.
+ pointer unsafe.Pointer // contains non-scalar values.
+}
+
+// float64HistOrInit tries to pull out an existing float64Histogram
+// from the value, but if none exists, then it allocates one with
+// the given buckets.
+func (v *metricValue) float64HistOrInit(buckets []float64) *metricFloat64Histogram {
+ var hist *metricFloat64Histogram
+ if v.kind == metricKindFloat64Histogram && v.pointer != nil {
+ hist = (*metricFloat64Histogram)(v.pointer)
+ } else {
+ v.kind = metricKindFloat64Histogram
+ hist = new(metricFloat64Histogram)
+ v.pointer = unsafe.Pointer(hist)
+ }
+ hist.buckets = buckets
+ if len(hist.counts) != len(hist.buckets)-1 {
+ hist.counts = make([]uint64, len(buckets)-1)
+ }
+ return hist
+}
+
+// metricFloat64Histogram is a runtime copy of runtime/metrics.Float64Histogram
+// and must be kept structurally identical to that type.
+type metricFloat64Histogram struct {
+ counts []uint64
+ buckets []float64
+}
+
+// agg is used by readMetrics, and is protected by metricsSema.
+//
+// Managed as a global variable because its pointer will be
+// an argument to a dynamically-defined function, and we'd
+// like to avoid it escaping to the heap.
+var agg statAggregate
+
+// readMetrics is the implementation of runtime/metrics.Read.
+//
+//go:linkname readMetrics runtime/metrics.runtime_readMetrics
+func readMetrics(samplesp unsafe.Pointer, len int, cap int) {
+ // Construct a slice from the args.
+ sl := slice{samplesp, len, cap}
+ samples := *(*[]metricSample)(unsafe.Pointer(&sl))
+
+ // Acquire the metricsSema but with handoff. This operation
+ // is expensive enough that queueing up goroutines and handing
+ // off between them will be noticeably better-behaved.
+ semacquire1(&metricsSema, true, 0, 0)
+
+ // Ensure the map is initialized.
+ initMetrics()
+
+ // Clear agg defensively.
+ agg = statAggregate{}
+
+ // Sample.
+ for i := range samples {
+ sample := &samples[i]
+ data, ok := metrics[sample.name]
+ if !ok {
+ sample.value.kind = metricKindBad
+ continue
+ }
+ // Ensure we have all the stats we need.
+ // agg is populated lazily.
+ agg.ensure(&data.deps)
+
+ // Compute the value based on the stats we have.
+ data.compute(&agg, &sample.value)
+ }
+
+ semrelease(&metricsSema)
+}
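
readMetrics above is the runtime half of the public runtime/metrics package: metrics.Read fills a user-provided []metrics.Sample and hands it to this function through the go:linkname. A small consumer, using metric names that appear in the map built by initMetrics (the values printed will of course vary; the "/does/not/exist:units" name is deliberately bogus to show the KindBad path):

    package main

    import (
        "fmt"
        "runtime/metrics"
    )

    func main() {
        samples := []metrics.Sample{
            {Name: "/gc/heap/goal:bytes"},
            {Name: "/sched/goroutines:goroutines"},
            {Name: "/does/not/exist:units"}, // unknown names come back as KindBad
        }
        metrics.Read(samples)
        for _, s := range samples {
            switch s.Value.Kind() {
            case metrics.KindUint64:
                fmt.Printf("%s = %d\n", s.Name, s.Value.Uint64())
            case metrics.KindBad:
                fmt.Printf("%s is not supported\n", s.Name)
            default:
                fmt.Printf("%s has kind %v\n", s.Name, s.Value.Kind())
            }
        }
    }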
diff --git a/src/runtime/metrics/description.go b/src/runtime/metrics/description.go
new file mode 100644
index 0000000..1175156
--- /dev/null
+++ b/src/runtime/metrics/description.go
@@ -0,0 +1,184 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package metrics
+
+// Description describes a runtime metric.
+type Description struct {
+ // Name is the full name of the metric which includes the unit.
+ //
+ // The format of the metric may be described by the following regular expression.
+ //
+ // ^(?P<name>/[^:]+):(?P<unit>[^:*/]+(?:[*/][^:*/]+)*)$
+ //
+ // The format splits the name into two components, separated by a colon: a path which always
+ // starts with a /, and a machine-parseable unit. The name may contain any valid Unicode
+ // codepoint in between / characters, but by convention will try to stick to lowercase
+ // characters and hyphens. An example of such a path might be "/memory/heap/free".
+ //
+ // The unit is by convention a series of lowercase English unit names (singular or plural)
+ // without prefixes delimited by '*' or '/'. The unit names may contain any valid Unicode
+ // codepoint that is not a delimiter.
+ // Examples of units might be "seconds", "bytes", "bytes/second", "cpu-seconds",
+ // "byte*cpu-seconds", and "bytes/second/second".
+ //
+ // For histograms, multiple units may apply. For instance, the units of the buckets and
+ // the count. By convention, for histograms, the units of the count are always "samples"
+ // with the type of sample evident from the metric's name, while the unit in the name
+ // specifies the buckets' unit.
+ //
+ // A complete name might look like "/memory/heap/free:bytes".
+ Name string
+
+ // Description is an English language sentence describing the metric.
+ Description string
+
+ // Kind is the kind of value for this metric.
+ //
+ // The purpose of this field is to allow users to filter out metrics whose values are
+ // types which their application may not understand.
+ Kind ValueKind
+
+ // Cumulative is whether or not the metric is cumulative. If a cumulative metric is just
+ // a single number, then it increases monotonically. If the metric is a distribution,
+ // then each bucket count increases monotonically.
+ //
+ // This flag thus indicates whether or not it's useful to compute a rate from this value.
+ Cumulative bool
+}
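
As a quick illustration of the name format documented above, the regular expression can be applied directly to split a key into its path and unit. This is only a sketch, using a key from the list below, and is not part of the package:

	package main

	import (
		"fmt"
		"regexp"
	)

	// nameRe is the Name format documented above.
	var nameRe = regexp.MustCompile(`^(?P<name>/[^:]+):(?P<unit>[^:*/]+(?:[*/][^:*/]+)*)$`)

	func main() {
		m := nameRe.FindStringSubmatch("/memory/classes/heap/free:bytes")
		if m == nil {
			fmt.Println("not a valid metric key")
			return
		}
		fmt.Println("path:", m[1]) // /memory/classes/heap/free
		fmt.Println("unit:", m[2]) // bytes
	}
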
+
+// The English language descriptions below must be kept in sync with the
+// descriptions of each metric in doc.go.
+var allDesc = []Description{
+ {
+ Name: "/gc/cycles/automatic:gc-cycles",
+ Description: "Count of completed GC cycles generated by the Go runtime.",
+ Kind: KindUint64,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/cycles/forced:gc-cycles",
+ Description: "Count of completed GC cycles forced by the application.",
+ Kind: KindUint64,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/cycles/total:gc-cycles",
+ Description: "Count of all completed GC cycles.",
+ Kind: KindUint64,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/heap/allocs-by-size:bytes",
+ Description: "Distribution of all objects allocated by approximate size.",
+ Kind: KindFloat64Histogram,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/heap/frees-by-size:bytes",
+ Description: "Distribution of all objects freed by approximate size.",
+ Kind: KindFloat64Histogram,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/heap/goal:bytes",
+ Description: "Heap size target for the end of the GC cycle.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/gc/heap/objects:objects",
+ Description: "Number of objects, live or unswept, occupying heap memory.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/gc/pauses:seconds",
+ Description: "Distribution of individual GC-related stop-the-world pause latencies.",
+ Kind: KindFloat64Histogram,
+ Cumulative: true,
+ },
+ {
+ Name: "/memory/classes/heap/free:bytes",
+ Description: "Memory that is completely free and eligible to be returned to the underlying system, " +
+ "but has not been. This metric is the runtime's estimate of free address space that is backed by " +
+ "physical memory.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/heap/objects:bytes",
+ Description: "Memory occupied by live objects and dead objects that have not yet been marked free by the garbage collector.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/heap/released:bytes",
+ Description: "Memory that is completely free and has been returned to the underlying system. This " +
+ "metric is the runtime's estimate of free address space that is still mapped into the process, " +
+ "but is not backed by physical memory.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/heap/stacks:bytes",
+ Description: "Memory allocated from the heap that is reserved for stack space, whether or not it is currently in-use.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/heap/unused:bytes",
+ Description: "Memory that is reserved for heap objects but is not currently used to hold heap objects.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/metadata/mcache/free:bytes",
+ Description: "Memory that is reserved for runtime mcache structures, but not in-use.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/metadata/mcache/inuse:bytes",
+ Description: "Memory that is occupied by runtime mcache structures that are currently being used.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/metadata/mspan/free:bytes",
+ Description: "Memory that is reserved for runtime mspan structures, but not in-use.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/metadata/mspan/inuse:bytes",
+ Description: "Memory that is occupied by runtime mspan structures that are currently being used.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/metadata/other:bytes",
+ Description: "Memory that is reserved for or used to hold runtime metadata.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/os-stacks:bytes",
+ Description: "Stack memory allocated by the underlying operating system.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/other:bytes",
+ Description: "Memory used by execution trace buffers, structures for debugging the runtime, finalizer and profiler specials, and more.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/profiling/buckets:bytes",
+ Description: "Memory that is used by the stack trace hash map used for profiling.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/total:bytes",
+ Description: "All memory mapped by the Go runtime into the current process as read-write. Note that this does not include memory mapped by code called via cgo or via the syscall package. Sum of all metrics in /memory/classes.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/sched/goroutines:goroutines",
+ Description: "Count of live goroutines.",
+ Kind: KindUint64,
+ },
+}
+
+// All returns a slice containing metric descriptions for all supported metrics.
+func All() []Description {
+ return allDesc
+}
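
Because Cumulative marks the metrics for which a rate is meaningful, a common pattern is to read such a metric twice and difference the samples. A minimal sketch, using only names and API declared in this patch:

	package main

	import (
		"fmt"
		"runtime"
		"runtime/metrics"
	)

	func main() {
		// /gc/cycles/total:gc-cycles is cumulative, so the delta between two
		// reads is the number of cycles completed in between.
		samples := []metrics.Sample{{Name: "/gc/cycles/total:gc-cycles"}}

		metrics.Read(samples)
		before := samples[0].Value.Uint64()

		runtime.GC() // generate some activity for the example

		metrics.Read(samples)
		after := samples[0].Value.Uint64()

		fmt.Printf("GC cycles completed in the window: %d\n", after-before)
	}
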
diff --git a/src/runtime/metrics/description_test.go b/src/runtime/metrics/description_test.go
new file mode 100644
index 0000000..fd1fd46
--- /dev/null
+++ b/src/runtime/metrics/description_test.go
@@ -0,0 +1,115 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package metrics_test
+
+import (
+ "bufio"
+ "os"
+ "regexp"
+ "runtime/metrics"
+ "strings"
+ "testing"
+)
+
+func TestDescriptionNameFormat(t *testing.T) {
+ r := regexp.MustCompile("^(?P<name>/[^:]+):(?P<unit>[^:*/]+(?:[*/][^:*/]+)*)$")
+ descriptions := metrics.All()
+ for _, desc := range descriptions {
+ if !r.MatchString(desc.Name) {
+ t.Errorf("metric %q does not match regexp %s", desc.Name, r)
+ }
+ }
+}
+
+func extractMetricDocs(t *testing.T) map[string]string {
+ f, err := os.Open("doc.go")
+ if err != nil {
+ t.Fatalf("failed to open doc.go in runtime/metrics package: %v", err)
+ }
+ const (
+ stateSearch = iota // look for list of metrics
+ stateNextMetric // look for next metric
+ stateNextDescription // build description
+ )
+ state := stateSearch
+ s := bufio.NewScanner(f)
+ result := make(map[string]string)
+ var metric string
+ var prevMetric string
+ var desc strings.Builder
+ for s.Scan() {
+ line := strings.TrimSpace(s.Text())
+ switch state {
+ case stateSearch:
+ if line == "Below is the full list of supported metrics, ordered lexicographically." {
+ state = stateNextMetric
+ }
+ case stateNextMetric:
+ // Ignore empty lines until we find a non-empty
+ // one. This will be our metric name.
+ if len(line) != 0 {
+ prevMetric = metric
+ metric = line
+ if prevMetric > metric {
+ t.Errorf("metrics %s and %s are out of lexicographical order", prevMetric, metric)
+ }
+ state = stateNextDescription
+ }
+ case stateNextDescription:
+ if len(line) == 0 || line == `*/` {
+ // An empty line means we're done.
+ // Write down the description and look
+ // for a new metric.
+ result[metric] = desc.String()
+ desc.Reset()
+ state = stateNextMetric
+ } else {
+ // As long as we're seeing data, assume that's
+ // part of the description and append it.
+ if desc.Len() != 0 {
+ // Turn previous newlines into spaces.
+ desc.WriteString(" ")
+ }
+ desc.WriteString(line)
+ }
+ }
+ if line == `*/` {
+ break
+ }
+ }
+ if state == stateSearch {
+ t.Fatalf("failed to find supported metrics docs in %s", f.Name())
+ }
+ return result
+}
+
+func TestDescriptionDocs(t *testing.T) {
+ docs := extractMetricDocs(t)
+ descriptions := metrics.All()
+ for _, d := range descriptions {
+ want := d.Description
+ got, ok := docs[d.Name]
+ if !ok {
+ t.Errorf("no docs found for metric %s", d.Name)
+ continue
+ }
+ if got != want {
+ t.Errorf("mismatched description and docs for metric %s", d.Name)
+ t.Errorf("want: %q, got %q", want, got)
+ continue
+ }
+ }
+ if len(docs) > len(descriptions) {
+ docsLoop:
+ for name := range docs {
+ for _, d := range descriptions {
+ if name == d.Name {
+ continue docsLoop
+ }
+ }
+ t.Errorf("stale documentation for non-existent metric: %s", name)
+ }
+ }
+}
diff --git a/src/runtime/metrics/doc.go b/src/runtime/metrics/doc.go
new file mode 100644
index 0000000..7f790af
--- /dev/null
+++ b/src/runtime/metrics/doc.go
@@ -0,0 +1,143 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+Package metrics provides a stable interface to access implementation-defined
+metrics exported by the Go runtime. This package is similar to existing functions
+like runtime.ReadMemStats and debug.ReadGCStats, but significantly more general.
+
+The set of metrics defined by this package may evolve as the runtime itself
+evolves, and also enables variation across Go implementations, whose relevant
+metric sets may not intersect.
+
+Interface
+
+Metrics are designated by a string key, rather than, for example, a field name in
+a struct. The full list of supported metrics is always available in the slice of
+Descriptions returned by All. Each Description also includes useful information
+about the metric.
+
+Thus, users of this API are encouraged to sample supported metrics defined by the
+slice returned by All to remain compatible across Go versions. Of course, situations
+arise where reading specific metrics is critical. For these cases, users are
+encouraged to use build tags, and although metrics may be deprecated and removed,
+users should consider this to be an exceptional and rare event, coinciding with a
+very large change in a particular Go implementation.
+
+Each metric key also has a "kind" that describes the format of the metric's value.
+In the interest of not breaking users of this package, the "kind" for a given metric
+is guaranteed not to change. If it must change, then a new metric will be introduced
+with a new key and a new "kind."
+
+Metric key format
+
+As mentioned earlier, metric keys are strings. Their format is simple and well-defined,
+designed to be both human and machine readable. It is split into two components,
+separated by a colon: a rooted path and a unit. The choice to include the unit in
+the key is motivated by compatibility: if a metric's unit changes, its semantics likely
+did also, and a new key should be introduced.
+
+For more details on the precise definition of the metric key's path and unit formats, see
+the documentation of the Name field of the Description struct.
+
+A note about floats
+
+This package supports metrics whose values have a floating-point representation. In
+order to improve ease-of-use, this package promises to never produce the following
+classes of floating-point values: NaN, infinity.
+
+Supported metrics
+
+Below is the full list of supported metrics, ordered lexicographically.
+
+ /gc/cycles/automatic:gc-cycles
+ Count of completed GC cycles generated by the Go runtime.
+
+ /gc/cycles/forced:gc-cycles
+ Count of completed GC cycles forced by the application.
+
+ /gc/cycles/total:gc-cycles
+ Count of all completed GC cycles.
+
+ /gc/heap/allocs-by-size:bytes
+ Distribution of all objects allocated by approximate size.
+
+ /gc/heap/frees-by-size:bytes
+ Distribution of all objects freed by approximate size.
+
+ /gc/heap/goal:bytes
+ Heap size target for the end of the GC cycle.
+
+ /gc/heap/objects:objects
+ Number of objects, live or unswept, occupying heap memory.
+
+ /gc/pauses:seconds
+ Distribution of individual GC-related stop-the-world pause latencies.
+
+ /memory/classes/heap/free:bytes
+ Memory that is completely free and eligible to be returned to
+ the underlying system, but has not been. This metric is the
+ runtime's estimate of free address space that is backed by
+ physical memory.
+
+ /memory/classes/heap/objects:bytes
+ Memory occupied by live objects and dead objects that have
+ not yet been marked free by the garbage collector.
+
+ /memory/classes/heap/released:bytes
+ Memory that is completely free and has been returned to
+ the underlying system. This metric is the runtime's estimate of
+ free address space that is still mapped into the process, but
+ is not backed by physical memory.
+
+ /memory/classes/heap/stacks:bytes
+ Memory allocated from the heap that is reserved for stack
+ space, whether or not it is currently in-use.
+
+ /memory/classes/heap/unused:bytes
+ Memory that is reserved for heap objects but is not currently
+ used to hold heap objects.
+
+ /memory/classes/metadata/mcache/free:bytes
+ Memory that is reserved for runtime mcache structures, but
+ not in-use.
+
+ /memory/classes/metadata/mcache/inuse:bytes
+ Memory that is occupied by runtime mcache structures that
+ are currently being used.
+
+ /memory/classes/metadata/mspan/free:bytes
+ Memory that is reserved for runtime mspan structures, but
+ not in-use.
+
+ /memory/classes/metadata/mspan/inuse:bytes
+ Memory that is occupied by runtime mspan structures that are
+ currently being used.
+
+ /memory/classes/metadata/other:bytes
+ Memory that is reserved for or used to hold runtime
+ metadata.
+
+ /memory/classes/os-stacks:bytes
+ Stack memory allocated by the underlying operating system.
+
+ /memory/classes/other:bytes
+ Memory used by execution trace buffers, structures for
+ debugging the runtime, finalizer and profiler specials, and
+ more.
+
+ /memory/classes/profiling/buckets:bytes
+ Memory that is used by the stack trace hash map used for
+ profiling.
+
+ /memory/classes/total:bytes
+ All memory mapped by the Go runtime into the current process
+ as read-write. Note that this does not include memory mapped
+ by code called via cgo or via the syscall package.
+ Sum of all metrics in /memory/classes.
+
+ /sched/goroutines:goroutines
+ Count of live goroutines.
+*/
+package metrics
diff --git a/src/runtime/metrics/example_test.go b/src/runtime/metrics/example_test.go
new file mode 100644
index 0000000..624d9d8
--- /dev/null
+++ b/src/runtime/metrics/example_test.go
@@ -0,0 +1,96 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package metrics_test
+
+import (
+ "fmt"
+ "runtime/metrics"
+)
+
+func ExampleRead_readingOneMetric() {
+ // Name of the metric we want to read.
+ const myMetric = "/memory/classes/heap/free:bytes"
+
+ // Create a sample for the metric.
+ sample := make([]metrics.Sample, 1)
+ sample[0].Name = myMetric
+
+ // Sample the metric.
+ metrics.Read(sample)
+
+ // Check if the metric is actually supported.
+ // If it's not, the resulting value will always have
+ // kind KindBad.
+ if sample[0].Value.Kind() == metrics.KindBad {
+ panic(fmt.Sprintf("metric %q no longer supported", myMetric))
+ }
+
+ // Handle the result.
+ //
+ // It's OK to assume a particular Kind for a metric;
+ // they're guaranteed not to change.
+ freeBytes := sample[0].Value.Uint64()
+
+ fmt.Printf("free but not released memory: %d\n", freeBytes)
+}
+
+func ExampleRead_readingAllMetrics() {
+ // Get descriptions for all supported metrics.
+ descs := metrics.All()
+
+ // Create a sample for each metric.
+ samples := make([]metrics.Sample, len(descs))
+ for i := range samples {
+ samples[i].Name = descs[i].Name
+ }
+
+ // Sample the metrics. Re-use the samples slice if you can!
+ metrics.Read(samples)
+
+ // Iterate over all results.
+ for _, sample := range samples {
+ // Pull out the name and value.
+ name, value := sample.Name, sample.Value
+
+ // Handle each sample.
+ switch value.Kind() {
+ case metrics.KindUint64:
+ fmt.Printf("%s: %d\n", name, value.Uint64())
+ case metrics.KindFloat64:
+ fmt.Printf("%s: %f\n", name, value.Float64())
+ case metrics.KindFloat64Histogram:
+ // The histogram may be quite large, so let's just pull out
+ // a crude estimate for the median for the sake of this example.
+ fmt.Printf("%s: %f\n", name, medianBucket(value.Float64Histogram()))
+ case metrics.KindBad:
+ // This should never happen because all metrics are supported
+ // by construction.
+ panic("bug in runtime/metrics package!")
+ default:
+ // This may happen as new metrics get added.
+ //
+ // The safest thing to do here is to simply log it somewhere
+ // as something to look into, but ignore it for now.
+ // In the worst case, you might temporarily miss out on a new metric.
+ fmt.Printf("%s: unexpected metric Kind: %v\n", name, value.Kind())
+ }
+ }
+}
+
+func medianBucket(h *metrics.Float64Histogram) float64 {
+ total := uint64(0)
+ for _, count := range h.Counts {
+ total += count
+ }
+ thresh := total / 2
+ total = 0
+ for i, count := range h.Counts {
+ total += count
+ if total >= thresh {
+ return h.Buckets[i]
+ }
+ }
+ panic("should not happen")
+}
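
The bucket-walking idea in medianBucket extends naturally to other quantiles. A hedged sketch that could sit alongside the example above (it reuses the metrics import and, like medianBucket, only gives a bucket-granularity estimate):

	// percentileBucket returns a crude estimate of the p-quantile (0 < p <= 1):
	// the lower bound of the bucket in which the cumulative weight first
	// reaches a fraction p of the total.
	func percentileBucket(h *metrics.Float64Histogram, p float64) float64 {
		total := uint64(0)
		for _, count := range h.Counts {
			total += count
		}
		thresh := uint64(p * float64(total))
		running := uint64(0)
		for i, count := range h.Counts {
			running += count
			if running >= thresh {
				return h.Buckets[i]
			}
		}
		panic("should not happen")
	}
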
diff --git a/src/runtime/metrics/histogram.go b/src/runtime/metrics/histogram.go
new file mode 100644
index 0000000..956422b
--- /dev/null
+++ b/src/runtime/metrics/histogram.go
@@ -0,0 +1,33 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package metrics
+
+// Float64Histogram represents a distribution of float64 values.
+type Float64Histogram struct {
+ // Counts contains the weights for each histogram bucket.
+ //
+ // Given N buckets, Counts[n] is the weight of the range
+ // [Buckets[n], Buckets[n+1]), for 0 <= n < N.
+ Counts []uint64
+
+ // Buckets contains the boundaries of the histogram buckets, in increasing order.
+ //
+ // Buckets[0] is the inclusive lower bound of the minimum bucket while
+ // Buckets[len(Buckets)-1] is the exclusive upper bound of the maximum bucket.
+ // Hence, there are len(Buckets)-1 counts. Furthermore, len(Buckets) != 1, always,
+ // since at least two boundaries are required to describe one bucket (and 0
+ // boundaries are used to describe 0 buckets).
+ //
+ // Buckets[0] is permitted to have value -Inf and Buckets[len(Buckets)-1] is
+ // permitted to have value Inf.
+ //
+ // For a given metric name, the value of Buckets is guaranteed not to change
+ // between calls until program exit.
+ //
+ // This slice value is permitted to alias with other Float64Histograms' Buckets
+ // fields, so the values within should only ever be read. If they need to be
+ // modified, the user must make a copy.
+ Buckets []float64
+}
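
A short sketch of walking the layout described above, treating the possibly infinite end boundaries specially; the /gc/pauses:seconds metric is used only as an example:

	package main

	import (
		"fmt"
		"math"
		"runtime/metrics"
	)

	func main() {
		samples := []metrics.Sample{{Name: "/gc/pauses:seconds"}}
		metrics.Read(samples)

		h := samples[0].Value.Float64Histogram()
		for i, count := range h.Counts {
			if count == 0 {
				continue // skip empty buckets to keep the output short
			}
			lo, hi := h.Buckets[i], h.Buckets[i+1]
			switch {
			case math.IsInf(lo, -1):
				fmt.Printf("(-Inf, %g): %d\n", hi, count)
			case math.IsInf(hi, +1):
				fmt.Printf("[%g, +Inf): %d\n", lo, count)
			default:
				fmt.Printf("[%g, %g): %d\n", lo, hi, count)
			}
		}
	}
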
diff --git a/src/runtime/metrics/sample.go b/src/runtime/metrics/sample.go
new file mode 100644
index 0000000..4cf8cdf
--- /dev/null
+++ b/src/runtime/metrics/sample.go
@@ -0,0 +1,47 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package metrics
+
+import (
+ _ "runtime" // depends on the runtime via a linkname'd function
+ "unsafe"
+)
+
+// Sample captures a single metric sample.
+type Sample struct {
+ // Name is the name of the metric sampled.
+ //
+ // It must correspond to a name in one of the metric descriptions
+ // returned by All.
+ Name string
+
+ // Value is the value of the metric sample.
+ Value Value
+}
+
+// Implemented in the runtime.
+func runtime_readMetrics(unsafe.Pointer, int, int)
+
+// Read populates each Value field in the given slice of metric samples.
+//
+// Desired metrics should be present in the slice with the appropriate name.
+// The user of this API is encouraged to re-use the same slice between calls for
+// efficiency, but is not required to do so.
+//
+// Note that re-use has some caveats. Notably, Values should not be read or
+// manipulated while a Read with that value is outstanding; that is a data race.
+// This property includes pointer-typed Values (for example, Float64Histogram)
+// whose underlying storage will be reused by Read when possible. To safely use
+// such values in a concurrent setting, all data must be deep-copied.
+//
+// It is safe to execute multiple Read calls concurrently, but their arguments
+// must share no underlying memory. When in doubt, create a new []Sample from
+// scratch, which is always safe, though may be inefficient.
+//
+// Sample values with names not appearing in All will have their Value populated
+// as KindBad to indicate that the name is unknown.
+func Read(m []Sample) {
+ runtime_readMetrics(unsafe.Pointer(&m[0]), len(m), cap(m))
+}
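
Given the reuse caveat above, a caller that wants to retain a histogram across calls to Read has to deep-copy it first. A minimal sketch (assumes an import of runtime/metrics; copying Buckets is not strictly necessary since bucket boundaries are stable, but it keeps the snapshot fully independent):

	// snapshotHistogram deep-copies h so the result remains valid after the
	// samples slice it came from is passed to Read again.
	func snapshotHistogram(h *metrics.Float64Histogram) *metrics.Float64Histogram {
		out := &metrics.Float64Histogram{
			Counts:  make([]uint64, len(h.Counts)),
			Buckets: make([]float64, len(h.Buckets)),
		}
		copy(out.Counts, h.Counts)
		copy(out.Buckets, h.Buckets)
		return out
	}
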
diff --git a/src/runtime/metrics/value.go b/src/runtime/metrics/value.go
new file mode 100644
index 0000000..ed9a33d
--- /dev/null
+++ b/src/runtime/metrics/value.go
@@ -0,0 +1,69 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package metrics
+
+import (
+ "math"
+ "unsafe"
+)
+
+// ValueKind is a tag for a metric Value which indicates its type.
+type ValueKind int
+
+const (
+ // KindBad indicates that the Value has no type and should not be used.
+ KindBad ValueKind = iota
+
+ // KindUint64 indicates that the type of the Value is a uint64.
+ KindUint64
+
+ // KindFloat64 indicates that the type of the Value is a float64.
+ KindFloat64
+
+ // KindFloat64Histogram indicates that the type of the Value is a *Float64Histogram.
+ KindFloat64Histogram
+)
+
+// Value represents a metric value returned by the runtime.
+type Value struct {
+ kind ValueKind
+ scalar uint64 // contains scalar values for scalar Kinds.
+ pointer unsafe.Pointer // contains non-scalar values.
+}
+
+// Kind returns the tag representing the kind of value this is.
+func (v Value) Kind() ValueKind {
+ return v.kind
+}
+
+// Uint64 returns the internal uint64 value for the metric.
+//
+// If v.Kind() != KindUint64, this method panics.
+func (v Value) Uint64() uint64 {
+ if v.kind != KindUint64 {
+ panic("called Uint64 on non-uint64 metric value")
+ }
+ return v.scalar
+}
+
+// Float64 returns the internal float64 value for the metric.
+//
+// If v.Kind() != KindFloat64, this method panics.
+func (v Value) Float64() float64 {
+ if v.kind != KindFloat64 {
+ panic("called Float64 on non-float64 metric value")
+ }
+ return math.Float64frombits(v.scalar)
+}
+
+// Float64Histogram returns the internal *Float64Histogram value for the metric.
+//
+// If v.Kind() != KindFloat64Histogram, this method panics.
+func (v Value) Float64Histogram() *Float64Histogram {
+ if v.kind != KindFloat64Histogram {
+ panic("called Float64Histogram on non-Float64Histogram metric value")
+ }
+ return (*Float64Histogram)(v.pointer)
+}
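
The accessors above intentionally panic on a kind mismatch. Callers that cannot statically guarantee a metric's kind sometimes wrap them; a hypothetical caller-side helper, not part of this package (assumes an import of runtime/metrics):

	// uint64OrFalse reports v's value and true when v holds a uint64,
	// and (0, false) otherwise, avoiding the panic in Value.Uint64.
	func uint64OrFalse(v metrics.Value) (uint64, bool) {
		if v.Kind() != metrics.KindUint64 {
			return 0, false
		}
		return v.Uint64(), true
	}
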
diff --git a/src/runtime/metrics_test.go b/src/runtime/metrics_test.go
new file mode 100644
index 0000000..8a3cf01
--- /dev/null
+++ b/src/runtime/metrics_test.go
@@ -0,0 +1,258 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "runtime/metrics"
+ "sort"
+ "strings"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+func prepareAllMetricsSamples() (map[string]metrics.Description, []metrics.Sample) {
+ all := metrics.All()
+ samples := make([]metrics.Sample, len(all))
+ descs := make(map[string]metrics.Description)
+ for i := range all {
+ samples[i].Name = all[i].Name
+ descs[all[i].Name] = all[i]
+ }
+ return descs, samples
+}
+
+func TestReadMetrics(t *testing.T) {
+ // Tests whether readMetrics produces values aligning
+ // with ReadMemStats while the world is stopped.
+ var mstats runtime.MemStats
+ _, samples := prepareAllMetricsSamples()
+ runtime.ReadMetricsSlow(&mstats, unsafe.Pointer(&samples[0]), len(samples), cap(samples))
+
+ checkUint64 := func(t *testing.T, m string, got, want uint64) {
+ t.Helper()
+ if got != want {
+ t.Errorf("metric %q: got %d, want %d", m, got, want)
+ }
+ }
+
+ // Check to make sure the values we read line up with other values we read.
+ for i := range samples {
+ switch name := samples[i].Name; name {
+ case "/memory/classes/heap/free:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.HeapIdle-mstats.HeapReleased)
+ case "/memory/classes/heap/released:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.HeapReleased)
+ case "/memory/classes/heap/objects:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.HeapAlloc)
+ case "/memory/classes/heap/unused:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.HeapInuse-mstats.HeapAlloc)
+ case "/memory/classes/heap/stacks:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.StackInuse)
+ case "/memory/classes/metadata/mcache/free:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.MCacheSys-mstats.MCacheInuse)
+ case "/memory/classes/metadata/mcache/inuse:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.MCacheInuse)
+ case "/memory/classes/metadata/mspan/free:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.MSpanSys-mstats.MSpanInuse)
+ case "/memory/classes/metadata/mspan/inuse:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.MSpanInuse)
+ case "/memory/classes/metadata/other:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.GCSys)
+ case "/memory/classes/os-stacks:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.StackSys-mstats.StackInuse)
+ case "/memory/classes/other:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.OtherSys)
+ case "/memory/classes/profiling/buckets:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.BuckHashSys)
+ case "/memory/classes/total:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.Sys)
+ case "/gc/heap/allocs-by-size:bytes":
+ hist := samples[i].Value.Float64Histogram()
+ // Skip size class 0 in BySize, because it's always empty and not represented
+ // in the histogram.
+ for i, sc := range mstats.BySize[1:] {
+ if b, s := hist.Buckets[i+1], float64(sc.Size+1); b != s {
+ t.Errorf("bucket does not match size class: got %f, want %f", b, s)
+ // The rest of the checks aren't expected to work anyway.
+ continue
+ }
+ if c, m := hist.Counts[i], sc.Mallocs; c != m {
+ t.Errorf("histogram counts do not match BySize for class %d: got %d, want %d", i, c, m)
+ }
+ }
+ case "/gc/heap/frees-by-size:bytes":
+ hist := samples[i].Value.Float64Histogram()
+ // Skip size class 0 in BySize, because it's always empty and not represented
+ // in the histogram.
+ for i, sc := range mstats.BySize[1:] {
+ if b, s := hist.Buckets[i+1], float64(sc.Size+1); b != s {
+ t.Errorf("bucket does not match size class: got %f, want %f", b, s)
+ // The rest of the checks aren't expected to work anyway.
+ continue
+ }
+ if c, f := hist.Counts[i], sc.Frees; c != f {
+ t.Errorf("histogram counts do not match BySize for class %d: got %d, want %d", i, c, f)
+ }
+ }
+ case "/gc/heap/objects:objects":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.HeapObjects)
+ case "/gc/heap/goal:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.NextGC)
+ case "/gc/cycles/automatic:gc-cycles":
+ checkUint64(t, name, samples[i].Value.Uint64(), uint64(mstats.NumGC-mstats.NumForcedGC))
+ case "/gc/cycles/forced:gc-cycles":
+ checkUint64(t, name, samples[i].Value.Uint64(), uint64(mstats.NumForcedGC))
+ case "/gc/cycles/total:gc-cycles":
+ checkUint64(t, name, samples[i].Value.Uint64(), uint64(mstats.NumGC))
+ }
+ }
+}
+
+func TestReadMetricsConsistency(t *testing.T) {
+ // Tests whether readMetrics produces consistent, sensible values.
+ // The values are read concurrently with the runtime doing other
+ // things (e.g. allocating) so what we read can't reasonably be
+ // compared to runtime values.
+
+ // Run a few GC cycles to get some of the stats to be non-zero.
+ runtime.GC()
+ runtime.GC()
+ runtime.GC()
+
+ // Read all the supported metrics through the metrics package.
+ descs, samples := prepareAllMetricsSamples()
+ metrics.Read(samples)
+
+ // Check to make sure the values we read make sense.
+ var totalVirtual struct {
+ got, want uint64
+ }
+ var objects struct {
+ alloc, free *metrics.Float64Histogram
+ total uint64
+ }
+ var gc struct {
+ numGC uint64
+ pauses uint64
+ }
+ for i := range samples {
+ kind := samples[i].Value.Kind()
+ if want := descs[samples[i].Name].Kind; kind != want {
+ t.Errorf("supported metric %q has unexpected kind: got %d, want %d", samples[i].Name, kind, want)
+ continue
+ }
+ if samples[i].Name != "/memory/classes/total:bytes" && strings.HasPrefix(samples[i].Name, "/memory/classes") {
+ v := samples[i].Value.Uint64()
+ totalVirtual.want += v
+
+ // None of these stats should ever get this big.
+ // If they do, there's probably overflow involved,
+ // usually due to bad accounting.
+ if int64(v) < 0 {
+ t.Errorf("%q has high/negative value: %d", samples[i].Name, v)
+ }
+ }
+ switch samples[i].Name {
+ case "/memory/classes/total:bytes":
+ totalVirtual.got = samples[i].Value.Uint64()
+ case "/gc/heap/objects:objects":
+ objects.total = samples[i].Value.Uint64()
+ case "/gc/heap/allocs-by-size:bytes":
+ objects.alloc = samples[i].Value.Float64Histogram()
+ case "/gc/heap/frees-by-size:bytes":
+ objects.free = samples[i].Value.Float64Histogram()
+ case "/gc/cycles/total:gc-cycles":
+ gc.numGC = samples[i].Value.Uint64()
+ case "/gc/pauses:seconds":
+ h := samples[i].Value.Float64Histogram()
+ gc.pauses = 0
+ for i := range h.Counts {
+ gc.pauses += h.Counts[i]
+ }
+ case "/sched/goroutines:goroutines":
+ if samples[i].Value.Uint64() < 1 {
+ t.Error("number of goroutines is less than one")
+ }
+ }
+ }
+ if totalVirtual.got != totalVirtual.want {
+ t.Errorf(`"/memory/classes/total:bytes" does not match sum of /memory/classes/**: got %d, want %d`, totalVirtual.got, totalVirtual.want)
+ }
+ if b, c := len(objects.alloc.Buckets), len(objects.alloc.Counts); b != c+1 {
+ t.Errorf("allocs-by-size has wrong bucket or counts length: %d buckets, %d counts", b, c)
+ }
+ if b, c := len(objects.free.Buckets), len(objects.free.Counts); b != c+1 {
+ t.Errorf("frees-by-size has wrong bucket or counts length: %d buckets, %d counts", b, c)
+ }
+ if len(objects.alloc.Buckets) != len(objects.free.Buckets) {
+ t.Error("allocs-by-size and frees-by-size buckets don't match in length")
+ } else if len(objects.alloc.Counts) != len(objects.free.Counts) {
+ t.Error("allocs-by-size and frees-by-size counts don't match in length")
+ } else {
+ for i := range objects.alloc.Buckets {
+ ba := objects.alloc.Buckets[i]
+ bf := objects.free.Buckets[i]
+ if ba != bf {
+ t.Errorf("bucket %d is different for alloc and free hists: %f != %f", i, ba, bf)
+ }
+ }
+ if !t.Failed() {
+ got, want := uint64(0), objects.total
+ for i := range objects.alloc.Counts {
+ if objects.alloc.Counts[i] < objects.free.Counts[i] {
+ t.Errorf("found more allocs than frees in object dist bucket %d", i)
+ continue
+ }
+ got += objects.alloc.Counts[i] - objects.free.Counts[i]
+ }
+ if got != want {
+ t.Errorf("object distribution counts don't match count of live objects: got %d, want %d", got, want)
+ }
+ }
+ }
+ // The current GC has at least 2 pauses per GC.
+ // Check to see if that value makes sense.
+ if gc.pauses < gc.numGC*2 {
+ t.Errorf("fewer pauses than expected: got %d, want at least %d", gc.pauses, gc.numGC*2)
+ }
+}
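
The live-object bookkeeping checked above can also be done from outside the runtime with the public API. A rough sketch (the result is only approximate, because the program keeps allocating while the counts are read):

	package main

	import (
		"fmt"
		"runtime/metrics"
	)

	func main() {
		samples := []metrics.Sample{
			{Name: "/gc/heap/allocs-by-size:bytes"},
			{Name: "/gc/heap/frees-by-size:bytes"},
		}
		metrics.Read(samples)

		allocs := samples[0].Value.Float64Histogram()
		frees := samples[1].Value.Float64Histogram()

		live := uint64(0)
		for i := range allocs.Counts {
			if a, f := allocs.Counts[i], frees.Counts[i]; a >= f {
				live += a - f // guard against transient skew between the two counts
			}
		}
		fmt.Printf("roughly %d live heap objects\n", live)
	}
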
+
+func BenchmarkReadMetricsLatency(b *testing.B) {
+ stop := applyGCLoad(b)
+
+ // Pre-allocate storage for the measured latencies.
+ latencies := make([]time.Duration, 0, 1024)
+ _, samples := prepareAllMetricsSamples()
+
+ // Hit metrics.Read continuously and measure.
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ start := time.Now()
+ metrics.Read(samples)
+ latencies = append(latencies, time.Since(start))
+ }
+ // Make sure to stop the timer before we wait! The load created above
+ // is very heavy-weight and not easy to stop, so we could end up
+ // confusing the benchmarking framework for small b.N.
+ b.StopTimer()
+ stop()
+
+ // Disable the default */op metrics.
+ // ns/op doesn't mean anything because it's an average, but we
+ // have a sleep in our b.N loop above which skews this significantly.
+ b.ReportMetric(0, "ns/op")
+ b.ReportMetric(0, "B/op")
+ b.ReportMetric(0, "allocs/op")
+
+ // Sort latencies then report percentiles.
+ sort.Slice(latencies, func(i, j int) bool {
+ return latencies[i] < latencies[j]
+ })
+ b.ReportMetric(float64(latencies[len(latencies)*50/100]), "p50-ns")
+ b.ReportMetric(float64(latencies[len(latencies)*90/100]), "p90-ns")
+ b.ReportMetric(float64(latencies[len(latencies)*99/100]), "p99-ns")
+}
diff --git a/src/runtime/mfinal.go b/src/runtime/mfinal.go
new file mode 100644
index 0000000..f4dbd77
--- /dev/null
+++ b/src/runtime/mfinal.go
@@ -0,0 +1,453 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Garbage collector: finalizers and block profiling.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// finblock is an array of finalizers to be executed. finblocks are
+// arranged in a linked list for the finalizer queue.
+//
+// finblock is allocated from non-GC'd memory, so any heap pointers
+// must be specially handled. GC currently assumes that the finalizer
+// queue does not grow during marking (but it can shrink).
+//
+//go:notinheap
+type finblock struct {
+ alllink *finblock
+ next *finblock
+ cnt uint32
+ _ int32
+ fin [(_FinBlockSize - 2*sys.PtrSize - 2*4) / unsafe.Sizeof(finalizer{})]finalizer
+}
+
+var finlock mutex // protects the following variables
+var fing *g // goroutine that runs finalizers
+var finq *finblock // list of finalizers that are to be executed
+var finc *finblock // cache of free blocks
+var finptrmask [_FinBlockSize / sys.PtrSize / 8]byte
+var fingwait bool
+var fingwake bool
+var allfin *finblock // list of all blocks
+
+// NOTE: Layout known to queuefinalizer.
+type finalizer struct {
+ fn *funcval // function to call (may be a heap pointer)
+ arg unsafe.Pointer // ptr to object (may be a heap pointer)
+ nret uintptr // bytes of return values from fn
+ fint *_type // type of first argument of fn
+ ot *ptrtype // type of ptr to object (may be a heap pointer)
+}
+
+var finalizer1 = [...]byte{
+ // Each Finalizer is 5 words, ptr ptr INT ptr ptr (INT = uintptr here)
+ // Each byte describes 8 words.
+ // Need 8 Finalizers described by 5 bytes before pattern repeats:
+ // ptr ptr INT ptr ptr
+ // ptr ptr INT ptr ptr
+ // ptr ptr INT ptr ptr
+ // ptr ptr INT ptr ptr
+ // ptr ptr INT ptr ptr
+ // ptr ptr INT ptr ptr
+ // ptr ptr INT ptr ptr
+ // ptr ptr INT ptr ptr
+ // aka
+ //
+ // ptr ptr INT ptr ptr ptr ptr INT
+ // ptr ptr ptr ptr INT ptr ptr ptr
+ // ptr INT ptr ptr ptr ptr INT ptr
+ // ptr ptr ptr INT ptr ptr ptr ptr
+ // INT ptr ptr ptr ptr INT ptr ptr
+ //
+ // Assumptions about Finalizer layout checked below.
+ 1<<0 | 1<<1 | 0<<2 | 1<<3 | 1<<4 | 1<<5 | 1<<6 | 0<<7,
+ 1<<0 | 1<<1 | 1<<2 | 1<<3 | 0<<4 | 1<<5 | 1<<6 | 1<<7,
+ 1<<0 | 0<<1 | 1<<2 | 1<<3 | 1<<4 | 1<<5 | 0<<6 | 1<<7,
+ 1<<0 | 1<<1 | 1<<2 | 0<<3 | 1<<4 | 1<<5 | 1<<6 | 1<<7,
+ 0<<0 | 1<<1 | 1<<2 | 1<<3 | 1<<4 | 0<<5 | 1<<6 | 1<<7,
+}
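
The mask bytes above can be cross-checked mechanically: one finalizer is five words with bit pattern 1 1 0 1 1, eight words are packed per byte, and lcm(5, 8) = 40 words means the pattern repeats every five bytes (eight finalizers). A small, standalone sketch that regenerates the table:

	package main

	import "fmt"

	func main() {
		// One finalizer is 5 words: fn, arg, nret, fint, ot -> ptr ptr INT ptr ptr.
		word := []byte{1, 1, 0, 1, 1}
		// 8 finalizers * 5 words = 40 words = 5 mask bytes (8 words per byte),
		// after which the pattern repeats.
		var mask [5]byte
		for i := 0; i < 40; i++ {
			if word[i%5] == 1 {
				mask[i/8] |= 1 << (i % 8)
			}
		}
		// Each printed byte should match the corresponding entry of finalizer1.
		for _, b := range mask {
			fmt.Printf("%08b\n", b)
		}
	}
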
+
+func queuefinalizer(p unsafe.Pointer, fn *funcval, nret uintptr, fint *_type, ot *ptrtype) {
+ if gcphase != _GCoff {
+ // Currently we assume that the finalizer queue won't
+ // grow during marking so we don't have to rescan it
+ // during mark termination. If we ever need to lift
+ // this assumption, we can do it by adding the
+ // necessary barriers to queuefinalizer (which it may
+ // have automatically).
+ throw("queuefinalizer during GC")
+ }
+
+ lock(&finlock)
+ if finq == nil || finq.cnt == uint32(len(finq.fin)) {
+ if finc == nil {
+ finc = (*finblock)(persistentalloc(_FinBlockSize, 0, &memstats.gcMiscSys))
+ finc.alllink = allfin
+ allfin = finc
+ if finptrmask[0] == 0 {
+ // Build pointer mask for Finalizer array in block.
+ // Check assumptions made in finalizer1 array above.
+ if (unsafe.Sizeof(finalizer{}) != 5*sys.PtrSize ||
+ unsafe.Offsetof(finalizer{}.fn) != 0 ||
+ unsafe.Offsetof(finalizer{}.arg) != sys.PtrSize ||
+ unsafe.Offsetof(finalizer{}.nret) != 2*sys.PtrSize ||
+ unsafe.Offsetof(finalizer{}.fint) != 3*sys.PtrSize ||
+ unsafe.Offsetof(finalizer{}.ot) != 4*sys.PtrSize) {
+ throw("finalizer out of sync")
+ }
+ for i := range finptrmask {
+ finptrmask[i] = finalizer1[i%len(finalizer1)]
+ }
+ }
+ }
+ block := finc
+ finc = block.next
+ block.next = finq
+ finq = block
+ }
+ f := &finq.fin[finq.cnt]
+ atomic.Xadd(&finq.cnt, +1) // Sync with markroots
+ f.fn = fn
+ f.nret = nret
+ f.fint = fint
+ f.ot = ot
+ f.arg = p
+ fingwake = true
+ unlock(&finlock)
+}
+
+//go:nowritebarrier
+func iterate_finq(callback func(*funcval, unsafe.Pointer, uintptr, *_type, *ptrtype)) {
+ for fb := allfin; fb != nil; fb = fb.alllink {
+ for i := uint32(0); i < fb.cnt; i++ {
+ f := &fb.fin[i]
+ callback(f.fn, f.arg, f.nret, f.fint, f.ot)
+ }
+ }
+}
+
+func wakefing() *g {
+ var res *g
+ lock(&finlock)
+ if fingwait && fingwake {
+ fingwait = false
+ fingwake = false
+ res = fing
+ }
+ unlock(&finlock)
+ return res
+}
+
+var (
+ fingCreate uint32
+ fingRunning bool
+)
+
+func createfing() {
+ // start the finalizer goroutine exactly once
+ if fingCreate == 0 && atomic.Cas(&fingCreate, 0, 1) {
+ go runfinq()
+ }
+}
+
+// This is the goroutine that runs all of the finalizers
+func runfinq() {
+ var (
+ frame unsafe.Pointer
+ framecap uintptr
+ )
+
+ for {
+ lock(&finlock)
+ fb := finq
+ finq = nil
+ if fb == nil {
+ gp := getg()
+ fing = gp
+ fingwait = true
+ goparkunlock(&finlock, waitReasonFinalizerWait, traceEvGoBlock, 1)
+ continue
+ }
+ unlock(&finlock)
+ if raceenabled {
+ racefingo()
+ }
+ for fb != nil {
+ for i := fb.cnt; i > 0; i-- {
+ f := &fb.fin[i-1]
+
+ framesz := unsafe.Sizeof((interface{})(nil)) + f.nret
+ if framecap < framesz {
+ // The frame does not contain pointers interesting for GC;
+ // all not-yet-finalized objects are stored in finq.
+ // If we do not mark it as FlagNoScan,
+ // the last finalized object is not collected.
+ frame = mallocgc(framesz, nil, true)
+ framecap = framesz
+ }
+
+ if f.fint == nil {
+ throw("missing type in runfinq")
+ }
+ // frame is effectively uninitialized
+ // memory. That means we have to clear
+ // it before writing to it to avoid
+ // confusing the write barrier.
+ *(*[2]uintptr)(frame) = [2]uintptr{}
+ switch f.fint.kind & kindMask {
+ case kindPtr:
+ // direct use of pointer
+ *(*unsafe.Pointer)(frame) = f.arg
+ case kindInterface:
+ ityp := (*interfacetype)(unsafe.Pointer(f.fint))
+ // set up with empty interface
+ (*eface)(frame)._type = &f.ot.typ
+ (*eface)(frame).data = f.arg
+ if len(ityp.mhdr) != 0 {
+ // convert to interface with methods
+ // this conversion is guaranteed to succeed - we checked in SetFinalizer
+ *(*iface)(frame) = assertE2I(ityp, *(*eface)(frame))
+ }
+ default:
+ throw("bad kind in runfinq")
+ }
+ fingRunning = true
+ reflectcall(nil, unsafe.Pointer(f.fn), frame, uint32(framesz), uint32(framesz))
+ fingRunning = false
+
+ // Drop finalizer queue heap references
+ // before hiding them from markroot.
+ // This also ensures these will be
+ // clear if we reuse the finalizer.
+ f.fn = nil
+ f.arg = nil
+ f.ot = nil
+ atomic.Store(&fb.cnt, i-1)
+ }
+ next := fb.next
+ lock(&finlock)
+ fb.next = finc
+ finc = fb
+ unlock(&finlock)
+ fb = next
+ }
+ }
+}
+
+// SetFinalizer sets the finalizer associated with obj to the provided
+// finalizer function. When the garbage collector finds an unreachable block
+// with an associated finalizer, it clears the association and runs
+// finalizer(obj) in a separate goroutine. This makes obj reachable again,
+// but now without an associated finalizer. Assuming that SetFinalizer
+// is not called again, the next time the garbage collector sees
+// that obj is unreachable, it will free obj.
+//
+// SetFinalizer(obj, nil) clears any finalizer associated with obj.
+//
+// The argument obj must be a pointer to an object allocated by calling
+// new, by taking the address of a composite literal, or by taking the
+// address of a local variable.
+// The argument finalizer must be a function that takes a single argument
+// to which obj's type can be assigned, and can have arbitrary ignored return
+// values. If either of these is not true, SetFinalizer may abort the
+// program.
+//
+// Finalizers are run in dependency order: if A points at B, both have
+// finalizers, and they are otherwise unreachable, only the finalizer
+// for A runs; once A is freed, the finalizer for B can run.
+// If a cyclic structure includes a block with a finalizer, that
+// cycle is not guaranteed to be garbage collected and the finalizer
+// is not guaranteed to run, because there is no ordering that
+// respects the dependencies.
+//
+// The finalizer is scheduled to run at some arbitrary time after the
+// program can no longer reach the object to which obj points.
+// There is no guarantee that finalizers will run before a program exits,
+// so typically they are useful only for releasing non-memory resources
+// associated with an object during a long-running program.
+// For example, an os.File object could use a finalizer to close the
+// associated operating system file descriptor when a program discards
+// an os.File without calling Close, but it would be a mistake
+// to depend on a finalizer to flush an in-memory I/O buffer such as a
+// bufio.Writer, because the buffer would not be flushed at program exit.
+//
+// It is not guaranteed that a finalizer will run if the size of *obj is
+// zero bytes.
+//
+// It is not guaranteed that a finalizer will run for objects allocated
+// in initializers for package-level variables. Such objects may be
+// linker-allocated, not heap-allocated.
+//
+// A finalizer may run as soon as an object becomes unreachable.
+// In order to use finalizers correctly, the program must ensure that
+// the object is reachable until it is no longer required.
+// Objects stored in global variables, or that can be found by tracing
+// pointers from a global variable, are reachable. For other objects,
+// pass the object to a call of the KeepAlive function to mark the
+// last point in the function where the object must be reachable.
+//
+// For example, if p points to a struct, such as os.File, that contains
+// a file descriptor d, and p has a finalizer that closes that file
+// descriptor, and if the last use of p in a function is a call to
+// syscall.Write(p.d, buf, size), then p may be unreachable as soon as
+// the program enters syscall.Write. The finalizer may run at that moment,
+// closing p.d, causing syscall.Write to fail because it is writing to
+// a closed file descriptor (or, worse, to an entirely different
+// file descriptor opened by a different goroutine). To avoid this problem,
+// call runtime.KeepAlive(p) after the call to syscall.Write.
+//
+// A single goroutine runs all finalizers for a program, sequentially.
+// If a finalizer must run for a long time, it should do so by starting
+// a new goroutine.
+func SetFinalizer(obj interface{}, finalizer interface{}) {
+ if debug.sbrk != 0 {
+ // debug.sbrk never frees memory, so no finalizers run
+ // (and we don't have the data structures to record them).
+ return
+ }
+ e := efaceOf(&obj)
+ etyp := e._type
+ if etyp == nil {
+ throw("runtime.SetFinalizer: first argument is nil")
+ }
+ if etyp.kind&kindMask != kindPtr {
+ throw("runtime.SetFinalizer: first argument is " + etyp.string() + ", not pointer")
+ }
+ ot := (*ptrtype)(unsafe.Pointer(etyp))
+ if ot.elem == nil {
+ throw("nil elem type!")
+ }
+
+ // find the containing object
+ base, _, _ := findObject(uintptr(e.data), 0, 0)
+
+ if base == 0 {
+ // 0-length objects are okay.
+ if e.data == unsafe.Pointer(&zerobase) {
+ return
+ }
+
+ // Global initializers might be linker-allocated.
+ // var Foo = &Object{}
+ // func main() {
+ // runtime.SetFinalizer(Foo, nil)
+ // }
+ // The relevant segments are: noptrdata, data, bss, noptrbss.
+ // We cannot assume they are in any order or even contiguous,
+ // due to external linking.
+ for datap := &firstmoduledata; datap != nil; datap = datap.next {
+ if datap.noptrdata <= uintptr(e.data) && uintptr(e.data) < datap.enoptrdata ||
+ datap.data <= uintptr(e.data) && uintptr(e.data) < datap.edata ||
+ datap.bss <= uintptr(e.data) && uintptr(e.data) < datap.ebss ||
+ datap.noptrbss <= uintptr(e.data) && uintptr(e.data) < datap.enoptrbss {
+ return
+ }
+ }
+ throw("runtime.SetFinalizer: pointer not in allocated block")
+ }
+
+ if uintptr(e.data) != base {
+ // As an implementation detail we allow setting finalizers on an inner byte
+ // of an object if it could come from a tiny alloc (see mallocgc for details).
+ if ot.elem == nil || ot.elem.ptrdata != 0 || ot.elem.size >= maxTinySize {
+ throw("runtime.SetFinalizer: pointer not at beginning of allocated block")
+ }
+ }
+
+ f := efaceOf(&finalizer)
+ ftyp := f._type
+ if ftyp == nil {
+ // switch to system stack and remove finalizer
+ systemstack(func() {
+ removefinalizer(e.data)
+ })
+ return
+ }
+
+ if ftyp.kind&kindMask != kindFunc {
+ throw("runtime.SetFinalizer: second argument is " + ftyp.string() + ", not a function")
+ }
+ ft := (*functype)(unsafe.Pointer(ftyp))
+ if ft.dotdotdot() {
+ throw("runtime.SetFinalizer: cannot pass " + etyp.string() + " to finalizer " + ftyp.string() + " because dotdotdot")
+ }
+ if ft.inCount != 1 {
+ throw("runtime.SetFinalizer: cannot pass " + etyp.string() + " to finalizer " + ftyp.string())
+ }
+ fint := ft.in()[0]
+ switch {
+ case fint == etyp:
+ // ok - same type
+ goto okarg
+ case fint.kind&kindMask == kindPtr:
+ if (fint.uncommon() == nil || etyp.uncommon() == nil) && (*ptrtype)(unsafe.Pointer(fint)).elem == ot.elem {
+ // ok - not same type, but both pointers,
+ // one or the other is unnamed, and same element type, so assignable.
+ goto okarg
+ }
+ case fint.kind&kindMask == kindInterface:
+ ityp := (*interfacetype)(unsafe.Pointer(fint))
+ if len(ityp.mhdr) == 0 {
+ // ok - satisfies empty interface
+ goto okarg
+ }
+ if _, ok := assertE2I2(ityp, *efaceOf(&obj)); ok {
+ goto okarg
+ }
+ }
+ throw("runtime.SetFinalizer: cannot pass " + etyp.string() + " to finalizer " + ftyp.string())
+okarg:
+ // compute size needed for return parameters
+ nret := uintptr(0)
+ for _, t := range ft.out() {
+ nret = alignUp(nret, uintptr(t.align)) + uintptr(t.size)
+ }
+ nret = alignUp(nret, sys.PtrSize)
+
+ // make sure we have a finalizer goroutine
+ createfing()
+
+ systemstack(func() {
+ if !addfinalizer(e.data, (*funcval)(f.data), nret, fint, ot) {
+ throw("runtime.SetFinalizer: finalizer already set")
+ }
+ })
+}
+
+// Mark KeepAlive as noinline so that it is easily detectable as an intrinsic.
+//go:noinline
+
+// KeepAlive marks its argument as currently reachable.
+// This ensures that the object is not freed, and its finalizer is not run,
+// before the point in the program where KeepAlive is called.
+//
+// A very simplified example showing where KeepAlive is required:
+// type File struct { d int }
+// d, err := syscall.Open("/file/path", syscall.O_RDONLY, 0)
+// // ... do something if err != nil ...
+// p := &File{d}
+// runtime.SetFinalizer(p, func(p *File) { syscall.Close(p.d) })
+// var buf [10]byte
+// n, err := syscall.Read(p.d, buf[:])
+// // Ensure p is not finalized until Read returns.
+// runtime.KeepAlive(p)
+// // No more uses of p after this point.
+//
+// Without the KeepAlive call, the finalizer could run at the start of
+// syscall.Read, closing the file descriptor before syscall.Read makes
+// the actual system call.
+func KeepAlive(x interface{}) {
+ // Introduce a use of x that the compiler can't eliminate.
+ // This makes sure x is alive on entry. We need x to be alive
+ // on entry for "defer runtime.KeepAlive(x)"; see issue 21402.
+ if cgoAlwaysFalse {
+ println(x)
+ }
+}
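
A runnable version of the File pattern sketched in the comments above; the path and the direct use of package syscall are only for illustration, and a Unix-like system is assumed:

	package main

	import (
		"fmt"
		"runtime"
		"syscall"
	)

	type File struct{ d int }

	func main() {
		d, err := syscall.Open("/etc/hosts", syscall.O_RDONLY, 0)
		if err != nil {
			fmt.Println("open:", err)
			return
		}
		p := &File{d}
		runtime.SetFinalizer(p, func(p *File) { syscall.Close(p.d) })

		var buf [10]byte
		n, err := syscall.Read(p.d, buf[:])
		// Ensure p (and therefore p.d) stays reachable until Read returns.
		runtime.KeepAlive(p)
		fmt.Println(n, err)
	}
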
diff --git a/src/runtime/mfinal_test.go b/src/runtime/mfinal_test.go
new file mode 100644
index 0000000..3ca8d31
--- /dev/null
+++ b/src/runtime/mfinal_test.go
@@ -0,0 +1,264 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+type Tintptr *int // assignable to *int
+type Tint int // *Tint implements Tinter, interface{}
+
+func (t *Tint) m() {}
+
+type Tinter interface {
+ m()
+}
+
+func TestFinalizerType(t *testing.T) {
+ if runtime.GOARCH != "amd64" {
+ t.Skipf("Skipping on non-amd64 machine")
+ }
+
+ ch := make(chan bool, 10)
+ finalize := func(x *int) {
+ if *x != 97531 {
+ t.Errorf("finalizer %d, want %d", *x, 97531)
+ }
+ ch <- true
+ }
+
+ var finalizerTests = []struct {
+ convert func(*int) interface{}
+ finalizer interface{}
+ }{
+ {func(x *int) interface{} { return x }, func(v *int) { finalize(v) }},
+ {func(x *int) interface{} { return Tintptr(x) }, func(v Tintptr) { finalize(v) }},
+ {func(x *int) interface{} { return Tintptr(x) }, func(v *int) { finalize(v) }},
+ {func(x *int) interface{} { return (*Tint)(x) }, func(v *Tint) { finalize((*int)(v)) }},
+ {func(x *int) interface{} { return (*Tint)(x) }, func(v Tinter) { finalize((*int)(v.(*Tint))) }},
+ }
+
+ for i, tt := range finalizerTests {
+ done := make(chan bool, 1)
+ go func() {
+ // allocate struct with pointer to avoid hitting tinyalloc.
+ // Otherwise we can't be sure when the allocation will
+ // be freed.
+ type T struct {
+ v int
+ p unsafe.Pointer
+ }
+ v := &new(T).v
+ *v = 97531
+ runtime.SetFinalizer(tt.convert(v), tt.finalizer)
+ v = nil
+ done <- true
+ }()
+ <-done
+ runtime.GC()
+ select {
+ case <-ch:
+ case <-time.After(time.Second * 4):
+ t.Errorf("#%d: finalizer for type %T didn't run", i, tt.finalizer)
+ }
+ }
+}
+
+type bigValue struct {
+ fill uint64
+ it bool
+ up string
+}
+
+func TestFinalizerInterfaceBig(t *testing.T) {
+ if runtime.GOARCH != "amd64" {
+ t.Skipf("Skipping on non-amd64 machine")
+ }
+ ch := make(chan bool)
+ done := make(chan bool, 1)
+ go func() {
+ v := &bigValue{0xDEADBEEFDEADBEEF, true, "It matters not how strait the gate"}
+ old := *v
+ runtime.SetFinalizer(v, func(v interface{}) {
+ i, ok := v.(*bigValue)
+ if !ok {
+ t.Errorf("finalizer called with type %T, want *bigValue", v)
+ }
+ if *i != old {
+ t.Errorf("finalizer called with %+v, want %+v", *i, old)
+ }
+ close(ch)
+ })
+ v = nil
+ done <- true
+ }()
+ <-done
+ runtime.GC()
+ select {
+ case <-ch:
+ case <-time.After(4 * time.Second):
+ t.Errorf("finalizer for type *bigValue didn't run")
+ }
+}
+
+func fin(v *int) {
+}
+
+// Verify we don't crash at least. golang.org/issue/6857
+func TestFinalizerZeroSizedStruct(t *testing.T) {
+ type Z struct{}
+ z := new(Z)
+ runtime.SetFinalizer(z, func(*Z) {})
+}
+
+func BenchmarkFinalizer(b *testing.B) {
+ const Batch = 1000
+ b.RunParallel(func(pb *testing.PB) {
+ var data [Batch]*int
+ for i := 0; i < Batch; i++ {
+ data[i] = new(int)
+ }
+ for pb.Next() {
+ for i := 0; i < Batch; i++ {
+ runtime.SetFinalizer(data[i], fin)
+ }
+ for i := 0; i < Batch; i++ {
+ runtime.SetFinalizer(data[i], nil)
+ }
+ }
+ })
+}
+
+func BenchmarkFinalizerRun(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ v := new(int)
+ runtime.SetFinalizer(v, fin)
+ }
+ })
+}
+
+// One chunk must be exactly one sizeclass in size.
+// It should be a sizeclass not used much by others, so we
+// have a greater chance of finding adjacent ones.
+// size class 19: 320 byte objects, 25 per page, 1 page alloc at a time
+const objsize = 320
+
+type objtype [objsize]byte
+
+func adjChunks() (*objtype, *objtype) {
+ var s []*objtype
+
+ for {
+ c := new(objtype)
+ for _, d := range s {
+ if uintptr(unsafe.Pointer(c))+unsafe.Sizeof(*c) == uintptr(unsafe.Pointer(d)) {
+ return c, d
+ }
+ if uintptr(unsafe.Pointer(d))+unsafe.Sizeof(*c) == uintptr(unsafe.Pointer(c)) {
+ return d, c
+ }
+ }
+ s = append(s, c)
+ }
+}
+
+// Make sure an empty slice on the stack doesn't pin the next object in memory.
+func TestEmptySlice(t *testing.T) {
+ x, y := adjChunks()
+
+ // the pointer inside xs points to y.
+ xs := x[objsize:] // change objsize to objsize-1 and the test passes
+
+ fin := make(chan bool, 1)
+ runtime.SetFinalizer(y, func(z *objtype) { fin <- true })
+ runtime.GC()
+ select {
+ case <-fin:
+ case <-time.After(4 * time.Second):
+ t.Errorf("finalizer of next object in memory didn't run")
+ }
+ xsglobal = xs // keep empty slice alive until here
+}
+
+var xsglobal []byte
+
+func adjStringChunk() (string, *objtype) {
+ b := make([]byte, objsize)
+ for {
+ s := string(b)
+ t := new(objtype)
+ p := *(*uintptr)(unsafe.Pointer(&s))
+ q := uintptr(unsafe.Pointer(t))
+ if p+objsize == q {
+ return s, t
+ }
+ }
+}
+
+// Make sure an empty string on the stack doesn't pin the next object in memory.
+func TestEmptyString(t *testing.T) {
+ x, y := adjStringChunk()
+
+ ss := x[objsize:] // change objsize to objsize-1 and the test passes
+ fin := make(chan bool, 1)
+ // set finalizer on string contents of y
+ runtime.SetFinalizer(y, func(z *objtype) { fin <- true })
+ runtime.GC()
+ select {
+ case <-fin:
+ case <-time.After(4 * time.Second):
+ t.Errorf("finalizer of next string in memory didn't run")
+ }
+ ssglobal = ss // keep 0-length string live until here
+}
+
+var ssglobal string
+
+// Test for issue 7656.
+func TestFinalizerOnGlobal(t *testing.T) {
+ runtime.SetFinalizer(Foo1, func(p *Object1) {})
+ runtime.SetFinalizer(Foo2, func(p *Object2) {})
+ runtime.SetFinalizer(Foo1, nil)
+ runtime.SetFinalizer(Foo2, nil)
+}
+
+type Object1 struct {
+ Something []byte
+}
+
+type Object2 struct {
+ Something byte
+}
+
+var (
+ Foo2 = &Object2{}
+ Foo1 = &Object1{}
+)
+
+func TestDeferKeepAlive(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+
+ // See issue 21402.
+ t.Parallel()
+ type T *int // needs to be a pointer base type to avoid tinyalloc and its never-finalized behavior.
+ x := new(T)
+ finRun := false
+ runtime.SetFinalizer(x, func(x *T) {
+ finRun = true
+ })
+ defer runtime.KeepAlive(x)
+ runtime.GC()
+ time.Sleep(time.Second)
+ if finRun {
+ t.Errorf("finalizer ran prematurely")
+ }
+}
diff --git a/src/runtime/mfixalloc.go b/src/runtime/mfixalloc.go
new file mode 100644
index 0000000..293c16b
--- /dev/null
+++ b/src/runtime/mfixalloc.go
@@ -0,0 +1,99 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Fixed-size object allocator. Returned memory is zeroed by default (see the zero flag).
+//
+// See malloc.go for overview.
+
+package runtime
+
+import "unsafe"
+
+// FixAlloc is a simple free-list allocator for fixed size objects.
+// Malloc uses a FixAlloc wrapped around sysAlloc to manage its
+// mcache and mspan objects.
+//
+// Memory returned by fixalloc.alloc is zeroed by default, but the
+// caller may take responsibility for zeroing allocations by setting
+// the zero flag to false. This is only safe if the memory never
+// contains heap pointers.
+//
+// The caller is responsible for locking around FixAlloc calls.
+// Callers can keep state in the object but the first word is
+// smashed by freeing and reallocating.
+//
+// Consider marking fixalloc'd types go:notinheap.
+type fixalloc struct {
+ size uintptr
+ first func(arg, p unsafe.Pointer) // called first time p is returned
+ arg unsafe.Pointer
+ list *mlink
+ chunk uintptr // use uintptr instead of unsafe.Pointer to avoid write barriers
+ nchunk uint32
+ inuse uintptr // in-use bytes now
+ stat *sysMemStat
+ zero bool // zero allocations
+}
+
+// A generic linked list of blocks. (Typically the block is bigger than sizeof(MLink).)
+// Since assignments to mlink.next will result in a write barrier being performed
+// this cannot be used by some of the internal GC structures. For example when
+// the sweeper is placing an unmarked object on the free list it does not want the
+// write barrier to be called since that could result in the object being reachable.
+//
+//go:notinheap
+type mlink struct {
+ next *mlink
+}
+
+// Initialize f to allocate objects of the given size,
+// using the allocator to obtain chunks of memory.
+func (f *fixalloc) init(size uintptr, first func(arg, p unsafe.Pointer), arg unsafe.Pointer, stat *sysMemStat) {
+ f.size = size
+ f.first = first
+ f.arg = arg
+ f.list = nil
+ f.chunk = 0
+ f.nchunk = 0
+ f.inuse = 0
+ f.stat = stat
+ f.zero = true
+}
+
+func (f *fixalloc) alloc() unsafe.Pointer {
+ if f.size == 0 {
+ print("runtime: use of FixAlloc_Alloc before FixAlloc_Init\n")
+ throw("runtime: internal error")
+ }
+
+ if f.list != nil {
+ v := unsafe.Pointer(f.list)
+ f.list = f.list.next
+ f.inuse += f.size
+ if f.zero {
+ memclrNoHeapPointers(v, f.size)
+ }
+ return v
+ }
+ if uintptr(f.nchunk) < f.size {
+ f.chunk = uintptr(persistentalloc(_FixAllocChunk, 0, f.stat))
+ f.nchunk = _FixAllocChunk
+ }
+
+ v := unsafe.Pointer(f.chunk)
+ if f.first != nil {
+ f.first(f.arg, v)
+ }
+ f.chunk = f.chunk + f.size
+ f.nchunk -= uint32(f.size)
+ f.inuse += f.size
+ return v
+}
+
+func (f *fixalloc) free(p unsafe.Pointer) {
+ f.inuse -= f.size
+ v := (*mlink)(p)
+ v.next = f.list
+ f.list = v
+}
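
For intuition, the following is a minimal user-space sketch of the same intrusive free-list idea. The names (freeListAlloc, chunkSize) are hypothetical; unlike fixalloc it obtains chunks with ordinary Go allocation rather than persistentalloc, retains every chunk in a slice so nothing dangles, and does no zeroing:

```go
package main

import (
	"fmt"
	"unsafe"
)

// freeListAlloc hands out fixed-size blocks carved from large chunks and
// recycles freed blocks through an intrusive singly linked free list: the
// first word of a free block is reused as the "next" pointer, mirroring
// fixalloc's mlink trick (and, like fixalloc, smashing that word on free).
type freeListAlloc struct {
	size   uintptr        // block size; must be >= unsafe.Sizeof(uintptr(0))
	free   unsafe.Pointer // head of the free list (nil if empty)
	chunks [][]byte       // all chunks ever allocated, kept alive here
	off    uintptr        // offset of the next unused byte in the last chunk
}

const chunkSize = 16 << 10 // analogous to _FixAllocChunk; value chosen arbitrarily

func (f *freeListAlloc) alloc() unsafe.Pointer {
	if f.free != nil {
		v := f.free
		f.free = *(*unsafe.Pointer)(v) // pop: the next pointer lives in the block itself
		return v
	}
	if len(f.chunks) == 0 || f.off+f.size > chunkSize {
		f.chunks = append(f.chunks, make([]byte, chunkSize)) // grab a fresh chunk
		f.off = 0
	}
	cur := f.chunks[len(f.chunks)-1]
	v := unsafe.Pointer(&cur[f.off])
	f.off += f.size
	return v
}

func (f *freeListAlloc) freeBlock(p unsafe.Pointer) {
	*(*unsafe.Pointer)(p) = f.free // push: overwrites the block's first word
	f.free = p
}

func main() {
	f := &freeListAlloc{size: 64}
	a, b := f.alloc(), f.alloc()
	f.freeBlock(a)
	c := f.alloc()              // pops a's block off the free list
	fmt.Println(a == c, a != b) // true true
}
```
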
diff --git a/src/runtime/mgc.go b/src/runtime/mgc.go
new file mode 100644
index 0000000..185d320
--- /dev/null
+++ b/src/runtime/mgc.go
@@ -0,0 +1,2336 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Garbage collector (GC).
+//
+// The GC runs concurrently with mutator threads, is type accurate (aka precise), allows multiple
+// GC threads to run in parallel. It is a concurrent mark and sweep that uses a write barrier. It is
+// non-generational and non-compacting. Allocation is done using size segregated per P allocation
+// areas to minimize fragmentation while eliminating locks in the common case.
+//
+// The algorithm decomposes into several steps.
+// This is a high level description of the algorithm being used. For an overview of GC a good
+// place to start is Richard Jones' gchandbook.org.
+//
+// The algorithm's intellectual heritage includes Dijkstra's on-the-fly algorithm, see
+// Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens. 1978.
+// On-the-fly garbage collection: an exercise in cooperation. Commun. ACM 21, 11 (November 1978),
+// 966-975.
+// For journal quality proofs that these steps are complete, correct, and terminate see
+// Hudson, R., and Moss, J.E.B. Copying Garbage Collection without stopping the world.
+// Concurrency and Computation: Practice and Experience 15(3-5), 2003.
+//
+// 1. GC performs sweep termination.
+//
+// a. Stop the world. This causes all Ps to reach a GC safe-point.
+//
+// b. Sweep any unswept spans. There will only be unswept spans if
+// this GC cycle was forced before the expected time.
+//
+// 2. GC performs the mark phase.
+//
+// a. Prepare for the mark phase by setting gcphase to _GCmark
+// (from _GCoff), enabling the write barrier, enabling mutator
+// assists, and enqueueing root mark jobs. No objects may be
+// scanned until all Ps have enabled the write barrier, which is
+// accomplished using STW.
+//
+// b. Start the world. From this point, GC work is done by mark
+// workers started by the scheduler and by assists performed as
+// part of allocation. The write barrier shades both the
+// overwritten pointer and the new pointer value for any pointer
+// writes (see mbarrier.go for details). Newly allocated objects
+// are immediately marked black.
+//
+// c. GC performs root marking jobs. This includes scanning all
+// stacks, shading all globals, and shading any heap pointers in
+// off-heap runtime data structures. Scanning a stack stops a
+// goroutine, shades any pointers found on its stack, and then
+// resumes the goroutine.
+//
+// d. GC drains the work queue of grey objects, scanning each grey
+// object to black and shading all pointers found in the object
+// (which in turn may add those pointers to the work queue).
+//
+// e. Because GC work is spread across local caches, GC uses a
+// distributed termination algorithm to detect when there are no
+// more root marking jobs or grey objects (see gcMarkDone). At this
+// point, GC transitions to mark termination.
+//
+// 3. GC performs mark termination.
+//
+// a. Stop the world.
+//
+// b. Set gcphase to _GCmarktermination, and disable workers and
+// assists.
+//
+// c. Perform housekeeping like flushing mcaches.
+//
+// 4. GC performs the sweep phase.
+//
+// a. Prepare for the sweep phase by setting gcphase to _GCoff,
+// setting up sweep state and disabling the write barrier.
+//
+// b. Start the world. From this point on, newly allocated objects
+// are white, and allocating sweeps spans before use if necessary.
+//
+// c. GC does concurrent sweeping in the background and in response
+// to allocation. See description below.
+//
+// 5. When sufficient allocation has taken place, replay the sequence
+// starting with 1 above. See discussion of GC rate below.
+
+// Concurrent sweep.
+//
+// The sweep phase proceeds concurrently with normal program execution.
+// The heap is swept span-by-span both lazily (when a goroutine needs another span)
+// and concurrently in a background goroutine (this helps programs that are not CPU bound).
+// At the end of STW mark termination all spans are marked as "needs sweeping".
+//
+// The background sweeper goroutine simply sweeps spans one-by-one.
+//
+// To avoid requesting more OS memory while there are unswept spans, when a
+// goroutine needs another span, it first attempts to reclaim that much memory
+// by sweeping. When a goroutine needs to allocate a new small-object span, it
+// sweeps small-object spans for the same object size until it frees at least
+// one object. When a goroutine needs to allocate a large-object span from the heap,
+// it sweeps spans until it frees at least that many pages into the heap. There is
+// one case where this may not suffice: if a goroutine sweeps and frees two
+// nonadjacent one-page spans to the heap, it will allocate a new two-page
+// span, but there can still be other one-page unswept spans which could be
+// combined into a two-page span.
+//
+// It's critical to ensure that no operations proceed on unswept spans (that would corrupt
+// mark bits in GC bitmap). During GC all mcaches are flushed into the central cache,
+// so they are empty. When a goroutine grabs a new span into mcache, it sweeps it.
+// When a goroutine explicitly frees an object or sets a finalizer, it ensures that
+// the span is swept (either by sweeping it, or by waiting for the concurrent sweep to finish).
+// The finalizer goroutine is kicked off only when all spans are swept.
+// When the next GC starts, it sweeps all not-yet-swept spans (if any).
+
+// GC rate.
+// Next GC is after we've allocated an extra amount of memory proportional to
+// the amount already in use. The proportion is controlled by the GOGC environment variable
+// (100 by default). If GOGC=100 and we're using 4M, we'll GC again when we get to 8M
+// (this mark is tracked in the next_gc variable). This keeps the GC cost in linear
+// proportion to the allocation cost. Adjusting GOGC just changes the linear constant
+// (and also the amount of extra memory used).
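
As a back-of-the-envelope sketch of that rule (a hypothetical nextGCGoal helper; the real computation in gcSetTriggerRatio below also applies heapminimum and the trigger-ratio bounds):

```go
package main

import "fmt"

// nextGCGoal returns the heap goal implied by GOGC for a given marked
// (live) heap size: grow by gogc percent before collecting again.
func nextGCGoal(heapMarked uint64, gogc int32) uint64 {
	if gogc < 0 { // GOGC=off: never trigger on heap size
		return ^uint64(0)
	}
	return heapMarked + heapMarked*uint64(gogc)/100
}

func main() {
	fmt.Println(nextGCGoal(4<<20, 100)) // 8388608: 4M grows to 8M before the next GC
}
```
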
+
+// Oblets
+//
+// In order to prevent long pauses while scanning large objects and to
+// improve parallelism, the garbage collector breaks up scan jobs for
+// objects larger than maxObletBytes into "oblets" of at most
+// maxObletBytes. When scanning encounters the beginning of a large
+// object, it scans only the first oblet and enqueues the remaining
+// oblets as new scan jobs.
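
Schematically, that splitting looks like the following (a hypothetical splitIntoOblets helper operating on byte offsets; the real scanner works on object addresses and enqueues oblet jobs onto the GC work queue):

```go
package main

import "fmt"

const maxObletBytes = 128 << 10 // 128 KiB oblets, as in current sources

// splitIntoOblets breaks a large object of the given size into
// [start, end) byte ranges of at most maxObletBytes each. The scanner
// would process the first range immediately and enqueue the rest.
func splitIntoOblets(size int) [][2]int {
	var jobs [][2]int
	for off := 0; off < size; off += maxObletBytes {
		end := off + maxObletBytes
		if end > size {
			end = size
		}
		jobs = append(jobs, [2]int{off, end})
	}
	return jobs
}

func main() {
	fmt.Println(splitIntoOblets(300 << 10)) // [[0 131072] [131072 262144] [262144 307200]]
}
```
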
+
+package runtime
+
+import (
+ "internal/cpu"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const (
+ _DebugGC = 0
+ _ConcurrentSweep = true
+ _FinBlockSize = 4 * 1024
+
+ // debugScanConservative enables debug logging for stack
+ // frames that are scanned conservatively.
+ debugScanConservative = false
+
+ // sweepMinHeapDistance is a lower bound on the heap distance
+ // (in bytes) reserved for concurrent sweeping between GC
+ // cycles.
+ sweepMinHeapDistance = 1024 * 1024
+)
+
+// heapminimum is the minimum heap size at which to trigger GC.
+// For small heaps, this overrides the usual GOGC*live set rule.
+//
+// When there is a very small live set but a lot of allocation, simply
+// collecting when the heap reaches GOGC*live results in many GC
+// cycles and high total per-GC overhead. This minimum amortizes this
+// per-GC overhead while keeping the heap reasonably small.
+//
+// During initialization this is set to 4MB*GOGC/100. In the case of
+// GOGC==0, this will set heapminimum to 0, resulting in constant
+// collection even when the heap size is small, which is useful for
+// debugging.
+var heapminimum uint64 = defaultHeapMinimum
+
+// defaultHeapMinimum is the value of heapminimum for GOGC==100.
+const defaultHeapMinimum = 4 << 20
+
+// Initialized from $GOGC. GOGC=off means no GC.
+var gcpercent int32
+
+func gcinit() {
+ if unsafe.Sizeof(workbuf{}) != _WorkbufSize {
+ throw("size of Workbuf is suboptimal")
+ }
+
+ // No sweep on the first cycle.
+ mheap_.sweepdone = 1
+
+ // Set a reasonable initial GC trigger.
+ memstats.triggerRatio = 7 / 8.0
+
+ // Fake a heap_marked value so it looks like a trigger at
+ // heapminimum is the appropriate growth from heap_marked.
+ // This will go into computing the initial GC goal.
+ memstats.heap_marked = uint64(float64(heapminimum) / (1 + memstats.triggerRatio))
+
+ // Set gcpercent from the environment. This will also compute
+ // and set the GC trigger and goal.
+ _ = setGCPercent(readgogc())
+
+ work.startSema = 1
+ work.markDoneSema = 1
+ lockInit(&work.sweepWaiters.lock, lockRankSweepWaiters)
+ lockInit(&work.assistQueue.lock, lockRankAssistQueue)
+ lockInit(&work.wbufSpans.lock, lockRankWbufSpans)
+}
+
+func readgogc() int32 {
+ p := gogetenv("GOGC")
+ if p == "off" {
+ return -1
+ }
+ if n, ok := atoi32(p); ok {
+ return n
+ }
+ return 100
+}
+
+// gcenable is called after the bulk of the runtime initialization,
+// just before we're about to start letting user code run.
+// It kicks off the background sweeper goroutine, the background
+// scavenger goroutine, and enables GC.
+func gcenable() {
+ // Kick off sweeping and scavenging.
+ c := make(chan int, 2)
+ go bgsweep(c)
+ go bgscavenge(c)
+ <-c
+ <-c
+ memstats.enablegc = true // now that runtime is initialized, GC is okay
+}
+
+//go:linkname setGCPercent runtime/debug.setGCPercent
+func setGCPercent(in int32) (out int32) {
+ // Run on the system stack since we grab the heap lock.
+ systemstack(func() {
+ lock(&mheap_.lock)
+ out = gcpercent
+ if in < 0 {
+ in = -1
+ }
+ gcpercent = in
+ heapminimum = defaultHeapMinimum * uint64(gcpercent) / 100
+ // Update pacing in response to gcpercent change.
+ gcSetTriggerRatio(memstats.triggerRatio)
+ unlock(&mheap_.lock)
+ })
+
+ // If we just disabled GC, wait for any concurrent GC mark to
+ // finish so we always return with no GC running.
+ if in < 0 {
+ gcWaitOnMark(atomic.Load(&work.cycles))
+ }
+
+ return out
+}
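
From user code this knob is reached through the runtime/debug package; for example:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Raise GOGC to 200: the next GC triggers once the live heap has
	// grown by 200% instead of the default 100%.
	old := debug.SetGCPercent(200)
	fmt.Println("previous GOGC:", old)

	// Disable GC entirely (like GOGC=off); per setGCPercent above, this
	// waits for any in-progress mark phase to finish before returning.
	debug.SetGCPercent(-1)

	// Restore the original setting.
	debug.SetGCPercent(old)
}
```
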
+
+// Garbage collector phase.
+// It indicates to the write barrier and synchronization code which task to perform.
+var gcphase uint32
+
+// The compiler knows about this variable.
+// If you change it, you must change builtin/runtime.go, too.
+// If you change the first four bytes, you must also change the write
+// barrier insertion code.
+var writeBarrier struct {
+ enabled bool // compiler emits a check of this before calling write barrier
+ pad [3]byte // compiler uses 32-bit load for "enabled" field
+ needed bool // whether we need a write barrier for current GC phase
+ cgo bool // whether we need a write barrier for a cgo check
+ alignme uint64 // guarantee alignment so that compiler can use a 32 or 64-bit load
+}
+
+// gcBlackenEnabled is 1 if mutator assists and background mark
+// workers are allowed to blacken objects. This must only be set when
+// gcphase == _GCmark.
+var gcBlackenEnabled uint32
+
+const (
+ _GCoff = iota // GC not running; sweeping in background, write barrier disabled
+ _GCmark // GC marking roots and workbufs: allocate black, write barrier ENABLED
+ _GCmarktermination // GC mark termination: allocate black, P's help GC, write barrier ENABLED
+)
+
+//go:nosplit
+func setGCPhase(x uint32) {
+ atomic.Store(&gcphase, x)
+ writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination
+ writeBarrier.enabled = writeBarrier.needed || writeBarrier.cgo
+}
+
+// gcMarkWorkerMode represents the mode that a concurrent mark worker
+// should operate in.
+//
+// Concurrent marking happens through four different mechanisms. One
+// is mutator assists, which happen in response to allocations and are
+// not scheduled. The other three are variations in the per-P mark
+// workers and are distinguished by gcMarkWorkerMode.
+type gcMarkWorkerMode int
+
+const (
+ // gcMarkWorkerNotWorker indicates that the next scheduled G is not
+ // starting work and the mode should be ignored.
+ gcMarkWorkerNotWorker gcMarkWorkerMode = iota
+
+ // gcMarkWorkerDedicatedMode indicates that the P of a mark
+ // worker is dedicated to running that mark worker. The mark
+ // worker should run without preemption.
+ gcMarkWorkerDedicatedMode
+
+ // gcMarkWorkerFractionalMode indicates that a P is currently
+ // running the "fractional" mark worker. The fractional worker
+ // is necessary when GOMAXPROCS*gcBackgroundUtilization is not
+ // an integer. The fractional worker should run until it is
+ // preempted and will be scheduled to pick up the fractional
+ // part of GOMAXPROCS*gcBackgroundUtilization.
+ gcMarkWorkerFractionalMode
+
+ // gcMarkWorkerIdleMode indicates that a P is running the mark
+ // worker because it has nothing else to do. The idle worker
+ // should run until it is preempted and account its time
+ // against gcController.idleMarkTime.
+ gcMarkWorkerIdleMode
+)
+
+// gcMarkWorkerModeStrings are the string labels of gcMarkWorkerModes
+// to use in execution traces.
+var gcMarkWorkerModeStrings = [...]string{
+ "Not worker",
+ "GC (dedicated)",
+ "GC (fractional)",
+ "GC (idle)",
+}
+
+// gcController implements the GC pacing controller that determines
+// when to trigger concurrent garbage collection and how much marking
+// work to do in mutator assists and background marking.
+//
+// It uses a feedback control algorithm to adjust the memstats.gc_trigger
+// trigger based on the heap growth and GC CPU utilization each cycle.
+// This algorithm optimizes for heap growth to match GOGC and for CPU
+// utilization between assist and background marking to be 25% of
+// GOMAXPROCS. The high-level design of this algorithm is documented
+// at https://golang.org/s/go15gcpacing.
+//
+// All fields of gcController are used only during a single mark
+// cycle.
+var gcController gcControllerState
+
+type gcControllerState struct {
+ // scanWork is the total scan work performed this cycle. This
+ // is updated atomically during the cycle. Updates occur in
+ // bounded batches, since it is both written and read
+ // throughout the cycle. At the end of the cycle, this is how
+ // much of the retained heap is scannable.
+ //
+ // Currently this is the bytes of heap scanned. For most uses,
+ // this is an opaque unit of work, but for estimation the
+ // definition is important.
+ scanWork int64
+
+ // bgScanCredit is the scan work credit accumulated by the
+ // concurrent background scan. This credit is accumulated by
+ // the background scan and stolen by mutator assists. This is
+ // updated atomically. Updates occur in bounded batches, since
+ // it is both written and read throughout the cycle.
+ bgScanCredit int64
+
+ // assistTime is the nanoseconds spent in mutator assists
+ // during this cycle. This is updated atomically. Updates
+ // occur in bounded batches, since it is both written and read
+ // throughout the cycle.
+ assistTime int64
+
+ // dedicatedMarkTime is the nanoseconds spent in dedicated
+ // mark workers during this cycle. This is updated atomically
+ // at the end of the concurrent mark phase.
+ dedicatedMarkTime int64
+
+ // fractionalMarkTime is the nanoseconds spent in the
+ // fractional mark worker during this cycle. This is updated
+ // atomically throughout the cycle and will be up-to-date if
+ // the fractional mark worker is not currently running.
+ fractionalMarkTime int64
+
+ // idleMarkTime is the nanoseconds spent in idle marking
+ // during this cycle. This is updated atomically throughout
+ // the cycle.
+ idleMarkTime int64
+
+ // markStartTime is the absolute start time in nanoseconds
+ // that assists and background mark workers started.
+ markStartTime int64
+
+ // dedicatedMarkWorkersNeeded is the number of dedicated mark
+ // workers that need to be started. This is computed at the
+ // beginning of each cycle and decremented atomically as
+ // dedicated mark workers get started.
+ dedicatedMarkWorkersNeeded int64
+
+ // assistWorkPerByte is the ratio of scan work to allocated
+ // bytes that should be performed by mutator assists. This is
+ // computed at the beginning of each cycle and updated every
+ // time heap_scan is updated.
+ //
+ // Stored as a uint64, but it's actually a float64. Use
+ // float64frombits to get the value.
+ //
+ // Read and written atomically.
+ assistWorkPerByte uint64
+
+ // assistBytesPerWork is 1/assistWorkPerByte.
+ //
+ // Stored as a uint64, but it's actually a float64. Use
+ // float64frombits to get the value.
+ //
+ // Read and written atomically.
+ //
+ // Note that because this is read and written independently
+ // from assistWorkPerByte, users may notice a skew between
+ // the two values, and such a state should be safe.
+ assistBytesPerWork uint64
+
+ // fractionalUtilizationGoal is the fraction of wall clock
+ // time that should be spent in the fractional mark worker on
+ // each P that isn't running a dedicated worker.
+ //
+ // For example, if the utilization goal is 25% and there are
+ // no dedicated workers, this will be 0.25. If the goal is
+ // 25%, there is one dedicated worker, and GOMAXPROCS is 5,
+ // this will be 0.05 to make up the missing 5%.
+ //
+ // If this is zero, no fractional workers are needed.
+ fractionalUtilizationGoal float64
+
+ _ cpu.CacheLinePad
+}
+
+// startCycle resets the GC controller's state and computes estimates
+// for a new GC cycle. The caller must hold worldsema and the world
+// must be stopped.
+func (c *gcControllerState) startCycle() {
+ c.scanWork = 0
+ c.bgScanCredit = 0
+ c.assistTime = 0
+ c.dedicatedMarkTime = 0
+ c.fractionalMarkTime = 0
+ c.idleMarkTime = 0
+
+ // Ensure that the heap goal is at least a little larger than
+ // the current live heap size. This may not be the case if GC
+ // start is delayed or if the allocation that pushed heap_live
+ // over gc_trigger is large or if the trigger is really close to
+ // GOGC. Assist is proportional to this distance, so enforce a
+ // minimum distance, even if it means going over the GOGC goal
+ // by a tiny bit.
+ if memstats.next_gc < memstats.heap_live+1024*1024 {
+ memstats.next_gc = memstats.heap_live + 1024*1024
+ }
+
+ // Compute the background mark utilization goal. In general,
+ // this may not come out exactly. We round the number of
+ // dedicated workers so that the utilization is closest to
+ // 25%. For small GOMAXPROCS, this would introduce too much
+ // error, so we add fractional workers in that case.
+ totalUtilizationGoal := float64(gomaxprocs) * gcBackgroundUtilization
+ c.dedicatedMarkWorkersNeeded = int64(totalUtilizationGoal + 0.5)
+ utilError := float64(c.dedicatedMarkWorkersNeeded)/totalUtilizationGoal - 1
+ const maxUtilError = 0.3
+ if utilError < -maxUtilError || utilError > maxUtilError {
+ // Rounding put us more than 30% off our goal. With
+ // gcBackgroundUtilization of 25%, this happens for
+ // GOMAXPROCS<=3 or GOMAXPROCS=6. Enable fractional
+ // workers to compensate.
+ if float64(c.dedicatedMarkWorkersNeeded) > totalUtilizationGoal {
+ // Too many dedicated workers.
+ c.dedicatedMarkWorkersNeeded--
+ }
+ c.fractionalUtilizationGoal = (totalUtilizationGoal - float64(c.dedicatedMarkWorkersNeeded)) / float64(gomaxprocs)
+ } else {
+ c.fractionalUtilizationGoal = 0
+ }
+
+ // In STW mode, we just want dedicated workers.
+ if debug.gcstoptheworld > 0 {
+ c.dedicatedMarkWorkersNeeded = int64(gomaxprocs)
+ c.fractionalUtilizationGoal = 0
+ }
+
+ // Clear per-P state
+ for _, p := range allp {
+ p.gcAssistTime = 0
+ p.gcFractionalMarkTime = 0
+ }
+
+ // Compute initial values for controls that are updated
+ // throughout the cycle.
+ c.revise()
+
+ if debug.gcpacertrace > 0 {
+ assistRatio := float64frombits(atomic.Load64(&c.assistWorkPerByte))
+ print("pacer: assist ratio=", assistRatio,
+ " (scan ", memstats.heap_scan>>20, " MB in ",
+ work.initialHeapLive>>20, "->",
+ memstats.next_gc>>20, " MB)",
+ " workers=", c.dedicatedMarkWorkersNeeded,
+ "+", c.fractionalUtilizationGoal, "\n")
+ }
+}
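
The dedicated/fractional split can be exercised on its own. A sketch of the same rounding rule (a hypothetical workerSplit helper, assuming the 25% background utilization and 30% error bound above):

```go
package main

import "fmt"

const gcBackgroundUtilization = 0.25

// workerSplit mimics startCycle's computation: round to a whole number
// of dedicated workers, and fall back to a fractional worker when the
// rounding error exceeds 30%.
func workerSplit(gomaxprocs int) (dedicated int, fractional float64) {
	goal := float64(gomaxprocs) * gcBackgroundUtilization
	dedicated = int(goal + 0.5)
	utilError := float64(dedicated)/goal - 1
	const maxUtilError = 0.3
	if utilError < -maxUtilError || utilError > maxUtilError {
		if float64(dedicated) > goal {
			dedicated-- // too many dedicated workers; shed one
		}
		fractional = (goal - float64(dedicated)) / float64(gomaxprocs)
	}
	return
}

func main() {
	for _, p := range []int{1, 2, 4, 6, 8} {
		d, f := workerSplit(p)
		fmt.Printf("GOMAXPROCS=%d: dedicated=%d fractional=%.3f\n", p, d, f)
	}
}
```

Running it shows the cases the comment calls out: GOMAXPROCS 1-3 and 6 need a fractional worker, while 4 and 8 divide evenly into dedicated workers.
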
+
+// revise updates the assist ratio during the GC cycle to account for
+// improved estimates. This should be called whenever memstats.heap_scan,
+// memstats.heap_live, or memstats.next_gc is updated. It is safe to
+// call concurrently, but it may race with other calls to revise.
+//
+// The result of this race is that the two assist ratio values may not line
+// up or may be stale. In practice this is OK because the assist ratio
+// moves slowly throughout a GC cycle, and the assist ratio is a best-effort
+// heuristic anyway. Furthermore, no part of the heuristic depends on
+// the two assist ratio values being exact reciprocals of one another, since
+// the two values are used to convert values from different sources.
+//
+// The worst case result of this raciness is that we may miss a larger shift
+// in the ratio (say, if we decide to pace more aggressively against the
+// hard heap goal) but even this "hard goal" is best-effort (see #40460).
+// The dedicated GC should ensure we don't exceed the hard goal by too much
+// in the rare case we do exceed it.
+//
+// It should only be called when gcBlackenEnabled != 0 (because this
+// is when assists are enabled and the necessary statistics are
+// available).
+func (c *gcControllerState) revise() {
+ gcpercent := gcpercent
+ if gcpercent < 0 {
+ // If GC is disabled but we're running a forced GC,
+ // act like GOGC is huge for the below calculations.
+ gcpercent = 100000
+ }
+ live := atomic.Load64(&memstats.heap_live)
+ scan := atomic.Load64(&memstats.heap_scan)
+ work := atomic.Loadint64(&c.scanWork)
+
+ // Assume we're under the soft goal. Pace GC to complete at
+ // next_gc assuming the heap is in steady-state.
+ heapGoal := int64(atomic.Load64(&memstats.next_gc))
+
+ // Compute the expected scan work remaining.
+ //
+ // This is estimated based on the expected
+ // steady-state scannable heap. For example, with
+ // GOGC=100, only half of the scannable heap is
+ // expected to be live, so that's what we target.
+ //
+ // (This is a float calculation to avoid overflowing on
+ // 100*heap_scan.)
+ scanWorkExpected := int64(float64(scan) * 100 / float64(100+gcpercent))
+
+ if int64(live) > heapGoal || work > scanWorkExpected {
+ // We're past the soft goal, or we've already done more scan
+ // work than we expected. Pace GC so that in the worst case it
+ // will complete by the hard goal.
+ const maxOvershoot = 1.1
+ heapGoal = int64(float64(heapGoal) * maxOvershoot)
+
+ // Compute the upper bound on the scan work remaining.
+ scanWorkExpected = int64(scan)
+ }
+
+ // Compute the remaining scan work estimate.
+ //
+ // Note that we currently count allocations during GC as both
+ // scannable heap (heap_scan) and scan work completed
+ // (scanWork), so allocation will change this difference
+ // slowly in the soft regime and not at all in the hard
+ // regime.
+ scanWorkRemaining := scanWorkExpected - work
+ if scanWorkRemaining < 1000 {
+ // We set a somewhat arbitrary lower bound on
+ // remaining scan work since if we aim a little high,
+ // we can miss by a little.
+ //
+ // We *do* need to enforce that this is at least 1,
+ // since marking is racy and double-scanning objects
+ // may legitimately make the remaining scan work
+ // negative, even in the hard goal regime.
+ scanWorkRemaining = 1000
+ }
+
+ // Compute the heap distance remaining.
+ heapRemaining := heapGoal - int64(live)
+ if heapRemaining <= 0 {
+ // This shouldn't happen, but if it does, avoid
+ // dividing by zero or setting the assist negative.
+ heapRemaining = 1
+ }
+
+ // Compute the mutator assist ratio so by the time the mutator
+ // allocates the remaining heap bytes up to next_gc, it will
+ // have done (or stolen) the remaining amount of scan work.
+ // Note that the assist ratio values are updated atomically
+ // but not together. This means there may be some degree of
+ // skew between the two values. This is generally OK as the
+ // values shift relatively slowly over the course of a GC
+ // cycle.
+ assistWorkPerByte := float64(scanWorkRemaining) / float64(heapRemaining)
+ assistBytesPerWork := float64(heapRemaining) / float64(scanWorkRemaining)
+ atomic.Store64(&c.assistWorkPerByte, float64bits(assistWorkPerByte))
+ atomic.Store64(&c.assistBytesPerWork, float64bits(assistBytesPerWork))
+}
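
Stripped of the hard-goal overshoot branch and the 1000-unit floor, the assist-ratio arithmetic reduces to roughly this (a hypothetical assistRatios helper):

```go
package main

import "fmt"

// assistRatios returns (scan work per allocated byte, bytes per unit of
// scan work) given the scannable heap, the scan work already done, the
// live heap, the heap goal, and GOGC.
func assistRatios(heapScan, scanWorkDone, heapLive, heapGoal uint64, gogc int32) (workPerByte, bytesPerWork float64) {
	// Expected total scan work: only the live fraction of the
	// scannable heap, i.e. 100/(100+GOGC) of it.
	scanWorkExpected := float64(heapScan) * 100 / float64(100+gogc)
	scanRemaining := scanWorkExpected - float64(scanWorkDone)
	if scanRemaining < 1 {
		scanRemaining = 1
	}
	heapRemaining := float64(heapGoal) - float64(heapLive)
	if heapRemaining <= 0 {
		heapRemaining = 1
	}
	return scanRemaining / heapRemaining, heapRemaining / scanRemaining
}

func main() {
	// 8M scannable, nothing scanned yet, 6M live, 8M goal, GOGC=100:
	// 4M of expected scan work must fit into 2M of allocation headroom,
	// so each allocated byte owes two units of scan work.
	w, b := assistRatios(8<<20, 0, 6<<20, 8<<20, 100)
	fmt.Printf("workPerByte=%.2f bytesPerWork=%.2f\n", w, b)
}
```
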
+
+// endCycle computes the trigger ratio for the next cycle.
+func (c *gcControllerState) endCycle() float64 {
+ if work.userForced {
+ // Forced GC means this cycle didn't start at the
+ // trigger, so where it finished isn't good
+ // information about how to adjust the trigger.
+ // Just leave it where it is.
+ return memstats.triggerRatio
+ }
+
+ // Proportional response gain for the trigger controller. Must
+ // be in [0, 1]. Lower values smooth out transient effects but
+ // take longer to respond to phase changes. Higher values
+ // react to phase changes quickly, but are more affected by
+ // transient changes. Values near 1 may be unstable.
+ const triggerGain = 0.5
+
+ // Compute next cycle trigger ratio. First, this computes the
+ // "error" for this cycle; that is, how far off the trigger
+ // was from what it should have been, accounting for both heap
+ // growth and GC CPU utilization. We compute the actual heap
+ // growth during this cycle and scale that by how far off from
+ // the goal CPU utilization we were (to estimate the heap
+ // growth if we had the desired CPU utilization). The
+ // difference between this estimate and the GOGC-based goal
+ // heap growth is the error.
+ goalGrowthRatio := gcEffectiveGrowthRatio()
+ actualGrowthRatio := float64(memstats.heap_live)/float64(memstats.heap_marked) - 1
+ assistDuration := nanotime() - c.markStartTime
+
+ // Assume background mark hit its utilization goal.
+ utilization := gcBackgroundUtilization
+ // Add assist utilization; avoid divide by zero.
+ if assistDuration > 0 {
+ utilization += float64(c.assistTime) / float64(assistDuration*int64(gomaxprocs))
+ }
+
+ triggerError := goalGrowthRatio - memstats.triggerRatio - utilization/gcGoalUtilization*(actualGrowthRatio-memstats.triggerRatio)
+
+ // Finally, we adjust the trigger for next time by this error,
+ // damped by the proportional gain.
+ triggerRatio := memstats.triggerRatio + triggerGain*triggerError
+
+ if debug.gcpacertrace > 0 {
+ // Print controller state in terms of the design
+ // document.
+ H_m_prev := memstats.heap_marked
+ h_t := memstats.triggerRatio
+ H_T := memstats.gc_trigger
+ h_a := actualGrowthRatio
+ H_a := memstats.heap_live
+ h_g := goalGrowthRatio
+ H_g := int64(float64(H_m_prev) * (1 + h_g))
+ u_a := utilization
+ u_g := gcGoalUtilization
+ W_a := c.scanWork
+ print("pacer: H_m_prev=", H_m_prev,
+ " h_t=", h_t, " H_T=", H_T,
+ " h_a=", h_a, " H_a=", H_a,
+ " h_g=", h_g, " H_g=", H_g,
+ " u_a=", u_a, " u_g=", u_g,
+ " W_a=", W_a,
+ " goalΔ=", goalGrowthRatio-h_t,
+ " actualΔ=", h_a-h_t,
+ " u_a/u_g=", u_a/u_g,
+ "\n")
+ }
+
+ return triggerRatio
+}
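
Viewed in isolation, the feedback step is a plain proportional controller; a sketch of just that formula (a hypothetical nextTriggerRatio helper using this file's triggerGain and gcGoalUtilization values):

```go
package main

import "fmt"

const (
	triggerGain       = 0.5
	gcGoalUtilization = 0.30
)

// nextTriggerRatio applies the proportional controller: scale the actual
// heap growth by how far CPU utilization was from its goal, compare the
// result against the goal growth, and move the trigger by half the error.
func nextTriggerRatio(triggerRatio, goalGrowth, actualGrowth, utilization float64) float64 {
	err := goalGrowth - triggerRatio - utilization/gcGoalUtilization*(actualGrowth-triggerRatio)
	return triggerRatio + triggerGain*err
}

func main() {
	// The heap grew 10% past the goal at exactly the goal utilization:
	// the trigger ratio moves down, so the next GC starts a bit earlier.
	fmt.Printf("%.3f\n", nextTriggerRatio(0.7, 1.0, 1.1, 0.30)) // 0.650
}
```
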
+
+// enlistWorker encourages another dedicated mark worker to start on
+// another P if there are spare worker slots. It is used by putfull
+// when more work is made available.
+//
+//go:nowritebarrier
+func (c *gcControllerState) enlistWorker() {
+ // If there are idle Ps, wake one so it will run an idle worker.
+ // NOTE: This is suspected of causing deadlocks. See golang.org/issue/19112.
+ //
+ // if atomic.Load(&sched.npidle) != 0 && atomic.Load(&sched.nmspinning) == 0 {
+ // wakep()
+ // return
+ // }
+
+ // There are no idle Ps. If we need more dedicated workers,
+ // try to preempt a running P so it will switch to a worker.
+ if c.dedicatedMarkWorkersNeeded <= 0 {
+ return
+ }
+ // Pick a random other P to preempt.
+ if gomaxprocs <= 1 {
+ return
+ }
+ gp := getg()
+ if gp == nil || gp.m == nil || gp.m.p == 0 {
+ return
+ }
+ myID := gp.m.p.ptr().id
+ for tries := 0; tries < 5; tries++ {
+ id := int32(fastrandn(uint32(gomaxprocs - 1)))
+ if id >= myID {
+ id++
+ }
+ p := allp[id]
+ if p.status != _Prunning {
+ continue
+ }
+ if preemptone(p) {
+ return
+ }
+ }
+}
+
+// findRunnableGCWorker returns a background mark worker for _p_ if it
+// should be run. This must only be called when gcBlackenEnabled != 0.
+func (c *gcControllerState) findRunnableGCWorker(_p_ *p) *g {
+ if gcBlackenEnabled == 0 {
+ throw("gcControllerState.findRunnable: blackening not enabled")
+ }
+
+ if !gcMarkWorkAvailable(_p_) {
+ // No work to be done right now. This can happen at
+ // the end of the mark phase when there are still
+ // assists tapering off. Don't bother running a worker
+ // now because it'll just return immediately.
+ return nil
+ }
+
+ // Grab a worker before we commit to running below.
+ node := (*gcBgMarkWorkerNode)(gcBgMarkWorkerPool.pop())
+ if node == nil {
+ // There is at least one worker per P, so normally there are
+ // enough workers to run on all Ps, if necessary. However, once
+ // a worker enters gcMarkDone it may park without rejoining the
+ // pool, thus freeing a P with no corresponding worker.
+ // gcMarkDone never depends on another worker doing work, so it
+ // is safe to simply do nothing here.
+ //
+ // If gcMarkDone bails out without completing the mark phase,
+ // it will always do so with queued global work. Thus, that P
+ // will be immediately eligible to re-run the worker G it was
+ // just using, ensuring work can complete.
+ return nil
+ }
+
+ decIfPositive := func(ptr *int64) bool {
+ for {
+ v := atomic.Loadint64(ptr)
+ if v <= 0 {
+ return false
+ }
+
+ // TODO: having atomic.Casint64 would be more pleasant.
+ if atomic.Cas64((*uint64)(unsafe.Pointer(ptr)), uint64(v), uint64(v-1)) {
+ return true
+ }
+ }
+ }
+
+ if decIfPositive(&c.dedicatedMarkWorkersNeeded) {
+ // This P is now dedicated to marking until the end of
+ // the concurrent mark phase.
+ _p_.gcMarkWorkerMode = gcMarkWorkerDedicatedMode
+ } else if c.fractionalUtilizationGoal == 0 {
+ // No need for fractional workers.
+ gcBgMarkWorkerPool.push(&node.node)
+ return nil
+ } else {
+ // Is this P behind on the fractional utilization
+ // goal?
+ //
+ // This should be kept in sync with pollFractionalWorkerExit.
+ delta := nanotime() - gcController.markStartTime
+ if delta > 0 && float64(_p_.gcFractionalMarkTime)/float64(delta) > c.fractionalUtilizationGoal {
+ // Nope. No need to run a fractional worker.
+ gcBgMarkWorkerPool.push(&node.node)
+ return nil
+ }
+ // Run a fractional worker.
+ _p_.gcMarkWorkerMode = gcMarkWorkerFractionalMode
+ }
+
+ // Run the background mark worker.
+ gp := node.gp.ptr()
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ if trace.enabled {
+ traceGoUnpark(gp, 0)
+ }
+ return gp
+}
+
+// pollFractionalWorkerExit reports whether a fractional mark worker
+// should self-preempt. It assumes it is called from the fractional
+// worker.
+func pollFractionalWorkerExit() bool {
+ // This should be kept in sync with the fractional worker
+ // scheduler logic in findRunnableGCWorker.
+ now := nanotime()
+ delta := now - gcController.markStartTime
+ if delta <= 0 {
+ return true
+ }
+ p := getg().m.p.ptr()
+ selfTime := p.gcFractionalMarkTime + (now - p.gcMarkWorkerStartTime)
+ // Add some slack to the utilization goal so that the
+ // fractional worker isn't behind again the instant it exits.
+ return float64(selfTime)/float64(delta) > 1.2*gcController.fractionalUtilizationGoal
+}
+
+// gcSetTriggerRatio sets the trigger ratio and updates everything
+// derived from it: the absolute trigger, the heap goal, mark pacing,
+// and sweep pacing.
+//
+// This can be called any time. If GC is in the middle of a
+// concurrent phase, it will adjust the pacing of that phase.
+//
+// This depends on gcpercent, memstats.heap_marked, and
+// memstats.heap_live. These must be up to date.
+//
+// mheap_.lock must be held or the world must be stopped.
+func gcSetTriggerRatio(triggerRatio float64) {
+ assertWorldStoppedOrLockHeld(&mheap_.lock)
+
+ // Compute the next GC goal, which is when the allocated heap
+ // has grown by GOGC/100 over the heap marked by the last
+ // cycle.
+ goal := ^uint64(0)
+ if gcpercent >= 0 {
+ goal = memstats.heap_marked + memstats.heap_marked*uint64(gcpercent)/100
+ }
+
+ // Set the trigger ratio, capped to reasonable bounds.
+ if gcpercent >= 0 {
+ scalingFactor := float64(gcpercent) / 100
+ // Ensure there's always a little margin so that the
+ // mutator assist ratio isn't infinity.
+ maxTriggerRatio := 0.95 * scalingFactor
+ if triggerRatio > maxTriggerRatio {
+ triggerRatio = maxTriggerRatio
+ }
+
+ // If we let triggerRatio go too low, then if the application
+ // is allocating very rapidly we might end up in a situation
+ // where we're allocating black during a nearly always-on GC.
+ // The result of this is a growing heap and ultimately an
+ // increase in RSS. By capping us at a point >0, we're essentially
+ // saying that we're OK using more CPU during the GC to prevent
+ // this growth in RSS.
+ //
+ // The current constant was chosen empirically: given a sufficiently
+ // fast/scalable allocator with 48 Ps that could drive the trigger ratio
+ // to <0.05, this constant causes applications to retain the same peak
+ // RSS compared to not having this allocator.
+ minTriggerRatio := 0.6 * scalingFactor
+ if triggerRatio < minTriggerRatio {
+ triggerRatio = minTriggerRatio
+ }
+ } else if triggerRatio < 0 {
+ // gcpercent < 0, so just make sure we're not getting a negative
+ // triggerRatio. This case isn't expected to happen in practice,
+ // and doesn't really matter because if gcpercent < 0 then we won't
+ // ever consume triggerRatio further on in this function, but let's
+ // just be defensive here; the triggerRatio being negative is almost
+ // certainly undesirable.
+ triggerRatio = 0
+ }
+ memstats.triggerRatio = triggerRatio
+
+ // Compute the absolute GC trigger from the trigger ratio.
+ //
+ // We trigger the next GC cycle when the allocated heap has
+ // grown by the trigger ratio over the marked heap size.
+ trigger := ^uint64(0)
+ if gcpercent >= 0 {
+ trigger = uint64(float64(memstats.heap_marked) * (1 + triggerRatio))
+ // Don't trigger below the minimum heap size.
+ minTrigger := heapminimum
+ if !isSweepDone() {
+ // Concurrent sweep happens in the heap growth
+ // from heap_live to gc_trigger, so ensure
+ // that concurrent sweep has some heap growth
+ // in which to perform sweeping before we
+ // start the next GC cycle.
+ sweepMin := atomic.Load64(&memstats.heap_live) + sweepMinHeapDistance
+ if sweepMin > minTrigger {
+ minTrigger = sweepMin
+ }
+ }
+ if trigger < minTrigger {
+ trigger = minTrigger
+ }
+ if int64(trigger) < 0 {
+ print("runtime: next_gc=", memstats.next_gc, " heap_marked=", memstats.heap_marked, " heap_live=", memstats.heap_live, " initialHeapLive=", work.initialHeapLive, "triggerRatio=", triggerRatio, " minTrigger=", minTrigger, "\n")
+ throw("gc_trigger underflow")
+ }
+ if trigger > goal {
+ // The trigger ratio is always less than GOGC/100, but
+ // other bounds on the trigger may have raised it.
+ // Push up the goal, too.
+ goal = trigger
+ }
+ }
+
+ // Commit to the trigger and goal.
+ memstats.gc_trigger = trigger
+ atomic.Store64(&memstats.next_gc, goal)
+ if trace.enabled {
+ traceNextGC()
+ }
+
+ // Update mark pacing.
+ if gcphase != _GCoff {
+ gcController.revise()
+ }
+
+ // Update sweep pacing.
+ if isSweepDone() {
+ mheap_.sweepPagesPerByte = 0
+ } else {
+ // Concurrent sweep needs to sweep all of the in-use
+ // pages by the time the allocated heap reaches the GC
+ // trigger. Compute the ratio of in-use pages to sweep
+ // per byte allocated, accounting for the fact that
+ // some might already be swept.
+ heapLiveBasis := atomic.Load64(&memstats.heap_live)
+ heapDistance := int64(trigger) - int64(heapLiveBasis)
+ // Add a little margin so rounding errors and
+ // concurrent sweep are less likely to leave pages
+ // unswept when GC starts.
+ heapDistance -= 1024 * 1024
+ if heapDistance < _PageSize {
+ // Avoid setting the sweep ratio extremely high
+ heapDistance = _PageSize
+ }
+ pagesSwept := atomic.Load64(&mheap_.pagesSwept)
+ pagesInUse := atomic.Load64(&mheap_.pagesInUse)
+ sweepDistancePages := int64(pagesInUse) - int64(pagesSwept)
+ if sweepDistancePages <= 0 {
+ mheap_.sweepPagesPerByte = 0
+ } else {
+ mheap_.sweepPagesPerByte = float64(sweepDistancePages) / float64(heapDistance)
+ mheap_.sweepHeapLiveBasis = heapLiveBasis
+ // Write pagesSweptBasis last, since this
+ // signals concurrent sweeps to recompute
+ // their debt.
+ atomic.Store64(&mheap_.pagesSweptBasis, pagesSwept)
+ }
+ }
+
+ gcPaceScavenger()
+}
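
The sweep-pacing tail of this function can be read in isolation as roughly the following (a hypothetical sweepPagesPerByte helper, reusing the 1 MiB margin and the runtime's 8 KiB page size):

```go
package main

import "fmt"

const pageSize = 8 << 10 // runtime pages are 8 KiB

// sweepPagesPerByte returns how many in-use pages must be swept per byte
// allocated so that sweeping finishes by the time the heap grows from
// heapLive to the GC trigger.
func sweepPagesPerByte(trigger, heapLive, pagesInUse, pagesSwept uint64) float64 {
	heapDistance := int64(trigger) - int64(heapLive) - 1<<20 // leave a 1 MiB margin
	if heapDistance < pageSize {
		heapDistance = pageSize // avoid an extreme sweep ratio
	}
	sweepDistance := int64(pagesInUse) - int64(pagesSwept)
	if sweepDistance <= 0 {
		return 0 // already swept everything in use
	}
	return float64(sweepDistance) / float64(heapDistance)
}

func main() {
	// 1000 pages left to sweep across roughly 3 MiB of allocation headroom.
	fmt.Printf("%.6f pages/byte\n", sweepPagesPerByte(8<<20, 4<<20, 1200, 200))
}
```
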
+
+// gcEffectiveGrowthRatio returns the current effective heap growth
+// ratio (GOGC/100) based on heap_marked from the previous GC and
+// next_gc for the current GC.
+//
+// This may differ from gcpercent/100 because of various upper and
+// lower bounds on gcpercent. For example, if the heap is smaller than
+// heapminimum, this can be higher than gcpercent/100.
+//
+// mheap_.lock must be held or the world must be stopped.
+func gcEffectiveGrowthRatio() float64 {
+ assertWorldStoppedOrLockHeld(&mheap_.lock)
+
+ egogc := float64(atomic.Load64(&memstats.next_gc)-memstats.heap_marked) / float64(memstats.heap_marked)
+ if egogc < 0 {
+ // Shouldn't happen, but just in case.
+ egogc = 0
+ }
+ return egogc
+}
+
+// gcGoalUtilization is the goal CPU utilization for
+// marking as a fraction of GOMAXPROCS.
+const gcGoalUtilization = 0.30
+
+// gcBackgroundUtilization is the fixed CPU utilization for background
+// marking. It must be <= gcGoalUtilization. The difference between
+// gcGoalUtilization and gcBackgroundUtilization will be made up by
+// mark assists. The scheduler will aim to use within 50% of this
+// goal.
+//
+// Setting this to < gcGoalUtilization avoids saturating the trigger
+// feedback controller when there are no assists, which allows it to
+// better control CPU and heap growth. However, the larger the gap,
+// the more mutator assists are expected to happen, which impact
+// mutator latency.
+const gcBackgroundUtilization = 0.25
+
+// gcCreditSlack is the amount of scan work credit that can
+// accumulate locally before updating gcController.scanWork and,
+// optionally, gcController.bgScanCredit. Lower values give a more
+// accurate assist ratio and make it more likely that assists will
+// successfully steal background credit. Higher values reduce memory
+// contention.
+const gcCreditSlack = 2000
+
+// gcAssistTimeSlack is the nanoseconds of mutator assist time that
+// can accumulate on a P before updating gcController.assistTime.
+const gcAssistTimeSlack = 5000
+
+// gcOverAssistWork determines how many extra units of scan work a GC
+// assist does when an assist happens. This amortizes the cost of an
+// assist by pre-paying for this many bytes of future allocations.
+const gcOverAssistWork = 64 << 10
+
+var work struct {
+ full lfstack // lock-free list of full blocks workbuf
+ empty lfstack // lock-free list of empty blocks workbuf
+ pad0 cpu.CacheLinePad // prevents false-sharing between full/empty and nproc/nwait
+
+ wbufSpans struct {
+ lock mutex
+ // free is a list of spans dedicated to workbufs, but
+ // that don't currently contain any workbufs.
+ free mSpanList
+ // busy is a list of all spans containing workbufs on
+ // one of the workbuf lists.
+ busy mSpanList
+ }
+
+ // Restore 64-bit alignment on 32-bit.
+ _ uint32
+
+ // bytesMarked is the number of bytes marked this cycle. This
+ // includes bytes blackened in scanned objects, noscan objects
+ // that go straight to black, and permagrey objects scanned by
+ // markroot during the concurrent scan phase. This is updated
+ // atomically during the cycle. Updates may be batched
+ // arbitrarily, since the value is only read at the end of the
+ // cycle.
+ //
+ // Because of benign races during marking, this number may not
+ // be the exact number of marked bytes, but it should be very
+ // close.
+ //
+ // Put this field here because it needs 64-bit atomic access
+ // (and thus 8-byte alignment even on 32-bit architectures).
+ bytesMarked uint64
+
+ markrootNext uint32 // next markroot job
+ markrootJobs uint32 // number of markroot jobs
+
+ nproc uint32
+ tstart int64
+ nwait uint32
+
+ // Number of roots of various root types. Set by gcMarkRootPrepare.
+ nFlushCacheRoots int
+ nDataRoots, nBSSRoots, nSpanRoots, nStackRoots int
+
+ // Each type of GC state transition is protected by a lock.
+ // Since multiple threads can simultaneously detect the state
+ // transition condition, any thread that detects a transition
+ // condition must acquire the appropriate transition lock,
+ // re-check the transition condition and return if it no
+ // longer holds or perform the transition if it does.
+ // Likewise, any transition must invalidate the transition
+ // condition before releasing the lock. This ensures that each
+ // transition is performed by exactly one thread and threads
+ // that need the transition to happen block until it has
+ // happened.
+ //
+ // startSema protects the transition from "off" to mark or
+ // mark termination.
+ startSema uint32
+ // markDoneSema protects transitions from mark to mark termination.
+ markDoneSema uint32
+
+ // Background mark completion signaling.
+ bgMarkReady note // signal background mark worker has started
+ bgMarkDone uint32 // cas to 1 when at a background mark completion point
+
+ // mode is the concurrency mode of the current GC cycle.
+ mode gcMode
+
+ // userForced indicates the current GC cycle was forced by an
+ // explicit user call.
+ userForced bool
+
+ // totaltime is the CPU nanoseconds spent in GC since the
+ // program started if debug.gctrace > 0.
+ totaltime int64
+
+ // initialHeapLive is the value of memstats.heap_live at the
+ // beginning of this GC cycle.
+ initialHeapLive uint64
+
+ // assistQueue is a queue of assists that are blocked because
+ // there was neither enough credit to steal nor enough work to
+ // do.
+ assistQueue struct {
+ lock mutex
+ q gQueue
+ }
+
+ // sweepWaiters is a list of blocked goroutines to wake when
+ // we transition from mark termination to sweep.
+ sweepWaiters struct {
+ lock mutex
+ list gList
+ }
+
+ // cycles is the number of completed GC cycles, where a GC
+ // cycle is sweep termination, mark, mark termination, and
+ // sweep. This differs from memstats.numgc, which is
+ // incremented at mark termination.
+ cycles uint32
+
+ // Timing/utilization stats for this cycle.
+ stwprocs, maxprocs int32
+ tSweepTerm, tMark, tMarkTerm, tEnd int64 // nanotime() of phase start
+
+ pauseNS int64 // total STW time this cycle
+ pauseStart int64 // nanotime() of last STW
+
+ // debug.gctrace heap sizes for this cycle.
+ heap0, heap1, heap2, heapGoal uint64
+}
+
+// GC runs a garbage collection and blocks the caller until the
+// garbage collection is complete. It may also block the entire
+// program.
+func GC() {
+ // We consider a cycle to be: sweep termination, mark, mark
+ // termination, and sweep. This function shouldn't return
+ // until a full cycle has been completed, from beginning to
+ // end. Hence, we always want to finish up the current cycle
+ // and start a new one. That means:
+ //
+ // 1. In sweep termination, mark, or mark termination of cycle
+ // N, wait until mark termination N completes and transitions
+ // to sweep N.
+ //
+ // 2. In sweep N, help with sweep N.
+ //
+ // At this point we can begin a full cycle N+1.
+ //
+ // 3. Trigger cycle N+1 by starting sweep termination N+1.
+ //
+ // 4. Wait for mark termination N+1 to complete.
+ //
+ // 5. Help with sweep N+1 until it's done.
+ //
+ // This all has to be written to deal with the fact that the
+ // GC may move ahead on its own. For example, when we block
+ // until mark termination N, we may wake up in cycle N+2.
+
+ // Wait until the current sweep termination, mark, and mark
+ // termination complete.
+ n := atomic.Load(&work.cycles)
+ gcWaitOnMark(n)
+
+ // We're now in sweep N or later. Trigger GC cycle N+1, which
+ // will first finish sweep N if necessary and then enter sweep
+ // termination N+1.
+ gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1})
+
+ // Wait for mark termination N+1 to complete.
+ gcWaitOnMark(n + 1)
+
+ // Finish sweep N+1 before returning. We do this both to
+ // complete the cycle and because runtime.GC() is often used
+ // as part of tests and benchmarks to get the system into a
+ // relatively stable and isolated state.
+ for atomic.Load(&work.cycles) == n+1 && sweepone() != ^uintptr(0) {
+ sweep.nbgsweep++
+ Gosched()
+ }
+
+ // Callers may assume that the heap profile reflects the
+ // just-completed cycle when this returns (historically this
+ // happened because this was a STW GC), but right now the
+ // profile still reflects mark termination N, not N+1.
+ //
+ // As soon as all of the sweep frees from cycle N+1 are done,
+ // we can go ahead and publish the heap profile.
+ //
+ // First, wait for sweeping to finish. (We know there are no
+ // more spans on the sweep queue, but we may be concurrently
+ // sweeping spans, so we have to wait.)
+ for atomic.Load(&work.cycles) == n+1 && atomic.Load(&mheap_.sweepers) != 0 {
+ Gosched()
+ }
+
+ // Now we're really done with sweeping, so we can publish the
+ // stable heap profile. Only do this if we haven't already hit
+ // another mark termination.
+ mp := acquirem()
+ cycle := atomic.Load(&work.cycles)
+ if cycle == n+1 || (gcphase == _GCmark && cycle == n+2) {
+ mProf_PostSweep()
+ }
+ releasem(mp)
+}
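
That stabilizing behavior is why tests and benchmarks call runtime.GC before taking measurements; a small example:

```go
package main

import (
	"fmt"
	"runtime"
)

var sink [][]byte // global keeps the allocations live until we clear it

func main() {
	for i := 0; i < 1000; i++ {
		sink = append(sink, make([]byte, 1<<10))
	}
	sink = nil // drop all references so the collector can reclaim them

	// Force a full cycle so the dropped allocations are swept and
	// MemStats reflects a quiescent heap, as GC's doc comment promises.
	runtime.GC()

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Println("heap in use after GC:", ms.HeapInuse, "bytes")
}
```
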
+
+// gcWaitOnMark blocks until GC finishes the Nth mark phase. If GC has
+// already completed this mark phase, it returns immediately.
+func gcWaitOnMark(n uint32) {
+ for {
+ // Disable phase transitions.
+ lock(&work.sweepWaiters.lock)
+ nMarks := atomic.Load(&work.cycles)
+ if gcphase != _GCmark {
+ // We've already completed this cycle's mark.
+ nMarks++
+ }
+ if nMarks > n {
+ // We're done.
+ unlock(&work.sweepWaiters.lock)
+ return
+ }
+
+ // Wait until sweep termination, mark, and mark
+ // termination of cycle N complete.
+ work.sweepWaiters.list.push(getg())
+ goparkunlock(&work.sweepWaiters.lock, waitReasonWaitForGCCycle, traceEvGoBlock, 1)
+ }
+}
+
+// gcMode indicates how concurrent a GC cycle should be.
+type gcMode int
+
+const (
+ gcBackgroundMode gcMode = iota // concurrent GC and sweep
+ gcForceMode // stop-the-world GC now, concurrent sweep
+ gcForceBlockMode // stop-the-world GC now and STW sweep (forced by user)
+)
+
+// A gcTrigger is a predicate for starting a GC cycle. Specifically,
+// it is an exit condition for the _GCoff phase.
+type gcTrigger struct {
+ kind gcTriggerKind
+ now int64 // gcTriggerTime: current time
+ n uint32 // gcTriggerCycle: cycle number to start
+}
+
+type gcTriggerKind int
+
+const (
+ // gcTriggerHeap indicates that a cycle should be started when
+ // the heap size reaches the trigger heap size computed by the
+ // controller.
+ gcTriggerHeap gcTriggerKind = iota
+
+ // gcTriggerTime indicates that a cycle should be started when
+ // it's been more than forcegcperiod nanoseconds since the
+ // previous GC cycle.
+ gcTriggerTime
+
+ // gcTriggerCycle indicates that a cycle should be started if
+ // we have not yet started cycle number gcTrigger.n (relative
+ // to work.cycles).
+ gcTriggerCycle
+)
+
+// test reports whether the trigger condition is satisfied, meaning
+// that the exit condition for the _GCoff phase has been met. The exit
+// condition should be tested when allocating.
+func (t gcTrigger) test() bool {
+ if !memstats.enablegc || panicking != 0 || gcphase != _GCoff {
+ return false
+ }
+ switch t.kind {
+ case gcTriggerHeap:
+ // Non-atomic access to heap_live for performance. If
+ // we are going to trigger on this, this thread just
+ // atomically wrote heap_live anyway and we'll see our
+ // own write.
+ return memstats.heap_live >= memstats.gc_trigger
+ case gcTriggerTime:
+ if gcpercent < 0 {
+ return false
+ }
+ lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
+ return lastgc != 0 && t.now-lastgc > forcegcperiod
+ case gcTriggerCycle:
+ // t.n > work.cycles, but accounting for wraparound.
+ return int32(t.n-work.cycles) > 0
+ }
+ return true
+}
+
+// gcStart starts the GC. It transitions from _GCoff to _GCmark (if
+// debug.gcstoptheworld == 0) or performs all of GC (if
+// debug.gcstoptheworld != 0).
+//
+// This may return without performing this transition in some cases,
+// such as when called on a system stack or with locks held.
+func gcStart(trigger gcTrigger) {
+ // Since this is called from malloc and malloc is called in
+ // the guts of a number of libraries that might be holding
+ // locks, don't attempt to start GC in non-preemptible or
+ // potentially unstable situations.
+ mp := acquirem()
+ if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
+ releasem(mp)
+ return
+ }
+ releasem(mp)
+ mp = nil
+
+ // Pick up the remaining unswept/not being swept spans concurrently
+ //
+ // This shouldn't happen if we're being invoked in background
+ // mode since proportional sweep should have just finished
+ // sweeping everything, but rounding errors, etc, may leave a
+ // few spans unswept. In forced mode, this is necessary since
+ // GC can be forced at any point in the sweeping cycle.
+ //
+ // We check the transition condition continuously here in case
+ // this G gets delayed into the next GC cycle.
+ for trigger.test() && sweepone() != ^uintptr(0) {
+ sweep.nbgsweep++
+ }
+
+ // Perform GC initialization and the sweep termination
+ // transition.
+ semacquire(&work.startSema)
+ // Re-check transition condition under transition lock.
+ if !trigger.test() {
+ semrelease(&work.startSema)
+ return
+ }
+
+ // For stats, check if this GC was forced by the user.
+ work.userForced = trigger.kind == gcTriggerCycle
+
+ // In gcstoptheworld debug mode, upgrade the mode accordingly.
+ // We do this after re-checking the transition condition so
+ // that multiple goroutines that detect the heap trigger don't
+ // start multiple STW GCs.
+ mode := gcBackgroundMode
+ if debug.gcstoptheworld == 1 {
+ mode = gcForceMode
+ } else if debug.gcstoptheworld == 2 {
+ mode = gcForceBlockMode
+ }
+
+ // Ok, we're doing it! Stop everybody else
+ semacquire(&gcsema)
+ semacquire(&worldsema)
+
+ if trace.enabled {
+ traceGCStart()
+ }
+
+ // Check that all Ps have finished deferred mcache flushes.
+ for _, p := range allp {
+ if fg := atomic.Load(&p.mcache.flushGen); fg != mheap_.sweepgen {
+ println("runtime: p", p.id, "flushGen", fg, "!= sweepgen", mheap_.sweepgen)
+ throw("p mcache not flushed")
+ }
+ }
+
+ gcBgMarkStartWorkers()
+
+ systemstack(gcResetMarkState)
+
+ work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
+ if work.stwprocs > ncpu {
+ // This is used to compute CPU time of the STW phases,
+ // so it can't be more than ncpu, even if GOMAXPROCS is.
+ work.stwprocs = ncpu
+ }
+ work.heap0 = atomic.Load64(&memstats.heap_live)
+ work.pauseNS = 0
+ work.mode = mode
+
+ now := nanotime()
+ work.tSweepTerm = now
+ work.pauseStart = now
+ if trace.enabled {
+ traceGCSTWStart(1)
+ }
+ systemstack(stopTheWorldWithSema)
+ // Finish sweep before we start concurrent scan.
+ systemstack(func() {
+ finishsweep_m()
+ })
+
+ // Call clearpools before we start the GC. If we wait, the memory will not be
+ // reclaimed until the next GC cycle.
+ clearpools()
+
+ work.cycles++
+
+ gcController.startCycle()
+ work.heapGoal = memstats.next_gc
+
+ // In STW mode, disable scheduling of user Gs. This may also
+ // disable scheduling of this goroutine, so it may block as
+ // soon as we start the world again.
+ if mode != gcBackgroundMode {
+ schedEnableUser(false)
+ }
+
+ // Enter concurrent mark phase and enable
+ // write barriers.
+ //
+ // Because the world is stopped, all Ps will
+ // observe that write barriers are enabled by
+ // the time we start the world and begin
+ // scanning.
+ //
+ // Write barriers must be enabled before assists are
+ // enabled because they must be enabled before
+ // any non-leaf heap objects are marked. Since
+ // allocations are blocked until assists can
+ // happen, we want to enable assists as early as
+ // possible.
+ setGCPhase(_GCmark)
+
+ gcBgMarkPrepare() // Must happen before assist enable.
+ gcMarkRootPrepare()
+
+ // Mark all active tinyalloc blocks. Since we're
+ // allocating from these, they need to be black like
+ // other allocations. The alternative is to blacken
+ // the tiny block on every allocation from it, which
+ // would slow down the tiny allocator.
+ gcMarkTinyAllocs()
+
+ // At this point all Ps have enabled the write
+ // barrier, thus maintaining the no white to
+ // black invariant. Enable mutator assists to
+ // put back-pressure on fast allocating
+ // mutators.
+ atomic.Store(&gcBlackenEnabled, 1)
+
+ // Assists and workers can start the moment we start
+ // the world.
+ gcController.markStartTime = now
+
+ // In STW mode, we could block the instant systemstack
+ // returns, so make sure we're not preemptible.
+ mp = acquirem()
+
+ // Concurrent mark.
+ systemstack(func() {
+ now = startTheWorldWithSema(trace.enabled)
+ work.pauseNS += now - work.pauseStart
+ work.tMark = now
+ memstats.gcPauseDist.record(now - work.pauseStart)
+ })
+
+ // Release the world sema before Gosched() in STW mode
+ // because we will need to reacquire it later but before
+ // this goroutine becomes runnable again, and we could
+ // self-deadlock otherwise.
+ semrelease(&worldsema)
+ releasem(mp)
+
+ // Make sure we block instead of returning to user code
+ // in STW mode.
+ if mode != gcBackgroundMode {
+ Gosched()
+ }
+
+ semrelease(&work.startSema)
+}
+
+// gcMarkDoneFlushed counts the number of P's with flushed work.
+//
+// Ideally this would be a captured local in gcMarkDone, but forEachP
+// escapes its callback closure, so it can't capture anything.
+//
+// This is protected by markDoneSema.
+var gcMarkDoneFlushed uint32
+
+// gcMarkDone transitions the GC from mark to mark termination if all
+// reachable objects have been marked (that is, there are no grey
+// objects and there can be no more in the future). Otherwise, it flushes
+// all local work to the global queues where it can be discovered by
+// other workers.
+//
+// This should be called when all local mark work has been drained and
+// there are no remaining workers. Specifically, when
+//
+// work.nwait == work.nproc && !gcMarkWorkAvailable(p)
+//
+// The calling context must be preemptible.
+//
+// Flushing local work is important because idle Ps may have local
+// work queued. This is the only way to make that work visible and
+// drive GC to completion.
+//
+// It is explicitly okay to have write barriers in this function. If
+// it does transition to mark termination, then all reachable objects
+// have been marked, so the write barrier cannot shade any more
+// objects.
+func gcMarkDone() {
+ // Ensure only one thread is running the ragged barrier at a
+ // time.
+ semacquire(&work.markDoneSema)
+
+top:
+ // Re-check transition condition under transition lock.
+ //
+ // It's critical that this checks the global work queues are
+ // empty before performing the ragged barrier. Otherwise,
+ // there could be global work that a P could take after the P
+ // has passed the ragged barrier.
+ if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) {
+ semrelease(&work.markDoneSema)
+ return
+ }
+
+ // forEachP needs worldsema to execute, and we'll need it to
+ // stop the world later, so acquire worldsema now.
+ semacquire(&worldsema)
+
+ // Flush all local buffers and collect flushedWork flags.
+ gcMarkDoneFlushed = 0
+ systemstack(func() {
+ gp := getg().m.curg
+ // Mark the user stack as preemptible so that it may be scanned.
+ // Otherwise, our attempt to force all P's to a safepoint could
+ // result in a deadlock as we attempt to preempt a worker that's
+ // trying to preempt us (e.g. for a stack scan).
+ casgstatus(gp, _Grunning, _Gwaiting)
+ forEachP(func(_p_ *p) {
+ // Flush the write barrier buffer, since this may add
+ // work to the gcWork.
+ wbBufFlush1(_p_)
+
+ // Flush the gcWork, since this may create global work
+ // and set the flushedWork flag.
+ //
+ // TODO(austin): Break up these workbufs to
+ // better distribute work.
+ _p_.gcw.dispose()
+ // Collect the flushedWork flag.
+ if _p_.gcw.flushedWork {
+ atomic.Xadd(&gcMarkDoneFlushed, 1)
+ _p_.gcw.flushedWork = false
+ }
+ })
+ casgstatus(gp, _Gwaiting, _Grunning)
+ })
+
+ if gcMarkDoneFlushed != 0 {
+ // More grey objects were discovered since the
+ // previous termination check, so there may be more
+ // work to do. Keep going. It's possible the
+ // transition condition became true again during the
+ // ragged barrier, so re-check it.
+ semrelease(&worldsema)
+ goto top
+ }
+
+ // There was no global work, no local work, and no Ps
+ // communicated work since we took markDoneSema. Therefore
+ // there are no grey objects and no more objects can be
+ // shaded. Transition to mark termination.
+ now := nanotime()
+ work.tMarkTerm = now
+ work.pauseStart = now
+ getg().m.preemptoff = "gcing"
+ if trace.enabled {
+ traceGCSTWStart(0)
+ }
+ systemstack(stopTheWorldWithSema)
+ // The gcphase is _GCmark, it will transition to _GCmarktermination
+ // below. The important thing is that the wb remains active until
+ // all marking is complete. This includes writes made by the GC.
+
+ // There is sometimes work left over when we enter mark termination due
+ // to write barriers performed after the completion barrier above.
+ // Detect this and resume concurrent mark. This is obviously
+ // unfortunate.
+ //
+ // See issue #27993 for details.
+ //
+ // Switch to the system stack to call wbBufFlush1, though in this case
+ // it doesn't matter because we're non-preemptible anyway.
+ restart := false
+ systemstack(func() {
+ for _, p := range allp {
+ wbBufFlush1(p)
+ if !p.gcw.empty() {
+ restart = true
+ break
+ }
+ }
+ })
+ if restart {
+ getg().m.preemptoff = ""
+ systemstack(func() {
+ now := startTheWorldWithSema(true)
+ work.pauseNS += now - work.pauseStart
+ memstats.gcPauseDist.record(now - work.pauseStart)
+ })
+ semrelease(&worldsema)
+ goto top
+ }
+
+ // Disable assists and background workers. We must do
+ // this before waking blocked assists.
+ atomic.Store(&gcBlackenEnabled, 0)
+
+ // Wake all blocked assists. These will run when we
+ // start the world again.
+ gcWakeAllAssists()
+
+ // Likewise, release the transition lock. Blocked
+ // workers and assists will run when we start the
+ // world again.
+ semrelease(&work.markDoneSema)
+
+ // In STW mode, re-enable user goroutines. These will be
+ // queued to run after we start the world.
+ schedEnableUser(true)
+
+ // endCycle depends on all gcWork cache stats being flushed.
+ // The termination algorithm above ensured that all caches were
+ // flushed, up to allocations made since the ragged barrier.
+ nextTriggerRatio := gcController.endCycle()
+
+ // Perform mark termination. This will restart the world.
+ gcMarkTermination(nextTriggerRatio)
+}
+
+// World must be stopped and mark assists and background workers must be
+// disabled.
+func gcMarkTermination(nextTriggerRatio float64) {
+ // Start marktermination (write barrier remains enabled for now).
+ setGCPhase(_GCmarktermination)
+
+ work.heap1 = memstats.heap_live
+ startTime := nanotime()
+
+ mp := acquirem()
+ mp.preemptoff = "gcing"
+ _g_ := getg()
+ _g_.m.traceback = 2
+ gp := _g_.m.curg
+ casgstatus(gp, _Grunning, _Gwaiting)
+ gp.waitreason = waitReasonGarbageCollection
+
+ // Run gc on the g0 stack. We do this so that the g stack
+ // we're currently running on will no longer change. Cuts
+ // the root set down a bit (g0 stacks are not scanned, and
+ // we don't need to scan gc's internal state). We also
+ // need to switch to g0 so we can shrink the stack.
+ systemstack(func() {
+ gcMark(startTime)
+ // Must return immediately.
+ // The outer function's stack may have moved
+ // during gcMark (it shrinks stacks, including the
+ // outer function's stack), so we must not refer
+ // to any of its variables. Return back to the
+ // non-system stack to pick up the new addresses
+ // before continuing.
+ })
+
+ systemstack(func() {
+ work.heap2 = work.bytesMarked
+ if debug.gccheckmark > 0 {
+ // Run a full non-parallel, stop-the-world
+ // mark using checkmark bits, to check that we
+ // didn't forget to mark anything during the
+ // concurrent mark process.
+ startCheckmarks()
+ gcResetMarkState()
+ gcw := &getg().m.p.ptr().gcw
+ gcDrain(gcw, 0)
+ wbBufFlush1(getg().m.p.ptr())
+ gcw.dispose()
+ endCheckmarks()
+ }
+
+ // marking is complete so we can turn the write barrier off
+ setGCPhase(_GCoff)
+ gcSweep(work.mode)
+ })
+
+ _g_.m.traceback = 0
+ casgstatus(gp, _Gwaiting, _Grunning)
+
+ if trace.enabled {
+ traceGCDone()
+ }
+
+ // all done
+ mp.preemptoff = ""
+
+ if gcphase != _GCoff {
+ throw("gc done but gcphase != _GCoff")
+ }
+
+ // Record next_gc and heap_inuse for scavenger.
+ memstats.last_next_gc = memstats.next_gc
+ memstats.last_heap_inuse = memstats.heap_inuse
+
+ // Update GC trigger and pacing for the next cycle.
+ gcSetTriggerRatio(nextTriggerRatio)
+
+ // Update timing memstats
+ now := nanotime()
+ sec, nsec, _ := time_now()
+ unixNow := sec*1e9 + int64(nsec)
+ work.pauseNS += now - work.pauseStart
+ work.tEnd = now
+ memstats.gcPauseDist.record(now - work.pauseStart)
+ atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user
+ atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us
+ memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS)
+ memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow)
+ memstats.pause_total_ns += uint64(work.pauseNS)
+
+ // Update work.totaltime.
+ sweepTermCpu := int64(work.stwprocs) * (work.tMark - work.tSweepTerm)
+ // We report idle marking time below, but omit it from the
+ // overall utilization here since it's "free".
+ markCpu := gcController.assistTime + gcController.dedicatedMarkTime + gcController.fractionalMarkTime
+ markTermCpu := int64(work.stwprocs) * (work.tEnd - work.tMarkTerm)
+ cycleCpu := sweepTermCpu + markCpu + markTermCpu
+ work.totaltime += cycleCpu
+
+ // Compute overall GC CPU utilization.
+ totalCpu := sched.totaltime + (now-sched.procresizetime)*int64(gomaxprocs)
+ memstats.gc_cpu_fraction = float64(work.totaltime) / float64(totalCpu)
+
+ // Reset sweep state.
+ sweep.nbgsweep = 0
+ sweep.npausesweep = 0
+
+ if work.userForced {
+ memstats.numforcedgc++
+ }
+
+ // Bump GC cycle count and wake goroutines waiting on sweep.
+ lock(&work.sweepWaiters.lock)
+ memstats.numgc++
+ injectglist(&work.sweepWaiters.list)
+ unlock(&work.sweepWaiters.lock)
+
+ // Finish the current heap profiling cycle and start a new
+ // heap profiling cycle. We do this before starting the world
+ // so events don't leak into the wrong cycle.
+ mProf_NextCycle()
+
+ systemstack(func() { startTheWorldWithSema(true) })
+
+ // Flush the heap profile so we can start a new cycle next GC.
+ // This is relatively expensive, so we don't do it with the
+ // world stopped.
+ mProf_Flush()
+
+ // Prepare workbufs for freeing by the sweeper. We do this
+ // asynchronously because it can take non-trivial time.
+ prepareFreeWorkbufs()
+
+ // Free stack spans. This must be done between GC cycles.
+ systemstack(freeStackSpans)
+
+ // Ensure all mcaches are flushed. Each P will flush its own
+ // mcache before allocating, but idle Ps may not. Since this
+ // is necessary to sweep all spans, we need to ensure all
+ // mcaches are flushed before we start the next GC cycle.
+ systemstack(func() {
+ forEachP(func(_p_ *p) {
+ _p_.mcache.prepareForSweep()
+ })
+ })
+
+ // Print gctrace before dropping worldsema. As soon as we drop
+ // worldsema another cycle could start and smash the stats
+ // we're trying to print.
+ if debug.gctrace > 0 {
+ util := int(memstats.gc_cpu_fraction * 100)
+
+ var sbuf [24]byte
+ printlock()
+ print("gc ", memstats.numgc,
+ " @", string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6, 3)), "s ",
+ util, "%: ")
+ prev := work.tSweepTerm
+ for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} {
+ if i != 0 {
+ print("+")
+ }
+ print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev))))
+ prev = ns
+ }
+ print(" ms clock, ")
+ for i, ns := range []int64{sweepTermCpu, gcController.assistTime, gcController.dedicatedMarkTime + gcController.fractionalMarkTime, gcController.idleMarkTime, markTermCpu} {
+ if i == 2 || i == 3 {
+ // Separate mark time components with /.
+ print("/")
+ } else if i != 0 {
+ print("+")
+ }
+ print(string(fmtNSAsMS(sbuf[:], uint64(ns))))
+ }
+ print(" ms cpu, ",
+ work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ",
+ work.heapGoal>>20, " MB goal, ",
+ work.maxprocs, " P")
+ if work.userForced {
+ print(" (forced)")
+ }
+ print("\n")
+ printunlock()
+ }
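+ // An example line (made-up numbers) in the format built above:
+ // "gc 5 @1.234s 2%: 0.12+3.4+0.056 ms clock,
+ // 0.96+0.45/6.8/2.1+0.44 ms cpu, 4->5->2 MB, 6 MB goal, 8 P",
+ // i.e. sweep-term+mark+mark-term wall clock, then sweep-term CPU,
+ // assist/dedicated+fractional/idle mark CPU, and mark-term CPU.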
+
+ semrelease(&worldsema)
+ semrelease(&gcsema)
+ // Careful: another GC cycle may start now.
+
+ releasem(mp)
+ mp = nil
+
+ // now that gc is done, kick off finalizer thread if needed
+ if !concurrentSweep {
+ // give the queued finalizers, if any, a chance to run
+ Gosched()
+ }
+}
+
+// gcBgMarkStartWorkers prepares background mark worker goroutines. These
+// goroutines will not run until the mark phase, but they must be started while
+// the world is not stopped and from a regular G stack. The caller must hold
+// worldsema.
+func gcBgMarkStartWorkers() {
+ // Background marking is performed by per-P G's. Ensure that each P has
+ // a background GC G.
+ //
+ // Worker Gs don't exit if gomaxprocs is reduced. If it is raised
+ // again, we can reuse the old workers; no need to create new workers.
+ for gcBgMarkWorkerCount < gomaxprocs {
+ go gcBgMarkWorker()
+
+ notetsleepg(&work.bgMarkReady, -1)
+ noteclear(&work.bgMarkReady)
+ // The worker is now guaranteed to be added to the pool before
+ // its P's next findRunnableGCWorker.
+
+ gcBgMarkWorkerCount++
+ }
+}
+
+// gcBgMarkPrepare sets up state for background marking.
+// Mutator assists must not yet be enabled.
+func gcBgMarkPrepare() {
+ // Background marking will stop when the work queues are empty
+ // and there are no more workers (note that, since this is
+ // concurrent, this may be a transient state, but mark
+ // termination will clean it up). Between background workers
+ // and assists, we don't really know how many workers there
+ // will be, so we pretend to have an arbitrarily large number
+ // of workers, almost all of which are "waiting". While a
+ // worker is working it decrements nwait. If nproc == nwait,
+ // there are no workers.
+ work.nproc = ^uint32(0)
+ work.nwait = ^uint32(0)
+}
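+
+// For illustration of the accounting above: a worker entering its drain
+// loop does atomic.Xadd(&work.nwait, -1); when it stops, it adds +1 back,
+// and if the result equals work.nproc while gcMarkWorkAvailable(nil)
+// reports no work, it is the last active worker and calls gcMarkDone
+// (see gcBgMarkWorker and gcAssistAlloc1).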
+
+// gcBgMarkWorker is an entry in the gcBgMarkWorkerPool. It points to a single
+// gcBgMarkWorker goroutine.
+type gcBgMarkWorkerNode struct {
+ // Unused workers are managed in a lock-free stack. This field must be first.
+ node lfnode
+
+ // The g of this worker.
+ gp guintptr
+
+ // Release this m on park. This is used to communicate with the unlock
+ // function, which cannot access the G's stack. It is unused outside of
+ // gcBgMarkWorker().
+ m muintptr
+}
+
+func gcBgMarkWorker() {
+ gp := getg()
+
+ // We pass node to a gopark unlock function, so it can't be on
+ // the stack (see gopark). Prevent deadlock from recursively
+ // starting GC by disabling preemption.
+ gp.m.preemptoff = "GC worker init"
+ node := new(gcBgMarkWorkerNode)
+ gp.m.preemptoff = ""
+
+ node.gp.set(gp)
+
+ node.m.set(acquirem())
+ notewakeup(&work.bgMarkReady)
+ // After this point, the background mark worker is generally scheduled
+ // cooperatively by gcController.findRunnableGCWorker. While performing
+ // work on the P, preemption is disabled because we are working on
+ // P-local work buffers. When the preempt flag is set, this puts itself
+ // into _Gwaiting to be woken up by gcController.findRunnableGCWorker
+ // at the appropriate time.
+ //
+ // When preemption is enabled (e.g., while in gcMarkDone), this worker
+ // may be preempted and schedule as a _Grunnable G from a runq. That is
+ // fine; it will eventually gopark again for further scheduling via
+ // findRunnableGCWorker.
+ //
+ // Since we disable preemption before notifying bgMarkReady, we
+ // guarantee that this G will be in the worker pool for the next
+ // findRunnableGCWorker. This isn't strictly necessary, but it reduces
+ // latency between _GCmark starting and the workers starting.
+
+ for {
+ // Go to sleep until woken by
+ // gcController.findRunnableGCWorker.
+ gopark(func(g *g, nodep unsafe.Pointer) bool {
+ node := (*gcBgMarkWorkerNode)(nodep)
+
+ if mp := node.m.ptr(); mp != nil {
+ // The worker G is no longer running; release
+ // the M.
+ //
+ // N.B. it is _safe_ to release the M as soon
+ // as we are no longer performing P-local mark
+ // work.
+ //
+ // However, since we cooperatively stop work
+ // when gp.preempt is set, if we releasem in
+ // the loop then the following call to gopark
+ // would immediately preempt the G. This is
+ // also safe, but inefficient: the G must
+ // schedule again only to enter gopark and park
+ // again. Thus, we defer the release until
+ // after parking the G.
+ releasem(mp)
+ }
+
+ // Release this G to the pool.
+ gcBgMarkWorkerPool.push(&node.node)
+ // Note that at this point, the G may immediately be
+ // rescheduled and may be running.
+ return true
+ }, unsafe.Pointer(node), waitReasonGCWorkerIdle, traceEvGoBlock, 0)
+
+ // Preemption must not occur here, or another G might see
+ // p.gcMarkWorkerMode.
+
+ // Disable preemption so we can use the gcw. If the
+ // scheduler wants to preempt us, we'll stop draining,
+ // dispose the gcw, and then preempt.
+ node.m.set(acquirem())
+ pp := gp.m.p.ptr() // P can't change with preemption disabled.
+
+ if gcBlackenEnabled == 0 {
+ println("worker mode", pp.gcMarkWorkerMode)
+ throw("gcBgMarkWorker: blackening not enabled")
+ }
+
+ if pp.gcMarkWorkerMode == gcMarkWorkerNotWorker {
+ throw("gcBgMarkWorker: mode not set")
+ }
+
+ startTime := nanotime()
+ pp.gcMarkWorkerStartTime = startTime
+
+ decnwait := atomic.Xadd(&work.nwait, -1)
+ if decnwait == work.nproc {
+ println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc)
+ throw("work.nwait was > work.nproc")
+ }
+
+ systemstack(func() {
+ // Mark our goroutine preemptible so its stack
+ // can be scanned. This lets two mark workers
+ // scan each other (otherwise, they would
+ // deadlock). We must not modify anything on
+ // the G stack. However, stack shrinking is
+ // disabled for mark workers, so it is safe to
+ // read from the G stack.
+ casgstatus(gp, _Grunning, _Gwaiting)
+ switch pp.gcMarkWorkerMode {
+ default:
+ throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
+ case gcMarkWorkerDedicatedMode:
+ gcDrain(&pp.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
+ if gp.preempt {
+ // We were preempted. This is
+ // a useful signal to kick
+ // everything out of the run
+ // queue so it can run
+ // somewhere else.
+ lock(&sched.lock)
+ for {
+ gp, _ := runqget(pp)
+ if gp == nil {
+ break
+ }
+ globrunqput(gp)
+ }
+ unlock(&sched.lock)
+ }
+ // Go back to draining, this time
+ // without preemption.
+ gcDrain(&pp.gcw, gcDrainFlushBgCredit)
+ case gcMarkWorkerFractionalMode:
+ gcDrain(&pp.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
+ case gcMarkWorkerIdleMode:
+ gcDrain(&pp.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
+ }
+ casgstatus(gp, _Gwaiting, _Grunning)
+ })
+
+ // Account for time.
+ duration := nanotime() - startTime
+ switch pp.gcMarkWorkerMode {
+ case gcMarkWorkerDedicatedMode:
+ atomic.Xaddint64(&gcController.dedicatedMarkTime, duration)
+ atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 1)
+ case gcMarkWorkerFractionalMode:
+ atomic.Xaddint64(&gcController.fractionalMarkTime, duration)
+ atomic.Xaddint64(&pp.gcFractionalMarkTime, duration)
+ case gcMarkWorkerIdleMode:
+ atomic.Xaddint64(&gcController.idleMarkTime, duration)
+ }
+
+ // Was this the last worker and did we run out
+ // of work?
+ incnwait := atomic.Xadd(&work.nwait, +1)
+ if incnwait > work.nproc {
+ println("runtime: p.gcMarkWorkerMode=", pp.gcMarkWorkerMode,
+ "work.nwait=", incnwait, "work.nproc=", work.nproc)
+ throw("work.nwait > work.nproc")
+ }
+
+ // We'll releasem after this point and thus this P may run
+ // something else. We must clear the worker mode to avoid
+ // attributing the mode to a different (non-worker) G in
+ // traceGoStart.
+ pp.gcMarkWorkerMode = gcMarkWorkerNotWorker
+
+ // If this worker reached a background mark completion
+ // point, signal the main GC goroutine.
+ if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
+ // We don't need the P-local buffers here, allow
+ // preemption because we may schedule like a regular
+ // goroutine in gcMarkDone (block on locks, etc).
+ releasem(node.m.ptr())
+ node.m.set(nil)
+
+ gcMarkDone()
+ }
+ }
+}
+
+// gcMarkWorkAvailable reports whether executing a mark worker
+// on p is potentially useful. p may be nil, in which case it only
+// checks the global sources of work.
+func gcMarkWorkAvailable(p *p) bool {
+ if p != nil && !p.gcw.empty() {
+ return true
+ }
+ if !work.full.empty() {
+ return true // global work available
+ }
+ if work.markrootNext < work.markrootJobs {
+ return true // root scan work available
+ }
+ return false
+}
+
+// gcMark runs the mark (or, for concurrent GC, mark termination).
+// All gcWork caches must be empty.
+// STW is in effect at this point.
+func gcMark(start_time int64) {
+ if debug.allocfreetrace > 0 {
+ tracegc()
+ }
+
+ if gcphase != _GCmarktermination {
+ throw("in gcMark expecting to see gcphase as _GCmarktermination")
+ }
+ work.tstart = start_time
+
+ // Check that there's no marking work remaining.
+ if work.full != 0 || work.markrootNext < work.markrootJobs {
+ print("runtime: full=", hex(work.full), " next=", work.markrootNext, " jobs=", work.markrootJobs, " nDataRoots=", work.nDataRoots, " nBSSRoots=", work.nBSSRoots, " nSpanRoots=", work.nSpanRoots, " nStackRoots=", work.nStackRoots, "\n")
+ panic("non-empty mark queue after concurrent mark")
+ }
+
+ if debug.gccheckmark > 0 {
+ // This is expensive when there's a large number of
+ // Gs, so only do it if checkmark is also enabled.
+ gcMarkRootCheck()
+ }
+ if work.full != 0 {
+ throw("work.full != 0")
+ }
+
+ // Clear out buffers and double-check that all gcWork caches
+ // are empty. This should be ensured by gcMarkDone before we
+ // enter mark termination.
+ //
+ // TODO: We could clear out buffers just before mark if this
+ // has a non-negligible impact on STW time.
+ for _, p := range allp {
+ // The write barrier may have buffered pointers since
+ // the gcMarkDone barrier. However, since the barrier
+ // ensured all reachable objects were marked, all of
+ // these must be pointers to black objects. Hence we
+ // can just discard the write barrier buffer.
+ if debug.gccheckmark > 0 {
+ // For debugging, flush the buffer and make
+ // sure it really was all marked.
+ wbBufFlush1(p)
+ } else {
+ p.wbBuf.reset()
+ }
+
+ gcw := &p.gcw
+ if !gcw.empty() {
+ printlock()
+ print("runtime: P ", p.id, " flushedWork ", gcw.flushedWork)
+ if gcw.wbuf1 == nil {
+ print(" wbuf1=<nil>")
+ } else {
+ print(" wbuf1.n=", gcw.wbuf1.nobj)
+ }
+ if gcw.wbuf2 == nil {
+ print(" wbuf2=<nil>")
+ } else {
+ print(" wbuf2.n=", gcw.wbuf2.nobj)
+ }
+ print("\n")
+ throw("P has cached GC work at end of mark termination")
+ }
+ // There may still be cached empty buffers, which we
+ // need to flush since we're going to free them. Also,
+ // there may be non-zero stats because we allocated
+ // black after the gcMarkDone barrier.
+ gcw.dispose()
+ }
+
+ // Update the marked heap stat.
+ memstats.heap_marked = work.bytesMarked
+
+ // Flush scanAlloc from each mcache since we're about to modify
+ // heap_scan directly. If we were to flush this later, then scanAlloc
+ // might have incorrect information.
+ for _, p := range allp {
+ c := p.mcache
+ if c == nil {
+ continue
+ }
+ memstats.heap_scan += uint64(c.scanAlloc)
+ c.scanAlloc = 0
+ }
+
+ // Update other GC heap size stats. This must happen after
+ // cachestats (which flushes local statistics to these) and
+ // flushallmcaches (which modifies heap_live).
+ memstats.heap_live = work.bytesMarked
+ memstats.heap_scan = uint64(gcController.scanWork)
+
+ if trace.enabled {
+ traceHeapAlloc()
+ }
+}
+
+// gcSweep must be called on the system stack because it acquires the heap
+// lock. See mheap for details.
+//
+// The world must be stopped.
+//
+//go:systemstack
+func gcSweep(mode gcMode) {
+ assertWorldStopped()
+
+ if gcphase != _GCoff {
+ throw("gcSweep being done but phase is not GCoff")
+ }
+
+ lock(&mheap_.lock)
+ mheap_.sweepgen += 2
+ mheap_.sweepdone = 0
+ mheap_.pagesSwept = 0
+ mheap_.sweepArenas = mheap_.allArenas
+ mheap_.reclaimIndex = 0
+ mheap_.reclaimCredit = 0
+ unlock(&mheap_.lock)
+
+ sweep.centralIndex.clear()
+
+ if !_ConcurrentSweep || mode == gcForceBlockMode {
+ // Special case synchronous sweep.
+ // Record that no proportional sweeping has to happen.
+ lock(&mheap_.lock)
+ mheap_.sweepPagesPerByte = 0
+ unlock(&mheap_.lock)
+ // Sweep all spans eagerly.
+ for sweepone() != ^uintptr(0) {
+ sweep.npausesweep++
+ }
+ // Free workbufs eagerly.
+ prepareFreeWorkbufs()
+ for freeSomeWbufs(false) {
+ }
+ // All "free" events for this mark/sweep cycle have
+ // now happened, so we can make this profile cycle
+ // available immediately.
+ mProf_NextCycle()
+ mProf_Flush()
+ return
+ }
+
+ // Background sweep.
+ lock(&sweep.lock)
+ if sweep.parked {
+ sweep.parked = false
+ ready(sweep.g, 0, true)
+ }
+ unlock(&sweep.lock)
+}
+
+// gcResetMarkState resets global state prior to marking (concurrent
+// or STW) and resets the stack scan state of all Gs.
+//
+// This is safe to do without the world stopped because any Gs created
+// during or after this will start out in the reset state.
+//
+// gcResetMarkState must be called on the system stack because it acquires
+// the heap lock. See mheap for details.
+//
+//go:systemstack
+func gcResetMarkState() {
+ // This may be called during a concurrent phase, so make sure
+ // allgs doesn't change.
+ lock(&allglock)
+ for _, gp := range allgs {
+ gp.gcscandone = false // set to true in markroot, when the G's stack has been scanned
+ gp.gcAssistBytes = 0
+ }
+ unlock(&allglock)
+
+ // Clear page marks. This is just 1MB per 64GB of heap, so the
+ // time here is pretty trivial.
+ lock(&mheap_.lock)
+ arenas := mheap_.allArenas
+ unlock(&mheap_.lock)
+ for _, ai := range arenas {
+ ha := mheap_.arenas[ai.l1()][ai.l2()]
+ for i := range ha.pageMarks {
+ ha.pageMarks[i] = 0
+ }
+ }
+
+ work.bytesMarked = 0
+ work.initialHeapLive = atomic.Load64(&memstats.heap_live)
+}
+
+// Hooks for other packages
+
+var poolcleanup func()
+
+//go:linkname sync_runtime_registerPoolCleanup sync.runtime_registerPoolCleanup
+func sync_runtime_registerPoolCleanup(f func()) {
+ poolcleanup = f
+}
+
+func clearpools() {
+ // clear sync.Pools
+ if poolcleanup != nil {
+ poolcleanup()
+ }
+
+ // Clear central sudog cache.
+ // Leave per-P caches alone, they have strictly bounded size.
+ // Disconnect cached list before dropping it on the floor,
+ // so that a dangling ref to one entry does not pin all of them.
+ lock(&sched.sudoglock)
+ var sg, sgnext *sudog
+ for sg = sched.sudogcache; sg != nil; sg = sgnext {
+ sgnext = sg.next
+ sg.next = nil
+ }
+ sched.sudogcache = nil
+ unlock(&sched.sudoglock)
+
+ // Clear central defer pools.
+ // Leave per-P pools alone, they have strictly bounded size.
+ lock(&sched.deferlock)
+ for i := range sched.deferpool {
+ // disconnect cached list before dropping it on the floor,
+ // so that a dangling ref to one entry does not pin all of them.
+ var d, dlink *_defer
+ for d = sched.deferpool[i]; d != nil; d = dlink {
+ dlink = d.link
+ d.link = nil
+ }
+ sched.deferpool[i] = nil
+ }
+ unlock(&sched.deferlock)
+}
+
+// Timing
+
+// itoaDiv formats val/(10**dec) into buf.
+func itoaDiv(buf []byte, val uint64, dec int) []byte {
+ i := len(buf) - 1
+ idec := i - dec
+ for val >= 10 || i >= idec {
+ buf[i] = byte(val%10 + '0')
+ i--
+ if i == idec {
+ buf[i] = '.'
+ i--
+ }
+ val /= 10
+ }
+ buf[i] = byte(val + '0')
+ return buf[i:]
+}
+
+// fmtNSAsMS nicely formats ns nanoseconds as milliseconds.
+func fmtNSAsMS(buf []byte, ns uint64) []byte {
+ if ns >= 10e6 {
+ // Format as whole milliseconds.
+ return itoaDiv(buf, ns/1e6, 0)
+ }
+ // Format two digits of precision, with at most three decimal places.
+ x := ns / 1e3
+ if x == 0 {
+ buf[0] = '0'
+ return buf[:1]
+ }
+ dec := 3
+ for x >= 100 {
+ x /= 10
+ dec--
+ }
+ return itoaDiv(buf, x, dec)
+}
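+
+// For illustration: itoaDiv(buf[:], 123456, 3) yields "123.456", which is
+// how the gctrace code above prints a nanosecond offset as seconds with
+// millisecond precision. fmtNSAsMS(buf[:], 2345678) trims to two
+// significant digits and yields "2.3"; 12345678ns is at least 10ms and
+// formats as whole milliseconds, "12"; anything under 1µs prints as "0".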
diff --git a/src/runtime/mgcmark.go b/src/runtime/mgcmark.go
new file mode 100644
index 0000000..46fae5d
--- /dev/null
+++ b/src/runtime/mgcmark.go
@@ -0,0 +1,1549 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Garbage collector: marking and scanning
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ fixedRootFinalizers = iota
+ fixedRootFreeGStacks
+ fixedRootCount
+
+ // rootBlockBytes is the number of bytes to scan per data or
+ // BSS root.
+ rootBlockBytes = 256 << 10
+
+ // maxObletBytes is the maximum bytes of an object to scan at
+ // once. Larger objects will be split up into "oblets" of at
+ // most this size. Since we can scan 1–2 MB/ms, 128 KB bounds
+ // scan preemption at ~100 µs.
+ //
+ // This must be > _MaxSmallSize so that the object base is the
+ // span base.
+ maxObletBytes = 128 << 10
+
+ // drainCheckThreshold specifies how many units of work to do
+ // between self-preemption checks in gcDrain. Assuming a scan
+ // rate of 1 MB/ms, this is ~100 µs. Lower values have higher
+ // overhead in the scan loop (the scheduler check may perform
+ // a syscall, so its overhead is nontrivial). Higher values
+ // make the system less responsive to incoming work.
+ drainCheckThreshold = 100000
+
+ // pagesPerSpanRoot indicates how many pages to scan from a span root
+ // at a time. Used by special root marking.
+ //
+ // Higher values improve throughput by increasing locality, but
+ // increase the minimum latency of a marking operation.
+ //
+ // Must be a multiple of the pageInUse bitmap element size and
+ // must also evenly divide pagesPerArena.
+ pagesPerSpanRoot = 512
+)
+
+// gcMarkRootPrepare queues root scanning jobs (stacks, globals, and
+// some miscellany) and initializes scanning-related state.
+//
+// The world must be stopped.
+func gcMarkRootPrepare() {
+ assertWorldStopped()
+
+ work.nFlushCacheRoots = 0
+
+ // Compute how many data and BSS root blocks there are.
+ nBlocks := func(bytes uintptr) int {
+ return int(divRoundUp(bytes, rootBlockBytes))
+ }
+
+ work.nDataRoots = 0
+ work.nBSSRoots = 0
+
+ // Scan globals.
+ for _, datap := range activeModules() {
+ nDataRoots := nBlocks(datap.edata - datap.data)
+ if nDataRoots > work.nDataRoots {
+ work.nDataRoots = nDataRoots
+ }
+ }
+
+ for _, datap := range activeModules() {
+ nBSSRoots := nBlocks(datap.ebss - datap.bss)
+ if nBSSRoots > work.nBSSRoots {
+ work.nBSSRoots = nBSSRoots
+ }
+ }
+
+ // Scan span roots for finalizer specials.
+ //
+ // We depend on addfinalizer to mark objects that get
+ // finalizers after root marking.
+ //
+ // We're going to scan the whole heap (that was available at the time the
+ // mark phase started, i.e. markArenas) for in-use spans which have specials.
+ //
+ // Break up the work into arenas, and further into chunks.
+ //
+ // Snapshot allArenas as markArenas. This snapshot is safe because allArenas
+ // is append-only.
+ mheap_.markArenas = mheap_.allArenas[:len(mheap_.allArenas):len(mheap_.allArenas)]
+ work.nSpanRoots = len(mheap_.markArenas) * (pagesPerArena / pagesPerSpanRoot)
+
+ // Scan stacks.
+ //
+ // Gs may be created after this point, but it's okay that we
+ // ignore them because they begin life without any roots, so
+ // there's nothing to scan, and any roots they create during
+ // the concurrent phase will be caught by the write barrier.
+ work.nStackRoots = int(atomic.Loaduintptr(&allglen))
+
+ work.markrootNext = 0
+ work.markrootJobs = uint32(fixedRootCount + work.nFlushCacheRoots + work.nDataRoots + work.nBSSRoots + work.nSpanRoots + work.nStackRoots)
+}
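+
+// For illustration: with rootBlockBytes = 256 KiB, a module whose data
+// segment spans 1 MiB contributes divRoundUp(1<<20, 256<<10) = 4 data root
+// jobs. If, say, markArenas holds 3 arenas and pagesPerArena/pagesPerSpanRoot
+// is 16, span roots add 3*16 = 48 jobs. These, the BSS root blocks, the
+// fixed roots, and one job per G stack sum to work.markrootJobs.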
+
+// gcMarkRootCheck checks that all roots have been scanned. It is
+// purely for debugging.
+func gcMarkRootCheck() {
+ if work.markrootNext < work.markrootJobs {
+ print(work.markrootNext, " of ", work.markrootJobs, " markroot jobs done\n")
+ throw("left over markroot jobs")
+ }
+
+ lock(&allglock)
+ // Check that stacks have been scanned.
+ var gp *g
+ for i := 0; i < work.nStackRoots; i++ {
+ gp = allgs[i]
+ if !gp.gcscandone {
+ goto fail
+ }
+ }
+ unlock(&allglock)
+ return
+
+fail:
+ println("gp", gp, "goid", gp.goid,
+ "status", readgstatus(gp),
+ "gcscandone", gp.gcscandone)
+ throw("scan missed a g")
+}
+
+// ptrmask for an allocation containing a single pointer.
+var oneptrmask = [...]uint8{1}
+
+// markroot scans the i'th root.
+//
+// Preemption must be disabled (because this uses a gcWork).
+//
+// nowritebarrier is only advisory here.
+//
+//go:nowritebarrier
+func markroot(gcw *gcWork, i uint32) {
+ // TODO(austin): This is a bit ridiculous. Compute and store
+ // the bases in gcMarkRootPrepare instead of the counts.
+ baseFlushCache := uint32(fixedRootCount)
+ baseData := baseFlushCache + uint32(work.nFlushCacheRoots)
+ baseBSS := baseData + uint32(work.nDataRoots)
+ baseSpans := baseBSS + uint32(work.nBSSRoots)
+ baseStacks := baseSpans + uint32(work.nSpanRoots)
+ end := baseStacks + uint32(work.nStackRoots)
+
+ // Note: if you add a case here, please also update heapdump.go:dumproots.
+ switch {
+ case baseFlushCache <= i && i < baseData:
+ flushmcache(int(i - baseFlushCache))
+
+ case baseData <= i && i < baseBSS:
+ for _, datap := range activeModules() {
+ markrootBlock(datap.data, datap.edata-datap.data, datap.gcdatamask.bytedata, gcw, int(i-baseData))
+ }
+
+ case baseBSS <= i && i < baseSpans:
+ for _, datap := range activeModules() {
+ markrootBlock(datap.bss, datap.ebss-datap.bss, datap.gcbssmask.bytedata, gcw, int(i-baseBSS))
+ }
+
+ case i == fixedRootFinalizers:
+ for fb := allfin; fb != nil; fb = fb.alllink {
+ cnt := uintptr(atomic.Load(&fb.cnt))
+ scanblock(uintptr(unsafe.Pointer(&fb.fin[0])), cnt*unsafe.Sizeof(fb.fin[0]), &finptrmask[0], gcw, nil)
+ }
+
+ case i == fixedRootFreeGStacks:
+ // Switch to the system stack so we can call
+ // stackfree.
+ systemstack(markrootFreeGStacks)
+
+ case baseSpans <= i && i < baseStacks:
+ // mark mspan.specials
+ markrootSpans(gcw, int(i-baseSpans))
+
+ default:
+ // the rest is scanning goroutine stacks
+ var gp *g
+ if baseStacks <= i && i < end {
+ gp = allgs[i-baseStacks]
+ } else {
+ throw("markroot: bad index")
+ }
+
+ // remember when we've first observed the G blocked
+ // needed only to output in traceback
+ status := readgstatus(gp) // We are not in a scan state
+ if (status == _Gwaiting || status == _Gsyscall) && gp.waitsince == 0 {
+ gp.waitsince = work.tstart
+ }
+
+ // scanstack must be done on the system stack in case
+ // we're trying to scan our own stack.
+ systemstack(func() {
+ // If this is a self-scan, put the user G in
+ // _Gwaiting to prevent self-deadlock. It may
+ // already be in _Gwaiting if this is a mark
+ // worker or we're in mark termination.
+ userG := getg().m.curg
+ selfScan := gp == userG && readgstatus(userG) == _Grunning
+ if selfScan {
+ casgstatus(userG, _Grunning, _Gwaiting)
+ userG.waitreason = waitReasonGarbageCollectionScan
+ }
+
+ // TODO: suspendG blocks (and spins) until gp
+ // stops, which may take a while for
+ // running goroutines. Consider doing this in
+ // two phases where the first is non-blocking:
+ // we scan the stacks we can and ask running
+ // goroutines to scan themselves; and the
+ // second blocks.
+ stopped := suspendG(gp)
+ if stopped.dead {
+ gp.gcscandone = true
+ return
+ }
+ if gp.gcscandone {
+ throw("g already scanned")
+ }
+ scanstack(gp, gcw)
+ gp.gcscandone = true
+ resumeG(stopped)
+
+ if selfScan {
+ casgstatus(userG, _Gwaiting, _Grunning)
+ }
+ })
+ }
+}
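+
+// For illustration (hypothetical counts): with nFlushCacheRoots = 0,
+// nDataRoots = 4, nBSSRoots = 2, nSpanRoots = 48 and nStackRoots = 100, the
+// bases above are baseFlushCache = 2 (after the two fixed roots),
+// baseData = 2, baseBSS = 6, baseSpans = 8, baseStacks = 56 and end = 156,
+// so markroot(gcw, 60) scans the stack of allgs[4].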
+
+// markrootBlock scans the shard'th shard of the block of memory [b0,
+// b0+n0), with the given pointer mask.
+//
+//go:nowritebarrier
+func markrootBlock(b0, n0 uintptr, ptrmask0 *uint8, gcw *gcWork, shard int) {
+ if rootBlockBytes%(8*sys.PtrSize) != 0 {
+ // This is necessary to pick byte offsets in ptrmask0.
+ throw("rootBlockBytes must be a multiple of 8*ptrSize")
+ }
+
+ // Note that if b0 is toward the end of the address space,
+ // then b0 + rootBlockBytes might wrap around.
+ // These tests are written to avoid any possible overflow.
+ off := uintptr(shard) * rootBlockBytes
+ if off >= n0 {
+ return
+ }
+ b := b0 + off
+ ptrmask := (*uint8)(add(unsafe.Pointer(ptrmask0), uintptr(shard)*(rootBlockBytes/(8*sys.PtrSize))))
+ n := uintptr(rootBlockBytes)
+ if off+n > n0 {
+ n = n0 - off
+ }
+
+ // Scan this shard.
+ scanblock(b, n, ptrmask, gcw, nil)
+}
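+
+// For illustration: for a 1 MiB block, shard 3 scans [b0+768KiB, b0+1MiB)
+// and advances ptrmask0 by 3*(rootBlockBytes/(8*sys.PtrSize)) = 12 KiB of
+// mask bytes on a 64-bit platform; shard 4 and above return immediately.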
+
+// markrootFreeGStacks frees stacks of dead Gs.
+//
+// This does not free stacks of dead Gs cached on Ps, but having a few
+// cached stacks around isn't a problem.
+func markrootFreeGStacks() {
+ // Take list of dead Gs with stacks.
+ lock(&sched.gFree.lock)
+ list := sched.gFree.stack
+ sched.gFree.stack = gList{}
+ unlock(&sched.gFree.lock)
+ if list.empty() {
+ return
+ }
+
+ // Free stacks.
+ q := gQueue{list.head, list.head}
+ for gp := list.head.ptr(); gp != nil; gp = gp.schedlink.ptr() {
+ stackfree(gp.stack)
+ gp.stack.lo = 0
+ gp.stack.hi = 0
+ // Manipulate the queue directly since the Gs are
+ // already all linked the right way.
+ q.tail.set(gp)
+ }
+
+ // Put Gs back on the free list.
+ lock(&sched.gFree.lock)
+ sched.gFree.noStack.pushAll(q)
+ unlock(&sched.gFree.lock)
+}
+
+// markrootSpans marks roots for one shard of markArenas.
+//
+//go:nowritebarrier
+func markrootSpans(gcw *gcWork, shard int) {
+ // Objects with finalizers have two GC-related invariants:
+ //
+ // 1) Everything reachable from the object must be marked.
+ // This ensures that when we pass the object to its finalizer,
+ // everything the finalizer can reach will be retained.
+ //
+ // 2) Finalizer specials (which are not in the garbage
+ // collected heap) are roots. In practice, this means the fn
+ // field must be scanned.
+ sg := mheap_.sweepgen
+
+ // Find the arena and page index into that arena for this shard.
+ ai := mheap_.markArenas[shard/(pagesPerArena/pagesPerSpanRoot)]
+ ha := mheap_.arenas[ai.l1()][ai.l2()]
+ arenaPage := uint(uintptr(shard) * pagesPerSpanRoot % pagesPerArena)
+
+ // Construct slice of bitmap which we'll iterate over.
+ specialsbits := ha.pageSpecials[arenaPage/8:]
+ specialsbits = specialsbits[:pagesPerSpanRoot/8]
+ for i := range specialsbits {
+ // Find set bits, which correspond to spans with specials.
+ specials := atomic.Load8(&specialsbits[i])
+ if specials == 0 {
+ continue
+ }
+ for j := uint(0); j < 8; j++ {
+ if specials&(1<<j) == 0 {
+ continue
+ }
+ // Find the span for this bit.
+ //
+ // This value is guaranteed to be non-nil because having
+ // specials implies that the span is in-use, and since we're
+ // currently marking we can be sure that we don't have to worry
+ // about the span being freed and re-used.
+ s := ha.spans[arenaPage+uint(i)*8+j]
+
+ // The state must be mSpanInUse if the specials bit is set, so
+ // sanity check that.
+ if state := s.state.get(); state != mSpanInUse {
+ print("s.state = ", state, "\n")
+ throw("non in-use span found with specials bit set")
+ }
+ // Check that this span was swept (it may be cached or uncached).
+ if !useCheckmark && !(s.sweepgen == sg || s.sweepgen == sg+3) {
+ // sweepgen was updated (+2) during non-checkmark GC pass
+ print("sweep ", s.sweepgen, " ", sg, "\n")
+ throw("gc: unswept span")
+ }
+
+ // Lock the specials to prevent a special from being
+ // removed from the list while we're traversing it.
+ lock(&s.speciallock)
+ for sp := s.specials; sp != nil; sp = sp.next {
+ if sp.kind != _KindSpecialFinalizer {
+ continue
+ }
+ // don't mark finalized object, but scan it so we
+ // retain everything it points to.
+ spf := (*specialfinalizer)(unsafe.Pointer(sp))
+ // A finalizer can be set for an inner byte of an object; find the object's beginning.
+ p := s.base() + uintptr(spf.special.offset)/s.elemsize*s.elemsize
+
+ // Mark everything that can be reached from
+ // the object (but *not* the object itself or
+ // we'll never collect it).
+ scanobject(p, gcw)
+
+ // The special itself is a root.
+ scanblock(uintptr(unsafe.Pointer(&spf.fn)), sys.PtrSize, &oneptrmask[0], gcw, nil)
+ }
+ unlock(&s.speciallock)
+ }
+ }
+}
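+
+// For illustration: on a configuration where pagesPerArena is 8192, each
+// arena contributes pagesPerArena/pagesPerSpanRoot = 16 shards, so shard 19
+// maps to markArenas[19/16] = markArenas[1] with arenaPage = 19*512%8192 =
+// 1536, i.e. it examines the specials bits for pages 1536 through 2047 of
+// that arena.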
+
+// gcAssistAlloc performs GC work to make gp's assist debt positive.
+// gp must be the calling user goroutine.
+//
+// This must be called with preemption enabled.
+func gcAssistAlloc(gp *g) {
+ // Don't assist in non-preemptible contexts. These are
+ // generally fragile and won't allow the assist to block.
+ if getg() == gp.m.g0 {
+ return
+ }
+ if mp := getg().m; mp.locks > 0 || mp.preemptoff != "" {
+ return
+ }
+
+ traced := false
+retry:
+ // Compute the amount of scan work we need to do to make the
+ // balance positive. When the required amount of work is low,
+ // we over-assist to build up credit for future allocations
+ // and amortize the cost of assisting.
+ assistWorkPerByte := float64frombits(atomic.Load64(&gcController.assistWorkPerByte))
+ assistBytesPerWork := float64frombits(atomic.Load64(&gcController.assistBytesPerWork))
+ debtBytes := -gp.gcAssistBytes
+ scanWork := int64(assistWorkPerByte * float64(debtBytes))
+ if scanWork < gcOverAssistWork {
+ scanWork = gcOverAssistWork
+ debtBytes = int64(assistBytesPerWork * float64(scanWork))
+ }
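+
+ // For illustration (assumed pacing values): if assistWorkPerByte were
+ // 0.5 and this G owed 4 KiB, the debt alone would call for only 2048
+ // units of scan work, so scanWork is rounded up to gcOverAssistWork
+ // and debtBytes is recomputed to match; the surplus typically leaves
+ // gp.gcAssistBytes positive afterwards, pre-paying future allocations.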
+
+ // Steal as much credit as we can from the background GC's
+ // scan credit. This is racy and may drop the background
+ // credit below 0 if two mutators steal at the same time. This
+ // will just cause steals to fail until credit is accumulated
+ // again, so in the long run it doesn't really matter, but we
+ // do have to handle the negative credit case.
+ bgScanCredit := atomic.Loadint64(&gcController.bgScanCredit)
+ stolen := int64(0)
+ if bgScanCredit > 0 {
+ if bgScanCredit < scanWork {
+ stolen = bgScanCredit
+ gp.gcAssistBytes += 1 + int64(assistBytesPerWork*float64(stolen))
+ } else {
+ stolen = scanWork
+ gp.gcAssistBytes += debtBytes
+ }
+ atomic.Xaddint64(&gcController.bgScanCredit, -stolen)
+
+ scanWork -= stolen
+
+ if scanWork == 0 {
+ // We were able to steal all of the credit we
+ // needed.
+ if traced {
+ traceGCMarkAssistDone()
+ }
+ return
+ }
+ }
+
+ if trace.enabled && !traced {
+ traced = true
+ traceGCMarkAssistStart()
+ }
+
+ // Perform assist work
+ systemstack(func() {
+ gcAssistAlloc1(gp, scanWork)
+ // The user stack may have moved, so this can't touch
+ // anything on it until it returns from systemstack.
+ })
+
+ completed := gp.param != nil
+ gp.param = nil
+ if completed {
+ gcMarkDone()
+ }
+
+ if gp.gcAssistBytes < 0 {
+ // We were unable to steal enough credit or perform
+ // enough work to pay off the assist debt. We need to
+ // do one of these before letting the mutator allocate
+ // more to prevent over-allocation.
+ //
+ // If this is because we were preempted, reschedule
+ // and try some more.
+ if gp.preempt {
+ Gosched()
+ goto retry
+ }
+
+ // Add this G to an assist queue and park. When the GC
+ // has more background credit, it will satisfy queued
+ // assists before flushing to the global credit pool.
+ //
+ // Note that this does *not* get woken up when more
+ // work is added to the work list. The theory is that
+ // there wasn't enough work to do anyway, so we might
+ // as well let background marking take care of the
+ // work that is available.
+ if !gcParkAssist() {
+ goto retry
+ }
+
+ // At this point either background GC has satisfied
+ // this G's assist debt, or the GC cycle is over.
+ }
+ if traced {
+ traceGCMarkAssistDone()
+ }
+}
+
+// gcAssistAlloc1 is the part of gcAssistAlloc that runs on the system
+// stack. This is a separate function to make it easier to see that
+// we're not capturing anything from the user stack, since the user
+// stack may move while we're in this function.
+//
+// gcAssistAlloc1 indicates whether this assist completed the mark
+// phase by setting gp.param to non-nil. This can't be communicated on
+// the stack since it may move.
+//
+//go:systemstack
+func gcAssistAlloc1(gp *g, scanWork int64) {
+ // Clear the flag indicating that this assist completed the
+ // mark phase.
+ gp.param = nil
+
+ if atomic.Load(&gcBlackenEnabled) == 0 {
+ // The gcBlackenEnabled check in malloc races with the
+ // store that clears it, but an atomic check in every malloc
+ // would be a performance hit.
+ // Instead we recheck it here on the non-preemptible system
+ // stack to determine if we should perform an assist.
+
+ // GC is done, so ignore any remaining debt.
+ gp.gcAssistBytes = 0
+ return
+ }
+ // Track time spent in this assist. Since we're on the
+ // system stack, this is non-preemptible, so we can
+ // just measure start and end time.
+ startTime := nanotime()
+
+ decnwait := atomic.Xadd(&work.nwait, -1)
+ if decnwait == work.nproc {
+ println("runtime: work.nwait =", decnwait, "work.nproc=", work.nproc)
+ throw("nwait > work.nprocs")
+ }
+
+ // gcDrainN requires the caller to be preemptible.
+ casgstatus(gp, _Grunning, _Gwaiting)
+ gp.waitreason = waitReasonGCAssistMarking
+
+ // drain own cached work first in the hopes that it
+ // will be more cache friendly.
+ gcw := &getg().m.p.ptr().gcw
+ workDone := gcDrainN(gcw, scanWork)
+
+ casgstatus(gp, _Gwaiting, _Grunning)
+
+ // Record that we did this much scan work.
+ //
+ // Back out the number of bytes of assist credit that
+ // this scan work counts for. The "1+" is a poor man's
+ // round-up, to ensure this adds credit even if
+ // assistBytesPerWork is very low.
+ assistBytesPerWork := float64frombits(atomic.Load64(&gcController.assistBytesPerWork))
+ gp.gcAssistBytes += 1 + int64(assistBytesPerWork*float64(workDone))
+
+ // If this is the last worker and we ran out of work,
+ // signal a completion point.
+ incnwait := atomic.Xadd(&work.nwait, +1)
+ if incnwait > work.nproc {
+ println("runtime: work.nwait=", incnwait,
+ "work.nproc=", work.nproc)
+ throw("work.nwait > work.nproc")
+ }
+
+ if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
+ // This has reached a background completion point. Set
+ // gp.param to a non-nil value to indicate this. It
+ // doesn't matter what we set it to (it just has to be
+ // a valid pointer).
+ gp.param = unsafe.Pointer(gp)
+ }
+ duration := nanotime() - startTime
+ _p_ := gp.m.p.ptr()
+ _p_.gcAssistTime += duration
+ if _p_.gcAssistTime > gcAssistTimeSlack {
+ atomic.Xaddint64(&gcController.assistTime, _p_.gcAssistTime)
+ _p_.gcAssistTime = 0
+ }
+}
+
+// gcWakeAllAssists wakes all currently blocked assists. This is used
+// at the end of a GC cycle. gcBlackenEnabled must be false to prevent
+// new assists from going to sleep after this point.
+func gcWakeAllAssists() {
+ lock(&work.assistQueue.lock)
+ list := work.assistQueue.q.popList()
+ injectglist(&list)
+ unlock(&work.assistQueue.lock)
+}
+
+// gcParkAssist puts the current goroutine on the assist queue and parks.
+//
+// gcParkAssist reports whether the assist is now satisfied. If it
+// returns false, the caller must retry the assist.
+//
+//go:nowritebarrier
+func gcParkAssist() bool {
+ lock(&work.assistQueue.lock)
+ // If the GC cycle finished while we were getting the lock,
+ // exit the assist. The cycle can't finish while we hold the
+ // lock.
+ if atomic.Load(&gcBlackenEnabled) == 0 {
+ unlock(&work.assistQueue.lock)
+ return true
+ }
+
+ gp := getg()
+ oldList := work.assistQueue.q
+ work.assistQueue.q.pushBack(gp)
+
+ // Recheck for background credit now that this G is in
+ // the queue, but can still back out. This avoids a
+ // race in case background marking has flushed more
+ // credit since we checked above.
+ if atomic.Loadint64(&gcController.bgScanCredit) > 0 {
+ work.assistQueue.q = oldList
+ if oldList.tail != 0 {
+ oldList.tail.ptr().schedlink.set(nil)
+ }
+ unlock(&work.assistQueue.lock)
+ return false
+ }
+ // Park.
+ goparkunlock(&work.assistQueue.lock, waitReasonGCAssistWait, traceEvGoBlockGC, 2)
+ return true
+}
+
+// gcFlushBgCredit flushes scanWork units of background scan work
+// credit. This first satisfies blocked assists on the
+// work.assistQueue and then flushes any remaining credit to
+// gcController.bgScanCredit.
+//
+// Write barriers are disallowed because this is used by gcDrain after
+// it has ensured that all work is drained and this must preserve that
+// condition.
+//
+//go:nowritebarrierrec
+func gcFlushBgCredit(scanWork int64) {
+ if work.assistQueue.q.empty() {
+ // Fast path; there are no blocked assists. There's a
+ // small window here where an assist may add itself to
+ // the blocked queue and park. If that happens, we'll
+ // just get it on the next flush.
+ atomic.Xaddint64(&gcController.bgScanCredit, scanWork)
+ return
+ }
+
+ assistBytesPerWork := float64frombits(atomic.Load64(&gcController.assistBytesPerWork))
+ scanBytes := int64(float64(scanWork) * assistBytesPerWork)
+
+ lock(&work.assistQueue.lock)
+ for !work.assistQueue.q.empty() && scanBytes > 0 {
+ gp := work.assistQueue.q.pop()
+ // Note that gp.gcAssistBytes is negative because gp
+ // is in debt. Think carefully about the signs below.
+ if scanBytes+gp.gcAssistBytes >= 0 {
+ // Satisfy this entire assist debt.
+ scanBytes += gp.gcAssistBytes
+ gp.gcAssistBytes = 0
+ // It's important that we *not* put gp in
+ // runnext. Otherwise, it's possible for user
+ // code to exploit the GC worker's high
+ // scheduler priority to get itself always run
+ // before other goroutines and always in the
+ // fresh quantum started by GC.
+ ready(gp, 0, false)
+ } else {
+ // Partially satisfy this assist.
+ gp.gcAssistBytes += scanBytes
+ scanBytes = 0
+ // As a heuristic, we move this assist to the
+ // back of the queue so that large assists
+ // can't clog up the assist queue and
+ // substantially delay small assists.
+ work.assistQueue.q.pushBack(gp)
+ break
+ }
+ }
+
+ if scanBytes > 0 {
+ // Convert from scan bytes back to work.
+ assistWorkPerByte := float64frombits(atomic.Load64(&gcController.assistWorkPerByte))
+ scanWork = int64(float64(scanBytes) * assistWorkPerByte)
+ atomic.Xaddint64(&gcController.bgScanCredit, scanWork)
+ }
+ unlock(&work.assistQueue.lock)
+}
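+
+// For illustration (hypothetical queue): if the assist queue holds G1 with
+// gcAssistBytes = -1000 and G2 with -5000, a flush worth scanBytes = 3000
+// fully satisfies G1 (readying it, with 2000 bytes left over), then
+// partially satisfies G2, which is left at -3000 and moved to the back of
+// the queue; nothing remains to flush to gcController.bgScanCredit.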
+
+// scanstack scans gp's stack, greying all pointers found on the stack.
+//
+// scanstack will also shrink the stack if it is safe to do so. If it
+// is not, it schedules a stack shrink for the next synchronous safe
+// point.
+//
+// scanstack is marked go:systemstack because it must not be preempted
+// while using a workbuf.
+//
+//go:nowritebarrier
+//go:systemstack
+func scanstack(gp *g, gcw *gcWork) {
+ if readgstatus(gp)&_Gscan == 0 {
+ print("runtime:scanstack: gp=", gp, ", goid=", gp.goid, ", gp->atomicstatus=", hex(readgstatus(gp)), "\n")
+ throw("scanstack - bad status")
+ }
+
+ switch readgstatus(gp) &^ _Gscan {
+ default:
+ print("runtime: gp=", gp, ", goid=", gp.goid, ", gp->atomicstatus=", readgstatus(gp), "\n")
+ throw("mark - bad status")
+ case _Gdead:
+ return
+ case _Grunning:
+ print("runtime: gp=", gp, ", goid=", gp.goid, ", gp->atomicstatus=", readgstatus(gp), "\n")
+ throw("scanstack: goroutine not stopped")
+ case _Grunnable, _Gsyscall, _Gwaiting:
+ // ok
+ }
+
+ if gp == getg() {
+ throw("can't scan our own stack")
+ }
+
+ if isShrinkStackSafe(gp) {
+ // Shrink the stack if not much of it is being used.
+ shrinkstack(gp)
+ } else {
+ // Otherwise, shrink the stack at the next sync safe point.
+ gp.preemptShrink = true
+ }
+
+ var state stackScanState
+ state.stack = gp.stack
+
+ if stackTraceDebug {
+ println("stack trace goroutine", gp.goid)
+ }
+
+ if debugScanConservative && gp.asyncSafePoint {
+ print("scanning async preempted goroutine ", gp.goid, " stack [", hex(gp.stack.lo), ",", hex(gp.stack.hi), ")\n")
+ }
+
+ // Scan the saved context register. This is effectively a live
+ // register that gets moved back and forth between the
+ // register and sched.ctxt without a write barrier.
+ if gp.sched.ctxt != nil {
+ scanblock(uintptr(unsafe.Pointer(&gp.sched.ctxt)), sys.PtrSize, &oneptrmask[0], gcw, &state)
+ }
+
+ // Scan the stack. Accumulate a list of stack objects.
+ scanframe := func(frame *stkframe, unused unsafe.Pointer) bool {
+ scanframeworker(frame, &state, gcw)
+ return true
+ }
+ gentraceback(^uintptr(0), ^uintptr(0), 0, gp, 0, nil, 0x7fffffff, scanframe, nil, 0)
+
+ // Find additional pointers that point into the stack from the heap.
+ // Currently this includes defers and panics. See also function copystack.
+
+ // Find and trace all defer arguments.
+ tracebackdefers(gp, scanframe, nil)
+
+ // Find and trace other pointers in defer records.
+ for d := gp._defer; d != nil; d = d.link {
+ if d.fn != nil {
+ // tracebackdefers above does not scan the func value, which could
+ // be a stack allocated closure. See issue 30453.
+ scanblock(uintptr(unsafe.Pointer(&d.fn)), sys.PtrSize, &oneptrmask[0], gcw, &state)
+ }
+ if d.link != nil {
+ // The link field of a stack-allocated defer record might point
+ // to a heap-allocated defer record. Keep that heap record live.
+ scanblock(uintptr(unsafe.Pointer(&d.link)), sys.PtrSize, &oneptrmask[0], gcw, &state)
+ }
+ // Retain defers records themselves.
+ // Defer records might not be reachable from the G through regular heap
+ // tracing because the defer linked list might weave between the stack and the heap.
+ if d.heap {
+ scanblock(uintptr(unsafe.Pointer(&d)), sys.PtrSize, &oneptrmask[0], gcw, &state)
+ }
+ }
+ if gp._panic != nil {
+ // Panics are always stack allocated.
+ state.putPtr(uintptr(unsafe.Pointer(gp._panic)), false)
+ }
+
+ // Find and scan all reachable stack objects.
+ //
+ // The state's pointer queue prioritizes precise pointers over
+ // conservative pointers so that we'll prefer scanning stack
+ // objects precisely.
+ state.buildIndex()
+ for {
+ p, conservative := state.getPtr()
+ if p == 0 {
+ break
+ }
+ obj := state.findObject(p)
+ if obj == nil {
+ continue
+ }
+ t := obj.typ
+ if t == nil {
+ // We've already scanned this object.
+ continue
+ }
+ obj.setType(nil) // Don't scan it again.
+ if stackTraceDebug {
+ printlock()
+ print(" live stkobj at", hex(state.stack.lo+uintptr(obj.off)), "of type", t.string())
+ if conservative {
+ print(" (conservative)")
+ }
+ println()
+ printunlock()
+ }
+ gcdata := t.gcdata
+ var s *mspan
+ if t.kind&kindGCProg != 0 {
+ // This path is pretty unlikely, an object large enough
+ // to have a GC program allocated on the stack.
+ // We need some space to unpack the program into a straight
+ // bitmask, which we allocate/free here.
+ // TODO: it would be nice if there were a way to run a GC
+ // program without having to store all its bits. We'd have
+ // to change from a Lempel-Ziv style program to something else.
+ // Or we can forbid putting objects on stacks if they require
+ // a gc program (see issue 27447).
+ s = materializeGCProg(t.ptrdata, gcdata)
+ gcdata = (*byte)(unsafe.Pointer(s.startAddr))
+ }
+
+ b := state.stack.lo + uintptr(obj.off)
+ if conservative {
+ scanConservative(b, t.ptrdata, gcdata, gcw, &state)
+ } else {
+ scanblock(b, t.ptrdata, gcdata, gcw, &state)
+ }
+
+ if s != nil {
+ dematerializeGCProg(s)
+ }
+ }
+
+ // Deallocate object buffers.
+ // (Pointer buffers were all deallocated in the loop above.)
+ for state.head != nil {
+ x := state.head
+ state.head = x.next
+ if stackTraceDebug {
+ for i := 0; i < x.nobj; i++ {
+ obj := &x.obj[i]
+ if obj.typ == nil { // reachable
+ continue
+ }
+ println(" dead stkobj at", hex(gp.stack.lo+uintptr(obj.off)), "of type", obj.typ.string())
+ // Note: not necessarily really dead - only reachable-from-ptr dead.
+ }
+ }
+ x.nobj = 0
+ putempty((*workbuf)(unsafe.Pointer(x)))
+ }
+ if state.buf != nil || state.cbuf != nil || state.freeBuf != nil {
+ throw("remaining pointer buffers")
+ }
+}
+
+// Scan a stack frame: local variables and function arguments/results.
+//go:nowritebarrier
+func scanframeworker(frame *stkframe, state *stackScanState, gcw *gcWork) {
+ if _DebugGC > 1 && frame.continpc != 0 {
+ print("scanframe ", funcname(frame.fn), "\n")
+ }
+
+ isAsyncPreempt := frame.fn.valid() && frame.fn.funcID == funcID_asyncPreempt
+ isDebugCall := frame.fn.valid() && frame.fn.funcID == funcID_debugCallV1
+ if state.conservative || isAsyncPreempt || isDebugCall {
+ if debugScanConservative {
+ println("conservatively scanning function", funcname(frame.fn), "at PC", hex(frame.continpc))
+ }
+
+ // Conservatively scan the frame. Unlike the precise
+ // case, this includes the outgoing argument space
+ // since we may have stopped while this function was
+ // setting up a call.
+ //
+ // TODO: We could narrow this down if the compiler
+ // produced a single map per function of stack slots
+ // and registers that ever contain a pointer.
+ if frame.varp != 0 {
+ size := frame.varp - frame.sp
+ if size > 0 {
+ scanConservative(frame.sp, size, nil, gcw, state)
+ }
+ }
+
+ // Scan arguments to this frame.
+ if frame.arglen != 0 {
+ // TODO: We could pass the entry argument map
+ // to narrow this down further.
+ scanConservative(frame.argp, frame.arglen, nil, gcw, state)
+ }
+
+ if isAsyncPreempt || isDebugCall {
+ // This function's frame contained the
+ // registers for the asynchronously stopped
+ // parent frame. Scan the parent
+ // conservatively.
+ state.conservative = true
+ } else {
+ // We only wanted to scan those two frames
+ // conservatively. Clear the flag for future
+ // frames.
+ state.conservative = false
+ }
+ return
+ }
+
+ locals, args, objs := getStackMap(frame, &state.cache, false)
+
+ // Scan local variables if stack frame has been allocated.
+ if locals.n > 0 {
+ size := uintptr(locals.n) * sys.PtrSize
+ scanblock(frame.varp-size, size, locals.bytedata, gcw, state)
+ }
+
+ // Scan arguments.
+ if args.n > 0 {
+ scanblock(frame.argp, uintptr(args.n)*sys.PtrSize, args.bytedata, gcw, state)
+ }
+
+ // Add all stack objects to the stack object list.
+ if frame.varp != 0 {
+ // varp is 0 for defers, where there are no locals.
+ // In that case, there can't be a pointer to its args, either.
+ // (And all args would be scanned above anyway.)
+ for _, obj := range objs {
+ off := obj.off
+ base := frame.varp // locals base pointer
+ if off >= 0 {
+ base = frame.argp // arguments and return values base pointer
+ }
+ ptr := base + uintptr(off)
+ if ptr < frame.sp {
+ // object hasn't been allocated in the frame yet.
+ continue
+ }
+ if stackTraceDebug {
+ println("stkobj at", hex(ptr), "of type", obj.typ.string())
+ }
+ state.addObject(ptr, obj.typ)
+ }
+ }
+}
+
+type gcDrainFlags int
+
+const (
+ gcDrainUntilPreempt gcDrainFlags = 1 << iota
+ gcDrainFlushBgCredit
+ gcDrainIdle
+ gcDrainFractional
+)
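+
+// As used in gcBgMarkWorker above: dedicated workers drain with
+// gcDrainUntilPreempt|gcDrainFlushBgCredit (and then again with only
+// gcDrainFlushBgCredit once a preemption request has been handled),
+// fractional workers add gcDrainFractional, and idle workers add
+// gcDrainIdle.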
+
+// gcDrain scans roots and objects in work buffers, blackening grey
+// objects until it is unable to get more work. It may return before
+// GC is done; it's the caller's responsibility to balance work from
+// other Ps.
+//
+// If flags&gcDrainUntilPreempt != 0, gcDrain returns when g.preempt
+// is set.
+//
+// If flags&gcDrainIdle != 0, gcDrain returns when there is other work
+// to do.
+//
+// If flags&gcDrainFractional != 0, gcDrain self-preempts when
+// pollFractionalWorkerExit() returns true. This implies
+// gcDrainNoBlock.
+//
+// If flags&gcDrainFlushBgCredit != 0, gcDrain flushes scan work
+// credit to gcController.bgScanCredit every gcCreditSlack units of
+// scan work.
+//
+// gcDrain will always return if there is a pending STW.
+//
+//go:nowritebarrier
+func gcDrain(gcw *gcWork, flags gcDrainFlags) {
+ if !writeBarrier.needed {
+ throw("gcDrain phase incorrect")
+ }
+
+ gp := getg().m.curg
+ preemptible := flags&gcDrainUntilPreempt != 0
+ flushBgCredit := flags&gcDrainFlushBgCredit != 0
+ idle := flags&gcDrainIdle != 0
+
+ initScanWork := gcw.scanWork
+
+ // checkWork is the scan work before performing the next
+ // self-preempt check.
+ checkWork := int64(1<<63 - 1)
+ var check func() bool
+ if flags&(gcDrainIdle|gcDrainFractional) != 0 {
+ checkWork = initScanWork + drainCheckThreshold
+ if idle {
+ check = pollWork
+ } else if flags&gcDrainFractional != 0 {
+ check = pollFractionalWorkerExit
+ }
+ }
+
+ // Drain root marking jobs.
+ if work.markrootNext < work.markrootJobs {
+ // Stop if we're preemptible or if someone wants to STW.
+ for !(gp.preempt && (preemptible || atomic.Load(&sched.gcwaiting) != 0)) {
+ job := atomic.Xadd(&work.markrootNext, +1) - 1
+ if job >= work.markrootJobs {
+ break
+ }
+ markroot(gcw, job)
+ if check != nil && check() {
+ goto done
+ }
+ }
+ }
+
+ // Drain heap marking jobs.
+ // Stop if we're preemptible or if someone wants to STW.
+ for !(gp.preempt && (preemptible || atomic.Load(&sched.gcwaiting) != 0)) {
+ // Try to keep work available on the global queue. We used to
+ // check if there were waiting workers, but it's better to
+ // just keep work available than to make workers wait. In the
+ // worst case, we'll do O(log(_WorkbufSize)) unnecessary
+ // balances.
+ if work.full == 0 {
+ gcw.balance()
+ }
+
+ b := gcw.tryGetFast()
+ if b == 0 {
+ b = gcw.tryGet()
+ if b == 0 {
+ // Flush the write barrier
+ // buffer; this may create
+ // more work.
+ wbBufFlush(nil, 0)
+ b = gcw.tryGet()
+ }
+ }
+ if b == 0 {
+ // Unable to get work.
+ break
+ }
+ scanobject(b, gcw)
+
+ // Flush background scan work credit to the global
+ // account if we've accumulated enough locally so
+ // mutator assists can draw on it.
+ if gcw.scanWork >= gcCreditSlack {
+ atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
+ if flushBgCredit {
+ gcFlushBgCredit(gcw.scanWork - initScanWork)
+ initScanWork = 0
+ }
+ checkWork -= gcw.scanWork
+ gcw.scanWork = 0
+
+ if checkWork <= 0 {
+ checkWork += drainCheckThreshold
+ if check != nil && check() {
+ break
+ }
+ }
+ }
+ }
+
+done:
+ // Flush remaining scan work credit.
+ if gcw.scanWork > 0 {
+ atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
+ if flushBgCredit {
+ gcFlushBgCredit(gcw.scanWork - initScanWork)
+ }
+ gcw.scanWork = 0
+ }
+}
+
+// gcDrainN blackens grey objects until it has performed roughly
+// scanWork units of scan work or the G is preempted. This is
+// best-effort, so it may perform less work if it fails to get a work
+// buffer. Otherwise, it will perform at least scanWork units of work, but
+// may perform more because scanning is always done in whole object
+// increments. It returns the amount of scan work performed.
+//
+// The caller goroutine must be in a preemptible state (e.g.,
+// _Gwaiting) to prevent deadlocks during stack scanning. As a
+// consequence, this must be called on the system stack.
+//
+//go:nowritebarrier
+//go:systemstack
+func gcDrainN(gcw *gcWork, scanWork int64) int64 {
+ if !writeBarrier.needed {
+ throw("gcDrainN phase incorrect")
+ }
+
+ // There may already be scan work on the gcw, which we don't
+ // want to claim was done by this call.
+ workFlushed := -gcw.scanWork
+
+ gp := getg().m.curg
+ for !gp.preempt && workFlushed+gcw.scanWork < scanWork {
+ // See gcDrain comment.
+ if work.full == 0 {
+ gcw.balance()
+ }
+
+ // This might be a good place to add prefetch code...
+ // if(wbuf.nobj > 4) {
+ // PREFETCH(wbuf->obj[wbuf.nobj - 3]);
+ // }
+ //
+ b := gcw.tryGetFast()
+ if b == 0 {
+ b = gcw.tryGet()
+ if b == 0 {
+ // Flush the write barrier buffer;
+ // this may create more work.
+ wbBufFlush(nil, 0)
+ b = gcw.tryGet()
+ }
+ }
+
+ if b == 0 {
+ // Try to do a root job.
+ //
+ // TODO: Assists should get credit for this
+ // work.
+ if work.markrootNext < work.markrootJobs {
+ job := atomic.Xadd(&work.markrootNext, +1) - 1
+ if job < work.markrootJobs {
+ markroot(gcw, job)
+ continue
+ }
+ }
+ // No heap or root jobs.
+ break
+ }
+ scanobject(b, gcw)
+
+ // Flush background scan work credit.
+ if gcw.scanWork >= gcCreditSlack {
+ atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
+ workFlushed += gcw.scanWork
+ gcw.scanWork = 0
+ }
+ }
+
+ // Unlike gcDrain, there's no need to flush remaining work
+ // here because this never flushes to bgScanCredit and
+ // gcw.dispose will flush any remaining work to scanWork.
+
+ return workFlushed + gcw.scanWork
+}
+
+// scanblock scans b as scanobject would, but using an explicit
+// pointer bitmap instead of the heap bitmap.
+//
+// This is used to scan non-heap roots, so it does not update
+// gcw.bytesMarked or gcw.scanWork.
+//
+// If stk != nil, possible stack pointers are also reported to stk.putPtr.
+//go:nowritebarrier
+func scanblock(b0, n0 uintptr, ptrmask *uint8, gcw *gcWork, stk *stackScanState) {
+ // Use local copies of original parameters, so that a stack trace
+ // due to one of the throws below shows the original block
+ // base and extent.
+ b := b0
+ n := n0
+
+ for i := uintptr(0); i < n; {
+ // Find bits for the next word.
+ bits := uint32(*addb(ptrmask, i/(sys.PtrSize*8)))
+ if bits == 0 {
+ i += sys.PtrSize * 8
+ continue
+ }
+ for j := 0; j < 8 && i < n; j++ {
+ if bits&1 != 0 {
+ // Same work as in scanobject; see comments there.
+ p := *(*uintptr)(unsafe.Pointer(b + i))
+ if p != 0 {
+ if obj, span, objIndex := findObject(p, b, i); obj != 0 {
+ greyobject(obj, b, i, span, gcw, objIndex)
+ } else if stk != nil && p >= stk.stack.lo && p < stk.stack.hi {
+ stk.putPtr(p, false)
+ }
+ }
+ }
+ bits >>= 1
+ i += sys.PtrSize
+ }
+ }
+}
+
+// scanobject scans the object starting at b, adding pointers to gcw.
+// b must point to the beginning of a heap object or an oblet.
+// scanobject consults the GC bitmap for the pointer mask and the
+// spans for the size of the object.
+//
+//go:nowritebarrier
+func scanobject(b uintptr, gcw *gcWork) {
+ // Find the bits for b and the size of the object at b.
+ //
+ // b is either the beginning of an object, in which case this
+ // is the size of the object to scan, or it points to an
+ // oblet, in which case we compute the size to scan below.
+ hbits := heapBitsForAddr(b)
+ s := spanOfUnchecked(b)
+ n := s.elemsize
+ if n == 0 {
+ throw("scanobject n == 0")
+ }
+
+ if n > maxObletBytes {
+ // Large object. Break into oblets for better
+ // parallelism and lower latency.
+ if b == s.base() {
+ // It's possible this is a noscan object (not
+ // from greyobject, but from other code
+ // paths), in which case we must *not* enqueue
+ // oblets since their bitmaps will be
+ // uninitialized.
+ if s.spanclass.noscan() {
+ // Bypass the whole scan.
+ gcw.bytesMarked += uint64(n)
+ return
+ }
+
+ // Enqueue the other oblets to scan later.
+ // Some oblets may be in b's scalar tail, but
+ // these will be marked as "no more pointers",
+ // so we'll drop out immediately when we go to
+ // scan those.
+ for oblet := b + maxObletBytes; oblet < s.base()+s.elemsize; oblet += maxObletBytes {
+ if !gcw.putFast(oblet) {
+ gcw.put(oblet)
+ }
+ }
+ }
+
+ // Compute the size of the oblet. Since this object
+ // must be a large object, s.base() is the beginning
+ // of the object.
+ n = s.base() + s.elemsize - b
+ if n > maxObletBytes {
+ n = maxObletBytes
+ }
+ }
+
+ var i uintptr
+ for i = 0; i < n; i += sys.PtrSize {
+ // Find bits for this word.
+ if i != 0 {
+ // Avoid needless hbits.next() on last iteration.
+ hbits = hbits.next()
+ }
+ // Load bits once. See CL 22712 and issue 16973 for discussion.
+ bits := hbits.bits()
+ if bits&bitScan == 0 {
+ break // no more pointers in this object
+ }
+ if bits&bitPointer == 0 {
+ continue // not a pointer
+ }
+
+ // Work here is duplicated in scanblock and above.
+ // If you make changes here, make changes there too.
+ obj := *(*uintptr)(unsafe.Pointer(b + i))
+
+ // At this point we have extracted the next potential pointer.
+ // Quickly filter out nil and pointers back to the current object.
+ if obj != 0 && obj-b >= n {
+ // Test if obj points into the Go heap and, if so,
+ // mark the object.
+ //
+ // Note that it's possible for findObject to
+ // fail if obj points to a just-allocated heap
+ // object because of a race with growing the
+ // heap. In this case, we know the object was
+ // just allocated and hence will be marked by
+ // allocation itself.
+ if obj, span, objIndex := findObject(obj, b, i); obj != 0 {
+ greyobject(obj, b, i, span, gcw, objIndex)
+ }
+ }
+ }
+ gcw.bytesMarked += uint64(n)
+ gcw.scanWork += int64(i)
+}
+
+// scanConservative scans block [b, b+n) conservatively, treating any
+// pointer-like value in the block as a pointer.
+//
+// If ptrmask != nil, only words that are marked in ptrmask are
+// considered as potential pointers.
+//
+// If state != nil, it's assumed that [b, b+n) is a block in the stack
+// and may contain pointers to stack objects.
+func scanConservative(b, n uintptr, ptrmask *uint8, gcw *gcWork, state *stackScanState) {
+ if debugScanConservative {
+ printlock()
+ print("conservatively scanning [", hex(b), ",", hex(b+n), ")\n")
+ hexdumpWords(b, b+n, func(p uintptr) byte {
+ if ptrmask != nil {
+ word := (p - b) / sys.PtrSize
+ bits := *addb(ptrmask, word/8)
+ if (bits>>(word%8))&1 == 0 {
+ return '$'
+ }
+ }
+
+ val := *(*uintptr)(unsafe.Pointer(p))
+ if state != nil && state.stack.lo <= val && val < state.stack.hi {
+ return '@'
+ }
+
+ span := spanOfHeap(val)
+ if span == nil {
+ return ' '
+ }
+ idx := span.objIndex(val)
+ if span.isFree(idx) {
+ return ' '
+ }
+ return '*'
+ })
+ printunlock()
+ }
+
+ for i := uintptr(0); i < n; i += sys.PtrSize {
+ if ptrmask != nil {
+ word := i / sys.PtrSize
+ bits := *addb(ptrmask, word/8)
+ if bits == 0 {
+ // Skip 8 words (the loop increment will do the 8th)
+ //
+ // This must be the first time we've
+ // seen this word of ptrmask, so i
+ // must be 8-word-aligned, but check
+ // our reasoning just in case.
+ if i%(sys.PtrSize*8) != 0 {
+ throw("misaligned mask")
+ }
+ i += sys.PtrSize*8 - sys.PtrSize
+ continue
+ }
+ if (bits>>(word%8))&1 == 0 {
+ continue
+ }
+ }
+
+ val := *(*uintptr)(unsafe.Pointer(b + i))
+
+ // Check if val points into the stack.
+ if state != nil && state.stack.lo <= val && val < state.stack.hi {
+ // val may point to a stack object. This
+ // object may be dead from last cycle and
+ // hence may contain pointers to unallocated
+ // objects, but unlike heap objects we can't
+ // tell if it's already dead. Hence, if all
+ // pointers to this object are from
+ // conservative scanning, we have to scan it
+ // defensively, too.
+ state.putPtr(val, true)
+ continue
+ }
+
+ // Check if val points to a heap span.
+ span := spanOfHeap(val)
+ if span == nil {
+ continue
+ }
+
+ // Check if val points to an allocated object.
+ idx := span.objIndex(val)
+ if span.isFree(idx) {
+ continue
+ }
+
+ // val points to an allocated object. Mark it.
+ obj := span.base() + idx*span.elemsize
+ greyobject(obj, b, i, span, gcw, idx)
+ }
+}
+
+// Shade the object if it isn't already.
+// The object is not nil and known to be in the heap.
+// Preemption must be disabled.
+//go:nowritebarrier
+func shade(b uintptr) {
+ if obj, span, objIndex := findObject(b, 0, 0); obj != 0 {
+ gcw := &getg().m.p.ptr().gcw
+ greyobject(obj, 0, 0, span, gcw, objIndex)
+ }
+}
+
+// obj is the start of an object with mark mbits.
+// If it isn't already marked, mark it and enqueue into gcw.
+// base and off are for debugging only and could be removed.
+//
+// See also wbBufFlush1, which partially duplicates this logic.
+//
+//go:nowritebarrierrec
+func greyobject(obj, base, off uintptr, span *mspan, gcw *gcWork, objIndex uintptr) {
+ // obj should be start of allocation, and so must be at least pointer-aligned.
+ if obj&(sys.PtrSize-1) != 0 {
+ throw("greyobject: obj not pointer-aligned")
+ }
+ mbits := span.markBitsForIndex(objIndex)
+
+ if useCheckmark {
+ if setCheckmark(obj, base, off, mbits) {
+ // Already marked.
+ return
+ }
+ } else {
+ if debug.gccheckmark > 0 && span.isFree(objIndex) {
+ print("runtime: marking free object ", hex(obj), " found at *(", hex(base), "+", hex(off), ")\n")
+ gcDumpObject("base", base, off)
+ gcDumpObject("obj", obj, ^uintptr(0))
+ getg().m.traceback = 2
+ throw("marking free object")
+ }
+
+ // If marked we have nothing to do.
+ if mbits.isMarked() {
+ return
+ }
+ mbits.setMarked()
+
+ // Mark span.
+ arena, pageIdx, pageMask := pageIndexOf(span.base())
+ if arena.pageMarks[pageIdx]&pageMask == 0 {
+ atomic.Or8(&arena.pageMarks[pageIdx], pageMask)
+ }
+
+ // If this is a noscan object, fast-track it to black
+ // instead of greying it.
+ if span.spanclass.noscan() {
+ gcw.bytesMarked += uint64(span.elemsize)
+ return
+ }
+ }
+
+ // Queue the obj for scanning. The PREFETCH(obj) logic has been removed but
+ // seems like a nice optimization that can be added back in.
+ // There needs to be time between the PREFETCH and the use.
+ // Previously we put the obj in an 8-element buffer that is drained at a rate
+ // to give the PREFETCH time to do its work.
+ // Use of PREFETCHNTA might be more appropriate than PREFETCH.
+ if !gcw.putFast(obj) {
+ gcw.put(obj)
+ }
+}
+
+// gcDumpObject dumps the contents of obj for debugging and marks the
+// field at byte offset off in obj.
+func gcDumpObject(label string, obj, off uintptr) {
+ s := spanOf(obj)
+ print(label, "=", hex(obj))
+ if s == nil {
+ print(" s=nil\n")
+ return
+ }
+ print(" s.base()=", hex(s.base()), " s.limit=", hex(s.limit), " s.spanclass=", s.spanclass, " s.elemsize=", s.elemsize, " s.state=")
+ if state := s.state.get(); 0 <= state && int(state) < len(mSpanStateNames) {
+ print(mSpanStateNames[state], "\n")
+ } else {
+ print("unknown(", state, ")\n")
+ }
+
+ skipped := false
+ size := s.elemsize
+ if s.state.get() == mSpanManual && size == 0 {
+ // We're printing something from a stack frame. We
+ // don't know how big it is, so just show up to and
+ // including off.
+ size = off + sys.PtrSize
+ }
+ for i := uintptr(0); i < size; i += sys.PtrSize {
+ // For big objects, just print the beginning (because
+ // that usually hints at the object's type) and the
+ // fields around off.
+ if !(i < 128*sys.PtrSize || off-16*sys.PtrSize < i && i < off+16*sys.PtrSize) {
+ skipped = true
+ continue
+ }
+ if skipped {
+ print(" ...\n")
+ skipped = false
+ }
+ print(" *(", label, "+", i, ") = ", hex(*(*uintptr)(unsafe.Pointer(obj + i))))
+ if i == off {
+ print(" <==")
+ }
+ print("\n")
+ }
+ if skipped {
+ print(" ...\n")
+ }
+}
+
+// gcmarknewobject marks a newly allocated object black. obj must
+// not contain any non-nil pointers.
+//
+// This is nosplit so it can manipulate a gcWork without preemption.
+//
+//go:nowritebarrier
+//go:nosplit
+func gcmarknewobject(span *mspan, obj, size, scanSize uintptr) {
+ if useCheckmark { // The world should be stopped so this should not happen.
+ throw("gcmarknewobject called while doing checkmark")
+ }
+
+ // Mark object.
+ objIndex := span.objIndex(obj)
+ span.markBitsForIndex(objIndex).setMarked()
+
+ // Mark span.
+ arena, pageIdx, pageMask := pageIndexOf(span.base())
+ if arena.pageMarks[pageIdx]&pageMask == 0 {
+ atomic.Or8(&arena.pageMarks[pageIdx], pageMask)
+ }
+
+ gcw := &getg().m.p.ptr().gcw
+ gcw.bytesMarked += uint64(size)
+ gcw.scanWork += int64(scanSize)
+}
+
+// gcMarkTinyAllocs greys all active tiny alloc blocks.
+//
+// The world must be stopped.
+func gcMarkTinyAllocs() {
+ assertWorldStopped()
+
+ for _, p := range allp {
+ c := p.mcache
+ if c == nil || c.tiny == 0 {
+ continue
+ }
+ _, span, objIndex := findObject(c.tiny, 0, 0)
+ gcw := &p.gcw
+ greyobject(c.tiny, 0, 0, span, gcw, objIndex)
+ }
+}
diff --git a/src/runtime/mgcscavenge.go b/src/runtime/mgcscavenge.go
new file mode 100644
index 0000000..a7c5bc4
--- /dev/null
+++ b/src/runtime/mgcscavenge.go
@@ -0,0 +1,953 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Scavenging free pages.
+//
+// This file implements scavenging (the release of physical pages backing mapped
+// memory) of free and unused pages in the heap as a way to deal with page-level
+// fragmentation and reduce the RSS of Go applications.
+//
+// Scavenging in Go happens on two fronts: there's the background
+// (asynchronous) scavenger and the heap-growth (synchronous) scavenger.
+//
+// The former happens on a goroutine much like the background sweeper which is
+// soft-capped at using scavengePercent of the mutator's time, based on
+// order-of-magnitude estimates of the costs of scavenging. The background
+// scavenger's primary goal is to bring the estimated heap RSS of the
+// application down to a goal.
+//
+// That goal is defined as:
+// (retainExtraPercent+100) / 100 * (next_gc / last_next_gc) * last_heap_inuse
+//
+// Essentially, we wish to have the application's RSS track the heap goal, but
+// the heap goal is defined in terms of bytes of objects, rather than pages like
+// RSS. As a result, we need to account for fragmentation internal to
+// spans. next_gc / last_next_gc defines the ratio between the current heap goal
+// and the last heap goal, which tells us by how much the heap is growing and
+// shrinking. We estimate what the heap will grow to in terms of pages by taking
+// this ratio and multiplying it by heap_inuse at the end of the last GC, which
+// allows us to account for this additional fragmentation. Note that this
+// procedure makes the assumption that the degree of fragmentation won't change
+// dramatically over the next GC cycle. Overestimating the amount of
+// fragmentation simply results in higher memory use, which will be accounted
+// for by the next pacing update. Underestimating the fragmentation, however,
+// could lead to performance degradation. Handling this case is not within the
+// scope of the scavenger. Situations where the amount of fragmentation balloons
+// over the course of a single GC cycle should be considered pathologies,
+// flagged as bugs, and fixed appropriately.
+//
+// An additional factor of retainExtraPercent is added as a buffer to help ensure
+// that there's more unscavenged memory to allocate out of, since each allocation
+// out of scavenged memory incurs a potentially expensive page fault.
+//
+// The goal is updated after each GC and the scavenger's pacing parameters
+// (which live in mheap_) are updated to match. The pacing parameters work much
+// like the background sweeping parameters. The parameters define a line whose
+// horizontal axis is time and vertical axis is estimated heap RSS, and the
+// scavenger attempts to stay below that line at all times.
+//
+// The synchronous heap-growth scavenging happens whenever the heap grows in
+// size, for some definition of heap-growth. The intuition behind this is that
+// the application had to grow the heap because existing fragments were
+// not sufficiently large to satisfy a page-level memory allocation, so we
+// scavenge those fragments eagerly to offset the growth in RSS that results.
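+//
+// As a purely illustrative example of the goal above: with retainExtraPercent
+// = 10, last_heap_inuse = 100 MiB, and a heap goal that grew by 20% since the
+// last cycle (next_gc / last_next_gc = 1.2), the background scavenger aims to
+// keep the estimated heap RSS at or below about 1.10 * 1.2 * 100 MiB = 132 MiB.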
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ // The background scavenger is paced according to these parameters.
+ //
+ // scavengePercent represents the portion of mutator time we're willing
+ // to spend on scavenging in percent.
+ scavengePercent = 1 // 1%
+
+ // retainExtraPercent represents the amount of memory over the heap goal
+ // that the scavenger should keep as a buffer space for the allocator.
+ //
+ // The purpose of maintaining this overhead is to have a greater pool of
+ // unscavenged memory available for allocation (since using scavenged memory
+ // incurs an additional cost), to account for heap fragmentation and
+ // the ever-changing layout of the heap.
+ retainExtraPercent = 10
+
+ // maxPagesPerPhysPage is the maximum number of supported runtime pages per
+ // physical page, based on maxPhysPageSize.
+ maxPagesPerPhysPage = maxPhysPageSize / pageSize
+
+ // scavengeCostRatio is the approximate ratio between the costs of using previously
+ // scavenged memory and scavenging memory.
+ //
+ // For most systems the cost of scavenging greatly outweighs the costs
+ // associated with using scavenged memory, making this constant 0. On other systems
+ // (especially ones where "sysUsed" is not just a no-op) this cost is non-trivial.
+ //
+ // This ratio is used as part of a multiplicative factor to help the scavenger account
+ // for the additional costs of using scavenged memory in its pacing.
+ scavengeCostRatio = 0.7 * (sys.GoosDarwin + sys.GoosIos)
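+ //
+ // sys.GoosDarwin and sys.GoosIos are 1 only on their respective platforms
+ // and 0 elsewhere, so this evaluates to 0.7 on darwin and ios and to 0
+ // everywhere else.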
+
+ // scavengeReservationShards determines the amount of memory the scavenger
+ // should reserve for scavenging at a time. Specifically, the amount of
+ // memory reserved is (heap size in bytes) / scavengeReservationShards.
+ scavengeReservationShards = 64
+)
+
+// heapRetained returns an estimate of the current heap RSS.
+func heapRetained() uint64 {
+ return memstats.heap_sys.load() - atomic.Load64(&memstats.heap_released)
+}
+
+// gcPaceScavenger updates the scavenger's pacing, particularly
+// its rate and RSS goal.
+//
+// The RSS goal is based on the current heap goal with a small overhead
+// to accommodate non-determinism in the allocator.
+//
+// The pacing is based on scavengePercent, which applies to both regular and
+// huge pages. See that constant for more information.
+//
+// mheap_.lock must be held or the world must be stopped.
+func gcPaceScavenger() {
+ // If we're called before the first GC completed, disable scavenging.
+ // We never scavenge before the 2nd GC cycle anyway (we don't have enough
+ // information about the heap yet) so this is fine, and avoids a fault
+ // or garbage data later.
+ if memstats.last_next_gc == 0 {
+ mheap_.scavengeGoal = ^uint64(0)
+ return
+ }
+ // Compute our scavenging goal.
+ goalRatio := float64(atomic.Load64(&memstats.next_gc)) / float64(memstats.last_next_gc)
+ retainedGoal := uint64(float64(memstats.last_heap_inuse) * goalRatio)
+ // Add retainExtraPercent overhead to retainedGoal. This calculation
+ // looks strange but the purpose is to arrive at an integer division
+ // (e.g. if retainExtraPercent = 12.5, then we get a divisor of 8)
+ // that also avoids the overflow from a multiplication.
+ retainedGoal += retainedGoal / (1.0 / (retainExtraPercent / 100.0))
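+ // For example, with retainExtraPercent = 10 the divisor works out to 10,
+ // so this adds retainedGoal/10, i.e. a 10% buffer on top of the goal.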
+ // Align it to a physical page boundary to make the following calculations
+ // a bit more exact.
+ retainedGoal = (retainedGoal + uint64(physPageSize) - 1) &^ (uint64(physPageSize) - 1)
+
+ // Represents where we are now in the heap's contribution to RSS in bytes.
+ //
+ // Guaranteed to always be a multiple of physPageSize on systems where
+ // physPageSize <= pageSize since we map heap_sys at a rate larger than
+ // any physPageSize and released memory in multiples of the physPageSize.
+ //
+ // However, certain functions recategorize heap_sys as other stats (e.g.
+ // stack_sys) and this happens in multiples of pageSize, so on systems
+ // where physPageSize > pageSize the calculations below will not be exact.
+ // Generally this is OK since we'll be off by at most one regular
+ // physical page.
+ retainedNow := heapRetained()
+
+ // If we're already below our goal, or within one page of our goal, then disable
+ // the background scavenger. We disable the background scavenger if there's
+ // less than one physical page of work to do because it's not worth it.
+ if retainedNow <= retainedGoal || retainedNow-retainedGoal < uint64(physPageSize) {
+ mheap_.scavengeGoal = ^uint64(0)
+ return
+ }
+ mheap_.scavengeGoal = retainedGoal
+}
+
+// Sleep/wait state of the background scavenger.
+var scavenge struct {
+ lock mutex
+ g *g
+ parked bool
+ timer *timer
+ sysmonWake uint32 // Set atomically.
+}
+
+// readyForScavenger signals sysmon to wake the scavenger because
+// there may be new work to do.
+//
+// There may be a significant delay between when this function runs
+// and when the scavenger is kicked awake, but it may be safely invoked
+// in contexts where wakeScavenger is unsafe to call directly.
+func readyForScavenger() {
+ atomic.Store(&scavenge.sysmonWake, 1)
+}
+
+// wakeScavenger immediately unparks the scavenger if necessary.
+//
+// May run without a P, but it may allocate, so it must not be called
+// on any allocation path.
+//
+// mheap_.lock, scavenge.lock, and sched.lock must not be held.
+func wakeScavenger() {
+ lock(&scavenge.lock)
+ if scavenge.parked {
+ // Notify sysmon that it shouldn't bother waking up the scavenger.
+ atomic.Store(&scavenge.sysmonWake, 0)
+
+ // Try to stop the timer but we don't really care if we succeed.
+ // It's possible that either a timer was never started, or that
+ // we're racing with it.
+ // In the case that we're racing with it, there's a low chance that
+ // we experience a spurious wake-up of the scavenger, but that's
+ // totally safe.
+ stopTimer(scavenge.timer)
+
+ // Unpark the goroutine and tell it that there may have been a pacing
+ // change. Note that we skip the scheduler's runnext slot because we
+ // want to avoid having the scavenger interfere with the fair
+ // scheduling of user goroutines. In effect, this schedules the
+ // scavenger at a "lower priority" but that's OK because it'll
+ // catch up on the work it missed when it does get scheduled.
+ scavenge.parked = false
+
+ // Ready the goroutine by injecting it. We use injectglist instead
+ // of ready or goready in order to allow us to run this function
+ // without a P. injectglist also avoids placing the goroutine in
+ // the current P's runnext slot, which is desirable to prevent
+ // the scavenger from interfering with user goroutine scheduling
+ // too much.
+ var list gList
+ list.push(scavenge.g)
+ injectglist(&list)
+ }
+ unlock(&scavenge.lock)
+}
+
+// scavengeSleep attempts to put the scavenger to sleep for ns.
+//
+// Note that this function should only be called by the scavenger.
+//
+// The scavenger may be woken up earlier by a pacing change, and it may not go
+// to sleep at all if there's a pending pacing change.
+//
+// Returns the amount of time actually slept.
+func scavengeSleep(ns int64) int64 {
+ lock(&scavenge.lock)
+
+ // Set the timer.
+ //
+ // This must happen here instead of inside gopark
+ // because we can't close over any variables without
+ // failing escape analysis.
+ start := nanotime()
+ resetTimer(scavenge.timer, start+ns)
+
+ // Mark ourself as asleep and go to sleep.
+ scavenge.parked = true
+ goparkunlock(&scavenge.lock, waitReasonSleep, traceEvGoSleep, 2)
+
+ // Return how long we actually slept for.
+ return nanotime() - start
+}
+
+// Background scavenger.
+//
+// The background scavenger maintains the RSS of the application below
+// the line described by the proportional scavenging statistics in
+// the mheap struct.
+func bgscavenge(c chan int) {
+ scavenge.g = getg()
+
+ lockInit(&scavenge.lock, lockRankScavenge)
+ lock(&scavenge.lock)
+ scavenge.parked = true
+
+ scavenge.timer = new(timer)
+ scavenge.timer.f = func(_ interface{}, _ uintptr) {
+ wakeScavenger()
+ }
+
+ c <- 1
+ goparkunlock(&scavenge.lock, waitReasonGCScavengeWait, traceEvGoBlock, 1)
+
+ // Exponentially-weighted moving average of the fraction of time this
+ // goroutine spends scavenging (that is, percent of a single CPU).
+ // It represents a measure of scheduling overheads which might extend
+ // the sleep or the critical time beyond what's expected. Assume no
+ // overhead to begin with.
+ //
+ // TODO(mknyszek): Consider making this based on total CPU time of the
+ // application (i.e. scavengePercent * GOMAXPROCS). This isn't really
+ // feasible now because the scavenger acquires the heap lock over the
+ // scavenging operation, which means scavenging effectively blocks
+ // allocators and isn't scalable. However, given a scalable allocator,
+ // it makes sense to also make the scavenger scale with it; if you're
+ // allocating more frequently, then presumably you're also generating
+ // more work for the scavenger.
+ const idealFraction = scavengePercent / 100.0
+ scavengeEWMA := float64(idealFraction)
+
+ for {
+ released := uintptr(0)
+
+ // Time in scavenging critical section.
+ crit := float64(0)
+
+ // Run on the system stack since we grab the heap lock,
+ // and a stack growth with the heap lock means a deadlock.
+ systemstack(func() {
+ lock(&mheap_.lock)
+
+ // If background scavenging is disabled or if there's no work to do just park.
+ retained, goal := heapRetained(), mheap_.scavengeGoal
+ if retained <= goal {
+ unlock(&mheap_.lock)
+ return
+ }
+
+ // Scavenge one page, and measure the amount of time spent scavenging.
+ start := nanotime()
+ released = mheap_.pages.scavenge(physPageSize, true)
+ mheap_.pages.scav.released += released
+ crit = float64(nanotime() - start)
+
+ unlock(&mheap_.lock)
+ })
+
+ if released == 0 {
+ lock(&scavenge.lock)
+ scavenge.parked = true
+ goparkunlock(&scavenge.lock, waitReasonGCScavengeWait, traceEvGoBlock, 1)
+ continue
+ }
+
+ if released < physPageSize {
+ // If this happens, it means that we may have attempted to release part
+ // of a physical page, but the likely effect of that is that it released
+ // the whole physical page, some of which may have still been in-use.
+ // This could lead to memory corruption. Throw.
+ throw("released less than one physical page of memory")
+ }
+
+ // On some platforms we may see crit as zero if the time it takes to scavenge
+ // memory is less than the minimum granularity of its clock (e.g. Windows).
+ // In this case, just assume scavenging takes 10 µs per regular physical page
+ // (determined empirically), and conservatively ignore the impact of huge pages
+ // on timing.
+ //
+ // We shouldn't ever see a crit value less than zero unless there's a bug of
+ // some kind, either on our side or in the platform we're running on, but be
+ // defensive in that case as well.
+ const approxCritNSPerPhysicalPage = 10e3
+ if crit <= 0 {
+ crit = approxCritNSPerPhysicalPage * float64(released/physPageSize)
+ }
+
+ // Multiply the critical time by 1 + the ratio of the costs of using
+ // scavenged memory vs. scavenging memory. This forces us to pay down
+ // the cost of reusing this memory eagerly by sleeping for a longer period
+ // of time and scavenging less frequently. More concretely, we avoid situations
+ // where we end up scavenging so often that we hurt allocation performance
+ // because of the additional overheads of using scavenged memory.
+ crit *= 1 + scavengeCostRatio
+
+ // If we spent more than 10 ms (for example, if the OS scheduled us away, or someone
+ // put their machine to sleep) in the critical section, bound the time we use to
+ // calculate at 10 ms to avoid letting the sleep time get arbitrarily high.
+ const maxCrit = 10e6
+ if crit > maxCrit {
+ crit = maxCrit
+ }
+
+ // Compute the amount of time to sleep, assuming we want to use at most
+ // scavengePercent of CPU time. Take into account scheduling overheads
+ // that may extend the length of our sleep by multiplying by how far
+ // off we are from the ideal ratio. For example, if we're sleeping too
+ // much, then scavengeEWMA < idealFraction, so we'll adjust the sleep time
+ // down.
+ adjust := scavengeEWMA / idealFraction
+ sleepTime := int64(adjust * crit / (scavengePercent / 100.0))
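+ // For example, if the critical section took 100µs and adjust is 1, then
+ // with scavengePercent = 1 we sleep for roughly 100µs / 0.01 = 10ms, so
+ // scavenging consumes about 1% of this goroutine's wall-clock time.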
+
+ // Go to sleep.
+ slept := scavengeSleep(sleepTime)
+
+ // Compute the new ratio.
+ fraction := crit / (crit + float64(slept))
+
+ // Set a lower bound on the fraction.
+ // Due to OS-related anomalies we may "sleep" for an inordinate amount
+ // of time. Let's avoid letting the ratio get out of hand by bounding
+ // the sleep time we use in our EWMA.
+ const minFraction = 1.0 / 1000
+ if fraction < minFraction {
+ fraction = minFraction
+ }
+
+ // Update scavengeEWMA by merging in the new crit/slept ratio.
+ const alpha = 0.5
+ scavengeEWMA = alpha*fraction + (1-alpha)*scavengeEWMA
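+ // With alpha = 0.5 this is simply the average of the previous estimate
+ // and the latest crit/(crit+slept) observation.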
+ }
+}
+
+// scavenge scavenges nbytes worth of free pages, starting with the
+// highest address first. Successive calls continue from where it left
+// off until the heap is exhausted. Call scavengeStartGen to bring it
+// back to the top of the heap.
+//
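+// As a usage sketch (based on the callers in this file): the background
+// scavenger above calls p.scavenge(physPageSize, true) to release roughly one
+// physical page per iteration, while a caller that wants to release as much
+// as possible may pass nbytes = ^uintptr(0) (see the overflow note in
+// scavengeOne).
+//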
+// Returns the amount of memory scavenged in bytes.
+//
+// p.mheapLock must be held, but may be temporarily released if
+// mayUnlock == true.
+//
+// Must run on the system stack because p.mheapLock must be held.
+//
+//go:systemstack
+func (p *pageAlloc) scavenge(nbytes uintptr, mayUnlock bool) uintptr {
+ assertLockHeld(p.mheapLock)
+
+ var (
+ addrs addrRange
+ gen uint32
+ )
+ released := uintptr(0)
+ for released < nbytes {
+ if addrs.size() == 0 {
+ if addrs, gen = p.scavengeReserve(); addrs.size() == 0 {
+ break
+ }
+ }
+ r, a := p.scavengeOne(addrs, nbytes-released, mayUnlock)
+ released += r
+ addrs = a
+ }
+ // Only unreserve the space which hasn't been scavenged or searched
+ // to ensure we always make progress.
+ p.scavengeUnreserve(addrs, gen)
+ return released
+}
+
+// printScavTrace prints a scavenge trace line to standard error.
+//
+// released should be the amount of memory released since the last time this
+// was called, and forced indicates whether the scavenge was forced by the
+// application.
+func printScavTrace(gen uint32, released uintptr, forced bool) {
+ printlock()
+ print("scav ", gen, " ",
+ released>>10, " KiB work, ",
+ atomic.Load64(&memstats.heap_released)>>10, " KiB total, ",
+ (atomic.Load64(&memstats.heap_inuse)*100)/heapRetained(), "% util",
+ )
+ if forced {
+ print(" (forced)")
+ }
+ println()
+ printunlock()
+}
+
+// scavengeStartGen starts a new scavenge generation, resetting
+// the scavenger's search space to the full in-use address space.
+//
+// p.mheapLock must be held.
+//
+// Must run on the system stack because p.mheapLock must be held.
+//
+//go:systemstack
+func (p *pageAlloc) scavengeStartGen() {
+ assertLockHeld(p.mheapLock)
+
+ if debug.scavtrace > 0 {
+ printScavTrace(p.scav.gen, p.scav.released, false)
+ }
+ p.inUse.cloneInto(&p.scav.inUse)
+
+ // Pick the new starting address for the scavenger cycle.
+ var startAddr offAddr
+ if p.scav.scavLWM.lessThan(p.scav.freeHWM) {
+ // The "free" high watermark exceeds the "scavenged" low watermark,
+ // so there are free scavengable pages in parts of the address space
+ // that the scavenger already searched, the high watermark being the
+ // highest one. Pick that as our new starting point to ensure we
+ // see those pages.
+ startAddr = p.scav.freeHWM
+ } else {
+ // The "free" high watermark does not exceed the "scavenged" low
+ // watermark. This means the allocator didn't free any memory in
+ // the range we scavenged last cycle, so we might as well continue
+ // scavenging from where we were.
+ startAddr = p.scav.scavLWM
+ }
+ p.scav.inUse.removeGreaterEqual(startAddr.addr())
+
+ // reservationBytes may be zero if p.inUse.totalBytes is small, or if
+ // scavengeReservationShards is large. This case is fine as the scavenger
+ // will simply be turned off, but it does mean that scavengeReservationShards,
+ // in concert with pallocChunkBytes, dictates the minimum heap size at which
+ // the scavenger triggers. In practice this minimum is generally less than an
+ // arena in size, so virtually every heap has the scavenger on.
+ p.scav.reservationBytes = alignUp(p.inUse.totalBytes, pallocChunkBytes) / scavengeReservationShards
+ p.scav.gen++
+ p.scav.released = 0
+ p.scav.freeHWM = minOffAddr
+ p.scav.scavLWM = maxOffAddr
+}
+
+// scavengeReserve reserves a contiguous range of the address space
+// for scavenging. The maximum amount of space it reserves is proportional
+// to the size of the heap. The ranges are reserved from the high addresses
+// first.
+//
+// Returns the reserved range and the scavenge generation number for it.
+//
+// p.mheapLock must be held.
+//
+// Must run on the system stack because p.mheapLock must be held.
+//
+//go:systemstack
+func (p *pageAlloc) scavengeReserve() (addrRange, uint32) {
+ assertLockHeld(p.mheapLock)
+
+ // Start by reserving the minimum.
+ r := p.scav.inUse.removeLast(p.scav.reservationBytes)
+
+ // Return early if the size is zero; we don't want to use
+ // the bogus address below.
+ if r.size() == 0 {
+ return r, p.scav.gen
+ }
+
+ // The scavenger requires that base be aligned to a
+ // palloc chunk because that's the unit of operation for
+ // the scavenger, so align down, potentially extending
+ // the range.
+ newBase := alignDown(r.base.addr(), pallocChunkBytes)
+
+ // Remove from inUse however much extra we just pulled out.
+ p.scav.inUse.removeGreaterEqual(newBase)
+ r.base = offAddr{newBase}
+ return r, p.scav.gen
+}
+
+// scavengeUnreserve returns an unscavenged portion of a range that was
+// previously reserved with scavengeReserve.
+//
+// p.mheapLock must be held.
+//
+// Must run on the system stack because p.mheapLock must be held.
+//
+//go:systemstack
+func (p *pageAlloc) scavengeUnreserve(r addrRange, gen uint32) {
+ assertLockHeld(p.mheapLock)
+
+ if r.size() == 0 || gen != p.scav.gen {
+ return
+ }
+ if r.base.addr()%pallocChunkBytes != 0 {
+ throw("unreserving unaligned region")
+ }
+ p.scav.inUse.add(r)
+}
+
+// scavengeOne walks over address range work until it finds
+// a contiguous run of pages to scavenge. It will try to scavenge
+// at most max bytes at once, but may scavenge more to avoid
+// breaking huge pages. Once it scavenges some memory it returns
+// how much it scavenged in bytes.
+//
+// Returns the number of bytes scavenged and the part of work
+// which was not yet searched.
+//
+// work's base address must be aligned to pallocChunkBytes.
+//
+// p.mheapLock must be held, but may be temporarily released if
+// mayUnlock == true.
+//
+// Must run on the system stack because p.mheapLock must be held.
+//
+//go:systemstack
+func (p *pageAlloc) scavengeOne(work addrRange, max uintptr, mayUnlock bool) (uintptr, addrRange) {
+ assertLockHeld(p.mheapLock)
+
+ // Defensively check if we've received an empty address range.
+ // If so, just return.
+ if work.size() == 0 {
+ // Nothing to do.
+ return 0, work
+ }
+ // Check the prerequisites of work.
+ if work.base.addr()%pallocChunkBytes != 0 {
+ throw("scavengeOne called with unaligned work region")
+ }
+ // Calculate the maximum number of pages to scavenge.
+ //
+ // This should be alignUp(max, pageSize) / pageSize but max can and will
+ // be ^uintptr(0), so we need to be very careful not to overflow here.
+ // Rather than use alignUp, calculate the number of pages rounded down
+ // first, then add back one if necessary.
+ maxPages := max / pageSize
+ if max%pageSize != 0 {
+ maxPages++
+ }
+
+ // Calculate the minimum number of pages we can scavenge.
+ //
+ // Because we can only scavenge whole physical pages, we must
+ // ensure that we scavenge at least minPages each time, aligned
+ // to minPages*pageSize.
+ minPages := physPageSize / pageSize
+ if minPages < 1 {
+ minPages = 1
+ }
+
+ // Helpers for locking and unlocking only if mayUnlock == true.
+ lockHeap := func() {
+ if mayUnlock {
+ lock(p.mheapLock)
+ }
+ }
+ unlockHeap := func() {
+ if mayUnlock {
+ unlock(p.mheapLock)
+ }
+ }
+
+ // Fast path: check the chunk containing the top-most address in work,
+ // starting at that address's page index in the chunk.
+ //
+ // Note that work.end() is exclusive, so get the chunk we care about
+ // by subtracting 1.
+ maxAddr := work.limit.addr() - 1
+ maxChunk := chunkIndex(maxAddr)
+ if p.summary[len(p.summary)-1][maxChunk].max() >= uint(minPages) {
+ // We only bother looking for a candidate if there are at least
+ // minPages free pages at all.
+ base, npages := p.chunkOf(maxChunk).findScavengeCandidate(chunkPageIndex(maxAddr), minPages, maxPages)
+
+ // If we found something, scavenge it and return!
+ if npages != 0 {
+ work.limit = offAddr{p.scavengeRangeLocked(maxChunk, base, npages)}
+
+ assertLockHeld(p.mheapLock) // Must be locked on return.
+ return uintptr(npages) * pageSize, work
+ }
+ }
+ // Update the limit to reflect the fact that we checked maxChunk already.
+ work.limit = offAddr{chunkBase(maxChunk)}
+
+ // findCandidate finds the next scavenge candidate in work optimistically.
+ //
+ // Returns the candidate chunk index and true on success, and false on failure.
+ //
+ // The heap need not be locked.
+ findCandidate := func(work addrRange) (chunkIdx, bool) {
+ // Iterate over this work's chunks.
+ for i := chunkIndex(work.limit.addr() - 1); i >= chunkIndex(work.base.addr()); i-- {
+ // If this chunk is totally in-use or has no unscavenged pages, don't bother
+ // doing a more sophisticated check.
+ //
+ // Note we're accessing the summary and the chunks without a lock, but
+ // that's fine. We're being optimistic anyway.
+
+ // Check quickly if there are enough free pages at all.
+ if p.summary[len(p.summary)-1][i].max() < uint(minPages) {
+ continue
+ }
+
+ // Run over the chunk looking harder for a candidate. Again, we could
+ // race with a lot of different pieces of code, but we're just being
+ // optimistic. Make sure we load the l2 pointer atomically though, to
+ // avoid races with heap growth. It may or may not be possible to also
+ // see a nil pointer in this case if we do race with heap growth, but
+ // just defensively ignore the nils. This operation is optimistic anyway.
+ l2 := (*[1 << pallocChunksL2Bits]pallocData)(atomic.Loadp(unsafe.Pointer(&p.chunks[i.l1()])))
+ if l2 != nil && l2[i.l2()].hasScavengeCandidate(minPages) {
+ return i, true
+ }
+ }
+ return 0, false
+ }
+
+ // Slow path: iterate optimistically over the in-use address space
+ // looking for any free and unscavenged page. If we think we see something,
+ // lock and verify it!
+ for work.size() != 0 {
+ unlockHeap()
+
+ // Search for the candidate.
+ candidateChunkIdx, ok := findCandidate(work)
+
+ // Lock the heap. We need to do this now whether we found a candidate or not.
+ // If we did, we'll verify it. If not, we need to lock before returning
+ // anyway.
+ lockHeap()
+
+ if !ok {
+ // We didn't find a candidate, so we're done.
+ work.limit = work.base
+ break
+ }
+
+ // Find, verify, and scavenge if we can.
+ chunk := p.chunkOf(candidateChunkIdx)
+ base, npages := chunk.findScavengeCandidate(pallocChunkPages-1, minPages, maxPages)
+ if npages > 0 {
+ work.limit = offAddr{p.scavengeRangeLocked(candidateChunkIdx, base, npages)}
+
+ assertLockHeld(p.mheapLock) // Must be locked on return.
+ return uintptr(npages) * pageSize, work
+ }
+
+ // We were fooled, so let's continue from where we left off.
+ work.limit = offAddr{chunkBase(candidateChunkIdx)}
+ }
+
+ assertLockHeld(p.mheapLock) // Must be locked on return.
+ return 0, work
+}
+
+// scavengeRangeLocked scavenges the given region of memory.
+// The region of memory is described by its chunk index (ci),
+// the starting page index of the region relative to that
+// chunk (base), and the length of the region in pages (npages).
+//
+// Returns the base address of the scavenged region.
+//
+// p.mheapLock must be held.
+func (p *pageAlloc) scavengeRangeLocked(ci chunkIdx, base, npages uint) uintptr {
+ assertLockHeld(p.mheapLock)
+
+ p.chunkOf(ci).scavenged.setRange(base, npages)
+
+ // Compute the full address for the start of the range.
+ addr := chunkBase(ci) + uintptr(base)*pageSize
+
+ // Update the scavenge low watermark.
+ if oAddr := (offAddr{addr}); oAddr.lessThan(p.scav.scavLWM) {
+ p.scav.scavLWM = oAddr
+ }
+
+ // Only perform the actual scavenging if we're not in a test.
+ // It's dangerous to do so otherwise.
+ if p.test {
+ return addr
+ }
+ sysUnused(unsafe.Pointer(addr), uintptr(npages)*pageSize)
+
+ // Update global accounting only when not in test, otherwise
+ // the runtime's accounting will be wrong.
+ nbytes := int64(npages) * pageSize
+ atomic.Xadd64(&memstats.heap_released, nbytes)
+
+ // Update consistent accounting too.
+ stats := memstats.heapStats.acquire()
+ atomic.Xaddint64(&stats.committed, -nbytes)
+ atomic.Xaddint64(&stats.released, nbytes)
+ memstats.heapStats.release()
+
+ return addr
+}
+
+// fillAligned returns x but with all zeroes in m-aligned
+// groups of m bits set to 1 if any bit in the group is non-zero.
+//
+// For example, fillAligned(0x0100a3, 8) == 0xff00ff.
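+// Likewise, fillAligned(0x0000000000000100, 8) == 0x000000000000ff00: the one
+// group containing a set bit becomes all ones, and all-zero groups stay zero.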
+//
+// Note that if m == 1, this is a no-op.
+//
+// m must be a power of 2 <= maxPagesPerPhysPage.
+func fillAligned(x uint64, m uint) uint64 {
+ apply := func(x uint64, c uint64) uint64 {
+ // The technique used here is derived from
+ // https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
+ // and extended for more than just bytes (like nibbles
+ // and uint16s) by using an appropriate constant.
+ //
+ // To summarize the technique, quoting from that page:
+ // "[It] works by first zeroing the high bits of the [8]
+ // bytes in the word. Subsequently, it adds a number that
+ // will result in an overflow to the high bit of a byte if
+ // any of the low bits were initially set. Next the high
+ // bits of the original word are ORed with these values;
+ // thus, the high bit of a byte is set iff any bit in the
+ // byte was set. Finally, we determine if any of these high
+ // bits are zero by ORing with ones everywhere except the
+ // high bits and inverting the result."
+ return ^((((x & c) + c) | x) | c)
+ }
+ // Transform x to contain a 1 bit at the top of each m-aligned
+ // group of m zero bits.
+ switch m {
+ case 1:
+ return x
+ case 2:
+ x = apply(x, 0x5555555555555555)
+ case 4:
+ x = apply(x, 0x7777777777777777)
+ case 8:
+ x = apply(x, 0x7f7f7f7f7f7f7f7f)
+ case 16:
+ x = apply(x, 0x7fff7fff7fff7fff)
+ case 32:
+ x = apply(x, 0x7fffffff7fffffff)
+ case 64: // == maxPagesPerPhysPage
+ x = apply(x, 0x7fffffffffffffff)
+ default:
+ throw("bad m value")
+ }
+ // Now, the top bit of each m-aligned group in x is set
+ // iff that group was all zero in the original x.
+
+ // From each group of m bits subtract 1.
+ // Because we know only the top bits of each
+ // m-aligned group are set, we know this will
+ // set each group to have all the bits set except
+ // the top bit, so just OR with the original
+ // result to set all the bits.
+ return ^((x - (x >> (m - 1))) | x)
+}
+
+// hasScavengeCandidate returns true if there are any min-page-aligned groups of
+// min pages of free-and-unscavenged memory in the region represented by this
+// pallocData.
+//
+// min must be a non-zero power of 2 <= maxPagesPerPhysPage.
+func (m *pallocData) hasScavengeCandidate(min uintptr) bool {
+ if min&(min-1) != 0 || min == 0 {
+ print("runtime: min = ", min, "\n")
+ throw("min must be a non-zero power of 2")
+ } else if min > maxPagesPerPhysPage {
+ print("runtime: min = ", min, "\n")
+ throw("min too large")
+ }
+
+ // The goal of this search is to see if the chunk contains any free and unscavenged memory.
+ for i := len(m.scavenged) - 1; i >= 0; i-- {
+ // 1s are scavenged OR non-free => 0s are unscavenged AND free
+ //
+ // TODO(mknyszek): Consider splitting up fillAligned into two
+ // functions, since here we technically could get by with just
+ // the first half of its computation. It'll save a few instructions
+ // but adds some additional code complexity.
+ x := fillAligned(m.scavenged[i]|m.pallocBits[i], uint(min))
+
+ // Quickly skip over chunks of non-free or scavenged pages.
+ if x != ^uint64(0) {
+ return true
+ }
+ }
+ return false
+}
+
+// findScavengeCandidate returns a start index and a size for this pallocData
+// segment which represents a contiguous region of free and unscavenged memory.
+//
+// searchIdx indicates the page index within this chunk to start the search, but
+// note that findScavengeCandidate searches backwards through the pallocData. As a
+// result, it will return the highest scavenge candidate in address order.
+//
+// min indicates a hard minimum size and alignment for runs of pages. That is,
+// findScavengeCandidate will not return a region smaller than min pages in size,
+// or that is min pages or greater in size but not aligned to min. min must be
+// a non-zero power of 2 <= maxPagesPerPhysPage.
+//
+// max is a hint for how big of a region is desired. If max >= pallocChunkPages, then
+// findScavengeCandidate effectively returns entire free and unscavenged regions.
+// If max < pallocChunkPages, it may truncate the returned region such that size is
+// max. However, findScavengeCandidate may still return a larger region if, for
+// example, it chooses to preserve huge pages, or if max is not aligned to min (it
+// will round up). That is, even if max is small, the returned size is not guaranteed
+// to be equal to max. max is allowed to be less than min, in which case it is as if
+// max == min.
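+//
+// As an illustrative example (ignoring the huge page handling at the end): if
+// only pages 10 and 11 of the chunk are both free and unscavenged, then with
+// min == 1 and max >= 2 the returned candidate is (start 10, size 2).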
+func (m *pallocData) findScavengeCandidate(searchIdx uint, min, max uintptr) (uint, uint) {
+ if min&(min-1) != 0 || min == 0 {
+ print("runtime: min = ", min, "\n")
+ throw("min must be a non-zero power of 2")
+ } else if min > maxPagesPerPhysPage {
+ print("runtime: min = ", min, "\n")
+ throw("min too large")
+ }
+ // max may not be min-aligned, so we might accidentally truncate to
+ // a max value which causes us to return a non-min-aligned value.
+ // To prevent this, align max up to a multiple of min (which is always
+ // a power of 2). This also prevents max from ever being less than
+ // min, unless it's zero, so handle that explicitly.
+ if max == 0 {
+ max = min
+ } else {
+ max = alignUp(max, min)
+ }
+
+ i := int(searchIdx / 64)
+ // Start by quickly skipping over blocks of non-free or scavenged pages.
+ for ; i >= 0; i-- {
+ // 1s are scavenged OR non-free => 0s are unscavenged AND free
+ x := fillAligned(m.scavenged[i]|m.pallocBits[i], uint(min))
+ if x != ^uint64(0) {
+ break
+ }
+ }
+ if i < 0 {
+ // Failed to find any free/unscavenged pages.
+ return 0, 0
+ }
+ // We have something in the 64-bit chunk at i, but it could
+ // extend further. Loop until we find the extent of it.
+
+ // 1s are scavenged OR non-free => 0s are unscavenged AND free
+ x := fillAligned(m.scavenged[i]|m.pallocBits[i], uint(min))
+ z1 := uint(sys.LeadingZeros64(^x))
+ run, end := uint(0), uint(i)*64+(64-z1)
+ if x<<z1 != 0 {
+ // After shifting out z1 bits, we still have 1s,
+ // so the run ends inside this word.
+ run = uint(sys.LeadingZeros64(x << z1))
+ } else {
+ // After shifting out z1 bits, we have no more 1s.
+ // This means the run extends to the bottom of the
+ // word so it may extend into further words.
+ run = 64 - z1
+ for j := i - 1; j >= 0; j-- {
+ x := fillAligned(m.scavenged[j]|m.pallocBits[j], uint(min))
+ run += uint(sys.LeadingZeros64(x))
+ if x != 0 {
+ // The run stopped in this word.
+ break
+ }
+ }
+ }
+
+ // Split the run we found if it's larger than max but hold on to
+ // our original length, since we may need it later.
+ size := run
+ if size > uint(max) {
+ size = uint(max)
+ }
+ start := end - size
+
+ // Each huge page is guaranteed to fit in a single palloc chunk.
+ //
+ // TODO(mknyszek): Support larger huge page sizes.
+ // TODO(mknyszek): Consider taking pages-per-huge-page as a parameter
+ // so we can write tests for this.
+ if physHugePageSize > pageSize && physHugePageSize > physPageSize {
+ // We have huge pages, so let's ensure we don't break one by scavenging
+ // over a huge page boundary. If the range [start, start+size) overlaps with
+ // a free-and-unscavenged huge page, we want to grow the region we scavenge
+ // to include that huge page.
+
+ // Compute the huge page boundary above our candidate.
+ pagesPerHugePage := uintptr(physHugePageSize / pageSize)
+ hugePageAbove := uint(alignUp(uintptr(start), pagesPerHugePage))
+
+ // If that boundary is within our current candidate, then we may be breaking
+ // a huge page.
+ if hugePageAbove <= end {
+ // Compute the huge page boundary below our candidate.
+ hugePageBelow := uint(alignDown(uintptr(start), pagesPerHugePage))
+
+ if hugePageBelow >= end-run {
+ // We're in danger of breaking apart a huge page since start+size crosses
+ // a huge page boundary and rounding down start to the nearest huge
+ // page boundary is included in the full run we found. Include the entire
+ // huge page in the bound by rounding down to the huge page size.
+ size = size + (start - hugePageBelow)
+ start = hugePageBelow
+ }
+ }
+ }
+ return start, size
+}
diff --git a/src/runtime/mgcscavenge_test.go b/src/runtime/mgcscavenge_test.go
new file mode 100644
index 0000000..2503430
--- /dev/null
+++ b/src/runtime/mgcscavenge_test.go
@@ -0,0 +1,460 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "math/rand"
+ . "runtime"
+ "testing"
+)
+
+// makePallocData produces an initialized PallocData by setting
+// the ranges described in alloc and scavenged.
+func makePallocData(alloc, scavenged []BitRange) *PallocData {
+ b := new(PallocData)
+ for _, v := range alloc {
+ if v.N == 0 {
+ // Skip N==0. It's harmless and allocRange doesn't
+ // handle this case.
+ continue
+ }
+ b.AllocRange(v.I, v.N)
+ }
+ for _, v := range scavenged {
+ if v.N == 0 {
+ // See the previous loop.
+ continue
+ }
+ b.ScavengedSetRange(v.I, v.N)
+ }
+ return b
+}
+
+func TestFillAligned(t *testing.T) {
+ fillAlignedSlow := func(x uint64, m uint) uint64 {
+ if m == 1 {
+ return x
+ }
+ out := uint64(0)
+ for i := uint(0); i < 64; i += m {
+ for j := uint(0); j < m; j++ {
+ if x&(uint64(1)<<(i+j)) != 0 {
+ out |= ((uint64(1) << m) - 1) << i
+ break
+ }
+ }
+ }
+ return out
+ }
+ check := func(x uint64, m uint) {
+ want := fillAlignedSlow(x, m)
+ if got := FillAligned(x, m); got != want {
+ t.Logf("got: %064b", got)
+ t.Logf("want: %064b", want)
+ t.Errorf("bad fillAligned(%016x, %d)", x, m)
+ }
+ }
+ for m := uint(1); m <= 64; m *= 2 {
+ tests := []uint64{
+ 0x0000000000000000,
+ 0x00000000ffffffff,
+ 0xffffffff00000000,
+ 0x8000000000000001,
+ 0xf00000000000000f,
+ 0xf00000010050000f,
+ 0xffffffffffffffff,
+ 0x0000000000000001,
+ 0x0000000000000002,
+ 0x0000000000000008,
+ uint64(1) << (m - 1),
+ uint64(1) << m,
+ // Try a few fixed arbitrary examples.
+ 0xb02b9effcf137016,
+ 0x3975a076a9fbff18,
+ 0x0f8c88ec3b81506e,
+ 0x60f14d80ef2fa0e6,
+ }
+ for _, test := range tests {
+ check(test, m)
+ }
+ for i := 0; i < 1000; i++ {
+ // Try pseudo-random numbers.
+ check(rand.Uint64(), m)
+
+ if m > 1 {
+ // For m != 1, let's construct a slightly more interesting
+ // random test. Generate a bitmap in which each m-aligned
+ // group of m bits is either all zero or randomly set.
+ val := uint64(0)
+ for n := uint(0); n < 64; n += m {
+ // For each group of m bits, flip a coin:
+ // * Leave them as zero.
+ // * Set them randomly.
+ if rand.Uint64()%2 == 0 {
+ val |= (rand.Uint64() & ((1 << m) - 1)) << n
+ }
+ }
+ check(val, m)
+ }
+ }
+ }
+}
+
+func TestPallocDataFindScavengeCandidate(t *testing.T) {
+ type test struct {
+ alloc, scavenged []BitRange
+ min, max uintptr
+ want BitRange
+ }
+ tests := map[string]test{
+ "MixedMin1": {
+ alloc: []BitRange{{0, 40}, {42, PallocChunkPages - 42}},
+ scavenged: []BitRange{{0, 41}, {42, PallocChunkPages - 42}},
+ min: 1,
+ max: PallocChunkPages,
+ want: BitRange{41, 1},
+ },
+ "MultiMin1": {
+ alloc: []BitRange{{0, 63}, {65, 20}, {87, PallocChunkPages - 87}},
+ scavenged: []BitRange{{86, 1}},
+ min: 1,
+ max: PallocChunkPages,
+ want: BitRange{85, 1},
+ },
+ }
+ // Try out different page minimums.
+ for m := uintptr(1); m <= 64; m *= 2 {
+ suffix := fmt.Sprintf("Min%d", m)
+ tests["AllFree"+suffix] = test{
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{0, PallocChunkPages},
+ }
+ tests["AllScavenged"+suffix] = test{
+ scavenged: []BitRange{{0, PallocChunkPages}},
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{0, 0},
+ }
+ tests["NoneFree"+suffix] = test{
+ alloc: []BitRange{{0, PallocChunkPages}},
+ scavenged: []BitRange{{PallocChunkPages / 2, PallocChunkPages / 2}},
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{0, 0},
+ }
+ tests["StartFree"+suffix] = test{
+ alloc: []BitRange{{uint(m), PallocChunkPages - uint(m)}},
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{0, uint(m)},
+ }
+ tests["StartFree"+suffix] = test{
+ alloc: []BitRange{{uint(m), PallocChunkPages - uint(m)}},
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{0, uint(m)},
+ }
+ tests["EndFree"+suffix] = test{
+ alloc: []BitRange{{0, PallocChunkPages - uint(m)}},
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{PallocChunkPages - uint(m), uint(m)},
+ }
+ tests["Straddle64"+suffix] = test{
+ alloc: []BitRange{{0, 64 - uint(m)}, {64 + uint(m), PallocChunkPages - (64 + uint(m))}},
+ min: m,
+ max: 2 * m,
+ want: BitRange{64 - uint(m), 2 * uint(m)},
+ }
+ tests["BottomEdge64WithFull"+suffix] = test{
+ alloc: []BitRange{{64, 64}, {128 + 3*uint(m), PallocChunkPages - (128 + 3*uint(m))}},
+ scavenged: []BitRange{{1, 10}},
+ min: m,
+ max: 3 * m,
+ want: BitRange{128, 3 * uint(m)},
+ }
+ tests["BottomEdge64WithPocket"+suffix] = test{
+ alloc: []BitRange{{64, 62}, {127, 1}, {128 + 3*uint(m), PallocChunkPages - (128 + 3*uint(m))}},
+ scavenged: []BitRange{{1, 10}},
+ min: m,
+ max: 3 * m,
+ want: BitRange{128, 3 * uint(m)},
+ }
+ tests["Max0"+suffix] = test{
+ scavenged: []BitRange{{0, PallocChunkPages - uint(m)}},
+ min: m,
+ max: 0,
+ want: BitRange{PallocChunkPages - uint(m), uint(m)},
+ }
+ if m <= 8 {
+ tests["OneFree"] = test{
+ alloc: []BitRange{{0, 40}, {40 + uint(m), PallocChunkPages - (40 + uint(m))}},
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{40, uint(m)},
+ }
+ tests["OneScavenged"] = test{
+ alloc: []BitRange{{0, 40}, {40 + uint(m), PallocChunkPages - (40 + uint(m))}},
+ scavenged: []BitRange{{40, 1}},
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{0, 0},
+ }
+ }
+ if m > 1 {
+ tests["MaxUnaligned"+suffix] = test{
+ scavenged: []BitRange{{0, PallocChunkPages - uint(m*2-1)}},
+ min: m,
+ max: m - 2,
+ want: BitRange{PallocChunkPages - uint(m), uint(m)},
+ }
+ tests["SkipSmall"+suffix] = test{
+ alloc: []BitRange{{0, 64 - uint(m)}, {64, 5}, {70, 11}, {82, PallocChunkPages - 82}},
+ min: m,
+ max: m,
+ want: BitRange{64 - uint(m), uint(m)},
+ }
+ tests["SkipMisaligned"+suffix] = test{
+ alloc: []BitRange{{0, 64 - uint(m)}, {64, 63}, {127 + uint(m), PallocChunkPages - (127 + uint(m))}},
+ min: m,
+ max: m,
+ want: BitRange{64 - uint(m), uint(m)},
+ }
+ tests["MaxLessThan"+suffix] = test{
+ scavenged: []BitRange{{0, PallocChunkPages - uint(m)}},
+ min: m,
+ max: 1,
+ want: BitRange{PallocChunkPages - uint(m), uint(m)},
+ }
+ }
+ }
+ if PhysHugePageSize > uintptr(PageSize) {
+ // Check hugepage preserving behavior.
+ bits := uint(PhysHugePageSize / uintptr(PageSize))
+ if bits < PallocChunkPages {
+ tests["PreserveHugePageBottom"] = test{
+ alloc: []BitRange{{bits + 2, PallocChunkPages - (bits + 2)}},
+ min: 1,
+ max: 3, // Make it so that max would have us try to break the huge page.
+ want: BitRange{0, bits + 2},
+ }
+ if 3*bits < PallocChunkPages {
+ // We need at least 3 huge pages in a chunk for this test to make sense.
+ tests["PreserveHugePageMiddle"] = test{
+ alloc: []BitRange{{0, bits - 10}, {2*bits + 10, PallocChunkPages - (2*bits + 10)}},
+ min: 1,
+ max: 12, // Make it so that max would have us try to break the huge page.
+ want: BitRange{bits, bits + 10},
+ }
+ }
+ tests["PreserveHugePageTop"] = test{
+ alloc: []BitRange{{0, PallocChunkPages - bits}},
+ min: 1,
+ max: 1, // Even one page would break a huge page in this case.
+ want: BitRange{PallocChunkPages - bits, bits},
+ }
+ } else if bits == PallocChunkPages {
+ tests["PreserveHugePageAll"] = test{
+ min: 1,
+ max: 1, // Even one page would break a huge page in this case.
+ want: BitRange{0, PallocChunkPages},
+ }
+ } else {
+ // The huge page size is greater than pallocChunkPages, so it should
+ // be effectively disabled. There's no way we could possibly scavenge
+ // a huge page out of this bitmap chunk.
+ tests["PreserveHugePageNone"] = test{
+ min: 1,
+ max: 1,
+ want: BitRange{PallocChunkPages - 1, 1},
+ }
+ }
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := makePallocData(v.alloc, v.scavenged)
+ start, size := b.FindScavengeCandidate(PallocChunkPages-1, v.min, v.max)
+ got := BitRange{start, size}
+ if !(got.N == 0 && v.want.N == 0) && got != v.want {
+ t.Fatalf("candidate mismatch: got %v, want %v", got, v.want)
+ }
+ })
+ }
+}
+
+// Tests end-to-end scavenging on a pageAlloc.
+func TestPageAllocScavenge(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ type test struct {
+ request, expect uintptr
+ }
+ minPages := PhysPageSize / PageSize
+ if minPages < 1 {
+ minPages = 1
+ }
+ type setup struct {
+ beforeAlloc map[ChunkIdx][]BitRange
+ beforeScav map[ChunkIdx][]BitRange
+ expect []test
+ afterScav map[ChunkIdx][]BitRange
+ }
+ tests := map[string]setup{
+ "AllFreeUnscavExhaust": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {},
+ },
+ expect: []test{
+ {^uintptr(0), 3 * PallocChunkPages * PageSize},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ },
+ "NoneFreeUnscavExhaust": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {},
+ },
+ expect: []test{
+ {^uintptr(0), 0},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {},
+ },
+ },
+ "ScavHighestPageFirst": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{uint(minPages), PallocChunkPages - uint(2*minPages)}},
+ },
+ expect: []test{
+ {1, minPages * PageSize},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{uint(minPages), PallocChunkPages - uint(minPages)}},
+ },
+ },
+ "ScavMultiple": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{uint(minPages), PallocChunkPages - uint(2*minPages)}},
+ },
+ expect: []test{
+ {minPages * PageSize, minPages * PageSize},
+ {minPages * PageSize, minPages * PageSize},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ },
+ "ScavMultiple2": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{uint(minPages), PallocChunkPages - uint(2*minPages)}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages - uint(2*minPages)}},
+ },
+ expect: []test{
+ {2 * minPages * PageSize, 2 * minPages * PageSize},
+ {minPages * PageSize, minPages * PageSize},
+ {minPages * PageSize, minPages * PageSize},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ },
+ },
+ "ScavDiscontiguous": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 0xe: {},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{uint(minPages), PallocChunkPages - uint(2*minPages)}},
+ BaseChunkIdx + 0xe: {{uint(2 * minPages), PallocChunkPages - uint(2*minPages)}},
+ },
+ expect: []test{
+ {2 * minPages * PageSize, 2 * minPages * PageSize},
+ {^uintptr(0), 2 * minPages * PageSize},
+ {^uintptr(0), 0},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xe: {{0, PallocChunkPages}},
+ },
+ },
+ }
+ if PageAlloc64Bit != 0 {
+ tests["ScavAllVeryDiscontiguous"] = setup{
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 0x1000: {},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 0x1000: {},
+ },
+ expect: []test{
+ {^uintptr(0), 2 * PallocChunkPages * PageSize},
+ {^uintptr(0), 0},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0x1000: {{0, PallocChunkPages}},
+ },
+ }
+ }
+ for name, v := range tests {
+ v := v
+ runTest := func(t *testing.T, mayUnlock bool) {
+ b := NewPageAlloc(v.beforeAlloc, v.beforeScav)
+ defer FreePageAlloc(b)
+
+ for iter, h := range v.expect {
+ if got := b.Scavenge(h.request, mayUnlock); got != h.expect {
+ t.Fatalf("bad scavenge #%d: want %d, got %d", iter+1, h.expect, got)
+ }
+ }
+ want := NewPageAlloc(v.beforeAlloc, v.afterScav)
+ defer FreePageAlloc(want)
+
+ checkPageAlloc(t, want, b)
+ }
+ t.Run(name, func(t *testing.T) {
+ runTest(t, false)
+ })
+ t.Run(name+"MayUnlock", func(t *testing.T) {
+ runTest(t, true)
+ })
+ }
+}
diff --git a/src/runtime/mgcstack.go b/src/runtime/mgcstack.go
new file mode 100644
index 0000000..8eb941a
--- /dev/null
+++ b/src/runtime/mgcstack.go
@@ -0,0 +1,352 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Garbage collector: stack objects and stack tracing
+// See the design doc at https://docs.google.com/document/d/1un-Jn47yByHL7I0aVIP_uVCMxjdM5mpelJhiKlIqxkE/edit?usp=sharing
+// Also see issue 22350.
+
+// Stack tracing solves the problem of determining which parts of the
+// stack are live and should be scanned. It runs as part of scanning
+// a single goroutine stack.
+//
+// Normally determining which parts of the stack are live is easy to
+// do statically, as user code has explicit references (reads and
+// writes) to stack variables. The compiler can do a simple dataflow
+// analysis to determine liveness of stack variables at every point in
+// the code. See cmd/compile/internal/gc/plive.go for that analysis.
+//
+// However, when we take the address of a stack variable, determining
+// whether that variable is still live is less clear. We can still
+// look for static accesses, but accesses through a pointer to the
+// variable are difficult in general to track statically. That pointer
+// can be passed among functions on the stack, conditionally retained,
+// etc.
+//
+// Instead, we will track pointers to stack variables dynamically.
+// All pointers to stack-allocated variables will themselves be on the
+// stack somewhere (or in associated locations, like defer records), so
+// we can find them all efficiently.
+//
+// Stack tracing is organized as a mini garbage collection tracing
+// pass. The objects in this garbage collection are all the variables
+// on the stack whose address is taken, and which themselves contain a
+// pointer. We call these variables "stack objects".
+//
+// We begin by determining all the stack objects on the stack and all
+// the statically live pointers that may point into the stack. We then
+// process each pointer to see if it points to a stack object. If it
+// does, we scan that stack object. It may contain pointers into the
+// heap, in which case those pointers are passed to the main garbage
+// collection. It may also contain pointers into the stack, in which
+// case we add them to our set of stack pointers.
+//
+// Once we're done processing all the pointers (including the ones we
+// added during processing), we've found all the stack objects that
+// are live. Any dead stack objects are not scanned and their contents
+// will not keep heap objects live. Unlike the main garbage
+// collection, we can't sweep the dead stack objects; they live on in
+// a moribund state until the stack frame that contains them is
+// popped.
+//
+// A stack can look like this:
+//
+// +----------+
+// | foo() |
+// | +------+ |
+// | | A | | <---\
+// | +------+ | |
+// | | |
+// | +------+ | |
+// | | B | | |
+// | +------+ | |
+// | | |
+// +----------+ |
+// | bar() | |
+// | +------+ | |
+// | | C | | <-\ |
+// | +----|-+ | | |
+// | | | | |
+// | +----v-+ | | |
+// | | D ---------/
+// | +------+ | |
+// | | |
+// +----------+ |
+// | baz() | |
+// | +------+ | |
+// | | E -------/
+// | +------+ |
+// | ^ |
+// | F: --/ |
+// | |
+// +----------+
+//
+// foo() calls bar() calls baz(). Each has a frame on the stack.
+// foo() has stack objects A and B.
+// bar() has stack objects C and D, with C pointing to D and D pointing to A.
+// baz() has a stack object E pointing to C, and a local variable F pointing to E.
+//
+// Starting from the pointer in local variable F, we will eventually
+// scan all of E, C, D, and A (in that order). B is never scanned
+// because there is no live pointer to it. If B is also statically
+// dead (meaning that foo() never accesses B again after it calls
+// bar()), then B's pointers into the heap are not considered live.
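+//
+// Roughly, the scanner (scanstack in mgcmark.go) drives this machinery in a
+// loop like the following simplified sketch; details such as conservative
+// frames and the handoff of heap pointers to the mark queue are omitted:
+//
+//	state.buildIndex()
+//	for {
+//		p, conservative := state.getPtr()
+//		if p == 0 {
+//			break
+//		}
+//		obj := state.findObject(p)
+//		if obj == nil || obj.typ == nil {
+//			continue // not a stack object, or already scanned
+//		}
+//		// Scan obj: heap pointers go to the GC work queue; pointers back
+//		// into the stack are fed to state.putPtr(ptr, conservative).
+//	}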
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const stackTraceDebug = false
+
+// Buffer for pointers found during stack tracing.
+// Must be smaller than or equal to workbuf.
+//
+//go:notinheap
+type stackWorkBuf struct {
+ stackWorkBufHdr
+ obj [(_WorkbufSize - unsafe.Sizeof(stackWorkBufHdr{})) / sys.PtrSize]uintptr
+}
+
+// Header declaration must come after the buf declaration above, because of issue #14620.
+//
+//go:notinheap
+type stackWorkBufHdr struct {
+ workbufhdr
+ next *stackWorkBuf // linked list of workbufs
+ // Note: we could theoretically repurpose lfnode.next as this next pointer.
+ // It would save 1 word, but that probably isn't worth busting open
+ // the lfnode API.
+}
+
+// Buffer for stack objects found on a goroutine stack.
+// Must be smaller than or equal to workbuf.
+//
+//go:notinheap
+type stackObjectBuf struct {
+ stackObjectBufHdr
+ obj [(_WorkbufSize - unsafe.Sizeof(stackObjectBufHdr{})) / unsafe.Sizeof(stackObject{})]stackObject
+}
+
+//go:notinheap
+type stackObjectBufHdr struct {
+ workbufhdr
+ next *stackObjectBuf
+}
+
+func init() {
+ if unsafe.Sizeof(stackWorkBuf{}) > unsafe.Sizeof(workbuf{}) {
+ panic("stackWorkBuf too big")
+ }
+ if unsafe.Sizeof(stackObjectBuf{}) > unsafe.Sizeof(workbuf{}) {
+ panic("stackObjectBuf too big")
+ }
+}
+
+// A stackObject represents a variable on the stack that has had
+// its address taken.
+//
+//go:notinheap
+type stackObject struct {
+ off uint32 // offset above stack.lo
+ size uint32 // size of object
+ typ *_type // type info (for ptr/nonptr bits). nil if object has been scanned.
+ left *stackObject // objects with lower addresses
+ right *stackObject // objects with higher addresses
+}
+
+// obj.typ = typ, but with no write barrier.
+//go:nowritebarrier
+func (obj *stackObject) setType(typ *_type) {
+ // Types of stack objects are always in read-only memory, not the heap.
+ // So not using a write barrier is ok.
+ *(*uintptr)(unsafe.Pointer(&obj.typ)) = uintptr(unsafe.Pointer(typ))
+}
+
+// A stackScanState keeps track of the state used during the GC walk
+// of a goroutine.
+type stackScanState struct {
+ cache pcvalueCache
+
+ // stack limits
+ stack stack
+
+ // conservative indicates that the next frame must be scanned conservatively.
+ // This applies only to the innermost frame at an async safe-point.
+ conservative bool
+
+ // buf contains the set of possible pointers to stack objects.
+ // Organized as a LIFO linked list of buffers.
+ // All buffers except possibly the head buffer are full.
+ buf *stackWorkBuf
+ freeBuf *stackWorkBuf // keep around one free buffer for allocation hysteresis
+
+ // cbuf contains conservative pointers to stack objects. If
+ // all pointers to a stack object are obtained via
+ // conservative scanning, then the stack object may be dead
+ // and may contain dead pointers, so it must be scanned
+ // defensively.
+ cbuf *stackWorkBuf
+
+ // list of stack objects
+ // Objects are in increasing address order.
+ head *stackObjectBuf
+ tail *stackObjectBuf
+ nobjs int
+
+ // root of binary tree for fast object lookup by address
+ // Initialized by buildIndex.
+ root *stackObject
+}
+
+// Add p as a potential pointer to a stack object.
+// p must be a stack address.
+func (s *stackScanState) putPtr(p uintptr, conservative bool) {
+ if p < s.stack.lo || p >= s.stack.hi {
+ throw("address not a stack address")
+ }
+ head := &s.buf
+ if conservative {
+ head = &s.cbuf
+ }
+ buf := *head
+ if buf == nil {
+ // Initial setup.
+ buf = (*stackWorkBuf)(unsafe.Pointer(getempty()))
+ buf.nobj = 0
+ buf.next = nil
+ *head = buf
+ } else if buf.nobj == len(buf.obj) {
+ if s.freeBuf != nil {
+ buf = s.freeBuf
+ s.freeBuf = nil
+ } else {
+ buf = (*stackWorkBuf)(unsafe.Pointer(getempty()))
+ }
+ buf.nobj = 0
+ buf.next = *head
+ *head = buf
+ }
+ buf.obj[buf.nobj] = p
+ buf.nobj++
+}
+
+// Remove and return a potential pointer to a stack object.
+// Returns 0 if there are no more pointers available.
+//
+// This prefers non-conservative pointers so we scan stack objects
+// precisely if there are any non-conservative pointers to them.
+func (s *stackScanState) getPtr() (p uintptr, conservative bool) {
+ for _, head := range []**stackWorkBuf{&s.buf, &s.cbuf} {
+ buf := *head
+ if buf == nil {
+ // Never had any data.
+ continue
+ }
+ if buf.nobj == 0 {
+ if s.freeBuf != nil {
+ // Free old freeBuf.
+ putempty((*workbuf)(unsafe.Pointer(s.freeBuf)))
+ }
+ // Move buf to the freeBuf.
+ s.freeBuf = buf
+ buf = buf.next
+ *head = buf
+ if buf == nil {
+ // No more data in this list.
+ continue
+ }
+ }
+ buf.nobj--
+ return buf.obj[buf.nobj], head == &s.cbuf
+ }
+ // No more data in either list.
+ if s.freeBuf != nil {
+ putempty((*workbuf)(unsafe.Pointer(s.freeBuf)))
+ s.freeBuf = nil
+ }
+ return 0, false
+}
+
+// addObject adds a stack object at addr of type typ to the set of stack objects.
+func (s *stackScanState) addObject(addr uintptr, typ *_type) {
+ x := s.tail
+ if x == nil {
+ // initial setup
+ x = (*stackObjectBuf)(unsafe.Pointer(getempty()))
+ x.next = nil
+ s.head = x
+ s.tail = x
+ }
+ if x.nobj > 0 && uint32(addr-s.stack.lo) < x.obj[x.nobj-1].off+x.obj[x.nobj-1].size {
+ throw("objects added out of order or overlapping")
+ }
+ if x.nobj == len(x.obj) {
+ // full buffer - allocate a new buffer, add to end of linked list
+ y := (*stackObjectBuf)(unsafe.Pointer(getempty()))
+ y.next = nil
+ x.next = y
+ s.tail = y
+ x = y
+ }
+ obj := &x.obj[x.nobj]
+ x.nobj++
+ obj.off = uint32(addr - s.stack.lo)
+ obj.size = uint32(typ.size)
+ obj.setType(typ)
+ // obj.left and obj.right will be initialized by buildIndex before use.
+ s.nobjs++
+}
+
+// buildIndex initializes s.root to a binary search tree.
+// It should be called after all addObject calls but before
+// any call of findObject.
+func (s *stackScanState) buildIndex() {
+ s.root, _, _ = binarySearchTree(s.head, 0, s.nobjs)
+}
+
+// Build a binary search tree with the n objects in the list
+// x.obj[idx], x.obj[idx+1], ..., x.next.obj[0], ...
+// Returns the root of that tree, and the buf+idx of the nth object after x.obj[idx].
+// (The first object that was not included in the binary search tree.)
+// If n == 0, returns nil, x.
+func binarySearchTree(x *stackObjectBuf, idx int, n int) (root *stackObject, restBuf *stackObjectBuf, restIdx int) {
+ if n == 0 {
+ return nil, x, idx
+ }
+ var left, right *stackObject
+ left, x, idx = binarySearchTree(x, idx, n/2)
+ root = &x.obj[idx]
+ idx++
+ if idx == len(x.obj) {
+ x = x.next
+ idx = 0
+ }
+ right, x, idx = binarySearchTree(x, idx, n-n/2-1)
+ root.left = left
+ root.right = right
+ return root, x, idx
+}
+
+// findObject returns the stack object containing address a, if any.
+// Must have called buildIndex previously.
+func (s *stackScanState) findObject(a uintptr) *stackObject {
+ off := uint32(a - s.stack.lo)
+ obj := s.root
+ for {
+ if obj == nil {
+ return nil
+ }
+ if off < obj.off {
+ obj = obj.left
+ continue
+ }
+ if off >= obj.off+obj.size {
+ obj = obj.right
+ continue
+ }
+ return obj
+ }
+}
diff --git a/src/runtime/mgcsweep.go b/src/runtime/mgcsweep.go
new file mode 100644
index 0000000..76bc424
--- /dev/null
+++ b/src/runtime/mgcsweep.go
@@ -0,0 +1,673 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Garbage collector: sweeping
+
+// The sweeper consists of two different algorithms:
+//
+// * The object reclaimer finds and frees unmarked slots in spans. It
+// can free a whole span if none of the objects are marked, but that
+// isn't its goal. This can be driven either synchronously by
+// mcentral.cacheSpan for mcentral spans, or asynchronously by
+// sweepone, which looks at all the mcentral lists.
+//
+// * The span reclaimer looks for spans that contain no marked objects
+// and frees whole spans. This is a separate algorithm because
+// freeing whole spans is the hardest task for the object reclaimer,
+// but is critical when allocating new spans. The entry point for
+// this is mheap_.reclaim and it's driven by a sequential scan of
+// the page marks bitmap in the heap arenas.
+//
+// Both algorithms ultimately call mspan.sweep, which sweeps a single
+// heap span.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+var sweep sweepdata
+
+// State of background sweep.
+type sweepdata struct {
+ lock mutex
+ g *g
+ parked bool
+ started bool
+
+ nbgsweep uint32
+ npausesweep uint32
+
+ // centralIndex is the current unswept span class.
+ // It represents an index into the mcentral span
+ // sets. Accessed and updated via its load and
+ // update methods. Not protected by a lock.
+ //
+ // Reset at mark termination.
+ // Used by mheap.nextSpanForSweep.
+ centralIndex sweepClass
+}
+
+// sweepClass is a spanClass and one bit to represent whether we're currently
+// sweeping partial or full spans.
+type sweepClass uint32
+
+const (
+ numSweepClasses = numSpanClasses * 2
+ sweepClassDone sweepClass = sweepClass(^uint32(0))
+)
+
+func (s *sweepClass) load() sweepClass {
+ return sweepClass(atomic.Load((*uint32)(s)))
+}
+
+func (s *sweepClass) update(sNew sweepClass) {
+ // Only update *s if its current value is less than sNew,
+ // since *s increases monotonically.
+ sOld := s.load()
+ for sOld < sNew && !atomic.Cas((*uint32)(s), uint32(sOld), uint32(sNew)) {
+ sOld = s.load()
+ }
+ // TODO(mknyszek): This isn't the only place we have
+ // an atomic monotonically increasing counter. It would
+ // be nice to have an "atomic max" which is just implemented
+ // as the above on most architectures. Some architectures
+ // like RISC-V however have native support for an atomic max.
+}
+
+func (s *sweepClass) clear() {
+ atomic.Store((*uint32)(s), 0)
+}
+
+// split returns the underlying span class as well as
+// whether we're interested in the full or partial
+// unswept lists for that class, indicated as a boolean
+// (true means "full").
+func (s sweepClass) split() (spc spanClass, full bool) {
+ return spanClass(s >> 1), s&1 == 0
+}
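+
+// For illustration (not a definition used elsewhere): with this encoding,
+// the full-unswept list for spanClass c corresponds to sweepClass(2*c) and
+// the partial-unswept list to sweepClass(2*c + 1). For example:
+//
+//	spc, full := sweepClass(4).split() // spc == spanClass(2), full == true
+//	spc, full = sweepClass(5).split()  // spc == spanClass(2), full == false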
+
+// nextSpanForSweep finds and pops the next span for sweeping from the
+// central sweep buffers. It returns ownership of the span to the caller.
+// Returns nil if no such span exists.
+func (h *mheap) nextSpanForSweep() *mspan {
+ sg := h.sweepgen
+ for sc := sweep.centralIndex.load(); sc < numSweepClasses; sc++ {
+ spc, full := sc.split()
+ c := &h.central[spc].mcentral
+ var s *mspan
+ if full {
+ s = c.fullUnswept(sg).pop()
+ } else {
+ s = c.partialUnswept(sg).pop()
+ }
+ if s != nil {
+ // Write down that we found something so future sweepers
+ // can start from here.
+ sweep.centralIndex.update(sc)
+ return s
+ }
+ }
+ // Write down that we found nothing.
+ sweep.centralIndex.update(sweepClassDone)
+ return nil
+}
+
+// finishsweep_m ensures that all spans are swept.
+//
+// The world must be stopped. This ensures there are no sweeps in
+// progress.
+//
+//go:nowritebarrier
+func finishsweep_m() {
+ assertWorldStopped()
+
+ // Sweeping must be complete before marking commences, so
+ // sweep any unswept spans. If this is a concurrent GC, there
+ // shouldn't be any spans left to sweep, so this should finish
+ // instantly. If GC was forced before the concurrent sweep
+ // finished, there may be spans to sweep.
+ for sweepone() != ^uintptr(0) {
+ sweep.npausesweep++
+ }
+
+ // Reset all the unswept buffers, which should be empty.
+ // Do this in sweep termination as opposed to mark termination
+ // so that we can catch unswept spans and reclaim blocks as
+ // soon as possible.
+ sg := mheap_.sweepgen
+ for i := range mheap_.central {
+ c := &mheap_.central[i].mcentral
+ c.partialUnswept(sg).reset()
+ c.fullUnswept(sg).reset()
+ }
+
+ // Sweeping is done, so if the scavenger isn't already awake,
+ // wake it up. There's definitely work for it to do at this
+ // point.
+ wakeScavenger()
+
+ nextMarkBitArenaEpoch()
+}
+
+func bgsweep(c chan int) {
+ sweep.g = getg()
+
+ lockInit(&sweep.lock, lockRankSweep)
+ lock(&sweep.lock)
+ sweep.parked = true
+ c <- 1
+ goparkunlock(&sweep.lock, waitReasonGCSweepWait, traceEvGoBlock, 1)
+
+ for {
+ for sweepone() != ^uintptr(0) {
+ sweep.nbgsweep++
+ Gosched()
+ }
+ for freeSomeWbufs(true) {
+ Gosched()
+ }
+ lock(&sweep.lock)
+ if !isSweepDone() {
+ // This can happen if a GC runs between
+ // sweepone returning ^0 above
+ // and the lock being acquired.
+ unlock(&sweep.lock)
+ continue
+ }
+ sweep.parked = true
+ goparkunlock(&sweep.lock, waitReasonGCSweepWait, traceEvGoBlock, 1)
+ }
+}
+
+// sweepone sweeps some unswept heap span and returns the number of pages returned
+// to the heap, or ^uintptr(0) if there was nothing to sweep.
+func sweepone() uintptr {
+ _g_ := getg()
+ sweepRatio := mheap_.sweepPagesPerByte // For debugging
+
+ // Increment locks to ensure that the goroutine is not preempted in the
+ // middle of sweep, which would leave the span in an inconsistent state for the next GC.
+ _g_.m.locks++
+ if atomic.Load(&mheap_.sweepdone) != 0 {
+ _g_.m.locks--
+ return ^uintptr(0)
+ }
+ atomic.Xadd(&mheap_.sweepers, +1)
+
+ // Find a span to sweep.
+ var s *mspan
+ sg := mheap_.sweepgen
+ for {
+ s = mheap_.nextSpanForSweep()
+ if s == nil {
+ atomic.Store(&mheap_.sweepdone, 1)
+ break
+ }
+ if state := s.state.get(); state != mSpanInUse {
+ // This can happen if direct sweeping already
+ // swept this span, but in that case the sweep
+ // generation should always be up-to-date.
+ if !(s.sweepgen == sg || s.sweepgen == sg+3) {
+ print("runtime: bad span s.state=", state, " s.sweepgen=", s.sweepgen, " sweepgen=", sg, "\n")
+ throw("non in-use span in unswept list")
+ }
+ continue
+ }
+ if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
+ break
+ }
+ }
+
+ // Sweep the span we found.
+ npages := ^uintptr(0)
+ if s != nil {
+ npages = s.npages
+ if s.sweep(false) {
+ // Whole span was freed. Count it toward the
+ // page reclaimer credit since these pages can
+ // now be used for span allocation.
+ atomic.Xadduintptr(&mheap_.reclaimCredit, npages)
+ } else {
+ // Span is still in-use, so this returned no
+ // pages to the heap and the span needs to
+ // move to the swept in-use list.
+ npages = 0
+ }
+ }
+
+ // Decrement the number of active sweepers and if this is the
+ // last one print trace information.
+ if atomic.Xadd(&mheap_.sweepers, -1) == 0 && atomic.Load(&mheap_.sweepdone) != 0 {
+ // Since the sweeper is done, move the scavenge gen forward (signalling
+ // that there's new work to do) and wake the scavenger.
+ //
+ // The scavenger is signaled by the last sweeper because once
+ // sweeping is done, we will definitely have useful work for
+ // the scavenger to do, since the scavenger only runs over the
+ // heap once per GC cycle. This update is not done during sweep
+ // termination because in some cases there may be a long delay
+ // between sweep done and sweep termination (e.g. not enough
+ // allocations to trigger a GC) which would be nice to fill in
+ // with scavenging work.
+ systemstack(func() {
+ lock(&mheap_.lock)
+ mheap_.pages.scavengeStartGen()
+ unlock(&mheap_.lock)
+ })
+ // Since we might sweep in an allocation path, it's not possible
+ // for us to wake the scavenger directly via wakeScavenger, since
+ // it could allocate. Ask sysmon to do it for us instead.
+ readyForScavenger()
+
+ if debug.gcpacertrace > 0 {
+ print("pacer: sweep done at heap size ", memstats.heap_live>>20, "MB; allocated ", (memstats.heap_live-mheap_.sweepHeapLiveBasis)>>20, "MB during sweep; swept ", mheap_.pagesSwept, " pages at ", sweepRatio, " pages/byte\n")
+ }
+ }
+ _g_.m.locks--
+ return npages
+}
+
+// isSweepDone reports whether all spans are swept or currently being swept.
+//
+// Note that this condition may transition from false to true at any
+// time as the sweeper runs. It may transition from true to false if a
+// GC runs; to prevent that the caller must be non-preemptible or must
+// somehow block GC progress.
+func isSweepDone() bool {
+ return mheap_.sweepdone != 0
+}
+
+// Returns only when span s has been swept.
+//go:nowritebarrier
+func (s *mspan) ensureSwept() {
+ // Caller must disable preemption.
+ // Otherwise when this function returns the span can become unswept again
+ // (if GC is triggered on another goroutine).
+ _g_ := getg()
+ if _g_.m.locks == 0 && _g_.m.mallocing == 0 && _g_ != _g_.m.g0 {
+ throw("mspan.ensureSwept: m is not locked")
+ }
+
+ sg := mheap_.sweepgen
+ spangen := atomic.Load(&s.sweepgen)
+ if spangen == sg || spangen == sg+3 {
+ return
+ }
+ // The caller must be sure that the span is a mSpanInUse span.
+ if atomic.Cas(&s.sweepgen, sg-2, sg-1) {
+ s.sweep(false)
+ return
+ }
+ // unfortunate condition, and we don't have efficient means to wait
+ for {
+ spangen := atomic.Load(&s.sweepgen)
+ if spangen == sg || spangen == sg+3 {
+ break
+ }
+ osyield()
+ }
+}
+
+// Sweep frees or collects finalizers for blocks not marked in the mark phase.
+// It clears the mark bits in preparation for the next GC round.
+// Returns true if the span was returned to heap.
+// If preserve=true, don't return it to heap nor relink in mcentral lists;
+// caller takes care of it.
+func (s *mspan) sweep(preserve bool) bool {
+ // It's critical that we enter this function with preemption disabled,
+ // GC must not start while we are in the middle of this function.
+ _g_ := getg()
+ if _g_.m.locks == 0 && _g_.m.mallocing == 0 && _g_ != _g_.m.g0 {
+ throw("mspan.sweep: m is not locked")
+ }
+ sweepgen := mheap_.sweepgen
+ if state := s.state.get(); state != mSpanInUse || s.sweepgen != sweepgen-1 {
+ print("mspan.sweep: state=", state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
+ throw("mspan.sweep: bad span state")
+ }
+
+ if trace.enabled {
+ traceGCSweepSpan(s.npages * _PageSize)
+ }
+
+ atomic.Xadd64(&mheap_.pagesSwept, int64(s.npages))
+
+ spc := s.spanclass
+ size := s.elemsize
+
+ // The allocBits indicate which unmarked objects don't need to be
+ // processed since they were free at the end of the last GC cycle
+ // and were not allocated since then.
+ // If the allocBits index is >= s.freeindex and the bit
+ // is not marked then the object remains unallocated
+ // since the last GC.
+ // This situation is analogous to being on a freelist.
+
+ // Unlink & free special records for any objects we're about to free.
+ // Two complications here:
+ // 1. An object can have both finalizer and profile special records.
+ // In such case we need to queue finalizer for execution,
+ // mark the object as live and preserve the profile special.
+ // 2. A tiny object can have several finalizers set up for different offsets.
+ // If such object is not marked, we need to queue all finalizers at once.
+ // Both 1 and 2 are possible at the same time.
+ hadSpecials := s.specials != nil
+ specialp := &s.specials
+ special := *specialp
+ for special != nil {
+ // A finalizer can be set for an inner byte of an object, find object beginning.
+ objIndex := uintptr(special.offset) / size
+ p := s.base() + objIndex*size
+ mbits := s.markBitsForIndex(objIndex)
+ if !mbits.isMarked() {
+ // This object is not marked and has at least one special record.
+ // Pass 1: see if it has at least one finalizer.
+ hasFin := false
+ endOffset := p - s.base() + size
+ for tmp := special; tmp != nil && uintptr(tmp.offset) < endOffset; tmp = tmp.next {
+ if tmp.kind == _KindSpecialFinalizer {
+ // Stop freeing of object if it has a finalizer.
+ mbits.setMarkedNonAtomic()
+ hasFin = true
+ break
+ }
+ }
+ // Pass 2: queue all finalizers _or_ handle profile record.
+ for special != nil && uintptr(special.offset) < endOffset {
+ // Find the exact byte for which the special was set up
+ // (as opposed to the object beginning).
+ p := s.base() + uintptr(special.offset)
+ if special.kind == _KindSpecialFinalizer || !hasFin {
+ // Splice out special record.
+ y := special
+ special = special.next
+ *specialp = special
+ freespecial(y, unsafe.Pointer(p), size)
+ } else {
+ // This is a profile record, but the object has finalizers (so it's kept alive).
+ // Keep the special record.
+ specialp = &special.next
+ special = *specialp
+ }
+ }
+ } else {
+ // object is still live: keep special record
+ specialp = &special.next
+ special = *specialp
+ }
+ }
+ if hadSpecials && s.specials == nil {
+ spanHasNoSpecials(s)
+ }
+
+ if debug.allocfreetrace != 0 || debug.clobberfree != 0 || raceenabled || msanenabled {
+ // Find all newly freed objects. This doesn't have to be
+ // efficient; allocfreetrace has massive overhead.
+ mbits := s.markBitsForBase()
+ abits := s.allocBitsForIndex(0)
+ for i := uintptr(0); i < s.nelems; i++ {
+ if !mbits.isMarked() && (abits.index < s.freeindex || abits.isMarked()) {
+ x := s.base() + i*s.elemsize
+ if debug.allocfreetrace != 0 {
+ tracefree(unsafe.Pointer(x), size)
+ }
+ if debug.clobberfree != 0 {
+ clobberfree(unsafe.Pointer(x), size)
+ }
+ if raceenabled {
+ racefree(unsafe.Pointer(x), size)
+ }
+ if msanenabled {
+ msanfree(unsafe.Pointer(x), size)
+ }
+ }
+ mbits.advance()
+ abits.advance()
+ }
+ }
+
+ // Check for zombie objects.
+ if s.freeindex < s.nelems {
+ // Everything < freeindex is allocated and hence
+ // cannot be zombies.
+ //
+ // Check the first bitmap byte, where we have to be
+ // careful with freeindex.
+ obj := s.freeindex
+ if (*s.gcmarkBits.bytep(obj / 8)&^*s.allocBits.bytep(obj / 8))>>(obj%8) != 0 {
+ s.reportZombies()
+ }
+ // Check remaining bytes.
+ for i := obj/8 + 1; i < divRoundUp(s.nelems, 8); i++ {
+ if *s.gcmarkBits.bytep(i)&^*s.allocBits.bytep(i) != 0 {
+ s.reportZombies()
+ }
+ }
+ }
+
+ // Count the number of free objects in this span.
+ nalloc := uint16(s.countAlloc())
+ nfreed := s.allocCount - nalloc
+ if nalloc > s.allocCount {
+ // The zombie check above should have caught this in
+ // more detail.
+ print("runtime: nelems=", s.nelems, " nalloc=", nalloc, " previous allocCount=", s.allocCount, " nfreed=", nfreed, "\n")
+ throw("sweep increased allocation count")
+ }
+
+ s.allocCount = nalloc
+ s.freeindex = 0 // reset allocation index to start of span.
+ if trace.enabled {
+ getg().m.p.ptr().traceReclaimed += uintptr(nfreed) * s.elemsize
+ }
+
+ // gcmarkBits becomes the allocBits.
+ // Get a fresh, cleared gcmarkBits in preparation for the next GC.
+ s.allocBits = s.gcmarkBits
+ s.gcmarkBits = newMarkBits(s.nelems)
+
+ // Initialize alloc bits cache.
+ s.refillAllocCache(0)
+
+ // The span must be in our exclusive ownership until we update sweepgen,
+ // check for potential races.
+ if state := s.state.get(); state != mSpanInUse || s.sweepgen != sweepgen-1 {
+ print("mspan.sweep: state=", state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
+ throw("mspan.sweep: bad span state after sweep")
+ }
+ if s.sweepgen == sweepgen+1 || s.sweepgen == sweepgen+3 {
+ throw("swept cached span")
+ }
+
+ // We need to set s.sweepgen = h.sweepgen only when all blocks are swept,
+ // because of the potential for a concurrent free/SetFinalizer.
+ //
+ // But we need to set it before we make the span available for allocation
+ // (return it to heap or mcentral), because allocation code assumes that a
+ // span is already swept if available for allocation.
+ //
+ // Serialization point.
+ // At this point the mark bits are cleared and allocation ready
+ // to go so release the span.
+ atomic.Store(&s.sweepgen, sweepgen)
+
+ if spc.sizeclass() != 0 {
+ // Handle spans for small objects.
+ if nfreed > 0 {
+ // Only mark the span as needing zeroing if we've freed any
+ // objects, because a fresh span that had been allocated into,
+ // wasn't totally filled, but then swept, still has all of its
+ // free slots zeroed.
+ s.needzero = 1
+ stats := memstats.heapStats.acquire()
+ atomic.Xadduintptr(&stats.smallFreeCount[spc.sizeclass()], uintptr(nfreed))
+ memstats.heapStats.release()
+ }
+ if !preserve {
+ // The caller may not have removed this span from whatever
+ // unswept set it's on, but has taken ownership of the span for
+ // sweeping by updating sweepgen. If this span is still in
+ // an unswept set, then the mcentral will pop it off the
+ // set, check its sweepgen, and ignore it.
+ if nalloc == 0 {
+ // Free totally free span directly back to the heap.
+ mheap_.freeSpan(s)
+ return true
+ }
+ // Return span back to the right mcentral list.
+ if uintptr(nalloc) == s.nelems {
+ mheap_.central[spc].mcentral.fullSwept(sweepgen).push(s)
+ } else {
+ mheap_.central[spc].mcentral.partialSwept(sweepgen).push(s)
+ }
+ }
+ } else if !preserve {
+ // Handle spans for large objects.
+ if nfreed != 0 {
+ // Free large object span to heap.
+
+ // NOTE(rsc,dvyukov): The original implementation of efence
+ // in CL 22060046 used sysFree instead of sysFault, so that
+ // the operating system would eventually give the memory
+ // back to us again, so that an efence program could run
+ // longer without running out of memory. Unfortunately,
+ // calling sysFree here without any kind of adjustment of the
+ // heap data structures means that when the memory does
+ // come back to us, we have the wrong metadata for it, either in
+ // the mspan structures or in the garbage collection bitmap.
+ // Using sysFault here means that the program will run out of
+ // memory fairly quickly in efence mode, but at least it won't
+ // have mysterious crashes due to confused memory reuse.
+ // It should be possible to switch back to sysFree if we also
+ // implement and then call some kind of mheap.deleteSpan.
+ if debug.efence > 0 {
+ s.limit = 0 // prevent mlookup from finding this span
+ sysFault(unsafe.Pointer(s.base()), size)
+ } else {
+ mheap_.freeSpan(s)
+ }
+ stats := memstats.heapStats.acquire()
+ atomic.Xadduintptr(&stats.largeFreeCount, 1)
+ atomic.Xadduintptr(&stats.largeFree, size)
+ memstats.heapStats.release()
+ return true
+ }
+
+ // Add a large span directly onto the full+swept list.
+ mheap_.central[spc].mcentral.fullSwept(sweepgen).push(s)
+ }
+ return false
+}
+
+// reportZombies reports any marked but free objects in s and throws.
+//
+// This generally means one of the following:
+//
+// 1. User code converted a pointer to a uintptr and then back
+// unsafely, and a GC ran while the uintptr was the only reference to
+// an object.
+//
+// 2. User code (or a compiler bug) constructed a bad pointer that
+// points to a free slot, often a past-the-end pointer.
+//
+// 3. The GC two cycles ago missed a pointer and freed a live object,
+// but it was still live in the last cycle, so this GC cycle found a
+// pointer to that object and marked it.
+func (s *mspan) reportZombies() {
+ printlock()
+ print("runtime: marked free object in span ", s, ", elemsize=", s.elemsize, " freeindex=", s.freeindex, " (bad use of unsafe.Pointer? try -d=checkptr)\n")
+ mbits := s.markBitsForBase()
+ abits := s.allocBitsForIndex(0)
+ for i := uintptr(0); i < s.nelems; i++ {
+ addr := s.base() + i*s.elemsize
+ print(hex(addr))
+ alloc := i < s.freeindex || abits.isMarked()
+ if alloc {
+ print(" alloc")
+ } else {
+ print(" free ")
+ }
+ if mbits.isMarked() {
+ print(" marked ")
+ } else {
+ print(" unmarked")
+ }
+ zombie := mbits.isMarked() && !alloc
+ if zombie {
+ print(" zombie")
+ }
+ print("\n")
+ if zombie {
+ length := s.elemsize
+ if length > 1024 {
+ length = 1024
+ }
+ hexdumpWords(addr, addr+length, nil)
+ }
+ mbits.advance()
+ abits.advance()
+ }
+ throw("found pointer to free object")
+}
+
+// deductSweepCredit deducts sweep credit for allocating a span of
+// size spanBytes. This must be performed *before* the span is
+// allocated to ensure the system has enough credit. If necessary, it
+ // performs sweeping to prevent going into debt. If the caller will
+// also sweep pages (e.g., for a large allocation), it can pass a
+// non-zero callerSweepPages to leave that many pages unswept.
+//
+// deductSweepCredit makes a worst-case assumption that all spanBytes
+// bytes of the ultimately allocated span will be available for object
+// allocation.
+//
+// deductSweepCredit is the core of the "proportional sweep" system.
+// It uses statistics gathered by the garbage collector to perform
+// enough sweeping so that all pages are swept during the concurrent
+// sweep phase between GC cycles.
+//
+// mheap_ must NOT be locked.
+func deductSweepCredit(spanBytes uintptr, callerSweepPages uintptr) {
+ if mheap_.sweepPagesPerByte == 0 {
+ // Proportional sweep is done or disabled.
+ return
+ }
+
+ if trace.enabled {
+ traceGCSweepStart()
+ }
+
+retry:
+ sweptBasis := atomic.Load64(&mheap_.pagesSweptBasis)
+
+ // Fix debt if necessary.
+ newHeapLive := uintptr(atomic.Load64(&memstats.heap_live)-mheap_.sweepHeapLiveBasis) + spanBytes
+ pagesTarget := int64(mheap_.sweepPagesPerByte*float64(newHeapLive)) - int64(callerSweepPages)
+ for pagesTarget > int64(atomic.Load64(&mheap_.pagesSwept)-sweptBasis) {
+ if sweepone() == ^uintptr(0) {
+ mheap_.sweepPagesPerByte = 0
+ break
+ }
+ if atomic.Load64(&mheap_.pagesSweptBasis) != sweptBasis {
+ // Sweep pacing changed. Recompute debt.
+ goto retry
+ }
+ }
+
+ if trace.enabled {
+ traceGCSweepDone()
+ }
+}
+
+// clobberfree sets the memory content at x to bad content, for debugging
+// purposes.
+func clobberfree(x unsafe.Pointer, size uintptr) {
+ // size (span.elemsize) is always a multiple of 4.
+ for i := uintptr(0); i < size; i += 4 {
+ *(*uint32)(add(x, i)) = 0xdeadbeef
+ }
+}
diff --git a/src/runtime/mgcwork.go b/src/runtime/mgcwork.go
new file mode 100644
index 0000000..b3a0686
--- /dev/null
+++ b/src/runtime/mgcwork.go
@@ -0,0 +1,482 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ _WorkbufSize = 2048 // in bytes; larger values result in less contention
+
+ // workbufAlloc is the number of bytes to allocate at a time
+ // for new workbufs. This must be a multiple of pageSize and
+ // should be a multiple of _WorkbufSize.
+ //
+ // Larger values reduce workbuf allocation overhead. Smaller
+ // values reduce heap fragmentation.
+ workbufAlloc = 32 << 10
+)
+
+func init() {
+ if workbufAlloc%pageSize != 0 || workbufAlloc%_WorkbufSize != 0 {
+ throw("bad workbufAlloc")
+ }
+}
+
+// Garbage collector work pool abstraction.
+//
+// This implements a producer/consumer model for pointers to grey
+// objects. A grey object is one that is marked and on a work
+// queue. A black object is marked and not on a work queue.
+//
+// Write barriers, root discovery, stack scanning, and object scanning
+// produce pointers to grey objects. Scanning consumes pointers to
+// grey objects, thus blackening them, and then scans them,
+// potentially producing new pointers to grey objects.
+
+// A gcWork provides the interface to produce and consume work for the
+// garbage collector.
+//
+// A gcWork can be used on the stack as follows:
+//
+// (preemption must be disabled)
+// gcw := &getg().m.p.ptr().gcw
+// .. call gcw.put() to produce and gcw.tryGet() to consume ..
+//
+// It's important that any use of gcWork during the mark phase prevent
+// the garbage collector from transitioning to mark termination since
+// gcWork may locally hold GC work buffers. This can be done by
+// disabling preemption (systemstack or acquirem).
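+//
+// For example, a mark loop built on a gcWork might look roughly like the
+// sketch below (illustrative only; scanobject in mgcmark.go is the real
+// consumer of this work):
+//
+//	gcw := &getg().m.p.ptr().gcw
+//	gcw.put(obj) // publish a grey object
+//	for {
+//		b := gcw.tryGet()
+//		if b == 0 {
+//			break // no local or global work available
+//		}
+//		scanobject(b, gcw) // blacken b; may put more grey objects
+//	}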
+type gcWork struct {
+ // wbuf1 and wbuf2 are the primary and secondary work buffers.
+ //
+ // This can be thought of as a stack of both work buffers'
+ // pointers concatenated. When we pop the last pointer, we
+ // shift the stack up by one work buffer by bringing in a new
+ // full buffer and discarding an empty one. When we fill both
+ // buffers, we shift the stack down by one work buffer by
+ // bringing in a new empty buffer and discarding a full one.
+ // This way we have one buffer's worth of hysteresis, which
+ // amortizes the cost of getting or putting a work buffer over
+ // at least one buffer of work and reduces contention on the
+ // global work lists.
+ //
+ // wbuf1 is always the buffer we're currently pushing to and
+ // popping from and wbuf2 is the buffer that will be discarded
+ // next.
+ //
+ // Invariant: Both wbuf1 and wbuf2 are nil or neither are.
+ wbuf1, wbuf2 *workbuf
+
+ // Bytes marked (blackened) on this gcWork. This is aggregated
+ // into work.bytesMarked by dispose.
+ bytesMarked uint64
+
+ // Scan work performed on this gcWork. This is aggregated into
+ // gcController by dispose and may also be flushed by callers.
+ scanWork int64
+
+ // flushedWork indicates that a non-empty work buffer was
+ // flushed to the global work list since the last gcMarkDone
+ // termination check. Specifically, this indicates that this
+ // gcWork may have communicated work to another gcWork.
+ flushedWork bool
+}
+
+// Most of the methods of gcWork are go:nowritebarrierrec because the
+// write barrier itself can invoke gcWork methods but the methods are
+// not generally re-entrant. Hence, if a gcWork method invoked the
+// write barrier while the gcWork was in an inconsistent state, and
+// the write barrier in turn invoked a gcWork method, it could
+// permanently corrupt the gcWork.
+
+func (w *gcWork) init() {
+ w.wbuf1 = getempty()
+ wbuf2 := trygetfull()
+ if wbuf2 == nil {
+ wbuf2 = getempty()
+ }
+ w.wbuf2 = wbuf2
+}
+
+// put enqueues a pointer for the garbage collector to trace.
+// obj must point to the beginning of a heap object or an oblet.
+//go:nowritebarrierrec
+func (w *gcWork) put(obj uintptr) {
+ flushed := false
+ wbuf := w.wbuf1
+ // Record that this may acquire the wbufSpans or heap lock to
+ // allocate a workbuf.
+ lockWithRankMayAcquire(&work.wbufSpans.lock, lockRankWbufSpans)
+ lockWithRankMayAcquire(&mheap_.lock, lockRankMheap)
+ if wbuf == nil {
+ w.init()
+ wbuf = w.wbuf1
+ // wbuf is empty at this point.
+ } else if wbuf.nobj == len(wbuf.obj) {
+ w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
+ wbuf = w.wbuf1
+ if wbuf.nobj == len(wbuf.obj) {
+ putfull(wbuf)
+ w.flushedWork = true
+ wbuf = getempty()
+ w.wbuf1 = wbuf
+ flushed = true
+ }
+ }
+
+ wbuf.obj[wbuf.nobj] = obj
+ wbuf.nobj++
+
+ // If we put a buffer on full, let the GC controller know so
+ // it can encourage more workers to run. We delay this until
+ // the end of put so that w is in a consistent state, since
+ // enlistWorker may itself manipulate w.
+ if flushed && gcphase == _GCmark {
+ gcController.enlistWorker()
+ }
+}
+
+// putFast does a put and reports whether it can be done quickly;
+// otherwise it returns false and the caller needs to call put.
+//go:nowritebarrierrec
+func (w *gcWork) putFast(obj uintptr) bool {
+ wbuf := w.wbuf1
+ if wbuf == nil {
+ return false
+ } else if wbuf.nobj == len(wbuf.obj) {
+ return false
+ }
+
+ wbuf.obj[wbuf.nobj] = obj
+ wbuf.nobj++
+ return true
+}
+
+// putBatch performs a put on every pointer in obj. See put for
+// constraints on these pointers.
+//
+//go:nowritebarrierrec
+func (w *gcWork) putBatch(obj []uintptr) {
+ if len(obj) == 0 {
+ return
+ }
+
+ flushed := false
+ wbuf := w.wbuf1
+ if wbuf == nil {
+ w.init()
+ wbuf = w.wbuf1
+ }
+
+ for len(obj) > 0 {
+ for wbuf.nobj == len(wbuf.obj) {
+ putfull(wbuf)
+ w.flushedWork = true
+ w.wbuf1, w.wbuf2 = w.wbuf2, getempty()
+ wbuf = w.wbuf1
+ flushed = true
+ }
+ n := copy(wbuf.obj[wbuf.nobj:], obj)
+ wbuf.nobj += n
+ obj = obj[n:]
+ }
+
+ if flushed && gcphase == _GCmark {
+ gcController.enlistWorker()
+ }
+}
+
+// tryGet dequeues a pointer for the garbage collector to trace.
+//
+// If there are no pointers remaining in this gcWork or in the global
+// queue, tryGet returns 0. Note that there may still be pointers in
+// other gcWork instances or other caches.
+//go:nowritebarrierrec
+func (w *gcWork) tryGet() uintptr {
+ wbuf := w.wbuf1
+ if wbuf == nil {
+ w.init()
+ wbuf = w.wbuf1
+ // wbuf is empty at this point.
+ }
+ if wbuf.nobj == 0 {
+ w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
+ wbuf = w.wbuf1
+ if wbuf.nobj == 0 {
+ owbuf := wbuf
+ wbuf = trygetfull()
+ if wbuf == nil {
+ return 0
+ }
+ putempty(owbuf)
+ w.wbuf1 = wbuf
+ }
+ }
+
+ wbuf.nobj--
+ return wbuf.obj[wbuf.nobj]
+}
+
+// tryGetFast dequeues a pointer for the garbage collector to trace
+// if one is readily available. Otherwise it returns 0 and
+// the caller is expected to call tryGet().
+//go:nowritebarrierrec
+func (w *gcWork) tryGetFast() uintptr {
+ wbuf := w.wbuf1
+ if wbuf == nil {
+ return 0
+ }
+ if wbuf.nobj == 0 {
+ return 0
+ }
+
+ wbuf.nobj--
+ return wbuf.obj[wbuf.nobj]
+}
+
+// dispose returns any cached pointers to the global queue.
+// The buffers are being put on the full queue so that the
+// write barriers will not simply reacquire them before the
+// GC can inspect them. This helps reduce the mutator's
+// ability to hide pointers during the concurrent mark phase.
+//
+//go:nowritebarrierrec
+func (w *gcWork) dispose() {
+ if wbuf := w.wbuf1; wbuf != nil {
+ if wbuf.nobj == 0 {
+ putempty(wbuf)
+ } else {
+ putfull(wbuf)
+ w.flushedWork = true
+ }
+ w.wbuf1 = nil
+
+ wbuf = w.wbuf2
+ if wbuf.nobj == 0 {
+ putempty(wbuf)
+ } else {
+ putfull(wbuf)
+ w.flushedWork = true
+ }
+ w.wbuf2 = nil
+ }
+ if w.bytesMarked != 0 {
+ // dispose happens relatively infrequently. If this
+ // atomic becomes a problem, we should first try to
+ // dispose less and if necessary aggregate in a per-P
+ // counter.
+ atomic.Xadd64(&work.bytesMarked, int64(w.bytesMarked))
+ w.bytesMarked = 0
+ }
+ if w.scanWork != 0 {
+ atomic.Xaddint64(&gcController.scanWork, w.scanWork)
+ w.scanWork = 0
+ }
+}
+
+// balance moves some work that's cached in this gcWork back on the
+// global queue.
+//go:nowritebarrierrec
+func (w *gcWork) balance() {
+ if w.wbuf1 == nil {
+ return
+ }
+ if wbuf := w.wbuf2; wbuf.nobj != 0 {
+ putfull(wbuf)
+ w.flushedWork = true
+ w.wbuf2 = getempty()
+ } else if wbuf := w.wbuf1; wbuf.nobj > 4 {
+ w.wbuf1 = handoff(wbuf)
+ w.flushedWork = true // handoff did putfull
+ } else {
+ return
+ }
+ // We flushed a buffer to the full list, so wake a worker.
+ if gcphase == _GCmark {
+ gcController.enlistWorker()
+ }
+}
+
+// empty reports whether w has no mark work available.
+//go:nowritebarrierrec
+func (w *gcWork) empty() bool {
+ return w.wbuf1 == nil || (w.wbuf1.nobj == 0 && w.wbuf2.nobj == 0)
+}
+
+// Internally, the GC work pool is kept in arrays in work buffers.
+// The gcWork interface caches a work buffer until full (or empty) to
+// avoid contending on the global work buffer lists.
+
+type workbufhdr struct {
+ node lfnode // must be first
+ nobj int
+}
+
+//go:notinheap
+type workbuf struct {
+ workbufhdr
+ // account for the above fields
+ obj [(_WorkbufSize - unsafe.Sizeof(workbufhdr{})) / sys.PtrSize]uintptr
+}
+
+// workbuf factory routines. These funcs are used to manage the
+// workbufs.
+// If the GC asks for some work, these are the only routines that
+// make wbufs available to the GC.
+
+func (b *workbuf) checknonempty() {
+ if b.nobj == 0 {
+ throw("workbuf is empty")
+ }
+}
+
+func (b *workbuf) checkempty() {
+ if b.nobj != 0 {
+ throw("workbuf is not empty")
+ }
+}
+
+// getempty pops an empty work buffer off the work.empty list,
+// allocating new buffers if none are available.
+//go:nowritebarrier
+func getempty() *workbuf {
+ var b *workbuf
+ if work.empty != 0 {
+ b = (*workbuf)(work.empty.pop())
+ if b != nil {
+ b.checkempty()
+ }
+ }
+ // Record that this may acquire the wbufSpans or heap lock to
+ // allocate a workbuf.
+ lockWithRankMayAcquire(&work.wbufSpans.lock, lockRankWbufSpans)
+ lockWithRankMayAcquire(&mheap_.lock, lockRankMheap)
+ if b == nil {
+ // Allocate more workbufs.
+ var s *mspan
+ if work.wbufSpans.free.first != nil {
+ lock(&work.wbufSpans.lock)
+ s = work.wbufSpans.free.first
+ if s != nil {
+ work.wbufSpans.free.remove(s)
+ work.wbufSpans.busy.insert(s)
+ }
+ unlock(&work.wbufSpans.lock)
+ }
+ if s == nil {
+ systemstack(func() {
+ s = mheap_.allocManual(workbufAlloc/pageSize, spanAllocWorkBuf)
+ })
+ if s == nil {
+ throw("out of memory")
+ }
+ // Record the new span in the busy list.
+ lock(&work.wbufSpans.lock)
+ work.wbufSpans.busy.insert(s)
+ unlock(&work.wbufSpans.lock)
+ }
+ // Slice up the span into new workbufs. Return one and
+ // put the rest on the empty list.
+ for i := uintptr(0); i+_WorkbufSize <= workbufAlloc; i += _WorkbufSize {
+ newb := (*workbuf)(unsafe.Pointer(s.base() + i))
+ newb.nobj = 0
+ lfnodeValidate(&newb.node)
+ if i == 0 {
+ b = newb
+ } else {
+ putempty(newb)
+ }
+ }
+ }
+ return b
+}
+
+// putempty puts a workbuf onto the work.empty list.
+// Upon entry this goroutine owns b. The lfstack.push relinquishes ownership.
+//go:nowritebarrier
+func putempty(b *workbuf) {
+ b.checkempty()
+ work.empty.push(&b.node)
+}
+
+// putfull puts the workbuf on the work.full list for the GC.
+// putfull accepts partially full buffers so the GC can avoid competing
+// with the mutators for ownership of partially full buffers.
+//go:nowritebarrier
+func putfull(b *workbuf) {
+ b.checknonempty()
+ work.full.push(&b.node)
+}
+
+// trygetfull tries to get a full or partially empty work buffer.
+// If one is not immediately available, it returns nil.
+//go:nowritebarrier
+func trygetfull() *workbuf {
+ b := (*workbuf)(work.full.pop())
+ if b != nil {
+ b.checknonempty()
+ return b
+ }
+ return b
+}
+
+//go:nowritebarrier
+func handoff(b *workbuf) *workbuf {
+ // Make new buffer with half of b's pointers.
+ b1 := getempty()
+ n := b.nobj / 2
+ b.nobj -= n
+ b1.nobj = n
+ memmove(unsafe.Pointer(&b1.obj[0]), unsafe.Pointer(&b.obj[b.nobj]), uintptr(n)*unsafe.Sizeof(b1.obj[0]))
+
+ // Put b on full list - let first half of b get stolen.
+ putfull(b)
+ return b1
+}
+
+// prepareFreeWorkbufs moves busy workbuf spans to the free list so they
+// can be freed to the heap. This must only be called when all
+// workbufs are on the empty list.
+func prepareFreeWorkbufs() {
+ lock(&work.wbufSpans.lock)
+ if work.full != 0 {
+ throw("cannot free workbufs when work.full != 0")
+ }
+ // Since all workbufs are on the empty list, we don't care
+ // which ones are in which spans. We can wipe the entire empty
+ // list and move all workbuf spans to the free list.
+ work.empty = 0
+ work.wbufSpans.free.takeAll(&work.wbufSpans.busy)
+ unlock(&work.wbufSpans.lock)
+}
+
+// freeSomeWbufs frees some workbufs back to the heap and returns
+// true if it should be called again to free more.
+func freeSomeWbufs(preemptible bool) bool {
+ const batchSize = 64 // ~1–2 µs per span.
+ lock(&work.wbufSpans.lock)
+ if gcphase != _GCoff || work.wbufSpans.free.isEmpty() {
+ unlock(&work.wbufSpans.lock)
+ return false
+ }
+ systemstack(func() {
+ gp := getg().m.curg
+ for i := 0; i < batchSize && !(preemptible && gp.preempt); i++ {
+ span := work.wbufSpans.free.first
+ if span == nil {
+ break
+ }
+ work.wbufSpans.free.remove(span)
+ mheap_.freeManual(span, spanAllocWorkBuf)
+ }
+ })
+ more := !work.wbufSpans.free.isEmpty()
+ unlock(&work.wbufSpans.lock)
+ return more
+}
diff --git a/src/runtime/mheap.go b/src/runtime/mheap.go
new file mode 100644
index 0000000..1855330
--- /dev/null
+++ b/src/runtime/mheap.go
@@ -0,0 +1,2047 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Page heap.
+//
+// See malloc.go for overview.
+
+package runtime
+
+import (
+ "internal/cpu"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ // minPhysPageSize is a lower-bound on the physical page size. The
+ // true physical page size may be larger than this. In contrast,
+ // sys.PhysPageSize is an upper-bound on the physical page size.
+ minPhysPageSize = 4096
+
+ // maxPhysPageSize is the maximum page size the runtime supports.
+ maxPhysPageSize = 512 << 10
+
+ // maxPhysHugePageSize sets an upper-bound on the maximum huge page size
+ // that the runtime supports.
+ maxPhysHugePageSize = pallocChunkBytes
+
+ // pagesPerReclaimerChunk indicates how many pages to scan from the
+ // pageInUse bitmap at a time. Used by the page reclaimer.
+ //
+ // Higher values reduce contention on scanning indexes (such as
+ // h.reclaimIndex), but increase the minimum latency of the
+ // operation.
+ //
+ // The time required to scan this many pages can vary a lot depending
+ // on how many spans are actually freed. Experimentally, it can
+ // scan for pages at ~300 GB/ms on a 2.6GHz Core i7, but can only
+ // free spans at ~32 MB/ms. Using 512 pages bounds this at
+ // roughly 100µs.
+ //
+ // Must be a multiple of the pageInUse bitmap element size and
+ // must also evenly divide pagesPerArena.
+ pagesPerReclaimerChunk = 512
+
+ // physPageAlignedStacks indicates whether stack allocations must be
+ // physical page aligned. This is a requirement for MAP_STACK on
+ // OpenBSD.
+ physPageAlignedStacks = GOOS == "openbsd"
+)
+
+// Main malloc heap.
+// The heap itself is the "free" and "scav" treaps,
+// but all the other global data is here too.
+//
+// mheap must not be heap-allocated because it contains mSpanLists,
+// which must not be heap-allocated.
+//
+//go:notinheap
+type mheap struct {
+ // lock must only be acquired on the system stack, otherwise a g
+ // could self-deadlock if its stack grows with the lock held.
+ lock mutex
+ pages pageAlloc // page allocation data structure
+ sweepgen uint32 // sweep generation, see comment in mspan; written during STW
+ sweepdone uint32 // all spans are swept
+ sweepers uint32 // number of active sweepone calls
+
+ // allspans is a slice of all mspans ever created. Each mspan
+ // appears exactly once.
+ //
+ // The memory for allspans is manually managed and can be
+ // reallocated and moved as the heap grows.
+ //
+ // In general, allspans is protected by mheap_.lock, which
+ // prevents concurrent access as well as freeing the backing
+ // store. Accesses during STW might not hold the lock, but
+ // must ensure that allocation cannot happen around the
+ // access (since that may free the backing store).
+ allspans []*mspan // all spans out there
+
+ _ uint32 // align uint64 fields on 32-bit for atomics
+
+ // Proportional sweep
+ //
+ // These parameters represent a linear function from heap_live
+ // to page sweep count. The proportional sweep system works to
+ // stay in the black by keeping the current page sweep count
+ // above this line at the current heap_live.
+ //
+ // The line has slope sweepPagesPerByte and passes through a
+ // basis point at (sweepHeapLiveBasis, pagesSweptBasis). At
+ // any given time, the system is at (memstats.heap_live,
+ // pagesSwept) in this space.
+ //
+ // It's important that the line pass through a point we
+ // control rather than simply starting at a (0,0) origin
+ // because that lets us adjust sweep pacing at any time while
+ // accounting for current progress. If we could only adjust
+ // the slope, it would create a discontinuity in debt if any
+ // progress has already been made.
+ pagesInUse uint64 // pages of spans in stats mSpanInUse; updated atomically
+ pagesSwept uint64 // pages swept this cycle; updated atomically
+ pagesSweptBasis uint64 // pagesSwept to use as the origin of the sweep ratio; updated atomically
+ sweepHeapLiveBasis uint64 // value of heap_live to use as the origin of sweep ratio; written with lock, read without
+ sweepPagesPerByte float64 // proportional sweep ratio; written with lock, read without
+ // TODO(austin): pagesInUse should be a uintptr, but the 386
+ // compiler can't 8-byte align fields.
+
+ // scavengeGoal is the amount of total retained heap memory (measured by
+ // heapRetained) that the runtime will try to maintain by returning memory
+ // to the OS.
+ scavengeGoal uint64
+
+ // Page reclaimer state
+
+ // reclaimIndex is the page index in allArenas of next page to
+ // reclaim. Specifically, it refers to page (i %
+ // pagesPerArena) of arena allArenas[i / pagesPerArena].
+ //
+ // If this is >= 1<<63, the page reclaimer is done scanning
+ // the page marks.
+ //
+ // This is accessed atomically.
+ reclaimIndex uint64
+ // reclaimCredit is spare credit for extra pages swept. Since
+ // the page reclaimer works in large chunks, it may reclaim
+ // more than requested. Any spare pages released go to this
+ // credit pool.
+ //
+ // This is accessed atomically.
+ reclaimCredit uintptr
+
+ // arenas is the heap arena map. It points to the metadata for
+ // the heap for every arena frame of the entire usable virtual
+ // address space.
+ //
+ // Use arenaIndex to compute indexes into this array.
+ //
+ // For regions of the address space that are not backed by the
+ // Go heap, the arena map contains nil.
+ //
+ // Modifications are protected by mheap_.lock. Reads can be
+ // performed without locking; however, a given entry can
+ // transition from nil to non-nil at any time when the lock
+ // isn't held. (Entries never transition back to nil.)
+ //
+ // In general, this is a two-level mapping consisting of an L1
+ // map and possibly many L2 maps. This saves space when there
+ // are a huge number of arena frames. However, on many
+ // platforms (even 64-bit), arenaL1Bits is 0, making this
+ // effectively a single-level map. In this case, arenas[0]
+ // will never be nil.
+ arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena
+
+ // heapArenaAlloc is pre-reserved space for allocating heapArena
+ // objects. This is only used on 32-bit, where we pre-reserve
+ // this space to avoid interleaving it with the heap itself.
+ heapArenaAlloc linearAlloc
+
+ // arenaHints is a list of addresses at which to attempt to
+ // add more heap arenas. This is initially populated with a
+ // set of general hint addresses, and grown with the bounds of
+ // actual heap arena ranges.
+ arenaHints *arenaHint
+
+ // arena is a pre-reserved space for allocating heap arenas
+ // (the actual arenas). This is only used on 32-bit.
+ arena linearAlloc
+
+ // allArenas is the arenaIndex of every mapped arena. This can
+ // be used to iterate through the address space.
+ //
+ // Access is protected by mheap_.lock. However, since this is
+ // append-only and old backing arrays are never freed, it is
+ // safe to acquire mheap_.lock, copy the slice header, and
+ // then release mheap_.lock.
+ allArenas []arenaIdx
+
+ // sweepArenas is a snapshot of allArenas taken at the
+ // beginning of the sweep cycle. This can be read safely by
+ // simply blocking GC (by disabling preemption).
+ sweepArenas []arenaIdx
+
+ // markArenas is a snapshot of allArenas taken at the beginning
+ // of the mark cycle. Because allArenas is append-only, neither
+ // this slice nor its contents will change during the mark, so
+ // it can be read safely.
+ markArenas []arenaIdx
+
+ // curArena is the arena that the heap is currently growing
+ // into. This should always be physPageSize-aligned.
+ curArena struct {
+ base, end uintptr
+ }
+
+ _ uint32 // ensure 64-bit alignment of central
+
+ // central free lists for small size classes.
+ // the padding makes sure that the mcentrals are
+ // spaced CacheLinePadSize bytes apart, so that each mcentral.lock
+ // gets its own cache line.
+ // central is indexed by spanClass.
+ central [numSpanClasses]struct {
+ mcentral mcentral
+ pad [cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize]byte
+ }
+
+ spanalloc fixalloc // allocator for span*
+ cachealloc fixalloc // allocator for mcache*
+ specialfinalizeralloc fixalloc // allocator for specialfinalizer*
+ specialprofilealloc fixalloc // allocator for specialprofile*
+ speciallock mutex // lock for special record allocators.
+ arenaHintAlloc fixalloc // allocator for arenaHints
+
+ unused *specialfinalizer // never set, just here to force the specialfinalizer type into DWARF
+}
+
+var mheap_ mheap
+
+// A heapArena stores metadata for a heap arena. heapArenas are stored
+// outside of the Go heap and accessed via the mheap_.arenas index.
+//
+//go:notinheap
+type heapArena struct {
+ // bitmap stores the pointer/scalar bitmap for the words in
+ // this arena. See mbitmap.go for a description. Use the
+ // heapBits type to access this.
+ bitmap [heapArenaBitmapBytes]byte
+
+ // spans maps from virtual address page ID within this arena to *mspan.
+ // For allocated spans, their pages map to the span itself.
+ // For free spans, only the lowest and highest pages map to the span itself.
+ // Internal pages map to an arbitrary span.
+ // For pages that have never been allocated, spans entries are nil.
+ //
+ // Modifications are protected by mheap.lock. Reads can be
+ // performed without locking, but ONLY from indexes that are
+ // known to contain in-use or stack spans. This means there
+ // must not be a safe-point between establishing that an
+ // address is live and looking it up in the spans array.
+ spans [pagesPerArena]*mspan
+
+ // pageInUse is a bitmap that indicates which spans are in
+ // state mSpanInUse. This bitmap is indexed by page number,
+ // but only the bit corresponding to the first page in each
+ // span is used.
+ //
+ // Reads and writes are atomic.
+ pageInUse [pagesPerArena / 8]uint8
+
+ // pageMarks is a bitmap that indicates which spans have any
+ // marked objects on them. Like pageInUse, only the bit
+ // corresponding to the first page in each span is used.
+ //
+ // Writes are done atomically during marking. Reads are
+ // non-atomic and lock-free since they only occur during
+ // sweeping (and hence never race with writes).
+ //
+ // This is used to quickly find whole spans that can be freed.
+ //
+ // TODO(austin): It would be nice if this was uint64 for
+ // faster scanning, but we don't have 64-bit atomic bit
+ // operations.
+ pageMarks [pagesPerArena / 8]uint8
+
+ // pageSpecials is a bitmap that indicates which spans have
+ // specials (finalizers or other). Like pageInUse, only the bit
+ // corresponding to the first page in each span is used.
+ //
+ // Writes are done atomically whenever a special is added to
+ // a span and whenever the last special is removed from a span.
+ // Reads are done atomically to find spans containing specials
+ // during marking.
+ pageSpecials [pagesPerArena / 8]uint8
+
+ // checkmarks stores the debug.gccheckmark state. It is only
+ // used if debug.gccheckmark > 0.
+ checkmarks *checkmarksMap
+
+ // zeroedBase marks the first byte of the first page in this
+ // arena which hasn't been used yet and is therefore already
+ // zero. zeroedBase is relative to the arena base.
+ // Increases monotonically until it hits heapArenaBytes.
+ //
+ // This field is sufficient to determine if an allocation
+ // needs to be zeroed because the page allocator follows an
+ // address-ordered first-fit policy.
+ //
+ // Read atomically and written with an atomic CAS.
+ zeroedBase uintptr
+}
+
+// arenaHint is a hint for where to grow the heap arenas. See
+// mheap_.arenaHints.
+//
+//go:notinheap
+type arenaHint struct {
+ addr uintptr
+ down bool
+ next *arenaHint
+}
+
+// An mspan is a run of pages.
+//
+// When a mspan is in the heap free treap, state == mSpanFree
+// and heapmap(s->start) == span, heapmap(s->start+s->npages-1) == span.
+// If the mspan is in the heap scav treap, then in addition to the
+// above scavenged == true. scavenged == false in all other cases.
+//
+// When a mspan is allocated, state == mSpanInUse or mSpanManual
+// and heapmap(i) == span for all s->start <= i < s->start+s->npages.
+
+// Every mspan is in one doubly-linked list, either in the mheap's
+// busy list or one of the mcentral's span lists.
+
+// An mspan representing actual memory has state mSpanInUse,
+// mSpanManual, or mSpanFree. Transitions between these states are
+// constrained as follows:
+//
+// * A span may transition from free to in-use or manual during any GC
+// phase.
+//
+// * During sweeping (gcphase == _GCoff), a span may transition from
+// in-use to free (as a result of sweeping) or manual to free (as a
+// result of stacks being freed).
+//
+// * During GC (gcphase != _GCoff), a span *must not* transition from
+// manual or in-use to free. Because concurrent GC may read a pointer
+// and then look up its span, the span state must be monotonic.
+//
+// Setting mspan.state to mSpanInUse or mSpanManual must be done
+// atomically and only after all other span fields are valid.
+// Likewise, if inspecting a span is contingent on it being
+// mSpanInUse, the state should be loaded atomically and checked
+// before depending on other fields. This allows the garbage collector
+// to safely deal with potentially invalid pointers, since resolving
+// such pointers may race with a span being allocated.
+type mSpanState uint8
+
+const (
+ mSpanDead mSpanState = iota
+ mSpanInUse // allocated for garbage collected heap
+ mSpanManual // allocated for manual management (e.g., stack allocator)
+)
+
+// mSpanStateNames are the names of the span states, indexed by
+// mSpanState.
+var mSpanStateNames = []string{
+ "mSpanDead",
+ "mSpanInUse",
+ "mSpanManual",
+ "mSpanFree",
+}
+
+// mSpanStateBox holds an mSpanState and provides atomic operations on
+// it. This is a separate type to disallow accidental comparison or
+// assignment with mSpanState.
+type mSpanStateBox struct {
+ s mSpanState
+}
+
+func (b *mSpanStateBox) set(s mSpanState) {
+ atomic.Store8((*uint8)(&b.s), uint8(s))
+}
+
+func (b *mSpanStateBox) get() mSpanState {
+ return mSpanState(atomic.Load8((*uint8)(&b.s)))
+}
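
Aside (editor's sketch, not part of the patch): mSpanStateBox is the "state box" idiom — wrap a small enum in a struct so it can only be read and written through atomic accessors. A stand-alone version, using uint32 because sync/atomic exposes no 8-bit operations outside the runtime; all names are invented:

package main

import (
	"fmt"
	"sync/atomic"
)

type state uint32

type stateBox struct{ v uint32 }

func (b *stateBox) set(s state) { atomic.StoreUint32(&b.v, uint32(s)) }
func (b *stateBox) get() state  { return state(atomic.LoadUint32(&b.v)) }

const (
	dead state = iota
	inUse
)

func main() {
	var b stateBox
	b.set(inUse)
	fmt.Println(b.get() == inUse) // true
}
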
+
+// mSpanList heads a linked list of spans.
+//
+//go:notinheap
+type mSpanList struct {
+ first *mspan // first span in list, or nil if none
+ last *mspan // last span in list, or nil if none
+}
+
+//go:notinheap
+type mspan struct {
+ next *mspan // next span in list, or nil if none
+ prev *mspan // previous span in list, or nil if none
+ list *mSpanList // For debugging. TODO: Remove.
+
+ startAddr uintptr // address of first byte of span aka s.base()
+ npages uintptr // number of pages in span
+
+ manualFreeList gclinkptr // list of free objects in mSpanManual spans
+
+ // freeindex is the slot index between 0 and nelems at which to begin scanning
+ // for the next free object in this span.
+ // Each allocation scans allocBits starting at freeindex until it encounters a 0
+ // indicating a free object. freeindex is then adjusted so that subsequent scans begin
+ // just past the newly discovered free object.
+ //
+ // If freeindex == nelem, this span has no free objects.
+ //
+ // allocBits is a bitmap of objects in this span.
+ // If n >= freeindex and allocBits[n/8] & (1<<(n%8)) is 0
+ // then object n is free;
+ // otherwise, object n is allocated. Bits starting at nelem are
+ // undefined and should never be referenced.
+ //
+ // Object n starts at address n*elemsize + (start << pageShift).
+ freeindex uintptr
+ // TODO: Look up nelems from sizeclass and remove this field if it
+ // helps performance.
+ nelems uintptr // number of objects in the span.
+
+ // Cache of the allocBits at freeindex. allocCache is shifted
+ // such that the lowest bit corresponds to the bit freeindex.
+ // allocCache holds the complement of allocBits, thus allowing
+ // ctz (count trailing zero) to use it directly.
+ // allocCache may contain bits beyond s.nelems; the caller must ignore
+ // these.
+ allocCache uint64
+
+ // allocBits and gcmarkBits hold pointers to a span's mark and
+ // allocation bits. The pointers are 8 byte aligned.
+ // There are three arenas where this data is held.
+ // free: Dirty arenas that are no longer accessed
+ // and can be reused.
+ // next: Holds information to be used in the next GC cycle.
+ // current: Information being used during this GC cycle.
+ // previous: Information being used during the last GC cycle.
+ // A new GC cycle starts with the call to finishsweep_m.
+ // finishsweep_m moves the previous arena to the free arena,
+ // the current arena to the previous arena, and
+ // the next arena to the current arena.
+ // The next arena is populated as the spans request
+ // memory to hold gcmarkBits for the next GC cycle as well
+ // as allocBits for newly allocated spans.
+ //
+ // The pointer arithmetic is done "by hand" instead of using
+ // arrays to avoid bounds checks along critical performance
+ // paths.
+ // The sweep will free the old allocBits and set allocBits to the
+ // gcmarkBits. The gcmarkBits are replaced with a fresh zeroed
+ // out memory.
+ allocBits *gcBits
+ gcmarkBits *gcBits
+
+ // sweep generation:
+ // if sweepgen == h->sweepgen - 2, the span needs sweeping
+ // if sweepgen == h->sweepgen - 1, the span is currently being swept
+ // if sweepgen == h->sweepgen, the span is swept and ready to use
+ // if sweepgen == h->sweepgen + 1, the span was cached before sweep began and is still cached, and needs sweeping
+ // if sweepgen == h->sweepgen + 3, the span was swept and then cached and is still cached
+ // h->sweepgen is incremented by 2 after every GC
+
+ sweepgen uint32
+ divMul uint16 // for divide by elemsize - divMagic.mul
+ baseMask uint16 // if non-0, elemsize is a power of 2, & this will get object allocation base
+ allocCount uint16 // number of allocated objects
+ spanclass spanClass // size class and noscan (uint8)
+ state mSpanStateBox // mSpanInUse etc; accessed atomically (get/set methods)
+ needzero uint8 // needs to be zeroed before allocation
+ divShift uint8 // for divide by elemsize - divMagic.shift
+ divShift2 uint8 // for divide by elemsize - divMagic.shift2
+ elemsize uintptr // computed from sizeclass or from npages
+ limit uintptr // end of data in span
+ speciallock mutex // guards specials list
+ specials *special // linked list of special records sorted by offset.
+}
+
+func (s *mspan) base() uintptr {
+ return s.startAddr
+}
+
+func (s *mspan) layout() (size, n, total uintptr) {
+ total = s.npages << _PageShift
+ size = s.elemsize
+ if size > 0 {
+ n = total / size
+ }
+ return
+}
+
+// recordspan adds a newly allocated span to h.allspans.
+//
+// This only happens the first time a span is allocated from
+// mheap.spanalloc (it is not called when a span is reused).
+//
+// Write barriers are disallowed here because it can be called from
+// gcWork when allocating new workbufs. However, because it's an
+// indirect call from the fixalloc initializer, the compiler can't see
+// this.
+//
+// The heap lock must be held.
+//
+//go:nowritebarrierrec
+func recordspan(vh unsafe.Pointer, p unsafe.Pointer) {
+ h := (*mheap)(vh)
+ s := (*mspan)(p)
+
+ assertLockHeld(&h.lock)
+
+ if len(h.allspans) >= cap(h.allspans) {
+ n := 64 * 1024 / sys.PtrSize
+ if n < cap(h.allspans)*3/2 {
+ n = cap(h.allspans) * 3 / 2
+ }
+ var new []*mspan
+ sp := (*slice)(unsafe.Pointer(&new))
+ sp.array = sysAlloc(uintptr(n)*sys.PtrSize, &memstats.other_sys)
+ if sp.array == nil {
+ throw("runtime: cannot allocate memory")
+ }
+ sp.len = len(h.allspans)
+ sp.cap = n
+ if len(h.allspans) > 0 {
+ copy(new, h.allspans)
+ }
+ oldAllspans := h.allspans
+ *(*notInHeapSlice)(unsafe.Pointer(&h.allspans)) = *(*notInHeapSlice)(unsafe.Pointer(&new))
+ if len(oldAllspans) != 0 {
+ sysFree(unsafe.Pointer(&oldAllspans[0]), uintptr(cap(oldAllspans))*unsafe.Sizeof(oldAllspans[0]), &memstats.other_sys)
+ }
+ }
+ h.allspans = h.allspans[:len(h.allspans)+1]
+ h.allspans[len(h.allspans)-1] = s
+}
+
+// A spanClass represents the size class and noscan-ness of a span.
+//
+// Each size class has a noscan spanClass and a scan spanClass. The
+// noscan spanClass contains only noscan objects, which do not contain
+// pointers and thus do not need to be scanned by the garbage
+// collector.
+type spanClass uint8
+
+const (
+ numSpanClasses = _NumSizeClasses << 1
+ tinySpanClass = spanClass(tinySizeClass<<1 | 1)
+)
+
+func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
+ return spanClass(sizeclass<<1) | spanClass(bool2int(noscan))
+}
+
+func (sc spanClass) sizeclass() int8 {
+ return int8(sc >> 1)
+}
+
+func (sc spanClass) noscan() bool {
+ return sc&1 != 0
+}
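
Aside (stand-alone sketch mirroring the encoding above, compilable outside the runtime): the low bit of a spanClass carries the noscan flag and the remaining bits carry the size class, so packing and unpacking are single shift and mask operations:

package main

import "fmt"

type spanClass uint8

func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
	sc := spanClass(sizeclass << 1)
	if noscan {
		sc |= 1
	}
	return sc
}

func (sc spanClass) sizeclass() int8 { return int8(sc >> 1) }
func (sc spanClass) noscan() bool    { return sc&1 != 0 }

func main() {
	sc := makeSpanClass(5, true)
	fmt.Println(uint8(sc), sc.sizeclass(), sc.noscan()) // 11 5 true
}
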
+
+// arenaIndex returns the index into mheap_.arenas of the arena
+ // containing metadata for p. This index combines an index into the
+// L1 map and an index into the L2 map and should be used as
+// mheap_.arenas[ai.l1()][ai.l2()].
+//
+// If p is outside the range of valid heap addresses, either l1() or
+// l2() will be out of bounds.
+//
+// It is nosplit because it's called by spanOf and several other
+// nosplit functions.
+//
+//go:nosplit
+func arenaIndex(p uintptr) arenaIdx {
+ return arenaIdx((p - arenaBaseOffset) / heapArenaBytes)
+}
+
+// arenaBase returns the low address of the region covered by heap
+// arena i.
+func arenaBase(i arenaIdx) uintptr {
+ return uintptr(i)*heapArenaBytes + arenaBaseOffset
+}
+
+type arenaIdx uint
+
+func (i arenaIdx) l1() uint {
+ if arenaL1Bits == 0 {
+ // Let the compiler optimize this away if there's no
+ // L1 map.
+ return 0
+ } else {
+ return uint(i) >> arenaL1Shift
+ }
+}
+
+func (i arenaIdx) l2() uint {
+ if arenaL1Bits == 0 {
+ return uint(i)
+ } else {
+ return uint(i) & (1<<arenaL2Bits - 1)
+ }
+}
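
Aside (editor's sketch, not part of the patch): the l1/l2 split above is plain bit slicing of the arena index — the high bits select an L1 slot, the low bits an L2 slot. The bit width below is illustrative; the real values are platform constants:

package main

import "fmt"

const l2Bits = 16 // illustrative width of the L2 index

func split(i uint) (l1, l2 uint) {
	return i >> l2Bits, i & (1<<l2Bits - 1)
}

func main() {
	l1, l2 := split(0x5_1234)
	fmt.Printf("l1=%#x l2=%#x\n", l1, l2) // l1=0x5 l2=0x1234
}
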
+
+// inheap reports whether b is a pointer into a (potentially dead) heap object.
+// It returns false for pointers into mSpanManual spans.
+// Non-preemptible because it is used by write barriers.
+//go:nowritebarrier
+//go:nosplit
+func inheap(b uintptr) bool {
+ return spanOfHeap(b) != nil
+}
+
+// inHeapOrStack is a variant of inheap that returns true for pointers
+// into any allocated heap span.
+//
+//go:nowritebarrier
+//go:nosplit
+func inHeapOrStack(b uintptr) bool {
+ s := spanOf(b)
+ if s == nil || b < s.base() {
+ return false
+ }
+ switch s.state.get() {
+ case mSpanInUse, mSpanManual:
+ return b < s.limit
+ default:
+ return false
+ }
+}
+
+// spanOf returns the span of p. If p does not point into the heap
+// arena or no span has ever contained p, spanOf returns nil.
+//
+// If p does not point to allocated memory, this may return a non-nil
+// span that does *not* contain p. If this is a possibility, the
+// caller should either call spanOfHeap or check the span bounds
+// explicitly.
+//
+// Must be nosplit because it has callers that are nosplit.
+//
+//go:nosplit
+func spanOf(p uintptr) *mspan {
+ // This function looks big, but we use a lot of constant
+ // folding around arenaL1Bits to get it under the inlining
+ // budget. Also, many of the checks here are safety checks
+ // that Go needs to do anyway, so the generated code is quite
+ // short.
+ ri := arenaIndex(p)
+ if arenaL1Bits == 0 {
+ // If there's no L1, then ri.l1() can't be out of bounds but ri.l2() can.
+ if ri.l2() >= uint(len(mheap_.arenas[0])) {
+ return nil
+ }
+ } else {
+ // If there's an L1, then ri.l1() can be out of bounds but ri.l2() can't.
+ if ri.l1() >= uint(len(mheap_.arenas)) {
+ return nil
+ }
+ }
+ l2 := mheap_.arenas[ri.l1()]
+ if arenaL1Bits != 0 && l2 == nil { // Should never happen if there's no L1.
+ return nil
+ }
+ ha := l2[ri.l2()]
+ if ha == nil {
+ return nil
+ }
+ return ha.spans[(p/pageSize)%pagesPerArena]
+}
+
+// spanOfUnchecked is equivalent to spanOf, but the caller must ensure
+// that p points into an allocated heap arena.
+//
+// Must be nosplit because it has callers that are nosplit.
+//
+//go:nosplit
+func spanOfUnchecked(p uintptr) *mspan {
+ ai := arenaIndex(p)
+ return mheap_.arenas[ai.l1()][ai.l2()].spans[(p/pageSize)%pagesPerArena]
+}
+
+// spanOfHeap is like spanOf, but returns nil if p does not point to a
+// heap object.
+//
+// Must be nosplit because it has callers that are nosplit.
+//
+//go:nosplit
+func spanOfHeap(p uintptr) *mspan {
+ s := spanOf(p)
+ // s is nil if it's never been allocated. Otherwise, we check
+ // its state first because we don't trust this pointer, so we
+ // have to synchronize with span initialization. Then, it's
+ // still possible we picked up a stale span pointer, so we
+ // have to check the span's bounds.
+ if s == nil || s.state.get() != mSpanInUse || p < s.base() || p >= s.limit {
+ return nil
+ }
+ return s
+}
+
+// pageIndexOf returns the arena, page index, and page mask for pointer p.
+// The caller must ensure p is in the heap.
+func pageIndexOf(p uintptr) (arena *heapArena, pageIdx uintptr, pageMask uint8) {
+ ai := arenaIndex(p)
+ arena = mheap_.arenas[ai.l1()][ai.l2()]
+ pageIdx = ((p / pageSize) / 8) % uintptr(len(arena.pageInUse))
+ pageMask = byte(1 << ((p / pageSize) % 8))
+ return
+}
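
Aside (stand-alone sketch of the bitmap addressing used by pageIndexOf): a pointer maps to a page number, which maps to a byte index and a single-bit mask within a per-arena bitmap. The page size below is assumed to be 8 KiB for the example; the real value is a platform constant:

package main

import "fmt"

const pageSize = 8 << 10 // assume 8 KiB pages for the sketch

func bitFor(p uintptr) (byteIdx uintptr, mask uint8) {
	page := p / pageSize
	return page / 8, uint8(1) << (page % 8)
}

func main() {
	idx, mask := bitFor(3 * pageSize)
	fmt.Println(idx, mask) // 0 8
}
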
+
+// Initialize the heap.
+func (h *mheap) init() {
+ lockInit(&h.lock, lockRankMheap)
+ lockInit(&h.speciallock, lockRankMheapSpecial)
+
+ h.spanalloc.init(unsafe.Sizeof(mspan{}), recordspan, unsafe.Pointer(h), &memstats.mspan_sys)
+ h.cachealloc.init(unsafe.Sizeof(mcache{}), nil, nil, &memstats.mcache_sys)
+ h.specialfinalizeralloc.init(unsafe.Sizeof(specialfinalizer{}), nil, nil, &memstats.other_sys)
+ h.specialprofilealloc.init(unsafe.Sizeof(specialprofile{}), nil, nil, &memstats.other_sys)
+ h.arenaHintAlloc.init(unsafe.Sizeof(arenaHint{}), nil, nil, &memstats.other_sys)
+
+ // Don't zero mspan allocations. Background sweeping can
+ // inspect a span concurrently with allocating it, so it's
+ // important that the span's sweepgen survive across freeing
+ // and re-allocating a span to prevent background sweeping
+ // from improperly cas'ing it from 0.
+ //
+ // This is safe because mspan contains no heap pointers.
+ h.spanalloc.zero = false
+
+ // h->mapcache needs no init
+
+ for i := range h.central {
+ h.central[i].mcentral.init(spanClass(i))
+ }
+
+ h.pages.init(&h.lock, &memstats.gcMiscSys)
+}
+
+// reclaim sweeps and reclaims at least npage pages into the heap.
+// It is called before allocating npage pages to keep growth in check.
+//
+// reclaim implements the page-reclaimer half of the sweeper.
+//
+// h.lock must NOT be held.
+func (h *mheap) reclaim(npage uintptr) {
+ // TODO(austin): Half of the time spent freeing spans is in
+ // locking/unlocking the heap (even with low contention). We
+ // could make the slow path here several times faster by
+ // batching heap frees.
+
+ // Bail early if there's no more reclaim work.
+ if atomic.Load64(&h.reclaimIndex) >= 1<<63 {
+ return
+ }
+
+ // Disable preemption so the GC can't start while we're
+ // sweeping, so we can read h.sweepArenas, and so
+ // traceGCSweepStart/Done pair on the P.
+ mp := acquirem()
+
+ if trace.enabled {
+ traceGCSweepStart()
+ }
+
+ arenas := h.sweepArenas
+ locked := false
+ for npage > 0 {
+ // Pull from accumulated credit first.
+ if credit := atomic.Loaduintptr(&h.reclaimCredit); credit > 0 {
+ take := credit
+ if take > npage {
+ // Take only what we need.
+ take = npage
+ }
+ if atomic.Casuintptr(&h.reclaimCredit, credit, credit-take) {
+ npage -= take
+ }
+ continue
+ }
+
+ // Claim a chunk of work.
+ idx := uintptr(atomic.Xadd64(&h.reclaimIndex, pagesPerReclaimerChunk) - pagesPerReclaimerChunk)
+ if idx/pagesPerArena >= uintptr(len(arenas)) {
+ // Page reclaiming is done.
+ atomic.Store64(&h.reclaimIndex, 1<<63)
+ break
+ }
+
+ if !locked {
+ // Lock the heap for reclaimChunk.
+ lock(&h.lock)
+ locked = true
+ }
+
+ // Scan this chunk.
+ nfound := h.reclaimChunk(arenas, idx, pagesPerReclaimerChunk)
+ if nfound <= npage {
+ npage -= nfound
+ } else {
+ // Put spare pages toward global credit.
+ atomic.Xadduintptr(&h.reclaimCredit, nfound-npage)
+ npage = 0
+ }
+ }
+ if locked {
+ unlock(&h.lock)
+ }
+
+ if trace.enabled {
+ traceGCSweepDone()
+ }
+ releasem(mp)
+}
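
Aside (editor's sketch, not part of the patch): reclaim combines two lock-free bookkeeping tricks — claim fixed-size chunks of a shared index with an atomic add, and bank any surplus work into a credit counter that later callers drain with a CAS. A stand-alone version of that scheme, with all names and sizes invented:

package main

import (
	"fmt"
	"sync/atomic"
)

const chunk = 512

var (
	index  uint64 // next unit of work to claim
	credit uint64 // surplus work already done by earlier callers
)

// reclaim performs at least need units of work, or stops when totalWork is exhausted.
func reclaim(need, totalWork uint64) {
	for need > 0 {
		// Draw from banked credit first.
		if c := atomic.LoadUint64(&credit); c > 0 {
			take := c
			if take > need {
				take = need
			}
			if atomic.CompareAndSwapUint64(&credit, c, c-take) {
				need -= take
			}
			continue
		}
		// Claim a whole chunk of the shared index.
		start := atomic.AddUint64(&index, chunk) - chunk
		if start >= totalWork {
			return // nothing left to claim
		}
		done := uint64(chunk) // pretend the whole chunk produced useful work
		if done <= need {
			need -= done
		} else {
			atomic.AddUint64(&credit, done-need) // bank the surplus
			need = 0
		}
	}
}

func main() {
	reclaim(600, 4096)
	fmt.Println(atomic.LoadUint64(&index), atomic.LoadUint64(&credit)) // 1024 424
}
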
+
+// reclaimChunk sweeps unmarked spans that start at page indexes [pageIdx, pageIdx+n).
+// It returns the number of pages returned to the heap.
+//
+// h.lock must be held and the caller must be non-preemptible. Note: h.lock may be
+// temporarily unlocked and re-locked in order to do sweeping or if tracing is
+// enabled.
+func (h *mheap) reclaimChunk(arenas []arenaIdx, pageIdx, n uintptr) uintptr {
+ // The heap lock must be held because this accesses the
+ // heapArena.spans arrays using potentially non-live pointers.
+ // In particular, if a span were freed and merged concurrently
+ // with this probing heapArena.spans, it would be possible to
+ // observe arbitrary, stale span pointers.
+ assertLockHeld(&h.lock)
+
+ n0 := n
+ var nFreed uintptr
+ sg := h.sweepgen
+ for n > 0 {
+ ai := arenas[pageIdx/pagesPerArena]
+ ha := h.arenas[ai.l1()][ai.l2()]
+
+ // Get a chunk of the bitmap to work on.
+ arenaPage := uint(pageIdx % pagesPerArena)
+ inUse := ha.pageInUse[arenaPage/8:]
+ marked := ha.pageMarks[arenaPage/8:]
+ if uintptr(len(inUse)) > n/8 {
+ inUse = inUse[:n/8]
+ marked = marked[:n/8]
+ }
+
+ // Scan this bitmap chunk for spans that are in-use
+ // but have no marked objects on them.
+ for i := range inUse {
+ inUseUnmarked := atomic.Load8(&inUse[i]) &^ marked[i]
+ if inUseUnmarked == 0 {
+ continue
+ }
+
+ for j := uint(0); j < 8; j++ {
+ if inUseUnmarked&(1<<j) != 0 {
+ s := ha.spans[arenaPage+uint(i)*8+j]
+ if atomic.Load(&s.sweepgen) == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
+ npages := s.npages
+ unlock(&h.lock)
+ if s.sweep(false) {
+ nFreed += npages
+ }
+ lock(&h.lock)
+ // Reload inUse. It's possible nearby
+ // spans were freed when we dropped the
+ // lock and we don't want to get stale
+ // pointers from the spans array.
+ inUseUnmarked = atomic.Load8(&inUse[i]) &^ marked[i]
+ }
+ }
+ }
+ }
+
+ // Advance.
+ pageIdx += uintptr(len(inUse) * 8)
+ n -= uintptr(len(inUse) * 8)
+ }
+ if trace.enabled {
+ unlock(&h.lock)
+ // Account for pages scanned but not reclaimed.
+ traceGCSweepSpan((n0 - nFreed) * pageSize)
+ lock(&h.lock)
+ }
+
+ assertLockHeld(&h.lock) // Must be locked on return.
+ return nFreed
+}
+
+// spanAllocType represents the type of allocation to make, or
+// the type of allocation to be freed.
+type spanAllocType uint8
+
+const (
+ spanAllocHeap spanAllocType = iota // heap span
+ spanAllocStack // stack span
+ spanAllocPtrScalarBits // unrolled GC prog bitmap span
+ spanAllocWorkBuf // work buf span
+)
+
+// manual returns true if the span allocation is manually managed.
+func (s spanAllocType) manual() bool {
+ return s != spanAllocHeap
+}
+
+// alloc allocates a new span of npage pages from the GC'd heap.
+//
+// spanclass indicates the span's size class and scannability.
+//
+// If needzero is true, the memory for the returned span will be zeroed.
+func (h *mheap) alloc(npages uintptr, spanclass spanClass, needzero bool) *mspan {
+ // Don't do any operations that lock the heap on the G stack.
+ // It might trigger stack growth, and the stack growth code needs
+ // to be able to allocate heap.
+ var s *mspan
+ systemstack(func() {
+ // To prevent excessive heap growth, before allocating n pages
+ // we need to sweep and reclaim at least n pages.
+ if h.sweepdone == 0 {
+ h.reclaim(npages)
+ }
+ s = h.allocSpan(npages, spanAllocHeap, spanclass)
+ })
+
+ if s != nil {
+ if needzero && s.needzero != 0 {
+ memclrNoHeapPointers(unsafe.Pointer(s.base()), s.npages<<_PageShift)
+ }
+ s.needzero = 0
+ }
+ return s
+}
+
+// allocManual allocates a manually-managed span of npage pages.
+// allocManual returns nil if allocation fails.
+//
+// allocManual adds the bytes used to *stat, which should be a
+// memstats in-use field. Unlike allocations in the GC'd heap, the
+// allocation does *not* count toward heap_inuse or heap_sys.
+//
+// The memory backing the returned span may not be zeroed if
+// span.needzero is set.
+//
+// allocManual must be called on the system stack because it may
+// acquire the heap lock via allocSpan. See mheap for details.
+//
+// If new code is written to call allocManual, do NOT use an
+// existing spanAllocType value and instead declare a new one.
+//
+//go:systemstack
+func (h *mheap) allocManual(npages uintptr, typ spanAllocType) *mspan {
+ if !typ.manual() {
+ throw("manual span allocation called with non-manually-managed type")
+ }
+ return h.allocSpan(npages, typ, 0)
+}
+
+// setSpans modifies the span map so [spanOf(base), spanOf(base+npage*pageSize))
+// is s.
+func (h *mheap) setSpans(base, npage uintptr, s *mspan) {
+ p := base / pageSize
+ ai := arenaIndex(base)
+ ha := h.arenas[ai.l1()][ai.l2()]
+ for n := uintptr(0); n < npage; n++ {
+ i := (p + n) % pagesPerArena
+ if i == 0 {
+ ai = arenaIndex(base + n*pageSize)
+ ha = h.arenas[ai.l1()][ai.l2()]
+ }
+ ha.spans[i] = s
+ }
+}
+
+// allocNeedsZero checks if the region of address space [base, base+npage*pageSize),
+// assumed to be allocated, needs to be zeroed, updating heap arena metadata for
+// future allocations.
+//
+// This must be called each time pages are allocated from the heap, even if the page
+// allocator can otherwise prove the memory it's allocating is already zero because
+// they're fresh from the operating system. It updates heapArena metadata that is
+// critical for future page allocations.
+//
+// There are no locking constraints on this method.
+func (h *mheap) allocNeedsZero(base, npage uintptr) (needZero bool) {
+ for npage > 0 {
+ ai := arenaIndex(base)
+ ha := h.arenas[ai.l1()][ai.l2()]
+
+ zeroedBase := atomic.Loaduintptr(&ha.zeroedBase)
+ arenaBase := base % heapArenaBytes
+ if arenaBase < zeroedBase {
+ // We extended into the non-zeroed part of the
+ // arena, so this region needs to be zeroed before use.
+ //
+ // zeroedBase is monotonically increasing, so if we see this now then
+ // we can be sure we need to zero this memory region.
+ //
+ // We still need to update zeroedBase for this arena, and
+ // potentially more arenas.
+ needZero = true
+ }
+ // We may observe arenaBase > zeroedBase if we're racing with one or more
+ // allocations which are acquiring memory directly before us in the address
+ // space. But, because we know no one else is acquiring *this* memory, it's
+ // still safe to not zero.
+
+ // Compute how far we extend into the arena, capped
+ // at heapArenaBytes.
+ arenaLimit := arenaBase + npage*pageSize
+ if arenaLimit > heapArenaBytes {
+ arenaLimit = heapArenaBytes
+ }
+ // Increase ha.zeroedBase so it's >= arenaLimit.
+ // We may be racing with other updates.
+ for arenaLimit > zeroedBase {
+ if atomic.Casuintptr(&ha.zeroedBase, zeroedBase, arenaLimit) {
+ break
+ }
+ zeroedBase = atomic.Loaduintptr(&ha.zeroedBase)
+ // Sanity check zeroedBase.
+ if zeroedBase <= arenaLimit && zeroedBase > arenaBase {
+ // The zeroedBase moved into the space we were trying to
+ // claim. That's very bad, and indicates someone allocated
+ // the same region we did.
+ throw("potentially overlapping in-use allocations detected")
+ }
+ }
+
+ // Move base forward and subtract from npage to move into
+ // the next arena, or finish.
+ base += arenaLimit - arenaBase
+ npage -= (arenaLimit - arenaBase) / pageSize
+ }
+ return
+}
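
Aside (editor's sketch, not part of the patch): the zeroedBase logic above is a monotonic-watermark idiom — advance a high-water mark with a CAS loop, and treat any allocation that dips below the current mark as needing zeroing. A stand-alone version with invented names:

package main

import (
	"fmt"
	"sync/atomic"
)

var zeroedBase uintptr // first offset never handed out before (known zero)

// claim reports whether [base, base+size) needs zeroing and raises the watermark.
func claim(base, size uintptr) (needZero bool) {
	if base < atomic.LoadUintptr(&zeroedBase) {
		needZero = true // we dipped into previously used, possibly dirty memory
	}
	limit := base + size
	for {
		cur := atomic.LoadUintptr(&zeroedBase)
		if cur >= limit || atomic.CompareAndSwapUintptr(&zeroedBase, cur, limit) {
			return needZero
		}
	}
}

func main() {
	fmt.Println(claim(0, 4096))    // false: fresh memory
	fmt.Println(claim(1024, 4096)) // true: overlaps the already-used prefix
}
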
+
+// tryAllocMSpan attempts to allocate an mspan object from
+// the P-local cache, but may fail.
+//
+// h.lock need not be held.
+//
+ // The caller must ensure that its P won't change underneath
+ // it during this function. Currently we enforce this by requiring
+ // that the function run on the system stack, because that's
+ // the only place it is used now. In the future, this requirement
+ // may be relaxed if its use is necessary elsewhere.
+//
+//go:systemstack
+func (h *mheap) tryAllocMSpan() *mspan {
+ pp := getg().m.p.ptr()
+ // If we don't have a p or the cache is empty, we can't do
+ // anything here.
+ if pp == nil || pp.mspancache.len == 0 {
+ return nil
+ }
+ // Pull off the last entry in the cache.
+ s := pp.mspancache.buf[pp.mspancache.len-1]
+ pp.mspancache.len--
+ return s
+}
+
+// allocMSpanLocked allocates an mspan object.
+//
+// h.lock must be held.
+//
+// allocMSpanLocked must be called on the system stack because
+// its caller holds the heap lock. See mheap for details.
+// Running on the system stack also ensures that we won't
+// switch Ps during this function. See tryAllocMSpan for details.
+//
+//go:systemstack
+func (h *mheap) allocMSpanLocked() *mspan {
+ assertLockHeld(&h.lock)
+
+ pp := getg().m.p.ptr()
+ if pp == nil {
+ // We don't have a p so just do the normal thing.
+ return (*mspan)(h.spanalloc.alloc())
+ }
+ // Refill the cache if necessary.
+ if pp.mspancache.len == 0 {
+ const refillCount = len(pp.mspancache.buf) / 2
+ for i := 0; i < refillCount; i++ {
+ pp.mspancache.buf[i] = (*mspan)(h.spanalloc.alloc())
+ }
+ pp.mspancache.len = refillCount
+ }
+ // Pull off the last entry in the cache.
+ s := pp.mspancache.buf[pp.mspancache.len-1]
+ pp.mspancache.len--
+ return s
+}
+
+ // freeMSpanLocked frees an mspan object.
+//
+// h.lock must be held.
+//
+// freeMSpanLocked must be called on the system stack because
+// its caller holds the heap lock. See mheap for details.
+// Running on the system stack also ensures that we won't
+// switch Ps during this function. See tryAllocMSpan for details.
+//
+//go:systemstack
+func (h *mheap) freeMSpanLocked(s *mspan) {
+ assertLockHeld(&h.lock)
+
+ pp := getg().m.p.ptr()
+ // First try to free the mspan directly to the cache.
+ if pp != nil && pp.mspancache.len < len(pp.mspancache.buf) {
+ pp.mspancache.buf[pp.mspancache.len] = s
+ pp.mspancache.len++
+ return
+ }
+ // Failing that (or if we don't have a p), just free it to
+ // the heap.
+ h.spanalloc.free(unsafe.Pointer(s))
+}
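
Aside (editor's sketch, not part of the patch): tryAllocMSpan, allocMSpanLocked, and freeMSpanLocked together implement a small per-P object cache — a fixed-size stack that absorbs most alloc/free traffic and falls back to a slower shared allocator when empty or full. A stand-alone version of that pattern, with all types and names invented:

package main

import "fmt"

type cache struct {
	buf [8]*int
	len int
}

func (c *cache) tryAlloc() *int {
	if c.len == 0 {
		return nil // caller falls back to the shared allocator
	}
	c.len--
	return c.buf[c.len]
}

func (c *cache) free(p *int) bool {
	if c.len == len(c.buf) {
		return false // cache full; caller frees to the shared allocator
	}
	c.buf[c.len] = p
	c.len++
	return true
}

func main() {
	var c cache
	x := new(int)
	fmt.Println(c.free(x), c.tryAlloc() == x) // true true
}
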
+
+// allocSpan allocates an mspan which owns npages worth of memory.
+//
+// If typ.manual() == false, allocSpan allocates a heap span of class spanclass
+ // and updates heap accounting. If typ.manual() == true, allocSpan allocates a
+// manually-managed span (spanclass is ignored), and the caller is
+// responsible for any accounting related to its use of the span. Either
+ // way, allocSpan atomically updates the memory statistics for the newly
+ // allocated span.
+//
+// The returned span is fully initialized.
+//
+// h.lock must not be held.
+//
+// allocSpan must be called on the system stack both because it acquires
+// the heap lock and because it must block GC transitions.
+//
+//go:systemstack
+func (h *mheap) allocSpan(npages uintptr, typ spanAllocType, spanclass spanClass) (s *mspan) {
+ // Function-global state.
+ gp := getg()
+ base, scav := uintptr(0), uintptr(0)
+
+ // On some platforms we need to provide physical page aligned stack
+ // allocations. Where the page size is less than the physical page
+ // size, we already manage to do this by default.
+ needPhysPageAlign := physPageAlignedStacks && typ == spanAllocStack && pageSize < physPageSize
+
+ // If the allocation is small enough, try the page cache!
+ // The page cache does not support aligned allocations, so we cannot use
+ // it if we need to provide a physical page aligned stack allocation.
+ pp := gp.m.p.ptr()
+ if !needPhysPageAlign && pp != nil && npages < pageCachePages/4 {
+ c := &pp.pcache
+
+ // If the cache is empty, refill it.
+ if c.empty() {
+ lock(&h.lock)
+ *c = h.pages.allocToCache()
+ unlock(&h.lock)
+ }
+
+ // Try to allocate from the cache.
+ base, scav = c.alloc(npages)
+ if base != 0 {
+ s = h.tryAllocMSpan()
+ if s != nil {
+ goto HaveSpan
+ }
+ // We have a base but no mspan, so we need
+ // to lock the heap.
+ }
+ }
+
+ // For one reason or another, we couldn't get the
+ // whole job done without the heap lock.
+ lock(&h.lock)
+
+ if needPhysPageAlign {
+ // Overallocate by a physical page to allow for later alignment.
+ npages += physPageSize / pageSize
+ }
+
+ if base == 0 {
+ // Try to acquire a base address.
+ base, scav = h.pages.alloc(npages)
+ if base == 0 {
+ if !h.grow(npages) {
+ unlock(&h.lock)
+ return nil
+ }
+ base, scav = h.pages.alloc(npages)
+ if base == 0 {
+ throw("grew heap, but no adequate free space found")
+ }
+ }
+ }
+ if s == nil {
+ // We failed to get an mspan earlier, so grab
+ // one now that we have the heap lock.
+ s = h.allocMSpanLocked()
+ }
+
+ if needPhysPageAlign {
+ allocBase, allocPages := base, npages
+ base = alignUp(allocBase, physPageSize)
+ npages -= physPageSize / pageSize
+
+ // Return memory around the aligned allocation.
+ spaceBefore := base - allocBase
+ if spaceBefore > 0 {
+ h.pages.free(allocBase, spaceBefore/pageSize)
+ }
+ spaceAfter := (allocPages-npages)*pageSize - spaceBefore
+ if spaceAfter > 0 {
+ h.pages.free(base+npages*pageSize, spaceAfter/pageSize)
+ }
+ }
+
+ unlock(&h.lock)
+
+HaveSpan:
+ // At this point, both s != nil and base != 0, and the heap
+ // lock is no longer held. Initialize the span.
+ s.init(base, npages)
+ if h.allocNeedsZero(base, npages) {
+ s.needzero = 1
+ }
+ nbytes := npages * pageSize
+ if typ.manual() {
+ s.manualFreeList = 0
+ s.nelems = 0
+ s.limit = s.base() + s.npages*pageSize
+ s.state.set(mSpanManual)
+ } else {
+ // We must set span properties before the span is published anywhere
+ // since we're not holding the heap lock.
+ s.spanclass = spanclass
+ if sizeclass := spanclass.sizeclass(); sizeclass == 0 {
+ s.elemsize = nbytes
+ s.nelems = 1
+
+ s.divShift = 0
+ s.divMul = 0
+ s.divShift2 = 0
+ s.baseMask = 0
+ } else {
+ s.elemsize = uintptr(class_to_size[sizeclass])
+ s.nelems = nbytes / s.elemsize
+
+ m := &class_to_divmagic[sizeclass]
+ s.divShift = m.shift
+ s.divMul = m.mul
+ s.divShift2 = m.shift2
+ s.baseMask = m.baseMask
+ }
+
+ // Initialize mark and allocation structures.
+ s.freeindex = 0
+ s.allocCache = ^uint64(0) // all 1s indicating all free.
+ s.gcmarkBits = newMarkBits(s.nelems)
+ s.allocBits = newAllocBits(s.nelems)
+
+ // It's safe to access h.sweepgen without the heap lock because it's
+ // only ever updated with the world stopped and we run on the
+ // systemstack which blocks a STW transition.
+ atomic.Store(&s.sweepgen, h.sweepgen)
+
+ // Now that the span is filled in, set its state. This
+ // is a publication barrier for the other fields in
+ // the span. While valid pointers into this span
+ // should never be visible until the span is returned,
+ // if the garbage collector finds an invalid pointer,
+ // access to the span may race with initialization of
+ // the span. We resolve this race by atomically
+ // setting the state after the span is fully
+ // initialized, and atomically checking the state in
+ // any situation where a pointer is suspect.
+ s.state.set(mSpanInUse)
+ }
+
+ // Commit and account for any scavenged memory that the span now owns.
+ if scav != 0 {
+ // sysUsed all the pages that are actually available
+ // in the span since some of them might be scavenged.
+ sysUsed(unsafe.Pointer(base), nbytes)
+ atomic.Xadd64(&memstats.heap_released, -int64(scav))
+ }
+ // Update stats.
+ if typ == spanAllocHeap {
+ atomic.Xadd64(&memstats.heap_inuse, int64(nbytes))
+ }
+ if typ.manual() {
+ // Manually managed memory doesn't count toward heap_sys.
+ memstats.heap_sys.add(-int64(nbytes))
+ }
+ // Update consistent stats.
+ stats := memstats.heapStats.acquire()
+ atomic.Xaddint64(&stats.committed, int64(scav))
+ atomic.Xaddint64(&stats.released, -int64(scav))
+ switch typ {
+ case spanAllocHeap:
+ atomic.Xaddint64(&stats.inHeap, int64(nbytes))
+ case spanAllocStack:
+ atomic.Xaddint64(&stats.inStacks, int64(nbytes))
+ case spanAllocPtrScalarBits:
+ atomic.Xaddint64(&stats.inPtrScalarBits, int64(nbytes))
+ case spanAllocWorkBuf:
+ atomic.Xaddint64(&stats.inWorkBufs, int64(nbytes))
+ }
+ memstats.heapStats.release()
+
+ // Publish the span in various locations.
+
+ // This is safe to call without the lock held because the slots
+ // related to this span will only ever be read or modified by
+ // this thread until pointers into the span are published (and
+ // we execute a publication barrier at the end of this function
+ // before that happens) or pageInUse is updated.
+ h.setSpans(s.base(), npages, s)
+
+ if !typ.manual() {
+ // Mark in-use span in arena page bitmap.
+ //
+ // This publishes the span to the page sweeper, so
+ // it's imperative that the span be completely initialized
+ // prior to this line.
+ arena, pageIdx, pageMask := pageIndexOf(s.base())
+ atomic.Or8(&arena.pageInUse[pageIdx], pageMask)
+
+ // Update related page sweeper stats.
+ atomic.Xadd64(&h.pagesInUse, int64(npages))
+ }
+
+ // Make sure the newly allocated span will be observed
+ // by the GC before pointers into the span are published.
+ publicationBarrier()
+
+ return s
+}
+
+// Try to add at least npage pages of memory to the heap,
+// returning whether it worked.
+//
+// h.lock must be held.
+func (h *mheap) grow(npage uintptr) bool {
+ assertLockHeld(&h.lock)
+
+ // We must grow the heap in whole palloc chunks.
+ ask := alignUp(npage, pallocChunkPages) * pageSize
+
+ totalGrowth := uintptr(0)
+ // This may overflow because ask could be very large
+ // and is otherwise unrelated to h.curArena.base.
+ end := h.curArena.base + ask
+ nBase := alignUp(end, physPageSize)
+ if nBase > h.curArena.end || /* overflow */ end < h.curArena.base {
+ // Not enough room in the current arena. Allocate more
+ // arena space. This may not be contiguous with the
+ // current arena, so we have to request the full ask.
+ av, asize := h.sysAlloc(ask)
+ if av == nil {
+ print("runtime: out of memory: cannot allocate ", ask, "-byte block (", memstats.heap_sys, " in use)\n")
+ return false
+ }
+
+ if uintptr(av) == h.curArena.end {
+ // The new space is contiguous with the old
+ // space, so just extend the current space.
+ h.curArena.end = uintptr(av) + asize
+ } else {
+ // The new space is discontiguous. Track what
+ // remains of the current space and switch to
+ // the new space. This should be rare.
+ if size := h.curArena.end - h.curArena.base; size != 0 {
+ h.pages.grow(h.curArena.base, size)
+ totalGrowth += size
+ }
+ // Switch to the new space.
+ h.curArena.base = uintptr(av)
+ h.curArena.end = uintptr(av) + asize
+ }
+
+ // The memory just allocated counts as both released
+ // and idle, even though it's not yet backed by spans.
+ //
+ // The allocation is always aligned to the heap arena
+ // size, which is always > physPageSize, so it's safe to
+ // just add directly to heap_released.
+ atomic.Xadd64(&memstats.heap_released, int64(asize))
+ stats := memstats.heapStats.acquire()
+ atomic.Xaddint64(&stats.released, int64(asize))
+ memstats.heapStats.release()
+
+ // Recalculate nBase.
+ // We know this won't overflow, because sysAlloc returned
+ // a valid region starting at h.curArena.base which is at
+ // least ask bytes in size.
+ nBase = alignUp(h.curArena.base+ask, physPageSize)
+ }
+
+ // Grow into the current arena.
+ v := h.curArena.base
+ h.curArena.base = nBase
+ h.pages.grow(v, nBase-v)
+ totalGrowth += nBase - v
+
+ // We just caused a heap growth, so scavenge down what will soon be used.
+ // By scavenging inline we deal with the failure to allocate out of
+ // memory fragments by scavenging the memory fragments that are least
+ // likely to be re-used.
+ if retained := heapRetained(); retained+uint64(totalGrowth) > h.scavengeGoal {
+ todo := totalGrowth
+ if overage := uintptr(retained + uint64(totalGrowth) - h.scavengeGoal); todo > overage {
+ todo = overage
+ }
+ h.pages.scavenge(todo, false)
+ }
+ return true
+}
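
Aside (editor's sketch, not part of the patch): grow rounds its request up to whole palloc chunks and physical pages using alignUp, which is not shown in this hunk. Assuming the usual power-of-two rounding identity, it behaves like this sketch:

package main

import "fmt"

// alignUp rounds n up to a multiple of a, where a is a power of two.
func alignUp(n, a uintptr) uintptr { return (n + a - 1) &^ (a - 1) }

func main() {
	fmt.Println(alignUp(5000, 4096), alignUp(8192, 4096)) // 8192 8192
}
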
+
+// Free the span back into the heap.
+func (h *mheap) freeSpan(s *mspan) {
+ systemstack(func() {
+ lock(&h.lock)
+ if msanenabled {
+ // Tell msan that this entire span is no longer in use.
+ base := unsafe.Pointer(s.base())
+ bytes := s.npages << _PageShift
+ msanfree(base, bytes)
+ }
+ h.freeSpanLocked(s, spanAllocHeap)
+ unlock(&h.lock)
+ })
+}
+
+// freeManual frees a manually-managed span returned by allocManual.
+// typ must be the same as the spanAllocType passed to the allocManual that
+// allocated s.
+//
+// This must only be called when gcphase == _GCoff. See mSpanState for
+// an explanation.
+//
+// freeManual must be called on the system stack because it acquires
+// the heap lock. See mheap for details.
+//
+//go:systemstack
+func (h *mheap) freeManual(s *mspan, typ spanAllocType) {
+ s.needzero = 1
+ lock(&h.lock)
+ h.freeSpanLocked(s, typ)
+ unlock(&h.lock)
+}
+
+func (h *mheap) freeSpanLocked(s *mspan, typ spanAllocType) {
+ assertLockHeld(&h.lock)
+
+ switch s.state.get() {
+ case mSpanManual:
+ if s.allocCount != 0 {
+ throw("mheap.freeSpanLocked - invalid stack free")
+ }
+ case mSpanInUse:
+ if s.allocCount != 0 || s.sweepgen != h.sweepgen {
+ print("mheap.freeSpanLocked - span ", s, " ptr ", hex(s.base()), " allocCount ", s.allocCount, " sweepgen ", s.sweepgen, "/", h.sweepgen, "\n")
+ throw("mheap.freeSpanLocked - invalid free")
+ }
+ atomic.Xadd64(&h.pagesInUse, -int64(s.npages))
+
+ // Clear in-use bit in arena page bitmap.
+ arena, pageIdx, pageMask := pageIndexOf(s.base())
+ atomic.And8(&arena.pageInUse[pageIdx], ^pageMask)
+ default:
+ throw("mheap.freeSpanLocked - invalid span state")
+ }
+
+ // Update stats.
+ //
+ // Mirrors the code in allocSpan.
+ nbytes := s.npages * pageSize
+ if typ == spanAllocHeap {
+ atomic.Xadd64(&memstats.heap_inuse, -int64(nbytes))
+ }
+ if typ.manual() {
+ // Manually managed memory doesn't count toward heap_sys, so add it back.
+ memstats.heap_sys.add(int64(nbytes))
+ }
+ // Update consistent stats.
+ stats := memstats.heapStats.acquire()
+ switch typ {
+ case spanAllocHeap:
+ atomic.Xaddint64(&stats.inHeap, -int64(nbytes))
+ case spanAllocStack:
+ atomic.Xaddint64(&stats.inStacks, -int64(nbytes))
+ case spanAllocPtrScalarBits:
+ atomic.Xaddint64(&stats.inPtrScalarBits, -int64(nbytes))
+ case spanAllocWorkBuf:
+ atomic.Xaddint64(&stats.inWorkBufs, -int64(nbytes))
+ }
+ memstats.heapStats.release()
+
+ // Mark the space as free.
+ h.pages.free(s.base(), s.npages)
+
+ // Free the span structure. We no longer have a use for it.
+ s.state.set(mSpanDead)
+ h.freeMSpanLocked(s)
+}
+
+// scavengeAll acquires the heap lock (blocking any additional
+// manipulation of the page allocator) and iterates over the whole
+// heap, scavenging every free page available.
+func (h *mheap) scavengeAll() {
+ // Disallow malloc or panic while holding the heap lock. We do
+ // this here because this is a non-mallocgc entry-point to
+ // the mheap API.
+ gp := getg()
+ gp.m.mallocing++
+ lock(&h.lock)
+ // Start a new scavenge generation so we have a chance to walk
+ // over the whole heap.
+ h.pages.scavengeStartGen()
+ released := h.pages.scavenge(^uintptr(0), false)
+ gen := h.pages.scav.gen
+ unlock(&h.lock)
+ gp.m.mallocing--
+
+ if debug.scavtrace > 0 {
+ printScavTrace(gen, released, true)
+ }
+}
+
+//go:linkname runtime_debug_freeOSMemory runtime/debug.freeOSMemory
+func runtime_debug_freeOSMemory() {
+ GC()
+ systemstack(func() { mheap_.scavengeAll() })
+}
+
+// Initialize a new span with the given start and npages.
+func (span *mspan) init(base uintptr, npages uintptr) {
+ // span is *not* zeroed.
+ span.next = nil
+ span.prev = nil
+ span.list = nil
+ span.startAddr = base
+ span.npages = npages
+ span.allocCount = 0
+ span.spanclass = 0
+ span.elemsize = 0
+ span.speciallock.key = 0
+ span.specials = nil
+ span.needzero = 0
+ span.freeindex = 0
+ span.allocBits = nil
+ span.gcmarkBits = nil
+ span.state.set(mSpanDead)
+ lockInit(&span.speciallock, lockRankMspanSpecial)
+}
+
+func (span *mspan) inList() bool {
+ return span.list != nil
+}
+
+// Initialize an empty doubly-linked list.
+func (list *mSpanList) init() {
+ list.first = nil
+ list.last = nil
+}
+
+func (list *mSpanList) remove(span *mspan) {
+ if span.list != list {
+ print("runtime: failed mSpanList.remove span.npages=", span.npages,
+ " span=", span, " prev=", span.prev, " span.list=", span.list, " list=", list, "\n")
+ throw("mSpanList.remove")
+ }
+ if list.first == span {
+ list.first = span.next
+ } else {
+ span.prev.next = span.next
+ }
+ if list.last == span {
+ list.last = span.prev
+ } else {
+ span.next.prev = span.prev
+ }
+ span.next = nil
+ span.prev = nil
+ span.list = nil
+}
+
+func (list *mSpanList) isEmpty() bool {
+ return list.first == nil
+}
+
+func (list *mSpanList) insert(span *mspan) {
+ if span.next != nil || span.prev != nil || span.list != nil {
+ println("runtime: failed mSpanList.insert", span, span.next, span.prev, span.list)
+ throw("mSpanList.insert")
+ }
+ span.next = list.first
+ if list.first != nil {
+ // The list contains at least one span; link it in.
+ // The last span in the list doesn't change.
+ list.first.prev = span
+ } else {
+ // The list contains no spans, so this is also the last span.
+ list.last = span
+ }
+ list.first = span
+ span.list = list
+}
+
+func (list *mSpanList) insertBack(span *mspan) {
+ if span.next != nil || span.prev != nil || span.list != nil {
+ println("runtime: failed mSpanList.insertBack", span, span.next, span.prev, span.list)
+ throw("mSpanList.insertBack")
+ }
+ span.prev = list.last
+ if list.last != nil {
+ // The list contains at least one span.
+ list.last.next = span
+ } else {
+ // The list contains no spans, so this is also the first span.
+ list.first = span
+ }
+ list.last = span
+ span.list = list
+}
+
+// takeAll removes all spans from other and inserts them at the front
+// of list.
+func (list *mSpanList) takeAll(other *mSpanList) {
+ if other.isEmpty() {
+ return
+ }
+
+ // Reparent everything in other to list.
+ for s := other.first; s != nil; s = s.next {
+ s.list = list
+ }
+
+ // Concatenate the lists.
+ if list.isEmpty() {
+ *list = *other
+ } else {
+ // Neither list is empty. Put other before list.
+ other.last.next = list.first
+ list.first.prev = other.last
+ list.first = other.first
+ }
+
+ other.first, other.last = nil, nil
+}
+
+const (
+ _KindSpecialFinalizer = 1
+ _KindSpecialProfile = 2
+ // Note: The finalizer special must be first because if we're freeing
+ // an object, a finalizer special will cause the freeing operation
+ // to abort, and we want to keep the other special records around
+ // if that happens.
+)
+
+//go:notinheap
+type special struct {
+ next *special // linked list in span
+ offset uint16 // span offset of object
+ kind byte // kind of special
+}
+
+// spanHasSpecials marks a span as having specials in the arena bitmap.
+func spanHasSpecials(s *mspan) {
+ arenaPage := (s.base() / pageSize) % pagesPerArena
+ ai := arenaIndex(s.base())
+ ha := mheap_.arenas[ai.l1()][ai.l2()]
+ atomic.Or8(&ha.pageSpecials[arenaPage/8], uint8(1)<<(arenaPage%8))
+}
+
+// spanHasNoSpecials marks a span as having no specials in the arena bitmap.
+func spanHasNoSpecials(s *mspan) {
+ arenaPage := (s.base() / pageSize) % pagesPerArena
+ ai := arenaIndex(s.base())
+ ha := mheap_.arenas[ai.l1()][ai.l2()]
+ atomic.And8(&ha.pageSpecials[arenaPage/8], ^(uint8(1) << (arenaPage % 8)))
+}
+
+// Adds the special record s to the list of special records for
+// the object p. All fields of s should be filled in except for
+// offset & next, which this routine will fill in.
+// Returns true if the special was successfully added, false otherwise.
+// (The add will fail only if a record with the same p and s->kind
+// already exists.)
+func addspecial(p unsafe.Pointer, s *special) bool {
+ span := spanOfHeap(uintptr(p))
+ if span == nil {
+ throw("addspecial on invalid pointer")
+ }
+
+ // Ensure that the span is swept.
+ // Sweeping accesses the specials list w/o locks, so we have
+ // to synchronize with it. And it's just much safer.
+ mp := acquirem()
+ span.ensureSwept()
+
+ offset := uintptr(p) - span.base()
+ kind := s.kind
+
+ lock(&span.speciallock)
+
+ // Find splice point, check for existing record.
+ t := &span.specials
+ for {
+ x := *t
+ if x == nil {
+ break
+ }
+ if offset == uintptr(x.offset) && kind == x.kind {
+ unlock(&span.speciallock)
+ releasem(mp)
+ return false // already exists
+ }
+ if offset < uintptr(x.offset) || (offset == uintptr(x.offset) && kind < x.kind) {
+ break
+ }
+ t = &x.next
+ }
+
+ // Splice in record, fill in offset.
+ s.offset = uint16(offset)
+ s.next = *t
+ *t = s
+ spanHasSpecials(span)
+ unlock(&span.speciallock)
+ releasem(mp)
+
+ return true
+}
+
+// Removes the Special record of the given kind for the object p.
+// Returns the record if the record existed, nil otherwise.
+// The caller must FixAlloc_Free the result.
+func removespecial(p unsafe.Pointer, kind uint8) *special {
+ span := spanOfHeap(uintptr(p))
+ if span == nil {
+ throw("removespecial on invalid pointer")
+ }
+
+ // Ensure that the span is swept.
+ // Sweeping accesses the specials list w/o locks, so we have
+ // to synchronize with it. And it's just much safer.
+ mp := acquirem()
+ span.ensureSwept()
+
+ offset := uintptr(p) - span.base()
+
+ var result *special
+ lock(&span.speciallock)
+ t := &span.specials
+ for {
+ s := *t
+ if s == nil {
+ break
+ }
+ // This function is used for finalizers only, so we don't check for
+ // "interior" specials (p must be exactly equal to s->offset).
+ if offset == uintptr(s.offset) && kind == s.kind {
+ *t = s.next
+ result = s
+ break
+ }
+ t = &s.next
+ }
+ if span.specials == nil {
+ spanHasNoSpecials(span)
+ }
+ unlock(&span.speciallock)
+ releasem(mp)
+ return result
+}
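
Aside (editor's sketch, not part of the patch): addspecial and removespecial walk the sorted specials list through a pointer-to-pointer (t := &span.specials; t = &x.next), which lets insertion and removal treat the head and interior links uniformly. A stand-alone version of that splice idiom, with an invented node type:

package main

import "fmt"

type node struct {
	key  int
	next *node
}

// insertSorted splices n into the list kept in ascending key order.
func insertSorted(head **node, n *node) {
	t := head
	for *t != nil && (*t).key < n.key {
		t = &(*t).next
	}
	n.next = *t
	*t = n
}

func main() {
	var head *node
	for _, k := range []int{30, 10, 20} {
		insertSorted(&head, &node{key: k})
	}
	for n := head; n != nil; n = n.next {
		fmt.Print(n.key, " ") // 10 20 30
	}
	fmt.Println()
}
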
+
+// The described object has a finalizer set for it.
+//
+// specialfinalizer is allocated from non-GC'd memory, so any heap
+// pointers must be specially handled.
+//
+//go:notinheap
+type specialfinalizer struct {
+ special special
+ fn *funcval // May be a heap pointer.
+ nret uintptr
+ fint *_type // May be a heap pointer, but always live.
+ ot *ptrtype // May be a heap pointer, but always live.
+}
+
+// Adds a finalizer to the object p. Returns true if it succeeded.
+func addfinalizer(p unsafe.Pointer, f *funcval, nret uintptr, fint *_type, ot *ptrtype) bool {
+ lock(&mheap_.speciallock)
+ s := (*specialfinalizer)(mheap_.specialfinalizeralloc.alloc())
+ unlock(&mheap_.speciallock)
+ s.special.kind = _KindSpecialFinalizer
+ s.fn = f
+ s.nret = nret
+ s.fint = fint
+ s.ot = ot
+ if addspecial(p, &s.special) {
+ // This is responsible for maintaining the same
+ // GC-related invariants as markrootSpans in any
+ // situation where it's possible that markrootSpans
+ // has already run but mark termination hasn't yet.
+ if gcphase != _GCoff {
+ base, _, _ := findObject(uintptr(p), 0, 0)
+ mp := acquirem()
+ gcw := &mp.p.ptr().gcw
+ // Mark everything reachable from the object
+ // so it's retained for the finalizer.
+ scanobject(base, gcw)
+ // Mark the finalizer itself, since the
+ // special isn't part of the GC'd heap.
+ scanblock(uintptr(unsafe.Pointer(&s.fn)), sys.PtrSize, &oneptrmask[0], gcw, nil)
+ releasem(mp)
+ }
+ return true
+ }
+
+ // There was an old finalizer
+ lock(&mheap_.speciallock)
+ mheap_.specialfinalizeralloc.free(unsafe.Pointer(s))
+ unlock(&mheap_.speciallock)
+ return false
+}
+
+// Removes the finalizer (if any) from the object p.
+func removefinalizer(p unsafe.Pointer) {
+ s := (*specialfinalizer)(unsafe.Pointer(removespecial(p, _KindSpecialFinalizer)))
+ if s == nil {
+ return // there wasn't a finalizer to remove
+ }
+ lock(&mheap_.speciallock)
+ mheap_.specialfinalizeralloc.free(unsafe.Pointer(s))
+ unlock(&mheap_.speciallock)
+}
+
+// The described object is being heap profiled.
+//
+//go:notinheap
+type specialprofile struct {
+ special special
+ b *bucket
+}
+
+// Set the heap profile bucket associated with addr to b.
+func setprofilebucket(p unsafe.Pointer, b *bucket) {
+ lock(&mheap_.speciallock)
+ s := (*specialprofile)(mheap_.specialprofilealloc.alloc())
+ unlock(&mheap_.speciallock)
+ s.special.kind = _KindSpecialProfile
+ s.b = b
+ if !addspecial(p, &s.special) {
+ throw("setprofilebucket: profile already set")
+ }
+}
+
+// Do whatever cleanup needs to be done to deallocate s. It has
+// already been unlinked from the mspan specials list.
+func freespecial(s *special, p unsafe.Pointer, size uintptr) {
+ switch s.kind {
+ case _KindSpecialFinalizer:
+ sf := (*specialfinalizer)(unsafe.Pointer(s))
+ queuefinalizer(p, sf.fn, sf.nret, sf.fint, sf.ot)
+ lock(&mheap_.speciallock)
+ mheap_.specialfinalizeralloc.free(unsafe.Pointer(sf))
+ unlock(&mheap_.speciallock)
+ case _KindSpecialProfile:
+ sp := (*specialprofile)(unsafe.Pointer(s))
+ mProf_Free(sp.b, size)
+ lock(&mheap_.speciallock)
+ mheap_.specialprofilealloc.free(unsafe.Pointer(sp))
+ unlock(&mheap_.speciallock)
+ default:
+ throw("bad special kind")
+ panic("not reached")
+ }
+}
+
+// gcBits is an alloc/mark bitmap. This is always used as *gcBits.
+//
+//go:notinheap
+type gcBits uint8
+
+// bytep returns a pointer to the n'th byte of b.
+func (b *gcBits) bytep(n uintptr) *uint8 {
+ return addb((*uint8)(b), n)
+}
+
+// bitp returns a pointer to the byte containing bit n and a mask for
+// selecting that bit from *bytep.
+func (b *gcBits) bitp(n uintptr) (bytep *uint8, mask uint8) {
+ return b.bytep(n / 8), 1 << (n % 8)
+}
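
To make the arithmetic concrete: bit 21 lands in byte 21/8 = 2 under mask 1<<(21%8) = 0x20. A tiny standalone check (the helper below mirrors bitp but is not runtime code):

package main

import "fmt"

// bitIndex mirrors gcBits.bitp: bit n lives in byte n/8 under mask 1<<(n%8).
func bitIndex(n uint) (byteIndex uint, mask uint8) {
	return n / 8, 1 << (n % 8)
}

func main() {
	b, m := bitIndex(21)
	fmt.Printf("byte %d, mask %#x\n", b, m) // byte 2, mask 0x20
}
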
+
+const gcBitsChunkBytes = uintptr(64 << 10)
+const gcBitsHeaderBytes = unsafe.Sizeof(gcBitsHeader{})
+
+type gcBitsHeader struct {
+ free uintptr // free is the index into bits of the next free byte.
+ next uintptr // *gcBits triggers recursive type bug. (issue 14620)
+}
+
+//go:notinheap
+type gcBitsArena struct {
+ // gcBitsHeader // side step recursive type bug (issue 14620) by including fields by hand.
+ free uintptr // free is the index into bits of the next free byte; read/write atomically
+ next *gcBitsArena
+ bits [gcBitsChunkBytes - gcBitsHeaderBytes]gcBits
+}
+
+var gcBitsArenas struct {
+ lock mutex
+ free *gcBitsArena
+ next *gcBitsArena // Read atomically. Write atomically under lock.
+ current *gcBitsArena
+ previous *gcBitsArena
+}
+
+// tryAlloc allocates from b or returns nil if b does not have enough room.
+// This is safe to call concurrently.
+func (b *gcBitsArena) tryAlloc(bytes uintptr) *gcBits {
+ if b == nil || atomic.Loaduintptr(&b.free)+bytes > uintptr(len(b.bits)) {
+ return nil
+ }
+ // Try to allocate from this block.
+ end := atomic.Xadduintptr(&b.free, bytes)
+ if end > uintptr(len(b.bits)) {
+ return nil
+ }
+ // There was enough room.
+ start := end - bytes
+ return &b.bits[start]
+}
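
tryAlloc is a lock-free bump allocator: the initial load only filters out obviously-full arenas, and the atomic add is what actually claims a range; a claim that overshoots the end is simply abandoned, so free may legitimately grow past len(b.bits). The same pattern on a plain byte slice, as an illustrative standalone sketch:

package main

import (
	"fmt"
	"sync/atomic"
)

// bumpArena hands out non-overlapping regions of buf with a single atomic
// add, like gcBitsArena.tryAlloc. A claim that overshoots the end is
// abandoned, so cursor may exceed len(buf); that is harmless.
type bumpArena struct {
	cursor uint64
	buf    []byte
}

func (a *bumpArena) tryAlloc(n uint64) []byte {
	end := atomic.AddUint64(&a.cursor, n)
	if end > uint64(len(a.buf)) {
		return nil // not enough room; the caller must fall back
	}
	return a.buf[end-n : end]
}

func main() {
	a := &bumpArena{buf: make([]byte, 64)}
	fmt.Println(len(a.tryAlloc(40)), a.tryAlloc(40) == nil) // 40 true
}
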
+
+// newMarkBits returns a pointer to 8 byte aligned bytes
+// to be used for a span's mark bits.
+func newMarkBits(nelems uintptr) *gcBits {
+ blocksNeeded := uintptr((nelems + 63) / 64)
+ bytesNeeded := blocksNeeded * 8
+
+ // Try directly allocating from the current head arena.
+ head := (*gcBitsArena)(atomic.Loadp(unsafe.Pointer(&gcBitsArenas.next)))
+ if p := head.tryAlloc(bytesNeeded); p != nil {
+ return p
+ }
+
+ // There's not enough room in the head arena. We may need to
+ // allocate a new arena.
+ lock(&gcBitsArenas.lock)
+ // Try the head arena again, since it may have changed. Now
+ // that we hold the lock, the list head can't change, but its
+ // free position still can.
+ if p := gcBitsArenas.next.tryAlloc(bytesNeeded); p != nil {
+ unlock(&gcBitsArenas.lock)
+ return p
+ }
+
+ // Allocate a new arena. This may temporarily drop the lock.
+ fresh := newArenaMayUnlock()
+ // If newArenaMayUnlock dropped the lock, another thread may
+ // have put a fresh arena on the "next" list. Try allocating
+ // from next again.
+ if p := gcBitsArenas.next.tryAlloc(bytesNeeded); p != nil {
+ // Put fresh back on the free list.
+ // TODO: Mark it "already zeroed"
+ fresh.next = gcBitsArenas.free
+ gcBitsArenas.free = fresh
+ unlock(&gcBitsArenas.lock)
+ return p
+ }
+
+ // Allocate from the fresh arena. We haven't linked it in yet, so
+ // this cannot race and is guaranteed to succeed.
+ p := fresh.tryAlloc(bytesNeeded)
+ if p == nil {
+ throw("markBits overflow")
+ }
+
+ // Add the fresh arena to the "next" list.
+ fresh.next = gcBitsArenas.next
+ atomic.StorepNoWB(unsafe.Pointer(&gcBitsArenas.next), unsafe.Pointer(fresh))
+
+ unlock(&gcBitsArenas.lock)
+ return p
+}
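
newMarkBits follows an optimistic-then-locked shape: try the published head without the lock, re-check it under the lock (the head cannot change while locked, but its free cursor can), and only then pay for a fresh arena, re-checking once more because newArenaMayUnlock may have dropped the lock. Reduced to its essentials with standard-library primitives; this is an illustrative sketch, not runtime code, and it omits the arena-full retry details:

package main

import (
	"sync"
	"sync/atomic"
)

// lazyBuf shows the fast-path / locked re-check / create-and-publish
// shape used by newMarkBits.
type lazyBuf struct {
	mu   sync.Mutex
	head atomic.Pointer[[]byte]
}

func (l *lazyBuf) get() *[]byte {
	if b := l.head.Load(); b != nil { // fast path, no lock
		return b
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	if b := l.head.Load(); b != nil { // re-check: another goroutine may have won
		return b
	}
	b := make([]byte, 1<<16) // slow path: allocate, then publish
	l.head.Store(&b)
	return &b
}

func main() { _ = (&lazyBuf{}).get() }
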
+
+// newAllocBits returns a pointer to 8 byte aligned bytes
+// to be used for this span's alloc bits.
+// newAllocBits is used to provide newly initialized spans with
+// allocation bits. For spans that are not being initialized, the
+// mark bits are repurposed as allocation bits when
+// the span is swept.
+func newAllocBits(nelems uintptr) *gcBits {
+ return newMarkBits(nelems)
+}
+
+// nextMarkBitArenaEpoch establishes a new epoch for the arenas
+// holding the mark bits. The arenas are named relative to the
+// current GC cycle which is demarcated by the call to finishweep_m.
+//
+// All current spans have been swept.
+// During that sweep, each span allocated room for its gcmarkBits in
+// the gcBitsArenas.next block. gcBitsArenas.next becomes gcBitsArenas.current,
+// where the GC will mark objects; after each span is swept, these bits
+// will be used to allocate objects.
+// gcBitsArenas.current becomes gcBitsArenas.previous, where the spans'
+// gcAllocBits live until all the spans have been swept during this GC cycle.
+// Each span's sweep extinguishes its references to gcBitsArenas.previous
+// by pointing gcAllocBits into gcBitsArenas.current.
+// gcBitsArenas.previous is then released to the gcBitsArenas.free list.
+func nextMarkBitArenaEpoch() {
+ lock(&gcBitsArenas.lock)
+ if gcBitsArenas.previous != nil {
+ if gcBitsArenas.free == nil {
+ gcBitsArenas.free = gcBitsArenas.previous
+ } else {
+ // Find end of previous arenas.
+ last := gcBitsArenas.previous
+ for last = gcBitsArenas.previous; last.next != nil; last = last.next {
+ }
+ last.next = gcBitsArenas.free
+ gcBitsArenas.free = gcBitsArenas.previous
+ }
+ }
+ gcBitsArenas.previous = gcBitsArenas.current
+ gcBitsArenas.current = gcBitsArenas.next
+ atomic.StorepNoWB(unsafe.Pointer(&gcBitsArenas.next), nil) // newMarkBits calls newArena when needed
+ unlock(&gcBitsArenas.lock)
+}
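
The rotation above can be read as three named slots plus a free list changing roles once per cycle. A toy model of the hand-off, with slices of strings standing in for arena lists (illustrative only, not runtime code):

package main

import "fmt"

type epochs struct {
	free, next, current, previous []string
}

func (e *epochs) rotate() {
	e.free = append(e.previous, e.free...) // old alloc-bit arenas are released for reuse
	e.previous = e.current                 // last cycle's mark bits now back the alloc bits
	e.current = e.next                     // arenas filled during sweep hold this cycle's mark bits
	e.next = nil                           // newMarkBits will allocate a fresh arena on demand
}

func main() {
	e := &epochs{next: []string{"A"}, current: []string{"B"}, previous: []string{"C"}}
	e.rotate()
	fmt.Println(e.free, e.previous, e.current, e.next) // [C] [B] [A] []
}
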
+
+// newArenaMayUnlock allocates and zeroes a gcBits arena.
+// The caller must hold gcBitsArena.lock. This may temporarily release it.
+func newArenaMayUnlock() *gcBitsArena {
+ var result *gcBitsArena
+ if gcBitsArenas.free == nil {
+ unlock(&gcBitsArenas.lock)
+ result = (*gcBitsArena)(sysAlloc(gcBitsChunkBytes, &memstats.gcMiscSys))
+ if result == nil {
+ throw("runtime: cannot allocate memory")
+ }
+ lock(&gcBitsArenas.lock)
+ } else {
+ result = gcBitsArenas.free
+ gcBitsArenas.free = gcBitsArenas.free.next
+ memclrNoHeapPointers(unsafe.Pointer(result), gcBitsChunkBytes)
+ }
+ result.next = nil
+ // If result.bits is not 8 byte aligned adjust index so
+ // that &result.bits[result.free] is 8 byte aligned.
+ if uintptr(unsafe.Offsetof(gcBitsArena{}.bits))&7 == 0 {
+ result.free = 0
+ } else {
+ result.free = 8 - (uintptr(unsafe.Pointer(&result.bits[0])) & 7)
+ }
+ return result
+}
diff --git a/src/runtime/mkduff.go b/src/runtime/mkduff.go
new file mode 100644
index 0000000..94ae75f
--- /dev/null
+++ b/src/runtime/mkduff.go
@@ -0,0 +1,257 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+// runtime·duffzero is a Duff's device for zeroing memory.
+// The compiler jumps to computed addresses within
+// the routine to zero chunks of memory.
+// Do not change duffzero without also
+// changing the uses in cmd/compile/internal/*/*.go.
+
+// runtime·duffcopy is a Duff's device for copying memory.
+// The compiler jumps to computed addresses within
+// the routine to copy chunks of memory.
+// Source and destination must not overlap.
+// Do not change duffcopy without also
+// changing the uses in cmd/compile/internal/*/*.go.
+
+// See the zero* and copy* generators below
+// for architecture-specific comments.
+
+// mkduff generates duff_*.s.
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "io"
+ "log"
+ "os"
+)
+
+func main() {
+ gen("amd64", notags, zeroAMD64, copyAMD64)
+ gen("386", notags, zero386, copy386)
+ gen("arm", notags, zeroARM, copyARM)
+ gen("arm64", notags, zeroARM64, copyARM64)
+ gen("ppc64x", tagsPPC64x, zeroPPC64x, copyPPC64x)
+ gen("mips64x", tagsMIPS64x, zeroMIPS64x, copyMIPS64x)
+ gen("riscv64", notags, zeroRISCV64, copyRISCV64)
+}
+
+func gen(arch string, tags, zero, copy func(io.Writer)) {
+ var buf bytes.Buffer
+
+ fmt.Fprintln(&buf, "// Code generated by mkduff.go; DO NOT EDIT.")
+ fmt.Fprintln(&buf, "// Run go generate from src/runtime to update.")
+ fmt.Fprintln(&buf, "// See mkduff.go for comments.")
+ tags(&buf)
+ fmt.Fprintln(&buf, "#include \"textflag.h\"")
+ fmt.Fprintln(&buf)
+ zero(&buf)
+ fmt.Fprintln(&buf)
+ copy(&buf)
+
+ if err := os.WriteFile("duff_"+arch+".s", buf.Bytes(), 0644); err != nil {
+ log.Fatalln(err)
+ }
+}
+
+func notags(w io.Writer) { fmt.Fprintln(w) }
+
+func zeroAMD64(w io.Writer) {
+ // X0: zero
+ // DI: ptr to memory to be zeroed
+ // DI is updated as a side effect.
+ fmt.Fprintln(w, "TEXT runtime·duffzero(SB), NOSPLIT, $0-0")
+ for i := 0; i < 16; i++ {
+ fmt.Fprintln(w, "\tMOVUPS\tX0,(DI)")
+ fmt.Fprintln(w, "\tMOVUPS\tX0,16(DI)")
+ fmt.Fprintln(w, "\tMOVUPS\tX0,32(DI)")
+ fmt.Fprintln(w, "\tMOVUPS\tX0,48(DI)")
+ fmt.Fprintln(w, "\tLEAQ\t64(DI),DI") // We use lea instead of add, to avoid clobbering flags
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
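
Each iteration of the loop above emits one 64-byte block of duffzero; the compiler then jumps into the body at an offset chosen so that only the required number of blocks execute. Entering the routine with some number of blocks left to run is equivalent to the following ordinary-Go sketch (illustrative; the real entry is a computed jump emitted by the compiler):

package main

import "unsafe"

// duffzeroEquivalent zeroes blocks*64 bytes starting at p, which is what
// entering the generated amd64 duffzero with that many blocks remaining
// accomplishes.
func duffzeroEquivalent(p unsafe.Pointer, blocks int) {
	for i := 0; i < blocks; i++ {
		*(*[64]byte)(unsafe.Add(p, i*64)) = [64]byte{}
	}
}

func main() {
	var buf [256]byte
	duffzeroEquivalent(unsafe.Pointer(&buf[0]), len(buf)/64)
}
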
+
+func copyAMD64(w io.Writer) {
+ // SI: ptr to source memory
+ // DI: ptr to destination memory
+ // SI and DI are updated as a side effect.
+ //
+ // This is equivalent to a sequence of MOVSQ but
+ // for some reason that is 3.5x slower than this code.
+ fmt.Fprintln(w, "TEXT runtime·duffcopy(SB), NOSPLIT, $0-0")
+ for i := 0; i < 64; i++ {
+ fmt.Fprintln(w, "\tMOVUPS\t(SI), X0")
+ fmt.Fprintln(w, "\tADDQ\t$16, SI")
+ fmt.Fprintln(w, "\tMOVUPS\tX0, (DI)")
+ fmt.Fprintln(w, "\tADDQ\t$16, DI")
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func zero386(w io.Writer) {
+ // AX: zero
+ // DI: ptr to memory to be zeroed
+ // DI is updated as a side effect.
+ fmt.Fprintln(w, "TEXT runtime·duffzero(SB), NOSPLIT, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tSTOSL")
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func copy386(w io.Writer) {
+ // SI: ptr to source memory
+ // DI: ptr to destination memory
+ // SI and DI are updated as a side effect.
+ //
+ // This is equivalent to a sequence of MOVSL but
+ // for some reason MOVSL is really slow.
+ fmt.Fprintln(w, "TEXT runtime·duffcopy(SB), NOSPLIT, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVL\t(SI), CX")
+ fmt.Fprintln(w, "\tADDL\t$4, SI")
+ fmt.Fprintln(w, "\tMOVL\tCX, (DI)")
+ fmt.Fprintln(w, "\tADDL\t$4, DI")
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func zeroARM(w io.Writer) {
+ // R0: zero
+ // R1: ptr to memory to be zeroed
+ // R1 is updated as a side effect.
+ fmt.Fprintln(w, "TEXT runtime·duffzero(SB), NOSPLIT, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVW.P\tR0, 4(R1)")
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func copyARM(w io.Writer) {
+ // R0: scratch space
+ // R1: ptr to source memory
+ // R2: ptr to destination memory
+ // R1 and R2 are updated as a side effect
+ fmt.Fprintln(w, "TEXT runtime·duffcopy(SB), NOSPLIT, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVW.P\t4(R1), R0")
+ fmt.Fprintln(w, "\tMOVW.P\tR0, 4(R2)")
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func zeroARM64(w io.Writer) {
+ // ZR: always zero
+ // R20: ptr to memory to be zeroed
+ // On return, R20 points to the last zeroed dword.
+ fmt.Fprintln(w, "TEXT runtime·duffzero(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 63; i++ {
+ fmt.Fprintln(w, "\tSTP.P\t(ZR, ZR), 16(R20)")
+ }
+ fmt.Fprintln(w, "\tSTP\t(ZR, ZR), (R20)")
+ fmt.Fprintln(w, "\tRET")
+}
+
+func copyARM64(w io.Writer) {
+ // R20: ptr to source memory
+ // R21: ptr to destination memory
+ // R26, R27 (aka REGTMP): scratch space
+ // R20 and R21 are updated as a side effect
+ fmt.Fprintln(w, "TEXT runtime·duffcopy(SB), NOSPLIT|NOFRAME, $0-0")
+
+ for i := 0; i < 64; i++ {
+ fmt.Fprintln(w, "\tLDP.P\t16(R20), (R26, R27)")
+ fmt.Fprintln(w, "\tSTP.P\t(R26, R27), 16(R21)")
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func tagsPPC64x(w io.Writer) {
+ fmt.Fprintln(w)
+ fmt.Fprintln(w, "// +build ppc64 ppc64le")
+ fmt.Fprintln(w)
+}
+
+func zeroPPC64x(w io.Writer) {
+ // R0: always zero
+ // R3 (aka REGRT1): ptr to memory to be zeroed - 8
+ // On return, R3 points to the last zeroed dword.
+ fmt.Fprintln(w, "TEXT runtime·duffzero(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVDU\tR0, 8(R3)")
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func copyPPC64x(w io.Writer) {
+ // duffcopy is not used on PPC64.
+ fmt.Fprintln(w, "TEXT runtime·duffcopy(SB), NOSPLIT|NOFRAME, $0-0")
+ fmt.Fprintln(w, "\tUNDEF")
+}
+
+func tagsMIPS64x(w io.Writer) {
+ fmt.Fprintln(w)
+ fmt.Fprintln(w, "// +build mips64 mips64le")
+ fmt.Fprintln(w)
+}
+
+func zeroMIPS64x(w io.Writer) {
+ // R0: always zero
+ // R1 (aka REGRT1): ptr to memory to be zeroed - 8
+ // On return, R1 points to the last zeroed dword.
+ fmt.Fprintln(w, "TEXT runtime·duffzero(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVV\tR0, 8(R1)")
+ fmt.Fprintln(w, "\tADDV\t$8, R1")
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func copyMIPS64x(w io.Writer) {
+ fmt.Fprintln(w, "TEXT runtime·duffcopy(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVV\t(R1), R23")
+ fmt.Fprintln(w, "\tADDV\t$8, R1")
+ fmt.Fprintln(w, "\tMOVV\tR23, (R2)")
+ fmt.Fprintln(w, "\tADDV\t$8, R2")
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func zeroRISCV64(w io.Writer) {
+ // ZERO: always zero
+ // X10: ptr to memory to be zeroed
+ // X10 is updated as a side effect.
+ fmt.Fprintln(w, "TEXT runtime·duffzero(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOV\tZERO, (X10)")
+ fmt.Fprintln(w, "\tADD\t$8, X10")
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func copyRISCV64(w io.Writer) {
+ // X10: ptr to source memory
+ // X11: ptr to destination memory
+ // X10 and X11 are updated as a side effect
+ fmt.Fprintln(w, "TEXT runtime·duffcopy(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOV\t(X10), X31")
+ fmt.Fprintln(w, "\tADD\t$8, X10")
+ fmt.Fprintln(w, "\tMOV\tX31, (X11)")
+ fmt.Fprintln(w, "\tADD\t$8, X11")
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
diff --git a/src/runtime/mkfastlog2table.go b/src/runtime/mkfastlog2table.go
new file mode 100644
index 0000000..d650292
--- /dev/null
+++ b/src/runtime/mkfastlog2table.go
@@ -0,0 +1,52 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+// fastlog2Table contains log2 approximations for 5 binary digits.
+// This is used to implement fastlog2, which is used for heap sampling.
+
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "log"
+ "math"
+ "os"
+)
+
+func main() {
+ var buf bytes.Buffer
+
+ fmt.Fprintln(&buf, "// Code generated by mkfastlog2table.go; DO NOT EDIT.")
+ fmt.Fprintln(&buf, "// Run go generate from src/runtime to update.")
+ fmt.Fprintln(&buf, "// See mkfastlog2table.go for comments.")
+ fmt.Fprintln(&buf)
+ fmt.Fprintln(&buf, "package runtime")
+ fmt.Fprintln(&buf)
+ fmt.Fprintln(&buf, "const fastlogNumBits =", fastlogNumBits)
+ fmt.Fprintln(&buf)
+
+ fmt.Fprintln(&buf, "var fastlog2Table = [1<<fastlogNumBits + 1]float64{")
+ table := computeTable()
+ for _, t := range table {
+ fmt.Fprintf(&buf, "\t%v,\n", t)
+ }
+ fmt.Fprintln(&buf, "}")
+
+ if err := os.WriteFile("fastlog2table.go", buf.Bytes(), 0644); err != nil {
+ log.Fatalln(err)
+ }
+}
+
+const fastlogNumBits = 5
+
+func computeTable() []float64 {
+ fastlog2Table := make([]float64, 1<<fastlogNumBits+1)
+ for i := 0; i <= (1 << fastlogNumBits); i++ {
+ fastlog2Table[i] = math.Log2(1.0 + float64(i)/(1<<fastlogNumBits))
+ }
+ return fastlog2Table
+}
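
For orientation, a table built this way is consumed by splitting the float into exponent and mantissa, indexing with the top fastlogNumBits mantissa bits, and interpolating linearly to the next entry. The sketch below mirrors the shape of the runtime's fastlog2, but its names and scaling are illustrative rather than the real implementation:

package main

import (
	"fmt"
	"math"
)

const numBits = 5 // matches fastlogNumBits above

// buildTable mirrors computeTable: entry i holds log2(1 + i/2^numBits).
func buildTable() []float64 {
	t := make([]float64, 1<<numBits+1)
	for i := range t {
		t[i] = math.Log2(1 + float64(i)/(1<<numBits))
	}
	return t
}

// approxLog2 consumes the table: take the exponent of x, index with the
// top numBits mantissa bits, and interpolate linearly to the next entry.
func approxLog2(x float64, table []float64) float64 {
	bits := math.Float64bits(x)
	exp := float64(int64(bits>>52)&0x7FF - 1023)
	idx := (bits >> (52 - numBits)) & (1<<numBits - 1)
	frac := float64(bits&(1<<(52-numBits)-1)) / (1 << (52 - numBits))
	return exp + table[idx] + (table[idx+1]-table[idx])*frac
}

func main() {
	t := buildTable()
	fmt.Println(approxLog2(10, t), math.Log2(10)) // both ~3.3219
}
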
diff --git a/src/runtime/mkpreempt.go b/src/runtime/mkpreempt.go
new file mode 100644
index 0000000..1d614dd
--- /dev/null
+++ b/src/runtime/mkpreempt.go
@@ -0,0 +1,569 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+// mkpreempt generates the asyncPreempt functions for each
+// architecture.
+package main
+
+import (
+ "flag"
+ "fmt"
+ "io"
+ "log"
+ "os"
+ "strings"
+)
+
+// Copied from cmd/compile/internal/ssa/gen/*Ops.go
+
+var regNames386 = []string{
+ "AX",
+ "CX",
+ "DX",
+ "BX",
+ "SP",
+ "BP",
+ "SI",
+ "DI",
+ "X0",
+ "X1",
+ "X2",
+ "X3",
+ "X4",
+ "X5",
+ "X6",
+ "X7",
+}
+
+var regNamesAMD64 = []string{
+ "AX",
+ "CX",
+ "DX",
+ "BX",
+ "SP",
+ "BP",
+ "SI",
+ "DI",
+ "R8",
+ "R9",
+ "R10",
+ "R11",
+ "R12",
+ "R13",
+ "R14",
+ "R15",
+ "X0",
+ "X1",
+ "X2",
+ "X3",
+ "X4",
+ "X5",
+ "X6",
+ "X7",
+ "X8",
+ "X9",
+ "X10",
+ "X11",
+ "X12",
+ "X13",
+ "X14",
+ "X15",
+}
+
+var out io.Writer
+
+var arches = map[string]func(){
+ "386": gen386,
+ "amd64": genAMD64,
+ "arm": genARM,
+ "arm64": genARM64,
+ "mips64x": func() { genMIPS(true) },
+ "mipsx": func() { genMIPS(false) },
+ "ppc64x": genPPC64,
+ "riscv64": genRISCV64,
+ "s390x": genS390X,
+ "wasm": genWasm,
+}
+var beLe = map[string]bool{"mips64x": true, "mipsx": true, "ppc64x": true}
+
+func main() {
+ flag.Parse()
+ if flag.NArg() > 0 {
+ out = os.Stdout
+ for _, arch := range flag.Args() {
+ gen, ok := arches[arch]
+ if !ok {
+ log.Fatalf("unknown arch %s", arch)
+ }
+ header(arch)
+ gen()
+ }
+ return
+ }
+
+ for arch, gen := range arches {
+ f, err := os.Create(fmt.Sprintf("preempt_%s.s", arch))
+ if err != nil {
+ log.Fatal(err)
+ }
+ out = f
+ header(arch)
+ gen()
+ if err := f.Close(); err != nil {
+ log.Fatal(err)
+ }
+ }
+}
+
+func header(arch string) {
+ fmt.Fprintf(out, "// Code generated by mkpreempt.go; DO NOT EDIT.\n\n")
+ if beLe[arch] {
+ base := arch[:len(arch)-1]
+ fmt.Fprintf(out, "// +build %s %sle\n\n", base, base)
+ }
+ fmt.Fprintf(out, "#include \"go_asm.h\"\n")
+ fmt.Fprintf(out, "#include \"textflag.h\"\n\n")
+ fmt.Fprintf(out, "// Note: asyncPreempt doesn't use the internal ABI, but we must be able to inject calls to it from the signal handler, so Go code has to see the PC of this function literally.\n")
+ fmt.Fprintf(out, "TEXT ·asyncPreempt<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-0\n")
+}
+
+func p(f string, args ...interface{}) {
+ fmted := fmt.Sprintf(f, args...)
+ fmt.Fprintf(out, "\t%s\n", strings.ReplaceAll(fmted, "\n", "\n\t"))
+}
+
+func label(l string) {
+ fmt.Fprintf(out, "%s\n", l)
+}
+
+type layout struct {
+ stack int
+ regs []regPos
+ sp string // stack pointer register
+}
+
+type regPos struct {
+ pos int
+
+ op string
+ reg string
+
+ // If this register requires special save and restore, these
+ // give those operations with a %d placeholder for the stack
+ // offset.
+ save, restore string
+}
+
+func (l *layout) add(op, reg string, size int) {
+ l.regs = append(l.regs, regPos{op: op, reg: reg, pos: l.stack})
+ l.stack += size
+}
+
+func (l *layout) addSpecial(save, restore string, size int) {
+ l.regs = append(l.regs, regPos{save: save, restore: restore, pos: l.stack})
+ l.stack += size
+}
+
+func (l *layout) save() {
+ for _, reg := range l.regs {
+ if reg.save != "" {
+ p(reg.save, reg.pos)
+ } else {
+ p("%s %s, %d(%s)", reg.op, reg.reg, reg.pos, l.sp)
+ }
+ }
+}
+
+func (l *layout) restore() {
+ for i := len(l.regs) - 1; i >= 0; i-- {
+ reg := l.regs[i]
+ if reg.restore != "" {
+ p(reg.restore, reg.pos)
+ } else {
+ p("%s %d(%s), %s", reg.op, reg.pos, l.sp, reg.reg)
+ }
+ }
+}
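
A layout is driven by registering registers in order (each registration fixes that register's stack offset), reserving the frame, and then letting save and restore emit the stores and loads, with restore walking in reverse. A toy driver sketched against the helpers above; the MOVX mnemonic and register names are made up, not a real port, and the function would compile in this file but is never called:

func genToy() {
	var l = layout{sp: "SP"}
	l.add("MOVX", "R1", 8)    // saved at 0(SP)
	l.add("MOVX", "R2", 8)    // saved at 8(SP)
	p("SUB $%d, SP", l.stack) // reserve the frame
	l.save()                  // MOVX R1, 0(SP); MOVX R2, 8(SP)
	p("CALL ·asyncPreempt2(SB)")
	l.restore() // restores R2, then R1
	p("ADD $%d, SP", l.stack)
	p("RET")
}
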
+
+func gen386() {
+ p("PUSHFL")
+ // Save general purpose registers.
+ var l = layout{sp: "SP"}
+ for _, reg := range regNames386 {
+ if reg == "SP" || strings.HasPrefix(reg, "X") {
+ continue
+ }
+ l.add("MOVL", reg, 4)
+ }
+
+ // Save SSE state only if supported.
+ lSSE := layout{stack: l.stack, sp: "SP"}
+ for i := 0; i < 8; i++ {
+ lSSE.add("MOVUPS", fmt.Sprintf("X%d", i), 16)
+ }
+
+ p("ADJSP $%d", lSSE.stack)
+ p("NOP SP")
+ l.save()
+ p("CMPB internal∕cpu·X86+const_offsetX86HasSSE2(SB), $1\nJNE nosse")
+ lSSE.save()
+ label("nosse:")
+ p("CALL ·asyncPreempt2(SB)")
+ p("CMPB internal∕cpu·X86+const_offsetX86HasSSE2(SB), $1\nJNE nosse2")
+ lSSE.restore()
+ label("nosse2:")
+ l.restore()
+ p("ADJSP $%d", -lSSE.stack)
+
+ p("POPFL")
+ p("RET")
+}
+
+func genAMD64() {
+ // Assign stack offsets.
+ var l = layout{sp: "SP"}
+ for _, reg := range regNamesAMD64 {
+ if reg == "SP" || reg == "BP" {
+ continue
+ }
+ if strings.HasPrefix(reg, "X") {
+ l.add("MOVUPS", reg, 16)
+ } else {
+ l.add("MOVQ", reg, 8)
+ }
+ }
+
+ // TODO: MXCSR register?
+
+ p("PUSHQ BP")
+ p("MOVQ SP, BP")
+ p("// Save flags before clobbering them")
+ p("PUSHFQ")
+ p("// obj doesn't understand ADD/SUB on SP, but does understand ADJSP")
+ p("ADJSP $%d", l.stack)
+ p("// But vet doesn't know ADJSP, so suppress vet stack checking")
+ p("NOP SP")
+
+ // Apparently, the signal handling code path in the darwin kernel leaves
+ // the upper bits of Y registers in a dirty state, which causes
+ // many SSE operations (128-bit and narrower) to become much slower.
+ // Clear the upper bits to get to a clean state. See issue #37174.
+ // It is safe here as Go code doesn't use the upper bits of Y registers.
+ p("#ifdef GOOS_darwin")
+ p("CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $0")
+ p("JE 2(PC)")
+ p("VZEROUPPER")
+ p("#endif")
+
+ l.save()
+ p("CALL ·asyncPreempt2(SB)")
+ l.restore()
+ p("ADJSP $%d", -l.stack)
+ p("POPFQ")
+ p("POPQ BP")
+ p("RET")
+}
+
+func genARM() {
+ // Add integer registers R0-R12.
+ // R13 (SP), R14 (LR), R15 (PC) are special and not saved here.
+ var l = layout{sp: "R13", stack: 4} // add LR slot
+ for i := 0; i <= 12; i++ {
+ reg := fmt.Sprintf("R%d", i)
+ if i == 10 {
+ continue // R10 is g register, no need to save/restore
+ }
+ l.add("MOVW", reg, 4)
+ }
+ // Add flag register.
+ l.addSpecial(
+ "MOVW CPSR, R0\nMOVW R0, %d(R13)",
+ "MOVW %d(R13), R0\nMOVW R0, CPSR",
+ 4)
+
+ // Add floating point registers F0-F15 and flag register.
+ var lfp = layout{stack: l.stack, sp: "R13"}
+ lfp.addSpecial(
+ "MOVW FPCR, R0\nMOVW R0, %d(R13)",
+ "MOVW %d(R13), R0\nMOVW R0, FPCR",
+ 4)
+ for i := 0; i <= 15; i++ {
+ reg := fmt.Sprintf("F%d", i)
+ lfp.add("MOVD", reg, 8)
+ }
+
+ p("MOVW.W R14, -%d(R13)", lfp.stack) // allocate frame, save LR
+ l.save()
+ p("MOVB ·goarm(SB), R0\nCMP $6, R0\nBLT nofp") // test goarm, and skip FP registers if goarm=5.
+ lfp.save()
+ label("nofp:")
+ p("CALL ·asyncPreempt2(SB)")
+ p("MOVB ·goarm(SB), R0\nCMP $6, R0\nBLT nofp2") // test goarm, and skip FP registers if goarm=5.
+ lfp.restore()
+ label("nofp2:")
+ l.restore()
+
+ p("MOVW %d(R13), R14", lfp.stack) // sigctxt.pushCall pushes LR on stack, restore it
+ p("MOVW.P %d(R13), R15", lfp.stack+4) // load PC, pop frame (including the space pushed by sigctxt.pushCall)
+ p("UNDEF") // shouldn't get here
+}
+
+func genARM64() {
+ // Add integer registers R0-R26
+ // R27 (REGTMP), R28 (g), R29 (FP), R30 (LR), R31 (SP) are special
+ // and not saved here.
+ var l = layout{sp: "RSP", stack: 8} // add slot to save PC of interrupted instruction
+ for i := 0; i <= 26; i++ {
+ if i == 18 {
+ continue // R18 is not used, skip
+ }
+ reg := fmt.Sprintf("R%d", i)
+ l.add("MOVD", reg, 8)
+ }
+ // Add flag registers.
+ l.addSpecial(
+ "MOVD NZCV, R0\nMOVD R0, %d(RSP)",
+ "MOVD %d(RSP), R0\nMOVD R0, NZCV",
+ 8)
+ l.addSpecial(
+ "MOVD FPSR, R0\nMOVD R0, %d(RSP)",
+ "MOVD %d(RSP), R0\nMOVD R0, FPSR",
+ 8)
+ // TODO: FPCR? I don't think we'll change it, so no need to save.
+ // Add floating point registers F0-F31.
+ for i := 0; i <= 31; i++ {
+ reg := fmt.Sprintf("F%d", i)
+ l.add("FMOVD", reg, 8)
+ }
+ if l.stack%16 != 0 {
+ l.stack += 8 // SP needs 16-byte alignment
+ }
+
+ // allocate frame, save PC of interrupted instruction (in LR)
+ p("MOVD R30, %d(RSP)", -l.stack)
+ p("SUB $%d, RSP", l.stack)
+ p("#ifdef GOOS_linux")
+ p("MOVD R29, -8(RSP)") // save frame pointer (only used on Linux)
+ p("SUB $8, RSP, R29") // set up new frame pointer
+ p("#endif")
+ // On iOS, save the LR again after decrementing SP. We run the
+ // signal handler on the G stack (as it doesn't support sigaltstack),
+ // so any writes below SP may be clobbered.
+ p("#ifdef GOOS_ios")
+ p("MOVD R30, (RSP)")
+ p("#endif")
+
+ l.save()
+ p("CALL ·asyncPreempt2(SB)")
+ l.restore()
+
+ p("MOVD %d(RSP), R30", l.stack) // sigctxt.pushCall has pushed LR (at interrupt) on stack, restore it
+ p("#ifdef GOOS_linux")
+ p("MOVD -8(RSP), R29") // restore frame pointer
+ p("#endif")
+ p("MOVD (RSP), R27") // load PC to REGTMP
+ p("ADD $%d, RSP", l.stack+16) // pop frame (including the space pushed by sigctxt.pushCall)
+ p("JMP (R27)")
+}
+
+func genMIPS(_64bit bool) {
+ mov := "MOVW"
+ movf := "MOVF"
+ add := "ADD"
+ sub := "SUB"
+ r28 := "R28"
+ regsize := 4
+ softfloat := "GOMIPS_softfloat"
+ if _64bit {
+ mov = "MOVV"
+ movf = "MOVD"
+ add = "ADDV"
+ sub = "SUBV"
+ r28 = "RSB"
+ regsize = 8
+ softfloat = "GOMIPS64_softfloat"
+ }
+
+ // Add integer registers R1-R22, R24-R25, R28
+ // R0 (zero), R23 (REGTMP), R29 (SP), R30 (g), R31 (LR) are special,
+ // and not saved here. R26 and R27 are reserved by kernel and not used.
+ var l = layout{sp: "R29", stack: regsize} // add slot to save PC of interrupted instruction (in LR)
+ for i := 1; i <= 25; i++ {
+ if i == 23 {
+ continue // R23 is REGTMP
+ }
+ reg := fmt.Sprintf("R%d", i)
+ l.add(mov, reg, regsize)
+ }
+ l.add(mov, r28, regsize)
+ l.addSpecial(
+ mov+" HI, R1\n"+mov+" R1, %d(R29)",
+ mov+" %d(R29), R1\n"+mov+" R1, HI",
+ regsize)
+ l.addSpecial(
+ mov+" LO, R1\n"+mov+" R1, %d(R29)",
+ mov+" %d(R29), R1\n"+mov+" R1, LO",
+ regsize)
+
+ // Add floating point control/status register FCR31 (FCR0-FCR30 are irrelevant)
+ var lfp = layout{sp: "R29", stack: l.stack}
+ lfp.addSpecial(
+ mov+" FCR31, R1\n"+mov+" R1, %d(R29)",
+ mov+" %d(R29), R1\n"+mov+" R1, FCR31",
+ regsize)
+ // Add floating point registers F0-F31.
+ for i := 0; i <= 31; i++ {
+ reg := fmt.Sprintf("F%d", i)
+ lfp.add(movf, reg, regsize)
+ }
+
+ // allocate frame, save PC of interrupted instruction (in LR)
+ p(mov+" R31, -%d(R29)", lfp.stack)
+ p(sub+" $%d, R29", lfp.stack)
+
+ l.save()
+ p("#ifndef %s", softfloat)
+ lfp.save()
+ p("#endif")
+ p("CALL ·asyncPreempt2(SB)")
+ p("#ifndef %s", softfloat)
+ lfp.restore()
+ p("#endif")
+ l.restore()
+
+ p(mov+" %d(R29), R31", lfp.stack) // sigctxt.pushCall has pushed LR (at interrupt) on stack, restore it
+ p(mov + " (R29), R23") // load PC to REGTMP
+ p(add+" $%d, R29", lfp.stack+regsize) // pop frame (including the space pushed by sigctxt.pushCall)
+ p("JMP (R23)")
+}
+
+func genPPC64() {
+ // Add integer registers R3-R29
+ // R0 (zero), R1 (SP), R30 (g) are special and not saved here.
+ // R2 (TOC pointer in PIC mode), R12 (function entry address in PIC mode) have been saved in sigctxt.pushCall.
+ // R31 (REGTMP) will be saved manually.
+ var l = layout{sp: "R1", stack: 32 + 8} // MinFrameSize on PPC64, plus one word for saving R31
+ for i := 3; i <= 29; i++ {
+ if i == 12 || i == 13 {
+ // R12 has been saved in sigctxt.pushCall.
+ // R13 is the TLS pointer, not used by Go code. We must NOT
+ // restore it, otherwise if we parked and resumed on a
+ // different thread we'll mess up TLS addresses.
+ continue
+ }
+ reg := fmt.Sprintf("R%d", i)
+ l.add("MOVD", reg, 8)
+ }
+ l.addSpecial(
+ "MOVW CR, R31\nMOVW R31, %d(R1)",
+ "MOVW %d(R1), R31\nMOVFL R31, $0xff", // this is MOVW R31, CR
+ 8) // CR is 4-byte wide, but just keep the alignment
+ l.addSpecial(
+ "MOVD XER, R31\nMOVD R31, %d(R1)",
+ "MOVD %d(R1), R31\nMOVD R31, XER",
+ 8)
+ // Add floating point registers F0-F31.
+ for i := 0; i <= 31; i++ {
+ reg := fmt.Sprintf("F%d", i)
+ l.add("FMOVD", reg, 8)
+ }
+ // Add floating point control/status register FPSCR.
+ l.addSpecial(
+ "MOVFL FPSCR, F0\nFMOVD F0, %d(R1)",
+ "FMOVD %d(R1), F0\nMOVFL F0, FPSCR",
+ 8)
+
+ p("MOVD R31, -%d(R1)", l.stack-32) // save R31 first, we'll use R31 for saving LR
+ p("MOVD LR, R31")
+ p("MOVDU R31, -%d(R1)", l.stack) // allocate frame, save PC of interrupted instruction (in LR)
+
+ l.save()
+ p("CALL ·asyncPreempt2(SB)")
+ l.restore()
+
+ p("MOVD %d(R1), R31", l.stack) // sigctxt.pushCall has pushed LR, R2, R12 (at interrupt) on stack, restore them
+ p("MOVD R31, LR")
+ p("MOVD %d(R1), R2", l.stack+8)
+ p("MOVD %d(R1), R12", l.stack+16)
+ p("MOVD (R1), R31") // load PC to CTR
+ p("MOVD R31, CTR")
+ p("MOVD 32(R1), R31") // restore R31
+ p("ADD $%d, R1", l.stack+32) // pop frame (including the space pushed by sigctxt.pushCall)
+ p("JMP (CTR)")
+}
+
+func genRISCV64() {
+ // X0 (zero), X1 (LR), X2 (SP), X4 (TP), X27 (g), X31 (TMP) are special.
+ var l = layout{sp: "X2", stack: 8}
+
+ // Add integer registers (X3, X5-X26, X28-30).
+ for i := 3; i < 31; i++ {
+ if i == 4 || i == 27 {
+ continue
+ }
+ reg := fmt.Sprintf("X%d", i)
+ l.add("MOV", reg, 8)
+ }
+
+ // Add floating point registers (F0-F31).
+ for i := 0; i <= 31; i++ {
+ reg := fmt.Sprintf("F%d", i)
+ l.add("MOVD", reg, 8)
+ }
+
+ p("MOV X1, -%d(X2)", l.stack)
+ p("ADD $-%d, X2", l.stack)
+ l.save()
+ p("CALL ·asyncPreempt2(SB)")
+ l.restore()
+ p("MOV %d(X2), X1", l.stack)
+ p("MOV (X2), X31")
+ p("ADD $%d, X2", l.stack+8)
+ p("JMP (X31)")
+}
+
+func genS390X() {
+ // Add integer registers R0-R12
+ // R13 (g), R14 (LR), R15 (SP) are special, and not saved here.
+ // Saving R10 (REGTMP) is not necessary, but it is saved anyway.
+ var l = layout{sp: "R15", stack: 16} // add slot to save PC of interrupted instruction and flags
+ l.addSpecial(
+ "STMG R0, R12, %d(R15)",
+ "LMG %d(R15), R0, R12",
+ 13*8)
+ // Add floating point registers F0-F15.
+ for i := 0; i <= 15; i++ {
+ reg := fmt.Sprintf("F%d", i)
+ l.add("FMOVD", reg, 8)
+ }
+
+ // allocate frame, save PC of interrupted instruction (in LR) and flags (condition code)
+ p("IPM R10") // save flags upfront, as ADD will clobber flags
+ p("MOVD R14, -%d(R15)", l.stack)
+ p("ADD $-%d, R15", l.stack)
+ p("MOVW R10, 8(R15)") // save flags
+
+ l.save()
+ p("CALL ·asyncPreempt2(SB)")
+ l.restore()
+
+ p("MOVD %d(R15), R14", l.stack) // sigctxt.pushCall has pushed LR (at interrupt) on stack, restore it
+ p("ADD $%d, R15", l.stack+8) // pop frame (including the space pushed by sigctxt.pushCall)
+ p("MOVWZ -%d(R15), R10", l.stack) // load flags to REGTMP
+ p("TMLH R10, $(3<<12)") // restore flags
+ p("MOVD -%d(R15), R10", l.stack+8) // load PC to REGTMP
+ p("JMP (R10)")
+}
+
+func genWasm() {
+ p("// No async preemption on wasm")
+ p("UNDEF")
+}
+
+func notImplemented() {
+ p("// Not implemented yet")
+ p("JMP ·abort(SB)")
+}
diff --git a/src/runtime/mksizeclasses.go b/src/runtime/mksizeclasses.go
new file mode 100644
index 0000000..b92d1fe
--- /dev/null
+++ b/src/runtime/mksizeclasses.go
@@ -0,0 +1,327 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+// Generate tables for small malloc size classes.
+//
+// See malloc.go for overview.
+//
+// The size classes are chosen so that rounding an allocation
+// request up to the next size class wastes at most 12.5% (1.125x).
+//
+// Each size class has its own page count that gets allocated
+// and chopped up when new objects of the size class are needed.
+// That page count is chosen so that chopping up the run of
+// pages into objects of the given size wastes at most 12.5% (1.125x)
+// of the memory. It is not necessary that the cutoff here be
+// the same as above.
+//
+// The two sources of waste multiply, so the worst possible case
+// for the above constraints would be that allocations of some
+// size might have a 26.6% (1.266x) overhead.
+// In practice, only one of the wastes comes into play for a
+// given size (sizes < 512 waste mainly on the round-up,
+// sizes > 512 waste mainly on the page chopping).
+// For really small sizes, alignment constraints force the
+// overhead higher.
+
+package main
+
+import (
+ "bytes"
+ "flag"
+ "fmt"
+ "go/format"
+ "io"
+ "log"
+ "os"
+)
+
+// Generate sizeclasses.go
+
+var stdout = flag.Bool("stdout", false, "write to stdout instead of sizeclasses.go")
+
+func main() {
+ flag.Parse()
+
+ var b bytes.Buffer
+ fmt.Fprintln(&b, "// Code generated by mksizeclasses.go; DO NOT EDIT.")
+ fmt.Fprintln(&b, "//go:generate go run mksizeclasses.go")
+ fmt.Fprintln(&b)
+ fmt.Fprintln(&b, "package runtime")
+ classes := makeClasses()
+
+ printComment(&b, classes)
+
+ printClasses(&b, classes)
+
+ out, err := format.Source(b.Bytes())
+ if err != nil {
+ log.Fatal(err)
+ }
+ if *stdout {
+ _, err = os.Stdout.Write(out)
+ } else {
+ err = os.WriteFile("sizeclasses.go", out, 0666)
+ }
+ if err != nil {
+ log.Fatal(err)
+ }
+}
+
+const (
+ // Constants that we use and will transfer to the runtime.
+ maxSmallSize = 32 << 10
+ smallSizeDiv = 8
+ smallSizeMax = 1024
+ largeSizeDiv = 128
+ pageShift = 13
+
+ // Derived constants.
+ pageSize = 1 << pageShift
+)
+
+type class struct {
+ size int // max size
+ npages int // number of pages
+
+ mul int
+ shift uint
+ shift2 uint
+ mask int
+}
+
+func powerOfTwo(x int) bool {
+ return x != 0 && x&(x-1) == 0
+}
+
+func makeClasses() []class {
+ var classes []class
+
+ classes = append(classes, class{}) // class #0 is a dummy entry
+
+ align := 8
+ for size := align; size <= maxSmallSize; size += align {
+ if powerOfTwo(size) { // bump alignment once in a while
+ if size >= 2048 {
+ align = 256
+ } else if size >= 128 {
+ align = size / 8
+ } else if size >= 32 {
+ align = 16 // heap bitmaps assume 16 byte alignment for allocations >= 32 bytes.
+ }
+ }
+ if !powerOfTwo(align) {
+ panic("incorrect alignment")
+ }
+
+ // Make the allocnpages big enough that
+ // the leftover is less than 1/8 of the total,
+ // so wasted space is at most 12.5%.
+ allocsize := pageSize
+ for allocsize%size > allocsize/8 {
+ allocsize += pageSize
+ }
+ npages := allocsize / pageSize
+
+ // If the previous sizeclass chose the same
+ // allocation size and fit the same number of
+ // objects into the page, we might as well
+ // use just this size instead of having two
+ // different sizes.
+ if len(classes) > 1 && npages == classes[len(classes)-1].npages && allocsize/size == allocsize/classes[len(classes)-1].size {
+ classes[len(classes)-1].size = size
+ continue
+ }
+ classes = append(classes, class{size: size, npages: npages})
+ }
+
+ // Increase object sizes if we can fit the same number of larger objects
+ // into the same number of pages. For example, we choose size 8448 above
+ // with 6 objects in 7 pages. But we can just as well use object size 9472,
+ // which is also 6 objects in 7 pages but +1024 bytes (+12.12%).
+ // We need to preserve at least largeSizeDiv alignment otherwise
+ // sizeToClass won't work.
+ for i := range classes {
+ if i == 0 {
+ continue
+ }
+ c := &classes[i]
+ psize := c.npages * pageSize
+ new_size := (psize / (psize / c.size)) &^ (largeSizeDiv - 1)
+ if new_size > c.size {
+ c.size = new_size
+ }
+ }
+
+ if len(classes) != 68 {
+ panic("number of size classes has changed")
+ }
+
+ for i := range classes {
+ computeDivMagic(&classes[i])
+ }
+
+ return classes
+}
+
+// computeDivMagic computes some magic constants to implement
+// the division required to compute object number from span offset.
+// n / c.size is implemented as ((n >> c.shift) * c.mul) >> c.shift2
+// for all 0 <= n <= c.npages * pageSize
+func computeDivMagic(c *class) {
+ // divisor
+ d := c.size
+ if d == 0 {
+ return
+ }
+
+ // maximum input value for which the formula needs to work.
+ max := c.npages * pageSize
+
+ if powerOfTwo(d) {
+ // If the size is a power of two, heapBitsForObject can divide even faster by masking.
+ // Compute this mask.
+ if max >= 1<<16 {
+ panic("max too big for power of two size")
+ }
+ c.mask = 1<<16 - d
+ }
+
+ // Compute pre-shift by factoring power of 2 out of d.
+ for d%2 == 0 {
+ c.shift++
+ d >>= 1
+ max >>= 1
+ }
+
+ // Find the smallest k that works.
+ // A small k allows us to fit the math required into 32 bits
+ // so we can use 32-bit multiplies and shifts on 32-bit platforms.
+nextk:
+ for k := uint(0); ; k++ {
+ mul := (int(1)<<k + d - 1) / d // ⌈2^k / d⌉
+
+ // Test to see if mul works.
+ for n := 0; n <= max; n++ {
+ if n*mul>>k != n/d {
+ continue nextk
+ }
+ }
+ if mul >= 1<<16 {
+ panic("mul too big")
+ }
+ if uint64(mul)*uint64(max) >= 1<<32 {
+ panic("mul*max too big")
+ }
+ c.mul = mul
+ c.shift2 = k
+ break
+ }
+
+ // double-check.
+ for n := 0; n <= max; n++ {
+ if n*c.mul>>c.shift2 != n/d {
+ fmt.Printf("d=%d max=%d mul=%d shift2=%d n=%d\n", d, max, c.mul, c.shift2, n)
+ panic("bad multiply magic")
+ }
+ // Also check the exact computations that will be done by the runtime,
+ // for both 32 and 64 bit operations.
+ if uint32(n)*uint32(c.mul)>>uint8(c.shift2) != uint32(n/d) {
+ fmt.Printf("d=%d max=%d mul=%d shift2=%d n=%d\n", d, max, c.mul, c.shift2, n)
+ panic("bad 32-bit multiply magic")
+ }
+ if uint64(n)*uint64(c.mul)>>uint8(c.shift2) != uint64(n/d) {
+ fmt.Printf("d=%d max=%d mul=%d shift2=%d n=%d\n", d, max, c.mul, c.shift2, n)
+ panic("bad 64-bit multiply magic")
+ }
+ }
+}
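
The consuming side of these constants computes an object index from a span offset without a hardware divide. A sketch of that use, written against the class type above (the runtime's actual field names differ, and this helper is not part of the tool):

// objIndexOf equals offset / uintptr(c.size) for 0 <= offset <= c.npages*pageSize,
// using only a shift, a multiply, and another shift.
func objIndexOf(offset uintptr, c class) uintptr {
	return (offset >> c.shift) * uintptr(c.mul) >> c.shift2
}
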
+
+func printComment(w io.Writer, classes []class) {
+ fmt.Fprintf(w, "// %-5s %-9s %-10s %-7s %-10s %-9s\n", "class", "bytes/obj", "bytes/span", "objects", "tail waste", "max waste")
+ prevSize := 0
+ for i, c := range classes {
+ if i == 0 {
+ continue
+ }
+ spanSize := c.npages * pageSize
+ objects := spanSize / c.size
+ tailWaste := spanSize - c.size*(spanSize/c.size)
+ maxWaste := float64((c.size-prevSize-1)*objects+tailWaste) / float64(spanSize)
+ prevSize = c.size
+ fmt.Fprintf(w, "// %5d %9d %10d %7d %10d %8.2f%%\n", i, c.size, spanSize, objects, tailWaste, 100*maxWaste)
+ }
+ fmt.Fprintf(w, "\n")
+}
+
+func printClasses(w io.Writer, classes []class) {
+ fmt.Fprintln(w, "const (")
+ fmt.Fprintf(w, "_MaxSmallSize = %d\n", maxSmallSize)
+ fmt.Fprintf(w, "smallSizeDiv = %d\n", smallSizeDiv)
+ fmt.Fprintf(w, "smallSizeMax = %d\n", smallSizeMax)
+ fmt.Fprintf(w, "largeSizeDiv = %d\n", largeSizeDiv)
+ fmt.Fprintf(w, "_NumSizeClasses = %d\n", len(classes))
+ fmt.Fprintf(w, "_PageShift = %d\n", pageShift)
+ fmt.Fprintln(w, ")")
+
+ fmt.Fprint(w, "var class_to_size = [_NumSizeClasses]uint16 {")
+ for _, c := range classes {
+ fmt.Fprintf(w, "%d,", c.size)
+ }
+ fmt.Fprintln(w, "}")
+
+ fmt.Fprint(w, "var class_to_allocnpages = [_NumSizeClasses]uint8 {")
+ for _, c := range classes {
+ fmt.Fprintf(w, "%d,", c.npages)
+ }
+ fmt.Fprintln(w, "}")
+
+ fmt.Fprintln(w, "type divMagic struct {")
+ fmt.Fprintln(w, " shift uint8")
+ fmt.Fprintln(w, " shift2 uint8")
+ fmt.Fprintln(w, " mul uint16")
+ fmt.Fprintln(w, " baseMask uint16")
+ fmt.Fprintln(w, "}")
+ fmt.Fprint(w, "var class_to_divmagic = [_NumSizeClasses]divMagic {")
+ for _, c := range classes {
+ fmt.Fprintf(w, "{%d,%d,%d,%d},", c.shift, c.shift2, c.mul, c.mask)
+ }
+ fmt.Fprintln(w, "}")
+
+ // map from size to size class, for small sizes.
+ sc := make([]int, smallSizeMax/smallSizeDiv+1)
+ for i := range sc {
+ size := i * smallSizeDiv
+ for j, c := range classes {
+ if c.size >= size {
+ sc[i] = j
+ break
+ }
+ }
+ }
+ fmt.Fprint(w, "var size_to_class8 = [smallSizeMax/smallSizeDiv+1]uint8 {")
+ for _, v := range sc {
+ fmt.Fprintf(w, "%d,", v)
+ }
+ fmt.Fprintln(w, "}")
+
+ // map from size to size class, for large sizes.
+ sc = make([]int, (maxSmallSize-smallSizeMax)/largeSizeDiv+1)
+ for i := range sc {
+ size := smallSizeMax + i*largeSizeDiv
+ for j, c := range classes {
+ if c.size >= size {
+ sc[i] = j
+ break
+ }
+ }
+ }
+ fmt.Fprint(w, "var size_to_class128 = [(_MaxSmallSize-smallSizeMax)/largeSizeDiv+1]uint8 {")
+ for _, v := range sc {
+ fmt.Fprintf(w, "%d,", v)
+ }
+ fmt.Fprintln(w, "}")
+}
diff --git a/src/runtime/mmap.go b/src/runtime/mmap.go
new file mode 100644
index 0000000..1b1848b
--- /dev/null
+++ b/src/runtime/mmap.go
@@ -0,0 +1,27 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !aix
+// +build !darwin
+// +build !js
+// +build !linux !amd64
+// +build !linux !arm64
+// +build !openbsd
+// +build !plan9
+// +build !solaris
+// +build !windows
+
+package runtime
+
+import "unsafe"
+
+// mmap calls the mmap system call. It is implemented in assembly.
+// We only pass the lower 32 bits of file offset to the
+// assembly routine; the higher bits (if required) should be provided
+// by the assembly routine as 0.
+// The err result is an OS error code such as ENOMEM.
+func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (p unsafe.Pointer, err int)
+
+// munmap calls the munmap system call. It is implemented in assembly.
+func munmap(addr unsafe.Pointer, n uintptr)
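
As a point of reference, callers elsewhere in the runtime use this pair roughly as follows when reserving address space. The _PROT_* and _MAP_* constants are defined in the per-OS defs files, not here, so treat this as a hedged sketch rather than code from this change:

// sysReserveSketch (hypothetical name) reserves n bytes of address space
// near the hint v without committing it; on failure it returns nil.
func sysReserveSketch(v unsafe.Pointer, n uintptr) unsafe.Pointer {
	p, err := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
	if err != 0 {
		return nil
	}
	return p
}
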
diff --git a/src/runtime/mpagealloc.go b/src/runtime/mpagealloc.go
new file mode 100644
index 0000000..dac1f39
--- /dev/null
+++ b/src/runtime/mpagealloc.go
@@ -0,0 +1,1007 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Page allocator.
+//
+// The page allocator manages mapped pages (defined by pageSize, NOT
+// physPageSize) for allocation and re-use. It is embedded into mheap.
+//
+// Pages are managed using a bitmap that is sharded into chunks.
+// In the bitmap, 1 means in-use, and 0 means free. The bitmap spans the
+// process's address space. Chunks are managed in a sparse-array-style structure
+// similar to mheap.arenas, since the bitmap may be large on some systems.
+//
+// The bitmap is efficiently searched by using a radix tree in combination
+// with fast bit-wise intrinsics. Allocation is performed using an address-ordered
+// first-fit approach.
+//
+// Each entry in the radix tree is a summary that describes three properties of
+// a particular region of the address space: the number of contiguous free pages
+// at the start and end of the region it represents, and the maximum number of
+// contiguous free pages found anywhere in that region.
+//
+// Each level of the radix tree is stored as one contiguous array, which represents
+// a different granularity of subdivision of the process's address space. Thus, this
+// radix tree is actually implicit in these large arrays, as opposed to having explicit
+// dynamically-allocated pointer-based node structures. Naturally, these arrays may be
+// quite large for systems with large address spaces, so in these cases they are mapped
+// into memory as needed. The leaf summaries of the tree correspond to a bitmap chunk.
+//
+// The root level (referred to as L0 and index 0 in pageAlloc.summary) has each
+// summary represent the largest section of address space (16 GiB on 64-bit systems),
+// with each subsequent level representing successively smaller subsections until we
+// reach the finest granularity at the leaves, a chunk.
+//
+// More specifically, each summary in each level (except for leaf summaries)
+// represents some number of entries in the following level. For example, each
+// summary in the root level may represent a 16 GiB region of address space,
+// and in the next level there could be 8 corresponding entries which represent 2
+// GiB subsections of that 16 GiB region, each of which could correspond to 8
+// entries in the next level which each represent 256 MiB regions, and so on.
+//
+// Thus, this design only scales up to heaps of a certain size, but it can always be
+// extended to larger heaps by simply adding levels to the radix tree, which mostly
+// costs additional virtual address space. The choice of managing large arrays also
+// means that a large amount of virtual address space may be reserved by the runtime.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const (
+ // The size of a bitmap chunk, i.e. the amount of bits (that is, pages) to consider
+ // in the bitmap at once.
+ pallocChunkPages = 1 << logPallocChunkPages
+ pallocChunkBytes = pallocChunkPages * pageSize
+ logPallocChunkPages = 9
+ logPallocChunkBytes = logPallocChunkPages + pageShift
+
+ // The number of radix bits for each level.
+ //
+ // The value of 3 is chosen such that the block of summaries we need to scan at
+ // each level fits in 64 bytes (2^3 summaries * 8 bytes per summary), which is
+ // close to the L1 cache line width on many systems. Also, a value of 3 fits 4 tree
+ // levels perfectly into the 21-bit pallocBits summary field at the root level.
+ //
+ // The following equation explains how each of the constants relate:
+ // summaryL0Bits + (summaryLevels-1)*summaryLevelBits + logPallocChunkBytes = heapAddrBits
+ //
+ // summaryLevels is an architecture-dependent value defined in mpagealloc_*.go.
+ summaryLevelBits = 3
+ summaryL0Bits = heapAddrBits - logPallocChunkBytes - (summaryLevels-1)*summaryLevelBits
+
+ // pallocChunksL2Bits is the number of bits of the chunk index number
+ // covered by the second level of the chunks map.
+ //
+ // See (*pageAlloc).chunks for more details. Update the documentation
+ // there should this change.
+ pallocChunksL2Bits = heapAddrBits - logPallocChunkBytes - pallocChunksL1Bits
+ pallocChunksL1Shift = pallocChunksL2Bits
+)
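
Plugging in the 64-bit configuration (heapAddrBits = 48 and summaryLevels = 5, both defined elsewhere in the runtime and assumed here) makes the geometry described above concrete. The following standalone arithmetic is illustrative only:

package main

import "fmt"

func main() {
	const (
		pageShift           = 13
		logPallocChunkPages = 9
		logPallocChunkBytes = logPallocChunkPages + pageShift // 22: 4 MiB chunks
		summaryLevelBits    = 3
		summaryLevels       = 5  // assumed 64-bit value
		heapAddrBits        = 48 // assumed 64-bit value
		summaryL0Bits       = heapAddrBits - logPallocChunkBytes - (summaryLevels-1)*summaryLevelBits
	)
	fmt.Println("root summaries:", 1<<summaryL0Bits)                                 // 16384
	fmt.Println("GiB covered per root summary:", 1<<(heapAddrBits-summaryL0Bits-30)) // 16
	fmt.Println("MiB covered per leaf summary:", 1<<(logPallocChunkBytes-20))        // 4
}
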
+
+// Maximum searchAddr value, which indicates that the heap has no free space.
+//
+// We alias maxOffAddr just to make it clear that this is the maximum address
+// for the page allocator's search space. See maxOffAddr for details.
+var maxSearchAddr = maxOffAddr
+
+// Global chunk index.
+//
+// Represents an index into the leaf level of the radix tree.
+// Similar to arenaIndex, except instead of arenas, it divides the address
+// space into chunks.
+type chunkIdx uint
+
+// chunkIndex returns the global index of the palloc chunk containing the
+// pointer p.
+func chunkIndex(p uintptr) chunkIdx {
+ return chunkIdx((p - arenaBaseOffset) / pallocChunkBytes)
+}
+
+// chunkBase returns the base address of the palloc chunk at index ci.
+func chunkBase(ci chunkIdx) uintptr {
+ return uintptr(ci)*pallocChunkBytes + arenaBaseOffset
+}
+
+// chunkPageIndex computes the index of the page that contains p,
+// relative to the chunk which contains p.
+func chunkPageIndex(p uintptr) uint {
+ return uint(p % pallocChunkBytes / pageSize)
+}
+
+// l1 returns the index into the first level of (*pageAlloc).chunks.
+func (i chunkIdx) l1() uint {
+ if pallocChunksL1Bits == 0 {
+ // Let the compiler optimize this away if there's no
+ // L1 map.
+ return 0
+ } else {
+ return uint(i) >> pallocChunksL1Shift
+ }
+}
+
+// l2 returns the index into the second level of (*pageAlloc).chunks.
+func (i chunkIdx) l2() uint {
+ if pallocChunksL1Bits == 0 {
+ return uint(i)
+ } else {
+ return uint(i) & (1<<pallocChunksL2Bits - 1)
+ }
+}
+
+// offAddrToLevelIndex converts an address in the offset address space
+// to the index into summary[level] containing addr.
+func offAddrToLevelIndex(level int, addr offAddr) int {
+ return int((addr.a - arenaBaseOffset) >> levelShift[level])
+}
+
+// levelIndexToOffAddr converts an index into summary[level] into
+// the corresponding address in the offset address space.
+func levelIndexToOffAddr(level, idx int) offAddr {
+ return offAddr{(uintptr(idx) << levelShift[level]) + arenaBaseOffset}
+}
+
+// addrsToSummaryRange converts base and limit pointers into a range
+// of entries for the given summary level.
+//
+// The returned range is inclusive on the lower bound and exclusive on
+// the upper bound.
+func addrsToSummaryRange(level int, base, limit uintptr) (lo int, hi int) {
+ // This is slightly more nuanced than just a shift for the exclusive
+ // upper-bound. Note that the exclusive upper bound may be within a
+ // summary at this level, meaning if we just do the obvious computation
+ // hi will end up being an inclusive upper bound. Unfortunately, just
+ // adding 1 to that is too broad since we might be on the very edge
+ // of a summary's max page count boundary for this level
+ // (1 << levelLogPages[level]). So, make limit an inclusive upper bound
+ // then shift, then add 1, so we get an exclusive upper bound at the end.
+ lo = int((base - arenaBaseOffset) >> levelShift[level])
+ hi = int(((limit-1)-arenaBaseOffset)>>levelShift[level]) + 1
+ return
+}
+
+// blockAlignSummaryRange aligns indices into the given level to that
+// level's block width (1 << levelBits[level]). It assumes lo is inclusive
+// and hi is exclusive, and so aligns them down and up respectively.
+func blockAlignSummaryRange(level int, lo, hi int) (int, int) {
+ e := uintptr(1) << levelBits[level]
+ return int(alignDown(uintptr(lo), e)), int(alignUp(uintptr(hi), e))
+}
+
+type pageAlloc struct {
+ // Radix tree of summaries.
+ //
+ // Each slice's cap represents the whole memory reservation.
+ // Each slice's len reflects the allocator's maximum known
+ // mapped heap address for that level.
+ //
+ // The backing store of each summary level is reserved in init
+ // and may or may not be committed in grow (small address spaces
+ // may commit all the memory in init).
+ //
+ // The purpose of keeping len <= cap is to enforce bounds checks
+ // on the top end of the slice so that instead of an unknown
+ // runtime segmentation fault, we get a much friendlier out-of-bounds
+ // error.
+ //
+ // To iterate over a summary level, use inUse to determine which ranges
+ // are currently available. Otherwise one might try to access
+ // memory which is only Reserved, which may result in a hard fault.
+ //
+ // We may still get segmentation faults < len since some of that
+ // memory may not be committed yet.
+ summary [summaryLevels][]pallocSum
+
+ // chunks is a slice of bitmap chunks.
+ //
+ // The total size of chunks is quite large on most 64-bit platforms
+ // (O(GiB) or more) if flattened, so rather than making one large mapping
+ // (which has problems on some platforms, even when PROT_NONE) we use a
+ // two-level sparse array approach similar to the arena index in mheap.
+ //
+ // To find the chunk containing a memory address `a`, do:
+ // chunkOf(chunkIndex(a))
+ //
+ // Below is a table describing the configuration for chunks for various
+ // heapAddrBits supported by the runtime.
+ //
+ // heapAddrBits | L1 Bits | L2 Bits | L2 Entry Size
+ // ------------------------------------------------
+ // 32 | 0 | 10 | 128 KiB
+ // 33 (iOS) | 0 | 11 | 256 KiB
+ // 48 | 13 | 13 | 1 MiB
+ //
+ // There's no reason to use the L1 part of chunks on 32-bit: the
+ // address space is small, so the L2 is small. For platforms with a
+ // 48-bit address space, we pick the L1 such that the L2 is 1 MiB
+ // in size, which is a good balance between low granularity without
+ // making the impact on BSS too high (note the L1 is stored directly
+ // in pageAlloc).
+ //
+ // To iterate over the bitmap, use inUse to determine which ranges
+ // are currently available. Otherwise one might iterate over unused
+ // ranges.
+ //
+ // TODO(mknyszek): Consider changing the definition of the bitmap
+ // such that 1 means free and 0 means in-use so that summaries and
+ // the bitmaps align better on zero-values.
+ chunks [1 << pallocChunksL1Bits]*[1 << pallocChunksL2Bits]pallocData
+
+ // The address to start an allocation search with. It must never
+ // point to any memory that is not contained in inUse, i.e.
+ // inUse.contains(searchAddr.addr()) must always be true. The one
+ // exception to this rule is that it may take on the value of
+ // maxOffAddr to indicate that the heap is exhausted.
+ //
+ // We guarantee that all valid heap addresses below this value
+ // are allocated and not worth searching.
+ searchAddr offAddr
+
+ // start and end represent the chunk indices
+ // which pageAlloc knows about. It assumes
+ // chunks in the range [start, end) are
+ // currently ready to use.
+ start, end chunkIdx
+
+ // inUse is a slice of ranges of address space which are
+ // known by the page allocator to be currently in-use (passed
+ // to grow).
+ //
+ // This field is currently unused on 32-bit architectures but
+ // is harmless to track. We care much more about having a
+ // contiguous heap in these cases and take additional measures
+ // to ensure that, so in nearly all cases this should have just
+ // 1 element.
+ //
+ // All access is protected by the mheapLock.
+ inUse addrRanges
+
+ // scav stores the scavenger state.
+ //
+ // All fields are protected by mheapLock.
+ scav struct {
+ // inUse is a slice of ranges of address space which have not
+ // yet been looked at by the scavenger.
+ inUse addrRanges
+
+ // gen is the scavenge generation number.
+ gen uint32
+
+ // reservationBytes is how large of a reservation should be made
+ // in bytes of address space for each scavenge iteration.
+ reservationBytes uintptr
+
+ // released is the amount of memory released this generation.
+ released uintptr
+
+ // scavLWM is the lowest (offset) address that the scavenger reached this
+ // scavenge generation.
+ scavLWM offAddr
+
+ // freeHWM is the highest (offset) address of a page that was freed to
+ // the page allocator this scavenge generation.
+ freeHWM offAddr
+ }
+
+ // mheap_.lock. This level of indirection makes it possible
+ // to test pageAlloc independently of the runtime allocator.
+ mheapLock *mutex
+
+ // sysStat is the runtime memstat to update when new system
+ // memory is committed by the pageAlloc for allocation metadata.
+ sysStat *sysMemStat
+
+ // Whether or not this struct is being used in tests.
+ test bool
+}
+
+func (p *pageAlloc) init(mheapLock *mutex, sysStat *sysMemStat) {
+ if levelLogPages[0] > logMaxPackedValue {
+ // We can't represent 1<<levelLogPages[0] pages, the maximum number
+ // of pages we need to represent at the root level, in a summary, which
+ // is a big problem. Throw.
+ print("runtime: root level max pages = ", 1<<levelLogPages[0], "\n")
+ print("runtime: summary max pages = ", maxPackedValue, "\n")
+ throw("root level max pages doesn't fit in summary")
+ }
+ p.sysStat = sysStat
+
+ // Initialize p.inUse.
+ p.inUse.init(sysStat)
+
+ // System-dependent initialization.
+ p.sysInit()
+
+ // Start with the searchAddr in a state indicating there's no free memory.
+ p.searchAddr = maxSearchAddr
+
+ // Set the mheapLock.
+ p.mheapLock = mheapLock
+
+ // Initialize scavenge tracking state.
+ p.scav.scavLWM = maxSearchAddr
+}
+
+// tryChunkOf returns the bitmap data for the given chunk.
+//
+// Returns nil if the chunk data has not been mapped.
+func (p *pageAlloc) tryChunkOf(ci chunkIdx) *pallocData {
+ l2 := p.chunks[ci.l1()]
+ if l2 == nil {
+ return nil
+ }
+ return &l2[ci.l2()]
+}
+
+// chunkOf returns the chunk at the given chunk index.
+//
+// The chunk index must be valid or this method may throw.
+func (p *pageAlloc) chunkOf(ci chunkIdx) *pallocData {
+ return &p.chunks[ci.l1()][ci.l2()]
+}
+
+// grow sets up the metadata for the address range [base, base+size).
+// It may allocate metadata, in which case *p.sysStat will be updated.
+//
+// p.mheapLock must be held.
+func (p *pageAlloc) grow(base, size uintptr) {
+ assertLockHeld(p.mheapLock)
+
+ // Round up to chunks, since we can't deal with increments smaller
+ // than chunks. Also, sysGrow expects aligned values.
+ limit := alignUp(base+size, pallocChunkBytes)
+ base = alignDown(base, pallocChunkBytes)
+
+ // Grow the summary levels in a system-dependent manner.
+ // We just update a bunch of additional metadata here.
+ p.sysGrow(base, limit)
+
+ // Update p.start and p.end.
+ // If no growth happened yet, start == 0. This is generally
+ // safe since the zero page is unmapped.
+ firstGrowth := p.start == 0
+ start, end := chunkIndex(base), chunkIndex(limit)
+ if firstGrowth || start < p.start {
+ p.start = start
+ }
+ if end > p.end {
+ p.end = end
+ }
+ // Note that [base, limit) will never overlap with any existing
+ // range inUse because grow only ever adds never-used memory
+ // regions to the page allocator.
+ p.inUse.add(makeAddrRange(base, limit))
+
+ // A grow operation is a lot like a free operation, so if our
+ // chunk ends up below p.searchAddr, update p.searchAddr to the
+ // new address, just like in free.
+ if b := (offAddr{base}); b.lessThan(p.searchAddr) {
+ p.searchAddr = b
+ }
+
+ // Add entries into chunks, which is sparse, if needed. Then,
+ // initialize the bitmap.
+ //
+ // Newly-grown memory is always considered scavenged.
+ // Set all the bits in the scavenged bitmaps high.
+ for c := chunkIndex(base); c < chunkIndex(limit); c++ {
+ if p.chunks[c.l1()] == nil {
+ // Create the necessary l2 entry.
+ //
+ // Store it atomically to avoid races with readers which
+ // don't acquire the heap lock.
+ r := sysAlloc(unsafe.Sizeof(*p.chunks[0]), p.sysStat)
+ atomic.StorepNoWB(unsafe.Pointer(&p.chunks[c.l1()]), r)
+ }
+ p.chunkOf(c).scavenged.setRange(0, pallocChunkPages)
+ }
+
+ // Update summaries accordingly. The grow acts like a free, so
+ // we need to ensure this newly-free memory is visible in the
+ // summaries.
+ p.update(base, size/pageSize, true, false)
+}
+
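+// exampleGrowRounding is an illustrative sketch, not part of this change, of
+// the rounding grow performs before touching any metadata: the requested
+// range is widened to whole chunks because bitmaps and summaries are managed
+// in pallocChunkBytes-sized units. The function name is hypothetical; the
+// 4 MiB figure below is an assumption about the chunk size, used only as an
+// example.
+func exampleGrowRounding(base, size uintptr) (lo, hi uintptr) {
+ // E.g. with 4 MiB chunks, a 1-byte grow request that starts mid-chunk
+ // still causes metadata to be set up for the full enclosing chunk.
+ return alignDown(base, pallocChunkBytes), alignUp(base+size, pallocChunkBytes)
+}
+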
+// update updates heap metadata. It must be called each time the bitmap
+// is updated.
+//
+// If contig is true, update does some optimizations assuming that there was
+// a contiguous allocation or free between base and base+npages*pageSize. alloc indicates
+// whether the operation performed was an allocation or a free.
+//
+// p.mheapLock must be held.
+func (p *pageAlloc) update(base, npages uintptr, contig, alloc bool) {
+ assertLockHeld(p.mheapLock)
+
+ // base, limit, start, and end are inclusive.
+ limit := base + npages*pageSize - 1
+ sc, ec := chunkIndex(base), chunkIndex(limit)
+
+ // Handle updating the lowest level first.
+ if sc == ec {
+ // Fast path: the allocation doesn't span more than one chunk,
+ // so update this one and if the summary didn't change, return.
+ x := p.summary[len(p.summary)-1][sc]
+ y := p.chunkOf(sc).summarize()
+ if x == y {
+ return
+ }
+ p.summary[len(p.summary)-1][sc] = y
+ } else if contig {
+ // Slow contiguous path: the allocation spans more than one chunk
+ // and at least one summary is guaranteed to change.
+ summary := p.summary[len(p.summary)-1]
+
+ // Update the summary for chunk sc.
+ summary[sc] = p.chunkOf(sc).summarize()
+
+ // Update the summaries for chunks in between, which are
+ // either totally allocated or freed.
+ whole := p.summary[len(p.summary)-1][sc+1 : ec]
+ if alloc {
+ // Should optimize into a memclr.
+ for i := range whole {
+ whole[i] = 0
+ }
+ } else {
+ for i := range whole {
+ whole[i] = freeChunkSum
+ }
+ }
+
+ // Update the summary for chunk ec.
+ summary[ec] = p.chunkOf(ec).summarize()
+ } else {
+ // Slow general path: the allocation spans more than one chunk
+ // and at least one summary is guaranteed to change.
+ //
+ // We can't assume a contiguous allocation happened, so walk over
+ // every chunk in the range and manually recompute the summary.
+ summary := p.summary[len(p.summary)-1]
+ for c := sc; c <= ec; c++ {
+ summary[c] = p.chunkOf(c).summarize()
+ }
+ }
+
+ // Walk up the radix tree and update the summaries appropriately.
+ changed := true
+ for l := len(p.summary) - 2; l >= 0 && changed; l-- {
+ // Update summaries at level l from summaries at level l+1.
+ changed = false
+
+ // "Constants" for the previous level which we
+ // need to compute the summary from that level.
+ logEntriesPerBlock := levelBits[l+1]
+ logMaxPages := levelLogPages[l+1]
+
+ // lo and hi describe all the parts of the level we need to look at.
+ lo, hi := addrsToSummaryRange(l, base, limit+1)
+
+ // Iterate over each block, updating the corresponding summary in the less-granular level.
+ for i := lo; i < hi; i++ {
+ children := p.summary[l+1][i<<logEntriesPerBlock : (i+1)<<logEntriesPerBlock]
+ sum := mergeSummaries(children, logMaxPages)
+ old := p.summary[l][i]
+ if old != sum {
+ changed = true
+ p.summary[l][i] = sum
+ }
+ }
+ }
+}
+
+// allocRange marks the range of memory [base, base+npages*pageSize) as
+// allocated. It also updates the summaries to reflect the newly-updated
+// bitmap.
+//
+// Returns the amount of scavenged memory in bytes present in the
+// allocated range.
+//
+// p.mheapLock must be held.
+func (p *pageAlloc) allocRange(base, npages uintptr) uintptr {
+ assertLockHeld(p.mheapLock)
+
+ limit := base + npages*pageSize - 1
+ sc, ec := chunkIndex(base), chunkIndex(limit)
+ si, ei := chunkPageIndex(base), chunkPageIndex(limit)
+
+ scav := uint(0)
+ if sc == ec {
+ // The range doesn't cross any chunk boundaries.
+ chunk := p.chunkOf(sc)
+ scav += chunk.scavenged.popcntRange(si, ei+1-si)
+ chunk.allocRange(si, ei+1-si)
+ } else {
+ // The range crosses at least one chunk boundary.
+ chunk := p.chunkOf(sc)
+ scav += chunk.scavenged.popcntRange(si, pallocChunkPages-si)
+ chunk.allocRange(si, pallocChunkPages-si)
+ for c := sc + 1; c < ec; c++ {
+ chunk := p.chunkOf(c)
+ scav += chunk.scavenged.popcntRange(0, pallocChunkPages)
+ chunk.allocAll()
+ }
+ chunk = p.chunkOf(ec)
+ scav += chunk.scavenged.popcntRange(0, ei+1)
+ chunk.allocRange(0, ei+1)
+ }
+ p.update(base, npages, true, true)
+ return uintptr(scav) * pageSize
+}
+
+// findMappedAddr returns the smallest mapped offAddr that is
+// >= addr. That is, if addr refers to mapped memory, then it is
+// returned. If addr is higher than any mapped region, then
+// it returns maxOffAddr.
+//
+// p.mheapLock must be held.
+func (p *pageAlloc) findMappedAddr(addr offAddr) offAddr {
+ assertLockHeld(p.mheapLock)
+
+ // If we're not in a test, validate first by checking mheap_.arenas.
+ // This is a fast path which is only safe to use outside of testing.
+ ai := arenaIndex(addr.addr())
+ if p.test || mheap_.arenas[ai.l1()] == nil || mheap_.arenas[ai.l1()][ai.l2()] == nil {
+ vAddr, ok := p.inUse.findAddrGreaterEqual(addr.addr())
+ if ok {
+ return offAddr{vAddr}
+ } else {
+ // The candidate search address is greater than any
+ // known address, which means we definitely have no
+ // free memory left.
+ return maxOffAddr
+ }
+ }
+ return addr
+}
+
+// find searches for the first (address-ordered) contiguous free region of
+// npages in size and returns a base address for that region.
+//
+// It uses p.searchAddr to prune its search and assumes that no palloc chunks
+// below chunkIndex(p.searchAddr) contain any free memory at all.
+//
+// find also computes and returns a candidate p.searchAddr, which may or
+// may not prune more of the address space than p.searchAddr already does.
+// This candidate is always a valid p.searchAddr.
+//
+// find represents the slow path and the full radix tree search.
+//
+// Returns a base address of 0 on failure, in which case the candidate
+// searchAddr returned is invalid and must be ignored.
+//
+// p.mheapLock must be held.
+func (p *pageAlloc) find(npages uintptr) (uintptr, offAddr) {
+ assertLockHeld(p.mheapLock)
+
+ // Search algorithm.
+ //
+ // This algorithm walks each level l of the radix tree from the root level
+ // to the leaf level. It iterates over at most 1 << levelBits[l] of entries
+ // in a given level in the radix tree, and uses the summary information to
+ // find either:
+ // 1) That a given subtree contains a large enough contiguous region, at
+ // which point it continues iterating on the next level, or
+ // 2) That there are enough contiguous boundary-crossing bits to satisfy
+ // the allocation, at which point it knows exactly where to start
+ // allocating from.
+ //
+ // i tracks the index into the current level l's structure for the
+ // contiguous 1 << levelBits[l] entries we're actually interested in.
+ //
+ // NOTE: Technically this search could allocate a region which crosses
+ // the arenaBaseOffset boundary, which when arenaBaseOffset != 0, is
+ // a discontinuity. However, the only way this could happen is if the
+ // page at the zero address is mapped, and this is impossible on
+ // every system we support where arenaBaseOffset != 0. So, the
+ // discontinuity is already encoded in the fact that the OS will never
+ // map the zero page for us, and this function doesn't try to handle
+ // this case in any way.
+
+ // i is the beginning of the block of entries we're searching at the
+ // current level.
+ i := 0
+
+ // firstFree is the region of address space in which we are certain to
+ // find the first free page in the heap. base and bound are the inclusive
+ // bounds of this window, and both are addresses in the linearized, contiguous
+ // view of the address space (with arenaBaseOffset pre-added). At each level,
+ // this window is narrowed as we find the memory region containing the
+ // first free page of memory. To begin with, the range reflects the
+ // full process address space.
+ //
+ // firstFree is updated by calling foundFree each time free space in the
+ // heap is discovered.
+ //
+ // At the end of the search, base.addr() is the best new
+ // searchAddr we could deduce in this search.
+ firstFree := struct {
+ base, bound offAddr
+ }{
+ base: minOffAddr,
+ bound: maxOffAddr,
+ }
+ // foundFree takes the given address range [addr, addr+size) and
+ // updates firstFree if it is a narrower range. The input range must
+ // either be fully contained within firstFree or not overlap with it
+ // at all.
+ //
+ // This way, we'll record the first summary we find with any free
+ // pages on the root level and narrow that down if we descend into
+ // that summary. But as soon as we need to iterate beyond that summary
+ // in a level to find a large enough range, we'll stop narrowing.
+ foundFree := func(addr offAddr, size uintptr) {
+ if firstFree.base.lessEqual(addr) && addr.add(size-1).lessEqual(firstFree.bound) {
+ // This range fits within the current firstFree window, so narrow
+ // down the firstFree window to the base and bound of this range.
+ firstFree.base = addr
+ firstFree.bound = addr.add(size - 1)
+ } else if !(addr.add(size-1).lessThan(firstFree.base) || firstFree.bound.lessThan(addr)) {
+ // This range only partially overlaps with the firstFree range,
+ // so throw.
+ print("runtime: addr = ", hex(addr.addr()), ", size = ", size, "\n")
+ print("runtime: base = ", hex(firstFree.base.addr()), ", bound = ", hex(firstFree.bound.addr()), "\n")
+ throw("range partially overlaps")
+ }
+ }
+
+ // lastSum is the summary which we saw on the previous level that made us
+ // move on to the next level. Used to print additional information in the
+ // case of a catastrophic failure.
+ // lastSumIdx is that summary's index in the previous level.
+ lastSum := packPallocSum(0, 0, 0)
+ lastSumIdx := -1
+
+nextLevel:
+ for l := 0; l < len(p.summary); l++ {
+ // For the root level, entriesPerBlock is the whole level.
+ entriesPerBlock := 1 << levelBits[l]
+ logMaxPages := levelLogPages[l]
+
+ // We've moved into a new level, so let's update i to our new
+ // starting index. This is a no-op for level 0.
+ i <<= levelBits[l]
+
+ // Slice out the block of entries we care about.
+ entries := p.summary[l][i : i+entriesPerBlock]
+
+ // Determine j0, the first index we should start iterating from.
+ // The searchAddr may help us eliminate iterations if we followed the
+ // searchAddr on the previous level or we're on the root level, in which
+ // case the searchAddr should be the same as i after levelShift.
+ j0 := 0
+ if searchIdx := offAddrToLevelIndex(l, p.searchAddr); searchIdx&^(entriesPerBlock-1) == i {
+ j0 = searchIdx & (entriesPerBlock - 1)
+ }
+
+ // Run over the level entries looking for
+ // a contiguous run of at least npages either
+ // within an entry or across entries.
+ //
+ // base contains the page index (relative to
+ // the first entry's first page) of the currently
+ // considered run of consecutive pages.
+ //
+ // size contains the size of the currently considered
+ // run of consecutive pages.
+ var base, size uint
+ for j := j0; j < len(entries); j++ {
+ sum := entries[j]
+ if sum == 0 {
+ // A fully-allocated entry breaks any streak, so
+ // skip it altogether.
+ size = 0
+ continue
+ }
+
+ // We've encountered a non-zero summary which means
+ // free memory, so update firstFree.
+ foundFree(levelIndexToOffAddr(l, i+j), (uintptr(1)<<logMaxPages)*pageSize)
+
+ s := sum.start()
+ if size+s >= uint(npages) {
+ // If size == 0 we don't have a run yet,
+ // which means base isn't valid. So, set
+ // base to the first page in this block.
+ if size == 0 {
+ base = uint(j) << logMaxPages
+ }
+ // We hit npages; we're done!
+ size += s
+ break
+ }
+ if sum.max() >= uint(npages) {
+ // The entry itself contains npages contiguous
+ // free pages, so continue on the next level
+ // to find that run.
+ i += j
+ lastSumIdx = i
+ lastSum = sum
+ continue nextLevel
+ }
+ if size == 0 || s < 1<<logMaxPages {
+ // We either don't have a current run started, or this entry
+ // isn't totally free (meaning we can't continue the current
+ // one), so try to begin a new run by setting size and base
+ // based on sum.end.
+ size = sum.end()
+ base = uint(j+1)<<logMaxPages - size
+ continue
+ }
+ // The entry is completely free, so continue the run.
+ size += 1 << logMaxPages
+ }
+ if size >= uint(npages) {
+ // We found a sufficiently large run of free pages straddling
+ // some boundary, so compute the address and return it.
+ addr := levelIndexToOffAddr(l, i).add(uintptr(base) * pageSize).addr()
+ return addr, p.findMappedAddr(firstFree.base)
+ }
+ if l == 0 {
+ // We're at level zero, so that means we've exhausted our search.
+ return 0, maxSearchAddr
+ }
+
+ // We're not at level zero, and we exhausted the level we were looking in.
+ // This means that either our calculations were wrong or the level above
+ // lied to us. In either case, dump some useful state and throw.
+ print("runtime: summary[", l-1, "][", lastSumIdx, "] = ", lastSum.start(), ", ", lastSum.max(), ", ", lastSum.end(), "\n")
+ print("runtime: level = ", l, ", npages = ", npages, ", j0 = ", j0, "\n")
+ print("runtime: p.searchAddr = ", hex(p.searchAddr.addr()), ", i = ", i, "\n")
+ print("runtime: levelShift[level] = ", levelShift[l], ", levelBits[level] = ", levelBits[l], "\n")
+ for j := 0; j < len(entries); j++ {
+ sum := entries[j]
+ print("runtime: summary[", l, "][", i+j, "] = (", sum.start(), ", ", sum.max(), ", ", sum.end(), ")\n")
+ }
+ throw("bad summary data")
+ }
+
+ // Since we've gotten to this point, that means we haven't found a
+ // sufficiently-sized free region straddling some boundary (chunk or larger).
+ // This means the last summary we inspected must have had a large enough "max"
+ // value, so look inside the chunk to find a suitable run.
+ //
+ // After iterating over all levels, i must contain a chunk index which
+ // is what the final level represents.
+ ci := chunkIdx(i)
+ j, searchIdx := p.chunkOf(ci).find(npages, 0)
+ if j == ^uint(0) {
+ // We couldn't find any space in this chunk despite the summaries telling
+ // us it should be there. There's likely a bug, so dump some state and throw.
+ sum := p.summary[len(p.summary)-1][i]
+ print("runtime: summary[", len(p.summary)-1, "][", i, "] = (", sum.start(), ", ", sum.max(), ", ", sum.end(), ")\n")
+ print("runtime: npages = ", npages, "\n")
+ throw("bad summary data")
+ }
+
+ // Compute the address at which the free space starts.
+ addr := chunkBase(ci) + uintptr(j)*pageSize
+
+ // Since we actually searched the chunk, we may have
+ // found an even narrower free window.
+ searchAddr := chunkBase(ci) + uintptr(searchIdx)*pageSize
+ foundFree(offAddr{searchAddr}, chunkBase(ci+1)-searchAddr)
+ return addr, p.findMappedAddr(firstFree.base)
+}
+
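+// exampleRunAcrossEntries is an illustrative sketch, not part of this change,
+// of the boundary-straddling accumulation in find's inner loop: a run is
+// seeded from one entry's trailing free pages (its end value) and extended
+// by following entries' leading free pages (their start values) until it
+// reaches npages. It only models a single level; the descent on max and the
+// searchAddr pruning are omitted. The function name is hypothetical.
+func exampleRunAcrossEntries(entries []pallocSum, logMaxPages, npages uint) (uint, bool) {
+ var base, size uint
+ for j, sum := range entries {
+ if size+sum.start() >= npages {
+ if size == 0 {
+ base = uint(j) << logMaxPages
+ }
+ return base, true // run of npages starts at page index base
+ }
+ if sum.end() == uint(1)<<logMaxPages {
+ // The entry is completely free; the current run continues through it.
+ if size == 0 {
+ base = uint(j) << logMaxPages
+ }
+ size += uint(1) << logMaxPages
+ continue
+ }
+ // The entry is not completely free; restart the run from its tail.
+ size = sum.end()
+ base = uint(j+1)<<logMaxPages - size
+ }
+ if size >= npages {
+ return base, true
+ }
+ return 0, false
+}
+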
+// alloc allocates npages worth of memory from the page heap, returning the base
+// address for the allocation and the amount of scavenged memory in bytes
+// contained in the region [base address, base address + npages*pageSize).
+//
+// Returns a 0 base address on failure, in which case other returned values
+// should be ignored.
+//
+// p.mheapLock must be held.
+//
+// Must run on the system stack because p.mheapLock must be held.
+//
+//go:systemstack
+func (p *pageAlloc) alloc(npages uintptr) (addr uintptr, scav uintptr) {
+ assertLockHeld(p.mheapLock)
+
+ // If the searchAddr refers to a region which has a higher address than
+ // any known chunk, then we know we're out of memory.
+ if chunkIndex(p.searchAddr.addr()) >= p.end {
+ return 0, 0
+ }
+
+ // If npages has a chance of fitting in the chunk where the searchAddr is,
+ // search it directly.
+ searchAddr := minOffAddr
+ if pallocChunkPages-chunkPageIndex(p.searchAddr.addr()) >= uint(npages) {
+ // npages is guaranteed to be no greater than pallocChunkPages here.
+ i := chunkIndex(p.searchAddr.addr())
+ if max := p.summary[len(p.summary)-1][i].max(); max >= uint(npages) {
+ j, searchIdx := p.chunkOf(i).find(npages, chunkPageIndex(p.searchAddr.addr()))
+ if j == ^uint(0) {
+ print("runtime: max = ", max, ", npages = ", npages, "\n")
+ print("runtime: searchIdx = ", chunkPageIndex(p.searchAddr.addr()), ", p.searchAddr = ", hex(p.searchAddr.addr()), "\n")
+ throw("bad summary data")
+ }
+ addr = chunkBase(i) + uintptr(j)*pageSize
+ searchAddr = offAddr{chunkBase(i) + uintptr(searchIdx)*pageSize}
+ goto Found
+ }
+ }
+ // We failed to use a searchAddr for one reason or another, so try
+ // the slow path.
+ addr, searchAddr = p.find(npages)
+ if addr == 0 {
+ if npages == 1 {
+ // We failed to find a single free page, the smallest unit
+ // of allocation. This means we know the heap is completely
+ // exhausted. Otherwise, the heap still might have free
+ // space in it, just not enough contiguous space to
+ // accommodate npages.
+ p.searchAddr = maxSearchAddr
+ }
+ return 0, 0
+ }
+Found:
+ // Go ahead and actually mark the bits now that we have an address.
+ scav = p.allocRange(addr, npages)
+
+ // If we found a higher searchAddr, we know that all the
+ // heap memory before that searchAddr in an offset address space is
+ // allocated, so bump p.searchAddr up to the new one.
+ if p.searchAddr.lessThan(searchAddr) {
+ p.searchAddr = searchAddr
+ }
+ return addr, scav
+}
+
+// free returns npages worth of memory starting at base back to the page heap.
+//
+// p.mheapLock must be held.
+//
+// Must run on the system stack because p.mheapLock must be held.
+//
+//go:systemstack
+func (p *pageAlloc) free(base, npages uintptr) {
+ assertLockHeld(p.mheapLock)
+
+ // If we're freeing pages below the p.searchAddr, update searchAddr.
+ if b := (offAddr{base}); b.lessThan(p.searchAddr) {
+ p.searchAddr = b
+ }
+ // Update the free high watermark for the scavenger.
+ limit := base + npages*pageSize - 1
+ if offLimit := (offAddr{limit}); p.scav.freeHWM.lessThan(offLimit) {
+ p.scav.freeHWM = offLimit
+ }
+ if npages == 1 {
+ // Fast path: we're clearing a single bit, and we know exactly
+ // where it is, so mark it directly.
+ i := chunkIndex(base)
+ p.chunkOf(i).free1(chunkPageIndex(base))
+ } else {
+ // Slow path: we're clearing more bits so we may need to iterate.
+ sc, ec := chunkIndex(base), chunkIndex(limit)
+ si, ei := chunkPageIndex(base), chunkPageIndex(limit)
+
+ if sc == ec {
+ // The range doesn't cross any chunk boundaries.
+ p.chunkOf(sc).free(si, ei+1-si)
+ } else {
+ // The range crosses at least one chunk boundary.
+ p.chunkOf(sc).free(si, pallocChunkPages-si)
+ for c := sc + 1; c < ec; c++ {
+ p.chunkOf(c).freeAll()
+ }
+ p.chunkOf(ec).free(0, ei+1)
+ }
+ }
+ p.update(base, npages, true, false)
+}
+
+const (
+ pallocSumBytes = unsafe.Sizeof(pallocSum(0))
+
+ // maxPackedValue is the maximum value that any of the three fields in
+ // the pallocSum may take on.
+ maxPackedValue = 1 << logMaxPackedValue
+ logMaxPackedValue = logPallocChunkPages + (summaryLevels-1)*summaryLevelBits
+
+ freeChunkSum = pallocSum(uint64(pallocChunkPages) |
+ uint64(pallocChunkPages<<logMaxPackedValue) |
+ uint64(pallocChunkPages<<(2*logMaxPackedValue)))
+)
+
+// pallocSum is a packed summary type which packs three numbers, start, max,
+// and end, into a single 8-byte value. Each of these values is a summary of
+// a bitmap and is thus a count; each count may have a maximum value of
+// 2^21 - 1, or all three may be equal to 2^21. The latter case is represented
+// by just setting the 64th bit.
+type pallocSum uint64
+
+// packPallocSum takes a start, max, and end value and produces a pallocSum.
+func packPallocSum(start, max, end uint) pallocSum {
+ if max == maxPackedValue {
+ return pallocSum(uint64(1 << 63))
+ }
+ return pallocSum((uint64(start) & (maxPackedValue - 1)) |
+ ((uint64(max) & (maxPackedValue - 1)) << logMaxPackedValue) |
+ ((uint64(end) & (maxPackedValue - 1)) << (2 * logMaxPackedValue)))
+}
+
+// start extracts the start value from a packed sum.
+func (p pallocSum) start() uint {
+ if uint64(p)&uint64(1<<63) != 0 {
+ return maxPackedValue
+ }
+ return uint(uint64(p) & (maxPackedValue - 1))
+}
+
+// max extracts the max value from a packed sum.
+func (p pallocSum) max() uint {
+ if uint64(p)&uint64(1<<63) != 0 {
+ return maxPackedValue
+ }
+ return uint((uint64(p) >> logMaxPackedValue) & (maxPackedValue - 1))
+}
+
+// end extracts the end value from a packed sum.
+func (p pallocSum) end() uint {
+ if uint64(p)&uint64(1<<63) != 0 {
+ return maxPackedValue
+ }
+ return uint((uint64(p) >> (2 * logMaxPackedValue)) & (maxPackedValue - 1))
+}
+
+// unpack unpacks all three values from the summary.
+func (p pallocSum) unpack() (uint, uint, uint) {
+ if uint64(p)&uint64(1<<63) != 0 {
+ return maxPackedValue, maxPackedValue, maxPackedValue
+ }
+ return uint(uint64(p) & (maxPackedValue - 1)),
+ uint((uint64(p) >> logMaxPackedValue) & (maxPackedValue - 1)),
+ uint((uint64(p) >> (2 * logMaxPackedValue)) & (maxPackedValue - 1))
+}
+
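+// examplePackedSumRoundTrip is an illustrative sketch, not part of this
+// change, showing how the three counts survive a pack/unpack round trip and
+// how a fully-free region is encoded: once max saturates at maxPackedValue,
+// only the high bit is stored and all three accessors report saturation.
+// The function name is hypothetical and exists only for exposition.
+func examplePackedSumRoundTrip() {
+ // Pack three small counts into one 8-byte summary and read them back.
+ s := packPallocSum(3, 10, 7)
+ start, max, end := s.unpack()
+ println(start, max, end) // 3 10 7
+
+ // A completely free region saturates all three fields.
+ free := packPallocSum(maxPackedValue, maxPackedValue, maxPackedValue)
+ println(free.start() == maxPackedValue, free.end() == maxPackedValue) // true true
+}
+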
+// mergeSummaries merges consecutive summaries, each of which may represent
+// at most 1 << logMaxPagesPerSum pages, into a single summary.
+func mergeSummaries(sums []pallocSum, logMaxPagesPerSum uint) pallocSum {
+ // Merge the summaries in sums into one.
+ //
+ // We do this by keeping a running summary representing the merged
+ // summaries of sums[:i] in start, max, and end.
+ start, max, end := sums[0].unpack()
+ for i := 1; i < len(sums); i++ {
+ // Merge in sums[i].
+ si, mi, ei := sums[i].unpack()
+
+ // Merge in sums[i].start only if the running summary is
+ // completely free, otherwise this summary's start
+ // plays no role in the combined sum.
+ if start == uint(i)<<logMaxPagesPerSum {
+ start += si
+ }
+
+ // Recompute the max value of the running sum by looking
+ // across the boundary between the running sum and sums[i]
+ // and at the max sums[i], taking the greatest of those two
+ // and the max of the running sum.
+ if end+si > max {
+ max = end + si
+ }
+ if mi > max {
+ max = mi
+ }
+
+ // Merge in end by checking if this new summary is totally
+ // free. If it is, then we want to extend the running sum's
+ // end by the new summary. If not, then we have some alloc'd
+ // pages in there and we just want to take the end value in
+ // sums[i].
+ if ei == 1<<logMaxPagesPerSum {
+ end += 1 << logMaxPagesPerSum
+ } else {
+ end = ei
+ }
+ }
+ return packPallocSum(start, max, end)
+}
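+
+// exampleMergeSummaries is an illustrative sketch, not part of this change.
+// With a hypothetical logMaxPagesPerSum of 2, each child summary covers 4
+// pages. Merging a child whose pages look like [alloc alloc free free] with
+// a neighbor that looks like [free free alloc free] yields a combined
+// summary whose max run of 4 straddles the boundary between the children:
+// start=0, max=4, end=1. The function name is hypothetical.
+func exampleMergeSummaries() pallocSum {
+ children := []pallocSum{
+ packPallocSum(0, 2, 2), // [alloc alloc free free]
+ packPallocSum(2, 2, 1), // [free free alloc free]
+ }
+ return mergeSummaries(children, 2)
+}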
diff --git a/src/runtime/mpagealloc_32bit.go b/src/runtime/mpagealloc_32bit.go
new file mode 100644
index 0000000..331dada
--- /dev/null
+++ b/src/runtime/mpagealloc_32bit.go
@@ -0,0 +1,116 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build 386 arm mips mipsle wasm ios,arm64
+
+// wasm is treated as a 32-bit architecture for the purposes of the page
+// allocator, even though it has 64-bit pointers. This is because any wasm
+// pointer always has its top 32 bits as zero, so the effective heap address
+// space is only 2^32 bytes in size (see heapAddrBits).
+
+// ios/arm64 is treated as a 32-bit architecture for the purposes of the
+// page allocator, even though it has 64-bit pointers and a 33-bit address
+// space (see heapAddrBits). The 33 bit address space cannot be rounded up
+// to 64 bits because there are too many summary levels to fit in just 33
+// bits.
+
+package runtime
+
+import "unsafe"
+
+const (
+ // The number of levels in the radix tree.
+ summaryLevels = 4
+
+ // Constants for testing.
+ pageAlloc32Bit = 1
+ pageAlloc64Bit = 0
+
+ // Number of bits needed to represent all indices into the L1 of the
+ // chunks map.
+ //
+ // See (*pageAlloc).chunks for more details. Update the documentation
+ // there should this number change.
+ pallocChunksL1Bits = 0
+)
+
+// See comment in mpagealloc_64bit.go.
+var levelBits = [summaryLevels]uint{
+ summaryL0Bits,
+ summaryLevelBits,
+ summaryLevelBits,
+ summaryLevelBits,
+}
+
+// See comment in mpagealloc_64bit.go.
+var levelShift = [summaryLevels]uint{
+ heapAddrBits - summaryL0Bits,
+ heapAddrBits - summaryL0Bits - 1*summaryLevelBits,
+ heapAddrBits - summaryL0Bits - 2*summaryLevelBits,
+ heapAddrBits - summaryL0Bits - 3*summaryLevelBits,
+}
+
+// See comment in mpagealloc_64bit.go.
+var levelLogPages = [summaryLevels]uint{
+ logPallocChunkPages + 3*summaryLevelBits,
+ logPallocChunkPages + 2*summaryLevelBits,
+ logPallocChunkPages + 1*summaryLevelBits,
+ logPallocChunkPages,
+}
+
+// See mpagealloc_64bit.go for details.
+func (p *pageAlloc) sysInit() {
+ // Calculate how much memory all our entries will take up.
+ //
+ // This should be around 12 KiB or less.
+ totalSize := uintptr(0)
+ for l := 0; l < summaryLevels; l++ {
+ totalSize += (uintptr(1) << (heapAddrBits - levelShift[l])) * pallocSumBytes
+ }
+ totalSize = alignUp(totalSize, physPageSize)
+
+ // Reserve memory for all levels in one go. There shouldn't be much for 32-bit.
+ reservation := sysReserve(nil, totalSize)
+ if reservation == nil {
+ throw("failed to reserve page summary memory")
+ }
+ // There isn't much. Just map it and mark it as used immediately.
+ sysMap(reservation, totalSize, p.sysStat)
+ sysUsed(reservation, totalSize)
+
+ // Iterate over the reservation and cut it up into slices.
+ //
+ // Advance reservation past each level's entries so that it
+ // always points at where the next slice should start.
+ for l, shift := range levelShift {
+ entries := 1 << (heapAddrBits - shift)
+
+ // Put this reservation into a slice.
+ sl := notInHeapSlice{(*notInHeap)(reservation), 0, entries}
+ p.summary[l] = *(*[]pallocSum)(unsafe.Pointer(&sl))
+
+ reservation = add(reservation, uintptr(entries)*pallocSumBytes)
+ }
+}
+
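+// exampleCarveReservation is an illustrative sketch, not part of this change,
+// of the carving pattern in sysInit above: one contiguous block of memory is
+// cut into one slice per level, with each slice beginning where the previous
+// one ended. It uses ordinary Go slices instead of notInHeap memory; the
+// function name is hypothetical.
+func exampleCarveReservation(entriesPerLevel []int) [][]pallocSum {
+ total := 0
+ for _, n := range entriesPerLevel {
+ total += n
+ }
+ backing := make([]pallocSum, total)
+ levels := make([][]pallocSum, len(entriesPerLevel))
+ for l, n := range entriesPerLevel {
+ levels[l], backing = backing[:n:n], backing[n:]
+ }
+ return levels
+}
+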
+// See mpagealloc_64bit.go for details.
+func (p *pageAlloc) sysGrow(base, limit uintptr) {
+ if base%pallocChunkBytes != 0 || limit%pallocChunkBytes != 0 {
+ print("runtime: base = ", hex(base), ", limit = ", hex(limit), "\n")
+ throw("sysGrow bounds not aligned to pallocChunkBytes")
+ }
+
+ // Walk up the tree and update the summary slices.
+ for l := len(p.summary) - 1; l >= 0; l-- {
+ // Figure out what part of the summary array this new address space needs.
+ // Note that we need to align the ranges to the block width (1<<levelBits[l])
+ // at this level because the full block is needed to compute the summary for
+ // the next level.
+ lo, hi := addrsToSummaryRange(l, base, limit)
+ _, hi = blockAlignSummaryRange(l, lo, hi)
+ if hi > len(p.summary[l]) {
+ p.summary[l] = p.summary[l][:hi]
+ }
+ }
+}
diff --git a/src/runtime/mpagealloc_64bit.go b/src/runtime/mpagealloc_64bit.go
new file mode 100644
index 0000000..ffacb46
--- /dev/null
+++ b/src/runtime/mpagealloc_64bit.go
@@ -0,0 +1,180 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build amd64 !ios,arm64 mips64 mips64le ppc64 ppc64le riscv64 s390x
+
+// See mpagealloc_32bit.go for why ios/arm64 is excluded here.
+
+package runtime
+
+import "unsafe"
+
+const (
+ // The number of levels in the radix tree.
+ summaryLevels = 5
+
+ // Constants for testing.
+ pageAlloc32Bit = 0
+ pageAlloc64Bit = 1
+
+ // Number of bits needed to represent all indices into the L1 of the
+ // chunks map.
+ //
+ // See (*pageAlloc).chunks for more details. Update the documentation
+ // there should this number change.
+ pallocChunksL1Bits = 13
+)
+
+// levelBits is the number of bits in the radix for a given level in the super summary
+// structure.
+//
+// The sum of all the entries of levelBits should equal heapAddrBits.
+var levelBits = [summaryLevels]uint{
+ summaryL0Bits,
+ summaryLevelBits,
+ summaryLevelBits,
+ summaryLevelBits,
+ summaryLevelBits,
+}
+
+// levelShift is the number of bits to shift to acquire the radix for a given level
+// in the super summary structure.
+//
+// With levelShift, one can compute the index of the summary at level l related to a
+// pointer p by doing:
+// p >> levelShift[l]
+var levelShift = [summaryLevels]uint{
+ heapAddrBits - summaryL0Bits,
+ heapAddrBits - summaryL0Bits - 1*summaryLevelBits,
+ heapAddrBits - summaryL0Bits - 2*summaryLevelBits,
+ heapAddrBits - summaryL0Bits - 3*summaryLevelBits,
+ heapAddrBits - summaryL0Bits - 4*summaryLevelBits,
+}
+
+// levelLogPages is log2 the maximum number of runtime pages in the address space
+// a summary in the given level represents.
+//
+// The leaf level always represents exactly log2 of 1 chunk's worth of pages.
+var levelLogPages = [summaryLevels]uint{
+ logPallocChunkPages + 4*summaryLevelBits,
+ logPallocChunkPages + 3*summaryLevelBits,
+ logPallocChunkPages + 2*summaryLevelBits,
+ logPallocChunkPages + 1*summaryLevelBits,
+ logPallocChunkPages,
+}
+
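+// exampleSummaryIndex is an illustrative sketch, not part of this change, of
+// the index computation described in the levelShift comment above: the
+// summary at level l covering the (offset) address p is p >> levelShift[l].
+// The concrete count in the body assumes heapAddrBits = 48 and
+// summaryL0Bits = 14, which is an assumption for illustration only; the
+// function name is hypothetical.
+func exampleSummaryIndex(p uintptr, l int) int {
+ // For l = 0 this selects one of 1<<summaryL0Bits root summaries (16384
+ // under the assumption above), each describing 1<<levelLogPages[0] pages
+ // of address space.
+ return int(p >> levelShift[l])
+}
+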
+// sysInit performs architecture-dependent initialization of fields
+// in pageAlloc. pageAlloc should be uninitialized except for sysStat
+// if any runtime statistic should be updated.
+func (p *pageAlloc) sysInit() {
+ // Reserve memory for each level. This will get mapped in
+ // as R/W by setArenas.
+ for l, shift := range levelShift {
+ entries := 1 << (heapAddrBits - shift)
+
+ // Reserve b bytes of memory anywhere in the address space.
+ b := alignUp(uintptr(entries)*pallocSumBytes, physPageSize)
+ r := sysReserve(nil, b)
+ if r == nil {
+ throw("failed to reserve page summary memory")
+ }
+
+ // Put this reservation into a slice.
+ sl := notInHeapSlice{(*notInHeap)(r), 0, entries}
+ p.summary[l] = *(*[]pallocSum)(unsafe.Pointer(&sl))
+ }
+}
+
+// sysGrow performs architecture-dependent operations on heap
+// growth for the page allocator, such as mapping in new memory
+// for summaries. It also updates the length of the slices in
+// p.summary.
+//
+// base is the base of the newly-added heap memory and limit is
+// the first address past the end of the newly-added heap memory.
+// Both must be aligned to pallocChunkBytes.
+//
+// The caller must update p.start and p.end after calling sysGrow.
+func (p *pageAlloc) sysGrow(base, limit uintptr) {
+ if base%pallocChunkBytes != 0 || limit%pallocChunkBytes != 0 {
+ print("runtime: base = ", hex(base), ", limit = ", hex(limit), "\n")
+ throw("sysGrow bounds not aligned to pallocChunkBytes")
+ }
+
+ // addrRangeToSummaryRange converts a range of addresses into a range
+ // of summary indices which must be mapped to support those addresses
+ // in the summary range.
+ addrRangeToSummaryRange := func(level int, r addrRange) (int, int) {
+ sumIdxBase, sumIdxLimit := addrsToSummaryRange(level, r.base.addr(), r.limit.addr())
+ return blockAlignSummaryRange(level, sumIdxBase, sumIdxLimit)
+ }
+
+ // summaryRangeToSumAddrRange converts a range of indices in any
+ // level of p.summary into page-aligned addresses which cover that
+ // range of indices.
+ summaryRangeToSumAddrRange := func(level, sumIdxBase, sumIdxLimit int) addrRange {
+ baseOffset := alignDown(uintptr(sumIdxBase)*pallocSumBytes, physPageSize)
+ limitOffset := alignUp(uintptr(sumIdxLimit)*pallocSumBytes, physPageSize)
+ base := unsafe.Pointer(&p.summary[level][0])
+ return addrRange{
+ offAddr{uintptr(add(base, baseOffset))},
+ offAddr{uintptr(add(base, limitOffset))},
+ }
+ }
+
+ // addrRangeToSumAddrRange is a convenience function that converts
+ // an address range r to the address range of the given summary level
+ // that stores the summaries for r.
+ addrRangeToSumAddrRange := func(level int, r addrRange) addrRange {
+ sumIdxBase, sumIdxLimit := addrRangeToSummaryRange(level, r)
+ return summaryRangeToSumAddrRange(level, sumIdxBase, sumIdxLimit)
+ }
+
+ // Find the first inUse index which is strictly greater than base.
+ //
+ // Because this function will never be asked to remap the same memory
+ // twice, this index is effectively the index at which we would insert
+ // this new growth, and base will never overlap/be contained within
+ // any existing range.
+ //
+ // This will be used to look at what memory in the summary array is already
+ // mapped before and after this new range.
+ inUseIndex := p.inUse.findSucc(base)
+
+ // Walk up the radix tree and map summaries in as needed.
+ for l := range p.summary {
+ // Figure out what part of the summary array this new address space needs.
+ needIdxBase, needIdxLimit := addrRangeToSummaryRange(l, makeAddrRange(base, limit))
+
+ // Update the summary slices with a new upper-bound. This ensures
+ // we get tight bounds checks on at least the top bound.
+ //
+ // We must do this regardless of whether we map new memory.
+ if needIdxLimit > len(p.summary[l]) {
+ p.summary[l] = p.summary[l][:needIdxLimit]
+ }
+
+ // Compute the needed address range in the summary array for level l.
+ need := summaryRangeToSumAddrRange(l, needIdxBase, needIdxLimit)
+
+ // Prune need down to what needs to be newly mapped. Some parts of it may
+ // already be mapped by what inUse describes due to page alignment requirements
+ // for mapping. prune's invariants are guaranteed by the fact that this
+ // function will never be asked to remap the same memory twice.
+ if inUseIndex > 0 {
+ need = need.subtract(addrRangeToSumAddrRange(l, p.inUse.ranges[inUseIndex-1]))
+ }
+ if inUseIndex < len(p.inUse.ranges) {
+ need = need.subtract(addrRangeToSumAddrRange(l, p.inUse.ranges[inUseIndex]))
+ }
+ // It's possible that after our pruning above, there's nothing new to map.
+ if need.size() == 0 {
+ continue
+ }
+
+ // Map and commit need.
+ sysMap(unsafe.Pointer(need.base.addr()), need.size(), p.sysStat)
+ sysUsed(unsafe.Pointer(need.base.addr()), need.size())
+ }
+}
diff --git a/src/runtime/mpagealloc_test.go b/src/runtime/mpagealloc_test.go
new file mode 100644
index 0000000..5d979fa
--- /dev/null
+++ b/src/runtime/mpagealloc_test.go
@@ -0,0 +1,1035 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ . "runtime"
+ "testing"
+)
+
+func checkPageAlloc(t *testing.T, want, got *PageAlloc) {
+ // Ensure start and end are correct.
+ wantStart, wantEnd := want.Bounds()
+ gotStart, gotEnd := got.Bounds()
+ if gotStart != wantStart {
+ t.Fatalf("start values not equal: got %d, want %d", gotStart, wantStart)
+ }
+ if gotEnd != wantEnd {
+ t.Fatalf("end values not equal: got %d, want %d", gotEnd, wantEnd)
+ }
+
+ for i := gotStart; i < gotEnd; i++ {
+ // Check the bitmaps. Note that we may have nil data.
+ gb, wb := got.PallocData(i), want.PallocData(i)
+ if gb == nil && wb == nil {
+ continue
+ }
+ if (gb == nil && wb != nil) || (gb != nil && wb == nil) {
+ t.Errorf("chunk %d nilness mismatch", i)
+ }
+ if !checkPallocBits(t, gb.PallocBits(), wb.PallocBits()) {
+ t.Logf("in chunk %d (mallocBits)", i)
+ }
+ if !checkPallocBits(t, gb.Scavenged(), wb.Scavenged()) {
+ t.Logf("in chunk %d (scavenged)", i)
+ }
+ }
+ // TODO(mknyszek): Verify summaries too?
+}
+
+func TestPageAllocGrow(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ type test struct {
+ chunks []ChunkIdx
+ inUse []AddrRange
+ }
+ tests := map[string]test{
+ "One": {
+ chunks: []ChunkIdx{
+ BaseChunkIdx,
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+1, 0)),
+ },
+ },
+ "Contiguous2": {
+ chunks: []ChunkIdx{
+ BaseChunkIdx,
+ BaseChunkIdx + 1,
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+2, 0)),
+ },
+ },
+ "Contiguous5": {
+ chunks: []ChunkIdx{
+ BaseChunkIdx,
+ BaseChunkIdx + 1,
+ BaseChunkIdx + 2,
+ BaseChunkIdx + 3,
+ BaseChunkIdx + 4,
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+5, 0)),
+ },
+ },
+ "Discontiguous": {
+ chunks: []ChunkIdx{
+ BaseChunkIdx,
+ BaseChunkIdx + 2,
+ BaseChunkIdx + 4,
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+1, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+2, 0), PageBase(BaseChunkIdx+3, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+4, 0), PageBase(BaseChunkIdx+5, 0)),
+ },
+ },
+ "Mixed": {
+ chunks: []ChunkIdx{
+ BaseChunkIdx,
+ BaseChunkIdx + 1,
+ BaseChunkIdx + 2,
+ BaseChunkIdx + 4,
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+3, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+4, 0), PageBase(BaseChunkIdx+5, 0)),
+ },
+ },
+ "WildlyDiscontiguous": {
+ chunks: []ChunkIdx{
+ BaseChunkIdx,
+ BaseChunkIdx + 1,
+ BaseChunkIdx + 0x10,
+ BaseChunkIdx + 0x21,
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+2, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+0x10, 0), PageBase(BaseChunkIdx+0x11, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+0x21, 0), PageBase(BaseChunkIdx+0x22, 0)),
+ },
+ },
+ "ManyDiscontiguous": {
+ // The initial cap is 16. Test 33 ranges, to exercise the growth path (twice).
+ chunks: []ChunkIdx{
+ BaseChunkIdx, BaseChunkIdx + 2, BaseChunkIdx + 4, BaseChunkIdx + 6,
+ BaseChunkIdx + 8, BaseChunkIdx + 10, BaseChunkIdx + 12, BaseChunkIdx + 14,
+ BaseChunkIdx + 16, BaseChunkIdx + 18, BaseChunkIdx + 20, BaseChunkIdx + 22,
+ BaseChunkIdx + 24, BaseChunkIdx + 26, BaseChunkIdx + 28, BaseChunkIdx + 30,
+ BaseChunkIdx + 32, BaseChunkIdx + 34, BaseChunkIdx + 36, BaseChunkIdx + 38,
+ BaseChunkIdx + 40, BaseChunkIdx + 42, BaseChunkIdx + 44, BaseChunkIdx + 46,
+ BaseChunkIdx + 48, BaseChunkIdx + 50, BaseChunkIdx + 52, BaseChunkIdx + 54,
+ BaseChunkIdx + 56, BaseChunkIdx + 58, BaseChunkIdx + 60, BaseChunkIdx + 62,
+ BaseChunkIdx + 64,
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+1, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+2, 0), PageBase(BaseChunkIdx+3, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+4, 0), PageBase(BaseChunkIdx+5, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+6, 0), PageBase(BaseChunkIdx+7, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+8, 0), PageBase(BaseChunkIdx+9, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+10, 0), PageBase(BaseChunkIdx+11, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+12, 0), PageBase(BaseChunkIdx+13, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+14, 0), PageBase(BaseChunkIdx+15, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+16, 0), PageBase(BaseChunkIdx+17, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+18, 0), PageBase(BaseChunkIdx+19, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+20, 0), PageBase(BaseChunkIdx+21, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+22, 0), PageBase(BaseChunkIdx+23, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+24, 0), PageBase(BaseChunkIdx+25, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+26, 0), PageBase(BaseChunkIdx+27, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+28, 0), PageBase(BaseChunkIdx+29, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+30, 0), PageBase(BaseChunkIdx+31, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+32, 0), PageBase(BaseChunkIdx+33, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+34, 0), PageBase(BaseChunkIdx+35, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+36, 0), PageBase(BaseChunkIdx+37, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+38, 0), PageBase(BaseChunkIdx+39, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+40, 0), PageBase(BaseChunkIdx+41, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+42, 0), PageBase(BaseChunkIdx+43, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+44, 0), PageBase(BaseChunkIdx+45, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+46, 0), PageBase(BaseChunkIdx+47, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+48, 0), PageBase(BaseChunkIdx+49, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+50, 0), PageBase(BaseChunkIdx+51, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+52, 0), PageBase(BaseChunkIdx+53, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+54, 0), PageBase(BaseChunkIdx+55, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+56, 0), PageBase(BaseChunkIdx+57, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+58, 0), PageBase(BaseChunkIdx+59, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+60, 0), PageBase(BaseChunkIdx+61, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+62, 0), PageBase(BaseChunkIdx+63, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+64, 0), PageBase(BaseChunkIdx+65, 0)),
+ },
+ },
+ }
+ if PageAlloc64Bit != 0 {
+ tests["ExtremelyDiscontiguous"] = test{
+ chunks: []ChunkIdx{
+ BaseChunkIdx,
+ BaseChunkIdx + 0x100000, // constant translates to O(TiB)
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+1, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+0x100000, 0), PageBase(BaseChunkIdx+0x100001, 0)),
+ },
+ }
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ // By creating a new pageAlloc, we will
+ // grow it for each chunk defined in x.
+ x := make(map[ChunkIdx][]BitRange)
+ for _, c := range v.chunks {
+ x[c] = []BitRange{}
+ }
+ b := NewPageAlloc(x, nil)
+ defer FreePageAlloc(b)
+
+ got := b.InUse()
+ want := v.inUse
+
+ // Check for mismatches.
+ if len(got) != len(want) {
+ t.Fail()
+ } else {
+ for i := range want {
+ if !want[i].Equals(got[i]) {
+ t.Fail()
+ break
+ }
+ }
+ }
+ if t.Failed() {
+ t.Logf("found inUse mismatch")
+ t.Logf("got:")
+ for i, r := range got {
+ t.Logf("\t#%d [0x%x, 0x%x)", i, r.Base(), r.Limit())
+ }
+ t.Logf("want:")
+ for i, r := range want {
+ t.Logf("\t#%d [0x%x, 0x%x)", i, r.Base(), r.Limit())
+ }
+ }
+ })
+ }
+}
+
+func TestPageAllocAlloc(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ type hit struct {
+ npages, base, scav uintptr
+ }
+ type test struct {
+ scav map[ChunkIdx][]BitRange
+ before map[ChunkIdx][]BitRange
+ after map[ChunkIdx][]BitRange
+ hits []hit
+ }
+ tests := map[string]test{
+ "AllFree1": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 1}, {2, 2}},
+ },
+ hits: []hit{
+ {1, PageBase(BaseChunkIdx, 0), PageSize},
+ {1, PageBase(BaseChunkIdx, 1), 0},
+ {1, PageBase(BaseChunkIdx, 2), PageSize},
+ {1, PageBase(BaseChunkIdx, 3), PageSize},
+ {1, PageBase(BaseChunkIdx, 4), 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 5}},
+ },
+ },
+ "ManyArena1": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages - 1}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ hits: []hit{
+ {1, PageBase(BaseChunkIdx+2, PallocChunkPages-1), PageSize},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ },
+ "NotContiguous1": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xff: {{0, 0}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xff: {{0, PallocChunkPages}},
+ },
+ hits: []hit{
+ {1, PageBase(BaseChunkIdx+0xff, 0), PageSize},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xff: {{0, 1}},
+ },
+ },
+ "AllFree2": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 3}, {7, 1}},
+ },
+ hits: []hit{
+ {2, PageBase(BaseChunkIdx, 0), 2 * PageSize},
+ {2, PageBase(BaseChunkIdx, 2), PageSize},
+ {2, PageBase(BaseChunkIdx, 4), 0},
+ {2, PageBase(BaseChunkIdx, 6), PageSize},
+ {2, PageBase(BaseChunkIdx, 8), 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 10}},
+ },
+ },
+ "Straddle2": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages - 1}},
+ BaseChunkIdx + 1: {{1, PallocChunkPages - 1}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{PallocChunkPages - 1, 1}},
+ BaseChunkIdx + 1: {},
+ },
+ hits: []hit{
+ {2, PageBase(BaseChunkIdx, PallocChunkPages-1), PageSize},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ },
+ },
+ "AllFree5": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 8}, {9, 1}, {17, 5}},
+ },
+ hits: []hit{
+ {5, PageBase(BaseChunkIdx, 0), 5 * PageSize},
+ {5, PageBase(BaseChunkIdx, 5), 4 * PageSize},
+ {5, PageBase(BaseChunkIdx, 10), 0},
+ {5, PageBase(BaseChunkIdx, 15), 3 * PageSize},
+ {5, PageBase(BaseChunkIdx, 20), 2 * PageSize},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 25}},
+ },
+ },
+ "AllFree64": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{21, 1}, {63, 65}},
+ },
+ hits: []hit{
+ {64, PageBase(BaseChunkIdx, 0), 2 * PageSize},
+ {64, PageBase(BaseChunkIdx, 64), 64 * PageSize},
+ {64, PageBase(BaseChunkIdx, 128), 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 192}},
+ },
+ },
+ "AllFree65": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{129, 1}},
+ },
+ hits: []hit{
+ {65, PageBase(BaseChunkIdx, 0), 0},
+ {65, PageBase(BaseChunkIdx, 65), PageSize},
+ {65, PageBase(BaseChunkIdx, 130), 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 195}},
+ },
+ },
+ "ExhaustPallocChunkPages-3": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{10, 1}},
+ },
+ hits: []hit{
+ {PallocChunkPages - 3, PageBase(BaseChunkIdx, 0), PageSize},
+ {PallocChunkPages - 3, 0, 0},
+ {1, PageBase(BaseChunkIdx, PallocChunkPages-3), 0},
+ {2, PageBase(BaseChunkIdx, PallocChunkPages-2), 0},
+ {1, 0, 0},
+ {PallocChunkPages - 3, 0, 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ },
+ "AllFreePallocChunkPages": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 1}, {PallocChunkPages - 1, 1}},
+ },
+ hits: []hit{
+ {PallocChunkPages, PageBase(BaseChunkIdx, 0), 2 * PageSize},
+ {PallocChunkPages, 0, 0},
+ {1, 0, 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ },
+ "StraddlePallocChunkPages": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages / 2}},
+ BaseChunkIdx + 1: {{PallocChunkPages / 2, PallocChunkPages / 2}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {{3, 100}},
+ },
+ hits: []hit{
+ {PallocChunkPages, PageBase(BaseChunkIdx, PallocChunkPages/2), 100 * PageSize},
+ {PallocChunkPages, 0, 0},
+ {1, 0, 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ },
+ },
+ "StraddlePallocChunkPages+1": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages / 2}},
+ BaseChunkIdx + 1: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ },
+ hits: []hit{
+ {PallocChunkPages + 1, PageBase(BaseChunkIdx, PallocChunkPages/2), (PallocChunkPages + 1) * PageSize},
+ {PallocChunkPages, 0, 0},
+ {1, PageBase(BaseChunkIdx+1, PallocChunkPages/2+1), PageSize},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages/2 + 2}},
+ },
+ },
+ "AllFreePallocChunkPages*2": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ },
+ hits: []hit{
+ {PallocChunkPages * 2, PageBase(BaseChunkIdx, 0), 0},
+ {PallocChunkPages * 2, 0, 0},
+ {1, 0, 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ },
+ },
+ "NotContiguousPallocChunkPages*2": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 0x40: {},
+ BaseChunkIdx + 0x41: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0x40: {},
+ BaseChunkIdx + 0x41: {},
+ },
+ hits: []hit{
+ {PallocChunkPages * 2, PageBase(BaseChunkIdx+0x40, 0), 0},
+ {21, PageBase(BaseChunkIdx, 0), 21 * PageSize},
+ {1, PageBase(BaseChunkIdx, 21), PageSize},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 22}},
+ BaseChunkIdx + 0x40: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0x41: {{0, PallocChunkPages}},
+ },
+ },
+ "StraddlePallocChunkPages*2": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages / 2}},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {{PallocChunkPages / 2, PallocChunkPages / 2}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 7}},
+ BaseChunkIdx + 1: {{3, 5}, {121, 10}},
+ BaseChunkIdx + 2: {{PallocChunkPages/2 + 12, 2}},
+ },
+ hits: []hit{
+ {PallocChunkPages * 2, PageBase(BaseChunkIdx, PallocChunkPages/2), 15 * PageSize},
+ {PallocChunkPages * 2, 0, 0},
+ {1, 0, 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ },
+ "StraddlePallocChunkPages*5/4": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages * 3 / 4}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages * 3 / 4}},
+ BaseChunkIdx + 3: {{0, 0}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{PallocChunkPages / 2, PallocChunkPages/4 + 1}},
+ BaseChunkIdx + 2: {{PallocChunkPages / 3, 1}},
+ BaseChunkIdx + 3: {{PallocChunkPages * 2 / 3, 1}},
+ },
+ hits: []hit{
+ {PallocChunkPages * 5 / 4, PageBase(BaseChunkIdx+2, PallocChunkPages*3/4), PageSize},
+ {PallocChunkPages * 5 / 4, 0, 0},
+ {1, PageBase(BaseChunkIdx+1, PallocChunkPages*3/4), PageSize},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages*3/4 + 1}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ BaseChunkIdx + 3: {{0, PallocChunkPages}},
+ },
+ },
+ "AllFreePallocChunkPages*7+5": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {},
+ BaseChunkIdx + 3: {},
+ BaseChunkIdx + 4: {},
+ BaseChunkIdx + 5: {},
+ BaseChunkIdx + 6: {},
+ BaseChunkIdx + 7: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{50, 1}},
+ BaseChunkIdx + 1: {{31, 1}},
+ BaseChunkIdx + 2: {{7, 1}},
+ BaseChunkIdx + 3: {{200, 1}},
+ BaseChunkIdx + 4: {{3, 1}},
+ BaseChunkIdx + 5: {{51, 1}},
+ BaseChunkIdx + 6: {{20, 1}},
+ BaseChunkIdx + 7: {{1, 1}},
+ },
+ hits: []hit{
+ {PallocChunkPages*7 + 5, PageBase(BaseChunkIdx, 0), 8 * PageSize},
+ {PallocChunkPages*7 + 5, 0, 0},
+ {1, PageBase(BaseChunkIdx+7, 5), 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ BaseChunkIdx + 3: {{0, PallocChunkPages}},
+ BaseChunkIdx + 4: {{0, PallocChunkPages}},
+ BaseChunkIdx + 5: {{0, PallocChunkPages}},
+ BaseChunkIdx + 6: {{0, PallocChunkPages}},
+ BaseChunkIdx + 7: {{0, 6}},
+ },
+ },
+ }
+ if PageAlloc64Bit != 0 {
+ const chunkIdxBigJump = 0x100000 // chunk index offset which translates to O(TiB)
+
+ // This test attempts to trigger a bug wherein we look at unmapped summary
+ // memory in a case other than the one where we exhaust the heap.
+ //
+ // It achieves this by placing a chunk such that its summary will be
+ // at the very end of a physical page. It then also places another chunk
+ // much further up in the address space, such that any allocations into the
+ // first chunk do not exhaust the heap and the second chunk's summary is not in the
+ // page immediately adjacent to the first chunk's summary's page.
+ // Allocating into this first chunk to exhaustion and then into the second
+ // chunk may then trigger a check in the allocator which erroneously looks at
+ // unmapped summary memory and crashes.
+
+ // Figure out how many chunk summaries fit in a physical page, then align BaseChunkIdx
+ // to a physical page in the chunk summary array. Here we only assume that
+ // each summary array is aligned to some physical page.
+ sumsPerPhysPage := ChunkIdx(PhysPageSize / PallocSumBytes)
+ baseChunkIdx := BaseChunkIdx &^ (sumsPerPhysPage - 1)
+ tests["DiscontiguousMappedSumBoundary"] = test{
+ before: map[ChunkIdx][]BitRange{
+ baseChunkIdx + sumsPerPhysPage - 1: {},
+ baseChunkIdx + chunkIdxBigJump: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ baseChunkIdx + sumsPerPhysPage - 1: {},
+ baseChunkIdx + chunkIdxBigJump: {},
+ },
+ hits: []hit{
+ {PallocChunkPages - 1, PageBase(baseChunkIdx+sumsPerPhysPage-1, 0), 0},
+ {1, PageBase(baseChunkIdx+sumsPerPhysPage-1, PallocChunkPages-1), 0},
+ {1, PageBase(baseChunkIdx+chunkIdxBigJump, 0), 0},
+ {PallocChunkPages - 1, PageBase(baseChunkIdx+chunkIdxBigJump, 1), 0},
+ {1, 0, 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ baseChunkIdx + sumsPerPhysPage - 1: {{0, PallocChunkPages}},
+ baseChunkIdx + chunkIdxBigJump: {{0, PallocChunkPages}},
+ },
+ }
+
+ // Test to check for issue #40191. Essentially, the candidate searchAddr
+ // discovered by find may not point to mapped memory, so we need to handle
+ // that explicitly.
+ //
+ // chunkIdxSmallOffset is an offset intended to be used within chunkIdxBigJump.
+ // It is far enough within chunkIdxBigJump that the summaries at the beginning
+ // of an address range the size of chunkIdxBigJump will not be mapped in.
+ const chunkIdxSmallOffset = 0x503
+ tests["DiscontiguousBadSearchAddr"] = test{
+ before: map[ChunkIdx][]BitRange{
+ // The mechanism for the bug involves three chunks, A, B, and C, which are
+ // far apart in the address space. In particular, B is chunkIdxBigJump +
+ // chunkIdxSmallOffset chunks away from A, and C is 2*chunkIdxBigJump chunks
+ // away from A. A has 1 page free, B has several (NOT at the end of B), and
+ // C is totally free.
+ // Note that B's free memory must not be at the end of B because the fast
+ // path in the page allocator will check if the searchAddr even gives us
+ // enough space to place the allocation in a chunk before accessing the
+ // summary.
+ BaseChunkIdx + chunkIdxBigJump*0: {{0, PallocChunkPages - 1}},
+ BaseChunkIdx + chunkIdxBigJump*1 + chunkIdxSmallOffset: {
+ {0, PallocChunkPages - 10},
+ {PallocChunkPages - 1, 1},
+ },
+ BaseChunkIdx + chunkIdxBigJump*2: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx + chunkIdxBigJump*0: {},
+ BaseChunkIdx + chunkIdxBigJump*1 + chunkIdxSmallOffset: {},
+ BaseChunkIdx + chunkIdxBigJump*2: {},
+ },
+ hits: []hit{
+ // We first allocate into A to set the page allocator's searchAddr to the
+ // end of that chunk. That is the only purpose A serves.
+ {1, PageBase(BaseChunkIdx, PallocChunkPages-1), 0},
+ // Then, we make a big allocation that doesn't fit into B, and so must be
+ // fulfilled by C.
+ //
+ // On the way to fulfilling the allocation into C, we estimate searchAddr
+ // using the summary structure, but that will give us a searchAddr of
+ // B's base address minus chunkIdxSmallOffset chunks. These chunks will
+ // not be mapped.
+ {100, PageBase(baseChunkIdx+chunkIdxBigJump*2, 0), 0},
+ // Now we try to make a smaller allocation that can be fulfilled by B.
+ // In an older implementation of the page allocator, this will segfault,
+ // because this last allocation will first try to access the summary
+ // for B's base address minus chunkIdxSmallOffset chunks in the fast path,
+ // and this will not be mapped.
+ {9, PageBase(baseChunkIdx+chunkIdxBigJump*1+chunkIdxSmallOffset, PallocChunkPages-10), 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx + chunkIdxBigJump*0: {{0, PallocChunkPages}},
+ BaseChunkIdx + chunkIdxBigJump*1 + chunkIdxSmallOffset: {{0, PallocChunkPages}},
+ BaseChunkIdx + chunkIdxBigJump*2: {{0, 100}},
+ },
+ }
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := NewPageAlloc(v.before, v.scav)
+ defer FreePageAlloc(b)
+
+ for iter, i := range v.hits {
+ a, s := b.Alloc(i.npages)
+ if a != i.base {
+ t.Fatalf("bad alloc #%d: want base 0x%x, got 0x%x", iter+1, i.base, a)
+ }
+ if s != i.scav {
+ t.Fatalf("bad alloc #%d: want scav %d, got %d", iter+1, i.scav, s)
+ }
+ }
+ want := NewPageAlloc(v.after, v.scav)
+ defer FreePageAlloc(want)
+
+ checkPageAlloc(t, want, b)
+ })
+ }
+}
+
+func TestPageAllocExhaust(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ for _, npages := range []uintptr{1, 2, 3, 4, 5, 8, 16, 64, 1024, 1025, 2048, 2049} {
+ npages := npages
+ t.Run(fmt.Sprintf("%d", npages), func(t *testing.T) {
+ // Construct b.
+ bDesc := make(map[ChunkIdx][]BitRange)
+ for i := ChunkIdx(0); i < 4; i++ {
+ bDesc[BaseChunkIdx+i] = []BitRange{}
+ }
+ b := NewPageAlloc(bDesc, nil)
+ defer FreePageAlloc(b)
+
+ // Allocate into b with npages until we've exhausted the heap.
+ nAlloc := (PallocChunkPages * 4) / int(npages)
+ for i := 0; i < nAlloc; i++ {
+ addr := PageBase(BaseChunkIdx, uint(i)*uint(npages))
+ if a, _ := b.Alloc(npages); a != addr {
+ t.Fatalf("bad alloc #%d: want 0x%x, got 0x%x", i+1, addr, a)
+ }
+ }
+
+ // Check to make sure the next allocation fails.
+ if a, _ := b.Alloc(npages); a != 0 {
+ t.Fatalf("bad alloc #%d: want 0, got 0x%x", nAlloc, a)
+ }
+
+ // Construct what we want the heap to look like now.
+ allocPages := nAlloc * int(npages)
+ wantDesc := make(map[ChunkIdx][]BitRange)
+ for i := ChunkIdx(0); i < 4; i++ {
+ if allocPages >= PallocChunkPages {
+ wantDesc[BaseChunkIdx+i] = []BitRange{{0, PallocChunkPages}}
+ allocPages -= PallocChunkPages
+ } else if allocPages > 0 {
+ wantDesc[BaseChunkIdx+i] = []BitRange{{0, uint(allocPages)}}
+ allocPages = 0
+ } else {
+ wantDesc[BaseChunkIdx+i] = []BitRange{}
+ }
+ }
+ want := NewPageAlloc(wantDesc, nil)
+ defer FreePageAlloc(want)
+
+ // Check to make sure the heap b matches what we want.
+ checkPageAlloc(t, want, b)
+ })
+ }
+}
+
+func TestPageAllocFree(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ tests := map[string]struct {
+ before map[ChunkIdx][]BitRange
+ after map[ChunkIdx][]BitRange
+ npages uintptr
+ frees []uintptr
+ }{
+ "Free1": {
+ npages: 1,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ PageBase(BaseChunkIdx, 1),
+ PageBase(BaseChunkIdx, 2),
+ PageBase(BaseChunkIdx, 3),
+ PageBase(BaseChunkIdx, 4),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{5, PallocChunkPages - 5}},
+ },
+ },
+ "ManyArena1": {
+ npages: 1,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, PallocChunkPages/2),
+ PageBase(BaseChunkIdx+1, 0),
+ PageBase(BaseChunkIdx+2, PallocChunkPages-1),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages / 2}, {PallocChunkPages/2 + 1, PallocChunkPages/2 - 1}},
+ BaseChunkIdx + 1: {{1, PallocChunkPages - 1}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages - 1}},
+ },
+ },
+ "Free2": {
+ npages: 2,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ PageBase(BaseChunkIdx, 2),
+ PageBase(BaseChunkIdx, 4),
+ PageBase(BaseChunkIdx, 6),
+ PageBase(BaseChunkIdx, 8),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{10, PallocChunkPages - 10}},
+ },
+ },
+ "Straddle2": {
+ npages: 2,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{PallocChunkPages - 1, 1}},
+ BaseChunkIdx + 1: {{0, 1}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, PallocChunkPages-1),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ },
+ },
+ "Free5": {
+ npages: 5,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ PageBase(BaseChunkIdx, 5),
+ PageBase(BaseChunkIdx, 10),
+ PageBase(BaseChunkIdx, 15),
+ PageBase(BaseChunkIdx, 20),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{25, PallocChunkPages - 25}},
+ },
+ },
+ "Free64": {
+ npages: 64,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ PageBase(BaseChunkIdx, 64),
+ PageBase(BaseChunkIdx, 128),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{192, PallocChunkPages - 192}},
+ },
+ },
+ "Free65": {
+ npages: 65,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ PageBase(BaseChunkIdx, 65),
+ PageBase(BaseChunkIdx, 130),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{195, PallocChunkPages - 195}},
+ },
+ },
+ "FreePallocChunkPages": {
+ npages: PallocChunkPages,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ },
+ "StraddlePallocChunkPages": {
+ npages: PallocChunkPages,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{PallocChunkPages / 2, PallocChunkPages / 2}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages / 2}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, PallocChunkPages/2),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ },
+ },
+ "StraddlePallocChunkPages+1": {
+ npages: PallocChunkPages + 1,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, PallocChunkPages/2),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages / 2}},
+ BaseChunkIdx + 1: {{PallocChunkPages/2 + 1, PallocChunkPages/2 - 1}},
+ },
+ },
+ "FreePallocChunkPages*2": {
+ npages: PallocChunkPages * 2,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ },
+ },
+ "StraddlePallocChunkPages*2": {
+ npages: PallocChunkPages * 2,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, PallocChunkPages/2),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages / 2}},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {{PallocChunkPages / 2, PallocChunkPages / 2}},
+ },
+ },
+ "AllFreePallocChunkPages*7+5": {
+ npages: PallocChunkPages*7 + 5,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ BaseChunkIdx + 3: {{0, PallocChunkPages}},
+ BaseChunkIdx + 4: {{0, PallocChunkPages}},
+ BaseChunkIdx + 5: {{0, PallocChunkPages}},
+ BaseChunkIdx + 6: {{0, PallocChunkPages}},
+ BaseChunkIdx + 7: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {},
+ BaseChunkIdx + 3: {},
+ BaseChunkIdx + 4: {},
+ BaseChunkIdx + 5: {},
+ BaseChunkIdx + 6: {},
+ BaseChunkIdx + 7: {{5, PallocChunkPages - 5}},
+ },
+ },
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := NewPageAlloc(v.before, nil)
+ defer FreePageAlloc(b)
+
+ for _, addr := range v.frees {
+ b.Free(addr, v.npages)
+ }
+ want := NewPageAlloc(v.after, nil)
+ defer FreePageAlloc(want)
+
+ checkPageAlloc(t, want, b)
+ })
+ }
+}
+
+func TestPageAllocAllocAndFree(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ type hit struct {
+ alloc bool
+ npages uintptr
+ base uintptr
+ }
+ tests := map[string]struct {
+ init map[ChunkIdx][]BitRange
+ hits []hit
+ }{
+ // TODO(mknyszek): Write more tests here.
+ "Chunks8": {
+ init: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {},
+ BaseChunkIdx + 3: {},
+ BaseChunkIdx + 4: {},
+ BaseChunkIdx + 5: {},
+ BaseChunkIdx + 6: {},
+ BaseChunkIdx + 7: {},
+ },
+ hits: []hit{
+ {true, PallocChunkPages * 8, PageBase(BaseChunkIdx, 0)},
+ {false, PallocChunkPages * 8, PageBase(BaseChunkIdx, 0)},
+ {true, PallocChunkPages * 8, PageBase(BaseChunkIdx, 0)},
+ {false, PallocChunkPages * 8, PageBase(BaseChunkIdx, 0)},
+ {true, PallocChunkPages * 8, PageBase(BaseChunkIdx, 0)},
+ {false, PallocChunkPages * 8, PageBase(BaseChunkIdx, 0)},
+ {true, 1, PageBase(BaseChunkIdx, 0)},
+ {false, 1, PageBase(BaseChunkIdx, 0)},
+ {true, PallocChunkPages * 8, PageBase(BaseChunkIdx, 0)},
+ },
+ },
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := NewPageAlloc(v.init, nil)
+ defer FreePageAlloc(b)
+
+ for iter, i := range v.hits {
+ if i.alloc {
+ if a, _ := b.Alloc(i.npages); a != i.base {
+ t.Fatalf("bad alloc #%d: want 0x%x, got 0x%x", iter+1, i.base, a)
+ }
+ } else {
+ b.Free(i.base, i.npages)
+ }
+ }
+ })
+ }
+}
diff --git a/src/runtime/mpagecache.go b/src/runtime/mpagecache.go
new file mode 100644
index 0000000..4b5c66d
--- /dev/null
+++ b/src/runtime/mpagecache.go
@@ -0,0 +1,173 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const pageCachePages = 8 * unsafe.Sizeof(pageCache{}.cache)
+
+// pageCache represents a per-p cache of pages the allocator can
+// allocate from without a lock. More specifically, it represents
+// a pageCachePages*pageSize chunk of memory with 0 or more free
+// pages in it.
+type pageCache struct {
+ base uintptr // base address of the chunk
+ cache uint64 // 64-bit bitmap representing free pages (1 means free)
+ scav uint64 // 64-bit bitmap representing scavenged pages (1 means scavenged)
+}
+
+// empty returns true if the pageCache has no free pages, and false
+// otherwise.
+func (c *pageCache) empty() bool {
+ return c.cache == 0
+}
+
+// alloc allocates npages from the page cache and is the main entry
+// point for allocation.
+//
+// Returns a base address and the amount of scavenged memory in the
+// allocated region in bytes.
+//
+// Returns a base address of zero on failure, in which case the
+// amount of scavenged memory should be ignored.
+func (c *pageCache) alloc(npages uintptr) (uintptr, uintptr) {
+ if c.cache == 0 {
+ return 0, 0
+ }
+ if npages == 1 {
+ i := uintptr(sys.TrailingZeros64(c.cache))
+ scav := (c.scav >> i) & 1
+ c.cache &^= 1 << i // set bit to mark in-use
+ c.scav &^= 1 << i // clear bit to mark unscavenged
+ return c.base + i*pageSize, uintptr(scav) * pageSize
+ }
+ return c.allocN(npages)
+}
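+
+// For illustration only (the bit patterns below are hypothetical), the
+// single-page fast path boils down to a trailing-zeros count and two masks:
+//
+//	// c.cache == ...0110 (pages 1 and 2 free), c.scav == ...0100
+//	i := sys.TrailingZeros64(c.cache) // i == 1
+//	scav := (c.scav >> i) & 1         // 0: page 1 was never scavenged
+//	c.cache &^= 1 << i                // cache becomes ...0100
+//	// result: c.base + 1*pageSize, with 0 scavenged bytes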
+
+// allocN is a helper which attempts to allocate npages worth of pages
+// from the cache. It represents the general case for allocating from
+// the page cache.
+//
+// Returns a base address and the amount of scavenged memory in the
+// allocated region in bytes.
+func (c *pageCache) allocN(npages uintptr) (uintptr, uintptr) {
+ i := findBitRange64(c.cache, uint(npages))
+ if i >= 64 {
+ return 0, 0
+ }
+ mask := ((uint64(1) << npages) - 1) << i
+ scav := sys.OnesCount64(c.scav & mask)
+ c.cache &^= mask // mark in-use bits
+ c.scav &^= mask // clear scavenged bits
+ return c.base + uintptr(i*pageSize), uintptr(scav) * pageSize
+}
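+
+// As a sketch of the general case (again with hypothetical bits): if
+// c.cache == ...11110110 and npages == 2, findBitRange64 returns i == 1,
+// since bits 1 and 2 form the first free run of length 2. The mask is then
+//
+//	mask := ((uint64(1) << 2) - 1) << 1 // ...00000110
+//
+// so the scavenged count is taken over those two bits, both pages are
+// marked in-use, and the returned base is c.base + 1*pageSize.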
+
+// flush empties out unallocated free pages in the given cache
+// into p. Then, it clears the cache, such that empty returns
+// true.
+//
+// p.mheapLock must be held.
+//
+// Must run on the system stack because p.mheapLock must be held.
+//
+//go:systemstack
+func (c *pageCache) flush(p *pageAlloc) {
+ assertLockHeld(p.mheapLock)
+
+ if c.empty() {
+ return
+ }
+ ci := chunkIndex(c.base)
+ pi := chunkPageIndex(c.base)
+
+ // This method is called very infrequently, so just do the
+ // slower, safer thing by iterating over each bit individually.
+ for i := uint(0); i < 64; i++ {
+ if c.cache&(1<<i) != 0 {
+ p.chunkOf(ci).free1(pi + i)
+ }
+ if c.scav&(1<<i) != 0 {
+ p.chunkOf(ci).scavenged.setRange(pi+i, 1)
+ }
+ }
+ // Since this is a lot like a free, we need to make sure
+ // we update the searchAddr just like free does.
+ if b := (offAddr{c.base}); b.lessThan(p.searchAddr) {
+ p.searchAddr = b
+ }
+ p.update(c.base, pageCachePages, false, false)
+ *c = pageCache{}
+}
+
+// allocToCache acquires a pageCachePages-aligned chunk of free pages which
+// may not be contiguous, and returns a pageCache structure which owns the
+// chunk.
+//
+// p.mheapLock must be held.
+//
+// Must run on the system stack because p.mheapLock must be held.
+//
+//go:systemstack
+func (p *pageAlloc) allocToCache() pageCache {
+ assertLockHeld(p.mheapLock)
+
+ // If the searchAddr refers to a region which has a higher address than
+ // any known chunk, then we know we're out of memory.
+ if chunkIndex(p.searchAddr.addr()) >= p.end {
+ return pageCache{}
+ }
+ c := pageCache{}
+ ci := chunkIndex(p.searchAddr.addr()) // chunk index
+ if p.summary[len(p.summary)-1][ci] != 0 {
+ // Fast path: there are free pages at or near the searchAddr address.
+ chunk := p.chunkOf(ci)
+ j, _ := chunk.find(1, chunkPageIndex(p.searchAddr.addr()))
+ if j == ^uint(0) {
+ throw("bad summary data")
+ }
+ c = pageCache{
+ base: chunkBase(ci) + alignDown(uintptr(j), 64)*pageSize,
+ cache: ^chunk.pages64(j),
+ scav: chunk.scavenged.block64(j),
+ }
+ } else {
+ // Slow path: the searchAddr address had nothing there, so go find
+ // the first free page the slow way.
+ addr, _ := p.find(1)
+ if addr == 0 {
+ // We failed to find adequate free space, so mark the searchAddr as OoM
+ // and return an empty pageCache.
+ p.searchAddr = maxSearchAddr
+ return pageCache{}
+ }
+ ci := chunkIndex(addr)
+ chunk := p.chunkOf(ci)
+ c = pageCache{
+ base: alignDown(addr, 64*pageSize),
+ cache: ^chunk.pages64(chunkPageIndex(addr)),
+ scav: chunk.scavenged.block64(chunkPageIndex(addr)),
+ }
+ }
+
+ // Set the bits as allocated and clear the scavenged bits.
+ p.allocRange(c.base, pageCachePages)
+
+ // Update as an allocation, but note that it's not contiguous.
+ p.update(c.base, pageCachePages, false, true)
+
+ // Set the search address to the last page represented by the cache.
+ // Since all of the pages in this block are going to the cache, and we
+ // searched for the first free page, we can confidently start at the
+ // next page.
+ //
+ // However, p.searchAddr is not allowed to point into unmapped heap memory
+ // unless it is maxSearchAddr, so make it the last page as opposed to
+ // the page after.
+ p.searchAddr = offAddr{c.base + pageSize*(pageCachePages-1)}
+ return c
+}
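+
+// A rough sketch of the intended lifecycle of a pageCache (the real call
+// sites live elsewhere in the runtime; this is illustrative, not normative):
+//
+//	// with p.mheapLock held, on the system stack:
+//	c := p.allocToCache()    // take ownership of a 64-page aligned window
+//	// later, lock-free on the owning P:
+//	base, scav := c.alloc(n) // carve allocations out of the cached window
+//	// eventually, with p.mheapLock held again:
+//	c.flush(p)               // return unused pages to the page allocator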
diff --git a/src/runtime/mpagecache_test.go b/src/runtime/mpagecache_test.go
new file mode 100644
index 0000000..2ed0c0a
--- /dev/null
+++ b/src/runtime/mpagecache_test.go
@@ -0,0 +1,399 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "math/rand"
+ . "runtime"
+ "testing"
+)
+
+func checkPageCache(t *testing.T, got, want PageCache) {
+ if got.Base() != want.Base() {
+ t.Errorf("bad pageCache base: got 0x%x, want 0x%x", got.Base(), want.Base())
+ }
+ if got.Cache() != want.Cache() {
+ t.Errorf("bad pageCache bits: got %016x, want %016x", got.Base(), want.Base())
+ }
+ if got.Scav() != want.Scav() {
+ t.Errorf("bad pageCache scav: got %016x, want %016x", got.Scav(), want.Scav())
+ }
+}
+
+func TestPageCacheAlloc(t *testing.T) {
+ base := PageBase(BaseChunkIdx, 0)
+ type hit struct {
+ npages uintptr
+ base uintptr
+ scav uintptr
+ }
+ tests := map[string]struct {
+ cache PageCache
+ hits []hit
+ }{
+ "Empty": {
+ cache: NewPageCache(base, 0, 0),
+ hits: []hit{
+ {1, 0, 0},
+ {2, 0, 0},
+ {3, 0, 0},
+ {4, 0, 0},
+ {5, 0, 0},
+ {11, 0, 0},
+ {12, 0, 0},
+ {16, 0, 0},
+ {27, 0, 0},
+ {32, 0, 0},
+ {43, 0, 0},
+ {57, 0, 0},
+ {64, 0, 0},
+ {121, 0, 0},
+ },
+ },
+ "Lo1": {
+ cache: NewPageCache(base, 0x1, 0x1),
+ hits: []hit{
+ {1, base, PageSize},
+ {1, 0, 0},
+ {10, 0, 0},
+ },
+ },
+ "Hi1": {
+ cache: NewPageCache(base, 0x1<<63, 0x1),
+ hits: []hit{
+ {1, base + 63*PageSize, 0},
+ {1, 0, 0},
+ {10, 0, 0},
+ },
+ },
+ "Swiss1": {
+ cache: NewPageCache(base, 0x20005555, 0x5505),
+ hits: []hit{
+ {2, 0, 0},
+ {1, base, PageSize},
+ {1, base + 2*PageSize, PageSize},
+ {1, base + 4*PageSize, 0},
+ {1, base + 6*PageSize, 0},
+ {1, base + 8*PageSize, PageSize},
+ {1, base + 10*PageSize, PageSize},
+ {1, base + 12*PageSize, PageSize},
+ {1, base + 14*PageSize, PageSize},
+ {1, base + 29*PageSize, 0},
+ {1, 0, 0},
+ {10, 0, 0},
+ },
+ },
+ "Lo2": {
+ cache: NewPageCache(base, 0x3, 0x2<<62),
+ hits: []hit{
+ {2, base, 0},
+ {2, 0, 0},
+ {1, 0, 0},
+ },
+ },
+ "Hi2": {
+ cache: NewPageCache(base, 0x3<<62, 0x3<<62),
+ hits: []hit{
+ {2, base + 62*PageSize, 2 * PageSize},
+ {2, 0, 0},
+ {1, 0, 0},
+ },
+ },
+ "Swiss2": {
+ cache: NewPageCache(base, 0x3333<<31, 0x3030<<31),
+ hits: []hit{
+ {2, base + 31*PageSize, 0},
+ {2, base + 35*PageSize, 2 * PageSize},
+ {2, base + 39*PageSize, 0},
+ {2, base + 43*PageSize, 2 * PageSize},
+ {2, 0, 0},
+ },
+ },
+ "Hi53": {
+ cache: NewPageCache(base, ((uint64(1)<<53)-1)<<10, ((uint64(1)<<16)-1)<<10),
+ hits: []hit{
+ {53, base + 10*PageSize, 16 * PageSize},
+ {53, 0, 0},
+ {1, 0, 0},
+ },
+ },
+ "Full53": {
+ cache: NewPageCache(base, ^uint64(0), ((uint64(1)<<16)-1)<<10),
+ hits: []hit{
+ {53, base, 16 * PageSize},
+ {53, 0, 0},
+ {1, base + 53*PageSize, 0},
+ },
+ },
+ "Full64": {
+ cache: NewPageCache(base, ^uint64(0), ^uint64(0)),
+ hits: []hit{
+ {64, base, 64 * PageSize},
+ {64, 0, 0},
+ {1, 0, 0},
+ },
+ },
+ "FullMixed": {
+ cache: NewPageCache(base, ^uint64(0), ^uint64(0)),
+ hits: []hit{
+ {5, base, 5 * PageSize},
+ {7, base + 5*PageSize, 7 * PageSize},
+ {1, base + 12*PageSize, 1 * PageSize},
+ {23, base + 13*PageSize, 23 * PageSize},
+ {63, 0, 0},
+ {3, base + 36*PageSize, 3 * PageSize},
+ {3, base + 39*PageSize, 3 * PageSize},
+ {3, base + 42*PageSize, 3 * PageSize},
+ {12, base + 45*PageSize, 12 * PageSize},
+ {11, 0, 0},
+ {4, base + 57*PageSize, 4 * PageSize},
+ {4, 0, 0},
+ {6, 0, 0},
+ {36, 0, 0},
+ {2, base + 61*PageSize, 2 * PageSize},
+ {3, 0, 0},
+ {1, base + 63*PageSize, 1 * PageSize},
+ {4, 0, 0},
+ {2, 0, 0},
+ {62, 0, 0},
+ {1, 0, 0},
+ },
+ },
+ }
+ for name, test := range tests {
+ test := test
+ t.Run(name, func(t *testing.T) {
+ c := test.cache
+ for i, h := range test.hits {
+ b, s := c.Alloc(h.npages)
+ if b != h.base {
+ t.Fatalf("bad alloc base #%d: got 0x%x, want 0x%x", i, b, h.base)
+ }
+ if s != h.scav {
+ t.Fatalf("bad alloc scav #%d: got %d, want %d", i, s, h.scav)
+ }
+ }
+ })
+ }
+}
+
+func TestPageCacheFlush(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ bits64ToBitRanges := func(bits uint64, base uint) []BitRange {
+ var ranges []BitRange
+ start, size := uint(0), uint(0)
+ for i := 0; i < 64; i++ {
+ if bits&(1<<i) != 0 {
+ if size == 0 {
+ start = uint(i) + base
+ }
+ size++
+ } else {
+ if size != 0 {
+ ranges = append(ranges, BitRange{start, size})
+ size = 0
+ }
+ }
+ }
+ if size != 0 {
+ ranges = append(ranges, BitRange{start, size})
+ }
+ return ranges
+ }
+ runTest := func(t *testing.T, base uint, cache, scav uint64) {
+ // Set up the before state.
+ beforeAlloc := map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{base, 64}},
+ }
+ beforeScav := map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ }
+ b := NewPageAlloc(beforeAlloc, beforeScav)
+ defer FreePageAlloc(b)
+
+ // Create and flush the cache.
+ c := NewPageCache(PageBase(BaseChunkIdx, base), cache, scav)
+ c.Flush(b)
+ if !c.Empty() {
+ t.Errorf("pageCache flush did not clear cache")
+ }
+
+ // Set up the expected after state.
+ afterAlloc := map[ChunkIdx][]BitRange{
+ BaseChunkIdx: bits64ToBitRanges(^cache, base),
+ }
+ afterScav := map[ChunkIdx][]BitRange{
+ BaseChunkIdx: bits64ToBitRanges(scav, base),
+ }
+ want := NewPageAlloc(afterAlloc, afterScav)
+ defer FreePageAlloc(want)
+
+ // Check to see if it worked.
+ checkPageAlloc(t, want, b)
+ }
+
+ // Empty.
+ runTest(t, 0, 0, 0)
+
+ // Full.
+ runTest(t, 0, ^uint64(0), ^uint64(0))
+
+ // Random.
+ for i := 0; i < 100; i++ {
+ // Generate random valid base within a chunk.
+ base := uint(rand.Intn(PallocChunkPages/64)) * 64
+
+ // Generate random cache.
+ cache := rand.Uint64()
+ scav := rand.Uint64() & cache
+
+ // Run the test.
+ runTest(t, base, cache, scav)
+ }
+}
+
+func TestPageAllocAllocToCache(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ type test struct {
+ before map[ChunkIdx][]BitRange
+ scav map[ChunkIdx][]BitRange
+ hits []PageCache // expected base addresses and patterns
+ after map[ChunkIdx][]BitRange
+ }
+ tests := map[string]test{
+ "AllFree": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{1, 1}, {64, 64}},
+ },
+ hits: []PageCache{
+ NewPageCache(PageBase(BaseChunkIdx, 0), ^uint64(0), 0x2),
+ NewPageCache(PageBase(BaseChunkIdx, 64), ^uint64(0), ^uint64(0)),
+ NewPageCache(PageBase(BaseChunkIdx, 128), ^uint64(0), 0),
+ NewPageCache(PageBase(BaseChunkIdx, 192), ^uint64(0), 0),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 256}},
+ },
+ },
+ "ManyArena": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages - 64}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {},
+ },
+ hits: []PageCache{
+ NewPageCache(PageBase(BaseChunkIdx+2, PallocChunkPages-64), ^uint64(0), 0),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ },
+ "NotContiguous": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xff: {{0, 0}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xff: {{31, 67}},
+ },
+ hits: []PageCache{
+ NewPageCache(PageBase(BaseChunkIdx+0xff, 0), ^uint64(0), ((uint64(1)<<33)-1)<<31),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xff: {{0, 64}},
+ },
+ },
+ "First": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 32}, {33, 31}, {96, 32}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{1, 4}, {31, 5}, {66, 2}},
+ },
+ hits: []PageCache{
+ NewPageCache(PageBase(BaseChunkIdx, 0), 1<<32, 1<<32),
+ NewPageCache(PageBase(BaseChunkIdx, 64), (uint64(1)<<32)-1, 0x3<<2),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 128}},
+ },
+ },
+ "Fail": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ hits: []PageCache{
+ NewPageCache(0, 0, 0),
+ NewPageCache(0, 0, 0),
+ NewPageCache(0, 0, 0),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ },
+ }
+ if PageAlloc64Bit != 0 {
+ const chunkIdxBigJump = 0x100000 // chunk index offset which translates to O(TiB)
+
+ // This test is similar to the one with the same name for
+ // pageAlloc.alloc and serves the same purpose.
+ // See mpagealloc_test.go for details.
+ sumsPerPhysPage := ChunkIdx(PhysPageSize / PallocSumBytes)
+ baseChunkIdx := BaseChunkIdx &^ (sumsPerPhysPage - 1)
+ tests["DiscontiguousMappedSumBoundary"] = test{
+ before: map[ChunkIdx][]BitRange{
+ baseChunkIdx + sumsPerPhysPage - 1: {{0, PallocChunkPages - 1}},
+ baseChunkIdx + chunkIdxBigJump: {{1, PallocChunkPages - 1}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ baseChunkIdx + sumsPerPhysPage - 1: {},
+ baseChunkIdx + chunkIdxBigJump: {},
+ },
+ hits: []PageCache{
+ NewPageCache(PageBase(baseChunkIdx+sumsPerPhysPage-1, PallocChunkPages-64), 1<<63, 0),
+ NewPageCache(PageBase(baseChunkIdx+chunkIdxBigJump, 0), 1, 0),
+ NewPageCache(0, 0, 0),
+ },
+ after: map[ChunkIdx][]BitRange{
+ baseChunkIdx + sumsPerPhysPage - 1: {{0, PallocChunkPages}},
+ baseChunkIdx + chunkIdxBigJump: {{0, PallocChunkPages}},
+ },
+ }
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := NewPageAlloc(v.before, v.scav)
+ defer FreePageAlloc(b)
+
+ for _, expect := range v.hits {
+ checkPageCache(t, b.AllocToCache(), expect)
+ if t.Failed() {
+ return
+ }
+ }
+ want := NewPageAlloc(v.after, v.scav)
+ defer FreePageAlloc(want)
+
+ checkPageAlloc(t, want, b)
+ })
+ }
+}
diff --git a/src/runtime/mpallocbits.go b/src/runtime/mpallocbits.go
new file mode 100644
index 0000000..ff11230
--- /dev/null
+++ b/src/runtime/mpallocbits.go
@@ -0,0 +1,428 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+)
+
+// pageBits is a bitmap representing one bit per page in a palloc chunk.
+type pageBits [pallocChunkPages / 64]uint64
+
+// get returns the value of the i'th bit in the bitmap.
+func (b *pageBits) get(i uint) uint {
+ return uint((b[i/64] >> (i % 64)) & 1)
+}
+
+// block64 returns the 64-bit aligned block of bits containing the i'th bit.
+func (b *pageBits) block64(i uint) uint64 {
+ return b[i/64]
+}
+
+// set sets bit i of pageBits.
+func (b *pageBits) set(i uint) {
+ b[i/64] |= 1 << (i % 64)
+}
+
+// setRange sets bits in the range [i, i+n).
+func (b *pageBits) setRange(i, n uint) {
+ _ = b[i/64]
+ if n == 1 {
+ // Fast path for the n == 1 case.
+ b.set(i)
+ return
+ }
+ // Set bits [i, j].
+ j := i + n - 1
+ if i/64 == j/64 {
+ b[i/64] |= ((uint64(1) << n) - 1) << (i % 64)
+ return
+ }
+ _ = b[j/64]
+ // Set leading bits.
+ b[i/64] |= ^uint64(0) << (i % 64)
+ for k := i/64 + 1; k < j/64; k++ {
+ b[k] = ^uint64(0)
+ }
+ // Set trailing bits.
+ b[j/64] |= (uint64(1) << (j%64 + 1)) - 1
+}
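+
+// For example (a hypothetical call, to make the three-part masking concrete),
+// setRange(60, 8) spans two words: i/64 == 0 and j == 67, so j/64 == 1, and
+//
+//	b[0] |= ^uint64(0) << 60     // leading bits 60-63
+//	// the middle loop runs zero times
+//	b[1] |= (uint64(1) << 4) - 1 // trailing bits 0-3
+//
+// which marks exactly pages 60 through 67.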
+
+// setAll sets all the bits of b.
+func (b *pageBits) setAll() {
+ for i := range b {
+ b[i] = ^uint64(0)
+ }
+}
+
+// clear clears bit i of pageBits.
+func (b *pageBits) clear(i uint) {
+ b[i/64] &^= 1 << (i % 64)
+}
+
+// clearRange clears bits in the range [i, i+n).
+func (b *pageBits) clearRange(i, n uint) {
+ _ = b[i/64]
+ if n == 1 {
+ // Fast path for the n == 1 case.
+ b.clear(i)
+ return
+ }
+ // Clear bits [i, j].
+ j := i + n - 1
+ if i/64 == j/64 {
+ b[i/64] &^= ((uint64(1) << n) - 1) << (i % 64)
+ return
+ }
+ _ = b[j/64]
+ // Clear leading bits.
+ b[i/64] &^= ^uint64(0) << (i % 64)
+ for k := i/64 + 1; k < j/64; k++ {
+ b[k] = 0
+ }
+ // Clear trailing bits.
+ b[j/64] &^= (uint64(1) << (j%64 + 1)) - 1
+}
+
+// clearAll clears all the bits of b.
+func (b *pageBits) clearAll() {
+ for i := range b {
+ b[i] = 0
+ }
+}
+
+// popcntRange counts the number of set bits in the
+// range [i, i+n).
+func (b *pageBits) popcntRange(i, n uint) (s uint) {
+ if n == 1 {
+ return uint((b[i/64] >> (i % 64)) & 1)
+ }
+ _ = b[i/64]
+ j := i + n - 1
+ if i/64 == j/64 {
+ return uint(sys.OnesCount64((b[i/64] >> (i % 64)) & ((1 << n) - 1)))
+ }
+ _ = b[j/64]
+ s += uint(sys.OnesCount64(b[i/64] >> (i % 64)))
+ for k := i/64 + 1; k < j/64; k++ {
+ s += uint(sys.OnesCount64(b[k]))
+ }
+ s += uint(sys.OnesCount64(b[j/64] & ((1 << (j%64 + 1)) - 1)))
+ return
+}
+
+// pallocBits is a bitmap that tracks page allocations for at most one
+// palloc chunk.
+//
+// The precise representation is an implementation detail, but for the
+// sake of documentation, 0s are free pages and 1s are allocated pages.
+type pallocBits pageBits
+
+// summarize returns a packed summary of the bitmap in pallocBits.
+func (b *pallocBits) summarize() pallocSum {
+ var start, max, cur uint
+ const notSetYet = ^uint(0) // sentinel for start value
+ start = notSetYet
+ for i := 0; i < len(b); i++ {
+ x := b[i]
+ if x == 0 {
+ cur += 64
+ continue
+ }
+ t := uint(sys.TrailingZeros64(x))
+ l := uint(sys.LeadingZeros64(x))
+
+ // Finish any region spanning the uint64s
+ cur += t
+ if start == notSetYet {
+ start = cur
+ }
+ if cur > max {
+ max = cur
+ }
+ // Final region that might span to next uint64
+ cur = l
+ }
+ if start == notSetYet {
+ // Made it all the way through without finding a single 1 bit.
+ const n = uint(64 * len(b))
+ return packPallocSum(n, n, n)
+ }
+ if cur > max {
+ max = cur
+ }
+ if max >= 64-2 {
+ // There is no way an internal run of zeros could beat max.
+ return packPallocSum(start, max, cur)
+ }
+ // Now look inside each uint64 for runs of zeros.
+ // All uint64s must be nonzero, or we would have aborted above.
+outer:
+ for i := 0; i < len(b); i++ {
+ x := b[i]
+
+ // Look inside this uint64. We have a pattern like
+ // 000000 1xxxxx1 000000
+ // We need to look inside the 1xxxxx1 for any contiguous
+ // region of zeros.
+
+ // We already know the trailing zeros are no larger than max. Remove them.
+ x >>= sys.TrailingZeros64(x) & 63
+ if x&(x+1) == 0 { // no more zeros (except at the top).
+ continue
+ }
+
+ // Strategy: shrink all runs of zeros by max. If any runs of zero
+ // remain, then we've identified a larger maximum zero run.
+ p := max // number of zeros we still need to shrink by.
+ k := uint(1) // current minimum length of runs of ones in x.
+ for {
+ // Shrink all runs of zeros by p places (except the top zeros).
+ for p > 0 {
+ if p <= k {
+ // Shift p ones down into the top of each run of zeros.
+ x |= x >> (p & 63)
+ if x&(x+1) == 0 { // no more zeros (except at the top).
+ continue outer
+ }
+ break
+ }
+ // Shift k ones down into the top of each run of zeros.
+ x |= x >> (k & 63)
+ if x&(x+1) == 0 { // no more zeros (except at the top).
+ continue outer
+ }
+ p -= k
+ // We've just doubled the minimum length of 1-runs.
+ // This allows us to shift farther in the next iteration.
+ k *= 2
+ }
+
+ // The length of the lowest-order zero run is an increment to our maximum.
+ j := uint(sys.TrailingZeros64(^x)) // count contiguous trailing ones
+ x >>= j & 63 // remove trailing ones
+ j = uint(sys.TrailingZeros64(x)) // count contiguous trailing zeros
+ x >>= j & 63 // remove zeros
+ max += j // we have a new maximum!
+ if x&(x+1) == 0 { // no more zeros (except at the top).
+ continue outer
+ }
+ p = j // remove j more zeros from each zero run.
+ }
+ }
+ return packPallocSum(start, max, cur)
+}
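+
+// As a concrete (hypothetical) example of the packed result: for a chunk in
+// which only pages 4 through 7 are allocated, i.e. b[0] == 0xf0 and
+// b[1] through b[7] are zero, summarize reports
+//
+//	start == 4                    // pages 0-3 are free
+//	max   == pallocChunkPages - 8 // the free run from page 8 to the end
+//	end   == pallocChunkPages - 8 // that same run touches the chunk's end
+//
+// packed together by packPallocSum.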
+
+// find searches for npages contiguous free pages in pallocBits and returns
+// the index where that run starts, as well as the index of the first free page
+// it found in the search. searchIdx represents the first known free page and
+// where to begin the next search from.
+//
+// If find fails to find any free space, it returns an index of ^uint(0) and
+// the new searchIdx should be ignored.
+//
+// Note that if npages == 1, the two returned values will always be identical.
+func (b *pallocBits) find(npages uintptr, searchIdx uint) (uint, uint) {
+ if npages == 1 {
+ addr := b.find1(searchIdx)
+ return addr, addr
+ } else if npages <= 64 {
+ return b.findSmallN(npages, searchIdx)
+ }
+ return b.findLargeN(npages, searchIdx)
+}
+
+// find1 is a helper for find which searches for a single free page
+// in the pallocBits and returns the index.
+//
+// See find for an explanation of the searchIdx parameter.
+func (b *pallocBits) find1(searchIdx uint) uint {
+ _ = b[0] // lift nil check out of loop
+ for i := searchIdx / 64; i < uint(len(b)); i++ {
+ x := b[i]
+ if ^x == 0 {
+ continue
+ }
+ return i*64 + uint(sys.TrailingZeros64(^x))
+ }
+ return ^uint(0)
+}
+
+// findSmallN is a helper for find which searches for npages contiguous free pages
+// in this pallocBits and returns the index where that run of contiguous pages
+// starts as well as the index of the first free page it finds in its search.
+//
+// See find for an explanation of the searchIdx parameter.
+//
+// Returns a ^uint(0) index on failure and the new searchIdx should be ignored.
+//
+// findSmallN assumes npages <= 64, where any such contiguous run of pages
+// crosses at most one aligned 64-bit boundary in the bits.
+func (b *pallocBits) findSmallN(npages uintptr, searchIdx uint) (uint, uint) {
+ end, newSearchIdx := uint(0), ^uint(0)
+ for i := searchIdx / 64; i < uint(len(b)); i++ {
+ bi := b[i]
+ if ^bi == 0 {
+ end = 0
+ continue
+ }
+ // First see if we can pack our allocation in the trailing
+ // zeros plus the end of the last 64 bits.
+ if newSearchIdx == ^uint(0) {
+ // The new searchIdx is going to be at these 64 bits after any
+ // 1s we find, so count trailing 1s.
+ newSearchIdx = i*64 + uint(sys.TrailingZeros64(^bi))
+ }
+ start := uint(sys.TrailingZeros64(bi))
+ if end+start >= uint(npages) {
+ return i*64 - end, newSearchIdx
+ }
+ // Next, check the interior of the 64-bit chunk.
+ j := findBitRange64(^bi, uint(npages))
+ if j < 64 {
+ return i*64 + j, newSearchIdx
+ }
+ end = uint(sys.LeadingZeros64(bi))
+ }
+ return ^uint(0), newSearchIdx
+}
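+
+// To illustrate the boundary check above (hypothetical values): suppose
+// npages == 3, the previous word ended in two free pages (end == 2), and
+// the current word i begins with one free page (start == 1). Then
+//
+//	end + start == 3 >= npages
+//
+// so the run is reported as starting at i*64 - end, two pages before the
+// 64-bit boundary, without ever materializing the straddling bits.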
+
+// findLargeN is a helper for find which searches for npages contiguous free pages
+// in this pallocBits and returns the index where that run starts, as well as the
+// index of the first free page it found in its search.
+//
+// See find for an explanation of the searchIdx parameter.
+//
+// Returns a ^uint(0) index on failure and the new searchIdx should be ignored.
+//
+// findLargeN assumes npages > 64, where any such run of free pages
+// crosses at least one aligned 64-bit boundary in the bits.
+func (b *pallocBits) findLargeN(npages uintptr, searchIdx uint) (uint, uint) {
+ start, size, newSearchIdx := ^uint(0), uint(0), ^uint(0)
+ for i := searchIdx / 64; i < uint(len(b)); i++ {
+ x := b[i]
+ if x == ^uint64(0) {
+ size = 0
+ continue
+ }
+ if newSearchIdx == ^uint(0) {
+ // The new searchIdx is going to be at these 64 bits after any
+ // 1s we find, so count trailing 1s.
+ newSearchIdx = i*64 + uint(sys.TrailingZeros64(^x))
+ }
+ if size == 0 {
+ size = uint(sys.LeadingZeros64(x))
+ start = i*64 + 64 - size
+ continue
+ }
+ s := uint(sys.TrailingZeros64(x))
+ if s+size >= uint(npages) {
+ size += s
+ return start, newSearchIdx
+ }
+ if s < 64 {
+ size = uint(sys.LeadingZeros64(x))
+ start = i*64 + 64 - size
+ continue
+ }
+ size += 64
+ }
+ if size < uint(npages) {
+ return ^uint(0), newSearchIdx
+ }
+ return start, newSearchIdx
+}
+
+// allocRange allocates the range [i, i+n).
+func (b *pallocBits) allocRange(i, n uint) {
+ (*pageBits)(b).setRange(i, n)
+}
+
+// allocAll allocates all the bits of b.
+func (b *pallocBits) allocAll() {
+ (*pageBits)(b).setAll()
+}
+
+// free1 frees a single page in the pallocBits at i.
+func (b *pallocBits) free1(i uint) {
+ (*pageBits)(b).clear(i)
+}
+
+// free frees the range [i, i+n) of pages in the pallocBits.
+func (b *pallocBits) free(i, n uint) {
+ (*pageBits)(b).clearRange(i, n)
+}
+
+// freeAll frees all the bits of b.
+func (b *pallocBits) freeAll() {
+ (*pageBits)(b).clearAll()
+}
+
+// pages64 returns a 64-bit bitmap representing a block of 64 pages aligned
+// to 64 pages. The returned block of pages is the one containing the i'th
+// page in this pallocBits. Each bit represents whether the page is in-use.
+func (b *pallocBits) pages64(i uint) uint64 {
+ return (*pageBits)(b).block64(i)
+}
+
+// findBitRange64 returns the bit index of the first set of
+// n consecutive 1 bits. If no consecutive set of 1 bits of
+// size n may be found in c, then it returns an integer >= 64.
+// n must be > 0.
+func findBitRange64(c uint64, n uint) uint {
+ // This implementation is based on shrinking the length of
+ // runs of contiguous 1 bits. We remove the top n-1 1 bits
+ // from each run of 1s, then look for the first remaining 1 bit.
+ p := n - 1 // number of 1s we want to remove.
+ k := uint(1) // current minimum width of runs of 0 in c.
+ for p > 0 {
+ if p <= k {
+ // Shift p 0s down into the top of each run of 1s.
+ c &= c >> (p & 63)
+ break
+ }
+ // Shift k 0s down into the top of each run of 1s.
+ c &= c >> (k & 63)
+ if c == 0 {
+ return 64
+ }
+ p -= k
+ // We've just doubled the minimum length of 0-runs.
+ // This allows us to shift farther in the next iteration.
+ k *= 2
+ }
+ // Find first remaining 1.
+ // Since we shrunk from the top down, the first 1 is in
+ // its correct original position.
+ return uint(sys.TrailingZeros64(c))
+}
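+
+// Two worked examples of the shrinking loop above (inputs are arbitrary):
+//
+//	findBitRange64(0b11100, 3) == 2  // c shrinks 0b11100 -> 0b01100 -> 0b00100
+//	findBitRange64(0b01011, 3) == 64 // every run of 1s shrinks away; no fit
+//
+// In the first case the surviving low bit sits at the original start of the
+// 3-bit run, which is exactly what the trailing-zeros count returns.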
+
+// pallocData encapsulates pallocBits and a bitmap for
+// whether or not a given page is scavenged in a single
+// structure. It's effectively a pallocBits with
+// additional functionality.
+//
+// Update the comment on (*pageAlloc).chunks should this
+// structure change.
+type pallocData struct {
+ pallocBits
+ scavenged pageBits
+}
+
+// allocRange sets bits [i, i+n) in the bitmap to 1 and
+// updates the scavenged bits appropriately.
+func (m *pallocData) allocRange(i, n uint) {
+ // Clear the scavenged bits when we alloc the range.
+ m.pallocBits.allocRange(i, n)
+ m.scavenged.clearRange(i, n)
+}
+
+// allocAll sets every bit in the bitmap to 1 and updates
+// the scavenged bits appropriately.
+func (m *pallocData) allocAll() {
+ // Clear the scavenged bits when we alloc the range.
+ m.pallocBits.allocAll()
+ m.scavenged.clearAll()
+}
diff --git a/src/runtime/mpallocbits_test.go b/src/runtime/mpallocbits_test.go
new file mode 100644
index 0000000..5095e24
--- /dev/null
+++ b/src/runtime/mpallocbits_test.go
@@ -0,0 +1,551 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "math/rand"
+ . "runtime"
+ "testing"
+)
+
+// Ensures that got and want are the same, and if not, reports
+// detailed diff information.
+func checkPallocBits(t *testing.T, got, want *PallocBits) bool {
+ d := DiffPallocBits(got, want)
+ if len(d) != 0 {
+ t.Errorf("%d range(s) different", len(d))
+ for _, bits := range d {
+ t.Logf("\t@ bit index %d", bits.I)
+ t.Logf("\t| got: %s", StringifyPallocBits(got, bits))
+ t.Logf("\t| want: %s", StringifyPallocBits(want, bits))
+ }
+ return false
+ }
+ return true
+}
+
+// makePallocBits produces an initialized PallocBits by setting
+// the ranges in s to 1 and the rest to zero.
+func makePallocBits(s []BitRange) *PallocBits {
+ b := new(PallocBits)
+ for _, v := range s {
+ b.AllocRange(v.I, v.N)
+ }
+ return b
+}
+
+// Ensures that PallocBits.AllocRange works, which is a fundamental
+// method used for testing and initialization since it's used by
+// makePallocBits.
+func TestPallocBitsAllocRange(t *testing.T) {
+ test := func(t *testing.T, i, n uint, want *PallocBits) {
+ checkPallocBits(t, makePallocBits([]BitRange{{i, n}}), want)
+ }
+ t.Run("OneLow", func(t *testing.T) {
+ want := new(PallocBits)
+ want[0] = 0x1
+ test(t, 0, 1, want)
+ })
+ t.Run("OneHigh", func(t *testing.T) {
+ want := new(PallocBits)
+ want[PallocChunkPages/64-1] = 1 << 63
+ test(t, PallocChunkPages-1, 1, want)
+ })
+ t.Run("Inner", func(t *testing.T) {
+ want := new(PallocBits)
+ want[2] = 0x3e
+ test(t, 129, 5, want)
+ })
+ t.Run("Aligned", func(t *testing.T) {
+ want := new(PallocBits)
+ want[2] = ^uint64(0)
+ want[3] = ^uint64(0)
+ test(t, 128, 128, want)
+ })
+ t.Run("Begin", func(t *testing.T) {
+ want := new(PallocBits)
+ want[0] = ^uint64(0)
+ want[1] = ^uint64(0)
+ want[2] = ^uint64(0)
+ want[3] = ^uint64(0)
+ want[4] = ^uint64(0)
+ want[5] = 0x1
+ test(t, 0, 321, want)
+ })
+ t.Run("End", func(t *testing.T) {
+ want := new(PallocBits)
+ want[PallocChunkPages/64-1] = ^uint64(0)
+ want[PallocChunkPages/64-2] = ^uint64(0)
+ want[PallocChunkPages/64-3] = ^uint64(0)
+ want[PallocChunkPages/64-4] = 1 << 63
+ test(t, PallocChunkPages-(64*3+1), 64*3+1, want)
+ })
+ t.Run("All", func(t *testing.T) {
+ want := new(PallocBits)
+ for i := range want {
+ want[i] = ^uint64(0)
+ }
+ test(t, 0, PallocChunkPages, want)
+ })
+}
+
+// Inverts every bit in the PallocBits.
+func invertPallocBits(b *PallocBits) {
+ for i := range b {
+ b[i] = ^b[i]
+ }
+}
+
+// Ensures two packed summaries are identical, and reports a detailed description
+// of the difference if they're not.
+func checkPallocSum(t testing.TB, got, want PallocSum) {
+ if got.Start() != want.Start() {
+ t.Errorf("inconsistent start: got %d, want %d", got.Start(), want.Start())
+ }
+ if got.Max() != want.Max() {
+ t.Errorf("inconsistent max: got %d, want %d", got.Max(), want.Max())
+ }
+ if got.End() != want.End() {
+ t.Errorf("inconsistent end: got %d, want %d", got.End(), want.End())
+ }
+}
+
+func TestMallocBitsPopcntRange(t *testing.T) {
+ type test struct {
+ i, n uint // bit range to popcnt over.
+ want uint // expected popcnt result on that range.
+ }
+ tests := map[string]struct {
+ init []BitRange // bit ranges to set to 1 in the bitmap.
+ tests []test // a set of popcnt tests to run over the bitmap.
+ }{
+ "None": {
+ tests: []test{
+ {0, 1, 0},
+ {5, 3, 0},
+ {2, 11, 0},
+ {PallocChunkPages/4 + 1, PallocChunkPages / 2, 0},
+ {0, PallocChunkPages, 0},
+ },
+ },
+ "All": {
+ init: []BitRange{{0, PallocChunkPages}},
+ tests: []test{
+ {0, 1, 1},
+ {5, 3, 3},
+ {2, 11, 11},
+ {PallocChunkPages/4 + 1, PallocChunkPages / 2, PallocChunkPages / 2},
+ {0, PallocChunkPages, PallocChunkPages},
+ },
+ },
+ "Half": {
+ init: []BitRange{{PallocChunkPages / 2, PallocChunkPages / 2}},
+ tests: []test{
+ {0, 1, 0},
+ {5, 3, 0},
+ {2, 11, 0},
+ {PallocChunkPages/2 - 1, 1, 0},
+ {PallocChunkPages / 2, 1, 1},
+ {PallocChunkPages/2 + 10, 1, 1},
+ {PallocChunkPages/2 - 1, 2, 1},
+ {PallocChunkPages / 4, PallocChunkPages / 4, 0},
+ {PallocChunkPages / 4, PallocChunkPages/4 + 1, 1},
+ {PallocChunkPages/4 + 1, PallocChunkPages / 2, PallocChunkPages/4 + 1},
+ {0, PallocChunkPages, PallocChunkPages / 2},
+ },
+ },
+ "OddBound": {
+ init: []BitRange{{0, 111}},
+ tests: []test{
+ {0, 1, 1},
+ {5, 3, 3},
+ {2, 11, 11},
+ {110, 2, 1},
+ {99, 50, 12},
+ {110, 1, 1},
+ {111, 1, 0},
+ {99, 1, 1},
+ {120, 1, 0},
+ {PallocChunkPages / 2, PallocChunkPages / 2, 0},
+ {0, PallocChunkPages, 111},
+ },
+ },
+ "Scattered": {
+ init: []BitRange{
+ {1, 3}, {5, 1}, {7, 1}, {10, 2}, {13, 1}, {15, 4},
+ {21, 1}, {23, 1}, {26, 2}, {30, 5}, {36, 2}, {40, 3},
+ {44, 6}, {51, 1}, {53, 2}, {58, 3}, {63, 1}, {67, 2},
+ {71, 10}, {84, 1}, {89, 7}, {99, 2}, {103, 1}, {107, 2},
+ {111, 1}, {113, 1}, {115, 1}, {118, 1}, {120, 2}, {125, 5},
+ },
+ tests: []test{
+ {0, 11, 6},
+ {0, 64, 39},
+ {13, 64, 40},
+ {64, 64, 34},
+ {0, 128, 73},
+ {1, 128, 74},
+ {0, PallocChunkPages, 75},
+ },
+ },
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := makePallocBits(v.init)
+ for _, h := range v.tests {
+ if got := b.PopcntRange(h.i, h.n); got != h.want {
+ t.Errorf("bad popcnt (i=%d, n=%d): got %d, want %d", h.i, h.n, got, h.want)
+ }
+ }
+ })
+ }
+}
+
+// Ensures computing bit summaries works as expected by generating random
+// bitmaps and checking against a reference implementation.
+func TestPallocBitsSummarizeRandom(t *testing.T) {
+ b := new(PallocBits)
+ for i := 0; i < 1000; i++ {
+ // Randomize bitmap.
+ for i := range b {
+ b[i] = rand.Uint64()
+ }
+ // Check summary against reference implementation.
+ checkPallocSum(t, b.Summarize(), SummarizeSlow(b))
+ }
+}
+
+// Ensures computing bit summaries works as expected.
+func TestPallocBitsSummarize(t *testing.T) {
+ var emptySum = PackPallocSum(PallocChunkPages, PallocChunkPages, PallocChunkPages)
+ type test struct {
+ free []BitRange // Ranges of free (zero) bits.
+ hits []PallocSum
+ }
+ tests := make(map[string]test)
+ tests["NoneFree"] = test{
+ free: []BitRange{},
+ hits: []PallocSum{
+ PackPallocSum(0, 0, 0),
+ },
+ }
+ tests["OnlyStart"] = test{
+ free: []BitRange{{0, 10}},
+ hits: []PallocSum{
+ PackPallocSum(10, 10, 0),
+ },
+ }
+ tests["OnlyEnd"] = test{
+ free: []BitRange{{PallocChunkPages - 40, 40}},
+ hits: []PallocSum{
+ PackPallocSum(0, 40, 40),
+ },
+ }
+ tests["StartAndEnd"] = test{
+ free: []BitRange{{0, 11}, {PallocChunkPages - 23, 23}},
+ hits: []PallocSum{
+ PackPallocSum(11, 23, 23),
+ },
+ }
+ tests["StartMaxEnd"] = test{
+ free: []BitRange{{0, 4}, {50, 100}, {PallocChunkPages - 4, 4}},
+ hits: []PallocSum{
+ PackPallocSum(4, 100, 4),
+ },
+ }
+ tests["OnlyMax"] = test{
+ free: []BitRange{{1, 20}, {35, 241}, {PallocChunkPages - 50, 30}},
+ hits: []PallocSum{
+ PackPallocSum(0, 241, 0),
+ },
+ }
+ tests["MultiMax"] = test{
+ free: []BitRange{{35, 2}, {40, 5}, {100, 5}},
+ hits: []PallocSum{
+ PackPallocSum(0, 5, 0),
+ },
+ }
+ tests["One"] = test{
+ free: []BitRange{{2, 1}},
+ hits: []PallocSum{
+ PackPallocSum(0, 1, 0),
+ },
+ }
+ tests["AllFree"] = test{
+ free: []BitRange{{0, PallocChunkPages}},
+ hits: []PallocSum{
+ emptySum,
+ },
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := makePallocBits(v.free)
+ // In the PallocBits we create, 1's represent free spots, but in our actual
+ // PallocBits 1 means not free, so invert.
+ invertPallocBits(b)
+ for _, h := range v.hits {
+ checkPallocSum(t, b.Summarize(), h)
+ }
+ })
+ }
+}
+
+// Benchmarks how quickly we can summarize a PallocBits.
+func BenchmarkPallocBitsSummarize(b *testing.B) {
+ patterns := []uint64{
+ 0,
+ ^uint64(0),
+ 0xaa,
+ 0xaaaaaaaaaaaaaaaa,
+ 0x80000000aaaaaaaa,
+ 0xaaaaaaaa00000001,
+ 0xbbbbbbbbbbbbbbbb,
+ 0x80000000bbbbbbbb,
+ 0xbbbbbbbb00000001,
+ 0xcccccccccccccccc,
+ 0x4444444444444444,
+ 0x4040404040404040,
+ 0x4000400040004000,
+ 0x1000404044ccaaff,
+ }
+ for _, p := range patterns {
+ buf := new(PallocBits)
+ for i := 0; i < len(buf); i++ {
+ buf[i] = p
+ }
+ b.Run(fmt.Sprintf("Unpacked%02X", p), func(b *testing.B) {
+ checkPallocSum(b, buf.Summarize(), SummarizeSlow(buf))
+ for i := 0; i < b.N; i++ {
+ buf.Summarize()
+ }
+ })
+ }
+}
+
+// Ensures page allocation works.
+func TestPallocBitsAlloc(t *testing.T) {
+ tests := map[string]struct {
+ before []BitRange
+ after []BitRange
+ npages uintptr
+ hits []uint
+ }{
+ "AllFree1": {
+ npages: 1,
+ hits: []uint{0, 1, 2, 3, 4, 5},
+ after: []BitRange{{0, 6}},
+ },
+ "AllFree2": {
+ npages: 2,
+ hits: []uint{0, 2, 4, 6, 8, 10},
+ after: []BitRange{{0, 12}},
+ },
+ "AllFree5": {
+ npages: 5,
+ hits: []uint{0, 5, 10, 15, 20},
+ after: []BitRange{{0, 25}},
+ },
+ "AllFree64": {
+ npages: 64,
+ hits: []uint{0, 64, 128},
+ after: []BitRange{{0, 192}},
+ },
+ "AllFree65": {
+ npages: 65,
+ hits: []uint{0, 65, 130},
+ after: []BitRange{{0, 195}},
+ },
+ "SomeFree64": {
+ before: []BitRange{{0, 32}, {64, 32}, {100, PallocChunkPages - 100}},
+ npages: 64,
+ hits: []uint{^uint(0)},
+ after: []BitRange{{0, 32}, {64, 32}, {100, PallocChunkPages - 100}},
+ },
+ "NoneFree1": {
+ before: []BitRange{{0, PallocChunkPages}},
+ npages: 1,
+ hits: []uint{^uint(0), ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "NoneFree2": {
+ before: []BitRange{{0, PallocChunkPages}},
+ npages: 2,
+ hits: []uint{^uint(0), ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "NoneFree5": {
+ before: []BitRange{{0, PallocChunkPages}},
+ npages: 5,
+ hits: []uint{^uint(0), ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "NoneFree65": {
+ before: []BitRange{{0, PallocChunkPages}},
+ npages: 65,
+ hits: []uint{^uint(0), ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "ExactFit1": {
+ before: []BitRange{{0, PallocChunkPages/2 - 3}, {PallocChunkPages/2 - 2, PallocChunkPages/2 + 2}},
+ npages: 1,
+ hits: []uint{PallocChunkPages/2 - 3, ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "ExactFit2": {
+ before: []BitRange{{0, PallocChunkPages/2 - 3}, {PallocChunkPages/2 - 1, PallocChunkPages/2 + 1}},
+ npages: 2,
+ hits: []uint{PallocChunkPages/2 - 3, ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "ExactFit5": {
+ before: []BitRange{{0, PallocChunkPages/2 - 3}, {PallocChunkPages/2 + 2, PallocChunkPages/2 - 2}},
+ npages: 5,
+ hits: []uint{PallocChunkPages/2 - 3, ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "ExactFit65": {
+ before: []BitRange{{0, PallocChunkPages/2 - 31}, {PallocChunkPages/2 + 34, PallocChunkPages/2 - 34}},
+ npages: 65,
+ hits: []uint{PallocChunkPages/2 - 31, ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "SomeFree161": {
+ before: []BitRange{{0, 185}, {331, 1}},
+ npages: 161,
+ hits: []uint{332},
+ after: []BitRange{{0, 185}, {331, 162}},
+ },
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := makePallocBits(v.before)
+ for iter, i := range v.hits {
+ a, _ := b.Find(v.npages, 0)
+ if i != a {
+ t.Fatalf("find #%d picked wrong index: want %d, got %d", iter+1, i, a)
+ }
+ if i != ^uint(0) {
+ b.AllocRange(a, uint(v.npages))
+ }
+ }
+ want := makePallocBits(v.after)
+ checkPallocBits(t, b, want)
+ })
+ }
+}
+
+// Ensures page freeing works.
+func TestPallocBitsFree(t *testing.T) {
+ tests := map[string]struct {
+ beforeInv []BitRange
+ afterInv []BitRange
+ frees []uint
+ npages uintptr
+ }{
+ "SomeFree": {
+ npages: 1,
+ beforeInv: []BitRange{{0, 32}, {64, 32}, {100, 1}},
+ frees: []uint{32},
+ afterInv: []BitRange{{0, 33}, {64, 32}, {100, 1}},
+ },
+ "NoneFree1": {
+ npages: 1,
+ frees: []uint{0, 1, 2, 3, 4, 5},
+ afterInv: []BitRange{{0, 6}},
+ },
+ "NoneFree2": {
+ npages: 2,
+ frees: []uint{0, 2, 4, 6, 8, 10},
+ afterInv: []BitRange{{0, 12}},
+ },
+ "NoneFree5": {
+ npages: 5,
+ frees: []uint{0, 5, 10, 15, 20},
+ afterInv: []BitRange{{0, 25}},
+ },
+ "NoneFree64": {
+ npages: 64,
+ frees: []uint{0, 64, 128},
+ afterInv: []BitRange{{0, 192}},
+ },
+ "NoneFree65": {
+ npages: 65,
+ frees: []uint{0, 65, 130},
+ afterInv: []BitRange{{0, 195}},
+ },
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := makePallocBits(v.beforeInv)
+ invertPallocBits(b)
+ for _, i := range v.frees {
+ b.Free(i, uint(v.npages))
+ }
+ want := makePallocBits(v.afterInv)
+ invertPallocBits(want)
+ checkPallocBits(t, b, want)
+ })
+ }
+}
+
+func TestFindBitRange64(t *testing.T) {
+ check := func(x uint64, n uint, result uint) {
+ i := FindBitRange64(x, n)
+ if result == ^uint(0) && i < 64 {
+ t.Errorf("case (%016x, %d): got %d, want failure", x, n, i)
+ } else if result != ^uint(0) && i != result {
+ t.Errorf("case (%016x, %d): got %d, want %d", x, n, i, result)
+ }
+ }
+ for i := uint(1); i <= 64; i++ {
+ check(^uint64(0), i, 0)
+ }
+ for i := uint(1); i <= 64; i++ {
+ check(0, i, ^uint(0))
+ }
+ check(0x8000000000000000, 1, 63)
+ check(0xc000010001010000, 2, 62)
+ check(0xc000010001030000, 2, 16)
+ check(0xe000030001030000, 3, 61)
+ check(0xe000030001070000, 3, 16)
+ check(0xffff03ff01070000, 16, 48)
+ check(0xffff03ff0107ffff, 16, 0)
+ check(0x0fff03ff01079fff, 16, ^uint(0))
+}
+
+func BenchmarkFindBitRange64(b *testing.B) {
+ patterns := []uint64{
+ 0,
+ ^uint64(0),
+ 0xaa,
+ 0xaaaaaaaaaaaaaaaa,
+ 0x80000000aaaaaaaa,
+ 0xaaaaaaaa00000001,
+ 0xbbbbbbbbbbbbbbbb,
+ 0x80000000bbbbbbbb,
+ 0xbbbbbbbb00000001,
+ 0xcccccccccccccccc,
+ 0x4444444444444444,
+ 0x4040404040404040,
+ 0x4000400040004000,
+ }
+ sizes := []uint{
+ 2, 8, 32,
+ }
+ for _, pattern := range patterns {
+ for _, size := range sizes {
+ b.Run(fmt.Sprintf("Pattern%02XSize%d", pattern, size), func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ FindBitRange64(pattern, size)
+ }
+ })
+ }
+ }
+}
diff --git a/src/runtime/mprof.go b/src/runtime/mprof.go
new file mode 100644
index 0000000..128498d
--- /dev/null
+++ b/src/runtime/mprof.go
@@ -0,0 +1,893 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Malloc profiling.
+// Patterned after tcmalloc's algorithms; shorter code.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// NOTE(rsc): Everything here could use cas if contention became an issue.
+var proflock mutex
+
+// All memory allocations are local and do not escape outside of the profiler.
+// The profiler is forbidden from referring to garbage-collected memory.
+
+const (
+ // profile types
+ memProfile bucketType = 1 + iota
+ blockProfile
+ mutexProfile
+
+ // size of bucket hash table
+ buckHashSize = 179999
+
+ // max depth of stack to record in bucket
+ maxStack = 32
+)
+
+type bucketType int
+
+// A bucket holds per-call-stack profiling information.
+// The representation is a bit sleazy, inherited from C.
+// This struct defines the bucket header. It is followed in
+// memory by the stack words and then the actual record
+// data, either a memRecord or a blockRecord.
+//
+// Per-call-stack profiling information.
+// Lookup by hashing call stack into a linked-list hash table.
+//
+// No heap pointers.
+//
+//go:notinheap
+type bucket struct {
+ next *bucket
+ allnext *bucket
+ typ bucketType // memProfile or blockProfile (includes mutexProfile)
+ hash uintptr
+ size uintptr
+ nstk uintptr
+}
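+
+// For orientation, the in-memory layout newBucket produces for a memProfile
+// bucket with, say, nstk == 3 on a 64-bit platform is (sketch only):
+//
+//	[ bucket header ][ stk[0] stk[1] stk[2] ][ memRecord ]
+//	  Sizeof(bucket)        3 * 8 bytes       Sizeof(memRecord)
+//
+// The stk, mp, and bp methods below recover the stack slice and the record
+// by offsetting from the bucket pointer according to this layout.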
+
+// A memRecord is the bucket data for a bucket of type memProfile,
+// part of the memory profile.
+type memRecord struct {
+ // The following complex 3-stage scheme of stats accumulation
+ // is required to obtain a consistent picture of mallocs and frees
+ // for some point in time.
+ // The problem is that mallocs come in real time, while frees
+ // come only after a GC during concurrent sweeping. So if we
+ // counted them naively, we would get a skew toward mallocs.
+ //
+ // Hence, we delay information to get consistent snapshots as
+ // of mark termination. Allocations count toward the next mark
+ // termination's snapshot, while sweep frees count toward the
+ // previous mark termination's snapshot:
+ //
+ // MT MT MT MT
+ // .·| .·| .·| .·|
+ // .·˙ | .·˙ | .·˙ | .·˙ |
+ // .·˙ | .·˙ | .·˙ | .·˙ |
+ // .·˙ |.·˙ |.·˙ |.·˙ |
+ //
+ // alloc → ▲ ← free
+ // ┠┅┅┅┅┅┅┅┅┅┅┅P
+ // C+2 → C+1 → C
+ //
+ // alloc → ▲ ← free
+ // ┠┅┅┅┅┅┅┅┅┅┅┅P
+ // C+2 → C+1 → C
+ //
+ // Since we can't publish a consistent snapshot until all of
+ // the sweep frees are accounted for, we wait until the next
+ // mark termination ("MT" above) to publish the previous mark
+ // termination's snapshot ("P" above). To do this, allocation
+ // and free events are accounted to *future* heap profile
+ // cycles ("C+n" above) and we only publish a cycle once all
+ // of the events from that cycle are done. Specifically:
+ //
+ // Mallocs are accounted to cycle C+2.
+ // Explicit frees are accounted to cycle C+2.
+ // GC frees (done during sweeping) are accounted to cycle C+1.
+ //
+ // After mark termination, we increment the global heap
+ // profile cycle counter and accumulate the stats from cycle C
+ // into the active profile.
+
+ // active is the currently published profile. A profiling
+ // cycle can be accumulated into active once it's complete.
+ active memRecordCycle
+
+ // future records the profile events we're counting for cycles
+ // that have not yet been published. This is a ring buffer
+ // indexed by the global heap profile cycle C and stores
+ // cycles C, C+1, and C+2. Unlike active, these counts are
+ // only for a single cycle; they are not cumulative across
+ // cycles.
+ //
+ // We store cycle C here because there's a window between when
+ // C becomes the active cycle and when we've flushed it to
+ // active.
+ future [3]memRecordCycle
+}
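+
+// A small numeric sketch of the scheme above (C stands for whatever
+// mProf.cycle currently is): a malloc observed now is recorded in
+// future[(C+2)%3], a sweep free in future[(C+1)%3], and mProf_FlushLocked
+// later folds future[C%3] into active before that slot is reused.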
+
+// memRecordCycle
+type memRecordCycle struct {
+ allocs, frees uintptr
+ alloc_bytes, free_bytes uintptr
+}
+
+// add accumulates b into a. It does not zero b.
+func (a *memRecordCycle) add(b *memRecordCycle) {
+ a.allocs += b.allocs
+ a.frees += b.frees
+ a.alloc_bytes += b.alloc_bytes
+ a.free_bytes += b.free_bytes
+}
+
+// A blockRecord is the bucket data for a bucket of type blockProfile,
+// which is used in blocking and mutex profiles.
+type blockRecord struct {
+ count int64
+ cycles int64
+}
+
+var (
+ mbuckets *bucket // memory profile buckets
+ bbuckets *bucket // blocking profile buckets
+ xbuckets *bucket // mutex profile buckets
+ buckhash *[buckHashSize]*bucket
+ bucketmem uintptr
+
+ mProf struct {
+ // All fields in mProf are protected by proflock.
+
+ // cycle is the global heap profile cycle. This wraps
+ // at mProfCycleWrap.
+ cycle uint32
+ // flushed indicates that future[cycle] in all buckets
+ // has been flushed to the active profile.
+ flushed bool
+ }
+)
+
+const mProfCycleWrap = uint32(len(memRecord{}.future)) * (2 << 24)
+
+// newBucket allocates a bucket with the given type and number of stack entries.
+func newBucket(typ bucketType, nstk int) *bucket {
+ size := unsafe.Sizeof(bucket{}) + uintptr(nstk)*unsafe.Sizeof(uintptr(0))
+ switch typ {
+ default:
+ throw("invalid profile bucket type")
+ case memProfile:
+ size += unsafe.Sizeof(memRecord{})
+ case blockProfile, mutexProfile:
+ size += unsafe.Sizeof(blockRecord{})
+ }
+
+ b := (*bucket)(persistentalloc(size, 0, &memstats.buckhash_sys))
+ bucketmem += size
+ b.typ = typ
+ b.nstk = uintptr(nstk)
+ return b
+}
+
+// stk returns the slice in b holding the stack.
+func (b *bucket) stk() []uintptr {
+ stk := (*[maxStack]uintptr)(add(unsafe.Pointer(b), unsafe.Sizeof(*b)))
+ return stk[:b.nstk:b.nstk]
+}
+
+// mp returns the memRecord associated with the memProfile bucket b.
+func (b *bucket) mp() *memRecord {
+ if b.typ != memProfile {
+ throw("bad use of bucket.mp")
+ }
+ data := add(unsafe.Pointer(b), unsafe.Sizeof(*b)+b.nstk*unsafe.Sizeof(uintptr(0)))
+ return (*memRecord)(data)
+}
+
+// bp returns the blockRecord associated with the blockProfile bucket b.
+func (b *bucket) bp() *blockRecord {
+ if b.typ != blockProfile && b.typ != mutexProfile {
+ throw("bad use of bucket.bp")
+ }
+ data := add(unsafe.Pointer(b), unsafe.Sizeof(*b)+b.nstk*unsafe.Sizeof(uintptr(0)))
+ return (*blockRecord)(data)
+}
+
+// stkbucket returns the bucket for stk[0:nstk], allocating a new bucket if needed.
+func stkbucket(typ bucketType, size uintptr, stk []uintptr, alloc bool) *bucket {
+ if buckhash == nil {
+ buckhash = (*[buckHashSize]*bucket)(sysAlloc(unsafe.Sizeof(*buckhash), &memstats.buckhash_sys))
+ if buckhash == nil {
+ throw("runtime: cannot allocate memory")
+ }
+ }
+
+ // Hash stack.
+ var h uintptr
+ for _, pc := range stk {
+ h += pc
+ h += h << 10
+ h ^= h >> 6
+ }
+ // hash in size
+ h += size
+ h += h << 10
+ h ^= h >> 6
+ // finalize
+ h += h << 3
+ h ^= h >> 11
+
+ i := int(h % buckHashSize)
+ for b := buckhash[i]; b != nil; b = b.next {
+ if b.typ == typ && b.hash == h && b.size == size && eqslice(b.stk(), stk) {
+ return b
+ }
+ }
+
+ if !alloc {
+ return nil
+ }
+
+ // Create new bucket.
+ b := newBucket(typ, len(stk))
+ copy(b.stk(), stk)
+ b.hash = h
+ b.size = size
+ b.next = buckhash[i]
+ buckhash[i] = b
+ if typ == memProfile {
+ b.allnext = mbuckets
+ mbuckets = b
+ } else if typ == mutexProfile {
+ b.allnext = xbuckets
+ xbuckets = b
+ } else {
+ b.allnext = bbuckets
+ bbuckets = b
+ }
+ return b
+}
+
+func eqslice(x, y []uintptr) bool {
+ if len(x) != len(y) {
+ return false
+ }
+ for i, xi := range x {
+ if xi != y[i] {
+ return false
+ }
+ }
+ return true
+}
+
+// mProf_NextCycle publishes the next heap profile cycle and creates a
+// fresh heap profile cycle. This operation is fast and can be done
+// during STW. The caller must call mProf_Flush before calling
+// mProf_NextCycle again.
+//
+// This is called by mark termination during STW so allocations and
+// frees after the world is started again count towards a new heap
+// profiling cycle.
+func mProf_NextCycle() {
+ lock(&proflock)
+ // We explicitly wrap mProf.cycle rather than depending on
+ // uint wraparound because the memRecord.future ring does not
+ // itself wrap at a power of two.
+ mProf.cycle = (mProf.cycle + 1) % mProfCycleWrap
+ mProf.flushed = false
+ unlock(&proflock)
+}
+
+// mProf_Flush flushes the events from the current heap profiling
+// cycle into the active profile. After this it is safe to start a new
+// heap profiling cycle with mProf_NextCycle.
+//
+// This is called by GC after mark termination starts the world. In
+// contrast with mProf_NextCycle, this is somewhat expensive, but safe
+// to do concurrently.
+func mProf_Flush() {
+ lock(&proflock)
+ if !mProf.flushed {
+ mProf_FlushLocked()
+ mProf.flushed = true
+ }
+ unlock(&proflock)
+}
+
+func mProf_FlushLocked() {
+ c := mProf.cycle
+ for b := mbuckets; b != nil; b = b.allnext {
+ mp := b.mp()
+
+ // Flush cycle C into the published profile and clear
+ // it for reuse.
+ mpc := &mp.future[c%uint32(len(mp.future))]
+ mp.active.add(mpc)
+ *mpc = memRecordCycle{}
+ }
+}
+
+// mProf_PostSweep records that all sweep frees for this GC cycle have
+// completed. This has the effect of publishing the heap profile
+// snapshot as of the last mark termination without advancing the heap
+// profile cycle.
+func mProf_PostSweep() {
+ lock(&proflock)
+ // Flush cycle C+1 to the active profile so everything as of
+ // the last mark termination becomes visible. *Don't* advance
+ // the cycle, since we're still accumulating allocs in cycle
+ // C+2, which have to become C+1 in the next mark termination
+ // and so on.
+ c := mProf.cycle
+ for b := mbuckets; b != nil; b = b.allnext {
+ mp := b.mp()
+ mpc := &mp.future[(c+1)%uint32(len(mp.future))]
+ mp.active.add(mpc)
+ *mpc = memRecordCycle{}
+ }
+ unlock(&proflock)
+}
+
+// Called by malloc to record a profiled block.
+func mProf_Malloc(p unsafe.Pointer, size uintptr) {
+ var stk [maxStack]uintptr
+ nstk := callers(4, stk[:])
+ lock(&proflock)
+ b := stkbucket(memProfile, size, stk[:nstk], true)
+ c := mProf.cycle
+ mp := b.mp()
+ mpc := &mp.future[(c+2)%uint32(len(mp.future))]
+ mpc.allocs++
+ mpc.alloc_bytes += size
+ unlock(&proflock)
+
+ // Setprofilebucket locks a bunch of other mutexes, so we call it outside of proflock.
+ // This reduces potential contention and chances of deadlocks.
+ // Since the object must be alive during call to mProf_Malloc,
+ // it's fine to do this non-atomically.
+ systemstack(func() {
+ setprofilebucket(p, b)
+ })
+}
+
+// Called when freeing a profiled block.
+func mProf_Free(b *bucket, size uintptr) {
+ lock(&proflock)
+ c := mProf.cycle
+ mp := b.mp()
+ mpc := &mp.future[(c+1)%uint32(len(mp.future))]
+ mpc.frees++
+ mpc.free_bytes += size
+ unlock(&proflock)
+}
+
+var blockprofilerate uint64 // in CPU ticks
+
+// SetBlockProfileRate controls the fraction of goroutine blocking events
+// that are reported in the blocking profile. The profiler aims to sample
+// an average of one blocking event per rate nanoseconds spent blocked.
+//
+// To include every blocking event in the profile, pass rate = 1.
+// To turn off profiling entirely, pass rate <= 0.
+func SetBlockProfileRate(rate int) {
+ var r int64
+ if rate <= 0 {
+ r = 0 // disable profiling
+ } else if rate == 1 {
+ r = 1 // profile everything
+ } else {
+ // convert ns to cycles, use float64 to prevent overflow during multiplication
+ r = int64(float64(rate) * float64(tickspersecond()) / (1000 * 1000 * 1000))
+ if r == 0 {
+ r = 1
+ }
+ }
+
+ atomic.Store64(&blockprofilerate, uint64(r))
+}
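+
+// As a rough usage sketch (not exercised here), given the semantics
+// documented above:
+//
+//	runtime.SetBlockProfileRate(1)   // record every blocking event
+//	runtime.SetBlockProfileRate(1e6) // about one sample per millisecond spent blocked
+//	runtime.SetBlockProfileRate(0)   // disable block profiling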
+
+func blockevent(cycles int64, skip int) {
+ if cycles <= 0 {
+ cycles = 1
+ }
+ if blocksampled(cycles) {
+ saveblockevent(cycles, skip+1, blockProfile)
+ }
+}
+
+func blocksampled(cycles int64) bool {
+ rate := int64(atomic.Load64(&blockprofilerate))
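+ // A rate of 0 (or less) disables sampling. Events that lasted at least
+ // rate ticks are always sampled; shorter events are sampled with
+ // probability roughly cycles/rate, which keeps the average at about one
+ // sampled event per rate ticks spent blocked.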
+ if rate <= 0 || (rate > cycles && int64(fastrand())%rate > cycles) {
+ return false
+ }
+ return true
+}
+
+func saveblockevent(cycles int64, skip int, which bucketType) {
+ gp := getg()
+ var nstk int
+ var stk [maxStack]uintptr
+ if gp.m.curg == nil || gp.m.curg == gp {
+ nstk = callers(skip, stk[:])
+ } else {
+ nstk = gcallers(gp.m.curg, skip, stk[:])
+ }
+ lock(&proflock)
+ b := stkbucket(which, 0, stk[:nstk], true)
+ b.bp().count++
+ b.bp().cycles += cycles
+ unlock(&proflock)
+}
+
+var mutexprofilerate uint64 // fraction sampled
+
+// SetMutexProfileFraction controls the fraction of mutex contention events
+// that are reported in the mutex profile. On average 1/rate events are
+// reported. The previous rate is returned.
+//
+// To turn off profiling entirely, pass rate 0.
+// To just read the current rate, pass rate < 0.
+// (For rate > 1, the details of sampling may change.)
+func SetMutexProfileFraction(rate int) int {
+ if rate < 0 {
+ return int(mutexprofilerate)
+ }
+ old := mutexprofilerate
+ atomic.Store64(&mutexprofilerate, uint64(rate))
+ return int(old)
+}
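+
+// As a rough usage sketch (not exercised here), a test might sample about
+// 1 in 5 contention events and restore the previous setting afterwards:
+//
+//	old := runtime.SetMutexProfileFraction(5)
+//	defer runtime.SetMutexProfileFraction(old)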
+
+//go:linkname mutexevent sync.event
+func mutexevent(cycles int64, skip int) {
+ if cycles < 0 {
+ cycles = 0
+ }
+ rate := int64(atomic.Load64(&mutexprofilerate))
+ // TODO(pjw): measure impact of always calling fastrand vs using something
+ // like malloc.go:nextSample()
+ if rate > 0 && int64(fastrand())%rate == 0 {
+ saveblockevent(cycles, skip+1, mutexProfile)
+ }
+}
+
+// Go interface to profile data.
+
+// A StackRecord describes a single execution stack.
+type StackRecord struct {
+ Stack0 [32]uintptr // stack trace for this record; ends at first 0 entry
+}
+
+// Stack returns the stack trace associated with the record,
+// a prefix of r.Stack0.
+func (r *StackRecord) Stack() []uintptr {
+ for i, v := range r.Stack0 {
+ if v == 0 {
+ return r.Stack0[0:i]
+ }
+ }
+ return r.Stack0[0:]
+}
+
+// MemProfileRate controls the fraction of memory allocations
+// that are recorded and reported in the memory profile.
+// The profiler aims to sample an average of
+// one allocation per MemProfileRate bytes allocated.
+//
+// To include every allocated block in the profile, set MemProfileRate to 1.
+// To turn off profiling entirely, set MemProfileRate to 0.
+//
+// The tools that process the memory profiles assume that the
+// profile rate is constant across the lifetime of the program
+// and equal to the current value. Programs that change the
+// memory profiling rate should do so just once, as early as
+// possible in the execution of the program (for example,
+// at the beginning of main).
+var MemProfileRate int = 512 * 1024
+
+// A MemProfileRecord describes the live objects allocated
+// by a particular call sequence (stack trace).
+type MemProfileRecord struct {
+ AllocBytes, FreeBytes int64 // number of bytes allocated, freed
+ AllocObjects, FreeObjects int64 // number of objects allocated, freed
+ Stack0 [32]uintptr // stack trace for this record; ends at first 0 entry
+}
+
+// InUseBytes returns the number of bytes in use (AllocBytes - FreeBytes).
+func (r *MemProfileRecord) InUseBytes() int64 { return r.AllocBytes - r.FreeBytes }
+
+// InUseObjects returns the number of objects in use (AllocObjects - FreeObjects).
+func (r *MemProfileRecord) InUseObjects() int64 {
+ return r.AllocObjects - r.FreeObjects
+}
+
+// Stack returns the stack trace associated with the record,
+// a prefix of r.Stack0.
+func (r *MemProfileRecord) Stack() []uintptr {
+ for i, v := range r.Stack0 {
+ if v == 0 {
+ return r.Stack0[0:i]
+ }
+ }
+ return r.Stack0[0:]
+}
+
+// MemProfile returns a profile of memory allocated and freed per allocation
+// site.
+//
+// MemProfile returns n, the number of records in the current memory profile.
+// If len(p) >= n, MemProfile copies the profile into p and returns n, true.
+// If len(p) < n, MemProfile does not change p and returns n, false.
+//
+// If inuseZero is true, the profile includes allocation records
+// where r.AllocBytes > 0 but r.AllocBytes == r.FreeBytes.
+// These are sites where memory was allocated, but it has all
+// been released back to the runtime.
+//
+// The returned profile may be up to two garbage collection cycles old.
+// This is to avoid skewing the profile toward allocations; because
+// allocations happen in real time but frees are delayed until the garbage
+// collector performs sweeping, the profile only accounts for allocations
+// that have had a chance to be freed by the garbage collector.
+//
+// Most clients should use the runtime/pprof package or
+// the testing package's -test.memprofile flag instead
+// of calling MemProfile directly.
+func MemProfile(p []MemProfileRecord, inuseZero bool) (n int, ok bool) {
+ lock(&proflock)
+ // If we're between mProf_NextCycle and mProf_Flush, take care
+ // of flushing to the active profile so we only have to look
+ // at the active profile below.
+ mProf_FlushLocked()
+ clear := true
+ for b := mbuckets; b != nil; b = b.allnext {
+ mp := b.mp()
+ if inuseZero || mp.active.alloc_bytes != mp.active.free_bytes {
+ n++
+ }
+ if mp.active.allocs != 0 || mp.active.frees != 0 {
+ clear = false
+ }
+ }
+ if clear {
+ // Absolutely no data, suggesting that a garbage collection
+ // has not yet happened. In order to allow profiling when
+ // garbage collection is disabled from the beginning of execution,
+ // accumulate all of the cycles, and recount buckets.
+ n = 0
+ for b := mbuckets; b != nil; b = b.allnext {
+ mp := b.mp()
+ for c := range mp.future {
+ mp.active.add(&mp.future[c])
+ mp.future[c] = memRecordCycle{}
+ }
+ if inuseZero || mp.active.alloc_bytes != mp.active.free_bytes {
+ n++
+ }
+ }
+ }
+ if n <= len(p) {
+ ok = true
+ idx := 0
+ for b := mbuckets; b != nil; b = b.allnext {
+ mp := b.mp()
+ if inuseZero || mp.active.alloc_bytes != mp.active.free_bytes {
+ record(&p[idx], b)
+ idx++
+ }
+ }
+ }
+ unlock(&proflock)
+ return
+}
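+
+// The usual calling pattern, roughly what runtime/pprof does, is to size
+// the slice with some slack and retry until the profile fits:
+//
+//	n, ok := runtime.MemProfile(nil, true)
+//	var p []runtime.MemProfileRecord
+//	for {
+//		p = make([]runtime.MemProfileRecord, n+50)
+//		n, ok = runtime.MemProfile(p, true)
+//		if ok {
+//			p = p[:n]
+//			break
+//		}
+//	}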
+
+// Write b's data to r.
+func record(r *MemProfileRecord, b *bucket) {
+ mp := b.mp()
+ r.AllocBytes = int64(mp.active.alloc_bytes)
+ r.FreeBytes = int64(mp.active.free_bytes)
+ r.AllocObjects = int64(mp.active.allocs)
+ r.FreeObjects = int64(mp.active.frees)
+ if raceenabled {
+ racewriterangepc(unsafe.Pointer(&r.Stack0[0]), unsafe.Sizeof(r.Stack0), getcallerpc(), funcPC(MemProfile))
+ }
+ if msanenabled {
+ msanwrite(unsafe.Pointer(&r.Stack0[0]), unsafe.Sizeof(r.Stack0))
+ }
+ copy(r.Stack0[:], b.stk())
+ for i := int(b.nstk); i < len(r.Stack0); i++ {
+ r.Stack0[i] = 0
+ }
+}
+
+func iterate_memprof(fn func(*bucket, uintptr, *uintptr, uintptr, uintptr, uintptr)) {
+ lock(&proflock)
+ for b := mbuckets; b != nil; b = b.allnext {
+ mp := b.mp()
+ fn(b, b.nstk, &b.stk()[0], b.size, mp.active.allocs, mp.active.frees)
+ }
+ unlock(&proflock)
+}
+
+// BlockProfileRecord describes blocking events originated
+// at a particular call sequence (stack trace).
+type BlockProfileRecord struct {
+ Count int64
+ Cycles int64
+ StackRecord
+}
+
+// BlockProfile returns n, the number of records in the current blocking profile.
+// If len(p) >= n, BlockProfile copies the profile into p and returns n, true.
+// If len(p) < n, BlockProfile does not change p and returns n, false.
+//
+// Most clients should use the runtime/pprof package or
+// the testing package's -test.blockprofile flag instead
+// of calling BlockProfile directly.
+func BlockProfile(p []BlockProfileRecord) (n int, ok bool) {
+ lock(&proflock)
+ for b := bbuckets; b != nil; b = b.allnext {
+ n++
+ }
+ if n <= len(p) {
+ ok = true
+ for b := bbuckets; b != nil; b = b.allnext {
+ bp := b.bp()
+ r := &p[0]
+ r.Count = bp.count
+ r.Cycles = bp.cycles
+ if raceenabled {
+ racewriterangepc(unsafe.Pointer(&r.Stack0[0]), unsafe.Sizeof(r.Stack0), getcallerpc(), funcPC(BlockProfile))
+ }
+ if msanenabled {
+ msanwrite(unsafe.Pointer(&r.Stack0[0]), unsafe.Sizeof(r.Stack0))
+ }
+ i := copy(r.Stack0[:], b.stk())
+ for ; i < len(r.Stack0); i++ {
+ r.Stack0[i] = 0
+ }
+ p = p[1:]
+ }
+ }
+ unlock(&proflock)
+ return
+}
+
+// MutexProfile returns n, the number of records in the current mutex profile.
+// If len(p) >= n, MutexProfile copies the profile into p and returns n, true.
+// Otherwise, MutexProfile does not change p, and returns n, false.
+//
+// Most clients should use the runtime/pprof package
+// instead of calling MutexProfile directly.
+func MutexProfile(p []BlockProfileRecord) (n int, ok bool) {
+ lock(&proflock)
+ for b := xbuckets; b != nil; b = b.allnext {
+ n++
+ }
+ if n <= len(p) {
+ ok = true
+ for b := xbuckets; b != nil; b = b.allnext {
+ bp := b.bp()
+ r := &p[0]
+ r.Count = int64(bp.count)
+ r.Cycles = bp.cycles
+ i := copy(r.Stack0[:], b.stk())
+ for ; i < len(r.Stack0); i++ {
+ r.Stack0[i] = 0
+ }
+ p = p[1:]
+ }
+ }
+ unlock(&proflock)
+ return
+}
+
+// ThreadCreateProfile returns n, the number of records in the thread creation profile.
+// If len(p) >= n, ThreadCreateProfile copies the profile into p and returns n, true.
+// If len(p) < n, ThreadCreateProfile does not change p and returns n, false.
+//
+// Most clients should use the runtime/pprof package instead
+// of calling ThreadCreateProfile directly.
+func ThreadCreateProfile(p []StackRecord) (n int, ok bool) {
+ first := (*m)(atomic.Loadp(unsafe.Pointer(&allm)))
+ for mp := first; mp != nil; mp = mp.alllink {
+ n++
+ }
+ if n <= len(p) {
+ ok = true
+ i := 0
+ for mp := first; mp != nil; mp = mp.alllink {
+ p[i].Stack0 = mp.createstack
+ i++
+ }
+ }
+ return
+}
+
+//go:linkname runtime_goroutineProfileWithLabels runtime/pprof.runtime_goroutineProfileWithLabels
+func runtime_goroutineProfileWithLabels(p []StackRecord, labels []unsafe.Pointer) (n int, ok bool) {
+ return goroutineProfileWithLabels(p, labels)
+}
+
+// labels may be nil. If labels is non-nil, it must have the same length as p.
+func goroutineProfileWithLabels(p []StackRecord, labels []unsafe.Pointer) (n int, ok bool) {
+ if labels != nil && len(labels) != len(p) {
+ labels = nil
+ }
+ gp := getg()
+
+ isOK := func(gp1 *g) bool {
+ // Checking isSystemGoroutine here makes GoroutineProfile
+ // consistent with both NumGoroutine and Stack.
+ return gp1 != gp && readgstatus(gp1) != _Gdead && !isSystemGoroutine(gp1, false)
+ }
+
+ stopTheWorld("profile")
+
+ n = 1
+ for _, gp1 := range allgs {
+ if isOK(gp1) {
+ n++
+ }
+ }
+
+ if n <= len(p) {
+ ok = true
+ r, lbl := p, labels
+
+ // Save current goroutine.
+ sp := getcallersp()
+ pc := getcallerpc()
+ systemstack(func() {
+ saveg(pc, sp, gp, &r[0])
+ })
+ r = r[1:]
+
+ // If we have a place to put our goroutine labelmap, insert it there.
+ if labels != nil {
+ lbl[0] = gp.labels
+ lbl = lbl[1:]
+ }
+
+ // Save other goroutines.
+ for _, gp1 := range allgs {
+ if isOK(gp1) {
+ if len(r) == 0 {
+ // Should be impossible, but better to return a
+ // truncated profile than to crash the entire process.
+ break
+ }
+ saveg(^uintptr(0), ^uintptr(0), gp1, &r[0])
+ if labels != nil {
+ lbl[0] = gp1.labels
+ lbl = lbl[1:]
+ }
+ r = r[1:]
+ }
+ }
+ }
+
+ startTheWorld()
+ return n, ok
+}
+
+// GoroutineProfile returns n, the number of records in the active goroutine stack profile.
+// If len(p) >= n, GoroutineProfile copies the profile into p and returns n, true.
+// If len(p) < n, GoroutineProfile does not change p and returns n, false.
+//
+// Most clients should use the runtime/pprof package instead
+// of calling GoroutineProfile directly.
+func GoroutineProfile(p []StackRecord) (n int, ok bool) {
+ return goroutineProfileWithLabels(p, nil)
+}
+
+func saveg(pc, sp uintptr, gp *g, r *StackRecord) {
+ n := gentraceback(pc, sp, 0, gp, 0, &r.Stack0[0], len(r.Stack0), nil, nil, 0)
+ if n < len(r.Stack0) {
+ r.Stack0[n] = 0
+ }
+}
+
+// Stack formats a stack trace of the calling goroutine into buf
+// and returns the number of bytes written to buf.
+// If all is true, Stack formats stack traces of all other goroutines
+// into buf after the trace for the current goroutine.
+func Stack(buf []byte, all bool) int {
+ if all {
+ stopTheWorld("stack trace")
+ }
+
+ n := 0
+ if len(buf) > 0 {
+ gp := getg()
+ sp := getcallersp()
+ pc := getcallerpc()
+ systemstack(func() {
+ g0 := getg()
+ // Force traceback=1 to override GOTRACEBACK setting,
+ // so that Stack's results are consistent.
+ // GOTRACEBACK is only about crash dumps.
+ g0.m.traceback = 1
+ g0.writebuf = buf[0:0:len(buf)]
+ goroutineheader(gp)
+ traceback(pc, sp, 0, gp)
+ if all {
+ tracebackothers(gp)
+ }
+ g0.m.traceback = 0
+ n = len(g0.writebuf)
+ g0.writebuf = nil
+ })
+ }
+
+ if all {
+ startTheWorld()
+ }
+ return n
+}
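+
+// A minimal usage sketch, assuming a buffer large enough for the traces of
+// interest (Stack writes at most len(buf) bytes and returns the count):
+//
+//	buf := make([]byte, 1<<16)
+//	n := runtime.Stack(buf, false)
+//	os.Stderr.Write(buf[:n])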
+
+// Tracing of alloc/free/gc.
+
+var tracelock mutex
+
+func tracealloc(p unsafe.Pointer, size uintptr, typ *_type) {
+ lock(&tracelock)
+ gp := getg()
+ gp.m.traceback = 2
+ if typ == nil {
+ print("tracealloc(", p, ", ", hex(size), ")\n")
+ } else {
+ print("tracealloc(", p, ", ", hex(size), ", ", typ.string(), ")\n")
+ }
+ if gp.m.curg == nil || gp == gp.m.curg {
+ goroutineheader(gp)
+ pc := getcallerpc()
+ sp := getcallersp()
+ systemstack(func() {
+ traceback(pc, sp, 0, gp)
+ })
+ } else {
+ goroutineheader(gp.m.curg)
+ traceback(^uintptr(0), ^uintptr(0), 0, gp.m.curg)
+ }
+ print("\n")
+ gp.m.traceback = 0
+ unlock(&tracelock)
+}
+
+func tracefree(p unsafe.Pointer, size uintptr) {
+ lock(&tracelock)
+ gp := getg()
+ gp.m.traceback = 2
+ print("tracefree(", p, ", ", hex(size), ")\n")
+ goroutineheader(gp)
+ pc := getcallerpc()
+ sp := getcallersp()
+ systemstack(func() {
+ traceback(pc, sp, 0, gp)
+ })
+ print("\n")
+ gp.m.traceback = 0
+ unlock(&tracelock)
+}
+
+func tracegc() {
+ lock(&tracelock)
+ gp := getg()
+ gp.m.traceback = 2
+ print("tracegc()\n")
+ // running on m->g0 stack; show all non-g0 goroutines
+ tracebackothers(gp)
+ print("end tracegc\n")
+ print("\n")
+ gp.m.traceback = 0
+ unlock(&tracelock)
+}
diff --git a/src/runtime/mranges.go b/src/runtime/mranges.go
new file mode 100644
index 0000000..84a2c06
--- /dev/null
+++ b/src/runtime/mranges.go
@@ -0,0 +1,372 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Address range data structure.
+//
+// This file contains an implementation of a data structure which
+// manages ordered address ranges.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// addrRange represents a region of address space.
+//
+// An addrRange must never span a gap in the address space.
+type addrRange struct {
+ // base and limit together represent the region of address space
+ // [base, limit). That is, base is inclusive, limit is exclusive.
+ // These are addresses over an offset view of the address space on
+ // platforms with a segmented address space, that is, on platforms
+ // where arenaBaseOffset != 0.
+ base, limit offAddr
+}
+
+// makeAddrRange creates a new address range from two virtual addresses.
+//
+// Throws if the base and limit are not in the same memory segment.
+func makeAddrRange(base, limit uintptr) addrRange {
+ r := addrRange{offAddr{base}, offAddr{limit}}
+ if (base-arenaBaseOffset >= base) != (limit-arenaBaseOffset >= limit) {
+ throw("addr range base and limit are not in the same memory segment")
+ }
+ return r
+}
+
+// size returns the size of the range represented in bytes.
+func (a addrRange) size() uintptr {
+ if !a.base.lessThan(a.limit) {
+ return 0
+ }
+ // Subtraction is safe because limit and base must be in the same
+ // segment of the address space.
+ return a.limit.diff(a.base)
+}
+
+// contains returns whether or not the range contains a given address.
+func (a addrRange) contains(addr uintptr) bool {
+ return a.base.lessEqual(offAddr{addr}) && (offAddr{addr}).lessThan(a.limit)
+}
+
+// subtract cuts any overlap with b out of a and returns the resulting
+// range. It assumes that a and b either don't overlap at all, only
+// overlap on one side, or are equal. If b is strictly contained in a,
+// thus forcing a split, it will throw.
+func (a addrRange) subtract(b addrRange) addrRange {
+ if b.base.lessEqual(a.base) && a.limit.lessEqual(b.limit) {
+ return addrRange{}
+ } else if a.base.lessThan(b.base) && b.limit.lessThan(a.limit) {
+ throw("bad prune")
+ } else if b.limit.lessThan(a.limit) && a.base.lessThan(b.limit) {
+ a.base = b.limit
+ } else if a.base.lessThan(b.base) && b.base.lessThan(a.limit) {
+ a.limit = b.base
+ }
+ return a
+}
+
+// removeGreaterEqual removes all addresses in a greater than or equal
+// to addr and returns the new range.
+func (a addrRange) removeGreaterEqual(addr uintptr) addrRange {
+ if (offAddr{addr}).lessEqual(a.base) {
+ return addrRange{}
+ }
+ if a.limit.lessEqual(offAddr{addr}) {
+ return a
+ }
+ return makeAddrRange(a.base.addr(), addr)
+}
+
+var (
+ // minOffAddr is the minimum address in the offset space, and
+ // it corresponds to the virtual address arenaBaseOffset.
+ minOffAddr = offAddr{arenaBaseOffset}
+
+ // maxOffAddr is the maximum address in the offset address
+ // space. It corresponds to the highest virtual address representable
+ // by the page alloc chunk and heap arena maps.
+ maxOffAddr = offAddr{(((1 << heapAddrBits) - 1) + arenaBaseOffset) & uintptrMask}
+)
+
+// offAddr represents an address in a contiguous view
+// of the address space on systems where the address space is
+// segmented. On other systems, it's just a normal address.
+type offAddr struct {
+ // a is just the virtual address, but should never be used
+ // directly. Call addr() to get this value instead.
+ a uintptr
+}
+
+// add adds a uintptr offset to the offAddr.
+func (l offAddr) add(bytes uintptr) offAddr {
+ return offAddr{a: l.a + bytes}
+}
+
+// sub subtracts a uintptr offset from the offAddr.
+func (l offAddr) sub(bytes uintptr) offAddr {
+ return offAddr{a: l.a - bytes}
+}
+
+// diff returns the number of bytes between the two offAddrs.
+func (l1 offAddr) diff(l2 offAddr) uintptr {
+ return l1.a - l2.a
+}
+
+// lessThan returns true if l1 is less than l2 in the offset
+// address space.
+func (l1 offAddr) lessThan(l2 offAddr) bool {
+ return (l1.a - arenaBaseOffset) < (l2.a - arenaBaseOffset)
+}
+
+// lessEqual returns true if l1 is less than or equal to l2 in
+// the offset address space.
+func (l1 offAddr) lessEqual(l2 offAddr) bool {
+ return (l1.a - arenaBaseOffset) <= (l2.a - arenaBaseOffset)
+}
+
+// equal returns true if the two offAddr values are equal.
+func (l1 offAddr) equal(l2 offAddr) bool {
+ // No need to compare in the offset space, it
+ // means the same thing.
+ return l1 == l2
+}
+
+// addr returns the virtual address for this offset address.
+func (l offAddr) addr() uintptr {
+ return l.a
+}
+
+// addrRanges is a data structure holding a collection of ranges of
+// address space.
+//
+// The ranges are coalesced eagerly to reduce the
+// number of ranges it holds.
+//
+// The slice backing store for this field is persistentalloc'd
+// and thus there is no way to free it.
+//
+// addrRanges is not thread-safe.
+type addrRanges struct {
+ // ranges is a slice of ranges sorted by base.
+ ranges []addrRange
+
+ // totalBytes is the total amount of address space in bytes counted by
+ // this addrRanges.
+ totalBytes uintptr
+
+ // sysStat is the stat to track allocations by this type
+ sysStat *sysMemStat
+}
+
+func (a *addrRanges) init(sysStat *sysMemStat) {
+ ranges := (*notInHeapSlice)(unsafe.Pointer(&a.ranges))
+ ranges.len = 0
+ ranges.cap = 16
+ ranges.array = (*notInHeap)(persistentalloc(unsafe.Sizeof(addrRange{})*uintptr(ranges.cap), sys.PtrSize, sysStat))
+ a.sysStat = sysStat
+ a.totalBytes = 0
+}
+
+// findSucc returns the first index in a such that addr is
+// less than the base of the addrRange at that index.
+func (a *addrRanges) findSucc(addr uintptr) int {
+ base := offAddr{addr}
+
+ // Narrow down the search space via a binary search
+ // for large addrRanges until we have at most iterMax
+ // candidates left.
+ const iterMax = 8
+ bot, top := 0, len(a.ranges)
+ for top-bot > iterMax {
+ i := ((top - bot) / 2) + bot
+ if a.ranges[i].contains(base.addr()) {
+ // a.ranges[i] contains base, so
+ // its successor is the next index.
+ return i + 1
+ }
+ if base.lessThan(a.ranges[i].base) {
+ // In this case i might actually be
+ // the successor, but we can't be sure
+ // until we check the ones before it.
+ top = i
+ } else {
+ // In this case we know base is
+ // greater than or equal to a.ranges[i].limit-1,
+ // so i is definitely not the successor.
+ // We already checked i, so pick the next
+ // one.
+ bot = i + 1
+ }
+ }
+ // There are top-bot candidates left, so
+ // iterate over them and find the first that
+ // base is strictly less than.
+ for i := bot; i < top; i++ {
+ if base.lessThan(a.ranges[i].base) {
+ return i
+ }
+ }
+ return top
+}
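+
+// For example (mirroring the cases in mranges_test.go), with the ranges
+// [6, 10), [12, 16), [19, 22):
+//
+//	findSucc(3)  == 0 // before everything
+//	findSucc(9)  == 1 // inside the first range
+//	findSucc(11) == 1 // in the gap before [12, 16)
+//	findSucc(24) == 3 // past everything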
+
+// findAddrGreaterEqual returns the smallest address represented by a
+// that is >= addr. Thus, if the address is represented by a,
+// then it returns addr. The second return value indicates whether
+// such an address exists for addr in a. That is, if addr is larger than
+// any address known to a, the second return value will be false.
+func (a *addrRanges) findAddrGreaterEqual(addr uintptr) (uintptr, bool) {
+ i := a.findSucc(addr)
+ if i == 0 {
+ return a.ranges[0].base.addr(), true
+ }
+ if a.ranges[i-1].contains(addr) {
+ return addr, true
+ }
+ if i < len(a.ranges) {
+ return a.ranges[i].base.addr(), true
+ }
+ return 0, false
+}
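+
+// For example, with the ranges [8, 10) and [16, 20):
+//
+//	findAddrGreaterEqual(9)  == (9, true)  // 9 is already represented
+//	findAddrGreaterEqual(12) == (16, true) // next represented address
+//	findAddrGreaterEqual(25) == (0, false) // nothing at or above 25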
+
+// contains returns true if a covers the address addr.
+func (a *addrRanges) contains(addr uintptr) bool {
+ i := a.findSucc(addr)
+ if i == 0 {
+ return false
+ }
+ return a.ranges[i-1].contains(addr)
+}
+
+// add inserts a new address range to a.
+//
+// r must not overlap with any address range in a and r.size() must be > 0.
+func (a *addrRanges) add(r addrRange) {
+ // The copies in this function are potentially expensive, but this data
+ // structure is meant to represent the Go heap. At worst, copying this
+ // would take ~160µs assuming a conservative copying rate of 25 GiB/s (the
+ // copy will almost never trigger a page fault) for a 1 TiB heap with 4 MiB
+ // arenas which is completely discontiguous. ~160µs is still a lot, but in
+ // practice most platforms have 64 MiB arenas (which cuts this by a factor
+ // of 16) and Go heaps are usually mostly contiguous, so the chance that
+ // an addrRanges even grows to that size is extremely low.
+
+ // An empty range has no effect on the set of addresses represented
+ // by a, but passing a zero-sized range is almost always a bug.
+ if r.size() == 0 {
+ print("runtime: range = {", hex(r.base.addr()), ", ", hex(r.limit.addr()), "}\n")
+ throw("attempted to add zero-sized address range")
+ }
+ // Because we assume r is not currently represented in a,
+ // findSucc gives us our insertion index.
+ i := a.findSucc(r.base.addr())
+ coalescesDown := i > 0 && a.ranges[i-1].limit.equal(r.base)
+ coalescesUp := i < len(a.ranges) && r.limit.equal(a.ranges[i].base)
+ if coalescesUp && coalescesDown {
+ // We have neighbors and they both border us.
+ // Merge a.ranges[i-1], r, and a.ranges[i] together into a.ranges[i-1].
+ a.ranges[i-1].limit = a.ranges[i].limit
+
+ // Delete a.ranges[i].
+ copy(a.ranges[i:], a.ranges[i+1:])
+ a.ranges = a.ranges[:len(a.ranges)-1]
+ } else if coalescesDown {
+ // We have a neighbor at a lower address only and it borders us.
+ // Merge the new space into a.ranges[i-1].
+ a.ranges[i-1].limit = r.limit
+ } else if coalescesUp {
+ // We have a neighbor at a higher address only and it borders us.
+ // Merge the new space into a.ranges[i].
+ a.ranges[i].base = r.base
+ } else {
+ // We may or may not have neighbors which don't border us.
+ // Add the new range.
+ if len(a.ranges)+1 > cap(a.ranges) {
+ // Grow the array. Note that this leaks the old array, but since
+ // we're doubling we have at most 2x waste. For a 1 TiB heap and
+ // 4 MiB arenas which are all discontiguous (both very conservative
+ // assumptions), this would waste at most 4 MiB of memory.
+ oldRanges := a.ranges
+ ranges := (*notInHeapSlice)(unsafe.Pointer(&a.ranges))
+ ranges.len = len(oldRanges) + 1
+ ranges.cap = cap(oldRanges) * 2
+ ranges.array = (*notInHeap)(persistentalloc(unsafe.Sizeof(addrRange{})*uintptr(ranges.cap), sys.PtrSize, a.sysStat))
+
+ // Copy in the old array, but make space for the new range.
+ copy(a.ranges[:i], oldRanges[:i])
+ copy(a.ranges[i+1:], oldRanges[i:])
+ } else {
+ a.ranges = a.ranges[:len(a.ranges)+1]
+ copy(a.ranges[i+1:], a.ranges[i:])
+ }
+ a.ranges[i] = r
+ }
+ a.totalBytes += r.size()
+}
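+
+// For example: adding [512, 1024) and then [1024, 2048) leaves the single
+// range [512, 2048); adding [4096, 8192) after that leaves two ranges
+// because of the gap; and then adding [2048, 4096) merges everything back
+// into the single range [512, 8192). Similar cases are exercised by
+// TestAddrRangesAdd in mranges_test.go.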
+
+// removeLast removes and returns the highest-addressed contiguous range
+// of a, or the last nBytes of that range, whichever is smaller. If a is
+// empty, it returns an empty range.
+func (a *addrRanges) removeLast(nBytes uintptr) addrRange {
+ if len(a.ranges) == 0 {
+ return addrRange{}
+ }
+ r := a.ranges[len(a.ranges)-1]
+ size := r.size()
+ if size > nBytes {
+ newEnd := r.limit.sub(nBytes)
+ a.ranges[len(a.ranges)-1].limit = newEnd
+ a.totalBytes -= nBytes
+ return addrRange{newEnd, r.limit}
+ }
+ a.ranges = a.ranges[:len(a.ranges)-1]
+ a.totalBytes -= size
+ return r
+}
+
+// removeGreaterEqual removes the ranges of a which are above addr, and additionally
+// splits any range containing addr.
+func (a *addrRanges) removeGreaterEqual(addr uintptr) {
+ pivot := a.findSucc(addr)
+ if pivot == 0 {
+ // addr is before all ranges in a.
+ a.totalBytes = 0
+ a.ranges = a.ranges[:0]
+ return
+ }
+ removed := uintptr(0)
+ for _, r := range a.ranges[pivot:] {
+ removed += r.size()
+ }
+ if r := a.ranges[pivot-1]; r.contains(addr) {
+ removed += r.size()
+ r = r.removeGreaterEqual(addr)
+ if r.size() == 0 {
+ pivot--
+ } else {
+ removed -= r.size()
+ a.ranges[pivot-1] = r
+ }
+ }
+ a.ranges = a.ranges[:pivot]
+ a.totalBytes -= removed
+}
+
+// cloneInto makes a deep clone of a's state into b, re-using
+// b's ranges if able.
+func (a *addrRanges) cloneInto(b *addrRanges) {
+ if len(a.ranges) > cap(b.ranges) {
+ // Grow the array.
+ ranges := (*notInHeapSlice)(unsafe.Pointer(&b.ranges))
+ ranges.len = 0
+ ranges.cap = cap(a.ranges)
+ ranges.array = (*notInHeap)(persistentalloc(unsafe.Sizeof(addrRange{})*uintptr(ranges.cap), sys.PtrSize, b.sysStat))
+ }
+ b.ranges = b.ranges[:len(a.ranges)]
+ b.totalBytes = a.totalBytes
+ copy(b.ranges, a.ranges)
+}
diff --git a/src/runtime/mranges_test.go b/src/runtime/mranges_test.go
new file mode 100644
index 0000000..ed439c5
--- /dev/null
+++ b/src/runtime/mranges_test.go
@@ -0,0 +1,275 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ . "runtime"
+ "testing"
+)
+
+func validateAddrRanges(t *testing.T, a *AddrRanges, want ...AddrRange) {
+ ranges := a.Ranges()
+ if len(ranges) != len(want) {
+ t.Errorf("want %v, got %v", want, ranges)
+ t.Fatal("different lengths")
+ }
+ gotTotalBytes := uintptr(0)
+ wantTotalBytes := uintptr(0)
+ for i := range ranges {
+ gotTotalBytes += ranges[i].Size()
+ wantTotalBytes += want[i].Size()
+ if ranges[i].Base() >= ranges[i].Limit() {
+ t.Error("empty range found")
+ }
+ // Ensure this is equivalent to what we want.
+ if !ranges[i].Equals(want[i]) {
+ t.Errorf("range %d: got [0x%x, 0x%x), want [0x%x, 0x%x)", i,
+ ranges[i].Base(), ranges[i].Limit(),
+ want[i].Base(), want[i].Limit(),
+ )
+ }
+ if i != 0 {
+ // Ensure the ranges are sorted.
+ if ranges[i-1].Base() >= ranges[i].Base() {
+ t.Errorf("ranges %d and %d are out of sorted order", i-1, i)
+ }
+ // Check for a failure to coalesce.
+ if ranges[i-1].Limit() == ranges[i].Base() {
+ t.Errorf("ranges %d and %d should have coalesced", i-1, i)
+ }
+ // Check if any ranges overlap. Because the ranges are sorted
+ // by base, it's sufficient to just check neighbors.
+ if ranges[i-1].Limit() > ranges[i].Base() {
+ t.Errorf("ranges %d and %d overlap", i-1, i)
+ }
+ }
+ }
+ if wantTotalBytes != gotTotalBytes {
+ t.Errorf("expected %d total bytes, got %d", wantTotalBytes, gotTotalBytes)
+ }
+ if b := a.TotalBytes(); b != gotTotalBytes {
+ t.Errorf("inconsistent total bytes: want %d, got %d", gotTotalBytes, b)
+ }
+ if t.Failed() {
+ t.Errorf("addrRanges: %v", ranges)
+ t.Fatal("detected bad addrRanges")
+ }
+}
+
+func TestAddrRangesAdd(t *testing.T) {
+ a := NewAddrRanges()
+
+ // First range.
+ a.Add(MakeAddrRange(512, 1024))
+ validateAddrRanges(t, &a,
+ MakeAddrRange(512, 1024),
+ )
+
+ // Coalesce up.
+ a.Add(MakeAddrRange(1024, 2048))
+ validateAddrRanges(t, &a,
+ MakeAddrRange(512, 2048),
+ )
+
+ // Add new independent range.
+ a.Add(MakeAddrRange(4096, 8192))
+ validateAddrRanges(t, &a,
+ MakeAddrRange(512, 2048),
+ MakeAddrRange(4096, 8192),
+ )
+
+ // Coalesce down.
+ a.Add(MakeAddrRange(3776, 4096))
+ validateAddrRanges(t, &a,
+ MakeAddrRange(512, 2048),
+ MakeAddrRange(3776, 8192),
+ )
+
+ // Coalesce up and down.
+ a.Add(MakeAddrRange(2048, 3776))
+ validateAddrRanges(t, &a,
+ MakeAddrRange(512, 8192),
+ )
+
+ // Push a bunch of independent ranges to the end to try and force growth.
+ expectedRanges := []AddrRange{MakeAddrRange(512, 8192)}
+ for i := uintptr(0); i < 64; i++ {
+ dRange := MakeAddrRange(8192+(i+1)*2048, 8192+(i+1)*2048+10)
+ a.Add(dRange)
+ expectedRanges = append(expectedRanges, dRange)
+ validateAddrRanges(t, &a, expectedRanges...)
+ }
+
+ // Push a bunch of independent ranges to the beginning to try and force growth.
+ var bottomRanges []AddrRange
+ for i := uintptr(0); i < 63; i++ {
+ dRange := MakeAddrRange(8+i*8, 8+i*8+4)
+ a.Add(dRange)
+ bottomRanges = append(bottomRanges, dRange)
+ validateAddrRanges(t, &a, append(bottomRanges, expectedRanges...)...)
+ }
+}
+
+func TestAddrRangesFindSucc(t *testing.T) {
+ var large []AddrRange
+ for i := 0; i < 100; i++ {
+ large = append(large, MakeAddrRange(5+uintptr(i)*5, 5+uintptr(i)*5+3))
+ }
+
+ type testt struct {
+ name string
+ base uintptr
+ expect int
+ ranges []AddrRange
+ }
+ tests := []testt{
+ {
+ name: "Empty",
+ base: 12,
+ expect: 0,
+ ranges: []AddrRange{},
+ },
+ {
+ name: "OneBefore",
+ base: 12,
+ expect: 0,
+ ranges: []AddrRange{
+ MakeAddrRange(14, 16),
+ },
+ },
+ {
+ name: "OneWithin",
+ base: 14,
+ expect: 1,
+ ranges: []AddrRange{
+ MakeAddrRange(14, 16),
+ },
+ },
+ {
+ name: "OneAfterLimit",
+ base: 16,
+ expect: 1,
+ ranges: []AddrRange{
+ MakeAddrRange(14, 16),
+ },
+ },
+ {
+ name: "OneAfter",
+ base: 17,
+ expect: 1,
+ ranges: []AddrRange{
+ MakeAddrRange(14, 16),
+ },
+ },
+ {
+ name: "ThreeBefore",
+ base: 3,
+ expect: 0,
+ ranges: []AddrRange{
+ MakeAddrRange(6, 10),
+ MakeAddrRange(12, 16),
+ MakeAddrRange(19, 22),
+ },
+ },
+ {
+ name: "ThreeAfter",
+ base: 24,
+ expect: 3,
+ ranges: []AddrRange{
+ MakeAddrRange(6, 10),
+ MakeAddrRange(12, 16),
+ MakeAddrRange(19, 22),
+ },
+ },
+ {
+ name: "ThreeBetween",
+ base: 11,
+ expect: 1,
+ ranges: []AddrRange{
+ MakeAddrRange(6, 10),
+ MakeAddrRange(12, 16),
+ MakeAddrRange(19, 22),
+ },
+ },
+ {
+ name: "ThreeWithin",
+ base: 9,
+ expect: 1,
+ ranges: []AddrRange{
+ MakeAddrRange(6, 10),
+ MakeAddrRange(12, 16),
+ MakeAddrRange(19, 22),
+ },
+ },
+ {
+ name: "Zero",
+ base: 0,
+ expect: 1,
+ ranges: []AddrRange{
+ MakeAddrRange(0, 10),
+ },
+ },
+ {
+ name: "Max",
+ base: ^uintptr(0),
+ expect: 1,
+ ranges: []AddrRange{
+ MakeAddrRange(^uintptr(0)-5, ^uintptr(0)),
+ },
+ },
+ {
+ name: "LargeBefore",
+ base: 2,
+ expect: 0,
+ ranges: large,
+ },
+ {
+ name: "LargeAfter",
+ base: 5 + uintptr(len(large))*5 + 30,
+ expect: len(large),
+ ranges: large,
+ },
+ {
+ name: "LargeBetweenLow",
+ base: 14,
+ expect: 2,
+ ranges: large,
+ },
+ {
+ name: "LargeBetweenHigh",
+ base: 249,
+ expect: 49,
+ ranges: large,
+ },
+ {
+ name: "LargeWithinLow",
+ base: 25,
+ expect: 5,
+ ranges: large,
+ },
+ {
+ name: "LargeWithinHigh",
+ base: 396,
+ expect: 79,
+ ranges: large,
+ },
+ {
+ name: "LargeWithinMiddle",
+ base: 250,
+ expect: 50,
+ ranges: large,
+ },
+ }
+
+ for _, test := range tests {
+ t.Run(test.name, func(t *testing.T) {
+ a := MakeAddrRanges(test.ranges...)
+ i := a.FindSucc(test.base)
+ if i != test.expect {
+ t.Fatalf("expected %d, got %d", test.expect, i)
+ }
+ })
+ }
+}
diff --git a/src/runtime/msan.go b/src/runtime/msan.go
new file mode 100644
index 0000000..6a5960b
--- /dev/null
+++ b/src/runtime/msan.go
@@ -0,0 +1,61 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build msan
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+// Public memory sanitizer API.
+
+func MSanRead(addr unsafe.Pointer, len int) {
+ msanread(addr, uintptr(len))
+}
+
+func MSanWrite(addr unsafe.Pointer, len int) {
+ msanwrite(addr, uintptr(len))
+}
+
+// Private interface for the runtime.
+const msanenabled = true
+
+// If we are running on the system stack, the C program may have
+// marked part of that stack as uninitialized. We don't instrument
+// the runtime, but operations like a slice copy can call msanread
+// anyhow for values on the stack. Just ignore msanread when running
+// on the system stack. The other msan functions are fine.
+//
+//go:nosplit
+func msanread(addr unsafe.Pointer, sz uintptr) {
+ g := getg()
+ if g == nil || g.m == nil || g == g.m.g0 || g == g.m.gsignal {
+ return
+ }
+ domsanread(addr, sz)
+}
+
+//go:noescape
+func domsanread(addr unsafe.Pointer, sz uintptr)
+
+//go:noescape
+func msanwrite(addr unsafe.Pointer, sz uintptr)
+
+//go:noescape
+func msanmalloc(addr unsafe.Pointer, sz uintptr)
+
+//go:noescape
+func msanfree(addr unsafe.Pointer, sz uintptr)
+
+//go:noescape
+func msanmove(dst, src unsafe.Pointer, sz uintptr)
+
+// These are called from msan_GOARCH.s
+//go:cgo_import_static __msan_read_go
+//go:cgo_import_static __msan_write_go
+//go:cgo_import_static __msan_malloc_go
+//go:cgo_import_static __msan_free_go
+//go:cgo_import_static __msan_memmove
diff --git a/src/runtime/msan/msan.go b/src/runtime/msan/msan.go
new file mode 100644
index 0000000..c81577d
--- /dev/null
+++ b/src/runtime/msan/msan.go
@@ -0,0 +1,33 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build msan,linux
+// +build amd64 arm64
+
+package msan
+
+/*
+#cgo CFLAGS: -fsanitize=memory
+#cgo LDFLAGS: -fsanitize=memory
+
+#include <stdint.h>
+#include <sanitizer/msan_interface.h>
+
+void __msan_read_go(void *addr, uintptr_t sz) {
+ __msan_check_mem_is_initialized(addr, sz);
+}
+
+void __msan_write_go(void *addr, uintptr_t sz) {
+ __msan_unpoison(addr, sz);
+}
+
+void __msan_malloc_go(void *addr, uintptr_t sz) {
+ __msan_unpoison(addr, sz);
+}
+
+void __msan_free_go(void *addr, uintptr_t sz) {
+ __msan_poison(addr, sz);
+}
+*/
+import "C"
diff --git a/src/runtime/msan0.go b/src/runtime/msan0.go
new file mode 100644
index 0000000..374d13f
--- /dev/null
+++ b/src/runtime/msan0.go
@@ -0,0 +1,23 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !msan
+
+// Dummy MSan support API, used when not built with -msan.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+const msanenabled = false
+
+// Because msanenabled is false, none of these functions should be called.
+
+func msanread(addr unsafe.Pointer, sz uintptr) { throw("msan") }
+func msanwrite(addr unsafe.Pointer, sz uintptr) { throw("msan") }
+func msanmalloc(addr unsafe.Pointer, sz uintptr) { throw("msan") }
+func msanfree(addr unsafe.Pointer, sz uintptr) { throw("msan") }
+func msanmove(dst, src unsafe.Pointer, sz uintptr) { throw("msan") }
diff --git a/src/runtime/msan_amd64.s b/src/runtime/msan_amd64.s
new file mode 100644
index 0000000..669e9ca
--- /dev/null
+++ b/src/runtime/msan_amd64.s
@@ -0,0 +1,89 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build msan
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// This is like race_amd64.s, but for the msan calls.
+// See race_amd64.s for detailed comments.
+
+#ifdef GOOS_windows
+#define RARG0 CX
+#define RARG1 DX
+#define RARG2 R8
+#define RARG3 R9
+#else
+#define RARG0 DI
+#define RARG1 SI
+#define RARG2 DX
+#define RARG3 CX
+#endif
+
+// func runtime·domsanread(addr unsafe.Pointer, sz uintptr)
+// Called from msanread.
+TEXT runtime·domsanread(SB), NOSPLIT, $0-16
+ MOVQ addr+0(FP), RARG0
+ MOVQ size+8(FP), RARG1
+ // void __msan_read_go(void *addr, uintptr_t sz);
+ MOVQ $__msan_read_go(SB), AX
+ JMP msancall<>(SB)
+
+// func runtime·msanwrite(addr unsafe.Pointer, sz uintptr)
+// Called from instrumented code.
+TEXT runtime·msanwrite(SB), NOSPLIT, $0-16
+ MOVQ addr+0(FP), RARG0
+ MOVQ size+8(FP), RARG1
+ // void __msan_write_go(void *addr, uintptr_t sz);
+ MOVQ $__msan_write_go(SB), AX
+ JMP msancall<>(SB)
+
+// func runtime·msanmalloc(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·msanmalloc(SB), NOSPLIT, $0-16
+ MOVQ addr+0(FP), RARG0
+ MOVQ size+8(FP), RARG1
+ // void __msan_malloc_go(void *addr, uintptr_t sz);
+ MOVQ $__msan_malloc_go(SB), AX
+ JMP msancall<>(SB)
+
+// func runtime·msanfree(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·msanfree(SB), NOSPLIT, $0-16
+ MOVQ addr+0(FP), RARG0
+ MOVQ size+8(FP), RARG1
+ // void __msan_free_go(void *addr, uintptr_t sz);
+ MOVQ $__msan_free_go(SB), AX
+ JMP msancall<>(SB)
+
+// func runtime·msanmove(dst, src unsafe.Pointer, sz uintptr)
+TEXT runtime·msanmove(SB), NOSPLIT, $0-24
+ MOVQ dst+0(FP), RARG0
+ MOVQ src+8(FP), RARG1
+ MOVQ size+16(FP), RARG2
+ // void __msan_memmove(void *dst, void *src, uintptr_t sz);
+ MOVQ $__msan_memmove(SB), AX
+ JMP msancall<>(SB)
+
+// Switches SP to g0 stack and calls (AX). Arguments already set.
+TEXT msancall<>(SB), NOSPLIT, $0-0
+ get_tls(R12)
+ MOVQ g(R12), R14
+ MOVQ SP, R12 // callee-saved, preserved across the CALL
+ CMPQ R14, $0
+ JE call // no g; still on a system stack
+
+ MOVQ g_m(R14), R13
+ // Switch to g0 stack.
+ MOVQ m_g0(R13), R10
+ CMPQ R10, R14
+ JE call // already on g0
+
+ MOVQ (g_sched+gobuf_sp)(R10), SP
+call:
+ ANDQ $~15, SP // alignment for gcc ABI
+ CALL AX
+ MOVQ R12, SP
+ RET
diff --git a/src/runtime/msan_arm64.s b/src/runtime/msan_arm64.s
new file mode 100644
index 0000000..f19906c
--- /dev/null
+++ b/src/runtime/msan_arm64.s
@@ -0,0 +1,73 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build msan
+
+#include "go_asm.h"
+#include "textflag.h"
+
+#define RARG0 R0
+#define RARG1 R1
+#define RARG2 R2
+#define FARG R3
+
+// func runtime·domsanread(addr unsafe.Pointer, sz uintptr)
+// Called from msanread.
+TEXT runtime·domsanread(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), RARG0
+ MOVD size+8(FP), RARG1
+ // void __msan_read_go(void *addr, uintptr_t sz);
+ MOVD $__msan_read_go(SB), FARG
+ JMP msancall<>(SB)
+
+// func runtime·msanwrite(addr unsafe.Pointer, sz uintptr)
+// Called from instrumented code.
+TEXT runtime·msanwrite(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), RARG0
+ MOVD size+8(FP), RARG1
+ // void __msan_write_go(void *addr, uintptr_t sz);
+ MOVD $__msan_write_go(SB), FARG
+ JMP msancall<>(SB)
+
+// func runtime·msanmalloc(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·msanmalloc(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), RARG0
+ MOVD size+8(FP), RARG1
+ // void __msan_malloc_go(void *addr, uintptr_t sz);
+ MOVD $__msan_malloc_go(SB), FARG
+ JMP msancall<>(SB)
+
+// func runtime·msanfree(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·msanfree(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), RARG0
+ MOVD size+8(FP), RARG1
+ // void __msan_free_go(void *addr, uintptr_t sz);
+ MOVD $__msan_free_go(SB), FARG
+ JMP msancall<>(SB)
+
+// func runtime·msanmove(dst, src unsafe.Pointer, sz uintptr)
+TEXT runtime·msanmove(SB), NOSPLIT, $0-24
+ MOVD dst+0(FP), RARG0
+ MOVD src+8(FP), RARG1
+ MOVD size+16(FP), RARG2
+ // void __msan_memmove(void *dst, void *src, uintptr_t sz);
+ MOVD $__msan_memmove(SB), FARG
+ JMP msancall<>(SB)
+
+// Switches SP to g0 stack and calls (FARG). Arguments already set.
+TEXT msancall<>(SB), NOSPLIT, $0-0
+ MOVD RSP, R19 // callee-saved
+ CBZ g, g0stack // no g, still on a system stack
+ MOVD g_m(g), R10
+ MOVD m_g0(R10), R11
+ CMP R11, g
+ BEQ g0stack
+
+ MOVD (g_sched+gobuf_sp)(R11), R4
+ MOVD R4, RSP
+
+g0stack:
+ BL (FARG)
+ MOVD R19, RSP
+ RET
diff --git a/src/runtime/msize.go b/src/runtime/msize.go
new file mode 100644
index 0000000..c56aa5a
--- /dev/null
+++ b/src/runtime/msize.go
@@ -0,0 +1,25 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Malloc small size classes.
+//
+// See malloc.go for overview.
+// See also mksizeclasses.go for how we decide what size classes to use.
+
+package runtime
+
+// roundupsize returns the size of the memory block that mallocgc will
+// allocate if you ask for size bytes.
+func roundupsize(size uintptr) uintptr {
+ if size < _MaxSmallSize {
+ if size <= smallSizeMax-8 {
+ return uintptr(class_to_size[size_to_class8[divRoundUp(size, smallSizeDiv)]])
+ } else {
+ return uintptr(class_to_size[size_to_class128[divRoundUp(size-smallSizeMax, largeSizeDiv)]])
+ }
+ }
+ if size+_PageSize < size {
+ return size
+ }
+ return alignUp(size, _PageSize)
+}
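+
+// For example, with the current size class tables and an 8 KiB page:
+//
+//	roundupsize(17)    == 24    // next small size class
+//	roundupsize(33000) == 40960 // large sizes round up to a page multiple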
diff --git a/src/runtime/mspanset.go b/src/runtime/mspanset.go
new file mode 100644
index 0000000..10d2596
--- /dev/null
+++ b/src/runtime/mspanset.go
@@ -0,0 +1,354 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/cpu"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// A spanSet is a set of *mspans.
+//
+// spanSet is safe for concurrent push and pop operations.
+type spanSet struct {
+ // A spanSet is a two-level data structure consisting of a
+ // growable spine that points to fixed-sized blocks. The spine
+ // can be accessed without locks, but adding a block or
+ // growing it requires taking the spine lock.
+ //
+ // Because each mspan covers at least 8K of heap and takes at
+ // most 8 bytes in the spanSet, the growth of the spine is
+ // quite limited.
+ //
+ // The spine and all blocks are allocated off-heap, which
+ // allows this to be used in the memory manager and avoids the
+ // need for write barriers on all of these. spanSetBlocks are
+ // managed in a pool, though never freed back to the operating
+ // system. We never release spine memory because there could be
+ // concurrent lock-free access and we're likely to reuse it
+ // anyway. (In principle, we could do this during STW.)
+
+ spineLock mutex
+ spine unsafe.Pointer // *[N]*spanSetBlock, accessed atomically
+ spineLen uintptr // Spine array length, accessed atomically
+ spineCap uintptr // Spine array cap, accessed under lock
+
+ // index is the head and tail of the spanSet in a single field.
+ // The head and the tail both represent an index into the logical
+ // concatenation of all blocks, with the head always behind or
+ // equal to the tail (head == tail indicates an empty set). This field is
+ // always accessed atomically.
+ //
+ // The head and the tail are only 32 bits wide, which means we
+ // can only support up to 2^32 pushes before a reset. If every
+ // span in the heap were stored in this set, and each span were
+ // the minimum size (1 runtime page, 8 KiB), then roughly the
+ // smallest heap which would be unrepresentable is 32 TiB in size.
+ index headTailIndex
+}
+
+const (
+ spanSetBlockEntries = 512 // 4KB on 64-bit
+ spanSetInitSpineCap = 256 // Enough for 1GB heap on 64-bit
+)
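+
+// For example, with spanSetBlockEntries = 512, the span at logical index
+// 1000 lives in spine entry 1000/512 = 1, at slot 1000%512 = 488 of that
+// block; push and pop below derive their (top, bottom) pairs this way.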
+
+type spanSetBlock struct {
+ // Free spanSetBlocks are managed via a lock-free stack.
+ lfnode
+
+ // popped is the number of pop operations that have occurred on
+ // this block. This number is used to help determine when a block
+ // may be safely recycled.
+ popped uint32
+
+ // spans is the set of spans in this block.
+ spans [spanSetBlockEntries]*mspan
+}
+
+// push adds span s to buffer b. push is safe to call concurrently
+// with other push and pop operations.
+func (b *spanSet) push(s *mspan) {
+ // Obtain our slot.
+ cursor := uintptr(b.index.incTail().tail() - 1)
+ top, bottom := cursor/spanSetBlockEntries, cursor%spanSetBlockEntries
+
+ // Do we need to add a block?
+ spineLen := atomic.Loaduintptr(&b.spineLen)
+ var block *spanSetBlock
+retry:
+ if top < spineLen {
+ spine := atomic.Loadp(unsafe.Pointer(&b.spine))
+ blockp := add(spine, sys.PtrSize*top)
+ block = (*spanSetBlock)(atomic.Loadp(blockp))
+ } else {
+ // Add a new block to the spine, potentially growing
+ // the spine.
+ lock(&b.spineLock)
+ // spineLen cannot change until we release the lock,
+ // but may have changed while we were waiting.
+ spineLen = atomic.Loaduintptr(&b.spineLen)
+ if top < spineLen {
+ unlock(&b.spineLock)
+ goto retry
+ }
+
+ if spineLen == b.spineCap {
+ // Grow the spine.
+ newCap := b.spineCap * 2
+ if newCap == 0 {
+ newCap = spanSetInitSpineCap
+ }
+ newSpine := persistentalloc(newCap*sys.PtrSize, cpu.CacheLineSize, &memstats.gcMiscSys)
+ if b.spineCap != 0 {
+ // Blocks are allocated off-heap, so
+ // no write barriers.
+ memmove(newSpine, b.spine, b.spineCap*sys.PtrSize)
+ }
+ // Spine is allocated off-heap, so no write barrier.
+ atomic.StorepNoWB(unsafe.Pointer(&b.spine), newSpine)
+ b.spineCap = newCap
+ // We can't immediately free the old spine
+ // since a concurrent push with a lower index
+ // could still be reading from it. We let it
+ // leak because even a 1TB heap would waste
+ // less than 2MB of memory on old spines. If
+ // this is a problem, we could free old spines
+ // during STW.
+ }
+
+ // Allocate a new block from the pool.
+ block = spanSetBlockPool.alloc()
+
+ // Add it to the spine.
+ blockp := add(b.spine, sys.PtrSize*top)
+ // Blocks are allocated off-heap, so no write barrier.
+ atomic.StorepNoWB(blockp, unsafe.Pointer(block))
+ atomic.Storeuintptr(&b.spineLen, spineLen+1)
+ unlock(&b.spineLock)
+ }
+
+ // We have a block. Insert the span atomically, since there may be
+ // concurrent readers via the block API.
+ atomic.StorepNoWB(unsafe.Pointer(&block.spans[bottom]), unsafe.Pointer(s))
+}
+
+// pop removes and returns a span from buffer b, or nil if b is empty.
+// pop is safe to call concurrently with other pop and push operations.
+func (b *spanSet) pop() *mspan {
+ var head, tail uint32
+claimLoop:
+ for {
+ headtail := b.index.load()
+ head, tail = headtail.split()
+ if head >= tail {
+ // The buf is empty, as far as we can tell.
+ return nil
+ }
+ // Check if the head position we want to claim is actually
+ // backed by a block.
+ spineLen := atomic.Loaduintptr(&b.spineLen)
+ if spineLen <= uintptr(head)/spanSetBlockEntries {
+ // We're racing with a spine growth and the allocation of
+ // a new block (and maybe a new spine!), and trying to grab
+ // the span at the index which is currently being pushed.
+ // Instead of spinning, let's just notify the caller that
+ // there's nothing currently here. Spinning on this is
+ // almost definitely not worth it.
+ return nil
+ }
+ // Try to claim the current head by CASing in an updated head.
+ // This may fail transiently due to a push which modifies the
+ // tail, so keep trying while the head isn't changing.
+ want := head
+ for want == head {
+ if b.index.cas(headtail, makeHeadTailIndex(want+1, tail)) {
+ break claimLoop
+ }
+ headtail = b.index.load()
+ head, tail = headtail.split()
+ }
+ // We failed to claim the spot we were after and the head changed,
+ // meaning a popper got ahead of us. Try again from the top because
+ // the buf may not be empty.
+ }
+ top, bottom := head/spanSetBlockEntries, head%spanSetBlockEntries
+
+ // We may be reading a stale spine pointer, but because the length
+ // grows monotonically and we've already verified it, we'll definitely
+ // be reading from a valid block.
+ spine := atomic.Loadp(unsafe.Pointer(&b.spine))
+ blockp := add(spine, sys.PtrSize*uintptr(top))
+
+ // Given that the spine length is correct, we know we will never
+ // see a nil block here, since the length is always updated after
+ // the block is set.
+ block := (*spanSetBlock)(atomic.Loadp(blockp))
+ s := (*mspan)(atomic.Loadp(unsafe.Pointer(&block.spans[bottom])))
+ for s == nil {
+ // We raced with the span actually being set, but given that we
+ // know a block for this span exists, the race window here is
+ // extremely small. Try again.
+ s = (*mspan)(atomic.Loadp(unsafe.Pointer(&block.spans[bottom])))
+ }
+ // Clear the pointer. This isn't strictly necessary, but defensively
+ // avoids accidentally re-using blocks which could lead to memory
+ // corruption. This way, we'll get a nil pointer access instead.
+ atomic.StorepNoWB(unsafe.Pointer(&block.spans[bottom]), nil)
+
+ // Increase the popped count. If we are the last possible popper
+ // in the block (note that bottom need not equal spanSetBlockEntries-1
+ // due to races) then it's our responsibility to free the block.
+ //
+ // If we increment popped to spanSetBlockEntries, we can be sure that
+ // we're the last popper for this block, and it's thus safe to free it.
+ // Every other popper must have crossed this barrier (and thus finished
+ // popping its corresponding mspan) by the time we get here. Because
+ // we're the last popper, we also don't have to worry about concurrent
+ // pushers (there can't be any). Note that we may not be the popper
+ // which claimed the last slot in the block, we're just the last one
+ // to finish popping.
+ if atomic.Xadd(&block.popped, 1) == spanSetBlockEntries {
+ // Clear the block's pointer.
+ atomic.StorepNoWB(blockp, nil)
+
+ // Return the block to the block pool.
+ spanSetBlockPool.free(block)
+ }
+ return s
+}
+
+// reset resets a spanSet which is empty. It will also clean up
+// any leftover blocks.
+//
+// Throws if the buf is not empty.
+//
+// reset may not be called concurrently with any other operations
+// on the span set.
+func (b *spanSet) reset() {
+ head, tail := b.index.load().split()
+ if head < tail {
+ print("head = ", head, ", tail = ", tail, "\n")
+ throw("attempt to clear non-empty span set")
+ }
+ top := head / spanSetBlockEntries
+ if uintptr(top) < b.spineLen {
+ // If the head catches up to the tail and the set is empty,
+ // we may not clean up the block containing the head and tail
+ // since it may be pushed into again. In order to avoid leaking
+ // memory since we're going to reset the head and tail, clean
+ // up such a block now, if it exists.
+ blockp := (**spanSetBlock)(add(b.spine, sys.PtrSize*uintptr(top)))
+ block := *blockp
+ if block != nil {
+ // Sanity check the popped value.
+ if block.popped == 0 {
+				// popped should never be zero here: a non-nil block pointer
+				// means values were pushed into this block, and because the
+				// set is empty they must all have been popped by now.
+ throw("span set block with unpopped elements found in reset")
+ }
+ if block.popped == spanSetBlockEntries {
+ // popped should also never be equal to spanSetBlockEntries
+ // because the last popper should have made the block pointer
+ // in this slot nil.
+ throw("fully empty unfreed span set block found in reset")
+ }
+
+ // Clear the pointer to the block.
+ atomic.StorepNoWB(unsafe.Pointer(blockp), nil)
+
+ // Return the block to the block pool.
+ spanSetBlockPool.free(block)
+ }
+ }
+ b.index.reset()
+ atomic.Storeuintptr(&b.spineLen, 0)
+}
+
+// spanSetBlockPool is a global pool of spanSetBlocks.
+var spanSetBlockPool spanSetBlockAlloc
+
+// spanSetBlockAlloc represents a concurrent pool of spanSetBlocks.
+type spanSetBlockAlloc struct {
+ stack lfstack
+}
+
+// alloc tries to grab a spanSetBlock out of the pool, and if it fails
+// persistentallocs a new one and returns it.
+func (p *spanSetBlockAlloc) alloc() *spanSetBlock {
+ if s := (*spanSetBlock)(p.stack.pop()); s != nil {
+ return s
+ }
+ return (*spanSetBlock)(persistentalloc(unsafe.Sizeof(spanSetBlock{}), cpu.CacheLineSize, &memstats.gcMiscSys))
+}
+
+// free returns a spanSetBlock back to the pool.
+func (p *spanSetBlockAlloc) free(block *spanSetBlock) {
+ atomic.Store(&block.popped, 0)
+ p.stack.push(&block.lfnode)
+}
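+
+// Typical use of the pool (illustrative sketch, not part of the original
+// change):
+//
+//	block := spanSetBlockPool.alloc() // reuse a freed block or persistentalloc a new one
+//	// ... publish the block and fill block.spans ...
+//	spanSetBlockPool.free(block)      // resets popped and recycles the block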
+
+// headTailIndex combines a 32-bit head and a 32-bit tail of a queue
+// into a single 64-bit value.
+type headTailIndex uint64
+
+// makeHeadTailIndex creates a headTailIndex value from a separate
+// head and tail.
+func makeHeadTailIndex(head, tail uint32) headTailIndex {
+ return headTailIndex(uint64(head)<<32 | uint64(tail))
+}
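+
+// For illustration (not part of the original change):
+//
+//	ht := makeHeadTailIndex(5, 9) // 0x0000000500000009
+//	ht.head()                     // 5
+//	ht.tail()                     // 9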
+
+// head returns the head of a headTailIndex value.
+func (h headTailIndex) head() uint32 {
+ return uint32(h >> 32)
+}
+
+// tail returns the tail of a headTailIndex value.
+func (h headTailIndex) tail() uint32 {
+ return uint32(h)
+}
+
+// split splits the headTailIndex value into its parts.
+func (h headTailIndex) split() (head uint32, tail uint32) {
+ return h.head(), h.tail()
+}
+
+// load atomically reads a headTailIndex value.
+func (h *headTailIndex) load() headTailIndex {
+ return headTailIndex(atomic.Load64((*uint64)(h)))
+}
+
+// cas atomically compares-and-swaps a headTailIndex value.
+func (h *headTailIndex) cas(old, new headTailIndex) bool {
+ return atomic.Cas64((*uint64)(h), uint64(old), uint64(new))
+}
+
+// incHead atomically increments the head of a headTailIndex.
+func (h *headTailIndex) incHead() headTailIndex {
+ return headTailIndex(atomic.Xadd64((*uint64)(h), (1 << 32)))
+}
+
+// decHead atomically decrements the head of a headTailIndex.
+func (h *headTailIndex) decHead() headTailIndex {
+ return headTailIndex(atomic.Xadd64((*uint64)(h), -(1 << 32)))
+}
+
+// incTail atomically increments the tail of a headTailIndex.
+func (h *headTailIndex) incTail() headTailIndex {
+ ht := headTailIndex(atomic.Xadd64((*uint64)(h), +1))
+ // Check for overflow.
+ if ht.tail() == 0 {
+ print("runtime: head = ", ht.head(), ", tail = ", ht.tail(), "\n")
+ throw("headTailIndex overflow")
+ }
+ return ht
+}
+
+// reset clears the headTailIndex to (0, 0).
+func (h *headTailIndex) reset() {
+ atomic.Store64((*uint64)(h), 0)
+}
diff --git a/src/runtime/mstats.go b/src/runtime/mstats.go
new file mode 100644
index 0000000..6defaed
--- /dev/null
+++ b/src/runtime/mstats.go
@@ -0,0 +1,980 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Memory statistics
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// Statistics.
+//
+// For detailed descriptions see the documentation for MemStats.
+// Fields that differ from MemStats are further documented here.
+//
+// Many of these fields are updated on the fly, while others are only
+// updated when updatememstats is called.
+type mstats struct {
+ // General statistics.
+ alloc uint64 // bytes allocated and not yet freed
+ total_alloc uint64 // bytes allocated (even if freed)
+ sys uint64 // bytes obtained from system (should be sum of xxx_sys below, no locking, approximate)
+ nlookup uint64 // number of pointer lookups (unused)
+ nmalloc uint64 // number of mallocs
+ nfree uint64 // number of frees
+
+ // Statistics about malloc heap.
+ // Updated atomically, or with the world stopped.
+ //
+ // Like MemStats, heap_sys and heap_inuse do not count memory
+ // in manually-managed spans.
+ heap_sys sysMemStat // virtual address space obtained from system for GC'd heap
+ heap_inuse uint64 // bytes in mSpanInUse spans
+ heap_released uint64 // bytes released to the os
+
+	// heap_objects is not used by the runtime directly and is instead
+	// computed on the fly by updatememstats.
+ heap_objects uint64 // total number of allocated objects
+
+ // Statistics about stacks.
+ stacks_inuse uint64 // bytes in manually-managed stack spans; computed by updatememstats
+ stacks_sys sysMemStat // only counts newosproc0 stack in mstats; differs from MemStats.StackSys
+
+ // Statistics about allocation of low-level fixed-size structures.
+ // Protected by FixAlloc locks.
+ mspan_inuse uint64 // mspan structures
+ mspan_sys sysMemStat
+ mcache_inuse uint64 // mcache structures
+ mcache_sys sysMemStat
+ buckhash_sys sysMemStat // profiling bucket hash table
+
+ // Statistics about GC overhead.
+ gcWorkBufInUse uint64 // computed by updatememstats
+ gcProgPtrScalarBitsInUse uint64 // computed by updatememstats
+ gcMiscSys sysMemStat // updated atomically or during STW
+
+ // Miscellaneous statistics.
+ other_sys sysMemStat // updated atomically or during STW
+
+ // Statistics about the garbage collector.
+
+	// next_gc is the goal heap_live for when the next GC ends.
+ // Set to ^uint64(0) if disabled.
+ //
+ // Read and written atomically, unless the world is stopped.
+ next_gc uint64
+
+ // Protected by mheap or stopping the world during GC.
+ last_gc_unix uint64 // last gc (in unix time)
+ pause_total_ns uint64
+ pause_ns [256]uint64 // circular buffer of recent gc pause lengths
+ pause_end [256]uint64 // circular buffer of recent gc end times (nanoseconds since 1970)
+ numgc uint32
+ numforcedgc uint32 // number of user-forced GCs
+ gc_cpu_fraction float64 // fraction of CPU time used by GC
+ enablegc bool
+ debuggc bool
+
+ // Statistics about allocation size classes.
+
+ by_size [_NumSizeClasses]struct {
+ size uint32
+ nmalloc uint64
+ nfree uint64
+ }
+
+	// Add a uint32 for an even number of size classes to align the fields
+	// below to 64 bits for atomic operations on 32-bit platforms.
+ _ [1 - _NumSizeClasses%2]uint32
+
+ last_gc_nanotime uint64 // last gc (monotonic time)
+ tinyallocs uint64 // number of tiny allocations that didn't cause actual allocation; not exported to go directly
+ last_next_gc uint64 // next_gc for the previous GC
+ last_heap_inuse uint64 // heap_inuse at mark termination of the previous GC
+
+ // triggerRatio is the heap growth ratio that triggers marking.
+ //
+ // E.g., if this is 0.6, then GC should start when the live
+ // heap has reached 1.6 times the heap size marked by the
+ // previous cycle. This should be ≤ GOGC/100 so the trigger
+ // heap size is less than the goal heap size. This is set
+ // during mark termination for the next cycle's trigger.
+ triggerRatio float64
+
+ // gc_trigger is the heap size that triggers marking.
+ //
+ // When heap_live ≥ gc_trigger, the mark phase will start.
+ // This is also the heap size by which proportional sweeping
+ // must be complete.
+ //
+ // This is computed from triggerRatio during mark termination
+ // for the next cycle's trigger.
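+	//
+	// For instance (illustrative numbers): with heap_marked = 100 MiB and
+	// triggerRatio = 0.6, gc_trigger works out to roughly 160 MiB, so marking
+	// starts once heap_live reaches about that size.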
+ gc_trigger uint64
+
+ // heap_live is the number of bytes considered live by the GC.
+ // That is: retained by the most recent GC plus allocated
+ // since then. heap_live <= alloc, since alloc includes unmarked
+ // objects that have not yet been swept (and hence goes up as we
+ // allocate and down as we sweep) while heap_live excludes these
+ // objects (and hence only goes up between GCs).
+ //
+ // This is updated atomically without locking. To reduce
+ // contention, this is updated only when obtaining a span from
+ // an mcentral and at this point it counts all of the
+ // unallocated slots in that span (which will be allocated
+ // before that mcache obtains another span from that
+ // mcentral). Hence, it slightly overestimates the "true" live
+ // heap size. It's better to overestimate than to
+ // underestimate because 1) this triggers the GC earlier than
+ // necessary rather than potentially too late and 2) this
+ // leads to a conservative GC rate rather than a GC rate that
+ // is potentially too low.
+ //
+ // Reads should likewise be atomic (or during STW).
+ //
+ // Whenever this is updated, call traceHeapAlloc() and
+ // gcController.revise().
+ heap_live uint64
+
+ // heap_scan is the number of bytes of "scannable" heap. This
+ // is the live heap (as counted by heap_live), but omitting
+ // no-scan objects and no-scan tails of objects.
+ //
+ // Whenever this is updated, call gcController.revise().
+ //
+ // Read and written atomically or with the world stopped.
+ heap_scan uint64
+
+ // heap_marked is the number of bytes marked by the previous
+ // GC. After mark termination, heap_live == heap_marked, but
+ // unlike heap_live, heap_marked does not change until the
+ // next mark termination.
+ heap_marked uint64
+
+	// heapStats is a set of memory statistics whose updates must be kept
+	// consistent with one another; see consistentHeapStats.
+ heapStats consistentHeapStats
+
+ // _ uint32 // ensure gcPauseDist is aligned
+
+ // gcPauseDist represents the distribution of all GC-related
+ // application pauses in the runtime.
+ //
+ // Each individual pause is counted separately, unlike pause_ns.
+ gcPauseDist timeHistogram
+}
+
+var memstats mstats
+
+// A MemStats records statistics about the memory allocator.
+type MemStats struct {
+ // General statistics.
+
+ // Alloc is bytes of allocated heap objects.
+ //
+ // This is the same as HeapAlloc (see below).
+ Alloc uint64
+
+ // TotalAlloc is cumulative bytes allocated for heap objects.
+ //
+ // TotalAlloc increases as heap objects are allocated, but
+ // unlike Alloc and HeapAlloc, it does not decrease when
+ // objects are freed.
+ TotalAlloc uint64
+
+ // Sys is the total bytes of memory obtained from the OS.
+ //
+ // Sys is the sum of the XSys fields below. Sys measures the
+ // virtual address space reserved by the Go runtime for the
+ // heap, stacks, and other internal data structures. It's
+ // likely that not all of the virtual address space is backed
+ // by physical memory at any given moment, though in general
+ // it all was at some point.
+ Sys uint64
+
+ // Lookups is the number of pointer lookups performed by the
+ // runtime.
+ //
+ // This is primarily useful for debugging runtime internals.
+ Lookups uint64
+
+ // Mallocs is the cumulative count of heap objects allocated.
+ // The number of live objects is Mallocs - Frees.
+ Mallocs uint64
+
+ // Frees is the cumulative count of heap objects freed.
+ Frees uint64
+
+ // Heap memory statistics.
+ //
+ // Interpreting the heap statistics requires some knowledge of
+ // how Go organizes memory. Go divides the virtual address
+ // space of the heap into "spans", which are contiguous
+ // regions of memory 8K or larger. A span may be in one of
+ // three states:
+ //
+ // An "idle" span contains no objects or other data. The
+ // physical memory backing an idle span can be released back
+ // to the OS (but the virtual address space never is), or it
+ // can be converted into an "in use" or "stack" span.
+ //
+ // An "in use" span contains at least one heap object and may
+ // have free space available to allocate more heap objects.
+ //
+ // A "stack" span is used for goroutine stacks. Stack spans
+ // are not considered part of the heap. A span can change
+ // between heap and stack memory; it is never used for both
+ // simultaneously.
+
+ // HeapAlloc is bytes of allocated heap objects.
+ //
+ // "Allocated" heap objects include all reachable objects, as
+ // well as unreachable objects that the garbage collector has
+ // not yet freed. Specifically, HeapAlloc increases as heap
+ // objects are allocated and decreases as the heap is swept
+ // and unreachable objects are freed. Sweeping occurs
+ // incrementally between GC cycles, so these two processes
+ // occur simultaneously, and as a result HeapAlloc tends to
+ // change smoothly (in contrast with the sawtooth that is
+ // typical of stop-the-world garbage collectors).
+ HeapAlloc uint64
+
+ // HeapSys is bytes of heap memory obtained from the OS.
+ //
+ // HeapSys measures the amount of virtual address space
+ // reserved for the heap. This includes virtual address space
+ // that has been reserved but not yet used, which consumes no
+ // physical memory, but tends to be small, as well as virtual
+ // address space for which the physical memory has been
+ // returned to the OS after it became unused (see HeapReleased
+ // for a measure of the latter).
+ //
+ // HeapSys estimates the largest size the heap has had.
+ HeapSys uint64
+
+ // HeapIdle is bytes in idle (unused) spans.
+ //
+ // Idle spans have no objects in them. These spans could be
+ // (and may already have been) returned to the OS, or they can
+ // be reused for heap allocations, or they can be reused as
+ // stack memory.
+ //
+ // HeapIdle minus HeapReleased estimates the amount of memory
+ // that could be returned to the OS, but is being retained by
+ // the runtime so it can grow the heap without requesting more
+ // memory from the OS. If this difference is significantly
+ // larger than the heap size, it indicates there was a recent
+ // transient spike in live heap size.
+ HeapIdle uint64
+
+ // HeapInuse is bytes in in-use spans.
+ //
+ // In-use spans have at least one object in them. These spans
+ // can only be used for other objects of roughly the same
+ // size.
+ //
+ // HeapInuse minus HeapAlloc estimates the amount of memory
+ // that has been dedicated to particular size classes, but is
+ // not currently being used. This is an upper bound on
+ // fragmentation, but in general this memory can be reused
+ // efficiently.
+ HeapInuse uint64
+
+ // HeapReleased is bytes of physical memory returned to the OS.
+ //
+ // This counts heap memory from idle spans that was returned
+ // to the OS and has not yet been reacquired for the heap.
+ HeapReleased uint64
+
+ // HeapObjects is the number of allocated heap objects.
+ //
+ // Like HeapAlloc, this increases as objects are allocated and
+ // decreases as the heap is swept and unreachable objects are
+ // freed.
+ HeapObjects uint64
+
+ // Stack memory statistics.
+ //
+ // Stacks are not considered part of the heap, but the runtime
+ // can reuse a span of heap memory for stack memory, and
+ // vice-versa.
+
+ // StackInuse is bytes in stack spans.
+ //
+ // In-use stack spans have at least one stack in them. These
+ // spans can only be used for other stacks of the same size.
+ //
+ // There is no StackIdle because unused stack spans are
+ // returned to the heap (and hence counted toward HeapIdle).
+ StackInuse uint64
+
+ // StackSys is bytes of stack memory obtained from the OS.
+ //
+ // StackSys is StackInuse, plus any memory obtained directly
+ // from the OS for OS thread stacks (which should be minimal).
+ StackSys uint64
+
+ // Off-heap memory statistics.
+ //
+ // The following statistics measure runtime-internal
+ // structures that are not allocated from heap memory (usually
+ // because they are part of implementing the heap). Unlike
+ // heap or stack memory, any memory allocated to these
+ // structures is dedicated to these structures.
+ //
+ // These are primarily useful for debugging runtime memory
+ // overheads.
+
+ // MSpanInuse is bytes of allocated mspan structures.
+ MSpanInuse uint64
+
+ // MSpanSys is bytes of memory obtained from the OS for mspan
+ // structures.
+ MSpanSys uint64
+
+ // MCacheInuse is bytes of allocated mcache structures.
+ MCacheInuse uint64
+
+ // MCacheSys is bytes of memory obtained from the OS for
+ // mcache structures.
+ MCacheSys uint64
+
+ // BuckHashSys is bytes of memory in profiling bucket hash tables.
+ BuckHashSys uint64
+
+ // GCSys is bytes of memory in garbage collection metadata.
+ GCSys uint64
+
+ // OtherSys is bytes of memory in miscellaneous off-heap
+ // runtime allocations.
+ OtherSys uint64
+
+ // Garbage collector statistics.
+
+ // NextGC is the target heap size of the next GC cycle.
+ //
+ // The garbage collector's goal is to keep HeapAlloc ≤ NextGC.
+ // At the end of each GC cycle, the target for the next cycle
+ // is computed based on the amount of reachable data and the
+ // value of GOGC.
+ NextGC uint64
+
+ // LastGC is the time the last garbage collection finished, as
+ // nanoseconds since 1970 (the UNIX epoch).
+ LastGC uint64
+
+ // PauseTotalNs is the cumulative nanoseconds in GC
+ // stop-the-world pauses since the program started.
+ //
+ // During a stop-the-world pause, all goroutines are paused
+ // and only the garbage collector can run.
+ PauseTotalNs uint64
+
+ // PauseNs is a circular buffer of recent GC stop-the-world
+ // pause times in nanoseconds.
+ //
+ // The most recent pause is at PauseNs[(NumGC+255)%256]. In
+ // general, PauseNs[N%256] records the time paused in the most
+ // recent N%256th GC cycle. There may be multiple pauses per
+ // GC cycle; this is the sum of all pauses during a cycle.
+ PauseNs [256]uint64
+
+ // PauseEnd is a circular buffer of recent GC pause end times,
+ // as nanoseconds since 1970 (the UNIX epoch).
+ //
+ // This buffer is filled the same way as PauseNs. There may be
+ // multiple pauses per GC cycle; this records the end of the
+ // last pause in a cycle.
+ PauseEnd [256]uint64
+
+ // NumGC is the number of completed GC cycles.
+ NumGC uint32
+
+ // NumForcedGC is the number of GC cycles that were forced by
+ // the application calling the GC function.
+ NumForcedGC uint32
+
+ // GCCPUFraction is the fraction of this program's available
+ // CPU time used by the GC since the program started.
+ //
+ // GCCPUFraction is expressed as a number between 0 and 1,
+ // where 0 means GC has consumed none of this program's CPU. A
+ // program's available CPU time is defined as the integral of
+ // GOMAXPROCS since the program started. That is, if
+ // GOMAXPROCS is 2 and a program has been running for 10
+ // seconds, its "available CPU" is 20 seconds. GCCPUFraction
+ // does not include CPU time used for write barrier activity.
+ //
+ // This is the same as the fraction of CPU reported by
+ // GODEBUG=gctrace=1.
+ GCCPUFraction float64
+
+ // EnableGC indicates that GC is enabled. It is always true,
+ // even if GOGC=off.
+ EnableGC bool
+
+ // DebugGC is currently unused.
+ DebugGC bool
+
+ // BySize reports per-size class allocation statistics.
+ //
+ // BySize[N] gives statistics for allocations of size S where
+ // BySize[N-1].Size < S ≤ BySize[N].Size.
+ //
+ // This does not report allocations larger than BySize[60].Size.
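+	//
+	// For instance, if BySize[4].Size were 32 and BySize[5].Size were 48,
+	// a 40-byte allocation would be counted in BySize[5].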
+ BySize [61]struct {
+ // Size is the maximum byte size of an object in this
+ // size class.
+ Size uint32
+
+ // Mallocs is the cumulative count of heap objects
+ // allocated in this size class. The cumulative bytes
+ // of allocation is Size*Mallocs. The number of live
+ // objects in this size class is Mallocs - Frees.
+ Mallocs uint64
+
+ // Frees is the cumulative count of heap objects freed
+ // in this size class.
+ Frees uint64
+ }
+}
+
+func init() {
+ if offset := unsafe.Offsetof(memstats.heap_live); offset%8 != 0 {
+ println(offset)
+ throw("memstats.heap_live not aligned to 8 bytes")
+ }
+ if offset := unsafe.Offsetof(memstats.heapStats); offset%8 != 0 {
+ println(offset)
+ throw("memstats.heapStats not aligned to 8 bytes")
+ }
+ if offset := unsafe.Offsetof(memstats.gcPauseDist); offset%8 != 0 {
+ println(offset)
+ throw("memstats.gcPauseDist not aligned to 8 bytes")
+ }
+ // Ensure the size of heapStatsDelta causes adjacent fields/slots (e.g.
+ // [3]heapStatsDelta) to be 8-byte aligned.
+ if size := unsafe.Sizeof(heapStatsDelta{}); size%8 != 0 {
+ println(size)
+ throw("heapStatsDelta not a multiple of 8 bytes in size")
+ }
+}
+
+// ReadMemStats populates m with memory allocator statistics.
+//
+// The returned memory allocator statistics are up to date as of the
+// call to ReadMemStats. This is in contrast with a heap profile,
+// which is a snapshot as of the most recently completed garbage
+// collection cycle.
+func ReadMemStats(m *MemStats) {
+ stopTheWorld("read mem stats")
+
+ systemstack(func() {
+ readmemstats_m(m)
+ })
+
+ startTheWorld()
+}
+
+func readmemstats_m(stats *MemStats) {
+ updatememstats()
+
+ stats.Alloc = memstats.alloc
+ stats.TotalAlloc = memstats.total_alloc
+ stats.Sys = memstats.sys
+ stats.Mallocs = memstats.nmalloc
+ stats.Frees = memstats.nfree
+ stats.HeapAlloc = memstats.alloc
+ stats.HeapSys = memstats.heap_sys.load()
+ // By definition, HeapIdle is memory that was mapped
+ // for the heap but is not currently used to hold heap
+ // objects. It also specifically is memory that can be
+ // used for other purposes, like stacks, but this memory
+ // is subtracted out of HeapSys before it makes that
+ // transition. Put another way:
+ //
+ // heap_sys = bytes allocated from the OS for the heap - bytes ultimately used for non-heap purposes
+ // heap_idle = bytes allocated from the OS for the heap - bytes ultimately used for any purpose
+ //
+ // or
+ //
+ // heap_sys = sys - stacks_inuse - gcWorkBufInUse - gcProgPtrScalarBitsInUse
+ // heap_idle = sys - stacks_inuse - gcWorkBufInUse - gcProgPtrScalarBitsInUse - heap_inuse
+ //
+ // => heap_idle = heap_sys - heap_inuse
+ stats.HeapIdle = memstats.heap_sys.load() - memstats.heap_inuse
+ stats.HeapInuse = memstats.heap_inuse
+ stats.HeapReleased = memstats.heap_released
+ stats.HeapObjects = memstats.heap_objects
+ stats.StackInuse = memstats.stacks_inuse
+ // memstats.stacks_sys is only memory mapped directly for OS stacks.
+ // Add in heap-allocated stack memory for user consumption.
+ stats.StackSys = memstats.stacks_inuse + memstats.stacks_sys.load()
+ stats.MSpanInuse = memstats.mspan_inuse
+ stats.MSpanSys = memstats.mspan_sys.load()
+ stats.MCacheInuse = memstats.mcache_inuse
+ stats.MCacheSys = memstats.mcache_sys.load()
+ stats.BuckHashSys = memstats.buckhash_sys.load()
+ // MemStats defines GCSys as an aggregate of all memory related
+ // to the memory management system, but we track this memory
+ // at a more granular level in the runtime.
+ stats.GCSys = memstats.gcMiscSys.load() + memstats.gcWorkBufInUse + memstats.gcProgPtrScalarBitsInUse
+ stats.OtherSys = memstats.other_sys.load()
+ stats.NextGC = memstats.next_gc
+ stats.LastGC = memstats.last_gc_unix
+ stats.PauseTotalNs = memstats.pause_total_ns
+ stats.PauseNs = memstats.pause_ns
+ stats.PauseEnd = memstats.pause_end
+ stats.NumGC = memstats.numgc
+ stats.NumForcedGC = memstats.numforcedgc
+ stats.GCCPUFraction = memstats.gc_cpu_fraction
+ stats.EnableGC = true
+
+ // Handle BySize. Copy N values, where N is
+ // the minimum of the lengths of the two arrays.
+ // Unfortunately copy() won't work here because
+ // the arrays have different structs.
+ //
+ // TODO(mknyszek): Consider renaming the fields
+ // of by_size's elements to align so we can use
+ // the copy built-in.
+ bySizeLen := len(stats.BySize)
+ if l := len(memstats.by_size); l < bySizeLen {
+ bySizeLen = l
+ }
+ for i := 0; i < bySizeLen; i++ {
+ stats.BySize[i].Size = memstats.by_size[i].size
+ stats.BySize[i].Mallocs = memstats.by_size[i].nmalloc
+ stats.BySize[i].Frees = memstats.by_size[i].nfree
+ }
+}
+
+//go:linkname readGCStats runtime/debug.readGCStats
+func readGCStats(pauses *[]uint64) {
+ systemstack(func() {
+ readGCStats_m(pauses)
+ })
+}
+
+// readGCStats_m must be called on the system stack because it acquires the heap
+// lock. See mheap for details.
+//go:systemstack
+func readGCStats_m(pauses *[]uint64) {
+ p := *pauses
+ // Calling code in runtime/debug should make the slice large enough.
+ if cap(p) < len(memstats.pause_ns)+3 {
+ throw("short slice passed to readGCStats")
+ }
+
+ // Pass back: pauses, pause ends, last gc (absolute time), number of gc, total pause ns.
+ lock(&mheap_.lock)
+
+ n := memstats.numgc
+ if n > uint32(len(memstats.pause_ns)) {
+ n = uint32(len(memstats.pause_ns))
+ }
+
+ // The pause buffer is circular. The most recent pause is at
+ // pause_ns[(numgc-1)%len(pause_ns)], and then backward
+ // from there to go back farther in time. We deliver the times
+ // most recent first (in p[0]).
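+	//
+	// For instance (illustrative numbers): with numgc = 300 and a 256-entry
+	// buffer, p[0] = pause_ns[(300-1)%256] = pause_ns[43] (the most recent
+	// pause) and p[1] = pause_ns[42] (the one before it).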
+ p = p[:cap(p)]
+ for i := uint32(0); i < n; i++ {
+ j := (memstats.numgc - 1 - i) % uint32(len(memstats.pause_ns))
+ p[i] = memstats.pause_ns[j]
+ p[n+i] = memstats.pause_end[j]
+ }
+
+ p[n+n] = memstats.last_gc_unix
+ p[n+n+1] = uint64(memstats.numgc)
+ p[n+n+2] = memstats.pause_total_ns
+ unlock(&mheap_.lock)
+ *pauses = p[:n+n+3]
+}
+
+// Updates the memstats structure.
+//
+// The world must be stopped.
+//
+//go:nowritebarrier
+func updatememstats() {
+ assertWorldStopped()
+
+ // Flush mcaches to mcentral before doing anything else.
+ //
+ // Flushing to the mcentral may in general cause stats to
+ // change as mcentral data structures are manipulated.
+ systemstack(flushallmcaches)
+
+ memstats.mcache_inuse = uint64(mheap_.cachealloc.inuse)
+ memstats.mspan_inuse = uint64(mheap_.spanalloc.inuse)
+ memstats.sys = memstats.heap_sys.load() + memstats.stacks_sys.load() + memstats.mspan_sys.load() +
+ memstats.mcache_sys.load() + memstats.buckhash_sys.load() + memstats.gcMiscSys.load() +
+ memstats.other_sys.load()
+
+ // Calculate memory allocator stats.
+ // During program execution we only count number of frees and amount of freed memory.
+ // Current number of alive objects in the heap and amount of alive heap memory
+ // are calculated by scanning all spans.
+ // Total number of mallocs is calculated as number of frees plus number of alive objects.
+ // Similarly, total amount of allocated memory is calculated as amount of freed memory
+ // plus amount of alive heap memory.
+ memstats.alloc = 0
+ memstats.total_alloc = 0
+ memstats.nmalloc = 0
+ memstats.nfree = 0
+ for i := 0; i < len(memstats.by_size); i++ {
+ memstats.by_size[i].nmalloc = 0
+ memstats.by_size[i].nfree = 0
+ }
+	// Collect consistent stats, which are the source of truth in some cases.
+ var consStats heapStatsDelta
+ memstats.heapStats.unsafeRead(&consStats)
+
+ // Collect large allocation stats.
+ totalAlloc := uint64(consStats.largeAlloc)
+ memstats.nmalloc += uint64(consStats.largeAllocCount)
+ totalFree := uint64(consStats.largeFree)
+ memstats.nfree += uint64(consStats.largeFreeCount)
+
+ // Collect per-sizeclass stats.
+ for i := 0; i < _NumSizeClasses; i++ {
+ // Malloc stats.
+ a := uint64(consStats.smallAllocCount[i])
+ totalAlloc += a * uint64(class_to_size[i])
+ memstats.nmalloc += a
+ memstats.by_size[i].nmalloc = a
+
+ // Free stats.
+ f := uint64(consStats.smallFreeCount[i])
+ totalFree += f * uint64(class_to_size[i])
+ memstats.nfree += f
+ memstats.by_size[i].nfree = f
+ }
+
+ // Account for tiny allocations.
+ memstats.nfree += memstats.tinyallocs
+ memstats.nmalloc += memstats.tinyallocs
+
+ // Calculate derived stats.
+ memstats.total_alloc = totalAlloc
+ memstats.alloc = totalAlloc - totalFree
+ memstats.heap_objects = memstats.nmalloc - memstats.nfree
+
+ memstats.stacks_inuse = uint64(consStats.inStacks)
+ memstats.gcWorkBufInUse = uint64(consStats.inWorkBufs)
+ memstats.gcProgPtrScalarBitsInUse = uint64(consStats.inPtrScalarBits)
+
+ // We also count stacks_inuse, gcWorkBufInUse, and gcProgPtrScalarBitsInUse as sys memory.
+ memstats.sys += memstats.stacks_inuse + memstats.gcWorkBufInUse + memstats.gcProgPtrScalarBitsInUse
+
+ // The world is stopped, so the consistent stats (after aggregation)
+ // should be identical to some combination of memstats. In particular:
+ //
+ // * heap_inuse == inHeap
+ // * heap_released == released
+ // * heap_sys - heap_released == committed - inStacks - inWorkBufs - inPtrScalarBits
+ //
+ // Check if that's actually true.
+ //
+ // TODO(mknyszek): Maybe don't throw here. It would be bad if a
+ // bug in otherwise benign accounting caused the whole application
+ // to crash.
+ if memstats.heap_inuse != uint64(consStats.inHeap) {
+ print("runtime: heap_inuse=", memstats.heap_inuse, "\n")
+ print("runtime: consistent value=", consStats.inHeap, "\n")
+ throw("heap_inuse and consistent stats are not equal")
+ }
+ if memstats.heap_released != uint64(consStats.released) {
+ print("runtime: heap_released=", memstats.heap_released, "\n")
+ print("runtime: consistent value=", consStats.released, "\n")
+ throw("heap_released and consistent stats are not equal")
+ }
+ globalRetained := memstats.heap_sys.load() - memstats.heap_released
+ consRetained := uint64(consStats.committed - consStats.inStacks - consStats.inWorkBufs - consStats.inPtrScalarBits)
+ if globalRetained != consRetained {
+ print("runtime: global value=", globalRetained, "\n")
+ print("runtime: consistent value=", consRetained, "\n")
+ throw("measures of the retained heap are not equal")
+ }
+}
+
+// flushmcache flushes the mcache of allp[i].
+//
+// The world must be stopped.
+//
+//go:nowritebarrier
+func flushmcache(i int) {
+ assertWorldStopped()
+
+ p := allp[i]
+ c := p.mcache
+ if c == nil {
+ return
+ }
+ c.releaseAll()
+ stackcache_clear(c)
+}
+
+// flushallmcaches flushes the mcaches of all Ps.
+//
+// The world must be stopped.
+//
+//go:nowritebarrier
+func flushallmcaches() {
+ assertWorldStopped()
+
+ for i := 0; i < int(gomaxprocs); i++ {
+ flushmcache(i)
+ }
+}
+
+// sysMemStat represents a global system statistic that is managed atomically.
+//
+// This type must structurally be a uint64 so that mstats aligns with MemStats.
+type sysMemStat uint64
+
+// load atomically reads the value of the stat.
+//
+// Must be nosplit as it is called in runtime initialization, e.g. newosproc0.
+//go:nosplit
+func (s *sysMemStat) load() uint64 {
+ return atomic.Load64((*uint64)(s))
+}
+
+// add atomically adds n to the sysMemStat.
+//
+// Must be nosplit as it is called in runtime initialization, e.g. newosproc0.
+//go:nosplit
+func (s *sysMemStat) add(n int64) {
+ if s == nil {
+ return
+ }
+ val := atomic.Xadd64((*uint64)(s), n)
+ if (n > 0 && int64(val) < n) || (n < 0 && int64(val)+n < n) {
+ print("runtime: val=", val, " n=", n, "\n")
+ throw("sysMemStat overflow")
+ }
+}
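+
+// For illustration (not part of the original change), callers credit and
+// debit these stats as memory moves to and from the OS, e.g.:
+//
+//	memstats.gcMiscSys.add(int64(size))  // memory obtained from the OS
+//	memstats.gcMiscSys.add(-int64(size)) // memory returned to the OS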
+
+// heapStatsDelta contains deltas of various runtime memory statistics
+// that need to be updated together in order for them to be kept
+// consistent with one another.
+type heapStatsDelta struct {
+ // Memory stats.
+ committed int64 // byte delta of memory committed
+ released int64 // byte delta of released memory generated
+ inHeap int64 // byte delta of memory placed in the heap
+ inStacks int64 // byte delta of memory reserved for stacks
+ inWorkBufs int64 // byte delta of memory reserved for work bufs
+ inPtrScalarBits int64 // byte delta of memory reserved for unrolled GC prog bits
+
+ // Allocator stats.
+ largeAlloc uintptr // bytes allocated for large objects
+ largeAllocCount uintptr // number of large object allocations
+ smallAllocCount [_NumSizeClasses]uintptr // number of allocs for small objects
+ largeFree uintptr // bytes freed for large objects (>maxSmallSize)
+ largeFreeCount uintptr // number of frees for large objects (>maxSmallSize)
+ smallFreeCount [_NumSizeClasses]uintptr // number of frees for small objects (<=maxSmallSize)
+
+ // Add a uint32 to ensure this struct is a multiple of 8 bytes in size.
+ // Only necessary on 32-bit platforms.
+ // _ [(sys.PtrSize / 4) % 2]uint32
+}
+
+// merge adds in the deltas from b into a.
+func (a *heapStatsDelta) merge(b *heapStatsDelta) {
+ a.committed += b.committed
+ a.released += b.released
+ a.inHeap += b.inHeap
+ a.inStacks += b.inStacks
+ a.inWorkBufs += b.inWorkBufs
+ a.inPtrScalarBits += b.inPtrScalarBits
+
+ a.largeAlloc += b.largeAlloc
+ a.largeAllocCount += b.largeAllocCount
+ for i := range b.smallAllocCount {
+ a.smallAllocCount[i] += b.smallAllocCount[i]
+ }
+ a.largeFree += b.largeFree
+ a.largeFreeCount += b.largeFreeCount
+ for i := range b.smallFreeCount {
+ a.smallFreeCount[i] += b.smallFreeCount[i]
+ }
+}
+
+// consistentHeapStats represents a set of various memory statistics
+// whose updates must be viewed completely to get a consistent
+// state of the world.
+//
+// To write updates to memory stats use the acquire and release
+// methods. To obtain a consistent global snapshot of these statistics,
+// use read.
+type consistentHeapStats struct {
+ // stats is a ring buffer of heapStatsDelta values.
+ // Writers always atomically update the delta at index gen.
+ //
+ // Readers operate by rotating gen (0 -> 1 -> 2 -> 0 -> ...)
+ // and synchronizing with writers by observing each P's
+ // statsSeq field. If the reader observes a P not writing,
+ // it can be sure that it will pick up the new gen value the
+ // next time it writes.
+ //
+ // The reader then takes responsibility by clearing space
+ // in the ring buffer for the next reader to rotate gen to
+ // that space (i.e. it merges in values from index (gen-2) mod 3
+ // to index (gen-1) mod 3, then clears the former).
+ //
+ // Note that this means only one reader can be reading at a time.
+ // There is no way for readers to synchronize.
+ //
+ // This process is why we need a ring buffer of size 3 instead
+ // of 2: one is for the writers, one contains the most recent
+ // data, and the last one is clear so writers can begin writing
+ // to it the moment gen is updated.
+ stats [3]heapStatsDelta
+
+ // gen represents the current index into which writers
+ // are writing, and can take on the value of 0, 1, or 2.
+ // This value is updated atomically.
+ gen uint32
+
+ // noPLock is intended to provide mutual exclusion for updating
+ // stats when no P is available. It does not block other writers
+ // with a P, only other writers without a P and the reader. Because
+ // stats are usually updated when a P is available, contention on
+ // this lock should be minimal.
+ noPLock mutex
+}
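+
+// Illustrative walk-through (not part of the original change): suppose gen is
+// 0, so writers are accumulating deltas into stats[0]. A read rotates gen to
+// 1 (new writers use stats[1], which the previous read left clear), waits for
+// every P's statsSeq to become even, merges stats[2] into stats[0], clears
+// stats[2] for the next rotation, and reports stats[0].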
+
+// acquire returns a heapStatsDelta to be updated. In effect,
+// it acquires the shard for writing. release must be called
+// as soon as the relevant deltas are updated.
+//
+// The returned heapStatsDelta must be updated atomically.
+//
+// The caller's P must not change between acquire and
+// release. This also means that the caller should not
+// acquire a P or release its P in between.
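+//
+// A typical update looks roughly like this (illustrative sketch; the exact
+// atomic helper a caller uses may differ):
+//
+//	stats := memstats.heapStats.acquire()
+//	atomic.Xaddint64(&stats.inHeap, int64(nbytes))
+//	memstats.heapStats.release()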
+func (m *consistentHeapStats) acquire() *heapStatsDelta {
+ if pp := getg().m.p.ptr(); pp != nil {
+ seq := atomic.Xadd(&pp.statsSeq, 1)
+ if seq%2 == 0 {
+ // Should have been incremented to odd.
+ print("runtime: seq=", seq, "\n")
+ throw("bad sequence number")
+ }
+ } else {
+ lock(&m.noPLock)
+ }
+ gen := atomic.Load(&m.gen) % 3
+ return &m.stats[gen]
+}
+
+// release indicates that the writer is done modifying
+// the delta. The value returned by the corresponding
+// acquire must no longer be accessed or modified after
+// release is called.
+//
+// The caller's P must not change between acquire and
+// release. This also means that the caller should not
+// acquire a P or release its P in between.
+func (m *consistentHeapStats) release() {
+ if pp := getg().m.p.ptr(); pp != nil {
+ seq := atomic.Xadd(&pp.statsSeq, 1)
+ if seq%2 != 0 {
+ // Should have been incremented to even.
+ print("runtime: seq=", seq, "\n")
+ throw("bad sequence number")
+ }
+ } else {
+ unlock(&m.noPLock)
+ }
+}
+
+// unsafeRead aggregates the delta for this shard into out.
+//
+// Unsafe because it does so without any synchronization. The
+// world must be stopped.
+func (m *consistentHeapStats) unsafeRead(out *heapStatsDelta) {
+ assertWorldStopped()
+
+ for i := range m.stats {
+ out.merge(&m.stats[i])
+ }
+}
+
+// unsafeClear clears the shard.
+//
+// Unsafe because the world must be stopped and values should
+// be donated elsewhere before clearing.
+func (m *consistentHeapStats) unsafeClear() {
+ assertWorldStopped()
+
+ for i := range m.stats {
+ m.stats[i] = heapStatsDelta{}
+ }
+}
+
+// read takes a globally consistent snapshot of m
+// and puts the aggregated value in out. Even though out is a
+// heapStatsDelta, the resulting values should be complete and
+// valid statistic values.
+//
+// Not safe to call concurrently. The world must be stopped
+// or metricsSema must be held.
+func (m *consistentHeapStats) read(out *heapStatsDelta) {
+ // Getting preempted after this point is not safe because
+ // we read allp. We need to make sure a STW can't happen
+ // so it doesn't change out from under us.
+ mp := acquirem()
+
+ // Get the current generation. We can be confident that this
+ // will not change since read is serialized and is the only
+ // one that modifies currGen.
+ currGen := atomic.Load(&m.gen)
+ prevGen := currGen - 1
+ if currGen == 0 {
+ prevGen = 2
+ }
+
+ // Prevent writers without a P from writing while we update gen.
+ lock(&m.noPLock)
+
+ // Rotate gen, effectively taking a snapshot of the state of
+ // these statistics at the point of the exchange by moving
+ // writers to the next set of deltas.
+ //
+ // This exchange is safe to do because we won't race
+ // with anyone else trying to update this value.
+ atomic.Xchg(&m.gen, (currGen+1)%3)
+
+ // Allow P-less writers to continue. They'll be writing to the
+ // next generation now.
+ unlock(&m.noPLock)
+
+ for _, p := range allp {
+ // Spin until there are no more writers.
+ for atomic.Load(&p.statsSeq)%2 != 0 {
+ }
+ }
+
+ // At this point we've observed that each sequence
+ // number is even, so any future writers will observe
+ // the new gen value. That means it's safe to read from
+ // the other deltas in the stats buffer.
+
+ // Perform our responsibilities and free up
+ // stats[prevGen] for the next time we want to take
+ // a snapshot.
+ m.stats[currGen].merge(&m.stats[prevGen])
+ m.stats[prevGen] = heapStatsDelta{}
+
+ // Finally, copy out the complete delta.
+ *out = m.stats[currGen]
+
+ releasem(mp)
+}
diff --git a/src/runtime/mwbbuf.go b/src/runtime/mwbbuf.go
new file mode 100644
index 0000000..6efc000
--- /dev/null
+++ b/src/runtime/mwbbuf.go
@@ -0,0 +1,290 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This implements the write barrier buffer. The write barrier itself
+// is gcWriteBarrier and is implemented in assembly.
+//
+// See mbarrier.go for algorithmic details on the write barrier. This
+// file deals only with the buffer.
+//
+// The write barrier has a fast path and a slow path. The fast path
+// simply enqueues to a per-P write barrier buffer. It's written in
+// assembly and doesn't clobber any general purpose registers, so it
+// doesn't have the usual overheads of a Go call.
+//
+// When the buffer fills up, the write barrier invokes the slow path
+// (wbBufFlush) to flush the buffer to the GC work queues. In this
+// path, since the compiler didn't spill registers, we spill *all*
+// registers and disallow any GC safe points that could observe the
+// stack frame (since we don't know the types of the spilled
+// registers).
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// testSmallBuf forces a small write barrier buffer to stress write
+// barrier flushing.
+const testSmallBuf = false
+
+// wbBuf is a per-P buffer of pointers queued by the write barrier.
+// This buffer is flushed to the GC workbufs when it fills up and on
+// various GC transitions.
+//
+// This is closely related to a "sequential store buffer" (SSB),
+// except that SSBs are usually used for maintaining remembered sets,
+// while this is used for marking.
+type wbBuf struct {
+ // next points to the next slot in buf. It must not be a
+ // pointer type because it can point past the end of buf and
+ // must be updated without write barriers.
+ //
+ // This is a pointer rather than an index to optimize the
+ // write barrier assembly.
+ next uintptr
+
+ // end points to just past the end of buf. It must not be a
+ // pointer type because it points past the end of buf and must
+ // be updated without write barriers.
+ end uintptr
+
+ // buf stores a series of pointers to execute write barriers
+ // on. This must be a multiple of wbBufEntryPointers because
+ // the write barrier only checks for overflow once per entry.
+ buf [wbBufEntryPointers * wbBufEntries]uintptr
+}
+
+const (
+ // wbBufEntries is the number of write barriers between
+ // flushes of the write barrier buffer.
+ //
+ // This trades latency for throughput amortization. Higher
+ // values amortize flushing overhead more, but increase the
+ // latency of flushing. Higher values also increase the cache
+ // footprint of the buffer.
+ //
+ // TODO: What is the latency cost of this? Tune this value.
+ wbBufEntries = 256
+
+ // wbBufEntryPointers is the number of pointers added to the
+ // buffer by each write barrier.
+ wbBufEntryPointers = 2
+)
+
+// reset empties b by resetting its next and end pointers.
+func (b *wbBuf) reset() {
+ start := uintptr(unsafe.Pointer(&b.buf[0]))
+ b.next = start
+ if writeBarrier.cgo {
+ // Effectively disable the buffer by forcing a flush
+ // on every barrier.
+ b.end = uintptr(unsafe.Pointer(&b.buf[wbBufEntryPointers]))
+ } else if testSmallBuf {
+ // For testing, allow two barriers in the buffer. If
+ // we only did one, then barriers of non-heap pointers
+ // would be no-ops. This lets us combine a buffered
+ // barrier with a flush at a later time.
+ b.end = uintptr(unsafe.Pointer(&b.buf[2*wbBufEntryPointers]))
+ } else {
+ b.end = start + uintptr(len(b.buf))*unsafe.Sizeof(b.buf[0])
+ }
+
+ if (b.end-b.next)%(wbBufEntryPointers*unsafe.Sizeof(b.buf[0])) != 0 {
+ throw("bad write barrier buffer bounds")
+ }
+}
+
+// discard resets b's next pointer, but not its end pointer.
+//
+// This must be nosplit because it's called by wbBufFlush.
+//
+//go:nosplit
+func (b *wbBuf) discard() {
+ b.next = uintptr(unsafe.Pointer(&b.buf[0]))
+}
+
+// empty reports whether b contains no pointers.
+func (b *wbBuf) empty() bool {
+ return b.next == uintptr(unsafe.Pointer(&b.buf[0]))
+}
+
+// putFast adds old and new to the write barrier buffer and returns
+// false if a flush is necessary. Callers should use this as:
+//
+// buf := &getg().m.p.ptr().wbBuf
+// if !buf.putFast(old, new) {
+// wbBufFlush(...)
+// }
+// ... actual memory write ...
+//
+// The arguments to wbBufFlush depend on whether the caller is doing
+// its own cgo pointer checks. If it is, then this can be
+// wbBufFlush(nil, 0). Otherwise, it must pass the slot address and
+// new.
+//
+// The caller must ensure there are no preemption points during the
+// above sequence. There must be no preemption points while buf is in
+// use because it is a per-P resource. There must be no preemption
+// points between the buffer put and the write to memory because this
+// could allow a GC phase change, which could result in missed write
+// barriers.
+//
+// putFast must be nowritebarrierrec because write barriers here would
+// corrupt the write barrier buffer. It (and everything it calls, if
+// it called anything) has to be nosplit to avoid scheduling on to a
+// different P and a different buffer.
+//
+//go:nowritebarrierrec
+//go:nosplit
+func (b *wbBuf) putFast(old, new uintptr) bool {
+ p := (*[2]uintptr)(unsafe.Pointer(b.next))
+ p[0] = old
+ p[1] = new
+ b.next += 2 * sys.PtrSize
+ return b.next != b.end
+}
+
+// wbBufFlush flushes the current P's write barrier buffer to the GC
+// workbufs. It is passed the slot and value of the write barrier that
+// caused the flush so that it can implement cgocheck.
+//
+// This must not have write barriers because it is part of the write
+// barrier implementation.
+//
+// This and everything it calls must be nosplit because 1) the stack
+// contains untyped slots from gcWriteBarrier and 2) there must not be
+// a GC safe point between the write barrier test in the caller and
+// flushing the buffer.
+//
+// TODO: A "go:nosplitrec" annotation would be perfect for this.
+//
+//go:nowritebarrierrec
+//go:nosplit
+func wbBufFlush(dst *uintptr, src uintptr) {
+ // Note: Every possible return from this function must reset
+ // the buffer's next pointer to prevent buffer overflow.
+
+ // This *must not* modify its arguments because this
+ // function's argument slots do double duty in gcWriteBarrier
+ // as register spill slots. Currently, not modifying the
+ // arguments is sufficient to keep the spill slots unmodified
+ // (which seems unlikely to change since it costs little and
+ // helps with debugging).
+
+ if getg().m.dying > 0 {
+ // We're going down. Not much point in write barriers
+ // and this way we can allow write barriers in the
+ // panic path.
+ getg().m.p.ptr().wbBuf.discard()
+ return
+ }
+
+ if writeBarrier.cgo && dst != nil {
+ // This must be called from the stack that did the
+ // write. It's nosplit all the way down.
+ cgoCheckWriteBarrier(dst, src)
+ if !writeBarrier.needed {
+ // We were only called for cgocheck.
+ getg().m.p.ptr().wbBuf.discard()
+ return
+ }
+ }
+
+ // Switch to the system stack so we don't have to worry about
+ // the untyped stack slots or safe points.
+ systemstack(func() {
+ wbBufFlush1(getg().m.p.ptr())
+ })
+}
+
+// wbBufFlush1 flushes p's write barrier buffer to the GC work queue.
+//
+// This must not have write barriers because it is part of the write
+// barrier implementation; write barriers here could lead to infinite
+// loops or buffer corruption.
+//
+// This must be non-preemptible because it uses the P's workbuf.
+//
+//go:nowritebarrierrec
+//go:systemstack
+func wbBufFlush1(_p_ *p) {
+ // Get the buffered pointers.
+ start := uintptr(unsafe.Pointer(&_p_.wbBuf.buf[0]))
+ n := (_p_.wbBuf.next - start) / unsafe.Sizeof(_p_.wbBuf.buf[0])
+ ptrs := _p_.wbBuf.buf[:n]
+
+ // Poison the buffer to make extra sure nothing is enqueued
+ // while we're processing the buffer.
+ _p_.wbBuf.next = 0
+
+ if useCheckmark {
+ // Slow path for checkmark mode.
+ for _, ptr := range ptrs {
+ shade(ptr)
+ }
+ _p_.wbBuf.reset()
+ return
+ }
+
+ // Mark all of the pointers in the buffer and record only the
+ // pointers we greyed. We use the buffer itself to temporarily
+ // record greyed pointers.
+ //
+ // TODO: Should scanobject/scanblock just stuff pointers into
+ // the wbBuf? Then this would become the sole greying path.
+ //
+ // TODO: We could avoid shading any of the "new" pointers in
+ // the buffer if the stack has been shaded, or even avoid
+ // putting them in the buffer at all (which would double its
+ // capacity). This is slightly complicated with the buffer; we
+ // could track whether any un-shaded goroutine has used the
+ // buffer, or just track globally whether there are any
+ // un-shaded stacks and flush after each stack scan.
+ gcw := &_p_.gcw
+ pos := 0
+ for _, ptr := range ptrs {
+ if ptr < minLegalPointer {
+ // nil pointers are very common, especially
+ // for the "old" values. Filter out these and
+ // other "obvious" non-heap pointers ASAP.
+ //
+ // TODO: Should we filter out nils in the fast
+ // path to reduce the rate of flushes?
+ continue
+ }
+ obj, span, objIndex := findObject(ptr, 0, 0)
+ if obj == 0 {
+ continue
+ }
+ // TODO: Consider making two passes where the first
+ // just prefetches the mark bits.
+ mbits := span.markBitsForIndex(objIndex)
+ if mbits.isMarked() {
+ continue
+ }
+ mbits.setMarked()
+
+ // Mark span.
+ arena, pageIdx, pageMask := pageIndexOf(span.base())
+ if arena.pageMarks[pageIdx]&pageMask == 0 {
+ atomic.Or8(&arena.pageMarks[pageIdx], pageMask)
+ }
+
+ if span.spanclass.noscan() {
+ gcw.bytesMarked += uint64(span.elemsize)
+ continue
+ }
+ ptrs[pos] = obj
+ pos++
+ }
+
+ // Enqueue the greyed objects.
+ gcw.putBatch(ptrs[:pos])
+
+ _p_.wbBuf.reset()
+}
diff --git a/src/runtime/nbpipe_fcntl_libc_test.go b/src/runtime/nbpipe_fcntl_libc_test.go
new file mode 100644
index 0000000..b38c583
--- /dev/null
+++ b/src/runtime/nbpipe_fcntl_libc_test.go
@@ -0,0 +1,18 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix darwin solaris
+
+package runtime_test
+
+import (
+ "runtime"
+ "syscall"
+)
+
+// Call the fcntl libc function rather than issuing a raw system call.
+func fcntl(fd uintptr, cmd int, arg uintptr) (uintptr, syscall.Errno) {
+ res, errno := runtime.Fcntl(fd, uintptr(cmd), arg)
+ return res, syscall.Errno(errno)
+}
diff --git a/src/runtime/nbpipe_fcntl_unix_test.go b/src/runtime/nbpipe_fcntl_unix_test.go
new file mode 100644
index 0000000..75acdb6
--- /dev/null
+++ b/src/runtime/nbpipe_fcntl_unix_test.go
@@ -0,0 +1,17 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build dragonfly freebsd linux netbsd openbsd
+
+package runtime_test
+
+import (
+ "internal/syscall/unix"
+ "syscall"
+)
+
+func fcntl(fd uintptr, cmd int, arg uintptr) (uintptr, syscall.Errno) {
+ res, _, err := syscall.Syscall(unix.FcntlSyscall, fd, uintptr(cmd), arg)
+ return res, err
+}
diff --git a/src/runtime/nbpipe_pipe.go b/src/runtime/nbpipe_pipe.go
new file mode 100644
index 0000000..822b294
--- /dev/null
+++ b/src/runtime/nbpipe_pipe.go
@@ -0,0 +1,19 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix darwin dragonfly
+
+package runtime
+
+func nonblockingPipe() (r, w int32, errno int32) {
+ r, w, errno = pipe()
+ if errno != 0 {
+ return -1, -1, errno
+ }
+ closeonexec(r)
+ setNonblock(r)
+ closeonexec(w)
+ setNonblock(w)
+ return r, w, errno
+}
diff --git a/src/runtime/nbpipe_pipe2.go b/src/runtime/nbpipe_pipe2.go
new file mode 100644
index 0000000..e3639d9
--- /dev/null
+++ b/src/runtime/nbpipe_pipe2.go
@@ -0,0 +1,22 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build freebsd linux netbsd openbsd solaris
+
+package runtime
+
+func nonblockingPipe() (r, w int32, errno int32) {
+ r, w, errno = pipe2(_O_NONBLOCK | _O_CLOEXEC)
+ if errno == -_ENOSYS {
+ r, w, errno = pipe()
+ if errno != 0 {
+ return -1, -1, errno
+ }
+ closeonexec(r)
+ setNonblock(r)
+ closeonexec(w)
+ setNonblock(w)
+ }
+ return r, w, errno
+}
diff --git a/src/runtime/nbpipe_test.go b/src/runtime/nbpipe_test.go
new file mode 100644
index 0000000..d739f57
--- /dev/null
+++ b/src/runtime/nbpipe_test.go
@@ -0,0 +1,93 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
+
+package runtime_test
+
+import (
+ "runtime"
+ "syscall"
+ "testing"
+ "unsafe"
+)
+
+func TestNonblockingPipe(t *testing.T) {
+ t.Parallel()
+
+	// NonblockingPipe is nonblockingPipe exported for testing.
+ r, w, errno := runtime.NonblockingPipe()
+ if errno != 0 {
+ t.Fatal(syscall.Errno(errno))
+ }
+ defer func() {
+ runtime.Close(r)
+ runtime.Close(w)
+ }()
+
+ checkIsPipe(t, r, w)
+ checkNonblocking(t, r, "reader")
+ checkCloseonexec(t, r, "reader")
+ checkNonblocking(t, w, "writer")
+ checkCloseonexec(t, w, "writer")
+}
+
+func checkIsPipe(t *testing.T, r, w int32) {
+ bw := byte(42)
+ if n := runtime.Write(uintptr(w), unsafe.Pointer(&bw), 1); n != 1 {
+ t.Fatalf("Write(w, &b, 1) == %d, expected 1", n)
+ }
+ var br byte
+ if n := runtime.Read(r, unsafe.Pointer(&br), 1); n != 1 {
+ t.Fatalf("Read(r, &b, 1) == %d, expected 1", n)
+ }
+ if br != bw {
+ t.Errorf("pipe read %d, expected %d", br, bw)
+ }
+}
+
+func checkNonblocking(t *testing.T, fd int32, name string) {
+ t.Helper()
+ flags, errno := fcntl(uintptr(fd), syscall.F_GETFL, 0)
+ if errno != 0 {
+ t.Errorf("fcntl(%s, F_GETFL) failed: %v", name, syscall.Errno(errno))
+ } else if flags&syscall.O_NONBLOCK == 0 {
+ t.Errorf("O_NONBLOCK not set in %s flags %#x", name, flags)
+ }
+}
+
+func checkCloseonexec(t *testing.T, fd int32, name string) {
+ t.Helper()
+ flags, errno := fcntl(uintptr(fd), syscall.F_GETFD, 0)
+ if errno != 0 {
+ t.Errorf("fcntl(%s, F_GETFD) failed: %v", name, syscall.Errno(errno))
+ } else if flags&syscall.FD_CLOEXEC == 0 {
+ t.Errorf("FD_CLOEXEC not set in %s flags %#x", name, flags)
+ }
+}
+
+func TestSetNonblock(t *testing.T) {
+ t.Parallel()
+
+ r, w, errno := runtime.Pipe()
+ if errno != 0 {
+ t.Fatal(syscall.Errno(errno))
+ }
+ defer func() {
+ runtime.Close(r)
+ runtime.Close(w)
+ }()
+
+ checkIsPipe(t, r, w)
+
+ runtime.SetNonblock(r)
+ runtime.SetNonblock(w)
+ checkNonblocking(t, r, "reader")
+ checkNonblocking(t, w, "writer")
+
+ runtime.Closeonexec(r)
+ runtime.Closeonexec(w)
+ checkCloseonexec(t, r, "reader")
+ checkCloseonexec(t, w, "writer")
+}
diff --git a/src/runtime/net_plan9.go b/src/runtime/net_plan9.go
new file mode 100644
index 0000000..b1ac7c7
--- /dev/null
+++ b/src/runtime/net_plan9.go
@@ -0,0 +1,29 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ _ "unsafe"
+)
+
+//go:linkname runtime_ignoreHangup internal/poll.runtime_ignoreHangup
+func runtime_ignoreHangup() {
+ getg().m.ignoreHangup = true
+}
+
+//go:linkname runtime_unignoreHangup internal/poll.runtime_unignoreHangup
+func runtime_unignoreHangup(sig string) {
+ getg().m.ignoreHangup = false
+}
+
+func ignoredNote(note *byte) bool {
+ if note == nil {
+ return false
+ }
+ if gostringnocopy(note) != "hangup" {
+ return false
+ }
+ return getg().m.ignoreHangup
+}
diff --git a/src/runtime/netpoll.go b/src/runtime/netpoll.go
new file mode 100644
index 0000000..f296b0a
--- /dev/null
+++ b/src/runtime/netpoll.go
@@ -0,0 +1,577 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix darwin dragonfly freebsd js,wasm linux netbsd openbsd solaris windows
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// Integrated network poller (platform-independent part).
+// A particular implementation (epoll/kqueue/port/AIX/Windows)
+// must define the following functions:
+//
+// func netpollinit()
+// Initialize the poller. Only called once.
+//
+// func netpollopen(fd uintptr, pd *pollDesc) int32
+// Arm edge-triggered notifications for fd. The pd argument is to pass
+// back to netpollready when fd is ready. Return an errno value.
+//
+// func netpoll(delta int64) gList
+// Poll the network. If delta < 0, block indefinitely. If delta == 0,
+// poll without blocking. If delta > 0, block for up to delta nanoseconds.
+// Return a list of goroutines built by calling netpollready.
+//
+// func netpollBreak()
+// Wake up the network poller, assumed to be blocked in netpoll.
+//
+// func netpollIsPollDescriptor(fd uintptr) bool
+// Reports whether fd is a file descriptor used by the poller.
+
+// Error codes returned by runtime_pollReset and runtime_pollWait.
+// These must match the values in internal/poll/fd_poll_runtime.go.
+const (
+ pollNoError = 0 // no error
+ pollErrClosing = 1 // descriptor is closed
+ pollErrTimeout = 2 // I/O timeout
+ pollErrNotPollable = 3 // general error polling descriptor
+)
+
+// pollDesc contains 2 binary semaphores, rg and wg, to park reader and writer
+// goroutines respectively. The semaphore can be in the following states:
+// pdReady - io readiness notification is pending;
+// a goroutine consumes the notification by changing the state to nil.
+// pdWait - a goroutine prepares to park on the semaphore, but not yet parked;
+// the goroutine commits to park by changing the state to G pointer,
+// or, alternatively, concurrent io notification changes the state to pdReady,
+// or, alternatively, concurrent timeout/close changes the state to nil.
+// G pointer - the goroutine is blocked on the semaphore;
+// io notification or timeout/close changes the state to pdReady or nil respectively
+// and unparks the goroutine.
+// nil - none of the above.
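+//
+// For example, a typical successful read wait moves rg through
+// nil -> pdWait (a goroutine prepares to park) -> G pointer (it parks) ->
+// pdReady (an io notification arrives and unparks it) -> nil (the woken
+// goroutine consumes the notification).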
+const (
+ pdReady uintptr = 1
+ pdWait uintptr = 2
+)
+
+const pollBlockSize = 4 * 1024
+
+// Network poller descriptor.
+//
+// No heap pointers.
+//
+//go:notinheap
+type pollDesc struct {
+ link *pollDesc // in pollcache, protected by pollcache.lock
+
+ // The lock protects pollOpen, pollSetDeadline, pollUnblock and deadlineimpl operations.
+ // This fully covers seq, rt and wt variables. fd is constant throughout the PollDesc lifetime.
+ // pollReset, pollWait, pollWaitCanceled and runtime·netpollready (IO readiness notification)
+ // proceed w/o taking the lock. So closing, everr, rg, rd, wg and wd are manipulated
+ // in a lock-free way by all operations.
+ // TODO(golang.org/issue/49008): audit these lock-free fields for continued correctness.
+ // NOTE(dvyukov): the following code uses uintptr to store *g (rg/wg),
+ // which will blow up when the GC starts moving objects.
+ lock mutex // protects the following fields
+ fd uintptr
+ closing bool
+ everr bool // marks event scanning error happened
+ user uint32 // user settable cookie
+ rseq uintptr // protects from stale read timers
+ rg uintptr // pdReady, pdWait, G waiting for read or nil. Accessed atomically.
+ rt timer // read deadline timer (set if rt.f != nil)
+ rd int64 // read deadline
+ wseq uintptr // protects from stale write timers
+ wg uintptr // pdReady, pdWait, G waiting for write or nil. Accessed atomically.
+ wt timer // write deadline timer
+ wd int64 // write deadline
+ self *pollDesc // storage for indirect interface. See (*pollDesc).makeArg.
+}
+
+type pollCache struct {
+ lock mutex
+ first *pollDesc
+ // PollDesc objects must be type-stable,
+ // because we can get ready notification from epoll/kqueue
+ // after the descriptor is closed/reused.
+ // Stale notifications are detected using the seq variable,
+ // which is incremented when deadlines are changed or the descriptor is reused.
+}
+
+var (
+ netpollInitLock mutex
+ netpollInited uint32
+
+ pollcache pollCache
+ netpollWaiters uint32
+)
+
+//go:linkname poll_runtime_pollServerInit internal/poll.runtime_pollServerInit
+func poll_runtime_pollServerInit() {
+ netpollGenericInit()
+}
+
+func netpollGenericInit() {
+ if atomic.Load(&netpollInited) == 0 {
+ lockInit(&netpollInitLock, lockRankNetpollInit)
+ lock(&netpollInitLock)
+ if netpollInited == 0 {
+ netpollinit()
+ atomic.Store(&netpollInited, 1)
+ }
+ unlock(&netpollInitLock)
+ }
+}
+
+func netpollinited() bool {
+ return atomic.Load(&netpollInited) != 0
+}
+
+//go:linkname poll_runtime_isPollServerDescriptor internal/poll.runtime_isPollServerDescriptor
+
+// poll_runtime_isPollServerDescriptor reports whether fd is a
+// descriptor being used by netpoll.
+func poll_runtime_isPollServerDescriptor(fd uintptr) bool {
+ return netpollIsPollDescriptor(fd)
+}
+
+//go:linkname poll_runtime_pollOpen internal/poll.runtime_pollOpen
+func poll_runtime_pollOpen(fd uintptr) (*pollDesc, int) {
+ pd := pollcache.alloc()
+ lock(&pd.lock)
+ wg := atomic.Loaduintptr(&pd.wg)
+ if wg != 0 && wg != pdReady {
+ throw("runtime: blocked write on free polldesc")
+ }
+ rg := atomic.Loaduintptr(&pd.rg)
+ if rg != 0 && rg != pdReady {
+ throw("runtime: blocked read on free polldesc")
+ }
+ pd.fd = fd
+ pd.closing = false
+ pd.everr = false
+ pd.rseq++
+ atomic.Storeuintptr(&pd.rg, 0)
+ pd.rd = 0
+ pd.wseq++
+ atomic.Storeuintptr(&pd.wg, 0)
+ pd.wd = 0
+ pd.self = pd
+ unlock(&pd.lock)
+
+ var errno int32
+ errno = netpollopen(fd, pd)
+ return pd, int(errno)
+}
+
+//go:linkname poll_runtime_pollClose internal/poll.runtime_pollClose
+func poll_runtime_pollClose(pd *pollDesc) {
+ if !pd.closing {
+ throw("runtime: close polldesc w/o unblock")
+ }
+ wg := atomic.Loaduintptr(&pd.wg)
+ if wg != 0 && wg != pdReady {
+ throw("runtime: blocked write on closing polldesc")
+ }
+ rg := atomic.Loaduintptr(&pd.rg)
+ if rg != 0 && rg != pdReady {
+ throw("runtime: blocked read on closing polldesc")
+ }
+ netpollclose(pd.fd)
+ pollcache.free(pd)
+}
+
+func (c *pollCache) free(pd *pollDesc) {
+ lock(&c.lock)
+ pd.link = c.first
+ c.first = pd
+ unlock(&c.lock)
+}
+
+// poll_runtime_pollReset, which is internal/poll.runtime_pollReset,
+// prepares a descriptor for polling in mode, which is 'r' or 'w'.
+// This returns an error code; the codes are defined above.
+//go:linkname poll_runtime_pollReset internal/poll.runtime_pollReset
+func poll_runtime_pollReset(pd *pollDesc, mode int) int {
+ errcode := netpollcheckerr(pd, int32(mode))
+ if errcode != pollNoError {
+ return errcode
+ }
+ if mode == 'r' {
+ atomic.Storeuintptr(&pd.rg, 0)
+ } else if mode == 'w' {
+ atomic.Storeuintptr(&pd.wg, 0)
+ }
+ return pollNoError
+}
+
+// poll_runtime_pollWait, which is internal/poll.runtime_pollWait,
+// waits for a descriptor to be ready for reading or writing,
+// according to mode, which is 'r' or 'w'.
+// This returns an error code; the codes are defined above.
+//go:linkname poll_runtime_pollWait internal/poll.runtime_pollWait
+func poll_runtime_pollWait(pd *pollDesc, mode int) int {
+ errcode := netpollcheckerr(pd, int32(mode))
+ if errcode != pollNoError {
+ return errcode
+ }
+ // As for now only Solaris, illumos, and AIX use level-triggered IO.
+ if GOOS == "solaris" || GOOS == "illumos" || GOOS == "aix" {
+ netpollarm(pd, mode)
+ }
+ for !netpollblock(pd, int32(mode), false) {
+ errcode = netpollcheckerr(pd, int32(mode))
+ if errcode != pollNoError {
+ return errcode
+ }
+ // Can happen if timeout has fired and unblocked us,
+ // but before we had a chance to run, timeout has been reset.
+ // Pretend it has not happened and retry.
+ }
+ return pollNoError
+}
+
+//go:linkname poll_runtime_pollWaitCanceled internal/poll.runtime_pollWaitCanceled
+func poll_runtime_pollWaitCanceled(pd *pollDesc, mode int) {
+ // This function is used only on Windows after a failed attempt to cancel
+ // a pending async IO operation. Wait for ioready, ignore closing or timeouts.
+ for !netpollblock(pd, int32(mode), true) {
+ }
+}
+
+//go:linkname poll_runtime_pollSetDeadline internal/poll.runtime_pollSetDeadline
+func poll_runtime_pollSetDeadline(pd *pollDesc, d int64, mode int) {
+ lock(&pd.lock)
+ if pd.closing {
+ unlock(&pd.lock)
+ return
+ }
+ rd0, wd0 := pd.rd, pd.wd
+ combo0 := rd0 > 0 && rd0 == wd0
+ if d > 0 {
+ d += nanotime()
+ if d <= 0 {
+ // If the user has a deadline in the future, but the delay calculation
+ // overflows, then set the deadline to the maximum possible value.
+ d = 1<<63 - 1
+ }
+ }
+ if mode == 'r' || mode == 'r'+'w' {
+ pd.rd = d
+ }
+ if mode == 'w' || mode == 'r'+'w' {
+ pd.wd = d
+ }
+ combo := pd.rd > 0 && pd.rd == pd.wd
+ rtf := netpollReadDeadline
+ if combo {
+ rtf = netpollDeadline
+ }
+ if pd.rt.f == nil {
+ if pd.rd > 0 {
+ pd.rt.f = rtf
+ // Copy current seq into the timer arg.
+ // Timer func will check the seq against current descriptor seq,
+ // if they differ the descriptor was reused or timers were reset.
+ pd.rt.arg = pd.makeArg()
+ pd.rt.seq = pd.rseq
+ resettimer(&pd.rt, pd.rd)
+ }
+ } else if pd.rd != rd0 || combo != combo0 {
+ pd.rseq++ // invalidate current timers
+ if pd.rd > 0 {
+ modtimer(&pd.rt, pd.rd, 0, rtf, pd.makeArg(), pd.rseq)
+ } else {
+ deltimer(&pd.rt)
+ pd.rt.f = nil
+ }
+ }
+ if pd.wt.f == nil {
+ if pd.wd > 0 && !combo {
+ pd.wt.f = netpollWriteDeadline
+ pd.wt.arg = pd.makeArg()
+ pd.wt.seq = pd.wseq
+ resettimer(&pd.wt, pd.wd)
+ }
+ } else if pd.wd != wd0 || combo != combo0 {
+ pd.wseq++ // invalidate current timers
+ if pd.wd > 0 && !combo {
+ modtimer(&pd.wt, pd.wd, 0, netpollWriteDeadline, pd.makeArg(), pd.wseq)
+ } else {
+ deltimer(&pd.wt)
+ pd.wt.f = nil
+ }
+ }
+ // If we set the new deadline in the past, unblock currently pending IO if any.
+ var rg, wg *g
+ if pd.rd < 0 || pd.wd < 0 {
+ atomic.StorepNoWB(noescape(unsafe.Pointer(&wg)), nil) // full memory barrier between stores to rd/wd and load of rg/wg in netpollunblock
+ if pd.rd < 0 {
+ rg = netpollunblock(pd, 'r', false)
+ }
+ if pd.wd < 0 {
+ wg = netpollunblock(pd, 'w', false)
+ }
+ }
+ unlock(&pd.lock)
+ if rg != nil {
+ netpollgoready(rg, 3)
+ }
+ if wg != nil {
+ netpollgoready(wg, 3)
+ }
+}
+
+//go:linkname poll_runtime_pollUnblock internal/poll.runtime_pollUnblock
+func poll_runtime_pollUnblock(pd *pollDesc) {
+ lock(&pd.lock)
+ if pd.closing {
+ throw("runtime: unblock on closing polldesc")
+ }
+ pd.closing = true
+ pd.rseq++
+ pd.wseq++
+ var rg, wg *g
+ atomic.StorepNoWB(noescape(unsafe.Pointer(&rg)), nil) // full memory barrier between store to closing and read of rg/wg in netpollunblock
+ rg = netpollunblock(pd, 'r', false)
+ wg = netpollunblock(pd, 'w', false)
+ if pd.rt.f != nil {
+ deltimer(&pd.rt)
+ pd.rt.f = nil
+ }
+ if pd.wt.f != nil {
+ deltimer(&pd.wt)
+ pd.wt.f = nil
+ }
+ unlock(&pd.lock)
+ if rg != nil {
+ netpollgoready(rg, 3)
+ }
+ if wg != nil {
+ netpollgoready(wg, 3)
+ }
+}
+
+// netpollready is called by the platform-specific netpoll function.
+// It declares that the fd associated with pd is ready for I/O.
+// The toRun argument is used to build a list of goroutines to return
+// from netpoll. The mode argument is 'r', 'w', or 'r'+'w' to indicate
+// whether the fd is ready for reading or writing or both.
+//
+// This may run while the world is stopped, so write barriers are not allowed.
+//go:nowritebarrier
+func netpollready(toRun *gList, pd *pollDesc, mode int32) {
+ var rg, wg *g
+ if mode == 'r' || mode == 'r'+'w' {
+ rg = netpollunblock(pd, 'r', true)
+ }
+ if mode == 'w' || mode == 'r'+'w' {
+ wg = netpollunblock(pd, 'w', true)
+ }
+ if rg != nil {
+ toRun.push(rg)
+ }
+ if wg != nil {
+ toRun.push(wg)
+ }
+}
+
+func netpollcheckerr(pd *pollDesc, mode int32) int {
+ if pd.closing {
+ return pollErrClosing
+ }
+ if (mode == 'r' && pd.rd < 0) || (mode == 'w' && pd.wd < 0) {
+ return pollErrTimeout
+ }
+ // Report an event scanning error only on a read event.
+ // An error on a write event will be captured in a subsequent
+ // write call that is able to report a more specific error.
+ if mode == 'r' && pd.everr {
+ return pollErrNotPollable
+ }
+ return pollNoError
+}
+
+func netpollblockcommit(gp *g, gpp unsafe.Pointer) bool {
+ r := atomic.Casuintptr((*uintptr)(gpp), pdWait, uintptr(unsafe.Pointer(gp)))
+ if r {
+ // Bump the count of goroutines waiting for the poller.
+ // The scheduler uses this to decide whether to block
+ // waiting for the poller if there is nothing else to do.
+ atomic.Xadd(&netpollWaiters, 1)
+ }
+ return r
+}
+
+func netpollgoready(gp *g, traceskip int) {
+ atomic.Xadd(&netpollWaiters, -1)
+ goready(gp, traceskip+1)
+}
+
+// netpollblock returns true if IO is ready, or false if it timed out or was closed.
+// waitio - wait only for completed IO, ignore errors.
+// Concurrent calls to netpollblock in the same mode are forbidden, as pollDesc
+// can hold only a single waiting goroutine for each mode.
+func netpollblock(pd *pollDesc, mode int32, waitio bool) bool {
+ gpp := &pd.rg
+ if mode == 'w' {
+ gpp = &pd.wg
+ }
+
+ // set the gpp semaphore to pdWait
+ for {
+ // Consume notification if already ready.
+ if atomic.Casuintptr(gpp, pdReady, 0) {
+ return true
+ }
+ if atomic.Casuintptr(gpp, 0, pdWait) {
+ break
+ }
+
+ // Double check that this isn't corrupt; otherwise we'd loop
+ // forever.
+ if v := atomic.Loaduintptr(gpp); v != pdReady && v != 0 {
+ throw("runtime: double wait")
+ }
+ }
+
+ // need to recheck error states after setting gpp to pdWait
+ // this is necessary because runtime_pollUnblock/runtime_pollSetDeadline/deadlineimpl
+ // do the opposite: store to closing/rd/wd, membarrier, load of rg/wg
+ if waitio || netpollcheckerr(pd, mode) == 0 {
+ gopark(netpollblockcommit, unsafe.Pointer(gpp), waitReasonIOWait, traceEvGoBlockNet, 5)
+ }
+ // be careful to not lose concurrent pdReady notification
+ old := atomic.Xchguintptr(gpp, 0)
+ if old > pdWait {
+ throw("runtime: corrupted polldesc")
+ }
+ return old == pdReady
+}
+
+func netpollunblock(pd *pollDesc, mode int32, ioready bool) *g {
+ gpp := &pd.rg
+ if mode == 'w' {
+ gpp = &pd.wg
+ }
+
+ for {
+ old := atomic.Loaduintptr(gpp)
+ if old == pdReady {
+ return nil
+ }
+ if old == 0 && !ioready {
+ // Only set pdReady for ioready. runtime_pollWait
+ // will check for timeout/cancel before waiting.
+ return nil
+ }
+ var new uintptr
+ if ioready {
+ new = pdReady
+ }
+ if atomic.Casuintptr(gpp, old, new) {
+ if old == pdWait {
+ old = 0
+ }
+ return (*g)(unsafe.Pointer(old))
+ }
+ }
+}
+
+func netpolldeadlineimpl(pd *pollDesc, seq uintptr, read, write bool) {
+ lock(&pd.lock)
+ // The seq argument is the descriptor's sequence number at the time the timer was set.
+ // If it's stale, ignore the timer event.
+ currentSeq := pd.rseq
+ if !read {
+ currentSeq = pd.wseq
+ }
+ if seq != currentSeq {
+ // The descriptor was reused or timers were reset.
+ unlock(&pd.lock)
+ return
+ }
+ var rg *g
+ if read {
+ if pd.rd <= 0 || pd.rt.f == nil {
+ throw("runtime: inconsistent read deadline")
+ }
+ pd.rd = -1
+ atomic.StorepNoWB(unsafe.Pointer(&pd.rt.f), nil) // full memory barrier between store to rd and load of rg in netpollunblock
+ rg = netpollunblock(pd, 'r', false)
+ }
+ var wg *g
+ if write {
+ if pd.wd <= 0 || pd.wt.f == nil && !read {
+ throw("runtime: inconsistent write deadline")
+ }
+ pd.wd = -1
+ atomic.StorepNoWB(unsafe.Pointer(&pd.wt.f), nil) // full memory barrier between store to wd and load of wg in netpollunblock
+ wg = netpollunblock(pd, 'w', false)
+ }
+ unlock(&pd.lock)
+ if rg != nil {
+ netpollgoready(rg, 0)
+ }
+ if wg != nil {
+ netpollgoready(wg, 0)
+ }
+}
+
+func netpollDeadline(arg interface{}, seq uintptr) {
+ netpolldeadlineimpl(arg.(*pollDesc), seq, true, true)
+}
+
+func netpollReadDeadline(arg interface{}, seq uintptr) {
+ netpolldeadlineimpl(arg.(*pollDesc), seq, true, false)
+}
+
+func netpollWriteDeadline(arg interface{}, seq uintptr) {
+ netpolldeadlineimpl(arg.(*pollDesc), seq, false, true)
+}
+
+func (c *pollCache) alloc() *pollDesc {
+ lock(&c.lock)
+ if c.first == nil {
+ const pdSize = unsafe.Sizeof(pollDesc{})
+ n := pollBlockSize / pdSize
+ if n == 0 {
+ n = 1
+ }
+ // Must be in non-GC memory because it can be referenced
+ // only from epoll/kqueue internals.
+ mem := persistentalloc(n*pdSize, 0, &memstats.other_sys)
+ for i := uintptr(0); i < n; i++ {
+ pd := (*pollDesc)(add(mem, i*pdSize))
+ pd.link = c.first
+ c.first = pd
+ }
+ }
+ pd := c.first
+ c.first = pd.link
+ lockInit(&pd.lock, lockRankPollDesc)
+ unlock(&c.lock)
+ return pd
+}
+
+// makeArg converts pd to an interface{}.
+// makeArg does not do any allocation. Normally, such
+// a conversion requires an allocation because pointers to
+// go:notinheap types (which pollDesc is) must be stored
+// in interfaces indirectly. See issue 42076.
+func (pd *pollDesc) makeArg() (i interface{}) {
+ x := (*eface)(unsafe.Pointer(&i))
+ x._type = pdType
+ x.data = unsafe.Pointer(&pd.self)
+ return
+}
+
+var (
+ pdEface interface{} = (*pollDesc)(nil)
+ pdType *_type = efaceOf(&pdEface)._type
+)
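
The rg/wg semaphore protocol documented above is easier to see in isolation. Below is a heavily simplified, single-waiter userland sketch of the same state machine (nil -> pdWait -> waiter / pdReady), written against sync/atomic with a buffered channel standing in for gopark/goready. It is illustrative only: it elides the error rechecks, the G-pointer state, and the retry loops the runtime needs, and none of these names exist in the runtime.

package main

import (
	"fmt"
	"sync/atomic"
)

// Userland analogue of the rg/wg binary semaphore in pollDesc.
// States: 0 (nil), semReady (pdReady), semWait (pdWait).
const (
	semReady uint32 = 1
	semWait  uint32 = 2
)

// post mirrors netpollunblock with ioready=true: mark the semaphore ready
// and release a parked waiter, if any.
func post(sem *uint32, wake chan struct{}) {
	for {
		old := atomic.LoadUint32(sem)
		if old == semReady {
			return // a notification is already pending
		}
		if atomic.CompareAndSwapUint32(sem, old, semReady) {
			if old == semWait {
				wake <- struct{}{} // stands in for goready
			}
			return
		}
	}
}

// wait mirrors netpollblock: consume a pending notification if there is
// one, otherwise park until post delivers it.
func wait(sem *uint32, wake chan struct{}) {
	if atomic.CompareAndSwapUint32(sem, semReady, 0) {
		return // consumed a notification that was already pending
	}
	if atomic.CompareAndSwapUint32(sem, 0, semWait) {
		<-wake // stands in for gopark
	}
	atomic.StoreUint32(sem, 0) // consume the semReady left behind by post
}

func main() {
	var sem uint32
	wake := make(chan struct{}, 1)
	done := make(chan struct{})
	go func() { wait(&sem, wake); close(done) }()
	post(&sem, wake)
	<-done
	fmt.Println("reader woken")
}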
diff --git a/src/runtime/netpoll_aix.go b/src/runtime/netpoll_aix.go
new file mode 100644
index 0000000..4590ed8
--- /dev/null
+++ b/src/runtime/netpoll_aix.go
@@ -0,0 +1,225 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// This is based on the former libgo/runtime/netpoll_select.c implementation
+// except that it uses poll instead of select and is written in Go.
+// It's also based on the Solaris implementation for the arming mechanisms.
+
+//go:cgo_import_dynamic libc_poll poll "libc.a/shr_64.o"
+//go:linkname libc_poll libc_poll
+
+var libc_poll libFunc
+
+//go:nosplit
+func poll(pfds *pollfd, npfds uintptr, timeout uintptr) (int32, int32) {
+ r, err := syscall3(&libc_poll, uintptr(unsafe.Pointer(pfds)), npfds, timeout)
+ return int32(r), int32(err)
+}
+
+// pollfd represents the poll structure for AIX operating system.
+type pollfd struct {
+ fd int32
+ events int16
+ revents int16
+}
+
+const _POLLIN = 0x0001
+const _POLLOUT = 0x0002
+const _POLLHUP = 0x2000
+const _POLLERR = 0x4000
+
+var (
+ pfds []pollfd
+ pds []*pollDesc
+ mtxpoll mutex
+ mtxset mutex
+ rdwake int32
+ wrwake int32
+ pendingUpdates int32
+
+ netpollWakeSig uint32 // used to avoid duplicate calls of netpollBreak
+)
+
+func netpollinit() {
+ // Create the pipe we use to wake up poll.
+ r, w, errno := nonblockingPipe()
+ if errno != 0 {
+ throw("netpollinit: failed to create pipe")
+ }
+ rdwake = r
+ wrwake = w
+
+ // Pre-allocate array of pollfd structures for poll.
+ pfds = make([]pollfd, 1, 128)
+
+ // Poll the read side of the pipe.
+ pfds[0].fd = rdwake
+ pfds[0].events = _POLLIN
+
+ pds = make([]*pollDesc, 1, 128)
+ pds[0] = nil
+}
+
+func netpollIsPollDescriptor(fd uintptr) bool {
+ return fd == uintptr(rdwake) || fd == uintptr(wrwake)
+}
+
+// netpollwakeup writes to wrwake to wake up poll before any changes.
+func netpollwakeup() {
+ if pendingUpdates == 0 {
+ pendingUpdates = 1
+ b := [1]byte{0}
+ write(uintptr(wrwake), unsafe.Pointer(&b[0]), 1)
+ }
+}
+
+func netpollopen(fd uintptr, pd *pollDesc) int32 {
+ lock(&mtxpoll)
+ netpollwakeup()
+
+ lock(&mtxset)
+ unlock(&mtxpoll)
+
+ pd.user = uint32(len(pfds))
+ pfds = append(pfds, pollfd{fd: int32(fd)})
+ pds = append(pds, pd)
+ unlock(&mtxset)
+ return 0
+}
+
+func netpollclose(fd uintptr) int32 {
+ lock(&mtxpoll)
+ netpollwakeup()
+
+ lock(&mtxset)
+ unlock(&mtxpoll)
+
+ for i := 0; i < len(pfds); i++ {
+ if pfds[i].fd == int32(fd) {
+ pfds[i] = pfds[len(pfds)-1]
+ pfds = pfds[:len(pfds)-1]
+
+ pds[i] = pds[len(pds)-1]
+ pds[i].user = uint32(i)
+ pds = pds[:len(pds)-1]
+ break
+ }
+ }
+ unlock(&mtxset)
+ return 0
+}
+
+func netpollarm(pd *pollDesc, mode int) {
+ lock(&mtxpoll)
+ netpollwakeup()
+
+ lock(&mtxset)
+ unlock(&mtxpoll)
+
+ switch mode {
+ case 'r':
+ pfds[pd.user].events |= _POLLIN
+ case 'w':
+ pfds[pd.user].events |= _POLLOUT
+ }
+ unlock(&mtxset)
+}
+
+// netpollBreak interrupts a poll.
+func netpollBreak() {
+ if atomic.Cas(&netpollWakeSig, 0, 1) {
+ b := [1]byte{0}
+ write(uintptr(wrwake), unsafe.Pointer(&b[0]), 1)
+ }
+}
+
+// netpoll checks for ready network connections.
+// Returns list of goroutines that become runnable.
+// delay < 0: blocks indefinitely
+// delay == 0: does not block, just polls
+// delay > 0: block for up to that many nanoseconds
+//go:nowritebarrierrec
+func netpoll(delay int64) gList {
+ var timeout uintptr
+ if delay < 0 {
+ timeout = ^uintptr(0)
+ } else if delay == 0 {
+ // TODO: call poll with timeout == 0
+ return gList{}
+ } else if delay < 1e6 {
+ timeout = 1
+ } else if delay < 1e15 {
+ timeout = uintptr(delay / 1e6)
+ } else {
+ // An arbitrary cap on how long to wait for a timer.
+ // 1e9 ms == ~11.5 days.
+ timeout = 1e9
+ }
+retry:
+ lock(&mtxpoll)
+ lock(&mtxset)
+ pendingUpdates = 0
+ unlock(&mtxpoll)
+
+ n, e := poll(&pfds[0], uintptr(len(pfds)), timeout)
+ if n < 0 {
+ if e != _EINTR {
+ println("errno=", e, " len(pfds)=", len(pfds))
+ throw("poll failed")
+ }
+ unlock(&mtxset)
+ // If a timed sleep was interrupted, just return to
+ // recalculate how long we should sleep now.
+ if timeout > 0 {
+ return gList{}
+ }
+ goto retry
+ }
+ // Check if some descriptors need to be changed
+ if n != 0 && pfds[0].revents&(_POLLIN|_POLLHUP|_POLLERR) != 0 {
+ if delay != 0 {
+ // A netpollwakeup could be picked up by a
+ // non-blocking poll. Only clear the wakeup
+ // if blocking.
+ var b [1]byte
+ for read(rdwake, unsafe.Pointer(&b[0]), 1) == 1 {
+ }
+ atomic.Store(&netpollWakeSig, 0)
+ }
+ // Still look at the other fds even if the mode may have
+ // changed, as netpollBreak might have been called.
+ n--
+ }
+ var toRun gList
+ for i := 1; i < len(pfds) && n > 0; i++ {
+ pfd := &pfds[i]
+
+ var mode int32
+ if pfd.revents&(_POLLIN|_POLLHUP|_POLLERR) != 0 {
+ mode += 'r'
+ pfd.events &= ^_POLLIN
+ }
+ if pfd.revents&(_POLLOUT|_POLLHUP|_POLLERR) != 0 {
+ mode += 'w'
+ pfd.events &= ^_POLLOUT
+ }
+ if mode != 0 {
+ pds[i].everr = false
+ if pfd.revents == _POLLERR {
+ pds[i].everr = true
+ }
+ netpollready(&toRun, pds[i], mode)
+ n--
+ }
+ }
+ unlock(&mtxset)
+ return toRun
+}
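
The netpollWakeSig pattern above (also used by the epoll, kqueue, Solaris and Windows pollers below) collapses any number of wakeup requests between two polls into a single write. A minimal standalone sketch of that dedup, with a counter standing in for the pipe write; the function names are invented for the example:

package main

import (
	"fmt"
	"sync/atomic"
)

var wakeSig uint32 // analogue of netpollWakeSig

// breakPoll delivers at most one wakeup per poll cycle, like netpollBreak:
// only the caller that flips wakeSig from 0 to 1 actually wakes the poller.
func breakPoll(wake func()) {
	if atomic.CompareAndSwapUint32(&wakeSig, 0, 1) {
		wake()
	}
}

// pollerDrained is what the blocked poller does after consuming the wakeup
// (the drain loop on rdwake above), re-arming breakPoll for the next cycle.
func pollerDrained() {
	atomic.StoreUint32(&wakeSig, 0)
}

func main() {
	writes := 0
	for i := 0; i < 5; i++ {
		breakPoll(func() { writes++ })
	}
	fmt.Println("wakeups actually delivered:", writes) // 1, not 5
	pollerDrained()
	breakPoll(func() { writes++ })
	fmt.Println("after the poller drained:", writes) // 2
}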
diff --git a/src/runtime/netpoll_epoll.go b/src/runtime/netpoll_epoll.go
new file mode 100644
index 0000000..58f4fa8
--- /dev/null
+++ b/src/runtime/netpoll_epoll.go
@@ -0,0 +1,179 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+func epollcreate(size int32) int32
+func epollcreate1(flags int32) int32
+
+//go:noescape
+func epollctl(epfd, op, fd int32, ev *epollevent) int32
+
+//go:noescape
+func epollwait(epfd int32, ev *epollevent, nev, timeout int32) int32
+func closeonexec(fd int32)
+
+var (
+ epfd int32 = -1 // epoll descriptor
+
+ netpollBreakRd, netpollBreakWr uintptr // for netpollBreak
+
+ netpollWakeSig uint32 // used to avoid duplicate calls of netpollBreak
+)
+
+func netpollinit() {
+ epfd = epollcreate1(_EPOLL_CLOEXEC)
+ if epfd < 0 {
+ epfd = epollcreate(1024)
+ if epfd < 0 {
+ println("runtime: epollcreate failed with", -epfd)
+ throw("runtime: netpollinit failed")
+ }
+ closeonexec(epfd)
+ }
+ r, w, errno := nonblockingPipe()
+ if errno != 0 {
+ println("runtime: pipe failed with", -errno)
+ throw("runtime: pipe failed")
+ }
+ ev := epollevent{
+ events: _EPOLLIN,
+ }
+ *(**uintptr)(unsafe.Pointer(&ev.data)) = &netpollBreakRd
+ errno = epollctl(epfd, _EPOLL_CTL_ADD, r, &ev)
+ if errno != 0 {
+ println("runtime: epollctl failed with", -errno)
+ throw("runtime: epollctl failed")
+ }
+ netpollBreakRd = uintptr(r)
+ netpollBreakWr = uintptr(w)
+}
+
+func netpollIsPollDescriptor(fd uintptr) bool {
+ return fd == uintptr(epfd) || fd == netpollBreakRd || fd == netpollBreakWr
+}
+
+func netpollopen(fd uintptr, pd *pollDesc) int32 {
+ var ev epollevent
+ ev.events = _EPOLLIN | _EPOLLOUT | _EPOLLRDHUP | _EPOLLET
+ *(**pollDesc)(unsafe.Pointer(&ev.data)) = pd
+ return -epollctl(epfd, _EPOLL_CTL_ADD, int32(fd), &ev)
+}
+
+func netpollclose(fd uintptr) int32 {
+ var ev epollevent
+ return -epollctl(epfd, _EPOLL_CTL_DEL, int32(fd), &ev)
+}
+
+func netpollarm(pd *pollDesc, mode int) {
+ throw("runtime: unused")
+}
+
+// netpollBreak interrupts an epollwait.
+func netpollBreak() {
+ if atomic.Cas(&netpollWakeSig, 0, 1) {
+ for {
+ var b byte
+ n := write(netpollBreakWr, unsafe.Pointer(&b), 1)
+ if n == 1 {
+ break
+ }
+ if n == -_EINTR {
+ continue
+ }
+ if n == -_EAGAIN {
+ return
+ }
+ println("runtime: netpollBreak write failed with", -n)
+ throw("runtime: netpollBreak write failed")
+ }
+ }
+}
+
+// netpoll checks for ready network connections.
+// Returns list of goroutines that become runnable.
+// delay < 0: blocks indefinitely
+// delay == 0: does not block, just polls
+// delay > 0: block for up to that many nanoseconds
+func netpoll(delay int64) gList {
+ if epfd == -1 {
+ return gList{}
+ }
+ var waitms int32
+ if delay < 0 {
+ waitms = -1
+ } else if delay == 0 {
+ waitms = 0
+ } else if delay < 1e6 {
+ waitms = 1
+ } else if delay < 1e15 {
+ waitms = int32(delay / 1e6)
+ } else {
+ // An arbitrary cap on how long to wait for a timer.
+ // 1e9 ms == ~11.5 days.
+ waitms = 1e9
+ }
+ var events [128]epollevent
+retry:
+ n := epollwait(epfd, &events[0], int32(len(events)), waitms)
+ if n < 0 {
+ if n != -_EINTR {
+ println("runtime: epollwait on fd", epfd, "failed with", -n)
+ throw("runtime: netpoll failed")
+ }
+ // If a timed sleep was interrupted, just return to
+ // recalculate how long we should sleep now.
+ if waitms > 0 {
+ return gList{}
+ }
+ goto retry
+ }
+ var toRun gList
+ for i := int32(0); i < n; i++ {
+ ev := &events[i]
+ if ev.events == 0 {
+ continue
+ }
+
+ if *(**uintptr)(unsafe.Pointer(&ev.data)) == &netpollBreakRd {
+ if ev.events != _EPOLLIN {
+ println("runtime: netpoll: break fd ready for", ev.events)
+ throw("runtime: netpoll: break fd ready for something unexpected")
+ }
+ if delay != 0 {
+ // netpollBreak could be picked up by a
+ // nonblocking poll. Only read the byte
+ // if blocking.
+ var tmp [16]byte
+ read(int32(netpollBreakRd), noescape(unsafe.Pointer(&tmp[0])), int32(len(tmp)))
+ atomic.Store(&netpollWakeSig, 0)
+ }
+ continue
+ }
+
+ var mode int32
+ if ev.events&(_EPOLLIN|_EPOLLRDHUP|_EPOLLHUP|_EPOLLERR) != 0 {
+ mode += 'r'
+ }
+ if ev.events&(_EPOLLOUT|_EPOLLHUP|_EPOLLERR) != 0 {
+ mode += 'w'
+ }
+ if mode != 0 {
+ pd := *(**pollDesc)(unsafe.Pointer(&ev.data))
+ pd.everr = false
+ if ev.events == _EPOLLERR {
+ pd.everr = true
+ }
+ netpollready(&toRun, pd, mode)
+ }
+ }
+ return toRun
+}
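
The epoll poller converts the nanosecond delay handed in by the scheduler into the millisecond granularity the kernel wait takes (the AIX and Windows pollers do the equivalent). A standalone copy of just that conversion, handy for checking the edge cases: negative means block forever, zero means poll, sub-millisecond delays round up to 1ms, and very large delays are capped. The delayToMillis name is invented for the example.

package main

import "fmt"

// delayToMillis mirrors the waitms computation in netpoll_epoll.go's netpoll.
func delayToMillis(delay int64) int32 {
	switch {
	case delay < 0:
		return -1 // block indefinitely
	case delay == 0:
		return 0 // just poll
	case delay < 1e6:
		return 1 // less than 1ms still waits at least 1ms
	case delay < 1e15:
		return int32(delay / 1e6)
	default:
		return 1e9 // arbitrary cap: 1e9 ms == ~11.5 days
	}
}

func main() {
	for _, d := range []int64{-1, 0, 500, 2e6, 1e16} {
		fmt.Printf("delay=%d ns -> waitms=%d\n", d, delayToMillis(d))
	}
}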
diff --git a/src/runtime/netpoll_fake.go b/src/runtime/netpoll_fake.go
new file mode 100644
index 0000000..b2af3b8
--- /dev/null
+++ b/src/runtime/netpoll_fake.go
@@ -0,0 +1,35 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Fake network poller for wasm/js.
+// Should never be used, because wasm/js network connections do not honor "SetNonblock".
+
+// +build js,wasm
+
+package runtime
+
+func netpollinit() {
+}
+
+func netpollIsPollDescriptor(fd uintptr) bool {
+ return false
+}
+
+func netpollopen(fd uintptr, pd *pollDesc) int32 {
+ return 0
+}
+
+func netpollclose(fd uintptr) int32 {
+ return 0
+}
+
+func netpollarm(pd *pollDesc, mode int) {
+}
+
+func netpollBreak() {
+}
+
+func netpoll(delay int64) gList {
+ return gList{}
+}
diff --git a/src/runtime/netpoll_kqueue.go b/src/runtime/netpoll_kqueue.go
new file mode 100644
index 0000000..3bd93c1
--- /dev/null
+++ b/src/runtime/netpoll_kqueue.go
@@ -0,0 +1,190 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build darwin dragonfly freebsd netbsd openbsd
+
+package runtime
+
+// Integrated network poller (kqueue-based implementation).
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+var (
+ kq int32 = -1
+
+ netpollBreakRd, netpollBreakWr uintptr // for netpollBreak
+
+ netpollWakeSig uint32 // used to avoid duplicate calls of netpollBreak
+)
+
+func netpollinit() {
+ kq = kqueue()
+ if kq < 0 {
+ println("runtime: kqueue failed with", -kq)
+ throw("runtime: netpollinit failed")
+ }
+ closeonexec(kq)
+ r, w, errno := nonblockingPipe()
+ if errno != 0 {
+ println("runtime: pipe failed with", -errno)
+ throw("runtime: pipe failed")
+ }
+ ev := keventt{
+ filter: _EVFILT_READ,
+ flags: _EV_ADD,
+ }
+ *(*uintptr)(unsafe.Pointer(&ev.ident)) = uintptr(r)
+ n := kevent(kq, &ev, 1, nil, 0, nil)
+ if n < 0 {
+ println("runtime: kevent failed with", -n)
+ throw("runtime: kevent failed")
+ }
+ netpollBreakRd = uintptr(r)
+ netpollBreakWr = uintptr(w)
+}
+
+func netpollIsPollDescriptor(fd uintptr) bool {
+ return fd == uintptr(kq) || fd == netpollBreakRd || fd == netpollBreakWr
+}
+
+func netpollopen(fd uintptr, pd *pollDesc) int32 {
+ // Arm both EVFILT_READ and EVFILT_WRITE in edge-triggered mode (EV_CLEAR)
+ // for the whole fd lifetime. The notifications are automatically unregistered
+ // when fd is closed.
+ var ev [2]keventt
+ *(*uintptr)(unsafe.Pointer(&ev[0].ident)) = fd
+ ev[0].filter = _EVFILT_READ
+ ev[0].flags = _EV_ADD | _EV_CLEAR
+ ev[0].fflags = 0
+ ev[0].data = 0
+ ev[0].udata = (*byte)(unsafe.Pointer(pd))
+ ev[1] = ev[0]
+ ev[1].filter = _EVFILT_WRITE
+ n := kevent(kq, &ev[0], 2, nil, 0, nil)
+ if n < 0 {
+ return -n
+ }
+ return 0
+}
+
+func netpollclose(fd uintptr) int32 {
+ // Don't need to unregister because calling close()
+ // on fd will remove any kevents that reference the descriptor.
+ return 0
+}
+
+func netpollarm(pd *pollDesc, mode int) {
+ throw("runtime: unused")
+}
+
+// netpollBreak interrupts a kevent.
+func netpollBreak() {
+ if atomic.Cas(&netpollWakeSig, 0, 1) {
+ for {
+ var b byte
+ n := write(netpollBreakWr, unsafe.Pointer(&b), 1)
+ if n == 1 || n == -_EAGAIN {
+ break
+ }
+ if n == -_EINTR {
+ continue
+ }
+ println("runtime: netpollBreak write failed with", -n)
+ throw("runtime: netpollBreak write failed")
+ }
+ }
+}
+
+// netpoll checks for ready network connections.
+// Returns list of goroutines that become runnable.
+// delay < 0: blocks indefinitely
+// delay == 0: does not block, just polls
+// delay > 0: block for up to that many nanoseconds
+func netpoll(delay int64) gList {
+ if kq == -1 {
+ return gList{}
+ }
+ var tp *timespec
+ var ts timespec
+ if delay < 0 {
+ tp = nil
+ } else if delay == 0 {
+ tp = &ts
+ } else {
+ ts.setNsec(delay)
+ if ts.tv_sec > 1e6 {
+ // Darwin returns EINVAL if the sleep time is too long.
+ ts.tv_sec = 1e6
+ }
+ tp = &ts
+ }
+ var events [64]keventt
+retry:
+ n := kevent(kq, nil, 0, &events[0], int32(len(events)), tp)
+ if n < 0 {
+ if n != -_EINTR {
+ println("runtime: kevent on fd", kq, "failed with", -n)
+ throw("runtime: netpoll failed")
+ }
+ // If a timed sleep was interrupted, just return to
+ // recalculate how long we should sleep now.
+ if delay > 0 {
+ return gList{}
+ }
+ goto retry
+ }
+ var toRun gList
+ for i := 0; i < int(n); i++ {
+ ev := &events[i]
+
+ if uintptr(ev.ident) == netpollBreakRd {
+ if ev.filter != _EVFILT_READ {
+ println("runtime: netpoll: break fd ready for", ev.filter)
+ throw("runtime: netpoll: break fd ready for something unexpected")
+ }
+ if delay != 0 {
+ // netpollBreak could be picked up by a
+ // nonblocking poll. Only read the byte
+ // if blocking.
+ var tmp [16]byte
+ read(int32(netpollBreakRd), noescape(unsafe.Pointer(&tmp[0])), int32(len(tmp)))
+ atomic.Store(&netpollWakeSig, 0)
+ }
+ continue
+ }
+
+ var mode int32
+ switch ev.filter {
+ case _EVFILT_READ:
+ mode += 'r'
+
+ // On some systems when the read end of a pipe
+ // is closed the write end will not get a
+ // _EVFILT_WRITE event, but will get a
+ // _EVFILT_READ event with EV_EOF set.
+ // Note that setting 'w' here just means that we
+ // will wake up a goroutine waiting to write;
+ // that goroutine will try the write again,
+ // and the appropriate thing will happen based
+ // on what that write returns (success, EPIPE, EAGAIN).
+ if ev.flags&_EV_EOF != 0 {
+ mode += 'w'
+ }
+ case _EVFILT_WRITE:
+ mode += 'w'
+ }
+ if mode != 0 {
+ pd := (*pollDesc)(unsafe.Pointer(ev.udata))
+ pd.everr = false
+ if ev.flags == _EV_ERROR {
+ pd.everr = true
+ }
+ netpollready(&toRun, pd, mode)
+ }
+ }
+ return toRun
+}
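
Both the kqueue and epoll pollers register the read end of a non-blocking pipe with the kernel so that netpollBreak can interrupt a blocked wait by writing a single byte. A portable userland illustration of that wakeup channel using os.Pipe; the real pollers watch raw non-blocking fds from inside kevent/epoll_wait rather than using a blocking Read, so this is only an analogy.

package main

import (
	"fmt"
	"os"
)

func main() {
	// The poller owns both ends of a pipe: the read end is what the kernel
	// wait watches, the write end is what netpollBreak writes to.
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	defer r.Close()
	defer w.Close()

	woken := make(chan int)
	go func() {
		// Stands in for the thread blocked in kevent/epoll_wait.
		buf := make([]byte, 16) // drained in one gulp, like the tmp buffer above
		n, _ := r.Read(buf)
		woken <- n
	}()

	// Stands in for netpollBreak: one byte is enough to wake the waiter.
	if _, err := w.Write([]byte{0}); err != nil {
		panic(err)
	}
	fmt.Println("poller woken by", <-woken, "byte(s)")
}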
diff --git a/src/runtime/netpoll_os_test.go b/src/runtime/netpoll_os_test.go
new file mode 100644
index 0000000..b96b9f3
--- /dev/null
+++ b/src/runtime/netpoll_os_test.go
@@ -0,0 +1,28 @@
+package runtime_test
+
+import (
+ "runtime"
+ "sync"
+ "testing"
+)
+
+var wg sync.WaitGroup
+
+func init() {
+ runtime.NetpollGenericInit()
+}
+
+func BenchmarkNetpollBreak(b *testing.B) {
+ b.StartTimer()
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < 10; j++ {
+ wg.Add(1)
+ go func() {
+ runtime.NetpollBreak()
+ wg.Done()
+ }()
+ }
+ }
+ wg.Wait()
+ b.StopTimer()
+}
diff --git a/src/runtime/netpoll_solaris.go b/src/runtime/netpoll_solaris.go
new file mode 100644
index 0000000..d217d5b
--- /dev/null
+++ b/src/runtime/netpoll_solaris.go
@@ -0,0 +1,319 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// Solaris runtime-integrated network poller.
+//
+// Solaris uses event ports for scalable network I/O. Event
+// ports are level-triggered, unlike epoll and kqueue which
+// can be configured in both level-triggered and edge-triggered
+// mode. Level triggering means we have to keep track of a few things
+// ourselves. After we receive an event for a file descriptor,
+// it's our responsibility to ask again to be notified for future
+// events for that descriptor. When doing this we must keep track of
+// what kind of events the goroutines are currently interested in,
+// for example a fd may be open both for reading and writing.
+//
+// A description of the high level operation of this code
+// follows. Networking code will get a file descriptor by some means
+// and will register it with the netpolling mechanism by a code path
+// that eventually calls runtime·netpollopen. runtime·netpollopen
+// calls port_associate with an empty event set. That means that we
+// will not receive any events at this point. The association needs
+// to be done at this early point because we need to process the I/O
+// readiness notification at some point in the future. If I/O becomes
+// ready while nobody is listening, then by the time we finally care
+// about it, nobody will tell us anymore.
+//
+// Beside calling runtime·netpollopen, the networking code paths
+// will call runtime·netpollarm each time goroutines are interested
+// in doing network I/O. Because now we know what kind of I/O we
+// are interested in (reading/writing), we can call port_associate
+// passing the correct type of event set (POLLIN/POLLOUT). As we made
+// sure to have already associated the file descriptor with the port,
+// when we now call port_associate, we will unblock the main poller
+// loop (in runtime·netpoll) right away if the socket is actually
+// ready for I/O.
+//
+// The main poller loop runs in its own thread waiting for events
+// using port_getn. When an event happens, it will tell the scheduler
+// about it using runtime·netpollready. Besides doing this, it must
+// also re-associate the events that were not part of this current
+// notification with the file descriptor. Failing to do this would
+// mean each notification prevents concurrent code from using the
+// same file descriptor in parallel.
+//
+// The logic dealing with re-associations is encapsulated in
+// runtime·netpollupdate. This function takes care to associate the
+// descriptor only with the subset of events that were previously
+// part of the association, except the one that just happened. We
+// can't re-associate with that right away, because event ports
+// are level triggered so it would cause a busy loop. Instead, that
+// association is effected only by the runtime·netpollarm code path,
+// when Go code actually asks for I/O.
+//
+// The open and arming mechanisms are serialized using the lock
+// inside PollDesc. This is required because the netpoll loop runs
+// asynchronously with respect to other Go code, and by the time we get
+// to call port_associate to update the association in the loop, the
+// file descriptor might have been closed and reopened already. The
+// lock allows runtime·netpollupdate to be called synchronously from
+// the loop thread while preventing other threads from operating on the
+// same PollDesc, so once we unblock in the main loop, until we loop
+// again we know for sure we are always talking about the same file
+// descriptor and can safely access the data we want (the event set).
+
+//go:cgo_import_dynamic libc_port_create port_create "libc.so"
+//go:cgo_import_dynamic libc_port_associate port_associate "libc.so"
+//go:cgo_import_dynamic libc_port_dissociate port_dissociate "libc.so"
+//go:cgo_import_dynamic libc_port_getn port_getn "libc.so"
+//go:cgo_import_dynamic libc_port_alert port_alert "libc.so"
+
+//go:linkname libc_port_create libc_port_create
+//go:linkname libc_port_associate libc_port_associate
+//go:linkname libc_port_dissociate libc_port_dissociate
+//go:linkname libc_port_getn libc_port_getn
+//go:linkname libc_port_alert libc_port_alert
+
+var (
+ libc_port_create,
+ libc_port_associate,
+ libc_port_dissociate,
+ libc_port_getn,
+ libc_port_alert libcFunc
+ netpollWakeSig uint32 // used to avoid duplicate calls of netpollBreak
+)
+
+func errno() int32 {
+ return *getg().m.perrno
+}
+
+func fcntl(fd, cmd, arg int32) int32 {
+ return int32(sysvicall3(&libc_fcntl, uintptr(fd), uintptr(cmd), uintptr(arg)))
+}
+
+func port_create() int32 {
+ return int32(sysvicall0(&libc_port_create))
+}
+
+func port_associate(port, source int32, object uintptr, events uint32, user uintptr) int32 {
+ return int32(sysvicall5(&libc_port_associate, uintptr(port), uintptr(source), object, uintptr(events), user))
+}
+
+func port_dissociate(port, source int32, object uintptr) int32 {
+ return int32(sysvicall3(&libc_port_dissociate, uintptr(port), uintptr(source), object))
+}
+
+func port_getn(port int32, evs *portevent, max uint32, nget *uint32, timeout *timespec) int32 {
+ return int32(sysvicall5(&libc_port_getn, uintptr(port), uintptr(unsafe.Pointer(evs)), uintptr(max), uintptr(unsafe.Pointer(nget)), uintptr(unsafe.Pointer(timeout))))
+}
+
+func port_alert(port int32, flags, events uint32, user uintptr) int32 {
+ return int32(sysvicall4(&libc_port_alert, uintptr(port), uintptr(flags), uintptr(events), user))
+}
+
+var portfd int32 = -1
+
+func netpollinit() {
+ portfd = port_create()
+ if portfd >= 0 {
+ fcntl(portfd, _F_SETFD, _FD_CLOEXEC)
+ return
+ }
+
+ print("runtime: port_create failed (errno=", errno(), ")\n")
+ throw("runtime: netpollinit failed")
+}
+
+func netpollIsPollDescriptor(fd uintptr) bool {
+ return fd == uintptr(portfd)
+}
+
+func netpollopen(fd uintptr, pd *pollDesc) int32 {
+ lock(&pd.lock)
+ // We don't register for any specific type of events yet, that's
+ // netpollarm's job. We merely ensure we call port_associate before
+ // asynchronous connect/accept completes, so when we actually want
+ // to do any I/O, the call to port_associate (from netpollarm,
+ // with the interested event set) will unblock port_getn right away
+ // because of the I/O readiness notification.
+ pd.user = 0
+ r := port_associate(portfd, _PORT_SOURCE_FD, fd, 0, uintptr(unsafe.Pointer(pd)))
+ unlock(&pd.lock)
+ return r
+}
+
+func netpollclose(fd uintptr) int32 {
+ return port_dissociate(portfd, _PORT_SOURCE_FD, fd)
+}
+
+// Updates the association with a new set of interested events. After
+// this call, port_getn will return one and only one event for that
+// particular descriptor, so this function needs to be called again.
+func netpollupdate(pd *pollDesc, set, clear uint32) {
+ if pd.closing {
+ return
+ }
+
+ old := pd.user
+ events := (old & ^clear) | set
+ if old == events {
+ return
+ }
+
+ if events != 0 && port_associate(portfd, _PORT_SOURCE_FD, pd.fd, events, uintptr(unsafe.Pointer(pd))) != 0 {
+ print("runtime: port_associate failed (errno=", errno(), ")\n")
+ throw("runtime: netpollupdate failed")
+ }
+ pd.user = events
+}
+
+// subscribe the fd to the port such that port_getn will return one event.
+func netpollarm(pd *pollDesc, mode int) {
+ lock(&pd.lock)
+ switch mode {
+ case 'r':
+ netpollupdate(pd, _POLLIN, 0)
+ case 'w':
+ netpollupdate(pd, _POLLOUT, 0)
+ default:
+ throw("runtime: bad mode")
+ }
+ unlock(&pd.lock)
+}
+
+// netpollBreak interrupts a port_getn wait.
+func netpollBreak() {
+ if atomic.Cas(&netpollWakeSig, 0, 1) {
+ // Use port_alert to put portfd into alert mode.
+ // This will wake up all threads sleeping in port_getn on portfd,
+ // and cause their calls to port_getn to return immediately.
+ // Further, until portfd is taken out of alert mode,
+ // all calls to port_getn will return immediately.
+ if port_alert(portfd, _PORT_ALERT_UPDATE, _POLLHUP, uintptr(unsafe.Pointer(&portfd))) < 0 {
+ if e := errno(); e != _EBUSY {
+ println("runtime: port_alert failed with", e)
+ throw("runtime: netpoll: port_alert failed")
+ }
+ }
+ }
+}
+
+// netpoll checks for ready network connections.
+// Returns list of goroutines that become runnable.
+// delay < 0: blocks indefinitely
+// delay == 0: does not block, just polls
+// delay > 0: block for up to that many nanoseconds
+func netpoll(delay int64) gList {
+ if portfd == -1 {
+ return gList{}
+ }
+
+ var wait *timespec
+ var ts timespec
+ if delay < 0 {
+ wait = nil
+ } else if delay == 0 {
+ wait = &ts
+ } else {
+ ts.setNsec(delay)
+ if ts.tv_sec > 1e6 {
+ // An arbitrary cap on how long to wait for a timer.
+ // 1e6 s == ~11.5 days.
+ ts.tv_sec = 1e6
+ }
+ wait = &ts
+ }
+
+ var events [128]portevent
+retry:
+ var n uint32 = 1
+ r := port_getn(portfd, &events[0], uint32(len(events)), &n, wait)
+ e := errno()
+ if r < 0 && e == _ETIME && n > 0 {
+ // As per port_getn(3C), an ETIME failure does not preclude the
+ // delivery of some number of events. Treat a timeout failure
+ // with delivered events as a success.
+ r = 0
+ }
+ if r < 0 {
+ if e != _EINTR && e != _ETIME {
+ print("runtime: port_getn on fd ", portfd, " failed (errno=", e, ")\n")
+ throw("runtime: netpoll failed")
+ }
+ // If a timed sleep was interrupted and there are no events,
+ // just return to recalculate how long we should sleep now.
+ if delay > 0 {
+ return gList{}
+ }
+ goto retry
+ }
+
+ var toRun gList
+ for i := 0; i < int(n); i++ {
+ ev := &events[i]
+
+ if ev.portev_source == _PORT_SOURCE_ALERT {
+ if ev.portev_events != _POLLHUP || unsafe.Pointer(ev.portev_user) != unsafe.Pointer(&portfd) {
+ throw("runtime: netpoll: bad port_alert wakeup")
+ }
+ if delay != 0 {
+ // Now that a blocking call to netpoll
+ // has seen the alert, take portfd
+ // back out of alert mode.
+ // See the comment in netpollBreak.
+ if port_alert(portfd, 0, 0, 0) < 0 {
+ e := errno()
+ println("runtime: port_alert failed with", e)
+ throw("runtime: netpoll: port_alert failed")
+ }
+ atomic.Store(&netpollWakeSig, 0)
+ }
+ continue
+ }
+
+ if ev.portev_events == 0 {
+ continue
+ }
+ pd := (*pollDesc)(unsafe.Pointer(ev.portev_user))
+
+ var mode, clear int32
+ if (ev.portev_events & (_POLLIN | _POLLHUP | _POLLERR)) != 0 {
+ mode += 'r'
+ clear |= _POLLIN
+ }
+ if (ev.portev_events & (_POLLOUT | _POLLHUP | _POLLERR)) != 0 {
+ mode += 'w'
+ clear |= _POLLOUT
+ }
+ // To get edge-triggered behavior, we need to be sure to
+ // update our association with whatever events were not
+ // part of this notification. For example, if we are registered
+ // for POLLIN|POLLOUT and we get POLLIN, besides waking
+ // the goroutine interested in POLLIN we must not forget
+ // about the one interested in POLLOUT.
+ if clear != 0 {
+ lock(&pd.lock)
+ netpollupdate(pd, 0, uint32(clear))
+ unlock(&pd.lock)
+ }
+
+ if mode != 0 {
+ // TODO(mikio): Consider implementing event
+ // scanning error reporting once we are sure
+ // about the event port on SmartOS.
+ //
+ // See golang.org/x/issue/30840.
+ netpollready(&toRun, pd, mode)
+ }
+ }
+
+ return toRun
+}
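
The heart of the level-triggered bookkeeping is the event-set arithmetic in netpollupdate: keep whatever was previously armed except the bits being cleared, plus the bits being set, and only re-associate when the set actually changes. A tiny standalone version of just that arithmetic; the constants are illustrative values rather than the Solaris header ones.

package main

import "fmt"

// Illustrative event bits; the real values come from the system headers.
const (
	pollIn  uint32 = 0x1
	pollOut uint32 = 0x2
)

// update mirrors the events computation in netpollupdate.
func update(old, set, clear uint32) uint32 {
	return (old &^ clear) | set
}

func main() {
	armed := update(0, pollIn|pollOut, 0) // goroutines want both read and write
	fmt.Printf("armed %#x\n", armed)

	// A read event fired; the port dropped the association, so re-arm with
	// everything except POLLIN (which netpollarm adds back on demand).
	armed = update(armed, 0, pollIn)
	fmt.Printf("after read event, re-armed %#x\n", armed) // only pollOut left
}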
diff --git a/src/runtime/netpoll_stub.go b/src/runtime/netpoll_stub.go
new file mode 100644
index 0000000..3599f2d
--- /dev/null
+++ b/src/runtime/netpoll_stub.go
@@ -0,0 +1,61 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build plan9
+
+package runtime
+
+import "runtime/internal/atomic"
+
+var netpollInited uint32
+var netpollWaiters uint32
+
+var netpollStubLock mutex
+var netpollNote note
+
+// netpollBroken, protected by netpollBrokenLock, avoids a double notewakeup.
+var netpollBrokenLock mutex
+var netpollBroken bool
+
+func netpollGenericInit() {
+ atomic.Store(&netpollInited, 1)
+}
+
+func netpollBreak() {
+ lock(&netpollBrokenLock)
+ broken := netpollBroken
+ netpollBroken = true
+ if !broken {
+ notewakeup(&netpollNote)
+ }
+ unlock(&netpollBrokenLock)
+}
+
+// Polls for ready network connections.
+// Returns list of goroutines that become runnable.
+func netpoll(delay int64) gList {
+ // Implementation for platforms that do not support
+ // integrated network poller.
+ if delay != 0 {
+ // This lock ensures that only one goroutine tries to use
+ // the note. It should normally be completely uncontended.
+ lock(&netpollStubLock)
+
+ lock(&netpollBrokenLock)
+ noteclear(&netpollNote)
+ netpollBroken = false
+ unlock(&netpollBrokenLock)
+
+ notetsleep(&netpollNote, delay)
+ unlock(&netpollStubLock)
+ // Guard against starvation in case the lock is contended
+ // (eg when running TestNetpollBreak).
+ osyield()
+ }
+ return gList{}
+}
+
+func netpollinited() bool {
+ return atomic.Load(&netpollInited) != 0
+}
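
The stub poller's structure, sleep for at most delay, let a concurrent netpollBreak cut the sleep short, and make duplicate breaks harmless via the broken flag, can be mimicked in ordinary Go. The sketch below uses a mutex, a boolean, and a one-slot channel in place of the runtime's note; the stubPoller type and method names are invented for the example.

package main

import (
	"fmt"
	"sync"
	"time"
)

type stubPoller struct {
	mu     sync.Mutex
	broken bool          // analogue of netpollBroken
	wake   chan struct{} // buffered, capacity 1: at most one pending wakeup
}

func newStubPoller() *stubPoller {
	return &stubPoller{wake: make(chan struct{}, 1)}
}

func (p *stubPoller) netpollBreak() {
	p.mu.Lock()
	if !p.broken {
		p.broken = true
		p.wake <- struct{}{} // analogue of notewakeup, done at most once
	}
	p.mu.Unlock()
}

func (p *stubPoller) netpoll(delay time.Duration) {
	p.mu.Lock()
	p.broken = false
	select { // analogue of noteclear: drop any stale wakeup
	case <-p.wake:
	default:
	}
	p.mu.Unlock()

	select { // analogue of notetsleep
	case <-p.wake:
		fmt.Println("sleep interrupted by netpollBreak")
	case <-time.After(delay):
		fmt.Println("slept the full delay")
	}
}

func main() {
	p := newStubPoller()
	go func() {
		time.Sleep(10 * time.Millisecond)
		p.netpollBreak()
		p.netpollBreak() // duplicate; harmless thanks to the broken flag
	}()
	p.netpoll(time.Second)
}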
diff --git a/src/runtime/netpoll_windows.go b/src/runtime/netpoll_windows.go
new file mode 100644
index 0000000..4c1cd26
--- /dev/null
+++ b/src/runtime/netpoll_windows.go
@@ -0,0 +1,156 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const _DWORD_MAX = 0xffffffff
+
+const _INVALID_HANDLE_VALUE = ^uintptr(0)
+
+// net_op must be the same as the beginning of internal/poll.operation.
+// Keep these in sync.
+type net_op struct {
+ // used by windows
+ o overlapped
+ // used by netpoll
+ pd *pollDesc
+ mode int32
+ errno int32
+ qty uint32
+}
+
+type overlappedEntry struct {
+ key uintptr
+ op *net_op // In reality it's *overlapped, but we cast it to *net_op anyway.
+ internal uintptr
+ qty uint32
+}
+
+var (
+ iocphandle uintptr = _INVALID_HANDLE_VALUE // completion port io handle
+
+ netpollWakeSig uint32 // used to avoid duplicate calls of netpollBreak
+)
+
+func netpollinit() {
+ iocphandle = stdcall4(_CreateIoCompletionPort, _INVALID_HANDLE_VALUE, 0, 0, _DWORD_MAX)
+ if iocphandle == 0 {
+ println("runtime: CreateIoCompletionPort failed (errno=", getlasterror(), ")")
+ throw("runtime: netpollinit failed")
+ }
+}
+
+func netpollIsPollDescriptor(fd uintptr) bool {
+ return fd == iocphandle
+}
+
+func netpollopen(fd uintptr, pd *pollDesc) int32 {
+ if stdcall4(_CreateIoCompletionPort, fd, iocphandle, 0, 0) == 0 {
+ return int32(getlasterror())
+ }
+ return 0
+}
+
+func netpollclose(fd uintptr) int32 {
+ // nothing to do
+ return 0
+}
+
+func netpollarm(pd *pollDesc, mode int) {
+ throw("runtime: unused")
+}
+
+func netpollBreak() {
+ if atomic.Cas(&netpollWakeSig, 0, 1) {
+ if stdcall4(_PostQueuedCompletionStatus, iocphandle, 0, 0, 0) == 0 {
+ println("runtime: netpoll: PostQueuedCompletionStatus failed (errno=", getlasterror(), ")")
+ throw("runtime: netpoll: PostQueuedCompletionStatus failed")
+ }
+ }
+}
+
+// netpoll checks for ready network connections.
+// Returns list of goroutines that become runnable.
+// delay < 0: blocks indefinitely
+// delay == 0: does not block, just polls
+// delay > 0: block for up to that many nanoseconds
+func netpoll(delay int64) gList {
+ var entries [64]overlappedEntry
+ var wait, qty, flags, n, i uint32
+ var errno int32
+ var op *net_op
+ var toRun gList
+
+ mp := getg().m
+
+ if iocphandle == _INVALID_HANDLE_VALUE {
+ return gList{}
+ }
+ if delay < 0 {
+ wait = _INFINITE
+ } else if delay == 0 {
+ wait = 0
+ } else if delay < 1e6 {
+ wait = 1
+ } else if delay < 1e15 {
+ wait = uint32(delay / 1e6)
+ } else {
+ // An arbitrary cap on how long to wait for a timer.
+ // 1e9 ms == ~11.5 days.
+ wait = 1e9
+ }
+
+ n = uint32(len(entries) / int(gomaxprocs))
+ if n < 8 {
+ n = 8
+ }
+ if delay != 0 {
+ mp.blocked = true
+ }
+ if stdcall6(_GetQueuedCompletionStatusEx, iocphandle, uintptr(unsafe.Pointer(&entries[0])), uintptr(n), uintptr(unsafe.Pointer(&n)), uintptr(wait), 0) == 0 {
+ mp.blocked = false
+ errno = int32(getlasterror())
+ if errno == _WAIT_TIMEOUT {
+ return gList{}
+ }
+ println("runtime: GetQueuedCompletionStatusEx failed (errno=", errno, ")")
+ throw("runtime: netpoll failed")
+ }
+ mp.blocked = false
+ for i = 0; i < n; i++ {
+ op = entries[i].op
+ if op != nil {
+ errno = 0
+ qty = 0
+ if stdcall5(_WSAGetOverlappedResult, op.pd.fd, uintptr(unsafe.Pointer(op)), uintptr(unsafe.Pointer(&qty)), 0, uintptr(unsafe.Pointer(&flags))) == 0 {
+ errno = int32(getlasterror())
+ }
+ handlecompletion(&toRun, op, errno, qty)
+ } else {
+ atomic.Store(&netpollWakeSig, 0)
+ if delay == 0 {
+ // Forward the notification to the
+ // blocked poller.
+ netpollBreak()
+ }
+ }
+ }
+ return toRun
+}
+
+func handlecompletion(toRun *gList, op *net_op, errno int32, qty uint32) {
+ mode := op.mode
+ if mode != 'r' && mode != 'w' {
+ println("runtime: GetQueuedCompletionStatusEx returned invalid mode=", mode)
+ throw("runtime: netpoll failed")
+ }
+ op.errno = errno
+ op.qty = qty
+ netpollready(toRun, op.pd, mode)
+}
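
All of the platform pollers hand netpollready a readiness mode encoded as 'r', 'w', or 'r'+'w' in a single int32 (the Windows completion path above only ever produces 'r' or 'w'). A two-line decoder makes the convention explicit; the decode helper is invented for the example.

package main

import "fmt"

// decode interprets the mode value passed to netpollready: 'r', 'w',
// or 'r'+'w' for both directions at once.
func decode(mode int32) (read, write bool) {
	return mode == 'r' || mode == 'r'+'w', mode == 'w' || mode == 'r'+'w'
}

func main() {
	for _, m := range []int32{'r', 'w', 'r' + 'w'} {
		r, w := decode(m)
		fmt.Printf("mode=%d read=%v write=%v\n", m, r, w)
	}
}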
diff --git a/src/runtime/norace_linux_test.go b/src/runtime/norace_linux_test.go
new file mode 100644
index 0000000..3517a5d
--- /dev/null
+++ b/src/runtime/norace_linux_test.go
@@ -0,0 +1,41 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file contains tests that cannot run under the race detector for some reason.
+// +build !race
+
+package runtime_test
+
+import (
+ "runtime"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+var newOSProcDone bool
+
+//go:nosplit
+func newOSProcCreated() {
+ newOSProcDone = true
+}
+
+// Can't be run with -race because it inserts calls into newOSProcCreated()
+// that require a valid G/M.
+func TestNewOSProc0(t *testing.T) {
+ runtime.NewOSProc0(0x800000, unsafe.Pointer(runtime.FuncPC(newOSProcCreated)))
+ check := time.NewTicker(100 * time.Millisecond)
+ defer check.Stop()
+ end := time.After(5 * time.Second)
+ for {
+ select {
+ case <-check.C:
+ if newOSProcDone {
+ return
+ }
+ case <-end:
+ t.Fatalf("couldn't create new OS process")
+ }
+ }
+}
diff --git a/src/runtime/norace_test.go b/src/runtime/norace_test.go
new file mode 100644
index 0000000..e90128b
--- /dev/null
+++ b/src/runtime/norace_test.go
@@ -0,0 +1,46 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file contains tests that cannot run under the race detector for some reason.
+// +build !race
+
+package runtime_test
+
+import (
+ "runtime"
+ "testing"
+)
+
+// Syscall tests split the stack between Entersyscall and Exitsyscall under the race detector.
+func BenchmarkSyscall(b *testing.B) {
+ benchmarkSyscall(b, 0, 1)
+}
+
+func BenchmarkSyscallWork(b *testing.B) {
+ benchmarkSyscall(b, 100, 1)
+}
+
+func BenchmarkSyscallExcess(b *testing.B) {
+ benchmarkSyscall(b, 0, 4)
+}
+
+func BenchmarkSyscallExcessWork(b *testing.B) {
+ benchmarkSyscall(b, 100, 4)
+}
+
+func benchmarkSyscall(b *testing.B, work, excess int) {
+ b.SetParallelism(excess)
+ b.RunParallel(func(pb *testing.PB) {
+ foo := 42
+ for pb.Next() {
+ runtime.Entersyscall()
+ for i := 0; i < work; i++ {
+ foo *= 2
+ foo /= 2
+ }
+ runtime.Exitsyscall()
+ }
+ _ = foo
+ })
+}
diff --git a/src/runtime/numcpu_freebsd_test.go b/src/runtime/numcpu_freebsd_test.go
new file mode 100644
index 0000000..e78890a
--- /dev/null
+++ b/src/runtime/numcpu_freebsd_test.go
@@ -0,0 +1,15 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import "testing"
+
+func TestFreeBSDNumCPU(t *testing.T) {
+ got := runTestProg(t, "testprog", "FreeBSDNumCPU")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got:\n%s", want, got)
+ }
+}
diff --git a/src/runtime/os2_aix.go b/src/runtime/os2_aix.go
new file mode 100644
index 0000000..428ff7f
--- /dev/null
+++ b/src/runtime/os2_aix.go
@@ -0,0 +1,756 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file contains the main runtime AIX syscalls.
+// Pollset syscalls are in netpoll_aix.go.
+// The implementation is based on the Solaris and Windows ones.
+// Each syscall is made by calling its libc symbol using the asmcgocall and
+// asmsyscall6 assembly functions.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+// Symbols imported for __start function.
+
+//go:cgo_import_dynamic libc___n_pthreads __n_pthreads "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libc___mod_init __mod_init "libc.a/shr_64.o"
+//go:linkname libc___n_pthreads libc___n_pthread
+//go:linkname libc___mod_init libc___mod_init
+
+var (
+ libc___n_pthread,
+ libc___mod_init libFunc
+)
+
+// Syscalls
+
+//go:cgo_import_dynamic libc__Errno _Errno "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_clock_gettime clock_gettime "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_close close "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_exit exit "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_getpid getpid "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_getsystemcfg getsystemcfg "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_kill kill "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_madvise madvise "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_malloc malloc "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_mmap mmap "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_mprotect mprotect "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_munmap munmap "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_open open "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_pipe pipe "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_raise raise "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_read read "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sched_yield sched_yield "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sem_init sem_init "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sem_post sem_post "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sem_timedwait sem_timedwait "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sem_wait sem_wait "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_setitimer setitimer "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sigaction sigaction "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sigaltstack sigaltstack "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sysconf sysconf "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_usleep usleep "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_write write "libc.a/shr_64.o"
+
+//go:cgo_import_dynamic libpthread___pth_init __pth_init "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_attr_destroy pthread_attr_destroy "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_attr_init pthread_attr_init "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_attr_getstacksize pthread_attr_getstacksize "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_attr_setstacksize pthread_attr_setstacksize "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_attr_setdetachstate pthread_attr_setdetachstate "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_attr_setstackaddr pthread_attr_setstackaddr "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_create pthread_create "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_sigthreadmask sigthreadmask "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_self pthread_self "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_kill pthread_kill "libpthread.a/shr_xpg5_64.o"
+
+//go:linkname libc__Errno libc__Errno
+//go:linkname libc_clock_gettime libc_clock_gettime
+//go:linkname libc_close libc_close
+//go:linkname libc_exit libc_exit
+//go:linkname libc_getpid libc_getpid
+//go:linkname libc_getsystemcfg libc_getsystemcfg
+//go:linkname libc_kill libc_kill
+//go:linkname libc_madvise libc_madvise
+//go:linkname libc_malloc libc_malloc
+//go:linkname libc_mmap libc_mmap
+//go:linkname libc_mprotect libc_mprotect
+//go:linkname libc_munmap libc_munmap
+//go:linkname libc_open libc_open
+//go:linkname libc_pipe libc_pipe
+//go:linkname libc_raise libc_raise
+//go:linkname libc_read libc_read
+//go:linkname libc_sched_yield libc_sched_yield
+//go:linkname libc_sem_init libc_sem_init
+//go:linkname libc_sem_post libc_sem_post
+//go:linkname libc_sem_timedwait libc_sem_timedwait
+//go:linkname libc_sem_wait libc_sem_wait
+//go:linkname libc_setitimer libc_setitimer
+//go:linkname libc_sigaction libc_sigaction
+//go:linkname libc_sigaltstack libc_sigaltstack
+//go:linkname libc_sysconf libc_sysconf
+//go:linkname libc_usleep libc_usleep
+//go:linkname libc_write libc_write
+
+//go:linkname libpthread___pth_init libpthread___pth_init
+//go:linkname libpthread_attr_destroy libpthread_attr_destroy
+//go:linkname libpthread_attr_init libpthread_attr_init
+//go:linkname libpthread_attr_getstacksize libpthread_attr_getstacksize
+//go:linkname libpthread_attr_setstacksize libpthread_attr_setstacksize
+//go:linkname libpthread_attr_setdetachstate libpthread_attr_setdetachstate
+//go:linkname libpthread_attr_setstackaddr libpthread_attr_setstackaddr
+//go:linkname libpthread_create libpthread_create
+//go:linkname libpthread_sigthreadmask libpthread_sigthreadmask
+//go:linkname libpthread_self libpthread_self
+//go:linkname libpthread_kill libpthread_kill
+
+var (
+ //libc
+ libc__Errno,
+ libc_clock_gettime,
+ libc_close,
+ libc_exit,
+ libc_getpid,
+ libc_getsystemcfg,
+ libc_kill,
+ libc_madvise,
+ libc_malloc,
+ libc_mmap,
+ libc_mprotect,
+ libc_munmap,
+ libc_open,
+ libc_pipe,
+ libc_raise,
+ libc_read,
+ libc_sched_yield,
+ libc_sem_init,
+ libc_sem_post,
+ libc_sem_timedwait,
+ libc_sem_wait,
+ libc_setitimer,
+ libc_sigaction,
+ libc_sigaltstack,
+ libc_sysconf,
+ libc_usleep,
+ libc_write,
+ // libpthread
+ libpthread___pth_init,
+ libpthread_attr_destroy,
+ libpthread_attr_init,
+ libpthread_attr_getstacksize,
+ libpthread_attr_setstacksize,
+ libpthread_attr_setdetachstate,
+ libpthread_attr_setstackaddr,
+ libpthread_create,
+ libpthread_sigthreadmask,
+ libpthread_self,
+ libpthread_kill libFunc
+)
+
+type libFunc uintptr
+
+// asmsyscall6 calls the libc symbol using a C convention.
+// It's defined in sys_aix_ppc64.s.
+var asmsyscall6 libFunc
+
+// The syscallX functions must always be called with g != nil and m != nil,
+// as they rely on g.m.libcall to pass arguments to asmcgocall.
+// The few callers that don't have a g or an m must instead use the
+// equivalent functions defined in sys_aix_ppc64.s.
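+//
+// A minimal sketch of the resulting wrapper shape (mirroring closefd below,
+// which wraps libc_close with syscall1):
+//
+//	//go:nosplit
+//	func closefd(fd int32) int32 {
+//		r, _ := syscall1(&libc_close, uintptr(fd))
+//		return int32(r)
+//	}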
+
+//go:nowritebarrier
+//go:nosplit
+func syscall0(fn *libFunc) (r, err uintptr) {
+ gp := getg()
+ mp := gp.m
+ resetLibcall := true
+ if mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be set last: once the async cpu profiler sees
+ // all three values as non-zero, it will use them.
+ mp.libcallsp = getcallersp()
+ } else {
+ resetLibcall = false // See comment in sys_darwin.go:libcCall
+ }
+
+ c := libcall{
+ fn: uintptr(unsafe.Pointer(fn)),
+ n: 0,
+ args: uintptr(unsafe.Pointer(&fn)), // args is unused but must be non-nil, otherwise the call crashes
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+
+ return c.r1, c.err
+}
+
+//go:nowritebarrier
+//go:nosplit
+func syscall1(fn *libFunc, a0 uintptr) (r, err uintptr) {
+ gp := getg()
+ mp := gp.m
+ resetLibcall := true
+ if mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be set last: once the async cpu profiler sees
+ // all three values as non-zero, it will use them.
+ mp.libcallsp = getcallersp()
+ } else {
+ resetLibcall = false // See comment in sys_darwin.go:libcCall
+ }
+
+ c := libcall{
+ fn: uintptr(unsafe.Pointer(fn)),
+ n: 1,
+ args: uintptr(unsafe.Pointer(&a0)),
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+
+ return c.r1, c.err
+}
+
+//go:nowritebarrier
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall2(fn *libFunc, a0, a1 uintptr) (r, err uintptr) {
+ gp := getg()
+ mp := gp.m
+ resetLibcall := true
+ if mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be set last: once the async cpu profiler sees
+ // all three values as non-zero, it will use them.
+ mp.libcallsp = getcallersp()
+ } else {
+ resetLibcall = false // See comment in sys_darwin.go:libcCall
+ }
+
+ c := libcall{
+ fn: uintptr(unsafe.Pointer(fn)),
+ n: 2,
+ args: uintptr(unsafe.Pointer(&a0)),
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+
+ return c.r1, c.err
+}
+
+//go:nowritebarrier
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall3(fn *libFunc, a0, a1, a2 uintptr) (r, err uintptr) {
+ gp := getg()
+ mp := gp.m
+ resetLibcall := true
+ if mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be set last: once the async cpu profiler sees
+ // all three values as non-zero, it will use them.
+ mp.libcallsp = getcallersp()
+ } else {
+ resetLibcall = false // See comment in sys_darwin.go:libcCall
+ }
+
+ c := libcall{
+ fn: uintptr(unsafe.Pointer(fn)),
+ n: 3,
+ args: uintptr(unsafe.Pointer(&a0)),
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+
+ return c.r1, c.err
+}
+
+//go:nowritebarrier
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall4(fn *libFunc, a0, a1, a2, a3 uintptr) (r, err uintptr) {
+ gp := getg()
+ mp := gp.m
+ resetLibcall := true
+ if mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be set last: once the async cpu profiler sees
+ // all three values as non-zero, it will use them.
+ mp.libcallsp = getcallersp()
+ } else {
+ resetLibcall = false // See comment in sys_darwin.go:libcCall
+ }
+
+ c := libcall{
+ fn: uintptr(unsafe.Pointer(fn)),
+ n: 4,
+ args: uintptr(unsafe.Pointer(&a0)),
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+
+ return c.r1, c.err
+}
+
+//go:nowritebarrier
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall5(fn *libFunc, a0, a1, a2, a3, a4 uintptr) (r, err uintptr) {
+ gp := getg()
+ mp := gp.m
+ resetLibcall := true
+ if mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be set last: once the async cpu profiler sees
+ // all three values as non-zero, it will use them.
+ mp.libcallsp = getcallersp()
+ } else {
+ resetLibcall = false // See comment in sys_darwin.go:libcCall
+ }
+
+ c := libcall{
+ fn: uintptr(unsafe.Pointer(fn)),
+ n: 5,
+ args: uintptr(unsafe.Pointer(&a0)),
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+
+ return c.r1, c.err
+}
+
+//go:nowritebarrier
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall6(fn *libFunc, a0, a1, a2, a3, a4, a5 uintptr) (r, err uintptr) {
+ gp := getg()
+ mp := gp.m
+ resetLibcall := true
+ if mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be set last: once the async cpu profiler sees
+ // all three values as non-zero, it will use them.
+ mp.libcallsp = getcallersp()
+ } else {
+ resetLibcall = false // See comment in sys_darwin.go:libcCall
+ }
+
+ c := libcall{
+ fn: uintptr(unsafe.Pointer(fn)),
+ n: 6,
+ args: uintptr(unsafe.Pointer(&a0)),
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+
+ return c.r1, c.err
+}
+
+func exit1(code int32)
+
+//go:nosplit
+func exit(code int32) {
+ _g_ := getg()
+
+ // Check the validity of g because this can be called
+ // without a g during newosproc0.
+ if _g_ != nil {
+ syscall1(&libc_exit, uintptr(code))
+ return
+ }
+ exit1(code)
+}
+
+func write2(fd, p uintptr, n int32) int32
+
+//go:nosplit
+func write1(fd uintptr, p unsafe.Pointer, n int32) int32 {
+ _g_ := getg()
+
+ // Check the validity of g because this can be called
+ // without a g during newosproc0.
+ if _g_ != nil {
+ r, errno := syscall3(&libc_write, uintptr(fd), uintptr(p), uintptr(n))
+ if int32(r) < 0 {
+ return -int32(errno)
+ }
+ return int32(r)
+ }
+ // Note that in this case we can't return a valid errno value.
+ return write2(fd, uintptr(p), n)
+
+}
+
+//go:nosplit
+func read(fd int32, p unsafe.Pointer, n int32) int32 {
+ r, errno := syscall3(&libc_read, uintptr(fd), uintptr(p), uintptr(n))
+ if int32(r) < 0 {
+ return -int32(errno)
+ }
+ return int32(r)
+}
+
+//go:nosplit
+func open(name *byte, mode, perm int32) int32 {
+ r, _ := syscall3(&libc_open, uintptr(unsafe.Pointer(name)), uintptr(mode), uintptr(perm))
+ return int32(r)
+}
+
+//go:nosplit
+func closefd(fd int32) int32 {
+ r, _ := syscall1(&libc_close, uintptr(fd))
+ return int32(r)
+}
+
+//go:nosplit
+func pipe() (r, w int32, errno int32) {
+ var p [2]int32
+ _, err := syscall1(&libc_pipe, uintptr(noescape(unsafe.Pointer(&p[0]))))
+ return p[0], p[1], int32(err)
+}
+
+// mmap calls the mmap system call.
+// We only pass the lower 32 bits of file offset to the
+// assembly routine; the higher bits (if required) should be provided
+// by the assembly routine as 0.
+// The err result is an OS error code such as ENOMEM.
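+//
+// Illustrative call site (a sketch only; the _PROT_*/_MAP_* constants are
+// assumed to come from the AIX defs files rather than from this file):
+//
+//	p, err := mmap(nil, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+//	if p == nil {
+//		// err holds the OS error code, e.g. ENOMEM.
+//	}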
+//go:nosplit
+func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (unsafe.Pointer, int) {
+ r, err0 := syscall6(&libc_mmap, uintptr(addr), uintptr(n), uintptr(prot), uintptr(flags), uintptr(fd), uintptr(off))
+ if r == ^uintptr(0) {
+ return nil, int(err0)
+ }
+ return unsafe.Pointer(r), int(err0)
+}
+
+//go:nosplit
+func mprotect(addr unsafe.Pointer, n uintptr, prot int32) (unsafe.Pointer, int) {
+ r, err0 := syscall3(&libc_mprotect, uintptr(addr), uintptr(n), uintptr(prot))
+ if r == ^uintptr(0) {
+ return nil, int(err0)
+ }
+ return unsafe.Pointer(r), int(err0)
+}
+
+//go:nosplit
+func munmap(addr unsafe.Pointer, n uintptr) {
+ r, err := syscall2(&libc_munmap, uintptr(addr), uintptr(n))
+ if int32(r) == -1 {
+ println("syscall munmap failed: ", hex(err))
+ throw("syscall munmap")
+ }
+}
+
+//go:nosplit
+func madvise(addr unsafe.Pointer, n uintptr, flags int32) {
+ r, err := syscall3(&libc_madvise, uintptr(addr), uintptr(n), uintptr(flags))
+ if int32(r) == -1 {
+ println("syscall madvise failed: ", hex(err))
+ throw("syscall madvise")
+ }
+}
+
+func sigaction1(sig, new, old uintptr)
+
+//go:nosplit
+func sigaction(sig uintptr, new, old *sigactiont) {
+ _g_ := getg()
+
+ // Check the validity of g because this can be called
+ // without a g during runtime.libpreinit.
+ if _g_ != nil {
+ r, err := syscall3(&libc_sigaction, sig, uintptr(unsafe.Pointer(new)), uintptr(unsafe.Pointer(old)))
+ if int32(r) == -1 {
+ println("Sigaction failed for sig: ", sig, " with error:", hex(err))
+ throw("syscall sigaction")
+ }
+ return
+ }
+
+ sigaction1(sig, uintptr(unsafe.Pointer(new)), uintptr(unsafe.Pointer(old)))
+}
+
+//go:nosplit
+func sigaltstack(new, old *stackt) {
+ r, err := syscall2(&libc_sigaltstack, uintptr(unsafe.Pointer(new)), uintptr(unsafe.Pointer(old)))
+ if int32(r) == -1 {
+ println("syscall sigaltstack failed: ", hex(err))
+ throw("syscall sigaltstack")
+ }
+}
+
+//go:nosplit
+//go:linkname internal_cpu_getsystemcfg internal/cpu.getsystemcfg
+func internal_cpu_getsystemcfg(label uint) uint {
+ r, _ := syscall1(&libc_getsystemcfg, uintptr(label))
+ return uint(r)
+}
+
+func usleep1(us uint32)
+
+//go:nosplit
+func usleep(us uint32) {
+ _g_ := getg()
+
+ // Check the validity of m because we might be called on the cgo callback
+ // path early enough that there isn't a g or an m available yet.
+ if _g_ != nil && _g_.m != nil {
+ r, err := syscall1(&libc_usleep, uintptr(us))
+ if int32(r) == -1 {
+ println("syscall usleep failed: ", hex(err))
+ throw("syscall usleep")
+ }
+ return
+ }
+ usleep1(us)
+}
+
+//go:nosplit
+func clock_gettime(clockid int32, tp *timespec) int32 {
+ r, _ := syscall2(&libc_clock_gettime, uintptr(clockid), uintptr(unsafe.Pointer(tp)))
+ return int32(r)
+}
+
+//go:nosplit
+func setitimer(mode int32, new, old *itimerval) {
+ r, err := syscall3(&libc_setitimer, uintptr(mode), uintptr(unsafe.Pointer(new)), uintptr(unsafe.Pointer(old)))
+ if int32(r) == -1 {
+ println("syscall setitimer failed: ", hex(err))
+ throw("syscall setitimer")
+ }
+}
+
+//go:nosplit
+func malloc(size uintptr) unsafe.Pointer {
+ r, _ := syscall1(&libc_malloc, size)
+ return unsafe.Pointer(r)
+}
+
+//go:nosplit
+func sem_init(sem *semt, pshared int32, value uint32) int32 {
+ r, _ := syscall3(&libc_sem_init, uintptr(unsafe.Pointer(sem)), uintptr(pshared), uintptr(value))
+ return int32(r)
+}
+
+//go:nosplit
+func sem_wait(sem *semt) (int32, int32) {
+ r, err := syscall1(&libc_sem_wait, uintptr(unsafe.Pointer(sem)))
+ return int32(r), int32(err)
+}
+
+//go:nosplit
+func sem_post(sem *semt) int32 {
+ r, _ := syscall1(&libc_sem_post, uintptr(unsafe.Pointer(sem)))
+ return int32(r)
+}
+
+//go:nosplit
+func sem_timedwait(sem *semt, timeout *timespec) (int32, int32) {
+ r, err := syscall2(&libc_sem_timedwait, uintptr(unsafe.Pointer(sem)), uintptr(unsafe.Pointer(timeout)))
+ return int32(r), int32(err)
+}
+
+//go:nosplit
+func raise(sig uint32) {
+ r, err := syscall1(&libc_raise, uintptr(sig))
+ if int32(r) == -1 {
+ println("syscall raise failed: ", hex(err))
+ throw("syscall raise")
+ }
+}
+
+//go:nosplit
+func raiseproc(sig uint32) {
+ pid, err := syscall0(&libc_getpid)
+ if int32(pid) == -1 {
+ println("syscall getpid failed: ", hex(err))
+ throw("syscall raiseproc")
+ }
+
+ syscall2(&libc_kill, pid, uintptr(sig))
+}
+
+func osyield1()
+
+//go:nosplit
+func osyield() {
+ _g_ := getg()
+
+ // Check the validity of m because this might be called during a cgo
+ // callback early enough that m isn't available yet.
+ if _g_ != nil && _g_.m != nil {
+ r, err := syscall0(&libc_sched_yield)
+ if int32(r) == -1 {
+ println("syscall osyield failed: ", hex(err))
+ throw("syscall osyield")
+ }
+ return
+ }
+ osyield1()
+}
+
+//go:nosplit
+func sysconf(name int32) uintptr {
+ r, _ := syscall1(&libc_sysconf, uintptr(name))
+ if int32(r) == -1 {
+ throw("syscall sysconf")
+ }
+ return r
+
+}
+
+// pthread functions return their error code in the main return value.
+// Therefore, the err returned by syscall means nothing and must not be used.
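+//
+// In other words, the wrappers below follow this pattern (sketch mirroring
+// pthread_attr_init further down):
+//
+//	r, _ := syscall1(&libpthread_attr_init, uintptr(unsafe.Pointer(attr)))
+//	if int32(r) != 0 {
+//		// r is the pthread error code; the err result carries no meaning here.
+//	}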
+
+//go:nosplit
+func pthread_attr_destroy(attr *pthread_attr) int32 {
+ r, _ := syscall1(&libpthread_attr_destroy, uintptr(unsafe.Pointer(attr)))
+ return int32(r)
+}
+
+func pthread_attr_init1(attr uintptr) int32
+
+//go:nosplit
+func pthread_attr_init(attr *pthread_attr) int32 {
+ _g_ := getg()
+
+ // Check the validity of g because this can be called
+ // without a g during newosproc0.
+ if _g_ != nil {
+ r, _ := syscall1(&libpthread_attr_init, uintptr(unsafe.Pointer(attr)))
+ return int32(r)
+ }
+
+ return pthread_attr_init1(uintptr(unsafe.Pointer(attr)))
+}
+
+func pthread_attr_setdetachstate1(attr uintptr, state int32) int32
+
+//go:nosplit
+func pthread_attr_setdetachstate(attr *pthread_attr, state int32) int32 {
+ _g_ := getg()
+
+ // Check the validity of g because this can be called
+ // without a g during newosproc0.
+ if _g_ != nil {
+ r, _ := syscall2(&libpthread_attr_setdetachstate, uintptr(unsafe.Pointer(attr)), uintptr(state))
+ return int32(r)
+ }
+
+ return pthread_attr_setdetachstate1(uintptr(unsafe.Pointer(attr)), state)
+}
+
+//go:nosplit
+func pthread_attr_setstackaddr(attr *pthread_attr, stk unsafe.Pointer) int32 {
+ r, _ := syscall2(&libpthread_attr_setstackaddr, uintptr(unsafe.Pointer(attr)), uintptr(stk))
+ return int32(r)
+}
+
+//go:nosplit
+func pthread_attr_getstacksize(attr *pthread_attr, size *uint64) int32 {
+ r, _ := syscall2(&libpthread_attr_getstacksize, uintptr(unsafe.Pointer(attr)), uintptr(unsafe.Pointer(size)))
+ return int32(r)
+}
+
+func pthread_attr_setstacksize1(attr uintptr, size uint64) int32
+
+//go:nosplit
+func pthread_attr_setstacksize(attr *pthread_attr, size uint64) int32 {
+ _g_ := getg()
+
+ // Check the validity of g because this can be called
+ // without a g during newosproc0.
+ if _g_ != nil {
+ r, _ := syscall2(&libpthread_attr_setstacksize, uintptr(unsafe.Pointer(attr)), uintptr(size))
+ return int32(r)
+ }
+
+ return pthread_attr_setstacksize1(uintptr(unsafe.Pointer(attr)), size)
+}
+
+func pthread_create1(tid, attr, fn, arg uintptr) int32
+
+//go:nosplit
+func pthread_create(tid *pthread, attr *pthread_attr, fn *funcDescriptor, arg unsafe.Pointer) int32 {
+ _g_ := getg()
+
+ // Check the validity of g because this can be called
+ // without a g during newosproc0.
+ if _g_ != nil {
+ r, _ := syscall4(&libpthread_create, uintptr(unsafe.Pointer(tid)), uintptr(unsafe.Pointer(attr)), uintptr(unsafe.Pointer(fn)), uintptr(arg))
+ return int32(r)
+ }
+
+ return pthread_create1(uintptr(unsafe.Pointer(tid)), uintptr(unsafe.Pointer(attr)), uintptr(unsafe.Pointer(fn)), uintptr(arg))
+}
+
+// In a multi-threaded program, sigprocmask must not be called;
+// it is replaced by sigthreadmask.
+func sigprocmask1(how, new, old uintptr)
+
+//go:nosplit
+func sigprocmask(how int32, new, old *sigset) {
+ _g_ := getg()
+
+ // Check the validity of m because this might be called during a cgo
+ // callback early enough that m isn't available yet.
+ if _g_ != nil && _g_.m != nil {
+ r, err := syscall3(&libpthread_sigthreadmask, uintptr(how), uintptr(unsafe.Pointer(new)), uintptr(unsafe.Pointer(old)))
+ if int32(r) != 0 {
+ println("syscall sigthreadmask failed: ", hex(err))
+ throw("syscall sigthreadmask")
+ }
+ return
+ }
+ sigprocmask1(uintptr(how), uintptr(unsafe.Pointer(new)), uintptr(unsafe.Pointer(old)))
+
+}
+
+//go:nosplit
+func pthread_self() pthread {
+ r, _ := syscall0(&libpthread_self)
+ return pthread(r)
+}
+
+//go:nosplit
+func signalM(mp *m, sig int) {
+ syscall2(&libpthread_kill, uintptr(pthread(mp.procid)), uintptr(sig))
+}
diff --git a/src/runtime/os2_freebsd.go b/src/runtime/os2_freebsd.go
new file mode 100644
index 0000000..29f0b76
--- /dev/null
+++ b/src/runtime/os2_freebsd.go
@@ -0,0 +1,14 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _SS_DISABLE = 4
+ _NSIG = 33
+ _SI_USER = 0x10001
+ _SIG_BLOCK = 1
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+)
diff --git a/src/runtime/os2_openbsd.go b/src/runtime/os2_openbsd.go
new file mode 100644
index 0000000..8656a91
--- /dev/null
+++ b/src/runtime/os2_openbsd.go
@@ -0,0 +1,14 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _SS_DISABLE = 4
+ _SIG_BLOCK = 1
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+ _NSIG = 33
+ _SI_USER = 0
+)
diff --git a/src/runtime/os2_plan9.go b/src/runtime/os2_plan9.go
new file mode 100644
index 0000000..58fb2be
--- /dev/null
+++ b/src/runtime/os2_plan9.go
@@ -0,0 +1,74 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Plan 9-specific system calls
+
+package runtime
+
+// open
+const (
+ _OREAD = 0
+ _OWRITE = 1
+ _ORDWR = 2
+ _OEXEC = 3
+ _OTRUNC = 16
+ _OCEXEC = 32
+ _ORCLOSE = 64
+ _OEXCL = 0x1000
+)
+
+// rfork
+const (
+ _RFNAMEG = 1 << 0
+ _RFENVG = 1 << 1
+ _RFFDG = 1 << 2
+ _RFNOTEG = 1 << 3
+ _RFPROC = 1 << 4
+ _RFMEM = 1 << 5
+ _RFNOWAIT = 1 << 6
+ _RFCNAMEG = 1 << 10
+ _RFCENVG = 1 << 11
+ _RFCFDG = 1 << 12
+ _RFREND = 1 << 13
+ _RFNOMNT = 1 << 14
+)
+
+// notify
+const (
+ _NCONT = 0
+ _NDFLT = 1
+)
+
+type _Plink uintptr
+
+type tos struct {
+ prof struct { // Per process profiling
+ pp *_Plink // known to be 0(ptr)
+ next *_Plink // known to be 4(ptr)
+ last *_Plink
+ first *_Plink
+ pid uint32
+ what uint32
+ }
+ cyclefreq uint64 // cycle clock frequency if there is one, 0 otherwise
+ kcycles int64 // cycles spent in kernel
+ pcycles int64 // cycles spent in process (kernel + user)
+ pid uint32 // might as well put the pid here
+ clock uint32
+ // top of stack is here
+}
+
+const (
+ _NSIG = 14 // number of signals in sigtable array
+ _ERRMAX = 128 // max length of note string
+
+ // Notes in runtime·sigtab that are handled by runtime·sigpanic.
+ _SIGRFAULT = 2
+ _SIGWFAULT = 3
+ _SIGINTDIV = 4
+ _SIGFLOAT = 5
+ _SIGTRAP = 6
+ _SIGPROF = 0 // dummy value defined for badsignal
+ _SIGQUIT = 0 // dummy value defined for sighandler
+)
diff --git a/src/runtime/os2_solaris.go b/src/runtime/os2_solaris.go
new file mode 100644
index 0000000..108bea6
--- /dev/null
+++ b/src/runtime/os2_solaris.go
@@ -0,0 +1,13 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _SS_DISABLE = 2
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+ _NSIG = 73 /* number of signals in sigtable array */
+ _SI_USER = 0
+)
diff --git a/src/runtime/os3_plan9.go b/src/runtime/os3_plan9.go
new file mode 100644
index 0000000..15ca335
--- /dev/null
+++ b/src/runtime/os3_plan9.go
@@ -0,0 +1,167 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// May run during STW, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func sighandler(_ureg *ureg, note *byte, gp *g) int {
+ _g_ := getg()
+ var t sigTabT
+ var docrash bool
+ var sig int
+ var flags int
+ var level int32
+
+ c := &sigctxt{_ureg}
+ notestr := gostringnocopy(note)
+
+ // The kernel will never pass us a nil note or ureg so we probably
+ // made a mistake somewhere in sigtramp.
+ if _ureg == nil || note == nil {
+ print("sighandler: ureg ", _ureg, " note ", note, "\n")
+ goto Throw
+ }
+ // Check that the note is no more than ERRMAX bytes (including
+ // the trailing NUL). We should never receive a longer note.
+ if len(notestr) > _ERRMAX-1 {
+ print("sighandler: note is longer than ERRMAX\n")
+ goto Throw
+ }
+ if isAbortPC(c.pc()) {
+ // Never turn abort into a panic.
+ goto Throw
+ }
+ // See if the note matches one of the patterns in sigtab.
+ // Notes that do not match any pattern can be handled at a higher
+ // level by the program but will otherwise be ignored.
+ flags = _SigNotify
+ for sig, t = range sigtable {
+ if hasPrefix(notestr, t.name) {
+ flags = t.flags
+ break
+ }
+ }
+ if flags&_SigPanic != 0 && gp.throwsplit {
+ // We can't safely sigpanic because it may grow the
+ // stack. Abort in the signal handler instead.
+ flags = (flags &^ _SigPanic) | _SigThrow
+ }
+ if flags&_SigGoExit != 0 {
+ exits((*byte)(add(unsafe.Pointer(note), 9))) // Strip "go: exit " prefix.
+ }
+ if flags&_SigPanic != 0 {
+ // Copy the error string from sigtramp's stack into m->notesig so
+ // we can reliably access it from the panic routines.
+ memmove(unsafe.Pointer(_g_.m.notesig), unsafe.Pointer(note), uintptr(len(notestr)+1))
+ gp.sig = uint32(sig)
+ gp.sigpc = c.pc()
+
+ pc := c.pc()
+ sp := c.sp()
+
+ // If we don't recognize the PC as code
+ // but we do recognize the top pointer on the stack as code,
+ // then assume this was a call to non-code and treat like
+ // pc == 0, to make unwinding show the context.
+ if pc != 0 && !findfunc(pc).valid() && findfunc(*(*uintptr)(unsafe.Pointer(sp))).valid() {
+ pc = 0
+ }
+
+ // If LR exists, sigpanictramp must save it to the stack
+ // before entry to sigpanic so that panics in leaf
+ // functions are correctly handled. This will smash
+ // the stack frame but we're not going back there
+ // anyway.
+ if usesLR {
+ c.savelr(c.lr())
+ }
+
+ // Only fake a return address when pc != 0. If pc == 0, the panic was
+ // probably caused by a call to a nil func; not faking the return address
+ // then makes the trace look like a call to sigpanic instead. (Otherwise
+ // the trace would end at sigpanic and we wouldn't get to see who faulted.)
+ if pc != 0 {
+ if usesLR {
+ c.setlr(pc)
+ } else {
+ if sys.RegSize > sys.PtrSize {
+ sp -= sys.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = 0
+ }
+ sp -= sys.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = pc
+ c.setsp(sp)
+ }
+ }
+ if usesLR {
+ c.setpc(funcPC(sigpanictramp))
+ } else {
+ c.setpc(funcPC(sigpanic))
+ }
+ return _NCONT
+ }
+ if flags&_SigNotify != 0 {
+ if ignoredNote(note) {
+ return _NCONT
+ }
+ if sendNote(note) {
+ return _NCONT
+ }
+ }
+ if flags&_SigKill != 0 {
+ goto Exit
+ }
+ if flags&_SigThrow == 0 {
+ return _NCONT
+ }
+Throw:
+ _g_.m.throwing = 1
+ _g_.m.caughtsig.set(gp)
+ startpanic_m()
+ print(notestr, "\n")
+ print("PC=", hex(c.pc()), "\n")
+ print("\n")
+ level, _, docrash = gotraceback()
+ if level > 0 {
+ goroutineheader(gp)
+ tracebacktrap(c.pc(), c.sp(), c.lr(), gp)
+ tracebackothers(gp)
+ print("\n")
+ dumpregs(_ureg)
+ }
+ if docrash {
+ crash()
+ }
+Exit:
+ goexitsall(note)
+ exits(note)
+ return _NDFLT // not reached
+}
+
+func sigenable(sig uint32) {
+}
+
+func sigdisable(sig uint32) {
+}
+
+func sigignore(sig uint32) {
+}
+
+func setProcessCPUProfiler(hz int32) {
+}
+
+func setThreadCPUProfiler(hz int32) {
+ // TODO: Enable profiling interrupts.
+ getg().m.profilehz = hz
+}
+
+// gsignalStack is unused on Plan 9.
+type gsignalStack struct{}
diff --git a/src/runtime/os3_solaris.go b/src/runtime/os3_solaris.go
new file mode 100644
index 0000000..6ba11af
--- /dev/null
+++ b/src/runtime/os3_solaris.go
@@ -0,0 +1,619 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+//go:cgo_export_dynamic runtime.end _end
+//go:cgo_export_dynamic runtime.etext _etext
+//go:cgo_export_dynamic runtime.edata _edata
+
+//go:cgo_import_dynamic libc____errno ___errno "libc.so"
+//go:cgo_import_dynamic libc_clock_gettime clock_gettime "libc.so"
+//go:cgo_import_dynamic libc_exit exit "libc.so"
+//go:cgo_import_dynamic libc_getcontext getcontext "libc.so"
+//go:cgo_import_dynamic libc_kill kill "libc.so"
+//go:cgo_import_dynamic libc_madvise madvise "libc.so"
+//go:cgo_import_dynamic libc_malloc malloc "libc.so"
+//go:cgo_import_dynamic libc_mmap mmap "libc.so"
+//go:cgo_import_dynamic libc_munmap munmap "libc.so"
+//go:cgo_import_dynamic libc_open open "libc.so"
+//go:cgo_import_dynamic libc_pthread_attr_destroy pthread_attr_destroy "libc.so"
+//go:cgo_import_dynamic libc_pthread_attr_getstack pthread_attr_getstack "libc.so"
+//go:cgo_import_dynamic libc_pthread_attr_init pthread_attr_init "libc.so"
+//go:cgo_import_dynamic libc_pthread_attr_setdetachstate pthread_attr_setdetachstate "libc.so"
+//go:cgo_import_dynamic libc_pthread_attr_setstack pthread_attr_setstack "libc.so"
+//go:cgo_import_dynamic libc_pthread_create pthread_create "libc.so"
+//go:cgo_import_dynamic libc_pthread_self pthread_self "libc.so"
+//go:cgo_import_dynamic libc_pthread_kill pthread_kill "libc.so"
+//go:cgo_import_dynamic libc_raise raise "libc.so"
+//go:cgo_import_dynamic libc_read read "libc.so"
+//go:cgo_import_dynamic libc_select select "libc.so"
+//go:cgo_import_dynamic libc_sched_yield sched_yield "libc.so"
+//go:cgo_import_dynamic libc_sem_init sem_init "libc.so"
+//go:cgo_import_dynamic libc_sem_post sem_post "libc.so"
+//go:cgo_import_dynamic libc_sem_reltimedwait_np sem_reltimedwait_np "libc.so"
+//go:cgo_import_dynamic libc_sem_wait sem_wait "libc.so"
+//go:cgo_import_dynamic libc_setitimer setitimer "libc.so"
+//go:cgo_import_dynamic libc_sigaction sigaction "libc.so"
+//go:cgo_import_dynamic libc_sigaltstack sigaltstack "libc.so"
+//go:cgo_import_dynamic libc_sigprocmask sigprocmask "libc.so"
+//go:cgo_import_dynamic libc_sysconf sysconf "libc.so"
+//go:cgo_import_dynamic libc_usleep usleep "libc.so"
+//go:cgo_import_dynamic libc_write write "libc.so"
+//go:cgo_import_dynamic libc_pipe pipe "libc.so"
+//go:cgo_import_dynamic libc_pipe2 pipe2 "libc.so"
+
+//go:linkname libc____errno libc____errno
+//go:linkname libc_clock_gettime libc_clock_gettime
+//go:linkname libc_exit libc_exit
+//go:linkname libc_getcontext libc_getcontext
+//go:linkname libc_kill libc_kill
+//go:linkname libc_madvise libc_madvise
+//go:linkname libc_malloc libc_malloc
+//go:linkname libc_mmap libc_mmap
+//go:linkname libc_munmap libc_munmap
+//go:linkname libc_open libc_open
+//go:linkname libc_pthread_attr_destroy libc_pthread_attr_destroy
+//go:linkname libc_pthread_attr_getstack libc_pthread_attr_getstack
+//go:linkname libc_pthread_attr_init libc_pthread_attr_init
+//go:linkname libc_pthread_attr_setdetachstate libc_pthread_attr_setdetachstate
+//go:linkname libc_pthread_attr_setstack libc_pthread_attr_setstack
+//go:linkname libc_pthread_create libc_pthread_create
+//go:linkname libc_pthread_self libc_pthread_self
+//go:linkname libc_pthread_kill libc_pthread_kill
+//go:linkname libc_raise libc_raise
+//go:linkname libc_read libc_read
+//go:linkname libc_select libc_select
+//go:linkname libc_sched_yield libc_sched_yield
+//go:linkname libc_sem_init libc_sem_init
+//go:linkname libc_sem_post libc_sem_post
+//go:linkname libc_sem_reltimedwait_np libc_sem_reltimedwait_np
+//go:linkname libc_sem_wait libc_sem_wait
+//go:linkname libc_setitimer libc_setitimer
+//go:linkname libc_sigaction libc_sigaction
+//go:linkname libc_sigaltstack libc_sigaltstack
+//go:linkname libc_sigprocmask libc_sigprocmask
+//go:linkname libc_sysconf libc_sysconf
+//go:linkname libc_usleep libc_usleep
+//go:linkname libc_write libc_write
+//go:linkname libc_pipe libc_pipe
+//go:linkname libc_pipe2 libc_pipe2
+
+var (
+ libc____errno,
+ libc_clock_gettime,
+ libc_exit,
+ libc_getcontext,
+ libc_kill,
+ libc_madvise,
+ libc_malloc,
+ libc_mmap,
+ libc_munmap,
+ libc_open,
+ libc_pthread_attr_destroy,
+ libc_pthread_attr_getstack,
+ libc_pthread_attr_init,
+ libc_pthread_attr_setdetachstate,
+ libc_pthread_attr_setstack,
+ libc_pthread_create,
+ libc_pthread_self,
+ libc_pthread_kill,
+ libc_raise,
+ libc_read,
+ libc_sched_yield,
+ libc_select,
+ libc_sem_init,
+ libc_sem_post,
+ libc_sem_reltimedwait_np,
+ libc_sem_wait,
+ libc_setitimer,
+ libc_sigaction,
+ libc_sigaltstack,
+ libc_sigprocmask,
+ libc_sysconf,
+ libc_usleep,
+ libc_write,
+ libc_pipe,
+ libc_pipe2 libcFunc
+)
+
+var sigset_all = sigset{[4]uint32{^uint32(0), ^uint32(0), ^uint32(0), ^uint32(0)}}
+
+func getPageSize() uintptr {
+ n := int32(sysconf(__SC_PAGESIZE))
+ if n <= 0 {
+ return 0
+ }
+ return uintptr(n)
+}
+
+func osinit() {
+ ncpu = getncpu()
+ if physPageSize == 0 {
+ physPageSize = getPageSize()
+ }
+}
+
+func tstart_sysvicall(newm *m) uint32
+
+// May run with m.p==nil, so write barriers are not allowed.
+//go:nowritebarrier
+func newosproc(mp *m) {
+ var (
+ attr pthreadattr
+ oset sigset
+ tid pthread
+ ret int32
+ size uint64
+ )
+
+ if pthread_attr_init(&attr) != 0 {
+ throw("pthread_attr_init")
+ }
+ // Allocate a new 2MB stack.
+ if pthread_attr_setstack(&attr, 0, 0x200000) != 0 {
+ throw("pthread_attr_setstack")
+ }
+ // Read back the allocated stack.
+ if pthread_attr_getstack(&attr, unsafe.Pointer(&mp.g0.stack.hi), &size) != 0 {
+ throw("pthread_attr_getstack")
+ }
+ mp.g0.stack.lo = mp.g0.stack.hi - uintptr(size)
+ if pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED) != 0 {
+ throw("pthread_attr_setdetachstate")
+ }
+
+ // Disable signals during create, so that the new thread starts
+ // with signals disabled. It will enable them in minit.
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ ret = pthread_create(&tid, &attr, funcPC(tstart_sysvicall), unsafe.Pointer(mp))
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if ret != 0 {
+ print("runtime: failed to create new OS thread (have ", mcount(), " already; errno=", ret, ")\n")
+ if ret == -_EAGAIN {
+ println("runtime: may need to increase max user processes (ulimit -u)")
+ }
+ throw("newosproc")
+ }
+}
+
+func exitThread(wait *uint32) {
+ // We should never reach exitThread on Solaris because we let
+ // libc clean up threads.
+ throw("exitThread")
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+//go:nosplit
+func getRandomData(r []byte) {
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024)
+ mp.gsignal.m = mp
+}
+
+func miniterrno()
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ asmcgocall(unsafe.Pointer(funcPC(miniterrno)), unsafe.Pointer(&libc____errno))
+
+ minitSignals()
+
+ getg().m.procid = uint64(pthread_self())
+}
+
+// Called from dropm to undo the effect of an minit.
+func unminit() {
+ unminitSignals()
+}
+
+// Called from exitm, but not from drop, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+func sigtramp()
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = sigset_all
+ if fn == funcPC(sighandler) {
+ fn = funcPC(sigtramp)
+ }
+ *((*uintptr)(unsafe.Pointer(&sa._funcptr))) = fn
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ if sa.sa_flags&_SA_ONSTACK != 0 {
+ return
+ }
+ sa.sa_flags |= _SA_ONSTACK
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ return *((*uintptr)(unsafe.Pointer(&sa._funcptr)))
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ *(*uintptr)(unsafe.Pointer(&s.ss_sp)) = sp
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ mask.__sigbits[(i-1)/32] |= 1 << ((uint32(i) - 1) & 31)
+}
+
+func sigdelset(mask *sigset, i int) {
+ mask.__sigbits[(i-1)/32] &^= 1 << ((uint32(i) - 1) & 31)
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+}
+
+//go:nosplit
+func semacreate(mp *m) {
+ if mp.waitsema != 0 {
+ return
+ }
+
+ var sem *semt
+ _g_ := getg()
+
+ // Call libc's malloc rather than the Go runtime's allocator. This will
+ // allocate space on the C heap. We can't use the Go allocator
+ // here because it could cause a deadlock.
+ _g_.m.libcall.fn = uintptr(unsafe.Pointer(&libc_malloc))
+ _g_.m.libcall.n = 1
+ _g_.m.scratch = mscratch{}
+ _g_.m.scratch.v[0] = unsafe.Sizeof(*sem)
+ _g_.m.libcall.args = uintptr(unsafe.Pointer(&_g_.m.scratch))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&_g_.m.libcall))
+ sem = (*semt)(unsafe.Pointer(_g_.m.libcall.r1))
+ if sem_init(sem, 0, 0) != 0 {
+ throw("sem_init")
+ }
+ mp.waitsema = uintptr(unsafe.Pointer(sem))
+}
+
+//go:nosplit
+func semasleep(ns int64) int32 {
+ _m_ := getg().m
+ if ns >= 0 {
+ _m_.ts.tv_sec = ns / 1000000000
+ _m_.ts.tv_nsec = ns % 1000000000
+
+ _m_.libcall.fn = uintptr(unsafe.Pointer(&libc_sem_reltimedwait_np))
+ _m_.libcall.n = 2
+ _m_.scratch = mscratch{}
+ _m_.scratch.v[0] = _m_.waitsema
+ _m_.scratch.v[1] = uintptr(unsafe.Pointer(&_m_.ts))
+ _m_.libcall.args = uintptr(unsafe.Pointer(&_m_.scratch))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&_m_.libcall))
+ if *_m_.perrno != 0 {
+ if *_m_.perrno == _ETIMEDOUT || *_m_.perrno == _EAGAIN || *_m_.perrno == _EINTR {
+ return -1
+ }
+ throw("sem_reltimedwait_np")
+ }
+ return 0
+ }
+ for {
+ _m_.libcall.fn = uintptr(unsafe.Pointer(&libc_sem_wait))
+ _m_.libcall.n = 1
+ _m_.scratch = mscratch{}
+ _m_.scratch.v[0] = _m_.waitsema
+ _m_.libcall.args = uintptr(unsafe.Pointer(&_m_.scratch))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&_m_.libcall))
+ if _m_.libcall.r1 == 0 {
+ break
+ }
+ if *_m_.perrno == _EINTR {
+ continue
+ }
+ throw("sem_wait")
+ }
+ return 0
+}
+
+//go:nosplit
+func semawakeup(mp *m) {
+ if sem_post((*semt)(unsafe.Pointer(mp.waitsema))) != 0 {
+ throw("sem_post")
+ }
+}
+
+//go:nosplit
+func closefd(fd int32) int32 {
+ return int32(sysvicall1(&libc_close, uintptr(fd)))
+}
+
+//go:nosplit
+func exit(r int32) {
+ sysvicall1(&libc_exit, uintptr(r))
+}
+
+//go:nosplit
+func getcontext(context *ucontext) /* int32 */ {
+ sysvicall1(&libc_getcontext, uintptr(unsafe.Pointer(context)))
+}
+
+//go:nosplit
+func madvise(addr unsafe.Pointer, n uintptr, flags int32) {
+ sysvicall3(&libc_madvise, uintptr(addr), uintptr(n), uintptr(flags))
+}
+
+//go:nosplit
+func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (unsafe.Pointer, int) {
+ p, err := doMmap(uintptr(addr), n, uintptr(prot), uintptr(flags), uintptr(fd), uintptr(off))
+ if p == ^uintptr(0) {
+ return nil, int(err)
+ }
+ return unsafe.Pointer(p), 0
+}
+
+//go:nosplit
+func doMmap(addr, n, prot, flags, fd, off uintptr) (uintptr, uintptr) {
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(&libc_mmap))
+ libcall.n = 6
+ libcall.args = uintptr(noescape(unsafe.Pointer(&addr)))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ return libcall.r1, libcall.err
+}
+
+//go:nosplit
+func munmap(addr unsafe.Pointer, n uintptr) {
+ sysvicall2(&libc_munmap, uintptr(addr), uintptr(n))
+}
+
+const (
+ _CLOCK_REALTIME = 3
+ _CLOCK_MONOTONIC = 4
+)
+
+//go:nosplit
+func nanotime1() int64 {
+ var ts mts
+ sysvicall2(&libc_clock_gettime, _CLOCK_MONOTONIC, uintptr(unsafe.Pointer(&ts)))
+ return ts.tv_sec*1e9 + ts.tv_nsec
+}
+
+//go:nosplit
+func open(path *byte, mode, perm int32) int32 {
+ return int32(sysvicall3(&libc_open, uintptr(unsafe.Pointer(path)), uintptr(mode), uintptr(perm)))
+}
+
+func pthread_attr_destroy(attr *pthreadattr) int32 {
+ return int32(sysvicall1(&libc_pthread_attr_destroy, uintptr(unsafe.Pointer(attr))))
+}
+
+func pthread_attr_getstack(attr *pthreadattr, addr unsafe.Pointer, size *uint64) int32 {
+ return int32(sysvicall3(&libc_pthread_attr_getstack, uintptr(unsafe.Pointer(attr)), uintptr(addr), uintptr(unsafe.Pointer(size))))
+}
+
+func pthread_attr_init(attr *pthreadattr) int32 {
+ return int32(sysvicall1(&libc_pthread_attr_init, uintptr(unsafe.Pointer(attr))))
+}
+
+func pthread_attr_setdetachstate(attr *pthreadattr, state int32) int32 {
+ return int32(sysvicall2(&libc_pthread_attr_setdetachstate, uintptr(unsafe.Pointer(attr)), uintptr(state)))
+}
+
+func pthread_attr_setstack(attr *pthreadattr, addr uintptr, size uint64) int32 {
+ return int32(sysvicall3(&libc_pthread_attr_setstack, uintptr(unsafe.Pointer(attr)), uintptr(addr), uintptr(size)))
+}
+
+func pthread_create(thread *pthread, attr *pthreadattr, fn uintptr, arg unsafe.Pointer) int32 {
+ return int32(sysvicall4(&libc_pthread_create, uintptr(unsafe.Pointer(thread)), uintptr(unsafe.Pointer(attr)), uintptr(fn), uintptr(arg)))
+}
+
+func pthread_self() pthread {
+ return pthread(sysvicall0(&libc_pthread_self))
+}
+
+func signalM(mp *m, sig int) {
+ sysvicall2(&libc_pthread_kill, uintptr(pthread(mp.procid)), uintptr(sig))
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func raise(sig uint32) /* int32 */ {
+ sysvicall1(&libc_raise, uintptr(sig))
+}
+
+func raiseproc(sig uint32) /* int32 */ {
+ pid := sysvicall0(&libc_getpid)
+ sysvicall2(&libc_kill, pid, uintptr(sig))
+}
+
+//go:nosplit
+func read(fd int32, buf unsafe.Pointer, nbyte int32) int32 {
+ r1, err := sysvicall3Err(&libc_read, uintptr(fd), uintptr(buf), uintptr(nbyte))
+ if c := int32(r1); c >= 0 {
+ return c
+ }
+ return -int32(err)
+}
+
+//go:nosplit
+func sem_init(sem *semt, pshared int32, value uint32) int32 {
+ return int32(sysvicall3(&libc_sem_init, uintptr(unsafe.Pointer(sem)), uintptr(pshared), uintptr(value)))
+}
+
+//go:nosplit
+func sem_post(sem *semt) int32 {
+ return int32(sysvicall1(&libc_sem_post, uintptr(unsafe.Pointer(sem))))
+}
+
+//go:nosplit
+func sem_reltimedwait_np(sem *semt, timeout *timespec) int32 {
+ return int32(sysvicall2(&libc_sem_reltimedwait_np, uintptr(unsafe.Pointer(sem)), uintptr(unsafe.Pointer(timeout))))
+}
+
+//go:nosplit
+func sem_wait(sem *semt) int32 {
+ return int32(sysvicall1(&libc_sem_wait, uintptr(unsafe.Pointer(sem))))
+}
+
+func setitimer(which int32, value *itimerval, ovalue *itimerval) /* int32 */ {
+ sysvicall3(&libc_setitimer, uintptr(which), uintptr(unsafe.Pointer(value)), uintptr(unsafe.Pointer(ovalue)))
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaction(sig uint32, act *sigactiont, oact *sigactiont) /* int32 */ {
+ sysvicall3(&libc_sigaction, uintptr(sig), uintptr(unsafe.Pointer(act)), uintptr(unsafe.Pointer(oact)))
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaltstack(ss *stackt, oss *stackt) /* int32 */ {
+ sysvicall2(&libc_sigaltstack, uintptr(unsafe.Pointer(ss)), uintptr(unsafe.Pointer(oss)))
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigprocmask(how int32, set *sigset, oset *sigset) /* int32 */ {
+ sysvicall3(&libc_sigprocmask, uintptr(how), uintptr(unsafe.Pointer(set)), uintptr(unsafe.Pointer(oset)))
+}
+
+func sysconf(name int32) int64 {
+ return int64(sysvicall1(&libc_sysconf, uintptr(name)))
+}
+
+func usleep1(usec uint32)
+
+//go:nosplit
+func usleep(µs uint32) {
+ usleep1(µs)
+}
+
+func walltime1() (sec int64, nsec int32) {
+ var ts mts
+ sysvicall2(&libc_clock_gettime, _CLOCK_REALTIME, uintptr(unsafe.Pointer(&ts)))
+ return ts.tv_sec, int32(ts.tv_nsec)
+}
+
+//go:nosplit
+func write1(fd uintptr, buf unsafe.Pointer, nbyte int32) int32 {
+ r1, err := sysvicall3Err(&libc_write, fd, uintptr(buf), uintptr(nbyte))
+ if c := int32(r1); c >= 0 {
+ return c
+ }
+ return -int32(err)
+}
+
+//go:nosplit
+func pipe() (r, w int32, errno int32) {
+ var p [2]int32
+ _, e := sysvicall1Err(&libc_pipe, uintptr(noescape(unsafe.Pointer(&p))))
+ return p[0], p[1], int32(e)
+}
+
+//go:nosplit
+func pipe2(flags int32) (r, w int32, errno int32) {
+ var p [2]int32
+ _, e := sysvicall2Err(&libc_pipe2, uintptr(noescape(unsafe.Pointer(&p))), uintptr(flags))
+ return p[0], p[1], int32(e)
+}
+
+//go:nosplit
+func closeonexec(fd int32) {
+ fcntl(fd, _F_SETFD, _FD_CLOEXEC)
+}
+
+//go:nosplit
+func setNonblock(fd int32) {
+ flags := fcntl(fd, _F_GETFL, 0)
+ fcntl(fd, _F_SETFL, flags|_O_NONBLOCK)
+}
+
+func osyield1()
+
+//go:nosplit
+func osyield() {
+ _g_ := getg()
+
+ // Check the validity of m because we might be called on the cgo callback
+ // path early enough that there isn't an m available yet.
+ if _g_ != nil && _g_.m != nil {
+ sysvicall0(&libc_sched_yield)
+ return
+ }
+ osyield1()
+}
+
+//go:linkname executablePath os.executablePath
+var executablePath string
+
+func sysargs(argc int32, argv **byte) {
+ n := argc + 1
+
+ // skip over argv, envp to get to auxv
+ for argv_index(argv, n) != nil {
+ n++
+ }
+
+ // skip NULL separator
+ n++
+
+ // now argv+n is auxv
+ auxv := (*[1 << 28]uintptr)(add(unsafe.Pointer(argv), uintptr(n)*sys.PtrSize))
+ sysauxv(auxv[:])
+}
+
+const (
+ _AT_NULL = 0 // Terminates the vector
+ _AT_PAGESZ = 6 // Page size in bytes
+ _AT_SUN_EXECNAME = 2014 // exec() path name
+)
+
+func sysauxv(auxv []uintptr) {
+ for i := 0; auxv[i] != _AT_NULL; i += 2 {
+ tag, val := auxv[i], auxv[i+1]
+ switch tag {
+ case _AT_PAGESZ:
+ physPageSize = val
+ case _AT_SUN_EXECNAME:
+ executablePath = gostringnocopy((*byte)(unsafe.Pointer(val)))
+ }
+ }
+}
diff --git a/src/runtime/os_aix.go b/src/runtime/os_aix.go
new file mode 100644
index 0000000..303f087
--- /dev/null
+++ b/src/runtime/os_aix.go
@@ -0,0 +1,361 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+const (
+ threadStackSize = 0x100000 // size of a thread stack allocated by OS
+)
+
+// funcDescriptor is a structure representing a function descriptor.
+// A variable with this type is always created in assembly.
+type funcDescriptor struct {
+ fn uintptr
+ toc uintptr
+ envPointer uintptr // unused in Golang
+}
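+
+// For example, newosproc below hands pthread_create the address of such a
+// descriptor (&tstart) rather than a raw code pointer:
+//
+//	ret = pthread_create(&tid, &attr, &tstart, unsafe.Pointer(mp))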
+
+type mOS struct {
+ waitsema uintptr // semaphore for parking on locks
+ perrno uintptr // pointer to tls errno
+}
+
+//go:nosplit
+func semacreate(mp *m) {
+ if mp.waitsema != 0 {
+ return
+ }
+
+ var sem *semt
+
+ // Call libc's malloc (the malloc wrapper in os2_aix.go) rather than
+ // mallocgc. This will allocate space on the C heap. We can't call
+ // mallocgc here because it could cause a deadlock.
+ sem = (*semt)(malloc(unsafe.Sizeof(*sem)))
+ if sem_init(sem, 0, 0) != 0 {
+ throw("sem_init")
+ }
+ mp.waitsema = uintptr(unsafe.Pointer(sem))
+}
+
+//go:nosplit
+func semasleep(ns int64) int32 {
+ _m_ := getg().m
+ if ns >= 0 {
+ var ts timespec
+
+ if clock_gettime(_CLOCK_REALTIME, &ts) != 0 {
+ throw("clock_gettime")
+ }
+ ts.tv_sec += ns / 1e9
+ ts.tv_nsec += ns % 1e9
+ if ts.tv_nsec >= 1e9 {
+ ts.tv_sec++
+ ts.tv_nsec -= 1e9
+ }
+
+ if r, err := sem_timedwait((*semt)(unsafe.Pointer(_m_.waitsema)), &ts); r != 0 {
+ if err == _ETIMEDOUT || err == _EAGAIN || err == _EINTR {
+ return -1
+ }
+ println("sem_timedwait err ", err, " ts.tv_sec ", ts.tv_sec, " ts.tv_nsec ", ts.tv_nsec, " ns ", ns, " id ", _m_.id)
+ throw("sem_timedwait")
+ }
+ return 0
+ }
+ for {
+ r1, err := sem_wait((*semt)(unsafe.Pointer(_m_.waitsema)))
+ if r1 == 0 {
+ break
+ }
+ if err == _EINTR {
+ continue
+ }
+ throw("sem_wait")
+ }
+ return 0
+}
+
+//go:nosplit
+func semawakeup(mp *m) {
+ if sem_post((*semt)(unsafe.Pointer(mp.waitsema))) != 0 {
+ throw("sem_post")
+ }
+}
+
+func osinit() {
+ ncpu = int32(sysconf(__SC_NPROCESSORS_ONLN))
+ physPageSize = sysconf(__SC_PAGE_SIZE)
+}
+
+// newosproc0 is a version of newosproc that can be called before the runtime
+// is initialized.
+//
+// This function is not safe to use after initialization as it does not pass an M as fnarg.
+//
+//go:nosplit
+func newosproc0(stacksize uintptr, fn *funcDescriptor) {
+ var (
+ attr pthread_attr
+ oset sigset
+ tid pthread
+ )
+
+ if pthread_attr_init(&attr) != 0 {
+ write(2, unsafe.Pointer(&failthreadcreate[0]), int32(len(failthreadcreate)))
+ exit(1)
+ }
+
+ if pthread_attr_setstacksize(&attr, threadStackSize) != 0 {
+ write(2, unsafe.Pointer(&failthreadcreate[0]), int32(len(failthreadcreate)))
+ exit(1)
+ }
+
+ if pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED) != 0 {
+ write(2, unsafe.Pointer(&failthreadcreate[0]), int32(len(failthreadcreate)))
+ exit(1)
+ }
+
+ // Disable signals during create, so that the new thread starts
+ // with signals disabled. It will enable them in minit.
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ var ret int32
+ for tries := 0; tries < 20; tries++ {
+ // pthread_create can fail spuriously with EAGAIN;
+ // retrying is expected to succeed.
+ ret = pthread_create(&tid, &attr, fn, nil)
+ if ret != _EAGAIN {
+ break
+ }
+ usleep(uint32(tries+1) * 1000) // Milliseconds.
+ }
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if ret != 0 {
+ write(2, unsafe.Pointer(&failthreadcreate[0]), int32(len(failthreadcreate)))
+ exit(1)
+ }
+
+}
+
+var failthreadcreate = []byte("runtime: failed to create new OS thread\n")
+
+// Called to do synchronous initialization of Go code built with
+// -buildmode=c-archive or -buildmode=c-shared.
+// None of the Go runtime is initialized.
+//go:nosplit
+//go:nowritebarrierrec
+func libpreinit() {
+ initsig(true)
+}
+
+// Ms related functions
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024) // AIX wants >= 8K
+ mp.gsignal.m = mp
+}
+
+// The errno address must be retrieved by calling the libc function _Errno,
+// which returns a pointer to errno.
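+//
+// A sketch of how the saved pointer could then be read (hypothetical helper,
+// not part of this change):
+//
+//	func errno() int32 {
+//		return *(*int32)(unsafe.Pointer(getg().m.perrno))
+//	}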
+func miniterrno() {
+ mp := getg().m
+ r, _ := syscall0(&libc__Errno)
+ mp.perrno = r
+
+}
+
+func minit() {
+ miniterrno()
+ minitSignals()
+ getg().m.procid = uint64(pthread_self())
+}
+
+func unminit() {
+ unminitSignals()
+}
+
+// Called from exitm, but not from drop, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+// tstart is a function descriptor to _tstart defined in assembly.
+var tstart funcDescriptor
+
+func newosproc(mp *m) {
+ var (
+ attr pthread_attr
+ oset sigset
+ tid pthread
+ )
+
+ if pthread_attr_init(&attr) != 0 {
+ throw("pthread_attr_init")
+ }
+
+ if pthread_attr_setstacksize(&attr, threadStackSize) != 0 {
+ throw("pthread_attr_getstacksize")
+ }
+
+ if pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED) != 0 {
+ throw("pthread_attr_setdetachstate")
+ }
+
+ // Disable signals during create, so that the new thread starts
+ // with signals disabled. It will enable them in minit.
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ var ret int32
+ for tries := 0; tries < 20; tries++ {
+ // pthread_create can fail spuriously with EAGAIN;
+ // retrying is expected to succeed.
+ ret = pthread_create(&tid, &attr, &tstart, unsafe.Pointer(mp))
+ if ret != _EAGAIN {
+ break
+ }
+ usleep(uint32(tries+1) * 1000) // Milliseconds.
+ }
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if ret != 0 {
+ print("runtime: failed to create new OS thread (have ", mcount(), " already; errno=", ret, ")\n")
+ if ret == _EAGAIN {
+ println("runtime: may need to increase max user processes (ulimit -u)")
+ }
+ throw("newosproc")
+ }
+
+}
+
+func exitThread(wait *uint32) {
+ // We should never reach exitThread on AIX because we let
+ // libc clean up threads.
+ throw("exitThread")
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+//go:nosplit
+func getRandomData(r []byte) {
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+/* SIGNAL */
+
+const (
+ _NSIG = 256
+)
+
+// sigtramp is a function descriptor to _sigtramp defined in assembly
+var sigtramp funcDescriptor
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = sigset_all
+ if fn == funcPC(sighandler) {
+ fn = uintptr(unsafe.Pointer(&sigtramp))
+ }
+ sa.sa_handler = fn
+ sigaction(uintptr(i), &sa, nil)
+
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ var sa sigactiont
+ sigaction(uintptr(i), nil, &sa)
+ if sa.sa_flags&_SA_ONSTACK != 0 {
+ return
+ }
+ sa.sa_flags |= _SA_ONSTACK
+ sigaction(uintptr(i), &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa sigactiont
+ sigaction(uintptr(i), nil, &sa)
+ return sa.sa_handler
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ *(*uintptr)(unsafe.Pointer(&s.ss_sp)) = sp
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+ switch sig {
+ case _SIGPIPE:
+ // For SIGPIPE, c.sigcode() isn't set to _SI_USER as on Linux.
+ // Therefore, raisebadsignal won't raise SIGPIPE again if
+ // it was delivered in a non-Go thread.
+ c.set_sigcode(_SI_USER)
+ }
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ (*mask)[(i-1)/64] |= 1 << ((uint32(i) - 1) & 63)
+}
+
+func sigdelset(mask *sigset, i int) {
+ (*mask)[(i-1)/64] &^= 1 << ((uint32(i) - 1) & 63)
+}
+
+const (
+ _CLOCK_REALTIME = 9
+ _CLOCK_MONOTONIC = 10
+)
+
+//go:nosplit
+func nanotime1() int64 {
+ tp := &timespec{}
+ if clock_gettime(_CLOCK_REALTIME, tp) != 0 {
+ throw("syscall clock_gettime failed")
+ }
+ return tp.tv_sec*1000000000 + tp.tv_nsec
+}
+
+func walltime1() (sec int64, nsec int32) {
+ ts := &timespec{}
+ if clock_gettime(_CLOCK_REALTIME, ts) != 0 {
+ throw("syscall clock_gettime failed")
+ }
+ return ts.tv_sec, int32(ts.tv_nsec)
+}
+
+//go:nosplit
+func fcntl(fd, cmd, arg int32) int32 {
+ r, _ := syscall3(&libc_fcntl, uintptr(fd), uintptr(cmd), uintptr(arg))
+ return int32(r)
+}
+
+//go:nosplit
+func closeonexec(fd int32) {
+ fcntl(fd, _F_SETFD, _FD_CLOEXEC)
+}
+
+//go:nosplit
+func setNonblock(fd int32) {
+ flags := fcntl(fd, _F_GETFL, 0)
+ fcntl(fd, _F_SETFL, flags|_O_NONBLOCK)
+}
diff --git a/src/runtime/os_android.go b/src/runtime/os_android.go
new file mode 100644
index 0000000..52c8c86
--- /dev/null
+++ b/src/runtime/os_android.go
@@ -0,0 +1,15 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import _ "unsafe" // for go:cgo_export_static and go:cgo_export_dynamic
+
+// Export the main function.
+//
+// Used by the app package to start all-Go Android apps that are
+// loaded via JNI. See golang.org/x/mobile/app.
+
+//go:cgo_export_static main.main
+//go:cgo_export_dynamic main.main
diff --git a/src/runtime/os_darwin.go b/src/runtime/os_darwin.go
new file mode 100644
index 0000000..9ca17c2
--- /dev/null
+++ b/src/runtime/os_darwin.go
@@ -0,0 +1,435 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type mOS struct {
+ initialized bool
+ mutex pthreadmutex
+ cond pthreadcond
+ count int
+}
+
+func unimplemented(name string) {
+ println(name, "not implemented")
+ *(*int)(unsafe.Pointer(uintptr(1231))) = 1231
+}
+
+//go:nosplit
+func semacreate(mp *m) {
+ if mp.initialized {
+ return
+ }
+ mp.initialized = true
+ if err := pthread_mutex_init(&mp.mutex, nil); err != 0 {
+ throw("pthread_mutex_init")
+ }
+ if err := pthread_cond_init(&mp.cond, nil); err != 0 {
+ throw("pthread_cond_init")
+ }
+}
+
+//go:nosplit
+func semasleep(ns int64) int32 {
+ var start int64
+ if ns >= 0 {
+ start = nanotime()
+ }
+ mp := getg().m
+ pthread_mutex_lock(&mp.mutex)
+ for {
+ if mp.count > 0 {
+ mp.count--
+ pthread_mutex_unlock(&mp.mutex)
+ return 0
+ }
+ if ns >= 0 {
+ spent := nanotime() - start
+ if spent >= ns {
+ pthread_mutex_unlock(&mp.mutex)
+ return -1
+ }
+ var t timespec
+ t.setNsec(ns - spent)
+ err := pthread_cond_timedwait_relative_np(&mp.cond, &mp.mutex, &t)
+ if err == _ETIMEDOUT {
+ pthread_mutex_unlock(&mp.mutex)
+ return -1
+ }
+ } else {
+ pthread_cond_wait(&mp.cond, &mp.mutex)
+ }
+ }
+}
+
+//go:nosplit
+func semawakeup(mp *m) {
+ pthread_mutex_lock(&mp.mutex)
+ mp.count++
+ if mp.count > 0 {
+ pthread_cond_signal(&mp.cond)
+ }
+ pthread_mutex_unlock(&mp.mutex)
+}
+
+// The read and write file descriptors used by the sigNote functions.
+var sigNoteRead, sigNoteWrite int32
+
+// sigNoteSetup initializes an async-signal-safe note.
+//
+// The current implementation of notes on Darwin is not async-signal-safe,
+// because the functions pthread_mutex_lock, pthread_cond_signal, and
+// pthread_mutex_unlock, called by semawakeup, are not async-signal-safe.
+// There is only one case where we need to wake up a note from a signal
+// handler: the sigsend function. The signal handler code does not require
+// all the features of notes: it does not need to do a timed wait.
+// This is a separate implementation of notes, based on a pipe, that does
+// not support timed waits but is async-signal-safe.
+func sigNoteSetup(*note) {
+ if sigNoteRead != 0 || sigNoteWrite != 0 {
+ throw("duplicate sigNoteSetup")
+ }
+ var errno int32
+ sigNoteRead, sigNoteWrite, errno = pipe()
+ if errno != 0 {
+ throw("pipe failed")
+ }
+ closeonexec(sigNoteRead)
+ closeonexec(sigNoteWrite)
+
+ // Make the write end of the pipe non-blocking, so that if the pipe
+ // buffer is somehow full we will not block in the signal handler.
+ // Leave the read end of the pipe blocking so that we will block
+ // in sigNoteSleep.
+ setNonblock(sigNoteWrite)
+}
+
+// sigNoteWakeup wakes up a thread sleeping on a note created by sigNoteSetup.
+func sigNoteWakeup(*note) {
+ var b byte
+ write(uintptr(sigNoteWrite), unsafe.Pointer(&b), 1)
+}
+
+// sigNoteSleep waits for a note created by sigNoteSetup to be woken.
+func sigNoteSleep(*note) {
+ entersyscallblock()
+ var b byte
+ read(sigNoteRead, unsafe.Pointer(&b), 1)
+ exitsyscall()
+}
+
+// BSD interface for threading.
+func osinit() {
+ // pthread_create delayed until end of goenvs so that we
+ // can look at the environment first.
+
+ ncpu = getncpu()
+ physPageSize = getPageSize()
+}
+
+func sysctlbynameInt32(name []byte) (int32, int32) {
+ out := int32(0)
+ nout := unsafe.Sizeof(out)
+ ret := sysctlbyname(&name[0], (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ return ret, out
+}
+
+//go:linkname internal_cpu_getsysctlbyname internal/cpu.getsysctlbyname
+func internal_cpu_getsysctlbyname(name []byte) (int32, int32) {
+ return sysctlbynameInt32(name)
+}
+
+const (
+ _CTL_HW = 6
+ _HW_NCPU = 3
+ _HW_PAGESIZE = 7
+)
+
+func getncpu() int32 {
+ // Use sysctl to fetch hw.ncpu.
+ mib := [2]uint32{_CTL_HW, _HW_NCPU}
+ out := uint32(0)
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], 2, (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret >= 0 && int32(out) > 0 {
+ return int32(out)
+ }
+ return 1
+}
+
+func getPageSize() uintptr {
+ // Use sysctl to fetch hw.pagesize.
+ mib := [2]uint32{_CTL_HW, _HW_PAGESIZE}
+ out := uint32(0)
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], 2, (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret >= 0 && int32(out) > 0 {
+ return uintptr(out)
+ }
+ return 0
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+//go:nosplit
+func getRandomData(r []byte) {
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+// May run with m.p==nil, so write barriers are not allowed.
+//go:nowritebarrierrec
+func newosproc(mp *m) {
+ stk := unsafe.Pointer(mp.g0.stack.hi)
+ if false {
+ print("newosproc stk=", stk, " m=", mp, " g=", mp.g0, " id=", mp.id, " ostk=", &mp, "\n")
+ }
+
+ // Initialize an attribute object.
+ var attr pthreadattr
+ var err int32
+ err = pthread_attr_init(&attr)
+ if err != 0 {
+ write(2, unsafe.Pointer(&failthreadcreate[0]), int32(len(failthreadcreate)))
+ exit(1)
+ }
+
+ // Find out OS stack size for our own stack guard.
+ var stacksize uintptr
+ if pthread_attr_getstacksize(&attr, &stacksize) != 0 {
+ write(2, unsafe.Pointer(&failthreadcreate[0]), int32(len(failthreadcreate)))
+ exit(1)
+ }
+ mp.g0.stack.hi = stacksize // for mstart
+
+ // Tell the pthread library we won't join with this thread.
+ if pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED) != 0 {
+ write(2, unsafe.Pointer(&failthreadcreate[0]), int32(len(failthreadcreate)))
+ exit(1)
+ }
+
+ // Finally, create the thread. It starts at mstart_stub, which does some low-level
+ // setup and then calls mstart.
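+ // Block all signals while creating the thread so that it starts with
+ // signals disabled; minit on the new thread installs the proper mask.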
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ err = pthread_create(&attr, funcPC(mstart_stub), unsafe.Pointer(mp))
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if err != 0 {
+ write(2, unsafe.Pointer(&failthreadcreate[0]), int32(len(failthreadcreate)))
+ exit(1)
+ }
+}
+
+// glue code to call mstart from pthread_create.
+func mstart_stub()
+
+// newosproc0 is a version of newosproc that can be called before the runtime
+// is initialized.
+//
+// This function is not safe to use after initialization as it does not pass an M as fnarg.
+//
+//go:nosplit
+func newosproc0(stacksize uintptr, fn uintptr) {
+ // Initialize an attribute object.
+ var attr pthreadattr
+ var err int32
+ err = pthread_attr_init(&attr)
+ if err != 0 {
+ write(2, unsafe.Pointer(&failthreadcreate[0]), int32(len(failthreadcreate)))
+ exit(1)
+ }
+
+ // The caller passes in a suggested stack size,
+ // from when we allocated the stack and thread ourselves,
+ // without libpthread. Now that we're using libpthread,
+ // we use the OS default stack size instead of the suggestion.
+ // Find out that stack size for our own stack guard.
+ if pthread_attr_getstacksize(&attr, &stacksize) != 0 {
+ write(2, unsafe.Pointer(&failthreadcreate[0]), int32(len(failthreadcreate)))
+ exit(1)
+ }
+ g0.stack.hi = stacksize // for mstart
+ memstats.stacks_sys.add(int64(stacksize))
+
+ // Tell the pthread library we won't join with this thread.
+ if pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED) != 0 {
+ write(2, unsafe.Pointer(&failthreadcreate[0]), int32(len(failthreadcreate)))
+ exit(1)
+ }
+
+ // Finally, create the thread. It starts at mstart_stub, which does some low-level
+ // setup and then calls mstart.
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ err = pthread_create(&attr, fn, nil)
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if err != 0 {
+ write(2, unsafe.Pointer(&failthreadcreate[0]), int32(len(failthreadcreate)))
+ exit(1)
+ }
+}
+
+var failallocatestack = []byte("runtime: failed to allocate stack for the new OS thread\n")
+var failthreadcreate = []byte("runtime: failed to create new OS thread\n")
+
+// Called to do synchronous initialization of Go code built with
+// -buildmode=c-archive or -buildmode=c-shared.
+// None of the Go runtime is initialized.
+//go:nosplit
+//go:nowritebarrierrec
+func libpreinit() {
+ initsig(true)
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024) // OS X wants >= 8K
+ mp.gsignal.m = mp
+ if GOOS == "darwin" && GOARCH == "arm64" {
+ // mlock the signal stack to work around a kernel bug where it may
+ // SIGILL when the signal stack is not faulted in while a signal
+ // arrives. See issue 42774.
+ mlock(unsafe.Pointer(mp.gsignal.stack.hi-physPageSize), physPageSize)
+ }
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ // iOS does not support alternate signal stack.
+ // The signal handler handles it directly.
+ if !(GOOS == "ios" && GOARCH == "arm64") {
+ minitSignalStack()
+ }
+ minitSignalMask()
+ getg().m.procid = uint64(pthread_self())
+}
+
+// Called from dropm to undo the effect of an minit.
+//go:nosplit
+func unminit() {
+ // iOS does not support alternate signal stack.
+ // See minit.
+ if !(GOOS == "ios" && GOARCH == "arm64") {
+ unminitSignals()
+ }
+}
+
+// Called from exitm, but not from dropm, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+//go:nosplit
+func osyield() {
+ usleep(1)
+}
+
+const (
+ _NSIG = 32
+ _SI_USER = 0 /* empirically true, but not what headers say */
+ _SIG_BLOCK = 1
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+ _SS_DISABLE = 4
+)
+
+//extern SigTabTT runtime·sigtab[];
+
+type sigset uint32
+
+var sigset_all = ^sigset(0)
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa usigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = ^uint32(0)
+ if fn == funcPC(sighandler) {
+ if iscgo {
+ fn = funcPC(cgoSigtramp)
+ } else {
+ fn = funcPC(sigtramp)
+ }
+ }
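+ // __sigaction_u is the C union of sa_handler/sa_sigaction; store the
+ // handler pointer in its first word.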
+ *(*uintptr)(unsafe.Pointer(&sa.__sigaction_u)) = fn
+ sigaction(i, &sa, nil)
+}
+
+// sigtramp is the callback from libc when a signal is received.
+// It is called with the C calling convention.
+func sigtramp()
+func cgoSigtramp()
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ var osa usigactiont
+ sigaction(i, nil, &osa)
+ handler := *(*uintptr)(unsafe.Pointer(&osa.__sigaction_u))
+ if osa.sa_flags&_SA_ONSTACK != 0 {
+ return
+ }
+ var sa usigactiont
+ *(*uintptr)(unsafe.Pointer(&sa.__sigaction_u)) = handler
+ sa.sa_mask = osa.sa_mask
+ sa.sa_flags = osa.sa_flags | _SA_ONSTACK
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa usigactiont
+ sigaction(i, nil, &sa)
+ return *(*uintptr)(unsafe.Pointer(&sa.__sigaction_u))
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ *(*uintptr)(unsafe.Pointer(&s.ss_sp)) = sp
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ *mask |= 1 << (uint32(i) - 1)
+}
+
+func sigdelset(mask *sigset, i int) {
+ *mask &^= 1 << (uint32(i) - 1)
+}
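+
+// Signal numbers are 1-based, so signal i occupies bit i-1 of the mask;
+// for example, SIGINT (signal 2) corresponds to mask bit 1, value 0x2.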
+
+//go:linkname executablePath os.executablePath
+var executablePath string
+
+func sysargs(argc int32, argv **byte) {
+ // Skip over argv and envv; the string after their trailing NULL is the executable path.
+ n := argc + 1
+ for argv_index(argv, n) != nil {
+ n++
+ }
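+ // Past envv's terminating NULL the kernel passes additional "apple"
+ // strings; the first of those is the executable path, possibly carrying
+ // the "executable_path=" prefix stripped below.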
+ executablePath = gostringnocopy(argv_index(argv, n+1))
+
+ // Strip the "executable_path=" prefix if present; it is added by OS X releases after 10.11.
+ const prefix = "executable_path="
+ if len(executablePath) > len(prefix) && executablePath[:len(prefix)] == prefix {
+ executablePath = executablePath[len(prefix):]
+ }
+}
+
+func signalM(mp *m, sig int) {
+ pthread_kill(pthread(mp.procid), uint32(sig))
+}
diff --git a/src/runtime/os_darwin_arm64.go b/src/runtime/os_darwin_arm64.go
new file mode 100644
index 0000000..b808150
--- /dev/null
+++ b/src/runtime/os_darwin_arm64.go
@@ -0,0 +1,12 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_dragonfly.go b/src/runtime/os_dragonfly.go
new file mode 100644
index 0000000..383df54
--- /dev/null
+++ b/src/runtime/os_dragonfly.go
@@ -0,0 +1,308 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ _NSIG = 33
+ _SI_USER = 0
+ _SS_DISABLE = 4
+ _SIG_BLOCK = 1
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+)
+
+type mOS struct{}
+
+//go:noescape
+func lwp_create(param *lwpparams) int32
+
+//go:noescape
+func sigaltstack(new, old *stackt)
+
+//go:noescape
+func sigaction(sig uint32, new, old *sigactiont)
+
+//go:noescape
+func sigprocmask(how int32, new, old *sigset)
+
+//go:noescape
+func setitimer(mode int32, new, old *itimerval)
+
+//go:noescape
+func sysctl(mib *uint32, miblen uint32, out *byte, size *uintptr, dst *byte, ndst uintptr) int32
+
+func raiseproc(sig uint32)
+
+func lwp_gettid() int32
+func lwp_kill(pid, tid int32, sig int)
+
+//go:noescape
+func sys_umtx_sleep(addr *uint32, val, timeout int32) int32
+
+//go:noescape
+func sys_umtx_wakeup(addr *uint32, val int32) int32
+
+func osyield()
+
+func kqueue() int32
+
+//go:noescape
+func kevent(kq int32, ch *keventt, nch int32, ev *keventt, nev int32, ts *timespec) int32
+func closeonexec(fd int32)
+func setNonblock(fd int32)
+
+func pipe() (r, w int32, errno int32)
+
+const stackSystem = 0
+
+// From DragonFly's <sys/sysctl.h>
+const (
+ _CTL_HW = 6
+ _HW_NCPU = 3
+ _HW_PAGESIZE = 7
+)
+
+var sigset_all = sigset{[4]uint32{^uint32(0), ^uint32(0), ^uint32(0), ^uint32(0)}}
+
+func getncpu() int32 {
+ mib := [2]uint32{_CTL_HW, _HW_NCPU}
+ out := uint32(0)
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], 2, (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret >= 0 {
+ return int32(out)
+ }
+ return 1
+}
+
+func getPageSize() uintptr {
+ mib := [2]uint32{_CTL_HW, _HW_PAGESIZE}
+ out := uint32(0)
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], 2, (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret >= 0 {
+ return uintptr(out)
+ }
+ return 0
+}
+
+//go:nosplit
+func futexsleep(addr *uint32, val uint32, ns int64) {
+ systemstack(func() {
+ futexsleep1(addr, val, ns)
+ })
+}
+
+func futexsleep1(addr *uint32, val uint32, ns int64) {
+ var timeout int32
+ if ns >= 0 {
+ // The timeout is specified in microseconds; make sure the division
+ // does not round down to zero, which would put us to sleep
+ // indefinitely.
+ timeout = timediv(ns, 1000, nil)
+ if timeout == 0 {
+ timeout = 1
+ }
+ }
+
+ // sys_umtx_sleep will return EWOULDBLOCK (EAGAIN) when the timeout
+ // expires or EBUSY if the mutex value does not match.
+ ret := sys_umtx_sleep(addr, int32(val), timeout)
+ if ret >= 0 || ret == -_EINTR || ret == -_EAGAIN || ret == -_EBUSY {
+ return
+ }
+
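+ // Unexpected error: report it and crash at a recognizable address so the
+ // failure is easy to spot in a core dump.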
+ print("umtx_sleep addr=", addr, " val=", val, " ret=", ret, "\n")
+ *(*int32)(unsafe.Pointer(uintptr(0x1005))) = 0x1005
+}
+
+//go:nosplit
+func futexwakeup(addr *uint32, cnt uint32) {
+ ret := sys_umtx_wakeup(addr, int32(cnt))
+ if ret >= 0 {
+ return
+ }
+
+ systemstack(func() {
+ print("umtx_wake_addr=", addr, " ret=", ret, "\n")
+ *(*int32)(unsafe.Pointer(uintptr(0x1006))) = 0x1006
+ })
+}
+
+func lwp_start(uintptr)
+
+// May run with m.p==nil, so write barriers are not allowed.
+//go:nowritebarrier
+func newosproc(mp *m) {
+ stk := unsafe.Pointer(mp.g0.stack.hi)
+ if false {
+ print("newosproc stk=", stk, " m=", mp, " g=", mp.g0, " lwp_start=", funcPC(lwp_start), " id=", mp.id, " ostk=", &mp, "\n")
+ }
+
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+
+ params := lwpparams{
+ start_func: funcPC(lwp_start),
+ arg: unsafe.Pointer(mp),
+ stack: uintptr(stk),
+ tid1: nil, // minit will record tid
+ tid2: nil,
+ }
+
+ // TODO: Check for error.
+ lwp_create(&params)
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+}
+
+func osinit() {
+ ncpu = getncpu()
+ if physPageSize == 0 {
+ physPageSize = getPageSize()
+ }
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+//go:nosplit
+func getRandomData(r []byte) {
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024)
+ mp.gsignal.m = mp
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ getg().m.procid = uint64(lwp_gettid())
+ minitSignals()
+}
+
+// Called from dropm to undo the effect of an minit.
+//go:nosplit
+func unminit() {
+ unminitSignals()
+}
+
+// Called from exitm, but not from dropm, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+func sigtramp()
+
+type sigactiont struct {
+ sa_sigaction uintptr
+ sa_flags int32
+ sa_mask sigset
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = sigset_all
+ if fn == funcPC(sighandler) {
+ fn = funcPC(sigtramp)
+ }
+ sa.sa_sigaction = fn
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ throw("setsigstack")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ return sa.sa_sigaction
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ s.ss_sp = sp
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ mask.__bits[(i-1)/32] |= 1 << ((uint32(i) - 1) & 31)
+}
+
+func sigdelset(mask *sigset, i int) {
+ mask.__bits[(i-1)/32] &^= 1 << ((uint32(i) - 1) & 31)
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+}
+
+func sysargs(argc int32, argv **byte) {
+ n := argc + 1
+
+ // skip over argv, envp to get to auxv
+ for argv_index(argv, n) != nil {
+ n++
+ }
+
+ // skip NULL separator
+ n++
+
+ auxv := (*[1 << 28]uintptr)(add(unsafe.Pointer(argv), uintptr(n)*sys.PtrSize))
+ sysauxv(auxv[:])
+}
+
+const (
+ _AT_NULL = 0
+ _AT_PAGESZ = 6
+)
+
+func sysauxv(auxv []uintptr) {
+ for i := 0; auxv[i] != _AT_NULL; i += 2 {
+ tag, val := auxv[i], auxv[i+1]
+ switch tag {
+ case _AT_PAGESZ:
+ physPageSize = val
+ }
+ }
+}
+
+// raise sends a signal to the calling thread.
+//
+// It must be nosplit because it is used by the signal handler before
+// it definitely has a Go stack.
+//
+//go:nosplit
+func raise(sig uint32) {
+ lwp_kill(-1, lwp_gettid(), int(sig))
+}
+
+func signalM(mp *m, sig int) {
+ lwp_kill(-1, int32(mp.procid), sig)
+}
diff --git a/src/runtime/os_freebsd.go b/src/runtime/os_freebsd.go
new file mode 100644
index 0000000..09065cc
--- /dev/null
+++ b/src/runtime/os_freebsd.go
@@ -0,0 +1,443 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type mOS struct{}
+
+//go:noescape
+func thr_new(param *thrparam, size int32) int32
+
+//go:noescape
+func sigaltstack(new, old *stackt)
+
+//go:noescape
+func sigprocmask(how int32, new, old *sigset)
+
+//go:noescape
+func setitimer(mode int32, new, old *itimerval)
+
+//go:noescape
+func sysctl(mib *uint32, miblen uint32, out *byte, size *uintptr, dst *byte, ndst uintptr) int32
+
+func raiseproc(sig uint32)
+
+func thr_self() thread
+func thr_kill(tid thread, sig int)
+
+//go:noescape
+func sys_umtx_op(addr *uint32, mode int32, val uint32, uaddr1 uintptr, ut *umtx_time) int32
+
+func osyield()
+
+func kqueue() int32
+
+//go:noescape
+func kevent(kq int32, ch *keventt, nch int32, ev *keventt, nev int32, ts *timespec) int32
+
+func pipe() (r, w int32, errno int32)
+func pipe2(flags int32) (r, w int32, errno int32)
+func closeonexec(fd int32)
+func setNonblock(fd int32)
+
+// From FreeBSD's <sys/sysctl.h>
+const (
+ _CTL_HW = 6
+ _HW_PAGESIZE = 7
+)
+
+var sigset_all = sigset{[4]uint32{^uint32(0), ^uint32(0), ^uint32(0), ^uint32(0)}}
+
+// Undocumented numbers from FreeBSD's lib/libc/gen/sysctlnametomib.c.
+const (
+ _CTL_QUERY = 0
+ _CTL_QUERY_MIB = 3
+)
+
+// sysctlnametomib fills mib with the dynamically assigned sysctl entries for
+// name and returns the number of mib slots filled, or 0 on error.
+func sysctlnametomib(name []byte, mib *[_CTL_MAXNAME]uint32) uint32 {
+ oid := [2]uint32{_CTL_QUERY, _CTL_QUERY_MIB}
+ miblen := uintptr(_CTL_MAXNAME)
+ if sysctl(&oid[0], 2, (*byte)(unsafe.Pointer(mib)), &miblen, (*byte)(unsafe.Pointer(&name[0])), (uintptr)(len(name))) < 0 {
+ return 0
+ }
+ miblen /= unsafe.Sizeof(uint32(0))
+ if miblen <= 0 {
+ return 0
+ }
+ return uint32(miblen)
+}
+
+const (
+ _CPU_CURRENT_PID = -1 // Current process ID.
+)
+
+//go:noescape
+func cpuset_getaffinity(level int, which int, id int64, size int, mask *byte) int32
+
+//go:systemstack
+func getncpu() int32 {
+ // Use a large buffer for the CPU mask. We're on the system
+ // stack, so this is fine, and we can't allocate memory for a
+ // dynamically-sized buffer at this point.
+ const maxCPUs = 64 * 1024
+ var mask [maxCPUs / 8]byte
+ var mib [_CTL_MAXNAME]uint32
+
+ // According to FreeBSD's /usr/src/sys/kern/kern_cpuset.c,
+ // cpuset_getaffinity returns ERANGE when the provided buffer size exceeds
+ // the limit in the kernel, so query kern.smp.maxcpus to compute the
+ // maximum buffer size.
+ // See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200802
+
+ // The kern.smp.maxcpus variable was introduced on Dec 23 2003 (revision
+ // 123766) and uses dynamically assigned sysctl entries.
+ miblen := sysctlnametomib([]byte("kern.smp.maxcpus"), &mib)
+ if miblen == 0 {
+ return 1
+ }
+
+ // Query kern.smp.maxcpus.
+ dstsize := uintptr(4)
+ maxcpus := uint32(0)
+ if sysctl(&mib[0], miblen, (*byte)(unsafe.Pointer(&maxcpus)), &dstsize, nil, 0) != 0 {
+ return 1
+ }
+
+ maskSize := int(maxcpus+7) / 8
+ if maskSize < sys.PtrSize {
+ maskSize = sys.PtrSize
+ }
+ if maskSize > len(mask) {
+ maskSize = len(mask)
+ }
+
+ if cpuset_getaffinity(_CPU_LEVEL_WHICH, _CPU_WHICH_PID, _CPU_CURRENT_PID,
+ maskSize, (*byte)(unsafe.Pointer(&mask[0]))) != 0 {
+ return 1
+ }
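+ // Count the set bits in the affinity mask; each one is a CPU this
+ // process is allowed to run on.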
+ n := int32(0)
+ for _, v := range mask[:maskSize] {
+ for v != 0 {
+ n += int32(v & 1)
+ v >>= 1
+ }
+ }
+ if n == 0 {
+ return 1
+ }
+ return n
+}
+
+func getPageSize() uintptr {
+ mib := [2]uint32{_CTL_HW, _HW_PAGESIZE}
+ out := uint32(0)
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], 2, (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret >= 0 {
+ return uintptr(out)
+ }
+ return 0
+}
+
+// FreeBSD's umtx_op syscall is effectively the same as Linux's futex, and
+// thus the code is largely similar. See Linux implementation
+// and lock_futex.go for comments.
+
+//go:nosplit
+func futexsleep(addr *uint32, val uint32, ns int64) {
+ systemstack(func() {
+ futexsleep1(addr, val, ns)
+ })
+}
+
+func futexsleep1(addr *uint32, val uint32, ns int64) {
+ var utp *umtx_time
+ if ns >= 0 {
+ var ut umtx_time
+ ut._clockid = _CLOCK_MONOTONIC
+ ut._timeout.setNsec(ns)
+ utp = &ut
+ }
+ ret := sys_umtx_op(addr, _UMTX_OP_WAIT_UINT_PRIVATE, val, unsafe.Sizeof(*utp), utp)
+ if ret >= 0 || ret == -_EINTR || ret == -_ETIMEDOUT {
+ return
+ }
+ print("umtx_wait addr=", addr, " val=", val, " ret=", ret, "\n")
+ *(*int32)(unsafe.Pointer(uintptr(0x1005))) = 0x1005
+}
+
+//go:nosplit
+func futexwakeup(addr *uint32, cnt uint32) {
+ ret := sys_umtx_op(addr, _UMTX_OP_WAKE_PRIVATE, cnt, 0, nil)
+ if ret >= 0 {
+ return
+ }
+
+ systemstack(func() {
+ print("umtx_wake_addr=", addr, " ret=", ret, "\n")
+ })
+}
+
+func thr_start()
+
+// May run with m.p==nil, so write barriers are not allowed.
+//go:nowritebarrier
+func newosproc(mp *m) {
+ stk := unsafe.Pointer(mp.g0.stack.hi)
+ if false {
+ print("newosproc stk=", stk, " m=", mp, " g=", mp.g0, " thr_start=", funcPC(thr_start), " id=", mp.id, " ostk=", &mp, "\n")
+ }
+
+ param := thrparam{
+ start_func: funcPC(thr_start),
+ arg: unsafe.Pointer(mp),
+ stack_base: mp.g0.stack.lo,
+ stack_size: uintptr(stk) - mp.g0.stack.lo,
+ child_tid: nil, // minit will record tid
+ parent_tid: nil,
+ tls_base: unsafe.Pointer(&mp.tls[0]),
+ tls_size: unsafe.Sizeof(mp.tls),
+ }
+
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ ret := thr_new(&param, int32(unsafe.Sizeof(param)))
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if ret < 0 {
+ print("runtime: failed to create new OS thread (have ", mcount(), " already; errno=", -ret, ")\n")
+ throw("newosproc")
+ }
+}
+
+// Version of newosproc that doesn't require a valid G.
+//go:nosplit
+func newosproc0(stacksize uintptr, fn unsafe.Pointer) {
+ stack := sysAlloc(stacksize, &memstats.stacks_sys)
+ if stack == nil {
+ write(2, unsafe.Pointer(&failallocatestack[0]), int32(len(failallocatestack)))
+ exit(1)
+ }
+ // This code "knows" it's being called once from the library
+ // initialization code, and so it's using the static m0 for the
+ // tls and procid (thread) pointers. thr_new() requires the tls
+ // pointers, though the tid pointers can be nil.
+ // However, newosproc0 is currently unreachable because builds
+ // utilizing c-shared/c-archive force external linking.
+ param := thrparam{
+ start_func: funcPC(fn),
+ arg: nil,
+ stack_base: uintptr(stack), //+stacksize?
+ stack_size: stacksize,
+ child_tid: nil, // minit will record tid
+ parent_tid: nil,
+ tls_base: unsafe.Pointer(&m0.tls[0]),
+ tls_size: unsafe.Sizeof(m0.tls),
+ }
+
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ ret := thr_new(&param, int32(unsafe.Sizeof(param)))
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if ret < 0 {
+ write(2, unsafe.Pointer(&failthreadcreate[0]), int32(len(failthreadcreate)))
+ exit(1)
+ }
+}
+
+var failallocatestack = []byte("runtime: failed to allocate stack for the new OS thread\n")
+var failthreadcreate = []byte("runtime: failed to create new OS thread\n")
+
+// Called to do synchronous initialization of Go code built with
+// -buildmode=c-archive or -buildmode=c-shared.
+// None of the Go runtime is initialized.
+//go:nosplit
+//go:nowritebarrierrec
+func libpreinit() {
+ initsig(true)
+}
+
+func osinit() {
+ ncpu = getncpu()
+ if physPageSize == 0 {
+ physPageSize = getPageSize()
+ }
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+//go:nosplit
+func getRandomData(r []byte) {
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024)
+ mp.gsignal.m = mp
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ getg().m.procid = uint64(thr_self())
+
+ // On FreeBSD before about April 2017 there was a bug such
+ // that calling execve from a thread other than the main
+ // thread did not reset the signal stack. That would confuse
+ // minitSignals, which calls minitSignalStack, which checks
+ // whether there is currently a signal stack and uses it if
+ // present. To avoid this confusion, explicitly disable the
+ // signal stack on the main thread when not running in a
+ // library. This can be removed when we are confident that all
+ // FreeBSD users are running a patched kernel. See issue #15658.
+ if gp := getg(); !isarchive && !islibrary && gp.m == &m0 && gp == gp.m.g0 {
+ st := stackt{ss_flags: _SS_DISABLE}
+ sigaltstack(&st, nil)
+ }
+
+ minitSignals()
+}
+
+// Called from dropm to undo the effect of an minit.
+//go:nosplit
+func unminit() {
+ unminitSignals()
+}
+
+// Called from exitm, but not from dropm, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+func sigtramp()
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags int32
+ sa_mask sigset
+}
+
+// See os_freebsd2.go, os_freebsd_amd64.go for setsig function
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ if sa.sa_flags&_SA_ONSTACK != 0 {
+ return
+ }
+ sa.sa_flags |= _SA_ONSTACK
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ return sa.sa_handler
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ s.ss_sp = sp
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ mask.__bits[(i-1)/32] |= 1 << ((uint32(i) - 1) & 31)
+}
+
+func sigdelset(mask *sigset, i int) {
+ mask.__bits[(i-1)/32] &^= 1 << ((uint32(i) - 1) & 31)
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+}
+
+func sysargs(argc int32, argv **byte) {
+ n := argc + 1
+
+ // skip over argv, envp to get to auxv
+ for argv_index(argv, n) != nil {
+ n++
+ }
+
+ // skip NULL separator
+ n++
+
+ // now argv+n is auxv
+ auxv := (*[1 << 28]uintptr)(add(unsafe.Pointer(argv), uintptr(n)*sys.PtrSize))
+ sysauxv(auxv[:])
+}
+
+const (
+ _AT_NULL = 0 // Terminates the vector
+ _AT_PAGESZ = 6 // Page size in bytes
+ _AT_TIMEKEEP = 22 // Pointer to timehands.
+ _AT_HWCAP = 25 // CPU feature flags
+ _AT_HWCAP2 = 26 // CPU feature flags 2
+)
+
+func sysauxv(auxv []uintptr) {
+ for i := 0; auxv[i] != _AT_NULL; i += 2 {
+ tag, val := auxv[i], auxv[i+1]
+ switch tag {
+ // _AT_NCPUS from auxv shouldn't be used due to golang.org/issue/15206
+ case _AT_PAGESZ:
+ physPageSize = val
+ case _AT_TIMEKEEP:
+ timekeepSharedPage = (*vdsoTimekeep)(unsafe.Pointer(val))
+ }
+
+ archauxv(tag, val)
+ }
+}
+
+// sysSigaction calls the sigaction system call.
+//go:nosplit
+func sysSigaction(sig uint32, new, old *sigactiont) {
+ // Use system stack to avoid split stack overflow on amd64
+ if asmSigaction(uintptr(sig), new, old) != 0 {
+ systemstack(func() {
+ throw("sigaction failed")
+ })
+ }
+}
+
+// asmSigaction is implemented in assembly.
+//go:noescape
+func asmSigaction(sig uintptr, new, old *sigactiont) int32
+
+// raise sends a signal to the calling thread.
+//
+// It must be nosplit because it is used by the signal handler before
+// it definitely has a Go stack.
+//
+//go:nosplit
+func raise(sig uint32) {
+ thr_kill(thr_self(), int(sig))
+}
+
+func signalM(mp *m, sig int) {
+ thr_kill(thread(mp.procid), sig)
+}
diff --git a/src/runtime/os_freebsd2.go b/src/runtime/os_freebsd2.go
new file mode 100644
index 0000000..6947a05
--- /dev/null
+++ b/src/runtime/os_freebsd2.go
@@ -0,0 +1,20 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build freebsd,!amd64
+
+package runtime
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = sigset_all
+ if fn == funcPC(sighandler) {
+ fn = funcPC(sigtramp)
+ }
+ sa.sa_handler = fn
+ sigaction(i, &sa, nil)
+}
diff --git a/src/runtime/os_freebsd_amd64.go b/src/runtime/os_freebsd_amd64.go
new file mode 100644
index 0000000..dc0bb9f
--- /dev/null
+++ b/src/runtime/os_freebsd_amd64.go
@@ -0,0 +1,24 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+func cgoSigtramp()
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = sigset_all
+ if fn == funcPC(sighandler) {
+ if iscgo {
+ fn = funcPC(cgoSigtramp)
+ } else {
+ fn = funcPC(sigtramp)
+ }
+ }
+ sa.sa_handler = fn
+ sigaction(i, &sa, nil)
+}
diff --git a/src/runtime/os_freebsd_arm.go b/src/runtime/os_freebsd_arm.go
new file mode 100644
index 0000000..3feaa5e
--- /dev/null
+++ b/src/runtime/os_freebsd_arm.go
@@ -0,0 +1,48 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "internal/cpu"
+
+const (
+ _HWCAP_VFP = 1 << 6
+ _HWCAP_VFPv3 = 1 << 13
+)
+
+func checkgoarm() {
+ if goarm > 5 && cpu.HWCap&_HWCAP_VFP == 0 {
+ print("runtime: this CPU has no floating point hardware, so it cannot run\n")
+ print("this GOARM=", goarm, " binary. Recompile using GOARM=5.\n")
+ exit(1)
+ }
+ if goarm > 6 && cpu.HWCap&_HWCAP_VFPv3 == 0 {
+ print("runtime: this CPU has no VFPv3 floating point hardware, so it cannot run\n")
+ print("this GOARM=", goarm, " binary. Recompile using GOARM=5 or GOARM=6.\n")
+ exit(1)
+ }
+
+ // osinit not called yet, so ncpu not set: must use getncpu directly.
+ if getncpu() > 1 && goarm < 7 {
+ print("runtime: this system has multiple CPUs and must use\n")
+ print("atomic synchronization instructions. Recompile using GOARM=7.\n")
+ exit(1)
+ }
+}
+
+func archauxv(tag, val uintptr) {
+ switch tag {
+ case _AT_HWCAP:
+ cpu.HWCap = uint(val)
+ case _AT_HWCAP2:
+ cpu.HWCap2 = uint(val)
+ }
+}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_freebsd_arm64.go b/src/runtime/os_freebsd_arm64.go
new file mode 100644
index 0000000..b5b25f0
--- /dev/null
+++ b/src/runtime/os_freebsd_arm64.go
@@ -0,0 +1,12 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in blocking profiler and to seed fastrand().
+ // nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_freebsd_noauxv.go b/src/runtime/os_freebsd_noauxv.go
new file mode 100644
index 0000000..01efb9b
--- /dev/null
+++ b/src/runtime/os_freebsd_noauxv.go
@@ -0,0 +1,11 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build freebsd
+// +build !arm
+
+package runtime
+
+func archauxv(tag, val uintptr) {
+}
diff --git a/src/runtime/os_illumos.go b/src/runtime/os_illumos.go
new file mode 100644
index 0000000..c3c3e4e
--- /dev/null
+++ b/src/runtime/os_illumos.go
@@ -0,0 +1,132 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+//go:cgo_import_dynamic libc_getrctl getrctl "libc.so"
+//go:cgo_import_dynamic libc_rctlblk_get_local_action rctlblk_get_local_action "libc.so"
+//go:cgo_import_dynamic libc_rctlblk_get_local_flags rctlblk_get_local_flags "libc.so"
+//go:cgo_import_dynamic libc_rctlblk_get_value rctlblk_get_value "libc.so"
+//go:cgo_import_dynamic libc_rctlblk_size rctlblk_size "libc.so"
+
+//go:linkname libc_getrctl libc_getrctl
+//go:linkname libc_rctlblk_get_local_action libc_rctlblk_get_local_action
+//go:linkname libc_rctlblk_get_local_flags libc_rctlblk_get_local_flags
+//go:linkname libc_rctlblk_get_value libc_rctlblk_get_value
+//go:linkname libc_rctlblk_size libc_rctlblk_size
+
+var (
+ libc_getrctl,
+ libc_rctlblk_get_local_action,
+ libc_rctlblk_get_local_flags,
+ libc_rctlblk_get_value,
+ libc_rctlblk_size libcFunc
+)
+
+// Return the minimum value seen for the zone CPU cap, or 0 if no cap is
+// detected.
+func getcpucap() uint64 {
+ // The resource control block is an opaque object whose size is only
+ // known to libc. In practice, given the contents, it is unlikely to
+ // grow beyond 8KB so we'll use a static buffer of that size here.
+ const rblkmaxsize = 8 * 1024
+ if rctlblk_size() > rblkmaxsize {
+ return 0
+ }
+
+ // The "zone.cpu-cap" resource control, as described in
+ // resource_controls(5), "sets a limit on the amount of CPU time that
+ // can be used by a zone. The unit used is the percentage of a single
+ // CPU that can be used by all user threads in a zone, expressed as an
+ // integer." A C string of the name must be passed to getrctl(2).
+ name := []byte("zone.cpu-cap\x00")
+
+ // To iterate over the list of values for a particular resource
+ // control, we need two blocks: one for the previously read value and
+ // one for the next value.
+ var rblk0 [rblkmaxsize]byte
+ var rblk1 [rblkmaxsize]byte
+ rblk := &rblk0[0]
+ rblkprev := &rblk1[0]
+
+ var flag uint32 = _RCTL_FIRST
+ var capval uint64 = 0
+
+ for {
+ if getrctl(unsafe.Pointer(&name[0]), unsafe.Pointer(rblkprev), unsafe.Pointer(rblk), flag) != 0 {
+ // The end of the sequence is reported as an ENOENT
+ // failure, but determining the CPU cap is not critical
+ // here. We'll treat any failure as if it were the end
+ // of sequence.
+ break
+ }
+
+ lflags := rctlblk_get_local_flags(unsafe.Pointer(rblk))
+ action := rctlblk_get_local_action(unsafe.Pointer(rblk))
+ if (lflags&_RCTL_LOCAL_MAXIMAL) == 0 && action == _RCTL_LOCAL_DENY {
+ // This is a finite (not maximal) value representing a
+ // cap (deny) action.
+ v := rctlblk_get_value(unsafe.Pointer(rblk))
+ if capval == 0 || capval > v {
+ capval = v
+ }
+ }
+
+ // Swap the blocks around so that we can fetch the next value
+ t := rblk
+ rblk = rblkprev
+ rblkprev = t
+ flag = _RCTL_NEXT
+ }
+
+ return capval
+}
+
+func getncpu() int32 {
+ n := int32(sysconf(__SC_NPROCESSORS_ONLN))
+ if n < 1 {
+ return 1
+ }
+
+ if cents := int32(getcpucap()); cents > 0 {
+ // Convert from a percentage of CPUs to a number of CPUs,
+ // rounding up to make use of a fractional CPU
+ // e.g., 336% becomes 4 CPUs
+ ncap := (cents + 99) / 100
+ if ncap < n {
+ return ncap
+ }
+ }
+
+ return n
+}
+
+//go:nosplit
+func getrctl(controlname, oldbuf, newbuf unsafe.Pointer, flags uint32) uintptr {
+ return sysvicall4(&libc_getrctl, uintptr(controlname), uintptr(oldbuf), uintptr(newbuf), uintptr(flags))
+}
+
+//go:nosplit
+func rctlblk_get_local_action(buf unsafe.Pointer) uintptr {
+ return sysvicall2(&libc_rctlblk_get_local_action, uintptr(buf), uintptr(0))
+}
+
+//go:nosplit
+func rctlblk_get_local_flags(buf unsafe.Pointer) uintptr {
+ return sysvicall1(&libc_rctlblk_get_local_flags, uintptr(buf))
+}
+
+//go:nosplit
+func rctlblk_get_value(buf unsafe.Pointer) uint64 {
+ return uint64(sysvicall1(&libc_rctlblk_get_value, uintptr(buf)))
+}
+
+//go:nosplit
+func rctlblk_size() uintptr {
+ return sysvicall0(&libc_rctlblk_size)
+}
diff --git a/src/runtime/os_js.go b/src/runtime/os_js.go
new file mode 100644
index 0000000..24261e8
--- /dev/null
+++ b/src/runtime/os_js.go
@@ -0,0 +1,155 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build js,wasm
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+func exit(code int32)
+
+func write1(fd uintptr, p unsafe.Pointer, n int32) int32 {
+ if fd > 2 {
+ throw("runtime.write to fd > 2 is unsupported")
+ }
+ wasmWrite(fd, p, n)
+ return n
+}
+
+// Stubs so tests can link correctly. These should never be called.
+func open(name *byte, mode, perm int32) int32 { panic("not implemented") }
+func closefd(fd int32) int32 { panic("not implemented") }
+func read(fd int32, p unsafe.Pointer, n int32) int32 { panic("not implemented") }
+
+//go:noescape
+func wasmWrite(fd uintptr, p unsafe.Pointer, n int32)
+
+func usleep(usec uint32)
+
+func exitThread(wait *uint32)
+
+type mOS struct{}
+
+func osyield()
+
+const _SIGSEGV = 0xb
+
+func sigpanic() {
+ g := getg()
+ if !canpanic(g) {
+ throw("unexpected signal during runtime execution")
+ }
+
+ // js only invokes the exception handler for memory faults.
+ g.sig = _SIGSEGV
+ panicmem()
+}
+
+type sigset struct{}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024)
+ mp.gsignal.m = mp
+}
+
+//go:nosplit
+func sigsave(p *sigset) {
+}
+
+//go:nosplit
+func msigrestore(sigmask sigset) {
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func clearSignalHandlers() {
+}
+
+//go:nosplit
+func sigblock(exiting bool) {
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+}
+
+// Called from dropm to undo the effect of an minit.
+func unminit() {
+}
+
+// Called from exitm, but not from dropm, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+func osinit() {
+ ncpu = 1
+ getg().m.procid = 2
+ physPageSize = 64 * 1024
+}
+
+// wasm has no signals
+const _NSIG = 0
+
+func signame(sig uint32) string {
+ return ""
+}
+
+func crash() {
+ *(*int32)(nil) = 0
+}
+
+func getRandomData(r []byte)
+
+func goenvs() {
+ goenvs_unix()
+}
+
+func initsig(preinit bool) {
+}
+
+// May run with m.p==nil, so write barriers are not allowed.
+//go:nowritebarrier
+func newosproc(mp *m) {
+ panic("newosproc: not implemented")
+}
+
+func setProcessCPUProfiler(hz int32) {}
+func setThreadCPUProfiler(hz int32) {}
+func sigdisable(uint32) {}
+func sigenable(uint32) {}
+func sigignore(uint32) {}
+
+//go:linkname os_sigpipe os.sigpipe
+func os_sigpipe() {
+ throw("too many writes on closed pipe")
+}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
+ return nanotime()
+}
+
+//go:linkname syscall_now syscall.now
+func syscall_now() (sec int64, nsec int32) {
+ sec, nsec, _ = time_now()
+ return
+}
+
+// gsignalStack is unused on js.
+type gsignalStack struct{}
+
+const preemptMSupported = false
+
+func preemptM(mp *m) {
+ // No threads, so nothing to do.
+}
diff --git a/src/runtime/os_linux.go b/src/runtime/os_linux.go
new file mode 100644
index 0000000..058c7da
--- /dev/null
+++ b/src/runtime/os_linux.go
@@ -0,0 +1,504 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type mOS struct{}
+
+//go:noescape
+func futex(addr unsafe.Pointer, op int32, val uint32, ts, addr2 unsafe.Pointer, val3 uint32) int32
+
+// Linux futex.
+//
+// futexsleep(uint32 *addr, uint32 val)
+// futexwakeup(uint32 *addr)
+//
+// Futexsleep atomically checks if *addr == val and if so, sleeps on addr.
+// Futexwakeup wakes up threads sleeping on addr.
+// Futexsleep is allowed to wake up spuriously.
+
+const (
+ _FUTEX_PRIVATE_FLAG = 128
+ _FUTEX_WAIT_PRIVATE = 0 | _FUTEX_PRIVATE_FLAG
+ _FUTEX_WAKE_PRIVATE = 1 | _FUTEX_PRIVATE_FLAG
+)
+
+// Atomically,
+// if(*addr == val) sleep
+// Might be woken up spuriously; that's allowed.
+// Don't sleep longer than ns; ns < 0 means forever.
+//go:nosplit
+func futexsleep(addr *uint32, val uint32, ns int64) {
+ // Some Linux kernels have a bug where futex of
+ // FUTEX_WAIT returns an internal error code
+ // as an errno. Libpthread ignores the return value
+ // here, and so can we: as it says a few lines up,
+ // spurious wakeups are allowed.
+ if ns < 0 {
+ futex(unsafe.Pointer(addr), _FUTEX_WAIT_PRIVATE, val, nil, nil, 0)
+ return
+ }
+
+ var ts timespec
+ ts.setNsec(ns)
+ futex(unsafe.Pointer(addr), _FUTEX_WAIT_PRIVATE, val, unsafe.Pointer(&ts), nil, 0)
+}
+
+// If any procs are sleeping on addr, wake up at most cnt.
+//go:nosplit
+func futexwakeup(addr *uint32, cnt uint32) {
+ ret := futex(unsafe.Pointer(addr), _FUTEX_WAKE_PRIVATE, cnt, nil, nil, 0)
+ if ret >= 0 {
+ return
+ }
+
+ // I don't know that futex wakeup can return
+ // EAGAIN or EINTR, but if it does, it would be
+ // safe to loop and call futex again.
+ systemstack(func() {
+ print("futexwakeup addr=", addr, " returned ", ret, "\n")
+ })
+
+ *(*int32)(unsafe.Pointer(uintptr(0x1006))) = 0x1006
+}
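+
+// Typical pairing (roughly, see lock_futex.go): a contended locker stores
+// mutex_sleeping into the key and calls futexsleep(key, mutex_sleeping, -1);
+// the unlocker stores mutex_unlocked and calls futexwakeup(key, 1).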
+
+func getproccount() int32 {
+ // This buffer is huge (8 kB) but we are on the system stack
+ // and there should be plenty of space (64 kB).
+ // Also this is a leaf, so we're not holding up the memory for long.
+ // See golang.org/issue/11823.
+ // The suggested behavior here is to keep trying with ever-larger
+ // buffers, but we don't have a dynamic memory allocator at the
+ // moment, so that's a bit tricky and seems like overkill.
+ const maxCPUs = 64 * 1024
+ var buf [maxCPUs / 8]byte
+ r := sched_getaffinity(0, unsafe.Sizeof(buf), &buf[0])
+ if r < 0 {
+ return 1
+ }
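+ // r is the number of bytes of CPU mask written by the kernel; count the
+ // set bits in those bytes, one per CPU we may run on.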
+ n := int32(0)
+ for _, v := range buf[:r] {
+ for v != 0 {
+ n += int32(v & 1)
+ v >>= 1
+ }
+ }
+ if n == 0 {
+ n = 1
+ }
+ return n
+}
+
+// Clone, the Linux rfork.
+const (
+ _CLONE_VM = 0x100
+ _CLONE_FS = 0x200
+ _CLONE_FILES = 0x400
+ _CLONE_SIGHAND = 0x800
+ _CLONE_PTRACE = 0x2000
+ _CLONE_VFORK = 0x4000
+ _CLONE_PARENT = 0x8000
+ _CLONE_THREAD = 0x10000
+ _CLONE_NEWNS = 0x20000
+ _CLONE_SYSVSEM = 0x40000
+ _CLONE_SETTLS = 0x80000
+ _CLONE_PARENT_SETTID = 0x100000
+ _CLONE_CHILD_CLEARTID = 0x200000
+ _CLONE_UNTRACED = 0x800000
+ _CLONE_CHILD_SETTID = 0x1000000
+ _CLONE_STOPPED = 0x2000000
+ _CLONE_NEWUTS = 0x4000000
+ _CLONE_NEWIPC = 0x8000000
+
+ // As of QEMU 2.8.0 (5ea2fc84d), user emulation requires all six of these
+ // flags to be set when creating a thread; attempts to share the other
+ // five but leave SYSVSEM unshared will fail with -EINVAL.
+ //
+ // In non-QEMU environments CLONE_SYSVSEM is inconsequential as we do not
+ // use System V semaphores.
+
+ cloneFlags = _CLONE_VM | /* share memory */
+ _CLONE_FS | /* share cwd, etc */
+ _CLONE_FILES | /* share fd table */
+ _CLONE_SIGHAND | /* share sig handler table */
+ _CLONE_SYSVSEM | /* share SysV semaphore undo lists (see issue #20763) */
+ _CLONE_THREAD /* revisit - okay for now */
+)
+
+//go:noescape
+func clone(flags int32, stk, mp, gp, fn unsafe.Pointer) int32
+
+// May run with m.p==nil, so write barriers are not allowed.
+//go:nowritebarrier
+func newosproc(mp *m) {
+ stk := unsafe.Pointer(mp.g0.stack.hi)
+ /*
+ * note: strace gets confused if we use CLONE_PTRACE here.
+ */
+ if false {
+ print("newosproc stk=", stk, " m=", mp, " g=", mp.g0, " clone=", funcPC(clone), " id=", mp.id, " ostk=", &mp, "\n")
+ }
+
+ // Disable signals during clone, so that the new thread starts
+ // with signals disabled. It will enable them in minit.
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ ret := clone(cloneFlags, stk, unsafe.Pointer(mp), unsafe.Pointer(mp.g0), unsafe.Pointer(funcPC(mstart)))
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+
+ if ret < 0 {
+ print("runtime: failed to create new OS thread (have ", mcount(), " already; errno=", -ret, ")\n")
+ if ret == -_EAGAIN {
+ println("runtime: may need to increase max user processes (ulimit -u)")
+ }
+ throw("newosproc")
+ }
+}
+
+// Version of newosproc that doesn't require a valid G.
+//go:nosplit
+func newosproc0(stacksize uintptr, fn unsafe.Pointer) {
+ stack := sysAlloc(stacksize, &memstats.stacks_sys)
+ if stack == nil {
+ write(2, unsafe.Pointer(&failallocatestack[0]), int32(len(failallocatestack)))
+ exit(1)
+ }
+ ret := clone(cloneFlags, unsafe.Pointer(uintptr(stack)+stacksize), nil, nil, fn)
+ if ret < 0 {
+ write(2, unsafe.Pointer(&failthreadcreate[0]), int32(len(failthreadcreate)))
+ exit(1)
+ }
+}
+
+var failallocatestack = []byte("runtime: failed to allocate stack for the new OS thread\n")
+var failthreadcreate = []byte("runtime: failed to create new OS thread\n")
+
+const (
+ _AT_NULL = 0 // End of vector
+ _AT_PAGESZ = 6 // System physical page size
+ _AT_HWCAP = 16 // hardware capability bit vector
+ _AT_RANDOM = 25 // introduced in 2.6.29
+ _AT_HWCAP2 = 26 // hardware capability bit vector 2
+)
+
+var procAuxv = []byte("/proc/self/auxv\x00")
+
+var addrspace_vec [1]byte
+
+func mincore(addr unsafe.Pointer, n uintptr, dst *byte) int32
+
+func sysargs(argc int32, argv **byte) {
+ n := argc + 1
+
+ // skip over argv, envp to get to auxv
+ for argv_index(argv, n) != nil {
+ n++
+ }
+
+ // skip NULL separator
+ n++
+
+ // now argv+n is auxv
+ auxv := (*[1 << 28]uintptr)(add(unsafe.Pointer(argv), uintptr(n)*sys.PtrSize))
+ if sysauxv(auxv[:]) != 0 {
+ return
+ }
+ // In some situations we don't get a loader-provided
+ // auxv, such as when loaded as a library on Android.
+ // Fall back to /proc/self/auxv.
+ fd := open(&procAuxv[0], 0 /* O_RDONLY */, 0)
+ if fd < 0 {
+ // On Android, /proc/self/auxv might be unreadable (issue 9229), so we fall
+ // back to using mincore to detect the physical page size.
+ // mincore should return EINVAL when the address is not a multiple of the
+ // system page size.
+ const size = 256 << 10 // size of memory region to allocate
+ p, err := mmap(nil, size, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ return
+ }
+ var n uintptr
+ for n = 4 << 10; n < size; n <<= 1 {
+ err := mincore(unsafe.Pointer(uintptr(p)+n), 1, &addrspace_vec[0])
+ if err == 0 {
+ physPageSize = n
+ break
+ }
+ }
+ if physPageSize == 0 {
+ physPageSize = size
+ }
+ munmap(p, size)
+ return
+ }
+ var buf [128]uintptr
+ n = read(fd, noescape(unsafe.Pointer(&buf[0])), int32(unsafe.Sizeof(buf)))
+ closefd(fd)
+ if n < 0 {
+ return
+ }
+ // Make sure buf is terminated, even if we didn't read
+ // the whole file.
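+ // (Entries are tag/value pairs, so the terminator must land on an even
+ // index; len(buf)-2 is the last such slot.)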
+ buf[len(buf)-2] = _AT_NULL
+ sysauxv(buf[:])
+}
+
+// startupRandomData holds random bytes initialized at startup. These come from
+// the ELF AT_RANDOM auxiliary vector.
+var startupRandomData []byte
+
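+// sysauxv scans the (tag, value) pairs in auxv, terminated by an _AT_NULL
+// tag, and returns the number of pairs seen; a result of 0 tells sysargs
+// that no usable auxv was provided.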
+func sysauxv(auxv []uintptr) int {
+ var i int
+ for ; auxv[i] != _AT_NULL; i += 2 {
+ tag, val := auxv[i], auxv[i+1]
+ switch tag {
+ case _AT_RANDOM:
+ // The kernel provides a pointer to 16 bytes
+ // of random data.
+ startupRandomData = (*[16]byte)(unsafe.Pointer(val))[:]
+
+ case _AT_PAGESZ:
+ physPageSize = val
+ }
+
+ archauxv(tag, val)
+ vdsoauxv(tag, val)
+ }
+ return i / 2
+}
+
+var sysTHPSizePath = []byte("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size\x00")
+
+func getHugePageSize() uintptr {
+ var numbuf [20]byte
+ fd := open(&sysTHPSizePath[0], 0 /* O_RDONLY */, 0)
+ if fd < 0 {
+ return 0
+ }
+ ptr := noescape(unsafe.Pointer(&numbuf[0]))
+ n := read(fd, ptr, int32(len(numbuf)))
+ closefd(fd)
+ if n <= 0 {
+ return 0
+ }
+ n-- // remove trailing newline
+ v, ok := atoi(slicebytetostringtmp((*byte)(ptr), int(n)))
+ if !ok || v < 0 {
+ v = 0
+ }
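+ // v&(v-1) clears the lowest set bit, so it is zero exactly when v is a
+ // power of two (or zero).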
+ if v&(v-1) != 0 {
+ // v is not a power of 2
+ return 0
+ }
+ return uintptr(v)
+}
+
+func osinit() {
+ ncpu = getproccount()
+ physHugePageSize = getHugePageSize()
+ if iscgo {
+ // #42494 glibc and musl reserve some signals for
+ // internal use and require they not be blocked by
+ // the rest of a normal C runtime. When the go runtime
+ // blocks...unblocks signals, temporarily, the blocked
+ // interval of time is generally very short. As such,
+ // these expectations of *libc code are mostly met by
+ // the combined go+cgo system of threads. However,
+ // when go causes a thread to exit, via a return from
+ // mstart(), the combined runtime can deadlock if
+ // these signals are blocked. Thus, don't block these
+ // signals when exiting threads.
+ // - glibc: SIGCANCEL (32), SIGSETXID (33)
+ // - musl: SIGTIMER (32), SIGCANCEL (33), SIGSYNCCALL (34)
+ sigdelset(&sigsetAllExiting, 32)
+ sigdelset(&sigsetAllExiting, 33)
+ sigdelset(&sigsetAllExiting, 34)
+ }
+ osArchInit()
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+func getRandomData(r []byte) {
+ if startupRandomData != nil {
+ n := copy(r, startupRandomData)
+ extendRandom(r, n)
+ return
+ }
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+// Called to do synchronous initialization of Go code built with
+// -buildmode=c-archive or -buildmode=c-shared.
+// None of the Go runtime is initialized.
+//go:nosplit
+//go:nowritebarrierrec
+func libpreinit() {
+ initsig(true)
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024) // Linux wants >= 2K
+ mp.gsignal.m = mp
+}
+
+func gettid() uint32
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ minitSignals()
+
+ // Cgo-created threads and the bootstrap m are missing a
+ // procid. We need this for asynchronous preemption and it's
+ // useful in debuggers.
+ getg().m.procid = uint64(gettid())
+}
+
+// Called from dropm to undo the effect of an minit.
+//go:nosplit
+func unminit() {
+ unminitSignals()
+}
+
+// Called from exitm, but not from dropm, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+//#ifdef GOARCH_386
+//#define sa_handler k_sa_handler
+//#endif
+
+func sigreturn()
+func sigtramp(sig uint32, info *siginfo, ctx unsafe.Pointer)
+func cgoSigtramp()
+
+//go:noescape
+func sigaltstack(new, old *stackt)
+
+//go:noescape
+func setitimer(mode int32, new, old *itimerval)
+
+//go:noescape
+func rtsigprocmask(how int32, new, old *sigset, size int32)
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigprocmask(how int32, new, old *sigset) {
+ rtsigprocmask(how, new, old, int32(unsafe.Sizeof(*new)))
+}
+
+func raise(sig uint32)
+func raiseproc(sig uint32)
+
+//go:noescape
+func sched_getaffinity(pid, len uintptr, buf *byte) int32
+func osyield()
+
+func pipe() (r, w int32, errno int32)
+func pipe2(flags int32) (r, w int32, errno int32)
+func setNonblock(fd int32)
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTORER | _SA_RESTART
+ sigfillset(&sa.sa_mask)
+ // Although the Linux manpage says the "sa_restorer element is obsolete and
+ // should not be used", the x86_64 kernel requires it. Only use it on
+ // x86.
+ if GOARCH == "386" || GOARCH == "amd64" {
+ sa.sa_restorer = funcPC(sigreturn)
+ }
+ if fn == funcPC(sighandler) {
+ if iscgo {
+ fn = funcPC(cgoSigtramp)
+ } else {
+ fn = funcPC(sigtramp)
+ }
+ }
+ sa.sa_handler = fn
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ if sa.sa_flags&_SA_ONSTACK != 0 {
+ return
+ }
+ sa.sa_flags |= _SA_ONSTACK
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ return sa.sa_handler
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ *(*uintptr)(unsafe.Pointer(&s.ss_sp)) = sp
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+}
+
+// sysSigaction calls the rt_sigaction system call.
+//go:nosplit
+func sysSigaction(sig uint32, new, old *sigactiont) {
+ if rt_sigaction(uintptr(sig), new, old, unsafe.Sizeof(sigactiont{}.sa_mask)) != 0 {
+ // Workaround for bugs in QEMU user mode emulation.
+ //
+ // QEMU turns calls to the sigaction system call into
+ // calls to the C library sigaction call; the C
+ // library call rejects attempts to call sigaction for
+ // SIGCANCEL (32) or SIGSETXID (33).
+ //
+ // QEMU rejects calling sigaction on SIGRTMAX (64).
+ //
+ // Just ignore the error in these cases. There isn't
+ // anything we can do about it anyhow.
+ if sig != 32 && sig != 33 && sig != 64 {
+ // Use system stack to avoid split stack overflow on ppc64/ppc64le.
+ systemstack(func() {
+ throw("sigaction failed")
+ })
+ }
+ }
+}
+
+// rt_sigaction is implemented in assembly.
+//go:noescape
+func rt_sigaction(sig uintptr, new, old *sigactiont, size uintptr) int32
+
+func getpid() int
+func tgkill(tgid, tid, sig int)
+
+// signalM sends a signal to mp.
+func signalM(mp *m, sig int) {
+ tgkill(getpid(), int(mp.procid), sig)
+}
diff --git a/src/runtime/os_linux_arm.go b/src/runtime/os_linux_arm.go
new file mode 100644
index 0000000..b590da7
--- /dev/null
+++ b/src/runtime/os_linux_arm.go
@@ -0,0 +1,49 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "internal/cpu"
+
+const (
+ _HWCAP_VFP = 1 << 6 // introduced in at least 2.6.11
+ _HWCAP_VFPv3 = 1 << 13 // introduced in 2.6.30
+)
+
+func checkgoarm() {
+ // On Android, /proc/self/auxv might be unreadable and hwcap won't
+ // reflect the CPU capabilities. Assume that every Android arm device
+ // has the necessary floating point hardware available.
+ if GOOS == "android" {
+ return
+ }
+ if goarm > 5 && cpu.HWCap&_HWCAP_VFP == 0 {
+ print("runtime: this CPU has no floating point hardware, so it cannot run\n")
+ print("this GOARM=", goarm, " binary. Recompile using GOARM=5.\n")
+ exit(1)
+ }
+ if goarm > 6 && cpu.HWCap&_HWCAP_VFPv3 == 0 {
+ print("runtime: this CPU has no VFPv3 floating point hardware, so it cannot run\n")
+ print("this GOARM=", goarm, " binary. Recompile using GOARM=5 or GOARM=6.\n")
+ exit(1)
+ }
+}
+
+func archauxv(tag, val uintptr) {
+ switch tag {
+ case _AT_HWCAP:
+ cpu.HWCap = uint(val)
+ case _AT_HWCAP2:
+ cpu.HWCap2 = uint(val)
+ }
+}
+
+func osArchInit() {}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in blocking profiler and to seed fastrand().
+ // nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_linux_arm64.go b/src/runtime/os_linux_arm64.go
new file mode 100644
index 0000000..c5fd742
--- /dev/null
+++ b/src/runtime/os_linux_arm64.go
@@ -0,0 +1,25 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build arm64
+
+package runtime
+
+import "internal/cpu"
+
+func archauxv(tag, val uintptr) {
+ switch tag {
+ case _AT_HWCAP:
+ cpu.HWCap = uint(val)
+ }
+}
+
+func osArchInit() {}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in blocking profiler and to seed fastrand().
+ // nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_linux_be64.go b/src/runtime/os_linux_be64.go
new file mode 100644
index 0000000..9860002
--- /dev/null
+++ b/src/runtime/os_linux_be64.go
@@ -0,0 +1,44 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The standard GNU/Linux sigset type on big-endian 64-bit machines.
+
+// +build linux
+// +build ppc64 s390x
+
+package runtime
+
+const (
+ _SS_DISABLE = 2
+ _NSIG = 65
+ _SI_USER = 0
+ _SIG_BLOCK = 0
+ _SIG_UNBLOCK = 1
+ _SIG_SETMASK = 2
+)
+
+type sigset uint64
+
+var sigset_all = sigset(^uint64(0))
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ if i > 64 {
+ throw("unexpected signal greater than 64")
+ }
+ *mask |= 1 << (uint(i) - 1)
+}
+
+func sigdelset(mask *sigset, i int) {
+ if i > 64 {
+ throw("unexpected signal greater than 64")
+ }
+ *mask &^= 1 << (uint(i) - 1)
+}
+
+//go:nosplit
+func sigfillset(mask *uint64) {
+ *mask = ^uint64(0)
+}
diff --git a/src/runtime/os_linux_generic.go b/src/runtime/os_linux_generic.go
new file mode 100644
index 0000000..e1d0952
--- /dev/null
+++ b/src/runtime/os_linux_generic.go
@@ -0,0 +1,44 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !mips
+// +build !mipsle
+// +build !mips64
+// +build !mips64le
+// +build !s390x
+// +build !ppc64
+// +build linux
+
+package runtime
+
+const (
+ _SS_DISABLE = 2
+ _NSIG = 65
+ _SI_USER = 0
+ _SIG_BLOCK = 0
+ _SIG_UNBLOCK = 1
+ _SIG_SETMASK = 2
+)
+
+// It's hard to tease out exactly how big a Sigset is, but
+// rt_sigprocmask crashes if we get it wrong, so if binaries
+// are running, this is right.
+type sigset [2]uint32
+
+var sigset_all = sigset{^uint32(0), ^uint32(0)}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ (*mask)[(i-1)/32] |= 1 << ((uint32(i) - 1) & 31)
+}
+
+func sigdelset(mask *sigset, i int) {
+ (*mask)[(i-1)/32] &^= 1 << ((uint32(i) - 1) & 31)
+}
+
+//go:nosplit
+func sigfillset(mask *uint64) {
+ *mask = ^uint64(0)
+}
diff --git a/src/runtime/os_linux_mips64x.go b/src/runtime/os_linux_mips64x.go
new file mode 100644
index 0000000..815a83a
--- /dev/null
+++ b/src/runtime/os_linux_mips64x.go
@@ -0,0 +1,54 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build mips64 mips64le
+
+package runtime
+
+import "internal/cpu"
+
+func archauxv(tag, val uintptr) {
+ switch tag {
+ case _AT_HWCAP:
+ cpu.HWCap = uint(val)
+ }
+}
+
+func osArchInit() {}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in blocking profiler and to seed fastrand().
+ // nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
+ return nanotime()
+}
+
+const (
+ _SS_DISABLE = 2
+ _NSIG = 129
+ _SI_USER = 0
+ _SIG_BLOCK = 1
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+)
+
+type sigset [2]uint64
+
+var sigset_all = sigset{^uint64(0), ^uint64(0)}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ (*mask)[(i-1)/64] |= 1 << ((uint32(i) - 1) & 63)
+}
+
+func sigdelset(mask *sigset, i int) {
+ (*mask)[(i-1)/64] &^= 1 << ((uint32(i) - 1) & 63)
+}
+
+//go:nosplit
+func sigfillset(mask *[2]uint64) {
+ (*mask)[0], (*mask)[1] = ^uint64(0), ^uint64(0)
+}
diff --git a/src/runtime/os_linux_mipsx.go b/src/runtime/os_linux_mipsx.go
new file mode 100644
index 0000000..00fb02e
--- /dev/null
+++ b/src/runtime/os_linux_mipsx.go
@@ -0,0 +1,48 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build mips mipsle
+
+package runtime
+
+func archauxv(tag, val uintptr) {
+}
+
+func osArchInit() {}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in blocking profiler and to seed fastrand().
+ // nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
+ return nanotime()
+}
+
+const (
+ _SS_DISABLE = 2
+ _NSIG = 128 + 1
+ _SI_USER = 0
+ _SIG_BLOCK = 1
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+)
+
+type sigset [4]uint32
+
+var sigset_all = sigset{^uint32(0), ^uint32(0), ^uint32(0), ^uint32(0)}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ (*mask)[(i-1)/32] |= 1 << ((uint32(i) - 1) & 31)
+}
+
+func sigdelset(mask *sigset, i int) {
+ (*mask)[(i-1)/32] &^= 1 << ((uint32(i) - 1) & 31)
+}
+
+//go:nosplit
+func sigfillset(mask *[4]uint32) {
+ (*mask)[0], (*mask)[1], (*mask)[2], (*mask)[3] = ^uint32(0), ^uint32(0), ^uint32(0), ^uint32(0)
+}
diff --git a/src/runtime/os_linux_noauxv.go b/src/runtime/os_linux_noauxv.go
new file mode 100644
index 0000000..895b4cd
--- /dev/null
+++ b/src/runtime/os_linux_noauxv.go
@@ -0,0 +1,11 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build !arm,!arm64,!mips,!mipsle,!mips64,!mips64le,!s390x,!ppc64,!ppc64le
+
+package runtime
+
+func archauxv(tag, val uintptr) {
+}
diff --git a/src/runtime/os_linux_novdso.go b/src/runtime/os_linux_novdso.go
new file mode 100644
index 0000000..155f415
--- /dev/null
+++ b/src/runtime/os_linux_novdso.go
@@ -0,0 +1,11 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build !386,!amd64,!arm,!arm64,!mips64,!mips64le,!ppc64,!ppc64le
+
+package runtime
+
+func vdsoauxv(tag, val uintptr) {
+}
diff --git a/src/runtime/os_linux_ppc64x.go b/src/runtime/os_linux_ppc64x.go
new file mode 100644
index 0000000..3aedc23
--- /dev/null
+++ b/src/runtime/os_linux_ppc64x.go
@@ -0,0 +1,24 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build ppc64 ppc64le
+
+package runtime
+
+import "internal/cpu"
+
+func archauxv(tag, val uintptr) {
+ switch tag {
+ case _AT_HWCAP:
+ // ppc64x doesn't have a 'cpuid' instruction
+ // equivalent and relies on HWCAP/HWCAP2 bits for
+ // hardware capabilities.
+ cpu.HWCap = uint(val)
+ case _AT_HWCAP2:
+ cpu.HWCap2 = uint(val)
+ }
+}
+
+func osArchInit() {}
diff --git a/src/runtime/os_linux_riscv64.go b/src/runtime/os_linux_riscv64.go
new file mode 100644
index 0000000..9be88a5
--- /dev/null
+++ b/src/runtime/os_linux_riscv64.go
@@ -0,0 +1,7 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+func osArchInit() {}
diff --git a/src/runtime/os_linux_s390x.go b/src/runtime/os_linux_s390x.go
new file mode 100644
index 0000000..b9651f1
--- /dev/null
+++ b/src/runtime/os_linux_s390x.go
@@ -0,0 +1,16 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "internal/cpu"
+
+func archauxv(tag, val uintptr) {
+ switch tag {
+ case _AT_HWCAP:
+ cpu.HWCap = uint(val)
+ }
+}
+
+func osArchInit() {}
diff --git a/src/runtime/os_linux_x86.go b/src/runtime/os_linux_x86.go
new file mode 100644
index 0000000..d91fa1a
--- /dev/null
+++ b/src/runtime/os_linux_x86.go
@@ -0,0 +1,10 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build 386 amd64
+
+package runtime
+
+func osArchInit() {}
diff --git a/src/runtime/os_netbsd.go b/src/runtime/os_netbsd.go
new file mode 100644
index 0000000..2b742a3
--- /dev/null
+++ b/src/runtime/os_netbsd.go
@@ -0,0 +1,396 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ _SS_DISABLE = 4
+ _SIG_BLOCK = 1
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+ _NSIG = 33
+ _SI_USER = 0
+
+ // From NetBSD's <sys/ucontext.h>
+ _UC_SIGMASK = 0x01
+ _UC_CPU = 0x04
+
+ // From <sys/lwp.h>
+ _LWP_DETACHED = 0x00000040
+)
+
+type mOS struct {
+ waitsemacount uint32
+}
+
+//go:noescape
+func setitimer(mode int32, new, old *itimerval)
+
+//go:noescape
+func sigaction(sig uint32, new, old *sigactiont)
+
+//go:noescape
+func sigaltstack(new, old *stackt)
+
+//go:noescape
+func sigprocmask(how int32, new, old *sigset)
+
+//go:noescape
+func sysctl(mib *uint32, miblen uint32, out *byte, size *uintptr, dst *byte, ndst uintptr) int32
+
+func lwp_tramp()
+
+func raiseproc(sig uint32)
+
+func lwp_kill(tid int32, sig int)
+
+//go:noescape
+func getcontext(ctxt unsafe.Pointer)
+
+//go:noescape
+func lwp_create(ctxt unsafe.Pointer, flags uintptr, lwpid unsafe.Pointer) int32
+
+//go:noescape
+func lwp_park(clockid, flags int32, ts *timespec, unpark int32, hint, unparkhint unsafe.Pointer) int32
+
+//go:noescape
+func lwp_unpark(lwp int32, hint unsafe.Pointer) int32
+
+func lwp_self() int32
+
+func osyield()
+
+func kqueue() int32
+
+//go:noescape
+func kevent(kq int32, ch *keventt, nch int32, ev *keventt, nev int32, ts *timespec) int32
+
+func pipe() (r, w int32, errno int32)
+func pipe2(flags int32) (r, w int32, errno int32)
+func closeonexec(fd int32)
+func setNonblock(fd int32)
+
+const (
+ _ESRCH = 3
+ _ETIMEDOUT = 60
+
+ // From NetBSD's <sys/time.h>
+ _CLOCK_REALTIME = 0
+ _CLOCK_VIRTUAL = 1
+ _CLOCK_PROF = 2
+ _CLOCK_MONOTONIC = 3
+
+ _TIMER_RELTIME = 0
+ _TIMER_ABSTIME = 1
+)
+
+var sigset_all = sigset{[4]uint32{^uint32(0), ^uint32(0), ^uint32(0), ^uint32(0)}}
+
+// From NetBSD's <sys/sysctl.h>
+const (
+ _CTL_HW = 6
+ _HW_NCPU = 3
+ _HW_PAGESIZE = 7
+ _HW_NCPUONLINE = 16
+)
+
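+// sysctlInt performs a sysctl(2) query that yields a single int32
+// value and reports whether the call succeeded.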
+func sysctlInt(mib []uint32) (int32, bool) {
+ var out int32
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], uint32(len(mib)), (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret < 0 {
+ return 0, false
+ }
+ return out, true
+}
+
+func getncpu() int32 {
+ if n, ok := sysctlInt([]uint32{_CTL_HW, _HW_NCPUONLINE}); ok {
+ return int32(n)
+ }
+ if n, ok := sysctlInt([]uint32{_CTL_HW, _HW_NCPU}); ok {
+ return int32(n)
+ }
+ return 1
+}
+
+func getPageSize() uintptr {
+ mib := [2]uint32{_CTL_HW, _HW_PAGESIZE}
+ out := uint32(0)
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], 2, (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret >= 0 {
+ return uintptr(out)
+ }
+ return 0
+}
+
+//go:nosplit
+func semacreate(mp *m) {
+}
+
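+// semasleep waits for the m's semaphore count to become non-zero,
+// parking the LWP for at most ns nanoseconds when ns >= 0. It returns
+// 0 once the semaphore is acquired and -1 on timeout.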
+//go:nosplit
+func semasleep(ns int64) int32 {
+ _g_ := getg()
+ var deadline int64
+ if ns >= 0 {
+ deadline = nanotime() + ns
+ }
+
+ for {
+ v := atomic.Load(&_g_.m.waitsemacount)
+ if v > 0 {
+ if atomic.Cas(&_g_.m.waitsemacount, v, v-1) {
+ return 0 // semaphore acquired
+ }
+ continue
+ }
+
+ // Sleep until unparked by semawakeup or timeout.
+ var tsp *timespec
+ var ts timespec
+ if ns >= 0 {
+ wait := deadline - nanotime()
+ if wait <= 0 {
+ return -1
+ }
+ ts.setNsec(wait)
+ tsp = &ts
+ }
+ ret := lwp_park(_CLOCK_MONOTONIC, _TIMER_RELTIME, tsp, 0, unsafe.Pointer(&_g_.m.waitsemacount), nil)
+ if ret == _ETIMEDOUT {
+ return -1
+ }
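+ // Any other result, including a spurious wakeup, loops
+ // around and re-checks waitsemacount.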
+ }
+}
+
+//go:nosplit
+func semawakeup(mp *m) {
+ atomic.Xadd(&mp.waitsemacount, 1)
+ // From NetBSD's _lwp_unpark(2) manual:
+ // "If the target LWP is not currently waiting, it will return
+ // immediately upon the next call to _lwp_park()."
+ ret := lwp_unpark(int32(mp.procid), unsafe.Pointer(&mp.waitsemacount))
+ if ret != 0 && ret != _ESRCH {
+ // semawakeup can be called on signal stack.
+ systemstack(func() {
+ print("thrwakeup addr=", &mp.waitsemacount, " sem=", mp.waitsemacount, " ret=", ret, "\n")
+ })
+ }
+}
+
+// May run with m.p==nil, so write barriers are not allowed.
+//go:nowritebarrier
+func newosproc(mp *m) {
+ stk := unsafe.Pointer(mp.g0.stack.hi)
+ if false {
+ print("newosproc stk=", stk, " m=", mp, " g=", mp.g0, " id=", mp.id, " ostk=", &mp, "\n")
+ }
+
+ var uc ucontextt
+ getcontext(unsafe.Pointer(&uc))
+
+ // _UC_SIGMASK does not seem to work here.
+ // It would be nice if _UC_SIGMASK and _UC_STACK
+ // worked so that we could do all the work setting
+ // the sigmask and the stack here, instead of setting
+ // the mask here and the stack in netbsdMstart.
+ // For now do the blocking manually.
+ uc.uc_flags = _UC_SIGMASK | _UC_CPU
+ uc.uc_link = nil
+ uc.uc_sigmask = sigset_all
+
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+
+ lwp_mcontext_init(&uc.uc_mcontext, stk, mp, mp.g0, funcPC(netbsdMstart))
+
+ ret := lwp_create(unsafe.Pointer(&uc), _LWP_DETACHED, unsafe.Pointer(&mp.procid))
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if ret < 0 {
+ print("runtime: failed to create new OS thread (have ", mcount()-1, " already; errno=", -ret, ")\n")
+ if ret == -_EAGAIN {
+ println("runtime: may need to increase max user processes (ulimit -p)")
+ }
+ throw("runtime.newosproc")
+ }
+}
+
+// netbsdMstart is the function that a newly created thread starts
+// executing. On NetBSD, a new thread inherits the signal stack
+// of the creating thread. That confuses minit, so we remove that
+// signal stack here before calling the regular mstart. It's a bit
+// baroque to remove a signal stack here only to add one in minit, but
+// it's a simple change that keeps NetBSD working like other OSes.
+// At this point all signals are blocked, so there is no race.
+//go:nosplit
+func netbsdMstart() {
+ st := stackt{ss_flags: _SS_DISABLE}
+ sigaltstack(&st, nil)
+ mstart()
+}
+
+func osinit() {
+ ncpu = getncpu()
+ if physPageSize == 0 {
+ physPageSize = getPageSize()
+ }
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+//go:nosplit
+func getRandomData(r []byte) {
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024)
+ mp.gsignal.m = mp
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ _g_ := getg()
+ _g_.m.procid = uint64(lwp_self())
+
+ // On NetBSD a thread created by pthread_create inherits the
+ // signal stack of the creating thread. We always create a
+ // new signal stack here, to avoid having two Go threads using
+ // the same signal stack. This breaks the case of a thread
+ // created in C that calls sigaltstack and then calls a Go
+ // function, because we will lose track of the C code's
+ // sigaltstack, but it's the best we can do.
+ signalstack(&_g_.m.gsignal.stack)
+ _g_.m.newSigstack = true
+
+ minitSignalMask()
+}
+
+// Called from dropm to undo the effect of an minit.
+//go:nosplit
+func unminit() {
+ unminitSignals()
+}
+
+// Called from exitm, but not from drop, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+func sigtramp()
+
+type sigactiont struct {
+ sa_sigaction uintptr
+ sa_mask sigset
+ sa_flags int32
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = sigset_all
+ if fn == funcPC(sighandler) {
+ fn = funcPC(sigtramp)
+ }
+ sa.sa_sigaction = fn
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ throw("setsigstack")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ return sa.sa_sigaction
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ s.ss_sp = sp
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ mask.__bits[(i-1)/32] |= 1 << ((uint32(i) - 1) & 31)
+}
+
+func sigdelset(mask *sigset, i int) {
+ mask.__bits[(i-1)/32] &^= 1 << ((uint32(i) - 1) & 31)
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+}
+
+func sysargs(argc int32, argv **byte) {
+ n := argc + 1
+
+ // skip over argv, envp to get to auxv
+ for argv_index(argv, n) != nil {
+ n++
+ }
+
+ // skip NULL separator
+ n++
+
+ // now argv+n is auxv
+ auxv := (*[1 << 28]uintptr)(add(unsafe.Pointer(argv), uintptr(n)*sys.PtrSize))
+ sysauxv(auxv[:])
+}
+
+const (
+ _AT_NULL = 0 // Terminates the vector
+ _AT_PAGESZ = 6 // Page size in bytes
+)
+
+func sysauxv(auxv []uintptr) {
+ for i := 0; auxv[i] != _AT_NULL; i += 2 {
+ tag, val := auxv[i], auxv[i+1]
+ switch tag {
+ case _AT_PAGESZ:
+ physPageSize = val
+ }
+ }
+}
+
+// raise sends signal to the calling thread.
+//
+// It must be nosplit because it is used by the signal handler before
+// it definitely has a Go stack.
+//
+//go:nosplit
+func raise(sig uint32) {
+ lwp_kill(lwp_self(), int(sig))
+}
+
+func signalM(mp *m, sig int) {
+ lwp_kill(int32(mp.procid), sig)
+}
diff --git a/src/runtime/os_netbsd_386.go b/src/runtime/os_netbsd_386.go
new file mode 100644
index 0000000..037f7e3
--- /dev/null
+++ b/src/runtime/os_netbsd_386.go
@@ -0,0 +1,16 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+func lwp_mcontext_init(mc *mcontextt, stk unsafe.Pointer, mp *m, gp *g, fn uintptr) {
+ // Machine dependent mcontext initialisation for LWP.
+ mc.__gregs[_REG_EIP] = uint32(funcPC(lwp_tramp))
+ mc.__gregs[_REG_UESP] = uint32(uintptr(stk))
+ mc.__gregs[_REG_EBX] = uint32(uintptr(unsafe.Pointer(mp)))
+ mc.__gregs[_REG_EDX] = uint32(uintptr(unsafe.Pointer(gp)))
+ mc.__gregs[_REG_ESI] = uint32(fn)
+}
diff --git a/src/runtime/os_netbsd_amd64.go b/src/runtime/os_netbsd_amd64.go
new file mode 100644
index 0000000..5118b0c
--- /dev/null
+++ b/src/runtime/os_netbsd_amd64.go
@@ -0,0 +1,16 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+func lwp_mcontext_init(mc *mcontextt, stk unsafe.Pointer, mp *m, gp *g, fn uintptr) {
+ // Machine dependent mcontext initialisation for LWP.
+ mc.__gregs[_REG_RIP] = uint64(funcPC(lwp_tramp))
+ mc.__gregs[_REG_RSP] = uint64(uintptr(stk))
+ mc.__gregs[_REG_R8] = uint64(uintptr(unsafe.Pointer(mp)))
+ mc.__gregs[_REG_R9] = uint64(uintptr(unsafe.Pointer(gp)))
+ mc.__gregs[_REG_R12] = uint64(fn)
+}
diff --git a/src/runtime/os_netbsd_arm.go b/src/runtime/os_netbsd_arm.go
new file mode 100644
index 0000000..b5ec23e
--- /dev/null
+++ b/src/runtime/os_netbsd_arm.go
@@ -0,0 +1,34 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+func lwp_mcontext_init(mc *mcontextt, stk unsafe.Pointer, mp *m, gp *g, fn uintptr) {
+ // Machine dependent mcontext initialisation for LWP.
+ mc.__gregs[_REG_R15] = uint32(funcPC(lwp_tramp))
+ mc.__gregs[_REG_R13] = uint32(uintptr(stk))
+ mc.__gregs[_REG_R0] = uint32(uintptr(unsafe.Pointer(mp)))
+ mc.__gregs[_REG_R1] = uint32(uintptr(unsafe.Pointer(gp)))
+ mc.__gregs[_REG_R2] = uint32(fn)
+}
+
+func checkgoarm() {
+ // TODO(minux): FP checks like in os_linux_arm.go.
+
+ // osinit not called yet, so ncpu not set: must use getncpu directly.
+ if getncpu() > 1 && goarm < 7 {
+ print("runtime: this system has multiple CPUs and must use\n")
+ print("atomic synchronization instructions. Recompile using GOARM=7.\n")
+ exit(1)
+ }
+}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_netbsd_arm64.go b/src/runtime/os_netbsd_arm64.go
new file mode 100644
index 0000000..8d21b0a
--- /dev/null
+++ b/src/runtime/os_netbsd_arm64.go
@@ -0,0 +1,23 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+func lwp_mcontext_init(mc *mcontextt, stk unsafe.Pointer, mp *m, gp *g, fn uintptr) {
+ // Machine dependent mcontext initialisation for LWP.
+ mc.__gregs[_REG_ELR] = uint64(funcPC(lwp_tramp))
+ mc.__gregs[_REG_X31] = uint64(uintptr(stk))
+ mc.__gregs[_REG_X0] = uint64(uintptr(unsafe.Pointer(mp)))
+ mc.__gregs[_REG_X1] = uint64(uintptr(unsafe.Pointer(mp.g0)))
+ mc.__gregs[_REG_X2] = uint64(fn)
+}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_nonopenbsd.go b/src/runtime/os_nonopenbsd.go
new file mode 100644
index 0000000..e65697b
--- /dev/null
+++ b/src/runtime/os_nonopenbsd.go
@@ -0,0 +1,17 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !openbsd
+
+package runtime
+
+// osStackAlloc performs OS-specific initialization before s is used
+// as stack memory.
+func osStackAlloc(s *mspan) {
+}
+
+// osStackFree undoes the effect of osStackAlloc before s is returned
+// to the heap.
+func osStackFree(s *mspan) {
+}
diff --git a/src/runtime/os_only_solaris.go b/src/runtime/os_only_solaris.go
new file mode 100644
index 0000000..e2f5409
--- /dev/null
+++ b/src/runtime/os_only_solaris.go
@@ -0,0 +1,18 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Solaris code that doesn't also apply to illumos.
+
+// +build !illumos
+
+package runtime
+
+func getncpu() int32 {
+ n := int32(sysconf(__SC_NPROCESSORS_ONLN))
+ if n < 1 {
+ return 1
+ }
+
+ return n
+}
diff --git a/src/runtime/os_openbsd.go b/src/runtime/os_openbsd.go
new file mode 100644
index 0000000..6259b96
--- /dev/null
+++ b/src/runtime/os_openbsd.go
@@ -0,0 +1,274 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+type mOS struct {
+ waitsemacount uint32
+}
+
+const (
+ _ESRCH = 3
+ _EWOULDBLOCK = _EAGAIN
+ _ENOTSUP = 91
+
+ // From OpenBSD's sys/time.h
+ _CLOCK_REALTIME = 0
+ _CLOCK_VIRTUAL = 1
+ _CLOCK_PROF = 2
+ _CLOCK_MONOTONIC = 3
+)
+
+type sigset uint32
+
+var sigset_all = ^sigset(0)
+
+// From OpenBSD's <sys/sysctl.h>
+const (
+ _CTL_KERN = 1
+ _KERN_OSREV = 3
+
+ _CTL_HW = 6
+ _HW_NCPU = 3
+ _HW_PAGESIZE = 7
+ _HW_NCPUONLINE = 25
+)
+
+func sysctlInt(mib []uint32) (int32, bool) {
+ var out int32
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], uint32(len(mib)), (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret < 0 {
+ return 0, false
+ }
+ return out, true
+}
+
+func getncpu() int32 {
+ // Try hw.ncpuonline first because hw.ncpu would report a number twice as
+ // high as the actual CPUs running on OpenBSD 6.4 with hyperthreading
+ // disabled (hw.smt=0). See https://golang.org/issue/30127
+ if n, ok := sysctlInt([]uint32{_CTL_HW, _HW_NCPUONLINE}); ok {
+ return int32(n)
+ }
+ if n, ok := sysctlInt([]uint32{_CTL_HW, _HW_NCPU}); ok {
+ return int32(n)
+ }
+ return 1
+}
+
+func getPageSize() uintptr {
+ if ps, ok := sysctlInt([]uint32{_CTL_HW, _HW_PAGESIZE}); ok {
+ return uintptr(ps)
+ }
+ return 0
+}
+
+func getOSRev() int {
+ if osrev, ok := sysctlInt([]uint32{_CTL_KERN, _KERN_OSREV}); ok {
+ return int(osrev)
+ }
+ return 0
+}
+
+//go:nosplit
+func semacreate(mp *m) {
+}
+
+//go:nosplit
+func semasleep(ns int64) int32 {
+ _g_ := getg()
+
+ // Compute sleep deadline.
+ var tsp *timespec
+ if ns >= 0 {
+ var ts timespec
+ ts.setNsec(ns + nanotime())
+ tsp = &ts
+ }
+
+ for {
+ v := atomic.Load(&_g_.m.waitsemacount)
+ if v > 0 {
+ if atomic.Cas(&_g_.m.waitsemacount, v, v-1) {
+ return 0 // semaphore acquired
+ }
+ continue
+ }
+
+ // Sleep until woken by semawakeup or timeout; or abort if waitsemacount != 0.
+ //
+ // From OpenBSD's __thrsleep(2) manual:
+ // "The abort argument, if not NULL, points to an int that will
+ // be examined [...] immediately before blocking. If that int
+ // is non-zero then __thrsleep() will immediately return EINTR
+ // without blocking."
+ ret := thrsleep(uintptr(unsafe.Pointer(&_g_.m.waitsemacount)), _CLOCK_MONOTONIC, tsp, 0, &_g_.m.waitsemacount)
+ if ret == _EWOULDBLOCK {
+ return -1
+ }
+ }
+}
+
+//go:nosplit
+func semawakeup(mp *m) {
+ atomic.Xadd(&mp.waitsemacount, 1)
+ ret := thrwakeup(uintptr(unsafe.Pointer(&mp.waitsemacount)), 1)
+ if ret != 0 && ret != _ESRCH {
+ // semawakeup can be called on signal stack.
+ systemstack(func() {
+ print("thrwakeup addr=", &mp.waitsemacount, " sem=", mp.waitsemacount, " ret=", ret, "\n")
+ })
+ }
+}
+
+func osinit() {
+ ncpu = getncpu()
+ physPageSize = getPageSize()
+ haveMapStack = getOSRev() >= 201805 // OpenBSD 6.3
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+//go:nosplit
+func getRandomData(r []byte) {
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ gsignalSize := int32(32 * 1024)
+ if GOARCH == "mips64" {
+ gsignalSize = int32(64 * 1024)
+ }
+ mp.gsignal = malg(gsignalSize)
+ mp.gsignal.m = mp
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ getg().m.procid = uint64(getthrid())
+ minitSignals()
+}
+
+// Called from dropm to undo the effect of an minit.
+//go:nosplit
+func unminit() {
+ unminitSignals()
+}
+
+// Called from exitm, but not from drop, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+func sigtramp()
+
+type sigactiont struct {
+ sa_sigaction uintptr
+ sa_mask uint32
+ sa_flags int32
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = uint32(sigset_all)
+ if fn == funcPC(sighandler) {
+ fn = funcPC(sigtramp)
+ }
+ sa.sa_sigaction = fn
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ throw("setsigstack")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ return sa.sa_sigaction
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ s.ss_sp = sp
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ *mask |= 1 << (uint32(i) - 1)
+}
+
+func sigdelset(mask *sigset, i int) {
+ *mask &^= 1 << (uint32(i) - 1)
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+}
+
+var haveMapStack = false
+
+func osStackAlloc(s *mspan) {
+ // OpenBSD 6.4+ requires that stacks be mapped with MAP_STACK.
+ // It will check this on entry to system calls, traps, and
+ // when switching to the alternate signal stack.
+ //
+ // This function is called before s is used for any data, so
+ // it's safe to simply re-map it.
+ osStackRemap(s, _MAP_STACK)
+}
+
+func osStackFree(s *mspan) {
+ // Undo MAP_STACK.
+ osStackRemap(s, 0)
+}
+
+func osStackRemap(s *mspan, flags int32) {
+ if !haveMapStack {
+ // OpenBSD prior to 6.3 did not have MAP_STACK and so
+ // the following mmap will fail. But it also didn't
+ // require MAP_STACK (obviously), so there's no need
+ // to do the mmap.
+ return
+ }
+ a, err := mmap(unsafe.Pointer(s.base()), s.npages*pageSize, _PROT_READ|_PROT_WRITE, _MAP_PRIVATE|_MAP_ANON|_MAP_FIXED|flags, -1, 0)
+ if err != 0 || uintptr(a) != s.base() {
+ print("runtime: remapping stack memory ", hex(s.base()), " ", s.npages*pageSize, " a=", a, " err=", err, "\n")
+ throw("remapping stack memory failed")
+ }
+}
+
+//go:nosplit
+func raise(sig uint32) {
+ thrkill(getthrid(), int(sig))
+}
+
+func signalM(mp *m, sig int) {
+ thrkill(int32(mp.procid), sig)
+}
diff --git a/src/runtime/os_openbsd_arm.go b/src/runtime/os_openbsd_arm.go
new file mode 100644
index 0000000..0a24096
--- /dev/null
+++ b/src/runtime/os_openbsd_arm.go
@@ -0,0 +1,23 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+func checkgoarm() {
+ // TODO(minux): FP checks like in os_linux_arm.go.
+
+ // osinit not called yet, so ncpu not set: must use getncpu directly.
+ if getncpu() > 1 && goarm < 7 {
+ print("runtime: this system has multiple CPUs and must use\n")
+ print("atomic synchronization instructions. Recompile using GOARM=7.\n")
+ exit(1)
+ }
+}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_openbsd_arm64.go b/src/runtime/os_openbsd_arm64.go
new file mode 100644
index 0000000..d71de7d
--- /dev/null
+++ b/src/runtime/os_openbsd_arm64.go
@@ -0,0 +1,12 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_openbsd_libc.go b/src/runtime/os_openbsd_libc.go
new file mode 100644
index 0000000..2edb035
--- /dev/null
+++ b/src/runtime/os_openbsd_libc.go
@@ -0,0 +1,58 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build openbsd,amd64 openbsd,arm64
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+var failThreadCreate = []byte("runtime: failed to create new OS thread\n")
+
+// mstart_stub provides glue code to call mstart from pthread_create.
+func mstart_stub()
+
+// May run with m.p==nil, so write barriers are not allowed.
+//go:nowritebarrierrec
+func newosproc(mp *m) {
+ if false {
+ print("newosproc m=", mp, " g=", mp.g0, " id=", mp.id, " ostk=", &mp, "\n")
+ }
+
+ // Initialize an attribute object.
+ var attr pthreadattr
+ if err := pthread_attr_init(&attr); err != 0 {
+ write(2, unsafe.Pointer(&failThreadCreate[0]), int32(len(failThreadCreate)))
+ exit(1)
+ }
+
+ // Find out OS stack size for our own stack guard.
+ var stacksize uintptr
+ if pthread_attr_getstacksize(&attr, &stacksize) != 0 {
+ write(2, unsafe.Pointer(&failThreadCreate[0]), int32(len(failThreadCreate)))
+ exit(1)
+ }
+ mp.g0.stack.hi = stacksize // for mstart
+
+ // Tell the pthread library we won't join with this thread.
+ if pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED) != 0 {
+ write(2, unsafe.Pointer(&failThreadCreate[0]), int32(len(failThreadCreate)))
+ exit(1)
+ }
+
+ // Finally, create the thread. It starts at mstart_stub, which does some low-level
+ // setup and then calls mstart.
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ err := pthread_create(&attr, funcPC(mstart_stub), unsafe.Pointer(mp))
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if err != 0 {
+ write(2, unsafe.Pointer(&failThreadCreate[0]), int32(len(failThreadCreate)))
+ exit(1)
+ }
+
+ pthread_attr_destroy(&attr)
+}
diff --git a/src/runtime/os_openbsd_mips64.go b/src/runtime/os_openbsd_mips64.go
new file mode 100644
index 0000000..ae220cd
--- /dev/null
+++ b/src/runtime/os_openbsd_mips64.go
@@ -0,0 +1,12 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_openbsd_syscall.go b/src/runtime/os_openbsd_syscall.go
new file mode 100644
index 0000000..16ff2b8
--- /dev/null
+++ b/src/runtime/os_openbsd_syscall.go
@@ -0,0 +1,46 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build openbsd,!amd64
+// +build openbsd,!arm64
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+//go:noescape
+func tfork(param *tforkt, psize uintptr, mm *m, gg *g, fn uintptr) int32
+
+// May run with m.p==nil, so write barriers are not allowed.
+//go:nowritebarrier
+func newosproc(mp *m) {
+ stk := unsafe.Pointer(mp.g0.stack.hi)
+ if false {
+ print("newosproc stk=", stk, " m=", mp, " g=", mp.g0, " id=", mp.id, " ostk=", &mp, "\n")
+ }
+
+ // Stack pointer must point inside stack area (as marked with MAP_STACK),
+ // rather than at the top of it.
+ param := tforkt{
+ tf_tcb: unsafe.Pointer(&mp.tls[0]),
+ tf_tid: nil, // minit will record tid
+ tf_stack: uintptr(stk) - sys.PtrSize,
+ }
+
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ ret := tfork(&param, unsafe.Sizeof(param), mp, mp.g0, funcPC(mstart))
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+
+ if ret < 0 {
+ print("runtime: failed to create new OS thread (have ", mcount()-1, " already; errno=", -ret, ")\n")
+ if ret == -_EAGAIN {
+ println("runtime: may need to increase max user processes (ulimit -p)")
+ }
+ throw("runtime.newosproc")
+ }
+}
diff --git a/src/runtime/os_openbsd_syscall1.go b/src/runtime/os_openbsd_syscall1.go
new file mode 100644
index 0000000..b0bef4c
--- /dev/null
+++ b/src/runtime/os_openbsd_syscall1.go
@@ -0,0 +1,15 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build openbsd,!amd64,!arm64
+
+package runtime
+
+//go:noescape
+func thrsleep(ident uintptr, clock_id int32, tsp *timespec, lock uintptr, abort *uint32) int32
+
+//go:noescape
+func thrwakeup(ident uintptr, n int32) int32
+
+func osyield()
diff --git a/src/runtime/os_openbsd_syscall2.go b/src/runtime/os_openbsd_syscall2.go
new file mode 100644
index 0000000..ab94051
--- /dev/null
+++ b/src/runtime/os_openbsd_syscall2.go
@@ -0,0 +1,95 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build openbsd,!amd64,!arm64
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+//go:noescape
+func sigaction(sig uint32, new, old *sigactiont)
+
+func kqueue() int32
+
+//go:noescape
+func kevent(kq int32, ch *keventt, nch int32, ev *keventt, nev int32, ts *timespec) int32
+
+func raiseproc(sig uint32)
+
+func getthrid() int32
+func thrkill(tid int32, sig int)
+
+// read calls the read system call.
+// It returns a non-negative number of bytes read or a negative errno value.
+func read(fd int32, p unsafe.Pointer, n int32) int32
+
+func closefd(fd int32) int32
+
+func exit(code int32)
+func usleep(usec uint32)
+
+// write calls the write system call.
+// It returns a non-negative number of bytes written or a negative errno value.
+//go:noescape
+func write1(fd uintptr, p unsafe.Pointer, n int32) int32
+
+//go:noescape
+func open(name *byte, mode, perm int32) int32
+
+// The return value is only set on Linux, to be used in osinit().
+func madvise(addr unsafe.Pointer, n uintptr, flags int32) int32
+
+// exitThread terminates the current thread, writing *wait = 0 when
+// the stack is safe to reclaim.
+//
+//go:noescape
+func exitThread(wait *uint32)
+
+//go:noescape
+func obsdsigprocmask(how int32, new sigset) sigset
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigprocmask(how int32, new, old *sigset) {
+ n := sigset(0)
+ if new != nil {
+ n = *new
+ }
+ r := obsdsigprocmask(how, n)
+ if old != nil {
+ *old = r
+ }
+}
+
+func pipe() (r, w int32, errno int32)
+func pipe2(flags int32) (r, w int32, errno int32)
+
+//go:noescape
+func setitimer(mode int32, new, old *itimerval)
+
+//go:noescape
+func sysctl(mib *uint32, miblen uint32, out *byte, size *uintptr, dst *byte, ndst uintptr) int32
+
+// mmap calls the mmap system call. It is implemented in assembly.
+// We only pass the lower 32 bits of the file offset to the
+// assembly routine; the higher bits (if required) should be provided
+// by the assembly routine as 0.
+// The err result is an OS error code such as ENOMEM.
+func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (p unsafe.Pointer, err int)
+
+// munmap calls the munmap system call. It is implemented in assembly.
+func munmap(addr unsafe.Pointer, n uintptr)
+
+func nanotime1() int64
+
+//go:noescape
+func sigaltstack(new, old *stackt)
+
+func closeonexec(fd int32)
+func setNonblock(fd int32)
+
+func walltime1() (sec int64, nsec int32)
diff --git a/src/runtime/os_plan9.go b/src/runtime/os_plan9.go
new file mode 100644
index 0000000..2a84a73
--- /dev/null
+++ b/src/runtime/os_plan9.go
@@ -0,0 +1,514 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+type mOS struct {
+ waitsemacount uint32
+ notesig *int8
+ errstr *byte
+ ignoreHangup bool
+}
+
+func closefd(fd int32) int32
+
+//go:noescape
+func open(name *byte, mode, perm int32) int32
+
+//go:noescape
+func pread(fd int32, buf unsafe.Pointer, nbytes int32, offset int64) int32
+
+//go:noescape
+func pwrite(fd int32, buf unsafe.Pointer, nbytes int32, offset int64) int32
+
+func seek(fd int32, offset int64, whence int32) int64
+
+//go:noescape
+func exits(msg *byte)
+
+//go:noescape
+func brk_(addr unsafe.Pointer) int32
+
+func sleep(ms int32) int32
+
+func rfork(flags int32) int32
+
+//go:noescape
+func plan9_semacquire(addr *uint32, block int32) int32
+
+//go:noescape
+func plan9_tsemacquire(addr *uint32, ms int32) int32
+
+//go:noescape
+func plan9_semrelease(addr *uint32, count int32) int32
+
+//go:noescape
+func notify(fn unsafe.Pointer) int32
+
+func noted(mode int32) int32
+
+//go:noescape
+func nsec(*int64) int64
+
+//go:noescape
+func sigtramp(ureg, note unsafe.Pointer)
+
+func setfpmasks()
+
+//go:noescape
+func tstart_plan9(newm *m)
+
+func errstr() string
+
+type _Plink uintptr
+
+//go:linkname os_sigpipe os.sigpipe
+func os_sigpipe() {
+ throw("too many writes on closed pipe")
+}
+
+func sigpanic() {
+ g := getg()
+ if !canpanic(g) {
+ throw("unexpected signal during runtime execution")
+ }
+
+ note := gostringnocopy((*byte)(unsafe.Pointer(g.m.notesig)))
+ switch g.sig {
+ case _SIGRFAULT, _SIGWFAULT:
+ i := indexNoFloat(note, "addr=")
+ if i >= 0 {
+ i += 5
+ } else if i = indexNoFloat(note, "va="); i >= 0 {
+ i += 3
+ } else {
+ panicmem()
+ }
+ addr := note[i:]
+ g.sigcode1 = uintptr(atolwhex(addr))
+ if g.sigcode1 < 0x1000 {
+ panicmem()
+ }
+ if g.paniconfault {
+ panicmemAddr(g.sigcode1)
+ }
+ print("unexpected fault address ", hex(g.sigcode1), "\n")
+ throw("fault")
+ case _SIGTRAP:
+ if g.paniconfault {
+ panicmem()
+ }
+ throw(note)
+ case _SIGINTDIV:
+ panicdivide()
+ case _SIGFLOAT:
+ panicfloat()
+ default:
+ panic(errorString(note))
+ }
+}
+
+// indexNoFloat is bytealg.IndexString but safe to use in a note
+// handler.
+func indexNoFloat(s, t string) int {
+ if len(t) == 0 {
+ return 0
+ }
+ for i := 0; i < len(s); i++ {
+ if s[i] == t[0] && hasPrefix(s[i:], t) {
+ return i
+ }
+ }
+ return -1
+}
+
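+// atolwhex parses an integer in strtol-like form: optional leading
+// whitespace and sign, then hexadecimal (0x or 0X prefix), octal
+// (leading 0), or decimal digits.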
+func atolwhex(p string) int64 {
+ for hasPrefix(p, " ") || hasPrefix(p, "\t") {
+ p = p[1:]
+ }
+ neg := false
+ if hasPrefix(p, "-") || hasPrefix(p, "+") {
+ neg = p[0] == '-'
+ p = p[1:]
+ for hasPrefix(p, " ") || hasPrefix(p, "\t") {
+ p = p[1:]
+ }
+ }
+ var n int64
+ switch {
+ case hasPrefix(p, "0x"), hasPrefix(p, "0X"):
+ p = p[2:]
+ for ; len(p) > 0; p = p[1:] {
+ if '0' <= p[0] && p[0] <= '9' {
+ n = n*16 + int64(p[0]-'0')
+ } else if 'a' <= p[0] && p[0] <= 'f' {
+ n = n*16 + int64(p[0]-'a'+10)
+ } else if 'A' <= p[0] && p[0] <= 'F' {
+ n = n*16 + int64(p[0]-'A'+10)
+ } else {
+ break
+ }
+ }
+ case hasPrefix(p, "0"):
+ for ; len(p) > 0 && '0' <= p[0] && p[0] <= '7'; p = p[1:] {
+ n = n*8 + int64(p[0]-'0')
+ }
+ default:
+ for ; len(p) > 0 && '0' <= p[0] && p[0] <= '9'; p = p[1:] {
+ n = n*10 + int64(p[0]-'0')
+ }
+ }
+ if neg {
+ n = -n
+ }
+ return n
+}
+
+type sigset struct{}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ // Initialize stack and goroutine for note handling.
+ mp.gsignal = malg(32 * 1024)
+ mp.gsignal.m = mp
+ mp.notesig = (*int8)(mallocgc(_ERRMAX, nil, true))
+ // Initialize stack for handling strings from the
+ // errstr system call, as used in package syscall.
+ mp.errstr = (*byte)(mallocgc(_ERRMAX, nil, true))
+}
+
+func sigsave(p *sigset) {
+}
+
+func msigrestore(sigmask sigset) {
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func clearSignalHandlers() {
+}
+
+func sigblock(exiting bool) {
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ if atomic.Load(&exiting) != 0 {
+ exits(&emptystatus[0])
+ }
+ // Mask all SSE floating-point exceptions
+ // when running on the 64-bit kernel.
+ setfpmasks()
+}
+
+// Called from dropm to undo the effect of an minit.
+func unminit() {
+}
+
+// Called from exitm, but not from drop, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+var sysstat = []byte("/dev/sysstat\x00")
+
+func getproccount() int32 {
+ var buf [2048]byte
+ fd := open(&sysstat[0], _OREAD, 0)
+ if fd < 0 {
+ return 1
+ }
+ ncpu := int32(0)
+ for {
+ n := read(fd, unsafe.Pointer(&buf), int32(len(buf)))
+ if n <= 0 {
+ break
+ }
+ for i := int32(0); i < n; i++ {
+ if buf[i] == '\n' {
+ ncpu++
+ }
+ }
+ }
+ closefd(fd)
+ if ncpu == 0 {
+ ncpu = 1
+ }
+ return ncpu
+}
+
+var devswap = []byte("/dev/swap\x00")
+var pagesize = []byte(" pagesize\n")
+
+func getPageSize() uintptr {
+ var buf [2048]byte
+ var pos int
+ fd := open(&devswap[0], _OREAD, 0)
+ if fd < 0 {
+ // There's not much we can do if /dev/swap doesn't
+ // exist. However, nothing in the memory manager uses
+ // this on Plan 9, so it also doesn't really matter.
+ return minPhysPageSize
+ }
+ for pos < len(buf) {
+ n := read(fd, unsafe.Pointer(&buf[pos]), int32(len(buf)-pos))
+ if n <= 0 {
+ break
+ }
+ pos += int(n)
+ }
+ closefd(fd)
+ text := buf[:pos]
+ // Find "<n> pagesize" line.
+ bol := 0
+ for i, c := range text {
+ if c == '\n' {
+ bol = i + 1
+ }
+ if bytesHasPrefix(text[i:], pagesize) {
+ // Parse number at the beginning of this line.
+ return uintptr(_atoi(text[bol:]))
+ }
+ }
+ // Again, the page size doesn't really matter, so use a fallback.
+ return minPhysPageSize
+}
+
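+// bytesHasPrefix reports whether s begins with prefix.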
+func bytesHasPrefix(s, prefix []byte) bool {
+ if len(s) < len(prefix) {
+ return false
+ }
+ for i, p := range prefix {
+ if s[i] != p {
+ return false
+ }
+ }
+ return true
+}
+
+var pid = []byte("#c/pid\x00")
+
+func getpid() uint64 {
+ var b [20]byte
+ fd := open(&pid[0], 0, 0)
+ if fd >= 0 {
+ read(fd, unsafe.Pointer(&b), int32(len(b)))
+ closefd(fd)
+ }
+ c := b[:]
+ for c[0] == ' ' || c[0] == '\t' {
+ c = c[1:]
+ }
+ return uint64(_atoi(c))
+}
+
+func osinit() {
+ initBloc()
+ ncpu = getproccount()
+ physPageSize = getPageSize()
+ getg().m.procid = getpid()
+}
+
+//go:nosplit
+func crash() {
+ notify(nil)
+ *(*int)(nil) = 0
+}
+
+//go:nosplit
+func getRandomData(r []byte) {
+ extendRandom(r, 0)
+}
+
+func initsig(preinit bool) {
+ if !preinit {
+ notify(unsafe.Pointer(funcPC(sigtramp)))
+ }
+}
+
+//go:nosplit
+func osyield() {
+ sleep(0)
+}
+
+//go:nosplit
+func usleep(µs uint32) {
+ ms := int32(µs / 1000)
+ if ms == 0 {
+ ms = 1
+ }
+ sleep(ms)
+}
+
+//go:nosplit
+func nanotime1() int64 {
+ var scratch int64
+ ns := nsec(&scratch)
+ // TODO(aram): remove hack after I fix _nsec in the pc64 kernel.
+ if ns == 0 {
+ return scratch
+ }
+ return ns
+}
+
+var goexits = []byte("go: exit ")
+var emptystatus = []byte("\x00")
+var exiting uint32
+
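+// goexitsall posts a "go: exit <status>" note to every other M's OS
+// process (each M is a separate rfork'ed process on Plan 9) so that
+// the whole program exits together.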
+func goexitsall(status *byte) {
+ var buf [_ERRMAX]byte
+ if !atomic.Cas(&exiting, 0, 1) {
+ return
+ }
+ getg().m.locks++
+ n := copy(buf[:], goexits)
+ n = copy(buf[n:], gostringnocopy(status))
+ pid := getpid()
+ for mp := (*m)(atomic.Loadp(unsafe.Pointer(&allm))); mp != nil; mp = mp.alllink {
+ if mp.procid != 0 && mp.procid != pid {
+ postnote(mp.procid, buf[:])
+ }
+ }
+ getg().m.locks--
+}
+
+var procdir = []byte("/proc/")
+var notefile = []byte("/note\x00")
+
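+// postnote writes msg to /proc/<pid>/note, delivering it to the target
+// process as a note. It returns 0 on success and -1 on failure.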
+func postnote(pid uint64, msg []byte) int {
+ var buf [128]byte
+ var tmp [32]byte
+ n := copy(buf[:], procdir)
+ n += copy(buf[n:], itoa(tmp[:], pid))
+ copy(buf[n:], notefile)
+ fd := open(&buf[0], _OWRITE, 0)
+ if fd < 0 {
+ return -1
+ }
+ len := findnull(&msg[0])
+ if write1(uintptr(fd), unsafe.Pointer(&msg[0]), int32(len)) != int32(len) {
+ closefd(fd)
+ return -1
+ }
+ closefd(fd)
+ return 0
+}
+
+//go:nosplit
+func exit(e int32) {
+ var status []byte
+ if e == 0 {
+ status = emptystatus
+ } else {
+ // build error string
+ var tmp [32]byte
+ status = append(itoa(tmp[:len(tmp)-1], uint64(e)), 0)
+ }
+ goexitsall(&status[0])
+ exits(&status[0])
+}
+
+// May run with m.p==nil, so write barriers are not allowed.
+//go:nowritebarrier
+func newosproc(mp *m) {
+ if false {
+ print("newosproc mp=", mp, " ostk=", &mp, "\n")
+ }
+ pid := rfork(_RFPROC | _RFMEM | _RFNOWAIT)
+ if pid < 0 {
+ throw("newosproc: rfork failed")
+ }
+ if pid == 0 {
+ tstart_plan9(mp)
+ }
+}
+
+func exitThread(wait *uint32) {
+ // We should never reach exitThread on Plan 9 because we let
+ // the OS clean up threads.
+ throw("exitThread")
+}
+
+//go:nosplit
+func semacreate(mp *m) {
+}
+
+//go:nosplit
+func semasleep(ns int64) int {
+ _g_ := getg()
+ if ns >= 0 {
+ ms := timediv(ns, 1000000, nil)
+ if ms == 0 {
+ ms = 1
+ }
+ ret := plan9_tsemacquire(&_g_.m.waitsemacount, ms)
+ if ret == 1 {
+ return 0 // success
+ }
+ return -1 // timeout or interrupted
+ }
+ for plan9_semacquire(&_g_.m.waitsemacount, 1) < 0 {
+ // interrupted; try again (c.f. lock_sema.go)
+ }
+ return 0 // success
+}
+
+//go:nosplit
+func semawakeup(mp *m) {
+ plan9_semrelease(&mp.waitsemacount, 1)
+}
+
+//go:nosplit
+func read(fd int32, buf unsafe.Pointer, n int32) int32 {
+ return pread(fd, buf, n, -1)
+}
+
+//go:nosplit
+func write1(fd uintptr, buf unsafe.Pointer, n int32) int32 {
+ return pwrite(int32(fd), buf, n, -1)
+}
+
+var _badsignal = []byte("runtime: signal received on thread not created by Go.\n")
+
+// This runs on a foreign stack, without an m or a g. No stack split.
+//go:nosplit
+func badsignal2() {
+ pwrite(2, unsafe.Pointer(&_badsignal[0]), int32(len(_badsignal)), -1)
+ exits(&_badsignal[0])
+}
+
+func raisebadsignal(sig uint32) {
+ badsignal2()
+}
+
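+// _atoi parses a leading run of decimal digits in b and returns its
+// value, stopping at the first non-digit byte.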
+func _atoi(b []byte) int {
+ n := 0
+ for len(b) > 0 && '0' <= b[0] && b[0] <= '9' {
+ n = n*10 + int(b[0]) - '0'
+ b = b[1:]
+ }
+ return n
+}
+
+func signame(sig uint32) string {
+ if sig >= uint32(len(sigtable)) {
+ return ""
+ }
+ return sigtable[sig].name
+}
+
+const preemptMSupported = false
+
+func preemptM(mp *m) {
+ // Not currently supported.
+ //
+ // TODO: Use a note like we use signals on POSIX OSes
+}
diff --git a/src/runtime/os_plan9_arm.go b/src/runtime/os_plan9_arm.go
new file mode 100644
index 0000000..f165a34
--- /dev/null
+++ b/src/runtime/os_plan9_arm.go
@@ -0,0 +1,16 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+func checkgoarm() {
+ return // TODO(minux)
+}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_solaris.go b/src/runtime/os_solaris.go
new file mode 100644
index 0000000..89129e5
--- /dev/null
+++ b/src/runtime/os_solaris.go
@@ -0,0 +1,266 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type mts struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+type mscratch struct {
+ v [6]uintptr
+}
+
+type mOS struct {
+ waitsema uintptr // semaphore for parking on locks
+ perrno *int32 // pointer to tls errno
+ // these are here because they are too large to be on the stack
+ // of low-level NOSPLIT functions.
+ //LibCall libcall;
+ ts mts
+ scratch mscratch
+}
+
+type libcFunc uintptr
+
+//go:linkname asmsysvicall6x runtime.asmsysvicall6
+var asmsysvicall6x libcFunc // name to take addr of asmsysvicall6
+
+func asmsysvicall6() // declared for vet; do NOT call
+
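+// sysvicall0 calls the libc function fn with no arguments by
+// dispatching through asmsysvicall6 via asmcgocall, saving the
+// caller's PC/SP in the m for traceback when one is available.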
+//go:nosplit
+func sysvicall0(fn *libcFunc) uintptr {
+ // Leave caller's PC/SP around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ mp = nil // See comment in sys_darwin.go:libcCall
+ }
+
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(fn))
+ libcall.n = 0
+ libcall.args = uintptr(unsafe.Pointer(fn)) // args is unused but must be non-nil, otherwise the call crashes
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return libcall.r1
+}
+
+//go:nosplit
+func sysvicall1(fn *libcFunc, a1 uintptr) uintptr {
+ r1, _ := sysvicall1Err(fn, a1)
+ return r1
+}
+
+//go:nosplit
+
+// sysvicall1Err returns both the system call result and the errno value.
+// This is used by sysvicall1 and pipe.
+func sysvicall1Err(fn *libcFunc, a1 uintptr) (r1, err uintptr) {
+ // Leave caller's PC/SP around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ mp = nil
+ }
+
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(fn))
+ libcall.n = 1
+ // TODO(rsc): Why is noescape necessary here and below?
+ libcall.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return libcall.r1, libcall.err
+}
+
+//go:nosplit
+func sysvicall2(fn *libcFunc, a1, a2 uintptr) uintptr {
+ r1, _ := sysvicall2Err(fn, a1, a2)
+ return r1
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+
+// sysvicall2Err returns both the system call result and the errno value.
+// This is used by sysvicall2 and pipe2.
+func sysvicall2Err(fn *libcFunc, a1, a2 uintptr) (uintptr, uintptr) {
+ // Leave caller's PC/SP around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ mp = nil
+ }
+
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(fn))
+ libcall.n = 2
+ libcall.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return libcall.r1, libcall.err
+}
+
+//go:nosplit
+func sysvicall3(fn *libcFunc, a1, a2, a3 uintptr) uintptr {
+ r1, _ := sysvicall3Err(fn, a1, a2, a3)
+ return r1
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+
+// sysvicall3Err returns both the system call result and the errno value.
+// This is used by sysvicall3 and write1.
+func sysvicall3Err(fn *libcFunc, a1, a2, a3 uintptr) (r1, err uintptr) {
+ // Leave caller's PC/SP around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ mp = nil
+ }
+
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(fn))
+ libcall.n = 3
+ libcall.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return libcall.r1, libcall.err
+}
+
+//go:nosplit
+func sysvicall4(fn *libcFunc, a1, a2, a3, a4 uintptr) uintptr {
+ // Leave caller's PC/SP around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ mp = nil
+ }
+
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(fn))
+ libcall.n = 4
+ libcall.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return libcall.r1
+}
+
+//go:nosplit
+func sysvicall5(fn *libcFunc, a1, a2, a3, a4, a5 uintptr) uintptr {
+ // Leave caller's PC/SP around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ mp = nil
+ }
+
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(fn))
+ libcall.n = 5
+ libcall.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return libcall.r1
+}
+
+//go:nosplit
+func sysvicall6(fn *libcFunc, a1, a2, a3, a4, a5, a6 uintptr) uintptr {
+ // Leave caller's PC/SP around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ mp = nil
+ }
+
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(fn))
+ libcall.n = 6
+ libcall.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return libcall.r1
+}
diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
new file mode 100644
index 0000000..e6b22e3
--- /dev/null
+++ b/src/runtime/os_windows.go
@@ -0,0 +1,1349 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// TODO(brainman): should not need those
+const (
+ _NSIG = 65
+)
+
+//go:cgo_import_dynamic runtime._AddVectoredExceptionHandler AddVectoredExceptionHandler%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._CloseHandle CloseHandle%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._CreateEventA CreateEventA%4 "kernel32.dll"
+//go:cgo_import_dynamic runtime._CreateIoCompletionPort CreateIoCompletionPort%4 "kernel32.dll"
+//go:cgo_import_dynamic runtime._CreateThread CreateThread%6 "kernel32.dll"
+//go:cgo_import_dynamic runtime._CreateWaitableTimerA CreateWaitableTimerA%3 "kernel32.dll"
+//go:cgo_import_dynamic runtime._CreateWaitableTimerExW CreateWaitableTimerExW%4 "kernel32.dll"
+//go:cgo_import_dynamic runtime._DuplicateHandle DuplicateHandle%7 "kernel32.dll"
+//go:cgo_import_dynamic runtime._ExitProcess ExitProcess%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._FreeEnvironmentStringsW FreeEnvironmentStringsW%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetConsoleMode GetConsoleMode%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetEnvironmentStringsW GetEnvironmentStringsW%0 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetProcAddress GetProcAddress%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetProcessAffinityMask GetProcessAffinityMask%3 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetQueuedCompletionStatusEx GetQueuedCompletionStatusEx%6 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetStdHandle GetStdHandle%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetSystemDirectoryA GetSystemDirectoryA%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetSystemInfo GetSystemInfo%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetThreadContext GetThreadContext%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetThreadContext SetThreadContext%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._LoadLibraryW LoadLibraryW%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._LoadLibraryA LoadLibraryA%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._PostQueuedCompletionStatus PostQueuedCompletionStatus%4 "kernel32.dll"
+//go:cgo_import_dynamic runtime._ResumeThread ResumeThread%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetConsoleCtrlHandler SetConsoleCtrlHandler%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetErrorMode SetErrorMode%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetEvent SetEvent%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetProcessPriorityBoost SetProcessPriorityBoost%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetThreadPriority SetThreadPriority%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetUnhandledExceptionFilter SetUnhandledExceptionFilter%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetWaitableTimer SetWaitableTimer%6 "kernel32.dll"
+//go:cgo_import_dynamic runtime._Sleep Sleep%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SuspendThread SuspendThread%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SwitchToThread SwitchToThread%0 "kernel32.dll"
+//go:cgo_import_dynamic runtime._TlsAlloc TlsAlloc%0 "kernel32.dll"
+//go:cgo_import_dynamic runtime._VirtualAlloc VirtualAlloc%4 "kernel32.dll"
+//go:cgo_import_dynamic runtime._VirtualFree VirtualFree%3 "kernel32.dll"
+//go:cgo_import_dynamic runtime._VirtualQuery VirtualQuery%3 "kernel32.dll"
+//go:cgo_import_dynamic runtime._WaitForSingleObject WaitForSingleObject%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._WaitForMultipleObjects WaitForMultipleObjects%4 "kernel32.dll"
+//go:cgo_import_dynamic runtime._WriteConsoleW WriteConsoleW%5 "kernel32.dll"
+//go:cgo_import_dynamic runtime._WriteFile WriteFile%5 "kernel32.dll"
+
+type stdFunction unsafe.Pointer
+
+var (
+ // The following syscalls are available on every Windows PC.
+ // All of these variables are set by the Windows executable
+ // loader before the Go program starts.
+ _AddVectoredExceptionHandler,
+ _CloseHandle,
+ _CreateEventA,
+ _CreateIoCompletionPort,
+ _CreateThread,
+ _CreateWaitableTimerA,
+ _CreateWaitableTimerExW,
+ _DuplicateHandle,
+ _ExitProcess,
+ _FreeEnvironmentStringsW,
+ _GetConsoleMode,
+ _GetEnvironmentStringsW,
+ _GetProcAddress,
+ _GetProcessAffinityMask,
+ _GetQueuedCompletionStatusEx,
+ _GetStdHandle,
+ _GetSystemDirectoryA,
+ _GetSystemInfo,
+ _GetSystemTimeAsFileTime,
+ _GetThreadContext,
+ _SetThreadContext,
+ _LoadLibraryW,
+ _LoadLibraryA,
+ _PostQueuedCompletionStatus,
+ _QueryPerformanceCounter,
+ _QueryPerformanceFrequency,
+ _ResumeThread,
+ _SetConsoleCtrlHandler,
+ _SetErrorMode,
+ _SetEvent,
+ _SetProcessPriorityBoost,
+ _SetThreadPriority,
+ _SetUnhandledExceptionFilter,
+ _SetWaitableTimer,
+ _Sleep,
+ _SuspendThread,
+ _SwitchToThread,
+ _TlsAlloc,
+ _VirtualAlloc,
+ _VirtualFree,
+ _VirtualQuery,
+ _WaitForSingleObject,
+ _WaitForMultipleObjects,
+ _WriteConsoleW,
+ _WriteFile,
+ _ stdFunction
+
+ // The following syscalls are only available on some Windows PCs.
+ // We load them, if available, before using them.
+ _AddDllDirectory,
+ _AddVectoredContinueHandler,
+ _LoadLibraryExA,
+ _LoadLibraryExW,
+ _ stdFunction
+
+ // Use RtlGenRandom to generate cryptographically random data.
+ // This approach has been recommended by Microsoft (see issue
+ // 15589 for details).
+ // RtlGenRandom is not exported by that name from advapi32.dll;
+ // it has to be located by its export name SystemFunction036.
+ // Also, some versions of Mingw cannot link to SystemFunction036
+ // when building a cgo executable, so load SystemFunction036
+ // manually during runtime startup.
+ _RtlGenRandom stdFunction
+
+ // Load ntdll.dll manually during startup; otherwise Mingw
+ // links the wrong printf function into cgo executables (see
+ // issue 12030 for details).
+ _NtWaitForSingleObject stdFunction
+
+ // These come from DLLs other than kernel32.dll, so we prefer to load them with LoadLibraryEx.
+ _timeBeginPeriod,
+ _timeEndPeriod,
+ _WSAGetOverlappedResult,
+ _ stdFunction
+)
+
+// Function to be called by Windows CreateThread
+// to start a new OS thread.
+func tstart_stdcall(newm *m)
+
+// Called by OS using stdcall ABI.
+func ctrlhandler()
+
+type mOS struct {
+ threadLock mutex // protects "thread" and prevents closing
+ thread uintptr // thread handle
+
+ waitsema uintptr // semaphore for parking on locks
+ resumesema uintptr // semaphore to indicate suspend/resume
+
+ highResTimer uintptr // high resolution timer handle used in usleep
+
+ // preemptExtLock synchronizes preemptM with entry/exit from
+ // external C code.
+ //
+ // This protects against races between preemptM calling
+ // SuspendThread and external code on this thread calling
+ // ExitProcess. If these happen concurrently, it's possible to
+ // exit the suspending thread and suspend the exiting thread,
+ // leading to deadlock.
+ //
+ // 0 indicates this M is not being preempted or in external
+ // code. Entering external code CASes this from 0 to 1. If
+ // this fails, a preemption is in progress, so the thread must
+ // wait for the preemption. preemptM also CASes this from 0 to
+ // 1. If this fails, the preemption fails (as it would if the
+ // PC weren't in Go code). The value is reset to 0 when
+ // returning from external code or after a preemption is
+ // complete.
+ //
+ // TODO(austin): We may not need this if preemption were more
+ // tightly synchronized on the G/P status and preemption
+ // blocked transition into _Gsyscall/_Psyscall.
+ preemptExtLock uint32
+}
+
+//go:linkname os_sigpipe os.sigpipe
+func os_sigpipe() {
+ throw("too many writes on closed pipe")
+}
+
+// Stubs so tests can link correctly. These should never be called.
+func open(name *byte, mode, perm int32) int32 {
+ throw("unimplemented")
+ return -1
+}
+func closefd(fd int32) int32 {
+ throw("unimplemented")
+ return -1
+}
+func read(fd int32, p unsafe.Pointer, n int32) int32 {
+ throw("unimplemented")
+ return -1
+}
+
+type sigset struct{}
+
+// Call a Windows function with stdcall calling conventions,
+// switching to the OS stack during the call.
+func asmstdcall(fn unsafe.Pointer)
+
+var asmstdcallAddr unsafe.Pointer
+
+func windowsFindfunc(lib uintptr, name []byte) stdFunction {
+ if name[len(name)-1] != 0 {
+ throw("usage")
+ }
+ f := stdcall2(_GetProcAddress, lib, uintptr(unsafe.Pointer(&name[0])))
+ return stdFunction(unsafe.Pointer(f))
+}
+
+var sysDirectory [521]byte
+var sysDirectoryLen uintptr
+
+func windowsLoadSystemLib(name []byte) uintptr {
+ if useLoadLibraryEx {
+ return stdcall3(_LoadLibraryExA, uintptr(unsafe.Pointer(&name[0])), 0, _LOAD_LIBRARY_SEARCH_SYSTEM32)
+ } else {
+ if sysDirectoryLen == 0 {
+ l := stdcall2(_GetSystemDirectoryA, uintptr(unsafe.Pointer(&sysDirectory[0])), uintptr(len(sysDirectory)-1))
+ if l == 0 || l > uintptr(len(sysDirectory)-1) {
+ throw("Unable to determine system directory")
+ }
+ sysDirectory[l] = '\\'
+ sysDirectoryLen = l + 1
+ }
+ absName := append(sysDirectory[:sysDirectoryLen], name...)
+ return stdcall1(_LoadLibraryA, uintptr(unsafe.Pointer(&absName[0])))
+ }
+}
+
+func loadOptionalSyscalls() {
+ var kernel32dll = []byte("kernel32.dll\000")
+ k32 := stdcall1(_LoadLibraryA, uintptr(unsafe.Pointer(&kernel32dll[0])))
+ if k32 == 0 {
+ throw("kernel32.dll not found")
+ }
+ _AddDllDirectory = windowsFindfunc(k32, []byte("AddDllDirectory\000"))
+ _AddVectoredContinueHandler = windowsFindfunc(k32, []byte("AddVectoredContinueHandler\000"))
+ _LoadLibraryExA = windowsFindfunc(k32, []byte("LoadLibraryExA\000"))
+ _LoadLibraryExW = windowsFindfunc(k32, []byte("LoadLibraryExW\000"))
+ useLoadLibraryEx = (_LoadLibraryExW != nil && _LoadLibraryExA != nil && _AddDllDirectory != nil)
+
+ var advapi32dll = []byte("advapi32.dll\000")
+ a32 := windowsLoadSystemLib(advapi32dll)
+ if a32 == 0 {
+ throw("advapi32.dll not found")
+ }
+ _RtlGenRandom = windowsFindfunc(a32, []byte("SystemFunction036\000"))
+
+ var ntdll = []byte("ntdll.dll\000")
+ n32 := windowsLoadSystemLib(ntdll)
+ if n32 == 0 {
+ throw("ntdll.dll not found")
+ }
+ _NtWaitForSingleObject = windowsFindfunc(n32, []byte("NtWaitForSingleObject\000"))
+
+ if GOARCH == "arm" {
+ _QueryPerformanceCounter = windowsFindfunc(k32, []byte("QueryPerformanceCounter\000"))
+ if _QueryPerformanceCounter == nil {
+ throw("could not find QPC syscalls")
+ }
+ }
+
+ var winmmdll = []byte("winmm.dll\000")
+ m32 := windowsLoadSystemLib(winmmdll)
+ if m32 == 0 {
+ throw("winmm.dll not found")
+ }
+ _timeBeginPeriod = windowsFindfunc(m32, []byte("timeBeginPeriod\000"))
+ _timeEndPeriod = windowsFindfunc(m32, []byte("timeEndPeriod\000"))
+ if _timeBeginPeriod == nil || _timeEndPeriod == nil {
+ throw("timeBegin/EndPeriod not found")
+ }
+
+ var ws232dll = []byte("ws2_32.dll\000")
+ ws232 := windowsLoadSystemLib(ws232dll)
+ if ws232 == 0 {
+ throw("ws2_32.dll not found")
+ }
+ _WSAGetOverlappedResult = windowsFindfunc(ws232, []byte("WSAGetOverlappedResult\000"))
+ if _WSAGetOverlappedResult == nil {
+ throw("WSAGetOverlappedResult not found")
+ }
+
+ if windowsFindfunc(n32, []byte("wine_get_version\000")) != nil {
+ // running on Wine
+ initWine(k32)
+ }
+}
+
+func monitorSuspendResume() {
+ const (
+ _DEVICE_NOTIFY_CALLBACK = 2
+ )
+ type _DEVICE_NOTIFY_SUBSCRIBE_PARAMETERS struct {
+ callback uintptr
+ context uintptr
+ }
+
+ powrprof := windowsLoadSystemLib([]byte("powrprof.dll\000"))
+ if powrprof == 0 {
+ return // Running on Windows 7, where we don't need it anyway.
+ }
+ powerRegisterSuspendResumeNotification := windowsFindfunc(powrprof, []byte("PowerRegisterSuspendResumeNotification\000"))
+ if powerRegisterSuspendResumeNotification == nil {
+ return // Running on Windows 7, where we don't need it anyway.
+ }
+ var fn interface{} = func(context uintptr, changeType uint32, setting uintptr) uintptr {
+ for mp := (*m)(atomic.Loadp(unsafe.Pointer(&allm))); mp != nil; mp = mp.alllink {
+ if mp.resumesema != 0 {
+ stdcall1(_SetEvent, mp.resumesema)
+ }
+ }
+ return 0
+ }
+ params := _DEVICE_NOTIFY_SUBSCRIBE_PARAMETERS{
+ callback: compileCallback(*efaceOf(&fn), true),
+ }
+ handle := uintptr(0)
+ stdcall3(powerRegisterSuspendResumeNotification, _DEVICE_NOTIFY_CALLBACK,
+ uintptr(unsafe.Pointer(&params)), uintptr(unsafe.Pointer(&handle)))
+}
+
+//go:nosplit
+func getLoadLibrary() uintptr {
+ return uintptr(unsafe.Pointer(_LoadLibraryW))
+}
+
+//go:nosplit
+func getLoadLibraryEx() uintptr {
+ return uintptr(unsafe.Pointer(_LoadLibraryExW))
+}
+
+//go:nosplit
+func getGetProcAddress() uintptr {
+ return uintptr(unsafe.Pointer(_GetProcAddress))
+}
+
+func getproccount() int32 {
+ var mask, sysmask uintptr
+ ret := stdcall3(_GetProcessAffinityMask, currentProcess, uintptr(unsafe.Pointer(&mask)), uintptr(unsafe.Pointer(&sysmask)))
+ if ret != 0 {
+ n := 0
+ maskbits := int(unsafe.Sizeof(mask) * 8)
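+ // Count the set bits in the affinity mask; e.g. a mask of 0b1011
+ // reports 3 usable CPUs.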
+ for i := 0; i < maskbits; i++ {
+ if mask&(1<<uint(i)) != 0 {
+ n++
+ }
+ }
+ if n != 0 {
+ return int32(n)
+ }
+ }
+ // use GetSystemInfo if GetProcessAffinityMask fails
+ var info systeminfo
+ stdcall1(_GetSystemInfo, uintptr(unsafe.Pointer(&info)))
+ return int32(info.dwnumberofprocessors)
+}
+
+func getPageSize() uintptr {
+ var info systeminfo
+ stdcall1(_GetSystemInfo, uintptr(unsafe.Pointer(&info)))
+ return uintptr(info.dwpagesize)
+}
+
+const (
+ currentProcess = ^uintptr(0) // -1 = current process
+ currentThread = ^uintptr(1) // -2 = current thread
+)
+
+// in sys_windows_386.s and sys_windows_amd64.s:
+func externalthreadhandler()
+func getlasterror() uint32
+func setlasterror(err uint32)
+
+// When loading DLLs, we prefer to use LoadLibraryEx with
+// LOAD_LIBRARY_SEARCH_* flags, if available. LoadLibraryEx is not
+// available on old Windows, though, and the LOAD_LIBRARY_SEARCH_*
+// flags are not available on some versions of Windows without a
+// security patch.
+//
+// https://msdn.microsoft.com/en-us/library/ms684179(v=vs.85).aspx says:
+// "Windows 7, Windows Server 2008 R2, Windows Vista, and Windows
+// Server 2008: The LOAD_LIBRARY_SEARCH_* flags are available on
+// systems that have KB2533623 installed. To determine whether the
+// flags are available, use GetProcAddress to get the address of the
+// AddDllDirectory, RemoveDllDirectory, or SetDefaultDllDirectories
+// function. If GetProcAddress succeeds, the LOAD_LIBRARY_SEARCH_*
+// flags can be used with LoadLibraryEx."
+var useLoadLibraryEx bool
+
+var timeBeginPeriodRetValue uint32
+
+// osRelaxMinNS indicates that sysmon shouldn't osRelax if the next
+// timer is less than 60 ms from now. Since osRelaxing may reduce
+// timer resolution to 15.6 ms, this keeps timer error under roughly 1
+// part in 4.
+const osRelaxMinNS = 60 * 1e6
+
+// osRelax is called by the scheduler when transitioning to and from
+// all Ps being idle.
+//
+// Some versions of Windows have a high resolution timer. On those
+// versions osRelax is a no-op.
+// On Windows versions without a high resolution timer, osRelax
+// adjusts the system-wide timer resolution. Go needs a
+// high resolution timer while running, and there's little extra cost
+// if we're already using the CPU, but if all Ps are idle there's no
+// need to consume extra power to drive the high-res timer.
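+//
+// Concretely, timeBeginPeriod(1) below asks the OS for a 1 ms global timer
+// interrupt period (instead of the default of roughly 15.6 ms mentioned
+// above), and timeEndPeriod(1) withdraws that request.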
+func osRelax(relax bool) uint32 {
+ if haveHighResTimer {
+ // If the high resolution timer is available, the runtime uses the timer
+ // to sleep for short durations. This means there's no need to adjust
+ // the global clock frequency.
+ return 0
+ }
+
+ if relax {
+ return uint32(stdcall1(_timeEndPeriod, 1))
+ } else {
+ return uint32(stdcall1(_timeBeginPeriod, 1))
+ }
+}
+
+// haveHighResTimer indicates that the CreateWaitableTimerEx
+// CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag is available.
+var haveHighResTimer = false
+
+// createHighResTimer calls CreateWaitableTimerEx with
+// CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag to create high
+// resolution timer. createHighResTimer returns the new timer
+// handle, or 0 if CreateWaitableTimerEx failed.
+func createHighResTimer() uintptr {
+ const (
+ // As per @jstarks, see
+ // https://github.com/golang/go/issues/8687#issuecomment-656259353
+ _CREATE_WAITABLE_TIMER_HIGH_RESOLUTION = 0x00000002
+
+ _SYNCHRONIZE = 0x00100000
+ _TIMER_QUERY_STATE = 0x0001
+ _TIMER_MODIFY_STATE = 0x0002
+ )
+ return stdcall4(_CreateWaitableTimerExW, 0, 0,
+ _CREATE_WAITABLE_TIMER_HIGH_RESOLUTION,
+ _SYNCHRONIZE|_TIMER_QUERY_STATE|_TIMER_MODIFY_STATE)
+}
+
+func initHighResTimer() {
+ if GOARCH == "arm" {
+ // TODO: Not yet implemented.
+ return
+ }
+ h := createHighResTimer()
+ if h != 0 {
+ haveHighResTimer = true
+ usleep2Addr = unsafe.Pointer(funcPC(usleep2HighRes))
+ stdcall1(_CloseHandle, h)
+ }
+}
+
+func osinit() {
+ asmstdcallAddr = unsafe.Pointer(funcPC(asmstdcall))
+ usleep2Addr = unsafe.Pointer(funcPC(usleep2))
+ switchtothreadAddr = unsafe.Pointer(funcPC(switchtothread))
+
+ setBadSignalMsg()
+
+ loadOptionalSyscalls()
+
+ disableWER()
+
+ initExceptionHandler()
+
+ stdcall2(_SetConsoleCtrlHandler, funcPC(ctrlhandler), 1)
+
+ initHighResTimer()
+ timeBeginPeriodRetValue = osRelax(false)
+
+ ncpu = getproccount()
+
+ physPageSize = getPageSize()
+
+ // Windows dynamic priority boosting assumes that a process has different types
+ // of dedicated threads -- GUI, IO, computational, etc. Go processes use
+ // equivalent threads that all do a mix of GUI, IO, computations, etc.
+ // In such context dynamic priority boosting does nothing but harm, so we turn it off.
+ stdcall2(_SetProcessPriorityBoost, currentProcess, 1)
+}
+
+// useQPCTime controls whether time.now and nanotime use QueryPerformanceCounter.
+// This is only set to 1 when running under Wine.
+var useQPCTime uint8
+
+var qpcStartCounter int64
+var qpcMultiplier int64
+
+//go:nosplit
+func nanotimeQPC() int64 {
+ var counter int64 = 0
+ stdcall1(_QueryPerformanceCounter, uintptr(unsafe.Pointer(&counter)))
+
+ // returns number of nanoseconds
+ return (counter - qpcStartCounter) * qpcMultiplier
+}
+
+//go:nosplit
+func nowQPC() (sec int64, nsec int32, mono int64) {
+ var ft int64
+ stdcall1(_GetSystemTimeAsFileTime, uintptr(unsafe.Pointer(&ft)))
+
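+ // ft is a FILETIME: a count of 100 ns intervals since January 1, 1601.
+ // 116444736000000000 is the number of such intervals between the
+ // FILETIME epoch and the Unix epoch (January 1, 1970), so subtracting
+ // it and scaling by 100 yields Unix time in nanoseconds.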
+ t := (ft - 116444736000000000) * 100
+
+ sec = t / 1000000000
+ nsec = int32(t - sec*1000000000)
+
+ mono = nanotimeQPC()
+ return
+}
+
+func initWine(k32 uintptr) {
+ _GetSystemTimeAsFileTime = windowsFindfunc(k32, []byte("GetSystemTimeAsFileTime\000"))
+ if _GetSystemTimeAsFileTime == nil {
+ throw("could not find GetSystemTimeAsFileTime() syscall")
+ }
+
+ _QueryPerformanceCounter = windowsFindfunc(k32, []byte("QueryPerformanceCounter\000"))
+ _QueryPerformanceFrequency = windowsFindfunc(k32, []byte("QueryPerformanceFrequency\000"))
+ if _QueryPerformanceCounter == nil || _QueryPerformanceFrequency == nil {
+ throw("could not find QPC syscalls")
+ }
+
+ // We cannot simply fall back to the GetSystemTimeAsFileTime() syscall, since its time is not monotonic;
+ // instead we use the QueryPerformanceCounter family of syscalls to implement a monotonic timer.
+ // https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspx
+
+ var tmp int64
+ stdcall1(_QueryPerformanceFrequency, uintptr(unsafe.Pointer(&tmp)))
+ if tmp == 0 {
+ throw("QueryPerformanceFrequency syscall returned zero, running on unsupported hardware")
+ }
+
+ // This should not overflow: it is the number of performance counter ticks per second,
+ // and the counter's resolution is at most 10 ticks per microsecond on Wine (even lower on real hardware),
+ // so the value will be at most 10 million here. Throw if it overflows anyway.
+ if tmp > (1<<31 - 1) {
+ throw("QueryPerformanceFrequency overflow 32 bit divider, check nosplit discussion to proceed")
+ }
+ qpcFrequency := int32(tmp)
+ stdcall1(_QueryPerformanceCounter, uintptr(unsafe.Pointer(&qpcStartCounter)))
+
+ // Since these time calls only run on Wine, this does not lose precision:
+ // Wine's timer is emulated at 10 MHz, so the multiplier is a nice round 100.
+ // For a general-purpose system (like a 3.3 MHz timer on an i7) it would not be very precise.
+ // We have to do it this way (or something similar), since multiplying the QPC counter by 100 million
+ // overflows int64 and the resulting time would always be invalid.
+ qpcMultiplier = int64(timediv(1000000000, qpcFrequency, nil))
+
+ useQPCTime = 1
+}
+
+//go:nosplit
+func getRandomData(r []byte) {
+ n := 0
+ if stdcall2(_RtlGenRandom, uintptr(unsafe.Pointer(&r[0])), uintptr(len(r)))&0xff != 0 {
+ n = len(r)
+ }
+ extendRandom(r, n)
+}
+
+func goenvs() {
+ // strings is a pointer to environment variable pairs in the form:
+ // "envA=valA\x00envB=valB\x00\x00" (in UTF-16)
+ // Two consecutive zero bytes end the list.
+ strings := unsafe.Pointer(stdcall0(_GetEnvironmentStringsW))
+ p := (*[1 << 24]uint16)(strings)[:]
+
+ n := 0
+ for from, i := 0, 0; true; i++ {
+ if p[i] == 0 {
+ // empty string marks the end
+ if i == from {
+ break
+ }
+ from = i + 1
+ n++
+ }
+ }
+ envs = make([]string, n)
+
+ for i := range envs {
+ envs[i] = gostringw(&p[0])
+ for p[0] != 0 {
+ p = p[1:]
+ }
+ p = p[1:] // skip the terminating zero
+ }
+
+ stdcall1(_FreeEnvironmentStringsW, uintptr(strings))
+
+ // We call this all the way down here, late in init, so that malloc
+ // works for the callback function it generates.
+ monitorSuspendResume()
+}
+
+// exiting is set to non-zero when the process is exiting.
+var exiting uint32
+
+//go:nosplit
+func exit(code int32) {
+ // Disallow thread suspension for preemption. Otherwise,
+ // ExitProcess and SuspendThread can race: SuspendThread
+ // queues a suspension request for this thread, ExitProcess
+ // kills the suspending thread, and then this thread suspends.
+ lock(&suspendLock)
+ atomic.Store(&exiting, 1)
+ stdcall1(_ExitProcess, uintptr(code))
+}
+
+// write1 must be nosplit because it's used as a last resort in
+// functions like badmorestackg0. In such cases, we'll always take the
+// ASCII path.
+//
+//go:nosplit
+func write1(fd uintptr, buf unsafe.Pointer, n int32) int32 {
+ const (
+ _STD_OUTPUT_HANDLE = ^uintptr(10) // -11
+ _STD_ERROR_HANDLE = ^uintptr(11) // -12
+ )
+ var handle uintptr
+ switch fd {
+ case 1:
+ handle = stdcall1(_GetStdHandle, _STD_OUTPUT_HANDLE)
+ case 2:
+ handle = stdcall1(_GetStdHandle, _STD_ERROR_HANDLE)
+ default:
+ // assume fd is a real Windows handle.
+ handle = fd
+ }
+ isASCII := true
+ b := (*[1 << 30]byte)(buf)[:n]
+ for _, x := range b {
+ if x >= 0x80 {
+ isASCII = false
+ break
+ }
+ }
+
+ if !isASCII {
+ var m uint32
+ isConsole := stdcall2(_GetConsoleMode, handle, uintptr(unsafe.Pointer(&m))) != 0
+ // If this is console output, various non-Unicode code pages can be in use.
+ // Use the dedicated WriteConsole call to ensure Unicode is printed correctly.
+ if isConsole {
+ return int32(writeConsole(handle, buf, n))
+ }
+ }
+ var written uint32
+ stdcall5(_WriteFile, handle, uintptr(buf), uintptr(n), uintptr(unsafe.Pointer(&written)), 0)
+ return int32(written)
+}
+
+var (
+ utf16ConsoleBack [1000]uint16
+ utf16ConsoleBackLock mutex
+)
+
+// writeConsole writes bufLen bytes from buf to the console File.
+// It returns the number of bytes written.
+func writeConsole(handle uintptr, buf unsafe.Pointer, bufLen int32) int {
+ const surr2 = (surrogateMin + surrogateMax + 1) / 2
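+
+ // Runes below 0x10000 are emitted as a single UTF-16 code unit; larger
+ // runes are split into a surrogate pair. For example, U+1F600:
+ //   r -= 0x10000                              -> 0xF600
+ //   high: surrogateMin + 0xF600>>10&0x3ff = 0xD800 + 0x3D  = 0xD83D
+ //   low:  surr2 + 0xF600&0x3ff             = 0xDC00 + 0x200 = 0xDE00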
+
+ // Do not use defer for unlock. May cause issues when printing a panic.
+ lock(&utf16ConsoleBackLock)
+
+ b := (*[1 << 30]byte)(buf)[:bufLen]
+ s := *(*string)(unsafe.Pointer(&b))
+
+ utf16tmp := utf16ConsoleBack[:]
+
+ total := len(s)
+ w := 0
+ for _, r := range s {
+ if w >= len(utf16tmp)-2 {
+ writeConsoleUTF16(handle, utf16tmp[:w])
+ w = 0
+ }
+ if r < 0x10000 {
+ utf16tmp[w] = uint16(r)
+ w++
+ } else {
+ r -= 0x10000
+ utf16tmp[w] = surrogateMin + uint16(r>>10)&0x3ff
+ utf16tmp[w+1] = surr2 + uint16(r)&0x3ff
+ w += 2
+ }
+ }
+ writeConsoleUTF16(handle, utf16tmp[:w])
+ unlock(&utf16ConsoleBackLock)
+ return total
+}
+
+// writeConsoleUTF16 is the dedicated Windows call that correctly prints
+// to the console regardless of the current code page. Input is UTF-16 code units.
+// The handle must be a console handle.
+func writeConsoleUTF16(handle uintptr, b []uint16) {
+ l := uint32(len(b))
+ if l == 0 {
+ return
+ }
+ var written uint32
+ stdcall5(_WriteConsoleW,
+ handle,
+ uintptr(unsafe.Pointer(&b[0])),
+ uintptr(l),
+ uintptr(unsafe.Pointer(&written)),
+ 0,
+ )
+ return
+}
+
+// walltime1 isn't implemented on Windows, but will never be called.
+func walltime1() (sec int64, nsec int32)
+
+//go:nosplit
+func semasleep(ns int64) int32 {
+ const (
+ _WAIT_ABANDONED = 0x00000080
+ _WAIT_OBJECT_0 = 0x00000000
+ _WAIT_TIMEOUT = 0x00000102
+ _WAIT_FAILED = 0xFFFFFFFF
+ )
+
+ var result uintptr
+ if ns < 0 {
+ result = stdcall2(_WaitForSingleObject, getg().m.waitsema, uintptr(_INFINITE))
+ } else {
+ start := nanotime()
+ elapsed := int64(0)
+ for {
+ ms := int64(timediv(ns-elapsed, 1000000, nil))
+ if ms == 0 {
+ ms = 1
+ }
+ result = stdcall4(_WaitForMultipleObjects, 2,
+ uintptr(unsafe.Pointer(&[2]uintptr{getg().m.waitsema, getg().m.resumesema})),
+ 0, uintptr(ms))
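+ // WaitForMultipleObjects returns _WAIT_OBJECT_0+i when the i'th handle
+ // is signaled, so _WAIT_OBJECT_0+1 means resumesema fired (a
+ // suspend/resume notification) rather than waitsema.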
+ if result != _WAIT_OBJECT_0+1 {
+ // Not a suspend/resume event
+ break
+ }
+ elapsed = nanotime() - start
+ if elapsed >= ns {
+ return -1
+ }
+ }
+ }
+ switch result {
+ case _WAIT_OBJECT_0: // Signaled
+ return 0
+
+ case _WAIT_TIMEOUT:
+ return -1
+
+ case _WAIT_ABANDONED:
+ systemstack(func() {
+ throw("runtime.semasleep wait_abandoned")
+ })
+
+ case _WAIT_FAILED:
+ systemstack(func() {
+ print("runtime: waitforsingleobject wait_failed; errno=", getlasterror(), "\n")
+ throw("runtime.semasleep wait_failed")
+ })
+
+ default:
+ systemstack(func() {
+ print("runtime: waitforsingleobject unexpected; result=", result, "\n")
+ throw("runtime.semasleep unexpected")
+ })
+ }
+
+ return -1 // unreachable
+}
+
+//go:nosplit
+func semawakeup(mp *m) {
+ if stdcall1(_SetEvent, mp.waitsema) == 0 {
+ systemstack(func() {
+ print("runtime: setevent failed; errno=", getlasterror(), "\n")
+ throw("runtime.semawakeup")
+ })
+ }
+}
+
+//go:nosplit
+func semacreate(mp *m) {
+ if mp.waitsema != 0 {
+ return
+ }
+ mp.waitsema = stdcall4(_CreateEventA, 0, 0, 0, 0)
+ if mp.waitsema == 0 {
+ systemstack(func() {
+ print("runtime: createevent failed; errno=", getlasterror(), "\n")
+ throw("runtime.semacreate")
+ })
+ }
+ mp.resumesema = stdcall4(_CreateEventA, 0, 0, 0, 0)
+ if mp.resumesema == 0 {
+ systemstack(func() {
+ print("runtime: createevent failed; errno=", getlasterror(), "\n")
+ throw("runtime.semacreate")
+ })
+ stdcall1(_CloseHandle, mp.waitsema)
+ mp.waitsema = 0
+ }
+}
+
+// May run with m.p==nil, so write barriers are not allowed. This
+// function is called by newosproc0, so it is also required to
+// operate without stack guards.
+//go:nowritebarrierrec
+//go:nosplit
+func newosproc(mp *m) {
+ // We pass 0 for the stack size to use the default for this binary.
+ thandle := stdcall6(_CreateThread, 0, 0,
+ funcPC(tstart_stdcall), uintptr(unsafe.Pointer(mp)),
+ 0, 0)
+
+ if thandle == 0 {
+ if atomic.Load(&exiting) != 0 {
+ // CreateThread may fail if called
+ // concurrently with ExitProcess. If this
+ // happens, just freeze this thread and let
+ // the process exit. See issue #18253.
+ lock(&deadlock)
+ lock(&deadlock)
+ }
+ print("runtime: failed to create new OS thread (have ", mcount(), " already; errno=", getlasterror(), ")\n")
+ throw("runtime.newosproc")
+ }
+
+ // Close thandle to avoid leaking the thread object if it exits.
+ stdcall1(_CloseHandle, thandle)
+}
+
+// Used by the C library build mode. On Linux this function would allocate a
+// stack, but that's not necessary for Windows. No stack guards are present
+// and the GC has not been initialized, so write barriers will fail.
+//go:nowritebarrierrec
+//go:nosplit
+func newosproc0(mp *m, stk unsafe.Pointer) {
+ // TODO: this is completely broken. The args passed to newosproc0 (in asm_amd64.s)
+ // are stacksize and function, not *m and stack.
+ // Check os_linux.go for an implementation that might actually work.
+ throw("bad newosproc0")
+}
+
+func exitThread(wait *uint32) {
+ // We should never reach exitThread on Windows because we let
+ // the OS clean up threads.
+ throw("exitThread")
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+}
+
+//go:nosplit
+func sigsave(p *sigset) {
+}
+
+//go:nosplit
+func msigrestore(sigmask sigset) {
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func clearSignalHandlers() {
+}
+
+//go:nosplit
+func sigblock(exiting bool) {
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ var thandle uintptr
+ if stdcall7(_DuplicateHandle, currentProcess, currentThread, currentProcess, uintptr(unsafe.Pointer(&thandle)), 0, 0, _DUPLICATE_SAME_ACCESS) == 0 {
+ print("runtime.minit: duplicatehandle failed; errno=", getlasterror(), "\n")
+ throw("runtime.minit: duplicatehandle failed")
+ }
+
+ mp := getg().m
+ lock(&mp.threadLock)
+ mp.thread = thandle
+
+ // Configure usleep timer, if possible.
+ if mp.highResTimer == 0 && haveHighResTimer {
+ mp.highResTimer = createHighResTimer()
+ if mp.highResTimer == 0 {
+ print("runtime: CreateWaitableTimerEx failed; errno=", getlasterror(), "\n")
+ throw("CreateWaitableTimerEx when creating timer failed")
+ }
+ }
+ unlock(&mp.threadLock)
+
+ // Query the true stack base from the OS. Currently we're
+ // running on a small assumed stack.
+ var mbi memoryBasicInformation
+ res := stdcall3(_VirtualQuery, uintptr(unsafe.Pointer(&mbi)), uintptr(unsafe.Pointer(&mbi)), unsafe.Sizeof(mbi))
+ if res == 0 {
+ print("runtime: VirtualQuery failed; errno=", getlasterror(), "\n")
+ throw("VirtualQuery for stack base failed")
+ }
+ // The system leaves an 8K PAGE_GUARD region at the bottom of
+ // the stack (in theory VirtualQuery isn't supposed to include
+ // that, but it does). Add an additional 8K of slop for
+ // calling C functions that don't have stack checks and for
+ // lastcontinuehandler. We shouldn't be anywhere near this
+ // bound anyway.
+ base := mbi.allocationBase + 16<<10
+ // Sanity check the stack bounds.
+ g0 := getg()
+ if base > g0.stack.hi || g0.stack.hi-base > 64<<20 {
+ print("runtime: g0 stack [", hex(base), ",", hex(g0.stack.hi), ")\n")
+ throw("bad g0 stack")
+ }
+ g0.stack.lo = base
+ g0.stackguard0 = g0.stack.lo + _StackGuard
+ g0.stackguard1 = g0.stackguard0
+ // Sanity check the SP.
+ stackcheck()
+}
+
+// Called from dropm to undo the effect of an minit.
+//go:nosplit
+func unminit() {
+ mp := getg().m
+ lock(&mp.threadLock)
+ if mp.thread != 0 {
+ stdcall1(_CloseHandle, mp.thread)
+ mp.thread = 0
+ }
+ unlock(&mp.threadLock)
+}
+
+// Called from exitm, but not from dropm, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+//go:nosplit
+func mdestroy(mp *m) {
+ if mp.highResTimer != 0 {
+ stdcall1(_CloseHandle, mp.highResTimer)
+ mp.highResTimer = 0
+ }
+ if mp.waitsema != 0 {
+ stdcall1(_CloseHandle, mp.waitsema)
+ mp.waitsema = 0
+ }
+ if mp.resumesema != 0 {
+ stdcall1(_CloseHandle, mp.resumesema)
+ mp.resumesema = 0
+ }
+}
+
+// Calling stdcall on os stack.
+// May run during STW, so write barriers are not allowed.
+//go:nowritebarrier
+//go:nosplit
+func stdcall(fn stdFunction) uintptr {
+ gp := getg()
+ mp := gp.m
+ mp.libcall.fn = uintptr(unsafe.Pointer(fn))
+ resetLibcall := false
+ if mp.profilehz != 0 && mp.libcallsp == 0 {
+ // leave pc/sp for cpu profiler
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ resetLibcall = true // See comment in sys_darwin.go:libcCall
+ }
+ asmcgocall(asmstdcallAddr, unsafe.Pointer(&mp.libcall))
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+ return mp.libcall.r1
+}
+
+//go:nosplit
+func stdcall0(fn stdFunction) uintptr {
+ mp := getg().m
+ mp.libcall.n = 0
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&fn))) // it's unused but must be non-nil, otherwise it crashes
+ return stdcall(fn)
+}
+
+//go:nosplit
+func stdcall1(fn stdFunction, a0 uintptr) uintptr {
+ mp := getg().m
+ mp.libcall.n = 1
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
+ return stdcall(fn)
+}
+
+//go:nosplit
+func stdcall2(fn stdFunction, a0, a1 uintptr) uintptr {
+ mp := getg().m
+ mp.libcall.n = 2
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
+ return stdcall(fn)
+}
+
+//go:nosplit
+func stdcall3(fn stdFunction, a0, a1, a2 uintptr) uintptr {
+ mp := getg().m
+ mp.libcall.n = 3
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
+ return stdcall(fn)
+}
+
+//go:nosplit
+func stdcall4(fn stdFunction, a0, a1, a2, a3 uintptr) uintptr {
+ mp := getg().m
+ mp.libcall.n = 4
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
+ return stdcall(fn)
+}
+
+//go:nosplit
+func stdcall5(fn stdFunction, a0, a1, a2, a3, a4 uintptr) uintptr {
+ mp := getg().m
+ mp.libcall.n = 5
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
+ return stdcall(fn)
+}
+
+//go:nosplit
+func stdcall6(fn stdFunction, a0, a1, a2, a3, a4, a5 uintptr) uintptr {
+ mp := getg().m
+ mp.libcall.n = 6
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
+ return stdcall(fn)
+}
+
+//go:nosplit
+func stdcall7(fn stdFunction, a0, a1, a2, a3, a4, a5, a6 uintptr) uintptr {
+ mp := getg().m
+ mp.libcall.n = 7
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
+ return stdcall(fn)
+}
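+
+// In the stdcallN wrappers above, the arguments a0, a1, ... occupy
+// consecutive stack slots, so &a0 doubles as a pointer to the whole
+// argument array from which asmstdcall reads libcall.n values.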
+
+// In sys_windows_386.s and sys_windows_amd64.s.
+func onosstack(fn unsafe.Pointer, arg uint32)
+
+// These are not callable functions. They should only be called via onosstack.
+func usleep2(usec uint32)
+func usleep2HighRes(usec uint32)
+func switchtothread()
+
+var usleep2Addr unsafe.Pointer
+var switchtothreadAddr unsafe.Pointer
+
+//go:nosplit
+func osyield() {
+ onosstack(switchtothreadAddr, 0)
+}
+
+//go:nosplit
+func usleep(us uint32) {
+ // Have 1us units; want 100ns units.
+ onosstack(usleep2Addr, 10*us)
+}
+
+func ctrlhandler1(_type uint32) uint32 {
+ var s uint32
+
+ switch _type {
+ case _CTRL_C_EVENT, _CTRL_BREAK_EVENT:
+ s = _SIGINT
+ case _CTRL_CLOSE_EVENT, _CTRL_LOGOFF_EVENT, _CTRL_SHUTDOWN_EVENT:
+ s = _SIGTERM
+ default:
+ return 0
+ }
+
+ if sigsend(s) {
+ if s == _SIGTERM {
+ // Windows terminates the process after this handler returns.
+ // Block indefinitely to give signal handlers a chance to clean up.
+ stdcall1(_Sleep, uintptr(_INFINITE))
+ }
+ return 1
+ }
+ return 0
+}
+
+// in sys_windows_386.s and sys_windows_amd64.s
+func profileloop()
+
+// called from zcallback_windows_*.s to sys_windows_*.s
+func callbackasm1()
+
+var profiletimer uintptr
+
+func profilem(mp *m, thread uintptr) {
+ // Align Context to 16 bytes.
+ var c *context
+ var cbuf [unsafe.Sizeof(*c) + 15]byte
+ c = (*context)(unsafe.Pointer((uintptr(unsafe.Pointer(&cbuf[15]))) &^ 15))
+
+ c.contextflags = _CONTEXT_CONTROL
+ stdcall2(_GetThreadContext, thread, uintptr(unsafe.Pointer(c)))
+
+ gp := gFromTLS(mp)
+
+ sigprof(c.ip(), c.sp(), c.lr(), gp, mp)
+}
+
+func gFromTLS(mp *m) *g {
+ switch GOARCH {
+ case "arm":
+ tls := &mp.tls[0]
+ return **((***g)(unsafe.Pointer(tls)))
+ case "386", "amd64":
+ tls := &mp.tls[0]
+ return *((**g)(unsafe.Pointer(tls)))
+ }
+ throw("unsupported architecture")
+ return nil
+}
+
+func profileloop1(param uintptr) uint32 {
+ stdcall2(_SetThreadPriority, currentThread, _THREAD_PRIORITY_HIGHEST)
+
+ for {
+ stdcall2(_WaitForSingleObject, profiletimer, _INFINITE)
+ first := (*m)(atomic.Loadp(unsafe.Pointer(&allm)))
+ for mp := first; mp != nil; mp = mp.alllink {
+ lock(&mp.threadLock)
+ // Do not profile threads blocked on Notes;
+ // this includes idle worker threads, the
+ // idle timer thread, the idle heap scavenger, etc.
+ if mp.thread == 0 || mp.profilehz == 0 || mp.blocked {
+ unlock(&mp.threadLock)
+ continue
+ }
+ // Acquire our own handle to the thread.
+ var thread uintptr
+ if stdcall7(_DuplicateHandle, currentProcess, mp.thread, currentProcess, uintptr(unsafe.Pointer(&thread)), 0, 0, _DUPLICATE_SAME_ACCESS) == 0 {
+ print("runtime.profileloop1: duplicatehandle failed; errno=", getlasterror(), "\n")
+ throw("runtime.profileloop1: duplicatehandle failed")
+ }
+ unlock(&mp.threadLock)
+
+ // mp may exit between the DuplicateHandle
+ // above and the SuspendThread. The handle
+ // will remain valid, but SuspendThread may
+ // fail.
+ if int32(stdcall1(_SuspendThread, thread)) == -1 {
+ // The thread no longer exists.
+ stdcall1(_CloseHandle, thread)
+ continue
+ }
+ if mp.profilehz != 0 && !mp.blocked {
+ // Pass the thread handle in case mp
+ // was in the process of shutting down.
+ profilem(mp, thread)
+ }
+ stdcall1(_ResumeThread, thread)
+ stdcall1(_CloseHandle, thread)
+ }
+ }
+}
+
+func setProcessCPUProfiler(hz int32) {
+ if profiletimer == 0 {
+ timer := stdcall3(_CreateWaitableTimerA, 0, 0, 0)
+ atomic.Storeuintptr(&profiletimer, timer)
+ thread := stdcall6(_CreateThread, 0, 0, funcPC(profileloop), 0, 0, 0)
+ stdcall2(_SetThreadPriority, thread, _THREAD_PRIORITY_HIGHEST)
+ stdcall1(_CloseHandle, thread)
+ }
+}
+
+func setThreadCPUProfiler(hz int32) {
+ ms := int32(0)
+ due := ^int64(^uint64(1 << 63))
+ if hz > 0 {
+ ms = 1000 / hz
+ if ms == 0 {
+ ms = 1
+ }
+ due = int64(ms) * -10000
+ }
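+ // A negative due time passed to SetWaitableTimer is a relative interval
+ // in 100 ns units, so int64(ms) * -10000 means "ms milliseconds from now";
+ // the minInt64 default above effectively never fires.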
+ stdcall6(_SetWaitableTimer, profiletimer, uintptr(unsafe.Pointer(&due)), uintptr(ms), 0, 0, 0)
+ atomic.Store((*uint32)(unsafe.Pointer(&getg().m.profilehz)), uint32(hz))
+}
+
+const preemptMSupported = GOARCH != "arm"
+
+// suspendLock protects simultaneous SuspendThread operations from
+// suspending each other.
+var suspendLock mutex
+
+func preemptM(mp *m) {
+ if GOARCH == "arm" {
+ // TODO: Implement call injection
+ return
+ }
+
+ if mp == getg().m {
+ throw("self-preempt")
+ }
+
+ // Synchronize with external code that may try to ExitProcess.
+ if !atomic.Cas(&mp.preemptExtLock, 0, 1) {
+ // External code is running. Fail the preemption
+ // attempt.
+ atomic.Xadd(&mp.preemptGen, 1)
+ return
+ }
+
+ // Acquire our own handle to mp's thread.
+ lock(&mp.threadLock)
+ if mp.thread == 0 {
+ // The M hasn't been minit'd yet (or was just unminit'd).
+ unlock(&mp.threadLock)
+ atomic.Store(&mp.preemptExtLock, 0)
+ atomic.Xadd(&mp.preemptGen, 1)
+ return
+ }
+ var thread uintptr
+ if stdcall7(_DuplicateHandle, currentProcess, mp.thread, currentProcess, uintptr(unsafe.Pointer(&thread)), 0, 0, _DUPLICATE_SAME_ACCESS) == 0 {
+ print("runtime.preemptM: duplicatehandle failed; errno=", getlasterror(), "\n")
+ throw("runtime.preemptM: duplicatehandle failed")
+ }
+ unlock(&mp.threadLock)
+
+ // Prepare thread context buffer. This must be aligned to 16 bytes.
+ var c *context
+ var cbuf [unsafe.Sizeof(*c) + 15]byte
+ c = (*context)(unsafe.Pointer((uintptr(unsafe.Pointer(&cbuf[15]))) &^ 15))
+ c.contextflags = _CONTEXT_CONTROL
+
+ // Serialize thread suspension. SuspendThread is asynchronous,
+ // so it's otherwise possible for two threads to suspend each
+ // other and deadlock. We must hold this lock until after
+ // GetThreadContext, since that blocks until the thread is
+ // actually suspended.
+ lock(&suspendLock)
+
+ // Suspend the thread.
+ if int32(stdcall1(_SuspendThread, thread)) == -1 {
+ unlock(&suspendLock)
+ stdcall1(_CloseHandle, thread)
+ atomic.Store(&mp.preemptExtLock, 0)
+ // The thread no longer exists. This shouldn't be
+ // possible, but just acknowledge the request.
+ atomic.Xadd(&mp.preemptGen, 1)
+ return
+ }
+
+ // We have to be very careful between this point and once
+ // we've shown mp is at an async safe-point. This is like a
+ // signal handler in the sense that mp could have been doing
+ // anything when we stopped it, including holding arbitrary
+ // locks.
+
+ // We have to get the thread context before inspecting the M
+ // because SuspendThread only requests a suspend.
+ // GetThreadContext actually blocks until it's suspended.
+ stdcall2(_GetThreadContext, thread, uintptr(unsafe.Pointer(c)))
+
+ unlock(&suspendLock)
+
+ // Does it want a preemption and is it safe to preempt?
+ gp := gFromTLS(mp)
+ if wantAsyncPreempt(gp) {
+ if ok, newpc := isAsyncSafePoint(gp, c.ip(), c.sp(), c.lr()); ok {
+ // Inject call to asyncPreempt
+ targetPC := funcPC(asyncPreempt)
+ switch GOARCH {
+ default:
+ throw("unsupported architecture")
+ case "386", "amd64":
+ // Make it look like the thread called targetPC.
+ sp := c.sp()
+ sp -= sys.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = newpc
+ c.set_sp(sp)
+ c.set_ip(targetPC)
+ }
+
+ stdcall2(_SetThreadContext, thread, uintptr(unsafe.Pointer(c)))
+ }
+ }
+
+ atomic.Store(&mp.preemptExtLock, 0)
+
+ // Acknowledge the preemption.
+ atomic.Xadd(&mp.preemptGen, 1)
+
+ stdcall1(_ResumeThread, thread)
+ stdcall1(_CloseHandle, thread)
+}
+
+// osPreemptExtEnter is called before entering external code that may
+// call ExitProcess.
+//
+// This must be nosplit because it may be called from a syscall with
+// untyped stack slots, so the stack must not be grown or scanned.
+//
+//go:nosplit
+func osPreemptExtEnter(mp *m) {
+ for !atomic.Cas(&mp.preemptExtLock, 0, 1) {
+ // An asynchronous preemption is in progress. It's not
+ // safe to enter external code because it may call
+ // ExitProcess and deadlock with SuspendThread.
+ // Ideally we would do the preemption ourselves, but
+ // can't since there may be untyped syscall arguments
+ // on the stack. Instead, just wait and encourage the
+ // SuspendThread APC to run. The preemption should be
+ // done shortly.
+ osyield()
+ }
+ // Asynchronous preemption is now blocked.
+}
+
+// osPreemptExtExit is called after returning from external code that
+// may call ExitProcess.
+//
+// See osPreemptExtEnter for why this is nosplit.
+//
+//go:nosplit
+func osPreemptExtExit(mp *m) {
+ atomic.Store(&mp.preemptExtLock, 0)
+}
diff --git a/src/runtime/os_windows_arm.go b/src/runtime/os_windows_arm.go
new file mode 100644
index 0000000..10aff75
--- /dev/null
+++ b/src/runtime/os_windows_arm.go
@@ -0,0 +1,22 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+//go:nosplit
+func cputicks() int64 {
+ var counter int64
+ stdcall1(_QueryPerformanceCounter, uintptr(unsafe.Pointer(&counter)))
+ return counter
+}
+
+func checkgoarm() {
+ if goarm < 7 {
+ print("Need atomic synchronization instructions, coprocessor ",
+ "access instructions. Recompile using GOARM=7.\n")
+ exit(1)
+ }
+}
diff --git a/src/runtime/panic.go b/src/runtime/panic.go
new file mode 100644
index 0000000..5b2ccdd
--- /dev/null
+++ b/src/runtime/panic.go
@@ -0,0 +1,1413 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// We have two different ways of doing defers. The older way involves creating a
+// defer record at the time that a defer statement is executing and adding it to a
+// defer chain. This chain is inspected by the deferreturn call at all function
+// exits in order to run the appropriate defer calls. A cheaper way (which we call
+// open-coded defers) is used for functions in which no defer statements occur in
+// loops. In that case, we simply store the defer function/arg information into
+// specific stack slots at the point of each defer statement, as well as setting a
+// bit in a bitmask. At each function exit, we add inline code to directly make
+// the appropriate defer calls based on the bitmask and fn/arg information stored
+// on the stack. During panic/Goexit processing, the appropriate defer calls are
+// made using extra funcdata info that indicates the exact stack slots that
+// contain the bitmask and defer fn/args.
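+//
+// For example (illustrative only; the compiler applies further conditions):
+//
+//	func f() {
+//		defer unlock(&l) // single defer, not in a loop: can be open-coded
+//	}
+//
+//	func g(names []string) {
+//		for _, name := range names {
+//			defer println(name) // defer in a loop: falls back to a defer record
+//		}
+//	}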
+
+// Check to make sure we can really generate a panic. If the panic
+// was generated from the runtime, or from inside malloc, then convert
+// to a throw of msg.
+// pc should be the program counter of the compiler-generated code that
+// triggered this panic.
+func panicCheck1(pc uintptr, msg string) {
+ if sys.GoarchWasm == 0 && hasPrefix(funcname(findfunc(pc)), "runtime.") {
+ // Note: wasm can't tail call, so we can't get the original caller's pc.
+ throw(msg)
+ }
+ // TODO: is this redundant? How could we be in malloc
+ // but not in the runtime? runtime/internal/*, maybe?
+ gp := getg()
+ if gp != nil && gp.m != nil && gp.m.mallocing != 0 {
+ throw(msg)
+ }
+}
+
+// Same as above, but calling from the runtime is allowed.
+//
+// Using this function is necessary for any panic that may be
+// generated by runtime.sigpanic, since those are always called by the
+// runtime.
+func panicCheck2(err string) {
+ // panic allocates, so to avoid recursive malloc, turn panics
+ // during malloc into throws.
+ gp := getg()
+ if gp != nil && gp.m != nil && gp.m.mallocing != 0 {
+ throw(err)
+ }
+}
+
+// Many of the following panic entry-points turn into throws when they
+// happen in various runtime contexts. These should never happen in
+// the runtime, and if they do, they indicate a serious issue and
+// should not be caught by user code.
+//
+// The panic{Index,Slice,divide,shift} functions are called by
+// code generated by the compiler for out of bounds index expressions,
+// out of bounds slice expressions, division by zero, and shift by negative.
+// The panicdivide (again), panicoverflow, panicfloat, and panicmem
+// functions are called by the signal handler when a signal occurs
+// indicating the respective problem.
+//
+// Since panic{Index,Slice,shift} are never called directly, and
+// since the runtime package should never have an out of bounds slice
+// or array reference or negative shift, if we see those functions called from the
+// runtime package we turn the panic into a throw. That will dump the
+// entire runtime stack for easier debugging.
+//
+// The entry points called by the signal handler will be called from
+// runtime.sigpanic, so we can't disallow calls from the runtime to
+// these (they always look like they're called from the runtime).
+// Hence, for these, we just check for clearly bad runtime conditions.
+//
+// The panic{Index,Slice} functions are implemented in assembly and tail call
+// to the goPanic{Index,Slice} functions below. This is done so we can use
+// a space-minimal register calling convention.
+
+// failures in the comparisons for s[x], 0 <= x < y (y == len(s))
+func goPanicIndex(x int, y int) {
+ panicCheck1(getcallerpc(), "index out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsIndex})
+}
+func goPanicIndexU(x uint, y int) {
+ panicCheck1(getcallerpc(), "index out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsIndex})
+}
+
+// failures in the comparisons for s[:x], 0 <= x <= y (y == len(s) or cap(s))
+func goPanicSliceAlen(x int, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsSliceAlen})
+}
+func goPanicSliceAlenU(x uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsSliceAlen})
+}
+func goPanicSliceAcap(x int, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsSliceAcap})
+}
+func goPanicSliceAcapU(x uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsSliceAcap})
+}
+
+// failures in the comparisons for s[x:y], 0 <= x <= y
+func goPanicSliceB(x int, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsSliceB})
+}
+func goPanicSliceBU(x uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsSliceB})
+}
+
+// failures in the comparisons for s[::x], 0 <= x <= y (y == len(s) or cap(s))
+func goPanicSlice3Alen(x int, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsSlice3Alen})
+}
+func goPanicSlice3AlenU(x uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsSlice3Alen})
+}
+func goPanicSlice3Acap(x int, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsSlice3Acap})
+}
+func goPanicSlice3AcapU(x uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsSlice3Acap})
+}
+
+// failures in the comparisons for s[:x:y], 0 <= x <= y
+func goPanicSlice3B(x int, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsSlice3B})
+}
+func goPanicSlice3BU(x uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsSlice3B})
+}
+
+// failures in the comparisons for s[x:y:], 0 <= x <= y
+func goPanicSlice3C(x int, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsSlice3C})
+}
+func goPanicSlice3CU(x uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsSlice3C})
+}
+
+// Implemented in assembly, as they take arguments in registers.
+// Declared here to mark them as ABIInternal.
+func panicIndex(x int, y int)
+func panicIndexU(x uint, y int)
+func panicSliceAlen(x int, y int)
+func panicSliceAlenU(x uint, y int)
+func panicSliceAcap(x int, y int)
+func panicSliceAcapU(x uint, y int)
+func panicSliceB(x int, y int)
+func panicSliceBU(x uint, y int)
+func panicSlice3Alen(x int, y int)
+func panicSlice3AlenU(x uint, y int)
+func panicSlice3Acap(x int, y int)
+func panicSlice3AcapU(x uint, y int)
+func panicSlice3B(x int, y int)
+func panicSlice3BU(x uint, y int)
+func panicSlice3C(x int, y int)
+func panicSlice3CU(x uint, y int)
+
+var shiftError = error(errorString("negative shift amount"))
+
+func panicshift() {
+ panicCheck1(getcallerpc(), "negative shift amount")
+ panic(shiftError)
+}
+
+var divideError = error(errorString("integer divide by zero"))
+
+func panicdivide() {
+ panicCheck2("integer divide by zero")
+ panic(divideError)
+}
+
+var overflowError = error(errorString("integer overflow"))
+
+func panicoverflow() {
+ panicCheck2("integer overflow")
+ panic(overflowError)
+}
+
+var floatError = error(errorString("floating point error"))
+
+func panicfloat() {
+ panicCheck2("floating point error")
+ panic(floatError)
+}
+
+var memoryError = error(errorString("invalid memory address or nil pointer dereference"))
+
+func panicmem() {
+ panicCheck2("invalid memory address or nil pointer dereference")
+ panic(memoryError)
+}
+
+func panicmemAddr(addr uintptr) {
+ panicCheck2("invalid memory address or nil pointer dereference")
+ panic(errorAddressString{msg: "invalid memory address or nil pointer dereference", addr: addr})
+}
+
+// Create a new deferred function fn with siz bytes of arguments.
+// The compiler turns a defer statement into a call to this.
+//go:nosplit
+func deferproc(siz int32, fn *funcval) { // arguments of fn follow fn
+ gp := getg()
+ if gp.m.curg != gp {
+ // go code on the system stack can't defer
+ throw("defer on system stack")
+ }
+
+ // the arguments of fn are in a perilous state. The stack map
+ // for deferproc does not describe them. So we can't let garbage
+ // collection or stack copying trigger until we've copied them out
+ // to somewhere safe. The memmove below does that.
+ // Until the copy completes, we can only call nosplit routines.
+ sp := getcallersp()
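+ // argp points just past fn in the caller's frame, i.e. at the first of
+ // the deferred call's arguments (the arguments of fn follow fn, as noted
+ // on deferproc's signature).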
+ argp := uintptr(unsafe.Pointer(&fn)) + unsafe.Sizeof(fn)
+ callerpc := getcallerpc()
+
+ d := newdefer(siz)
+ if d._panic != nil {
+ throw("deferproc: d.panic != nil after newdefer")
+ }
+ d.link = gp._defer
+ gp._defer = d
+ d.fn = fn
+ d.pc = callerpc
+ d.sp = sp
+ switch siz {
+ case 0:
+ // Do nothing.
+ case sys.PtrSize:
+ *(*uintptr)(deferArgs(d)) = *(*uintptr)(unsafe.Pointer(argp))
+ default:
+ memmove(deferArgs(d), unsafe.Pointer(argp), uintptr(siz))
+ }
+
+ // deferproc returns 0 normally.
+ // a deferred func that stops a panic
+ // makes the deferproc return 1.
+ // the code the compiler generates always
+ // checks the return value and jumps to the
+ // end of the function if deferproc returns != 0.
+ return0()
+ // No code can go here - the C return register has
+ // been set and must not be clobbered.
+}
+
+// deferprocStack queues a new deferred function with a defer record on the stack.
+// The defer record must have its siz and fn fields initialized.
+// All other fields can contain junk.
+// The defer record must be immediately followed in memory by
+// the arguments of the defer.
+// Nosplit because the arguments on the stack won't be scanned
+// until the defer record is spliced into the gp._defer list.
+//go:nosplit
+func deferprocStack(d *_defer) {
+ gp := getg()
+ if gp.m.curg != gp {
+ // go code on the system stack can't defer
+ throw("defer on system stack")
+ }
+ // siz and fn are already set.
+ // The other fields are junk on entry to deferprocStack and
+ // are initialized here.
+ d.started = false
+ d.heap = false
+ d.openDefer = false
+ d.sp = getcallersp()
+ d.pc = getcallerpc()
+ d.framepc = 0
+ d.varp = 0
+ // The lines below implement:
+ // d.panic = nil
+ // d.fd = nil
+ // d.link = gp._defer
+ // gp._defer = d
+ // But without write barriers. The first three are writes to
+ // the stack so they don't need a write barrier, and furthermore
+ // are to uninitialized memory, so they must not use a write barrier.
+ // The fourth write does not require a write barrier because we
+ // explicitly mark all the defer structures, so we don't need to
+ // keep track of pointers to them with a write barrier.
+ *(*uintptr)(unsafe.Pointer(&d._panic)) = 0
+ *(*uintptr)(unsafe.Pointer(&d.fd)) = 0
+ *(*uintptr)(unsafe.Pointer(&d.link)) = uintptr(unsafe.Pointer(gp._defer))
+ *(*uintptr)(unsafe.Pointer(&gp._defer)) = uintptr(unsafe.Pointer(d))
+
+ return0()
+ // No code can go here - the C return register has
+ // been set and must not be clobbered.
+}
+
+// Small malloc size classes >= 16 are the multiples of 16: 16, 32, 48, 64, 80, 96, 112, 128, 144, ...
+// Each P holds a pool for defers with small arg sizes.
+// Assign defer allocations to pools by rounding to 16, to match malloc size classes.
+
+const (
+ deferHeaderSize = unsafe.Sizeof(_defer{})
+ minDeferAlloc = (deferHeaderSize + 15) &^ 15
+ minDeferArgs = minDeferAlloc - deferHeaderSize
+)
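+
+// For example, with a purely hypothetical deferHeaderSize of 72 bytes:
+// minDeferAlloc = (72+15) &^ 15 = 80 and minDeferArgs = 8, so defers with up
+// to 8 bytes of arguments share size class 0, and deferclass below advances
+// one class for each additional 16 bytes of arguments.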
+
+// defer size class for arg size sz
+//go:nosplit
+func deferclass(siz uintptr) uintptr {
+ if siz <= minDeferArgs {
+ return 0
+ }
+ return (siz - minDeferArgs + 15) / 16
+}
+
+// total size of memory block for defer with arg size sz
+func totaldefersize(siz uintptr) uintptr {
+ if siz <= minDeferArgs {
+ return minDeferAlloc
+ }
+ return deferHeaderSize + siz
+}
+
+// Ensure that defer arg sizes that map to the same defer size class
+// also map to the same malloc size class.
+func testdefersizes() {
+ var m [len(p{}.deferpool)]int32
+
+ for i := range m {
+ m[i] = -1
+ }
+ for i := uintptr(0); ; i++ {
+ defersc := deferclass(i)
+ if defersc >= uintptr(len(m)) {
+ break
+ }
+ siz := roundupsize(totaldefersize(i))
+ if m[defersc] < 0 {
+ m[defersc] = int32(siz)
+ continue
+ }
+ if m[defersc] != int32(siz) {
+ print("bad defer size class: i=", i, " siz=", siz, " defersc=", defersc, "\n")
+ throw("bad defer size class")
+ }
+ }
+}
+
+// The arguments associated with a deferred call are stored
+// immediately after the _defer header in memory.
+//go:nosplit
+func deferArgs(d *_defer) unsafe.Pointer {
+ if d.siz == 0 {
+ // Avoid pointer past the defer allocation.
+ return nil
+ }
+ return add(unsafe.Pointer(d), unsafe.Sizeof(*d))
+}
+
+var deferType *_type // type of _defer struct
+
+func init() {
+ var x interface{}
+ x = (*_defer)(nil)
+ deferType = (*(**ptrtype)(unsafe.Pointer(&x))).elem
+}
+
+// Allocate a Defer, usually using per-P pool.
+// Each defer must be released with freedefer. The defer is not
+// added to any defer chain yet.
+//
+// This must not grow the stack because there may be a frame without
+// stack map information when this is called.
+//
+//go:nosplit
+func newdefer(siz int32) *_defer {
+ var d *_defer
+ sc := deferclass(uintptr(siz))
+ gp := getg()
+ if sc < uintptr(len(p{}.deferpool)) {
+ pp := gp.m.p.ptr()
+ if len(pp.deferpool[sc]) == 0 && sched.deferpool[sc] != nil {
+ // Take the slow path on the system stack so
+ // we don't grow newdefer's stack.
+ systemstack(func() {
+ lock(&sched.deferlock)
+ for len(pp.deferpool[sc]) < cap(pp.deferpool[sc])/2 && sched.deferpool[sc] != nil {
+ d := sched.deferpool[sc]
+ sched.deferpool[sc] = d.link
+ d.link = nil
+ pp.deferpool[sc] = append(pp.deferpool[sc], d)
+ }
+ unlock(&sched.deferlock)
+ })
+ }
+ if n := len(pp.deferpool[sc]); n > 0 {
+ d = pp.deferpool[sc][n-1]
+ pp.deferpool[sc][n-1] = nil
+ pp.deferpool[sc] = pp.deferpool[sc][:n-1]
+ }
+ }
+ if d == nil {
+ // Allocate new defer+args.
+ systemstack(func() {
+ total := roundupsize(totaldefersize(uintptr(siz)))
+ d = (*_defer)(mallocgc(total, deferType, true))
+ })
+ }
+ d.siz = siz
+ d.heap = true
+ return d
+}
+
+// Free the given defer.
+// The defer cannot be used after this call.
+//
+// This must not grow the stack because there may be a frame without a
+// stack map when this is called.
+//
+//go:nosplit
+func freedefer(d *_defer) {
+ if d._panic != nil {
+ freedeferpanic()
+ }
+ if d.fn != nil {
+ freedeferfn()
+ }
+ if !d.heap {
+ return
+ }
+ sc := deferclass(uintptr(d.siz))
+ if sc >= uintptr(len(p{}.deferpool)) {
+ return
+ }
+ pp := getg().m.p.ptr()
+ if len(pp.deferpool[sc]) == cap(pp.deferpool[sc]) {
+ // Transfer half of local cache to the central cache.
+ //
+ // Take this slow path on the system stack so
+ // we don't grow freedefer's stack.
+ systemstack(func() {
+ var first, last *_defer
+ for len(pp.deferpool[sc]) > cap(pp.deferpool[sc])/2 {
+ n := len(pp.deferpool[sc])
+ d := pp.deferpool[sc][n-1]
+ pp.deferpool[sc][n-1] = nil
+ pp.deferpool[sc] = pp.deferpool[sc][:n-1]
+ if first == nil {
+ first = d
+ } else {
+ last.link = d
+ }
+ last = d
+ }
+ lock(&sched.deferlock)
+ last.link = sched.deferpool[sc]
+ sched.deferpool[sc] = first
+ unlock(&sched.deferlock)
+ })
+ }
+
+ // These lines used to be simply `*d = _defer{}` but that
+ // started causing a nosplit stack overflow via typedmemmove.
+ d.siz = 0
+ d.started = false
+ d.openDefer = false
+ d.sp = 0
+ d.pc = 0
+ d.framepc = 0
+ d.varp = 0
+ d.fd = nil
+ // d._panic and d.fn must be nil already.
+ // If not, we would have called freedeferpanic or freedeferfn above,
+ // both of which throw.
+ d.link = nil
+
+ pp.deferpool[sc] = append(pp.deferpool[sc], d)
+}
+
+// Separate function so that it can split stack.
+// Windows otherwise runs out of stack space.
+func freedeferpanic() {
+ // _panic must be cleared before d is unlinked from gp.
+ throw("freedefer with d._panic != nil")
+}
+
+func freedeferfn() {
+ // fn must be cleared before d is unlinked from gp.
+ throw("freedefer with d.fn != nil")
+}
+
+// Run a deferred function if there is one.
+// The compiler inserts a call to this at the end of any
+// function which calls defer.
+// If there is a deferred function, this will call runtime·jmpdefer,
+// which will jump to the deferred function such that it appears
+// to have been called by the caller of deferreturn at the point
+// just before deferreturn was called. The effect is that deferreturn
+// is called again and again until there are no more deferred functions.
+//
+// Declared as nosplit, because the function should not be preempted once we start
+// modifying the caller's frame in order to reuse the frame to call the deferred
+// function.
+//
+// The single argument isn't actually used - it just has its address
+// taken so it can be matched against pending defers.
+//go:nosplit
+func deferreturn(arg0 uintptr) {
+ gp := getg()
+ d := gp._defer
+ if d == nil {
+ return
+ }
+ sp := getcallersp()
+ if d.sp != sp {
+ return
+ }
+ if d.openDefer {
+ done := runOpenDeferFrame(gp, d)
+ if !done {
+ throw("unfinished open-coded defers in deferreturn")
+ }
+ gp._defer = d.link
+ freedefer(d)
+ return
+ }
+
+ // Moving arguments around.
+ //
+ // Everything called after this point must be recursively
+ // nosplit because the garbage collector won't know the form
+ // of the arguments until the jmpdefer can flip the PC over to
+ // fn.
+ switch d.siz {
+ case 0:
+ // Do nothing.
+ case sys.PtrSize:
+ *(*uintptr)(unsafe.Pointer(&arg0)) = *(*uintptr)(deferArgs(d))
+ default:
+ memmove(unsafe.Pointer(&arg0), deferArgs(d), uintptr(d.siz))
+ }
+ fn := d.fn
+ d.fn = nil
+ gp._defer = d.link
+ freedefer(d)
+ // If the defer function pointer is nil, force the seg fault to happen
+ // here rather than in jmpdefer. gentraceback() throws an error if it is
+ // called with a callback on an LR architecture and jmpdefer is on the
+	// stack, because the stack trace can be incorrect in that case
+	// (see issue #8153).
+ _ = fn.fn
+ jmpdefer(fn, uintptr(unsafe.Pointer(&arg0)))
+}
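
Sketch of the user-visible behavior this loop implements: when a function with defers returns, its pending deferred calls run in last-in-first-out order. The function names below are illustrative and not part of this patch.

package main

import "fmt"

// When f returns, its pending defers run in LIFO order; for
// heap-allocated defers that is the deferreturn/jmpdefer loop above.
func f() {
	defer fmt.Println("registered first, runs last")
	defer fmt.Println("registered second, runs first")
	fmt.Println("body")
}

func main() {
	f()
	// Output:
	// body
	// registered second, runs first
	// registered first, runs last
}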
+
+// Goexit terminates the goroutine that calls it. No other goroutine is affected.
+// Goexit runs all deferred calls before terminating the goroutine. Because Goexit
+// is not a panic, any recover calls in those deferred functions will return nil.
+//
+// Calling Goexit from the main goroutine terminates that goroutine
+// without func main returning. Since func main has not returned,
+// the program continues execution of other goroutines.
+// If all other goroutines exit, the program crashes.
+func Goexit() {
+ // Run all deferred functions for the current goroutine.
+ // This code is similar to gopanic, see that implementation
+ // for detailed comments.
+ gp := getg()
+
+ // Create a panic object for Goexit, so we can recognize when it might be
+ // bypassed by a recover().
+ var p _panic
+ p.goexit = true
+ p.link = gp._panic
+ gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
+
+ addOneOpenDeferFrame(gp, getcallerpc(), unsafe.Pointer(getcallersp()))
+ for {
+ d := gp._defer
+ if d == nil {
+ break
+ }
+ if d.started {
+ if d._panic != nil {
+ d._panic.aborted = true
+ d._panic = nil
+ }
+ if !d.openDefer {
+ d.fn = nil
+ gp._defer = d.link
+ freedefer(d)
+ continue
+ }
+ }
+ d.started = true
+ d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
+ if d.openDefer {
+ done := runOpenDeferFrame(gp, d)
+ if !done {
+ // We should always run all defers in the frame,
+ // since there is no panic associated with this
+ // defer that can be recovered.
+ throw("unfinished open-coded defers in Goexit")
+ }
+ if p.aborted {
+ // Since our current defer caused a panic and may
+ // have been already freed, just restart scanning
+ // for open-coded defers from this frame again.
+ addOneOpenDeferFrame(gp, getcallerpc(), unsafe.Pointer(getcallersp()))
+ } else {
+ addOneOpenDeferFrame(gp, 0, nil)
+ }
+ } else {
+
+ // Save the pc/sp in reflectcallSave(), so we can "recover" back to this
+ // loop if necessary.
+ reflectcallSave(&p, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz))
+ }
+ if p.aborted {
+ // We had a recursive panic in the defer d we started, and
+ // then did a recover in a defer that was further down the
+ // defer chain than d. In the case of an outstanding Goexit,
+ // we force the recover to return back to this loop. d will
+ // have already been freed if completed, so just continue
+ // immediately to the next defer on the chain.
+ p.aborted = false
+ continue
+ }
+ if gp._defer != d {
+ throw("bad defer entry in Goexit")
+ }
+ d._panic = nil
+ d.fn = nil
+ gp._defer = d.link
+ freedefer(d)
+ // Note: we ignore recovers here because Goexit isn't a panic
+ }
+ goexit1()
+}
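
As the doc comment above notes, Goexit runs the goroutine's deferred calls but is not a panic, so recover inside those defers returns nil. A small sketch of that observable behavior, using only the documented runtime.Goexit API:

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer func() {
			// Goexit is not a panic, so recover reports nil here.
			fmt.Println("recover() =", recover())
		}()
		runtime.Goexit() // runs the defers above, then ends this goroutine
		fmt.Println("never reached")
	}()
	wg.Wait()
}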
+
+// Call all Error and String methods before freezing the world.
+// Used when crashing with panicking.
+func preprintpanics(p *_panic) {
+ defer func() {
+ if recover() != nil {
+ throw("panic while printing panic value")
+ }
+ }()
+ for p != nil {
+ switch v := p.arg.(type) {
+ case error:
+ p.arg = v.Error()
+ case stringer:
+ p.arg = v.String()
+ }
+ p = p.link
+ }
+}
+
+// Print all currently active panics. Used when crashing.
+// Should only be called after preprintpanics.
+func printpanics(p *_panic) {
+ if p.link != nil {
+ printpanics(p.link)
+ if !p.link.goexit {
+ print("\t")
+ }
+ }
+ if p.goexit {
+ return
+ }
+ print("panic: ")
+ printany(p.arg)
+ if p.recovered {
+ print(" [recovered]")
+ }
+ print("\n")
+}
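
printpanics walks the chain from the oldest panic to the newest, indenting each nested panic with a tab and tagging recovered ones. A panic raised from a deferred function while an earlier panic is still unwinding therefore prints two entries, as in this sketch:

package main

func main() {
	defer func() {
		panic("second failure") // raised while "first failure" is unwinding
	}()
	panic("first failure")
}

// Crash output (goroutine stack trace elided):
//
//	panic: first failure
//		panic: second failure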
+
+// addOneOpenDeferFrame scans the stack for the first frame (if any) with
+// open-coded defers and if it finds one, adds a single record to the defer chain
+// for that frame. If sp is non-nil, it starts the stack scan from the frame
+// specified by sp. If sp is nil, it uses the sp from the current defer record
+// (which has just been finished). Hence, it continues the stack scan from the
+// frame of the defer that just finished. It skips any frame that already has an
+// open-coded _defer record, which would have been created from a previous
+// (unrecovered) panic.
+//
+// Note: All entries of the defer chain (including this new open-coded entry) have
+// their pointers (including sp) adjusted properly if the stack moves while
+// running deferred functions. Also, it is safe to pass in the sp arg (which is
+// the direct result of calling getcallersp()), because all pointer variables
+// (including arguments) are adjusted as needed during stack copies.
+func addOneOpenDeferFrame(gp *g, pc uintptr, sp unsafe.Pointer) {
+ var prevDefer *_defer
+ if sp == nil {
+ prevDefer = gp._defer
+ pc = prevDefer.framepc
+ sp = unsafe.Pointer(prevDefer.sp)
+ }
+ systemstack(func() {
+ gentraceback(pc, uintptr(sp), 0, gp, 0, nil, 0x7fffffff,
+ func(frame *stkframe, unused unsafe.Pointer) bool {
+ if prevDefer != nil && prevDefer.sp == frame.sp {
+ // Skip the frame for the previous defer that
+ // we just finished (and was used to set
+ // where we restarted the stack scan)
+ return true
+ }
+ f := frame.fn
+ fd := funcdata(f, _FUNCDATA_OpenCodedDeferInfo)
+ if fd == nil {
+ return true
+ }
+ // Insert the open defer record in the
+ // chain, in order sorted by sp.
+ d := gp._defer
+ var prev *_defer
+ for d != nil {
+ dsp := d.sp
+ if frame.sp < dsp {
+ break
+ }
+ if frame.sp == dsp {
+ if !d.openDefer {
+ throw("duplicated defer entry")
+ }
+ return true
+ }
+ prev = d
+ d = d.link
+ }
+ if frame.fn.deferreturn == 0 {
+ throw("missing deferreturn")
+ }
+
+ maxargsize, _ := readvarintUnsafe(fd)
+ d1 := newdefer(int32(maxargsize))
+ d1.openDefer = true
+ d1._panic = nil
+ // These are the pc/sp to set after we've
+ // run a defer in this frame that did a
+ // recover. We return to a special
+ // deferreturn that runs any remaining
+ // defers and then returns from the
+ // function.
+ d1.pc = frame.fn.entry + uintptr(frame.fn.deferreturn)
+ d1.varp = frame.varp
+ d1.fd = fd
+ // Save the SP/PC associated with current frame,
+ // so we can continue stack trace later if needed.
+ d1.framepc = frame.pc
+ d1.sp = frame.sp
+ d1.link = d
+ if prev == nil {
+ gp._defer = d1
+ } else {
+ prev.link = d1
+ }
+ // Stop stack scanning after adding one open defer record
+ return false
+ },
+ nil, 0)
+ })
+}
+
+// readvarintUnsafe reads the uint32 in varint format starting at fd, and returns the
+// uint32 and a pointer to the byte following the varint.
+//
+// There is a similar function runtime.readvarint, which takes a slice of bytes,
+// rather than an unsafe pointer. These functions are duplicated, because one of
+// the two use cases for the functions would get slower if the functions were
+// combined.
+func readvarintUnsafe(fd unsafe.Pointer) (uint32, unsafe.Pointer) {
+ var r uint32
+ var shift int
+ for {
+ b := *(*uint8)((unsafe.Pointer(fd)))
+ fd = add(fd, unsafe.Sizeof(b))
+ if b < 128 {
+ return r + uint32(b)<<shift, fd
+ }
+ r += ((uint32(b) &^ 128) << shift)
+ shift += 7
+ if shift > 28 {
+ panic("Bad varint")
+ }
+ }
+}
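
The format decoded here is the usual little-endian base-128 varint: seven payload bits per byte, with the high bit set on every byte except the last. A standalone sketch of the same decoding over a byte slice, for illustration only (decodeVarint is not a runtime function):

package main

import "fmt"

// decodeVarint mirrors readvarintUnsafe's loop over a []byte.
func decodeVarint(b []byte) (uint32, []byte) {
	var r uint32
	var shift uint
	for i, c := range b {
		if c < 128 {
			return r + uint32(c)<<shift, b[i+1:]
		}
		r += (uint32(c) &^ 128) << shift
		shift += 7
	}
	panic("truncated varint")
}

func main() {
	v, rest := decodeVarint([]byte{0xAC, 0x02, 0x05})
	fmt.Println(v, rest) // 300 [5]: 0xAC contributes 44, 0x02 contributes 2<<7
}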
+
+// runOpenDeferFrame runs the active open-coded defers in the frame specified by
+// d. It normally processes all active defers in the frame, but stops immediately
+// if a defer does a successful recover. It returns true if there are no
+// remaining defers to run in the frame.
+func runOpenDeferFrame(gp *g, d *_defer) bool {
+ done := true
+ fd := d.fd
+
+ // Skip the maxargsize
+ _, fd = readvarintUnsafe(fd)
+ deferBitsOffset, fd := readvarintUnsafe(fd)
+ nDefers, fd := readvarintUnsafe(fd)
+ deferBits := *(*uint8)(unsafe.Pointer(d.varp - uintptr(deferBitsOffset)))
+
+ for i := int(nDefers) - 1; i >= 0; i-- {
+ // read the funcdata info for this defer
+ var argWidth, closureOffset, nArgs uint32
+ argWidth, fd = readvarintUnsafe(fd)
+ closureOffset, fd = readvarintUnsafe(fd)
+ nArgs, fd = readvarintUnsafe(fd)
+ if deferBits&(1<<i) == 0 {
+ for j := uint32(0); j < nArgs; j++ {
+ _, fd = readvarintUnsafe(fd)
+ _, fd = readvarintUnsafe(fd)
+ _, fd = readvarintUnsafe(fd)
+ }
+ continue
+ }
+ closure := *(**funcval)(unsafe.Pointer(d.varp - uintptr(closureOffset)))
+ d.fn = closure
+ deferArgs := deferArgs(d)
+ // If there is an interface receiver or method receiver, it is
+ // described/included as the first arg.
+ for j := uint32(0); j < nArgs; j++ {
+ var argOffset, argLen, argCallOffset uint32
+ argOffset, fd = readvarintUnsafe(fd)
+ argLen, fd = readvarintUnsafe(fd)
+ argCallOffset, fd = readvarintUnsafe(fd)
+ memmove(unsafe.Pointer(uintptr(deferArgs)+uintptr(argCallOffset)),
+ unsafe.Pointer(d.varp-uintptr(argOffset)),
+ uintptr(argLen))
+ }
+ deferBits = deferBits &^ (1 << i)
+ *(*uint8)(unsafe.Pointer(d.varp - uintptr(deferBitsOffset))) = deferBits
+ p := d._panic
+ reflectcallSave(p, unsafe.Pointer(closure), deferArgs, argWidth)
+ if p != nil && p.aborted {
+ break
+ }
+ d.fn = nil
+ // These args are just a copy, so can be cleared immediately
+ memclrNoHeapPointers(deferArgs, uintptr(argWidth))
+ if d._panic != nil && d._panic.recovered {
+ done = deferBits == 0
+ break
+ }
+ }
+
+ return done
+}
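
Taken together with the maxargsize read in addOneOpenDeferFrame, the sequence of reads above implies the following layout for the open-coded defer funcdata. The schematic is inferred from the parsing order in this file and is not an authoritative description of the compiler's emitter:

// FUNCDATA_OpenCodedDeferInfo record, all fields varint-encoded:
//
//	maxargsize        // largest argument frame among the frame's defers
//	deferBitsOffset   // offset of the deferBits byte below varp
//	nDefers           // number of defer sites in the frame
//	nDefers entries, read in the order the defers are run:
//		argWidth      // total argument bytes for the call
//		closureOffset // offset of the saved closure pointer below varp
//		nArgs         // number of saved arguments
//		nArgs triples:
//			argOffset     // offset of the saved argument below varp
//			argLen        // length of the argument in bytes
//			argCallOffset // position of the argument in the call frame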
+
+// reflectcallSave calls reflectcall after saving the caller's pc and sp in the
+// panic record. This allows the runtime to return to the Goexit defer processing
+// loop, in the unusual case where the Goexit may be bypassed by a successful
+// recover.
+func reflectcallSave(p *_panic, fn, arg unsafe.Pointer, argsize uint32) {
+ if p != nil {
+ p.argp = unsafe.Pointer(getargp(0))
+ p.pc = getcallerpc()
+ p.sp = unsafe.Pointer(getcallersp())
+ }
+ reflectcall(nil, fn, arg, argsize, argsize)
+ if p != nil {
+ p.pc = 0
+ p.sp = unsafe.Pointer(nil)
+ }
+}
+
+// The implementation of the predeclared function panic.
+func gopanic(e interface{}) {
+ gp := getg()
+ if gp.m.curg != gp {
+ print("panic: ")
+ printany(e)
+ print("\n")
+ throw("panic on system stack")
+ }
+
+ if gp.m.mallocing != 0 {
+ print("panic: ")
+ printany(e)
+ print("\n")
+ throw("panic during malloc")
+ }
+ if gp.m.preemptoff != "" {
+ print("panic: ")
+ printany(e)
+ print("\n")
+ print("preempt off reason: ")
+ print(gp.m.preemptoff)
+ print("\n")
+ throw("panic during preemptoff")
+ }
+ if gp.m.locks != 0 {
+ print("panic: ")
+ printany(e)
+ print("\n")
+ throw("panic holding locks")
+ }
+
+ var p _panic
+ p.arg = e
+ p.link = gp._panic
+ gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
+
+ atomic.Xadd(&runningPanicDefers, 1)
+
+ // By calculating getcallerpc/getcallersp here, we avoid scanning the
+ // gopanic frame (stack scanning is slow...)
+ addOneOpenDeferFrame(gp, getcallerpc(), unsafe.Pointer(getcallersp()))
+
+ for {
+ d := gp._defer
+ if d == nil {
+ break
+ }
+
+ // If defer was started by earlier panic or Goexit (and, since we're back here, that triggered a new panic),
+ // take defer off list. An earlier panic will not continue running, but we will make sure below that an
+ // earlier Goexit does continue running.
+ if d.started {
+ if d._panic != nil {
+ d._panic.aborted = true
+ }
+ d._panic = nil
+ if !d.openDefer {
+ // For open-coded defers, we need to process the
+ // defer again, in case there are any other defers
+ // to call in the frame (not including the defer
+ // call that caused the panic).
+ d.fn = nil
+ gp._defer = d.link
+ freedefer(d)
+ continue
+ }
+ }
+
+ // Mark defer as started, but keep on list, so that traceback
+ // can find and update the defer's argument frame if stack growth
+ // or a garbage collection happens before reflectcall starts executing d.fn.
+ d.started = true
+
+ // Record the panic that is running the defer.
+ // If there is a new panic during the deferred call, that panic
+ // will find d in the list and will mark d._panic (this panic) aborted.
+ d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
+
+ done := true
+ if d.openDefer {
+ done = runOpenDeferFrame(gp, d)
+ if done && !d._panic.recovered {
+ addOneOpenDeferFrame(gp, 0, nil)
+ }
+ } else {
+ p.argp = unsafe.Pointer(getargp(0))
+ reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
+ }
+ p.argp = nil
+
+ // reflectcall did not panic. Remove d.
+ if gp._defer != d {
+ throw("bad defer entry in panic")
+ }
+ d._panic = nil
+
+ // trigger shrinkage to test stack copy. See stack_test.go:TestStackPanic
+ //GC()
+
+ pc := d.pc
+ sp := unsafe.Pointer(d.sp) // must be pointer so it gets adjusted during stack copy
+ if done {
+ d.fn = nil
+ gp._defer = d.link
+ freedefer(d)
+ }
+ if p.recovered {
+ gp._panic = p.link
+ if gp._panic != nil && gp._panic.goexit && gp._panic.aborted {
+ // A normal recover would bypass/abort the Goexit. Instead,
+ // we return to the processing loop of the Goexit.
+ gp.sigcode0 = uintptr(gp._panic.sp)
+ gp.sigcode1 = uintptr(gp._panic.pc)
+ mcall(recovery)
+ throw("bypassed recovery failed") // mcall should not return
+ }
+ atomic.Xadd(&runningPanicDefers, -1)
+
+ // Remove any remaining non-started, open-coded
+ // defer entries after a recover, since the
+ // corresponding defers will be executed normally
+ // (inline). Any such entry will become stale once
+ // we run the corresponding defers inline and exit
+ // the associated stack frame.
+ d := gp._defer
+ var prev *_defer
+ if !done {
+ // Skip our current frame, if not done. It is
+ // needed to complete any remaining defers in
+ // deferreturn()
+ prev = d
+ d = d.link
+ }
+ for d != nil {
+ if d.started {
+ // This defer is started but we
+ // are in the middle of a
+ // defer-panic-recover inside of
+ // it, so don't remove it or any
+ // further defer entries
+ break
+ }
+ if d.openDefer {
+ if prev == nil {
+ gp._defer = d.link
+ } else {
+ prev.link = d.link
+ }
+ newd := d.link
+ freedefer(d)
+ d = newd
+ } else {
+ prev = d
+ d = d.link
+ }
+ }
+
+ gp._panic = p.link
+ // Aborted panics are marked but remain on the g.panic list.
+ // Remove them from the list.
+ for gp._panic != nil && gp._panic.aborted {
+ gp._panic = gp._panic.link
+ }
+ if gp._panic == nil { // must be done with signal
+ gp.sig = 0
+ }
+ // Pass information about recovering frame to recovery.
+ gp.sigcode0 = uintptr(sp)
+ gp.sigcode1 = pc
+ mcall(recovery)
+ throw("recovery failed") // mcall should not return
+ }
+ }
+
+ // ran out of deferred calls - old-school panic now
+ // Because it is unsafe to call arbitrary user code after freezing
+ // the world, we call preprintpanics to invoke all necessary Error
+ // and String methods to prepare the panic strings before startpanic.
+ preprintpanics(gp._panic)
+
+ fatalpanic(gp._panic) // should not return
+ *(*int)(nil) = 0 // not reached
+}
+
+// getargp returns the location where the caller
+// writes outgoing function call arguments.
+//go:nosplit
+//go:noinline
+func getargp(x int) uintptr {
+ // x is an argument mainly so that we can return its address.
+ return uintptr(noescape(unsafe.Pointer(&x)))
+}
+
+// The implementation of the predeclared function recover.
+// Cannot split the stack because it needs to reliably
+// find the stack segment of its caller.
+//
+// TODO(rsc): Once we commit to CopyStackAlways,
+// this doesn't need to be nosplit.
+//go:nosplit
+func gorecover(argp uintptr) interface{} {
+ // Must be in a function running as part of a deferred call during the panic.
+ // Must be called from the topmost function of the call
+ // (the function used in the defer statement).
+ // p.argp is the argument pointer of that topmost deferred function call.
+ // Compare against argp reported by caller.
+ // If they match, the caller is the one who can recover.
+ gp := getg()
+ p := gp._panic
+ if p != nil && !p.goexit && !p.recovered && argp == uintptr(p.argp) {
+ p.recovered = true
+ return p.arg
+ }
+ return nil
+}
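
The argp comparison above is what restricts recover to functions that were themselves deferred by the panicking frame. A user-level sketch of the matching and non-matching cases, relying only on documented recover semantics:

package main

import "fmt"

func safeCall(f func()) (err error) {
	defer func() {
		// Called directly from the deferred function, so the argp
		// check matches and recover returns the panic value.
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	f()
	return nil
}

func main() {
	fmt.Println(safeCall(func() { panic("boom") })) // recovered: boom

	// recover called outside any deferred function always returns nil.
	fmt.Println(recover()) // <nil>
}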
+
+//go:linkname sync_throw sync.throw
+func sync_throw(s string) {
+ throw(s)
+}
+
+//go:nosplit
+func throw(s string) {
+ // Everything throw does should be recursively nosplit so it
+ // can be called even when it's unsafe to grow the stack.
+ systemstack(func() {
+ print("fatal error: ", s, "\n")
+ })
+ gp := getg()
+ if gp.m.throwing == 0 {
+ gp.m.throwing = 1
+ }
+ fatalthrow()
+ *(*int)(nil) = 0 // not reached
+}
+
+// runningPanicDefers is non-zero while running deferred functions for panic.
+// runningPanicDefers is incremented and decremented atomically.
+// This is used to try hard to get a panic stack trace out when exiting.
+var runningPanicDefers uint32
+
+// panicking is non-zero when crashing the program for an unrecovered panic.
+// panicking is incremented and decremented atomically.
+var panicking uint32
+
+// paniclk is held while printing the panic information and stack trace,
+// so that two concurrent panics don't overlap their output.
+var paniclk mutex
+
+// Unwind the stack after a deferred function calls recover
+// after a panic. Then arrange to continue running as though
+// the caller of the deferred function returned normally.
+func recovery(gp *g) {
+ // Info about defer passed in G struct.
+ sp := gp.sigcode0
+ pc := gp.sigcode1
+
+ // d's arguments need to be in the stack.
+ if sp != 0 && (sp < gp.stack.lo || gp.stack.hi < sp) {
+ print("recover: ", hex(sp), " not in [", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n")
+ throw("bad recovery")
+ }
+
+ // Make the deferproc for this d return again,
+ // this time returning 1. The calling function will
+ // jump to the standard return epilogue.
+ gp.sched.sp = sp
+ gp.sched.pc = pc
+ gp.sched.lr = 0
+ gp.sched.ret = 1
+ gogo(&gp.sched)
+}
+
+// fatalthrow implements an unrecoverable runtime throw. It freezes the
+// system, prints stack traces starting from its caller, and terminates the
+// process.
+//
+//go:nosplit
+func fatalthrow() {
+ pc := getcallerpc()
+ sp := getcallersp()
+ gp := getg()
+ // Switch to the system stack to avoid any stack growth, which
+ // may make things worse if the runtime is in a bad state.
+ systemstack(func() {
+ startpanic_m()
+
+ if dopanic_m(gp, pc, sp) {
+ // crash uses a decent amount of nosplit stack and we're already
+ // low on stack in throw, so crash on the system stack (unlike
+ // fatalpanic).
+ crash()
+ }
+
+ exit(2)
+ })
+
+ *(*int)(nil) = 0 // not reached
+}
+
+// fatalpanic implements an unrecoverable panic. It is like fatalthrow, except
+// that if msgs != nil, fatalpanic also prints panic messages and decrements
+// runningPanicDefers once main is blocked from exiting.
+//
+//go:nosplit
+func fatalpanic(msgs *_panic) {
+ pc := getcallerpc()
+ sp := getcallersp()
+ gp := getg()
+ var docrash bool
+ // Switch to the system stack to avoid any stack growth, which
+ // may make things worse if the runtime is in a bad state.
+ systemstack(func() {
+ if startpanic_m() && msgs != nil {
+ // There were panic messages and startpanic_m
+ // says it's okay to try to print them.
+
+ // startpanic_m set panicking, which will
+ // block main from exiting, so now OK to
+ // decrement runningPanicDefers.
+ atomic.Xadd(&runningPanicDefers, -1)
+
+ printpanics(msgs)
+ }
+
+ docrash = dopanic_m(gp, pc, sp)
+ })
+
+ if docrash {
+ // By crashing outside the above systemstack call, debuggers
+ // will not be confused when generating a backtrace.
+ // Function crash is marked nosplit to avoid stack growth.
+ crash()
+ }
+
+ systemstack(func() {
+ exit(2)
+ })
+
+ *(*int)(nil) = 0 // not reached
+}
+
+// startpanic_m prepares for an unrecoverable panic.
+//
+// It returns true if panic messages should be printed, or false if
+// the runtime is in bad shape and should just print stacks.
+//
+// It must not have write barriers even though the write barrier
+// explicitly ignores writes once dying > 0. Write barriers still
+// assume that g.m.p != nil, and this function may not have P
+// in some contexts (e.g. a panic in a signal handler for a signal
+// sent to an M with no P).
+//
+//go:nowritebarrierrec
+func startpanic_m() bool {
+ _g_ := getg()
+ if mheap_.cachealloc.size == 0 { // very early
+ print("runtime: panic before malloc heap initialized\n")
+ }
+ // Disallow malloc during an unrecoverable panic. A panic
+ // could happen in a signal handler, or in a throw, or inside
+ // malloc itself. We want to catch if an allocation ever does
+ // happen (even if we're not in one of these situations).
+ _g_.m.mallocing++
+
+ // If we're dying because of a bad lock count, set it to a
+ // good lock count so we don't recursively panic below.
+ if _g_.m.locks < 0 {
+ _g_.m.locks = 1
+ }
+
+ switch _g_.m.dying {
+ case 0:
+ // Setting dying >0 has the side-effect of disabling this G's writebuf.
+ _g_.m.dying = 1
+ atomic.Xadd(&panicking, 1)
+ lock(&paniclk)
+ if debug.schedtrace > 0 || debug.scheddetail > 0 {
+ schedtrace(true)
+ }
+ freezetheworld()
+ return true
+ case 1:
+ // Something failed while panicking.
+ // Just print a stack trace and exit.
+ _g_.m.dying = 2
+ print("panic during panic\n")
+ return false
+ case 2:
+ // This is a genuine bug in the runtime, we couldn't even
+ // print the stack trace successfully.
+ _g_.m.dying = 3
+ print("stack trace unavailable\n")
+ exit(4)
+ fallthrough
+ default:
+ // Can't even print! Just exit.
+ exit(5)
+ return false // Need to return something.
+ }
+}
+
+var didothers bool
+var deadlock mutex
+
+func dopanic_m(gp *g, pc, sp uintptr) bool {
+ if gp.sig != 0 {
+ signame := signame(gp.sig)
+ if signame != "" {
+ print("[signal ", signame)
+ } else {
+ print("[signal ", hex(gp.sig))
+ }
+ print(" code=", hex(gp.sigcode0), " addr=", hex(gp.sigcode1), " pc=", hex(gp.sigpc), "]\n")
+ }
+
+ level, all, docrash := gotraceback()
+ _g_ := getg()
+ if level > 0 {
+ if gp != gp.m.curg {
+ all = true
+ }
+ if gp != gp.m.g0 {
+ print("\n")
+ goroutineheader(gp)
+ traceback(pc, sp, 0, gp)
+ } else if level >= 2 || _g_.m.throwing > 0 {
+ print("\nruntime stack:\n")
+ traceback(pc, sp, 0, gp)
+ }
+ if !didothers && all {
+ didothers = true
+ tracebackothers(gp)
+ }
+ }
+ unlock(&paniclk)
+
+ if atomic.Xadd(&panicking, -1) != 0 {
+ // Some other m is panicking too.
+ // Let it print what it needs to print.
+ // Wait forever without chewing up cpu.
+ // It will exit when it's done.
+ lock(&deadlock)
+ lock(&deadlock)
+ }
+
+ printDebugLog()
+
+ return docrash
+}
+
+// canpanic returns false if a signal should throw instead of
+// panicking.
+//
+//go:nosplit
+func canpanic(gp *g) bool {
+ // Note that g is m->gsignal, different from gp.
+ // Note also that g->m can change at preemption, so m can go stale
+ // if this function ever makes a function call.
+ _g_ := getg()
+ _m_ := _g_.m
+
+ // Is it okay for gp to panic instead of crashing the program?
+ // Yes, as long as it is running Go code, not runtime code,
+ // and not stuck in a system call.
+ if gp == nil || gp != _m_.curg {
+ return false
+ }
+ if _m_.locks != 0 || _m_.mallocing != 0 || _m_.throwing != 0 || _m_.preemptoff != "" || _m_.dying != 0 {
+ return false
+ }
+ status := readgstatus(gp)
+ if status&^_Gscan != _Grunning || gp.syscallsp != 0 {
+ return false
+ }
+ if GOOS == "windows" && _m_.libcallsp != 0 {
+ return false
+ }
+ return true
+}
+
+// shouldPushSigpanic reports whether pc should be used as sigpanic's
+// return PC (pushing a frame for the call). Otherwise, it should be
+// left alone so that LR is used as sigpanic's return PC, effectively
+// replacing the top-most frame with sigpanic. This is used by
+// preparePanic.
+func shouldPushSigpanic(gp *g, pc, lr uintptr) bool {
+ if pc == 0 {
+ // Probably a call to a nil func. The old LR is more
+ // useful in the stack trace. Not pushing the frame
+ // will make the trace look like a call to sigpanic
+ // instead. (Otherwise the trace will end at sigpanic
+ // and we won't get to see who faulted.)
+ return false
+ }
+ // If we don't recognize the PC as code, but we do recognize
+ // the link register as code, then this assumes the panic was
+ // caused by a call to non-code. In this case, we want to
+ // ignore this call to make unwinding show the context.
+ //
+	// If we're running C code, we're not going to recognize pc as a
+ // Go function, so just assume it's good. Otherwise, traceback
+ // may try to read a stale LR that looks like a Go code
+ // pointer and wander into the woods.
+ if gp.m.incgo || findfunc(pc).valid() {
+ // This wasn't a bad call, so use PC as sigpanic's
+ // return PC.
+ return true
+ }
+ if findfunc(lr).valid() {
+ // This was a bad call, but the LR is good, so use the
+ // LR as sigpanic's return PC.
+ return false
+ }
+	// Neither the PC nor the LR is good. Hopefully pushing a frame
+ // will work.
+ return true
+}
+
+// isAbortPC reports whether pc is the program counter at which
+// runtime.abort raises a signal.
+//
+// It is nosplit because it's part of the isgoexception
+// implementation.
+//
+//go:nosplit
+func isAbortPC(pc uintptr) bool {
+ return pc == funcPC(abort) || ((GOARCH == "arm" || GOARCH == "arm64") && pc == funcPC(abort)+sys.PCQuantum)
+}
diff --git a/src/runtime/panic32.go b/src/runtime/panic32.go
new file mode 100644
index 0000000..aea8401
--- /dev/null
+++ b/src/runtime/panic32.go
@@ -0,0 +1,105 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build 386 arm mips mipsle
+
+package runtime
+
+// Additional index/slice error paths for 32-bit platforms.
+// Used when the high word of a 64-bit index is not zero.
+
+// failures in the comparisons for s[x], 0 <= x < y (y == len(s))
+func goPanicExtendIndex(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "index out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsIndex})
+}
+func goPanicExtendIndexU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "index out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsIndex})
+}
+
+// failures in the comparisons for s[:x], 0 <= x <= y (y == len(s) or cap(s))
+func goPanicExtendSliceAlen(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsSliceAlen})
+}
+func goPanicExtendSliceAlenU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsSliceAlen})
+}
+func goPanicExtendSliceAcap(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsSliceAcap})
+}
+func goPanicExtendSliceAcapU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsSliceAcap})
+}
+
+// failures in the comparisons for s[x:y], 0 <= x <= y
+func goPanicExtendSliceB(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsSliceB})
+}
+func goPanicExtendSliceBU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsSliceB})
+}
+
+// failures in the comparisons for s[::x], 0 <= x <= y (y == len(s) or cap(s))
+func goPanicExtendSlice3Alen(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsSlice3Alen})
+}
+func goPanicExtendSlice3AlenU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsSlice3Alen})
+}
+func goPanicExtendSlice3Acap(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsSlice3Acap})
+}
+func goPanicExtendSlice3AcapU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsSlice3Acap})
+}
+
+// failures in the comparisons for s[:x:y], 0 <= x <= y
+func goPanicExtendSlice3B(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsSlice3B})
+}
+func goPanicExtendSlice3BU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsSlice3B})
+}
+
+// failures in the comparisons for s[x:y:], 0 <= x <= y
+func goPanicExtendSlice3C(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsSlice3C})
+}
+func goPanicExtendSlice3CU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsSlice3C})
+}
+
+// Implemented in assembly, as they take arguments in registers.
+// Declared here to mark them as ABIInternal.
+func panicExtendIndex(hi int, lo uint, y int)
+func panicExtendIndexU(hi uint, lo uint, y int)
+func panicExtendSliceAlen(hi int, lo uint, y int)
+func panicExtendSliceAlenU(hi uint, lo uint, y int)
+func panicExtendSliceAcap(hi int, lo uint, y int)
+func panicExtendSliceAcapU(hi uint, lo uint, y int)
+func panicExtendSliceB(hi int, lo uint, y int)
+func panicExtendSliceBU(hi uint, lo uint, y int)
+func panicExtendSlice3Alen(hi int, lo uint, y int)
+func panicExtendSlice3AlenU(hi uint, lo uint, y int)
+func panicExtendSlice3Acap(hi int, lo uint, y int)
+func panicExtendSlice3AcapU(hi uint, lo uint, y int)
+func panicExtendSlice3B(hi int, lo uint, y int)
+func panicExtendSlice3BU(hi uint, lo uint, y int)
+func panicExtendSlice3C(hi int, lo uint, y int)
+func panicExtendSlice3CU(hi uint, lo uint, y int)
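
These entry points are reachable only on the 32-bit targets named in the build tag above: the compiler passes a 64-bit index as separate hi/lo words and calls an Extend variant when the high word is nonzero. A hedged sketch of user code that would take this path on 386, arm, mips or mipsle:

package main

import "fmt"

func main() {
	s := make([]byte, 8)
	var i int64 = 1 << 40 // high 32 bits are nonzero

	defer func() {
		// On a 32-bit platform this panic goes through
		// goPanicExtendIndex; on 64-bit it uses the ordinary path.
		fmt.Println("recovered:", recover())
	}()
	_ = s[i] // index out of range
}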
diff --git a/src/runtime/panic_test.go b/src/runtime/panic_test.go
new file mode 100644
index 0000000..b8a300f
--- /dev/null
+++ b/src/runtime/panic_test.go
@@ -0,0 +1,48 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "strings"
+ "testing"
+)
+
+// Test that panics print out the underlying value
+// when the underlying kind is directly printable.
+// Issue: https://golang.org/issues/37531
+func TestPanicWithDirectlyPrintableCustomTypes(t *testing.T) {
+ tests := []struct {
+ name string
+ wantPanicPrefix string
+ }{
+ {"panicCustomBool", `panic: main.MyBool(true)`},
+ {"panicCustomComplex128", `panic: main.MyComplex128(+3.210000e+001+1.000000e+001i)`},
+ {"panicCustomComplex64", `panic: main.MyComplex64(+1.100000e-001+3.000000e+000i)`},
+ {"panicCustomFloat32", `panic: main.MyFloat32(-9.370000e+001)`},
+ {"panicCustomFloat64", `panic: main.MyFloat64(-9.370000e+001)`},
+ {"panicCustomInt", `panic: main.MyInt(93)`},
+ {"panicCustomInt8", `panic: main.MyInt8(93)`},
+ {"panicCustomInt16", `panic: main.MyInt16(93)`},
+ {"panicCustomInt32", `panic: main.MyInt32(93)`},
+ {"panicCustomInt64", `panic: main.MyInt64(93)`},
+ {"panicCustomString", `panic: main.MyString("Panic")`},
+ {"panicCustomUint", `panic: main.MyUint(93)`},
+ {"panicCustomUint8", `panic: main.MyUint8(93)`},
+ {"panicCustomUint16", `panic: main.MyUint16(93)`},
+ {"panicCustomUint32", `panic: main.MyUint32(93)`},
+ {"panicCustomUint64", `panic: main.MyUint64(93)`},
+ {"panicCustomUintptr", `panic: main.MyUintptr(93)`},
+ }
+
+ for _, tt := range tests {
+ t := t
+ t.Run(tt.name, func(t *testing.T) {
+ output := runTestProg(t, "testprog", tt.name)
+ if !strings.HasPrefix(output, tt.wantPanicPrefix) {
+ t.Fatalf("%q\nis not present in\n%s", tt.wantPanicPrefix, output)
+ }
+ })
+ }
+}
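
The test drives entry points such as panicCustomBool in the separate testprog helper binary, which is not part of this diff. A hypothetical sketch of what one of those entries presumably looks like; the MyBool type matches the expected "main.MyBool(true)" output, while the register helper is an assumption about the testprog harness:

package main

type MyBool bool

func init() {
	register("panicCustomBool", panicCustomBool) // register is assumed to be the testprog registry
}

func panicCustomBool() {
	panic(MyBool(true)) // expected to print: panic: main.MyBool(true)
}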
diff --git a/src/runtime/plugin.go b/src/runtime/plugin.go
new file mode 100644
index 0000000..5e05be7
--- /dev/null
+++ b/src/runtime/plugin.go
@@ -0,0 +1,136 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+//go:linkname plugin_lastmoduleinit plugin.lastmoduleinit
+func plugin_lastmoduleinit() (path string, syms map[string]interface{}, errstr string) {
+ var md *moduledata
+ for pmd := firstmoduledata.next; pmd != nil; pmd = pmd.next {
+ if pmd.bad {
+ md = nil // we only want the last module
+ continue
+ }
+ md = pmd
+ }
+ if md == nil {
+ throw("runtime: no plugin module data")
+ }
+ if md.pluginpath == "" {
+ throw("runtime: plugin has empty pluginpath")
+ }
+ if md.typemap != nil {
+ return "", nil, "plugin already loaded"
+ }
+
+ for _, pmd := range activeModules() {
+ if pmd.pluginpath == md.pluginpath {
+ md.bad = true
+ return "", nil, "plugin already loaded"
+ }
+
+ if inRange(pmd.text, pmd.etext, md.text, md.etext) ||
+ inRange(pmd.bss, pmd.ebss, md.bss, md.ebss) ||
+ inRange(pmd.data, pmd.edata, md.data, md.edata) ||
+ inRange(pmd.types, pmd.etypes, md.types, md.etypes) {
+ println("plugin: new module data overlaps with previous moduledata")
+ println("\tpmd.text-etext=", hex(pmd.text), "-", hex(pmd.etext))
+ println("\tpmd.bss-ebss=", hex(pmd.bss), "-", hex(pmd.ebss))
+ println("\tpmd.data-edata=", hex(pmd.data), "-", hex(pmd.edata))
+ println("\tpmd.types-etypes=", hex(pmd.types), "-", hex(pmd.etypes))
+ println("\tmd.text-etext=", hex(md.text), "-", hex(md.etext))
+ println("\tmd.bss-ebss=", hex(md.bss), "-", hex(md.ebss))
+ println("\tmd.data-edata=", hex(md.data), "-", hex(md.edata))
+ println("\tmd.types-etypes=", hex(md.types), "-", hex(md.etypes))
+ throw("plugin: new module data overlaps with previous moduledata")
+ }
+ }
+ for _, pkghash := range md.pkghashes {
+ if pkghash.linktimehash != *pkghash.runtimehash {
+ md.bad = true
+ return "", nil, "plugin was built with a different version of package " + pkghash.modulename
+ }
+ }
+
+ // Initialize the freshly loaded module.
+ modulesinit()
+ typelinksinit()
+
+ pluginftabverify(md)
+ moduledataverify1(md)
+
+ lock(&itabLock)
+ for _, i := range md.itablinks {
+ itabAdd(i)
+ }
+ unlock(&itabLock)
+
+ // Build a map of symbol names to symbols. Here in the runtime
+ // we fill out the first word of the interface, the type. We
+ // pass these zero value interfaces to the plugin package,
+ // where the symbol value is filled in (usually via cgo).
+ //
+ // Because functions are handled specially in the plugin package,
+ // function symbol names are prefixed here with '.' to avoid
+ // a dependency on the reflect package.
+ syms = make(map[string]interface{}, len(md.ptab))
+ for _, ptab := range md.ptab {
+ symName := resolveNameOff(unsafe.Pointer(md.types), ptab.name)
+ t := (*_type)(unsafe.Pointer(md.types)).typeOff(ptab.typ)
+ var val interface{}
+ valp := (*[2]unsafe.Pointer)(unsafe.Pointer(&val))
+ (*valp)[0] = unsafe.Pointer(t)
+
+ name := symName.name()
+ if t.kind&kindMask == kindFunc {
+ name = "." + name
+ }
+ syms[name] = val
+ }
+ return md.pluginpath, syms, ""
+}
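
plugin_lastmoduleinit is the runtime half of plugin loading; the exported half lives in the plugin package, which fills in the symbol values of the zero-valued interfaces built above. For orientation, typical use of that public API looks roughly like this (the file name and symbol name are illustrative):

package main

import (
	"fmt"
	"plugin"
)

func main() {
	p, err := plugin.Open("greeter.so") // triggers plugin_lastmoduleinit for the new module
	if err != nil {
		panic(err)
	}
	sym, err := p.Lookup("Hello") // an exported symbol from the plugin's main package
	if err != nil {
		panic(err)
	}
	hello := sym.(func() string) // function symbols assert to their concrete type
	fmt.Println(hello())
}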
+
+func pluginftabverify(md *moduledata) {
+ badtable := false
+ for i := 0; i < len(md.ftab); i++ {
+ entry := md.ftab[i].entry
+ if md.minpc <= entry && entry <= md.maxpc {
+ continue
+ }
+
+ f := funcInfo{(*_func)(unsafe.Pointer(&md.pclntable[md.ftab[i].funcoff])), md}
+ name := funcname(f)
+
+	// A common bug is that f.entry has a relocation to a duplicate
+	// function symbol, meaning that if we search for its PC we get
+ // a valid entry with a name that is useful for debugging.
+ name2 := "none"
+ entry2 := uintptr(0)
+ f2 := findfunc(entry)
+ if f2.valid() {
+ name2 = funcname(f2)
+ entry2 = f2.entry
+ }
+ badtable = true
+ println("ftab entry outside pc range: ", hex(entry), "/", hex(entry2), ": ", name, "/", name2)
+ }
+ if badtable {
+ throw("runtime: plugin has bad symbol table")
+ }
+}
+
+// inRange reports whether v0 or v1 are in the range [r0, r1].
+func inRange(r0, r1, v0, v1 uintptr) bool {
+ return (v0 >= r0 && v0 <= r1) || (v1 >= r0 && v1 <= r1)
+}
+
+// A ptabEntry is generated by the compiler for each exported function
+// and global variable in the main package of a plugin. It is used to
+// initialize the plugin module's symbol map.
+type ptabEntry struct {
+ name nameOff
+ typ typeOff
+}
diff --git a/src/runtime/pprof/elf.go b/src/runtime/pprof/elf.go
new file mode 100644
index 0000000..a8b5ea6
--- /dev/null
+++ b/src/runtime/pprof/elf.go
@@ -0,0 +1,109 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "encoding/binary"
+ "errors"
+ "fmt"
+ "os"
+)
+
+var (
+ errBadELF = errors.New("malformed ELF binary")
+ errNoBuildID = errors.New("no NT_GNU_BUILD_ID found in ELF binary")
+)
+
+// elfBuildID returns the GNU build ID of the named ELF binary,
+// without introducing a dependency on debug/elf and its dependencies.
+func elfBuildID(file string) (string, error) {
+ buf := make([]byte, 256)
+ f, err := os.Open(file)
+ if err != nil {
+ return "", err
+ }
+ defer f.Close()
+
+ if _, err := f.ReadAt(buf[:64], 0); err != nil {
+ return "", err
+ }
+
+ // ELF file begins with \x7F E L F.
+ if buf[0] != 0x7F || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F' {
+ return "", errBadELF
+ }
+
+ var byteOrder binary.ByteOrder
+ switch buf[5] {
+ default:
+ return "", errBadELF
+ case 1: // little-endian
+ byteOrder = binary.LittleEndian
+ case 2: // big-endian
+ byteOrder = binary.BigEndian
+ }
+
+ var shnum int
+ var shoff, shentsize int64
+ switch buf[4] {
+ default:
+ return "", errBadELF
+ case 1: // 32-bit file header
+ shoff = int64(byteOrder.Uint32(buf[32:]))
+ shentsize = int64(byteOrder.Uint16(buf[46:]))
+ if shentsize != 40 {
+ return "", errBadELF
+ }
+ shnum = int(byteOrder.Uint16(buf[48:]))
+ case 2: // 64-bit file header
+ shoff = int64(byteOrder.Uint64(buf[40:]))
+ shentsize = int64(byteOrder.Uint16(buf[58:]))
+ if shentsize != 64 {
+ return "", errBadELF
+ }
+ shnum = int(byteOrder.Uint16(buf[60:]))
+ }
+
+ for i := 0; i < shnum; i++ {
+ if _, err := f.ReadAt(buf[:shentsize], shoff+int64(i)*shentsize); err != nil {
+ return "", err
+ }
+ if typ := byteOrder.Uint32(buf[4:]); typ != 7 { // SHT_NOTE
+ continue
+ }
+ var off, size int64
+ if shentsize == 40 {
+ // 32-bit section header
+ off = int64(byteOrder.Uint32(buf[16:]))
+ size = int64(byteOrder.Uint32(buf[20:]))
+ } else {
+ // 64-bit section header
+ off = int64(byteOrder.Uint64(buf[24:]))
+ size = int64(byteOrder.Uint64(buf[32:]))
+ }
+ size += off
+ for off < size {
+ if _, err := f.ReadAt(buf[:16], off); err != nil { // room for header + name GNU\x00
+ return "", err
+ }
+ nameSize := int(byteOrder.Uint32(buf[0:]))
+ descSize := int(byteOrder.Uint32(buf[4:]))
+ noteType := int(byteOrder.Uint32(buf[8:]))
+ descOff := off + int64(12+(nameSize+3)&^3)
+ off = descOff + int64((descSize+3)&^3)
+ if nameSize != 4 || noteType != 3 || buf[12] != 'G' || buf[13] != 'N' || buf[14] != 'U' || buf[15] != '\x00' { // want name GNU\x00 type 3 (NT_GNU_BUILD_ID)
+ continue
+ }
+ if descSize > len(buf) {
+ return "", errBadELF
+ }
+ if _, err := f.ReadAt(buf[:descSize], descOff); err != nil {
+ return "", err
+ }
+ return fmt.Sprintf("%x", buf[:descSize]), nil
+ }
+ }
+ return "", errNoBuildID
+}
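
elfBuildID walks the section headers by hand, looking for an SHT_NOTE section that carries an NT_GNU_BUILD_ID note, which is how the package avoids a dependency on debug/elf. A hedged sketch of how a caller inside this package might use it (elfBuildID is unexported, and the path is illustrative and Linux-specific):

//	id, err := elfBuildID("/proc/self/exe")
//	switch {
//	case err == errNoBuildID:
//		// the binary carries no NT_GNU_BUILD_ID note
//	case err != nil:
//		// malformed ELF or an I/O error
//	default:
//		// id is the hex-encoded build ID, e.g. "9f8a..." (value illustrative)
//	}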
diff --git a/src/runtime/pprof/label.go b/src/runtime/pprof/label.go
new file mode 100644
index 0000000..b614f12
--- /dev/null
+++ b/src/runtime/pprof/label.go
@@ -0,0 +1,108 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "context"
+ "fmt"
+ "sort"
+ "strings"
+)
+
+type label struct {
+ key string
+ value string
+}
+
+// LabelSet is a set of labels.
+type LabelSet struct {
+ list []label
+}
+
+// labelContextKey is the type of contextKeys used for profiler labels.
+type labelContextKey struct{}
+
+func labelValue(ctx context.Context) labelMap {
+ labels, _ := ctx.Value(labelContextKey{}).(*labelMap)
+ if labels == nil {
+ return labelMap(nil)
+ }
+ return *labels
+}
+
+// labelMap is the representation of the label set held in the context type.
+// This is an initial implementation, but it will be replaced with something
+// that admits incremental immutable modification more efficiently.
+type labelMap map[string]string
+
+// String satisfies Stringer and returns key, value pairs in a consistent
+// order.
+func (l *labelMap) String() string {
+ if l == nil {
+ return ""
+ }
+ keyVals := make([]string, 0, len(*l))
+
+ for k, v := range *l {
+ keyVals = append(keyVals, fmt.Sprintf("%q:%q", k, v))
+ }
+
+ sort.Strings(keyVals)
+
+ return "{" + strings.Join(keyVals, ", ") + "}"
+}
+
+// WithLabels returns a new context.Context with the given labels added.
+// A label overwrites a prior label with the same key.
+func WithLabels(ctx context.Context, labels LabelSet) context.Context {
+ childLabels := make(labelMap)
+ parentLabels := labelValue(ctx)
+ // TODO(matloob): replace the map implementation with something
+ // more efficient so creating a child context WithLabels doesn't need
+ // to clone the map.
+ for k, v := range parentLabels {
+ childLabels[k] = v
+ }
+ for _, label := range labels.list {
+ childLabels[label.key] = label.value
+ }
+ return context.WithValue(ctx, labelContextKey{}, &childLabels)
+}
+
+// Labels takes an even number of strings representing key-value pairs
+// and makes a LabelSet containing them.
+// A label overwrites a prior label with the same key.
+// Currently only the CPU and goroutine profiles utilize any labels
+// information.
+// See https://golang.org/issue/23458 for details.
+func Labels(args ...string) LabelSet {
+ if len(args)%2 != 0 {
+ panic("uneven number of arguments to pprof.Labels")
+ }
+ list := make([]label, 0, len(args)/2)
+ for i := 0; i+1 < len(args); i += 2 {
+ list = append(list, label{key: args[i], value: args[i+1]})
+ }
+ return LabelSet{list: list}
+}
+
+// Label returns the value of the label with the given key on ctx, and a boolean indicating
+// whether that label exists.
+func Label(ctx context.Context, key string) (string, bool) {
+ ctxLabels := labelValue(ctx)
+ v, ok := ctxLabels[key]
+ return v, ok
+}
+
+// ForLabels invokes f with each label set on the context.
+// The function f should return true to continue iteration or false to stop iteration early.
+func ForLabels(ctx context.Context, f func(key, value string) bool) {
+ ctxLabels := labelValue(ctx)
+ for k, v := range ctxLabels {
+ if !f(k, v) {
+ break
+ }
+ }
+}
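
The functions above form the public labeling API: Labels builds a LabelSet, WithLabels attaches it to a context, and Label and ForLabels read it back. A small usage sketch with illustrative keys and values; attaching the labels to a profiled goroutine is done by pprof.Do, defined elsewhere in the package:

package main

import (
	"context"
	"fmt"
	"runtime/pprof"
)

func main() {
	ctx := pprof.WithLabels(context.Background(),
		pprof.Labels("worker", "42", "request", "index"))

	if v, ok := pprof.Label(ctx, "worker"); ok {
		fmt.Println("worker =", v)
	}
	pprof.ForLabels(ctx, func(k, v string) bool {
		fmt.Println(k, "=", v)
		return true // keep iterating
	})
}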
diff --git a/src/runtime/pprof/label_test.go b/src/runtime/pprof/label_test.go
new file mode 100644
index 0000000..fcb00bd
--- /dev/null
+++ b/src/runtime/pprof/label_test.go
@@ -0,0 +1,114 @@
+package pprof
+
+import (
+ "context"
+ "reflect"
+ "sort"
+ "testing"
+)
+
+func labelsSorted(ctx context.Context) []label {
+ ls := []label{}
+ ForLabels(ctx, func(key, value string) bool {
+ ls = append(ls, label{key, value})
+ return true
+ })
+ sort.Sort(labelSorter(ls))
+ return ls
+}
+
+type labelSorter []label
+
+func (s labelSorter) Len() int { return len(s) }
+func (s labelSorter) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
+func (s labelSorter) Less(i, j int) bool { return s[i].key < s[j].key }
+
+func TestContextLabels(t *testing.T) {
+ // Background context starts with no labels.
+ ctx := context.Background()
+ labels := labelsSorted(ctx)
+ if len(labels) != 0 {
+ t.Errorf("labels on background context: want [], got %v ", labels)
+ }
+
+ // Add a single label.
+ ctx = WithLabels(ctx, Labels("key", "value"))
+ // Retrieve it with Label.
+ v, ok := Label(ctx, "key")
+ if !ok || v != "value" {
+ t.Errorf(`Label(ctx, "key"): got %v, %v; want "value", ok`, v, ok)
+ }
+ gotLabels := labelsSorted(ctx)
+ wantLabels := []label{{"key", "value"}}
+ if !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("(sorted) labels on context: got %v, want %v", gotLabels, wantLabels)
+ }
+
+ // Add a label with a different key.
+ ctx = WithLabels(ctx, Labels("key2", "value2"))
+ v, ok = Label(ctx, "key2")
+ if !ok || v != "value2" {
+ t.Errorf(`Label(ctx, "key2"): got %v, %v; want "value2", ok`, v, ok)
+ }
+ gotLabels = labelsSorted(ctx)
+ wantLabels = []label{{"key", "value"}, {"key2", "value2"}}
+ if !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("(sorted) labels on context: got %v, want %v", gotLabels, wantLabels)
+ }
+
+ // Add label with first key to test label replacement.
+ ctx = WithLabels(ctx, Labels("key", "value3"))
+ v, ok = Label(ctx, "key")
+ if !ok || v != "value3" {
+ t.Errorf(`Label(ctx, "key3"): got %v, %v; want "value3", ok`, v, ok)
+ }
+ gotLabels = labelsSorted(ctx)
+ wantLabels = []label{{"key", "value3"}, {"key2", "value2"}}
+ if !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("(sorted) labels on context: got %v, want %v", gotLabels, wantLabels)
+ }
+
+ // Labels called with two labels with the same key should pick the second.
+ ctx = WithLabels(ctx, Labels("key4", "value4a", "key4", "value4b"))
+ v, ok = Label(ctx, "key4")
+ if !ok || v != "value4b" {
+ t.Errorf(`Label(ctx, "key4"): got %v, %v; want "value4b", ok`, v, ok)
+ }
+ gotLabels = labelsSorted(ctx)
+ wantLabels = []label{{"key", "value3"}, {"key2", "value2"}, {"key4", "value4b"}}
+ if !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("(sorted) labels on context: got %v, want %v", gotLabels, wantLabels)
+ }
+}
+
+func TestLabelMapStringer(t *testing.T) {
+ for _, tbl := range []struct {
+ m labelMap
+ expected string
+ }{
+ {
+ m: labelMap{
+ // empty map
+ },
+ expected: "{}",
+ }, {
+ m: labelMap{
+ "foo": "bar",
+ },
+ expected: `{"foo":"bar"}`,
+ }, {
+ m: labelMap{
+ "foo": "bar",
+ "key1": "value1",
+ "key2": "value2",
+ "key3": "value3",
+ "key4WithNewline": "\nvalue4",
+ },
+ expected: `{"foo":"bar", "key1":"value1", "key2":"value2", "key3":"value3", "key4WithNewline":"\nvalue4"}`,
+ },
+ } {
+ if got := tbl.m.String(); tbl.expected != got {
+ t.Errorf("%#v.String() = %q; want %q", tbl.m, got, tbl.expected)
+ }
+ }
+}
diff --git a/src/runtime/pprof/map.go b/src/runtime/pprof/map.go
new file mode 100644
index 0000000..7c75872
--- /dev/null
+++ b/src/runtime/pprof/map.go
@@ -0,0 +1,90 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import "unsafe"
+
+// A profMap is a map from (stack, tag) to mapEntry.
+// It grows without bound, but that's assumed to be OK.
+type profMap struct {
+ hash map[uintptr]*profMapEntry
+ all *profMapEntry
+ last *profMapEntry
+ free []profMapEntry
+ freeStk []uintptr
+}
+
+// A profMapEntry is a single entry in the profMap.
+type profMapEntry struct {
+ nextHash *profMapEntry // next in hash list
+ nextAll *profMapEntry // next in list of all entries
+ stk []uintptr
+ tag unsafe.Pointer
+ count int64
+}
+
+func (m *profMap) lookup(stk []uint64, tag unsafe.Pointer) *profMapEntry {
+ // Compute hash of (stk, tag).
+ h := uintptr(0)
+ for _, x := range stk {
+ h = h<<8 | (h >> (8 * (unsafe.Sizeof(h) - 1)))
+ h += uintptr(x) * 41
+ }
+ h = h<<8 | (h >> (8 * (unsafe.Sizeof(h) - 1)))
+ h += uintptr(tag) * 41
+
+ // Find entry if present.
+ var last *profMapEntry
+Search:
+ for e := m.hash[h]; e != nil; last, e = e, e.nextHash {
+ if len(e.stk) != len(stk) || e.tag != tag {
+ continue
+ }
+ for j := range stk {
+ if e.stk[j] != uintptr(stk[j]) {
+ continue Search
+ }
+ }
+ // Move to front.
+ if last != nil {
+ last.nextHash = e.nextHash
+ e.nextHash = m.hash[h]
+ m.hash[h] = e
+ }
+ return e
+ }
+
+ // Add new entry.
+ if len(m.free) < 1 {
+ m.free = make([]profMapEntry, 128)
+ }
+ e := &m.free[0]
+ m.free = m.free[1:]
+ e.nextHash = m.hash[h]
+ e.tag = tag
+
+ if len(m.freeStk) < len(stk) {
+ m.freeStk = make([]uintptr, 1024)
+ }
+ // Limit cap to prevent append from clobbering freeStk.
+ e.stk = m.freeStk[:len(stk):len(stk)]
+ m.freeStk = m.freeStk[len(stk):]
+
+ for j := range stk {
+ e.stk[j] = uintptr(stk[j])
+ }
+ if m.hash == nil {
+ m.hash = make(map[uintptr]*profMapEntry)
+ }
+ m.hash[h] = e
+ if m.all == nil {
+ m.all = e
+ m.last = e
+ } else {
+ m.last.nextAll = e
+ m.last = e
+ }
+ return e
+}
diff --git a/src/runtime/pprof/mprof_test.go b/src/runtime/pprof/mprof_test.go
new file mode 100644
index 0000000..c11a45f
--- /dev/null
+++ b/src/runtime/pprof/mprof_test.go
@@ -0,0 +1,176 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !js
+
+package pprof
+
+import (
+ "bytes"
+ "fmt"
+ "internal/profile"
+ "reflect"
+ "regexp"
+ "runtime"
+ "testing"
+ "unsafe"
+)
+
+var memSink interface{}
+
+func allocateTransient1M() {
+ for i := 0; i < 1024; i++ {
+ memSink = &struct{ x [1024]byte }{}
+ }
+}
+
+//go:noinline
+func allocateTransient2M() {
+ memSink = make([]byte, 2<<20)
+}
+
+func allocateTransient2MInline() {
+ memSink = make([]byte, 2<<20)
+}
+
+type Obj32 struct {
+ link *Obj32
+ pad [32 - unsafe.Sizeof(uintptr(0))]byte
+}
+
+var persistentMemSink *Obj32
+
+func allocatePersistent1K() {
+ for i := 0; i < 32; i++ {
+ // Can't use slice because that will introduce implicit allocations.
+ obj := &Obj32{link: persistentMemSink}
+ persistentMemSink = obj
+ }
+}
+
+// Allocate transient memory using reflect.Call.
+
+func allocateReflectTransient() {
+ memSink = make([]byte, 2<<20)
+}
+
+func allocateReflect() {
+ rv := reflect.ValueOf(allocateReflectTransient)
+ rv.Call(nil)
+}
+
+var memoryProfilerRun = 0
+
+func TestMemoryProfiler(t *testing.T) {
+ // Disable sampling, otherwise it's difficult to assert anything.
+ oldRate := runtime.MemProfileRate
+ runtime.MemProfileRate = 1
+ defer func() {
+ runtime.MemProfileRate = oldRate
+ }()
+
+ // Allocate a meg to ensure that mcache.nextSample is updated to 1.
+ for i := 0; i < 1024; i++ {
+ memSink = make([]byte, 1024)
+ }
+
+ // Do the interesting allocations.
+ allocateTransient1M()
+ allocateTransient2M()
+ allocateTransient2MInline()
+ allocatePersistent1K()
+ allocateReflect()
+ memSink = nil
+
+ runtime.GC() // materialize stats
+
+ memoryProfilerRun++
+
+ tests := []struct {
+ stk []string
+ legacy string
+ }{{
+ stk: []string{"runtime/pprof.allocatePersistent1K", "runtime/pprof.TestMemoryProfiler"},
+ legacy: fmt.Sprintf(`%v: %v \[%v: %v\] @ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+
+# 0x[0-9,a-f]+ runtime/pprof\.allocatePersistent1K\+0x[0-9,a-f]+ .*/runtime/pprof/mprof_test\.go:47
+# 0x[0-9,a-f]+ runtime/pprof\.TestMemoryProfiler\+0x[0-9,a-f]+ .*/runtime/pprof/mprof_test\.go:82
+`, 32*memoryProfilerRun, 1024*memoryProfilerRun, 32*memoryProfilerRun, 1024*memoryProfilerRun),
+ }, {
+ stk: []string{"runtime/pprof.allocateTransient1M", "runtime/pprof.TestMemoryProfiler"},
+ legacy: fmt.Sprintf(`0: 0 \[%v: %v\] @ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+
+# 0x[0-9,a-f]+ runtime/pprof\.allocateTransient1M\+0x[0-9,a-f]+ .*/runtime/pprof/mprof_test.go:24
+# 0x[0-9,a-f]+ runtime/pprof\.TestMemoryProfiler\+0x[0-9,a-f]+ .*/runtime/pprof/mprof_test.go:79
+`, (1<<10)*memoryProfilerRun, (1<<20)*memoryProfilerRun),
+ }, {
+ stk: []string{"runtime/pprof.allocateTransient2M", "runtime/pprof.TestMemoryProfiler"},
+ legacy: fmt.Sprintf(`0: 0 \[%v: %v\] @ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+
+# 0x[0-9,a-f]+ runtime/pprof\.allocateTransient2M\+0x[0-9,a-f]+ .*/runtime/pprof/mprof_test.go:30
+# 0x[0-9,a-f]+ runtime/pprof\.TestMemoryProfiler\+0x[0-9,a-f]+ .*/runtime/pprof/mprof_test.go:80
+`, memoryProfilerRun, (2<<20)*memoryProfilerRun),
+ }, {
+ stk: []string{"runtime/pprof.allocateTransient2MInline", "runtime/pprof.TestMemoryProfiler"},
+ legacy: fmt.Sprintf(`0: 0 \[%v: %v\] @ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+
+# 0x[0-9,a-f]+ runtime/pprof\.allocateTransient2MInline\+0x[0-9,a-f]+ .*/runtime/pprof/mprof_test.go:34
+# 0x[0-9,a-f]+ runtime/pprof\.TestMemoryProfiler\+0x[0-9,a-f]+ .*/runtime/pprof/mprof_test.go:81
+`, memoryProfilerRun, (2<<20)*memoryProfilerRun),
+ }, {
+ stk: []string{"runtime/pprof.allocateReflectTransient"},
+ legacy: fmt.Sprintf(`0: 0 \[%v: %v\] @( 0x[0-9,a-f]+)+
+# 0x[0-9,a-f]+ runtime/pprof\.allocateReflectTransient\+0x[0-9,a-f]+ .*/runtime/pprof/mprof_test.go:55
+`, memoryProfilerRun, (2<<20)*memoryProfilerRun),
+ }}
+
+ t.Run("debug=1", func(t *testing.T) {
+ var buf bytes.Buffer
+ if err := Lookup("heap").WriteTo(&buf, 1); err != nil {
+ t.Fatalf("failed to write heap profile: %v", err)
+ }
+
+ for _, test := range tests {
+ if !regexp.MustCompile(test.legacy).Match(buf.Bytes()) {
+ t.Fatalf("The entry did not match:\n%v\n\nProfile:\n%v\n", test.legacy, buf.String())
+ }
+ }
+ })
+
+ t.Run("proto", func(t *testing.T) {
+ var buf bytes.Buffer
+ if err := Lookup("heap").WriteTo(&buf, 0); err != nil {
+ t.Fatalf("failed to write heap profile: %v", err)
+ }
+ p, err := profile.Parse(&buf)
+ if err != nil {
+ t.Fatalf("failed to parse heap profile: %v", err)
+ }
+ t.Logf("Profile = %v", p)
+
+ stks := stacks(p)
+ for _, test := range tests {
+ if !containsStack(stks, test.stk) {
+ t.Fatalf("No matching stack entry for %q\n\nProfile:\n%v\n", test.stk, p)
+ }
+ }
+
+ if !containsInlinedCall(TestMemoryProfiler, 4<<10) {
+ t.Logf("Can't determine whether allocateTransient2MInline was inlined into TestMemoryProfiler.")
+ return
+ }
+
+ // Check the inlined function location is encoded correctly.
+ for _, loc := range p.Location {
+ inlinedCaller, inlinedCallee := false, false
+ for _, line := range loc.Line {
+ if line.Function.Name == "runtime/pprof.allocateTransient2MInline" {
+ inlinedCallee = true
+ }
+ if inlinedCallee && line.Function.Name == "runtime/pprof.TestMemoryProfiler" {
+ inlinedCaller = true
+ }
+ }
+ if inlinedCallee != inlinedCaller {
+ t.Errorf("want allocateTransient2MInline after TestMemoryProfiler in one location, got separate location entries:\n%v", loc)
+ }
+ }
+ })
+}
diff --git a/src/runtime/pprof/pprof.go b/src/runtime/pprof/pprof.go
new file mode 100644
index 0000000..d3b7df3
--- /dev/null
+++ b/src/runtime/pprof/pprof.go
@@ -0,0 +1,945 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package pprof writes runtime profiling data in the format expected
+// by the pprof visualization tool.
+//
+// Profiling a Go program
+//
+// The first step to profiling a Go program is to enable profiling.
+// Support for profiling benchmarks built with the standard testing
+// package is built into go test. For example, the following command
+// runs benchmarks in the current directory and writes the CPU and
+// memory profiles to cpu.prof and mem.prof:
+//
+// go test -cpuprofile cpu.prof -memprofile mem.prof -bench .
+//
+// To add equivalent profiling support to a standalone program, add
+// code like the following to your main function:
+//
+// var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to `file`")
+// var memprofile = flag.String("memprofile", "", "write memory profile to `file`")
+//
+// func main() {
+// flag.Parse()
+// if *cpuprofile != "" {
+// f, err := os.Create(*cpuprofile)
+// if err != nil {
+// log.Fatal("could not create CPU profile: ", err)
+// }
+// defer f.Close() // error handling omitted for example
+// if err := pprof.StartCPUProfile(f); err != nil {
+// log.Fatal("could not start CPU profile: ", err)
+// }
+// defer pprof.StopCPUProfile()
+// }
+//
+// // ... rest of the program ...
+//
+// if *memprofile != "" {
+// f, err := os.Create(*memprofile)
+// if err != nil {
+// log.Fatal("could not create memory profile: ", err)
+// }
+// defer f.Close() // error handling omitted for example
+// runtime.GC() // get up-to-date statistics
+// if err := pprof.WriteHeapProfile(f); err != nil {
+// log.Fatal("could not write memory profile: ", err)
+// }
+// }
+// }
+//
+// There is also a standard HTTP interface to profiling data. Adding
+// the following line will install handlers under the /debug/pprof/
+// URL to download live profiles:
+//
+// import _ "net/http/pprof"
+//
+// See the net/http/pprof package for more details.
+//
+// Profiles can then be visualized with the pprof tool:
+//
+// go tool pprof cpu.prof
+//
+// There are many commands available from the pprof command line.
+// Commonly used commands include "top", which prints a summary of the
+// top program hot-spots, and "web", which opens an interactive graph
+// of hot-spots and their call graphs. Use "help" for information on
+// all pprof commands.
+//
+// For more information about pprof, see
+// https://github.com/google/pprof/blob/master/doc/README.md.
+package pprof
+
+import (
+ "bufio"
+ "bytes"
+ "fmt"
+ "io"
+ "runtime"
+ "sort"
+ "strings"
+ "sync"
+ "text/tabwriter"
+ "time"
+ "unsafe"
+)
+
+// BUG(rsc): Profiles are only as good as the kernel support used to generate them.
+// See https://golang.org/issue/13841 for details about known problems.
+
+// A Profile is a collection of stack traces showing the call sequences
+// that led to instances of a particular event, such as allocation.
+// Packages can create and maintain their own profiles; the most common
+// use is for tracking resources that must be explicitly closed, such as files
+// or network connections.
+//
+// A Profile's methods can be called from multiple goroutines simultaneously.
+//
+// Each Profile has a unique name. A few profiles are predefined:
+//
+// goroutine - stack traces of all current goroutines
+// heap - a sampling of memory allocations of live objects
+// allocs - a sampling of all past memory allocations
+// threadcreate - stack traces that led to the creation of new OS threads
+// block - stack traces that led to blocking on synchronization primitives
+// mutex - stack traces of holders of contended mutexes
+//
+// These predefined profiles maintain themselves and panic on an explicit
+// Add or Remove method call.
+//
+// The heap profile reports statistics as of the most recently completed
+// garbage collection; it elides more recent allocation to avoid skewing
+// the profile away from live data and toward garbage.
+// If there has been no garbage collection at all, the heap profile reports
+// all known allocations. This exception helps mainly in programs running
+// without garbage collection enabled, usually for debugging purposes.
+//
+// The heap profile tracks both the allocation sites for all live objects in
+// the application memory and for all objects allocated since the program start.
+// Pprof's -inuse_space, -inuse_objects, -alloc_space, and -alloc_objects
+// flags select which to display, defaulting to -inuse_space (live objects,
+// scaled by size).
+//
+// The allocs profile is the same as the heap profile but changes the default
+// pprof display to -alloc_space, the total number of bytes allocated since
+// the program began (including garbage-collected bytes).
+//
+// The CPU profile is not available as a Profile. It has a special API,
+// the StartCPUProfile and StopCPUProfile functions, because it streams
+// output to a writer during profiling.
+//
+type Profile struct {
+ name string
+ mu sync.Mutex
+ m map[interface{}][]uintptr
+ count func() int
+ write func(io.Writer, int) error
+}
+
+// profiles records all registered profiles.
+var profiles struct {
+ mu sync.Mutex
+ m map[string]*Profile
+}
+
+var goroutineProfile = &Profile{
+ name: "goroutine",
+ count: countGoroutine,
+ write: writeGoroutine,
+}
+
+var threadcreateProfile = &Profile{
+ name: "threadcreate",
+ count: countThreadCreate,
+ write: writeThreadCreate,
+}
+
+var heapProfile = &Profile{
+ name: "heap",
+ count: countHeap,
+ write: writeHeap,
+}
+
+var allocsProfile = &Profile{
+ name: "allocs",
+ count: countHeap, // identical to heap profile
+ write: writeAlloc,
+}
+
+var blockProfile = &Profile{
+ name: "block",
+ count: countBlock,
+ write: writeBlock,
+}
+
+var mutexProfile = &Profile{
+ name: "mutex",
+ count: countMutex,
+ write: writeMutex,
+}
+
+func lockProfiles() {
+ profiles.mu.Lock()
+ if profiles.m == nil {
+ // Initial built-in profiles.
+ profiles.m = map[string]*Profile{
+ "goroutine": goroutineProfile,
+ "threadcreate": threadcreateProfile,
+ "heap": heapProfile,
+ "allocs": allocsProfile,
+ "block": blockProfile,
+ "mutex": mutexProfile,
+ }
+ }
+}
+
+func unlockProfiles() {
+ profiles.mu.Unlock()
+}
+
+// NewProfile creates a new profile with the given name.
+// If a profile with that name already exists, NewProfile panics.
+// The convention is to use an 'import/path.' prefix to create
+// separate namespaces for each package.
+// For compatibility with various tools that read pprof data,
+// profile names should not contain spaces.
+func NewProfile(name string) *Profile {
+ lockProfiles()
+ defer unlockProfiles()
+ if name == "" {
+ panic("pprof: NewProfile with empty name")
+ }
+ if profiles.m[name] != nil {
+ panic("pprof: NewProfile name already in use: " + name)
+ }
+ p := &Profile{
+ name: name,
+ m: map[interface{}][]uintptr{},
+ }
+ profiles.m[name] = p
+ return p
+}
+
+// Lookup returns the profile with the given name, or nil if no such profile exists.
+func Lookup(name string) *Profile {
+ lockProfiles()
+ defer unlockProfiles()
+ return profiles.m[name]
+}
+
+// Profiles returns a slice of all the known profiles, sorted by name.
+func Profiles() []*Profile {
+ lockProfiles()
+ defer unlockProfiles()
+
+ all := make([]*Profile, 0, len(profiles.m))
+ for _, p := range profiles.m {
+ all = append(all, p)
+ }
+
+ sort.Slice(all, func(i, j int) bool { return all[i].name < all[j].name })
+ return all
+}
+
+// Name returns this profile's name, which can be passed to Lookup to reobtain the profile.
+func (p *Profile) Name() string {
+ return p.name
+}
+
+// Count returns the number of execution stacks currently in the profile.
+func (p *Profile) Count() int {
+ p.mu.Lock()
+ defer p.mu.Unlock()
+ if p.count != nil {
+ return p.count()
+ }
+ return len(p.m)
+}
+
+// Add adds the current execution stack to the profile, associated with value.
+// Add stores value in an internal map, so value must be suitable for use as
+// a map key and will not be garbage collected until the corresponding
+// call to Remove. Add panics if the profile already contains a stack for value.
+//
+// The skip parameter has the same meaning as runtime.Caller's skip
+// and controls where the stack trace begins. Passing skip=0 begins the
+// trace in the function calling Add. For example, given this
+// execution stack:
+//
+// Add
+// called from rpc.NewClient
+// called from mypkg.Run
+// called from main.main
+//
+// Passing skip=0 begins the stack trace at the call to Add inside rpc.NewClient.
+// Passing skip=1 begins the stack trace at the call to NewClient inside mypkg.Run.
+//
+func (p *Profile) Add(value interface{}, skip int) {
+ if p.name == "" {
+ panic("pprof: use of uninitialized Profile")
+ }
+ if p.write != nil {
+ panic("pprof: Add called on built-in Profile " + p.name)
+ }
+
+ stk := make([]uintptr, 32)
+ n := runtime.Callers(skip+1, stk[:])
+ stk = stk[:n]
+ if len(stk) == 0 {
+ // The value for skip is too large, and there's no stack trace to record.
+ stk = []uintptr{funcPC(lostProfileEvent)}
+ }
+
+ p.mu.Lock()
+ defer p.mu.Unlock()
+ if p.m[value] != nil {
+ panic("pprof: Profile.Add of duplicate value")
+ }
+ p.m[value] = stk
+}
+
+// Remove removes the execution stack associated with value from the profile.
+// It is a no-op if the value is not in the profile.
+func (p *Profile) Remove(value interface{}) {
+ p.mu.Lock()
+ defer p.mu.Unlock()
+ delete(p.m, value)
+}
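+
+// For illustration only (names here are hypothetical, not part of this
+// package): a package that tracks resources which must be closed could
+// combine NewProfile, Add, and Remove like this:
+//
+//	var connProfile = pprof.NewProfile("mypkg.conns")
+//
+//	func dial(addr string) (net.Conn, error) {
+//		c, err := net.Dial("tcp", addr)
+//		if err != nil {
+//			return nil, err
+//		}
+//		connProfile.Add(c, 1) // skip=1: attribute the stack to dial's caller
+//		return c, nil
+//	}
+//
+//	// ... and in the matching close path:
+//	connProfile.Remove(c)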
+
+// WriteTo writes a pprof-formatted snapshot of the profile to w.
+// If a write to w returns an error, WriteTo returns that error.
+// Otherwise, WriteTo returns nil.
+//
+// The debug parameter enables additional output.
+// Passing debug=0 writes the gzip-compressed protocol buffer described
+// in https://github.com/google/pprof/tree/master/proto#overview.
+// Passing debug=1 writes the legacy text format with comments
+// translating addresses to function names and line numbers, so that a
+// programmer can read the profile without tools.
+//
+// The predefined profiles may assign meaning to other debug values;
+// for example, when printing the "goroutine" profile, debug=2 means to
+// print the goroutine stacks in the same form that a Go program uses
+// when dying due to an unrecovered panic.
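+//
+// For example (an illustrative sketch; f is assumed to be an open file):
+//
+//	pprof.Lookup("goroutine").WriteTo(f, 0) // gzip-compressed proto, for go tool pprof
+//	pprof.Lookup("goroutine").WriteTo(f, 1) // legacy text with symbolized stacks
+//	pprof.Lookup("goroutine").WriteTo(f, 2) // full goroutine stacks, panic-style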
+func (p *Profile) WriteTo(w io.Writer, debug int) error {
+ if p.name == "" {
+ panic("pprof: use of zero Profile")
+ }
+ if p.write != nil {
+ return p.write(w, debug)
+ }
+
+ // Obtain consistent snapshot under lock; then process without lock.
+ p.mu.Lock()
+ all := make([][]uintptr, 0, len(p.m))
+ for _, stk := range p.m {
+ all = append(all, stk)
+ }
+ p.mu.Unlock()
+
+ // Map order is non-deterministic; make output deterministic.
+ sort.Slice(all, func(i, j int) bool {
+ t, u := all[i], all[j]
+ for k := 0; k < len(t) && k < len(u); k++ {
+ if t[k] != u[k] {
+ return t[k] < u[k]
+ }
+ }
+ return len(t) < len(u)
+ })
+
+ return printCountProfile(w, debug, p.name, stackProfile(all))
+}
+
+type stackProfile [][]uintptr
+
+func (x stackProfile) Len() int { return len(x) }
+func (x stackProfile) Stack(i int) []uintptr { return x[i] }
+func (x stackProfile) Label(i int) *labelMap { return nil }
+
+// A countProfile is a set of stack traces to be printed as counts
+// grouped by stack trace. There are multiple implementations:
+// all that matters is that we can find out how many traces there are
+// and obtain each trace in turn.
+type countProfile interface {
+ Len() int
+ Stack(i int) []uintptr
+ Label(i int) *labelMap
+}
+
+// printCountCycleProfile outputs block profile records (for block or mutex profiles)
+// in the pprof-proto format. Cycle counts are translated to time durations
+// because the proto expects count and time (nanoseconds) rather than count
+// and the number of cycles for block and contention profiles.
+// Possible 'scaler' functions are scaleBlockProfile and scaleMutexProfile.
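+//
+// For example, on a machine where runtime_cyclesPerSecond reports 3e9,
+// a record of 6,000,000 cycles is emitted as 6e6 / (3e9/1e9) = 2,000,000 ns
+// (before any adjustment made by the scaler).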
+func printCountCycleProfile(w io.Writer, countName, cycleName string, scaler func(int64, float64) (int64, float64), records []runtime.BlockProfileRecord) error {
+ // Output profile in protobuf form.
+ b := newProfileBuilder(w)
+ b.pbValueType(tagProfile_PeriodType, countName, "count")
+ b.pb.int64Opt(tagProfile_Period, 1)
+ b.pbValueType(tagProfile_SampleType, countName, "count")
+ b.pbValueType(tagProfile_SampleType, cycleName, "nanoseconds")
+
+ cpuGHz := float64(runtime_cyclesPerSecond()) / 1e9
+
+ values := []int64{0, 0}
+ var locs []uint64
+ for _, r := range records {
+ count, nanosec := scaler(r.Count, float64(r.Cycles)/cpuGHz)
+ values[0] = count
+ values[1] = int64(nanosec)
+ // For count profiles, all stack addresses are
+ // return PCs, which is what appendLocsForStack expects.
+ locs = b.appendLocsForStack(locs[:0], r.Stack())
+ b.pbSample(values, locs, nil)
+ }
+ b.build()
+ return nil
+}
+
+// printCountProfile prints a countProfile at the specified debug level.
+// The profile will be in compressed proto format unless debug is nonzero.
+func printCountProfile(w io.Writer, debug int, name string, p countProfile) error {
+ // Build count of each stack.
+ var buf bytes.Buffer
+ key := func(stk []uintptr, lbls *labelMap) string {
+ buf.Reset()
+ fmt.Fprintf(&buf, "@")
+ for _, pc := range stk {
+ fmt.Fprintf(&buf, " %#x", pc)
+ }
+ if lbls != nil {
+ buf.WriteString("\n# labels: ")
+ buf.WriteString(lbls.String())
+ }
+ return buf.String()
+ }
+ count := map[string]int{}
+ index := map[string]int{}
+ var keys []string
+ n := p.Len()
+ for i := 0; i < n; i++ {
+ k := key(p.Stack(i), p.Label(i))
+ if count[k] == 0 {
+ index[k] = i
+ keys = append(keys, k)
+ }
+ count[k]++
+ }
+
+ sort.Sort(&keysByCount{keys, count})
+
+ if debug > 0 {
+ // Print debug profile in legacy format
+ tw := tabwriter.NewWriter(w, 1, 8, 1, '\t', 0)
+ fmt.Fprintf(tw, "%s profile: total %d\n", name, p.Len())
+ for _, k := range keys {
+ fmt.Fprintf(tw, "%d %s\n", count[k], k)
+ printStackRecord(tw, p.Stack(index[k]), false)
+ }
+ return tw.Flush()
+ }
+
+ // Output profile in protobuf form.
+ b := newProfileBuilder(w)
+ b.pbValueType(tagProfile_PeriodType, name, "count")
+ b.pb.int64Opt(tagProfile_Period, 1)
+ b.pbValueType(tagProfile_SampleType, name, "count")
+
+ values := []int64{0}
+ var locs []uint64
+ for _, k := range keys {
+ values[0] = int64(count[k])
+ // For count profiles, all stack addresses are
+ // return PCs, which is what appendLocsForStack expects.
+ locs = b.appendLocsForStack(locs[:0], p.Stack(index[k]))
+ idx := index[k]
+ var labels func()
+ if p.Label(idx) != nil {
+ labels = func() {
+ for k, v := range *p.Label(idx) {
+ b.pbLabel(tagSample_Label, k, v, 0)
+ }
+ }
+ }
+ b.pbSample(values, locs, labels)
+ }
+ b.build()
+ return nil
+}
+
+// keysByCount sorts keys with higher counts first, breaking ties by key string order.
+type keysByCount struct {
+ keys []string
+ count map[string]int
+}
+
+func (x *keysByCount) Len() int { return len(x.keys) }
+func (x *keysByCount) Swap(i, j int) { x.keys[i], x.keys[j] = x.keys[j], x.keys[i] }
+func (x *keysByCount) Less(i, j int) bool {
+ ki, kj := x.keys[i], x.keys[j]
+ ci, cj := x.count[ki], x.count[kj]
+ if ci != cj {
+ return ci > cj
+ }
+ return ki < kj
+}
+
+// printStackRecord prints the function + source line information
+// for a single stack trace.
+func printStackRecord(w io.Writer, stk []uintptr, allFrames bool) {
+ show := allFrames
+ frames := runtime.CallersFrames(stk)
+ for {
+ frame, more := frames.Next()
+ name := frame.Function
+ if name == "" {
+ show = true
+ fmt.Fprintf(w, "#\t%#x\n", frame.PC)
+ } else if name != "runtime.goexit" && (show || !strings.HasPrefix(name, "runtime.")) {
+ // Hide runtime.goexit and any runtime functions at the beginning.
+ // This is useful mainly for allocation traces.
+ show = true
+ fmt.Fprintf(w, "#\t%#x\t%s+%#x\t%s:%d\n", frame.PC, name, frame.PC-frame.Entry, frame.File, frame.Line)
+ }
+ if !more {
+ break
+ }
+ }
+ if !show {
+ // We didn't print anything; do it again,
+ // and this time include runtime functions.
+ printStackRecord(w, stk, true)
+ return
+ }
+ fmt.Fprintf(w, "\n")
+}
+
+// Interface to system profiles.
+
+// WriteHeapProfile is shorthand for Lookup("heap").WriteTo(w, 0).
+// It is preserved for backwards compatibility.
+func WriteHeapProfile(w io.Writer) error {
+ return writeHeap(w, 0)
+}
+
+// countHeap returns the number of records in the heap profile.
+func countHeap() int {
+ n, _ := runtime.MemProfile(nil, true)
+ return n
+}
+
+// writeHeap writes the current runtime heap profile to w.
+func writeHeap(w io.Writer, debug int) error {
+ return writeHeapInternal(w, debug, "")
+}
+
+// writeAlloc writes the current runtime heap profile to w
+// with the total allocation space as the default sample type.
+func writeAlloc(w io.Writer, debug int) error {
+ return writeHeapInternal(w, debug, "alloc_space")
+}
+
+func writeHeapInternal(w io.Writer, debug int, defaultSampleType string) error {
+ var memStats *runtime.MemStats
+ if debug != 0 {
+ // Read mem stats first, so that our other allocations
+ // do not appear in the statistics.
+ memStats = new(runtime.MemStats)
+ runtime.ReadMemStats(memStats)
+ }
+
+ // Find out how many records there are (MemProfile(nil, true)),
+ // allocate that many records, and get the data.
+ // There's a race—more records might be added between
+ // the two calls—so allocate a few extra records for safety
+ // and also try again if we're very unlucky.
+ // The loop should only execute one iteration in the common case.
+ var p []runtime.MemProfileRecord
+ n, ok := runtime.MemProfile(nil, true)
+ for {
+ // Allocate room for a slightly bigger profile,
+ // in case a few more entries have been added
+ // since the call to MemProfile.
+ p = make([]runtime.MemProfileRecord, n+50)
+ n, ok = runtime.MemProfile(p, true)
+ if ok {
+ p = p[0:n]
+ break
+ }
+ // Profile grew; try again.
+ }
+
+ if debug == 0 {
+ return writeHeapProto(w, p, int64(runtime.MemProfileRate), defaultSampleType)
+ }
+
+ sort.Slice(p, func(i, j int) bool { return p[i].InUseBytes() > p[j].InUseBytes() })
+
+ b := bufio.NewWriter(w)
+ tw := tabwriter.NewWriter(b, 1, 8, 1, '\t', 0)
+ w = tw
+
+ var total runtime.MemProfileRecord
+ for i := range p {
+ r := &p[i]
+ total.AllocBytes += r.AllocBytes
+ total.AllocObjects += r.AllocObjects
+ total.FreeBytes += r.FreeBytes
+ total.FreeObjects += r.FreeObjects
+ }
+
+ // Technically the rate is MemProfileRate not 2*MemProfileRate,
+ // but early versions of the C++ heap profiler reported 2*MemProfileRate,
+ // so that's what pprof has come to expect.
+ fmt.Fprintf(w, "heap profile: %d: %d [%d: %d] @ heap/%d\n",
+ total.InUseObjects(), total.InUseBytes(),
+ total.AllocObjects, total.AllocBytes,
+ 2*runtime.MemProfileRate)
+
+ for i := range p {
+ r := &p[i]
+ fmt.Fprintf(w, "%d: %d [%d: %d] @",
+ r.InUseObjects(), r.InUseBytes(),
+ r.AllocObjects, r.AllocBytes)
+ for _, pc := range r.Stack() {
+ fmt.Fprintf(w, " %#x", pc)
+ }
+ fmt.Fprintf(w, "\n")
+ printStackRecord(w, r.Stack(), false)
+ }
+
+ // Print memstats information too.
+ // Pprof will ignore it, but it is useful for people reading the text output.
+ s := memStats
+ fmt.Fprintf(w, "\n# runtime.MemStats\n")
+ fmt.Fprintf(w, "# Alloc = %d\n", s.Alloc)
+ fmt.Fprintf(w, "# TotalAlloc = %d\n", s.TotalAlloc)
+ fmt.Fprintf(w, "# Sys = %d\n", s.Sys)
+ fmt.Fprintf(w, "# Lookups = %d\n", s.Lookups)
+ fmt.Fprintf(w, "# Mallocs = %d\n", s.Mallocs)
+ fmt.Fprintf(w, "# Frees = %d\n", s.Frees)
+
+ fmt.Fprintf(w, "# HeapAlloc = %d\n", s.HeapAlloc)
+ fmt.Fprintf(w, "# HeapSys = %d\n", s.HeapSys)
+ fmt.Fprintf(w, "# HeapIdle = %d\n", s.HeapIdle)
+ fmt.Fprintf(w, "# HeapInuse = %d\n", s.HeapInuse)
+ fmt.Fprintf(w, "# HeapReleased = %d\n", s.HeapReleased)
+ fmt.Fprintf(w, "# HeapObjects = %d\n", s.HeapObjects)
+
+ fmt.Fprintf(w, "# Stack = %d / %d\n", s.StackInuse, s.StackSys)
+ fmt.Fprintf(w, "# MSpan = %d / %d\n", s.MSpanInuse, s.MSpanSys)
+ fmt.Fprintf(w, "# MCache = %d / %d\n", s.MCacheInuse, s.MCacheSys)
+ fmt.Fprintf(w, "# BuckHashSys = %d\n", s.BuckHashSys)
+ fmt.Fprintf(w, "# GCSys = %d\n", s.GCSys)
+ fmt.Fprintf(w, "# OtherSys = %d\n", s.OtherSys)
+
+ fmt.Fprintf(w, "# NextGC = %d\n", s.NextGC)
+ fmt.Fprintf(w, "# LastGC = %d\n", s.LastGC)
+ fmt.Fprintf(w, "# PauseNs = %d\n", s.PauseNs)
+ fmt.Fprintf(w, "# PauseEnd = %d\n", s.PauseEnd)
+ fmt.Fprintf(w, "# NumGC = %d\n", s.NumGC)
+ fmt.Fprintf(w, "# NumForcedGC = %d\n", s.NumForcedGC)
+ fmt.Fprintf(w, "# GCCPUFraction = %v\n", s.GCCPUFraction)
+ fmt.Fprintf(w, "# DebugGC = %v\n", s.DebugGC)
+
+ // Also flush out MaxRSS on supported platforms.
+ addMaxRSS(w)
+
+ tw.Flush()
+ return b.Flush()
+}
+
+// countThreadCreate returns the size of the current ThreadCreateProfile.
+func countThreadCreate() int {
+ n, _ := runtime.ThreadCreateProfile(nil)
+ return n
+}
+
+// writeThreadCreate writes the current runtime ThreadCreateProfile to w.
+func writeThreadCreate(w io.Writer, debug int) error {
+ // Until https://golang.org/issues/6104 is addressed, wrap
+ // ThreadCreateProfile because there's no point in tracking labels when we
+ // don't get any stack-traces.
+ return writeRuntimeProfile(w, debug, "threadcreate", func(p []runtime.StackRecord, _ []unsafe.Pointer) (n int, ok bool) {
+ return runtime.ThreadCreateProfile(p)
+ })
+}
+
+// countGoroutine returns the number of goroutines.
+func countGoroutine() int {
+ return runtime.NumGoroutine()
+}
+
+// runtime_goroutineProfileWithLabels is defined in runtime/mprof.go
+func runtime_goroutineProfileWithLabels(p []runtime.StackRecord, labels []unsafe.Pointer) (n int, ok bool)
+
+// writeGoroutine writes the current runtime GoroutineProfile to w.
+func writeGoroutine(w io.Writer, debug int) error {
+ if debug >= 2 {
+ return writeGoroutineStacks(w)
+ }
+ return writeRuntimeProfile(w, debug, "goroutine", runtime_goroutineProfileWithLabels)
+}
+
+func writeGoroutineStacks(w io.Writer) error {
+ // We don't know how big the buffer needs to be to collect
+ // all the goroutines. Start with 1 MB and try a few times, doubling each time.
+ // Give up and use a truncated trace if 64 MB is not enough.
+ buf := make([]byte, 1<<20)
+ for i := 0; ; i++ {
+ n := runtime.Stack(buf, true)
+ if n < len(buf) {
+ buf = buf[:n]
+ break
+ }
+ if len(buf) >= 64<<20 {
+ // Filled 64 MB - stop there.
+ break
+ }
+ buf = make([]byte, 2*len(buf))
+ }
+ _, err := w.Write(buf)
+ return err
+}
+
+func writeRuntimeProfile(w io.Writer, debug int, name string, fetch func([]runtime.StackRecord, []unsafe.Pointer) (int, bool)) error {
+ // Find out how many records there are (fetch(nil)),
+ // allocate that many records, and get the data.
+ // There's a race—more records might be added between
+ // the two calls—so allocate a few extra records for safety
+ // and also try again if we're very unlucky.
+ // The loop should only execute one iteration in the common case.
+ var p []runtime.StackRecord
+ var labels []unsafe.Pointer
+ n, ok := fetch(nil, nil)
+ for {
+ // Allocate room for a slightly bigger profile,
+ // in case a few more entries have been added
+ // since the initial call to fetch.
+ p = make([]runtime.StackRecord, n+10)
+ labels = make([]unsafe.Pointer, n+10)
+ n, ok = fetch(p, labels)
+ if ok {
+ p = p[0:n]
+ break
+ }
+ // Profile grew; try again.
+ }
+
+ return printCountProfile(w, debug, name, &runtimeProfile{p, labels})
+}
+
+type runtimeProfile struct {
+ stk []runtime.StackRecord
+ labels []unsafe.Pointer
+}
+
+func (p *runtimeProfile) Len() int { return len(p.stk) }
+func (p *runtimeProfile) Stack(i int) []uintptr { return p.stk[i].Stack() }
+func (p *runtimeProfile) Label(i int) *labelMap { return (*labelMap)(p.labels[i]) }
+
+var cpu struct {
+ sync.Mutex
+ profiling bool
+ done chan bool
+}
+
+// StartCPUProfile enables CPU profiling for the current process.
+// While profiling, the profile will be buffered and written to w.
+// StartCPUProfile returns an error if profiling is already enabled.
+//
+// On Unix-like systems, StartCPUProfile does not work by default for
+// Go code built with -buildmode=c-archive or -buildmode=c-shared.
+// StartCPUProfile relies on the SIGPROF signal, but that signal will
+// be delivered to the main program's SIGPROF signal handler (if any)
+// not to the one used by Go. To make it work, call os/signal.Notify
+// for syscall.SIGPROF, but note that doing so may break any profiling
+// being done by the main program.
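+//
+// A minimal sketch of that workaround (illustrative only):
+//
+//	c := make(chan os.Signal, 1)
+//	signal.Notify(c, syscall.SIGPROF)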
+func StartCPUProfile(w io.Writer) error {
+ // The runtime routines allow a variable profiling rate,
+ // but in practice operating systems cannot trigger signals
+ // at more than about 500 Hz, and our processing of the
+ // signal is not cheap (mostly getting the stack trace).
+ // 100 Hz is a reasonable choice: it is frequent enough to
+ // produce useful data, rare enough not to bog down the
+ // system, and a nice round number to make it easy to
+ // convert sample counts to seconds. Instead of requiring
+ // each client to specify the frequency, we hard code it.
+ const hz = 100
+
+ cpu.Lock()
+ defer cpu.Unlock()
+ if cpu.done == nil {
+ cpu.done = make(chan bool)
+ }
+ // Double-check.
+ if cpu.profiling {
+ return fmt.Errorf("cpu profiling already in use")
+ }
+ cpu.profiling = true
+ runtime.SetCPUProfileRate(hz)
+ go profileWriter(w)
+ return nil
+}
+
+// readProfile, provided by the runtime, returns the next chunk of
+// binary CPU profiling stack trace data, blocking until data is available.
+// If profiling is turned off and all the profile data accumulated while it was
+// on has been returned, readProfile returns eof=true.
+// The caller must save the returned data and tags before calling readProfile again.
+func readProfile() (data []uint64, tags []unsafe.Pointer, eof bool)
+
+func profileWriter(w io.Writer) {
+ b := newProfileBuilder(w)
+ var err error
+ for {
+ time.Sleep(100 * time.Millisecond)
+ data, tags, eof := readProfile()
+ if e := b.addCPUData(data, tags); e != nil && err == nil {
+ err = e
+ }
+ if eof {
+ break
+ }
+ }
+ if err != nil {
+ // The runtime should never produce an invalid or truncated profile.
+ // It drops records that can't fit into its log buffers.
+ panic("runtime/pprof: converting profile: " + err.Error())
+ }
+ b.build()
+ cpu.done <- true
+}
+
+// StopCPUProfile stops the current CPU profile, if any.
+// StopCPUProfile only returns after all the writes for the
+// profile have completed.
+func StopCPUProfile() {
+ cpu.Lock()
+ defer cpu.Unlock()
+
+ if !cpu.profiling {
+ return
+ }
+ cpu.profiling = false
+ runtime.SetCPUProfileRate(0)
+ <-cpu.done
+}
+
+// countBlock returns the number of records in the blocking profile.
+func countBlock() int {
+ n, _ := runtime.BlockProfile(nil)
+ return n
+}
+
+// countMutex returns the number of records in the mutex profile.
+func countMutex() int {
+ n, _ := runtime.MutexProfile(nil)
+ return n
+}
+
+// writeBlock writes the current blocking profile to w.
+func writeBlock(w io.Writer, debug int) error {
+ var p []runtime.BlockProfileRecord
+ n, ok := runtime.BlockProfile(nil)
+ for {
+ p = make([]runtime.BlockProfileRecord, n+50)
+ n, ok = runtime.BlockProfile(p)
+ if ok {
+ p = p[:n]
+ break
+ }
+ }
+
+ sort.Slice(p, func(i, j int) bool { return p[i].Cycles > p[j].Cycles })
+
+ if debug <= 0 {
+ return printCountCycleProfile(w, "contentions", "delay", scaleBlockProfile, p)
+ }
+
+ b := bufio.NewWriter(w)
+ tw := tabwriter.NewWriter(b, 1, 8, 1, '\t', 0)
+ w = tw
+
+ fmt.Fprintf(w, "--- contention:\n")
+ fmt.Fprintf(w, "cycles/second=%v\n", runtime_cyclesPerSecond())
+ for i := range p {
+ r := &p[i]
+ fmt.Fprintf(w, "%v %v @", r.Cycles, r.Count)
+ for _, pc := range r.Stack() {
+ fmt.Fprintf(w, " %#x", pc)
+ }
+ fmt.Fprint(w, "\n")
+ if debug > 0 {
+ printStackRecord(w, r.Stack(), true)
+ }
+ }
+
+ if tw != nil {
+ tw.Flush()
+ }
+ return b.Flush()
+}
+
+func scaleBlockProfile(cnt int64, ns float64) (int64, float64) {
+ // Do nothing.
+ // The current way of block profile sampling makes it
+ // hard to compute the unsampled number. The legacy block
+ // profile parser doesn't attempt to scale or unsample.
+ return cnt, ns
+}
+
+// writeMutex writes the current mutex profile to w.
+func writeMutex(w io.Writer, debug int) error {
+ // TODO(pjw): too much common code with writeBlock. FIX!
+ var p []runtime.BlockProfileRecord
+ n, ok := runtime.MutexProfile(nil)
+ for {
+ p = make([]runtime.BlockProfileRecord, n+50)
+ n, ok = runtime.MutexProfile(p)
+ if ok {
+ p = p[:n]
+ break
+ }
+ }
+
+ sort.Slice(p, func(i, j int) bool { return p[i].Cycles > p[j].Cycles })
+
+ if debug <= 0 {
+ return printCountCycleProfile(w, "contentions", "delay", scaleMutexProfile, p)
+ }
+
+ b := bufio.NewWriter(w)
+ tw := tabwriter.NewWriter(b, 1, 8, 1, '\t', 0)
+ w = tw
+
+ fmt.Fprintf(w, "--- mutex:\n")
+ fmt.Fprintf(w, "cycles/second=%v\n", runtime_cyclesPerSecond())
+ fmt.Fprintf(w, "sampling period=%d\n", runtime.SetMutexProfileFraction(-1))
+ for i := range p {
+ r := &p[i]
+ fmt.Fprintf(w, "%v %v @", r.Cycles, r.Count)
+ for _, pc := range r.Stack() {
+ fmt.Fprintf(w, " %#x", pc)
+ }
+ fmt.Fprint(w, "\n")
+ if debug > 0 {
+ printStackRecord(w, r.Stack(), true)
+ }
+ }
+
+ if tw != nil {
+ tw.Flush()
+ }
+ return b.Flush()
+}
+
+func scaleMutexProfile(cnt int64, ns float64) (int64, float64) {
+ period := runtime.SetMutexProfileFraction(-1)
+ return cnt * int64(period), ns * float64(period)
+}
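+
+// Usage sketch (not part of this file): the mutex profile stays empty unless
+// the program opts in, typically near startup:
+//
+//	runtime.SetMutexProfileFraction(5) // report ~1 in 5 contention events
+//	defer runtime.SetMutexProfileFraction(0)
+//	// ... run the workload, then:
+//	pprof.Lookup("mutex").WriteTo(w, 0)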
+
+func runtime_cyclesPerSecond() int64
diff --git a/src/runtime/pprof/pprof_norusage.go b/src/runtime/pprof/pprof_norusage.go
new file mode 100644
index 0000000..6fdcc6c
--- /dev/null
+++ b/src/runtime/pprof/pprof_norusage.go
@@ -0,0 +1,15 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !darwin,!linux
+
+package pprof
+
+import (
+ "io"
+)
+
+// addMaxRSS is a no-op on platforms that don't support rusage.
+func addMaxRSS(w io.Writer) {
+}
diff --git a/src/runtime/pprof/pprof_rusage.go b/src/runtime/pprof/pprof_rusage.go
new file mode 100644
index 0000000..7954673
--- /dev/null
+++ b/src/runtime/pprof/pprof_rusage.go
@@ -0,0 +1,31 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build darwin linux
+
+package pprof
+
+import (
+ "fmt"
+ "io"
+ "runtime"
+ "syscall"
+)
+
+// addMaxRSS prints the process's maximum resident set size (in bytes) to w on supported platforms.
+func addMaxRSS(w io.Writer) {
+ var rssToBytes uintptr
+ switch runtime.GOOS {
+ case "linux", "android":
+ rssToBytes = 1024
+ case "darwin", "ios":
+ rssToBytes = 1
+ default:
+ panic("unsupported OS")
+ }
+
+ var rusage syscall.Rusage
+ syscall.Getrusage(0, &rusage)
+ fmt.Fprintf(w, "# MaxRSS = %d\n", uintptr(rusage.Maxrss)*rssToBytes)
+}
diff --git a/src/runtime/pprof/pprof_test.go b/src/runtime/pprof/pprof_test.go
new file mode 100644
index 0000000..0e0cccb
--- /dev/null
+++ b/src/runtime/pprof/pprof_test.go
@@ -0,0 +1,1460 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !js
+
+package pprof
+
+import (
+ "bytes"
+ "context"
+ "fmt"
+ "internal/profile"
+ "internal/testenv"
+ "io"
+ "math/big"
+ "os"
+ "os/exec"
+ "regexp"
+ "runtime"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "testing"
+ "time"
+)
+
+func cpuHogger(f func(x int) int, y *int, dur time.Duration) {
+ // We only need to get one 100 Hz clock tick, so we've got
+ // a large safety buffer.
+ // But do at least 500 iterations (which should take about 100ms),
+ // otherwise TestCPUProfileMultithreaded can fail if only one
+ // thread is scheduled during the testing period.
+ t0 := time.Now()
+ accum := *y
+ for i := 0; i < 500 || time.Since(t0) < dur; i++ {
+ accum = f(accum)
+ }
+ *y = accum
+}
+
+var (
+ salt1 = 0
+ salt2 = 0
+)
+
+// The actual CPU hogging function.
+// It must not call other functions or access the heap or globals in the loop;
+// otherwise, under the race detector, the samples end up in the race runtime.
+func cpuHog1(x int) int {
+ return cpuHog0(x, 1e5)
+}
+
+func cpuHog0(x, n int) int {
+ foo := x
+ for i := 0; i < n; i++ {
+ if foo > 0 {
+ foo *= foo
+ } else {
+ foo *= foo + 1
+ }
+ }
+ return foo
+}
+
+func cpuHog2(x int) int {
+ foo := x
+ for i := 0; i < 1e5; i++ {
+ if foo > 0 {
+ foo *= foo
+ } else {
+ foo *= foo + 2
+ }
+ }
+ return foo
+}
+
+// avoidFunctions returns a list of functions that must never appear in CPU
+// profiles. For gccgo, that list includes the sigprof handler itself.
+func avoidFunctions() []string {
+ if runtime.Compiler == "gccgo" {
+ return []string{"runtime.sigprof"}
+ }
+ return nil
+}
+
+func TestCPUProfile(t *testing.T) {
+ testCPUProfile(t, stackContains, []string{"runtime/pprof.cpuHog1"}, avoidFunctions(), func(dur time.Duration) {
+ cpuHogger(cpuHog1, &salt1, dur)
+ })
+}
+
+func TestCPUProfileMultithreaded(t *testing.T) {
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(2))
+ testCPUProfile(t, stackContains, []string{"runtime/pprof.cpuHog1", "runtime/pprof.cpuHog2"}, avoidFunctions(), func(dur time.Duration) {
+ c := make(chan int)
+ go func() {
+ cpuHogger(cpuHog1, &salt1, dur)
+ c <- 1
+ }()
+ cpuHogger(cpuHog2, &salt2, dur)
+ <-c
+ })
+}
+
+// containsInlinedCall reports whether the function body for the function f is
+// known to contain an inlined function call within the first maxBytes bytes.
+func containsInlinedCall(f interface{}, maxBytes int) bool {
+ _, found := findInlinedCall(f, maxBytes)
+ return found
+}
+
+// findInlinedCall returns the PC of an inlined function call within
+// the function body for the function f if any.
+func findInlinedCall(f interface{}, maxBytes int) (pc uint64, found bool) {
+ fFunc := runtime.FuncForPC(uintptr(funcPC(f)))
+ if fFunc == nil || fFunc.Entry() == 0 {
+ panic("failed to locate function entry")
+ }
+
+ for offset := 0; offset < maxBytes; offset++ {
+ innerPC := fFunc.Entry() + uintptr(offset)
+ inner := runtime.FuncForPC(innerPC)
+ if inner == nil {
+ // No function known for this PC value.
+ // It might simply be misaligned, so keep searching.
+ continue
+ }
+ if inner.Entry() != fFunc.Entry() {
+ // Scanned past f and didn't find any inlined functions.
+ break
+ }
+ if inner.Name() != fFunc.Name() {
+ // This PC has f as its entry-point, but is not f. Therefore, it must be a
+ // function inlined into f.
+ return uint64(innerPC), true
+ }
+ }
+
+ return 0, false
+}
+
+func TestCPUProfileInlining(t *testing.T) {
+ if !containsInlinedCall(inlinedCaller, 4<<10) {
+ t.Skip("Can't determine whether inlinedCallee was inlined into inlinedCaller.")
+ }
+
+ p := testCPUProfile(t, stackContains, []string{"runtime/pprof.inlinedCallee", "runtime/pprof.inlinedCaller"}, avoidFunctions(), func(dur time.Duration) {
+ cpuHogger(inlinedCaller, &salt1, dur)
+ })
+
+ // Check that inlined function locations are encoded correctly. inlinedCallee and inlinedCaller should share one Location.
+ for _, loc := range p.Location {
+ hasInlinedCallerAfterInlinedCallee, hasInlinedCallee := false, false
+ for _, line := range loc.Line {
+ if line.Function.Name == "runtime/pprof.inlinedCallee" {
+ hasInlinedCallee = true
+ }
+ if hasInlinedCallee && line.Function.Name == "runtime/pprof.inlinedCaller" {
+ hasInlinedCallerAfterInlinedCallee = true
+ }
+ }
+ if hasInlinedCallee != hasInlinedCallerAfterInlinedCallee {
+ t.Fatalf("want inlinedCallee followed by inlinedCaller, got separate Location entries:\n%v", p)
+ }
+ }
+}
+
+func inlinedCaller(x int) int {
+ x = inlinedCallee(x, 1e5)
+ return x
+}
+
+func inlinedCallee(x, n int) int {
+ return cpuHog0(x, n)
+}
+
+//go:noinline
+func dumpCallers(pcs []uintptr) {
+ if pcs == nil {
+ return
+ }
+
+ skip := 2 // Callers and dumpCallers
+ runtime.Callers(skip, pcs)
+}
+
+//go:noinline
+func inlinedCallerDump(pcs []uintptr) {
+ inlinedCalleeDump(pcs)
+}
+
+func inlinedCalleeDump(pcs []uintptr) {
+ dumpCallers(pcs)
+}
+
+func TestCPUProfileRecursion(t *testing.T) {
+ p := testCPUProfile(t, stackContains, []string{"runtime/pprof.inlinedCallee", "runtime/pprof.recursionCallee", "runtime/pprof.recursionCaller"}, avoidFunctions(), func(dur time.Duration) {
+ cpuHogger(recursionCaller, &salt1, dur)
+ })
+
+ // Check that the Location encoding was not confused by recursive calls.
+ for i, loc := range p.Location {
+ recursionFunc := 0
+ for _, line := range loc.Line {
+ if name := line.Function.Name; name == "runtime/pprof.recursionCaller" || name == "runtime/pprof.recursionCallee" {
+ recursionFunc++
+ }
+ }
+ if recursionFunc > 1 {
+ t.Fatalf("want at most one recursionCaller or recursionCallee in one Location, got a violating Location (index: %d):\n%v", i, p)
+ }
+ }
+}
+
+func recursionCaller(x int) int {
+ y := recursionCallee(3, x)
+ return y
+}
+
+func recursionCallee(n, x int) int {
+ if n == 0 {
+ return 1
+ }
+ y := inlinedCallee(x, 1e4)
+ return y * recursionCallee(n-1, x)
+}
+
+func recursionChainTop(x int, pcs []uintptr) {
+ if x < 0 {
+ return
+ }
+ recursionChainMiddle(x, pcs)
+}
+
+func recursionChainMiddle(x int, pcs []uintptr) {
+ recursionChainBottom(x, pcs)
+}
+
+func recursionChainBottom(x int, pcs []uintptr) {
+ // This will be called each time; we only care about the last call. We
+ // can't make it conditional or this function won't be inlined.
+ dumpCallers(pcs)
+
+ recursionChainTop(x-1, pcs)
+}
+
+func parseProfile(t *testing.T, valBytes []byte, f func(uintptr, []*profile.Location, map[string][]string)) *profile.Profile {
+ p, err := profile.Parse(bytes.NewReader(valBytes))
+ if err != nil {
+ t.Fatal(err)
+ }
+ for _, sample := range p.Sample {
+ count := uintptr(sample.Value[0])
+ f(count, sample.Location, sample.Label)
+ }
+ return p
+}
+
+// testCPUProfile runs f under the CPU profiler, checking for some conditions specified by need,
+// as interpreted by matches, and returns the parsed profile.
+func testCPUProfile(t *testing.T, matches matchFunc, need []string, avoid []string, f func(dur time.Duration)) *profile.Profile {
+ switch runtime.GOOS {
+ case "darwin", "ios":
+ switch runtime.GOARCH {
+ case "arm64":
+ // nothing
+ default:
+ out, err := exec.Command("uname", "-a").CombinedOutput()
+ if err != nil {
+ t.Fatal(err)
+ }
+ vers := string(out)
+ t.Logf("uname -a: %v", vers)
+ }
+ case "plan9":
+ t.Skip("skipping on plan9")
+ }
+
+ broken := false
+ switch runtime.GOOS {
+ // See https://golang.org/issue/45170 for AIX.
+ case "darwin", "ios", "dragonfly", "netbsd", "illumos", "solaris", "aix":
+ broken = true
+ case "openbsd":
+ if runtime.GOARCH == "arm" || runtime.GOARCH == "arm64" {
+ broken = true
+ }
+ case "windows":
+ if runtime.GOARCH == "arm" {
+ broken = true // See https://golang.org/issues/42862
+ }
+ }
+
+ maxDuration := 5 * time.Second
+ if testing.Short() && broken {
+ // If it's expected to be broken, no point waiting around.
+ maxDuration /= 10
+ }
+
+ // If we're running a long test, start with a long duration
+ // for tests that try to make sure something *doesn't* happen.
+ duration := 5 * time.Second
+ if testing.Short() {
+ duration = 100 * time.Millisecond
+ }
+
+ // Profiling tests are inherently flaky, especially on a
+ // loaded system, such as when this test is running with
+ // several others under go test std. If a test fails in a way
+ // that could mean it just didn't run long enough, try with a
+ // longer duration.
+ for duration <= maxDuration {
+ var prof bytes.Buffer
+ if err := StartCPUProfile(&prof); err != nil {
+ t.Fatal(err)
+ }
+ f(duration)
+ StopCPUProfile()
+
+ if p, ok := profileOk(t, matches, need, avoid, prof, duration); ok {
+ return p
+ }
+
+ duration *= 2
+ if duration <= maxDuration {
+ t.Logf("retrying with %s duration", duration)
+ }
+ }
+
+ if broken {
+ t.Skipf("ignoring failure on %s/%s; see golang.org/issue/13841", runtime.GOOS, runtime.GOARCH)
+ }
+
+ // Ignore the failure if the tests are running in a QEMU-based emulator;
+ // QEMU is not perfect at emulating everything.
+ // The IN_QEMU environment variable is set by some of the Go builders.
+ // IN_QEMU=1 indicates that the tests are running in QEMU. See issue 9605.
+ if os.Getenv("IN_QEMU") == "1" {
+ t.Skip("ignore the failure in QEMU; see golang.org/issue/9605")
+ }
+ t.FailNow()
+ return nil
+}
+
+func contains(slice []string, s string) bool {
+ for i := range slice {
+ if slice[i] == s {
+ return true
+ }
+ }
+ return false
+}
+
+// stackContains matches if a function named spec appears anywhere in the stack trace.
+func stackContains(spec string, count uintptr, stk []*profile.Location, labels map[string][]string) bool {
+ for _, loc := range stk {
+ for _, line := range loc.Line {
+ if strings.Contains(line.Function.Name, spec) {
+ return true
+ }
+ }
+ }
+ return false
+}
+
+type matchFunc func(spec string, count uintptr, stk []*profile.Location, labels map[string][]string) bool
+
+func profileOk(t *testing.T, matches matchFunc, need []string, avoid []string, prof bytes.Buffer, duration time.Duration) (_ *profile.Profile, ok bool) {
+ ok = true
+
+ // Check that profile is well formed, contains 'need', and does not contain
+ // anything from 'avoid'.
+ have := make([]uintptr, len(need))
+ avoidSamples := make([]uintptr, len(avoid))
+ var samples uintptr
+ var buf bytes.Buffer
+ p := parseProfile(t, prof.Bytes(), func(count uintptr, stk []*profile.Location, labels map[string][]string) {
+ fmt.Fprintf(&buf, "%d:", count)
+ fprintStack(&buf, stk)
+ samples += count
+ for i, spec := range need {
+ if matches(spec, count, stk, labels) {
+ have[i] += count
+ }
+ }
+ for i, name := range avoid {
+ for _, loc := range stk {
+ for _, line := range loc.Line {
+ if strings.Contains(line.Function.Name, name) {
+ avoidSamples[i] += count
+ }
+ }
+ }
+ }
+ fmt.Fprintf(&buf, "\n")
+ })
+ t.Logf("total %d CPU profile samples collected:\n%s", samples, buf.String())
+
+ if samples < 10 && runtime.GOOS == "windows" {
+ // On some windows machines we end up with
+ // not enough samples due to coarse timer
+ // resolution. Let it go.
+ t.Log("too few samples on Windows (golang.org/issue/10842)")
+ return p, false
+ }
+
+ // Check that we got a reasonable number of samples.
+ // We used to always require at least ideal/4 samples,
+ // but that is too hard to guarantee on a loaded system.
+ // Now we accept 10 or more samples, which we take to be
+ // enough to show that at least some profiling is occurring.
+ if ideal := uintptr(duration * 100 / time.Second); samples == 0 || (samples < ideal/4 && samples < 10) {
+ t.Logf("too few samples; got %d, want at least %d, ideally %d", samples, ideal/4, ideal)
+ ok = false
+ }
+
+ for i, name := range avoid {
+ bad := avoidSamples[i]
+ if bad != 0 {
+ t.Logf("found %d samples in avoid-function %s\n", bad, name)
+ ok = false
+ }
+ }
+
+ if len(need) == 0 {
+ return p, ok
+ }
+
+ var total uintptr
+ for i, name := range need {
+ total += have[i]
+ t.Logf("%s: %d\n", name, have[i])
+ }
+ if total == 0 {
+ t.Logf("no samples in expected functions")
+ ok = false
+ }
+ // We'd like to check a reasonable minimum, like
+ // total / len(have) / smallconstant, but this test is
+ // pretty flaky (see bug 7095). So we'll just test to
+ // make sure we got at least one sample.
+ min := uintptr(1)
+ for i, name := range need {
+ if have[i] < min {
+ t.Logf("%s has %d samples out of %d, want at least %d, ideally %d", name, have[i], total, min, total/uintptr(len(have)))
+ ok = false
+ }
+ }
+ return p, ok
+}
+
+// Fork can hang if preempted with signals frequently enough (see issue 5517).
+// Ensure that we do not do this.
+func TestCPUProfileWithFork(t *testing.T) {
+ testenv.MustHaveExec(t)
+
+ heap := 1 << 30
+ if runtime.GOOS == "android" {
+ // Use smaller size for Android to avoid crash.
+ heap = 100 << 20
+ }
+ if runtime.GOOS == "windows" && runtime.GOARCH == "arm" {
+ // Use smaller heap for Windows/ARM to avoid crash.
+ heap = 100 << 20
+ }
+ if testing.Short() {
+ heap = 100 << 20
+ }
+ // This makes fork slower.
+ garbage := make([]byte, heap)
+ // Need to touch the slice, otherwise it won't be paged in.
+ done := make(chan bool)
+ go func() {
+ for i := range garbage {
+ garbage[i] = 42
+ }
+ done <- true
+ }()
+ <-done
+
+ var prof bytes.Buffer
+ if err := StartCPUProfile(&prof); err != nil {
+ t.Fatal(err)
+ }
+ defer StopCPUProfile()
+
+ for i := 0; i < 10; i++ {
+ exec.Command(os.Args[0], "-h").CombinedOutput()
+ }
+}
+
+// Test that profiler does not observe runtime.gogo as "user" goroutine execution.
+// If it did, it would see inconsistent state and would either record an incorrect stack
+// or crash because the stack was malformed.
+func TestGoroutineSwitch(t *testing.T) {
+ if runtime.Compiler == "gccgo" {
+ t.Skip("not applicable for gccgo")
+ }
+ // How much to try. These defaults take about 1 second
+ // on a 2012 MacBook Pro. The ones in short mode take
+ // about 0.1 seconds.
+ tries := 10
+ count := 1000000
+ if testing.Short() {
+ tries = 1
+ }
+ for try := 0; try < tries; try++ {
+ var prof bytes.Buffer
+ if err := StartCPUProfile(&prof); err != nil {
+ t.Fatal(err)
+ }
+ for i := 0; i < count; i++ {
+ runtime.Gosched()
+ }
+ StopCPUProfile()
+
+ // Read the profile to look for entries for runtime.gogo with an attempt at a traceback.
+ // The special entries that record a PC without a traceback are allowed through below.
+ parseProfile(t, prof.Bytes(), func(count uintptr, stk []*profile.Location, _ map[string][]string) {
+ // An entry with two frames with 'System' in its top frame
+ // exists to record a PC without a traceback. Those are okay.
+ if len(stk) == 2 {
+ name := stk[1].Line[0].Function.Name
+ if name == "runtime._System" || name == "runtime._ExternalCode" || name == "runtime._GC" {
+ return
+ }
+ }
+
+ // Otherwise, should not see runtime.gogo.
+ // The place we'd see it would be the inner most frame.
+ name := stk[0].Line[0].Function.Name
+ if name == "runtime.gogo" {
+ var buf bytes.Buffer
+ fprintStack(&buf, stk)
+ t.Fatalf("found profile entry for runtime.gogo:\n%s", buf.String())
+ }
+ })
+ }
+}
+
+func fprintStack(w io.Writer, stk []*profile.Location) {
+ for _, loc := range stk {
+ fmt.Fprintf(w, " %#x", loc.Address)
+ fmt.Fprintf(w, " (")
+ for i, line := range loc.Line {
+ if i > 0 {
+ fmt.Fprintf(w, " ")
+ }
+ fmt.Fprintf(w, "%s:%d", line.Function.Name, line.Line)
+ }
+ fmt.Fprintf(w, ")")
+ }
+ fmt.Fprintf(w, "\n")
+}
+
+// Test that profiling of division operations is okay, especially on ARM. See issue 6681.
+func TestMathBigDivide(t *testing.T) {
+ testCPUProfile(t, nil, nil, nil, func(duration time.Duration) {
+ t := time.After(duration)
+ pi := new(big.Int)
+ for {
+ for i := 0; i < 100; i++ {
+ n := big.NewInt(2646693125139304345)
+ d := big.NewInt(842468587426513207)
+ pi.Div(n, d)
+ }
+ select {
+ case <-t:
+ return
+ default:
+ }
+ }
+ })
+}
+
+// stackContainsAll matches if all functions in spec (comma-separated) appear somewhere in the stack trace.
+func stackContainsAll(spec string, count uintptr, stk []*profile.Location, labels map[string][]string) bool {
+ for _, f := range strings.Split(spec, ",") {
+ if !stackContains(f, count, stk, labels) {
+ return false
+ }
+ }
+ return true
+}
+
+func TestMorestack(t *testing.T) {
+ testCPUProfile(t, stackContainsAll, []string{"runtime.newstack,runtime/pprof.growstack"}, avoidFunctions(), func(duration time.Duration) {
+ t := time.After(duration)
+ c := make(chan bool)
+ for {
+ go func() {
+ growstack1()
+ c <- true
+ }()
+ select {
+ case <-t:
+ return
+ case <-c:
+ }
+ }
+ })
+}
+
+//go:noinline
+func growstack1() {
+ growstack()
+}
+
+//go:noinline
+func growstack() {
+ var buf [8 << 10]byte
+ use(buf)
+}
+
+//go:noinline
+func use(x [8 << 10]byte) {}
+
+func TestBlockProfile(t *testing.T) {
+ type TestCase struct {
+ name string
+ f func()
+ stk []string
+ re string
+ }
+ tests := [...]TestCase{
+ {
+ name: "chan recv",
+ f: blockChanRecv,
+ stk: []string{
+ "runtime.chanrecv1",
+ "runtime/pprof.blockChanRecv",
+ "runtime/pprof.TestBlockProfile",
+ },
+ re: `
+[0-9]+ [0-9]+ @( 0x[[:xdigit:]]+)+
+# 0x[0-9a-f]+ runtime\.chanrecv1\+0x[0-9a-f]+ .*/src/runtime/chan.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.blockChanRecv\+0x[0-9a-f]+ .*/src/runtime/pprof/pprof_test.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.TestBlockProfile\+0x[0-9a-f]+ .*/src/runtime/pprof/pprof_test.go:[0-9]+
+`},
+ {
+ name: "chan send",
+ f: blockChanSend,
+ stk: []string{
+ "runtime.chansend1",
+ "runtime/pprof.blockChanSend",
+ "runtime/pprof.TestBlockProfile",
+ },
+ re: `
+[0-9]+ [0-9]+ @( 0x[[:xdigit:]]+)+
+# 0x[0-9a-f]+ runtime\.chansend1\+0x[0-9a-f]+ .*/src/runtime/chan.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.blockChanSend\+0x[0-9a-f]+ .*/src/runtime/pprof/pprof_test.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.TestBlockProfile\+0x[0-9a-f]+ .*/src/runtime/pprof/pprof_test.go:[0-9]+
+`},
+ {
+ name: "chan close",
+ f: blockChanClose,
+ stk: []string{
+ "runtime.chanrecv1",
+ "runtime/pprof.blockChanClose",
+ "runtime/pprof.TestBlockProfile",
+ },
+ re: `
+[0-9]+ [0-9]+ @( 0x[[:xdigit:]]+)+
+# 0x[0-9a-f]+ runtime\.chanrecv1\+0x[0-9a-f]+ .*/src/runtime/chan.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.blockChanClose\+0x[0-9a-f]+ .*/src/runtime/pprof/pprof_test.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.TestBlockProfile\+0x[0-9a-f]+ .*/src/runtime/pprof/pprof_test.go:[0-9]+
+`},
+ {
+ name: "select recv async",
+ f: blockSelectRecvAsync,
+ stk: []string{
+ "runtime.selectgo",
+ "runtime/pprof.blockSelectRecvAsync",
+ "runtime/pprof.TestBlockProfile",
+ },
+ re: `
+[0-9]+ [0-9]+ @( 0x[[:xdigit:]]+)+
+# 0x[0-9a-f]+ runtime\.selectgo\+0x[0-9a-f]+ .*/src/runtime/select.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.blockSelectRecvAsync\+0x[0-9a-f]+ .*/src/runtime/pprof/pprof_test.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.TestBlockProfile\+0x[0-9a-f]+ .*/src/runtime/pprof/pprof_test.go:[0-9]+
+`},
+ {
+ name: "select send sync",
+ f: blockSelectSendSync,
+ stk: []string{
+ "runtime.selectgo",
+ "runtime/pprof.blockSelectSendSync",
+ "runtime/pprof.TestBlockProfile",
+ },
+ re: `
+[0-9]+ [0-9]+ @( 0x[[:xdigit:]]+)+
+# 0x[0-9a-f]+ runtime\.selectgo\+0x[0-9a-f]+ .*/src/runtime/select.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.blockSelectSendSync\+0x[0-9a-f]+ .*/src/runtime/pprof/pprof_test.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.TestBlockProfile\+0x[0-9a-f]+ .*/src/runtime/pprof/pprof_test.go:[0-9]+
+`},
+ {
+ name: "mutex",
+ f: blockMutex,
+ stk: []string{
+ "sync.(*Mutex).Lock",
+ "runtime/pprof.blockMutex",
+ "runtime/pprof.TestBlockProfile",
+ },
+ re: `
+[0-9]+ [0-9]+ @( 0x[[:xdigit:]]+)+
+# 0x[0-9a-f]+ sync\.\(\*Mutex\)\.Lock\+0x[0-9a-f]+ .*/src/sync/mutex\.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.blockMutex\+0x[0-9a-f]+ .*/src/runtime/pprof/pprof_test.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.TestBlockProfile\+0x[0-9a-f]+ .*/src/runtime/pprof/pprof_test.go:[0-9]+
+`},
+ {
+ name: "cond",
+ f: blockCond,
+ stk: []string{
+ "sync.(*Cond).Wait",
+ "runtime/pprof.blockCond",
+ "runtime/pprof.TestBlockProfile",
+ },
+ re: `
+[0-9]+ [0-9]+ @( 0x[[:xdigit:]]+)+
+# 0x[0-9a-f]+ sync\.\(\*Cond\)\.Wait\+0x[0-9a-f]+ .*/src/sync/cond\.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.blockCond\+0x[0-9a-f]+ .*/src/runtime/pprof/pprof_test.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.TestBlockProfile\+0x[0-9a-f]+ .*/src/runtime/pprof/pprof_test.go:[0-9]+
+`},
+ }
+
+ // Generate block profile
+ runtime.SetBlockProfileRate(1)
+ defer runtime.SetBlockProfileRate(0)
+ for _, test := range tests {
+ test.f()
+ }
+
+ t.Run("debug=1", func(t *testing.T) {
+ var w bytes.Buffer
+ Lookup("block").WriteTo(&w, 1)
+ prof := w.String()
+
+ if !strings.HasPrefix(prof, "--- contention:\ncycles/second=") {
+ t.Fatalf("Bad profile header:\n%v", prof)
+ }
+
+ if strings.HasSuffix(prof, "#\t0x0\n\n") {
+ t.Errorf("Useless 0 suffix:\n%v", prof)
+ }
+
+ for _, test := range tests {
+ if !regexp.MustCompile(strings.ReplaceAll(test.re, "\t", "\t+")).MatchString(prof) {
+ t.Errorf("Bad %v entry, expect:\n%v\ngot:\n%v", test.name, test.re, prof)
+ }
+ }
+ })
+
+ t.Run("proto", func(t *testing.T) {
+ // proto format
+ var w bytes.Buffer
+ Lookup("block").WriteTo(&w, 0)
+ p, err := profile.Parse(&w)
+ if err != nil {
+ t.Fatalf("failed to parse profile: %v", err)
+ }
+ t.Logf("parsed proto: %s", p)
+ if err := p.CheckValid(); err != nil {
+ t.Fatalf("invalid profile: %v", err)
+ }
+
+ stks := stacks(p)
+ for _, test := range tests {
+ if !containsStack(stks, test.stk) {
+ t.Errorf("No matching stack entry for %v, want %+v", test.name, test.stk)
+ }
+ }
+ })
+
+}
+
+func stacks(p *profile.Profile) (res [][]string) {
+ for _, s := range p.Sample {
+ var stk []string
+ for _, l := range s.Location {
+ for _, line := range l.Line {
+ stk = append(stk, line.Function.Name)
+ }
+ }
+ res = append(res, stk)
+ }
+ return res
+}
+
+func containsStack(got [][]string, want []string) bool {
+ for _, stk := range got {
+ if len(stk) < len(want) {
+ continue
+ }
+ for i, f := range want {
+ if f != stk[i] {
+ break
+ }
+ if i == len(want)-1 {
+ return true
+ }
+ }
+ }
+ return false
+}
+
+const blockDelay = 10 * time.Millisecond
+
+func blockChanRecv() {
+ c := make(chan bool)
+ go func() {
+ time.Sleep(blockDelay)
+ c <- true
+ }()
+ <-c
+}
+
+func blockChanSend() {
+ c := make(chan bool)
+ go func() {
+ time.Sleep(blockDelay)
+ <-c
+ }()
+ c <- true
+}
+
+func blockChanClose() {
+ c := make(chan bool)
+ go func() {
+ time.Sleep(blockDelay)
+ close(c)
+ }()
+ <-c
+}
+
+func blockSelectRecvAsync() {
+ const numTries = 3
+ c := make(chan bool, 1)
+ c2 := make(chan bool, 1)
+ go func() {
+ for i := 0; i < numTries; i++ {
+ time.Sleep(blockDelay)
+ c <- true
+ }
+ }()
+ for i := 0; i < numTries; i++ {
+ select {
+ case <-c:
+ case <-c2:
+ }
+ }
+}
+
+func blockSelectSendSync() {
+ c := make(chan bool)
+ c2 := make(chan bool)
+ go func() {
+ time.Sleep(blockDelay)
+ <-c
+ }()
+ select {
+ case c <- true:
+ case c2 <- true:
+ }
+}
+
+func blockMutex() {
+ var mu sync.Mutex
+ mu.Lock()
+ go func() {
+ time.Sleep(blockDelay)
+ mu.Unlock()
+ }()
+ // Note: Unlock releases mu before recording the mutex event,
+ // so it's theoretically possible for this to proceed and
+ // capture the profile before the event is recorded. As long
+ // as this is blocked before the unlock happens, it's okay.
+ mu.Lock()
+}
+
+func blockCond() {
+ var mu sync.Mutex
+ c := sync.NewCond(&mu)
+ mu.Lock()
+ go func() {
+ time.Sleep(blockDelay)
+ mu.Lock()
+ c.Signal()
+ mu.Unlock()
+ }()
+ c.Wait()
+ mu.Unlock()
+}
+
+func TestMutexProfile(t *testing.T) {
+ // Generate mutex profile
+
+ old := runtime.SetMutexProfileFraction(1)
+ defer runtime.SetMutexProfileFraction(old)
+ if old != 0 {
+ t.Fatalf("need MutexProfileRate 0, got %d", old)
+ }
+
+ blockMutex()
+
+ t.Run("debug=1", func(t *testing.T) {
+ var w bytes.Buffer
+ Lookup("mutex").WriteTo(&w, 1)
+ prof := w.String()
+ t.Logf("received profile: %v", prof)
+
+ if !strings.HasPrefix(prof, "--- mutex:\ncycles/second=") {
+ t.Errorf("Bad profile header:\n%v", prof)
+ }
+ prof = strings.Trim(prof, "\n")
+ lines := strings.Split(prof, "\n")
+ if len(lines) != 6 {
+ t.Errorf("expected 6 lines, got %d %q\n%s", len(lines), prof, prof)
+ }
+ if len(lines) < 6 {
+ return
+ }
+ // checking that the line is like "35258904 1 @ 0x48288d 0x47cd28 0x458931"
+ r2 := `^\d+ \d+ @(?: 0x[[:xdigit:]]+)+`
+ //r2 := "^[0-9]+ 1 @ 0x[0-9a-f x]+$"
+ if ok, err := regexp.MatchString(r2, lines[3]); err != nil || !ok {
+ t.Errorf("%q didn't match %q", lines[3], r2)
+ }
+ r3 := "^#.*runtime/pprof.blockMutex.*$"
+ if ok, err := regexp.MatchString(r3, lines[5]); err != nil || !ok {
+ t.Errorf("%q didn't match %q", lines[5], r3)
+ }
+ t.Log(prof)
+ })
+ t.Run("proto", func(t *testing.T) {
+ // proto format
+ var w bytes.Buffer
+ Lookup("mutex").WriteTo(&w, 0)
+ p, err := profile.Parse(&w)
+ if err != nil {
+ t.Fatalf("failed to parse profile: %v", err)
+ }
+ t.Logf("parsed proto: %s", p)
+ if err := p.CheckValid(); err != nil {
+ t.Fatalf("invalid profile: %v", err)
+ }
+
+ stks := stacks(p)
+ for _, want := range [][]string{
+ {"sync.(*Mutex).Unlock", "runtime/pprof.blockMutex.func1"},
+ } {
+ if !containsStack(stks, want) {
+ t.Errorf("No matching stack entry for %+v", want)
+ }
+ }
+ })
+}
+
+func func1(c chan int) { <-c }
+func func2(c chan int) { <-c }
+func func3(c chan int) { <-c }
+func func4(c chan int) { <-c }
+
+func TestGoroutineCounts(t *testing.T) {
+ // Setting GOMAXPROCS to 1 ensures we can force all goroutines to the
+ // desired blocking point.
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))
+
+ c := make(chan int)
+ for i := 0; i < 100; i++ {
+ switch {
+ case i%10 == 0:
+ go func1(c)
+ case i%2 == 0:
+ go func2(c)
+ default:
+ go func3(c)
+ }
+ // Let goroutines block on channel
+ for j := 0; j < 5; j++ {
+ runtime.Gosched()
+ }
+ }
+ ctx := context.Background()
+
+ // ... and again, with labels this time (just with fewer iterations to keep
+ // sorting deterministic).
+ Do(ctx, Labels("label", "value"), func(context.Context) {
+ for i := 0; i < 89; i++ {
+ switch {
+ case i%10 == 0:
+ go func1(c)
+ case i%2 == 0:
+ go func2(c)
+ default:
+ go func3(c)
+ }
+ // Let goroutines block on channel
+ for j := 0; j < 5; j++ {
+ runtime.Gosched()
+ }
+ }
+ })
+
+ var w bytes.Buffer
+ goroutineProf := Lookup("goroutine")
+
+ // Check debug profile
+ goroutineProf.WriteTo(&w, 1)
+ prof := w.String()
+
+ labels := labelMap{"label": "value"}
+ labelStr := "\n# labels: " + labels.String()
+ if !containsInOrder(prof, "\n50 @ ", "\n44 @", labelStr,
+ "\n40 @", "\n36 @", labelStr, "\n10 @", "\n9 @", labelStr, "\n1 @") {
+ t.Errorf("expected sorted goroutine counts with Labels:\n%s", prof)
+ }
+
+ // Check proto profile
+ w.Reset()
+ goroutineProf.WriteTo(&w, 0)
+ p, err := profile.Parse(&w)
+ if err != nil {
+ t.Errorf("error parsing protobuf profile: %v", err)
+ }
+ if err := p.CheckValid(); err != nil {
+ t.Errorf("protobuf profile is invalid: %v", err)
+ }
+ expectedLabels := map[int64]map[string]string{
+ 50: map[string]string{},
+ 44: map[string]string{"label": "value"},
+ 40: map[string]string{},
+ 36: map[string]string{"label": "value"},
+ 10: map[string]string{},
+ 9: map[string]string{"label": "value"},
+ 1: map[string]string{},
+ }
+ if !containsCountsLabels(p, expectedLabels) {
+ t.Errorf("expected count profile to contain goroutines with counts and labels %v, got %v",
+ expectedLabels, p)
+ }
+
+ close(c)
+
+ time.Sleep(10 * time.Millisecond) // let goroutines exit
+}
+
+func containsInOrder(s string, all ...string) bool {
+ for _, t := range all {
+ i := strings.Index(s, t)
+ if i < 0 {
+ return false
+ }
+ s = s[i+len(t):]
+ }
+ return true
+}
+
+func containsCountsLabels(prof *profile.Profile, countLabels map[int64]map[string]string) bool {
+ m := make(map[int64]int)
+ type nkey struct {
+ count int64
+ key, val string
+ }
+ n := make(map[nkey]int)
+ for c, kv := range countLabels {
+ m[c]++
+ for k, v := range kv {
+ n[nkey{
+ count: c,
+ key: k,
+ val: v,
+ }]++
+
+ }
+ }
+ for _, s := range prof.Sample {
+ // The count is the single value in the sample
+ if len(s.Value) != 1 {
+ return false
+ }
+ m[s.Value[0]]--
+ for k, vs := range s.Label {
+ for _, v := range vs {
+ n[nkey{
+ count: s.Value[0],
+ key: k,
+ val: v,
+ }]--
+ }
+ }
+ }
+ for _, n := range m {
+ if n > 0 {
+ return false
+ }
+ }
+ for _, ncnt := range n {
+ if ncnt != 0 {
+ return false
+ }
+ }
+ return true
+}
+
+var emptyCallStackTestRun int64
+
+// Issue 18836.
+func TestEmptyCallStack(t *testing.T) {
+ name := fmt.Sprintf("test18836_%d", emptyCallStackTestRun)
+ emptyCallStackTestRun++
+
+ t.Parallel()
+ var buf bytes.Buffer
+ p := NewProfile(name)
+
+ p.Add("foo", 47674)
+ p.WriteTo(&buf, 1)
+ p.Remove("foo")
+ got := buf.String()
+ prefix := name + " profile: total 1\n"
+ if !strings.HasPrefix(got, prefix) {
+ t.Fatalf("got:\n\t%q\nwant prefix:\n\t%q\n", got, prefix)
+ }
+ lostevent := "lostProfileEvent"
+ if !strings.Contains(got, lostevent) {
+ t.Fatalf("got:\n\t%q\ndoes not contain:\n\t%q\n", got, lostevent)
+ }
+}
+
+// stackContainsLabeled takes a spec like funcname;key=value and matches if the stack has that key
+// and value and has funcname somewhere in the stack.
+func stackContainsLabeled(spec string, count uintptr, stk []*profile.Location, labels map[string][]string) bool {
+ semi := strings.Index(spec, ";")
+ if semi == -1 {
+ panic("no semicolon in key/value spec")
+ }
+ kv := strings.SplitN(spec[semi+1:], "=", 2)
+ if len(kv) != 2 {
+ panic("missing = in key/value spec")
+ }
+ if !contains(labels[kv[0]], kv[1]) {
+ return false
+ }
+ return stackContains(spec[:semi], count, stk, labels)
+}
+
+func TestCPUProfileLabel(t *testing.T) {
+ testCPUProfile(t, stackContainsLabeled, []string{"runtime/pprof.cpuHogger;key=value"}, avoidFunctions(), func(dur time.Duration) {
+ Do(context.Background(), Labels("key", "value"), func(context.Context) {
+ cpuHogger(cpuHog1, &salt1, dur)
+ })
+ })
+}
+
+func TestLabelRace(t *testing.T) {
+ // Test the race detector annotations for synchronization
+	// between setting labels and consuming them from the
+ // profile.
+ testCPUProfile(t, stackContainsLabeled, []string{"runtime/pprof.cpuHogger;key=value"}, nil, func(dur time.Duration) {
+ start := time.Now()
+ var wg sync.WaitGroup
+ for time.Since(start) < dur {
+ var salts [10]int
+ for i := 0; i < 10; i++ {
+ wg.Add(1)
+ go func(j int) {
+ Do(context.Background(), Labels("key", "value"), func(context.Context) {
+ cpuHogger(cpuHog1, &salts[j], time.Millisecond)
+ })
+ wg.Done()
+ }(i)
+ }
+ wg.Wait()
+ }
+ })
+}
+
+// Check that there is no deadlock when the program receives SIGPROF while in
+// 64-bit atomics' critical section. Used to happen on mips{,le}. See #20146.
+func TestAtomicLoadStore64(t *testing.T) {
+ f, err := os.CreateTemp("", "profatomic")
+ if err != nil {
+ t.Fatalf("TempFile: %v", err)
+ }
+ defer os.Remove(f.Name())
+ defer f.Close()
+
+ if err := StartCPUProfile(f); err != nil {
+ t.Fatal(err)
+ }
+ defer StopCPUProfile()
+
+ var flag uint64
+ done := make(chan bool, 1)
+
+ go func() {
+ for atomic.LoadUint64(&flag) == 0 {
+ runtime.Gosched()
+ }
+ done <- true
+ }()
+ time.Sleep(50 * time.Millisecond)
+ atomic.StoreUint64(&flag, 1)
+ <-done
+}
+
+func TestTracebackAll(t *testing.T) {
+ // With gccgo, if a profiling signal arrives at the wrong time
+ // during traceback, it may crash or hang. See issue #29448.
+ f, err := os.CreateTemp("", "proftraceback")
+ if err != nil {
+ t.Fatalf("TempFile: %v", err)
+ }
+ defer os.Remove(f.Name())
+ defer f.Close()
+
+ if err := StartCPUProfile(f); err != nil {
+ t.Fatal(err)
+ }
+ defer StopCPUProfile()
+
+ ch := make(chan int)
+ defer close(ch)
+
+ count := 10
+ for i := 0; i < count; i++ {
+ go func() {
+ <-ch // block
+ }()
+ }
+
+ N := 10000
+ if testing.Short() {
+ N = 500
+ }
+ buf := make([]byte, 10*1024)
+ for i := 0; i < N; i++ {
+ runtime.Stack(buf, true)
+ }
+}
+
+// TestTryAdd tests the cases that are hard to test with real program execution.
+//
+// For example, the current Go compilers may not always inline functions
+// involved in recursion, but that may change in future compilers. This test
+// covers such cases by using fake call sequences and forcing the profile build
+// via translateCPUProfile, defined in proto_test.go.
+func TestTryAdd(t *testing.T) {
+ if _, found := findInlinedCall(inlinedCallerDump, 4<<10); !found {
+ t.Skip("Can't determine whether anything was inlined into inlinedCallerDump.")
+ }
+
+ // inlinedCallerDump
+ // inlinedCalleeDump
+ pcs := make([]uintptr, 2)
+ inlinedCallerDump(pcs)
+ inlinedCallerStack := make([]uint64, 2)
+ for i := range pcs {
+ inlinedCallerStack[i] = uint64(pcs[i])
+ }
+
+ if _, found := findInlinedCall(recursionChainBottom, 4<<10); !found {
+ t.Skip("Can't determine whether anything was inlined into recursionChainBottom.")
+ }
+
+ // recursionChainTop
+ // recursionChainMiddle
+ // recursionChainBottom
+ // recursionChainTop
+ // recursionChainMiddle
+ // recursionChainBottom
+ pcs = make([]uintptr, 6)
+ recursionChainTop(1, pcs)
+ recursionStack := make([]uint64, len(pcs))
+ for i := range pcs {
+ recursionStack[i] = uint64(pcs[i])
+ }
+
+ period := int64(2000 * 1000) // 1/500*1e9 nanosec.
+
+ testCases := []struct {
+ name string
+ input []uint64 // following the input format assumed by profileBuilder.addCPUData.
+ wantLocs [][]string // ordered location entries with function names.
+ wantSamples []*profile.Sample // ordered samples, we care only about Value and the profile location IDs.
+ }{{
+ // Sanity test for a normal, complete stack trace.
+ name: "full_stack_trace",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 5, 0, 50, inlinedCallerStack[0], inlinedCallerStack[1],
+ },
+ wantLocs: [][]string{
+ {"runtime/pprof.inlinedCalleeDump", "runtime/pprof.inlinedCallerDump"},
+ },
+ wantSamples: []*profile.Sample{
+ {Value: []int64{50, 50 * period}, Location: []*profile.Location{{ID: 1}}},
+ },
+ }, {
+ name: "bug35538",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ // Fake frame: tryAdd will have inlinedCallerDump
+ // (stack[1]) on the deck when it encounters the next
+ // inline function. It should accept this.
+ 7, 0, 10, inlinedCallerStack[0], inlinedCallerStack[1], inlinedCallerStack[0], inlinedCallerStack[1],
+ 5, 0, 20, inlinedCallerStack[0], inlinedCallerStack[1],
+ },
+ wantLocs: [][]string{{"runtime/pprof.inlinedCalleeDump", "runtime/pprof.inlinedCallerDump"}},
+ wantSamples: []*profile.Sample{
+ {Value: []int64{10, 10 * period}, Location: []*profile.Location{{ID: 1}, {ID: 1}}},
+ {Value: []int64{20, 20 * period}, Location: []*profile.Location{{ID: 1}}},
+ },
+ }, {
+ name: "bug38096",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ // count (data[2]) == 0 && len(stk) == 1 is an overflow
+ // entry. The "stk" entry is actually the count.
+ 4, 0, 0, 4242,
+ },
+ wantLocs: [][]string{{"runtime/pprof.lostProfileEvent"}},
+ wantSamples: []*profile.Sample{
+ {Value: []int64{4242, 4242 * period}, Location: []*profile.Location{{ID: 1}}},
+ },
+ }, {
+ // If a function is directly called recursively then it must
+ // not be inlined in the caller.
+ //
+ // N.B. We're generating an impossible profile here, with a
+		// recursive inlinedCalleeDump call. This is simulating a non-Go
+ // function that looks like an inlined Go function other than
+ // its recursive property. See pcDeck.tryAdd.
+ name: "directly_recursive_func_is_not_inlined",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 5, 0, 30, inlinedCallerStack[0], inlinedCallerStack[0],
+ 4, 0, 40, inlinedCallerStack[0],
+ },
+ // inlinedCallerDump shows up here because
+ // runtime_expandFinalInlineFrame adds it to the stack frame.
+ wantLocs: [][]string{{"runtime/pprof.inlinedCalleeDump"}, {"runtime/pprof.inlinedCallerDump"}},
+ wantSamples: []*profile.Sample{
+ {Value: []int64{30, 30 * period}, Location: []*profile.Location{{ID: 1}, {ID: 1}, {ID: 2}}},
+ {Value: []int64{40, 40 * period}, Location: []*profile.Location{{ID: 1}, {ID: 2}}},
+ },
+ }, {
+ name: "recursion_chain_inline",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 9, 0, 10, recursionStack[0], recursionStack[1], recursionStack[2], recursionStack[3], recursionStack[4], recursionStack[5],
+ },
+ wantLocs: [][]string{
+ {"runtime/pprof.recursionChainBottom"},
+ {
+ "runtime/pprof.recursionChainMiddle",
+ "runtime/pprof.recursionChainTop",
+ "runtime/pprof.recursionChainBottom",
+ },
+ {
+ "runtime/pprof.recursionChainMiddle",
+ "runtime/pprof.recursionChainTop",
+ "runtime/pprof.TestTryAdd", // inlined into the test.
+ },
+ },
+ wantSamples: []*profile.Sample{
+ {Value: []int64{10, 10 * period}, Location: []*profile.Location{{ID: 1}, {ID: 2}, {ID: 3}}},
+ },
+ }, {
+ name: "truncated_stack_trace_later",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 5, 0, 50, inlinedCallerStack[0], inlinedCallerStack[1],
+ 4, 0, 60, inlinedCallerStack[0],
+ },
+ wantLocs: [][]string{{"runtime/pprof.inlinedCalleeDump", "runtime/pprof.inlinedCallerDump"}},
+ wantSamples: []*profile.Sample{
+ {Value: []int64{50, 50 * period}, Location: []*profile.Location{{ID: 1}}},
+ {Value: []int64{60, 60 * period}, Location: []*profile.Location{{ID: 1}}},
+ },
+ }, {
+ name: "truncated_stack_trace_first",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 4, 0, 70, inlinedCallerStack[0],
+ 5, 0, 80, inlinedCallerStack[0], inlinedCallerStack[1],
+ },
+ wantLocs: [][]string{{"runtime/pprof.inlinedCalleeDump", "runtime/pprof.inlinedCallerDump"}},
+ wantSamples: []*profile.Sample{
+ {Value: []int64{70, 70 * period}, Location: []*profile.Location{{ID: 1}}},
+ {Value: []int64{80, 80 * period}, Location: []*profile.Location{{ID: 1}}},
+ },
+ }, {
+ // We can recover the inlined caller from a truncated stack.
+ name: "truncated_stack_trace_only",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 4, 0, 70, inlinedCallerStack[0],
+ },
+ wantLocs: [][]string{{"runtime/pprof.inlinedCalleeDump", "runtime/pprof.inlinedCallerDump"}},
+ wantSamples: []*profile.Sample{
+ {Value: []int64{70, 70 * period}, Location: []*profile.Location{{ID: 1}}},
+ },
+ }, {
+ // The same location is used for duplicated stacks.
+ name: "truncated_stack_trace_twice",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 4, 0, 70, inlinedCallerStack[0],
+ // Fake frame: add a fake call to
+ // inlinedCallerDump to prevent this sample
+ // from getting merged into above.
+ 5, 0, 80, inlinedCallerStack[1], inlinedCallerStack[0],
+ },
+ wantLocs: [][]string{
+ {"runtime/pprof.inlinedCalleeDump", "runtime/pprof.inlinedCallerDump"},
+ {"runtime/pprof.inlinedCallerDump"},
+ },
+ wantSamples: []*profile.Sample{
+ {Value: []int64{70, 70 * period}, Location: []*profile.Location{{ID: 1}}},
+ {Value: []int64{80, 80 * period}, Location: []*profile.Location{{ID: 2}, {ID: 1}}},
+ },
+ }}
+
+ for _, tc := range testCases {
+ t.Run(tc.name, func(t *testing.T) {
+ p, err := translateCPUProfile(tc.input)
+ if err != nil {
+ t.Fatalf("translating profile: %v", err)
+ }
+ t.Logf("Profile: %v\n", p)
+
+ // One location entry with all inlined functions.
+ var gotLoc [][]string
+ for _, loc := range p.Location {
+ var names []string
+ for _, line := range loc.Line {
+ names = append(names, line.Function.Name)
+ }
+ gotLoc = append(gotLoc, names)
+ }
+ if got, want := fmtJSON(gotLoc), fmtJSON(tc.wantLocs); got != want {
+ t.Errorf("Got Location = %+v\n\twant %+v", got, want)
+ }
+ // All samples should point to one location.
+ var gotSamples []*profile.Sample
+ for _, sample := range p.Sample {
+ var locs []*profile.Location
+ for _, loc := range sample.Location {
+ locs = append(locs, &profile.Location{ID: loc.ID})
+ }
+ gotSamples = append(gotSamples, &profile.Sample{Value: sample.Value, Location: locs})
+ }
+ if got, want := fmtJSON(gotSamples), fmtJSON(tc.wantSamples); got != want {
+ t.Errorf("Got Samples = %+v\n\twant %+v", got, want)
+ }
+ })
+ }
+}
diff --git a/src/runtime/pprof/proto.go b/src/runtime/pprof/proto.go
new file mode 100644
index 0000000..bdb4454
--- /dev/null
+++ b/src/runtime/pprof/proto.go
@@ -0,0 +1,706 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "bytes"
+ "compress/gzip"
+ "fmt"
+ "io"
+ "os"
+ "runtime"
+ "strconv"
+ "time"
+ "unsafe"
+)
+
+// lostProfileEvent is the function to which lost profiling
+// events are attributed.
+// (The name shows up in the pprof graphs.)
+func lostProfileEvent() { lostProfileEvent() }
+
+// funcPC returns the PC for the func value f.
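+// It does this by reading the first word of the funcval that the interface's
+// data word points to; that first word is the function's entry PC.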
+func funcPC(f interface{}) uintptr {
+ return *(*[2]*uintptr)(unsafe.Pointer(&f))[1]
+}
+
+// A profileBuilder writes a profile incrementally from a
+// stream of profile samples delivered by the runtime.
+type profileBuilder struct {
+ start time.Time
+ end time.Time
+ havePeriod bool
+ period int64
+ m profMap
+
+ // encoding state
+ w io.Writer
+ zw *gzip.Writer
+ pb protobuf
+ strings []string
+ stringMap map[string]int
+ locs map[uintptr]locInfo // list of locInfo starting with the given PC.
+ funcs map[string]int // Package path-qualified function name to Function.ID
+ mem []memMap
+ deck pcDeck
+}
+
+type memMap struct {
+ // initialized as reading mapping
+ start uintptr
+ end uintptr
+ offset uint64
+ file, buildID string
+
+ funcs symbolizeFlag
+ fake bool // map entry was faked; /proc/self/maps wasn't available
+}
+
+// symbolizeFlag keeps track of symbolization result.
+// 0 : no symbol lookup was performed
+// 1<<0 (lookupTried) : symbol lookup was performed
+// 1<<1 (lookupFailed): symbol lookup was performed but failed
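+// A memMap whose funcs field equals exactly lookupTried had every lookup
+// succeed; build uses that to set the mapping's HasFunctions bit.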
+type symbolizeFlag uint8
+
+const (
+ lookupTried symbolizeFlag = 1 << iota
+ lookupFailed symbolizeFlag = 1 << iota
+)
+
+const (
+ // message Profile
+ tagProfile_SampleType = 1 // repeated ValueType
+ tagProfile_Sample = 2 // repeated Sample
+ tagProfile_Mapping = 3 // repeated Mapping
+ tagProfile_Location = 4 // repeated Location
+ tagProfile_Function = 5 // repeated Function
+ tagProfile_StringTable = 6 // repeated string
+ tagProfile_DropFrames = 7 // int64 (string table index)
+ tagProfile_KeepFrames = 8 // int64 (string table index)
+ tagProfile_TimeNanos = 9 // int64
+ tagProfile_DurationNanos = 10 // int64
+ tagProfile_PeriodType = 11 // ValueType (really optional string???)
+ tagProfile_Period = 12 // int64
+ tagProfile_Comment = 13 // repeated int64
+ tagProfile_DefaultSampleType = 14 // int64
+
+ // message ValueType
+ tagValueType_Type = 1 // int64 (string table index)
+ tagValueType_Unit = 2 // int64 (string table index)
+
+ // message Sample
+ tagSample_Location = 1 // repeated uint64
+ tagSample_Value = 2 // repeated int64
+ tagSample_Label = 3 // repeated Label
+
+ // message Label
+ tagLabel_Key = 1 // int64 (string table index)
+ tagLabel_Str = 2 // int64 (string table index)
+ tagLabel_Num = 3 // int64
+
+ // message Mapping
+ tagMapping_ID = 1 // uint64
+ tagMapping_Start = 2 // uint64
+ tagMapping_Limit = 3 // uint64
+ tagMapping_Offset = 4 // uint64
+ tagMapping_Filename = 5 // int64 (string table index)
+ tagMapping_BuildID = 6 // int64 (string table index)
+ tagMapping_HasFunctions = 7 // bool
+ tagMapping_HasFilenames = 8 // bool
+ tagMapping_HasLineNumbers = 9 // bool
+ tagMapping_HasInlineFrames = 10 // bool
+
+ // message Location
+ tagLocation_ID = 1 // uint64
+ tagLocation_MappingID = 2 // uint64
+ tagLocation_Address = 3 // uint64
+ tagLocation_Line = 4 // repeated Line
+
+ // message Line
+ tagLine_FunctionID = 1 // uint64
+ tagLine_Line = 2 // int64
+
+ // message Function
+ tagFunction_ID = 1 // uint64
+ tagFunction_Name = 2 // int64 (string table index)
+ tagFunction_SystemName = 3 // int64 (string table index)
+ tagFunction_Filename = 4 // int64 (string table index)
+ tagFunction_StartLine = 5 // int64
+)
+
+// stringIndex adds s to the string table if not already present
+// and returns the index of s in the string table.
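+// Index 0 always holds the empty string, which newProfileBuilder
+// pre-populates to match the profile proto's convention.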
+func (b *profileBuilder) stringIndex(s string) int64 {
+ id, ok := b.stringMap[s]
+ if !ok {
+ id = len(b.strings)
+ b.strings = append(b.strings, s)
+ b.stringMap[s] = id
+ }
+ return int64(id)
+}
+
+func (b *profileBuilder) flush() {
+ const dataFlush = 4096
+ if b.pb.nest == 0 && len(b.pb.data) > dataFlush {
+ b.zw.Write(b.pb.data)
+ b.pb.data = b.pb.data[:0]
+ }
+}
+
+// pbValueType encodes a ValueType message to b.pb.
+func (b *profileBuilder) pbValueType(tag int, typ, unit string) {
+ start := b.pb.startMessage()
+ b.pb.int64(tagValueType_Type, b.stringIndex(typ))
+ b.pb.int64(tagValueType_Unit, b.stringIndex(unit))
+ b.pb.endMessage(tag, start)
+}
+
+// pbSample encodes a Sample message to b.pb.
+func (b *profileBuilder) pbSample(values []int64, locs []uint64, labels func()) {
+ start := b.pb.startMessage()
+ b.pb.int64s(tagSample_Value, values)
+ b.pb.uint64s(tagSample_Location, locs)
+ if labels != nil {
+ labels()
+ }
+ b.pb.endMessage(tagProfile_Sample, start)
+ b.flush()
+}
+
+// pbLabel encodes a Label message to b.pb.
+func (b *profileBuilder) pbLabel(tag int, key, str string, num int64) {
+ start := b.pb.startMessage()
+ b.pb.int64Opt(tagLabel_Key, b.stringIndex(key))
+ b.pb.int64Opt(tagLabel_Str, b.stringIndex(str))
+ b.pb.int64Opt(tagLabel_Num, num)
+ b.pb.endMessage(tag, start)
+}
+
+// pbLine encodes a Line message to b.pb.
+func (b *profileBuilder) pbLine(tag int, funcID uint64, line int64) {
+ start := b.pb.startMessage()
+ b.pb.uint64Opt(tagLine_FunctionID, funcID)
+ b.pb.int64Opt(tagLine_Line, line)
+ b.pb.endMessage(tag, start)
+}
+
+// pbMapping encodes a Mapping message to b.pb.
+func (b *profileBuilder) pbMapping(tag int, id, base, limit, offset uint64, file, buildID string, hasFuncs bool) {
+ start := b.pb.startMessage()
+ b.pb.uint64Opt(tagMapping_ID, id)
+ b.pb.uint64Opt(tagMapping_Start, base)
+ b.pb.uint64Opt(tagMapping_Limit, limit)
+ b.pb.uint64Opt(tagMapping_Offset, offset)
+ b.pb.int64Opt(tagMapping_Filename, b.stringIndex(file))
+ b.pb.int64Opt(tagMapping_BuildID, b.stringIndex(buildID))
+ // TODO: we set HasFunctions if all symbols from samples were symbolized (hasFuncs).
+ // Decide what to do about HasInlineFrames and HasLineNumbers.
+ // Also, another approach to handle the mapping entry with
+	// incomplete symbolization results is to duplicate the mapping
+ // entry (but with different Has* fields values) and use
+ // different entries for symbolized locations and unsymbolized locations.
+ if hasFuncs {
+ b.pb.bool(tagMapping_HasFunctions, true)
+ }
+ b.pb.endMessage(tag, start)
+}
+
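+// allFrames returns all of the Frames corresponding to a single PC, in
+// leaf-to-root order (inlined callees first), together with a flag
+// recording whether symbolization was attempted and whether it failed.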
+func allFrames(addr uintptr) ([]runtime.Frame, symbolizeFlag) {
+ // Expand this one address using CallersFrames so we can cache
+ // each expansion. In general, CallersFrames takes a whole
+ // stack, but in this case we know there will be no skips in
+ // the stack and we have return PCs anyway.
+ frames := runtime.CallersFrames([]uintptr{addr})
+ frame, more := frames.Next()
+ if frame.Function == "runtime.goexit" {
+ // Short-circuit if we see runtime.goexit so the loop
+ // below doesn't allocate a useless empty location.
+ return nil, 0
+ }
+
+ symbolizeResult := lookupTried
+ if frame.PC == 0 || frame.Function == "" || frame.File == "" || frame.Line == 0 {
+ symbolizeResult |= lookupFailed
+ }
+
+ if frame.PC == 0 {
+ // If we failed to resolve the frame, at least make up
+ // a reasonable call PC. This mostly happens in tests.
+ frame.PC = addr - 1
+ }
+ ret := []runtime.Frame{frame}
+	for frame.Function != "runtime.goexit" && more {
+ frame, more = frames.Next()
+ ret = append(ret, frame)
+ }
+ return ret, symbolizeResult
+}
+
+type locInfo struct {
+ // location id assigned by the profileBuilder
+ id uint64
+
+ // sequence of PCs, including the fake PCs returned by the traceback
+ // to represent inlined functions
+ // https://github.com/golang/go/blob/d6f2f833c93a41ec1c68e49804b8387a06b131c5/src/runtime/traceback.go#L347-L368
+ pcs []uintptr
+}
+
+// newProfileBuilder returns a new profileBuilder.
+// CPU profiling data obtained from the runtime can be added
+// by calling b.addCPUData, and then the eventual profile
+// can be obtained by calling b.finish.
+func newProfileBuilder(w io.Writer) *profileBuilder {
+ zw, _ := gzip.NewWriterLevel(w, gzip.BestSpeed)
+ b := &profileBuilder{
+ w: w,
+ zw: zw,
+ start: time.Now(),
+ strings: []string{""},
+ stringMap: map[string]int{"": 0},
+ locs: map[uintptr]locInfo{},
+ funcs: map[string]int{},
+ }
+ b.readMapping()
+ return b
+}
+
+// addCPUData adds the CPU profiling data to the profile.
+// The data must be a whole number of records,
+// as delivered by the runtime.
+func (b *profileBuilder) addCPUData(data []uint64, tags []unsafe.Pointer) error {
+ if !b.havePeriod {
+ // first record is period
+ if len(data) < 3 {
+ return fmt.Errorf("truncated profile")
+ }
+ if data[0] != 3 || data[2] == 0 {
+ return fmt.Errorf("malformed profile")
+ }
+ // data[2] is sampling rate in Hz. Convert to sampling
+ // period in nanoseconds.
+ b.period = 1e9 / int64(data[2])
+ b.havePeriod = true
+ data = data[3:]
+ }
+
+ // Parse CPU samples from the profile.
+ // Each sample is 3+n uint64s:
+ // data[0] = 3+n
+ // data[1] = time stamp (ignored)
+ // data[2] = count
+ // data[3:3+n] = stack
+ // If the count is 0 and the stack has length 1,
+ // that's an overflow record inserted by the runtime
+ // to indicate that stack[0] samples were lost.
+ // Otherwise the count is usually 1,
+ // but in a few special cases like lost non-Go samples
+ // there can be larger counts.
+ // Because many samples with the same stack arrive,
+ // we want to deduplicate immediately, which we do
+ // using the b.m profMap.
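+	// For example, a single sample with count 1 and a two-frame
+	// stack [pc1, pc2] arrives as the five words
+	//   5, <timestamp>, 1, pc1, pc2
+	// and an overflow record reporting 10 lost samples arrives as
+	//   4, <timestamp>, 0, 10.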
+ for len(data) > 0 {
+ if len(data) < 3 || data[0] > uint64(len(data)) {
+ return fmt.Errorf("truncated profile")
+ }
+ if data[0] < 3 || tags != nil && len(tags) < 1 {
+ return fmt.Errorf("malformed profile")
+ }
+ count := data[2]
+ stk := data[3:data[0]]
+ data = data[data[0]:]
+ var tag unsafe.Pointer
+ if tags != nil {
+ tag = tags[0]
+ tags = tags[1:]
+ }
+
+ if count == 0 && len(stk) == 1 {
+ // overflow record
+ count = uint64(stk[0])
+ stk = []uint64{
+ // gentraceback guarantees that PCs in the
+ // stack can be unconditionally decremented and
+ // still be valid, so we must do the same.
+ uint64(funcPC(lostProfileEvent) + 1),
+ }
+ }
+ b.m.lookup(stk, tag).count += int64(count)
+ }
+ return nil
+}
+
+// build completes and returns the constructed profile.
+func (b *profileBuilder) build() {
+ b.end = time.Now()
+
+ b.pb.int64Opt(tagProfile_TimeNanos, b.start.UnixNano())
+ if b.havePeriod { // must be CPU profile
+ b.pbValueType(tagProfile_SampleType, "samples", "count")
+ b.pbValueType(tagProfile_SampleType, "cpu", "nanoseconds")
+ b.pb.int64Opt(tagProfile_DurationNanos, b.end.Sub(b.start).Nanoseconds())
+ b.pbValueType(tagProfile_PeriodType, "cpu", "nanoseconds")
+ b.pb.int64Opt(tagProfile_Period, b.period)
+ }
+
+ values := []int64{0, 0}
+ var locs []uint64
+
+ for e := b.m.all; e != nil; e = e.nextAll {
+ values[0] = e.count
+ values[1] = e.count * b.period
+
+ var labels func()
+ if e.tag != nil {
+ labels = func() {
+ for k, v := range *(*labelMap)(e.tag) {
+ b.pbLabel(tagSample_Label, k, v, 0)
+ }
+ }
+ }
+
+ locs = b.appendLocsForStack(locs[:0], e.stk)
+
+ b.pbSample(values, locs, labels)
+ }
+
+ for i, m := range b.mem {
+ hasFunctions := m.funcs == lookupTried // lookupTried but not lookupFailed
+ b.pbMapping(tagProfile_Mapping, uint64(i+1), uint64(m.start), uint64(m.end), m.offset, m.file, m.buildID, hasFunctions)
+ }
+
+ // TODO: Anything for tagProfile_DropFrames?
+ // TODO: Anything for tagProfile_KeepFrames?
+
+ b.pb.strings(tagProfile_StringTable, b.strings)
+ b.zw.Write(b.pb.data)
+ b.zw.Close()
+}
+
+// appendLocsForStack appends the location IDs for the given stack trace to the given
+// location ID slice, locs. The addresses in the stack are return PCs or 1 + the PC of
+// an inline marker as the runtime traceback function returns.
+//
+// It may emit to b.pb, so there must be no message encoding in progress.
+func (b *profileBuilder) appendLocsForStack(locs []uint64, stk []uintptr) (newLocs []uint64) {
+ b.deck.reset()
+
+ // The last frame might be truncated. Recover lost inline frames.
+ stk = runtime_expandFinalInlineFrame(stk)
+
+ for len(stk) > 0 {
+ addr := stk[0]
+ if l, ok := b.locs[addr]; ok {
+ // first record the location if there is any pending accumulated info.
+ if id := b.emitLocation(); id > 0 {
+ locs = append(locs, id)
+ }
+
+ // then, record the cached location.
+ locs = append(locs, l.id)
+
+ // Skip the matching pcs.
+ //
+ // Even if stk was truncated due to the stack depth
+ // limit, expandFinalInlineFrame above has already
+ // fixed the truncation, ensuring it is long enough.
+ stk = stk[len(l.pcs):]
+ continue
+ }
+
+ frames, symbolizeResult := allFrames(addr)
+ if len(frames) == 0 { // runtime.goexit.
+ if id := b.emitLocation(); id > 0 {
+ locs = append(locs, id)
+ }
+ stk = stk[1:]
+ continue
+ }
+
+ if added := b.deck.tryAdd(addr, frames, symbolizeResult); added {
+ stk = stk[1:]
+ continue
+ }
+ // add failed because this addr is not inlined with the
+ // existing PCs in the deck. Flush the deck and retry handling
+ // this pc.
+ if id := b.emitLocation(); id > 0 {
+ locs = append(locs, id)
+ }
+
+ // check cache again - previous emitLocation added a new entry
+ if l, ok := b.locs[addr]; ok {
+ locs = append(locs, l.id)
+ stk = stk[len(l.pcs):] // skip the matching pcs.
+ } else {
+ b.deck.tryAdd(addr, frames, symbolizeResult) // must succeed.
+ stk = stk[1:]
+ }
+ }
+ if id := b.emitLocation(); id > 0 { // emit remaining location.
+ locs = append(locs, id)
+ }
+ return locs
+}
+
+// pcDeck is a helper to detect a sequence of inlined functions from
+// a stack trace returned by the runtime.
+//
+// The stack traces returned by runtime's traceback functions are fully
+// expanded (at least for Go functions) and include the fake pcs representing
+// inlined functions. The profile proto expects the inlined functions to be
+// encoded in one Location message.
+// https://github.com/google/pprof/blob/5e965273ee43930341d897407202dd5e10e952cb/proto/profile.proto#L177-L184
+//
+// Runtime does not directly expose whether a frame is for an inlined function
+// and looking up debug info is not ideal, so we use a heuristic to filter
+// the fake pcs and restore the inlined and entry functions. Inlined functions
+// have the following properties:
+// Frame's Func is nil (note: also true for non-Go functions), and
+// Frame's Entry matches its entry function frame's Entry (note: could also be true for recursive calls and non-Go functions), and
+// Frame's Name does not match its entry function frame's name (note: inlined functions cannot be directly recursive).
+//
+// As we read and process the pcs in a stack trace one by one (from leaf to root),
+// we use pcDeck to temporarily hold the observed pcs and their expanded frames
+// until we observe the entry function frame.
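+//
+// For example, in TestTryAdd's full_stack_trace case the fake pc for the
+// inlined runtime/pprof.inlinedCalleeDump and the pc of its caller
+// runtime/pprof.inlinedCallerDump collapse into a single Location with two
+// Line entries.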
+type pcDeck struct {
+ pcs []uintptr
+ frames []runtime.Frame
+ symbolizeResult symbolizeFlag
+}
+
+func (d *pcDeck) reset() {
+ d.pcs = d.pcs[:0]
+ d.frames = d.frames[:0]
+ d.symbolizeResult = 0
+}
+
+// tryAdd tries to add the pc and Frames expanded from it (most likely one,
+// since the stack trace is already fully expanded) and the symbolizeResult
+// to the deck. If it fails the caller needs to flush the deck and retry.
+func (d *pcDeck) tryAdd(pc uintptr, frames []runtime.Frame, symbolizeResult symbolizeFlag) (success bool) {
+ if existing := len(d.pcs); existing > 0 {
+ // 'd.frames' are all expanded from one 'pc' and represent all
+ // inlined functions so we check only the last one.
+ newFrame := frames[0]
+ last := d.frames[existing-1]
+ if last.Func != nil { // the last frame can't be inlined. Flush.
+ return false
+ }
+ if last.Entry == 0 || newFrame.Entry == 0 { // Possibly not a Go function. Don't try to merge.
+ return false
+ }
+
+ if last.Entry != newFrame.Entry { // newFrame is for a different function.
+ return false
+ }
+ if last.Function == newFrame.Function { // maybe recursion.
+ return false
+ }
+ }
+ d.pcs = append(d.pcs, pc)
+ d.frames = append(d.frames, frames...)
+ d.symbolizeResult |= symbolizeResult
+ return true
+}
+
+// emitLocation emits the new location and function information recorded in the deck
+// and returns the location ID encoded in the profile protobuf.
+// It emits to b.pb, so there must be no message encoding in progress.
+// It resets the deck.
+func (b *profileBuilder) emitLocation() uint64 {
+ if len(b.deck.pcs) == 0 {
+ return 0
+ }
+ defer b.deck.reset()
+
+ addr := b.deck.pcs[0]
+ firstFrame := b.deck.frames[0]
+
+ // We can't write out functions while in the middle of the
+ // Location message, so record new functions we encounter and
+ // write them out after the Location.
+ type newFunc struct {
+ id uint64
+ name, file string
+ }
+ newFuncs := make([]newFunc, 0, 8)
+
+ id := uint64(len(b.locs)) + 1
+ b.locs[addr] = locInfo{id: id, pcs: append([]uintptr{}, b.deck.pcs...)}
+
+ start := b.pb.startMessage()
+ b.pb.uint64Opt(tagLocation_ID, id)
+ b.pb.uint64Opt(tagLocation_Address, uint64(firstFrame.PC))
+ for _, frame := range b.deck.frames {
+ // Write out each line in frame expansion.
+ funcID := uint64(b.funcs[frame.Function])
+ if funcID == 0 {
+ funcID = uint64(len(b.funcs)) + 1
+ b.funcs[frame.Function] = int(funcID)
+ newFuncs = append(newFuncs, newFunc{funcID, frame.Function, frame.File})
+ }
+ b.pbLine(tagLocation_Line, funcID, int64(frame.Line))
+ }
+ for i := range b.mem {
+ if b.mem[i].start <= addr && addr < b.mem[i].end || b.mem[i].fake {
+ b.pb.uint64Opt(tagLocation_MappingID, uint64(i+1))
+
+ m := b.mem[i]
+ m.funcs |= b.deck.symbolizeResult
+ b.mem[i] = m
+ break
+ }
+ }
+ b.pb.endMessage(tagProfile_Location, start)
+
+ // Write out functions we found during frame expansion.
+ for _, fn := range newFuncs {
+ start := b.pb.startMessage()
+ b.pb.uint64Opt(tagFunction_ID, fn.id)
+ b.pb.int64Opt(tagFunction_Name, b.stringIndex(fn.name))
+ b.pb.int64Opt(tagFunction_SystemName, b.stringIndex(fn.name))
+ b.pb.int64Opt(tagFunction_Filename, b.stringIndex(fn.file))
+ b.pb.endMessage(tagProfile_Function, start)
+ }
+
+ b.flush()
+ return id
+}
+
+// readMapping reads /proc/self/maps and writes mappings to b.pb.
+// It saves the address ranges of the mappings in b.mem for use
+// when emitting locations.
+func (b *profileBuilder) readMapping() {
+ data, _ := os.ReadFile("/proc/self/maps")
+ parseProcSelfMaps(data, b.addMapping)
+ if len(b.mem) == 0 { // pprof expects a map entry, so fake one.
+ b.addMappingEntry(0, 0, 0, "", "", true)
+ // TODO(hyangah): make addMapping return *memMap or
+ // take a memMap struct, and get rid of addMappingEntry
+ // that takes a bunch of positional arguments.
+ }
+}
+
+func parseProcSelfMaps(data []byte, addMapping func(lo, hi, offset uint64, file, buildID string)) {
+ // $ cat /proc/self/maps
+ // 00400000-0040b000 r-xp 00000000 fc:01 787766 /bin/cat
+ // 0060a000-0060b000 r--p 0000a000 fc:01 787766 /bin/cat
+ // 0060b000-0060c000 rw-p 0000b000 fc:01 787766 /bin/cat
+ // 014ab000-014cc000 rw-p 00000000 00:00 0 [heap]
+ // 7f7d76af8000-7f7d7797c000 r--p 00000000 fc:01 1318064 /usr/lib/locale/locale-archive
+ // 7f7d7797c000-7f7d77b36000 r-xp 00000000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+ // 7f7d77b36000-7f7d77d36000 ---p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+ // 7f7d77d36000-7f7d77d3a000 r--p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+ // 7f7d77d3a000-7f7d77d3c000 rw-p 001be000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+ // 7f7d77d3c000-7f7d77d41000 rw-p 00000000 00:00 0
+ // 7f7d77d41000-7f7d77d64000 r-xp 00000000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+ // 7f7d77f3f000-7f7d77f42000 rw-p 00000000 00:00 0
+ // 7f7d77f61000-7f7d77f63000 rw-p 00000000 00:00 0
+ // 7f7d77f63000-7f7d77f64000 r--p 00022000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+ // 7f7d77f64000-7f7d77f65000 rw-p 00023000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+ // 7f7d77f65000-7f7d77f66000 rw-p 00000000 00:00 0
+ // 7ffc342a2000-7ffc342c3000 rw-p 00000000 00:00 0 [stack]
+ // 7ffc34343000-7ffc34345000 r-xp 00000000 00:00 0 [vdso]
+ // ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
+
+ var line []byte
+ // next removes and returns the next field in the line.
+ // It also removes from line any spaces following the field.
+ next := func() []byte {
+ j := bytes.IndexByte(line, ' ')
+ if j < 0 {
+ f := line
+ line = nil
+ return f
+ }
+ f := line[:j]
+ line = line[j+1:]
+ for len(line) > 0 && line[0] == ' ' {
+ line = line[1:]
+ }
+ return f
+ }
+
+ for len(data) > 0 {
+ i := bytes.IndexByte(data, '\n')
+ if i < 0 {
+ line, data = data, nil
+ } else {
+ line, data = data[:i], data[i+1:]
+ }
+ addr := next()
+ i = bytes.IndexByte(addr, '-')
+ if i < 0 {
+ continue
+ }
+ lo, err := strconv.ParseUint(string(addr[:i]), 16, 64)
+ if err != nil {
+ continue
+ }
+ hi, err := strconv.ParseUint(string(addr[i+1:]), 16, 64)
+ if err != nil {
+ continue
+ }
+ perm := next()
+ if len(perm) < 4 || perm[2] != 'x' {
+ // Only interested in executable mappings.
+ continue
+ }
+ offset, err := strconv.ParseUint(string(next()), 16, 64)
+ if err != nil {
+ continue
+ }
+ next() // dev
+ inode := next() // inode
+ if line == nil {
+ continue
+ }
+ file := string(line)
+
+ // Trim deleted file marker.
+ deletedStr := " (deleted)"
+ deletedLen := len(deletedStr)
+ if len(file) >= deletedLen && file[len(file)-deletedLen:] == deletedStr {
+ file = file[:len(file)-deletedLen]
+ }
+
+ if len(inode) == 1 && inode[0] == '0' && file == "" {
+ // Huge-page text mappings list the initial fragment of
+ // mapped but unpopulated memory as being inode 0.
+ // Don't report that part.
+ // But [vdso] and [vsyscall] are inode 0, so let non-empty file names through.
+ continue
+ }
+
+ // TODO: pprof's remapMappingIDs makes two adjustments:
+ // 1. If there is an /anon_hugepage mapping first and it is
+ // consecutive to a next mapping, drop the /anon_hugepage.
+ // 2. If start-offset = 0x400000, change start to 0x400000 and offset to 0.
+ // There's no indication why either of these is needed.
+ // Let's try not doing these and see what breaks.
+ // If we do need them, they would go here, before we
+ // enter the mappings into b.mem in the first place.
+
+ buildID, _ := elfBuildID(file)
+ addMapping(lo, hi, offset, file, buildID)
+ }
+}
+
+func (b *profileBuilder) addMapping(lo, hi, offset uint64, file, buildID string) {
+ b.addMappingEntry(lo, hi, offset, file, buildID, false)
+}
+
+func (b *profileBuilder) addMappingEntry(lo, hi, offset uint64, file, buildID string, fake bool) {
+ b.mem = append(b.mem, memMap{
+ start: uintptr(lo),
+ end: uintptr(hi),
+ offset: offset,
+ file: file,
+ buildID: buildID,
+ fake: fake,
+ })
+}
diff --git a/src/runtime/pprof/proto_test.go b/src/runtime/pprof/proto_test.go
new file mode 100644
index 0000000..5eb1aab
--- /dev/null
+++ b/src/runtime/pprof/proto_test.go
@@ -0,0 +1,436 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "bytes"
+ "encoding/json"
+ "fmt"
+ "internal/profile"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "reflect"
+ "runtime"
+ "strings"
+ "testing"
+)
+
+// translateCPUProfile parses binary CPU profiling stack trace data
+// generated by runtime.CPUProfile() into a profile struct.
+// This is only used for testing. Real conversions stream the
+// data into the profileBuilder as it becomes available.
+func translateCPUProfile(data []uint64) (*profile.Profile, error) {
+ var buf bytes.Buffer
+ b := newProfileBuilder(&buf)
+ if err := b.addCPUData(data, nil); err != nil {
+ return nil, err
+ }
+ b.build()
+ return profile.Parse(&buf)
+}
+
+// fmtJSON returns a pretty-printed JSON form for x.
+// It works reasonably well for printing protocol-buffer
+// data structures like profile.Profile.
+func fmtJSON(x interface{}) string {
+ js, _ := json.MarshalIndent(x, "", "\t")
+ return string(js)
+}
+
+func TestConvertCPUProfileEmpty(t *testing.T) {
+ // A test server with mock cpu profile data.
+	// Mock CPU profile data for an empty profile.
+
+ b := []uint64{3, 0, 500} // empty profile at 500 Hz (2ms sample period)
+ p, err := translateCPUProfile(b)
+ if err != nil {
+ t.Fatalf("translateCPUProfile: %v", err)
+ }
+ if err := p.Write(&buf); err != nil {
+ t.Fatalf("writing profile: %v", err)
+ }
+
+ p, err = profile.Parse(&buf)
+ if err != nil {
+ t.Fatalf("profile.Parse: %v", err)
+ }
+
+ // Expected PeriodType and SampleType.
+ periodType := &profile.ValueType{Type: "cpu", Unit: "nanoseconds"}
+ sampleType := []*profile.ValueType{
+ {Type: "samples", Unit: "count"},
+ {Type: "cpu", Unit: "nanoseconds"},
+ }
+
+ checkProfile(t, p, 2000*1000, periodType, sampleType, nil, "")
+}
+
+func f1() { f1() }
+func f2() { f2() }
+
+// testPCs returns two PCs and two corresponding memory mappings
+// to use in test profiles.
+func testPCs(t *testing.T) (addr1, addr2 uint64, map1, map2 *profile.Mapping) {
+ switch runtime.GOOS {
+ case "linux", "android", "netbsd":
+ // Figure out two addresses from /proc/self/maps.
+ mmap, err := os.ReadFile("/proc/self/maps")
+ if err != nil {
+ t.Fatal(err)
+ }
+ mprof := &profile.Profile{}
+ if err = mprof.ParseMemoryMap(bytes.NewReader(mmap)); err != nil {
+ t.Fatalf("parsing /proc/self/maps: %v", err)
+ }
+ if len(mprof.Mapping) < 2 {
+ // It is possible for a binary to only have 1 executable
+ // region of memory.
+ t.Skipf("need 2 or more mappings, got %v", len(mprof.Mapping))
+ }
+ addr1 = mprof.Mapping[0].Start
+ map1 = mprof.Mapping[0]
+ map1.BuildID, _ = elfBuildID(map1.File)
+ addr2 = mprof.Mapping[1].Start
+ map2 = mprof.Mapping[1]
+ map2.BuildID, _ = elfBuildID(map2.File)
+ case "js":
+ addr1 = uint64(funcPC(f1))
+ addr2 = uint64(funcPC(f2))
+ default:
+ addr1 = uint64(funcPC(f1))
+ addr2 = uint64(funcPC(f2))
+ // Fake mapping - HasFunctions will be true because two PCs from Go
+ // will be fully symbolized.
+ fake := &profile.Mapping{ID: 1, HasFunctions: true}
+ map1, map2 = fake, fake
+ }
+ return
+}
+
+func TestConvertCPUProfile(t *testing.T) {
+ addr1, addr2, map1, map2 := testPCs(t)
+
+ b := []uint64{
+ 3, 0, 500, // hz = 500
+ 5, 0, 10, uint64(addr1 + 1), uint64(addr1 + 2), // 10 samples in addr1
+ 5, 0, 40, uint64(addr2 + 1), uint64(addr2 + 2), // 40 samples in addr2
+ 5, 0, 10, uint64(addr1 + 1), uint64(addr1 + 2), // 10 samples in addr1
+ }
+ p, err := translateCPUProfile(b)
+ if err != nil {
+ t.Fatalf("translating profile: %v", err)
+ }
+ period := int64(2000 * 1000)
+ periodType := &profile.ValueType{Type: "cpu", Unit: "nanoseconds"}
+ sampleType := []*profile.ValueType{
+ {Type: "samples", Unit: "count"},
+ {Type: "cpu", Unit: "nanoseconds"},
+ }
+ samples := []*profile.Sample{
+ {Value: []int64{20, 20 * 2000 * 1000}, Location: []*profile.Location{
+ {ID: 1, Mapping: map1, Address: addr1},
+ {ID: 2, Mapping: map1, Address: addr1 + 1},
+ }},
+ {Value: []int64{40, 40 * 2000 * 1000}, Location: []*profile.Location{
+ {ID: 3, Mapping: map2, Address: addr2},
+ {ID: 4, Mapping: map2, Address: addr2 + 1},
+ }},
+ }
+ checkProfile(t, p, period, periodType, sampleType, samples, "")
+}
+
+func checkProfile(t *testing.T, p *profile.Profile, period int64, periodType *profile.ValueType, sampleType []*profile.ValueType, samples []*profile.Sample, defaultSampleType string) {
+ t.Helper()
+
+ if p.Period != period {
+ t.Errorf("p.Period = %d, want %d", p.Period, period)
+ }
+ if !reflect.DeepEqual(p.PeriodType, periodType) {
+ t.Errorf("p.PeriodType = %v\nwant = %v", fmtJSON(p.PeriodType), fmtJSON(periodType))
+ }
+ if !reflect.DeepEqual(p.SampleType, sampleType) {
+ t.Errorf("p.SampleType = %v\nwant = %v", fmtJSON(p.SampleType), fmtJSON(sampleType))
+ }
+ if defaultSampleType != p.DefaultSampleType {
+ t.Errorf("p.DefaultSampleType = %v\nwant = %v", p.DefaultSampleType, defaultSampleType)
+ }
+ // Clear line info since it is not in the expected samples.
+ // If we used f1 and f2 above, then the samples will have line info.
+ for _, s := range p.Sample {
+ for _, l := range s.Location {
+ l.Line = nil
+ }
+ }
+ if fmtJSON(p.Sample) != fmtJSON(samples) { // ignore unexported fields
+ if len(p.Sample) == len(samples) {
+ for i := range p.Sample {
+ if !reflect.DeepEqual(p.Sample[i], samples[i]) {
+ t.Errorf("sample %d = %v\nwant = %v\n", i, fmtJSON(p.Sample[i]), fmtJSON(samples[i]))
+ }
+ }
+ if t.Failed() {
+ t.FailNow()
+ }
+ }
+ t.Fatalf("p.Sample = %v\nwant = %v", fmtJSON(p.Sample), fmtJSON(samples))
+ }
+}
+
+var profSelfMapsTests = `
+00400000-0040b000 r-xp 00000000 fc:01 787766 /bin/cat
+0060a000-0060b000 r--p 0000a000 fc:01 787766 /bin/cat
+0060b000-0060c000 rw-p 0000b000 fc:01 787766 /bin/cat
+014ab000-014cc000 rw-p 00000000 00:00 0 [heap]
+7f7d76af8000-7f7d7797c000 r--p 00000000 fc:01 1318064 /usr/lib/locale/locale-archive
+7f7d7797c000-7f7d77b36000 r-xp 00000000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77b36000-7f7d77d36000 ---p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d36000-7f7d77d3a000 r--p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d3a000-7f7d77d3c000 rw-p 001be000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d3c000-7f7d77d41000 rw-p 00000000 00:00 0
+7f7d77d41000-7f7d77d64000 r-xp 00000000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f3f000-7f7d77f42000 rw-p 00000000 00:00 0
+7f7d77f61000-7f7d77f63000 rw-p 00000000 00:00 0
+7f7d77f63000-7f7d77f64000 r--p 00022000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f64000-7f7d77f65000 rw-p 00023000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f65000-7f7d77f66000 rw-p 00000000 00:00 0
+7ffc342a2000-7ffc342c3000 rw-p 00000000 00:00 0 [stack]
+7ffc34343000-7ffc34345000 r-xp 00000000 00:00 0 [vdso]
+ffffffffff600000-ffffffffff601000 r-xp 00000090 00:00 0 [vsyscall]
+->
+00400000 0040b000 00000000 /bin/cat
+7f7d7797c000 7f7d77b36000 00000000 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d41000 7f7d77d64000 00000000 /lib/x86_64-linux-gnu/ld-2.19.so
+7ffc34343000 7ffc34345000 00000000 [vdso]
+ffffffffff600000 ffffffffff601000 00000090 [vsyscall]
+
+00400000-07000000 r-xp 00000000 00:00 0
+07000000-07093000 r-xp 06c00000 00:2e 536754 /path/to/gobench_server_main
+07093000-0722d000 rw-p 06c92000 00:2e 536754 /path/to/gobench_server_main
+0722d000-07b21000 rw-p 00000000 00:00 0
+c000000000-c000036000 rw-p 00000000 00:00 0
+->
+07000000 07093000 06c00000 /path/to/gobench_server_main
+`
+
+var profSelfMapsTestsWithDeleted = `
+00400000-0040b000 r-xp 00000000 fc:01 787766 /bin/cat (deleted)
+0060a000-0060b000 r--p 0000a000 fc:01 787766 /bin/cat (deleted)
+0060b000-0060c000 rw-p 0000b000 fc:01 787766 /bin/cat (deleted)
+014ab000-014cc000 rw-p 00000000 00:00 0 [heap]
+7f7d76af8000-7f7d7797c000 r--p 00000000 fc:01 1318064 /usr/lib/locale/locale-archive
+7f7d7797c000-7f7d77b36000 r-xp 00000000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77b36000-7f7d77d36000 ---p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d36000-7f7d77d3a000 r--p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d3a000-7f7d77d3c000 rw-p 001be000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d3c000-7f7d77d41000 rw-p 00000000 00:00 0
+7f7d77d41000-7f7d77d64000 r-xp 00000000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f3f000-7f7d77f42000 rw-p 00000000 00:00 0
+7f7d77f61000-7f7d77f63000 rw-p 00000000 00:00 0
+7f7d77f63000-7f7d77f64000 r--p 00022000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f64000-7f7d77f65000 rw-p 00023000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f65000-7f7d77f66000 rw-p 00000000 00:00 0
+7ffc342a2000-7ffc342c3000 rw-p 00000000 00:00 0 [stack]
+7ffc34343000-7ffc34345000 r-xp 00000000 00:00 0 [vdso]
+ffffffffff600000-ffffffffff601000 r-xp 00000090 00:00 0 [vsyscall]
+->
+00400000 0040b000 00000000 /bin/cat
+7f7d7797c000 7f7d77b36000 00000000 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d41000 7f7d77d64000 00000000 /lib/x86_64-linux-gnu/ld-2.19.so
+7ffc34343000 7ffc34345000 00000000 [vdso]
+ffffffffff600000 ffffffffff601000 00000090 [vsyscall]
+
+00400000-0040b000 r-xp 00000000 fc:01 787766 /bin/cat with space
+0060a000-0060b000 r--p 0000a000 fc:01 787766 /bin/cat with space
+0060b000-0060c000 rw-p 0000b000 fc:01 787766 /bin/cat with space
+014ab000-014cc000 rw-p 00000000 00:00 0 [heap]
+7f7d76af8000-7f7d7797c000 r--p 00000000 fc:01 1318064 /usr/lib/locale/locale-archive
+7f7d7797c000-7f7d77b36000 r-xp 00000000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77b36000-7f7d77d36000 ---p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d36000-7f7d77d3a000 r--p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d3a000-7f7d77d3c000 rw-p 001be000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d3c000-7f7d77d41000 rw-p 00000000 00:00 0
+7f7d77d41000-7f7d77d64000 r-xp 00000000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f3f000-7f7d77f42000 rw-p 00000000 00:00 0
+7f7d77f61000-7f7d77f63000 rw-p 00000000 00:00 0
+7f7d77f63000-7f7d77f64000 r--p 00022000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f64000-7f7d77f65000 rw-p 00023000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f65000-7f7d77f66000 rw-p 00000000 00:00 0
+7ffc342a2000-7ffc342c3000 rw-p 00000000 00:00 0 [stack]
+7ffc34343000-7ffc34345000 r-xp 00000000 00:00 0 [vdso]
+ffffffffff600000-ffffffffff601000 r-xp 00000090 00:00 0 [vsyscall]
+->
+00400000 0040b000 00000000 /bin/cat with space
+7f7d7797c000 7f7d77b36000 00000000 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d41000 7f7d77d64000 00000000 /lib/x86_64-linux-gnu/ld-2.19.so
+7ffc34343000 7ffc34345000 00000000 [vdso]
+ffffffffff600000 ffffffffff601000 00000090 [vsyscall]
+`
+
+func TestProcSelfMaps(t *testing.T) {
+
+ f := func(t *testing.T, input string) {
+ for tx, tt := range strings.Split(input, "\n\n") {
+ i := strings.Index(tt, "->\n")
+ if i < 0 {
+ t.Fatal("malformed test case")
+ }
+ in, out := tt[:i], tt[i+len("->\n"):]
+ if len(out) > 0 && out[len(out)-1] != '\n' {
+ out += "\n"
+ }
+ var buf bytes.Buffer
+ parseProcSelfMaps([]byte(in), func(lo, hi, offset uint64, file, buildID string) {
+ fmt.Fprintf(&buf, "%08x %08x %08x %s\n", lo, hi, offset, file)
+ })
+ if buf.String() != out {
+ t.Errorf("#%d: have:\n%s\nwant:\n%s\n%q\n%q", tx, buf.String(), out, buf.String(), out)
+ }
+ }
+ }
+
+ t.Run("Normal", func(t *testing.T) {
+ f(t, profSelfMapsTests)
+ })
+
+ t.Run("WithDeletedFile", func(t *testing.T) {
+ f(t, profSelfMapsTestsWithDeleted)
+ })
+}
+
+// TestMapping checks that the mapping section of CPU profiles
+// has the HasFunctions field set correctly. If all PCs included
+// in the samples are successfully symbolized, the corresponding
+// mapping entry (in this test case, only one entry) should have
+// its HasFunctions field set true.
+// The test generates a CPU profile that includes PCs from C side
+// that the runtime can't symbolize. See ./testdata/mappingtest.
+func TestMapping(t *testing.T) {
+ testenv.MustHaveGoRun(t)
+ testenv.MustHaveCGO(t)
+
+ prog := "./testdata/mappingtest/main.go"
+
+ // GoOnly includes only Go symbols that runtime will symbolize.
+ // Go+C includes C symbols that runtime will not symbolize.
+ for _, traceback := range []string{"GoOnly", "Go+C"} {
+ t.Run("traceback"+traceback, func(t *testing.T) {
+ cmd := exec.Command(testenv.GoToolPath(t), "run", prog)
+ if traceback != "GoOnly" {
+ cmd.Env = append(os.Environ(), "SETCGOTRACEBACK=1")
+ }
+ cmd.Stderr = new(bytes.Buffer)
+
+ out, err := cmd.Output()
+ if err != nil {
+ t.Fatalf("failed to run the test program %q: %v\n%v", prog, err, cmd.Stderr)
+ }
+
+ prof, err := profile.Parse(bytes.NewReader(out))
+ if err != nil {
+ t.Fatalf("failed to parse the generated profile data: %v", err)
+ }
+ t.Logf("Profile: %s", prof)
+
+ hit := make(map[*profile.Mapping]bool)
+ miss := make(map[*profile.Mapping]bool)
+ for _, loc := range prof.Location {
+ if symbolized(loc) {
+ hit[loc.Mapping] = true
+ } else {
+ miss[loc.Mapping] = true
+ }
+ }
+ if len(miss) == 0 {
+ t.Log("no location with missing symbol info was sampled")
+ }
+
+ for _, m := range prof.Mapping {
+ if miss[m] && m.HasFunctions {
+ t.Errorf("mapping %+v has HasFunctions=true, but contains locations with failed symbolization", m)
+ continue
+ }
+ if !miss[m] && hit[m] && !m.HasFunctions {
+					t.Errorf("mapping %+v has HasFunctions=false, but all referenced locations from this mapping were symbolized successfully", m)
+ continue
+ }
+ }
+
+ if traceback == "Go+C" {
+ // The test code was arranged to have PCs from C and
+ // they are not symbolized.
+ // Check no Location containing those unsymbolized PCs contains multiple lines.
+ for i, loc := range prof.Location {
+ if !symbolized(loc) && len(loc.Line) > 1 {
+ t.Errorf("Location[%d] contains unsymbolized PCs and multiple lines: %v", i, loc)
+ }
+ }
+ }
+ })
+ }
+}
+
+func symbolized(loc *profile.Location) bool {
+ if len(loc.Line) == 0 {
+ return false
+ }
+ l := loc.Line[0]
+ f := l.Function
+ if l.Line == 0 || f == nil || f.Name == "" || f.Filename == "" {
+ return false
+ }
+ return true
+}
+
+// TestFakeMapping checks that at least one mapping exists
+// (possibly a fake one) and that the HasFunctions bits of the
+// mappings are set correctly.
+func TestFakeMapping(t *testing.T) {
+ var buf bytes.Buffer
+ if err := Lookup("heap").WriteTo(&buf, 0); err != nil {
+ t.Fatalf("failed to write heap profile: %v", err)
+ }
+ prof, err := profile.Parse(&buf)
+ if err != nil {
+ t.Fatalf("failed to parse the generated profile data: %v", err)
+ }
+ t.Logf("Profile: %s", prof)
+ if len(prof.Mapping) == 0 {
+ t.Fatal("want profile with at least one mapping entry, got 0 mapping")
+ }
+
+ hit := make(map[*profile.Mapping]bool)
+ miss := make(map[*profile.Mapping]bool)
+ for _, loc := range prof.Location {
+ if symbolized(loc) {
+ hit[loc.Mapping] = true
+ } else {
+ miss[loc.Mapping] = true
+ }
+ }
+ for _, m := range prof.Mapping {
+ if miss[m] && m.HasFunctions {
+ t.Errorf("mapping %+v has HasFunctions=true, but contains locations with failed symbolization", m)
+ continue
+ }
+ if !miss[m] && hit[m] && !m.HasFunctions {
+			t.Errorf("mapping %+v has HasFunctions=false, but all referenced locations from this mapping were symbolized successfully", m)
+ continue
+ }
+ }
+}
+
+// Make sure the profiler can handle an empty stack trace.
+// See issue 37967.
+func TestEmptyStack(t *testing.T) {
+ b := []uint64{
+ 3, 0, 500, // hz = 500
+ 3, 0, 10, // 10 samples with an empty stack trace
+ }
+ _, err := translateCPUProfile(b)
+ if err != nil {
+ t.Fatalf("translating profile: %v", err)
+ }
+}
diff --git a/src/runtime/pprof/protobuf.go b/src/runtime/pprof/protobuf.go
new file mode 100644
index 0000000..7b99095
--- /dev/null
+++ b/src/runtime/pprof/protobuf.go
@@ -0,0 +1,141 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+// A protobuf is a simple protocol buffer encoder.
+type protobuf struct {
+ data []byte
+ tmp [16]byte
+ nest int
+}
+
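+// varint appends x to b.data using the protocol buffer base-128 varint
+// encoding: seven bits per byte, least-significant group first, with the
+// high bit set on every byte except the last. For example, 300 is encoded
+// as the two bytes 0xac 0x02.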
+func (b *protobuf) varint(x uint64) {
+ for x >= 128 {
+ b.data = append(b.data, byte(x)|0x80)
+ x >>= 7
+ }
+ b.data = append(b.data, byte(x))
+}
+
+func (b *protobuf) length(tag int, len int) {
+ b.varint(uint64(tag)<<3 | 2)
+ b.varint(uint64(len))
+}
+
+func (b *protobuf) uint64(tag int, x uint64) {
+ // append varint to b.data
+ b.varint(uint64(tag)<<3 | 0)
+ b.varint(x)
+}
+
+func (b *protobuf) uint64s(tag int, x []uint64) {
+ if len(x) > 2 {
+ // Use packed encoding
+ n1 := len(b.data)
+ for _, u := range x {
+ b.varint(u)
+ }
+ n2 := len(b.data)
+ b.length(tag, n2-n1)
+ n3 := len(b.data)
+ copy(b.tmp[:], b.data[n2:n3])
+ copy(b.data[n1+(n3-n2):], b.data[n1:n2])
+ copy(b.data[n1:], b.tmp[:n3-n2])
+ return
+ }
+ for _, u := range x {
+ b.uint64(tag, u)
+ }
+}
+
+func (b *protobuf) uint64Opt(tag int, x uint64) {
+ if x == 0 {
+ return
+ }
+ b.uint64(tag, x)
+}
+
+func (b *protobuf) int64(tag int, x int64) {
+ u := uint64(x)
+ b.uint64(tag, u)
+}
+
+func (b *protobuf) int64Opt(tag int, x int64) {
+ if x == 0 {
+ return
+ }
+ b.int64(tag, x)
+}
+
+func (b *protobuf) int64s(tag int, x []int64) {
+ if len(x) > 2 {
+ // Use packed encoding
+ n1 := len(b.data)
+ for _, u := range x {
+ b.varint(uint64(u))
+ }
+ n2 := len(b.data)
+ b.length(tag, n2-n1)
+ n3 := len(b.data)
+ copy(b.tmp[:], b.data[n2:n3])
+ copy(b.data[n1+(n3-n2):], b.data[n1:n2])
+ copy(b.data[n1:], b.tmp[:n3-n2])
+ return
+ }
+ for _, u := range x {
+ b.int64(tag, u)
+ }
+}
+
+func (b *protobuf) string(tag int, x string) {
+ b.length(tag, len(x))
+ b.data = append(b.data, x...)
+}
+
+func (b *protobuf) strings(tag int, x []string) {
+ for _, s := range x {
+ b.string(tag, s)
+ }
+}
+
+func (b *protobuf) stringOpt(tag int, x string) {
+ if x == "" {
+ return
+ }
+ b.string(tag, x)
+}
+
+func (b *protobuf) bool(tag int, x bool) {
+ if x {
+ b.uint64(tag, 1)
+ } else {
+ b.uint64(tag, 0)
+ }
+}
+
+func (b *protobuf) boolOpt(tag int, x bool) {
+	if !x {
+ return
+ }
+ b.bool(tag, x)
+}
+
+type msgOffset int
+
+func (b *protobuf) startMessage() msgOffset {
+ b.nest++
+ return msgOffset(len(b.data))
+}
+
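+// endMessage closes a nested message started by startMessage: it appends the
+// tag and byte length of the message body to b.data, then rotates those
+// header bytes (via b.tmp) to just before the body, so the final layout is
+// header followed by body.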
+func (b *protobuf) endMessage(tag int, start msgOffset) {
+ n1 := int(start)
+ n2 := len(b.data)
+ b.length(tag, n2-n1)
+ n3 := len(b.data)
+ copy(b.tmp[:], b.data[n2:n3])
+ copy(b.data[n1+(n3-n2):], b.data[n1:n2])
+ copy(b.data[n1:], b.tmp[:n3-n2])
+ b.nest--
+}
diff --git a/src/runtime/pprof/protomem.go b/src/runtime/pprof/protomem.go
new file mode 100644
index 0000000..fa75a28
--- /dev/null
+++ b/src/runtime/pprof/protomem.go
@@ -0,0 +1,93 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "io"
+ "math"
+ "runtime"
+ "strings"
+)
+
+// writeHeapProto writes the current heap profile in protobuf format to w.
+func writeHeapProto(w io.Writer, p []runtime.MemProfileRecord, rate int64, defaultSampleType string) error {
+ b := newProfileBuilder(w)
+ b.pbValueType(tagProfile_PeriodType, "space", "bytes")
+ b.pb.int64Opt(tagProfile_Period, rate)
+ b.pbValueType(tagProfile_SampleType, "alloc_objects", "count")
+ b.pbValueType(tagProfile_SampleType, "alloc_space", "bytes")
+ b.pbValueType(tagProfile_SampleType, "inuse_objects", "count")
+ b.pbValueType(tagProfile_SampleType, "inuse_space", "bytes")
+ if defaultSampleType != "" {
+ b.pb.int64Opt(tagProfile_DefaultSampleType, b.stringIndex(defaultSampleType))
+ }
+
+ values := []int64{0, 0, 0, 0}
+ var locs []uint64
+ for _, r := range p {
+ hideRuntime := true
+ for tries := 0; tries < 2; tries++ {
+ stk := r.Stack()
+ // For heap profiles, all stack
+ // addresses are return PCs, which is
+ // what appendLocsForStack expects.
+ if hideRuntime {
+ for i, addr := range stk {
+ if f := runtime.FuncForPC(addr); f != nil && strings.HasPrefix(f.Name(), "runtime.") {
+ continue
+ }
+ // Found non-runtime. Show any runtime uses above it.
+ stk = stk[i:]
+ break
+ }
+ }
+ locs = b.appendLocsForStack(locs[:0], stk)
+ if len(locs) > 0 {
+ break
+ }
+ hideRuntime = false // try again, and show all frames next time.
+ }
+
+ values[0], values[1] = scaleHeapSample(r.AllocObjects, r.AllocBytes, rate)
+ values[2], values[3] = scaleHeapSample(r.InUseObjects(), r.InUseBytes(), rate)
+ var blockSize int64
+ if r.AllocObjects > 0 {
+ blockSize = r.AllocBytes / r.AllocObjects
+ }
+ b.pbSample(values, locs, func() {
+ if blockSize != 0 {
+ b.pbLabel(tagSample_Label, "bytes", "", blockSize)
+ }
+ })
+ }
+ b.build()
+ return nil
+}
+
+// scaleHeapSample adjusts the data from a heap Sample to
+// account for its probability of appearing in the collected
+// data. Heap profiles are a sampling of the memory allocation
+// requests in a program. We estimate the unsampled value by dividing
+// each collected sample by its probability of appearing in the
+// profile. Heap profiles rely on a Poisson process to determine
+// which samples to collect, based on the desired average collection
+// rate R. The probability of a sample of size S appearing in that
+// profile is 1-exp(-S/R).
+func scaleHeapSample(count, size, rate int64) (int64, int64) {
+ if count == 0 || size == 0 {
+ return 0, 0
+ }
+
+ if rate <= 1 {
+ // if rate==1 all samples were collected so no adjustment is needed.
+ // if rate<1 treat as unknown and skip scaling.
+ return count, size
+ }
+
+ avgSize := float64(size) / float64(count)
+ scale := 1 / (1 - math.Exp(-avgSize/float64(rate)))
+
+ return int64(float64(count) * scale), int64(float64(size) * scale)
+}
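// A worked example of the scaling above, using the numbers that appear in
// TestConvertMemProfile later in this change: with rate = 512*1024 and a
// record with AllocObjects = 4, AllocBytes = 4096, the average sample size
// is 4096/4 = 1024 bytes, so
//
//	scale = 1 / (1 - exp(-1024/524288)) ≈ 512.5
//
// giving roughly 4*512.5 = 2050 objects and 4096*512.5 = 2099200 bytes,
// which matches the first expected sample in that test.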
diff --git a/src/runtime/pprof/protomem_test.go b/src/runtime/pprof/protomem_test.go
new file mode 100644
index 0000000..156f628
--- /dev/null
+++ b/src/runtime/pprof/protomem_test.go
@@ -0,0 +1,84 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "bytes"
+ "internal/profile"
+ "runtime"
+ "testing"
+)
+
+func TestConvertMemProfile(t *testing.T) {
+ addr1, addr2, map1, map2 := testPCs(t)
+
+ // MemProfileRecord stacks are return PCs, so add one to the
+ // addresses recorded in the "profile". The proto profile
+ // locations are call PCs, so conversion will subtract one
+ // from these and get back to addr1 and addr2.
+ a1, a2 := uintptr(addr1)+1, uintptr(addr2)+1
+ rate := int64(512 * 1024)
+ rec := []runtime.MemProfileRecord{
+ {AllocBytes: 4096, FreeBytes: 1024, AllocObjects: 4, FreeObjects: 1, Stack0: [32]uintptr{a1, a2}},
+ {AllocBytes: 512 * 1024, FreeBytes: 0, AllocObjects: 1, FreeObjects: 0, Stack0: [32]uintptr{a2 + 1, a2 + 2}},
+ {AllocBytes: 512 * 1024, FreeBytes: 512 * 1024, AllocObjects: 1, FreeObjects: 1, Stack0: [32]uintptr{a1 + 1, a1 + 2, a2 + 3}},
+ }
+
+ periodType := &profile.ValueType{Type: "space", Unit: "bytes"}
+ sampleType := []*profile.ValueType{
+ {Type: "alloc_objects", Unit: "count"},
+ {Type: "alloc_space", Unit: "bytes"},
+ {Type: "inuse_objects", Unit: "count"},
+ {Type: "inuse_space", Unit: "bytes"},
+ }
+ samples := []*profile.Sample{
+ {
+ Value: []int64{2050, 2099200, 1537, 1574400},
+ Location: []*profile.Location{
+ {ID: 1, Mapping: map1, Address: addr1},
+ {ID: 2, Mapping: map2, Address: addr2},
+ },
+ NumLabel: map[string][]int64{"bytes": {1024}},
+ },
+ {
+ Value: []int64{1, 829411, 1, 829411},
+ Location: []*profile.Location{
+ {ID: 3, Mapping: map2, Address: addr2 + 1},
+ {ID: 4, Mapping: map2, Address: addr2 + 2},
+ },
+ NumLabel: map[string][]int64{"bytes": {512 * 1024}},
+ },
+ {
+ Value: []int64{1, 829411, 0, 0},
+ Location: []*profile.Location{
+ {ID: 5, Mapping: map1, Address: addr1 + 1},
+ {ID: 6, Mapping: map1, Address: addr1 + 2},
+ {ID: 7, Mapping: map2, Address: addr2 + 3},
+ },
+ NumLabel: map[string][]int64{"bytes": {512 * 1024}},
+ },
+ }
+ for _, tc := range []struct {
+ name string
+ defaultSampleType string
+ }{
+ {"heap", ""},
+ {"allocs", "alloc_space"},
+ } {
+ t.Run(tc.name, func(t *testing.T) {
+ var buf bytes.Buffer
+ if err := writeHeapProto(&buf, rec, rate, tc.defaultSampleType); err != nil {
+ t.Fatalf("writing profile: %v", err)
+ }
+
+ p, err := profile.Parse(&buf)
+ if err != nil {
+ t.Fatalf("profile.Parse: %v", err)
+ }
+
+ checkProfile(t, p, rate, periodType, sampleType, samples, tc.defaultSampleType)
+ })
+ }
+}
diff --git a/src/runtime/pprof/runtime.go b/src/runtime/pprof/runtime.go
new file mode 100644
index 0000000..dd2545b
--- /dev/null
+++ b/src/runtime/pprof/runtime.go
@@ -0,0 +1,41 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "context"
+ "unsafe"
+)
+
+// runtime_expandFinalInlineFrame is defined in runtime/symtab.go.
+func runtime_expandFinalInlineFrame(stk []uintptr) []uintptr
+
+// runtime_setProfLabel is defined in runtime/proflabel.go.
+func runtime_setProfLabel(labels unsafe.Pointer)
+
+// runtime_getProfLabel is defined in runtime/proflabel.go.
+func runtime_getProfLabel() unsafe.Pointer
+
+// SetGoroutineLabels sets the current goroutine's labels to match ctx.
+// A new goroutine inherits the labels of the goroutine that created it.
+// This is a lower-level API than Do, which should be used instead when possible.
+func SetGoroutineLabels(ctx context.Context) {
+ ctxLabels, _ := ctx.Value(labelContextKey{}).(*labelMap)
+ runtime_setProfLabel(unsafe.Pointer(ctxLabels))
+}
+
+// Do calls f with a copy of the parent context with the
+// given labels added to the parent's label map.
+// Goroutines spawned while executing f will inherit the augmented label-set.
+// Each key/value pair in labels is inserted into the label map in the
+// order provided, overriding any previous value for the same key.
+// The augmented label map will be set for the duration of the call to f
+// and restored once f returns.
+func Do(ctx context.Context, labels LabelSet, f func(context.Context)) {
+ defer SetGoroutineLabels(ctx)
+ ctx = WithLabels(ctx, labels)
+ SetGoroutineLabels(ctx)
+ f(ctx)
+}
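// A brief usage sketch of Do (the label key/value and the work function are
// illustrative only, not part of this change):
//
//	pprof.Do(ctx, pprof.Labels("worker", "42"), func(ctx context.Context) {
//		// Profile samples taken while this function runs, and in any
//		// goroutines it starts, carry the label worker=42.
//		handleRequest(ctx) // hypothetical work
//	})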
diff --git a/src/runtime/pprof/runtime_test.go b/src/runtime/pprof/runtime_test.go
new file mode 100644
index 0000000..0dd5324
--- /dev/null
+++ b/src/runtime/pprof/runtime_test.go
@@ -0,0 +1,96 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "context"
+ "fmt"
+ "reflect"
+ "testing"
+)
+
+func TestSetGoroutineLabels(t *testing.T) {
+ sync := make(chan struct{})
+
+ wantLabels := map[string]string{}
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("Expected parent goroutine's profile labels to be empty before test, got %v", gotLabels)
+ }
+ go func() {
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("Expected child goroutine's profile labels to be empty before test, got %v", gotLabels)
+ }
+ sync <- struct{}{}
+ }()
+ <-sync
+
+ wantLabels = map[string]string{"key": "value"}
+ ctx := WithLabels(context.Background(), Labels("key", "value"))
+ SetGoroutineLabels(ctx)
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("parent goroutine's profile labels: got %v, want %v", gotLabels, wantLabels)
+ }
+ go func() {
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("child goroutine's profile labels: got %v, want %v", gotLabels, wantLabels)
+ }
+ sync <- struct{}{}
+ }()
+ <-sync
+
+ wantLabels = map[string]string{}
+ ctx = context.Background()
+ SetGoroutineLabels(ctx)
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("Expected parent goroutine's profile labels to be empty, got %v", gotLabels)
+ }
+ go func() {
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("Expected child goroutine's profile labels to be empty, got %v", gotLabels)
+ }
+ sync <- struct{}{}
+ }()
+ <-sync
+}
+
+func TestDo(t *testing.T) {
+ wantLabels := map[string]string{}
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("Expected parent goroutine's profile labels to be empty before Do, got %v", gotLabels)
+ }
+
+ Do(context.Background(), Labels("key1", "value1", "key2", "value2"), func(ctx context.Context) {
+ wantLabels := map[string]string{"key1": "value1", "key2": "value2"}
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("parent goroutine's profile labels: got %v, want %v", gotLabels, wantLabels)
+ }
+
+ sync := make(chan struct{})
+ go func() {
+ wantLabels := map[string]string{"key1": "value1", "key2": "value2"}
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("child goroutine's profile labels: got %v, want %v", gotLabels, wantLabels)
+ }
+ sync <- struct{}{}
+ }()
+ <-sync
+
+ })
+
+ wantLabels = map[string]string{}
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ fmt.Printf("%#v", gotLabels)
+ fmt.Printf("%#v", wantLabels)
+ t.Errorf("Expected parent goroutine's profile labels to be empty after Do, got %v", gotLabels)
+ }
+}
+
+func getProfLabel() map[string]string {
+ l := (*labelMap)(runtime_getProfLabel())
+ if l == nil {
+ return map[string]string{}
+ }
+ return *l
+}
diff --git a/src/runtime/pprof/testdata/README b/src/runtime/pprof/testdata/README
new file mode 100644
index 0000000..876538e
--- /dev/null
+++ b/src/runtime/pprof/testdata/README
@@ -0,0 +1,9 @@
+These binaries were generated by:
+
+$ cat empty.s
+.global _start
+_start:
+$ as --32 -o empty.o empty.s && ld --build-id -m elf_i386 -o test32 empty.o
+$ as --64 -o empty.o empty.s && ld --build-id -o test64 empty.o
+$ powerpc-linux-gnu-as -o empty.o empty.s && powerpc-linux-gnu-ld --build-id -o test32be empty.o
+$ powerpc64-linux-gnu-as -o empty.o empty.s && powerpc64-linux-gnu-ld --build-id -o test64be empty.o
diff --git a/src/runtime/pprof/testdata/mappingtest/main.go b/src/runtime/pprof/testdata/mappingtest/main.go
new file mode 100644
index 0000000..484b7f9
--- /dev/null
+++ b/src/runtime/pprof/testdata/mappingtest/main.go
@@ -0,0 +1,108 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This program outputs a CPU profile that includes
+// both Go and Cgo stacks. This is used by the mapping info
+// tests in runtime/pprof.
+//
+// If SETCGOTRACEBACK=1 is set, the CPU profile will include
+// PCs from the C side, but they will not be symbolized.
+package main
+
+/*
+#include <stdint.h>
+#include <stdlib.h>
+
+int cpuHogCSalt1 = 0;
+int cpuHogCSalt2 = 0;
+
+void CPUHogCFunction0(int foo) {
+ int i;
+ for (i = 0; i < 100000; i++) {
+ if (foo > 0) {
+ foo *= foo;
+ } else {
+ foo *= foo + 1;
+ }
+ cpuHogCSalt2 = foo;
+ }
+}
+
+void CPUHogCFunction() {
+ CPUHogCFunction0(cpuHogCSalt1);
+}
+
+struct CgoTracebackArg {
+ uintptr_t context;
+ uintptr_t sigContext;
+ uintptr_t *buf;
+ uintptr_t max;
+};
+
+void CollectCgoTraceback(void* parg) {
+ struct CgoTracebackArg* arg = (struct CgoTracebackArg*)(parg);
+ arg->buf[0] = (uintptr_t)(CPUHogCFunction0);
+ arg->buf[1] = (uintptr_t)(CPUHogCFunction);
+ arg->buf[2] = 0;
+};
+*/
+import "C"
+
+import (
+ "log"
+ "os"
+ "runtime"
+ "runtime/pprof"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ if v := os.Getenv("SETCGOTRACEBACK"); v == "1" {
+ // Collect some PCs from C-side, but don't symbolize.
+ runtime.SetCgoTraceback(0, unsafe.Pointer(C.CollectCgoTraceback), nil, nil)
+ }
+}
+
+func main() {
+ go cpuHogGoFunction()
+ go cpuHogCFunction()
+ runtime.Gosched()
+
+ if err := pprof.StartCPUProfile(os.Stdout); err != nil {
+ log.Fatal("can't start CPU profile: ", err)
+ }
+ time.Sleep(200 * time.Millisecond)
+ pprof.StopCPUProfile()
+
+ if err := os.Stdout.Close(); err != nil {
+ log.Fatal("can't write CPU profile: ", err)
+ }
+}
+
+var salt1 int
+var salt2 int
+
+func cpuHogGoFunction() {
+ for {
+ foo := salt1
+ for i := 0; i < 1e5; i++ {
+ if foo > 0 {
+ foo *= foo
+ } else {
+ foo *= foo + 1
+ }
+ salt2 = foo
+ }
+ runtime.Gosched()
+ }
+}
+
+func cpuHogCFunction() {
+ // Generates CPU profile samples including a Cgo call path.
+ for {
+ C.CPUHogCFunction()
+ runtime.Gosched()
+ }
+}
diff --git a/src/runtime/pprof/testdata/test32 b/src/runtime/pprof/testdata/test32
new file mode 100755
index 0000000..ce59472
--- /dev/null
+++ b/src/runtime/pprof/testdata/test32
Binary files differ
diff --git a/src/runtime/pprof/testdata/test32be b/src/runtime/pprof/testdata/test32be
new file mode 100755
index 0000000..f13a732
--- /dev/null
+++ b/src/runtime/pprof/testdata/test32be
Binary files differ
diff --git a/src/runtime/pprof/testdata/test64 b/src/runtime/pprof/testdata/test64
new file mode 100755
index 0000000..3fb42fb
--- /dev/null
+++ b/src/runtime/pprof/testdata/test64
Binary files differ
diff --git a/src/runtime/pprof/testdata/test64be b/src/runtime/pprof/testdata/test64be
new file mode 100755
index 0000000..09b4b01
--- /dev/null
+++ b/src/runtime/pprof/testdata/test64be
Binary files differ
diff --git a/src/runtime/preempt.go b/src/runtime/preempt.go
new file mode 100644
index 0000000..3721852
--- /dev/null
+++ b/src/runtime/preempt.go
@@ -0,0 +1,456 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Goroutine preemption
+//
+// A goroutine can be preempted at any safe-point. Currently, there
+// are a few categories of safe-points:
+//
+// 1. A blocked safe-point occurs for the duration that a goroutine is
+// descheduled, blocked on synchronization, or in a system call.
+//
+// 2. Synchronous safe-points occur when a running goroutine checks
+// for a preemption request.
+//
+// 3. Asynchronous safe-points occur at any instruction in user code
+// where the goroutine can be safely paused and a conservative
+// stack and register scan can find stack roots. The runtime can
+// stop a goroutine at an async safe-point using a signal.
+//
+// At both blocked and synchronous safe-points, a goroutine's CPU
+// state is minimal and the garbage collector has complete information
+// about its entire stack. This makes it possible to deschedule a
+// goroutine with minimal space, and to precisely scan a goroutine's
+// stack.
+//
+// Synchronous safe-points are implemented by overloading the stack
+// bound check in function prologues. To preempt a goroutine at the
+// next synchronous safe-point, the runtime poisons the goroutine's
+// stack bound to a value that will cause the next stack bound check
+// to fail and enter the stack growth implementation, which will
+// detect that it was actually a preemption and redirect to preemption
+// handling.
+//
+// Preemption at asynchronous safe-points is implemented by suspending
+// the thread using an OS mechanism (e.g., signals) and inspecting its
+// state to determine if the goroutine was at an asynchronous
+// safe-point. Since the thread suspension itself is generally
+// asynchronous, it also checks if the running goroutine wants to be
+// preempted, since this could have changed. If all conditions are
+// satisfied, it adjusts the signal context to make it look like the
+// signaled thread just called asyncPreempt and resumes the thread.
+// asyncPreempt spills all registers and enters the scheduler.
+//
+// (An alternative would be to preempt in the signal handler itself.
+// This would let the OS save and restore the register state and the
+// runtime would only need to know how to extract potentially
+// pointer-containing registers from the signal context. However, this
+// would consume an M for every preempted G, and the scheduler itself
+// is not designed to run from a signal handler, as it tends to
+// allocate memory and start threads in the preemption path.)
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type suspendGState struct {
+ g *g
+
+ // dead indicates the goroutine was not suspended because it
+ // is dead. This goroutine could be reused after the dead
+ // state was observed, so the caller must not assume that it
+ // remains dead.
+ dead bool
+
+ // stopped indicates that this suspendG transitioned the G to
+ // _Gwaiting via g.preemptStop and thus is responsible for
+ // readying it when done.
+ stopped bool
+}
+
+// suspendG suspends goroutine gp at a safe-point and returns the
+// state of the suspended goroutine. The caller gets read access to
+// the goroutine until it calls resumeG.
+//
+// It is safe for multiple callers to attempt to suspend the same
+// goroutine at the same time. The goroutine may execute between
+// subsequent successful suspend operations. The current
+// implementation grants exclusive access to the goroutine, and hence
+// multiple callers will serialize. However, the intent is to grant
+// shared read access, so please don't depend on exclusive access.
+//
+// This must be called from the system stack and the user goroutine on
+// the current M (if any) must be in a preemptible state. This
+// prevents deadlocks where two goroutines attempt to suspend each
+// other and both are in non-preemptible states. There are other ways
+// to resolve this deadlock, but this seems simplest.
+//
+// TODO(austin): What if we instead required this to be called from a
+// user goroutine? Then we could deschedule the goroutine while
+// waiting instead of blocking the thread. If two goroutines tried to
+// suspend each other, one of them would win and the other wouldn't
+// complete the suspend until it was resumed. We would have to be
+// careful that they couldn't actually queue up suspend for each other
+// and then both be suspended. This would also avoid the need for a
+// kernel context switch in the synchronous case because we could just
+// directly schedule the waiter. The context switch is unavoidable in
+// the signal case.
+//
+//go:systemstack
+func suspendG(gp *g) suspendGState {
+ if mp := getg().m; mp.curg != nil && readgstatus(mp.curg) == _Grunning {
+ // Since we're on the system stack of this M, the user
+ // G is stuck at an unsafe point. If another goroutine
+ // were to try to preempt m.curg, it could deadlock.
+ throw("suspendG from non-preemptible goroutine")
+ }
+
+ // See https://golang.org/cl/21503 for justification of the yield delay.
+ const yieldDelay = 10 * 1000
+ var nextYield int64
+
+ // Drive the goroutine to a preemption point.
+ stopped := false
+ var asyncM *m
+ var asyncGen uint32
+ var nextPreemptM int64
+ for i := 0; ; i++ {
+ switch s := readgstatus(gp); s {
+ default:
+ if s&_Gscan != 0 {
+ // Someone else is suspending it. Wait
+ // for them to finish.
+ //
+ // TODO: It would be nicer if we could
+ // coalesce suspends.
+ break
+ }
+
+ dumpgstatus(gp)
+ throw("invalid g status")
+
+ case _Gdead:
+ // Nothing to suspend.
+ //
+ // preemptStop may need to be cleared, but
+ // doing that here could race with goroutine
+ // reuse. Instead, goexit0 clears it.
+ return suspendGState{dead: true}
+
+ case _Gcopystack:
+ // The stack is being copied. We need to wait
+ // until this is done.
+
+ case _Gpreempted:
+ // We (or someone else) suspended the G. Claim
+ // ownership of it by transitioning it to
+ // _Gwaiting.
+ if !casGFromPreempted(gp, _Gpreempted, _Gwaiting) {
+ break
+ }
+
+ // We stopped the G, so we have to ready it later.
+ stopped = true
+
+ s = _Gwaiting
+ fallthrough
+
+ case _Grunnable, _Gsyscall, _Gwaiting:
+ // Claim goroutine by setting scan bit.
+ // This may race with execution or readying of gp.
+ // The scan bit keeps it from transitioning state.
+ if !castogscanstatus(gp, s, s|_Gscan) {
+ break
+ }
+
+ // Clear the preemption request. It's safe to
+ // reset the stack guard because we hold the
+ // _Gscan bit and thus own the stack.
+ gp.preemptStop = false
+ gp.preempt = false
+ gp.stackguard0 = gp.stack.lo + _StackGuard
+
+ // The goroutine was already at a safe-point
+ // and we've now locked that in.
+ //
+ // TODO: It would be much better if we didn't
+ // leave it in _Gscan, but instead gently
+ // prevented its scheduling until resumption.
+ // Maybe we only use this to bump a suspended
+ // count and the scheduler skips suspended
+ // goroutines? That wouldn't be enough for
+ // {_Gsyscall,_Gwaiting} -> _Grunning. Maybe
+ // for all those transitions we need to check
+ // suspended and deschedule?
+ return suspendGState{g: gp, stopped: stopped}
+
+ case _Grunning:
+ // Optimization: if there is already a pending preemption request
+ // (from the previous loop iteration), don't bother with the atomics.
+ if gp.preemptStop && gp.preempt && gp.stackguard0 == stackPreempt && asyncM == gp.m && atomic.Load(&asyncM.preemptGen) == asyncGen {
+ break
+ }
+
+ // Temporarily block state transitions.
+ if !castogscanstatus(gp, _Grunning, _Gscanrunning) {
+ break
+ }
+
+ // Request synchronous preemption.
+ gp.preemptStop = true
+ gp.preempt = true
+ gp.stackguard0 = stackPreempt
+
+ // Prepare for asynchronous preemption.
+ asyncM2 := gp.m
+ asyncGen2 := atomic.Load(&asyncM2.preemptGen)
+ needAsync := asyncM != asyncM2 || asyncGen != asyncGen2
+ asyncM = asyncM2
+ asyncGen = asyncGen2
+
+ casfrom_Gscanstatus(gp, _Gscanrunning, _Grunning)
+
+ // Send asynchronous preemption. We do this
+ // after CASing the G back to _Grunning
+ // because preemptM may be synchronous and we
+ // don't want to catch the G just spinning on
+ // its status.
+ if preemptMSupported && debug.asyncpreemptoff == 0 && needAsync {
+ // Rate limit preemptM calls. This is
+ // particularly important on Windows
+ // where preemptM is actually
+ // synchronous and the spin loop here
+ // can lead to live-lock.
+ now := nanotime()
+ if now >= nextPreemptM {
+ nextPreemptM = now + yieldDelay/2
+ preemptM(asyncM)
+ }
+ }
+ }
+
+ // TODO: Don't busy wait. This loop should really only
+ // be a simple read/decide/CAS loop that only fails if
+ // there's an active race. Once the CAS succeeds, we
+ // should queue up the preemption (which will require
+ // it to be reliable in the _Grunning case, not
+ // best-effort) and then sleep until we're notified
+ // that the goroutine is suspended.
+ if i == 0 {
+ nextYield = nanotime() + yieldDelay
+ }
+ if nanotime() < nextYield {
+ procyield(10)
+ } else {
+ osyield()
+ nextYield = nanotime() + yieldDelay/2
+ }
+ }
+}
+
+// resumeG undoes the effects of suspendG, allowing the suspended
+// goroutine to continue from its current safe-point.
+func resumeG(state suspendGState) {
+ if state.dead {
+ // We didn't actually stop anything.
+ return
+ }
+
+ gp := state.g
+ switch s := readgstatus(gp); s {
+ default:
+ dumpgstatus(gp)
+ throw("unexpected g status")
+
+ case _Grunnable | _Gscan,
+ _Gwaiting | _Gscan,
+ _Gsyscall | _Gscan:
+ casfrom_Gscanstatus(gp, s, s&^_Gscan)
+ }
+
+ if state.stopped {
+ // We stopped it, so we need to re-schedule it.
+ ready(gp, 0, true)
+ }
+}
+
+// canPreemptM reports whether mp is in a state that is safe to preempt.
+//
+// It is nosplit because it has nosplit callers.
+//
+//go:nosplit
+func canPreemptM(mp *m) bool {
+ return mp.locks == 0 && mp.mallocing == 0 && mp.preemptoff == "" && mp.p.ptr().status == _Prunning
+}
+
+//go:generate go run mkpreempt.go
+
+// asyncPreempt saves all user registers and calls asyncPreempt2.
+//
+// When stack scanning encounters an asyncPreempt frame, it scans that
+// frame and its parent frame conservatively.
+//
+// asyncPreempt is implemented in assembly.
+func asyncPreempt()
+
+//go:nosplit
+func asyncPreempt2() {
+ gp := getg()
+ gp.asyncSafePoint = true
+ if gp.preemptStop {
+ mcall(preemptPark)
+ } else {
+ mcall(gopreempt_m)
+ }
+ gp.asyncSafePoint = false
+}
+
+// asyncPreemptStack is the bytes of stack space required to inject an
+// asyncPreempt call.
+var asyncPreemptStack = ^uintptr(0)
+
+func init() {
+ f := findfunc(funcPC(asyncPreempt))
+ total := funcMaxSPDelta(f)
+ f = findfunc(funcPC(asyncPreempt2))
+ total += funcMaxSPDelta(f)
+ // Add some overhead for return PCs, etc.
+ asyncPreemptStack = uintptr(total) + 8*sys.PtrSize
+ if asyncPreemptStack > _StackLimit {
+ // We need more than the nosplit limit. This isn't
+ // unsafe, but it may limit asynchronous preemption.
+ //
+ // This may be a problem if we start using more
+ // registers. In that case, we should store registers
+ // in a context object. If we pre-allocate one per P,
+ // asyncPreempt can spill just a few registers to the
+ // stack, then grab its context object and spill into
+ // it. When it enters the runtime, it would allocate a
+ // new context for the P.
+ print("runtime: asyncPreemptStack=", asyncPreemptStack, "\n")
+ throw("async stack too large")
+ }
+}
+
+// wantAsyncPreempt returns whether an asynchronous preemption is
+// queued for gp.
+func wantAsyncPreempt(gp *g) bool {
+ // Check both the G and the P.
+ return (gp.preempt || gp.m.p != 0 && gp.m.p.ptr().preempt) && readgstatus(gp)&^_Gscan == _Grunning
+}
+
+// isAsyncSafePoint reports whether gp at instruction PC is an
+// asynchronous safe point. This indicates that:
+//
+// 1. It's safe to suspend gp and conservatively scan its stack and
+// registers. There are no potentially hidden pointer values and it's
+// not in the middle of an atomic sequence like a write barrier.
+//
+// 2. gp has enough stack space to inject the asyncPreempt call.
+//
+// 3. It's generally safe to interact with the runtime, even if we're
+// in a signal handler stopped here. For example, there are no runtime
+// locks held, so acquiring a runtime lock won't self-deadlock.
+//
+// In some cases the PC is safe for asynchronous preemption but it
+// also needs to adjust the resumption PC. The new PC is returned in
+// the second result.
+func isAsyncSafePoint(gp *g, pc, sp, lr uintptr) (bool, uintptr) {
+ mp := gp.m
+
+ // Only user Gs can have safe-points. We check this first
+ // because it's extremely common that we'll catch mp in the
+ // scheduler processing this G's preemption.
+ if mp.curg != gp {
+ return false, 0
+ }
+
+ // Check M state.
+ if mp.p == 0 || !canPreemptM(mp) {
+ return false, 0
+ }
+
+ // Check stack space.
+ if sp < gp.stack.lo || sp-gp.stack.lo < asyncPreemptStack {
+ return false, 0
+ }
+
+ // Check if PC is an unsafe-point.
+ f := findfunc(pc)
+ if !f.valid() {
+ // Not Go code.
+ return false, 0
+ }
+ if (GOARCH == "mips" || GOARCH == "mipsle" || GOARCH == "mips64" || GOARCH == "mips64le") && lr == pc+8 && funcspdelta(f, pc, nil) == 0 {
+ // We probably stopped at a half-executed CALL instruction,
+ // where the LR is updated but the PC has not. If we preempt
+ // here we'll see a seemingly self-recursive call, which is in
+ // fact not.
+ // This is normally ok, as we use the return address saved on
+ // stack for unwinding, not the LR value. But if this is a
+ // call to morestack, we haven't created the frame, and we'll
+ // use the LR for unwinding, which will be bad.
+ return false, 0
+ }
+ up, startpc := pcdatavalue2(f, _PCDATA_UnsafePoint, pc)
+ if up != _PCDATA_UnsafePointSafe {
+ // Unsafe-point marked by compiler. This includes
+ // atomic sequences (e.g., write barrier) and nosplit
+ // functions (except at calls).
+ return false, 0
+ }
+ if fd := funcdata(f, _FUNCDATA_LocalsPointerMaps); fd == nil || fd == unsafe.Pointer(&no_pointers_stackmap) {
+ // This is assembly code. Don't assume it's
+ // well-formed. We identify assembly code by
+ // checking that it has either no stack map, or
+ // no_pointers_stackmap, which is the stack map
+ // for ones marked as NO_LOCAL_POINTERS.
+ //
+ // TODO: Are there cases that are safe but don't have a
+ // locals pointer map, like empty frame functions?
+ return false, 0
+ }
+ name := funcname(f)
+ if inldata := funcdata(f, _FUNCDATA_InlTree); inldata != nil {
+ inltree := (*[1 << 20]inlinedCall)(inldata)
+ ix := pcdatavalue(f, _PCDATA_InlTreeIndex, pc, nil)
+ if ix >= 0 {
+ name = funcnameFromNameoff(f, inltree[ix].func_)
+ }
+ }
+ if hasPrefix(name, "runtime.") ||
+ hasPrefix(name, "runtime/internal/") ||
+ hasPrefix(name, "reflect.") {
+ // For now we never async preempt the runtime or
+ // anything closely tied to the runtime. Known issues
+ // include: various points in the scheduler ("don't
+ // preempt between here and here"), much of the defer
+ // implementation (untyped info on stack), bulk write
+ // barriers (write barrier check),
+ // reflect.{makeFuncStub,methodValueCall}.
+ //
+ // TODO(austin): We should improve this, or opt things
+ // in incrementally.
+ return false, 0
+ }
+ switch up {
+ case _PCDATA_Restart1, _PCDATA_Restart2:
+ // Restartable instruction sequence. Back off PC to
+ // the start PC.
+ if startpc == 0 || startpc > pc || pc-startpc > 20 {
+ throw("bad restart PC")
+ }
+ return true, startpc
+ case _PCDATA_RestartAtEntry:
+ // Restart from the function entry at resumption.
+ return true, f.entry
+ }
+ return true, pc
+}
+
+var no_pointers_stackmap uint64 // defined in assembly, for NO_LOCAL_POINTERS macro
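// A minimal sketch of the intended suspendG/resumeG calling pattern (the
// stack-scanning step is a placeholder, not part of this change; real
// callers must run on the system stack, per suspendG's contract):
//
//	systemstack(func() {
//		state := suspendG(gp)
//		if !state.dead {
//			// gp is parked at a safe-point; its stack may be read here.
//			scanGStack(gp) // hypothetical
//		}
//		resumeG(state)
//	})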
diff --git a/src/runtime/preempt_386.s b/src/runtime/preempt_386.s
new file mode 100644
index 0000000..a803b24
--- /dev/null
+++ b/src/runtime/preempt_386.s
@@ -0,0 +1,50 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// Note: asyncPreempt doesn't use the internal ABI, but we must be able to inject calls to it from the signal handler, so Go code has to see the PC of this function literally.
+TEXT ·asyncPreempt<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-0
+ PUSHFL
+ ADJSP $156
+ NOP SP
+ MOVL AX, 0(SP)
+ MOVL CX, 4(SP)
+ MOVL DX, 8(SP)
+ MOVL BX, 12(SP)
+ MOVL BP, 16(SP)
+ MOVL SI, 20(SP)
+ MOVL DI, 24(SP)
+ CMPB internal∕cpu·X86+const_offsetX86HasSSE2(SB), $1
+ JNE nosse
+ MOVUPS X0, 28(SP)
+ MOVUPS X1, 44(SP)
+ MOVUPS X2, 60(SP)
+ MOVUPS X3, 76(SP)
+ MOVUPS X4, 92(SP)
+ MOVUPS X5, 108(SP)
+ MOVUPS X6, 124(SP)
+ MOVUPS X7, 140(SP)
+nosse:
+ CALL ·asyncPreempt2(SB)
+ CMPB internal∕cpu·X86+const_offsetX86HasSSE2(SB), $1
+ JNE nosse2
+ MOVUPS 140(SP), X7
+ MOVUPS 124(SP), X6
+ MOVUPS 108(SP), X5
+ MOVUPS 92(SP), X4
+ MOVUPS 76(SP), X3
+ MOVUPS 60(SP), X2
+ MOVUPS 44(SP), X1
+ MOVUPS 28(SP), X0
+nosse2:
+ MOVL 24(SP), DI
+ MOVL 20(SP), SI
+ MOVL 16(SP), BP
+ MOVL 12(SP), BX
+ MOVL 8(SP), DX
+ MOVL 4(SP), CX
+ MOVL 0(SP), AX
+ ADJSP $-156
+ POPFL
+ RET
diff --git a/src/runtime/preempt_amd64.s b/src/runtime/preempt_amd64.s
new file mode 100644
index 0000000..92c664d
--- /dev/null
+++ b/src/runtime/preempt_amd64.s
@@ -0,0 +1,85 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// Note: asyncPreempt doesn't use the internal ABI, but we must be able to inject calls to it from the signal handler, so Go code has to see the PC of this function literally.
+TEXT ·asyncPreempt<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-0
+ PUSHQ BP
+ MOVQ SP, BP
+ // Save flags before clobbering them
+ PUSHFQ
+ // obj doesn't understand ADD/SUB on SP, but does understand ADJSP
+ ADJSP $368
+ // But vet doesn't know ADJSP, so suppress vet stack checking
+ NOP SP
+ #ifdef GOOS_darwin
+ CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $0
+ JE 2(PC)
+ VZEROUPPER
+ #endif
+ MOVQ AX, 0(SP)
+ MOVQ CX, 8(SP)
+ MOVQ DX, 16(SP)
+ MOVQ BX, 24(SP)
+ MOVQ SI, 32(SP)
+ MOVQ DI, 40(SP)
+ MOVQ R8, 48(SP)
+ MOVQ R9, 56(SP)
+ MOVQ R10, 64(SP)
+ MOVQ R11, 72(SP)
+ MOVQ R12, 80(SP)
+ MOVQ R13, 88(SP)
+ MOVQ R14, 96(SP)
+ MOVQ R15, 104(SP)
+ MOVUPS X0, 112(SP)
+ MOVUPS X1, 128(SP)
+ MOVUPS X2, 144(SP)
+ MOVUPS X3, 160(SP)
+ MOVUPS X4, 176(SP)
+ MOVUPS X5, 192(SP)
+ MOVUPS X6, 208(SP)
+ MOVUPS X7, 224(SP)
+ MOVUPS X8, 240(SP)
+ MOVUPS X9, 256(SP)
+ MOVUPS X10, 272(SP)
+ MOVUPS X11, 288(SP)
+ MOVUPS X12, 304(SP)
+ MOVUPS X13, 320(SP)
+ MOVUPS X14, 336(SP)
+ MOVUPS X15, 352(SP)
+ CALL ·asyncPreempt2(SB)
+ MOVUPS 352(SP), X15
+ MOVUPS 336(SP), X14
+ MOVUPS 320(SP), X13
+ MOVUPS 304(SP), X12
+ MOVUPS 288(SP), X11
+ MOVUPS 272(SP), X10
+ MOVUPS 256(SP), X9
+ MOVUPS 240(SP), X8
+ MOVUPS 224(SP), X7
+ MOVUPS 208(SP), X6
+ MOVUPS 192(SP), X5
+ MOVUPS 176(SP), X4
+ MOVUPS 160(SP), X3
+ MOVUPS 144(SP), X2
+ MOVUPS 128(SP), X1
+ MOVUPS 112(SP), X0
+ MOVQ 104(SP), R15
+ MOVQ 96(SP), R14
+ MOVQ 88(SP), R13
+ MOVQ 80(SP), R12
+ MOVQ 72(SP), R11
+ MOVQ 64(SP), R10
+ MOVQ 56(SP), R9
+ MOVQ 48(SP), R8
+ MOVQ 40(SP), DI
+ MOVQ 32(SP), SI
+ MOVQ 24(SP), BX
+ MOVQ 16(SP), DX
+ MOVQ 8(SP), CX
+ MOVQ 0(SP), AX
+ ADJSP $-368
+ POPFQ
+ POPQ BP
+ RET
diff --git a/src/runtime/preempt_arm.s b/src/runtime/preempt_arm.s
new file mode 100644
index 0000000..bbc9fbb
--- /dev/null
+++ b/src/runtime/preempt_arm.s
@@ -0,0 +1,84 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// Note: asyncPreempt doesn't use the internal ABI, but we must be able to inject calls to it from the signal handler, so Go code has to see the PC of this function literally.
+TEXT ·asyncPreempt<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW.W R14, -188(R13)
+ MOVW R0, 4(R13)
+ MOVW R1, 8(R13)
+ MOVW R2, 12(R13)
+ MOVW R3, 16(R13)
+ MOVW R4, 20(R13)
+ MOVW R5, 24(R13)
+ MOVW R6, 28(R13)
+ MOVW R7, 32(R13)
+ MOVW R8, 36(R13)
+ MOVW R9, 40(R13)
+ MOVW R11, 44(R13)
+ MOVW R12, 48(R13)
+ MOVW CPSR, R0
+ MOVW R0, 52(R13)
+ MOVB ·goarm(SB), R0
+ CMP $6, R0
+ BLT nofp
+ MOVW FPCR, R0
+ MOVW R0, 56(R13)
+ MOVD F0, 60(R13)
+ MOVD F1, 68(R13)
+ MOVD F2, 76(R13)
+ MOVD F3, 84(R13)
+ MOVD F4, 92(R13)
+ MOVD F5, 100(R13)
+ MOVD F6, 108(R13)
+ MOVD F7, 116(R13)
+ MOVD F8, 124(R13)
+ MOVD F9, 132(R13)
+ MOVD F10, 140(R13)
+ MOVD F11, 148(R13)
+ MOVD F12, 156(R13)
+ MOVD F13, 164(R13)
+ MOVD F14, 172(R13)
+ MOVD F15, 180(R13)
+nofp:
+ CALL ·asyncPreempt2(SB)
+ MOVB ·goarm(SB), R0
+ CMP $6, R0
+ BLT nofp2
+ MOVD 180(R13), F15
+ MOVD 172(R13), F14
+ MOVD 164(R13), F13
+ MOVD 156(R13), F12
+ MOVD 148(R13), F11
+ MOVD 140(R13), F10
+ MOVD 132(R13), F9
+ MOVD 124(R13), F8
+ MOVD 116(R13), F7
+ MOVD 108(R13), F6
+ MOVD 100(R13), F5
+ MOVD 92(R13), F4
+ MOVD 84(R13), F3
+ MOVD 76(R13), F2
+ MOVD 68(R13), F1
+ MOVD 60(R13), F0
+ MOVW 56(R13), R0
+ MOVW R0, FPCR
+nofp2:
+ MOVW 52(R13), R0
+ MOVW R0, CPSR
+ MOVW 48(R13), R12
+ MOVW 44(R13), R11
+ MOVW 40(R13), R9
+ MOVW 36(R13), R8
+ MOVW 32(R13), R7
+ MOVW 28(R13), R6
+ MOVW 24(R13), R5
+ MOVW 20(R13), R4
+ MOVW 16(R13), R3
+ MOVW 12(R13), R2
+ MOVW 8(R13), R1
+ MOVW 4(R13), R0
+ MOVW 188(R13), R14
+ MOVW.P 192(R13), R15
+ UNDEF
diff --git a/src/runtime/preempt_arm64.s b/src/runtime/preempt_arm64.s
new file mode 100644
index 0000000..2b70a28
--- /dev/null
+++ b/src/runtime/preempt_arm64.s
@@ -0,0 +1,148 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// Note: asyncPreempt doesn't use the internal ABI, but we must be able to inject calls to it from the signal handler, so Go code has to see the PC of this function literally.
+TEXT ·asyncPreempt<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD R30, -496(RSP)
+ SUB $496, RSP
+ #ifdef GOOS_linux
+ MOVD R29, -8(RSP)
+ SUB $8, RSP, R29
+ #endif
+ #ifdef GOOS_ios
+ MOVD R30, (RSP)
+ #endif
+ MOVD R0, 8(RSP)
+ MOVD R1, 16(RSP)
+ MOVD R2, 24(RSP)
+ MOVD R3, 32(RSP)
+ MOVD R4, 40(RSP)
+ MOVD R5, 48(RSP)
+ MOVD R6, 56(RSP)
+ MOVD R7, 64(RSP)
+ MOVD R8, 72(RSP)
+ MOVD R9, 80(RSP)
+ MOVD R10, 88(RSP)
+ MOVD R11, 96(RSP)
+ MOVD R12, 104(RSP)
+ MOVD R13, 112(RSP)
+ MOVD R14, 120(RSP)
+ MOVD R15, 128(RSP)
+ MOVD R16, 136(RSP)
+ MOVD R17, 144(RSP)
+ MOVD R19, 152(RSP)
+ MOVD R20, 160(RSP)
+ MOVD R21, 168(RSP)
+ MOVD R22, 176(RSP)
+ MOVD R23, 184(RSP)
+ MOVD R24, 192(RSP)
+ MOVD R25, 200(RSP)
+ MOVD R26, 208(RSP)
+ MOVD NZCV, R0
+ MOVD R0, 216(RSP)
+ MOVD FPSR, R0
+ MOVD R0, 224(RSP)
+ FMOVD F0, 232(RSP)
+ FMOVD F1, 240(RSP)
+ FMOVD F2, 248(RSP)
+ FMOVD F3, 256(RSP)
+ FMOVD F4, 264(RSP)
+ FMOVD F5, 272(RSP)
+ FMOVD F6, 280(RSP)
+ FMOVD F7, 288(RSP)
+ FMOVD F8, 296(RSP)
+ FMOVD F9, 304(RSP)
+ FMOVD F10, 312(RSP)
+ FMOVD F11, 320(RSP)
+ FMOVD F12, 328(RSP)
+ FMOVD F13, 336(RSP)
+ FMOVD F14, 344(RSP)
+ FMOVD F15, 352(RSP)
+ FMOVD F16, 360(RSP)
+ FMOVD F17, 368(RSP)
+ FMOVD F18, 376(RSP)
+ FMOVD F19, 384(RSP)
+ FMOVD F20, 392(RSP)
+ FMOVD F21, 400(RSP)
+ FMOVD F22, 408(RSP)
+ FMOVD F23, 416(RSP)
+ FMOVD F24, 424(RSP)
+ FMOVD F25, 432(RSP)
+ FMOVD F26, 440(RSP)
+ FMOVD F27, 448(RSP)
+ FMOVD F28, 456(RSP)
+ FMOVD F29, 464(RSP)
+ FMOVD F30, 472(RSP)
+ FMOVD F31, 480(RSP)
+ CALL ·asyncPreempt2(SB)
+ FMOVD 480(RSP), F31
+ FMOVD 472(RSP), F30
+ FMOVD 464(RSP), F29
+ FMOVD 456(RSP), F28
+ FMOVD 448(RSP), F27
+ FMOVD 440(RSP), F26
+ FMOVD 432(RSP), F25
+ FMOVD 424(RSP), F24
+ FMOVD 416(RSP), F23
+ FMOVD 408(RSP), F22
+ FMOVD 400(RSP), F21
+ FMOVD 392(RSP), F20
+ FMOVD 384(RSP), F19
+ FMOVD 376(RSP), F18
+ FMOVD 368(RSP), F17
+ FMOVD 360(RSP), F16
+ FMOVD 352(RSP), F15
+ FMOVD 344(RSP), F14
+ FMOVD 336(RSP), F13
+ FMOVD 328(RSP), F12
+ FMOVD 320(RSP), F11
+ FMOVD 312(RSP), F10
+ FMOVD 304(RSP), F9
+ FMOVD 296(RSP), F8
+ FMOVD 288(RSP), F7
+ FMOVD 280(RSP), F6
+ FMOVD 272(RSP), F5
+ FMOVD 264(RSP), F4
+ FMOVD 256(RSP), F3
+ FMOVD 248(RSP), F2
+ FMOVD 240(RSP), F1
+ FMOVD 232(RSP), F0
+ MOVD 224(RSP), R0
+ MOVD R0, FPSR
+ MOVD 216(RSP), R0
+ MOVD R0, NZCV
+ MOVD 208(RSP), R26
+ MOVD 200(RSP), R25
+ MOVD 192(RSP), R24
+ MOVD 184(RSP), R23
+ MOVD 176(RSP), R22
+ MOVD 168(RSP), R21
+ MOVD 160(RSP), R20
+ MOVD 152(RSP), R19
+ MOVD 144(RSP), R17
+ MOVD 136(RSP), R16
+ MOVD 128(RSP), R15
+ MOVD 120(RSP), R14
+ MOVD 112(RSP), R13
+ MOVD 104(RSP), R12
+ MOVD 96(RSP), R11
+ MOVD 88(RSP), R10
+ MOVD 80(RSP), R9
+ MOVD 72(RSP), R8
+ MOVD 64(RSP), R7
+ MOVD 56(RSP), R6
+ MOVD 48(RSP), R5
+ MOVD 40(RSP), R4
+ MOVD 32(RSP), R3
+ MOVD 24(RSP), R2
+ MOVD 16(RSP), R1
+ MOVD 8(RSP), R0
+ MOVD 496(RSP), R30
+ #ifdef GOOS_linux
+ MOVD -8(RSP), R29
+ #endif
+ MOVD (RSP), R27
+ ADD $512, RSP
+ JMP (R27)
diff --git a/src/runtime/preempt_mips64x.s b/src/runtime/preempt_mips64x.s
new file mode 100644
index 0000000..0d0c157
--- /dev/null
+++ b/src/runtime/preempt_mips64x.s
@@ -0,0 +1,146 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+// +build mips64 mips64le
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// Note: asyncPreempt doesn't use the internal ABI, but we must be able to inject calls to it from the signal handler, so Go code has to see the PC of this function literally.
+TEXT ·asyncPreempt<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-0
+ MOVV R31, -488(R29)
+ SUBV $488, R29
+ MOVV R1, 8(R29)
+ MOVV R2, 16(R29)
+ MOVV R3, 24(R29)
+ MOVV R4, 32(R29)
+ MOVV R5, 40(R29)
+ MOVV R6, 48(R29)
+ MOVV R7, 56(R29)
+ MOVV R8, 64(R29)
+ MOVV R9, 72(R29)
+ MOVV R10, 80(R29)
+ MOVV R11, 88(R29)
+ MOVV R12, 96(R29)
+ MOVV R13, 104(R29)
+ MOVV R14, 112(R29)
+ MOVV R15, 120(R29)
+ MOVV R16, 128(R29)
+ MOVV R17, 136(R29)
+ MOVV R18, 144(R29)
+ MOVV R19, 152(R29)
+ MOVV R20, 160(R29)
+ MOVV R21, 168(R29)
+ MOVV R22, 176(R29)
+ MOVV R24, 184(R29)
+ MOVV R25, 192(R29)
+ MOVV RSB, 200(R29)
+ MOVV HI, R1
+ MOVV R1, 208(R29)
+ MOVV LO, R1
+ MOVV R1, 216(R29)
+ #ifndef GOMIPS64_softfloat
+ MOVV FCR31, R1
+ MOVV R1, 224(R29)
+ MOVD F0, 232(R29)
+ MOVD F1, 240(R29)
+ MOVD F2, 248(R29)
+ MOVD F3, 256(R29)
+ MOVD F4, 264(R29)
+ MOVD F5, 272(R29)
+ MOVD F6, 280(R29)
+ MOVD F7, 288(R29)
+ MOVD F8, 296(R29)
+ MOVD F9, 304(R29)
+ MOVD F10, 312(R29)
+ MOVD F11, 320(R29)
+ MOVD F12, 328(R29)
+ MOVD F13, 336(R29)
+ MOVD F14, 344(R29)
+ MOVD F15, 352(R29)
+ MOVD F16, 360(R29)
+ MOVD F17, 368(R29)
+ MOVD F18, 376(R29)
+ MOVD F19, 384(R29)
+ MOVD F20, 392(R29)
+ MOVD F21, 400(R29)
+ MOVD F22, 408(R29)
+ MOVD F23, 416(R29)
+ MOVD F24, 424(R29)
+ MOVD F25, 432(R29)
+ MOVD F26, 440(R29)
+ MOVD F27, 448(R29)
+ MOVD F28, 456(R29)
+ MOVD F29, 464(R29)
+ MOVD F30, 472(R29)
+ MOVD F31, 480(R29)
+ #endif
+ CALL ·asyncPreempt2(SB)
+ #ifndef GOMIPS64_softfloat
+ MOVD 480(R29), F31
+ MOVD 472(R29), F30
+ MOVD 464(R29), F29
+ MOVD 456(R29), F28
+ MOVD 448(R29), F27
+ MOVD 440(R29), F26
+ MOVD 432(R29), F25
+ MOVD 424(R29), F24
+ MOVD 416(R29), F23
+ MOVD 408(R29), F22
+ MOVD 400(R29), F21
+ MOVD 392(R29), F20
+ MOVD 384(R29), F19
+ MOVD 376(R29), F18
+ MOVD 368(R29), F17
+ MOVD 360(R29), F16
+ MOVD 352(R29), F15
+ MOVD 344(R29), F14
+ MOVD 336(R29), F13
+ MOVD 328(R29), F12
+ MOVD 320(R29), F11
+ MOVD 312(R29), F10
+ MOVD 304(R29), F9
+ MOVD 296(R29), F8
+ MOVD 288(R29), F7
+ MOVD 280(R29), F6
+ MOVD 272(R29), F5
+ MOVD 264(R29), F4
+ MOVD 256(R29), F3
+ MOVD 248(R29), F2
+ MOVD 240(R29), F1
+ MOVD 232(R29), F0
+ MOVV 224(R29), R1
+ MOVV R1, FCR31
+ #endif
+ MOVV 216(R29), R1
+ MOVV R1, LO
+ MOVV 208(R29), R1
+ MOVV R1, HI
+ MOVV 200(R29), RSB
+ MOVV 192(R29), R25
+ MOVV 184(R29), R24
+ MOVV 176(R29), R22
+ MOVV 168(R29), R21
+ MOVV 160(R29), R20
+ MOVV 152(R29), R19
+ MOVV 144(R29), R18
+ MOVV 136(R29), R17
+ MOVV 128(R29), R16
+ MOVV 120(R29), R15
+ MOVV 112(R29), R14
+ MOVV 104(R29), R13
+ MOVV 96(R29), R12
+ MOVV 88(R29), R11
+ MOVV 80(R29), R10
+ MOVV 72(R29), R9
+ MOVV 64(R29), R8
+ MOVV 56(R29), R7
+ MOVV 48(R29), R6
+ MOVV 40(R29), R5
+ MOVV 32(R29), R4
+ MOVV 24(R29), R3
+ MOVV 16(R29), R2
+ MOVV 8(R29), R1
+ MOVV 488(R29), R31
+ MOVV (R29), R23
+ ADDV $496, R29
+ JMP (R23)
diff --git a/src/runtime/preempt_mipsx.s b/src/runtime/preempt_mipsx.s
new file mode 100644
index 0000000..86d3a91
--- /dev/null
+++ b/src/runtime/preempt_mipsx.s
@@ -0,0 +1,146 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+// +build mips mipsle
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// Note: asyncPreempt doesn't use the internal ABI, but we must be able to inject calls to it from the signal handler, so Go code has to see the PC of this function literally.
+TEXT ·asyncPreempt<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW R31, -244(R29)
+ SUB $244, R29
+ MOVW R1, 4(R29)
+ MOVW R2, 8(R29)
+ MOVW R3, 12(R29)
+ MOVW R4, 16(R29)
+ MOVW R5, 20(R29)
+ MOVW R6, 24(R29)
+ MOVW R7, 28(R29)
+ MOVW R8, 32(R29)
+ MOVW R9, 36(R29)
+ MOVW R10, 40(R29)
+ MOVW R11, 44(R29)
+ MOVW R12, 48(R29)
+ MOVW R13, 52(R29)
+ MOVW R14, 56(R29)
+ MOVW R15, 60(R29)
+ MOVW R16, 64(R29)
+ MOVW R17, 68(R29)
+ MOVW R18, 72(R29)
+ MOVW R19, 76(R29)
+ MOVW R20, 80(R29)
+ MOVW R21, 84(R29)
+ MOVW R22, 88(R29)
+ MOVW R24, 92(R29)
+ MOVW R25, 96(R29)
+ MOVW R28, 100(R29)
+ MOVW HI, R1
+ MOVW R1, 104(R29)
+ MOVW LO, R1
+ MOVW R1, 108(R29)
+ #ifndef GOMIPS_softfloat
+ MOVW FCR31, R1
+ MOVW R1, 112(R29)
+ MOVF F0, 116(R29)
+ MOVF F1, 120(R29)
+ MOVF F2, 124(R29)
+ MOVF F3, 128(R29)
+ MOVF F4, 132(R29)
+ MOVF F5, 136(R29)
+ MOVF F6, 140(R29)
+ MOVF F7, 144(R29)
+ MOVF F8, 148(R29)
+ MOVF F9, 152(R29)
+ MOVF F10, 156(R29)
+ MOVF F11, 160(R29)
+ MOVF F12, 164(R29)
+ MOVF F13, 168(R29)
+ MOVF F14, 172(R29)
+ MOVF F15, 176(R29)
+ MOVF F16, 180(R29)
+ MOVF F17, 184(R29)
+ MOVF F18, 188(R29)
+ MOVF F19, 192(R29)
+ MOVF F20, 196(R29)
+ MOVF F21, 200(R29)
+ MOVF F22, 204(R29)
+ MOVF F23, 208(R29)
+ MOVF F24, 212(R29)
+ MOVF F25, 216(R29)
+ MOVF F26, 220(R29)
+ MOVF F27, 224(R29)
+ MOVF F28, 228(R29)
+ MOVF F29, 232(R29)
+ MOVF F30, 236(R29)
+ MOVF F31, 240(R29)
+ #endif
+ CALL ·asyncPreempt2(SB)
+ #ifndef GOMIPS_softfloat
+ MOVF 240(R29), F31
+ MOVF 236(R29), F30
+ MOVF 232(R29), F29
+ MOVF 228(R29), F28
+ MOVF 224(R29), F27
+ MOVF 220(R29), F26
+ MOVF 216(R29), F25
+ MOVF 212(R29), F24
+ MOVF 208(R29), F23
+ MOVF 204(R29), F22
+ MOVF 200(R29), F21
+ MOVF 196(R29), F20
+ MOVF 192(R29), F19
+ MOVF 188(R29), F18
+ MOVF 184(R29), F17
+ MOVF 180(R29), F16
+ MOVF 176(R29), F15
+ MOVF 172(R29), F14
+ MOVF 168(R29), F13
+ MOVF 164(R29), F12
+ MOVF 160(R29), F11
+ MOVF 156(R29), F10
+ MOVF 152(R29), F9
+ MOVF 148(R29), F8
+ MOVF 144(R29), F7
+ MOVF 140(R29), F6
+ MOVF 136(R29), F5
+ MOVF 132(R29), F4
+ MOVF 128(R29), F3
+ MOVF 124(R29), F2
+ MOVF 120(R29), F1
+ MOVF 116(R29), F0
+ MOVW 112(R29), R1
+ MOVW R1, FCR31
+ #endif
+ MOVW 108(R29), R1
+ MOVW R1, LO
+ MOVW 104(R29), R1
+ MOVW R1, HI
+ MOVW 100(R29), R28
+ MOVW 96(R29), R25
+ MOVW 92(R29), R24
+ MOVW 88(R29), R22
+ MOVW 84(R29), R21
+ MOVW 80(R29), R20
+ MOVW 76(R29), R19
+ MOVW 72(R29), R18
+ MOVW 68(R29), R17
+ MOVW 64(R29), R16
+ MOVW 60(R29), R15
+ MOVW 56(R29), R14
+ MOVW 52(R29), R13
+ MOVW 48(R29), R12
+ MOVW 44(R29), R11
+ MOVW 40(R29), R10
+ MOVW 36(R29), R9
+ MOVW 32(R29), R8
+ MOVW 28(R29), R7
+ MOVW 24(R29), R6
+ MOVW 20(R29), R5
+ MOVW 16(R29), R4
+ MOVW 12(R29), R3
+ MOVW 8(R29), R2
+ MOVW 4(R29), R1
+ MOVW 244(R29), R31
+ MOVW (R29), R23
+ ADD $248, R29
+ JMP (R23)
diff --git a/src/runtime/preempt_nonwindows.go b/src/runtime/preempt_nonwindows.go
new file mode 100644
index 0000000..3066a15
--- /dev/null
+++ b/src/runtime/preempt_nonwindows.go
@@ -0,0 +1,13 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !windows
+
+package runtime
+
+//go:nosplit
+func osPreemptExtEnter(mp *m) {}
+
+//go:nosplit
+func osPreemptExtExit(mp *m) {}
diff --git a/src/runtime/preempt_ppc64x.s b/src/runtime/preempt_ppc64x.s
new file mode 100644
index 0000000..9063438
--- /dev/null
+++ b/src/runtime/preempt_ppc64x.s
@@ -0,0 +1,148 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+// +build ppc64 ppc64le
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// Note: asyncPreempt doesn't use the internal ABI, but we must be able to inject calls to it from the signal handler, so Go code has to see the PC of this function literally.
+TEXT ·asyncPreempt<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD R31, -488(R1)
+ MOVD LR, R31
+ MOVDU R31, -520(R1)
+ MOVD R3, 40(R1)
+ MOVD R4, 48(R1)
+ MOVD R5, 56(R1)
+ MOVD R6, 64(R1)
+ MOVD R7, 72(R1)
+ MOVD R8, 80(R1)
+ MOVD R9, 88(R1)
+ MOVD R10, 96(R1)
+ MOVD R11, 104(R1)
+ MOVD R14, 112(R1)
+ MOVD R15, 120(R1)
+ MOVD R16, 128(R1)
+ MOVD R17, 136(R1)
+ MOVD R18, 144(R1)
+ MOVD R19, 152(R1)
+ MOVD R20, 160(R1)
+ MOVD R21, 168(R1)
+ MOVD R22, 176(R1)
+ MOVD R23, 184(R1)
+ MOVD R24, 192(R1)
+ MOVD R25, 200(R1)
+ MOVD R26, 208(R1)
+ MOVD R27, 216(R1)
+ MOVD R28, 224(R1)
+ MOVD R29, 232(R1)
+ MOVW CR, R31
+ MOVW R31, 240(R1)
+ MOVD XER, R31
+ MOVD R31, 248(R1)
+ FMOVD F0, 256(R1)
+ FMOVD F1, 264(R1)
+ FMOVD F2, 272(R1)
+ FMOVD F3, 280(R1)
+ FMOVD F4, 288(R1)
+ FMOVD F5, 296(R1)
+ FMOVD F6, 304(R1)
+ FMOVD F7, 312(R1)
+ FMOVD F8, 320(R1)
+ FMOVD F9, 328(R1)
+ FMOVD F10, 336(R1)
+ FMOVD F11, 344(R1)
+ FMOVD F12, 352(R1)
+ FMOVD F13, 360(R1)
+ FMOVD F14, 368(R1)
+ FMOVD F15, 376(R1)
+ FMOVD F16, 384(R1)
+ FMOVD F17, 392(R1)
+ FMOVD F18, 400(R1)
+ FMOVD F19, 408(R1)
+ FMOVD F20, 416(R1)
+ FMOVD F21, 424(R1)
+ FMOVD F22, 432(R1)
+ FMOVD F23, 440(R1)
+ FMOVD F24, 448(R1)
+ FMOVD F25, 456(R1)
+ FMOVD F26, 464(R1)
+ FMOVD F27, 472(R1)
+ FMOVD F28, 480(R1)
+ FMOVD F29, 488(R1)
+ FMOVD F30, 496(R1)
+ FMOVD F31, 504(R1)
+ MOVFL FPSCR, F0
+ FMOVD F0, 512(R1)
+ CALL ·asyncPreempt2(SB)
+ FMOVD 512(R1), F0
+ MOVFL F0, FPSCR
+ FMOVD 504(R1), F31
+ FMOVD 496(R1), F30
+ FMOVD 488(R1), F29
+ FMOVD 480(R1), F28
+ FMOVD 472(R1), F27
+ FMOVD 464(R1), F26
+ FMOVD 456(R1), F25
+ FMOVD 448(R1), F24
+ FMOVD 440(R1), F23
+ FMOVD 432(R1), F22
+ FMOVD 424(R1), F21
+ FMOVD 416(R1), F20
+ FMOVD 408(R1), F19
+ FMOVD 400(R1), F18
+ FMOVD 392(R1), F17
+ FMOVD 384(R1), F16
+ FMOVD 376(R1), F15
+ FMOVD 368(R1), F14
+ FMOVD 360(R1), F13
+ FMOVD 352(R1), F12
+ FMOVD 344(R1), F11
+ FMOVD 336(R1), F10
+ FMOVD 328(R1), F9
+ FMOVD 320(R1), F8
+ FMOVD 312(R1), F7
+ FMOVD 304(R1), F6
+ FMOVD 296(R1), F5
+ FMOVD 288(R1), F4
+ FMOVD 280(R1), F3
+ FMOVD 272(R1), F2
+ FMOVD 264(R1), F1
+ FMOVD 256(R1), F0
+ MOVD 248(R1), R31
+ MOVD R31, XER
+ MOVW 240(R1), R31
+ MOVFL R31, $0xff
+ MOVD 232(R1), R29
+ MOVD 224(R1), R28
+ MOVD 216(R1), R27
+ MOVD 208(R1), R26
+ MOVD 200(R1), R25
+ MOVD 192(R1), R24
+ MOVD 184(R1), R23
+ MOVD 176(R1), R22
+ MOVD 168(R1), R21
+ MOVD 160(R1), R20
+ MOVD 152(R1), R19
+ MOVD 144(R1), R18
+ MOVD 136(R1), R17
+ MOVD 128(R1), R16
+ MOVD 120(R1), R15
+ MOVD 112(R1), R14
+ MOVD 104(R1), R11
+ MOVD 96(R1), R10
+ MOVD 88(R1), R9
+ MOVD 80(R1), R8
+ MOVD 72(R1), R7
+ MOVD 64(R1), R6
+ MOVD 56(R1), R5
+ MOVD 48(R1), R4
+ MOVD 40(R1), R3
+ MOVD 520(R1), R31
+ MOVD R31, LR
+ MOVD 528(R1), R2
+ MOVD 536(R1), R12
+ MOVD (R1), R31
+ MOVD R31, CTR
+ MOVD 32(R1), R31
+ ADD $552, R1
+ JMP (CTR)
diff --git a/src/runtime/preempt_riscv64.s b/src/runtime/preempt_riscv64.s
new file mode 100644
index 0000000..d4f9cc2
--- /dev/null
+++ b/src/runtime/preempt_riscv64.s
@@ -0,0 +1,130 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// Note: asyncPreempt doesn't use the internal ABI, but we must be able to inject calls to it from the signal handler, so Go code has to see the PC of this function literally.
+TEXT ·asyncPreempt<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-0
+ MOV X1, -472(X2)
+ ADD $-472, X2
+ MOV X3, 8(X2)
+ MOV X5, 16(X2)
+ MOV X6, 24(X2)
+ MOV X7, 32(X2)
+ MOV X8, 40(X2)
+ MOV X9, 48(X2)
+ MOV X10, 56(X2)
+ MOV X11, 64(X2)
+ MOV X12, 72(X2)
+ MOV X13, 80(X2)
+ MOV X14, 88(X2)
+ MOV X15, 96(X2)
+ MOV X16, 104(X2)
+ MOV X17, 112(X2)
+ MOV X18, 120(X2)
+ MOV X19, 128(X2)
+ MOV X20, 136(X2)
+ MOV X21, 144(X2)
+ MOV X22, 152(X2)
+ MOV X23, 160(X2)
+ MOV X24, 168(X2)
+ MOV X25, 176(X2)
+ MOV X26, 184(X2)
+ MOV X28, 192(X2)
+ MOV X29, 200(X2)
+ MOV X30, 208(X2)
+ MOVD F0, 216(X2)
+ MOVD F1, 224(X2)
+ MOVD F2, 232(X2)
+ MOVD F3, 240(X2)
+ MOVD F4, 248(X2)
+ MOVD F5, 256(X2)
+ MOVD F6, 264(X2)
+ MOVD F7, 272(X2)
+ MOVD F8, 280(X2)
+ MOVD F9, 288(X2)
+ MOVD F10, 296(X2)
+ MOVD F11, 304(X2)
+ MOVD F12, 312(X2)
+ MOVD F13, 320(X2)
+ MOVD F14, 328(X2)
+ MOVD F15, 336(X2)
+ MOVD F16, 344(X2)
+ MOVD F17, 352(X2)
+ MOVD F18, 360(X2)
+ MOVD F19, 368(X2)
+ MOVD F20, 376(X2)
+ MOVD F21, 384(X2)
+ MOVD F22, 392(X2)
+ MOVD F23, 400(X2)
+ MOVD F24, 408(X2)
+ MOVD F25, 416(X2)
+ MOVD F26, 424(X2)
+ MOVD F27, 432(X2)
+ MOVD F28, 440(X2)
+ MOVD F29, 448(X2)
+ MOVD F30, 456(X2)
+ MOVD F31, 464(X2)
+ CALL ·asyncPreempt2(SB)
+ MOVD 464(X2), F31
+ MOVD 456(X2), F30
+ MOVD 448(X2), F29
+ MOVD 440(X2), F28
+ MOVD 432(X2), F27
+ MOVD 424(X2), F26
+ MOVD 416(X2), F25
+ MOVD 408(X2), F24
+ MOVD 400(X2), F23
+ MOVD 392(X2), F22
+ MOVD 384(X2), F21
+ MOVD 376(X2), F20
+ MOVD 368(X2), F19
+ MOVD 360(X2), F18
+ MOVD 352(X2), F17
+ MOVD 344(X2), F16
+ MOVD 336(X2), F15
+ MOVD 328(X2), F14
+ MOVD 320(X2), F13
+ MOVD 312(X2), F12
+ MOVD 304(X2), F11
+ MOVD 296(X2), F10
+ MOVD 288(X2), F9
+ MOVD 280(X2), F8
+ MOVD 272(X2), F7
+ MOVD 264(X2), F6
+ MOVD 256(X2), F5
+ MOVD 248(X2), F4
+ MOVD 240(X2), F3
+ MOVD 232(X2), F2
+ MOVD 224(X2), F1
+ MOVD 216(X2), F0
+ MOV 208(X2), X30
+ MOV 200(X2), X29
+ MOV 192(X2), X28
+ MOV 184(X2), X26
+ MOV 176(X2), X25
+ MOV 168(X2), X24
+ MOV 160(X2), X23
+ MOV 152(X2), X22
+ MOV 144(X2), X21
+ MOV 136(X2), X20
+ MOV 128(X2), X19
+ MOV 120(X2), X18
+ MOV 112(X2), X17
+ MOV 104(X2), X16
+ MOV 96(X2), X15
+ MOV 88(X2), X14
+ MOV 80(X2), X13
+ MOV 72(X2), X12
+ MOV 64(X2), X11
+ MOV 56(X2), X10
+ MOV 48(X2), X9
+ MOV 40(X2), X8
+ MOV 32(X2), X7
+ MOV 24(X2), X6
+ MOV 16(X2), X5
+ MOV 8(X2), X3
+ MOV 472(X2), X1
+ MOV (X2), X31
+ ADD $480, X2
+ JMP (X31)
diff --git a/src/runtime/preempt_s390x.s b/src/runtime/preempt_s390x.s
new file mode 100644
index 0000000..c6f1157
--- /dev/null
+++ b/src/runtime/preempt_s390x.s
@@ -0,0 +1,52 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// Note: asyncPreempt doesn't use the internal ABI, but we must be able to inject calls to it from the signal handler, so Go code has to see the PC of this function literally.
+TEXT ·asyncPreempt<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-0
+ IPM R10
+ MOVD R14, -248(R15)
+ ADD $-248, R15
+ MOVW R10, 8(R15)
+ STMG R0, R12, 16(R15)
+ FMOVD F0, 120(R15)
+ FMOVD F1, 128(R15)
+ FMOVD F2, 136(R15)
+ FMOVD F3, 144(R15)
+ FMOVD F4, 152(R15)
+ FMOVD F5, 160(R15)
+ FMOVD F6, 168(R15)
+ FMOVD F7, 176(R15)
+ FMOVD F8, 184(R15)
+ FMOVD F9, 192(R15)
+ FMOVD F10, 200(R15)
+ FMOVD F11, 208(R15)
+ FMOVD F12, 216(R15)
+ FMOVD F13, 224(R15)
+ FMOVD F14, 232(R15)
+ FMOVD F15, 240(R15)
+ CALL ·asyncPreempt2(SB)
+ FMOVD 240(R15), F15
+ FMOVD 232(R15), F14
+ FMOVD 224(R15), F13
+ FMOVD 216(R15), F12
+ FMOVD 208(R15), F11
+ FMOVD 200(R15), F10
+ FMOVD 192(R15), F9
+ FMOVD 184(R15), F8
+ FMOVD 176(R15), F7
+ FMOVD 168(R15), F6
+ FMOVD 160(R15), F5
+ FMOVD 152(R15), F4
+ FMOVD 144(R15), F3
+ FMOVD 136(R15), F2
+ FMOVD 128(R15), F1
+ FMOVD 120(R15), F0
+ LMG 16(R15), R0, R12
+ MOVD 248(R15), R14
+ ADD $256, R15
+ MOVWZ -248(R15), R10
+ TMLH R10, $(3<<12)
+ MOVD -256(R15), R10
+ JMP (R10)
diff --git a/src/runtime/preempt_wasm.s b/src/runtime/preempt_wasm.s
new file mode 100644
index 0000000..da90e8a
--- /dev/null
+++ b/src/runtime/preempt_wasm.s
@@ -0,0 +1,9 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// Note: asyncPreempt doesn't use the internal ABI, but we must be able to inject calls to it from the signal handler, so Go code has to see the PC of this function literally.
+TEXT ·asyncPreempt<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-0
+ // No async preemption on wasm
+ UNDEF
diff --git a/src/runtime/print.go b/src/runtime/print.go
new file mode 100644
index 0000000..64055a3
--- /dev/null
+++ b/src/runtime/print.go
@@ -0,0 +1,312 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// The compiler knows that a print of a value of this type
+// should use printhex instead of printuint (decimal).
+type hex uint64
+
+func bytes(s string) (ret []byte) {
+ rp := (*slice)(unsafe.Pointer(&ret))
+ sp := stringStructOf(&s)
+ rp.array = sp.str
+ rp.len = sp.len
+ rp.cap = sp.len
+ return
+}
+
+var (
+ // printBacklog is a circular buffer of messages written with the builtin
+ // print* functions, for use in postmortem analysis of core dumps.
+ printBacklog [512]byte
+ printBacklogIndex int
+)
+
+// recordForPanic maintains a circular buffer of messages written by the
+// runtime leading up to a process crash, allowing the messages to be
+// extracted from a core dump.
+//
+// The text written during a process crash (following "panic" or "fatal
+// error") is not saved, since the goroutine stacks will generally be readable
+// from the runtime data structures in the core file.
+func recordForPanic(b []byte) {
+ printlock()
+
+ if atomic.Load(&panicking) == 0 {
+ // Not actively crashing: maintain circular buffer of print output.
+ for i := 0; i < len(b); {
+ n := copy(printBacklog[printBacklogIndex:], b[i:])
+ i += n
+ printBacklogIndex += n
+ printBacklogIndex %= len(printBacklog)
+ }
+ }
+
+ printunlock()
+}
+
+var debuglock mutex
+
+// The compiler emits calls to printlock and printunlock around
+// the multiple calls that implement a single Go print or println
+// statement. Some of the print helpers (printslice, for example)
+// call print recursively. There is also the problem of a crash
+// happening during the print routines and needing to acquire
+// the print lock to print information about the crash.
+// For both these reasons, let a thread acquire the printlock 'recursively'.
+
+func printlock() {
+ mp := getg().m
+ mp.locks++ // do not reschedule between printlock++ and lock(&debuglock).
+ mp.printlock++
+ if mp.printlock == 1 {
+ lock(&debuglock)
+ }
+ mp.locks-- // now we know debuglock is held and holding up mp.locks for us.
+}
+
+func printunlock() {
+ mp := getg().m
+ mp.printlock--
+ if mp.printlock == 0 {
+ unlock(&debuglock)
+ }
+}
+
+// write to goroutine-local buffer if diverting output,
+// or else standard error.
+func gwrite(b []byte) {
+ if len(b) == 0 {
+ return
+ }
+ recordForPanic(b)
+ gp := getg()
+ // Don't use the writebuf if gp.m is dying. We want anything
+ // written through gwrite to appear in the terminal rather
+ // than be written to some buffer, if we're in a panicking state.
+ // Note that we can't just clear writebuf in the gp.m.dying case
+ // because a panic isn't allowed to have any write barriers.
+ if gp == nil || gp.writebuf == nil || gp.m.dying > 0 {
+ writeErr(b)
+ return
+ }
+
+ n := copy(gp.writebuf[len(gp.writebuf):cap(gp.writebuf)], b)
+ gp.writebuf = gp.writebuf[:len(gp.writebuf)+n]
+}
+
+func printsp() {
+ printstring(" ")
+}
+
+func printnl() {
+ printstring("\n")
+}
+
+func printbool(v bool) {
+ if v {
+ printstring("true")
+ } else {
+ printstring("false")
+ }
+}
+
+func printfloat(v float64) {
+ switch {
+ case v != v:
+ printstring("NaN")
+ return
+ case v+v == v && v > 0:
+ printstring("+Inf")
+ return
+ case v+v == v && v < 0:
+ printstring("-Inf")
+ return
+ }
+
+ const n = 7 // digits printed
+ var buf [n + 7]byte
+ buf[0] = '+'
+ e := 0 // exp
+ if v == 0 {
+ if 1/v < 0 {
+ buf[0] = '-'
+ }
+ } else {
+ if v < 0 {
+ v = -v
+ buf[0] = '-'
+ }
+
+ // normalize
+ for v >= 10 {
+ e++
+ v /= 10
+ }
+ for v < 1 {
+ e--
+ v *= 10
+ }
+
+ // round
+ h := 5.0
+ for i := 0; i < n; i++ {
+ h /= 10
+ }
+ v += h
+ if v >= 10 {
+ e++
+ v /= 10
+ }
+ }
+
+ // format: +d.dddddde±ddd (n mantissa digits, three exponent digits)
+ for i := 0; i < n; i++ {
+ s := int(v)
+ buf[i+2] = byte(s + '0')
+ v -= float64(s)
+ v *= 10
+ }
+ buf[1] = buf[2]
+ buf[2] = '.'
+
+ buf[n+2] = 'e'
+ buf[n+3] = '+'
+ if e < 0 {
+ e = -e
+ buf[n+3] = '-'
+ }
+
+ buf[n+4] = byte(e/100) + '0'
+ buf[n+5] = byte(e/10)%10 + '0'
+ buf[n+6] = byte(e%10) + '0'
+ gwrite(buf[:])
+}
+
+func printcomplex(c complex128) {
+ print("(", real(c), imag(c), "i)")
+}
+
+func printuint(v uint64) {
+ var buf [100]byte
+ i := len(buf)
+ for i--; i > 0; i-- {
+ buf[i] = byte(v%10 + '0')
+ if v < 10 {
+ break
+ }
+ v /= 10
+ }
+ gwrite(buf[i:])
+}
+
+func printint(v int64) {
+ if v < 0 {
+ printstring("-")
+ v = -v
+ }
+ printuint(uint64(v))
+}
+
+func printhex(v uint64) {
+ const dig = "0123456789abcdef"
+ var buf [100]byte
+ i := len(buf)
+ for i--; i > 0; i-- {
+ buf[i] = dig[v%16]
+ if v < 16 {
+ break
+ }
+ v /= 16
+ }
+ i--
+ buf[i] = 'x'
+ i--
+ buf[i] = '0'
+ gwrite(buf[i:])
+}
+
+func printpointer(p unsafe.Pointer) {
+ printhex(uint64(uintptr(p)))
+}
+func printuintptr(p uintptr) {
+ printhex(uint64(p))
+}
+
+func printstring(s string) {
+ gwrite(bytes(s))
+}
+
+func printslice(s []byte) {
+ sp := (*slice)(unsafe.Pointer(&s))
+ print("[", len(s), "/", cap(s), "]")
+ printpointer(sp.array)
+}
+
+func printeface(e eface) {
+ print("(", e._type, ",", e.data, ")")
+}
+
+func printiface(i iface) {
+ print("(", i.tab, ",", i.data, ")")
+}
+
+// hexdumpWords prints a word-oriented hex dump of [p, end).
+//
+// If mark != nil, it will be called with each printed word's address
+// and should return a character mark to appear just before that
+// word's value. It can return 0 to indicate no mark.
+func hexdumpWords(p, end uintptr, mark func(uintptr) byte) {
+ p1 := func(x uintptr) {
+ var buf [2 * sys.PtrSize]byte
+ for i := len(buf) - 1; i >= 0; i-- {
+ if x&0xF < 10 {
+ buf[i] = byte(x&0xF) + '0'
+ } else {
+ buf[i] = byte(x&0xF) - 10 + 'a'
+ }
+ x >>= 4
+ }
+ gwrite(buf[:])
+ }
+
+ printlock()
+ var markbuf [1]byte
+ markbuf[0] = ' '
+ for i := uintptr(0); p+i < end; i += sys.PtrSize {
+ if i%16 == 0 {
+ if i != 0 {
+ println()
+ }
+ p1(p + i)
+ print(": ")
+ }
+
+ if mark != nil {
+ markbuf[0] = mark(p + i)
+ if markbuf[0] == 0 {
+ markbuf[0] = ' '
+ }
+ }
+ gwrite(markbuf[:])
+ val := *(*uintptr)(unsafe.Pointer(p + i))
+ p1(val)
+ print(" ")
+
+ // Can we symbolize val?
+ fn := findfunc(val)
+ if fn.valid() {
+ print("<", funcname(fn), "+", val-fn.entry, "> ")
+ }
+ }
+ println()
+ printunlock()
+}
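hexdumpWords walks [p, end) one word at a time, starting a new address-prefixed row every 16 bytes and printing an optional one-character mark before each value. A rough user-level sketch of the same layout over a []uintptr slice instead of raw memory; dumpWords and its parameters are illustrative, not runtime API.

    package main

    import (
        "fmt"
        "unsafe"
    )

    // dumpWords prints words in the hexdumpWords layout: an address column
    // every 16 bytes of offset, then a one-character mark and the word's value.
    func dumpWords(words []uintptr, base uintptr, mark func(addr uintptr) byte) {
        const wordSize = unsafe.Sizeof(uintptr(0))
        for i, w := range words {
            off := uintptr(i) * wordSize
            if off%16 == 0 {
                if off != 0 {
                    fmt.Println()
                }
                fmt.Printf("%016x: ", base+off)
            }
            m := byte(' ')
            if mark != nil {
                if c := mark(base + off); c != 0 {
                    m = c
                }
            }
            fmt.Printf("%c%016x ", m, w)
        }
        fmt.Println()
    }

    func main() {
        words := []uintptr{0xdeadbeef, 0x1234, 0, 0xffffffff, 7}
        dumpWords(words, 0x1000, func(addr uintptr) byte {
            if addr == 0x1010 { // mark one word of interest
                return '>'
            }
            return 0
        })
    }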
diff --git a/src/runtime/proc.go b/src/runtime/proc.go
new file mode 100644
index 0000000..32fe877
--- /dev/null
+++ b/src/runtime/proc.go
@@ -0,0 +1,6336 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/bytealg"
+ "internal/cpu"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+var buildVersion = sys.TheVersion
+
+// set using cmd/go/internal/modload.ModInfoProg
+var modinfo string
+
+// Goroutine scheduler
+// The scheduler's job is to distribute ready-to-run goroutines over worker threads.
+//
+// The main concepts are:
+// G - goroutine.
+// M - worker thread, or machine.
+// P - processor, a resource that is required to execute Go code.
+// M must have an associated P to execute Go code, however it can be
+// blocked or in a syscall w/o an associated P.
+//
+// Design doc at https://golang.org/s/go11sched.
+
+// Worker thread parking/unparking.
+// We need to balance between keeping enough running worker threads to utilize
+// available hardware parallelism and parking excessive running worker threads
+// to conserve CPU resources and power. This is not simple for two reasons:
+// (1) scheduler state is intentionally distributed (in particular, per-P work
+// queues), so it is not possible to compute global predicates on fast paths;
+// (2) for optimal thread management we would need to know the future (don't park
+// a worker thread when a new goroutine will be readied in near future).
+//
+// Three rejected approaches that would work badly:
+// 1. Centralize all scheduler state (would inhibit scalability).
+// 2. Direct goroutine handoff. That is, when we ready a new goroutine and there
+// is a spare P, unpark a thread and hand it the P and the goroutine.
+// This would lead to thread state thrashing, as the thread that readied the
+// goroutine can be out of work the very next moment, and we would need to park it.
+// Also, it would destroy locality of computation as we want to preserve
+// dependent goroutines on the same thread; and introduce additional latency.
+// 3. Unpark an additional thread whenever we ready a goroutine and there is an
+// idle P, but don't do handoff. This would lead to excessive thread parking/
+// unparking as the additional threads will instantly park without discovering
+// any work to do.
+//
+// The current approach:
+// We unpark an additional thread when we ready a goroutine if there is an
+// idle P and there are no "spinning" worker threads. A worker thread is considered
+// spinning if it is out of local work and did not find work in global run queue/
+// netpoller; the spinning state is denoted in m.spinning and in sched.nmspinning.
+// Threads unparked this way are also considered spinning; we don't do goroutine
+// handoff so such threads are out of work initially. Spinning threads do some
+// spinning looking for work in per-P run queues before parking. If a spinning
+// thread finds work it takes itself out of the spinning state and proceeds to
+// execution. If it does not find work it takes itself out of the spinning state
+// and then parks.
+// If there is at least one spinning thread (sched.nmspinning>0), we don't unpark
+// new threads when readying goroutines. To compensate for that, if the last spinning
+// thread finds work and stops spinning, it must unpark a new spinning thread.
+// This approach smooths out unjustified spikes of thread unparking,
+// but at the same time guarantees eventual maximal CPU parallelism utilization.
+//
+// The main implementation complication is that we need to be very careful during
+// spinning->non-spinning thread transition. This transition can race with submission
+// of a new goroutine, and either one part or another needs to unpark another worker
+// thread. If they both fail to do that, we can end up with semi-persistent CPU
+// underutilization. The general pattern for goroutine readying is: submit a goroutine
+// to local work queue, #StoreLoad-style memory barrier, check sched.nmspinning.
+// The general pattern for spinning->non-spinning transition is: decrement nmspinning,
+// #StoreLoad-style memory barrier, check all per-P work queues for new work.
+// Note that all this complexity does not apply to global run queue as we are not
+// sloppy about thread unparking when submitting to global queue. Also see comments
+// for nmspinning manipulation.
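The spinning handshake described above boils down to two mirrored sequences: the submitter publishes work and then checks the spinning count, while a worker leaving the spinning state decrements the count and then re-checks for work. A hedged sketch of that shape with sync/atomic, where pending, spinning and wake are stand-ins for the run queues, sched.nmspinning and thread parking; the atomics here stand in for the #StoreLoad-style barriers.

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    var (
        pending  int64 // stand-in for the per-P run queues
        spinning int32 // stand-in for sched.nmspinning
        wake     = make(chan struct{}, 1)
    )

    // readyWork is the submitter side: publish the work first, then check for
    // spinning workers and wake one only if there are none.
    func readyWork() {
        atomic.AddInt64(&pending, 1)
        if atomic.LoadInt32(&spinning) == 0 {
            select {
            case wake <- struct{}{}:
            default:
            }
        }
    }

    // stopSpinning is the worker side of the spinning->non-spinning transition:
    // leave the spinning state first, then re-check for work that may have been
    // submitted in the race window; if any is found, the worker must not park.
    func stopSpinning() (mustRecheck bool) {
        atomic.AddInt32(&spinning, -1)
        return atomic.LoadInt64(&pending) > 0
    }

    func main() {
        atomic.AddInt32(&spinning, 1) // one spinning worker
        readyWork()                   // submitter sees a spinner, so no wakeup is sent
        if stopSpinning() {
            fmt.Println("worker found work submitted during the race window")
        }
    }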
+
+var (
+ m0 m
+ g0 g
+ mcache0 *mcache
+ raceprocctx0 uintptr
+)
+
+//go:linkname runtime_inittask runtime..inittask
+var runtime_inittask initTask
+
+//go:linkname main_inittask main..inittask
+var main_inittask initTask
+
+// main_init_done is a signal used by cgocallbackg that initialization
+// has been completed. It is made before _cgo_notify_runtime_init_done,
+// so all cgo calls can rely on it existing. When main_init is complete,
+// it is closed, meaning cgocallbackg can reliably receive from it.
+var main_init_done chan bool
+
+//go:linkname main_main main.main
+func main_main()
+
+// mainStarted indicates that the main M has started.
+var mainStarted bool
+
+// runtimeInitTime is the nanotime() at which the runtime started.
+var runtimeInitTime int64
+
+// Value to use for signal mask for newly created M's.
+var initSigmask sigset
+
+// The main goroutine.
+func main() {
+ g := getg()
+
+ // Racectx of m0->g0 is used only as the parent of the main goroutine.
+ // It must not be used for anything else.
+ g.m.g0.racectx = 0
+
+ // Max stack size is 1 GB on 64-bit, 250 MB on 32-bit.
+ // Using decimal instead of binary GB and MB because
+ // they look nicer in the stack overflow failure message.
+ if sys.PtrSize == 8 {
+ maxstacksize = 1000000000
+ } else {
+ maxstacksize = 250000000
+ }
+
+ // An upper limit for max stack size. Used to avoid random crashes
+ // after calling SetMaxStack and trying to allocate a stack that is too big,
+ // since stackalloc works with 32-bit sizes.
+ maxstackceiling = 2 * maxstacksize
+
+ // Allow newproc to start new Ms.
+ mainStarted = true
+
+ if GOARCH != "wasm" { // no threads on wasm yet, so no sysmon
+ // For runtime_syscall_doAllThreadsSyscall, we
+ // register that sysmon is not ready for the world to be
+ // stopped.
+ atomic.Store(&sched.sysmonStarting, 1)
+ systemstack(func() {
+ newm(sysmon, nil, -1)
+ })
+ }
+
+ // Lock the main goroutine onto this, the main OS thread,
+ // during initialization. Most programs won't care, but a few
+ // do require certain calls to be made by the main thread.
+ // Those can arrange for main.main to run in the main thread
+ // by calling runtime.LockOSThread during initialization
+ // to preserve the lock.
+ lockOSThread()
+
+ if g.m != &m0 {
+ throw("runtime.main not on m0")
+ }
+ m0.doesPark = true
+
+ // Record when the world started.
+ // Must be before doInit for tracing init.
+ runtimeInitTime = nanotime()
+ if runtimeInitTime == 0 {
+ throw("nanotime returning zero")
+ }
+
+ if debug.inittrace != 0 {
+ inittrace.id = getg().goid
+ inittrace.active = true
+ }
+
+ doInit(&runtime_inittask) // Must be before defer.
+
+ // Defer unlock so that runtime.Goexit during init does the unlock too.
+ needUnlock := true
+ defer func() {
+ if needUnlock {
+ unlockOSThread()
+ }
+ }()
+
+ gcenable()
+
+ main_init_done = make(chan bool)
+ if iscgo {
+ if _cgo_thread_start == nil {
+ throw("_cgo_thread_start missing")
+ }
+ if GOOS != "windows" {
+ if _cgo_setenv == nil {
+ throw("_cgo_setenv missing")
+ }
+ if _cgo_unsetenv == nil {
+ throw("_cgo_unsetenv missing")
+ }
+ }
+ if _cgo_notify_runtime_init_done == nil {
+ throw("_cgo_notify_runtime_init_done missing")
+ }
+ // Start the template thread in case we enter Go from
+ // a C-created thread and need to create a new thread.
+ startTemplateThread()
+ cgocall(_cgo_notify_runtime_init_done, nil)
+ }
+
+ doInit(&main_inittask)
+
+ // Disable init tracing after main init is done to avoid the overhead
+ // of collecting statistics in malloc and newproc.
+ inittrace.active = false
+
+ close(main_init_done)
+
+ needUnlock = false
+ unlockOSThread()
+
+ if isarchive || islibrary {
+ // A program compiled with -buildmode=c-archive or c-shared
+ // has a main, but it is not executed.
+ return
+ }
+ fn := main_main // make an indirect call, as the linker doesn't know the address of the main package when laying down the runtime
+ fn()
+ if raceenabled {
+ racefini()
+ }
+
+ // Make racy client program work: if panicking on
+ // another goroutine at the same time as main returns,
+ // let the other goroutine finish printing the panic trace.
+ // Once it does, it will exit. See issues 3934 and 20018.
+ if atomic.Load(&runningPanicDefers) != 0 {
+ // Running deferred functions should not take long.
+ for c := 0; c < 1000; c++ {
+ if atomic.Load(&runningPanicDefers) == 0 {
+ break
+ }
+ Gosched()
+ }
+ }
+ if atomic.Load(&panicking) != 0 {
+ gopark(nil, nil, waitReasonPanicWait, traceEvGoStop, 1)
+ }
+
+ exit(0)
+ for {
+ var x *int32
+ *x = 0
+ }
+}
+
+// os_beforeExit is called from os.Exit(0).
+//go:linkname os_beforeExit os.runtime_beforeExit
+func os_beforeExit() {
+ if raceenabled {
+ racefini()
+ }
+}
+
+// start forcegc helper goroutine
+func init() {
+ go forcegchelper()
+}
+
+func forcegchelper() {
+ forcegc.g = getg()
+ lockInit(&forcegc.lock, lockRankForcegc)
+ for {
+ lock(&forcegc.lock)
+ if forcegc.idle != 0 {
+ throw("forcegc: phase error")
+ }
+ atomic.Store(&forcegc.idle, 1)
+ goparkunlock(&forcegc.lock, waitReasonForceGCIdle, traceEvGoBlock, 1)
+ // this goroutine is explicitly resumed by sysmon
+ if debug.gctrace > 0 {
+ println("GC forced")
+ }
+ // Time-triggered, fully concurrent.
+ gcStart(gcTrigger{kind: gcTriggerTime, now: nanotime()})
+ }
+}
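forcegchelper follows a common shape: mark yourself idle, park, and perform the periodic action when an external monitor wakes you. A tiny user-level analogue, with a channel standing in for goparkunlock/notewakeup and a monitor function standing in for sysmon; all names here are illustrative.

    package main

    import (
        "fmt"
        "sync/atomic"
        "time"
    )

    var (
        idle int32
        kick = make(chan struct{}, 1)
    )

    // helper parks until kicked, like forcegchelper parking until sysmon wakes it.
    func helper(done chan<- struct{}) {
        for {
            atomic.StoreInt32(&idle, 1)
            _, ok := <-kick // "park" until the monitor wakes us
            if !ok {
                close(done)
                return
            }
            fmt.Println("periodic work forced")
        }
    }

    // monitor plays the role of sysmon: wake the helper only if it is idle.
    func monitor() {
        if atomic.CompareAndSwapInt32(&idle, 1, 0) {
            kick <- struct{}{}
        }
    }

    func main() {
        done := make(chan struct{})
        go helper(done)
        time.Sleep(10 * time.Millisecond) // let the helper park
        monitor()
        time.Sleep(10 * time.Millisecond)
        close(kick)
        <-done
    }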
+
+//go:nosplit
+
+// Gosched yields the processor, allowing other goroutines to run. It does not
+// suspend the current goroutine, so execution resumes automatically.
+func Gosched() {
+ checkTimeouts()
+ mcall(gosched_m)
+}
+
+// goschedguarded yields the processor like gosched, but also checks
+// for forbidden states and opts out of the yield in those cases.
+//go:nosplit
+func goschedguarded() {
+ mcall(goschedguarded_m)
+}
+
+// Puts the current goroutine into a waiting state and calls unlockf on the
+// system stack.
+//
+// If unlockf returns false, the goroutine is resumed.
+//
+// unlockf must not access this G's stack, as it may be moved between
+// the call to gopark and the call to unlockf.
+//
+// Note that because unlockf is called after putting the G into a waiting
+// state, the G may have already been readied by the time unlockf is called
+// unless there is external synchronization preventing the G from being
+// readied. If unlockf returns false, it must guarantee that the G cannot be
+// externally readied.
+//
+// Reason explains why the goroutine has been parked. It is displayed in stack
+// traces and heap dumps. Reasons should be unique and descriptive. Do not
+// re-use reasons; add new ones.
+func gopark(unlockf func(*g, unsafe.Pointer) bool, lock unsafe.Pointer, reason waitReason, traceEv byte, traceskip int) {
+ if reason != waitReasonSleep {
+ checkTimeouts() // timeouts may expire while two goroutines keep the scheduler busy
+ }
+ mp := acquirem()
+ gp := mp.curg
+ status := readgstatus(gp)
+ if status != _Grunning && status != _Gscanrunning {
+ throw("gopark: bad g status")
+ }
+ mp.waitlock = lock
+ mp.waitunlockf = unlockf
+ gp.waitreason = reason
+ mp.waittraceev = traceEv
+ mp.waittraceskip = traceskip
+ releasem(mp)
+ // can't do anything that might move the G between Ms here.
+ mcall(park_m)
+}
+
+// Puts the current goroutine into a waiting state and unlocks the lock.
+// The goroutine can be made runnable again by calling goready(gp).
+func goparkunlock(lock *mutex, reason waitReason, traceEv byte, traceskip int) {
+ gopark(parkunlock_c, unsafe.Pointer(lock), reason, traceEv, traceskip)
+}
+
+func goready(gp *g, traceskip int) {
+ systemstack(func() {
+ ready(gp, traceskip, true)
+ })
+}
+
+//go:nosplit
+func acquireSudog() *sudog {
+ // Delicate dance: the semaphore implementation calls
+ // acquireSudog, acquireSudog calls new(sudog),
+ // new calls malloc, malloc can call the garbage collector,
+ // and the garbage collector calls the semaphore implementation
+ // in stopTheWorld.
+ // Break the cycle by doing acquirem/releasem around new(sudog).
+ // The acquirem/releasem increments m.locks during new(sudog),
+ // which keeps the garbage collector from being invoked.
+ mp := acquirem()
+ pp := mp.p.ptr()
+ if len(pp.sudogcache) == 0 {
+ lock(&sched.sudoglock)
+ // First, try to grab a batch from central cache.
+ for len(pp.sudogcache) < cap(pp.sudogcache)/2 && sched.sudogcache != nil {
+ s := sched.sudogcache
+ sched.sudogcache = s.next
+ s.next = nil
+ pp.sudogcache = append(pp.sudogcache, s)
+ }
+ unlock(&sched.sudoglock)
+ // If the central cache is empty, allocate a new one.
+ if len(pp.sudogcache) == 0 {
+ pp.sudogcache = append(pp.sudogcache, new(sudog))
+ }
+ }
+ n := len(pp.sudogcache)
+ s := pp.sudogcache[n-1]
+ pp.sudogcache[n-1] = nil
+ pp.sudogcache = pp.sudogcache[:n-1]
+ if s.elem != nil {
+ throw("acquireSudog: found s.elem != nil in cache")
+ }
+ releasem(mp)
+ return s
+}
+
+//go:nosplit
+func releaseSudog(s *sudog) {
+ if s.elem != nil {
+ throw("runtime: sudog with non-nil elem")
+ }
+ if s.isSelect {
+ throw("runtime: sudog with non-false isSelect")
+ }
+ if s.next != nil {
+ throw("runtime: sudog with non-nil next")
+ }
+ if s.prev != nil {
+ throw("runtime: sudog with non-nil prev")
+ }
+ if s.waitlink != nil {
+ throw("runtime: sudog with non-nil waitlink")
+ }
+ if s.c != nil {
+ throw("runtime: sudog with non-nil c")
+ }
+ gp := getg()
+ if gp.param != nil {
+ throw("runtime: releaseSudog with non-nil gp.param")
+ }
+ mp := acquirem() // avoid rescheduling to another P
+ pp := mp.p.ptr()
+ if len(pp.sudogcache) == cap(pp.sudogcache) {
+ // Transfer half of local cache to the central cache.
+ var first, last *sudog
+ for len(pp.sudogcache) > cap(pp.sudogcache)/2 {
+ n := len(pp.sudogcache)
+ p := pp.sudogcache[n-1]
+ pp.sudogcache[n-1] = nil
+ pp.sudogcache = pp.sudogcache[:n-1]
+ if first == nil {
+ first = p
+ } else {
+ last.next = p
+ }
+ last = p
+ }
+ lock(&sched.sudoglock)
+ last.next = sched.sudogcache
+ sched.sudogcache = first
+ unlock(&sched.sudoglock)
+ }
+ pp.sudogcache = append(pp.sudogcache, s)
+ releasem(mp)
+}
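acquireSudog and releaseSudog form a two-level free list: a per-P slice that is refilled from, and spilled back to, a linked central list guarded by a global lock. A compact sketch of the same shape for an arbitrary node type; localCache and central are made-up names, and the spill below links nodes one at a time under the lock rather than building the chain first as releaseSudog does.

    package main

    import (
        "fmt"
        "sync"
    )

    type node struct{ next *node }

    type central struct {
        mu   sync.Mutex
        head *node
    }

    // localCache is the per-worker cache, like p.sudogcache. No lock is needed
    // as long as each worker owns exactly one localCache.
    type localCache struct {
        items []*node
        c     *central
    }

    func (l *localCache) get() *node {
        if len(l.items) == 0 {
            // Refill half the local capacity from the central list.
            l.c.mu.Lock()
            for len(l.items) < cap(l.items)/2 && l.c.head != nil {
                n := l.c.head
                l.c.head = n.next
                n.next = nil
                l.items = append(l.items, n)
            }
            l.c.mu.Unlock()
            if len(l.items) == 0 {
                l.items = append(l.items, new(node))
            }
        }
        n := l.items[len(l.items)-1]
        l.items = l.items[:len(l.items)-1]
        return n
    }

    func (l *localCache) put(n *node) {
        if len(l.items) == cap(l.items) {
            // Spill half of the local cache back to the central list.
            l.c.mu.Lock()
            for len(l.items) > cap(l.items)/2 {
                m := l.items[len(l.items)-1]
                l.items = l.items[:len(l.items)-1]
                m.next = l.c.head
                l.c.head = m
            }
            l.c.mu.Unlock()
        }
        l.items = append(l.items, n)
    }

    func main() {
        c := &central{}
        l := &localCache{items: make([]*node, 0, 4), c: c}
        n := l.get()
        l.put(n)
        fmt.Println("cached nodes:", len(l.items))
    }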
+
+// funcPC returns the entry PC of the function f.
+// It assumes that f is a func value. Otherwise the behavior is undefined.
+// CAREFUL: In programs with plugins, funcPC can return different values
+// for the same function (because there are actually multiple copies of
+// the same function in the address space). To be safe, don't use the
+// results of this function in any == expression. It is only safe to
+// use the result as an address at which to start executing code.
+//go:nosplit
+func funcPC(f interface{}) uintptr {
+ return *(*uintptr)(efaceOf(&f).data)
+}
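funcPC reads the data word of the interface value, which points at the func value, whose first word is the entry PC. A user-level sketch of the same trick with package unsafe; it assumes the standard eface and func value layouts and carries the same plugin caveat as the comment above (funcEntry is a made-up name).

    package main

    import (
        "fmt"
        "unsafe"
    )

    // funcEntry mirrors runtime.funcPC: read the data word of the interface,
    // then the first word of the func value it points to, which is the entry PC.
    func funcEntry(f interface{}) uintptr {
        type eface struct {
            typ, data unsafe.Pointer
        }
        e := (*eface)(unsafe.Pointer(&f))
        return *(*uintptr)(e.data)
    }

    func hello() { fmt.Println("hello") }

    func main() {
        fmt.Printf("entry PC of hello: %#x\n", funcEntry(hello))
    }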
+
+// called from assembly
+func badmcall(fn func(*g)) {
+ throw("runtime: mcall called on m->g0 stack")
+}
+
+func badmcall2(fn func(*g)) {
+ throw("runtime: mcall function returned")
+}
+
+func badreflectcall() {
+ panic(plainError("arg size to reflect.call more than 1GB"))
+}
+
+var badmorestackg0Msg = "fatal: morestack on g0\n"
+
+//go:nosplit
+//go:nowritebarrierrec
+func badmorestackg0() {
+ sp := stringStructOf(&badmorestackg0Msg)
+ write(2, sp.str, int32(sp.len))
+}
+
+var badmorestackgsignalMsg = "fatal: morestack on gsignal\n"
+
+//go:nosplit
+//go:nowritebarrierrec
+func badmorestackgsignal() {
+ sp := stringStructOf(&badmorestackgsignalMsg)
+ write(2, sp.str, int32(sp.len))
+}
+
+//go:nosplit
+func badctxt() {
+ throw("ctxt != 0")
+}
+
+func lockedOSThread() bool {
+ gp := getg()
+ return gp.lockedm != 0 && gp.m.lockedg != 0
+}
+
+var (
+ // allgs contains all Gs ever created (including dead Gs), and thus
+ // never shrinks.
+ //
+ // Access via the slice is protected by allglock or stop-the-world.
+ // Readers that cannot take the lock may (carefully!) use the atomic
+ // variables below.
+ allglock mutex
+ allgs []*g
+
+ // allglen and allgptr are atomic variables that contain len(allg) and
+ // &allg[0] respectively. Proper ordering depends on totally-ordered
+ // loads and stores. Writes are protected by allglock.
+ //
+ // allgptr is updated before allglen. Readers should read allglen
+ // before allgptr to ensure that allglen is always <= len(allgptr). New
+ // Gs appended during the race can be missed. For a consistent view of
+ // all Gs, allglock must be held.
+ //
+ // allgptr copies should always be stored as a concrete type or
+ // unsafe.Pointer, not uintptr, to ensure that GC can still reach it
+ // even if it points to a stale array.
+ allglen uintptr
+ allgptr **g
+)
+
+func allgadd(gp *g) {
+ if readgstatus(gp) == _Gidle {
+ throw("allgadd: bad status Gidle")
+ }
+
+ lock(&allglock)
+ allgs = append(allgs, gp)
+ if &allgs[0] != allgptr {
+ atomicstorep(unsafe.Pointer(&allgptr), unsafe.Pointer(&allgs[0]))
+ }
+ atomic.Storeuintptr(&allglen, uintptr(len(allgs)))
+ unlock(&allglock)
+}
+
+// atomicAllG returns &allgs[0] and len(allgs) for use with atomicAllGIndex.
+func atomicAllG() (**g, uintptr) {
+ length := atomic.Loaduintptr(&allglen)
+ ptr := (**g)(atomic.Loadp(unsafe.Pointer(&allgptr)))
+ return ptr, length
+}
+
+// atomicAllGIndex returns ptr[i] with the allgptr returned from atomicAllG.
+func atomicAllGIndex(ptr **g, i uintptr) *g {
+ return *(**g)(add(unsafe.Pointer(ptr), i*sys.PtrSize))
+}
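The protocol in the comment above is: writers publish the new backing-array pointer before the new length, and readers load the length before the pointer, so the length they observe never exceeds the array they dereference. A sketch of that ordering for an append-only []int using sync/atomic; the variable names are stand-ins for allgs/allgptr/allglen, and unsafe.Slice requires Go 1.17 or later.

    package main

    import (
        "fmt"
        "sync"
        "sync/atomic"
        "unsafe"
    )

    var (
        mu   sync.Mutex     // serializes writers, like allglock
        all  []int          // the slice itself, like allgs
        aptr unsafe.Pointer // *int: &all[0], like allgptr
        alen uintptr        // len(all), like allglen
    )

    // appendOne publishes the array pointer before the length, so lock-free
    // readers never see a length that outruns the array they load.
    func appendOne(v int) {
        mu.Lock()
        all = append(all, v)
        atomic.StorePointer(&aptr, unsafe.Pointer(&all[0]))
        atomic.StoreUintptr(&alen, uintptr(len(all)))
        mu.Unlock()
    }

    // snapshot is a lock-free reader: length first, then pointer. Elements
    // appended during the race may be missed, as the runtime comment notes.
    func snapshot() []int {
        n := atomic.LoadUintptr(&alen)
        p := atomic.LoadPointer(&aptr)
        if p == nil {
            return nil
        }
        return unsafe.Slice((*int)(p), n) // needs Go 1.17+
    }

    func main() {
        for i := 0; i < 5; i++ {
            appendOne(i)
        }
        fmt.Println(snapshot()) // [0 1 2 3 4]
    }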
+
+const (
+ // Number of goroutine ids to grab from sched.goidgen to local per-P cache at once.
+ // 16 seems to provide enough amortization, but other than that it's a mostly arbitrary number.
+ _GoidCacheBatch = 16
+)
+
+// cpuinit extracts the environment variable GODEBUG from the environment on
+// Unix-like operating systems and calls internal/cpu.Initialize.
+func cpuinit() {
+ const prefix = "GODEBUG="
+ var env string
+
+ switch GOOS {
+ case "aix", "darwin", "ios", "dragonfly", "freebsd", "netbsd", "openbsd", "illumos", "solaris", "linux":
+ cpu.DebugOptions = true
+
+ // Similar to goenv_unix but extracts the environment value for
+ // GODEBUG directly.
+ // TODO(moehrmann): remove when general goenvs() can be called before cpuinit()
+ n := int32(0)
+ for argv_index(argv, argc+1+n) != nil {
+ n++
+ }
+
+ for i := int32(0); i < n; i++ {
+ p := argv_index(argv, argc+1+i)
+ s := *(*string)(unsafe.Pointer(&stringStruct{unsafe.Pointer(p), findnull(p)}))
+
+ if hasPrefix(s, prefix) {
+ env = gostring(p)[len(prefix):]
+ break
+ }
+ }
+ }
+
+ cpu.Initialize(env)
+
+ // These cpu feature support variables are used in code generated by the compiler
+ // to guard execution of instructions that cannot be assumed to be always supported.
+ x86HasPOPCNT = cpu.X86.HasPOPCNT
+ x86HasSSE41 = cpu.X86.HasSSE41
+ x86HasFMA = cpu.X86.HasFMA
+
+ armHasVFPv4 = cpu.ARM.HasVFPv4
+
+ arm64HasATOMICS = cpu.ARM64.HasATOMICS
+}
+
+// The bootstrap sequence is:
+//
+// call osinit
+// call schedinit
+// make & queue new G
+// call runtime·mstart
+//
+// The new G calls runtime·main.
+func schedinit() {
+ lockInit(&sched.lock, lockRankSched)
+ lockInit(&sched.sysmonlock, lockRankSysmon)
+ lockInit(&sched.deferlock, lockRankDefer)
+ lockInit(&sched.sudoglock, lockRankSudog)
+ lockInit(&deadlock, lockRankDeadlock)
+ lockInit(&paniclk, lockRankPanic)
+ lockInit(&allglock, lockRankAllg)
+ lockInit(&allpLock, lockRankAllp)
+ lockInit(&reflectOffs.lock, lockRankReflectOffs)
+ lockInit(&finlock, lockRankFin)
+ lockInit(&trace.bufLock, lockRankTraceBuf)
+ lockInit(&trace.stringsLock, lockRankTraceStrings)
+ lockInit(&trace.lock, lockRankTrace)
+ lockInit(&cpuprof.lock, lockRankCpuprof)
+ lockInit(&trace.stackTab.lock, lockRankTraceStackTab)
+ // Enforce that this lock is always a leaf lock.
+ // All of this lock's critical sections should be
+ // extremely short.
+ lockInit(&memstats.heapStats.noPLock, lockRankLeafRank)
+
+ // raceinit must be the first call to race detector.
+ // In particular, it must be done before mallocinit below calls racemapshadow.
+ _g_ := getg()
+ if raceenabled {
+ _g_.racectx, raceprocctx0 = raceinit()
+ }
+
+ sched.maxmcount = 10000
+
+ // The world starts stopped.
+ worldStopped()
+
+ moduledataverify()
+ stackinit()
+ mallocinit()
+ fastrandinit() // must run before mcommoninit
+ mcommoninit(_g_.m, -1)
+ cpuinit() // must run before alginit
+ alginit() // maps must not be used before this call
+ modulesinit() // provides activeModules
+ typelinksinit() // uses maps, activeModules
+ itabsinit() // uses activeModules
+
+ sigsave(&_g_.m.sigmask)
+ initSigmask = _g_.m.sigmask
+
+ goargs()
+ goenvs()
+ parsedebugvars()
+ gcinit()
+
+ lock(&sched.lock)
+ sched.lastpoll = uint64(nanotime())
+ procs := ncpu
+ if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
+ procs = n
+ }
+ if procresize(procs) != nil {
+ throw("unknown runnable goroutine during bootstrap")
+ }
+ unlock(&sched.lock)
+
+ // World is effectively started now, as P's can run.
+ worldStarted()
+
+ // For cgocheck > 1, we turn on the write barrier at all times
+ // and check all pointer writes. We can't do this until after
+ // procresize because the write barrier needs a P.
+ if debug.cgocheck > 1 {
+ writeBarrier.cgo = true
+ writeBarrier.enabled = true
+ for _, p := range allp {
+ p.wbBuf.reset()
+ }
+ }
+
+ if buildVersion == "" {
+ // Condition should never trigger. This code just serves
+ // to ensure runtime·buildVersion is kept in the resulting binary.
+ buildVersion = "unknown"
+ }
+ if len(modinfo) == 1 {
+ // Condition should never trigger. This code just serves
+ // to ensure runtime·modinfo is kept in the resulting binary.
+ modinfo = ""
+ }
+}
+
+func dumpgstatus(gp *g) {
+ _g_ := getg()
+ print("runtime: gp: gp=", gp, ", goid=", gp.goid, ", gp->atomicstatus=", readgstatus(gp), "\n")
+ print("runtime: g: g=", _g_, ", goid=", _g_.goid, ", g->atomicstatus=", readgstatus(_g_), "\n")
+}
+
+// sched.lock must be held.
+func checkmcount() {
+ assertLockHeld(&sched.lock)
+
+ if mcount() > sched.maxmcount {
+ print("runtime: program exceeds ", sched.maxmcount, "-thread limit\n")
+ throw("thread exhaustion")
+ }
+}
+
+// mReserveID returns the next ID to use for a new m. This new m is immediately
+// considered 'running' by checkdead.
+//
+// sched.lock must be held.
+func mReserveID() int64 {
+ assertLockHeld(&sched.lock)
+
+ if sched.mnext+1 < sched.mnext {
+ throw("runtime: thread ID overflow")
+ }
+ id := sched.mnext
+ sched.mnext++
+ checkmcount()
+ return id
+}
+
+// Pre-allocated ID may be passed as 'id', or omitted by passing -1.
+func mcommoninit(mp *m, id int64) {
+ _g_ := getg()
+
+ // g0 stack won't make sense for the user (and is not necessarily unwindable).
+ if _g_ != _g_.m.g0 {
+ callers(1, mp.createstack[:])
+ }
+
+ lock(&sched.lock)
+
+ if id >= 0 {
+ mp.id = id
+ } else {
+ mp.id = mReserveID()
+ }
+
+ mp.fastrand[0] = uint32(int64Hash(uint64(mp.id), fastrandseed))
+ mp.fastrand[1] = uint32(int64Hash(uint64(cputicks()), ^fastrandseed))
+ if mp.fastrand[0]|mp.fastrand[1] == 0 {
+ mp.fastrand[1] = 1
+ }
+
+ mpreinit(mp)
+ if mp.gsignal != nil {
+ mp.gsignal.stackguard1 = mp.gsignal.stack.lo + _StackGuard
+ }
+
+ // Add to allm so garbage collector doesn't free g->m
+ // when it is just in a register or thread-local storage.
+ mp.alllink = allm
+
+ // NumCgoCall() iterates over allm w/o schedlock,
+ // so we need to publish it safely.
+ atomicstorep(unsafe.Pointer(&allm), unsafe.Pointer(mp))
+ unlock(&sched.lock)
+
+ // Allocate memory to hold a cgo traceback if the cgo call crashes.
+ if iscgo || GOOS == "solaris" || GOOS == "illumos" || GOOS == "windows" {
+ mp.cgoCallers = new(cgoCallers)
+ }
+}
+
+var fastrandseed uintptr
+
+func fastrandinit() {
+ s := (*[unsafe.Sizeof(fastrandseed)]byte)(unsafe.Pointer(&fastrandseed))[:]
+ getRandomData(s)
+}
+
+// Mark gp ready to run.
+func ready(gp *g, traceskip int, next bool) {
+ if trace.enabled {
+ traceGoUnpark(gp, traceskip)
+ }
+
+ status := readgstatus(gp)
+
+ // Mark runnable.
+ _g_ := getg()
+ mp := acquirem() // disable preemption because it can be holding p in a local var
+ if status&^_Gscan != _Gwaiting {
+ dumpgstatus(gp)
+ throw("bad g->status in ready")
+ }
+
+ // status is Gwaiting or Gscanwaiting, make Grunnable and put on runq
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ runqput(_g_.m.p.ptr(), gp, next)
+ wakep()
+ releasem(mp)
+}
+
+// freezeStopWait is a large value that freezetheworld sets
+// sched.stopwait to in order to request that all Gs permanently stop.
+const freezeStopWait = 0x7fffffff
+
+// freezing is set to non-zero if the runtime is trying to freeze the
+// world.
+var freezing uint32
+
+// Similar to stopTheWorld but best-effort and can be called several times.
+// There is no reverse operation; it is used during crashing.
+// This function must not lock any mutexes.
+func freezetheworld() {
+ atomic.Store(&freezing, 1)
+ // stopwait and preemption requests can be lost
+ // due to races with concurrently executing threads,
+ // so try several times
+ for i := 0; i < 5; i++ {
+ // this should tell the scheduler to not start any new goroutines
+ sched.stopwait = freezeStopWait
+ atomic.Store(&sched.gcwaiting, 1)
+ // this should stop running goroutines
+ if !preemptall() {
+ break // no running goroutines
+ }
+ usleep(1000)
+ }
+ // to be sure
+ usleep(1000)
+ preemptall()
+ usleep(1000)
+}
+
+// All reads and writes of g's status go through readgstatus, casgstatus,
+// castogscanstatus, and casfrom_Gscanstatus.
+//go:nosplit
+func readgstatus(gp *g) uint32 {
+ return atomic.Load(&gp.atomicstatus)
+}
+
+// The Gscanstatuses are acting like locks and this releases them.
+// If it proves to be a performance hit we should be able to make these
+// simple atomic stores but for now we are going to throw if
+// we see an inconsistent state.
+func casfrom_Gscanstatus(gp *g, oldval, newval uint32) {
+ success := false
+
+ // Check that transition is valid.
+ switch oldval {
+ default:
+ print("runtime: casfrom_Gscanstatus bad oldval gp=", gp, ", oldval=", hex(oldval), ", newval=", hex(newval), "\n")
+ dumpgstatus(gp)
+ throw("casfrom_Gscanstatus:top gp->status is not in scan state")
+ case _Gscanrunnable,
+ _Gscanwaiting,
+ _Gscanrunning,
+ _Gscansyscall,
+ _Gscanpreempted:
+ if newval == oldval&^_Gscan {
+ success = atomic.Cas(&gp.atomicstatus, oldval, newval)
+ }
+ }
+ if !success {
+ print("runtime: casfrom_Gscanstatus failed gp=", gp, ", oldval=", hex(oldval), ", newval=", hex(newval), "\n")
+ dumpgstatus(gp)
+ throw("casfrom_Gscanstatus: gp->status is not in scan state")
+ }
+ releaseLockRank(lockRankGscan)
+}
+
+// This will return false if the gp is not in the expected status and the cas fails.
+// This acts like a lock acquire while the casfromgstatus acts like a lock release.
+func castogscanstatus(gp *g, oldval, newval uint32) bool {
+ switch oldval {
+ case _Grunnable,
+ _Grunning,
+ _Gwaiting,
+ _Gsyscall:
+ if newval == oldval|_Gscan {
+ r := atomic.Cas(&gp.atomicstatus, oldval, newval)
+ if r {
+ acquireLockRank(lockRankGscan)
+ }
+ return r
+
+ }
+ }
+ print("runtime: castogscanstatus oldval=", hex(oldval), " newval=", hex(newval), "\n")
+ throw("castogscanstatus")
+ panic("not reached")
+}
+
+// If asked to move to or from a Gscanstatus this will throw. Use the castogscanstatus
+// and casfrom_Gscanstatus instead.
+// casgstatus will loop if the g->atomicstatus is in a Gscan status until the routine that
+// put it in the Gscan state is finished.
+//go:nosplit
+func casgstatus(gp *g, oldval, newval uint32) {
+ if (oldval&_Gscan != 0) || (newval&_Gscan != 0) || oldval == newval {
+ systemstack(func() {
+ print("runtime: casgstatus: oldval=", hex(oldval), " newval=", hex(newval), "\n")
+ throw("casgstatus: bad incoming values")
+ })
+ }
+
+ acquireLockRank(lockRankGscan)
+ releaseLockRank(lockRankGscan)
+
+ // See https://golang.org/cl/21503 for justification of the yield delay.
+ const yieldDelay = 5 * 1000
+ var nextYield int64
+
+ // loop if gp->atomicstatus is in a scan state giving
+ // GC time to finish and change the state to oldval.
+ for i := 0; !atomic.Cas(&gp.atomicstatus, oldval, newval); i++ {
+ if oldval == _Gwaiting && gp.atomicstatus == _Grunnable {
+ throw("casgstatus: waiting for Gwaiting but is Grunnable")
+ }
+ if i == 0 {
+ nextYield = nanotime() + yieldDelay
+ }
+ if nanotime() < nextYield {
+ for x := 0; x < 10 && gp.atomicstatus != oldval; x++ {
+ procyield(1)
+ }
+ } else {
+ osyield()
+ nextYield = nanotime() + yieldDelay/2
+ }
+ }
+}
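casgstatus retries its CAS with a short active spin first and an OS-level yield afterwards, on the expectation that the Gscan holder will release the bit soon. A user-level sketch of that spin-then-yield retry loop, with runtime.Gosched and package time standing in for osyield, procyield and nanotime; casWithBackoff is a made-up name.

    package main

    import (
        "fmt"
        "runtime"
        "sync/atomic"
        "time"
    )

    // casWithBackoff retries an atomic compare-and-swap, spinning briefly and
    // then yielding, mirroring the shape of casgstatus's retry loop.
    func casWithBackoff(addr *uint32, old, new uint32) {
        const yieldDelay = 5 * time.Microsecond
        var nextYield time.Time
        for i := 0; !atomic.CompareAndSwapUint32(addr, old, new); i++ {
            if i == 0 {
                nextYield = time.Now().Add(yieldDelay)
            }
            if time.Now().Before(nextYield) {
                // Brief active spin while we expect the value to change soon.
                for x := 0; x < 10 && atomic.LoadUint32(addr) != old; x++ {
                }
            } else {
                runtime.Gosched() // stand-in for osyield
                nextYield = time.Now().Add(yieldDelay / 2)
            }
        }
    }

    func main() {
        var status uint32 = 1 // some "held"/scan-like state
        go func() {
            time.Sleep(time.Millisecond)
            atomic.StoreUint32(&status, 2) // holder releases
        }()
        casWithBackoff(&status, 2, 3) // waits until status becomes 2, then swaps
        fmt.Println("status:", atomic.LoadUint32(&status))
    }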
+
+// casgstatus(gp, oldstatus, Gcopystack), assuming oldstatus is Gwaiting or Grunnable.
+// Returns old status. Cannot call casgstatus directly, because we are racing with an
+// async wakeup that might come in from netpoll. If we see Gwaiting from the readgstatus,
+// it might have become Grunnable by the time we get to the cas. If we called casgstatus,
+// it would loop waiting for the status to go back to Gwaiting, which it never will.
+//go:nosplit
+func casgcopystack(gp *g) uint32 {
+ for {
+ oldstatus := readgstatus(gp) &^ _Gscan
+ if oldstatus != _Gwaiting && oldstatus != _Grunnable {
+ throw("copystack: bad status, not Gwaiting or Grunnable")
+ }
+ if atomic.Cas(&gp.atomicstatus, oldstatus, _Gcopystack) {
+ return oldstatus
+ }
+ }
+}
+
+// casGToPreemptScan transitions gp from _Grunning to _Gscan|_Gpreempted.
+//
+// TODO(austin): This is the only status operation that both changes
+// the status and locks the _Gscan bit. Rethink this.
+func casGToPreemptScan(gp *g, old, new uint32) {
+ if old != _Grunning || new != _Gscan|_Gpreempted {
+ throw("bad g transition")
+ }
+ acquireLockRank(lockRankGscan)
+ for !atomic.Cas(&gp.atomicstatus, _Grunning, _Gscan|_Gpreempted) {
+ }
+}
+
+// casGFromPreempted attempts to transition gp from _Gpreempted to
+// _Gwaiting. If successful, the caller is responsible for
+// re-scheduling gp.
+func casGFromPreempted(gp *g, old, new uint32) bool {
+ if old != _Gpreempted || new != _Gwaiting {
+ throw("bad g transition")
+ }
+ return atomic.Cas(&gp.atomicstatus, _Gpreempted, _Gwaiting)
+}
+
+// stopTheWorld stops all P's from executing goroutines, interrupting
+// all goroutines at GC safe points and recording reason as the reason
+// for the stop. On return, only the current goroutine's P is running.
+// stopTheWorld must not be called from a system stack and the caller
+// must not hold worldsema. The caller must call startTheWorld when
+// other P's should resume execution.
+//
+// stopTheWorld is safe for multiple goroutines to call at the
+// same time. Each will execute its own stop, and the stops will
+// be serialized.
+//
+// This is also used by routines that do stack dumps. If the system is
+// in panic or being exited, this may not reliably stop all
+// goroutines.
+func stopTheWorld(reason string) {
+ semacquire(&worldsema)
+ gp := getg()
+ gp.m.preemptoff = reason
+ systemstack(func() {
+ // Mark the goroutine which called stopTheWorld preemptible so its
+ // stack may be scanned.
+ // This lets a mark worker scan us while we try to stop the world
+ // since otherwise we could get in a mutual preemption deadlock.
+ // We must not modify anything on the G stack because a stack shrink
+ // may occur. A stack shrink is otherwise OK though because in order
+ // to return from this function (and to leave the system stack) we
+ // must have preempted all goroutines, including any attempting
+ // to scan our stack, in which case, any stack shrinking will
+ // have already completed by the time we exit.
+ casgstatus(gp, _Grunning, _Gwaiting)
+ stopTheWorldWithSema()
+ casgstatus(gp, _Gwaiting, _Grunning)
+ })
+}
+
+// startTheWorld undoes the effects of stopTheWorld.
+func startTheWorld() {
+ systemstack(func() { startTheWorldWithSema(false) })
+
+ // worldsema must be held over startTheWorldWithSema to ensure
+ // gomaxprocs cannot change while worldsema is held.
+ //
+ // Release worldsema with direct handoff to the next waiter, but
+ // acquirem so that semrelease1 doesn't try to yield our time.
+ //
+ // Otherwise if e.g. ReadMemStats is being called in a loop,
+ // it might stomp on other attempts to stop the world, such as
+ // for starting or ending GC. The operation this blocks is
+ // so heavy-weight that we should just try to be as fair as
+ // possible here.
+ //
+ // We don't want to just allow us to get preempted between now
+ // and releasing the semaphore because then we keep everyone
+ // (including, for example, GCs) waiting longer.
+ mp := acquirem()
+ mp.preemptoff = ""
+ semrelease1(&worldsema, true, 0)
+ releasem(mp)
+}
+
+// stopTheWorldGC has the same effect as stopTheWorld, but blocks
+// until the GC is not running. It also blocks a GC from starting
+// until startTheWorldGC is called.
+func stopTheWorldGC(reason string) {
+ semacquire(&gcsema)
+ stopTheWorld(reason)
+}
+
+// startTheWorldGC undoes the effects of stopTheWorldGC.
+func startTheWorldGC() {
+ startTheWorld()
+ semrelease(&gcsema)
+}
+
+// Holding worldsema grants an M the right to try to stop the world.
+var worldsema uint32 = 1
+
+// Holding gcsema grants the M the right to block a GC, and blocks
+// until the current GC is done. In particular, it prevents gomaxprocs
+// from changing concurrently.
+//
+// TODO(mknyszek): Once gomaxprocs and the execution tracer can handle
+// being changed/enabled during a GC, remove this.
+var gcsema uint32 = 1
+
+// stopTheWorldWithSema is the core implementation of stopTheWorld.
+// The caller is responsible for acquiring worldsema and disabling
+// preemption first, and then it should call stopTheWorldWithSema on the system
+// stack:
+//
+// semacquire(&worldsema, 0)
+// m.preemptoff = "reason"
+// systemstack(stopTheWorldWithSema)
+//
+// When finished, the caller must either call startTheWorld or undo
+// these three operations separately:
+//
+// m.preemptoff = ""
+// systemstack(startTheWorldWithSema)
+// semrelease(&worldsema)
+//
+// It is allowed to acquire worldsema once and then execute multiple
+// startTheWorldWithSema/stopTheWorldWithSema pairs.
+// Other P's are able to execute between successive calls to
+// startTheWorldWithSema and stopTheWorldWithSema.
+// Holding worldsema causes any other goroutines invoking
+// stopTheWorld to block.
+func stopTheWorldWithSema() {
+ _g_ := getg()
+
+ // If we hold a lock, then we won't be able to stop another M
+ // that is blocked trying to acquire the lock.
+ if _g_.m.locks > 0 {
+ throw("stopTheWorld: holding locks")
+ }
+
+ lock(&sched.lock)
+ sched.stopwait = gomaxprocs
+ atomic.Store(&sched.gcwaiting, 1)
+ preemptall()
+ // stop current P
+ _g_.m.p.ptr().status = _Pgcstop // Pgcstop is only diagnostic.
+ sched.stopwait--
+ // try to retake all P's in Psyscall status
+ for _, p := range allp {
+ s := p.status
+ if s == _Psyscall && atomic.Cas(&p.status, s, _Pgcstop) {
+ if trace.enabled {
+ traceGoSysBlock(p)
+ traceProcStop(p)
+ }
+ p.syscalltick++
+ sched.stopwait--
+ }
+ }
+ // stop idle P's
+ for {
+ p := pidleget()
+ if p == nil {
+ break
+ }
+ p.status = _Pgcstop
+ sched.stopwait--
+ }
+ wait := sched.stopwait > 0
+ unlock(&sched.lock)
+
+ // wait for remaining P's to stop voluntarily
+ if wait {
+ for {
+ // wait for 100us, then try to re-preempt in case of any races
+ if notetsleep(&sched.stopnote, 100*1000) {
+ noteclear(&sched.stopnote)
+ break
+ }
+ preemptall()
+ }
+ }
+
+ // sanity checks
+ bad := ""
+ if sched.stopwait != 0 {
+ bad = "stopTheWorld: not stopped (stopwait != 0)"
+ } else {
+ for _, p := range allp {
+ if p.status != _Pgcstop {
+ bad = "stopTheWorld: not stopped (status != _Pgcstop)"
+ }
+ }
+ }
+ if atomic.Load(&freezing) != 0 {
+ // Some other thread is panicking. This can cause the
+ // sanity checks above to fail if the panic happens in
+ // the signal handler on a stopped thread. Either way,
+ // we should halt this thread.
+ lock(&deadlock)
+ lock(&deadlock)
+ }
+ if bad != "" {
+ throw(bad)
+ }
+
+ worldStopped()
+}
+
+func startTheWorldWithSema(emitTraceEvent bool) int64 {
+ assertWorldStopped()
+
+ mp := acquirem() // disable preemption because it can be holding p in a local var
+ if netpollinited() {
+ list := netpoll(0) // non-blocking
+ injectglist(&list)
+ }
+ lock(&sched.lock)
+
+ procs := gomaxprocs
+ if newprocs != 0 {
+ procs = newprocs
+ newprocs = 0
+ }
+ p1 := procresize(procs)
+ sched.gcwaiting = 0
+ if sched.sysmonwait != 0 {
+ sched.sysmonwait = 0
+ notewakeup(&sched.sysmonnote)
+ }
+ unlock(&sched.lock)
+
+ worldStarted()
+
+ for p1 != nil {
+ p := p1
+ p1 = p1.link.ptr()
+ if p.m != 0 {
+ mp := p.m.ptr()
+ p.m = 0
+ if mp.nextp != 0 {
+ throw("startTheWorld: inconsistent mp->nextp")
+ }
+ mp.nextp.set(p)
+ notewakeup(&mp.park)
+ } else {
+ // Start M to run P. Do not start another M below.
+ newm(nil, p, -1)
+ }
+ }
+
+ // Capture start-the-world time before doing clean-up tasks.
+ startTime := nanotime()
+ if emitTraceEvent {
+ traceGCSTWDone()
+ }
+
+ // Wake up an additional proc in case we have excessive runnable goroutines
+ // in local queues or in the global queue. If we don't, the proc will park itself.
+ // If we have lots of excessive work, resetspinning will unpark additional procs as necessary.
+ wakep()
+
+ releasem(mp)
+
+ return startTime
+}
+
+// usesLibcall indicates whether this runtime performs system calls
+// via libcall.
+func usesLibcall() bool {
+ switch GOOS {
+ case "aix", "darwin", "illumos", "ios", "solaris", "windows":
+ return true
+ case "openbsd":
+ return GOARCH == "amd64" || GOARCH == "arm64"
+ }
+ return false
+}
+
+// mStackIsSystemAllocated indicates whether this runtime starts on a
+// system-allocated stack.
+func mStackIsSystemAllocated() bool {
+ switch GOOS {
+ case "aix", "darwin", "plan9", "illumos", "ios", "solaris", "windows":
+ return true
+ case "openbsd":
+ switch GOARCH {
+ case "amd64", "arm64":
+ return true
+ }
+ }
+ return false
+}
+
+// mstart is the entry-point for new Ms.
+//
+// This must not split the stack because we may not even have stack
+// bounds set up yet.
+//
+// May run during STW (because it doesn't have a P yet), so write
+// barriers are not allowed.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func mstart() {
+ _g_ := getg()
+
+ osStack := _g_.stack.lo == 0
+ if osStack {
+ // Initialize stack bounds from system stack.
+ // Cgo may have left stack size in stack.hi.
+ // minit may update the stack bounds.
+ //
+ // Note: these bounds may not be very accurate.
+ // We set hi to &size, but there are things above
+ // it. The 1024 is supposed to compensate for this,
+ // but is somewhat arbitrary.
+ size := _g_.stack.hi
+ if size == 0 {
+ size = 8192 * sys.StackGuardMultiplier
+ }
+ _g_.stack.hi = uintptr(noescape(unsafe.Pointer(&size)))
+ _g_.stack.lo = _g_.stack.hi - size + 1024
+ }
+ // Initialize stack guard so that we can start calling regular
+ // Go code.
+ _g_.stackguard0 = _g_.stack.lo + _StackGuard
+ // This is the g0, so we can also call go:systemstack
+ // functions, which check stackguard1.
+ _g_.stackguard1 = _g_.stackguard0
+ mstart1()
+
+ // Exit this thread.
+ if mStackIsSystemAllocated() {
+ // Windows, Solaris, illumos, Darwin, AIX and Plan 9 always system-allocate
+ // the stack, but put it in _g_.stack before mstart,
+ // so the logic above hasn't set osStack yet.
+ osStack = true
+ }
+ mexit(osStack)
+}
+
+func mstart1() {
+ _g_ := getg()
+
+ if _g_ != _g_.m.g0 {
+ throw("bad runtime·mstart")
+ }
+
+ // Record the caller for use as the top of stack in mcall and
+ // for terminating the thread.
+ // We're never coming back to mstart1 after we call schedule,
+ // so other calls can reuse the current frame.
+ save(getcallerpc(), getcallersp())
+ asminit()
+ minit()
+
+ // Install signal handlers; after minit so that minit can
+ // prepare the thread to be able to handle the signals.
+ if _g_.m == &m0 {
+ mstartm0()
+ }
+
+ if fn := _g_.m.mstartfn; fn != nil {
+ fn()
+ }
+
+ if _g_.m != &m0 {
+ acquirep(_g_.m.nextp.ptr())
+ _g_.m.nextp = 0
+ }
+ schedule()
+}
+
+// mstartm0 implements part of mstart1 that only runs on the m0.
+//
+// Write barriers are allowed here because we know the GC can't be
+// running yet, so they'll be no-ops.
+//
+//go:yeswritebarrierrec
+func mstartm0() {
+ // Create an extra M for callbacks on threads not created by Go.
+ // An extra M is also needed on Windows for callbacks created by
+ // syscall.NewCallback. See issue #6751 for details.
+ if (iscgo || GOOS == "windows") && !cgoHasExtraM {
+ cgoHasExtraM = true
+ newextram()
+ }
+ initsig(false)
+}
+
+// mPark causes a thread to park itself - temporarily waking for
+// fixups but otherwise waiting to be fully woken. This is the
+// only way that m's should park themselves.
+//go:nosplit
+func mPark() {
+ g := getg()
+ for {
+ notesleep(&g.m.park)
+ // Note, because of signal handling by this parked m,
+ // a preemptive mDoFixup() may actually occur via
+ // mDoFixupAndOSYield(). (See golang.org/issue/44193)
+ noteclear(&g.m.park)
+ if !mDoFixup() {
+ return
+ }
+ }
+}
+
+// mexit tears down and exits the current thread.
+//
+// Don't call this directly to exit the thread, since it must run at
+// the top of the thread stack. Instead, use gogo(&_g_.m.g0.sched) to
+// unwind the stack to the point that exits the thread.
+//
+// It is entered with m.p != nil, so write barriers are allowed. It
+// will release the P before exiting.
+//
+//go:yeswritebarrierrec
+func mexit(osStack bool) {
+ g := getg()
+ m := g.m
+
+ if m == &m0 {
+ // This is the main thread. Just wedge it.
+ //
+ // On Linux, exiting the main thread puts the process
+ // into a non-waitable zombie state. On Plan 9,
+ // exiting the main thread unblocks wait even though
+ // other threads are still running. On Solaris we can
+ // neither exitThread nor return from mstart. Other
+ // bad things probably happen on other platforms.
+ //
+ // We could try to clean up this M more before wedging
+ // it, but that complicates signal handling.
+ handoffp(releasep())
+ lock(&sched.lock)
+ sched.nmfreed++
+ checkdead()
+ unlock(&sched.lock)
+ mPark()
+ throw("locked m0 woke up")
+ }
+
+ sigblock(true)
+ unminit()
+
+ // Free the gsignal stack.
+ if m.gsignal != nil {
+ stackfree(m.gsignal.stack)
+ // On some platforms, when calling into VDSO (e.g. nanotime)
+ // we store our g on the gsignal stack, if there is one.
+ // Now the stack is freed, unlink it from the m, so we
+ // won't write to it when calling VDSO code.
+ m.gsignal = nil
+ }
+
+ // Remove m from allm.
+ lock(&sched.lock)
+ for pprev := &allm; *pprev != nil; pprev = &(*pprev).alllink {
+ if *pprev == m {
+ *pprev = m.alllink
+ goto found
+ }
+ }
+ throw("m not found in allm")
+found:
+ if !osStack {
+ // Delay reaping m until it's done with the stack.
+ //
+ // If this is using an OS stack, the OS will free it
+ // so there's no need for reaping.
+ atomic.Store(&m.freeWait, 1)
+ // Put m on the free list, though it will not be reaped until
+ // freeWait is 0. Note that the free list must not be linked
+ // through alllink because some functions walk allm without
+ // locking, so may be using alllink.
+ m.freelink = sched.freem
+ sched.freem = m
+ }
+ unlock(&sched.lock)
+
+ // Release the P.
+ handoffp(releasep())
+ // After this point we must not have write barriers.
+
+ // Invoke the deadlock detector. This must happen after
+ // handoffp because it may have started a new M to take our
+ // P's work.
+ lock(&sched.lock)
+ sched.nmfreed++
+ checkdead()
+ unlock(&sched.lock)
+
+ if GOOS == "darwin" || GOOS == "ios" {
+ // Make sure pendingPreemptSignals is correct when an M exits.
+ // For #41702.
+ if atomic.Load(&m.signalPending) != 0 {
+ atomic.Xadd(&pendingPreemptSignals, -1)
+ }
+ }
+
+ // Destroy all allocated resources. After this is called, we may no
+ // longer take any locks.
+ mdestroy(m)
+
+ if osStack {
+ // Return from mstart and let the system thread
+ // library free the g0 stack and terminate the thread.
+ return
+ }
+
+ // mstart is the thread's entry point, so there's nothing to
+ // return to. Exit the thread directly. exitThread will clear
+ // m.freeWait when it's done with the stack and the m can be
+ // reaped.
+ exitThread(&m.freeWait)
+}
+
+// forEachP calls fn(p) for every P p when p reaches a GC safe point.
+// If a P is currently executing code, this will bring the P to a GC
+// safe point and execute fn on that P. If the P is not executing code
+// (it is idle or in a syscall), this will call fn(p) directly while
+// preventing the P from exiting its state. This does not ensure that
+// fn will run on every CPU executing Go code, but it acts as a global
+// memory barrier. GC uses this as a "ragged barrier."
+//
+// The caller must hold worldsema.
+//
+//go:systemstack
+func forEachP(fn func(*p)) {
+ mp := acquirem()
+ _p_ := getg().m.p.ptr()
+
+ lock(&sched.lock)
+ if sched.safePointWait != 0 {
+ throw("forEachP: sched.safePointWait != 0")
+ }
+ sched.safePointWait = gomaxprocs - 1
+ sched.safePointFn = fn
+
+ // Ask all Ps to run the safe point function.
+ for _, p := range allp {
+ if p != _p_ {
+ atomic.Store(&p.runSafePointFn, 1)
+ }
+ }
+ preemptall()
+
+ // Any P entering _Pidle or _Psyscall from now on will observe
+ // p.runSafePointFn == 1 and will call runSafePointFn when
+ // changing its status to _Pidle/_Psyscall.
+
+ // Run safe point function for all idle Ps. sched.pidle will
+ // not change because we hold sched.lock.
+ for p := sched.pidle.ptr(); p != nil; p = p.link.ptr() {
+ if atomic.Cas(&p.runSafePointFn, 1, 0) {
+ fn(p)
+ sched.safePointWait--
+ }
+ }
+
+ wait := sched.safePointWait > 0
+ unlock(&sched.lock)
+
+ // Run fn for the current P.
+ fn(_p_)
+
+ // Force Ps currently in _Psyscall into _Pidle and hand them
+ // off to induce safe point function execution.
+ for _, p := range allp {
+ s := p.status
+ if s == _Psyscall && p.runSafePointFn == 1 && atomic.Cas(&p.status, s, _Pidle) {
+ if trace.enabled {
+ traceGoSysBlock(p)
+ traceProcStop(p)
+ }
+ p.syscalltick++
+ handoffp(p)
+ }
+ }
+
+ // Wait for remaining Ps to run fn.
+ if wait {
+ for {
+ // Wait for 100us, then try to re-preempt in
+ // case of any races.
+ //
+ // Requires system stack.
+ if notetsleep(&sched.safePointNote, 100*1000) {
+ noteclear(&sched.safePointNote)
+ break
+ }
+ preemptall()
+ }
+ }
+ if sched.safePointWait != 0 {
+ throw("forEachP: not done")
+ }
+ for _, p := range allp {
+ if p.runSafePointFn != 0 {
+ throw("forEachP: P did not run fn")
+ }
+ }
+
+ lock(&sched.lock)
+ sched.safePointFn = nil
+ unlock(&sched.lock)
+ releasem(mp)
+}
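forEachP is a ragged barrier: each P runs fn at its own next safe point, the caller covers idle Ps itself, and the call returns only once every P has run fn. A rough user-level analogue using a per-worker flag, a package-level function slot, and a WaitGroup; all names are illustrative, and the real mechanism uses P states, notes and preemption rather than polling.

    package main

    import (
        "fmt"
        "sync"
        "sync/atomic"
        "time"
    )

    var (
        safePointFn   func(id int)   // like sched.safePointFn
        safePointWait sync.WaitGroup // like sched.safePointWait
    )

    type worker struct {
        id    int
        runFn int32 // like p.runSafePointFn
    }

    // loop does units of work; between units it hits a safe point and, if
    // asked, runs the global safe-point function on its own behalf.
    func (w *worker) loop(stop <-chan struct{}) {
        for {
            select {
            case <-stop:
                return
            default:
                time.Sleep(time.Millisecond) // a "unit of work"
                if atomic.CompareAndSwapInt32(&w.runFn, 1, 0) {
                    safePointFn(w.id)
                    safePointWait.Done()
                }
            }
        }
    }

    // forEachWorker is the ragged barrier: request fn on every worker and wait
    // until each has run it at its own next safe point.
    func forEachWorker(workers []*worker, fn func(id int)) {
        safePointFn = fn
        safePointWait.Add(len(workers))
        for _, w := range workers {
            atomic.StoreInt32(&w.runFn, 1)
        }
        safePointWait.Wait()
        safePointFn = nil
    }

    func main() {
        stop := make(chan struct{})
        workers := []*worker{{id: 0}, {id: 1}, {id: 2}}
        for _, w := range workers {
            go w.loop(stop)
        }
        forEachWorker(workers, func(id int) { fmt.Println("safe point on worker", id) })
        close(stop)
    }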
+
+// syscall_runtime_doAllThreadsSyscall serializes Go execution and
+// executes a specified fn() call on all m's.
+//
+// The boolean argument to fn() indicates whether the function's
+// return value will be consulted or not. That is, fn(true) should
+// return true if fn() succeeds, and fn(true) should return false if
+// it failed. When fn(false) is called, its return status will be
+// ignored.
+//
+// syscall_runtime_doAllThreadsSyscall first invokes fn(true) on a
+// single coordinating m, and only if it returns true does it go on
+// to invoke fn(false) on all of the other m's known to the process.
+//
+//go:linkname syscall_runtime_doAllThreadsSyscall syscall.runtime_doAllThreadsSyscall
+func syscall_runtime_doAllThreadsSyscall(fn func(bool) bool) {
+ if iscgo {
+ panic("doAllThreadsSyscall not supported with cgo enabled")
+ }
+ if fn == nil {
+ return
+ }
+ for atomic.Load(&sched.sysmonStarting) != 0 {
+ osyield()
+ }
+
+ // We don't want this thread to handle signals for the
+ // duration of this critical section. The underlying issue
+ // being that this locked coordinating m is the one monitoring
+ // for fn() execution by all the other m's of the runtime,
+ // while no regular go code execution is permitted (the world
+ // is stopped). If this present m were to get distracted to
+ // run signal handling code, and find itself waiting for a
+ // second thread to execute go code before being able to
+ // return from that signal handling, a deadlock will result.
+ // (See golang.org/issue/44193.)
+ lockOSThread()
+ var sigmask sigset
+ sigsave(&sigmask)
+ sigblock(false)
+
+ stopTheWorldGC("doAllThreadsSyscall")
+ if atomic.Load(&newmHandoff.haveTemplateThread) != 0 {
+ // Ensure that there are no in-flight thread
+ // creations: don't want to race with allm.
+ lock(&newmHandoff.lock)
+ for !newmHandoff.waiting {
+ unlock(&newmHandoff.lock)
+ osyield()
+ lock(&newmHandoff.lock)
+ }
+ unlock(&newmHandoff.lock)
+ }
+ if netpollinited() {
+ netpollBreak()
+ }
+ sigRecvPrepareForFixup()
+ _g_ := getg()
+ if raceenabled {
+ // For m's running without racectx, we loan out the
+ // racectx of this call.
+ lock(&mFixupRace.lock)
+ mFixupRace.ctx = _g_.racectx
+ unlock(&mFixupRace.lock)
+ }
+ if ok := fn(true); ok {
+ tid := _g_.m.procid
+ for mp := allm; mp != nil; mp = mp.alllink {
+ if mp.procid == tid {
+ // This m has already completed fn()
+ // call.
+ continue
+ }
+ // Be wary of mp's without procid values if
+ // they are known not to park. If they are
+ // marked as parking with a zero procid, then
+ // they will be racing with this code to be
+ // allocated a procid and we will annotate
+ // them with the need to execute the fn when
+ // they acquire a procid to run it.
+ if mp.procid == 0 && !mp.doesPark {
+ // Reaching here, we are either
+ // running Windows, or cgo linked
+ // code. Neither of which are
+ // currently supported by this API.
+ throw("unsupported runtime environment")
+ }
+ // stopTheWorldGC() doesn't guarantee stopping
+ // all the threads, so we lock here to avoid
+ // the possibility of racing with mp.
+ lock(&mp.mFixup.lock)
+ mp.mFixup.fn = fn
+ atomic.Store(&mp.mFixup.used, 1)
+ if mp.doesPark {
+ // For non-service threads this will
+ // cause the wakeup to be short lived
+ // (once the mutex is unlocked). The
+ // next real wakeup will occur after
+ // startTheWorldGC() is called.
+ notewakeup(&mp.park)
+ }
+ unlock(&mp.mFixup.lock)
+ }
+ for {
+ done := true
+ for mp := allm; done && mp != nil; mp = mp.alllink {
+ if mp.procid == tid {
+ continue
+ }
+ done = atomic.Load(&mp.mFixup.used) == 0
+ }
+ if done {
+ break
+ }
+ // If needed, force sysmon and/or newmHandoff to wake up.
+ lock(&sched.lock)
+ if atomic.Load(&sched.sysmonwait) != 0 {
+ atomic.Store(&sched.sysmonwait, 0)
+ notewakeup(&sched.sysmonnote)
+ }
+ unlock(&sched.lock)
+ lock(&newmHandoff.lock)
+ if newmHandoff.waiting {
+ newmHandoff.waiting = false
+ notewakeup(&newmHandoff.wake)
+ }
+ unlock(&newmHandoff.lock)
+ osyield()
+ }
+ }
+ if raceenabled {
+ lock(&mFixupRace.lock)
+ mFixupRace.ctx = 0
+ unlock(&mFixupRace.lock)
+ }
+ startTheWorldGC()
+ msigrestore(sigmask)
+ unlockOSThread()
+}
+
+// runSafePointFn runs the safe point function, if any, for this P.
+// This should be called like
+//
+// if getg().m.p.runSafePointFn != 0 {
+// runSafePointFn()
+// }
+//
+// runSafePointFn must be checked on any transition in to _Pidle or
+// _Psyscall to avoid a race where forEachP sees that the P is running
+// just before the P goes into _Pidle/_Psyscall and neither forEachP
+// nor the P run the safe-point function.
+func runSafePointFn() {
+ p := getg().m.p.ptr()
+ // Resolve the race between forEachP running the safe-point
+ // function on this P's behalf and this P running the
+ // safe-point function directly.
+ if !atomic.Cas(&p.runSafePointFn, 1, 0) {
+ return
+ }
+ sched.safePointFn(p)
+ lock(&sched.lock)
+ sched.safePointWait--
+ if sched.safePointWait == 0 {
+ notewakeup(&sched.safePointNote)
+ }
+ unlock(&sched.lock)
+}
+
+// When running with cgo, we call _cgo_thread_start
+// to start threads for us so that we can play nicely with
+// foreign code.
+var cgoThreadStart unsafe.Pointer
+
+type cgothreadstart struct {
+ g guintptr
+ tls *uint64
+ fn unsafe.Pointer
+}
+
+// Allocate a new m unassociated with any thread.
+// Can use p for allocation context if needed.
+// fn is recorded as the new m's m.mstartfn.
+// id is optional pre-allocated m ID. Omit by passing -1.
+//
+// This function is allowed to have write barriers even if the caller
+// isn't because it borrows _p_.
+//
+//go:yeswritebarrierrec
+func allocm(_p_ *p, fn func(), id int64) *m {
+ _g_ := getg()
+ acquirem() // disable GC because it can be called from sysmon
+ if _g_.m.p == 0 {
+ acquirep(_p_) // temporarily borrow p for mallocs in this function
+ }
+
+ // Release the free M list. We need to do this somewhere and
+ // this may free up a stack we can use.
+ if sched.freem != nil {
+ lock(&sched.lock)
+ var newList *m
+ for freem := sched.freem; freem != nil; {
+ if freem.freeWait != 0 {
+ next := freem.freelink
+ freem.freelink = newList
+ newList = freem
+ freem = next
+ continue
+ }
+ // stackfree must be on the system stack, but allocm is
+ // reachable off the system stack transitively from
+ // startm.
+ systemstack(func() {
+ stackfree(freem.g0.stack)
+ })
+ freem = freem.freelink
+ }
+ sched.freem = newList
+ unlock(&sched.lock)
+ }
+
+ mp := new(m)
+ mp.mstartfn = fn
+ mcommoninit(mp, id)
+
+ // In the case of cgo, Solaris, illumos, or Darwin, pthread_create will make us a stack.
+ // Windows and Plan 9 will lay out the sched stack on the OS stack.
+ if iscgo || mStackIsSystemAllocated() {
+ mp.g0 = malg(-1)
+ } else {
+ mp.g0 = malg(8192 * sys.StackGuardMultiplier)
+ }
+ mp.g0.m = mp
+
+ if _p_ == _g_.m.p.ptr() {
+ releasep()
+ }
+ releasem(_g_.m)
+
+ return mp
+}
+
+// needm is called when a cgo callback happens on a
+// thread without an m (a thread not created by Go).
+// In this case, needm is expected to find an m to use
+// and return with m, g initialized correctly.
+// Since m and g are not set now (likely nil, but see below),
+// needm is limited in what routines it can call. In particular
+// it can only call nosplit functions (textflag 7) and cannot
+// do any scheduling that requires an m.
+//
+// In order to avoid needing heavy lifting here, we adopt
+// the following strategy: there is a stack of available m's
+// that can be stolen. Using compare-and-swap
+// to pop from the stack has ABA races, so we simulate
+// a lock by doing an exchange (via Casuintptr) to steal the stack
+// head and replace the top pointer with MLOCKED (1).
+// This serves as a simple spin lock that we can use even
+// without an m. The thread that locks the stack in this way
+// unlocks the stack by storing a valid stack head pointer.
+//
+// In order to make sure that there is always an m structure
+// available to be stolen, we maintain the invariant that there
+// is always one more than needed. At the beginning of the
+// program (if cgo is in use) the list is seeded with a single m.
+// If needm finds that it has taken the last m off the list, its job
+// is - once it has installed its own m so that it can do things like
+// allocate memory - to create a spare m and put it on the list.
+//
+// Each of these extra m's also has a g0 and a curg that are
+// pressed into service as the scheduling stack and current
+// goroutine for the duration of the cgo callback.
+//
+// When the callback is done with the m, it calls dropm to
+// put the m back on the list.
+//go:nosplit
+func needm() {
+ if (iscgo || GOOS == "windows") && !cgoHasExtraM {
+ // Can happen if C/C++ code calls Go from a global ctor.
+ // Can also happen on Windows if a global ctor uses a
+ // callback created by syscall.NewCallback. See issue #6751
+ // for details.
+ //
+ // Cannot throw, because the scheduler is not initialized yet.
+ write(2, unsafe.Pointer(&earlycgocallback[0]), int32(len(earlycgocallback)))
+ exit(1)
+ }
+
+ // Save and block signals before getting an M.
+ // The signal handler may call needm itself,
+ // and we must avoid a deadlock. Also, once g is installed,
+ // any incoming signals will try to execute,
+ // but we won't have the sigaltstack settings and other data
+ // set up appropriately until the end of minit, which will
+ // unblock the signals. This is the same dance as when
+ // starting a new m to run Go code via newosproc.
+ var sigmask sigset
+ sigsave(&sigmask)
+ sigblock(false)
+
+ // Lock extra list, take head, unlock popped list.
+ // nilokay=false is safe here because of the invariant above,
+ // that the extra list always contains or will soon contain
+ // at least one m.
+ mp := lockextra(false)
+
+ // Set needextram when we've just emptied the list,
+ // so that the eventual call into cgocallbackg will
+ // allocate a new m for the extra list. We delay the
+ // allocation until then so that it can be done
+ // after exitsyscall makes sure it is okay to be
+ // running at all (that is, there's no garbage collection
+ // running right now).
+ mp.needextram = mp.schedlink == 0
+ extraMCount--
+ unlockextra(mp.schedlink.ptr())
+
+ // Store the original signal mask for use by minit.
+ mp.sigmask = sigmask
+
+ // Install g (= m->g0) and set the stack bounds
+ // to match the current stack. We don't actually know
+ // how big the stack is, like we don't know how big any
+ // scheduling stack is, but we assume there's at least 32 kB,
+ // which is more than enough for us.
+ setg(mp.g0)
+ _g_ := getg()
+ _g_.stack.hi = getcallersp() + 1024
+ _g_.stack.lo = getcallersp() - 32*1024
+ _g_.stackguard0 = _g_.stack.lo + _StackGuard
+
+ // Initialize this thread to use the m.
+ asminit()
+ minit()
+
+ // mp.curg is now a real goroutine.
+ casgstatus(mp.curg, _Gdead, _Gsyscall)
+ atomic.Xadd(&sched.ngsys, -1)
+}
+
+var earlycgocallback = []byte("fatal error: cgo callback before cgo call\n")
+
+// newextram allocates m's and puts them on the extra list.
+// It is called with a working local m, so that it can do things
+// like call schedlock and allocate.
+func newextram() {
+ c := atomic.Xchg(&extraMWaiters, 0)
+ if c > 0 {
+ for i := uint32(0); i < c; i++ {
+ oneNewExtraM()
+ }
+ } else {
+ // Make sure there is at least one extra M.
+ mp := lockextra(true)
+ unlockextra(mp)
+ if mp == nil {
+ oneNewExtraM()
+ }
+ }
+}
+
+// oneNewExtraM allocates an m and puts it on the extra list.
+func oneNewExtraM() {
+ // Create extra goroutine locked to extra m.
+ // The goroutine is the context in which the cgo callback will run.
+ // The sched.pc will never be returned to, but setting it to
+ // goexit makes clear to the traceback routines where
+ // the goroutine stack ends.
+ mp := allocm(nil, nil, -1)
+ gp := malg(4096)
+ gp.sched.pc = funcPC(goexit) + sys.PCQuantum
+ gp.sched.sp = gp.stack.hi
+ gp.sched.sp -= 4 * sys.RegSize // extra space in case of reads slightly beyond frame
+ gp.sched.lr = 0
+ gp.sched.g = guintptr(unsafe.Pointer(gp))
+ gp.syscallpc = gp.sched.pc
+ gp.syscallsp = gp.sched.sp
+ gp.stktopsp = gp.sched.sp
+ // malg returns status as _Gidle. Change to _Gdead before
+ // adding to allg where GC can see it. We use _Gdead to hide
+ // this from tracebacks and stack scans since it isn't a
+ // "real" goroutine until needm grabs it.
+ casgstatus(gp, _Gidle, _Gdead)
+ gp.m = mp
+ mp.curg = gp
+ mp.lockedInt++
+ mp.lockedg.set(gp)
+ gp.lockedm.set(mp)
+ gp.goid = int64(atomic.Xadd64(&sched.goidgen, 1))
+ if raceenabled {
+ gp.racectx = racegostart(funcPC(newextram) + sys.PCQuantum)
+ }
+ // put on allg for garbage collector
+ allgadd(gp)
+
+ // gp is now on the allg list, but we don't want it to be
+ // counted by gcount. It would be more "proper" to increment
+ // sched.ngfree, but that requires locking. Incrementing ngsys
+ // has the same effect.
+ atomic.Xadd(&sched.ngsys, +1)
+
+ // Add m to the extra list.
+ mnext := lockextra(true)
+ mp.schedlink.set(mnext)
+ extraMCount++
+ unlockextra(mp)
+}
+
+// dropm is called when a cgo callback has called needm but is now
+// done with the callback and returning back into the non-Go thread.
+// It puts the current m back onto the extra list.
+//
+// The main expense here is the call to signalstack to release the
+// m's signal stack, and then the call to needm on the next callback
+// from this thread. It is tempting to try to save the m for next time,
+// which would eliminate both these costs, but there might not be
+// a next time: the current thread (which Go does not control) might exit.
+// If we saved the m for that thread, there would be an m leak each time
+// such a thread exited. Instead, we acquire and release an m on each
+// call. These should typically not be scheduling operations, just a few
+// atomics, so the cost should be small.
+//
+// TODO(rsc): An alternative would be to allocate a dummy pthread per-thread
+// variable using pthread_key_create. Unlike the pthread keys we already use
+// on OS X, this dummy key would never be read by Go code. It would exist
+// only so that we could register a thread-exit-time destructor.
+// That destructor would put the m back onto the extra list.
+// This is purely a performance optimization. The current version,
+// in which dropm happens on each cgo call, is still correct too.
+// We may have to keep the current version on systems with cgo
+// but without pthreads, like Windows.
+func dropm() {
+ // Clear m and g, and return m to the extra list.
+ // After the call to setg we can only call nosplit functions
+ // with no pointer manipulation.
+ mp := getg().m
+
+ // Return mp.curg to dead state.
+ casgstatus(mp.curg, _Gsyscall, _Gdead)
+ mp.curg.preemptStop = false
+ atomic.Xadd(&sched.ngsys, +1)
+
+ // Block signals before unminit.
+ // Unminit unregisters the signal handling stack (but needs g on some systems).
+ // Setg(nil) clears g, which is the signal handler's cue not to run Go handlers.
+ // It's important not to try to handle a signal between those two steps.
+ sigmask := mp.sigmask
+ sigblock(false)
+ unminit()
+
+ mnext := lockextra(true)
+ extraMCount++
+ mp.schedlink.set(mnext)
+
+ setg(nil)
+
+ // Commit the release of mp.
+ unlockextra(mp)
+
+ msigrestore(sigmask)
+}
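+
+// Illustrative sketch, not part of the original code: over the life of a
+// single cgo callback on a non-Go thread, the machinery above pairs up
+// roughly as follows (the actual sequencing lives in the cgocallback
+// assembly):
+//
+//	needm()            // steal an extra m, install g0, minit
+//	cgocallbackg(...)  // run the Go callback; may call newextram if needextram is set
+//	dropm()            // unminit and return the m to the extra list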
+
+// A helper function for EnsureDropM.
+func getm() uintptr {
+ return uintptr(unsafe.Pointer(getg().m))
+}
+
+var extram uintptr
+var extraMCount uint32 // Protected by lockextra
+var extraMWaiters uint32
+
+// lockextra locks the extra list and returns the list head.
+// The caller must unlock the list by storing a new list head
+// to extram. If nilokay is true, then lockextra will
+// return a nil list head if that's what it finds. If nilokay is false,
+// lockextra will keep waiting until the list head is no longer nil.
+//go:nosplit
+func lockextra(nilokay bool) *m {
+ const locked = 1
+
+ incr := false
+ for {
+ old := atomic.Loaduintptr(&extram)
+ if old == locked {
+ osyield()
+ continue
+ }
+ if old == 0 && !nilokay {
+ if !incr {
+ // Add 1 to the number of threads
+ // waiting for an M.
+ // This is cleared by newextram.
+ atomic.Xadd(&extraMWaiters, 1)
+ incr = true
+ }
+ usleep(1)
+ continue
+ }
+ if atomic.Casuintptr(&extram, old, locked) {
+ return (*m)(unsafe.Pointer(old))
+ }
+ osyield()
+ continue
+ }
+}
+
+//go:nosplit
+func unlockextra(mp *m) {
+ atomic.Storeuintptr(&extram, uintptr(unsafe.Pointer(mp)))
+}
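+
+// Illustrative sketch, not part of the original code: lockextra and
+// unlockextra implement a tiny spin lock over the extram list head. Stripped
+// of the surrounding bookkeeping, the protocol used by needm/dropm is:
+//
+//	for {
+//		old := atomic.Loaduintptr(&extram)
+//		if old == locked {
+//			osyield() // another thread holds the list; spin
+//			continue
+//		}
+//		if atomic.Casuintptr(&extram, old, locked) {
+//			head := (*m)(unsafe.Pointer(old)) // detached head, may be nil
+//			// ... pop from or push onto the detached list via m.schedlink ...
+//			atomic.Storeuintptr(&extram, newHead) // storing any real head unlocks
+//			break
+//		}
+//	}
+//
+// Because the head word holds the sentinel while the list is being edited,
+// the ABA hazard of a bare compare-and-swap pop cannot arise.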
+
+// execLock serializes exec and clone to avoid bugs or unspecified behaviour
+// around exec'ing while creating/destroying threads. See issue #19546.
+var execLock rwmutex
+
+// newmHandoff contains a list of m structures that need new OS threads.
+// This is used by newm in situations where newm itself can't safely
+// start an OS thread.
+var newmHandoff struct {
+ lock mutex
+
+ // newm points to a list of M structures that need new OS
+ // threads. The list is linked through m.schedlink.
+ newm muintptr
+
+ // waiting indicates that wake needs to be notified when an m
+ // is put on the list.
+ waiting bool
+ wake note
+
+ // haveTemplateThread indicates that the templateThread has
+ // been started. This is not protected by lock. Use cas to set
+ // to 1.
+ haveTemplateThread uint32
+}
+
+// Create a new m. It will start off with a call to fn, or else the scheduler.
+// fn needs to be static and not a heap-allocated closure.
+// May run with m.p==nil, so write barriers are not allowed.
+//
+// id is optional pre-allocated m ID. Omit by passing -1.
+//go:nowritebarrierrec
+func newm(fn func(), _p_ *p, id int64) {
+ mp := allocm(_p_, fn, id)
+ mp.doesPark = (_p_ != nil)
+ mp.nextp.set(_p_)
+ mp.sigmask = initSigmask
+ if gp := getg(); gp != nil && gp.m != nil && (gp.m.lockedExt != 0 || gp.m.incgo) && GOOS != "plan9" {
+ // We're on a locked M or a thread that may have been
+ // started by C. The kernel state of this thread may
+ // be strange (the user may have locked it for that
+ // purpose). We don't want to clone that into another
+ // thread. Instead, ask a known-good thread to create
+ // the thread for us.
+ //
+ // This is disabled on Plan 9. See golang.org/issue/22227.
+ //
+ // TODO: This may be unnecessary on Windows, which
+ // doesn't model thread creation off fork.
+ lock(&newmHandoff.lock)
+ if newmHandoff.haveTemplateThread == 0 {
+ throw("on a locked thread with no template thread")
+ }
+ mp.schedlink = newmHandoff.newm
+ newmHandoff.newm.set(mp)
+ if newmHandoff.waiting {
+ newmHandoff.waiting = false
+ notewakeup(&newmHandoff.wake)
+ }
+ unlock(&newmHandoff.lock)
+ return
+ }
+ newm1(mp)
+}
+
+func newm1(mp *m) {
+ if iscgo {
+ var ts cgothreadstart
+ if _cgo_thread_start == nil {
+ throw("_cgo_thread_start missing")
+ }
+ ts.g.set(mp.g0)
+ ts.tls = (*uint64)(unsafe.Pointer(&mp.tls[0]))
+ ts.fn = unsafe.Pointer(funcPC(mstart))
+ if msanenabled {
+ msanwrite(unsafe.Pointer(&ts), unsafe.Sizeof(ts))
+ }
+ execLock.rlock() // Prevent process clone.
+ asmcgocall(_cgo_thread_start, unsafe.Pointer(&ts))
+ execLock.runlock()
+ return
+ }
+ execLock.rlock() // Prevent process clone.
+ newosproc(mp)
+ execLock.runlock()
+}
+
+// startTemplateThread starts the template thread if it is not already
+// running.
+//
+// The calling thread must itself be in a known-good state.
+func startTemplateThread() {
+ if GOARCH == "wasm" { // no threads on wasm yet
+ return
+ }
+
+ // Disable preemption to guarantee that the template thread will be
+ // created before a park once haveTemplateThread is set.
+ mp := acquirem()
+ if !atomic.Cas(&newmHandoff.haveTemplateThread, 0, 1) {
+ releasem(mp)
+ return
+ }
+ newm(templateThread, nil, -1)
+ releasem(mp)
+}
+
+// mFixupRace is used to temporarily borrow the race context from the
+// coordinating m during a syscall_runtime_doAllThreadsSyscall and
+// loan it out to each of the m's of the runtime so they can execute a
+// mFixup.fn in that context.
+var mFixupRace struct {
+ lock mutex
+ ctx uintptr
+}
+
+// mDoFixup runs any outstanding fixup function for the running m.
+// Returns true if a fixup was outstanding and actually executed.
+//
+// Note: to avoid deadlocks, and the need for the fixup function
+// itself to be async safe, signals are blocked for the working m
+// while it holds the mFixup lock. (See golang.org/issue/44193)
+//
+//go:nosplit
+func mDoFixup() bool {
+ _g_ := getg()
+ if used := atomic.Load(&_g_.m.mFixup.used); used == 0 {
+ return false
+ }
+
+ // slow path - if fixup fn is used, block signals and lock.
+ var sigmask sigset
+ sigsave(&sigmask)
+ sigblock(false)
+ lock(&_g_.m.mFixup.lock)
+ fn := _g_.m.mFixup.fn
+ if fn != nil {
+ if gcphase != _GCoff {
+ // We can't have a write barrier in this
+ // context since we may not have a P, but we
+ // clear fn to signal that we've executed the
+ // fixup. As long as fn is kept alive
+ // elsewhere, technically we should have no
+ // issues with the GC, but fn is likely
+ // generated in a different package altogether
+ // that may change independently. Just assert
+ // the GC is off so this lack of write barrier
+ // is more obviously safe.
+ throw("GC must be disabled to protect validity of fn value")
+ }
+ if _g_.racectx != 0 || !raceenabled {
+ fn(false)
+ } else {
+ // temporarily acquire the context of the
+ // originator of the
+ // syscall_runtime_doAllThreadsSyscall and
+ // block others from using it for the duration
+ // of the fixup call.
+ lock(&mFixupRace.lock)
+ _g_.racectx = mFixupRace.ctx
+ fn(false)
+ _g_.racectx = 0
+ unlock(&mFixupRace.lock)
+ }
+ *(*uintptr)(unsafe.Pointer(&_g_.m.mFixup.fn)) = 0
+ atomic.Store(&_g_.m.mFixup.used, 0)
+ }
+ unlock(&_g_.m.mFixup.lock)
+ msigrestore(sigmask)
+ return fn != nil
+}
+
+// mDoFixupAndOSYield is called when an m is unable to send a signal
+// because the allThreadsSyscall mechanism is in progress. That is, an
+// mPark() has been interrupted by this signal handler, so we need to
+// ensure the fixup is executed from this context.
+//go:nosplit
+func mDoFixupAndOSYield() {
+ mDoFixup()
+ osyield()
+}
+
+// templateThread is a thread in a known-good state that exists solely
+// to start new threads in known-good states when the calling thread
+// may not be in a good state.
+//
+// Many programs never need this, so templateThread is started lazily
+// when we first enter a state that might lead to running on a thread
+// in an unknown state.
+//
+// templateThread runs on an M without a P, so it must not have write
+// barriers.
+//
+//go:nowritebarrierrec
+func templateThread() {
+ lock(&sched.lock)
+ sched.nmsys++
+ checkdead()
+ unlock(&sched.lock)
+
+ for {
+ lock(&newmHandoff.lock)
+ for newmHandoff.newm != 0 {
+ newm := newmHandoff.newm.ptr()
+ newmHandoff.newm = 0
+ unlock(&newmHandoff.lock)
+ for newm != nil {
+ next := newm.schedlink.ptr()
+ newm.schedlink = 0
+ newm1(newm)
+ newm = next
+ }
+ lock(&newmHandoff.lock)
+ }
+ newmHandoff.waiting = true
+ noteclear(&newmHandoff.wake)
+ unlock(&newmHandoff.lock)
+ notesleep(&newmHandoff.wake)
+ mDoFixup()
+ }
+}
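+
+// Illustrative sketch, not part of the original code: newm and templateThread
+// form a note-based handoff queue. The producer (newm, possibly running on a
+// thread in an unknown state) only links the m into the list and wakes the
+// note; the consumer (templateThread, always in a known-good state) drains
+// the list and calls newm1. The producer half, as in newm above:
+//
+//	lock(&newmHandoff.lock)
+//	mp.schedlink = newmHandoff.newm
+//	newmHandoff.newm.set(mp)
+//	if newmHandoff.waiting {
+//		newmHandoff.waiting = false
+//		notewakeup(&newmHandoff.wake)
+//	}
+//	unlock(&newmHandoff.lock)
+//
+// The consumer sets waiting and clears the note only after it has observed an
+// empty list under the lock, so a wakeup is never lost: either the producer
+// sees waiting==true and wakes the note, or the consumer sees the new entry
+// on its next pass before sleeping.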
+
+// Stops execution of the current m until new work is available.
+// Returns with acquired P.
+func stopm() {
+ _g_ := getg()
+
+ if _g_.m.locks != 0 {
+ throw("stopm holding locks")
+ }
+ if _g_.m.p != 0 {
+ throw("stopm holding p")
+ }
+ if _g_.m.spinning {
+ throw("stopm spinning")
+ }
+
+ lock(&sched.lock)
+ mput(_g_.m)
+ unlock(&sched.lock)
+ mPark()
+ acquirep(_g_.m.nextp.ptr())
+ _g_.m.nextp = 0
+}
+
+func mspinning() {
+ // startm's caller incremented nmspinning. Set the new M's spinning.
+ getg().m.spinning = true
+}
+
+// Schedules some M to run the p (creates an M if necessary).
+// If p==nil, tries to get an idle P; if there are no idle P's, it does nothing.
+// May run with m.p==nil, so write barriers are not allowed.
+// If spinning is set, the caller has incremented nmspinning and startm will
+// either decrement nmspinning or set m.spinning in the newly started M.
+//
+// Callers passing a non-nil P must call from a non-preemptible context. See
+// comment on acquirem below.
+//
+// Must not have write barriers because this may be called without a P.
+//go:nowritebarrierrec
+func startm(_p_ *p, spinning bool) {
+ // Disable preemption.
+ //
+ // Every owned P must have an owner that will eventually stop it in the
+ // event of a GC stop request. startm takes transient ownership of a P
+ // (either from argument or pidleget below) and transfers ownership to
+ // a started M, which will be responsible for performing the stop.
+ //
+ // Preemption must be disabled during this transient ownership,
+ // otherwise the P this is running on may enter GC stop while still
+ // holding the transient P, leaving that P in limbo and deadlocking the
+ // STW.
+ //
+ // Callers passing a non-nil P must already be in non-preemptible
+ // context, otherwise such preemption could occur on function entry to
+ // startm. Callers passing a nil P may be preemptible, so we must
+ // disable preemption before acquiring a P from pidleget below.
+ mp := acquirem()
+ lock(&sched.lock)
+ if _p_ == nil {
+ _p_ = pidleget()
+ if _p_ == nil {
+ unlock(&sched.lock)
+ if spinning {
+ // The caller incremented nmspinning, but there are no idle Ps,
+ // so it's okay to just undo the increment and give up.
+ if int32(atomic.Xadd(&sched.nmspinning, -1)) < 0 {
+ throw("startm: negative nmspinning")
+ }
+ }
+ releasem(mp)
+ return
+ }
+ }
+ nmp := mget()
+ if nmp == nil {
+ // No M is available, we must drop sched.lock and call newm.
+ // However, we already own a P to assign to the M.
+ //
+ // Once sched.lock is released, another G (e.g., in a syscall)
+ // could find no idle P while checkdead finds a runnable G but
+ // no running M's because this new M hasn't started yet, thus
+ // throwing in an apparent deadlock.
+ //
+ // Avoid this situation by pre-allocating the ID for the new M,
+ // thus marking it as 'running' before we drop sched.lock. This
+ // new M will eventually run the scheduler to execute any
+ // queued G's.
+ id := mReserveID()
+ unlock(&sched.lock)
+
+ var fn func()
+ if spinning {
+ // The caller incremented nmspinning, so set m.spinning in the new M.
+ fn = mspinning
+ }
+ newm(fn, _p_, id)
+ // Ownership transfer of _p_ committed by start in newm.
+ // Preemption is now safe.
+ releasem(mp)
+ return
+ }
+ unlock(&sched.lock)
+ if nmp.spinning {
+ throw("startm: m is spinning")
+ }
+ if nmp.nextp != 0 {
+ throw("startm: m has p")
+ }
+ if spinning && !runqempty(_p_) {
+ throw("startm: p has runnable gs")
+ }
+ // The caller incremented nmspinning, so set m.spinning in the new M.
+ nmp.spinning = spinning
+ nmp.nextp.set(_p_)
+ notewakeup(&nmp.park)
+ // Ownership transfer of _p_ committed by wakeup. Preemption is now
+ // safe.
+ releasem(mp)
+}
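+
+// Illustrative sketch, not part of the original code: the spinning contract
+// above means a caller that wants a spinning M increments nmspinning first
+// and lets startm either hand that count to the started M (mspinning /
+// nmp.spinning) or undo the increment when no idle P is available, e.g.:
+//
+//	if atomic.Load(&sched.nmspinning) == 0 && atomic.Cas(&sched.nmspinning, 0, 1) {
+//		startm(nil, true) // decrements nmspinning itself if it finds no idle P
+//	}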
+
+// Hands off P from syscall or locked M.
+// Always runs without a P, so write barriers are not allowed.
+//go:nowritebarrierrec
+func handoffp(_p_ *p) {
+ // handoffp must start an M in any situation where
+ // findrunnable would return a G to run on _p_.
+
+ // if it has local work, start it straight away
+ if !runqempty(_p_) || sched.runqsize != 0 {
+ startm(_p_, false)
+ return
+ }
+ // if it has GC work, start it straight away
+ if gcBlackenEnabled != 0 && gcMarkWorkAvailable(_p_) {
+ startm(_p_, false)
+ return
+ }
+ // no local work, check that there are no spinning/idle M's,
+ // otherwise our help is not required
+ if atomic.Load(&sched.nmspinning)+atomic.Load(&sched.npidle) == 0 && atomic.Cas(&sched.nmspinning, 0, 1) { // TODO: fast atomic
+ startm(_p_, true)
+ return
+ }
+ lock(&sched.lock)
+ if sched.gcwaiting != 0 {
+ _p_.status = _Pgcstop
+ sched.stopwait--
+ if sched.stopwait == 0 {
+ notewakeup(&sched.stopnote)
+ }
+ unlock(&sched.lock)
+ return
+ }
+ if _p_.runSafePointFn != 0 && atomic.Cas(&_p_.runSafePointFn, 1, 0) {
+ sched.safePointFn(_p_)
+ sched.safePointWait--
+ if sched.safePointWait == 0 {
+ notewakeup(&sched.safePointNote)
+ }
+ }
+ if sched.runqsize != 0 {
+ unlock(&sched.lock)
+ startm(_p_, false)
+ return
+ }
+ // If this is the last running P and nobody is polling the network,
+ // we need to wake up another M to poll the network.
+ if sched.npidle == uint32(gomaxprocs-1) && atomic.Load64(&sched.lastpoll) != 0 {
+ unlock(&sched.lock)
+ startm(_p_, false)
+ return
+ }
+
+ // The scheduler lock cannot be held when calling wakeNetPoller below
+ // because wakeNetPoller may call wakep which may call startm.
+ when := nobarrierWakeTime(_p_)
+ pidleput(_p_)
+ unlock(&sched.lock)
+
+ if when != 0 {
+ wakeNetPoller(when)
+ }
+}
+
+// Tries to add one more P to execute G's.
+// Called when a G is made runnable (newproc, ready).
+func wakep() {
+ if atomic.Load(&sched.npidle) == 0 {
+ return
+ }
+ // be conservative about spinning threads
+ if atomic.Load(&sched.nmspinning) != 0 || !atomic.Cas(&sched.nmspinning, 0, 1) {
+ return
+ }
+ startm(nil, true)
+}
+
+// Stops execution of the current m that is locked to a g until the g is runnable again.
+// Returns with acquired P.
+func stoplockedm() {
+ _g_ := getg()
+
+ if _g_.m.lockedg == 0 || _g_.m.lockedg.ptr().lockedm.ptr() != _g_.m {
+ throw("stoplockedm: inconsistent locking")
+ }
+ if _g_.m.p != 0 {
+ // Schedule another M to run this p.
+ _p_ := releasep()
+ handoffp(_p_)
+ }
+ incidlelocked(1)
+ // Wait until another thread schedules lockedg again.
+ mPark()
+ status := readgstatus(_g_.m.lockedg.ptr())
+ if status&^_Gscan != _Grunnable {
+ print("runtime:stoplockedm: lockedg (atomicstatus=", status, ") is not Grunnable or Gscanrunnable\n")
+ dumpgstatus(_g_.m.lockedg.ptr())
+ throw("stoplockedm: not runnable")
+ }
+ acquirep(_g_.m.nextp.ptr())
+ _g_.m.nextp = 0
+}
+
+// Schedules the locked m to run the locked gp.
+// May run during STW, so write barriers are not allowed.
+//go:nowritebarrierrec
+func startlockedm(gp *g) {
+ _g_ := getg()
+
+ mp := gp.lockedm.ptr()
+ if mp == _g_.m {
+ throw("startlockedm: locked to me")
+ }
+ if mp.nextp != 0 {
+ throw("startlockedm: m has p")
+ }
+ // directly handoff current P to the locked m
+ incidlelocked(-1)
+ _p_ := releasep()
+ mp.nextp.set(_p_)
+ notewakeup(&mp.park)
+ stopm()
+}
+
+// Stops the current m for stopTheWorld.
+// Returns when the world is restarted.
+func gcstopm() {
+ _g_ := getg()
+
+ if sched.gcwaiting == 0 {
+ throw("gcstopm: not waiting for gc")
+ }
+ if _g_.m.spinning {
+ _g_.m.spinning = false
+ // OK to just drop nmspinning here,
+ // startTheWorld will unpark threads as necessary.
+ if int32(atomic.Xadd(&sched.nmspinning, -1)) < 0 {
+ throw("gcstopm: negative nmspinning")
+ }
+ }
+ _p_ := releasep()
+ lock(&sched.lock)
+ _p_.status = _Pgcstop
+ sched.stopwait--
+ if sched.stopwait == 0 {
+ notewakeup(&sched.stopnote)
+ }
+ unlock(&sched.lock)
+ stopm()
+}
+
+// Schedules gp to run on the current M.
+// If inheritTime is true, gp inherits the remaining time in the
+// current time slice. Otherwise, it starts a new time slice.
+// Never returns.
+//
+// Write barriers are allowed because this is called immediately after
+// acquiring a P in several places.
+//
+//go:yeswritebarrierrec
+func execute(gp *g, inheritTime bool) {
+ _g_ := getg()
+
+ // Assign gp.m before entering _Grunning so running Gs have an
+ // M.
+ _g_.m.curg = gp
+ gp.m = _g_.m
+ casgstatus(gp, _Grunnable, _Grunning)
+ gp.waitsince = 0
+ gp.preempt = false
+ gp.stackguard0 = gp.stack.lo + _StackGuard
+ if !inheritTime {
+ _g_.m.p.ptr().schedtick++
+ }
+
+ // Check whether the profiler needs to be turned on or off.
+ hz := sched.profilehz
+ if _g_.m.profilehz != hz {
+ setThreadCPUProfiler(hz)
+ }
+
+ if trace.enabled {
+ // GoSysExit has to happen when we have a P, but before GoStart.
+ // So we emit it here.
+ if gp.syscallsp != 0 && gp.sysblocktraced {
+ traceGoSysExit(gp.sysexitticks)
+ }
+ traceGoStart()
+ }
+
+ gogo(&gp.sched)
+}
+
+// Finds a runnable goroutine to execute.
+// Tries to steal from other P's, get g from local or global queue, poll network.
+func findrunnable() (gp *g, inheritTime bool) {
+ _g_ := getg()
+
+ // The conditions here and in handoffp must agree: if
+ // findrunnable would return a G to run, handoffp must start
+ // an M.
+
+top:
+ _p_ := _g_.m.p.ptr()
+ if sched.gcwaiting != 0 {
+ gcstopm()
+ goto top
+ }
+ if _p_.runSafePointFn != 0 {
+ runSafePointFn()
+ }
+
+ now, pollUntil, _ := checkTimers(_p_, 0)
+
+ if fingwait && fingwake {
+ if gp := wakefing(); gp != nil {
+ ready(gp, 0, true)
+ }
+ }
+ if *cgo_yield != nil {
+ asmcgocall(*cgo_yield, nil)
+ }
+
+ // local runq
+ if gp, inheritTime := runqget(_p_); gp != nil {
+ return gp, inheritTime
+ }
+
+ // global runq
+ if sched.runqsize != 0 {
+ lock(&sched.lock)
+ gp := globrunqget(_p_, 0)
+ unlock(&sched.lock)
+ if gp != nil {
+ return gp, false
+ }
+ }
+
+ // Poll network.
+ // This netpoll is only an optimization before we resort to stealing.
+ // We can safely skip it if there are no waiters or a thread is blocked
+ // in netpoll already. If there is any kind of logical race with that
+ // blocked thread (e.g. it has already returned from netpoll, but does
+ // not set lastpoll yet), this thread will do blocking netpoll below
+ // anyway.
+ if netpollinited() && atomic.Load(&netpollWaiters) > 0 && atomic.Load64(&sched.lastpoll) != 0 {
+ if list := netpoll(0); !list.empty() { // non-blocking
+ gp := list.pop()
+ injectglist(&list)
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ if trace.enabled {
+ traceGoUnpark(gp, 0)
+ }
+ return gp, false
+ }
+ }
+
+ // Steal work from other P's.
+ procs := uint32(gomaxprocs)
+ ranTimer := false
+ // If number of spinning M's >= number of busy P's, block.
+ // This is necessary to prevent excessive CPU consumption
+ // when GOMAXPROCS>>1 but the program parallelism is low.
+ if !_g_.m.spinning && 2*atomic.Load(&sched.nmspinning) >= procs-atomic.Load(&sched.npidle) {
+ goto stop
+ }
+ if !_g_.m.spinning {
+ _g_.m.spinning = true
+ atomic.Xadd(&sched.nmspinning, 1)
+ }
+ const stealTries = 4
+ for i := 0; i < stealTries; i++ {
+ stealTimersOrRunNextG := i == stealTries-1
+
+ for enum := stealOrder.start(fastrand()); !enum.done(); enum.next() {
+ if sched.gcwaiting != 0 {
+ goto top
+ }
+ p2 := allp[enum.position()]
+ if _p_ == p2 {
+ continue
+ }
+
+ // Steal timers from p2. This call to checkTimers is the only place
+ // where we might hold a lock on a different P's timers. We do this
+ // once on the last pass before checking runnext because stealing
+ // from the other P's runnext should be the last resort, so if there
+ // are timers to steal, do that first.
+ //
+ // We only check timers on one of the stealing iterations because
+ // the time stored in now doesn't change in this loop and checking
+ // the timers for each P more than once with the same value of now
+ // is probably a waste of time.
+ //
+ // timerpMask tells us whether the P may have timers at all. If it
+ // can't, no need to check at all.
+ if stealTimersOrRunNextG && timerpMask.read(enum.position()) {
+ tnow, w, ran := checkTimers(p2, now)
+ now = tnow
+ if w != 0 && (pollUntil == 0 || w < pollUntil) {
+ pollUntil = w
+ }
+ if ran {
+ // Running the timers may have
+ // made an arbitrary number of G's
+ // ready and added them to this P's
+ // local run queue. That invalidates
+ // the assumption of runqsteal
+ // that it always has room to add
+ // stolen G's. So check now if there
+ // is a local G to run.
+ if gp, inheritTime := runqget(_p_); gp != nil {
+ return gp, inheritTime
+ }
+ ranTimer = true
+ }
+ }
+
+ // Don't bother to attempt to steal if p2 is idle.
+ if !idlepMask.read(enum.position()) {
+ if gp := runqsteal(_p_, p2, stealTimersOrRunNextG); gp != nil {
+ return gp, false
+ }
+ }
+ }
+ }
+ if ranTimer {
+ // Running a timer may have made some goroutine ready.
+ goto top
+ }
+
+stop:
+
+ // We have nothing to do. If we're in the GC mark phase, can
+ // safely scan and blacken objects, and have work to do, run
+ // idle-time marking rather than give up the P.
+ if gcBlackenEnabled != 0 && gcMarkWorkAvailable(_p_) {
+ node := (*gcBgMarkWorkerNode)(gcBgMarkWorkerPool.pop())
+ if node != nil {
+ _p_.gcMarkWorkerMode = gcMarkWorkerIdleMode
+ gp := node.gp.ptr()
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ if trace.enabled {
+ traceGoUnpark(gp, 0)
+ }
+ return gp, false
+ }
+ }
+
+ delta := int64(-1)
+ if pollUntil != 0 {
+ // checkTimers ensures that pollUntil > now.
+ delta = pollUntil - now
+ }
+
+ // wasm only:
+ // If a callback returned and no other goroutine is awake,
+ // then wake the event handler goroutine, which pauses execution
+ // until a callback is triggered.
+ gp, otherReady := beforeIdle(delta)
+ if gp != nil {
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ if trace.enabled {
+ traceGoUnpark(gp, 0)
+ }
+ return gp, false
+ }
+ if otherReady {
+ goto top
+ }
+
+ // Before we drop our P, make a snapshot of the allp slice,
+ // which can change underfoot once we no longer block
+ // safe-points. We don't need to snapshot the contents because
+ // everything up to cap(allp) is immutable.
+ allpSnapshot := allp
+ // Also snapshot masks. Value changes are OK, but we can't allow
+ // len to change out from under us.
+ idlepMaskSnapshot := idlepMask
+ timerpMaskSnapshot := timerpMask
+
+ // return P and block
+ lock(&sched.lock)
+ if sched.gcwaiting != 0 || _p_.runSafePointFn != 0 {
+ unlock(&sched.lock)
+ goto top
+ }
+ if sched.runqsize != 0 {
+ gp := globrunqget(_p_, 0)
+ unlock(&sched.lock)
+ return gp, false
+ }
+ if releasep() != _p_ {
+ throw("findrunnable: wrong p")
+ }
+ pidleput(_p_)
+ unlock(&sched.lock)
+
+ // Delicate dance: thread transitions from spinning to non-spinning state,
+ // potentially concurrently with submission of new goroutines. We must
+ // drop nmspinning first and then check all per-P queues again (with
+ // #StoreLoad memory barrier in between). If we do it the other way around,
+ // another thread can submit a goroutine after we've checked all run queues
+ // but before we drop nmspinning; as a result nobody will unpark a thread
+ // to run the goroutine.
+ // If we discover new work below, we need to restore m.spinning as a signal
+ // for resetspinning to unpark a new worker thread (because there can be more
+ // than one starving goroutine). However, if after discovering new work
+ // we also observe no idle Ps, it is OK to just park the current thread:
+ // the system is fully loaded so no spinning threads are required.
+ // Also see "Worker thread parking/unparking" comment at the top of the file.
+ wasSpinning := _g_.m.spinning
+ if _g_.m.spinning {
+ _g_.m.spinning = false
+ if int32(atomic.Xadd(&sched.nmspinning, -1)) < 0 {
+ throw("findrunnable: negative nmspinning")
+ }
+ }
+
+ // check all runqueues once again
+ for id, _p_ := range allpSnapshot {
+ if !idlepMaskSnapshot.read(uint32(id)) && !runqempty(_p_) {
+ lock(&sched.lock)
+ _p_ = pidleget()
+ unlock(&sched.lock)
+ if _p_ != nil {
+ acquirep(_p_)
+ if wasSpinning {
+ _g_.m.spinning = true
+ atomic.Xadd(&sched.nmspinning, 1)
+ }
+ goto top
+ }
+ break
+ }
+ }
+
+ // Similar to above, check for timer creation or expiry concurrently with
+ // transitioning from spinning to non-spinning. Note that we cannot use
+ // checkTimers here because it calls adjusttimers which may need to allocate
+ // memory, and that isn't allowed when we don't have an active P.
+ for id, _p_ := range allpSnapshot {
+ if timerpMaskSnapshot.read(uint32(id)) {
+ w := nobarrierWakeTime(_p_)
+ if w != 0 && (pollUntil == 0 || w < pollUntil) {
+ pollUntil = w
+ }
+ }
+ }
+ if pollUntil != 0 {
+ if now == 0 {
+ now = nanotime()
+ }
+ delta = pollUntil - now
+ if delta < 0 {
+ delta = 0
+ }
+ }
+
+ // Check for idle-priority GC work again.
+ //
+ // N.B. Since we have no P, gcBlackenEnabled may change at any time; we
+ // must check again after acquiring a P.
+ if atomic.Load(&gcBlackenEnabled) != 0 && gcMarkWorkAvailable(nil) {
+ // Work is available; we can start an idle GC worker only if
+ // there is an available P and available worker G.
+ //
+ // We can attempt to acquire these in either order. Workers are
+ // almost always available (see comment in findRunnableGCWorker
+ // for the one case there may be none). Since we're slightly
+ // less likely to find a P, check for that first.
+ lock(&sched.lock)
+ var node *gcBgMarkWorkerNode
+ _p_ = pidleget()
+ if _p_ != nil {
+ // Now that we own a P, gcBlackenEnabled can't change
+ // (as it requires STW).
+ if gcBlackenEnabled != 0 {
+ node = (*gcBgMarkWorkerNode)(gcBgMarkWorkerPool.pop())
+ if node == nil {
+ pidleput(_p_)
+ _p_ = nil
+ }
+ } else {
+ pidleput(_p_)
+ _p_ = nil
+ }
+ }
+ unlock(&sched.lock)
+ if _p_ != nil {
+ acquirep(_p_)
+ if wasSpinning {
+ _g_.m.spinning = true
+ atomic.Xadd(&sched.nmspinning, 1)
+ }
+
+ // Run the idle worker.
+ _p_.gcMarkWorkerMode = gcMarkWorkerIdleMode
+ gp := node.gp.ptr()
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ if trace.enabled {
+ traceGoUnpark(gp, 0)
+ }
+ return gp, false
+ }
+ }
+
+ // poll network
+ if netpollinited() && (atomic.Load(&netpollWaiters) > 0 || pollUntil != 0) && atomic.Xchg64(&sched.lastpoll, 0) != 0 {
+ atomic.Store64(&sched.pollUntil, uint64(pollUntil))
+ if _g_.m.p != 0 {
+ throw("findrunnable: netpoll with p")
+ }
+ if _g_.m.spinning {
+ throw("findrunnable: netpoll with spinning")
+ }
+ if faketime != 0 {
+ // When using fake time, just poll.
+ delta = 0
+ }
+ list := netpoll(delta) // block until new work is available
+ atomic.Store64(&sched.pollUntil, 0)
+ atomic.Store64(&sched.lastpoll, uint64(nanotime()))
+ if faketime != 0 && list.empty() {
+ // Using fake time and nothing is ready; stop M.
+ // When all M's stop, checkdead will call timejump.
+ stopm()
+ goto top
+ }
+ lock(&sched.lock)
+ _p_ = pidleget()
+ unlock(&sched.lock)
+ if _p_ == nil {
+ injectglist(&list)
+ } else {
+ acquirep(_p_)
+ if !list.empty() {
+ gp := list.pop()
+ injectglist(&list)
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ if trace.enabled {
+ traceGoUnpark(gp, 0)
+ }
+ return gp, false
+ }
+ if wasSpinning {
+ _g_.m.spinning = true
+ atomic.Xadd(&sched.nmspinning, 1)
+ }
+ goto top
+ }
+ } else if pollUntil != 0 && netpollinited() {
+ pollerPollUntil := int64(atomic.Load64(&sched.pollUntil))
+ if pollerPollUntil == 0 || pollerPollUntil > pollUntil {
+ netpollBreak()
+ }
+ }
+ stopm()
+ goto top
+}
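+
+// Illustrative sketch, not part of the original code: the "delicate dance"
+// above pairs a store with a load on each side, and the atomic operations
+// supply the #StoreLoad ordering between them:
+//
+//	// submitter (e.g. ready/newproc):
+//	runqput(pp, gp, true)              // store: publish the work
+//	wakep()                            // load: npidle/nmspinning; maybe startm
+//
+//	// parking worker (findrunnable, above):
+//	atomic.Xadd(&sched.nmspinning, -1) // store: one fewer spinner
+//	// load: re-check all run queues, timers, GC work and netpoll
+//	// found work -> restore m.spinning and retry; otherwise stopm()
+//
+// With this ordering at least one side notices the other, so a runnable G
+// cannot be stranded while every M parks.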
+
+// pollWork reports whether there is non-background work this P could
+// be doing. This is a fairly lightweight check to be used for
+// background work loops, like idle GC. It checks a subset of the
+// conditions checked by the actual scheduler.
+func pollWork() bool {
+ if sched.runqsize != 0 {
+ return true
+ }
+ p := getg().m.p.ptr()
+ if !runqempty(p) {
+ return true
+ }
+ if netpollinited() && atomic.Load(&netpollWaiters) > 0 && sched.lastpoll != 0 {
+ if list := netpoll(0); !list.empty() {
+ injectglist(&list)
+ return true
+ }
+ }
+ return false
+}
+
+// wakeNetPoller wakes up the thread sleeping in the network poller if it isn't
+// going to wake up before the when argument; or it wakes an idle P to service
+// timers and the network poller if there isn't one already.
+func wakeNetPoller(when int64) {
+ if atomic.Load64(&sched.lastpoll) == 0 {
+ // In findrunnable we ensure that when polling, the pollUntil
+ // field is either zero or the time until which the current
+ // poll is expected to run. This can have a spurious wakeup
+ // but should never miss a wakeup.
+ pollerPollUntil := int64(atomic.Load64(&sched.pollUntil))
+ if pollerPollUntil == 0 || pollerPollUntil > when {
+ netpollBreak()
+ }
+ } else {
+ // There are no threads in the network poller; try to get
+ // one there so it can handle new timers.
+ if GOOS != "plan9" { // Temporary workaround - see issue #42303.
+ wakep()
+ }
+ }
+}
+
+func resetspinning() {
+ _g_ := getg()
+ if !_g_.m.spinning {
+ throw("resetspinning: not a spinning m")
+ }
+ _g_.m.spinning = false
+ nmspinning := atomic.Xadd(&sched.nmspinning, -1)
+ if int32(nmspinning) < 0 {
+ throw("findrunnable: negative nmspinning")
+ }
+ // M wakeup policy is deliberately somewhat conservative, so check if we
+ // need to wake up another P here. See "Worker thread parking/unparking"
+ // comment at the top of the file for details.
+ wakep()
+}
+
+// injectglist adds each runnable G on the list to some run queue,
+// and clears glist. If there is no current P, they are added to the
+// global queue, and up to npidle M's are started to run them.
+// Otherwise, for each idle P, this adds a G to the global queue
+// and starts an M. Any remaining G's are added to the current P's
+// local run queue.
+// This may temporarily acquire sched.lock.
+// Can run concurrently with GC.
+func injectglist(glist *gList) {
+ if glist.empty() {
+ return
+ }
+ if trace.enabled {
+ for gp := glist.head.ptr(); gp != nil; gp = gp.schedlink.ptr() {
+ traceGoUnpark(gp, 0)
+ }
+ }
+
+ // Mark all the goroutines as runnable before we put them
+ // on the run queues.
+ head := glist.head.ptr()
+ var tail *g
+ qsize := 0
+ for gp := head; gp != nil; gp = gp.schedlink.ptr() {
+ tail = gp
+ qsize++
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ }
+
+ // Turn the gList into a gQueue.
+ var q gQueue
+ q.head.set(head)
+ q.tail.set(tail)
+ *glist = gList{}
+
+ startIdle := func(n int) {
+ for ; n != 0 && sched.npidle != 0; n-- {
+ startm(nil, false)
+ }
+ }
+
+ pp := getg().m.p.ptr()
+ if pp == nil {
+ lock(&sched.lock)
+ globrunqputbatch(&q, int32(qsize))
+ unlock(&sched.lock)
+ startIdle(qsize)
+ return
+ }
+
+ npidle := int(atomic.Load(&sched.npidle))
+ var globq gQueue
+ var n int
+ for n = 0; n < npidle && !q.empty(); n++ {
+ g := q.pop()
+ globq.pushBack(g)
+ }
+ if n > 0 {
+ lock(&sched.lock)
+ globrunqputbatch(&globq, int32(n))
+ unlock(&sched.lock)
+ startIdle(n)
+ qsize -= n
+ }
+
+ if !q.empty() {
+ runqputbatch(pp, &q, qsize)
+ }
+}
+
+// One round of scheduler: find a runnable goroutine and execute it.
+// Never returns.
+func schedule() {
+ _g_ := getg()
+
+ if _g_.m.locks != 0 {
+ throw("schedule: holding locks")
+ }
+
+ if _g_.m.lockedg != 0 {
+ stoplockedm()
+ execute(_g_.m.lockedg.ptr(), false) // Never returns.
+ }
+
+ // We should not schedule away from a g that is executing a cgo call,
+ // since the cgo call is using the m's g0 stack.
+ if _g_.m.incgo {
+ throw("schedule: in cgo")
+ }
+
+top:
+ pp := _g_.m.p.ptr()
+ pp.preempt = false
+
+ if sched.gcwaiting != 0 {
+ gcstopm()
+ goto top
+ }
+ if pp.runSafePointFn != 0 {
+ runSafePointFn()
+ }
+
+ // Sanity check: if we are spinning, the run queue should be empty.
+ // Check this before calling checkTimers, as that might call
+ // goready to put a ready goroutine on the local run queue.
+ if _g_.m.spinning && (pp.runnext != 0 || pp.runqhead != pp.runqtail) {
+ throw("schedule: spinning with local work")
+ }
+
+ checkTimers(pp, 0)
+
+ var gp *g
+ var inheritTime bool
+
+ // Normal goroutines will check for the need to wake a P in ready,
+ // but GCworkers and tracereaders will not, so the check must
+ // be done here instead.
+ tryWakeP := false
+ if trace.enabled || trace.shutdown {
+ gp = traceReader()
+ if gp != nil {
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ traceGoUnpark(gp, 0)
+ tryWakeP = true
+ }
+ }
+ if gp == nil && gcBlackenEnabled != 0 {
+ gp = gcController.findRunnableGCWorker(_g_.m.p.ptr())
+ tryWakeP = tryWakeP || gp != nil
+ }
+ if gp == nil {
+ // Check the global runnable queue once in a while to ensure fairness.
+ // Otherwise two goroutines can completely occupy the local runqueue
+ // by constantly respawning each other.
+ if _g_.m.p.ptr().schedtick%61 == 0 && sched.runqsize > 0 {
+ lock(&sched.lock)
+ gp = globrunqget(_g_.m.p.ptr(), 1)
+ unlock(&sched.lock)
+ }
+ }
+ if gp == nil {
+ gp, inheritTime = runqget(_g_.m.p.ptr())
+ // We can see gp != nil here even if the M is spinning,
+ // if checkTimers added a local goroutine via goready.
+ }
+ if gp == nil {
+ gp, inheritTime = findrunnable() // blocks until work is available
+ }
+
+ // This thread is going to run a goroutine and is not spinning anymore,
+ // so if it was marked as spinning we need to reset it now and potentially
+ // start a new spinning M.
+ if _g_.m.spinning {
+ resetspinning()
+ }
+
+ if sched.disable.user && !schedEnabled(gp) {
+ // Scheduling of this goroutine is disabled. Put it on
+ // the list of pending runnable goroutines for when we
+ // re-enable user scheduling and look again.
+ lock(&sched.lock)
+ if schedEnabled(gp) {
+ // Something re-enabled scheduling while we
+ // were acquiring the lock.
+ unlock(&sched.lock)
+ } else {
+ sched.disable.runnable.pushBack(gp)
+ sched.disable.n++
+ unlock(&sched.lock)
+ goto top
+ }
+ }
+
+ // If about to schedule a non-normal goroutine (a GCworker or tracereader),
+ // wake a P if there is one.
+ if tryWakeP {
+ wakep()
+ }
+ if gp.lockedm != 0 {
+ // Hands off own p to the locked m,
+ // then blocks waiting for a new p.
+ startlockedm(gp)
+ goto top
+ }
+
+ execute(gp, inheritTime)
+}
+
+// dropg removes the association between m and the current goroutine m->curg (gp for short).
+// Typically a caller sets gp's status away from Grunning and then
+// immediately calls dropg to finish the job. The caller is also responsible
+// for arranging that gp will be restarted using ready at an
+// appropriate time. After calling dropg and arranging for gp to be
+// readied later, the caller can do other work but eventually should
+// call schedule to restart the scheduling of goroutines on this m.
+func dropg() {
+ _g_ := getg()
+
+ setMNoWB(&_g_.m.curg.m, nil)
+ setGNoWB(&_g_.m.curg, nil)
+}
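+
+// Illustrative sketch, not part of the original code: the typical caller
+// pattern described above, as it appears in park_m and goschedImpl below:
+//
+//	casgstatus(gp, _Grunning, _Gwaiting) // or _Grunnable
+//	dropg()
+//	// ... arrange for gp to be readied or requeued later ...
+//	schedule()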
+
+// checkTimers runs any timers for the P that are ready.
+// If now is not 0 it is the current time.
+// It returns the current time or 0 if it is not known,
+// and the time when the next timer should run or 0 if there is no next timer,
+// and reports whether it ran any timers.
+// If the time when the next timer should run is not 0,
+// it is always larger than the returned time.
+// We pass now in and out to avoid extra calls of nanotime.
+//go:yeswritebarrierrec
+func checkTimers(pp *p, now int64) (rnow, pollUntil int64, ran bool) {
+ // If it's not yet time for the first timer, or the first adjusted
+ // timer, then there is nothing to do.
+ next := int64(atomic.Load64(&pp.timer0When))
+ nextAdj := int64(atomic.Load64(&pp.timerModifiedEarliest))
+ if next == 0 || (nextAdj != 0 && nextAdj < next) {
+ next = nextAdj
+ }
+
+ if next == 0 {
+ // No timers to run or adjust.
+ return now, 0, false
+ }
+
+ if now == 0 {
+ now = nanotime()
+ }
+ if now < next {
+ // Next timer is not ready to run, but keep going
+ // if we would clear deleted timers.
+ // This corresponds to the condition below where
+ // we decide whether to call clearDeletedTimers.
+ if pp != getg().m.p.ptr() || int(atomic.Load(&pp.deletedTimers)) <= int(atomic.Load(&pp.numTimers)/4) {
+ return now, next, false
+ }
+ }
+
+ lock(&pp.timersLock)
+
+ if len(pp.timers) > 0 {
+ adjusttimers(pp, now)
+ for len(pp.timers) > 0 {
+ // Note that runtimer may temporarily unlock
+ // pp.timersLock.
+ if tw := runtimer(pp, now); tw != 0 {
+ if tw > 0 {
+ pollUntil = tw
+ }
+ break
+ }
+ ran = true
+ }
+ }
+
+ // If this is the local P, and there are a lot of deleted timers,
+ // clear them out. We only do this for the local P to reduce
+ // lock contention on timersLock.
+ if pp == getg().m.p.ptr() && int(atomic.Load(&pp.deletedTimers)) > len(pp.timers)/4 {
+ clearDeletedTimers(pp)
+ }
+
+ unlock(&pp.timersLock)
+
+ return now, pollUntil, ran
+}
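+
+// Illustrative sketch, not part of the original code: callers use the three
+// results of checkTimers roughly as follows (compare schedule and
+// findrunnable above):
+//
+//	now, pollUntil, ran := checkTimers(pp, now)
+//	if ran {
+//		// a timer fired and may have readied Gs; re-check the local runq
+//	}
+//	if pollUntil != 0 {
+//		// no timer is ready yet; sleep in netpoll for at most pollUntil-now
+//	}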
+
+func parkunlock_c(gp *g, lock unsafe.Pointer) bool {
+ unlock((*mutex)(lock))
+ return true
+}
+
+// park continuation on g0.
+func park_m(gp *g) {
+ _g_ := getg()
+
+ if trace.enabled {
+ traceGoPark(_g_.m.waittraceev, _g_.m.waittraceskip)
+ }
+
+ casgstatus(gp, _Grunning, _Gwaiting)
+ dropg()
+
+ if fn := _g_.m.waitunlockf; fn != nil {
+ ok := fn(gp, _g_.m.waitlock)
+ _g_.m.waitunlockf = nil
+ _g_.m.waitlock = nil
+ if !ok {
+ if trace.enabled {
+ traceGoUnpark(gp, 2)
+ }
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ execute(gp, true) // Schedule it back, never returns.
+ }
+ }
+ schedule()
+}
+
+func goschedImpl(gp *g) {
+ status := readgstatus(gp)
+ if status&^_Gscan != _Grunning {
+ dumpgstatus(gp)
+ throw("bad g status")
+ }
+ casgstatus(gp, _Grunning, _Grunnable)
+ dropg()
+ lock(&sched.lock)
+ globrunqput(gp)
+ unlock(&sched.lock)
+
+ schedule()
+}
+
+// Gosched continuation on g0.
+func gosched_m(gp *g) {
+ if trace.enabled {
+ traceGoSched()
+ }
+ goschedImpl(gp)
+}
+
+// goschedguarded_m is like gosched_m, but it resumes gp without yielding if the m must not be preempted.
+func goschedguarded_m(gp *g) {
+
+ if !canPreemptM(gp.m) {
+ gogo(&gp.sched) // never return
+ }
+
+ if trace.enabled {
+ traceGoSched()
+ }
+ goschedImpl(gp)
+}
+
+func gopreempt_m(gp *g) {
+ if trace.enabled {
+ traceGoPreempt()
+ }
+ goschedImpl(gp)
+}
+
+// preemptPark parks gp and puts it in _Gpreempted.
+//
+//go:systemstack
+func preemptPark(gp *g) {
+ if trace.enabled {
+ traceGoPark(traceEvGoBlock, 0)
+ }
+ status := readgstatus(gp)
+ if status&^_Gscan != _Grunning {
+ dumpgstatus(gp)
+ throw("bad g status")
+ }
+ gp.waitreason = waitReasonPreempted
+ // Transition from _Grunning to _Gscan|_Gpreempted. We can't
+ // be in _Grunning when we dropg because then we'd be running
+ // without an M, but the moment we're in _Gpreempted,
+ // something could claim this G before we've fully cleaned it
+ // up. Hence, we set the scan bit to lock down further
+ // transitions until we can dropg.
+ casGToPreemptScan(gp, _Grunning, _Gscan|_Gpreempted)
+ dropg()
+ casfrom_Gscanstatus(gp, _Gscan|_Gpreempted, _Gpreempted)
+ schedule()
+}
+
+// goyield is like Gosched, but it:
+// - emits a GoPreempt trace event instead of a GoSched trace event
+// - puts the current G on the runq of the current P instead of the globrunq
+func goyield() {
+ checkTimeouts()
+ mcall(goyield_m)
+}
+
+func goyield_m(gp *g) {
+ if trace.enabled {
+ traceGoPreempt()
+ }
+ pp := gp.m.p.ptr()
+ casgstatus(gp, _Grunning, _Grunnable)
+ dropg()
+ runqput(pp, gp, false)
+ schedule()
+}
+
+// Finishes execution of the current goroutine.
+func goexit1() {
+ if raceenabled {
+ racegoend()
+ }
+ if trace.enabled {
+ traceGoEnd()
+ }
+ mcall(goexit0)
+}
+
+// goexit continuation on g0.
+func goexit0(gp *g) {
+ _g_ := getg()
+
+ casgstatus(gp, _Grunning, _Gdead)
+ if isSystemGoroutine(gp, false) {
+ atomic.Xadd(&sched.ngsys, -1)
+ }
+ gp.m = nil
+ locked := gp.lockedm != 0
+ gp.lockedm = 0
+ _g_.m.lockedg = 0
+ gp.preemptStop = false
+ gp.paniconfault = false
+ gp._defer = nil // should be nil already but just in case.
+ gp._panic = nil // non-nil for Goexit during panic. points at stack-allocated data.
+ gp.writebuf = nil
+ gp.waitreason = 0
+ gp.param = nil
+ gp.labels = nil
+ gp.timer = nil
+
+ if gcBlackenEnabled != 0 && gp.gcAssistBytes > 0 {
+ // Flush assist credit to the global pool. This gives
+ // better information to pacing if the application is
+ // rapidly creating and exiting goroutines.
+ assistWorkPerByte := float64frombits(atomic.Load64(&gcController.assistWorkPerByte))
+ scanCredit := int64(assistWorkPerByte * float64(gp.gcAssistBytes))
+ atomic.Xaddint64(&gcController.bgScanCredit, scanCredit)
+ gp.gcAssistBytes = 0
+ }
+
+ dropg()
+
+ if GOARCH == "wasm" { // no threads yet on wasm
+ gfput(_g_.m.p.ptr(), gp)
+ schedule() // never returns
+ }
+
+ if _g_.m.lockedInt != 0 {
+ print("invalid m->lockedInt = ", _g_.m.lockedInt, "\n")
+ throw("internal lockOSThread error")
+ }
+ gfput(_g_.m.p.ptr(), gp)
+ if locked {
+ // The goroutine may have locked this thread because
+ // it put it in an unusual kernel state. Kill it
+ // rather than returning it to the thread pool.
+
+ // Return to mstart, which will release the P and exit
+ // the thread.
+ if GOOS != "plan9" { // See golang.org/issue/22227.
+ gogo(&_g_.m.g0.sched)
+ } else {
+ // Clear lockedExt on plan9 since we may end up re-using
+ // this thread.
+ _g_.m.lockedExt = 0
+ }
+ }
+ schedule()
+}
+
+// save updates getg().sched to refer to pc and sp so that a following
+// gogo will restore pc and sp.
+//
+// save must not have write barriers because invoking a write barrier
+// can clobber getg().sched.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func save(pc, sp uintptr) {
+ _g_ := getg()
+
+ _g_.sched.pc = pc
+ _g_.sched.sp = sp
+ _g_.sched.lr = 0
+ _g_.sched.ret = 0
+ _g_.sched.g = guintptr(unsafe.Pointer(_g_))
+ // We need to ensure ctxt is zero, but can't have a write
+ // barrier here. However, it should always already be zero.
+ // Assert that.
+ if _g_.sched.ctxt != nil {
+ badctxt()
+ }
+}
+
+// The goroutine g is about to enter a system call.
+// Record that it's not using the cpu anymore.
+// This is called only from the go syscall library and cgocall,
+// not from the low-level system calls used by the runtime.
+//
+// Entersyscall cannot split the stack: the gosave must
+// make g->sched refer to the caller's stack segment, because
+// entersyscall is going to return immediately after.
+//
+// Nothing entersyscall calls can split the stack either.
+// We cannot safely move the stack during an active call to syscall,
+// because we do not know which of the uintptr arguments are
+// really pointers (back into the stack).
+// In practice, this means that we make the fast path run through
+// entersyscall doing no-split things, and the slow path has to use systemstack
+// to run bigger things on the system stack.
+//
+// reentersyscall is the entry point used by cgo callbacks, where explicitly
+// saved SP and PC are restored. This is needed when exitsyscall will be called
+// from a function further up in the call stack than the parent, as g->syscallsp
+// must always point to a valid stack frame. entersyscall below is the normal
+// entry point for syscalls, which obtains the SP and PC from the caller.
+//
+// Syscall tracing:
+// At the start of a syscall we emit traceGoSysCall to capture the stack trace.
+// If the syscall does not block, that is it, we do not emit any other events.
+// If the syscall blocks (that is, P is retaken), retaker emits traceGoSysBlock;
+// when syscall returns we emit traceGoSysExit and when the goroutine starts running
+// (potentially instantly, if exitsyscallfast returns true) we emit traceGoStart.
+// To ensure that traceGoSysExit is emitted strictly after traceGoSysBlock,
+// we remember the current value of syscalltick in m (_g_.m.syscalltick = _g_.m.p.ptr().syscalltick),
+// whoever emits traceGoSysBlock increments p.syscalltick afterwards;
+// and we wait for the increment before emitting traceGoSysExit.
+// Note that the increment is done even if tracing is not enabled,
+// because tracing can be enabled in the middle of a syscall. We don't want the wait to hang.
+//
+//go:nosplit
+func reentersyscall(pc, sp uintptr) {
+ _g_ := getg()
+
+ // Disable preemption because during this function g is in Gsyscall status
+ // but can have an inconsistent g->sched; do not let the GC observe it.
+ _g_.m.locks++
+
+ // Entersyscall must not call any function that might split/grow the stack.
+ // (See details in comment above.)
+ // Catch calls that might, by replacing the stack guard with something that
+ // will trip any stack check and leaving a flag to tell newstack to die.
+ _g_.stackguard0 = stackPreempt
+ _g_.throwsplit = true
+
+ // Leave SP around for GC and traceback.
+ save(pc, sp)
+ _g_.syscallsp = sp
+ _g_.syscallpc = pc
+ casgstatus(_g_, _Grunning, _Gsyscall)
+ if _g_.syscallsp < _g_.stack.lo || _g_.stack.hi < _g_.syscallsp {
+ systemstack(func() {
+ print("entersyscall inconsistent ", hex(_g_.syscallsp), " [", hex(_g_.stack.lo), ",", hex(_g_.stack.hi), "]\n")
+ throw("entersyscall")
+ })
+ }
+
+ if trace.enabled {
+ systemstack(traceGoSysCall)
+ // systemstack itself clobbers g.sched.{pc,sp} and we might
+ // need them later when the G is genuinely blocked in a
+ // syscall
+ save(pc, sp)
+ }
+
+ if atomic.Load(&sched.sysmonwait) != 0 {
+ systemstack(entersyscall_sysmon)
+ save(pc, sp)
+ }
+
+ if _g_.m.p.ptr().runSafePointFn != 0 {
+ // runSafePointFn may stack split if run on this stack
+ systemstack(runSafePointFn)
+ save(pc, sp)
+ }
+
+ _g_.m.syscalltick = _g_.m.p.ptr().syscalltick
+ _g_.sysblocktraced = true
+ pp := _g_.m.p.ptr()
+ pp.m = 0
+ _g_.m.oldp.set(pp)
+ _g_.m.p = 0
+ atomic.Store(&pp.status, _Psyscall)
+ if sched.gcwaiting != 0 {
+ systemstack(entersyscall_gcwait)
+ save(pc, sp)
+ }
+
+ _g_.m.locks--
+}
+
+// Standard syscall entry used by the go syscall library and normal cgo calls.
+//
+// This is exported via linkname to assembly in the syscall package.
+//
+//go:nosplit
+//go:linkname entersyscall
+func entersyscall() {
+ reentersyscall(getcallerpc(), getcallersp())
+}
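+
+// Illustrative sketch, not part of the original code: the syscall package
+// brackets raw system calls with this pair, conceptually (simplified;
+// rawSyscall stands in for the real assembly stubs):
+//
+//	func Syscall(trap, a1, a2, a3 uintptr) (r1, r2, errno uintptr) {
+//		entersyscall()  // mark g as in a syscall; sysmon/retake may hand off the P
+//		r1, r2, errno = rawSyscall(trap, a1, a2, a3)
+//		exitsyscall()   // re-acquire a P via the fast or slow path
+//		return
+//	}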
+
+func entersyscall_sysmon() {
+ lock(&sched.lock)
+ if atomic.Load(&sched.sysmonwait) != 0 {
+ atomic.Store(&sched.sysmonwait, 0)
+ notewakeup(&sched.sysmonnote)
+ }
+ unlock(&sched.lock)
+}
+
+func entersyscall_gcwait() {
+ _g_ := getg()
+ _p_ := _g_.m.oldp.ptr()
+
+ lock(&sched.lock)
+ if sched.stopwait > 0 && atomic.Cas(&_p_.status, _Psyscall, _Pgcstop) {
+ if trace.enabled {
+ traceGoSysBlock(_p_)
+ traceProcStop(_p_)
+ }
+ _p_.syscalltick++
+ if sched.stopwait--; sched.stopwait == 0 {
+ notewakeup(&sched.stopnote)
+ }
+ }
+ unlock(&sched.lock)
+}
+
+// The same as entersyscall(), but with a hint that the syscall is blocking.
+//go:nosplit
+func entersyscallblock() {
+ _g_ := getg()
+
+ _g_.m.locks++ // see comment in entersyscall
+ _g_.throwsplit = true
+ _g_.stackguard0 = stackPreempt // see comment in entersyscall
+ _g_.m.syscalltick = _g_.m.p.ptr().syscalltick
+ _g_.sysblocktraced = true
+ _g_.m.p.ptr().syscalltick++
+
+ // Leave SP around for GC and traceback.
+ pc := getcallerpc()
+ sp := getcallersp()
+ save(pc, sp)
+ _g_.syscallsp = _g_.sched.sp
+ _g_.syscallpc = _g_.sched.pc
+ if _g_.syscallsp < _g_.stack.lo || _g_.stack.hi < _g_.syscallsp {
+ sp1 := sp
+ sp2 := _g_.sched.sp
+ sp3 := _g_.syscallsp
+ systemstack(func() {
+ print("entersyscallblock inconsistent ", hex(sp1), " ", hex(sp2), " ", hex(sp3), " [", hex(_g_.stack.lo), ",", hex(_g_.stack.hi), "]\n")
+ throw("entersyscallblock")
+ })
+ }
+ casgstatus(_g_, _Grunning, _Gsyscall)
+ if _g_.syscallsp < _g_.stack.lo || _g_.stack.hi < _g_.syscallsp {
+ systemstack(func() {
+ print("entersyscallblock inconsistent ", hex(sp), " ", hex(_g_.sched.sp), " ", hex(_g_.syscallsp), " [", hex(_g_.stack.lo), ",", hex(_g_.stack.hi), "]\n")
+ throw("entersyscallblock")
+ })
+ }
+
+ systemstack(entersyscallblock_handoff)
+
+ // Resave for traceback during blocked call.
+ save(getcallerpc(), getcallersp())
+
+ _g_.m.locks--
+}
+
+func entersyscallblock_handoff() {
+ if trace.enabled {
+ traceGoSysCall()
+ traceGoSysBlock(getg().m.p.ptr())
+ }
+ handoffp(releasep())
+}
+
+// The goroutine g exited its system call.
+// Arrange for it to run on a cpu again.
+// This is called only from the go syscall library, not
+// from the low-level system calls used by the runtime.
+//
+// Write barriers are not allowed because our P may have been stolen.
+//
+// This is exported via linkname to assembly in the syscall package.
+//
+//go:nosplit
+//go:nowritebarrierrec
+//go:linkname exitsyscall
+func exitsyscall() {
+ _g_ := getg()
+
+ _g_.m.locks++ // see comment in entersyscall
+ if getcallersp() > _g_.syscallsp {
+ throw("exitsyscall: syscall frame is no longer valid")
+ }
+
+ _g_.waitsince = 0
+ oldp := _g_.m.oldp.ptr()
+ _g_.m.oldp = 0
+ if exitsyscallfast(oldp) {
+ if trace.enabled {
+ if oldp != _g_.m.p.ptr() || _g_.m.syscalltick != _g_.m.p.ptr().syscalltick {
+ systemstack(traceGoStart)
+ }
+ }
+ // There's a cpu for us, so we can run.
+ _g_.m.p.ptr().syscalltick++
+ // We need to cas the status and scan before resuming...
+ casgstatus(_g_, _Gsyscall, _Grunning)
+
+ // Garbage collector isn't running (since we are),
+ // so okay to clear syscallsp.
+ _g_.syscallsp = 0
+ _g_.m.locks--
+ if _g_.preempt {
+ // restore the preemption request in case we've cleared it in newstack
+ _g_.stackguard0 = stackPreempt
+ } else {
+ // otherwise restore the real _StackGuard; we spoiled it in entersyscall/entersyscallblock
+ _g_.stackguard0 = _g_.stack.lo + _StackGuard
+ }
+ _g_.throwsplit = false
+
+ if sched.disable.user && !schedEnabled(_g_) {
+ // Scheduling of this goroutine is disabled.
+ Gosched()
+ }
+
+ return
+ }
+
+ _g_.sysexitticks = 0
+ if trace.enabled {
+ // Wait till traceGoSysBlock event is emitted.
+ // This ensures consistency of the trace (the goroutine is started after it is blocked).
+ for oldp != nil && oldp.syscalltick == _g_.m.syscalltick {
+ osyield()
+ }
+ // We can't trace syscall exit right now because we don't have a P.
+ // Tracing code can invoke write barriers that cannot run without a P.
+ // So instead we remember the syscall exit time and emit the event
+ // in execute when we have a P.
+ _g_.sysexitticks = cputicks()
+ }
+
+ _g_.m.locks--
+
+ // Call the scheduler.
+ mcall(exitsyscall0)
+
+ // Scheduler returned, so we're allowed to run now.
+ // Delete the syscallsp information that we left for
+ // the garbage collector during the system call.
+ // Must wait until now because until gosched returns
+ // we don't know for sure that the garbage collector
+ // is not running.
+ _g_.syscallsp = 0
+ _g_.m.p.ptr().syscalltick++
+ _g_.throwsplit = false
+}
+
+//go:nosplit
+func exitsyscallfast(oldp *p) bool {
+ _g_ := getg()
+
+ // Freezetheworld sets stopwait but does not retake P's.
+ if sched.stopwait == freezeStopWait {
+ return false
+ }
+
+ // Try to re-acquire the last P.
+ if oldp != nil && oldp.status == _Psyscall && atomic.Cas(&oldp.status, _Psyscall, _Pidle) {
+ // There's a cpu for us, so we can run.
+ wirep(oldp)
+ exitsyscallfast_reacquired()
+ return true
+ }
+
+ // Try to get any other idle P.
+ if sched.pidle != 0 {
+ var ok bool
+ systemstack(func() {
+ ok = exitsyscallfast_pidle()
+ if ok && trace.enabled {
+ if oldp != nil {
+ // Wait till traceGoSysBlock event is emitted.
+ // This ensures consistency of the trace (the goroutine is started after it is blocked).
+ for oldp.syscalltick == _g_.m.syscalltick {
+ osyield()
+ }
+ }
+ traceGoSysExit(0)
+ }
+ })
+ if ok {
+ return true
+ }
+ }
+ return false
+}
+
+// exitsyscallfast_reacquired is the exitsyscall path on which this G
+// has successfully reacquired the P it was running on before the
+// syscall.
+//
+//go:nosplit
+func exitsyscallfast_reacquired() {
+ _g_ := getg()
+ if _g_.m.syscalltick != _g_.m.p.ptr().syscalltick {
+ if trace.enabled {
+			// The p was retaken and then entered a syscall again (since _g_.m.syscalltick has changed).
+ // traceGoSysBlock for this syscall was already emitted,
+ // but here we effectively retake the p from the new syscall running on the same p.
+ systemstack(func() {
+ // Denote blocking of the new syscall.
+ traceGoSysBlock(_g_.m.p.ptr())
+ // Denote completion of the current syscall.
+ traceGoSysExit(0)
+ })
+ }
+ _g_.m.p.ptr().syscalltick++
+ }
+}
+
+func exitsyscallfast_pidle() bool {
+ lock(&sched.lock)
+ _p_ := pidleget()
+ if _p_ != nil && atomic.Load(&sched.sysmonwait) != 0 {
+ atomic.Store(&sched.sysmonwait, 0)
+ notewakeup(&sched.sysmonnote)
+ }
+ unlock(&sched.lock)
+ if _p_ != nil {
+ acquirep(_p_)
+ return true
+ }
+ return false
+}
+
+// exitsyscall slow path on g0.
+// Failed to acquire P, enqueue gp as runnable.
+//
+//go:nowritebarrierrec
+func exitsyscall0(gp *g) {
+ _g_ := getg()
+
+ casgstatus(gp, _Gsyscall, _Grunnable)
+ dropg()
+ lock(&sched.lock)
+ var _p_ *p
+ if schedEnabled(_g_) {
+ _p_ = pidleget()
+ }
+ if _p_ == nil {
+ globrunqput(gp)
+ } else if atomic.Load(&sched.sysmonwait) != 0 {
+ atomic.Store(&sched.sysmonwait, 0)
+ notewakeup(&sched.sysmonnote)
+ }
+ unlock(&sched.lock)
+ if _p_ != nil {
+ acquirep(_p_)
+ execute(gp, false) // Never returns.
+ }
+ if _g_.m.lockedg != 0 {
+ // Wait until another thread schedules gp and so m again.
+ stoplockedm()
+ execute(gp, false) // Never returns.
+ }
+ stopm()
+ schedule() // Never returns.
+}
+
+func beforefork() {
+ gp := getg().m.curg
+
+ // Block signals during a fork, so that the child does not run
+ // a signal handler before exec if a signal is sent to the process
+ // group. See issue #18600.
+ gp.m.locks++
+ sigsave(&gp.m.sigmask)
+ sigblock(false)
+
+ // This function is called before fork in syscall package.
+ // Code between fork and exec must not allocate memory nor even try to grow stack.
+ // Here we spoil g->_StackGuard to reliably detect any attempts to grow stack.
+ // runtime_AfterFork will undo this in parent process, but not in child.
+ gp.stackguard0 = stackFork
+}
+
+// Called from syscall package before fork.
+//go:linkname syscall_runtime_BeforeFork syscall.runtime_BeforeFork
+//go:nosplit
+func syscall_runtime_BeforeFork() {
+ systemstack(beforefork)
+}
+
+func afterfork() {
+ gp := getg().m.curg
+
+ // See the comments in beforefork.
+ gp.stackguard0 = gp.stack.lo + _StackGuard
+
+ msigrestore(gp.m.sigmask)
+
+ gp.m.locks--
+}
+
+// Called from syscall package after fork in parent.
+//go:linkname syscall_runtime_AfterFork syscall.runtime_AfterFork
+//go:nosplit
+func syscall_runtime_AfterFork() {
+ systemstack(afterfork)
+}
+
+// inForkedChild is true while manipulating signals in the child process.
+// This is used to avoid calling libc functions in case we are using vfork.
+var inForkedChild bool
+
+// Called from syscall package after fork in child.
+// It resets non-sigignored signals to the default handler, and
+// restores the signal mask in preparation for the exec.
+//
+// Because this might be called during a vfork, and therefore may be
+// temporarily sharing address space with the parent process, this must
+// not change any global variables or call into C code that may do so.
+//
+//go:linkname syscall_runtime_AfterForkInChild syscall.runtime_AfterForkInChild
+//go:nosplit
+//go:nowritebarrierrec
+func syscall_runtime_AfterForkInChild() {
+ // It's OK to change the global variable inForkedChild here
+ // because we are going to change it back. There is no race here,
+ // because if we are sharing address space with the parent process,
+ // then the parent process can not be running concurrently.
+ inForkedChild = true
+
+ clearSignalHandlers()
+
+ // When we are the child we are the only thread running,
+ // so we know that nothing else has changed gp.m.sigmask.
+ msigrestore(getg().m.sigmask)
+
+ inForkedChild = false
+}
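+
+// By way of example (user-level code, not runtime code): these hooks are not
+// called directly by programs. On Unix systems they are reached through the
+// syscall package whenever a child process is spawned, e.g. via os/exec:
+//
+//	package main
+//
+//	import (
+//		"fmt"
+//		"os/exec"
+//	)
+//
+//	func main() {
+//		// exec.Command forks and execs under the hood, passing through
+//		// syscall.runtime_BeforeFork, runtime_AfterFork, and
+//		// runtime_AfterForkInChild along the way.
+//		out, err := exec.Command("echo", "hello").Output()
+//		if err != nil {
+//			fmt.Println("exec failed:", err)
+//			return
+//		}
+//		fmt.Printf("%s", out)
+//	}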
+
+// pendingPreemptSignals is the number of preemption signals
+// that have been sent but not received. This is only used on Darwin.
+// For #41702.
+var pendingPreemptSignals uint32
+
+// Called from syscall package before Exec.
+//go:linkname syscall_runtime_BeforeExec syscall.runtime_BeforeExec
+func syscall_runtime_BeforeExec() {
+ // Prevent thread creation during exec.
+ execLock.lock()
+
+ // On Darwin, wait for all pending preemption signals to
+ // be received. See issue #41702.
+ if GOOS == "darwin" || GOOS == "ios" {
+ for int32(atomic.Load(&pendingPreemptSignals)) > 0 {
+ osyield()
+ }
+ }
+}
+
+// Called from syscall package after Exec.
+//go:linkname syscall_runtime_AfterExec syscall.runtime_AfterExec
+func syscall_runtime_AfterExec() {
+ execLock.unlock()
+}
+
+// Allocate a new g, with a stack big enough for stacksize bytes.
+func malg(stacksize int32) *g {
+ newg := new(g)
+ if stacksize >= 0 {
+ stacksize = round2(_StackSystem + stacksize)
+ systemstack(func() {
+ newg.stack = stackalloc(uint32(stacksize))
+ })
+ newg.stackguard0 = newg.stack.lo + _StackGuard
+ newg.stackguard1 = ^uintptr(0)
+ // Clear the bottom word of the stack. We record g
+ // there on gsignal stack during VDSO on ARM and ARM64.
+ *(*uintptr)(unsafe.Pointer(newg.stack.lo)) = 0
+ }
+ return newg
+}
+
+// Create a new g running fn with siz bytes of arguments.
+// Put it on the queue of g's waiting to run.
+// The compiler turns a go statement into a call to this.
+//
+// The stack layout of this call is unusual: it assumes that the
+// arguments to pass to fn are on the stack sequentially immediately
+// after &fn. Hence, they are logically part of newproc's argument
+// frame, even though they don't appear in its signature (and can't
+// because their types differ between call sites).
+//
+// This must be nosplit because this stack layout means there are
+// untyped arguments in newproc's argument frame. Stack copies won't
+// be able to adjust them and stack splits won't be able to copy them.
+//
+//go:nosplit
+func newproc(siz int32, fn *funcval) {
+ argp := add(unsafe.Pointer(&fn), sys.PtrSize)
+ gp := getg()
+ pc := getcallerpc()
+ systemstack(func() {
+ newg := newproc1(fn, argp, siz, gp, pc)
+
+ _p_ := getg().m.p.ptr()
+ runqput(_p_, newg, true)
+
+ if mainStarted {
+ wakep()
+ }
+ })
+}
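+
+// As an illustration (not runtime code), a go statement such as
+//
+//	func hello(a, b int) {
+//		println(a + b)
+//	}
+//
+//	func main() {
+//		go hello(1, 2)
+//	}
+//
+// is lowered by the compiler into a call to newproc with fn describing hello
+// and, on a 64-bit system, siz = 16: the two int arguments are copied onto the
+// stack immediately after &fn and end up in the new goroutine's initial frame
+// via newproc1 below.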
+
+// Create a new g in state _Grunnable, starting at fn, with narg bytes
+// of arguments starting at argp. callerpc is the address of the go
+// statement that created this. The caller is responsible for adding
+// the new g to the scheduler.
+//
+// This must run on the system stack because it's the continuation of
+// newproc, which cannot split the stack.
+//
+//go:systemstack
+func newproc1(fn *funcval, argp unsafe.Pointer, narg int32, callergp *g, callerpc uintptr) *g {
+ _g_ := getg()
+
+ if fn == nil {
+ _g_.m.throwing = -1 // do not dump full stacks
+ throw("go of nil func value")
+ }
+ acquirem() // disable preemption because it can be holding p in a local var
+ siz := narg
+ siz = (siz + 7) &^ 7
+
+ // We could allocate a larger initial stack if necessary.
+ // Not worth it: this is almost always an error.
+ // 4*sizeof(uintreg): extra space added below
+ // sizeof(uintreg): caller's LR (arm) or return address (x86, in gostartcall).
+ if siz >= _StackMin-4*sys.RegSize-sys.RegSize {
+ throw("newproc: function arguments too large for new goroutine")
+ }
+
+ _p_ := _g_.m.p.ptr()
+ newg := gfget(_p_)
+ if newg == nil {
+ newg = malg(_StackMin)
+ casgstatus(newg, _Gidle, _Gdead)
+ allgadd(newg) // publishes with a g->status of Gdead so GC scanner doesn't look at uninitialized stack.
+ }
+ if newg.stack.hi == 0 {
+ throw("newproc1: newg missing stack")
+ }
+
+ if readgstatus(newg) != _Gdead {
+ throw("newproc1: new g is not Gdead")
+ }
+
+ totalSize := 4*sys.RegSize + uintptr(siz) + sys.MinFrameSize // extra space in case of reads slightly beyond frame
+ totalSize += -totalSize & (sys.SpAlign - 1) // align to spAlign
+ sp := newg.stack.hi - totalSize
+ spArg := sp
+ if usesLR {
+ // caller's LR
+ *(*uintptr)(unsafe.Pointer(sp)) = 0
+ prepGoExitFrame(sp)
+ spArg += sys.MinFrameSize
+ }
+ if narg > 0 {
+ memmove(unsafe.Pointer(spArg), argp, uintptr(narg))
+ // This is a stack-to-stack copy. If write barriers
+ // are enabled and the source stack is grey (the
+ // destination is always black), then perform a
+ // barrier copy. We do this *after* the memmove
+ // because the destination stack may have garbage on
+ // it.
+ if writeBarrier.needed && !_g_.m.curg.gcscandone {
+ f := findfunc(fn.fn)
+ stkmap := (*stackmap)(funcdata(f, _FUNCDATA_ArgsPointerMaps))
+ if stkmap.nbit > 0 {
+ // We're in the prologue, so it's always stack map index 0.
+ bv := stackmapdata(stkmap, 0)
+ bulkBarrierBitmap(spArg, spArg, uintptr(bv.n)*sys.PtrSize, 0, bv.bytedata)
+ }
+ }
+ }
+
+ memclrNoHeapPointers(unsafe.Pointer(&newg.sched), unsafe.Sizeof(newg.sched))
+ newg.sched.sp = sp
+ newg.stktopsp = sp
+ newg.sched.pc = funcPC(goexit) + sys.PCQuantum // +PCQuantum so that previous instruction is in same function
+ newg.sched.g = guintptr(unsafe.Pointer(newg))
+ gostartcallfn(&newg.sched, fn)
+ newg.gopc = callerpc
+ newg.ancestors = saveAncestors(callergp)
+ newg.startpc = fn.fn
+ if _g_.m.curg != nil {
+ newg.labels = _g_.m.curg.labels
+ }
+ if isSystemGoroutine(newg, false) {
+ atomic.Xadd(&sched.ngsys, +1)
+ }
+ casgstatus(newg, _Gdead, _Grunnable)
+
+ if _p_.goidcache == _p_.goidcacheend {
+ // Sched.goidgen is the last allocated id,
+ // this batch must be [sched.goidgen+1, sched.goidgen+GoidCacheBatch].
+ // At startup sched.goidgen=0, so main goroutine receives goid=1.
+ _p_.goidcache = atomic.Xadd64(&sched.goidgen, _GoidCacheBatch)
+ _p_.goidcache -= _GoidCacheBatch - 1
+ _p_.goidcacheend = _p_.goidcache + _GoidCacheBatch
+ }
+ newg.goid = int64(_p_.goidcache)
+ _p_.goidcache++
+ if raceenabled {
+ newg.racectx = racegostart(callerpc)
+ }
+ if trace.enabled {
+ traceGoCreate(newg, newg.startpc)
+ }
+ releasem(_g_.m)
+
+ return newg
+}
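+
+// Worked example of the goid cache above, assuming _GoidCacheBatch is 16:
+// at startup sched.goidgen is 0 and every P's cache is empty, so the first P
+// to create a goroutine does Xadd64(&sched.goidgen, 16), which returns 16.
+// Its cache becomes goidcache = 16 - (16-1) = 1 and goidcacheend = 1 + 16 = 17,
+// so it hands out goids 1 through 16; the main goroutine therefore gets goid 1,
+// matching the comment in newproc1.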
+
+// saveAncestors copies previous ancestors of the given caller g and
+// includes info for the current caller in a new set of tracebacks for
+// a g being created.
+func saveAncestors(callergp *g) *[]ancestorInfo {
+ // Copy all prior info, except for the root goroutine (goid 0).
+ if debug.tracebackancestors <= 0 || callergp.goid == 0 {
+ return nil
+ }
+ var callerAncestors []ancestorInfo
+ if callergp.ancestors != nil {
+ callerAncestors = *callergp.ancestors
+ }
+ n := int32(len(callerAncestors)) + 1
+ if n > debug.tracebackancestors {
+ n = debug.tracebackancestors
+ }
+ ancestors := make([]ancestorInfo, n)
+ copy(ancestors[1:], callerAncestors)
+
+ var pcs [_TracebackMaxFrames]uintptr
+ npcs := gcallers(callergp, 0, pcs[:])
+ ipcs := make([]uintptr, npcs)
+ copy(ipcs, pcs[:])
+ ancestors[0] = ancestorInfo{
+ pcs: ipcs,
+ goid: callergp.goid,
+ gopc: callergp.gopc,
+ }
+
+ ancestorsp := new([]ancestorInfo)
+ *ancestorsp = ancestors
+ return ancestorsp
+}
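+
+// For illustration (not runtime code): ancestor tracking is controlled by the
+// tracebackancestors GODEBUG setting. A crashing program such as
+//
+//	package main
+//
+//	func main() {
+//		block := make(chan struct{})
+//		go func() {
+//			go func() {
+//				panic("boom")
+//			}()
+//		}()
+//		<-block
+//	}
+//
+// run as GODEBUG=tracebackancestors=5 ./prog prints, in addition to the usual
+// traceback, the creation stacks of up to 5 ancestor goroutines recorded by
+// saveAncestors above.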
+
+// Put on gfree list.
+// If local list is too long, transfer a batch to the global list.
+func gfput(_p_ *p, gp *g) {
+ if readgstatus(gp) != _Gdead {
+ throw("gfput: bad status (not Gdead)")
+ }
+
+ stksize := gp.stack.hi - gp.stack.lo
+
+ if stksize != _FixedStack {
+ // non-standard stack size - free it.
+ stackfree(gp.stack)
+ gp.stack.lo = 0
+ gp.stack.hi = 0
+ gp.stackguard0 = 0
+ }
+
+ _p_.gFree.push(gp)
+ _p_.gFree.n++
+ if _p_.gFree.n >= 64 {
+ lock(&sched.gFree.lock)
+ for _p_.gFree.n >= 32 {
+ _p_.gFree.n--
+ gp = _p_.gFree.pop()
+ if gp.stack.lo == 0 {
+ sched.gFree.noStack.push(gp)
+ } else {
+ sched.gFree.stack.push(gp)
+ }
+ sched.gFree.n++
+ }
+ unlock(&sched.gFree.lock)
+ }
+}
+
+// Get from gfree list.
+// If local list is empty, grab a batch from global list.
+func gfget(_p_ *p) *g {
+retry:
+ if _p_.gFree.empty() && (!sched.gFree.stack.empty() || !sched.gFree.noStack.empty()) {
+ lock(&sched.gFree.lock)
+ // Move a batch of free Gs to the P.
+ for _p_.gFree.n < 32 {
+ // Prefer Gs with stacks.
+ gp := sched.gFree.stack.pop()
+ if gp == nil {
+ gp = sched.gFree.noStack.pop()
+ if gp == nil {
+ break
+ }
+ }
+ sched.gFree.n--
+ _p_.gFree.push(gp)
+ _p_.gFree.n++
+ }
+ unlock(&sched.gFree.lock)
+ goto retry
+ }
+ gp := _p_.gFree.pop()
+ if gp == nil {
+ return nil
+ }
+ _p_.gFree.n--
+ if gp.stack.lo == 0 {
+ // Stack was deallocated in gfput. Allocate a new one.
+ systemstack(func() {
+ gp.stack = stackalloc(_FixedStack)
+ })
+ gp.stackguard0 = gp.stack.lo + _StackGuard
+ } else {
+ if raceenabled {
+ racemalloc(unsafe.Pointer(gp.stack.lo), gp.stack.hi-gp.stack.lo)
+ }
+ if msanenabled {
+ msanmalloc(unsafe.Pointer(gp.stack.lo), gp.stack.hi-gp.stack.lo)
+ }
+ }
+ return gp
+}
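+
+// gfput and gfget above form a common two-level cache: a small per-P list that
+// is refilled from, and spilled to, a locked global list in batches. A rough
+// sketch of the same pattern in ordinary Go (hypothetical item/cache/shared
+// types; requires import "sync"):
+//
+//	type item struct{ next *item }
+//
+//	type cache struct { // per-worker, accessed without locks
+//		head *item
+//		n    int
+//	}
+//
+//	type shared struct { // global
+//		mu   sync.Mutex
+//		head *item
+//		n    int
+//	}
+//
+//	// put mirrors gfput: push locally, spill half once the local list hits 64.
+//	func (c *cache) put(s *shared, it *item) {
+//		it.next = c.head
+//		c.head = it
+//		c.n++
+//		if c.n < 64 {
+//			return
+//		}
+//		s.mu.Lock()
+//		for c.n >= 32 {
+//			top := c.head
+//			c.head = top.next
+//			c.n--
+//			top.next = s.head
+//			s.head = top
+//			s.n++
+//		}
+//		s.mu.Unlock()
+//	}
+//
+// The get side is symmetric: when the local list is empty it refills up to 32
+// items from the shared list before popping, just as gfget does.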
+
+// Purge all cached G's from gfree list to the global list.
+func gfpurge(_p_ *p) {
+ lock(&sched.gFree.lock)
+ for !_p_.gFree.empty() {
+ gp := _p_.gFree.pop()
+ _p_.gFree.n--
+ if gp.stack.lo == 0 {
+ sched.gFree.noStack.push(gp)
+ } else {
+ sched.gFree.stack.push(gp)
+ }
+ sched.gFree.n++
+ }
+ unlock(&sched.gFree.lock)
+}
+
+// Breakpoint executes a breakpoint trap.
+func Breakpoint() {
+ breakpoint()
+}
+
+// dolockOSThread is called by LockOSThread and lockOSThread below
+// after they modify m.locked. Do not allow preemption during this call,
+// or else the m might be different in this function than in the caller.
+//go:nosplit
+func dolockOSThread() {
+ if GOARCH == "wasm" {
+ return // no threads on wasm yet
+ }
+ _g_ := getg()
+ _g_.m.lockedg.set(_g_)
+ _g_.lockedm.set(_g_.m)
+}
+
+//go:nosplit
+
+// LockOSThread wires the calling goroutine to its current operating system thread.
+// The calling goroutine will always execute in that thread,
+// and no other goroutine will execute in it,
+// until the calling goroutine has made as many calls to
+// UnlockOSThread as to LockOSThread.
+// If the calling goroutine exits without unlocking the thread,
+// the thread will be terminated.
+//
+// All init functions are run on the startup thread. Calling LockOSThread
+// from an init function will cause the main function to be invoked on
+// that thread.
+//
+// A goroutine should call LockOSThread before calling OS services or
+// non-Go library functions that depend on per-thread state.
+func LockOSThread() {
+ if atomic.Load(&newmHandoff.haveTemplateThread) == 0 && GOOS != "plan9" {
+ // If we need to start a new thread from the locked
+ // thread, we need the template thread. Start it now
+ // while we're in a known-good state.
+ startTemplateThread()
+ }
+ _g_ := getg()
+ _g_.m.lockedExt++
+ if _g_.m.lockedExt == 0 {
+ _g_.m.lockedExt--
+ panic("LockOSThread nesting overflow")
+ }
+ dolockOSThread()
+}
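+
+// For illustration (user-level code, not runtime code): the exported pair is
+// typically used to pin a goroutine that talks to a thread-affine C library or
+// OS API for its whole lifetime:
+//
+//	// renderLoop is a hypothetical worker; requires import "runtime".
+//	func renderLoop(jobs <-chan func()) {
+//		runtime.LockOSThread()
+//		defer runtime.UnlockOSThread()
+//		// Everything below runs on a single OS thread, as many GUI and GL
+//		// libraries require.
+//		for job := range jobs {
+//			job()
+//		}
+//	}
+//
+// Calls nest, so a library can lock and unlock around its own critical section
+// without disturbing a caller that has already locked the thread.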
+
+//go:nosplit
+func lockOSThread() {
+ getg().m.lockedInt++
+ dolockOSThread()
+}
+
+// dounlockOSThread is called by UnlockOSThread and unlockOSThread below
+// after they update m->locked. Do not allow preemption during this call,
+// or else the m might be different in this function than in the caller.
+//go:nosplit
+func dounlockOSThread() {
+ if GOARCH == "wasm" {
+ return // no threads on wasm yet
+ }
+ _g_ := getg()
+ if _g_.m.lockedInt != 0 || _g_.m.lockedExt != 0 {
+ return
+ }
+ _g_.m.lockedg = 0
+ _g_.lockedm = 0
+}
+
+//go:nosplit
+
+// UnlockOSThread undoes an earlier call to LockOSThread.
+// If this drops the number of active LockOSThread calls on the
+// calling goroutine to zero, it unwires the calling goroutine from
+// its fixed operating system thread.
+// If there are no active LockOSThread calls, this is a no-op.
+//
+// Before calling UnlockOSThread, the caller must ensure that the OS
+// thread is suitable for running other goroutines. If the caller made
+// any permanent changes to the state of the thread that would affect
+// other goroutines, it should not call this function and thus leave
+// the goroutine locked to the OS thread until the goroutine (and
+// hence the thread) exits.
+func UnlockOSThread() {
+ _g_ := getg()
+ if _g_.m.lockedExt == 0 {
+ return
+ }
+ _g_.m.lockedExt--
+ dounlockOSThread()
+}
+
+//go:nosplit
+func unlockOSThread() {
+ _g_ := getg()
+ if _g_.m.lockedInt == 0 {
+ systemstack(badunlockosthread)
+ }
+ _g_.m.lockedInt--
+ dounlockOSThread()
+}
+
+func badunlockosthread() {
+ throw("runtime: internal error: misuse of lockOSThread/unlockOSThread")
+}
+
+func gcount() int32 {
+ n := int32(atomic.Loaduintptr(&allglen)) - sched.gFree.n - int32(atomic.Load(&sched.ngsys))
+ for _, _p_ := range allp {
+ n -= _p_.gFree.n
+ }
+
+ // All these variables can be changed concurrently, so the result can be inconsistent.
+ // But at least the current goroutine is running.
+ if n < 1 {
+ n = 1
+ }
+ return n
+}
+
+func mcount() int32 {
+ return int32(sched.mnext - sched.nmfreed)
+}
+
+var prof struct {
+ signalLock uint32
+ hz int32
+}
+
+func _System() { _System() }
+func _ExternalCode() { _ExternalCode() }
+func _LostExternalCode() { _LostExternalCode() }
+func _GC() { _GC() }
+func _LostSIGPROFDuringAtomic64() { _LostSIGPROFDuringAtomic64() }
+func _VDSO() { _VDSO() }
+
+// Called if we receive a SIGPROF signal.
+// Called by the signal handler, may run during STW.
+//go:nowritebarrierrec
+func sigprof(pc, sp, lr uintptr, gp *g, mp *m) {
+ if prof.hz == 0 {
+ return
+ }
+
+ // If mp.profilehz is 0, then profiling is not enabled for this thread.
+ // We must check this to avoid a deadlock between setcpuprofilerate
+ // and the call to cpuprof.add, below.
+ if mp != nil && mp.profilehz == 0 {
+ return
+ }
+
+ // On mips{,le}/arm, 64bit atomics are emulated with spinlocks, in
+ // runtime/internal/atomic. If SIGPROF arrives while the program is inside
+ // the critical section, it creates a deadlock (when writing the sample).
+	// As a workaround, count such signals in cpuprof.lostAtomic and fold them
+	// into the profile later, when a SIGPROF is received from somewhere else
+	// (reported with _LostSIGPROFDuringAtomic64 as the pc).
+ if GOARCH == "mips" || GOARCH == "mipsle" || GOARCH == "arm" {
+ if f := findfunc(pc); f.valid() {
+ if hasPrefix(funcname(f), "runtime/internal/atomic") {
+ cpuprof.lostAtomic++
+ return
+ }
+ }
+ if GOARCH == "arm" && goarm < 7 && GOOS == "linux" && pc&0xffff0000 == 0xffff0000 {
+ // runtime/internal/atomic functions call into kernel
+ // helpers on arm < 7. See
+ // runtime/internal/atomic/sys_linux_arm.s.
+ cpuprof.lostAtomic++
+ return
+ }
+ }
+
+ // Profiling runs concurrently with GC, so it must not allocate.
+ // Set a trap in case the code does allocate.
+ // Note that on windows, one thread takes profiles of all the
+ // other threads, so mp is usually not getg().m.
+ // In fact mp may not even be stopped.
+ // See golang.org/issue/17165.
+ getg().m.mallocing++
+
+ // Define that a "user g" is a user-created goroutine, and a "system g"
+ // is one that is m->g0 or m->gsignal.
+ //
+ // We might be interrupted for profiling halfway through a
+ // goroutine switch. The switch involves updating three (or four) values:
+ // g, PC, SP, and (on arm) LR. The PC must be the last to be updated,
+ // because once it gets updated the new g is running.
+ //
+ // When switching from a user g to a system g, LR is not considered live,
+	// so the update only affects g, SP, and PC. Since PC must be last, the
+	// possible partial transitions in ordinary execution are (1) g alone is updated,
+ // (2) both g and SP are updated, and (3) SP alone is updated.
+ // If SP or g alone is updated, we can detect the partial transition by checking
+ // whether the SP is within g's stack bounds. (We could also require that SP
+ // be changed only after g, but the stack bounds check is needed by other
+ // cases, so there is no need to impose an additional requirement.)
+ //
+ // There is one exceptional transition to a system g, not in ordinary execution.
+ // When a signal arrives, the operating system starts the signal handler running
+ // with an updated PC and SP. The g is updated last, at the beginning of the
+ // handler. There are two reasons this is okay. First, until g is updated the
+ // g and SP do not match, so the stack bounds check detects the partial transition.
+ // Second, signal handlers currently run with signals disabled, so a profiling
+ // signal cannot arrive during the handler.
+ //
+ // When switching from a system g to a user g, there are three possibilities.
+ //
+ // First, it may be that the g switch has no PC update, because the SP
+ // either corresponds to a user g throughout (as in asmcgocall)
+ // or because it has been arranged to look like a user g frame
+ // (as in cgocallback). In this case, since the entire
+ // transition is a g+SP update, a partial transition updating just one of
+ // those will be detected by the stack bounds check.
+ //
+ // Second, when returning from a signal handler, the PC and SP updates
+ // are performed by the operating system in an atomic update, so the g
+ // update must be done before them. The stack bounds check detects
+ // the partial transition here, and (again) signal handlers run with signals
+ // disabled, so a profiling signal cannot arrive then anyway.
+ //
+ // Third, the common case: it may be that the switch updates g, SP, and PC
+	// separately. If the PC is within any of the functions that do this,
+	// we don't ask for a traceback. See the function setsSP for more about this.
+ //
+ // There is another apparently viable approach, recorded here in case
+ // the "PC within setsSP function" check turns out not to be usable.
+ // It would be possible to delay the update of either g or SP until immediately
+ // before the PC update instruction. Then, because of the stack bounds check,
+ // the only problematic interrupt point is just before that PC update instruction,
+ // and the sigprof handler can detect that instruction and simulate stepping past
+ // it in order to reach a consistent state. On ARM, the update of g must be made
+ // in two places (in R10 and also in a TLS slot), so the delayed update would
+ // need to be the SP update. The sigprof handler must read the instruction at
+ // the current PC and if it was the known instruction (for example, JMP BX or
+ // MOV R2, PC), use that other register in place of the PC value.
+ // The biggest drawback to this solution is that it requires that we can tell
+ // whether it's safe to read from the memory pointed at by PC.
+ // In a correct program, we can test PC == nil and otherwise read,
+ // but if a profiling signal happens at the instant that a program executes
+ // a bad jump (before the program manages to handle the resulting fault)
+ // the profiling handler could fault trying to read nonexistent memory.
+ //
+ // To recap, there are no constraints on the assembly being used for the
+ // transition. We simply require that g and SP match and that the PC is not
+ // in gogo.
+ traceback := true
+ if gp == nil || sp < gp.stack.lo || gp.stack.hi < sp || setsSP(pc) || (mp != nil && mp.vdsoSP != 0) {
+ traceback = false
+ }
+ var stk [maxCPUProfStack]uintptr
+ n := 0
+ if mp.ncgo > 0 && mp.curg != nil && mp.curg.syscallpc != 0 && mp.curg.syscallsp != 0 {
+ cgoOff := 0
+ // Check cgoCallersUse to make sure that we are not
+ // interrupting other code that is fiddling with
+ // cgoCallers. We are running in a signal handler
+ // with all signals blocked, so we don't have to worry
+ // about any other code interrupting us.
+ if atomic.Load(&mp.cgoCallersUse) == 0 && mp.cgoCallers != nil && mp.cgoCallers[0] != 0 {
+ for cgoOff < len(mp.cgoCallers) && mp.cgoCallers[cgoOff] != 0 {
+ cgoOff++
+ }
+ copy(stk[:], mp.cgoCallers[:cgoOff])
+ mp.cgoCallers[0] = 0
+ }
+
+ // Collect Go stack that leads to the cgo call.
+ n = gentraceback(mp.curg.syscallpc, mp.curg.syscallsp, 0, mp.curg, 0, &stk[cgoOff], len(stk)-cgoOff, nil, nil, 0)
+ if n > 0 {
+ n += cgoOff
+ }
+ } else if traceback {
+ n = gentraceback(pc, sp, lr, gp, 0, &stk[0], len(stk), nil, nil, _TraceTrap|_TraceJumpStack)
+ }
+
+ if n <= 0 {
+ // Normal traceback is impossible or has failed.
+ // See if it falls into several common cases.
+ n = 0
+ if usesLibcall() && mp.libcallg != 0 && mp.libcallpc != 0 && mp.libcallsp != 0 {
+ // Libcall, i.e. runtime syscall on windows.
+ // Collect Go stack that leads to the call.
+ n = gentraceback(mp.libcallpc, mp.libcallsp, 0, mp.libcallg.ptr(), 0, &stk[0], len(stk), nil, nil, 0)
+ }
+ if n == 0 && mp != nil && mp.vdsoSP != 0 {
+ n = gentraceback(mp.vdsoPC, mp.vdsoSP, 0, gp, 0, &stk[0], len(stk), nil, nil, _TraceTrap|_TraceJumpStack)
+ }
+ if n == 0 {
+ // If all of the above has failed, account it against abstract "System" or "GC".
+ n = 2
+ if inVDSOPage(pc) {
+ pc = funcPC(_VDSO) + sys.PCQuantum
+ } else if pc > firstmoduledata.etext {
+ // "ExternalCode" is better than "etext".
+ pc = funcPC(_ExternalCode) + sys.PCQuantum
+ }
+ stk[0] = pc
+ if mp.preemptoff != "" {
+ stk[1] = funcPC(_GC) + sys.PCQuantum
+ } else {
+ stk[1] = funcPC(_System) + sys.PCQuantum
+ }
+ }
+ }
+
+ if prof.hz != 0 {
+ cpuprof.add(gp, stk[:n])
+ }
+ getg().m.mallocing--
+}
+
+// If the signal handler receives a SIGPROF signal on a non-Go thread,
+// it tries to collect a traceback into sigprofCallers.
+// sigprofCallersUse is set to non-zero while sigprofCallers holds a traceback.
+var sigprofCallers cgoCallers
+var sigprofCallersUse uint32
+
+// sigprofNonGo is called if we receive a SIGPROF signal on a non-Go thread,
+// and the signal handler collected a stack trace in sigprofCallers.
+// When this is called, sigprofCallersUse will be non-zero.
+// g is nil, and what we can do is very limited.
+//go:nosplit
+//go:nowritebarrierrec
+func sigprofNonGo() {
+ if prof.hz != 0 {
+ n := 0
+ for n < len(sigprofCallers) && sigprofCallers[n] != 0 {
+ n++
+ }
+ cpuprof.addNonGo(sigprofCallers[:n])
+ }
+
+ atomic.Store(&sigprofCallersUse, 0)
+}
+
+// sigprofNonGoPC is called when a profiling signal arrived on a
+// non-Go thread and we have a single PC value, not a stack trace.
+// g is nil, and what we can do is very limited.
+//go:nosplit
+//go:nowritebarrierrec
+func sigprofNonGoPC(pc uintptr) {
+ if prof.hz != 0 {
+ stk := []uintptr{
+ pc,
+ funcPC(_ExternalCode) + sys.PCQuantum,
+ }
+ cpuprof.addNonGo(stk)
+ }
+}
+
+// setsSP reports whether a function will set the SP
+// to an absolute value. It is important that we don't
+// traceback when these are at the bottom of the stack,
+// since we can't be sure that we will find the caller.
+//
+// If the function is not on the bottom of the stack
+// we assume that it will have set it up so that traceback will be consistent,
+// either by being a traceback terminating function
+// or putting one on the stack at the right offset.
+func setsSP(pc uintptr) bool {
+ f := findfunc(pc)
+ if !f.valid() {
+ // couldn't find the function for this PC,
+ // so assume the worst and stop traceback
+ return true
+ }
+ switch f.funcID {
+ case funcID_gogo, funcID_systemstack, funcID_mcall, funcID_morestack:
+ return true
+ }
+ return false
+}
+
+// setcpuprofilerate sets the CPU profiling rate to hz times per second.
+// If hz <= 0, setcpuprofilerate turns off CPU profiling.
+func setcpuprofilerate(hz int32) {
+ // Force sane arguments.
+ if hz < 0 {
+ hz = 0
+ }
+
+ // Disable preemption, otherwise we can be rescheduled to another thread
+ // that has profiling enabled.
+ _g_ := getg()
+ _g_.m.locks++
+
+ // Stop profiler on this thread so that it is safe to lock prof.
+ // if a profiling signal came in while we had prof locked,
+ // it would deadlock.
+ setThreadCPUProfiler(0)
+
+ for !atomic.Cas(&prof.signalLock, 0, 1) {
+ osyield()
+ }
+ if prof.hz != hz {
+ setProcessCPUProfiler(hz)
+ prof.hz = hz
+ }
+ atomic.Store(&prof.signalLock, 0)
+
+ lock(&sched.lock)
+ sched.profilehz = hz
+ unlock(&sched.lock)
+
+ if hz != 0 {
+ setThreadCPUProfiler(hz)
+ }
+
+ _g_.m.locks--
+}
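+
+// For illustration (user-level code, not runtime code): programs normally
+// reach setcpuprofilerate through runtime.SetCPUProfileRate or, more commonly,
+// through runtime/pprof, which turns profiling on at its default 100 Hz rate
+// and off again when stopped (requires imports "log", "os", "runtime/pprof"):
+//
+//	f, err := os.Create("cpu.prof")
+//	if err != nil {
+//		log.Fatal(err)
+//	}
+//	defer f.Close()
+//	if err := pprof.StartCPUProfile(f); err != nil {
+//		log.Fatal(err)
+//	}
+//	defer pprof.StopCPUProfile()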
+
+// init initializes pp, which may be a freshly allocated p or a
+// previously destroyed p, and transitions it to status _Pgcstop.
+func (pp *p) init(id int32) {
+ pp.id = id
+ pp.status = _Pgcstop
+ pp.sudogcache = pp.sudogbuf[:0]
+ for i := range pp.deferpool {
+ pp.deferpool[i] = pp.deferpoolbuf[i][:0]
+ }
+ pp.wbBuf.reset()
+ if pp.mcache == nil {
+ if id == 0 {
+ if mcache0 == nil {
+ throw("missing mcache?")
+ }
+ // Use the bootstrap mcache0. Only one P will get
+ // mcache0: the one with ID 0.
+ pp.mcache = mcache0
+ } else {
+ pp.mcache = allocmcache()
+ }
+ }
+ if raceenabled && pp.raceprocctx == 0 {
+ if id == 0 {
+ pp.raceprocctx = raceprocctx0
+ raceprocctx0 = 0 // bootstrap
+ } else {
+ pp.raceprocctx = raceproccreate()
+ }
+ }
+ lockInit(&pp.timersLock, lockRankTimers)
+
+ // This P may get timers when it starts running. Set the mask here
+ // since the P may not go through pidleget (notably P 0 on startup).
+ timerpMask.set(id)
+ // Similarly, we may not go through pidleget before this P starts
+ // running if it is P 0 on startup.
+ idlepMask.clear(id)
+}
+
+// destroy releases all of the resources associated with pp and
+// transitions it to status _Pdead.
+//
+// sched.lock must be held and the world must be stopped.
+func (pp *p) destroy() {
+ assertLockHeld(&sched.lock)
+ assertWorldStopped()
+
+ // Move all runnable goroutines to the global queue
+ for pp.runqhead != pp.runqtail {
+ // Pop from tail of local queue
+ pp.runqtail--
+ gp := pp.runq[pp.runqtail%uint32(len(pp.runq))].ptr()
+ // Push onto head of global queue
+ globrunqputhead(gp)
+ }
+ if pp.runnext != 0 {
+ globrunqputhead(pp.runnext.ptr())
+ pp.runnext = 0
+ }
+ if len(pp.timers) > 0 {
+ plocal := getg().m.p.ptr()
+ // The world is stopped, but we acquire timersLock to
+ // protect against sysmon calling timeSleepUntil.
+ // This is the only case where we hold the timersLock of
+ // more than one P, so there are no deadlock concerns.
+ lock(&plocal.timersLock)
+ lock(&pp.timersLock)
+ moveTimers(plocal, pp.timers)
+ pp.timers = nil
+ pp.numTimers = 0
+ pp.deletedTimers = 0
+ atomic.Store64(&pp.timer0When, 0)
+ unlock(&pp.timersLock)
+ unlock(&plocal.timersLock)
+ }
+ // Flush p's write barrier buffer.
+ if gcphase != _GCoff {
+ wbBufFlush1(pp)
+ pp.gcw.dispose()
+ }
+ for i := range pp.sudogbuf {
+ pp.sudogbuf[i] = nil
+ }
+ pp.sudogcache = pp.sudogbuf[:0]
+ for i := range pp.deferpool {
+ for j := range pp.deferpoolbuf[i] {
+ pp.deferpoolbuf[i][j] = nil
+ }
+ pp.deferpool[i] = pp.deferpoolbuf[i][:0]
+ }
+ systemstack(func() {
+ for i := 0; i < pp.mspancache.len; i++ {
+ // Safe to call since the world is stopped.
+ mheap_.spanalloc.free(unsafe.Pointer(pp.mspancache.buf[i]))
+ }
+ pp.mspancache.len = 0
+ lock(&mheap_.lock)
+ pp.pcache.flush(&mheap_.pages)
+ unlock(&mheap_.lock)
+ })
+ freemcache(pp.mcache)
+ pp.mcache = nil
+ gfpurge(pp)
+ traceProcFree(pp)
+ if raceenabled {
+ if pp.timerRaceCtx != 0 {
+ // The race detector code uses a callback to fetch
+ // the proc context, so arrange for that callback
+ // to see the right thing.
+ // This hack only works because we are the only
+ // thread running.
+ mp := getg().m
+ phold := mp.p.ptr()
+ mp.p.set(pp)
+
+ racectxend(pp.timerRaceCtx)
+ pp.timerRaceCtx = 0
+
+ mp.p.set(phold)
+ }
+ raceprocdestroy(pp.raceprocctx)
+ pp.raceprocctx = 0
+ }
+ pp.gcAssistTime = 0
+ pp.status = _Pdead
+}
+
+// Change number of processors.
+//
+// sched.lock must be held, and the world must be stopped.
+//
+// gcworkbufs must not be being modified by either the GC or the write barrier
+// code, so the GC must not be running if the number of Ps actually changes.
+//
+// Returns the list of Ps with local work; they need to be scheduled by the caller.
+func procresize(nprocs int32) *p {
+ assertLockHeld(&sched.lock)
+ assertWorldStopped()
+
+ old := gomaxprocs
+ if old < 0 || nprocs <= 0 {
+ throw("procresize: invalid arg")
+ }
+ if trace.enabled {
+ traceGomaxprocs(nprocs)
+ }
+
+ // update statistics
+ now := nanotime()
+ if sched.procresizetime != 0 {
+ sched.totaltime += int64(old) * (now - sched.procresizetime)
+ }
+ sched.procresizetime = now
+
+ maskWords := (nprocs + 31) / 32
+
+ // Grow allp if necessary.
+ if nprocs > int32(len(allp)) {
+ // Synchronize with retake, which could be running
+ // concurrently since it doesn't run on a P.
+ lock(&allpLock)
+ if nprocs <= int32(cap(allp)) {
+ allp = allp[:nprocs]
+ } else {
+ nallp := make([]*p, nprocs)
+ // Copy everything up to allp's cap so we
+ // never lose old allocated Ps.
+ copy(nallp, allp[:cap(allp)])
+ allp = nallp
+ }
+
+ if maskWords <= int32(cap(idlepMask)) {
+ idlepMask = idlepMask[:maskWords]
+ timerpMask = timerpMask[:maskWords]
+ } else {
+ nidlepMask := make([]uint32, maskWords)
+ // No need to copy beyond len, old Ps are irrelevant.
+ copy(nidlepMask, idlepMask)
+ idlepMask = nidlepMask
+
+ ntimerpMask := make([]uint32, maskWords)
+ copy(ntimerpMask, timerpMask)
+ timerpMask = ntimerpMask
+ }
+ unlock(&allpLock)
+ }
+
+ // initialize new P's
+ for i := old; i < nprocs; i++ {
+ pp := allp[i]
+ if pp == nil {
+ pp = new(p)
+ }
+ pp.init(i)
+ atomicstorep(unsafe.Pointer(&allp[i]), unsafe.Pointer(pp))
+ }
+
+ _g_ := getg()
+ if _g_.m.p != 0 && _g_.m.p.ptr().id < nprocs {
+ // continue to use the current P
+ _g_.m.p.ptr().status = _Prunning
+ _g_.m.p.ptr().mcache.prepareForSweep()
+ } else {
+ // release the current P and acquire allp[0].
+ //
+ // We must do this before destroying our current P
+ // because p.destroy itself has write barriers, so we
+ // need to do that from a valid P.
+ if _g_.m.p != 0 {
+ if trace.enabled {
+ // Pretend that we were descheduled
+ // and then scheduled again to keep
+ // the trace sane.
+ traceGoSched()
+ traceProcStop(_g_.m.p.ptr())
+ }
+ _g_.m.p.ptr().m = 0
+ }
+ _g_.m.p = 0
+ p := allp[0]
+ p.m = 0
+ p.status = _Pidle
+ acquirep(p)
+ if trace.enabled {
+ traceGoStart()
+ }
+ }
+
+ // g.m.p is now set, so we no longer need mcache0 for bootstrapping.
+ mcache0 = nil
+
+ // release resources from unused P's
+ for i := nprocs; i < old; i++ {
+ p := allp[i]
+ p.destroy()
+ // can't free P itself because it can be referenced by an M in syscall
+ }
+
+ // Trim allp.
+ if int32(len(allp)) != nprocs {
+ lock(&allpLock)
+ allp = allp[:nprocs]
+ idlepMask = idlepMask[:maskWords]
+ timerpMask = timerpMask[:maskWords]
+ unlock(&allpLock)
+ }
+
+ var runnablePs *p
+ for i := nprocs - 1; i >= 0; i-- {
+ p := allp[i]
+ if _g_.m.p.ptr() == p {
+ continue
+ }
+ p.status = _Pidle
+ if runqempty(p) {
+ pidleput(p)
+ } else {
+ p.m.set(mget())
+ p.link.set(runnablePs)
+ runnablePs = p
+ }
+ }
+ stealOrder.reset(uint32(nprocs))
+ var int32p *int32 = &gomaxprocs // make compiler check that gomaxprocs is an int32
+ atomic.Store((*uint32)(unsafe.Pointer(int32p)), uint32(nprocs))
+ return runnablePs
+}
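+
+// For illustration (user-level code, not runtime code): outside of program
+// startup, procresize runs when the number of Ps is changed explicitly with
+// runtime.GOMAXPROCS, which stops the world, resizes allp, and restarts it:
+//
+//	old := runtime.GOMAXPROCS(4) // run with 4 Ps from now on
+//	fmt.Println("previous GOMAXPROCS:", old)
+//	runtime.GOMAXPROCS(old)      // restore the original setting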
+
+// Associate p and the current m.
+//
+// This function is allowed to have write barriers even if the caller
+// isn't because it immediately acquires _p_.
+//
+//go:yeswritebarrierrec
+func acquirep(_p_ *p) {
+ // Do the part that isn't allowed to have write barriers.
+ wirep(_p_)
+
+ // Have p; write barriers now allowed.
+
+ // Perform deferred mcache flush before this P can allocate
+ // from a potentially stale mcache.
+ _p_.mcache.prepareForSweep()
+
+ if trace.enabled {
+ traceProcStart()
+ }
+}
+
+// wirep is the first step of acquirep, which actually associates the
+// current M to _p_. This is broken out so we can disallow write
+// barriers for this part, since we don't yet have a P.
+//
+//go:nowritebarrierrec
+//go:nosplit
+func wirep(_p_ *p) {
+ _g_ := getg()
+
+ if _g_.m.p != 0 {
+ throw("wirep: already in go")
+ }
+ if _p_.m != 0 || _p_.status != _Pidle {
+ id := int64(0)
+ if _p_.m != 0 {
+ id = _p_.m.ptr().id
+ }
+ print("wirep: p->m=", _p_.m, "(", id, ") p->status=", _p_.status, "\n")
+ throw("wirep: invalid p state")
+ }
+ _g_.m.p.set(_p_)
+ _p_.m.set(_g_.m)
+ _p_.status = _Prunning
+}
+
+// Disassociate p and the current m.
+func releasep() *p {
+ _g_ := getg()
+
+ if _g_.m.p == 0 {
+ throw("releasep: invalid arg")
+ }
+ _p_ := _g_.m.p.ptr()
+ if _p_.m.ptr() != _g_.m || _p_.status != _Prunning {
+ print("releasep: m=", _g_.m, " m->p=", _g_.m.p.ptr(), " p->m=", hex(_p_.m), " p->status=", _p_.status, "\n")
+ throw("releasep: invalid p state")
+ }
+ if trace.enabled {
+ traceProcStop(_g_.m.p.ptr())
+ }
+ _g_.m.p = 0
+ _p_.m = 0
+ _p_.status = _Pidle
+ return _p_
+}
+
+func incidlelocked(v int32) {
+ lock(&sched.lock)
+ sched.nmidlelocked += v
+ if v > 0 {
+ checkdead()
+ }
+ unlock(&sched.lock)
+}
+
+// Check for a deadlock situation.
+// The check is based on the number of running M's; if it is 0, we have a deadlock.
+// sched.lock must be held.
+func checkdead() {
+ assertLockHeld(&sched.lock)
+
+ // For -buildmode=c-shared or -buildmode=c-archive it's OK if
+ // there are no running goroutines. The calling program is
+ // assumed to be running.
+ if islibrary || isarchive {
+ return
+ }
+
+ // If we are dying because of a signal caught on an already idle thread,
+ // freezetheworld will cause all running threads to block.
+ // And runtime will essentially enter into deadlock state,
+ // except that there is a thread that will call exit soon.
+ if panicking > 0 {
+ return
+ }
+
+ // If we are not running under cgo, but we have an extra M then account
+ // for it. (It is possible to have an extra M on Windows without cgo to
+ // accommodate callbacks created by syscall.NewCallback. See issue #6751
+ // for details.)
+ var run0 int32
+ if !iscgo && cgoHasExtraM {
+ mp := lockextra(true)
+ haveExtraM := extraMCount > 0
+ unlockextra(mp)
+ if haveExtraM {
+ run0 = 1
+ }
+ }
+
+ run := mcount() - sched.nmidle - sched.nmidlelocked - sched.nmsys
+ if run > run0 {
+ return
+ }
+ if run < 0 {
+ print("runtime: checkdead: nmidle=", sched.nmidle, " nmidlelocked=", sched.nmidlelocked, " mcount=", mcount(), " nmsys=", sched.nmsys, "\n")
+ throw("checkdead: inconsistent counts")
+ }
+
+ grunning := 0
+ lock(&allglock)
+ for i := 0; i < len(allgs); i++ {
+ gp := allgs[i]
+ if isSystemGoroutine(gp, false) {
+ continue
+ }
+ s := readgstatus(gp)
+ switch s &^ _Gscan {
+ case _Gwaiting,
+ _Gpreempted:
+ grunning++
+ case _Grunnable,
+ _Grunning,
+ _Gsyscall:
+ print("runtime: checkdead: find g ", gp.goid, " in status ", s, "\n")
+ throw("checkdead: runnable g")
+ }
+ }
+ unlock(&allglock)
+ if grunning == 0 { // possible if main goroutine calls runtime·Goexit()
+ unlock(&sched.lock) // unlock so that GODEBUG=scheddetail=1 doesn't hang
+ throw("no goroutines (main called runtime.Goexit) - deadlock!")
+ }
+
+ // Maybe jump time forward for playground.
+ if faketime != 0 {
+ when, _p_ := timeSleepUntil()
+ if _p_ != nil {
+ faketime = when
+ for pp := &sched.pidle; *pp != 0; pp = &(*pp).ptr().link {
+ if (*pp).ptr() == _p_ {
+ *pp = _p_.link
+ break
+ }
+ }
+ mp := mget()
+ if mp == nil {
+ // There should always be a free M since
+ // nothing is running.
+ throw("checkdead: no m for timer")
+ }
+ mp.nextp.set(_p_)
+ notewakeup(&mp.park)
+ return
+ }
+ }
+
+ // There are no goroutines running, so we can look at the P's.
+ for _, _p_ := range allp {
+ if len(_p_.timers) > 0 {
+ return
+ }
+ }
+
+ getg().m.throwing = -1 // do not dump full stacks
+ unlock(&sched.lock) // unlock so that GODEBUG=scheddetail=1 doesn't hang
+ throw("all goroutines are asleep - deadlock!")
+}
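+
+// For illustration (user-level code, not runtime code): checkdead is what
+// produces the familiar fatal error when every goroutine is blocked. A minimal
+// program that triggers it:
+//
+//	package main
+//
+//	func main() {
+//		ch := make(chan int)
+//		<-ch // nothing ever sends, and no other goroutine can run
+//	}
+//
+// dies with "fatal error: all goroutines are asleep - deadlock!".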
+
+// forcegcperiod is the maximum time in nanoseconds between garbage
+// collections. If we go this long without a garbage collection, one
+// is forced to run.
+//
+// This is a variable for testing purposes. It normally doesn't change.
+var forcegcperiod int64 = 2 * 60 * 1e9
+
+// Always runs without a P, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func sysmon() {
+ lock(&sched.lock)
+ sched.nmsys++
+ checkdead()
+ unlock(&sched.lock)
+
+ // For syscall_runtime_doAllThreadsSyscall, sysmon is
+ // sufficiently up to participate in fixups.
+ atomic.Store(&sched.sysmonStarting, 0)
+
+ lasttrace := int64(0)
+	idle := 0 // how many cycles in succession we have not woken somebody up
+ delay := uint32(0)
+
+ for {
+ if idle == 0 { // start with 20us sleep...
+ delay = 20
+ } else if idle > 50 { // start doubling the sleep after 1ms...
+ delay *= 2
+ }
+ if delay > 10*1000 { // up to 10ms
+ delay = 10 * 1000
+ }
+ usleep(delay)
+ mDoFixup()
+
+ // sysmon should not enter deep sleep if schedtrace is enabled so that
+ // it can print that information at the right time.
+ //
+ // It should also not enter deep sleep if there are any active P's so
+ // that it can retake P's from syscalls, preempt long running G's, and
+ // poll the network if all P's are busy for long stretches.
+ //
+		// It should wake up from deep sleep if any P's become active either due
+ // to exiting a syscall or waking up due to a timer expiring so that it
+ // can resume performing those duties. If it wakes from a syscall it
+ // resets idle and delay as a bet that since it had retaken a P from a
+ // syscall before, it may need to do it again shortly after the
+ // application starts work again. It does not reset idle when waking
+ // from a timer to avoid adding system load to applications that spend
+ // most of their time sleeping.
+ now := nanotime()
+ if debug.schedtrace <= 0 && (sched.gcwaiting != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs)) {
+ lock(&sched.lock)
+ if atomic.Load(&sched.gcwaiting) != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs) {
+ syscallWake := false
+ next, _ := timeSleepUntil()
+ if next > now {
+ atomic.Store(&sched.sysmonwait, 1)
+ unlock(&sched.lock)
+ // Make wake-up period small enough
+ // for the sampling to be correct.
+ sleep := forcegcperiod / 2
+ if next-now < sleep {
+ sleep = next - now
+ }
+ shouldRelax := sleep >= osRelaxMinNS
+ if shouldRelax {
+ osRelax(true)
+ }
+ syscallWake = notetsleep(&sched.sysmonnote, sleep)
+ mDoFixup()
+ if shouldRelax {
+ osRelax(false)
+ }
+ lock(&sched.lock)
+ atomic.Store(&sched.sysmonwait, 0)
+ noteclear(&sched.sysmonnote)
+ }
+ if syscallWake {
+ idle = 0
+ delay = 20
+ }
+ }
+ unlock(&sched.lock)
+ }
+
+ lock(&sched.sysmonlock)
+ // Update now in case we blocked on sysmonnote or spent a long time
+ // blocked on schedlock or sysmonlock above.
+ now = nanotime()
+
+ // trigger libc interceptors if needed
+ if *cgo_yield != nil {
+ asmcgocall(*cgo_yield, nil)
+ }
+ // poll network if not polled for more than 10ms
+ lastpoll := int64(atomic.Load64(&sched.lastpoll))
+ if netpollinited() && lastpoll != 0 && lastpoll+10*1000*1000 < now {
+ atomic.Cas64(&sched.lastpoll, uint64(lastpoll), uint64(now))
+ list := netpoll(0) // non-blocking - returns list of goroutines
+ if !list.empty() {
+ // Need to decrement number of idle locked M's
+ // (pretending that one more is running) before injectglist.
+ // Otherwise it can lead to the following situation:
+ // injectglist grabs all P's but before it starts M's to run the P's,
+ // another M returns from syscall, finishes running its G,
+ // observes that there is no work to do and no other running M's
+ // and reports deadlock.
+ incidlelocked(-1)
+ injectglist(&list)
+ incidlelocked(1)
+ }
+ }
+ mDoFixup()
+ if GOOS == "netbsd" {
+ // netpoll is responsible for waiting for timer
+ // expiration, so we typically don't have to worry
+ // about starting an M to service timers. (Note that
+ // sleep for timeSleepUntil above simply ensures sysmon
+ // starts running again when that timer expiration may
+ // cause Go code to run again).
+ //
+ // However, netbsd has a kernel bug that sometimes
+ // misses netpollBreak wake-ups, which can lead to
+ // unbounded delays servicing timers. If we detect this
+ // overrun, then startm to get something to handle the
+ // timer.
+ //
+ // See issue 42515 and
+ // https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=50094.
+ if next, _ := timeSleepUntil(); next < now {
+ startm(nil, false)
+ }
+ }
+ if atomic.Load(&scavenge.sysmonWake) != 0 {
+ // Kick the scavenger awake if someone requested it.
+ wakeScavenger()
+ }
+ // retake P's blocked in syscalls
+ // and preempt long running G's
+ if retake(now) != 0 {
+ idle = 0
+ } else {
+ idle++
+ }
+ // check if we need to force a GC
+ if t := (gcTrigger{kind: gcTriggerTime, now: now}); t.test() && atomic.Load(&forcegc.idle) != 0 {
+ lock(&forcegc.lock)
+ forcegc.idle = 0
+ var list gList
+ list.push(forcegc.g)
+ injectglist(&list)
+ unlock(&forcegc.lock)
+ }
+ if debug.schedtrace > 0 && lasttrace+int64(debug.schedtrace)*1000000 <= now {
+ lasttrace = now
+ schedtrace(debug.scheddetail > 0)
+ }
+ unlock(&sched.sysmonlock)
+ }
+}
+
+type sysmontick struct {
+ schedtick uint32
+ schedwhen int64
+ syscalltick uint32
+ syscallwhen int64
+}
+
+// forcePreemptNS is the time slice given to a G before it is
+// preempted.
+const forcePreemptNS = 10 * 1000 * 1000 // 10ms
+
+func retake(now int64) uint32 {
+ n := 0
+ // Prevent allp slice changes. This lock will be completely
+ // uncontended unless we're already stopping the world.
+ lock(&allpLock)
+ // We can't use a range loop over allp because we may
+ // temporarily drop the allpLock. Hence, we need to re-fetch
+ // allp each time around the loop.
+ for i := 0; i < len(allp); i++ {
+ _p_ := allp[i]
+ if _p_ == nil {
+ // This can happen if procresize has grown
+ // allp but not yet created new Ps.
+ continue
+ }
+ pd := &_p_.sysmontick
+ s := _p_.status
+ sysretake := false
+ if s == _Prunning || s == _Psyscall {
+ // Preempt G if it's running for too long.
+ t := int64(_p_.schedtick)
+ if int64(pd.schedtick) != t {
+ pd.schedtick = uint32(t)
+ pd.schedwhen = now
+ } else if pd.schedwhen+forcePreemptNS <= now {
+ preemptone(_p_)
+ // In case of syscall, preemptone() doesn't
+ // work, because there is no M wired to P.
+ sysretake = true
+ }
+ }
+ if s == _Psyscall {
+ // Retake P from syscall if it's there for more than 1 sysmon tick (at least 20us).
+ t := int64(_p_.syscalltick)
+ if !sysretake && int64(pd.syscalltick) != t {
+ pd.syscalltick = uint32(t)
+ pd.syscallwhen = now
+ continue
+ }
+ // On the one hand we don't want to retake Ps if there is no other work to do,
+ // but on the other hand we want to retake them eventually
+ // because they can prevent the sysmon thread from deep sleep.
+ if runqempty(_p_) && atomic.Load(&sched.nmspinning)+atomic.Load(&sched.npidle) > 0 && pd.syscallwhen+10*1000*1000 > now {
+ continue
+ }
+ // Drop allpLock so we can take sched.lock.
+ unlock(&allpLock)
+ // Need to decrement number of idle locked M's
+ // (pretending that one more is running) before the CAS.
+ // Otherwise the M from which we retake can exit the syscall,
+ // increment nmidle and report deadlock.
+ incidlelocked(-1)
+ if atomic.Cas(&_p_.status, s, _Pidle) {
+ if trace.enabled {
+ traceGoSysBlock(_p_)
+ traceProcStop(_p_)
+ }
+ n++
+ _p_.syscalltick++
+ handoffp(_p_)
+ }
+ incidlelocked(1)
+ lock(&allpLock)
+ }
+ }
+ unlock(&allpLock)
+ return uint32(n)
+}
+
+// Tell all goroutines that they have been preempted and they should stop.
+// This function is purely best-effort. It can fail to inform a goroutine if a
+// processor just started running it.
+// No locks need to be held.
+// Returns true if preemption request was issued to at least one goroutine.
+func preemptall() bool {
+ res := false
+ for _, _p_ := range allp {
+ if _p_.status != _Prunning {
+ continue
+ }
+ if preemptone(_p_) {
+ res = true
+ }
+ }
+ return res
+}
+
+// Tell the goroutine running on processor P to stop.
+// This function is purely best-effort. It can incorrectly fail to inform the
+// goroutine. It can inform the wrong goroutine. Even if it informs the
+// correct goroutine, that goroutine might ignore the request if it is
+// simultaneously executing newstack.
+// No lock needs to be held.
+// Returns true if preemption request was issued.
+// The actual preemption will happen at some point in the future
+// and will be indicated by gp->status no longer being
+// Grunning.
+func preemptone(_p_ *p) bool {
+ mp := _p_.m.ptr()
+ if mp == nil || mp == getg().m {
+ return false
+ }
+ gp := mp.curg
+ if gp == nil || gp == mp.g0 {
+ return false
+ }
+
+ gp.preempt = true
+
+ // Every call in a go routine checks for stack overflow by
+ // comparing the current stack pointer to gp->stackguard0.
+ // Setting gp->stackguard0 to StackPreempt folds
+ // preemption into the normal stack overflow check.
+ gp.stackguard0 = stackPreempt
+
+ // Request an async preemption of this P.
+ if preemptMSupported && debug.asyncpreemptoff == 0 {
+ _p_.preempt = true
+ preemptM(mp)
+ }
+
+ return true
+}
+
+var starttime int64
+
+func schedtrace(detailed bool) {
+ now := nanotime()
+ if starttime == 0 {
+ starttime = now
+ }
+
+ lock(&sched.lock)
+ print("SCHED ", (now-starttime)/1e6, "ms: gomaxprocs=", gomaxprocs, " idleprocs=", sched.npidle, " threads=", mcount(), " spinningthreads=", sched.nmspinning, " idlethreads=", sched.nmidle, " runqueue=", sched.runqsize)
+ if detailed {
+ print(" gcwaiting=", sched.gcwaiting, " nmidlelocked=", sched.nmidlelocked, " stopwait=", sched.stopwait, " sysmonwait=", sched.sysmonwait, "\n")
+ }
+ // We must be careful while reading data from P's, M's and G's.
+ // Even if we hold schedlock, most data can be changed concurrently.
+ // E.g. (p->m ? p->m->id : -1) can crash if p->m changes from non-nil to nil.
+ for i, _p_ := range allp {
+ mp := _p_.m.ptr()
+ h := atomic.Load(&_p_.runqhead)
+ t := atomic.Load(&_p_.runqtail)
+ if detailed {
+ id := int64(-1)
+ if mp != nil {
+ id = mp.id
+ }
+ print(" P", i, ": status=", _p_.status, " schedtick=", _p_.schedtick, " syscalltick=", _p_.syscalltick, " m=", id, " runqsize=", t-h, " gfreecnt=", _p_.gFree.n, " timerslen=", len(_p_.timers), "\n")
+ } else {
+ // In non-detailed mode format lengths of per-P run queues as:
+ // [len1 len2 len3 len4]
+ print(" ")
+ if i == 0 {
+ print("[")
+ }
+ print(t - h)
+ if i == len(allp)-1 {
+ print("]\n")
+ }
+ }
+ }
+
+ if !detailed {
+ unlock(&sched.lock)
+ return
+ }
+
+ for mp := allm; mp != nil; mp = mp.alllink {
+ _p_ := mp.p.ptr()
+ gp := mp.curg
+ lockedg := mp.lockedg.ptr()
+ id1 := int32(-1)
+ if _p_ != nil {
+ id1 = _p_.id
+ }
+ id2 := int64(-1)
+ if gp != nil {
+ id2 = gp.goid
+ }
+ id3 := int64(-1)
+ if lockedg != nil {
+ id3 = lockedg.goid
+ }
+ print(" M", mp.id, ": p=", id1, " curg=", id2, " mallocing=", mp.mallocing, " throwing=", mp.throwing, " preemptoff=", mp.preemptoff, ""+" locks=", mp.locks, " dying=", mp.dying, " spinning=", mp.spinning, " blocked=", mp.blocked, " lockedg=", id3, "\n")
+ }
+
+ lock(&allglock)
+ for gi := 0; gi < len(allgs); gi++ {
+ gp := allgs[gi]
+ mp := gp.m
+ lockedm := gp.lockedm.ptr()
+ id1 := int64(-1)
+ if mp != nil {
+ id1 = mp.id
+ }
+ id2 := int64(-1)
+ if lockedm != nil {
+ id2 = lockedm.id
+ }
+ print(" G", gp.goid, ": status=", readgstatus(gp), "(", gp.waitreason.String(), ") m=", id1, " lockedm=", id2, "\n")
+ }
+ unlock(&allglock)
+ unlock(&sched.lock)
+}
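+
+// For illustration: schedtrace output is enabled from the environment. For
+// example,
+//
+//	GODEBUG=schedtrace=1000 ./prog
+//
+// prints one SCHED summary line per second (the interval is in milliseconds),
+// and
+//
+//	GODEBUG=schedtrace=1000,scheddetail=1 ./prog
+//
+// additionally dumps the per-P, per-M, and per-G state printed above.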
+
+// schedEnableUser enables or disables the scheduling of user
+// goroutines.
+//
+// This does not stop already running user goroutines, so the caller
+// should first stop the world when disabling user goroutines.
+func schedEnableUser(enable bool) {
+ lock(&sched.lock)
+ if sched.disable.user == !enable {
+ unlock(&sched.lock)
+ return
+ }
+ sched.disable.user = !enable
+ if enable {
+ n := sched.disable.n
+ sched.disable.n = 0
+ globrunqputbatch(&sched.disable.runnable, n)
+ unlock(&sched.lock)
+ for ; n != 0 && sched.npidle != 0; n-- {
+ startm(nil, false)
+ }
+ } else {
+ unlock(&sched.lock)
+ }
+}
+
+// schedEnabled reports whether gp should be scheduled. It returns
+// false if scheduling of gp is disabled.
+//
+// sched.lock must be held.
+func schedEnabled(gp *g) bool {
+ assertLockHeld(&sched.lock)
+
+ if sched.disable.user {
+ return isSystemGoroutine(gp, true)
+ }
+ return true
+}
+
+// Put mp on midle list.
+// sched.lock must be held.
+// May run during STW, so write barriers are not allowed.
+//go:nowritebarrierrec
+func mput(mp *m) {
+ assertLockHeld(&sched.lock)
+
+ mp.schedlink = sched.midle
+ sched.midle.set(mp)
+ sched.nmidle++
+ checkdead()
+}
+
+// Try to get an m from midle list.
+// sched.lock must be held.
+// May run during STW, so write barriers are not allowed.
+//go:nowritebarrierrec
+func mget() *m {
+ assertLockHeld(&sched.lock)
+
+ mp := sched.midle.ptr()
+ if mp != nil {
+ sched.midle = mp.schedlink
+ sched.nmidle--
+ }
+ return mp
+}
+
+// Put gp on the global runnable queue.
+// sched.lock must be held.
+// May run during STW, so write barriers are not allowed.
+//go:nowritebarrierrec
+func globrunqput(gp *g) {
+ assertLockHeld(&sched.lock)
+
+ sched.runq.pushBack(gp)
+ sched.runqsize++
+}
+
+// Put gp at the head of the global runnable queue.
+// sched.lock must be held.
+// May run during STW, so write barriers are not allowed.
+//go:nowritebarrierrec
+func globrunqputhead(gp *g) {
+ assertLockHeld(&sched.lock)
+
+ sched.runq.push(gp)
+ sched.runqsize++
+}
+
+// Put a batch of runnable goroutines on the global runnable queue.
+// This clears *batch.
+// sched.lock must be held.
+func globrunqputbatch(batch *gQueue, n int32) {
+ assertLockHeld(&sched.lock)
+
+ sched.runq.pushBackAll(*batch)
+ sched.runqsize += n
+ *batch = gQueue{}
+}
+
+// Try to get a batch of G's from the global runnable queue.
+// sched.lock must be held.
+func globrunqget(_p_ *p, max int32) *g {
+ assertLockHeld(&sched.lock)
+
+ if sched.runqsize == 0 {
+ return nil
+ }
+
+ n := sched.runqsize/gomaxprocs + 1
+ if n > sched.runqsize {
+ n = sched.runqsize
+ }
+ if max > 0 && n > max {
+ n = max
+ }
+ if n > int32(len(_p_.runq))/2 {
+ n = int32(len(_p_.runq)) / 2
+ }
+
+ sched.runqsize -= n
+
+ gp := sched.runq.pop()
+ n--
+ for ; n > 0; n-- {
+ gp1 := sched.runq.pop()
+ runqput(_p_, gp1, false)
+ }
+ return gp
+}
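+
+// Worked example: with sched.runqsize = 100 and gomaxprocs = 4, each caller
+// takes n = 100/4 + 1 = 26 goroutines, so the global queue drains across
+// several Ps rather than into one. n is further capped by max (when positive)
+// and by half the capacity of the local run queue so the transferred batch
+// always fits.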
+
+// pMask is an atomic bitstring with one bit per P.
+type pMask []uint32
+
+// read returns true if P id's bit is set.
+func (p pMask) read(id uint32) bool {
+ word := id / 32
+ mask := uint32(1) << (id % 32)
+ return (atomic.Load(&p[word]) & mask) != 0
+}
+
+// set sets P id's bit.
+func (p pMask) set(id int32) {
+ word := id / 32
+ mask := uint32(1) << (id % 32)
+ atomic.Or(&p[word], mask)
+}
+
+// clear clears P id's bit.
+func (p pMask) clear(id int32) {
+ word := id / 32
+ mask := uint32(1) << (id % 32)
+ atomic.And(&p[word], ^mask)
+}
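+
+// Worked example: for P id 70, word = 70/32 = 2 and mask = 1<<(70%32) = 1<<6,
+// so set(70) atomically ORs bit 6 of p[2], clear(70) atomically clears it, and
+// read(70) tests it. A mask covering nprocs Ps therefore needs
+// (nprocs+31)/32 words, which is exactly the maskWords computation in
+// procresize.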
+
+// updateTimerPMask clears pp's timer mask if it has no timers on its heap.
+//
+// Ideally, the timer mask would be kept immediately consistent on any timer
+// operations. Unfortunately, updating a shared global data structure in the
+// timer hot path adds too much overhead in applications frequently switching
+// between no timers and some timers.
+//
+// As a compromise, the timer mask is updated only on pidleget / pidleput. A
+// running P (returned by pidleget) may add a timer at any time, so its mask
+// must be set. An idle P (passed to pidleput) cannot add new timers while
+// idle, so if it has no timers at that time, its mask may be cleared.
+//
+// Thus, we get the following effects on timer-stealing in findrunnable:
+//
+// * Idle Ps with no timers when they go idle are never checked in findrunnable
+// (for work- or timer-stealing; this is the ideal case).
+// * Running Ps must always be checked.
+// * Idle Ps whose timers are stolen must continue to be checked until they run
+// again, even after timer expiration.
+//
+// When the P starts running again, the mask should be set, as a timer may be
+// added at any time.
+//
+// TODO(prattmic): Additional targeted updates may improve the above cases.
+// e.g., updating the mask when stealing a timer.
+func updateTimerPMask(pp *p) {
+ if atomic.Load(&pp.numTimers) > 0 {
+ return
+ }
+
+ // Looks like there are no timers, however another P may transiently
+ // decrement numTimers when handling a timerModified timer in
+ // checkTimers. We must take timersLock to serialize with these changes.
+ lock(&pp.timersLock)
+ if atomic.Load(&pp.numTimers) == 0 {
+ timerpMask.clear(pp.id)
+ }
+ unlock(&pp.timersLock)
+}
+
+// pidleput puts p on the _Pidle list.
+//
+// This releases ownership of p. Once sched.lock is released it is no longer
+// safe to use p.
+//
+// sched.lock must be held.
+//
+// May run during STW, so write barriers are not allowed.
+//go:nowritebarrierrec
+func pidleput(_p_ *p) {
+ assertLockHeld(&sched.lock)
+
+ if !runqempty(_p_) {
+ throw("pidleput: P has non-empty run queue")
+ }
+ updateTimerPMask(_p_) // clear if there are no timers.
+ idlepMask.set(_p_.id)
+ _p_.link = sched.pidle
+ sched.pidle.set(_p_)
+ atomic.Xadd(&sched.npidle, 1) // TODO: fast atomic
+}
+
+// pidleget tries to get a p from the _Pidle list, acquiring ownership.
+//
+// sched.lock must be held.
+//
+// May run during STW, so write barriers are not allowed.
+//go:nowritebarrierrec
+func pidleget() *p {
+ assertLockHeld(&sched.lock)
+
+ _p_ := sched.pidle.ptr()
+ if _p_ != nil {
+ // Timer may get added at any time now.
+ timerpMask.set(_p_.id)
+ idlepMask.clear(_p_.id)
+ sched.pidle = _p_.link
+ atomic.Xadd(&sched.npidle, -1) // TODO: fast atomic
+ }
+ return _p_
+}
+
+// runqempty reports whether _p_ has no Gs on its local run queue.
+// It never returns true spuriously.
+func runqempty(_p_ *p) bool {
+ // Defend against a race where 1) _p_ has G1 in runqnext but runqhead == runqtail,
+ // 2) runqput on _p_ kicks G1 to the runq, 3) runqget on _p_ empties runqnext.
+ // Simply observing that runqhead == runqtail and then observing that runqnext == nil
+ // does not mean the queue is empty.
+ for {
+ head := atomic.Load(&_p_.runqhead)
+ tail := atomic.Load(&_p_.runqtail)
+ runnext := atomic.Loaduintptr((*uintptr)(unsafe.Pointer(&_p_.runnext)))
+ if tail == atomic.Load(&_p_.runqtail) {
+ return head == tail && runnext == 0
+ }
+ }
+}
+
+// To shake out latent assumptions about scheduling order,
+// we introduce some randomness into scheduling decisions
+// when running with the race detector.
+// The need for this was made obvious by changing the
+// (deterministic) scheduling order in Go 1.5 and breaking
+// many poorly-written tests.
+// With the randomness here, as long as the tests pass
+// consistently with -race, they shouldn't have latent scheduling
+// assumptions.
+const randomizeScheduler = raceenabled
+
+// runqput tries to put g on the local runnable queue.
+// If next is false, runqput adds g to the tail of the runnable queue.
+// If next is true, runqput puts g in the _p_.runnext slot.
+// If the run queue is full, runqput puts g on the global queue.
+// Executed only by the owner P.
+func runqput(_p_ *p, gp *g, next bool) {
+ if randomizeScheduler && next && fastrand()%2 == 0 {
+ next = false
+ }
+
+ if next {
+ retryNext:
+ oldnext := _p_.runnext
+ if !_p_.runnext.cas(oldnext, guintptr(unsafe.Pointer(gp))) {
+ goto retryNext
+ }
+ if oldnext == 0 {
+ return
+ }
+ // Kick the old runnext out to the regular run queue.
+ gp = oldnext.ptr()
+ }
+
+retry:
+ h := atomic.LoadAcq(&_p_.runqhead) // load-acquire, synchronize with consumers
+ t := _p_.runqtail
+ if t-h < uint32(len(_p_.runq)) {
+ _p_.runq[t%uint32(len(_p_.runq))].set(gp)
+ atomic.StoreRel(&_p_.runqtail, t+1) // store-release, makes the item available for consumption
+ return
+ }
+ if runqputslow(_p_, gp, h, t) {
+ return
+ }
+ // The queue is not full; now the put above must succeed.
+ goto retry
+}
+
+// Put g and a batch of work from local runnable queue on global queue.
+// Executed only by the owner P.
+func runqputslow(_p_ *p, gp *g, h, t uint32) bool {
+ var batch [len(_p_.runq)/2 + 1]*g
+
+ // First, grab a batch from local queue.
+ n := t - h
+ n = n / 2
+ if n != uint32(len(_p_.runq)/2) {
+ throw("runqputslow: queue is not full")
+ }
+ for i := uint32(0); i < n; i++ {
+ batch[i] = _p_.runq[(h+i)%uint32(len(_p_.runq))].ptr()
+ }
+ if !atomic.CasRel(&_p_.runqhead, h, h+n) { // cas-release, commits consume
+ return false
+ }
+ batch[n] = gp
+
+ if randomizeScheduler {
+ for i := uint32(1); i <= n; i++ {
+ j := fastrandn(i + 1)
+ batch[i], batch[j] = batch[j], batch[i]
+ }
+ }
+
+ // Link the goroutines.
+ for i := uint32(0); i < n; i++ {
+ batch[i].schedlink.set(batch[i+1])
+ }
+ var q gQueue
+ q.head.set(batch[0])
+ q.tail.set(batch[n])
+
+ // Now put the batch on global queue.
+ lock(&sched.lock)
+ globrunqputbatch(&q, int32(n+1))
+ unlock(&sched.lock)
+ return true
+}
+
+// runqputbatch tries to put all the G's on q on the local runnable queue.
+// If the queue is full, they are put on the global queue; in that case
+// this will temporarily acquire the scheduler lock.
+// Executed only by the owner P.
+func runqputbatch(pp *p, q *gQueue, qsize int) {
+ h := atomic.LoadAcq(&pp.runqhead)
+ t := pp.runqtail
+ n := uint32(0)
+ for !q.empty() && t-h < uint32(len(pp.runq)) {
+ gp := q.pop()
+ pp.runq[t%uint32(len(pp.runq))].set(gp)
+ t++
+ n++
+ }
+ qsize -= int(n)
+
+ if randomizeScheduler {
+ off := func(o uint32) uint32 {
+ return (pp.runqtail + o) % uint32(len(pp.runq))
+ }
+ for i := uint32(1); i < n; i++ {
+ j := fastrandn(i + 1)
+ pp.runq[off(i)], pp.runq[off(j)] = pp.runq[off(j)], pp.runq[off(i)]
+ }
+ }
+
+ atomic.StoreRel(&pp.runqtail, t)
+ if !q.empty() {
+ lock(&sched.lock)
+ globrunqputbatch(q, int32(qsize))
+ unlock(&sched.lock)
+ }
+}
+
+// Get g from local runnable queue.
+// If inheritTime is true, gp should inherit the remaining time in the
+// current time slice. Otherwise, it should start a new time slice.
+// Executed only by the owner P.
+func runqget(_p_ *p) (gp *g, inheritTime bool) {
+ // If there's a runnext, it's the next G to run.
+ for {
+ next := _p_.runnext
+ if next == 0 {
+ break
+ }
+ if _p_.runnext.cas(next, 0) {
+ return next.ptr(), true
+ }
+ }
+
+ for {
+ h := atomic.LoadAcq(&_p_.runqhead) // load-acquire, synchronize with other consumers
+ t := _p_.runqtail
+ if t == h {
+ return nil, false
+ }
+ gp := _p_.runq[h%uint32(len(_p_.runq))].ptr()
+ if atomic.CasRel(&_p_.runqhead, h, h+1) { // cas-release, commits consume
+ return gp, false
+ }
+ }
+}
+
+// Grabs a batch of goroutines from _p_'s runnable queue into batch.
+// Batch is a ring buffer starting at batchHead.
+// Returns number of grabbed goroutines.
+// Can be executed by any P.
+func runqgrab(_p_ *p, batch *[256]guintptr, batchHead uint32, stealRunNextG bool) uint32 {
+ for {
+ h := atomic.LoadAcq(&_p_.runqhead) // load-acquire, synchronize with other consumers
+ t := atomic.LoadAcq(&_p_.runqtail) // load-acquire, synchronize with the producer
+ n := t - h
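+ // Take the larger half, rounded up: e.g. with t-h = 5 runnable Gs the thief
+ // grabs 3, and with t-h = 1 it grabs the single G.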
+ n = n - n/2
+ if n == 0 {
+ if stealRunNextG {
+ // Try to steal from _p_.runnext.
+ if next := _p_.runnext; next != 0 {
+ if _p_.status == _Prunning {
+ // Sleep to ensure that _p_ isn't about to run the g
+ // we are about to steal.
+ // The important use case here is when the g running
+ // on _p_ ready()s another g and then almost
+ // immediately blocks. Instead of stealing runnext
+ // in this window, back off to give _p_ a chance to
+ // schedule runnext. This will avoid thrashing gs
+ // between different Ps.
+ // A sync chan send/recv takes ~50ns as of time of
+ // writing, so 3us gives ~50x overshoot.
+ if GOOS != "windows" {
+ usleep(3)
+ } else {
+ // On windows system timer granularity is
+ // 1-15ms, which is way too much for this
+ // optimization. So just yield.
+ osyield()
+ }
+ }
+ if !_p_.runnext.cas(next, 0) {
+ continue
+ }
+ batch[batchHead%uint32(len(batch))] = next
+ return 1
+ }
+ }
+ return 0
+ }
+ if n > uint32(len(_p_.runq)/2) { // read inconsistent h and t
+ continue
+ }
+ for i := uint32(0); i < n; i++ {
+ g := _p_.runq[(h+i)%uint32(len(_p_.runq))]
+ batch[(batchHead+i)%uint32(len(batch))] = g
+ }
+ if atomic.CasRel(&_p_.runqhead, h, h+n) { // cas-release, commits consume
+ return n
+ }
+ }
+}
+
+// Steal half of the elements from the local runnable queue of p2
+// and put them onto the local runnable queue of p.
+// Returns one of the stolen elements (or nil if it failed).
+func runqsteal(_p_, p2 *p, stealRunNextG bool) *g {
+ t := _p_.runqtail
+ n := runqgrab(p2, &_p_.runq, t, stealRunNextG)
+ if n == 0 {
+ return nil
+ }
+ n--
+ gp := _p_.runq[(t+n)%uint32(len(_p_.runq))].ptr()
+ if n == 0 {
+ return gp
+ }
+ h := atomic.LoadAcq(&_p_.runqhead) // load-acquire, synchronize with consumers
+ if t-h+n >= uint32(len(_p_.runq)) {
+ throw("runqsteal: runq overflow")
+ }
+ atomic.StoreRel(&_p_.runqtail, t+n) // store-release, makes the item available for consumption
+ return gp
+}
+
+// A gQueue is a deque of Gs linked through g.schedlink. A G can only
+// be on one gQueue or gList at a time.
+type gQueue struct {
+ head guintptr
+ tail guintptr
+}
+
+// empty reports whether q is empty.
+func (q *gQueue) empty() bool {
+ return q.head == 0
+}
+
+// push adds gp to the head of q.
+func (q *gQueue) push(gp *g) {
+ gp.schedlink = q.head
+ q.head.set(gp)
+ if q.tail == 0 {
+ q.tail.set(gp)
+ }
+}
+
+// pushBack adds gp to the tail of q.
+func (q *gQueue) pushBack(gp *g) {
+ gp.schedlink = 0
+ if q.tail != 0 {
+ q.tail.ptr().schedlink.set(gp)
+ } else {
+ q.head.set(gp)
+ }
+ q.tail.set(gp)
+}
+
+// pushBackAll adds all Gs in q2 to the tail of q. After this, q2 must
+// not be used.
+func (q *gQueue) pushBackAll(q2 gQueue) {
+ if q2.tail == 0 {
+ return
+ }
+ q2.tail.ptr().schedlink = 0
+ if q.tail != 0 {
+ q.tail.ptr().schedlink = q2.head
+ } else {
+ q.head = q2.head
+ }
+ q.tail = q2.tail
+}
+
+// pop removes and returns the head of queue q. It returns nil if
+// q is empty.
+func (q *gQueue) pop() *g {
+ gp := q.head.ptr()
+ if gp != nil {
+ q.head = gp.schedlink
+ if q.head == 0 {
+ q.tail = 0
+ }
+ }
+ return gp
+}
+
+// popList takes all Gs in q and returns them as a gList.
+func (q *gQueue) popList() gList {
+ stack := gList{q.head}
+ *q = gQueue{}
+ return stack
+}
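+
+// A small usage sketch (illustrative only; g0, g1, g2 are arbitrary Gs):
+// pushBack and pop together give FIFO order, while push prepends:
+//
+// var q gQueue
+// q.pushBack(g1) // q: g1
+// q.pushBack(g2) // q: g1 g2
+// q.push(g0) // q: g0 g1 g2
+// q.pop() // returns g0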
+
+// A gList is a list of Gs linked through g.schedlink. A G can only be
+// on one gQueue or gList at a time.
+type gList struct {
+ head guintptr
+}
+
+// empty reports whether l is empty.
+func (l *gList) empty() bool {
+ return l.head == 0
+}
+
+// push adds gp to the head of l.
+func (l *gList) push(gp *g) {
+ gp.schedlink = l.head
+ l.head.set(gp)
+}
+
+// pushAll prepends all Gs in q to l.
+func (l *gList) pushAll(q gQueue) {
+ if !q.empty() {
+ q.tail.ptr().schedlink = l.head
+ l.head = q.head
+ }
+}
+
+// pop removes and returns the head of l. If l is empty, it returns nil.
+func (l *gList) pop() *g {
+ gp := l.head.ptr()
+ if gp != nil {
+ l.head = gp.schedlink
+ }
+ return gp
+}
+
+//go:linkname setMaxThreads runtime/debug.setMaxThreads
+func setMaxThreads(in int) (out int) {
+ lock(&sched.lock)
+ out = int(sched.maxmcount)
+ if in > 0x7fffffff { // MaxInt32
+ sched.maxmcount = 0x7fffffff
+ } else {
+ sched.maxmcount = int32(in)
+ }
+ checkmcount()
+ unlock(&sched.lock)
+ return
+}
+
+func haveexperiment(name string) bool {
+ x := sys.Goexperiment
+ for x != "" {
+ xname := ""
+ i := bytealg.IndexByteString(x, ',')
+ if i < 0 {
+ xname, x = x, ""
+ } else {
+ xname, x = x[:i], x[i+1:]
+ }
+ if xname == name {
+ return true
+ }
+ if len(xname) > 2 && xname[:2] == "no" && xname[2:] == name {
+ return false
+ }
+ }
+ return false
+}
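+
+// For example, with a hypothetical GOEXPERIMENT of "fieldtrack,nostaticlockranking",
+// haveexperiment("fieldtrack") reports true and haveexperiment("staticlockranking")
+// reports false, because a "no" prefix explicitly disables an experiment.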
+
+//go:nosplit
+func procPin() int {
+ _g_ := getg()
+ mp := _g_.m
+
+ mp.locks++
+ return int(mp.p.ptr().id)
+}
+
+//go:nosplit
+func procUnpin() {
+ _g_ := getg()
+ _g_.m.locks--
+}
+
+//go:linkname sync_runtime_procPin sync.runtime_procPin
+//go:nosplit
+func sync_runtime_procPin() int {
+ return procPin()
+}
+
+//go:linkname sync_runtime_procUnpin sync.runtime_procUnpin
+//go:nosplit
+func sync_runtime_procUnpin() {
+ procUnpin()
+}
+
+//go:linkname sync_atomic_runtime_procPin sync/atomic.runtime_procPin
+//go:nosplit
+func sync_atomic_runtime_procPin() int {
+ return procPin()
+}
+
+//go:linkname sync_atomic_runtime_procUnpin sync/atomic.runtime_procUnpin
+//go:nosplit
+func sync_atomic_runtime_procUnpin() {
+ procUnpin()
+}
+
+// Active spinning for sync.Mutex.
+//go:linkname sync_runtime_canSpin sync.runtime_canSpin
+//go:nosplit
+func sync_runtime_canSpin(i int) bool {
+ // sync.Mutex is cooperative, so we are conservative with spinning.
+ // Spin only a few times, and only if we are running on a multicore machine,
+ // GOMAXPROCS>1, there is at least one other running P, and the local runq is empty.
+ // As opposed to runtime mutexes, we don't do passive spinning here,
+ // because there can be work on the global runq or on other Ps.
+ if i >= active_spin || ncpu <= 1 || gomaxprocs <= int32(sched.npidle+sched.nmspinning)+1 {
+ return false
+ }
+ if p := getg().m.p.ptr(); !runqempty(p) {
+ return false
+ }
+ return true
+}
+
+//go:linkname sync_runtime_doSpin sync.runtime_doSpin
+//go:nosplit
+func sync_runtime_doSpin() {
+ procyield(active_spin_cnt)
+}
+
+var stealOrder randomOrder
+
+// randomOrder/randomEnum are helper types for randomized work stealing.
+// They allow enumeration of all Ps in different pseudo-random orders without repetitions.
+// The algorithm is based on the fact that if we have X such that X and GOMAXPROCS
+// are coprime, then the sequence (i + X) % GOMAXPROCS gives the required enumeration.
+type randomOrder struct {
+ count uint32
+ coprimes []uint32
+}
+
+type randomEnum struct {
+ i uint32
+ count uint32
+ pos uint32
+ inc uint32
+}
+
+func (ord *randomOrder) reset(count uint32) {
+ ord.count = count
+ ord.coprimes = ord.coprimes[:0]
+ for i := uint32(1); i <= count; i++ {
+ if gcd(i, count) == 1 {
+ ord.coprimes = append(ord.coprimes, i)
+ }
+ }
+}
+
+func (ord *randomOrder) start(i uint32) randomEnum {
+ return randomEnum{
+ count: ord.count,
+ pos: i % ord.count,
+ inc: ord.coprimes[i%uint32(len(ord.coprimes))],
+ }
+}
+
+func (enum *randomEnum) done() bool {
+ return enum.i == enum.count
+}
+
+func (enum *randomEnum) next() {
+ enum.i++
+ enum.pos = (enum.pos + enum.inc) % enum.count
+}
+
+func (enum *randomEnum) position() uint32 {
+ return enum.pos
+}
+
+func gcd(a, b uint32) uint32 {
+ for b != 0 {
+ a, b = b, a%b
+ }
+ return a
+}
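+
+// A worked example (illustrative): reset(4) keeps only the values coprime to 4,
+// so coprimes = [1 3]. start(1) picks pos = 1 and inc = coprimes[1] = 3, and the
+// enumeration visits P indices 1, 0, 3, 2, covering each P exactly once because
+// the stride is coprime to the count.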
+
+// An initTask represents the set of initializations that need to be done for a package.
+// Keep in sync with ../../test/initempty.go:initTask
+type initTask struct {
+ // TODO: pack the first 3 fields more tightly?
+ state uintptr // 0 = uninitialized, 1 = in progress, 2 = done
+ ndeps uintptr
+ nfns uintptr
+ // followed by ndeps instances of an *initTask, one per package depended on
+ // followed by nfns pcs, one per init function to run
+}
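+
+// Illustrative layout, derived from the doInit arithmetic below: a task with
+// ndeps = 2 and nfns = 3 occupies 3+2+3 pointer-sized words:
+//
+// word 0: state
+// word 1: ndeps (2)
+// word 2: nfns (3)
+// words 3-4: *initTask for each dependency
+// words 5-7: PC of each init function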
+
+// inittrace stores statistics for init functions; the statistics are
+// updated by malloc and newproc when active is true.
+var inittrace tracestat
+
+type tracestat struct {
+ active bool // init tracing activation status
+ id int64 // init goroutine id
+ allocs uint64 // heap allocations
+ bytes uint64 // heap allocated bytes
+}
+
+func doInit(t *initTask) {
+ switch t.state {
+ case 2: // fully initialized
+ return
+ case 1: // initialization in progress
+ throw("recursive call during initialization - linker skew")
+ default: // not initialized yet
+ t.state = 1 // initialization in progress
+
+ for i := uintptr(0); i < t.ndeps; i++ {
+ p := add(unsafe.Pointer(t), (3+i)*sys.PtrSize)
+ t2 := *(**initTask)(p)
+ doInit(t2)
+ }
+
+ if t.nfns == 0 {
+ t.state = 2 // initialization done
+ return
+ }
+
+ var (
+ start int64
+ before tracestat
+ )
+
+ if inittrace.active {
+ start = nanotime()
+ // Load stats non-atomically since inittrace is updated only by this init goroutine.
+ before = inittrace
+ }
+
+ firstFunc := add(unsafe.Pointer(t), (3+t.ndeps)*sys.PtrSize)
+ for i := uintptr(0); i < t.nfns; i++ {
+ p := add(firstFunc, i*sys.PtrSize)
+ f := *(*func())(unsafe.Pointer(&p))
+ f()
+ }
+
+ if inittrace.active {
+ end := nanotime()
+ // Load stats non-atomically since inittrace is updated only by this init goroutine.
+ after := inittrace
+
+ pkg := funcpkgpath(findfunc(funcPC(firstFunc)))
+
+ var sbuf [24]byte
+ print("init ", pkg, " @")
+ print(string(fmtNSAsMS(sbuf[:], uint64(start-runtimeInitTime))), " ms, ")
+ print(string(fmtNSAsMS(sbuf[:], uint64(end-start))), " ms clock, ")
+ print(string(itoa(sbuf[:], after.bytes-before.bytes)), " bytes, ")
+ print(string(itoa(sbuf[:], after.allocs-before.allocs)), " allocs")
+ print("\n")
+ }
+
+ t.state = 2 // initialization done
+ }
+}
diff --git a/src/runtime/proc_runtime_test.go b/src/runtime/proc_runtime_test.go
new file mode 100644
index 0000000..a7bde2c
--- /dev/null
+++ b/src/runtime/proc_runtime_test.go
@@ -0,0 +1,33 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Proc unit tests. In runtime package so can use runtime guts.
+
+package runtime
+
+func RunStealOrderTest() {
+ var ord randomOrder
+ for procs := 1; procs <= 64; procs++ {
+ ord.reset(uint32(procs))
+ if procs >= 3 && len(ord.coprimes) < 2 {
+ panic("too few coprimes")
+ }
+ for co := 0; co < len(ord.coprimes); co++ {
+ enum := ord.start(uint32(co))
+ checked := make([]bool, procs)
+ for p := 0; p < procs; p++ {
+ x := enum.position()
+ if checked[x] {
+ println("procs:", procs, "inc:", enum.inc)
+ panic("duplicate during enumeration")
+ }
+ checked[x] = true
+ enum.next()
+ }
+ if !enum.done() {
+ panic("not done")
+ }
+ }
+ }
+}
diff --git a/src/runtime/proc_test.go b/src/runtime/proc_test.go
new file mode 100644
index 0000000..767bde1
--- /dev/null
+++ b/src/runtime/proc_test.go
@@ -0,0 +1,1090 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "internal/race"
+ "internal/testenv"
+ "math"
+ "net"
+ "runtime"
+ "runtime/debug"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "syscall"
+ "testing"
+ "time"
+)
+
+var stop = make(chan bool, 1)
+
+func perpetuumMobile() {
+ select {
+ case <-stop:
+ default:
+ go perpetuumMobile()
+ }
+}
+
+func TestStopTheWorldDeadlock(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no preemption on wasm yet")
+ }
+ if testing.Short() {
+ t.Skip("skipping during short test")
+ }
+ maxprocs := runtime.GOMAXPROCS(3)
+ compl := make(chan bool, 2)
+ go func() {
+ for i := 0; i != 1000; i += 1 {
+ runtime.GC()
+ }
+ compl <- true
+ }()
+ go func() {
+ for i := 0; i != 1000; i += 1 {
+ runtime.GOMAXPROCS(3)
+ }
+ compl <- true
+ }()
+ go perpetuumMobile()
+ <-compl
+ <-compl
+ stop <- true
+ runtime.GOMAXPROCS(maxprocs)
+}
+
+func TestYieldProgress(t *testing.T) {
+ testYieldProgress(false)
+}
+
+func TestYieldLockedProgress(t *testing.T) {
+ testYieldProgress(true)
+}
+
+func testYieldProgress(locked bool) {
+ c := make(chan bool)
+ cack := make(chan bool)
+ go func() {
+ if locked {
+ runtime.LockOSThread()
+ }
+ for {
+ select {
+ case <-c:
+ cack <- true
+ return
+ default:
+ runtime.Gosched()
+ }
+ }
+ }()
+ time.Sleep(10 * time.Millisecond)
+ c <- true
+ <-cack
+}
+
+func TestYieldLocked(t *testing.T) {
+ const N = 10
+ c := make(chan bool)
+ go func() {
+ runtime.LockOSThread()
+ for i := 0; i < N; i++ {
+ runtime.Gosched()
+ time.Sleep(time.Millisecond)
+ }
+ c <- true
+ // runtime.UnlockOSThread() is deliberately omitted
+ }()
+ <-c
+}
+
+func TestGoroutineParallelism(t *testing.T) {
+ if runtime.NumCPU() == 1 {
+ // Takes too long, too easy to deadlock, etc.
+ t.Skip("skipping on uniprocessor")
+ }
+ P := 4
+ N := 10
+ if testing.Short() {
+ P = 3
+ N = 3
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(P))
+ // If runtime triggers a forced GC during this test then it will deadlock,
+ // since the goroutines can't be stopped/preempted.
+ // Disable GC for this test (see issue #10958).
+ defer debug.SetGCPercent(debug.SetGCPercent(-1))
+ for try := 0; try < N; try++ {
+ done := make(chan bool)
+ x := uint32(0)
+ for p := 0; p < P; p++ {
+ // Test that all P goroutines are scheduled at the same time
+ go func(p int) {
+ for i := 0; i < 3; i++ {
+ expected := uint32(P*i + p)
+ for atomic.LoadUint32(&x) != expected {
+ }
+ atomic.StoreUint32(&x, expected+1)
+ }
+ done <- true
+ }(p)
+ }
+ for p := 0; p < P; p++ {
+ <-done
+ }
+ }
+}
+
+// Test that all runnable goroutines are scheduled at the same time.
+func TestGoroutineParallelism2(t *testing.T) {
+ testGoroutineParallelism2(t, false, false)
+ testGoroutineParallelism2(t, true, false)
+ testGoroutineParallelism2(t, false, true)
+ testGoroutineParallelism2(t, true, true)
+}
+
+func testGoroutineParallelism2(t *testing.T, load, netpoll bool) {
+ if runtime.NumCPU() == 1 {
+ // Takes too long, too easy to deadlock, etc.
+ t.Skip("skipping on uniprocessor")
+ }
+ P := 4
+ N := 10
+ if testing.Short() {
+ N = 3
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(P))
+ // If runtime triggers a forced GC during this test then it will deadlock,
+ // since the goroutines can't be stopped/preempted.
+ // Disable GC for this test (see issue #10958).
+ defer debug.SetGCPercent(debug.SetGCPercent(-1))
+ for try := 0; try < N; try++ {
+ if load {
+ // Create P goroutines and wait until they all run.
+ // When we run the actual test below, worker threads
+ // running the goroutines will start parking.
+ done := make(chan bool)
+ x := uint32(0)
+ for p := 0; p < P; p++ {
+ go func() {
+ if atomic.AddUint32(&x, 1) == uint32(P) {
+ done <- true
+ return
+ }
+ for atomic.LoadUint32(&x) != uint32(P) {
+ }
+ }()
+ }
+ <-done
+ }
+ if netpoll {
+ // Enable the netpoller, which affects scheduler behavior.
+ laddr := "localhost:0"
+ if runtime.GOOS == "android" {
+ // On some Android devices, there are no records for localhost,
+ // see https://golang.org/issues/14486.
+ // Don't use 127.0.0.1 for every case, it won't work on IPv6-only systems.
+ laddr = "127.0.0.1:0"
+ }
+ ln, err := net.Listen("tcp", laddr)
+ if err == nil {
+ defer ln.Close() // yup, defer in a loop
+ }
+ }
+ done := make(chan bool)
+ x := uint32(0)
+ // Spawn P goroutines in a nested fashion just to differ from TestGoroutineParallelism.
+ for p := 0; p < P/2; p++ {
+ go func(p int) {
+ for p2 := 0; p2 < 2; p2++ {
+ go func(p2 int) {
+ for i := 0; i < 3; i++ {
+ expected := uint32(P*i + p*2 + p2)
+ for atomic.LoadUint32(&x) != expected {
+ }
+ atomic.StoreUint32(&x, expected+1)
+ }
+ done <- true
+ }(p2)
+ }
+ }(p)
+ }
+ for p := 0; p < P; p++ {
+ <-done
+ }
+ }
+}
+
+func TestBlockLocked(t *testing.T) {
+ const N = 10
+ c := make(chan bool)
+ go func() {
+ runtime.LockOSThread()
+ for i := 0; i < N; i++ {
+ c <- true
+ }
+ runtime.UnlockOSThread()
+ }()
+ for i := 0; i < N; i++ {
+ <-c
+ }
+}
+
+func TestTimerFairness(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no preemption on wasm yet")
+ }
+
+ done := make(chan bool)
+ c := make(chan bool)
+ for i := 0; i < 2; i++ {
+ go func() {
+ for {
+ select {
+ case c <- true:
+ case <-done:
+ return
+ }
+ }
+ }()
+ }
+
+ timer := time.After(20 * time.Millisecond)
+ for {
+ select {
+ case <-c:
+ case <-timer:
+ close(done)
+ return
+ }
+ }
+}
+
+func TestTimerFairness2(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no preemption on wasm yet")
+ }
+
+ done := make(chan bool)
+ c := make(chan bool)
+ for i := 0; i < 2; i++ {
+ go func() {
+ timer := time.After(20 * time.Millisecond)
+ var buf [1]byte
+ for {
+ syscall.Read(0, buf[0:0])
+ select {
+ case c <- true:
+ case <-c:
+ case <-timer:
+ done <- true
+ return
+ }
+ }
+ }()
+ }
+ <-done
+ <-done
+}
+
+// The function is used to test preemption at split stack checks.
+// Declaring a var avoids inlining at the call site.
+var preempt = func() int {
+ var a [128]int
+ sum := 0
+ for _, v := range a {
+ sum += v
+ }
+ return sum
+}
+
+func TestPreemption(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no preemption on wasm yet")
+ }
+
+ // Test that goroutines are preempted at function calls.
+ N := 5
+ if testing.Short() {
+ N = 2
+ }
+ c := make(chan bool)
+ var x uint32
+ for g := 0; g < 2; g++ {
+ go func(g int) {
+ for i := 0; i < N; i++ {
+ for atomic.LoadUint32(&x) != uint32(g) {
+ preempt()
+ }
+ atomic.StoreUint32(&x, uint32(1-g))
+ }
+ c <- true
+ }(g)
+ }
+ <-c
+ <-c
+}
+
+func TestPreemptionGC(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no preemption on wasm yet")
+ }
+
+ // Test that pending GC preempts running goroutines.
+ P := 5
+ N := 10
+ if testing.Short() {
+ P = 3
+ N = 2
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(P + 1))
+ var stop uint32
+ for i := 0; i < P; i++ {
+ go func() {
+ for atomic.LoadUint32(&stop) == 0 {
+ preempt()
+ }
+ }()
+ }
+ for i := 0; i < N; i++ {
+ runtime.Gosched()
+ runtime.GC()
+ }
+ atomic.StoreUint32(&stop, 1)
+}
+
+func TestAsyncPreempt(t *testing.T) {
+ if !runtime.PreemptMSupported {
+ t.Skip("asynchronous preemption not supported on this platform")
+ }
+ output := runTestProg(t, "testprog", "AsyncPreempt")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+func TestGCFairness(t *testing.T) {
+ output := runTestProg(t, "testprog", "GCFairness")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+func TestGCFairness2(t *testing.T) {
+ output := runTestProg(t, "testprog", "GCFairness2")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+func TestNumGoroutine(t *testing.T) {
+ output := runTestProg(t, "testprog", "NumGoroutine")
+ want := "1\n"
+ if output != want {
+ t.Fatalf("want %q, got %q", want, output)
+ }
+
+ buf := make([]byte, 1<<20)
+
+ // Try up to 10 times for a match before giving up.
+ // This is a fundamentally racy check but it's important
+ // to notice if NumGoroutine and Stack are _always_ out of sync.
+ for i := 0; ; i++ {
+ // Give goroutines about to exit a chance to exit.
+ // The NumGoroutine and Stack below need to see
+ // the same state of the world, so anything we can do
+ // to keep it quiet is good.
+ runtime.Gosched()
+
+ n := runtime.NumGoroutine()
+ buf = buf[:runtime.Stack(buf, true)]
+
+ nstk := strings.Count(string(buf), "goroutine ")
+ if n == nstk {
+ break
+ }
+ if i >= 10 {
+ t.Fatalf("NumGoroutine=%d, but found %d goroutines in stack dump: %s", n, nstk, buf)
+ }
+ }
+}
+
+func TestPingPongHog(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no preemption on wasm yet")
+ }
+ if testing.Short() {
+ t.Skip("skipping in -short mode")
+ }
+ if race.Enabled {
+ // The race detector randomizes the scheduler,
+ // which causes this test to fail (#38266).
+ t.Skip("skipping in -race mode")
+ }
+
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))
+ done := make(chan bool)
+ hogChan, lightChan := make(chan bool), make(chan bool)
+ hogCount, lightCount := 0, 0
+
+ run := func(limit int, counter *int, wake chan bool) {
+ for {
+ select {
+ case <-done:
+ return
+
+ case <-wake:
+ for i := 0; i < limit; i++ {
+ *counter++
+ }
+ wake <- true
+ }
+ }
+ }
+
+ // Start two co-scheduled hog goroutines.
+ for i := 0; i < 2; i++ {
+ go run(1e6, &hogCount, hogChan)
+ }
+
+ // Start two co-scheduled light goroutines.
+ for i := 0; i < 2; i++ {
+ go run(1e3, &lightCount, lightChan)
+ }
+
+ // Start goroutine pairs and wait for a few preemption rounds.
+ hogChan <- true
+ lightChan <- true
+ time.Sleep(100 * time.Millisecond)
+ close(done)
+ <-hogChan
+ <-lightChan
+
+ // Check that hogCount and lightCount are within a factor of
+ // 5, which indicates that both pairs of goroutines handed off
+ // the P within a time-slice to their buddy. We can use a
+ // fairly large factor here to make this robust: if the
+ // scheduler isn't working right, the gap should be ~1000X.
+ const factor = 5
+ if hogCount > lightCount*factor || lightCount > hogCount*factor {
+ t.Fatalf("want hogCount/lightCount in [%v, %v]; got %d/%d = %g", 1.0/factor, factor, hogCount, lightCount, float64(hogCount)/float64(lightCount))
+ }
+}
+
+func BenchmarkPingPongHog(b *testing.B) {
+ if b.N == 0 {
+ return
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))
+
+ // Create a CPU hog
+ stop, done := make(chan bool), make(chan bool)
+ go func() {
+ for {
+ select {
+ case <-stop:
+ done <- true
+ return
+ default:
+ }
+ }
+ }()
+
+ // Ping-pong b.N times
+ ping, pong := make(chan bool), make(chan bool)
+ go func() {
+ for j := 0; j < b.N; j++ {
+ pong <- <-ping
+ }
+ close(stop)
+ done <- true
+ }()
+ go func() {
+ for i := 0; i < b.N; i++ {
+ ping <- <-pong
+ }
+ done <- true
+ }()
+ b.ResetTimer()
+ ping <- true // Start ping-pong
+ <-stop
+ b.StopTimer()
+ <-ping // Let last ponger exit
+ <-done // Make sure goroutines exit
+ <-done
+ <-done
+}
+
+var padData [128]uint64
+
+func stackGrowthRecursive(i int) {
+ var pad [128]uint64
+ pad = padData
+ for j := range pad {
+ if pad[j] != 0 {
+ return
+ }
+ }
+ if i != 0 {
+ stackGrowthRecursive(i - 1)
+ }
+}
+
+func TestPreemptSplitBig(t *testing.T) {
+ if testing.Short() {
+ t.Skip("skipping in -short mode")
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(2))
+ stop := make(chan int)
+ go big(stop)
+ for i := 0; i < 3; i++ {
+ time.Sleep(10 * time.Microsecond) // let big start running
+ runtime.GC()
+ }
+ close(stop)
+}
+
+func big(stop chan int) int {
+ n := 0
+ for {
+ // delay so that gc is sure to have asked for a preemption
+ for i := 0; i < 1e9; i++ {
+ n++
+ }
+
+ // call bigframe, which used to miss the preemption in its prologue.
+ bigframe(stop)
+
+ // check if we've been asked to stop.
+ select {
+ case <-stop:
+ return n
+ }
+ }
+}
+
+func bigframe(stop chan int) int {
+ // not splitting the stack will overflow.
+ // small will notice that it needs a stack split and will
+ // catch the overflow.
+ var x [8192]byte
+ return small(stop, &x)
+}
+
+func small(stop chan int, x *[8192]byte) int {
+ for i := range x {
+ x[i] = byte(i)
+ }
+ sum := 0
+ for i := range x {
+ sum += int(x[i])
+ }
+
+ // keep small from being a leaf function, which might
+ // make it not do any stack check at all.
+ nonleaf(stop)
+
+ return sum
+}
+
+func nonleaf(stop chan int) bool {
+ // do something that won't be inlined:
+ select {
+ case <-stop:
+ return true
+ default:
+ return false
+ }
+}
+
+func TestSchedLocalQueue(t *testing.T) {
+ runtime.RunSchedLocalQueueTest()
+}
+
+func TestSchedLocalQueueSteal(t *testing.T) {
+ runtime.RunSchedLocalQueueStealTest()
+}
+
+func TestSchedLocalQueueEmpty(t *testing.T) {
+ if runtime.NumCPU() == 1 {
+ // Takes too long and does not trigger the race.
+ t.Skip("skipping on uniprocessor")
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(4))
+
+ // If runtime triggers a forced GC during this test then it will deadlock,
+ // since the goroutines can't be stopped/preempted during spin wait.
+ defer debug.SetGCPercent(debug.SetGCPercent(-1))
+
+ iters := int(1e5)
+ if testing.Short() {
+ iters = 1e2
+ }
+ runtime.RunSchedLocalQueueEmptyTest(iters)
+}
+
+func benchmarkStackGrowth(b *testing.B, rec int) {
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ stackGrowthRecursive(rec)
+ }
+ })
+}
+
+func BenchmarkStackGrowth(b *testing.B) {
+ benchmarkStackGrowth(b, 10)
+}
+
+func BenchmarkStackGrowthDeep(b *testing.B) {
+ benchmarkStackGrowth(b, 1024)
+}
+
+func BenchmarkCreateGoroutines(b *testing.B) {
+ benchmarkCreateGoroutines(b, 1)
+}
+
+func BenchmarkCreateGoroutinesParallel(b *testing.B) {
+ benchmarkCreateGoroutines(b, runtime.GOMAXPROCS(-1))
+}
+
+func benchmarkCreateGoroutines(b *testing.B, procs int) {
+ c := make(chan bool)
+ var f func(n int)
+ f = func(n int) {
+ if n == 0 {
+ c <- true
+ return
+ }
+ go f(n - 1)
+ }
+ for i := 0; i < procs; i++ {
+ go f(b.N / procs)
+ }
+ for i := 0; i < procs; i++ {
+ <-c
+ }
+}
+
+func BenchmarkCreateGoroutinesCapture(b *testing.B) {
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ const N = 4
+ var wg sync.WaitGroup
+ wg.Add(N)
+ for i := 0; i < N; i++ {
+ i := i
+ go func() {
+ if i >= N {
+ b.Logf("bad") // just to capture b
+ }
+ wg.Done()
+ }()
+ }
+ wg.Wait()
+ }
+}
+
+func BenchmarkClosureCall(b *testing.B) {
+ sum := 0
+ off1 := 1
+ for i := 0; i < b.N; i++ {
+ off2 := 2
+ func() {
+ sum += i + off1 + off2
+ }()
+ }
+ _ = sum
+}
+
+func benchmarkWakeupParallel(b *testing.B, spin func(time.Duration)) {
+ if runtime.GOMAXPROCS(0) == 1 {
+ b.Skip("skipping: GOMAXPROCS=1")
+ }
+
+ wakeDelay := 5 * time.Microsecond
+ for _, delay := range []time.Duration{
+ 0,
+ 1 * time.Microsecond,
+ 2 * time.Microsecond,
+ 5 * time.Microsecond,
+ 10 * time.Microsecond,
+ 20 * time.Microsecond,
+ 50 * time.Microsecond,
+ 100 * time.Microsecond,
+ } {
+ b.Run(delay.String(), func(b *testing.B) {
+ if b.N == 0 {
+ return
+ }
+ // Start two goroutines, which alternate between being
+ // sender and receiver in the following protocol:
+ //
+ // - The receiver spins for `delay` and then does a
+ // blocking receive on a channel.
+ //
+ // - The sender spins for `delay+wakeDelay` and then
+ // sends to the same channel. (The addition of
+ // `wakeDelay` improves the probability that the
+ // receiver will be blocking when the send occurs when
+ // the goroutines execute in parallel.)
+ //
+ // In each iteration of the benchmark, each goroutine
+ // acts once as sender and once as receiver, so each
+ // goroutine spins for delay twice.
+ //
+ // BenchmarkWakeupParallel is used to estimate how
+ // efficiently the scheduler parallelizes goroutines in
+ // the presence of blocking:
+ //
+ // - If both goroutines are executed on the same core,
+ // an increase in delay by N will increase the time per
+ // iteration by 4*N, because all 4 delays are
+ // serialized.
+ //
+ // - Otherwise, an increase in delay by N will increase
+ // the time per iteration by 2*N, and the time per
+ // iteration is 2 * (runtime overhead + chan
+ // send/receive pair + delay + wakeDelay). This allows
+ // the runtime overhead, including the time it takes
+ // for the unblocked goroutine to be scheduled, to be
+ // estimated.
+ ping, pong := make(chan struct{}), make(chan struct{})
+ start := make(chan struct{})
+ done := make(chan struct{})
+ go func() {
+ <-start
+ for i := 0; i < b.N; i++ {
+ // sender
+ spin(delay + wakeDelay)
+ ping <- struct{}{}
+ // receiver
+ spin(delay)
+ <-pong
+ }
+ done <- struct{}{}
+ }()
+ go func() {
+ for i := 0; i < b.N; i++ {
+ // receiver
+ spin(delay)
+ <-ping
+ // sender
+ spin(delay + wakeDelay)
+ pong <- struct{}{}
+ }
+ done <- struct{}{}
+ }()
+ b.ResetTimer()
+ start <- struct{}{}
+ <-done
+ <-done
+ })
+ }
+}
+
+func BenchmarkWakeupParallelSpinning(b *testing.B) {
+ benchmarkWakeupParallel(b, func(d time.Duration) {
+ end := time.Now().Add(d)
+ for time.Now().Before(end) {
+ // do nothing
+ }
+ })
+}
+
+// sysNanosleep is defined by OS-specific files (such as runtime_linux_test.go)
+// to sleep for the given duration. If nil, dependent tests are skipped.
+// The implementation should invoke a blocking system call and not
+// call time.Sleep, which would deschedule the goroutine.
+var sysNanosleep func(d time.Duration)
+
+func BenchmarkWakeupParallelSyscall(b *testing.B) {
+ if sysNanosleep == nil {
+ b.Skipf("skipping on %v; sysNanosleep not defined", runtime.GOOS)
+ }
+ benchmarkWakeupParallel(b, func(d time.Duration) {
+ sysNanosleep(d)
+ })
+}
+
+type Matrix [][]float64
+
+func BenchmarkMatmult(b *testing.B) {
+ b.StopTimer()
+ // matmult is O(N**3) but testing expects O(b.N),
+ // so we need to take cube root of b.N
+ n := int(math.Cbrt(float64(b.N))) + 1
+ A := makeMatrix(n)
+ B := makeMatrix(n)
+ C := makeMatrix(n)
+ b.StartTimer()
+ matmult(nil, A, B, C, 0, n, 0, n, 0, n, 8)
+}
+
+func makeMatrix(n int) Matrix {
+ m := make(Matrix, n)
+ for i := 0; i < n; i++ {
+ m[i] = make([]float64, n)
+ for j := 0; j < n; j++ {
+ m[i][j] = float64(i*n + j)
+ }
+ }
+ return m
+}
+
+func matmult(done chan<- struct{}, A, B, C Matrix, i0, i1, j0, j1, k0, k1, threshold int) {
+ di := i1 - i0
+ dj := j1 - j0
+ dk := k1 - k0
+ if di >= dj && di >= dk && di >= threshold {
+ // divide in two by y axis
+ mi := i0 + di/2
+ done1 := make(chan struct{}, 1)
+ go matmult(done1, A, B, C, i0, mi, j0, j1, k0, k1, threshold)
+ matmult(nil, A, B, C, mi, i1, j0, j1, k0, k1, threshold)
+ <-done1
+ } else if dj >= dk && dj >= threshold {
+ // divide in two by x axis
+ mj := j0 + dj/2
+ done1 := make(chan struct{}, 1)
+ go matmult(done1, A, B, C, i0, i1, j0, mj, k0, k1, threshold)
+ matmult(nil, A, B, C, i0, i1, mj, j1, k0, k1, threshold)
+ <-done1
+ } else if dk >= threshold {
+ // divide in two by "k" axis
+ // deliberately not parallel because of data races
+ mk := k0 + dk/2
+ matmult(nil, A, B, C, i0, i1, j0, j1, k0, mk, threshold)
+ matmult(nil, A, B, C, i0, i1, j0, j1, mk, k1, threshold)
+ } else {
+ // the matrices are small enough, compute directly
+ for i := i0; i < i1; i++ {
+ for j := j0; j < j1; j++ {
+ for k := k0; k < k1; k++ {
+ C[i][j] += A[i][k] * B[k][j]
+ }
+ }
+ }
+ }
+ if done != nil {
+ done <- struct{}{}
+ }
+}
+
+func TestStealOrder(t *testing.T) {
+ runtime.RunStealOrderTest()
+}
+
+func TestLockOSThreadNesting(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no threads on wasm yet")
+ }
+
+ go func() {
+ e, i := runtime.LockOSCounts()
+ if e != 0 || i != 0 {
+ t.Errorf("want locked counts 0, 0; got %d, %d", e, i)
+ return
+ }
+ runtime.LockOSThread()
+ runtime.LockOSThread()
+ runtime.UnlockOSThread()
+ e, i = runtime.LockOSCounts()
+ if e != 1 || i != 0 {
+ t.Errorf("want locked counts 1, 0; got %d, %d", e, i)
+ return
+ }
+ runtime.UnlockOSThread()
+ e, i = runtime.LockOSCounts()
+ if e != 0 || i != 0 {
+ t.Errorf("want locked counts 0, 0; got %d, %d", e, i)
+ return
+ }
+ }()
+}
+
+func TestLockOSThreadExit(t *testing.T) {
+ testLockOSThreadExit(t, "testprog")
+}
+
+func testLockOSThreadExit(t *testing.T, prog string) {
+ output := runTestProg(t, prog, "LockOSThreadMain", "GOMAXPROCS=1")
+ want := "OK\n"
+ if output != want {
+ t.Errorf("want %q, got %q", want, output)
+ }
+
+ output = runTestProg(t, prog, "LockOSThreadAlt")
+ if output != want {
+ t.Errorf("want %q, got %q", want, output)
+ }
+}
+
+func TestLockOSThreadAvoidsStatePropagation(t *testing.T) {
+ want := "OK\n"
+ skip := "unshare not permitted\n"
+ output := runTestProg(t, "testprog", "LockOSThreadAvoidsStatePropagation", "GOMAXPROCS=1")
+ if output == skip {
+ t.Skip("unshare syscall not permitted on this system")
+ } else if output != want {
+ t.Errorf("want %q, got %q", want, output)
+ }
+}
+
+func TestLockOSThreadTemplateThreadRace(t *testing.T) {
+ testenv.MustHaveGoRun(t)
+
+ exe, err := buildTestProg(t, "testprog")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ iterations := 100
+ if testing.Short() {
+ // Reduce run time to ~100ms, with much lower probability of
+ // catching issues.
+ iterations = 5
+ }
+ for i := 0; i < iterations; i++ {
+ want := "OK\n"
+ output := runBuiltTestProg(t, exe, "LockOSThreadTemplateThreadRace")
+ if output != want {
+ t.Fatalf("run %d: want %q, got %q", i, want, output)
+ }
+ }
+}
+
+// fakeSyscall emulates a system call.
+//go:nosplit
+func fakeSyscall(duration time.Duration) {
+ runtime.Entersyscall()
+ for start := runtime.Nanotime(); runtime.Nanotime()-start < int64(duration); {
+ }
+ runtime.Exitsyscall()
+}
+
+// Check that a goroutine will be preempted if it is calling short system calls.
+func testPreemptionAfterSyscall(t *testing.T, syscallDuration time.Duration) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no preemption on wasm yet")
+ }
+
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(2))
+
+ iterations := 10
+ if testing.Short() {
+ iterations = 1
+ }
+ const (
+ maxDuration = 3 * time.Second
+ nroutines = 8
+ )
+
+ for i := 0; i < iterations; i++ {
+ c := make(chan bool, nroutines)
+ stop := uint32(0)
+
+ start := time.Now()
+ for g := 0; g < nroutines; g++ {
+ go func(stop *uint32) {
+ c <- true
+ for atomic.LoadUint32(stop) == 0 {
+ fakeSyscall(syscallDuration)
+ }
+ c <- true
+ }(&stop)
+ }
+ // wait until all goroutines have started.
+ for g := 0; g < nroutines; g++ {
+ <-c
+ }
+ atomic.StoreUint32(&stop, 1)
+ // wait until all goroutines have finished.
+ for g := 0; g < nroutines; g++ {
+ <-c
+ }
+ duration := time.Since(start)
+
+ if duration > maxDuration {
+ t.Errorf("timeout exceeded: %v (%v)", duration, maxDuration)
+ }
+ }
+}
+
+func TestPreemptionAfterSyscall(t *testing.T) {
+ for _, i := range []time.Duration{10, 100, 1000} {
+ d := i * time.Microsecond
+ t.Run(fmt.Sprint(d), func(t *testing.T) {
+ testPreemptionAfterSyscall(t, d)
+ })
+ }
+}
+
+func TestGetgThreadSwitch(t *testing.T) {
+ runtime.RunGetgThreadSwitchTest()
+}
+
+// TestNetpollBreak tests that netpollBreak can break a netpoll.
+// This test is not particularly safe since the call to netpoll
+// will pick up any stray files that are ready, but it should work
+// OK as long as it is not run in parallel.
+func TestNetpollBreak(t *testing.T) {
+ if runtime.GOMAXPROCS(0) == 1 {
+ t.Skip("skipping: GOMAXPROCS=1")
+ }
+
+ // Make sure that netpoll is initialized.
+ runtime.NetpollGenericInit()
+
+ start := time.Now()
+ c := make(chan bool, 2)
+ go func() {
+ c <- true
+ runtime.Netpoll(10 * time.Second.Nanoseconds())
+ c <- true
+ }()
+ <-c
+ // Loop because the break might get eaten by the scheduler.
+ // Break twice to break both the netpoll we started and the
+ // scheduler netpoll.
+loop:
+ for {
+ runtime.Usleep(100)
+ runtime.NetpollBreak()
+ runtime.NetpollBreak()
+ select {
+ case <-c:
+ break loop
+ default:
+ }
+ }
+ if dur := time.Since(start); dur > 5*time.Second {
+ t.Errorf("netpollBreak did not interrupt netpoll: slept for: %v", dur)
+ }
+}
+
+// TestBigGOMAXPROCS tests that setting GOMAXPROCS to a large value
+// doesn't cause a crash at startup. See issue 38474.
+func TestBigGOMAXPROCS(t *testing.T) {
+ t.Parallel()
+ output := runTestProg(t, "testprog", "NonexistentTest", "GOMAXPROCS=1024")
+ // Ignore error conditions on small machines.
+ for _, errstr := range []string{
+ "failed to create new OS thread",
+ "cannot allocate memory",
+ } {
+ if strings.Contains(output, errstr) {
+ t.Skipf("failed to create 1024 threads")
+ }
+ }
+ if !strings.Contains(output, "unknown function: NonexistentTest") {
+ t.Errorf("output:\n%s\nwanted:\nunknown function: NonexistentTest", output)
+ }
+}
diff --git a/src/runtime/profbuf.go b/src/runtime/profbuf.go
new file mode 100644
index 0000000..f40881a
--- /dev/null
+++ b/src/runtime/profbuf.go
@@ -0,0 +1,561 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// A profBuf is a lock-free buffer for profiling events,
+// safe for concurrent use by one reader and one writer.
+// The writer may be a signal handler running without a user g.
+// The reader is assumed to be a user g.
+//
+// Each logged event corresponds to a fixed size header, a list of
+// uintptrs (typically a stack), and exactly one unsafe.Pointer tag.
+// The header and uintptrs are stored in the circular buffer data and the
+// tag is stored in a circular buffer tags, running in parallel.
+// In the circular buffer data, each event takes 2+hdrsize+len(stk)
+// words: the value 2+hdrsize+len(stk), then the time of the event, then
+// hdrsize words giving the fixed-size header, and then len(stk) words
+// for the stack.
+//
+// The current effective offsets into the tags and data circular buffers
+// for reading and writing are stored in the high 30 and low 32 bits of r and w.
+// The bottom bits of the high 32 are additional flag bits in w, unused in r.
+// An "effective" offset is the total number of reads or writes, mod 2^length.
+// The offset in the buffer is the effective offset mod the length of the buffer.
+// To make wraparound mod 2^length match wraparound mod length of the buffer,
+// the length of the buffer must be a power of two.
+//
+// If the reader catches up to the writer, a flag passed to read controls
+// whether the read blocks until more data is available. A read returns a
+// pointer to the buffer data itself; the caller is assumed to be done with
+// that data at the next read. The read offset rNext tracks the next offset to
+// be returned by read. By definition, r ≤ rNext ≤ w (before wraparound),
+// and rNext is only used by the reader, so it can be accessed without atomics.
+//
+// If the writer gets ahead of the reader, so that the buffer fills,
+// future writes are discarded and replaced in the output stream by an
+// overflow entry, which has size 2+hdrsize+1, time set to the time of
+// the first discarded write, a header of all zeroed words, and a "stack"
+// containing one word, the number of discarded writes.
+//
+// Between the time the buffer fills and the buffer becomes empty enough
+// to hold more data, the overflow entry is stored as a pending overflow
+// entry in the fields overflow and overflowTime. The pending overflow
+// entry can be turned into a real record by either the writer or the
+// reader. If the writer is called to write a new record and finds that
+// the output buffer has room for both the pending overflow entry and the
+// new record, the writer emits the pending overflow entry and the new
+// record into the buffer. If the reader is called to read data and finds
+// that the output buffer is empty but that there is a pending overflow
+// entry, the reader will return a synthesized record for the pending
+// overflow entry.
+//
+// Only the writer can create or add to a pending overflow entry, but
+// either the reader or the writer can clear the pending overflow entry.
+// A pending overflow entry is indicated by the low 32 bits of 'overflow'
+// holding the number of discarded writes, and overflowTime holding the
+// time of the first discarded write. The high 32 bits of 'overflow'
+// increment each time the low 32 bits transition from zero to non-zero
+// or vice versa. This sequence number avoids ABA problems in the use of
+// compare-and-swap to coordinate between reader and writer.
+// The overflowTime is only written when the low 32 bits of overflow are
+// zero, that is, only when there is no pending overflow entry, in
+// preparation for creating a new one. The reader can therefore fetch and
+// clear the entry atomically using
+//
+// for {
+// overflow = load(&b.overflow)
+// if uint32(overflow) == 0 {
+// // no pending entry
+// break
+// }
+// time = load(&b.overflowTime)
+// if cas(&b.overflow, overflow, ((overflow>>32)+1)<<32) {
+// // pending entry cleared
+// break
+// }
+// }
+// if uint32(overflow) > 0 {
+// emit entry for uint32(overflow), time
+// }
+//
+type profBuf struct {
+ // accessed atomically
+ r, w profAtomic
+ overflow uint64
+ overflowTime uint64
+ eof uint32
+
+ // immutable (excluding slice content)
+ hdrsize uintptr
+ data []uint64
+ tags []unsafe.Pointer
+
+ // owned by reader
+ rNext profIndex
+ overflowBuf []uint64 // for use by reader to return overflow record
+ wait note
+}
+
+// A profAtomic is the atomically-accessed word holding a profIndex.
+type profAtomic uint64
+
+// A profIndex is the packed tag and data counts and flag bits, described above.
+type profIndex uint64
+
+const (
+ profReaderSleeping profIndex = 1 << 32 // reader is sleeping and must be woken up
+ profWriteExtra profIndex = 1 << 33 // overflow or eof waiting
+)
+
+func (x *profAtomic) load() profIndex {
+ return profIndex(atomic.Load64((*uint64)(x)))
+}
+
+func (x *profAtomic) store(new profIndex) {
+ atomic.Store64((*uint64)(x), uint64(new))
+}
+
+func (x *profAtomic) cas(old, new profIndex) bool {
+ return atomic.Cas64((*uint64)(x), uint64(old), uint64(new))
+}
+
+func (x profIndex) dataCount() uint32 {
+ return uint32(x)
+}
+
+func (x profIndex) tagCount() uint32 {
+ return uint32(x >> 34)
+}
+
+// countSub subtracts two counts obtained from profIndex.dataCount or profIndex.tagCount,
+// assuming that they are no more than 2^29 apart (guaranteed since they are never more than
+// len(data) or len(tags) apart, respectively).
+// tagCount wraps at 2^30, while dataCount wraps at 2^32.
+// This function works for both.
+func countSub(x, y uint32) int {
+ // x-y is 32-bit signed or 30-bit signed; sign-extend to 32 bits and convert to int.
+ return int(int32(x-y) << 2 >> 2)
+}
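+
+// For example (illustrative): with 30-bit tag counts that have wrapped, the call
+// countSub(1, (1<<30)-1) sign-extends the 30-bit difference and returns 2, the
+// true distance between the two counts.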
+
+// addCountsAndClearFlags returns the packed form of "x + (data, tag) - all flags".
+func (x profIndex) addCountsAndClearFlags(data, tag int) profIndex {
+ return profIndex((uint64(x)>>34+uint64(uint32(tag)<<2>>2))<<34 | uint64(uint32(x)+uint32(data)))
+}
+
+// hasOverflow reports whether b has any overflow records pending.
+func (b *profBuf) hasOverflow() bool {
+ return uint32(atomic.Load64(&b.overflow)) > 0
+}
+
+// takeOverflow consumes the pending overflow records, returning the overflow count
+// and the time of the first overflow.
+// When called by the reader, it is racing against incrementOverflow.
+func (b *profBuf) takeOverflow() (count uint32, time uint64) {
+ overflow := atomic.Load64(&b.overflow)
+ time = atomic.Load64(&b.overflowTime)
+ for {
+ count = uint32(overflow)
+ if count == 0 {
+ time = 0
+ break
+ }
+ // Increment generation, clear overflow count in low bits.
+ if atomic.Cas64(&b.overflow, overflow, ((overflow>>32)+1)<<32) {
+ break
+ }
+ overflow = atomic.Load64(&b.overflow)
+ time = atomic.Load64(&b.overflowTime)
+ }
+ return uint32(overflow), time
+}
+
+// incrementOverflow records a single overflow at time now.
+// It is racing against a possible takeOverflow in the reader.
+func (b *profBuf) incrementOverflow(now int64) {
+ for {
+ overflow := atomic.Load64(&b.overflow)
+
+ // Once we see b.overflow reach 0, it's stable: no one else is changing it underfoot.
+ // We need to set overflowTime if we're incrementing b.overflow from 0.
+ if uint32(overflow) == 0 {
+ // Store overflowTime first so it's always available when overflow != 0.
+ atomic.Store64(&b.overflowTime, uint64(now))
+ atomic.Store64(&b.overflow, (((overflow>>32)+1)<<32)+1)
+ break
+ }
+ // Otherwise we're racing to increment against reader
+ // who wants to set b.overflow to 0.
+ // Out of paranoia, leave 2³²-1 as a sticky overflow value,
+ // to avoid wrapping around. Extremely unlikely.
+ if int32(overflow) == -1 {
+ break
+ }
+ if atomic.Cas64(&b.overflow, overflow, overflow+1) {
+ break
+ }
+ }
+}
+
+// newProfBuf returns a new profiling buffer with room for
+// a header of hdrsize words and a buffer of at least bufwords words.
+func newProfBuf(hdrsize, bufwords, tags int) *profBuf {
+ if min := 2 + hdrsize + 1; bufwords < min {
+ bufwords = min
+ }
+
+ // Buffer sizes must be power of two, so that we don't have to
+ // worry about uint32 wraparound changing the effective position
+ // within the buffers. We store 30 bits of count; limiting to 28
+ // gives us some room for intermediate calculations.
+ if bufwords >= 1<<28 || tags >= 1<<28 {
+ throw("newProfBuf: buffer too large")
+ }
+ var i int
+ for i = 1; i < bufwords; i <<= 1 {
+ }
+ bufwords = i
+ for i = 1; i < tags; i <<= 1 {
+ }
+ tags = i
+
+ b := new(profBuf)
+ b.hdrsize = uintptr(hdrsize)
+ b.data = make([]uint64, bufwords)
+ b.tags = make([]unsafe.Pointer, tags)
+ b.overflowBuf = make([]uint64, 2+b.hdrsize+1)
+ return b
+}
+
+// canWriteRecord reports whether the buffer has room
+// for a single contiguous record with a stack of length nstk.
+func (b *profBuf) canWriteRecord(nstk int) bool {
+ br := b.r.load()
+ bw := b.w.load()
+
+ // room for tag?
+ if countSub(br.tagCount(), bw.tagCount())+len(b.tags) < 1 {
+ return false
+ }
+
+ // room for data?
+ nd := countSub(br.dataCount(), bw.dataCount()) + len(b.data)
+ want := 2 + int(b.hdrsize) + nstk
+ i := int(bw.dataCount() % uint32(len(b.data)))
+ if i+want > len(b.data) {
+ // Can't fit in trailing fragment of slice.
+ // Skip over that and start over at beginning of slice.
+ nd -= len(b.data) - i
+ }
+ return nd >= want
+}
+
+// canWriteTwoRecords reports whether the buffer has room
+// for two records with stack lengths nstk1, nstk2, in that order.
+// Each record must be contiguous on its own, but the two
+// records need not be contiguous (one can be at the end of the buffer
+// and the other can wrap around and start at the beginning of the buffer).
+func (b *profBuf) canWriteTwoRecords(nstk1, nstk2 int) bool {
+ br := b.r.load()
+ bw := b.w.load()
+
+ // room for tag?
+ if countSub(br.tagCount(), bw.tagCount())+len(b.tags) < 2 {
+ return false
+ }
+
+ // room for data?
+ nd := countSub(br.dataCount(), bw.dataCount()) + len(b.data)
+
+ // first record
+ want := 2 + int(b.hdrsize) + nstk1
+ i := int(bw.dataCount() % uint32(len(b.data)))
+ if i+want > len(b.data) {
+ // Can't fit in trailing fragment of slice.
+ // Skip over that and start over at beginning of slice.
+ nd -= len(b.data) - i
+ i = 0
+ }
+ i += want
+ nd -= want
+
+ // second record
+ want = 2 + int(b.hdrsize) + nstk2
+ if i+want > len(b.data) {
+ // Can't fit in trailing fragment of slice.
+ // Skip over that and start over at beginning of slice.
+ nd -= len(b.data) - i
+ i = 0
+ }
+ return nd >= want
+}
+
+// write writes an entry to the profiling buffer b.
+// The entry begins with a fixed hdr, which must have
+// length b.hdrsize, followed by a variable-sized stack
+// and a single tag pointer *tagPtr (or nil if tagPtr is nil).
+// No write barriers allowed because this might be called from a signal handler.
+func (b *profBuf) write(tagPtr *unsafe.Pointer, now int64, hdr []uint64, stk []uintptr) {
+ if b == nil {
+ return
+ }
+ if len(hdr) > int(b.hdrsize) {
+ throw("misuse of profBuf.write")
+ }
+
+ if hasOverflow := b.hasOverflow(); hasOverflow && b.canWriteTwoRecords(1, len(stk)) {
+ // Room for both an overflow record and the one being written.
+ // Write the overflow record if the reader hasn't gotten to it yet.
+ // Only racing against reader, not other writers.
+ count, time := b.takeOverflow()
+ if count > 0 {
+ var stk [1]uintptr
+ stk[0] = uintptr(count)
+ b.write(nil, int64(time), nil, stk[:])
+ }
+ } else if hasOverflow || !b.canWriteRecord(len(stk)) {
+ // Pending overflow without room to write overflow and new records
+ // or no overflow but also no room for new record.
+ b.incrementOverflow(now)
+ b.wakeupExtra()
+ return
+ }
+
+ // There's room: write the record.
+ br := b.r.load()
+ bw := b.w.load()
+
+ // Profiling tag
+ //
+ // The tag is a pointer, but we can't run a write barrier here.
+ // We have interrupted the OS-level execution of gp, but the
+ // runtime still sees gp as executing. In effect, we are running
+ // in place of the real gp. Since gp is the only goroutine that
+ // can overwrite gp.labels, the value of gp.labels is stable during
+ // this signal handler: it will still be reachable from gp when
+ // we finish executing. If a GC is in progress right now, it must
+ // keep gp.labels alive, because gp.labels is reachable from gp.
+ // If gp were to overwrite gp.labels, the deletion barrier would
+ // still shade that pointer, which would preserve it for the
+ // in-progress GC, so all is well. Any future GC will see the
+ // value we copied when scanning b.tags (heap-allocated).
+ // We arrange that the store here is always overwriting a nil,
+ // so there is no need for a deletion barrier on b.tags[wt].
+ wt := int(bw.tagCount() % uint32(len(b.tags)))
+ if tagPtr != nil {
+ *(*uintptr)(unsafe.Pointer(&b.tags[wt])) = uintptr(unsafe.Pointer(*tagPtr))
+ }
+
+ // Main record.
+ // It has to fit in a contiguous section of the slice, so if it doesn't fit at the end,
+ // leave a rewind marker (0) and start over at the beginning of the slice.
+ wd := int(bw.dataCount() % uint32(len(b.data)))
+ nd := countSub(br.dataCount(), bw.dataCount()) + len(b.data)
+ skip := 0
+ if wd+2+int(b.hdrsize)+len(stk) > len(b.data) {
+ b.data[wd] = 0
+ skip = len(b.data) - wd
+ nd -= skip
+ wd = 0
+ }
+ data := b.data[wd:]
+ data[0] = uint64(2 + b.hdrsize + uintptr(len(stk))) // length
+ data[1] = uint64(now) // time stamp
+ // header, zero-padded
+ i := uintptr(copy(data[2:2+b.hdrsize], hdr))
+ for ; i < b.hdrsize; i++ {
+ data[2+i] = 0
+ }
+ for i, pc := range stk {
+ data[2+b.hdrsize+uintptr(i)] = uint64(pc)
+ }
+
+ for {
+ // Commit write.
+ // Racing with reader setting flag bits in b.w, to avoid lost wakeups.
+ old := b.w.load()
+ new := old.addCountsAndClearFlags(skip+2+len(stk)+int(b.hdrsize), 1)
+ if !b.w.cas(old, new) {
+ continue
+ }
+ // If there was a reader, wake it up.
+ if old&profReaderSleeping != 0 {
+ notewakeup(&b.wait)
+ }
+ break
+ }
+}
+
+// close signals that there will be no more writes on the buffer.
+// Once all the data has been read from the buffer, reads will return eof=true.
+func (b *profBuf) close() {
+ if atomic.Load(&b.eof) > 0 {
+ throw("runtime: profBuf already closed")
+ }
+ atomic.Store(&b.eof, 1)
+ b.wakeupExtra()
+}
+
+// wakeupExtra must be called after setting one of the "extra"
+// atomic fields b.overflow or b.eof.
+// It records the change in b.w and wakes up the reader if needed.
+func (b *profBuf) wakeupExtra() {
+ for {
+ old := b.w.load()
+ new := old | profWriteExtra
+ if !b.w.cas(old, new) {
+ continue
+ }
+ if old&profReaderSleeping != 0 {
+ notewakeup(&b.wait)
+ }
+ break
+ }
+}
+
+// profBufReadMode specifies whether to block when no data is available to read.
+type profBufReadMode int
+
+const (
+ profBufBlocking profBufReadMode = iota
+ profBufNonBlocking
+)
+
+var overflowTag [1]unsafe.Pointer // always nil
+
+func (b *profBuf) read(mode profBufReadMode) (data []uint64, tags []unsafe.Pointer, eof bool) {
+ if b == nil {
+ return nil, nil, true
+ }
+
+ br := b.rNext
+
+ // Commit previous read, returning that part of the ring to the writer.
+ // First clear tags that have now been read, both to avoid holding
+ // up the memory they point at for longer than necessary
+ // and so that b.write can assume it is always overwriting
+ // nil tag entries (see comment in b.write).
+ rPrev := b.r.load()
+ if rPrev != br {
+ ntag := countSub(br.tagCount(), rPrev.tagCount())
+ ti := int(rPrev.tagCount() % uint32(len(b.tags)))
+ for i := 0; i < ntag; i++ {
+ b.tags[ti] = nil
+ if ti++; ti == len(b.tags) {
+ ti = 0
+ }
+ }
+ b.r.store(br)
+ }
+
+Read:
+ bw := b.w.load()
+ numData := countSub(bw.dataCount(), br.dataCount())
+ if numData == 0 {
+ if b.hasOverflow() {
+ // No data to read, but there is overflow to report.
+ // Racing with writer flushing b.overflow into a real record.
+ count, time := b.takeOverflow()
+ if count == 0 {
+ // Lost the race, go around again.
+ goto Read
+ }
+ // Won the race, report overflow.
+ dst := b.overflowBuf
+ dst[0] = uint64(2 + b.hdrsize + 1)
+ dst[1] = uint64(time)
+ for i := uintptr(0); i < b.hdrsize; i++ {
+ dst[2+i] = 0
+ }
+ dst[2+b.hdrsize] = uint64(count)
+ return dst[:2+b.hdrsize+1], overflowTag[:1], false
+ }
+ if atomic.Load(&b.eof) > 0 {
+ // No data, no overflow, EOF set: done.
+ return nil, nil, true
+ }
+ if bw&profWriteExtra != 0 {
+ // Writer claims to have published extra information (overflow or eof).
+ // Attempt to clear notification and then check again.
+ // If we fail to clear the notification it means b.w changed,
+ // so we still need to check again.
+ b.w.cas(bw, bw&^profWriteExtra)
+ goto Read
+ }
+
+ // Nothing to read right now.
+ // Return or sleep according to mode.
+ if mode == profBufNonBlocking {
+ return nil, nil, false
+ }
+ if !b.w.cas(bw, bw|profReaderSleeping) {
+ goto Read
+ }
+ // Committed to sleeping.
+ notetsleepg(&b.wait, -1)
+ noteclear(&b.wait)
+ goto Read
+ }
+ data = b.data[br.dataCount()%uint32(len(b.data)):]
+ if len(data) > numData {
+ data = data[:numData]
+ } else {
+ numData -= len(data) // available in case of wraparound
+ }
+ skip := 0
+ if data[0] == 0 {
+ // Wraparound record. Go back to the beginning of the ring.
+ skip = len(data)
+ data = b.data
+ if len(data) > numData {
+ data = data[:numData]
+ }
+ }
+
+ ntag := countSub(bw.tagCount(), br.tagCount())
+ if ntag == 0 {
+ throw("runtime: malformed profBuf buffer - tag and data out of sync")
+ }
+ tags = b.tags[br.tagCount()%uint32(len(b.tags)):]
+ if len(tags) > ntag {
+ tags = tags[:ntag]
+ }
+
+ // Count out whole data records until either data or tags is done.
+ // They are always in sync in the buffer, but due to an end-of-slice
+ // wraparound we might need to stop early and return the rest
+ // in the next call.
+ di := 0
+ ti := 0
+ for di < len(data) && data[di] != 0 && ti < len(tags) {
+ if uintptr(di)+uintptr(data[di]) > uintptr(len(data)) {
+ throw("runtime: malformed profBuf buffer - invalid size")
+ }
+ di += int(data[di])
+ ti++
+ }
+
+ // Remember how much we returned, to commit read on next call.
+ b.rNext = br.addCountsAndClearFlags(skip+di, ti)
+
+ if raceenabled {
+ // Match racereleasemerge in runtime_setProfLabel,
+ // so that the setting of the labels in runtime_setProfLabel
+ // is treated as happening before any use of the labels
+ // by our caller. The synchronization on labelSync itself is a fiction
+ // for the race detector. The actual synchronization is handled
+ // by the fact that the signal handler only reads from the current
+ // goroutine and uses atomics to write the updated queue indices,
+ // and then the read-out from the signal handler buffer uses
+ // atomics to read those queue indices.
+ raceacquire(unsafe.Pointer(&labelSync))
+ }
+
+ return data[:di], tags[:ti], false
+}
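For reference, here is how a consumer of read might split the returned slices into individual records, following the layout produced by write: word 0 holds the total record length in words, word 1 the timestamp, then hdrsize header words, then the stack PCs; the synthesized overflow record has a nil tag and carries the drop count in its single trailing word. Rewind markers are consumed inside read, so the returned slice contains only whole records. This is a sketch, not an exported API; the record type and the hdrsize parameter are assumptions of the caller.

package profdecode

import "unsafe"

// record mirrors the wire format described above; it is not a runtime type.
type record struct {
	time int64
	hdr  []uint64
	stk  []uint64
	tag  unsafe.Pointer // nil for an overflow record
}

// decode walks the data and tag slices returned by one call to read.
// The two slices always describe the same number of whole records.
func decode(data []uint64, tags []unsafe.Pointer, hdrsize int) []record {
	var out []record
	for len(data) > 0 && len(tags) > 0 {
		n := int(data[0]) // total length in words, including the 2-word prefix
		out = append(out, record{
			time: int64(data[1]),
			hdr:  data[2 : 2+hdrsize],
			stk:  data[2+hdrsize : n],
			tag:  tags[0],
		})
		data = data[n:]
		tags = tags[1:]
	}
	return out
}

Applied to the first test case below, the record {10, 1, 2, 3, 4, 5, 6, 7, 8, 9} with hdrsize 2 decodes to timestamp 1, header {2, 3}, and stack {4, 5, 6, 7, 8, 9}.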
diff --git a/src/runtime/profbuf_test.go b/src/runtime/profbuf_test.go
new file mode 100644
index 0000000..d9c5264
--- /dev/null
+++ b/src/runtime/profbuf_test.go
@@ -0,0 +1,182 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "reflect"
+ . "runtime"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+func TestProfBuf(t *testing.T) {
+ const hdrSize = 2
+
+ write := func(t *testing.T, b *ProfBuf, tag unsafe.Pointer, now int64, hdr []uint64, stk []uintptr) {
+ b.Write(&tag, now, hdr, stk)
+ }
+ read := func(t *testing.T, b *ProfBuf, data []uint64, tags []unsafe.Pointer) {
+ rdata, rtags, eof := b.Read(ProfBufNonBlocking)
+ if !reflect.DeepEqual(rdata, data) || !reflect.DeepEqual(rtags, tags) {
+ t.Fatalf("unexpected profile read:\nhave data %#x\nwant data %#x\nhave tags %#x\nwant tags %#x", rdata, data, rtags, tags)
+ }
+ if eof {
+ t.Fatalf("unexpected eof")
+ }
+ }
+ readBlock := func(t *testing.T, b *ProfBuf, data []uint64, tags []unsafe.Pointer) func() {
+ c := make(chan int)
+ go func() {
+ eof := data == nil
+ rdata, rtags, reof := b.Read(ProfBufBlocking)
+ if !reflect.DeepEqual(rdata, data) || !reflect.DeepEqual(rtags, tags) || reof != eof {
+ // Errorf, not Fatalf, because called in goroutine.
+ t.Errorf("unexpected profile read:\nhave data %#x\nwant data %#x\nhave tags %#x\nwant tags %#x\nhave eof=%v, want %v", rdata, data, rtags, tags, reof, eof)
+ }
+ c <- 1
+ }()
+ time.Sleep(10 * time.Millisecond) // let goroutine run and block
+ return func() {
+ select {
+ case <-c:
+ case <-time.After(1 * time.Second):
+ t.Fatalf("timeout waiting for blocked read")
+ }
+ }
+ }
+ readEOF := func(t *testing.T, b *ProfBuf) {
+ rdata, rtags, eof := b.Read(ProfBufBlocking)
+ if rdata != nil || rtags != nil || !eof {
+ t.Errorf("unexpected profile read: %#x, %#x, eof=%v; want nil, nil, eof=true", rdata, rtags, eof)
+ }
+ rdata, rtags, eof = b.Read(ProfBufNonBlocking)
+ if rdata != nil || rtags != nil || !eof {
+ t.Errorf("unexpected profile read (non-blocking): %#x, %#x, eof=%v; want nil, nil, eof=true", rdata, rtags, eof)
+ }
+ }
+
+ myTags := make([]byte, 100)
+ t.Logf("myTags is %p", &myTags[0])
+
+ t.Run("BasicWriteRead", func(t *testing.T) {
+ b := NewProfBuf(2, 11, 1)
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])})
+ read(t, b, nil, nil) // release data returned by previous read
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ read(t, b, []uint64{8, 99, 101, 102, 201, 202, 203, 204}, []unsafe.Pointer{unsafe.Pointer(&myTags[2])})
+ })
+
+ t.Run("ReadMany", func(t *testing.T) {
+ b := NewProfBuf(2, 50, 50)
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ write(t, b, unsafe.Pointer(&myTags[1]), 500, []uint64{502, 504}, []uintptr{506})
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 99, 101, 102, 201, 202, 203, 204, 5, 500, 502, 504, 506}, []unsafe.Pointer{unsafe.Pointer(&myTags[0]), unsafe.Pointer(&myTags[2]), unsafe.Pointer(&myTags[1])})
+ })
+
+ t.Run("ReadManyShortData", func(t *testing.T) {
+ b := NewProfBuf(2, 50, 50)
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 99, 101, 102, 201, 202, 203, 204}, []unsafe.Pointer{unsafe.Pointer(&myTags[0]), unsafe.Pointer(&myTags[2])})
+ })
+
+ t.Run("ReadManyShortTags", func(t *testing.T) {
+ b := NewProfBuf(2, 50, 50)
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 99, 101, 102, 201, 202, 203, 204}, []unsafe.Pointer{unsafe.Pointer(&myTags[0]), unsafe.Pointer(&myTags[2])})
+ })
+
+ t.Run("ReadAfterOverflow1", func(t *testing.T) {
+ // overflow record synthesized by write
+ b := NewProfBuf(2, 16, 5)
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9}) // uses 10
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])}) // reads 10 but still in use until next read
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5}) // uses 6
+ read(t, b, []uint64{6, 1, 2, 3, 4, 5}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])}) // reads 6 but still in use until next read
+ // now 10 available
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204, 205, 206, 207, 208, 209}) // no room
+ for i := 0; i < 299; i++ {
+ write(t, b, unsafe.Pointer(&myTags[3]), int64(100+i), []uint64{101, 102}, []uintptr{201, 202, 203, 204}) // no room for overflow+this record
+ }
+ write(t, b, unsafe.Pointer(&myTags[1]), 500, []uint64{502, 504}, []uintptr{506}) // room for overflow+this record
+ read(t, b, []uint64{5, 99, 0, 0, 300, 5, 500, 502, 504, 506}, []unsafe.Pointer{nil, unsafe.Pointer(&myTags[1])})
+ })
+
+ t.Run("ReadAfterOverflow2", func(t *testing.T) {
+ // overflow record synthesized by read
+ b := NewProfBuf(2, 16, 5)
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213})
+ for i := 0; i < 299; i++ {
+ write(t, b, unsafe.Pointer(&myTags[3]), 100, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ }
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])}) // reads 10 but still in use until next read
+ write(t, b, unsafe.Pointer(&myTags[1]), 500, []uint64{502, 504}, []uintptr{}) // still overflow
+ read(t, b, []uint64{5, 99, 0, 0, 301}, []unsafe.Pointer{nil}) // overflow synthesized by read
+ write(t, b, unsafe.Pointer(&myTags[1]), 500, []uint64{502, 505}, []uintptr{506}) // written
+ read(t, b, []uint64{5, 500, 502, 505, 506}, []unsafe.Pointer{unsafe.Pointer(&myTags[1])})
+ })
+
+ t.Run("ReadAtEndAfterOverflow", func(t *testing.T) {
+ b := NewProfBuf(2, 12, 5)
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ for i := 0; i < 299; i++ {
+ write(t, b, unsafe.Pointer(&myTags[3]), 100, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ }
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])})
+ read(t, b, []uint64{5, 99, 0, 0, 300}, []unsafe.Pointer{nil})
+ write(t, b, unsafe.Pointer(&myTags[1]), 500, []uint64{502, 504}, []uintptr{506})
+ read(t, b, []uint64{5, 500, 502, 504, 506}, []unsafe.Pointer{unsafe.Pointer(&myTags[1])})
+ })
+
+ t.Run("BlockingWriteRead", func(t *testing.T) {
+ b := NewProfBuf(2, 11, 1)
+ wait := readBlock(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])})
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ wait()
+ wait = readBlock(t, b, []uint64{8, 99, 101, 102, 201, 202, 203, 204}, []unsafe.Pointer{unsafe.Pointer(&myTags[2])})
+ time.Sleep(10 * time.Millisecond)
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ wait()
+ wait = readBlock(t, b, nil, nil)
+ b.Close()
+ wait()
+ wait = readBlock(t, b, nil, nil)
+ wait()
+ readEOF(t, b)
+ })
+
+ t.Run("DataWraparound", func(t *testing.T) {
+ b := NewProfBuf(2, 16, 1024)
+ for i := 0; i < 10; i++ {
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])})
+ read(t, b, nil, nil) // release data returned by previous read
+ }
+ })
+
+ t.Run("TagWraparound", func(t *testing.T) {
+ b := NewProfBuf(2, 1024, 2)
+ for i := 0; i < 10; i++ {
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])})
+ read(t, b, nil, nil) // release data returned by previous read
+ }
+ })
+
+ t.Run("BothWraparound", func(t *testing.T) {
+ b := NewProfBuf(2, 16, 2)
+ for i := 0; i < 10; i++ {
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])})
+ read(t, b, nil, nil) // release data returned by previous read
+ }
+ })
+}
diff --git a/src/runtime/proflabel.go b/src/runtime/proflabel.go
new file mode 100644
index 0000000..b2a1617
--- /dev/null
+++ b/src/runtime/proflabel.go
@@ -0,0 +1,40 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+var labelSync uintptr
+
+//go:linkname runtime_setProfLabel runtime/pprof.runtime_setProfLabel
+func runtime_setProfLabel(labels unsafe.Pointer) {
+ // Introduce race edge for read-back via profile.
+ // This would more properly use &getg().labels as the sync address,
+ // but we do the read in a signal handler and can't call the race runtime then.
+ //
+ // This uses racereleasemerge rather than just racerelease so
+ // the acquire in profBuf.read synchronizes with *all* prior
+ // setProfLabel operations, not just the most recent one. This
+ // is important because profBuf.read will observe different
+ // labels set by different setProfLabel operations on
+ // different goroutines, so it needs to synchronize with all
+ // of them (this wouldn't be an issue if we could synchronize
+ // on &getg().labels since we would synchronize with each
+ // most-recent labels write separately.)
+ //
+ // racereleasemerge is like a full read-modify-write on
+ // labelSync, rather than just a store-release, so it carries
+ // a dependency on the previous racereleasemerge, which
+ // ultimately carries forward to the acquire in profBuf.read.
+ if raceenabled {
+ racereleasemerge(unsafe.Pointer(&labelSync))
+ }
+ getg().labels = labels
+}
+
+//go:linkname runtime_getProfLabel runtime/pprof.runtime_getProfLabel
+func runtime_getProfLabel() unsafe.Pointer {
+ return getg().labels
+}
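These two hooks are the runtime half of the runtime/pprof label API: pprof stores the current label set through runtime_setProfLabel, and the CPU profile signal handler copies getg().labels into the profile buffer as the tag attached to each sample. A minimal sketch of the user-facing side (the output destination and the workload are arbitrary choices of this example):

package main

import (
	"context"
	"os"
	"runtime/pprof"
)

func main() {
	if err := pprof.StartCPUProfile(os.Stdout); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	// Samples taken while spin runs carry the worker=demo label, because
	// Do installs the label set on this goroutine for the duration of f.
	pprof.Do(context.Background(), pprof.Labels("worker", "demo"), func(ctx context.Context) {
		spin()
	})
}

// spin burns CPU so the profiler has something to sample.
func spin() {
	x := 0
	for i := 0; i < 1e8; i++ {
		x += i
	}
	_ = x
}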
diff --git a/src/runtime/race.go b/src/runtime/race.go
new file mode 100644
index 0000000..79fd217
--- /dev/null
+++ b/src/runtime/race.go
@@ -0,0 +1,644 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build race
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+// Public race detection API, present iff built with -race.
+
+func RaceRead(addr unsafe.Pointer)
+func RaceWrite(addr unsafe.Pointer)
+func RaceReadRange(addr unsafe.Pointer, len int)
+func RaceWriteRange(addr unsafe.Pointer, len int)
+
+func RaceErrors() int {
+ var n uint64
+ racecall(&__tsan_report_count, uintptr(unsafe.Pointer(&n)), 0, 0, 0)
+ return int(n)
+}
+
+//go:nosplit
+
+// RaceAcquire/RaceRelease/RaceReleaseMerge establish happens-before relations
+// between goroutines. These inform the race detector about actual synchronization
+// that it can't see for some reason (e.g. synchronization within RaceDisable/RaceEnable
+// sections of code).
+// RaceAcquire establishes a happens-before relation with the preceding
+// RaceReleaseMerge on addr up to and including the last RaceRelease on addr.
+// In terms of the C memory model (C11 §5.1.2.4, §7.17.3),
+// RaceAcquire is equivalent to atomic_load(memory_order_acquire).
+func RaceAcquire(addr unsafe.Pointer) {
+ raceacquire(addr)
+}
+
+//go:nosplit
+
+// RaceRelease performs a release operation on addr that
+// can synchronize with a later RaceAcquire on addr.
+//
+// In terms of the C memory model, RaceRelease is equivalent to
+// atomic_store(memory_order_release).
+func RaceRelease(addr unsafe.Pointer) {
+ racerelease(addr)
+}
+
+//go:nosplit
+
+// RaceReleaseMerge is like RaceRelease, but also establishes a happens-before
+// relation with the preceding RaceRelease or RaceReleaseMerge on addr.
+//
+// In terms of the C memory model, RaceReleaseMerge is equivalent to
+// atomic_exchange(memory_order_release).
+func RaceReleaseMerge(addr unsafe.Pointer) {
+ racereleasemerge(addr)
+}
+
+//go:nosplit
+
+// RaceDisable disables handling of race synchronization events in the current goroutine.
+// Handling is re-enabled with RaceEnable. RaceDisable/RaceEnable can be nested.
+// Non-synchronization events (memory accesses, function entry/exit) still affect
+// the race detector.
+func RaceDisable() {
+ _g_ := getg()
+ if _g_.raceignore == 0 {
+ racecall(&__tsan_go_ignore_sync_begin, _g_.racectx, 0, 0, 0)
+ }
+ _g_.raceignore++
+}
+
+//go:nosplit
+
+// RaceEnable re-enables handling of race events in the current goroutine.
+func RaceEnable() {
+ _g_ := getg()
+ _g_.raceignore--
+ if _g_.raceignore == 0 {
+ racecall(&__tsan_go_ignore_sync_end, _g_.racectx, 0, 0, 0)
+ }
+}
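Because the functions above exist only in race-enabled builds, user code that wants to describe synchronization the detector cannot observe typically keeps the calls in a file guarded by the race build tag, with a no-op twin for regular builds. A hypothetical sketch, assuming some external mechanism (for example, a handshake through C code or shared memory) really does order producer before consumer:

// +build race

package racedemo

import (
	"runtime"
	"unsafe"
)

var payload []byte // handed off through a mechanism the detector can't see
var syncToken int  // dummy location used purely as a synchronization address

// producer fills payload and records a release on syncToken, telling the
// detector about the ordering that the external mechanism provides.
func producer(data []byte) {
	payload = data
	runtime.RaceRelease(unsafe.Pointer(&syncToken))
}

// consumer must only run after producer has returned; the acquire pairs
// with the release above, so the read of payload is not reported as a race.
func consumer() []byte {
	runtime.RaceAcquire(unsafe.Pointer(&syncToken))
	return payload
}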
+
+// Private interface for the runtime.
+
+const raceenabled = true
+
+// For all functions accepting callerpc and pc,
+// callerpc is the return PC of the function that calls this function,
+// and pc is the start PC of the function that calls this function.
+func raceReadObjectPC(t *_type, addr unsafe.Pointer, callerpc, pc uintptr) {
+ kind := t.kind & kindMask
+ if kind == kindArray || kind == kindStruct {
+ // for composite objects we have to read every address
+ // because a write might happen to any subobject.
+ racereadrangepc(addr, t.size, callerpc, pc)
+ } else {
+ // for non-composite objects we can read just the start
+ // address, as any write must write the first byte.
+ racereadpc(addr, callerpc, pc)
+ }
+}
+
+func raceWriteObjectPC(t *_type, addr unsafe.Pointer, callerpc, pc uintptr) {
+ kind := t.kind & kindMask
+ if kind == kindArray || kind == kindStruct {
+ // for composite objects we have to write every address
+ // because a write might happen to any subobject.
+ racewriterangepc(addr, t.size, callerpc, pc)
+ } else {
+ // for non-composite objects we can write just the start
+ // address, as any write must write the first byte.
+ racewritepc(addr, callerpc, pc)
+ }
+}
+
+//go:noescape
+func racereadpc(addr unsafe.Pointer, callpc, pc uintptr)
+
+//go:noescape
+func racewritepc(addr unsafe.Pointer, callpc, pc uintptr)
+
+type symbolizeCodeContext struct {
+ pc uintptr
+ fn *byte
+ file *byte
+ line uintptr
+ off uintptr
+ res uintptr
+}
+
+var qq = [...]byte{'?', '?', 0}
+var dash = [...]byte{'-', 0}
+
+const (
+ raceGetProcCmd = iota
+ raceSymbolizeCodeCmd
+ raceSymbolizeDataCmd
+)
+
+// Callback from C into Go, runs on g0.
+func racecallback(cmd uintptr, ctx unsafe.Pointer) {
+ switch cmd {
+ case raceGetProcCmd:
+ throw("should have been handled by racecallbackthunk")
+ case raceSymbolizeCodeCmd:
+ raceSymbolizeCode((*symbolizeCodeContext)(ctx))
+ case raceSymbolizeDataCmd:
+ raceSymbolizeData((*symbolizeDataContext)(ctx))
+ default:
+ throw("unknown command")
+ }
+}
+
+// raceSymbolizeCode reads ctx.pc and populates the rest of *ctx with
+// information about the code at that pc.
+//
+// The race detector has already subtracted 1 from pcs, so they point to the last
+// byte of call instructions (including calls to runtime.racewrite and friends).
+//
+// If the incoming pc is part of an inlined function, *ctx is populated
+// with information about the inlined function, and on return ctx.pc is set
+// to a pc in the logically containing function. (The race detector should call this
+// function again with that pc.)
+//
+// If the incoming pc is not part of an inlined function, the return pc is unchanged.
+func raceSymbolizeCode(ctx *symbolizeCodeContext) {
+ pc := ctx.pc
+ fi := findfunc(pc)
+ f := fi._Func()
+ if f != nil {
+ file, line := f.FileLine(pc)
+ if line != 0 {
+ if inldata := funcdata(fi, _FUNCDATA_InlTree); inldata != nil {
+ inltree := (*[1 << 20]inlinedCall)(inldata)
+ for {
+ ix := pcdatavalue(fi, _PCDATA_InlTreeIndex, pc, nil)
+ if ix >= 0 {
+ if inltree[ix].funcID == funcID_wrapper {
+ // ignore wrappers
+ // Back up to an instruction in the "caller".
+ pc = f.Entry() + uintptr(inltree[ix].parentPc)
+ continue
+ }
+ ctx.pc = f.Entry() + uintptr(inltree[ix].parentPc) // "caller" pc
+ ctx.fn = cfuncnameFromNameoff(fi, inltree[ix].func_)
+ ctx.line = uintptr(line)
+ ctx.file = &bytes(file)[0] // assume NUL-terminated
+ ctx.off = pc - f.Entry()
+ ctx.res = 1
+ return
+ }
+ break
+ }
+ }
+ ctx.fn = cfuncname(fi)
+ ctx.line = uintptr(line)
+ ctx.file = &bytes(file)[0] // assume NUL-terminated
+ ctx.off = pc - f.Entry()
+ ctx.res = 1
+ return
+ }
+ }
+ ctx.fn = &qq[0]
+ ctx.file = &dash[0]
+ ctx.line = 0
+ ctx.off = ctx.pc
+ ctx.res = 1
+}
+
+type symbolizeDataContext struct {
+ addr uintptr
+ heap uintptr
+ start uintptr
+ size uintptr
+ name *byte
+ file *byte
+ line uintptr
+ res uintptr
+}
+
+func raceSymbolizeData(ctx *symbolizeDataContext) {
+ if base, span, _ := findObject(ctx.addr, 0, 0); base != 0 {
+ ctx.heap = 1
+ ctx.start = base
+ ctx.size = span.elemsize
+ ctx.res = 1
+ }
+}
+
+// Race runtime functions called via runtime·racecall.
+//go:linkname __tsan_init __tsan_init
+var __tsan_init byte
+
+//go:linkname __tsan_fini __tsan_fini
+var __tsan_fini byte
+
+//go:linkname __tsan_proc_create __tsan_proc_create
+var __tsan_proc_create byte
+
+//go:linkname __tsan_proc_destroy __tsan_proc_destroy
+var __tsan_proc_destroy byte
+
+//go:linkname __tsan_map_shadow __tsan_map_shadow
+var __tsan_map_shadow byte
+
+//go:linkname __tsan_finalizer_goroutine __tsan_finalizer_goroutine
+var __tsan_finalizer_goroutine byte
+
+//go:linkname __tsan_go_start __tsan_go_start
+var __tsan_go_start byte
+
+//go:linkname __tsan_go_end __tsan_go_end
+var __tsan_go_end byte
+
+//go:linkname __tsan_malloc __tsan_malloc
+var __tsan_malloc byte
+
+//go:linkname __tsan_free __tsan_free
+var __tsan_free byte
+
+//go:linkname __tsan_acquire __tsan_acquire
+var __tsan_acquire byte
+
+//go:linkname __tsan_release __tsan_release
+var __tsan_release byte
+
+//go:linkname __tsan_release_acquire __tsan_release_acquire
+var __tsan_release_acquire byte
+
+//go:linkname __tsan_release_merge __tsan_release_merge
+var __tsan_release_merge byte
+
+//go:linkname __tsan_go_ignore_sync_begin __tsan_go_ignore_sync_begin
+var __tsan_go_ignore_sync_begin byte
+
+//go:linkname __tsan_go_ignore_sync_end __tsan_go_ignore_sync_end
+var __tsan_go_ignore_sync_end byte
+
+//go:linkname __tsan_report_count __tsan_report_count
+var __tsan_report_count byte
+
+// Mimic what cmd/cgo would do.
+//go:cgo_import_static __tsan_init
+//go:cgo_import_static __tsan_fini
+//go:cgo_import_static __tsan_proc_create
+//go:cgo_import_static __tsan_proc_destroy
+//go:cgo_import_static __tsan_map_shadow
+//go:cgo_import_static __tsan_finalizer_goroutine
+//go:cgo_import_static __tsan_go_start
+//go:cgo_import_static __tsan_go_end
+//go:cgo_import_static __tsan_malloc
+//go:cgo_import_static __tsan_free
+//go:cgo_import_static __tsan_acquire
+//go:cgo_import_static __tsan_release
+//go:cgo_import_static __tsan_release_acquire
+//go:cgo_import_static __tsan_release_merge
+//go:cgo_import_static __tsan_go_ignore_sync_begin
+//go:cgo_import_static __tsan_go_ignore_sync_end
+//go:cgo_import_static __tsan_report_count
+
+// These are called from race_amd64.s.
+//go:cgo_import_static __tsan_read
+//go:cgo_import_static __tsan_read_pc
+//go:cgo_import_static __tsan_read_range
+//go:cgo_import_static __tsan_write
+//go:cgo_import_static __tsan_write_pc
+//go:cgo_import_static __tsan_write_range
+//go:cgo_import_static __tsan_func_enter
+//go:cgo_import_static __tsan_func_exit
+
+//go:cgo_import_static __tsan_go_atomic32_load
+//go:cgo_import_static __tsan_go_atomic64_load
+//go:cgo_import_static __tsan_go_atomic32_store
+//go:cgo_import_static __tsan_go_atomic64_store
+//go:cgo_import_static __tsan_go_atomic32_exchange
+//go:cgo_import_static __tsan_go_atomic64_exchange
+//go:cgo_import_static __tsan_go_atomic32_fetch_add
+//go:cgo_import_static __tsan_go_atomic64_fetch_add
+//go:cgo_import_static __tsan_go_atomic32_compare_exchange
+//go:cgo_import_static __tsan_go_atomic64_compare_exchange
+
+// start/end of global data (data+bss).
+var racedatastart uintptr
+var racedataend uintptr
+
+// start/end of heap for race_amd64.s
+var racearenastart uintptr
+var racearenaend uintptr
+
+func racefuncenter(callpc uintptr)
+func racefuncenterfp(fp uintptr)
+func racefuncexit()
+func raceread(addr uintptr)
+func racewrite(addr uintptr)
+func racereadrange(addr, size uintptr)
+func racewriterange(addr, size uintptr)
+func racereadrangepc1(addr, size, pc uintptr)
+func racewriterangepc1(addr, size, pc uintptr)
+func racecallbackthunk(uintptr)
+
+// racecall allows calling an arbitrary function f from C race runtime
+// with up to 4 uintptr arguments.
+func racecall(fn *byte, arg0, arg1, arg2, arg3 uintptr)
+
+// checks if the address has shadow (i.e. heap or data/bss)
+//go:nosplit
+func isvalidaddr(addr unsafe.Pointer) bool {
+ return racearenastart <= uintptr(addr) && uintptr(addr) < racearenaend ||
+ racedatastart <= uintptr(addr) && uintptr(addr) < racedataend
+}
+
+//go:nosplit
+func raceinit() (gctx, pctx uintptr) {
+ // cgo is required to initialize libc, which is used by race runtime
+ if !iscgo {
+ throw("raceinit: race build must use cgo")
+ }
+
+ racecall(&__tsan_init, uintptr(unsafe.Pointer(&gctx)), uintptr(unsafe.Pointer(&pctx)), funcPC(racecallbackthunk), 0)
+
+ // Round data segment to page boundaries, because it's used in mmap().
+ start := ^uintptr(0)
+ end := uintptr(0)
+ if start > firstmoduledata.noptrdata {
+ start = firstmoduledata.noptrdata
+ }
+ if start > firstmoduledata.data {
+ start = firstmoduledata.data
+ }
+ if start > firstmoduledata.noptrbss {
+ start = firstmoduledata.noptrbss
+ }
+ if start > firstmoduledata.bss {
+ start = firstmoduledata.bss
+ }
+ if end < firstmoduledata.enoptrdata {
+ end = firstmoduledata.enoptrdata
+ }
+ if end < firstmoduledata.edata {
+ end = firstmoduledata.edata
+ }
+ if end < firstmoduledata.enoptrbss {
+ end = firstmoduledata.enoptrbss
+ }
+ if end < firstmoduledata.ebss {
+ end = firstmoduledata.ebss
+ }
+ size := alignUp(end-start, _PageSize)
+ racecall(&__tsan_map_shadow, start, size, 0, 0)
+ racedatastart = start
+ racedataend = start + size
+
+ return
+}
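For reference, the page rounding above uses the runtime's alignUp helper, which rounds n up to the next multiple of a (a must be a power of two); its definition elsewhere in the runtime is essentially:

	func alignUp(n, a uintptr) uintptr {
		return (n + a - 1) &^ (a - 1)
	}

so alignUp(0x1234, 0x1000) == 0x2000, and a size that is already page-aligned is returned unchanged.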
+
+var raceFiniLock mutex
+
+//go:nosplit
+func racefini() {
+ // racefini() can only be called once to avoid races.
+ // This eventually (via __tsan_fini) calls C.exit which has
+ // undefined behavior if called more than once. If the lock is
+ // already held it's assumed that the first caller exits the program
+ // so other calls can hang forever without an issue.
+ lock(&raceFiniLock)
+ // We're entering external code that may call ExitProcess on
+ // Windows.
+ osPreemptExtEnter(getg().m)
+ racecall(&__tsan_fini, 0, 0, 0, 0)
+}
+
+//go:nosplit
+func raceproccreate() uintptr {
+ var ctx uintptr
+ racecall(&__tsan_proc_create, uintptr(unsafe.Pointer(&ctx)), 0, 0, 0)
+ return ctx
+}
+
+//go:nosplit
+func raceprocdestroy(ctx uintptr) {
+ racecall(&__tsan_proc_destroy, ctx, 0, 0, 0)
+}
+
+//go:nosplit
+func racemapshadow(addr unsafe.Pointer, size uintptr) {
+ if racearenastart == 0 {
+ racearenastart = uintptr(addr)
+ }
+ if racearenaend < uintptr(addr)+size {
+ racearenaend = uintptr(addr) + size
+ }
+ racecall(&__tsan_map_shadow, uintptr(addr), size, 0, 0)
+}
+
+//go:nosplit
+func racemalloc(p unsafe.Pointer, sz uintptr) {
+ racecall(&__tsan_malloc, 0, 0, uintptr(p), sz)
+}
+
+//go:nosplit
+func racefree(p unsafe.Pointer, sz uintptr) {
+ racecall(&__tsan_free, uintptr(p), sz, 0, 0)
+}
+
+//go:nosplit
+func racegostart(pc uintptr) uintptr {
+ _g_ := getg()
+ var spawng *g
+ if _g_.m.curg != nil {
+ spawng = _g_.m.curg
+ } else {
+ spawng = _g_
+ }
+
+ var racectx uintptr
+ racecall(&__tsan_go_start, spawng.racectx, uintptr(unsafe.Pointer(&racectx)), pc, 0)
+ return racectx
+}
+
+//go:nosplit
+func racegoend() {
+ racecall(&__tsan_go_end, getg().racectx, 0, 0, 0)
+}
+
+//go:nosplit
+func racectxend(racectx uintptr) {
+ racecall(&__tsan_go_end, racectx, 0, 0, 0)
+}
+
+//go:nosplit
+func racewriterangepc(addr unsafe.Pointer, sz, callpc, pc uintptr) {
+ _g_ := getg()
+ if _g_ != _g_.m.curg {
+ // The call is coming from manual instrumentation of Go code running on g0/gsignal.
+ // Not interesting.
+ return
+ }
+ if callpc != 0 {
+ racefuncenter(callpc)
+ }
+ racewriterangepc1(uintptr(addr), sz, pc)
+ if callpc != 0 {
+ racefuncexit()
+ }
+}
+
+//go:nosplit
+func racereadrangepc(addr unsafe.Pointer, sz, callpc, pc uintptr) {
+ _g_ := getg()
+ if _g_ != _g_.m.curg {
+ // The call is coming from manual instrumentation of Go code running on g0/gsignal.
+ // Not interesting.
+ return
+ }
+ if callpc != 0 {
+ racefuncenter(callpc)
+ }
+ racereadrangepc1(uintptr(addr), sz, pc)
+ if callpc != 0 {
+ racefuncexit()
+ }
+}
+
+//go:nosplit
+func raceacquire(addr unsafe.Pointer) {
+ raceacquireg(getg(), addr)
+}
+
+//go:nosplit
+func raceacquireg(gp *g, addr unsafe.Pointer) {
+ if getg().raceignore != 0 || !isvalidaddr(addr) {
+ return
+ }
+ racecall(&__tsan_acquire, gp.racectx, uintptr(addr), 0, 0)
+}
+
+//go:nosplit
+func raceacquirectx(racectx uintptr, addr unsafe.Pointer) {
+ if !isvalidaddr(addr) {
+ return
+ }
+ racecall(&__tsan_acquire, racectx, uintptr(addr), 0, 0)
+}
+
+//go:nosplit
+func racerelease(addr unsafe.Pointer) {
+ racereleaseg(getg(), addr)
+}
+
+//go:nosplit
+func racereleaseg(gp *g, addr unsafe.Pointer) {
+ if getg().raceignore != 0 || !isvalidaddr(addr) {
+ return
+ }
+ racecall(&__tsan_release, gp.racectx, uintptr(addr), 0, 0)
+}
+
+//go:nosplit
+func racereleaseacquire(addr unsafe.Pointer) {
+ racereleaseacquireg(getg(), addr)
+}
+
+//go:nosplit
+func racereleaseacquireg(gp *g, addr unsafe.Pointer) {
+ if getg().raceignore != 0 || !isvalidaddr(addr) {
+ return
+ }
+ racecall(&__tsan_release_acquire, gp.racectx, uintptr(addr), 0, 0)
+}
+
+//go:nosplit
+func racereleasemerge(addr unsafe.Pointer) {
+ racereleasemergeg(getg(), addr)
+}
+
+//go:nosplit
+func racereleasemergeg(gp *g, addr unsafe.Pointer) {
+ if getg().raceignore != 0 || !isvalidaddr(addr) {
+ return
+ }
+ racecall(&__tsan_release_merge, gp.racectx, uintptr(addr), 0, 0)
+}
+
+//go:nosplit
+func racefingo() {
+ racecall(&__tsan_finalizer_goroutine, getg().racectx, 0, 0, 0)
+}
+
+// The declarations below generate ABI wrappers for functions
+// implemented in assembly in this package but declared in another
+// package.
+
+//go:linkname abigen_sync_atomic_LoadInt32 sync/atomic.LoadInt32
+func abigen_sync_atomic_LoadInt32(addr *int32) (val int32)
+
+//go:linkname abigen_sync_atomic_LoadInt64 sync/atomic.LoadInt64
+func abigen_sync_atomic_LoadInt64(addr *int64) (val int64)
+
+//go:linkname abigen_sync_atomic_LoadUint32 sync/atomic.LoadUint32
+func abigen_sync_atomic_LoadUint32(addr *uint32) (val uint32)
+
+//go:linkname abigen_sync_atomic_LoadUint64 sync/atomic.LoadUint64
+func abigen_sync_atomic_LoadUint64(addr *uint64) (val uint64)
+
+//go:linkname abigen_sync_atomic_LoadUintptr sync/atomic.LoadUintptr
+func abigen_sync_atomic_LoadUintptr(addr *uintptr) (val uintptr)
+
+//go:linkname abigen_sync_atomic_LoadPointer sync/atomic.LoadPointer
+func abigen_sync_atomic_LoadPointer(addr *unsafe.Pointer) (val unsafe.Pointer)
+
+//go:linkname abigen_sync_atomic_StoreInt32 sync/atomic.StoreInt32
+func abigen_sync_atomic_StoreInt32(addr *int32, val int32)
+
+//go:linkname abigen_sync_atomic_StoreInt64 sync/atomic.StoreInt64
+func abigen_sync_atomic_StoreInt64(addr *int64, val int64)
+
+//go:linkname abigen_sync_atomic_StoreUint32 sync/atomic.StoreUint32
+func abigen_sync_atomic_StoreUint32(addr *uint32, val uint32)
+
+//go:linkname abigen_sync_atomic_StoreUint64 sync/atomic.StoreUint64
+func abigen_sync_atomic_StoreUint64(addr *uint64, val uint64)
+
+//go:linkname abigen_sync_atomic_SwapInt32 sync/atomic.SwapInt32
+func abigen_sync_atomic_SwapInt32(addr *int32, new int32) (old int32)
+
+//go:linkname abigen_sync_atomic_SwapInt64 sync/atomic.SwapInt64
+func abigen_sync_atomic_SwapInt64(addr *int64, new int64) (old int64)
+
+//go:linkname abigen_sync_atomic_SwapUint32 sync/atomic.SwapUint32
+func abigen_sync_atomic_SwapUint32(addr *uint32, new uint32) (old uint32)
+
+//go:linkname abigen_sync_atomic_SwapUint64 sync/atomic.SwapUint64
+func abigen_sync_atomic_SwapUint64(addr *uint64, new uint64) (old uint64)
+
+//go:linkname abigen_sync_atomic_AddInt32 sync/atomic.AddInt32
+func abigen_sync_atomic_AddInt32(addr *int32, delta int32) (new int32)
+
+//go:linkname abigen_sync_atomic_AddUint32 sync/atomic.AddUint32
+func abigen_sync_atomic_AddUint32(addr *uint32, delta uint32) (new uint32)
+
+//go:linkname abigen_sync_atomic_AddInt64 sync/atomic.AddInt64
+func abigen_sync_atomic_AddInt64(addr *int64, delta int64) (new int64)
+
+//go:linkname abigen_sync_atomic_AddUint64 sync/atomic.AddUint64
+func abigen_sync_atomic_AddUint64(addr *uint64, delta uint64) (new uint64)
+
+//go:linkname abigen_sync_atomic_AddUintptr sync/atomic.AddUintptr
+func abigen_sync_atomic_AddUintptr(addr *uintptr, delta uintptr) (new uintptr)
+
+//go:linkname abigen_sync_atomic_CompareAndSwapInt32 sync/atomic.CompareAndSwapInt32
+func abigen_sync_atomic_CompareAndSwapInt32(addr *int32, old, new int32) (swapped bool)
+
+//go:linkname abigen_sync_atomic_CompareAndSwapInt64 sync/atomic.CompareAndSwapInt64
+func abigen_sync_atomic_CompareAndSwapInt64(addr *int64, old, new int64) (swapped bool)
+
+//go:linkname abigen_sync_atomic_CompareAndSwapUint32 sync/atomic.CompareAndSwapUint32
+func abigen_sync_atomic_CompareAndSwapUint32(addr *uint32, old, new uint32) (swapped bool)
+
+//go:linkname abigen_sync_atomic_CompareAndSwapUint64 sync/atomic.CompareAndSwapUint64
+func abigen_sync_atomic_CompareAndSwapUint64(addr *uint64, old, new uint64) (swapped bool)
diff --git a/src/runtime/race/README b/src/runtime/race/README
new file mode 100644
index 0000000..178ab94
--- /dev/null
+++ b/src/runtime/race/README
@@ -0,0 +1,14 @@
+The runtime/race package contains the data race detector runtime library.
+It is based on the ThreadSanitizer race detector, which is currently part of
+the LLVM project (https://github.com/llvm/llvm-project/tree/master/compiler-rt).
+
+To update the .syso files use golang.org/x/build/cmd/racebuild.
+
+race_darwin_amd64.syso built with LLVM 89f7ccea6f6488c443655880229c54db1f180153 and Go f62d3202bf9dbb3a00ad2a2c63ff4fa4188c5d3b.
+race_freebsd_amd64.syso built with LLVM 89f7ccea6f6488c443655880229c54db1f180153 and Go f62d3202bf9dbb3a00ad2a2c63ff4fa4188c5d3b.
+race_linux_amd64.syso built with LLVM 89f7ccea6f6488c443655880229c54db1f180153 and Go f62d3202bf9dbb3a00ad2a2c63ff4fa4188c5d3b.
+race_linux_ppc64le.syso built with LLVM 89f7ccea6f6488c443655880229c54db1f180153 and Go f62d3202bf9dbb3a00ad2a2c63ff4fa4188c5d3b.
+race_netbsd_amd64.syso built with LLVM 89f7ccea6f6488c443655880229c54db1f180153 and Go f62d3202bf9dbb3a00ad2a2c63ff4fa4188c5d3b.
+race_windows_amd64.syso built with LLVM 89f7ccea6f6488c443655880229c54db1f180153 and Go f62d3202bf9dbb3a00ad2a2c63ff4fa4188c5d3b.
+race_linux_arm64.syso built with LLVM 89f7ccea6f6488c443655880229c54db1f180153 and Go f62d3202bf9dbb3a00ad2a2c63ff4fa4188c5d3b.
+race_darwin_arm64.syso built with LLVM 00da38ce2d36c07f12c287dc515d37bb7bc410e9 and Go fe70a3a0fd31441bcbb9932ecab11a6083cf2119.
diff --git a/src/runtime/race/doc.go b/src/runtime/race/doc.go
new file mode 100644
index 0000000..9e93f66
--- /dev/null
+++ b/src/runtime/race/doc.go
@@ -0,0 +1,9 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package race implements data race detection logic.
+// No public interface is provided.
+// For details about the race detector see
+// https://golang.org/doc/articles/race_detector.html
+package race
diff --git a/src/runtime/race/output_test.go b/src/runtime/race/output_test.go
new file mode 100644
index 0000000..6949687
--- /dev/null
+++ b/src/runtime/race/output_test.go
@@ -0,0 +1,412 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build race
+
+package race_test
+
+import (
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "regexp"
+ "runtime"
+ "strings"
+ "testing"
+)
+
+func TestOutput(t *testing.T) {
+ pkgdir, err := os.MkdirTemp("", "go-build-race-output")
+ if err != nil {
+ t.Fatal(err)
+ }
+ defer os.RemoveAll(pkgdir)
+ out, err := exec.Command(testenv.GoToolPath(t), "install", "-race", "-pkgdir="+pkgdir, "testing").CombinedOutput()
+ if err != nil {
+ t.Fatalf("go install -race: %v\n%s", err, out)
+ }
+
+ for _, test := range tests {
+ if test.goos != "" && test.goos != runtime.GOOS {
+			t.Logf("test %v runs only on %v, skipping", test.name, test.goos)
+ continue
+ }
+ dir, err := os.MkdirTemp("", "go-build")
+ if err != nil {
+ t.Fatalf("failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(dir)
+ source := "main.go"
+ if test.run == "test" {
+ source = "main_test.go"
+ }
+ src := filepath.Join(dir, source)
+ f, err := os.Create(src)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ _, err = f.WriteString(test.source)
+ if err != nil {
+ f.Close()
+ t.Fatalf("failed to write: %v", err)
+ }
+ if err := f.Close(); err != nil {
+ t.Fatalf("failed to close file: %v", err)
+ }
+
+ cmd := exec.Command(testenv.GoToolPath(t), test.run, "-race", "-pkgdir="+pkgdir, src)
+ // GODEBUG spoils program output, GOMAXPROCS makes it flaky.
+ for _, env := range os.Environ() {
+ if strings.HasPrefix(env, "GODEBUG=") ||
+ strings.HasPrefix(env, "GOMAXPROCS=") ||
+ strings.HasPrefix(env, "GORACE=") {
+ continue
+ }
+ cmd.Env = append(cmd.Env, env)
+ }
+ cmd.Env = append(cmd.Env,
+ "GOMAXPROCS=1", // see comment in race_test.go
+ "GORACE="+test.gorace,
+ )
+ got, _ := cmd.CombinedOutput()
+ if !regexp.MustCompile(test.re).MatchString(string(got)) {
+ t.Fatalf("failed test case %v, expect:\n%v\ngot:\n%s",
+ test.name, test.re, got)
+ }
+ }
+}
+
+var tests = []struct {
+ name string
+ run string
+ goos string
+ gorace string
+ source string
+ re string
+}{
+ {"simple", "run", "", "atexit_sleep_ms=0", `
+package main
+import "time"
+func main() {
+ done := make(chan bool)
+ x := 0
+ startRacer(&x, done)
+ store(&x, 43)
+ <-done
+}
+func store(x *int, v int) {
+ *x = v
+}
+func startRacer(x *int, done chan bool) {
+ go racer(x, done)
+}
+func racer(x *int, done chan bool) {
+ time.Sleep(10*time.Millisecond)
+ store(x, 42)
+ done <- true
+}
+`, `==================
+WARNING: DATA RACE
+Write at 0x[0-9,a-f]+ by goroutine [0-9]:
+ main\.store\(\)
+ .+/main\.go:12 \+0x[0-9,a-f]+
+ main\.racer\(\)
+ .+/main\.go:19 \+0x[0-9,a-f]+
+
+Previous write at 0x[0-9,a-f]+ by main goroutine:
+ main\.store\(\)
+ .+/main\.go:12 \+0x[0-9,a-f]+
+ main\.main\(\)
+ .+/main\.go:8 \+0x[0-9,a-f]+
+
+Goroutine [0-9] \(running\) created at:
+ main\.startRacer\(\)
+ .+/main\.go:15 \+0x[0-9,a-f]+
+ main\.main\(\)
+ .+/main\.go:7 \+0x[0-9,a-f]+
+==================
+Found 1 data race\(s\)
+exit status 66
+`},
+
+ {"exitcode", "run", "", "atexit_sleep_ms=0 exitcode=13", `
+package main
+func main() {
+ done := make(chan bool)
+ x := 0
+ go func() {
+ x = 42
+ done <- true
+ }()
+ x = 43
+ <-done
+}
+`, `exit status 13`},
+
+ {"strip_path_prefix", "run", "", "atexit_sleep_ms=0 strip_path_prefix=/main.", `
+package main
+func main() {
+ done := make(chan bool)
+ x := 0
+ go func() {
+ x = 42
+ done <- true
+ }()
+ x = 43
+ <-done
+}
+`, `
+ go:7 \+0x[0-9,a-f]+
+`},
+
+ {"halt_on_error", "run", "", "atexit_sleep_ms=0 halt_on_error=1", `
+package main
+func main() {
+ done := make(chan bool)
+ x := 0
+ go func() {
+ x = 42
+ done <- true
+ }()
+ x = 43
+ <-done
+}
+`, `
+==================
+exit status 66
+`},
+
+ {"test_fails_on_race", "test", "", "atexit_sleep_ms=0", `
+package main_test
+import "testing"
+func TestFail(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ _ = x
+ go func() {
+ x = 42
+ done <- true
+ }()
+ x = 43
+ <-done
+ t.Log(t.Failed())
+}
+`, `
+==================
+--- FAIL: TestFail \(0...s\)
+.*main_test.go:14: true
+.*testing.go:.*: race detected during execution of test
+FAIL`},
+
+ {"slicebytetostring_pc", "run", "", "atexit_sleep_ms=0", `
+package main
+func main() {
+ done := make(chan string)
+ data := make([]byte, 10)
+ go func() {
+ done <- string(data)
+ }()
+ data[0] = 1
+ <-done
+}
+`, `
+ runtime\.slicebytetostring\(\)
+ .*/runtime/string\.go:.*
+ main\.main\.func1\(\)
+ .*/main.go:7`},
+
+ // Test for https://golang.org/issue/33309
+ {"midstack_inlining_traceback", "run", "linux", "atexit_sleep_ms=0", `
+package main
+
+var x int
+
+func main() {
+ c := make(chan int)
+ go f(c)
+ x = 1
+ <-c
+}
+
+func f(c chan int) {
+ g(c)
+}
+
+func g(c chan int) {
+ h(c)
+}
+
+func h(c chan int) {
+ c <- x
+}
+`, `==================
+WARNING: DATA RACE
+Read at 0x[0-9,a-f]+ by goroutine [0-9]:
+ main\.h\(\)
+ .+/main\.go:22 \+0x[0-9,a-f]+
+ main\.g\(\)
+ .+/main\.go:18 \+0x[0-9,a-f]+
+ main\.f\(\)
+ .+/main\.go:14 \+0x[0-9,a-f]+
+
+Previous write at 0x[0-9,a-f]+ by main goroutine:
+ main\.main\(\)
+ .+/main\.go:9 \+0x[0-9,a-f]+
+
+Goroutine [0-9] \(running\) created at:
+ main\.main\(\)
+ .+/main\.go:8 \+0x[0-9,a-f]+
+==================
+Found 1 data race\(s\)
+exit status 66
+`},
+
+ // Test for https://golang.org/issue/17190
+ {"external_cgo_thread", "run", "linux", "atexit_sleep_ms=0", `
+package main
+
+/*
+#include <pthread.h>
+typedef struct cb {
+ int foo;
+} cb;
+extern void goCallback();
+static inline void *threadFunc(void *p) {
+ goCallback();
+ return 0;
+}
+static inline void startThread(cb* c) {
+ pthread_t th;
+ pthread_create(&th, 0, threadFunc, 0);
+}
+*/
+import "C"
+
+var done chan bool
+var racy int
+
+//export goCallback
+func goCallback() {
+ racy++
+ done <- true
+}
+
+func main() {
+ done = make(chan bool)
+ var c C.cb
+ C.startThread(&c)
+ racy++
+ <- done
+}
+`, `==================
+WARNING: DATA RACE
+Read at 0x[0-9,a-f]+ by .*:
+ main\..*
+ .*/main\.go:[0-9]+ \+0x[0-9,a-f]+(?s).*
+
+Previous write at 0x[0-9,a-f]+ by .*:
+ main\..*
+ .*/main\.go:[0-9]+ \+0x[0-9,a-f]+(?s).*
+
+Goroutine [0-9] \(running\) created at:
+ runtime\.newextram\(\)
+ .*/runtime/proc.go:[0-9]+ \+0x[0-9,a-f]+
+==================`},
+ {"second_test_passes", "test", "", "atexit_sleep_ms=0", `
+package main_test
+import "testing"
+func TestFail(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ _ = x
+ go func() {
+ x = 42
+ done <- true
+ }()
+ x = 43
+ <-done
+}
+
+func TestPass(t *testing.T) {
+}
+`, `
+==================
+--- FAIL: TestFail \(0...s\)
+.*testing.go:.*: race detected during execution of test
+FAIL`},
+ {"mutex", "run", "", "atexit_sleep_ms=0", `
+package main
+import (
+ "sync"
+ "fmt"
+)
+func main() {
+ c := make(chan bool, 1)
+ threads := 1
+ iterations := 20000
+ data := 0
+ var wg sync.WaitGroup
+ for i := 0; i < threads; i++ {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ for i := 0; i < iterations; i++ {
+ c <- true
+ data += 1
+ <- c
+ }
+ }()
+ }
+ for i := 0; i < iterations; i++ {
+ c <- true
+ data += 1
+ <- c
+ }
+ wg.Wait()
+ if (data == iterations*(threads+1)) { fmt.Println("pass") }
+}`, `pass`},
+ // Test for https://github.com/golang/go/issues/37355
+ {"chanmm", "run", "", "atexit_sleep_ms=0", `
+package main
+import (
+ "sync"
+ "time"
+)
+func main() {
+ c := make(chan bool, 1)
+ var data uint64
+ var wg sync.WaitGroup
+ wg.Add(2)
+ c <- true
+ go func() {
+ defer wg.Done()
+ c <- true
+ }()
+ go func() {
+ defer wg.Done()
+ time.Sleep(time.Second)
+ <-c
+ data = 2
+ }()
+ data = 1
+ <-c
+ wg.Wait()
+ _ = data
+}
+`, `==================
+WARNING: DATA RACE
+Write at 0x[0-9,a-f]+ by goroutine [0-9]:
+ main\.main\.func2\(\)
+ .*/main\.go:21 \+0x[0-9,a-f]+
+
+Previous write at 0x[0-9,a-f]+ by main goroutine:
+ main\.main\(\)
+ .*/main\.go:23 \+0x[0-9,a-f]+
+
+Goroutine [0-9] \(running\) created at:
+ main\.main\(\)
+ .*/main.go:[0-9]+ \+0x[0-9,a-f]+
+==================`},
+}
diff --git a/src/runtime/race/race.go b/src/runtime/race/race.go
new file mode 100644
index 0000000..d6a14b7
--- /dev/null
+++ b/src/runtime/race/race.go
@@ -0,0 +1,15 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build race,linux,amd64 race,freebsd,amd64 race,netbsd,amd64 race,darwin,amd64 race,windows,amd64 race,linux,ppc64le race,linux,arm64 race,darwin,arm64
+
+package race
+
+// This file merely ensures that we link in runtime/cgo in race builds,
+// which in turn ensures that the runtime uses pthread_create to create threads.
+// The prebuilt race runtime lives in race_GOOS_GOARCH.syso.
+// Calls to the runtime are done directly from src/runtime/race.go.
+
+// void __race_unused_func(void);
+import "C"
diff --git a/src/runtime/race/race_darwin_amd64.syso b/src/runtime/race/race_darwin_amd64.syso
new file mode 100644
index 0000000..3f95ecc
--- /dev/null
+++ b/src/runtime/race/race_darwin_amd64.syso
Binary files differ
diff --git a/src/runtime/race/race_darwin_arm64.syso b/src/runtime/race/race_darwin_arm64.syso
new file mode 100644
index 0000000..f6eaa62
--- /dev/null
+++ b/src/runtime/race/race_darwin_arm64.syso
Binary files differ
diff --git a/src/runtime/race/race_freebsd_amd64.syso b/src/runtime/race/race_freebsd_amd64.syso
new file mode 100644
index 0000000..2a5b46f
--- /dev/null
+++ b/src/runtime/race/race_freebsd_amd64.syso
Binary files differ
diff --git a/src/runtime/race/race_linux_amd64.syso b/src/runtime/race/race_linux_amd64.syso
new file mode 100644
index 0000000..e00398c
--- /dev/null
+++ b/src/runtime/race/race_linux_amd64.syso
Binary files differ
diff --git a/src/runtime/race/race_linux_arm64.syso b/src/runtime/race/race_linux_arm64.syso
new file mode 100644
index 0000000..9dae738
--- /dev/null
+++ b/src/runtime/race/race_linux_arm64.syso
Binary files differ
diff --git a/src/runtime/race/race_linux_ppc64le.syso b/src/runtime/race/race_linux_ppc64le.syso
new file mode 100644
index 0000000..b562656
--- /dev/null
+++ b/src/runtime/race/race_linux_ppc64le.syso
Binary files differ
diff --git a/src/runtime/race/race_linux_test.go b/src/runtime/race/race_linux_test.go
new file mode 100644
index 0000000..c00ce4d
--- /dev/null
+++ b/src/runtime/race/race_linux_test.go
@@ -0,0 +1,37 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux,race
+
+package race_test
+
+import (
+ "sync/atomic"
+ "syscall"
+ "testing"
+ "unsafe"
+)
+
+func TestAtomicMmap(t *testing.T) {
+	// Test that atomic operations work on "external" memory. Previously they crashed (#16206).
+	// Also do a sanity correctness check: under the race detector, atomic operations
+	// are implemented inside the race runtime.
+ mem, err := syscall.Mmap(-1, 0, 1<<20, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
+ if err != nil {
+ t.Fatalf("mmap failed: %v", err)
+ }
+ defer syscall.Munmap(mem)
+ a := (*uint64)(unsafe.Pointer(&mem[0]))
+ if *a != 0 {
+ t.Fatalf("bad atomic value: %v, want 0", *a)
+ }
+ atomic.AddUint64(a, 1)
+ if *a != 1 {
+ t.Fatalf("bad atomic value: %v, want 1", *a)
+ }
+ atomic.AddUint64(a, 1)
+ if *a != 2 {
+ t.Fatalf("bad atomic value: %v, want 2", *a)
+ }
+}
diff --git a/src/runtime/race/race_netbsd_amd64.syso b/src/runtime/race/race_netbsd_amd64.syso
new file mode 100644
index 0000000..11af16f
--- /dev/null
+++ b/src/runtime/race/race_netbsd_amd64.syso
Binary files differ
diff --git a/src/runtime/race/race_test.go b/src/runtime/race/race_test.go
new file mode 100644
index 0000000..d433af6
--- /dev/null
+++ b/src/runtime/race/race_test.go
@@ -0,0 +1,250 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build race
+
+// This program is used to verify the race detector
+// by running the tests and parsing their output.
+// It does not check stack correctness, completeness or anything else:
+// it merely verifies that if a test is expected to be racy
+// then the race is detected.
+package race_test
+
+import (
+ "bufio"
+ "bytes"
+ "fmt"
+ "internal/testenv"
+ "io"
+ "log"
+ "math/rand"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "testing"
+)
+
+var (
+ passedTests = 0
+ totalTests = 0
+ falsePos = 0
+ falseNeg = 0
+ failingPos = 0
+ failingNeg = 0
+ failed = false
+)
+
+const (
+ visibleLen = 40
+ testPrefix = "=== RUN Test"
+)
+
+func TestRace(t *testing.T) {
+ testOutput, err := runTests(t)
+ if err != nil {
+ t.Fatalf("Failed to run tests: %v\n%v", err, string(testOutput))
+ }
+ reader := bufio.NewReader(bytes.NewReader(testOutput))
+
+ funcName := ""
+ var tsanLog []string
+ for {
+ s, err := nextLine(reader)
+ if err != nil {
+ fmt.Printf("%s\n", processLog(funcName, tsanLog))
+ break
+ }
+ if strings.HasPrefix(s, testPrefix) {
+ fmt.Printf("%s\n", processLog(funcName, tsanLog))
+ tsanLog = make([]string, 0, 100)
+ funcName = s[len(testPrefix):]
+ } else {
+ tsanLog = append(tsanLog, s)
+ }
+ }
+
+ if totalTests == 0 {
+ t.Fatalf("failed to parse test output:\n%s", testOutput)
+ }
+ fmt.Printf("\nPassed %d of %d tests (%.02f%%, %d+, %d-)\n",
+ passedTests, totalTests, 100*float64(passedTests)/float64(totalTests), falsePos, falseNeg)
+	fmt.Printf("%d expected failures (%d have not failed)\n", failingPos+failingNeg, failingNeg)
+ if failed {
+ t.Fail()
+ }
+}
+
+// nextLine is a wrapper around bufio.Reader.ReadString.
+// It reads a line up to the next '\n' character. Error
+// is non-nil if there are no lines left, and nil
+// otherwise.
+func nextLine(r *bufio.Reader) (string, error) {
+ s, err := r.ReadString('\n')
+ if err != nil {
+ if err != io.EOF {
+ log.Fatalf("nextLine: expected EOF, received %v", err)
+ }
+ return s, err
+ }
+ return s[:len(s)-1], nil
+}
+
+// processLog verifies whether the given ThreadSanitizer log
+// contains a race report, checks that information against
+// the name of the test case, and returns the result of the
+// comparison.
+func processLog(testName string, tsanLog []string) string {
+ if !strings.HasPrefix(testName, "Race") && !strings.HasPrefix(testName, "NoRace") {
+ return ""
+ }
+ gotRace := false
+ for _, s := range tsanLog {
+ if strings.Contains(s, "DATA RACE") {
+ gotRace = true
+ break
+ }
+ }
+
+ failing := strings.Contains(testName, "Failing")
+ expRace := !strings.HasPrefix(testName, "No")
+ for len(testName) < visibleLen {
+ testName += " "
+ }
+ if expRace == gotRace {
+ passedTests++
+ totalTests++
+ if failing {
+ failed = true
+ failingNeg++
+ }
+ return fmt.Sprintf("%s .", testName)
+ }
+ pos := ""
+ if expRace {
+ falseNeg++
+ } else {
+ falsePos++
+ pos = "+"
+ }
+ if failing {
+ failingPos++
+ } else {
+ failed = true
+ }
+ totalTests++
+ return fmt.Sprintf("%s %s%s", testName, "FAILED", pos)
+}
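The convention processLog relies on lives entirely in test names: a name starting with "Race" means a ThreadSanitizer report is expected, "NoRace" means none is, and "Failing" marks a known-broken expectation. A hypothetical minimal pair in the style of the testdata files:

package race_test

import "testing"

// Expected to be reported: the two writes to x are unsynchronized.
func TestRaceSketchPlainWrite(t *testing.T) {
	x := 0
	done := make(chan bool)
	go func() {
		x = 1
		done <- true
	}()
	x = 2
	<-done
	_ = x
}

// Expected to be clean: the channel receive orders the write before the read.
func TestNoRaceSketchChannel(t *testing.T) {
	x := 0
	done := make(chan bool)
	go func() {
		x = 1
		done <- true
	}()
	<-done
	if x != 1 {
		t.Fatal("unexpected value")
	}
}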
+
+// runTests ensures that the package and its dependencies are
+// built with instrumentation enabled and returns the output of 'go test',
+// which includes possible data race reports from ThreadSanitizer.
+func runTests(t *testing.T) ([]byte, error) {
+ tests, err := filepath.Glob("./testdata/*_test.go")
+ if err != nil {
+ return nil, err
+ }
+ args := []string{"test", "-race", "-v"}
+ args = append(args, tests...)
+ cmd := exec.Command(testenv.GoToolPath(t), args...)
+ // The following flags turn off heuristics that suppress seemingly identical reports.
+ // It is required because the tests contain a lot of data races on the same addresses
+ // (the tests are simple and the memory is constantly reused).
+ for _, env := range os.Environ() {
+ if strings.HasPrefix(env, "GOMAXPROCS=") ||
+ strings.HasPrefix(env, "GODEBUG=") ||
+ strings.HasPrefix(env, "GORACE=") {
+ continue
+ }
+ cmd.Env = append(cmd.Env, env)
+ }
+ // We set GOMAXPROCS=1 to prevent test flakiness.
+ // There are two sources of flakiness:
+ // 1. Some tests rely on particular execution order.
+ // If the order is different, race does not happen at all.
+ // 2. Ironically, ThreadSanitizer runtime contains a logical race condition
+ // that can lead to false negatives if racy accesses happen literally at the same time.
+ // Tests used to work reliably in the good old days of GOMAXPROCS=1.
+ // So let's set it for now. A more reliable solution is to explicitly annotate tests
+ // with required execution order by means of a special "invisible" synchronization primitive
+ // (that's what is done for C++ ThreadSanitizer tests). This is issue #14119.
+ cmd.Env = append(cmd.Env,
+ "GOMAXPROCS=1",
+ "GORACE=suppress_equal_stacks=0 suppress_equal_addresses=0",
+ )
+ // There are races: we expect tests to fail and the exit code to be non-zero.
+ out, _ := cmd.CombinedOutput()
+ if bytes.Contains(out, []byte("fatal error:")) {
+ // But don't expect runtime to crash.
+ return out, fmt.Errorf("runtime fatal error")
+ }
+ return out, nil
+}
+
+func TestIssue8102(t *testing.T) {
+ // If this compiles with -race, the test passes.
+ type S struct {
+ x interface{}
+ i int
+ }
+ c := make(chan int)
+ a := [2]*int{}
+ for ; ; c <- *a[S{}.i] {
+ if t != nil {
+ break
+ }
+ }
+}
+
+func TestIssue9137(t *testing.T) {
+ a := []string{"a"}
+ i := 0
+ a[i], a[len(a)-1], a = a[len(a)-1], "", a[:len(a)-1]
+ if len(a) != 0 || a[:1][0] != "" {
+ t.Errorf("mangled a: %q %q", a, a[:1])
+ }
+}
+
+func BenchmarkSyncLeak(b *testing.B) {
+ const (
+ G = 1000
+ S = 1000
+ H = 10
+ )
+ var wg sync.WaitGroup
+ wg.Add(G)
+ for g := 0; g < G; g++ {
+ go func() {
+ defer wg.Done()
+ hold := make([][]uint32, H)
+ for i := 0; i < b.N; i++ {
+ a := make([]uint32, S)
+ atomic.AddUint32(&a[rand.Intn(len(a))], 1)
+ hold[rand.Intn(len(hold))] = a
+ }
+ _ = hold
+ }()
+ }
+ wg.Wait()
+}
+
+func BenchmarkStackLeak(b *testing.B) {
+ done := make(chan bool, 1)
+ for i := 0; i < b.N; i++ {
+ go func() {
+ growStack(rand.Intn(100))
+ done <- true
+ }()
+ <-done
+ }
+}
+
+func growStack(i int) {
+ if i == 0 {
+ return
+ }
+ growStack(i - 1)
+}
diff --git a/src/runtime/race/race_unix_test.go b/src/runtime/race/race_unix_test.go
new file mode 100644
index 0000000..84f0ace
--- /dev/null
+++ b/src/runtime/race/race_unix_test.go
@@ -0,0 +1,30 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build race
+// +build darwin freebsd linux
+
+package race_test
+
+import (
+ "sync/atomic"
+ "syscall"
+ "testing"
+ "unsafe"
+)
+
+// Test that race detector does not crash when accessing non-Go allocated memory (issue 9136).
+func TestNonGoMemory(t *testing.T) {
+ data, err := syscall.Mmap(-1, 0, 4096, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
+ if err != nil {
+ t.Fatalf("failed to mmap memory: %v", err)
+ }
+ p := (*uint32)(unsafe.Pointer(&data[0]))
+ atomic.AddUint32(p, 1)
+ (*p)++
+ if *p != 2 {
+ t.Fatalf("data[0] = %v, expect 2", *p)
+ }
+ syscall.Munmap(data)
+}
diff --git a/src/runtime/race/race_windows_amd64.syso b/src/runtime/race/race_windows_amd64.syso
new file mode 100644
index 0000000..9fbf9b4
--- /dev/null
+++ b/src/runtime/race/race_windows_amd64.syso
Binary files differ
diff --git a/src/runtime/race/race_windows_test.go b/src/runtime/race/race_windows_test.go
new file mode 100644
index 0000000..307a1ea
--- /dev/null
+++ b/src/runtime/race/race_windows_test.go
@@ -0,0 +1,46 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build windows,race
+
+package race_test
+
+import (
+ "sync/atomic"
+ "syscall"
+ "testing"
+ "unsafe"
+)
+
+func TestAtomicMmap(t *testing.T) {
+	// Test that atomic operations work on "external" memory. Previously they crashed (#16206).
+	// Also do a sanity correctness check: under the race detector, atomic operations
+	// are implemented inside the race runtime.
+ kernel32 := syscall.NewLazyDLL("kernel32.dll")
+ VirtualAlloc := kernel32.NewProc("VirtualAlloc")
+ VirtualFree := kernel32.NewProc("VirtualFree")
+ const (
+ MEM_COMMIT = 0x00001000
+ MEM_RESERVE = 0x00002000
+ MEM_RELEASE = 0x8000
+ PAGE_READWRITE = 0x04
+ )
+ mem, _, err := syscall.Syscall6(VirtualAlloc.Addr(), 4, 0, 1<<20, MEM_COMMIT|MEM_RESERVE, PAGE_READWRITE, 0, 0)
+ if err != 0 {
+ t.Fatalf("VirtualAlloc failed: %v", err)
+ }
+ defer syscall.Syscall(VirtualFree.Addr(), 3, mem, 1<<20, MEM_RELEASE)
+ a := (*uint64)(unsafe.Pointer(mem))
+ if *a != 0 {
+ t.Fatalf("bad atomic value: %v, want 0", *a)
+ }
+ atomic.AddUint64(a, 1)
+ if *a != 1 {
+ t.Fatalf("bad atomic value: %v, want 1", *a)
+ }
+ atomic.AddUint64(a, 1)
+ if *a != 2 {
+ t.Fatalf("bad atomic value: %v, want 2", *a)
+ }
+}
diff --git a/src/runtime/race/sched_test.go b/src/runtime/race/sched_test.go
new file mode 100644
index 0000000..d6bb323
--- /dev/null
+++ b/src/runtime/race/sched_test.go
@@ -0,0 +1,48 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build race
+
+package race_test
+
+import (
+ "bytes"
+ "fmt"
+ "reflect"
+ "runtime"
+ "testing"
+)
+
+func TestRandomScheduling(t *testing.T) {
+ // Scheduler is most consistent with GOMAXPROCS=1.
+ // Use that to make the test most likely to fail.
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))
+ const N = 10
+ out := make([][]int, N)
+ for i := 0; i < N; i++ {
+ c := make(chan int, N)
+ for j := 0; j < N; j++ {
+ go func(j int) {
+ c <- j
+ }(j)
+ }
+ row := make([]int, N)
+ for j := 0; j < N; j++ {
+ row[j] = <-c
+ }
+ out[i] = row
+ }
+
+ for i := 0; i < N; i++ {
+ if !reflect.DeepEqual(out[0], out[i]) {
+ return // found a different order
+ }
+ }
+
+ var buf bytes.Buffer
+ for i := 0; i < N; i++ {
+ fmt.Fprintf(&buf, "%v\n", out[i])
+ }
+ t.Fatalf("consistent goroutine execution order:\n%v", buf.String())
+}
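
A note on the GOMAXPROCS idiom used above: runtime.GOMAXPROCS(n) returns the previous setting, so the single deferred call both pins the scheduler to one P for the duration of the test and restores the old value afterwards. A minimal standalone sketch of the same pattern (the helper name pinToOneP is illustrative, not part of the test file):

package main

import (
	"fmt"
	"runtime"
)

// pinToOneP runs f with GOMAXPROCS=1 and restores the previous
// setting when f returns.
func pinToOneP(f func()) {
	old := runtime.GOMAXPROCS(1) // GOMAXPROCS returns the previous value
	defer runtime.GOMAXPROCS(old)
	f()
}

func main() {
	pinToOneP(func() {
		// GOMAXPROCS(0) only queries the current setting.
		fmt.Println("running with GOMAXPROCS =", runtime.GOMAXPROCS(0))
	})
}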
diff --git a/src/runtime/race/syso_test.go b/src/runtime/race/syso_test.go
new file mode 100644
index 0000000..db846c5
--- /dev/null
+++ b/src/runtime/race/syso_test.go
@@ -0,0 +1,39 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !android,!js,!ppc64le
+
+// Note: we don't run on Android or ppc64 because if there is any non-race test
+// file in this package, the go tool tries to link the .syso file into the
+// test binary (even when we're not in race mode), which fails. I'm not sure
+// why, but it is easiest to just punt - as long as a single builder runs
+// this test, we're good.
+
+package race
+
+import (
+ "bytes"
+ "os/exec"
+ "path/filepath"
+ "runtime"
+ "testing"
+)
+
+func TestIssue37485(t *testing.T) {
+ files, err := filepath.Glob("./*.syso")
+ if err != nil {
+ t.Fatalf("can't find syso files: %s", err)
+ }
+ for _, f := range files {
+ cmd := exec.Command(filepath.Join(runtime.GOROOT(), "bin", "go"), "tool", "nm", f)
+ res, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Errorf("nm of %s failed: %s", f, err)
+ continue
+ }
+ if bytes.Contains(res, []byte("getauxval")) {
+ t.Errorf("%s contains getauxval", f)
+ }
+ }
+}
diff --git a/src/runtime/race/testdata/atomic_test.go b/src/runtime/race/testdata/atomic_test.go
new file mode 100644
index 0000000..4ce7260
--- /dev/null
+++ b/src/runtime/race/testdata/atomic_test.go
@@ -0,0 +1,325 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "runtime"
+ "sync"
+ "sync/atomic"
+ "testing"
+ "unsafe"
+)
+
+func TestNoRaceAtomicAddInt64(t *testing.T) {
+ var x1, x2 int8
+ _ = x1 + x2
+ var s int64
+ ch := make(chan bool, 2)
+ go func() {
+ x1 = 1
+ if atomic.AddInt64(&s, 1) == 2 {
+ x2 = 1
+ }
+ ch <- true
+ }()
+ go func() {
+ x2 = 1
+ if atomic.AddInt64(&s, 1) == 2 {
+ x1 = 1
+ }
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceAtomicAddInt64(t *testing.T) {
+ var x1, x2 int8
+ _ = x1 + x2
+ var s int64
+ ch := make(chan bool, 2)
+ go func() {
+ x1 = 1
+ if atomic.AddInt64(&s, 1) == 1 {
+ x2 = 1
+ }
+ ch <- true
+ }()
+ go func() {
+ x2 = 1
+ if atomic.AddInt64(&s, 1) == 1 {
+ x1 = 1
+ }
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceAtomicAddInt32(t *testing.T) {
+ var x1, x2 int8
+ _ = x1 + x2
+ var s int32
+ ch := make(chan bool, 2)
+ go func() {
+ x1 = 1
+ if atomic.AddInt32(&s, 1) == 2 {
+ x2 = 1
+ }
+ ch <- true
+ }()
+ go func() {
+ x2 = 1
+ if atomic.AddInt32(&s, 1) == 2 {
+ x1 = 1
+ }
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceAtomicLoadAddInt32(t *testing.T) {
+ var x int64
+ _ = x
+ var s int32
+ go func() {
+ x = 2
+ atomic.AddInt32(&s, 1)
+ }()
+ for atomic.LoadInt32(&s) != 1 {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicLoadStoreInt32(t *testing.T) {
+ var x int64
+ _ = x
+ var s int32
+ go func() {
+ x = 2
+ atomic.StoreInt32(&s, 1)
+ }()
+ for atomic.LoadInt32(&s) != 1 {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicStoreCASInt32(t *testing.T) {
+ var x int64
+ _ = x
+ var s int32
+ go func() {
+ x = 2
+ atomic.StoreInt32(&s, 1)
+ }()
+ for !atomic.CompareAndSwapInt32(&s, 1, 0) {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicCASLoadInt32(t *testing.T) {
+ var x int64
+ _ = x
+ var s int32
+ go func() {
+ x = 2
+ if !atomic.CompareAndSwapInt32(&s, 0, 1) {
+ panic("")
+ }
+ }()
+ for atomic.LoadInt32(&s) != 1 {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicCASCASInt32(t *testing.T) {
+ var x int64
+ _ = x
+ var s int32
+ go func() {
+ x = 2
+ if !atomic.CompareAndSwapInt32(&s, 0, 1) {
+ panic("")
+ }
+ }()
+ for !atomic.CompareAndSwapInt32(&s, 1, 0) {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicCASCASInt32_2(t *testing.T) {
+ var x1, x2 int8
+ _ = x1 + x2
+ var s int32
+ ch := make(chan bool, 2)
+ go func() {
+ x1 = 1
+ if !atomic.CompareAndSwapInt32(&s, 0, 1) {
+ x2 = 1
+ }
+ ch <- true
+ }()
+ go func() {
+ x2 = 1
+ if !atomic.CompareAndSwapInt32(&s, 0, 1) {
+ x1 = 1
+ }
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceAtomicLoadInt64(t *testing.T) {
+ var x int32
+ _ = x
+ var s int64
+ go func() {
+ x = 2
+ atomic.AddInt64(&s, 1)
+ }()
+ for atomic.LoadInt64(&s) != 1 {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicCASCASUInt64(t *testing.T) {
+ var x int64
+ _ = x
+ var s uint64
+ go func() {
+ x = 2
+ if !atomic.CompareAndSwapUint64(&s, 0, 1) {
+ panic("")
+ }
+ }()
+ for !atomic.CompareAndSwapUint64(&s, 1, 0) {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicLoadStorePointer(t *testing.T) {
+ var x int64
+ _ = x
+ var s unsafe.Pointer
+ var y int = 2
+ var p unsafe.Pointer = unsafe.Pointer(&y)
+ go func() {
+ x = 2
+ atomic.StorePointer(&s, p)
+ }()
+ for atomic.LoadPointer(&s) != p {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicStoreCASUint64(t *testing.T) {
+ var x int64
+ _ = x
+ var s uint64
+ go func() {
+ x = 2
+ atomic.StoreUint64(&s, 1)
+ }()
+ for !atomic.CompareAndSwapUint64(&s, 1, 0) {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestRaceAtomicStoreLoad(t *testing.T) {
+ c := make(chan bool)
+ var a uint64
+ go func() {
+ atomic.StoreUint64(&a, 1)
+ c <- true
+ }()
+ _ = a
+ <-c
+}
+
+func TestRaceAtomicLoadStore(t *testing.T) {
+ c := make(chan bool)
+ var a uint64
+ go func() {
+ _ = atomic.LoadUint64(&a)
+ c <- true
+ }()
+ a = 1
+ <-c
+}
+
+func TestRaceAtomicAddLoad(t *testing.T) {
+ c := make(chan bool)
+ var a uint64
+ go func() {
+ atomic.AddUint64(&a, 1)
+ c <- true
+ }()
+ _ = a
+ <-c
+}
+
+func TestRaceAtomicAddStore(t *testing.T) {
+ c := make(chan bool)
+ var a uint64
+ go func() {
+ atomic.AddUint64(&a, 1)
+ c <- true
+ }()
+ a = 42
+ <-c
+}
+
+// A nil pointer in an atomic operation should not deadlock
+// the rest of the program. Used to hang indefinitely.
+func TestNoRaceAtomicCrash(t *testing.T) {
+ var mutex sync.Mutex
+ var nilptr *int32
+ panics := 0
+ defer func() {
+ if x := recover(); x != nil {
+ mutex.Lock()
+ panics++
+ mutex.Unlock()
+ } else {
+ panic("no panic")
+ }
+ }()
+ atomic.AddInt32(nilptr, 1)
+}
+
+func TestNoRaceDeferAtomicStore(t *testing.T) {
+ // Test that when an atomic function is deferred directly, the
+ // GC scans it correctly. See issue 42599.
+ type foo struct {
+ bar int64
+ }
+
+ var doFork func(f *foo, depth int)
+ doFork = func(f *foo, depth int) {
+ atomic.StoreInt64(&f.bar, 1)
+ defer atomic.StoreInt64(&f.bar, 0)
+ if depth > 0 {
+ for i := 0; i < 2; i++ {
+ f2 := &foo{}
+ go doFork(f2, depth-1)
+ }
+ }
+ runtime.GC()
+ }
+
+ f := &foo{}
+ doFork(f, 11)
+}
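
Stepping back from the individual cases: the NoRace ordering tests above rely on the same property, namely that under the race detector a sync/atomic store and the atomic load (or CAS) that observes it act as a release/acquire pair, so plain accesses ordered around them do not race. A minimal publish/consume sketch of that shape, separate from the test file (the names publish and consume are illustrative):

package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

var (
	data  int   // plain variable, written before the flag is set
	ready int32 // accessed only through sync/atomic
)

func publish() {
	data = 42                    // plain write...
	atomic.StoreInt32(&ready, 1) // ...made visible by the atomic store
}

func consume() int {
	for atomic.LoadInt32(&ready) == 0 { // spin until the store is observed
		runtime.Gosched()
	}
	return data // race-free: the load observed the store that followed the write
}

func main() {
	go publish()
	fmt.Println(consume())
}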
diff --git a/src/runtime/race/testdata/cgo_test.go b/src/runtime/race/testdata/cgo_test.go
new file mode 100644
index 0000000..211ef7d
--- /dev/null
+++ b/src/runtime/race/testdata/cgo_test.go
@@ -0,0 +1,21 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "testing"
+)
+
+func TestNoRaceCgoSync(t *testing.T) {
+ cmd := exec.Command(testenv.GoToolPath(t), "run", "-race", "cgo_test_main.go")
+ cmd.Stdout = os.Stdout
+ cmd.Stderr = os.Stderr
+ if err := cmd.Run(); err != nil {
+ t.Fatalf("program exited with error: %v\n", err)
+ }
+}
diff --git a/src/runtime/race/testdata/cgo_test_main.go b/src/runtime/race/testdata/cgo_test_main.go
new file mode 100644
index 0000000..620cea1
--- /dev/null
+++ b/src/runtime/race/testdata/cgo_test_main.go
@@ -0,0 +1,30 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+/*
+int sync;
+
+void Notify(void)
+{
+ __sync_fetch_and_add(&sync, 1);
+}
+
+void Wait(void)
+{
+ while(__sync_fetch_and_add(&sync, 0) == 0) {}
+}
+*/
+import "C"
+
+func main() {
+ data := 0
+ go func() {
+ data = 1
+ C.Notify()
+ }()
+ C.Wait()
+ _ = data
+}
diff --git a/src/runtime/race/testdata/chan_test.go b/src/runtime/race/testdata/chan_test.go
new file mode 100644
index 0000000..e39ad4f
--- /dev/null
+++ b/src/runtime/race/testdata/chan_test.go
@@ -0,0 +1,787 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "runtime"
+ "testing"
+ "time"
+)
+
+func TestNoRaceChanSync(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int)
+ go func() {
+ v = 1
+ c <- 0
+ }()
+ <-c
+ v = 2
+}
+
+func TestNoRaceChanSyncRev(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int)
+ go func() {
+ c <- 0
+ v = 2
+ }()
+ v = 1
+ <-c
+}
+
+func TestNoRaceChanAsync(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ go func() {
+ v = 1
+ c <- 0
+ }()
+ <-c
+ v = 2
+}
+
+func TestRaceChanAsyncRev(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ go func() {
+ c <- 0
+ v = 1
+ }()
+ v = 2
+ <-c
+}
+
+func TestNoRaceChanAsyncCloseRecv(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ func() {
+ defer func() {
+ recover()
+ v = 2
+ }()
+ <-c
+ }()
+}
+
+func TestNoRaceChanAsyncCloseRecv2(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ _, _ = <-c
+ v = 2
+}
+
+func TestNoRaceChanAsyncCloseRecv3(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ for range c {
+ }
+ v = 2
+}
+
+func TestNoRaceChanSyncCloseRecv(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ func() {
+ defer func() {
+ recover()
+ v = 2
+ }()
+ <-c
+ }()
+}
+
+func TestNoRaceChanSyncCloseRecv2(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ _, _ = <-c
+ v = 2
+}
+
+func TestNoRaceChanSyncCloseRecv3(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ for range c {
+ }
+ v = 2
+}
+
+func TestRaceChanSyncCloseSend(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ func() {
+ defer func() {
+ recover()
+ }()
+ c <- 0
+ }()
+ v = 2
+}
+
+func TestRaceChanAsyncCloseSend(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ func() {
+ defer func() {
+ recover()
+ }()
+ for {
+ c <- 0
+ }
+ }()
+ v = 2
+}
+
+func TestRaceChanCloseClose(t *testing.T) {
+ compl := make(chan bool, 2)
+ v1 := 0
+ v2 := 0
+ _ = v1 + v2
+ c := make(chan int)
+ go func() {
+ defer func() {
+ if recover() != nil {
+ v2 = 2
+ }
+ compl <- true
+ }()
+ v1 = 1
+ close(c)
+ }()
+ go func() {
+ defer func() {
+ if recover() != nil {
+ v1 = 2
+ }
+ compl <- true
+ }()
+ v2 = 1
+ close(c)
+ }()
+ <-compl
+ <-compl
+}
+
+func TestRaceChanSendLen(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ go func() {
+ v = 1
+ c <- 1
+ }()
+ for len(c) == 0 {
+ runtime.Gosched()
+ }
+ v = 2
+}
+
+func TestRaceChanRecvLen(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ c <- 1
+ go func() {
+ v = 1
+ <-c
+ }()
+ for len(c) != 0 {
+ runtime.Gosched()
+ }
+ v = 2
+}
+
+func TestRaceChanSendSend(t *testing.T) {
+ compl := make(chan bool, 2)
+ v1 := 0
+ v2 := 0
+ _ = v1 + v2
+ c := make(chan int, 1)
+ go func() {
+ v1 = 1
+ select {
+ case c <- 1:
+ default:
+ v2 = 2
+ }
+ compl <- true
+ }()
+ go func() {
+ v2 = 1
+ select {
+ case c <- 1:
+ default:
+ v1 = 2
+ }
+ compl <- true
+ }()
+ <-compl
+ <-compl
+}
+
+func TestNoRaceChanPtr(t *testing.T) {
+ type msg struct {
+ x int
+ }
+ c := make(chan *msg)
+ go func() {
+ c <- &msg{1}
+ }()
+ m := <-c
+ m.x = 2
+}
+
+func TestRaceChanWrongSend(t *testing.T) {
+ v1 := 0
+ v2 := 0
+ _ = v1 + v2
+ c := make(chan int, 2)
+ go func() {
+ v1 = 1
+ c <- 1
+ }()
+ go func() {
+ v2 = 2
+ c <- 2
+ }()
+ time.Sleep(1e7)
+ if <-c == 1 {
+ v2 = 3
+ } else {
+ v1 = 3
+ }
+}
+
+func TestRaceChanWrongClose(t *testing.T) {
+ v1 := 0
+ v2 := 0
+ _ = v1 + v2
+ c := make(chan int, 1)
+ done := make(chan bool)
+ go func() {
+ defer func() {
+ recover()
+ }()
+ v1 = 1
+ c <- 1
+ done <- true
+ }()
+ go func() {
+ time.Sleep(1e7)
+ v2 = 2
+ close(c)
+ done <- true
+ }()
+ time.Sleep(2e7)
+ if _, who := <-c; who {
+ v2 = 2
+ } else {
+ v1 = 2
+ }
+ <-done
+ <-done
+}
+
+func TestRaceChanSendClose(t *testing.T) {
+ compl := make(chan bool, 2)
+ c := make(chan int, 1)
+ go func() {
+ defer func() {
+ recover()
+ compl <- true
+ }()
+ c <- 1
+ }()
+ go func() {
+ time.Sleep(10 * time.Millisecond)
+ close(c)
+ compl <- true
+ }()
+ <-compl
+ <-compl
+}
+
+func TestRaceChanSendSelectClose(t *testing.T) {
+ compl := make(chan bool, 2)
+ c := make(chan int, 1)
+ c1 := make(chan int)
+ go func() {
+ defer func() {
+ recover()
+ compl <- true
+ }()
+ time.Sleep(10 * time.Millisecond)
+ select {
+ case c <- 1:
+ case <-c1:
+ }
+ }()
+ go func() {
+ close(c)
+ compl <- true
+ }()
+ <-compl
+ <-compl
+}
+
+func TestRaceSelectReadWriteAsync(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ c1 := make(chan int, 10)
+ c2 := make(chan int, 10)
+ c3 := make(chan int)
+ c2 <- 1
+ go func() {
+ select {
+ case c1 <- x: // read of x races with...
+ case c3 <- 1:
+ }
+ done <- true
+ }()
+ select {
+ case x = <-c2: // ... write to x here
+ case c3 <- 1:
+ }
+ <-done
+}
+
+func TestRaceSelectReadWriteSync(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ c1 := make(chan int)
+ c2 := make(chan int)
+ c3 := make(chan int)
+ // make c1 and c2 ready for communication
+ go func() {
+ <-c1
+ }()
+ go func() {
+ c2 <- 1
+ }()
+ go func() {
+ select {
+ case c1 <- x: // read of x races with...
+ case c3 <- 1:
+ }
+ done <- true
+ }()
+ select {
+ case x = <-c2: // ... write to x here
+ case c3 <- 1:
+ }
+ <-done
+}
+
+func TestNoRaceSelectReadWriteAsync(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ c1 := make(chan int)
+ c2 := make(chan int)
+ go func() {
+ select {
+ case c1 <- x: // read of x does not race with...
+ case c2 <- 1:
+ }
+ done <- true
+ }()
+ select {
+ case x = <-c1: // ... write to x here
+ case c2 <- 1:
+ }
+ <-done
+}
+
+func TestRaceChanReadWriteAsync(t *testing.T) {
+ done := make(chan bool)
+ c1 := make(chan int, 10)
+ c2 := make(chan int, 10)
+ c2 <- 10
+ x := 0
+ go func() {
+ c1 <- x // read of x races with...
+ done <- true
+ }()
+ x = <-c2 // ... write to x here
+ <-done
+}
+
+func TestRaceChanReadWriteSync(t *testing.T) {
+ done := make(chan bool)
+ c1 := make(chan int)
+ c2 := make(chan int)
+ // make c1 and c2 ready for communication
+ go func() {
+ <-c1
+ }()
+ go func() {
+ c2 <- 10
+ }()
+ x := 0
+ go func() {
+ c1 <- x // read of x races with...
+ done <- true
+ }()
+ x = <-c2 // ... write to x here
+ <-done
+}
+
+func TestNoRaceChanReadWriteAsync(t *testing.T) {
+ done := make(chan bool)
+ c1 := make(chan int, 10)
+ x := 0
+ go func() {
+ c1 <- x // read of x does not race with...
+ done <- true
+ }()
+ x = <-c1 // ... write to x here
+ <-done
+}
+
+func TestNoRaceProducerConsumerUnbuffered(t *testing.T) {
+ type Task struct {
+ f func()
+ done chan bool
+ }
+
+ queue := make(chan Task)
+
+ go func() {
+ t := <-queue
+ t.f()
+ t.done <- true
+ }()
+
+ doit := func(f func()) {
+ done := make(chan bool, 1)
+ queue <- Task{f, done}
+ <-done
+ }
+
+ x := 0
+ doit(func() {
+ x = 1
+ })
+ _ = x
+}
+
+func TestRaceChanItselfSend(t *testing.T) {
+ compl := make(chan bool, 1)
+ c := make(chan int, 10)
+ go func() {
+ c <- 0
+ compl <- true
+ }()
+ c = make(chan int, 20)
+ <-compl
+}
+
+func TestRaceChanItselfRecv(t *testing.T) {
+ compl := make(chan bool, 1)
+ c := make(chan int, 10)
+ c <- 1
+ go func() {
+ <-c
+ compl <- true
+ }()
+ time.Sleep(1e7)
+ c = make(chan int, 20)
+ <-compl
+}
+
+func TestRaceChanItselfNil(t *testing.T) {
+ c := make(chan int, 10)
+ go func() {
+ c <- 0
+ }()
+ time.Sleep(1e7)
+ c = nil
+ _ = c
+}
+
+func TestRaceChanItselfClose(t *testing.T) {
+ compl := make(chan bool, 1)
+ c := make(chan int)
+ go func() {
+ close(c)
+ compl <- true
+ }()
+ c = make(chan int)
+ <-compl
+}
+
+func TestRaceChanItselfLen(t *testing.T) {
+ compl := make(chan bool, 1)
+ c := make(chan int)
+ go func() {
+ _ = len(c)
+ compl <- true
+ }()
+ c = make(chan int)
+ <-compl
+}
+
+func TestRaceChanItselfCap(t *testing.T) {
+ compl := make(chan bool, 1)
+ c := make(chan int)
+ go func() {
+ _ = cap(c)
+ compl <- true
+ }()
+ c = make(chan int)
+ <-compl
+}
+
+func TestNoRaceChanCloseLen(t *testing.T) {
+ c := make(chan int, 10)
+ r := make(chan int, 10)
+ go func() {
+ r <- len(c)
+ }()
+ go func() {
+ close(c)
+ r <- 0
+ }()
+ <-r
+ <-r
+}
+
+func TestNoRaceChanCloseCap(t *testing.T) {
+ c := make(chan int, 10)
+ r := make(chan int, 10)
+ go func() {
+ r <- cap(c)
+ }()
+ go func() {
+ close(c)
+ r <- 0
+ }()
+ <-r
+ <-r
+}
+
+func TestRaceChanCloseSend(t *testing.T) {
+ compl := make(chan bool, 1)
+ c := make(chan int, 10)
+ go func() {
+ close(c)
+ compl <- true
+ }()
+ c <- 0
+ <-compl
+}
+
+func TestNoRaceChanMutex(t *testing.T) {
+ done := make(chan struct{})
+ mtx := make(chan struct{}, 1)
+ data := 0
+ _ = data
+ go func() {
+ mtx <- struct{}{}
+ data = 42
+ <-mtx
+ done <- struct{}{}
+ }()
+ mtx <- struct{}{}
+ data = 43
+ <-mtx
+ <-done
+}
+
+func TestNoRaceSelectMutex(t *testing.T) {
+ done := make(chan struct{})
+ mtx := make(chan struct{}, 1)
+ aux := make(chan bool)
+ data := 0
+ _ = data
+ go func() {
+ select {
+ case mtx <- struct{}{}:
+ case <-aux:
+ }
+ data = 42
+ select {
+ case <-mtx:
+ case <-aux:
+ }
+ done <- struct{}{}
+ }()
+ select {
+ case mtx <- struct{}{}:
+ case <-aux:
+ }
+ data = 43
+ select {
+ case <-mtx:
+ case <-aux:
+ }
+ <-done
+}
+
+func TestRaceChanSem(t *testing.T) {
+ done := make(chan struct{})
+ mtx := make(chan bool, 2)
+ data := 0
+ _ = data
+ go func() {
+ mtx <- true
+ data = 42
+ <-mtx
+ done <- struct{}{}
+ }()
+ mtx <- true
+ data = 43
+ <-mtx
+ <-done
+}
+
+func TestNoRaceChanWaitGroup(t *testing.T) {
+ const N = 10
+ chanWg := make(chan bool, N/2)
+ data := make([]int, N)
+ for i := 0; i < N; i++ {
+ chanWg <- true
+ go func(i int) {
+ data[i] = 42
+ <-chanWg
+ }(i)
+ }
+ for i := 0; i < cap(chanWg); i++ {
+ chanWg <- true
+ }
+ for i := 0; i < N; i++ {
+ _ = data[i]
+ }
+}
+
+// Test that sender synchronizes with receiver even if the sender was blocked.
+func TestNoRaceBlockedSendSync(t *testing.T) {
+ c := make(chan *int, 1)
+ c <- nil
+ go func() {
+ i := 42
+ c <- &i
+ }()
+ // Give the sender time to actually block.
+ // This sleep is completely optional: a race report must not be printed
+ // regardless of whether the sender actually blocks or not,
+ // so the sleep cannot lead to flakiness.
+ time.Sleep(10 * time.Millisecond)
+ <-c
+ p := <-c
+ if *p != 42 {
+ t.Fatal()
+ }
+}
+
+// The same as TestNoRaceBlockedSendSync above, but sender unblock happens in a select.
+func TestNoRaceBlockedSelectSendSync(t *testing.T) {
+ c := make(chan *int, 1)
+ c <- nil
+ go func() {
+ i := 42
+ c <- &i
+ }()
+ time.Sleep(10 * time.Millisecond)
+ <-c
+ select {
+ case p := <-c:
+ if *p != 42 {
+ t.Fatal()
+ }
+ case <-make(chan int):
+ }
+}
+
+// Test that close synchronizes with a read from the empty closed channel.
+// See https://golang.org/issue/36714.
+func TestNoRaceCloseHappensBeforeRead(t *testing.T) {
+ for i := 0; i < 100; i++ {
+ var loc int
+ var write = make(chan struct{})
+ var read = make(chan struct{})
+
+ go func() {
+ select {
+ case <-write:
+ _ = loc
+ default:
+ }
+ close(read)
+ }()
+
+ go func() {
+ loc = 1
+ close(write)
+ }()
+
+ <-read
+ }
+}
+
+// Test that we call the proper race detector function when c.elemsize==0.
+// See https://github.com/golang/go/issues/42598
+func TestNoRaceElementSize0(t *testing.T) {
+ var x, y int
+ var c = make(chan struct{}, 2)
+ c <- struct{}{}
+ c <- struct{}{}
+ go func() {
+ x += 1
+ <-c
+ }()
+ go func() {
+ y += 1
+ <-c
+ }()
+ time.Sleep(10 * time.Millisecond)
+ c <- struct{}{}
+ c <- struct{}{}
+ x += 1
+ y += 1
+}
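
Taken together, the NoRace cases in this file largely come down to two happens-before edges that the race detector models for channels: a send happens before the matching receive completes, and (as the issue 36714 test above exercises) a close happens before a receive that returns because the channel is closed. A compressed standalone sketch of both edges:

package main

import "fmt"

func main() {
	// Edge 1: a send happens before the matching receive completes.
	x := 0
	c := make(chan struct{})
	go func() {
		x = 1           // plain write...
		c <- struct{}{} // ...published by the send
	}()
	<-c
	fmt.Println(x) // race-free: the receive completed, so the write is visible

	// Edge 2: a close happens before a receive from the closed channel.
	y := 0
	d := make(chan struct{})
	go func() {
		y = 2    // plain write...
		close(d) // ...published by the close
	}()
	<-d            // returns the zero value because d is closed
	fmt.Println(y) // race-free: the close, and the write before it, are visible
}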
diff --git a/src/runtime/race/testdata/comp_test.go b/src/runtime/race/testdata/comp_test.go
new file mode 100644
index 0000000..27b2d00
--- /dev/null
+++ b/src/runtime/race/testdata/comp_test.go
@@ -0,0 +1,186 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "testing"
+)
+
+type P struct {
+ x, y int
+}
+
+type S struct {
+ s1, s2 P
+}
+
+func TestNoRaceComp(t *testing.T) {
+ c := make(chan bool, 1)
+ var s S
+ go func() {
+ s.s2.x = 1
+ c <- true
+ }()
+ s.s2.y = 2
+ <-c
+}
+
+func TestNoRaceComp2(t *testing.T) {
+ c := make(chan bool, 1)
+ var s S
+ go func() {
+ s.s1.x = 1
+ c <- true
+ }()
+ s.s1.y = 2
+ <-c
+}
+
+func TestRaceComp(t *testing.T) {
+ c := make(chan bool, 1)
+ var s S
+ go func() {
+ s.s2.y = 1
+ c <- true
+ }()
+ s.s2.y = 2
+ <-c
+}
+
+func TestRaceComp2(t *testing.T) {
+ c := make(chan bool, 1)
+ var s S
+ go func() {
+ s.s1.x = 1
+ c <- true
+ }()
+ s = S{}
+ <-c
+}
+
+func TestRaceComp3(t *testing.T) {
+ c := make(chan bool, 1)
+ var s S
+ go func() {
+ s.s2.y = 1
+ c <- true
+ }()
+ s = S{}
+ <-c
+}
+
+func TestRaceCompArray(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]S, 10)
+ x := 4
+ go func() {
+ s[x].s2.y = 1
+ c <- true
+ }()
+ x = 5
+ <-c
+}
+
+type P2 P
+type S2 S
+
+func TestRaceConv1(t *testing.T) {
+ c := make(chan bool, 1)
+ var p P2
+ go func() {
+ p.x = 1
+ c <- true
+ }()
+ _ = P(p).x
+ <-c
+}
+
+func TestRaceConv2(t *testing.T) {
+ c := make(chan bool, 1)
+ var p P2
+ go func() {
+ p.x = 1
+ c <- true
+ }()
+ ptr := &p
+ _ = P(*ptr).x
+ <-c
+}
+
+func TestRaceConv3(t *testing.T) {
+ c := make(chan bool, 1)
+ var s S2
+ go func() {
+ s.s1.x = 1
+ c <- true
+ }()
+ _ = P2(S(s).s1).x
+ <-c
+}
+
+type X struct {
+ V [4]P
+}
+
+type X2 X
+
+func TestRaceConv4(t *testing.T) {
+ c := make(chan bool, 1)
+ var x X2
+ go func() {
+ x.V[1].x = 1
+ c <- true
+ }()
+ _ = P2(X(x).V[1]).x
+ <-c
+}
+
+type Ptr struct {
+ s1, s2 *P
+}
+
+func TestNoRaceCompPtr(t *testing.T) {
+ c := make(chan bool, 1)
+ p := Ptr{&P{}, &P{}}
+ go func() {
+ p.s1.x = 1
+ c <- true
+ }()
+ p.s1.y = 2
+ <-c
+}
+
+func TestNoRaceCompPtr2(t *testing.T) {
+ c := make(chan bool, 1)
+ p := Ptr{&P{}, &P{}}
+ go func() {
+ p.s1.x = 1
+ c <- true
+ }()
+ _ = p
+ <-c
+}
+
+func TestRaceCompPtr(t *testing.T) {
+ c := make(chan bool, 1)
+ p := Ptr{&P{}, &P{}}
+ go func() {
+ p.s2.x = 1
+ c <- true
+ }()
+ p.s2.x = 2
+ <-c
+}
+
+func TestRaceCompPtr2(t *testing.T) {
+ c := make(chan bool, 1)
+ p := Ptr{&P{}, &P{}}
+ go func() {
+ p.s2.x = 1
+ c <- true
+ }()
+ p.s2 = &P{}
+ <-c
+}
diff --git a/src/runtime/race/testdata/finalizer_test.go b/src/runtime/race/testdata/finalizer_test.go
new file mode 100644
index 0000000..3ac33d2
--- /dev/null
+++ b/src/runtime/race/testdata/finalizer_test.go
@@ -0,0 +1,68 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "runtime"
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestNoRaceFin(t *testing.T) {
+ c := make(chan bool)
+ go func() {
+ x := new(string)
+ runtime.SetFinalizer(x, func(x *string) {
+ *x = "foo"
+ })
+ *x = "bar"
+ c <- true
+ }()
+ <-c
+ runtime.GC()
+ time.Sleep(100 * time.Millisecond)
+}
+
+var finVar struct {
+ sync.Mutex
+ cnt int
+}
+
+func TestNoRaceFinGlobal(t *testing.T) {
+ c := make(chan bool)
+ go func() {
+ x := new(string)
+ runtime.SetFinalizer(x, func(x *string) {
+ finVar.Lock()
+ finVar.cnt++
+ finVar.Unlock()
+ })
+ c <- true
+ }()
+ <-c
+ runtime.GC()
+ time.Sleep(100 * time.Millisecond)
+ finVar.Lock()
+ finVar.cnt++
+ finVar.Unlock()
+}
+
+func TestRaceFin(t *testing.T) {
+ c := make(chan bool)
+ y := 0
+ _ = y
+ go func() {
+ x := new(string)
+ runtime.SetFinalizer(x, func(x *string) {
+ y = 42
+ })
+ c <- true
+ }()
+ <-c
+ runtime.GC()
+ time.Sleep(100 * time.Millisecond)
+ y = 66
+}
diff --git a/src/runtime/race/testdata/io_test.go b/src/runtime/race/testdata/io_test.go
new file mode 100644
index 0000000..c5055f7
--- /dev/null
+++ b/src/runtime/race/testdata/io_test.go
@@ -0,0 +1,75 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "fmt"
+ "net"
+ "net/http"
+ "os"
+ "path/filepath"
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestNoRaceIOFile(t *testing.T) {
+ x := 0
+ path, _ := os.MkdirTemp("", "race_test")
+ fname := filepath.Join(path, "data")
+ go func() {
+ x = 42
+ f, _ := os.Create(fname)
+ f.Write([]byte("done"))
+ f.Close()
+ }()
+ for {
+ f, err := os.Open(fname)
+ if err != nil {
+ time.Sleep(1e6)
+ continue
+ }
+ buf := make([]byte, 100)
+ count, err := f.Read(buf)
+ if count == 0 {
+ time.Sleep(1e6)
+ continue
+ }
+ break
+ }
+ _ = x
+}
+
+var (
+ regHandler sync.Once
+ handlerData int
+)
+
+func TestNoRaceIOHttp(t *testing.T) {
+ regHandler.Do(func() {
+ http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
+ handlerData++
+ fmt.Fprintf(w, "test")
+ handlerData++
+ })
+ })
+ ln, err := net.Listen("tcp", "127.0.0.1:0")
+ if err != nil {
+ t.Fatalf("net.Listen: %v", err)
+ }
+ defer ln.Close()
+ go http.Serve(ln, nil)
+ handlerData++
+ _, err = http.Get("http://" + ln.Addr().String())
+ if err != nil {
+ t.Fatalf("http.Get: %v", err)
+ }
+ handlerData++
+ _, err = http.Get("http://" + ln.Addr().String())
+ if err != nil {
+ t.Fatalf("http.Get: %v", err)
+ }
+ handlerData++
+}
diff --git a/src/runtime/race/testdata/issue12225_test.go b/src/runtime/race/testdata/issue12225_test.go
new file mode 100644
index 0000000..0494493
--- /dev/null
+++ b/src/runtime/race/testdata/issue12225_test.go
@@ -0,0 +1,20 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import "unsafe"
+
+// golang.org/issue/12225
+// The test is that this compiles at all.
+
+//go:noinline
+func convert(s string) []byte {
+ return []byte(s)
+}
+
+func issue12225() {
+ println(*(*int)(unsafe.Pointer(&convert("")[0])))
+ println(*(*int)(unsafe.Pointer(&[]byte("")[0])))
+}
diff --git a/src/runtime/race/testdata/issue12664_test.go b/src/runtime/race/testdata/issue12664_test.go
new file mode 100644
index 0000000..c9f790e
--- /dev/null
+++ b/src/runtime/race/testdata/issue12664_test.go
@@ -0,0 +1,76 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "fmt"
+ "testing"
+)
+
+var issue12664 = "hi"
+
+func TestRaceIssue12664(t *testing.T) {
+ c := make(chan struct{})
+ go func() {
+ issue12664 = "bye"
+ close(c)
+ }()
+ fmt.Println(issue12664)
+ <-c
+}
+
+type MyI interface {
+ foo()
+}
+
+type MyT int
+
+func (MyT) foo() {
+}
+
+var issue12664_2 MyT = 0
+
+func TestRaceIssue12664_2(t *testing.T) {
+ c := make(chan struct{})
+ go func() {
+ issue12664_2 = 1
+ close(c)
+ }()
+ func(x MyI) {
+ // Never true, but prevents inlining.
+ if x.(MyT) == -1 {
+ close(c)
+ }
+ }(issue12664_2)
+ <-c
+}
+
+var issue12664_3 MyT = 0
+
+func TestRaceIssue12664_3(t *testing.T) {
+ c := make(chan struct{})
+ go func() {
+ issue12664_3 = 1
+ close(c)
+ }()
+ var r MyT
+ var i interface{} = r
+ issue12664_3 = i.(MyT)
+ <-c
+}
+
+var issue12664_4 MyT = 0
+
+func TestRaceIssue12664_4(t *testing.T) {
+ c := make(chan struct{})
+ go func() {
+ issue12664_4 = 1
+ close(c)
+ }()
+ var r MyT
+ var i MyI = r
+ issue12664_4 = i.(MyT)
+ <-c
+}
diff --git a/src/runtime/race/testdata/issue13264_test.go b/src/runtime/race/testdata/issue13264_test.go
new file mode 100644
index 0000000..d42290d
--- /dev/null
+++ b/src/runtime/race/testdata/issue13264_test.go
@@ -0,0 +1,13 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+// golang.org/issue/13264
+// The test is that this compiles at all.
+
+func issue13264() {
+ for ; ; []map[int]int{}[0][0] = 0 {
+ }
+}
diff --git a/src/runtime/race/testdata/map_test.go b/src/runtime/race/testdata/map_test.go
new file mode 100644
index 0000000..88e735e
--- /dev/null
+++ b/src/runtime/race/testdata/map_test.go
@@ -0,0 +1,335 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "testing"
+)
+
+func TestRaceMapRW(t *testing.T) {
+ m := make(map[int]int)
+ ch := make(chan bool, 1)
+ go func() {
+ _ = m[1]
+ ch <- true
+ }()
+ m[1] = 1
+ <-ch
+}
+
+func TestRaceMapRW2(t *testing.T) {
+ m := make(map[int]int)
+ ch := make(chan bool, 1)
+ go func() {
+ _, _ = m[1]
+ ch <- true
+ }()
+ m[1] = 1
+ <-ch
+}
+
+func TestRaceMapRWArray(t *testing.T) {
+ // Check instrumentation of unaddressable arrays (issue 4578).
+ m := make(map[int][2]int)
+ ch := make(chan bool, 1)
+ go func() {
+ _ = m[1][1]
+ ch <- true
+ }()
+ m[2] = [2]int{1, 2}
+ <-ch
+}
+
+func TestNoRaceMapRR(t *testing.T) {
+ m := make(map[int]int)
+ ch := make(chan bool, 1)
+ go func() {
+ _, _ = m[1]
+ ch <- true
+ }()
+ _ = m[1]
+ <-ch
+}
+
+func TestRaceMapRange(t *testing.T) {
+ m := make(map[int]int)
+ ch := make(chan bool, 1)
+ go func() {
+ for range m {
+ }
+ ch <- true
+ }()
+ m[1] = 1
+ <-ch
+}
+
+func TestRaceMapRange2(t *testing.T) {
+ m := make(map[int]int)
+ ch := make(chan bool, 1)
+ go func() {
+ for range m {
+ }
+ ch <- true
+ }()
+ m[1] = 1
+ <-ch
+}
+
+func TestNoRaceMapRangeRange(t *testing.T) {
+ m := make(map[int]int)
+ // Now the map is not empty, so the range loop triggers a read event.
+ // The test should work without this line (as in the other tests),
+ // so it is suspicious if this test passes while the others don't.
+ m[0] = 0
+ ch := make(chan bool, 1)
+ go func() {
+ for range m {
+ }
+ ch <- true
+ }()
+ for range m {
+ }
+ <-ch
+}
+
+func TestRaceMapLen(t *testing.T) {
+ m := make(map[string]bool)
+ ch := make(chan bool, 1)
+ go func() {
+ _ = len(m)
+ ch <- true
+ }()
+ m[""] = true
+ <-ch
+}
+
+func TestRaceMapDelete(t *testing.T) {
+ m := make(map[string]bool)
+ ch := make(chan bool, 1)
+ go func() {
+ delete(m, "")
+ ch <- true
+ }()
+ m[""] = true
+ <-ch
+}
+
+func TestRaceMapLenDelete(t *testing.T) {
+ m := make(map[string]bool)
+ ch := make(chan bool, 1)
+ go func() {
+ delete(m, "a")
+ ch <- true
+ }()
+ _ = len(m)
+ <-ch
+}
+
+func TestRaceMapVariable(t *testing.T) {
+ ch := make(chan bool, 1)
+ m := make(map[int]int)
+ _ = m
+ go func() {
+ m = make(map[int]int)
+ ch <- true
+ }()
+ m = make(map[int]int)
+ <-ch
+}
+
+func TestRaceMapVariable2(t *testing.T) {
+ ch := make(chan bool, 1)
+ m := make(map[int]int)
+ go func() {
+ m[1] = 1
+ ch <- true
+ }()
+ m = make(map[int]int)
+ <-ch
+}
+
+func TestRaceMapVariable3(t *testing.T) {
+ ch := make(chan bool, 1)
+ m := make(map[int]int)
+ go func() {
+ _ = m[1]
+ ch <- true
+ }()
+ m = make(map[int]int)
+ <-ch
+}
+
+type Big struct {
+ x [17]int32
+}
+
+func TestRaceMapLookupPartKey(t *testing.T) {
+ k := &Big{}
+ m := make(map[Big]bool)
+ ch := make(chan bool, 1)
+ go func() {
+ k.x[8] = 1
+ ch <- true
+ }()
+ _ = m[*k]
+ <-ch
+}
+
+func TestRaceMapLookupPartKey2(t *testing.T) {
+ k := &Big{}
+ m := make(map[Big]bool)
+ ch := make(chan bool, 1)
+ go func() {
+ k.x[8] = 1
+ ch <- true
+ }()
+ _, _ = m[*k]
+ <-ch
+}
+func TestRaceMapDeletePartKey(t *testing.T) {
+ k := &Big{}
+ m := make(map[Big]bool)
+ ch := make(chan bool, 1)
+ go func() {
+ k.x[8] = 1
+ ch <- true
+ }()
+ delete(m, *k)
+ <-ch
+}
+
+func TestRaceMapInsertPartKey(t *testing.T) {
+ k := &Big{}
+ m := make(map[Big]bool)
+ ch := make(chan bool, 1)
+ go func() {
+ k.x[8] = 1
+ ch <- true
+ }()
+ m[*k] = true
+ <-ch
+}
+
+func TestRaceMapInsertPartVal(t *testing.T) {
+ v := &Big{}
+ m := make(map[int]Big)
+ ch := make(chan bool, 1)
+ go func() {
+ v.x[8] = 1
+ ch <- true
+ }()
+ m[1] = *v
+ <-ch
+}
+
+// Test for issue 7561.
+func TestRaceMapAssignMultipleReturn(t *testing.T) {
+ connect := func() (int, error) { return 42, nil }
+ conns := make(map[int][]int)
+ conns[1] = []int{0}
+ ch := make(chan bool, 1)
+ var err error
+ _ = err
+ go func() {
+ conns[1][0], err = connect()
+ ch <- true
+ }()
+ x := conns[1][0]
+ _ = x
+ <-ch
+}
+
+// BigKey and BigVal must be larger than 256 bytes,
+// so that compiler sets KindGCProg for them.
+type BigKey [1000]*int
+
+type BigVal struct {
+ x int
+ y [1000]*int
+}
+
+func TestRaceMapBigKeyAccess1(t *testing.T) {
+ m := make(map[BigKey]int)
+ var k BigKey
+ ch := make(chan bool, 1)
+ go func() {
+ _ = m[k]
+ ch <- true
+ }()
+ k[30] = new(int)
+ <-ch
+}
+
+func TestRaceMapBigKeyAccess2(t *testing.T) {
+ m := make(map[BigKey]int)
+ var k BigKey
+ ch := make(chan bool, 1)
+ go func() {
+ _, _ = m[k]
+ ch <- true
+ }()
+ k[30] = new(int)
+ <-ch
+}
+
+func TestRaceMapBigKeyInsert(t *testing.T) {
+ m := make(map[BigKey]int)
+ var k BigKey
+ ch := make(chan bool, 1)
+ go func() {
+ m[k] = 1
+ ch <- true
+ }()
+ k[30] = new(int)
+ <-ch
+}
+
+func TestRaceMapBigKeyDelete(t *testing.T) {
+ m := make(map[BigKey]int)
+ var k BigKey
+ ch := make(chan bool, 1)
+ go func() {
+ delete(m, k)
+ ch <- true
+ }()
+ k[30] = new(int)
+ <-ch
+}
+
+func TestRaceMapBigValInsert(t *testing.T) {
+ m := make(map[int]BigVal)
+ var v BigVal
+ ch := make(chan bool, 1)
+ go func() {
+ m[1] = v
+ ch <- true
+ }()
+ v.y[30] = new(int)
+ <-ch
+}
+
+func TestRaceMapBigValAccess1(t *testing.T) {
+ m := make(map[int]BigVal)
+ var v BigVal
+ ch := make(chan bool, 1)
+ go func() {
+ v = m[1]
+ ch <- true
+ }()
+ v.y[30] = new(int)
+ <-ch
+}
+
+func TestRaceMapBigValAccess2(t *testing.T) {
+ m := make(map[int]BigVal)
+ var v BigVal
+ ch := make(chan bool, 1)
+ go func() {
+ v, _ = m[1]
+ ch <- true
+ }()
+ v.y[30] = new(int)
+ <-ch
+}
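
For context on the BigKey/BigVal comment above: on a 64-bit platform a *int is 8 bytes, so [1000]*int is 8000 bytes and the struct is 8008 bytes, both far beyond the 256-byte figure the comment mentions. A quick standalone check of the sizes (the type definitions are copied from the test file):

package main

import (
	"fmt"
	"unsafe"
)

// Copies of the test types so their sizes can be printed directly.
type BigKey [1000]*int

type BigVal struct {
	x int
	y [1000]*int
}

func main() {
	// On 64-bit platforms this prints 8000 and 8008.
	fmt.Println(unsafe.Sizeof(BigKey{}), unsafe.Sizeof(BigVal{}))
}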
diff --git a/src/runtime/race/testdata/mop_test.go b/src/runtime/race/testdata/mop_test.go
new file mode 100644
index 0000000..5d25ed4
--- /dev/null
+++ b/src/runtime/race/testdata/mop_test.go
@@ -0,0 +1,2082 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "bytes"
+ "crypto/sha1"
+ "errors"
+ "fmt"
+ "io"
+ "os"
+ "runtime"
+ "sync"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+type Point struct {
+ x, y int
+}
+
+type NamedPoint struct {
+ name string
+ p Point
+}
+
+type DummyWriter struct {
+ state int
+}
+type Writer interface {
+ Write(p []byte) (n int)
+}
+
+func (d DummyWriter) Write(p []byte) (n int) {
+ return 0
+}
+
+var GlobalX, GlobalY int = 0, 0
+var GlobalCh chan int = make(chan int, 2)
+
+func GlobalFunc1() {
+ GlobalY = GlobalX
+ GlobalCh <- 1
+}
+
+func GlobalFunc2() {
+ GlobalX = 1
+ GlobalCh <- 1
+}
+
+func TestRaceIntRWGlobalFuncs(t *testing.T) {
+ go GlobalFunc1()
+ go GlobalFunc2()
+ <-GlobalCh
+ <-GlobalCh
+}
+
+func TestRaceIntRWClosures(t *testing.T) {
+ var x, y int
+ _ = y
+ ch := make(chan int, 2)
+
+ go func() {
+ y = x
+ ch <- 1
+ }()
+ go func() {
+ x = 1
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceIntRWClosures(t *testing.T) {
+ var x, y int
+ _ = y
+ ch := make(chan int, 1)
+
+ go func() {
+ y = x
+ ch <- 1
+ }()
+ <-ch
+ go func() {
+ x = 1
+ ch <- 1
+ }()
+ <-ch
+
+}
+
+func TestRaceInt32RWClosures(t *testing.T) {
+ var x, y int32
+ _ = y
+ ch := make(chan bool, 2)
+
+ go func() {
+ y = x
+ ch <- true
+ }()
+ go func() {
+ x = 1
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceCase(t *testing.T) {
+ var y int
+ for x := -1; x <= 1; x++ {
+ switch {
+ case x < 0:
+ y = -1
+ case x == 0:
+ y = 0
+ case x > 0:
+ y = 1
+ }
+ }
+ y++
+}
+
+func TestRaceCaseCondition(t *testing.T) {
+ var x int = 0
+ ch := make(chan int, 2)
+
+ go func() {
+ x = 2
+ ch <- 1
+ }()
+ go func() {
+ switch x < 2 {
+ case true:
+ x = 1
+ //case false:
+ // x = 5
+ }
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceCaseCondition2(t *testing.T) {
+ // The switch body is rearranged by the compiler, so the test
+ // passes even if we don't instrument '<'.
+ var x int = 0
+ ch := make(chan int, 2)
+
+ go func() {
+ x = 2
+ ch <- 1
+ }()
+ go func() {
+ switch x < 2 {
+ case true:
+ x = 1
+ case false:
+ x = 5
+ }
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceCaseBody(t *testing.T) {
+ var x, y int
+ _ = y
+ ch := make(chan int, 2)
+
+ go func() {
+ y = x
+ ch <- 1
+ }()
+ go func() {
+ switch {
+ default:
+ x = 1
+ case x == 100:
+ x = -x
+ }
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceCaseFallthrough(t *testing.T) {
+ var x, y, z int
+ _ = y
+ ch := make(chan int, 2)
+ z = 1
+
+ go func() {
+ y = x
+ ch <- 1
+ }()
+ go func() {
+ switch {
+ case z == 1:
+ case z == 2:
+ x = 2
+ }
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceCaseFallthrough(t *testing.T) {
+ var x, y, z int
+ _ = y
+ ch := make(chan int, 2)
+ z = 1
+
+ go func() {
+ y = x
+ ch <- 1
+ }()
+ go func() {
+ switch {
+ case z == 1:
+ fallthrough
+ case z == 2:
+ x = 2
+ }
+ ch <- 1
+ }()
+
+ <-ch
+ <-ch
+}
+
+func TestRaceCaseIssue6418(t *testing.T) {
+ m := map[string]map[string]string{
+ "a": {
+ "b": "c",
+ },
+ }
+ ch := make(chan int)
+ go func() {
+ m["a"]["x"] = "y"
+ ch <- 1
+ }()
+ switch m["a"]["b"] {
+ }
+ <-ch
+}
+
+func TestRaceCaseType(t *testing.T) {
+ var x, y int
+ var i interface{} = x
+ c := make(chan int, 1)
+ go func() {
+ switch i.(type) {
+ case nil:
+ case int:
+ }
+ c <- 1
+ }()
+ i = y
+ <-c
+}
+
+func TestRaceCaseTypeBody(t *testing.T) {
+ var x, y int
+ var i interface{} = &x
+ c := make(chan int, 1)
+ go func() {
+ switch i := i.(type) {
+ case nil:
+ case *int:
+ *i = y
+ }
+ c <- 1
+ }()
+ x = y
+ <-c
+}
+
+func TestRaceCaseTypeIssue5890(t *testing.T) {
+ // spurious extra instrumentation of the initial interface
+ // value.
+ var x, y int
+ m := make(map[int]map[int]interface{})
+ m[0] = make(map[int]interface{})
+ c := make(chan int, 1)
+ go func() {
+ switch i := m[0][1].(type) {
+ case nil:
+ case *int:
+ *i = x
+ }
+ c <- 1
+ }()
+ m[0][1] = y
+ <-c
+}
+
+func TestNoRaceRange(t *testing.T) {
+ ch := make(chan int, 3)
+ a := [...]int{1, 2, 3}
+ for _, v := range a {
+ ch <- v
+ }
+ close(ch)
+}
+
+func TestNoRaceRangeIssue5446(t *testing.T) {
+ ch := make(chan int, 3)
+ a := []int{1, 2, 3}
+ b := []int{4}
+ // The compiler used to insert spurious instrumentation of a[i]
+ // and crash.
+ i := 1
+ for i, a[i] = range b {
+ ch <- i
+ }
+ close(ch)
+}
+
+func TestRaceRange(t *testing.T) {
+ const N = 2
+ var a [N]int
+ var x, y int
+ _ = x + y
+ done := make(chan bool, N)
+ for i, v := range a {
+ go func(i int) {
+ // we don't want a write-vs-write race
+ // so there is no array b here
+ if i == 0 {
+ x = v
+ } else {
+ y = v
+ }
+ done <- true
+ }(i)
+ // Ensure the goroutine runs before we continue the loop.
+ runtime.Gosched()
+ }
+ for i := 0; i < N; i++ {
+ <-done
+ }
+}
+
+func TestRaceForInit(t *testing.T) {
+ c := make(chan int)
+ x := 0
+ go func() {
+ c <- x
+ }()
+ for x = 42; false; {
+ }
+ <-c
+}
+
+func TestNoRaceForInit(t *testing.T) {
+ done := make(chan bool)
+ c := make(chan bool)
+ x := 0
+ go func() {
+ for {
+ _, ok := <-c
+ if !ok {
+ done <- true
+ return
+ }
+ x++
+ }
+ }()
+ i := 0
+ for x = 42; i < 10; i++ {
+ c <- true
+ }
+ close(c)
+ <-done
+}
+
+func TestRaceForTest(t *testing.T) {
+ done := make(chan bool)
+ c := make(chan bool)
+ stop := false
+ go func() {
+ for {
+ _, ok := <-c
+ if !ok {
+ done <- true
+ return
+ }
+ stop = true
+ }
+ }()
+ for !stop {
+ c <- true
+ }
+ close(c)
+ <-done
+}
+
+func TestRaceForIncr(t *testing.T) {
+ done := make(chan bool)
+ c := make(chan bool)
+ x := 0
+ go func() {
+ for {
+ _, ok := <-c
+ if !ok {
+ done <- true
+ return
+ }
+ x++
+ }
+ }()
+ for i := 0; i < 10; x++ {
+ i++
+ c <- true
+ }
+ close(c)
+ <-done
+}
+
+func TestNoRaceForIncr(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ go func() {
+ x++
+ done <- true
+ }()
+ for i := 0; i < 0; x++ {
+ }
+ <-done
+}
+
+func TestRacePlus(t *testing.T) {
+ var x, y, z int
+ _ = y
+ ch := make(chan int, 2)
+
+ go func() {
+ y = x + z
+ ch <- 1
+ }()
+ go func() {
+ y = x + z + z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRacePlus2(t *testing.T) {
+ var x, y, z int
+ _ = y
+ ch := make(chan int, 2)
+
+ go func() {
+ x = 1
+ ch <- 1
+ }()
+ go func() {
+ y = +x + z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRacePlus(t *testing.T) {
+ var x, y, z, f int
+ _ = x + y + f
+ ch := make(chan int, 2)
+
+ go func() {
+ y = x + z
+ ch <- 1
+ }()
+ go func() {
+ f = z + x
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceComplement(t *testing.T) {
+ var x, y, z int
+ _ = x
+ ch := make(chan int, 2)
+
+ go func() {
+ x = ^y
+ ch <- 1
+ }()
+ go func() {
+ y = ^z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceDiv(t *testing.T) {
+ var x, y, z int
+ _ = x
+ ch := make(chan int, 2)
+
+ go func() {
+ x = y / (z + 1)
+ ch <- 1
+ }()
+ go func() {
+ y = z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceDivConst(t *testing.T) {
+ var x, y, z uint32
+ _ = x
+ ch := make(chan int, 2)
+
+ go func() {
+ x = y / 3 // involves only a HMUL node
+ ch <- 1
+ }()
+ go func() {
+ y = z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceMod(t *testing.T) {
+ var x, y, z int
+ _ = x
+ ch := make(chan int, 2)
+
+ go func() {
+ x = y % (z + 1)
+ ch <- 1
+ }()
+ go func() {
+ y = z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceModConst(t *testing.T) {
+ var x, y, z int
+ _ = x
+ ch := make(chan int, 2)
+
+ go func() {
+ x = y % 3
+ ch <- 1
+ }()
+ go func() {
+ y = z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceRotate(t *testing.T) {
+ var x, y, z uint32
+ _ = x
+ ch := make(chan int, 2)
+
+ go func() {
+ x = y<<12 | y>>20
+ ch <- 1
+ }()
+ go func() {
+ y = z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+// May crash if the instrumentation is reckless.
+func TestNoRaceEnoughRegisters(t *testing.T) {
+ // from erf.go
+ const (
+ sa1 = 1
+ sa2 = 2
+ sa3 = 3
+ sa4 = 4
+ sa5 = 5
+ sa6 = 6
+ sa7 = 7
+ sa8 = 8
+ )
+ var s, S float64
+ s = 3.1415
+ S = 1 + s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*(sa5+s*(sa6+s*(sa7+s*sa8)))))))
+ s = S
+}
+
+// emptyFunc should not be inlined.
+func emptyFunc(x int) {
+ if false {
+ fmt.Println(x)
+ }
+}
+
+func TestRaceFuncArgument(t *testing.T) {
+ var x int
+ ch := make(chan bool, 1)
+ go func() {
+ emptyFunc(x)
+ ch <- true
+ }()
+ x = 1
+ <-ch
+}
+
+func TestRaceFuncArgument2(t *testing.T) {
+ var x int
+ ch := make(chan bool, 2)
+ go func() {
+ x = 42
+ ch <- true
+ }()
+ go func(y int) {
+ ch <- true
+ }(x)
+ <-ch
+ <-ch
+}
+
+func TestRaceSprint(t *testing.T) {
+ var x int
+ ch := make(chan bool, 1)
+ go func() {
+ fmt.Sprint(x)
+ ch <- true
+ }()
+ x = 1
+ <-ch
+}
+
+func TestRaceArrayCopy(t *testing.T) {
+ ch := make(chan bool, 1)
+ var a [5]int
+ go func() {
+ a[3] = 1
+ ch <- true
+ }()
+ a = [5]int{1, 2, 3, 4, 5}
+ <-ch
+}
+
+// Blows up a naive compiler.
+func TestRaceNestedArrayCopy(t *testing.T) {
+ ch := make(chan bool, 1)
+ type (
+ Point32 [2][2][2][2][2]Point
+ Point1024 [2][2][2][2][2]Point32
+ Point32k [2][2][2][2][2]Point1024
+ Point1M [2][2][2][2][2]Point32k
+ )
+ var a, b Point1M
+ go func() {
+ a[0][1][0][1][0][1][0][1][0][1][0][1][0][1][0][1][0][1][0][1].y = 1
+ ch <- true
+ }()
+ a = b
+ <-ch
+}
+
+func TestRaceStructRW(t *testing.T) {
+ p := Point{0, 0}
+ ch := make(chan bool, 1)
+ go func() {
+ p = Point{1, 1}
+ ch <- true
+ }()
+ q := p
+ <-ch
+ p = q
+}
+
+func TestRaceStructFieldRW1(t *testing.T) {
+ p := Point{0, 0}
+ ch := make(chan bool, 1)
+ go func() {
+ p.x = 1
+ ch <- true
+ }()
+ _ = p.x
+ <-ch
+}
+
+func TestNoRaceStructFieldRW1(t *testing.T) {
+ // Same struct, different fields, no pointers.
+ // The layout is known at compile time, so there is
+ // no read of p itself, only writes to p.x and p.y.
+ p := Point{0, 0}
+ ch := make(chan bool, 1)
+ go func() {
+ p.x = 1
+ ch <- true
+ }()
+ p.y = 1
+ <-ch
+ _ = p
+}
+
+func TestNoRaceStructFieldRW2(t *testing.T) {
+ // Same as NoRaceStructFieldRW1
+ // but p is a pointer, so there is a read on p
+ p := &Point{0, 0}
+ ch := make(chan bool, 1)
+ go func() {
+ p.x = 1
+ ch <- true
+ }()
+ p.y = 1
+ <-ch
+ _ = p
+}
+
+func TestRaceStructFieldRW2(t *testing.T) {
+ p := &Point{0, 0}
+ ch := make(chan bool, 1)
+ go func() {
+ p.x = 1
+ ch <- true
+ }()
+ _ = p.x
+ <-ch
+}
+
+func TestRaceStructFieldRW3(t *testing.T) {
+ p := NamedPoint{name: "a", p: Point{0, 0}}
+ ch := make(chan bool, 1)
+ go func() {
+ p.p.x = 1
+ ch <- true
+ }()
+ _ = p.p.x
+ <-ch
+}
+
+func TestRaceEfaceWW(t *testing.T) {
+ var a, b interface{}
+ ch := make(chan bool, 1)
+ go func() {
+ a = 1
+ ch <- true
+ }()
+ a = 2
+ <-ch
+ _, _ = a, b
+}
+
+func TestRaceIfaceWW(t *testing.T) {
+ var a, b Writer
+ ch := make(chan bool, 1)
+ go func() {
+ a = DummyWriter{1}
+ ch <- true
+ }()
+ a = DummyWriter{2}
+ <-ch
+ b = a
+ a = b
+}
+
+func TestRaceIfaceCmp(t *testing.T) {
+ var a, b Writer
+ a = DummyWriter{1}
+ ch := make(chan bool, 1)
+ go func() {
+ a = DummyWriter{1}
+ ch <- true
+ }()
+ _ = a == b
+ <-ch
+}
+
+func TestRaceIfaceCmpNil(t *testing.T) {
+ var a Writer
+ a = DummyWriter{1}
+ ch := make(chan bool, 1)
+ go func() {
+ a = DummyWriter{1}
+ ch <- true
+ }()
+ _ = a == nil
+ <-ch
+}
+
+func TestRaceEfaceConv(t *testing.T) {
+ c := make(chan bool)
+ v := 0
+ go func() {
+ go func(x interface{}) {
+ }(v)
+ c <- true
+ }()
+ v = 42
+ <-c
+}
+
+type OsFile struct{}
+
+func (*OsFile) Read() {
+}
+
+type IoReader interface {
+ Read()
+}
+
+func TestRaceIfaceConv(t *testing.T) {
+ c := make(chan bool)
+ f := &OsFile{}
+ go func() {
+ go func(x IoReader) {
+ }(f)
+ c <- true
+ }()
+ f = &OsFile{}
+ <-c
+}
+
+func TestRaceError(t *testing.T) {
+ ch := make(chan bool, 1)
+ var err error
+ go func() {
+ err = nil
+ ch <- true
+ }()
+ _ = err
+ <-ch
+}
+
+func TestRaceIntptrRW(t *testing.T) {
+ var x, y int
+ var p *int = &x
+ ch := make(chan bool, 1)
+ go func() {
+ *p = 5
+ ch <- true
+ }()
+ y = *p
+ x = y
+ <-ch
+}
+
+func TestRaceStringRW(t *testing.T) {
+ ch := make(chan bool, 1)
+ s := ""
+ go func() {
+ s = "abacaba"
+ ch <- true
+ }()
+ _ = s
+ <-ch
+}
+
+func TestRaceStringPtrRW(t *testing.T) {
+ ch := make(chan bool, 1)
+ var x string
+ p := &x
+ go func() {
+ *p = "a"
+ ch <- true
+ }()
+ _ = *p
+ <-ch
+}
+
+func TestRaceFloat64WW(t *testing.T) {
+ var x, y float64
+ ch := make(chan bool, 1)
+ go func() {
+ x = 1.0
+ ch <- true
+ }()
+ x = 2.0
+ <-ch
+
+ y = x
+ x = y
+}
+
+func TestRaceComplex128WW(t *testing.T) {
+ var x, y complex128
+ ch := make(chan bool, 1)
+ go func() {
+ x = 2 + 2i
+ ch <- true
+ }()
+ x = 4 + 4i
+ <-ch
+
+ y = x
+ x = y
+}
+
+func TestRaceUnsafePtrRW(t *testing.T) {
+ var x, y, z int
+ x, y, z = 1, 2, 3
+ var p unsafe.Pointer = unsafe.Pointer(&x)
+ ch := make(chan bool, 1)
+ go func() {
+ p = (unsafe.Pointer)(&z)
+ ch <- true
+ }()
+ y = *(*int)(p)
+ x = y
+ <-ch
+}
+
+func TestRaceFuncVariableRW(t *testing.T) {
+ var f func(x int) int
+ f = func(x int) int {
+ return x * x
+ }
+ ch := make(chan bool, 1)
+ go func() {
+ f = func(x int) int {
+ return x
+ }
+ ch <- true
+ }()
+ y := f(1)
+ <-ch
+ x := y
+ y = x
+}
+
+func TestRaceFuncVariableWW(t *testing.T) {
+ var f func(x int) int
+ _ = f
+ ch := make(chan bool, 1)
+ go func() {
+ f = func(x int) int {
+ return x
+ }
+ ch <- true
+ }()
+ f = func(x int) int {
+ return x * x
+ }
+ <-ch
+}
+
+// This one does not really belong in mop_test.
+func TestRacePanic(t *testing.T) {
+ var x int
+ _ = x
+ var zero int = 0
+ ch := make(chan bool, 2)
+ go func() {
+ defer func() {
+ err := recover()
+ if err == nil {
+ panic("should be panicking")
+ }
+ x = 1
+ ch <- true
+ }()
+ var y int = 1 / zero
+ zero = y
+ }()
+ go func() {
+ defer func() {
+ err := recover()
+ if err == nil {
+ panic("should be panicking")
+ }
+ x = 2
+ ch <- true
+ }()
+ var y int = 1 / zero
+ zero = y
+ }()
+
+ <-ch
+ <-ch
+ if zero != 0 {
+ panic("zero has changed")
+ }
+}
+
+func TestNoRaceBlank(t *testing.T) {
+ var a [5]int
+ ch := make(chan bool, 1)
+ go func() {
+ _, _ = a[0], a[1]
+ ch <- true
+ }()
+ _, _ = a[2], a[3]
+ <-ch
+ a[1] = a[0]
+}
+
+func TestRaceAppendRW(t *testing.T) {
+ a := make([]int, 10)
+ ch := make(chan bool)
+ go func() {
+ _ = append(a, 1)
+ ch <- true
+ }()
+ a[0] = 1
+ <-ch
+}
+
+func TestRaceAppendLenRW(t *testing.T) {
+ a := make([]int, 0)
+ ch := make(chan bool)
+ go func() {
+ a = append(a, 1)
+ ch <- true
+ }()
+ _ = len(a)
+ <-ch
+}
+
+func TestRaceAppendCapRW(t *testing.T) {
+ a := make([]int, 0)
+ ch := make(chan string)
+ go func() {
+ a = append(a, 1)
+ ch <- ""
+ }()
+ _ = cap(a)
+ <-ch
+}
+
+func TestNoRaceFuncArgsRW(t *testing.T) {
+ ch := make(chan byte, 1)
+ var x byte
+ go func(y byte) {
+ _ = y
+ ch <- 0
+ }(x)
+ x = 1
+ <-ch
+}
+
+func TestRaceFuncArgsRW(t *testing.T) {
+ ch := make(chan byte, 1)
+ var x byte
+ go func(y *byte) {
+ _ = *y
+ ch <- 0
+ }(&x)
+ x = 1
+ <-ch
+}
+
+// from the mailing list, slightly modified
+// unprotected concurrent access to seen[]
+func TestRaceCrawl(t *testing.T) {
+ url := "dummyurl"
+ depth := 3
+ seen := make(map[string]bool)
+ ch := make(chan int, 100)
+ var wg sync.WaitGroup
+ var crawl func(string, int)
+ crawl = func(u string, d int) {
+ nurl := 0
+ defer func() {
+ ch <- nurl
+ }()
+ seen[u] = true
+ if d <= 0 {
+ wg.Done()
+ return
+ }
+ urls := [...]string{"a", "b", "c"}
+ for _, uu := range urls {
+ if _, ok := seen[uu]; !ok {
+ wg.Add(1)
+ go crawl(uu, d-1)
+ nurl++
+ }
+ }
+ wg.Done()
+ }
+ wg.Add(1)
+ go crawl(url, depth)
+ wg.Wait()
+}
+
+func TestRaceIndirection(t *testing.T) {
+ ch := make(chan struct{}, 1)
+ var y int
+ var x *int = &y
+ go func() {
+ *x = 1
+ ch <- struct{}{}
+ }()
+ *x = 2
+ <-ch
+ _ = *x
+}
+
+func TestRaceRune(t *testing.T) {
+ c := make(chan bool)
+ var x rune
+ go func() {
+ x = 1
+ c <- true
+ }()
+ _ = x
+ <-c
+}
+
+func TestRaceEmptyInterface1(t *testing.T) {
+ c := make(chan bool)
+ var x interface{}
+ go func() {
+ x = nil
+ c <- true
+ }()
+ _ = x
+ <-c
+}
+
+func TestRaceEmptyInterface2(t *testing.T) {
+ c := make(chan bool)
+ var x interface{}
+ go func() {
+ x = &Point{}
+ c <- true
+ }()
+ _ = x
+ <-c
+}
+
+func TestRaceTLS(t *testing.T) {
+ comm := make(chan *int)
+ done := make(chan bool, 2)
+ go func() {
+ var x int
+ comm <- &x
+ x = 1
+ x = *(<-comm)
+ done <- true
+ }()
+ go func() {
+ p := <-comm
+ *p = 2
+ comm <- p
+ done <- true
+ }()
+ <-done
+ <-done
+}
+
+func TestNoRaceHeapReallocation(t *testing.T) {
+ // It is possible that a future implementation
+ // of memory allocation will ruin this test.
+ // Increasing n might help in this case, so
+ // this test is a bit more generic than most of the
+ // others.
+ const n = 2
+ done := make(chan bool, n)
+ empty := func(p *int) {}
+ for i := 0; i < n; i++ {
+ ms := i
+ go func() {
+ <-time.After(time.Duration(ms) * time.Millisecond)
+ runtime.GC()
+ var x int
+ empty(&x) // x goes to the heap
+ done <- true
+ }()
+ }
+ for i := 0; i < n; i++ {
+ <-done
+ }
+}
+
+func TestRaceAnd(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ x = 1
+ c <- true
+ }()
+ if x == 1 && y == 1 {
+ }
+ <-c
+}
+
+func TestRaceAnd2(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ x = 1
+ c <- true
+ }()
+ if y == 0 && x == 1 {
+ }
+ <-c
+}
+
+func TestNoRaceAnd(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ x = 1
+ c <- true
+ }()
+ if y == 1 && x == 1 {
+ }
+ <-c
+}
+
+func TestRaceOr(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ x = 1
+ c <- true
+ }()
+ if x == 1 || y == 1 {
+ }
+ <-c
+}
+
+func TestRaceOr2(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ x = 1
+ c <- true
+ }()
+ if y == 1 || x == 1 {
+ }
+ <-c
+}
+
+func TestNoRaceOr(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ x = 1
+ c <- true
+ }()
+ if y == 0 || x == 1 {
+ }
+ <-c
+}
+
+func TestNoRaceShortCalc(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ y = 1
+ c <- true
+ }()
+ if x == 0 || y == 0 {
+ }
+ <-c
+}
+
+func TestNoRaceShortCalc2(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ y = 1
+ c <- true
+ }()
+ if x == 1 && y == 0 {
+ }
+ <-c
+}
+
+func TestRaceFuncItself(t *testing.T) {
+ c := make(chan bool)
+ f := func() {}
+ go func() {
+ f()
+ c <- true
+ }()
+ f = func() {}
+ <-c
+}
+
+func TestNoRaceFuncUnlock(t *testing.T) {
+ ch := make(chan bool, 1)
+ var mu sync.Mutex
+ x := 0
+ _ = x
+ go func() {
+ mu.Lock()
+ x = 42
+ mu.Unlock()
+ ch <- true
+ }()
+ x = func(mu *sync.Mutex) int {
+ mu.Lock()
+ return 43
+ }(&mu)
+ mu.Unlock()
+ <-ch
+}
+
+func TestRaceStructInit(t *testing.T) {
+ type X struct {
+ x, y int
+ }
+ c := make(chan bool, 1)
+ y := 0
+ go func() {
+ y = 42
+ c <- true
+ }()
+ x := X{x: y}
+ _ = x
+ <-c
+}
+
+func TestRaceArrayInit(t *testing.T) {
+ c := make(chan bool, 1)
+ y := 0
+ go func() {
+ y = 42
+ c <- true
+ }()
+ x := []int{0, y, 42}
+ _ = x
+ <-c
+}
+
+func TestRaceMapInit(t *testing.T) {
+ c := make(chan bool, 1)
+ y := 0
+ go func() {
+ y = 42
+ c <- true
+ }()
+ x := map[int]int{0: 42, y: 42}
+ _ = x
+ <-c
+}
+
+func TestRaceMapInit2(t *testing.T) {
+ c := make(chan bool, 1)
+ y := 0
+ go func() {
+ y = 42
+ c <- true
+ }()
+ x := map[int]int{0: 42, 42: y}
+ _ = x
+ <-c
+}
+
+type Inter interface {
+ Foo(x int)
+}
+type InterImpl struct {
+ x, y int
+}
+
+//go:noinline
+func (p InterImpl) Foo(x int) {
+}
+
+type InterImpl2 InterImpl
+
+func (p *InterImpl2) Foo(x int) {
+ if p == nil {
+ InterImpl{}.Foo(x)
+ }
+ InterImpl(*p).Foo(x)
+}
+
+func TestRaceInterCall(t *testing.T) {
+ c := make(chan bool, 1)
+ p := InterImpl{}
+ var x Inter = p
+ go func() {
+ p2 := InterImpl{}
+ x = p2
+ c <- true
+ }()
+ x.Foo(0)
+ <-c
+}
+
+func TestRaceInterCall2(t *testing.T) {
+ c := make(chan bool, 1)
+ p := InterImpl{}
+ var x Inter = p
+ z := 0
+ go func() {
+ z = 42
+ c <- true
+ }()
+ x.Foo(z)
+ <-c
+}
+
+func TestRaceFuncCall(t *testing.T) {
+ c := make(chan bool, 1)
+ f := func(x, y int) {}
+ x, y := 0, 0
+ go func() {
+ y = 42
+ c <- true
+ }()
+ f(x, y)
+ <-c
+}
+
+func TestRaceMethodCall(t *testing.T) {
+ c := make(chan bool, 1)
+ i := InterImpl{}
+ x := 0
+ go func() {
+ x = 42
+ c <- true
+ }()
+ i.Foo(x)
+ <-c
+}
+
+func TestRaceMethodCall2(t *testing.T) {
+ c := make(chan bool, 1)
+ i := &InterImpl{}
+ go func() {
+ i = &InterImpl{}
+ c <- true
+ }()
+ i.Foo(0)
+ <-c
+}
+
+// Method value with concrete value receiver.
+func TestRaceMethodValue(t *testing.T) {
+ c := make(chan bool, 1)
+ i := InterImpl{}
+ go func() {
+ i = InterImpl{}
+ c <- true
+ }()
+ _ = i.Foo
+ <-c
+}
+
+// Method value with interface receiver.
+func TestRaceMethodValue2(t *testing.T) {
+ c := make(chan bool, 1)
+ var i Inter = InterImpl{}
+ go func() {
+ i = InterImpl{}
+ c <- true
+ }()
+ _ = i.Foo
+ <-c
+}
+
+// Method value with implicit dereference.
+func TestRaceMethodValue3(t *testing.T) {
+ c := make(chan bool, 1)
+ i := &InterImpl{}
+ go func() {
+ *i = InterImpl{}
+ c <- true
+ }()
+ _ = i.Foo // dereferences i.
+ <-c
+}
+
+// Method value implicitly taking receiver address.
+func TestNoRaceMethodValue(t *testing.T) {
+ c := make(chan bool, 1)
+ i := InterImpl2{}
+ go func() {
+ i = InterImpl2{}
+ c <- true
+ }()
+ _ = i.Foo // takes the address of i only.
+ <-c
+}
+
+func TestRacePanicArg(t *testing.T) {
+ c := make(chan bool, 1)
+ err := errors.New("err")
+ go func() {
+ err = errors.New("err2")
+ c <- true
+ }()
+ defer func() {
+ recover()
+ <-c
+ }()
+ panic(err)
+}
+
+func TestRaceDeferArg(t *testing.T) {
+ c := make(chan bool, 1)
+ x := 0
+ go func() {
+ x = 42
+ c <- true
+ }()
+ func() {
+ defer func(x int) {
+ }(x)
+ }()
+ <-c
+}
+
+type DeferT int
+
+func (d DeferT) Foo() {
+}
+
+func TestRaceDeferArg2(t *testing.T) {
+ c := make(chan bool, 1)
+ var x DeferT
+ go func() {
+ var y DeferT
+ x = y
+ c <- true
+ }()
+ func() {
+ defer x.Foo()
+ }()
+ <-c
+}
+
+func TestNoRaceAddrExpr(t *testing.T) {
+ c := make(chan bool, 1)
+ x := 0
+ go func() {
+ x = 42
+ c <- true
+ }()
+ _ = &x
+ <-c
+}
+
+type AddrT struct {
+ _ [256]byte
+ x int
+}
+
+type AddrT2 struct {
+ _ [512]byte
+ p *AddrT
+}
+
+func TestRaceAddrExpr(t *testing.T) {
+ c := make(chan bool, 1)
+ a := AddrT2{p: &AddrT{x: 42}}
+ go func() {
+ a.p = &AddrT{x: 43}
+ c <- true
+ }()
+ _ = &a.p.x
+ <-c
+}
+
+func TestRaceTypeAssert(t *testing.T) {
+ c := make(chan bool, 1)
+ x := 0
+ var i interface{} = x
+ go func() {
+ y := 0
+ i = y
+ c <- true
+ }()
+ _ = i.(int)
+ <-c
+}
+
+func TestRaceBlockAs(t *testing.T) {
+ c := make(chan bool, 1)
+ var x, y int
+ go func() {
+ x = 42
+ c <- true
+ }()
+ x, y = y, x
+ <-c
+}
+
+func TestRaceBlockCall1(t *testing.T) {
+ done := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ f := func() (int, int) {
+ return 42, 43
+ }
+ x, y = f()
+ done <- true
+ }()
+ _ = x
+ <-done
+ if x != 42 || y != 43 {
+ panic("corrupted data")
+ }
+}
+func TestRaceBlockCall2(t *testing.T) {
+ done := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ f := func() (int, int) {
+ return 42, 43
+ }
+ x, y = f()
+ done <- true
+ }()
+ _ = y
+ <-done
+ if x != 42 || y != 43 {
+ panic("corrupted data")
+ }
+}
+func TestRaceBlockCall3(t *testing.T) {
+ done := make(chan bool)
+ var x *int
+ y := 0
+ go func() {
+ f := func() (*int, int) {
+ i := 42
+ return &i, 43
+ }
+ x, y = f()
+ done <- true
+ }()
+ _ = x
+ <-done
+ if *x != 42 || y != 43 {
+ panic("corrupted data")
+ }
+}
+func TestRaceBlockCall4(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ var y *int
+ go func() {
+ f := func() (int, *int) {
+ i := 43
+ return 42, &i
+ }
+ x, y = f()
+ done <- true
+ }()
+ _ = y
+ <-done
+ if x != 42 || *y != 43 {
+ panic("corrupted data")
+ }
+}
+func TestRaceBlockCall5(t *testing.T) {
+ done := make(chan bool)
+ var x *int
+ y := 0
+ go func() {
+ f := func() (*int, int) {
+ i := 42
+ return &i, 43
+ }
+ x, y = f()
+ done <- true
+ }()
+ _ = y
+ <-done
+ if *x != 42 || y != 43 {
+ panic("corrupted data")
+ }
+}
+func TestRaceBlockCall6(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ var y *int
+ go func() {
+ f := func() (int, *int) {
+ i := 43
+ return 42, &i
+ }
+ x, y = f()
+ done <- true
+ }()
+ _ = x
+ <-done
+ if x != 42 || *y != 43 {
+ panic("corrupted data")
+ }
+}
+func TestRaceSliceSlice(t *testing.T) {
+ c := make(chan bool, 1)
+ x := make([]int, 10)
+ go func() {
+ x = make([]int, 20)
+ c <- true
+ }()
+ _ = x[2:3]
+ <-c
+}
+
+func TestRaceSliceSlice2(t *testing.T) {
+ c := make(chan bool, 1)
+ x := make([]int, 10)
+ i := 2
+ go func() {
+ i = 3
+ c <- true
+ }()
+ _ = x[i:4]
+ <-c
+}
+
+func TestRaceSliceString(t *testing.T) {
+ c := make(chan bool, 1)
+ x := "hello"
+ go func() {
+ x = "world"
+ c <- true
+ }()
+ _ = x[2:3]
+ <-c
+}
+
+func TestRaceSliceStruct(t *testing.T) {
+ type X struct {
+ x, y int
+ }
+ c := make(chan bool, 1)
+ x := make([]X, 10)
+ go func() {
+ y := make([]X, 10)
+ copy(y, x)
+ c <- true
+ }()
+ x[1].y = 42
+ <-c
+}
+
+func TestRaceAppendSliceStruct(t *testing.T) {
+ type X struct {
+ x, y int
+ }
+ c := make(chan bool, 1)
+ x := make([]X, 10)
+ go func() {
+ y := make([]X, 0, 10)
+ y = append(y, x...)
+ c <- true
+ }()
+ x[1].y = 42
+ <-c
+}
+
+func TestRaceStructInd(t *testing.T) {
+ c := make(chan bool, 1)
+ type Item struct {
+ x, y int
+ }
+ i := Item{}
+ go func(p *Item) {
+ *p = Item{}
+ c <- true
+ }(&i)
+ i.y = 42
+ <-c
+}
+
+func TestRaceAsFunc1(t *testing.T) {
+ var s []byte
+ c := make(chan bool, 1)
+ go func() {
+ var err error
+ s, err = func() ([]byte, error) {
+ t := []byte("hello world")
+ return t, nil
+ }()
+ c <- true
+ _ = err
+ }()
+ _ = string(s)
+ <-c
+}
+
+func TestRaceAsFunc2(t *testing.T) {
+ c := make(chan bool, 1)
+ x := 0
+ go func() {
+ func(x int) {
+ }(x)
+ c <- true
+ }()
+ x = 42
+ <-c
+}
+
+func TestRaceAsFunc3(t *testing.T) {
+ c := make(chan bool, 1)
+ var mu sync.Mutex
+ x := 0
+ go func() {
+ func(x int) {
+ mu.Lock()
+ }(x) // Read of x must be outside of the mutex.
+ mu.Unlock()
+ c <- true
+ }()
+ mu.Lock()
+ x = 42
+ mu.Unlock()
+ <-c
+}
+
+func TestNoRaceAsFunc4(t *testing.T) {
+ c := make(chan bool, 1)
+ var mu sync.Mutex
+ x := 0
+ _ = x
+ go func() {
+ x = func() int { // Write of x must be under the mutex.
+ mu.Lock()
+ return 42
+ }()
+ mu.Unlock()
+ c <- true
+ }()
+ mu.Lock()
+ x = 42
+ mu.Unlock()
+ <-c
+}
+
+func TestRaceHeapParam(t *testing.T) {
+ done := make(chan bool)
+ x := func() (x int) {
+ go func() {
+ x = 42
+ done <- true
+ }()
+ return
+ }()
+ _ = x
+ <-done
+}
+
+func TestNoRaceEmptyStruct(t *testing.T) {
+ type Empty struct{}
+ type X struct {
+ y int64
+ Empty
+ }
+ type Y struct {
+ x X
+ y int64
+ }
+ c := make(chan X)
+ var y Y
+ go func() {
+ x := y.x
+ c <- x
+ }()
+ y.y = 42
+ <-c
+}
+
+func TestRaceNestedStruct(t *testing.T) {
+ type X struct {
+ x, y int
+ }
+ type Y struct {
+ x X
+ }
+ c := make(chan Y)
+ var y Y
+ go func() {
+ c <- y
+ }()
+ y.x.y = 42
+ <-c
+}
+
+func TestRaceIssue5567(t *testing.T) {
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(4))
+ in := make(chan []byte)
+ res := make(chan error)
+ go func() {
+ var err error
+ defer func() {
+ close(in)
+ res <- err
+ }()
+ path := "mop_test.go"
+ f, err := os.Open(path)
+ if err != nil {
+ return
+ }
+ defer f.Close()
+ var n, total int
+ b := make([]byte, 17) // the race is on b buffer
+ for err == nil {
+ n, err = f.Read(b)
+ total += n
+ if n > 0 {
+ in <- b[:n]
+ }
+ }
+ if err == io.EOF {
+ err = nil
+ }
+ }()
+ h := sha1.New()
+ for b := range in {
+ h.Write(b)
+ }
+ _ = h.Sum(nil)
+ err := <-res
+ if err != nil {
+ t.Fatal(err)
+ }
+}
+
+func TestRaceIssue5654(t *testing.T) {
+ text := `Friends, Romans, countrymen, lend me your ears;
+I come to bury Caesar, not to praise him.
+The evil that men do lives after them;
+The good is oft interred with their bones;
+So let it be with Caesar. The noble Brutus
+Hath told you Caesar was ambitious:
+If it were so, it was a grievous fault,
+And grievously hath Caesar answer'd it.
+Here, under leave of Brutus and the rest -
+For Brutus is an honourable man;
+So are they all, all honourable men -
+Come I to speak in Caesar's funeral.
+He was my friend, faithful and just to me:
+But Brutus says he was ambitious;
+And Brutus is an honourable man.`
+
+ data := bytes.NewBufferString(text)
+ in := make(chan []byte)
+
+ go func() {
+ buf := make([]byte, 16)
+ var n int
+ var err error
+ for ; err == nil; n, err = data.Read(buf) {
+ in <- buf[:n]
+ }
+ close(in)
+ }()
+ res := ""
+ for s := range in {
+ res += string(s)
+ }
+ _ = res
+}
+
+type Base int
+
+func (b *Base) Foo() int {
+ return 42
+}
+
+func (b Base) Bar() int {
+ return int(b)
+}
+
+func TestNoRaceMethodThunk(t *testing.T) {
+ type Derived struct {
+ pad int
+ Base
+ }
+ var d Derived
+ done := make(chan bool)
+ go func() {
+ _ = d.Foo()
+ done <- true
+ }()
+ d = Derived{}
+ <-done
+}
+
+func TestRaceMethodThunk(t *testing.T) {
+ type Derived struct {
+ pad int
+ *Base
+ }
+ var d Derived
+ done := make(chan bool)
+ go func() {
+ _ = d.Foo()
+ done <- true
+ }()
+ d = Derived{}
+ <-done
+}
+
+func TestRaceMethodThunk2(t *testing.T) {
+ type Derived struct {
+ pad int
+ Base
+ }
+ var d Derived
+ done := make(chan bool)
+ go func() {
+ _ = d.Bar()
+ done <- true
+ }()
+ d = Derived{}
+ <-done
+}
+
+func TestRaceMethodThunk3(t *testing.T) {
+ type Derived struct {
+ pad int
+ *Base
+ }
+ var d Derived
+ d.Base = new(Base)
+ done := make(chan bool)
+ go func() {
+ _ = d.Bar()
+ done <- true
+ }()
+ d.Base = new(Base)
+ <-done
+}
+
+func TestRaceMethodThunk4(t *testing.T) {
+ type Derived struct {
+ pad int
+ *Base
+ }
+ var d Derived
+ d.Base = new(Base)
+ done := make(chan bool)
+ go func() {
+ _ = d.Bar()
+ done <- true
+ }()
+ *(*int)(d.Base) = 42
+ <-done
+}
+
+func TestNoRaceTinyAlloc(t *testing.T) {
+ const P = 4
+ const N = 1e6
+ var tinySink *byte
+ _ = tinySink
+ done := make(chan bool)
+ for p := 0; p < P; p++ {
+ go func() {
+ for i := 0; i < N; i++ {
+ var b byte
+ if b != 0 {
+ tinySink = &b // make it heap allocated
+ }
+ b = 42
+ }
+ done <- true
+ }()
+ }
+ for p := 0; p < P; p++ {
+ <-done
+ }
+}
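
A minimal sketch of the pattern shared by the tests in this file, illustrative only and not part of the patch (it assumes the same package and imports as the hunk above): the channel orders only test completion, while the two accesses to x stay unsynchronized. The TestRace/TestNoRace prefixes encode whether the race detector is expected to report something; the harness that checks this lives elsewhere in the race package.

	func TestRaceSketch(t *testing.T) {
		c := make(chan bool)
		x := 0
		go func() {
			x = 1 // unsynchronized write
			c <- true
		}()
		_ = x // unsynchronized read: races with the write above
		<-c   // orders only completion, not the two accesses to x
	}
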
diff --git a/src/runtime/race/testdata/mutex_test.go b/src/runtime/race/testdata/mutex_test.go
new file mode 100644
index 0000000..cbed2d3
--- /dev/null
+++ b/src/runtime/race/testdata/mutex_test.go
@@ -0,0 +1,143 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestNoRaceMutex(t *testing.T) {
+ var mu sync.Mutex
+ var x int16 = 0
+ _ = x
+ ch := make(chan bool, 2)
+ go func() {
+ mu.Lock()
+ defer mu.Unlock()
+ x = 1
+ ch <- true
+ }()
+ go func() {
+ mu.Lock()
+ x = 2
+ mu.Unlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceMutex(t *testing.T) {
+ var mu sync.Mutex
+ var x int16 = 0
+ _ = x
+ ch := make(chan bool, 2)
+ go func() {
+ x = 1
+ mu.Lock()
+ defer mu.Unlock()
+ ch <- true
+ }()
+ go func() {
+ x = 2
+ mu.Lock()
+ mu.Unlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceMutex2(t *testing.T) {
+ var mu1 sync.Mutex
+ var mu2 sync.Mutex
+ var x int8 = 0
+ _ = x
+ ch := make(chan bool, 2)
+ go func() {
+ mu1.Lock()
+ defer mu1.Unlock()
+ x = 1
+ ch <- true
+ }()
+ go func() {
+ mu2.Lock()
+ x = 2
+ mu2.Unlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceMutexPureHappensBefore(t *testing.T) {
+ var mu sync.Mutex
+ var x int16 = 0
+ _ = x
+ ch := make(chan bool, 2)
+ go func() {
+ x = 1
+ mu.Lock()
+ mu.Unlock()
+ ch <- true
+ }()
+ go func() {
+ <-time.After(1e5)
+ mu.Lock()
+ mu.Unlock()
+ x = 1
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceMutexSemaphore(t *testing.T) {
+ var mu sync.Mutex
+ ch := make(chan bool, 2)
+ x := 0
+ _ = x
+ mu.Lock()
+ go func() {
+ x = 1
+ mu.Unlock()
+ ch <- true
+ }()
+ go func() {
+ mu.Lock()
+ x = 2
+ mu.Unlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+// from doc/go_mem.html
+func TestNoRaceMutexExampleFromHtml(t *testing.T) {
+ var l sync.Mutex
+ a := ""
+
+ l.Lock()
+ go func() {
+ a = "hello, world"
+ l.Unlock()
+ }()
+ l.Lock()
+ _ = a
+}
+
+func TestRaceMutexOverwrite(t *testing.T) {
+ c := make(chan bool, 1)
+ var mu sync.Mutex
+ go func() {
+ mu = sync.Mutex{}
+ c <- true
+ }()
+ mu.Lock()
+ <-c
+}
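
A worked restatement of TestNoRaceMutexSemaphore above, as an illustrative sketch rather than part of the patch: the happens-before chain runs from the write to x, through the Unlock, to the Lock the main goroutine is blocked on, and from there to the second write, so the two writes never race.

	func noRaceSemaphoreSketch() {
		var mu sync.Mutex
		x := 0
		mu.Lock() // held by this goroutine first
		go func() {
			x = 1       // (1) ordered before ...
			mu.Unlock() // (2) ... this Unlock, which is ordered before ...
		}()
		mu.Lock() // (3) ... this Lock (it cannot return until (2) runs), so ...
		x = 2     // (4) ... this write happens after (1): no race.
		mu.Unlock()
		_ = x
	}
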
diff --git a/src/runtime/race/testdata/pool_test.go b/src/runtime/race/testdata/pool_test.go
new file mode 100644
index 0000000..161f4b7
--- /dev/null
+++ b/src/runtime/race/testdata/pool_test.go
@@ -0,0 +1,47 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestRacePool(t *testing.T) {
+ // Pool randomly drops the argument on the floor during Put.
+ // Repeat so that at least one iteration gets reuse.
+ for i := 0; i < 10; i++ {
+ c := make(chan int)
+ p := &sync.Pool{New: func() interface{} { return make([]byte, 10) }}
+ x := p.Get().([]byte)
+ x[0] = 1
+ p.Put(x)
+ go func() {
+ y := p.Get().([]byte)
+ y[0] = 2
+ c <- 1
+ }()
+ x[0] = 3
+ <-c
+ }
+}
+
+func TestNoRacePool(t *testing.T) {
+ for i := 0; i < 10; i++ {
+ p := &sync.Pool{New: func() interface{} { return make([]byte, 10) }}
+ x := p.Get().([]byte)
+ x[0] = 1
+ p.Put(x)
+ go func() {
+ y := p.Get().([]byte)
+ y[0] = 2
+ p.Put(y)
+ }()
+ time.Sleep(100 * time.Millisecond)
+ x = p.Get().([]byte)
+ x[0] = 3
+ }
+}
diff --git a/src/runtime/race/testdata/reflect_test.go b/src/runtime/race/testdata/reflect_test.go
new file mode 100644
index 0000000..b567400
--- /dev/null
+++ b/src/runtime/race/testdata/reflect_test.go
@@ -0,0 +1,46 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "reflect"
+ "testing"
+)
+
+func TestRaceReflectRW(t *testing.T) {
+ ch := make(chan bool, 1)
+ i := 0
+ v := reflect.ValueOf(&i)
+ go func() {
+ v.Elem().Set(reflect.ValueOf(1))
+ ch <- true
+ }()
+ _ = v.Elem().Int()
+ <-ch
+}
+
+func TestRaceReflectWW(t *testing.T) {
+ ch := make(chan bool, 1)
+ i := 0
+ v := reflect.ValueOf(&i)
+ go func() {
+ v.Elem().Set(reflect.ValueOf(1))
+ ch <- true
+ }()
+ v.Elem().Set(reflect.ValueOf(2))
+ <-ch
+}
+
+func TestRaceReflectCopyWW(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]byte, 2)
+ v := reflect.ValueOf(a)
+ go func() {
+ reflect.Copy(v, v)
+ ch <- true
+ }()
+ reflect.Copy(v, v)
+ <-ch
+}
diff --git a/src/runtime/race/testdata/regression_test.go b/src/runtime/race/testdata/regression_test.go
new file mode 100644
index 0000000..6a7802f
--- /dev/null
+++ b/src/runtime/race/testdata/regression_test.go
@@ -0,0 +1,189 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Code patterns that caused problems in the past.
+
+package race_test
+
+import (
+ "testing"
+)
+
+type LogImpl struct {
+ x int
+}
+
+func NewLog() (l LogImpl) {
+ c := make(chan bool)
+ go func() {
+ _ = l
+ c <- true
+ }()
+ l = LogImpl{}
+ <-c
+ return
+}
+
+var _ LogImpl = NewLog()
+
+func MakeMap() map[int]int {
+ return make(map[int]int)
+}
+
+func InstrumentMapLen() {
+ _ = len(MakeMap())
+}
+
+func InstrumentMapLen2() {
+ m := make(map[int]map[int]int)
+ _ = len(m[0])
+}
+
+func InstrumentMapLen3() {
+ m := make(map[int]*map[int]int)
+ _ = len(*m[0])
+}
+
+func TestRaceUnaddressableMapLen(t *testing.T) {
+ m := make(map[int]map[int]int)
+ ch := make(chan int, 1)
+ m[0] = make(map[int]int)
+ go func() {
+ _ = len(m[0])
+ ch <- 0
+ }()
+ m[0][0] = 1
+ <-ch
+}
+
+type Rect struct {
+ x, y int
+}
+
+type Image struct {
+ min, max Rect
+}
+
+//go:noinline
+func NewImage() Image {
+ return Image{}
+}
+
+func AddrOfTemp() {
+ _ = NewImage().min
+}
+
+type TypeID int
+
+func (t *TypeID) encodeType(x int) (tt TypeID, err error) {
+ switch x {
+ case 0:
+ return t.encodeType(x * x)
+ }
+ return 0, nil
+}
+
+type stack []int
+
+func (s *stack) push(x int) {
+ *s = append(*s, x)
+}
+
+func (s *stack) pop() int {
+ i := len(*s)
+ n := (*s)[i-1]
+ *s = (*s)[:i-1]
+ return n
+}
+
+func TestNoRaceStackPushPop(t *testing.T) {
+ var s stack
+ go func(s *stack) {}(&s)
+ s.push(1)
+ x := s.pop()
+ _ = x
+}
+
+type RpcChan struct {
+ c chan bool
+}
+
+var makeChanCalls int
+
+//go:noinline
+func makeChan() *RpcChan {
+ makeChanCalls++
+ c := &RpcChan{make(chan bool, 1)}
+ c.c <- true
+ return c
+}
+
+func call() bool {
+ x := <-makeChan().c
+ return x
+}
+
+func TestNoRaceRpcChan(t *testing.T) {
+ makeChanCalls = 0
+ _ = call()
+ if makeChanCalls != 1 {
+ t.Fatalf("makeChanCalls %d, expected 1\n", makeChanCalls)
+ }
+}
+
+func divInSlice() {
+ v := make([]int64, 10)
+ i := 1
+ _ = v[(i*4)/3]
+}
+
+func TestNoRaceReturn(t *testing.T) {
+ c := make(chan int)
+ noRaceReturn(c)
+ <-c
+}
+
+// Return used to do an implicit a = a, causing a read/write race
+// with the goroutine. The compiler now has an optimization to avoid that.
+// See issue 4014.
+func noRaceReturn(c chan int) (a, b int) {
+ a = 42
+ go func() {
+ _ = a
+ c <- 1
+ }()
+ return a, 10
+}
+
+func issue5431() {
+ var p **inltype
+ if inlinetest(p).x && inlinetest(p).y {
+ } else if inlinetest(p).x || inlinetest(p).y {
+ }
+}
+
+type inltype struct {
+ x, y bool
+}
+
+func inlinetest(p **inltype) *inltype {
+ return *p
+}
+
+type iface interface {
+ Foo() *struct{ b bool }
+}
+
+type Int int
+
+func (i Int) Foo() *struct{ b bool } {
+ return &struct{ b bool }{false}
+}
+
+func TestNoRaceForInfiniteLoop(t *testing.T) {
+ var x Int
+ // interface conversion causes nodes to be put on init list
+ for iface(x).Foo().b {
+ }
+}
diff --git a/src/runtime/race/testdata/rwmutex_test.go b/src/runtime/race/testdata/rwmutex_test.go
new file mode 100644
index 0000000..39219e5
--- /dev/null
+++ b/src/runtime/race/testdata/rwmutex_test.go
@@ -0,0 +1,154 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestRaceMutexRWMutex(t *testing.T) {
+ var mu1 sync.Mutex
+ var mu2 sync.RWMutex
+ var x int16 = 0
+ _ = x
+ ch := make(chan bool, 2)
+ go func() {
+ mu1.Lock()
+ defer mu1.Unlock()
+ x = 1
+ ch <- true
+ }()
+ go func() {
+ mu2.Lock()
+ x = 2
+ mu2.Unlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceRWMutex(t *testing.T) {
+ var mu sync.RWMutex
+ var x, y int64 = 0, 1
+ _ = y
+ ch := make(chan bool, 2)
+ go func() {
+ mu.Lock()
+ defer mu.Unlock()
+ x = 2
+ ch <- true
+ }()
+ go func() {
+ mu.RLock()
+ y = x
+ mu.RUnlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceRWMutexMultipleReaders(t *testing.T) {
+ var mu sync.RWMutex
+ var x, y int64 = 0, 1
+ ch := make(chan bool, 4)
+ go func() {
+ mu.Lock()
+ defer mu.Unlock()
+ x = 2
+ ch <- true
+ }()
+ // Use three readers so that no matter what order they're
+ // scheduled in, two will be on the same side of the write
+ // lock above.
+ go func() {
+ mu.RLock()
+ y = x + 1
+ mu.RUnlock()
+ ch <- true
+ }()
+ go func() {
+ mu.RLock()
+ y = x + 2
+ mu.RUnlock()
+ ch <- true
+ }()
+ go func() {
+ mu.RLock()
+ y = x + 3
+ mu.RUnlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+ <-ch
+ <-ch
+ _ = y
+}
+
+func TestNoRaceRWMutexMultipleReaders(t *testing.T) {
+ var mu sync.RWMutex
+ x := int64(0)
+ ch := make(chan bool, 4)
+ go func() {
+ mu.Lock()
+ defer mu.Unlock()
+ x = 2
+ ch <- true
+ }()
+ go func() {
+ mu.RLock()
+ y := x + 1
+ _ = y
+ mu.RUnlock()
+ ch <- true
+ }()
+ go func() {
+ mu.RLock()
+ y := x + 2
+ _ = y
+ mu.RUnlock()
+ ch <- true
+ }()
+ go func() {
+ mu.RLock()
+ y := x + 3
+ _ = y
+ mu.RUnlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+ <-ch
+ <-ch
+}
+
+func TestNoRaceRWMutexTransitive(t *testing.T) {
+ var mu sync.RWMutex
+ x := int64(0)
+ ch := make(chan bool, 2)
+ go func() {
+ mu.RLock()
+ _ = x
+ mu.RUnlock()
+ ch <- true
+ }()
+ go func() {
+ time.Sleep(1e7)
+ mu.RLock()
+ _ = x
+ mu.RUnlock()
+ ch <- true
+ }()
+ time.Sleep(2e7)
+ mu.Lock()
+ x = 42
+ mu.Unlock()
+ <-ch
+ <-ch
+}
diff --git a/src/runtime/race/testdata/select_test.go b/src/runtime/race/testdata/select_test.go
new file mode 100644
index 0000000..9a43f9b
--- /dev/null
+++ b/src/runtime/race/testdata/select_test.go
@@ -0,0 +1,293 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "runtime"
+ "testing"
+)
+
+func TestNoRaceSelect1(t *testing.T) {
+ var x int
+ _ = x
+ compl := make(chan bool)
+ c := make(chan bool)
+ c1 := make(chan bool)
+
+ go func() {
+ x = 1
+ // At least two channels are needed because
+ // otherwise the compiler optimizes select out.
+ // See comment in runtime/select.go:^func selectgo.
+ select {
+ case c <- true:
+ case c1 <- true:
+ }
+ compl <- true
+ }()
+ select {
+ case <-c:
+ case c1 <- true:
+ }
+ x = 2
+ <-compl
+}
+
+func TestNoRaceSelect2(t *testing.T) {
+ var x int
+ _ = x
+ compl := make(chan bool)
+ c := make(chan bool)
+ c1 := make(chan bool)
+ go func() {
+ select {
+ case <-c:
+ case <-c1:
+ }
+ x = 1
+ compl <- true
+ }()
+ x = 2
+ close(c)
+ runtime.Gosched()
+ <-compl
+}
+
+func TestNoRaceSelect3(t *testing.T) {
+ var x int
+ _ = x
+ compl := make(chan bool)
+ c := make(chan bool, 10)
+ c1 := make(chan bool)
+ go func() {
+ x = 1
+ select {
+ case c <- true:
+ case <-c1:
+ }
+ compl <- true
+ }()
+ <-c
+ x = 2
+ <-compl
+}
+
+func TestNoRaceSelect4(t *testing.T) {
+ type Task struct {
+ f func()
+ done chan bool
+ }
+
+ queue := make(chan Task)
+ dummy := make(chan bool)
+
+ go func() {
+ for {
+ select {
+ case t := <-queue:
+ t.f()
+ t.done <- true
+ }
+ }
+ }()
+
+ doit := func(f func()) {
+ done := make(chan bool, 1)
+ select {
+ case queue <- Task{f, done}:
+ case <-dummy:
+ }
+ select {
+ case <-done:
+ case <-dummy:
+ }
+ }
+
+ var x int
+ doit(func() {
+ x = 1
+ })
+ _ = x
+}
+
+func TestNoRaceSelect5(t *testing.T) {
+ test := func(sel, needSched bool) {
+ var x int
+ _ = x
+ ch := make(chan bool)
+ c1 := make(chan bool)
+
+ done := make(chan bool, 2)
+ go func() {
+ if needSched {
+ runtime.Gosched()
+ }
+ // println(1)
+ x = 1
+ if sel {
+ select {
+ case ch <- true:
+ case <-c1:
+ }
+ } else {
+ ch <- true
+ }
+ done <- true
+ }()
+
+ go func() {
+ // println(2)
+ if sel {
+ select {
+ case <-ch:
+ case <-c1:
+ }
+ } else {
+ <-ch
+ }
+ x = 1
+ done <- true
+ }()
+ <-done
+ <-done
+ }
+
+ test(true, true)
+ test(true, false)
+ test(false, true)
+ test(false, false)
+}
+
+func TestRaceSelect1(t *testing.T) {
+ var x int
+ _ = x
+ compl := make(chan bool, 2)
+ c := make(chan bool)
+ c1 := make(chan bool)
+
+ go func() {
+ <-c
+ <-c
+ }()
+ f := func() {
+ select {
+ case c <- true:
+ case c1 <- true:
+ }
+ x = 1
+ compl <- true
+ }
+ go f()
+ go f()
+ <-compl
+ <-compl
+}
+
+func TestRaceSelect2(t *testing.T) {
+ var x int
+ _ = x
+ compl := make(chan bool)
+ c := make(chan bool)
+ c1 := make(chan bool)
+ go func() {
+ x = 1
+ select {
+ case <-c:
+ case <-c1:
+ }
+ compl <- true
+ }()
+ close(c)
+ x = 2
+ <-compl
+}
+
+func TestRaceSelect3(t *testing.T) {
+ var x int
+ _ = x
+ compl := make(chan bool)
+ c := make(chan bool)
+ c1 := make(chan bool)
+ go func() {
+ x = 1
+ select {
+ case c <- true:
+ case c1 <- true:
+ }
+ compl <- true
+ }()
+ x = 2
+ select {
+ case <-c:
+ }
+ <-compl
+}
+
+func TestRaceSelect4(t *testing.T) {
+ done := make(chan bool, 1)
+ var x int
+ go func() {
+ select {
+ default:
+ x = 2
+ }
+ done <- true
+ }()
+ _ = x
+ <-done
+}
+
+// The idea behind this test: there are two variables; access to one
+// of them is synchronized, access to the other is not.
+// Select must (unconditionally) choose the non-synchronized variable,
+// thus causing exactly one race.
+// Currently this test does not look like it accomplishes that goal.
+func TestRaceSelect5(t *testing.T) {
+ done := make(chan bool, 1)
+ c1 := make(chan bool, 1)
+ c2 := make(chan bool)
+ var x, y int
+ go func() {
+ select {
+ case c1 <- true:
+ x = 1
+ case c2 <- true:
+ y = 1
+ }
+ done <- true
+ }()
+ _ = x
+ _ = y
+ <-done
+}
+
+// Select statements may introduce flakiness: whether this test
+// contains a race depends on the scheduling (some may argue that
+// the code contains this race by definition).
+/*
+func TestFlakyDefault(t *testing.T) {
+ var x int
+ c := make(chan bool, 1)
+ done := make(chan bool, 1)
+ go func() {
+ select {
+ case <-c:
+ x = 2
+ default:
+ x = 3
+ }
+ done <- true
+ }()
+ x = 1
+ c <- true
+ _ = x
+ <-done
+}
+*/
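
A small illustration of the comment in TestNoRaceSelect1 above, not part of the patch: a select with a single (non-default) case behaves like the plain channel operation and is compiled away, which is why these tests always pair the interesting case with a second dummy channel.

	func singleCaseSelectSketch(c chan bool) {
		// This one-case select ...
		select {
		case c <- true:
		}
		// ... is equivalent to the plain send `c <- true`, so it does not
		// exercise the select machinery that the tests above depend on.
	}
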
diff --git a/src/runtime/race/testdata/slice_test.go b/src/runtime/race/testdata/slice_test.go
new file mode 100644
index 0000000..9009a9a
--- /dev/null
+++ b/src/runtime/race/testdata/slice_test.go
@@ -0,0 +1,608 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "sync"
+ "testing"
+)
+
+func TestRaceSliceRW(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]int, 2)
+ go func() {
+ a[1] = 1
+ ch <- true
+ }()
+ _ = a[1]
+ <-ch
+}
+
+func TestNoRaceSliceRW(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]int, 2)
+ go func() {
+ a[0] = 1
+ ch <- true
+ }()
+ _ = a[1]
+ <-ch
+}
+
+func TestRaceSliceWW(t *testing.T) {
+ a := make([]int, 10)
+ ch := make(chan bool, 1)
+ go func() {
+ a[1] = 1
+ ch <- true
+ }()
+ a[1] = 2
+ <-ch
+}
+
+func TestNoRaceArrayWW(t *testing.T) {
+ var a [5]int
+ ch := make(chan bool, 1)
+ go func() {
+ a[0] = 1
+ ch <- true
+ }()
+ a[1] = 2
+ <-ch
+}
+
+func TestRaceArrayWW(t *testing.T) {
+ var a [5]int
+ ch := make(chan bool, 1)
+ go func() {
+ a[1] = 1
+ ch <- true
+ }()
+ a[1] = 2
+ <-ch
+}
+
+func TestNoRaceSliceWriteLen(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]bool, 1)
+ go func() {
+ a[0] = true
+ ch <- true
+ }()
+ _ = len(a)
+ <-ch
+}
+
+func TestNoRaceSliceWriteCap(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]uint64, 100)
+ go func() {
+ a[50] = 123
+ ch <- true
+ }()
+ _ = cap(a)
+ <-ch
+}
+
+func TestRaceSliceCopyRead(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]int, 10)
+ b := make([]int, 10)
+ go func() {
+ _ = a[5]
+ ch <- true
+ }()
+ copy(a, b)
+ <-ch
+}
+
+func TestNoRaceSliceWriteCopy(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]int, 10)
+ b := make([]int, 10)
+ go func() {
+ a[5] = 1
+ ch <- true
+ }()
+ copy(a[:5], b[:5])
+ <-ch
+}
+
+func TestRaceSliceCopyWrite2(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]int, 10)
+ b := make([]int, 10)
+ go func() {
+ b[5] = 1
+ ch <- true
+ }()
+ copy(a, b)
+ <-ch
+}
+
+func TestRaceSliceCopyWrite3(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]byte, 10)
+ go func() {
+ a[7] = 1
+ ch <- true
+ }()
+ copy(a, "qwertyqwerty")
+ <-ch
+}
+
+func TestNoRaceSliceCopyRead(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]int, 10)
+ b := make([]int, 10)
+ go func() {
+ _ = b[5]
+ ch <- true
+ }()
+ copy(a, b)
+ <-ch
+}
+
+func TestRacePointerSliceCopyRead(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]*int, 10)
+ b := make([]*int, 10)
+ go func() {
+ _ = a[5]
+ ch <- true
+ }()
+ copy(a, b)
+ <-ch
+}
+
+func TestNoRacePointerSliceWriteCopy(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]*int, 10)
+ b := make([]*int, 10)
+ go func() {
+ a[5] = new(int)
+ ch <- true
+ }()
+ copy(a[:5], b[:5])
+ <-ch
+}
+
+func TestRacePointerSliceCopyWrite2(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]*int, 10)
+ b := make([]*int, 10)
+ go func() {
+ b[5] = new(int)
+ ch <- true
+ }()
+ copy(a, b)
+ <-ch
+}
+
+func TestNoRacePointerSliceCopyRead(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]*int, 10)
+ b := make([]*int, 10)
+ go func() {
+ _ = b[5]
+ ch <- true
+ }()
+ copy(a, b)
+ <-ch
+}
+
+func TestNoRaceSliceWriteSlice2(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]float64, 10)
+ go func() {
+ a[2] = 1.0
+ ch <- true
+ }()
+ _ = a[0:5]
+ <-ch
+}
+
+func TestRaceSliceWriteSlice(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]float64, 10)
+ go func() {
+ a[2] = 1.0
+ ch <- true
+ }()
+ a = a[5:10]
+ <-ch
+}
+
+func TestNoRaceSliceWriteSlice(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]float64, 10)
+ go func() {
+ a[2] = 1.0
+ ch <- true
+ }()
+ _ = a[5:10]
+ <-ch
+}
+
+func TestNoRaceSliceLenCap(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]struct{}, 10)
+ go func() {
+ _ = len(a)
+ ch <- true
+ }()
+ _ = cap(a)
+ <-ch
+}
+
+func TestNoRaceStructSlicesRangeWrite(t *testing.T) {
+ type Str struct {
+ a []int
+ b []int
+ }
+ ch := make(chan bool, 1)
+ var s Str
+ s.a = make([]int, 10)
+ s.b = make([]int, 10)
+ go func() {
+ for range s.a {
+ }
+ ch <- true
+ }()
+ s.b[5] = 5
+ <-ch
+}
+
+func TestRaceSliceDifferent(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ s2 := s
+ go func() {
+ s[3] = 3
+ c <- true
+ }()
+ // false negative because s2 is PAUTO w/o PHEAP
+ // so we do not instrument it
+ s2[3] = 3
+ <-c
+}
+
+func TestRaceSliceRangeWrite(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ s[3] = 3
+ c <- true
+ }()
+ for _, v := range s {
+ _ = v
+ }
+ <-c
+}
+
+func TestNoRaceSliceRangeWrite(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ s[3] = 3
+ c <- true
+ }()
+ for range s {
+ }
+ <-c
+}
+
+func TestRaceSliceRangeAppend(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ s = append(s, 3)
+ c <- true
+ }()
+ for range s {
+ }
+ <-c
+}
+
+func TestNoRaceSliceRangeAppend(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ _ = append(s, 3)
+ c <- true
+ }()
+ for range s {
+ }
+ <-c
+}
+
+func TestRaceSliceVarWrite(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ s[3] = 3
+ c <- true
+ }()
+ s = make([]int, 20)
+ <-c
+}
+
+func TestRaceSliceVarRead(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ _ = s[3]
+ c <- true
+ }()
+ s = make([]int, 20)
+ <-c
+}
+
+func TestRaceSliceVarRange(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ for range s {
+ }
+ c <- true
+ }()
+ s = make([]int, 20)
+ <-c
+}
+
+func TestRaceSliceVarAppend(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ _ = append(s, 10)
+ c <- true
+ }()
+ s = make([]int, 20)
+ <-c
+}
+
+func TestRaceSliceVarCopy(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ s2 := make([]int, 10)
+ copy(s, s2)
+ c <- true
+ }()
+ s = make([]int, 20)
+ <-c
+}
+
+func TestRaceSliceVarCopy2(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ s2 := make([]int, 10)
+ copy(s2, s)
+ c <- true
+ }()
+ s = make([]int, 20)
+ <-c
+}
+
+func TestRaceSliceAppend(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10, 20)
+ go func() {
+ _ = append(s, 1)
+ c <- true
+ }()
+ _ = append(s, 2)
+ <-c
+}
+
+func TestRaceSliceAppendWrite(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ _ = append(s, 1)
+ c <- true
+ }()
+ s[0] = 42
+ <-c
+}
+
+func TestRaceSliceAppendSlice(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ s2 := make([]int, 10)
+ _ = append(s, s2...)
+ c <- true
+ }()
+ s[0] = 42
+ <-c
+}
+
+func TestRaceSliceAppendSlice2(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ s2foobar := make([]int, 10)
+ go func() {
+ _ = append(s, s2foobar...)
+ c <- true
+ }()
+ s2foobar[5] = 42
+ <-c
+}
+
+func TestRaceSliceAppendString(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]byte, 10)
+ go func() {
+ _ = append(s, "qwerty"...)
+ c <- true
+ }()
+ s[0] = 42
+ <-c
+}
+
+func TestRacePointerSliceAppend(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]*int, 10, 20)
+ go func() {
+ _ = append(s, new(int))
+ c <- true
+ }()
+ _ = append(s, new(int))
+ <-c
+}
+
+func TestRacePointerSliceAppendWrite(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]*int, 10)
+ go func() {
+ _ = append(s, new(int))
+ c <- true
+ }()
+ s[0] = new(int)
+ <-c
+}
+
+func TestRacePointerSliceAppendSlice(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]*int, 10)
+ go func() {
+ s2 := make([]*int, 10)
+ _ = append(s, s2...)
+ c <- true
+ }()
+ s[0] = new(int)
+ <-c
+}
+
+func TestRacePointerSliceAppendSlice2(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]*int, 10)
+ s2foobar := make([]*int, 10)
+ go func() {
+ _ = append(s, s2foobar...)
+ c <- true
+ }()
+ println("WRITE:", &s2foobar[5])
+ s2foobar[5] = nil
+ <-c
+}
+
+func TestNoRaceSliceIndexAccess(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ v := 0
+ go func() {
+ _ = v
+ c <- true
+ }()
+ s[v] = 1
+ <-c
+}
+
+func TestNoRaceSliceIndexAccess2(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ v := 0
+ go func() {
+ _ = v
+ c <- true
+ }()
+ _ = s[v]
+ <-c
+}
+
+func TestRaceSliceIndexAccess(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ v := 0
+ go func() {
+ v = 1
+ c <- true
+ }()
+ s[v] = 1
+ <-c
+}
+
+func TestRaceSliceIndexAccess2(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ v := 0
+ go func() {
+ v = 1
+ c <- true
+ }()
+ _ = s[v]
+ <-c
+}
+
+func TestRaceSliceByteToString(t *testing.T) {
+ c := make(chan string)
+ s := make([]byte, 10)
+ go func() {
+ c <- string(s)
+ }()
+ s[0] = 42
+ <-c
+}
+
+func TestRaceSliceRuneToString(t *testing.T) {
+ c := make(chan string)
+ s := make([]rune, 10)
+ go func() {
+ c <- string(s)
+ }()
+ s[9] = 42
+ <-c
+}
+
+func TestRaceConcatString(t *testing.T) {
+ s := "hello"
+ c := make(chan string, 1)
+ go func() {
+ c <- s + " world"
+ }()
+ s = "world"
+ <-c
+}
+
+func TestRaceCompareString(t *testing.T) {
+ s1 := "hello"
+ s2 := "world"
+ c := make(chan bool, 1)
+ go func() {
+ c <- s1 == s2
+ }()
+ s1 = s2
+ <-c
+}
+
+func TestRaceSlice3(t *testing.T) {
+ done := make(chan bool)
+ x := make([]int, 10)
+ i := 2
+ go func() {
+ i = 3
+ done <- true
+ }()
+ _ = x[:1:i]
+ <-done
+}
+
+var saved string
+
+func TestRaceSlice4(t *testing.T) {
+ // See issue 36794.
+ data := []byte("hello there")
+ var wg sync.WaitGroup
+ wg.Add(1)
+ go func() {
+ _ = string(data)
+ wg.Done()
+ }()
+ copy(data, data[2:])
+ wg.Wait()
+}
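
A hedged summary of the distinction these slice tests exercise, illustrative only and not part of the patch: assigning to the slice variable writes its header (pointer, len, cap), while indexing reads the header and touches one element of the backing array, so a concurrent header write can race with either kind of access, but writes to distinct elements do not race with each other.

	var sketchSlice = make([]int, 10)

	// headerWrite replaces the slice header of the variable.
	func headerWrite() { sketchSlice = make([]int, 20) }

	// elemWrite reads the header and writes one element of the backing array.
	func elemWrite() { sketchSlice[3] = 3 }
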
diff --git a/src/runtime/race/testdata/sync_test.go b/src/runtime/race/testdata/sync_test.go
new file mode 100644
index 0000000..b5fcd6c
--- /dev/null
+++ b/src/runtime/race/testdata/sync_test.go
@@ -0,0 +1,202 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestNoRaceCond(t *testing.T) {
+ x := 0
+ _ = x
+ condition := 0
+ var mu sync.Mutex
+ cond := sync.NewCond(&mu)
+ go func() {
+ x = 1
+ mu.Lock()
+ condition = 1
+ cond.Signal()
+ mu.Unlock()
+ }()
+ mu.Lock()
+ for condition != 1 {
+ cond.Wait()
+ }
+ mu.Unlock()
+ x = 2
+}
+
+func TestRaceCond(t *testing.T) {
+ done := make(chan bool)
+ var mu sync.Mutex
+ cond := sync.NewCond(&mu)
+ x := 0
+ _ = x
+ condition := 0
+ go func() {
+ time.Sleep(10 * time.Millisecond) // Enter cond.Wait loop
+ x = 1
+ mu.Lock()
+ condition = 1
+ cond.Signal()
+ mu.Unlock()
+ time.Sleep(10 * time.Millisecond) // Exit cond.Wait loop
+ mu.Lock()
+ x = 3
+ mu.Unlock()
+ done <- true
+ }()
+ mu.Lock()
+ for condition != 1 {
+ cond.Wait()
+ }
+ mu.Unlock()
+ x = 2
+ <-done
+}
+
+// We do not currently parse this test's output automatically.
+// The intent is that the goroutine creation stack is inspected
+// manually and checked for off-by-one errors.
+func TestRaceAnnounceThreads(t *testing.T) {
+ const N = 7
+ allDone := make(chan bool, N)
+
+ var x int
+ _ = x
+
+ var f, g, h func()
+ f = func() {
+ x = 1
+ go g()
+ go func() {
+ x = 1
+ allDone <- true
+ }()
+ x = 2
+ allDone <- true
+ }
+
+ g = func() {
+ for i := 0; i < 2; i++ {
+ go func() {
+ x = 1
+ allDone <- true
+ }()
+ allDone <- true
+ }
+ }
+
+ h = func() {
+ x = 1
+ x = 2
+ go f()
+ allDone <- true
+ }
+
+ go h()
+
+ for i := 0; i < N; i++ {
+ <-allDone
+ }
+}
+
+func TestNoRaceAfterFunc1(t *testing.T) {
+ i := 2
+ c := make(chan bool)
+ var f func()
+ f = func() {
+ i--
+ if i >= 0 {
+ time.AfterFunc(0, f)
+ } else {
+ c <- true
+ }
+ }
+
+ time.AfterFunc(0, f)
+ <-c
+}
+
+func TestNoRaceAfterFunc2(t *testing.T) {
+ var x int
+ _ = x
+ timer := time.AfterFunc(10, func() {
+ x = 1
+ })
+ defer timer.Stop()
+}
+
+func TestNoRaceAfterFunc3(t *testing.T) {
+ c := make(chan bool, 1)
+ x := 0
+ _ = x
+ time.AfterFunc(1e7, func() {
+ x = 1
+ c <- true
+ })
+ <-c
+}
+
+func TestRaceAfterFunc3(t *testing.T) {
+ c := make(chan bool, 2)
+ x := 0
+ _ = x
+ time.AfterFunc(1e7, func() {
+ x = 1
+ c <- true
+ })
+ time.AfterFunc(2e7, func() {
+ x = 2
+ c <- true
+ })
+ <-c
+ <-c
+}
+
+// This test's output is intended to be
+// observed manually. One should check
+// that the goroutine creation stack is
+// comprehensible.
+func TestRaceGoroutineCreationStack(t *testing.T) {
+ var x int
+ _ = x
+ var ch = make(chan bool, 1)
+
+ f1 := func() {
+ x = 1
+ ch <- true
+ }
+ f2 := func() { go f1() }
+ f3 := func() { go f2() }
+ f4 := func() { go f3() }
+
+ go f4()
+ x = 2
+ <-ch
+}
+
+// A nil pointer in a mutex method call should not
+// corrupt the race detector state.
+// Used to hang indefinitely.
+func TestNoRaceNilMutexCrash(t *testing.T) {
+ var mutex sync.Mutex
+ panics := 0
+ defer func() {
+ if x := recover(); x != nil {
+ mutex.Lock()
+ panics++
+ mutex.Unlock()
+ } else {
+ panic("no panic")
+ }
+ }()
+ var othermutex *sync.RWMutex
+ othermutex.RLock()
+}
diff --git a/src/runtime/race/testdata/waitgroup_test.go b/src/runtime/race/testdata/waitgroup_test.go
new file mode 100644
index 0000000..1693373
--- /dev/null
+++ b/src/runtime/race/testdata/waitgroup_test.go
@@ -0,0 +1,360 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "runtime"
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestNoRaceWaitGroup(t *testing.T) {
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ n := 1
+ for i := 0; i < n; i++ {
+ wg.Add(1)
+ j := i
+ go func() {
+ x = j
+ wg.Done()
+ }()
+ }
+ wg.Wait()
+}
+
+func TestRaceWaitGroup(t *testing.T) {
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ n := 2
+ for i := 0; i < n; i++ {
+ wg.Add(1)
+ j := i
+ go func() {
+ x = j
+ wg.Done()
+ }()
+ }
+ wg.Wait()
+}
+
+func TestNoRaceWaitGroup2(t *testing.T) {
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ wg.Add(1)
+ go func() {
+ x = 1
+ wg.Done()
+ }()
+ wg.Wait()
+ x = 2
+}
+
+// Incrementing the counter in Add and locking wg's mutex does not synchronize the writes to x.
+func TestRaceWaitGroupAsMutex(t *testing.T) {
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ c := make(chan bool, 2)
+ go func() {
+ wg.Wait()
+ time.Sleep(100 * time.Millisecond)
+ wg.Add(+1)
+ x = 1
+ wg.Add(-1)
+ c <- true
+ }()
+ go func() {
+ wg.Wait()
+ time.Sleep(100 * time.Millisecond)
+ wg.Add(+1)
+ x = 2
+ wg.Add(-1)
+ c <- true
+ }()
+ <-c
+ <-c
+}
+
+// Incorrect usage: Add is too late.
+func TestRaceWaitGroupWrongWait(t *testing.T) {
+ c := make(chan bool, 2)
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ go func() {
+ wg.Add(1)
+ runtime.Gosched()
+ x = 1
+ wg.Done()
+ c <- true
+ }()
+ go func() {
+ wg.Add(1)
+ runtime.Gosched()
+ x = 2
+ wg.Done()
+ c <- true
+ }()
+ wg.Wait()
+ <-c
+ <-c
+}
+
+func TestRaceWaitGroupWrongAdd(t *testing.T) {
+ c := make(chan bool, 2)
+ var wg sync.WaitGroup
+ go func() {
+ wg.Add(1)
+ time.Sleep(100 * time.Millisecond)
+ wg.Done()
+ c <- true
+ }()
+ go func() {
+ wg.Add(1)
+ time.Sleep(100 * time.Millisecond)
+ wg.Done()
+ c <- true
+ }()
+ time.Sleep(50 * time.Millisecond)
+ wg.Wait()
+ <-c
+ <-c
+}
+
+func TestNoRaceWaitGroupMultipleWait(t *testing.T) {
+ c := make(chan bool, 2)
+ var wg sync.WaitGroup
+ go func() {
+ wg.Wait()
+ c <- true
+ }()
+ go func() {
+ wg.Wait()
+ c <- true
+ }()
+ wg.Wait()
+ <-c
+ <-c
+}
+
+func TestNoRaceWaitGroupMultipleWait2(t *testing.T) {
+ c := make(chan bool, 2)
+ var wg sync.WaitGroup
+ wg.Add(2)
+ go func() {
+ wg.Done()
+ wg.Wait()
+ c <- true
+ }()
+ go func() {
+ wg.Done()
+ wg.Wait()
+ c <- true
+ }()
+ wg.Wait()
+ <-c
+ <-c
+}
+
+func TestNoRaceWaitGroupMultipleWait3(t *testing.T) {
+ const P = 3
+ var data [P]int
+ done := make(chan bool, P)
+ var wg sync.WaitGroup
+ wg.Add(P)
+ for p := 0; p < P; p++ {
+ go func(p int) {
+ data[p] = 42
+ wg.Done()
+ }(p)
+ }
+ for p := 0; p < P; p++ {
+ go func() {
+ wg.Wait()
+ for p1 := 0; p1 < P; p1++ {
+ _ = data[p1]
+ }
+ done <- true
+ }()
+ }
+ for p := 0; p < P; p++ {
+ <-done
+ }
+}
+
+// Correct usage but still a race
+func TestRaceWaitGroup2(t *testing.T) {
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ wg.Add(2)
+ go func() {
+ x = 1
+ wg.Done()
+ }()
+ go func() {
+ x = 2
+ wg.Done()
+ }()
+ wg.Wait()
+}
+
+func TestNoRaceWaitGroupPanicRecover(t *testing.T) {
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ defer func() {
+ err := recover()
+ if err != "sync: negative WaitGroup counter" {
+ t.Fatalf("Unexpected panic: %#v", err)
+ }
+ x = 2
+ }()
+ x = 1
+ wg.Add(-1)
+}
+
+// TODO: this is actually a panic-synchronization test, not a
+// WaitGroup test. Move it to another *_test file.
+// Is it possible to get a race by synchronizing via panic?
+func TestNoRaceWaitGroupPanicRecover2(t *testing.T) {
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ ch := make(chan bool, 1)
+ var f func() = func() {
+ x = 2
+ ch <- true
+ }
+ go func() {
+ defer func() {
+ err := recover()
+ if err != "sync: negative WaitGroup counter" {
+ }
+ go f()
+ }()
+ x = 1
+ wg.Add(-1)
+ }()
+
+ <-ch
+}
+
+func TestNoRaceWaitGroupTransitive(t *testing.T) {
+ x, y := 0, 0
+ var wg sync.WaitGroup
+ wg.Add(2)
+ go func() {
+ x = 42
+ wg.Done()
+ }()
+ go func() {
+ time.Sleep(1e7)
+ y = 42
+ wg.Done()
+ }()
+ wg.Wait()
+ _ = x
+ _ = y
+}
+
+func TestNoRaceWaitGroupReuse(t *testing.T) {
+ const P = 3
+ var data [P]int
+ var wg sync.WaitGroup
+ for try := 0; try < 3; try++ {
+ wg.Add(P)
+ for p := 0; p < P; p++ {
+ go func(p int) {
+ data[p]++
+ wg.Done()
+ }(p)
+ }
+ wg.Wait()
+ for p := 0; p < P; p++ {
+ data[p]++
+ }
+ }
+}
+
+func TestNoRaceWaitGroupReuse2(t *testing.T) {
+ const P = 3
+ var data [P]int
+ var wg sync.WaitGroup
+ for try := 0; try < 3; try++ {
+ wg.Add(P)
+ for p := 0; p < P; p++ {
+ go func(p int) {
+ data[p]++
+ wg.Done()
+ }(p)
+ }
+ done := make(chan bool)
+ go func() {
+ wg.Wait()
+ for p := 0; p < P; p++ {
+ data[p]++
+ }
+ done <- true
+ }()
+ wg.Wait()
+ <-done
+ for p := 0; p < P; p++ {
+ data[p]++
+ }
+ }
+}
+
+func TestRaceWaitGroupReuse(t *testing.T) {
+ const P = 3
+ const T = 3
+ done := make(chan bool, T)
+ var wg sync.WaitGroup
+ for try := 0; try < T; try++ {
+ var data [P]int
+ wg.Add(P)
+ for p := 0; p < P; p++ {
+ go func(p int) {
+ time.Sleep(50 * time.Millisecond)
+ data[p]++
+ wg.Done()
+ }(p)
+ }
+ go func() {
+ wg.Wait()
+ for p := 0; p < P; p++ {
+ data[p]++
+ }
+ done <- true
+ }()
+ time.Sleep(100 * time.Millisecond)
+ wg.Wait()
+ }
+ for try := 0; try < T; try++ {
+ <-done
+ }
+}
+
+func TestNoRaceWaitGroupConcurrentAdd(t *testing.T) {
+ const P = 4
+ waiting := make(chan bool, P)
+ var wg sync.WaitGroup
+ for p := 0; p < P; p++ {
+ go func() {
+ wg.Add(1)
+ waiting <- true
+ wg.Done()
+ }()
+ }
+ for p := 0; p < P; p++ {
+ <-waiting
+ }
+ wg.Wait()
+}
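
A short sketch, not part of the patch, of the correct counterpart to the "Add is too late" tests above: per the sync.WaitGroup documentation, Add must happen before the goroutine starts so that Wait is guaranteed to observe the counter.

	func noRaceCorrectAddSketch() {
		var wg sync.WaitGroup
		x := 0
		wg.Add(1) // before starting the goroutine
		go func() {
			x = 1
			wg.Done()
		}()
		wg.Wait() // Done is ordered before Wait returns, so this read is safe
		_ = x
	}
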
diff --git a/src/runtime/race/timer_test.go b/src/runtime/race/timer_test.go
new file mode 100644
index 0000000..a6c34a8
--- /dev/null
+++ b/src/runtime/race/timer_test.go
@@ -0,0 +1,33 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build race
+
+package race_test
+
+import (
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestTimers(t *testing.T) {
+ const goroutines = 8
+ var wg sync.WaitGroup
+ wg.Add(goroutines)
+ var mu sync.Mutex
+ for i := 0; i < goroutines; i++ {
+ go func() {
+ defer wg.Done()
+ ticker := time.NewTicker(1)
+ defer ticker.Stop()
+ for c := 0; c < 1000; c++ {
+ <-ticker.C
+ mu.Lock()
+ mu.Unlock()
+ }
+ }()
+ }
+ wg.Wait()
+}
diff --git a/src/runtime/race0.go b/src/runtime/race0.go
new file mode 100644
index 0000000..180f707
--- /dev/null
+++ b/src/runtime/race0.go
@@ -0,0 +1,44 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !race
+
+// Dummy race detection API, used when not built with -race.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+const raceenabled = false
+
+// Because raceenabled is false, none of these functions should be called.
+
+func raceReadObjectPC(t *_type, addr unsafe.Pointer, callerpc, pc uintptr) { throw("race") }
+func raceWriteObjectPC(t *_type, addr unsafe.Pointer, callerpc, pc uintptr) { throw("race") }
+func raceinit() (uintptr, uintptr) { throw("race"); return 0, 0 }
+func racefini() { throw("race") }
+func raceproccreate() uintptr { throw("race"); return 0 }
+func raceprocdestroy(ctx uintptr) { throw("race") }
+func racemapshadow(addr unsafe.Pointer, size uintptr) { throw("race") }
+func racewritepc(addr unsafe.Pointer, callerpc, pc uintptr) { throw("race") }
+func racereadpc(addr unsafe.Pointer, callerpc, pc uintptr) { throw("race") }
+func racereadrangepc(addr unsafe.Pointer, sz, callerpc, pc uintptr) { throw("race") }
+func racewriterangepc(addr unsafe.Pointer, sz, callerpc, pc uintptr) { throw("race") }
+func raceacquire(addr unsafe.Pointer) { throw("race") }
+func raceacquireg(gp *g, addr unsafe.Pointer) { throw("race") }
+func raceacquirectx(racectx uintptr, addr unsafe.Pointer) { throw("race") }
+func racerelease(addr unsafe.Pointer) { throw("race") }
+func racereleaseg(gp *g, addr unsafe.Pointer) { throw("race") }
+func racereleaseacquire(addr unsafe.Pointer) { throw("race") }
+func racereleaseacquireg(gp *g, addr unsafe.Pointer) { throw("race") }
+func racereleasemerge(addr unsafe.Pointer) { throw("race") }
+func racereleasemergeg(gp *g, addr unsafe.Pointer) { throw("race") }
+func racefingo() { throw("race") }
+func racemalloc(p unsafe.Pointer, sz uintptr) { throw("race") }
+func racefree(p unsafe.Pointer, sz uintptr) { throw("race") }
+func racegostart(pc uintptr) uintptr { throw("race"); return 0 }
+func racegoend() { throw("race") }
+func racectxend(racectx uintptr) { throw("race") }
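
A self-contained sketch of the call-site pattern this stub file exists for, under the assumption (not shown in this hunk) that the rest of the runtime guards every race call with the raceenabled constant: since the constant is false in non-race builds, the guarded branch is dead code and the throw("race") stubs above are never reached.

	package main

	import "unsafe"

	const raceenabled = false // mirrors the constant defined above

	// raceacquire stands in for one of the stubs above.
	func raceacquire(p unsafe.Pointer) { panic("race") }

	func main() {
		x := 0
		if raceenabled { // constant-false: the compiler drops this branch
			raceacquire(unsafe.Pointer(&x)) // so the stub is never reached
		}
		_ = x
	}
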
diff --git a/src/runtime/race_amd64.s b/src/runtime/race_amd64.s
new file mode 100644
index 0000000..9818bc6
--- /dev/null
+++ b/src/runtime/race_amd64.s
@@ -0,0 +1,486 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build race
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// The following thunks allow calling the gcc-compiled race runtime directly
+// from Go code without going all the way through cgo.
+// First, it's much faster (up to 50% speedup for real Go programs).
+// Second, it eliminates race-related special cases from cgocall and the scheduler.
+// Third, in the long term it will allow removing the cyclic runtime/race dependency on cmd/go.
+
+// A brief recap of the amd64 calling convention.
+// Arguments are passed in DI, SI, DX, CX, R8, R9, the rest is on stack.
+// Callee-saved registers are: BX, BP, R12-R15.
+// SP must be 16-byte aligned.
+// On Windows:
+// Arguments are passed in CX, DX, R8, R9, the rest is on stack.
+// Callee-saved registers are: BX, BP, DI, SI, R12-R15.
+// SP must be 16-byte aligned. Windows also requires "stack-backing" for the 4 register arguments:
+// https://msdn.microsoft.com/en-us/library/ms235286.aspx
+// We do not do this, because it seems to be intended for vararg/unprototyped functions.
+// Gcc-compiled race runtime does not try to use that space.
+
+#ifdef GOOS_windows
+#define RARG0 CX
+#define RARG1 DX
+#define RARG2 R8
+#define RARG3 R9
+#else
+#define RARG0 DI
+#define RARG1 SI
+#define RARG2 DX
+#define RARG3 CX
+#endif
+
+// func runtime·raceread(addr uintptr)
+// Called from instrumented code.
+// Defined as ABIInternal so as to avoid introducing a wrapper,
+// which would render runtime.getcallerpc ineffective.
+TEXT runtime·raceread<ABIInternal>(SB), NOSPLIT, $0-8
+ MOVQ addr+0(FP), RARG1
+ MOVQ (SP), RARG2
+ // void __tsan_read(ThreadState *thr, void *addr, void *pc);
+ MOVQ $__tsan_read(SB), AX
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceRead(addr uintptr)
+TEXT runtime·RaceRead(SB), NOSPLIT, $0-8
+ // This needs to be a tail call, because raceread reads caller pc.
+ JMP runtime·raceread(SB)
+
+// void runtime·racereadpc(void *addr, void *callpc, void *pc)
+TEXT runtime·racereadpc(SB), NOSPLIT, $0-24
+ MOVQ addr+0(FP), RARG1
+ MOVQ callpc+8(FP), RARG2
+ MOVQ pc+16(FP), RARG3
+ ADDQ $1, RARG3 // pc is function start, tsan wants return address
+ // void __tsan_read_pc(ThreadState *thr, void *addr, void *callpc, void *pc);
+ MOVQ $__tsan_read_pc(SB), AX
+ JMP racecalladdr<>(SB)
+
+// func runtime·racewrite(addr uintptr)
+// Called from instrumented code.
+// Defined as ABIInternal so as to avoid introducing a wrapper,
+// which would render runtime.getcallerpc ineffective.
+TEXT runtime·racewrite<ABIInternal>(SB), NOSPLIT, $0-8
+ MOVQ addr+0(FP), RARG1
+ MOVQ (SP), RARG2
+ // void __tsan_write(ThreadState *thr, void *addr, void *pc);
+ MOVQ $__tsan_write(SB), AX
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceWrite(addr uintptr)
+TEXT runtime·RaceWrite(SB), NOSPLIT, $0-8
+ // This needs to be a tail call, because racewrite reads caller pc.
+ JMP runtime·racewrite(SB)
+
+// void runtime·racewritepc(void *addr, void *callpc, void *pc)
+TEXT runtime·racewritepc(SB), NOSPLIT, $0-24
+ MOVQ addr+0(FP), RARG1
+ MOVQ callpc+8(FP), RARG2
+ MOVQ pc+16(FP), RARG3
+ ADDQ $1, RARG3 // pc is function start, tsan wants return address
+ // void __tsan_write_pc(ThreadState *thr, void *addr, void *callpc, void *pc);
+ MOVQ $__tsan_write_pc(SB), AX
+ JMP racecalladdr<>(SB)
+
+// func runtime·racereadrange(addr, size uintptr)
+// Called from instrumented code.
+TEXT runtime·racereadrange(SB), NOSPLIT, $0-16
+ MOVQ addr+0(FP), RARG1
+ MOVQ size+8(FP), RARG2
+ MOVQ (SP), RARG3
+ // void __tsan_read_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVQ $__tsan_read_range(SB), AX
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceReadRange(addr, size uintptr)
+TEXT runtime·RaceReadRange(SB), NOSPLIT, $0-16
+ // This needs to be a tail call, because racereadrange reads caller pc.
+ JMP runtime·racereadrange(SB)
+
+// void runtime·racereadrangepc1(void *addr, uintptr sz, void *pc)
+TEXT runtime·racereadrangepc1(SB), NOSPLIT, $0-24
+ MOVQ addr+0(FP), RARG1
+ MOVQ size+8(FP), RARG2
+ MOVQ pc+16(FP), RARG3
+ ADDQ $1, RARG3 // pc is function start, tsan wants return address
+ // void __tsan_read_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVQ $__tsan_read_range(SB), AX
+ JMP racecalladdr<>(SB)
+
+// func runtime·racewriterange(addr, size uintptr)
+// Called from instrumented code.
+// Defined as ABIInternal so as to avoid introducing a wrapper,
+// which would render runtime.getcallerpc ineffective.
+TEXT runtime·racewriterange<ABIInternal>(SB), NOSPLIT, $0-16
+ MOVQ addr+0(FP), RARG1
+ MOVQ size+8(FP), RARG2
+ MOVQ (SP), RARG3
+ // void __tsan_write_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVQ $__tsan_write_range(SB), AX
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceWriteRange(addr, size uintptr)
+TEXT runtime·RaceWriteRange(SB), NOSPLIT, $0-16
+ // This needs to be a tail call, because racewriterange reads caller pc.
+ JMP runtime·racewriterange(SB)
+
+// void runtime·racewriterangepc1(void *addr, uintptr sz, void *pc)
+TEXT runtime·racewriterangepc1(SB), NOSPLIT, $0-24
+ MOVQ addr+0(FP), RARG1
+ MOVQ size+8(FP), RARG2
+ MOVQ pc+16(FP), RARG3
+ ADDQ $1, RARG3 // pc is function start, tsan wants return address
+ // void __tsan_write_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVQ $__tsan_write_range(SB), AX
+ JMP racecalladdr<>(SB)
+
+// If addr (RARG1) is out of range, do nothing.
+// Otherwise, set up the goroutine context and invoke racecall. The other arguments are already set.
+TEXT racecalladdr<>(SB), NOSPLIT, $0-0
+ get_tls(R12)
+ MOVQ g(R12), R14
+ MOVQ g_racectx(R14), RARG0 // goroutine context
+ // Check that addr is within [arenastart, arenaend) or within [racedatastart, racedataend).
+ CMPQ RARG1, runtime·racearenastart(SB)
+ JB data
+ CMPQ RARG1, runtime·racearenaend(SB)
+ JB call
+data:
+ CMPQ RARG1, runtime·racedatastart(SB)
+ JB ret
+ CMPQ RARG1, runtime·racedataend(SB)
+ JAE ret
+call:
+ MOVQ AX, AX // w/o this 6a miscompiles this function
+ JMP racecall<>(SB)
+ret:
+ RET
+
+// func runtime·racefuncenterfp(fp uintptr)
+// Called from instrumented code.
+// Like racefuncenter but passes FP, not PC
+TEXT runtime·racefuncenterfp(SB), NOSPLIT, $0-8
+ MOVQ fp+0(FP), R11
+ MOVQ -8(R11), R11
+ JMP racefuncenter<>(SB)
+
+// func runtime·racefuncenter(pc uintptr)
+// Called from instrumented code.
+TEXT runtime·racefuncenter(SB), NOSPLIT, $0-8
+ MOVQ callpc+0(FP), R11
+ JMP racefuncenter<>(SB)
+
+// Common code for racefuncenter/racefuncenterfp
+// R11 = caller's return address
+TEXT racefuncenter<>(SB), NOSPLIT, $0-0
+ MOVQ DX, R15 // save function entry context (for closures)
+ get_tls(R12)
+ MOVQ g(R12), R14
+ MOVQ g_racectx(R14), RARG0 // goroutine context
+ MOVQ R11, RARG1
+ // void __tsan_func_enter(ThreadState *thr, void *pc);
+ MOVQ $__tsan_func_enter(SB), AX
+ // racecall<> preserves R15
+ CALL racecall<>(SB)
+ MOVQ R15, DX // restore function entry context
+ RET
+
+// func runtime·racefuncexit()
+// Called from instrumented code.
+TEXT runtime·racefuncexit(SB), NOSPLIT, $0-0
+ get_tls(R12)
+ MOVQ g(R12), R14
+ MOVQ g_racectx(R14), RARG0 // goroutine context
+ // void __tsan_func_exit(ThreadState *thr);
+ MOVQ $__tsan_func_exit(SB), AX
+ JMP racecall<>(SB)
+
+// Atomic operations for sync/atomic package.
+
+// Load
+TEXT sync∕atomic·LoadInt32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ MOVQ $__tsan_go_atomic32_load(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·LoadInt64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ MOVQ $__tsan_go_atomic64_load(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·LoadUint32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ JMP sync∕atomic·LoadInt32(SB)
+
+TEXT sync∕atomic·LoadUint64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+TEXT sync∕atomic·LoadUintptr(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+TEXT sync∕atomic·LoadPointer(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+// Store
+TEXT sync∕atomic·StoreInt32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ MOVQ $__tsan_go_atomic32_store(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·StoreInt64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ MOVQ $__tsan_go_atomic64_store(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·StoreUint32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ JMP sync∕atomic·StoreInt32(SB)
+
+TEXT sync∕atomic·StoreUint64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·StoreInt64(SB)
+
+TEXT sync∕atomic·StoreUintptr(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·StoreInt64(SB)
+
+// Swap
+TEXT sync∕atomic·SwapInt32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ MOVQ $__tsan_go_atomic32_exchange(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·SwapInt64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ MOVQ $__tsan_go_atomic64_exchange(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·SwapUint32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ JMP sync∕atomic·SwapInt32(SB)
+
+TEXT sync∕atomic·SwapUint64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·SwapInt64(SB)
+
+TEXT sync∕atomic·SwapUintptr(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·SwapInt64(SB)
+
+// Add
+TEXT sync∕atomic·AddInt32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ MOVQ $__tsan_go_atomic32_fetch_add(SB), AX
+ CALL racecallatomic<>(SB)
+ MOVL add+8(FP), AX // convert fetch_add to add_fetch
+ ADDL AX, ret+16(FP)
+ RET
+
+TEXT sync∕atomic·AddInt64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ MOVQ $__tsan_go_atomic64_fetch_add(SB), AX
+ CALL racecallatomic<>(SB)
+ MOVQ add+8(FP), AX // convert fetch_add to add_fetch
+ ADDQ AX, ret+16(FP)
+ RET
+
+TEXT sync∕atomic·AddUint32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ JMP sync∕atomic·AddInt32(SB)
+
+TEXT sync∕atomic·AddUint64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·AddInt64(SB)
+
+TEXT sync∕atomic·AddUintptr(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·AddInt64(SB)
+
+// CompareAndSwap
+TEXT sync∕atomic·CompareAndSwapInt32(SB), NOSPLIT, $0-17
+ GO_ARGS
+ MOVQ $__tsan_go_atomic32_compare_exchange(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·CompareAndSwapInt64(SB), NOSPLIT, $0-25
+ GO_ARGS
+ MOVQ $__tsan_go_atomic64_compare_exchange(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·CompareAndSwapUint32(SB), NOSPLIT, $0-17
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt32(SB)
+
+TEXT sync∕atomic·CompareAndSwapUint64(SB), NOSPLIT, $0-25
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt64(SB)
+
+TEXT sync∕atomic·CompareAndSwapUintptr(SB), NOSPLIT, $0-25
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt64(SB)
+
+// Generic atomic operation implementation.
+// AX already contains the target function.
+TEXT racecallatomic<>(SB), NOSPLIT, $0-0
+ // Trigger SIGSEGV early.
+ MOVQ 16(SP), R12
+ MOVL (R12), R13
+ // Check that addr is within [arenastart, arenaend) or within [racedatastart, racedataend).
+ CMPQ R12, runtime·racearenastart(SB)
+ JB racecallatomic_data
+ CMPQ R12, runtime·racearenaend(SB)
+ JB racecallatomic_ok
+racecallatomic_data:
+ CMPQ R12, runtime·racedatastart(SB)
+ JB racecallatomic_ignore
+ CMPQ R12, runtime·racedataend(SB)
+ JAE racecallatomic_ignore
+racecallatomic_ok:
+ // Addr is within the good range, call the atomic function.
+ get_tls(R12)
+ MOVQ g(R12), R14
+ MOVQ g_racectx(R14), RARG0 // goroutine context
+ MOVQ 8(SP), RARG1 // caller pc
+ MOVQ (SP), RARG2 // pc
+ LEAQ 16(SP), RARG3 // arguments
+ JMP racecall<>(SB) // does not return
+racecallatomic_ignore:
+ // Addr is outside the good range.
+ // Call __tsan_go_ignore_sync_begin to ignore synchronization during the atomic op.
+ // An attempt to synchronize on the address would cause crash.
+ MOVQ AX, R15 // remember the original function
+ MOVQ $__tsan_go_ignore_sync_begin(SB), AX
+ get_tls(R12)
+ MOVQ g(R12), R14
+ MOVQ g_racectx(R14), RARG0 // goroutine context
+ CALL racecall<>(SB)
+ MOVQ R15, AX // restore the original function
+ // Call the atomic function.
+ MOVQ g_racectx(R14), RARG0 // goroutine context
+ MOVQ 8(SP), RARG1 // caller pc
+ MOVQ (SP), RARG2 // pc
+ LEAQ 16(SP), RARG3 // arguments
+ CALL racecall<>(SB)
+ // Call __tsan_go_ignore_sync_end.
+ MOVQ $__tsan_go_ignore_sync_end(SB), AX
+ MOVQ g_racectx(R14), RARG0 // goroutine context
+ JMP racecall<>(SB)
+
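The [arenastart, arenaend) / [racedatastart, racedataend) test performed by racecallatomic can be summarized by the following Go sketch (the runtime globals are passed as plain parameters here purely for illustration):

	package racesketch

	// addrIsShadowed reports whether tsan has shadow memory for addr,
	// i.e. whether it lies in the Go race arena or in the data/bss range.
	// If it does not (for example a C-allocated address), the thunk
	// brackets the atomic with __tsan_go_ignore_sync_begin/end instead.
	func addrIsShadowed(addr, arenaStart, arenaEnd, dataStart, dataEnd uintptr) bool {
		if addr >= arenaStart && addr < arenaEnd {
			return true
		}
		return addr >= dataStart && addr < dataEnd
	}
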
+// void runtime·racecall(void(*f)(...), ...)
+// Calls C function f from race runtime and passes up to 4 arguments to it.
+// The arguments are never heap-object-preserving pointers, so we pretend there are no arguments.
+TEXT runtime·racecall(SB), NOSPLIT, $0-0
+ MOVQ fn+0(FP), AX
+ MOVQ arg0+8(FP), RARG0
+ MOVQ arg1+16(FP), RARG1
+ MOVQ arg2+24(FP), RARG2
+ MOVQ arg3+32(FP), RARG3
+ JMP racecall<>(SB)
+
+// Switches SP to g0 stack and calls (AX). Arguments already set.
+TEXT racecall<>(SB), NOSPLIT, $0-0
+ get_tls(R12)
+ MOVQ g(R12), R14
+ MOVQ g_m(R14), R13
+ // Switch to g0 stack.
+ MOVQ SP, R12 // callee-saved, preserved across the CALL
+ MOVQ m_g0(R13), R10
+ CMPQ R10, R14
+ JE call // already on g0
+ MOVQ (g_sched+gobuf_sp)(R10), SP
+call:
+ ANDQ $~15, SP // alignment for gcc ABI
+ CALL AX
+ MOVQ R12, SP
+ RET
+
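racecall's job is only stack bookkeeping around the C call. In rough Go-shaped pseudocode (a sketch: Go code cannot actually manipulate the hardware SP like this, and the names are illustrative):

	package racesketch

	// racecallSketch: save SP in a callee-saved register, switch to the
	// g0 (system) stack unless already on it, align SP where the port
	// requires it (ANDQ $~15, SP on amd64), call the tsan function, then
	// restore the original SP.
	func racecallSketch(fn func(), sp, g0sp uintptr, onG0 bool) uintptr {
		saved := sp // kept in R12 (amd64), R19 (arm64) or R16 (ppc64le)
		if !onG0 {
			sp = g0sp // m->g0->sched.sp
		}
		sp &^= 15 // 16-byte alignment for the C ABI
		_ = sp    // (only the assembly really updates SP)
		fn()      // CALL AX / BL R9 / BL (CTR)
		return saved // the real code restores SP from the saved register
	}
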
+// C->Go callback thunk that allows calling runtime·racesymbolize from C code.
+// A direct Go->C race call only switches SP; finish the g->g0 switch by setting the correct g.
+// The overall effect of the Go->C->Go call chain is similar to that of mcall.
+// RARG0 contains command code. RARG1 contains command-specific context.
+// See racecallback for command codes.
+TEXT runtime·racecallbackthunk(SB), NOSPLIT, $56-8
+ // Handle command raceGetProcCmd (0) here.
+ // First, code below assumes that we are on curg, while raceGetProcCmd
+ // can be executed on g0. Second, it is called frequently, so will
+ // benefit from this fast path.
+ CMPQ RARG0, $0
+ JNE rest
+ get_tls(RARG0)
+ MOVQ g(RARG0), RARG0
+ MOVQ g_m(RARG0), RARG0
+ MOVQ m_p(RARG0), RARG0
+ MOVQ p_raceprocctx(RARG0), RARG0
+ MOVQ RARG0, (RARG1)
+ RET
+
+rest:
+ // Save callee-saved registers (Go code won't respect that).
+	// This is a superset of the darwin/linux/windows register sets.
+ PUSHQ BX
+ PUSHQ BP
+ PUSHQ DI
+ PUSHQ SI
+ PUSHQ R12
+ PUSHQ R13
+ PUSHQ R14
+ PUSHQ R15
+ // Set g = g0.
+ get_tls(R12)
+ MOVQ g(R12), R13
+ MOVQ g_m(R13), R14
+ MOVQ m_g0(R14), R15
+ CMPQ R13, R15
+ JEQ noswitch // branch if already on g0
+ MOVQ R15, g(R12) // g = m->g0
+ PUSHQ RARG1 // func arg
+ PUSHQ RARG0 // func arg
+ CALL runtime·racecallback(SB)
+ POPQ R12
+ POPQ R12
+ // All registers are smashed after Go code, reload.
+ get_tls(R12)
+ MOVQ g(R12), R13
+ MOVQ g_m(R13), R13
+ MOVQ m_curg(R13), R14
+ MOVQ R14, g(R12) // g = m->curg
+ret:
+ // Restore callee-saved registers.
+ POPQ R15
+ POPQ R14
+ POPQ R13
+ POPQ R12
+ POPQ SI
+ POPQ DI
+ POPQ BP
+ POPQ BX
+ RET
+
+noswitch:
+ // already on g0
+ PUSHQ RARG1 // func arg
+ PUSHQ RARG0 // func arg
+ CALL runtime·racecallback(SB)
+ POPQ R12
+ POPQ R12
+ JMP ret
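
The thunk's structure corresponds, roughly, to the following Go sketch of the command dispatch (raceGetProcCmd matches the runtime's command 0; everything else here is illustrative):

	package racesketch

	const raceGetProcCmd = 0 // matches the runtime's command numbering

	// racecallbackSketch: command 0 is answered inline in the assembly
	// because it may arrive on g0 and is hot; every other command is
	// forwarded to runtime.racecallback after switching to g0.
	func racecallbackSketch(cmd uintptr, ctx *uintptr, curProcCtx uintptr,
		callbackOnG0 func(cmd uintptr, ctx *uintptr)) {
		if cmd == raceGetProcCmd {
			*ctx = curProcCtx // report p.raceprocctx of the current P
			return
		}
		// Slow path: save the C callee-saved registers, set g = m.g0,
		// call runtime.racecallback, then restore g = m.curg.
		callbackOnG0(cmd, ctx)
	}
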
diff --git a/src/runtime/race_arm64.s b/src/runtime/race_arm64.s
new file mode 100644
index 0000000..8aa1774
--- /dev/null
+++ b/src/runtime/race_arm64.s
@@ -0,0 +1,498 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build race
+
+#include "go_asm.h"
+#include "funcdata.h"
+#include "textflag.h"
+#include "tls_arm64.h"
+
+// The following thunks allow calling the gcc-compiled race runtime directly
+// from Go code without going all the way through cgo.
+// First, it's much faster (up to 50% speedup for real Go programs).
+// Second, it eliminates race-related special cases from cgocall and scheduler.
+// Third, in the long term it will allow removing the cyclic runtime/race dependency on cmd/go.
+
+// A brief recap of the arm64 calling convention.
+// Arguments are passed in R0...R7; the rest go on the stack.
+// Callee-saved registers are: R19...R28.
+// Temporary registers are: R9...R15.
+// SP must be 16-byte aligned.
+
+// When calling racecalladdr, R9 is the call target address.
+
+// The race ctx, ThreadState *thr below, is passed in R0 and loaded in racecalladdr.
+
+// Darwin may return unaligned thread pointer. Align it. (See tls_arm64.s)
+// No-op on other OSes.
+#ifdef TLS_darwin
+#define TP_ALIGN AND $~7, R0
+#else
+#define TP_ALIGN
+#endif
+
+// Load g from TLS. (See tls_arm64.s)
+#define load_g \
+ MRS_TPIDR_R0 \
+ TP_ALIGN \
+ MOVD runtime·tls_g(SB), R11 \
+ MOVD (R0)(R11), g
+
+// func runtime·raceread(addr uintptr)
+// Called from instrumented code.
+TEXT runtime·raceread(SB), NOSPLIT, $0-8
+ MOVD addr+0(FP), R1
+ MOVD LR, R2
+ // void __tsan_read(ThreadState *thr, void *addr, void *pc);
+ MOVD $__tsan_read(SB), R9
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceRead(addr uintptr)
+TEXT runtime·RaceRead(SB), NOSPLIT, $0-8
+ // This needs to be a tail call, because raceread reads caller pc.
+ JMP runtime·raceread(SB)
+
+// func runtime·racereadpc(void *addr, void *callpc, void *pc)
+TEXT runtime·racereadpc(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R1
+ MOVD callpc+8(FP), R2
+ MOVD pc+16(FP), R3
+ // void __tsan_read_pc(ThreadState *thr, void *addr, void *callpc, void *pc);
+ MOVD $__tsan_read_pc(SB), R9
+ JMP racecalladdr<>(SB)
+
+// func runtime·racewrite(addr uintptr)
+// Called from instrumented code.
+TEXT runtime·racewrite(SB), NOSPLIT, $0-8
+ MOVD addr+0(FP), R1
+ MOVD LR, R2
+ // void __tsan_write(ThreadState *thr, void *addr, void *pc);
+ MOVD $__tsan_write(SB), R9
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceWrite(addr uintptr)
+TEXT runtime·RaceWrite(SB), NOSPLIT, $0-8
+ // This needs to be a tail call, because racewrite reads caller pc.
+ JMP runtime·racewrite(SB)
+
+// func runtime·racewritepc(void *addr, void *callpc, void *pc)
+TEXT runtime·racewritepc(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R1
+ MOVD callpc+8(FP), R2
+ MOVD pc+16(FP), R3
+ // void __tsan_write_pc(ThreadState *thr, void *addr, void *callpc, void *pc);
+ MOVD $__tsan_write_pc(SB), R9
+ JMP racecalladdr<>(SB)
+
+// func runtime·racereadrange(addr, size uintptr)
+// Called from instrumented code.
+TEXT runtime·racereadrange(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), R1
+ MOVD size+8(FP), R2
+ MOVD LR, R3
+ // void __tsan_read_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_read_range(SB), R9
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceReadRange(addr, size uintptr)
+TEXT runtime·RaceReadRange(SB), NOSPLIT, $0-16
+ // This needs to be a tail call, because racereadrange reads caller pc.
+ JMP runtime·racereadrange(SB)
+
+// func runtime·racereadrangepc1(void *addr, uintptr sz, void *pc)
+TEXT runtime·racereadrangepc1(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R1
+ MOVD size+8(FP), R2
+ MOVD pc+16(FP), R3
+ ADD $4, R3 // pc is function start, tsan wants return address.
+ // void __tsan_read_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_read_range(SB), R9
+ JMP racecalladdr<>(SB)
+
+// func runtime·racewriterange(addr, size uintptr)
+// Called from instrumented code.
+TEXT runtime·racewriterange(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), R1
+ MOVD size+8(FP), R2
+ MOVD LR, R3
+ // void __tsan_write_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_write_range(SB), R9
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceWriteRange(addr, size uintptr)
+TEXT runtime·RaceWriteRange(SB), NOSPLIT, $0-16
+ // This needs to be a tail call, because racewriterange reads caller pc.
+ JMP runtime·racewriterange(SB)
+
+// func runtime·racewriterangepc1(void *addr, uintptr sz, void *pc)
+TEXT runtime·racewriterangepc1(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R1
+ MOVD size+8(FP), R2
+ MOVD pc+16(FP), R3
+ ADD $4, R3 // pc is function start, tsan wants return address.
+ // void __tsan_write_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_write_range(SB), R9
+ JMP racecalladdr<>(SB)
+
+// If addr (R1) is out of range, do nothing.
+// Otherwise, set up the goroutine context and invoke racecall. Other arguments are already set.
+TEXT racecalladdr<>(SB), NOSPLIT, $0-0
+ load_g
+ MOVD g_racectx(g), R0
+ // Check that addr is within [arenastart, arenaend) or within [racedatastart, racedataend).
+ MOVD runtime·racearenastart(SB), R10
+ CMP R10, R1
+ BLT data
+ MOVD runtime·racearenaend(SB), R10
+ CMP R10, R1
+ BLT call
+data:
+ MOVD runtime·racedatastart(SB), R10
+ CMP R10, R1
+ BLT ret
+ MOVD runtime·racedataend(SB), R10
+ CMP R10, R1
+ BGT ret
+call:
+ JMP racecall<>(SB)
+ret:
+ RET
+
+// func runtime·racefuncenterfp(fp uintptr)
+// Called from instrumented code.
+// Like racefuncenter but doesn't pass an arg; it uses the caller pc
+// from the first slot on the stack.
+TEXT runtime·racefuncenterfp(SB), NOSPLIT, $0-0
+ MOVD 0(RSP), R9
+ JMP racefuncenter<>(SB)
+
+// func runtime·racefuncenter(pc uintptr)
+// Called from instrumented code.
+TEXT runtime·racefuncenter(SB), NOSPLIT, $0-8
+ MOVD callpc+0(FP), R9
+ JMP racefuncenter<>(SB)
+
+// Common code for racefuncenter/racefuncenterfp
+// R9 = caller's return address
+TEXT racefuncenter<>(SB), NOSPLIT, $0-0
+ load_g
+ MOVD g_racectx(g), R0 // goroutine racectx
+ MOVD R9, R1
+ // void __tsan_func_enter(ThreadState *thr, void *pc);
+ MOVD $__tsan_func_enter(SB), R9
+ BL racecall<>(SB)
+ RET
+
+// func runtime·racefuncexit()
+// Called from instrumented code.
+TEXT runtime·racefuncexit(SB), NOSPLIT, $0-0
+ load_g
+ MOVD g_racectx(g), R0 // race context
+ // void __tsan_func_exit(ThreadState *thr);
+ MOVD $__tsan_func_exit(SB), R9
+ JMP racecall<>(SB)
+
+// Atomic operations for sync/atomic package.
+// R3 = addr of arguments passed to this function; it can
+// be fetched at 40(RSP) in racecallatomic after the two BL calls
+// R0, R1, R2 set in racecallatomic
+
+// Load
+TEXT sync∕atomic·LoadInt32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_load(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·LoadInt64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_load(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·LoadUint32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ JMP sync∕atomic·LoadInt32(SB)
+
+TEXT sync∕atomic·LoadUint64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+TEXT sync∕atomic·LoadUintptr(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+TEXT sync∕atomic·LoadPointer(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+// Store
+TEXT sync∕atomic·StoreInt32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_store(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·StoreInt64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_store(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·StoreUint32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ JMP sync∕atomic·StoreInt32(SB)
+
+TEXT sync∕atomic·StoreUint64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·StoreInt64(SB)
+
+TEXT sync∕atomic·StoreUintptr(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·StoreInt64(SB)
+
+// Swap
+TEXT sync∕atomic·SwapInt32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_exchange(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·SwapInt64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_exchange(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·SwapUint32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ JMP sync∕atomic·SwapInt32(SB)
+
+TEXT sync∕atomic·SwapUint64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·SwapInt64(SB)
+
+TEXT sync∕atomic·SwapUintptr(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·SwapInt64(SB)
+
+// Add
+TEXT sync∕atomic·AddInt32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_fetch_add(SB), R9
+ BL racecallatomic<>(SB)
+ MOVW add+8(FP), R0 // convert fetch_add to add_fetch
+ MOVW ret+16(FP), R1
+ ADD R0, R1, R0
+ MOVW R0, ret+16(FP)
+ RET
+
+TEXT sync∕atomic·AddInt64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_fetch_add(SB), R9
+ BL racecallatomic<>(SB)
+ MOVD add+8(FP), R0 // convert fetch_add to add_fetch
+ MOVD ret+16(FP), R1
+ ADD R0, R1, R0
+ MOVD R0, ret+16(FP)
+ RET
+
+TEXT sync∕atomic·AddUint32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ JMP sync∕atomic·AddInt32(SB)
+
+TEXT sync∕atomic·AddUint64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·AddInt64(SB)
+
+TEXT sync∕atomic·AddUintptr(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·AddInt64(SB)
+
+// CompareAndSwap
+TEXT sync∕atomic·CompareAndSwapInt32(SB), NOSPLIT, $0-17
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_compare_exchange(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·CompareAndSwapInt64(SB), NOSPLIT, $0-25
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_compare_exchange(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·CompareAndSwapUint32(SB), NOSPLIT, $0-17
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt32(SB)
+
+TEXT sync∕atomic·CompareAndSwapUint64(SB), NOSPLIT, $0-25
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt64(SB)
+
+TEXT sync∕atomic·CompareAndSwapUintptr(SB), NOSPLIT, $0-25
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt64(SB)
+
+// Generic atomic operation implementation.
+// R9 = addr of target function
+TEXT racecallatomic<>(SB), NOSPLIT, $0
+ // Set up these registers
+ // R0 = *ThreadState
+ // R1 = caller pc
+ // R2 = pc
+ // R3 = addr of incoming arg list
+
+ // Trigger SIGSEGV early.
+	MOVD 40(RSP), R3 // 1st arg is addr; after the two BLs it is at 40(RSP)
+ MOVD (R3), R13 // segv here if addr is bad
+ // Check that addr is within [arenastart, arenaend) or within [racedatastart, racedataend).
+ MOVD runtime·racearenastart(SB), R10
+ CMP R10, R3
+ BLT racecallatomic_data
+ MOVD runtime·racearenaend(SB), R10
+ CMP R10, R3
+ BLT racecallatomic_ok
+racecallatomic_data:
+ MOVD runtime·racedatastart(SB), R10
+ CMP R10, R3
+ BLT racecallatomic_ignore
+ MOVD runtime·racedataend(SB), R10
+ CMP R10, R3
+ BGE racecallatomic_ignore
+racecallatomic_ok:
+ // Addr is within the good range, call the atomic function.
+ load_g
+ MOVD g_racectx(g), R0 // goroutine context
+ MOVD 16(RSP), R1 // caller pc
+ MOVD R9, R2 // pc
+ ADD $40, RSP, R3
+ JMP racecall<>(SB) // does not return
+racecallatomic_ignore:
+ // Addr is outside the good range.
+ // Call __tsan_go_ignore_sync_begin to ignore synchronization during the atomic op.
+ // An attempt to synchronize on the address would cause crash.
+ MOVD R9, R20 // remember the original function
+ MOVD $__tsan_go_ignore_sync_begin(SB), R9
+ load_g
+ MOVD g_racectx(g), R0 // goroutine context
+ BL racecall<>(SB)
+ MOVD R20, R9 // restore the original function
+ // Call the atomic function.
+ // racecall will call LLVM race code which might clobber R28 (g)
+ load_g
+ MOVD g_racectx(g), R0 // goroutine context
+ MOVD 16(RSP), R1 // caller pc
+ MOVD R9, R2 // pc
+ ADD $40, RSP, R3 // arguments
+ BL racecall<>(SB)
+ // Call __tsan_go_ignore_sync_end.
+ MOVD $__tsan_go_ignore_sync_end(SB), R9
+ MOVD g_racectx(g), R0 // goroutine context
+ BL racecall<>(SB)
+ RET
+
+// func runtime·racecall(void(*f)(...), ...)
+// Calls C function f from race runtime and passes up to 4 arguments to it.
+// The arguments are never heap-object-preserving pointers, so we pretend there are no arguments.
+TEXT runtime·racecall(SB), NOSPLIT, $0-0
+ MOVD fn+0(FP), R9
+ MOVD arg0+8(FP), R0
+ MOVD arg1+16(FP), R1
+ MOVD arg2+24(FP), R2
+ MOVD arg3+32(FP), R3
+ JMP racecall<>(SB)
+
+// Switches SP to g0 stack and calls (R9). Arguments already set.
+TEXT racecall<>(SB), NOSPLIT, $0-0
+ MOVD g_m(g), R10
+ // Switch to g0 stack.
+ MOVD RSP, R19 // callee-saved, preserved across the CALL
+ MOVD m_g0(R10), R11
+ CMP R11, g
+ BEQ call // already on g0
+ MOVD (g_sched+gobuf_sp)(R11), R12
+ MOVD R12, RSP
+call:
+ BL R9
+ MOVD R19, RSP
+ RET
+
+// C->Go callback thunk that allows calling runtime·racesymbolize from C code.
+// A direct Go->C race call only switches SP; finish the g->g0 switch by setting the correct g.
+// The overall effect of the Go->C->Go call chain is similar to that of mcall.
+// R0 contains command code. R1 contains command-specific context.
+// See racecallback for command codes.
+TEXT runtime·racecallbackthunk(SB), NOSPLIT|NOFRAME, $0
+ // Handle command raceGetProcCmd (0) here.
+ // First, code below assumes that we are on curg, while raceGetProcCmd
+ // can be executed on g0. Second, it is called frequently, so will
+ // benefit from this fast path.
+ CBNZ R0, rest
+ MOVD g, R13
+#ifdef TLS_darwin
+ MOVD R27, R12 // save R27 a.k.a. REGTMP (callee-save in C). load_g clobbers it
+#endif
+ load_g
+#ifdef TLS_darwin
+ MOVD R12, R27
+#endif
+ MOVD g_m(g), R0
+ MOVD m_p(R0), R0
+ MOVD p_raceprocctx(R0), R0
+ MOVD R0, (R1)
+ MOVD R13, g
+ JMP (LR)
+rest:
+ // Save callee-saved registers (Go code won't respect that).
+ // 8(RSP) and 16(RSP) are for args passed through racecallback
+ SUB $112, RSP
+ MOVD LR, 0(RSP)
+ STP (R19, R20), 24(RSP)
+ STP (R21, R22), 40(RSP)
+ STP (R23, R24), 56(RSP)
+ STP (R25, R26), 72(RSP)
+ STP (R27, g), 88(RSP)
+ // Set g = g0.
+ // load_g will clobber R0, Save R0
+ MOVD R0, R13
+ load_g
+ // restore R0
+ MOVD R13, R0
+ MOVD g_m(g), R13
+ MOVD m_g0(R13), R14
+ CMP R14, g
+ BEQ noswitch // branch if already on g0
+ MOVD R14, g
+
+ MOVD R0, 8(RSP) // func arg
+ MOVD R1, 16(RSP) // func arg
+ BL runtime·racecallback(SB)
+
+ // All registers are smashed after Go code, reload.
+ MOVD g_m(g), R13
+ MOVD m_curg(R13), g // g = m->curg
+ret:
+ // Restore callee-saved registers.
+ MOVD 0(RSP), LR
+ LDP 24(RSP), (R19, R20)
+ LDP 40(RSP), (R21, R22)
+ LDP 56(RSP), (R23, R24)
+ LDP 72(RSP), (R25, R26)
+ LDP 88(RSP), (R27, g)
+ ADD $112, RSP
+ JMP (LR)
+
+noswitch:
+ // already on g0
+ MOVD R0, 8(RSP) // func arg
+ MOVD R1, 16(RSP) // func arg
+ BL runtime·racecallback(SB)
+ JMP ret
+
+#ifndef TLSG_IS_VARIABLE
+// tls_g, g value for each thread in TLS
+GLOBL runtime·tls_g+0(SB), TLSBSS+DUPOK, $8
+#endif
diff --git a/src/runtime/race_ppc64le.s b/src/runtime/race_ppc64le.s
new file mode 100644
index 0000000..8961254
--- /dev/null
+++ b/src/runtime/race_ppc64le.s
@@ -0,0 +1,608 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build race
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+#include "asm_ppc64x.h"
+
+// The following functions allow calling the clang-compiled race runtime directly
+// from Go code without going all the way through cgo.
+// First, it's much faster (up to 50% speedup for real Go programs).
+// Second, it eliminates race-related special cases from cgocall and scheduler.
+// Third, in the long term it will allow removing the cyclic runtime/race dependency on cmd/go.
+
+// A brief recap of the ppc64le calling convention.
+// Arguments are passed in R3, R4, R5 ...
+// SP must be 16-byte aligned.
+
+// Note that for ppc64x, LLVM follows the standard ABI and
+// expects arguments in registers, so these functions move
+// the arguments from storage to the registers expected
+// by the ABI.
+
+// When calling from Go to Clang tsan code:
+// R3 is the 1st argument and is usually the ThreadState*
+// R4-? are the 2nd, 3rd, 4th, etc. arguments
+
+// When calling racecalladdr:
+// R8 is the call target address
+
+// The race ctx is passed in R3 and loaded in
+// racecalladdr.
+//
+// The sequence used to get the race ctx:
+// MOVD runtime·tls_g(SB), R10 // offset to TLS
+// MOVD 0(R13)(R10*1), g // R13=TLS for this thread, g = R30
+// MOVD g_racectx(g), R3 // racectx == ThreadState
+
+// func runtime·RaceRead(addr uintptr)
+// Called from instrumented Go code
+TEXT runtime·raceread(SB), NOSPLIT, $0-8
+ MOVD addr+0(FP), R4
+	MOVD LR, R5 // caller pc (return address set by the BL into raceread)
+ // void __tsan_read(ThreadState *thr, void *addr, void *pc);
+ MOVD $__tsan_read(SB), R8
+ BR racecalladdr<>(SB)
+
+TEXT runtime·RaceRead(SB), NOSPLIT, $0-8
+ BR runtime·raceread(SB)
+
+// void runtime·racereadpc(void *addr, void *callpc, void *pc)
+TEXT runtime·racereadpc(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R4
+ MOVD callpc+8(FP), R5
+ MOVD pc+16(FP), R6
+ // void __tsan_read_pc(ThreadState *thr, void *addr, void *callpc, void *pc);
+ MOVD $__tsan_read_pc(SB), R8
+ BR racecalladdr<>(SB)
+
+// func runtime·RaceWrite(addr uintptr)
+// Called from instrumented Go code
+TEXT runtime·racewrite(SB), NOSPLIT, $0-8
+ MOVD addr+0(FP), R4
+ MOVD LR, R5 // caller has set LR via BL inst
+ // void __tsan_write(ThreadState *thr, void *addr, void *pc);
+ MOVD $__tsan_write(SB), R8
+ BR racecalladdr<>(SB)
+
+TEXT runtime·RaceWrite(SB), NOSPLIT, $0-8
+ JMP runtime·racewrite(SB)
+
+// void runtime·racewritepc(void *addr, void *callpc, void *pc)
+TEXT runtime·racewritepc(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R4
+ MOVD callpc+8(FP), R5
+ MOVD pc+16(FP), R6
+ // void __tsan_write_pc(ThreadState *thr, void *addr, void *callpc, void *pc);
+ MOVD $__tsan_write_pc(SB), R8
+ BR racecalladdr<>(SB)
+
+// func runtime·RaceReadRange(addr, size uintptr)
+// Called from instrumented Go code.
+TEXT runtime·racereadrange(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), R4
+ MOVD size+8(FP), R5
+ MOVD LR, R6
+ // void __tsan_read_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_read_range(SB), R8
+ BR racecalladdr<>(SB)
+
+// void runtime·racereadrangepc1(void *addr, uintptr sz, void *pc)
+TEXT runtime·racereadrangepc1(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R4
+ MOVD size+8(FP), R5
+ MOVD pc+16(FP), R6
+ ADD $4, R6 // tsan wants return addr
+ // void __tsan_read_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_read_range(SB), R8
+ BR racecalladdr<>(SB)
+
+TEXT runtime·RaceReadRange(SB), NOSPLIT, $0-16
+ BR runtime·racereadrange(SB)
+
+// func runtime·RaceWriteRange(addr, size uintptr)
+// Called from instrumented Go code.
+TEXT runtime·racewriterange(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), R4
+ MOVD size+8(FP), R5
+ MOVD LR, R6
+ // void __tsan_write_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_write_range(SB), R8
+ BR racecalladdr<>(SB)
+
+TEXT runtime·RaceWriteRange(SB), NOSPLIT, $0-16
+ BR runtime·racewriterange(SB)
+
+// void runtime·racewriterangepc1(void *addr, uintptr sz, void *pc)
+// Called from instrumented Go code
+TEXT runtime·racewriterangepc1(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R4
+ MOVD size+8(FP), R5
+ MOVD pc+16(FP), R6
+	ADD $4, R6 // pc is function start; tsan wants the return address
+ // void __tsan_write_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_write_range(SB), R8
+ BR racecalladdr<>(SB)
+
+// Call a __tsan function from Go code.
+// R8 = tsan function address
+// R3 = *ThreadState a.k.a. g_racectx from g
+// R4 = addr passed to __tsan function
+//
+// If addr (R4) is out of range, do nothing; otherwise set up the goroutine context and invoke racecall. Other arguments are already set.
+TEXT racecalladdr<>(SB), NOSPLIT, $0-0
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R13)(R10*1), g
+ MOVD g_racectx(g), R3 // goroutine context
+ // Check that addr is within [arenastart, arenaend) or within [racedatastart, racedataend).
+ MOVD runtime·racearenastart(SB), R9
+ CMP R4, R9
+ BLT data
+ MOVD runtime·racearenaend(SB), R9
+ CMP R4, R9
+ BLT call
+data:
+ MOVD runtime·racedatastart(SB), R9
+ CMP R4, R9
+ BLT ret
+ MOVD runtime·racedataend(SB), R9
+ CMP R4, R9
+ BGT ret
+call:
+ // Careful!! racecall will save LR on its
+ // stack, which is OK as long as racecalladdr
+	// doesn't change in a way that generates a stack frame.
+	// racecall should return to the caller of
+	// racecalladdr.
+ BR racecall<>(SB)
+ret:
+ RET
+
+// func runtime·racefuncenterfp()
+// Called from instrumented Go code.
+// Like racefuncenter but doesn't pass an arg, uses the caller pc
+// from the first slot on the stack.
+TEXT runtime·racefuncenterfp(SB), NOSPLIT, $0-0
+ MOVD 0(R1), R8
+ BR racefuncenter<>(SB)
+
+// func runtime·racefuncenter(pc uintptr)
+// Called from instrumented Go code.
+// Not used now since gc/racewalk.go doesn't pass the
+// correct caller pc; racefuncenterfp can recover it instead.
+TEXT runtime·racefuncenter(SB), NOSPLIT, $0-8
+ MOVD callpc+0(FP), R8
+ BR racefuncenter<>(SB)
+
+// Common code for racefuncenter/racefuncenterfp
+// R8 = caller's return address
+TEXT racefuncenter<>(SB), NOSPLIT, $0-0
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R13)(R10*1), g
+ MOVD g_racectx(g), R3 // goroutine racectx aka *ThreadState
+ MOVD R8, R4 // caller pc set by caller in R8
+ // void __tsan_func_enter(ThreadState *thr, void *pc);
+ MOVD $__tsan_func_enter(SB), R8
+ BR racecall<>(SB)
+ RET
+
+// func runtime·racefuncexit()
+// Called from Go instrumented code.
+TEXT runtime·racefuncexit(SB), NOSPLIT, $0-0
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R13)(R10*1), g
+ MOVD g_racectx(g), R3 // goroutine racectx aka *ThreadState
+ // void __tsan_func_exit(ThreadState *thr);
+ MOVD $__tsan_func_exit(SB), R8
+ BR racecall<>(SB)
+
+// Atomic operations for sync/atomic package.
+// Some use the __tsan versions instead
+// R6 = addr of arguments passed to this function
+// R3, R4, R5 set in racecallatomic
+
+// Load atomic in tsan
+TEXT sync∕atomic·LoadInt32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ // void __tsan_go_atomic32_load(ThreadState *thr, uptr cpc, uptr pc, u8 *a);
+ MOVD $__tsan_go_atomic32_load(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·LoadInt64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ // void __tsan_go_atomic64_load(ThreadState *thr, uptr cpc, uptr pc, u8 *a);
+ MOVD $__tsan_go_atomic64_load(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·LoadUint32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ BR sync∕atomic·LoadInt32(SB)
+
+TEXT sync∕atomic·LoadUint64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ BR sync∕atomic·LoadInt64(SB)
+
+TEXT sync∕atomic·LoadUintptr(SB), NOSPLIT, $0-16
+ GO_ARGS
+ BR sync∕atomic·LoadInt64(SB)
+
+TEXT sync∕atomic·LoadPointer(SB), NOSPLIT, $0-16
+ GO_ARGS
+ BR sync∕atomic·LoadInt64(SB)
+
+// Store atomic in tsan
+TEXT sync∕atomic·StoreInt32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ // void __tsan_go_atomic32_store(ThreadState *thr, uptr cpc, uptr pc, u8 *a);
+ MOVD $__tsan_go_atomic32_store(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+
+TEXT sync∕atomic·StoreInt64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ // void __tsan_go_atomic64_store(ThreadState *thr, uptr cpc, uptr pc, u8 *a);
+ MOVD $__tsan_go_atomic64_store(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+
+TEXT sync∕atomic·StoreUint32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ BR sync∕atomic·StoreInt32(SB)
+
+TEXT sync∕atomic·StoreUint64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ BR sync∕atomic·StoreInt64(SB)
+
+TEXT sync∕atomic·StoreUintptr(SB), NOSPLIT, $0-16
+ GO_ARGS
+ BR sync∕atomic·StoreInt64(SB)
+
+// Swap in tsan
+TEXT sync∕atomic·SwapInt32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ // void __tsan_go_atomic32_exchange(ThreadState *thr, uptr cpc, uptr pc, u8 *a);
+ MOVD $__tsan_go_atomic32_exchange(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+
+TEXT sync∕atomic·SwapInt64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ // void __tsan_go_atomic64_exchange(ThreadState *thr, uptr cpc, uptr pc, u8 *a)
+ MOVD $__tsan_go_atomic64_exchange(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+
+TEXT sync∕atomic·SwapUint32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ BR sync∕atomic·SwapInt32(SB)
+
+TEXT sync∕atomic·SwapUint64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ BR sync∕atomic·SwapInt64(SB)
+
+TEXT sync∕atomic·SwapUintptr(SB), NOSPLIT, $0-24
+ GO_ARGS
+ BR sync∕atomic·SwapInt64(SB)
+
+// Add atomic in tsan
+TEXT sync∕atomic·AddInt32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ // void __tsan_go_atomic32_fetch_add(ThreadState *thr, uptr cpc, uptr pc, u8 *a);
+ MOVD $__tsan_go_atomic32_fetch_add(SB), R8
+ ADD $64, R1, R6 // addr of caller's 1st arg
+ BL racecallatomic<>(SB)
+ // The tsan fetch_add result is not as expected by Go,
+ // so the 'add' must be added to the result.
+	MOVW add+8(FP), R3 // The tsan fetch_add does not return the
+	MOVW ret+16(FP), R4 // result as expected by Go, so fix it.
+ ADD R3, R4, R3
+ MOVW R3, ret+16(FP)
+ RET
+
+TEXT sync∕atomic·AddInt64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ // void __tsan_go_atomic64_fetch_add(ThreadState *thr, uptr cpc, uptr pc, u8 *a);
+ MOVD $__tsan_go_atomic64_fetch_add(SB), R8
+ ADD $64, R1, R6 // addr of caller's 1st arg
+ BL racecallatomic<>(SB)
+ // The tsan fetch_add result is not as expected by Go,
+ // so the 'add' must be added to the result.
+ MOVD add+8(FP), R3
+ MOVD ret+16(FP), R4
+ ADD R3, R4, R3
+ MOVD R3, ret+16(FP)
+ RET
+
+TEXT sync∕atomic·AddUint32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ BR sync∕atomic·AddInt32(SB)
+
+TEXT sync∕atomic·AddUint64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ BR sync∕atomic·AddInt64(SB)
+
+TEXT sync∕atomic·AddUintptr(SB), NOSPLIT, $0-24
+ GO_ARGS
+ BR sync∕atomic·AddInt64(SB)
+
+// CompareAndSwap in tsan
+TEXT sync∕atomic·CompareAndSwapInt32(SB), NOSPLIT, $0-17
+ GO_ARGS
+ // void __tsan_go_atomic32_compare_exchange(
+ // ThreadState *thr, uptr cpc, uptr pc, u8 *a)
+ MOVD $__tsan_go_atomic32_compare_exchange(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+
+TEXT sync∕atomic·CompareAndSwapInt64(SB), NOSPLIT, $0-25
+ GO_ARGS
+ // void __tsan_go_atomic32_compare_exchange(
+ // ThreadState *thr, uptr cpc, uptr pc, u8 *a)
+ MOVD $__tsan_go_atomic64_compare_exchange(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+
+TEXT sync∕atomic·CompareAndSwapUint32(SB), NOSPLIT, $0-17
+ GO_ARGS
+ BR sync∕atomic·CompareAndSwapInt32(SB)
+
+TEXT sync∕atomic·CompareAndSwapUint64(SB), NOSPLIT, $0-25
+ GO_ARGS
+ BR sync∕atomic·CompareAndSwapInt64(SB)
+
+TEXT sync∕atomic·CompareAndSwapUintptr(SB), NOSPLIT, $0-25
+ GO_ARGS
+ BR sync∕atomic·CompareAndSwapInt64(SB)
+
+// Common function used to call tsan's atomic functions
+// R3 = *ThreadState
+// R4 = TODO: What's this supposed to be?
+// R5 = caller pc
+// R6 = addr of incoming arg list
+// R8 contains addr of target function.
+TEXT racecallatomic<>(SB), NOSPLIT, $0-0
+ // Trigger SIGSEGV early if address passed to atomic function is bad.
+ MOVD (R6), R7 // 1st arg is addr
+ MOVD (R7), R9 // segv here if addr is bad
+ // Check that addr is within [arenastart, arenaend) or within [racedatastart, racedataend).
+ MOVD runtime·racearenastart(SB), R9
+ CMP R7, R9
+ BLT racecallatomic_data
+ MOVD runtime·racearenaend(SB), R9
+ CMP R7, R9
+ BLT racecallatomic_ok
+racecallatomic_data:
+ MOVD runtime·racedatastart(SB), R9
+ CMP R7, R9
+ BLT racecallatomic_ignore
+ MOVD runtime·racedataend(SB), R9
+ CMP R7, R9
+ BGE racecallatomic_ignore
+racecallatomic_ok:
+ // Addr is within the good range, call the atomic function.
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R13)(R10*1), g
+ MOVD g_racectx(g), R3 // goroutine racectx aka *ThreadState
+ MOVD R8, R5 // pc is the function called
+ MOVD (R1), R4 // caller pc from stack
+ BL racecall<>(SB) // BL needed to maintain stack consistency
+ RET //
+racecallatomic_ignore:
+ // Addr is outside the good range.
+ // Call __tsan_go_ignore_sync_begin to ignore synchronization during the atomic op.
+ // An attempt to synchronize on the address would cause crash.
+ MOVD R8, R15 // save the original function
+ MOVD R6, R17 // save the original arg list addr
+ MOVD $__tsan_go_ignore_sync_begin(SB), R8 // func addr to call
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R13)(R10*1), g
+ MOVD g_racectx(g), R3 // goroutine context
+ BL racecall<>(SB)
+ MOVD R15, R8 // restore the original function
+ MOVD R17, R6 // restore arg list addr
+ // Call the atomic function.
+ // racecall will call LLVM race code which might clobber r30 (g)
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R13)(R10*1), g
+
+ MOVD g_racectx(g), R3
+ MOVD R8, R4 // pc being called same TODO as above
+ MOVD (R1), R5 // caller pc from latest LR
+ BL racecall<>(SB)
+ // Call __tsan_go_ignore_sync_end.
+ MOVD $__tsan_go_ignore_sync_end(SB), R8
+	MOVD g_racectx(g), R3 // goroutine context; g should still be good
+ BL racecall<>(SB)
+ RET
+
+// void runtime·racecall(void(*f)(...), ...)
+// Calls C function f from race runtime and passes up to 4 arguments to it.
+// The arguments are never heap-object-preserving pointers, so we pretend there are no arguments.
+TEXT runtime·racecall(SB), NOSPLIT, $0-0
+ MOVD fn+0(FP), R8
+ MOVD arg0+8(FP), R3
+ MOVD arg1+16(FP), R4
+ MOVD arg2+24(FP), R5
+ MOVD arg3+32(FP), R6
+ JMP racecall<>(SB)
+
+// Finds g0 and switches to its stack.
+// Arguments were already loaded for the Go-to-C call.
+TEXT racecall<>(SB), NOSPLIT, $0-0
+ // Set the LR slot for the ppc64 ABI
+ MOVD LR, R10
+ MOVD R10, 0(R1) // Go expectation
+ MOVD R10, 16(R1) // C ABI
+ // Get info from the current goroutine
+ MOVD runtime·tls_g(SB), R10 // g offset in TLS
+ MOVD 0(R13)(R10*1), g // R13 = current TLS
+ MOVD g_m(g), R7 // m for g
+ MOVD R1, R16 // callee-saved, preserved across C call
+ MOVD m_g0(R7), R10 // g0 for m
+ CMP R10, g // same g0?
+ BEQ call // already on g0
+ MOVD (g_sched+gobuf_sp)(R10), R1 // switch R1
+call:
+ MOVD R8, CTR // R8 = caller addr
+ MOVD R8, R12 // expected by PPC64 ABI
+ BL (CTR)
+ XOR R0, R0 // clear R0 on return from Clang
+ MOVD R16, R1 // restore R1; R16 nonvol in Clang
+ MOVD runtime·tls_g(SB), R10 // find correct g
+ MOVD 0(R13)(R10*1), g
+ MOVD 16(R1), R10 // LR was saved away, restore for return
+ MOVD R10, LR
+ RET
+
+// C->Go callback thunk that allows calling runtime·racesymbolize from C code.
+// A direct Go->C race call only switches SP; finish the g->g0 switch by setting the correct g.
+// The overall effect of the Go->C->Go call chain is similar to that of mcall.
+// RARG0 contains command code. RARG1 contains command-specific context.
+// See racecallback for command codes.
+TEXT runtime·racecallbackthunk(SB), NOSPLIT, $-8
+ // Handle command raceGetProcCmd (0) here.
+ // First, code below assumes that we are on curg, while raceGetProcCmd
+ // can be executed on g0. Second, it is called frequently, so will
+ // benefit from this fast path.
+ XOR R0, R0 // clear R0 since we came from C code
+ CMP R3, $0
+ BNE rest
+ // g0 TODO: Don't modify g here since R30 is nonvolatile
+ MOVD g, R9
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R13)(R10*1), g
+ MOVD g_m(g), R3
+ MOVD m_p(R3), R3
+ MOVD p_raceprocctx(R3), R3
+ MOVD R3, (R4)
+	MOVD R9, g // restore g (R30)
+ RET
+
+ // This is all similar to what cgo does
+ // Save registers according to the ppc64 ABI
+rest:
+ MOVD LR, R10 // save link register
+ MOVD R10, 16(R1)
+ MOVW CR, R10
+ MOVW R10, 8(R1)
+ MOVDU R1, -336(R1) // Allocate frame needed for outargs and register save area
+
+ MOVD R14, 328(R1)
+ MOVD R15, 48(R1)
+ MOVD R16, 56(R1)
+ MOVD R17, 64(R1)
+ MOVD R18, 72(R1)
+ MOVD R19, 80(R1)
+ MOVD R20, 88(R1)
+ MOVD R21, 96(R1)
+ MOVD R22, 104(R1)
+ MOVD R23, 112(R1)
+ MOVD R24, 120(R1)
+ MOVD R25, 128(R1)
+ MOVD R26, 136(R1)
+ MOVD R27, 144(R1)
+ MOVD R28, 152(R1)
+ MOVD R29, 160(R1)
+ MOVD g, 168(R1) // R30
+ MOVD R31, 176(R1)
+ FMOVD F14, 184(R1)
+ FMOVD F15, 192(R1)
+ FMOVD F16, 200(R1)
+ FMOVD F17, 208(R1)
+ FMOVD F18, 216(R1)
+ FMOVD F19, 224(R1)
+ FMOVD F20, 232(R1)
+ FMOVD F21, 240(R1)
+ FMOVD F22, 248(R1)
+ FMOVD F23, 256(R1)
+ FMOVD F24, 264(R1)
+ FMOVD F25, 272(R1)
+ FMOVD F26, 280(R1)
+ FMOVD F27, 288(R1)
+ FMOVD F28, 296(R1)
+ FMOVD F29, 304(R1)
+ FMOVD F30, 312(R1)
+ FMOVD F31, 320(R1)
+
+ MOVD R3, FIXED_FRAME+0(R1)
+ MOVD R4, FIXED_FRAME+8(R1)
+
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R13)(R10*1), g
+
+ MOVD g_m(g), R7
+ MOVD m_g0(R7), R8
+ CMP g, R8
+ BEQ noswitch
+
+ MOVD R8, g // set g = m-> g0
+
+ BL runtime·racecallback(SB)
+
+ // All registers are clobbered after Go code, reload.
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R13)(R10*1), g
+
+ MOVD g_m(g), R7
+ MOVD m_curg(R7), g // restore g = m->curg
+
+ret:
+ MOVD 328(R1), R14
+ MOVD 48(R1), R15
+ MOVD 56(R1), R16
+ MOVD 64(R1), R17
+ MOVD 72(R1), R18
+ MOVD 80(R1), R19
+ MOVD 88(R1), R20
+ MOVD 96(R1), R21
+ MOVD 104(R1), R22
+ MOVD 112(R1), R23
+ MOVD 120(R1), R24
+ MOVD 128(R1), R25
+ MOVD 136(R1), R26
+ MOVD 144(R1), R27
+ MOVD 152(R1), R28
+ MOVD 160(R1), R29
+ MOVD 168(R1), g // R30
+ MOVD 176(R1), R31
+ FMOVD 184(R1), F14
+ FMOVD 192(R1), F15
+ FMOVD 200(R1), F16
+ FMOVD 208(R1), F17
+ FMOVD 216(R1), F18
+ FMOVD 224(R1), F19
+ FMOVD 232(R1), F20
+ FMOVD 240(R1), F21
+ FMOVD 248(R1), F22
+ FMOVD 256(R1), F23
+ FMOVD 264(R1), F24
+ FMOVD 272(R1), F25
+ FMOVD 280(R1), F26
+ FMOVD 288(R1), F27
+ FMOVD 296(R1), F28
+ FMOVD 304(R1), F29
+ FMOVD 312(R1), F30
+ FMOVD 320(R1), F31
+
+ ADD $336, R1
+ MOVD 8(R1), R10
+ MOVFL R10, $0xff // Restore of CR
+ MOVD 16(R1), R10 // needed?
+ MOVD R10, LR
+ RET
+
+noswitch:
+ BL runtime·racecallback(SB)
+ JMP ret
+
+// tls_g, g value for each thread in TLS
+GLOBL runtime·tls_g+0(SB), TLSBSS+DUPOK, $8
diff --git a/src/runtime/rand_test.go b/src/runtime/rand_test.go
new file mode 100644
index 0000000..1b84c79
--- /dev/null
+++ b/src/runtime/rand_test.go
@@ -0,0 +1,45 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ . "runtime"
+ "strconv"
+ "testing"
+)
+
+func BenchmarkFastrand(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ Fastrand()
+ }
+ })
+}
+
+func BenchmarkFastrandHashiter(b *testing.B) {
+ var m = make(map[int]int, 10)
+ for i := 0; i < 10; i++ {
+ m[i] = i
+ }
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ for range m {
+ break
+ }
+ }
+ })
+}
+
+var sink32 uint32
+
+func BenchmarkFastrandn(b *testing.B) {
+ for n := uint32(2); n <= 5; n++ {
+ b.Run(strconv.Itoa(int(n)), func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ sink32 = Fastrandn(n)
+ }
+ })
+ }
+}
diff --git a/src/runtime/rdebug.go b/src/runtime/rdebug.go
new file mode 100644
index 0000000..1b213f1
--- /dev/null
+++ b/src/runtime/rdebug.go
@@ -0,0 +1,22 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import _ "unsafe" // for go:linkname
+
+//go:linkname setMaxStack runtime/debug.setMaxStack
+func setMaxStack(in int) (out int) {
+ out = int(maxstacksize)
+ maxstacksize = uintptr(in)
+ return out
+}
+
+//go:linkname setPanicOnFault runtime/debug.setPanicOnFault
+func setPanicOnFault(new bool) (old bool) {
+ _g_ := getg()
+ old = _g_.paniconfault
+ _g_.paniconfault = new
+ return old
+}
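
These two linknamed helpers back the public runtime/debug.SetMaxStack and runtime/debug.SetPanicOnFault APIs. A small, self-contained usage sketch (the deliberate fault at the end is for illustration only):

	package main

	import (
		"fmt"
		"runtime/debug"
		"unsafe"
	)

	func main() {
		// SetMaxStack returns the previous limit, mirroring setMaxStack above.
		prev := debug.SetMaxStack(32 << 20) // 32 MiB per-goroutine stack limit
		fmt.Println("previous stack limit:", prev)

		// With SetPanicOnFault(true), a fault at an unexpected (non-nil)
		// address becomes a recoverable panic on this goroutine instead
		// of crashing the whole program.
		debug.SetPanicOnFault(true)
		defer func() {
			if r := recover(); r != nil {
				fmt.Println("recovered from fault:", r)
			}
		}()
		p := (*int)(unsafe.Pointer(uintptr(1))) // deliberately invalid address
		_ = *p
	}
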
diff --git a/src/runtime/relax_stub.go b/src/runtime/relax_stub.go
new file mode 100644
index 0000000..81ed129
--- /dev/null
+++ b/src/runtime/relax_stub.go
@@ -0,0 +1,17 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !windows
+
+package runtime
+
+// osRelaxMinNS is the number of nanoseconds of idleness to tolerate
+// without performing an osRelax. Since osRelax may reduce the
+// precision of timers, this should be enough larger than the relaxed
+// timer precision to keep the timer error acceptable.
+const osRelaxMinNS = 0
+
+// osRelax is called by the scheduler when transitioning to and from
+// all Ps being idle.
+func osRelax(relax bool) {}
diff --git a/src/runtime/rt0_aix_ppc64.s b/src/runtime/rt0_aix_ppc64.s
new file mode 100644
index 0000000..e06caa1
--- /dev/null
+++ b/src/runtime/rt0_aix_ppc64.s
@@ -0,0 +1,199 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// _rt0_ppc64_aix is a function descriptor of the entrypoint function
+// __start. This name is needed by cmd/link.
+DATA _rt0_ppc64_aix+0(SB)/8, $__start<>(SB)
+DATA _rt0_ppc64_aix+8(SB)/8, $TOC(SB)
+GLOBL _rt0_ppc64_aix(SB), NOPTR, $16
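
On AIX (and ELFv1 ppc64), a "function pointer" such as _rt0_ppc64_aix or main points at a descriptor rather than at code. As a Go-shaped sketch of the layout laid out by the DATA/GLOBL directives above (field names are illustrative):

	package rt0sketch

	// funcDesc mirrors an AIX/XCOFF function descriptor.
	type funcDesc struct {
		entry uintptr // code address (__start<>, _main, ...)
		toc   uintptr // TOC base to load into R2 for the callee
		env   uintptr // environment pointer; _rt0_ppc64_aix only materializes the first two words
	}
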
+
+
+// The starting function must return into the loader to
+// initialize some libraries, especially libthread, which
+// creates the main thread and sets up TLS in R13.
+// R19 contains a function descriptor for the loader function
+// that needs to be called.
+// This code is similar to the __start function in C.
+TEXT __start<>(SB),NOSPLIT,$-8
+ XOR R0, R0
+ MOVD $libc___n_pthreads(SB), R4
+ MOVD 0(R4), R4
+ MOVD $libc___mod_init(SB), R5
+ MOVD 0(R5), R5
+ MOVD 0(R19), R0
+ MOVD R2, 40(R1)
+ MOVD 8(R19), R2
+ MOVD R18, R3
+ MOVD R0, CTR
+ BL (CTR) // Return to AIX loader
+
+ // Launch rt0_go
+ MOVD 40(R1), R2
+ MOVD R14, R3 // argc
+ MOVD R15, R4 // argv
+ BL _main(SB)
+
+
+DATA main+0(SB)/8, $_main(SB)
+DATA main+8(SB)/8, $TOC(SB)
+DATA main+16(SB)/8, $0
+GLOBL main(SB), NOPTR, $24
+
+TEXT _main(SB),NOSPLIT,$-8
+ MOVD $runtime·rt0_go(SB), R12
+ MOVD R12, CTR
+ BR (CTR)
+
+
+TEXT _rt0_ppc64_aix_lib(SB),NOSPLIT,$-8
+ // Start with standard C stack frame layout and linkage.
+ MOVD LR, R0
+ MOVD R0, 16(R1) // Save LR in caller's frame.
+ MOVW CR, R0 // Save CR in caller's frame
+ MOVD R0, 8(R1)
+
+ MOVDU R1, -344(R1) // Allocate frame.
+
+ // Preserve callee-save registers.
+ MOVD R14, 48(R1)
+ MOVD R15, 56(R1)
+ MOVD R16, 64(R1)
+ MOVD R17, 72(R1)
+ MOVD R18, 80(R1)
+ MOVD R19, 88(R1)
+ MOVD R20, 96(R1)
+ MOVD R21,104(R1)
+ MOVD R22, 112(R1)
+ MOVD R23, 120(R1)
+ MOVD R24, 128(R1)
+ MOVD R25, 136(R1)
+ MOVD R26, 144(R1)
+ MOVD R27, 152(R1)
+ MOVD R28, 160(R1)
+ MOVD R29, 168(R1)
+ MOVD g, 176(R1) // R30
+ MOVD R31, 184(R1)
+ FMOVD F14, 192(R1)
+ FMOVD F15, 200(R1)
+ FMOVD F16, 208(R1)
+ FMOVD F17, 216(R1)
+ FMOVD F18, 224(R1)
+ FMOVD F19, 232(R1)
+ FMOVD F20, 240(R1)
+ FMOVD F21, 248(R1)
+ FMOVD F22, 256(R1)
+ FMOVD F23, 264(R1)
+ FMOVD F24, 272(R1)
+ FMOVD F25, 280(R1)
+ FMOVD F26, 288(R1)
+ FMOVD F27, 296(R1)
+ FMOVD F28, 304(R1)
+ FMOVD F29, 312(R1)
+ FMOVD F30, 320(R1)
+ FMOVD F31, 328(R1)
+
+ // Synchronous initialization.
+ MOVD $runtime·reginit(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+
+ MOVBZ runtime·isarchive(SB), R3 // Check buildmode = c-archive
+ CMP $0, R3
+ BEQ done
+
+ MOVD R14, _rt0_ppc64_aix_lib_argc<>(SB)
+ MOVD R15, _rt0_ppc64_aix_lib_argv<>(SB)
+
+ MOVD $runtime·libpreinit(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVD _cgo_sys_thread_create(SB), R12
+ CMP $0, R12
+ BEQ nocgo
+ MOVD $_rt0_ppc64_aix_lib_go(SB), R3
+ MOVD $0, R4
+ MOVD R2, 40(R1)
+ MOVD 8(R12), R2
+ MOVD (R12), R12
+ MOVD R12, CTR
+ BL (CTR)
+ MOVD 40(R1), R2
+ BR done
+
+nocgo:
+ MOVD $0x800000, R12 // stacksize = 8192KB
+ MOVD R12, 8(R1)
+ MOVD $_rt0_ppc64_aix_lib_go(SB), R12
+ MOVD R12, 16(R1)
+ MOVD $runtime·newosproc0(SB),R12
+ MOVD R12, CTR
+ BL (CTR)
+
+done:
+ // Restore saved registers.
+ MOVD 48(R1), R14
+ MOVD 56(R1), R15
+ MOVD 64(R1), R16
+ MOVD 72(R1), R17
+ MOVD 80(R1), R18
+ MOVD 88(R1), R19
+ MOVD 96(R1), R20
+ MOVD 104(R1), R21
+ MOVD 112(R1), R22
+ MOVD 120(R1), R23
+ MOVD 128(R1), R24
+ MOVD 136(R1), R25
+ MOVD 144(R1), R26
+ MOVD 152(R1), R27
+ MOVD 160(R1), R28
+ MOVD 168(R1), R29
+ MOVD 176(R1), g // R30
+ MOVD 184(R1), R31
+	FMOVD 192(R1), F14
+ FMOVD 200(R1), F15
+ FMOVD 208(R1), F16
+ FMOVD 216(R1), F17
+ FMOVD 224(R1), F18
+ FMOVD 232(R1), F19
+ FMOVD 240(R1), F20
+ FMOVD 248(R1), F21
+ FMOVD 256(R1), F22
+ FMOVD 264(R1), F23
+ FMOVD 272(R1), F24
+ FMOVD 280(R1), F25
+ FMOVD 288(R1), F26
+ FMOVD 296(R1), F27
+ FMOVD 304(R1), F28
+ FMOVD 312(R1), F29
+ FMOVD 320(R1), F30
+ FMOVD 328(R1), F31
+
+ ADD $344, R1
+
+ MOVD 8(R1), R0
+ MOVFL R0, $0xff
+ MOVD 16(R1), R0
+ MOVD R0, LR
+ RET
+
+DATA _rt0_ppc64_aix_lib_go+0(SB)/8, $__rt0_ppc64_aix_lib_go(SB)
+DATA _rt0_ppc64_aix_lib_go+8(SB)/8, $TOC(SB)
+DATA _rt0_ppc64_aix_lib_go+16(SB)/8, $0
+GLOBL _rt0_ppc64_aix_lib_go(SB), NOPTR, $24
+
+TEXT __rt0_ppc64_aix_lib_go(SB),NOSPLIT,$0
+ MOVD _rt0_ppc64_aix_lib_argc<>(SB), R3
+ MOVD _rt0_ppc64_aix_lib_argv<>(SB), R4
+ MOVD $runtime·rt0_go(SB), R12
+ MOVD R12, CTR
+ BR (CTR)
+
+DATA _rt0_ppc64_aix_lib_argc<>(SB)/8, $0
+GLOBL _rt0_ppc64_aix_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_ppc64_aix_lib_argv<>(SB)/8, $0
+GLOBL _rt0_ppc64_aix_lib_argv<>(SB),NOPTR, $8
diff --git a/src/runtime/rt0_android_386.s b/src/runtime/rt0_android_386.s
new file mode 100644
index 0000000..3a1b06b
--- /dev/null
+++ b/src/runtime/rt0_android_386.s
@@ -0,0 +1,27 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_386_android(SB),NOSPLIT,$0
+ JMP _rt0_386(SB)
+
+TEXT _rt0_386_android_lib(SB),NOSPLIT,$0
+ PUSHL $_rt0_386_android_argv(SB) // argv
+ PUSHL $1 // argc
+ CALL _rt0_386_lib(SB)
+ POPL AX
+ POPL AX
+ RET
+
+DATA _rt0_386_android_argv+0x00(SB)/4,$_rt0_386_android_argv0(SB)
+DATA _rt0_386_android_argv+0x04(SB)/4,$0 // argv terminate
+DATA _rt0_386_android_argv+0x08(SB)/4,$0 // envp terminate
+DATA _rt0_386_android_argv+0x0c(SB)/4,$0 // auxv terminate
+GLOBL _rt0_386_android_argv(SB),NOPTR,$0x10
+
+// TODO: wire up necessary VDSO (see os_linux_386.go)
+
+DATA _rt0_386_android_argv0(SB)/8, $"gojni"
+GLOBL _rt0_386_android_argv0(SB),RODATA,$8
diff --git a/src/runtime/rt0_android_amd64.s b/src/runtime/rt0_android_amd64.s
new file mode 100644
index 0000000..6bda3bf
--- /dev/null
+++ b/src/runtime/rt0_android_amd64.s
@@ -0,0 +1,22 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_android(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+TEXT _rt0_amd64_android_lib(SB),NOSPLIT,$0
+ MOVQ $1, DI // argc
+ MOVQ $_rt0_amd64_android_argv(SB), SI // argv
+ JMP _rt0_amd64_lib(SB)
+
+DATA _rt0_amd64_android_argv+0x00(SB)/8,$_rt0_amd64_android_argv0(SB)
+DATA _rt0_amd64_android_argv+0x08(SB)/8,$0 // end argv
+DATA _rt0_amd64_android_argv+0x10(SB)/8,$0 // end envv
+DATA _rt0_amd64_android_argv+0x18(SB)/8,$0 // end auxv
+GLOBL _rt0_amd64_android_argv(SB),NOPTR,$0x20
+
+DATA _rt0_amd64_android_argv0(SB)/8, $"gojni"
+GLOBL _rt0_amd64_android_argv0(SB),RODATA,$8
diff --git a/src/runtime/rt0_android_arm.s b/src/runtime/rt0_android_arm.s
new file mode 100644
index 0000000..cc5b78e
--- /dev/null
+++ b/src/runtime/rt0_android_arm.s
@@ -0,0 +1,25 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm_android(SB),NOSPLIT|NOFRAME,$0
+ MOVW (R13), R0 // argc
+ MOVW $4(R13), R1 // argv
+ MOVW $_rt0_arm_linux1(SB), R4
+ B (R4)
+
+TEXT _rt0_arm_android_lib(SB),NOSPLIT,$0
+ MOVW $1, R0 // argc
+ MOVW $_rt0_arm_android_argv(SB), R1 // **argv
+ B _rt0_arm_lib(SB)
+
+DATA _rt0_arm_android_argv+0x00(SB)/4,$_rt0_arm_android_argv0(SB)
+DATA _rt0_arm_android_argv+0x04(SB)/4,$0 // end argv
+DATA _rt0_arm_android_argv+0x08(SB)/4,$0 // end envv
+DATA _rt0_arm_android_argv+0x0c(SB)/4,$0 // end auxv
+GLOBL _rt0_arm_android_argv(SB),NOPTR,$0x10
+
+DATA _rt0_arm_android_argv0(SB)/8, $"gojni"
+GLOBL _rt0_arm_android_argv0(SB),RODATA,$8
diff --git a/src/runtime/rt0_android_arm64.s b/src/runtime/rt0_android_arm64.s
new file mode 100644
index 0000000..4135bf0
--- /dev/null
+++ b/src/runtime/rt0_android_arm64.s
@@ -0,0 +1,26 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm64_android(SB),NOSPLIT|NOFRAME,$0
+ MOVD $_rt0_arm64_linux(SB), R4
+ B (R4)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_arm64_android_lib(SB),NOSPLIT|NOFRAME,$0
+ MOVW $1, R0 // argc
+ MOVD $_rt0_arm64_android_argv(SB), R1 // **argv
+ MOVD $_rt0_arm64_linux_lib(SB), R4
+ B (R4)
+
+DATA _rt0_arm64_android_argv+0x00(SB)/8,$_rt0_arm64_android_argv0(SB)
+DATA _rt0_arm64_android_argv+0x08(SB)/8,$0 // end argv
+DATA _rt0_arm64_android_argv+0x10(SB)/8,$0 // end envv
+DATA _rt0_arm64_android_argv+0x18(SB)/8,$0 // end auxv
+GLOBL _rt0_arm64_android_argv(SB),NOPTR,$0x20
+
+DATA _rt0_arm64_android_argv0(SB)/8, $"gojni"
+GLOBL _rt0_arm64_android_argv0(SB),RODATA,$8
diff --git a/src/runtime/rt0_darwin_amd64.s b/src/runtime/rt0_darwin_amd64.s
new file mode 100644
index 0000000..ed804d4
--- /dev/null
+++ b/src/runtime/rt0_darwin_amd64.s
@@ -0,0 +1,13 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_darwin(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+// When linking with -shared, this symbol is called when the shared library
+// is loaded.
+TEXT _rt0_amd64_darwin_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_darwin_arm64.s b/src/runtime/rt0_darwin_arm64.s
new file mode 100644
index 0000000..0040361
--- /dev/null
+++ b/src/runtime/rt0_darwin_arm64.s
@@ -0,0 +1,94 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm64_darwin(SB),NOSPLIT|NOFRAME,$0
+ MOVD $runtime·rt0_go(SB), R2
+ BL (R2)
+exit:
+ MOVD $0, R0
+ MOVD $1, R16 // sys_exit
+ SVC $0x80
+ B exit
+
+// When linking with -buildmode=c-archive or -buildmode=c-shared,
+// this symbol is called from a global initialization function.
+//
+// Note that all currently shipping darwin/arm64 platforms require
+// cgo and do not support c-shared.
+TEXT _rt0_arm64_darwin_lib(SB),NOSPLIT,$168
+ // Preserve callee-save registers.
+ MOVD R19, 24(RSP)
+ MOVD R20, 32(RSP)
+ MOVD R21, 40(RSP)
+ MOVD R22, 48(RSP)
+ MOVD R23, 56(RSP)
+ MOVD R24, 64(RSP)
+ MOVD R25, 72(RSP)
+ MOVD R26, 80(RSP)
+ MOVD R27, 88(RSP)
+ MOVD g, 96(RSP)
+ FMOVD F8, 104(RSP)
+ FMOVD F9, 112(RSP)
+ FMOVD F10, 120(RSP)
+ FMOVD F11, 128(RSP)
+ FMOVD F12, 136(RSP)
+ FMOVD F13, 144(RSP)
+ FMOVD F14, 152(RSP)
+ FMOVD F15, 160(RSP)
+
+ MOVD R0, _rt0_arm64_darwin_lib_argc<>(SB)
+ MOVD R1, _rt0_arm64_darwin_lib_argv<>(SB)
+
+ MOVD $0, g // initialize g to nil
+
+ // Synchronous initialization.
+ MOVD $runtime·libpreinit(SB), R4
+ BL (R4)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVD _cgo_sys_thread_create(SB), R4
+ MOVD $_rt0_arm64_darwin_lib_go(SB), R0
+ MOVD $0, R1
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL (R4)
+ ADD $16, RSP
+
+ // Restore callee-save registers.
+ MOVD 24(RSP), R19
+ MOVD 32(RSP), R20
+ MOVD 40(RSP), R21
+ MOVD 48(RSP), R22
+ MOVD 56(RSP), R23
+ MOVD 64(RSP), R24
+ MOVD 72(RSP), R25
+ MOVD 80(RSP), R26
+ MOVD 88(RSP), R27
+ MOVD 96(RSP), g
+ FMOVD 104(RSP), F8
+ FMOVD 112(RSP), F9
+ FMOVD 120(RSP), F10
+ FMOVD 128(RSP), F11
+ FMOVD 136(RSP), F12
+ FMOVD 144(RSP), F13
+ FMOVD 152(RSP), F14
+ FMOVD 160(RSP), F15
+
+ RET
+
+TEXT _rt0_arm64_darwin_lib_go(SB),NOSPLIT,$0
+ MOVD _rt0_arm64_darwin_lib_argc<>(SB), R0
+ MOVD _rt0_arm64_darwin_lib_argv<>(SB), R1
+ MOVD $runtime·rt0_go(SB), R4
+ B (R4)
+
+DATA _rt0_arm64_darwin_lib_argc<>(SB)/8, $0
+GLOBL _rt0_arm64_darwin_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_arm64_darwin_lib_argv<>(SB)/8, $0
+GLOBL _rt0_arm64_darwin_lib_argv<>(SB),NOPTR, $8
+
+// external linking entry point.
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ JMP _rt0_arm64_darwin(SB)
diff --git a/src/runtime/rt0_dragonfly_amd64.s b/src/runtime/rt0_dragonfly_amd64.s
new file mode 100644
index 0000000..e76f9b9
--- /dev/null
+++ b/src/runtime/rt0_dragonfly_amd64.s
@@ -0,0 +1,14 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// On Dragonfly argc/argv are passed in DI, not SP, so we can't use _rt0_amd64.
+TEXT _rt0_amd64_dragonfly(SB),NOSPLIT,$-8
+ LEAQ 8(DI), SI // argv
+ MOVQ 0(DI), DI // argc
+ JMP runtime·rt0_go(SB)
+
+TEXT _rt0_amd64_dragonfly_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_freebsd_386.s b/src/runtime/rt0_freebsd_386.s
new file mode 100644
index 0000000..1808059
--- /dev/null
+++ b/src/runtime/rt0_freebsd_386.s
@@ -0,0 +1,17 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_386_freebsd(SB),NOSPLIT,$0
+ JMP _rt0_386(SB)
+
+TEXT _rt0_386_freebsd_lib(SB),NOSPLIT,$0
+ JMP _rt0_386_lib(SB)
+
+TEXT main(SB),NOSPLIT,$0
+ // Remove the return address from the stack.
+ // rt0_go doesn't expect it to be there.
+ ADDL $4, SP
+ JMP runtime·rt0_go(SB)
diff --git a/src/runtime/rt0_freebsd_amd64.s b/src/runtime/rt0_freebsd_amd64.s
new file mode 100644
index 0000000..ccc48f6
--- /dev/null
+++ b/src/runtime/rt0_freebsd_amd64.s
@@ -0,0 +1,14 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// On FreeBSD argc/argv are passed in DI, not SP, so we can't use _rt0_amd64.
+TEXT _rt0_amd64_freebsd(SB),NOSPLIT,$-8
+ LEAQ 8(DI), SI // argv
+ MOVQ 0(DI), DI // argc
+ JMP runtime·rt0_go(SB)
+
+TEXT _rt0_amd64_freebsd_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_freebsd_arm.s b/src/runtime/rt0_freebsd_arm.s
new file mode 100644
index 0000000..62ecd9a
--- /dev/null
+++ b/src/runtime/rt0_freebsd_arm.s
@@ -0,0 +1,11 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm_freebsd(SB),NOSPLIT,$0
+ B _rt0_arm(SB)
+
+TEXT _rt0_arm_freebsd_lib(SB),NOSPLIT,$0
+ B _rt0_arm_lib(SB)
diff --git a/src/runtime/rt0_freebsd_arm64.s b/src/runtime/rt0_freebsd_arm64.s
new file mode 100644
index 0000000..a938d98
--- /dev/null
+++ b/src/runtime/rt0_freebsd_arm64.s
@@ -0,0 +1,105 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// On FreeBSD argc/argv are passed in R0, not RSP
+TEXT _rt0_arm64_freebsd(SB),NOSPLIT|NOFRAME,$0
+ ADD $8, R0, R1 // argv
+ MOVD 0(R0), R0 // argc
+ BL main(SB)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_arm64_freebsd_lib(SB),NOSPLIT,$184
+ // Preserve callee-save registers.
+ MOVD R19, 24(RSP)
+ MOVD R20, 32(RSP)
+ MOVD R21, 40(RSP)
+ MOVD R22, 48(RSP)
+ MOVD R23, 56(RSP)
+ MOVD R24, 64(RSP)
+ MOVD R25, 72(RSP)
+ MOVD R26, 80(RSP)
+ MOVD R27, 88(RSP)
+ FMOVD F8, 96(RSP)
+ FMOVD F9, 104(RSP)
+ FMOVD F10, 112(RSP)
+ FMOVD F11, 120(RSP)
+ FMOVD F12, 128(RSP)
+ FMOVD F13, 136(RSP)
+ FMOVD F14, 144(RSP)
+ FMOVD F15, 152(RSP)
+ MOVD g, 160(RSP)
+
+	// Initialize g to nil in case g is used later, e.g. by sigaction in cgo_sigaction.go.
+ MOVD ZR, g
+
+ MOVD R0, _rt0_arm64_freebsd_lib_argc<>(SB)
+ MOVD R1, _rt0_arm64_freebsd_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ MOVD $runtime·libpreinit(SB), R4
+ BL (R4)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVD _cgo_sys_thread_create(SB), R4
+ CBZ R4, nocgo
+ MOVD $_rt0_arm64_freebsd_lib_go(SB), R0
+ MOVD $0, R1
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL (R4)
+ ADD $16, RSP
+ B restore
+
+nocgo:
+ MOVD $0x800000, R0 // stacksize = 8192KB
+ MOVD $_rt0_arm64_freebsd_lib_go(SB), R1
+ MOVD R0, 8(RSP)
+ MOVD R1, 16(RSP)
+ MOVD $runtime·newosproc0(SB),R4
+ BL (R4)
+
+restore:
+ // Restore callee-save registers.
+ MOVD 24(RSP), R19
+ MOVD 32(RSP), R20
+ MOVD 40(RSP), R21
+ MOVD 48(RSP), R22
+ MOVD 56(RSP), R23
+ MOVD 64(RSP), R24
+ MOVD 72(RSP), R25
+ MOVD 80(RSP), R26
+ MOVD 88(RSP), R27
+ FMOVD 96(RSP), F8
+ FMOVD 104(RSP), F9
+ FMOVD 112(RSP), F10
+ FMOVD 120(RSP), F11
+ FMOVD 128(RSP), F12
+ FMOVD 136(RSP), F13
+ FMOVD 144(RSP), F14
+ FMOVD 152(RSP), F15
+ MOVD 160(RSP), g
+ RET
+
+TEXT _rt0_arm64_freebsd_lib_go(SB),NOSPLIT,$0
+ MOVD _rt0_arm64_freebsd_lib_argc<>(SB), R0
+ MOVD _rt0_arm64_freebsd_lib_argv<>(SB), R1
+ MOVD $runtime·rt0_go(SB),R4
+ B (R4)
+
+DATA _rt0_arm64_freebsd_lib_argc<>(SB)/8, $0
+GLOBL _rt0_arm64_freebsd_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_arm64_freebsd_lib_argv<>(SB)/8, $0
+GLOBL _rt0_arm64_freebsd_lib_argv<>(SB),NOPTR, $8
+
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ MOVD $runtime·rt0_go(SB), R2
+ BL (R2)
+exit:
+ MOVD $0, R0
+ MOVD $1, R8 // SYS_exit
+ SVC
+ B exit
diff --git a/src/runtime/rt0_illumos_amd64.s b/src/runtime/rt0_illumos_amd64.s
new file mode 100644
index 0000000..54d35b7
--- /dev/null
+++ b/src/runtime/rt0_illumos_amd64.s
@@ -0,0 +1,11 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_illumos(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+TEXT _rt0_amd64_illumos_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_ios_amd64.s b/src/runtime/rt0_ios_amd64.s
new file mode 100644
index 0000000..c699032
--- /dev/null
+++ b/src/runtime/rt0_ios_amd64.s
@@ -0,0 +1,14 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// internal linking executable entry point.
+// ios/amd64 only supports external linking.
+TEXT _rt0_amd64_ios(SB),NOSPLIT|NOFRAME,$0
+ UNDEF
+
+// library entry point.
+TEXT _rt0_amd64_ios_lib(SB),NOSPLIT|NOFRAME,$0
+ JMP _rt0_amd64_darwin_lib(SB)
diff --git a/src/runtime/rt0_ios_arm64.s b/src/runtime/rt0_ios_arm64.s
new file mode 100644
index 0000000..dcc8365
--- /dev/null
+++ b/src/runtime/rt0_ios_arm64.s
@@ -0,0 +1,14 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// internal linking executable entry point.
+// ios/arm64 only supports external linking.
+TEXT _rt0_arm64_ios(SB),NOSPLIT|NOFRAME,$0
+ UNDEF
+
+// library entry point.
+TEXT _rt0_arm64_ios_lib(SB),NOSPLIT|NOFRAME,$0
+ JMP _rt0_arm64_darwin_lib(SB)
diff --git a/src/runtime/rt0_js_wasm.s b/src/runtime/rt0_js_wasm.s
new file mode 100644
index 0000000..714582a
--- /dev/null
+++ b/src/runtime/rt0_js_wasm.s
@@ -0,0 +1,107 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// _rt0_wasm_js is not used itself. It only exists to mark the exported functions as alive.
+TEXT _rt0_wasm_js(SB),NOSPLIT,$0
+ I32Const $wasm_export_run(SB)
+ Drop
+ I32Const $wasm_export_resume(SB)
+ Drop
+ I32Const $wasm_export_getsp(SB)
+ Drop
+
+// wasm_export_run gets called from JavaScript. It initializes the Go runtime and executes Go code until it needs
+// to wait for an event. It does NOT follow the Go ABI. It has two WebAssembly parameters:
+// R0: argc (i32)
+// R1: argv (i32)
+TEXT wasm_export_run(SB),NOSPLIT,$0
+ MOVD $runtime·wasmStack+(m0Stack__size-16)(SB), SP
+
+ Get SP
+ Get R0 // argc
+ I64ExtendI32U
+ I64Store $0
+
+ Get SP
+ Get R1 // argv
+ I64ExtendI32U
+ I64Store $8
+
+ I32Const $0 // entry PC_B
+ Call runtime·rt0_go(SB)
+ Drop
+ Call wasm_pc_f_loop(SB)
+
+ Return
+
+// wasm_export_resume gets called from JavaScript. It resumes the execution of Go code until it needs to wait for
+// an event.
+TEXT wasm_export_resume(SB),NOSPLIT,$0
+ I32Const $0
+ Call runtime·handleEvent(SB)
+ Drop
+ Call wasm_pc_f_loop(SB)
+
+ Return
+
+TEXT wasm_pc_f_loop(SB),NOSPLIT,$0
+// Call the function for the current PC_F. Repeat until PAUSE != 0 indicates pause or exit.
+// The WebAssembly stack may unwind, e.g. when switching goroutines.
+// The Go stack on the linear memory is then used to jump to the correct functions
+// with this loop, without having to restore the full WebAssembly stack.
+// It is expected to have a pending call before entering the loop, so check PAUSE first.
+ Get PAUSE
+ I32Eqz
+ If
+ loop:
+ Loop
+ // Get PC_B & PC_F from -8(SP)
+ Get SP
+ I32Const $8
+ I32Sub
+ I32Load16U $0 // PC_B
+
+ Get SP
+ I32Const $8
+ I32Sub
+ I32Load16U $2 // PC_F
+
+ CallIndirect $0
+ Drop
+
+ Get PAUSE
+ I32Eqz
+ BrIf loop
+ End
+ End
+
+ I32Const $0
+ Set PAUSE
+
+ Return
+
+// wasm_export_getsp gets called from JavaScript to retrieve the SP.
+TEXT wasm_export_getsp(SB),NOSPLIT,$0
+ Get SP
+ Return
+
+TEXT runtime·pause(SB), NOSPLIT, $0-8
+ MOVD newsp+0(FP), SP
+ I32Const $1
+ Set PAUSE
+ RETUNWIND
+
+TEXT runtime·exit(SB), NOSPLIT, $0-4
+ I32Const $0
+ Call runtime·wasmExit(SB)
+ Drop
+ I32Const $1
+ Set PAUSE
+ RETUNWIND
+
+TEXT wasm_export_lib(SB),NOSPLIT,$0
+ UNDEF
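
As an aside, the resume loop that the wasm_export_run/wasm_pc_f_loop comments describe can be sketched in ordinary Python; this is only an illustration of the control flow (the real loop is WebAssembly), and func_table, pc_f, pc_b and PAUSE are stand-ins for the function table, the PC_F/PC_B values stored just below SP, and the PAUSE register:

    # Minimal, self-contained simulation of the wasm_pc_f_loop control flow.
    PAUSE = False
    func_table = {}                        # maps a PC_F index to a callable taking PC_B

    def pc_f_loop(stack):
        """Keep calling the function selected by PC_F until PAUSE is set."""
        global PAUSE
        while not PAUSE:
            pc_b, pc_f = stack[-1]         # PC_B and PC_F live just below SP
            func_table[pc_f](pc_b)         # call the function for the current PC_F
        PAUSE = False                      # clear PAUSE before returning to the host

    def step(pc_b):
        """Hypothetical callee: runs one step, then requests a pause."""
        global PAUSE
        print("resumed at block", pc_b)
        PAUSE = True

    func_table[0] = step
    pc_f_loop([(0, 0)])                    # prints "resumed at block 0", then returns
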
diff --git a/src/runtime/rt0_linux_386.s b/src/runtime/rt0_linux_386.s
new file mode 100644
index 0000000..325066f
--- /dev/null
+++ b/src/runtime/rt0_linux_386.s
@@ -0,0 +1,17 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_386_linux(SB),NOSPLIT,$0
+ JMP _rt0_386(SB)
+
+TEXT _rt0_386_linux_lib(SB),NOSPLIT,$0
+ JMP _rt0_386_lib(SB)
+
+TEXT main(SB),NOSPLIT,$0
+ // Remove the return address from the stack.
+ // rt0_go doesn't expect it to be there.
+ ADDL $4, SP
+ JMP runtime·rt0_go(SB)
diff --git a/src/runtime/rt0_linux_amd64.s b/src/runtime/rt0_linux_amd64.s
new file mode 100644
index 0000000..94ff709
--- /dev/null
+++ b/src/runtime/rt0_linux_amd64.s
@@ -0,0 +1,11 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+TEXT _rt0_amd64_linux_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_linux_arm.s b/src/runtime/rt0_linux_arm.s
new file mode 100644
index 0000000..8a5722f
--- /dev/null
+++ b/src/runtime/rt0_linux_arm.s
@@ -0,0 +1,33 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm_linux(SB),NOSPLIT|NOFRAME,$0
+ MOVW (R13), R0 // argc
+ MOVW $4(R13), R1 // argv
+ MOVW $_rt0_arm_linux1(SB), R4
+ B (R4)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_arm_linux_lib(SB),NOSPLIT,$0
+ B _rt0_arm_lib(SB)
+
+TEXT _rt0_arm_linux1(SB),NOSPLIT|NOFRAME,$0
+ // We first need to detect the kernel ABI, and warn the user
+ // if the system only supports OABI.
+ // The strategy here is to call some EABI syscall to see if
+ // SIGILL is received.
+ // If you get a SIGILL here, you have the wrong kernel.
+
+ // Save argc and argv (syscall will clobber at least R0).
+ MOVM.DB.W [R0-R1], (R13)
+
+ // do an EABI syscall
+ MOVW $20, R7 // sys_getpid
+ SWI $0 // this will trigger SIGILL on OABI systems
+
+ MOVM.IA.W (R13), [R0-R1]
+ B runtime·rt0_go(SB)
diff --git a/src/runtime/rt0_linux_arm64.s b/src/runtime/rt0_linux_arm64.s
new file mode 100644
index 0000000..f48a8d6
--- /dev/null
+++ b/src/runtime/rt0_linux_arm64.s
@@ -0,0 +1,104 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm64_linux(SB),NOSPLIT|NOFRAME,$0
+ MOVD 0(RSP), R0 // argc
+ ADD $8, RSP, R1 // argv
+ BL main(SB)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_arm64_linux_lib(SB),NOSPLIT,$184
+ // Preserve callee-save registers.
+ MOVD R19, 24(RSP)
+ MOVD R20, 32(RSP)
+ MOVD R21, 40(RSP)
+ MOVD R22, 48(RSP)
+ MOVD R23, 56(RSP)
+ MOVD R24, 64(RSP)
+ MOVD R25, 72(RSP)
+ MOVD R26, 80(RSP)
+ MOVD R27, 88(RSP)
+ FMOVD F8, 96(RSP)
+ FMOVD F9, 104(RSP)
+ FMOVD F10, 112(RSP)
+ FMOVD F11, 120(RSP)
+ FMOVD F12, 128(RSP)
+ FMOVD F13, 136(RSP)
+ FMOVD F14, 144(RSP)
+ FMOVD F15, 152(RSP)
+ MOVD g, 160(RSP)
+
+	// Initialize g as nil in case g is used later, e.g. by sigaction in cgo_sigaction.go.
+ MOVD ZR, g
+
+ MOVD R0, _rt0_arm64_linux_lib_argc<>(SB)
+ MOVD R1, _rt0_arm64_linux_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ MOVD $runtime·libpreinit(SB), R4
+ BL (R4)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVD _cgo_sys_thread_create(SB), R4
+ CBZ R4, nocgo
+ MOVD $_rt0_arm64_linux_lib_go(SB), R0
+ MOVD $0, R1
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL (R4)
+ ADD $16, RSP
+ B restore
+
+nocgo:
+ MOVD $0x800000, R0 // stacksize = 8192KB
+ MOVD $_rt0_arm64_linux_lib_go(SB), R1
+ MOVD R0, 8(RSP)
+ MOVD R1, 16(RSP)
+ MOVD $runtime·newosproc0(SB),R4
+ BL (R4)
+
+restore:
+ // Restore callee-save registers.
+ MOVD 24(RSP), R19
+ MOVD 32(RSP), R20
+ MOVD 40(RSP), R21
+ MOVD 48(RSP), R22
+ MOVD 56(RSP), R23
+ MOVD 64(RSP), R24
+ MOVD 72(RSP), R25
+ MOVD 80(RSP), R26
+ MOVD 88(RSP), R27
+ FMOVD 96(RSP), F8
+ FMOVD 104(RSP), F9
+ FMOVD 112(RSP), F10
+ FMOVD 120(RSP), F11
+ FMOVD 128(RSP), F12
+ FMOVD 136(RSP), F13
+ FMOVD 144(RSP), F14
+ FMOVD 152(RSP), F15
+ MOVD 160(RSP), g
+ RET
+
+TEXT _rt0_arm64_linux_lib_go(SB),NOSPLIT,$0
+ MOVD _rt0_arm64_linux_lib_argc<>(SB), R0
+ MOVD _rt0_arm64_linux_lib_argv<>(SB), R1
+ MOVD $runtime·rt0_go(SB),R4
+ B (R4)
+
+DATA _rt0_arm64_linux_lib_argc<>(SB)/8, $0
+GLOBL _rt0_arm64_linux_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_arm64_linux_lib_argv<>(SB)/8, $0
+GLOBL _rt0_arm64_linux_lib_argv<>(SB),NOPTR, $8
+
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ MOVD $runtime·rt0_go(SB), R2
+ BL (R2)
+exit:
+ MOVD $0, R0
+ MOVD $94, R8 // sys_exit
+ SVC
+ B exit
diff --git a/src/runtime/rt0_linux_mips64x.s b/src/runtime/rt0_linux_mips64x.s
new file mode 100644
index 0000000..5550675
--- /dev/null
+++ b/src/runtime/rt0_linux_mips64x.s
@@ -0,0 +1,39 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build mips64 mips64le
+
+#include "textflag.h"
+
+TEXT _rt0_mips64_linux(SB),NOSPLIT,$0
+ JMP _main<>(SB)
+
+TEXT _rt0_mips64le_linux(SB),NOSPLIT,$0
+ JMP _main<>(SB)
+
+TEXT _main<>(SB),NOSPLIT|NOFRAME,$0
+ // In a statically linked binary, the stack contains argc,
+ // argv as argc string pointers followed by a NULL, envv as a
+ // sequence of string pointers followed by a NULL, and auxv.
+ // There is no TLS base pointer.
+#ifdef GOARCH_mips64
+ MOVW 4(R29), R4 // argc, big-endian ABI places int32 at offset 4
+#else
+ MOVW 0(R29), R4 // argc
+#endif
+ ADDV $8, R29, R5 // argv
+ JMP main(SB)
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+	// In external linking, glibc jumps to main with argc in R4
+	// and argv in R5.
+
+ // initialize REGSB = PC&0xffffffff00000000
+ BGEZAL R0, 1(PC)
+ SRLV $32, R31, RSB
+ SLLV $32, RSB
+
+ MOVV $runtime·rt0_go(SB), R1
+ JMP (R1)
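
The startup stack layout that the _main<> comment above describes (and that the same comment repeats in several rt0 files below) — argc, then argc argv pointers, a NULL, the envv pointers, another NULL, then the auxv pairs — can be illustrated with a small Python walk over a made-up list of machine words; the values are assumptions for illustration only:

    def parse_startup_stack(words):
        """Split a simulated startup stack into argc, argv, envv and auxv."""
        argc = words[0]
        argv = words[1:1 + argc]           # argc string pointers
        i = 1 + argc + 1                   # skip the NULL that terminates argv
        envv = []
        while words[i] != 0:               # envv pointers up to the next NULL
            envv.append(words[i])
            i += 1
        auxv = words[i + 1:]               # auxv: type/value pairs, zero-terminated
        return argc, argv, envv, auxv

    # argc=2, two argv pointers, NULL, one envv pointer, NULL, then auxv pairs.
    print(parse_startup_stack([2, 0x1000, 0x1010, 0, 0x2000, 0, 6, 4096, 0, 0]))
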
diff --git a/src/runtime/rt0_linux_mipsx.s b/src/runtime/rt0_linux_mipsx.s
new file mode 100644
index 0000000..74b8f50
--- /dev/null
+++ b/src/runtime/rt0_linux_mipsx.s
@@ -0,0 +1,28 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build mips mipsle
+
+#include "textflag.h"
+
+TEXT _rt0_mips_linux(SB),NOSPLIT,$0
+ JMP _main<>(SB)
+
+TEXT _rt0_mipsle_linux(SB),NOSPLIT,$0
+ JMP _main<>(SB)
+
+TEXT _main<>(SB),NOSPLIT|NOFRAME,$0
+ // In a statically linked binary, the stack contains argc,
+ // argv as argc string pointers followed by a NULL, envv as a
+ // sequence of string pointers followed by a NULL, and auxv.
+ // There is no TLS base pointer.
+ MOVW 0(R29), R4 // argc
+ ADD $4, R29, R5 // argv
+ JMP main(SB)
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ // In external linking, libc jumps to main with argc in R4, argv in R5
+ MOVW $runtime·rt0_go(SB), R1
+ JMP (R1)
diff --git a/src/runtime/rt0_linux_ppc64.s b/src/runtime/rt0_linux_ppc64.s
new file mode 100644
index 0000000..897d610
--- /dev/null
+++ b/src/runtime/rt0_linux_ppc64.s
@@ -0,0 +1,34 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// actually a function descriptor for _main<>(SB)
+TEXT _rt0_ppc64_linux(SB),NOSPLIT,$0
+ DWORD $_main<>(SB)
+ DWORD $0
+ DWORD $0
+
+TEXT main(SB),NOSPLIT,$0
+ DWORD $_main<>(SB)
+ DWORD $0
+ DWORD $0
+
+TEXT _main<>(SB),NOSPLIT,$-8
+ // In a statically linked binary, the stack contains argc,
+ // argv as argc string pointers followed by a NULL, envv as a
+ // sequence of string pointers followed by a NULL, and auxv.
+ // There is no TLS base pointer.
+ //
+ // TODO(austin): Support ABI v1 dynamic linking entry point
+ MOVD $runtime·rt0_go(SB), R12
+ MOVD R12, CTR
+ MOVBZ runtime·iscgo(SB), R5
+ CMP R5, $0
+ BEQ nocgo
+ BR (CTR)
+nocgo:
+ MOVD 0(R1), R3 // argc
+ ADD $8, R1, R4 // argv
+ BR (CTR)
diff --git a/src/runtime/rt0_linux_ppc64le.s b/src/runtime/rt0_linux_ppc64le.s
new file mode 100644
index 0000000..4f7c6e6
--- /dev/null
+++ b/src/runtime/rt0_linux_ppc64le.s
@@ -0,0 +1,174 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+TEXT _rt0_ppc64le_linux(SB),NOSPLIT,$0
+ XOR R0, R0 // Make sure R0 is zero before _main
+ BR _main<>(SB)
+
+TEXT _rt0_ppc64le_linux_lib(SB),NOSPLIT,$-8
+ // Start with standard C stack frame layout and linkage.
+ MOVD LR, R0
+ MOVD R0, 16(R1) // Save LR in caller's frame.
+ MOVW CR, R0 // Save CR in caller's frame
+ MOVD R0, 8(R1)
+ MOVDU R1, -320(R1) // Allocate frame.
+
+ // Preserve callee-save registers.
+ MOVD R14, 24(R1)
+ MOVD R15, 32(R1)
+ MOVD R16, 40(R1)
+ MOVD R17, 48(R1)
+ MOVD R18, 56(R1)
+ MOVD R19, 64(R1)
+ MOVD R20, 72(R1)
+ MOVD R21, 80(R1)
+ MOVD R22, 88(R1)
+ MOVD R23, 96(R1)
+ MOVD R24, 104(R1)
+ MOVD R25, 112(R1)
+ MOVD R26, 120(R1)
+ MOVD R27, 128(R1)
+ MOVD R28, 136(R1)
+ MOVD R29, 144(R1)
+ MOVD g, 152(R1) // R30
+ MOVD R31, 160(R1)
+ FMOVD F14, 168(R1)
+ FMOVD F15, 176(R1)
+ FMOVD F16, 184(R1)
+ FMOVD F17, 192(R1)
+ FMOVD F18, 200(R1)
+ FMOVD F19, 208(R1)
+ FMOVD F20, 216(R1)
+ FMOVD F21, 224(R1)
+ FMOVD F22, 232(R1)
+ FMOVD F23, 240(R1)
+ FMOVD F24, 248(R1)
+ FMOVD F25, 256(R1)
+ FMOVD F26, 264(R1)
+ FMOVD F27, 272(R1)
+ FMOVD F28, 280(R1)
+ FMOVD F29, 288(R1)
+ FMOVD F30, 296(R1)
+ FMOVD F31, 304(R1)
+
+ MOVD R3, _rt0_ppc64le_linux_lib_argc<>(SB)
+ MOVD R4, _rt0_ppc64le_linux_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ MOVD $runtime·reginit(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+ MOVD $runtime·libpreinit(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVD _cgo_sys_thread_create(SB), R12
+ CMP $0, R12
+ BEQ nocgo
+ MOVD $_rt0_ppc64le_linux_lib_go(SB), R3
+ MOVD $0, R4
+ MOVD R12, CTR
+ BL (CTR)
+ BR done
+
+nocgo:
+ MOVD $0x800000, R12 // stacksize = 8192KB
+ MOVD R12, 8(R1)
+ MOVD $_rt0_ppc64le_linux_lib_go(SB), R12
+ MOVD R12, 16(R1)
+ MOVD $runtime·newosproc0(SB),R12
+ MOVD R12, CTR
+ BL (CTR)
+
+done:
+ // Restore saved registers.
+ MOVD 24(R1), R14
+ MOVD 32(R1), R15
+ MOVD 40(R1), R16
+ MOVD 48(R1), R17
+ MOVD 56(R1), R18
+ MOVD 64(R1), R19
+ MOVD 72(R1), R20
+ MOVD 80(R1), R21
+ MOVD 88(R1), R22
+ MOVD 96(R1), R23
+ MOVD 104(R1), R24
+ MOVD 112(R1), R25
+ MOVD 120(R1), R26
+ MOVD 128(R1), R27
+ MOVD 136(R1), R28
+ MOVD 144(R1), R29
+ MOVD 152(R1), g // R30
+ MOVD 160(R1), R31
+ FMOVD 168(R1), F14
+ FMOVD 176(R1), F15
+ FMOVD 184(R1), F16
+ FMOVD 192(R1), F17
+ FMOVD 200(R1), F18
+ FMOVD 208(R1), F19
+ FMOVD 216(R1), F20
+ FMOVD 224(R1), F21
+ FMOVD 232(R1), F22
+ FMOVD 240(R1), F23
+ FMOVD 248(R1), F24
+ FMOVD 256(R1), F25
+ FMOVD 264(R1), F26
+ FMOVD 272(R1), F27
+ FMOVD 280(R1), F28
+ FMOVD 288(R1), F29
+ FMOVD 296(R1), F30
+ FMOVD 304(R1), F31
+
+ ADD $320, R1
+ MOVD 8(R1), R0
+ MOVFL R0, $0xff
+ MOVD 16(R1), R0
+ MOVD R0, LR
+ RET
+
+TEXT _rt0_ppc64le_linux_lib_go(SB),NOSPLIT,$0
+ MOVD _rt0_ppc64le_linux_lib_argc<>(SB), R3
+ MOVD _rt0_ppc64le_linux_lib_argv<>(SB), R4
+ MOVD $runtime·rt0_go(SB), R12
+ MOVD R12, CTR
+ BR (CTR)
+
+DATA _rt0_ppc64le_linux_lib_argc<>(SB)/8, $0
+GLOBL _rt0_ppc64le_linux_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_ppc64le_linux_lib_argv<>(SB)/8, $0
+GLOBL _rt0_ppc64le_linux_lib_argv<>(SB),NOPTR, $8
+
+TEXT _main<>(SB),NOSPLIT,$-8
+ // In a statically linked binary, the stack contains argc,
+ // argv as argc string pointers followed by a NULL, envv as a
+ // sequence of string pointers followed by a NULL, and auxv.
+ // There is no TLS base pointer.
+ //
+ // In a dynamically linked binary, r3 contains argc, r4
+ // contains argv, r5 contains envp, r6 contains auxv, and r13
+ // contains the TLS pointer.
+ //
+ // Figure out which case this is by looking at r4: if it's 0,
+ // we're statically linked; otherwise we're dynamically
+ // linked.
+ CMP R0, R4
+ BNE dlink
+
+ // Statically linked
+ MOVD 0(R1), R3 // argc
+ ADD $8, R1, R4 // argv
+ MOVD $runtime·m0+m_tls(SB), R13 // TLS
+ ADD $0x7000, R13
+
+dlink:
+ BR main(SB)
+
+TEXT main(SB),NOSPLIT,$-8
+ MOVD $runtime·rt0_go(SB), R12
+ MOVD R12, CTR
+ BR (CTR)
diff --git a/src/runtime/rt0_linux_riscv64.s b/src/runtime/rt0_linux_riscv64.s
new file mode 100644
index 0000000..f31f7f7
--- /dev/null
+++ b/src/runtime/rt0_linux_riscv64.s
@@ -0,0 +1,14 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_riscv64_linux(SB),NOSPLIT|NOFRAME,$0
+ MOV 0(X2), A0 // argc
+ ADD $8, X2, A1 // argv
+ JMP main(SB)
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ MOV $runtime·rt0_go(SB), T0
+ JALR ZERO, T0
diff --git a/src/runtime/rt0_linux_s390x.s b/src/runtime/rt0_linux_s390x.s
new file mode 100644
index 0000000..4b62c5a
--- /dev/null
+++ b/src/runtime/rt0_linux_s390x.s
@@ -0,0 +1,23 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_s390x_linux(SB), NOSPLIT|NOFRAME, $0
+ // In a statically linked binary, the stack contains argc,
+ // argv as argc string pointers followed by a NULL, envv as a
+ // sequence of string pointers followed by a NULL, and auxv.
+ // There is no TLS base pointer.
+
+ MOVD 0(R15), R2 // argc
+ ADD $8, R15, R3 // argv
+ BR main(SB)
+
+TEXT _rt0_s390x_linux_lib(SB), NOSPLIT, $0
+ MOVD $_rt0_s390x_lib(SB), R1
+ BR R1
+
+TEXT main(SB), NOSPLIT|NOFRAME, $0
+ MOVD $runtime·rt0_go(SB), R1
+ BR R1
diff --git a/src/runtime/rt0_netbsd_386.s b/src/runtime/rt0_netbsd_386.s
new file mode 100644
index 0000000..cefc04a
--- /dev/null
+++ b/src/runtime/rt0_netbsd_386.s
@@ -0,0 +1,17 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_386_netbsd(SB),NOSPLIT,$0
+ JMP _rt0_386(SB)
+
+TEXT _rt0_386_netbsd_lib(SB),NOSPLIT,$0
+ JMP _rt0_386_lib(SB)
+
+TEXT main(SB),NOSPLIT,$0
+ // Remove the return address from the stack.
+ // rt0_go doesn't expect it to be there.
+ ADDL $4, SP
+ JMP runtime·rt0_go(SB)
diff --git a/src/runtime/rt0_netbsd_amd64.s b/src/runtime/rt0_netbsd_amd64.s
new file mode 100644
index 0000000..77c7187
--- /dev/null
+++ b/src/runtime/rt0_netbsd_amd64.s
@@ -0,0 +1,11 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_netbsd(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+TEXT _rt0_amd64_netbsd_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_netbsd_arm.s b/src/runtime/rt0_netbsd_arm.s
new file mode 100644
index 0000000..503c32a
--- /dev/null
+++ b/src/runtime/rt0_netbsd_arm.s
@@ -0,0 +1,11 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm_netbsd(SB),NOSPLIT,$0
+ B _rt0_arm(SB)
+
+TEXT _rt0_arm_netbsd_lib(SB),NOSPLIT,$0
+ B _rt0_arm_lib(SB)
diff --git a/src/runtime/rt0_netbsd_arm64.s b/src/runtime/rt0_netbsd_arm64.s
new file mode 100644
index 0000000..2f3b5a5
--- /dev/null
+++ b/src/runtime/rt0_netbsd_arm64.s
@@ -0,0 +1,102 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm64_netbsd(SB),NOSPLIT|NOFRAME,$0
+ MOVD 0(RSP), R0 // argc
+ ADD $8, RSP, R1 // argv
+ BL main(SB)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_arm64_netbsd_lib(SB),NOSPLIT,$184
+ // Preserve callee-save registers.
+ MOVD R19, 24(RSP)
+ MOVD R20, 32(RSP)
+ MOVD R21, 40(RSP)
+ MOVD R22, 48(RSP)
+ MOVD R23, 56(RSP)
+ MOVD R24, 64(RSP)
+ MOVD R25, 72(RSP)
+ MOVD R26, 80(RSP)
+ MOVD R27, 88(RSP)
+ FMOVD F8, 96(RSP)
+ FMOVD F9, 104(RSP)
+ FMOVD F10, 112(RSP)
+ FMOVD F11, 120(RSP)
+ FMOVD F12, 128(RSP)
+ FMOVD F13, 136(RSP)
+ FMOVD F14, 144(RSP)
+ FMOVD F15, 152(RSP)
+ MOVD g, 160(RSP)
+
+	// Initialize g as nil in case g is used later, e.g. by sigaction in cgo_sigaction.go.
+ MOVD ZR, g
+
+ MOVD R0, _rt0_arm64_netbsd_lib_argc<>(SB)
+ MOVD R1, _rt0_arm64_netbsd_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ MOVD $runtime·libpreinit(SB), R4
+ BL (R4)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVD _cgo_sys_thread_create(SB), R4
+ CBZ R4, nocgo
+ MOVD $_rt0_arm64_netbsd_lib_go(SB), R0
+ MOVD $0, R1
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL (R4)
+ ADD $16, RSP
+ B restore
+
+nocgo:
+ MOVD $0x800000, R0 // stacksize = 8192KB
+ MOVD $_rt0_arm64_netbsd_lib_go(SB), R1
+ MOVD R0, 8(RSP)
+ MOVD R1, 16(RSP)
+ MOVD $runtime·newosproc0(SB),R4
+ BL (R4)
+
+restore:
+ // Restore callee-save registers.
+ MOVD 24(RSP), R19
+ MOVD 32(RSP), R20
+ MOVD 40(RSP), R21
+ MOVD 48(RSP), R22
+ MOVD 56(RSP), R23
+ MOVD 64(RSP), R24
+ MOVD 72(RSP), R25
+ MOVD 80(RSP), R26
+ MOVD 88(RSP), R27
+ FMOVD 96(RSP), F8
+ FMOVD 104(RSP), F9
+ FMOVD 112(RSP), F10
+ FMOVD 120(RSP), F11
+ FMOVD 128(RSP), F12
+ FMOVD 136(RSP), F13
+ FMOVD 144(RSP), F14
+ FMOVD 152(RSP), F15
+ MOVD 160(RSP), g
+ RET
+
+TEXT _rt0_arm64_netbsd_lib_go(SB),NOSPLIT,$0
+ MOVD _rt0_arm64_netbsd_lib_argc<>(SB), R0
+ MOVD _rt0_arm64_netbsd_lib_argv<>(SB), R1
+ MOVD $runtime·rt0_go(SB),R4
+ B (R4)
+
+DATA _rt0_arm64_netbsd_lib_argc<>(SB)/8, $0
+GLOBL _rt0_arm64_netbsd_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_arm64_netbsd_lib_argv<>(SB)/8, $0
+GLOBL _rt0_arm64_netbsd_lib_argv<>(SB),NOPTR, $8
+
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ MOVD $runtime·rt0_go(SB), R2
+ BL (R2)
+exit:
+ MOVD $0, R0
+ SVC $1 // sys_exit
diff --git a/src/runtime/rt0_openbsd_386.s b/src/runtime/rt0_openbsd_386.s
new file mode 100644
index 0000000..959f4d6
--- /dev/null
+++ b/src/runtime/rt0_openbsd_386.s
@@ -0,0 +1,17 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_386_openbsd(SB),NOSPLIT,$0
+ JMP _rt0_386(SB)
+
+TEXT _rt0_386_openbsd_lib(SB),NOSPLIT,$0
+ JMP _rt0_386_lib(SB)
+
+TEXT main(SB),NOSPLIT,$0
+ // Remove the return address from the stack.
+ // rt0_go doesn't expect it to be there.
+ ADDL $4, SP
+ JMP runtime·rt0_go(SB)
diff --git a/src/runtime/rt0_openbsd_amd64.s b/src/runtime/rt0_openbsd_amd64.s
new file mode 100644
index 0000000..c2f3f23
--- /dev/null
+++ b/src/runtime/rt0_openbsd_amd64.s
@@ -0,0 +1,11 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_openbsd(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+TEXT _rt0_amd64_openbsd_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_openbsd_arm.s b/src/runtime/rt0_openbsd_arm.s
new file mode 100644
index 0000000..3511c96
--- /dev/null
+++ b/src/runtime/rt0_openbsd_arm.s
@@ -0,0 +1,11 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm_openbsd(SB),NOSPLIT,$0
+ B _rt0_arm(SB)
+
+TEXT _rt0_arm_openbsd_lib(SB),NOSPLIT,$0
+ B _rt0_arm_lib(SB)
diff --git a/src/runtime/rt0_openbsd_arm64.s b/src/runtime/rt0_openbsd_arm64.s
new file mode 100644
index 0000000..722fab6
--- /dev/null
+++ b/src/runtime/rt0_openbsd_arm64.s
@@ -0,0 +1,110 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See comment in runtime/sys_openbsd_arm64.s re this construction.
+#define INVOKE_SYSCALL \
+ SVC; \
+ NOOP; \
+ NOOP
+
+TEXT _rt0_arm64_openbsd(SB),NOSPLIT|NOFRAME,$0
+ MOVD 0(RSP), R0 // argc
+ ADD $8, RSP, R1 // argv
+ BL main(SB)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_arm64_openbsd_lib(SB),NOSPLIT,$184
+ // Preserve callee-save registers.
+ MOVD R19, 24(RSP)
+ MOVD R20, 32(RSP)
+ MOVD R21, 40(RSP)
+ MOVD R22, 48(RSP)
+ MOVD R23, 56(RSP)
+ MOVD R24, 64(RSP)
+ MOVD R25, 72(RSP)
+ MOVD R26, 80(RSP)
+ MOVD R27, 88(RSP)
+ FMOVD F8, 96(RSP)
+ FMOVD F9, 104(RSP)
+ FMOVD F10, 112(RSP)
+ FMOVD F11, 120(RSP)
+ FMOVD F12, 128(RSP)
+ FMOVD F13, 136(RSP)
+ FMOVD F14, 144(RSP)
+ FMOVD F15, 152(RSP)
+ MOVD g, 160(RSP)
+
+	// Initialize g as nil in case g is used later, e.g. by sigaction in cgo_sigaction.go.
+ MOVD ZR, g
+
+ MOVD R0, _rt0_arm64_openbsd_lib_argc<>(SB)
+ MOVD R1, _rt0_arm64_openbsd_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ MOVD $runtime·libpreinit(SB), R4
+ BL (R4)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVD _cgo_sys_thread_create(SB), R4
+ CBZ R4, nocgo
+ MOVD $_rt0_arm64_openbsd_lib_go(SB), R0
+ MOVD $0, R1
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL (R4)
+ ADD $16, RSP
+ B restore
+
+nocgo:
+ MOVD $0x800000, R0 // stacksize = 8192KB
+ MOVD $_rt0_arm64_openbsd_lib_go(SB), R1
+ MOVD R0, 8(RSP)
+ MOVD R1, 16(RSP)
+ MOVD $runtime·newosproc0(SB),R4
+ BL (R4)
+
+restore:
+ // Restore callee-save registers.
+ MOVD 24(RSP), R19
+ MOVD 32(RSP), R20
+ MOVD 40(RSP), R21
+ MOVD 48(RSP), R22
+ MOVD 56(RSP), R23
+ MOVD 64(RSP), R24
+ MOVD 72(RSP), R25
+ MOVD 80(RSP), R26
+ MOVD 88(RSP), R27
+ FMOVD 96(RSP), F8
+ FMOVD 104(RSP), F9
+ FMOVD 112(RSP), F10
+ FMOVD 120(RSP), F11
+ FMOVD 128(RSP), F12
+ FMOVD 136(RSP), F13
+ FMOVD 144(RSP), F14
+ FMOVD 152(RSP), F15
+ MOVD 160(RSP), g
+ RET
+
+TEXT _rt0_arm64_openbsd_lib_go(SB),NOSPLIT,$0
+ MOVD _rt0_arm64_openbsd_lib_argc<>(SB), R0
+ MOVD _rt0_arm64_openbsd_lib_argv<>(SB), R1
+ MOVD $runtime·rt0_go(SB),R4
+ B (R4)
+
+DATA _rt0_arm64_openbsd_lib_argc<>(SB)/8, $0
+GLOBL _rt0_arm64_openbsd_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_arm64_openbsd_lib_argv<>(SB)/8, $0
+GLOBL _rt0_arm64_openbsd_lib_argv<>(SB),NOPTR, $8
+
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ MOVD $runtime·rt0_go(SB), R2
+ BL (R2)
+exit:
+ MOVD $0, R0
+ MOVD $1, R8 // sys_exit
+ INVOKE_SYSCALL
+ B exit
diff --git a/src/runtime/rt0_openbsd_mips64.s b/src/runtime/rt0_openbsd_mips64.s
new file mode 100644
index 0000000..82a8dfa
--- /dev/null
+++ b/src/runtime/rt0_openbsd_mips64.s
@@ -0,0 +1,36 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_mips64_openbsd(SB),NOSPLIT,$0
+ JMP _main<>(SB)
+
+TEXT _rt0_mips64le_openbsd(SB),NOSPLIT,$0
+ JMP _main<>(SB)
+
+TEXT _main<>(SB),NOSPLIT|NOFRAME,$0
+ // In a statically linked binary, the stack contains argc,
+ // argv as argc string pointers followed by a NULL, envv as a
+ // sequence of string pointers followed by a NULL, and auxv.
+ // There is no TLS base pointer.
+#ifdef GOARCH_mips64
+ MOVW 4(R29), R4 // argc, big-endian ABI places int32 at offset 4
+#else
+ MOVW 0(R29), R4 // argc
+#endif
+ ADDV $8, R29, R5 // argv
+ JMP main(SB)
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+	// In external linking, the C library jumps to main with argc in R4
+	// and argv in R5.
+
+ // initialize REGSB = PC&0xffffffff00000000
+ BGEZAL R0, 1(PC)
+ SRLV $32, R31, RSB
+ SLLV $32, RSB
+
+ MOVV $runtime·rt0_go(SB), R1
+ JMP (R1)
diff --git a/src/runtime/rt0_plan9_386.s b/src/runtime/rt0_plan9_386.s
new file mode 100644
index 0000000..6471615
--- /dev/null
+++ b/src/runtime/rt0_plan9_386.s
@@ -0,0 +1,21 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_386_plan9(SB),NOSPLIT,$12
+ MOVL AX, _tos(SB)
+ LEAL 8(SP), AX
+ MOVL AX, _privates(SB)
+ MOVL $1, _nprivates(SB)
+ CALL runtime·asminit(SB)
+ MOVL inargc-4(FP), AX
+ MOVL AX, 0(SP)
+ LEAL inargv+0(FP), AX
+ MOVL AX, 4(SP)
+ JMP runtime·rt0_go(SB)
+
+GLOBL _tos(SB), NOPTR, $4
+GLOBL _privates(SB), NOPTR, $4
+GLOBL _nprivates(SB), NOPTR, $4
diff --git a/src/runtime/rt0_plan9_amd64.s b/src/runtime/rt0_plan9_amd64.s
new file mode 100644
index 0000000..6fd493a
--- /dev/null
+++ b/src/runtime/rt0_plan9_amd64.s
@@ -0,0 +1,19 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_plan9(SB),NOSPLIT,$24
+ MOVQ AX, _tos(SB)
+ LEAQ 16(SP), AX
+ MOVQ AX, _privates(SB)
+ MOVL $1, _nprivates(SB)
+ MOVL inargc-8(FP), DI
+ LEAQ inargv+0(FP), SI
+ MOVQ $runtime·rt0_go(SB), AX
+ JMP AX
+
+GLOBL _tos(SB), NOPTR, $8
+GLOBL _privates(SB), NOPTR, $8
+GLOBL _nprivates(SB), NOPTR, $4
diff --git a/src/runtime/rt0_plan9_arm.s b/src/runtime/rt0_plan9_arm.s
new file mode 100644
index 0000000..697a78d
--- /dev/null
+++ b/src/runtime/rt0_plan9_arm.s
@@ -0,0 +1,15 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// On Plan 9, argc is at the top of the stack, followed by pointers to the arguments.
+
+TEXT _rt0_arm_plan9(SB),NOSPLIT|NOFRAME,$0
+ MOVW R0, _tos(SB)
+ MOVW 0(R13), R0
+ MOVW $4(R13), R1
+ B runtime·rt0_go(SB)
+
+GLOBL _tos(SB), NOPTR, $4
diff --git a/src/runtime/rt0_solaris_amd64.s b/src/runtime/rt0_solaris_amd64.s
new file mode 100644
index 0000000..5c46ded
--- /dev/null
+++ b/src/runtime/rt0_solaris_amd64.s
@@ -0,0 +1,11 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_solaris(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+TEXT _rt0_amd64_solaris_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_windows_386.s b/src/runtime/rt0_windows_386.s
new file mode 100644
index 0000000..fa39edd
--- /dev/null
+++ b/src/runtime/rt0_windows_386.s
@@ -0,0 +1,47 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_386_windows(SB),NOSPLIT,$0
+ JMP _rt0_386(SB)
+
+// When building with -buildmode=(c-shared or c-archive), this
+// symbol is called. For dynamic libraries it is called when the
+// library is loaded. For static libraries it is called when the
+// final executable starts, during the C runtime initialization
+// phase.
+TEXT _rt0_386_windows_lib(SB),NOSPLIT,$0x1C
+ MOVL BP, 0x08(SP)
+ MOVL BX, 0x0C(SP)
+ MOVL AX, 0x10(SP)
+ MOVL CX, 0x14(SP)
+ MOVL DX, 0x18(SP)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVL _cgo_sys_thread_create(SB), AX
+ MOVL $_rt0_386_windows_lib_go(SB), 0x00(SP)
+ MOVL $0, 0x04(SP)
+
+ // Top two items on the stack are passed to _cgo_sys_thread_create
+ // as parameters. This is the calling convention on 32-bit Windows.
+ CALL AX
+
+ MOVL 0x08(SP), BP
+ MOVL 0x0C(SP), BX
+ MOVL 0x10(SP), AX
+ MOVL 0x14(SP), CX
+ MOVL 0x18(SP), DX
+ RET
+
+TEXT _rt0_386_windows_lib_go(SB),NOSPLIT,$0
+ PUSHL $0
+ PUSHL $0
+ JMP runtime·rt0_go(SB)
+
+TEXT _main(SB),NOSPLIT,$0
+ // Remove the return address from the stack.
+ // rt0_go doesn't expect it to be there.
+ ADDL $4, SP
+ JMP runtime·rt0_go(SB)
diff --git a/src/runtime/rt0_windows_amd64.s b/src/runtime/rt0_windows_amd64.s
new file mode 100644
index 0000000..345e141
--- /dev/null
+++ b/src/runtime/rt0_windows_amd64.s
@@ -0,0 +1,43 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+TEXT _rt0_amd64_windows(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+// When building with -buildmode=(c-shared or c-archive), this
+// symbol is called. For dynamic libraries it is called when the
+// library is loaded. For static libraries it is called when the
+// final executable starts, during the C runtime initialization
+// phase.
+// Leave space for four pointers on the stack as required
+// by the Windows amd64 calling convention.
+TEXT _rt0_amd64_windows_lib(SB),NOSPLIT,$0x48
+ MOVQ BP, 0x20(SP)
+ MOVQ BX, 0x28(SP)
+ MOVQ AX, 0x30(SP)
+ MOVQ CX, 0x38(SP)
+ MOVQ DX, 0x40(SP)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVQ _cgo_sys_thread_create(SB), AX
+ MOVQ $_rt0_amd64_windows_lib_go(SB), CX
+ MOVQ $0, DX
+ CALL AX
+
+ MOVQ 0x20(SP), BP
+ MOVQ 0x28(SP), BX
+ MOVQ 0x30(SP), AX
+ MOVQ 0x38(SP), CX
+ MOVQ 0x40(SP), DX
+ RET
+
+TEXT _rt0_amd64_windows_lib_go(SB),NOSPLIT,$0
+ MOVQ $0, DI
+ MOVQ $0, SI
+ MOVQ $runtime·rt0_go(SB), AX
+ JMP AX
diff --git a/src/runtime/rt0_windows_arm.s b/src/runtime/rt0_windows_arm.s
new file mode 100644
index 0000000..c5787d0
--- /dev/null
+++ b/src/runtime/rt0_windows_arm.s
@@ -0,0 +1,12 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// This is the entry point called by the kernel for an
+// ordinary -buildmode=exe program.
+TEXT _rt0_arm_windows(SB),NOSPLIT|NOFRAME,$0
+ B ·rt0_go(SB)
diff --git a/src/runtime/runtime-gdb.py b/src/runtime/runtime-gdb.py
new file mode 100644
index 0000000..8d96dfb
--- /dev/null
+++ b/src/runtime/runtime-gdb.py
@@ -0,0 +1,606 @@
+# Copyright 2010 The Go Authors. All rights reserved.
+# Use of this source code is governed by a BSD-style
+# license that can be found in the LICENSE file.
+
+"""GDB Pretty printers and convenience functions for Go's runtime structures.
+
+This script is loaded by GDB when it finds a .debug_gdb_scripts
+section in the compiled binary. The [68]l linkers emit this with a
+path to this file based on the path to the runtime package.
+"""
+
+# Known issues:
+#  - pretty printing only works for the 'native' strings; e.g. 'type
+#    foo string' will make foo a plain struct in the eyes of gdb,
+#    so the pretty printer is not triggered.
+
+
+from __future__ import print_function
+import re
+import sys
+import gdb
+
+print("Loading Go Runtime support.", file=sys.stderr)
+# http://python3porting.com/differences.html
+if sys.version > '3':
+ xrange = range
+# allow manual reloading while developing
+goobjfile = gdb.current_objfile() or gdb.objfiles()[0]
+goobjfile.pretty_printers = []
+
+# G state (runtime2.go)
+
+def read_runtime_const(varname, default):
+ try:
+ return int(gdb.parse_and_eval(varname))
+ except Exception:
+ return int(default)
+
+
+G_IDLE = read_runtime_const("'runtime._Gidle'", 0)
+G_RUNNABLE = read_runtime_const("'runtime._Grunnable'", 1)
+G_RUNNING = read_runtime_const("'runtime._Grunning'", 2)
+G_SYSCALL = read_runtime_const("'runtime._Gsyscall'", 3)
+G_WAITING = read_runtime_const("'runtime._Gwaiting'", 4)
+G_MORIBUND_UNUSED = read_runtime_const("'runtime._Gmoribund_unused'", 5)
+G_DEAD = read_runtime_const("'runtime._Gdead'", 6)
+G_ENQUEUE_UNUSED = read_runtime_const("'runtime._Genqueue_unused'", 7)
+G_COPYSTACK = read_runtime_const("'runtime._Gcopystack'", 8)
+G_SCAN = read_runtime_const("'runtime._Gscan'", 0x1000)
+G_SCANRUNNABLE = G_SCAN+G_RUNNABLE
+G_SCANRUNNING = G_SCAN+G_RUNNING
+G_SCANSYSCALL = G_SCAN+G_SYSCALL
+G_SCANWAITING = G_SCAN+G_WAITING
+
+sts = {
+ G_IDLE: 'idle',
+ G_RUNNABLE: 'runnable',
+ G_RUNNING: 'running',
+ G_SYSCALL: 'syscall',
+ G_WAITING: 'waiting',
+ G_MORIBUND_UNUSED: 'moribund',
+ G_DEAD: 'dead',
+ G_ENQUEUE_UNUSED: 'enqueue',
+ G_COPYSTACK: 'copystack',
+ G_SCAN: 'scan',
+ G_SCANRUNNABLE: 'runnable+s',
+ G_SCANRUNNING: 'running+s',
+ G_SCANSYSCALL: 'syscall+s',
+ G_SCANWAITING: 'waiting+s',
+}
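
For a quick illustration of how these constants and the sts table are used further down (GoroutinesCmd prints sts.get(status, ...) and find_goroutine strips the scan bit with &~G_SCAN), here is a hedged snippet; the status value is made up:

    status = G_SCAN + G_WAITING                         # a goroutine being scanned while waiting
    print(sts.get(status, "unknown(%d)" % status))      # -> 'waiting+s'
    print(sts.get(status & ~G_SCAN, "unknown"))         # -> 'waiting', as find_goroutine computes
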
+
+
+#
+# Value wrappers
+#
+
+class SliceValue:
+ "Wrapper for slice values."
+
+ def __init__(self, val):
+ self.val = val
+
+ @property
+ def len(self):
+ return int(self.val['len'])
+
+ @property
+ def cap(self):
+ return int(self.val['cap'])
+
+ def __getitem__(self, i):
+ if i < 0 or i >= self.len:
+ raise IndexError(i)
+ ptr = self.val["array"]
+ return (ptr + i).dereference()
+
+
+#
+# Pretty Printers
+#
+
+# The patterns for matching types are permissive because gdb 8.2 switched to matching on (we think) typedef names instead of C syntax names.
+class StringTypePrinter:
+ "Pretty print Go strings."
+
+ pattern = re.compile(r'^(struct string( \*)?|string)$')
+
+ def __init__(self, val):
+ self.val = val
+
+ def display_hint(self):
+ return 'string'
+
+ def to_string(self):
+ l = int(self.val['len'])
+ return self.val['str'].string("utf-8", "ignore", l)
+
+
+class SliceTypePrinter:
+ "Pretty print slices."
+
+ pattern = re.compile(r'^(struct \[\]|\[\])')
+
+ def __init__(self, val):
+ self.val = val
+
+ def display_hint(self):
+ return 'array'
+
+ def to_string(self):
+ t = str(self.val.type)
+ if (t.startswith("struct ")):
+ return t[len("struct "):]
+ return t
+
+ def children(self):
+ sval = SliceValue(self.val)
+ if sval.len > sval.cap:
+ return
+ for idx, item in enumerate(sval):
+ yield ('[{0}]'.format(idx), item)
+
+
+class MapTypePrinter:
+ """Pretty print map[K]V types.
+
+	Map-typed Go variables are really pointers. Dereference them in gdb
+	to inspect their contents with this pretty printer.
+ """
+
+ pattern = re.compile(r'^map\[.*\].*$')
+
+ def __init__(self, val):
+ self.val = val
+
+ def display_hint(self):
+ return 'map'
+
+ def to_string(self):
+ return str(self.val.type)
+
+ def children(self):
+ B = self.val['B']
+ buckets = self.val['buckets']
+ oldbuckets = self.val['oldbuckets']
+ flags = self.val['flags']
+ inttype = self.val['hash0'].type
+ cnt = 0
+ for bucket in xrange(2 ** int(B)):
+ bp = buckets + bucket
+ if oldbuckets:
+ oldbucket = bucket & (2 ** (B - 1) - 1)
+ oldbp = oldbuckets + oldbucket
+ oldb = oldbp.dereference()
+ if (oldb['overflow'].cast(inttype) & 1) == 0: # old bucket not evacuated yet
+ if bucket >= 2 ** (B - 1):
+ continue # already did old bucket
+ bp = oldbp
+ while bp:
+ b = bp.dereference()
+ for i in xrange(8):
+ if b['tophash'][i] != 0:
+ k = b['keys'][i]
+ v = b['values'][i]
+ if flags & 1:
+ k = k.dereference()
+ if flags & 2:
+ v = v.dereference()
+ yield str(cnt), k
+ yield str(cnt + 1), v
+ cnt += 2
+ bp = b['overflow']
+
+
+class ChanTypePrinter:
+ """Pretty print chan[T] types.
+
+	Chan-typed Go variables are really pointers. Dereference them in gdb
+	to inspect their contents with this pretty printer.
+ """
+
+ pattern = re.compile(r'^chan ')
+
+ def __init__(self, val):
+ self.val = val
+
+ def display_hint(self):
+ return 'array'
+
+ def to_string(self):
+ return str(self.val.type)
+
+ def children(self):
+ # see chan.c chanbuf(). et is the type stolen from hchan<T>::recvq->first->elem
+ et = [x.type for x in self.val['recvq']['first'].type.target().fields() if x.name == 'elem'][0]
+ ptr = (self.val.address["buf"]).cast(et)
+ for i in range(self.val["qcount"]):
+ j = (self.val["recvx"] + i) % self.val["dataqsiz"]
+ yield ('[{0}]'.format(i), (ptr + j).dereference())
+
+
+#
+# Register all the *Printer classes above.
+#
+
+def makematcher(klass):
+ def matcher(val):
+ try:
+ if klass.pattern.match(str(val.type)):
+ return klass(val)
+ except Exception:
+ pass
+ return matcher
+
+goobjfile.pretty_printers.extend([makematcher(var) for var in vars().values() if hasattr(var, 'pattern')])
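
For reference, an additional printer could be hooked in through the same makematcher mechanism; the sketch below is hypothetical (main.Point and its X/Y fields are assumptions for illustration) and is not part of the shipped script:

    class PointTypePrinter:
        "Pretty print a hypothetical main.Point struct."

        pattern = re.compile(r'^main\.Point$')

        def __init__(self, val):
            self.val = val

        def to_string(self):
            return 'Point({0}, {1})'.format(self.val['X'], self.val['Y'])

    goobjfile.pretty_printers.append(makematcher(PointTypePrinter))
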
+#
+# Utilities
+#
+
+def pc_to_int(pc):
+ # python2 will not cast pc (type void*) to an int cleanly
+ # instead python2 and python3 work with the hex string representation
+ # of the void pointer which we can parse back into an int.
+ # int(pc) will not work.
+ try:
+ # python3 / newer versions of gdb
+ pc = int(pc)
+ except gdb.error:
+ # str(pc) can return things like
+ # "0x429d6c <runtime.gopark+284>", so
+ # chop at first space.
+ pc = int(str(pc).split(None, 1)[0], 16)
+ return pc
+
+
+#
+# For reference, this is what we're trying to do:
+# eface: p *(*(struct 'runtime.rtype'*)'main.e'->type_->data)->string
+# iface: p *(*(struct 'runtime.rtype'*)'main.s'->tab->Type->data)->string
+#
+# interface types can't be recognized by their name; instead we check
+# if they have the expected fields. Unfortunately the mapping of
+# fields to python attributes in gdb.py isn't complete: you can't test
+# for presence other than by trapping.
+
+
+def is_iface(val):
+ try:
+ return str(val['tab'].type) == "struct runtime.itab *" and str(val['data'].type) == "void *"
+ except gdb.error:
+ pass
+
+
+def is_eface(val):
+ try:
+ return str(val['_type'].type) == "struct runtime._type *" and str(val['data'].type) == "void *"
+ except gdb.error:
+ pass
+
+
+def lookup_type(name):
+ try:
+ return gdb.lookup_type(name)
+ except gdb.error:
+ pass
+ try:
+ return gdb.lookup_type('struct ' + name)
+ except gdb.error:
+ pass
+ try:
+ return gdb.lookup_type('struct ' + name[1:]).pointer()
+ except gdb.error:
+ pass
+
+
+def iface_commontype(obj):
+ if is_iface(obj):
+ go_type_ptr = obj['tab']['_type']
+ elif is_eface(obj):
+ go_type_ptr = obj['_type']
+ else:
+ return
+
+ return go_type_ptr.cast(gdb.lookup_type("struct reflect.rtype").pointer()).dereference()
+
+
+def iface_dtype(obj):
+ "Decode type of the data field of an eface or iface struct."
+ # known issue: dtype_name decoded from runtime.rtype is "nested.Foo"
+ # but the dwarf table lists it as "full/path/to/nested.Foo"
+
+ dynamic_go_type = iface_commontype(obj)
+ if dynamic_go_type is None:
+ return
+ dtype_name = dynamic_go_type['string'].dereference()['str'].string()
+
+ dynamic_gdb_type = lookup_type(dtype_name)
+ if dynamic_gdb_type is None:
+ return
+
+ type_size = int(dynamic_go_type['size'])
+ uintptr_size = int(dynamic_go_type['size'].type.sizeof) # size is itself an uintptr
+ if type_size > uintptr_size:
+ dynamic_gdb_type = dynamic_gdb_type.pointer()
+
+ return dynamic_gdb_type
+
+
+def iface_dtype_name(obj):
+ "Decode type name of the data field of an eface or iface struct."
+
+ dynamic_go_type = iface_commontype(obj)
+ if dynamic_go_type is None:
+ return
+ return dynamic_go_type['string'].dereference()['str'].string()
+
+
+class IfacePrinter:
+ """Pretty print interface values
+
+ Casts the data field to the appropriate dynamic type."""
+
+ def __init__(self, val):
+ self.val = val
+
+ def display_hint(self):
+ return 'string'
+
+ def to_string(self):
+ if self.val['data'] == 0:
+ return 0x0
+ try:
+ dtype = iface_dtype(self.val)
+ except Exception:
+ return "<bad dynamic type>"
+
+ if dtype is None: # trouble looking up, print something reasonable
+ return "({typename}){data}".format(
+ typename=iface_dtype_name(self.val), data=self.val['data'])
+
+ try:
+ return self.val['data'].cast(dtype).dereference()
+ except Exception:
+ pass
+ return self.val['data'].cast(dtype)
+
+
+def ifacematcher(val):
+ if is_iface(val) or is_eface(val):
+ return IfacePrinter(val)
+
+goobjfile.pretty_printers.append(ifacematcher)
+
+#
+# Convenience Functions
+#
+
+
+class GoLenFunc(gdb.Function):
+ "Length of strings, slices, maps or channels"
+
+ how = ((StringTypePrinter, 'len'), (SliceTypePrinter, 'len'), (MapTypePrinter, 'count'), (ChanTypePrinter, 'qcount'))
+
+ def __init__(self):
+ gdb.Function.__init__(self, "len")
+
+ def invoke(self, obj):
+ typename = str(obj.type)
+ for klass, fld in self.how:
+ if klass.pattern.match(typename):
+ return obj[fld]
+
+
+class GoCapFunc(gdb.Function):
+ "Capacity of slices or channels"
+
+ how = ((SliceTypePrinter, 'cap'), (ChanTypePrinter, 'dataqsiz'))
+
+ def __init__(self):
+ gdb.Function.__init__(self, "cap")
+
+ def invoke(self, obj):
+ typename = str(obj.type)
+ for klass, fld in self.how:
+ if klass.pattern.match(typename):
+ return obj[fld]
+
+
+class DTypeFunc(gdb.Function):
+ """Cast Interface values to their dynamic type.
+
+ For non-interface types this behaves as the identity operation.
+ """
+
+ def __init__(self):
+ gdb.Function.__init__(self, "dtype")
+
+ def invoke(self, obj):
+ try:
+ return obj['data'].cast(iface_dtype(obj))
+ except gdb.error:
+ pass
+ return obj
+
+#
+# Commands
+#
+
+def linked_list(ptr, linkfield):
+ while ptr:
+ yield ptr
+ ptr = ptr[linkfield]
+
+
+class GoroutinesCmd(gdb.Command):
+ "List all goroutines."
+
+ def __init__(self):
+ gdb.Command.__init__(self, "info goroutines", gdb.COMMAND_STACK, gdb.COMPLETE_NONE)
+
+ def invoke(self, _arg, _from_tty):
+ # args = gdb.string_to_argv(arg)
+ vp = gdb.lookup_type('void').pointer()
+ for ptr in SliceValue(gdb.parse_and_eval("'runtime.allgs'")):
+ if ptr['atomicstatus'] == G_DEAD:
+ continue
+ s = ' '
+ if ptr['m']:
+ s = '*'
+ pc = ptr['sched']['pc'].cast(vp)
+ pc = pc_to_int(pc)
+ blk = gdb.block_for_pc(pc)
+ status = int(ptr['atomicstatus'])
+ st = sts.get(status, "unknown(%d)" % status)
+ print(s, ptr['goid'], "{0:8s}".format(st), blk.function)
+
+
+def find_goroutine(goid):
+ """
+ find_goroutine attempts to find the goroutine identified by goid.
+	It returns a tuple of gdb.Value's representing the program counter
+	and stack pointer for the goroutine, in that order.
+
+ @param int goid
+
+ @return tuple (gdb.Value, gdb.Value)
+ """
+ vp = gdb.lookup_type('void').pointer()
+ for ptr in SliceValue(gdb.parse_and_eval("'runtime.allgs'")):
+ if ptr['atomicstatus'] == G_DEAD:
+ continue
+ if ptr['goid'] == goid:
+ break
+ else:
+ return None, None
+ # Get the goroutine's saved state.
+ pc, sp = ptr['sched']['pc'], ptr['sched']['sp']
+ status = ptr['atomicstatus']&~G_SCAN
+	# Goroutine is neither running nor in a syscall, so use its saved scheduler state.
+ if status != G_RUNNING and status != G_SYSCALL:
+ return pc.cast(vp), sp.cast(vp)
+
+ # If the goroutine is in a syscall, use syscallpc/sp.
+ pc, sp = ptr['syscallpc'], ptr['syscallsp']
+ if sp != 0:
+ return pc.cast(vp), sp.cast(vp)
+ # Otherwise, the goroutine is running, so it doesn't have
+ # saved scheduler state. Find G's OS thread.
+ m = ptr['m']
+ if m == 0:
+ return None, None
+ for thr in gdb.selected_inferior().threads():
+ if thr.ptid[1] == m['procid']:
+ break
+ else:
+ return None, None
+ # Get scheduler state from the G's OS thread state.
+ curthr = gdb.selected_thread()
+ try:
+ thr.switch()
+ pc = gdb.parse_and_eval('$pc')
+ sp = gdb.parse_and_eval('$sp')
+ finally:
+ curthr.switch()
+ return pc.cast(vp), sp.cast(vp)
+
+
+class GoroutineCmd(gdb.Command):
+ """Execute gdb command in the context of goroutine <goid>.
+
+ Switch PC and SP to the ones in the goroutine's G structure,
+ execute an arbitrary gdb command, and restore PC and SP.
+
+ Usage: (gdb) goroutine <goid> <gdbcmd>
+
+ You could pass "all" as <goid> to apply <gdbcmd> to all goroutines.
+
+ For example: (gdb) goroutine all <gdbcmd>
+
+ Note that it is ill-defined to modify state in the context of a goroutine.
+ Restrict yourself to inspecting values.
+ """
+
+ def __init__(self):
+ gdb.Command.__init__(self, "goroutine", gdb.COMMAND_STACK, gdb.COMPLETE_NONE)
+
+ def invoke(self, arg, _from_tty):
+ goid_str, cmd = arg.split(None, 1)
+ goids = []
+
+ if goid_str == 'all':
+ for ptr in SliceValue(gdb.parse_and_eval("'runtime.allgs'")):
+ goids.append(int(ptr['goid']))
+ else:
+ goids = [int(gdb.parse_and_eval(goid_str))]
+
+ for goid in goids:
+ self.invoke_per_goid(goid, cmd)
+
+ def invoke_per_goid(self, goid, cmd):
+ pc, sp = find_goroutine(goid)
+ if not pc:
+ print("No such goroutine: ", goid)
+ return
+ pc = pc_to_int(pc)
+ save_frame = gdb.selected_frame()
+ gdb.parse_and_eval('$save_sp = $sp')
+ gdb.parse_and_eval('$save_pc = $pc')
+ # In GDB, assignments to sp must be done from the
+ # top-most frame, so select frame 0 first.
+ gdb.execute('select-frame 0')
+ gdb.parse_and_eval('$sp = {0}'.format(str(sp)))
+ gdb.parse_and_eval('$pc = {0}'.format(str(pc)))
+ try:
+ gdb.execute(cmd)
+ finally:
+ # In GDB, assignments to sp must be done from the
+ # top-most frame, so select frame 0 first.
+ gdb.execute('select-frame 0')
+ gdb.parse_and_eval('$pc = $save_pc')
+ gdb.parse_and_eval('$sp = $save_sp')
+ save_frame.select()
+
+
+class GoIfaceCmd(gdb.Command):
+	"Print static and dynamic interface types"
+
+ def __init__(self):
+ gdb.Command.__init__(self, "iface", gdb.COMMAND_DATA, gdb.COMPLETE_SYMBOL)
+
+ def invoke(self, arg, _from_tty):
+ for obj in gdb.string_to_argv(arg):
+ try:
+ #TODO fix quoting for qualified variable names
+ obj = gdb.parse_and_eval(str(obj))
+ except Exception as e:
+ print("Can't parse ", obj, ": ", e)
+ continue
+
+ if obj['data'] == 0:
+ dtype = "nil"
+ else:
+ dtype = iface_dtype(obj)
+
+ if dtype is None:
+ print("Not an interface: ", obj.type)
+ continue
+
+ print("{0}: {1}".format(obj.type, dtype))
+
+# TODO: print interface's methods and dynamic type's func pointers thereof.
+#rsc: "to find the number of entries in the itab's Fn field look at
+# itab.inter->numMethods
+# i am sure i have the names wrong but look at the interface type
+# and its method count"
+# so Itype will start with a commontype which has kind = interface
+
+#
+# Register all convenience functions and CLI commands
+#
+GoLenFunc()
+GoCapFunc()
+DTypeFunc()
+GoroutinesCmd()
+GoroutineCmd()
+GoIfaceCmd()
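
A hedged usage sketch: once this script is loaded, the commands and convenience functions registered above can also be driven from GDB's Python interpreter; gslice is a variable name borrowed from the test program later in this patch and is only an assumption here:

    import gdb

    gdb.execute('info goroutines')                # the GoroutinesCmd command
    print(gdb.parse_and_eval('$len(gslice)'))     # GoLenFunc convenience function
    print(gdb.parse_and_eval('$cap(gslice)'))     # GoCapFunc convenience function
    gdb.execute('goroutine 1 bt')                 # backtrace goroutine 1 via GoroutineCmd
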
diff --git a/src/runtime/runtime-gdb_test.go b/src/runtime/runtime-gdb_test.go
new file mode 100644
index 0000000..5df8c3c
--- /dev/null
+++ b/src/runtime/runtime-gdb_test.go
@@ -0,0 +1,749 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "bytes"
+ "fmt"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "regexp"
+ "runtime"
+ "strconv"
+ "strings"
+ "testing"
+)
+
+// NOTE: In some configurations, GDB will segfault when sent a SIGWINCH signal.
+// Some runtime tests send SIGWINCH to the entire process group, so those tests
+// must never run in parallel with GDB tests.
+//
+// See issue 39021 and https://sourceware.org/bugzilla/show_bug.cgi?id=26056.
+
+func checkGdbEnvironment(t *testing.T) {
+ testenv.MustHaveGoBuild(t)
+ switch runtime.GOOS {
+ case "darwin":
+ t.Skip("gdb does not work on darwin")
+ case "netbsd":
+ t.Skip("gdb does not work with threads on NetBSD; see https://golang.org/issue/22893 and https://gnats.netbsd.org/52548")
+ case "windows":
+ t.Skip("gdb tests fail on Windows: https://golang.org/issue/22687")
+ case "linux":
+ if runtime.GOARCH == "ppc64" {
+ t.Skip("skipping gdb tests on linux/ppc64; see https://golang.org/issue/17366")
+ }
+ if runtime.GOARCH == "mips" {
+ t.Skip("skipping gdb tests on linux/mips; see https://golang.org/issue/25939")
+ }
+ case "freebsd":
+ t.Skip("skipping gdb tests on FreeBSD; see https://golang.org/issue/29508")
+ case "aix":
+ if testing.Short() {
+ t.Skip("skipping gdb tests on AIX; see https://golang.org/issue/35710")
+ }
+ case "plan9":
+ t.Skip("there is no gdb on Plan 9")
+ }
+ if final := os.Getenv("GOROOT_FINAL"); final != "" && runtime.GOROOT() != final {
+ t.Skip("gdb test can fail with GOROOT_FINAL pending")
+ }
+}
+
+func checkGdbVersion(t *testing.T) {
+ // Issue 11214 reports various failures with older versions of gdb.
+ out, err := exec.Command("gdb", "--version").CombinedOutput()
+ if err != nil {
+ t.Skipf("skipping: error executing gdb: %v", err)
+ }
+ re := regexp.MustCompile(`([0-9]+)\.([0-9]+)`)
+ matches := re.FindSubmatch(out)
+ if len(matches) < 3 {
+ t.Skipf("skipping: can't determine gdb version from\n%s\n", out)
+ }
+ major, err1 := strconv.Atoi(string(matches[1]))
+ minor, err2 := strconv.Atoi(string(matches[2]))
+ if err1 != nil || err2 != nil {
+ t.Skipf("skipping: can't determine gdb version: %v, %v", err1, err2)
+ }
+ if major < 7 || (major == 7 && minor < 7) {
+ t.Skipf("skipping: gdb version %d.%d too old", major, minor)
+ }
+ t.Logf("gdb version %d.%d", major, minor)
+}
+
+func checkGdbPython(t *testing.T) {
+ if runtime.GOOS == "solaris" || runtime.GOOS == "illumos" {
+ t.Skip("skipping gdb python tests on illumos and solaris; see golang.org/issue/20821")
+ }
+
+ cmd := exec.Command("gdb", "-nx", "-q", "--batch", "-iex", "python import sys; print('go gdb python support')")
+ out, err := cmd.CombinedOutput()
+
+ if err != nil {
+ t.Skipf("skipping due to issue running gdb: %v", err)
+ }
+ if strings.TrimSpace(string(out)) != "go gdb python support" {
+ t.Skipf("skipping due to lack of python gdb support: %s", out)
+ }
+}
+
+// checkCleanBacktrace checks that the given backtrace is well formed and does
+// not contain any error messages from GDB.
+func checkCleanBacktrace(t *testing.T, backtrace string) {
+ backtrace = strings.TrimSpace(backtrace)
+ lines := strings.Split(backtrace, "\n")
+ if len(lines) == 0 {
+ t.Fatalf("empty backtrace")
+ }
+ for i, l := range lines {
+ if !strings.HasPrefix(l, fmt.Sprintf("#%v ", i)) {
+ t.Fatalf("malformed backtrace at line %v: %v", i, l)
+ }
+ }
+ // TODO(mundaym): check for unknown frames (e.g. "??").
+}
+
+const helloSource = `
+import "fmt"
+import "runtime"
+var gslice []string
+func main() {
+ mapvar := make(map[string]string, 13)
+ slicemap := make(map[string][]string,11)
+ chanint := make(chan int, 10)
+ chanstr := make(chan string, 10)
+ chanint <- 99
+ chanint <- 11
+ chanstr <- "spongepants"
+ chanstr <- "squarebob"
+ mapvar["abc"] = "def"
+ mapvar["ghi"] = "jkl"
+ slicemap["a"] = []string{"b","c","d"}
+ slicemap["e"] = []string{"f","g","h"}
+ strvar := "abc"
+ ptrvar := &strvar
+ slicevar := make([]string, 0, 16)
+ slicevar = append(slicevar, mapvar["abc"])
+ fmt.Println("hi")
+ runtime.KeepAlive(ptrvar)
+ _ = ptrvar // set breakpoint here
+ gslice = slicevar
+ fmt.Printf("%v, %v, %v\n", slicemap, <-chanint, <-chanstr)
+ runtime.KeepAlive(mapvar)
+} // END_OF_PROGRAM
+`
+
+func lastLine(src []byte) int {
+ eop := []byte("END_OF_PROGRAM")
+ for i, l := range bytes.Split(src, []byte("\n")) {
+ if bytes.Contains(l, eop) {
+ return i
+ }
+ }
+ return 0
+}
+
+func TestGdbPython(t *testing.T) {
+ testGdbPython(t, false)
+}
+
+func TestGdbPythonCgo(t *testing.T) {
+ if runtime.GOARCH == "mips" || runtime.GOARCH == "mipsle" || runtime.GOARCH == "mips64" {
+ testenv.SkipFlaky(t, 18784)
+ }
+ testGdbPython(t, true)
+}
+
+func testGdbPython(t *testing.T, cgo bool) {
+ if cgo {
+ testenv.MustHaveCGO(t)
+ }
+
+ checkGdbEnvironment(t)
+ t.Parallel()
+ checkGdbVersion(t)
+ checkGdbPython(t)
+
+ dir, err := os.MkdirTemp("", "go-build")
+ if err != nil {
+ t.Fatalf("failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(dir)
+
+ var buf bytes.Buffer
+ buf.WriteString("package main\n")
+ if cgo {
+ buf.WriteString(`import "C"` + "\n")
+ }
+ buf.WriteString(helloSource)
+
+ src := buf.Bytes()
+
+ // Locate breakpoint line
+ var bp int
+ lines := bytes.Split(src, []byte("\n"))
+ for i, line := range lines {
+ if bytes.Contains(line, []byte("breakpoint")) {
+ bp = i
+ break
+ }
+ }
+
+ err = os.WriteFile(filepath.Join(dir, "main.go"), src, 0644)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ nLines := lastLine(src)
+
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", "a.exe", "main.go")
+ cmd.Dir = dir
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ args := []string{"-nx", "-q", "--batch",
+ "-iex", "add-auto-load-safe-path " + filepath.Join(runtime.GOROOT(), "src", "runtime"),
+ "-ex", "set startup-with-shell off",
+ "-ex", "set print thread-events off",
+ }
+ if cgo {
+ // When we build the cgo version of the program, the system's
+ // linker is used. Some external linkers, like GNU gold,
+ // compress the .debug_gdb_scripts into .zdebug_gdb_scripts.
+ // Until gold and gdb can work together, temporarily load the
+ // python script directly.
+ args = append(args,
+ "-ex", "source "+filepath.Join(runtime.GOROOT(), "src", "runtime", "runtime-gdb.py"),
+ )
+ } else {
+ args = append(args,
+ "-ex", "info auto-load python-scripts",
+ )
+ }
+ args = append(args,
+ "-ex", "set python print-stack full",
+ "-ex", fmt.Sprintf("br main.go:%d", bp),
+ "-ex", "run",
+ "-ex", "echo BEGIN info goroutines\n",
+ "-ex", "info goroutines",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN print mapvar\n",
+ "-ex", "print mapvar",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN print slicemap\n",
+ "-ex", "print slicemap",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN print strvar\n",
+ "-ex", "print strvar",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN print chanint\n",
+ "-ex", "print chanint",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN print chanstr\n",
+ "-ex", "print chanstr",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN info locals\n",
+ "-ex", "info locals",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN goroutine 1 bt\n",
+ "-ex", "goroutine 1 bt",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN goroutine all bt\n",
+ "-ex", "goroutine all bt",
+ "-ex", "echo END\n",
+ "-ex", "clear main.go:15", // clear the previous break point
+ "-ex", fmt.Sprintf("br main.go:%d", nLines), // new break point at the end of main
+ "-ex", "c",
+ "-ex", "echo BEGIN goroutine 1 bt at the end\n",
+ "-ex", "goroutine 1 bt",
+ "-ex", "echo END\n",
+ filepath.Join(dir, "a.exe"),
+ )
+ got, err := exec.Command("gdb", args...).CombinedOutput()
+ t.Logf("gdb output:\n%s", got)
+ if err != nil {
+ t.Fatalf("gdb exited with error: %v", err)
+ }
+
+ firstLine := bytes.SplitN(got, []byte("\n"), 2)[0]
+ if string(firstLine) != "Loading Go Runtime support." {
+ // This can happen when using all.bash with
+ // GOROOT_FINAL set, because the tests are run before
+ // the final installation of the files.
+ cmd := exec.Command(testenv.GoToolPath(t), "env", "GOROOT")
+ cmd.Env = []string{}
+ out, err := cmd.CombinedOutput()
+ if err != nil && bytes.Contains(out, []byte("cannot find GOROOT")) {
+ t.Skipf("skipping because GOROOT=%s does not exist", runtime.GOROOT())
+ }
+
+ _, file, _, _ := runtime.Caller(1)
+
+ t.Logf("package testing source file: %s", file)
+ t.Fatalf("failed to load Go runtime support: %s\n%s", firstLine, got)
+ }
+
+ // Extract named BEGIN...END blocks from output
+ partRe := regexp.MustCompile(`(?ms)^BEGIN ([^\n]*)\n(.*?)\nEND`)
+ blocks := map[string]string{}
+ for _, subs := range partRe.FindAllSubmatch(got, -1) {
+ blocks[string(subs[1])] = string(subs[2])
+ }
+
+ infoGoroutinesRe := regexp.MustCompile(`\*\s+\d+\s+running\s+`)
+ if bl := blocks["info goroutines"]; !infoGoroutinesRe.MatchString(bl) {
+ t.Fatalf("info goroutines failed: %s", bl)
+ }
+
+ printMapvarRe1 := regexp.MustCompile(`^\$[0-9]+ = map\[string\]string = {\[(0x[0-9a-f]+\s+)?"abc"\] = (0x[0-9a-f]+\s+)?"def", \[(0x[0-9a-f]+\s+)?"ghi"\] = (0x[0-9a-f]+\s+)?"jkl"}$`)
+ printMapvarRe2 := regexp.MustCompile(`^\$[0-9]+ = map\[string\]string = {\[(0x[0-9a-f]+\s+)?"ghi"\] = (0x[0-9a-f]+\s+)?"jkl", \[(0x[0-9a-f]+\s+)?"abc"\] = (0x[0-9a-f]+\s+)?"def"}$`)
+ if bl := blocks["print mapvar"]; !printMapvarRe1.MatchString(bl) &&
+ !printMapvarRe2.MatchString(bl) {
+ t.Fatalf("print mapvar failed: %s", bl)
+ }
+
+ // 2 orders, and possible differences in spacing.
+ sliceMapSfx1 := `map[string][]string = {["e"] = []string = {"f", "g", "h"}, ["a"] = []string = {"b", "c", "d"}}`
+ sliceMapSfx2 := `map[string][]string = {["a"] = []string = {"b", "c", "d"}, ["e"] = []string = {"f", "g", "h"}}`
+ if bl := strings.ReplaceAll(blocks["print slicemap"], " ", " "); !strings.HasSuffix(bl, sliceMapSfx1) && !strings.HasSuffix(bl, sliceMapSfx2) {
+ t.Fatalf("print slicemap failed: %s", bl)
+ }
+
+ chanIntSfx := `chan int = {99, 11}`
+ if bl := strings.ReplaceAll(blocks["print chanint"], " ", " "); !strings.HasSuffix(bl, chanIntSfx) {
+ t.Fatalf("print chanint failed: %s", bl)
+ }
+
+ chanStrSfx := `chan string = {"spongepants", "squarebob"}`
+ if bl := strings.ReplaceAll(blocks["print chanstr"], " ", " "); !strings.HasSuffix(bl, chanStrSfx) {
+ t.Fatalf("print chanstr failed: %s", bl)
+ }
+
+ strVarRe := regexp.MustCompile(`^\$[0-9]+ = (0x[0-9a-f]+\s+)?"abc"$`)
+ if bl := blocks["print strvar"]; !strVarRe.MatchString(bl) {
+ t.Fatalf("print strvar failed: %s", bl)
+ }
+
+ // The exact format of composite values has changed over time.
+ // For issue 16338: ssa decompose phase split a slice into
+ // a collection of scalar vars holding its fields. In such cases
+ // the DWARF variable location expression should be of the
+ // form "var.field" and not just "field".
+ // However, the newer dwarf location list code reconstituted
+ // aggregates from their fields and reverted their printing
+ // back to its original form.
+ // Only test that all variables are listed in 'info locals' since
+ // different versions of gdb print variables in different
+ // order and with differing amount of information and formats.
+
+ if bl := blocks["info locals"]; !strings.Contains(bl, "slicevar") ||
+ !strings.Contains(bl, "mapvar") ||
+ !strings.Contains(bl, "strvar") {
+ t.Fatalf("info locals failed: %s", bl)
+ }
+
+ // Check that the backtraces are well formed.
+ checkCleanBacktrace(t, blocks["goroutine 1 bt"])
+ checkCleanBacktrace(t, blocks["goroutine 1 bt at the end"])
+
+ btGoroutine1Re := regexp.MustCompile(`(?m)^#0\s+(0x[0-9a-f]+\s+in\s+)?main\.main.+at`)
+ if bl := blocks["goroutine 1 bt"]; !btGoroutine1Re.MatchString(bl) {
+ t.Fatalf("goroutine 1 bt failed: %s", bl)
+ }
+
+ if bl := blocks["goroutine all bt"]; !btGoroutine1Re.MatchString(bl) {
+ t.Fatalf("goroutine all bt failed: %s", bl)
+ }
+
+ btGoroutine1AtTheEndRe := regexp.MustCompile(`(?m)^#0\s+(0x[0-9a-f]+\s+in\s+)?main\.main.+at`)
+ if bl := blocks["goroutine 1 bt at the end"]; !btGoroutine1AtTheEndRe.MatchString(bl) {
+ t.Fatalf("goroutine 1 bt at the end failed: %s", bl)
+ }
+}
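
The BEGIN/END markers echoed above are what make the gdb output machine-checkable. A standalone sketch of the extraction step, using the same regular expression on abbreviated, made-up gdb output:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	out := []byte("BEGIN info goroutines\n* 1 running  runtime.gopark\nEND\n" +
		"BEGIN print strvar\n$1 = \"abc\"\nEND\n")
	// Each "echo BEGIN <name>" / "echo END" pair brackets one command's output.
	partRe := regexp.MustCompile(`(?ms)^BEGIN ([^\n]*)\n(.*?)\nEND`)
	blocks := map[string]string{}
	for _, subs := range partRe.FindAllSubmatch(out, -1) {
		blocks[string(subs[1])] = string(subs[2])
	}
	fmt.Printf("%q\n", blocks["info goroutines"]) // "* 1 running  runtime.gopark"
	fmt.Printf("%q\n", blocks["print strvar"])    // "$1 = \"abc\""
}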
+
+const backtraceSource = `
+package main
+
+//go:noinline
+func aaa() bool { return bbb() }
+
+//go:noinline
+func bbb() bool { return ccc() }
+
+//go:noinline
+func ccc() bool { return ddd() }
+
+//go:noinline
+func ddd() bool { return f() }
+
+//go:noinline
+func eee() bool { return true }
+
+var f = eee
+
+func main() {
+ _ = aaa()
+}
+`
+
+// TestGdbBacktrace tests that gdb can unwind the stack correctly
+// using only the DWARF debug info.
+func TestGdbBacktrace(t *testing.T) {
+ if runtime.GOOS == "netbsd" {
+ testenv.SkipFlaky(t, 15603)
+ }
+
+ checkGdbEnvironment(t)
+ t.Parallel()
+ checkGdbVersion(t)
+
+ dir, err := os.MkdirTemp("", "go-build")
+ if err != nil {
+ t.Fatalf("failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(dir)
+
+ // Build the source code.
+ src := filepath.Join(dir, "main.go")
+ err = os.WriteFile(src, []byte(backtraceSource), 0644)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", "a.exe", "main.go")
+ cmd.Dir = dir
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ // Execute gdb commands.
+ args := []string{"-nx", "-batch",
+ "-iex", "add-auto-load-safe-path " + filepath.Join(runtime.GOROOT(), "src", "runtime"),
+ "-ex", "set startup-with-shell off",
+ "-ex", "break main.eee",
+ "-ex", "run",
+ "-ex", "backtrace",
+ "-ex", "continue",
+ filepath.Join(dir, "a.exe"),
+ }
+ got, err := exec.Command("gdb", args...).CombinedOutput()
+ t.Logf("gdb output:\n%s", got)
+ if err != nil {
+ t.Fatalf("gdb exited with error: %v", err)
+ }
+
+ // Check that the backtrace matches the source code.
+ bt := []string{
+ "eee",
+ "ddd",
+ "ccc",
+ "bbb",
+ "aaa",
+ "main",
+ }
+ for i, name := range bt {
+ s := fmt.Sprintf("#%v.*main\\.%v", i, name)
+ re := regexp.MustCompile(s)
+ if found := re.Find(got) != nil; !found {
+ t.Fatalf("could not find '%v' in backtrace", s)
+ }
+ }
+}
+
+const autotmpTypeSource = `
+package main
+
+type astruct struct {
+ a, b int
+}
+
+func main() {
+ var iface interface{} = map[string]astruct{}
+ var iface2 interface{} = []astruct{}
+ println(iface, iface2)
+}
+`
+
+// TestGdbAutotmpTypes ensures that types of autotmp variables appear in .debug_info
+// See bug #17830.
+func TestGdbAutotmpTypes(t *testing.T) {
+ checkGdbEnvironment(t)
+ t.Parallel()
+ checkGdbVersion(t)
+
+ if runtime.GOOS == "aix" && testing.Short() {
+ t.Skip("TestGdbAutotmpTypes is too slow on aix/ppc64")
+ }
+
+ dir, err := os.MkdirTemp("", "go-build")
+ if err != nil {
+ t.Fatalf("failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(dir)
+
+ // Build the source code.
+ src := filepath.Join(dir, "main.go")
+ err = os.WriteFile(src, []byte(autotmpTypeSource), 0644)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-gcflags=all=-N -l", "-o", "a.exe", "main.go")
+ cmd.Dir = dir
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ // Execute gdb commands.
+ args := []string{"-nx", "-batch",
+ "-iex", "add-auto-load-safe-path " + filepath.Join(runtime.GOROOT(), "src", "runtime"),
+ "-ex", "set startup-with-shell off",
+ "-ex", "break main.main",
+ "-ex", "run",
+ "-ex", "step",
+ "-ex", "info types astruct",
+ filepath.Join(dir, "a.exe"),
+ }
+ got, err := exec.Command("gdb", args...).CombinedOutput()
+ t.Logf("gdb output:\n%s", got)
+ if err != nil {
+ t.Fatalf("gdb exited with error: %v", err)
+ }
+
+ sgot := string(got)
+
+	// Check that the expected types appear in the output of 'info types astruct'.
+ types := []string{
+ "[]main.astruct;",
+ "bucket<string,main.astruct>;",
+ "hash<string,main.astruct>;",
+ "main.astruct;",
+ "hash<string,main.astruct> * map[string]main.astruct;",
+ }
+ for _, name := range types {
+ if !strings.Contains(sgot, name) {
+			t.Fatalf("could not find %s in 'info types astruct' output", name)
+ }
+ }
+}
+
+const constsSource = `
+package main
+
+const aConstant int = 42
+const largeConstant uint64 = ^uint64(0)
+const minusOne int64 = -1
+
+func main() {
+ println("hello world")
+}
+`
+
+func TestGdbConst(t *testing.T) {
+ checkGdbEnvironment(t)
+ t.Parallel()
+ checkGdbVersion(t)
+
+ dir, err := os.MkdirTemp("", "go-build")
+ if err != nil {
+ t.Fatalf("failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(dir)
+
+ // Build the source code.
+ src := filepath.Join(dir, "main.go")
+ err = os.WriteFile(src, []byte(constsSource), 0644)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-gcflags=all=-N -l", "-o", "a.exe", "main.go")
+ cmd.Dir = dir
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ // Execute gdb commands.
+ args := []string{"-nx", "-batch",
+ "-iex", "add-auto-load-safe-path " + filepath.Join(runtime.GOROOT(), "src", "runtime"),
+ "-ex", "set startup-with-shell off",
+ "-ex", "break main.main",
+ "-ex", "run",
+ "-ex", "print main.aConstant",
+ "-ex", "print main.largeConstant",
+ "-ex", "print main.minusOne",
+ "-ex", "print 'runtime.mSpanInUse'",
+ "-ex", "print 'runtime._PageSize'",
+ filepath.Join(dir, "a.exe"),
+ }
+ got, err := exec.Command("gdb", args...).CombinedOutput()
+ t.Logf("gdb output:\n%s", got)
+ if err != nil {
+ t.Fatalf("gdb exited with error: %v", err)
+ }
+
+ sgot := strings.ReplaceAll(string(got), "\r\n", "\n")
+
+ if !strings.Contains(sgot, "\n$1 = 42\n$2 = 18446744073709551615\n$3 = -1\n$4 = 1 '\\001'\n$5 = 8192") {
+ t.Fatalf("output mismatch")
+ }
+}
+
+const panicSource = `
+package main
+
+import "runtime/debug"
+
+func main() {
+ debug.SetTraceback("crash")
+ crash()
+}
+
+func crash() {
+ panic("panic!")
+}
+`
+
+// TestGdbPanic tests that gdb can unwind the stack correctly
+// from SIGABRTs from Go panics.
+func TestGdbPanic(t *testing.T) {
+ checkGdbEnvironment(t)
+ t.Parallel()
+ checkGdbVersion(t)
+
+ dir, err := os.MkdirTemp("", "go-build")
+ if err != nil {
+ t.Fatalf("failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(dir)
+
+ // Build the source code.
+ src := filepath.Join(dir, "main.go")
+ err = os.WriteFile(src, []byte(panicSource), 0644)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", "a.exe", "main.go")
+ cmd.Dir = dir
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ // Execute gdb commands.
+ args := []string{"-nx", "-batch",
+ "-iex", "add-auto-load-safe-path " + filepath.Join(runtime.GOROOT(), "src", "runtime"),
+ "-ex", "set startup-with-shell off",
+ "-ex", "run",
+ "-ex", "backtrace",
+ filepath.Join(dir, "a.exe"),
+ }
+ got, err := exec.Command("gdb", args...).CombinedOutput()
+ t.Logf("gdb output:\n%s", got)
+ if err != nil {
+ t.Fatalf("gdb exited with error: %v", err)
+ }
+
+ // Check that the backtrace matches the source code.
+ bt := []string{
+ `crash`,
+ `main`,
+ }
+ for _, name := range bt {
+ s := fmt.Sprintf("(#.* .* in )?main\\.%v", name)
+ re := regexp.MustCompile(s)
+ if found := re.Find(got) != nil; !found {
+ t.Fatalf("could not find '%v' in backtrace", s)
+ }
+ }
+}
+
+const InfCallstackSource = `
+package main
+import "C"
+import "time"
+
+func loop() {
+ for i := 0; i < 1000; i++ {
+ time.Sleep(time.Millisecond*5)
+ }
+}
+
+func main() {
+ go loop()
+ time.Sleep(time.Second * 1)
+}
+`
+
+// TestGdbInfCallstack tests that gdb can unwind the callstack of cgo programs
+// on arm64 platforms without endless frames of function 'crosscall1'.
+// https://golang.org/issue/37238
+func TestGdbInfCallstack(t *testing.T) {
+ checkGdbEnvironment(t)
+
+ testenv.MustHaveCGO(t)
+ if runtime.GOARCH != "arm64" {
+ t.Skip("skipping infinite callstack test on non-arm64 arches")
+ }
+
+ t.Parallel()
+ checkGdbVersion(t)
+
+ dir, err := os.MkdirTemp("", "go-build")
+ if err != nil {
+ t.Fatalf("failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(dir)
+
+ // Build the source code.
+ src := filepath.Join(dir, "main.go")
+ err = os.WriteFile(src, []byte(InfCallstackSource), 0644)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", "a.exe", "main.go")
+ cmd.Dir = dir
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ // Execute gdb commands.
+ // 'setg_gcc' is the first point where we can reproduce the issue with just one 'run' command.
+ args := []string{"-nx", "-batch",
+ "-iex", "add-auto-load-safe-path " + filepath.Join(runtime.GOROOT(), "src", "runtime"),
+ "-ex", "set startup-with-shell off",
+ "-ex", "break setg_gcc",
+ "-ex", "run",
+ "-ex", "backtrace 3",
+ "-ex", "disable 1",
+ "-ex", "continue",
+ filepath.Join(dir, "a.exe"),
+ }
+ got, err := exec.Command("gdb", args...).CombinedOutput()
+ t.Logf("gdb output:\n%s", got)
+ if err != nil {
+ t.Fatalf("gdb exited with error: %v", err)
+ }
+
+	// Check that the backtrace matches the expected frames.
+	// We check only the three innermost frames, since those are certainly
+	// present according to gcc_<OS>_arm64.c.
+ bt := []string{
+ `setg_gcc`,
+ `crosscall1`,
+ `threadentry`,
+ }
+ for i, name := range bt {
+ s := fmt.Sprintf("#%v.*%v", i, name)
+ re := regexp.MustCompile(s)
+ if found := re.Find(got) != nil; !found {
+ t.Fatalf("could not find '%v' in backtrace", s)
+ }
+ }
+}
diff --git a/src/runtime/runtime-lldb_test.go b/src/runtime/runtime-lldb_test.go
new file mode 100644
index 0000000..c923b87
--- /dev/null
+++ b/src/runtime/runtime-lldb_test.go
@@ -0,0 +1,189 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "runtime"
+ "strings"
+ "testing"
+)
+
+var lldbPath string
+
+func checkLldbPython(t *testing.T) {
+ cmd := exec.Command("lldb", "-P")
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Skipf("skipping due to issue running lldb: %v\n%s", err, out)
+ }
+ lldbPath = strings.TrimSpace(string(out))
+
+ cmd = exec.Command("/usr/bin/python2.7", "-c", "import sys;sys.path.append(sys.argv[1]);import lldb; print('go lldb python support')", lldbPath)
+ out, err = cmd.CombinedOutput()
+
+ if err != nil {
+ t.Skipf("skipping due to issue running python: %v\n%s", err, out)
+ }
+ if string(out) != "go lldb python support\n" {
+ t.Skipf("skipping due to lack of python lldb support: %s", out)
+ }
+
+ if runtime.GOOS == "darwin" {
+ // Try to see if we have debugging permissions.
+ cmd = exec.Command("/usr/sbin/DevToolsSecurity", "-status")
+ out, err = cmd.CombinedOutput()
+ if err != nil {
+ t.Skipf("DevToolsSecurity failed: %v", err)
+ } else if !strings.Contains(string(out), "enabled") {
+ t.Skip(string(out))
+ }
+ cmd = exec.Command("/usr/bin/groups")
+ out, err = cmd.CombinedOutput()
+ if err != nil {
+ t.Skipf("groups failed: %v", err)
+ } else if !strings.Contains(string(out), "_developer") {
+ t.Skip("Not in _developer group")
+ }
+ }
+}
+
+const lldbHelloSource = `
+package main
+import "fmt"
+func main() {
+	mapvar := make(map[string]string, 5)
+ mapvar["abc"] = "def"
+ mapvar["ghi"] = "jkl"
+ intvar := 42
+ ptrvar := &intvar
+ fmt.Println("hi") // line 10
+ _ = ptrvar
+}
+`
+
+const lldbScriptSource = `
+import sys
+sys.path.append(sys.argv[1])
+import lldb
+import os
+
+TIMEOUT_SECS = 5
+
+debugger = lldb.SBDebugger.Create()
+debugger.SetAsync(True)
+target = debugger.CreateTargetWithFileAndArch("a.exe", None)
+if target:
+ print "Created target"
+ main_bp = target.BreakpointCreateByLocation("main.go", 10)
+ if main_bp:
+ print "Created breakpoint"
+ process = target.LaunchSimple(None, None, os.getcwd())
+ if process:
+ print "Process launched"
+ listener = debugger.GetListener()
+ process.broadcaster.AddListener(listener, lldb.SBProcess.eBroadcastBitStateChanged)
+ while True:
+ event = lldb.SBEvent()
+ if listener.WaitForEvent(TIMEOUT_SECS, event):
+ if lldb.SBProcess.GetRestartedFromEvent(event):
+ continue
+ state = process.GetState()
+ if state in [lldb.eStateUnloaded, lldb.eStateLaunching, lldb.eStateRunning]:
+ continue
+ else:
+ print "Timeout launching"
+ break
+ if state == lldb.eStateStopped:
+ for t in process.threads:
+ if t.GetStopReason() == lldb.eStopReasonBreakpoint:
+ print "Hit breakpoint"
+ frame = t.GetFrameAtIndex(0)
+ if frame:
+ if frame.line_entry:
+ print "Stopped at %s:%d" % (frame.line_entry.file.basename, frame.line_entry.line)
+ if frame.function:
+ print "Stopped in %s" % (frame.function.name,)
+ var = frame.FindVariable('intvar')
+ if var:
+ print "intvar = %s" % (var.GetValue(),)
+ else:
+ print "no intvar"
+ else:
+ print "Process state", state
+ process.Destroy()
+else:
+ print "Failed to create target a.exe"
+
+lldb.SBDebugger.Destroy(debugger)
+sys.exit()
+`
+
+const expectedLldbOutput = `Created target
+Created breakpoint
+Process launched
+Hit breakpoint
+Stopped at main.go:10
+Stopped in main.main
+intvar = 42
+`
+
+func TestLldbPython(t *testing.T) {
+ testenv.MustHaveGoBuild(t)
+ if final := os.Getenv("GOROOT_FINAL"); final != "" && runtime.GOROOT() != final {
+		t.Skip("lldb test can fail with GOROOT_FINAL pending")
+ }
+ testenv.SkipFlaky(t, 31188)
+
+ checkLldbPython(t)
+
+ dir, err := os.MkdirTemp("", "go-build")
+ if err != nil {
+ t.Fatalf("failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(dir)
+
+ src := filepath.Join(dir, "main.go")
+ err = os.WriteFile(src, []byte(lldbHelloSource), 0644)
+ if err != nil {
+ t.Fatalf("failed to create src file: %v", err)
+ }
+
+ mod := filepath.Join(dir, "go.mod")
+ err = os.WriteFile(mod, []byte("module lldbtest"), 0644)
+ if err != nil {
+ t.Fatalf("failed to create mod file: %v", err)
+ }
+
+ // As of 2018-07-17, lldb doesn't support compressed DWARF, so
+ // disable it for this test.
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-gcflags=all=-N -l", "-ldflags=-compressdwarf=false", "-o", "a.exe")
+ cmd.Dir = dir
+ cmd.Env = append(os.Environ(), "GOPATH=") // issue 31100
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ src = filepath.Join(dir, "script.py")
+ err = os.WriteFile(src, []byte(lldbScriptSource), 0755)
+ if err != nil {
+ t.Fatalf("failed to create script: %v", err)
+ }
+
+ cmd = exec.Command("/usr/bin/python2.7", "script.py", lldbPath)
+ cmd.Dir = dir
+ got, _ := cmd.CombinedOutput()
+
+ if string(got) != expectedLldbOutput {
+ if strings.Contains(string(got), "Timeout launching") {
+ t.Skip("Timeout launching")
+ }
+ t.Fatalf("Unexpected lldb output:\n%s", got)
+ }
+}
diff --git a/src/runtime/runtime.go b/src/runtime/runtime.go
new file mode 100644
index 0000000..33ecc26
--- /dev/null
+++ b/src/runtime/runtime.go
@@ -0,0 +1,65 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ _ "unsafe" // for go:linkname
+)
+
+//go:generate go run wincallback.go
+//go:generate go run mkduff.go
+//go:generate go run mkfastlog2table.go
+
+var ticks struct {
+ lock mutex
+ pad uint32 // ensure 8-byte alignment of val on 386
+ val uint64
+}
+
+// Note: Called by runtime/pprof in addition to runtime code.
+func tickspersecond() int64 {
+ r := int64(atomic.Load64(&ticks.val))
+ if r != 0 {
+ return r
+ }
+ lock(&ticks.lock)
+ r = int64(ticks.val)
+ if r == 0 {
+ t0 := nanotime()
+ c0 := cputicks()
+ usleep(100 * 1000)
+ t1 := nanotime()
+ c1 := cputicks()
+ if t1 == t0 {
+ t1++
+ }
+ r = (c1 - c0) * 1000 * 1000 * 1000 / (t1 - t0)
+ if r == 0 {
+ r++
+ }
+ atomic.Store64(&ticks.val, uint64(r))
+ }
+ unlock(&ticks.lock)
+ return r
+}
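
The calibration above boils down to sampling a wall clock and a cycle counter roughly 100ms apart and scaling. A standalone sketch of the same arithmetic with made-up readings (the real code must use nanotime/cputicks under the ticks lock):

package main

import "fmt"

func main() {
	// Hypothetical samples: nanotime() and cputicks() read ~100ms apart.
	t0, c0 := int64(1_000_000_000), int64(5_000_000_000)
	t1, c1 := int64(1_100_000_000), int64(5_300_000_000)
	if t1 == t0 {
		t1++ // same guard as above: avoid dividing by zero
	}
	r := (c1 - c0) * 1000 * 1000 * 1000 / (t1 - t0)
	fmt.Println(r) // 3000000000 ticks per second for these samples
}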
+
+var envs []string
+var argslice []string
+
+//go:linkname syscall_runtime_envs syscall.runtime_envs
+func syscall_runtime_envs() []string { return append([]string{}, envs...) }
+
+//go:linkname syscall_Getpagesize syscall.Getpagesize
+func syscall_Getpagesize() int { return int(physPageSize) }
+
+//go:linkname os_runtime_args os.runtime_args
+func os_runtime_args() []string { return append([]string{}, argslice...) }
+
+//go:linkname syscall_Exit syscall.Exit
+//go:nosplit
+func syscall_Exit(code int) {
+ exit(int32(code))
+}
diff --git a/src/runtime/runtime1.go b/src/runtime/runtime1.go
new file mode 100644
index 0000000..30b7044
--- /dev/null
+++ b/src/runtime/runtime1.go
@@ -0,0 +1,544 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/bytealg"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// Keep a cached value to make gotraceback fast,
+// since we call it on every call to gentraceback.
+// The cached value is a uint32 in which the low bits
+// are the "crash" and "all" settings and the remaining
+// bits are the traceback value (0 off, 1 on, 2 include system).
+const (
+ tracebackCrash = 1 << iota
+ tracebackAll
+ tracebackShift = iota
+)
+
+var traceback_cache uint32 = 2 << tracebackShift
+var traceback_env uint32
+
+// gotraceback returns the current traceback settings.
+//
+// If level is 0, suppress all tracebacks.
+// If level is 1, show tracebacks, but exclude runtime frames.
+// If level is 2, show tracebacks including runtime frames.
+// If all is set, print all goroutine stacks. Otherwise, print just the current goroutine.
+// If crash is set, crash (core dump, etc) after tracebacking.
+//
+//go:nosplit
+func gotraceback() (level int32, all, crash bool) {
+ _g_ := getg()
+ t := atomic.Load(&traceback_cache)
+ crash = t&tracebackCrash != 0
+ all = _g_.m.throwing > 0 || t&tracebackAll != 0
+ if _g_.m.traceback != 0 {
+ level = int32(_g_.m.traceback)
+ } else {
+ level = int32(t >> tracebackShift)
+ }
+ return
+}
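
A standalone sketch of how the cached word packs the level and the two flag bits, using the same constants; GOTRACEBACK=crash is assumed as the example setting:

package main

import "fmt"

const (
	tracebackCrash = 1 << iota // same encoding as the constants above
	tracebackAll
	tracebackShift = iota
)

func main() {
	// GOTRACEBACK=crash is stored as level 2 with both flag bits set.
	t := uint32(2)<<tracebackShift | tracebackAll | tracebackCrash
	fmt.Println("level:", t>>tracebackShift)     // 2
	fmt.Println("all:  ", t&tracebackAll != 0)   // true
	fmt.Println("crash:", t&tracebackCrash != 0) // true
}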
+
+var (
+ argc int32
+ argv **byte
+)
+
+// nosplit for use in linux startup sysargs
+//go:nosplit
+func argv_index(argv **byte, i int32) *byte {
+ return *(**byte)(add(unsafe.Pointer(argv), uintptr(i)*sys.PtrSize))
+}
+
+func args(c int32, v **byte) {
+ argc = c
+ argv = v
+ sysargs(c, v)
+}
+
+func goargs() {
+ if GOOS == "windows" {
+ return
+ }
+ argslice = make([]string, argc)
+ for i := int32(0); i < argc; i++ {
+ argslice[i] = gostringnocopy(argv_index(argv, i))
+ }
+}
+
+func goenvs_unix() {
+ // TODO(austin): ppc64 in dynamic linking mode doesn't
+ // guarantee env[] will immediately follow argv. Might cause
+ // problems.
+ n := int32(0)
+ for argv_index(argv, argc+1+n) != nil {
+ n++
+ }
+
+ envs = make([]string, n)
+ for i := int32(0); i < n; i++ {
+ envs[i] = gostring(argv_index(argv, argc+1+i))
+ }
+}
+
+func environ() []string {
+ return envs
+}
+
+// TODO: These should be locals in testAtomic64, but we don't 8-byte
+// align stack variables on 386.
+var test_z64, test_x64 uint64
+
+func testAtomic64() {
+ test_z64 = 42
+ test_x64 = 0
+ if atomic.Cas64(&test_z64, test_x64, 1) {
+ throw("cas64 failed")
+ }
+ if test_x64 != 0 {
+ throw("cas64 failed")
+ }
+ test_x64 = 42
+ if !atomic.Cas64(&test_z64, test_x64, 1) {
+ throw("cas64 failed")
+ }
+ if test_x64 != 42 || test_z64 != 1 {
+ throw("cas64 failed")
+ }
+ if atomic.Load64(&test_z64) != 1 {
+ throw("load64 failed")
+ }
+ atomic.Store64(&test_z64, (1<<40)+1)
+ if atomic.Load64(&test_z64) != (1<<40)+1 {
+ throw("store64 failed")
+ }
+ if atomic.Xadd64(&test_z64, (1<<40)+1) != (2<<40)+2 {
+ throw("xadd64 failed")
+ }
+ if atomic.Load64(&test_z64) != (2<<40)+2 {
+ throw("xadd64 failed")
+ }
+ if atomic.Xchg64(&test_z64, (3<<40)+3) != (2<<40)+2 {
+ throw("xchg64 failed")
+ }
+ if atomic.Load64(&test_z64) != (3<<40)+3 {
+ throw("xchg64 failed")
+ }
+}
+
+func check() {
+ var (
+ a int8
+ b uint8
+ c int16
+ d uint16
+ e int32
+ f uint32
+ g int64
+ h uint64
+ i, i1 float32
+ j, j1 float64
+ k unsafe.Pointer
+ l *uint16
+ m [4]byte
+ )
+ type x1t struct {
+ x uint8
+ }
+ type y1t struct {
+ x1 x1t
+ y uint8
+ }
+ var x1 x1t
+ var y1 y1t
+
+ if unsafe.Sizeof(a) != 1 {
+ throw("bad a")
+ }
+ if unsafe.Sizeof(b) != 1 {
+ throw("bad b")
+ }
+ if unsafe.Sizeof(c) != 2 {
+ throw("bad c")
+ }
+ if unsafe.Sizeof(d) != 2 {
+ throw("bad d")
+ }
+ if unsafe.Sizeof(e) != 4 {
+ throw("bad e")
+ }
+ if unsafe.Sizeof(f) != 4 {
+ throw("bad f")
+ }
+ if unsafe.Sizeof(g) != 8 {
+ throw("bad g")
+ }
+ if unsafe.Sizeof(h) != 8 {
+ throw("bad h")
+ }
+ if unsafe.Sizeof(i) != 4 {
+ throw("bad i")
+ }
+ if unsafe.Sizeof(j) != 8 {
+ throw("bad j")
+ }
+ if unsafe.Sizeof(k) != sys.PtrSize {
+ throw("bad k")
+ }
+ if unsafe.Sizeof(l) != sys.PtrSize {
+ throw("bad l")
+ }
+ if unsafe.Sizeof(x1) != 1 {
+ throw("bad unsafe.Sizeof x1")
+ }
+ if unsafe.Offsetof(y1.y) != 1 {
+ throw("bad offsetof y1.y")
+ }
+ if unsafe.Sizeof(y1) != 2 {
+ throw("bad unsafe.Sizeof y1")
+ }
+
+ if timediv(12345*1000000000+54321, 1000000000, &e) != 12345 || e != 54321 {
+ throw("bad timediv")
+ }
+
+ var z uint32
+ z = 1
+ if !atomic.Cas(&z, 1, 2) {
+ throw("cas1")
+ }
+ if z != 2 {
+ throw("cas2")
+ }
+
+ z = 4
+ if atomic.Cas(&z, 5, 6) {
+ throw("cas3")
+ }
+ if z != 4 {
+ throw("cas4")
+ }
+
+ z = 0xffffffff
+ if !atomic.Cas(&z, 0xffffffff, 0xfffffffe) {
+ throw("cas5")
+ }
+ if z != 0xfffffffe {
+ throw("cas6")
+ }
+
+ m = [4]byte{1, 1, 1, 1}
+ atomic.Or8(&m[1], 0xf0)
+ if m[0] != 1 || m[1] != 0xf1 || m[2] != 1 || m[3] != 1 {
+ throw("atomicor8")
+ }
+
+ m = [4]byte{0xff, 0xff, 0xff, 0xff}
+ atomic.And8(&m[1], 0x1)
+ if m[0] != 0xff || m[1] != 0x1 || m[2] != 0xff || m[3] != 0xff {
+ throw("atomicand8")
+ }
+
+ *(*uint64)(unsafe.Pointer(&j)) = ^uint64(0)
+ if j == j {
+ throw("float64nan")
+ }
+ if !(j != j) {
+ throw("float64nan1")
+ }
+
+ *(*uint64)(unsafe.Pointer(&j1)) = ^uint64(1)
+ if j == j1 {
+ throw("float64nan2")
+ }
+ if !(j != j1) {
+ throw("float64nan3")
+ }
+
+ *(*uint32)(unsafe.Pointer(&i)) = ^uint32(0)
+ if i == i {
+ throw("float32nan")
+ }
+	if !(i != i) {
+ throw("float32nan1")
+ }
+
+ *(*uint32)(unsafe.Pointer(&i1)) = ^uint32(1)
+ if i == i1 {
+ throw("float32nan2")
+ }
+	if !(i != i1) {
+ throw("float32nan3")
+ }
+
+ testAtomic64()
+
+ if _FixedStack != round2(_FixedStack) {
+ throw("FixedStack is not power-of-2")
+ }
+
+ if !checkASM() {
+ throw("assembly checks failed")
+ }
+}
+
+type dbgVar struct {
+ name string
+ value *int32
+}
+
+// Holds variables parsed from GODEBUG env var,
+// except for "memprofilerate" since there is an
+// existing int var for that value, which may
+// already have an initial value.
+var debug struct {
+ cgocheck int32
+ clobberfree int32
+ efence int32
+ gccheckmark int32
+ gcpacertrace int32
+ gcshrinkstackoff int32
+ gcstoptheworld int32
+ gctrace int32
+ invalidptr int32
+ madvdontneed int32 // for Linux; issue 28466
+ scavenge int32
+ scavtrace int32
+ scheddetail int32
+ schedtrace int32
+ tracebackancestors int32
+ asyncpreemptoff int32
+
+ // debug.malloc is used as a combined debug check
+ // in the malloc function and should be set
+ // if any of the below debug options is != 0.
+ malloc bool
+ allocfreetrace int32
+ inittrace int32
+ sbrk int32
+}
+
+var dbgvars = []dbgVar{
+ {"allocfreetrace", &debug.allocfreetrace},
+ {"clobberfree", &debug.clobberfree},
+ {"cgocheck", &debug.cgocheck},
+ {"efence", &debug.efence},
+ {"gccheckmark", &debug.gccheckmark},
+ {"gcpacertrace", &debug.gcpacertrace},
+ {"gcshrinkstackoff", &debug.gcshrinkstackoff},
+ {"gcstoptheworld", &debug.gcstoptheworld},
+ {"gctrace", &debug.gctrace},
+ {"invalidptr", &debug.invalidptr},
+ {"madvdontneed", &debug.madvdontneed},
+ {"sbrk", &debug.sbrk},
+ {"scavenge", &debug.scavenge},
+ {"scavtrace", &debug.scavtrace},
+ {"scheddetail", &debug.scheddetail},
+ {"schedtrace", &debug.schedtrace},
+ {"tracebackancestors", &debug.tracebackancestors},
+ {"asyncpreemptoff", &debug.asyncpreemptoff},
+ {"inittrace", &debug.inittrace},
+}
+
+func parsedebugvars() {
+ // defaults
+ debug.cgocheck = 1
+ debug.invalidptr = 1
+ if GOOS == "linux" {
+ // On Linux, MADV_FREE is faster than MADV_DONTNEED,
+ // but doesn't affect many of the statistics that
+ // MADV_DONTNEED does until the memory is actually
+ // reclaimed. This generally leads to poor user
+ // experience, like confusing stats in top and other
+ // monitoring tools; and bad integration with
+ // management systems that respond to memory usage.
+ // Hence, default to MADV_DONTNEED.
+ debug.madvdontneed = 1
+ }
+
+ for p := gogetenv("GODEBUG"); p != ""; {
+ field := ""
+ i := bytealg.IndexByteString(p, ',')
+ if i < 0 {
+ field, p = p, ""
+ } else {
+ field, p = p[:i], p[i+1:]
+ }
+ i = bytealg.IndexByteString(field, '=')
+ if i < 0 {
+ continue
+ }
+ key, value := field[:i], field[i+1:]
+
+ // Update MemProfileRate directly here since it
+ // is int, not int32, and should only be updated
+ // if specified in GODEBUG.
+ if key == "memprofilerate" {
+ if n, ok := atoi(value); ok {
+ MemProfileRate = n
+ }
+ } else {
+ for _, v := range dbgvars {
+ if v.name == key {
+ if n, ok := atoi32(value); ok {
+ *v.value = n
+ }
+ }
+ }
+ }
+ }
+
+ debug.malloc = (debug.allocfreetrace | debug.inittrace | debug.sbrk) != 0
+
+ setTraceback(gogetenv("GOTRACEBACK"))
+ traceback_env = traceback_cache
+}
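
The GODEBUG loop above is an allocation-free split on ',' and '='. A standalone sketch of the equivalent parse using the standard library (which the runtime itself cannot use here):

package main

import (
	"fmt"
	"strings"
)

func main() {
	godebug := "gctrace=1,schedtrace=1000,memprofilerate=4096"
	for _, field := range strings.Split(godebug, ",") {
		kv := strings.SplitN(field, "=", 2)
		if len(kv) != 2 {
			continue // entries without '=' are skipped, as above
		}
		fmt.Printf("%s -> %s\n", kv[0], kv[1])
	}
}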
+
+//go:linkname setTraceback runtime/debug.SetTraceback
+func setTraceback(level string) {
+ var t uint32
+ switch level {
+ case "none":
+ t = 0
+ case "single", "":
+ t = 1 << tracebackShift
+ case "all":
+ t = 1<<tracebackShift | tracebackAll
+ case "system":
+ t = 2<<tracebackShift | tracebackAll
+ case "crash":
+ t = 2<<tracebackShift | tracebackAll | tracebackCrash
+ default:
+ t = tracebackAll
+ if n, ok := atoi(level); ok && n == int(uint32(n)) {
+ t |= uint32(n) << tracebackShift
+ }
+ }
+ // when C owns the process, simply exit'ing the process on fatal errors
+ // and panics is surprising. Be louder and abort instead.
+ if islibrary || isarchive {
+ t |= tracebackCrash
+ }
+
+ t |= traceback_env
+
+ atomic.Store(&traceback_cache, t)
+}
+
+// Poor man's 64-bit division.
+// This is a very special function, do not use it if you are not sure what you are doing.
+// int64 division is lowered into _divv() call on 386, which does not fit into nosplit functions.
+// Handles overflow in a time-specific manner.
+// This keeps us within no-split stack limits on 32-bit processors.
+//go:nosplit
+func timediv(v int64, div int32, rem *int32) int32 {
+ res := int32(0)
+ for bit := 30; bit >= 0; bit-- {
+ if v >= int64(div)<<uint(bit) {
+ v = v - (int64(div) << uint(bit))
+ // Before this for loop, res was 0, thus all these
+ // power of 2 increments are now just bitsets.
+ res |= 1 << uint(bit)
+ }
+ }
+ if v >= int64(div) {
+ if rem != nil {
+ *rem = 0
+ }
+ return 0x7fffffff
+ }
+ if rem != nil {
+ *rem = int32(v)
+ }
+ return res
+}
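
A standalone sketch of the two behaviors the comment describes, checked with ordinary 64-bit division (which timediv avoids so it can stay nosplit on 386):

package main

import "fmt"

func main() {
	v, div := int64(12345)*1000000000+54321, int64(1000000000)
	fmt.Println(v/div, v%div) // 12345 54321, matching the check() test above

	// Quotients that do not fit in 31 bits are saturated by timediv:
	big := int64(1) << 62
	q := big / div
	if q > 0x7fffffff {
		q = 0x7fffffff // what timediv returns (it also zeroes *rem)
	}
	fmt.Println(q) // 2147483647
}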
+
+// Helpers for Go. Must be NOSPLIT, must only call NOSPLIT functions, and must not block.
+
+//go:nosplit
+func acquirem() *m {
+ _g_ := getg()
+ _g_.m.locks++
+ return _g_.m
+}
+
+//go:nosplit
+func releasem(mp *m) {
+ _g_ := getg()
+ mp.locks--
+ if mp.locks == 0 && _g_.preempt {
+ // restore the preemption request in case we've cleared it in newstack
+ _g_.stackguard0 = stackPreempt
+ }
+}
+
+//go:linkname reflect_typelinks reflect.typelinks
+func reflect_typelinks() ([]unsafe.Pointer, [][]int32) {
+ modules := activeModules()
+ sections := []unsafe.Pointer{unsafe.Pointer(modules[0].types)}
+ ret := [][]int32{modules[0].typelinks}
+ for _, md := range modules[1:] {
+ sections = append(sections, unsafe.Pointer(md.types))
+ ret = append(ret, md.typelinks)
+ }
+ return sections, ret
+}
+
+// reflect_resolveNameOff resolves a name offset from a base pointer.
+//go:linkname reflect_resolveNameOff reflect.resolveNameOff
+func reflect_resolveNameOff(ptrInModule unsafe.Pointer, off int32) unsafe.Pointer {
+ return unsafe.Pointer(resolveNameOff(ptrInModule, nameOff(off)).bytes)
+}
+
+// reflect_resolveTypeOff resolves an *rtype offset from a base type.
+//go:linkname reflect_resolveTypeOff reflect.resolveTypeOff
+func reflect_resolveTypeOff(rtype unsafe.Pointer, off int32) unsafe.Pointer {
+ return unsafe.Pointer((*_type)(rtype).typeOff(typeOff(off)))
+}
+
+// reflect_resolveTextOff resolves a function pointer offset from a base type.
+//go:linkname reflect_resolveTextOff reflect.resolveTextOff
+func reflect_resolveTextOff(rtype unsafe.Pointer, off int32) unsafe.Pointer {
+ return (*_type)(rtype).textOff(textOff(off))
+
+}
+
+// reflectlite_resolveNameOff resolves a name offset from a base pointer.
+//go:linkname reflectlite_resolveNameOff internal/reflectlite.resolveNameOff
+func reflectlite_resolveNameOff(ptrInModule unsafe.Pointer, off int32) unsafe.Pointer {
+ return unsafe.Pointer(resolveNameOff(ptrInModule, nameOff(off)).bytes)
+}
+
+// reflectlite_resolveTypeOff resolves an *rtype offset from a base type.
+//go:linkname reflectlite_resolveTypeOff internal/reflectlite.resolveTypeOff
+func reflectlite_resolveTypeOff(rtype unsafe.Pointer, off int32) unsafe.Pointer {
+ return unsafe.Pointer((*_type)(rtype).typeOff(typeOff(off)))
+}
+
+// reflect_addReflectOff adds a pointer to the reflection offset lookup map.
+//go:linkname reflect_addReflectOff reflect.addReflectOff
+func reflect_addReflectOff(ptr unsafe.Pointer) int32 {
+ reflectOffsLock()
+ if reflectOffs.m == nil {
+ reflectOffs.m = make(map[int32]unsafe.Pointer)
+ reflectOffs.minv = make(map[unsafe.Pointer]int32)
+ reflectOffs.next = -1
+ }
+ id, found := reflectOffs.minv[ptr]
+ if !found {
+ id = reflectOffs.next
+ reflectOffs.next-- // use negative offsets as IDs to aid debugging
+ reflectOffs.m[id] = ptr
+ reflectOffs.minv[ptr] = id
+ }
+ reflectOffsUnlock()
+ return id
+}
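
A standalone sketch of the negative-ID scheme used above, with strings standing in for the registered pointers:

package main

import "fmt"

func main() {
	// Offsets handed out at runtime are negative (-1, -2, ...) so they can
	// never collide with positive offsets baked into the binary's metadata.
	m := map[int32]string{}
	minv := map[string]int32{}
	next := int32(-1)
	add := func(p string) int32 {
		if id, ok := minv[p]; ok {
			return id // already registered: reuse the same ID
		}
		id := next
		next--
		m[id] = p
		minv[p] = id
		return id
	}
	fmt.Println(add("a"), add("b"), add("a")) // -1 -2 -1
}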
diff --git a/src/runtime/runtime2.go b/src/runtime/runtime2.go
new file mode 100644
index 0000000..e982532
--- /dev/null
+++ b/src/runtime/runtime2.go
@@ -0,0 +1,1105 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/cpu"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// defined constants
+const (
+ // G status
+ //
+ // Beyond indicating the general state of a G, the G status
+ // acts like a lock on the goroutine's stack (and hence its
+ // ability to execute user code).
+ //
+ // If you add to this list, add to the list
+ // of "okay during garbage collection" status
+ // in mgcmark.go too.
+ //
+ // TODO(austin): The _Gscan bit could be much lighter-weight.
+ // For example, we could choose not to run _Gscanrunnable
+ // goroutines found in the run queue, rather than CAS-looping
+ // until they become _Grunnable. And transitions like
+ // _Gscanwaiting -> _Gscanrunnable are actually okay because
+ // they don't affect stack ownership.
+
+ // _Gidle means this goroutine was just allocated and has not
+ // yet been initialized.
+ _Gidle = iota // 0
+
+ // _Grunnable means this goroutine is on a run queue. It is
+ // not currently executing user code. The stack is not owned.
+ _Grunnable // 1
+
+ // _Grunning means this goroutine may execute user code. The
+ // stack is owned by this goroutine. It is not on a run queue.
+ // It is assigned an M and a P (g.m and g.m.p are valid).
+ _Grunning // 2
+
+ // _Gsyscall means this goroutine is executing a system call.
+ // It is not executing user code. The stack is owned by this
+ // goroutine. It is not on a run queue. It is assigned an M.
+ _Gsyscall // 3
+
+ // _Gwaiting means this goroutine is blocked in the runtime.
+ // It is not executing user code. It is not on a run queue,
+ // but should be recorded somewhere (e.g., a channel wait
+ // queue) so it can be ready()d when necessary. The stack is
+ // not owned *except* that a channel operation may read or
+ // write parts of the stack under the appropriate channel
+ // lock. Otherwise, it is not safe to access the stack after a
+ // goroutine enters _Gwaiting (e.g., it may get moved).
+ _Gwaiting // 4
+
+ // _Gmoribund_unused is currently unused, but hardcoded in gdb
+ // scripts.
+ _Gmoribund_unused // 5
+
+ // _Gdead means this goroutine is currently unused. It may be
+ // just exited, on a free list, or just being initialized. It
+ // is not executing user code. It may or may not have a stack
+ // allocated. The G and its stack (if any) are owned by the M
+ // that is exiting the G or that obtained the G from the free
+ // list.
+ _Gdead // 6
+
+ // _Genqueue_unused is currently unused.
+ _Genqueue_unused // 7
+
+ // _Gcopystack means this goroutine's stack is being moved. It
+ // is not executing user code and is not on a run queue. The
+ // stack is owned by the goroutine that put it in _Gcopystack.
+ _Gcopystack // 8
+
+ // _Gpreempted means this goroutine stopped itself for a
+ // suspendG preemption. It is like _Gwaiting, but nothing is
+ // yet responsible for ready()ing it. Some suspendG must CAS
+ // the status to _Gwaiting to take responsibility for
+ // ready()ing this G.
+ _Gpreempted // 9
+
+ // _Gscan combined with one of the above states other than
+ // _Grunning indicates that GC is scanning the stack. The
+ // goroutine is not executing user code and the stack is owned
+ // by the goroutine that set the _Gscan bit.
+ //
+ // _Gscanrunning is different: it is used to briefly block
+ // state transitions while GC signals the G to scan its own
+ // stack. This is otherwise like _Grunning.
+ //
+ // atomicstatus&~Gscan gives the state the goroutine will
+ // return to when the scan completes.
+ _Gscan = 0x1000
+ _Gscanrunnable = _Gscan + _Grunnable // 0x1001
+ _Gscanrunning = _Gscan + _Grunning // 0x1002
+ _Gscansyscall = _Gscan + _Gsyscall // 0x1003
+ _Gscanwaiting = _Gscan + _Gwaiting // 0x1004
+ _Gscanpreempted = _Gscan + _Gpreempted // 0x1009
+)
+
+const (
+ // P status
+
+ // _Pidle means a P is not being used to run user code or the
+ // scheduler. Typically, it's on the idle P list and available
+ // to the scheduler, but it may just be transitioning between
+ // other states.
+ //
+ // The P is owned by the idle list or by whatever is
+ // transitioning its state. Its run queue is empty.
+ _Pidle = iota
+
+ // _Prunning means a P is owned by an M and is being used to
+ // run user code or the scheduler. Only the M that owns this P
+ // is allowed to change the P's status from _Prunning. The M
+ // may transition the P to _Pidle (if it has no more work to
+ // do), _Psyscall (when entering a syscall), or _Pgcstop (to
+ // halt for the GC). The M may also hand ownership of the P
+ // off directly to another M (e.g., to schedule a locked G).
+ _Prunning
+
+ // _Psyscall means a P is not running user code. It has
+ // affinity to an M in a syscall but is not owned by it and
+ // may be stolen by another M. This is similar to _Pidle but
+ // uses lightweight transitions and maintains M affinity.
+ //
+ // Leaving _Psyscall must be done with a CAS, either to steal
+ // or retake the P. Note that there's an ABA hazard: even if
+ // an M successfully CASes its original P back to _Prunning
+ // after a syscall, it must understand the P may have been
+ // used by another M in the interim.
+ _Psyscall
+
+ // _Pgcstop means a P is halted for STW and owned by the M
+ // that stopped the world. The M that stopped the world
+ // continues to use its P, even in _Pgcstop. Transitioning
+ // from _Prunning to _Pgcstop causes an M to release its P and
+ // park.
+ //
+ // The P retains its run queue and startTheWorld will restart
+ // the scheduler on Ps with non-empty run queues.
+ _Pgcstop
+
+ // _Pdead means a P is no longer used (GOMAXPROCS shrank). We
+ // reuse Ps if GOMAXPROCS increases. A dead P is mostly
+ // stripped of its resources, though a few things remain
+ // (e.g., trace buffers).
+ _Pdead
+)
+
+// Mutual exclusion locks. In the uncontended case,
+// as fast as spin locks (just a few user-level instructions),
+// but on the contention path they sleep in the kernel.
+// A zeroed Mutex is unlocked (no need to initialize each lock).
+// Initialization is helpful for static lock ranking, but not required.
+type mutex struct {
+ // Empty struct if lock ranking is disabled, otherwise includes the lock rank
+ lockRankStruct
+ // Futex-based impl treats it as uint32 key,
+ // while sema-based impl as M* waitm.
+ // Used to be a union, but unions break precise GC.
+ key uintptr
+}
+
+// sleep and wakeup on one-time events.
+// before any calls to notesleep or notewakeup,
+// must call noteclear to initialize the Note.
+// then, exactly one thread can call notesleep
+// and exactly one thread can call notewakeup (once).
+// once notewakeup has been called, the notesleep
+// will return. future notesleep will return immediately.
+// subsequent noteclear must be called only after
+// previous notesleep has returned, e.g. it's disallowed
+// to call noteclear straight after notewakeup.
+//
+// notetsleep is like notesleep but wakes up after
+// a given number of nanoseconds even if the event
+// has not yet happened. if a goroutine uses notetsleep to
+// wake up early, it must wait to call noteclear until it
+// can be sure that no other goroutine is calling
+// notewakeup.
+//
+// notesleep/notetsleep are generally called on g0,
+// notetsleepg is similar to notetsleep but is called on user g.
+type note struct {
+ // Futex-based impl treats it as uint32 key,
+ // while sema-based impl as M* waitm.
+ // Used to be a union, but unions break precise GC.
+ key uintptr
+}
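
As a rough user-level analogue (not the futex/sema implementation), the one-shot sleep/wakeup protocol described above behaves like a buffered channel used exactly once:

package main

import "fmt"

func main() {
	done := make(chan struct{}, 1) // plays the role of a freshly cleared note
	go func() {
		// ... some work ...
		done <- struct{}{} // "notewakeup": called exactly once
	}()
	<-done // "notesleep": returns once the wakeup has happened
	fmt.Println("woken")
}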
+
+type funcval struct {
+ fn uintptr
+ // variable-size, fn-specific data here
+}
+
+type iface struct {
+ tab *itab
+ data unsafe.Pointer
+}
+
+type eface struct {
+ _type *_type
+ data unsafe.Pointer
+}
+
+func efaceOf(ep *interface{}) *eface {
+ return (*eface)(unsafe.Pointer(ep))
+}
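
A standalone sketch of the two-word eface layout, mirroring the struct above with a local copy; it only inspects an ordinary interface{} value:

package main

import (
	"fmt"
	"unsafe"
)

// Same two-word layout as the runtime's eface above.
type eface struct {
	_type unsafe.Pointer
	data  unsafe.Pointer
}

func main() {
	var x interface{} = 42
	e := (*eface)(unsafe.Pointer(&x))
	fmt.Println(unsafe.Sizeof(x) == 2*unsafe.Sizeof(uintptr(0))) // true
	fmt.Println(e._type != nil, e.data != nil)                   // true true
}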
+
+// The guintptr, muintptr, and puintptr are all used to bypass write barriers.
+// It is particularly important to avoid write barriers when the current P has
+// been released, because the GC thinks the world is stopped, and an
+// unexpected write barrier would not be synchronized with the GC,
+// which can lead to a half-executed write barrier that has marked the object
+// but not queued it. If the GC skips the object and completes before the
+// queuing can occur, it will incorrectly free the object.
+//
+// We tried using special assignment functions invoked only when not
+// holding a running P, but then some updates to a particular memory
+// word went through write barriers and some did not. This breaks the
+// write barrier shadow checking mode, and it is also scary: better to have
+// a word that is completely ignored by the GC than to have one for which
+// only a few updates are ignored.
+//
+// Gs and Ps are always reachable via true pointers in the
+// allgs and allp lists or (during allocation before they reach those lists)
+// from stack variables.
+//
+// Ms are always reachable via true pointers either from allm or
+// freem. Unlike Gs and Ps we do free Ms, so it's important that
+// nothing ever hold an muintptr across a safe point.
+
+// A guintptr holds a goroutine pointer, but typed as a uintptr
+// to bypass write barriers. It is used in the Gobuf goroutine state
+// and in scheduling lists that are manipulated without a P.
+//
+// The Gobuf.g goroutine pointer is almost always updated by assembly code.
+// In one of the few places it is updated by Go code - func save - it must be
+// treated as a uintptr to avoid a write barrier being emitted at a bad time.
+// Instead of figuring out how to emit the write barriers missing in the
+// assembly manipulation, we change the type of the field to uintptr,
+// so that it does not require write barriers at all.
+//
+// Goroutine structs are published in the allg list and never freed.
+// That will keep the goroutine structs from being collected.
+// There is never a time that Gobuf.g's contain the only references
+// to a goroutine: the publishing of the goroutine in allg comes first.
+// Goroutine pointers are also kept in non-GC-visible places like TLS,
+// so I can't see them ever moving. If we did want to start moving data
+// in the GC, we'd need to allocate the goroutine structs from an
+// alternate arena. Using guintptr doesn't make that problem any worse.
+type guintptr uintptr
+
+//go:nosplit
+func (gp guintptr) ptr() *g { return (*g)(unsafe.Pointer(gp)) }
+
+//go:nosplit
+func (gp *guintptr) set(g *g) { *gp = guintptr(unsafe.Pointer(g)) }
+
+//go:nosplit
+func (gp *guintptr) cas(old, new guintptr) bool {
+ return atomic.Casuintptr((*uintptr)(unsafe.Pointer(gp)), uintptr(old), uintptr(new))
+}
+
+// setGNoWB performs *gp = new without a write barrier.
+// For times when it's impractical to use a guintptr.
+//go:nosplit
+//go:nowritebarrier
+func setGNoWB(gp **g, new *g) {
+ (*guintptr)(unsafe.Pointer(gp)).set(new)
+}
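
A standalone sketch of the pointer-as-uintptr pattern that guintptr implements; the thinguintptr type is invented for illustration, and outside the runtime the pointee must stay reachable through a real pointer elsewhere:

package main

import (
	"fmt"
	"unsafe"
)

type thing struct{ v int }

// thinguintptr mirrors the guintptr pattern: the pointer is stored as an
// integer, so assigning it is a plain word write with no write barrier.
type thinguintptr uintptr

func (tp thinguintptr) ptr() *thing   { return (*thing)(unsafe.Pointer(tp)) }
func (tp *thinguintptr) set(t *thing) { *tp = thinguintptr(unsafe.Pointer(t)) }

func main() {
	t := &thing{v: 7}
	var tp thinguintptr
	tp.set(t)               // no write barrier emitted for this store
	fmt.Println(tp.ptr().v) // 7; t itself keeps the object alive
}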
+
+type puintptr uintptr
+
+//go:nosplit
+func (pp puintptr) ptr() *p { return (*p)(unsafe.Pointer(pp)) }
+
+//go:nosplit
+func (pp *puintptr) set(p *p) { *pp = puintptr(unsafe.Pointer(p)) }
+
+// muintptr is a *m that is not tracked by the garbage collector.
+//
+// Because we do free Ms, there are some additional constraints on
+// muintptrs:
+//
+// 1. Never hold an muintptr locally across a safe point.
+//
+// 2. Any muintptr in the heap must be owned by the M itself so it can
+// ensure it is not in use when the last true *m is released.
+type muintptr uintptr
+
+//go:nosplit
+func (mp muintptr) ptr() *m { return (*m)(unsafe.Pointer(mp)) }
+
+//go:nosplit
+func (mp *muintptr) set(m *m) { *mp = muintptr(unsafe.Pointer(m)) }
+
+// setMNoWB performs *mp = new without a write barrier.
+// For times when it's impractical to use an muintptr.
+//go:nosplit
+//go:nowritebarrier
+func setMNoWB(mp **m, new *m) {
+ (*muintptr)(unsafe.Pointer(mp)).set(new)
+}
+
+type gobuf struct {
+ // The offsets of sp, pc, and g are known to (hard-coded in) libmach.
+ //
+ // ctxt is unusual with respect to GC: it may be a
+ // heap-allocated funcval, so GC needs to track it, but it
+ // needs to be set and cleared from assembly, where it's
+ // difficult to have write barriers. However, ctxt is really a
+ // saved, live register, and we only ever exchange it between
+ // the real register and the gobuf. Hence, we treat it as a
+ // root during stack scanning, which means assembly that saves
+ // and restores it doesn't need write barriers. It's still
+ // typed as a pointer so that any other writes from Go get
+ // write barriers.
+ sp uintptr
+ pc uintptr
+ g guintptr
+ ctxt unsafe.Pointer
+ ret sys.Uintreg
+ lr uintptr
+ bp uintptr // for framepointer-enabled architectures
+}
+
+// sudog represents a g in a wait list, such as for sending/receiving
+// on a channel.
+//
+// sudog is necessary because the g ↔ synchronization object relation
+// is many-to-many. A g can be on many wait lists, so there may be
+// many sudogs for one g; and many gs may be waiting on the same
+// synchronization object, so there may be many sudogs for one object.
+//
+// sudogs are allocated from a special pool. Use acquireSudog and
+// releaseSudog to allocate and free them.
+type sudog struct {
+ // The following fields are protected by the hchan.lock of the
+ // channel this sudog is blocking on. shrinkstack depends on
+ // this for sudogs involved in channel ops.
+
+ g *g
+
+ next *sudog
+ prev *sudog
+ elem unsafe.Pointer // data element (may point to stack)
+
+ // The following fields are never accessed concurrently.
+ // For channels, waitlink is only accessed by g.
+ // For semaphores, all fields (including the ones above)
+ // are only accessed when holding a semaRoot lock.
+
+ acquiretime int64
+ releasetime int64
+ ticket uint32
+
+ // isSelect indicates g is participating in a select, so
+ // g.selectDone must be CAS'd to win the wake-up race.
+ isSelect bool
+
+ // success indicates whether communication over channel c
+ // succeeded. It is true if the goroutine was awoken because a
+ // value was delivered over channel c, and false if awoken
+ // because c was closed.
+ success bool
+
+ parent *sudog // semaRoot binary tree
+ waitlink *sudog // g.waiting list or semaRoot
+ waittail *sudog // semaRoot
+ c *hchan // channel
+}
+
+type libcall struct {
+ fn uintptr
+ n uintptr // number of parameters
+ args uintptr // parameters
+ r1 uintptr // return values
+ r2 uintptr
+ err uintptr // error number
+}
+
+// Stack describes a Go execution stack.
+// The bounds of the stack are exactly [lo, hi),
+// with no implicit data structures on either side.
+type stack struct {
+ lo uintptr
+ hi uintptr
+}
+
+// heldLockInfo gives info on a held lock and the rank of that lock
+type heldLockInfo struct {
+ lockAddr uintptr
+ rank lockRank
+}
+
+type g struct {
+ // Stack parameters.
+ // stack describes the actual stack memory: [stack.lo, stack.hi).
+ // stackguard0 is the stack pointer compared in the Go stack growth prologue.
+ // It is stack.lo+StackGuard normally, but can be StackPreempt to trigger a preemption.
+ // stackguard1 is the stack pointer compared in the C stack growth prologue.
+ // It is stack.lo+StackGuard on g0 and gsignal stacks.
+ // It is ~0 on other goroutine stacks, to trigger a call to morestackc (and crash).
+ stack stack // offset known to runtime/cgo
+ stackguard0 uintptr // offset known to liblink
+ stackguard1 uintptr // offset known to liblink
+
+ _panic *_panic // innermost panic - offset known to liblink
+ _defer *_defer // innermost defer
+ m *m // current m; offset known to arm liblink
+ sched gobuf
+ syscallsp uintptr // if status==Gsyscall, syscallsp = sched.sp to use during gc
+ syscallpc uintptr // if status==Gsyscall, syscallpc = sched.pc to use during gc
+ stktopsp uintptr // expected sp at top of stack, to check in traceback
+ param unsafe.Pointer // passed parameter on wakeup
+ atomicstatus uint32
+ stackLock uint32 // sigprof/scang lock; TODO: fold in to atomicstatus
+ goid int64
+ schedlink guintptr
+ waitsince int64 // approx time when the g become blocked
+ waitreason waitReason // if status==Gwaiting
+
+ preempt bool // preemption signal, duplicates stackguard0 = stackpreempt
+ preemptStop bool // transition to _Gpreempted on preemption; otherwise, just deschedule
+ preemptShrink bool // shrink stack at synchronous safe point
+
+ // asyncSafePoint is set if g is stopped at an asynchronous
+ // safe point. This means there are frames on the stack
+ // without precise pointer information.
+ asyncSafePoint bool
+
+ paniconfault bool // panic (instead of crash) on unexpected fault address
+ gcscandone bool // g has scanned stack; protected by _Gscan bit in status
+ throwsplit bool // must not split stack
+ // activeStackChans indicates that there are unlocked channels
+ // pointing into this goroutine's stack. If true, stack
+ // copying needs to acquire channel locks to protect these
+ // areas of the stack.
+ activeStackChans bool
+ // parkingOnChan indicates that the goroutine is about to
+ // park on a chansend or chanrecv. Used to signal an unsafe point
+ // for stack shrinking. It's a boolean value, but is updated atomically.
+ parkingOnChan uint8
+
+ raceignore int8 // ignore race detection events
+ sysblocktraced bool // StartTrace has emitted EvGoInSyscall about this goroutine
+ sysexitticks int64 // cputicks when syscall has returned (for tracing)
+ traceseq uint64 // trace event sequencer
+ tracelastp puintptr // last P emitted an event for this goroutine
+ lockedm muintptr
+ sig uint32
+ writebuf []byte
+ sigcode0 uintptr
+ sigcode1 uintptr
+ sigpc uintptr
+ gopc uintptr // pc of go statement that created this goroutine
+ ancestors *[]ancestorInfo // ancestor information goroutine(s) that created this goroutine (only used if debug.tracebackancestors)
+ startpc uintptr // pc of goroutine function
+ racectx uintptr
+ waiting *sudog // sudog structures this g is waiting on (that have a valid elem ptr); in lock order
+ cgoCtxt []uintptr // cgo traceback context
+ labels unsafe.Pointer // profiler labels
+ timer *timer // cached timer for time.Sleep
+ selectDone uint32 // are we participating in a select and did someone win the race?
+
+ // Per-G GC state
+
+ // gcAssistBytes is this G's GC assist credit in terms of
+ // bytes allocated. If this is positive, then the G has credit
+ // to allocate gcAssistBytes bytes without assisting. If this
+ // is negative, then the G must correct this by performing
+ // scan work. We track this in bytes to make it fast to update
+ // and check for debt in the malloc hot path. The assist ratio
+ // determines how this corresponds to scan work debt.
+ gcAssistBytes int64
+}
+
+type m struct {
+ g0 *g // goroutine with scheduling stack
+ morebuf gobuf // gobuf arg to morestack
+ divmod uint32 // div/mod denominator for arm - known to liblink
+
+ // Fields not known to debuggers.
+ procid uint64 // for debuggers, but offset not hard-coded
+ gsignal *g // signal-handling g
+ goSigStack gsignalStack // Go-allocated signal handling stack
+ sigmask sigset // storage for saved signal mask
+ tls [6]uintptr // thread-local storage (for x86 extern register)
+ mstartfn func()
+ curg *g // current running goroutine
+ caughtsig guintptr // goroutine running during fatal signal
+ p puintptr // attached p for executing go code (nil if not executing go code)
+ nextp puintptr
+ oldp puintptr // the p that was attached before executing a syscall
+ id int64
+ mallocing int32
+ throwing int32
+ preemptoff string // if != "", keep curg running on this m
+ locks int32
+ dying int32
+ profilehz int32
+ spinning bool // m is out of work and is actively looking for work
+ blocked bool // m is blocked on a note
+ newSigstack bool // minit on C thread called sigaltstack
+ printlock int8
+ incgo bool // m is executing a cgo call
+ freeWait uint32 // if == 0, safe to free g0 and delete m (atomic)
+ fastrand [2]uint32
+ needextram bool
+ traceback uint8
+ ncgocall uint64 // number of cgo calls in total
+ ncgo int32 // number of cgo calls currently in progress
+ cgoCallersUse uint32 // if non-zero, cgoCallers in use temporarily
+ cgoCallers *cgoCallers // cgo traceback if crashing in cgo call
+ doesPark bool // non-P running threads: sysmon and newmHandoff never use .park
+ park note
+ alllink *m // on allm
+ schedlink muintptr
+ lockedg guintptr
+ createstack [32]uintptr // stack that created this thread.
+ lockedExt uint32 // tracking for external LockOSThread
+ lockedInt uint32 // tracking for internal lockOSThread
+ nextwaitm muintptr // next m waiting for lock
+ waitunlockf func(*g, unsafe.Pointer) bool
+ waitlock unsafe.Pointer
+ waittraceev byte
+ waittraceskip int
+ startingtrace bool
+ syscalltick uint32
+ freelink *m // on sched.freem
+
+ // mFixup is used to synchronize OS-related m state
+ // (credentials etc); use the mutex to access it. To avoid
+ // deadlocks, an atomic.Load() of used being zero in
+ // mDoFixupFn() guarantees fn is nil.
+ mFixup struct {
+ lock mutex
+ used uint32
+ fn func(bool) bool
+ }
+
+ // these are here because they are too large to be on the stack
+ // of low-level NOSPLIT functions.
+ libcall libcall
+ libcallpc uintptr // for cpu profiler
+ libcallsp uintptr
+ libcallg guintptr
+ syscall libcall // stores syscall parameters on windows
+
+ vdsoSP uintptr // SP for traceback while in VDSO call (0 if not in call)
+ vdsoPC uintptr // PC for traceback while in VDSO call
+
+ // preemptGen counts the number of completed preemption
+ // signals. This is used to detect when a preemption is
+ // requested, but fails. Accessed atomically.
+ preemptGen uint32
+
+ // Whether there is a pending preemption signal on this M.
+ // Accessed atomically.
+ signalPending uint32
+
+ dlogPerM
+
+ mOS
+
+ // Up to 10 locks held by this m, maintained by the lock ranking code.
+ locksHeldLen int
+ locksHeld [10]heldLockInfo
+}
+
+type p struct {
+ id int32
+ status uint32 // one of pidle/prunning/...
+ link puintptr
+ schedtick uint32 // incremented on every scheduler call
+ syscalltick uint32 // incremented on every system call
+ sysmontick sysmontick // last tick observed by sysmon
+ m muintptr // back-link to associated m (nil if idle)
+ mcache *mcache
+ pcache pageCache
+ raceprocctx uintptr
+
+ deferpool [5][]*_defer // pool of available defer structs of different sizes (see panic.go)
+ deferpoolbuf [5][32]*_defer
+
+ // Cache of goroutine ids, amortizes accesses to runtime·sched.goidgen.
+ goidcache uint64
+ goidcacheend uint64
+
+ // Queue of runnable goroutines. Accessed without lock.
+ runqhead uint32
+ runqtail uint32
+ runq [256]guintptr
+ // runnext, if non-nil, is a runnable G that was ready'd by
+ // the current G and should be run next instead of what's in
+ // runq if there's time remaining in the running G's time
+ // slice. It will inherit the time left in the current time
+ // slice. If a set of goroutines is locked in a
+ // communicate-and-wait pattern, this schedules that set as a
+ // unit and eliminates the (potentially large) scheduling
+ // latency that otherwise arises from adding the ready'd
+ // goroutines to the end of the run queue.
+ runnext guintptr
+
+ // Available G's (status == Gdead)
+ gFree struct {
+ gList
+ n int32
+ }
+
+ sudogcache []*sudog
+ sudogbuf [128]*sudog
+
+ // Cache of mspan objects from the heap.
+ mspancache struct {
+ // We need an explicit length here because this field is used
+ // in allocation codepaths where write barriers are not allowed,
+ // and eliminating the write barrier/keeping it eliminated from
+ // slice updates is tricky, more so than just managing the length
+ // ourselves.
+ len int
+ buf [128]*mspan
+ }
+
+ tracebuf traceBufPtr
+
+ // traceSweep indicates the sweep events should be traced.
+ // This is used to defer the sweep start event until a span
+ // has actually been swept.
+ traceSweep bool
+ // traceSwept and traceReclaimed track the number of bytes
+ // swept and reclaimed by sweeping in the current sweep loop.
+ traceSwept, traceReclaimed uintptr
+
+ palloc persistentAlloc // per-P to avoid mutex
+
+ _ uint32 // Alignment for atomic fields below
+
+ // The when field of the first entry on the timer heap.
+ // This is updated using atomic functions.
+ // This is 0 if the timer heap is empty.
+ timer0When uint64
+
+ // The earliest known nextwhen field of a timer with
+ // timerModifiedEarlier status. Because the timer may have been
+ // modified again, there need not be any timer with this value.
+ // This is updated using atomic functions.
+ // This is 0 if there are no timerModifiedEarlier timers.
+ timerModifiedEarliest uint64
+
+ // Per-P GC state
+ gcAssistTime int64 // Nanoseconds in assistAlloc
+ gcFractionalMarkTime int64 // Nanoseconds in fractional mark worker (atomic)
+
+ // gcMarkWorkerMode is the mode for the next mark worker to run in.
+ // That is, this is used to communicate with the worker goroutine
+ // selected for immediate execution by
+ // gcController.findRunnableGCWorker. When scheduling other goroutines,
+ // this field must be set to gcMarkWorkerNotWorker.
+ gcMarkWorkerMode gcMarkWorkerMode
+ // gcMarkWorkerStartTime is the nanotime() at which the most recent
+ // mark worker started.
+ gcMarkWorkerStartTime int64
+
+ // gcw is this P's GC work buffer cache. The work buffer is
+ // filled by write barriers, drained by mutator assists, and
+ // disposed on certain GC state transitions.
+ gcw gcWork
+
+ // wbBuf is this P's GC write barrier buffer.
+ //
+ // TODO: Consider caching this in the running G.
+ wbBuf wbBuf
+
+ runSafePointFn uint32 // if 1, run sched.safePointFn at next safe point
+
+ // statsSeq is a counter indicating whether this P is currently
+ // writing any stats. Its value is even when not, odd when it is.
+ statsSeq uint32
+
+ // Lock for timers. We normally access the timers while running
+ // on this P, but the scheduler can also do it from a different P.
+ timersLock mutex
+
+ // Actions to take at some time. This is used to implement the
+ // standard library's time package.
+ // Must hold timersLock to access.
+ timers []*timer
+
+ // Number of timers in P's heap.
+ // Modified using atomic instructions.
+ numTimers uint32
+
+ // Number of timerDeleted timers in P's heap.
+ // Modified using atomic instructions.
+ deletedTimers uint32
+
+ // Race context used while executing timer functions.
+ timerRaceCtx uintptr
+
+ // preempt is set to indicate that this P should enter the
+ // scheduler ASAP (regardless of what G is running on it).
+ preempt bool
+
+ pad cpu.CacheLinePad
+}
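
The runnext comment above mentions goroutines locked in a communicate-and-wait pattern. A small self-contained illustration of that pattern in ordinary user code (shown only to make the scheduling scenario concrete, not part of the runtime):

    package main

    import "fmt"

    // Each send readies the other goroutine, which runnext lets the
    // scheduler run immediately instead of queuing it behind everything
    // else on the run queue.
    func main() {
        ping, pong := make(chan int), make(chan int)
        go func() {
            for v := range ping {
                pong <- v + 1
            }
            close(pong)
        }()
        v := 0
        for i := 0; i < 3; i++ {
            ping <- v
            v = <-pong
        }
        close(ping)
        fmt.Println(v) // 3
    }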
+
+type schedt struct {
+ // accessed atomically. keep at top to ensure alignment on 32-bit systems.
+ goidgen uint64
+ lastpoll uint64 // time of last network poll, 0 if currently polling
+ pollUntil uint64 // time to which current poll is sleeping
+
+ lock mutex
+
+ // When increasing nmidle, nmidlelocked, nmsys, or nmfreed, be
+ // sure to call checkdead().
+
+ midle muintptr // idle m's waiting for work
+ nmidle int32 // number of idle m's waiting for work
+ nmidlelocked int32 // number of locked m's waiting for work
+ mnext int64 // number of m's that have been created and next M ID
+ maxmcount int32 // maximum number of m's allowed (or die)
+ nmsys int32 // number of system m's not counted for deadlock
+ nmfreed int64 // cumulative number of freed m's
+
+ ngsys uint32 // number of system goroutines; updated atomically
+
+ pidle puintptr // idle p's
+ npidle uint32
+ nmspinning uint32 // See "Worker thread parking/unparking" comment in proc.go.
+
+ // Global runnable queue.
+ runq gQueue
+ runqsize int32
+
+ // disable controls selective disabling of the scheduler.
+ //
+ // Use schedEnableUser to control this.
+ //
+ // disable is protected by sched.lock.
+ disable struct {
+ // user disables scheduling of user goroutines.
+ user bool
+ runnable gQueue // pending runnable Gs
+ n int32 // length of runnable
+ }
+
+ // Global cache of dead G's.
+ gFree struct {
+ lock mutex
+ stack gList // Gs with stacks
+ noStack gList // Gs without stacks
+ n int32
+ }
+
+ // Central cache of sudog structs.
+ sudoglock mutex
+ sudogcache *sudog
+
+ // Central pool of available defer structs of different sizes.
+ deferlock mutex
+ deferpool [5]*_defer
+
+ // freem is the list of m's waiting to be freed when their
+ // m.exited is set. Linked through m.freelink.
+ freem *m
+
+ gcwaiting uint32 // gc is waiting to run
+ stopwait int32
+ stopnote note
+ sysmonwait uint32
+ sysmonnote note
+
+ // While true, sysmon is not ready for mFixup calls.
+ // Accessed atomically.
+ sysmonStarting uint32
+
+ // safepointFn should be called on each P at the next GC
+ // safepoint if p.runSafePointFn is set.
+ safePointFn func(*p)
+ safePointWait int32
+ safePointNote note
+
+ profilehz int32 // cpu profiling rate
+
+ procresizetime int64 // nanotime() of last change to gomaxprocs
+ totaltime int64 // ∫gomaxprocs dt up to procresizetime
+
+ // sysmonlock protects sysmon's actions on the runtime.
+ //
+ // Acquire and hold this mutex to block sysmon from interacting
+ // with the rest of the runtime.
+ sysmonlock mutex
+}
+
+// Values for the flags field of a sigTabT.
+const (
+ _SigNotify = 1 << iota // let signal.Notify have signal, even if from kernel
+ _SigKill // if signal.Notify doesn't take it, exit quietly
+ _SigThrow // if signal.Notify doesn't take it, exit loudly
+ _SigPanic // if the signal is from the kernel, panic
+ _SigDefault // if the signal isn't explicitly requested, don't monitor it
+ _SigGoExit // cause all runtime procs to exit (only used on Plan 9).
+ _SigSetStack // add SA_ONSTACK to libc handler
+ _SigUnblock // always unblock; see blockableSig
+ _SigIgn // _SIG_DFL action is to ignore the signal
+)
+
+// Layout of in-memory per-function information prepared by linker
+// See https://golang.org/s/go12symtab.
+// Keep in sync with linker (../cmd/link/internal/ld/pcln.go:/pclntab)
+// and with package debug/gosym and with symtab.go in package runtime.
+type _func struct {
+ entry uintptr // start pc
+ nameoff int32 // function name
+
+ args int32 // in/out args size
+ deferreturn uint32 // offset of start of a deferreturn call instruction from entry, if any.
+
+ pcsp uint32
+ pcfile uint32
+ pcln uint32
+ npcdata uint32
+ cuOffset uint32 // runtime.cutab offset of this function's CU
+ funcID funcID // set for certain special runtime functions
+ _ [2]byte // pad
+ nfuncdata uint8 // must be last
+}
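
The exported view of this per-function table is runtime.Func. A short example reading the same kind of information (entry PC, name, file and line) through the public API:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        pc, _, _, ok := runtime.Caller(0) // a PC inside main
        if !ok {
            return
        }
        fn := runtime.FuncForPC(pc) // backed by the _func data described above
        file, line := fn.FileLine(pc)
        fmt.Println(fn.Name(), fn.Entry(), file, line)
    }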
+
+// Pseudo-Func that is returned for PCs that occur in inlined code.
+// A *Func can be either a *_func or a *funcinl, and they are distinguished
+// by the first uintptr.
+type funcinl struct {
+ zero uintptr // set to 0 to distinguish from _func
+ entry uintptr // entry of the real (the "outermost") frame.
+ name string
+ file string
+ line int
+}
+
+// layout of Itab known to compilers
+// allocated in non-garbage-collected memory
+// Needs to be in sync with
+// ../cmd/compile/internal/gc/reflect.go:/^func.dumptabs.
+type itab struct {
+ inter *interfacetype
+ _type *_type
+ hash uint32 // copy of _type.hash. Used for type switches.
+ _ [4]byte
+ fun [1]uintptr // variable sized. fun[0]==0 means _type does not implement inter.
+}
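
For context, this is the kind of conversion whose method table the itab caches; at the Go level it is just an interface assertion or type switch (a plain user-level example, not runtime code):

    package main

    import "fmt"

    type stringer interface{ String() string }

    type point struct{ x, y int }

    func (p point) String() string { return fmt.Sprintf("(%d,%d)", p.x, p.y) }

    func main() {
        var v interface{} = point{1, 2}

        // Each successful assertion consults (or populates) an itab for
        // the (stringer, point) pair.
        if s, ok := v.(stringer); ok {
            fmt.Println(s.String())
        }
        switch t := v.(type) { // interface cases drive itab lookups
        case stringer:
            fmt.Println("stringer:", t.String())
        default:
            fmt.Println("not a stringer")
        }
    }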
+
+// Lock-free stack node.
+// Also known to export_test.go.
+type lfnode struct {
+ next uint64
+ pushcnt uintptr
+}
+
+type forcegcstate struct {
+ lock mutex
+ g *g
+ idle uint32
+}
+
+// extendRandom extends the random numbers in r[:n] to the whole slice r.
+// Treats n<0 as n==0.
+func extendRandom(r []byte, n int) {
+ if n < 0 {
+ n = 0
+ }
+ for n < len(r) {
+ // Extend random bits using hash function & time seed
+ w := n
+ if w > 16 {
+ w = 16
+ }
+ h := memhash(unsafe.Pointer(&r[n-w]), uintptr(nanotime()), uintptr(w))
+ for i := 0; i < sys.PtrSize && n < len(r); i++ {
+ r[n] = byte(h)
+ n++
+ h >>= 8
+ }
+ }
+}
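
extendRandom relies on the runtime-internal memhash and nanotime. A portable sketch of the same idea using hash/fnv and time.Now as stand-ins (these are not what the runtime uses; the structure of the loop is what carries over):

    package main

    import (
        "encoding/binary"
        "fmt"
        "hash/fnv"
        "time"
    )

    // extendRandomLike extends r[:n] to all of r by repeatedly hashing a
    // window of the bytes produced so far together with a time seed, then
    // spreading the 64-bit hash over the next bytes.
    func extendRandomLike(r []byte, n int) {
        if n < 0 {
            n = 0
        }
        for n < len(r) {
            w := n
            if w > 16 {
                w = 16
            }
            h := fnv.New64a()
            h.Write(r[n-w : n])
            binary.Write(h, binary.LittleEndian, time.Now().UnixNano())
            sum := h.Sum64()
            for i := 0; i < 8 && n < len(r); i++ {
                r[n] = byte(sum)
                n++
                sum >>= 8
            }
        }
    }

    func main() {
        buf := make([]byte, 32)
        copy(buf, []byte{1, 2, 3, 4}) // pretend the first 4 bytes are real entropy
        extendRandomLike(buf, 4)
        fmt.Printf("%x\n", buf)
    }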
+
+// A _defer holds an entry on the list of deferred calls.
+ // If you add a field here, add code to clear it in freedefer and deferProcStack.
+// This struct must match the code in cmd/compile/internal/gc/reflect.go:deferstruct
+// and cmd/compile/internal/gc/ssa.go:(*state).call.
+// Some defers will be allocated on the stack and some on the heap.
+// All defers are logically part of the stack, so write barriers to
+// initialize them are not required. All defers must be manually scanned,
+// and for heap defers, marked.
+type _defer struct {
+ siz int32 // includes both arguments and results
+ started bool
+ heap bool
+ // openDefer indicates that this _defer is for a frame with open-coded
+ // defers. We have only one defer record for the entire frame (which may
+ // currently have 0, 1, or more defers active).
+ openDefer bool
+ sp uintptr // sp at time of defer
+ pc uintptr // pc at time of defer
+ fn *funcval // can be nil for open-coded defers
+ _panic *_panic // panic that is running defer
+ link *_defer
+
+ // If openDefer is true, the fields below record values about the stack
+ // frame and associated function that has the open-coded defer(s). sp
+ // above will be the sp for the frame, and pc will be address of the
+ // deferreturn call in the function.
+ fd unsafe.Pointer // funcdata for the function associated with the frame
+ varp uintptr // value of varp for the stack frame
+ // framepc is the current pc associated with the stack frame. Together
+ // with sp above (which is the sp associated with the stack frame),
+ // framepc/sp can be used as pc/sp pair to continue a stack trace via
+ // gentraceback().
+ framepc uintptr
+}
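
To ground the openDefer distinction: roughly, a frame with a small, fixed set of defer statements can use one open-coded record for the whole frame, while defers inside a loop need a separate record per call (a simplified characterization; the compiler applies further conditions). A user-level example of the two shapes:

    package main

    import (
        "fmt"
        "os"
    )

    // fixedDefers has one statically known defer, the shape the
    // open-coded path is designed for.
    func fixedDefers(name string) error {
        f, err := os.Open(name)
        if err != nil {
            return err
        }
        defer f.Close()
        _, err = f.Stat()
        return err
    }

    // loopDefers defers inside a loop, so the number of deferred calls is
    // not known statically; each iteration gets its own _defer record.
    func loopDefers(names []string) {
        for _, name := range names {
            f, err := os.Open(name)
            if err != nil {
                continue
            }
            defer f.Close()
        }
    }

    func main() {
        fmt.Println(fixedDefers("/etc/hostname"))
        loopDefers([]string{"/etc/hostname"})
    }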
+
+// A _panic holds information about an active panic.
+//
+// A _panic value must only ever live on the stack.
+//
+// The argp and link fields are stack pointers, but don't need special
+// handling during stack growth: because they are pointer-typed and
+// _panic values only live on the stack, regular stack pointer
+// adjustment takes care of them.
+type _panic struct {
+ argp unsafe.Pointer // pointer to arguments of deferred call run during panic; cannot move - known to liblink
+ arg interface{} // argument to panic
+ link *_panic // link to earlier panic
+ pc uintptr // where to return to in runtime if this panic is bypassed
+ sp unsafe.Pointer // where to return to in runtime if this panic is bypassed
+ recovered bool // whether this panic is over
+ aborted bool // the panic was aborted
+ goexit bool
+}
+
+// stack traces
+type stkframe struct {
+ fn funcInfo // function being run
+ pc uintptr // program counter within fn
+ continpc uintptr // program counter where execution can continue, or 0 if not
+ lr uintptr // program counter at caller aka link register
+ sp uintptr // stack pointer at pc
+ fp uintptr // stack pointer at caller aka frame pointer
+ varp uintptr // top of local variables
+ argp uintptr // pointer to function arguments
+ arglen uintptr // number of bytes at argp
+ argmap *bitvector // force use of this argmap
+}
+
+// ancestorInfo records details of where a goroutine was started.
+type ancestorInfo struct {
+ pcs []uintptr // pcs from the stack of this goroutine
+ goid int64 // goroutine id of this goroutine; original goroutine possibly dead
+ gopc uintptr // pc of go statement that created this goroutine
+}
+
+const (
+ _TraceRuntimeFrames = 1 << iota // include frames for internal runtime functions.
+ _TraceTrap // the initial PC, SP are from a trap, not a return PC from a call
+ _TraceJumpStack // if traceback is on a systemstack, resume trace at g that called into it
+)
+
+// The maximum number of frames we print for a traceback
+const _TracebackMaxFrames = 100
+
+// A waitReason explains why a goroutine has been stopped.
+// See gopark. Do not re-use waitReasons, add new ones.
+type waitReason uint8
+
+const (
+ waitReasonZero waitReason = iota // ""
+ waitReasonGCAssistMarking // "GC assist marking"
+ waitReasonIOWait // "IO wait"
+ waitReasonChanReceiveNilChan // "chan receive (nil chan)"
+ waitReasonChanSendNilChan // "chan send (nil chan)"
+ waitReasonDumpingHeap // "dumping heap"
+ waitReasonGarbageCollection // "garbage collection"
+ waitReasonGarbageCollectionScan // "garbage collection scan"
+ waitReasonPanicWait // "panicwait"
+ waitReasonSelect // "select"
+ waitReasonSelectNoCases // "select (no cases)"
+ waitReasonGCAssistWait // "GC assist wait"
+ waitReasonGCSweepWait // "GC sweep wait"
+ waitReasonGCScavengeWait // "GC scavenge wait"
+ waitReasonChanReceive // "chan receive"
+ waitReasonChanSend // "chan send"
+ waitReasonFinalizerWait // "finalizer wait"
+ waitReasonForceGCIdle // "force gc (idle)"
+ waitReasonSemacquire // "semacquire"
+ waitReasonSleep // "sleep"
+ waitReasonSyncCondWait // "sync.Cond.Wait"
+ waitReasonTimerGoroutineIdle // "timer goroutine (idle)"
+ waitReasonTraceReaderBlocked // "trace reader (blocked)"
+ waitReasonWaitForGCCycle // "wait for GC cycle"
+ waitReasonGCWorkerIdle // "GC worker (idle)"
+ waitReasonPreempted // "preempted"
+ waitReasonDebugCall // "debug call"
+)
+
+var waitReasonStrings = [...]string{
+ waitReasonZero: "",
+ waitReasonGCAssistMarking: "GC assist marking",
+ waitReasonIOWait: "IO wait",
+ waitReasonChanReceiveNilChan: "chan receive (nil chan)",
+ waitReasonChanSendNilChan: "chan send (nil chan)",
+ waitReasonDumpingHeap: "dumping heap",
+ waitReasonGarbageCollection: "garbage collection",
+ waitReasonGarbageCollectionScan: "garbage collection scan",
+ waitReasonPanicWait: "panicwait",
+ waitReasonSelect: "select",
+ waitReasonSelectNoCases: "select (no cases)",
+ waitReasonGCAssistWait: "GC assist wait",
+ waitReasonGCSweepWait: "GC sweep wait",
+ waitReasonGCScavengeWait: "GC scavenge wait",
+ waitReasonChanReceive: "chan receive",
+ waitReasonChanSend: "chan send",
+ waitReasonFinalizerWait: "finalizer wait",
+ waitReasonForceGCIdle: "force gc (idle)",
+ waitReasonSemacquire: "semacquire",
+ waitReasonSleep: "sleep",
+ waitReasonSyncCondWait: "sync.Cond.Wait",
+ waitReasonTimerGoroutineIdle: "timer goroutine (idle)",
+ waitReasonTraceReaderBlocked: "trace reader (blocked)",
+ waitReasonWaitForGCCycle: "wait for GC cycle",
+ waitReasonGCWorkerIdle: "GC worker (idle)",
+ waitReasonPreempted: "preempted",
+ waitReasonDebugCall: "debug call",
+}
+
+func (w waitReason) String() string {
+ if w < 0 || w >= waitReason(len(waitReasonStrings)) {
+ return "unknown wait reason"
+ }
+ return waitReasonStrings[w]
+}
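
These strings are what appear in goroutine dumps. For example, a goroutine parked in a channel receive is reported with the waitReasonChanReceive text:

    package main

    import (
        "fmt"
        "runtime"
        "time"
    )

    func main() {
        ch := make(chan int)
        go func() { <-ch }() // parks with waitReasonChanReceive

        time.Sleep(100 * time.Millisecond) // give it time to park
        buf := make([]byte, 1<<16)
        n := runtime.Stack(buf, true)
        fmt.Printf("%s\n", buf[:n]) // the dump shows "[chan receive]" for that goroutine
    }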
+
+var (
+ allm *m
+ gomaxprocs int32
+ ncpu int32
+ forcegc forcegcstate
+ sched schedt
+ newprocs int32
+
+ // allpLock protects P-less reads and size changes of allp, idlepMask,
+ // and timerpMask, and all writes to allp.
+ allpLock mutex
+ // len(allp) == gomaxprocs; may change at safe points, otherwise
+ // immutable.
+ allp []*p
+ // Bitmask of Ps in _Pidle list, one bit per P. Reads and writes must
+ // be atomic. Length may change at safe points.
+ //
+ // Each P must update only its own bit. In order to maintain
+ // consistency, a P going idle must update the idle mask simultaneously with
+ // updates to the idle P list under the sched.lock, otherwise a racing
+ // pidleget may clear the mask before pidleput sets the mask,
+ // corrupting the bitmap.
+ //
+ // N.B., procresize takes ownership of all Ps in stopTheWorldWithSema.
+ idlepMask pMask
+ // Bitmask of Ps that may have a timer, one bit per P. Reads and writes
+ // must be atomic. Length may change at safe points.
+ timerpMask pMask
+
+ // Pool of GC parked background workers. Entries are type
+ // *gcBgMarkWorkerNode.
+ gcBgMarkWorkerPool lfstack
+
+ // Total number of gcBgMarkWorker goroutines. Protected by worldsema.
+ gcBgMarkWorkerCount int32
+
+ // Information about what cpu features are available.
+ // Packages outside the runtime should not use these
+ // as they are not an external api.
+ // Set on startup in asm_{386,amd64}.s
+ processorVersionInfo uint32
+ isIntel bool
+ lfenceBeforeRdtsc bool
+
+ goarm uint8 // set by cmd/link on arm systems
+)
+
+// Set by the linker so the runtime can determine the buildmode.
+var (
+ islibrary bool // -buildmode=c-shared
+ isarchive bool // -buildmode=c-archive
+)
+
+// Must agree with cmd/internal/objabi.Framepointer_enabled.
+const framepointer_enabled = GOARCH == "amd64" || GOARCH == "arm64" && (GOOS == "linux" || GOOS == "darwin" || GOOS == "ios")
diff --git a/src/runtime/runtime_linux_test.go b/src/runtime/runtime_linux_test.go
new file mode 100644
index 0000000..cd59368
--- /dev/null
+++ b/src/runtime/runtime_linux_test.go
@@ -0,0 +1,63 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ . "runtime"
+ "syscall"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+var pid, tid int
+
+func init() {
+ // Record pid and tid of init thread for use during test.
+ // The call to LockOSThread is just to exercise it;
+ // we can't test that it does anything.
+ // Instead we're testing that the conditions are good
+ // for how it is used in init (must be on main thread).
+ pid, tid = syscall.Getpid(), syscall.Gettid()
+ LockOSThread()
+
+ sysNanosleep = func(d time.Duration) {
+ // Invoke a blocking syscall directly; calling time.Sleep()
+ // would deschedule the goroutine instead.
+ ts := syscall.NsecToTimespec(d.Nanoseconds())
+ for {
+ if err := syscall.Nanosleep(&ts, &ts); err != syscall.EINTR {
+ return
+ }
+ }
+ }
+}
+
+func TestLockOSThread(t *testing.T) {
+ if pid != tid {
+ t.Fatalf("pid=%d but tid=%d", pid, tid)
+ }
+}
+
+// Test that error values are negative.
+// Use a misaligned pointer to get -EINVAL.
+func TestMincoreErrorSign(t *testing.T) {
+ var dst byte
+ v := Mincore(Add(unsafe.Pointer(new(int32)), 1), 1, &dst)
+
+ const EINVAL = 0x16
+ if v != -EINVAL {
+ t.Errorf("mincore = %v, want %v", v, -EINVAL)
+ }
+}
+
+func TestEpollctlErrorSign(t *testing.T) {
+ v := Epollctl(-1, 1, -1, unsafe.Pointer(&EpollEvent{}))
+
+ const EBADF = 0x09
+ if v != -EBADF {
+ t.Errorf("epollctl = %v, want %v", v, -EBADF)
+ }
+}
diff --git a/src/runtime/runtime_mmap_test.go b/src/runtime/runtime_mmap_test.go
new file mode 100644
index 0000000..bb0b747
--- /dev/null
+++ b/src/runtime/runtime_mmap_test.go
@@ -0,0 +1,53 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
+
+package runtime_test
+
+import (
+ "runtime"
+ "testing"
+ "unsafe"
+)
+
+// Test that the error value returned by mmap is positive, as that is
+// what the code in mem_bsd.go, mem_darwin.go, and mem_linux.go expects.
+// See the uses of ENOMEM in sysMap in those files.
+func TestMmapErrorSign(t *testing.T) {
+ p, err := runtime.Mmap(nil, ^uintptr(0)&^(runtime.GetPhysPageSize()-1), 0, runtime.MAP_ANON|runtime.MAP_PRIVATE, -1, 0)
+
+ if p != nil || err != runtime.ENOMEM {
+ t.Errorf("mmap = %v, %v, want nil, %v", p, err, runtime.ENOMEM)
+ }
+}
+
+func TestPhysPageSize(t *testing.T) {
+ // Mmap fails if the address is not page aligned, so we can
+ // use this to test if the page size is the true page size.
+ ps := runtime.GetPhysPageSize()
+
+ // Get a region of memory to play with. This should be page-aligned.
+ b, err := runtime.Mmap(nil, 2*ps, 0, runtime.MAP_ANON|runtime.MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ t.Fatalf("Mmap: %v", err)
+ }
+
+ if runtime.GOOS == "aix" {
+ // AIX does not allow mapping a range that is already mapped.
+ runtime.Munmap(unsafe.Pointer(uintptr(b)), 2*ps)
+ }
+
+ // Mmap should fail at a half page into the buffer.
+ _, err = runtime.Mmap(unsafe.Pointer(uintptr(b)+ps/2), ps, 0, runtime.MAP_ANON|runtime.MAP_PRIVATE|runtime.MAP_FIXED, -1, 0)
+ if err == 0 {
+ t.Errorf("Mmap should have failed with half-page alignment %d, but succeeded: %v", ps/2, err)
+ }
+
+ // Mmap should succeed at a full page into the buffer.
+ _, err = runtime.Mmap(unsafe.Pointer(uintptr(b)+ps), ps, 0, runtime.MAP_ANON|runtime.MAP_PRIVATE|runtime.MAP_FIXED, -1, 0)
+ if err != 0 {
+ t.Errorf("Mmap at full-page alignment %d failed: %v", ps, err)
+ }
+}
diff --git a/src/runtime/runtime_test.go b/src/runtime/runtime_test.go
new file mode 100644
index 0000000..e5d2d97
--- /dev/null
+++ b/src/runtime/runtime_test.go
@@ -0,0 +1,364 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "flag"
+ "io"
+ . "runtime"
+ "runtime/debug"
+ "strings"
+ "testing"
+ "unsafe"
+)
+
+var flagQuick = flag.Bool("quick", false, "skip slow tests, for second run in all.bash")
+
+func init() {
+ // We're testing the runtime, so make tracebacks show things
+ // in the runtime. This only raises the level, so it won't
+ // override GOTRACEBACK=crash from the user.
+ SetTracebackEnv("system")
+}
+
+var errf error
+
+func errfn() error {
+ return errf
+}
+
+func errfn1() error {
+ return io.EOF
+}
+
+func BenchmarkIfaceCmp100(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < 100; j++ {
+ if errfn() == io.EOF {
+ b.Fatal("bad comparison")
+ }
+ }
+ }
+}
+
+func BenchmarkIfaceCmpNil100(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < 100; j++ {
+ if errfn1() == nil {
+ b.Fatal("bad comparison")
+ }
+ }
+ }
+}
+
+var efaceCmp1 interface{}
+var efaceCmp2 interface{}
+
+func BenchmarkEfaceCmpDiff(b *testing.B) {
+ x := 5
+ efaceCmp1 = &x
+ y := 6
+ efaceCmp2 = &y
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < 100; j++ {
+ if efaceCmp1 == efaceCmp2 {
+ b.Fatal("bad comparison")
+ }
+ }
+ }
+}
+
+func BenchmarkEfaceCmpDiffIndirect(b *testing.B) {
+ efaceCmp1 = [2]int{1, 2}
+ efaceCmp2 = [2]int{1, 2}
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < 100; j++ {
+ if efaceCmp1 != efaceCmp2 {
+ b.Fatal("bad comparison")
+ }
+ }
+ }
+}
+
+func BenchmarkDefer(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ defer1()
+ }
+}
+
+func defer1() {
+ defer func(x, y, z int) {
+ if recover() != nil || x != 1 || y != 2 || z != 3 {
+ panic("bad recover")
+ }
+ }(1, 2, 3)
+}
+
+func BenchmarkDefer10(b *testing.B) {
+ for i := 0; i < b.N/10; i++ {
+ defer2()
+ }
+}
+
+func defer2() {
+ for i := 0; i < 10; i++ {
+ defer func(x, y, z int) {
+ if recover() != nil || x != 1 || y != 2 || z != 3 {
+ panic("bad recover")
+ }
+ }(1, 2, 3)
+ }
+}
+
+func BenchmarkDeferMany(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ defer func(x, y, z int) {
+ if recover() != nil || x != 1 || y != 2 || z != 3 {
+ panic("bad recover")
+ }
+ }(1, 2, 3)
+ }
+}
+
+func BenchmarkPanicRecover(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ defer3()
+ }
+}
+
+func defer3() {
+ defer func(x, y, z int) {
+ if recover() == nil {
+ panic("failed recover")
+ }
+ }(1, 2, 3)
+ panic("hi")
+}
+
+// golang.org/issue/7063
+func TestStopCPUProfilingWithProfilerOff(t *testing.T) {
+ SetCPUProfileRate(0)
+}
+
+// Addresses to test for faulting behavior.
+// This is less a test of SetPanicOnFault and more a check that
+// the operating system and the runtime can process these faults
+// correctly. That is, we're indirectly testing that without SetPanicOnFault
+// these would manage to turn into ordinary crashes.
+// Note that these are truncated on 32-bit systems, so the bottom 32 bits
+// of the larger addresses must themselves be invalid addresses.
+// We might get unlucky and the OS might have mapped one of these
+// addresses, but probably not: they're all in the first page, very high
+// addresses that normally an OS would reserve for itself, or malformed
+// addresses. Even so, we might have to remove one or two on different
+// systems. We will see.
+
+var faultAddrs = []uint64{
+ // low addresses
+ 0,
+ 1,
+ 0xfff,
+ // high (kernel) addresses
+ // or else malformed.
+ 0xffffffffffffffff,
+ 0xfffffffffffff001,
+ 0xffffffffffff0001,
+ 0xfffffffffff00001,
+ 0xffffffffff000001,
+ 0xfffffffff0000001,
+ 0xffffffff00000001,
+ 0xfffffff000000001,
+ 0xffffff0000000001,
+ 0xfffff00000000001,
+ 0xffff000000000001,
+ 0xfff0000000000001,
+ 0xff00000000000001,
+ 0xf000000000000001,
+ 0x8000000000000001,
+}
+
+func TestSetPanicOnFault(t *testing.T) {
+ old := debug.SetPanicOnFault(true)
+ defer debug.SetPanicOnFault(old)
+
+ nfault := 0
+ for _, addr := range faultAddrs {
+ testSetPanicOnFault(t, uintptr(addr), &nfault)
+ }
+ if nfault == 0 {
+ t.Fatalf("none of the addresses faulted")
+ }
+}
+
+// testSetPanicOnFault tests one potentially faulting address.
+// It deliberately constructs and uses an invalid pointer,
+// so mark it as nocheckptr.
+//go:nocheckptr
+func testSetPanicOnFault(t *testing.T, addr uintptr, nfault *int) {
+ if GOOS == "js" {
+ t.Skip("js does not support catching faults")
+ }
+
+ defer func() {
+ if err := recover(); err != nil {
+ *nfault++
+ }
+ }()
+
+ // The read should fault, except that sometimes we hit
+ // addresses that have had C or kernel pages mapped there
+ // readable by user code. So just log the content.
+ // If no addresses fault, we'll fail the test.
+ v := *(*byte)(unsafe.Pointer(addr))
+ t.Logf("addr %#x: %#x\n", addr, v)
+}
+
+func eqstring_generic(s1, s2 string) bool {
+ if len(s1) != len(s2) {
+ return false
+ }
+ // optimization in assembly versions:
+ // if s1.str == s2.str { return true }
+ for i := 0; i < len(s1); i++ {
+ if s1[i] != s2[i] {
+ return false
+ }
+ }
+ return true
+}
+
+func TestEqString(t *testing.T) {
+ // This isn't really an exhaustive test of == on strings, it's
+ // just a convenient way of documenting (via eqstring_generic)
+ // what == does.
+ s := []string{
+ "",
+ "a",
+ "c",
+ "aaa",
+ "ccc",
+ "cccc"[:3], // same contents, different string
+ "1234567890",
+ }
+ for _, s1 := range s {
+ for _, s2 := range s {
+ x := s1 == s2
+ y := eqstring_generic(s1, s2)
+ if x != y {
+ t.Errorf(`("%s" == "%s") = %t, want %t`, s1, s2, x, y)
+ }
+ }
+ }
+}
+
+func TestTrailingZero(t *testing.T) {
+ // make sure we add padding for structs with trailing zero-sized fields
+ type T1 struct {
+ n int32
+ z [0]byte
+ }
+ if unsafe.Sizeof(T1{}) != 8 {
+ t.Errorf("sizeof(%#v)==%d, want 8", T1{}, unsafe.Sizeof(T1{}))
+ }
+ type T2 struct {
+ n int64
+ z struct{}
+ }
+ if unsafe.Sizeof(T2{}) != 8+unsafe.Sizeof(Uintreg(0)) {
+ t.Errorf("sizeof(%#v)==%d, want %d", T2{}, unsafe.Sizeof(T2{}), 8+unsafe.Sizeof(Uintreg(0)))
+ }
+ type T3 struct {
+ n byte
+ z [4]struct{}
+ }
+ if unsafe.Sizeof(T3{}) != 2 {
+ t.Errorf("sizeof(%#v)==%d, want 2", T3{}, unsafe.Sizeof(T3{}))
+ }
+ // make sure padding can double for both zerosize and alignment
+ type T4 struct {
+ a int32
+ b int16
+ c int8
+ z struct{}
+ }
+ if unsafe.Sizeof(T4{}) != 8 {
+ t.Errorf("sizeof(%#v)==%d, want 8", T4{}, unsafe.Sizeof(T4{}))
+ }
+ // make sure we don't pad a zero-sized thing
+ type T5 struct {
+ }
+ if unsafe.Sizeof(T5{}) != 0 {
+ t.Errorf("sizeof(%#v)==%d, want 0", T5{}, unsafe.Sizeof(T5{}))
+ }
+}
+
+func TestAppendGrowth(t *testing.T) {
+ var x []int64
+ check := func(want int) {
+ if cap(x) != want {
+ t.Errorf("len=%d, cap=%d, want cap=%d", len(x), cap(x), want)
+ }
+ }
+
+ check(0)
+ want := 1
+ for i := 1; i <= 100; i++ {
+ x = append(x, 1)
+ check(want)
+ if i&(i-1) == 0 {
+ want = 2 * i
+ }
+ }
+}
+
+var One = []int64{1}
+
+func TestAppendSliceGrowth(t *testing.T) {
+ var x []int64
+ check := func(want int) {
+ if cap(x) != want {
+ t.Errorf("len=%d, cap=%d, want cap=%d", len(x), cap(x), want)
+ }
+ }
+
+ check(0)
+ want := 1
+ for i := 1; i <= 100; i++ {
+ x = append(x, One...)
+ check(want)
+ if i&(i-1) == 0 {
+ want = 2 * i
+ }
+ }
+}
+
+func TestGoroutineProfileTrivial(t *testing.T) {
+ // Calling GoroutineProfile twice in a row should find the same number of goroutines,
+ // but it's possible there are goroutines just about to exit, so we might end up
+ // with fewer in the second call. Try a few times; it should converge once those
+ // zombies are gone.
+ for i := 0; ; i++ {
+ n1, ok := GoroutineProfile(nil) // should fail, there's at least 1 goroutine
+ if n1 < 1 || ok {
+ t.Fatalf("GoroutineProfile(nil) = %d, %v, want >0, false", n1, ok)
+ }
+ n2, ok := GoroutineProfile(make([]StackRecord, n1))
+ if n2 == n1 && ok {
+ break
+ }
+ t.Logf("GoroutineProfile(%d) = %d, %v, want %d, true", n1, n2, ok, n1)
+ if i >= 10 {
+ t.Fatalf("GoroutineProfile not converging")
+ }
+ }
+}
+
+func TestVersion(t *testing.T) {
+ // Test that version does not contain \r or \n.
+ vers := Version()
+ if strings.Contains(vers, "\r") || strings.Contains(vers, "\n") {
+ t.Fatalf("cr/nl in version: %q", vers)
+ }
+}
diff --git a/src/runtime/runtime_unix_test.go b/src/runtime/runtime_unix_test.go
new file mode 100644
index 0000000..b0cbbbe
--- /dev/null
+++ b/src/runtime/runtime_unix_test.go
@@ -0,0 +1,56 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Only works on systems with syscall.Close.
+// We need a fast system call to provoke the race,
+// and Close(-1) is nearly universally fast.
+
+// +build aix darwin dragonfly freebsd linux netbsd openbsd plan9
+
+package runtime_test
+
+import (
+ "runtime"
+ "sync"
+ "sync/atomic"
+ "syscall"
+ "testing"
+)
+
+func TestGoroutineProfile(t *testing.T) {
+ // GoroutineProfile used to use the wrong starting sp for
+ // goroutines coming out of system calls, causing possible
+ // crashes.
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(100))
+
+ var stop uint32
+ defer atomic.StoreUint32(&stop, 1) // in case of panic
+
+ var wg sync.WaitGroup
+ for i := 0; i < 4; i++ {
+ wg.Add(1)
+ go func() {
+ for atomic.LoadUint32(&stop) == 0 {
+ syscall.Close(-1)
+ }
+ wg.Done()
+ }()
+ }
+
+ max := 10000
+ if testing.Short() {
+ max = 100
+ }
+ stk := make([]runtime.StackRecord, 128)
+ for n := 0; n < max; n++ {
+ _, ok := runtime.GoroutineProfile(stk)
+ if !ok {
+ t.Fatalf("GoroutineProfile failed")
+ }
+ }
+
+ // If the program didn't crash, we passed.
+ atomic.StoreUint32(&stop, 1)
+ wg.Wait()
+}
diff --git a/src/runtime/rwmutex.go b/src/runtime/rwmutex.go
new file mode 100644
index 0000000..7713c3f
--- /dev/null
+++ b/src/runtime/rwmutex.go
@@ -0,0 +1,125 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+)
+
+// This is a copy of sync/rwmutex.go rewritten to work in the runtime.
+
+// A rwmutex is a reader/writer mutual exclusion lock.
+// The lock can be held by an arbitrary number of readers or a single writer.
+// This is a variant of sync.RWMutex, for the runtime package.
+// Like mutex, rwmutex blocks the calling M.
+// It does not interact with the goroutine scheduler.
+type rwmutex struct {
+ rLock mutex // protects readers, readerPass, writer
+ readers muintptr // list of pending readers
+ readerPass uint32 // number of pending readers to skip readers list
+
+ wLock mutex // serializes writers
+ writer muintptr // pending writer waiting for active readers to complete
+
+ readerCount uint32 // number of pending readers
+ readerWait uint32 // number of departing readers
+}
+
+const rwmutexMaxReaders = 1 << 30
+
+// rlock locks rw for reading.
+func (rw *rwmutex) rlock() {
+ // The reader must not be allowed to lose its P or else other
+ // things blocking on the lock may consume all of the Ps and
+ // deadlock (issue #20903). Alternatively, we could drop the P
+ // while sleeping.
+ acquirem()
+ if int32(atomic.Xadd(&rw.readerCount, 1)) < 0 {
+ // A writer is pending. Park on the reader queue.
+ systemstack(func() {
+ lockWithRank(&rw.rLock, lockRankRwmutexR)
+ if rw.readerPass > 0 {
+ // Writer finished.
+ rw.readerPass -= 1
+ unlock(&rw.rLock)
+ } else {
+ // Queue this reader to be woken by
+ // the writer.
+ m := getg().m
+ m.schedlink = rw.readers
+ rw.readers.set(m)
+ unlock(&rw.rLock)
+ notesleep(&m.park)
+ noteclear(&m.park)
+ }
+ })
+ }
+}
+
+// runlock undoes a single rlock call on rw.
+func (rw *rwmutex) runlock() {
+ if r := int32(atomic.Xadd(&rw.readerCount, -1)); r < 0 {
+ if r+1 == 0 || r+1 == -rwmutexMaxReaders {
+ throw("runlock of unlocked rwmutex")
+ }
+ // A writer is pending.
+ if atomic.Xadd(&rw.readerWait, -1) == 0 {
+ // The last reader unblocks the writer.
+ lockWithRank(&rw.rLock, lockRankRwmutexR)
+ w := rw.writer.ptr()
+ if w != nil {
+ notewakeup(&w.park)
+ }
+ unlock(&rw.rLock)
+ }
+ }
+ releasem(getg().m)
+}
+
+// lock locks rw for writing.
+func (rw *rwmutex) lock() {
+ // Resolve competition with other writers and stick to our P.
+ lockWithRank(&rw.wLock, lockRankRwmutexW)
+ m := getg().m
+ // Announce that there is a pending writer.
+ r := int32(atomic.Xadd(&rw.readerCount, -rwmutexMaxReaders)) + rwmutexMaxReaders
+ // Wait for any active readers to complete.
+ lockWithRank(&rw.rLock, lockRankRwmutexR)
+ if r != 0 && atomic.Xadd(&rw.readerWait, r) != 0 {
+ // Wait for reader to wake us up.
+ systemstack(func() {
+ rw.writer.set(m)
+ unlock(&rw.rLock)
+ notesleep(&m.park)
+ noteclear(&m.park)
+ })
+ } else {
+ unlock(&rw.rLock)
+ }
+}
+
+// unlock unlocks rw for writing.
+func (rw *rwmutex) unlock() {
+ // Announce to readers that there is no active writer.
+ r := int32(atomic.Xadd(&rw.readerCount, rwmutexMaxReaders))
+ if r >= rwmutexMaxReaders {
+ throw("unlock of unlocked rwmutex")
+ }
+ // Unblock blocked readers.
+ lockWithRank(&rw.rLock, lockRankRwmutexR)
+ for rw.readers.ptr() != nil {
+ reader := rw.readers.ptr()
+ rw.readers = reader.schedlink
+ reader.schedlink.set(nil)
+ notewakeup(&reader.park)
+ r -= 1
+ }
+ // If r > 0, there are pending readers that aren't on the
+ // queue. Tell them to skip waiting.
+ rw.readerPass += uint32(r)
+ unlock(&rw.rLock)
+ // Allow other writers to proceed.
+ unlock(&rw.wLock)
+}
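
A standalone sketch of the readerCount trick used above: a pending writer subtracts rwmutexMaxReaders so that readers observe a negative count and park, while the writer recovers the number of still-active readers it must wait for (plain int32 here instead of the runtime's uint32 with Xadd):

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    const rwmutexMaxReaders = 1 << 30

    func main() {
        var readerCount int32

        // Two readers arrive: each increments readerCount.
        atomic.AddInt32(&readerCount, 1)
        atomic.AddInt32(&readerCount, 1)

        // A writer announces itself by driving the count negative; r is
        // how many readers are still active and must be waited for.
        r := atomic.AddInt32(&readerCount, -rwmutexMaxReaders) + rwmutexMaxReaders
        fmt.Println("active readers the writer waits for:", r) // 2

        // A reader arriving now sees a negative count, so it knows a
        // writer is pending and queues instead of proceeding.
        if atomic.AddInt32(&readerCount, 1) < 0 {
            fmt.Println("late reader parks behind the pending writer")
        }
    }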
diff --git a/src/runtime/rwmutex_test.go b/src/runtime/rwmutex_test.go
new file mode 100644
index 0000000..291a32e
--- /dev/null
+++ b/src/runtime/rwmutex_test.go
@@ -0,0 +1,186 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// GOMAXPROCS=10 go test
+
+// This is a copy of sync/rwmutex_test.go rewritten to test the
+// runtime rwmutex.
+
+package runtime_test
+
+import (
+ "fmt"
+ . "runtime"
+ "runtime/debug"
+ "sync/atomic"
+ "testing"
+)
+
+func parallelReader(m *RWMutex, clocked chan bool, cunlock *uint32, cdone chan bool) {
+ m.RLock()
+ clocked <- true
+ for atomic.LoadUint32(cunlock) == 0 {
+ }
+ m.RUnlock()
+ cdone <- true
+}
+
+func doTestParallelReaders(numReaders int) {
+ GOMAXPROCS(numReaders + 1)
+ var m RWMutex
+ clocked := make(chan bool, numReaders)
+ var cunlock uint32
+ cdone := make(chan bool)
+ for i := 0; i < numReaders; i++ {
+ go parallelReader(&m, clocked, &cunlock, cdone)
+ }
+ // Wait for all parallel RLock()s to succeed.
+ for i := 0; i < numReaders; i++ {
+ <-clocked
+ }
+ atomic.StoreUint32(&cunlock, 1)
+ // Wait for the goroutines to finish.
+ for i := 0; i < numReaders; i++ {
+ <-cdone
+ }
+}
+
+func TestParallelRWMutexReaders(t *testing.T) {
+ if GOARCH == "wasm" {
+ t.Skip("wasm has no threads yet")
+ }
+ defer GOMAXPROCS(GOMAXPROCS(-1))
+ // If runtime triggers a forced GC during this test then it will deadlock,
+ // since the goroutines can't be stopped/preempted.
+ // Disable GC for this test (see issue #10958).
+ defer debug.SetGCPercent(debug.SetGCPercent(-1))
+ doTestParallelReaders(1)
+ doTestParallelReaders(3)
+ doTestParallelReaders(4)
+}
+
+func reader(rwm *RWMutex, num_iterations int, activity *int32, cdone chan bool) {
+ for i := 0; i < num_iterations; i++ {
+ rwm.RLock()
+ n := atomic.AddInt32(activity, 1)
+ if n < 1 || n >= 10000 {
+ panic(fmt.Sprintf("wlock(%d)\n", n))
+ }
+ for i := 0; i < 100; i++ {
+ }
+ atomic.AddInt32(activity, -1)
+ rwm.RUnlock()
+ }
+ cdone <- true
+}
+
+func writer(rwm *RWMutex, num_iterations int, activity *int32, cdone chan bool) {
+ for i := 0; i < num_iterations; i++ {
+ rwm.Lock()
+ n := atomic.AddInt32(activity, 10000)
+ if n != 10000 {
+ panic(fmt.Sprintf("wlock(%d)\n", n))
+ }
+ for i := 0; i < 100; i++ {
+ }
+ atomic.AddInt32(activity, -10000)
+ rwm.Unlock()
+ }
+ cdone <- true
+}
+
+func HammerRWMutex(gomaxprocs, numReaders, num_iterations int) {
+ GOMAXPROCS(gomaxprocs)
+ // Number of active readers + 10000 * number of active writers.
+ var activity int32
+ var rwm RWMutex
+ cdone := make(chan bool)
+ go writer(&rwm, num_iterations, &activity, cdone)
+ var i int
+ for i = 0; i < numReaders/2; i++ {
+ go reader(&rwm, num_iterations, &activity, cdone)
+ }
+ go writer(&rwm, num_iterations, &activity, cdone)
+ for ; i < numReaders; i++ {
+ go reader(&rwm, num_iterations, &activity, cdone)
+ }
+ // Wait for the 2 writers and all readers to finish.
+ for i := 0; i < 2+numReaders; i++ {
+ <-cdone
+ }
+}
+
+func TestRWMutex(t *testing.T) {
+ defer GOMAXPROCS(GOMAXPROCS(-1))
+ n := 1000
+ if testing.Short() {
+ n = 5
+ }
+ HammerRWMutex(1, 1, n)
+ HammerRWMutex(1, 3, n)
+ HammerRWMutex(1, 10, n)
+ HammerRWMutex(4, 1, n)
+ HammerRWMutex(4, 3, n)
+ HammerRWMutex(4, 10, n)
+ HammerRWMutex(10, 1, n)
+ HammerRWMutex(10, 3, n)
+ HammerRWMutex(10, 10, n)
+ HammerRWMutex(10, 5, n)
+}
+
+func BenchmarkRWMutexUncontended(b *testing.B) {
+ type PaddedRWMutex struct {
+ RWMutex
+ pad [32]uint32
+ }
+ b.RunParallel(func(pb *testing.PB) {
+ var rwm PaddedRWMutex
+ for pb.Next() {
+ rwm.RLock()
+ rwm.RLock()
+ rwm.RUnlock()
+ rwm.RUnlock()
+ rwm.Lock()
+ rwm.Unlock()
+ }
+ })
+}
+
+func benchmarkRWMutex(b *testing.B, localWork, writeRatio int) {
+ var rwm RWMutex
+ b.RunParallel(func(pb *testing.PB) {
+ foo := 0
+ for pb.Next() {
+ foo++
+ if foo%writeRatio == 0 {
+ rwm.Lock()
+ rwm.Unlock()
+ } else {
+ rwm.RLock()
+ for i := 0; i != localWork; i += 1 {
+ foo *= 2
+ foo /= 2
+ }
+ rwm.RUnlock()
+ }
+ }
+ _ = foo
+ })
+}
+
+func BenchmarkRWMutexWrite100(b *testing.B) {
+ benchmarkRWMutex(b, 0, 100)
+}
+
+func BenchmarkRWMutexWrite10(b *testing.B) {
+ benchmarkRWMutex(b, 0, 10)
+}
+
+func BenchmarkRWMutexWorkWrite100(b *testing.B) {
+ benchmarkRWMutex(b, 100, 100)
+}
+
+func BenchmarkRWMutexWorkWrite10(b *testing.B) {
+ benchmarkRWMutex(b, 100, 10)
+}
diff --git a/src/runtime/select.go b/src/runtime/select.go
new file mode 100644
index 0000000..e72761b
--- /dev/null
+++ b/src/runtime/select.go
@@ -0,0 +1,616 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// This file contains the implementation of Go select statements.
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const debugSelect = false
+
+// Select case descriptor.
+// Known to compiler.
+// Changes here must also be made in src/cmd/internal/gc/select.go's scasetype.
+type scase struct {
+ c *hchan // chan
+ elem unsafe.Pointer // data element
+}
+
+var (
+ chansendpc = funcPC(chansend)
+ chanrecvpc = funcPC(chanrecv)
+)
+
+func selectsetpc(pc *uintptr) {
+ *pc = getcallerpc()
+}
+
+func sellock(scases []scase, lockorder []uint16) {
+ var c *hchan
+ for _, o := range lockorder {
+ c0 := scases[o].c
+ if c0 != c {
+ c = c0
+ lock(&c.lock)
+ }
+ }
+}
+
+func selunlock(scases []scase, lockorder []uint16) {
+ // We must be very careful here to not touch sel after we have unlocked
+ // the last lock, because sel can be freed right after the last unlock.
+ // Consider the following situation.
+ // First M calls runtime·park() in runtime·selectgo() passing the sel.
+ // Once runtime·park() has unlocked the last lock, another M makes
+ // the G that calls select runnable again and schedules it for execution.
+ // When the G runs on another M, it locks all the locks and frees sel.
+ // Now if the first M touches sel, it will access freed memory.
+ for i := len(lockorder) - 1; i >= 0; i-- {
+ c := scases[lockorder[i]].c
+ if i > 0 && c == scases[lockorder[i-1]].c {
+ continue // will unlock it on the next iteration
+ }
+ unlock(&c.lock)
+ }
+}
+
+func selparkcommit(gp *g, _ unsafe.Pointer) bool {
+ // There are unlocked sudogs that point into gp's stack. Stack
+ // copying must lock the channels of those sudogs.
+ // Set activeStackChans here instead of before we try parking
+ // because we could self-deadlock in stack growth on a
+ // channel lock.
+ gp.activeStackChans = true
+ // Mark that it's safe for stack shrinking to occur now,
+ // because any thread acquiring this G's stack for shrinking
+ // is guaranteed to observe activeStackChans after this store.
+ atomic.Store8(&gp.parkingOnChan, 0)
+ // Make sure we unlock after setting activeStackChans and
+ // unsetting parkingOnChan. The moment we unlock any of the
+ // channel locks we risk gp getting readied by a channel operation
+ // and so gp could continue running before everything before the
+ // unlock is visible (even to gp itself).
+
+ // This must not access gp's stack (see gopark). In
+ // particular, it must not access the *hselect. That's okay,
+ // because by the time this is called, gp.waiting has all
+ // channels in lock order.
+ var lastc *hchan
+ for sg := gp.waiting; sg != nil; sg = sg.waitlink {
+ if sg.c != lastc && lastc != nil {
+ // As soon as we unlock the channel, fields in
+ // any sudog with that channel may change,
+ // including c and waitlink. Since multiple
+ // sudogs may have the same channel, we unlock
+ // only after we've passed the last instance
+ // of a channel.
+ unlock(&lastc.lock)
+ }
+ lastc = sg.c
+ }
+ if lastc != nil {
+ unlock(&lastc.lock)
+ }
+ return true
+}
+
+func block() {
+ gopark(nil, nil, waitReasonSelectNoCases, traceEvGoStop, 1) // forever
+}
+
+// selectgo implements the select statement.
+//
+// cas0 points to an array of type [ncases]scase, and order0 points to
+// an array of type [2*ncases]uint16 where ncases must be <= 65536.
+// Both reside on the goroutine's stack (regardless of any escaping in
+// selectgo).
+//
+// For race detector builds, pc0 points to an array of type
+// [ncases]uintptr (also on the stack); for other builds, it's set to
+// nil.
+//
+// selectgo returns the index of the chosen scase, which matches the
+// ordinal position of its respective select{recv,send,default} call.
+// Also, if the chosen scase was a receive operation, it reports whether
+// a value was received.
+func selectgo(cas0 *scase, order0 *uint16, pc0 *uintptr, nsends, nrecvs int, block bool) (int, bool) {
+ if debugSelect {
+ print("select: cas0=", cas0, "\n")
+ }
+
+ // NOTE: In order to maintain a lean stack size, the number of scases
+ // is capped at 65536.
+ cas1 := (*[1 << 16]scase)(unsafe.Pointer(cas0))
+ order1 := (*[1 << 17]uint16)(unsafe.Pointer(order0))
+
+ ncases := nsends + nrecvs
+ scases := cas1[:ncases:ncases]
+ pollorder := order1[:ncases:ncases]
+ lockorder := order1[ncases:][:ncases:ncases]
+ // NOTE: pollorder/lockorder's underlying array was not zero-initialized by compiler.
+
+ // Even when raceenabled is true, there might be select
+ // statements in packages compiled without -race (e.g.,
+ // ensureSigM in runtime/signal_unix.go).
+ var pcs []uintptr
+ if raceenabled && pc0 != nil {
+ pc1 := (*[1 << 16]uintptr)(unsafe.Pointer(pc0))
+ pcs = pc1[:ncases:ncases]
+ }
+ casePC := func(casi int) uintptr {
+ if pcs == nil {
+ return 0
+ }
+ return pcs[casi]
+ }
+
+ var t0 int64
+ if blockprofilerate > 0 {
+ t0 = cputicks()
+ }
+
+ // The compiler rewrites selects that statically have
+ // only 0 or 1 cases plus default into simpler constructs.
+ // The only way we can end up with such small sel.ncase
+ // values here is for a larger select in which most channels
+ // have been nilled out. The general code handles those
+ // cases correctly, and they are rare enough not to bother
+ // optimizing (and needing to test).
+
+ // generate permuted order
+ norder := 0
+ for i := range scases {
+ cas := &scases[i]
+
+ // Omit cases without channels from the poll and lock orders.
+ if cas.c == nil {
+ cas.elem = nil // allow GC
+ continue
+ }
+
+ j := fastrandn(uint32(norder + 1))
+ pollorder[norder] = pollorder[j]
+ pollorder[j] = uint16(i)
+ norder++
+ }
+ pollorder = pollorder[:norder]
+ lockorder = lockorder[:norder]
+
+ // sort the cases by Hchan address to get the locking order.
+ // simple heap sort, to guarantee n log n time and constant stack footprint.
+ for i := range lockorder {
+ j := i
+ // Start with the pollorder to permute cases on the same channel.
+ c := scases[pollorder[i]].c
+ for j > 0 && scases[lockorder[(j-1)/2]].c.sortkey() < c.sortkey() {
+ k := (j - 1) / 2
+ lockorder[j] = lockorder[k]
+ j = k
+ }
+ lockorder[j] = pollorder[i]
+ }
+ for i := len(lockorder) - 1; i >= 0; i-- {
+ o := lockorder[i]
+ c := scases[o].c
+ lockorder[i] = lockorder[0]
+ j := 0
+ for {
+ k := j*2 + 1
+ if k >= i {
+ break
+ }
+ if k+1 < i && scases[lockorder[k]].c.sortkey() < scases[lockorder[k+1]].c.sortkey() {
+ k++
+ }
+ if c.sortkey() < scases[lockorder[k]].c.sortkey() {
+ lockorder[j] = lockorder[k]
+ j = k
+ continue
+ }
+ break
+ }
+ lockorder[j] = o
+ }
+
+ if debugSelect {
+ for i := 0; i+1 < len(lockorder); i++ {
+ if scases[lockorder[i]].c.sortkey() > scases[lockorder[i+1]].c.sortkey() {
+ print("i=", i, " x=", lockorder[i], " y=", lockorder[i+1], "\n")
+ throw("select: broken sort")
+ }
+ }
+ }
+
+ // lock all the channels involved in the select
+ sellock(scases, lockorder)
+
+ var (
+ gp *g
+ sg *sudog
+ c *hchan
+ k *scase
+ sglist *sudog
+ sgnext *sudog
+ qp unsafe.Pointer
+ nextp **sudog
+ )
+
+ // pass 1 - look for something already waiting
+ var casi int
+ var cas *scase
+ var caseSuccess bool
+ var caseReleaseTime int64 = -1
+ var recvOK bool
+ for _, casei := range pollorder {
+ casi = int(casei)
+ cas = &scases[casi]
+ c = cas.c
+
+ if casi >= nsends {
+ sg = c.sendq.dequeue()
+ if sg != nil {
+ goto recv
+ }
+ if c.qcount > 0 {
+ goto bufrecv
+ }
+ if c.closed != 0 {
+ goto rclose
+ }
+ } else {
+ if raceenabled {
+ racereadpc(c.raceaddr(), casePC(casi), chansendpc)
+ }
+ if c.closed != 0 {
+ goto sclose
+ }
+ sg = c.recvq.dequeue()
+ if sg != nil {
+ goto send
+ }
+ if c.qcount < c.dataqsiz {
+ goto bufsend
+ }
+ }
+ }
+
+ if !block {
+ selunlock(scases, lockorder)
+ casi = -1
+ goto retc
+ }
+
+ // pass 2 - enqueue on all chans
+ gp = getg()
+ if gp.waiting != nil {
+ throw("gp.waiting != nil")
+ }
+ nextp = &gp.waiting
+ for _, casei := range lockorder {
+ casi = int(casei)
+ cas = &scases[casi]
+ c = cas.c
+ sg := acquireSudog()
+ sg.g = gp
+ sg.isSelect = true
+ // No stack splits between assigning elem and enqueuing
+ // sg on gp.waiting where copystack can find it.
+ sg.elem = cas.elem
+ sg.releasetime = 0
+ if t0 != 0 {
+ sg.releasetime = -1
+ }
+ sg.c = c
+ // Construct waiting list in lock order.
+ *nextp = sg
+ nextp = &sg.waitlink
+
+ if casi < nsends {
+ c.sendq.enqueue(sg)
+ } else {
+ c.recvq.enqueue(sg)
+ }
+ }
+
+ // wait for someone to wake us up
+ gp.param = nil
+ // Signal to anyone trying to shrink our stack that we're about
+ // to park on a channel. The window between when this G's status
+ // changes and when we set gp.activeStackChans is not safe for
+ // stack shrinking.
+ atomic.Store8(&gp.parkingOnChan, 1)
+ gopark(selparkcommit, nil, waitReasonSelect, traceEvGoBlockSelect, 1)
+ gp.activeStackChans = false
+
+ sellock(scases, lockorder)
+
+ gp.selectDone = 0
+ sg = (*sudog)(gp.param)
+ gp.param = nil
+
+ // pass 3 - dequeue from unsuccessful chans
+ // otherwise they stack up on quiet channels
+ // record the successful case, if any.
+ // We singly-linked up the SudoGs in lock order.
+ casi = -1
+ cas = nil
+ caseSuccess = false
+ sglist = gp.waiting
+ // Clear all elem before unlinking from gp.waiting.
+ for sg1 := gp.waiting; sg1 != nil; sg1 = sg1.waitlink {
+ sg1.isSelect = false
+ sg1.elem = nil
+ sg1.c = nil
+ }
+ gp.waiting = nil
+
+ for _, casei := range lockorder {
+ k = &scases[casei]
+ if sg == sglist {
+ // sg has already been dequeued by the G that woke us up.
+ casi = int(casei)
+ cas = k
+ caseSuccess = sglist.success
+ if sglist.releasetime > 0 {
+ caseReleaseTime = sglist.releasetime
+ }
+ } else {
+ c = k.c
+ if int(casei) < nsends {
+ c.sendq.dequeueSudoG(sglist)
+ } else {
+ c.recvq.dequeueSudoG(sglist)
+ }
+ }
+ sgnext = sglist.waitlink
+ sglist.waitlink = nil
+ releaseSudog(sglist)
+ sglist = sgnext
+ }
+
+ if cas == nil {
+ throw("selectgo: bad wakeup")
+ }
+
+ c = cas.c
+
+ if debugSelect {
+ print("wait-return: cas0=", cas0, " c=", c, " cas=", cas, " send=", casi < nsends, "\n")
+ }
+
+ if casi < nsends {
+ if !caseSuccess {
+ goto sclose
+ }
+ } else {
+ recvOK = caseSuccess
+ }
+
+ if raceenabled {
+ if casi < nsends {
+ raceReadObjectPC(c.elemtype, cas.elem, casePC(casi), chansendpc)
+ } else if cas.elem != nil {
+ raceWriteObjectPC(c.elemtype, cas.elem, casePC(casi), chanrecvpc)
+ }
+ }
+ if msanenabled {
+ if casi < nsends {
+ msanread(cas.elem, c.elemtype.size)
+ } else if cas.elem != nil {
+ msanwrite(cas.elem, c.elemtype.size)
+ }
+ }
+
+ selunlock(scases, lockorder)
+ goto retc
+
+bufrecv:
+ // can receive from buffer
+ if raceenabled {
+ if cas.elem != nil {
+ raceWriteObjectPC(c.elemtype, cas.elem, casePC(casi), chanrecvpc)
+ }
+ racenotify(c, c.recvx, nil)
+ }
+ if msanenabled && cas.elem != nil {
+ msanwrite(cas.elem, c.elemtype.size)
+ }
+ recvOK = true
+ qp = chanbuf(c, c.recvx)
+ if cas.elem != nil {
+ typedmemmove(c.elemtype, cas.elem, qp)
+ }
+ typedmemclr(c.elemtype, qp)
+ c.recvx++
+ if c.recvx == c.dataqsiz {
+ c.recvx = 0
+ }
+ c.qcount--
+ selunlock(scases, lockorder)
+ goto retc
+
+bufsend:
+ // can send to buffer
+ if raceenabled {
+ racenotify(c, c.sendx, nil)
+ raceReadObjectPC(c.elemtype, cas.elem, casePC(casi), chansendpc)
+ }
+ if msanenabled {
+ msanread(cas.elem, c.elemtype.size)
+ }
+ typedmemmove(c.elemtype, chanbuf(c, c.sendx), cas.elem)
+ c.sendx++
+ if c.sendx == c.dataqsiz {
+ c.sendx = 0
+ }
+ c.qcount++
+ selunlock(scases, lockorder)
+ goto retc
+
+recv:
+ // can receive from sleeping sender (sg)
+ recv(c, sg, cas.elem, func() { selunlock(scases, lockorder) }, 2)
+ if debugSelect {
+ print("syncrecv: cas0=", cas0, " c=", c, "\n")
+ }
+ recvOK = true
+ goto retc
+
+rclose:
+ // read at end of closed channel
+ selunlock(scases, lockorder)
+ recvOK = false
+ if cas.elem != nil {
+ typedmemclr(c.elemtype, cas.elem)
+ }
+ if raceenabled {
+ raceacquire(c.raceaddr())
+ }
+ goto retc
+
+send:
+ // can send to a sleeping receiver (sg)
+ if raceenabled {
+ raceReadObjectPC(c.elemtype, cas.elem, casePC(casi), chansendpc)
+ }
+ if msanenabled {
+ msanread(cas.elem, c.elemtype.size)
+ }
+ send(c, sg, cas.elem, func() { selunlock(scases, lockorder) }, 2)
+ if debugSelect {
+ print("syncsend: cas0=", cas0, " c=", c, "\n")
+ }
+ goto retc
+
+retc:
+ if caseReleaseTime > 0 {
+ blockevent(caseReleaseTime-t0, 1)
+ }
+ return casi, recvOK
+
+sclose:
+ // send on closed channel
+ selunlock(scases, lockorder)
+ panic(plainError("send on closed channel"))
+}
+
+func (c *hchan) sortkey() uintptr {
+ return uintptr(unsafe.Pointer(c))
+}
+
+// A runtimeSelect is a single case passed to rselect.
+// This must match ../reflect/value.go:/runtimeSelect
+type runtimeSelect struct {
+ dir selectDir
+ typ unsafe.Pointer // channel type (not used here)
+ ch *hchan // channel
+ val unsafe.Pointer // ptr to data (SendDir) or ptr to receive buffer (RecvDir)
+}
+
+// These values must match ../reflect/value.go:/SelectDir.
+type selectDir int
+
+const (
+ _ selectDir = iota
+ selectSend // case Chan <- Send
+ selectRecv // case <-Chan:
+ selectDefault // default
+)
+
+//go:linkname reflect_rselect reflect.rselect
+func reflect_rselect(cases []runtimeSelect) (int, bool) {
+ if len(cases) == 0 {
+ block()
+ }
+ sel := make([]scase, len(cases))
+ orig := make([]int, len(cases))
+ nsends, nrecvs := 0, 0
+ dflt := -1
+ for i, rc := range cases {
+ var j int
+ switch rc.dir {
+ case selectDefault:
+ dflt = i
+ continue
+ case selectSend:
+ j = nsends
+ nsends++
+ case selectRecv:
+ nrecvs++
+ j = len(cases) - nrecvs
+ }
+
+ sel[j] = scase{c: rc.ch, elem: rc.val}
+ orig[j] = i
+ }
+
+ // Only a default case.
+ if nsends+nrecvs == 0 {
+ return dflt, false
+ }
+
+ // Compact sel and orig if necessary.
+ if nsends+nrecvs < len(cases) {
+ copy(sel[nsends:], sel[len(cases)-nrecvs:])
+ copy(orig[nsends:], orig[len(cases)-nrecvs:])
+ }
+
+ order := make([]uint16, 2*(nsends+nrecvs))
+ var pc0 *uintptr
+ if raceenabled {
+ pcs := make([]uintptr, nsends+nrecvs)
+ for i := range pcs {
+ selectsetpc(&pcs[i])
+ }
+ pc0 = &pcs[0]
+ }
+
+ chosen, recvOK := selectgo(&sel[0], &order[0], pc0, nsends, nrecvs, dflt == -1)
+
+ // Translate chosen back to caller's ordering.
+ if chosen < 0 {
+ chosen = dflt
+ } else {
+ chosen = orig[chosen]
+ }
+ return chosen, recvOK
+}
+
+func (q *waitq) dequeueSudoG(sgp *sudog) {
+ x := sgp.prev
+ y := sgp.next
+ if x != nil {
+ if y != nil {
+ // middle of queue
+ x.next = y
+ y.prev = x
+ sgp.next = nil
+ sgp.prev = nil
+ return
+ }
+ // end of queue
+ x.next = nil
+ q.last = x
+ sgp.prev = nil
+ return
+ }
+ if y != nil {
+ // start of queue
+ y.prev = nil
+ q.first = y
+ sgp.next = nil
+ return
+ }
+
+ // x==y==nil. Either sgp is the only element in the queue,
+ // or it has already been removed. Use q.first to disambiguate.
+ if q.first == sgp {
+ q.first = nil
+ q.last = nil
+ }
+}
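
reflect_rselect above is the backend of reflect.Select; a short user-level example whose cases go through exactly this send/recv reordering and back-translation to the caller's ordering:

    package main

    import (
        "fmt"
        "reflect"
    )

    func main() {
        ch1 := make(chan int) // unbuffered: the send case below is not ready
        ch2 := make(chan string, 1)
        ch2 <- "hello" // the receive case below is ready

        cases := []reflect.SelectCase{
            {Dir: reflect.SelectSend, Chan: reflect.ValueOf(ch1), Send: reflect.ValueOf(42)},
            {Dir: reflect.SelectRecv, Chan: reflect.ValueOf(ch2)},
            {Dir: reflect.SelectDefault},
        }
        chosen, v, ok := reflect.Select(cases)
        fmt.Println(chosen, v, ok) // 1 hello true
    }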
diff --git a/src/runtime/sema.go b/src/runtime/sema.go
new file mode 100644
index 0000000..f94c1aa
--- /dev/null
+++ b/src/runtime/sema.go
@@ -0,0 +1,617 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Semaphore implementation exposed to Go.
+// Intended use is to provide a sleep and wakeup
+// primitive that can be used in the contended case
+// of other synchronization primitives.
+// Thus it targets the same goal as Linux's futex,
+// but it has much simpler semantics.
+//
+// That is, don't think of these as semaphores.
+// Think of them as a way to implement sleep and wakeup
+// such that every sleep is paired with a single wakeup,
+// even if, due to races, the wakeup happens before the sleep.
+//
+// See Mullender and Cox, ``Semaphores in Plan 9,''
+// https://swtch.com/semaphore.pdf
+
+package runtime
+
+import (
+ "internal/cpu"
+ "runtime/internal/atomic"
+ "unsafe"
+)
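+
+// A minimal usage sketch of the sleep/wakeup pairing described above,
+// assuming a hypothetical counter word sema owned by some higher-level
+// synchronization primitive:
+//
+//	var sema uint32   // 0 means no wakeups are pending
+//
+//	// contended path of the waiter
+//	semacquire(&sema)   // sleeps until a matching semrelease
+//
+//	// owner handing off
+//	semrelease(&sema)   // provides exactly one wakeup, possibly before the sleep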
+
+// Asynchronous semaphore for sync.Mutex.
+
+// A semaRoot holds a balanced tree of sudog with distinct addresses (s.elem).
+// Each of those sudog may in turn point (through s.waitlink) to a list
+// of other sudogs waiting on the same address.
+// The operations on the inner lists of sudogs with the same address
+// are all O(1). The scanning of the top-level semaRoot list is O(log n),
+// where n is the number of distinct addresses with goroutines blocked
+// on them that hash to the given semaRoot.
+// See golang.org/issue/17953 for a program that worked badly
+// before we introduced the second level of list, and test/locklinear.go
+// for a test that exercises this.
+type semaRoot struct {
+ lock mutex
+ treap *sudog // root of balanced tree of unique waiters.
+ nwait uint32 // Number of waiters. Read w/o the lock.
+}
+
+// Prime to not correlate with any user patterns.
+const semTabSize = 251
+
+var semtable [semTabSize]struct {
+ root semaRoot
+ pad [cpu.CacheLinePadSize - unsafe.Sizeof(semaRoot{})]byte
+}
+
+//go:linkname sync_runtime_Semacquire sync.runtime_Semacquire
+func sync_runtime_Semacquire(addr *uint32) {
+ semacquire1(addr, false, semaBlockProfile, 0)
+}
+
+//go:linkname poll_runtime_Semacquire internal/poll.runtime_Semacquire
+func poll_runtime_Semacquire(addr *uint32) {
+ semacquire1(addr, false, semaBlockProfile, 0)
+}
+
+//go:linkname sync_runtime_Semrelease sync.runtime_Semrelease
+func sync_runtime_Semrelease(addr *uint32, handoff bool, skipframes int) {
+ semrelease1(addr, handoff, skipframes)
+}
+
+//go:linkname sync_runtime_SemacquireMutex sync.runtime_SemacquireMutex
+func sync_runtime_SemacquireMutex(addr *uint32, lifo bool, skipframes int) {
+ semacquire1(addr, lifo, semaBlockProfile|semaMutexProfile, skipframes)
+}
+
+//go:linkname poll_runtime_Semrelease internal/poll.runtime_Semrelease
+func poll_runtime_Semrelease(addr *uint32) {
+ semrelease(addr)
+}
+
+func readyWithTime(s *sudog, traceskip int) {
+ if s.releasetime != 0 {
+ s.releasetime = cputicks()
+ }
+ goready(s.g, traceskip)
+}
+
+type semaProfileFlags int
+
+const (
+ semaBlockProfile semaProfileFlags = 1 << iota
+ semaMutexProfile
+)
+
+// Called from runtime.
+func semacquire(addr *uint32) {
+ semacquire1(addr, false, 0, 0)
+}
+
+func semacquire1(addr *uint32, lifo bool, profile semaProfileFlags, skipframes int) {
+ gp := getg()
+ if gp != gp.m.curg {
+ throw("semacquire not on the G stack")
+ }
+
+ // Easy case.
+ if cansemacquire(addr) {
+ return
+ }
+
+ // Harder case:
+ // increment waiter count
+ // try cansemacquire one more time, return if succeeded
+ // enqueue itself as a waiter
+ // sleep
+ // (waiter descriptor is dequeued by signaler)
+ s := acquireSudog()
+ root := semroot(addr)
+ t0 := int64(0)
+ s.releasetime = 0
+ s.acquiretime = 0
+ s.ticket = 0
+ if profile&semaBlockProfile != 0 && blockprofilerate > 0 {
+ t0 = cputicks()
+ s.releasetime = -1
+ }
+ if profile&semaMutexProfile != 0 && mutexprofilerate > 0 {
+ if t0 == 0 {
+ t0 = cputicks()
+ }
+ s.acquiretime = t0
+ }
+ for {
+ lockWithRank(&root.lock, lockRankRoot)
+ // Add ourselves to nwait to disable "easy case" in semrelease.
+ atomic.Xadd(&root.nwait, 1)
+ // Check cansemacquire to avoid missed wakeup.
+ if cansemacquire(addr) {
+ atomic.Xadd(&root.nwait, -1)
+ unlock(&root.lock)
+ break
+ }
+ // Any semrelease after the cansemacquire knows we're waiting
+ // (we set nwait above), so go to sleep.
+ root.queue(addr, s, lifo)
+ goparkunlock(&root.lock, waitReasonSemacquire, traceEvGoBlockSync, 4+skipframes)
+ if s.ticket != 0 || cansemacquire(addr) {
+ break
+ }
+ }
+ if s.releasetime > 0 {
+ blockevent(s.releasetime-t0, 3+skipframes)
+ }
+ releaseSudog(s)
+}
+
+func semrelease(addr *uint32) {
+ semrelease1(addr, false, 0)
+}
+
+func semrelease1(addr *uint32, handoff bool, skipframes int) {
+ root := semroot(addr)
+ atomic.Xadd(addr, 1)
+
+ // Easy case: no waiters?
+ // This check must happen after the xadd, to avoid a missed wakeup
+ // (see loop in semacquire).
+ if atomic.Load(&root.nwait) == 0 {
+ return
+ }
+
+ // Harder case: search for a waiter and wake it.
+ lockWithRank(&root.lock, lockRankRoot)
+ if atomic.Load(&root.nwait) == 0 {
+ // The count is already consumed by another goroutine,
+ // so no need to wake up another goroutine.
+ unlock(&root.lock)
+ return
+ }
+ s, t0 := root.dequeue(addr)
+ if s != nil {
+ atomic.Xadd(&root.nwait, -1)
+ }
+ unlock(&root.lock)
+ if s != nil { // May be slow or even yield, so unlock first
+ acquiretime := s.acquiretime
+ if acquiretime != 0 {
+ mutexevent(t0-acquiretime, 3+skipframes)
+ }
+ if s.ticket != 0 {
+ throw("corrupted semaphore ticket")
+ }
+ if handoff && cansemacquire(addr) {
+ s.ticket = 1
+ }
+ readyWithTime(s, 5+skipframes)
+ if s.ticket == 1 && getg().m.locks == 0 {
+ // Direct G handoff
+ // readyWithTime has added the waiter G as runnext in the
+ // current P; we now call the scheduler so that we start running
+ // the waiter G immediately.
+ // Note that waiter inherits our time slice: this is desirable
+ // to avoid having a highly contended semaphore hog the P
+ // indefinitely. goyield is like Gosched, but it emits a
+ // "preempted" trace event instead and, more importantly, puts
+ // the current G on the local runq instead of the global one.
+ // We only do this in the starving regime (handoff=true), as in
+ // the non-starving case it is possible for a different waiter
+ // to acquire the semaphore while we are yielding/scheduling,
+ // and this would be wasteful. We wait instead to enter starving
+ // regime, and then we start to do direct handoffs of ticket and
+ // P.
+ // See issue 33747 for discussion.
+ goyield()
+ }
+ }
+}
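+
+// A sketch of the direct-handoff path above, assuming a waiter is already
+// parked on addr and handoff=true (the starving regime):
+//
+//	semrelease1(addr, true, 0)
+//	// 1. the count just added is consumed again via cansemacquire, and
+//	//    the waiter's ticket is set to 1;
+//	// 2. readyWithTime makes the waiter runnext on this P;
+//	// 3. goyield puts the releaser on the local runq, so the waiter runs
+//	//    immediately and returns from semacquire1 because its ticket != 0.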
+
+func semroot(addr *uint32) *semaRoot {
+ return &semtable[(uintptr(unsafe.Pointer(addr))>>3)%semTabSize].root
+}
+
+func cansemacquire(addr *uint32) bool {
+ for {
+ v := atomic.Load(addr)
+ if v == 0 {
+ return false
+ }
+ if atomic.Cas(addr, v, v-1) {
+ return true
+ }
+ }
+}
+
+// queue adds s to the blocked goroutines in semaRoot.
+func (root *semaRoot) queue(addr *uint32, s *sudog, lifo bool) {
+ s.g = getg()
+ s.elem = unsafe.Pointer(addr)
+ s.next = nil
+ s.prev = nil
+
+ var last *sudog
+ pt := &root.treap
+ for t := *pt; t != nil; t = *pt {
+ if t.elem == unsafe.Pointer(addr) {
+ // Already have addr in list.
+ if lifo {
+ // Substitute s in t's place in treap.
+ *pt = s
+ s.ticket = t.ticket
+ s.acquiretime = t.acquiretime
+ s.parent = t.parent
+ s.prev = t.prev
+ s.next = t.next
+ if s.prev != nil {
+ s.prev.parent = s
+ }
+ if s.next != nil {
+ s.next.parent = s
+ }
+ // Add t first in s's wait list.
+ s.waitlink = t
+ s.waittail = t.waittail
+ if s.waittail == nil {
+ s.waittail = t
+ }
+ t.parent = nil
+ t.prev = nil
+ t.next = nil
+ t.waittail = nil
+ } else {
+ // Add s to end of t's wait list.
+ if t.waittail == nil {
+ t.waitlink = s
+ } else {
+ t.waittail.waitlink = s
+ }
+ t.waittail = s
+ s.waitlink = nil
+ }
+ return
+ }
+ last = t
+ if uintptr(unsafe.Pointer(addr)) < uintptr(t.elem) {
+ pt = &t.prev
+ } else {
+ pt = &t.next
+ }
+ }
+
+ // Add s as new leaf in tree of unique addrs.
+ // The balanced tree is a treap using ticket as the random heap priority.
+ // That is, it is a binary tree ordered according to the elem addresses,
+ // but then among the space of possible binary trees respecting those
+ // addresses, it is kept balanced on average by maintaining a heap ordering
+ // on the ticket: s.ticket <= both s.prev.ticket and s.next.ticket.
+ // https://en.wikipedia.org/wiki/Treap
+ // https://faculty.washington.edu/aragon/pubs/rst89.pdf
+ //
+	// s.ticket is compared with zero in a couple of places, therefore set the lowest bit.
+	// It will not affect the treap's quality noticeably.
+ s.ticket = fastrand() | 1
+ s.parent = last
+ *pt = s
+
+ // Rotate up into tree according to ticket (priority).
+ for s.parent != nil && s.parent.ticket > s.ticket {
+ if s.parent.prev == s {
+ root.rotateRight(s.parent)
+ } else {
+ if s.parent.next != s {
+ panic("semaRoot queue")
+ }
+ root.rotateLeft(s.parent)
+ }
+ }
+}
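+
+// A small example of the shape queue maintains, assuming three distinct
+// addresses a < b < c whose random tickets are 5, 2 and 9 respectively:
+//
+//	       b (ticket 2)            // smallest ticket floats to the root
+//	      /            \
+//	a (ticket 5)   c (ticket 9)    // prev holds smaller addresses, next larger
+//
+// Additional waiters on the same address are chained off that node's
+// waitlink/waittail list and never become treap nodes themselves.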
+
+// dequeue searches for and finds the first goroutine
+// in semaRoot blocked on addr.
+// If the sudog was being profiled, dequeue returns the time
+// at which it was woken up as now. Otherwise now is 0.
+func (root *semaRoot) dequeue(addr *uint32) (found *sudog, now int64) {
+ ps := &root.treap
+ s := *ps
+ for ; s != nil; s = *ps {
+ if s.elem == unsafe.Pointer(addr) {
+ goto Found
+ }
+ if uintptr(unsafe.Pointer(addr)) < uintptr(s.elem) {
+ ps = &s.prev
+ } else {
+ ps = &s.next
+ }
+ }
+ return nil, 0
+
+Found:
+ now = int64(0)
+ if s.acquiretime != 0 {
+ now = cputicks()
+ }
+ if t := s.waitlink; t != nil {
+ // Substitute t, also waiting on addr, for s in root tree of unique addrs.
+ *ps = t
+ t.ticket = s.ticket
+ t.parent = s.parent
+ t.prev = s.prev
+ if t.prev != nil {
+ t.prev.parent = t
+ }
+ t.next = s.next
+ if t.next != nil {
+ t.next.parent = t
+ }
+ if t.waitlink != nil {
+ t.waittail = s.waittail
+ } else {
+ t.waittail = nil
+ }
+ t.acquiretime = now
+ s.waitlink = nil
+ s.waittail = nil
+ } else {
+ // Rotate s down to be leaf of tree for removal, respecting priorities.
+ for s.next != nil || s.prev != nil {
+ if s.next == nil || s.prev != nil && s.prev.ticket < s.next.ticket {
+ root.rotateRight(s)
+ } else {
+ root.rotateLeft(s)
+ }
+ }
+ // Remove s, now a leaf.
+ if s.parent != nil {
+ if s.parent.prev == s {
+ s.parent.prev = nil
+ } else {
+ s.parent.next = nil
+ }
+ } else {
+ root.treap = nil
+ }
+ }
+ s.parent = nil
+ s.elem = nil
+ s.next = nil
+ s.prev = nil
+ s.ticket = 0
+ return s, now
+}
+
+// rotateLeft rotates the tree rooted at node x.
+// turning (x a (y b c)) into (y (x a b) c).
+func (root *semaRoot) rotateLeft(x *sudog) {
+ // p -> (x a (y b c))
+ p := x.parent
+ y := x.next
+ b := y.prev
+
+ y.prev = x
+ x.parent = y
+ x.next = b
+ if b != nil {
+ b.parent = x
+ }
+
+ y.parent = p
+ if p == nil {
+ root.treap = y
+ } else if p.prev == x {
+ p.prev = y
+ } else {
+ if p.next != x {
+ throw("semaRoot rotateLeft")
+ }
+ p.next = y
+ }
+}
+
+// rotateRight rotates the tree rooted at node y.
+// turning (y (x a b) c) into (x a (y b c)).
+func (root *semaRoot) rotateRight(y *sudog) {
+ // p -> (y (x a b) c)
+ p := y.parent
+ x := y.prev
+ b := x.next
+
+ x.next = y
+ y.parent = x
+ y.prev = b
+ if b != nil {
+ b.parent = y
+ }
+
+ x.parent = p
+ if p == nil {
+ root.treap = x
+ } else if p.prev == y {
+ p.prev = x
+ } else {
+ if p.next != y {
+ throw("semaRoot rotateRight")
+ }
+ p.next = x
+ }
+}
+
+// notifyList is a ticket-based notification list used to implement sync.Cond.
+//
+// It must be kept in sync with the sync package.
+type notifyList struct {
+ // wait is the ticket number of the next waiter. It is atomically
+ // incremented outside the lock.
+ wait uint32
+
+ // notify is the ticket number of the next waiter to be notified. It can
+ // be read outside the lock, but is only written to with lock held.
+ //
+ // Both wait & notify can wrap around, and such cases will be correctly
+ // handled as long as their "unwrapped" difference is bounded by 2^31.
+ // For this not to be the case, we'd need to have 2^31+ goroutines
+ // blocked on the same condvar, which is currently not possible.
+ notify uint32
+
+ // List of parked waiters.
+ lock mutex
+ head *sudog
+ tail *sudog
+}
+
+// less checks if a < b, considering a & b running counts that may overflow the
+// 32-bit range, and that their "unwrapped" difference is always less than 2^31.
+func less(a, b uint32) bool {
+ return int32(a-b) < 0
+}
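+
+// For example, less(^uint32(0), 1) is true: int32(0xffffffff-1) is negative,
+// so a ticket issued just before wait wrapped around still counts as older
+// than one issued just after the wrap.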
+
+// notifyListAdd adds the caller to a notify list such that it can receive
+// notifications. The caller must eventually call notifyListWait to wait for
+// such a notification, passing the returned ticket number.
+//go:linkname notifyListAdd sync.runtime_notifyListAdd
+func notifyListAdd(l *notifyList) uint32 {
+ // This may be called concurrently, for example, when called from
+ // sync.Cond.Wait while holding a RWMutex in read mode.
+ return atomic.Xadd(&l.wait, 1) - 1
+}
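+
+// A sketch of the intended calling sequence on the waiter side, mirroring
+// what sync.Cond.Wait is expected to do around its user-visible lock:
+//
+//	t := notifyListAdd(l)   // take a ticket while still holding the user lock
+//	// ... unlock the user-visible lock ...
+//	notifyListWait(l, t)    // park until a notification reaches ticket t
+//	// ... relock the user-visible lock ...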
+
+// notifyListWait waits for a notification. If one has been sent since
+// notifyListAdd was called, it returns immediately. Otherwise, it blocks.
+//go:linkname notifyListWait sync.runtime_notifyListWait
+func notifyListWait(l *notifyList, t uint32) {
+ lockWithRank(&l.lock, lockRankNotifyList)
+
+ // Return right away if this ticket has already been notified.
+ if less(t, l.notify) {
+ unlock(&l.lock)
+ return
+ }
+
+ // Enqueue itself.
+ s := acquireSudog()
+ s.g = getg()
+ s.ticket = t
+ s.releasetime = 0
+ t0 := int64(0)
+ if blockprofilerate > 0 {
+ t0 = cputicks()
+ s.releasetime = -1
+ }
+ if l.tail == nil {
+ l.head = s
+ } else {
+ l.tail.next = s
+ }
+ l.tail = s
+ goparkunlock(&l.lock, waitReasonSyncCondWait, traceEvGoBlockCond, 3)
+ if t0 != 0 {
+ blockevent(s.releasetime-t0, 2)
+ }
+ releaseSudog(s)
+}
+
+// notifyListNotifyAll notifies all entries in the list.
+//go:linkname notifyListNotifyAll sync.runtime_notifyListNotifyAll
+func notifyListNotifyAll(l *notifyList) {
+ // Fast-path: if there are no new waiters since the last notification
+ // we don't need to acquire the lock.
+ if atomic.Load(&l.wait) == atomic.Load(&l.notify) {
+ return
+ }
+
+ // Pull the list out into a local variable, waiters will be readied
+ // outside the lock.
+ lockWithRank(&l.lock, lockRankNotifyList)
+ s := l.head
+ l.head = nil
+ l.tail = nil
+
+ // Update the next ticket to be notified. We can set it to the current
+ // value of wait because any previous waiters are already in the list
+ // or will notice that they have already been notified when trying to
+ // add themselves to the list.
+ atomic.Store(&l.notify, atomic.Load(&l.wait))
+ unlock(&l.lock)
+
+ // Go through the local list and ready all waiters.
+ for s != nil {
+ next := s.next
+ s.next = nil
+ readyWithTime(s, 4)
+ s = next
+ }
+}
+
+// notifyListNotifyOne notifies one entry in the list.
+//go:linkname notifyListNotifyOne sync.runtime_notifyListNotifyOne
+func notifyListNotifyOne(l *notifyList) {
+ // Fast-path: if there are no new waiters since the last notification
+ // we don't need to acquire the lock at all.
+ if atomic.Load(&l.wait) == atomic.Load(&l.notify) {
+ return
+ }
+
+ lockWithRank(&l.lock, lockRankNotifyList)
+
+ // Re-check under the lock if we need to do anything.
+ t := l.notify
+ if t == atomic.Load(&l.wait) {
+ unlock(&l.lock)
+ return
+ }
+
+ // Update the next notify ticket number.
+ atomic.Store(&l.notify, t+1)
+
+ // Try to find the g that needs to be notified.
+ // If it hasn't made it to the list yet we won't find it,
+ // but it won't park itself once it sees the new notify number.
+ //
+ // This scan looks linear but essentially always stops quickly.
+	// Because a g queues itself separately from taking its ticket number,
+ // there may be minor reorderings in the list, but we
+ // expect the g we're looking for to be near the front.
+ // The g has others in front of it on the list only to the
+ // extent that it lost the race, so the iteration will not
+ // be too long. This applies even when the g is missing:
+ // it hasn't yet gotten to sleep and has lost the race to
+ // the (few) other g's that we find on the list.
+ for p, s := (*sudog)(nil), l.head; s != nil; p, s = s, s.next {
+ if s.ticket == t {
+ n := s.next
+ if p != nil {
+ p.next = n
+ } else {
+ l.head = n
+ }
+ if n == nil {
+ l.tail = p
+ }
+ unlock(&l.lock)
+ s.next = nil
+ readyWithTime(s, 4)
+ return
+ }
+ }
+ unlock(&l.lock)
+}
+
+//go:linkname notifyListCheck sync.runtime_notifyListCheck
+func notifyListCheck(sz uintptr) {
+ if sz != unsafe.Sizeof(notifyList{}) {
+ print("runtime: bad notifyList size - sync=", sz, " runtime=", unsafe.Sizeof(notifyList{}), "\n")
+ throw("bad notifyList size")
+ }
+}
+
+//go:linkname sync_nanotime sync.runtime_nanotime
+func sync_nanotime() int64 {
+ return nanotime()
+}
diff --git a/src/runtime/sema_test.go b/src/runtime/sema_test.go
new file mode 100644
index 0000000..cf3de0a
--- /dev/null
+++ b/src/runtime/sema_test.go
@@ -0,0 +1,103 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ . "runtime"
+ "sync"
+ "sync/atomic"
+ "testing"
+)
+
+// TestSemaHandoff checks that when semrelease+handoff is
+// requested, the G that releases the semaphore yields its
+// P directly to the first waiter in line.
+// See issue 33747 for discussion.
+func TestSemaHandoff(t *testing.T) {
+ const iter = 10000
+ ok := 0
+ for i := 0; i < iter; i++ {
+ if testSemaHandoff() {
+ ok++
+ }
+ }
+ // As long as two thirds of handoffs are direct, we
+ // consider the test successful. The scheduler is
+ // nondeterministic, so this test checks that we get the
+ // desired outcome in a significant majority of cases.
+ // The actual ratio of direct handoffs is much higher
+ // (>90%) but we use a lower threshold to minimize the
+ // chances that unrelated changes in the runtime will
+ // cause the test to fail or become flaky.
+ if ok < iter*2/3 {
+ t.Fatal("direct handoff < 2/3:", ok, iter)
+ }
+}
+
+func TestSemaHandoff1(t *testing.T) {
+ if GOMAXPROCS(-1) <= 1 {
+ t.Skip("GOMAXPROCS <= 1")
+ }
+ defer GOMAXPROCS(GOMAXPROCS(-1))
+ GOMAXPROCS(1)
+ TestSemaHandoff(t)
+}
+
+func TestSemaHandoff2(t *testing.T) {
+ if GOMAXPROCS(-1) <= 2 {
+ t.Skip("GOMAXPROCS <= 2")
+ }
+ defer GOMAXPROCS(GOMAXPROCS(-1))
+ GOMAXPROCS(2)
+ TestSemaHandoff(t)
+}
+
+func testSemaHandoff() bool {
+ var sema, res uint32
+ done := make(chan struct{})
+
+ // We're testing that the current goroutine is able to yield its time slice
+ // to another goroutine. Stop the current goroutine from migrating to
+ // another CPU where it can win the race (and appear to have not yielded) by
+ // keeping the CPUs slightly busy.
+ var wg sync.WaitGroup
+ for i := 0; i < GOMAXPROCS(-1); i++ {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ for {
+ select {
+ case <-done:
+ return
+ default:
+ }
+ Gosched()
+ }
+ }()
+ }
+
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ Semacquire(&sema)
+ atomic.CompareAndSwapUint32(&res, 0, 1)
+
+ Semrelease1(&sema, true, 0)
+ close(done)
+ }()
+ for SemNwait(&sema) == 0 {
+ Gosched() // wait for goroutine to block in Semacquire
+ }
+
+ // The crux of the test: we release the semaphore with handoff
+ // and immediately perform a CAS both here and in the waiter; we
+ // want the CAS in the waiter to execute first.
+ Semrelease1(&sema, true, 0)
+ atomic.CompareAndSwapUint32(&res, 0, 2)
+
+ wg.Wait() // wait for goroutines to finish to avoid data races
+
+ return res == 1 // did the waiter run first?
+}
diff --git a/src/runtime/semasleep_test.go b/src/runtime/semasleep_test.go
new file mode 100644
index 0000000..9b371b0
--- /dev/null
+++ b/src/runtime/semasleep_test.go
@@ -0,0 +1,65 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows,!js
+
+package runtime_test
+
+import (
+ "os/exec"
+ "syscall"
+ "testing"
+ "time"
+)
+
+// Issue #27250. Spurious wakeups to pthread_cond_timedwait_relative_np
+// shouldn't cause semasleep to retry with the same timeout which would
+// cause indefinite spinning.
+func TestSpuriousWakeupsNeverHangSemasleep(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+
+ exe, err := buildTestProg(t, "testprog")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ start := time.Now()
+ cmd := exec.Command(exe, "After1")
+ if err := cmd.Start(); err != nil {
+ t.Fatalf("Failed to start command: %v", err)
+ }
+ doneCh := make(chan error, 1)
+ go func() {
+ doneCh <- cmd.Wait()
+ }()
+
+ // With the repro running, we can continuously send to it
+ // a non-terminal signal such as SIGIO, to spuriously
+ // wakeup pthread_cond_timedwait_relative_np.
+ unfixedTimer := time.NewTimer(2 * time.Second)
+ for {
+ select {
+ case <-time.After(200 * time.Millisecond):
+ // Send the pesky signal that toggles spinning
+ // indefinitely if #27520 is not fixed.
+ cmd.Process.Signal(syscall.SIGIO)
+
+ case <-unfixedTimer.C:
+ t.Error("Program failed to return on time and has to be killed, issue #27520 still exists")
+ cmd.Process.Signal(syscall.SIGKILL)
+ return
+
+ case err := <-doneCh:
+ if err != nil {
+ t.Fatalf("The program returned but unfortunately with an error: %v", err)
+ }
+ if time.Since(start) < 100*time.Millisecond {
+ t.Fatalf("The program stopped too quickly.")
+ }
+ return
+ }
+ }
+}
diff --git a/src/runtime/sigaction.go b/src/runtime/sigaction.go
new file mode 100644
index 0000000..3c88857
--- /dev/null
+++ b/src/runtime/sigaction.go
@@ -0,0 +1,16 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux,!amd64,!arm64 freebsd,!amd64
+
+package runtime
+
+// This version is used on Linux and FreeBSD systems on which we don't
+// use cgo to call the C version of sigaction.
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaction(sig uint32, new, old *sigactiont) {
+ sysSigaction(sig, new, old)
+}
diff --git a/src/runtime/signal_386.go b/src/runtime/signal_386.go
new file mode 100644
index 0000000..065aff4
--- /dev/null
+++ b/src/runtime/signal_386.go
@@ -0,0 +1,58 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build dragonfly freebsd linux netbsd openbsd
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("eax ", hex(c.eax()), "\n")
+ print("ebx ", hex(c.ebx()), "\n")
+ print("ecx ", hex(c.ecx()), "\n")
+ print("edx ", hex(c.edx()), "\n")
+ print("edi ", hex(c.edi()), "\n")
+ print("esi ", hex(c.esi()), "\n")
+ print("ebp ", hex(c.ebp()), "\n")
+ print("esp ", hex(c.esp()), "\n")
+ print("eip ", hex(c.eip()), "\n")
+ print("eflags ", hex(c.eflags()), "\n")
+ print("cs ", hex(c.cs()), "\n")
+ print("fs ", hex(c.fs()), "\n")
+ print("gs ", hex(c.gs()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.eip()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.esp()) }
+func (c *sigctxt) siglr() uintptr { return 0 }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ pc := uintptr(c.eip())
+ sp := uintptr(c.esp())
+
+ if shouldPushSigpanic(gp, pc, *(*uintptr)(unsafe.Pointer(sp))) {
+ c.pushCall(funcPC(sigpanic), pc)
+ } else {
+ // Not safe to push the call. Just clobber the frame.
+ c.set_eip(uint32(funcPC(sigpanic)))
+ }
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Make it look like we called target at resumePC.
+ sp := uintptr(c.esp())
+ sp -= sys.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = resumePC
+ c.set_esp(uint32(sp))
+ c.set_eip(uint32(targetPC))
+}
diff --git a/src/runtime/signal_aix_ppc64.go b/src/runtime/signal_aix_ppc64.go
new file mode 100644
index 0000000..c17563e
--- /dev/null
+++ b/src/runtime/signal_aix_ppc64.go
@@ -0,0 +1,85 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *context64 { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint64 { return c.regs().gpr[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().gpr[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().gpr[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().gpr[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().gpr[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().gpr[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().gpr[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().gpr[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().gpr[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().gpr[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().gpr[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().gpr[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().gpr[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().gpr[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().gpr[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().gpr[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().gpr[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().gpr[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().gpr[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().gpr[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().gpr[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().gpr[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().gpr[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().gpr[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().gpr[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().gpr[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().gpr[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().gpr[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().gpr[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().gpr[29] }
+func (c *sigctxt) r30() uint64 { return c.regs().gpr[30] }
+func (c *sigctxt) r31() uint64 { return c.regs().gpr[31] }
+func (c *sigctxt) sp() uint64 { return c.regs().gpr[1] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().iar }
+
+func (c *sigctxt) ctr() uint64 { return c.regs().ctr }
+func (c *sigctxt) link() uint64 { return c.regs().lr }
+func (c *sigctxt) xer() uint32 { return c.regs().xer }
+func (c *sigctxt) ccr() uint32 { return c.regs().cr }
+func (c *sigctxt) fpscr() uint32 { return c.regs().fpscr }
+func (c *sigctxt) fpscrx() uint32 { return c.regs().fpscrx }
+
+// TODO(aix): find trap equivalent
+func (c *sigctxt) trap() uint32 { return 0x0 }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return uint64(c.info.si_addr) }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+func (c *sigctxt) set_r0(x uint64) { c.regs().gpr[0] = x }
+func (c *sigctxt) set_r12(x uint64) { c.regs().gpr[12] = x }
+func (c *sigctxt) set_r30(x uint64) { c.regs().gpr[30] = x }
+func (c *sigctxt) set_pc(x uint64) { c.regs().iar = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().gpr[1] = x }
+func (c *sigctxt) set_link(x uint64) { c.regs().lr = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*sys.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_amd64.go b/src/runtime/signal_amd64.go
new file mode 100644
index 0000000..6ab1f75
--- /dev/null
+++ b/src/runtime/signal_amd64.go
@@ -0,0 +1,83 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build amd64
+// +build darwin dragonfly freebsd linux netbsd openbsd solaris
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("rax ", hex(c.rax()), "\n")
+ print("rbx ", hex(c.rbx()), "\n")
+ print("rcx ", hex(c.rcx()), "\n")
+ print("rdx ", hex(c.rdx()), "\n")
+ print("rdi ", hex(c.rdi()), "\n")
+ print("rsi ", hex(c.rsi()), "\n")
+ print("rbp ", hex(c.rbp()), "\n")
+ print("rsp ", hex(c.rsp()), "\n")
+ print("r8 ", hex(c.r8()), "\n")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\n")
+ print("r11 ", hex(c.r11()), "\n")
+ print("r12 ", hex(c.r12()), "\n")
+ print("r13 ", hex(c.r13()), "\n")
+ print("r14 ", hex(c.r14()), "\n")
+ print("r15 ", hex(c.r15()), "\n")
+ print("rip ", hex(c.rip()), "\n")
+ print("rflags ", hex(c.rflags()), "\n")
+ print("cs ", hex(c.cs()), "\n")
+ print("fs ", hex(c.fs()), "\n")
+ print("gs ", hex(c.gs()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.rip()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.rsp()) }
+func (c *sigctxt) siglr() uintptr { return 0 }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // Work around Leopard bug that doesn't set FPE_INTDIV.
+ // Look at instruction to see if it is a divide.
+ // Not necessary in Snow Leopard (si_code will be != 0).
+ if GOOS == "darwin" && sig == _SIGFPE && gp.sigcode0 == 0 {
+ pc := (*[4]byte)(unsafe.Pointer(gp.sigpc))
+ i := 0
+ if pc[i]&0xF0 == 0x40 { // 64-bit REX prefix
+ i++
+ } else if pc[i] == 0x66 { // 16-bit instruction prefix
+ i++
+ }
+ if pc[i] == 0xF6 || pc[i] == 0xF7 {
+ gp.sigcode0 = _FPE_INTDIV
+ }
+ }
+
+ pc := uintptr(c.rip())
+ sp := uintptr(c.rsp())
+
+ if shouldPushSigpanic(gp, pc, *(*uintptr)(unsafe.Pointer(sp))) {
+ c.pushCall(funcPC(sigpanic), pc)
+ } else {
+ // Not safe to push the call. Just clobber the frame.
+ c.set_rip(uint64(funcPC(sigpanic)))
+ }
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Make it look like we called target at resumePC.
+ sp := uintptr(c.rsp())
+ sp -= sys.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = resumePC
+ c.set_rsp(uint64(sp))
+ c.set_rip(uint64(targetPC))
+}
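+
+// For a hypothetical fault at resumePC with stack pointer SP, pushCall
+// leaves the context looking as if targetPC had been CALLed from there:
+//
+//	*(SP-8) = resumePC   // fake return address
+//	rsp     = SP - 8
+//	rip     = targetPC   // typically sigpanic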
diff --git a/src/runtime/signal_arm.go b/src/runtime/signal_arm.go
new file mode 100644
index 0000000..156d9d3
--- /dev/null
+++ b/src/runtime/signal_arm.go
@@ -0,0 +1,78 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build dragonfly freebsd linux netbsd openbsd
+
+package runtime
+
+import "unsafe"
+
+func dumpregs(c *sigctxt) {
+ print("trap ", hex(c.trap()), "\n")
+ print("error ", hex(c.error()), "\n")
+ print("oldmask ", hex(c.oldmask()), "\n")
+ print("r0 ", hex(c.r0()), "\n")
+ print("r1 ", hex(c.r1()), "\n")
+ print("r2 ", hex(c.r2()), "\n")
+ print("r3 ", hex(c.r3()), "\n")
+ print("r4 ", hex(c.r4()), "\n")
+ print("r5 ", hex(c.r5()), "\n")
+ print("r6 ", hex(c.r6()), "\n")
+ print("r7 ", hex(c.r7()), "\n")
+ print("r8 ", hex(c.r8()), "\n")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\n")
+ print("fp ", hex(c.fp()), "\n")
+ print("ip ", hex(c.ip()), "\n")
+ print("sp ", hex(c.sp()), "\n")
+ print("lr ", hex(c.lr()), "\n")
+ print("pc ", hex(c.pc()), "\n")
+ print("cpsr ", hex(c.cpsr()), "\n")
+ print("fault ", hex(c.fault()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.lr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+	// We arrange lr and pc to pretend the panicking
+ // function calls sigpanic directly.
+ // Always save LR to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - 4
+ c.set_sp(sp)
+ *(*uint32)(unsafe.Pointer(uintptr(sp))) = c.lr()
+
+ pc := gp.sigpc
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.lr())) {
+		// Make it look like the faulting PC called sigpanic.
+ c.set_lr(uint32(pc))
+ }
+
+ // In case we are panicking from external C code
+ c.set_r10(uint32(uintptr(unsafe.Pointer(gp))))
+ c.set_pc(uint32(funcPC(sigpanic)))
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra slot is known to gentraceback.
+ sp := c.sp() - 4
+ c.set_sp(sp)
+ *(*uint32)(unsafe.Pointer(uintptr(sp))) = c.lr()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_lr(uint32(resumePC))
+ c.set_pc(uint32(targetPC))
+}
diff --git a/src/runtime/signal_arm64.go b/src/runtime/signal_arm64.go
new file mode 100644
index 0000000..3c20139
--- /dev/null
+++ b/src/runtime/signal_arm64.go
@@ -0,0 +1,94 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build darwin freebsd linux netbsd openbsd
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("r0 ", hex(c.r0()), "\n")
+ print("r1 ", hex(c.r1()), "\n")
+ print("r2 ", hex(c.r2()), "\n")
+ print("r3 ", hex(c.r3()), "\n")
+ print("r4 ", hex(c.r4()), "\n")
+ print("r5 ", hex(c.r5()), "\n")
+ print("r6 ", hex(c.r6()), "\n")
+ print("r7 ", hex(c.r7()), "\n")
+ print("r8 ", hex(c.r8()), "\n")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\n")
+ print("r11 ", hex(c.r11()), "\n")
+ print("r12 ", hex(c.r12()), "\n")
+ print("r13 ", hex(c.r13()), "\n")
+ print("r14 ", hex(c.r14()), "\n")
+ print("r15 ", hex(c.r15()), "\n")
+ print("r16 ", hex(c.r16()), "\n")
+ print("r17 ", hex(c.r17()), "\n")
+ print("r18 ", hex(c.r18()), "\n")
+ print("r19 ", hex(c.r19()), "\n")
+ print("r20 ", hex(c.r20()), "\n")
+ print("r21 ", hex(c.r21()), "\n")
+ print("r22 ", hex(c.r22()), "\n")
+ print("r23 ", hex(c.r23()), "\n")
+ print("r24 ", hex(c.r24()), "\n")
+ print("r25 ", hex(c.r25()), "\n")
+ print("r26 ", hex(c.r26()), "\n")
+ print("r27 ", hex(c.r27()), "\n")
+ print("r28 ", hex(c.r28()), "\n")
+ print("r29 ", hex(c.r29()), "\n")
+ print("lr ", hex(c.lr()), "\n")
+ print("sp ", hex(c.sp()), "\n")
+ print("pc ", hex(c.pc()), "\n")
+ print("fault ", hex(c.fault()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.lr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+	// We arrange lr and pc to pretend the panicking
+ // function calls sigpanic directly.
+ // Always save LR to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - sys.SpAlign // needs only sizeof uint64, but must align the stack
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.lr()
+
+ pc := gp.sigpc
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.lr())) {
+		// Make it look like the faulting PC called sigpanic.
+ c.set_lr(uint64(pc))
+ }
+
+ // In case we are panicking from external C code
+ c.set_r28(uint64(uintptr(unsafe.Pointer(gp))))
+ c.set_pc(uint64(funcPC(sigpanic)))
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra space is known to gentraceback.
+ sp := c.sp() - 16 // SP needs 16-byte alignment
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.lr()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_lr(uint64(resumePC))
+ c.set_pc(uint64(targetPC))
+}
diff --git a/src/runtime/signal_darwin.go b/src/runtime/signal_darwin.go
new file mode 100644
index 0000000..8090fb2
--- /dev/null
+++ b/src/runtime/signal_darwin.go
@@ -0,0 +1,40 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ /* 7 */ {_SigThrow, "SIGEMT: emulate instruction executed"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ /* 9 */ {0, "SIGKILL: kill"},
+ /* 10 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigThrow, "SIGSYS: bad system call"},
+ /* 13 */ {_SigNotify, "SIGPIPE: write to broken pipe"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: termination"},
+ /* 16 */ {_SigNotify + _SigIgn, "SIGURG: urgent condition on socket"},
+ /* 17 */ {0, "SIGSTOP: stop"},
+ /* 18 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: keyboard stop"},
+ /* 19 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: continue after stop"},
+ /* 20 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status has changed"},
+ /* 21 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background read from tty"},
+ /* 22 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background write to tty"},
+ /* 23 */ {_SigNotify + _SigIgn, "SIGIO: i/o now possible"},
+ /* 24 */ {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ /* 25 */ {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ /* 26 */ {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ /* 27 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ /* 28 */ {_SigNotify + _SigIgn, "SIGWINCH: window size change"},
+ /* 29 */ {_SigNotify + _SigIgn, "SIGINFO: status request from keyboard"},
+ /* 30 */ {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ /* 31 */ {_SigNotify, "SIGUSR2: user-defined signal 2"},
+}
diff --git a/src/runtime/signal_darwin_amd64.go b/src/runtime/signal_darwin_amd64.go
new file mode 100644
index 0000000..abc212a
--- /dev/null
+++ b/src/runtime/signal_darwin_amd64.go
@@ -0,0 +1,92 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *regs64 { return &(*ucontext)(c.ctxt).uc_mcontext.ss }
+
+func (c *sigctxt) rax() uint64 { return c.regs().rax }
+func (c *sigctxt) rbx() uint64 { return c.regs().rbx }
+func (c *sigctxt) rcx() uint64 { return c.regs().rcx }
+func (c *sigctxt) rdx() uint64 { return c.regs().rdx }
+func (c *sigctxt) rdi() uint64 { return c.regs().rdi }
+func (c *sigctxt) rsi() uint64 { return c.regs().rsi }
+func (c *sigctxt) rbp() uint64 { return c.regs().rbp }
+func (c *sigctxt) rsp() uint64 { return c.regs().rsp }
+func (c *sigctxt) r8() uint64 { return c.regs().r8 }
+func (c *sigctxt) r9() uint64 { return c.regs().r9 }
+func (c *sigctxt) r10() uint64 { return c.regs().r10 }
+func (c *sigctxt) r11() uint64 { return c.regs().r11 }
+func (c *sigctxt) r12() uint64 { return c.regs().r12 }
+func (c *sigctxt) r13() uint64 { return c.regs().r13 }
+func (c *sigctxt) r14() uint64 { return c.regs().r14 }
+func (c *sigctxt) r15() uint64 { return c.regs().r15 }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return c.regs().rip }
+
+func (c *sigctxt) rflags() uint64 { return c.regs().rflags }
+func (c *sigctxt) cs() uint64 { return c.regs().cs }
+func (c *sigctxt) fs() uint64 { return c.regs().fs }
+func (c *sigctxt) gs() uint64 { return c.regs().gs }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_rip(x uint64) { c.regs().rip = x }
+func (c *sigctxt) set_rsp(x uint64) { c.regs().rsp = x }
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) { c.info.si_addr = x }
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+ switch sig {
+ case _SIGTRAP:
+ // OS X sets c.sigcode() == TRAP_BRKPT unconditionally for all SIGTRAPs,
+ // leaving no way to distinguish a breakpoint-induced SIGTRAP
+ // from an asynchronous signal SIGTRAP.
+ // They all look breakpoint-induced by default.
+ // Try looking at the code to see if it's a breakpoint.
+ // The assumption is that we're very unlikely to get an
+ // asynchronous SIGTRAP at just the moment that the
+ // PC started to point at unmapped memory.
+ pc := uintptr(c.rip())
+ // OS X will leave the pc just after the INT 3 instruction.
+ // INT 3 is usually 1 byte, but there is a 2-byte form.
+ code := (*[2]byte)(unsafe.Pointer(pc - 2))
+ if code[1] != 0xCC && (code[0] != 0xCD || code[1] != 3) {
+ // SIGTRAP on something other than INT 3.
+ c.set_sigcode(_SI_USER)
+ }
+
+ case _SIGSEGV:
+ // x86-64 has 48-bit virtual addresses. The top 16 bits must echo bit 47.
+ // The hardware delivers a different kind of fault for a malformed address
+ // than it does for an attempt to access a valid but unmapped address.
+ // OS X 10.9.2 mishandles the malformed address case, making it look like
+ // a user-generated signal (like someone ran kill -SEGV ourpid).
+ // We pass user-generated signals to os/signal, or else ignore them.
+ // Doing that here - and returning to the faulting code - results in an
+ // infinite loop. It appears the best we can do is rewrite what the kernel
+ // delivers into something more like the truth. The address used below
+ // has very little chance of being the one that caused the fault, but it is
+ // malformed, it is clearly not a real pointer, and if it does get printed
+ // in real life, people will probably search for it and find this code.
+ // There are no Google hits for b01dfacedebac1e or 0xb01dfacedebac1e
+ // as I type this comment.
+ if c.sigcode() == _SI_USER {
+ c.set_sigcode(_SI_USER + 1)
+ c.set_sigaddr(0xb01dfacedebac1e)
+ }
+ }
+}
diff --git a/src/runtime/signal_darwin_arm64.go b/src/runtime/signal_darwin_arm64.go
new file mode 100644
index 0000000..690ffe4
--- /dev/null
+++ b/src/runtime/signal_darwin_arm64.go
@@ -0,0 +1,90 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *regs64 { return &(*ucontext)(c.ctxt).uc_mcontext.ss }
+
+func (c *sigctxt) r0() uint64 { return c.regs().x[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().x[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().x[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().x[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().x[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().x[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().x[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().x[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().x[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().x[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().x[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().x[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().x[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().x[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().x[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().x[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().x[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().x[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().x[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().x[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().x[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().x[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().x[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().x[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().x[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().x[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().x[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().x[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().x[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().fp }
+func (c *sigctxt) lr() uint64 { return c.regs().lr }
+func (c *sigctxt) sp() uint64 { return c.regs().sp }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().pc }
+
+func (c *sigctxt) fault() uintptr { return uintptr(unsafe.Pointer(c.info.si_addr)) }
+
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return uint64(uintptr(unsafe.Pointer(c.info.si_addr))) }
+
+func (c *sigctxt) set_pc(x uint64) { c.regs().pc = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().sp = x }
+func (c *sigctxt) set_lr(x uint64) { c.regs().lr = x }
+func (c *sigctxt) set_r28(x uint64) { c.regs().x[28] = x }
+
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ c.info.si_addr = (*byte)(unsafe.Pointer(uintptr(x)))
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+ switch sig {
+ case _SIGTRAP:
+ // OS X sets c.sigcode() == TRAP_BRKPT unconditionally for all SIGTRAPs,
+ // leaving no way to distinguish a breakpoint-induced SIGTRAP
+ // from an asynchronous signal SIGTRAP.
+ // They all look breakpoint-induced by default.
+ // Try looking at the code to see if it's a breakpoint.
+ // The assumption is that we're very unlikely to get an
+ // asynchronous SIGTRAP at just the moment that the
+ // PC started to point at unmapped memory.
+ pc := uintptr(c.pc())
+ // OS X will leave the pc just after the instruction.
+ code := (*uint32)(unsafe.Pointer(pc - 4))
+ if *code != 0xd4200000 {
+ // SIGTRAP on something other than breakpoint.
+ c.set_sigcode(_SI_USER)
+ }
+ }
+}
diff --git a/src/runtime/signal_dragonfly.go b/src/runtime/signal_dragonfly.go
new file mode 100644
index 0000000..f2b26e7
--- /dev/null
+++ b/src/runtime/signal_dragonfly.go
@@ -0,0 +1,41 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ /* 7 */ {_SigThrow, "SIGEMT: emulate instruction executed"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ /* 9 */ {0, "SIGKILL: kill"},
+ /* 10 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigThrow, "SIGSYS: bad system call"},
+ /* 13 */ {_SigNotify, "SIGPIPE: write to broken pipe"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: termination"},
+ /* 16 */ {_SigNotify + _SigIgn, "SIGURG: urgent condition on socket"},
+ /* 17 */ {0, "SIGSTOP: stop"},
+ /* 18 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: keyboard stop"},
+ /* 19 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: continue after stop"},
+ /* 20 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status has changed"},
+ /* 21 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background read from tty"},
+ /* 22 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background write to tty"},
+ /* 23 */ {_SigNotify + _SigIgn, "SIGIO: i/o now possible"},
+ /* 24 */ {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ /* 25 */ {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ /* 26 */ {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ /* 27 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ /* 28 */ {_SigNotify + _SigIgn, "SIGWINCH: window size change"},
+ /* 29 */ {_SigNotify + _SigIgn, "SIGINFO: status request from keyboard"},
+ /* 30 */ {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ /* 31 */ {_SigNotify, "SIGUSR2: user-defined signal 2"},
+ /* 32 */ {_SigNotify, "SIGTHR: reserved"},
+}
diff --git a/src/runtime/signal_dragonfly_amd64.go b/src/runtime/signal_dragonfly_amd64.go
new file mode 100644
index 0000000..c473edd
--- /dev/null
+++ b/src/runtime/signal_dragonfly_amd64.go
@@ -0,0 +1,51 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontext {
+ return (*mcontext)(unsafe.Pointer(&(*ucontext)(c.ctxt).uc_mcontext))
+}
+
+func (c *sigctxt) rax() uint64 { return c.regs().mc_rax }
+func (c *sigctxt) rbx() uint64 { return c.regs().mc_rbx }
+func (c *sigctxt) rcx() uint64 { return c.regs().mc_rcx }
+func (c *sigctxt) rdx() uint64 { return c.regs().mc_rdx }
+func (c *sigctxt) rdi() uint64 { return c.regs().mc_rdi }
+func (c *sigctxt) rsi() uint64 { return c.regs().mc_rsi }
+func (c *sigctxt) rbp() uint64 { return c.regs().mc_rbp }
+func (c *sigctxt) rsp() uint64 { return c.regs().mc_rsp }
+func (c *sigctxt) r8() uint64 { return c.regs().mc_r8 }
+func (c *sigctxt) r9() uint64 { return c.regs().mc_r9 }
+func (c *sigctxt) r10() uint64 { return c.regs().mc_r10 }
+func (c *sigctxt) r11() uint64 { return c.regs().mc_r11 }
+func (c *sigctxt) r12() uint64 { return c.regs().mc_r12 }
+func (c *sigctxt) r13() uint64 { return c.regs().mc_r13 }
+func (c *sigctxt) r14() uint64 { return c.regs().mc_r14 }
+func (c *sigctxt) r15() uint64 { return c.regs().mc_r15 }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return c.regs().mc_rip }
+
+func (c *sigctxt) rflags() uint64 { return c.regs().mc_rflags }
+func (c *sigctxt) cs() uint64 { return c.regs().mc_cs }
+func (c *sigctxt) fs() uint64 { return c.regs().mc_ss }
+func (c *sigctxt) gs() uint64 { return c.regs().mc_ss }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_rip(x uint64) { c.regs().mc_rip = x }
+func (c *sigctxt) set_rsp(x uint64) { c.regs().mc_rsp = x }
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) { c.info.si_addr = x }
diff --git a/src/runtime/signal_freebsd.go b/src/runtime/signal_freebsd.go
new file mode 100644
index 0000000..2812c69
--- /dev/null
+++ b/src/runtime/signal_freebsd.go
@@ -0,0 +1,41 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ /* 7 */ {_SigThrow, "SIGEMT: emulate instruction executed"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ /* 9 */ {0, "SIGKILL: kill"},
+ /* 10 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigNotify, "SIGSYS: bad system call"}, // see golang.org/issues/15204
+ /* 13 */ {_SigNotify, "SIGPIPE: write to broken pipe"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: termination"},
+ /* 16 */ {_SigNotify + _SigIgn, "SIGURG: urgent condition on socket"},
+ /* 17 */ {0, "SIGSTOP: stop"},
+ /* 18 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: keyboard stop"},
+ /* 19 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: continue after stop"},
+ /* 20 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status has changed"},
+ /* 21 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background read from tty"},
+ /* 22 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background write to tty"},
+ /* 23 */ {_SigNotify + _SigIgn, "SIGIO: i/o now possible"},
+ /* 24 */ {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ /* 25 */ {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ /* 26 */ {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ /* 27 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ /* 28 */ {_SigNotify + _SigIgn, "SIGWINCH: window size change"},
+ /* 29 */ {_SigNotify + _SigIgn, "SIGINFO: status request from keyboard"},
+ /* 30 */ {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ /* 31 */ {_SigNotify, "SIGUSR2: user-defined signal 2"},
+ /* 32 */ {_SigNotify, "SIGTHR: reserved"},
+}
diff --git a/src/runtime/signal_freebsd_386.go b/src/runtime/signal_freebsd_386.go
new file mode 100644
index 0000000..f7cc0df
--- /dev/null
+++ b/src/runtime/signal_freebsd_386.go
@@ -0,0 +1,41 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) eax() uint32 { return c.regs().mc_eax }
+func (c *sigctxt) ebx() uint32 { return c.regs().mc_ebx }
+func (c *sigctxt) ecx() uint32 { return c.regs().mc_ecx }
+func (c *sigctxt) edx() uint32 { return c.regs().mc_edx }
+func (c *sigctxt) edi() uint32 { return c.regs().mc_edi }
+func (c *sigctxt) esi() uint32 { return c.regs().mc_esi }
+func (c *sigctxt) ebp() uint32 { return c.regs().mc_ebp }
+func (c *sigctxt) esp() uint32 { return c.regs().mc_esp }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) eip() uint32 { return c.regs().mc_eip }
+
+func (c *sigctxt) eflags() uint32 { return c.regs().mc_eflags }
+func (c *sigctxt) cs() uint32 { return c.regs().mc_cs }
+func (c *sigctxt) fs() uint32 { return c.regs().mc_fs }
+func (c *sigctxt) gs() uint32 { return c.regs().mc_gs }
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint32 { return uint32(c.info.si_addr) }
+
+func (c *sigctxt) set_eip(x uint32) { c.regs().mc_eip = x }
+func (c *sigctxt) set_esp(x uint32) { c.regs().mc_esp = x }
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) { c.info.si_addr = uintptr(x) }
diff --git a/src/runtime/signal_freebsd_amd64.go b/src/runtime/signal_freebsd_amd64.go
new file mode 100644
index 0000000..20b86e7
--- /dev/null
+++ b/src/runtime/signal_freebsd_amd64.go
@@ -0,0 +1,51 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontext {
+ return (*mcontext)(unsafe.Pointer(&(*ucontext)(c.ctxt).uc_mcontext))
+}
+
+func (c *sigctxt) rax() uint64 { return c.regs().mc_rax }
+func (c *sigctxt) rbx() uint64 { return c.regs().mc_rbx }
+func (c *sigctxt) rcx() uint64 { return c.regs().mc_rcx }
+func (c *sigctxt) rdx() uint64 { return c.regs().mc_rdx }
+func (c *sigctxt) rdi() uint64 { return c.regs().mc_rdi }
+func (c *sigctxt) rsi() uint64 { return c.regs().mc_rsi }
+func (c *sigctxt) rbp() uint64 { return c.regs().mc_rbp }
+func (c *sigctxt) rsp() uint64 { return c.regs().mc_rsp }
+func (c *sigctxt) r8() uint64 { return c.regs().mc_r8 }
+func (c *sigctxt) r9() uint64 { return c.regs().mc_r9 }
+func (c *sigctxt) r10() uint64 { return c.regs().mc_r10 }
+func (c *sigctxt) r11() uint64 { return c.regs().mc_r11 }
+func (c *sigctxt) r12() uint64 { return c.regs().mc_r12 }
+func (c *sigctxt) r13() uint64 { return c.regs().mc_r13 }
+func (c *sigctxt) r14() uint64 { return c.regs().mc_r14 }
+func (c *sigctxt) r15() uint64 { return c.regs().mc_r15 }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return c.regs().mc_rip }
+
+func (c *sigctxt) rflags() uint64 { return c.regs().mc_rflags }
+func (c *sigctxt) cs() uint64 { return c.regs().mc_cs }
+func (c *sigctxt) fs() uint64 { return uint64(c.regs().mc_fs) }
+func (c *sigctxt) gs() uint64 { return uint64(c.regs().mc_gs) }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_rip(x uint64) { c.regs().mc_rip = x }
+func (c *sigctxt) set_rsp(x uint64) { c.regs().mc_rsp = x }
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) { c.info.si_addr = x }
diff --git a/src/runtime/signal_freebsd_arm.go b/src/runtime/signal_freebsd_arm.go
new file mode 100644
index 0000000..2135c1e
--- /dev/null
+++ b/src/runtime/signal_freebsd_arm.go
@@ -0,0 +1,55 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint32 { return c.regs().__gregs[0] }
+func (c *sigctxt) r1() uint32 { return c.regs().__gregs[1] }
+func (c *sigctxt) r2() uint32 { return c.regs().__gregs[2] }
+func (c *sigctxt) r3() uint32 { return c.regs().__gregs[3] }
+func (c *sigctxt) r4() uint32 { return c.regs().__gregs[4] }
+func (c *sigctxt) r5() uint32 { return c.regs().__gregs[5] }
+func (c *sigctxt) r6() uint32 { return c.regs().__gregs[6] }
+func (c *sigctxt) r7() uint32 { return c.regs().__gregs[7] }
+func (c *sigctxt) r8() uint32 { return c.regs().__gregs[8] }
+func (c *sigctxt) r9() uint32 { return c.regs().__gregs[9] }
+func (c *sigctxt) r10() uint32 { return c.regs().__gregs[10] }
+func (c *sigctxt) fp() uint32 { return c.regs().__gregs[11] }
+func (c *sigctxt) ip() uint32 { return c.regs().__gregs[12] }
+func (c *sigctxt) sp() uint32 { return c.regs().__gregs[13] }
+func (c *sigctxt) lr() uint32 { return c.regs().__gregs[14] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint32 { return c.regs().__gregs[15] }
+
+func (c *sigctxt) cpsr() uint32 { return c.regs().__gregs[16] }
+func (c *sigctxt) fault() uintptr { return uintptr(c.info.si_addr) }
+func (c *sigctxt) trap() uint32 { return 0 }
+func (c *sigctxt) error() uint32 { return 0 }
+func (c *sigctxt) oldmask() uint32 { return 0 }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint32 { return uint32(c.info.si_addr) }
+
+func (c *sigctxt) set_pc(x uint32) { c.regs().__gregs[15] = x }
+func (c *sigctxt) set_sp(x uint32) { c.regs().__gregs[13] = x }
+func (c *sigctxt) set_lr(x uint32) { c.regs().__gregs[14] = x }
+func (c *sigctxt) set_r10(x uint32) { c.regs().__gregs[10] = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) {
+ c.info.si_addr = uintptr(x)
+}
diff --git a/src/runtime/signal_freebsd_arm64.go b/src/runtime/signal_freebsd_arm64.go
new file mode 100644
index 0000000..159e965
--- /dev/null
+++ b/src/runtime/signal_freebsd_arm64.go
@@ -0,0 +1,66 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint64 { return c.regs().mc_gpregs.gp_x[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().mc_gpregs.gp_x[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().mc_gpregs.gp_x[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().mc_gpregs.gp_x[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().mc_gpregs.gp_x[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().mc_gpregs.gp_x[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().mc_gpregs.gp_x[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().mc_gpregs.gp_x[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().mc_gpregs.gp_x[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().mc_gpregs.gp_x[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().mc_gpregs.gp_x[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().mc_gpregs.gp_x[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().mc_gpregs.gp_x[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().mc_gpregs.gp_x[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().mc_gpregs.gp_x[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().mc_gpregs.gp_x[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().mc_gpregs.gp_x[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().mc_gpregs.gp_x[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().mc_gpregs.gp_x[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().mc_gpregs.gp_x[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().mc_gpregs.gp_x[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().mc_gpregs.gp_x[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().mc_gpregs.gp_x[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().mc_gpregs.gp_x[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().mc_gpregs.gp_x[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().mc_gpregs.gp_x[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().mc_gpregs.gp_x[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().mc_gpregs.gp_x[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().mc_gpregs.gp_x[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().mc_gpregs.gp_x[29] }
+func (c *sigctxt) lr() uint64 { return c.regs().mc_gpregs.gp_lr }
+func (c *sigctxt) sp() uint64 { return c.regs().mc_gpregs.gp_sp }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().mc_gpregs.gp_elr }
+
+func (c *sigctxt) fault() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_pc(x uint64) { c.regs().mc_gpregs.gp_elr = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().mc_gpregs.gp_sp = x }
+func (c *sigctxt) set_lr(x uint64) { c.regs().mc_gpregs.gp_lr = x }
+func (c *sigctxt) set_r28(x uint64) { c.regs().mc_gpregs.gp_x[28] = x }
+
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) { c.info.si_addr = x }
diff --git a/src/runtime/signal_linux_386.go b/src/runtime/signal_linux_386.go
new file mode 100644
index 0000000..13d9df4
--- /dev/null
+++ b/src/runtime/signal_linux_386.go
@@ -0,0 +1,46 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) eax() uint32 { return c.regs().eax }
+func (c *sigctxt) ebx() uint32 { return c.regs().ebx }
+func (c *sigctxt) ecx() uint32 { return c.regs().ecx }
+func (c *sigctxt) edx() uint32 { return c.regs().edx }
+func (c *sigctxt) edi() uint32 { return c.regs().edi }
+func (c *sigctxt) esi() uint32 { return c.regs().esi }
+func (c *sigctxt) ebp() uint32 { return c.regs().ebp }
+func (c *sigctxt) esp() uint32 { return c.regs().esp }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) eip() uint32 { return c.regs().eip }
+
+func (c *sigctxt) eflags() uint32 { return c.regs().eflags }
+func (c *sigctxt) cs() uint32 { return uint32(c.regs().cs) }
+func (c *sigctxt) fs() uint32 { return uint32(c.regs().fs) }
+func (c *sigctxt) gs() uint32 { return uint32(c.regs().gs) }
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint32 { return c.info.si_addr }
+
+func (c *sigctxt) set_eip(x uint32) { c.regs().eip = x }
+func (c *sigctxt) set_esp(x uint32) { c.regs().esp = x }
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*sys.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_amd64.go b/src/runtime/signal_linux_amd64.go
new file mode 100644
index 0000000..210e896
--- /dev/null
+++ b/src/runtime/signal_linux_amd64.go
@@ -0,0 +1,56 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext {
+ return (*sigcontext)(unsafe.Pointer(&(*ucontext)(c.ctxt).uc_mcontext))
+}
+
+func (c *sigctxt) rax() uint64 { return c.regs().rax }
+func (c *sigctxt) rbx() uint64 { return c.regs().rbx }
+func (c *sigctxt) rcx() uint64 { return c.regs().rcx }
+func (c *sigctxt) rdx() uint64 { return c.regs().rdx }
+func (c *sigctxt) rdi() uint64 { return c.regs().rdi }
+func (c *sigctxt) rsi() uint64 { return c.regs().rsi }
+func (c *sigctxt) rbp() uint64 { return c.regs().rbp }
+func (c *sigctxt) rsp() uint64 { return c.regs().rsp }
+func (c *sigctxt) r8() uint64 { return c.regs().r8 }
+func (c *sigctxt) r9() uint64 { return c.regs().r9 }
+func (c *sigctxt) r10() uint64 { return c.regs().r10 }
+func (c *sigctxt) r11() uint64 { return c.regs().r11 }
+func (c *sigctxt) r12() uint64 { return c.regs().r12 }
+func (c *sigctxt) r13() uint64 { return c.regs().r13 }
+func (c *sigctxt) r14() uint64 { return c.regs().r14 }
+func (c *sigctxt) r15() uint64 { return c.regs().r15 }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return c.regs().rip }
+
+func (c *sigctxt) rflags() uint64 { return c.regs().eflags }
+func (c *sigctxt) cs() uint64 { return uint64(c.regs().cs) }
+func (c *sigctxt) fs() uint64 { return uint64(c.regs().fs) }
+func (c *sigctxt) gs() uint64 { return uint64(c.regs().gs) }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_rip(x uint64) { c.regs().rip = x }
+func (c *sigctxt) set_rsp(x uint64) { c.regs().rsp = x }
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*sys.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_arm.go b/src/runtime/signal_linux_arm.go
new file mode 100644
index 0000000..876b505
--- /dev/null
+++ b/src/runtime/signal_linux_arm.go
@@ -0,0 +1,58 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint32 { return c.regs().r0 }
+func (c *sigctxt) r1() uint32 { return c.regs().r1 }
+func (c *sigctxt) r2() uint32 { return c.regs().r2 }
+func (c *sigctxt) r3() uint32 { return c.regs().r3 }
+func (c *sigctxt) r4() uint32 { return c.regs().r4 }
+func (c *sigctxt) r5() uint32 { return c.regs().r5 }
+func (c *sigctxt) r6() uint32 { return c.regs().r6 }
+func (c *sigctxt) r7() uint32 { return c.regs().r7 }
+func (c *sigctxt) r8() uint32 { return c.regs().r8 }
+func (c *sigctxt) r9() uint32 { return c.regs().r9 }
+func (c *sigctxt) r10() uint32 { return c.regs().r10 }
+func (c *sigctxt) fp() uint32 { return c.regs().fp }
+func (c *sigctxt) ip() uint32 { return c.regs().ip }
+func (c *sigctxt) sp() uint32 { return c.regs().sp }
+func (c *sigctxt) lr() uint32 { return c.regs().lr }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint32 { return c.regs().pc }
+
+func (c *sigctxt) cpsr() uint32 { return c.regs().cpsr }
+func (c *sigctxt) fault() uintptr { return uintptr(c.regs().fault_address) }
+func (c *sigctxt) trap() uint32 { return c.regs().trap_no }
+func (c *sigctxt) error() uint32 { return c.regs().error_code }
+func (c *sigctxt) oldmask() uint32 { return c.regs().oldmask }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint32 { return c.info.si_addr }
+
+func (c *sigctxt) set_pc(x uint32) { c.regs().pc = x }
+func (c *sigctxt) set_sp(x uint32) { c.regs().sp = x }
+func (c *sigctxt) set_lr(x uint32) { c.regs().lr = x }
+func (c *sigctxt) set_r10(x uint32) { c.regs().r10 = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*sys.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_arm64.go b/src/runtime/signal_linux_arm64.go
new file mode 100644
index 0000000..2075f25
--- /dev/null
+++ b/src/runtime/signal_linux_arm64.go
@@ -0,0 +1,72 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint64 { return c.regs().regs[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().regs[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().regs[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().regs[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().regs[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().regs[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().regs[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().regs[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().regs[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().regs[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().regs[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().regs[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().regs[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().regs[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().regs[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().regs[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().regs[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().regs[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().regs[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().regs[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().regs[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().regs[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().regs[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().regs[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().regs[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().regs[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().regs[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().regs[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().regs[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().regs[29] }
+func (c *sigctxt) lr() uint64 { return c.regs().regs[30] }
+func (c *sigctxt) sp() uint64 { return c.regs().sp }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().pc }
+
+func (c *sigctxt) pstate() uint64 { return c.regs().pstate }
+func (c *sigctxt) fault() uintptr { return uintptr(c.regs().fault_address) }
+
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_pc(x uint64) { c.regs().pc = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().sp = x }
+func (c *sigctxt) set_lr(x uint64) { c.regs().regs[30] = x }
+func (c *sigctxt) set_r28(x uint64) { c.regs().regs[28] = x }
+
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*sys.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_mips64x.go b/src/runtime/signal_linux_mips64x.go
new file mode 100644
index 0000000..b608197
--- /dev/null
+++ b/src/runtime/signal_linux_mips64x.go
@@ -0,0 +1,78 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build mips64 mips64le
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint64 { return c.regs().sc_regs[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().sc_regs[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().sc_regs[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().sc_regs[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().sc_regs[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().sc_regs[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().sc_regs[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().sc_regs[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().sc_regs[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().sc_regs[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().sc_regs[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().sc_regs[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().sc_regs[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().sc_regs[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().sc_regs[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().sc_regs[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().sc_regs[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().sc_regs[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().sc_regs[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().sc_regs[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().sc_regs[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().sc_regs[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().sc_regs[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().sc_regs[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().sc_regs[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().sc_regs[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().sc_regs[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().sc_regs[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().sc_regs[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().sc_regs[29] }
+func (c *sigctxt) r30() uint64 { return c.regs().sc_regs[30] }
+func (c *sigctxt) r31() uint64 { return c.regs().sc_regs[31] }
+func (c *sigctxt) sp() uint64 { return c.regs().sc_regs[29] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().sc_pc }
+
+func (c *sigctxt) link() uint64 { return c.regs().sc_regs[31] }
+func (c *sigctxt) lo() uint64 { return c.regs().sc_mdlo }
+func (c *sigctxt) hi() uint64 { return c.regs().sc_mdhi }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_r28(x uint64) { c.regs().sc_regs[28] = x }
+func (c *sigctxt) set_r30(x uint64) { c.regs().sc_regs[30] = x }
+func (c *sigctxt) set_pc(x uint64) { c.regs().sc_pc = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().sc_regs[29] = x }
+func (c *sigctxt) set_link(x uint64) { c.regs().sc_regs[31] = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*sys.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_mipsx.go b/src/runtime/signal_linux_mipsx.go
new file mode 100644
index 0000000..c88ac4d
--- /dev/null
+++ b/src/runtime/signal_linux_mipsx.go
@@ -0,0 +1,65 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build mips mipsle
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+func (c *sigctxt) regs() *sigcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+func (c *sigctxt) r0() uint32 { return uint32(c.regs().sc_regs[0]) }
+func (c *sigctxt) r1() uint32 { return uint32(c.regs().sc_regs[1]) }
+func (c *sigctxt) r2() uint32 { return uint32(c.regs().sc_regs[2]) }
+func (c *sigctxt) r3() uint32 { return uint32(c.regs().sc_regs[3]) }
+func (c *sigctxt) r4() uint32 { return uint32(c.regs().sc_regs[4]) }
+func (c *sigctxt) r5() uint32 { return uint32(c.regs().sc_regs[5]) }
+func (c *sigctxt) r6() uint32 { return uint32(c.regs().sc_regs[6]) }
+func (c *sigctxt) r7() uint32 { return uint32(c.regs().sc_regs[7]) }
+func (c *sigctxt) r8() uint32 { return uint32(c.regs().sc_regs[8]) }
+func (c *sigctxt) r9() uint32 { return uint32(c.regs().sc_regs[9]) }
+func (c *sigctxt) r10() uint32 { return uint32(c.regs().sc_regs[10]) }
+func (c *sigctxt) r11() uint32 { return uint32(c.regs().sc_regs[11]) }
+func (c *sigctxt) r12() uint32 { return uint32(c.regs().sc_regs[12]) }
+func (c *sigctxt) r13() uint32 { return uint32(c.regs().sc_regs[13]) }
+func (c *sigctxt) r14() uint32 { return uint32(c.regs().sc_regs[14]) }
+func (c *sigctxt) r15() uint32 { return uint32(c.regs().sc_regs[15]) }
+func (c *sigctxt) r16() uint32 { return uint32(c.regs().sc_regs[16]) }
+func (c *sigctxt) r17() uint32 { return uint32(c.regs().sc_regs[17]) }
+func (c *sigctxt) r18() uint32 { return uint32(c.regs().sc_regs[18]) }
+func (c *sigctxt) r19() uint32 { return uint32(c.regs().sc_regs[19]) }
+func (c *sigctxt) r20() uint32 { return uint32(c.regs().sc_regs[20]) }
+func (c *sigctxt) r21() uint32 { return uint32(c.regs().sc_regs[21]) }
+func (c *sigctxt) r22() uint32 { return uint32(c.regs().sc_regs[22]) }
+func (c *sigctxt) r23() uint32 { return uint32(c.regs().sc_regs[23]) }
+func (c *sigctxt) r24() uint32 { return uint32(c.regs().sc_regs[24]) }
+func (c *sigctxt) r25() uint32 { return uint32(c.regs().sc_regs[25]) }
+func (c *sigctxt) r26() uint32 { return uint32(c.regs().sc_regs[26]) }
+func (c *sigctxt) r27() uint32 { return uint32(c.regs().sc_regs[27]) }
+func (c *sigctxt) r28() uint32 { return uint32(c.regs().sc_regs[28]) }
+func (c *sigctxt) r29() uint32 { return uint32(c.regs().sc_regs[29]) }
+func (c *sigctxt) r30() uint32 { return uint32(c.regs().sc_regs[30]) }
+func (c *sigctxt) r31() uint32 { return uint32(c.regs().sc_regs[31]) }
+func (c *sigctxt) sp() uint32 { return uint32(c.regs().sc_regs[29]) }
+func (c *sigctxt) pc() uint32 { return uint32(c.regs().sc_pc) }
+func (c *sigctxt) link() uint32 { return uint32(c.regs().sc_regs[31]) }
+func (c *sigctxt) lo() uint32 { return uint32(c.regs().sc_mdlo) }
+func (c *sigctxt) hi() uint32 { return uint32(c.regs().sc_mdhi) }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint32 { return c.info.si_addr }
+
+func (c *sigctxt) set_r30(x uint32) { c.regs().sc_regs[30] = uint64(x) }
+func (c *sigctxt) set_pc(x uint32) { c.regs().sc_pc = uint64(x) }
+func (c *sigctxt) set_sp(x uint32) { c.regs().sc_regs[29] = uint64(x) }
+func (c *sigctxt) set_link(x uint32) { c.regs().sc_regs[31] = uint64(x) }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) { c.info.si_addr = x }
diff --git a/src/runtime/signal_linux_ppc64x.go b/src/runtime/signal_linux_ppc64x.go
new file mode 100644
index 0000000..97cb26d
--- /dev/null
+++ b/src/runtime/signal_linux_ppc64x.go
@@ -0,0 +1,82 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build ppc64 ppc64le
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *ptregs { return (*ucontext)(c.ctxt).uc_mcontext.regs }
+
+func (c *sigctxt) r0() uint64 { return c.regs().gpr[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().gpr[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().gpr[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().gpr[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().gpr[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().gpr[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().gpr[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().gpr[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().gpr[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().gpr[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().gpr[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().gpr[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().gpr[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().gpr[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().gpr[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().gpr[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().gpr[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().gpr[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().gpr[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().gpr[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().gpr[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().gpr[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().gpr[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().gpr[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().gpr[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().gpr[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().gpr[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().gpr[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().gpr[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().gpr[29] }
+func (c *sigctxt) r30() uint64 { return c.regs().gpr[30] }
+func (c *sigctxt) r31() uint64 { return c.regs().gpr[31] }
+func (c *sigctxt) sp() uint64 { return c.regs().gpr[1] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().nip }
+
+func (c *sigctxt) trap() uint64 { return c.regs().trap }
+func (c *sigctxt) ctr() uint64 { return c.regs().ctr }
+func (c *sigctxt) link() uint64 { return c.regs().link }
+func (c *sigctxt) xer() uint64 { return c.regs().xer }
+func (c *sigctxt) ccr() uint64 { return c.regs().ccr }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+func (c *sigctxt) fault() uintptr { return uintptr(c.regs().dar) }
+
+func (c *sigctxt) set_r0(x uint64) { c.regs().gpr[0] = x }
+func (c *sigctxt) set_r12(x uint64) { c.regs().gpr[12] = x }
+func (c *sigctxt) set_r30(x uint64) { c.regs().gpr[30] = x }
+func (c *sigctxt) set_pc(x uint64) { c.regs().nip = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().gpr[1] = x }
+func (c *sigctxt) set_link(x uint64) { c.regs().link = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*sys.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_riscv64.go b/src/runtime/signal_linux_riscv64.go
new file mode 100644
index 0000000..9f68e5c
--- /dev/null
+++ b/src/runtime/signal_linux_riscv64.go
@@ -0,0 +1,68 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) ra() uint64 { return c.regs().sc_regs.ra }
+func (c *sigctxt) sp() uint64 { return c.regs().sc_regs.sp }
+func (c *sigctxt) gp() uint64 { return c.regs().sc_regs.gp }
+func (c *sigctxt) tp() uint64 { return c.regs().sc_regs.tp }
+func (c *sigctxt) t0() uint64 { return c.regs().sc_regs.t0 }
+func (c *sigctxt) t1() uint64 { return c.regs().sc_regs.t1 }
+func (c *sigctxt) t2() uint64 { return c.regs().sc_regs.t2 }
+func (c *sigctxt) s0() uint64 { return c.regs().sc_regs.s0 }
+func (c *sigctxt) s1() uint64 { return c.regs().sc_regs.s1 }
+func (c *sigctxt) a0() uint64 { return c.regs().sc_regs.a0 }
+func (c *sigctxt) a1() uint64 { return c.regs().sc_regs.a1 }
+func (c *sigctxt) a2() uint64 { return c.regs().sc_regs.a2 }
+func (c *sigctxt) a3() uint64 { return c.regs().sc_regs.a3 }
+func (c *sigctxt) a4() uint64 { return c.regs().sc_regs.a4 }
+func (c *sigctxt) a5() uint64 { return c.regs().sc_regs.a5 }
+func (c *sigctxt) a6() uint64 { return c.regs().sc_regs.a6 }
+func (c *sigctxt) a7() uint64 { return c.regs().sc_regs.a7 }
+func (c *sigctxt) s2() uint64 { return c.regs().sc_regs.s2 }
+func (c *sigctxt) s3() uint64 { return c.regs().sc_regs.s3 }
+func (c *sigctxt) s4() uint64 { return c.regs().sc_regs.s4 }
+func (c *sigctxt) s5() uint64 { return c.regs().sc_regs.s5 }
+func (c *sigctxt) s6() uint64 { return c.regs().sc_regs.s6 }
+func (c *sigctxt) s7() uint64 { return c.regs().sc_regs.s7 }
+func (c *sigctxt) s8() uint64 { return c.regs().sc_regs.s8 }
+func (c *sigctxt) s9() uint64 { return c.regs().sc_regs.s9 }
+func (c *sigctxt) s10() uint64 { return c.regs().sc_regs.s10 }
+func (c *sigctxt) s11() uint64 { return c.regs().sc_regs.s11 }
+func (c *sigctxt) t3() uint64 { return c.regs().sc_regs.t3 }
+func (c *sigctxt) t4() uint64 { return c.regs().sc_regs.t4 }
+func (c *sigctxt) t5() uint64 { return c.regs().sc_regs.t5 }
+func (c *sigctxt) t6() uint64 { return c.regs().sc_regs.t6 }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().sc_regs.pc }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_pc(x uint64) { c.regs().sc_regs.pc = x }
+func (c *sigctxt) set_ra(x uint64) { c.regs().sc_regs.ra = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().sc_regs.sp = x }
+func (c *sigctxt) set_gp(x uint64) { c.regs().sc_regs.gp = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*sys.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_s390x.go b/src/runtime/signal_linux_s390x.go
new file mode 100644
index 0000000..12d5c31
--- /dev/null
+++ b/src/runtime/signal_linux_s390x.go
@@ -0,0 +1,125 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext {
+ return (*sigcontext)(unsafe.Pointer(&(*ucontext)(c.ctxt).uc_mcontext))
+}
+
+func (c *sigctxt) r0() uint64 { return c.regs().gregs[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().gregs[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().gregs[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().gregs[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().gregs[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().gregs[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().gregs[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().gregs[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().gregs[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().gregs[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().gregs[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().gregs[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().gregs[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().gregs[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().gregs[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().gregs[15] }
+func (c *sigctxt) link() uint64 { return c.regs().gregs[14] }
+func (c *sigctxt) sp() uint64 { return c.regs().gregs[15] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().psw_addr }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_r0(x uint64) { c.regs().gregs[0] = x }
+func (c *sigctxt) set_r13(x uint64) { c.regs().gregs[13] = x }
+func (c *sigctxt) set_link(x uint64) { c.regs().gregs[14] = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().gregs[15] = x }
+func (c *sigctxt) set_pc(x uint64) { c.regs().psw_addr = x }
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*sys.PtrSize)) = uintptr(x)
+}
+
+func dumpregs(c *sigctxt) {
+ print("r0 ", hex(c.r0()), "\t")
+ print("r1 ", hex(c.r1()), "\n")
+ print("r2 ", hex(c.r2()), "\t")
+ print("r3 ", hex(c.r3()), "\n")
+ print("r4 ", hex(c.r4()), "\t")
+ print("r5 ", hex(c.r5()), "\n")
+ print("r6 ", hex(c.r6()), "\t")
+ print("r7 ", hex(c.r7()), "\n")
+ print("r8 ", hex(c.r8()), "\t")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\t")
+ print("r11 ", hex(c.r11()), "\n")
+ print("r12 ", hex(c.r12()), "\t")
+ print("r13 ", hex(c.r13()), "\n")
+ print("r14 ", hex(c.r14()), "\t")
+ print("r15 ", hex(c.r15()), "\n")
+ print("pc ", hex(c.pc()), "\t")
+ print("link ", hex(c.link()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.link()) }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // We arrange link and pc to pretend the panicking
+ // function calls sigpanic directly.
+ // Always save LINK to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - sys.MinFrameSize
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.link()
+
+ pc := uintptr(gp.sigpc)
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.link())) {
+ // Make it look like the faulting PC called sigpanic.
+ c.set_link(uint64(pc))
+ }
+
+ // In case we are panicking from external C code
+ c.set_r0(0)
+ c.set_r13(uint64(uintptr(unsafe.Pointer(gp))))
+ c.set_pc(uint64(funcPC(sigpanic)))
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra slot is known to gentraceback.
+ sp := c.sp() - 8
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.link()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_link(uint64(resumePC))
+ c.set_pc(uint64(targetPC))
+}
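
Both preparePanic and pushCall above rely on the same link-register trick: carve one slot off the interrupted stack, spill the old LR there so the unwinder still sees it, then redirect LR and PC. A self-contained toy model of that transformation (plain variables and a map stand in for the real context and stack; nothing here is runtime API, and the unconditional LR rewrite glosses over the shouldPushSigpanic check):

    package main

    import "fmt"

    // fakeCtx models only the pieces preparePanic touches.
    type fakeCtx struct {
        sp, lr, pc uint64
        mem        map[uint64]uint64 // addr -> value, standing in for the goroutine stack
    }

    // fakePreparePanic mirrors the shape of preparePanic above: spill LR,
    // pretend the faulting PC called the panic entry point, resume there.
    func fakePreparePanic(c *fakeCtx, faultPC, sigpanicPC uint64) {
        c.sp -= 8          // reserve one frame slot (MinFrameSize on s390x)
        c.mem[c.sp] = c.lr // keep the old LR visible to the unwinder
        c.lr = faultPC     // "the faulting PC called sigpanic"
        c.pc = sigpanicPC  // execution resumes in the panic path
    }

    func main() {
        c := &fakeCtx{sp: 0xc000001000, lr: 0x401000, pc: 0x402044, mem: map[uint64]uint64{}}
        fakePreparePanic(c, c.pc, 0x403000)
        fmt.Printf("sp=%#x lr=%#x pc=%#x saved lr=%#x\n", c.sp, c.lr, c.pc, c.mem[c.sp])
    }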
diff --git a/src/runtime/signal_mips64x.go b/src/runtime/signal_mips64x.go
new file mode 100644
index 0000000..2a347ff
--- /dev/null
+++ b/src/runtime/signal_mips64x.go
@@ -0,0 +1,100 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux openbsd
+// +build mips64 mips64le
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("r0 ", hex(c.r0()), "\t")
+ print("r1 ", hex(c.r1()), "\n")
+ print("r2 ", hex(c.r2()), "\t")
+ print("r3 ", hex(c.r3()), "\n")
+ print("r4 ", hex(c.r4()), "\t")
+ print("r5 ", hex(c.r5()), "\n")
+ print("r6 ", hex(c.r6()), "\t")
+ print("r7 ", hex(c.r7()), "\n")
+ print("r8 ", hex(c.r8()), "\t")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\t")
+ print("r11 ", hex(c.r11()), "\n")
+ print("r12 ", hex(c.r12()), "\t")
+ print("r13 ", hex(c.r13()), "\n")
+ print("r14 ", hex(c.r14()), "\t")
+ print("r15 ", hex(c.r15()), "\n")
+ print("r16 ", hex(c.r16()), "\t")
+ print("r17 ", hex(c.r17()), "\n")
+ print("r18 ", hex(c.r18()), "\t")
+ print("r19 ", hex(c.r19()), "\n")
+ print("r20 ", hex(c.r20()), "\t")
+ print("r21 ", hex(c.r21()), "\n")
+ print("r22 ", hex(c.r22()), "\t")
+ print("r23 ", hex(c.r23()), "\n")
+ print("r24 ", hex(c.r24()), "\t")
+ print("r25 ", hex(c.r25()), "\n")
+ print("r26 ", hex(c.r26()), "\t")
+ print("r27 ", hex(c.r27()), "\n")
+ print("r28 ", hex(c.r28()), "\t")
+ print("r29 ", hex(c.r29()), "\n")
+ print("r30 ", hex(c.r30()), "\t")
+ print("r31 ", hex(c.r31()), "\n")
+ print("pc ", hex(c.pc()), "\t")
+ print("link ", hex(c.link()), "\n")
+ print("lo ", hex(c.lo()), "\t")
+ print("hi ", hex(c.hi()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.link()) }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // We arrange link and pc to pretend the panicking
+ // function calls sigpanic directly.
+ // Always save LINK to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - sys.PtrSize
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.link()
+
+ pc := gp.sigpc
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.link())) {
+ // Make it look like the faulting PC called sigpanic.
+ c.set_link(uint64(pc))
+ }
+
+ // In case we are panicking from external C code
+ sigpanicPC := uint64(funcPC(sigpanic))
+ c.set_r28(sigpanicPC >> 32 << 32) // RSB register
+ c.set_r30(uint64(uintptr(unsafe.Pointer(gp))))
+ c.set_pc(sigpanicPC)
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra slot is known to gentraceback.
+ sp := c.sp() - 8
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.link()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_link(uint64(resumePC))
+ c.set_pc(uint64(targetPC))
+}
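
The one mips64-specific step above is c.set_r28(sigpanicPC >> 32 << 32), commented as the RSB register: shifting down and back up by 32 simply clears the low 32 bits of sigpanic's address, leaving a 4GB-aligned base. A one-line standalone check of that arithmetic:

    package main

    import "fmt"

    func main() {
        // Same expression as set_r28's argument above: keep only the top 32 bits.
        sigpanicPC := uint64(0xfedcba9876)
        rsb := sigpanicPC >> 32 << 32
        fmt.Printf("%#x -> %#x\n", sigpanicPC, rsb) // prints 0xfedcba9876 -> 0xfe00000000
    }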
diff --git a/src/runtime/signal_mipsx.go b/src/runtime/signal_mipsx.go
new file mode 100644
index 0000000..8c29f59
--- /dev/null
+++ b/src/runtime/signal_mipsx.go
@@ -0,0 +1,95 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build mips mipsle
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("r0 ", hex(c.r0()), "\t")
+ print("r1 ", hex(c.r1()), "\n")
+ print("r2 ", hex(c.r2()), "\t")
+ print("r3 ", hex(c.r3()), "\n")
+ print("r4 ", hex(c.r4()), "\t")
+ print("r5 ", hex(c.r5()), "\n")
+ print("r6 ", hex(c.r6()), "\t")
+ print("r7 ", hex(c.r7()), "\n")
+ print("r8 ", hex(c.r8()), "\t")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\t")
+ print("r11 ", hex(c.r11()), "\n")
+ print("r12 ", hex(c.r12()), "\t")
+ print("r13 ", hex(c.r13()), "\n")
+ print("r14 ", hex(c.r14()), "\t")
+ print("r15 ", hex(c.r15()), "\n")
+ print("r16 ", hex(c.r16()), "\t")
+ print("r17 ", hex(c.r17()), "\n")
+ print("r18 ", hex(c.r18()), "\t")
+ print("r19 ", hex(c.r19()), "\n")
+ print("r20 ", hex(c.r20()), "\t")
+ print("r21 ", hex(c.r21()), "\n")
+ print("r22 ", hex(c.r22()), "\t")
+ print("r23 ", hex(c.r23()), "\n")
+ print("r24 ", hex(c.r24()), "\t")
+ print("r25 ", hex(c.r25()), "\n")
+ print("r26 ", hex(c.r26()), "\t")
+ print("r27 ", hex(c.r27()), "\n")
+ print("r28 ", hex(c.r28()), "\t")
+ print("r29 ", hex(c.r29()), "\n")
+ print("r30 ", hex(c.r30()), "\t")
+ print("r31 ", hex(c.r31()), "\n")
+ print("pc ", hex(c.pc()), "\t")
+ print("link ", hex(c.link()), "\n")
+ print("lo ", hex(c.lo()), "\t")
+ print("hi ", hex(c.hi()), "\n")
+}
+
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.link()) }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // We arrange link and pc to pretend the panicking
+ // function calls sigpanic directly.
+ // Always save LINK to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - sys.MinFrameSize
+ c.set_sp(sp)
+ *(*uint32)(unsafe.Pointer(uintptr(sp))) = c.link()
+
+ pc := gp.sigpc
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.link())) {
+ // Make it look like the faulting PC called sigpanic.
+ c.set_link(uint32(pc))
+ }
+
+ // In case we are panicking from external C code
+ c.set_r30(uint32(uintptr(unsafe.Pointer(gp))))
+ c.set_pc(uint32(funcPC(sigpanic)))
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra slot is known to gentraceback.
+ sp := c.sp() - 4
+ c.set_sp(sp)
+ *(*uint32)(unsafe.Pointer(uintptr(sp))) = c.link()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_link(uint32(resumePC))
+ c.set_pc(uint32(targetPC))
+}
diff --git a/src/runtime/signal_netbsd.go b/src/runtime/signal_netbsd.go
new file mode 100644
index 0000000..ca51084
--- /dev/null
+++ b/src/runtime/signal_netbsd.go
@@ -0,0 +1,41 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ /* 7 */ {_SigThrow, "SIGEMT: emulate instruction executed"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ /* 9 */ {0, "SIGKILL: kill"},
+ /* 10 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigThrow, "SIGSYS: bad system call"},
+ /* 13 */ {_SigNotify, "SIGPIPE: write to broken pipe"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: termination"},
+ /* 16 */ {_SigNotify + _SigIgn, "SIGURG: urgent condition on socket"},
+ /* 17 */ {0, "SIGSTOP: stop"},
+ /* 18 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: keyboard stop"},
+ /* 19 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: continue after stop"},
+ /* 20 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status has changed"},
+ /* 21 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background read from tty"},
+ /* 22 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background write to tty"},
+ /* 23 */ {_SigNotify + _SigIgn, "SIGIO: i/o now possible"},
+ /* 24 */ {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ /* 25 */ {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ /* 26 */ {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ /* 27 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ /* 28 */ {_SigNotify + _SigIgn, "SIGWINCH: window size change"},
+ /* 29 */ {_SigNotify + _SigIgn, "SIGINFO: status request from keyboard"},
+ /* 30 */ {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ /* 31 */ {_SigNotify, "SIGUSR2: user-defined signal 2"},
+ /* 32 */ {_SigNotify, "SIGTHR: reserved"},
+}
diff --git a/src/runtime/signal_netbsd_386.go b/src/runtime/signal_netbsd_386.go
new file mode 100644
index 0000000..845a575
--- /dev/null
+++ b/src/runtime/signal_netbsd_386.go
@@ -0,0 +1,45 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontextt { return &(*ucontextt)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) eax() uint32 { return c.regs().__gregs[_REG_EAX] }
+func (c *sigctxt) ebx() uint32 { return c.regs().__gregs[_REG_EBX] }
+func (c *sigctxt) ecx() uint32 { return c.regs().__gregs[_REG_ECX] }
+func (c *sigctxt) edx() uint32 { return c.regs().__gregs[_REG_EDX] }
+func (c *sigctxt) edi() uint32 { return c.regs().__gregs[_REG_EDI] }
+func (c *sigctxt) esi() uint32 { return c.regs().__gregs[_REG_ESI] }
+func (c *sigctxt) ebp() uint32 { return c.regs().__gregs[_REG_EBP] }
+func (c *sigctxt) esp() uint32 { return c.regs().__gregs[_REG_UESP] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) eip() uint32 { return c.regs().__gregs[_REG_EIP] }
+
+func (c *sigctxt) eflags() uint32 { return c.regs().__gregs[_REG_EFL] }
+func (c *sigctxt) cs() uint32 { return c.regs().__gregs[_REG_CS] }
+func (c *sigctxt) fs() uint32 { return c.regs().__gregs[_REG_FS] }
+func (c *sigctxt) gs() uint32 { return c.regs().__gregs[_REG_GS] }
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info._code) }
+func (c *sigctxt) sigaddr() uint32 {
+ return *(*uint32)(unsafe.Pointer(&c.info._reason[0]))
+}
+
+func (c *sigctxt) set_eip(x uint32) { c.regs().__gregs[_REG_EIP] = x }
+func (c *sigctxt) set_esp(x uint32) { c.regs().__gregs[_REG_UESP] = x }
+func (c *sigctxt) set_sigcode(x uint32) { c.info._code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) {
+ *(*uint32)(unsafe.Pointer(&c.info._reason[0])) = x
+}
diff --git a/src/runtime/signal_netbsd_amd64.go b/src/runtime/signal_netbsd_amd64.go
new file mode 100644
index 0000000..67fe437
--- /dev/null
+++ b/src/runtime/signal_netbsd_amd64.go
@@ -0,0 +1,55 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontextt {
+ return (*mcontextt)(unsafe.Pointer(&(*ucontextt)(c.ctxt).uc_mcontext))
+}
+
+func (c *sigctxt) rax() uint64 { return c.regs().__gregs[_REG_RAX] }
+func (c *sigctxt) rbx() uint64 { return c.regs().__gregs[_REG_RBX] }
+func (c *sigctxt) rcx() uint64 { return c.regs().__gregs[_REG_RCX] }
+func (c *sigctxt) rdx() uint64 { return c.regs().__gregs[_REG_RDX] }
+func (c *sigctxt) rdi() uint64 { return c.regs().__gregs[_REG_RDI] }
+func (c *sigctxt) rsi() uint64 { return c.regs().__gregs[_REG_RSI] }
+func (c *sigctxt) rbp() uint64 { return c.regs().__gregs[_REG_RBP] }
+func (c *sigctxt) rsp() uint64 { return c.regs().__gregs[_REG_RSP] }
+func (c *sigctxt) r8() uint64 { return c.regs().__gregs[_REG_R8] }
+func (c *sigctxt) r9() uint64 { return c.regs().__gregs[_REG_R9] }
+func (c *sigctxt) r10() uint64 { return c.regs().__gregs[_REG_R10] }
+func (c *sigctxt) r11() uint64 { return c.regs().__gregs[_REG_R11] }
+func (c *sigctxt) r12() uint64 { return c.regs().__gregs[_REG_R12] }
+func (c *sigctxt) r13() uint64 { return c.regs().__gregs[_REG_R13] }
+func (c *sigctxt) r14() uint64 { return c.regs().__gregs[_REG_R14] }
+func (c *sigctxt) r15() uint64 { return c.regs().__gregs[_REG_R15] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return c.regs().__gregs[_REG_RIP] }
+
+func (c *sigctxt) rflags() uint64 { return c.regs().__gregs[_REG_RFLAGS] }
+func (c *sigctxt) cs() uint64 { return c.regs().__gregs[_REG_CS] }
+func (c *sigctxt) fs() uint64 { return c.regs().__gregs[_REG_FS] }
+func (c *sigctxt) gs() uint64 { return c.regs().__gregs[_REG_GS] }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info._code) }
+func (c *sigctxt) sigaddr() uint64 {
+ return *(*uint64)(unsafe.Pointer(&c.info._reason[0]))
+}
+
+func (c *sigctxt) set_rip(x uint64) { c.regs().__gregs[_REG_RIP] = x }
+func (c *sigctxt) set_rsp(x uint64) { c.regs().__gregs[_REG_RSP] = x }
+func (c *sigctxt) set_sigcode(x uint64) { c.info._code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uint64)(unsafe.Pointer(&c.info._reason[0])) = x
+}
diff --git a/src/runtime/signal_netbsd_arm.go b/src/runtime/signal_netbsd_arm.go
new file mode 100644
index 0000000..fdb3078
--- /dev/null
+++ b/src/runtime/signal_netbsd_arm.go
@@ -0,0 +1,55 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontextt { return &(*ucontextt)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint32 { return c.regs().__gregs[_REG_R0] }
+func (c *sigctxt) r1() uint32 { return c.regs().__gregs[_REG_R1] }
+func (c *sigctxt) r2() uint32 { return c.regs().__gregs[_REG_R2] }
+func (c *sigctxt) r3() uint32 { return c.regs().__gregs[_REG_R3] }
+func (c *sigctxt) r4() uint32 { return c.regs().__gregs[_REG_R4] }
+func (c *sigctxt) r5() uint32 { return c.regs().__gregs[_REG_R5] }
+func (c *sigctxt) r6() uint32 { return c.regs().__gregs[_REG_R6] }
+func (c *sigctxt) r7() uint32 { return c.regs().__gregs[_REG_R7] }
+func (c *sigctxt) r8() uint32 { return c.regs().__gregs[_REG_R8] }
+func (c *sigctxt) r9() uint32 { return c.regs().__gregs[_REG_R9] }
+func (c *sigctxt) r10() uint32 { return c.regs().__gregs[_REG_R10] }
+func (c *sigctxt) fp() uint32 { return c.regs().__gregs[_REG_R11] }
+func (c *sigctxt) ip() uint32 { return c.regs().__gregs[_REG_R12] }
+func (c *sigctxt) sp() uint32 { return c.regs().__gregs[_REG_R13] }
+func (c *sigctxt) lr() uint32 { return c.regs().__gregs[_REG_R14] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint32 { return c.regs().__gregs[_REG_R15] }
+
+func (c *sigctxt) cpsr() uint32 { return c.regs().__gregs[_REG_CPSR] }
+func (c *sigctxt) fault() uintptr { return uintptr(c.info._reason) }
+func (c *sigctxt) trap() uint32 { return 0 }
+func (c *sigctxt) error() uint32 { return 0 }
+func (c *sigctxt) oldmask() uint32 { return 0 }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info._code) }
+func (c *sigctxt) sigaddr() uint32 { return uint32(c.info._reason) }
+
+func (c *sigctxt) set_pc(x uint32) { c.regs().__gregs[_REG_R15] = x }
+func (c *sigctxt) set_sp(x uint32) { c.regs().__gregs[_REG_R13] = x }
+func (c *sigctxt) set_lr(x uint32) { c.regs().__gregs[_REG_R14] = x }
+func (c *sigctxt) set_r10(x uint32) { c.regs().__gregs[_REG_R10] = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info._code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) {
+ c.info._reason = uintptr(x)
+}
diff --git a/src/runtime/signal_netbsd_arm64.go b/src/runtime/signal_netbsd_arm64.go
new file mode 100644
index 0000000..8dfdfea
--- /dev/null
+++ b/src/runtime/signal_netbsd_arm64.go
@@ -0,0 +1,73 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontextt {
+ return (*mcontextt)(unsafe.Pointer(&(*ucontextt)(c.ctxt).uc_mcontext))
+}
+
+func (c *sigctxt) r0() uint64 { return c.regs().__gregs[_REG_X0] }
+func (c *sigctxt) r1() uint64 { return c.regs().__gregs[_REG_X1] }
+func (c *sigctxt) r2() uint64 { return c.regs().__gregs[_REG_X2] }
+func (c *sigctxt) r3() uint64 { return c.regs().__gregs[_REG_X3] }
+func (c *sigctxt) r4() uint64 { return c.regs().__gregs[_REG_X4] }
+func (c *sigctxt) r5() uint64 { return c.regs().__gregs[_REG_X5] }
+func (c *sigctxt) r6() uint64 { return c.regs().__gregs[_REG_X6] }
+func (c *sigctxt) r7() uint64 { return c.regs().__gregs[_REG_X7] }
+func (c *sigctxt) r8() uint64 { return c.regs().__gregs[_REG_X8] }
+func (c *sigctxt) r9() uint64 { return c.regs().__gregs[_REG_X9] }
+func (c *sigctxt) r10() uint64 { return c.regs().__gregs[_REG_X10] }
+func (c *sigctxt) r11() uint64 { return c.regs().__gregs[_REG_X11] }
+func (c *sigctxt) r12() uint64 { return c.regs().__gregs[_REG_X12] }
+func (c *sigctxt) r13() uint64 { return c.regs().__gregs[_REG_X13] }
+func (c *sigctxt) r14() uint64 { return c.regs().__gregs[_REG_X14] }
+func (c *sigctxt) r15() uint64 { return c.regs().__gregs[_REG_X15] }
+func (c *sigctxt) r16() uint64 { return c.regs().__gregs[_REG_X16] }
+func (c *sigctxt) r17() uint64 { return c.regs().__gregs[_REG_X17] }
+func (c *sigctxt) r18() uint64 { return c.regs().__gregs[_REG_X18] }
+func (c *sigctxt) r19() uint64 { return c.regs().__gregs[_REG_X19] }
+func (c *sigctxt) r20() uint64 { return c.regs().__gregs[_REG_X20] }
+func (c *sigctxt) r21() uint64 { return c.regs().__gregs[_REG_X21] }
+func (c *sigctxt) r22() uint64 { return c.regs().__gregs[_REG_X22] }
+func (c *sigctxt) r23() uint64 { return c.regs().__gregs[_REG_X23] }
+func (c *sigctxt) r24() uint64 { return c.regs().__gregs[_REG_X24] }
+func (c *sigctxt) r25() uint64 { return c.regs().__gregs[_REG_X25] }
+func (c *sigctxt) r26() uint64 { return c.regs().__gregs[_REG_X26] }
+func (c *sigctxt) r27() uint64 { return c.regs().__gregs[_REG_X27] }
+func (c *sigctxt) r28() uint64 { return c.regs().__gregs[_REG_X28] }
+func (c *sigctxt) r29() uint64 { return c.regs().__gregs[_REG_X29] }
+func (c *sigctxt) lr() uint64 { return c.regs().__gregs[_REG_X30] }
+func (c *sigctxt) sp() uint64 { return c.regs().__gregs[_REG_X31] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().__gregs[_REG_ELR] }
+
+func (c *sigctxt) fault() uintptr { return uintptr(c.info._reason) }
+func (c *sigctxt) trap() uint64 { return 0 }
+func (c *sigctxt) error() uint64 { return 0 }
+func (c *sigctxt) oldmask() uint64 { return 0 }
+
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info._code) }
+func (c *sigctxt) sigaddr() uint64 { return uint64(c.info._reason) }
+
+func (c *sigctxt) set_pc(x uint64) { c.regs().__gregs[_REG_ELR] = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().__gregs[_REG_X31] = x }
+func (c *sigctxt) set_lr(x uint64) { c.regs().__gregs[_REG_X30] = x }
+func (c *sigctxt) set_r28(x uint64) { c.regs().__gregs[_REG_X28] = x }
+
+func (c *sigctxt) set_sigcode(x uint64) { c.info._code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ c.info._reason = uintptr(x)
+}
diff --git a/src/runtime/signal_openbsd.go b/src/runtime/signal_openbsd.go
new file mode 100644
index 0000000..d2c5c5e
--- /dev/null
+++ b/src/runtime/signal_openbsd.go
@@ -0,0 +1,41 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ /* 7 */ {_SigThrow, "SIGEMT: emulate instruction executed"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ /* 9 */ {0, "SIGKILL: kill"},
+ /* 10 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigThrow, "SIGSYS: bad system call"},
+ /* 13 */ {_SigNotify, "SIGPIPE: write to broken pipe"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: termination"},
+ /* 16 */ {_SigNotify + _SigIgn, "SIGURG: urgent condition on socket"},
+ /* 17 */ {0, "SIGSTOP: stop"},
+ /* 18 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: keyboard stop"},
+ /* 19 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: continue after stop"},
+ /* 20 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status has changed"},
+ /* 21 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background read from tty"},
+ /* 22 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background write to tty"},
+ /* 23 */ {_SigNotify, "SIGIO: i/o now possible"},
+ /* 24 */ {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ /* 25 */ {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ /* 26 */ {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ /* 27 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ /* 28 */ {_SigNotify, "SIGWINCH: window size change"},
+ /* 29 */ {_SigNotify, "SIGINFO: status request from keyboard"},
+ /* 30 */ {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ /* 31 */ {_SigNotify, "SIGUSR2: user-defined signal 2"},
+ /* 32 */ {0, "SIGTHR: reserved"}, // thread AST - cannot be registered.
+}
diff --git a/src/runtime/signal_openbsd_386.go b/src/runtime/signal_openbsd_386.go
new file mode 100644
index 0000000..2fc4b1d
--- /dev/null
+++ b/src/runtime/signal_openbsd_386.go
@@ -0,0 +1,47 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext {
+ return (*sigcontext)(c.ctxt)
+}
+
+func (c *sigctxt) eax() uint32 { return c.regs().sc_eax }
+func (c *sigctxt) ebx() uint32 { return c.regs().sc_ebx }
+func (c *sigctxt) ecx() uint32 { return c.regs().sc_ecx }
+func (c *sigctxt) edx() uint32 { return c.regs().sc_edx }
+func (c *sigctxt) edi() uint32 { return c.regs().sc_edi }
+func (c *sigctxt) esi() uint32 { return c.regs().sc_esi }
+func (c *sigctxt) ebp() uint32 { return c.regs().sc_ebp }
+func (c *sigctxt) esp() uint32 { return c.regs().sc_esp }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) eip() uint32 { return c.regs().sc_eip }
+
+func (c *sigctxt) eflags() uint32 { return c.regs().sc_eflags }
+func (c *sigctxt) cs() uint32 { return c.regs().sc_cs }
+func (c *sigctxt) fs() uint32 { return c.regs().sc_fs }
+func (c *sigctxt) gs() uint32 { return c.regs().sc_gs }
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint32 {
+ return *(*uint32)(add(unsafe.Pointer(c.info), 12))
+}
+
+func (c *sigctxt) set_eip(x uint32) { c.regs().sc_eip = x }
+func (c *sigctxt) set_esp(x uint32) { c.regs().sc_esp = x }
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) {
+ *(*uint32)(add(unsafe.Pointer(c.info), 12)) = x
+}
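
Editorial note (not part of the diff): sigaddr and set_sigaddr above, like their 64-bit counterparts in the files that follow, reach the fault address by indexing a fixed byte offset into the raw siginfo (12 bytes on 386, 16 on the 64-bit ports) via the runtime's small add helper. A minimal sketch of that helper, paraphrasing src/runtime/stubs.go:

    package sketch // illustrative only; the real helper is unexported in package runtime

    import "unsafe"

    // add mirrors the runtime's internal add helper: unchecked pointer
    // arithmetic, used here to step from the start of the raw siginfo to
    // the field holding the faulting address.
    func add(p unsafe.Pointer, x uintptr) unsafe.Pointer {
    	return unsafe.Pointer(uintptr(p) + x)
    }
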
diff --git a/src/runtime/signal_openbsd_amd64.go b/src/runtime/signal_openbsd_amd64.go
new file mode 100644
index 0000000..091a88a
--- /dev/null
+++ b/src/runtime/signal_openbsd_amd64.go
@@ -0,0 +1,55 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext {
+ return (*sigcontext)(c.ctxt)
+}
+
+func (c *sigctxt) rax() uint64 { return c.regs().sc_rax }
+func (c *sigctxt) rbx() uint64 { return c.regs().sc_rbx }
+func (c *sigctxt) rcx() uint64 { return c.regs().sc_rcx }
+func (c *sigctxt) rdx() uint64 { return c.regs().sc_rdx }
+func (c *sigctxt) rdi() uint64 { return c.regs().sc_rdi }
+func (c *sigctxt) rsi() uint64 { return c.regs().sc_rsi }
+func (c *sigctxt) rbp() uint64 { return c.regs().sc_rbp }
+func (c *sigctxt) rsp() uint64 { return c.regs().sc_rsp }
+func (c *sigctxt) r8() uint64 { return c.regs().sc_r8 }
+func (c *sigctxt) r9() uint64 { return c.regs().sc_r9 }
+func (c *sigctxt) r10() uint64 { return c.regs().sc_r10 }
+func (c *sigctxt) r11() uint64 { return c.regs().sc_r11 }
+func (c *sigctxt) r12() uint64 { return c.regs().sc_r12 }
+func (c *sigctxt) r13() uint64 { return c.regs().sc_r13 }
+func (c *sigctxt) r14() uint64 { return c.regs().sc_r14 }
+func (c *sigctxt) r15() uint64 { return c.regs().sc_r15 }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return c.regs().sc_rip }
+
+func (c *sigctxt) rflags() uint64 { return c.regs().sc_rflags }
+func (c *sigctxt) cs() uint64 { return c.regs().sc_cs }
+func (c *sigctxt) fs() uint64 { return c.regs().sc_fs }
+func (c *sigctxt) gs() uint64 { return c.regs().sc_gs }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 {
+ return *(*uint64)(add(unsafe.Pointer(c.info), 16))
+}
+
+func (c *sigctxt) set_rip(x uint64) { c.regs().sc_rip = x }
+func (c *sigctxt) set_rsp(x uint64) { c.regs().sc_rsp = x }
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uint64)(add(unsafe.Pointer(c.info), 16)) = x
+}
diff --git a/src/runtime/signal_openbsd_arm.go b/src/runtime/signal_openbsd_arm.go
new file mode 100644
index 0000000..f796550
--- /dev/null
+++ b/src/runtime/signal_openbsd_arm.go
@@ -0,0 +1,59 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext {
+ return (*sigcontext)(c.ctxt)
+}
+
+func (c *sigctxt) r0() uint32 { return c.regs().sc_r0 }
+func (c *sigctxt) r1() uint32 { return c.regs().sc_r1 }
+func (c *sigctxt) r2() uint32 { return c.regs().sc_r2 }
+func (c *sigctxt) r3() uint32 { return c.regs().sc_r3 }
+func (c *sigctxt) r4() uint32 { return c.regs().sc_r4 }
+func (c *sigctxt) r5() uint32 { return c.regs().sc_r5 }
+func (c *sigctxt) r6() uint32 { return c.regs().sc_r6 }
+func (c *sigctxt) r7() uint32 { return c.regs().sc_r7 }
+func (c *sigctxt) r8() uint32 { return c.regs().sc_r8 }
+func (c *sigctxt) r9() uint32 { return c.regs().sc_r9 }
+func (c *sigctxt) r10() uint32 { return c.regs().sc_r10 }
+func (c *sigctxt) fp() uint32 { return c.regs().sc_r11 }
+func (c *sigctxt) ip() uint32 { return c.regs().sc_r12 }
+func (c *sigctxt) sp() uint32 { return c.regs().sc_usr_sp }
+func (c *sigctxt) lr() uint32 { return c.regs().sc_usr_lr }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint32 { return c.regs().sc_pc }
+
+func (c *sigctxt) cpsr() uint32 { return c.regs().sc_spsr }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+func (c *sigctxt) trap() uint32 { return 0 }
+func (c *sigctxt) error() uint32 { return 0 }
+func (c *sigctxt) oldmask() uint32 { return 0 }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint32 {
+ return *(*uint32)(add(unsafe.Pointer(c.info), 16))
+}
+
+func (c *sigctxt) set_pc(x uint32) { c.regs().sc_pc = x }
+func (c *sigctxt) set_sp(x uint32) { c.regs().sc_usr_sp = x }
+func (c *sigctxt) set_lr(x uint32) { c.regs().sc_usr_lr = x }
+func (c *sigctxt) set_r10(x uint32) { c.regs().sc_r10 = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) {
+ *(*uint32)(add(unsafe.Pointer(c.info), 16)) = x
+}
diff --git a/src/runtime/signal_openbsd_arm64.go b/src/runtime/signal_openbsd_arm64.go
new file mode 100644
index 0000000..3747b4f
--- /dev/null
+++ b/src/runtime/signal_openbsd_arm64.go
@@ -0,0 +1,75 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext {
+ return (*sigcontext)(c.ctxt)
+}
+
+func (c *sigctxt) r0() uint64 { return (uint64)(c.regs().sc_x[0]) }
+func (c *sigctxt) r1() uint64 { return (uint64)(c.regs().sc_x[1]) }
+func (c *sigctxt) r2() uint64 { return (uint64)(c.regs().sc_x[2]) }
+func (c *sigctxt) r3() uint64 { return (uint64)(c.regs().sc_x[3]) }
+func (c *sigctxt) r4() uint64 { return (uint64)(c.regs().sc_x[4]) }
+func (c *sigctxt) r5() uint64 { return (uint64)(c.regs().sc_x[5]) }
+func (c *sigctxt) r6() uint64 { return (uint64)(c.regs().sc_x[6]) }
+func (c *sigctxt) r7() uint64 { return (uint64)(c.regs().sc_x[7]) }
+func (c *sigctxt) r8() uint64 { return (uint64)(c.regs().sc_x[8]) }
+func (c *sigctxt) r9() uint64 { return (uint64)(c.regs().sc_x[9]) }
+func (c *sigctxt) r10() uint64 { return (uint64)(c.regs().sc_x[10]) }
+func (c *sigctxt) r11() uint64 { return (uint64)(c.regs().sc_x[11]) }
+func (c *sigctxt) r12() uint64 { return (uint64)(c.regs().sc_x[12]) }
+func (c *sigctxt) r13() uint64 { return (uint64)(c.regs().sc_x[13]) }
+func (c *sigctxt) r14() uint64 { return (uint64)(c.regs().sc_x[14]) }
+func (c *sigctxt) r15() uint64 { return (uint64)(c.regs().sc_x[15]) }
+func (c *sigctxt) r16() uint64 { return (uint64)(c.regs().sc_x[16]) }
+func (c *sigctxt) r17() uint64 { return (uint64)(c.regs().sc_x[17]) }
+func (c *sigctxt) r18() uint64 { return (uint64)(c.regs().sc_x[18]) }
+func (c *sigctxt) r19() uint64 { return (uint64)(c.regs().sc_x[19]) }
+func (c *sigctxt) r20() uint64 { return (uint64)(c.regs().sc_x[20]) }
+func (c *sigctxt) r21() uint64 { return (uint64)(c.regs().sc_x[21]) }
+func (c *sigctxt) r22() uint64 { return (uint64)(c.regs().sc_x[22]) }
+func (c *sigctxt) r23() uint64 { return (uint64)(c.regs().sc_x[23]) }
+func (c *sigctxt) r24() uint64 { return (uint64)(c.regs().sc_x[24]) }
+func (c *sigctxt) r25() uint64 { return (uint64)(c.regs().sc_x[25]) }
+func (c *sigctxt) r26() uint64 { return (uint64)(c.regs().sc_x[26]) }
+func (c *sigctxt) r27() uint64 { return (uint64)(c.regs().sc_x[27]) }
+func (c *sigctxt) r28() uint64 { return (uint64)(c.regs().sc_x[28]) }
+func (c *sigctxt) r29() uint64 { return (uint64)(c.regs().sc_x[29]) }
+func (c *sigctxt) lr() uint64 { return (uint64)(c.regs().sc_lr) }
+func (c *sigctxt) sp() uint64 { return (uint64)(c.regs().sc_sp) }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return (uint64)(c.regs().sc_lr) } /* XXX */
+
+func (c *sigctxt) fault() uint64 { return c.sigaddr() }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 {
+ return *(*uint64)(add(unsafe.Pointer(c.info), 16))
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return uint64(c.regs().sc_elr) }
+
+func (c *sigctxt) set_pc(x uint64) { c.regs().sc_elr = uintptr(x) }
+func (c *sigctxt) set_sp(x uint64) { c.regs().sc_sp = uintptr(x) }
+func (c *sigctxt) set_lr(x uint64) { c.regs().sc_lr = uintptr(x) }
+func (c *sigctxt) set_r28(x uint64) { c.regs().sc_x[28] = uintptr(x) }
+
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uint64)(add(unsafe.Pointer(c.info), 16)) = x
+}
diff --git a/src/runtime/signal_openbsd_mips64.go b/src/runtime/signal_openbsd_mips64.go
new file mode 100644
index 0000000..54ed523
--- /dev/null
+++ b/src/runtime/signal_openbsd_mips64.go
@@ -0,0 +1,78 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext {
+ return (*sigcontext)(c.ctxt)
+}
+
+func (c *sigctxt) r0() uint64 { return c.regs().sc_regs[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().sc_regs[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().sc_regs[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().sc_regs[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().sc_regs[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().sc_regs[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().sc_regs[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().sc_regs[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().sc_regs[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().sc_regs[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().sc_regs[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().sc_regs[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().sc_regs[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().sc_regs[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().sc_regs[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().sc_regs[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().sc_regs[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().sc_regs[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().sc_regs[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().sc_regs[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().sc_regs[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().sc_regs[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().sc_regs[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().sc_regs[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().sc_regs[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().sc_regs[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().sc_regs[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().sc_regs[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().sc_regs[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().sc_regs[29] }
+func (c *sigctxt) r30() uint64 { return c.regs().sc_regs[30] }
+func (c *sigctxt) r31() uint64 { return c.regs().sc_regs[31] }
+func (c *sigctxt) sp() uint64 { return c.regs().sc_regs[29] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().sc_pc }
+
+func (c *sigctxt) link() uint64 { return c.regs().sc_regs[31] }
+func (c *sigctxt) lo() uint64 { return c.regs().mullo }
+func (c *sigctxt) hi() uint64 { return c.regs().mulhi }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 {
+ return *(*uint64)(add(unsafe.Pointer(c.info), 16))
+}
+
+func (c *sigctxt) set_r28(x uint64) { c.regs().sc_regs[28] = x }
+func (c *sigctxt) set_r30(x uint64) { c.regs().sc_regs[30] = x }
+func (c *sigctxt) set_pc(x uint64) { c.regs().sc_pc = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().sc_regs[29] = x }
+func (c *sigctxt) set_link(x uint64) { c.regs().sc_regs[31] = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uint64)(add(unsafe.Pointer(c.info), 16)) = x
+}
diff --git a/src/runtime/signal_plan9.go b/src/runtime/signal_plan9.go
new file mode 100644
index 0000000..d3894c8
--- /dev/null
+++ b/src/runtime/signal_plan9.go
@@ -0,0 +1,57 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+type sigTabT struct {
+ flags int
+ name string
+}
+
+// Incoming notes are compared against this table using strncmp, so the
+// order matters: longer patterns must appear before their prefixes.
+// There are _SIG constants in os2_plan9.go for the table index of some
+// of these.
+//
+// If you add entries to this table, you must respect the prefix ordering
+// and also update the constant values in os2_plan9.go.
+var sigtable = [...]sigTabT{
+ // Traps that we cannot recover from.
+ {_SigThrow, "sys: trap: debug exception"},
+ {_SigThrow, "sys: trap: invalid opcode"},
+
+ // We can recover from some memory errors in runtime·sigpanic.
+ {_SigPanic, "sys: trap: fault read"}, // SIGRFAULT
+ {_SigPanic, "sys: trap: fault write"}, // SIGWFAULT
+
+ // We can also recover from math errors.
+ {_SigPanic, "sys: trap: divide error"}, // SIGINTDIV
+ {_SigPanic, "sys: fp:"}, // SIGFLOAT
+
+ // All other traps are normally handled as if they were marked SigThrow.
+ // We mark them SigPanic here so that debug.SetPanicOnFault will work.
+ {_SigPanic, "sys: trap:"}, // SIGTRAP
+
+ // Writes to a closed pipe can be handled if desired, otherwise they're ignored.
+ {_SigNotify, "sys: write on closed pipe"},
+
+ // Other system notes are more serious and cannot be recovered.
+ {_SigThrow, "sys:"},
+
+ // Issued to all other procs when calling runtime·exit.
+ {_SigGoExit, "go: exit "},
+
+ // Kill is sent by external programs to cause an exit.
+ {_SigKill, "kill"},
+
+ // Interrupts can be handled if desired, otherwise they cause an exit.
+ {_SigNotify + _SigKill, "interrupt"},
+ {_SigNotify + _SigKill, "hangup"},
+
+ // Alarms can be handled if desired, otherwise they're ignored.
+ {_SigNotify, "alarm"},
+
+ // Aborts can be handled if desired, otherwise they cause a stack trace.
+ {_SigNotify + _SigThrow, "abort"},
+}
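
Editorial sketch (hypothetical helper, not part of the runtime): the strncmp-style matching described in the comment above selects the first table entry whose name is a prefix of the incoming note, which is why "sys: trap: fault read" must precede the generic "sys: trap:" and "sys:" entries.

    package sketch

    import "strings"

    type sigTabT struct { // mirrors the Plan 9 sigTabT above
    	flags int
    	name  string
    }

    // noteIndex returns the index of the first entry whose name is a
    // prefix of note, or -1 if none matches. The first match wins, so
    // longer patterns must come before their prefixes in the table.
    func noteIndex(note string, table []sigTabT) int {
    	for i, t := range table {
    		if strings.HasPrefix(note, t.name) {
    			return i
    		}
    	}
    	return -1
    }
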
diff --git a/src/runtime/signal_ppc64x.go b/src/runtime/signal_ppc64x.go
new file mode 100644
index 0000000..5de93a3
--- /dev/null
+++ b/src/runtime/signal_ppc64x.go
@@ -0,0 +1,111 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix linux
+// +build ppc64 ppc64le
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("r0 ", hex(c.r0()), "\t")
+ print("r1 ", hex(c.r1()), "\n")
+ print("r2 ", hex(c.r2()), "\t")
+ print("r3 ", hex(c.r3()), "\n")
+ print("r4 ", hex(c.r4()), "\t")
+ print("r5 ", hex(c.r5()), "\n")
+ print("r6 ", hex(c.r6()), "\t")
+ print("r7 ", hex(c.r7()), "\n")
+ print("r8 ", hex(c.r8()), "\t")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\t")
+ print("r11 ", hex(c.r11()), "\n")
+ print("r12 ", hex(c.r12()), "\t")
+ print("r13 ", hex(c.r13()), "\n")
+ print("r14 ", hex(c.r14()), "\t")
+ print("r15 ", hex(c.r15()), "\n")
+ print("r16 ", hex(c.r16()), "\t")
+ print("r17 ", hex(c.r17()), "\n")
+ print("r18 ", hex(c.r18()), "\t")
+ print("r19 ", hex(c.r19()), "\n")
+ print("r20 ", hex(c.r20()), "\t")
+ print("r21 ", hex(c.r21()), "\n")
+ print("r22 ", hex(c.r22()), "\t")
+ print("r23 ", hex(c.r23()), "\n")
+ print("r24 ", hex(c.r24()), "\t")
+ print("r25 ", hex(c.r25()), "\n")
+ print("r26 ", hex(c.r26()), "\t")
+ print("r27 ", hex(c.r27()), "\n")
+ print("r28 ", hex(c.r28()), "\t")
+ print("r29 ", hex(c.r29()), "\n")
+ print("r30 ", hex(c.r30()), "\t")
+ print("r31 ", hex(c.r31()), "\n")
+ print("pc ", hex(c.pc()), "\t")
+ print("ctr ", hex(c.ctr()), "\n")
+ print("link ", hex(c.link()), "\t")
+ print("xer ", hex(c.xer()), "\n")
+ print("ccr ", hex(c.ccr()), "\t")
+ print("trap ", hex(c.trap()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.link()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // We arrange link and pc to pretend that the panicking
+ // function calls sigpanic directly.
+ // Always save LINK to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - sys.MinFrameSize
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.link()
+
+ pc := gp.sigpc
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.link())) {
+ // Make it look like the faulting PC called sigpanic.
+ c.set_link(uint64(pc))
+ }
+
+ // In case we are panicking from external C code
+ c.set_r0(0)
+ c.set_r30(uint64(uintptr(unsafe.Pointer(gp))))
+ c.set_r12(uint64(funcPC(sigpanic)))
+ c.set_pc(uint64(funcPC(sigpanic)))
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra space is known to gentraceback.
+ sp := c.sp() - sys.MinFrameSize
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.link()
+ // In PIC mode, we'll set up (i.e. clobber) R2 on function
+ // entry. Save it ahead of time.
+ // In PIC mode, R12 must point to the function entry,
+ // so we'll set it up when pushing the call. Save it ahead
+ // of time as well.
+ // 8(SP) and 16(SP) are unused space in the reserved
+ // MinFrameSize (32) bytes.
+ *(*uint64)(unsafe.Pointer(uintptr(sp) + 8)) = c.r2()
+ *(*uint64)(unsafe.Pointer(uintptr(sp) + 16)) = c.r12()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_link(uint64(resumePC))
+ c.set_r12(uint64(targetPC))
+ c.set_pc(uint64(targetPC))
+}
diff --git a/src/runtime/signal_riscv64.go b/src/runtime/signal_riscv64.go
new file mode 100644
index 0000000..93363a4
--- /dev/null
+++ b/src/runtime/signal_riscv64.go
@@ -0,0 +1,93 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux,riscv64
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("ra ", hex(c.ra()), "\t")
+ print("sp ", hex(c.sp()), "\n")
+ print("gp ", hex(c.gp()), "\t")
+ print("tp ", hex(c.tp()), "\n")
+ print("t0 ", hex(c.t0()), "\t")
+ print("t1 ", hex(c.t1()), "\n")
+ print("t2 ", hex(c.t2()), "\t")
+ print("s0 ", hex(c.s0()), "\n")
+ print("s1 ", hex(c.s1()), "\t")
+ print("a0 ", hex(c.a0()), "\n")
+ print("a1 ", hex(c.a1()), "\t")
+ print("a2 ", hex(c.a2()), "\n")
+ print("a3 ", hex(c.a3()), "\t")
+ print("a4 ", hex(c.a4()), "\n")
+ print("a5 ", hex(c.a5()), "\t")
+ print("a6 ", hex(c.a6()), "\n")
+ print("a7 ", hex(c.a7()), "\t")
+ print("s2 ", hex(c.s2()), "\n")
+ print("s3 ", hex(c.s3()), "\t")
+ print("s4 ", hex(c.s4()), "\n")
+ print("s5 ", hex(c.s5()), "\t")
+ print("s6 ", hex(c.s6()), "\n")
+ print("s7 ", hex(c.s7()), "\t")
+ print("s8 ", hex(c.s8()), "\n")
+ print("s9 ", hex(c.s9()), "\t")
+ print("s10 ", hex(c.s10()), "\n")
+ print("s11 ", hex(c.s11()), "\t")
+ print("t3 ", hex(c.t3()), "\n")
+ print("t4 ", hex(c.t4()), "\t")
+ print("t5 ", hex(c.t5()), "\n")
+ print("t6 ", hex(c.t6()), "\t")
+ print("pc ", hex(c.pc()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.ra()) }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // We arrange RA and pc to pretend that the panicking
+ // function calls sigpanic directly.
+ // Always save RA to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - sys.PtrSize
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.ra()
+
+ pc := gp.sigpc
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.ra())) {
+ // Make it look like the faulting PC called sigpanic.
+ c.set_ra(uint64(pc))
+ }
+
+ // In case we are panicking from external C code
+ c.set_gp(uint64(uintptr(unsafe.Pointer(gp))))
+ c.set_pc(uint64(funcPC(sigpanic)))
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra slot is known to gentraceback.
+ sp := c.sp() - sys.PtrSize
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.ra()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_ra(uint64(resumePC))
+ c.set_pc(uint64(targetPC))
+}
diff --git a/src/runtime/signal_solaris.go b/src/runtime/signal_solaris.go
new file mode 100644
index 0000000..25f8ad5
--- /dev/null
+++ b/src/runtime/signal_solaris.go
@@ -0,0 +1,83 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt (rubout)"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit (ASCII FS)"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction (not reset when caught)"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap (not reset when caught)"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: used by abort, replace SIGIOT in the future"},
+ /* 7 */ {_SigThrow, "SIGEMT: EMT instruction"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating point exception"},
+ /* 9 */ {0, "SIGKILL: kill (cannot be caught or ignored)"},
+ /* 10 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigThrow, "SIGSYS: bad argument to system call"},
+ /* 13 */ {_SigNotify, "SIGPIPE: write on a pipe with no one to read it"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: software termination signal from kill"},
+ /* 16 */ {_SigNotify, "SIGUSR1: user defined signal 1"},
+ /* 17 */ {_SigNotify, "SIGUSR2: user defined signal 2"},
+ /* 18 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status change alias (POSIX)"},
+ /* 19 */ {_SigNotify, "SIGPWR: power-fail restart"},
+ /* 20 */ {_SigNotify + _SigIgn, "SIGWINCH: window size change"},
+ /* 21 */ {_SigNotify + _SigIgn, "SIGURG: urgent socket condition"},
+ /* 22 */ {_SigNotify, "SIGPOLL: pollable event occurred"},
+ /* 23 */ {0, "SIGSTOP: stop (cannot be caught or ignored)"},
+ /* 24 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: user stop requested from tty"},
+ /* 25 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: stopped process has been continued"},
+ /* 26 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background tty read attempted"},
+ /* 27 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background tty write attempted"},
+ /* 28 */ {_SigNotify, "SIGVTALRM: virtual timer expired"},
+ /* 29 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling timer expired"},
+ /* 30 */ {_SigNotify, "SIGXCPU: exceeded cpu limit"},
+ /* 31 */ {_SigNotify, "SIGXFSZ: exceeded file size limit"},
+ /* 32 */ {_SigNotify, "SIGWAITING: reserved signal no longer used by"},
+ /* 33 */ {_SigNotify, "SIGLWP: reserved signal no longer used by"},
+ /* 34 */ {_SigNotify, "SIGFREEZE: special signal used by CPR"},
+ /* 35 */ {_SigNotify, "SIGTHAW: special signal used by CPR"},
+ /* 36 */ {_SigSetStack + _SigUnblock, "SIGCANCEL: reserved signal for thread cancellation"}, // Oracle's spelling of cancellation.
+ /* 37 */ {_SigNotify, "SIGLOST: resource lost (eg, record-lock lost)"},
+ /* 38 */ {_SigNotify, "SIGXRES: resource control exceeded"},
+ /* 39 */ {_SigNotify, "SIGJVM1: reserved signal for Java Virtual Machine"},
+ /* 40 */ {_SigNotify, "SIGJVM2: reserved signal for Java Virtual Machine"},
+
+ /* TODO(aram): what should we do about these signals? _SigDefault or _SigNotify? is this set static? */
+ /* 41 */ {_SigNotify, "real time signal"},
+ /* 42 */ {_SigNotify, "real time signal"},
+ /* 43 */ {_SigNotify, "real time signal"},
+ /* 44 */ {_SigNotify, "real time signal"},
+ /* 45 */ {_SigNotify, "real time signal"},
+ /* 46 */ {_SigNotify, "real time signal"},
+ /* 47 */ {_SigNotify, "real time signal"},
+ /* 48 */ {_SigNotify, "real time signal"},
+ /* 49 */ {_SigNotify, "real time signal"},
+ /* 50 */ {_SigNotify, "real time signal"},
+ /* 51 */ {_SigNotify, "real time signal"},
+ /* 52 */ {_SigNotify, "real time signal"},
+ /* 53 */ {_SigNotify, "real time signal"},
+ /* 54 */ {_SigNotify, "real time signal"},
+ /* 55 */ {_SigNotify, "real time signal"},
+ /* 56 */ {_SigNotify, "real time signal"},
+ /* 57 */ {_SigNotify, "real time signal"},
+ /* 58 */ {_SigNotify, "real time signal"},
+ /* 59 */ {_SigNotify, "real time signal"},
+ /* 60 */ {_SigNotify, "real time signal"},
+ /* 61 */ {_SigNotify, "real time signal"},
+ /* 62 */ {_SigNotify, "real time signal"},
+ /* 63 */ {_SigNotify, "real time signal"},
+ /* 64 */ {_SigNotify, "real time signal"},
+ /* 65 */ {_SigNotify, "real time signal"},
+ /* 66 */ {_SigNotify, "real time signal"},
+ /* 67 */ {_SigNotify, "real time signal"},
+ /* 68 */ {_SigNotify, "real time signal"},
+ /* 69 */ {_SigNotify, "real time signal"},
+ /* 70 */ {_SigNotify, "real time signal"},
+ /* 71 */ {_SigNotify, "real time signal"},
+ /* 72 */ {_SigNotify, "real time signal"},
+}
diff --git a/src/runtime/signal_solaris_amd64.go b/src/runtime/signal_solaris_amd64.go
new file mode 100644
index 0000000..b1da313
--- /dev/null
+++ b/src/runtime/signal_solaris_amd64.go
@@ -0,0 +1,53 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontext {
+ return (*mcontext)(unsafe.Pointer(&(*ucontext)(c.ctxt).uc_mcontext))
+}
+
+func (c *sigctxt) rax() uint64 { return uint64(c.regs().gregs[_REG_RAX]) }
+func (c *sigctxt) rbx() uint64 { return uint64(c.regs().gregs[_REG_RBX]) }
+func (c *sigctxt) rcx() uint64 { return uint64(c.regs().gregs[_REG_RCX]) }
+func (c *sigctxt) rdx() uint64 { return uint64(c.regs().gregs[_REG_RDX]) }
+func (c *sigctxt) rdi() uint64 { return uint64(c.regs().gregs[_REG_RDI]) }
+func (c *sigctxt) rsi() uint64 { return uint64(c.regs().gregs[_REG_RSI]) }
+func (c *sigctxt) rbp() uint64 { return uint64(c.regs().gregs[_REG_RBP]) }
+func (c *sigctxt) rsp() uint64 { return uint64(c.regs().gregs[_REG_RSP]) }
+func (c *sigctxt) r8() uint64 { return uint64(c.regs().gregs[_REG_R8]) }
+func (c *sigctxt) r9() uint64 { return uint64(c.regs().gregs[_REG_R9]) }
+func (c *sigctxt) r10() uint64 { return uint64(c.regs().gregs[_REG_R10]) }
+func (c *sigctxt) r11() uint64 { return uint64(c.regs().gregs[_REG_R11]) }
+func (c *sigctxt) r12() uint64 { return uint64(c.regs().gregs[_REG_R12]) }
+func (c *sigctxt) r13() uint64 { return uint64(c.regs().gregs[_REG_R13]) }
+func (c *sigctxt) r14() uint64 { return uint64(c.regs().gregs[_REG_R14]) }
+func (c *sigctxt) r15() uint64 { return uint64(c.regs().gregs[_REG_R15]) }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return uint64(c.regs().gregs[_REG_RIP]) }
+
+func (c *sigctxt) rflags() uint64 { return uint64(c.regs().gregs[_REG_RFLAGS]) }
+func (c *sigctxt) cs() uint64 { return uint64(c.regs().gregs[_REG_CS]) }
+func (c *sigctxt) fs() uint64 { return uint64(c.regs().gregs[_REG_FS]) }
+func (c *sigctxt) gs() uint64 { return uint64(c.regs().gregs[_REG_GS]) }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return *(*uint64)(unsafe.Pointer(&c.info.__data[0])) }
+
+func (c *sigctxt) set_rip(x uint64) { c.regs().gregs[_REG_RIP] = int64(x) }
+func (c *sigctxt) set_rsp(x uint64) { c.regs().gregs[_REG_RSP] = int64(x) }
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(unsafe.Pointer(&c.info.__data[0])) = uintptr(x)
+}
diff --git a/src/runtime/signal_unix.go b/src/runtime/signal_unix.go
new file mode 100644
index 0000000..89f936e
--- /dev/null
+++ b/src/runtime/signal_unix.go
@@ -0,0 +1,1232 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// sigTabT is the type of an entry in the global sigtable array.
+// sigtable is inherently system dependent, and appears in OS-specific files,
+// but sigTabT is the same for all Unixy systems.
+// The sigtable array is indexed by a system signal number to get the flags
+// and printable name of each signal.
+type sigTabT struct {
+ flags int32
+ name string
+}
+
+//go:linkname os_sigpipe os.sigpipe
+func os_sigpipe() {
+ systemstack(sigpipe)
+}
+
+func signame(sig uint32) string {
+ if sig >= uint32(len(sigtable)) {
+ return ""
+ }
+ return sigtable[sig].name
+}
+
+const (
+ _SIG_DFL uintptr = 0
+ _SIG_IGN uintptr = 1
+)
+
+// sigPreempt is the signal used for non-cooperative preemption.
+//
+// There's no good way to choose this signal, but there are some
+// heuristics:
+//
+// 1. It should be a signal that's passed-through by debuggers by
+// default. On Linux, this is SIGALRM, SIGURG, SIGCHLD, SIGIO,
+// SIGVTALRM, SIGPROF, and SIGWINCH, plus some glibc-internal signals.
+//
+// 2. It shouldn't be used internally by libc in mixed Go/C binaries
+// because libc may assume it's the only thing that can handle these
+// signals. For example SIGCANCEL or SIGSETXID.
+//
+// 3. It should be a signal that can happen spuriously without
+// consequences. For example, SIGALRM is a bad choice because the
+// signal handler can't tell if it was caused by the real process
+// alarm or not (arguably this means the signal is broken, but I
+// digress). SIGUSR1 and SIGUSR2 are also bad because those are often
+// used in meaningful ways by applications.
+//
+// 4. We need to deal with platforms without real-time signals (like
+// macOS), so those are out.
+//
+// We use SIGURG because it meets all of these criteria, is extremely
+// unlikely to be used by an application for its "real" meaning (both
+// because out-of-band data is basically unused and because SIGURG
+// doesn't report which socket has the condition, making it pretty
+// useless), and even if it is, the application has to be ready for
+// spurious SIGURG. SIGIO wouldn't be a bad choice either, but is more
+// likely to be used for real.
+const sigPreempt = _SIGURG
+
+// Stores the signal handlers registered before Go installed its own.
+// These signal handlers will be invoked in cases where Go doesn't want to
+// handle a particular signal (e.g., signal occurred on a non-Go thread).
+// See sigfwdgo for more information on when the signals are forwarded.
+//
+// This is read by the signal handler; accesses should use
+// atomic.Loaduintptr and atomic.Storeuintptr.
+var fwdSig [_NSIG]uintptr
+
+// handlingSig is indexed by signal number and is non-zero if we are
+// currently handling the signal. Or, to put it another way, whether
+// the signal handler is currently set to the Go signal handler or not.
+// This is uint32 rather than bool so that we can use atomic instructions.
+var handlingSig [_NSIG]uint32
+
+// channels for synchronizing signal mask updates with the signal mask
+// thread
+var (
+ disableSigChan chan uint32
+ enableSigChan chan uint32
+ maskUpdatedChan chan struct{}
+)
+
+func init() {
+ // _NSIG is the number of signals on this operating system.
+ // sigtable should describe what to do for all the possible signals.
+ if len(sigtable) != _NSIG {
+ print("runtime: len(sigtable)=", len(sigtable), " _NSIG=", _NSIG, "\n")
+ throw("bad sigtable len")
+ }
+}
+
+var signalsOK bool
+
+// Initialize signals.
+// Called by libpreinit so runtime may not be initialized.
+//go:nosplit
+//go:nowritebarrierrec
+func initsig(preinit bool) {
+ if !preinit {
+ // It's now OK for signal handlers to run.
+ signalsOK = true
+ }
+
+ // For c-archive/c-shared this is called by libpreinit with
+ // preinit == true.
+ if (isarchive || islibrary) && !preinit {
+ return
+ }
+
+ for i := uint32(0); i < _NSIG; i++ {
+ t := &sigtable[i]
+ if t.flags == 0 || t.flags&_SigDefault != 0 {
+ continue
+ }
+
+ // We don't need to use atomic operations here because
+ // there shouldn't be any other goroutines running yet.
+ fwdSig[i] = getsig(i)
+
+ if !sigInstallGoHandler(i) {
+ // Even if we are not installing a signal handler,
+ // set SA_ONSTACK if necessary.
+ if fwdSig[i] != _SIG_DFL && fwdSig[i] != _SIG_IGN {
+ setsigstack(i)
+ } else if fwdSig[i] == _SIG_IGN {
+ sigInitIgnored(i)
+ }
+ continue
+ }
+
+ handlingSig[i] = 1
+ setsig(i, funcPC(sighandler))
+ }
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigInstallGoHandler(sig uint32) bool {
+ // For some signals, we respect an inherited SIG_IGN handler
+ // rather than insist on installing our own default handler.
+ // Even these signals can be fetched using the os/signal package.
+ switch sig {
+ case _SIGHUP, _SIGINT:
+ if atomic.Loaduintptr(&fwdSig[sig]) == _SIG_IGN {
+ return false
+ }
+ }
+
+ t := &sigtable[sig]
+ if t.flags&_SigSetStack != 0 {
+ return false
+ }
+
+ // When built using c-archive or c-shared, only install signal
+ // handlers for synchronous signals and SIGPIPE.
+ if (isarchive || islibrary) && t.flags&_SigPanic == 0 && sig != _SIGPIPE {
+ return false
+ }
+
+ return true
+}
+
+// sigenable enables the Go signal handler to catch the signal sig.
+// It is only called while holding the os/signal.handlers lock,
+// via os/signal.enableSignal and signal_enable.
+func sigenable(sig uint32) {
+ if sig >= uint32(len(sigtable)) {
+ return
+ }
+
+ // SIGPROF is handled specially for profiling.
+ if sig == _SIGPROF {
+ return
+ }
+
+ t := &sigtable[sig]
+ if t.flags&_SigNotify != 0 {
+ ensureSigM()
+ enableSigChan <- sig
+ <-maskUpdatedChan
+ if atomic.Cas(&handlingSig[sig], 0, 1) {
+ atomic.Storeuintptr(&fwdSig[sig], getsig(sig))
+ setsig(sig, funcPC(sighandler))
+ }
+ }
+}
+
+// sigdisable disables the Go signal handler for the signal sig.
+// It is only called while holding the os/signal.handlers lock,
+// via os/signal.disableSignal and signal_disable.
+func sigdisable(sig uint32) {
+ if sig >= uint32(len(sigtable)) {
+ return
+ }
+
+ // SIGPROF is handled specially for profiling.
+ if sig == _SIGPROF {
+ return
+ }
+
+ t := &sigtable[sig]
+ if t.flags&_SigNotify != 0 {
+ ensureSigM()
+ disableSigChan <- sig
+ <-maskUpdatedChan
+
+ // If initsig does not install a signal handler for a
+ // signal, then to go back to the state before Notify
+ // we should remove the one we installed.
+ if !sigInstallGoHandler(sig) {
+ atomic.Store(&handlingSig[sig], 0)
+ setsig(sig, atomic.Loaduintptr(&fwdSig[sig]))
+ }
+ }
+}
+
+// sigignore ignores the signal sig.
+// It is only called while holding the os/signal.handlers lock,
+// via os/signal.ignoreSignal and signal_ignore.
+func sigignore(sig uint32) {
+ if sig >= uint32(len(sigtable)) {
+ return
+ }
+
+ // SIGPROF is handled specially for profiling.
+ if sig == _SIGPROF {
+ return
+ }
+
+ t := &sigtable[sig]
+ if t.flags&_SigNotify != 0 {
+ atomic.Store(&handlingSig[sig], 0)
+ setsig(sig, _SIG_IGN)
+ }
+}
+
+// clearSignalHandlers clears all signal handlers that are not ignored
+// back to the default. This is called by the child after a fork, so that
+// we can enable the signal mask for the exec without worrying about
+// running a signal handler in the child.
+//go:nosplit
+//go:nowritebarrierrec
+func clearSignalHandlers() {
+ for i := uint32(0); i < _NSIG; i++ {
+ if atomic.Load(&handlingSig[i]) != 0 {
+ setsig(i, _SIG_DFL)
+ }
+ }
+}
+
+// setProcessCPUProfiler is called when the profiling timer changes.
+// It is called with prof.lock held. hz is the new timer frequency, and is 0 if
+// profiling is being disabled. Enable or disable the signal as
+// required for -buildmode=c-archive.
+func setProcessCPUProfiler(hz int32) {
+ if hz != 0 {
+ // Enable the Go signal handler if not enabled.
+ if atomic.Cas(&handlingSig[_SIGPROF], 0, 1) {
+ atomic.Storeuintptr(&fwdSig[_SIGPROF], getsig(_SIGPROF))
+ setsig(_SIGPROF, funcPC(sighandler))
+ }
+
+ var it itimerval
+ it.it_interval.tv_sec = 0
+ it.it_interval.set_usec(1000000 / hz)
+ it.it_value = it.it_interval
+ setitimer(_ITIMER_PROF, &it, nil)
+ } else {
+ // If the Go signal handler should be disabled by default,
+ // switch back to the signal handler that was installed
+ // when we enabled profiling. We don't try to handle the case
+ // of a program that changes the SIGPROF handler while Go
+ // profiling is enabled.
+ //
+ // If no signal handler was installed before, then start
+ // ignoring SIGPROF signals. We do this, rather than change
+ // to SIG_DFL, because there may be a pending SIGPROF
+ // signal that has not yet been delivered to some other thread.
+ // If we change to SIG_DFL here, the program will crash
+ // when that SIGPROF is delivered. We assume that programs
+ // that use profiling don't want to crash on a stray SIGPROF.
+ // See issue 19320.
+ if !sigInstallGoHandler(_SIGPROF) {
+ if atomic.Cas(&handlingSig[_SIGPROF], 1, 0) {
+ h := atomic.Loaduintptr(&fwdSig[_SIGPROF])
+ if h == _SIG_DFL {
+ h = _SIG_IGN
+ }
+ setsig(_SIGPROF, h)
+ }
+ }
+
+ setitimer(_ITIMER_PROF, &itimerval{}, nil)
+ }
+}
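
Worked example (editorial): runtime/pprof's default CPU profiling rate is hz = 100, so the interval programmed above is

    it_interval = 1000000 / hz µs = 1000000 / 100 = 10000 µs = 10 ms

that is, the kernel is asked to deliver a SIGPROF roughly every 10 ms of CPU time consumed by the process.
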
+
+// setThreadCPUProfiler makes any thread-specific changes required to
+// implement profiling at a rate of hz.
+// No changes required on Unix systems.
+func setThreadCPUProfiler(hz int32) {
+ getg().m.profilehz = hz
+}
+
+func sigpipe() {
+ if signal_ignored(_SIGPIPE) || sigsend(_SIGPIPE) {
+ return
+ }
+ dieFromSignal(_SIGPIPE)
+}
+
+// doSigPreempt handles a preemption signal on gp.
+func doSigPreempt(gp *g, ctxt *sigctxt) {
+ // Check if this G wants to be preempted and is safe to
+ // preempt.
+ if wantAsyncPreempt(gp) {
+ if ok, newpc := isAsyncSafePoint(gp, ctxt.sigpc(), ctxt.sigsp(), ctxt.siglr()); ok {
+ // Adjust the PC and inject a call to asyncPreempt.
+ ctxt.pushCall(funcPC(asyncPreempt), newpc)
+ }
+ }
+
+ // Acknowledge the preemption.
+ atomic.Xadd(&gp.m.preemptGen, 1)
+ atomic.Store(&gp.m.signalPending, 0)
+
+ if GOOS == "darwin" || GOOS == "ios" {
+ atomic.Xadd(&pendingPreemptSignals, -1)
+ }
+}
+
+const preemptMSupported = true
+
+// preemptM sends a preemption request to mp. This request may be
+// handled asynchronously and may be coalesced with other requests to
+// the M. When the request is received, if the running G or P are
+// marked for preemption and the goroutine is at an asynchronous
+// safe-point, it will preempt the goroutine. It always atomically
+// increments mp.preemptGen after handling a preemption request.
+func preemptM(mp *m) {
+ // On Darwin, don't try to preempt threads during exec.
+ // Issue #41702.
+ if GOOS == "darwin" || GOOS == "ios" {
+ execLock.rlock()
+ }
+
+ if atomic.Cas(&mp.signalPending, 0, 1) {
+ if GOOS == "darwin" || GOOS == "ios" {
+ atomic.Xadd(&pendingPreemptSignals, 1)
+ }
+
+ // If multiple threads are preempting the same M, it may send many
+ // signals to the same M such that it can hardly make progress, causing a
+ // live-lock problem. Apparently this could happen on darwin. See
+ // issue #37741.
+ // Only send a signal if there isn't already one pending.
+ signalM(mp, sigPreempt)
+ }
+
+ if GOOS == "darwin" || GOOS == "ios" {
+ execLock.runlock()
+ }
+}
+
+// sigFetchG fetches the value of G safely when running in a signal handler.
+// On some architectures, the g value may be clobbered when running in a VDSO.
+// See issue #32912.
+//
+//go:nosplit
+func sigFetchG(c *sigctxt) *g {
+ switch GOARCH {
+ case "arm", "arm64", "ppc64", "ppc64le":
+ if !iscgo && inVDSOPage(c.sigpc()) {
+ // When using cgo, we save the g on TLS and load it from there
+ // in sigtramp. Just use that.
+ // Otherwise, before making a VDSO call we save the g to the
+ // bottom of the signal stack. Fetch from there.
+ // TODO: in efence mode, stack is sysAlloc'd, so this wouldn't
+ // work.
+ sp := getcallersp()
+ s := spanOf(sp)
+ if s != nil && s.state.get() == mSpanManual && s.base() < sp && sp < s.limit {
+ gp := *(**g)(unsafe.Pointer(s.base()))
+ return gp
+ }
+ return nil
+ }
+ }
+ return getg()
+}
+
+// sigtrampgo is called from the signal handler function, sigtramp,
+// written in assembly code.
+// This is called by the signal handler, and the world may be stopped.
+//
+// It must be nosplit because getg() is still the G that was running
+// (if any) when the signal was delivered, but it's (usually) called
+// on the gsignal stack. Until this switches the G to gsignal, the
+// stack bounds check won't work.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func sigtrampgo(sig uint32, info *siginfo, ctx unsafe.Pointer) {
+ if sigfwdgo(sig, info, ctx) {
+ return
+ }
+ c := &sigctxt{info, ctx}
+ g := sigFetchG(c)
+ setg(g)
+ if g == nil {
+ if sig == _SIGPROF {
+ sigprofNonGoPC(c.sigpc())
+ return
+ }
+ if sig == sigPreempt && preemptMSupported && debug.asyncpreemptoff == 0 {
+ // This is probably a signal from preemptM sent
+ // while executing Go code but received while
+ // executing non-Go code.
+ // We got past sigfwdgo, so we know that there is
+ // no non-Go signal handler for sigPreempt.
+ // The default behavior for sigPreempt is to ignore
+ // the signal, so badsignal will be a no-op anyway.
+ if GOOS == "darwin" || GOOS == "ios" {
+ atomic.Xadd(&pendingPreemptSignals, -1)
+ }
+ return
+ }
+ c.fixsigcode(sig)
+ badsignal(uintptr(sig), c)
+ return
+ }
+
+ setg(g.m.gsignal)
+
+ // If some non-Go code called sigaltstack, adjust.
+ var gsignalStack gsignalStack
+ setStack := adjustSignalStack(sig, g.m, &gsignalStack)
+ if setStack {
+ g.m.gsignal.stktopsp = getcallersp()
+ }
+
+ if g.stackguard0 == stackFork {
+ signalDuringFork(sig)
+ }
+
+ c.fixsigcode(sig)
+ sighandler(sig, info, ctx, g)
+ setg(g)
+ if setStack {
+ restoreGsignalStack(&gsignalStack)
+ }
+}
+
+// adjustSignalStack adjusts the current stack guard based on the
+// stack pointer that is actually in use while handling a signal.
+// We do this in case some non-Go code called sigaltstack.
+// This reports whether the stack was adjusted, and if so stores the old
+// signal stack in *gsigstack.
+//go:nosplit
+func adjustSignalStack(sig uint32, mp *m, gsigStack *gsignalStack) bool {
+ sp := uintptr(unsafe.Pointer(&sig))
+ if sp >= mp.gsignal.stack.lo && sp < mp.gsignal.stack.hi {
+ return false
+ }
+
+ var st stackt
+ sigaltstack(nil, &st)
+ stsp := uintptr(unsafe.Pointer(st.ss_sp))
+ if st.ss_flags&_SS_DISABLE == 0 && sp >= stsp && sp < stsp+st.ss_size {
+ setGsignalStack(&st, gsigStack)
+ return true
+ }
+
+ if sp >= mp.g0.stack.lo && sp < mp.g0.stack.hi {
+ // The signal was delivered on the g0 stack.
+ // This can happen when linked with C code
+ // using the thread sanitizer, which collects
+ // signals then delivers them itself by calling
+ // the signal handler directly when C code,
+ // including C code called via cgo, calls a
+ // TSAN-intercepted function such as malloc.
+ //
+ // We check this condition last as g0.stack.lo
+ // may be not very accurate (see mstart).
+ st := stackt{ss_size: mp.g0.stack.hi - mp.g0.stack.lo}
+ setSignalstackSP(&st, mp.g0.stack.lo)
+ setGsignalStack(&st, gsigStack)
+ return true
+ }
+
+ // sp is not within gsignal stack, g0 stack, or sigaltstack. Bad.
+ setg(nil)
+ needm()
+ if st.ss_flags&_SS_DISABLE != 0 {
+ noSignalStack(sig)
+ } else {
+ sigNotOnStack(sig)
+ }
+ dropm()
+ return false
+}
+
+// crashing is the number of m's we have waited for when implementing
+// GOTRACEBACK=crash when a signal is received.
+var crashing int32
+
+// testSigtrap and testSigusr1 are used by the runtime tests. If
+// non-nil, it is called on SIGTRAP/SIGUSR1. If it returns true, the
+// normal behavior on this signal is suppressed.
+var testSigtrap func(info *siginfo, ctxt *sigctxt, gp *g) bool
+var testSigusr1 func(gp *g) bool
+
+// sighandler is invoked when a signal occurs. The global g will be
+// set to a gsignal goroutine and we will be running on the alternate
+// signal stack. The parameter g will be the value of the global g
+// when the signal occurred. The sig, info, and ctxt parameters are
+// from the system signal handler: they are the arguments the kernel
+// passes to the handler registered with the sigaction system call.
+//
+// The garbage collector may have stopped the world, so write barriers
+// are not allowed.
+//
+//go:nowritebarrierrec
+func sighandler(sig uint32, info *siginfo, ctxt unsafe.Pointer, gp *g) {
+ _g_ := getg()
+ c := &sigctxt{info, ctxt}
+
+ if sig == _SIGPROF {
+ sigprof(c.sigpc(), c.sigsp(), c.siglr(), gp, _g_.m)
+ return
+ }
+
+ if sig == _SIGTRAP && testSigtrap != nil && testSigtrap(info, (*sigctxt)(noescape(unsafe.Pointer(c))), gp) {
+ return
+ }
+
+ if sig == _SIGUSR1 && testSigusr1 != nil && testSigusr1(gp) {
+ return
+ }
+
+ if sig == sigPreempt && debug.asyncpreemptoff == 0 {
+ // Might be a preemption signal.
+ doSigPreempt(gp, c)
+ // Even if this was definitely a preemption signal, it
+ // may have been coalesced with another signal, so we
+ // still let it through to the application.
+ }
+
+ flags := int32(_SigThrow)
+ if sig < uint32(len(sigtable)) {
+ flags = sigtable[sig].flags
+ }
+ if c.sigcode() != _SI_USER && flags&_SigPanic != 0 && gp.throwsplit {
+ // We can't safely sigpanic because it may grow the
+ // stack. Abort in the signal handler instead.
+ flags = _SigThrow
+ }
+ if isAbortPC(c.sigpc()) {
+ // On many architectures, the abort function just
+ // causes a memory fault. Don't turn that into a panic.
+ flags = _SigThrow
+ }
+ if c.sigcode() != _SI_USER && flags&_SigPanic != 0 {
+ // The signal is going to cause a panic.
+ // Arrange the stack so that it looks like the point
+ // where the signal occurred made a call to the
+ // function sigpanic. Then set the PC to sigpanic.
+
+ // Have to pass arguments out of band since
+ // augmenting the stack frame would break
+ // the unwinding code.
+ gp.sig = sig
+ gp.sigcode0 = uintptr(c.sigcode())
+ gp.sigcode1 = uintptr(c.fault())
+ gp.sigpc = c.sigpc()
+
+ c.preparePanic(sig, gp)
+ return
+ }
+
+ if c.sigcode() == _SI_USER || flags&_SigNotify != 0 {
+ if sigsend(sig) {
+ return
+ }
+ }
+
+ if c.sigcode() == _SI_USER && signal_ignored(sig) {
+ return
+ }
+
+ if flags&_SigKill != 0 {
+ dieFromSignal(sig)
+ }
+
+ // _SigThrow means that we should exit now.
+ // If we get here with _SigPanic, it means that the signal
+ // was sent to us by a program (c.sigcode() == _SI_USER);
+ // in that case, if we didn't handle it in sigsend, we exit now.
+ if flags&(_SigThrow|_SigPanic) == 0 {
+ return
+ }
+
+ _g_.m.throwing = 1
+ _g_.m.caughtsig.set(gp)
+
+ if crashing == 0 {
+ startpanic_m()
+ }
+
+ if sig < uint32(len(sigtable)) {
+ print(sigtable[sig].name, "\n")
+ } else {
+ print("Signal ", sig, "\n")
+ }
+
+ print("PC=", hex(c.sigpc()), " m=", _g_.m.id, " sigcode=", c.sigcode(), "\n")
+ if _g_.m.lockedg != 0 && _g_.m.ncgo > 0 && gp == _g_.m.g0 {
+ print("signal arrived during cgo execution\n")
+ gp = _g_.m.lockedg.ptr()
+ }
+ if sig == _SIGILL || sig == _SIGFPE {
+ // It would be nice to know how long the instruction is.
+ // Unfortunately, that's complicated to do in general (mostly for x86
+ // and s390x, but other archs have non-standard instruction lengths also).
+ // Opt to print 16 bytes, which covers most instructions.
+ const maxN = 16
+ n := uintptr(maxN)
+ // We have to be careful, though. If we're near the end of
+ // a page and the following page isn't mapped, we could
+ // segfault. So make sure we don't straddle a page (even though
+ // that could lead to printing an incomplete instruction).
+ // We're assuming here we can read at least the page containing the PC.
+ // I suppose it is possible that the page is mapped executable but not readable?
+ pc := c.sigpc()
+ if n > physPageSize-pc%physPageSize {
+ n = physPageSize - pc%physPageSize
+ }
+ print("instruction bytes:")
+ b := (*[maxN]byte)(unsafe.Pointer(pc))
+ for i := uintptr(0); i < n; i++ {
+ print(" ", hex(b[i]))
+ }
+ println()
+ }
+ print("\n")
+
+ level, _, docrash := gotraceback()
+ if level > 0 {
+ goroutineheader(gp)
+ tracebacktrap(c.sigpc(), c.sigsp(), c.siglr(), gp)
+ if crashing > 0 && gp != _g_.m.curg && _g_.m.curg != nil && readgstatus(_g_.m.curg)&^_Gscan == _Grunning {
+ // tracebackothers on original m skipped this one; trace it now.
+ goroutineheader(_g_.m.curg)
+ traceback(^uintptr(0), ^uintptr(0), 0, _g_.m.curg)
+ } else if crashing == 0 {
+ tracebackothers(gp)
+ print("\n")
+ }
+ dumpregs(c)
+ }
+
+ if docrash {
+ crashing++
+ if crashing < mcount()-int32(extraMCount) {
+ // There are other m's that need to dump their stacks.
+ // Relay SIGQUIT to the next m by sending it to the current process.
+ // All m's that have already received SIGQUIT have signal masks blocking
+ // receipt of any signals, so the SIGQUIT will go to an m that hasn't seen it yet.
+ // When the last m receives the SIGQUIT, it will fall through to the call to
+ // crash below. Just in case the relaying gets botched, each m involved in
+ // the relay sleeps for 5 seconds and then does the crash/exit itself.
+ // In expected operation, the last m has received the SIGQUIT and run
+ // crash/exit and the process is gone, all long before any of the
+ // 5-second sleeps have finished.
+ print("\n-----\n\n")
+ raiseproc(_SIGQUIT)
+ usleep(5 * 1000 * 1000)
+ }
+ crash()
+ }
+
+ printDebugLog()
+
+ exit(2)
+}
+
+// sigpanic turns a synchronous signal into a run-time panic.
+// If the signal handler sees a synchronous panic, it arranges the
+// stack to look like the function where the signal occurred called
+// sigpanic, sets the signal's PC value to sigpanic, and returns from
+// the signal handler. The effect is that the program will act as
+// though the function that got the signal simply called sigpanic
+// instead.
+//
+// This must NOT be nosplit because the linker doesn't know where
+// sigpanic calls can be injected.
+//
+// The signal handler must not inject a call to sigpanic if
+// getg().throwsplit, since sigpanic may need to grow the stack.
+//
+// This is exported via linkname to assembly in runtime/cgo.
+//go:linkname sigpanic
+func sigpanic() {
+ g := getg()
+ if !canpanic(g) {
+ throw("unexpected signal during runtime execution")
+ }
+
+ switch g.sig {
+ case _SIGBUS:
+ if g.sigcode0 == _BUS_ADRERR && g.sigcode1 < 0x1000 {
+ panicmem()
+ }
+ // Support runtime/debug.SetPanicOnFault.
+ if g.paniconfault {
+ panicmemAddr(g.sigcode1)
+ }
+ print("unexpected fault address ", hex(g.sigcode1), "\n")
+ throw("fault")
+ case _SIGSEGV:
+ if (g.sigcode0 == 0 || g.sigcode0 == _SEGV_MAPERR || g.sigcode0 == _SEGV_ACCERR) && g.sigcode1 < 0x1000 {
+ panicmem()
+ }
+ // Support runtime/debug.SetPanicOnFault.
+ if g.paniconfault {
+ panicmemAddr(g.sigcode1)
+ }
+ print("unexpected fault address ", hex(g.sigcode1), "\n")
+ throw("fault")
+ case _SIGFPE:
+ switch g.sigcode0 {
+ case _FPE_INTDIV:
+ panicdivide()
+ case _FPE_INTOVF:
+ panicoverflow()
+ }
+ panicfloat()
+ }
+
+ if g.sig >= uint32(len(sigtable)) {
+ // can't happen: we looked up g.sig in sigtable to decide to call sigpanic
+ throw("unexpected signal value")
+ }
+ panic(errorString(sigtable[g.sig].name))
+}
+
+// dieFromSignal kills the program with a signal.
+// This provides the expected exit status for the shell.
+// This is only called with fatal signals expected to kill the process.
+//go:nosplit
+//go:nowritebarrierrec
+func dieFromSignal(sig uint32) {
+ unblocksig(sig)
+ // Mark the signal as unhandled to ensure it is forwarded.
+ atomic.Store(&handlingSig[sig], 0)
+ raise(sig)
+
+ // That should have killed us. On some systems, though, raise
+ // sends the signal to the whole process rather than to just
+ // the current thread, which means that the signal may not yet
+ // have been delivered. Give other threads a chance to run and
+ // pick up the signal.
+ osyield()
+ osyield()
+ osyield()
+
+ // If that didn't work, try _SIG_DFL.
+ setsig(sig, _SIG_DFL)
+ raise(sig)
+
+ osyield()
+ osyield()
+ osyield()
+
+ // If we are still somehow running, just exit with the wrong status.
+ exit(2)
+}
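+
+// The "expected exit status" above is what a parent process or shell sees
+// when a process dies from a signal. A minimal sketch using the standard
+// os/exec and syscall APIs on Unix ("./crasher" is a hypothetical binary
+// that dies from SIGABRT):
+//
+//	err := exec.Command("./crasher").Run()
+//	if ee, ok := err.(*exec.ExitError); ok {
+//		ws := ee.Sys().(syscall.WaitStatus)
+//		fmt.Println(ws.Signaled(), ws.Signal()) // reports death by SIGABRT
+//	}
+//
+// A POSIX shell reports the same death as exit status 128+signum
+// (134 for SIGABRT).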
+
+// raisebadsignal is called when a signal is received on a non-Go
+// thread, and the Go program does not want to handle it (that is, the
+// program has not called os/signal.Notify for the signal).
+func raisebadsignal(sig uint32, c *sigctxt) {
+ if sig == _SIGPROF {
+ // Ignore profiling signals that arrive on non-Go threads.
+ return
+ }
+
+ var handler uintptr
+ if sig >= _NSIG {
+ handler = _SIG_DFL
+ } else {
+ handler = atomic.Loaduintptr(&fwdSig[sig])
+ }
+
+ // Reset the signal handler and raise the signal.
+ // We are currently running inside a signal handler, so the
+ // signal is blocked. We need to unblock it before raising the
+ // signal, or the signal we raise will be ignored until we return
+ // from the signal handler. We know that the signal was unblocked
+ // before entering the handler, or else we would not have received
+ // it. That means that we don't have to worry about blocking it
+ // again.
+ unblocksig(sig)
+ setsig(sig, handler)
+
+ // If we're linked into a non-Go program we want to try to
+ // avoid modifying the original context in which the signal
+ // was raised. If the handler is the default, we know it
+ // is non-recoverable, so we don't have to worry about
+ // re-installing sighandler. At this point we can just
+ // return and the signal will be re-raised and caught by
+ // the default handler with the correct context.
+ //
+ // On FreeBSD, the libthr sigaction code prevents
+ // this from working so we fall through to raise.
+ if GOOS != "freebsd" && (isarchive || islibrary) && handler == _SIG_DFL && c.sigcode() != _SI_USER {
+ return
+ }
+
+ raise(sig)
+
+ // Give the signal a chance to be delivered.
+ // In almost all real cases the program is about to crash,
+ // so sleeping here is not a waste of time.
+ usleep(1000)
+
+ // If the signal didn't cause the program to exit, restore the
+ // Go signal handler and carry on.
+ //
+ // We may receive another instance of the signal before we
+ // restore the Go handler, but that is not so bad: we know
+ // that the Go program has been ignoring the signal.
+ setsig(sig, funcPC(sighandler))
+}
+
+//go:nosplit
+func crash() {
+ // OS X core dumps are linear dumps of the mapped memory,
+ // from the first virtual byte to the last, with zeros in the gaps.
+ // Because of the way we arrange the address space on 64-bit systems,
+ // this means the OS X core file will be >128 GB and even on a zippy
+ // workstation can take OS X well over an hour to write (uninterruptible).
+ // Save users from making that mistake.
+ if GOOS == "darwin" && GOARCH == "amd64" {
+ return
+ }
+
+ dieFromSignal(_SIGABRT)
+}
+
+// ensureSigM starts one global, sleeping thread to make sure at least one thread
+// is available to catch signals enabled for os/signal.
+func ensureSigM() {
+ if maskUpdatedChan != nil {
+ return
+ }
+ maskUpdatedChan = make(chan struct{})
+ disableSigChan = make(chan uint32)
+ enableSigChan = make(chan uint32)
+ go func() {
+ // Signal masks are per-thread, so make sure this goroutine stays on one
+ // thread.
+ LockOSThread()
+ defer UnlockOSThread()
+ // The sigBlocked mask contains the signals not active for os/signal,
+ // initially all signals except the essential ones. When signal.Notify or
+ // signal.Stop is called, sigenable/sigdisable in turn notify this thread
+ // to update its signal mask accordingly.
+ sigBlocked := sigset_all
+ for i := range sigtable {
+ if !blockableSig(uint32(i)) {
+ sigdelset(&sigBlocked, i)
+ }
+ }
+ sigprocmask(_SIG_SETMASK, &sigBlocked, nil)
+ for {
+ select {
+ case sig := <-enableSigChan:
+ if sig > 0 {
+ sigdelset(&sigBlocked, int(sig))
+ }
+ case sig := <-disableSigChan:
+ if sig > 0 && blockableSig(sig) {
+ sigaddset(&sigBlocked, int(sig))
+ }
+ }
+ sigprocmask(_SIG_SETMASK, &sigBlocked, nil)
+ maskUpdatedChan <- struct{}{}
+ }
+ }()
+}
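+
+// The enable/disable notifications handled above originate in the os/signal
+// package. A minimal sketch of the user-level trigger (assuming an ordinary
+// program importing os, os/signal and syscall):
+//
+//	ch := make(chan os.Signal, 1)
+//	signal.Notify(ch, syscall.SIGHUP) // eventually reaches sigenable -> enableSigChan
+//	...
+//	signal.Stop(ch)                   // eventually reaches sigdisable -> disableSigChan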
+
+// This is called when we receive a signal when there is no signal stack.
+// This can only happen if non-Go code calls sigaltstack to disable the
+// signal stack.
+func noSignalStack(sig uint32) {
+ println("signal", sig, "received on thread with no signal stack")
+ throw("non-Go code disabled sigaltstack")
+}
+
+// This is called if we receive a signal when there is a signal stack
+// but we are not on it. This can only happen if non-Go code called
+// sigaction without setting the SS_ONSTACK flag.
+func sigNotOnStack(sig uint32) {
+ println("signal", sig, "received but handler not on signal stack")
+ throw("non-Go code set up signal handler without SA_ONSTACK flag")
+}
+
+// signalDuringFork is called if we receive a signal while doing a fork.
+// We do not want signals at that time, as a signal sent to the process
+// group may be delivered to the child process, causing confusion.
+// This should never be called, because we block signals across the fork;
+// this function is just a safety check. See issue 18600 for background.
+func signalDuringFork(sig uint32) {
+ println("signal", sig, "received during fork")
+ throw("signal received during fork")
+}
+
+var badginsignalMsg = "fatal: bad g in signal handler\n"
+
+// This runs on a foreign stack, without an m or a g. No stack split.
+//go:nosplit
+//go:norace
+//go:nowritebarrierrec
+func badsignal(sig uintptr, c *sigctxt) {
+ if !iscgo && !cgoHasExtraM {
+ // There is no extra M. needm will not be able to grab
+ // an M. Instead of hanging, just crash.
+ // Cannot call split-stack function as there is no G.
+ s := stringStructOf(&badginsignalMsg)
+ write(2, s.str, int32(s.len))
+ exit(2)
+ *(*uintptr)(unsafe.Pointer(uintptr(123))) = 2
+ }
+ needm()
+ if !sigsend(uint32(sig)) {
+ // A foreign thread received the signal sig, and the
+ // Go code does not want to handle it.
+ raisebadsignal(uint32(sig), c)
+ }
+ dropm()
+}
+
+//go:noescape
+func sigfwd(fn uintptr, sig uint32, info *siginfo, ctx unsafe.Pointer)
+
+// Determines if the signal should be handled by Go and if not, forwards the
+// signal to the handler that was installed before Go's. Returns whether the
+// signal was forwarded.
+// This is called by the signal handler, and the world may be stopped.
+//go:nosplit
+//go:nowritebarrierrec
+func sigfwdgo(sig uint32, info *siginfo, ctx unsafe.Pointer) bool {
+ if sig >= uint32(len(sigtable)) {
+ return false
+ }
+ fwdFn := atomic.Loaduintptr(&fwdSig[sig])
+ flags := sigtable[sig].flags
+
+ // If we aren't handling the signal, forward it.
+ if atomic.Load(&handlingSig[sig]) == 0 || !signalsOK {
+ // If the signal is ignored, doing nothing is the same as forwarding.
+ if fwdFn == _SIG_IGN || (fwdFn == _SIG_DFL && flags&_SigIgn != 0) {
+ return true
+ }
+ // We are not handling the signal and there is no other handler to forward to.
+ // Crash with the default behavior.
+ if fwdFn == _SIG_DFL {
+ setsig(sig, _SIG_DFL)
+ dieFromSignal(sig)
+ return false
+ }
+
+ sigfwd(fwdFn, sig, info, ctx)
+ return true
+ }
+
+ // This function and its caller sigtrampgo assume SIGPIPE is delivered on the
+ // originating thread. This property does not hold on macOS (golang.org/issue/33384),
+ // so we have no choice but to ignore SIGPIPE.
+ if (GOOS == "darwin" || GOOS == "ios") && sig == _SIGPIPE {
+ return true
+ }
+
+ // If there is no handler to forward to, no need to forward.
+ if fwdFn == _SIG_DFL {
+ return false
+ }
+
+ c := &sigctxt{info, ctx}
+ // Only forward synchronous signals and SIGPIPE.
+ // Unfortunately, user-generated SIGPIPEs will also be forwarded, because si_code
+ // is set to _SI_USER even for a SIGPIPE raised from a write to a closed socket
+ // or pipe.
+ if (c.sigcode() == _SI_USER || flags&_SigPanic == 0) && sig != _SIGPIPE {
+ return false
+ }
+ // Determine if the signal occurred inside Go code. We test that:
+ // (1) we weren't in the VDSO page,
+ // (2) we were in a goroutine (i.e., m.curg != nil), and
+ // (3) we weren't in cgo.
+ g := sigFetchG(c)
+ if g != nil && g.m != nil && g.m.curg != nil && !g.m.incgo {
+ return false
+ }
+
+ // Signal not handled by Go, forward it.
+ if fwdFn != _SIG_IGN {
+ sigfwd(fwdFn, sig, info, ctx)
+ }
+
+ return true
+}
+
+// sigsave saves the current thread's signal mask into *p.
+// This is used to preserve the non-Go signal mask when a non-Go
+// thread calls a Go function.
+// This is nosplit and nowritebarrierrec because it is called by needm
+// which may be called on a non-Go thread with no g available.
+//go:nosplit
+//go:nowritebarrierrec
+func sigsave(p *sigset) {
+ sigprocmask(_SIG_SETMASK, nil, p)
+}
+
+// msigrestore sets the current thread's signal mask to sigmask.
+// This is used to restore the non-Go signal mask when a non-Go thread
+// calls a Go function.
+// This is nosplit and nowritebarrierrec because it is called by dropm
+// after g has been cleared.
+//go:nosplit
+//go:nowritebarrierrec
+func msigrestore(sigmask sigset) {
+ sigprocmask(_SIG_SETMASK, &sigmask, nil)
+}
+
+// sigsetAllExiting is used by sigblock(true) when a thread is
+// exiting. sigset_all is defined in OS-specific code, and OS-specific
+// behavior may override this default for sigsetAllExiting: see
+// osinit().
+var sigsetAllExiting = sigset_all
+
+// sigblock blocks signals in the current thread's signal mask.
+// This is used to block signals while setting up and tearing down g
+// when a non-Go thread calls a Go function. When a thread is exiting
+// we use the sigsetAllExiting value, otherwise the OS specific
+// definition of sigset_all is used.
+// This is nosplit and nowritebarrierrec because it is called by needm
+// which may be called on a non-Go thread with no g available.
+//go:nosplit
+//go:nowritebarrierrec
+func sigblock(exiting bool) {
+ if exiting {
+ sigprocmask(_SIG_SETMASK, &sigsetAllExiting, nil)
+ return
+ }
+ sigprocmask(_SIG_SETMASK, &sigset_all, nil)
+}
+
+// unblocksig removes sig from the current thread's signal mask.
+// This is nosplit and nowritebarrierrec because it is called from
+// dieFromSignal, which can be called by sigfwdgo while running in the
+// signal handler, on the signal stack, with no g available.
+//go:nosplit
+//go:nowritebarrierrec
+func unblocksig(sig uint32) {
+ var set sigset
+ sigaddset(&set, int(sig))
+ sigprocmask(_SIG_UNBLOCK, &set, nil)
+}
+
+// minitSignals is called when initializing a new m to set the
+// thread's alternate signal stack and signal mask.
+func minitSignals() {
+ minitSignalStack()
+ minitSignalMask()
+}
+
+// minitSignalStack is called when initializing a new m to set the
+// alternate signal stack. If the alternate signal stack is not set
+// for the thread (the normal case) then set the alternate signal
+// stack to the gsignal stack. If the alternate signal stack is set
+// for the thread (the case when a non-Go thread sets the alternate
+// signal stack and then calls a Go function) then set the gsignal
+// stack to the alternate signal stack. We also set the alternate
+// signal stack to the gsignal stack if cgo is not used (regardless
+// of whether it is already set). Record which choice was made in
+// newSigstack, so that it can be undone in unminit.
+func minitSignalStack() {
+ _g_ := getg()
+ var st stackt
+ sigaltstack(nil, &st)
+ if st.ss_flags&_SS_DISABLE != 0 || !iscgo {
+ signalstack(&_g_.m.gsignal.stack)
+ _g_.m.newSigstack = true
+ } else {
+ setGsignalStack(&st, &_g_.m.goSigStack)
+ _g_.m.newSigstack = false
+ }
+}
+
+// minitSignalMask is called when initializing a new m to set the
+// thread's signal mask. When this is called all signals have been
+// blocked for the thread. This starts with m.sigmask, which was set
+// either from initSigmask for a newly created thread or by calling
+// sigsave if this is a non-Go thread calling a Go function. It
+// removes all essential signals from the mask, thus causing those
+// signals to not be blocked. Then it sets the thread's signal mask.
+// After this is called the thread can receive signals.
+func minitSignalMask() {
+ nmask := getg().m.sigmask
+ for i := range sigtable {
+ if !blockableSig(uint32(i)) {
+ sigdelset(&nmask, i)
+ }
+ }
+ sigprocmask(_SIG_SETMASK, &nmask, nil)
+}
+
+// unminitSignals is called from dropm, via unminit, to undo the
+// effect of calling minit on a non-Go thread.
+//go:nosplit
+func unminitSignals() {
+ if getg().m.newSigstack {
+ st := stackt{ss_flags: _SS_DISABLE}
+ sigaltstack(&st, nil)
+ } else {
+ // We got the signal stack from someone else. Restore
+ // the Go-allocated stack in case this M gets reused
+ // for another thread (e.g., it's an extram). Also, on
+ // Android, libc allocates a signal stack for all
+ // threads, so it's important to restore the Go stack
+ // even on Go-created threads so we can free it.
+ restoreGsignalStack(&getg().m.goSigStack)
+ }
+}
+
+// blockableSig reports whether sig may be blocked by the signal mask.
+// We never want to block the signals marked _SigUnblock;
+// these are the synchronous signals that turn into a Go panic.
+// In a Go program--not a c-archive/c-shared--we never want to block
+// the signals marked _SigKill or _SigThrow, as otherwise it's possible
+// for all running threads to block them and delay their delivery until
+// we start a new thread. When linked into a C program we let the C code
+// decide on the disposition of those signals.
+func blockableSig(sig uint32) bool {
+ flags := sigtable[sig].flags
+ if flags&_SigUnblock != 0 {
+ return false
+ }
+ if isarchive || islibrary {
+ return true
+ }
+ return flags&(_SigKill|_SigThrow) == 0
+}
+
+// gsignalStack saves the fields of the gsignal stack changed by
+// setGsignalStack.
+type gsignalStack struct {
+ stack stack
+ stackguard0 uintptr
+ stackguard1 uintptr
+ stktopsp uintptr
+}
+
+// setGsignalStack sets the gsignal stack of the current m to an
+// alternate signal stack returned from the sigaltstack system call.
+// It saves the old values in *old for use by restoreGsignalStack.
+// This is used when handling a signal if non-Go code has set the
+// alternate signal stack.
+//go:nosplit
+//go:nowritebarrierrec
+func setGsignalStack(st *stackt, old *gsignalStack) {
+ g := getg()
+ if old != nil {
+ old.stack = g.m.gsignal.stack
+ old.stackguard0 = g.m.gsignal.stackguard0
+ old.stackguard1 = g.m.gsignal.stackguard1
+ old.stktopsp = g.m.gsignal.stktopsp
+ }
+ stsp := uintptr(unsafe.Pointer(st.ss_sp))
+ g.m.gsignal.stack.lo = stsp
+ g.m.gsignal.stack.hi = stsp + st.ss_size
+ g.m.gsignal.stackguard0 = stsp + _StackGuard
+ g.m.gsignal.stackguard1 = stsp + _StackGuard
+}
+
+// restoreGsignalStack restores the gsignal stack to the value it had
+// before entering the signal handler.
+//go:nosplit
+//go:nowritebarrierrec
+func restoreGsignalStack(st *gsignalStack) {
+ gp := getg().m.gsignal
+ gp.stack = st.stack
+ gp.stackguard0 = st.stackguard0
+ gp.stackguard1 = st.stackguard1
+ gp.stktopsp = st.stktopsp
+}
+
+// signalstack sets the current thread's alternate signal stack to s.
+//go:nosplit
+func signalstack(s *stack) {
+ st := stackt{ss_size: s.hi - s.lo}
+ setSignalstackSP(&st, s.lo)
+ sigaltstack(&st, nil)
+}
+
+// setsigsegv is used on darwin/arm64 to fake a segmentation fault.
+//
+// This is exported via linkname to assembly in runtime/cgo.
+//
+//go:nosplit
+//go:linkname setsigsegv
+func setsigsegv(pc uintptr) {
+ g := getg()
+ g.sig = _SIGSEGV
+ g.sigpc = pc
+ g.sigcode0 = _SEGV_MAPERR
+ g.sigcode1 = 0 // TODO: emulate si_addr
+}
diff --git a/src/runtime/signal_windows.go b/src/runtime/signal_windows.go
new file mode 100644
index 0000000..3af2e39
--- /dev/null
+++ b/src/runtime/signal_windows.go
@@ -0,0 +1,310 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+func disableWER() {
+ // do not display the Windows Error Reporting dialog
+ const (
+ SEM_FAILCRITICALERRORS = 0x0001
+ SEM_NOGPFAULTERRORBOX = 0x0002
+ SEM_NOALIGNMENTFAULTEXCEPT = 0x0004
+ SEM_NOOPENFILEERRORBOX = 0x8000
+ )
+ errormode := uint32(stdcall1(_SetErrorMode, SEM_NOGPFAULTERRORBOX))
+ stdcall1(_SetErrorMode, uintptr(errormode)|SEM_FAILCRITICALERRORS|SEM_NOGPFAULTERRORBOX|SEM_NOOPENFILEERRORBOX)
+}
+
+// in sys_windows_386.s and sys_windows_amd64.s
+func exceptiontramp()
+func firstcontinuetramp()
+func lastcontinuetramp()
+
+func initExceptionHandler() {
+ stdcall2(_AddVectoredExceptionHandler, 1, funcPC(exceptiontramp))
+ if _AddVectoredContinueHandler == nil || GOARCH == "386" {
+ // Use SetUnhandledExceptionFilter for windows-386 or
+ // if VectoredContinueHandler is unavailable.
+ // Note: the SetUnhandledExceptionFilter handler won't be called if a debugger is attached.
+ stdcall1(_SetUnhandledExceptionFilter, funcPC(lastcontinuetramp))
+ } else {
+ stdcall2(_AddVectoredContinueHandler, 1, funcPC(firstcontinuetramp))
+ stdcall2(_AddVectoredContinueHandler, 0, funcPC(lastcontinuetramp))
+ }
+}
+
+// isAbort reports whether the context r describes an exception raised
+// by a call to the runtime.abort function.
+//
+//go:nosplit
+func isAbort(r *context) bool {
+ // In the case of an abort, the exception IP is one byte after
+ // the INT3 (this differs from UNIX OSes).
+ return isAbortPC(r.ip() - 1)
+}
+
+// isgoexception reports whether this exception should be translated
+// into a Go panic.
+//
+// It is nosplit to avoid growing the stack in case we're aborting
+// because of a stack overflow.
+//
+//go:nosplit
+func isgoexception(info *exceptionrecord, r *context) bool {
+ // Only handle exception if executing instructions in Go binary
+ // (not Windows library code).
+ // TODO(mwhudson): needs to loop to support shared libs
+ if r.ip() < firstmoduledata.text || firstmoduledata.etext < r.ip() {
+ return false
+ }
+
+ if isAbort(r) {
+ // Never turn abort into a panic.
+ return false
+ }
+
+ // Go will only handle some exceptions.
+ switch info.exceptioncode {
+ default:
+ return false
+ case _EXCEPTION_ACCESS_VIOLATION:
+ case _EXCEPTION_INT_DIVIDE_BY_ZERO:
+ case _EXCEPTION_INT_OVERFLOW:
+ case _EXCEPTION_FLT_DENORMAL_OPERAND:
+ case _EXCEPTION_FLT_DIVIDE_BY_ZERO:
+ case _EXCEPTION_FLT_INEXACT_RESULT:
+ case _EXCEPTION_FLT_OVERFLOW:
+ case _EXCEPTION_FLT_UNDERFLOW:
+ case _EXCEPTION_BREAKPOINT:
+ }
+ return true
+}
+
+// Called by sigtramp from Windows VEH handler.
+// Return value signals whether the exception has been handled (EXCEPTION_CONTINUE_EXECUTION)
+// or should be made available to other handlers in the chain (EXCEPTION_CONTINUE_SEARCH).
+//
+// This is the first entry into Go code for exception handling. This
+// is nosplit to avoid growing the stack until we've checked for
+// _EXCEPTION_BREAKPOINT, which is raised if we overflow the g0 stack.
+//
+//go:nosplit
+func exceptionhandler(info *exceptionrecord, r *context, gp *g) int32 {
+ if !isgoexception(info, r) {
+ return _EXCEPTION_CONTINUE_SEARCH
+ }
+
+ // After this point, it is safe to grow the stack.
+
+ if gp.throwsplit {
+ // We can't safely sigpanic because it may grow the
+ // stack. Let it fall through.
+ return _EXCEPTION_CONTINUE_SEARCH
+ }
+
+ // Make it look like a call to the signal func.
+ // Have to pass arguments out of band since
+ // augmenting the stack frame would break
+ // the unwinding code.
+ gp.sig = info.exceptioncode
+ gp.sigcode0 = uintptr(info.exceptioninformation[0])
+ gp.sigcode1 = uintptr(info.exceptioninformation[1])
+ gp.sigpc = r.ip()
+
+ // Only push runtime·sigpanic if r.ip() != 0.
+ // If r.ip() == 0, probably panicked because of a
+ // call to a nil func. Not pushing that onto sp will
+ // make the trace look like a call to runtime·sigpanic instead.
+ // (Otherwise the trace will end at runtime·sigpanic and we
+ // won't get to see who faulted.)
+ // Also don't push a sigpanic frame if the faulting PC
+ // is the entry of asyncPreempt. In this case, we suspended
+ // the thread right between the fault and the exception handler
+ // starting to run, and we have pushed an asyncPreempt call.
+ // The exception was not raised by asyncPreempt, so don't push a
+ // sigpanic call that would make it look like it was. Instead, just
+ // overwrite the PC. (See issue #35773)
+ if r.ip() != 0 && r.ip() != funcPC(asyncPreempt) {
+ sp := unsafe.Pointer(r.sp())
+ sp = add(sp, ^(unsafe.Sizeof(uintptr(0)) - 1)) // sp--
+ r.set_sp(uintptr(sp))
+ switch GOARCH {
+ default:
+ panic("unsupported architecture")
+ case "386", "amd64":
+ *((*uintptr)(sp)) = r.ip()
+ case "arm":
+ *((*uintptr)(sp)) = r.lr()
+ r.set_lr(r.ip())
+ }
+ }
+ r.set_ip(funcPC(sigpanic))
+ return _EXCEPTION_CONTINUE_EXECUTION
+}
+
+// It seems Windows searches ContinueHandler's list even
+// if ExceptionHandler returns EXCEPTION_CONTINUE_EXECUTION.
+// firstcontinuehandler will stop that search
+// if exceptionhandler has already handled the exception.
+//
+// It is nosplit for the same reason as exceptionhandler.
+//
+//go:nosplit
+func firstcontinuehandler(info *exceptionrecord, r *context, gp *g) int32 {
+ if !isgoexception(info, r) {
+ return _EXCEPTION_CONTINUE_SEARCH
+ }
+ return _EXCEPTION_CONTINUE_EXECUTION
+}
+
+var testingWER bool
+
+// lastcontinuehandler is reached when the runtime cannot handle the
+// current exception. It prints crash information and exits.
+//
+// It is nosplit for the same reason as exceptionhandler.
+//
+//go:nosplit
+func lastcontinuehandler(info *exceptionrecord, r *context, gp *g) int32 {
+ if islibrary || isarchive {
+ // The Go DLL/archive has been loaded into a non-Go program.
+ // If the exception does not originate from Go, the Go runtime
+ // should not take responsibility for crashing the process.
+ return _EXCEPTION_CONTINUE_SEARCH
+ }
+ if testingWER {
+ return _EXCEPTION_CONTINUE_SEARCH
+ }
+
+ _g_ := getg()
+
+ if panicking != 0 { // traceback already printed
+ exit(2)
+ }
+ panicking = 1
+
+ // In case we're handling a g0 stack overflow, blow away the
+ // g0 stack bounds so we have room to print the traceback. If
+ // this somehow overflows the stack, the OS will trap it.
+ _g_.stack.lo = 0
+ _g_.stackguard0 = _g_.stack.lo + _StackGuard
+ _g_.stackguard1 = _g_.stackguard0
+
+ print("Exception ", hex(info.exceptioncode), " ", hex(info.exceptioninformation[0]), " ", hex(info.exceptioninformation[1]), " ", hex(r.ip()), "\n")
+
+ print("PC=", hex(r.ip()), "\n")
+ if _g_.m.lockedg != 0 && _g_.m.ncgo > 0 && gp == _g_.m.g0 {
+ if iscgo {
+ print("signal arrived during external code execution\n")
+ }
+ gp = _g_.m.lockedg.ptr()
+ }
+ print("\n")
+
+ // TODO(jordanrh1): This may be needed for 386/AMD64 as well.
+ if GOARCH == "arm" {
+ _g_.m.throwing = 1
+ _g_.m.caughtsig.set(gp)
+ }
+
+ level, _, docrash := gotraceback()
+ if level > 0 {
+ tracebacktrap(r.ip(), r.sp(), r.lr(), gp)
+ tracebackothers(gp)
+ dumpregs(r)
+ }
+
+ if docrash {
+ crash()
+ }
+
+ exit(2)
+ return 0 // not reached
+}
+
+func sigpanic() {
+ g := getg()
+ if !canpanic(g) {
+ throw("unexpected signal during runtime execution")
+ }
+
+ switch g.sig {
+ case _EXCEPTION_ACCESS_VIOLATION:
+ if g.sigcode1 < 0x1000 {
+ panicmem()
+ }
+ if g.paniconfault {
+ panicmemAddr(g.sigcode1)
+ }
+ print("unexpected fault address ", hex(g.sigcode1), "\n")
+ throw("fault")
+ case _EXCEPTION_INT_DIVIDE_BY_ZERO:
+ panicdivide()
+ case _EXCEPTION_INT_OVERFLOW:
+ panicoverflow()
+ case _EXCEPTION_FLT_DENORMAL_OPERAND,
+ _EXCEPTION_FLT_DIVIDE_BY_ZERO,
+ _EXCEPTION_FLT_INEXACT_RESULT,
+ _EXCEPTION_FLT_OVERFLOW,
+ _EXCEPTION_FLT_UNDERFLOW:
+ panicfloat()
+ }
+ throw("fault")
+}
+
+var (
+ badsignalmsg [100]byte
+ badsignallen int32
+)
+
+func setBadSignalMsg() {
+ const msg = "runtime: signal received on thread not created by Go.\n"
+ for i, c := range msg {
+ badsignalmsg[i] = byte(c)
+ badsignallen++
+ }
+}
+
+// The following are not implemented.
+
+func initsig(preinit bool) {
+}
+
+func sigenable(sig uint32) {
+}
+
+func sigdisable(sig uint32) {
+}
+
+func sigignore(sig uint32) {
+}
+
+func badsignal2()
+
+func raisebadsignal(sig uint32) {
+ badsignal2()
+}
+
+func signame(sig uint32) string {
+ return ""
+}
+
+//go:nosplit
+func crash() {
+ // TODO: This routine should do whatever is needed
+ // to make the Windows program abort/crash as it
+ // would if Go was not intercepting signals.
+ // On Unix the routine would remove the custom signal
+ // handler and then raise a signal (like SIGABRT).
+ // Something like that should happen here.
+ // It's okay to leave this empty for now: if crash returns
+ // the ordinary exit-after-panic happens.
+}
+
+// gsignalStack is unused on Windows.
+type gsignalStack struct{}
diff --git a/src/runtime/signal_windows_test.go b/src/runtime/signal_windows_test.go
new file mode 100644
index 0000000..33a9b92
--- /dev/null
+++ b/src/runtime/signal_windows_test.go
@@ -0,0 +1,215 @@
+// +build windows
+
+package runtime_test
+
+import (
+ "bufio"
+ "bytes"
+ "fmt"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "runtime"
+ "strconv"
+ "strings"
+ "syscall"
+ "testing"
+)
+
+func TestVectoredHandlerDontCrashOnLibrary(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ if runtime.GOARCH != "amd64" {
+ t.Skip("this test can only run on windows/amd64")
+ }
+ testenv.MustHaveGoBuild(t)
+ testenv.MustHaveExecPath(t, "gcc")
+ testprog.Lock()
+ defer testprog.Unlock()
+ dir, err := os.MkdirTemp("", "go-build")
+ if err != nil {
+ t.Fatalf("failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(dir)
+
+ // build go dll
+ dll := filepath.Join(dir, "testwinlib.dll")
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", dll, "--buildmode", "c-shared", "testdata/testwinlib/main.go")
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build go library: %s\n%s", err, out)
+ }
+
+ // build c program
+ exe := filepath.Join(dir, "test.exe")
+ cmd = exec.Command("gcc", "-L"+dir, "-I"+dir, "-ltestwinlib", "-o", exe, "testdata/testwinlib/main.c")
+ out, err = testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build c exe: %s\n%s", err, out)
+ }
+
+ // run test program
+ cmd = exec.Command(exe)
+ out, err = testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("failure while running executable: %s\n%s", err, out)
+ }
+ expectedOutput := "exceptionCount: 1\ncontinueCount: 1\n"
+ // cleaning output
+ cleanedOut := strings.ReplaceAll(string(out), "\r\n", "\n")
+ if cleanedOut != expectedOutput {
+ t.Errorf("expected output %q, got %q", expectedOutput, cleanedOut)
+ }
+}
+
+func sendCtrlBreak(pid int) error {
+ kernel32, err := syscall.LoadDLL("kernel32.dll")
+ if err != nil {
+ return fmt.Errorf("LoadDLL: %v\n", err)
+ }
+ generateEvent, err := kernel32.FindProc("GenerateConsoleCtrlEvent")
+ if err != nil {
+ return fmt.Errorf("FindProc: %v\n", err)
+ }
+ result, _, err := generateEvent.Call(syscall.CTRL_BREAK_EVENT, uintptr(pid))
+ if result == 0 {
+ return fmt.Errorf("GenerateConsoleCtrlEvent: %v\n", err)
+ }
+ return nil
+}
+
+// TestCtrlHandler tests that Go can gracefully handle closing the console window.
+// See https://golang.org/issues/41884.
+func TestCtrlHandler(t *testing.T) {
+ testenv.MustHaveGoBuild(t)
+ t.Parallel()
+
+ // build go program
+ exe := filepath.Join(t.TempDir(), "test.exe")
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", exe, "testdata/testwinsignal/main.go")
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build go exe: %v\n%s", err, out)
+ }
+
+ // run test program
+ cmd = exec.Command(exe)
+ var stderr bytes.Buffer
+ cmd.Stderr = &stderr
+ outPipe, err := cmd.StdoutPipe()
+ if err != nil {
+ t.Fatalf("Failed to create stdout pipe: %v", err)
+ }
+ outReader := bufio.NewReader(outPipe)
+
+ // in a new command window
+ const _CREATE_NEW_CONSOLE = 0x00000010
+ cmd.SysProcAttr = &syscall.SysProcAttr{
+ CreationFlags: _CREATE_NEW_CONSOLE,
+ HideWindow: true,
+ }
+ if err := cmd.Start(); err != nil {
+ t.Fatalf("Start failed: %v", err)
+ }
+ defer func() {
+ cmd.Process.Kill()
+ cmd.Wait()
+ }()
+
+ // wait for child to be ready to receive signals
+ if line, err := outReader.ReadString('\n'); err != nil {
+ t.Fatalf("could not read stdout: %v", err)
+ } else if strings.TrimSpace(line) != "ready" {
+ t.Fatalf("unexpected message: %s", line)
+ }
+
+ // gracefully kill pid; this closes the command window
+ if err := exec.Command("taskkill.exe", "/pid", strconv.Itoa(cmd.Process.Pid)).Run(); err != nil {
+ t.Fatalf("failed to kill: %v", err)
+ }
+
+ // check child received, handled SIGTERM
+ if line, err := outReader.ReadString('\n'); err != nil {
+ t.Fatalf("could not read stdout: %v", err)
+ } else if expected, got := syscall.SIGTERM.String(), strings.TrimSpace(line); expected != got {
+ t.Fatalf("Expected '%s' got: %s", expected, got)
+ }
+
+ // check child exited gracefully, did not timeout
+ if err := cmd.Wait(); err != nil {
+ t.Fatalf("Program exited with error: %v\n%s", err, &stderr)
+ }
+}
+
+// TestLibraryCtrlHandler tests that a Go DLL allows the calling program to handle console control events.
+// See https://golang.org/issues/35965.
+func TestLibraryCtrlHandler(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ if runtime.GOARCH != "amd64" {
+ t.Skip("this test can only run on windows/amd64")
+ }
+ testenv.MustHaveGoBuild(t)
+ testenv.MustHaveExecPath(t, "gcc")
+ testprog.Lock()
+ defer testprog.Unlock()
+ dir, err := os.MkdirTemp("", "go-build")
+ if err != nil {
+ t.Fatalf("failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(dir)
+
+ // build go dll
+ dll := filepath.Join(dir, "dummy.dll")
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", dll, "--buildmode", "c-shared", "testdata/testwinlibsignal/dummy.go")
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build go library: %s\n%s", err, out)
+ }
+
+ // build c program
+ exe := filepath.Join(dir, "test.exe")
+ cmd = exec.Command("gcc", "-o", exe, "testdata/testwinlibsignal/main.c")
+ out, err = testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build c exe: %s\n%s", err, out)
+ }
+
+ // run test program
+ cmd = exec.Command(exe)
+ var stderr bytes.Buffer
+ cmd.Stderr = &stderr
+ outPipe, err := cmd.StdoutPipe()
+ if err != nil {
+ t.Fatalf("Failed to create stdout pipe: %v", err)
+ }
+ outReader := bufio.NewReader(outPipe)
+
+ cmd.SysProcAttr = &syscall.SysProcAttr{
+ CreationFlags: syscall.CREATE_NEW_PROCESS_GROUP,
+ }
+ if err := cmd.Start(); err != nil {
+ t.Fatalf("Start failed: %v", err)
+ }
+
+ errCh := make(chan error, 1)
+ go func() {
+ if line, err := outReader.ReadString('\n'); err != nil {
+ errCh <- fmt.Errorf("could not read stdout: %v", err)
+ } else if strings.TrimSpace(line) != "ready" {
+ errCh <- fmt.Errorf("unexpected message: %v", line)
+ } else {
+ errCh <- sendCtrlBreak(cmd.Process.Pid)
+ }
+ }()
+
+ if err := <-errCh; err != nil {
+ t.Fatal(err)
+ }
+ if err := cmd.Wait(); err != nil {
+ t.Fatalf("Program exited with error: %v\n%s", err, &stderr)
+ }
+}
diff --git a/src/runtime/sigqueue.go b/src/runtime/sigqueue.go
new file mode 100644
index 0000000..6bed64e
--- /dev/null
+++ b/src/runtime/sigqueue.go
@@ -0,0 +1,294 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file implements runtime support for signal handling.
+//
+// Most synchronization primitives are not available from
+// the signal handler (it cannot block, allocate memory, or use locks)
+// so the handler communicates with a processing goroutine
+// via struct sig, below.
+//
+// sigsend is called by the signal handler to queue a new signal.
+// signal_recv is called by the Go program to receive a newly queued signal.
+// Synchronization between sigsend and signal_recv is based on the sig.state
+// variable. It can be in 4 states: sigIdle, sigReceiving, sigSending and sigFixup.
+// sigReceiving means that signal_recv is blocked on sig.Note and there are no
+// new pending signals.
+// sigSending means that sig.mask *may* contain new pending signals;
+// signal_recv can't be blocked in this state.
+// sigIdle means that there are no new pending signals and signal_recv is not blocked.
+// sigFixup is a transient state that can only exist as a short
+// transition from sigReceiving and then on to sigIdle: it is
+// used to ensure that AllThreadsSyscall's mDoFixup() operation
+// occurs on the sleeping m that is waiting to receive a signal.
+// Transitions between states are done atomically with CAS.
+// When signal_recv is unblocked, it resets sig.Note and rechecks sig.mask.
+// If several sigsends and signal_recv execute concurrently, it can lead to
+// unnecessary rechecks of sig.mask, but it cannot lead to missed signals
+// nor deadlocks.
+
+// +build !plan9
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ _ "unsafe" // for go:linkname
+)
+
+// sig handles communication between the signal handler and os/signal.
+// Other than the inuse and recv fields, the fields are accessed atomically.
+//
+// The wanted and ignored fields are only written by one goroutine at
+// a time; access is controlled by the handlers Mutex in os/signal.
+// The fields are only read by that one goroutine and by the signal handler.
+// We access them atomically to minimize the race between setting them
+// in the goroutine calling os/signal and the signal handler,
+// which may be running in a different thread. That race is unavoidable,
+// as there is no connection between handling a signal and receiving one,
+// but atomic instructions should minimize it.
+var sig struct {
+ note note
+ mask [(_NSIG + 31) / 32]uint32
+ wanted [(_NSIG + 31) / 32]uint32
+ ignored [(_NSIG + 31) / 32]uint32
+ recv [(_NSIG + 31) / 32]uint32
+ state uint32
+ delivering uint32
+ inuse bool
+}
+
+const (
+ sigIdle = iota
+ sigReceiving
+ sigSending
+ sigFixup
+)
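+
+// A self-contained sketch of the sender side of the handoff described above,
+// written with sync/atomic as it would look outside the runtime (state and
+// wake are illustrative names, not runtime APIs):
+//
+//	for {
+//		switch atomic.LoadUint32(&state) {
+//		case sigIdle:
+//			if atomic.CompareAndSwapUint32(&state, sigIdle, sigSending) {
+//				return // receiver will see the new bits on its next pass
+//			}
+//		case sigSending:
+//			return // a notification is already pending
+//		case sigReceiving:
+//			if atomic.CompareAndSwapUint32(&state, sigReceiving, sigIdle) {
+//				wake() // unblock the sleeping receiver
+//				return
+//			}
+//		}
+//	}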
+
+// sigsend delivers a signal from sighandler to the internal signal delivery queue.
+// It reports whether the signal was sent. If not, the caller typically crashes the program.
+// It runs from the signal handler, so it's limited in what it can do.
+func sigsend(s uint32) bool {
+ bit := uint32(1) << uint(s&31)
+ if !sig.inuse || s >= uint32(32*len(sig.wanted)) {
+ return false
+ }
+
+ atomic.Xadd(&sig.delivering, 1)
+ // We are running in the signal handler; defer is not available.
+
+ if w := atomic.Load(&sig.wanted[s/32]); w&bit == 0 {
+ atomic.Xadd(&sig.delivering, -1)
+ return false
+ }
+
+ // Add signal to outgoing queue.
+ for {
+ mask := sig.mask[s/32]
+ if mask&bit != 0 {
+ atomic.Xadd(&sig.delivering, -1)
+ return true // signal already in queue
+ }
+ if atomic.Cas(&sig.mask[s/32], mask, mask|bit) {
+ break
+ }
+ }
+
+ // Notify receiver that queue has new bit.
+Send:
+ for {
+ switch atomic.Load(&sig.state) {
+ default:
+ throw("sigsend: inconsistent state")
+ case sigIdle:
+ if atomic.Cas(&sig.state, sigIdle, sigSending) {
+ break Send
+ }
+ case sigSending:
+ // notification already pending
+ break Send
+ case sigReceiving:
+ if atomic.Cas(&sig.state, sigReceiving, sigIdle) {
+ if GOOS == "darwin" || GOOS == "ios" {
+ sigNoteWakeup(&sig.note)
+ break Send
+ }
+ notewakeup(&sig.note)
+ break Send
+ }
+ case sigFixup:
+ // nothing to do - we need to wait for sigIdle.
+ mDoFixupAndOSYield()
+ }
+ }
+
+ atomic.Xadd(&sig.delivering, -1)
+ return true
+}
+
+// sigRecvPrepareForFixup is used to temporarily wake up the thread
+// running signal_recv() while it is blocked waiting for the
+// arrival of a signal. If it causes the thread to wake up, the
+// sig.state travels through this sequence: sigReceiving -> sigFixup
+// -> sigIdle -> sigReceiving and resumes. (This is only called while
+// GC is disabled.)
+//go:nosplit
+func sigRecvPrepareForFixup() {
+ if atomic.Cas(&sig.state, sigReceiving, sigFixup) {
+ notewakeup(&sig.note)
+ }
+}
+
+// Called to receive the next queued signal.
+// Must only be called from a single goroutine at a time.
+//go:linkname signal_recv os/signal.signal_recv
+func signal_recv() uint32 {
+ for {
+ // Serve any signals from local copy.
+ for i := uint32(0); i < _NSIG; i++ {
+ if sig.recv[i/32]&(1<<(i&31)) != 0 {
+ sig.recv[i/32] &^= 1 << (i & 31)
+ return i
+ }
+ }
+
+ // Wait for updates to be available from signal sender.
+ Receive:
+ for {
+ switch atomic.Load(&sig.state) {
+ default:
+ throw("signal_recv: inconsistent state")
+ case sigIdle:
+ if atomic.Cas(&sig.state, sigIdle, sigReceiving) {
+ if GOOS == "darwin" || GOOS == "ios" {
+ sigNoteSleep(&sig.note)
+ break Receive
+ }
+ notetsleepg(&sig.note, -1)
+ noteclear(&sig.note)
+ if !atomic.Cas(&sig.state, sigFixup, sigIdle) {
+ break Receive
+ }
+ // Getting here, the code will
+ // loop around again to sleep
+ // in state sigReceiving. This
+ // path is taken when
+ // sigRecvPrepareForFixup()
+ // has been called by another
+ // thread.
+ }
+ case sigSending:
+ if atomic.Cas(&sig.state, sigSending, sigIdle) {
+ break Receive
+ }
+ }
+ }
+
+ // Incorporate updates from sender into local copy.
+ for i := range sig.mask {
+ sig.recv[i] = atomic.Xchg(&sig.mask[i], 0)
+ }
+ }
+}
+
+// signalWaitUntilIdle waits until the signal delivery mechanism is idle.
+// This is used to ensure that we do not drop a signal notification due
+// to a race between disabling a signal and receiving a signal.
+// This assumes that signal delivery has already been disabled for
+// the signal(s) in question, and here we are just waiting to make sure
+// that all the signals have been delivered to the user channels
+// by the os/signal package.
+//go:linkname signalWaitUntilIdle os/signal.signalWaitUntilIdle
+func signalWaitUntilIdle() {
+ // Although the signals we care about have been removed from
+ // sig.wanted, it is possible that another thread has received
+ // a signal, has read from sig.wanted, is now updating sig.mask,
+ // and has not yet woken up the processor thread. We need to wait
+ // until all current signal deliveries have completed.
+ for atomic.Load(&sig.delivering) != 0 {
+ Gosched()
+ }
+
+ // Although WaitUntilIdle seems like the right name for this
+ // function, the state we are looking for is sigReceiving, not
+ // sigIdle. The sigIdle state is really more like sigProcessing.
+ for atomic.Load(&sig.state) != sigReceiving {
+ Gosched()
+ }
+}
+
+// Must only be called from a single goroutine at a time.
+//go:linkname signal_enable os/signal.signal_enable
+func signal_enable(s uint32) {
+ if !sig.inuse {
+ // This is the first call to signal_enable. Initialize.
+ sig.inuse = true // enable reception of signals; cannot disable
+ if GOOS == "darwin" || GOOS == "ios" {
+ sigNoteSetup(&sig.note)
+ } else {
+ noteclear(&sig.note)
+ }
+ }
+
+ if s >= uint32(len(sig.wanted)*32) {
+ return
+ }
+
+ w := sig.wanted[s/32]
+ w |= 1 << (s & 31)
+ atomic.Store(&sig.wanted[s/32], w)
+
+ i := sig.ignored[s/32]
+ i &^= 1 << (s & 31)
+ atomic.Store(&sig.ignored[s/32], i)
+
+ sigenable(s)
+}
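+
+// The masks above pack one bit per signal into uint32 words: word s/32,
+// bit 1<<(s&31). A worked example for s = 34 (wanted stands for any of the
+// bit arrays here):
+//
+//	s := uint32(34)
+//	word, bit := s/32, uint32(1)<<(s&31) // word 1, bit 1<<2 == 4
+//	wanted[word] |= bit                  // marks signal 34 as wanted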
+
+// Must only be called from a single goroutine at a time.
+//go:linkname signal_disable os/signal.signal_disable
+func signal_disable(s uint32) {
+ if s >= uint32(len(sig.wanted)*32) {
+ return
+ }
+ sigdisable(s)
+
+ w := sig.wanted[s/32]
+ w &^= 1 << (s & 31)
+ atomic.Store(&sig.wanted[s/32], w)
+}
+
+// Must only be called from a single goroutine at a time.
+//go:linkname signal_ignore os/signal.signal_ignore
+func signal_ignore(s uint32) {
+ if s >= uint32(len(sig.wanted)*32) {
+ return
+ }
+ sigignore(s)
+
+ w := sig.wanted[s/32]
+ w &^= 1 << (s & 31)
+ atomic.Store(&sig.wanted[s/32], w)
+
+ i := sig.ignored[s/32]
+ i |= 1 << (s & 31)
+ atomic.Store(&sig.ignored[s/32], i)
+}
+
+// sigInitIgnored marks the signal as already ignored. This is called at
+// program start by initsig. In a shared library initsig is called by
+// libpreinit, so the runtime may not be initialized yet.
+//go:nosplit
+func sigInitIgnored(s uint32) {
+ i := sig.ignored[s/32]
+ i |= 1 << (s & 31)
+ atomic.Store(&sig.ignored[s/32], i)
+}
+
+// Checked by signal handlers.
+//go:linkname signal_ignored os/signal.signal_ignored
+func signal_ignored(s uint32) bool {
+ i := atomic.Load(&sig.ignored[s/32])
+ return i&(1<<(s&31)) != 0
+}
diff --git a/src/runtime/sigqueue_note.go b/src/runtime/sigqueue_note.go
new file mode 100644
index 0000000..16aeeb2
--- /dev/null
+++ b/src/runtime/sigqueue_note.go
@@ -0,0 +1,25 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The current implementation of notes on Darwin is not async-signal-safe,
+// so on Darwin the sigqueue code uses different functions to wake up the
+// signal_recv thread. This file holds the non-Darwin implementations of
+// those functions. These functions will never be called.
+
+// +build !darwin
+// +build !plan9
+
+package runtime
+
+func sigNoteSetup(*note) {
+ throw("sigNoteSetup")
+}
+
+func sigNoteSleep(*note) {
+ throw("sigNoteSleep")
+}
+
+func sigNoteWakeup(*note) {
+ throw("sigNoteWakeup")
+}
diff --git a/src/runtime/sigqueue_plan9.go b/src/runtime/sigqueue_plan9.go
new file mode 100644
index 0000000..aebd206
--- /dev/null
+++ b/src/runtime/sigqueue_plan9.go
@@ -0,0 +1,163 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file implements runtime support for signal handling.
+
+package runtime
+
+import _ "unsafe"
+
+const qsize = 64
+
+var sig struct {
+ q noteQueue
+ inuse bool
+
+ lock mutex
+ note note
+ sleeping bool
+}
+
+type noteData struct {
+ s [_ERRMAX]byte
+ n int // n bytes of s are valid
+}
+
+type noteQueue struct {
+ lock mutex
+ data [qsize]noteData
+ ri int
+ wi int
+ full bool
+}
+
+// It is not allowed to allocate memory in the signal handler.
+func (q *noteQueue) push(item *byte) bool {
+ lock(&q.lock)
+ if q.full {
+ unlock(&q.lock)
+ return false
+ }
+ s := gostringnocopy(item)
+ copy(q.data[q.wi].s[:], s)
+ q.data[q.wi].n = len(s)
+ q.wi++
+ if q.wi == qsize {
+ q.wi = 0
+ }
+ if q.wi == q.ri {
+ q.full = true
+ }
+ unlock(&q.lock)
+ return true
+}
+
+func (q *noteQueue) pop() string {
+ lock(&q.lock)
+ q.full = false
+ if q.ri == q.wi {
+ unlock(&q.lock)
+ return ""
+ }
+ note := &q.data[q.ri]
+ item := string(note.s[:note.n])
+ q.ri++
+ if q.ri == qsize {
+ q.ri = 0
+ }
+ unlock(&q.lock)
+ return item
+}
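+
+// A brief usage sketch of the ring buffer above (the note text is
+// illustrative): push copies a NUL-terminated string into the next slot and
+// fails once qsize entries are pending; pop drains them in FIFO order and
+// returns "" when the queue is empty.
+//
+//	msg := []byte("hangup\x00")
+//	ok := sig.q.push(&msg[0]) // false if 64 notes are already queued
+//	s := sig.q.pop()          // "hangup", or "" once the queue is empty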
+
+// Called from sighandler to send a signal back out of the signal handling thread.
+// Reports whether the signal was sent. If not, the caller typically crashes the program.
+func sendNote(s *byte) bool {
+ if !sig.inuse {
+ return false
+ }
+
+ // Add signal to outgoing queue.
+ if !sig.q.push(s) {
+ return false
+ }
+
+ lock(&sig.lock)
+ if sig.sleeping {
+ sig.sleeping = false
+ notewakeup(&sig.note)
+ }
+ unlock(&sig.lock)
+
+ return true
+}
+
+// sigRecvPrepareForFixup is a no-op on plan9. (This would only be
+// called while GC is disabled.)
+//
+//go:nosplit
+func sigRecvPrepareForFixup() {
+}
+
+// Called to receive the next queued signal.
+// Must only be called from a single goroutine at a time.
+//go:linkname signal_recv os/signal.signal_recv
+func signal_recv() string {
+ for {
+ note := sig.q.pop()
+ if note != "" {
+ return note
+ }
+
+ lock(&sig.lock)
+ sig.sleeping = true
+ noteclear(&sig.note)
+ unlock(&sig.lock)
+ notetsleepg(&sig.note, -1)
+ }
+}
+
+// signalWaitUntilIdle waits until the signal delivery mechanism is idle.
+// This is used to ensure that we do not drop a signal notification due
+// to a race between disabling a signal and receiving a signal.
+// This assumes that signal delivery has already been disabled for
+// the signal(s) in question, and here we are just waiting to make sure
+// that all the signals have been delivered to the user channels
+// by the os/signal package.
+//go:linkname signalWaitUntilIdle os/signal.signalWaitUntilIdle
+func signalWaitUntilIdle() {
+ for {
+ lock(&sig.lock)
+ sleeping := sig.sleeping
+ unlock(&sig.lock)
+ if sleeping {
+ return
+ }
+ Gosched()
+ }
+}
+
+// Must only be called from a single goroutine at a time.
+//go:linkname signal_enable os/signal.signal_enable
+func signal_enable(s uint32) {
+ if !sig.inuse {
+ // This is the first call to signal_enable. Initialize.
+ sig.inuse = true // enable reception of signals; cannot disable
+ noteclear(&sig.note)
+ }
+}
+
+// Must only be called from a single goroutine at a time.
+//go:linkname signal_disable os/signal.signal_disable
+func signal_disable(s uint32) {
+}
+
+// Must only be called from a single goroutine at a time.
+//go:linkname signal_ignore os/signal.signal_ignore
+func signal_ignore(s uint32) {
+}
+
+//go:linkname signal_ignored os/signal.signal_ignored
+func signal_ignored(s uint32) bool {
+ return false
+}
diff --git a/src/runtime/sigtab_aix.go b/src/runtime/sigtab_aix.go
new file mode 100644
index 0000000..42e5606
--- /dev/null
+++ b/src/runtime/sigtab_aix.go
@@ -0,0 +1,264 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ 0: {0, "SIGNONE: no trap"},
+ _SIGHUP: {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ _SIGINT: {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ _SIGQUIT: {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ _SIGILL: {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ _SIGTRAP: {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ _SIGABRT: {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ _SIGBUS: {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ _SIGFPE: {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ _SIGKILL: {0, "SIGKILL: kill"},
+ _SIGUSR1: {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ _SIGSEGV: {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ _SIGUSR2: {_SigNotify, "SIGUSR2: user-defined signal 2"},
+ _SIGPIPE: {_SigNotify, "SIGPIPE: write to broken pipe"},
+ _SIGALRM: {_SigNotify, "SIGALRM: alarm clock"},
+ _SIGTERM: {_SigNotify + _SigKill, "SIGTERM: termination"},
+ _SIGCHLD: {_SigNotify + _SigUnblock, "SIGCHLD: child status has changed"},
+ _SIGCONT: {_SigNotify + _SigDefault, "SIGCONT: continue"},
+ _SIGSTOP: {0, "SIGSTOP: stop"},
+ _SIGTSTP: {_SigNotify + _SigDefault, "SIGTSTP: keyboard stop"},
+ _SIGTTIN: {_SigNotify + _SigDefault, "SIGTTIN: background read from tty"},
+ _SIGTTOU: {_SigNotify + _SigDefault, "SIGTTOU: background write to tty"},
+ _SIGURG: {_SigNotify, "SIGURG: urgent condition on socket"},
+ _SIGXCPU: {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ _SIGXFSZ: {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ _SIGVTALRM: {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ _SIGPROF: {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ _SIGWINCH: {_SigNotify, "SIGWINCH: window size change"},
+ _SIGSYS: {_SigThrow, "SIGSYS: bad system call"},
+ _SIGIO: {_SigNotify, "SIGIO: i/o now possible"},
+ _SIGPWR: {_SigNotify, "SIGPWR: power failure restart"},
+ _SIGEMT: {_SigThrow, "SIGEMT: emulate instruction executed"},
+ _SIGWAITING: {0, "SIGWAITING: reserved signal no longer used by"},
+ 26: {_SigNotify, "signal 26"},
+ 27: {_SigNotify, "signal 27"},
+ 33: {_SigNotify, "signal 33"},
+ 35: {_SigNotify, "signal 35"},
+ 36: {_SigNotify, "signal 36"},
+ 37: {_SigNotify, "signal 37"},
+ 38: {_SigNotify, "signal 38"},
+ 40: {_SigNotify, "signal 40"},
+ 41: {_SigNotify, "signal 41"},
+ 42: {_SigNotify, "signal 42"},
+ 43: {_SigNotify, "signal 43"},
+ 44: {_SigNotify, "signal 44"},
+ 45: {_SigNotify, "signal 45"},
+ 46: {_SigNotify, "signal 46"},
+ 47: {_SigNotify, "signal 47"},
+ 48: {_SigNotify, "signal 48"},
+ 49: {_SigNotify, "signal 49"},
+ 50: {_SigNotify, "signal 50"},
+ 51: {_SigNotify, "signal 51"},
+ 52: {_SigNotify, "signal 52"},
+ 53: {_SigNotify, "signal 53"},
+ 54: {_SigNotify, "signal 54"},
+ 55: {_SigNotify, "signal 55"},
+ 56: {_SigNotify, "signal 56"},
+ 57: {_SigNotify, "signal 57"},
+ 58: {_SigNotify, "signal 58"},
+ 59: {_SigNotify, "signal 59"},
+ 60: {_SigNotify, "signal 60"},
+ 61: {_SigNotify, "signal 61"},
+ 62: {_SigNotify, "signal 62"},
+ 63: {_SigNotify, "signal 63"},
+ 64: {_SigNotify, "signal 64"},
+ 65: {_SigNotify, "signal 65"},
+ 66: {_SigNotify, "signal 66"},
+ 67: {_SigNotify, "signal 67"},
+ 68: {_SigNotify, "signal 68"},
+ 69: {_SigNotify, "signal 69"},
+ 70: {_SigNotify, "signal 70"},
+ 71: {_SigNotify, "signal 71"},
+ 72: {_SigNotify, "signal 72"},
+ 73: {_SigNotify, "signal 73"},
+ 74: {_SigNotify, "signal 74"},
+ 75: {_SigNotify, "signal 75"},
+ 76: {_SigNotify, "signal 76"},
+ 77: {_SigNotify, "signal 77"},
+ 78: {_SigNotify, "signal 78"},
+ 79: {_SigNotify, "signal 79"},
+ 80: {_SigNotify, "signal 80"},
+ 81: {_SigNotify, "signal 81"},
+ 82: {_SigNotify, "signal 82"},
+ 83: {_SigNotify, "signal 83"},
+ 84: {_SigNotify, "signal 84"},
+ 85: {_SigNotify, "signal 85"},
+ 86: {_SigNotify, "signal 86"},
+ 87: {_SigNotify, "signal 87"},
+ 88: {_SigNotify, "signal 88"},
+ 89: {_SigNotify, "signal 89"},
+ 90: {_SigNotify, "signal 90"},
+ 91: {_SigNotify, "signal 91"},
+ 92: {_SigNotify, "signal 92"},
+ 93: {_SigNotify, "signal 93"},
+ 94: {_SigNotify, "signal 94"},
+ 95: {_SigNotify, "signal 95"},
+ 96: {_SigNotify, "signal 96"},
+ 97: {_SigNotify, "signal 97"},
+ 98: {_SigNotify, "signal 98"},
+ 99: {_SigNotify, "signal 99"},
+ 100: {_SigNotify, "signal 100"},
+ 101: {_SigNotify, "signal 101"},
+ 102: {_SigNotify, "signal 102"},
+ 103: {_SigNotify, "signal 103"},
+ 104: {_SigNotify, "signal 104"},
+ 105: {_SigNotify, "signal 105"},
+ 106: {_SigNotify, "signal 106"},
+ 107: {_SigNotify, "signal 107"},
+ 108: {_SigNotify, "signal 108"},
+ 109: {_SigNotify, "signal 109"},
+ 110: {_SigNotify, "signal 110"},
+ 111: {_SigNotify, "signal 111"},
+ 112: {_SigNotify, "signal 112"},
+ 113: {_SigNotify, "signal 113"},
+ 114: {_SigNotify, "signal 114"},
+ 115: {_SigNotify, "signal 115"},
+ 116: {_SigNotify, "signal 116"},
+ 117: {_SigNotify, "signal 117"},
+ 118: {_SigNotify, "signal 118"},
+ 119: {_SigNotify, "signal 119"},
+ 120: {_SigNotify, "signal 120"},
+ 121: {_SigNotify, "signal 121"},
+ 122: {_SigNotify, "signal 122"},
+ 123: {_SigNotify, "signal 123"},
+ 124: {_SigNotify, "signal 124"},
+ 125: {_SigNotify, "signal 125"},
+ 126: {_SigNotify, "signal 126"},
+ 127: {_SigNotify, "signal 127"},
+ 128: {_SigNotify, "signal 128"},
+ 129: {_SigNotify, "signal 129"},
+ 130: {_SigNotify, "signal 130"},
+ 131: {_SigNotify, "signal 131"},
+ 132: {_SigNotify, "signal 132"},
+ 133: {_SigNotify, "signal 133"},
+ 134: {_SigNotify, "signal 134"},
+ 135: {_SigNotify, "signal 135"},
+ 136: {_SigNotify, "signal 136"},
+ 137: {_SigNotify, "signal 137"},
+ 138: {_SigNotify, "signal 138"},
+ 139: {_SigNotify, "signal 139"},
+ 140: {_SigNotify, "signal 140"},
+ 141: {_SigNotify, "signal 141"},
+ 142: {_SigNotify, "signal 142"},
+ 143: {_SigNotify, "signal 143"},
+ 144: {_SigNotify, "signal 144"},
+ 145: {_SigNotify, "signal 145"},
+ 146: {_SigNotify, "signal 146"},
+ 147: {_SigNotify, "signal 147"},
+ 148: {_SigNotify, "signal 148"},
+ 149: {_SigNotify, "signal 149"},
+ 150: {_SigNotify, "signal 150"},
+ 151: {_SigNotify, "signal 151"},
+ 152: {_SigNotify, "signal 152"},
+ 153: {_SigNotify, "signal 153"},
+ 154: {_SigNotify, "signal 154"},
+ 155: {_SigNotify, "signal 155"},
+ 156: {_SigNotify, "signal 156"},
+ 157: {_SigNotify, "signal 157"},
+ 158: {_SigNotify, "signal 158"},
+ 159: {_SigNotify, "signal 159"},
+ 160: {_SigNotify, "signal 160"},
+ 161: {_SigNotify, "signal 161"},
+ 162: {_SigNotify, "signal 162"},
+ 163: {_SigNotify, "signal 163"},
+ 164: {_SigNotify, "signal 164"},
+ 165: {_SigNotify, "signal 165"},
+ 166: {_SigNotify, "signal 166"},
+ 167: {_SigNotify, "signal 167"},
+ 168: {_SigNotify, "signal 168"},
+ 169: {_SigNotify, "signal 169"},
+ 170: {_SigNotify, "signal 170"},
+ 171: {_SigNotify, "signal 171"},
+ 172: {_SigNotify, "signal 172"},
+ 173: {_SigNotify, "signal 173"},
+ 174: {_SigNotify, "signal 174"},
+ 175: {_SigNotify, "signal 175"},
+ 176: {_SigNotify, "signal 176"},
+ 177: {_SigNotify, "signal 177"},
+ 178: {_SigNotify, "signal 178"},
+ 179: {_SigNotify, "signal 179"},
+ 180: {_SigNotify, "signal 180"},
+ 181: {_SigNotify, "signal 181"},
+ 182: {_SigNotify, "signal 182"},
+ 183: {_SigNotify, "signal 183"},
+ 184: {_SigNotify, "signal 184"},
+ 185: {_SigNotify, "signal 185"},
+ 186: {_SigNotify, "signal 186"},
+ 187: {_SigNotify, "signal 187"},
+ 188: {_SigNotify, "signal 188"},
+ 189: {_SigNotify, "signal 189"},
+ 190: {_SigNotify, "signal 190"},
+ 191: {_SigNotify, "signal 191"},
+ 192: {_SigNotify, "signal 192"},
+ 193: {_SigNotify, "signal 193"},
+ 194: {_SigNotify, "signal 194"},
+ 195: {_SigNotify, "signal 195"},
+ 196: {_SigNotify, "signal 196"},
+ 197: {_SigNotify, "signal 197"},
+ 198: {_SigNotify, "signal 198"},
+ 199: {_SigNotify, "signal 199"},
+ 200: {_SigNotify, "signal 200"},
+ 201: {_SigNotify, "signal 201"},
+ 202: {_SigNotify, "signal 202"},
+ 203: {_SigNotify, "signal 203"},
+ 204: {_SigNotify, "signal 204"},
+ 205: {_SigNotify, "signal 205"},
+ 206: {_SigNotify, "signal 206"},
+ 207: {_SigNotify, "signal 207"},
+ 208: {_SigNotify, "signal 208"},
+ 209: {_SigNotify, "signal 209"},
+ 210: {_SigNotify, "signal 210"},
+ 211: {_SigNotify, "signal 211"},
+ 212: {_SigNotify, "signal 212"},
+ 213: {_SigNotify, "signal 213"},
+ 214: {_SigNotify, "signal 214"},
+ 215: {_SigNotify, "signal 215"},
+ 216: {_SigNotify, "signal 216"},
+ 217: {_SigNotify, "signal 217"},
+ 218: {_SigNotify, "signal 218"},
+ 219: {_SigNotify, "signal 219"},
+ 220: {_SigNotify, "signal 220"},
+ 221: {_SigNotify, "signal 221"},
+ 222: {_SigNotify, "signal 222"},
+ 223: {_SigNotify, "signal 223"},
+ 224: {_SigNotify, "signal 224"},
+ 225: {_SigNotify, "signal 225"},
+ 226: {_SigNotify, "signal 226"},
+ 227: {_SigNotify, "signal 227"},
+ 228: {_SigNotify, "signal 228"},
+ 229: {_SigNotify, "signal 229"},
+ 230: {_SigNotify, "signal 230"},
+ 231: {_SigNotify, "signal 231"},
+ 232: {_SigNotify, "signal 232"},
+ 233: {_SigNotify, "signal 233"},
+ 234: {_SigNotify, "signal 234"},
+ 235: {_SigNotify, "signal 235"},
+ 236: {_SigNotify, "signal 236"},
+ 237: {_SigNotify, "signal 237"},
+ 238: {_SigNotify, "signal 238"},
+ 239: {_SigNotify, "signal 239"},
+ 240: {_SigNotify, "signal 240"},
+ 241: {_SigNotify, "signal 241"},
+ 242: {_SigNotify, "signal 242"},
+ 243: {_SigNotify, "signal 243"},
+ 244: {_SigNotify, "signal 244"},
+ 245: {_SigNotify, "signal 245"},
+ 246: {_SigNotify, "signal 246"},
+ 247: {_SigNotify, "signal 247"},
+ 248: {_SigNotify, "signal 248"},
+ 249: {_SigNotify, "signal 249"},
+ 250: {_SigNotify, "signal 250"},
+ 251: {_SigNotify, "signal 251"},
+ 252: {_SigNotify, "signal 252"},
+ 253: {_SigNotify, "signal 253"},
+ 254: {_SigNotify, "signal 254"},
+ 255: {_SigNotify, "signal 255"},
+}
diff --git a/src/runtime/sigtab_linux_generic.go b/src/runtime/sigtab_linux_generic.go
new file mode 100644
index 0000000..38d6865
--- /dev/null
+++ b/src/runtime/sigtab_linux_generic.go
@@ -0,0 +1,79 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !mips
+// +build !mipsle
+// +build !mips64
+// +build !mips64le
+// +build linux
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ /* 7 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ /* 9 */ {0, "SIGKILL: kill"},
+ /* 10 */ {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigNotify, "SIGUSR2: user-defined signal 2"},
+ /* 13 */ {_SigNotify, "SIGPIPE: write to broken pipe"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: termination"},
+ /* 16 */ {_SigThrow + _SigUnblock, "SIGSTKFLT: stack fault"},
+ /* 17 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status has changed"},
+ /* 18 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: continue"},
+ /* 19 */ {0, "SIGSTOP: stop, unblockable"},
+ /* 20 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: keyboard stop"},
+ /* 21 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background read from tty"},
+ /* 22 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background write to tty"},
+ /* 23 */ {_SigNotify + _SigIgn, "SIGURG: urgent condition on socket"},
+ /* 24 */ {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ /* 25 */ {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ /* 26 */ {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ /* 27 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ /* 28 */ {_SigNotify + _SigIgn, "SIGWINCH: window size change"},
+ /* 29 */ {_SigNotify, "SIGIO: i/o now possible"},
+ /* 30 */ {_SigNotify, "SIGPWR: power failure restart"},
+ /* 31 */ {_SigThrow, "SIGSYS: bad system call"},
+ /* 32 */ {_SigSetStack + _SigUnblock, "signal 32"}, /* SIGCANCEL; see issue 6997 */
+ /* 33 */ {_SigSetStack + _SigUnblock, "signal 33"}, /* SIGSETXID; see issues 3871, 9400, 12498 */
+ /* 34 */ {_SigSetStack + _SigUnblock, "signal 34"}, /* musl SIGSYNCCALL; see issue 39343 */
+ /* 35 */ {_SigNotify, "signal 35"},
+ /* 36 */ {_SigNotify, "signal 36"},
+ /* 37 */ {_SigNotify, "signal 37"},
+ /* 38 */ {_SigNotify, "signal 38"},
+ /* 39 */ {_SigNotify, "signal 39"},
+ /* 40 */ {_SigNotify, "signal 40"},
+ /* 41 */ {_SigNotify, "signal 41"},
+ /* 42 */ {_SigNotify, "signal 42"},
+ /* 43 */ {_SigNotify, "signal 43"},
+ /* 44 */ {_SigNotify, "signal 44"},
+ /* 45 */ {_SigNotify, "signal 45"},
+ /* 46 */ {_SigNotify, "signal 46"},
+ /* 47 */ {_SigNotify, "signal 47"},
+ /* 48 */ {_SigNotify, "signal 48"},
+ /* 49 */ {_SigNotify, "signal 49"},
+ /* 50 */ {_SigNotify, "signal 50"},
+ /* 51 */ {_SigNotify, "signal 51"},
+ /* 52 */ {_SigNotify, "signal 52"},
+ /* 53 */ {_SigNotify, "signal 53"},
+ /* 54 */ {_SigNotify, "signal 54"},
+ /* 55 */ {_SigNotify, "signal 55"},
+ /* 56 */ {_SigNotify, "signal 56"},
+ /* 57 */ {_SigNotify, "signal 57"},
+ /* 58 */ {_SigNotify, "signal 58"},
+ /* 59 */ {_SigNotify, "signal 59"},
+ /* 60 */ {_SigNotify, "signal 60"},
+ /* 61 */ {_SigNotify, "signal 61"},
+ /* 62 */ {_SigNotify, "signal 62"},
+ /* 63 */ {_SigNotify, "signal 63"},
+ /* 64 */ {_SigNotify, "signal 64"},
+}
diff --git a/src/runtime/sigtab_linux_mipsx.go b/src/runtime/sigtab_linux_mipsx.go
new file mode 100644
index 0000000..51ef470
--- /dev/null
+++ b/src/runtime/sigtab_linux_mipsx.go
@@ -0,0 +1,140 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips mipsle mips64 mips64le
+// +build linux
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ /* 7 */ {_SigThrow, "SIGEMT"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ /* 9 */ {0, "SIGKILL: kill"},
+ /* 10 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigThrow, "SIGSYS: bad system call"},
+ /* 13 */ {_SigNotify, "SIGPIPE: write to broken pipe"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: termination"},
+ /* 16 */ {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ /* 17 */ {_SigNotify, "SIGUSR2: user-defined signal 2"},
+ /* 18 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status has changed"},
+ /* 19 */ {_SigNotify, "SIGPWR: power failure restart"},
+ /* 20 */ {_SigNotify + _SigIgn, "SIGWINCH: window size change"},
+ /* 21 */ {_SigNotify + _SigIgn, "SIGURG: urgent condition on socket"},
+ /* 22 */ {_SigNotify, "SIGIO: i/o now possible"},
+ /* 23 */ {0, "SIGSTOP: stop, unblockable"},
+ /* 24 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: keyboard stop"},
+ /* 25 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: continue"},
+ /* 26 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background read from tty"},
+ /* 27 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background write to tty"},
+ /* 28 */ {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ /* 29 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ /* 30 */ {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ /* 31 */ {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ /* 32 */ {_SigSetStack + _SigUnblock, "signal 32"}, /* SIGCANCEL; see issue 6997 */
+ /* 33 */ {_SigSetStack + _SigUnblock, "signal 33"}, /* SIGSETXID; see issues 3871, 9400, 12498 */
+ /* 34 */ {_SigSetStack + _SigUnblock, "signal 34"}, /* musl SIGSYNCCALL; see issue 39343 */
+ /* 35 */ {_SigNotify, "signal 35"},
+ /* 36 */ {_SigNotify, "signal 36"},
+ /* 37 */ {_SigNotify, "signal 37"},
+ /* 38 */ {_SigNotify, "signal 38"},
+ /* 39 */ {_SigNotify, "signal 39"},
+ /* 40 */ {_SigNotify, "signal 40"},
+ /* 41 */ {_SigNotify, "signal 41"},
+ /* 42 */ {_SigNotify, "signal 42"},
+ /* 43 */ {_SigNotify, "signal 43"},
+ /* 44 */ {_SigNotify, "signal 44"},
+ /* 45 */ {_SigNotify, "signal 45"},
+ /* 46 */ {_SigNotify, "signal 46"},
+ /* 47 */ {_SigNotify, "signal 47"},
+ /* 48 */ {_SigNotify, "signal 48"},
+ /* 49 */ {_SigNotify, "signal 49"},
+ /* 50 */ {_SigNotify, "signal 50"},
+ /* 51 */ {_SigNotify, "signal 51"},
+ /* 52 */ {_SigNotify, "signal 52"},
+ /* 53 */ {_SigNotify, "signal 53"},
+ /* 54 */ {_SigNotify, "signal 54"},
+ /* 55 */ {_SigNotify, "signal 55"},
+ /* 56 */ {_SigNotify, "signal 56"},
+ /* 57 */ {_SigNotify, "signal 57"},
+ /* 58 */ {_SigNotify, "signal 58"},
+ /* 59 */ {_SigNotify, "signal 59"},
+ /* 60 */ {_SigNotify, "signal 60"},
+ /* 61 */ {_SigNotify, "signal 61"},
+ /* 62 */ {_SigNotify, "signal 62"},
+ /* 63 */ {_SigNotify, "signal 63"},
+ /* 64 */ {_SigNotify, "signal 64"},
+ /* 65 */ {_SigNotify, "signal 65"},
+ /* 66 */ {_SigNotify, "signal 66"},
+ /* 67 */ {_SigNotify, "signal 67"},
+ /* 68 */ {_SigNotify, "signal 68"},
+ /* 69 */ {_SigNotify, "signal 69"},
+ /* 70 */ {_SigNotify, "signal 70"},
+ /* 71 */ {_SigNotify, "signal 71"},
+ /* 72 */ {_SigNotify, "signal 72"},
+ /* 73 */ {_SigNotify, "signal 73"},
+ /* 74 */ {_SigNotify, "signal 74"},
+ /* 75 */ {_SigNotify, "signal 75"},
+ /* 76 */ {_SigNotify, "signal 76"},
+ /* 77 */ {_SigNotify, "signal 77"},
+ /* 78 */ {_SigNotify, "signal 78"},
+ /* 79 */ {_SigNotify, "signal 79"},
+ /* 80 */ {_SigNotify, "signal 80"},
+ /* 81 */ {_SigNotify, "signal 81"},
+ /* 82 */ {_SigNotify, "signal 82"},
+ /* 83 */ {_SigNotify, "signal 83"},
+ /* 84 */ {_SigNotify, "signal 84"},
+ /* 85 */ {_SigNotify, "signal 85"},
+ /* 86 */ {_SigNotify, "signal 86"},
+ /* 87 */ {_SigNotify, "signal 87"},
+ /* 88 */ {_SigNotify, "signal 88"},
+ /* 89 */ {_SigNotify, "signal 89"},
+ /* 90 */ {_SigNotify, "signal 90"},
+ /* 91 */ {_SigNotify, "signal 91"},
+ /* 92 */ {_SigNotify, "signal 92"},
+ /* 93 */ {_SigNotify, "signal 93"},
+ /* 94 */ {_SigNotify, "signal 94"},
+ /* 95 */ {_SigNotify, "signal 95"},
+ /* 96 */ {_SigNotify, "signal 96"},
+ /* 97 */ {_SigNotify, "signal 97"},
+ /* 98 */ {_SigNotify, "signal 98"},
+ /* 99 */ {_SigNotify, "signal 99"},
+ /* 100 */ {_SigNotify, "signal 100"},
+ /* 101 */ {_SigNotify, "signal 101"},
+ /* 102 */ {_SigNotify, "signal 102"},
+ /* 103 */ {_SigNotify, "signal 103"},
+ /* 104 */ {_SigNotify, "signal 104"},
+ /* 105 */ {_SigNotify, "signal 105"},
+ /* 106 */ {_SigNotify, "signal 106"},
+ /* 107 */ {_SigNotify, "signal 107"},
+ /* 108 */ {_SigNotify, "signal 108"},
+ /* 109 */ {_SigNotify, "signal 109"},
+ /* 110 */ {_SigNotify, "signal 110"},
+ /* 111 */ {_SigNotify, "signal 111"},
+ /* 112 */ {_SigNotify, "signal 112"},
+ /* 113 */ {_SigNotify, "signal 113"},
+ /* 114 */ {_SigNotify, "signal 114"},
+ /* 115 */ {_SigNotify, "signal 115"},
+ /* 116 */ {_SigNotify, "signal 116"},
+ /* 117 */ {_SigNotify, "signal 117"},
+ /* 118 */ {_SigNotify, "signal 118"},
+ /* 119 */ {_SigNotify, "signal 119"},
+ /* 120 */ {_SigNotify, "signal 120"},
+ /* 121 */ {_SigNotify, "signal 121"},
+ /* 122 */ {_SigNotify, "signal 122"},
+ /* 123 */ {_SigNotify, "signal 123"},
+ /* 124 */ {_SigNotify, "signal 124"},
+ /* 125 */ {_SigNotify, "signal 125"},
+ /* 126 */ {_SigNotify, "signal 126"},
+ /* 127 */ {_SigNotify, "signal 127"},
+ /* 128 */ {_SigNotify, "signal 128"},
+}
diff --git a/src/runtime/sizeclasses.go b/src/runtime/sizeclasses.go
new file mode 100644
index 0000000..c5521ce
--- /dev/null
+++ b/src/runtime/sizeclasses.go
@@ -0,0 +1,96 @@
+// Code generated by mksizeclasses.go; DO NOT EDIT.
+//go:generate go run mksizeclasses.go
+
+package runtime
+
+// class bytes/obj bytes/span objects tail waste max waste
+// 1 8 8192 1024 0 87.50%
+// 2 16 8192 512 0 43.75%
+// 3 24 8192 341 8 29.24%
+// 4 32 8192 256 0 21.88%
+// 5 48 8192 170 32 31.52%
+// 6 64 8192 128 0 23.44%
+// 7 80 8192 102 32 19.07%
+// 8 96 8192 85 32 15.95%
+// 9 112 8192 73 16 13.56%
+// 10 128 8192 64 0 11.72%
+// 11 144 8192 56 128 11.82%
+// 12 160 8192 51 32 9.73%
+// 13 176 8192 46 96 9.59%
+// 14 192 8192 42 128 9.25%
+// 15 208 8192 39 80 8.12%
+// 16 224 8192 36 128 8.15%
+// 17 240 8192 34 32 6.62%
+// 18 256 8192 32 0 5.86%
+// 19 288 8192 28 128 12.16%
+// 20 320 8192 25 192 11.80%
+// 21 352 8192 23 96 9.88%
+// 22 384 8192 21 128 9.51%
+// 23 416 8192 19 288 10.71%
+// 24 448 8192 18 128 8.37%
+// 25 480 8192 17 32 6.82%
+// 26 512 8192 16 0 6.05%
+// 27 576 8192 14 128 12.33%
+// 28 640 8192 12 512 15.48%
+// 29 704 8192 11 448 13.93%
+// 30 768 8192 10 512 13.94%
+// 31 896 8192 9 128 15.52%
+// 32 1024 8192 8 0 12.40%
+// 33 1152 8192 7 128 12.41%
+// 34 1280 8192 6 512 15.55%
+// 35 1408 16384 11 896 14.00%
+// 36 1536 8192 5 512 14.00%
+// 37 1792 16384 9 256 15.57%
+// 38 2048 8192 4 0 12.45%
+// 39 2304 16384 7 256 12.46%
+// 40 2688 8192 3 128 15.59%
+// 41 3072 24576 8 0 12.47%
+// 42 3200 16384 5 384 6.22%
+// 43 3456 24576 7 384 8.83%
+// 44 4096 8192 2 0 15.60%
+// 45 4864 24576 5 256 16.65%
+// 46 5376 16384 3 256 10.92%
+// 47 6144 24576 4 0 12.48%
+// 48 6528 32768 5 128 6.23%
+// 49 6784 40960 6 256 4.36%
+// 50 6912 49152 7 768 3.37%
+// 51 8192 8192 1 0 15.61%
+// 52 9472 57344 6 512 14.28%
+// 53 9728 49152 5 512 3.64%
+// 54 10240 40960 4 0 4.99%
+// 55 10880 32768 3 128 6.24%
+// 56 12288 24576 2 0 11.45%
+// 57 13568 40960 3 256 9.99%
+// 58 14336 57344 4 0 5.35%
+// 59 16384 16384 1 0 12.49%
+// 60 18432 73728 4 0 11.11%
+// 61 19072 57344 3 128 3.57%
+// 62 20480 40960 2 0 6.87%
+// 63 21760 65536 3 256 6.25%
+// 64 24576 24576 1 0 11.45%
+// 65 27264 81920 3 128 10.00%
+// 66 28672 57344 2 0 4.91%
+// 67 32768 32768 1 0 12.50%
+
+const (
+ _MaxSmallSize = 32768
+ smallSizeDiv = 8
+ smallSizeMax = 1024
+ largeSizeDiv = 128
+ _NumSizeClasses = 68
+ _PageShift = 13
+)
+
+var class_to_size = [_NumSizeClasses]uint16{0, 8, 16, 24, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 352, 384, 416, 448, 480, 512, 576, 640, 704, 768, 896, 1024, 1152, 1280, 1408, 1536, 1792, 2048, 2304, 2688, 3072, 3200, 3456, 4096, 4864, 5376, 6144, 6528, 6784, 6912, 8192, 9472, 9728, 10240, 10880, 12288, 13568, 14336, 16384, 18432, 19072, 20480, 21760, 24576, 27264, 28672, 32768}
+var class_to_allocnpages = [_NumSizeClasses]uint8{0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 3, 2, 3, 1, 3, 2, 3, 4, 5, 6, 1, 7, 6, 5, 4, 3, 5, 7, 2, 9, 7, 5, 8, 3, 10, 7, 4}
+
+type divMagic struct {
+ shift uint8
+ shift2 uint8
+ mul uint16
+ baseMask uint16
+}
+
+var class_to_divmagic = [_NumSizeClasses]divMagic{{0, 0, 0, 0}, {3, 0, 1, 65528}, {4, 0, 1, 65520}, {3, 11, 683, 0}, {5, 0, 1, 65504}, {4, 11, 683, 0}, {6, 0, 1, 65472}, {4, 10, 205, 0}, {5, 9, 171, 0}, {4, 11, 293, 0}, {7, 0, 1, 65408}, {4, 13, 911, 0}, {5, 10, 205, 0}, {4, 12, 373, 0}, {6, 9, 171, 0}, {4, 13, 631, 0}, {5, 11, 293, 0}, {4, 13, 547, 0}, {8, 0, 1, 65280}, {5, 9, 57, 0}, {6, 9, 103, 0}, {5, 12, 373, 0}, {7, 7, 43, 0}, {5, 10, 79, 0}, {6, 10, 147, 0}, {5, 11, 137, 0}, {9, 0, 1, 65024}, {6, 9, 57, 0}, {7, 9, 103, 0}, {6, 11, 187, 0}, {8, 7, 43, 0}, {7, 8, 37, 0}, {10, 0, 1, 64512}, {7, 9, 57, 0}, {8, 6, 13, 0}, {7, 11, 187, 0}, {9, 5, 11, 0}, {8, 8, 37, 0}, {11, 0, 1, 63488}, {8, 9, 57, 0}, {7, 10, 49, 0}, {10, 5, 11, 0}, {7, 10, 41, 0}, {7, 9, 19, 0}, {12, 0, 1, 61440}, {8, 9, 27, 0}, {8, 10, 49, 0}, {11, 5, 11, 0}, {7, 13, 161, 0}, {7, 13, 155, 0}, {8, 9, 19, 0}, {13, 0, 1, 57344}, {8, 12, 111, 0}, {9, 9, 27, 0}, {11, 6, 13, 0}, {7, 14, 193, 0}, {12, 3, 3, 0}, {8, 13, 155, 0}, {11, 8, 37, 0}, {14, 0, 1, 49152}, {11, 8, 29, 0}, {7, 13, 55, 0}, {12, 5, 7, 0}, {8, 14, 193, 0}, {13, 3, 3, 0}, {7, 14, 77, 0}, {12, 7, 19, 0}, {15, 0, 1, 32768}}
+var size_to_class8 = [smallSizeMax/smallSizeDiv + 1]uint8{0, 1, 2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32}
+var size_to_class128 = [(_MaxSmallSize-smallSizeMax)/largeSizeDiv + 1]uint8{32, 33, 34, 35, 36, 37, 37, 38, 38, 39, 39, 40, 40, 40, 41, 41, 41, 42, 43, 43, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 48, 48, 48, 49, 49, 50, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 53, 53, 54, 54, 54, 54, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 58, 58, 58, 58, 58, 58, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 61, 61, 61, 61, 61, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67}
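Editor's note: the lookup arrays above are how the allocator rounds a requested size up to a size class. For sizes up to smallSizeMax it indexes size_to_class8 in steps of smallSizeDiv; for larger small objects it indexes size_to_class128 in steps of largeSizeDiv; class_to_size then gives the slot size. Below is a minimal standalone sketch of that lookup, using abbreviated copies of the tables; divRoundUp and the truncated arrays are local to the sketch, not part of the generated file.

package main

import "fmt"

// Abbreviated prefixes of the generated tables above, just enough for the example.
var classToSize = []uint16{0, 8, 16, 24, 32, 48}
var sizeToClass8 = []uint8{0, 1, 2, 3, 4, 5}

const smallSizeDiv = 8

func divRoundUp(n, a uintptr) uintptr { return (n + a - 1) / a }

func main() {
	// A 33-byte request indexes size_to_class8 at ceil(33/8) = 5,
	// which names class 5, whose slot size is 48 bytes.
	size := uintptr(33)
	class := sizeToClass8[divRoundUp(size, smallSizeDiv)]
	fmt.Println(class, classToSize[class]) // 5 48
}

With the full tables, a 2000-byte request would index size_to_class128 at ceil((2000-1024)/128) = 8, giving class 38 and a 2048-byte slot.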
diff --git a/src/runtime/sizeof_test.go b/src/runtime/sizeof_test.go
new file mode 100644
index 0000000..736e848
--- /dev/null
+++ b/src/runtime/sizeof_test.go
@@ -0,0 +1,38 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "reflect"
+ "runtime"
+ "testing"
+ "unsafe"
+)
+
+// Assert that the sizes of important structures do not change unexpectedly.
+
+func TestSizeof(t *testing.T) {
+ const _64bit = unsafe.Sizeof(uintptr(0)) == 8
+
+ var tests = []struct {
+ val interface{} // type as a value
+ _32bit uintptr // size on 32bit platforms
+ _64bit uintptr // size on 64bit platforms
+ }{
+ {runtime.G{}, 216, 376}, // g, but exported for testing
+ {runtime.Sudog{}, 56, 88}, // sudog, but exported for testing
+ }
+
+ for _, tt := range tests {
+ want := tt._32bit
+ if _64bit {
+ want = tt._64bit
+ }
+ got := reflect.TypeOf(tt.val).Size()
+ if want != got {
+ t.Errorf("unsafe.Sizeof(%T) = %d, want %d", tt.val, got, want)
+ }
+ }
+}
diff --git a/src/runtime/slice.go b/src/runtime/slice.go
new file mode 100644
index 0000000..c0647d9
--- /dev/null
+++ b/src/runtime/slice.go
@@ -0,0 +1,280 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/math"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type slice struct {
+ array unsafe.Pointer
+ len int
+ cap int
+}
+
+// A notInHeapSlice is a slice backed by go:notinheap memory.
+type notInHeapSlice struct {
+ array *notInHeap
+ len int
+ cap int
+}
+
+func panicmakeslicelen() {
+ panic(errorString("makeslice: len out of range"))
+}
+
+func panicmakeslicecap() {
+ panic(errorString("makeslice: cap out of range"))
+}
+
+// makeslicecopy allocates a slice of "tolen" elements of type "et",
+// then copies "fromlen" elements of type "et" into that new allocation from "from".
+func makeslicecopy(et *_type, tolen int, fromlen int, from unsafe.Pointer) unsafe.Pointer {
+ var tomem, copymem uintptr
+ if uintptr(tolen) > uintptr(fromlen) {
+ var overflow bool
+ tomem, overflow = math.MulUintptr(et.size, uintptr(tolen))
+ if overflow || tomem > maxAlloc || tolen < 0 {
+ panicmakeslicelen()
+ }
+ copymem = et.size * uintptr(fromlen)
+ } else {
+		// fromlen is a known good length that is equal to or greater than tolen,
+		// thereby making tolen a good slice length too, since the from and to
+		// slices have the same element width.
+ tomem = et.size * uintptr(tolen)
+ copymem = tomem
+ }
+
+ var to unsafe.Pointer
+ if et.ptrdata == 0 {
+ to = mallocgc(tomem, nil, false)
+ if copymem < tomem {
+ memclrNoHeapPointers(add(to, copymem), tomem-copymem)
+ }
+ } else {
+ // Note: can't use rawmem (which avoids zeroing of memory), because then GC can scan uninitialized memory.
+ to = mallocgc(tomem, et, true)
+ if copymem > 0 && writeBarrier.enabled {
+			// Only shade the pointers in from since we know the destination slice
+			// only contains nil pointers: it has been cleared during alloc.
+ bulkBarrierPreWriteSrcOnly(uintptr(to), uintptr(from), copymem)
+ }
+ }
+
+ if raceenabled {
+ callerpc := getcallerpc()
+ pc := funcPC(makeslicecopy)
+ racereadrangepc(from, copymem, callerpc, pc)
+ }
+ if msanenabled {
+ msanread(from, copymem)
+ }
+
+ memmove(to, from, copymem)
+
+ return to
+}
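Editor's note: at the Go level this helper backs the common allocate-and-copy idiom; recent compilers may lower a make followed immediately by a copy into a single makeslicecopy call (treat the exact lowering as an assumption; the observable semantics are simply make plus copy). A minimal sketch of the source-level pattern:

// cloneInts shows the source-level shape served by the helper above:
// allocate a fresh slice, then copy the existing elements into it.
func cloneInts(src []int) []int {
	dst := make([]int, len(src)) // here tolen == fromlen
	copy(dst, src)
	return dst
}

The append([]int(nil), src...) spelling reaches the same result through growslice below instead.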
+
+func makeslice(et *_type, len, cap int) unsafe.Pointer {
+ mem, overflow := math.MulUintptr(et.size, uintptr(cap))
+ if overflow || mem > maxAlloc || len < 0 || len > cap {
+ // NOTE: Produce a 'len out of range' error instead of a
+ // 'cap out of range' error when someone does make([]T, bignumber).
+ // 'cap out of range' is true too, but since the cap is only being
+ // supplied implicitly, saying len is clearer.
+ // See golang.org/issue/4085.
+ mem, overflow := math.MulUintptr(et.size, uintptr(len))
+ if overflow || mem > maxAlloc || len < 0 {
+ panicmakeslicelen()
+ }
+ panicmakeslicecap()
+ }
+
+ return mallocgc(mem, et, true)
+}
+
+func makeslice64(et *_type, len64, cap64 int64) unsafe.Pointer {
+ len := int(len64)
+ if int64(len) != len64 {
+ panicmakeslicelen()
+ }
+
+ cap := int(cap64)
+ if int64(cap) != cap64 {
+ panicmakeslicecap()
+ }
+
+ return makeslice(et, len, cap)
+}
+
+// growslice handles slice growth during append.
+// It is passed the slice element type, the old slice, and the desired new minimum capacity,
+// and it returns a new slice with at least that capacity, with the old data
+// copied into it.
+// The new slice's length is set to the old slice's length,
+// NOT to the new requested capacity.
+// This is for codegen convenience. The old slice's length is used immediately
+// to calculate where to write new values during an append.
+// TODO: When the old backend is gone, reconsider this decision.
+// The SSA backend might prefer the new length or to return only ptr/cap and save stack space.
+func growslice(et *_type, old slice, cap int) slice {
+ if raceenabled {
+ callerpc := getcallerpc()
+ racereadrangepc(old.array, uintptr(old.len*int(et.size)), callerpc, funcPC(growslice))
+ }
+ if msanenabled {
+ msanread(old.array, uintptr(old.len*int(et.size)))
+ }
+
+ if cap < old.cap {
+ panic(errorString("growslice: cap out of range"))
+ }
+
+ if et.size == 0 {
+ // append should not create a slice with nil pointer but non-zero len.
+ // We assume that append doesn't need to preserve old.array in this case.
+ return slice{unsafe.Pointer(&zerobase), old.len, cap}
+ }
+
+ newcap := old.cap
+ doublecap := newcap + newcap
+ if cap > doublecap {
+ newcap = cap
+ } else {
+ if old.cap < 1024 {
+ newcap = doublecap
+ } else {
+ // Check 0 < newcap to detect overflow
+ // and prevent an infinite loop.
+ for 0 < newcap && newcap < cap {
+ newcap += newcap / 4
+ }
+ // Set newcap to the requested cap when
+ // the newcap calculation overflowed.
+ if newcap <= 0 {
+ newcap = cap
+ }
+ }
+ }
+
+ var overflow bool
+ var lenmem, newlenmem, capmem uintptr
+ // Specialize for common values of et.size.
+ // For 1 we don't need any division/multiplication.
+ // For sys.PtrSize, compiler will optimize division/multiplication into a shift by a constant.
+ // For powers of 2, use a variable shift.
+ switch {
+ case et.size == 1:
+ lenmem = uintptr(old.len)
+ newlenmem = uintptr(cap)
+ capmem = roundupsize(uintptr(newcap))
+ overflow = uintptr(newcap) > maxAlloc
+ newcap = int(capmem)
+ case et.size == sys.PtrSize:
+ lenmem = uintptr(old.len) * sys.PtrSize
+ newlenmem = uintptr(cap) * sys.PtrSize
+ capmem = roundupsize(uintptr(newcap) * sys.PtrSize)
+ overflow = uintptr(newcap) > maxAlloc/sys.PtrSize
+ newcap = int(capmem / sys.PtrSize)
+ case isPowerOfTwo(et.size):
+ var shift uintptr
+ if sys.PtrSize == 8 {
+ // Mask shift for better code generation.
+ shift = uintptr(sys.Ctz64(uint64(et.size))) & 63
+ } else {
+ shift = uintptr(sys.Ctz32(uint32(et.size))) & 31
+ }
+ lenmem = uintptr(old.len) << shift
+ newlenmem = uintptr(cap) << shift
+ capmem = roundupsize(uintptr(newcap) << shift)
+ overflow = uintptr(newcap) > (maxAlloc >> shift)
+ newcap = int(capmem >> shift)
+ default:
+ lenmem = uintptr(old.len) * et.size
+ newlenmem = uintptr(cap) * et.size
+ capmem, overflow = math.MulUintptr(et.size, uintptr(newcap))
+ capmem = roundupsize(capmem)
+ newcap = int(capmem / et.size)
+ }
+
+ // The check of overflow in addition to capmem > maxAlloc is needed
+ // to prevent an overflow which can be used to trigger a segfault
+ // on 32bit architectures with this example program:
+ //
+ // type T [1<<27 + 1]int64
+ //
+ // var d T
+ // var s []T
+ //
+ // func main() {
+ // s = append(s, d, d, d, d)
+ // print(len(s), "\n")
+ // }
+ if overflow || capmem > maxAlloc {
+ panic(errorString("growslice: cap out of range"))
+ }
+
+ var p unsafe.Pointer
+ if et.ptrdata == 0 {
+ p = mallocgc(capmem, nil, false)
+ // The append() that calls growslice is going to overwrite from old.len to cap (which will be the new length).
+ // Only clear the part that will not be overwritten.
+ memclrNoHeapPointers(add(p, newlenmem), capmem-newlenmem)
+ } else {
+ // Note: can't use rawmem (which avoids zeroing of memory), because then GC can scan uninitialized memory.
+ p = mallocgc(capmem, et, true)
+ if lenmem > 0 && writeBarrier.enabled {
+ // Only shade the pointers in old.array since we know the destination slice p
+ // only contains nil pointers because it has been cleared during alloc.
+ bulkBarrierPreWriteSrcOnly(uintptr(p), uintptr(old.array), lenmem-et.size+et.ptrdata)
+ }
+ }
+ memmove(p, old.array, lenmem)
+
+ return slice{p, old.len, newcap}
+}
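Editor's note: the growth policy in growslice (double the capacity while the old capacity is below 1024 elements, then grow by roughly 25% per step, with the result rounded up to a size class by roundupsize) can be observed from ordinary Go code by watching cap() across appends. A small self-contained demonstration; the exact capacities depend on element size and size-class rounding, so the values hinted in the comments are typical rather than guaranteed.

package main

import "fmt"

func main() {
	var s []int
	prevCap := cap(s)
	for i := 0; i < 2000; i++ {
		s = append(s, i)
		if cap(s) != prevCap {
			// Capacity roughly doubles while the slice is small, then grows
			// by about 1.25x (plus size-class rounding) past 1024 elements.
			fmt.Printf("len=%d cap=%d\n", len(s), cap(s))
			prevCap = cap(s)
		}
	}
}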
+
+func isPowerOfTwo(x uintptr) bool {
+ return x&(x-1) == 0
+}
+
+// slicecopy is used to copy from a string or slice of pointerless elements into a slice.
+func slicecopy(toPtr unsafe.Pointer, toLen int, fromPtr unsafe.Pointer, fromLen int, width uintptr) int {
+ if fromLen == 0 || toLen == 0 {
+ return 0
+ }
+
+ n := fromLen
+ if toLen < n {
+ n = toLen
+ }
+
+ if width == 0 {
+ return n
+ }
+
+ size := uintptr(n) * width
+ if raceenabled {
+ callerpc := getcallerpc()
+ pc := funcPC(slicecopy)
+ racereadrangepc(fromPtr, size, callerpc, pc)
+ racewriterangepc(toPtr, size, callerpc, pc)
+ }
+ if msanenabled {
+ msanread(fromPtr, size)
+ msanwrite(toPtr, size)
+ }
+
+ if size == 1 { // common case worth about 2x to do here
+ // TODO: is this still worth it with new memmove impl?
+ *(*byte)(toPtr) = *(*byte)(fromPtr) // known to be a byte pointer
+ } else {
+ memmove(toPtr, fromPtr, size)
+ }
+ return n
+}
diff --git a/src/runtime/slice_test.go b/src/runtime/slice_test.go
new file mode 100644
index 0000000..cd2bc26
--- /dev/null
+++ b/src/runtime/slice_test.go
@@ -0,0 +1,501 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "testing"
+)
+
+const N = 20
+
+func BenchmarkMakeSliceCopy(b *testing.B) {
+ const length = 32
+ var bytes = make([]byte, 8*length)
+ var ints = make([]int, length)
+ var ptrs = make([]*byte, length)
+ b.Run("mallocmove", func(b *testing.B) {
+ b.Run("Byte", func(b *testing.B) {
+ var x []byte
+ for i := 0; i < b.N; i++ {
+ x = make([]byte, len(bytes))
+ copy(x, bytes)
+ }
+ })
+ b.Run("Int", func(b *testing.B) {
+ var x []int
+ for i := 0; i < b.N; i++ {
+ x = make([]int, len(ints))
+ copy(x, ints)
+ }
+ })
+ b.Run("Ptr", func(b *testing.B) {
+ var x []*byte
+ for i := 0; i < b.N; i++ {
+ x = make([]*byte, len(ptrs))
+ copy(x, ptrs)
+ }
+
+ })
+ })
+ b.Run("makecopy", func(b *testing.B) {
+ b.Run("Byte", func(b *testing.B) {
+ var x []byte
+ for i := 0; i < b.N; i++ {
+ x = make([]byte, 8*length)
+ copy(x, bytes)
+ }
+ })
+ b.Run("Int", func(b *testing.B) {
+ var x []int
+ for i := 0; i < b.N; i++ {
+ x = make([]int, length)
+ copy(x, ints)
+ }
+ })
+ b.Run("Ptr", func(b *testing.B) {
+ var x []*byte
+ for i := 0; i < b.N; i++ {
+ x = make([]*byte, length)
+ copy(x, ptrs)
+ }
+
+ })
+ })
+ b.Run("nilappend", func(b *testing.B) {
+ b.Run("Byte", func(b *testing.B) {
+ var x []byte
+ for i := 0; i < b.N; i++ {
+ x = append([]byte(nil), bytes...)
+ _ = x
+ }
+ })
+ b.Run("Int", func(b *testing.B) {
+ var x []int
+ for i := 0; i < b.N; i++ {
+ x = append([]int(nil), ints...)
+ _ = x
+ }
+ })
+ b.Run("Ptr", func(b *testing.B) {
+ var x []*byte
+ for i := 0; i < b.N; i++ {
+ x = append([]*byte(nil), ptrs...)
+ _ = x
+ }
+ })
+ })
+}
+
+type (
+ struct24 struct{ a, b, c int64 }
+ struct32 struct{ a, b, c, d int64 }
+ struct40 struct{ a, b, c, d, e int64 }
+)
+
+func BenchmarkMakeSlice(b *testing.B) {
+ const length = 2
+ b.Run("Byte", func(b *testing.B) {
+ var x []byte
+ for i := 0; i < b.N; i++ {
+ x = make([]byte, length, 2*length)
+ _ = x
+ }
+ })
+ b.Run("Int16", func(b *testing.B) {
+ var x []int16
+ for i := 0; i < b.N; i++ {
+ x = make([]int16, length, 2*length)
+ _ = x
+ }
+ })
+ b.Run("Int", func(b *testing.B) {
+ var x []int
+ for i := 0; i < b.N; i++ {
+ x = make([]int, length, 2*length)
+ _ = x
+ }
+ })
+ b.Run("Ptr", func(b *testing.B) {
+ var x []*byte
+ for i := 0; i < b.N; i++ {
+ x = make([]*byte, length, 2*length)
+ _ = x
+ }
+ })
+ b.Run("Struct", func(b *testing.B) {
+ b.Run("24", func(b *testing.B) {
+ var x []struct24
+ for i := 0; i < b.N; i++ {
+ x = make([]struct24, length, 2*length)
+ _ = x
+ }
+ })
+ b.Run("32", func(b *testing.B) {
+ var x []struct32
+ for i := 0; i < b.N; i++ {
+ x = make([]struct32, length, 2*length)
+ _ = x
+ }
+ })
+ b.Run("40", func(b *testing.B) {
+ var x []struct40
+ for i := 0; i < b.N; i++ {
+ x = make([]struct40, length, 2*length)
+ _ = x
+ }
+ })
+
+ })
+}
+
+func BenchmarkGrowSlice(b *testing.B) {
+ b.Run("Byte", func(b *testing.B) {
+ x := make([]byte, 9)
+ for i := 0; i < b.N; i++ {
+ _ = append([]byte(nil), x...)
+ }
+ })
+ b.Run("Int16", func(b *testing.B) {
+ x := make([]int16, 9)
+ for i := 0; i < b.N; i++ {
+ _ = append([]int16(nil), x...)
+ }
+ })
+ b.Run("Int", func(b *testing.B) {
+ x := make([]int, 9)
+ for i := 0; i < b.N; i++ {
+ _ = append([]int(nil), x...)
+ }
+ })
+ b.Run("Ptr", func(b *testing.B) {
+ x := make([]*byte, 9)
+ for i := 0; i < b.N; i++ {
+ _ = append([]*byte(nil), x...)
+ }
+ })
+ b.Run("Struct", func(b *testing.B) {
+ b.Run("24", func(b *testing.B) {
+ x := make([]struct24, 9)
+ for i := 0; i < b.N; i++ {
+ _ = append([]struct24(nil), x...)
+ }
+ })
+ b.Run("32", func(b *testing.B) {
+ x := make([]struct32, 9)
+ for i := 0; i < b.N; i++ {
+ _ = append([]struct32(nil), x...)
+ }
+ })
+ b.Run("40", func(b *testing.B) {
+ x := make([]struct40, 9)
+ for i := 0; i < b.N; i++ {
+ _ = append([]struct40(nil), x...)
+ }
+ })
+
+ })
+}
+
+var (
+ SinkIntSlice []int
+ SinkIntPointerSlice []*int
+)
+
+func BenchmarkExtendSlice(b *testing.B) {
+ var length = 4 // Use a variable to prevent stack allocation of slices.
+ b.Run("IntSlice", func(b *testing.B) {
+ s := make([]int, 0, length)
+ for i := 0; i < b.N; i++ {
+ s = append(s[:0:length/2], make([]int, length)...)
+ }
+ SinkIntSlice = s
+ })
+ b.Run("PointerSlice", func(b *testing.B) {
+ s := make([]*int, 0, length)
+ for i := 0; i < b.N; i++ {
+ s = append(s[:0:length/2], make([]*int, length)...)
+ }
+ SinkIntPointerSlice = s
+ })
+ b.Run("NoGrow", func(b *testing.B) {
+ s := make([]int, 0, length)
+ for i := 0; i < b.N; i++ {
+ s = append(s[:0:length], make([]int, length)...)
+ }
+ SinkIntSlice = s
+ })
+}
+
+func BenchmarkAppend(b *testing.B) {
+ b.StopTimer()
+ x := make([]int, 0, N)
+ b.StartTimer()
+ for i := 0; i < b.N; i++ {
+ x = x[0:0]
+ for j := 0; j < N; j++ {
+ x = append(x, j)
+ }
+ }
+}
+
+func BenchmarkAppendGrowByte(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x []byte
+ for j := 0; j < 1<<20; j++ {
+ x = append(x, byte(j))
+ }
+ }
+}
+
+func BenchmarkAppendGrowString(b *testing.B) {
+ var s string
+ for i := 0; i < b.N; i++ {
+ var x []string
+ for j := 0; j < 1<<20; j++ {
+ x = append(x, s)
+ }
+ }
+}
+
+func BenchmarkAppendSlice(b *testing.B) {
+ for _, length := range []int{1, 4, 7, 8, 15, 16, 32} {
+ b.Run(fmt.Sprint(length, "Bytes"), func(b *testing.B) {
+ x := make([]byte, 0, N)
+ y := make([]byte, length)
+ for i := 0; i < b.N; i++ {
+ x = x[0:0]
+ x = append(x, y...)
+ }
+ })
+ }
+}
+
+var (
+ blackhole []byte
+)
+
+func BenchmarkAppendSliceLarge(b *testing.B) {
+ for _, length := range []int{1 << 10, 4 << 10, 16 << 10, 64 << 10, 256 << 10, 1024 << 10} {
+ y := make([]byte, length)
+ b.Run(fmt.Sprint(length, "Bytes"), func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ blackhole = nil
+ blackhole = append(blackhole, y...)
+ }
+ })
+ }
+}
+
+func BenchmarkAppendStr(b *testing.B) {
+ for _, str := range []string{
+ "1",
+ "1234",
+ "12345678",
+ "1234567890123456",
+ "12345678901234567890123456789012",
+ } {
+ b.Run(fmt.Sprint(len(str), "Bytes"), func(b *testing.B) {
+ x := make([]byte, 0, N)
+ for i := 0; i < b.N; i++ {
+ x = x[0:0]
+ x = append(x, str...)
+ }
+ })
+ }
+}
+
+func BenchmarkAppendSpecialCase(b *testing.B) {
+ b.StopTimer()
+ x := make([]int, 0, N)
+ b.StartTimer()
+ for i := 0; i < b.N; i++ {
+ x = x[0:0]
+ for j := 0; j < N; j++ {
+ if len(x) < cap(x) {
+ x = x[:len(x)+1]
+ x[len(x)-1] = j
+ } else {
+ x = append(x, j)
+ }
+ }
+ }
+}
+
+var x []int
+
+func f() int {
+ x[:1][0] = 3
+ return 2
+}
+
+func TestSideEffectOrder(t *testing.T) {
+ x = make([]int, 0, 10)
+ x = append(x, 1, f())
+ if x[0] != 1 || x[1] != 2 {
+ t.Error("append failed: ", x[0], x[1])
+ }
+}
+
+func TestAppendOverlap(t *testing.T) {
+ x := []byte("1234")
+ x = append(x[1:], x...) // p > q in runtime·appendslice.
+ got := string(x)
+ want := "2341234"
+ if got != want {
+ t.Errorf("overlap failed: got %q want %q", got, want)
+ }
+}
+
+func BenchmarkCopy(b *testing.B) {
+ for _, l := range []int{1, 2, 4, 8, 12, 16, 32, 128, 1024} {
+ buf := make([]byte, 4096)
+ b.Run(fmt.Sprint(l, "Byte"), func(b *testing.B) {
+ s := make([]byte, l)
+ var n int
+ for i := 0; i < b.N; i++ {
+ n = copy(buf, s)
+ }
+ b.SetBytes(int64(n))
+ })
+ b.Run(fmt.Sprint(l, "String"), func(b *testing.B) {
+ s := string(make([]byte, l))
+ var n int
+ for i := 0; i < b.N; i++ {
+ n = copy(buf, s)
+ }
+ b.SetBytes(int64(n))
+ })
+ }
+}
+
+var (
+ sByte []byte
+ s1Ptr []uintptr
+ s2Ptr [][2]uintptr
+ s3Ptr [][3]uintptr
+ s4Ptr [][4]uintptr
+)
+
+// BenchmarkAppendInPlace tests the performance of append
+// when the result is being written back to the same slice.
+// In order for the in-place optimization to occur,
+// the slice must be referred to by address;
+// using a global is an easy way to trigger that.
+// We test the "grow" and "no grow" paths separately,
+// but not the "normal" (occasionally grow) path,
+// because it is a blend of the other two.
+// We use small numbers and small sizes in an attempt
+// to avoid benchmarking memory allocation and copying.
+// We use scalars instead of pointers in an attempt
+// to avoid benchmarking the write barriers.
+// We benchmark four common sizes (byte, pointer, string/interface, slice),
+// and one larger size.
+func BenchmarkAppendInPlace(b *testing.B) {
+ b.Run("NoGrow", func(b *testing.B) {
+ const C = 128
+
+ b.Run("Byte", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ sByte = make([]byte, C)
+ for j := 0; j < C; j++ {
+ sByte = append(sByte, 0x77)
+ }
+ }
+ })
+
+ b.Run("1Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s1Ptr = make([]uintptr, C)
+ for j := 0; j < C; j++ {
+ s1Ptr = append(s1Ptr, 0x77)
+ }
+ }
+ })
+
+ b.Run("2Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s2Ptr = make([][2]uintptr, C)
+ for j := 0; j < C; j++ {
+ s2Ptr = append(s2Ptr, [2]uintptr{0x77, 0x88})
+ }
+ }
+ })
+
+ b.Run("3Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s3Ptr = make([][3]uintptr, C)
+ for j := 0; j < C; j++ {
+ s3Ptr = append(s3Ptr, [3]uintptr{0x77, 0x88, 0x99})
+ }
+ }
+ })
+
+ b.Run("4Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s4Ptr = make([][4]uintptr, C)
+ for j := 0; j < C; j++ {
+ s4Ptr = append(s4Ptr, [4]uintptr{0x77, 0x88, 0x99, 0xAA})
+ }
+ }
+ })
+
+ })
+
+ b.Run("Grow", func(b *testing.B) {
+ const C = 5
+
+ b.Run("Byte", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ sByte = make([]byte, 0)
+ for j := 0; j < C; j++ {
+ sByte = append(sByte, 0x77)
+ sByte = sByte[:cap(sByte)]
+ }
+ }
+ })
+
+ b.Run("1Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s1Ptr = make([]uintptr, 0)
+ for j := 0; j < C; j++ {
+ s1Ptr = append(s1Ptr, 0x77)
+ s1Ptr = s1Ptr[:cap(s1Ptr)]
+ }
+ }
+ })
+
+ b.Run("2Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s2Ptr = make([][2]uintptr, 0)
+ for j := 0; j < C; j++ {
+ s2Ptr = append(s2Ptr, [2]uintptr{0x77, 0x88})
+ s2Ptr = s2Ptr[:cap(s2Ptr)]
+ }
+ }
+ })
+
+ b.Run("3Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s3Ptr = make([][3]uintptr, 0)
+ for j := 0; j < C; j++ {
+ s3Ptr = append(s3Ptr, [3]uintptr{0x77, 0x88, 0x99})
+ s3Ptr = s3Ptr[:cap(s3Ptr)]
+ }
+ }
+ })
+
+ b.Run("4Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s4Ptr = make([][4]uintptr, 0)
+ for j := 0; j < C; j++ {
+ s4Ptr = append(s4Ptr, [4]uintptr{0x77, 0x88, 0x99, 0xAA})
+ s4Ptr = s4Ptr[:cap(s4Ptr)]
+ }
+ }
+ })
+
+ })
+}
diff --git a/src/runtime/softfloat64.go b/src/runtime/softfloat64.go
new file mode 100644
index 0000000..13bee6c
--- /dev/null
+++ b/src/runtime/softfloat64.go
@@ -0,0 +1,597 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Software IEEE754 64-bit floating point.
+// Only referred to (and thus linked in) by arm port
+// and by tests in this directory.
+
+package runtime
+
+const (
+ mantbits64 uint = 52
+ expbits64 uint = 11
+ bias64 = -1<<(expbits64-1) + 1
+
+ nan64 uint64 = (1<<expbits64-1)<<mantbits64 + 1<<(mantbits64-1) // quiet NaN, 0 payload
+ inf64 uint64 = (1<<expbits64 - 1) << mantbits64
+ neg64 uint64 = 1 << (expbits64 + mantbits64)
+
+ mantbits32 uint = 23
+ expbits32 uint = 8
+ bias32 = -1<<(expbits32-1) + 1
+
+ nan32 uint32 = (1<<expbits32-1)<<mantbits32 + 1<<(mantbits32-1) // quiet NaN, 0 payload
+ inf32 uint32 = (1<<expbits32 - 1) << mantbits32
+ neg32 uint32 = 1 << (expbits32 + mantbits32)
+)
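Editor's note: the constants above spell out the IEEE 754 layouts: one sign bit, expbits of biased exponent, and mantbits of fraction with an implicit leading 1 for normal numbers. A standalone worked example that decodes 1.5 with math.Float64bits the same way funpack64 below does (the local variable names belong to the sketch, not to this file):

package main

import (
	"fmt"
	"math"
)

func main() {
	b := math.Float64bits(1.5) // 0x3FF8000000000000
	sign := b >> 63
	expField := (b >> 52) & (1<<11 - 1) // biased exponent; the bias is 1023
	frac := b & (1<<52 - 1)             // fraction without the implicit leading 1
	fmt.Printf("sign=%d exp=%d frac=%#x\n", sign, expField, frac) // sign=0 exp=1023 frac=0x8000000000000

	// Restore the implicit top bit and unbias, as funpack64 does,
	// then rebuild the value as mant * 2^(exp-52).
	mant := 1<<52 | frac
	exp := int(expField) - 1023
	fmt.Println(math.Ldexp(float64(mant), exp-52)) // 1.5
}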
+
+func funpack64(f uint64) (sign, mant uint64, exp int, inf, nan bool) {
+ sign = f & (1 << (mantbits64 + expbits64))
+ mant = f & (1<<mantbits64 - 1)
+ exp = int(f>>mantbits64) & (1<<expbits64 - 1)
+
+ switch exp {
+ case 1<<expbits64 - 1:
+ if mant != 0 {
+ nan = true
+ return
+ }
+ inf = true
+ return
+
+ case 0:
+ // denormalized
+ if mant != 0 {
+ exp += bias64 + 1
+ for mant < 1<<mantbits64 {
+ mant <<= 1
+ exp--
+ }
+ }
+
+ default:
+ // add implicit top bit
+ mant |= 1 << mantbits64
+ exp += bias64
+ }
+ return
+}
+
+func funpack32(f uint32) (sign, mant uint32, exp int, inf, nan bool) {
+ sign = f & (1 << (mantbits32 + expbits32))
+ mant = f & (1<<mantbits32 - 1)
+ exp = int(f>>mantbits32) & (1<<expbits32 - 1)
+
+ switch exp {
+ case 1<<expbits32 - 1:
+ if mant != 0 {
+ nan = true
+ return
+ }
+ inf = true
+ return
+
+ case 0:
+ // denormalized
+ if mant != 0 {
+ exp += bias32 + 1
+ for mant < 1<<mantbits32 {
+ mant <<= 1
+ exp--
+ }
+ }
+
+ default:
+ // add implicit top bit
+ mant |= 1 << mantbits32
+ exp += bias32
+ }
+ return
+}
+
+func fpack64(sign, mant uint64, exp int, trunc uint64) uint64 {
+ mant0, exp0, trunc0 := mant, exp, trunc
+ if mant == 0 {
+ return sign
+ }
+ for mant < 1<<mantbits64 {
+ mant <<= 1
+ exp--
+ }
+ for mant >= 4<<mantbits64 {
+ trunc |= mant & 1
+ mant >>= 1
+ exp++
+ }
+ if mant >= 2<<mantbits64 {
+ if mant&1 != 0 && (trunc != 0 || mant&2 != 0) {
+ mant++
+ if mant >= 4<<mantbits64 {
+ mant >>= 1
+ exp++
+ }
+ }
+ mant >>= 1
+ exp++
+ }
+ if exp >= 1<<expbits64-1+bias64 {
+ return sign ^ inf64
+ }
+ if exp < bias64+1 {
+ if exp < bias64-int(mantbits64) {
+ return sign | 0
+ }
+ // repeat expecting denormal
+ mant, exp, trunc = mant0, exp0, trunc0
+ for exp < bias64 {
+ trunc |= mant & 1
+ mant >>= 1
+ exp++
+ }
+ if mant&1 != 0 && (trunc != 0 || mant&2 != 0) {
+ mant++
+ }
+ mant >>= 1
+ exp++
+ if mant < 1<<mantbits64 {
+ return sign | mant
+ }
+ }
+ return sign | uint64(exp-bias64)<<mantbits64 | mant&(1<<mantbits64-1)
+}
+
+func fpack32(sign, mant uint32, exp int, trunc uint32) uint32 {
+ mant0, exp0, trunc0 := mant, exp, trunc
+ if mant == 0 {
+ return sign
+ }
+ for mant < 1<<mantbits32 {
+ mant <<= 1
+ exp--
+ }
+ for mant >= 4<<mantbits32 {
+ trunc |= mant & 1
+ mant >>= 1
+ exp++
+ }
+ if mant >= 2<<mantbits32 {
+ if mant&1 != 0 && (trunc != 0 || mant&2 != 0) {
+ mant++
+ if mant >= 4<<mantbits32 {
+ mant >>= 1
+ exp++
+ }
+ }
+ mant >>= 1
+ exp++
+ }
+ if exp >= 1<<expbits32-1+bias32 {
+ return sign ^ inf32
+ }
+ if exp < bias32+1 {
+ if exp < bias32-int(mantbits32) {
+ return sign | 0
+ }
+ // repeat expecting denormal
+ mant, exp, trunc = mant0, exp0, trunc0
+ for exp < bias32 {
+ trunc |= mant & 1
+ mant >>= 1
+ exp++
+ }
+ if mant&1 != 0 && (trunc != 0 || mant&2 != 0) {
+ mant++
+ }
+ mant >>= 1
+ exp++
+ if mant < 1<<mantbits32 {
+ return sign | mant
+ }
+ }
+ return sign | uint32(exp-bias32)<<mantbits32 | mant&(1<<mantbits32-1)
+}
+
+func fadd64(f, g uint64) uint64 {
+ fs, fm, fe, fi, fn := funpack64(f)
+ gs, gm, ge, gi, gn := funpack64(g)
+
+ // Special cases.
+ switch {
+ case fn || gn: // NaN + x or x + NaN = NaN
+ return nan64
+
+ case fi && gi && fs != gs: // +Inf + -Inf or -Inf + +Inf = NaN
+ return nan64
+
+ case fi: // ±Inf + g = ±Inf
+ return f
+
+ case gi: // f + ±Inf = ±Inf
+ return g
+
+ case fm == 0 && gm == 0 && fs != 0 && gs != 0: // -0 + -0 = -0
+ return f
+
+ case fm == 0: // 0 + g = g but 0 + -0 = +0
+ if gm == 0 {
+ g ^= gs
+ }
+ return g
+
+ case gm == 0: // f + 0 = f
+ return f
+
+ }
+
+ if fe < ge || fe == ge && fm < gm {
+ f, g, fs, fm, fe, gs, gm, ge = g, f, gs, gm, ge, fs, fm, fe
+ }
+
+ shift := uint(fe - ge)
+ fm <<= 2
+ gm <<= 2
+ trunc := gm & (1<<shift - 1)
+ gm >>= shift
+ if fs == gs {
+ fm += gm
+ } else {
+ fm -= gm
+ if trunc != 0 {
+ fm--
+ }
+ }
+ if fm == 0 {
+ fs = 0
+ }
+ return fpack64(fs, fm, fe-2, trunc)
+}
+
+func fsub64(f, g uint64) uint64 {
+ return fadd64(f, fneg64(g))
+}
+
+func fneg64(f uint64) uint64 {
+ return f ^ (1 << (mantbits64 + expbits64))
+}
+
+func fmul64(f, g uint64) uint64 {
+ fs, fm, fe, fi, fn := funpack64(f)
+ gs, gm, ge, gi, gn := funpack64(g)
+
+ // Special cases.
+ switch {
+ case fn || gn: // NaN * g or f * NaN = NaN
+ return nan64
+
+ case fi && gi: // Inf * Inf = Inf (with sign adjusted)
+ return f ^ gs
+
+ case fi && gm == 0, fm == 0 && gi: // 0 * Inf = Inf * 0 = NaN
+ return nan64
+
+ case fm == 0: // 0 * x = 0 (with sign adjusted)
+ return f ^ gs
+
+ case gm == 0: // x * 0 = 0 (with sign adjusted)
+ return g ^ fs
+ }
+
+ // 53-bit * 53-bit = 107- or 108-bit
+ lo, hi := mullu(fm, gm)
+ shift := mantbits64 - 1
+ trunc := lo & (1<<shift - 1)
+ mant := hi<<(64-shift) | lo>>shift
+ return fpack64(fs^gs, mant, fe+ge-1, trunc)
+}
+
+func fdiv64(f, g uint64) uint64 {
+ fs, fm, fe, fi, fn := funpack64(f)
+ gs, gm, ge, gi, gn := funpack64(g)
+
+ // Special cases.
+ switch {
+ case fn || gn: // NaN / g = f / NaN = NaN
+ return nan64
+
+ case fi && gi: // ±Inf / ±Inf = NaN
+ return nan64
+
+ case !fi && !gi && fm == 0 && gm == 0: // 0 / 0 = NaN
+ return nan64
+
+ case fi, !gi && gm == 0: // Inf / g = f / 0 = Inf
+ return fs ^ gs ^ inf64
+
+	case gi, fm == 0: // f / Inf = 0 / g = 0
+ return fs ^ gs ^ 0
+ }
+ _, _, _, _ = fi, fn, gi, gn
+
+ // 53-bit<<54 / 53-bit = 53- or 54-bit.
+ shift := mantbits64 + 2
+ q, r := divlu(fm>>(64-shift), fm<<shift, gm)
+ return fpack64(fs^gs, q, fe-ge-2, r)
+}
+
+func f64to32(f uint64) uint32 {
+ fs, fm, fe, fi, fn := funpack64(f)
+ if fn {
+ return nan32
+ }
+ fs32 := uint32(fs >> 32)
+ if fi {
+ return fs32 ^ inf32
+ }
+ const d = mantbits64 - mantbits32 - 1
+ return fpack32(fs32, uint32(fm>>d), fe-1, uint32(fm&(1<<d-1)))
+}
+
+func f32to64(f uint32) uint64 {
+ const d = mantbits64 - mantbits32
+ fs, fm, fe, fi, fn := funpack32(f)
+ if fn {
+ return nan64
+ }
+ fs64 := uint64(fs) << 32
+ if fi {
+ return fs64 ^ inf64
+ }
+ return fpack64(fs64, uint64(fm)<<d, fe, 0)
+}
+
+func fcmp64(f, g uint64) (cmp int32, isnan bool) {
+ fs, fm, _, fi, fn := funpack64(f)
+ gs, gm, _, gi, gn := funpack64(g)
+
+ switch {
+ case fn, gn: // flag NaN
+ return 0, true
+
+ case !fi && !gi && fm == 0 && gm == 0: // ±0 == ±0
+ return 0, false
+
+ case fs > gs: // f < 0, g > 0
+ return -1, false
+
+ case fs < gs: // f > 0, g < 0
+ return +1, false
+
+ // Same sign, not NaN.
+ // Can compare encodings directly now.
+ // Reverse for sign.
+ case fs == 0 && f < g, fs != 0 && f > g:
+ return -1, false
+
+ case fs == 0 && f > g, fs != 0 && f < g:
+ return +1, false
+ }
+
+ // f == g
+ return 0, false
+}
+
+func f64toint(f uint64) (val int64, ok bool) {
+ fs, fm, fe, fi, fn := funpack64(f)
+
+ switch {
+ case fi, fn: // NaN
+ return 0, false
+
+ case fe < -1: // f < 0.5
+ return 0, false
+
+ case fe > 63: // f >= 2^63
+ if fs != 0 && fm == 0 { // f == -2^63
+ return -1 << 63, true
+ }
+ if fs != 0 {
+ return 0, false
+ }
+ return 0, false
+ }
+
+ for fe > int(mantbits64) {
+ fe--
+ fm <<= 1
+ }
+ for fe < int(mantbits64) {
+ fe++
+ fm >>= 1
+ }
+ val = int64(fm)
+ if fs != 0 {
+ val = -val
+ }
+ return val, true
+}
+
+func fintto64(val int64) (f uint64) {
+ fs := uint64(val) & (1 << 63)
+ mant := uint64(val)
+ if fs != 0 {
+ mant = -mant
+ }
+ return fpack64(fs, mant, int(mantbits64), 0)
+}
+
+// 64x64 -> 128 multiply.
+// Adapted from Hacker's Delight.
+func mullu(u, v uint64) (lo, hi uint64) {
+ const (
+ s = 32
+ mask = 1<<s - 1
+ )
+ u0 := u & mask
+ u1 := u >> s
+ v0 := v & mask
+ v1 := v >> s
+ w0 := u0 * v0
+ t := u1*v0 + w0>>s
+ w1 := t & mask
+ w2 := t >> s
+ w1 += u0 * v1
+ return u * v, u1*v1 + w2 + w1>>s
+}
+
+// 128/64 -> 64 quotient, 64 remainder.
+// Adapted from Hacker's Delight.
+func divlu(u1, u0, v uint64) (q, r uint64) {
+ const b = 1 << 32
+
+ if u1 >= v {
+ return 1<<64 - 1, 1<<64 - 1
+ }
+
+ // s = nlz(v); v <<= s
+ s := uint(0)
+ for v&(1<<63) == 0 {
+ s++
+ v <<= 1
+ }
+
+ vn1 := v >> 32
+ vn0 := v & (1<<32 - 1)
+ un32 := u1<<s | u0>>(64-s)
+ un10 := u0 << s
+ un1 := un10 >> 32
+ un0 := un10 & (1<<32 - 1)
+ q1 := un32 / vn1
+ rhat := un32 - q1*vn1
+
+again1:
+ if q1 >= b || q1*vn0 > b*rhat+un1 {
+ q1--
+ rhat += vn1
+ if rhat < b {
+ goto again1
+ }
+ }
+
+ un21 := un32*b + un1 - q1*v
+ q0 := un21 / vn1
+ rhat = un21 - q0*vn1
+
+again2:
+ if q0 >= b || q0*vn0 > b*rhat+un0 {
+ q0--
+ rhat += vn1
+ if rhat < b {
+ goto again2
+ }
+ }
+
+ return q1*b + q0, (un21*b + un0 - q0*v) >> s
+}
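Editor's note: mullu and divlu reimplement a 64x64 to 128-bit multiply and a 128/64-bit divide in pure Go because this file cannot rely on hardware support. Outside the runtime, math/bits exposes the same primitives (bits.Mul64 and bits.Div64; note that mullu returns (lo, hi) while bits.Mul64 returns (hi, lo), and Div64 panics on overflow where divlu saturates). A hedged standalone cross-check:

package main

import (
	"fmt"
	"math/bits"
)

func main() {
	u, v := uint64(0xDEADBEEFCAFEBABE), uint64(0x1234567890ABCDEF)

	// 64x64 -> 128 multiply: the same product mullu computes,
	// returned as (hi, lo) rather than (lo, hi).
	hi, lo := bits.Mul64(u, v)
	fmt.Printf("hi=%#x lo=%#x\n", hi, lo)

	// 128/64 -> 64 divide with remainder: the same operation divlu performs.
	// Dividing hi:lo by v must give back u with no remainder.
	q, r := bits.Div64(hi, lo, v)
	fmt.Println(q == u, r == 0) // true true
}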
+
+func fadd32(x, y uint32) uint32 {
+ return f64to32(fadd64(f32to64(x), f32to64(y)))
+}
+
+func fmul32(x, y uint32) uint32 {
+ return f64to32(fmul64(f32to64(x), f32to64(y)))
+}
+
+func fdiv32(x, y uint32) uint32 {
+ return f64to32(fdiv64(f32to64(x), f32to64(y)))
+}
+
+func feq32(x, y uint32) bool {
+ cmp, nan := fcmp64(f32to64(x), f32to64(y))
+ return cmp == 0 && !nan
+}
+
+func fgt32(x, y uint32) bool {
+ cmp, nan := fcmp64(f32to64(x), f32to64(y))
+ return cmp >= 1 && !nan
+}
+
+func fge32(x, y uint32) bool {
+ cmp, nan := fcmp64(f32to64(x), f32to64(y))
+ return cmp >= 0 && !nan
+}
+
+func feq64(x, y uint64) bool {
+ cmp, nan := fcmp64(x, y)
+ return cmp == 0 && !nan
+}
+
+func fgt64(x, y uint64) bool {
+ cmp, nan := fcmp64(x, y)
+ return cmp >= 1 && !nan
+}
+
+func fge64(x, y uint64) bool {
+ cmp, nan := fcmp64(x, y)
+ return cmp >= 0 && !nan
+}
+
+func fint32to32(x int32) uint32 {
+ return f64to32(fintto64(int64(x)))
+}
+
+func fint32to64(x int32) uint64 {
+ return fintto64(int64(x))
+}
+
+func fint64to32(x int64) uint32 {
+ return f64to32(fintto64(x))
+}
+
+func fint64to64(x int64) uint64 {
+ return fintto64(x)
+}
+
+func f32toint32(x uint32) int32 {
+ val, _ := f64toint(f32to64(x))
+ return int32(val)
+}
+
+func f32toint64(x uint32) int64 {
+ val, _ := f64toint(f32to64(x))
+ return val
+}
+
+func f64toint32(x uint64) int32 {
+ val, _ := f64toint(x)
+ return int32(val)
+}
+
+func f64toint64(x uint64) int64 {
+ val, _ := f64toint(x)
+ return val
+}
+
+func f64touint64(x float64) uint64 {
+ if x < float64(1<<63) {
+ return uint64(int64(x))
+ }
+ y := x - float64(1<<63)
+ z := uint64(int64(y))
+ return z | (1 << 63)
+}
+
+func f32touint64(x float32) uint64 {
+ if x < float32(1<<63) {
+ return uint64(int64(x))
+ }
+ y := x - float32(1<<63)
+ z := uint64(int64(y))
+ return z | (1 << 63)
+}
+
+func fuint64to64(x uint64) float64 {
+ if int64(x) >= 0 {
+ return float64(int64(x))
+ }
+ // See ../cmd/compile/internal/gc/ssa.go:uint64Tofloat
+ y := x & 1
+ z := x >> 1
+ z = z | y
+ r := float64(int64(z))
+ return r + r
+}
+
+func fuint64to32(x uint64) float32 {
+ return float32(fuint64to64(x))
+}
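Editor's note: f64touint64 and fuint64to64 handle the top bit of a uint64 the way the compiler's own lowering does (see the ssa.go reference in the comment above): split off 2^63 on the way to uint64, and on the way back halve the value while keeping the dropped low bit sticky so rounding stays correct, convert as signed, then double. A standalone check that the trick agrees with Go's built-in conversion; the helper name is local to the sketch.

package main

import "fmt"

// uint64ToFloat64 mirrors fuint64to64 above: values with the top bit set are
// halved with a sticky low bit, converted via int64, then doubled.
func uint64ToFloat64(x uint64) float64 {
	if int64(x) >= 0 {
		return float64(int64(x))
	}
	z := x>>1 | x&1 // halve, keep the dropped bit so rounding is unchanged
	r := float64(int64(z))
	return r + r
}

func main() {
	for _, x := range []uint64{0, 1, 1 << 63, 1<<64 - 1, 0x8000000000000001, 0xFFFFFFFFFFFFF400} {
		fmt.Println(x, uint64ToFloat64(x) == float64(x)) // expect true for every value
	}
}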
diff --git a/src/runtime/softfloat64_test.go b/src/runtime/softfloat64_test.go
new file mode 100644
index 0000000..7347aff
--- /dev/null
+++ b/src/runtime/softfloat64_test.go
@@ -0,0 +1,198 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "math"
+ "math/rand"
+ . "runtime"
+ "testing"
+)
+
+// turn uint64 op into float64 op
+func fop(f func(x, y uint64) uint64) func(x, y float64) float64 {
+ return func(x, y float64) float64 {
+ bx := math.Float64bits(x)
+ by := math.Float64bits(y)
+ return math.Float64frombits(f(bx, by))
+ }
+}
+
+func add(x, y float64) float64 { return x + y }
+func sub(x, y float64) float64 { return x - y }
+func mul(x, y float64) float64 { return x * y }
+func div(x, y float64) float64 { return x / y }
+
+func TestFloat64(t *testing.T) {
+ base := []float64{
+ 0,
+ math.Copysign(0, -1),
+ -1,
+ 1,
+ math.NaN(),
+ math.Inf(+1),
+ math.Inf(-1),
+ 0.1,
+ 1.5,
+ 1.9999999999999998, // all 1s mantissa
+ 1.3333333333333333, // 1.010101010101...
+ 1.1428571428571428, // 1.001001001001...
+ 1.112536929253601e-308, // first normal
+ 2,
+ 4,
+ 8,
+ 16,
+ 32,
+ 64,
+ 128,
+ 256,
+ 3,
+ 12,
+ 1234,
+ 123456,
+ -0.1,
+ -1.5,
+ -1.9999999999999998,
+ -1.3333333333333333,
+ -1.1428571428571428,
+ -2,
+ -3,
+ 1e-200,
+ 1e-300,
+ 1e-310,
+ 5e-324,
+ 1e-105,
+ 1e-305,
+ 1e+200,
+ 1e+306,
+ 1e+307,
+ 1e+308,
+ }
+ all := make([]float64, 200)
+ copy(all, base)
+ for i := len(base); i < len(all); i++ {
+ all[i] = rand.NormFloat64()
+ }
+
+ test(t, "+", add, fop(Fadd64), all)
+ test(t, "-", sub, fop(Fsub64), all)
+ if GOARCH != "386" { // 386 is not precise!
+ test(t, "*", mul, fop(Fmul64), all)
+ test(t, "/", div, fop(Fdiv64), all)
+ }
+}
+
+// 64 -hw-> 32 -hw-> 64
+func trunc32(f float64) float64 {
+ return float64(float32(f))
+}
+
+// 64 -sw->32 -hw-> 64
+func to32sw(f float64) float64 {
+ return float64(math.Float32frombits(F64to32(math.Float64bits(f))))
+}
+
+// 64 -hw->32 -sw-> 64
+func to64sw(f float64) float64 {
+ return math.Float64frombits(F32to64(math.Float32bits(float32(f))))
+}
+
+// float64 -hw-> int64 -hw-> float64
+func hwint64(f float64) float64 {
+ return float64(int64(f))
+}
+
+// float64 -hw-> int32 -hw-> float64
+func hwint32(f float64) float64 {
+ return float64(int32(f))
+}
+
+// float64 -sw-> int64 -hw-> float64
+func toint64sw(f float64) float64 {
+ i, ok := F64toint(math.Float64bits(f))
+ if !ok {
+ // There's no right answer for out of range.
+ // Match the hardware to pass the test.
+ i = int64(f)
+ }
+ return float64(i)
+}
+
+// float64 -hw-> int64 -sw-> float64
+func fromint64sw(f float64) float64 {
+ return math.Float64frombits(Fintto64(int64(f)))
+}
+
+var nerr int
+
+func err(t *testing.T, format string, args ...interface{}) {
+ t.Errorf(format, args...)
+
+ // cut errors off after a while.
+ // otherwise we spend all our time
+ // allocating memory to hold the
+ // formatted output.
+ if nerr++; nerr >= 10 {
+ t.Fatal("too many errors")
+ }
+}
+
+func test(t *testing.T, op string, hw, sw func(float64, float64) float64, all []float64) {
+ for _, f := range all {
+ for _, g := range all {
+ h := hw(f, g)
+ s := sw(f, g)
+ if !same(h, s) {
+ err(t, "%g %s %g = sw %g, hw %g\n", f, op, g, s, h)
+ }
+ testu(t, "to32", trunc32, to32sw, h)
+ testu(t, "to64", trunc32, to64sw, h)
+ testu(t, "toint64", hwint64, toint64sw, h)
+ testu(t, "fromint64", hwint64, fromint64sw, h)
+ testcmp(t, f, h)
+ testcmp(t, h, f)
+ testcmp(t, g, h)
+ testcmp(t, h, g)
+ }
+ }
+}
+
+func testu(t *testing.T, op string, hw, sw func(float64) float64, v float64) {
+ h := hw(v)
+ s := sw(v)
+ if !same(h, s) {
+ err(t, "%s %g = sw %g, hw %g\n", op, v, s, h)
+ }
+}
+
+func hwcmp(f, g float64) (cmp int, isnan bool) {
+ switch {
+ case f < g:
+ return -1, false
+ case f > g:
+ return +1, false
+ case f == g:
+ return 0, false
+ }
+ return 0, true // must be NaN
+}
+
+func testcmp(t *testing.T, f, g float64) {
+ hcmp, hisnan := hwcmp(f, g)
+ scmp, sisnan := Fcmp64(math.Float64bits(f), math.Float64bits(g))
+ if int32(hcmp) != scmp || hisnan != sisnan {
+ err(t, "cmp(%g, %g) = sw %v, %v, hw %v, %v\n", f, g, scmp, sisnan, hcmp, hisnan)
+ }
+}
+
+func same(f, g float64) bool {
+ if math.IsNaN(f) && math.IsNaN(g) {
+ return true
+ }
+ if math.Copysign(1, f) != math.Copysign(1, g) {
+ return false
+ }
+ return f == g
+}
diff --git a/src/runtime/stack.go b/src/runtime/stack.go
new file mode 100644
index 0000000..7b9dce5
--- /dev/null
+++ b/src/runtime/stack.go
@@ -0,0 +1,1335 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/cpu"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+/*
+Stack layout parameters.
+Included both by runtime (compiled via 6c) and linkers (compiled via gcc).
+
+The per-goroutine g->stackguard is set to point StackGuard bytes
+above the bottom of the stack. Each function compares its stack
+pointer against g->stackguard to check for overflow. To cut one
+instruction from the check sequence for functions with tiny frames,
+the stack is allowed to protrude StackSmall bytes below the stack
+guard. Functions with large frames don't bother with the check and
+always call morestack. The sequences are (for amd64, others are
+similar):
+
+ guard = g->stackguard
+ frame = function's stack frame size
+ argsize = size of function arguments (call + return)
+
+ stack frame size <= StackSmall:
+ CMPQ guard, SP
+ JHI 3(PC)
+ MOVQ m->morearg, $(argsize << 32)
+ CALL morestack(SB)
+
+ stack frame size > StackSmall but < StackBig
+ LEAQ (frame-StackSmall)(SP), R0
+ CMPQ guard, R0
+ JHI 3(PC)
+ MOVQ m->morearg, $(argsize << 32)
+ CALL morestack(SB)
+
+ stack frame size >= StackBig:
+ MOVQ m->morearg, $((argsize << 32) | frame)
+ CALL morestack(SB)
+
+The bottom StackGuard - StackSmall bytes are important: there has
+to be enough room to execute functions that refuse to check for
+stack overflow, either because they need to be adjacent to the
+actual caller's frame (deferproc) or because they handle the imminent
+stack overflow (morestack).
+
+For example, deferproc might call malloc, which does one of the
+above checks (without allocating a full frame), which might trigger
+a call to morestack. This sequence needs to fit in the bottom
+section of the stack. On amd64, morestack's frame is 40 bytes, and
+deferproc's frame is 56 bytes. That fits well within the
+StackGuard - StackSmall bytes at the bottom.
+The linkers explore all possible call traces involving non-splitting
+functions to make sure that this limit cannot be violated.
+*/
+
+const (
+ // StackSystem is a number of additional bytes to add
+ // to each stack below the usual guard area for OS-specific
+ // purposes like signal handling. Used on Windows, Plan 9,
+ // and iOS because they do not use a separate stack.
+ _StackSystem = sys.GoosWindows*512*sys.PtrSize + sys.GoosPlan9*512 + sys.GoosIos*sys.GoarchArm64*1024
+
+ // The minimum size of stack used by Go code
+ _StackMin = 2048
+
+ // The minimum stack size to allocate.
+ // The hackery here rounds FixedStack0 up to a power of 2.
+ _FixedStack0 = _StackMin + _StackSystem
+ _FixedStack1 = _FixedStack0 - 1
+ _FixedStack2 = _FixedStack1 | (_FixedStack1 >> 1)
+ _FixedStack3 = _FixedStack2 | (_FixedStack2 >> 2)
+ _FixedStack4 = _FixedStack3 | (_FixedStack3 >> 4)
+ _FixedStack5 = _FixedStack4 | (_FixedStack4 >> 8)
+ _FixedStack6 = _FixedStack5 | (_FixedStack5 >> 16)
+ _FixedStack = _FixedStack6 + 1
+
+ // Functions that need frames bigger than this use an extra
+ // instruction to do the stack split check, to avoid overflow
+ // in case SP - framesize wraps below zero.
+ // This value can be no bigger than the size of the unmapped
+ // space at zero.
+ _StackBig = 4096
+
+ // The stack guard is a pointer this many bytes above the
+ // bottom of the stack.
+ _StackGuard = 928*sys.StackGuardMultiplier + _StackSystem
+
+ // After a stack split check the SP is allowed to be this
+ // many bytes below the stack guard. This saves an instruction
+ // in the checking sequence for tiny frames.
+ _StackSmall = 128
+
+ // The maximum number of bytes that a chain of NOSPLIT
+ // functions can use.
+ _StackLimit = _StackGuard - _StackSystem - _StackSmall
+)
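Editor's note: the _FixedStack0 through _FixedStack6 chain is the branch-free round-up-to-a-power-of-two idiom: subtract one, smear the highest set bit into every lower position, then add one back. The runtime needs the result as an untyped constant, which is why it is written out term by term; as a function the same computation looks like this (a standalone sketch, not runtime code).

package main

import "fmt"

// roundUpPow2 rounds n up to the next power of two using the same
// bit-smearing trick as the _FixedStack constants above.
func roundUpPow2(n uint32) uint32 {
	n--
	n |= n >> 1
	n |= n >> 2
	n |= n >> 4
	n |= n >> 8
	n |= n >> 16
	return n + 1
}

func main() {
	// With _StackSystem == 0, _FixedStack0 is 2048, already a power of two.
	fmt.Println(roundUpPow2(2048)) // 2048
	// With a nonzero _StackSystem (e.g. +512 on Plan 9) it rounds up.
	fmt.Println(roundUpPow2(2048 + 512)) // 4096
}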
+
+const (
+ // stackDebug == 0: no logging
+ // == 1: logging of per-stack operations
+ // == 2: logging of per-frame operations
+ // == 3: logging of per-word updates
+ // == 4: logging of per-word reads
+ stackDebug = 0
+ stackFromSystem = 0 // allocate stacks from system memory instead of the heap
+ stackFaultOnFree = 0 // old stacks are mapped noaccess to detect use after free
+ stackPoisonCopy = 0 // fill stack that should not be accessed with garbage, to detect bad dereferences during copy
+ stackNoCache = 0 // disable per-P small stack caches
+
+ // check the BP links during traceback.
+ debugCheckBP = false
+)
+
+const (
+ uintptrMask = 1<<(8*sys.PtrSize) - 1
+
+ // Goroutine preemption request.
+ // Stored into g->stackguard0 to cause split stack check failure.
+ // Must be greater than any real sp.
+ // 0xfffffade in hex.
+ stackPreempt = uintptrMask & -1314
+
+ // Thread is forking.
+ // Stored into g->stackguard0 to cause split stack check failure.
+ // Must be greater than any real sp.
+ stackFork = uintptrMask & -1234
+)
+
+// Global pool of spans that have free stacks.
+// Stacks are assigned an order according to size.
+// order = log_2(size/FixedStack)
+// There is a free list for each order.
+var stackpool [_NumStackOrders]struct {
+ item stackpoolItem
+ _ [cpu.CacheLinePadSize - unsafe.Sizeof(stackpoolItem{})%cpu.CacheLinePadSize]byte
+}
+
+//go:notinheap
+type stackpoolItem struct {
+ mu mutex
+ span mSpanList
+}
+
+// Global pool of large stack spans.
+var stackLarge struct {
+ lock mutex
+ free [heapAddrBits - pageShift]mSpanList // free lists by log_2(s.npages)
+}
+
+func stackinit() {
+ if _StackCacheSize&_PageMask != 0 {
+ throw("cache size must be a multiple of page size")
+ }
+ for i := range stackpool {
+ stackpool[i].item.span.init()
+ lockInit(&stackpool[i].item.mu, lockRankStackpool)
+ }
+ for i := range stackLarge.free {
+ stackLarge.free[i].init()
+ lockInit(&stackLarge.lock, lockRankStackLarge)
+ }
+}
+
+// stacklog2 returns ⌊log_2(n)⌋.
+func stacklog2(n uintptr) int {
+ log2 := 0
+ for n > 1 {
+ n >>= 1
+ log2++
+ }
+ return log2
+}
+
+// Allocates a stack from the free pool. Must be called with
+// stackpool[order].item.mu held.
+func stackpoolalloc(order uint8) gclinkptr {
+ list := &stackpool[order].item.span
+ s := list.first
+ lockWithRankMayAcquire(&mheap_.lock, lockRankMheap)
+ if s == nil {
+ // no free stacks. Allocate another span worth.
+ s = mheap_.allocManual(_StackCacheSize>>_PageShift, spanAllocStack)
+ if s == nil {
+ throw("out of memory")
+ }
+ if s.allocCount != 0 {
+ throw("bad allocCount")
+ }
+ if s.manualFreeList.ptr() != nil {
+ throw("bad manualFreeList")
+ }
+ osStackAlloc(s)
+ s.elemsize = _FixedStack << order
+ for i := uintptr(0); i < _StackCacheSize; i += s.elemsize {
+ x := gclinkptr(s.base() + i)
+ x.ptr().next = s.manualFreeList
+ s.manualFreeList = x
+ }
+ list.insert(s)
+ }
+ x := s.manualFreeList
+ if x.ptr() == nil {
+ throw("span has no free stacks")
+ }
+ s.manualFreeList = x.ptr().next
+ s.allocCount++
+ if s.manualFreeList.ptr() == nil {
+ // all stacks in s are allocated.
+ list.remove(s)
+ }
+ return x
+}
+
+// Adds stack x to the free pool. Must be called with stackpool[order].item.mu held.
+func stackpoolfree(x gclinkptr, order uint8) {
+ s := spanOfUnchecked(uintptr(x))
+ if s.state.get() != mSpanManual {
+ throw("freeing stack not in a stack span")
+ }
+ if s.manualFreeList.ptr() == nil {
+ // s will now have a free stack
+ stackpool[order].item.span.insert(s)
+ }
+ x.ptr().next = s.manualFreeList
+ s.manualFreeList = x
+ s.allocCount--
+ if gcphase == _GCoff && s.allocCount == 0 {
+ // Span is completely free. Return it to the heap
+ // immediately if we're sweeping.
+ //
+ // If GC is active, we delay the free until the end of
+ // GC to avoid the following type of situation:
+ //
+ // 1) GC starts, scans a SudoG but does not yet mark the SudoG.elem pointer
+ // 2) The stack that pointer points to is copied
+ // 3) The old stack is freed
+ // 4) The containing span is marked free
+ // 5) GC attempts to mark the SudoG.elem pointer. The
+ // marking fails because the pointer looks like a
+ // pointer into a free span.
+ //
+ // By not freeing, we prevent step #4 until GC is done.
+ stackpool[order].item.span.remove(s)
+ s.manualFreeList = 0
+ osStackFree(s)
+ mheap_.freeManual(s, spanAllocStack)
+ }
+}
+
+// stackcacherefill/stackcacherelease implement a global pool of stack segments.
+// The pool is required to prevent unlimited growth of per-thread caches.
+//
+//go:systemstack
+func stackcacherefill(c *mcache, order uint8) {
+ if stackDebug >= 1 {
+ print("stackcacherefill order=", order, "\n")
+ }
+
+ // Grab some stacks from the global cache.
+ // Grab half of the allowed capacity (to prevent thrashing).
+ var list gclinkptr
+ var size uintptr
+ lock(&stackpool[order].item.mu)
+ for size < _StackCacheSize/2 {
+ x := stackpoolalloc(order)
+ x.ptr().next = list
+ list = x
+ size += _FixedStack << order
+ }
+ unlock(&stackpool[order].item.mu)
+ c.stackcache[order].list = list
+ c.stackcache[order].size = size
+}
+
+//go:systemstack
+func stackcacherelease(c *mcache, order uint8) {
+ if stackDebug >= 1 {
+ print("stackcacherelease order=", order, "\n")
+ }
+ x := c.stackcache[order].list
+ size := c.stackcache[order].size
+ lock(&stackpool[order].item.mu)
+ for size > _StackCacheSize/2 {
+ y := x.ptr().next
+ stackpoolfree(x, order)
+ x = y
+ size -= _FixedStack << order
+ }
+ unlock(&stackpool[order].item.mu)
+ c.stackcache[order].list = x
+ c.stackcache[order].size = size
+}
+
+//go:systemstack
+func stackcache_clear(c *mcache) {
+ if stackDebug >= 1 {
+ print("stackcache clear\n")
+ }
+ for order := uint8(0); order < _NumStackOrders; order++ {
+ lock(&stackpool[order].item.mu)
+ x := c.stackcache[order].list
+ for x.ptr() != nil {
+ y := x.ptr().next
+ stackpoolfree(x, order)
+ x = y
+ }
+ c.stackcache[order].list = 0
+ c.stackcache[order].size = 0
+ unlock(&stackpool[order].item.mu)
+ }
+}
+
+// stackalloc allocates an n byte stack.
+//
+// stackalloc must run on the system stack because it uses per-P
+// resources and must not split the stack.
+//
+//go:systemstack
+func stackalloc(n uint32) stack {
+ // Stackalloc must be called on scheduler stack, so that we
+ // never try to grow the stack during the code that stackalloc runs.
+ // Doing so would cause a deadlock (issue 1547).
+ thisg := getg()
+ if thisg != thisg.m.g0 {
+ throw("stackalloc not on scheduler stack")
+ }
+ if n&(n-1) != 0 {
+ throw("stack size not a power of 2")
+ }
+ if stackDebug >= 1 {
+ print("stackalloc ", n, "\n")
+ }
+
+ if debug.efence != 0 || stackFromSystem != 0 {
+ n = uint32(alignUp(uintptr(n), physPageSize))
+ v := sysAlloc(uintptr(n), &memstats.stacks_sys)
+ if v == nil {
+ throw("out of memory (stackalloc)")
+ }
+ return stack{uintptr(v), uintptr(v) + uintptr(n)}
+ }
+
+ // Small stacks are allocated with a fixed-size free-list allocator.
+ // If we need a stack of a bigger size, we fall back on allocating
+ // a dedicated span.
+ var v unsafe.Pointer
+ if n < _FixedStack<<_NumStackOrders && n < _StackCacheSize {
+ order := uint8(0)
+ n2 := n
+ for n2 > _FixedStack {
+ order++
+ n2 >>= 1
+ }
+ var x gclinkptr
+ if stackNoCache != 0 || thisg.m.p == 0 || thisg.m.preemptoff != "" {
+ // thisg.m.p == 0 can happen in the guts of exitsyscall
+ // or procresize. Just get a stack from the global pool.
+ // Also don't touch stackcache during gc
+ // as it's flushed concurrently.
+ lock(&stackpool[order].item.mu)
+ x = stackpoolalloc(order)
+ unlock(&stackpool[order].item.mu)
+ } else {
+ c := thisg.m.p.ptr().mcache
+ x = c.stackcache[order].list
+ if x.ptr() == nil {
+ stackcacherefill(c, order)
+ x = c.stackcache[order].list
+ }
+ c.stackcache[order].list = x.ptr().next
+ c.stackcache[order].size -= uintptr(n)
+ }
+ v = unsafe.Pointer(x)
+ } else {
+ var s *mspan
+ npage := uintptr(n) >> _PageShift
+ log2npage := stacklog2(npage)
+
+ // Try to get a stack from the large stack cache.
+ lock(&stackLarge.lock)
+ if !stackLarge.free[log2npage].isEmpty() {
+ s = stackLarge.free[log2npage].first
+ stackLarge.free[log2npage].remove(s)
+ }
+ unlock(&stackLarge.lock)
+
+ lockWithRankMayAcquire(&mheap_.lock, lockRankMheap)
+
+ if s == nil {
+ // Allocate a new stack from the heap.
+ s = mheap_.allocManual(npage, spanAllocStack)
+ if s == nil {
+ throw("out of memory")
+ }
+ osStackAlloc(s)
+ s.elemsize = uintptr(n)
+ }
+ v = unsafe.Pointer(s.base())
+ }
+
+ if raceenabled {
+ racemalloc(v, uintptr(n))
+ }
+ if msanenabled {
+ msanmalloc(v, uintptr(n))
+ }
+ if stackDebug >= 1 {
+ print(" allocated ", v, "\n")
+ }
+ return stack{uintptr(v), uintptr(v) + uintptr(n)}
+}
+
+// stackfree frees an n byte stack allocation at stk.
+//
+// stackfree must run on the system stack because it uses per-P
+// resources and must not split the stack.
+//
+//go:systemstack
+func stackfree(stk stack) {
+ gp := getg()
+ v := unsafe.Pointer(stk.lo)
+ n := stk.hi - stk.lo
+ if n&(n-1) != 0 {
+ throw("stack not a power of 2")
+ }
+ if stk.lo+n < stk.hi {
+ throw("bad stack size")
+ }
+ if stackDebug >= 1 {
+ println("stackfree", v, n)
+ memclrNoHeapPointers(v, n) // for testing, clobber stack data
+ }
+ if debug.efence != 0 || stackFromSystem != 0 {
+ if debug.efence != 0 || stackFaultOnFree != 0 {
+ sysFault(v, n)
+ } else {
+ sysFree(v, n, &memstats.stacks_sys)
+ }
+ return
+ }
+ if msanenabled {
+ msanfree(v, n)
+ }
+ if n < _FixedStack<<_NumStackOrders && n < _StackCacheSize {
+ order := uint8(0)
+ n2 := n
+ for n2 > _FixedStack {
+ order++
+ n2 >>= 1
+ }
+ x := gclinkptr(v)
+ if stackNoCache != 0 || gp.m.p == 0 || gp.m.preemptoff != "" {
+ lock(&stackpool[order].item.mu)
+ stackpoolfree(x, order)
+ unlock(&stackpool[order].item.mu)
+ } else {
+ c := gp.m.p.ptr().mcache
+ if c.stackcache[order].size >= _StackCacheSize {
+ stackcacherelease(c, order)
+ }
+ x.ptr().next = c.stackcache[order].list
+ c.stackcache[order].list = x
+ c.stackcache[order].size += n
+ }
+ } else {
+ s := spanOfUnchecked(uintptr(v))
+ if s.state.get() != mSpanManual {
+ println(hex(s.base()), v)
+ throw("bad span state")
+ }
+ if gcphase == _GCoff {
+ // Free the stack immediately if we're
+ // sweeping.
+ osStackFree(s)
+ mheap_.freeManual(s, spanAllocStack)
+ } else {
+ // If the GC is running, we can't return a
+ // stack span to the heap because it could be
+ // reused as a heap span, and this state
+ // change would race with GC. Add it to the
+ // large stack cache instead.
+ log2npage := stacklog2(s.npages)
+ lock(&stackLarge.lock)
+ stackLarge.free[log2npage].insert(s)
+ unlock(&stackLarge.lock)
+ }
+ }
+}
+
+var maxstacksize uintptr = 1 << 20 // enough until runtime.main sets it for real
+
+var maxstackceiling = maxstacksize
+
+var ptrnames = []string{
+ 0: "scalar",
+ 1: "ptr",
+}
+
+// Stack frame layout
+//
+// (x86)
+// +------------------+
+// | args from caller |
+// +------------------+ <- frame->argp
+// | return address |
+// +------------------+
+// | caller's BP (*) | (*) if framepointer_enabled && varp < sp
+// +------------------+ <- frame->varp
+// | locals |
+// +------------------+
+// | args to callee |
+// +------------------+ <- frame->sp
+//
+// (arm)
+// +------------------+
+// | args from caller |
+// +------------------+ <- frame->argp
+// | caller's retaddr |
+// +------------------+ <- frame->varp
+// | locals |
+// +------------------+
+// | args to callee |
+// +------------------+
+// | return address |
+// +------------------+ <- frame->sp
+
+type adjustinfo struct {
+ old stack
+ delta uintptr // ptr distance from old to new stack (newbase - oldbase)
+ cache pcvalueCache
+
+ // sghi is the highest sudog.elem on the stack.
+ sghi uintptr
+}
+
+// adjustpointer checks whether *vpp is in the old stack described by adjinfo.
+// If so, it rewrites *vpp to point into the new stack.
+func adjustpointer(adjinfo *adjustinfo, vpp unsafe.Pointer) {
+ pp := (*uintptr)(vpp)
+ p := *pp
+ if stackDebug >= 4 {
+ print(" ", pp, ":", hex(p), "\n")
+ }
+ if adjinfo.old.lo <= p && p < adjinfo.old.hi {
+ *pp = p + adjinfo.delta
+ if stackDebug >= 3 {
+ print(" adjust ptr ", pp, ":", hex(p), " -> ", hex(*pp), "\n")
+ }
+ }
+}
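+
+// A worked example with made-up addresses: if the old stack is
+// [0xc000100000, 0xc000102000) and the new stack is
+// [0xc000200000, 0xc000204000), then adjinfo.delta = new.hi - old.hi =
+// 0x102000, and a slot holding 0xc000101f00 is rewritten to 0xc000203f00;
+// values outside the old range are left untouched.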
+
+// Information from the compiler about the layout of stack frames.
+// Note: this type must agree with reflect.bitVector.
+type bitvector struct {
+ n int32 // # of bits
+ bytedata *uint8
+}
+
+// ptrbit returns the i'th bit in bv.
+// ptrbit is less efficient than iterating directly over bitvector bits,
+// and should only be used in non-performance-critical code.
+// See adjustpointers for an example of a high-efficiency walk of a bitvector.
+func (bv *bitvector) ptrbit(i uintptr) uint8 {
+ b := *(addb(bv.bytedata, i/8))
+ return (b >> (i % 8)) & 1
+}
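+
+// For instance (illustrative values only), ptrbit(10) reads bytedata[1]
+// (10/8 == 1) and extracts bit 2 (10%8 == 2), so with bytedata = [0x00, 0x04]
+// it returns 1.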
+
+// bv describes the memory starting at address scanp.
+// Adjust any pointers contained therein.
+func adjustpointers(scanp unsafe.Pointer, bv *bitvector, adjinfo *adjustinfo, f funcInfo) {
+ minp := adjinfo.old.lo
+ maxp := adjinfo.old.hi
+ delta := adjinfo.delta
+ num := uintptr(bv.n)
+ // If this frame might contain channel receive slots, use CAS
+ // to adjust pointers. If the slot hasn't been received into
+ // yet, it may contain stack pointers and a concurrent send
+ // could race with adjusting those pointers. (The sent value
+ // itself can never contain stack pointers.)
+ useCAS := uintptr(scanp) < adjinfo.sghi
+ for i := uintptr(0); i < num; i += 8 {
+ if stackDebug >= 4 {
+ for j := uintptr(0); j < 8; j++ {
+ print(" ", add(scanp, (i+j)*sys.PtrSize), ":", ptrnames[bv.ptrbit(i+j)], ":", hex(*(*uintptr)(add(scanp, (i+j)*sys.PtrSize))), " # ", i, " ", *addb(bv.bytedata, i/8), "\n")
+ }
+ }
+ b := *(addb(bv.bytedata, i/8))
+ for b != 0 {
+ j := uintptr(sys.Ctz8(b))
+ b &= b - 1
+ pp := (*uintptr)(add(scanp, (i+j)*sys.PtrSize))
+ retry:
+ p := *pp
+ if f.valid() && 0 < p && p < minLegalPointer && debug.invalidptr != 0 {
+ // Looks like a junk value in a pointer slot.
+ // Live analysis wrong?
+ getg().m.traceback = 2
+ print("runtime: bad pointer in frame ", funcname(f), " at ", pp, ": ", hex(p), "\n")
+ throw("invalid pointer found on stack")
+ }
+ if minp <= p && p < maxp {
+ if stackDebug >= 3 {
+ print("adjust ptr ", hex(p), " ", funcname(f), "\n")
+ }
+ if useCAS {
+ ppu := (*unsafe.Pointer)(unsafe.Pointer(pp))
+ if !atomic.Casp1(ppu, unsafe.Pointer(p), unsafe.Pointer(p+delta)) {
+ goto retry
+ }
+ } else {
+ *pp = p + delta
+ }
+ }
+ }
+ }
+}
+
+// Note: the argument/return area is adjusted by the callee.
+func adjustframe(frame *stkframe, arg unsafe.Pointer) bool {
+ adjinfo := (*adjustinfo)(arg)
+ if frame.continpc == 0 {
+ // Frame is dead.
+ return true
+ }
+ f := frame.fn
+ if stackDebug >= 2 {
+ print(" adjusting ", funcname(f), " frame=[", hex(frame.sp), ",", hex(frame.fp), "] pc=", hex(frame.pc), " continpc=", hex(frame.continpc), "\n")
+ }
+ if f.funcID == funcID_systemstack_switch {
+ // A special routine at the bottom of the stack of a goroutine that does a systemstack call.
+ // We will allow it to be copied even though we don't
+ // have full GC info for it (because it is written in asm).
+ return true
+ }
+
+ locals, args, objs := getStackMap(frame, &adjinfo.cache, true)
+
+ // Adjust local variables if stack frame has been allocated.
+ if locals.n > 0 {
+ size := uintptr(locals.n) * sys.PtrSize
+ adjustpointers(unsafe.Pointer(frame.varp-size), &locals, adjinfo, f)
+ }
+
+ // Adjust saved base pointer if there is one.
+ // TODO what about arm64 frame pointer adjustment?
+ if sys.ArchFamily == sys.AMD64 && frame.argp-frame.varp == 2*sys.RegSize {
+ if stackDebug >= 3 {
+ print(" saved bp\n")
+ }
+ if debugCheckBP {
+ // Frame pointers should always point to the next higher frame on
+ // the Go stack (or be nil, for the top frame on the stack).
+ bp := *(*uintptr)(unsafe.Pointer(frame.varp))
+ if bp != 0 && (bp < adjinfo.old.lo || bp >= adjinfo.old.hi) {
+ println("runtime: found invalid frame pointer")
+ print("bp=", hex(bp), " min=", hex(adjinfo.old.lo), " max=", hex(adjinfo.old.hi), "\n")
+ throw("bad frame pointer")
+ }
+ }
+ adjustpointer(adjinfo, unsafe.Pointer(frame.varp))
+ }
+
+ // Adjust arguments.
+ if args.n > 0 {
+ if stackDebug >= 3 {
+ print(" args\n")
+ }
+ adjustpointers(unsafe.Pointer(frame.argp), &args, adjinfo, funcInfo{})
+ }
+
+ // Adjust pointers in all stack objects (whether they are live or not).
+ // See comments in mgcmark.go:scanframeworker.
+ if frame.varp != 0 {
+ for _, obj := range objs {
+ off := obj.off
+ base := frame.varp // locals base pointer
+ if off >= 0 {
+ base = frame.argp // arguments and return values base pointer
+ }
+ p := base + uintptr(off)
+ if p < frame.sp {
+ // Object hasn't been allocated in the frame yet.
+ // (Happens when the stack bounds check fails and
+ // we call into morestack.)
+ continue
+ }
+ t := obj.typ
+ gcdata := t.gcdata
+ var s *mspan
+ if t.kind&kindGCProg != 0 {
+ // See comments in mgcmark.go:scanstack
+ s = materializeGCProg(t.ptrdata, gcdata)
+ gcdata = (*byte)(unsafe.Pointer(s.startAddr))
+ }
+ for i := uintptr(0); i < t.ptrdata; i += sys.PtrSize {
+ if *addb(gcdata, i/(8*sys.PtrSize))>>(i/sys.PtrSize&7)&1 != 0 {
+ adjustpointer(adjinfo, unsafe.Pointer(p+i))
+ }
+ }
+ if s != nil {
+ dematerializeGCProg(s)
+ }
+ }
+ }
+
+ return true
+}
+
+func adjustctxt(gp *g, adjinfo *adjustinfo) {
+ adjustpointer(adjinfo, unsafe.Pointer(&gp.sched.ctxt))
+ if !framepointer_enabled {
+ return
+ }
+ if debugCheckBP {
+ bp := gp.sched.bp
+ if bp != 0 && (bp < adjinfo.old.lo || bp >= adjinfo.old.hi) {
+ println("runtime: found invalid top frame pointer")
+ print("bp=", hex(bp), " min=", hex(adjinfo.old.lo), " max=", hex(adjinfo.old.hi), "\n")
+ throw("bad top frame pointer")
+ }
+ }
+ adjustpointer(adjinfo, unsafe.Pointer(&gp.sched.bp))
+}
+
+func adjustdefers(gp *g, adjinfo *adjustinfo) {
+ // Adjust pointers in the Defer structs.
+ // We need to do this first because we need to adjust the
+ // defer.link fields so we always work on the new stack.
+ adjustpointer(adjinfo, unsafe.Pointer(&gp._defer))
+ for d := gp._defer; d != nil; d = d.link {
+ adjustpointer(adjinfo, unsafe.Pointer(&d.fn))
+ adjustpointer(adjinfo, unsafe.Pointer(&d.sp))
+ adjustpointer(adjinfo, unsafe.Pointer(&d._panic))
+ adjustpointer(adjinfo, unsafe.Pointer(&d.link))
+ adjustpointer(adjinfo, unsafe.Pointer(&d.varp))
+ adjustpointer(adjinfo, unsafe.Pointer(&d.fd))
+ }
+
+ // Adjust defer argument blocks the same way we adjust active stack frames.
+ // Note: this code is after the loop above, so that if a defer record is
+ // stack allocated, we work on the copy in the new stack.
+ tracebackdefers(gp, adjustframe, noescape(unsafe.Pointer(adjinfo)))
+}
+
+func adjustpanics(gp *g, adjinfo *adjustinfo) {
+ // Panics are on stack and already adjusted.
+ // Update pointer to head of list in G.
+ adjustpointer(adjinfo, unsafe.Pointer(&gp._panic))
+}
+
+func adjustsudogs(gp *g, adjinfo *adjustinfo) {
+ // The data elements pointed to by a SudoG structure
+ // might be in the stack.
+ for s := gp.waiting; s != nil; s = s.waitlink {
+ adjustpointer(adjinfo, unsafe.Pointer(&s.elem))
+ }
+}
+
+func fillstack(stk stack, b byte) {
+ for p := stk.lo; p < stk.hi; p++ {
+ *(*byte)(unsafe.Pointer(p)) = b
+ }
+}
+
+func findsghi(gp *g, stk stack) uintptr {
+ var sghi uintptr
+ for sg := gp.waiting; sg != nil; sg = sg.waitlink {
+ p := uintptr(sg.elem) + uintptr(sg.c.elemsize)
+ if stk.lo <= p && p < stk.hi && p > sghi {
+ sghi = p
+ }
+ }
+ return sghi
+}
+
+// syncadjustsudogs adjusts gp's sudogs and copies the part of gp's
+// stack they refer to while synchronizing with concurrent channel
+// operations. It returns the number of bytes of stack copied.
+func syncadjustsudogs(gp *g, used uintptr, adjinfo *adjustinfo) uintptr {
+ if gp.waiting == nil {
+ return 0
+ }
+
+ // Lock channels to prevent concurrent send/receive.
+ var lastc *hchan
+ for sg := gp.waiting; sg != nil; sg = sg.waitlink {
+ if sg.c != lastc {
+ // There is a ranking cycle here between gscan bit and
+ // hchan locks. Normally, we only allow acquiring hchan
+ // locks and then getting a gscan bit. In this case, we
+ // already have the gscan bit. We allow acquiring hchan
+ // locks here as a special case, since a deadlock can't
+ // happen because the G involved must already be
+ // suspended. So, we get a special hchan lock rank here
+ // that is lower than gscan, but doesn't allow acquiring
+ // any other locks other than hchan.
+ lockWithRank(&sg.c.lock, lockRankHchanLeaf)
+ }
+ lastc = sg.c
+ }
+
+ // Adjust sudogs.
+ adjustsudogs(gp, adjinfo)
+
+ // Copy the part of the stack the sudogs point into
+ // while holding the lock to prevent races on
+ // send/receive slots.
+ var sgsize uintptr
+ if adjinfo.sghi != 0 {
+ oldBot := adjinfo.old.hi - used
+ newBot := oldBot + adjinfo.delta
+ sgsize = adjinfo.sghi - oldBot
+ memmove(unsafe.Pointer(newBot), unsafe.Pointer(oldBot), sgsize)
+ }
+
+ // Unlock channels.
+ lastc = nil
+ for sg := gp.waiting; sg != nil; sg = sg.waitlink {
+ if sg.c != lastc {
+ unlock(&sg.c.lock)
+ }
+ lastc = sg.c
+ }
+
+ return sgsize
+}
+
+// Copies gp's stack to a new stack of a different size.
+// Caller must have changed gp status to Gcopystack.
+func copystack(gp *g, newsize uintptr) {
+ if gp.syscallsp != 0 {
+ throw("stack growth not allowed in system call")
+ }
+ old := gp.stack
+ if old.lo == 0 {
+ throw("nil stackbase")
+ }
+ used := old.hi - gp.sched.sp
+
+ // allocate new stack
+ new := stackalloc(uint32(newsize))
+ if stackPoisonCopy != 0 {
+ fillstack(new, 0xfd)
+ }
+ if stackDebug >= 1 {
+ print("copystack gp=", gp, " [", hex(old.lo), " ", hex(old.hi-used), " ", hex(old.hi), "]", " -> [", hex(new.lo), " ", hex(new.hi-used), " ", hex(new.hi), "]/", newsize, "\n")
+ }
+
+ // Compute adjustment.
+ var adjinfo adjustinfo
+ adjinfo.old = old
+ adjinfo.delta = new.hi - old.hi
+
+ // Adjust sudogs, synchronizing with channel ops if necessary.
+ ncopy := used
+ if !gp.activeStackChans {
+ if newsize < old.hi-old.lo && atomic.Load8(&gp.parkingOnChan) != 0 {
+ // It's not safe for someone to shrink this stack while we're actively
+ // parking on a channel, but it is safe to grow since we do that
+ // ourselves and explicitly don't want to synchronize with channels
+ // since we could self-deadlock.
+ throw("racy sudog adjustment due to parking on channel")
+ }
+ adjustsudogs(gp, &adjinfo)
+ } else {
+ // sudogs may be pointing into the stack and gp has
+ // released channel locks, so other goroutines could
+ // be writing to gp's stack. Find the highest such
+ // pointer so we can handle everything there and below
+ // carefully. (This shouldn't be far from the bottom
+ // of the stack, so there's little cost in handling
+ // everything below it carefully.)
+ adjinfo.sghi = findsghi(gp, old)
+
+ // Synchronize with channel ops and copy the part of
+ // the stack they may interact with.
+ ncopy -= syncadjustsudogs(gp, used, &adjinfo)
+ }
+
+ // Copy the stack (or the rest of it) to the new location
+ memmove(unsafe.Pointer(new.hi-ncopy), unsafe.Pointer(old.hi-ncopy), ncopy)
+
+ // Adjust remaining structures that have pointers into stacks.
+ // We have to do most of these before we traceback the new
+ // stack because gentraceback uses them.
+ adjustctxt(gp, &adjinfo)
+ adjustdefers(gp, &adjinfo)
+ adjustpanics(gp, &adjinfo)
+ if adjinfo.sghi != 0 {
+ adjinfo.sghi += adjinfo.delta
+ }
+
+ // Swap out old stack for new one
+ gp.stack = new
+ gp.stackguard0 = new.lo + _StackGuard // NOTE: might clobber a preempt request
+ gp.sched.sp = new.hi - used
+ gp.stktopsp += adjinfo.delta
+
+ // Adjust pointers in the new stack.
+ gentraceback(^uintptr(0), ^uintptr(0), 0, gp, 0, nil, 0x7fffffff, adjustframe, noescape(unsafe.Pointer(&adjinfo)), 0)
+
+ // free old stack
+ if stackPoisonCopy != 0 {
+ fillstack(old, 0xfc)
+ }
+ stackfree(old)
+}
+
+// round x up to a power of 2.
+func round2(x int32) int32 {
+ s := uint(0)
+ for 1<<s < x {
+ s++
+ }
+ return 1 << s
+}
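+
+// For example, round2(1) == 1, round2(3) == 4, and round2(4096) == 4096.
+// stackalloc above rejects sizes that are not powers of two, so callers are
+// expected to round a requested size up with this helper (or otherwise ensure
+// a power of two) before allocating.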
+
+// Called from runtime·morestack when more stack is needed.
+// Allocate larger stack and relocate to new stack.
+// Stack growth is multiplicative, for constant amortized cost.
+//
+// g->atomicstatus will be Grunning or Gscanrunning upon entry.
+// If the scheduler is trying to stop this g, then it will set preemptStop.
+//
+// This must be nowritebarrierrec because it can be called as part of
+// stack growth from other nowritebarrierrec functions, but the
+// compiler doesn't check this.
+//
+//go:nowritebarrierrec
+func newstack() {
+ thisg := getg()
+ // TODO: double check all gp. shouldn't be getg().
+ if thisg.m.morebuf.g.ptr().stackguard0 == stackFork {
+ throw("stack growth after fork")
+ }
+ if thisg.m.morebuf.g.ptr() != thisg.m.curg {
+ print("runtime: newstack called from g=", hex(thisg.m.morebuf.g), "\n"+"\tm=", thisg.m, " m->curg=", thisg.m.curg, " m->g0=", thisg.m.g0, " m->gsignal=", thisg.m.gsignal, "\n")
+ morebuf := thisg.m.morebuf
+ traceback(morebuf.pc, morebuf.sp, morebuf.lr, morebuf.g.ptr())
+ throw("runtime: wrong goroutine in newstack")
+ }
+
+ gp := thisg.m.curg
+
+ if thisg.m.curg.throwsplit {
+ // Update syscallsp, syscallpc in case traceback uses them.
+ morebuf := thisg.m.morebuf
+ gp.syscallsp = morebuf.sp
+ gp.syscallpc = morebuf.pc
+ pcname, pcoff := "(unknown)", uintptr(0)
+ f := findfunc(gp.sched.pc)
+ if f.valid() {
+ pcname = funcname(f)
+ pcoff = gp.sched.pc - f.entry
+ }
+ print("runtime: newstack at ", pcname, "+", hex(pcoff),
+ " sp=", hex(gp.sched.sp), " stack=[", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n",
+ "\tmorebuf={pc:", hex(morebuf.pc), " sp:", hex(morebuf.sp), " lr:", hex(morebuf.lr), "}\n",
+ "\tsched={pc:", hex(gp.sched.pc), " sp:", hex(gp.sched.sp), " lr:", hex(gp.sched.lr), " ctxt:", gp.sched.ctxt, "}\n")
+
+ thisg.m.traceback = 2 // Include runtime frames
+ traceback(morebuf.pc, morebuf.sp, morebuf.lr, gp)
+ throw("runtime: stack split at bad time")
+ }
+
+ morebuf := thisg.m.morebuf
+ thisg.m.morebuf.pc = 0
+ thisg.m.morebuf.lr = 0
+ thisg.m.morebuf.sp = 0
+ thisg.m.morebuf.g = 0
+
+ // NOTE: stackguard0 may change underfoot, if another thread
+ // is about to try to preempt gp. Read it just once and use that same
+ // value now and below.
+ preempt := atomic.Loaduintptr(&gp.stackguard0) == stackPreempt
+
+ // Be conservative about where we preempt.
+ // We are interested in preempting user Go code, not runtime code.
+ // If we're holding locks, mallocing, or preemption is disabled, don't
+ // preempt.
+ // This check is very early in newstack so that even the status change
+ // from Grunning to Gwaiting and back doesn't happen in this case.
+ // That status change by itself can be viewed as a small preemption,
+ // because the GC might change Gwaiting to Gscanwaiting, and then
+ // this goroutine has to wait for the GC to finish before continuing.
+ // If the GC is in some way dependent on this goroutine (for example,
+ // it needs a lock held by the goroutine), that small preemption turns
+ // into a real deadlock.
+ if preempt {
+ if !canPreemptM(thisg.m) {
+ // Let the goroutine keep running for now.
+ // gp->preempt is set, so it will be preempted next time.
+ gp.stackguard0 = gp.stack.lo + _StackGuard
+ gogo(&gp.sched) // never return
+ }
+ }
+
+ if gp.stack.lo == 0 {
+ throw("missing stack in newstack")
+ }
+ sp := gp.sched.sp
+ if sys.ArchFamily == sys.AMD64 || sys.ArchFamily == sys.I386 || sys.ArchFamily == sys.WASM {
+ // The call to morestack cost a word.
+ sp -= sys.PtrSize
+ }
+ if stackDebug >= 1 || sp < gp.stack.lo {
+ print("runtime: newstack sp=", hex(sp), " stack=[", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n",
+ "\tmorebuf={pc:", hex(morebuf.pc), " sp:", hex(morebuf.sp), " lr:", hex(morebuf.lr), "}\n",
+ "\tsched={pc:", hex(gp.sched.pc), " sp:", hex(gp.sched.sp), " lr:", hex(gp.sched.lr), " ctxt:", gp.sched.ctxt, "}\n")
+ }
+ if sp < gp.stack.lo {
+ print("runtime: gp=", gp, ", goid=", gp.goid, ", gp->status=", hex(readgstatus(gp)), "\n ")
+ print("runtime: split stack overflow: ", hex(sp), " < ", hex(gp.stack.lo), "\n")
+ throw("runtime: split stack overflow")
+ }
+
+ if preempt {
+ if gp == thisg.m.g0 {
+ throw("runtime: preempt g0")
+ }
+ if thisg.m.p == 0 && thisg.m.locks == 0 {
+ throw("runtime: g is running but p is not")
+ }
+
+ if gp.preemptShrink {
+ // We're at a synchronous safe point now, so
+ // do the pending stack shrink.
+ gp.preemptShrink = false
+ shrinkstack(gp)
+ }
+
+ if gp.preemptStop {
+ preemptPark(gp) // never returns
+ }
+
+ // Act like goroutine called runtime.Gosched.
+ gopreempt_m(gp) // never return
+ }
+
+ // Allocate a bigger segment and move the stack.
+ oldsize := gp.stack.hi - gp.stack.lo
+ newsize := oldsize * 2
+
+ // Make sure we grow at least as much as needed to fit the new frame.
+ // (This is just an optimization - the caller of morestack will
+ // recheck the bounds on return.)
+ if f := findfunc(gp.sched.pc); f.valid() {
+ max := uintptr(funcMaxSPDelta(f))
+ for newsize-oldsize < max+_StackGuard {
+ newsize *= 2
+ }
+ }
+
+ if newsize > maxstacksize || newsize > maxstackceiling {
+ if maxstacksize < maxstackceiling {
+ print("runtime: goroutine stack exceeds ", maxstacksize, "-byte limit\n")
+ } else {
+ print("runtime: goroutine stack exceeds ", maxstackceiling, "-byte limit\n")
+ }
+ print("runtime: sp=", hex(sp), " stack=[", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n")
+ throw("stack overflow")
+ }
+
+ // The goroutine must be executing in order to call newstack,
+ // so it must be Grunning (or Gscanrunning).
+ casgstatus(gp, _Grunning, _Gcopystack)
+
+ // The concurrent GC will not scan the stack while we are doing the copy since
+ // the gp is in a Gcopystack status.
+ copystack(gp, newsize)
+ if stackDebug >= 1 {
+ print("stack grow done\n")
+ }
+ casgstatus(gp, _Gcopystack, _Grunning)
+ gogo(&gp.sched)
+}
+
+//go:nosplit
+func nilfunc() {
+ *(*uint8)(nil) = 0
+}
+
+// adjust Gobuf as if it executed a call to fn
+// and then did an immediate gosave.
+func gostartcallfn(gobuf *gobuf, fv *funcval) {
+ var fn unsafe.Pointer
+ if fv != nil {
+ fn = unsafe.Pointer(fv.fn)
+ } else {
+ fn = unsafe.Pointer(funcPC(nilfunc))
+ }
+ gostartcall(gobuf, fn, unsafe.Pointer(fv))
+}
+
+// isShrinkStackSafe returns whether it's safe to attempt to shrink
+// gp's stack. Shrinking the stack is only safe when we have precise
+// pointer maps for all frames on the stack.
+func isShrinkStackSafe(gp *g) bool {
+ // We can't copy the stack if we're in a syscall.
+ // The syscall might have pointers into the stack and
+ // often we don't have precise pointer maps for the innermost
+ // frames.
+ //
+ // We also can't copy the stack if we're at an asynchronous
+ // safe-point because we don't have precise pointer maps for
+ // all frames.
+ //
+ // We also can't *shrink* the stack in the window between the
+ // goroutine calling gopark to park on a channel and
+ // gp.activeStackChans being set.
+ return gp.syscallsp == 0 && !gp.asyncSafePoint && atomic.Load8(&gp.parkingOnChan) == 0
+}
+
+// Maybe shrink the stack being used by gp.
+//
+// gp must be stopped and we must own its stack. It may be in
+// _Grunning, but only if this is our own user G.
+func shrinkstack(gp *g) {
+ if gp.stack.lo == 0 {
+ throw("missing stack in shrinkstack")
+ }
+ if s := readgstatus(gp); s&_Gscan == 0 {
+ // We don't own the stack via _Gscan. We could still
+ // own it if this is our own user G and we're on the
+ // system stack.
+ if !(gp == getg().m.curg && getg() != getg().m.curg && s == _Grunning) {
+ // We don't own the stack.
+ throw("bad status in shrinkstack")
+ }
+ }
+ if !isShrinkStackSafe(gp) {
+ throw("shrinkstack at bad time")
+ }
+ // Check for self-shrinks while in a libcall. These may have
+ // pointers into the stack disguised as uintptrs, but these
+ // code paths should all be nosplit.
+ if gp == getg().m.curg && gp.m.libcallsp != 0 {
+ throw("shrinking stack in libcall")
+ }
+
+ if debug.gcshrinkstackoff > 0 {
+ return
+ }
+ f := findfunc(gp.startpc)
+ if f.valid() && f.funcID == funcID_gcBgMarkWorker {
+ // We're not allowed to shrink the gcBgMarkWorker
+ // stack (see gcBgMarkWorker for explanation).
+ return
+ }
+
+ oldsize := gp.stack.hi - gp.stack.lo
+ newsize := oldsize / 2
+ // Don't shrink the allocation below the minimum-sized stack
+ // allocation.
+ if newsize < _FixedStack {
+ return
+ }
+ // Compute how much of the stack is currently in use and only
+ // shrink the stack if gp is using less than a quarter of its
+ // current stack. The currently used stack includes everything
+ // down to the SP plus the stack guard space that ensures
+ // there's room for nosplit functions.
+ avail := gp.stack.hi - gp.stack.lo
+ if used := gp.stack.hi - gp.sched.sp + _StackLimit; used >= avail/4 {
+ return
+ }
+
+ if stackDebug > 0 {
+ print("shrinking stack ", oldsize, "->", newsize, "\n")
+ }
+
+ copystack(gp, newsize)
+}
+
+// freeStackSpans frees unused stack spans at the end of GC.
+func freeStackSpans() {
+
+ // Scan stack pools for empty stack spans.
+ for order := range stackpool {
+ lock(&stackpool[order].item.mu)
+ list := &stackpool[order].item.span
+ for s := list.first; s != nil; {
+ next := s.next
+ if s.allocCount == 0 {
+ list.remove(s)
+ s.manualFreeList = 0
+ osStackFree(s)
+ mheap_.freeManual(s, spanAllocStack)
+ }
+ s = next
+ }
+ unlock(&stackpool[order].item.mu)
+ }
+
+ // Free large stack spans.
+ lock(&stackLarge.lock)
+ for i := range stackLarge.free {
+ for s := stackLarge.free[i].first; s != nil; {
+ next := s.next
+ stackLarge.free[i].remove(s)
+ osStackFree(s)
+ mheap_.freeManual(s, spanAllocStack)
+ s = next
+ }
+ }
+ unlock(&stackLarge.lock)
+}
+
+// getStackMap returns the locals and arguments live pointer maps, and
+// stack object list for frame.
+func getStackMap(frame *stkframe, cache *pcvalueCache, debug bool) (locals, args bitvector, objs []stackObjectRecord) {
+ targetpc := frame.continpc
+ if targetpc == 0 {
+ // Frame is dead. Return empty bitvectors.
+ return
+ }
+
+ f := frame.fn
+ pcdata := int32(-1)
+ if targetpc != f.entry {
+ // Back up to the CALL. If we're at the function entry
+ // point, we want to use the entry map (-1), even if
+ // the first instruction of the function changes the
+ // stack map.
+ targetpc--
+ pcdata = pcdatavalue(f, _PCDATA_StackMapIndex, targetpc, cache)
+ }
+ if pcdata == -1 {
+ // We do not have a valid pcdata value but there might be a
+ // stackmap for this function. It is likely that we are looking
+ // at the function prologue, assume so and hope for the best.
+ pcdata = 0
+ }
+
+ // Local variables.
+ size := frame.varp - frame.sp
+ var minsize uintptr
+ switch sys.ArchFamily {
+ case sys.ARM64:
+ minsize = sys.SpAlign
+ default:
+ minsize = sys.MinFrameSize
+ }
+ if size > minsize {
+ stackid := pcdata
+ stkmap := (*stackmap)(funcdata(f, _FUNCDATA_LocalsPointerMaps))
+ if stkmap == nil || stkmap.n <= 0 {
+ print("runtime: frame ", funcname(f), " untyped locals ", hex(frame.varp-size), "+", hex(size), "\n")
+ throw("missing stackmap")
+ }
+ // If nbit == 0, there's no work to do.
+ if stkmap.nbit > 0 {
+ if stackid < 0 || stackid >= stkmap.n {
+ // don't know where we are
+ print("runtime: pcdata is ", stackid, " and ", stkmap.n, " locals stack map entries for ", funcname(f), " (targetpc=", hex(targetpc), ")\n")
+ throw("bad symbol table")
+ }
+ locals = stackmapdata(stkmap, stackid)
+ if stackDebug >= 3 && debug {
+ print(" locals ", stackid, "/", stkmap.n, " ", locals.n, " words ", locals.bytedata, "\n")
+ }
+ } else if stackDebug >= 3 && debug {
+ print(" no locals to adjust\n")
+ }
+ }
+
+ // Arguments.
+ if frame.arglen > 0 {
+ if frame.argmap != nil {
+ // argmap is set when the function is reflect.makeFuncStub or reflect.methodValueCall.
+ // In this case, arglen specifies how much of the args section is actually live.
+ // (It could be either all the args + results, or just the args.)
+ args = *frame.argmap
+ n := int32(frame.arglen / sys.PtrSize)
+ if n < args.n {
+ args.n = n // Don't use more of the arguments than arglen.
+ }
+ } else {
+ stackmap := (*stackmap)(funcdata(f, _FUNCDATA_ArgsPointerMaps))
+ if stackmap == nil || stackmap.n <= 0 {
+ print("runtime: frame ", funcname(f), " untyped args ", hex(frame.argp), "+", hex(frame.arglen), "\n")
+ throw("missing stackmap")
+ }
+ if pcdata < 0 || pcdata >= stackmap.n {
+ // don't know where we are
+ print("runtime: pcdata is ", pcdata, " and ", stackmap.n, " args stack map entries for ", funcname(f), " (targetpc=", hex(targetpc), ")\n")
+ throw("bad symbol table")
+ }
+ if stackmap.nbit > 0 {
+ args = stackmapdata(stackmap, pcdata)
+ }
+ }
+ }
+
+ // stack objects.
+ p := funcdata(f, _FUNCDATA_StackObjects)
+ if p != nil {
+ n := *(*uintptr)(p)
+ p = add(p, sys.PtrSize)
+ *(*slice)(unsafe.Pointer(&objs)) = slice{array: noescape(p), len: int(n), cap: int(n)}
+ // Note: the noescape above is needed to keep
+ // getStackMap from "leaking param content:
+ // frame". That leak propagates up to getgcmask, then
+ // GCMask, then verifyGCInfo, which converts the stack
+ // gcinfo tests into heap gcinfo tests :(
+ }
+
+ return
+}
+
+// A stackObjectRecord is generated by the compiler for each stack object in a stack frame.
+// This record must match the generator code in cmd/compile/internal/gc/ssa.go:emitStackObjects.
+type stackObjectRecord struct {
+ // offset in frame
+ // if negative, offset from varp
+ // if non-negative, offset from argp
+ off int
+ typ *_type
+}
+
+// This is exported as ABI0 via linkname so obj can call it.
+//
+//go:nosplit
+//go:linkname morestackc
+func morestackc() {
+ throw("attempt to execute system stack code on user stack")
+}
diff --git a/src/runtime/stack_test.go b/src/runtime/stack_test.go
new file mode 100644
index 0000000..43fc5ca
--- /dev/null
+++ b/src/runtime/stack_test.go
@@ -0,0 +1,894 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "bytes"
+ "fmt"
+ "os"
+ "reflect"
+ "regexp"
+ . "runtime"
+ "strconv"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "testing"
+ "time"
+ _ "unsafe" // for go:linkname
+)
+
+// TestStackMem measures per-thread stack segment cache behavior.
+// The test consumed up to 500MB in the past.
+func TestStackMem(t *testing.T) {
+ const (
+ BatchSize = 32
+ BatchCount = 256
+ ArraySize = 1024
+ RecursionDepth = 128
+ )
+ if testing.Short() {
+ return
+ }
+ defer GOMAXPROCS(GOMAXPROCS(BatchSize))
+ s0 := new(MemStats)
+ ReadMemStats(s0)
+ for b := 0; b < BatchCount; b++ {
+ c := make(chan bool, BatchSize)
+ for i := 0; i < BatchSize; i++ {
+ go func() {
+ var f func(k int, a [ArraySize]byte)
+ f = func(k int, a [ArraySize]byte) {
+ if k == 0 {
+ time.Sleep(time.Millisecond)
+ return
+ }
+ f(k-1, a)
+ }
+ f(RecursionDepth, [ArraySize]byte{})
+ c <- true
+ }()
+ }
+ for i := 0; i < BatchSize; i++ {
+ <-c
+ }
+
+ // The goroutines have signaled via c that they are ready to exit.
+ // Give them a chance to exit by sleeping. If we don't wait, we
+ // might not reuse them on the next batch.
+ time.Sleep(10 * time.Millisecond)
+ }
+ s1 := new(MemStats)
+ ReadMemStats(s1)
+ consumed := int64(s1.StackSys - s0.StackSys)
+ t.Logf("Consumed %vMB for stack mem", consumed>>20)
+ estimate := int64(8 * BatchSize * ArraySize * RecursionDepth) // 8 is to reduce flakiness.
+ if consumed > estimate {
+ t.Fatalf("Stack mem: want %v, got %v", estimate, consumed)
+ }
+ // Due to broken stack memory accounting (https://golang.org/issue/7468),
+ // StackInuse can decrease during function execution, so we cast the values to int64.
+ inuse := int64(s1.StackInuse) - int64(s0.StackInuse)
+ t.Logf("Inuse %vMB for stack mem", inuse>>20)
+ if inuse > 4<<20 {
+ t.Fatalf("Stack inuse: want %v, got %v", 4<<20, inuse)
+ }
+}
+
+// Test stack growing in different contexts.
+func TestStackGrowth(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+
+ if GOARCH == "wasm" {
+ t.Skip("fails on wasm (too slow?)")
+ }
+
+ // Don't make this test parallel as this makes the 20 second
+ // timeout unreliable on slow builders. (See issue #19381.)
+
+ var wg sync.WaitGroup
+
+ // in a normal goroutine
+ var growDuration time.Duration // For debugging failures
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ start := time.Now()
+ growStack(nil)
+ growDuration = time.Since(start)
+ }()
+ wg.Wait()
+
+ // in locked goroutine
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ LockOSThread()
+ growStack(nil)
+ UnlockOSThread()
+ }()
+ wg.Wait()
+
+ // in finalizer
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ done := make(chan bool)
+ var startTime time.Time
+ var started, progress uint32
+ go func() {
+ s := new(string)
+ SetFinalizer(s, func(ss *string) {
+ startTime = time.Now()
+ atomic.StoreUint32(&started, 1)
+ growStack(&progress)
+ done <- true
+ })
+ s = nil
+ done <- true
+ }()
+ <-done
+ GC()
+
+ timeout := 20 * time.Second
+ if s := os.Getenv("GO_TEST_TIMEOUT_SCALE"); s != "" {
+ scale, err := strconv.Atoi(s)
+ if err == nil {
+ timeout *= time.Duration(scale)
+ }
+ }
+
+ select {
+ case <-done:
+ case <-time.After(timeout):
+ if atomic.LoadUint32(&started) == 0 {
+ t.Log("finalizer did not start")
+ } else {
+ t.Logf("finalizer started %s ago and finished %d iterations", time.Since(startTime), atomic.LoadUint32(&progress))
+ }
+ t.Log("first growStack took", growDuration)
+ t.Error("finalizer did not run")
+ return
+ }
+ }()
+ wg.Wait()
+}
+
+// ... and in init
+//func init() {
+// growStack()
+//}
+
+func growStack(progress *uint32) {
+ n := 1 << 10
+ if testing.Short() {
+ n = 1 << 8
+ }
+ for i := 0; i < n; i++ {
+ x := 0
+ growStackIter(&x, i)
+ if x != i+1 {
+ panic("stack is corrupted")
+ }
+ if progress != nil {
+ atomic.StoreUint32(progress, uint32(i))
+ }
+ }
+ GC()
+}
+
+// This function is not an anonymous func, so that the compiler can do escape
+// analysis and place x on the stack (so that a subsequent stack growth has to update the pointer).
+func growStackIter(p *int, n int) {
+ if n == 0 {
+ *p = n + 1
+ GC()
+ return
+ }
+ *p = n + 1
+ x := 0
+ growStackIter(&x, n-1)
+ if x != n {
+ panic("stack is corrupted")
+ }
+}
+
+func TestStackGrowthCallback(t *testing.T) {
+ t.Parallel()
+ var wg sync.WaitGroup
+
+ // test stack growth at chan op
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ c := make(chan int, 1)
+ growStackWithCallback(func() {
+ c <- 1
+ <-c
+ })
+ }()
+
+ // test stack growth at map op
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ m := make(map[int]int)
+ growStackWithCallback(func() {
+ _, _ = m[1]
+ m[1] = 1
+ })
+ }()
+
+ // test stack growth at goroutine creation
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ growStackWithCallback(func() {
+ done := make(chan bool)
+ go func() {
+ done <- true
+ }()
+ <-done
+ })
+ }()
+ wg.Wait()
+}
+
+func growStackWithCallback(cb func()) {
+ var f func(n int)
+ f = func(n int) {
+ if n == 0 {
+ cb()
+ return
+ }
+ f(n - 1)
+ }
+ for i := 0; i < 1<<10; i++ {
+ f(i)
+ }
+}
+
+// TestDeferPtrs tests the adjustment of Defer's argument pointers (p aka &y)
+// during a stack copy.
+func set(p *int, x int) {
+ *p = x
+}
+func TestDeferPtrs(t *testing.T) {
+ var y int
+
+ defer func() {
+ if y != 42 {
+ t.Errorf("defer's stack references were not adjusted appropriately")
+ }
+ }()
+ defer set(&y, 42)
+ growStack(nil)
+}
+
+type bigBuf [4 * 1024]byte
+
+// TestDeferPtrsGoexit is like TestDeferPtrs but exercises the possibility that the
+// stack grows as part of starting the deferred function. It calls Goexit at various
+// stack depths, forcing the deferred function (with >4kB of args) to be run at
+// the bottom of the stack. The goal is to find a stack depth less than 4kB from
+// the end of the stack. Each trial runs in a different goroutine so that an earlier
+// stack growth does not invalidate a later attempt.
+func TestDeferPtrsGoexit(t *testing.T) {
+ for i := 0; i < 100; i++ {
+ c := make(chan int, 1)
+ go testDeferPtrsGoexit(c, i)
+ if n := <-c; n != 42 {
+ t.Fatalf("defer's stack references were not adjusted appropriately (i=%d n=%d)", i, n)
+ }
+ }
+}
+
+func testDeferPtrsGoexit(c chan int, i int) {
+ var y int
+ defer func() {
+ c <- y
+ }()
+ defer setBig(&y, 42, bigBuf{})
+ useStackAndCall(i, Goexit)
+}
+
+func setBig(p *int, x int, b bigBuf) {
+ *p = x
+}
+
+// TestDeferPtrsPanic is like TestDeferPtrsGoexit, but it's using panic instead
+// of Goexit to run the Defers. Those two are different execution paths
+// in the runtime.
+func TestDeferPtrsPanic(t *testing.T) {
+ for i := 0; i < 100; i++ {
+ c := make(chan int, 1)
+ go testDeferPtrsPanic(c, i)
+ if n := <-c; n != 42 {
+ t.Fatalf("defer's stack references were not adjusted appropriately (i=%d n=%d)", i, n)
+ }
+ }
+}
+
+func testDeferPtrsPanic(c chan int, i int) {
+ var y int
+ defer func() {
+ if recover() == nil {
+ c <- -1
+ return
+ }
+ c <- y
+ }()
+ defer setBig(&y, 42, bigBuf{})
+ useStackAndCall(i, func() { panic(1) })
+}
+
+//go:noinline
+func testDeferLeafSigpanic1() {
+ // Cause a sigpanic to be injected in this frame.
+ //
+ // This function has to be declared before
+ // TestDeferLeafSigpanic so the runtime will crash if we think
+ // this function's continuation PC is in
+ // TestDeferLeafSigpanic.
+ *(*int)(nil) = 0
+}
+
+// TestDeferLeafSigpanic tests defer matching around leaf functions
+// that sigpanic. This is tricky because on LR machines the outer
+// function and the inner function have the same SP, but it's critical
+// that we match up the defer correctly to get the right liveness map.
+// See issue #25499.
+func TestDeferLeafSigpanic(t *testing.T) {
+ // Push a defer that will walk the stack.
+ defer func() {
+ if err := recover(); err == nil {
+ t.Fatal("expected panic from nil pointer")
+ }
+ GC()
+ }()
+ // Call a leaf function. We must set up the exact call stack:
+ //
+ // defering function -> leaf function -> sigpanic
+ //
+ // On LR machines, the leaf function will have the same SP as
+ // the SP pushed for the defer frame.
+ testDeferLeafSigpanic1()
+}
+
+// TestPanicUseStack checks that a chain of Panic structs on the stack are
+// updated correctly if the stack grows during the deferred execution that
+// happens as a result of the panic.
+func TestPanicUseStack(t *testing.T) {
+ pc := make([]uintptr, 10000)
+ defer func() {
+ recover()
+ Callers(0, pc) // force stack walk
+ useStackAndCall(100, func() {
+ defer func() {
+ recover()
+ Callers(0, pc) // force stack walk
+ useStackAndCall(200, func() {
+ defer func() {
+ recover()
+ Callers(0, pc) // force stack walk
+ }()
+ panic(3)
+ })
+ }()
+ panic(2)
+ })
+ }()
+ panic(1)
+}
+
+func TestPanicFar(t *testing.T) {
+ var xtree *xtreeNode
+ pc := make([]uintptr, 10000)
+ defer func() {
+ // At this point we created a large stack and unwound
+ // it via recovery. Force a stack walk, which will
+ // check the stack's consistency.
+ Callers(0, pc)
+ }()
+ defer func() {
+ recover()
+ }()
+ useStackAndCall(100, func() {
+ // Kick off the GC and make it do something nontrivial.
+ // (This used to force stack barriers to stick around.)
+ xtree = makeTree(18)
+ // Give the GC time to start scanning stacks.
+ time.Sleep(time.Millisecond)
+ panic(1)
+ })
+ _ = xtree
+}
+
+type xtreeNode struct {
+ l, r *xtreeNode
+}
+
+func makeTree(d int) *xtreeNode {
+ if d == 0 {
+ return new(xtreeNode)
+ }
+ return &xtreeNode{makeTree(d - 1), makeTree(d - 1)}
+}
+
+// use about n KB of stack and call f
+func useStackAndCall(n int, f func()) {
+ if n == 0 {
+ f()
+ return
+ }
+ var b [1024]byte // makes frame about 1KB
+ useStackAndCall(n-1+int(b[99]), f)
+}
+
+func useStack(n int) {
+ useStackAndCall(n, func() {})
+}
+
+func growing(c chan int, done chan struct{}) {
+ for n := range c {
+ useStack(n)
+ done <- struct{}{}
+ }
+ done <- struct{}{}
+}
+
+func TestStackCache(t *testing.T) {
+ // Allocate a bunch of goroutines and grow their stacks.
+ // Repeat a few times to test the stack cache.
+ const (
+ R = 4
+ G = 200
+ S = 5
+ )
+ for i := 0; i < R; i++ {
+ var reqchans [G]chan int
+ done := make(chan struct{})
+ for j := 0; j < G; j++ {
+ reqchans[j] = make(chan int)
+ go growing(reqchans[j], done)
+ }
+ for s := 0; s < S; s++ {
+ for j := 0; j < G; j++ {
+ reqchans[j] <- 1 << uint(s)
+ }
+ for j := 0; j < G; j++ {
+ <-done
+ }
+ }
+ for j := 0; j < G; j++ {
+ close(reqchans[j])
+ }
+ for j := 0; j < G; j++ {
+ <-done
+ }
+ }
+}
+
+func TestStackOutput(t *testing.T) {
+ b := make([]byte, 1024)
+ stk := string(b[:Stack(b, false)])
+ if !strings.HasPrefix(stk, "goroutine ") {
+ t.Errorf("Stack (len %d):\n%s", len(stk), stk)
+ t.Errorf("Stack output should begin with \"goroutine \"")
+ }
+}
+
+func TestStackAllOutput(t *testing.T) {
+ b := make([]byte, 1024)
+ stk := string(b[:Stack(b, true)])
+ if !strings.HasPrefix(stk, "goroutine ") {
+ t.Errorf("Stack (len %d):\n%s", len(stk), stk)
+ t.Errorf("Stack output should begin with \"goroutine \"")
+ }
+}
+
+func TestStackPanic(t *testing.T) {
+ // Test that stack copying copies panics correctly. This is difficult
+ // to test because it is very unlikely that the stack will be copied
+ // in the middle of gopanic. But it can happen.
+ // To make this test effective, edit panic.go:gopanic and uncomment
+ // the GC() call just before freedefer(d).
+ defer func() {
+ if x := recover(); x == nil {
+ t.Errorf("recover failed")
+ }
+ }()
+ useStack(32)
+ panic("test panic")
+}
+
+func BenchmarkStackCopyPtr(b *testing.B) {
+ c := make(chan bool)
+ for i := 0; i < b.N; i++ {
+ go func() {
+ i := 1000000
+ countp(&i)
+ c <- true
+ }()
+ <-c
+ }
+}
+
+func countp(n *int) {
+ if *n == 0 {
+ return
+ }
+ *n--
+ countp(n)
+}
+
+func BenchmarkStackCopy(b *testing.B) {
+ c := make(chan bool)
+ for i := 0; i < b.N; i++ {
+ go func() {
+ count(1000000)
+ c <- true
+ }()
+ <-c
+ }
+}
+
+func count(n int) int {
+ if n == 0 {
+ return 0
+ }
+ return 1 + count(n-1)
+}
+
+func BenchmarkStackCopyNoCache(b *testing.B) {
+ c := make(chan bool)
+ for i := 0; i < b.N; i++ {
+ go func() {
+ count1(1000000)
+ c <- true
+ }()
+ <-c
+ }
+}
+
+func count1(n int) int {
+ if n <= 0 {
+ return 0
+ }
+ return 1 + count2(n-1)
+}
+
+func count2(n int) int { return 1 + count3(n-1) }
+func count3(n int) int { return 1 + count4(n-1) }
+func count4(n int) int { return 1 + count5(n-1) }
+func count5(n int) int { return 1 + count6(n-1) }
+func count6(n int) int { return 1 + count7(n-1) }
+func count7(n int) int { return 1 + count8(n-1) }
+func count8(n int) int { return 1 + count9(n-1) }
+func count9(n int) int { return 1 + count10(n-1) }
+func count10(n int) int { return 1 + count11(n-1) }
+func count11(n int) int { return 1 + count12(n-1) }
+func count12(n int) int { return 1 + count13(n-1) }
+func count13(n int) int { return 1 + count14(n-1) }
+func count14(n int) int { return 1 + count15(n-1) }
+func count15(n int) int { return 1 + count16(n-1) }
+func count16(n int) int { return 1 + count17(n-1) }
+func count17(n int) int { return 1 + count18(n-1) }
+func count18(n int) int { return 1 + count19(n-1) }
+func count19(n int) int { return 1 + count20(n-1) }
+func count20(n int) int { return 1 + count21(n-1) }
+func count21(n int) int { return 1 + count22(n-1) }
+func count22(n int) int { return 1 + count23(n-1) }
+func count23(n int) int { return 1 + count1(n-1) }
+
+type structWithMethod struct{}
+
+func (s structWithMethod) caller() string {
+ _, file, line, ok := Caller(1)
+ if !ok {
+ panic("Caller failed")
+ }
+ return fmt.Sprintf("%s:%d", file, line)
+}
+
+func (s structWithMethod) callers() []uintptr {
+ pc := make([]uintptr, 16)
+ return pc[:Callers(0, pc)]
+}
+
+func (s structWithMethod) stack() string {
+ buf := make([]byte, 4<<10)
+ return string(buf[:Stack(buf, false)])
+}
+
+func (s structWithMethod) nop() {}
+
+func TestStackWrapperCaller(t *testing.T) {
+ var d structWithMethod
+ // Force the compiler to construct a wrapper method.
+ wrapper := (*structWithMethod).caller
+ // Check that the wrapper doesn't affect the stack trace.
+ if dc, ic := d.caller(), wrapper(&d); dc != ic {
+ t.Fatalf("direct caller %q != indirect caller %q", dc, ic)
+ }
+}
+
+func TestStackWrapperCallers(t *testing.T) {
+ var d structWithMethod
+ wrapper := (*structWithMethod).callers
+ // Check that <autogenerated> doesn't appear in the stack trace.
+ pcs := wrapper(&d)
+ frames := CallersFrames(pcs)
+ for {
+ fr, more := frames.Next()
+ if fr.File == "<autogenerated>" {
+ t.Fatalf("<autogenerated> appears in stack trace: %+v", fr)
+ }
+ if !more {
+ break
+ }
+ }
+}
+
+func TestStackWrapperStack(t *testing.T) {
+ var d structWithMethod
+ wrapper := (*structWithMethod).stack
+ // Check that <autogenerated> doesn't appear in the stack trace.
+ stk := wrapper(&d)
+ if strings.Contains(stk, "<autogenerated>") {
+ t.Fatalf("<autogenerated> appears in stack trace:\n%s", stk)
+ }
+}
+
+type I interface {
+ M()
+}
+
+func TestStackWrapperStackPanic(t *testing.T) {
+ t.Run("sigpanic", func(t *testing.T) {
+ // nil calls to interface methods cause a sigpanic.
+ testStackWrapperPanic(t, func() { I.M(nil) }, "runtime_test.I.M")
+ })
+ t.Run("panicwrap", func(t *testing.T) {
+ // Nil calls to value method wrappers call panicwrap.
+ wrapper := (*structWithMethod).nop
+ testStackWrapperPanic(t, func() { wrapper(nil) }, "runtime_test.(*structWithMethod).nop")
+ })
+}
+
+func testStackWrapperPanic(t *testing.T, cb func(), expect string) {
+ // Test that the stack trace from a panicking wrapper includes
+ // the wrapper, even though we elide these when they don't panic.
+ t.Run("CallersFrames", func(t *testing.T) {
+ defer func() {
+ err := recover()
+ if err == nil {
+ t.Fatalf("expected panic")
+ }
+ pcs := make([]uintptr, 10)
+ n := Callers(0, pcs)
+ frames := CallersFrames(pcs[:n])
+ for {
+ frame, more := frames.Next()
+ t.Log(frame.Function)
+ if frame.Function == expect {
+ return
+ }
+ if !more {
+ break
+ }
+ }
+ t.Fatalf("panicking wrapper %s missing from stack trace", expect)
+ }()
+ cb()
+ })
+ t.Run("Stack", func(t *testing.T) {
+ defer func() {
+ err := recover()
+ if err == nil {
+ t.Fatalf("expected panic")
+ }
+ buf := make([]byte, 4<<10)
+ stk := string(buf[:Stack(buf, false)])
+ if !strings.Contains(stk, "\n"+expect) {
+ t.Fatalf("panicking wrapper %s missing from stack trace:\n%s", expect, stk)
+ }
+ }()
+ cb()
+ })
+}
+
+func TestCallersFromWrapper(t *testing.T) {
+ // Test that invoking CallersFrames on a stack where the first
+ // PC is an autogenerated wrapper keeps the wrapper in the
+ // trace. Normally we elide these, assuming that the wrapper
+ // calls the thing you actually wanted to see, but in this
+ // case we need to keep it.
+ pc := reflect.ValueOf(I.M).Pointer()
+ frames := CallersFrames([]uintptr{pc})
+ frame, more := frames.Next()
+ if frame.Function != "runtime_test.I.M" {
+ t.Fatalf("want function %s, got %s", "runtime_test.I.M", frame.Function)
+ }
+ if more {
+ t.Fatalf("want 1 frame, got > 1")
+ }
+}
+
+func TestTracebackSystemstack(t *testing.T) {
+ if GOARCH == "ppc64" || GOARCH == "ppc64le" {
+ t.Skip("systemstack tail call not implemented on ppc64x")
+ }
+
+ // Test that profiles correctly jump over systemstack,
+ // including nested systemstack calls.
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:TracebackSystemstack(pcs, 5)]
+ // Check that runtime.TracebackSystemstack appears five times
+ // and that we see TestTracebackSystemstack.
+ countIn, countOut := 0, 0
+ frames := CallersFrames(pcs)
+ var tb bytes.Buffer
+ for {
+ frame, more := frames.Next()
+ fmt.Fprintf(&tb, "\n%s+0x%x %s:%d", frame.Function, frame.PC-frame.Entry, frame.File, frame.Line)
+ switch frame.Function {
+ case "runtime.TracebackSystemstack":
+ countIn++
+ case "runtime_test.TestTracebackSystemstack":
+ countOut++
+ }
+ if !more {
+ break
+ }
+ }
+ if countIn != 5 || countOut != 1 {
+ t.Fatalf("expected 5 calls to TracebackSystemstack and 1 call to TestTracebackSystemstack, got:%s", tb.String())
+ }
+}
+
+func TestTracebackAncestors(t *testing.T) {
+ goroutineRegex := regexp.MustCompile(`goroutine [0-9]+ \[`)
+ for _, tracebackDepth := range []int{0, 1, 5, 50} {
+ output := runTestProg(t, "testprog", "TracebackAncestors", fmt.Sprintf("GODEBUG=tracebackancestors=%d", tracebackDepth))
+
+ numGoroutines := 3
+ numFrames := 2
+ ancestorsExpected := numGoroutines
+ if numGoroutines > tracebackDepth {
+ ancestorsExpected = tracebackDepth
+ }
+
+ matches := goroutineRegex.FindAllStringSubmatch(output, -1)
+ if len(matches) != 2 {
+ t.Fatalf("want 2 goroutines, got:\n%s", output)
+ }
+
+ // Check functions in the traceback.
+ fns := []string{"main.recurseThenCallGo", "main.main", "main.printStack", "main.TracebackAncestors"}
+ for _, fn := range fns {
+ if !strings.Contains(output, "\n"+fn+"(") {
+ t.Fatalf("expected %q function in traceback:\n%s", fn, output)
+ }
+ }
+
+ if want, count := "originating from goroutine", ancestorsExpected; strings.Count(output, want) != count {
+ t.Errorf("output does not contain %d instances of %q:\n%s", count, want, output)
+ }
+
+ if want, count := "main.recurseThenCallGo(...)", ancestorsExpected*(numFrames+1); strings.Count(output, want) != count {
+ t.Errorf("output does not contain %d instances of %q:\n%s", count, want, output)
+ }
+
+ if want, count := "main.recurseThenCallGo(0x", 1; strings.Count(output, want) != count {
+ t.Errorf("output does not contain %d instances of %q:\n%s", count, want, output)
+ }
+ }
+}
+
+// Test that defer closure is correctly scanned when the stack is scanned.
+func TestDeferLiveness(t *testing.T) {
+ output := runTestProg(t, "testprog", "DeferLiveness", "GODEBUG=clobberfree=1")
+ if output != "" {
+ t.Errorf("output:\n%s\n\nwant no output", output)
+ }
+}
+
+func TestDeferHeapAndStack(t *testing.T) {
+ P := 4 // processors
+ N := 10000 // iterations
+ D := 200 // stack depth
+
+ if testing.Short() {
+ P /= 2
+ N /= 10
+ D /= 10
+ }
+ c := make(chan bool)
+ for p := 0; p < P; p++ {
+ go func() {
+ for i := 0; i < N; i++ {
+ if deferHeapAndStack(D) != 2*D {
+ panic("bad result")
+ }
+ }
+ c <- true
+ }()
+ }
+ for p := 0; p < P; p++ {
+ <-c
+ }
+}
+
+// deferHeapAndStack(n) computes 2*n
+func deferHeapAndStack(n int) (r int) {
+ if n == 0 {
+ return 0
+ }
+ if n%2 == 0 {
+ // heap-allocated defers
+ for i := 0; i < 2; i++ {
+ defer func() {
+ r++
+ }()
+ }
+ } else {
+ // stack-allocated defers
+ defer func() {
+ r++
+ }()
+ defer func() {
+ r++
+ }()
+ }
+ r = deferHeapAndStack(n - 1)
+ escapeMe(new([1024]byte)) // force some GCs
+ return
+}
+
+// Pass a value to escapeMe to force it to escape.
+var escapeMe = func(x interface{}) {}
+
+// Test that when F -> G is inlined and F is excluded from stack
+// traces, G still appears.
+func TestTracebackInlineExcluded(t *testing.T) {
+ defer func() {
+ recover()
+ buf := make([]byte, 4<<10)
+ stk := string(buf[:Stack(buf, false)])
+
+ t.Log(stk)
+
+ if not := "tracebackExcluded"; strings.Contains(stk, not) {
+ t.Errorf("found but did not expect %q", not)
+ }
+ if want := "tracebackNotExcluded"; !strings.Contains(stk, want) {
+ t.Errorf("expected %q in stack", want)
+ }
+ }()
+ tracebackExcluded()
+}
+
+// tracebackExcluded should be excluded from tracebacks. There are
+// various ways this could come up. Linking it to a "runtime." name is
+// rather synthetic, but it's easy and reliable. See issue #42754 for
+// one way this happened in real code.
+//
+//go:linkname tracebackExcluded runtime.tracebackExcluded
+//go:noinline
+func tracebackExcluded() {
+ // Call an inlined function that should not itself be excluded
+ // from tracebacks.
+ tracebackNotExcluded()
+}
+
+// tracebackNotExcluded should be inlined into tracebackExcluded, but
+// should not itself be excluded from the traceback.
+func tracebackNotExcluded() {
+ var x *int
+ *x = 0
+}
diff --git a/src/runtime/string.go b/src/runtime/string.go
new file mode 100644
index 0000000..9a601f0
--- /dev/null
+++ b/src/runtime/string.go
@@ -0,0 +1,485 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/bytealg"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// The constant is known to the compiler.
+// There is no fundamental theory behind this number.
+const tmpStringBufSize = 32
+
+type tmpBuf [tmpStringBufSize]byte
+
+// concatstrings implements a Go string concatenation x+y+z+...
+// The operands are passed in the slice a.
+// If buf != nil, the compiler has determined that the result does not
+// escape the calling function, so the string data can be stored in buf
+// if small enough.
+func concatstrings(buf *tmpBuf, a []string) string {
+ idx := 0
+ l := 0
+ count := 0
+ for i, x := range a {
+ n := len(x)
+ if n == 0 {
+ continue
+ }
+ if l+n < l {
+ throw("string concatenation too long")
+ }
+ l += n
+ count++
+ idx = i
+ }
+ if count == 0 {
+ return ""
+ }
+
+ // If there is just one string and either it is not on the stack
+ // or our result does not escape the calling frame (buf != nil),
+ // then we can return that string directly.
+ if count == 1 && (buf != nil || !stringDataOnStack(a[idx])) {
+ return a[idx]
+ }
+ s, b := rawstringtmp(buf, l)
+ for _, x := range a {
+ copy(b, x)
+ b = b[len(x):]
+ }
+ return s
+}
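A standalone sketch, not part of this patch, of the non-escaping buf path the comment above describes; the zero-allocation outcome noted in the comments is what typical builds produce, not a guarantee:

package main

import (
	"fmt"
	"testing"
)

var a, b, c = "go", "pher", "s" // variables, so the concatenation is not constant-folded

func main() {
	allocs := testing.AllocsPerRun(100, func() {
		s := a + b + c // small result that does not escape the closure
		if len(s) != 7 {
			panic("unexpected")
		}
	})
	// Typically prints 0: the compiler passes a stack tmpBuf to concatstring3,
	// and the 7-byte result fits in the 32-byte buffer.
	fmt.Println("allocs:", allocs)
}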
+
+func concatstring2(buf *tmpBuf, a [2]string) string {
+ return concatstrings(buf, a[:])
+}
+
+func concatstring3(buf *tmpBuf, a [3]string) string {
+ return concatstrings(buf, a[:])
+}
+
+func concatstring4(buf *tmpBuf, a [4]string) string {
+ return concatstrings(buf, a[:])
+}
+
+func concatstring5(buf *tmpBuf, a [5]string) string {
+ return concatstrings(buf, a[:])
+}
+
+// slicebytetostring converts a byte slice to a string.
+// It is inserted by the compiler into generated code.
+// ptr is a pointer to the first element of the slice;
+// n is the length of the slice.
+// buf is a fixed-size buffer for the result;
+// it is non-nil if the result does not escape the calling function.
+func slicebytetostring(buf *tmpBuf, ptr *byte, n int) (str string) {
+ if n == 0 {
+ // Turns out to be a relatively common case.
+		// Consider parsing out the data between parens in "foo()bar":
+		// you find the indices and convert the subslice to a string.
+ return ""
+ }
+ if raceenabled {
+ racereadrangepc(unsafe.Pointer(ptr),
+ uintptr(n),
+ getcallerpc(),
+ funcPC(slicebytetostring))
+ }
+ if msanenabled {
+ msanread(unsafe.Pointer(ptr), uintptr(n))
+ }
+ if n == 1 {
+ p := unsafe.Pointer(&staticuint64s[*ptr])
+ if sys.BigEndian {
+ p = add(p, 7)
+ }
+ stringStructOf(&str).str = p
+ stringStructOf(&str).len = 1
+ return
+ }
+
+ var p unsafe.Pointer
+ if buf != nil && n <= len(buf) {
+ p = unsafe.Pointer(buf)
+ } else {
+ p = mallocgc(uintptr(n), nil, false)
+ }
+ stringStructOf(&str).str = p
+ stringStructOf(&str).len = n
+ memmove(p, unsafe.Pointer(ptr), uintptr(n))
+ return
+}
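A hedged illustration of the two paths above: the one-byte fast path backed by the static byte table versus the general mallocgc path. The sink variable exists only to force the second conversion to escape; the printed counts reflect typical builds:

package main

import (
	"fmt"
	"testing"
)

var sink string

func main() {
	one := []byte{'x'}
	many := []byte("a byte slice that is clearly longer than thirty-two bytes")

	oneAllocs := testing.AllocsPerRun(100, func() {
		sink = string(one) // n == 1: result points into the static byte table, no allocation
	})
	manyAllocs := testing.AllocsPerRun(100, func() {
		sink = string(many) // escapes and is too big for tmpBuf, so mallocgc is used
	})
	fmt.Println(oneAllocs, manyAllocs) // typically 0 and 1
}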
+
+// stringDataOnStack reports whether the string's data is
+// stored on the current goroutine's stack.
+func stringDataOnStack(s string) bool {
+ ptr := uintptr(stringStructOf(&s).str)
+ stk := getg().stack
+ return stk.lo <= ptr && ptr < stk.hi
+}
+
+func rawstringtmp(buf *tmpBuf, l int) (s string, b []byte) {
+ if buf != nil && l <= len(buf) {
+ b = buf[:l]
+ s = slicebytetostringtmp(&b[0], len(b))
+ } else {
+ s, b = rawstring(l)
+ }
+ return
+}
+
+// slicebytetostringtmp returns a "string" referring to the actual []byte bytes.
+//
+// Callers need to ensure that the returned string will not be used after
+// the calling goroutine modifies the original slice or synchronizes with
+// another goroutine.
+//
+// The function is only called when instrumenting
+// and otherwise intrinsified by the compiler.
+//
+// Some internal compiler optimizations use this function.
+// - Used for m[T1{... Tn{..., string(k), ...} ...}] and m[string(k)]
+// where k is []byte, T1 to Tn is a nesting of struct and array literals.
+// - Used for "<"+string(b)+">" concatenation where b is []byte.
+// - Used for string(b)=="foo" comparison where b is []byte.
+func slicebytetostringtmp(ptr *byte, n int) (str string) {
+ if raceenabled && n > 0 {
+ racereadrangepc(unsafe.Pointer(ptr),
+ uintptr(n),
+ getcallerpc(),
+ funcPC(slicebytetostringtmp))
+ }
+ if msanenabled && n > 0 {
+ msanread(unsafe.Pointer(ptr), uintptr(n))
+ }
+ stringStructOf(&str).str = unsafe.Pointer(ptr)
+ stringStructOf(&str).len = n
+ return
+}
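The three compiler patterns listed above can be observed from user code; this standalone sketch (not part of the patch) usually reports zero allocations for all of them:

package main

import (
	"fmt"
	"testing"
)

func main() {
	m := map[string]int{"key": 1}
	k := []byte("key")
	b := []byte("foo")

	allocs := testing.AllocsPerRun(100, func() {
		_ = m[string(k)]          // map index: the key bytes are read in place, no copy
		_ = string(b) == "foo"    // comparison: the temporary string aliases b
		_ = "<" + string(b) + ">" // concatenation: b is read directly; the small
		//                           non-escaping result uses the stack tmpBuf
	})
	fmt.Println("allocs:", allocs) // typically 0 for all three patterns
}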
+
+func stringtoslicebyte(buf *tmpBuf, s string) []byte {
+ var b []byte
+ if buf != nil && len(s) <= len(buf) {
+ *buf = tmpBuf{}
+ b = buf[:len(s)]
+ } else {
+ b = rawbyteslice(len(s))
+ }
+ copy(b, s)
+ return b
+}
+
+func stringtoslicerune(buf *[tmpStringBufSize]rune, s string) []rune {
+ // two passes.
+ // unlike slicerunetostring, no race because strings are immutable.
+ n := 0
+ for range s {
+ n++
+ }
+
+ var a []rune
+ if buf != nil && n <= len(buf) {
+ *buf = [tmpStringBufSize]rune{}
+ a = buf[:n]
+ } else {
+ a = rawruneslice(n)
+ }
+
+ n = 0
+ for _, r := range s {
+ a[n] = r
+ n++
+ }
+ return a
+}
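The user-visible effect of the two passes above, counting runes first and decoding them second:

package main

import "fmt"

func main() {
	s := "héllo"   // 6 bytes of UTF-8, 5 runes
	r := []rune(s) // pass 1 counts runes to size the slice, pass 2 decodes them
	fmt.Println(len(s), len(r)) // 6 5
	fmt.Println(string(r[1]))   // "é"
}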
+
+func slicerunetostring(buf *tmpBuf, a []rune) string {
+ if raceenabled && len(a) > 0 {
+ racereadrangepc(unsafe.Pointer(&a[0]),
+ uintptr(len(a))*unsafe.Sizeof(a[0]),
+ getcallerpc(),
+ funcPC(slicerunetostring))
+ }
+ if msanenabled && len(a) > 0 {
+ msanread(unsafe.Pointer(&a[0]), uintptr(len(a))*unsafe.Sizeof(a[0]))
+ }
+ var dum [4]byte
+ size1 := 0
+ for _, r := range a {
+ size1 += encoderune(dum[:], r)
+ }
+ s, b := rawstringtmp(buf, size1+3)
+ size2 := 0
+ for _, r := range a {
+ // check for race
+ if size2 >= size1 {
+ break
+ }
+ size2 += encoderune(b[size2:], r)
+ }
+ return s[:size2]
+}
+
+type stringStruct struct {
+ str unsafe.Pointer
+ len int
+}
+
+// Variant with *byte pointer type for DWARF debugging.
+type stringStructDWARF struct {
+ str *byte
+ len int
+}
+
+func stringStructOf(sp *string) *stringStruct {
+ return (*stringStruct)(unsafe.Pointer(sp))
+}
+
+func intstring(buf *[4]byte, v int64) (s string) {
+ var b []byte
+ if buf != nil {
+ b = buf[:]
+ s = slicebytetostringtmp(&b[0], len(b))
+ } else {
+ s, b = rawstring(4)
+ }
+ if int64(rune(v)) != v {
+ v = runeError
+ }
+ n := encoderune(b, rune(v))
+ return s[:n]
+}
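For reference, the user-visible behavior that intstring implements, including the runeError substitution for out-of-range values:

package main

import "fmt"

func main() {
	fmt.Println(string(rune(65)))      // "A"
	fmt.Println(string(rune(0x4E2D)))  // "中": encoderune writes the UTF-8 bytes
	fmt.Println(string(rune(-1)))      // "\uFFFD": out-of-range values become runeError
	fmt.Println(len(string(rune(-1)))) // 3: the replacement character is three bytes
}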
+
+// rawstring allocates storage for a new string. The returned
+// string and byte slice both refer to the same storage.
+// The storage is not zeroed. Callers should use
+// b to set the string contents and then drop b.
+func rawstring(size int) (s string, b []byte) {
+ p := mallocgc(uintptr(size), nil, false)
+
+ stringStructOf(&s).str = p
+ stringStructOf(&s).len = size
+
+ *(*slice)(unsafe.Pointer(&b)) = slice{p, size, size}
+
+ return
+}
+
+// rawbyteslice allocates a new byte slice. The byte slice is not zeroed.
+func rawbyteslice(size int) (b []byte) {
+ cap := roundupsize(uintptr(size))
+ p := mallocgc(cap, nil, false)
+ if cap != uintptr(size) {
+ memclrNoHeapPointers(add(p, uintptr(size)), cap-uintptr(size))
+ }
+
+ *(*slice)(unsafe.Pointer(&b)) = slice{p, size, int(cap)}
+ return
+}
+
+// rawruneslice allocates a new rune slice. The rune slice is not zeroed.
+func rawruneslice(size int) (b []rune) {
+ if uintptr(size) > maxAlloc/4 {
+ throw("out of memory")
+ }
+ mem := roundupsize(uintptr(size) * 4)
+ p := mallocgc(mem, nil, false)
+ if mem != uintptr(size)*4 {
+ memclrNoHeapPointers(add(p, uintptr(size)*4), mem-uintptr(size)*4)
+ }
+
+ *(*slice)(unsafe.Pointer(&b)) = slice{p, size, int(mem / 4)}
+ return
+}
+
+// used by cmd/cgo
+func gobytes(p *byte, n int) (b []byte) {
+ if n == 0 {
+ return make([]byte, 0)
+ }
+
+ if n < 0 || uintptr(n) > maxAlloc {
+ panic(errorString("gobytes: length out of range"))
+ }
+
+ bp := mallocgc(uintptr(n), nil, false)
+ memmove(bp, unsafe.Pointer(p), uintptr(n))
+
+ *(*slice)(unsafe.Pointer(&b)) = slice{bp, n, n}
+ return
+}
+
+// This is exported via linkname to assembly in syscall (for Plan9).
+//go:linkname gostring
+func gostring(p *byte) string {
+ l := findnull(p)
+ if l == 0 {
+ return ""
+ }
+ s, b := rawstring(l)
+ memmove(unsafe.Pointer(&b[0]), unsafe.Pointer(p), uintptr(l))
+ return s
+}
+
+func gostringn(p *byte, l int) string {
+ if l == 0 {
+ return ""
+ }
+ s, b := rawstring(l)
+ memmove(unsafe.Pointer(&b[0]), unsafe.Pointer(p), uintptr(l))
+ return s
+}
+
+func hasPrefix(s, prefix string) bool {
+ return len(s) >= len(prefix) && s[:len(prefix)] == prefix
+}
+
+const (
+ maxUint = ^uint(0)
+ maxInt = int(maxUint >> 1)
+)
+
+// atoi parses an int from a string s.
+// The bool result reports whether s is a number
+// representable by a value of type int.
+func atoi(s string) (int, bool) {
+ if s == "" {
+ return 0, false
+ }
+
+ neg := false
+ if s[0] == '-' {
+ neg = true
+ s = s[1:]
+ }
+
+ un := uint(0)
+ for i := 0; i < len(s); i++ {
+ c := s[i]
+ if c < '0' || c > '9' {
+ return 0, false
+ }
+ if un > maxUint/10 {
+ // overflow
+ return 0, false
+ }
+ un *= 10
+ un1 := un + uint(c) - '0'
+ if un1 < un {
+ // overflow
+ return 0, false
+ }
+ un = un1
+ }
+
+ if !neg && un > uint(maxInt) {
+ return 0, false
+ }
+ if neg && un > uint(maxInt)+1 {
+ return 0, false
+ }
+
+ n := int(un)
+ if neg {
+ n = -n
+ }
+
+ return n, true
+}
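atoi itself is unexported, so this sketch uses strconv.Atoi, which agrees with it at the boundaries the code above checks (shown here for a 64-bit int):

package main

import (
	"fmt"
	"strconv"
)

func main() {
	for _, s := range []string{
		"9223372036854775807",  // maxInt: accepted
		"9223372036854775808",  // maxInt+1: rejected by the overflow checks
		"-9223372036854775808", // minInt: accepted (one more than -maxInt)
		"012345",               // leading zeros are fine, still decimal
		"12345x",               // trailing garbage: rejected
	} {
		n, err := strconv.Atoi(s)
		fmt.Printf("%-22s -> %d ok=%v\n", s, n, err == nil)
	}
}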
+
+// atoi32 is like atoi but for integers
+// that fit into an int32.
+func atoi32(s string) (int32, bool) {
+ if n, ok := atoi(s); n == int(int32(n)) {
+ return int32(n), ok
+ }
+ return 0, false
+}
+
+//go:nosplit
+func findnull(s *byte) int {
+ if s == nil {
+ return 0
+ }
+
+ // Avoid IndexByteString on Plan 9 because it uses SSE instructions
+ // on x86 machines, and those are classified as floating point instructions,
+ // which are illegal in a note handler.
+ if GOOS == "plan9" {
+ p := (*[maxAlloc/2 - 1]byte)(unsafe.Pointer(s))
+ l := 0
+ for p[l] != 0 {
+ l++
+ }
+ return l
+ }
+
+ // pageSize is the unit we scan at a time looking for NULL.
+ // It must be the minimum page size for any architecture Go
+ // runs on. It's okay (just a minor performance loss) if the
+ // actual system page size is larger than this value.
+ const pageSize = 4096
+
+ offset := 0
+ ptr := unsafe.Pointer(s)
+ // IndexByteString uses wide reads, so we need to be careful
+ // with page boundaries. Call IndexByteString on
+ // [ptr, endOfPage) interval.
+ safeLen := int(pageSize - uintptr(ptr)%pageSize)
+
+ for {
+ t := *(*string)(unsafe.Pointer(&stringStruct{ptr, safeLen}))
+ // Check one page at a time.
+ if i := bytealg.IndexByteString(t, 0); i != -1 {
+ return offset + i
+ }
+ // Move to next page
+ ptr = unsafe.Pointer(uintptr(ptr) + uintptr(safeLen))
+ offset += safeLen
+ safeLen = pageSize
+ }
+}
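A small worked example of the safeLen arithmetic above, showing why the first wide scan never crosses the page containing the start of the string:

package main

import "fmt"

func main() {
	const pageSize = 4096
	// Suppose the NUL-terminated string starts 10 bytes before a page boundary.
	ptr := uintptr(5*pageSize - 10)
	safeLen := int(pageSize - ptr%pageSize)
	fmt.Println(safeLen) // 10: the first IndexByteString call stops at the boundary
	// Subsequent iterations advance ptr to the boundary and scan one full page
	// (4096 bytes) at a time, so a wide read never touches an unmapped page.
}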
+
+func findnullw(s *uint16) int {
+ if s == nil {
+ return 0
+ }
+ p := (*[maxAlloc/2/2 - 1]uint16)(unsafe.Pointer(s))
+ l := 0
+ for p[l] != 0 {
+ l++
+ }
+ return l
+}
+
+//go:nosplit
+func gostringnocopy(str *byte) string {
+ ss := stringStruct{str: unsafe.Pointer(str), len: findnull(str)}
+ s := *(*string)(unsafe.Pointer(&ss))
+ return s
+}
+
+func gostringw(strw *uint16) string {
+ var buf [8]byte
+ str := (*[maxAlloc/2/2 - 1]uint16)(unsafe.Pointer(strw))
+ n1 := 0
+ for i := 0; str[i] != 0; i++ {
+ n1 += encoderune(buf[:], rune(str[i]))
+ }
+ s, b := rawstring(n1 + 4)
+ n2 := 0
+ for i := 0; str[i] != 0; i++ {
+ // check for race
+ if n2 >= n1 {
+ break
+ }
+ n2 += encoderune(b[n2:], rune(str[i]))
+ }
+ b[n2] = 0 // for luck
+ return s[:n2]
+}
diff --git a/src/runtime/string_test.go b/src/runtime/string_test.go
new file mode 100644
index 0000000..4eda12c
--- /dev/null
+++ b/src/runtime/string_test.go
@@ -0,0 +1,456 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "strconv"
+ "strings"
+ "testing"
+ "unicode/utf8"
+)
+
+// Strings and slices that don't escape and fit into tmpBuf are stack allocated,
+// which defeats using AllocsPerRun to test other optimizations.
+const sizeNoStack = 100
+
+func BenchmarkCompareStringEqual(b *testing.B) {
+ bytes := []byte("Hello Gophers!")
+ s1, s2 := string(bytes), string(bytes)
+ for i := 0; i < b.N; i++ {
+ if s1 != s2 {
+ b.Fatal("s1 != s2")
+ }
+ }
+}
+
+func BenchmarkCompareStringIdentical(b *testing.B) {
+ s1 := "Hello Gophers!"
+ s2 := s1
+ for i := 0; i < b.N; i++ {
+ if s1 != s2 {
+ b.Fatal("s1 != s2")
+ }
+ }
+}
+
+func BenchmarkCompareStringSameLength(b *testing.B) {
+ s1 := "Hello Gophers!"
+ s2 := "Hello, Gophers"
+ for i := 0; i < b.N; i++ {
+ if s1 == s2 {
+ b.Fatal("s1 == s2")
+ }
+ }
+}
+
+func BenchmarkCompareStringDifferentLength(b *testing.B) {
+ s1 := "Hello Gophers!"
+ s2 := "Hello, Gophers!"
+ for i := 0; i < b.N; i++ {
+ if s1 == s2 {
+ b.Fatal("s1 == s2")
+ }
+ }
+}
+
+func BenchmarkCompareStringBigUnaligned(b *testing.B) {
+ bytes := make([]byte, 0, 1<<20)
+ for len(bytes) < 1<<20 {
+ bytes = append(bytes, "Hello Gophers!"...)
+ }
+ s1, s2 := string(bytes), "hello"+string(bytes)
+ for i := 0; i < b.N; i++ {
+ if s1 != s2[len("hello"):] {
+ b.Fatal("s1 != s2")
+ }
+ }
+ b.SetBytes(int64(len(s1)))
+}
+
+func BenchmarkCompareStringBig(b *testing.B) {
+ bytes := make([]byte, 0, 1<<20)
+ for len(bytes) < 1<<20 {
+ bytes = append(bytes, "Hello Gophers!"...)
+ }
+ s1, s2 := string(bytes), string(bytes)
+ for i := 0; i < b.N; i++ {
+ if s1 != s2 {
+ b.Fatal("s1 != s2")
+ }
+ }
+ b.SetBytes(int64(len(s1)))
+}
+
+func BenchmarkConcatStringAndBytes(b *testing.B) {
+ s1 := []byte("Gophers!")
+ for i := 0; i < b.N; i++ {
+ _ = "Hello " + string(s1)
+ }
+}
+
+var escapeString string
+
+func BenchmarkSliceByteToString(b *testing.B) {
+ buf := []byte{'!'}
+ for n := 0; n < 8; n++ {
+ b.Run(strconv.Itoa(len(buf)), func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ escapeString = string(buf)
+ }
+ })
+ buf = append(buf, buf...)
+ }
+}
+
+var stringdata = []struct{ name, data string }{
+ {"ASCII", "01234567890"},
+ {"Japanese", "日本語日本語日本語"},
+ {"MixedLength", "$Ѐࠀက퀀𐀀\U00040000\U0010FFFF"},
+}
+
+var sinkInt int
+
+func BenchmarkRuneCount(b *testing.B) {
+ // Each sub-benchmark counts the runes in a string in a different way.
+ b.Run("lenruneslice", func(b *testing.B) {
+ for _, sd := range stringdata {
+ b.Run(sd.name, func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ sinkInt += len([]rune(sd.data))
+ }
+ })
+ }
+ })
+ b.Run("rangeloop", func(b *testing.B) {
+ for _, sd := range stringdata {
+ b.Run(sd.name, func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ n := 0
+ for range sd.data {
+ n++
+ }
+ sinkInt += n
+ }
+ })
+ }
+ })
+ b.Run("utf8.RuneCountInString", func(b *testing.B) {
+ for _, sd := range stringdata {
+ b.Run(sd.name, func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ sinkInt += utf8.RuneCountInString(sd.data)
+ }
+ })
+ }
+ })
+}
+
+func BenchmarkRuneIterate(b *testing.B) {
+ b.Run("range", func(b *testing.B) {
+ for _, sd := range stringdata {
+ b.Run(sd.name, func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ for range sd.data {
+ }
+ }
+ })
+ }
+ })
+ b.Run("range1", func(b *testing.B) {
+ for _, sd := range stringdata {
+ b.Run(sd.name, func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ for range sd.data {
+ }
+ }
+ })
+ }
+ })
+ b.Run("range2", func(b *testing.B) {
+ for _, sd := range stringdata {
+ b.Run(sd.name, func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ for range sd.data {
+ }
+ }
+ })
+ }
+ })
+}
+
+func BenchmarkArrayEqual(b *testing.B) {
+ a1 := [16]byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
+ a2 := [16]byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ if a1 != a2 {
+ b.Fatal("not equal")
+ }
+ }
+}
+
+func TestStringW(t *testing.T) {
+ strings := []string{
+ "hello",
+ "a\u5566\u7788b",
+ }
+
+ for _, s := range strings {
+ var b []uint16
+ for _, c := range s {
+ b = append(b, uint16(c))
+ if c != rune(uint16(c)) {
+ t.Errorf("bad test: stringW can't handle >16 bit runes")
+ }
+ }
+ b = append(b, 0)
+ r := runtime.GostringW(b)
+ if r != s {
+ t.Errorf("gostringW(%v) = %s, want %s", b, r, s)
+ }
+ }
+}
+
+func TestLargeStringConcat(t *testing.T) {
+ output := runTestProg(t, "testprog", "stringconcat")
+ want := "panic: " + strings.Repeat("0", 1<<10) + strings.Repeat("1", 1<<10) +
+ strings.Repeat("2", 1<<10) + strings.Repeat("3", 1<<10)
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestCompareTempString(t *testing.T) {
+ s := strings.Repeat("x", sizeNoStack)
+ b := []byte(s)
+ n := testing.AllocsPerRun(1000, func() {
+ if string(b) != s {
+ t.Fatalf("strings are not equal: '%v' and '%v'", string(b), s)
+ }
+ if string(b) == s {
+ } else {
+ t.Fatalf("strings are not equal: '%v' and '%v'", string(b), s)
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func TestStringIndexHaystack(t *testing.T) {
+ // See issue 25864.
+ haystack := []byte("hello")
+ needle := "ll"
+ n := testing.AllocsPerRun(1000, func() {
+ if strings.Index(string(haystack), needle) != 2 {
+ t.Fatalf("needle not found")
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func TestStringIndexNeedle(t *testing.T) {
+ // See issue 25864.
+ haystack := "hello"
+ needle := []byte("ll")
+ n := testing.AllocsPerRun(1000, func() {
+ if strings.Index(haystack, string(needle)) != 2 {
+ t.Fatalf("needle not found")
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func TestStringOnStack(t *testing.T) {
+ s := ""
+ for i := 0; i < 3; i++ {
+ s = "a" + s + "b" + s + "c"
+ }
+
+ if want := "aaabcbabccbaabcbabccc"; s != want {
+ t.Fatalf("want: '%v', got '%v'", want, s)
+ }
+}
+
+func TestIntString(t *testing.T) {
+ // Non-escaping result of intstring.
+ s := ""
+ for i := rune(0); i < 4; i++ {
+ s += string(i+'0') + string(i+'0'+1)
+ }
+ if want := "01122334"; s != want {
+ t.Fatalf("want '%v', got '%v'", want, s)
+ }
+
+ // Escaping result of intstring.
+ var a [4]string
+ for i := rune(0); i < 4; i++ {
+ a[i] = string(i + '0')
+ }
+ s = a[0] + a[1] + a[2] + a[3]
+ if want := "0123"; s != want {
+ t.Fatalf("want '%v', got '%v'", want, s)
+ }
+}
+
+func TestIntStringAllocs(t *testing.T) {
+ unknown := '0'
+ n := testing.AllocsPerRun(1000, func() {
+ s1 := string(unknown)
+ s2 := string(unknown + 1)
+ if s1 == s2 {
+ t.Fatalf("bad")
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func TestRangeStringCast(t *testing.T) {
+ s := strings.Repeat("x", sizeNoStack)
+ n := testing.AllocsPerRun(1000, func() {
+ for i, c := range []byte(s) {
+ if c != s[i] {
+ t.Fatalf("want '%c' at pos %v, got '%c'", s[i], i, c)
+ }
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func isZeroed(b []byte) bool {
+ for _, x := range b {
+ if x != 0 {
+ return false
+ }
+ }
+ return true
+}
+
+func isZeroedR(r []rune) bool {
+ for _, x := range r {
+ if x != 0 {
+ return false
+ }
+ }
+ return true
+}
+
+func TestString2Slice(t *testing.T) {
+ // Make sure we don't return slices that expose
+ // an unzeroed section of stack-allocated temp buf
+ // between len and cap. See issue 14232.
+ s := "foož"
+ b := ([]byte)(s)
+ if !isZeroed(b[len(b):cap(b)]) {
+ t.Errorf("extra bytes not zeroed")
+ }
+ r := ([]rune)(s)
+ if !isZeroedR(r[len(r):cap(r)]) {
+ t.Errorf("extra runes not zeroed")
+ }
+}
+
+const intSize = 32 << (^uint(0) >> 63)
+
+type atoi64Test struct {
+ in string
+ out int64
+ ok bool
+}
+
+var atoi64tests = []atoi64Test{
+ {"", 0, false},
+ {"0", 0, true},
+ {"-0", 0, true},
+ {"1", 1, true},
+ {"-1", -1, true},
+ {"12345", 12345, true},
+ {"-12345", -12345, true},
+ {"012345", 12345, true},
+ {"-012345", -12345, true},
+ {"12345x", 0, false},
+ {"-12345x", 0, false},
+ {"98765432100", 98765432100, true},
+ {"-98765432100", -98765432100, true},
+ {"20496382327982653440", 0, false},
+ {"-20496382327982653440", 0, false},
+ {"9223372036854775807", 1<<63 - 1, true},
+ {"-9223372036854775807", -(1<<63 - 1), true},
+ {"9223372036854775808", 0, false},
+ {"-9223372036854775808", -1 << 63, true},
+ {"9223372036854775809", 0, false},
+ {"-9223372036854775809", 0, false},
+}
+
+func TestAtoi(t *testing.T) {
+ switch intSize {
+ case 32:
+ for i := range atoi32tests {
+ test := &atoi32tests[i]
+ out, ok := runtime.Atoi(test.in)
+ if test.out != int32(out) || test.ok != ok {
+ t.Errorf("atoi(%q) = (%v, %v) want (%v, %v)",
+ test.in, out, ok, test.out, test.ok)
+ }
+ }
+ case 64:
+ for i := range atoi64tests {
+ test := &atoi64tests[i]
+ out, ok := runtime.Atoi(test.in)
+ if test.out != int64(out) || test.ok != ok {
+ t.Errorf("atoi(%q) = (%v, %v) want (%v, %v)",
+ test.in, out, ok, test.out, test.ok)
+ }
+ }
+ }
+}
+
+type atoi32Test struct {
+ in string
+ out int32
+ ok bool
+}
+
+var atoi32tests = []atoi32Test{
+ {"", 0, false},
+ {"0", 0, true},
+ {"-0", 0, true},
+ {"1", 1, true},
+ {"-1", -1, true},
+ {"12345", 12345, true},
+ {"-12345", -12345, true},
+ {"012345", 12345, true},
+ {"-012345", -12345, true},
+ {"12345x", 0, false},
+ {"-12345x", 0, false},
+ {"987654321", 987654321, true},
+ {"-987654321", -987654321, true},
+ {"2147483647", 1<<31 - 1, true},
+ {"-2147483647", -(1<<31 - 1), true},
+ {"2147483648", 0, false},
+ {"-2147483648", -1 << 31, true},
+ {"2147483649", 0, false},
+ {"-2147483649", 0, false},
+}
+
+func TestAtoi32(t *testing.T) {
+ for i := range atoi32tests {
+ test := &atoi32tests[i]
+ out, ok := runtime.Atoi32(test.in)
+ if test.out != out || test.ok != ok {
+ t.Errorf("atoi32(%q) = (%v, %v) want (%v, %v)",
+ test.in, out, ok, test.out, test.ok)
+ }
+ }
+}
diff --git a/src/runtime/stubs.go b/src/runtime/stubs.go
new file mode 100644
index 0000000..2ee2c74
--- /dev/null
+++ b/src/runtime/stubs.go
@@ -0,0 +1,359 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// Should be a built-in for unsafe.Pointer?
+//go:nosplit
+func add(p unsafe.Pointer, x uintptr) unsafe.Pointer {
+ return unsafe.Pointer(uintptr(p) + x)
+}
+
+// getg returns the pointer to the current g.
+// The compiler rewrites calls to this function into instructions
+// that fetch the g directly (from TLS or from the dedicated register).
+func getg() *g
+
+// mcall switches from the g to the g0 stack and invokes fn(g),
+// where g is the goroutine that made the call.
+// mcall saves g's current PC/SP in g->sched so that it can be restored later.
+// It is up to fn to arrange for that later execution, typically by recording
+// g in a data structure, causing something to call ready(g) later.
+// mcall returns to the original goroutine g later, when g has been rescheduled.
+// fn must not return at all; typically it ends by calling schedule, to let the m
+// run other goroutines.
+//
+// mcall can only be called from g stacks (not g0, not gsignal).
+//
+// This must NOT be go:noescape: if fn is a stack-allocated closure,
+// fn puts g on a run queue, and g executes before fn returns, the
+// closure will be invalidated while it is still executing.
+func mcall(fn func(*g))
+
+// systemstack runs fn on a system stack.
+// If systemstack is called from the per-OS-thread (g0) stack, or
+// if systemstack is called from the signal handling (gsignal) stack,
+// systemstack calls fn directly and returns.
+// Otherwise, systemstack is being called from the limited stack
+// of an ordinary goroutine. In this case, systemstack switches
+// to the per-OS-thread stack, calls fn, and switches back.
+// It is common to use a func literal as the argument, in order
+// to share inputs and outputs with the code around the call
+// to systemstack:
+//
+// ... set up y ...
+// systemstack(func() {
+// x = bigcall(y)
+// })
+// ... use x ...
+//
+//go:noescape
+func systemstack(fn func())
+
+var badsystemstackMsg = "fatal: systemstack called from unexpected goroutine"
+
+//go:nosplit
+//go:nowritebarrierrec
+func badsystemstack() {
+ sp := stringStructOf(&badsystemstackMsg)
+ write(2, sp.str, int32(sp.len))
+}
+
+// memclrNoHeapPointers clears n bytes starting at ptr.
+//
+// Usually you should use typedmemclr. memclrNoHeapPointers should be
+// used only when the caller knows that *ptr contains no heap pointers
+// because either:
+//
+// *ptr is initialized memory and its type is pointer-free, or
+//
+// *ptr is uninitialized memory (e.g., memory that's being reused
+// for a new allocation) and hence contains only "junk".
+//
+// memclrNoHeapPointers ensures that if ptr is pointer-aligned, and n
+// is a multiple of the pointer size, then any pointer-aligned,
+// pointer-sized portion is cleared atomically. Despite the function
+// name, this is necessary because this function is the underlying
+// implementation of typedmemclr and memclrHasPointers. See the doc of
+// memmove for more details.
+//
+// The (CPU-specific) implementations of this function are in memclr_*.s.
+//
+//go:noescape
+func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+
+//go:linkname reflect_memclrNoHeapPointers reflect.memclrNoHeapPointers
+func reflect_memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr) {
+ memclrNoHeapPointers(ptr, n)
+}
+
+// memmove copies n bytes from "from" to "to".
+//
+// memmove ensures that any pointer in "from" is written to "to" with
+// an indivisible write, so that racy reads cannot observe a
+// half-written pointer. This is necessary to prevent the garbage
+// collector from observing invalid pointers, and differs from memmove
+// in unmanaged languages. However, memmove is only required to do
+// this if "from" and "to" may contain pointers, which can only be the
+// case if "from", "to", and "n" are all word-aligned.
+//
+// Implementations are in memmove_*.s.
+//
+//go:noescape
+func memmove(to, from unsafe.Pointer, n uintptr)
+
+//go:linkname reflect_memmove reflect.memmove
+func reflect_memmove(to, from unsafe.Pointer, n uintptr) {
+ memmove(to, from, n)
+}
+
+// exported value for testing
+var hashLoad = float32(loadFactorNum) / float32(loadFactorDen)
+
+//go:nosplit
+func fastrand() uint32 {
+ mp := getg().m
+ // Implement xorshift64+: 2 32-bit xorshift sequences added together.
+ // Shift triplet [17,7,16] was calculated as indicated in Marsaglia's
+ // Xorshift paper: https://www.jstatsoft.org/article/view/v008i14/xorshift.pdf
+ // This generator passes the SmallCrush suite, part of TestU01 framework:
+ // http://simul.iro.umontreal.ca/testu01/tu01.html
+ s1, s0 := mp.fastrand[0], mp.fastrand[1]
+ s1 ^= s1 << 17
+ s1 = s1 ^ s0 ^ s1>>7 ^ s0>>16
+ mp.fastrand[0], mp.fastrand[1] = s0, s1
+ return s0 + s1
+}
+
+//go:nosplit
+func fastrandn(n uint32) uint32 {
+ // This is similar to fastrand() % n, but faster.
+ // See https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
+ return uint32(uint64(fastrand()) * uint64(n) >> 32)
+}
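A worked example of the multiply-shift reduction used by fastrandn; reduce below is a local copy for illustration, not a runtime symbol:

package main

import "fmt"

// reduce maps a uniform 32-bit value x into [0, n) without a modulo,
// as fastrandn does: (x * n) / 2^32.
func reduce(x, n uint32) uint32 {
	return uint32(uint64(x) * uint64(n) >> 32)
}

func main() {
	const n = 10
	fmt.Println(reduce(0, n))          // 0
	fmt.Println(reduce(1<<31, n))      // 5: halfway through the 32-bit range maps to n/2
	fmt.Println(reduce(^uint32(0), n)) // 9: the maximum input maps to n-1
}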
+
+//go:linkname sync_fastrand sync.fastrand
+func sync_fastrand() uint32 { return fastrand() }
+
+//go:linkname net_fastrand net.fastrand
+func net_fastrand() uint32 { return fastrand() }
+
+//go:linkname os_fastrand os.fastrand
+func os_fastrand() uint32 { return fastrand() }
+
+// in internal/bytealg/equal_*.s
+//go:noescape
+func memequal(a, b unsafe.Pointer, size uintptr) bool
+
+// noescape hides a pointer from escape analysis. noescape is
+// the identity function but escape analysis doesn't think the
+// output depends on the input. noescape is inlined and currently
+// compiles down to zero instructions.
+// USE CAREFULLY!
+//go:nosplit
+func noescape(p unsafe.Pointer) unsafe.Pointer {
+ x := uintptr(p)
+ return unsafe.Pointer(x ^ 0)
+}
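A cautious sketch of the same trick outside the runtime, for illustration only; hideFromEscapeAnalysis is a hypothetical local copy, and go vet may object to this pattern in application code:

package main

import (
	"fmt"
	"unsafe"
)

// hideFromEscapeAnalysis copies the runtime trick: xor with 0 is a no-op at
// run time, but routing the pointer through a uintptr breaks the dependency
// that escape analysis follows, so the argument is not marked as escaping.
func hideFromEscapeAnalysis(p unsafe.Pointer) unsafe.Pointer {
	x := uintptr(p)
	return unsafe.Pointer(x ^ 0)
}

func main() {
	v := 42
	p := hideFromEscapeAnalysis(unsafe.Pointer(&v))
	fmt.Println(p == unsafe.Pointer(&v)) // true: the pointer value is unchanged
}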
+
+// Not all cgocallback frames are actually cgocallback,
+// so not all have these arguments. Mark them uintptr so that the GC
+// does not misinterpret memory when the arguments are not present.
+// cgocallback is not called from Go, only from crosscall2.
+// This in turn calls cgocallbackg, which is where we'll find
+// pointer-declared arguments.
+func cgocallback(fn, frame, ctxt uintptr)
+func gogo(buf *gobuf)
+func gosave(buf *gobuf)
+
+//go:noescape
+func jmpdefer(fv *funcval, argp uintptr)
+func asminit()
+func setg(gg *g)
+func breakpoint()
+
+// reflectcall calls fn with a copy of the n argument bytes pointed at by arg.
+// After fn returns, reflectcall copies n-retoffset result bytes
+// back into arg+retoffset before returning. If copying result bytes back,
+// the caller should pass the argument frame type as argtype, so that
+// the call can execute appropriate write barriers during the copy.
+//
+// Package reflect always passes a frame type. In package runtime,
+// Windows callbacks are the only use of this that copies results
+// back, and those cannot have pointers in their results, so runtime
+// passes nil for the frame type.
+//
+// Package reflect accesses this symbol through a linkname.
+func reflectcall(argtype *_type, fn, arg unsafe.Pointer, argsize uint32, retoffset uint32)
+
+func procyield(cycles uint32)
+
+type neverCallThisFunction struct{}
+
+// goexit is the return stub at the top of every goroutine call stack.
+// Each goroutine stack is constructed as if goexit called the
+// goroutine's entry point function, so that when the entry point
+// function returns, it will return to goexit, which will call goexit1
+// to perform the actual exit.
+//
+// This function must never be called directly. Call goexit1 instead.
+// gentraceback assumes that goexit terminates the stack. A direct
+// call on the stack will cause gentraceback to stop walking the stack
+// prematurely and if there is leftover state it may panic.
+func goexit(neverCallThisFunction)
+
+// publicationBarrier performs a store/store barrier (a "publication"
+// or "export" barrier). Some form of synchronization is required
+// between initializing an object and making that object accessible to
+// another processor. Without synchronization, the initialization
+// writes and the "publication" write may be reordered, allowing the
+// other processor to follow the pointer and observe an uninitialized
+// object. In general, higher-level synchronization should be used,
+// such as locking or an atomic pointer write. publicationBarrier is
+// for when those aren't an option, such as in the implementation of
+// the memory manager.
+//
+// There's no corresponding barrier for the read side because the read
+// side naturally has a data dependency order. All architectures that
+// Go supports or seems likely to ever support automatically enforce
+// data dependency ordering.
+func publicationBarrier()
+
+// getcallerpc returns the program counter (PC) of its caller's caller.
+// getcallersp returns the stack pointer (SP) of its caller's caller.
+// The implementation may be a compiler intrinsic; there is not
+// necessarily code implementing this on every platform.
+//
+// For example:
+//
+// func f(arg1, arg2, arg3 int) {
+// pc := getcallerpc()
+// sp := getcallersp()
+// }
+//
+// These two lines find the PC and SP immediately following
+// the call to f (where f will return).
+//
+// The call to getcallerpc and getcallersp must be done in the
+// frame being asked about.
+//
+// The result of getcallersp is correct at the time of the return,
+// but it may be invalidated by any subsequent call to a function
+// that might relocate the stack in order to grow or shrink it.
+// A general rule is that the result of getcallersp should be used
+// immediately and can only be passed to nosplit functions.
+
+//go:noescape
+func getcallerpc() uintptr
+
+//go:noescape
+func getcallersp() uintptr // implemented as an intrinsic on all platforms
+
+// getclosureptr returns the pointer to the current closure.
+// getclosureptr can only be used in an assignment statement
+// at the entry of a function. Moreover, the go:nosplit directive
+// must be specified at the declaration of the caller function,
+// so that the function prologue does not clobber the closure register.
+// For example:
+//
+// //go:nosplit
+// func f(arg1, arg2, arg3 int) {
+// dx := getclosureptr()
+// }
+//
+// The compiler rewrites calls to this function into instructions that fetch the
+// pointer from a well-known register (DX on x86 architecture, etc.) directly.
+func getclosureptr() uintptr
+
+//go:noescape
+func asmcgocall(fn, arg unsafe.Pointer) int32
+
+func morestack()
+func morestack_noctxt()
+func rt0_go()
+
+// return0 is a stub used to return 0 from deferproc.
+// It is called at the very end of deferproc to signal
+// the calling Go function that it should not jump
+// to deferreturn.
+// in asm_*.s
+func return0()
+
+// in asm_*.s
+// not called directly; definitions here supply type information for traceback.
+func call16(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call32(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call64(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call128(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call256(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call512(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call1024(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call2048(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call4096(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call8192(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call16384(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call32768(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call65536(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call131072(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call262144(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call524288(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call1048576(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call2097152(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call4194304(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call8388608(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call16777216(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call33554432(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call67108864(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call134217728(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call268435456(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call536870912(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+func call1073741824(typ, fn, arg unsafe.Pointer, n, retoffset uint32)
+
+func systemstack_switch()
+
+// alignUp rounds n up to a multiple of a. a must be a power of 2.
+func alignUp(n, a uintptr) uintptr {
+ return (n + a - 1) &^ (a - 1)
+}
+
+// alignDown rounds n down to a multiple of a. a must be a power of 2.
+func alignDown(n, a uintptr) uintptr {
+ return n &^ (a - 1)
+}
+
+// divRoundUp returns ceil(n / a).
+func divRoundUp(n, a uintptr) uintptr {
+ // a is generally a power of two. This will get inlined and
+ // the compiler will optimize the division.
+ return (n + a - 1) / a
+}
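A few worked values for the alignment helpers above; the local copies exist only so the example runs on its own:

package main

import "fmt"

func alignUp(n, a uintptr) uintptr   { return (n + a - 1) &^ (a - 1) }
func alignDown(n, a uintptr) uintptr { return n &^ (a - 1) }

func main() {
	// a must be a power of two: a-1 is then a mask of the low bits, and
	// &^ (AND NOT) clears exactly those bits.
	fmt.Println(alignUp(10, 8), alignDown(10, 8)) // 16 8
	fmt.Println(alignUp(16, 8), alignDown(16, 8)) // 16 16 (aligned values are unchanged)
	fmt.Println(alignUp(1, 4096))                 // 4096 (e.g. rounding a size up to a page)
}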
+
+// checkASM reports whether assembly runtime checks have passed.
+func checkASM() bool
+
+func memequal_varlen(a, b unsafe.Pointer) bool
+
+// bool2int returns 0 if x is false or 1 if x is true.
+func bool2int(x bool) int {
+ // Avoid branches. In the SSA compiler, this compiles to
+ // exactly what you would want it to.
+ return int(uint8(*(*uint8)(unsafe.Pointer(&x))))
+}
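The same byte-read trick as a standalone sketch; boolToInt is a hypothetical local copy, not a runtime symbol:

package main

import (
	"fmt"
	"unsafe"
)

// boolToInt mirrors the runtime helper: a bool is stored as a single byte
// holding 0 or 1, so reading that byte avoids a branch.
func boolToInt(x bool) int {
	return int(*(*uint8)(unsafe.Pointer(&x)))
}

func main() {
	fmt.Println(boolToInt(false), boolToInt(true)) // 0 1
}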
+
+// abort crashes the runtime in situations where even throw might not
+// work. In general it should do something a debugger will recognize
+// (e.g., an INT3 on x86). A crash in abort is recognized by the
+// signal handler, which will attempt to tear down the runtime
+// immediately.
+func abort()
+
+// Called from compiled code; declared for vet; do NOT call from Go.
+func gcWriteBarrier()
+func duffzero()
+func duffcopy()
+
+// Called from linker-generated .initarray; declared for go vet; do NOT call from Go.
+func addmoduledata()
diff --git a/src/runtime/stubs2.go b/src/runtime/stubs2.go
new file mode 100644
index 0000000..85088b3
--- /dev/null
+++ b/src/runtime/stubs2.go
@@ -0,0 +1,41 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !aix
+// +build !darwin
+// +build !js
+// +build !openbsd
+// +build !plan9
+// +build !solaris
+// +build !windows
+
+package runtime
+
+import "unsafe"
+
+// read calls the read system call.
+// It returns a non-negative number of bytes read or a negative errno value.
+func read(fd int32, p unsafe.Pointer, n int32) int32
+
+func closefd(fd int32) int32
+
+func exit(code int32)
+func usleep(usec uint32)
+
+// write calls the write system call.
+// It returns a non-negative number of bytes written or a negative errno value.
+//go:noescape
+func write1(fd uintptr, p unsafe.Pointer, n int32) int32
+
+//go:noescape
+func open(name *byte, mode, perm int32) int32
+
+// The return value is only set on Linux, to be used in osinit().
+func madvise(addr unsafe.Pointer, n uintptr, flags int32) int32
+
+// exitThread terminates the current thread, writing *wait = 0 when
+// the stack is safe to reclaim.
+//
+//go:noescape
+func exitThread(wait *uint32)
diff --git a/src/runtime/stubs3.go b/src/runtime/stubs3.go
new file mode 100644
index 0000000..1885d32
--- /dev/null
+++ b/src/runtime/stubs3.go
@@ -0,0 +1,14 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !aix
+// +build !darwin
+// +build !freebsd
+// +build !openbsd
+// +build !plan9
+// +build !solaris
+
+package runtime
+
+func nanotime1() int64
diff --git a/src/runtime/stubs_386.go b/src/runtime/stubs_386.go
new file mode 100644
index 0000000..5108294
--- /dev/null
+++ b/src/runtime/stubs_386.go
@@ -0,0 +1,17 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+func float64touint32(a float64) uint32
+func uint32tofloat64(a uint32) float64
+
+// stackcheck checks that SP is in range [g->stack.lo, g->stack.hi).
+func stackcheck()
+
+// Called from assembly only; declared for go vet.
+func setldt(slot uintptr, base unsafe.Pointer, size uintptr)
+func emptyfunc()
diff --git a/src/runtime/stubs_amd64.go b/src/runtime/stubs_amd64.go
new file mode 100644
index 0000000..8c14bc2
--- /dev/null
+++ b/src/runtime/stubs_amd64.go
@@ -0,0 +1,37 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// Called from compiled code; declared for vet; do NOT call from Go.
+func gcWriteBarrierCX()
+func gcWriteBarrierDX()
+func gcWriteBarrierBX()
+func gcWriteBarrierBP()
+func gcWriteBarrierSI()
+func gcWriteBarrierR8()
+func gcWriteBarrierR9()
+
+// stackcheck checks that SP is in range [g->stack.lo, g->stack.hi).
+func stackcheck()
+
+// Called from assembly only; declared for go vet.
+func settls() // argument in DI
+
+// Retpolines, used by -spectre=ret flag in cmd/asm, cmd/compile.
+func retpolineAX()
+func retpolineCX()
+func retpolineDX()
+func retpolineBX()
+func retpolineBP()
+func retpolineSI()
+func retpolineDI()
+func retpolineR8()
+func retpolineR9()
+func retpolineR10()
+func retpolineR11()
+func retpolineR12()
+func retpolineR13()
+func retpolineR14()
+func retpolineR15()
diff --git a/src/runtime/stubs_arm.go b/src/runtime/stubs_arm.go
new file mode 100644
index 0000000..c13bf16
--- /dev/null
+++ b/src/runtime/stubs_arm.go
@@ -0,0 +1,20 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// Called from compiler-generated code; declared for go vet.
+func udiv()
+func _div()
+func _divu()
+func _mod()
+func _modu()
+
+// Called from assembly only; declared for go vet.
+func usplitR0()
+func load_g()
+func save_g()
+func emptyfunc()
+func _initcgo()
+func read_tls_fallback()
diff --git a/src/runtime/stubs_arm64.go b/src/runtime/stubs_arm64.go
new file mode 100644
index 0000000..44c566e
--- /dev/null
+++ b/src/runtime/stubs_arm64.go
@@ -0,0 +1,9 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// Called from assembly only; declared for go vet.
+func load_g()
+func save_g()
diff --git a/src/runtime/stubs_linux.go b/src/runtime/stubs_linux.go
new file mode 100644
index 0000000..e75fcf6
--- /dev/null
+++ b/src/runtime/stubs_linux.go
@@ -0,0 +1,19 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+
+package runtime
+
+import "unsafe"
+
+func sbrk0() uintptr
+
+// Called from write_err_android.go only, but defined in sys_linux_*.s;
+// declared here (instead of in write_err_android.go) for go vet on non-android builds.
+// The return value is the raw syscall result, which may encode an error number.
+//go:noescape
+func access(name *byte, mode int32) int32
+func connect(fd int32, addr unsafe.Pointer, len int32) int32
+func socket(domain int32, typ int32, prot int32) int32
diff --git a/src/runtime/stubs_mips64x.go b/src/runtime/stubs_mips64x.go
new file mode 100644
index 0000000..4e62c1c
--- /dev/null
+++ b/src/runtime/stubs_mips64x.go
@@ -0,0 +1,11 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips64 mips64le
+
+package runtime
+
+// Called from assembly only; declared for go vet.
+func load_g()
+func save_g()
diff --git a/src/runtime/stubs_mipsx.go b/src/runtime/stubs_mipsx.go
new file mode 100644
index 0000000..707b295
--- /dev/null
+++ b/src/runtime/stubs_mipsx.go
@@ -0,0 +1,11 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips mipsle
+
+package runtime
+
+// Called from assembly only; declared for go vet.
+func load_g()
+func save_g()
diff --git a/src/runtime/stubs_nonlinux.go b/src/runtime/stubs_nonlinux.go
new file mode 100644
index 0000000..e1ea05c
--- /dev/null
+++ b/src/runtime/stubs_nonlinux.go
@@ -0,0 +1,12 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !linux
+
+package runtime
+
+// sbrk0 returns the current process brk, or 0 if not implemented.
+func sbrk0() uintptr {
+ return 0
+}
diff --git a/src/runtime/stubs_ppc64x.go b/src/runtime/stubs_ppc64x.go
new file mode 100644
index 0000000..26f5bb2
--- /dev/null
+++ b/src/runtime/stubs_ppc64x.go
@@ -0,0 +1,12 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64 ppc64le
+
+package runtime
+
+// Called from assembly only; declared for go vet.
+func load_g()
+func save_g()
+func reginit()
diff --git a/src/runtime/stubs_s390x.go b/src/runtime/stubs_s390x.go
new file mode 100644
index 0000000..44c566e
--- /dev/null
+++ b/src/runtime/stubs_s390x.go
@@ -0,0 +1,9 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// Called from assembly only; declared for go vet.
+func load_g()
+func save_g()
diff --git a/src/runtime/symtab.go b/src/runtime/symtab.go
new file mode 100644
index 0000000..3341fc4
--- /dev/null
+++ b/src/runtime/symtab.go
@@ -0,0 +1,1041 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// Frames may be used to get function/file/line information for a
+// slice of PC values returned by Callers.
+type Frames struct {
+ // callers is a slice of PCs that have not yet been expanded to frames.
+ callers []uintptr
+
+ // frames is a slice of Frames that have yet to be returned.
+ frames []Frame
+ frameStore [2]Frame
+}
+
+// Frame is the information returned by Frames for each call frame.
+type Frame struct {
+ // PC is the program counter for the location in this frame.
+ // For a frame that calls another frame, this will be the
+ // program counter of a call instruction. Because of inlining,
+ // multiple frames may have the same PC value, but different
+ // symbolic information.
+ PC uintptr
+
+ // Func is the Func value of this call frame. This may be nil
+ // for non-Go code or fully inlined functions.
+ Func *Func
+
+ // Function is the package path-qualified function name of
+ // this call frame. If non-empty, this string uniquely
+ // identifies a single function in the program.
+ // This may be the empty string if not known.
+ // If Func is not nil then Function == Func.Name().
+ Function string
+
+ // File and Line are the file name and line number of the
+ // location in this frame. For non-leaf frames, this will be
+ // the location of a call. These may be the empty string and
+ // zero, respectively, if not known.
+ File string
+ Line int
+
+ // Entry point program counter for the function; may be zero
+ // if not known. If Func is not nil then Entry ==
+ // Func.Entry().
+ Entry uintptr
+
+ // The runtime's internal view of the function. This field
+ // is set (funcInfo.valid() returns true) only for Go functions,
+ // not for C functions.
+ funcInfo funcInfo
+}
+
+// CallersFrames takes a slice of PC values returned by Callers and
+// prepares to return function/file/line information.
+// Do not change the slice until you are done with the Frames.
+func CallersFrames(callers []uintptr) *Frames {
+ f := &Frames{callers: callers}
+ f.frames = f.frameStore[:0]
+ return f
+}
+
+// Next returns frame information for the next caller.
+// If more is false, there are no more callers (the Frame value is valid).
+func (ci *Frames) Next() (frame Frame, more bool) {
+ for len(ci.frames) < 2 {
+ // Find the next frame.
+ // We need to look for 2 frames so we know what
+ // to return for the "more" result.
+ if len(ci.callers) == 0 {
+ break
+ }
+ pc := ci.callers[0]
+ ci.callers = ci.callers[1:]
+ funcInfo := findfunc(pc)
+ if !funcInfo.valid() {
+ if cgoSymbolizer != nil {
+ // Pre-expand cgo frames. We could do this
+ // incrementally, too, but there's no way to
+ // avoid allocation in this case anyway.
+ ci.frames = append(ci.frames, expandCgoFrames(pc)...)
+ }
+ continue
+ }
+ f := funcInfo._Func()
+ entry := f.Entry()
+ if pc > entry {
+ // We store the pc of the start of the instruction following
+ // the instruction in question (the call or the inline mark).
+ // This is done for historical reasons, and to make FuncForPC
+ // work correctly for entries in the result of runtime.Callers.
+ pc--
+ }
+ name := funcname(funcInfo)
+ if inldata := funcdata(funcInfo, _FUNCDATA_InlTree); inldata != nil {
+ inltree := (*[1 << 20]inlinedCall)(inldata)
+ // Non-strict as cgoTraceback may have added bogus PCs
+ // with a valid funcInfo but invalid PCDATA.
+ ix := pcdatavalue1(funcInfo, _PCDATA_InlTreeIndex, pc, nil, false)
+ if ix >= 0 {
+ // Note: entry is not modified. It always refers to a real frame, not an inlined one.
+ f = nil
+ name = funcnameFromNameoff(funcInfo, inltree[ix].func_)
+ // File/line is already correct.
+ // TODO: remove file/line from InlinedCall?
+ }
+ }
+ ci.frames = append(ci.frames, Frame{
+ PC: pc,
+ Func: f,
+ Function: name,
+ Entry: entry,
+ funcInfo: funcInfo,
+ // Note: File,Line set below
+ })
+ }
+
+ // Pop one frame from the frame list. Keep the rest.
+ // Avoid allocation in the common case, which is 1 or 2 frames.
+ switch len(ci.frames) {
+ case 0: // In the rare case when there are no frames at all, we return Frame{}.
+ return
+ case 1:
+ frame = ci.frames[0]
+ ci.frames = ci.frameStore[:0]
+ case 2:
+ frame = ci.frames[0]
+ ci.frameStore[0] = ci.frames[1]
+ ci.frames = ci.frameStore[:1]
+ default:
+ frame = ci.frames[0]
+ ci.frames = ci.frames[1:]
+ }
+ more = len(ci.frames) > 0
+ if frame.funcInfo.valid() {
+ // Compute file/line just before we need to return it,
+ // as it can be expensive. This avoids computing file/line
+ // for the Frame we find but don't return. See issue 32093.
+ file, line := funcline1(frame.funcInfo, frame.PC, false)
+ frame.File, frame.Line = file, int(line)
+ }
+ return
+}
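The intended use of Frames, per the CallersFrames documentation: obtain PCs with runtime.Callers, then iterate with Next until more is false:

package main

import (
	"fmt"
	"runtime"
)

func printCallers() {
	pc := make([]uintptr, 16)
	n := runtime.Callers(2, pc) // skip runtime.Callers and printCallers itself
	frames := runtime.CallersFrames(pc[:n])
	for {
		frame, more := frames.Next()
		fmt.Printf("%s\n\t%s:%d\n", frame.Function, frame.File, frame.Line)
		if !more {
			break
		}
	}
}

func helper() { printCallers() } // if inlined, it still gets its own Frame

func main() { helper() }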
+
+// runtime_expandFinalInlineFrame expands the final pc in stk to include all
+// "callers" if pc is inline.
+//
+//go:linkname runtime_expandFinalInlineFrame runtime/pprof.runtime_expandFinalInlineFrame
+func runtime_expandFinalInlineFrame(stk []uintptr) []uintptr {
+ if len(stk) == 0 {
+ return stk
+ }
+ pc := stk[len(stk)-1]
+ tracepc := pc - 1
+
+ f := findfunc(tracepc)
+ if !f.valid() {
+ // Not a Go function.
+ return stk
+ }
+
+ inldata := funcdata(f, _FUNCDATA_InlTree)
+ if inldata == nil {
+ // Nothing inline in f.
+ return stk
+ }
+
+ // Treat the previous func as normal. We haven't actually checked, but
+ // since this pc was included in the stack, we know it shouldn't be
+ // elided.
+ lastFuncID := funcID_normal
+
+ // Remove pc from stk; we'll re-add it below.
+ stk = stk[:len(stk)-1]
+
+ // See inline expansion in gentraceback.
+ var cache pcvalueCache
+ inltree := (*[1 << 20]inlinedCall)(inldata)
+ for {
+ // Non-strict as cgoTraceback may have added bogus PCs
+ // with a valid funcInfo but invalid PCDATA.
+ ix := pcdatavalue1(f, _PCDATA_InlTreeIndex, tracepc, &cache, false)
+ if ix < 0 {
+ break
+ }
+ if inltree[ix].funcID == funcID_wrapper && elideWrapperCalling(lastFuncID) {
+ // ignore wrappers
+ } else {
+ stk = append(stk, pc)
+ }
+ lastFuncID = inltree[ix].funcID
+ // Back up to an instruction in the "caller".
+ tracepc = f.entry + uintptr(inltree[ix].parentPc)
+ pc = tracepc + 1
+ }
+
+ // N.B. we want to keep the last parentPC which is not inline.
+ stk = append(stk, pc)
+
+ return stk
+}
+
+// expandCgoFrames expands frame information for pc, known to be
+// a non-Go function, using the cgoSymbolizer hook. expandCgoFrames
+// returns nil if pc could not be expanded.
+func expandCgoFrames(pc uintptr) []Frame {
+ arg := cgoSymbolizerArg{pc: pc}
+ callCgoSymbolizer(&arg)
+
+ if arg.file == nil && arg.funcName == nil {
+ // No useful information from symbolizer.
+ return nil
+ }
+
+ var frames []Frame
+ for {
+ frames = append(frames, Frame{
+ PC: pc,
+ Func: nil,
+ Function: gostring(arg.funcName),
+ File: gostring(arg.file),
+ Line: int(arg.lineno),
+ Entry: arg.entry,
+ // funcInfo is zero, which implies !funcInfo.valid().
+ // That ensures that we use the File/Line info given here.
+ })
+ if arg.more == 0 {
+ break
+ }
+ callCgoSymbolizer(&arg)
+ }
+
+ // No more frames for this PC. Tell the symbolizer we are done.
+ // We don't try to maintain a single cgoSymbolizerArg for the
+ // whole use of Frames, because there would be no good way to tell
+ // the symbolizer when we are done.
+ arg.pc = 0
+ callCgoSymbolizer(&arg)
+
+ return frames
+}
+
+// NOTE: Func does not expose the actual unexported fields, because we return *Func
+// values to users, and we want to keep them from being able to overwrite the data
+// with (say) *f = Func{}.
+// All code operating on a *Func must call raw() to get the *_func
+// or funcInfo() to get the funcInfo instead.
+
+// A Func represents a Go function in the running binary.
+type Func struct {
+ opaque struct{} // unexported field to disallow conversions
+}
+
+func (f *Func) raw() *_func {
+ return (*_func)(unsafe.Pointer(f))
+}
+
+func (f *Func) funcInfo() funcInfo {
+ fn := f.raw()
+ return funcInfo{fn, findmoduledatap(fn.entry)}
+}
+
+// PCDATA and FUNCDATA table indexes.
+//
+// See funcdata.h and ../cmd/internal/objabi/funcdata.go.
+const (
+ _PCDATA_UnsafePoint = 0
+ _PCDATA_StackMapIndex = 1
+ _PCDATA_InlTreeIndex = 2
+
+ _FUNCDATA_ArgsPointerMaps = 0
+ _FUNCDATA_LocalsPointerMaps = 1
+ _FUNCDATA_StackObjects = 2
+ _FUNCDATA_InlTree = 3
+ _FUNCDATA_OpenCodedDeferInfo = 4
+
+ _ArgsSizeUnknown = -0x80000000
+)
+
+const (
+ // PCDATA_UnsafePoint values.
+ _PCDATA_UnsafePointSafe = -1 // Safe for async preemption
+ _PCDATA_UnsafePointUnsafe = -2 // Unsafe for async preemption
+
+	// _PCDATA_Restart1(2) apply to a sequence of instructions; if an async
+	// preemption happens within the sequence, the PC should be backed off
+	// to the start of the sequence when resuming.
+ // We need two so we can distinguish the start/end of the sequence
+ // in case that two sequences are next to each other.
+ _PCDATA_Restart1 = -3
+ _PCDATA_Restart2 = -4
+
+	// Like the restart values above, but the PC is backed off to the
+	// function entry if async preempted.
+ _PCDATA_RestartAtEntry = -5
+)
+
+// A FuncID identifies particular functions that need to be treated
+// specially by the runtime.
+// Note that in some situations involving plugins, there may be multiple
+// copies of a particular special runtime function.
+// Note: this list must match the list in cmd/internal/objabi/funcid.go.
+type funcID uint8
+
+const (
+ funcID_normal funcID = iota // not a special function
+ funcID_runtime_main
+ funcID_goexit
+ funcID_jmpdefer
+ funcID_mcall
+ funcID_morestack
+ funcID_mstart
+ funcID_rt0_go
+ funcID_asmcgocall
+ funcID_sigpanic
+ funcID_runfinq
+ funcID_gcBgMarkWorker
+ funcID_systemstack_switch
+ funcID_systemstack
+ funcID_cgocallback
+ funcID_gogo
+ funcID_externalthreadhandler
+ funcID_debugCallV1
+ funcID_gopanic
+ funcID_panicwrap
+ funcID_handleAsyncEvent
+ funcID_asyncPreempt
+ funcID_wrapper // any autogenerated code (hash/eq algorithms, method wrappers, etc.)
+)
+
+// pcHeader holds data used by the pclntab lookups.
+type pcHeader struct {
+ magic uint32 // 0xFFFFFFFA
+ pad1, pad2 uint8 // 0,0
+ minLC uint8 // min instruction size
+ ptrSize uint8 // size of a ptr in bytes
+ nfunc int // number of functions in the module
+ nfiles uint // number of entries in the file tab.
+ funcnameOffset uintptr // offset to the funcnametab variable from pcHeader
+ cuOffset uintptr // offset to the cutab variable from pcHeader
+ filetabOffset uintptr // offset to the filetab variable from pcHeader
+	pctabOffset    uintptr // offset to the pctab variable from pcHeader
+ pclnOffset uintptr // offset to the pclntab variable from pcHeader
+}
+
+// moduledata records information about the layout of the executable
+// image. It is written by the linker. Any changes here must be
+// matched by changes to the code in cmd/internal/ld/symtab.go:symtab.
+// moduledata is stored in statically allocated non-pointer memory;
+// none of the pointers here are visible to the garbage collector.
+type moduledata struct {
+ pcHeader *pcHeader
+ funcnametab []byte
+ cutab []uint32
+ filetab []byte
+ pctab []byte
+ pclntable []byte
+ ftab []functab
+ findfunctab uintptr
+ minpc, maxpc uintptr
+
+ text, etext uintptr
+ noptrdata, enoptrdata uintptr
+ data, edata uintptr
+ bss, ebss uintptr
+ noptrbss, enoptrbss uintptr
+ end, gcdata, gcbss uintptr
+ types, etypes uintptr
+
+ textsectmap []textsect
+ typelinks []int32 // offsets from types
+ itablinks []*itab
+
+ ptab []ptabEntry
+
+ pluginpath string
+ pkghashes []modulehash
+
+ modulename string
+ modulehashes []modulehash
+
+ hasmain uint8 // 1 if module contains the main function, 0 otherwise
+
+ gcdatamask, gcbssmask bitvector
+
+ typemap map[typeOff]*_type // offset to *_rtype in previous module
+
+ bad bool // module failed to load and should be ignored
+
+ next *moduledata
+}
+
+// A modulehash is used to compare the ABI of a new module or a
+// package in a new module with the loaded program.
+//
+// For each shared library a module links against, the linker creates an entry in the
+// moduledata.modulehashes slice containing the name of the module, the abi hash seen
+// at link time and a pointer to the runtime abi hash. These are checked in
+// moduledataverify1 below.
+//
+// For each loaded plugin, the pkghashes slice has a modulehash of the
+// newly loaded package that can be used to check the plugin's version of
+// a package against any previously loaded version of the package.
+// This is done in plugin.lastmoduleinit.
+type modulehash struct {
+ modulename string
+ linktimehash string
+ runtimehash *string
+}
+
+// pinnedTypemaps are the map[typeOff]*_type from the moduledata objects.
+//
+// These typemap objects are allocated at run time on the heap, but the
+// only direct reference to them is in the moduledata, created by the
+// linker and marked SNOPTRDATA so it is ignored by the GC.
+//
+// To make sure the map isn't collected, we keep a second reference here.
+var pinnedTypemaps []map[typeOff]*_type
+
+var firstmoduledata moduledata // linker symbol
+var lastmoduledatap *moduledata // linker symbol
+var modulesSlice *[]*moduledata // see activeModules
+
+// activeModules returns a slice of active modules.
+//
+// A module is active once its gcdatamask and gcbssmask have been
+// assembled and it is usable by the GC.
+//
+// This is nosplit/nowritebarrier because it is called by the
+// cgo pointer checking code.
+//go:nosplit
+//go:nowritebarrier
+func activeModules() []*moduledata {
+ p := (*[]*moduledata)(atomic.Loadp(unsafe.Pointer(&modulesSlice)))
+ if p == nil {
+ return nil
+ }
+ return *p
+}
+
+// modulesinit creates the active modules slice out of all loaded modules.
+//
+// When a module is first loaded by the dynamic linker, an .init_array
+// function (written by cmd/link) is invoked to call addmoduledata,
+// appending the module to the linked list that starts with
+// firstmoduledata.
+//
+// There are two times this can happen in the lifecycle of a Go
+// program. First, if compiled with -linkshared, a number of modules
+// built with -buildmode=shared can be loaded at program initialization.
+// Second, a Go program can load a module while running that was built
+// with -buildmode=plugin.
+//
+// After loading, this function is called which initializes the
+// moduledata so it is usable by the GC and creates a new activeModules
+// list.
+//
+// Only one goroutine may call modulesinit at a time.
+func modulesinit() {
+ modules := new([]*moduledata)
+ for md := &firstmoduledata; md != nil; md = md.next {
+ if md.bad {
+ continue
+ }
+ *modules = append(*modules, md)
+ if md.gcdatamask == (bitvector{}) {
+ md.gcdatamask = progToPointerMask((*byte)(unsafe.Pointer(md.gcdata)), md.edata-md.data)
+ md.gcbssmask = progToPointerMask((*byte)(unsafe.Pointer(md.gcbss)), md.ebss-md.bss)
+ }
+ }
+
+ // Modules appear in the moduledata linked list in the order they are
+	// loaded by the dynamic loader, with one exception: firstmoduledata
+	// itself is the module that contains the runtime. This
+ // is not always the first module (when using -buildmode=shared, it
+ // is typically libstd.so, the second module). The order matters for
+ // typelinksinit, so we swap the first module with whatever module
+ // contains the main function.
+ //
+ // See Issue #18729.
+ for i, md := range *modules {
+ if md.hasmain != 0 {
+ (*modules)[0] = md
+ (*modules)[i] = &firstmoduledata
+ break
+ }
+ }
+
+ atomicstorep(unsafe.Pointer(&modulesSlice), unsafe.Pointer(modules))
+}
+
+type functab struct {
+ entry uintptr
+ funcoff uintptr
+}
+
+// Mapping information for secondary text sections
+
+type textsect struct {
+ vaddr uintptr // prelinked section vaddr
+ length uintptr // section length
+ baseaddr uintptr // relocated section address
+}
+
+const minfunc = 16 // minimum function size
+const pcbucketsize = 256 * minfunc // size of bucket in the pc->func lookup table
+
+// findfunctab is an array of these structures.
+// Each bucket represents 4096 bytes of the text segment.
+// Each subbucket represents 256 bytes of the text segment.
+// To find a function given a pc, locate the bucket and subbucket for
+// that pc. Add together the idx and subbucket value to obtain a
+// function index. Then scan the functab array starting at that
+// index to find the target function.
+// This table uses 20 bytes for every 4096 bytes of code, or ~0.5% overhead.
+type findfuncbucket struct {
+ idx uint32
+ subbuckets [16]byte
+}
+
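+// As a worked example (illustrative numbers, not part of this change): with
+// pcbucketsize = 4096 and nsub = 16 subbuckets of 256 bytes each, a pc that
+// lies x = 0x1234 bytes past datap.minpc is located via
+//
+//	b := x / pcbucketsize       // 0x1234/4096 = 1
+//	i := x % pcbucketsize / 256 // 564/256 = 2
+//
+// and the search in findfunc below starts at function index
+// ffb.idx + uint32(ffb.subbuckets[i]) for that bucket.
+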
+func moduledataverify() {
+ for datap := &firstmoduledata; datap != nil; datap = datap.next {
+ moduledataverify1(datap)
+ }
+}
+
+const debugPcln = false
+
+func moduledataverify1(datap *moduledata) {
+ // Check that the pclntab's format is valid.
+ hdr := datap.pcHeader
+ if hdr.magic != 0xfffffffa || hdr.pad1 != 0 || hdr.pad2 != 0 || hdr.minLC != sys.PCQuantum || hdr.ptrSize != sys.PtrSize {
+ println("runtime: function symbol table header:", hex(hdr.magic), hex(hdr.pad1), hex(hdr.pad2), hex(hdr.minLC), hex(hdr.ptrSize))
+ throw("invalid function symbol table\n")
+ }
+
+	// ftab is a lookup table for functions by program counter.
+ nftab := len(datap.ftab) - 1
+ for i := 0; i < nftab; i++ {
+ // NOTE: ftab[nftab].entry is legal; it is the address beyond the final function.
+ if datap.ftab[i].entry > datap.ftab[i+1].entry {
+ f1 := funcInfo{(*_func)(unsafe.Pointer(&datap.pclntable[datap.ftab[i].funcoff])), datap}
+ f2 := funcInfo{(*_func)(unsafe.Pointer(&datap.pclntable[datap.ftab[i+1].funcoff])), datap}
+ f2name := "end"
+ if i+1 < nftab {
+ f2name = funcname(f2)
+ }
+ println("function symbol table not sorted by program counter:", hex(datap.ftab[i].entry), funcname(f1), ">", hex(datap.ftab[i+1].entry), f2name)
+ for j := 0; j <= i; j++ {
+ print("\t", hex(datap.ftab[j].entry), " ", funcname(funcInfo{(*_func)(unsafe.Pointer(&datap.pclntable[datap.ftab[j].funcoff])), datap}), "\n")
+ }
+ if GOOS == "aix" && isarchive {
+ println("-Wl,-bnoobjreorder is mandatory on aix/ppc64 with c-archive")
+ }
+ throw("invalid runtime symbol table")
+ }
+ }
+
+ if datap.minpc != datap.ftab[0].entry ||
+ datap.maxpc != datap.ftab[nftab].entry {
+ throw("minpc or maxpc invalid")
+ }
+
+ for _, modulehash := range datap.modulehashes {
+ if modulehash.linktimehash != *modulehash.runtimehash {
+ println("abi mismatch detected between", datap.modulename, "and", modulehash.modulename)
+ throw("abi mismatch")
+ }
+ }
+}
+
+// FuncForPC returns a *Func describing the function that contains the
+// given program counter address, or else nil.
+//
+// If pc represents multiple functions because of inlining, it returns
+// the *Func describing the innermost function, but with an entry of
+// the outermost function.
+func FuncForPC(pc uintptr) *Func {
+ f := findfunc(pc)
+ if !f.valid() {
+ return nil
+ }
+ if inldata := funcdata(f, _FUNCDATA_InlTree); inldata != nil {
+ // Note: strict=false so bad PCs (those between functions) don't crash the runtime.
+ // We just report the preceding function in that situation. See issue 29735.
+ // TODO: Perhaps we should report no function at all in that case.
+ // The runtime currently doesn't have function end info, alas.
+ if ix := pcdatavalue1(f, _PCDATA_InlTreeIndex, pc, nil, false); ix >= 0 {
+ inltree := (*[1 << 20]inlinedCall)(inldata)
+ name := funcnameFromNameoff(f, inltree[ix].func_)
+ file, line := funcline(f, pc)
+ fi := &funcinl{
+ entry: f.entry, // entry of the real (the outermost) function.
+ name: name,
+ file: file,
+ line: int(line),
+ }
+ return (*Func)(unsafe.Pointer(fi))
+ }
+ }
+ return f._Func()
+}
+
+// Name returns the name of the function.
+func (f *Func) Name() string {
+ if f == nil {
+ return ""
+ }
+ fn := f.raw()
+ if fn.entry == 0 { // inlined version
+ fi := (*funcinl)(unsafe.Pointer(fn))
+ return fi.name
+ }
+ return funcname(f.funcInfo())
+}
+
+// Entry returns the entry address of the function.
+func (f *Func) Entry() uintptr {
+ fn := f.raw()
+ if fn.entry == 0 { // inlined version
+ fi := (*funcinl)(unsafe.Pointer(fn))
+ return fi.entry
+ }
+ return fn.entry
+}
+
+// FileLine returns the file name and line number of the
+// source code corresponding to the program counter pc.
+// The result will not be accurate if pc is not a program
+// counter within f.
+func (f *Func) FileLine(pc uintptr) (file string, line int) {
+ fn := f.raw()
+ if fn.entry == 0 { // inlined version
+ fi := (*funcinl)(unsafe.Pointer(fn))
+ return fi.file, fi.line
+ }
+ // Pass strict=false here, because anyone can call this function,
+ // and they might just be wrong about targetpc belonging to f.
+ file, line32 := funcline1(f.funcInfo(), pc, false)
+ return file, int(line32)
+}
+
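+// A minimal usage sketch (illustrative, not part of this change): user code
+// can resolve its own PC to a name and source position with the exported API:
+//
+//	pc, _, _, _ := runtime.Caller(0)
+//	if f := runtime.FuncForPC(pc); f != nil {
+//		file, line := f.FileLine(pc)
+//		println(f.Name(), file, line)
+//	}
+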
+func findmoduledatap(pc uintptr) *moduledata {
+ for datap := &firstmoduledata; datap != nil; datap = datap.next {
+ if datap.minpc <= pc && pc < datap.maxpc {
+ return datap
+ }
+ }
+ return nil
+}
+
+type funcInfo struct {
+ *_func
+ datap *moduledata
+}
+
+func (f funcInfo) valid() bool {
+ return f._func != nil
+}
+
+func (f funcInfo) _Func() *Func {
+ return (*Func)(unsafe.Pointer(f._func))
+}
+
+func findfunc(pc uintptr) funcInfo {
+ datap := findmoduledatap(pc)
+ if datap == nil {
+ return funcInfo{}
+ }
+ const nsub = uintptr(len(findfuncbucket{}.subbuckets))
+
+ x := pc - datap.minpc
+ b := x / pcbucketsize
+ i := x % pcbucketsize / (pcbucketsize / nsub)
+
+ ffb := (*findfuncbucket)(add(unsafe.Pointer(datap.findfunctab), b*unsafe.Sizeof(findfuncbucket{})))
+ idx := ffb.idx + uint32(ffb.subbuckets[i])
+
+ // If the idx is beyond the end of the ftab, set it to the end of the table and search backward.
+ // This situation can occur if multiple text sections are generated to handle large text sections
+ // and the linker has inserted jump tables between them.
+
+ if idx >= uint32(len(datap.ftab)) {
+ idx = uint32(len(datap.ftab) - 1)
+ }
+ if pc < datap.ftab[idx].entry {
+ // With multiple text sections, the idx might reference a function address that
+ // is higher than the pc being searched, so search backward until the matching address is found.
+
+ for datap.ftab[idx].entry > pc && idx > 0 {
+ idx--
+ }
+ if idx == 0 {
+ throw("findfunc: bad findfunctab entry idx")
+ }
+ } else {
+ // linear search to find func with pc >= entry.
+ for datap.ftab[idx+1].entry <= pc {
+ idx++
+ }
+ }
+ funcoff := datap.ftab[idx].funcoff
+ if funcoff == ^uintptr(0) {
+ // With multiple text sections, there may be functions inserted by the external
+ // linker that are not known by Go. This means there may be holes in the PC
+ // range covered by the func table. The invalid funcoff value indicates a hole.
+ // See also cmd/link/internal/ld/pcln.go:pclntab
+ return funcInfo{}
+ }
+ return funcInfo{(*_func)(unsafe.Pointer(&datap.pclntable[funcoff])), datap}
+}
+
+type pcvalueCache struct {
+ entries [2][8]pcvalueCacheEnt
+}
+
+type pcvalueCacheEnt struct {
+ // targetpc and off together are the key of this cache entry.
+ targetpc uintptr
+ off uint32
+ // val is the value of this cached pcvalue entry.
+ val int32
+}
+
+// pcvalueCacheKey returns the outermost index in a pcvalueCache to use for targetpc.
+// It must be very cheap to calculate.
+// For now, align to sys.PtrSize and reduce mod the number of entries.
+// In practice, this appears to be fairly randomly and evenly distributed.
+func pcvalueCacheKey(targetpc uintptr) uintptr {
+ return (targetpc / sys.PtrSize) % uintptr(len(pcvalueCache{}.entries))
+}
+
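+// For illustration (assuming a 64-bit target, so sys.PtrSize == 8 and two
+// cache rows): targetpc 0x4010 maps to (0x4010/8)%2 = 0 while 0x4018 maps
+// to 1, so nearby PCs tend to spread across both rows.
+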
+// Returns the PCData value, and the PC where this value starts.
+// TODO: the start PC is returned only when cache is nil.
+func pcvalue(f funcInfo, off uint32, targetpc uintptr, cache *pcvalueCache, strict bool) (int32, uintptr) {
+ if off == 0 {
+ return -1, 0
+ }
+
+ // Check the cache. This speeds up walks of deep stacks, which
+ // tend to have the same recursive functions over and over.
+ //
+ // This cache is small enough that full associativity is
+ // cheaper than doing the hashing for a less associative
+ // cache.
+ if cache != nil {
+ x := pcvalueCacheKey(targetpc)
+ for i := range cache.entries[x] {
+ // We check off first because we're more
+ // likely to have multiple entries with
+ // different offsets for the same targetpc
+ // than the other way around, so we'll usually
+ // fail in the first clause.
+ ent := &cache.entries[x][i]
+ if ent.off == off && ent.targetpc == targetpc {
+ return ent.val, 0
+ }
+ }
+ }
+
+ if !f.valid() {
+ if strict && panicking == 0 {
+ print("runtime: no module data for ", hex(f.entry), "\n")
+ throw("no module data")
+ }
+ return -1, 0
+ }
+ datap := f.datap
+ p := datap.pctab[off:]
+ pc := f.entry
+ prevpc := pc
+ val := int32(-1)
+ for {
+ var ok bool
+ p, ok = step(p, &pc, &val, pc == f.entry)
+ if !ok {
+ break
+ }
+ if targetpc < pc {
+ // Replace a random entry in the cache. Random
+ // replacement prevents a performance cliff if
+ // a recursive stack's cycle is slightly
+ // larger than the cache.
+ // Put the new element at the beginning,
+ // since it is the most likely to be newly used.
+ if cache != nil {
+ x := pcvalueCacheKey(targetpc)
+ e := &cache.entries[x]
+ ci := fastrand() % uint32(len(cache.entries[x]))
+ e[ci] = e[0]
+ e[0] = pcvalueCacheEnt{
+ targetpc: targetpc,
+ off: off,
+ val: val,
+ }
+ }
+
+ return val, prevpc
+ }
+ prevpc = pc
+ }
+
+ // If there was a table, it should have covered all program counters.
+ // If not, something is wrong.
+ if panicking != 0 || !strict {
+ return -1, 0
+ }
+
+ print("runtime: invalid pc-encoded table f=", funcname(f), " pc=", hex(pc), " targetpc=", hex(targetpc), " tab=", p, "\n")
+
+ p = datap.pctab[off:]
+ pc = f.entry
+ val = -1
+ for {
+ var ok bool
+ p, ok = step(p, &pc, &val, pc == f.entry)
+ if !ok {
+ break
+ }
+ print("\tvalue=", val, " until pc=", hex(pc), "\n")
+ }
+
+ throw("invalid runtime symbol table")
+ return -1, 0
+}
+
+func cfuncname(f funcInfo) *byte {
+ if !f.valid() || f.nameoff == 0 {
+ return nil
+ }
+ return &f.datap.funcnametab[f.nameoff]
+}
+
+func funcname(f funcInfo) string {
+ return gostringnocopy(cfuncname(f))
+}
+
+func funcpkgpath(f funcInfo) string {
+ name := funcname(f)
+ i := len(name) - 1
+ for ; i > 0; i-- {
+ if name[i] == '/' {
+ break
+ }
+ }
+ for ; i < len(name); i++ {
+ if name[i] == '.' {
+ break
+ }
+ }
+ return name[:i]
+}
+
+func cfuncnameFromNameoff(f funcInfo, nameoff int32) *byte {
+ if !f.valid() {
+ return nil
+ }
+ return &f.datap.funcnametab[nameoff]
+}
+
+func funcnameFromNameoff(f funcInfo, nameoff int32) string {
+ return gostringnocopy(cfuncnameFromNameoff(f, nameoff))
+}
+
+func funcfile(f funcInfo, fileno int32) string {
+ datap := f.datap
+ if !f.valid() {
+ return "?"
+ }
+ // Make sure the cu index and file offset are valid
+ if fileoff := datap.cutab[f.cuOffset+uint32(fileno)]; fileoff != ^uint32(0) {
+ return gostringnocopy(&datap.filetab[fileoff])
+ }
+ // pcln section is corrupt.
+ return "?"
+}
+
+func funcline1(f funcInfo, targetpc uintptr, strict bool) (file string, line int32) {
+ datap := f.datap
+ if !f.valid() {
+ return "?", 0
+ }
+ fileno, _ := pcvalue(f, f.pcfile, targetpc, nil, strict)
+ line, _ = pcvalue(f, f.pcln, targetpc, nil, strict)
+ if fileno == -1 || line == -1 || int(fileno) >= len(datap.filetab) {
+ // print("looking for ", hex(targetpc), " in ", funcname(f), " got file=", fileno, " line=", lineno, "\n")
+ return "?", 0
+ }
+ file = funcfile(f, fileno)
+ return
+}
+
+func funcline(f funcInfo, targetpc uintptr) (file string, line int32) {
+ return funcline1(f, targetpc, true)
+}
+
+func funcspdelta(f funcInfo, targetpc uintptr, cache *pcvalueCache) int32 {
+ x, _ := pcvalue(f, f.pcsp, targetpc, cache, true)
+ if x&(sys.PtrSize-1) != 0 {
+ print("invalid spdelta ", funcname(f), " ", hex(f.entry), " ", hex(targetpc), " ", hex(f.pcsp), " ", x, "\n")
+ }
+ return x
+}
+
+// funcMaxSPDelta returns the maximum spdelta at any point in f.
+func funcMaxSPDelta(f funcInfo) int32 {
+ datap := f.datap
+ p := datap.pctab[f.pcsp:]
+ pc := f.entry
+ val := int32(-1)
+ max := int32(0)
+ for {
+ var ok bool
+ p, ok = step(p, &pc, &val, pc == f.entry)
+ if !ok {
+ return max
+ }
+ if val > max {
+ max = val
+ }
+ }
+}
+
+func pcdatastart(f funcInfo, table uint32) uint32 {
+ return *(*uint32)(add(unsafe.Pointer(&f.nfuncdata), unsafe.Sizeof(f.nfuncdata)+uintptr(table)*4))
+}
+
+func pcdatavalue(f funcInfo, table uint32, targetpc uintptr, cache *pcvalueCache) int32 {
+ if table >= f.npcdata {
+ return -1
+ }
+ r, _ := pcvalue(f, pcdatastart(f, table), targetpc, cache, true)
+ return r
+}
+
+func pcdatavalue1(f funcInfo, table uint32, targetpc uintptr, cache *pcvalueCache, strict bool) int32 {
+ if table >= f.npcdata {
+ return -1
+ }
+ r, _ := pcvalue(f, pcdatastart(f, table), targetpc, cache, strict)
+ return r
+}
+
+// Like pcdatavalue, but also return the start PC of this PCData value.
+// It doesn't take a cache.
+func pcdatavalue2(f funcInfo, table uint32, targetpc uintptr) (int32, uintptr) {
+ if table >= f.npcdata {
+ return -1, 0
+ }
+ return pcvalue(f, pcdatastart(f, table), targetpc, nil, true)
+}
+
+func funcdata(f funcInfo, i uint8) unsafe.Pointer {
+ if i < 0 || i >= f.nfuncdata {
+ return nil
+ }
+ p := add(unsafe.Pointer(&f.nfuncdata), unsafe.Sizeof(f.nfuncdata)+uintptr(f.npcdata)*4)
+ if sys.PtrSize == 8 && uintptr(p)&4 != 0 {
+ if uintptr(unsafe.Pointer(f._func))&4 != 0 {
+ println("runtime: misaligned func", f._func)
+ }
+ p = add(p, 4)
+ }
+ return *(*unsafe.Pointer)(add(p, uintptr(i)*sys.PtrSize))
+}
+
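+// Layout note (a sketch of what the code above assumes): the variable-length
+// tail of a _func is npcdata uint32 pcdata offsets immediately after the
+// nfuncdata field, followed by the funcdata pointers, padded to 8-byte
+// alignment on 64-bit targets, which is why funcdata may skip 4 bytes before
+// indexing.
+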
+// step advances to the next pc, value pair in the encoded table.
+func step(p []byte, pc *uintptr, val *int32, first bool) (newp []byte, ok bool) {
+ // For both uvdelta and pcdelta, the common case (~70%)
+ // is that they are a single byte. If so, avoid calling readvarint.
+ uvdelta := uint32(p[0])
+ if uvdelta == 0 && !first {
+ return nil, false
+ }
+ n := uint32(1)
+ if uvdelta&0x80 != 0 {
+ n, uvdelta = readvarint(p)
+ }
+ *val += int32(-(uvdelta & 1) ^ (uvdelta >> 1))
+ p = p[n:]
+
+ pcdelta := uint32(p[0])
+ n = 1
+ if pcdelta&0x80 != 0 {
+ n, pcdelta = readvarint(p)
+ }
+ p = p[n:]
+ *pc += uintptr(pcdelta * sys.PCQuantum)
+ return p, true
+}
+
+// readvarint reads a varint from p.
+func readvarint(p []byte) (read uint32, val uint32) {
+ var v, shift, n uint32
+ for {
+ b := p[n]
+ n++
+ v |= uint32(b&0x7F) << (shift & 31)
+ if b&0x80 == 0 {
+ break
+ }
+ shift += 7
+ }
+ return n, v
+}
+
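+// For illustration (an assumption-level sketch of the encoding consumed by
+// step above, not a new API): value deltas are zig-zag encoded before being
+// written as varints, and step undoes that with
+//
+//	delta := int32(-(u & 1) ^ (u >> 1)) // u is the decoded uint32 varint
+//
+// so u = 0, 1, 2, 3 decode to deltas 0, -1, 1, -2 respectively, keeping
+// small deltas of either sign short.
+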
+type stackmap struct {
+ n int32 // number of bitmaps
+ nbit int32 // number of bits in each bitmap
+ bytedata [1]byte // bitmaps, each starting on a byte boundary
+}
+
+//go:nowritebarrier
+func stackmapdata(stkmap *stackmap, n int32) bitvector {
+	// Check this invariant only when stackDebug is enabled.
+ // The invariant is already checked by many of stackmapdata's callers,
+ // and disabling it by default allows stackmapdata to be inlined.
+ if stackDebug > 0 && (n < 0 || n >= stkmap.n) {
+ throw("stackmapdata: index out of range")
+ }
+ return bitvector{stkmap.nbit, addb(&stkmap.bytedata[0], uintptr(n*((stkmap.nbit+7)>>3)))}
+}
+
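+// For example (illustrative numbers): with nbit = 10, each bitmap occupies
+// (10+7)>>3 = 2 bytes, so bitmap n starts at byte offset n*2 of bytedata.
+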
+// inlinedCall is the encoding of entries in the FUNCDATA_InlTree table.
+type inlinedCall struct {
+ parent int16 // index of parent in the inltree, or < 0
+ funcID funcID // type of the called function
+ _ byte
+ file int32 // perCU file index for inlined call. See cmd/link:pcln.go
+ line int32 // line number of the call site
+ func_ int32 // offset into pclntab for name of called function
+ parentPc int32 // position of an instruction whose source position is the call site (offset from entry)
+}
diff --git a/src/runtime/symtab_test.go b/src/runtime/symtab_test.go
new file mode 100644
index 0000000..ffa07c7
--- /dev/null
+++ b/src/runtime/symtab_test.go
@@ -0,0 +1,252 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "strings"
+ "testing"
+ "unsafe"
+)
+
+func TestCaller(t *testing.T) {
+ procs := runtime.GOMAXPROCS(-1)
+ c := make(chan bool, procs)
+ for p := 0; p < procs; p++ {
+ go func() {
+ for i := 0; i < 1000; i++ {
+ testCallerFoo(t)
+ }
+ c <- true
+ }()
+ defer func() {
+ <-c
+ }()
+ }
+}
+
+// These are marked noinline so that we can use FuncForPC
+// in testCallerBar.
+//go:noinline
+func testCallerFoo(t *testing.T) {
+ testCallerBar(t)
+}
+
+//go:noinline
+func testCallerBar(t *testing.T) {
+ for i := 0; i < 2; i++ {
+ pc, file, line, ok := runtime.Caller(i)
+ f := runtime.FuncForPC(pc)
+ if !ok ||
+ !strings.HasSuffix(file, "symtab_test.go") ||
+ (i == 0 && !strings.HasSuffix(f.Name(), "testCallerBar")) ||
+ (i == 1 && !strings.HasSuffix(f.Name(), "testCallerFoo")) ||
+ line < 5 || line > 1000 ||
+ f.Entry() >= pc {
+ t.Errorf("incorrect symbol info %d: %t %d %d %s %s %d",
+ i, ok, f.Entry(), pc, f.Name(), file, line)
+ }
+ }
+}
+
+func lineNumber() int {
+ _, _, line, _ := runtime.Caller(1)
+ return line // return 0 for error
+}
+
+// Do not add/remove lines in this block without updating the line numbers.
+var firstLine = lineNumber() // 0
+var ( // 1
+ lineVar1 = lineNumber() // 2
+ lineVar2a, lineVar2b = lineNumber(), lineNumber() // 3
+) // 4
+var compLit = []struct { // 5
+ lineA, lineB int // 6
+}{ // 7
+ { // 8
+ lineNumber(), lineNumber(), // 9
+ }, // 10
+ { // 11
+ lineNumber(), // 12
+ lineNumber(), // 13
+ }, // 14
+ { // 15
+ lineB: lineNumber(), // 16
+ lineA: lineNumber(), // 17
+ }, // 18
+} // 19
+var arrayLit = [...]int{lineNumber(), // 20
+ lineNumber(), lineNumber(), // 21
+ lineNumber(), // 22
+} // 23
+var sliceLit = []int{lineNumber(), // 24
+ lineNumber(), lineNumber(), // 25
+ lineNumber(), // 26
+} // 27
+var mapLit = map[int]int{ // 28
+ 29: lineNumber(), // 29
+ 30: lineNumber(), // 30
+ lineNumber(): 31, // 31
+ lineNumber(): 32, // 32
+} // 33
+var intLit = lineNumber() + // 34
+ lineNumber() + // 35
+ lineNumber() // 36
+func trythis() { // 37
+ recordLines(lineNumber(), // 38
+ lineNumber(), // 39
+ lineNumber()) // 40
+}
+
+// Modifications below this line are okay.
+
+var l38, l39, l40 int
+
+func recordLines(a, b, c int) {
+ l38 = a
+ l39 = b
+ l40 = c
+}
+
+func TestLineNumber(t *testing.T) {
+ trythis()
+ for _, test := range []struct {
+ name string
+ val int
+ want int
+ }{
+ {"firstLine", firstLine, 0},
+ {"lineVar1", lineVar1, 2},
+ {"lineVar2a", lineVar2a, 3},
+ {"lineVar2b", lineVar2b, 3},
+ {"compLit[0].lineA", compLit[0].lineA, 9},
+ {"compLit[0].lineB", compLit[0].lineB, 9},
+ {"compLit[1].lineA", compLit[1].lineA, 12},
+ {"compLit[1].lineB", compLit[1].lineB, 13},
+ {"compLit[2].lineA", compLit[2].lineA, 17},
+ {"compLit[2].lineB", compLit[2].lineB, 16},
+
+ {"arrayLit[0]", arrayLit[0], 20},
+ {"arrayLit[1]", arrayLit[1], 21},
+ {"arrayLit[2]", arrayLit[2], 21},
+ {"arrayLit[3]", arrayLit[3], 22},
+
+ {"sliceLit[0]", sliceLit[0], 24},
+ {"sliceLit[1]", sliceLit[1], 25},
+ {"sliceLit[2]", sliceLit[2], 25},
+ {"sliceLit[3]", sliceLit[3], 26},
+
+ {"mapLit[29]", mapLit[29], 29},
+ {"mapLit[30]", mapLit[30], 30},
+ {"mapLit[31]", mapLit[31+firstLine] + firstLine, 31}, // nb it's the key not the value
+ {"mapLit[32]", mapLit[32+firstLine] + firstLine, 32}, // nb it's the key not the value
+
+ {"intLit", intLit - 2*firstLine, 34 + 35 + 36},
+
+ {"l38", l38, 38},
+ {"l39", l39, 39},
+ {"l40", l40, 40},
+ } {
+ if got := test.val - firstLine; got != test.want {
+ t.Errorf("%s on firstLine+%d want firstLine+%d (firstLine=%d, val=%d)",
+ test.name, got, test.want, firstLine, test.val)
+ }
+ }
+}
+
+func TestNilName(t *testing.T) {
+ defer func() {
+ if ex := recover(); ex != nil {
+			t.Fatalf("unexpected panic: %v", ex)
+ }
+ }()
+ if got := (*runtime.Func)(nil).Name(); got != "" {
+ t.Errorf("Name() = %q, want %q", got, "")
+ }
+}
+
+var dummy int
+
+func inlined() {
+ // Side effect to prevent elimination of this entire function.
+ dummy = 42
+}
+
+// A function with an InlTree. Returns a PC within the function body.
+//
+// Marked noinline to ensure this complete function appears in the output.
+//
+//go:noinline
+func tracebackFunc(t *testing.T) uintptr {
+ // This body must be more complex than a single call to inlined to get
+ // an inline tree.
+ inlined()
+ inlined()
+
+ // Acquire a PC in this function.
+ pc, _, _, ok := runtime.Caller(0)
+ if !ok {
+ t.Fatalf("Caller(0) got ok false, want true")
+ }
+
+ return pc
+}
+
+// Test that CallersFrames handles PCs in the alignment region between
+// functions (int 3 on amd64) without crashing.
+//
+// Go will never generate a stack trace containing such an address, as it is
+// not a valid call site. However, the cgo traceback function passed to
+// runtime.SetCgoTraceback may not be completely accurate and may incorrectly
+// provide PCs in Go code or the alignment region between functions.
+//
+// Go obviously doesn't easily expose the problematic PCs to running programs,
+// so this test is a bit fragile. Some details:
+//
+// * tracebackFunc is our target function. We want to get a PC in the
+// alignment region following this function. This function also has other
+// functions inlined into it to ensure it has an InlTree (this was the source
+// of the bug in issue 44971).
+//
+// * We acquire a PC in tracebackFunc, walking forwards until FuncForPC says
+// we're in a new function. The last PC of the function according to FuncForPC
+// should be in the alignment region (assuming the function isn't already
+// perfectly aligned).
+//
+// This is a regression test for issue 44971.
+func TestFunctionAlignmentTraceback(t *testing.T) {
+ pc := tracebackFunc(t)
+
+ // Double-check we got the right PC.
+ f := runtime.FuncForPC(pc)
+ if !strings.HasSuffix(f.Name(), "tracebackFunc") {
+ t.Fatalf("Caller(0) = %+v, want tracebackFunc", f)
+ }
+
+	// Iterate forward until we find a different function, then back up
+	// one instruction, which is (hopefully) an alignment instruction.
+ for runtime.FuncForPC(pc) == f {
+ pc++
+ }
+ pc--
+
+ // Is this an alignment region filler instruction? We only check this
+ // on amd64 for simplicity. If this function has no filler, then we may
+ // get a false negative, but will never get a false positive.
+ if runtime.GOARCH == "amd64" {
+ code := *(*uint8)(unsafe.Pointer(pc))
+ if code != 0xcc { // INT $3
+ t.Errorf("PC %v code got %#x want 0xcc", pc, code)
+ }
+ }
+
+ // Finally ensure that Frames.Next doesn't crash when processing this
+ // PC.
+ frames := runtime.CallersFrames([]uintptr{pc})
+ frame, _ := frames.Next()
+ if frame.Func != f {
+ t.Errorf("frames.Next() got %+v want %+v", frame.Func, f)
+ }
+}
diff --git a/src/runtime/sys_aix_ppc64.s b/src/runtime/sys_aix_ppc64.s
new file mode 100644
index 0000000..a56d043
--- /dev/null
+++ b/src/runtime/sys_aix_ppc64.s
@@ -0,0 +1,315 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build aix
+// +build ppc64 ppc64le
+
+//
+// System calls and other sys.stuff for ppc64, Aix
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "asm_ppc64x.h"
+
+// This function calls a C function with the function descriptor in R12
+TEXT callCfunction<>(SB), NOSPLIT|NOFRAME,$0
+ MOVD 0(R12), R12
+ MOVD R2, 40(R1)
+ MOVD 0(R12), R0
+ MOVD 8(R12), R2
+ MOVD R0, CTR
+ BR (CTR)
+
+
+// asmsyscall6 calls a library function with a function descriptor
+// stored in libcall_fn and stores the results in the libcall structure.
+// Up to 6 arguments can be passed to this C function.
+// Called by runtime.asmcgocall.
+// It reserves a stack of 288 bytes for the C function.
+// NOT USING GO CALLING CONVENTION
+// runtime.asmsyscall6 is a function descriptor to the real asmsyscall6.
+DATA runtime·asmsyscall6+0(SB)/8, $asmsyscall6<>(SB)
+DATA runtime·asmsyscall6+8(SB)/8, $TOC(SB)
+DATA runtime·asmsyscall6+16(SB)/8, $0
+GLOBL runtime·asmsyscall6(SB), NOPTR, $24
+
+TEXT asmsyscall6<>(SB),NOSPLIT,$256
+ MOVD R3, 48(R1) // Save libcall for later
+ MOVD libcall_fn(R3), R12
+ MOVD libcall_args(R3), R9
+ MOVD 0(R9), R3
+ MOVD 8(R9), R4
+ MOVD 16(R9), R5
+ MOVD 24(R9), R6
+ MOVD 32(R9), R7
+ MOVD 40(R9), R8
+ BL callCfunction<>(SB)
+
+ // Restore R0 and TOC
+ XOR R0, R0
+ MOVD 40(R1), R2
+
+ // Store result in libcall
+ MOVD 48(R1), R5
+ MOVD R3, (libcall_r1)(R5)
+ MOVD $-1, R6
+ CMP R6, R3
+ BNE skiperrno
+
+ // Save errno in libcall
+ BL runtime·load_g(SB)
+ MOVD g_m(g), R4
+ MOVD (m_mOS + mOS_perrno)(R4), R9
+ MOVW 0(R9), R9
+ MOVD R9, (libcall_err)(R5)
+ RET
+skiperrno:
+ // Reset errno if no error has been returned
+ MOVD R0, (libcall_err)(R5)
+ RET
+
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R3
+ MOVD info+16(FP), R4
+ MOVD ctx+24(FP), R5
+ MOVD fn+0(FP), R12
+ // fn is a function descriptor
+ // R2 must be saved on restore
+ MOVD 0(R12), R0
+ MOVD R2, 40(R1)
+ MOVD 8(R12), R2
+ MOVD R0, CTR
+ BL (CTR)
+ MOVD 40(R1), R2
+ BL runtime·reginit(SB)
+ RET
+
+
+// runtime.sigtramp is a function descriptor to the real sigtramp.
+DATA runtime·sigtramp+0(SB)/8, $sigtramp<>(SB)
+DATA runtime·sigtramp+8(SB)/8, $TOC(SB)
+DATA runtime·sigtramp+16(SB)/8, $0
+GLOBL runtime·sigtramp(SB), NOPTR, $24
+
+// This function must not have any frame as we want to control how
+// every register is used.
+// TODO(aix): Implement SetCgoTraceback handler.
+TEXT sigtramp<>(SB),NOSPLIT|NOFRAME,$0
+ MOVD LR, R0
+ MOVD R0, 16(R1)
+ // initialize essential registers (just in case)
+ BL runtime·reginit(SB)
+
+ // Note that we are executing on altsigstack here, so we have
+ // more stack available than NOSPLIT would have us believe.
+ // To defeat the linker, we make our own stack frame with
+ // more space.
+ SUB $144+FIXED_FRAME, R1
+
+ // Save registers
+ MOVD R31, 56(R1)
+ MOVD g, 64(R1)
+ MOVD R29, 72(R1)
+ MOVD R14, 80(R1)
+ MOVD R15, 88(R1)
+
+ BL runtime·load_g(SB)
+
+ CMP $0, g
+ BEQ sigtramp // g == nil
+ MOVD g_m(g), R6
+ CMP $0, R6
+ BEQ sigtramp // g.m == nil
+
+ // Save m->libcall. We need to do this because we
+ // might get interrupted by a signal in runtime·asmcgocall.
+ MOVD (m_libcall+libcall_fn)(R6), R7
+ MOVD R7, 96(R1)
+ MOVD (m_libcall+libcall_args)(R6), R7
+ MOVD R7, 104(R1)
+ MOVD (m_libcall+libcall_n)(R6), R7
+ MOVD R7, 112(R1)
+ MOVD (m_libcall+libcall_r1)(R6), R7
+ MOVD R7, 120(R1)
+ MOVD (m_libcall+libcall_r2)(R6), R7
+ MOVD R7, 128(R1)
+
+ // save errno, it might be EINTR; stuff we do here might reset it.
+ MOVD (m_mOS+mOS_perrno)(R6), R8
+ MOVD 0(R8), R8
+ MOVD R8, 136(R1)
+
+sigtramp:
+ MOVW R3, FIXED_FRAME+0(R1)
+ MOVD R4, FIXED_FRAME+8(R1)
+ MOVD R5, FIXED_FRAME+16(R1)
+ MOVD $runtime·sigtrampgo(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+
+ CMP $0, g
+ BEQ exit // g == nil
+ MOVD g_m(g), R6
+ CMP $0, R6
+ BEQ exit // g.m == nil
+
+ // restore libcall
+ MOVD 96(R1), R7
+ MOVD R7, (m_libcall+libcall_fn)(R6)
+ MOVD 104(R1), R7
+ MOVD R7, (m_libcall+libcall_args)(R6)
+ MOVD 112(R1), R7
+ MOVD R7, (m_libcall+libcall_n)(R6)
+ MOVD 120(R1), R7
+ MOVD R7, (m_libcall+libcall_r1)(R6)
+ MOVD 128(R1), R7
+ MOVD R7, (m_libcall+libcall_r2)(R6)
+
+ // restore errno
+ MOVD (m_mOS+mOS_perrno)(R6), R7
+ MOVD 136(R1), R8
+ MOVD R8, 0(R7)
+
+exit:
+ // restore registers
+ MOVD 56(R1),R31
+ MOVD 64(R1),g
+ MOVD 72(R1),R29
+ MOVD 80(R1), R14
+ MOVD 88(R1), R15
+
+ // Don't use RET because we need to restore R31 !
+ ADD $144+FIXED_FRAME, R1
+ MOVD 16(R1), R0
+ MOVD R0, LR
+ BR (LR)
+
+// runtime.tstart is a function descriptor to the real tstart.
+DATA runtime·tstart+0(SB)/8, $tstart<>(SB)
+DATA runtime·tstart+8(SB)/8, $TOC(SB)
+DATA runtime·tstart+16(SB)/8, $0
+GLOBL runtime·tstart(SB), NOPTR, $24
+
+TEXT tstart<>(SB),NOSPLIT,$0
+ XOR R0, R0 // reset R0
+
+ // set g
+ MOVD m_g0(R3), g
+ BL runtime·save_g(SB)
+ MOVD R3, g_m(g)
+
+ // Layout new m scheduler stack on os stack.
+ MOVD R1, R3
+ MOVD R3, (g_stack+stack_hi)(g)
+ SUB $(const_threadStackSize), R3 // stack size
+ MOVD R3, (g_stack+stack_lo)(g)
+ ADD $const__StackGuard, R3
+ MOVD R3, g_stackguard0(g)
+ MOVD R3, g_stackguard1(g)
+
+ BL runtime·mstart(SB)
+
+ MOVD R0, R3
+ RET
+
+
+#define CSYSCALL() \
+ MOVD 0(R12), R12 \
+ MOVD R2, 40(R1) \
+ MOVD 0(R12), R0 \
+ MOVD 8(R12), R2 \
+ MOVD R0, CTR \
+ BL (CTR) \
+ MOVD 40(R1), R2 \
+ BL runtime·reginit(SB)
+
+
+// Runs on OS stack, called from runtime·osyield.
+TEXT runtime·osyield1(SB),NOSPLIT,$0
+ MOVD $libc_sched_yield(SB), R12
+ CSYSCALL()
+ RET
+
+
+// Runs on OS stack, called from runtime·sigprocmask.
+TEXT runtime·sigprocmask1(SB),NOSPLIT,$0-24
+ MOVD how+0(FP), R3
+ MOVD new+8(FP), R4
+ MOVD old+16(FP), R5
+ MOVD $libpthread_sigthreadmask(SB), R12
+ CSYSCALL()
+ RET
+
+// Runs on OS stack, called from runtime·usleep.
+TEXT runtime·usleep1(SB),NOSPLIT,$0-4
+ MOVW us+0(FP), R3
+ MOVD $libc_usleep(SB), R12
+ CSYSCALL()
+ RET
+
+// Runs on OS stack, called from runtime·exit.
+TEXT runtime·exit1(SB),NOSPLIT,$0-4
+ MOVW code+0(FP), R3
+ MOVD $libc_exit(SB), R12
+ CSYSCALL()
+ RET
+
+// Runs on OS stack, called from runtime·write1.
+TEXT runtime·write2(SB),NOSPLIT,$0-28
+ MOVD fd+0(FP), R3
+ MOVD p+8(FP), R4
+ MOVW n+16(FP), R5
+ MOVD $libc_write(SB), R12
+ CSYSCALL()
+ MOVW R3, ret+24(FP)
+ RET
+
+// Runs on OS stack, called from runtime·pthread_attr_init.
+TEXT runtime·pthread_attr_init1(SB),NOSPLIT,$0-12
+ MOVD attr+0(FP), R3
+ MOVD $libpthread_attr_init(SB), R12
+ CSYSCALL()
+ MOVW R3, ret+8(FP)
+ RET
+
+// Runs on OS stack, called from runtime·pthread_attr_setstacksize.
+TEXT runtime·pthread_attr_setstacksize1(SB),NOSPLIT,$0-20
+ MOVD attr+0(FP), R3
+ MOVD size+8(FP), R4
+ MOVD $libpthread_attr_setstacksize(SB), R12
+ CSYSCALL()
+ MOVW R3, ret+16(FP)
+ RET
+
+// Runs on OS stack, called from runtime·pthread_setdetachstate.
+TEXT runtime·pthread_attr_setdetachstate1(SB),NOSPLIT,$0-20
+ MOVD attr+0(FP), R3
+ MOVW state+8(FP), R4
+ MOVD $libpthread_attr_setdetachstate(SB), R12
+ CSYSCALL()
+ MOVW R3, ret+16(FP)
+ RET
+
+// Runs on OS stack, called from runtime·pthread_create.
+TEXT runtime·pthread_create1(SB),NOSPLIT,$0-36
+ MOVD tid+0(FP), R3
+ MOVD attr+8(FP), R4
+ MOVD fn+16(FP), R5
+ MOVD arg+24(FP), R6
+ MOVD $libpthread_create(SB), R12
+ CSYSCALL()
+ MOVW R3, ret+32(FP)
+ RET
+
+// Runs on OS stack, called from runtime·sigaction.
+TEXT runtime·sigaction1(SB),NOSPLIT,$0-24
+ MOVD sig+0(FP), R3
+ MOVD new+8(FP), R4
+ MOVD old+16(FP), R5
+ MOVD $libc_sigaction(SB), R12
+ CSYSCALL()
+ RET
diff --git a/src/runtime/sys_arm.go b/src/runtime/sys_arm.go
new file mode 100644
index 0000000..730b9c9
--- /dev/null
+++ b/src/runtime/sys_arm.go
@@ -0,0 +1,21 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
+
+// for testing
+func usplit(x uint32) (q, r uint32)
diff --git a/src/runtime/sys_arm64.go b/src/runtime/sys_arm64.go
new file mode 100644
index 0000000..230241d
--- /dev/null
+++ b/src/runtime/sys_arm64.go
@@ -0,0 +1,18 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
diff --git a/src/runtime/sys_darwin.go b/src/runtime/sys_darwin.go
new file mode 100644
index 0000000..4a3f2fc
--- /dev/null
+++ b/src/runtime/sys_darwin.go
@@ -0,0 +1,463 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// The X versions of syscall expect the libc call to return a 64-bit result.
+// The non-X versions expect a 32-bit result.
+// This distinction is required because an error is indicated by returning -1,
+// and we need to know whether to check 32 or 64 bits of the result.
+// (Some libc functions that return 32 bits put junk in the upper 32 bits of AX.)
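+// For example (illustrative; based on the existing darwin syscall wrappers):
+// a libc call whose result is a full 64-bit value, such as lseek returning
+// an off_t, would have to go through an X variant so that a large legitimate
+// result is not mistaken for the 32-bit -1 error sentinel.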
+
+//go:linkname syscall_syscall syscall.syscall
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscall(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(funcPC(syscall)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscall()
+
+//go:linkname syscall_syscallX syscall.syscallX
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscallX(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ entersyscallblock()
+ libcCall(unsafe.Pointer(funcPC(syscallX)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscallX()
+
+//go:linkname syscall_syscall6 syscall.syscall6
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscall6(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(funcPC(syscall6)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscall6()
+
+//go:linkname syscall_syscall6X syscall.syscall6X
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscall6X(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(funcPC(syscall6X)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscall6X()
+
+//go:linkname syscall_syscallPtr syscall.syscallPtr
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscallPtr(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(funcPC(syscallPtr)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscallPtr()
+
+//go:linkname syscall_rawSyscall syscall.rawSyscall
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_rawSyscall(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ libcCall(unsafe.Pointer(funcPC(syscall)), unsafe.Pointer(&fn))
+ return
+}
+
+//go:linkname syscall_rawSyscall6 syscall.rawSyscall6
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_rawSyscall6(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ libcCall(unsafe.Pointer(funcPC(syscall6)), unsafe.Pointer(&fn))
+ return
+}
+
+// syscallNoErr is used in crypto/x509 to call into Security.framework and CF.
+
+//go:linkname crypto_x509_syscall crypto/x509/internal/macos.syscall
+//go:nosplit
+//go:cgo_unsafe_args
+func crypto_x509_syscall(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1 uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(funcPC(syscallNoErr)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscallNoErr()
+
+// The *_trampoline functions convert from the Go calling convention to the C calling convention
+// and then call the underlying libc function. They are defined in sys_darwin_$ARCH.s.
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_attr_init(attr *pthreadattr) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_attr_init_trampoline)), unsafe.Pointer(&attr))
+}
+func pthread_attr_init_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_attr_getstacksize(attr *pthreadattr, size *uintptr) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_attr_getstacksize_trampoline)), unsafe.Pointer(&attr))
+}
+func pthread_attr_getstacksize_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_attr_setdetachstate(attr *pthreadattr, state int) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_attr_setdetachstate_trampoline)), unsafe.Pointer(&attr))
+}
+func pthread_attr_setdetachstate_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_create(attr *pthreadattr, start uintptr, arg unsafe.Pointer) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_create_trampoline)), unsafe.Pointer(&attr))
+}
+func pthread_create_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func raise(sig uint32) {
+ libcCall(unsafe.Pointer(funcPC(raise_trampoline)), unsafe.Pointer(&sig))
+}
+func raise_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_self() (t pthread) {
+ libcCall(unsafe.Pointer(funcPC(pthread_self_trampoline)), unsafe.Pointer(&t))
+ return
+}
+func pthread_self_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_kill(t pthread, sig uint32) {
+ libcCall(unsafe.Pointer(funcPC(pthread_kill_trampoline)), unsafe.Pointer(&t))
+ return
+}
+func pthread_kill_trampoline()
+
+// mmap is used to do low-level memory allocation via mmap. Don't allow stack
+// splits, since this function (used by sysAlloc) is called in a lot of low-level
+// parts of the runtime and callers often assume it won't acquire any locks.
+//go:nosplit
+func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (unsafe.Pointer, int) {
+ args := struct {
+ addr unsafe.Pointer
+ n uintptr
+ prot, flags, fd int32
+ off uint32
+ ret1 unsafe.Pointer
+ ret2 int
+ }{addr, n, prot, flags, fd, off, nil, 0}
+ libcCall(unsafe.Pointer(funcPC(mmap_trampoline)), unsafe.Pointer(&args))
+ return args.ret1, args.ret2
+}
+func mmap_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func munmap(addr unsafe.Pointer, n uintptr) {
+ libcCall(unsafe.Pointer(funcPC(munmap_trampoline)), unsafe.Pointer(&addr))
+}
+func munmap_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func madvise(addr unsafe.Pointer, n uintptr, flags int32) {
+ libcCall(unsafe.Pointer(funcPC(madvise_trampoline)), unsafe.Pointer(&addr))
+}
+func madvise_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func mlock(addr unsafe.Pointer, n uintptr) {
+ libcCall(unsafe.Pointer(funcPC(mlock_trampoline)), unsafe.Pointer(&addr))
+}
+func mlock_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func read(fd int32, p unsafe.Pointer, n int32) int32 {
+ return libcCall(unsafe.Pointer(funcPC(read_trampoline)), unsafe.Pointer(&fd))
+}
+func read_trampoline()
+
+func pipe() (r, w int32, errno int32) {
+ var p [2]int32
+ errno = libcCall(unsafe.Pointer(funcPC(pipe_trampoline)), noescape(unsafe.Pointer(&p)))
+ return p[0], p[1], errno
+}
+func pipe_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func closefd(fd int32) int32 {
+ return libcCall(unsafe.Pointer(funcPC(close_trampoline)), unsafe.Pointer(&fd))
+}
+func close_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+//
+// This is exported via linkname to assembly in runtime/cgo.
+//go:linkname exit
+func exit(code int32) {
+ libcCall(unsafe.Pointer(funcPC(exit_trampoline)), unsafe.Pointer(&code))
+}
+func exit_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func usleep(usec uint32) {
+ libcCall(unsafe.Pointer(funcPC(usleep_trampoline)), unsafe.Pointer(&usec))
+}
+func usleep_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func write1(fd uintptr, p unsafe.Pointer, n int32) int32 {
+ return libcCall(unsafe.Pointer(funcPC(write_trampoline)), unsafe.Pointer(&fd))
+}
+func write_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func open(name *byte, mode, perm int32) (ret int32) {
+ return libcCall(unsafe.Pointer(funcPC(open_trampoline)), unsafe.Pointer(&name))
+}
+func open_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func nanotime1() int64 {
+ var r struct {
+ t int64 // raw timer
+ numer, denom uint32 // conversion factors. nanoseconds = t * numer / denom.
+ }
+ libcCall(unsafe.Pointer(funcPC(nanotime_trampoline)), unsafe.Pointer(&r))
+ // Note: Apple seems unconcerned about overflow here. See
+ // https://developer.apple.com/library/content/qa/qa1398/_index.html
+ // Note also, numer == denom == 1 is common.
+ t := r.t
+ if r.numer != 1 {
+ t *= int64(r.numer)
+ }
+ if r.denom != 1 {
+ t /= int64(r.denom)
+ }
+ return t
+}
+func nanotime_trampoline()
+
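+// To illustrate the timebase conversion above (illustrative values, not
+// guaranteed by Apple): hardware commonly reporting numer=125, denom=3
+// turns a raw reading of t = 24000 ticks into 24000*125/3 = 1000000ns,
+// i.e. 1ms; when numer == denom == 1 the raw value is already nanoseconds.
+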
+//go:nosplit
+//go:cgo_unsafe_args
+func walltime1() (int64, int32) {
+ var t timespec
+ libcCall(unsafe.Pointer(funcPC(walltime_trampoline)), unsafe.Pointer(&t))
+ return t.tv_sec, int32(t.tv_nsec)
+}
+func walltime_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sigaction(sig uint32, new *usigactiont, old *usigactiont) {
+ libcCall(unsafe.Pointer(funcPC(sigaction_trampoline)), unsafe.Pointer(&sig))
+}
+func sigaction_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sigprocmask(how uint32, new *sigset, old *sigset) {
+ libcCall(unsafe.Pointer(funcPC(sigprocmask_trampoline)), unsafe.Pointer(&how))
+}
+func sigprocmask_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sigaltstack(new *stackt, old *stackt) {
+ if new != nil && new.ss_flags&_SS_DISABLE != 0 && new.ss_size == 0 {
+ // Despite the fact that Darwin's sigaltstack man page says it ignores the size
+ // when SS_DISABLE is set, it doesn't. sigaltstack returns ENOMEM
+ // if we don't give it a reasonable size.
+ // ref: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140421/214296.html
+ new.ss_size = 32768
+ }
+ libcCall(unsafe.Pointer(funcPC(sigaltstack_trampoline)), unsafe.Pointer(&new))
+}
+func sigaltstack_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func raiseproc(sig uint32) {
+ libcCall(unsafe.Pointer(funcPC(raiseproc_trampoline)), unsafe.Pointer(&sig))
+}
+func raiseproc_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func setitimer(mode int32, new, old *itimerval) {
+ libcCall(unsafe.Pointer(funcPC(setitimer_trampoline)), unsafe.Pointer(&mode))
+}
+func setitimer_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sysctl(mib *uint32, miblen uint32, oldp *byte, oldlenp *uintptr, newp *byte, newlen uintptr) int32 {
+ return libcCall(unsafe.Pointer(funcPC(sysctl_trampoline)), unsafe.Pointer(&mib))
+}
+func sysctl_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sysctlbyname(name *byte, oldp *byte, oldlenp *uintptr, newp *byte, newlen uintptr) int32 {
+ return libcCall(unsafe.Pointer(funcPC(sysctlbyname_trampoline)), unsafe.Pointer(&name))
+}
+func sysctlbyname_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func fcntl(fd, cmd, arg int32) int32 {
+ return libcCall(unsafe.Pointer(funcPC(fcntl_trampoline)), unsafe.Pointer(&fd))
+}
+func fcntl_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func kqueue() int32 {
+ v := libcCall(unsafe.Pointer(funcPC(kqueue_trampoline)), nil)
+ return v
+}
+func kqueue_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func kevent(kq int32, ch *keventt, nch int32, ev *keventt, nev int32, ts *timespec) int32 {
+ return libcCall(unsafe.Pointer(funcPC(kevent_trampoline)), unsafe.Pointer(&kq))
+}
+func kevent_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_mutex_init(m *pthreadmutex, attr *pthreadmutexattr) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_mutex_init_trampoline)), unsafe.Pointer(&m))
+}
+func pthread_mutex_init_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_mutex_lock(m *pthreadmutex) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_mutex_lock_trampoline)), unsafe.Pointer(&m))
+}
+func pthread_mutex_lock_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_mutex_unlock(m *pthreadmutex) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_mutex_unlock_trampoline)), unsafe.Pointer(&m))
+}
+func pthread_mutex_unlock_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_cond_init(c *pthreadcond, attr *pthreadcondattr) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_cond_init_trampoline)), unsafe.Pointer(&c))
+}
+func pthread_cond_init_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_cond_wait(c *pthreadcond, m *pthreadmutex) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_cond_wait_trampoline)), unsafe.Pointer(&c))
+}
+func pthread_cond_wait_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_cond_timedwait_relative_np(c *pthreadcond, m *pthreadmutex, t *timespec) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_cond_timedwait_relative_np_trampoline)), unsafe.Pointer(&c))
+}
+func pthread_cond_timedwait_relative_np_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_cond_signal(c *pthreadcond) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_cond_signal_trampoline)), unsafe.Pointer(&c))
+}
+func pthread_cond_signal_trampoline()
+
+// Not used on Darwin, but must be defined.
+func exitThread(wait *uint32) {
+}
+
+//go:nosplit
+func closeonexec(fd int32) {
+ fcntl(fd, _F_SETFD, _FD_CLOEXEC)
+}
+
+//go:nosplit
+func setNonblock(fd int32) {
+ flags := fcntl(fd, _F_GETFL, 0)
+ fcntl(fd, _F_SETFL, flags|_O_NONBLOCK)
+}
+
+// Tell the linker that the libc_* functions are to be found
+// in a system library, with the libc_ prefix missing.
+
+//go:cgo_import_dynamic libc_pthread_attr_init pthread_attr_init "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_attr_getstacksize pthread_attr_getstacksize "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_attr_setdetachstate pthread_attr_setdetachstate "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_create pthread_create "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_self pthread_self "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_kill pthread_kill "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_exit _exit "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_raise raise "/usr/lib/libSystem.B.dylib"
+
+//go:cgo_import_dynamic libc_open open "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_close close "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_read read "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_write write "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pipe pipe "/usr/lib/libSystem.B.dylib"
+
+//go:cgo_import_dynamic libc_mmap mmap "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_munmap munmap "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_madvise madvise "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_mlock mlock "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_error __error "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_usleep usleep "/usr/lib/libSystem.B.dylib"
+
+//go:cgo_import_dynamic libc_mach_timebase_info mach_timebase_info "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_mach_absolute_time mach_absolute_time "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_clock_gettime clock_gettime "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_sigaction sigaction "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_sigmask pthread_sigmask "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_sigaltstack sigaltstack "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_getpid getpid "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_kill kill "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_setitimer setitimer "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_sysctl sysctl "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_sysctlbyname sysctlbyname "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_fcntl fcntl "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_kqueue kqueue "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_kevent kevent "/usr/lib/libSystem.B.dylib"
+
+//go:cgo_import_dynamic libc_pthread_mutex_init pthread_mutex_init "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_mutex_lock pthread_mutex_lock "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_mutex_unlock pthread_mutex_unlock "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_cond_init pthread_cond_init "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_cond_wait pthread_cond_wait "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_cond_timedwait_relative_np pthread_cond_timedwait_relative_np "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_cond_signal pthread_cond_signal "/usr/lib/libSystem.B.dylib"
diff --git a/src/runtime/sys_darwin_amd64.s b/src/runtime/sys_darwin_amd64.s
new file mode 100644
index 0000000..630fb5d
--- /dev/null
+++ b/src/runtime/sys_darwin_amd64.s
@@ -0,0 +1,870 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// System calls and other sys.stuff for AMD64, Darwin
+// System calls are implemented in libSystem; this file contains
+// trampolines that convert from the Go to the C calling convention.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME 0
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 0(DI), DI // arg 1 exit status
+ CALL libc_exit(SB)
+ MOVL $0xf1, 0xf1 // crash
+ POPQ BP
+ RET
+
+TEXT runtime·open_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 8(DI), SI // arg 2 flags
+ MOVL 12(DI), DX // arg 3 mode
+ MOVQ 0(DI), DI // arg 1 pathname
+ XORL AX, AX // vararg: say "no float args"
+ CALL libc_open(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·close_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 0(DI), DI // arg 1 fd
+ CALL libc_close(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·read_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 buf
+ MOVL 16(DI), DX // arg 3 count
+ MOVL 0(DI), DI // arg 1 fd
+ CALL libc_read(SB)
+ TESTL AX, AX
+ JGE noerr
+ CALL libc_error(SB)
+ MOVL (AX), AX
+ NEGL AX // caller expects negative errno value
+noerr:
+ POPQ BP
+ RET
+
+TEXT runtime·write_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 buf
+ MOVL 16(DI), DX // arg 3 count
+ MOVQ 0(DI), DI // arg 1 fd
+ CALL libc_write(SB)
+ TESTL AX, AX
+ JGE noerr
+ CALL libc_error(SB)
+ MOVL (AX), AX
+ NEGL AX // caller expects negative errno value
+noerr:
+ POPQ BP
+ RET
+
+TEXT runtime·pipe_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ CALL libc_pipe(SB) // pointer already in DI
+ TESTL AX, AX
+ JEQ 3(PC)
+ CALL libc_error(SB) // return negative errno value
+ NEGL AX
+ POPQ BP
+ RET
+
+TEXT runtime·setitimer_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 new
+ MOVQ 16(DI), DX // arg 3 old
+ MOVL 0(DI), DI // arg 1 which
+ CALL libc_setitimer(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·madvise_trampoline(SB), NOSPLIT, $0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 len
+ MOVL 16(DI), DX // arg 3 advice
+ MOVQ 0(DI), DI // arg 1 addr
+ CALL libc_madvise(SB)
+ // ignore failure - maybe pages are locked
+ POPQ BP
+ RET
+
+TEXT runtime·mlock_trampoline(SB), NOSPLIT, $0
+ UNDEF // unimplemented
+
+GLOBL timebase<>(SB),NOPTR,$(machTimebaseInfo__size)
+
+TEXT runtime·nanotime_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ DI, BX
+ CALL libc_mach_absolute_time(SB)
+ MOVQ AX, 0(BX)
+ MOVL timebase<>+machTimebaseInfo_numer(SB), SI
+ MOVL timebase<>+machTimebaseInfo_denom(SB), DI // atomic read
+ TESTL DI, DI
+ JNE initialized
+
+ SUBQ $(machTimebaseInfo__size+15)/16*16, SP
+ MOVQ SP, DI
+ CALL libc_mach_timebase_info(SB)
+ MOVL machTimebaseInfo_numer(SP), SI
+ MOVL machTimebaseInfo_denom(SP), DI
+ ADDQ $(machTimebaseInfo__size+15)/16*16, SP
+
+ MOVL SI, timebase<>+machTimebaseInfo_numer(SB)
+ MOVL DI, AX
+ XCHGL AX, timebase<>+machTimebaseInfo_denom(SB) // atomic write
+
+initialized:
+ MOVL SI, 8(BX)
+ MOVL DI, 12(BX)
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+TEXT runtime·walltime_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP // make a frame; keep stack aligned
+ MOVQ SP, BP
+ MOVQ DI, SI // arg 2 timespec
+ MOVL $CLOCK_REALTIME, DI // arg 1 clock_id
+ CALL libc_clock_gettime(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·sigaction_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 new
+ MOVQ 16(DI), DX // arg 3 old
+ MOVL 0(DI), DI // arg 1 sig
+ CALL libc_sigaction(SB)
+ TESTL AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ POPQ BP
+ RET
+
+TEXT runtime·sigprocmask_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 new
+ MOVQ 16(DI), DX // arg 3 old
+ MOVL 0(DI), DI // arg 1 how
+ CALL libc_pthread_sigmask(SB)
+ TESTL AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ POPQ BP
+ RET
+
+TEXT runtime·sigaltstack_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 old
+ MOVQ 0(DI), DI // arg 1 new
+ CALL libc_sigaltstack(SB)
+ TESTQ AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ POPQ BP
+ RET
+
+TEXT runtime·raiseproc_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 0(DI), BX // signal
+ CALL libc_getpid(SB)
+ MOVL AX, DI // arg 1 pid
+ MOVL BX, SI // arg 2 signal
+ CALL libc_kill(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVQ fn+0(FP), AX
+ MOVL sig+8(FP), DI
+ MOVQ info+16(FP), SI
+ MOVQ ctx+24(FP), DX
+ PUSHQ BP
+ MOVQ SP, BP
+ ANDQ $~15, SP // alignment for x86_64 ABI
+ CALL AX
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+// This is the function registered during sigaction and is invoked when
+// a signal is received. It just redirects to the Go function sigtrampgo.
+TEXT runtime·sigtramp(SB),NOSPLIT,$0
+ // This runs on the signal stack, so we have lots of stack available.
+ // We allocate our own stack space, because if we tell the linker
+ // how much we're using, the NOSPLIT check fails.
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $64, SP
+
+ // Save callee-save registers.
+ MOVQ BX, 24(SP)
+ MOVQ R12, 32(SP)
+ MOVQ R13, 40(SP)
+ MOVQ R14, 48(SP)
+ MOVQ R15, 56(SP)
+
+ // Call into the Go signal handler
+ MOVL DI, 0(SP) // sig
+ MOVQ SI, 8(SP) // info
+ MOVQ DX, 16(SP) // ctx
+ CALL runtime·sigtrampgo(SB)
+
+ // Restore callee-save registers.
+ MOVQ 24(SP), BX
+ MOVQ 32(SP), R12
+ MOVQ 40(SP), R13
+ MOVQ 48(SP), R14
+ MOVQ 56(SP), R15
+
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+// Used instead of sigtramp in programs that use cgo.
+// Arguments from kernel are in DI, SI, DX.
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ // If no traceback function, do usual sigtramp.
+ MOVQ runtime·cgoTraceback(SB), AX
+ TESTQ AX, AX
+ JZ sigtramp
+
+ // If no traceback support function, which means that
+ // runtime/cgo was not linked in, do usual sigtramp.
+ MOVQ _cgo_callers(SB), AX
+ TESTQ AX, AX
+ JZ sigtramp
+
+ // Figure out if we are currently in a cgo call.
+ // If not, just do usual sigtramp.
+ get_tls(CX)
+ MOVQ g(CX),AX
+ TESTQ AX, AX
+ JZ sigtrampnog // g == nil
+ MOVQ g_m(AX), AX
+ TESTQ AX, AX
+ JZ sigtramp // g.m == nil
+ MOVL m_ncgo(AX), CX
+ TESTL CX, CX
+ JZ sigtramp // g.m.ncgo == 0
+ MOVQ m_curg(AX), CX
+ TESTQ CX, CX
+ JZ sigtramp // g.m.curg == nil
+ MOVQ g_syscallsp(CX), CX
+ TESTQ CX, CX
+ JZ sigtramp // g.m.curg.syscallsp == 0
+ MOVQ m_cgoCallers(AX), R8
+ TESTQ R8, R8
+ JZ sigtramp // g.m.cgoCallers == nil
+ MOVL m_cgoCallersUse(AX), CX
+ TESTL CX, CX
+ JNZ sigtramp // g.m.cgoCallersUse != 0
+
+ // Jump to a function in runtime/cgo.
+ // That function, written in C, will call the user's traceback
+ // function with proper unwind info, and will then call back here.
+ // The first three arguments, and the fifth, are already in registers.
+ // Set the two remaining arguments now.
+ MOVQ runtime·cgoTraceback(SB), CX
+ MOVQ $runtime·sigtramp(SB), R9
+ MOVQ _cgo_callers(SB), AX
+ JMP AX
+
+sigtramp:
+ JMP runtime·sigtramp(SB)
+
+sigtrampnog:
+ // Signal arrived on a non-Go thread. If this is SIGPROF, get a
+ // stack trace.
+ CMPL DI, $27 // 27 == SIGPROF
+ JNZ sigtramp
+
+ // Lock sigprofCallersUse.
+ MOVL $0, AX
+ MOVL $1, CX
+ MOVQ $runtime·sigprofCallersUse(SB), R11
+ LOCK
+ CMPXCHGL CX, 0(R11)
+ JNZ sigtramp // Skip stack trace if already locked.
+
+ // Jump to the traceback function in runtime/cgo.
+ // It will call back to sigprofNonGo, which will ignore the
+ // arguments passed in registers.
+ // First three arguments to traceback function are in registers already.
+ MOVQ runtime·cgoTraceback(SB), CX
+ MOVQ $runtime·sigprofCallers(SB), R8
+ MOVQ $runtime·sigprofNonGo(SB), R9
+ MOVQ _cgo_callers(SB), AX
+ JMP AX
+
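+// mmap_trampoline reads its six arguments from the block passed in DI and
+// writes the (pointer, errno) results back at offsets 32 and 40. The Go-side
+// wrapper is expected to look roughly like this sketch (assuming the
+// runtime-internal libcCall/funcPC helpers as in sys_darwin.go; layout only,
+// not verbatim):
+//
+//	//go:nosplit
+//	//go:cgo_unsafe_args
+//	func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (unsafe.Pointer, int) {
+//		args := struct {
+//			addr            unsafe.Pointer
+//			n               uintptr
+//			prot, flags, fd int32
+//			off             uint32
+//			ret1            unsafe.Pointer
+//			ret2            int
+//		}{addr, n, prot, flags, fd, off, nil, 0}
+//		libcCall(unsafe.Pointer(funcPC(mmap_trampoline)), unsafe.Pointer(&args))
+//		return args.ret1, args.ret2
+//	}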
+TEXT runtime·mmap_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP // make a frame; keep stack aligned
+ MOVQ SP, BP
+ MOVQ DI, BX
+ MOVQ 0(BX), DI // arg 1 addr
+ MOVQ 8(BX), SI // arg 2 len
+ MOVL 16(BX), DX // arg 3 prot
+ MOVL 20(BX), CX // arg 4 flags
+ MOVL 24(BX), R8 // arg 5 fd
+ MOVL 28(BX), R9 // arg 6 offset
+ CALL libc_mmap(SB)
+ XORL DX, DX
+ CMPQ AX, $-1
+ JNE ok
+ CALL libc_error(SB)
+ MOVLQSX (AX), DX // errno
+ XORL AX, AX
+ok:
+ MOVQ AX, 32(BX)
+ MOVQ DX, 40(BX)
+ POPQ BP
+ RET
+
+TEXT runtime·munmap_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 len
+ MOVQ 0(DI), DI // arg 1 addr
+ CALL libc_munmap(SB)
+ TESTQ AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ POPQ BP
+ RET
+
+TEXT runtime·usleep_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 0(DI), DI // arg 1 usec
+ CALL libc_usleep(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·settls(SB),NOSPLIT,$32
+ // Nothing to do on Darwin, pthread already set thread-local storage up.
+ RET
+
+TEXT runtime·sysctl_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 8(DI), SI // arg 2 miblen
+ MOVQ 16(DI), DX // arg 3 oldp
+ MOVQ 24(DI), CX // arg 4 oldlenp
+ MOVQ 32(DI), R8 // arg 5 newp
+ MOVQ 40(DI), R9 // arg 6 newlen
+ MOVQ 0(DI), DI // arg 1 mib
+ CALL libc_sysctl(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·sysctlbyname_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 oldp
+ MOVQ 16(DI), DX // arg 3 oldlenp
+ MOVQ 24(DI), CX // arg 4 newp
+ MOVQ 32(DI), R8 // arg 5 newlen
+ MOVQ 0(DI), DI // arg 1 name
+ CALL libc_sysctlbyname(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·kqueue_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ CALL libc_kqueue(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·kevent_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 keventt
+ MOVL 16(DI), DX // arg 3 nch
+ MOVQ 24(DI), CX // arg 4 ev
+ MOVL 32(DI), R8 // arg 5 nev
+ MOVQ 40(DI), R9 // arg 6 ts
+ MOVL 0(DI), DI // arg 1 kq
+ CALL libc_kevent(SB)
+ CMPL AX, $-1
+ JNE ok
+ CALL libc_error(SB)
+ MOVLQSX (AX), AX // errno
+ NEGQ AX // caller wants it as a negative error code
+ok:
+ POPQ BP
+ RET
+
+TEXT runtime·fcntl_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 4(DI), SI // arg 2 cmd
+ MOVL 8(DI), DX // arg 3 arg
+ MOVL 0(DI), DI // arg 1 fd
+ XORL AX, AX // vararg: say "no float args"
+ CALL libc_fcntl(SB)
+ POPQ BP
+ RET
+
+// mstart_stub is the first function executed on a new thread started by pthread_create.
+// It just does some low-level setup and then calls mstart.
+// Note: called with the C calling convention.
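+//
+// For context, the Go side (newosproc) is expected to start the thread along
+// these lines (a sketch; exact helper names and constants assumed):
+//
+//	var attr pthreadattr
+//	pthread_attr_init(&attr)
+//	pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED)
+//	pthread_create(&attr, funcPC(mstart_stub), unsafe.Pointer(mp)) // mp arrives in DI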
+TEXT runtime·mstart_stub(SB),NOSPLIT,$0
+ // DI points to the m.
+ // We are already on m's g0 stack.
+
+ // Save callee-save registers.
+ SUBQ $40, SP
+ MOVQ BX, 0(SP)
+ MOVQ R12, 8(SP)
+ MOVQ R13, 16(SP)
+ MOVQ R14, 24(SP)
+ MOVQ R15, 32(SP)
+
+ MOVQ m_g0(DI), DX // g
+
+ // Initialize TLS entry.
+ // See cmd/link/internal/ld/sym.go:computeTLSOffset.
+ MOVQ DX, 0x30(GS)
+
+ // Someday the convention will be that D is always cleared.
+ CLD
+
+ CALL runtime·mstart(SB)
+
+ // Restore callee-save registers.
+ MOVQ 0(SP), BX
+ MOVQ 8(SP), R12
+ MOVQ 16(SP), R13
+ MOVQ 24(SP), R14
+ MOVQ 32(SP), R15
+
+ // Go is all done with this OS thread.
+ // Tell pthread everything is ok (we never join with this thread, so
+ // the value here doesn't really matter).
+ XORL AX, AX
+
+ ADDQ $40, SP
+ RET
+
+// These trampolines help convert from Go calling convention to C calling convention.
+// They should be called with asmcgocall.
+// A pointer to the arguments is passed in DI.
+// A single int32 result is returned in AX.
+// (For more results, make an args/results structure.)
+TEXT runtime·pthread_attr_init_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP // make frame, keep stack 16-byte aligned.
+ MOVQ SP, BP
+ MOVQ 0(DI), DI // arg 1 attr
+ CALL libc_pthread_attr_init(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_attr_getstacksize_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 size
+ MOVQ 0(DI), DI // arg 1 attr
+ CALL libc_pthread_attr_getstacksize(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_attr_setdetachstate_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 state
+ MOVQ 0(DI), DI // arg 1 attr
+ CALL libc_pthread_attr_setdetachstate(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_create_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $16, SP
+ MOVQ 0(DI), SI // arg 2 attr
+ MOVQ 8(DI), DX // arg 3 start
+ MOVQ 16(DI), CX // arg 4 arg
+ MOVQ SP, DI // arg 1 &threadid (which we throw away)
+ CALL libc_pthread_create(SB)
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+TEXT runtime·raise_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 0(DI), DI // arg 1 signal
+ CALL libc_raise(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_mutex_init_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 attr
+ MOVQ 0(DI), DI // arg 1 mutex
+ CALL libc_pthread_mutex_init(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_mutex_lock_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 0(DI), DI // arg 1 mutex
+ CALL libc_pthread_mutex_lock(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_mutex_unlock_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 0(DI), DI // arg 1 mutex
+ CALL libc_pthread_mutex_unlock(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_cond_init_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 attr
+ MOVQ 0(DI), DI // arg 1 cond
+ CALL libc_pthread_cond_init(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_cond_wait_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 mutex
+ MOVQ 0(DI), DI // arg 1 cond
+ CALL libc_pthread_cond_wait(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_cond_timedwait_relative_np_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 mutex
+ MOVQ 16(DI), DX // arg 3 timeout
+ MOVQ 0(DI), DI // arg 1 cond
+ CALL libc_pthread_cond_timedwait_relative_np(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_cond_signal_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 0(DI), DI // arg 1 cond
+ CALL libc_pthread_cond_signal(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_self_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ DI, BX // BX is caller-save
+ CALL libc_pthread_self(SB)
+ MOVQ AX, 0(BX) // return value
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_kill_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 sig
+ MOVQ 0(DI), DI // arg 1 thread
+ CALL libc_pthread_kill(SB)
+ POPQ BP
+ RET
+
+// syscall calls a function in libc on behalf of the syscall package.
+// syscall takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
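+//
+// A sketch of the matching wrapper on the Go side (following the pattern in
+// sys_darwin.go; names assumed, not verbatim):
+//
+//	//go:nosplit
+//	//go:cgo_unsafe_args
+//	func syscall_syscall(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+//		entersyscall()
+//		libcCall(unsafe.Pointer(funcPC(syscall)), unsafe.Pointer(&fn))
+//		exitsyscall()
+//		return
+//	}
+//
+// With //go:cgo_unsafe_args, &fn points at the contiguous argument/result
+// frame, which matches the struct described above.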
+TEXT runtime·syscall(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $16, SP
+ MOVQ (0*8)(DI), CX // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL CX
+
+ MOVQ (SP), DI
+ MOVQ AX, (4*8)(DI) // r1
+ MOVQ DX, (5*8)(DI) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMPL AX, $-1 // Note: high 32 bits are junk
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_error(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (6*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+// syscallX calls a function in libc on behalf of the syscall package.
+// syscallX takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscallX must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscallX is like syscall but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscallX(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $16, SP
+ MOVQ (0*8)(DI), CX // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL CX
+
+ MOVQ (SP), DI
+ MOVQ AX, (4*8)(DI) // r1
+ MOVQ DX, (5*8)(DI) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMPQ AX, $-1
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_error(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (6*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+// syscallPtr is like syscallX except that the libc function reports an
+// error by returning NULL and setting errno.
+TEXT runtime·syscallPtr(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $16, SP
+ MOVQ (0*8)(DI), CX // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL CX
+
+ MOVQ (SP), DI
+ MOVQ AX, (4*8)(DI) // r1
+ MOVQ DX, (5*8)(DI) // r2
+
+ // syscallPtr libc functions return NULL on error
+ // and set errno.
+ TESTQ AX, AX
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_error(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (6*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+// syscall6 calls a function in libc on behalf of the syscall package.
+// syscall6 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6 expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
+TEXT runtime·syscall6(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $16, SP
+ MOVQ (0*8)(DI), R11// fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), R9 // a6
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R11
+
+ MOVQ (SP), DI
+ MOVQ AX, (7*8)(DI) // r1
+ MOVQ DX, (8*8)(DI) // r2
+
+ CMPL AX, $-1
+ JNE ok
+
+ CALL libc_error(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (9*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+// syscall6X calls a function in libc on behalf of the syscall package.
+// syscall6X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6X is like syscall6 but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscall6X(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $16, SP
+ MOVQ (0*8)(DI), R11// fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), R9 // a6
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R11
+
+ MOVQ (SP), DI
+ MOVQ AX, (7*8)(DI) // r1
+ MOVQ DX, (8*8)(DI) // r2
+
+ CMPQ AX, $-1
+ JNE ok
+
+ CALL libc_error(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (9*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+// syscallNoErr is like syscall6 but does not check for errors, and
+// only returns one value, for use with standard C ABI library functions.
+TEXT runtime·syscallNoErr(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $16, SP
+ MOVQ (0*8)(DI), R11// fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), R9 // a6
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R11
+
+ MOVQ (SP), DI
+ MOVQ AX, (7*8)(DI) // r1
+
+ XORL AX, AX // no error (it's ignored anyway)
+ MOVQ BP, SP
+ POPQ BP
+ RET
diff --git a/src/runtime/sys_darwin_arm64.go b/src/runtime/sys_darwin_arm64.go
new file mode 100644
index 0000000..9c14f33
--- /dev/null
+++ b/src/runtime/sys_darwin_arm64.go
@@ -0,0 +1,62 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// libc function wrappers. Must run on system stack.
+
+//go:nosplit
+//go:cgo_unsafe_args
+func g0_pthread_key_create(k *pthreadkey, destructor uintptr) int32 {
+ return asmcgocall(unsafe.Pointer(funcPC(pthread_key_create_trampoline)), unsafe.Pointer(&k))
+}
+func pthread_key_create_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func g0_pthread_setspecific(k pthreadkey, value uintptr) int32 {
+ return asmcgocall(unsafe.Pointer(funcPC(pthread_setspecific_trampoline)), unsafe.Pointer(&k))
+}
+func pthread_setspecific_trampoline()
+
+//go:cgo_import_dynamic libc_pthread_key_create pthread_key_create "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_setspecific pthread_setspecific "/usr/lib/libSystem.B.dylib"
+
+// tlsinit allocates a thread-local storage slot for g.
+//
+// It finds the first available slot using pthread_key_create and uses
+// it as the offset value for runtime.tlsg.
+//
+// This runs at startup on g0 stack, but before g is set, so it must
+// not split stack (transitively). g is expected to be nil, so things
+// (e.g. asmcgocall) will skip saving or reading g.
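+//
+// For example, if the magic value turns up at tlsbase[5], tlsinit records
+// *tlsg = 5*sys.PtrSize (40 bytes on arm64) and then clears the slot again.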
+//
+//go:nosplit
+func tlsinit(tlsg *uintptr, tlsbase *[_PTHREAD_KEYS_MAX]uintptr) {
+ var k pthreadkey
+ err := g0_pthread_key_create(&k, 0)
+ if err != 0 {
+ abort()
+ }
+
+ const magic = 0xc476c475c47957
+ err = g0_pthread_setspecific(k, magic)
+ if err != 0 {
+ abort()
+ }
+
+ for i, x := range tlsbase {
+ if x == magic {
+ *tlsg = uintptr(i * sys.PtrSize)
+ g0_pthread_setspecific(k, 0)
+ return
+ }
+ }
+ abort()
+}
diff --git a/src/runtime/sys_darwin_arm64.s b/src/runtime/sys_darwin_arm64.s
new file mode 100644
index 0000000..96d2ed1
--- /dev/null
+++ b/src/runtime/sys_darwin_arm64.s
@@ -0,0 +1,757 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// System calls and other sys.stuff for ARM64, Darwin
+// System calls are implemented in libSystem; this file contains
+// trampolines that convert from the Go to the C calling convention.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME 0
+
+TEXT notok<>(SB),NOSPLIT,$0
+ MOVD $0, R8
+ MOVD R8, (R8)
+ B 0(PC)
+
+TEXT runtime·open_trampoline(SB),NOSPLIT,$0
+ SUB $16, RSP
+ MOVW 8(R0), R1 // arg 2 flags
+ MOVW 12(R0), R2 // arg 3 mode
+ MOVW R2, (RSP) // arg 3 is variadic, pass on stack
+ MOVD 0(R0), R0 // arg 1 pathname
+ BL libc_open(SB)
+ ADD $16, RSP
+ RET
+
+TEXT runtime·close_trampoline(SB),NOSPLIT,$0
+ MOVW 0(R0), R0 // arg 1 fd
+ BL libc_close(SB)
+ RET
+
+TEXT runtime·write_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 buf
+ MOVW 16(R0), R2 // arg 3 count
+ MOVW 0(R0), R0 // arg 1 fd
+ BL libc_write(SB)
+ MOVD $-1, R1
+ CMP R0, R1
+ BNE noerr
+ BL libc_error(SB)
+ MOVW (R0), R0
+ NEG R0, R0 // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·read_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 buf
+ MOVW 16(R0), R2 // arg 3 count
+ MOVW 0(R0), R0 // arg 1 fd
+ BL libc_read(SB)
+ MOVD $-1, R1
+ CMP R0, R1
+ BNE noerr
+ BL libc_error(SB)
+ MOVW (R0), R0
+ NEG R0, R0 // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·pipe_trampoline(SB),NOSPLIT,$0
+ BL libc_pipe(SB) // pointer already in R0
+ CMP $0, R0
+ BEQ 4(PC)
+ BL libc_error(SB) // return negative errno value
+ MOVW (R0), R0 // errno
+ NEG R0, R0
+ RET
+
+TEXT runtime·exit_trampoline(SB),NOSPLIT|NOFRAME,$0
+ MOVW 0(R0), R0
+ BL libc_exit(SB)
+ MOVD $1234, R0
+ MOVD $1002, R1
+ MOVD R0, (R1) // fail hard
+
+TEXT runtime·raiseproc_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R19 // signal
+ BL libc_getpid(SB)
+ // arg 1 pid already in R0 from getpid
+ MOVD R19, R1 // arg 2 signal
+ BL libc_kill(SB)
+ RET
+
+TEXT runtime·mmap_trampoline(SB),NOSPLIT,$0
+ MOVD R0, R19
+ MOVD 0(R19), R0 // arg 1 addr
+ MOVD 8(R19), R1 // arg 2 len
+ MOVW 16(R19), R2 // arg 3 prot
+ MOVW 20(R19), R3 // arg 4 flags
+ MOVW 24(R19), R4 // arg 5 fd
+ MOVW 28(R19), R5 // arg 6 off
+ BL libc_mmap(SB)
+ MOVD $0, R1
+ MOVD $-1, R2
+ CMP R0, R2
+ BNE ok
+ BL libc_error(SB)
+ MOVW (R0), R1
+ MOVD $0, R0
+ok:
+ MOVD R0, 32(R19) // ret 1 p
+ MOVD R1, 40(R19) // ret 2 err
+ RET
+
+TEXT runtime·munmap_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 len
+ MOVD 0(R0), R0 // arg 1 addr
+ BL libc_munmap(SB)
+ CMP $0, R0
+ BEQ 2(PC)
+ BL notok<>(SB)
+ RET
+
+TEXT runtime·madvise_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 len
+ MOVW 16(R0), R2 // arg 3 advice
+ MOVD 0(R0), R0 // arg 1 addr
+ BL libc_madvise(SB)
+ RET
+
+TEXT runtime·mlock_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 len
+ MOVD 0(R0), R0 // arg 1 addr
+ BL libc_mlock(SB)
+ RET
+
+TEXT runtime·setitimer_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 new
+ MOVD 16(R0), R2 // arg 3 old
+ MOVW 0(R0), R0 // arg 1 which
+ BL libc_setitimer(SB)
+ RET
+
+TEXT runtime·walltime_trampoline(SB),NOSPLIT,$0
+ MOVD R0, R1 // arg 2 timespec
+ MOVW $CLOCK_REALTIME, R0 // arg 1 clock_id
+ BL libc_clock_gettime(SB)
+ RET
+
+GLOBL timebase<>(SB),NOPTR,$(machTimebaseInfo__size)
+
+TEXT runtime·nanotime_trampoline(SB),NOSPLIT,$40
+ MOVD R0, R19
+ BL libc_mach_absolute_time(SB)
+ MOVD R0, 0(R19)
+ MOVW timebase<>+machTimebaseInfo_numer(SB), R20
+ MOVD $timebase<>+machTimebaseInfo_denom(SB), R21
+ LDARW (R21), R21 // atomic read
+ CMP $0, R21
+ BNE initialized
+
+ SUB $(machTimebaseInfo__size+15)/16*16, RSP
+ MOVD RSP, R0
+ BL libc_mach_timebase_info(SB)
+ MOVW machTimebaseInfo_numer(RSP), R20
+ MOVW machTimebaseInfo_denom(RSP), R21
+ ADD $(machTimebaseInfo__size+15)/16*16, RSP
+
+ MOVW R20, timebase<>+machTimebaseInfo_numer(SB)
+ MOVD $timebase<>+machTimebaseInfo_denom(SB), R22
+ STLRW R21, (R22) // atomic write
+
+initialized:
+ MOVW R20, 8(R19)
+ MOVW R21, 12(R19)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R0
+ MOVD info+16(FP), R1
+ MOVD ctx+24(FP), R2
+ MOVD fn+0(FP), R11
+ BL (R11)
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$192
+ // Save callee-save registers in the case of signal forwarding.
+ // Please refer to https://golang.org/issue/31827 .
+ MOVD R19, 8*4(RSP)
+ MOVD R20, 8*5(RSP)
+ MOVD R21, 8*6(RSP)
+ MOVD R22, 8*7(RSP)
+ MOVD R23, 8*8(RSP)
+ MOVD R24, 8*9(RSP)
+ MOVD R25, 8*10(RSP)
+ MOVD R26, 8*11(RSP)
+ MOVD R27, 8*12(RSP)
+ MOVD g, 8*13(RSP)
+ MOVD R29, 8*14(RSP)
+ FMOVD F8, 8*15(RSP)
+ FMOVD F9, 8*16(RSP)
+ FMOVD F10, 8*17(RSP)
+ FMOVD F11, 8*18(RSP)
+ FMOVD F12, 8*19(RSP)
+ FMOVD F13, 8*20(RSP)
+ FMOVD F14, 8*21(RSP)
+ FMOVD F15, 8*22(RSP)
+
+ // Save arguments.
+ MOVW R0, (8*1)(RSP) // sig
+ MOVD R1, (8*2)(RSP) // info
+ MOVD R2, (8*3)(RSP) // ctx
+
+ // this might be called in external code context,
+ // where g is not set.
+ BL runtime·load_g(SB)
+
+#ifdef GOOS_ios
+ MOVD RSP, R6
+ CMP $0, g
+ BEQ nog
+ // iOS always uses the main stack to run the signal handler.
+ // We need to switch to gsignal ourselves.
+ MOVD g_m(g), R11
+ MOVD m_gsignal(R11), R5
+ MOVD (g_stack+stack_hi)(R5), R6
+
+nog:
+ // Restore arguments.
+ MOVW (8*1)(RSP), R0
+ MOVD (8*2)(RSP), R1
+ MOVD (8*3)(RSP), R2
+
+ // Reserve space for args and the stack pointer on the
+ // gsignal stack.
+ SUB $48, R6
+ // Save stack pointer.
+ MOVD RSP, R4
+ MOVD R4, (8*4)(R6)
+ // Switch to gsignal stack.
+ MOVD R6, RSP
+
+ // Save arguments.
+ MOVW R0, (8*1)(RSP)
+ MOVD R1, (8*2)(RSP)
+ MOVD R2, (8*3)(RSP)
+#endif
+
+ // Call sigtrampgo.
+ MOVD $runtime·sigtrampgo(SB), R11
+ BL (R11)
+
+#ifdef GOOS_ios
+ // Switch to old stack.
+ MOVD (8*4)(RSP), R5
+ MOVD R5, RSP
+#endif
+
+ // Restore callee-save registers.
+ MOVD (8*4)(RSP), R19
+ MOVD (8*5)(RSP), R20
+ MOVD (8*6)(RSP), R21
+ MOVD (8*7)(RSP), R22
+ MOVD (8*8)(RSP), R23
+ MOVD (8*9)(RSP), R24
+ MOVD (8*10)(RSP), R25
+ MOVD (8*11)(RSP), R26
+ MOVD (8*12)(RSP), R27
+ MOVD (8*13)(RSP), g
+ MOVD (8*14)(RSP), R29
+ FMOVD (8*15)(RSP), F8
+ FMOVD (8*16)(RSP), F9
+ FMOVD (8*17)(RSP), F10
+ FMOVD (8*18)(RSP), F11
+ FMOVD (8*19)(RSP), F12
+ FMOVD (8*20)(RSP), F13
+ FMOVD (8*21)(RSP), F14
+ FMOVD (8*22)(RSP), F15
+
+ RET
+
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ JMP runtime·sigtramp(SB)
+
+TEXT runtime·sigprocmask_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 new
+ MOVD 16(R0), R2 // arg 3 old
+ MOVW 0(R0), R0 // arg 1 how
+ BL libc_pthread_sigmask(SB)
+ CMP $0, R0
+ BEQ 2(PC)
+ BL notok<>(SB)
+ RET
+
+TEXT runtime·sigaction_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 new
+ MOVD 16(R0), R2 // arg 3 old
+ MOVW 0(R0), R0 // arg 1 how
+ BL libc_sigaction(SB)
+ CMP $0, R0
+ BEQ 2(PC)
+ BL notok<>(SB)
+ RET
+
+TEXT runtime·usleep_trampoline(SB),NOSPLIT,$0
+ MOVW 0(R0), R0 // arg 1 usec
+ BL libc_usleep(SB)
+ RET
+
+TEXT runtime·sysctl_trampoline(SB),NOSPLIT,$0
+ MOVW 8(R0), R1 // arg 2 miblen
+ MOVD 16(R0), R2 // arg 3 oldp
+ MOVD 24(R0), R3 // arg 4 oldlenp
+ MOVD 32(R0), R4 // arg 5 newp
+ MOVD 40(R0), R5 // arg 6 newlen
+ MOVD 0(R0), R0 // arg 1 mib
+ BL libc_sysctl(SB)
+ RET
+
+TEXT runtime·sysctlbyname_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 oldp
+ MOVD 16(R0), R2 // arg 3 oldlenp
+ MOVD 24(R0), R3 // arg 4 newp
+ MOVD 32(R0), R4 // arg 5 newlen
+ MOVD 0(R0), R0 // arg 1 name
+ BL libc_sysctlbyname(SB)
+ RET
+
+TEXT runtime·kqueue_trampoline(SB),NOSPLIT,$0
+ BL libc_kqueue(SB)
+ RET
+
+TEXT runtime·kevent_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 keventt
+ MOVW 16(R0), R2 // arg 3 nch
+ MOVD 24(R0), R3 // arg 4 ev
+ MOVW 32(R0), R4 // arg 5 nev
+ MOVD 40(R0), R5 // arg 6 ts
+ MOVW 0(R0), R0 // arg 1 kq
+ BL libc_kevent(SB)
+ MOVD $-1, R2
+ CMP R0, R2
+ BNE ok
+ BL libc_error(SB)
+ MOVW (R0), R0 // errno
+ NEG R0, R0 // caller wants it as a negative error code
+ok:
+ RET
+
+TEXT runtime·fcntl_trampoline(SB),NOSPLIT,$0
+ SUB $16, RSP
+ MOVW 4(R0), R1 // arg 2 cmd
+ MOVW 8(R0), R2 // arg 3 arg
+ MOVW R2, (RSP) // arg 3 is variadic, pass on stack
+ MOVW 0(R0), R0 // arg 1 fd
+ BL libc_fcntl(SB)
+ ADD $16, RSP
+ RET
+
+TEXT runtime·sigaltstack_trampoline(SB),NOSPLIT,$0
+#ifdef GOOS_ios
+ // sigaltstack is not supported on iOS: the signal handler always
+ // runs on the main stack, so our sigtramp has to do the stack
+ // switch itself.
+ MOVW $43, R0
+ BL libc_exit(SB)
+#else
+ MOVD 8(R0), R1 // arg 2 old
+ MOVD 0(R0), R0 // arg 1 new
+ CALL libc_sigaltstack(SB)
+ CBZ R0, 2(PC)
+ BL notok<>(SB)
+#endif
+ RET
+
+// Thread related functions
+
+// mstart_stub is the first function executed on a new thread started by pthread_create.
+// It just does some low-level setup and then calls mstart.
+// Note: called with the C calling convention.
+TEXT runtime·mstart_stub(SB),NOSPLIT,$160
+ // R0 points to the m.
+ // We are already on m's g0 stack.
+
+ // Save callee-save registers.
+ MOVD R19, 8(RSP)
+ MOVD R20, 16(RSP)
+ MOVD R21, 24(RSP)
+ MOVD R22, 32(RSP)
+ MOVD R23, 40(RSP)
+ MOVD R24, 48(RSP)
+ MOVD R25, 56(RSP)
+ MOVD R26, 64(RSP)
+ MOVD R27, 72(RSP)
+ MOVD g, 80(RSP)
+ MOVD R29, 88(RSP)
+ FMOVD F8, 96(RSP)
+ FMOVD F9, 104(RSP)
+ FMOVD F10, 112(RSP)
+ FMOVD F11, 120(RSP)
+ FMOVD F12, 128(RSP)
+ FMOVD F13, 136(RSP)
+ FMOVD F14, 144(RSP)
+ FMOVD F15, 152(RSP)
+
+ MOVD m_g0(R0), g
+ BL ·save_g(SB)
+
+ BL runtime·mstart(SB)
+
+ // Restore callee-save registers.
+ MOVD 8(RSP), R19
+ MOVD 16(RSP), R20
+ MOVD 24(RSP), R21
+ MOVD 32(RSP), R22
+ MOVD 40(RSP), R23
+ MOVD 48(RSP), R24
+ MOVD 56(RSP), R25
+ MOVD 64(RSP), R26
+ MOVD 72(RSP), R27
+ MOVD 80(RSP), g
+ MOVD 88(RSP), R29
+ FMOVD 96(RSP), F8
+ FMOVD 104(RSP), F9
+ FMOVD 112(RSP), F10
+ FMOVD 120(RSP), F11
+ FMOVD 128(RSP), F12
+ FMOVD 136(RSP), F13
+ FMOVD 144(RSP), F14
+ FMOVD 152(RSP), F15
+
+ // Go is all done with this OS thread.
+ // Tell pthread everything is ok (we never join with this thread, so
+ // the value here doesn't really matter).
+ MOVD $0, R0
+
+ RET
+
+TEXT runtime·pthread_attr_init_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 attr
+ BL libc_pthread_attr_init(SB)
+ RET
+
+TEXT runtime·pthread_attr_getstacksize_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 size
+ MOVD 0(R0), R0 // arg 1 attr
+ BL libc_pthread_attr_getstacksize(SB)
+ RET
+
+TEXT runtime·pthread_attr_setdetachstate_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 state
+ MOVD 0(R0), R0 // arg 1 attr
+ BL libc_pthread_attr_setdetachstate(SB)
+ RET
+
+TEXT runtime·pthread_create_trampoline(SB),NOSPLIT,$0
+ SUB $16, RSP
+ MOVD 0(R0), R1 // arg 2 state
+ MOVD 8(R0), R2 // arg 3 start
+ MOVD 16(R0), R3 // arg 4 arg
+ MOVD RSP, R0 // arg 1 &threadid (which we throw away)
+ BL libc_pthread_create(SB)
+ ADD $16, RSP
+ RET
+
+TEXT runtime·raise_trampoline(SB),NOSPLIT,$0
+ MOVW 0(R0), R0 // arg 1 sig
+ BL libc_raise(SB)
+ RET
+
+TEXT runtime·pthread_mutex_init_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 attr
+ MOVD 0(R0), R0 // arg 1 mutex
+ BL libc_pthread_mutex_init(SB)
+ RET
+
+TEXT runtime·pthread_mutex_lock_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 mutex
+ BL libc_pthread_mutex_lock(SB)
+ RET
+
+TEXT runtime·pthread_mutex_unlock_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 mutex
+ BL libc_pthread_mutex_unlock(SB)
+ RET
+
+TEXT runtime·pthread_cond_init_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 attr
+ MOVD 0(R0), R0 // arg 1 cond
+ BL libc_pthread_cond_init(SB)
+ RET
+
+TEXT runtime·pthread_cond_wait_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 mutex
+ MOVD 0(R0), R0 // arg 1 cond
+ BL libc_pthread_cond_wait(SB)
+ RET
+
+TEXT runtime·pthread_cond_timedwait_relative_np_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 mutex
+ MOVD 16(R0), R2 // arg 3 timeout
+ MOVD 0(R0), R0 // arg 1 cond
+ BL libc_pthread_cond_timedwait_relative_np(SB)
+ RET
+
+TEXT runtime·pthread_cond_signal_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 cond
+ BL libc_pthread_cond_signal(SB)
+ RET
+
+TEXT runtime·pthread_self_trampoline(SB),NOSPLIT,$0
+ MOVD R0, R19 // R19 is callee-save
+ BL libc_pthread_self(SB)
+ MOVD R0, 0(R19) // return value
+ RET
+
+TEXT runtime·pthread_kill_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 sig
+ MOVD 0(R0), R0 // arg 1 thread
+ BL libc_pthread_kill(SB)
+ RET
+
+TEXT runtime·pthread_key_create_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 destructor
+ MOVD 0(R0), R0 // arg 1 *key
+ BL libc_pthread_key_create(SB)
+ RET
+
+TEXT runtime·pthread_setspecific_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 value
+ MOVD 0(R0), R0 // arg 1 key
+ BL libc_pthread_setspecific(SB)
+ RET
+
+// syscall calls a function in libc on behalf of the syscall package.
+// syscall takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall must be called on the g0 stack with the
+// C calling convention (use libcCall).
+TEXT runtime·syscall(SB),NOSPLIT,$0
+ SUB $16, RSP // push structure pointer
+ MOVD R0, 8(RSP)
+
+ MOVD 0(R0), R12 // fn
+ MOVD 16(R0), R1 // a2
+ MOVD 24(R0), R2 // a3
+ MOVD 8(R0), R0 // a1
+
+ // If fn is declared as vararg, we have to pass the vararg arguments on the stack.
+ // (Because iOS decided not to adhere to the standard arm64 calling convention, sigh...)
+ // The only libSystem calls we support that are vararg are open, fcntl, and ioctl,
+ // which are all of the form fn(x, y, ...). So we just need to put the 3rd arg
+ // on the stack as well.
+ // If we ever have other vararg libSystem calls, we might need to handle more cases.
+ MOVD R2, (RSP)
+
+ BL (R12)
+
+ MOVD 8(RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 32(R2) // save r1
+ MOVD R1, 40(R2) // save r2
+ CMPW $-1, R0
+ BNE ok
+ SUB $16, RSP // push structure pointer
+ MOVD R2, 8(RSP)
+ BL libc_error(SB)
+ MOVW (R0), R0
+ MOVD 8(RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 48(R2) // save err
+ok:
+ RET
+
+// syscallX calls a function in libc on behalf of the syscall package.
+// syscallX takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscallX must be called on the g0 stack with the
+// C calling convention (use libcCall).
+TEXT runtime·syscallX(SB),NOSPLIT,$0
+ SUB $16, RSP // push structure pointer
+ MOVD R0, (RSP)
+
+ MOVD 0(R0), R12 // fn
+ MOVD 16(R0), R1 // a2
+ MOVD 24(R0), R2 // a3
+ MOVD 8(R0), R0 // a1
+ BL (R12)
+
+ MOVD (RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 32(R2) // save r1
+ MOVD R1, 40(R2) // save r2
+ CMP $-1, R0
+ BNE ok
+ SUB $16, RSP // push structure pointer
+ MOVD R2, (RSP)
+ BL libc_error(SB)
+ MOVW (R0), R0
+ MOVD (RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 48(R2) // save err
+ok:
+ RET
+
+// syscallPtr is like syscallX except that the libc function reports an
+// error by returning NULL and setting errno.
+TEXT runtime·syscallPtr(SB),NOSPLIT,$0
+ SUB $16, RSP // push structure pointer
+ MOVD R0, (RSP)
+
+ MOVD 0(R0), R12 // fn
+ MOVD 16(R0), R1 // a2
+ MOVD 24(R0), R2 // a3
+ MOVD 8(R0), R0 // a1
+ BL (R12)
+
+ MOVD (RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 32(R2) // save r1
+ MOVD R1, 40(R2) // save r2
+ CMP $0, R0
+ BNE ok
+ SUB $16, RSP // push structure pointer
+ MOVD R2, (RSP)
+ BL libc_error(SB)
+ MOVW (R0), R0
+ MOVD (RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 48(R2) // save err
+ok:
+ RET
+
+// syscall6 calls a function in libc on behalf of the syscall package.
+// syscall6 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+TEXT runtime·syscall6(SB),NOSPLIT,$0
+ SUB $16, RSP // push structure pointer
+ MOVD R0, 8(RSP)
+
+ MOVD 0(R0), R12 // fn
+ MOVD 16(R0), R1 // a2
+ MOVD 24(R0), R2 // a3
+ MOVD 32(R0), R3 // a4
+ MOVD 40(R0), R4 // a5
+ MOVD 48(R0), R5 // a6
+ MOVD 8(R0), R0 // a1
+
+ // If fn is declared as vararg, we have to pass the vararg arguments on the stack.
+ // See syscall above. The only function this applies to is openat, for which the 4th
+ // arg must be on the stack.
+ MOVD R3, (RSP)
+
+ BL (R12)
+
+ MOVD 8(RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 56(R2) // save r1
+ MOVD R1, 64(R2) // save r2
+ CMPW $-1, R0
+ BNE ok
+ SUB $16, RSP // push structure pointer
+ MOVD R2, 8(RSP)
+ BL libc_error(SB)
+ MOVW (R0), R0
+ MOVD 8(RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 72(R2) // save err
+ok:
+ RET
+
+// syscall6X calls a function in libc on behalf of the syscall package.
+// syscall6X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+TEXT runtime·syscall6X(SB),NOSPLIT,$0
+ SUB $16, RSP // push structure pointer
+ MOVD R0, (RSP)
+
+ MOVD 0(R0), R12 // fn
+ MOVD 16(R0), R1 // a2
+ MOVD 24(R0), R2 // a3
+ MOVD 32(R0), R3 // a4
+ MOVD 40(R0), R4 // a5
+ MOVD 48(R0), R5 // a6
+ MOVD 8(R0), R0 // a1
+ BL (R12)
+
+ MOVD (RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 56(R2) // save r1
+ MOVD R1, 64(R2) // save r2
+ CMP $-1, R0
+ BNE ok
+ SUB $16, RSP // push structure pointer
+ MOVD R2, (RSP)
+ BL libc_error(SB)
+ MOVW (R0), R0
+ MOVD (RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 72(R2) // save err
+ok:
+ RET
+
+// syscallNoErr is like syscall6 but does not check for errors, and
+// only returns one value, for use with standard C ABI library functions.
+TEXT runtime·syscallNoErr(SB),NOSPLIT,$0
+ SUB $16, RSP // push structure pointer
+ MOVD R0, (RSP)
+
+ MOVD 0(R0), R12 // fn
+ MOVD 16(R0), R1 // a2
+ MOVD 24(R0), R2 // a3
+ MOVD 32(R0), R3 // a4
+ MOVD 40(R0), R4 // a5
+ MOVD 48(R0), R5 // a6
+ MOVD 8(R0), R0 // a1
+ BL (R12)
+
+ MOVD (RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 56(R2) // save r1
+ RET
diff --git a/src/runtime/sys_dragonfly_amd64.s b/src/runtime/sys_dragonfly_amd64.s
new file mode 100644
index 0000000..580633a
--- /dev/null
+++ b/src/runtime/sys_dragonfly_amd64.s
@@ -0,0 +1,407 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for AMD64, DragonFly
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+TEXT runtime·sys_umtx_sleep(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 - ptr
+ MOVL val+8(FP), SI // arg 2 - value
+ MOVL timeout+12(FP), DX // arg 3 - timeout
+ MOVL $469, AX // umtx_sleep
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·sys_umtx_wakeup(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 - ptr
+ MOVL val+8(FP), SI // arg 2 - count
+ MOVL $470, AX // umtx_wakeup
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·lwp_create(SB),NOSPLIT,$0
+ MOVQ param+0(FP), DI // arg 1 - params
+ MOVL $495, AX // lwp_create
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·lwp_start(SB),NOSPLIT,$0
+ MOVQ DI, R13 // m
+
+ // set up FS to point at m->tls
+ LEAQ m_tls(R13), DI
+ CALL runtime·settls(SB) // smashes DI
+
+ // set up m, g
+ get_tls(CX)
+ MOVQ m_g0(R13), DI
+ MOVQ R13, g_m(DI)
+ MOVQ DI, g(CX)
+
+ CALL runtime·stackcheck(SB)
+ CALL runtime·mstart(SB)
+
+ MOVQ 0, AX // crash (not reached)
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT,$-8
+ MOVL code+0(FP), DI // arg 1 exit status
+ MOVL $1, AX
+ SYSCALL
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-8
+ MOVQ wait+0(FP), AX
+ // We're done using the stack.
+ MOVL $0, (AX)
+ MOVL $0x10000, DI // arg 1 how - EXTEXIT_LWP
+ MOVL $0, SI // arg 2 status
+ MOVL $0, DX // arg 3 addr
+ MOVL $494, AX // extexit
+ SYSCALL
+ MOVL $0xf1, 0xf1 // crash
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$-8
+ MOVQ name+0(FP), DI // arg 1 pathname
+ MOVL mode+8(FP), SI // arg 2 flags
+ MOVL perm+12(FP), DX // arg 3 mode
+ MOVL $5, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$-8
+ MOVL fd+0(FP), DI // arg 1 fd
+ MOVL $6, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$-8
+ MOVL fd+0(FP), DI // arg 1 fd
+ MOVQ p+8(FP), SI // arg 2 buf
+ MOVL n+16(FP), DX // arg 3 count
+ MOVL $3, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGL AX // caller expects negative errno
+ MOVL AX, ret+24(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT,$0-12
+ MOVL $42, AX
+ SYSCALL
+ JCC pipeok
+ MOVL $-1, r+0(FP)
+ MOVL $-1, w+4(FP)
+ MOVL AX, errno+8(FP)
+ RET
+pipeok:
+ MOVL AX, r+0(FP)
+ MOVL DX, w+4(FP)
+ MOVL $0, errno+8(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$-8
+ MOVQ fd+0(FP), DI // arg 1 fd
+ MOVQ p+8(FP), SI // arg 2 buf
+ MOVL n+16(FP), DX // arg 3 count
+ MOVL $4, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGL AX // caller expects negative errno
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·lwp_gettid(SB),NOSPLIT,$0-4
+ MOVL $496, AX // lwp_gettid
+ SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
+
+TEXT runtime·lwp_kill(SB),NOSPLIT,$0-16
+ MOVL pid+0(FP), DI // arg 1 - pid
+ MOVL tid+4(FP), SI // arg 2 - tid
+ MOVQ sig+8(FP), DX // arg 3 - signum
+ MOVL $497, AX // lwp_kill
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$0
+ MOVL $20, AX // getpid
+ SYSCALL
+ MOVQ AX, DI // arg 1 - pid
+ MOVL sig+0(FP), SI // arg 2 - signum
+ MOVL $37, AX // kill
+ SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB), NOSPLIT, $-8
+ MOVL mode+0(FP), DI
+ MOVQ new+8(FP), SI
+ MOVQ old+16(FP), DX
+ MOVL $83, AX
+ SYSCALL
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB), NOSPLIT, $32
+ MOVL $232, AX // clock_gettime
+ MOVQ $0, DI // CLOCK_REALTIME
+ LEAQ 8(SP), SI
+ SYSCALL
+ MOVQ 8(SP), AX // sec
+ MOVQ 16(SP), DX // nsec
+
+ // sec is in AX, nsec in DX
+ MOVQ AX, sec+0(FP)
+ MOVL DX, nsec+8(FP)
+ RET
+
+TEXT runtime·nanotime1(SB), NOSPLIT, $32
+ MOVL $232, AX
+ MOVQ $4, DI // CLOCK_MONOTONIC
+ LEAQ 8(SP), SI
+ SYSCALL
+ MOVQ 8(SP), AX // sec
+ MOVQ 16(SP), DX // nsec
+
+ // sec is in AX, nsec in DX
+ // return nsec in AX
+ IMULQ $1000000000, AX
+ ADDQ DX, AX
+ MOVQ AX, ret+0(FP)
+ RET
+
+TEXT runtime·sigaction(SB),NOSPLIT,$-8
+ MOVL sig+0(FP), DI // arg 1 sig
+ MOVQ new+8(FP), SI // arg 2 act
+ MOVQ old+16(FP), DX // arg 3 oact
+ MOVL $342, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVQ fn+0(FP), AX
+ MOVL sig+8(FP), DI
+ MOVQ info+16(FP), SI
+ MOVQ ctx+24(FP), DX
+ PUSHQ BP
+ MOVQ SP, BP
+ ANDQ $~15, SP // alignment for x86_64 ABI
+ CALL AX
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$72
+ // Save callee-saved C registers, since the caller may be a C signal handler.
+ MOVQ BX, bx-8(SP)
+ MOVQ BP, bp-16(SP) // save in case GOEXPERIMENT=noframepointer is set
+ MOVQ R12, r12-24(SP)
+ MOVQ R13, r13-32(SP)
+ MOVQ R14, r14-40(SP)
+ MOVQ R15, r15-48(SP)
+ // We don't save mxcsr or the x87 control word because sigtrampgo doesn't
+ // modify them.
+
+ MOVQ DX, ctx-56(SP)
+ MOVQ SI, info-64(SP)
+ MOVQ DI, signum-72(SP)
+ CALL runtime·sigtrampgo(SB)
+
+ MOVQ r15-48(SP), R15
+ MOVQ r14-40(SP), R14
+ MOVQ r13-32(SP), R13
+ MOVQ r12-24(SP), R12
+ MOVQ bp-16(SP), BP
+ MOVQ bx-8(SP), BX
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 - addr
+ MOVQ n+8(FP), SI // arg 2 - len
+ MOVL prot+16(FP), DX // arg 3 - prot
+ MOVL flags+20(FP), R10 // arg 4 - flags
+ MOVL fd+24(FP), R8 // arg 5 - fd
+ MOVL off+28(FP), R9
+ SUBQ $16, SP
+ MOVQ R9, 8(SP) // arg 7 - offset (passed on stack)
+ MOVQ $0, R9 // arg 6 - pad
+ MOVL $197, AX
+ SYSCALL
+ JCC ok
+ ADDQ $16, SP
+ MOVQ $0, p+32(FP)
+ MOVQ AX, err+40(FP)
+ RET
+ok:
+ ADDQ $16, SP
+ MOVQ AX, p+32(FP)
+ MOVQ $0, err+40(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 addr
+ MOVQ n+8(FP), SI // arg 2 len
+ MOVL $73, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVL flags+16(FP), DX
+ MOVQ $75, AX // madvise
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
+ MOVQ new+0(FP), DI
+ MOVQ old+8(FP), SI
+ MOVQ $53, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
+ MOVQ AX, 0(SP) // tv_sec
+ MOVL $1000, AX
+ MULL DX
+ MOVQ AX, 8(SP) // tv_nsec
+
+ MOVQ SP, DI // arg 1 - rqtp
+ MOVQ $0, SI // arg 2 - rmtp
+ MOVL $240, AX // sys_nanosleep
+ SYSCALL
+ RET
+
+// set tls base to DI
+TEXT runtime·settls(SB),NOSPLIT,$16
+ ADDQ $8, DI // adjust for ELF: wants to use -8(FS) for g
+ MOVQ DI, 0(SP)
+ MOVQ $16, 8(SP)
+ MOVQ $0, DI // arg 1 - which
+ MOVQ SP, SI // arg 2 - tls_info
+ MOVQ $16, DX // arg 3 - infosize
+ MOVQ $472, AX // set_tls_area
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOVQ mib+0(FP), DI // arg 1 - name
+ MOVL miblen+8(FP), SI // arg 2 - namelen
+ MOVQ out+16(FP), DX // arg 3 - oldp
+ MOVQ size+24(FP), R10 // arg 4 - oldlenp
+ MOVQ dst+32(FP), R8 // arg 5 - newp
+ MOVQ ndst+40(FP), R9 // arg 6 - newlen
+ MOVQ $202, AX // sys___sysctl
+ SYSCALL
+ JCC 4(PC)
+ NEGQ AX
+ MOVL AX, ret+48(FP)
+ RET
+ MOVL $0, AX
+ MOVL AX, ret+48(FP)
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$-4
+ MOVL $331, AX // sys_sched_yield
+ SYSCALL
+ RET
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$0
+ MOVL how+0(FP), DI // arg 1 - how
+ MOVQ new+8(FP), SI // arg 2 - set
+ MOVQ old+16(FP), DX // arg 3 - oset
+ MOVL $340, AX // sys_sigprocmask
+ SYSCALL
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// int32 runtime·kqueue(void);
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVQ $0, DI
+ MOVQ $0, SI
+ MOVQ $0, DX
+ MOVL $362, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout);
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVL kq+0(FP), DI
+ MOVQ ch+8(FP), SI
+ MOVL nch+16(FP), DX
+ MOVQ ev+24(FP), R10
+ MOVL nev+32(FP), R8
+ MOVQ ts+40(FP), R9
+ MOVL $363, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+48(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd);
+TEXT runtime·closeonexec(SB),NOSPLIT,$0
+ MOVL fd+0(FP), DI // fd
+ MOVQ $2, SI // F_SETFD
+ MOVQ $1, DX // FD_CLOEXEC
+ MOVL $92, AX // fcntl
+ SYSCALL
+ RET
+
+// func runtime·setNonblock(int32 fd)
+TEXT runtime·setNonblock(SB),NOSPLIT,$0-4
+ MOVL fd+0(FP), DI // fd
+ MOVQ $3, SI // F_GETFL
+ MOVQ $0, DX
+ MOVL $92, AX // fcntl
+ SYSCALL
+ MOVL fd+0(FP), DI // fd
+ MOVQ $4, SI // F_SETFL
+ MOVQ $4, DX // O_NONBLOCK
+ ORL AX, DX
+ MOVL $92, AX // fcntl
+ SYSCALL
+ RET
diff --git a/src/runtime/sys_freebsd_386.s b/src/runtime/sys_freebsd_386.s
new file mode 100644
index 0000000..97e6d9a
--- /dev/null
+++ b/src/runtime/sys_freebsd_386.s
@@ -0,0 +1,472 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for 386, FreeBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+TEXT runtime·sys_umtx_op(SB),NOSPLIT,$-4
+ MOVL $454, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+20(FP)
+ RET
+
+TEXT runtime·thr_new(SB),NOSPLIT,$-4
+ MOVL $455, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+8(FP)
+ RET
+
+// Called by OS using C ABI.
+TEXT runtime·thr_start(SB),NOSPLIT,$0
+ NOP SP // tell vet SP changed - stop checking offsets
+ MOVL 4(SP), AX // m
+ MOVL m_g0(AX), BX
+ LEAL m_tls(AX), BP
+ MOVL m_id(AX), DI
+ ADDL $7, DI
+ PUSHAL
+ PUSHL $32
+ PUSHL BP
+ PUSHL DI
+ CALL runtime·setldt(SB)
+ POPL AX
+ POPL AX
+ POPL AX
+ POPAL
+ get_tls(CX)
+ MOVL BX, g(CX)
+
+ MOVL AX, g_m(BX)
+ CALL runtime·stackcheck(SB) // smashes AX
+ CALL runtime·mstart(SB)
+
+ MOVL 0, AX // crash (not reached)
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT,$-4
+ MOVL $1, AX
+ INT $0x80
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+GLOBL exitStack<>(SB),RODATA,$8
+DATA exitStack<>+0x00(SB)/4, $0
+DATA exitStack<>+0x04(SB)/4, $0
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-4
+ MOVL wait+0(FP), AX
+ // We're done using the stack.
+ MOVL $0, (AX)
+ // thr_exit takes a single pointer argument, which it expects
+ // on the stack. We want to pass 0, so switch over to a fake
+ // stack of 0s. It won't write to the stack.
+ MOVL $exitStack<>(SB), SP
+ MOVL $431, AX // thr_exit
+ INT $0x80
+ MOVL $0xf1, 0xf1 // crash
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$-4
+ MOVL $5, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$-4
+ MOVL $6, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$-4
+ MOVL $3, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX // caller expects negative errno
+ MOVL AX, ret+12(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT,$8-12
+ MOVL $42, AX
+ INT $0x80
+ JAE ok
+ MOVL $0, r+0(FP)
+ MOVL $0, w+4(FP)
+ MOVL AX, errno+8(FP)
+ RET
+ok:
+ MOVL AX, r+0(FP)
+ MOVL DX, w+4(FP)
+ MOVL $0, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$12-16
+ MOVL $542, AX
+ LEAL r+4(FP), BX
+ MOVL BX, 4(SP)
+ MOVL flags+0(FP), BX
+ MOVL BX, 8(SP)
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, errno+12(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$-4
+ MOVL $4, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX // caller expects negative errno
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·thr_self(SB),NOSPLIT,$8-4
+ // thr_self(&0(FP))
+ LEAL ret+0(FP), AX
+ MOVL AX, 4(SP)
+ MOVL $432, AX
+ INT $0x80
+ RET
+
+TEXT runtime·thr_kill(SB),NOSPLIT,$-4
+ // thr_kill(tid, sig)
+ MOVL $433, AX
+ INT $0x80
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$16
+ // getpid
+ MOVL $20, AX
+ INT $0x80
+ // kill(self, sig)
+ MOVL AX, 4(SP)
+ MOVL sig+0(FP), AX
+ MOVL AX, 8(SP)
+ MOVL $37, AX
+ INT $0x80
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$32
+ LEAL addr+0(FP), SI
+ LEAL 4(SP), DI
+ CLD
+ MOVSL
+ MOVSL
+ MOVSL
+ MOVSL
+ MOVSL
+ MOVSL
+ MOVL $0, AX // top 32 bits of file offset
+ STOSL
+ MOVL $477, AX
+ INT $0x80
+ JAE ok
+ MOVL $0, p+24(FP)
+ MOVL AX, err+28(FP)
+ RET
+ok:
+ MOVL AX, p+24(FP)
+ MOVL $0, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$-4
+ MOVL $73, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$-4
+ MOVL $75, AX // madvise
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·setitimer(SB), NOSPLIT, $-4
+ MOVL $83, AX
+ INT $0x80
+ RET
+
+// func fallback_walltime() (sec int64, nsec int32)
+TEXT runtime·fallback_walltime(SB), NOSPLIT, $32-12
+ MOVL $232, AX // clock_gettime
+ LEAL 12(SP), BX
+ MOVL $0, 4(SP) // CLOCK_REALTIME
+ MOVL BX, 8(SP)
+ INT $0x80
+ MOVL 12(SP), AX // sec
+ MOVL 16(SP), BX // nsec
+
+ // sec is in AX, nsec in BX
+ MOVL AX, sec_lo+0(FP)
+ MOVL $0, sec_hi+4(FP)
+ MOVL BX, nsec+8(FP)
+ RET
+
+// func fallback_nanotime() int64
+TEXT runtime·fallback_nanotime(SB), NOSPLIT, $32-8
+ MOVL $232, AX
+ LEAL 12(SP), BX
+ MOVL $4, 4(SP) // CLOCK_MONOTONIC
+ MOVL BX, 8(SP)
+ INT $0x80
+ MOVL 12(SP), AX // sec
+ MOVL 16(SP), BX // nsec
+
+ // sec is in AX, nsec in BX
+ // convert to DX:AX nsec
+ MOVL $1000000000, CX
+ MULL CX
+ ADDL BX, AX
+ ADCL $0, DX
+
+ MOVL AX, ret_lo+0(FP)
+ MOVL DX, ret_hi+4(FP)
+ RET
+
+
+TEXT runtime·asmSigaction(SB),NOSPLIT,$-4
+ MOVL $416, AX
+ INT $0x80
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$12-16
+ MOVL fn+0(FP), AX
+ MOVL sig+4(FP), BX
+ MOVL info+8(FP), CX
+ MOVL ctx+12(FP), DX
+ MOVL SP, SI
+ SUBL $32, SP
+ ANDL $~15, SP // align stack: handler might be a C function
+ MOVL BX, 0(SP)
+ MOVL CX, 4(SP)
+ MOVL DX, 8(SP)
+ MOVL SI, 12(SP) // save SI: handler might be a Go function
+ CALL AX
+ MOVL 12(SP), AX
+ MOVL AX, SP
+ RET
+
+// Called by OS using C ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT,$12
+ NOP SP // tell vet SP changed - stop checking offsets
+ MOVL 16(SP), BX // signo
+ MOVL BX, 0(SP)
+ MOVL 20(SP), BX // info
+ MOVL BX, 4(SP)
+ MOVL 24(SP), BX // context
+ MOVL BX, 8(SP)
+ CALL runtime·sigtrampgo(SB)
+
+ // call sigreturn
+ MOVL 24(SP), AX // context
+ MOVL $0, 0(SP) // syscall gap
+ MOVL AX, 4(SP)
+ MOVL $417, AX // sigreturn(ucontext)
+ INT $0x80
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$0
+ MOVL $53, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$20
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
+ MOVL AX, 12(SP) // tv_sec
+ MOVL $1000, AX
+ MULL DX
+ MOVL AX, 16(SP) // tv_nsec
+
+ MOVL $0, 0(SP)
+ LEAL 12(SP), AX
+ MOVL AX, 4(SP) // arg 1 - rqtp
+ MOVL $0, 8(SP) // arg 2 - rmtp
+ MOVL $240, AX // sys_nanosleep
+ INT $0x80
+ RET
+
+/*
+descriptor entry format for system call
+is the native machine format, ugly as it is:
+
+ 2-byte limit
+ 3-byte base
+ 1-byte: 0x80=present, 0x60=dpl<<5, 0x1F=type
+ 1-byte: 0x80=limit is *4k, 0x40=32-bit operand size,
+ 0x0F=4 more bits of limit
+ 1 byte: 8 more bits of base
+
+int i386_get_ldt(int, union ldt_entry *, int);
+int i386_set_ldt(int, const union ldt_entry *, int);
+
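+Worked example (illustrative only): the 32-bit base is split across the entry
+as bits 0-15 at bytes 2-3, bits 16-23 at byte 4, and bits 24-31 at byte 7;
+byte 5 = 0xF2 (present, dpl=3, read/write data), byte 6 = 0xCF (4K granularity,
+32-bit operand size, limit bits 16-19 = 0xF), and bytes 0-1 = 0xFFFF hold the
+low limit bits, which is exactly what setldt below constructs.
+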
+*/
+
+// setldt(int entry, int address, int limit)
+TEXT runtime·setldt(SB),NOSPLIT,$32
+ MOVL base+4(FP), BX
+ // see comment in sys_linux_386.s; freebsd is similar
+ ADDL $0x4, BX
+
+ // set up data_desc
+ LEAL 16(SP), AX // struct data_desc
+ MOVL $0, 0(AX)
+ MOVL $0, 4(AX)
+
+ MOVW BX, 2(AX)
+ SHRL $16, BX
+ MOVB BX, 4(AX)
+ SHRL $8, BX
+ MOVB BX, 7(AX)
+
+ MOVW $0xffff, 0(AX)
+ MOVB $0xCF, 6(AX) // 32-bit operand, 4k limit unit, 4 more bits of limit
+
+ MOVB $0xF2, 5(AX) // r/w data descriptor, dpl=3, present
+
+ // call i386_set_ldt(entry, desc, 1)
+ MOVL $0xffffffff, 0(SP) // auto-allocate entry and return in AX
+ MOVL AX, 4(SP)
+ MOVL $1, 8(SP)
+ CALL i386_set_ldt<>(SB)
+
+ // compute segment selector - (entry*8+7)
+ SHLL $3, AX
+ ADDL $7, AX
+ MOVW AX, GS
+ RET
+
+TEXT i386_set_ldt<>(SB),NOSPLIT,$16
+ LEAL args+0(FP), AX // 0(FP) == 4(SP) before SP got moved
+ MOVL $0, 0(SP) // syscall gap
+ MOVL $1, 4(SP)
+ MOVL AX, 8(SP)
+ MOVL $165, AX
+ INT $0x80
+ JAE 2(PC)
+ INT $3
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$28
+ LEAL mib+0(FP), SI
+ LEAL 4(SP), DI
+ CLD
+ MOVSL // arg 1 - name
+ MOVSL // arg 2 - namelen
+ MOVSL // arg 3 - oldp
+ MOVSL // arg 4 - oldlenp
+ MOVSL // arg 5 - newp
+ MOVSL // arg 6 - newlen
+ MOVL $202, AX // sys___sysctl
+ INT $0x80
+ JAE 4(PC)
+ NEGL AX
+ MOVL AX, ret+24(FP)
+ RET
+ MOVL $0, AX
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$-4
+ MOVL $331, AX // sys_sched_yield
+ INT $0x80
+ RET
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$16
+ MOVL $0, 0(SP) // syscall gap
+ MOVL how+0(FP), AX // arg 1 - how
+ MOVL AX, 4(SP)
+ MOVL new+4(FP), AX
+ MOVL AX, 8(SP) // arg 2 - set
+ MOVL old+8(FP), AX
+ MOVL AX, 12(SP) // arg 3 - oset
+ MOVL $340, AX // sys_sigprocmask
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// int32 runtime·kqueue(void);
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVL $362, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout);
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVL $363, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+24(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd);
+TEXT runtime·closeonexec(SB),NOSPLIT,$32
+ MOVL $92, AX // fcntl
+ // 0(SP) is where the caller PC would be; kernel skips it
+ MOVL fd+0(FP), BX
+ MOVL BX, 4(SP) // fd
+ MOVL $2, 8(SP) // F_SETFD
+ MOVL $1, 12(SP) // FD_CLOEXEC
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ RET
+
+// func runtime·setNonblock(fd int32)
+TEXT runtime·setNonblock(SB),NOSPLIT,$16-4
+ MOVL $92, AX // fcntl
+ MOVL fd+0(FP), BX // fd
+ MOVL BX, 4(SP)
+ MOVL $3, 8(SP) // F_GETFL
+ MOVL $0, 12(SP)
+ INT $0x80
+ MOVL fd+0(FP), BX // fd
+ MOVL BX, 4(SP)
+ MOVL $4, 8(SP) // F_SETFL
+ ORL $4, AX // O_NONBLOCK
+ MOVL AX, 12(SP)
+ MOVL $92, AX // fcntl
+ INT $0x80
+ RET
+
+// func cpuset_getaffinity(level int, which int, id int64, size int, mask *byte) int32
+TEXT runtime·cpuset_getaffinity(SB), NOSPLIT, $0-28
+ MOVL $487, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+24(FP)
+ RET
+
+GLOBL runtime·tlsoffset(SB),NOPTR,$4
diff --git a/src/runtime/sys_freebsd_amd64.s b/src/runtime/sys_freebsd_amd64.s
new file mode 100644
index 0000000..07734b0
--- /dev/null
+++ b/src/runtime/sys_freebsd_amd64.s
@@ -0,0 +1,510 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for AMD64, FreeBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+TEXT runtime·sys_umtx_op(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI
+ MOVL mode+8(FP), SI
+ MOVL val+12(FP), DX
+ MOVQ uaddr1+16(FP), R10
+ MOVQ ut+24(FP), R8
+ MOVL $454, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+32(FP)
+ RET
+
+TEXT runtime·thr_new(SB),NOSPLIT,$0
+ MOVQ param+0(FP), DI
+ MOVL size+8(FP), SI
+ MOVL $455, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·thr_start(SB),NOSPLIT,$0
+ MOVQ DI, R13 // m
+
+ // set up FS to point at m->tls
+ LEAQ m_tls(R13), DI
+ CALL runtime·settls(SB) // smashes DI
+
+ // set up m, g
+ get_tls(CX)
+ MOVQ m_g0(R13), DI
+ MOVQ R13, g_m(DI)
+ MOVQ DI, g(CX)
+
+ CALL runtime·stackcheck(SB)
+ CALL runtime·mstart(SB)
+
+ MOVQ 0, AX // crash (not reached)
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT,$-8
+ MOVL code+0(FP), DI // arg 1 exit status
+ MOVL $1, AX
+ SYSCALL
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-8
+ MOVQ wait+0(FP), AX
+ // We're done using the stack.
+ MOVL $0, (AX)
+ MOVL $0, DI // arg 1 long *state
+ MOVL $431, AX // thr_exit
+ SYSCALL
+ MOVL $0xf1, 0xf1 // crash
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$-8
+ MOVQ name+0(FP), DI // arg 1 pathname
+ MOVL mode+8(FP), SI // arg 2 flags
+ MOVL perm+12(FP), DX // arg 3 mode
+ MOVL $5, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$-8
+ MOVL fd+0(FP), DI // arg 1 fd
+ MOVL $6, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$-8
+ MOVL fd+0(FP), DI // arg 1 fd
+ MOVQ p+8(FP), SI // arg 2 buf
+ MOVL n+16(FP), DX // arg 3 count
+ MOVL $3, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX // caller expects negative errno
+ MOVL AX, ret+24(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT,$0-12
+ MOVL $42, AX
+ SYSCALL
+ JCC ok
+ MOVL $0, r+0(FP)
+ MOVL $0, w+4(FP)
+ MOVL AX, errno+8(FP)
+ RET
+ok:
+ MOVL AX, r+0(FP)
+ MOVL DX, w+4(FP)
+ MOVL $0, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-20
+ LEAQ r+8(FP), DI
+ MOVL flags+0(FP), SI
+ MOVL $542, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, errno+16(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$-8
+ MOVQ fd+0(FP), DI // arg 1 fd
+ MOVQ p+8(FP), SI // arg 2 buf
+ MOVL n+16(FP), DX // arg 3 count
+ MOVL $4, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX // caller expects negative errno
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·thr_self(SB),NOSPLIT,$0-8
+ // thr_self(&0(FP))
+ LEAQ ret+0(FP), DI // arg 1
+ MOVL $432, AX
+ SYSCALL
+ RET
+
+TEXT runtime·thr_kill(SB),NOSPLIT,$0-16
+ // thr_kill(tid, sig)
+ MOVQ tid+0(FP), DI // arg 1 id
+ MOVQ sig+8(FP), SI // arg 2 sig
+ MOVL $433, AX
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$0
+ // getpid
+ MOVL $20, AX
+ SYSCALL
+ // kill(self, sig)
+ MOVQ AX, DI // arg 1 pid
+ MOVL sig+0(FP), SI // arg 2 sig
+ MOVL $37, AX
+ SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB), NOSPLIT, $-8
+ MOVL mode+0(FP), DI
+ MOVQ new+8(FP), SI
+ MOVQ old+16(FP), DX
+ MOVL $83, AX
+ SYSCALL
+ RET
+
+// func fallback_walltime() (sec int64, nsec int32)
+TEXT runtime·fallback_walltime(SB), NOSPLIT, $32-12
+ MOVL $232, AX // clock_gettime
+ MOVQ $0, DI // CLOCK_REALTIME
+ LEAQ 8(SP), SI
+ SYSCALL
+ MOVQ 8(SP), AX // sec
+ MOVQ 16(SP), DX // nsec
+
+ // sec is in AX, nsec in DX
+ MOVQ AX, sec+0(FP)
+ MOVL DX, nsec+8(FP)
+ RET
+
+TEXT runtime·fallback_nanotime(SB), NOSPLIT, $32-8
+ MOVL $232, AX
+ MOVQ $4, DI // CLOCK_MONOTONIC
+ LEAQ 8(SP), SI
+ SYSCALL
+ MOVQ 8(SP), AX // sec
+ MOVQ 16(SP), DX // nsec
+
+ // sec is in AX, nsec in DX
+ // return nsec in AX
+ IMULQ $1000000000, AX
+ ADDQ DX, AX
+ MOVQ AX, ret+0(FP)
+ RET
+
+TEXT runtime·asmSigaction(SB),NOSPLIT,$0
+ MOVQ sig+0(FP), DI // arg 1 sig
+ MOVQ new+8(FP), SI // arg 2 act
+ MOVQ old+16(FP), DX // arg 3 oact
+ MOVL $416, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·callCgoSigaction(SB),NOSPLIT,$16
+ MOVQ sig+0(FP), DI // arg 1 sig
+ MOVQ new+8(FP), SI // arg 2 act
+ MOVQ old+16(FP), DX // arg 3 oact
+ MOVQ _cgo_sigaction(SB), AX
+ MOVQ SP, BX // callee-saved
+ ANDQ $~15, SP // alignment as per amd64 psABI
+ CALL AX
+ MOVQ BX, SP
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVQ fn+0(FP), AX
+ MOVL sig+8(FP), DI
+ MOVQ info+16(FP), SI
+ MOVQ ctx+24(FP), DX
+ PUSHQ BP
+ MOVQ SP, BP
+ ANDQ $~15, SP // alignment for x86_64 ABI
+ CALL AX
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$72
+ // Save callee-saved C registers, since the caller may be a C signal handler.
+ MOVQ BX, bx-8(SP)
+ MOVQ BP, bp-16(SP) // save in case GOEXPERIMENT=noframepointer is set
+ MOVQ R12, r12-24(SP)
+ MOVQ R13, r13-32(SP)
+ MOVQ R14, r14-40(SP)
+ MOVQ R15, r15-48(SP)
+ // We don't save mxcsr or the x87 control word because sigtrampgo doesn't
+ // modify them.
+
+ MOVQ DX, ctx-56(SP)
+ MOVQ SI, info-64(SP)
+ MOVQ DI, signum-72(SP)
+ CALL runtime·sigtrampgo(SB)
+
+ MOVQ r15-48(SP), R15
+ MOVQ r14-40(SP), R14
+ MOVQ r13-32(SP), R13
+ MOVQ r12-24(SP), R12
+ MOVQ bp-16(SP), BP
+ MOVQ bx-8(SP), BX
+ RET
+
+// Used instead of sigtramp in programs that use cgo.
+// Arguments from kernel are in DI, SI, DX.
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ // If no traceback function, do usual sigtramp.
+ MOVQ runtime·cgoTraceback(SB), AX
+ TESTQ AX, AX
+ JZ sigtramp
+
+ // If no traceback support function, which means that
+ // runtime/cgo was not linked in, do usual sigtramp.
+ MOVQ _cgo_callers(SB), AX
+ TESTQ AX, AX
+ JZ sigtramp
+
+ // Figure out if we are currently in a cgo call.
+ // If not, just do usual sigtramp.
+ get_tls(CX)
+ MOVQ g(CX),AX
+ TESTQ AX, AX
+ JZ sigtrampnog // g == nil
+ MOVQ g_m(AX), AX
+ TESTQ AX, AX
+ JZ sigtramp // g.m == nil
+ MOVL m_ncgo(AX), CX
+ TESTL CX, CX
+ JZ sigtramp // g.m.ncgo == 0
+ MOVQ m_curg(AX), CX
+ TESTQ CX, CX
+ JZ sigtramp // g.m.curg == nil
+ MOVQ g_syscallsp(CX), CX
+ TESTQ CX, CX
+ JZ sigtramp // g.m.curg.syscallsp == 0
+ MOVQ m_cgoCallers(AX), R8
+ TESTQ R8, R8
+ JZ sigtramp // g.m.cgoCallers == nil
+ MOVL m_cgoCallersUse(AX), CX
+ TESTL CX, CX
+ JNZ sigtramp // g.m.cgoCallersUse != 0
+
+ // Jump to a function in runtime/cgo.
+ // That function, written in C, will call the user's traceback
+ // function with proper unwind info, and will then call back here.
+ // The first three arguments, and the fifth, are already in registers.
+ // Set the two remaining arguments now.
+ MOVQ runtime·cgoTraceback(SB), CX
+ MOVQ $runtime·sigtramp(SB), R9
+ MOVQ _cgo_callers(SB), AX
+ JMP AX
+
+sigtramp:
+ JMP runtime·sigtramp(SB)
+
+sigtrampnog:
+ // Signal arrived on a non-Go thread. If this is SIGPROF, get a
+ // stack trace.
+ CMPL DI, $27 // 27 == SIGPROF
+ JNZ sigtramp
+
+ // Lock sigprofCallersUse.
+ MOVL $0, AX
+ MOVL $1, CX
+ MOVQ $runtime·sigprofCallersUse(SB), R11
+ LOCK
+ CMPXCHGL CX, 0(R11)
+ JNZ sigtramp // Skip stack trace if already locked.
+
+ // Jump to the traceback function in runtime/cgo.
+ // It will call back to sigprofNonGo, which will ignore the
+ // arguments passed in registers.
+ // First three arguments to traceback function are in registers already.
+ MOVQ runtime·cgoTraceback(SB), CX
+ MOVQ $runtime·sigprofCallers(SB), R8
+ MOVQ $runtime·sigprofNonGo(SB), R9
+ MOVQ _cgo_callers(SB), AX
+ JMP AX
+
+TEXT runtime·mmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 addr
+ MOVQ n+8(FP), SI // arg 2 len
+ MOVL prot+16(FP), DX // arg 3 prot
+ MOVL flags+20(FP), R10 // arg 4 flags
+ MOVL fd+24(FP), R8 // arg 5 fid
+ MOVL off+28(FP), R9 // arg 6 offset
+ MOVL $477, AX
+ SYSCALL
+ JCC ok
+ MOVQ $0, p+32(FP)
+ MOVQ AX, err+40(FP)
+ RET
+ok:
+ MOVQ AX, p+32(FP)
+ MOVQ $0, err+40(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 addr
+ MOVQ n+8(FP), SI // arg 2 len
+ MOVL $73, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVL flags+16(FP), DX
+ MOVQ $75, AX // madvise
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
+ MOVQ new+0(FP), DI
+ MOVQ old+8(FP), SI
+ MOVQ $53, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
+ MOVQ AX, 0(SP) // tv_sec
+ MOVL $1000, AX
+ MULL DX
+ MOVQ AX, 8(SP) // tv_nsec
+
+ MOVQ SP, DI // arg 1 - rqtp
+ MOVQ $0, SI // arg 2 - rmtp
+ MOVL $240, AX // sys_nanosleep
+ SYSCALL
+ RET
+
+// set tls base to DI
+TEXT runtime·settls(SB),NOSPLIT,$8
+ ADDQ $8, DI // adjust for ELF: wants to use -8(FS) for g and m
+ MOVQ DI, 0(SP)
+ MOVQ SP, SI
+ MOVQ $129, DI // AMD64_SET_FSBASE
+ MOVQ $165, AX // sysarch
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOVQ mib+0(FP), DI // arg 1 - name
+ MOVL miblen+8(FP), SI // arg 2 - namelen
+ MOVQ out+16(FP), DX // arg 3 - oldp
+ MOVQ size+24(FP), R10 // arg 4 - oldlenp
+ MOVQ dst+32(FP), R8 // arg 5 - newp
+ MOVQ ndst+40(FP), R9 // arg 6 - newlen
+ MOVQ $202, AX // sys___sysctl
+ SYSCALL
+ JCC 4(PC)
+ NEGQ AX
+ MOVL AX, ret+48(FP)
+ RET
+ MOVL $0, AX
+ MOVL AX, ret+48(FP)
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$-4
+ MOVL $331, AX // sys_sched_yield
+ SYSCALL
+ RET
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$0
+ MOVL how+0(FP), DI // arg 1 - how
+ MOVQ new+8(FP), SI // arg 2 - set
+ MOVQ old+16(FP), DX // arg 3 - oset
+ MOVL $340, AX // sys_sigprocmask
+ SYSCALL
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// int32 runtime·kqueue(void);
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVQ $0, DI
+ MOVQ $0, SI
+ MOVQ $0, DX
+ MOVL $362, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout);
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVL kq+0(FP), DI
+ MOVQ ch+8(FP), SI
+ MOVL nch+16(FP), DX
+ MOVQ ev+24(FP), R10
+ MOVL nev+32(FP), R8
+ MOVQ ts+40(FP), R9
+ MOVL $363, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+48(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd);
+TEXT runtime·closeonexec(SB),NOSPLIT,$0
+ MOVL fd+0(FP), DI // fd
+ MOVQ $2, SI // F_SETFD
+ MOVQ $1, DX // FD_CLOEXEC
+ MOVL $92, AX // fcntl
+ SYSCALL
+ RET
+
+// func runtime·setNonblock(fd int32)
+TEXT runtime·setNonblock(SB),NOSPLIT,$0-4
+ MOVL fd+0(FP), DI // fd
+ MOVQ $3, SI // F_GETFL
+ MOVQ $0, DX
+ MOVL $92, AX // fcntl
+ SYSCALL
+ MOVL fd+0(FP), DI // fd
+ MOVQ $4, SI // F_SETFL
+ MOVQ $4, DX // O_NONBLOCK
+ ORL AX, DX
+ MOVL $92, AX // fcntl
+ SYSCALL
+ RET
+
+// func cpuset_getaffinity(level int, which int, id int64, size int, mask *byte) int32
+TEXT runtime·cpuset_getaffinity(SB), NOSPLIT, $0-44
+ MOVQ level+0(FP), DI
+ MOVQ which+8(FP), SI
+ MOVQ id+16(FP), DX
+ MOVQ size+24(FP), R10
+ MOVQ mask+32(FP), R8
+ MOVL $487, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+40(FP)
+ RET
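
Several stubs in this file (read, write1, pipe2, kevent, sysctl, cpuset_getaffinity) share one convention: when SYSCALL reports failure via the carry flag, the result register is negated, so the Go caller sees -errno rather than the -1/errno pair C code gets. A toy Go sketch of how such a return value is split apart (the helper name is invented for illustration; it is not a runtime API):

package main

import "fmt"

// decodeSyscallResult mirrors the negative-errno convention used above:
// a non-negative value is the syscall's result, a negative value is -errno.
func decodeSyscallResult(ret int32) (n int32, errno int32) {
	if ret < 0 {
		return 0, -ret
	}
	return ret, 0
}

func main() {
	fmt.Println(decodeSyscallResult(42)) // 42 0
	fmt.Println(decodeSyscallResult(-9)) // 0 9 (EBADF)
}
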
diff --git a/src/runtime/sys_freebsd_arm.s b/src/runtime/sys_freebsd_arm.s
new file mode 100644
index 0000000..b12e47c
--- /dev/null
+++ b/src/runtime/sys_freebsd_arm.s
@@ -0,0 +1,475 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for ARM, FreeBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// for EABI, as we don't support OABI
+#define SYS_BASE 0x0
+
+#define SYS_exit (SYS_BASE + 1)
+#define SYS_read (SYS_BASE + 3)
+#define SYS_write (SYS_BASE + 4)
+#define SYS_open (SYS_BASE + 5)
+#define SYS_close (SYS_BASE + 6)
+#define SYS_getpid (SYS_BASE + 20)
+#define SYS_kill (SYS_BASE + 37)
+#define SYS_pipe (SYS_BASE + 42)
+#define SYS_sigaltstack (SYS_BASE + 53)
+#define SYS_munmap (SYS_BASE + 73)
+#define SYS_madvise (SYS_BASE + 75)
+#define SYS_setitimer (SYS_BASE + 83)
+#define SYS_fcntl (SYS_BASE + 92)
+#define SYS___sysctl (SYS_BASE + 202)
+#define SYS_nanosleep (SYS_BASE + 240)
+#define SYS_clock_gettime (SYS_BASE + 232)
+#define SYS_sched_yield (SYS_BASE + 331)
+#define SYS_sigprocmask (SYS_BASE + 340)
+#define SYS_kqueue (SYS_BASE + 362)
+#define SYS_kevent (SYS_BASE + 363)
+#define SYS_sigaction (SYS_BASE + 416)
+#define SYS_thr_exit (SYS_BASE + 431)
+#define SYS_thr_self (SYS_BASE + 432)
+#define SYS_thr_kill (SYS_BASE + 433)
+#define SYS__umtx_op (SYS_BASE + 454)
+#define SYS_thr_new (SYS_BASE + 455)
+#define SYS_mmap (SYS_BASE + 477)
+#define SYS_cpuset_getaffinity (SYS_BASE + 487)
+#define SYS_pipe2 (SYS_BASE + 542)
+
+TEXT runtime·sys_umtx_op(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0
+ MOVW mode+4(FP), R1
+ MOVW val+8(FP), R2
+ MOVW uaddr1+12(FP), R3
+ ADD $20, R13 // arg 5 is passed on stack
+ MOVW $SYS__umtx_op, R7
+ SWI $0
+ RSB.CS $0, R0
+ SUB $20, R13
+ // BCS error
+ MOVW R0, ret+20(FP)
+ RET
+
+TEXT runtime·thr_new(SB),NOSPLIT,$0
+ MOVW param+0(FP), R0
+ MOVW size+4(FP), R1
+ MOVW $SYS_thr_new, R7
+ SWI $0
+ RSB.CS $0, R0
+ MOVW R0, ret+8(FP)
+ RET
+
+TEXT runtime·thr_start(SB),NOSPLIT,$0
+ // set up g
+ MOVW m_g0(R0), g
+ MOVW R0, g_m(g)
+ BL runtime·emptyfunc(SB) // fault if stack check is wrong
+ BL runtime·mstart(SB)
+
+ MOVW $2, R8 // crash (not reached)
+ MOVW R8, (R8)
+ RET
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0
+ MOVW code+0(FP), R0 // arg 1 exit status
+ MOVW $SYS_exit, R7
+ SWI $0
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-4
+ MOVW wait+0(FP), R0
+ // We're done using the stack.
+ MOVW $0, R2
+storeloop:
+ LDREX (R0), R4 // loads R4
+ STREX R2, (R0), R1 // stores R2
+ CMP $0, R1
+ BNE storeloop
+ MOVW $0, R0 // arg 1 long *state
+ MOVW $SYS_thr_exit, R7
+ SWI $0
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0
+ MOVW name+0(FP), R0 // arg 1 name
+ MOVW mode+4(FP), R1 // arg 2 mode
+ MOVW perm+8(FP), R2 // arg 3 perm
+ MOVW $SYS_open, R7
+ SWI $0
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0 // arg 1 fd
+ MOVW p+4(FP), R1 // arg 2 buf
+ MOVW n+8(FP), R2 // arg 3 count
+ MOVW $SYS_read, R7
+ SWI $0
+ RSB.CS $0, R0 // caller expects negative errno
+ MOVW R0, ret+12(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT,$0-12
+ MOVW $SYS_pipe, R7
+ SWI $0
+ BCC ok
+ MOVW $0, R1
+ MOVW R1, r+0(FP)
+ MOVW R1, w+4(FP)
+ MOVW R0, errno+8(FP)
+ RET
+ok:
+ MOVW R0, r+0(FP)
+ MOVW R1, w+4(FP)
+ MOVW $0, R1
+ MOVW R1, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-16
+ MOVW $r+4(FP), R0
+ MOVW flags+0(FP), R1
+ MOVW $SYS_pipe2, R7
+ SWI $0
+ RSB.CS $0, R0
+ MOVW R0, errno+12(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0 // arg 1 fd
+ MOVW p+4(FP), R1 // arg 2 buf
+ MOVW n+8(FP), R2 // arg 3 count
+ MOVW $SYS_write, R7
+ SWI $0
+ RSB.CS $0, R0 // caller expects negative errno
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0 // arg 1 fd
+ MOVW $SYS_close, R7
+ SWI $0
+ MOVW.CS $-1, R0
+ MOVW R0, ret+4(FP)
+ RET
+
+TEXT runtime·thr_self(SB),NOSPLIT,$0-4
+ // thr_self(&0(FP))
+ MOVW $ret+0(FP), R0 // arg 1
+ MOVW $SYS_thr_self, R7
+ SWI $0
+ RET
+
+TEXT runtime·thr_kill(SB),NOSPLIT,$0-8
+ // thr_kill(tid, sig)
+ MOVW tid+0(FP), R0 // arg 1 id
+ MOVW sig+4(FP), R1 // arg 2 signal
+ MOVW $SYS_thr_kill, R7
+ SWI $0
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$0
+ // getpid
+ MOVW $SYS_getpid, R7
+ SWI $0
+ // kill(self, sig)
+ // arg 1 - pid, now in R0
+ MOVW sig+0(FP), R1 // arg 2 - signal
+ MOVW $SYS_kill, R7
+ SWI $0
+ RET
+
+TEXT runtime·setitimer(SB), NOSPLIT|NOFRAME, $0
+ MOVW mode+0(FP), R0
+ MOVW new+4(FP), R1
+ MOVW old+8(FP), R2
+ MOVW $SYS_setitimer, R7
+ SWI $0
+ RET
+
+// func fallback_walltime() (sec int64, nsec int32)
+TEXT runtime·fallback_walltime(SB), NOSPLIT, $32-12
+ MOVW $0, R0 // CLOCK_REALTIME
+ MOVW $8(R13), R1
+ MOVW $SYS_clock_gettime, R7
+ SWI $0
+
+ MOVW 8(R13), R0 // sec.low
+ MOVW 12(R13), R1 // sec.high
+ MOVW 16(R13), R2 // nsec
+
+ MOVW R0, sec_lo+0(FP)
+ MOVW R1, sec_hi+4(FP)
+ MOVW R2, nsec+8(FP)
+ RET
+
+// func fallback_nanotime() int64
+TEXT runtime·fallback_nanotime(SB), NOSPLIT, $32
+ MOVW $4, R0 // CLOCK_MONOTONIC
+ MOVW $8(R13), R1
+ MOVW $SYS_clock_gettime, R7
+ SWI $0
+
+ MOVW 8(R13), R0 // sec.low
+ MOVW 12(R13), R4 // sec.high
+ MOVW 16(R13), R2 // nsec
+
+ MOVW $1000000000, R3
+ MULLU R0, R3, (R1, R0)
+ MUL R3, R4
+ ADD.S R2, R0
+ ADC R4, R1
+
+ MOVW R0, ret_lo+0(FP)
+ MOVW R1, ret_hi+4(FP)
+ RET
+
+TEXT runtime·asmSigaction(SB),NOSPLIT|NOFRAME,$0
+ MOVW sig+0(FP), R0 // arg 1 sig
+ MOVW new+4(FP), R1 // arg 2 act
+ MOVW old+8(FP), R2 // arg 3 oact
+ MOVW $SYS_sigaction, R7
+ SWI $0
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$0
+ // Reserve space for callee-save registers and arguments.
+ MOVM.DB.W [R4-R11], (R13)
+ SUB $16, R13
+
+ // this might be called in external code context,
+ // where g is not set.
+ // first save R0, because runtime·load_g will clobber it
+ MOVW R0, 4(R13) // signum
+ MOVB runtime·iscgo(SB), R0
+ CMP $0, R0
+ BL.NE runtime·load_g(SB)
+
+ MOVW R1, 8(R13)
+ MOVW R2, 12(R13)
+ BL runtime·sigtrampgo(SB)
+
+ // Restore callee-save registers.
+ ADD $16, R13
+ MOVM.IA.W (R13), [R4-R11]
+
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$16
+ MOVW addr+0(FP), R0 // arg 1 addr
+ MOVW n+4(FP), R1 // arg 2 len
+ MOVW prot+8(FP), R2 // arg 3 prot
+ MOVW flags+12(FP), R3 // arg 4 flags
+ // arg 5 (fid) and arg6 (offset_lo, offset_hi) are passed on stack
+ // note the C runtime only passes the 32-bit offset_lo to us
+ MOVW fd+16(FP), R4 // arg 5
+ MOVW R4, 4(R13)
+ MOVW off+20(FP), R5 // arg 6 lower 32-bit
+ // the word at 8(R13) is skipped due to 64-bit argument alignment.
+ MOVW R5, 12(R13)
+ MOVW $0, R6 // higher 32-bit for arg 6
+ MOVW R6, 16(R13)
+ ADD $4, R13
+ MOVW $SYS_mmap, R7
+ SWI $0
+ SUB $4, R13
+ MOVW $0, R1
+ MOVW.CS R0, R1 // if failed, put in R1
+ MOVW.CS $0, R0
+ MOVW R0, p+24(FP)
+ MOVW R1, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0 // arg 1 addr
+ MOVW n+4(FP), R1 // arg 2 len
+ MOVW $SYS_munmap, R7
+ SWI $0
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0 // arg 1 addr
+ MOVW n+4(FP), R1 // arg 2 len
+ MOVW flags+8(FP), R2 // arg 3 flags
+ MOVW $SYS_madvise, R7
+ SWI $0
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVW new+0(FP), R0
+ MOVW old+4(FP), R1
+ MOVW $SYS_sigaltstack, R7
+ SWI $0
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-16
+ MOVW sig+4(FP), R0
+ MOVW info+8(FP), R1
+ MOVW ctx+12(FP), R2
+ MOVW fn+0(FP), R11
+ MOVW R13, R4
+ SUB $24, R13
+ BIC $0x7, R13 // alignment for ELF ABI
+ BL (R11)
+ MOVW R4, R13
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16
+ MOVW usec+0(FP), R0
+ CALL runtime·usplitR0(SB)
+ // 0(R13) is the saved LR, don't use it
+ MOVW R0, 4(R13) // tv_sec.low
+ MOVW $0, R0
+ MOVW R0, 8(R13) // tv_sec.high
+ MOVW $1000, R2
+ MUL R1, R2
+ MOVW R2, 12(R13) // tv_nsec
+
+ MOVW $4(R13), R0 // arg 1 - rqtp
+ MOVW $0, R1 // arg 2 - rmtp
+ MOVW $SYS_nanosleep, R7
+ SWI $0
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOVW mib+0(FP), R0 // arg 1 - name
+ MOVW miblen+4(FP), R1 // arg 2 - namelen
+ MOVW out+8(FP), R2 // arg 3 - old
+ MOVW size+12(FP), R3 // arg 4 - oldlenp
+ // arg 5 (newp) and arg 6 (newlen) are passed on stack
+ ADD $20, R13
+ MOVW $SYS___sysctl, R7
+ SWI $0
+ SUB.CS $0, R0, R0
+ SUB $20, R13
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ MOVW $SYS_sched_yield, R7
+ SWI $0
+ RET
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$0
+ MOVW how+0(FP), R0 // arg 1 - how
+ MOVW new+4(FP), R1 // arg 2 - set
+ MOVW old+8(FP), R2 // arg 3 - oset
+ MOVW $SYS_sigprocmask, R7
+ SWI $0
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+// int32 runtime·kqueue(void)
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVW $SYS_kqueue, R7
+ SWI $0
+ RSB.CS $0, R0
+ MOVW R0, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout)
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVW kq+0(FP), R0 // kq
+ MOVW ch+4(FP), R1 // changelist
+ MOVW nch+8(FP), R2 // nchanges
+ MOVW ev+12(FP), R3 // eventlist
+ ADD $20, R13 // pass arg 5 and 6 on stack
+ MOVW $SYS_kevent, R7
+ SWI $0
+ RSB.CS $0, R0
+ SUB $20, R13
+ MOVW R0, ret+24(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd)
+TEXT runtime·closeonexec(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0 // fd
+ MOVW $2, R1 // F_SETFD
+ MOVW $1, R2 // FD_CLOEXEC
+ MOVW $SYS_fcntl, R7
+ SWI $0
+ RET
+
+// func runtime·setNonblock(fd int32)
+TEXT runtime·setNonblock(SB),NOSPLIT,$0-4
+ MOVW fd+0(FP), R0 // fd
+ MOVW $3, R1 // F_GETFL
+ MOVW $0, R2
+ MOVW $SYS_fcntl, R7
+ SWI $0
+ ORR $0x4, R0, R2 // O_NONBLOCK
+ MOVW fd+0(FP), R0 // fd
+ MOVW $4, R1 // F_SETFL
+ MOVW $SYS_fcntl, R7
+ SWI $0
+ RET
+
+// TODO: this is only valid for ARMv7+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ B runtime·armPublicationBarrier(SB)
+
+// TODO(minux): this only supports ARMv6K+.
+TEXT runtime·read_tls_fallback(SB),NOSPLIT|NOFRAME,$0
+ WORD $0xee1d0f70 // mrc p15, 0, r0, c13, c0, 3
+ RET
+
+// func cpuset_getaffinity(level int, which int, id int64, size int, mask *byte) int32
+TEXT runtime·cpuset_getaffinity(SB), NOSPLIT, $0-28
+ MOVW level+0(FP), R0
+ MOVW which+4(FP), R1
+ MOVW id_lo+8(FP), R2
+ MOVW id_hi+12(FP), R3
+ ADD $20, R13 // Pass size and mask on stack.
+ MOVW $SYS_cpuset_getaffinity, R7
+ SWI $0
+ RSB.CS $0, R0
+ SUB $20, R13
+ MOVW R0, ret+24(FP)
+ RET
+
+// func getCntxct(physical bool) uint32
+TEXT runtime·getCntxct(SB),NOSPLIT|NOFRAME,$0-8
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ DMB
+
+ MOVB physical+0(FP), R0
+ CMP $1, R0
+ B.NE 3(PC)
+
+ // get CNTPCT (Physical Count Register) into R0(low) R1(high)
+ // mrrc 15, 0, r0, r1, cr14
+ WORD $0xec510f0e
+ B 2(PC)
+
+ // get CNTVCT (Virtual Count Register) into R0(low) R1(high)
+ // mrrc 15, 1, r0, r1, cr14
+ WORD $0xec510f1e
+
+ MOVW R0, ret+4(FP)
+ RET
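
The MULLU/MUL/ADD.S/ADC sequence in fallback_nanotime above computes sec*1e9 + nsec as a 64-bit value using only 32-bit registers: a widening multiply of the low word, a truncating multiply of the high word, then an add with carry. A small Go sketch of the same arithmetic, with math/bits making the carry explicit (hypothetical helper, shown only to mirror the register-level steps):

package main

import (
	"fmt"
	"math/bits"
)

// nanos64 reproduces the 32-bit arithmetic of fallback_nanotime:
// (secHi<<32 | secLo) * 1e9 + nsec, carried out with 32-bit operations.
func nanos64(secLo, secHi, nsec uint32) uint64 {
	const giga = 1000000000
	hi, lo := bits.Mul32(secLo, giga) // MULLU R0, R3, (R1, R0)
	hi += secHi * giga                // MUL R3, R4 (low 32 bits only)
	var carry uint32
	lo, carry = bits.Add32(lo, nsec, 0) // ADD.S R2, R0
	hi += carry                         // ADC R4, R1
	return uint64(hi)<<32 | uint64(lo)
}

func main() {
	fmt.Println(nanos64(5, 0, 123))         // 5000000123
	fmt.Println(uint64(5)*1000000000 + 123) // same value, computed directly
}
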
diff --git a/src/runtime/sys_freebsd_arm64.s b/src/runtime/sys_freebsd_arm64.s
new file mode 100644
index 0000000..1aa09e8
--- /dev/null
+++ b/src/runtime/sys_freebsd_arm64.s
@@ -0,0 +1,523 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for arm64, FreeBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 4
+#define FD_CLOEXEC 1
+#define F_SETFD 2
+#define F_GETFL 3
+#define F_SETFL 4
+#define O_NONBLOCK 4
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_sigaltstack 53
+#define SYS_munmap 73
+#define SYS_madvise 75
+#define SYS_setitimer 83
+#define SYS_fcntl 92
+#define SYS___sysctl 202
+#define SYS_nanosleep 240
+#define SYS_clock_gettime 232
+#define SYS_sched_yield 331
+#define SYS_sigprocmask 340
+#define SYS_kqueue 362
+#define SYS_kevent 363
+#define SYS_sigaction 416
+#define SYS_thr_exit 431
+#define SYS_thr_self 432
+#define SYS_thr_kill 433
+#define SYS__umtx_op 454
+#define SYS_thr_new 455
+#define SYS_mmap 477
+#define SYS_cpuset_getaffinity 487
+#define SYS_pipe2 542
+
+TEXT emptyfunc<>(SB),0,$0-0
+ RET
+
+// func sys_umtx_op(addr *uint32, mode int32, val uint32, uaddr1 uintptr, ut *umtx_time) int32
+TEXT runtime·sys_umtx_op(SB),NOSPLIT,$0
+ MOVD addr+0(FP), R0
+ MOVW mode+8(FP), R1
+ MOVW val+12(FP), R2
+ MOVD uaddr1+16(FP), R3
+ MOVD ut+24(FP), R4
+ MOVD $SYS__umtx_op, R8
+ SVC
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+32(FP)
+ RET
+
+// func thr_new(param *thrparam, size int32) int32
+TEXT runtime·thr_new(SB),NOSPLIT,$0
+ MOVD param+0(FP), R0
+ MOVW size+8(FP), R1
+ MOVD $SYS_thr_new, R8
+ SVC
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+16(FP)
+ RET
+
+// func thr_start()
+TEXT runtime·thr_start(SB),NOSPLIT,$0
+ // set up g
+ MOVD m_g0(R0), g
+ MOVD R0, g_m(g)
+ BL emptyfunc<>(SB) // fault if stack check is wrong
+ BL runtime·mstart(SB)
+
+ MOVD $2, R8 // crash (not reached)
+ MOVD R8, (R8)
+ RET
+
+// func exit(code int32)
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW code+0(FP), R0
+ MOVD $SYS_exit, R8
+ SVC
+ MOVD $0, R0
+ MOVD R0, (R0)
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-8
+ MOVD wait+0(FP), R0
+ // We're done using the stack.
+ MOVW $0, R1
+ STLRW R1, (R0)
+ MOVW $0, R0
+ MOVD $SYS_thr_exit, R8
+ SVC
+ JMP 0(PC)
+
+// func open(name *byte, mode, perm int32) int32
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0-20
+ MOVD name+0(FP), R0
+ MOVW mode+8(FP), R1
+ MOVW perm+12(FP), R2
+ MOVD $SYS_open, R8
+ SVC
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+16(FP)
+ RET
+
+// func closefd(fd int32) int32
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW fd+0(FP), R0
+ MOVD $SYS_close, R8
+ SVC
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+8(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT|NOFRAME,$0-12
+ MOVD $r+0(FP), R0
+ MOVW $0, R1
+ MOVD $SYS_pipe2, R8
+ SVC
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ MOVD $r+8(FP), R0
+ MOVW flags+0(FP), R1
+ MOVD $SYS_pipe2, R8
+ SVC
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, errno+16(FP)
+ RET
+
+// func write1(fd uintptr, p unsafe.Pointer, n int32) int32
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0-28
+ MOVD fd+0(FP), R0
+ MOVD p+8(FP), R1
+ MOVW n+16(FP), R2
+ MOVD $SYS_write, R8
+ SVC
+ BCC ok
+ NEG R0, R0 // caller expects negative errno
+ok:
+ MOVW R0, ret+24(FP)
+ RET
+
+// func read(fd int32, p unsafe.Pointer, n int32) int32
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW fd+0(FP), R0
+ MOVD p+8(FP), R1
+ MOVW n+16(FP), R2
+ MOVD $SYS_read, R8
+ SVC
+ BCC ok
+ NEG R0, R0 // caller expects negative errno
+ok:
+ MOVW R0, ret+24(FP)
+ RET
+
+// func usleep(usec uint32)
+TEXT runtime·usleep(SB),NOSPLIT,$24-4
+ MOVWU usec+0(FP), R3
+ MOVD R3, R5
+ MOVW $1000000, R4
+ UDIV R4, R3
+ MOVD R3, 8(RSP)
+ MUL R3, R4
+ SUB R4, R5
+ MOVW $1000, R4
+ MUL R4, R5
+ MOVD R5, 16(RSP)
+
+ // nanosleep(&ts, 0)
+ ADD $8, RSP, R0
+ MOVD $0, R1
+ MOVD $SYS_nanosleep, R8
+ SVC
+ RET
+
+// func thr_self() thread
+TEXT runtime·thr_self(SB),NOSPLIT,$8-8
+ MOVD $ptr-8(SP), R0 // arg 1 &8(SP)
+ MOVD $SYS_thr_self, R8
+ SVC
+ MOVD ptr-8(SP), R0
+ MOVD R0, ret+0(FP)
+ RET
+
+// func thr_kill(t thread, sig int)
+TEXT runtime·thr_kill(SB),NOSPLIT,$0-16
+	MOVD tid+0(FP), R0	// arg 1 tid
+ MOVD sig+8(FP), R1 // arg 2 sig
+ MOVD $SYS_thr_kill, R8
+ SVC
+ RET
+
+// func raiseproc(sig uint32)
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ MOVD $SYS_getpid, R8
+ SVC
+ MOVW sig+0(FP), R1
+ MOVD $SYS_kill, R8
+ SVC
+ RET
+
+// func setitimer(mode int32, new, old *itimerval)
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW mode+0(FP), R0
+ MOVD new+8(FP), R1
+ MOVD old+16(FP), R2
+ MOVD $SYS_setitimer, R8
+ SVC
+ RET
+
+// func fallback_walltime() (sec int64, nsec int32)
+TEXT runtime·fallback_walltime(SB),NOSPLIT,$24-12
+ MOVW $CLOCK_REALTIME, R0
+ MOVD $8(RSP), R1
+ MOVD $SYS_clock_gettime, R8
+ SVC
+ MOVD 8(RSP), R0 // sec
+ MOVW 16(RSP), R1 // nsec
+ MOVD R0, sec+0(FP)
+ MOVW R1, nsec+8(FP)
+ RET
+
+// func fallback_nanotime() int64
+TEXT runtime·fallback_nanotime(SB),NOSPLIT,$24-8
+ MOVD $CLOCK_MONOTONIC, R0
+ MOVD $8(RSP), R1
+ MOVD $SYS_clock_gettime, R8
+ SVC
+ MOVD 8(RSP), R0 // sec
+ MOVW 16(RSP), R2 // nsec
+
+ // sec is in R0, nsec in R2
+	// return nsec in R0
+ MOVD $1000000000, R3
+ MUL R3, R0
+ ADD R2, R0
+
+ MOVD R0, ret+0(FP)
+ RET
+
+// func asmSigaction(sig uintptr, new, old *sigactiont) int32
+TEXT runtime·asmSigaction(SB),NOSPLIT|NOFRAME,$0
+ MOVD sig+0(FP), R0 // arg 1 sig
+ MOVD new+8(FP), R1 // arg 2 act
+ MOVD old+16(FP), R2 // arg 3 oact
+ MOVD $SYS_sigaction, R8
+ SVC
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+24(FP)
+ RET
+
+// func sigfwd(fn uintptr, sig uint32, info *siginfo, ctx unsafe.Pointer)
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R0
+ MOVD info+16(FP), R1
+ MOVD ctx+24(FP), R2
+ MOVD fn+0(FP), R11
+ BL (R11)
+ RET
+
+// func sigtramp()
+TEXT runtime·sigtramp(SB),NOSPLIT,$192
+ // Save callee-save registers in the case of signal forwarding.
+ // Please refer to https://golang.org/issue/31827 .
+ MOVD R19, 8*4(RSP)
+ MOVD R20, 8*5(RSP)
+ MOVD R21, 8*6(RSP)
+ MOVD R22, 8*7(RSP)
+ MOVD R23, 8*8(RSP)
+ MOVD R24, 8*9(RSP)
+ MOVD R25, 8*10(RSP)
+ MOVD R26, 8*11(RSP)
+ MOVD R27, 8*12(RSP)
+ MOVD g, 8*13(RSP)
+ MOVD R29, 8*14(RSP)
+ FMOVD F8, 8*15(RSP)
+ FMOVD F9, 8*16(RSP)
+ FMOVD F10, 8*17(RSP)
+ FMOVD F11, 8*18(RSP)
+ FMOVD F12, 8*19(RSP)
+ FMOVD F13, 8*20(RSP)
+ FMOVD F14, 8*21(RSP)
+ FMOVD F15, 8*22(RSP)
+
+ // this might be called in external code context,
+ // where g is not set.
+ // first save R0, because runtime·load_g will clobber it
+ MOVW R0, 8(RSP)
+ MOVBU runtime·iscgo(SB), R0
+ CMP $0, R0
+ BEQ 2(PC)
+ BL runtime·load_g(SB)
+
+ MOVD R1, 16(RSP)
+ MOVD R2, 24(RSP)
+ MOVD $runtime·sigtrampgo(SB), R0
+ BL (R0)
+
+ // Restore callee-save registers.
+ MOVD 8*4(RSP), R19
+ MOVD 8*5(RSP), R20
+ MOVD 8*6(RSP), R21
+ MOVD 8*7(RSP), R22
+ MOVD 8*8(RSP), R23
+ MOVD 8*9(RSP), R24
+ MOVD 8*10(RSP), R25
+ MOVD 8*11(RSP), R26
+ MOVD 8*12(RSP), R27
+ MOVD 8*13(RSP), g
+ MOVD 8*14(RSP), R29
+ FMOVD 8*15(RSP), F8
+ FMOVD 8*16(RSP), F9
+ FMOVD 8*17(RSP), F10
+ FMOVD 8*18(RSP), F11
+ FMOVD 8*19(RSP), F12
+ FMOVD 8*20(RSP), F13
+ FMOVD 8*21(RSP), F14
+ FMOVD 8*22(RSP), F15
+
+ RET
+
+// func mmap(addr uintptr, n uintptr, prot int, flags int, fd int, off int64) (ret uintptr, err error)
+TEXT runtime·mmap(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVW prot+16(FP), R2
+ MOVW flags+20(FP), R3
+ MOVW fd+24(FP), R4
+ MOVW off+28(FP), R5
+ MOVD $SYS_mmap, R8
+ SVC
+ BCS fail
+ MOVD R0, p+32(FP)
+ MOVD $0, err+40(FP)
+ RET
+fail:
+ MOVD $0, p+32(FP)
+ MOVD R0, err+40(FP)
+ RET
+
+// func munmap(addr uintptr, n uintptr) (err error)
+TEXT runtime·munmap(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVD $SYS_munmap, R8
+ SVC
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+// func madvise(addr unsafe.Pointer, n uintptr, flags int32) int32
+TEXT runtime·madvise(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVW flags+16(FP), R2
+ MOVD $SYS_madvise, R8
+ SVC
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+24(FP)
+ RET
+
+// func sysctl(mib *uint32, miblen uint32, out *byte, size *uintptr, dst *byte, ndst uintptr) int32
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOVD mib+0(FP), R0
+ MOVD miblen+8(FP), R1
+ MOVD out+16(FP), R2
+ MOVD size+24(FP), R3
+ MOVD dst+32(FP), R4
+ MOVD ndst+40(FP), R5
+ MOVD $SYS___sysctl, R8
+ SVC
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+48(FP)
+ RET
+
+// func sigaltstack(new, old *stackt)
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVD new+0(FP), R0
+ MOVD old+8(FP), R1
+ MOVD $SYS_sigaltstack, R8
+ SVC
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+// func osyield()
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ MOVD $SYS_sched_yield, R8
+ SVC
+ RET
+
+// func sigprocmask(how int32, new, old *sigset)
+TEXT runtime·sigprocmask(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW how+0(FP), R0
+ MOVD new+8(FP), R1
+ MOVD old+16(FP), R2
+ MOVD $SYS_sigprocmask, R8
+ SVC
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+// func cpuset_getaffinity(level int, which int, id int64, size int, mask *byte) int32
+TEXT runtime·cpuset_getaffinity(SB),NOSPLIT|NOFRAME,$0-44
+ MOVD level+0(FP), R0
+ MOVD which+8(FP), R1
+ MOVD id+16(FP), R2
+ MOVD size+24(FP), R3
+ MOVD mask+32(FP), R4
+ MOVD $SYS_cpuset_getaffinity, R8
+ SVC
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+40(FP)
+ RET
+
+// func kqueue() int32
+TEXT runtime·kqueue(SB),NOSPLIT|NOFRAME,$0
+ MOVD $SYS_kqueue, R8
+ SVC
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+0(FP)
+ RET
+
+// func kevent(kq int, ch unsafe.Pointer, nch int, ev unsafe.Pointer, nev int, ts *Timespec) (n int, err error)
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVW kq+0(FP), R0
+ MOVD ch+8(FP), R1
+ MOVW nch+16(FP), R2
+ MOVD ev+24(FP), R3
+ MOVW nev+32(FP), R4
+ MOVD ts+40(FP), R5
+ MOVD $SYS_kevent, R8
+ SVC
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+48(FP)
+ RET
+
+// func closeonexec(fd int32)
+TEXT runtime·closeonexec(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0
+ MOVD $F_SETFD, R1
+ MOVD $FD_CLOEXEC, R2
+ MOVD $SYS_fcntl, R8
+ SVC
+ RET
+
+// func runtime·setNonblock(fd int32)
+TEXT runtime·setNonblock(SB),NOSPLIT,$0-4
+ MOVW fd+0(FP), R0
+ MOVD $F_GETFL, R1
+ MOVD $0, R2
+ MOVD $SYS_fcntl, R8
+ SVC
+ ORR $O_NONBLOCK, R0, R2
+ MOVW fd+0(FP), R0
+	MOVD $F_SETFL, R1
+	MOVD $SYS_fcntl, R8
+ SVC
+ RET
+
+// func getCntxct(physical bool) uint32
+TEXT runtime·getCntxct(SB),NOSPLIT,$0
+ MOVB physical+0(FP), R0
+ CMP $0, R0
+ BEQ 3(PC)
+
+ // get CNTPCT (Physical Count Register) into R0
+ MRS CNTPCT_EL0, R0 // SIGILL
+ B 2(PC)
+
+ // get CNTVCT (Virtual Count Register) into R0
+ MRS CNTVCT_EL0, R0
+
+ MOVW R0, ret+8(FP)
+ RET
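
usleep above turns a microsecond count into the timespec that nanosleep expects: UDIV splits off whole seconds, and the microsecond remainder is scaled to nanoseconds with a multiply by 1000. The same conversion as a tiny Go sketch (illustrative helper only, not runtime code):

package main

import "fmt"

// usecToTimespec mirrors the UDIV/MUL/SUB arithmetic in usleep:
// whole seconds in tv_sec, the microsecond remainder scaled to tv_nsec.
func usecToTimespec(usec uint32) (sec, nsec int64) {
	sec = int64(usec / 1000000)
	nsec = int64(usec%1000000) * 1000
	return sec, nsec
}

func main() {
	fmt.Println(usecToTimespec(1500000)) // 1 500000000
	fmt.Println(usecToTimespec(250))     // 0 250000
}
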
diff --git a/src/runtime/sys_libc.go b/src/runtime/sys_libc.go
new file mode 100644
index 0000000..996c032
--- /dev/null
+++ b/src/runtime/sys_libc.go
@@ -0,0 +1,53 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build darwin openbsd,amd64 openbsd,arm64
+
+package runtime
+
+import "unsafe"
+
+// Call fn with arg as its argument. Return what fn returns.
+// fn is the raw pc value of the entry point of the desired function.
+// Switches to the system stack, if not already there.
+// Preserves the calling point as the location where a profiler traceback will begin.
+//go:nosplit
+func libcCall(fn, arg unsafe.Pointer) int32 {
+ // Leave caller's PC/SP/G around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ // Make sure we don't reset libcallsp. This makes
+		// libcCall reentrant; we remember the g/pc/sp for the
+ // first call on an M, until that libcCall instance
+ // returns. Reentrance only matters for signals, as
+ // libc never calls back into Go. The tricky case is
+ // where we call libcX from an M and record g/pc/sp.
+ // Before that call returns, a signal arrives on the
+ // same M and the signal handling code calls another
+ // libc function. We don't want that second libcCall
+ // from within the handler to be recorded, and we
+ // don't want that call's completion to zero
+ // libcallsp.
+ // We don't need to set libcall* while we're in a sighandler
+ // (even if we're not currently in libc) because we block all
+ // signals while we're handling a signal. That includes the
+ // profile signal, which is the one that uses the libcall* info.
+ mp = nil
+ }
+ res := asmcgocall(fn, arg)
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return res
+}
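
The long comment inside libcCall describes a record-once discipline: only the outermost libc call on an M records g/pc/sp for the profiler, and a nested call made from a signal handler must neither overwrite that record nor clear it when the nested call returns. A stripped-down sketch of that discipline with invented names (not the runtime's data structure, just the control flow):

package main

import "fmt"

// callState stands in for the m.libcall* fields; the field names are invented.
type callState struct {
	pc, sp uintptr
}

// withCallRecord records pc/sp only for the outermost call and clears the
// record only when that same call returns, so nested calls leave it alone.
func withCallRecord(s *callState, pc, sp uintptr, fn func()) {
	recorded := false
	if s.sp == 0 { // outermost call: nothing recorded yet
		s.pc, s.sp = pc, sp
		recorded = true
	}
	fn()
	if recorded {
		s.sp = 0 // clearing sp marks the record as inactive
	}
}

func main() {
	var s callState
	withCallRecord(&s, 0x100, 0x200, func() {
		// A nested call, e.g. from a signal handler, must not disturb
		// the outer record.
		withCallRecord(&s, 0x300, 0x400, func() {
			fmt.Println("inner sees outer record:", s.pc == 0x100, s.sp == 0x200)
		})
		fmt.Println("record still live after inner returns:", s.sp != 0)
	})
	fmt.Println("record cleared after outer returns:", s.sp == 0)
}
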
diff --git a/src/runtime/sys_linux_386.s b/src/runtime/sys_linux_386.s
new file mode 100644
index 0000000..1e3a834
--- /dev/null
+++ b/src/runtime/sys_linux_386.s
@@ -0,0 +1,808 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for 386, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// Most linux systems use glibc's dynamic linker, which puts the
+// __kernel_vsyscall vdso helper at 0x10(GS) for easy access from position
+// independent code and setldt in runtime does the same in the statically
+// linked case. However, systems that use an alternative libc, such as
+// Android's bionic and musl, do not save the helper anywhere, so the only
+// way to invoke a syscall from position-independent code is boring old
+// int $0x80 (which is also what the syscall wrappers in bionic/musl use).
+//
+// The benchmarks also showed that using int $0x80 is as fast as calling
+// *%gs:0x10 except on AMD Opteron. See https://golang.org/cl/19833
+// for the benchmark program and raw data.
+//#define INVOKE_SYSCALL CALL 0x10(GS) // non-portable
+#define INVOKE_SYSCALL INT $0x80
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_access 33
+#define SYS_kill 37
+#define SYS_pipe 42
+#define SYS_brk 45
+#define SYS_fcntl 55
+#define SYS_munmap 91
+#define SYS_socketcall 102
+#define SYS_setittimer 104
+#define SYS_clone 120
+#define SYS_sched_yield 158
+#define SYS_nanosleep 162
+#define SYS_rt_sigreturn 173
+#define SYS_rt_sigaction 174
+#define SYS_rt_sigprocmask 175
+#define SYS_sigaltstack 186
+#define SYS_mmap2 192
+#define SYS_mincore 218
+#define SYS_madvise 219
+#define SYS_gettid 224
+#define SYS_futex 240
+#define SYS_sched_getaffinity 242
+#define SYS_set_thread_area 243
+#define SYS_exit_group 252
+#define SYS_epoll_create 254
+#define SYS_epoll_ctl 255
+#define SYS_epoll_wait 256
+#define SYS_clock_gettime 265
+#define SYS_tgkill 270
+#define SYS_epoll_create1 329
+#define SYS_pipe2 331
+
+TEXT runtime·exit(SB),NOSPLIT,$0
+ MOVL $SYS_exit_group, AX
+ MOVL code+0(FP), BX
+ INVOKE_SYSCALL
+ INT $3 // not reached
+ RET
+
+TEXT exit1<>(SB),NOSPLIT,$0
+ MOVL $SYS_exit, AX
+ MOVL code+0(FP), BX
+ INVOKE_SYSCALL
+ INT $3 // not reached
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-4
+ MOVL wait+0(FP), AX
+ // We're done using the stack.
+ MOVL $0, (AX)
+ MOVL $1, AX // exit (just this thread)
+ MOVL $0, BX // exit code
+ INT $0x80 // no stack; must not use CALL
+ // We may not even have a stack any more.
+ INT $3
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$0
+ MOVL $SYS_open, AX
+ MOVL name+0(FP), BX
+ MOVL mode+4(FP), CX
+ MOVL perm+8(FP), DX
+ INVOKE_SYSCALL
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$0
+ MOVL $SYS_close, AX
+ MOVL fd+0(FP), BX
+ INVOKE_SYSCALL
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$0
+ MOVL $SYS_write, AX
+ MOVL fd+0(FP), BX
+ MOVL p+4(FP), CX
+ MOVL n+8(FP), DX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$0
+ MOVL $SYS_read, AX
+ MOVL fd+0(FP), BX
+ MOVL p+4(FP), CX
+ MOVL n+8(FP), DX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT,$0-12
+ MOVL $SYS_pipe, AX
+ LEAL r+0(FP), BX
+ INVOKE_SYSCALL
+ MOVL AX, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-16
+ MOVL $SYS_pipe2, AX
+ LEAL r+4(FP), BX
+ MOVL flags+0(FP), CX
+ INVOKE_SYSCALL
+ MOVL AX, errno+12(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$8
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
+ MOVL AX, 0(SP)
+ MOVL $1000, AX // usec to nsec
+ MULL DX
+ MOVL AX, 4(SP)
+
+ // nanosleep(&ts, 0)
+ MOVL $SYS_nanosleep, AX
+ LEAL 0(SP), BX
+ MOVL $0, CX
+ INVOKE_SYSCALL
+ RET
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVL $SYS_gettid, AX
+ INVOKE_SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT,$12
+ MOVL $SYS_getpid, AX
+ INVOKE_SYSCALL
+ MOVL AX, BX // arg 1 pid
+ MOVL $SYS_gettid, AX
+ INVOKE_SYSCALL
+ MOVL AX, CX // arg 2 tid
+ MOVL sig+0(FP), DX // arg 3 signal
+ MOVL $SYS_tgkill, AX
+ INVOKE_SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$12
+ MOVL $SYS_getpid, AX
+ INVOKE_SYSCALL
+ MOVL AX, BX // arg 1 pid
+ MOVL sig+0(FP), CX // arg 2 signal
+ MOVL $SYS_kill, AX
+ INVOKE_SYSCALL
+ RET
+
+TEXT ·getpid(SB),NOSPLIT,$0-4
+ MOVL $SYS_getpid, AX
+ INVOKE_SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT,$0
+ MOVL $SYS_tgkill, AX
+ MOVL tgid+0(FP), BX
+ MOVL tid+4(FP), CX
+ MOVL sig+8(FP), DX
+ INVOKE_SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$0-12
+ MOVL $SYS_setittimer, AX
+ MOVL mode+0(FP), BX
+ MOVL new+4(FP), CX
+ MOVL old+8(FP), DX
+ INVOKE_SYSCALL
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT,$0-16
+ MOVL $SYS_mincore, AX
+ MOVL addr+0(FP), BX
+ MOVL n+4(FP), CX
+ MOVL dst+8(FP), DX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB), NOSPLIT, $8-12
+ // We don't know how much stack space the VDSO code will need,
+ // so switch to g0.
+
+ MOVL SP, BP // Save old SP; BP unchanged by C code.
+
+ get_tls(CX)
+ MOVL g(CX), AX
+ MOVL g_m(AX), SI // SI unchanged by C code.
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVL m_vdsoPC(SI), CX
+ MOVL m_vdsoSP(SI), DX
+ MOVL CX, 0(SP)
+ MOVL DX, 4(SP)
+
+ LEAL sec+0(FP), DX
+ MOVL -4(DX), CX
+ MOVL CX, m_vdsoPC(SI)
+ MOVL DX, m_vdsoSP(SI)
+
+ CMPL AX, m_curg(SI) // Only switch if on curg.
+ JNE noswitch
+
+ MOVL m_g0(SI), DX
+ MOVL (g_sched+gobuf_sp)(DX), SP // Set SP to g0 stack
+
+noswitch:
+ SUBL $16, SP // Space for results
+ ANDL $~15, SP // Align for C code
+
+ // Stack layout, depending on call path:
+ // x(SP) vDSO INVOKE_SYSCALL
+ // 12 ts.tv_nsec ts.tv_nsec
+ // 8 ts.tv_sec ts.tv_sec
+ // 4 &ts -
+ // 0 CLOCK_<id> -
+
+ MOVL runtime·vdsoClockgettimeSym(SB), AX
+ CMPL AX, $0
+ JEQ fallback
+
+ LEAL 8(SP), BX // &ts (struct timespec)
+ MOVL BX, 4(SP)
+ MOVL $0, 0(SP) // CLOCK_REALTIME
+ CALL AX
+ JMP finish
+
+fallback:
+ MOVL $SYS_clock_gettime, AX
+ MOVL $0, BX // CLOCK_REALTIME
+ LEAL 8(SP), CX
+ INVOKE_SYSCALL
+
+finish:
+ MOVL 8(SP), AX // sec
+ MOVL 12(SP), BX // nsec
+
+ MOVL BP, SP // Restore real SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVL 4(SP), CX
+ MOVL CX, m_vdsoSP(SI)
+ MOVL 0(SP), CX
+ MOVL CX, m_vdsoPC(SI)
+
+ // sec is in AX, nsec in BX
+ MOVL AX, sec_lo+0(FP)
+ MOVL $0, sec_hi+4(FP)
+ MOVL BX, nsec+8(FP)
+ RET
+
+// Conceptually int64 nanotime(void); implemented as
+// void nanotime(int64 *nsec), with the 64-bit result stored at ret_lo/ret_hi(FP).
+TEXT runtime·nanotime1(SB), NOSPLIT, $8-8
+ // Switch to g0 stack. See comment above in runtime·walltime.
+
+ MOVL SP, BP // Save old SP; BP unchanged by C code.
+
+ get_tls(CX)
+ MOVL g(CX), AX
+ MOVL g_m(AX), SI // SI unchanged by C code.
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVL m_vdsoPC(SI), CX
+ MOVL m_vdsoSP(SI), DX
+ MOVL CX, 0(SP)
+ MOVL DX, 4(SP)
+
+ LEAL ret+0(FP), DX
+ MOVL -4(DX), CX
+ MOVL CX, m_vdsoPC(SI)
+ MOVL DX, m_vdsoSP(SI)
+
+ CMPL AX, m_curg(SI) // Only switch if on curg.
+ JNE noswitch
+
+ MOVL m_g0(SI), DX
+ MOVL (g_sched+gobuf_sp)(DX), SP // Set SP to g0 stack
+
+noswitch:
+ SUBL $16, SP // Space for results
+ ANDL $~15, SP // Align for C code
+
+ MOVL runtime·vdsoClockgettimeSym(SB), AX
+ CMPL AX, $0
+ JEQ fallback
+
+ LEAL 8(SP), BX // &ts (struct timespec)
+ MOVL BX, 4(SP)
+ MOVL $1, 0(SP) // CLOCK_MONOTONIC
+ CALL AX
+ JMP finish
+
+fallback:
+ MOVL $SYS_clock_gettime, AX
+ MOVL $1, BX // CLOCK_MONOTONIC
+ LEAL 8(SP), CX
+ INVOKE_SYSCALL
+
+finish:
+ MOVL 8(SP), AX // sec
+ MOVL 12(SP), BX // nsec
+
+ MOVL BP, SP // Restore real SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVL 4(SP), CX
+ MOVL CX, m_vdsoSP(SI)
+ MOVL 0(SP), CX
+ MOVL CX, m_vdsoPC(SI)
+
+ // sec is in AX, nsec in BX
+ // convert to DX:AX nsec
+ MOVL $1000000000, CX
+ MULL CX
+ ADDL BX, AX
+ ADCL $0, DX
+
+ MOVL AX, ret_lo+0(FP)
+ MOVL DX, ret_hi+4(FP)
+ RET
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT,$0
+ MOVL $SYS_rt_sigprocmask, AX
+ MOVL how+0(FP), BX
+ MOVL new+4(FP), CX
+ MOVL old+8(FP), DX
+ MOVL size+12(FP), SI
+ INVOKE_SYSCALL
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ INT $3
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT,$0
+ MOVL $SYS_rt_sigaction, AX
+ MOVL sig+0(FP), BX
+ MOVL new+4(FP), CX
+ MOVL old+8(FP), DX
+ MOVL size+12(FP), SI
+ INVOKE_SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$12-16
+ MOVL fn+0(FP), AX
+ MOVL sig+4(FP), BX
+ MOVL info+8(FP), CX
+ MOVL ctx+12(FP), DX
+ MOVL SP, SI
+ SUBL $32, SP
+ ANDL $-15, SP // align stack: handler might be a C function
+ MOVL BX, 0(SP)
+ MOVL CX, 4(SP)
+ MOVL DX, 8(SP)
+ MOVL SI, 12(SP) // save SI: handler might be a Go function
+ CALL AX
+ MOVL 12(SP), AX
+ MOVL AX, SP
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$28
+ // Save callee-saved C registers, since the caller may be a C signal handler.
+ MOVL BX, bx-4(SP)
+ MOVL BP, bp-8(SP)
+ MOVL SI, si-12(SP)
+ MOVL DI, di-16(SP)
+ // We don't save mxcsr or the x87 control word because sigtrampgo doesn't
+ // modify them.
+
+ MOVL sig+0(FP), BX
+ MOVL BX, 0(SP)
+ MOVL info+4(FP), BX
+ MOVL BX, 4(SP)
+ MOVL ctx+8(FP), BX
+ MOVL BX, 8(SP)
+ CALL runtime·sigtrampgo(SB)
+
+ MOVL di-16(SP), DI
+ MOVL si-12(SP), SI
+ MOVL bp-8(SP), BP
+ MOVL bx-4(SP), BX
+ RET
+
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ JMP runtime·sigtramp(SB)
+
+TEXT runtime·sigreturn(SB),NOSPLIT,$0
+ MOVL $SYS_rt_sigreturn, AX
+ // Sigreturn expects same SP as signal handler,
+ // so cannot CALL 0x10(GS) here.
+ INT $0x80
+ INT $3 // not reached
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$0
+ MOVL $SYS_mmap2, AX
+ MOVL addr+0(FP), BX
+ MOVL n+4(FP), CX
+ MOVL prot+8(FP), DX
+ MOVL flags+12(FP), SI
+ MOVL fd+16(FP), DI
+ MOVL off+20(FP), BP
+ SHRL $12, BP
+ INVOKE_SYSCALL
+ CMPL AX, $0xfffff001
+ JLS ok
+ NOTL AX
+ INCL AX
+ MOVL $0, p+24(FP)
+ MOVL AX, err+28(FP)
+ RET
+ok:
+ MOVL AX, p+24(FP)
+ MOVL $0, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVL $SYS_munmap, AX
+ MOVL addr+0(FP), BX
+ MOVL n+4(FP), CX
+ INVOKE_SYSCALL
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ INT $3
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVL $SYS_madvise, AX
+ MOVL addr+0(FP), BX
+ MOVL n+4(FP), CX
+ MOVL flags+8(FP), DX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+// int32 futex(int32 *uaddr, int32 op, int32 val,
+// struct timespec *timeout, int32 *uaddr2, int32 val2);
+TEXT runtime·futex(SB),NOSPLIT,$0
+ MOVL $SYS_futex, AX
+ MOVL addr+0(FP), BX
+ MOVL op+4(FP), CX
+ MOVL val+8(FP), DX
+ MOVL ts+12(FP), SI
+ MOVL addr2+16(FP), DI
+ MOVL val3+20(FP), BP
+ INVOKE_SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+// int32 clone(int32 flags, void *stack, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT,$0
+ MOVL $SYS_clone, AX
+ MOVL flags+0(FP), BX
+ MOVL stk+4(FP), CX
+ MOVL $0, DX // parent tid ptr
+ MOVL $0, DI // child tid ptr
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ SUBL $16, CX
+ MOVL mp+8(FP), SI
+ MOVL SI, 0(CX)
+ MOVL gp+12(FP), SI
+ MOVL SI, 4(CX)
+ MOVL fn+16(FP), SI
+ MOVL SI, 8(CX)
+ MOVL $1234, 12(CX)
+
+ // cannot use CALL 0x10(GS) here, because the stack changes during the
+ // system call (after CALL 0x10(GS), the child is still using the
+ // parent's stack when executing its RET instruction).
+ INT $0x80
+
+ // In parent, return.
+ CMPL AX, $0
+ JEQ 3(PC)
+ MOVL AX, ret+20(FP)
+ RET
+
+ // Paranoia: check that SP is as we expect.
+ NOP SP // tell vet SP changed - stop checking offsets
+ MOVL 12(SP), BP
+ CMPL BP, $1234
+ JEQ 2(PC)
+ INT $3
+
+ // Initialize AX to Linux tid
+ MOVL $SYS_gettid, AX
+ INVOKE_SYSCALL
+
+ MOVL 0(SP), BX // m
+ MOVL 4(SP), DX // g
+ MOVL 8(SP), SI // fn
+
+ CMPL BX, $0
+ JEQ nog
+ CMPL DX, $0
+ JEQ nog
+
+ MOVL AX, m_procid(BX) // save tid as m->procid
+
+ // set up ldt 7+id to point at m->tls.
+ LEAL m_tls(BX), BP
+ MOVL m_id(BX), DI
+ ADDL $7, DI // m0 is LDT#7. count up.
+ // setldt(tls#, &tls, sizeof tls)
+ PUSHAL // save registers
+ PUSHL $32 // sizeof tls
+ PUSHL BP // &tls
+ PUSHL DI // tls #
+ CALL runtime·setldt(SB)
+ POPL AX
+ POPL AX
+ POPL AX
+ POPAL
+
+ // Now segment is established. Initialize m, g.
+ get_tls(AX)
+ MOVL DX, g(AX)
+ MOVL BX, g_m(DX)
+
+ CALL runtime·stackcheck(SB) // smashes AX, CX
+ MOVL 0(DX), DX // paranoia; check they are not nil
+ MOVL 0(BX), BX
+
+ // more paranoia; check that stack splitting code works
+ PUSHAL
+ CALL runtime·emptyfunc(SB)
+ POPAL
+
+nog:
+ CALL SI // fn()
+ CALL exit1<>(SB)
+ MOVL $0x1234, 0x1005
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
+ MOVL $SYS_sigaltstack, AX
+ MOVL new+0(FP), BX
+ MOVL old+4(FP), CX
+ INVOKE_SYSCALL
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ INT $3
+ RET
+
+// <asm-i386/ldt.h>
+// struct user_desc {
+// unsigned int entry_number;
+// unsigned long base_addr;
+// unsigned int limit;
+// unsigned int seg_32bit:1;
+// unsigned int contents:2;
+// unsigned int read_exec_only:1;
+// unsigned int limit_in_pages:1;
+// unsigned int seg_not_present:1;
+// unsigned int useable:1;
+// };
+#define SEG_32BIT 0x01
+// contents are the 2 bits 0x02 and 0x04.
+#define CONTENTS_DATA 0x00
+#define CONTENTS_STACK 0x02
+#define CONTENTS_CODE 0x04
+#define READ_EXEC_ONLY 0x08
+#define LIMIT_IN_PAGES 0x10
+#define SEG_NOT_PRESENT 0x20
+#define USEABLE 0x40
+
+// -1 means the kernel will pick a TLS entry on the first setldt call,
+// which happens during runtime init; we store the allocated entry number
+// back here and reuse it on subsequent calls when creating new threads.
+DATA runtime·tls_entry_number+0(SB)/4, $-1
+GLOBL runtime·tls_entry_number(SB), NOPTR, $4
+
+// setldt(int entry, int address, int limit)
+// We use set_thread_area, which mucks with the GDT, instead of modify_ldt,
+// which would modify the LDT, but is disabled on some kernels.
+// The name, setldt, is a misnomer; we keep it for compatibility with the
+// other platforms.
+TEXT runtime·setldt(SB),NOSPLIT,$32
+ MOVL base+4(FP), DX
+
+#ifdef GOOS_android
+ // Android stores the TLS offset in runtime·tls_g.
+ SUBL runtime·tls_g(SB), DX
+ MOVL DX, 0(DX)
+#else
+ /*
+ * When linking against the system libraries,
+ * we use its pthread_create and let it set up %gs
+ * for us. When we do that, the private storage
+ * we get is not at 0(GS), but -4(GS).
+ * To insulate the rest of the tool chain from this
+ * ugliness, 8l rewrites 0(TLS) into -4(GS) for us.
+ * To accommodate that rewrite, we translate
+ * the address here and bump the limit to 0xffffffff (no limit)
+ * so that -4(GS) maps to 0(address).
+ * Also, the final 0(GS) (current 4(DX)) has to point
+ * to itself, to mimic ELF.
+ */
+ ADDL $0x4, DX // address
+ MOVL DX, 0(DX)
+#endif
+
+ // get entry number
+ MOVL runtime·tls_entry_number(SB), CX
+
+ // set up user_desc
+ LEAL 16(SP), AX // struct user_desc
+ MOVL CX, 0(AX) // unsigned int entry_number
+ MOVL DX, 4(AX) // unsigned long base_addr
+ MOVL $0xfffff, 8(AX) // unsigned int limit
+ MOVL $(SEG_32BIT|LIMIT_IN_PAGES|USEABLE|CONTENTS_DATA), 12(AX) // flag bits
+
+ // call set_thread_area
+ MOVL AX, BX // user_desc
+ MOVL $SYS_set_thread_area, AX
+ // We can't call this via 0x10(GS) because this is called from setldt0 to set that up.
+ INT $0x80
+
+ // breakpoint on error
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ INT $3
+
+ // read allocated entry number back out of user_desc
+ LEAL 16(SP), AX // get our user_desc back
+ MOVL 0(AX), AX
+
+ // store entry number if the kernel allocated it
+ CMPL CX, $-1
+ JNE 2(PC)
+ MOVL AX, runtime·tls_entry_number(SB)
+
+ // compute segment selector - (entry*8+3)
+ SHLL $3, AX
+ ADDL $3, AX
+ MOVW AX, GS
+
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ MOVL $SYS_sched_yield, AX
+ INVOKE_SYSCALL
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT,$0
+ MOVL $SYS_sched_getaffinity, AX
+ MOVL pid+0(FP), BX
+ MOVL len+4(FP), CX
+ MOVL buf+8(FP), DX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+// int32 runtime·epollcreate(int32 size);
+TEXT runtime·epollcreate(SB),NOSPLIT,$0
+ MOVL $SYS_epoll_create, AX
+ MOVL size+0(FP), BX
+ INVOKE_SYSCALL
+ MOVL AX, ret+4(FP)
+ RET
+
+// int32 runtime·epollcreate1(int32 flags);
+TEXT runtime·epollcreate1(SB),NOSPLIT,$0
+ MOVL $SYS_epoll_create1, AX
+ MOVL flags+0(FP), BX
+ INVOKE_SYSCALL
+ MOVL AX, ret+4(FP)
+ RET
+
+// func epollctl(epfd, op, fd int32, ev *epollEvent) int
+TEXT runtime·epollctl(SB),NOSPLIT,$0
+ MOVL $SYS_epoll_ctl, AX
+ MOVL epfd+0(FP), BX
+ MOVL op+4(FP), CX
+ MOVL fd+8(FP), DX
+ MOVL ev+12(FP), SI
+ INVOKE_SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+// int32 runtime·epollwait(int32 epfd, EpollEvent *ev, int32 nev, int32 timeout);
+TEXT runtime·epollwait(SB),NOSPLIT,$0
+ MOVL $SYS_epoll_wait, AX
+ MOVL epfd+0(FP), BX
+ MOVL ev+4(FP), CX
+ MOVL nev+8(FP), DX
+ MOVL timeout+12(FP), SI
+ INVOKE_SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd);
+TEXT runtime·closeonexec(SB),NOSPLIT,$0
+ MOVL $SYS_fcntl, AX
+ MOVL fd+0(FP), BX // fd
+ MOVL $2, CX // F_SETFD
+ MOVL $1, DX // FD_CLOEXEC
+ INVOKE_SYSCALL
+ RET
+
+// func runtime·setNonblock(fd int32)
+TEXT runtime·setNonblock(SB),NOSPLIT,$0-4
+ MOVL $SYS_fcntl, AX
+ MOVL fd+0(FP), BX // fd
+ MOVL $3, CX // F_GETFL
+ MOVL $0, DX
+ INVOKE_SYSCALL
+ MOVL fd+0(FP), BX // fd
+ MOVL $4, CX // F_SETFL
+ MOVL $0x800, DX // O_NONBLOCK
+ ORL AX, DX
+ MOVL $SYS_fcntl, AX
+ INVOKE_SYSCALL
+ RET
+
+// int access(const char *name, int mode)
+TEXT runtime·access(SB),NOSPLIT,$0
+ MOVL $SYS_access, AX
+ MOVL name+0(FP), BX
+ MOVL mode+4(FP), CX
+ INVOKE_SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+// int connect(int fd, const struct sockaddr *addr, socklen_t addrlen)
+TEXT runtime·connect(SB),NOSPLIT,$0-16
+ // connect is implemented as socketcall(NR_socket, 3, *(rest of args))
+ // stack already should have fd, addr, addrlen.
+ MOVL $SYS_socketcall, AX
+ MOVL $3, BX // connect
+ LEAL fd+0(FP), CX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+// int socket(int domain, int type, int protocol)
+TEXT runtime·socket(SB),NOSPLIT,$0-16
+ // socket is implemented as socketcall(NR_socket, 1, *(rest of args))
+ // stack already should have domain, type, protocol.
+ MOVL $SYS_socketcall, AX
+ MOVL $1, BX // socket
+ LEAL domain+0(FP), CX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT,$0-4
+ // Implemented as brk(NULL).
+ MOVL $SYS_brk, AX
+ MOVL $0, BX // NULL
+ INVOKE_SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
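
At the end of setldt above, the GDT entry number handed back by set_thread_area is turned into a segment selector with SHLL $3 / ADDL $3: the entry index occupies the selector's high bits, TI=0 selects the GDT, and the low two bits request privilege level 3. A one-line Go sketch of that encoding (illustrative helper, not runtime code):

package main

import "fmt"

// gdtSelector encodes a GDT entry number as an x86 segment selector:
// bits 3..15 hold the table index, bit 2 is 0 for the GDT, and
// bits 0..1 carry the requested privilege level (3 = user mode).
func gdtSelector(entry uint32) uint32 {
	return entry<<3 | 3 // SHLL $3, AX; ADDL $3, AX
}

func main() {
	fmt.Printf("entry 12 -> selector %#x\n", gdtSelector(12)) // 0x63
}
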
diff --git a/src/runtime/sys_linux_amd64.s b/src/runtime/sys_linux_amd64.s
new file mode 100644
index 0000000..37cb8da
--- /dev/null
+++ b/src/runtime/sys_linux_amd64.s
@@ -0,0 +1,791 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for AMD64, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define AT_FDCWD -100
+
+#define SYS_read 0
+#define SYS_write 1
+#define SYS_close 3
+#define SYS_mmap 9
+#define SYS_munmap 11
+#define SYS_brk 12
+#define SYS_rt_sigaction 13
+#define SYS_rt_sigprocmask 14
+#define SYS_rt_sigreturn 15
+#define SYS_pipe 22
+#define SYS_sched_yield 24
+#define SYS_mincore 27
+#define SYS_madvise 28
+#define SYS_nanosleep 35
+#define SYS_setittimer 38
+#define SYS_getpid 39
+#define SYS_socket 41
+#define SYS_connect 42
+#define SYS_clone 56
+#define SYS_exit 60
+#define SYS_kill 62
+#define SYS_fcntl 72
+#define SYS_sigaltstack 131
+#define SYS_arch_prctl 158
+#define SYS_gettid 186
+#define SYS_futex 202
+#define SYS_sched_getaffinity 204
+#define SYS_epoll_create 213
+#define SYS_clock_gettime 228
+#define SYS_exit_group 231
+#define SYS_epoll_ctl 233
+#define SYS_tgkill 234
+#define SYS_openat 257
+#define SYS_faccessat 269
+#define SYS_epoll_pwait 281
+#define SYS_epoll_create1 291
+#define SYS_pipe2 293
+
+TEXT runtime·exit(SB),NOSPLIT,$0-4
+ MOVL code+0(FP), DI
+ MOVL $SYS_exit_group, AX
+ SYSCALL
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-8
+ MOVQ wait+0(FP), AX
+ // We're done using the stack.
+ MOVL $0, (AX)
+ MOVL $0, DI // exit code
+ MOVL $SYS_exit, AX
+ SYSCALL
+ // We may not even have a stack any more.
+ INT $3
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$0-20
+ // This uses openat instead of open, because Android O blocks open.
+ MOVL $AT_FDCWD, DI // AT_FDCWD, so this acts like open
+ MOVQ name+0(FP), SI
+ MOVL mode+8(FP), DX
+ MOVL perm+12(FP), R10
+ MOVL $SYS_openat, AX
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$0-12
+ MOVL fd+0(FP), DI
+ MOVL $SYS_close, AX
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$0-28
+ MOVQ fd+0(FP), DI
+ MOVQ p+8(FP), SI
+ MOVL n+16(FP), DX
+ MOVL $SYS_write, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$0-28
+ MOVL fd+0(FP), DI
+ MOVQ p+8(FP), SI
+ MOVL n+16(FP), DX
+ MOVL $SYS_read, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT,$0-12
+ LEAQ r+0(FP), DI
+ MOVL $SYS_pipe, AX
+ SYSCALL
+ MOVL AX, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-20
+ LEAQ r+8(FP), DI
+ MOVL flags+0(FP), SI
+ MOVL $SYS_pipe2, AX
+ SYSCALL
+ MOVL AX, errno+16(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
+ MOVQ AX, 0(SP)
+ MOVL $1000, AX // usec to nsec
+ MULL DX
+ MOVQ AX, 8(SP)
+
+ // nanosleep(&ts, 0)
+ MOVQ SP, DI
+ MOVL $0, SI
+ MOVL $SYS_nanosleep, AX
+ SYSCALL
+ RET
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVL $SYS_gettid, AX
+ SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT,$0
+ MOVL $SYS_getpid, AX
+ SYSCALL
+ MOVL AX, R12
+ MOVL $SYS_gettid, AX
+ SYSCALL
+ MOVL AX, SI // arg 2 tid
+ MOVL R12, DI // arg 1 pid
+ MOVL sig+0(FP), DX // arg 3
+ MOVL $SYS_tgkill, AX
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$0
+ MOVL $SYS_getpid, AX
+ SYSCALL
+ MOVL AX, DI // arg 1 pid
+ MOVL sig+0(FP), SI // arg 2
+ MOVL $SYS_kill, AX
+ SYSCALL
+ RET
+
+TEXT ·getpid(SB),NOSPLIT,$0-8
+ MOVL $SYS_getpid, AX
+ SYSCALL
+ MOVQ AX, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT,$0
+ MOVQ tgid+0(FP), DI
+ MOVQ tid+8(FP), SI
+ MOVQ sig+16(FP), DX
+ MOVL $SYS_tgkill, AX
+ SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$0-24
+ MOVL mode+0(FP), DI
+ MOVQ new+8(FP), SI
+ MOVQ old+16(FP), DX
+ MOVL $SYS_setittimer, AX
+ SYSCALL
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT,$0-28
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVQ dst+16(FP), DX
+ MOVL $SYS_mincore, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+// non-zero frame-size means bp is saved and restored
+TEXT runtime·walltime1(SB),NOSPLIT,$16-12
+ // We don't know how much stack space the VDSO code will need,
+ // so switch to g0.
+ // In particular, a kernel configured with CONFIG_OPTIMIZE_INLINING=n
+ // and hardening can use a full page of stack space in gettime_sym
+ // due to stack probes inserted to avoid stack/heap collisions.
+ // See issue #20427.
+
+ MOVQ SP, R12 // Save old SP; R12 unchanged by C code.
+
+ get_tls(CX)
+ MOVQ g(CX), AX
+ MOVQ g_m(AX), BX // BX unchanged by C code.
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVQ m_vdsoPC(BX), CX
+ MOVQ m_vdsoSP(BX), DX
+ MOVQ CX, 0(SP)
+ MOVQ DX, 8(SP)
+
+ LEAQ sec+0(FP), DX
+ MOVQ -8(DX), CX
+ MOVQ CX, m_vdsoPC(BX)
+ MOVQ DX, m_vdsoSP(BX)
+
+ CMPQ AX, m_curg(BX) // Only switch if on curg.
+ JNE noswitch
+
+ MOVQ m_g0(BX), DX
+ MOVQ (g_sched+gobuf_sp)(DX), SP // Set SP to g0 stack
+
+noswitch:
+ SUBQ $16, SP // Space for results
+ ANDQ $~15, SP // Align for C code
+
+ MOVL $0, DI // CLOCK_REALTIME
+ LEAQ 0(SP), SI
+ MOVQ runtime·vdsoClockgettimeSym(SB), AX
+ CMPQ AX, $0
+ JEQ fallback
+ CALL AX
+ret:
+ MOVQ 0(SP), AX // sec
+ MOVQ 8(SP), DX // nsec
+ MOVQ R12, SP // Restore real SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVQ 8(SP), CX
+ MOVQ CX, m_vdsoSP(BX)
+ MOVQ 0(SP), CX
+ MOVQ CX, m_vdsoPC(BX)
+ MOVQ AX, sec+0(FP)
+ MOVL DX, nsec+8(FP)
+ RET
+fallback:
+ MOVQ $SYS_clock_gettime, AX
+ SYSCALL
+ JMP ret
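What walltime1 (and nanotime1 below) ultimately retrieves is a clock_gettime result, via the vDSO when available and the raw syscall otherwise. A user-space sketch of the same semantics using golang.org/x/sys/unix, shown only as an illustration and not as what the runtime itself calls:

	package main

	import (
		"fmt"

		"golang.org/x/sys/unix"
	)

	func main() {
		var ts unix.Timespec
		// Clock IDs as passed in DI above: 0 = CLOCK_REALTIME (walltime1),
		// 1 = CLOCK_MONOTONIC (nanotime1 below).
		if err := unix.ClockGettime(unix.CLOCK_REALTIME, &ts); err != nil {
			panic(err)
		}
		fmt.Println(ts.Sec, ts.Nsec) // the sec/nsec pair the assembly reads back from the stack
	}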
+
+// func nanotime1() int64
+TEXT runtime·nanotime1(SB),NOSPLIT,$16-8
+	// Switch to g0 stack. See comment above in runtime·walltime1.
+
+ MOVQ SP, R12 // Save old SP; R12 unchanged by C code.
+
+ get_tls(CX)
+ MOVQ g(CX), AX
+ MOVQ g_m(AX), BX // BX unchanged by C code.
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVQ m_vdsoPC(BX), CX
+ MOVQ m_vdsoSP(BX), DX
+ MOVQ CX, 0(SP)
+ MOVQ DX, 8(SP)
+
+ LEAQ ret+0(FP), DX
+ MOVQ -8(DX), CX
+ MOVQ CX, m_vdsoPC(BX)
+ MOVQ DX, m_vdsoSP(BX)
+
+ CMPQ AX, m_curg(BX) // Only switch if on curg.
+ JNE noswitch
+
+ MOVQ m_g0(BX), DX
+ MOVQ (g_sched+gobuf_sp)(DX), SP // Set SP to g0 stack
+
+noswitch:
+ SUBQ $16, SP // Space for results
+ ANDQ $~15, SP // Align for C code
+
+ MOVL $1, DI // CLOCK_MONOTONIC
+ LEAQ 0(SP), SI
+ MOVQ runtime·vdsoClockgettimeSym(SB), AX
+ CMPQ AX, $0
+ JEQ fallback
+ CALL AX
+ret:
+ MOVQ 0(SP), AX // sec
+ MOVQ 8(SP), DX // nsec
+ MOVQ R12, SP // Restore real SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVQ 8(SP), CX
+ MOVQ CX, m_vdsoSP(BX)
+ MOVQ 0(SP), CX
+ MOVQ CX, m_vdsoPC(BX)
+ // sec is in AX, nsec in DX
+ // return nsec in AX
+ IMULQ $1000000000, AX
+ ADDQ DX, AX
+ MOVQ AX, ret+0(FP)
+ RET
+fallback:
+ MOVQ $SYS_clock_gettime, AX
+ SYSCALL
+ JMP ret
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT,$0-28
+ MOVL how+0(FP), DI
+ MOVQ new+8(FP), SI
+ MOVQ old+16(FP), DX
+ MOVL size+24(FP), R10
+ MOVL $SYS_rt_sigprocmask, AX
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT,$0-36
+ MOVQ sig+0(FP), DI
+ MOVQ new+8(FP), SI
+ MOVQ old+16(FP), DX
+ MOVQ size+24(FP), R10
+ MOVL $SYS_rt_sigaction, AX
+ SYSCALL
+ MOVL AX, ret+32(FP)
+ RET
+
+// Call the function stored in _cgo_sigaction using the GCC calling convention.
+TEXT runtime·callCgoSigaction(SB),NOSPLIT,$16
+ MOVQ sig+0(FP), DI
+ MOVQ new+8(FP), SI
+ MOVQ old+16(FP), DX
+ MOVQ _cgo_sigaction(SB), AX
+ MOVQ SP, BX // callee-saved
+ ANDQ $~15, SP // alignment as per amd64 psABI
+ CALL AX
+ MOVQ BX, SP
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVQ fn+0(FP), AX
+ MOVL sig+8(FP), DI
+ MOVQ info+16(FP), SI
+ MOVQ ctx+24(FP), DX
+ PUSHQ BP
+ MOVQ SP, BP
+ ANDQ $~15, SP // alignment for x86_64 ABI
+ CALL AX
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+// Defined as ABIInternal since it does not use the stack-based Go ABI.
+TEXT runtime·sigtramp<ABIInternal>(SB),NOSPLIT,$72
+ // Save callee-saved C registers, since the caller may be a C signal handler.
+ MOVQ BX, bx-8(SP)
+ MOVQ BP, bp-16(SP) // save in case GOEXPERIMENT=noframepointer is set
+ MOVQ R12, r12-24(SP)
+ MOVQ R13, r13-32(SP)
+ MOVQ R14, r14-40(SP)
+ MOVQ R15, r15-48(SP)
+ // We don't save mxcsr or the x87 control word because sigtrampgo doesn't
+ // modify them.
+
+ MOVQ DX, ctx-56(SP)
+ MOVQ SI, info-64(SP)
+ MOVQ DI, signum-72(SP)
+ MOVQ $runtime·sigtrampgo(SB), AX
+ CALL AX
+
+ MOVQ r15-48(SP), R15
+ MOVQ r14-40(SP), R14
+ MOVQ r13-32(SP), R13
+ MOVQ r12-24(SP), R12
+ MOVQ bp-16(SP), BP
+ MOVQ bx-8(SP), BX
+ RET
+
+// Used instead of sigtramp in programs that use cgo.
+// Arguments from kernel are in DI, SI, DX.
+// Defined as ABIInternal since it does not use the stack-based Go ABI.
+TEXT runtime·cgoSigtramp<ABIInternal>(SB),NOSPLIT,$0
+ // If no traceback function, do usual sigtramp.
+ MOVQ runtime·cgoTraceback(SB), AX
+ TESTQ AX, AX
+ JZ sigtramp
+
+ // If no traceback support function, which means that
+ // runtime/cgo was not linked in, do usual sigtramp.
+ MOVQ _cgo_callers(SB), AX
+ TESTQ AX, AX
+ JZ sigtramp
+
+ // Figure out if we are currently in a cgo call.
+ // If not, just do usual sigtramp.
+ get_tls(CX)
+ MOVQ g(CX),AX
+ TESTQ AX, AX
+ JZ sigtrampnog // g == nil
+ MOVQ g_m(AX), AX
+ TESTQ AX, AX
+ JZ sigtramp // g.m == nil
+ MOVL m_ncgo(AX), CX
+ TESTL CX, CX
+ JZ sigtramp // g.m.ncgo == 0
+ MOVQ m_curg(AX), CX
+ TESTQ CX, CX
+ JZ sigtramp // g.m.curg == nil
+ MOVQ g_syscallsp(CX), CX
+ TESTQ CX, CX
+ JZ sigtramp // g.m.curg.syscallsp == 0
+ MOVQ m_cgoCallers(AX), R8
+ TESTQ R8, R8
+ JZ sigtramp // g.m.cgoCallers == nil
+ MOVL m_cgoCallersUse(AX), CX
+ TESTL CX, CX
+ JNZ sigtramp // g.m.cgoCallersUse != 0
+
+ // Jump to a function in runtime/cgo.
+ // That function, written in C, will call the user's traceback
+ // function with proper unwind info, and will then call back here.
+ // The first three arguments, and the fifth, are already in registers.
+ // Set the two remaining arguments now.
+ MOVQ runtime·cgoTraceback(SB), CX
+ MOVQ $runtime·sigtramp<ABIInternal>(SB), R9
+ MOVQ _cgo_callers(SB), AX
+ JMP AX
+
+sigtramp:
+ JMP runtime·sigtramp<ABIInternal>(SB)
+
+sigtrampnog:
+ // Signal arrived on a non-Go thread. If this is SIGPROF, get a
+ // stack trace.
+ CMPL DI, $27 // 27 == SIGPROF
+ JNZ sigtramp
+
+ // Lock sigprofCallersUse.
+ MOVL $0, AX
+ MOVL $1, CX
+ MOVQ $runtime·sigprofCallersUse(SB), R11
+ LOCK
+ CMPXCHGL CX, 0(R11)
+ JNZ sigtramp // Skip stack trace if already locked.
+
+ // Jump to the traceback function in runtime/cgo.
+ // It will call back to sigprofNonGo, which will ignore the
+ // arguments passed in registers.
+ // First three arguments to traceback function are in registers already.
+ MOVQ runtime·cgoTraceback(SB), CX
+ MOVQ $runtime·sigprofCallers(SB), R8
+ MOVQ $runtime·sigprofNonGo(SB), R9
+ MOVQ _cgo_callers(SB), AX
+ JMP AX
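The register-by-register checks above reduce to a single boolean condition on the current g (the g == nil case takes the separate sigtrampnog path for SIGPROF, and the cgoTraceback/_cgo_callers pointers must also be non-nil). A hedged Go paraphrase with deliberately minimal stand-in types; these are not the runtime's real declarations:

	package main

	// Stand-in shapes holding only the fields the assembly inspects.
	type m struct {
		ncgo          int32
		curg          *g
		cgoCallers    *uintptr
		cgoCallersUse uint32
	}

	type g struct {
		m         *m
		syscallsp uintptr
	}

	// useCgoTraceback is the condition under which cgoSigtramp jumps to the
	// _cgo_callers trampoline instead of falling back to the ordinary sigtramp.
	func useCgoTraceback(gp *g) bool {
		return gp != nil &&
			gp.m != nil &&
			gp.m.ncgo != 0 &&
			gp.m.curg != nil &&
			gp.m.curg.syscallsp != 0 &&
			gp.m.cgoCallers != nil &&
			gp.m.cgoCallersUse == 0
	}

	func main() {}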
+
+// For cgo unwinding to work, this function must look precisely like
+// the one in glibc. The glibc source code is:
+// https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sigaction.c
+// The code that cares about the precise instructions used is:
+// https://gcc.gnu.org/viewcvs/gcc/trunk/libgcc/config/i386/linux-unwind.h?revision=219188&view=markup
+// Defined as ABIInternal since it does not use the stack-based Go ABI.
+TEXT runtime·sigreturn<ABIInternal>(SB),NOSPLIT,$0
+ MOVQ $SYS_rt_sigreturn, AX
+ SYSCALL
+ INT $3 // not reached
+
+TEXT runtime·sysMmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVL prot+16(FP), DX
+ MOVL flags+20(FP), R10
+ MOVL fd+24(FP), R8
+ MOVL off+28(FP), R9
+
+ MOVL $SYS_mmap, AX
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS ok
+ NOTQ AX
+ INCQ AX
+ MOVQ $0, p+32(FP)
+ MOVQ AX, err+40(FP)
+ RET
+ok:
+ MOVQ AX, p+32(FP)
+ MOVQ $0, err+40(FP)
+ RET
+
+// Call the function stored in _cgo_mmap using the GCC calling convention.
+// This must be called on the system stack.
+TEXT runtime·callCgoMmap(SB),NOSPLIT,$16
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVL prot+16(FP), DX
+ MOVL flags+20(FP), CX
+ MOVL fd+24(FP), R8
+ MOVL off+28(FP), R9
+ MOVQ _cgo_mmap(SB), AX
+ MOVQ SP, BX
+ ANDQ $~15, SP // alignment as per amd64 psABI
+ MOVQ BX, 0(SP)
+ CALL AX
+ MOVQ 0(SP), SP
+ MOVQ AX, ret+32(FP)
+ RET
+
+TEXT runtime·sysMunmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVQ $SYS_munmap, AX
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// Call the function stored in _cgo_munmap using the GCC calling convention.
+// This must be called on the system stack.
+TEXT runtime·callCgoMunmap(SB),NOSPLIT,$16-16
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVQ _cgo_munmap(SB), AX
+ MOVQ SP, BX
+ ANDQ $~15, SP // alignment as per amd64 psABI
+ MOVQ BX, 0(SP)
+ CALL AX
+ MOVQ 0(SP), SP
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVL flags+16(FP), DX
+ MOVQ $SYS_madvise, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+// int64 futex(int32 *uaddr, int32 op, int32 val,
+// struct timespec *timeout, int32 *uaddr2, int32 val2);
+TEXT runtime·futex(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI
+ MOVL op+8(FP), SI
+ MOVL val+12(FP), DX
+ MOVQ ts+16(FP), R10
+ MOVQ addr2+24(FP), R8
+ MOVL val3+32(FP), R9
+ MOVL $SYS_futex, AX
+ SYSCALL
+ MOVL AX, ret+40(FP)
+ RET
+
+// int32 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT,$0
+ MOVL flags+0(FP), DI
+ MOVQ stk+8(FP), SI
+ MOVQ $0, DX
+ MOVQ $0, R10
+ MOVQ $0, R8
+ // Copy mp, gp, fn off parent stack for use by child.
+ // Careful: Linux system call clobbers CX and R11.
+ MOVQ mp+16(FP), R13
+ MOVQ gp+24(FP), R9
+ MOVQ fn+32(FP), R12
+ CMPQ R13, $0 // m
+ JEQ nog1
+ CMPQ R9, $0 // g
+ JEQ nog1
+ LEAQ m_tls(R13), R8
+#ifdef GOOS_android
+ // Android stores the TLS offset in runtime·tls_g.
+ SUBQ runtime·tls_g(SB), R8
+#else
+ ADDQ $8, R8 // ELF wants to use -8(FS)
+#endif
+	ORQ	$0x00080000, DI	// add the CLONE_SETTLS (0x00080000) flag to the clone call
+nog1:
+ MOVL $SYS_clone, AX
+ SYSCALL
+
+ // In parent, return.
+ CMPQ AX, $0
+ JEQ 3(PC)
+ MOVL AX, ret+40(FP)
+ RET
+
+ // In child, on new stack.
+ MOVQ SI, SP
+
+ // If g or m are nil, skip Go-related setup.
+ CMPQ R13, $0 // m
+ JEQ nog2
+ CMPQ R9, $0 // g
+ JEQ nog2
+
+ // Initialize m->procid to Linux tid
+ MOVL $SYS_gettid, AX
+ SYSCALL
+ MOVQ AX, m_procid(R13)
+
+ // In child, set up new stack
+ get_tls(CX)
+ MOVQ R13, g_m(R9)
+ MOVQ R9, g(CX)
+ CALL runtime·stackcheck(SB)
+
+nog2:
+ // Call fn
+ CALL R12
+
+ // It shouldn't return. If it does, exit that thread.
+ MOVL $111, DI
+ MOVL $SYS_exit, AX
+ SYSCALL
+ JMP -3(PC) // keep exiting
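The CMPQ AX, $0 / JEQ right after the SYSCALL is the usual fork-style clone(2) convention: the call returns twice, with the child's tid in the parent and 0 in the child (negative raw values follow the kernel's negated-errno convention). A trivial Go illustration of that branch, not runtime code:

	package main

	import "fmt"

	// afterClone names the three outcomes the branch above distinguishes.
	func afterClone(raw int64) string {
		switch {
		case raw < 0:
			return "error (negated errno)"
		case raw == 0:
			return "child: now running on the new stack"
		default:
			return fmt.Sprintf("parent: child tid %d", raw)
		}
	}

	func main() {
		fmt.Println(afterClone(12345))
		fmt.Println(afterClone(0))
	}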
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
+ MOVQ new+0(FP), DI
+ MOVQ old+8(FP), SI
+ MOVQ $SYS_sigaltstack, AX
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// set tls base to DI
+TEXT runtime·settls(SB),NOSPLIT,$32
+#ifdef GOOS_android
+ // Android stores the TLS offset in runtime·tls_g.
+ SUBQ runtime·tls_g(SB), DI
+#else
+ ADDQ $8, DI // ELF wants to use -8(FS)
+#endif
+ MOVQ DI, SI
+ MOVQ $0x1002, DI // ARCH_SET_FS
+ MOVQ $SYS_arch_prctl, AX
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ MOVL $SYS_sched_yield, AX
+ SYSCALL
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT,$0
+ MOVQ pid+0(FP), DI
+ MOVQ len+8(FP), SI
+ MOVQ buf+16(FP), DX
+ MOVL $SYS_sched_getaffinity, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+// int32 runtime·epollcreate(int32 size);
+TEXT runtime·epollcreate(SB),NOSPLIT,$0
+ MOVL size+0(FP), DI
+ MOVL $SYS_epoll_create, AX
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+// int32 runtime·epollcreate1(int32 flags);
+TEXT runtime·epollcreate1(SB),NOSPLIT,$0
+ MOVL flags+0(FP), DI
+ MOVL $SYS_epoll_create1, AX
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+// func epollctl(epfd, op, fd int32, ev *epollEvent) int
+TEXT runtime·epollctl(SB),NOSPLIT,$0
+ MOVL epfd+0(FP), DI
+ MOVL op+4(FP), SI
+ MOVL fd+8(FP), DX
+ MOVQ ev+16(FP), R10
+ MOVL $SYS_epoll_ctl, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+// int32 runtime·epollwait(int32 epfd, EpollEvent *ev, int32 nev, int32 timeout);
+TEXT runtime·epollwait(SB),NOSPLIT,$0
+	// This uses epoll_pwait instead of epoll_wait, because Android O blocks epoll_wait.
+ MOVL epfd+0(FP), DI
+ MOVQ ev+8(FP), SI
+ MOVL nev+16(FP), DX
+ MOVL timeout+20(FP), R10
+ MOVQ $0, R8
+ MOVL $SYS_epoll_pwait, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
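These epollcreate1/epollctl/epollwait wrappers back the runtime's netpoller. The same kernel interface is reachable from ordinary Go through the syscall package; a usage sketch, unrelated to the runtime's internal calls:

	package main

	import (
		"fmt"
		"syscall"
	)

	func main() {
		epfd, err := syscall.EpollCreate1(syscall.EPOLL_CLOEXEC)
		if err != nil {
			panic(err)
		}
		defer syscall.Close(epfd)

		// Watch stdin for readability, then wait up to 100ms for events.
		ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: 0}
		if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, 0, &ev); err != nil {
			fmt.Println("epoll_ctl:", err)
			return
		}
		events := make([]syscall.EpollEvent, 8)
		n, err := syscall.EpollWait(epfd, events, 100)
		fmt.Println(n, err)
	}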
+
+// void runtime·closeonexec(int32 fd);
+TEXT runtime·closeonexec(SB),NOSPLIT,$0
+ MOVL fd+0(FP), DI // fd
+ MOVQ $2, SI // F_SETFD
+ MOVQ $1, DX // FD_CLOEXEC
+ MOVL $SYS_fcntl, AX
+ SYSCALL
+ RET
+
+// func runtime·setNonblock(int32 fd)
+TEXT runtime·setNonblock(SB),NOSPLIT,$0-4
+ MOVL fd+0(FP), DI // fd
+ MOVQ $3, SI // F_GETFL
+ MOVQ $0, DX
+ MOVL $SYS_fcntl, AX
+ SYSCALL
+ MOVL fd+0(FP), DI // fd
+ MOVQ $4, SI // F_SETFL
+ MOVQ $0x800, DX // O_NONBLOCK
+ ORL AX, DX
+ MOVL $SYS_fcntl, AX
+ SYSCALL
+ RET
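The F_GETFL/F_SETFL sequence above has the same effect as the standard library's syscall.SetNonblock; for comparison, a library-level sketch rather than what the runtime executes:

	package main

	import (
		"fmt"
		"syscall"
	)

	func main() {
		// OR O_NONBLOCK into the file status flags of fd 0 via F_GETFL/F_SETFL.
		if err := syscall.SetNonblock(0, true); err != nil {
			fmt.Println("fcntl failed:", err)
		}
	}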
+
+// int access(const char *name, int mode)
+TEXT runtime·access(SB),NOSPLIT,$0
+ // This uses faccessat instead of access, because Android O blocks access.
+ MOVL $AT_FDCWD, DI // AT_FDCWD, so this acts like access
+ MOVQ name+0(FP), SI
+ MOVL mode+8(FP), DX
+ MOVL $0, R10
+ MOVL $SYS_faccessat, AX
+ SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+// int connect(int fd, const struct sockaddr *addr, socklen_t addrlen)
+TEXT runtime·connect(SB),NOSPLIT,$0-28
+ MOVL fd+0(FP), DI
+ MOVQ addr+8(FP), SI
+ MOVL len+16(FP), DX
+ MOVL $SYS_connect, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+// int socket(int domain, int type, int protocol)
+TEXT runtime·socket(SB),NOSPLIT,$0-20
+ MOVL domain+0(FP), DI
+ MOVL typ+4(FP), SI
+ MOVL prot+8(FP), DX
+ MOVL $SYS_socket, AX
+ SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT,$0-8
+ // Implemented as brk(NULL).
+ MOVQ $0, DI
+ MOVL $SYS_brk, AX
+ SYSCALL
+ MOVQ AX, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_linux_arm.s b/src/runtime/sys_linux_arm.s
new file mode 100644
index 0000000..475f523
--- /dev/null
+++ b/src/runtime/sys_linux_arm.s
@@ -0,0 +1,743 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for arm, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 1
+
+// for EABI, as we don't support OABI
+#define SYS_BASE 0x0
+
+#define SYS_exit (SYS_BASE + 1)
+#define SYS_read (SYS_BASE + 3)
+#define SYS_write (SYS_BASE + 4)
+#define SYS_open (SYS_BASE + 5)
+#define SYS_close (SYS_BASE + 6)
+#define SYS_getpid (SYS_BASE + 20)
+#define SYS_kill (SYS_BASE + 37)
+#define SYS_pipe (SYS_BASE + 42)
+#define SYS_clone (SYS_BASE + 120)
+#define SYS_rt_sigreturn (SYS_BASE + 173)
+#define SYS_rt_sigaction (SYS_BASE + 174)
+#define SYS_rt_sigprocmask (SYS_BASE + 175)
+#define SYS_sigaltstack (SYS_BASE + 186)
+#define SYS_mmap2 (SYS_BASE + 192)
+#define SYS_futex (SYS_BASE + 240)
+#define SYS_exit_group (SYS_BASE + 248)
+#define SYS_munmap (SYS_BASE + 91)
+#define SYS_madvise (SYS_BASE + 220)
+#define SYS_setitimer (SYS_BASE + 104)
+#define SYS_mincore (SYS_BASE + 219)
+#define SYS_gettid (SYS_BASE + 224)
+#define SYS_tgkill (SYS_BASE + 268)
+#define SYS_sched_yield (SYS_BASE + 158)
+#define SYS_nanosleep (SYS_BASE + 162)
+#define SYS_sched_getaffinity (SYS_BASE + 242)
+#define SYS_clock_gettime (SYS_BASE + 263)
+#define SYS_epoll_create (SYS_BASE + 250)
+#define SYS_epoll_ctl (SYS_BASE + 251)
+#define SYS_epoll_wait (SYS_BASE + 252)
+#define SYS_epoll_create1 (SYS_BASE + 357)
+#define SYS_pipe2 (SYS_BASE + 359)
+#define SYS_fcntl (SYS_BASE + 55)
+#define SYS_access (SYS_BASE + 33)
+#define SYS_connect (SYS_BASE + 283)
+#define SYS_socket (SYS_BASE + 281)
+#define SYS_brk (SYS_BASE + 45)
+
+#define ARM_BASE (SYS_BASE + 0x0f0000)
+
+TEXT runtime·open(SB),NOSPLIT,$0
+ MOVW name+0(FP), R0
+ MOVW mode+4(FP), R1
+ MOVW perm+8(FP), R2
+ MOVW $SYS_open, R7
+ SWI $0
+ MOVW $0xfffff001, R1
+ CMP R1, R0
+ MOVW.HI $-1, R0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0
+ MOVW $SYS_close, R7
+ SWI $0
+ MOVW $0xfffff001, R1
+ CMP R1, R0
+ MOVW.HI $-1, R0
+ MOVW R0, ret+4(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0
+ MOVW p+4(FP), R1
+ MOVW n+8(FP), R2
+ MOVW $SYS_write, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0
+ MOVW p+4(FP), R1
+ MOVW n+8(FP), R2
+ MOVW $SYS_read, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT,$0-12
+ MOVW $r+0(FP), R0
+ MOVW $SYS_pipe, R7
+ SWI $0
+ MOVW R0, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-16
+ MOVW $r+4(FP), R0
+ MOVW flags+0(FP), R1
+ MOVW $SYS_pipe2, R7
+ SWI $0
+ MOVW R0, errno+12(FP)
+ RET
+
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0
+ MOVW code+0(FP), R0
+ MOVW $SYS_exit_group, R7
+ SWI $0
+ MOVW $1234, R0
+ MOVW $1002, R1
+ MOVW R0, (R1) // fail hard
+
+TEXT exit1<>(SB),NOSPLIT|NOFRAME,$0
+ MOVW code+0(FP), R0
+ MOVW $SYS_exit, R7
+ SWI $0
+ MOVW $1234, R0
+ MOVW $1003, R1
+ MOVW R0, (R1) // fail hard
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW wait+0(FP), R0
+ // We're done using the stack.
+ // Alas, there's no reliable way to make this write atomic
+ // without potentially using the stack. So it goes.
+ MOVW $0, R1
+ MOVW R1, (R0)
+ MOVW $0, R0 // exit code
+ MOVW $SYS_exit, R7
+ SWI $0
+ MOVW $1234, R0
+ MOVW $1004, R1
+ MOVW R0, (R1) // fail hard
+ JMP 0(PC)
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVW $SYS_gettid, R7
+ SWI $0
+ MOVW R0, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT|NOFRAME,$0
+ MOVW $SYS_getpid, R7
+ SWI $0
+ MOVW R0, R4
+ MOVW $SYS_gettid, R7
+ SWI $0
+ MOVW R0, R1 // arg 2 tid
+ MOVW R4, R0 // arg 1 pid
+ MOVW sig+0(FP), R2 // arg 3
+ MOVW $SYS_tgkill, R7
+ SWI $0
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ MOVW $SYS_getpid, R7
+ SWI $0
+	// arg 1 pid already in R0 from getpid
+ MOVW sig+0(FP), R1 // arg 2 - signal
+ MOVW $SYS_kill, R7
+ SWI $0
+ RET
+
+TEXT ·getpid(SB),NOSPLIT,$0-4
+ MOVW $SYS_getpid, R7
+ SWI $0
+ MOVW R0, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT,$0-12
+ MOVW tgid+0(FP), R0
+ MOVW tid+4(FP), R1
+ MOVW sig+8(FP), R2
+ MOVW $SYS_tgkill, R7
+ SWI $0
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0
+ MOVW n+4(FP), R1
+ MOVW prot+8(FP), R2
+ MOVW flags+12(FP), R3
+ MOVW fd+16(FP), R4
+ MOVW off+20(FP), R5
+ MOVW $SYS_mmap2, R7
+ SWI $0
+ MOVW $0xfffff001, R6
+ CMP R6, R0
+ MOVW $0, R1
+ RSB.HI $0, R0
+ MOVW.HI R0, R1 // if error, put in R1
+ MOVW.HI $0, R0
+ MOVW R0, p+24(FP)
+ MOVW R1, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0
+ MOVW n+4(FP), R1
+ MOVW $SYS_munmap, R7
+ SWI $0
+ MOVW $0xfffff001, R6
+ CMP R6, R0
+ MOVW.HI $0, R8 // crash on syscall failure
+ MOVW.HI R8, (R8)
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0
+ MOVW n+4(FP), R1
+ MOVW flags+8(FP), R2
+ MOVW $SYS_madvise, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$0
+ MOVW mode+0(FP), R0
+ MOVW new+4(FP), R1
+ MOVW old+8(FP), R2
+ MOVW $SYS_setitimer, R7
+ SWI $0
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0
+ MOVW n+4(FP), R1
+ MOVW dst+8(FP), R2
+ MOVW $SYS_mincore, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·walltime1(SB),NOSPLIT,$8-12
+ // We don't know how much stack space the VDSO code will need,
+ // so switch to g0.
+
+ // Save old SP. Use R13 instead of SP to avoid linker rewriting the offsets.
+ MOVW R13, R4 // R4 is unchanged by C code.
+
+ MOVW g_m(g), R5 // R5 is unchanged by C code.
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVW m_vdsoPC(R5), R1
+ MOVW m_vdsoSP(R5), R2
+ MOVW R1, 4(R13)
+ MOVW R2, 8(R13)
+
+ MOVW LR, m_vdsoPC(R5)
+ MOVW R13, m_vdsoSP(R5)
+
+ MOVW m_curg(R5), R0
+
+ CMP g, R0 // Only switch if on curg.
+ B.NE noswitch
+
+ MOVW m_g0(R5), R0
+ MOVW (g_sched+gobuf_sp)(R0), R13 // Set SP to g0 stack
+
+noswitch:
+ SUB $24, R13 // Space for results
+ BIC $0x7, R13 // Align for C code
+
+ MOVW $CLOCK_REALTIME, R0
+ MOVW $8(R13), R1 // timespec
+ MOVW runtime·vdsoClockgettimeSym(SB), R2
+ CMP $0, R2
+ B.EQ fallback
+
+ // Store g on gsignal's stack, so if we receive a signal
+ // during VDSO code we can find the g.
+ // If we don't have a signal stack, we won't receive signal,
+ // so don't bother saving g.
+ // When using cgo, we already saved g on TLS, also don't save
+ // g here.
+ // Also don't save g if we are already on the signal stack.
+ // We won't get a nested signal.
+ MOVB runtime·iscgo(SB), R6
+ CMP $0, R6
+ BNE nosaveg
+ MOVW m_gsignal(R5), R6 // g.m.gsignal
+ CMP $0, R6
+ BEQ nosaveg
+ CMP g, R6
+ BEQ nosaveg
+ MOVW (g_stack+stack_lo)(R6), R6 // g.m.gsignal.stack.lo
+ MOVW g, (R6)
+
+ BL (R2)
+
+ MOVW $0, R1
+ MOVW R1, (R6) // clear g slot, R6 is unchanged by C code
+
+ JMP finish
+
+nosaveg:
+ BL (R2)
+ JMP finish
+
+fallback:
+ MOVW $SYS_clock_gettime, R7
+ SWI $0
+
+finish:
+ MOVW 8(R13), R0 // sec
+ MOVW 12(R13), R2 // nsec
+
+ MOVW R4, R13 // Restore real SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVW 8(R13), R1
+ MOVW R1, m_vdsoSP(R5)
+ MOVW 4(R13), R1
+ MOVW R1, m_vdsoPC(R5)
+
+	MOVW	$0, R1		// sec is 32-bit here, so the high word of the result is zero
+	MOVW	R0, sec_lo+0(FP)
+	MOVW	R1, sec_hi+4(FP)
+ MOVW R2, nsec+8(FP)
+ RET
+
+// int64 nanotime1(void)
+TEXT runtime·nanotime1(SB),NOSPLIT,$8-8
+	// Switch to g0 stack. See comment above in runtime·walltime1.
+
+ // Save old SP. Use R13 instead of SP to avoid linker rewriting the offsets.
+ MOVW R13, R4 // R4 is unchanged by C code.
+
+ MOVW g_m(g), R5 // R5 is unchanged by C code.
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVW m_vdsoPC(R5), R1
+ MOVW m_vdsoSP(R5), R2
+ MOVW R1, 4(R13)
+ MOVW R2, 8(R13)
+
+ MOVW LR, m_vdsoPC(R5)
+ MOVW R13, m_vdsoSP(R5)
+
+ MOVW m_curg(R5), R0
+
+ CMP g, R0 // Only switch if on curg.
+ B.NE noswitch
+
+ MOVW m_g0(R5), R0
+ MOVW (g_sched+gobuf_sp)(R0), R13 // Set SP to g0 stack
+
+noswitch:
+ SUB $24, R13 // Space for results
+ BIC $0x7, R13 // Align for C code
+
+ MOVW $CLOCK_MONOTONIC, R0
+ MOVW $8(R13), R1 // timespec
+ MOVW runtime·vdsoClockgettimeSym(SB), R2
+ CMP $0, R2
+ B.EQ fallback
+
+ // Store g on gsignal's stack, so if we receive a signal
+ // during VDSO code we can find the g.
+ // If we don't have a signal stack, we won't receive signal,
+ // so don't bother saving g.
+ // When using cgo, we already saved g on TLS, also don't save
+ // g here.
+ // Also don't save g if we are already on the signal stack.
+ // We won't get a nested signal.
+ MOVB runtime·iscgo(SB), R6
+ CMP $0, R6
+ BNE nosaveg
+ MOVW m_gsignal(R5), R6 // g.m.gsignal
+ CMP $0, R6
+ BEQ nosaveg
+ CMP g, R6
+ BEQ nosaveg
+ MOVW (g_stack+stack_lo)(R6), R6 // g.m.gsignal.stack.lo
+ MOVW g, (R6)
+
+ BL (R2)
+
+ MOVW $0, R1
+ MOVW R1, (R6) // clear g slot, R6 is unchanged by C code
+
+ JMP finish
+
+nosaveg:
+ BL (R2)
+ JMP finish
+
+fallback:
+ MOVW $SYS_clock_gettime, R7
+ SWI $0
+
+finish:
+ MOVW 8(R13), R0 // sec
+ MOVW 12(R13), R2 // nsec
+
+ MOVW R4, R13 // Restore real SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVW 8(R13), R4
+ MOVW R4, m_vdsoSP(R5)
+ MOVW 4(R13), R4
+ MOVW R4, m_vdsoPC(R5)
+
+ MOVW $1000000000, R3
+ MULLU R0, R3, (R1, R0)
+	MOVW	$0, R4		// R4 was reused as scratch above; only the carry should be added
+	ADD.S	R2, R0
+	ADC	R4, R1
+
+ MOVW R0, ret_lo+0(FP)
+ MOVW R1, ret_hi+4(FP)
+ RET
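On 32-bit ARM the sec*1e9+nsec computation needs a 32x32->64 multiply plus a carry-propagating add, which is what the MULLU/ADD.S/ADC sequence does. The same arithmetic in Go using math/bits, shown only as an illustration:

	package main

	import (
		"fmt"
		"math/bits"
	)

	// nanosFrom widens sec*1e9 to a (hi, lo) word pair, then adds nsec with
	// carry into the high word, mirroring MULLU / ADD.S / ADC above.
	func nanosFrom(sec, nsec uint32) uint64 {
		hi, lo := bits.Mul32(sec, 1_000_000_000)
		lo, carry := bits.Add32(lo, nsec, 0)
		hi += carry
		return uint64(hi)<<32 | uint64(lo)
	}

	func main() {
		fmt.Println(nanosFrom(2, 500)) // 2000000500
	}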
+
+// int32 futex(int32 *uaddr, int32 op, int32 val,
+// struct timespec *timeout, int32 *uaddr2, int32 val2);
+TEXT runtime·futex(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0
+ MOVW op+4(FP), R1
+ MOVW val+8(FP), R2
+ MOVW ts+12(FP), R3
+ MOVW addr2+16(FP), R4
+ MOVW val3+20(FP), R5
+ MOVW $SYS_futex, R7
+ SWI $0
+ MOVW R0, ret+24(FP)
+ RET
+
+// int32 clone(int32 flags, void *stack, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT,$0
+ MOVW flags+0(FP), R0
+ MOVW stk+4(FP), R1
+ MOVW $0, R2 // parent tid ptr
+ MOVW $0, R3 // tls_val
+ MOVW $0, R4 // child tid ptr
+ MOVW $0, R5
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ MOVW $-16(R1), R1
+ MOVW mp+8(FP), R6
+ MOVW R6, 0(R1)
+ MOVW gp+12(FP), R6
+ MOVW R6, 4(R1)
+ MOVW fn+16(FP), R6
+ MOVW R6, 8(R1)
+ MOVW $1234, R6
+ MOVW R6, 12(R1)
+
+ MOVW $SYS_clone, R7
+ SWI $0
+
+ // In parent, return.
+ CMP $0, R0
+ BEQ 3(PC)
+ MOVW R0, ret+20(FP)
+ RET
+
+ // Paranoia: check that SP is as we expect. Use R13 to avoid linker 'fixup'
+ NOP R13 // tell vet SP/R13 changed - stop checking offsets
+ MOVW 12(R13), R0
+ MOVW $1234, R1
+ CMP R0, R1
+ BEQ 2(PC)
+ BL runtime·abort(SB)
+
+ MOVW 0(R13), R8 // m
+ MOVW 4(R13), R0 // g
+
+ CMP $0, R8
+ BEQ nog
+ CMP $0, R0
+ BEQ nog
+
+ MOVW R0, g
+ MOVW R8, g_m(g)
+
+ // paranoia; check they are not nil
+ MOVW 0(R8), R0
+ MOVW 0(g), R0
+
+ BL runtime·emptyfunc(SB) // fault if stack check is wrong
+
+ // Initialize m->procid to Linux tid
+ MOVW $SYS_gettid, R7
+ SWI $0
+ MOVW g_m(g), R8
+ MOVW R0, m_procid(R8)
+
+nog:
+ // Call fn
+ MOVW 8(R13), R0
+ MOVW $16(R13), R13
+ BL (R0)
+
+ // It shouldn't return. If it does, exit that thread.
+ SUB $16, R13 // restore the stack pointer to avoid memory corruption
+ MOVW $0, R0
+ MOVW R0, 4(R13)
+ BL exit1<>(SB)
+
+ MOVW $1234, R0
+ MOVW $1005, R1
+ MOVW R0, (R1)
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$0
+ MOVW new+0(FP), R0
+ MOVW old+4(FP), R1
+ MOVW $SYS_sigaltstack, R7
+ SWI $0
+ MOVW $0xfffff001, R6
+ CMP R6, R0
+ MOVW.HI $0, R8 // crash on syscall failure
+ MOVW.HI R8, (R8)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-16
+ MOVW sig+4(FP), R0
+ MOVW info+8(FP), R1
+ MOVW ctx+12(FP), R2
+ MOVW fn+0(FP), R11
+ MOVW R13, R4
+ SUB $24, R13
+ BIC $0x7, R13 // alignment for ELF ABI
+ BL (R11)
+ MOVW R4, R13
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$0
+ // Reserve space for callee-save registers and arguments.
+ MOVM.DB.W [R4-R11], (R13)
+ SUB $16, R13
+
+ // this might be called in external code context,
+ // where g is not set.
+ // first save R0, because runtime·load_g will clobber it
+ MOVW R0, 4(R13)
+ MOVB runtime·iscgo(SB), R0
+ CMP $0, R0
+ BL.NE runtime·load_g(SB)
+
+ MOVW R1, 8(R13)
+ MOVW R2, 12(R13)
+ MOVW $runtime·sigtrampgo(SB), R11
+ BL (R11)
+
+ // Restore callee-save registers.
+ ADD $16, R13
+ MOVM.IA.W (R13), [R4-R11]
+
+ RET
+
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ MOVW $runtime·sigtramp(SB), R11
+ B (R11)
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT,$0
+ MOVW how+0(FP), R0
+ MOVW new+4(FP), R1
+ MOVW old+8(FP), R2
+ MOVW size+12(FP), R3
+ MOVW $SYS_rt_sigprocmask, R7
+ SWI $0
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT,$0
+ MOVW sig+0(FP), R0
+ MOVW new+4(FP), R1
+ MOVW old+8(FP), R2
+ MOVW size+12(FP), R3
+ MOVW $SYS_rt_sigaction, R7
+ SWI $0
+ MOVW R0, ret+16(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$12
+ MOVW usec+0(FP), R0
+ CALL runtime·usplitR0(SB)
+ MOVW R0, 4(R13)
+ MOVW $1000, R0 // usec to nsec
+ MUL R0, R1
+ MOVW R1, 8(R13)
+ MOVW $4(R13), R0
+ MOVW $0, R1
+ MOVW $SYS_nanosleep, R7
+ SWI $0
+ RET
+
+// As for cas, memory barriers are complicated on ARM, but the kernel
+// provides a user helper. ARMv5 does not support SMP and has no
+// memory barrier instruction at all. ARMv6 added SMP support and has
+// a memory barrier, but it requires writing to a coprocessor
+// register. ARMv7 introduced the DMB instruction, but it's expensive
+// even on single-core devices. The kernel helper takes care of all of
+// this for us.
+
+TEXT kernelPublicationBarrier<>(SB),NOSPLIT,$0
+ // void __kuser_memory_barrier(void);
+ MOVW $0xffff0fa0, R11
+ CALL (R11)
+ RET
+
+TEXT ·publicationBarrier(SB),NOSPLIT,$0
+ MOVB ·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ JMP ·armPublicationBarrier(SB)
+ JMP kernelPublicationBarrier<>(SB) // extra layer so this function is leaf and no SP adjustment on GOARM=7
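The two-jump ladder above is a GOARM-based dispatch between the DMB-based barrier and the kernel helper at 0xffff0fa0. A hedged Go-style paraphrase of that branch; the parameter and function names here are stand-ins, not exported runtime APIs:

	package main

	import "fmt"

	// publicationBarrier sketches the branch encoded above: GOARM >= 7 can use
	// the DMB-based barrier directly, older cores go through the kernel's
	// __kuser_memory_barrier helper.
	func publicationBarrier(goarm uint8, dmbBarrier, kuserBarrier func()) {
		if goarm >= 7 {
			dmbBarrier()
		} else {
			kuserBarrier()
		}
	}

	func main() {
		publicationBarrier(7,
			func() { fmt.Println("DMB barrier") },
			func() { fmt.Println("kernel helper at 0xffff0fa0") })
	}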
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ MOVW $SYS_sched_yield, R7
+ SWI $0
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT,$0
+ MOVW pid+0(FP), R0
+ MOVW len+4(FP), R1
+ MOVW buf+8(FP), R2
+ MOVW $SYS_sched_getaffinity, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+// int32 runtime·epollcreate(int32 size)
+TEXT runtime·epollcreate(SB),NOSPLIT,$0
+ MOVW size+0(FP), R0
+ MOVW $SYS_epoll_create, R7
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+// int32 runtime·epollcreate1(int32 flags)
+TEXT runtime·epollcreate1(SB),NOSPLIT,$0
+ MOVW flags+0(FP), R0
+ MOVW $SYS_epoll_create1, R7
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+// func epollctl(epfd, op, fd int32, ev *epollEvent) int
+TEXT runtime·epollctl(SB),NOSPLIT,$0
+ MOVW epfd+0(FP), R0
+ MOVW op+4(FP), R1
+ MOVW fd+8(FP), R2
+ MOVW ev+12(FP), R3
+ MOVW $SYS_epoll_ctl, R7
+ SWI $0
+ MOVW R0, ret+16(FP)
+ RET
+
+// int32 runtime·epollwait(int32 epfd, EpollEvent *ev, int32 nev, int32 timeout)
+TEXT runtime·epollwait(SB),NOSPLIT,$0
+ MOVW epfd+0(FP), R0
+ MOVW ev+4(FP), R1
+ MOVW nev+8(FP), R2
+ MOVW timeout+12(FP), R3
+ MOVW $SYS_epoll_wait, R7
+ SWI $0
+ MOVW R0, ret+16(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd)
+TEXT runtime·closeonexec(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0 // fd
+ MOVW $2, R1 // F_SETFD
+ MOVW $1, R2 // FD_CLOEXEC
+ MOVW $SYS_fcntl, R7
+ SWI $0
+ RET
+
+// func runtime·setNonblock(fd int32)
+TEXT runtime·setNonblock(SB),NOSPLIT,$0-4
+ MOVW fd+0(FP), R0 // fd
+ MOVW $3, R1 // F_GETFL
+ MOVW $0, R2
+ MOVW $SYS_fcntl, R7
+ SWI $0
+ ORR $0x800, R0, R2 // O_NONBLOCK
+ MOVW fd+0(FP), R0 // fd
+ MOVW $4, R1 // F_SETFL
+ MOVW $SYS_fcntl, R7
+ SWI $0
+ RET
+
+// b __kuser_get_tls @ 0xffff0fe0
+TEXT runtime·read_tls_fallback(SB),NOSPLIT|NOFRAME,$0
+ MOVW $0xffff0fe0, R0
+ B (R0)
+
+TEXT runtime·access(SB),NOSPLIT,$0
+ MOVW name+0(FP), R0
+ MOVW mode+4(FP), R1
+ MOVW $SYS_access, R7
+ SWI $0
+ MOVW R0, ret+8(FP)
+ RET
+
+TEXT runtime·connect(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0
+ MOVW addr+4(FP), R1
+ MOVW len+8(FP), R2
+ MOVW $SYS_connect, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·socket(SB),NOSPLIT,$0
+ MOVW domain+0(FP), R0
+ MOVW typ+4(FP), R1
+ MOVW prot+8(FP), R2
+ MOVW $SYS_socket, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT,$0-4
+ // Implemented as brk(NULL).
+ MOVW $0, R0
+ MOVW $SYS_brk, R7
+ SWI $0
+ MOVW R0, ret+0(FP)
+ RET
+
+TEXT runtime·sigreturn(SB),NOSPLIT,$0-0
+ RET
diff --git a/src/runtime/sys_linux_arm64.s b/src/runtime/sys_linux_arm64.s
new file mode 100644
index 0000000..198a5ba
--- /dev/null
+++ b/src/runtime/sys_linux_arm64.s
@@ -0,0 +1,767 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for arm64, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define AT_FDCWD -100
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 1
+
+#define SYS_exit 93
+#define SYS_read 63
+#define SYS_write 64
+#define SYS_openat 56
+#define SYS_close 57
+#define SYS_pipe2 59
+#define SYS_fcntl 25
+#define SYS_nanosleep 101
+#define SYS_mmap 222
+#define SYS_munmap 215
+#define SYS_setitimer 103
+#define SYS_clone 220
+#define SYS_sched_yield 124
+#define SYS_rt_sigreturn 139
+#define SYS_rt_sigaction 134
+#define SYS_rt_sigprocmask 135
+#define SYS_sigaltstack 132
+#define SYS_madvise 233
+#define SYS_mincore 232
+#define SYS_getpid 172
+#define SYS_gettid 178
+#define SYS_kill 129
+#define SYS_tgkill 131
+#define SYS_futex 98
+#define SYS_sched_getaffinity 123
+#define SYS_exit_group 94
+#define SYS_epoll_create1 20
+#define SYS_epoll_ctl 21
+#define SYS_epoll_pwait 22
+#define SYS_clock_gettime 113
+#define SYS_faccessat 48
+#define SYS_socket 198
+#define SYS_connect 203
+#define SYS_brk 214
+
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW code+0(FP), R0
+ MOVD $SYS_exit_group, R8
+ SVC
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-8
+ MOVD wait+0(FP), R0
+ // We're done using the stack.
+ MOVW $0, R1
+ STLRW R1, (R0)
+ MOVW $0, R0 // exit code
+ MOVD $SYS_exit, R8
+ SVC
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0-20
+ MOVD $AT_FDCWD, R0
+ MOVD name+0(FP), R1
+ MOVW mode+8(FP), R2
+ MOVW perm+12(FP), R3
+ MOVD $SYS_openat, R8
+ SVC
+ CMN $4095, R0
+ BCC done
+ MOVW $-1, R0
+done:
+ MOVW R0, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW fd+0(FP), R0
+ MOVD $SYS_close, R8
+ SVC
+ CMN $4095, R0
+ BCC done
+ MOVW $-1, R0
+done:
+ MOVW R0, ret+8(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0-28
+ MOVD fd+0(FP), R0
+ MOVD p+8(FP), R1
+ MOVW n+16(FP), R2
+ MOVD $SYS_write, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW fd+0(FP), R0
+ MOVD p+8(FP), R1
+ MOVW n+16(FP), R2
+ MOVD $SYS_read, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT|NOFRAME,$0-12
+ MOVD $r+0(FP), R0
+ MOVW $0, R1
+ MOVW $SYS_pipe2, R8
+ SVC
+ MOVW R0, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ MOVD $r+8(FP), R0
+ MOVW flags+0(FP), R1
+ MOVW $SYS_pipe2, R8
+ SVC
+ MOVW R0, errno+16(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$24-4
+ MOVWU usec+0(FP), R3
+ MOVD R3, R5
+ MOVW $1000000, R4
+ UDIV R4, R3
+ MOVD R3, 8(RSP)
+ MUL R3, R4
+ SUB R4, R5
+ MOVW $1000, R4
+ MUL R4, R5
+ MOVD R5, 16(RSP)
+
+ // nanosleep(&ts, 0)
+ ADD $8, RSP, R0
+ MOVD $0, R1
+ MOVD $SYS_nanosleep, R8
+ SVC
+ RET
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVD $SYS_gettid, R8
+ SVC
+ MOVW R0, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT|NOFRAME,$0
+ MOVD $SYS_getpid, R8
+ SVC
+ MOVW R0, R19
+ MOVD $SYS_gettid, R8
+ SVC
+ MOVW R0, R1 // arg 2 tid
+ MOVW R19, R0 // arg 1 pid
+ MOVW sig+0(FP), R2 // arg 3
+ MOVD $SYS_tgkill, R8
+ SVC
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ MOVD $SYS_getpid, R8
+ SVC
+ MOVW R0, R0 // arg 1 pid
+ MOVW sig+0(FP), R1 // arg 2
+ MOVD $SYS_kill, R8
+ SVC
+ RET
+
+TEXT ·getpid(SB),NOSPLIT|NOFRAME,$0-8
+ MOVD $SYS_getpid, R8
+ SVC
+ MOVD R0, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT,$0-24
+ MOVD tgid+0(FP), R0
+ MOVD tid+8(FP), R1
+ MOVD sig+16(FP), R2
+ MOVD $SYS_tgkill, R8
+ SVC
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW mode+0(FP), R0
+ MOVD new+8(FP), R1
+ MOVD old+16(FP), R2
+ MOVD $SYS_setitimer, R8
+ SVC
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT|NOFRAME,$0-28
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVD dst+16(FP), R2
+ MOVD $SYS_mincore, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB),NOSPLIT,$24-12
+ MOVD RSP, R20 // R20 is unchanged by C code
+ MOVD RSP, R1
+
+ MOVD g_m(g), R21 // R21 = m
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVD m_vdsoPC(R21), R2
+ MOVD m_vdsoSP(R21), R3
+ MOVD R2, 8(RSP)
+ MOVD R3, 16(RSP)
+
+ MOVD LR, m_vdsoPC(R21)
+ MOVD R20, m_vdsoSP(R21)
+
+ MOVD m_curg(R21), R0
+ CMP g, R0
+ BNE noswitch
+
+ MOVD m_g0(R21), R3
+ MOVD (g_sched+gobuf_sp)(R3), R1 // Set RSP to g0 stack
+
+noswitch:
+ SUB $16, R1
+ BIC $15, R1 // Align for C code
+ MOVD R1, RSP
+
+ MOVW $CLOCK_REALTIME, R0
+ MOVD runtime·vdsoClockgettimeSym(SB), R2
+ CBZ R2, fallback
+
+ // Store g on gsignal's stack, so if we receive a signal
+ // during VDSO code we can find the g.
+ // If we don't have a signal stack, we won't receive signal,
+ // so don't bother saving g.
+ // When using cgo, we already saved g on TLS, also don't save
+ // g here.
+ // Also don't save g if we are already on the signal stack.
+ // We won't get a nested signal.
+ MOVBU runtime·iscgo(SB), R22
+ CBNZ R22, nosaveg
+ MOVD m_gsignal(R21), R22 // g.m.gsignal
+ CBZ R22, nosaveg
+ CMP g, R22
+ BEQ nosaveg
+ MOVD (g_stack+stack_lo)(R22), R22 // g.m.gsignal.stack.lo
+ MOVD g, (R22)
+
+ BL (R2)
+
+ MOVD ZR, (R22) // clear g slot, R22 is unchanged by C code
+
+ B finish
+
+nosaveg:
+ BL (R2)
+ B finish
+
+fallback:
+ MOVD $SYS_clock_gettime, R8
+ SVC
+
+finish:
+ MOVD 0(RSP), R3 // sec
+ MOVD 8(RSP), R5 // nsec
+
+ MOVD R20, RSP // restore SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVD 16(RSP), R1
+ MOVD R1, m_vdsoSP(R21)
+ MOVD 8(RSP), R1
+ MOVD R1, m_vdsoPC(R21)
+
+ MOVD R3, sec+0(FP)
+ MOVW R5, nsec+8(FP)
+ RET
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$24-8
+ MOVD RSP, R20 // R20 is unchanged by C code
+ MOVD RSP, R1
+
+ MOVD g_m(g), R21 // R21 = m
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVD m_vdsoPC(R21), R2
+ MOVD m_vdsoSP(R21), R3
+ MOVD R2, 8(RSP)
+ MOVD R3, 16(RSP)
+
+ MOVD LR, m_vdsoPC(R21)
+ MOVD R20, m_vdsoSP(R21)
+
+ MOVD m_curg(R21), R0
+ CMP g, R0
+ BNE noswitch
+
+ MOVD m_g0(R21), R3
+ MOVD (g_sched+gobuf_sp)(R3), R1 // Set RSP to g0 stack
+
+noswitch:
+ SUB $32, R1
+ BIC $15, R1
+ MOVD R1, RSP
+
+ MOVW $CLOCK_MONOTONIC, R0
+ MOVD runtime·vdsoClockgettimeSym(SB), R2
+ CBZ R2, fallback
+
+ // Store g on gsignal's stack, so if we receive a signal
+ // during VDSO code we can find the g.
+ // If we don't have a signal stack, we won't receive signal,
+ // so don't bother saving g.
+ // When using cgo, we already saved g on TLS, also don't save
+ // g here.
+ // Also don't save g if we are already on the signal stack.
+ // We won't get a nested signal.
+ MOVBU runtime·iscgo(SB), R22
+ CBNZ R22, nosaveg
+ MOVD m_gsignal(R21), R22 // g.m.gsignal
+ CBZ R22, nosaveg
+ CMP g, R22
+ BEQ nosaveg
+ MOVD (g_stack+stack_lo)(R22), R22 // g.m.gsignal.stack.lo
+ MOVD g, (R22)
+
+ BL (R2)
+
+ MOVD ZR, (R22) // clear g slot, R22 is unchanged by C code
+
+ B finish
+
+nosaveg:
+ BL (R2)
+ B finish
+
+fallback:
+ MOVD $SYS_clock_gettime, R8
+ SVC
+
+finish:
+ MOVD 0(RSP), R3 // sec
+ MOVD 8(RSP), R5 // nsec
+
+ MOVD R20, RSP // restore SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVD 16(RSP), R1
+ MOVD R1, m_vdsoSP(R21)
+ MOVD 8(RSP), R1
+ MOVD R1, m_vdsoPC(R21)
+
+ // sec is in R3, nsec in R5
+ // return nsec in R3
+ MOVD $1000000000, R4
+ MUL R4, R3
+ ADD R5, R3
+ MOVD R3, ret+0(FP)
+ RET
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW how+0(FP), R0
+ MOVD new+8(FP), R1
+ MOVD old+16(FP), R2
+ MOVW size+24(FP), R3
+ MOVD $SYS_rt_sigprocmask, R8
+ SVC
+ CMN $4095, R0
+ BCC done
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+done:
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT|NOFRAME,$0-36
+ MOVD sig+0(FP), R0
+ MOVD new+8(FP), R1
+ MOVD old+16(FP), R2
+ MOVD size+24(FP), R3
+ MOVD $SYS_rt_sigaction, R8
+ SVC
+ MOVW R0, ret+32(FP)
+ RET
+
+// Call the function stored in _cgo_sigaction using the GCC calling convention.
+TEXT runtime·callCgoSigaction(SB),NOSPLIT,$0
+ MOVD sig+0(FP), R0
+ MOVD new+8(FP), R1
+ MOVD old+16(FP), R2
+ MOVD _cgo_sigaction(SB), R3
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL R3
+ ADD $16, RSP
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R0
+ MOVD info+16(FP), R1
+ MOVD ctx+24(FP), R2
+ MOVD fn+0(FP), R11
+ BL (R11)
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$192
+ // Save callee-save registers in the case of signal forwarding.
+ // Please refer to https://golang.org/issue/31827 .
+ MOVD R19, 8*4(RSP)
+ MOVD R20, 8*5(RSP)
+ MOVD R21, 8*6(RSP)
+ MOVD R22, 8*7(RSP)
+ MOVD R23, 8*8(RSP)
+ MOVD R24, 8*9(RSP)
+ MOVD R25, 8*10(RSP)
+ MOVD R26, 8*11(RSP)
+ MOVD R27, 8*12(RSP)
+ MOVD g, 8*13(RSP)
+ MOVD R29, 8*14(RSP)
+ FMOVD F8, 8*15(RSP)
+ FMOVD F9, 8*16(RSP)
+ FMOVD F10, 8*17(RSP)
+ FMOVD F11, 8*18(RSP)
+ FMOVD F12, 8*19(RSP)
+ FMOVD F13, 8*20(RSP)
+ FMOVD F14, 8*21(RSP)
+ FMOVD F15, 8*22(RSP)
+
+ // this might be called in external code context,
+ // where g is not set.
+ // first save R0, because runtime·load_g will clobber it
+ MOVW R0, 8(RSP)
+ MOVBU runtime·iscgo(SB), R0
+ CBZ R0, 2(PC)
+ BL runtime·load_g(SB)
+
+ MOVD R1, 16(RSP)
+ MOVD R2, 24(RSP)
+ MOVD $runtime·sigtrampgo(SB), R0
+ BL (R0)
+
+ // Restore callee-save registers.
+ MOVD 8*4(RSP), R19
+ MOVD 8*5(RSP), R20
+ MOVD 8*6(RSP), R21
+ MOVD 8*7(RSP), R22
+ MOVD 8*8(RSP), R23
+ MOVD 8*9(RSP), R24
+ MOVD 8*10(RSP), R25
+ MOVD 8*11(RSP), R26
+ MOVD 8*12(RSP), R27
+ MOVD 8*13(RSP), g
+ MOVD 8*14(RSP), R29
+ FMOVD 8*15(RSP), F8
+ FMOVD 8*16(RSP), F9
+ FMOVD 8*17(RSP), F10
+ FMOVD 8*18(RSP), F11
+ FMOVD 8*19(RSP), F12
+ FMOVD 8*20(RSP), F13
+ FMOVD 8*21(RSP), F14
+ FMOVD 8*22(RSP), F15
+
+ RET
+
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ MOVD $runtime·sigtramp(SB), R3
+ B (R3)
+
+TEXT runtime·sysMmap(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVW prot+16(FP), R2
+ MOVW flags+20(FP), R3
+ MOVW fd+24(FP), R4
+ MOVW off+28(FP), R5
+
+ MOVD $SYS_mmap, R8
+ SVC
+ CMN $4095, R0
+ BCC ok
+ NEG R0,R0
+ MOVD $0, p+32(FP)
+ MOVD R0, err+40(FP)
+ RET
+ok:
+ MOVD R0, p+32(FP)
+ MOVD $0, err+40(FP)
+ RET
+
+// Call the function stored in _cgo_mmap using the GCC calling convention.
+// This must be called on the system stack.
+TEXT runtime·callCgoMmap(SB),NOSPLIT,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVW prot+16(FP), R2
+ MOVW flags+20(FP), R3
+ MOVW fd+24(FP), R4
+ MOVW off+28(FP), R5
+ MOVD _cgo_mmap(SB), R9
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL R9
+ ADD $16, RSP
+ MOVD R0, ret+32(FP)
+ RET
+
+TEXT runtime·sysMunmap(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVD $SYS_munmap, R8
+ SVC
+ CMN $4095, R0
+ BCC cool
+ MOVD R0, 0xf0(R0)
+cool:
+ RET
+
+// Call the function stored in _cgo_munmap using the GCC calling convention.
+// This must be called on the system stack.
+TEXT runtime·callCgoMunmap(SB),NOSPLIT,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVD _cgo_munmap(SB), R9
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL R9
+ ADD $16, RSP
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVW flags+16(FP), R2
+ MOVD $SYS_madvise, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+// int64 futex(int32 *uaddr, int32 op, int32 val,
+// struct timespec *timeout, int32 *uaddr2, int32 val2);
+TEXT runtime·futex(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R0
+ MOVW op+8(FP), R1
+ MOVW val+12(FP), R2
+ MOVD ts+16(FP), R3
+ MOVD addr2+24(FP), R4
+ MOVW val3+32(FP), R5
+ MOVD $SYS_futex, R8
+ SVC
+ MOVW R0, ret+40(FP)
+ RET
+
+// int64 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), R0
+ MOVD stk+8(FP), R1
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ MOVD mp+16(FP), R10
+ MOVD gp+24(FP), R11
+ MOVD fn+32(FP), R12
+
+ MOVD R10, -8(R1)
+ MOVD R11, -16(R1)
+ MOVD R12, -24(R1)
+ MOVD $1234, R10
+ MOVD R10, -32(R1)
+
+ MOVD $SYS_clone, R8
+ SVC
+
+ // In parent, return.
+ CMP ZR, R0
+ BEQ child
+ MOVW R0, ret+40(FP)
+ RET
+child:
+
+ // In child, on new stack.
+ MOVD -32(RSP), R10
+ MOVD $1234, R0
+ CMP R0, R10
+ BEQ good
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+good:
+ // Initialize m->procid to Linux tid
+ MOVD $SYS_gettid, R8
+ SVC
+
+ MOVD -24(RSP), R12 // fn
+ MOVD -16(RSP), R11 // g
+ MOVD -8(RSP), R10 // m
+
+ CMP $0, R10
+ BEQ nog
+ CMP $0, R11
+ BEQ nog
+
+ MOVD R0, m_procid(R10)
+
+ // TODO: setup TLS.
+
+ // In child, set up new stack
+ MOVD R10, g_m(R11)
+ MOVD R11, g
+ //CALL runtime·stackcheck(SB)
+
+nog:
+ // Call fn
+ MOVD R12, R0
+ BL (R0)
+
+ // It shouldn't return. If it does, exit that thread.
+ MOVW $111, R0
+again:
+ MOVD $SYS_exit, R8
+ SVC
+ B again // keep exiting
+
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVD new+0(FP), R0
+ MOVD old+8(FP), R1
+ MOVD $SYS_sigaltstack, R8
+ SVC
+ CMN $4095, R0
+ BCC ok
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+ok:
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ MOVD $SYS_sched_yield, R8
+ SVC
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT|NOFRAME,$0
+ MOVD pid+0(FP), R0
+ MOVD len+8(FP), R1
+ MOVD buf+16(FP), R2
+ MOVD $SYS_sched_getaffinity, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+// int32 runtime·epollcreate(int32 size);
+TEXT runtime·epollcreate(SB),NOSPLIT|NOFRAME,$0
+ MOVW $0, R0
+ MOVD $SYS_epoll_create1, R8
+ SVC
+ MOVW R0, ret+8(FP)
+ RET
+
+// int32 runtime·epollcreate1(int32 flags);
+TEXT runtime·epollcreate1(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), R0
+ MOVD $SYS_epoll_create1, R8
+ SVC
+ MOVW R0, ret+8(FP)
+ RET
+
+// func epollctl(epfd, op, fd int32, ev *epollEvent) int
+TEXT runtime·epollctl(SB),NOSPLIT|NOFRAME,$0
+ MOVW epfd+0(FP), R0
+ MOVW op+4(FP), R1
+ MOVW fd+8(FP), R2
+ MOVD ev+16(FP), R3
+ MOVD $SYS_epoll_ctl, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+// int32 runtime·epollwait(int32 epfd, EpollEvent *ev, int32 nev, int32 timeout);
+TEXT runtime·epollwait(SB),NOSPLIT|NOFRAME,$0
+ MOVW epfd+0(FP), R0
+ MOVD ev+8(FP), R1
+ MOVW nev+16(FP), R2
+ MOVW timeout+20(FP), R3
+ MOVD $0, R4
+ MOVD $SYS_epoll_pwait, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd);
+TEXT runtime·closeonexec(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0 // fd
+ MOVD $2, R1 // F_SETFD
+ MOVD $1, R2 // FD_CLOEXEC
+ MOVD $SYS_fcntl, R8
+ SVC
+ RET
+
+// func runtime·setNonblock(int32 fd)
+TEXT runtime·setNonblock(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW fd+0(FP), R0 // fd
+ MOVD $3, R1 // F_GETFL
+ MOVD $0, R2
+ MOVD $SYS_fcntl, R8
+ SVC
+ MOVD $0x800, R2 // O_NONBLOCK
+ ORR R0, R2
+ MOVW fd+0(FP), R0 // fd
+ MOVD $4, R1 // F_SETFL
+ MOVD $SYS_fcntl, R8
+ SVC
+ RET
+
+// int access(const char *name, int mode)
+TEXT runtime·access(SB),NOSPLIT,$0-20
+ MOVD $AT_FDCWD, R0
+ MOVD name+0(FP), R1
+ MOVW mode+8(FP), R2
+ MOVD $SYS_faccessat, R8
+ SVC
+ MOVW R0, ret+16(FP)
+ RET
+
+// int connect(int fd, const struct sockaddr *addr, socklen_t len)
+TEXT runtime·connect(SB),NOSPLIT,$0-28
+ MOVW fd+0(FP), R0
+ MOVD addr+8(FP), R1
+ MOVW len+16(FP), R2
+ MOVD $SYS_connect, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+// int socket(int domain, int typ, int prot)
+TEXT runtime·socket(SB),NOSPLIT,$0-20
+ MOVW domain+0(FP), R0
+ MOVW typ+4(FP), R1
+ MOVW prot+8(FP), R2
+ MOVD $SYS_socket, R8
+ SVC
+ MOVW R0, ret+16(FP)
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT,$0-8
+ // Implemented as brk(NULL).
+ MOVD $0, R0
+ MOVD $SYS_brk, R8
+ SVC
+ MOVD R0, ret+0(FP)
+ RET
+
+TEXT runtime·sigreturn(SB),NOSPLIT,$0-0
+ RET
diff --git a/src/runtime/sys_linux_mips64x.s b/src/runtime/sys_linux_mips64x.s
new file mode 100644
index 0000000..c3e9f37
--- /dev/null
+++ b/src/runtime/sys_linux_mips64x.s
@@ -0,0 +1,645 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build mips64 mips64le
+
+//
+// System calls and other sys.stuff for mips64, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define AT_FDCWD -100
+
+#define SYS_exit 5058
+#define SYS_read 5000
+#define SYS_write 5001
+#define SYS_close 5003
+#define SYS_getpid 5038
+#define SYS_kill 5060
+#define SYS_fcntl 5070
+#define SYS_mmap 5009
+#define SYS_munmap 5011
+#define SYS_setitimer 5036
+#define SYS_clone 5055
+#define SYS_nanosleep 5034
+#define SYS_sched_yield 5023
+#define SYS_rt_sigreturn 5211
+#define SYS_rt_sigaction 5013
+#define SYS_rt_sigprocmask 5014
+#define SYS_sigaltstack 5129
+#define SYS_madvise 5027
+#define SYS_mincore 5026
+#define SYS_gettid 5178
+#define SYS_futex 5194
+#define SYS_sched_getaffinity 5196
+#define SYS_exit_group 5205
+#define SYS_epoll_create 5207
+#define SYS_epoll_ctl 5208
+#define SYS_tgkill 5225
+#define SYS_openat 5247
+#define SYS_epoll_pwait 5272
+#define SYS_clock_gettime 5222
+#define SYS_epoll_create1 5285
+#define SYS_brk 5012
+#define SYS_pipe2 5287
+
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW code+0(FP), R4
+ MOVV $SYS_exit_group, R2
+ SYSCALL
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-8
+ MOVV wait+0(FP), R1
+ // We're done using the stack.
+ MOVW $0, R2
+ SYNC
+ MOVW R2, (R1)
+ SYNC
+ MOVW $0, R4 // exit code
+ MOVV $SYS_exit, R2
+ SYSCALL
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0-20
+ // This uses openat instead of open, because Android O blocks open.
+ MOVW $AT_FDCWD, R4 // AT_FDCWD, so this acts like open
+ MOVV name+0(FP), R5
+ MOVW mode+8(FP), R6
+ MOVW perm+12(FP), R7
+ MOVV $SYS_openat, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW fd+0(FP), R4
+ MOVV $SYS_close, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+8(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0-28
+ MOVV fd+0(FP), R4
+ MOVV p+8(FP), R5
+ MOVW n+16(FP), R6
+ MOVV $SYS_write, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW fd+0(FP), R4
+ MOVV p+8(FP), R5
+ MOVW n+16(FP), R6
+ MOVV $SYS_read, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
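Unlike amd64 and arm64, the MIPS kernel ABI signals failure through register R7 and leaves a positive errno in R2, so these wrappers negate it (the BEQ R7 / SUBVU pattern) to keep the negative-errno convention their Go callers expect. A small Go illustration of that conversion; the helper name is made up for the example:

	package main

	import "fmt"

	// mipsSysRet converts the (value, error-flag) pair the MIPS kernel returns
	// into the single negative-errno convention used elsewhere in the runtime.
	func mipsSysRet(r2 int64, r7Error bool) int64 {
		if r7Error {
			return -r2 // r2 holds a positive errno
		}
		return r2
	}

	func main() {
		fmt.Println(mipsSysRet(42, false)) // 42 bytes written
		fmt.Println(mipsSysRet(9, true))   // -9 (EBADF)
	}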
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT|NOFRAME,$0-12
+ MOVV $r+0(FP), R4
+ MOVV R0, R5
+ MOVV $SYS_pipe2, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ MOVV $r+8(FP), R4
+ MOVW flags+0(FP), R5
+ MOVV $SYS_pipe2, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, errno+16(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16-4
+ MOVWU usec+0(FP), R3
+ MOVV R3, R5
+ MOVW $1000000, R4
+ DIVVU R4, R3
+ MOVV LO, R3
+ MOVV R3, 8(R29)
+ MOVW $1000, R4
+ MULVU R3, R4
+ MOVV LO, R4
+ SUBVU R4, R5
+ MOVV R5, 16(R29)
+
+ // nanosleep(&ts, 0)
+ ADDV $8, R29, R4
+ MOVW $0, R5
+ MOVV $SYS_nanosleep, R2
+ SYSCALL
+ RET
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVV $SYS_gettid, R2
+ SYSCALL
+ MOVW R2, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT|NOFRAME,$0
+ MOVV $SYS_getpid, R2
+ SYSCALL
+ MOVW R2, R16
+ MOVV $SYS_gettid, R2
+ SYSCALL
+ MOVW R2, R5 // arg 2 tid
+ MOVW R16, R4 // arg 1 pid
+ MOVW sig+0(FP), R6 // arg 3
+ MOVV $SYS_tgkill, R2
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ MOVV $SYS_getpid, R2
+ SYSCALL
+ MOVW R2, R4 // arg 1 pid
+ MOVW sig+0(FP), R5 // arg 2
+ MOVV $SYS_kill, R2
+ SYSCALL
+ RET
+
+TEXT ·getpid(SB),NOSPLIT|NOFRAME,$0-8
+ MOVV $SYS_getpid, R2
+ SYSCALL
+ MOVV R2, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT|NOFRAME,$0-24
+ MOVV tgid+0(FP), R4
+ MOVV tid+8(FP), R5
+ MOVV sig+16(FP), R6
+ MOVV $SYS_tgkill, R2
+ SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW mode+0(FP), R4
+ MOVV new+8(FP), R5
+ MOVV old+16(FP), R6
+ MOVV $SYS_setitimer, R2
+ SYSCALL
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT|NOFRAME,$0-28
+ MOVV addr+0(FP), R4
+ MOVV n+8(FP), R5
+ MOVV dst+16(FP), R6
+ MOVV $SYS_mincore, R2
+ SYSCALL
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB),NOSPLIT,$16-12
+ MOVV R29, R16 // R16 is unchanged by C code
+ MOVV R29, R1
+
+ MOVV g_m(g), R17 // R17 = m
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVV m_vdsoPC(R17), R2
+ MOVV m_vdsoSP(R17), R3
+ MOVV R2, 8(R29)
+ MOVV R3, 16(R29)
+
+ MOVV R31, m_vdsoPC(R17)
+ MOVV R29, m_vdsoSP(R17)
+
+ MOVV m_curg(R17), R4
+ MOVV g, R5
+ BNE R4, R5, noswitch
+
+ MOVV m_g0(R17), R4
+ MOVV (g_sched+gobuf_sp)(R4), R1 // Set SP to g0 stack
+
+noswitch:
+ SUBV $16, R1
+ AND $~15, R1 // Align for C code
+ MOVV R1, R29
+
+ MOVW $0, R4 // CLOCK_REALTIME
+ MOVV $0(R29), R5
+
+ MOVV runtime·vdsoClockgettimeSym(SB), R25
+ BEQ R25, fallback
+
+ JAL (R25)
+ // check on vdso call return for kernel compatibility
+ // see https://golang.org/issues/39046
+ // if we get any error make fallback permanent.
+ BEQ R2, R0, finish
+ MOVV R0, runtime·vdsoClockgettimeSym(SB)
+ MOVW $0, R4 // CLOCK_REALTIME
+ MOVV $0(R29), R5
+ JMP fallback
+
+finish:
+ MOVV 0(R29), R3 // sec
+ MOVV 8(R29), R5 // nsec
+
+ MOVV R16, R29 // restore SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVV 16(R29), R1
+ MOVV R1, m_vdsoSP(R17)
+ MOVV 8(R29), R1
+ MOVV R1, m_vdsoPC(R17)
+
+ MOVV R3, sec+0(FP)
+ MOVW R5, nsec+8(FP)
+ RET
+
+fallback:
+ MOVV $SYS_clock_gettime, R2
+ SYSCALL
+ JMP finish
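The check after JAL (R25) above implements a permanent fallback: if the vDSO clock_gettime ever returns nonzero on these kernels (see golang.org/issues/39046), the symbol is cleared so every later call goes straight to the real syscall. Sketched in Go with stand-in names, not the runtime's actual declarations:

	package main

	import "fmt"

	// Stand-ins for the vDSO entry point and the raw syscall path; illustrative only.
	var vdsoClockGettime func(clock int32, sec, nsec *int64) int32

	func rawClockGettimeSyscall(clock int32, sec, nsec *int64) {
		*sec, *nsec = 1, 2 // pretend syscall result
	}

	// clockGettime mirrors the control flow: try the vDSO, and on any error
	// clear the symbol so the fallback becomes permanent.
	func clockGettime(clock int32) (sec, nsec int64) {
		if fn := vdsoClockGettime; fn != nil {
			if fn(clock, &sec, &nsec) == 0 {
				return sec, nsec
			}
			vdsoClockGettime = nil // never try the vDSO again
		}
		rawClockGettimeSyscall(clock, &sec, &nsec)
		return sec, nsec
	}

	func main() {
		fmt.Println(clockGettime(0))
	}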
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$16-8
+ MOVV R29, R16 // R16 is unchanged by C code
+ MOVV R29, R1
+
+ MOVV g_m(g), R17 // R17 = m
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVV m_vdsoPC(R17), R2
+ MOVV m_vdsoSP(R17), R3
+ MOVV R2, 8(R29)
+ MOVV R3, 16(R29)
+
+ MOVV R31, m_vdsoPC(R17)
+ MOVV R29, m_vdsoSP(R17)
+
+ MOVV m_curg(R17), R4
+ MOVV g, R5
+ BNE R4, R5, noswitch
+
+ MOVV m_g0(R17), R4
+ MOVV (g_sched+gobuf_sp)(R4), R1 // Set SP to g0 stack
+
+noswitch:
+ SUBV $16, R1
+ AND $~15, R1 // Align for C code
+ MOVV R1, R29
+
+ MOVW $1, R4 // CLOCK_MONOTONIC
+ MOVV $0(R29), R5
+
+ MOVV runtime·vdsoClockgettimeSym(SB), R25
+ BEQ R25, fallback
+
+ JAL (R25)
+ // see walltime1 for detail
+ BEQ R2, R0, finish
+ MOVV R0, runtime·vdsoClockgettimeSym(SB)
+ MOVW $1, R4 // CLOCK_MONOTONIC
+ MOVV $0(R29), R5
+ JMP fallback
+
+finish:
+ MOVV 0(R29), R3 // sec
+ MOVV 8(R29), R5 // nsec
+
+ MOVV R16, R29 // restore SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVV 16(R29), R1
+ MOVV R1, m_vdsoSP(R17)
+ MOVV 8(R29), R1
+ MOVV R1, m_vdsoPC(R17)
+
+ // sec is in R3, nsec in R5
+ // return nsec in R3
+ MOVV $1000000000, R4
+ MULVU R4, R3
+ MOVV LO, R3
+ ADDVU R5, R3
+ MOVV R3, ret+0(FP)
+ RET
+
+fallback:
+ MOVV $SYS_clock_gettime, R2
+ SYSCALL
+ JMP finish
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW how+0(FP), R4
+ MOVV new+8(FP), R5
+ MOVV old+16(FP), R6
+ MOVW size+24(FP), R7
+ MOVV $SYS_rt_sigprocmask, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVV R0, 0xf1(R0) // crash
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT|NOFRAME,$0-36
+ MOVV sig+0(FP), R4
+ MOVV new+8(FP), R5
+ MOVV old+16(FP), R6
+ MOVV size+24(FP), R7
+ MOVV $SYS_rt_sigaction, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+32(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R4
+ MOVV info+16(FP), R5
+ MOVV ctx+24(FP), R6
+ MOVV fn+0(FP), R25
+ JAL (R25)
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$64
+ // initialize REGSB = PC&0xffffffff00000000
+ BGEZAL R0, 1(PC)
+ SRLV $32, R31, RSB
+ SLLV $32, RSB
+
+ // this might be called in external code context,
+ // where g is not set.
+ MOVB runtime·iscgo(SB), R1
+ BEQ R1, 2(PC)
+ JAL runtime·load_g(SB)
+
+ MOVW R4, 8(R29)
+ MOVV R5, 16(R29)
+ MOVV R6, 24(R29)
+ MOVV $runtime·sigtrampgo(SB), R1
+ JAL (R1)
+ RET
+
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ JMP runtime·sigtramp(SB)
+
+TEXT runtime·mmap(SB),NOSPLIT|NOFRAME,$0
+ MOVV addr+0(FP), R4
+ MOVV n+8(FP), R5
+ MOVW prot+16(FP), R6
+ MOVW flags+20(FP), R7
+ MOVW fd+24(FP), R8
+ MOVW off+28(FP), R9
+
+ MOVV $SYS_mmap, R2
+ SYSCALL
+ BEQ R7, ok
+ MOVV $0, p+32(FP)
+ MOVV R2, err+40(FP)
+ RET
+ok:
+ MOVV R2, p+32(FP)
+ MOVV $0, err+40(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT|NOFRAME,$0
+ MOVV addr+0(FP), R4
+ MOVV n+8(FP), R5
+ MOVV $SYS_munmap, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVV R0, 0xf3(R0) // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT|NOFRAME,$0
+ MOVV addr+0(FP), R4
+ MOVV n+8(FP), R5
+ MOVW flags+16(FP), R6
+ MOVV $SYS_madvise, R2
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+// int64 futex(int32 *uaddr, int32 op, int32 val,
+// struct timespec *timeout, int32 *uaddr2, int32 val3);
+TEXT runtime·futex(SB),NOSPLIT|NOFRAME,$0
+ MOVV addr+0(FP), R4
+ MOVW op+8(FP), R5
+ MOVW val+12(FP), R6
+ MOVV ts+16(FP), R7
+ MOVV addr2+24(FP), R8
+ MOVW val3+32(FP), R9
+ MOVV $SYS_futex, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+40(FP)
+ RET
+
+// int64 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), R4
+ MOVV stk+8(FP), R5
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ // Careful: Linux system call clobbers ???.
+ MOVV mp+16(FP), R16
+ MOVV gp+24(FP), R17
+ MOVV fn+32(FP), R18
+
+ MOVV R16, -8(R5)
+ MOVV R17, -16(R5)
+ MOVV R18, -24(R5)
+ MOVV $1234, R16
+ MOVV R16, -32(R5)
+
+ MOVV $SYS_clone, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+
+ // In parent, return.
+ BEQ R2, 3(PC)
+ MOVW R2, ret+40(FP)
+ RET
+
+ // In child, on new stack.
+ MOVV -32(R29), R16
+ MOVV $1234, R1
+ BEQ R16, R1, 2(PC)
+ MOVV R0, 0(R0)
+
+ // Initialize m->procid to Linux tid
+ MOVV $SYS_gettid, R2
+ SYSCALL
+
+ MOVV -24(R29), R18 // fn
+ MOVV -16(R29), R17 // g
+ MOVV -8(R29), R16 // m
+
+ BEQ R16, nog
+ BEQ R17, nog
+
+ MOVV R2, m_procid(R16)
+
+ // TODO: setup TLS.
+
+ // In child, set up new stack
+ MOVV R16, g_m(R17)
+ MOVV R17, g
+ //CALL runtime·stackcheck(SB)
+
+nog:
+ // Call fn
+ JAL (R18)
+
+ // It shouldn't return. If it does, exit that thread.
+ MOVW $111, R4
+ MOVV $SYS_exit, R2
+ SYSCALL
+ JMP -3(PC) // keep exiting
+
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVV new+0(FP), R4
+ MOVV old+8(FP), R5
+ MOVV $SYS_sigaltstack, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVV R0, 0xf1(R0) // crash
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ MOVV $SYS_sched_yield, R2
+ SYSCALL
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT|NOFRAME,$0
+ MOVV pid+0(FP), R4
+ MOVV len+8(FP), R5
+ MOVV buf+16(FP), R6
+ MOVV $SYS_sched_getaffinity, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+// int32 runtime·epollcreate(int32 size);
+TEXT runtime·epollcreate(SB),NOSPLIT|NOFRAME,$0
+ MOVW size+0(FP), R4
+ MOVV $SYS_epoll_create, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+8(FP)
+ RET
+
+// int32 runtime·epollcreate1(int32 flags);
+TEXT runtime·epollcreate1(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), R4
+ MOVV $SYS_epoll_create1, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+8(FP)
+ RET
+
+// func epollctl(epfd, op, fd int32, ev *epollEvent) int
+TEXT runtime·epollctl(SB),NOSPLIT|NOFRAME,$0
+ MOVW epfd+0(FP), R4
+ MOVW op+4(FP), R5
+ MOVW fd+8(FP), R6
+ MOVV ev+16(FP), R7
+ MOVV $SYS_epoll_ctl, R2
+ SYSCALL
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+// int32 runtime·epollwait(int32 epfd, EpollEvent *ev, int32 nev, int32 timeout);
+TEXT runtime·epollwait(SB),NOSPLIT|NOFRAME,$0
+ // This uses pwait instead of wait, because Android O blocks wait.
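+ // The fifth argument (R8) is the sigmask; passing NULL makes
+ // epoll_pwait behave exactly like plain epoll_wait.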
+ MOVW epfd+0(FP), R4
+ MOVV ev+8(FP), R5
+ MOVW nev+16(FP), R6
+ MOVW timeout+20(FP), R7
+ MOVV $0, R8
+ MOVV $SYS_epoll_pwait, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd);
+TEXT runtime·closeonexec(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R4 // fd
+ MOVV $2, R5 // F_SETFD
+ MOVV $1, R6 // FD_CLOEXEC
+ MOVV $SYS_fcntl, R2
+ SYSCALL
+ RET
+
+// func runtime·setNonblock(int32 fd)
+TEXT runtime·setNonblock(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW fd+0(FP), R4 // fd
+ MOVV $3, R5 // F_GETFL
+ MOVV $0, R6
+ MOVV $SYS_fcntl, R2
+ SYSCALL
+ MOVW $0x80, R6 // O_NONBLOCK
+ OR R2, R6
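+ // R2 holds the flags returned by F_GETFL; OR them into R6, which is
+ // then passed as the third fcntl argument for F_SETFL below.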
+ MOVW fd+0(FP), R4 // fd
+ MOVV $4, R5 // F_SETFL
+ MOVV $SYS_fcntl, R2
+ SYSCALL
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT|NOFRAME,$0-8
+ // Implemented as brk(NULL).
+ MOVV $0, R4
+ MOVV $SYS_brk, R2
+ SYSCALL
+ MOVV R2, ret+0(FP)
+ RET
+
+TEXT runtime·access(SB),$0-20
+ MOVV R0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+16(FP) // for vet
+ RET
+
+TEXT runtime·connect(SB),$0-28
+ MOVV R0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+24(FP) // for vet
+ RET
+
+TEXT runtime·socket(SB),$0-20
+ MOVV R0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+16(FP) // for vet
+ RET
diff --git a/src/runtime/sys_linux_mipsx.s b/src/runtime/sys_linux_mipsx.s
new file mode 100644
index 0000000..fab2ab3
--- /dev/null
+++ b/src/runtime/sys_linux_mipsx.s
@@ -0,0 +1,571 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build mips mipsle
+
+//
+// System calls and other sys.stuff for mips, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define SYS_exit 4001
+#define SYS_read 4003
+#define SYS_write 4004
+#define SYS_open 4005
+#define SYS_close 4006
+#define SYS_getpid 4020
+#define SYS_kill 4037
+#define SYS_pipe 4042
+#define SYS_brk 4045
+#define SYS_fcntl 4055
+#define SYS_mmap 4090
+#define SYS_munmap 4091
+#define SYS_setitimer 4104
+#define SYS_clone 4120
+#define SYS_sched_yield 4162
+#define SYS_nanosleep 4166
+#define SYS_rt_sigreturn 4193
+#define SYS_rt_sigaction 4194
+#define SYS_rt_sigprocmask 4195
+#define SYS_sigaltstack 4206
+#define SYS_madvise 4218
+#define SYS_mincore 4217
+#define SYS_gettid 4222
+#define SYS_futex 4238
+#define SYS_sched_getaffinity 4240
+#define SYS_exit_group 4246
+#define SYS_epoll_create 4248
+#define SYS_epoll_ctl 4249
+#define SYS_epoll_wait 4250
+#define SYS_clock_gettime 4263
+#define SYS_tgkill 4266
+#define SYS_epoll_create1 4326
+#define SYS_pipe2 4328
+
+TEXT runtime·exit(SB),NOSPLIT,$0-4
+ MOVW code+0(FP), R4
+ MOVW $SYS_exit_group, R2
+ SYSCALL
+ UNDEF
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-4
+ MOVW wait+0(FP), R1
+ // We're done using the stack.
+ MOVW $0, R2
+ SYNC
+ MOVW R2, (R1)
+ SYNC
+ MOVW $0, R4 // exit code
+ MOVW $SYS_exit, R2
+ SYSCALL
+ UNDEF
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$0-16
+ MOVW name+0(FP), R4
+ MOVW mode+4(FP), R5
+ MOVW perm+8(FP), R6
+ MOVW $SYS_open, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$0-8
+ MOVW fd+0(FP), R4
+ MOVW $SYS_close, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+4(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$0-16
+ MOVW fd+0(FP), R4
+ MOVW p+4(FP), R5
+ MOVW n+8(FP), R6
+ MOVW $SYS_write, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+12(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$0-16
+ MOVW fd+0(FP), R4
+ MOVW p+4(FP), R5
+ MOVW n+8(FP), R6
+ MOVW $SYS_read, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+12(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT,$0-12
+ MOVW $SYS_pipe, R2
+ SYSCALL
+ BEQ R7, pipeok
+ MOVW $-1, R1
+ MOVW R1, r+0(FP)
+ MOVW R1, w+4(FP)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, errno+8(FP)
+ RET
+pipeok:
+ MOVW R2, r+0(FP)
+ MOVW R3, w+4(FP)
+ MOVW R0, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-16
+ MOVW $r+4(FP), R4
+ MOVW flags+0(FP), R5
+ MOVW $SYS_pipe2, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, errno+12(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$28-4
+ MOVW usec+0(FP), R3
+ MOVW R3, R5
+ MOVW $1000000, R4
+ DIVU R4, R3
+ MOVW LO, R3
+ MOVW R3, 24(R29)
+ MOVW $1000, R4
+ MULU R3, R4
+ MOVW LO, R4
+ SUBU R4, R5
+ MOVW R5, 28(R29)
+
+ // nanosleep(&ts, 0)
+ ADDU $24, R29, R4
+ MOVW $0, R5
+ MOVW $SYS_nanosleep, R2
+ SYSCALL
+ RET
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVW $SYS_gettid, R2
+ SYSCALL
+ MOVW R2, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT,$0-4
+ MOVW $SYS_getpid, R2
+ SYSCALL
+ MOVW R2, R16
+ MOVW $SYS_gettid, R2
+ SYSCALL
+ MOVW R2, R5 // arg 2 tid
+ MOVW R16, R4 // arg 1 pid
+ MOVW sig+0(FP), R6 // arg 3
+ MOVW $SYS_tgkill, R2
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$0
+ MOVW $SYS_getpid, R2
+ SYSCALL
+ MOVW R2, R4 // arg 1 pid
+ MOVW sig+0(FP), R5 // arg 2
+ MOVW $SYS_kill, R2
+ SYSCALL
+ RET
+
+TEXT ·getpid(SB),NOSPLIT,$0-4
+ MOVW $SYS_getpid, R2
+ SYSCALL
+ MOVW R2, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT,$0-12
+ MOVW tgid+0(FP), R4
+ MOVW tid+4(FP), R5
+ MOVW sig+8(FP), R6
+ MOVW $SYS_tgkill, R2
+ SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$0-12
+ MOVW mode+0(FP), R4
+ MOVW new+4(FP), R5
+ MOVW old+8(FP), R6
+ MOVW $SYS_setitimer, R2
+ SYSCALL
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT,$0-16
+ MOVW addr+0(FP), R4
+ MOVW n+4(FP), R5
+ MOVW dst+8(FP), R6
+ MOVW $SYS_mincore, R2
+ SYSCALL
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+12(FP)
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB),NOSPLIT,$8-12
+ MOVW $0, R4 // CLOCK_REALTIME
+ MOVW $4(R29), R5
+ MOVW $SYS_clock_gettime, R2
+ SYSCALL
+ MOVW 4(R29), R3 // sec
+ MOVW 8(R29), R5 // nsec
+ MOVW $sec+0(FP), R6
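+ // sec is a 64-bit result built from the 32-bit value in R3: the zero
+ // high word goes at offset 0 on big-endian mips and at offset 4 on
+ // little-endian mipsle.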
+#ifdef GOARCH_mips
+ MOVW R3, 4(R6)
+ MOVW R0, 0(R6)
+#else
+ MOVW R3, 0(R6)
+ MOVW R0, 4(R6)
+#endif
+ MOVW R5, nsec+8(FP)
+ RET
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$8-8
+ MOVW $1, R4 // CLOCK_MONOTONIC
+ MOVW $4(R29), R5
+ MOVW $SYS_clock_gettime, R2
+ SYSCALL
+ MOVW 4(R29), R3 // sec
+ MOVW 8(R29), R5 // nsec
+ // sec is in R3, nsec in R5
+ // return nsec in R3
+ MOVW $1000000000, R4
+ MULU R4, R3
+ MOVW LO, R3
+ ADDU R5, R3
+ SGTU R5, R3, R4
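+ // R4 is the carry out of the 32-bit add above (1 if R3 wrapped past
+ // R5); it is added into the high word (HI) below.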
+ MOVW $ret+0(FP), R6
+#ifdef GOARCH_mips
+ MOVW R3, 4(R6)
+#else
+ MOVW R3, 0(R6)
+#endif
+ MOVW HI, R3
+ ADDU R4, R3
+#ifdef GOARCH_mips
+ MOVW R3, 0(R6)
+#else
+ MOVW R3, 4(R6)
+#endif
+ RET
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT,$0-16
+ MOVW how+0(FP), R4
+ MOVW new+4(FP), R5
+ MOVW old+8(FP), R6
+ MOVW size+12(FP), R7
+ MOVW $SYS_rt_sigprocmask, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ UNDEF // crash
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT,$0-20
+ MOVW sig+0(FP), R4
+ MOVW new+4(FP), R5
+ MOVW old+8(FP), R6
+ MOVW size+12(FP), R7
+ MOVW $SYS_rt_sigaction, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+16(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-16
+ MOVW sig+4(FP), R4
+ MOVW info+8(FP), R5
+ MOVW ctx+12(FP), R6
+ MOVW fn+0(FP), R25
+ MOVW R29, R22
+ SUBU $16, R29
+ AND $~7, R29 // shadow space for 4 args aligned to 8 bytes as per O32 ABI
+ JAL (R25)
+ MOVW R22, R29
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$12
+ // this might be called in external code context,
+ // where g is not set.
+ MOVB runtime·iscgo(SB), R1
+ BEQ R1, 2(PC)
+ JAL runtime·load_g(SB)
+
+ MOVW R4, 4(R29)
+ MOVW R5, 8(R29)
+ MOVW R6, 12(R29)
+ MOVW $runtime·sigtrampgo(SB), R1
+ JAL (R1)
+ RET
+
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ JMP runtime·sigtramp(SB)
+
+TEXT runtime·mmap(SB),NOSPLIT,$20-32
+ MOVW addr+0(FP), R4
+ MOVW n+4(FP), R5
+ MOVW prot+8(FP), R6
+ MOVW flags+12(FP), R7
+ MOVW fd+16(FP), R8
+ MOVW off+20(FP), R9
+ MOVW R8, 16(R29)
+ MOVW R9, 20(R29)
+
+ MOVW $SYS_mmap, R2
+ SYSCALL
+ BEQ R7, ok
+ MOVW $0, p+24(FP)
+ MOVW R2, err+28(FP)
+ RET
+ok:
+ MOVW R2, p+24(FP)
+ MOVW $0, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0-8
+ MOVW addr+0(FP), R4
+ MOVW n+4(FP), R5
+ MOVW $SYS_munmap, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ UNDEF // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0-16
+ MOVW addr+0(FP), R4
+ MOVW n+4(FP), R5
+ MOVW flags+8(FP), R6
+ MOVW $SYS_madvise, R2
+ SYSCALL
+ MOVW R2, ret+12(FP)
+ RET
+
+// int32 futex(int32 *uaddr, int32 op, int32 val, struct timespec *timeout, int32 *uaddr2, int32 val3);
+TEXT runtime·futex(SB),NOSPLIT,$20-28
+ MOVW addr+0(FP), R4
+ MOVW op+4(FP), R5
+ MOVW val+8(FP), R6
+ MOVW ts+12(FP), R7
+
+ MOVW addr2+16(FP), R8
+ MOVW val3+20(FP), R9
+
+ MOVW R8, 16(R29)
+ MOVW R9, 20(R29)
+
+ MOVW $SYS_futex, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+// int32 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW flags+0(FP), R4
+ MOVW stk+4(FP), R5
+ MOVW R0, R6 // ptid
+ MOVW R0, R7 // tls
+
+ // O32 syscall handler unconditionally copies arguments 5-8 from stack,
+ // even for syscalls with less than 8 arguments. Reserve 32 bytes of new
+ // stack so that any syscall invoked immediately in the new thread won't fail.
+ ADD $-32, R5
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ MOVW mp+8(FP), R16
+ MOVW gp+12(FP), R17
+ MOVW fn+16(FP), R18
+
+ MOVW $1234, R1
+
+ MOVW R16, 0(R5)
+ MOVW R17, 4(R5)
+ MOVW R18, 8(R5)
+
+ MOVW R1, 12(R5)
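+ // 1234 is an arbitrary sentinel; the child re-reads it below to verify
+ // that it really is running on the new stack we just set up.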
+
+ MOVW $SYS_clone, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+
+ // In parent, return.
+ BEQ R2, 3(PC)
+ MOVW R2, ret+20(FP)
+ RET
+
+ // In child, on new stack.
+ // Check that SP is as we expect
+ NOP R29 // tell vet R29/SP changed - stop checking offsets
+ MOVW 12(R29), R16
+ MOVW $1234, R1
+ BEQ R16, R1, 2(PC)
+ MOVW (R0), R0
+
+ // Initialize m->procid to Linux tid
+ MOVW $SYS_gettid, R2
+ SYSCALL
+
+ MOVW 0(R29), R16 // m
+ MOVW 4(R29), R17 // g
+ MOVW 8(R29), R18 // fn
+
+ BEQ R16, nog
+ BEQ R17, nog
+
+ MOVW R2, m_procid(R16)
+
+ // In child, set up new stack
+ MOVW R16, g_m(R17)
+ MOVW R17, g
+
+// TODO(mips32): doesn't have runtime·stackcheck(SB)
+
+nog:
+ // Call fn
+ ADDU $32, R29
+ JAL (R18)
+
+ // It shouldn't return. If it does, exit that thread.
+ ADDU $-32, R29
+ MOVW $0xf4, R4
+ MOVW $SYS_exit, R2
+ SYSCALL
+ UNDEF
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$0
+ MOVW new+0(FP), R4
+ MOVW old+4(FP), R5
+ MOVW $SYS_sigaltstack, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ UNDEF // crash
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ MOVW $SYS_sched_yield, R2
+ SYSCALL
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT,$0-16
+ MOVW pid+0(FP), R4
+ MOVW len+4(FP), R5
+ MOVW buf+8(FP), R6
+ MOVW $SYS_sched_getaffinity, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+12(FP)
+ RET
+
+// int32 runtime·epollcreate(int32 size);
+TEXT runtime·epollcreate(SB),NOSPLIT,$0-8
+ MOVW size+0(FP), R4
+ MOVW $SYS_epoll_create, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+4(FP)
+ RET
+
+// int32 runtime·epollcreate1(int32 flags);
+TEXT runtime·epollcreate1(SB),NOSPLIT,$0-8
+ MOVW flags+0(FP), R4
+ MOVW $SYS_epoll_create1, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+4(FP)
+ RET
+
+// func epollctl(epfd, op, fd int32, ev *epollEvent) int
+TEXT runtime·epollctl(SB),NOSPLIT,$0-20
+ MOVW epfd+0(FP), R4
+ MOVW op+4(FP), R5
+ MOVW fd+8(FP), R6
+ MOVW ev+12(FP), R7
+ MOVW $SYS_epoll_ctl, R2
+ SYSCALL
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+16(FP)
+ RET
+
+// int32 runtime·epollwait(int32 epfd, EpollEvent *ev, int32 nev, int32 timeout);
+TEXT runtime·epollwait(SB),NOSPLIT,$0-20
+ MOVW epfd+0(FP), R4
+ MOVW ev+4(FP), R5
+ MOVW nev+8(FP), R6
+ MOVW timeout+12(FP), R7
+ MOVW $SYS_epoll_wait, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+16(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd);
+TEXT runtime·closeonexec(SB),NOSPLIT,$0-4
+ MOVW fd+0(FP), R4 // fd
+ MOVW $2, R5 // F_SETFD
+ MOVW $1, R6 // FD_CLOEXEC
+ MOVW $SYS_fcntl, R2
+ SYSCALL
+ RET
+
+// func runtime·setNonblock(int32 fd)
+TEXT runtime·setNonblock(SB),NOSPLIT,$0-4
+ MOVW fd+0(FP), R4 // fd
+ MOVW $3, R5 // F_GETFL
+ MOVW $0, R6
+ MOVW $SYS_fcntl, R2
+ SYSCALL
+ MOVW $0x80, R6 // O_NONBLOCK
+ OR R2, R6
+ MOVW fd+0(FP), R4 // fd
+ MOVW $4, R5 // F_SETFL
+ MOVW $SYS_fcntl, R2
+ SYSCALL
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT,$0-4
+ // Implemented as brk(NULL).
+ MOVW $0, R4
+ MOVW $SYS_brk, R2
+ SYSCALL
+ MOVW R2, ret+0(FP)
+ RET
+
+TEXT runtime·access(SB),$0-12
+ BREAK // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+8(FP) // for vet
+ RET
+
+TEXT runtime·connect(SB),$0-16
+ BREAK // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+12(FP) // for vet
+ RET
+
+TEXT runtime·socket(SB),$0-16
+ BREAK // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+12(FP) // for vet
+ RET
diff --git a/src/runtime/sys_linux_ppc64x.s b/src/runtime/sys_linux_ppc64x.s
new file mode 100644
index 0000000..7be8c4c
--- /dev/null
+++ b/src/runtime/sys_linux_ppc64x.s
@@ -0,0 +1,769 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build ppc64 ppc64le
+
+//
+// System calls and other sys.stuff for ppc64, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "asm_ppc64x.h"
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_pipe 42
+#define SYS_brk 45
+#define SYS_fcntl 55
+#define SYS_mmap 90
+#define SYS_munmap 91
+#define SYS_setitimer 104
+#define SYS_clone 120
+#define SYS_sched_yield 158
+#define SYS_nanosleep 162
+#define SYS_rt_sigreturn 172
+#define SYS_rt_sigaction 173
+#define SYS_rt_sigprocmask 174
+#define SYS_sigaltstack 185
+#define SYS_madvise 205
+#define SYS_mincore 206
+#define SYS_gettid 207
+#define SYS_futex 221
+#define SYS_sched_getaffinity 223
+#define SYS_exit_group 234
+#define SYS_epoll_create 236
+#define SYS_epoll_ctl 237
+#define SYS_epoll_wait 238
+#define SYS_clock_gettime 246
+#define SYS_tgkill 250
+#define SYS_epoll_create1 315
+#define SYS_pipe2 317
+
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW code+0(FP), R3
+ SYSCALL $SYS_exit_group
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-8
+ MOVD wait+0(FP), R1
+ // We're done using the stack.
+ MOVW $0, R2
+ SYNC
+ MOVW R2, (R1)
+ MOVW $0, R3 // exit code
+ SYSCALL $SYS_exit
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0-20
+ MOVD name+0(FP), R3
+ MOVW mode+8(FP), R4
+ MOVW perm+12(FP), R5
+ SYSCALL $SYS_open
+ BVC 2(PC)
+ MOVW $-1, R3
+ MOVW R3, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW fd+0(FP), R3
+ SYSCALL $SYS_close
+ BVC 2(PC)
+ MOVW $-1, R3
+ MOVW R3, ret+8(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0-28
+ MOVD fd+0(FP), R3
+ MOVD p+8(FP), R4
+ MOVW n+16(FP), R5
+ SYSCALL $SYS_write
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+24(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW fd+0(FP), R3
+ MOVD p+8(FP), R4
+ MOVW n+16(FP), R5
+ SYSCALL $SYS_read
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+24(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT|NOFRAME,$0-12
+ ADD $FIXED_FRAME, R1, R3
+ SYSCALL $SYS_pipe
+ MOVW R3, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ ADD $FIXED_FRAME+8, R1, R3
+ MOVW flags+0(FP), R4
+ SYSCALL $SYS_pipe2
+ MOVW R3, errno+16(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16-4
+ MOVW usec+0(FP), R3
+ MOVD R3, R5
+ MOVW $1000000, R4
+ DIVD R4, R3
+ MOVD R3, 8(R1)
+ MOVW $1000, R4
+ MULLD R3, R4
+ SUB R4, R5
+ MOVD R5, 16(R1)
+
+ // nanosleep(&ts, 0)
+ ADD $8, R1, R3
+ MOVW $0, R4
+ SYSCALL $SYS_nanosleep
+ RET
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ SYSCALL $SYS_gettid
+ MOVW R3, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT|NOFRAME,$0
+ SYSCALL $SYS_getpid
+ MOVW R3, R14
+ SYSCALL $SYS_gettid
+ MOVW R3, R4 // arg 2 tid
+ MOVW R14, R3 // arg 1 pid
+ MOVW sig+0(FP), R5 // arg 3
+ SYSCALL $SYS_tgkill
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ SYSCALL $SYS_getpid
+ MOVW R3, R3 // arg 1 pid
+ MOVW sig+0(FP), R4 // arg 2
+ SYSCALL $SYS_kill
+ RET
+
+TEXT ·getpid(SB),NOSPLIT|NOFRAME,$0-8
+ SYSCALL $SYS_getpid
+ MOVD R3, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT|NOFRAME,$0-24
+ MOVD tgid+0(FP), R3
+ MOVD tid+8(FP), R4
+ MOVD sig+16(FP), R5
+ SYSCALL $SYS_tgkill
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW mode+0(FP), R3
+ MOVD new+8(FP), R4
+ MOVD old+16(FP), R5
+ SYSCALL $SYS_setitimer
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT|NOFRAME,$0-28
+ MOVD addr+0(FP), R3
+ MOVD n+8(FP), R4
+ MOVD dst+16(FP), R5
+ SYSCALL $SYS_mincore
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+24(FP)
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB),NOSPLIT,$16-12
+ MOVD R1, R15 // R15 is unchanged by C code
+ MOVD g_m(g), R21 // R21 = m
+
+ MOVD $0, R3 // CLOCK_REALTIME
+
+ MOVD runtime·vdsoClockgettimeSym(SB), R12 // Check for VDSO availability
+ CMP R12, R0
+ BEQ fallback
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVD m_vdsoPC(R21), R4
+ MOVD m_vdsoSP(R21), R5
+ MOVD R4, 32(R1)
+ MOVD R5, 40(R1)
+
+ MOVD LR, R14
+ MOVD R14, m_vdsoPC(R21)
+ MOVD R15, m_vdsoSP(R21)
+
+ MOVD m_curg(R21), R6
+ CMP g, R6
+ BNE noswitch
+
+ MOVD m_g0(R21), R7
+ MOVD (g_sched+gobuf_sp)(R7), R1 // Set SP to g0 stack
+
+noswitch:
+ SUB $16, R1 // Space for results
+ RLDICR $0, R1, $59, R1 // Align for C code
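+ // (RLDICR here clears the low four bits of R1, i.e. 16-byte alignment.)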
+ MOVD R12, CTR
+ MOVD R1, R4
+
+ // Store g on gsignal's stack, so if we receive a signal
+ // during VDSO code we can find the g.
+ // If we don't have a signal stack, we won't receive a signal,
+ // so don't bother saving g.
+ // When using cgo, we already saved g on TLS, so we don't save
+ // g here either.
+ // Also don't save g if we are already on the signal stack.
+ // We won't get a nested signal.
+ MOVBZ runtime·iscgo(SB), R22
+ CMP R22, $0
+ BNE nosaveg
+ MOVD m_gsignal(R21), R22 // g.m.gsignal
+ CMP R22, $0
+ BEQ nosaveg
+
+ CMP g, R22
+ BEQ nosaveg
+ MOVD (g_stack+stack_lo)(R22), R22 // g.m.gsignal.stack.lo
+ MOVD g, (R22)
+
+ BL (CTR) // Call from VDSO
+
+ MOVD $0, (R22) // clear g slot, R22 is unchanged by C code
+
+ JMP finish
+
+nosaveg:
+ BL (CTR) // Call from VDSO
+
+finish:
+ MOVD $0, R0 // Restore R0
+ MOVD 0(R1), R3 // sec
+ MOVD 8(R1), R5 // nsec
+ MOVD R15, R1 // Restore SP
+
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVD 40(R1), R6
+ MOVD R6, m_vdsoSP(R21)
+ MOVD 32(R1), R6
+ MOVD R6, m_vdsoPC(R21)
+
+return:
+ MOVD R3, sec+0(FP)
+ MOVW R5, nsec+8(FP)
+ RET
+
+ // Syscall fallback
+fallback:
+ ADD $32, R1, R4
+ SYSCALL $SYS_clock_gettime
+ MOVD 32(R1), R3
+ MOVD 40(R1), R5
+ JMP return
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$16-8
+ MOVD $1, R3 // CLOCK_MONOTONIC
+
+ MOVD R1, R15 // R15 is unchanged by C code
+ MOVD g_m(g), R21 // R21 = m
+
+ MOVD runtime·vdsoClockgettimeSym(SB), R12 // Check for VDSO availability
+ CMP R12, R0
+ BEQ fallback
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVD m_vdsoPC(R21), R4
+ MOVD m_vdsoSP(R21), R5
+ MOVD R4, 32(R1)
+ MOVD R5, 40(R1)
+
+ MOVD LR, R14 // R14 is unchanged by C code
+ MOVD R14, m_vdsoPC(R21)
+ MOVD R15, m_vdsoSP(R21)
+
+ MOVD m_curg(R21), R6
+ CMP g, R6
+ BNE noswitch
+
+ MOVD m_g0(R21), R7
+ MOVD (g_sched+gobuf_sp)(R7), R1 // Set SP to g0 stack
+
+noswitch:
+ SUB $16, R1 // Space for results
+ RLDICR $0, R1, $59, R1 // Align for C code
+ MOVD R12, CTR
+ MOVD R1, R4
+
+ // Store g on gsignal's stack, so if we receive a signal
+ // during VDSO code we can find the g.
+ // If we don't have a signal stack, we won't receive a signal,
+ // so don't bother saving g.
+ // When using cgo, we already saved g on TLS, so we don't save
+ // g here either.
+ // Also don't save g if we are already on the signal stack.
+ // We won't get a nested signal.
+ MOVBZ runtime·iscgo(SB), R22
+ CMP R22, $0
+ BNE nosaveg
+ MOVD m_gsignal(R21), R22 // g.m.gsignal
+ CMP R22, $0
+ BEQ nosaveg
+
+ CMP g, R22
+ BEQ nosaveg
+ MOVD (g_stack+stack_lo)(R22), R22 // g.m.gsignal.stack.lo
+ MOVD g, (R22)
+
+ BL (CTR) // Call from VDSO
+
+ MOVD $0, (R22) // clear g slot, R22 is unchanged by C code
+
+ JMP finish
+
+nosaveg:
+ BL (CTR) // Call from VDSO
+
+finish:
+ MOVD $0, R0 // Restore R0
+ MOVD 0(R1), R3 // sec
+ MOVD 8(R1), R5 // nsec
+ MOVD R15, R1 // Restore SP
+
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVD 40(R1), R6
+ MOVD R6, m_vdsoSP(R21)
+ MOVD 32(R1), R6
+ MOVD R6, m_vdsoPC(R21)
+
+return:
+ // sec is in R3, nsec in R5
+ // return nsec in R3
+ MOVD $1000000000, R4
+ MULLD R4, R3
+ ADD R5, R3
+ MOVD R3, ret+0(FP)
+ RET
+
+ // Syscall fallback
+fallback:
+ ADD $32, R1, R4
+ SYSCALL $SYS_clock_gettime
+ MOVD 32(R1), R3
+ MOVD 40(R1), R5
+ JMP return
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW how+0(FP), R3
+ MOVD new+8(FP), R4
+ MOVD old+16(FP), R5
+ MOVW size+24(FP), R6
+ SYSCALL $SYS_rt_sigprocmask
+ BVC 2(PC)
+ MOVD R0, 0xf0(R0) // crash
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT|NOFRAME,$0-36
+ MOVD sig+0(FP), R3
+ MOVD new+8(FP), R4
+ MOVD old+16(FP), R5
+ MOVD size+24(FP), R6
+ SYSCALL $SYS_rt_sigaction
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+32(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R3
+ MOVD info+16(FP), R4
+ MOVD ctx+24(FP), R5
+ MOVD fn+0(FP), R12
+ MOVD R12, CTR
+ BL (CTR)
+ MOVD 24(R1), R2
+ RET
+
+TEXT runtime·sigreturn(SB),NOSPLIT,$0-0
+ RET
+
+#ifdef GOARCH_ppc64le
+// ppc64le doesn't need function descriptors
+TEXT runtime·sigtramp(SB),NOSPLIT,$64
+#else
+// function descriptor for the real sigtramp
+TEXT runtime·sigtramp(SB),NOSPLIT|NOFRAME,$0
+ DWORD $sigtramp<>(SB)
+ DWORD $0
+ DWORD $0
+TEXT sigtramp<>(SB),NOSPLIT,$64
+#endif
+ // initialize essential registers (just in case)
+ BL runtime·reginit(SB)
+
+ // this might be called in external code context,
+ // where g is not set.
+ MOVBZ runtime·iscgo(SB), R6
+ CMP R6, $0
+ BEQ 2(PC)
+ BL runtime·load_g(SB)
+
+ MOVW R3, FIXED_FRAME+0(R1)
+ MOVD R4, FIXED_FRAME+8(R1)
+ MOVD R5, FIXED_FRAME+16(R1)
+ MOVD $runtime·sigtrampgo(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+ MOVD 24(R1), R2
+ RET
+
+#ifdef GOARCH_ppc64le
+// ppc64le doesn't need function descriptors
+TEXT runtime·cgoSigtramp(SB),NOSPLIT|NOFRAME,$0
+ // The stack unwinder, presumably written in C, may not be able to
+ // handle Go frames correctly. So, this function is NOFRAME, and we
+ // save/restore LR manually.
+ MOVD LR, R10
+
+ // We're coming from C code, initialize essential registers.
+ CALL runtime·reginit(SB)
+
+ // If no traceback function, do usual sigtramp.
+ MOVD runtime·cgoTraceback(SB), R6
+ CMP $0, R6
+ BEQ sigtramp
+
+ // If no traceback support function, which means that
+ // runtime/cgo was not linked in, do usual sigtramp.
+ MOVD _cgo_callers(SB), R6
+ CMP $0, R6
+ BEQ sigtramp
+
+ // Set up g register.
+ CALL runtime·load_g(SB)
+
+ // Figure out if we are currently in a cgo call.
+ // If not, just do usual sigtramp.
+ CMP $0, g
+ BEQ sigtrampnog // g == nil
+ MOVD g_m(g), R6
+ CMP $0, R6
+ BEQ sigtramp // g.m == nil
+ MOVW m_ncgo(R6), R7
+ CMPW $0, R7
+ BEQ sigtramp // g.m.ncgo == 0
+ MOVD m_curg(R6), R7
+ CMP $0, R7
+ BEQ sigtramp // g.m.curg == nil
+ MOVD g_syscallsp(R7), R7
+ CMP $0, R7
+ BEQ sigtramp // g.m.curg.syscallsp == 0
+ MOVD m_cgoCallers(R6), R7 // R7 is the fifth arg in C calling convention.
+ CMP $0, R7
+ BEQ sigtramp // g.m.cgoCallers == nil
+ MOVW m_cgoCallersUse(R6), R8
+ CMPW $0, R8
+ BNE sigtramp // g.m.cgoCallersUse != 0
+
+ // Jump to a function in runtime/cgo.
+ // That function, written in C, will call the user's traceback
+ // function with proper unwind info, and will then call back here.
+ // The first three arguments, and the fifth, are already in registers.
+ // Set the two remaining arguments now.
+ MOVD runtime·cgoTraceback(SB), R6
+ MOVD $runtime·sigtramp(SB), R8
+ MOVD _cgo_callers(SB), R12
+ MOVD R12, CTR
+ MOVD R10, LR // restore LR
+ JMP (CTR)
+
+sigtramp:
+ MOVD R10, LR // restore LR
+ JMP runtime·sigtramp(SB)
+
+sigtrampnog:
+ // Signal arrived on a non-Go thread. If this is SIGPROF, get a
+ // stack trace.
+ CMPW R3, $27 // 27 == SIGPROF
+ BNE sigtramp
+
+ // Lock sigprofCallersUse (cas from 0 to 1).
+ MOVW $1, R7
+ MOVD $runtime·sigprofCallersUse(SB), R8
+ SYNC
+ LWAR (R8), R6
+ CMPW $0, R6
+ BNE sigtramp
+ STWCCC R7, (R8)
+ BNE -4(PC)
+ ISYNC
+
+ // Jump to the traceback function in runtime/cgo.
+ // It will call back to sigprofNonGo, which will ignore the
+ // arguments passed in registers.
+ // First three arguments to traceback function are in registers already.
+ MOVD runtime·cgoTraceback(SB), R6
+ MOVD $runtime·sigprofCallers(SB), R7
+ MOVD $runtime·sigprofNonGoWrapper<>(SB), R8
+ MOVD _cgo_callers(SB), R12
+ MOVD R12, CTR
+ MOVD R10, LR // restore LR
+ JMP (CTR)
+#else
+// function descriptor for the real cgoSigtramp
+TEXT runtime·cgoSigtramp(SB),NOSPLIT|NOFRAME,$0
+ DWORD $cgoSigtramp<>(SB)
+ DWORD $0
+ DWORD $0
+TEXT cgoSigtramp<>(SB),NOSPLIT,$0
+ JMP sigtramp<>(SB)
+#endif
+
+TEXT runtime·sigprofNonGoWrapper<>(SB),NOSPLIT,$0
+ // We're coming from C code, set up essential registers, then call sigprofNonGo.
+ CALL runtime·reginit(SB)
+ CALL runtime·sigprofNonGo(SB)
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R3
+ MOVD n+8(FP), R4
+ MOVW prot+16(FP), R5
+ MOVW flags+20(FP), R6
+ MOVW fd+24(FP), R7
+ MOVW off+28(FP), R8
+
+ SYSCALL $SYS_mmap
+ BVC ok
+ MOVD $0, p+32(FP)
+ MOVD R3, err+40(FP)
+ RET
+ok:
+ MOVD R3, p+32(FP)
+ MOVD $0, err+40(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R3
+ MOVD n+8(FP), R4
+ SYSCALL $SYS_munmap
+ BVC 2(PC)
+ MOVD R0, 0xf0(R0)
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R3
+ MOVD n+8(FP), R4
+ MOVW flags+16(FP), R5
+ SYSCALL $SYS_madvise
+ MOVW R3, ret+24(FP)
+ RET
+
+// int64 futex(int32 *uaddr, int32 op, int32 val,
+// struct timespec *timeout, int32 *uaddr2, int32 val3);
+TEXT runtime·futex(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R3
+ MOVW op+8(FP), R4
+ MOVW val+12(FP), R5
+ MOVD ts+16(FP), R6
+ MOVD addr2+24(FP), R7
+ MOVW val3+32(FP), R8
+ SYSCALL $SYS_futex
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+40(FP)
+ RET
+
+// int64 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), R3
+ MOVD stk+8(FP), R4
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ // Careful: Linux system call clobbers ???.
+ MOVD mp+16(FP), R7
+ MOVD gp+24(FP), R8
+ MOVD fn+32(FP), R12
+
+ MOVD R7, -8(R4)
+ MOVD R8, -16(R4)
+ MOVD R12, -24(R4)
+ MOVD $1234, R7
+ MOVD R7, -32(R4)
+
+ SYSCALL $SYS_clone
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+
+ // In parent, return.
+ CMP R3, $0
+ BEQ 3(PC)
+ MOVW R3, ret+40(FP)
+ RET
+
+ // In child, on new stack.
+ // initialize essential registers
+ BL runtime·reginit(SB)
+ MOVD -32(R1), R7
+ CMP R7, $1234
+ BEQ 2(PC)
+ MOVD R0, 0(R0)
+
+ // Initialize m->procid to Linux tid
+ SYSCALL $SYS_gettid
+
+ MOVD -24(R1), R12 // fn
+ MOVD -16(R1), R8 // g
+ MOVD -8(R1), R7 // m
+
+ CMP R7, $0
+ BEQ nog
+ CMP R8, $0
+ BEQ nog
+
+ MOVD R3, m_procid(R7)
+
+ // TODO: setup TLS.
+
+ // In child, set up new stack
+ MOVD R7, g_m(R8)
+ MOVD R8, g
+ //CALL runtime·stackcheck(SB)
+
+nog:
+ // Call fn
+ MOVD R12, CTR
+ BL (CTR)
+
+ // It shouldn't return. If it does, exit that thread.
+ MOVW $111, R3
+ SYSCALL $SYS_exit
+ BR -2(PC) // keep exiting
+
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVD new+0(FP), R3
+ MOVD old+8(FP), R4
+ SYSCALL $SYS_sigaltstack
+ BVC 2(PC)
+ MOVD R0, 0xf0(R0) // crash
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ SYSCALL $SYS_sched_yield
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT|NOFRAME,$0
+ MOVD pid+0(FP), R3
+ MOVD len+8(FP), R4
+ MOVD buf+16(FP), R5
+ SYSCALL $SYS_sched_getaffinity
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+24(FP)
+ RET
+
+// int32 runtime·epollcreate(int32 size);
+TEXT runtime·epollcreate(SB),NOSPLIT|NOFRAME,$0
+ MOVW size+0(FP), R3
+ SYSCALL $SYS_epoll_create
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+8(FP)
+ RET
+
+// int32 runtime·epollcreate1(int32 flags);
+TEXT runtime·epollcreate1(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), R3
+ SYSCALL $SYS_epoll_create1
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+8(FP)
+ RET
+
+// func epollctl(epfd, op, fd int32, ev *epollEvent) int
+TEXT runtime·epollctl(SB),NOSPLIT|NOFRAME,$0
+ MOVW epfd+0(FP), R3
+ MOVW op+4(FP), R4
+ MOVW fd+8(FP), R5
+ MOVD ev+16(FP), R6
+ SYSCALL $SYS_epoll_ctl
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+24(FP)
+ RET
+
+// int32 runtime·epollwait(int32 epfd, EpollEvent *ev, int32 nev, int32 timeout);
+TEXT runtime·epollwait(SB),NOSPLIT|NOFRAME,$0
+ MOVW epfd+0(FP), R3
+ MOVD ev+8(FP), R4
+ MOVW nev+16(FP), R5
+ MOVW timeout+20(FP), R6
+ SYSCALL $SYS_epoll_wait
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+24(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd);
+TEXT runtime·closeonexec(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R3 // fd
+ MOVD $2, R4 // F_SETFD
+ MOVD $1, R5 // FD_CLOEXEC
+ SYSCALL $SYS_fcntl
+ RET
+
+// func runtime·setNonblock(int32 fd)
+TEXT runtime·setNonblock(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW fd+0(FP), R3 // fd
+ MOVD $3, R4 // F_GETFL
+ MOVD $0, R5
+ SYSCALL $SYS_fcntl
+ OR $0x800, R3, R5 // O_NONBLOCK
+ MOVW fd+0(FP), R3 // fd
+ MOVD $4, R4 // F_SETFL
+ SYSCALL $SYS_fcntl
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT|NOFRAME,$0
+ // Implemented as brk(NULL).
+ MOVD $0, R3
+ SYSCALL $SYS_brk
+ MOVD R3, ret+0(FP)
+ RET
+
+TEXT runtime·access(SB),$0-20
+ MOVD R0, 0(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+16(FP) // for vet
+ RET
+
+TEXT runtime·connect(SB),$0-28
+ MOVD R0, 0(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+24(FP) // for vet
+ RET
+
+TEXT runtime·socket(SB),$0-20
+ MOVD R0, 0(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+16(FP) // for vet
+ RET
diff --git a/src/runtime/sys_linux_riscv64.s b/src/runtime/sys_linux_riscv64.s
new file mode 100644
index 0000000..626ab39
--- /dev/null
+++ b/src/runtime/sys_linux_riscv64.s
@@ -0,0 +1,515 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for riscv64, Linux
+//
+
+#include "textflag.h"
+#include "go_asm.h"
+
+#define AT_FDCWD -100
+
+#define SYS_brk 214
+#define SYS_clock_gettime 113
+#define SYS_clone 220
+#define SYS_close 57
+#define SYS_connect 203
+#define SYS_epoll_create1 20
+#define SYS_epoll_ctl 21
+#define SYS_epoll_pwait 22
+#define SYS_exit 93
+#define SYS_exit_group 94
+#define SYS_faccessat 48
+#define SYS_fcntl 25
+#define SYS_futex 98
+#define SYS_getpid 172
+#define SYS_getrlimit 163
+#define SYS_gettid 178
+#define SYS_gettimeofday 169
+#define SYS_kill 129
+#define SYS_madvise 233
+#define SYS_mincore 232
+#define SYS_mmap 222
+#define SYS_munmap 215
+#define SYS_nanosleep 101
+#define SYS_openat 56
+#define SYS_pipe2 59
+#define SYS_pselect6 72
+#define SYS_read 63
+#define SYS_rt_sigaction 134
+#define SYS_rt_sigprocmask 135
+#define SYS_rt_sigreturn 139
+#define SYS_sched_getaffinity 123
+#define SYS_sched_yield 124
+#define SYS_setitimer 103
+#define SYS_sigaltstack 132
+#define SYS_socket 198
+#define SYS_tgkill 131
+#define SYS_tkill 130
+#define SYS_write 64
+
+// func exit(code int32)
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW code+0(FP), A0
+ MOV $SYS_exit_group, A7
+ ECALL
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-8
+ MOV wait+0(FP), A0
+ // We're done using the stack.
+ FENCE
+ MOVW ZERO, (A0)
+ FENCE
+ MOV $0, A0 // exit code
+ MOV $SYS_exit, A7
+ ECALL
+ JMP 0(PC)
+
+// func open(name *byte, mode, perm int32) int32
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0-20
+ MOV $AT_FDCWD, A0
+ MOV name+0(FP), A1
+ MOVW mode+8(FP), A2
+ MOVW perm+12(FP), A3
+ MOV $SYS_openat, A7
+ ECALL
+ MOV $-4096, T0
+ BGEU T0, A0, 2(PC)
+ MOV $-1, A0
+ MOVW A0, ret+16(FP)
+ RET
+
+// func closefd(fd int32) int32
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW fd+0(FP), A0
+ MOV $SYS_close, A7
+ ECALL
+ MOV $-4096, T0
+ BGEU T0, A0, 2(PC)
+ MOV $-1, A0
+ MOVW A0, ret+8(FP)
+ RET
+
+// func write1(fd uintptr, p unsafe.Pointer, n int32) int32
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0-28
+ MOV fd+0(FP), A0
+ MOV p+8(FP), A1
+ MOVW n+16(FP), A2
+ MOV $SYS_write, A7
+ ECALL
+ MOVW A0, ret+24(FP)
+ RET
+
+// func read(fd int32, p unsafe.Pointer, n int32) int32
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW fd+0(FP), A0
+ MOV p+8(FP), A1
+ MOVW n+16(FP), A2
+ MOV $SYS_read, A7
+ ECALL
+ MOVW A0, ret+24(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT|NOFRAME,$0-12
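+ // riscv64 has no pipe syscall in the generic Linux syscall table,
+ // so use pipe2 with flags == 0 instead.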
+ MOV $r+0(FP), A0
+ MOV ZERO, A1
+ MOV $SYS_pipe2, A7
+ ECALL
+ MOVW A0, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ MOV $r+8(FP), A0
+ MOVW flags+0(FP), A1
+ MOV $SYS_pipe2, A7
+ ECALL
+ MOVW A0, errno+16(FP)
+ RET
+
+// func getrlimit(kind int32, limit unsafe.Pointer) int32
+TEXT runtime·getrlimit(SB),NOSPLIT|NOFRAME,$0-20
+ MOVW kind+0(FP), A0
+ MOV limit+8(FP), A1
+ MOV $SYS_getrlimit, A7
+ ECALL
+ MOVW A0, ret+16(FP)
+ RET
+
+// func usleep(usec uint32)
+TEXT runtime·usleep(SB),NOSPLIT,$24-4
+ MOVWU usec+0(FP), A0
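+ // Convert usec to nanoseconds, then split into the timespec at 8(X2):
+ // tv_sec = ns / 1e9, tv_nsec = ns % 1e9.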
+ MOV $1000, A1
+ MUL A1, A0, A0
+ MOV $1000000000, A1
+ DIV A1, A0, A2
+ MOV A2, 8(X2)
+ REM A1, A0, A3
+ MOV A3, 16(X2)
+ ADD $8, X2, A0
+ MOV ZERO, A1
+ MOV $SYS_nanosleep, A7
+ ECALL
+ RET
+
+// func gettid() uint32
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOV $SYS_gettid, A7
+ ECALL
+ MOVW A0, ret+0(FP)
+ RET
+
+// func raise(sig uint32)
+TEXT runtime·raise(SB),NOSPLIT|NOFRAME,$0
+ MOV $SYS_gettid, A7
+ ECALL
+ // arg 1 tid - already in A0
+ MOVW sig+0(FP), A1 // arg 2
+ MOV $SYS_tkill, A7
+ ECALL
+ RET
+
+// func raiseproc(sig uint32)
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ MOV $SYS_getpid, A7
+ ECALL
+ // arg 1 pid - already in A0
+ MOVW sig+0(FP), A1 // arg 2
+ MOV $SYS_kill, A7
+ ECALL
+ RET
+
+// func getpid() int
+TEXT ·getpid(SB),NOSPLIT|NOFRAME,$0-8
+ MOV $SYS_getpid, A7
+ ECALL
+ MOV A0, ret+0(FP)
+ RET
+
+// func tgkill(tgid, tid, sig int)
+TEXT ·tgkill(SB),NOSPLIT|NOFRAME,$0-24
+ MOV tgid+0(FP), A0
+ MOV tid+8(FP), A1
+ MOV sig+16(FP), A2
+ MOV $SYS_tgkill, A7
+ ECALL
+ RET
+
+// func setitimer(mode int32, new, old *itimerval)
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW mode+0(FP), A0
+ MOV new+8(FP), A1
+ MOV old+16(FP), A2
+ MOV $SYS_setitimer, A7
+ ECALL
+ RET
+
+// func mincore(addr unsafe.Pointer, n uintptr, dst *byte) int32
+TEXT runtime·mincore(SB),NOSPLIT|NOFRAME,$0-28
+ MOV addr+0(FP), A0
+ MOV n+8(FP), A1
+ MOV dst+16(FP), A2
+ MOV $SYS_mincore, A7
+ ECALL
+ MOVW A0, ret+24(FP)
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB),NOSPLIT,$24-12
+ MOV $0, A0 // CLOCK_REALTIME
+ MOV $8(X2), A1
+ MOV $SYS_clock_gettime, A7
+ ECALL
+ MOV 8(X2), T0 // sec
+ MOV 16(X2), T1 // nsec
+ MOV T0, sec+0(FP)
+ MOVW T1, nsec+8(FP)
+ RET
+
+// func nanotime1() int64
+TEXT runtime·nanotime1(SB),NOSPLIT,$24-8
+ MOV $1, A0 // CLOCK_MONOTONIC
+ MOV $8(X2), A1
+ MOV $SYS_clock_gettime, A7
+ ECALL
+ MOV 8(X2), T0 // sec
+ MOV 16(X2), T1 // nsec
+ // sec is in T0, nsec in T1
+ // return nsec in T0
+ MOV $1000000000, T2
+ MUL T2, T0
+ ADD T1, T0
+ MOV T0, ret+0(FP)
+ RET
+
+// func rtsigprocmask(how int32, new, old *sigset, size int32)
+TEXT runtime·rtsigprocmask(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW how+0(FP), A0
+ MOV new+8(FP), A1
+ MOV old+16(FP), A2
+ MOVW size+24(FP), A3
+ MOV $SYS_rt_sigprocmask, A7
+ ECALL
+ MOV $-4096, T0
+ BLTU A0, T0, 2(PC)
+ WORD $0 // crash
+ RET
+
+// func rt_sigaction(sig uintptr, new, old *sigactiont, size uintptr) int32
+TEXT runtime·rt_sigaction(SB),NOSPLIT|NOFRAME,$0-36
+ MOV sig+0(FP), A0
+ MOV new+8(FP), A1
+ MOV old+16(FP), A2
+ MOV size+24(FP), A3
+ MOV $SYS_rt_sigaction, A7
+ ECALL
+ MOVW A0, ret+32(FP)
+ RET
+
+// func sigfwd(fn uintptr, sig uint32, info *siginfo, ctx unsafe.Pointer)
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), A0
+ MOV info+16(FP), A1
+ MOV ctx+24(FP), A2
+ MOV fn+0(FP), T1
+ JALR RA, T1
+ RET
+
+// func sigtramp(signo, ureg, ctxt unsafe.Pointer)
+TEXT runtime·sigtramp(SB),NOSPLIT,$64
+ MOVW A0, 8(X2)
+ MOV A1, 16(X2)
+ MOV A2, 24(X2)
+
+ // this might be called in external code context,
+ // where g is not set.
+ MOVBU runtime·iscgo(SB), A0
+ BEQ A0, ZERO, 2(PC)
+ CALL runtime·load_g(SB)
+
+ MOV $runtime·sigtrampgo(SB), A0
+ JALR RA, A0
+ RET
+
+// func cgoSigtramp()
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ MOV $runtime·sigtramp(SB), T1
+ JALR ZERO, T1
+
+// func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (p unsafe.Pointer, err int)
+TEXT runtime·mmap(SB),NOSPLIT|NOFRAME,$0
+ MOV addr+0(FP), A0
+ MOV n+8(FP), A1
+ MOVW prot+16(FP), A2
+ MOVW flags+20(FP), A3
+ MOVW fd+24(FP), A4
+ MOVW off+28(FP), A5
+ MOV $SYS_mmap, A7
+ ECALL
+ MOV $-4096, T0
+ BGEU T0, A0, 5(PC)
+ SUB A0, ZERO, A0
+ MOV ZERO, p+32(FP)
+ MOV A0, err+40(FP)
+ RET
+ok:
+ MOV A0, p+32(FP)
+ MOV ZERO, err+40(FP)
+ RET
+
+// func munmap(addr unsafe.Pointer, n uintptr)
+TEXT runtime·munmap(SB),NOSPLIT|NOFRAME,$0
+ MOV addr+0(FP), A0
+ MOV n+8(FP), A1
+ MOV $SYS_munmap, A7
+ ECALL
+ MOV $-4096, T0
+ BLTU A0, T0, 2(PC)
+ WORD $0 // crash
+ RET
+
+// func madvise(addr unsafe.Pointer, n uintptr, flags int32)
+TEXT runtime·madvise(SB),NOSPLIT|NOFRAME,$0
+ MOV addr+0(FP), A0
+ MOV n+8(FP), A1
+ MOVW flags+16(FP), A2
+ MOV $SYS_madvise, A7
+ ECALL
+ MOVW A0, ret+24(FP)
+ RET
+
+// func futex(addr unsafe.Pointer, op int32, val uint32, ts, addr2 unsafe.Pointer, val3 uint32) int32
+TEXT runtime·futex(SB),NOSPLIT|NOFRAME,$0
+ MOV addr+0(FP), A0
+ MOVW op+8(FP), A1
+ MOVW val+12(FP), A2
+ MOV ts+16(FP), A3
+ MOV addr2+24(FP), A4
+ MOVW val3+32(FP), A5
+ MOV $SYS_futex, A7
+ ECALL
+ MOVW A0, ret+40(FP)
+ RET
+
+// func clone(flags int32, stk, mp, gp, fn unsafe.Pointer) int32
+TEXT runtime·clone(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), A0
+ MOV stk+8(FP), A1
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ MOV mp+16(FP), T0
+ MOV gp+24(FP), T1
+ MOV fn+32(FP), T2
+
+ MOV T0, -8(A1)
+ MOV T1, -16(A1)
+ MOV T2, -24(A1)
+ MOV $1234, T0
+ MOV T0, -32(A1)
+
+ MOV $SYS_clone, A7
+ ECALL
+
+ // In parent, return.
+ BEQ ZERO, A0, child
+ MOVW A0, ret+40(FP)
+ RET
+
+child:
+ // In child, on new stack.
+ MOV -32(X2), T0
+ MOV $1234, A0
+ BEQ A0, T0, good
+ WORD $0 // crash
+
+good:
+ // Initialize m->procid to Linux tid
+ MOV $SYS_gettid, A7
+ ECALL
+
+ MOV -24(X2), T2 // fn
+ MOV -16(X2), T1 // g
+ MOV -8(X2), T0 // m
+
+ BEQ ZERO, T0, nog
+ BEQ ZERO, T1, nog
+
+ MOV A0, m_procid(T0)
+
+ // In child, set up new stack
+ MOV T0, g_m(T1)
+ MOV T1, g
+
+nog:
+ // Call fn
+ JALR RA, T2
+
+ // It shouldn't return. If it does, exit this thread.
+ MOV $111, A0
+ MOV $SYS_exit, A7
+ ECALL
+ JMP -3(PC) // keep exiting
+
+// func sigaltstack(new, old *stackt)
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOV new+0(FP), A0
+ MOV old+8(FP), A1
+ MOV $SYS_sigaltstack, A7
+ ECALL
+ MOV $-4096, T0
+ BLTU A0, T0, 2(PC)
+ WORD $0 // crash
+ RET
+
+// func osyield()
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ MOV $SYS_sched_yield, A7
+ ECALL
+ RET
+
+// func sched_getaffinity(pid, len uintptr, buf *uintptr) int32
+TEXT runtime·sched_getaffinity(SB),NOSPLIT|NOFRAME,$0
+ MOV pid+0(FP), A0
+ MOV len+8(FP), A1
+ MOV buf+16(FP), A2
+ MOV $SYS_sched_getaffinity, A7
+ ECALL
+ MOV A0, ret+24(FP)
+ RET
+
+// func epollcreate(size int32) int32
+TEXT runtime·epollcreate(SB),NOSPLIT|NOFRAME,$0
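+ // riscv64 has no epoll_create syscall, so use epoll_create1 with no
+ // flags; the size argument is ignored by modern kernels anyway.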
+ MOV $0, A0
+ MOV $SYS_epoll_create1, A7
+ ECALL
+ MOVW A0, ret+8(FP)
+ RET
+
+// func epollcreate1(flags int32) int32
+TEXT runtime·epollcreate1(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), A0
+ MOV $SYS_epoll_create1, A7
+ ECALL
+ MOVW A0, ret+8(FP)
+ RET
+
+// func epollctl(epfd, op, fd int32, ev *epollevent) int32
+TEXT runtime·epollctl(SB),NOSPLIT|NOFRAME,$0
+ MOVW epfd+0(FP), A0
+ MOVW op+4(FP), A1
+ MOVW fd+8(FP), A2
+ MOV ev+16(FP), A3
+ MOV $SYS_epoll_ctl, A7
+ ECALL
+ MOVW A0, ret+24(FP)
+ RET
+
+// func epollwait(epfd int32, ev *epollevent, nev, timeout int32) int32
+TEXT runtime·epollwait(SB),NOSPLIT|NOFRAME,$0
+ MOVW epfd+0(FP), A0
+ MOV ev+8(FP), A1
+ MOVW nev+16(FP), A2
+ MOVW timeout+20(FP), A3
+ MOV $0, A4
+ MOV $SYS_epoll_pwait, A7
+ ECALL
+ MOVW A0, ret+24(FP)
+ RET
+
+// func closeonexec(int32)
+TEXT runtime·closeonexec(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), A0 // fd
+ MOV $2, A1 // F_SETFD
+ MOV $1, A2 // FD_CLOEXEC
+ MOV $SYS_fcntl, A7
+ ECALL
+ RET
+
+// func runtime·setNonblock(int32 fd)
+TEXT runtime·setNonblock(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW fd+0(FP), A0 // fd
+ MOV $3, A1 // F_GETFL
+ MOV $0, A2
+ MOV $SYS_fcntl, A7
+ ECALL
+ MOV $0x800, A2 // O_NONBLOCK
+ OR A0, A2
+ MOVW fd+0(FP), A0 // fd
+ MOV $4, A1 // F_SETFL
+ MOV $SYS_fcntl, A7
+ ECALL
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT,$0-8
+ // Implemented as brk(NULL).
+ MOV $0, A0
+ MOV $SYS_brk, A7
+ ECALL
+ MOV A0, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_linux_s390x.s b/src/runtime/sys_linux_s390x.s
new file mode 100644
index 0000000..c15a1d5
--- /dev/null
+++ b/src/runtime/sys_linux_s390x.s
@@ -0,0 +1,508 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// System calls and other system stuff for Linux s390x; see
+// /usr/include/asm/unistd.h for the syscall number definitions.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_pipe 42
+#define SYS_brk 45
+#define SYS_fcntl 55
+#define SYS_mmap 90
+#define SYS_munmap 91
+#define SYS_setitimer 104
+#define SYS_clone 120
+#define SYS_sched_yield 158
+#define SYS_nanosleep 162
+#define SYS_rt_sigreturn 173
+#define SYS_rt_sigaction 174
+#define SYS_rt_sigprocmask 175
+#define SYS_sigaltstack 186
+#define SYS_madvise 219
+#define SYS_mincore 218
+#define SYS_gettid 236
+#define SYS_futex 238
+#define SYS_sched_getaffinity 240
+#define SYS_tgkill 241
+#define SYS_exit_group 248
+#define SYS_epoll_create 249
+#define SYS_epoll_ctl 250
+#define SYS_epoll_wait 251
+#define SYS_clock_gettime 260
+#define SYS_pipe2 325
+#define SYS_epoll_create1 327
+
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW code+0(FP), R2
+ MOVW $SYS_exit_group, R1
+ SYSCALL
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-8
+ MOVD wait+0(FP), R1
+ // We're done using the stack.
+ MOVW $0, R2
+ MOVW R2, (R1)
+ MOVW $0, R2 // exit code
+ MOVW $SYS_exit, R1
+ SYSCALL
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0-20
+ MOVD name+0(FP), R2
+ MOVW mode+8(FP), R3
+ MOVW perm+12(FP), R4
+ MOVW $SYS_open, R1
+ SYSCALL
+ MOVD $-4095, R3
+ CMPUBLT R2, R3, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW fd+0(FP), R2
+ MOVW $SYS_close, R1
+ SYSCALL
+ MOVD $-4095, R3
+ CMPUBLT R2, R3, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+8(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0-28
+ MOVD fd+0(FP), R2
+ MOVD p+8(FP), R3
+ MOVW n+16(FP), R4
+ MOVW $SYS_write, R1
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW fd+0(FP), R2
+ MOVD p+8(FP), R3
+ MOVW n+16(FP), R4
+ MOVW $SYS_read, R1
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT|NOFRAME,$0-12
+ MOVD $r+0(FP), R2
+ MOVW $SYS_pipe, R1
+ SYSCALL
+ MOVW R2, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ MOVD $r+8(FP), R2
+ MOVW flags+0(FP), R3
+ MOVW $SYS_pipe2, R1
+ SYSCALL
+ MOVW R2, errno+16(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16-4
+ MOVW usec+0(FP), R2
+ MOVD R2, R4
+ MOVW $1000000, R3
+ DIVD R3, R2
+ MOVD R2, 8(R15)
+ MOVW $1000, R3
+ MULLD R2, R3
+ SUB R3, R4
+ MOVD R4, 16(R15)
+
+ // nanosleep(&ts, 0)
+ ADD $8, R15, R2
+ MOVW $0, R3
+ MOVW $SYS_nanosleep, R1
+ SYSCALL
+ RET
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVW $SYS_gettid, R1
+ SYSCALL
+ MOVW R2, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT|NOFRAME,$0
+ MOVW $SYS_getpid, R1
+ SYSCALL
+ MOVW R2, R10
+ MOVW $SYS_gettid, R1
+ SYSCALL
+ MOVW R2, R3 // arg 2 tid
+ MOVW R10, R2 // arg 1 pid
+ MOVW sig+0(FP), R4 // arg 3
+ MOVW $SYS_tgkill, R1
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ MOVW $SYS_getpid, R1
+ SYSCALL
+ MOVW R2, R2 // arg 1 pid
+ MOVW sig+0(FP), R3 // arg 2
+ MOVW $SYS_kill, R1
+ SYSCALL
+ RET
+
+TEXT ·getpid(SB),NOSPLIT|NOFRAME,$0-8
+ MOVW $SYS_getpid, R1
+ SYSCALL
+ MOVD R2, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT|NOFRAME,$0-24
+ MOVD tgid+0(FP), R2
+ MOVD tid+8(FP), R3
+ MOVD sig+16(FP), R4
+ MOVW $SYS_tgkill, R1
+ SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW mode+0(FP), R2
+ MOVD new+8(FP), R3
+ MOVD old+16(FP), R4
+ MOVW $SYS_setitimer, R1
+ SYSCALL
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT|NOFRAME,$0-28
+ MOVD addr+0(FP), R2
+ MOVD n+8(FP), R3
+ MOVD dst+16(FP), R4
+ MOVW $SYS_mincore, R1
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB),NOSPLIT,$16
+ MOVW $0, R2 // CLOCK_REALTIME
+ MOVD $tp-16(SP), R3
+ MOVW $SYS_clock_gettime, R1
+ SYSCALL
+ LMG tp-16(SP), R2, R3
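+ // LMG loads the two consecutive doublewords of the timespec
+ // (tv_sec, tv_nsec) into R2 and R3 in one instruction.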
+ // sec is in R2, nsec in R3
+ MOVD R2, sec+0(FP)
+ MOVW R3, nsec+8(FP)
+ RET
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$16
+ MOVW $1, R2 // CLOCK_MONOTONIC
+ MOVD $tp-16(SP), R3
+ MOVW $SYS_clock_gettime, R1
+ SYSCALL
+ LMG tp-16(SP), R2, R3
+ // sec is in R2, nsec in R3
+ // return nsec in R2
+ MULLD $1000000000, R2
+ ADD R3, R2
+ MOVD R2, ret+0(FP)
+ RET
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW how+0(FP), R2
+ MOVD new+8(FP), R3
+ MOVD old+16(FP), R4
+ MOVW size+24(FP), R5
+ MOVW $SYS_rt_sigprocmask, R1
+ SYSCALL
+ MOVD $-4095, R3
+ CMPUBLT R2, R3, 2(PC)
+ MOVD R0, 0(R0) // crash
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT|NOFRAME,$0-36
+ MOVD sig+0(FP), R2
+ MOVD new+8(FP), R3
+ MOVD old+16(FP), R4
+ MOVD size+24(FP), R5
+ MOVW $SYS_rt_sigaction, R1
+ SYSCALL
+ MOVW R2, ret+32(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R2
+ MOVD info+16(FP), R3
+ MOVD ctx+24(FP), R4
+ MOVD fn+0(FP), R5
+ BL R5
+ RET
+
+TEXT runtime·sigreturn(SB),NOSPLIT,$0-0
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$64
+ // initialize essential registers (just in case)
+ XOR R0, R0
+
+ // this might be called in external code context,
+ // where g is not set.
+ MOVB runtime·iscgo(SB), R6
+ CMPBEQ R6, $0, 2(PC)
+ BL runtime·load_g(SB)
+
+ MOVW R2, 8(R15)
+ MOVD R3, 16(R15)
+ MOVD R4, 24(R15)
+ MOVD $runtime·sigtrampgo(SB), R5
+ BL R5
+ RET
+
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ BR runtime·sigtramp(SB)
+
+// func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (p unsafe.Pointer, err int)
+TEXT runtime·mmap(SB),NOSPLIT,$48-48
+ MOVD addr+0(FP), R2
+ MOVD n+8(FP), R3
+ MOVW prot+16(FP), R4
+ MOVW flags+20(FP), R5
+ MOVW fd+24(FP), R6
+ MOVWZ off+28(FP), R7
+
+ // s390x uses old_mmap, so the arguments need to be placed into
+ // a struct and a pointer to the struct passed to mmap.
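+ // (The old_mmap entry point takes a single pointer to a six-word
+ // argument block rather than six register arguments.)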
+ MOVD R2, addr-48(SP)
+ MOVD R3, n-40(SP)
+ MOVD R4, prot-32(SP)
+ MOVD R5, flags-24(SP)
+ MOVD R6, fd-16(SP)
+ MOVD R7, off-8(SP)
+
+ MOVD $addr-48(SP), R2
+ MOVW $SYS_mmap, R1
+ SYSCALL
+ MOVD $-4095, R3
+ CMPUBLT R2, R3, ok
+ NEG R2
+ MOVD $0, p+32(FP)
+ MOVD R2, err+40(FP)
+ RET
+ok:
+ MOVD R2, p+32(FP)
+ MOVD $0, err+40(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R2
+ MOVD n+8(FP), R3
+ MOVW $SYS_munmap, R1
+ SYSCALL
+ MOVD $-4095, R3
+ CMPUBLT R2, R3, 2(PC)
+ MOVD R0, 0(R0) // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R2
+ MOVD n+8(FP), R3
+ MOVW flags+16(FP), R4
+ MOVW $SYS_madvise, R1
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+// int64 futex(int32 *uaddr, int32 op, int32 val,
+// struct timespec *timeout, int32 *uaddr2, int32 val3);
+TEXT runtime·futex(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R2
+ MOVW op+8(FP), R3
+ MOVW val+12(FP), R4
+ MOVD ts+16(FP), R5
+ MOVD addr2+24(FP), R6
+ MOVW val3+32(FP), R7
+ MOVW $SYS_futex, R1
+ SYSCALL
+ MOVW R2, ret+40(FP)
+ RET
+
+// int32 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), R3
+ MOVD stk+8(FP), R2
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ // Careful: Linux system call clobbers ???.
+ MOVD mp+16(FP), R7
+ MOVD gp+24(FP), R8
+ MOVD fn+32(FP), R9
+
+ MOVD R7, -8(R2)
+ MOVD R8, -16(R2)
+ MOVD R9, -24(R2)
+ MOVD $1234, R7
+ MOVD R7, -32(R2)
+
+ SYSCALL $SYS_clone
+
+ // In parent, return.
+ CMPBEQ R2, $0, 3(PC)
+ MOVW R2, ret+40(FP)
+ RET
+
+ // In child, on new stack.
+ // initialize essential registers
+ XOR R0, R0
+ MOVD -32(R15), R7
+ CMP R7, $1234
+ BEQ 2(PC)
+ MOVD R0, 0(R0)
+
+ // Initialize m->procid to Linux tid
+ SYSCALL $SYS_gettid
+
+ MOVD -24(R15), R9 // fn
+ MOVD -16(R15), R8 // g
+ MOVD -8(R15), R7 // m
+
+ CMPBEQ R7, $0, nog
+ CMP R8, $0
+ BEQ nog
+
+ MOVD R2, m_procid(R7)
+
+ // In child, set up new stack
+ MOVD R7, g_m(R8)
+ MOVD R8, g
+ //CALL runtime·stackcheck(SB)
+
+nog:
+ // Call fn
+ BL R9
+
+ // It shouldn't return. If it does, exit that thread.
+ MOVW $111, R2
+ MOVW $SYS_exit, R1
+ SYSCALL
+ BR -2(PC) // keep exiting
+
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVD new+0(FP), R2
+ MOVD old+8(FP), R3
+ MOVW $SYS_sigaltstack, R1
+ SYSCALL
+ MOVD $-4095, R3
+ CMPUBLT R2, R3, 2(PC)
+ MOVD R0, 0(R0) // crash
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ MOVW $SYS_sched_yield, R1
+ SYSCALL
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT|NOFRAME,$0
+ MOVD pid+0(FP), R2
+ MOVD len+8(FP), R3
+ MOVD buf+16(FP), R4
+ MOVW $SYS_sched_getaffinity, R1
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+// int32 runtime·epollcreate(int32 size);
+TEXT runtime·epollcreate(SB),NOSPLIT|NOFRAME,$0
+ MOVW size+0(FP), R2
+ MOVW $SYS_epoll_create, R1
+ SYSCALL
+ MOVW R2, ret+8(FP)
+ RET
+
+// int32 runtime·epollcreate1(int32 flags);
+TEXT runtime·epollcreate1(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), R2
+ MOVW $SYS_epoll_create1, R1
+ SYSCALL
+ MOVW R2, ret+8(FP)
+ RET
+
+// func epollctl(epfd, op, fd int32, ev *epollEvent) int
+TEXT runtime·epollctl(SB),NOSPLIT|NOFRAME,$0
+ MOVW epfd+0(FP), R2
+ MOVW op+4(FP), R3
+ MOVW fd+8(FP), R4
+ MOVD ev+16(FP), R5
+ MOVW $SYS_epoll_ctl, R1
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+// int32 runtime·epollwait(int32 epfd, EpollEvent *ev, int32 nev, int32 timeout);
+TEXT runtime·epollwait(SB),NOSPLIT|NOFRAME,$0
+ MOVW epfd+0(FP), R2
+ MOVD ev+8(FP), R3
+ MOVW nev+16(FP), R4
+ MOVW timeout+20(FP), R5
+ MOVW $SYS_epoll_wait, R1
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd);
+TEXT runtime·closeonexec(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R2 // fd
+ MOVD $2, R3 // F_SETFD
+ MOVD $1, R4 // FD_CLOEXEC
+ MOVW $SYS_fcntl, R1
+ SYSCALL
+ RET
+
+// func runtime·setNonblock(int32 fd)
+TEXT runtime·setNonblock(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW fd+0(FP), R2 // fd
+ MOVD $3, R3 // F_GETFL
+ XOR R4, R4
+ MOVW $SYS_fcntl, R1
+ SYSCALL
+ MOVD $0x800, R4 // O_NONBLOCK
+ OR R2, R4
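+	// R4 = old flags (F_GETFL result in R2) | O_NONBLOCK, used as the
+	// F_SETFL argument below.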
+ MOVW fd+0(FP), R2 // fd
+ MOVD $4, R3 // F_SETFL
+ MOVW $SYS_fcntl, R1
+ SYSCALL
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT|NOFRAME,$0-8
+ // Implemented as brk(NULL).
+ MOVD $0, R2
+ MOVW $SYS_brk, R1
+ SYSCALL
+ MOVD R2, ret+0(FP)
+ RET
+
+TEXT runtime·access(SB),$0-20
+ MOVD $0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+16(FP)
+ RET
+
+TEXT runtime·connect(SB),$0-28
+ MOVD $0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·socket(SB),$0-20
+ MOVD $0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+16(FP)
+ RET
diff --git a/src/runtime/sys_mips64x.go b/src/runtime/sys_mips64x.go
new file mode 100644
index 0000000..cb429c3
--- /dev/null
+++ b/src/runtime/sys_mips64x.go
@@ -0,0 +1,20 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips64 mips64le
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
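+// On a link-register architecture like MIPS this amounts to saving the old pc
+// in buf.lr (so fn will "return" there) and pointing buf.pc at fn itself.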
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
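The effect of gostartcall can be pictured with ordinary Go values. The sketch below is illustrative only: fakeGobuf, fakeGostartcall, and the addresses are made-up stand-ins, not runtime types.

package main

import "fmt"

// fakeGobuf is a stand-in for the runtime's gobuf, reduced to the fields
// gostartcall touches.
type fakeGobuf struct {
	pc, lr, ctxt uintptr
}

// fakeGostartcall mirrors the logic above: after it runs, resuming the buffer
// starts execution at fn, and fn's link register (its return address) holds
// the old pc.
func fakeGostartcall(buf *fakeGobuf, fn, ctxt uintptr) {
	if buf.lr != 0 {
		panic("invalid use of gostartcall")
	}
	buf.lr = buf.pc
	buf.pc = fn
	buf.ctxt = ctxt
}

func main() {
	b := fakeGobuf{pc: 0x1000} // pretend the goroutine would resume at 0x1000
	fakeGostartcall(&b, 0x2000, 0)
	fmt.Printf("pc=%#x lr=%#x\n", b.pc, b.lr) // pc=0x2000 lr=0x1000
}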
diff --git a/src/runtime/sys_mipsx.go b/src/runtime/sys_mipsx.go
new file mode 100644
index 0000000..2819218
--- /dev/null
+++ b/src/runtime/sys_mipsx.go
@@ -0,0 +1,20 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips mipsle
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
diff --git a/src/runtime/sys_netbsd_386.s b/src/runtime/sys_netbsd_386.s
new file mode 100644
index 0000000..d0c470c
--- /dev/null
+++ b/src/runtime/sys_netbsd_386.s
@@ -0,0 +1,499 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for 386, NetBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 3
+#define FD_CLOEXEC 1
+#define F_SETFD 2
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_munmap 73
+#define SYS_madvise 75
+#define SYS_fcntl 92
+#define SYS_mmap 197
+#define SYS___sysctl 202
+#define SYS___sigaltstack14 281
+#define SYS___sigprocmask14 293
+#define SYS_getcontext 307
+#define SYS_setcontext 308
+#define SYS__lwp_create 309
+#define SYS__lwp_exit 310
+#define SYS__lwp_self 311
+#define SYS__lwp_setprivate 317
+#define SYS__lwp_kill 318
+#define SYS__lwp_unpark 321
+#define SYS___sigaction_sigtramp 340
+#define SYS_kqueue 344
+#define SYS_sched_yield 350
+#define SYS___setitimer50 425
+#define SYS___clock_gettime50 427
+#define SYS___nanosleep50 430
+#define SYS___kevent50 435
+#define SYS____lwp_park60 478
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT,$-4
+ MOVL $SYS_exit, AX
+ INT $0x80
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-4
+ MOVL wait+0(FP), AX
+ // We're done using the stack.
+ MOVL $0, (AX)
+ MOVL $SYS__lwp_exit, AX
+ INT $0x80
+ MOVL $0xf1, 0xf1 // crash
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$-4
+ MOVL $SYS_open, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$-4
+ MOVL $SYS_close, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$-4
+ MOVL $SYS_read, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX // caller expects negative errno
+ MOVL AX, ret+12(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT,$0-12
+	MOVL	$42, AX	// sys_pipe
+ INT $0x80
+ JCC pipeok
+ MOVL $-1, r+0(FP)
+ MOVL $-1, w+4(FP)
+ MOVL AX, errno+8(FP)
+ RET
+pipeok:
+ MOVL AX, r+0(FP)
+ MOVL DX, w+4(FP)
+ MOVL $0, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$12-16
+	MOVL	$453, AX	// sys_pipe2
+ LEAL r+4(FP), BX
+ MOVL BX, 4(SP)
+ MOVL flags+0(FP), BX
+ MOVL BX, 8(SP)
+ INT $0x80
+ MOVL AX, errno+12(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$-4
+ MOVL $SYS_write, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX // caller expects negative errno
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$24
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
+ MOVL AX, 12(SP) // tv_sec - l32
+ MOVL $0, 16(SP) // tv_sec - h32
+ MOVL $1000, AX
+ MULL DX
+ MOVL AX, 20(SP) // tv_nsec
+
+ MOVL $0, 0(SP)
+ LEAL 12(SP), AX
+ MOVL AX, 4(SP) // arg 1 - rqtp
+ MOVL $0, 8(SP) // arg 2 - rmtp
+ MOVL $SYS___nanosleep50, AX
+ INT $0x80
+ RET
+
+TEXT runtime·lwp_kill(SB),NOSPLIT,$12-8
+ MOVL $0, 0(SP)
+ MOVL tid+0(FP), AX
+ MOVL AX, 4(SP) // arg 1 - target
+ MOVL sig+4(FP), AX
+ MOVL AX, 8(SP) // arg 2 - signo
+ MOVL $SYS__lwp_kill, AX
+ INT $0x80
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$12
+ MOVL $SYS_getpid, AX
+ INT $0x80
+ MOVL $0, 0(SP)
+ MOVL AX, 4(SP) // arg 1 - pid
+ MOVL sig+0(FP), AX
+ MOVL AX, 8(SP) // arg 2 - signo
+ MOVL $SYS_kill, AX
+ INT $0x80
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$36
+ LEAL addr+0(FP), SI
+ LEAL 4(SP), DI
+ CLD
+ MOVSL // arg 1 - addr
+ MOVSL // arg 2 - len
+ MOVSL // arg 3 - prot
+ MOVSL // arg 4 - flags
+ MOVSL // arg 5 - fd
+ MOVL $0, AX
+ STOSL // arg 6 - pad
+ MOVSL // arg 7 - offset
+ MOVL $0, AX // top 32 bits of file offset
+ STOSL
+ MOVL $SYS_mmap, AX
+ INT $0x80
+ JAE ok
+ MOVL $0, p+24(FP)
+ MOVL AX, err+28(FP)
+ RET
+ok:
+ MOVL AX, p+24(FP)
+ MOVL $0, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$-4
+ MOVL $SYS_munmap, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$-4
+ MOVL $SYS_madvise, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$-4
+ MOVL $SYS___setitimer50, AX
+ INT $0x80
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB), NOSPLIT, $32
+ LEAL 12(SP), BX
+ MOVL $CLOCK_REALTIME, 4(SP) // arg 1 - clock_id
+ MOVL BX, 8(SP) // arg 2 - tp
+ MOVL $SYS___clock_gettime50, AX
+ INT $0x80
+
+ MOVL 12(SP), AX // sec - l32
+ MOVL AX, sec_lo+0(FP)
+ MOVL 16(SP), AX // sec - h32
+ MOVL AX, sec_hi+4(FP)
+
+ MOVL 20(SP), BX // nsec
+ MOVL BX, nsec+8(FP)
+ RET
+
+// int64 nanotime1(void) so really
+// void nanotime1(int64 *nsec)
+TEXT runtime·nanotime1(SB),NOSPLIT,$32
+ LEAL 12(SP), BX
+ MOVL $CLOCK_MONOTONIC, 4(SP) // arg 1 - clock_id
+ MOVL BX, 8(SP) // arg 2 - tp
+ MOVL $SYS___clock_gettime50, AX
+ INT $0x80
+
+ MOVL 16(SP), CX // sec - h32
+ IMULL $1000000000, CX
+
+ MOVL 12(SP), AX // sec - l32
+ MOVL $1000000000, BX
+ MULL BX // result in dx:ax
+
+ MOVL 20(SP), BX // nsec
+ ADDL BX, AX
+ ADCL CX, DX // add high bits with carry
+
+ MOVL AX, ret_lo+0(FP)
+ MOVL DX, ret_hi+4(FP)
+ RET
+
+TEXT runtime·getcontext(SB),NOSPLIT,$-4
+ MOVL $SYS_getcontext, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$-4
+ MOVL $SYS___sigprocmask14, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT sigreturn_tramp<>(SB),NOSPLIT,$0
+ LEAL 140(SP), AX // Load address of ucontext
+ MOVL AX, 4(SP)
+ MOVL $SYS_setcontext, AX
+ INT $0x80
+ MOVL $-1, 4(SP) // Something failed...
+ MOVL $SYS_exit, AX
+ INT $0x80
+
+TEXT runtime·sigaction(SB),NOSPLIT,$24
+ LEAL sig+0(FP), SI
+ LEAL 4(SP), DI
+ CLD
+ MOVSL // arg 1 - sig
+ MOVSL // arg 2 - act
+ MOVSL // arg 3 - oact
+ LEAL sigreturn_tramp<>(SB), AX
+ STOSL // arg 4 - tramp
+ MOVL $2, AX
+ STOSL // arg 5 - vers
+ MOVL $SYS___sigaction_sigtramp, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$12-16
+ MOVL fn+0(FP), AX
+ MOVL sig+4(FP), BX
+ MOVL info+8(FP), CX
+ MOVL ctx+12(FP), DX
+ MOVL SP, SI
+ SUBL $32, SP
+ ANDL $-15, SP // align stack: handler might be a C function
+ MOVL BX, 0(SP)
+ MOVL CX, 4(SP)
+ MOVL DX, 8(SP)
+ MOVL SI, 12(SP) // save SI: handler might be a Go function
+ CALL AX
+ MOVL 12(SP), AX
+ MOVL AX, SP
+ RET
+
+// Called by OS using C ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT,$28
+ NOP SP // tell vet SP changed - stop checking offsets
+ // Save callee-saved C registers, since the caller may be a C signal handler.
+ MOVL BX, bx-4(SP)
+ MOVL BP, bp-8(SP)
+ MOVL SI, si-12(SP)
+ MOVL DI, di-16(SP)
+ // We don't save mxcsr or the x87 control word because sigtrampgo doesn't
+ // modify them.
+
+ MOVL 32(SP), BX // signo
+ MOVL BX, 0(SP)
+ MOVL 36(SP), BX // info
+ MOVL BX, 4(SP)
+ MOVL 40(SP), BX // context
+ MOVL BX, 8(SP)
+ CALL runtime·sigtrampgo(SB)
+
+ MOVL di-16(SP), DI
+ MOVL si-12(SP), SI
+ MOVL bp-8(SP), BP
+ MOVL bx-4(SP), BX
+ RET
+
+// int32 lwp_create(void *context, uintptr flags, void *lwpid);
+TEXT runtime·lwp_create(SB),NOSPLIT,$16
+ MOVL $0, 0(SP)
+ MOVL ctxt+0(FP), AX
+ MOVL AX, 4(SP) // arg 1 - context
+ MOVL flags+4(FP), AX
+ MOVL AX, 8(SP) // arg 2 - flags
+ MOVL lwpid+8(FP), AX
+ MOVL AX, 12(SP) // arg 3 - lwpid
+ MOVL $SYS__lwp_create, AX
+ INT $0x80
+ JCC 2(PC)
+ NEGL AX
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·lwp_tramp(SB),NOSPLIT,$0
+
+	// Set GS to point at m->tls
+ LEAL m_tls(BX), BP
+ PUSHAL // save registers
+ PUSHL BP
+ CALL lwp_setprivate<>(SB)
+ POPL AX
+ POPAL
+
+ // Now segment is established. Initialize m, g.
+ get_tls(AX)
+ MOVL DX, g(AX)
+ MOVL BX, g_m(DX)
+
+ CALL runtime·stackcheck(SB) // smashes AX, CX
+ MOVL 0(DX), DX // paranoia; check they are not nil
+ MOVL 0(BX), BX
+
+ // more paranoia; check that stack splitting code works
+ PUSHAL
+ CALL runtime·emptyfunc(SB)
+ POPAL
+
+ // Call fn
+ CALL SI
+
+ // fn should never return
+ MOVL $0x1234, 0x1005
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
+ MOVL $SYS___sigaltstack14, AX
+ MOVL new+0(FP), BX
+ MOVL old+4(FP), CX
+ INT $0x80
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ INT $3
+ RET
+
+TEXT runtime·setldt(SB),NOSPLIT,$8
+ // Under NetBSD we set the GS base instead of messing with the LDT.
+ MOVL base+4(FP), AX
+ MOVL AX, 0(SP)
+ CALL lwp_setprivate<>(SB)
+ RET
+
+TEXT lwp_setprivate<>(SB),NOSPLIT,$16
+ // adjust for ELF: wants to use -4(GS) for g
+ MOVL base+0(FP), CX
+ ADDL $4, CX
+ MOVL $0, 0(SP) // syscall gap
+ MOVL CX, 4(SP) // arg 1 - ptr
+ MOVL $SYS__lwp_setprivate, AX
+ INT $0x80
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$-4
+ MOVL $SYS_sched_yield, AX
+ INT $0x80
+ RET
+
+TEXT runtime·lwp_park(SB),NOSPLIT,$-4
+ MOVL $SYS____lwp_park60, AX
+ INT $0x80
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·lwp_unpark(SB),NOSPLIT,$-4
+ MOVL $SYS__lwp_unpark, AX
+ INT $0x80
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·lwp_self(SB),NOSPLIT,$-4
+ MOVL $SYS__lwp_self, AX
+ INT $0x80
+ MOVL AX, ret+0(FP)
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$28
+ LEAL mib+0(FP), SI
+ LEAL 4(SP), DI
+ CLD
+ MOVSL // arg 1 - name
+ MOVSL // arg 2 - namelen
+ MOVSL // arg 3 - oldp
+ MOVSL // arg 4 - oldlenp
+ MOVSL // arg 5 - newp
+ MOVSL // arg 6 - newlen
+ MOVL $SYS___sysctl, AX
+ INT $0x80
+ JAE 4(PC)
+ NEGL AX
+ MOVL AX, ret+24(FP)
+ RET
+ MOVL $0, AX
+ MOVL AX, ret+24(FP)
+ RET
+
+GLOBL runtime·tlsoffset(SB),NOPTR,$4
+
+// int32 runtime·kqueue(void)
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVL $SYS_kqueue, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout)
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVL $SYS___kevent50, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+24(FP)
+ RET
+
+// int32 runtime·closeonexec(int32 fd)
+TEXT runtime·closeonexec(SB),NOSPLIT,$32
+ MOVL $SYS_fcntl, AX
+ // 0(SP) is where the caller PC would be; kernel skips it
+ MOVL fd+0(FP), BX
+ MOVL BX, 4(SP) // fd
+ MOVL $F_SETFD, 8(SP)
+ MOVL $FD_CLOEXEC, 12(SP)
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ RET
+
+// func runtime·setNonblock(fd int32)
+TEXT runtime·setNonblock(SB),NOSPLIT,$16-4
+ MOVL $92, AX // fcntl
+ MOVL fd+0(FP), BX // fd
+ MOVL BX, 4(SP)
+ MOVL $3, 8(SP) // F_GETFL
+ MOVL $0, 12(SP)
+ INT $0x80
+ MOVL fd+0(FP), BX // fd
+ MOVL BX, 4(SP)
+ MOVL $4, 8(SP) // F_SETFL
+ ORL $4, AX // O_NONBLOCK
+ MOVL AX, 12(SP)
+ MOVL $92, AX // fcntl
+ INT $0x80
+ RET
diff --git a/src/runtime/sys_netbsd_amd64.s b/src/runtime/sys_netbsd_amd64.s
new file mode 100644
index 0000000..dc9bd12
--- /dev/null
+++ b/src/runtime/sys_netbsd_amd64.s
@@ -0,0 +1,468 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for AMD64, NetBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 3
+#define FD_CLOEXEC 1
+#define F_SETFD 2
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_munmap 73
+#define SYS_madvise 75
+#define SYS_fcntl 92
+#define SYS_mmap 197
+#define SYS___sysctl 202
+#define SYS___sigaltstack14 281
+#define SYS___sigprocmask14 293
+#define SYS_getcontext 307
+#define SYS_setcontext 308
+#define SYS__lwp_create 309
+#define SYS__lwp_exit 310
+#define SYS__lwp_self 311
+#define SYS__lwp_setprivate 317
+#define SYS__lwp_kill 318
+#define SYS__lwp_unpark 321
+#define SYS___sigaction_sigtramp 340
+#define SYS_kqueue 344
+#define SYS_sched_yield 350
+#define SYS___setitimer50 425
+#define SYS___clock_gettime50 427
+#define SYS___nanosleep50 430
+#define SYS___kevent50 435
+#define SYS____lwp_park60 478
+
+// int32 lwp_create(void *context, uintptr flags, void *lwpid)
+TEXT runtime·lwp_create(SB),NOSPLIT,$0
+ MOVQ ctxt+0(FP), DI
+ MOVQ flags+8(FP), SI
+ MOVQ lwpid+16(FP), DX
+ MOVL $SYS__lwp_create, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·lwp_tramp(SB),NOSPLIT,$0
+
+ // Set FS to point at m->tls.
+ LEAQ m_tls(R8), DI
+ CALL runtime·settls(SB)
+
+ // Set up new stack.
+ get_tls(CX)
+ MOVQ R8, g_m(R9)
+ MOVQ R9, g(CX)
+ CALL runtime·stackcheck(SB)
+
+ // Call fn
+ CALL R12
+
+ // It shouldn't return. If it does, exit.
+ MOVL $SYS__lwp_exit, AX
+ SYSCALL
+ JMP -3(PC) // keep exiting
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ MOVL $SYS_sched_yield, AX
+ SYSCALL
+ RET
+
+TEXT runtime·lwp_park(SB),NOSPLIT,$0
+ MOVL clockid+0(FP), DI // arg 1 - clockid
+ MOVL flags+4(FP), SI // arg 2 - flags
+ MOVQ ts+8(FP), DX // arg 3 - ts
+ MOVL unpark+16(FP), R10 // arg 4 - unpark
+ MOVQ hint+24(FP), R8 // arg 5 - hint
+ MOVQ unparkhint+32(FP), R9 // arg 6 - unparkhint
+ MOVL $SYS____lwp_park60, AX
+ SYSCALL
+ MOVL AX, ret+40(FP)
+ RET
+
+TEXT runtime·lwp_unpark(SB),NOSPLIT,$0
+ MOVL lwp+0(FP), DI // arg 1 - lwp
+ MOVQ hint+8(FP), SI // arg 2 - hint
+ MOVL $SYS__lwp_unpark, AX
+ SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·lwp_self(SB),NOSPLIT,$0
+ MOVL $SYS__lwp_self, AX
+ SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT,$-8
+ MOVL code+0(FP), DI // arg 1 - exit status
+ MOVL $SYS_exit, AX
+ SYSCALL
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-8
+ MOVQ wait+0(FP), AX
+ // We're done using the stack.
+ MOVL $0, (AX)
+ MOVL $SYS__lwp_exit, AX
+ SYSCALL
+ MOVL $0xf1, 0xf1 // crash
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$-8
+ MOVQ name+0(FP), DI // arg 1 pathname
+ MOVL mode+8(FP), SI // arg 2 flags
+ MOVL perm+12(FP), DX // arg 3 mode
+ MOVL $SYS_open, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$-8
+ MOVL fd+0(FP), DI // arg 1 fd
+ MOVL $SYS_close, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$-8
+ MOVL fd+0(FP), DI // arg 1 fd
+ MOVQ p+8(FP), SI // arg 2 buf
+ MOVL n+16(FP), DX // arg 3 count
+ MOVL $SYS_read, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX // caller expects negative errno
+ MOVL AX, ret+24(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT,$0-12
+	MOVL	$42, AX	// sys_pipe
+ SYSCALL
+ JCC pipeok
+ MOVL $-1, r+0(FP)
+ MOVL $-1, w+4(FP)
+ MOVL AX, errno+8(FP)
+ RET
+pipeok:
+ MOVL AX, r+0(FP)
+ MOVL DX, w+4(FP)
+ MOVL $0, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-20
+ LEAQ r+8(FP), DI
+ MOVL flags+0(FP), SI
+	MOVL	$453, AX	// sys_pipe2
+ SYSCALL
+ MOVL AX, errno+16(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$-8
+ MOVQ fd+0(FP), DI // arg 1 - fd
+ MOVQ p+8(FP), SI // arg 2 - buf
+ MOVL n+16(FP), DX // arg 3 - nbyte
+ MOVL $SYS_write, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX // caller expects negative errno
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
+ MOVQ AX, 0(SP) // tv_sec
+ MOVL $1000, AX
+ MULL DX
+ MOVQ AX, 8(SP) // tv_nsec
+
+ MOVQ SP, DI // arg 1 - rqtp
+ MOVQ $0, SI // arg 2 - rmtp
+ MOVL $SYS___nanosleep50, AX
+ SYSCALL
+ RET
+
+TEXT runtime·lwp_kill(SB),NOSPLIT,$0-16
+ MOVL tid+0(FP), DI // arg 1 - target
+ MOVQ sig+8(FP), SI // arg 2 - signo
+ MOVL $SYS__lwp_kill, AX
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$16
+ MOVL $SYS_getpid, AX
+ SYSCALL
+ MOVQ AX, DI // arg 1 - pid
+ MOVL sig+0(FP), SI // arg 2 - signo
+ MOVL $SYS_kill, AX
+ SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$-8
+ MOVL mode+0(FP), DI // arg 1 - which
+ MOVQ new+8(FP), SI // arg 2 - itv
+ MOVQ old+16(FP), DX // arg 3 - oitv
+ MOVL $SYS___setitimer50, AX
+ SYSCALL
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB), NOSPLIT, $32
+ MOVQ $CLOCK_REALTIME, DI // arg 1 - clock_id
+ LEAQ 8(SP), SI // arg 2 - tp
+ MOVL $SYS___clock_gettime50, AX
+ SYSCALL
+ MOVQ 8(SP), AX // sec
+ MOVQ 16(SP), DX // nsec
+
+ // sec is in AX, nsec in DX
+ MOVQ AX, sec+0(FP)
+ MOVL DX, nsec+8(FP)
+ RET
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$32
+ MOVQ $CLOCK_MONOTONIC, DI // arg 1 - clock_id
+ LEAQ 8(SP), SI // arg 2 - tp
+ MOVL $SYS___clock_gettime50, AX
+ SYSCALL
+ MOVQ 8(SP), AX // sec
+ MOVQ 16(SP), DX // nsec
+
+ // sec is in AX, nsec in DX
+ // return nsec in AX
+ IMULQ $1000000000, AX
+ ADDQ DX, AX
+ MOVQ AX, ret+0(FP)
+ RET
+
+TEXT runtime·getcontext(SB),NOSPLIT,$-8
+ MOVQ ctxt+0(FP), DI // arg 1 - context
+ MOVL $SYS_getcontext, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$0
+ MOVL how+0(FP), DI // arg 1 - how
+ MOVQ new+8(FP), SI // arg 2 - set
+ MOVQ old+16(FP), DX // arg 3 - oset
+ MOVL $SYS___sigprocmask14, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT sigreturn_tramp<>(SB),NOSPLIT,$-8
+ MOVQ R15, DI // Load address of ucontext
+ MOVQ $SYS_setcontext, AX
+ SYSCALL
+ MOVQ $-1, DI // Something failed...
+ MOVL $SYS_exit, AX
+ SYSCALL
+
+TEXT runtime·sigaction(SB),NOSPLIT,$-8
+ MOVL sig+0(FP), DI // arg 1 - signum
+ MOVQ new+8(FP), SI // arg 2 - nsa
+ MOVQ old+16(FP), DX // arg 3 - osa
+ // arg 4 - tramp
+ LEAQ sigreturn_tramp<>(SB), R10
+ MOVQ $2, R8 // arg 5 - vers
+ MOVL $SYS___sigaction_sigtramp, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVQ fn+0(FP), AX
+ MOVL sig+8(FP), DI
+ MOVQ info+16(FP), SI
+ MOVQ ctx+24(FP), DX
+ PUSHQ BP
+ MOVQ SP, BP
+ ANDQ $~15, SP // alignment for x86_64 ABI
+ CALL AX
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$72
+ // Save callee-saved C registers, since the caller may be a C signal handler.
+ MOVQ BX, bx-8(SP)
+ MOVQ BP, bp-16(SP) // save in case GOEXPERIMENT=noframepointer is set
+ MOVQ R12, r12-24(SP)
+ MOVQ R13, r13-32(SP)
+ MOVQ R14, r14-40(SP)
+ MOVQ R15, r15-48(SP)
+ // We don't save mxcsr or the x87 control word because sigtrampgo doesn't
+ // modify them.
+
+ MOVQ DX, ctx-56(SP)
+ MOVQ SI, info-64(SP)
+ MOVQ DI, signum-72(SP)
+ CALL runtime·sigtrampgo(SB)
+
+ MOVQ r15-48(SP), R15
+ MOVQ r14-40(SP), R14
+ MOVQ r13-32(SP), R13
+ MOVQ r12-24(SP), R12
+ MOVQ bp-16(SP), BP
+ MOVQ bx-8(SP), BX
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 - addr
+ MOVQ n+8(FP), SI // arg 2 - len
+ MOVL prot+16(FP), DX // arg 3 - prot
+ MOVL flags+20(FP), R10 // arg 4 - flags
+ MOVL fd+24(FP), R8 // arg 5 - fd
+ MOVL off+28(FP), R9
+ SUBQ $16, SP
+ MOVQ R9, 8(SP) // arg 7 - offset (passed on stack)
+ MOVQ $0, R9 // arg 6 - pad
+ MOVL $SYS_mmap, AX
+ SYSCALL
+ JCC ok
+ ADDQ $16, SP
+ MOVQ $0, p+32(FP)
+ MOVQ AX, err+40(FP)
+ RET
+ok:
+ ADDQ $16, SP
+ MOVQ AX, p+32(FP)
+ MOVQ $0, err+40(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 - addr
+ MOVQ n+8(FP), SI // arg 2 - len
+ MOVL $SYS_munmap, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 - addr
+ MOVQ n+8(FP), SI // arg 2 - len
+ MOVL flags+16(FP), DX // arg 3 - behav
+ MOVQ $SYS_madvise, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
+ MOVQ new+0(FP), DI // arg 1 - nss
+ MOVQ old+8(FP), SI // arg 2 - oss
+ MOVQ $SYS___sigaltstack14, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// set tls base to DI
+TEXT runtime·settls(SB),NOSPLIT,$8
+ // adjust for ELF: wants to use -8(FS) for g
+ ADDQ $8, DI // arg 1 - ptr
+ MOVQ $SYS__lwp_setprivate, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOVQ mib+0(FP), DI // arg 1 - name
+ MOVL miblen+8(FP), SI // arg 2 - namelen
+ MOVQ out+16(FP), DX // arg 3 - oldp
+ MOVQ size+24(FP), R10 // arg 4 - oldlenp
+ MOVQ dst+32(FP), R8 // arg 5 - newp
+ MOVQ ndst+40(FP), R9 // arg 6 - newlen
+ MOVQ $SYS___sysctl, AX
+ SYSCALL
+ JCC 4(PC)
+ NEGQ AX
+ MOVL AX, ret+48(FP)
+ RET
+ MOVL $0, AX
+ MOVL AX, ret+48(FP)
+ RET
+
+// int32 runtime·kqueue(void)
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVQ $0, DI
+ MOVL $SYS_kqueue, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout)
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVL kq+0(FP), DI
+ MOVQ ch+8(FP), SI
+ MOVL nch+16(FP), DX
+ MOVQ ev+24(FP), R10
+ MOVL nev+32(FP), R8
+ MOVQ ts+40(FP), R9
+ MOVL $SYS___kevent50, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+48(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd)
+TEXT runtime·closeonexec(SB),NOSPLIT,$0
+ MOVL fd+0(FP), DI // fd
+ MOVQ $F_SETFD, SI
+ MOVQ $FD_CLOEXEC, DX
+ MOVL $SYS_fcntl, AX
+ SYSCALL
+ RET
+
+// func runtime·setNonblock(int32 fd)
+TEXT runtime·setNonblock(SB),NOSPLIT,$0-4
+ MOVL fd+0(FP), DI // fd
+ MOVQ $3, SI // F_GETFL
+ MOVQ $0, DX
+ MOVL $92, AX // fcntl
+ SYSCALL
+ MOVL fd+0(FP), DI // fd
+ MOVQ $4, SI // F_SETFL
+ MOVQ $4, DX // O_NONBLOCK
+ ORL AX, DX
+ MOVL $92, AX // fcntl
+ SYSCALL
+ RET
diff --git a/src/runtime/sys_netbsd_arm.s b/src/runtime/sys_netbsd_arm.s
new file mode 100644
index 0000000..678dea5
--- /dev/null
+++ b/src/runtime/sys_netbsd_arm.s
@@ -0,0 +1,441 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for ARM, NetBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 3
+#define FD_CLOEXEC 1
+#define F_SETFD 2
+
+#define SWI_OS_NETBSD 0xa00000
+#define SYS_exit SWI_OS_NETBSD | 1
+#define SYS_read SWI_OS_NETBSD | 3
+#define SYS_write SWI_OS_NETBSD | 4
+#define SYS_open SWI_OS_NETBSD | 5
+#define SYS_close SWI_OS_NETBSD | 6
+#define SYS_getpid SWI_OS_NETBSD | 20
+#define SYS_kill SWI_OS_NETBSD | 37
+#define SYS_munmap SWI_OS_NETBSD | 73
+#define SYS_madvise SWI_OS_NETBSD | 75
+#define SYS_fcntl SWI_OS_NETBSD | 92
+#define SYS_mmap SWI_OS_NETBSD | 197
+#define SYS___sysctl SWI_OS_NETBSD | 202
+#define SYS___sigaltstack14 SWI_OS_NETBSD | 281
+#define SYS___sigprocmask14 SWI_OS_NETBSD | 293
+#define SYS_getcontext SWI_OS_NETBSD | 307
+#define SYS_setcontext SWI_OS_NETBSD | 308
+#define SYS__lwp_create SWI_OS_NETBSD | 309
+#define SYS__lwp_exit SWI_OS_NETBSD | 310
+#define SYS__lwp_self SWI_OS_NETBSD | 311
+#define SYS__lwp_getprivate SWI_OS_NETBSD | 316
+#define SYS__lwp_setprivate SWI_OS_NETBSD | 317
+#define SYS__lwp_kill SWI_OS_NETBSD | 318
+#define SYS__lwp_unpark SWI_OS_NETBSD | 321
+#define SYS___sigaction_sigtramp SWI_OS_NETBSD | 340
+#define SYS_kqueue SWI_OS_NETBSD | 344
+#define SYS_sched_yield SWI_OS_NETBSD | 350
+#define SYS___setitimer50 SWI_OS_NETBSD | 425
+#define SYS___clock_gettime50 SWI_OS_NETBSD | 427
+#define SYS___nanosleep50 SWI_OS_NETBSD | 430
+#define SYS___kevent50 SWI_OS_NETBSD | 435
+#define SYS____lwp_park60 SWI_OS_NETBSD | 478
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0
+ MOVW code+0(FP), R0 // arg 1 exit status
+ SWI $SYS_exit
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-4
+ MOVW wait+0(FP), R0
+ // We're done using the stack.
+ MOVW $0, R2
+storeloop:
+ LDREX (R0), R4 // loads R4
+ STREX R2, (R0), R1 // stores R2
+ CMP $0, R1
+ BNE storeloop
+ SWI $SYS__lwp_exit
+ MOVW $1, R8 // crash
+ MOVW R8, (R8)
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0
+ MOVW name+0(FP), R0
+ MOVW mode+4(FP), R1
+ MOVW perm+8(FP), R2
+ SWI $SYS_open
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0
+ SWI $SYS_close
+ MOVW.CS $-1, R0
+ MOVW R0, ret+4(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0
+ MOVW p+4(FP), R1
+ MOVW n+8(FP), R2
+ SWI $SYS_read
+ RSB.CS $0, R0 // caller expects negative errno
+ MOVW R0, ret+12(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT,$0-12
+	SWI $0xa0002a	// sys_pipe
+ BCC pipeok
+ MOVW $-1,R2
+ MOVW R2, r+0(FP)
+ MOVW R2, w+4(FP)
+ MOVW R0, errno+8(FP)
+ RET
+pipeok:
+ MOVW $0, R2
+ MOVW R0, r+0(FP)
+ MOVW R1, w+4(FP)
+ MOVW R2, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-16
+ MOVW $r+4(FP), R0
+ MOVW flags+0(FP), R1
+	SWI $0xa001c5	// sys_pipe2
+ MOVW R0, errno+12(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0 // arg 1 - fd
+ MOVW p+4(FP), R1 // arg 2 - buf
+ MOVW n+8(FP), R2 // arg 3 - nbyte
+ SWI $SYS_write
+ RSB.CS $0, R0 // caller expects negative errno
+ MOVW R0, ret+12(FP)
+ RET
+
+// int32 lwp_create(void *context, uintptr flags, void *lwpid)
+TEXT runtime·lwp_create(SB),NOSPLIT,$0
+ MOVW ctxt+0(FP), R0
+ MOVW flags+4(FP), R1
+ MOVW lwpid+8(FP), R2
+ SWI $SYS__lwp_create
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ SWI $SYS_sched_yield
+ RET
+
+TEXT runtime·lwp_park(SB),NOSPLIT,$8
+ MOVW clockid+0(FP), R0 // arg 1 - clock_id
+ MOVW flags+4(FP), R1 // arg 2 - flags
+ MOVW ts+8(FP), R2 // arg 3 - ts
+ MOVW unpark+12(FP), R3 // arg 4 - unpark
+ MOVW hint+16(FP), R4 // arg 5 - hint
+ MOVW R4, 4(R13)
+ MOVW unparkhint+20(FP), R5 // arg 6 - unparkhint
+ MOVW R5, 8(R13)
+ SWI $SYS____lwp_park60
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·lwp_unpark(SB),NOSPLIT,$0
+ MOVW lwp+0(FP), R0 // arg 1 - lwp
+ MOVW hint+4(FP), R1 // arg 2 - hint
+ SWI $SYS__lwp_unpark
+ MOVW R0, ret+8(FP)
+ RET
+
+TEXT runtime·lwp_self(SB),NOSPLIT,$0
+ SWI $SYS__lwp_self
+ MOVW R0, ret+0(FP)
+ RET
+
+TEXT runtime·lwp_tramp(SB),NOSPLIT,$0
+ MOVW R0, g_m(R1)
+ MOVW R1, g
+
+ BL runtime·emptyfunc(SB) // fault if stack check is wrong
+ BL (R2)
+ MOVW $2, R8 // crash (not reached)
+ MOVW R8, (R8)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16
+ MOVW usec+0(FP), R0
+ CALL runtime·usplitR0(SB)
+ // 0(R13) is the saved LR, don't use it
+ MOVW R0, 4(R13) // tv_sec.low
+ MOVW $0, R0
+ MOVW R0, 8(R13) // tv_sec.high
+ MOVW $1000, R2
+ MUL R1, R2
+ MOVW R2, 12(R13) // tv_nsec
+
+ MOVW $4(R13), R0 // arg 1 - rqtp
+ MOVW $0, R1 // arg 2 - rmtp
+ SWI $SYS___nanosleep50
+ RET
+
+TEXT runtime·lwp_kill(SB),NOSPLIT,$0-8
+ MOVW tid+0(FP), R0 // arg 1 - tid
+ MOVW sig+4(FP), R1 // arg 2 - signal
+ SWI $SYS__lwp_kill
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$16
+ SWI $SYS_getpid // the returned R0 is arg 1
+ MOVW sig+0(FP), R1 // arg 2 - signal
+ SWI $SYS_kill
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0
+ MOVW mode+0(FP), R0 // arg 1 - which
+ MOVW new+4(FP), R1 // arg 2 - itv
+ MOVW old+8(FP), R2 // arg 3 - oitv
+ SWI $SYS___setitimer50
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB), NOSPLIT, $32
+ MOVW $0, R0 // CLOCK_REALTIME
+ MOVW $8(R13), R1
+ SWI $SYS___clock_gettime50
+
+ MOVW 8(R13), R0 // sec.low
+ MOVW 12(R13), R1 // sec.high
+ MOVW 16(R13), R2 // nsec
+
+ MOVW R0, sec_lo+0(FP)
+ MOVW R1, sec_hi+4(FP)
+ MOVW R2, nsec+8(FP)
+ RET
+
+// int64 nanotime1(void) so really
+// void nanotime1(int64 *nsec)
+TEXT runtime·nanotime1(SB), NOSPLIT, $32
+ MOVW $3, R0 // CLOCK_MONOTONIC
+ MOVW $8(R13), R1
+ SWI $SYS___clock_gettime50
+
+ MOVW 8(R13), R0 // sec.low
+ MOVW 12(R13), R4 // sec.high
+ MOVW 16(R13), R2 // nsec
+
+ MOVW $1000000000, R3
+ MULLU R0, R3, (R1, R0)
+ MUL R3, R4
+ ADD.S R2, R0
+ ADC R4, R1
+
+ MOVW R0, ret_lo+0(FP)
+ MOVW R1, ret_hi+4(FP)
+ RET
+
+TEXT runtime·getcontext(SB),NOSPLIT|NOFRAME,$0
+ MOVW ctxt+0(FP), R0 // arg 1 - context
+ SWI $SYS_getcontext
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$0
+ MOVW how+0(FP), R0 // arg 1 - how
+ MOVW new+4(FP), R1 // arg 2 - set
+ MOVW old+8(FP), R2 // arg 3 - oset
+ SWI $SYS___sigprocmask14
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT sigreturn_tramp<>(SB),NOSPLIT|NOFRAME,$0
+ // on entry, SP points to siginfo, we add sizeof(ucontext)
+ // to SP to get a pointer to ucontext.
+ ADD $0x80, R13, R0 // 0x80 == sizeof(UcontextT)
+ SWI $SYS_setcontext
+ // something failed, we have to exit
+ MOVW $0x4242, R0 // magic return number
+ SWI $SYS_exit
+ B -2(PC) // continue exit
+
+TEXT runtime·sigaction(SB),NOSPLIT,$4
+ MOVW sig+0(FP), R0 // arg 1 - signum
+ MOVW new+4(FP), R1 // arg 2 - nsa
+ MOVW old+8(FP), R2 // arg 3 - osa
+ MOVW $sigreturn_tramp<>(SB), R3 // arg 4 - tramp
+ MOVW $2, R4 // arg 5 - vers
+ MOVW R4, 4(R13)
+ ADD $4, R13 // pass arg 5 on stack
+ SWI $SYS___sigaction_sigtramp
+ SUB $4, R13
+ MOVW.CS $3, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-16
+ MOVW sig+4(FP), R0
+ MOVW info+8(FP), R1
+ MOVW ctx+12(FP), R2
+ MOVW fn+0(FP), R11
+ MOVW R13, R4
+ SUB $24, R13
+ BIC $0x7, R13 // alignment for ELF ABI
+ BL (R11)
+ MOVW R4, R13
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$0
+ // Reserve space for callee-save registers and arguments.
+ MOVM.DB.W [R4-R11], (R13)
+ SUB $16, R13
+
+ // this might be called in external code context,
+ // where g is not set.
+ // first save R0, because runtime·load_g will clobber it
+ MOVW R0, 4(R13) // signum
+ MOVB runtime·iscgo(SB), R0
+ CMP $0, R0
+ BL.NE runtime·load_g(SB)
+
+ MOVW R1, 8(R13)
+ MOVW R2, 12(R13)
+ BL runtime·sigtrampgo(SB)
+
+ // Restore callee-save registers.
+ ADD $16, R13
+ MOVM.IA.W (R13), [R4-R11]
+
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$12
+ MOVW addr+0(FP), R0 // arg 1 - addr
+ MOVW n+4(FP), R1 // arg 2 - len
+ MOVW prot+8(FP), R2 // arg 3 - prot
+ MOVW flags+12(FP), R3 // arg 4 - flags
+	// arg 5 (fd) and arg 6 (offset_lo, offset_hi) are passed on the stack
+ // note the C runtime only passes the 32-bit offset_lo to us
+ MOVW fd+16(FP), R4 // arg 5
+ MOVW R4, 4(R13)
+ MOVW off+20(FP), R5 // arg 6 lower 32-bit
+ MOVW R5, 8(R13)
+ MOVW $0, R6 // higher 32-bit for arg 6
+ MOVW R6, 12(R13)
+ ADD $4, R13 // pass arg 5 and arg 6 on stack
+ SWI $SYS_mmap
+ SUB $4, R13
+ MOVW $0, R1
+ MOVW.CS R0, R1 // if error, move to R1
+ MOVW.CS $0, R0
+ MOVW R0, p+24(FP)
+ MOVW R1, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0 // arg 1 - addr
+ MOVW n+4(FP), R1 // arg 2 - len
+ SWI $SYS_munmap
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0 // arg 1 - addr
+ MOVW n+4(FP), R1 // arg 2 - len
+ MOVW flags+8(FP), R2 // arg 3 - behav
+ SWI $SYS_madvise
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVW new+0(FP), R0 // arg 1 - nss
+ MOVW old+4(FP), R1 // arg 2 - oss
+ SWI $SYS___sigaltstack14
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$8
+ MOVW mib+0(FP), R0 // arg 1 - name
+ MOVW miblen+4(FP), R1 // arg 2 - namelen
+ MOVW out+8(FP), R2 // arg 3 - oldp
+ MOVW size+12(FP), R3 // arg 4 - oldlenp
+ MOVW dst+16(FP), R4 // arg 5 - newp
+ MOVW R4, 4(R13)
+ MOVW ndst+20(FP), R4 // arg 6 - newlen
+ MOVW R4, 8(R13)
+ ADD $4, R13 // pass arg 5 and 6 on stack
+ SWI $SYS___sysctl
+ SUB $4, R13
+ MOVW R0, ret+24(FP)
+ RET
+
+// int32 runtime·kqueue(void)
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ SWI $SYS_kqueue
+ RSB.CS $0, R0
+ MOVW R0, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout)
+TEXT runtime·kevent(SB),NOSPLIT,$8
+ MOVW kq+0(FP), R0 // kq
+ MOVW ch+4(FP), R1 // changelist
+ MOVW nch+8(FP), R2 // nchanges
+ MOVW ev+12(FP), R3 // eventlist
+ MOVW nev+16(FP), R4 // nevents
+ MOVW R4, 4(R13)
+ MOVW ts+20(FP), R4 // timeout
+ MOVW R4, 8(R13)
+ ADD $4, R13 // pass arg 5 and 6 on stack
+ SWI $SYS___kevent50
+ RSB.CS $0, R0
+ SUB $4, R13
+ MOVW R0, ret+24(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd)
+TEXT runtime·closeonexec(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0 // fd
+ MOVW $F_SETFD, R1 // F_SETFD
+ MOVW $FD_CLOEXEC, R2 // FD_CLOEXEC
+ SWI $SYS_fcntl
+ RET
+
+// func runtime·setNonblock(fd int32)
+TEXT runtime·setNonblock(SB),NOSPLIT,$0-4
+ MOVW fd+0(FP), R0 // fd
+ MOVW $3, R1 // F_GETFL
+ MOVW $0, R2
+ SWI $0xa0005c // sys_fcntl
+ ORR $0x4, R0, R2 // O_NONBLOCK
+ MOVW fd+0(FP), R0 // fd
+ MOVW $4, R1 // F_SETFL
+ SWI $0xa0005c // sys_fcntl
+ RET
+
+// TODO: this is only valid for ARMv7+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ B runtime·armPublicationBarrier(SB)
+
+TEXT runtime·read_tls_fallback(SB),NOSPLIT|NOFRAME,$0
+ MOVM.WP [R1, R2, R3, R12], (R13)
+ SWI $SYS__lwp_getprivate
+ MOVM.IAW (R13), [R1, R2, R3, R12]
+ RET
diff --git a/src/runtime/sys_netbsd_arm64.s b/src/runtime/sys_netbsd_arm64.s
new file mode 100644
index 0000000..4d9b054
--- /dev/null
+++ b/src/runtime/sys_netbsd_arm64.s
@@ -0,0 +1,477 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for arm64, NetBSD
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 3
+#define FD_CLOEXEC 1
+#define F_SETFD 2
+#define F_GETFL 3
+#define F_SETFL 4
+#define O_NONBLOCK 4
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_munmap 73
+#define SYS_madvise 75
+#define SYS_fcntl 92
+#define SYS_mmap 197
+#define SYS___sysctl 202
+#define SYS___sigaltstack14 281
+#define SYS___sigprocmask14 293
+#define SYS_getcontext 307
+#define SYS_setcontext 308
+#define SYS__lwp_create 309
+#define SYS__lwp_exit 310
+#define SYS__lwp_self 311
+#define SYS__lwp_kill 318
+#define SYS__lwp_unpark 321
+#define SYS___sigaction_sigtramp 340
+#define SYS_kqueue 344
+#define SYS_sched_yield 350
+#define SYS___setitimer50 425
+#define SYS___clock_gettime50 427
+#define SYS___nanosleep50 430
+#define SYS___kevent50 435
+#define SYS_pipe2 453
+#define SYS_openat 468
+#define SYS____lwp_park60 478
+
+// int32 lwp_create(void *context, uintptr flags, void *lwpid)
+TEXT runtime·lwp_create(SB),NOSPLIT,$0
+ MOVD ctxt+0(FP), R0
+ MOVD flags+8(FP), R1
+ MOVD lwpid+16(FP), R2
+ SVC $SYS__lwp_create
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·lwp_tramp(SB),NOSPLIT,$0
+ CMP $0, R1
+ BEQ nog
+ CMP $0, R2
+ BEQ nog
+
+ MOVD R0, g_m(R1)
+ MOVD R1, g
+nog:
+ CALL (R2)
+
+ MOVD $0, R0 // crash (not reached)
+	MOVD R0, (R0)
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ SVC $SYS_sched_yield
+ RET
+
+TEXT runtime·lwp_park(SB),NOSPLIT,$0
+ MOVW clockid+0(FP), R0 // arg 1 - clockid
+ MOVW flags+4(FP), R1 // arg 2 - flags
+ MOVD ts+8(FP), R2 // arg 3 - ts
+ MOVW unpark+16(FP), R3 // arg 4 - unpark
+ MOVD hint+24(FP), R4 // arg 5 - hint
+ MOVD unparkhint+32(FP), R5 // arg 6 - unparkhint
+ SVC $SYS____lwp_park60
+ MOVW R0, ret+40(FP)
+ RET
+
+TEXT runtime·lwp_unpark(SB),NOSPLIT,$0
+ MOVW lwp+0(FP), R0 // arg 1 - lwp
+ MOVD hint+8(FP), R1 // arg 2 - hint
+ SVC $SYS__lwp_unpark
+ MOVW R0, ret+16(FP)
+ RET
+
+TEXT runtime·lwp_self(SB),NOSPLIT,$0
+ SVC $SYS__lwp_self
+ MOVW R0, ret+0(FP)
+ RET
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT,$-8
+ MOVW code+0(FP), R0 // arg 1 - exit status
+ SVC $SYS_exit
+ MOVD $0, R0 // If we're still running,
+ MOVD R0, (R0) // crash
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-8
+ MOVD wait+0(FP), R0
+ // We're done using the stack.
+ MOVW $0, R1
+ STLRW R1, (R0)
+ SVC $SYS__lwp_exit
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$-8
+ MOVD name+0(FP), R0 // arg 1 - pathname
+ MOVW mode+8(FP), R1 // arg 2 - flags
+ MOVW perm+12(FP), R2 // arg 3 - mode
+ SVC $SYS_open
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$-8
+ MOVW fd+0(FP), R0 // arg 1 - fd
+ SVC $SYS_close
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+8(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0 // arg 1 - fd
+ MOVD p+8(FP), R1 // arg 2 - buf
+ MOVW n+16(FP), R2 // arg 3 - count
+ SVC $SYS_read
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+24(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT|NOFRAME,$0-12
+ ADD $8, RSP, R0
+ MOVW $0, R1
+ SVC $SYS_pipe2
+ BCC pipeok
+ NEG R0, R0
+pipeok:
+ MOVW R0, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ ADD $16, RSP, R0
+ MOVW flags+0(FP), R1
+ SVC $SYS_pipe2
+ BCC pipe2ok
+ NEG R0, R0
+pipe2ok:
+ MOVW R0, errno+16(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$-8
+ MOVD fd+0(FP), R0 // arg 1 - fd
+ MOVD p+8(FP), R1 // arg 2 - buf
+ MOVW n+16(FP), R2 // arg 3 - nbyte
+ SVC $SYS_write
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$24-4
+ MOVWU usec+0(FP), R3
+ MOVD R3, R5
+ MOVW $1000000, R4
+ UDIV R4, R3
+ MOVD R3, 8(RSP) // sec
+ MUL R3, R4
+ SUB R4, R5
+ MOVW $1000, R4
+ MUL R4, R5
+ MOVD R5, 16(RSP) // nsec
+
+ MOVD $8(RSP), R0 // arg 1 - rqtp
+ MOVD $0, R1 // arg 2 - rmtp
+ SVC $SYS___nanosleep50
+ RET
+
+TEXT runtime·lwp_kill(SB),NOSPLIT,$0-16
+ MOVW tid+0(FP), R0 // arg 1 - target
+ MOVD sig+8(FP), R1 // arg 2 - signo
+ SVC $SYS__lwp_kill
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$16
+ SVC $SYS_getpid
+ // arg 1 - pid (from getpid)
+ MOVD sig+0(FP), R1 // arg 2 - signo
+ SVC $SYS_kill
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$-8
+ MOVW mode+0(FP), R0 // arg 1 - which
+ MOVD new+8(FP), R1 // arg 2 - itv
+ MOVD old+16(FP), R2 // arg 3 - oitv
+ SVC $SYS___setitimer50
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB), NOSPLIT, $32
+ MOVW $CLOCK_REALTIME, R0 // arg 1 - clock_id
+ MOVD $8(RSP), R1 // arg 2 - tp
+ SVC $SYS___clock_gettime50
+
+ MOVD 8(RSP), R0 // sec
+ MOVD 16(RSP), R1 // nsec
+
+ // sec is in R0, nsec in R1
+ MOVD R0, sec+0(FP)
+ MOVW R1, nsec+8(FP)
+ RET
+
+// int64 nanotime1(void) so really
+// void nanotime1(int64 *nsec)
+TEXT runtime·nanotime1(SB), NOSPLIT, $32
+ MOVD $CLOCK_MONOTONIC, R0 // arg 1 - clock_id
+ MOVD $8(RSP), R1 // arg 2 - tp
+ SVC $SYS___clock_gettime50
+ MOVD 8(RSP), R0 // sec
+ MOVD 16(RSP), R2 // nsec
+
+ // sec is in R0, nsec in R2
+ // return nsec in R2
+ MOVD $1000000000, R3
+ MUL R3, R0
+ ADD R2, R0
+
+ MOVD R0, ret+0(FP)
+ RET
+
+TEXT runtime·getcontext(SB),NOSPLIT,$-8
+ MOVD ctxt+0(FP), R0 // arg 1 - context
+ SVC $SYS_getcontext
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$0
+ MOVW how+0(FP), R0 // arg 1 - how
+ MOVD new+8(FP), R1 // arg 2 - set
+ MOVD old+16(FP), R2 // arg 3 - oset
+ SVC $SYS___sigprocmask14
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+TEXT sigreturn_tramp<>(SB),NOSPLIT,$-8
+ MOVD g, R0
+ SVC $SYS_setcontext
+ MOVD $0x4242, R0 // Something failed, return magic number
+ SVC $SYS_exit
+
+TEXT runtime·sigaction(SB),NOSPLIT,$-8
+ MOVW sig+0(FP), R0 // arg 1 - signum
+ MOVD new+8(FP), R1 // arg 2 - nsa
+ MOVD old+16(FP), R2 // arg 3 - osa
+ // arg 4 - tramp
+ MOVD $sigreturn_tramp<>(SB), R3
+ MOVW $2, R4 // arg 5 - vers
+ SVC $SYS___sigaction_sigtramp
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+// XXX ???
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R0
+ MOVD info+16(FP), R1
+ MOVD ctx+24(FP), R2
+ MOVD fn+0(FP), R11
+ BL (R11)
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$192
+ // Save callee-save registers in the case of signal forwarding.
+ // Please refer to https://golang.org/issue/31827 .
+ MOVD R19, 8*4(RSP)
+ MOVD R20, 8*5(RSP)
+ MOVD R21, 8*6(RSP)
+ MOVD R22, 8*7(RSP)
+ MOVD R23, 8*8(RSP)
+ MOVD R24, 8*9(RSP)
+ MOVD R25, 8*10(RSP)
+ MOVD R26, 8*11(RSP)
+ MOVD R27, 8*12(RSP)
+ MOVD g, 8*13(RSP)
+ // Unclobber g for now (kernel uses it as ucontext ptr)
+ // See https://github.com/golang/go/issues/30824#issuecomment-492772426
+ // This is only correct in the non-cgo case.
+ // XXX should use lwp_getprivate as suggested.
+ // 8*36 is ucontext.uc_mcontext.__gregs[_REG_X28]
+ MOVD 8*36(g), g
+ MOVD R29, 8*14(RSP)
+ FMOVD F8, 8*15(RSP)
+ FMOVD F9, 8*16(RSP)
+ FMOVD F10, 8*17(RSP)
+ FMOVD F11, 8*18(RSP)
+ FMOVD F12, 8*19(RSP)
+ FMOVD F13, 8*20(RSP)
+ FMOVD F14, 8*21(RSP)
+ FMOVD F15, 8*22(RSP)
+
+ // this might be called in external code context,
+ // where g is not set.
+ // first save R0, because runtime·load_g will clobber it
+ MOVD R0, 8(RSP) // signum
+ MOVB runtime·iscgo(SB), R0
+ CMP $0, R0
+ // XXX branch destination
+ BEQ 2(PC)
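+	// (the branch above skips the load_g call unless this is a cgo binary)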
+ BL runtime·load_g(SB)
+
+ MOVD R1, 16(RSP)
+ MOVD R2, 24(RSP)
+ BL runtime·sigtrampgo(SB)
+
+ // Restore callee-save registers.
+ MOVD 8*4(RSP), R19
+ MOVD 8*5(RSP), R20
+ MOVD 8*6(RSP), R21
+ MOVD 8*7(RSP), R22
+ MOVD 8*8(RSP), R23
+ MOVD 8*9(RSP), R24
+ MOVD 8*10(RSP), R25
+ MOVD 8*11(RSP), R26
+ MOVD 8*12(RSP), R27
+ MOVD 8*13(RSP), g
+ MOVD 8*14(RSP), R29
+ FMOVD 8*15(RSP), F8
+ FMOVD 8*16(RSP), F9
+ FMOVD 8*17(RSP), F10
+ FMOVD 8*18(RSP), F11
+ FMOVD 8*19(RSP), F12
+ FMOVD 8*20(RSP), F13
+ FMOVD 8*21(RSP), F14
+ FMOVD 8*22(RSP), F15
+
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$0
+ MOVD addr+0(FP), R0 // arg 1 - addr
+ MOVD n+8(FP), R1 // arg 2 - len
+ MOVW prot+16(FP), R2 // arg 3 - prot
+ MOVW flags+20(FP), R3 // arg 4 - flags
+ MOVW fd+24(FP), R4 // arg 5 - fd
+ MOVW $0, R5 // arg 6 - pad
+ MOVD off+28(FP), R6 // arg 7 - offset
+ SVC $SYS_mmap
+ BCS fail
+ MOVD R0, p+32(FP)
+ MOVD $0, err+40(FP)
+ RET
+fail:
+ MOVD $0, p+32(FP)
+ MOVD R0, err+40(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVD addr+0(FP), R0 // arg 1 - addr
+ MOVD n+8(FP), R1 // arg 2 - len
+ SVC $SYS_munmap
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVD addr+0(FP), R0 // arg 1 - addr
+ MOVD n+8(FP), R1 // arg 2 - len
+ MOVW flags+16(FP), R2 // arg 3 - behav
+ SVC $SYS_madvise
+ BCC ok
+ MOVD $-1, R0
+ok:
+ MOVD R0, ret+24(FP)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$0
+ MOVD new+0(FP), R0 // arg 1 - nss
+ MOVD old+8(FP), R1 // arg 2 - oss
+ SVC $SYS___sigaltstack14
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOVD mib+0(FP), R0 // arg 1 - name
+ MOVW miblen+8(FP), R1 // arg 2 - namelen
+ MOVD out+16(FP), R2 // arg 3 - oldp
+ MOVD size+24(FP), R3 // arg 4 - oldlenp
+ MOVD dst+32(FP), R4 // arg 5 - newp
+ MOVD ndst+40(FP), R5 // arg 6 - newlen
+ SVC $SYS___sysctl
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+48(FP)
+ RET
+
+// int32 runtime·kqueue(void)
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVD $0, R0
+ SVC $SYS_kqueue
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout)
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVW kq+0(FP), R0 // arg 1 - kq
+ MOVD ch+8(FP), R1 // arg 2 - changelist
+ MOVW nch+16(FP), R2 // arg 3 - nchanges
+ MOVD ev+24(FP), R3 // arg 4 - eventlist
+ MOVW nev+32(FP), R4 // arg 5 - nevents
+ MOVD ts+40(FP), R5 // arg 6 - timeout
+ SVC $SYS___kevent50
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+48(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd)
+TEXT runtime·closeonexec(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0 // arg 1 - fd
+ MOVW $F_SETFD, R1
+ MOVW $FD_CLOEXEC, R2
+ SVC $SYS_fcntl
+ RET
+
+// func runtime·setNonblock(int32 fd)
+TEXT runtime·setNonblock(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW fd+0(FP), R0 // arg 1 - fd
+ MOVD $F_GETFL, R1 // arg 2 - cmd
+ MOVD $0, R2 // arg 3
+ SVC $SYS_fcntl
+ MOVD $O_NONBLOCK, R2
+ EOR R0, R2 // arg 3 - flags
+ MOVW fd+0(FP), R0 // arg 1 - fd
+ MOVD $F_SETFL, R1 // arg 2 - cmd
+ SVC $SYS_fcntl
+ RET
diff --git a/src/runtime/sys_nonppc64x.go b/src/runtime/sys_nonppc64x.go
new file mode 100644
index 0000000..4409374
--- /dev/null
+++ b/src/runtime/sys_nonppc64x.go
@@ -0,0 +1,10 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !ppc64,!ppc64le
+
+package runtime
+
+func prepGoExitFrame(sp uintptr) {
+}
diff --git a/src/runtime/sys_openbsd.go b/src/runtime/sys_openbsd.go
new file mode 100644
index 0000000..fcddf4d
--- /dev/null
+++ b/src/runtime/sys_openbsd.go
@@ -0,0 +1,60 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build openbsd,amd64 openbsd,arm64
+
+package runtime
+
+import "unsafe"
+
+// The *_trampoline functions convert from the Go calling convention to the C calling convention
+// and then call the underlying libc function. These are defined in sys_openbsd_$ARCH.s.
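+// Each wrapper passes the address of its first argument; the trampoline then
+// locates any remaining arguments relative to that pointer, which is why the
+// wrappers are marked go:cgo_unsafe_args.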
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_attr_init(attr *pthreadattr) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_attr_init_trampoline)), unsafe.Pointer(&attr))
+}
+func pthread_attr_init_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_attr_destroy(attr *pthreadattr) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_attr_destroy_trampoline)), unsafe.Pointer(&attr))
+}
+func pthread_attr_destroy_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_attr_getstacksize(attr *pthreadattr, size *uintptr) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_attr_getstacksize_trampoline)), unsafe.Pointer(&attr))
+}
+func pthread_attr_getstacksize_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_attr_setdetachstate(attr *pthreadattr, state int) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_attr_setdetachstate_trampoline)), unsafe.Pointer(&attr))
+}
+func pthread_attr_setdetachstate_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_create(attr *pthreadattr, start uintptr, arg unsafe.Pointer) int32 {
+ return libcCall(unsafe.Pointer(funcPC(pthread_create_trampoline)), unsafe.Pointer(&attr))
+}
+func pthread_create_trampoline()
+
+// Tell the linker that the libc_* functions are to be found
+// in a system library, with the libc_ prefix missing.
+
+//go:cgo_import_dynamic libc_pthread_attr_init pthread_attr_init "libpthread.so"
+//go:cgo_import_dynamic libc_pthread_attr_destroy pthread_attr_destroy "libpthread.so"
+//go:cgo_import_dynamic libc_pthread_attr_getstacksize pthread_attr_getstacksize "libpthread.so"
+//go:cgo_import_dynamic libc_pthread_attr_setdetachstate pthread_attr_setdetachstate "libpthread.so"
+//go:cgo_import_dynamic libc_pthread_create pthread_create "libpthread.so"
+//go:cgo_import_dynamic libc_pthread_sigmask pthread_sigmask "libpthread.so"
+
+//go:cgo_import_dynamic _ _ "libpthread.so"
+//go:cgo_import_dynamic _ _ "libc.so"
diff --git a/src/runtime/sys_openbsd1.go b/src/runtime/sys_openbsd1.go
new file mode 100644
index 0000000..e288621
--- /dev/null
+++ b/src/runtime/sys_openbsd1.go
@@ -0,0 +1,34 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build openbsd,amd64 openbsd,arm64
+
+package runtime
+
+import "unsafe"
+
+//go:nosplit
+//go:cgo_unsafe_args
+func thrsleep(ident uintptr, clock_id int32, tsp *timespec, lock uintptr, abort *uint32) int32 {
+ return libcCall(unsafe.Pointer(funcPC(thrsleep_trampoline)), unsafe.Pointer(&ident))
+}
+func thrsleep_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func thrwakeup(ident uintptr, n int32) int32 {
+ return libcCall(unsafe.Pointer(funcPC(thrwakeup_trampoline)), unsafe.Pointer(&ident))
+}
+func thrwakeup_trampoline()
+
+func osyield() {
+ libcCall(unsafe.Pointer(funcPC(sched_yield_trampoline)), unsafe.Pointer(nil))
+}
+func sched_yield_trampoline()
+
+//go:cgo_import_dynamic libc_thrsleep __thrsleep "libc.so"
+//go:cgo_import_dynamic libc_thrwakeup __thrwakeup "libc.so"
+//go:cgo_import_dynamic libc_sched_yield sched_yield "libc.so"
+
+//go:cgo_import_dynamic _ _ "libc.so"
diff --git a/src/runtime/sys_openbsd2.go b/src/runtime/sys_openbsd2.go
new file mode 100644
index 0000000..474e714
--- /dev/null
+++ b/src/runtime/sys_openbsd2.go
@@ -0,0 +1,250 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build openbsd,amd64 openbsd,arm64
+
+package runtime
+
+import "unsafe"
+
+// This is exported via linkname to assembly in runtime/cgo.
+//go:linkname exit
+//go:nosplit
+//go:cgo_unsafe_args
+func exit(code int32) {
+ libcCall(unsafe.Pointer(funcPC(exit_trampoline)), unsafe.Pointer(&code))
+}
+func exit_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func getthrid() (tid int32) {
+ libcCall(unsafe.Pointer(funcPC(getthrid_trampoline)), unsafe.Pointer(&tid))
+ return
+}
+func getthrid_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func raiseproc(sig uint32) {
+ libcCall(unsafe.Pointer(funcPC(raiseproc_trampoline)), unsafe.Pointer(&sig))
+}
+func raiseproc_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func thrkill(tid int32, sig int) {
+ libcCall(unsafe.Pointer(funcPC(thrkill_trampoline)), unsafe.Pointer(&tid))
+}
+func thrkill_trampoline()
+
+// mmap is used to do low-level memory allocation via mmap. Don't allow stack
+// splits, since this function (used by sysAlloc) is called in a lot of low-level
+// parts of the runtime and callers often assume it won't acquire any locks.
+//go:nosplit
+func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (unsafe.Pointer, int) {
+ args := struct {
+ addr unsafe.Pointer
+ n uintptr
+ prot, flags, fd int32
+ off uint32
+ ret1 unsafe.Pointer
+ ret2 int
+ }{addr, n, prot, flags, fd, off, nil, 0}
+ libcCall(unsafe.Pointer(funcPC(mmap_trampoline)), unsafe.Pointer(&args))
+ return args.ret1, args.ret2
+}
+func mmap_trampoline()
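The pack-arguments-and-results-into-one-struct pattern used by mmap above can be sketched in plain Go. Everything below is illustrative: mmapArgs, fakeTrampoline, and its error value are hypothetical stand-ins for what mmap_trampoline does on the assembly side.

package main

import (
	"fmt"
	"unsafe"
)

// mmapArgs mirrors the anonymous struct above: inputs first, then the two
// result slots the trampoline fills in.
type mmapArgs struct {
	addr            unsafe.Pointer
	n               uintptr
	prot, flags, fd int32
	off             uint32
	ret1            unsafe.Pointer
	ret2            int
}

// fakeTrampoline receives a single pointer, unpacks the arguments from the
// struct it points at, and writes the results back into the same struct.
func fakeTrampoline(p unsafe.Pointer) {
	a := (*mmapArgs)(p)
	// Pretend the call failed with ENOMEM (12) so both result fields are used.
	a.ret1 = nil
	a.ret2 = 12
}

func main() {
	args := mmapArgs{n: 4096, prot: 3, flags: 0x1002, fd: -1}
	fakeTrampoline(unsafe.Pointer(&args))
	fmt.Println(args.ret1, args.ret2) // <nil> 12
}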
+
+//go:nosplit
+//go:cgo_unsafe_args
+func munmap(addr unsafe.Pointer, n uintptr) {
+ libcCall(unsafe.Pointer(funcPC(munmap_trampoline)), unsafe.Pointer(&addr))
+}
+func munmap_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func madvise(addr unsafe.Pointer, n uintptr, flags int32) {
+ libcCall(unsafe.Pointer(funcPC(madvise_trampoline)), unsafe.Pointer(&addr))
+}
+func madvise_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func open(name *byte, mode, perm int32) (ret int32) {
+ return libcCall(unsafe.Pointer(funcPC(open_trampoline)), unsafe.Pointer(&name))
+}
+func open_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func closefd(fd int32) int32 {
+ return libcCall(unsafe.Pointer(funcPC(close_trampoline)), unsafe.Pointer(&fd))
+}
+func close_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func read(fd int32, p unsafe.Pointer, n int32) int32 {
+ return libcCall(unsafe.Pointer(funcPC(read_trampoline)), unsafe.Pointer(&fd))
+}
+func read_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func write1(fd uintptr, p unsafe.Pointer, n int32) int32 {
+ return libcCall(unsafe.Pointer(funcPC(write_trampoline)), unsafe.Pointer(&fd))
+}
+func write_trampoline()
+
+func pipe() (r, w int32, errno int32) {
+ return pipe2(0)
+}
+
+func pipe2(flags int32) (r, w int32, errno int32) {
+ var p [2]int32
+ args := struct {
+ p unsafe.Pointer
+ flags int32
+ }{noescape(unsafe.Pointer(&p)), flags}
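+	// noescape hides &p from escape analysis so the two-element array can
+	// stay on the stack.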
+ errno = libcCall(unsafe.Pointer(funcPC(pipe2_trampoline)), unsafe.Pointer(&args))
+ return p[0], p[1], errno
+}
+func pipe2_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func setitimer(mode int32, new, old *itimerval) {
+ libcCall(unsafe.Pointer(funcPC(setitimer_trampoline)), unsafe.Pointer(&mode))
+}
+func setitimer_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func usleep(usec uint32) {
+ libcCall(unsafe.Pointer(funcPC(usleep_trampoline)), unsafe.Pointer(&usec))
+}
+func usleep_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sysctl(mib *uint32, miblen uint32, out *byte, size *uintptr, dst *byte, ndst uintptr) int32 {
+ return libcCall(unsafe.Pointer(funcPC(sysctl_trampoline)), unsafe.Pointer(&mib))
+}
+func sysctl_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func fcntl(fd, cmd, arg int32) int32 {
+ return libcCall(unsafe.Pointer(funcPC(fcntl_trampoline)), unsafe.Pointer(&fd))
+}
+func fcntl_trampoline()
+
+//go:nosplit
+func nanotime1() int64 {
+ var ts timespec
+ args := struct {
+ clock_id int32
+ tp unsafe.Pointer
+ }{_CLOCK_MONOTONIC, unsafe.Pointer(&ts)}
+ libcCall(unsafe.Pointer(funcPC(clock_gettime_trampoline)), unsafe.Pointer(&args))
+ return ts.tv_sec*1e9 + int64(ts.tv_nsec)
+}
+func clock_gettime_trampoline()
+
+//go:nosplit
+func walltime1() (int64, int32) {
+ var ts timespec
+ args := struct {
+ clock_id int32
+ tp unsafe.Pointer
+ }{_CLOCK_REALTIME, unsafe.Pointer(&ts)}
+ libcCall(unsafe.Pointer(funcPC(clock_gettime_trampoline)), unsafe.Pointer(&args))
+ return ts.tv_sec, int32(ts.tv_nsec)
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+func kqueue() int32 {
+ return libcCall(unsafe.Pointer(funcPC(kqueue_trampoline)), nil)
+}
+func kqueue_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func kevent(kq int32, ch *keventt, nch int32, ev *keventt, nev int32, ts *timespec) int32 {
+ return libcCall(unsafe.Pointer(funcPC(kevent_trampoline)), unsafe.Pointer(&kq))
+}
+func kevent_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sigaction(sig uint32, new *sigactiont, old *sigactiont) {
+ libcCall(unsafe.Pointer(funcPC(sigaction_trampoline)), unsafe.Pointer(&sig))
+}
+func sigaction_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sigprocmask(how uint32, new *sigset, old *sigset) {
+ libcCall(unsafe.Pointer(funcPC(sigprocmask_trampoline)), unsafe.Pointer(&how))
+}
+func sigprocmask_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sigaltstack(new *stackt, old *stackt) {
+ libcCall(unsafe.Pointer(funcPC(sigaltstack_trampoline)), unsafe.Pointer(&new))
+}
+func sigaltstack_trampoline()
+
+// Not used on OpenBSD, but must be defined.
+func exitThread(wait *uint32) {
+}
+
+//go:nosplit
+func closeonexec(fd int32) {
+ fcntl(fd, _F_SETFD, _FD_CLOEXEC)
+}
+
+//go:nosplit
+func setNonblock(fd int32) {
+ flags := fcntl(fd, _F_GETFL, 0)
+ fcntl(fd, _F_SETFL, flags|_O_NONBLOCK)
+}
+
+// Tell the linker that the libc_* functions are to be found
+// in a system library, with the libc_ prefix missing.
+
+//go:cgo_import_dynamic libc_errno __errno "libc.so"
+//go:cgo_import_dynamic libc_exit exit "libc.so"
+//go:cgo_import_dynamic libc_getthrid getthrid "libc.so"
+//go:cgo_import_dynamic libc_sched_yield sched_yield "libc.so"
+//go:cgo_import_dynamic libc_thrkill thrkill "libc.so"
+
+//go:cgo_import_dynamic libc_mmap mmap "libc.so"
+//go:cgo_import_dynamic libc_munmap munmap "libc.so"
+//go:cgo_import_dynamic libc_madvise madvise "libc.so"
+
+//go:cgo_import_dynamic libc_open open "libc.so"
+//go:cgo_import_dynamic libc_close close "libc.so"
+//go:cgo_import_dynamic libc_read read "libc.so"
+//go:cgo_import_dynamic libc_write write "libc.so"
+//go:cgo_import_dynamic libc_pipe2 pipe2 "libc.so"
+
+//go:cgo_import_dynamic libc_clock_gettime clock_gettime "libc.so"
+//go:cgo_import_dynamic libc_setitimer setitimer "libc.so"
+//go:cgo_import_dynamic libc_usleep usleep "libc.so"
+//go:cgo_import_dynamic libc_sysctl sysctl "libc.so"
+//go:cgo_import_dynamic libc_fcntl fcntl "libc.so"
+//go:cgo_import_dynamic libc_getpid getpid "libc.so"
+//go:cgo_import_dynamic libc_kill kill "libc.so"
+//go:cgo_import_dynamic libc_kqueue kqueue "libc.so"
+//go:cgo_import_dynamic libc_kevent kevent "libc.so"
+
+//go:cgo_import_dynamic libc_sigaction sigaction "libc.so"
+//go:cgo_import_dynamic libc_sigaltstack sigaltstack "libc.so"
+
+//go:cgo_import_dynamic _ _ "libc.so"
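
Each directive above has the cmd/cgo form //go:cgo_import_dynamic <local> <remote> "<library>": at link time, references to the Go symbol <local> (for example libc_mmap) resolve dynamically to <remote> (mmap) in the named library. The final "_ _" entry makes the linker record libc.so itself as a needed library. As a purely illustrative example (not in this patch), importing getentropy the same way would look like:

    //go:cgo_import_dynamic libc_getentropy getentropy "libc.so"
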
diff --git a/src/runtime/sys_openbsd3.go b/src/runtime/sys_openbsd3.go
new file mode 100644
index 0000000..4d4c88e
--- /dev/null
+++ b/src/runtime/sys_openbsd3.go
@@ -0,0 +1,113 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build openbsd,amd64 openbsd,arm64
+
+package runtime
+
+import "unsafe"
+
+// The X versions of syscall expect the libc call to return a 64-bit result.
+// The non-X versions expect a 32-bit result.
+// This distinction is required because an error is indicated by returning -1,
+// and we need to know whether to check 32 or 64 bits of the result.
+// (Some libc functions that return 32 bits put junk in the upper 32 bits of AX.)
+
+//go:linkname syscall_syscall syscall.syscall
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscall(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(funcPC(syscall)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscall()
+
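
The assembly behind these functions (see sys_openbsd_amd64.s later in this diff) differs only in the width of the failure test described in the comment above. Roughly, in Go terms (editor's sketch, not part of the patch):

    // non-X forms: the call failed if the low 32 bits are -1.
    func failed32(r1 uintptr) bool { return int32(r1) == -1 }
    // X forms: the call failed only if the full 64-bit result is -1.
    func failed64(r1 uintptr) bool { return int64(r1) == -1 }
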
+//go:linkname syscall_syscallX syscall.syscallX
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscallX(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(funcPC(syscallX)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscallX()
+
+//go:linkname syscall_syscall6 syscall.syscall6
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscall6(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(funcPC(syscall6)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscall6()
+
+//go:linkname syscall_syscall6X syscall.syscall6X
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscall6X(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(funcPC(syscall6X)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscall6X()
+
+//go:linkname syscall_syscall10 syscall.syscall10
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscall10(fn, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(funcPC(syscall10)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscall10()
+
+//go:linkname syscall_syscall10X syscall.syscall10X
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscall10X(fn, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(funcPC(syscall10X)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscall10X()
+
+//go:linkname syscall_rawSyscall syscall.rawSyscall
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_rawSyscall(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ libcCall(unsafe.Pointer(funcPC(syscall)), unsafe.Pointer(&fn))
+ return
+}
+
+//go:linkname syscall_rawSyscall6 syscall.rawSyscall6
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_rawSyscall6(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ libcCall(unsafe.Pointer(funcPC(syscall6)), unsafe.Pointer(&fn))
+ return
+}
+
+//go:linkname syscall_rawSyscall6X syscall.rawSyscall6X
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_rawSyscall6X(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ libcCall(unsafe.Pointer(funcPC(syscall6X)), unsafe.Pointer(&fn))
+ return
+}
+
+//go:linkname syscall_rawSyscall10X syscall.rawSyscall10X
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_rawSyscall10X(fn, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10 uintptr) (r1, r2, err uintptr) {
+ libcCall(unsafe.Pointer(funcPC(syscall10X)), unsafe.Pointer(&fn))
+ return
+}
diff --git a/src/runtime/sys_openbsd_386.s b/src/runtime/sys_openbsd_386.s
new file mode 100644
index 0000000..24fbfd6
--- /dev/null
+++ b/src/runtime/sys_openbsd_386.s
@@ -0,0 +1,461 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for 386, OpenBSD.
+// See /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_MONOTONIC $3
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT,$-4
+ MOVL $1, AX
+ INT $0x80
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-4
+ MOVL $302, AX // sys___threxit
+ INT $0x80
+ MOVL $0xf1, 0xf1 // crash
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$-4
+ MOVL $5, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$-4
+ MOVL $6, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$-4
+ MOVL $3, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX // caller expects negative errno
+ MOVL AX, ret+12(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT,$8-12
+ MOVL $263, AX
+ LEAL r+0(FP), BX
+ MOVL BX, 4(SP)
+ INT $0x80
+ MOVL AX, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$12-16
+ MOVL $101, AX
+ LEAL r+4(FP), BX
+ MOVL BX, 4(SP)
+ MOVL flags+0(FP), BX
+ MOVL BX, 8(SP)
+ INT $0x80
+ MOVL AX, errno+12(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$-4
+ MOVL $4, AX // sys_write
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX // caller expects negative errno
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$24
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
+ MOVL AX, 12(SP) // tv_sec - l32
+ MOVL $0, 16(SP) // tv_sec - h32
+ MOVL $1000, AX
+ MULL DX
+ MOVL AX, 20(SP) // tv_nsec
+
+ MOVL $0, 0(SP)
+ LEAL 12(SP), AX
+ MOVL AX, 4(SP) // arg 1 - rqtp
+ MOVL $0, 8(SP) // arg 2 - rmtp
+ MOVL $91, AX // sys_nanosleep
+ INT $0x80
+ RET
+
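
For reference, the DIVL/MULL sequence above performs the usual microseconds-to-timespec split. In Go terms (editor's sketch):

    func usecToTimespec(usec uint32) (sec int64, nsec int32) {
    	sec = int64(usec / 1000000)       // DIVL quotient (AX)
    	nsec = int32(usec%1000000) * 1000 // DIVL remainder (DX), scaled to nanoseconds
    	return
    }
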
+TEXT runtime·getthrid(SB),NOSPLIT,$0-4
+ MOVL $299, AX // sys_getthrid
+ INT $0x80
+ MOVL AX, ret+0(FP)
+ RET
+
+TEXT runtime·thrkill(SB),NOSPLIT,$16-8
+ MOVL $0, 0(SP)
+ MOVL tid+0(FP), AX
+ MOVL AX, 4(SP) // arg 1 - tid
+ MOVL sig+4(FP), AX
+ MOVL AX, 8(SP) // arg 2 - signum
+ MOVL $0, 12(SP) // arg 3 - tcb
+ MOVL $119, AX // sys_thrkill
+ INT $0x80
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$12
+ MOVL $20, AX // sys_getpid
+ INT $0x80
+ MOVL $0, 0(SP)
+ MOVL AX, 4(SP) // arg 1 - pid
+ MOVL sig+0(FP), AX
+ MOVL AX, 8(SP) // arg 2 - signum
+ MOVL $122, AX // sys_kill
+ INT $0x80
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$36
+ LEAL addr+0(FP), SI
+ LEAL 4(SP), DI
+ CLD
+ MOVSL // arg 1 - addr
+ MOVSL // arg 2 - len
+ MOVSL // arg 3 - prot
+ MOVSL // arg 4 - flags
+ MOVSL // arg 5 - fd
+ MOVL $0, AX
+ STOSL // arg 6 - pad
+ MOVSL // arg 7 - offset
+ MOVL $0, AX // top 32 bits of file offset
+ STOSL
+ MOVL $197, AX // sys_mmap
+ INT $0x80
+ JAE ok
+ MOVL $0, p+24(FP)
+ MOVL AX, err+28(FP)
+ RET
+ok:
+ MOVL AX, p+24(FP)
+ MOVL $0, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$-4
+ MOVL $73, AX // sys_munmap
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$-4
+ MOVL $75, AX // sys_madvise
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$-4
+ MOVL $69, AX
+ INT $0x80
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB), NOSPLIT, $32
+ LEAL 12(SP), BX
+ MOVL $0, 4(SP) // arg 1 - clock_id
+ MOVL BX, 8(SP) // arg 2 - tp
+ MOVL $87, AX // sys_clock_gettime
+ INT $0x80
+
+ MOVL 12(SP), AX // sec - l32
+ MOVL AX, sec_lo+0(FP)
+ MOVL 16(SP), AX // sec - h32
+ MOVL AX, sec_hi+4(FP)
+
+ MOVL 20(SP), BX // nsec
+ MOVL BX, nsec+8(FP)
+ RET
+
+// int64 nanotime1(void) so really
+// void nanotime1(int64 *nsec)
+TEXT runtime·nanotime1(SB),NOSPLIT,$32
+ LEAL 12(SP), BX
+ MOVL CLOCK_MONOTONIC, 4(SP) // arg 1 - clock_id
+ MOVL BX, 8(SP) // arg 2 - tp
+ MOVL $87, AX // sys_clock_gettime
+ INT $0x80
+
+ MOVL 16(SP), CX // sec - h32
+ IMULL $1000000000, CX
+
+ MOVL 12(SP), AX // sec - l32
+ MOVL $1000000000, BX
+ MULL BX // result in dx:ax
+
+ MOVL 20(SP), BX // nsec
+ ADDL BX, AX
+ ADCL CX, DX // add high bits with carry
+
+ MOVL AX, ret_lo+0(FP)
+ MOVL DX, ret_hi+4(FP)
+ RET
+
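
The MULL/IMULL/ADCL sequence above is a 32-bit widening implementation of sec*1e9 + nsec. The equivalent 64-bit computation (editor's sketch):

    func timespecToNanos(secLo, secHi, nsec uint32) int64 {
    	sec := int64(secHi)<<32 | int64(secLo)
    	return sec*1000000000 + int64(nsec) // same result, modulo 64-bit overflow
    }
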
+TEXT runtime·sigaction(SB),NOSPLIT,$-4
+ MOVL $46, AX // sys_sigaction
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·obsdsigprocmask(SB),NOSPLIT,$-4
+ MOVL $48, AX // sys_sigprocmask
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$12-16
+ MOVL fn+0(FP), AX
+ MOVL sig+4(FP), BX
+ MOVL info+8(FP), CX
+ MOVL ctx+12(FP), DX
+ MOVL SP, SI
+ SUBL $32, SP
+ ANDL $~15, SP // align stack: handler might be a C function
+ MOVL BX, 0(SP)
+ MOVL CX, 4(SP)
+ MOVL DX, 8(SP)
+ MOVL SI, 12(SP) // save SI: handler might be a Go function
+ CALL AX
+ MOVL 12(SP), AX
+ MOVL AX, SP
+ RET
+
+// Called by OS using C ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT,$28
+ NOP SP // tell vet SP changed - stop checking offsets
+ // Save callee-saved C registers, since the caller may be a C signal handler.
+ MOVL BX, bx-4(SP)
+ MOVL BP, bp-8(SP)
+ MOVL SI, si-12(SP)
+ MOVL DI, di-16(SP)
+ // We don't save mxcsr or the x87 control word because sigtrampgo doesn't
+ // modify them.
+
+ MOVL 32(SP), BX // signo
+ MOVL BX, 0(SP)
+ MOVL 36(SP), BX // info
+ MOVL BX, 4(SP)
+ MOVL 40(SP), BX // context
+ MOVL BX, 8(SP)
+ CALL runtime·sigtrampgo(SB)
+
+ MOVL di-16(SP), DI
+ MOVL si-12(SP), SI
+ MOVL bp-8(SP), BP
+ MOVL bx-4(SP), BX
+ RET
+
+// int32 tfork(void *param, uintptr psize, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·tfork(SB),NOSPLIT,$12
+
+ // Copy mp, gp and fn from the parent stack onto the child stack.
+ MOVL param+0(FP), AX
+ MOVL 8(AX), CX // tf_stack
+ SUBL $16, CX
+ MOVL CX, 8(AX)
+ MOVL mm+8(FP), SI
+ MOVL SI, 0(CX)
+ MOVL gg+12(FP), SI
+ MOVL SI, 4(CX)
+ MOVL fn+16(FP), SI
+ MOVL SI, 8(CX)
+ MOVL $1234, 12(CX)
+
+ MOVL $0, 0(SP) // syscall gap
+ MOVL param+0(FP), AX
+ MOVL AX, 4(SP) // arg 1 - param
+ MOVL psize+4(FP), AX
+ MOVL AX, 8(SP) // arg 2 - psize
+ MOVL $8, AX // sys___tfork
+ INT $0x80
+
+ // Return if tfork syscall failed.
+ JCC 4(PC)
+ NEGL AX
+ MOVL AX, ret+20(FP)
+ RET
+
+ // In parent, return.
+ CMPL AX, $0
+ JEQ 3(PC)
+ MOVL AX, ret+20(FP)
+ RET
+
+ // Paranoia: check that SP is as we expect.
+ MOVL 12(SP), BP
+ CMPL BP, $1234
+ JEQ 2(PC)
+ INT $3
+
+ // Reload registers.
+ MOVL 0(SP), BX // m
+ MOVL 4(SP), DX // g
+ MOVL 8(SP), SI // fn
+
+ // Set FS to point at m->tls.
+ LEAL m_tls(BX), BP
+ PUSHAL // save registers
+ PUSHL BP
+ CALL set_tcb<>(SB)
+ POPL AX
+ POPAL
+
+ // Now segment is established. Initialize m, g.
+ get_tls(AX)
+ MOVL DX, g(AX)
+ MOVL BX, g_m(DX)
+
+ CALL runtime·stackcheck(SB) // smashes AX, CX
+ MOVL 0(DX), DX // paranoia; check they are not nil
+ MOVL 0(BX), BX
+
+ // More paranoia; check that stack splitting code works.
+ PUSHAL
+ CALL runtime·emptyfunc(SB)
+ POPAL
+
+ // Call fn.
+ CALL SI
+
+ // fn should never return.
+ MOVL $0x1234, 0x1005
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
+ MOVL $288, AX // sys_sigaltstack
+ MOVL new+0(FP), BX
+ MOVL old+4(FP), CX
+ INT $0x80
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ INT $3
+ RET
+
+TEXT runtime·setldt(SB),NOSPLIT,$4
+ // Under OpenBSD we set the GS base instead of messing with the LDT.
+ MOVL base+4(FP), AX
+ MOVL AX, 0(SP)
+ CALL set_tcb<>(SB)
+ RET
+
+TEXT set_tcb<>(SB),NOSPLIT,$8
+ // adjust for ELF: wants to use -4(GS) for g
+ MOVL tlsbase+0(FP), CX
+ ADDL $4, CX
+ MOVL $0, 0(SP) // syscall gap
+ MOVL CX, 4(SP) // arg 1 - tcb
+ MOVL $329, AX // sys___set_tcb
+ INT $0x80
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$-4
+ MOVL $298, AX // sys_sched_yield
+ INT $0x80
+ RET
+
+TEXT runtime·thrsleep(SB),NOSPLIT,$-4
+ MOVL $94, AX // sys___thrsleep
+ INT $0x80
+ MOVL AX, ret+20(FP)
+ RET
+
+TEXT runtime·thrwakeup(SB),NOSPLIT,$-4
+ MOVL $301, AX // sys___thrwakeup
+ INT $0x80
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$28
+ LEAL mib+0(FP), SI
+ LEAL 4(SP), DI
+ CLD
+ MOVSL // arg 1 - name
+ MOVSL // arg 2 - namelen
+ MOVSL // arg 3 - oldp
+ MOVSL // arg 4 - oldlenp
+ MOVSL // arg 5 - newp
+ MOVSL // arg 6 - newlen
+ MOVL $202, AX // sys___sysctl
+ INT $0x80
+ JCC 4(PC)
+ NEGL AX
+ MOVL AX, ret+24(FP)
+ RET
+ MOVL $0, AX
+ MOVL AX, ret+24(FP)
+ RET
+
+// int32 runtime·kqueue(void);
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVL $269, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout);
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVL $72, AX // sys_kevent
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+24(FP)
+ RET
+
+// void runtime·closeonexec(int32 fd);
+TEXT runtime·closeonexec(SB),NOSPLIT,$32
+ MOVL $92, AX // sys_fcntl
+ // 0(SP) is where the caller PC would be; kernel skips it
+ MOVL fd+0(FP), BX
+ MOVL BX, 4(SP) // fd
+ MOVL $2, 8(SP) // F_SETFD
+ MOVL $1, 12(SP) // FD_CLOEXEC
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ RET
+
+// func runtime·setNonblock(fd int32)
+TEXT runtime·setNonblock(SB),NOSPLIT,$16-4
+ MOVL $92, AX // fcntl
+ MOVL fd+0(FP), BX // fd
+ MOVL BX, 4(SP)
+ MOVL $3, 8(SP) // F_GETFL
+ MOVL $0, 12(SP)
+ INT $0x80
+ MOVL fd+0(FP), BX // fd
+ MOVL BX, 4(SP)
+ MOVL $4, 8(SP) // F_SETFL
+ ORL $4, AX // O_NONBLOCK
+ MOVL AX, 12(SP)
+ MOVL $92, AX // fcntl
+ INT $0x80
+ RET
+
+GLOBL runtime·tlsoffset(SB),NOPTR,$4
diff --git a/src/runtime/sys_openbsd_amd64.s b/src/runtime/sys_openbsd_amd64.s
new file mode 100644
index 0000000..b3a76b5
--- /dev/null
+++ b/src/runtime/sys_openbsd_amd64.s
@@ -0,0 +1,788 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for AMD64, OpenBSD.
+// System calls are implemented in libc/libpthread; this file
+// contains trampolines that convert from the Go to the C calling convention.
+// Some direct system call implementations currently remain.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_MONOTONIC $3
+
+TEXT runtime·settls(SB),NOSPLIT,$0
+ // Nothing to do, pthread already set thread-local storage up.
+ RET
+
+// mstart_stub is the first function executed on a new thread started by pthread_create.
+// It just does some low-level setup and then calls mstart.
+// Note: called with the C calling convention.
+TEXT runtime·mstart_stub(SB),NOSPLIT,$0
+ // DI points to the m.
+ // We are already on m's g0 stack.
+
+ // Save callee-save registers.
+ SUBQ $48, SP
+ MOVQ BX, 0(SP)
+ MOVQ BP, 8(SP)
+ MOVQ R12, 16(SP)
+ MOVQ R13, 24(SP)
+ MOVQ R14, 32(SP)
+ MOVQ R15, 40(SP)
+
+ // Load g and save to TLS entry.
+ // See cmd/link/internal/ld/sym.go:computeTLSOffset.
+ MOVQ m_g0(DI), DX // g
+ MOVQ DX, -8(FS)
+
+ // Someday the convention will be that D is always cleared.
+ CLD
+
+ CALL runtime·mstart(SB)
+
+ // Restore callee-save registers.
+ MOVQ 0(SP), BX
+ MOVQ 8(SP), BP
+ MOVQ 16(SP), R12
+ MOVQ 24(SP), R13
+ MOVQ 32(SP), R14
+ MOVQ 40(SP), R15
+
+ // Go is all done with this OS thread.
+ // Tell pthread everything is ok (we never join with this thread, so
+ // the value here doesn't really matter).
+ XORL AX, AX
+
+ ADDQ $48, SP
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVQ fn+0(FP), AX
+ MOVL sig+8(FP), DI
+ MOVQ info+16(FP), SI
+ MOVQ ctx+24(FP), DX
+ PUSHQ BP
+ MOVQ SP, BP
+ ANDQ $~15, SP // alignment for x86_64 ABI
+ CALL AX
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$72
+ // Save callee-saved C registers, since the caller may be a C signal handler.
+ MOVQ BX, bx-8(SP)
+ MOVQ BP, bp-16(SP) // save in case GOEXPERIMENT=noframepointer is set
+ MOVQ R12, r12-24(SP)
+ MOVQ R13, r13-32(SP)
+ MOVQ R14, r14-40(SP)
+ MOVQ R15, r15-48(SP)
+ // We don't save mxcsr or the x87 control word because sigtrampgo doesn't
+ // modify them.
+
+ MOVQ DX, ctx-56(SP)
+ MOVQ SI, info-64(SP)
+ MOVQ DI, signum-72(SP)
+ CALL runtime·sigtrampgo(SB)
+
+ MOVQ r15-48(SP), R15
+ MOVQ r14-40(SP), R14
+ MOVQ r13-32(SP), R13
+ MOVQ r12-24(SP), R12
+ MOVQ bp-16(SP), BP
+ MOVQ bx-8(SP), BX
+ RET
+
+//
+// These trampolines help convert from Go calling convention to C calling convention.
+// They should be called with asmcgocall.
+// A pointer to the arguments is passed in DI.
+// A single int32 result is returned in AX.
+// (For more results, make an args/results structure.)
+TEXT runtime·pthread_attr_init_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 0(DI), DI // arg 1 - attr
+ CALL libc_pthread_attr_init(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_attr_destroy_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 0(DI), DI // arg 1 - attr
+ CALL libc_pthread_attr_destroy(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_attr_getstacksize_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 - stacksize
+ MOVQ 0(DI), DI // arg 1 - attr
+ CALL libc_pthread_attr_getstacksize(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_attr_setdetachstate_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 - detachstate
+ MOVQ 0(DI), DI // arg 1 - attr
+ CALL libc_pthread_attr_setdetachstate(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·pthread_create_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $16, SP
+ MOVQ 0(DI), SI // arg 2 - attr
+ MOVQ 8(DI), DX // arg 3 - start
+ MOVQ 16(DI), CX // arg 4 - arg
+ MOVQ SP, DI // arg 1 - &thread (discarded)
+ CALL libc_pthread_create(SB)
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+TEXT runtime·thrkill_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 8(DI), SI // arg 2 - signal
+ MOVQ $0, DX // arg 3 - tcb
+ MOVL 0(DI), DI // arg 1 - tid
+ CALL libc_thrkill(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·thrsleep_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 8(DI), SI // arg 2 - clock_id
+ MOVQ 16(DI), DX // arg 3 - abstime
+ MOVQ 24(DI), CX // arg 4 - lock
+ MOVQ 32(DI), R8 // arg 5 - abort
+ MOVQ 0(DI), DI // arg 1 - id
+ CALL libc_thrsleep(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·thrwakeup_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 8(DI), SI // arg 2 - count
+ MOVQ 0(DI), DI // arg 1 - id
+ CALL libc_thrwakeup(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·exit_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 0(DI), DI // arg 1 exit status
+ CALL libc_exit(SB)
+ MOVL $0xf1, 0xf1 // crash
+ POPQ BP
+ RET
+
+TEXT runtime·getthrid_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ DI, BX // BX is callee-save in the C ABI; it survives the CALL
+ CALL libc_getthrid(SB)
+ MOVL AX, 0(BX) // return value
+ POPQ BP
+ RET
+
+TEXT runtime·raiseproc_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 0(DI), BX // signal
+ CALL libc_getpid(SB)
+ MOVL AX, DI // arg 1 pid
+ MOVL BX, SI // arg 2 signal
+ CALL libc_kill(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·sched_yield_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ CALL libc_sched_yield(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·mmap_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP // make a frame; keep stack aligned
+ MOVQ SP, BP
+ MOVQ DI, BX
+ MOVQ 0(BX), DI // arg 1 addr
+ MOVQ 8(BX), SI // arg 2 len
+ MOVL 16(BX), DX // arg 3 prot
+ MOVL 20(BX), CX // arg 4 flags
+ MOVL 24(BX), R8 // arg 5 fd
+ MOVL 28(BX), R9 // arg 6 offset
+ CALL libc_mmap(SB)
+ XORL DX, DX
+ CMPQ AX, $-1
+ JNE ok
+ CALL libc_errno(SB)
+ MOVLQSX (AX), DX // errno
+ XORQ AX, AX
+ok:
+ MOVQ AX, 32(BX)
+ MOVQ DX, 40(BX)
+ POPQ BP
+ RET
+
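
mmap is the one trampoline above with two results; per the convention comment earlier in this file, it writes them back into the argument block (offsets 32 and 40). A Go-side wrapper consistent with those offsets would look roughly like the sketch below; the actual wrapper is outside this hunk, so treat the details as an assumption.

    //go:nosplit
    func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (unsafe.Pointer, int) {
    	args := struct {
    		addr            unsafe.Pointer // 0(BX)
    		n               uintptr        // 8(BX)
    		prot, flags, fd int32          // 16(BX), 20(BX), 24(BX)
    		off             uint32         // 28(BX)
    		ret1            unsafe.Pointer // 32(BX): mapped address, or nil on error
    		ret2            int            // 40(BX): 0, or errno on error
    	}{addr, n, prot, flags, fd, off, nil, 0}
    	libcCall(unsafe.Pointer(funcPC(mmap_trampoline)), unsafe.Pointer(&args))
    	return args.ret1, args.ret2
    }
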
+TEXT runtime·munmap_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 len
+ MOVQ 0(DI), DI // arg 1 addr
+ CALL libc_munmap(SB)
+ TESTQ AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ POPQ BP
+ RET
+
+TEXT runtime·madvise_trampoline(SB), NOSPLIT, $0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 len
+ MOVL 16(DI), DX // arg 3 advice
+ MOVQ 0(DI), DI // arg 1 addr
+ CALL libc_madvise(SB)
+ // ignore failure - maybe pages are locked
+ POPQ BP
+ RET
+
+TEXT runtime·open_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 8(DI), SI // arg 2 - flags
+ MOVL 12(DI), DX // arg 3 - mode
+ MOVQ 0(DI), DI // arg 1 - path
+ XORL AX, AX // vararg: say "no float args"
+ CALL libc_open(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·close_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 0(DI), DI // arg 1 - fd
+ CALL libc_close(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·read_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 - buf
+ MOVL 16(DI), DX // arg 3 - count
+ MOVL 0(DI), DI // arg 1 - fd
+ CALL libc_read(SB)
+ TESTL AX, AX
+ JGE noerr
+ CALL libc_errno(SB)
+ MOVL (AX), AX // errno
+ NEGL AX // caller expects negative errno value
+noerr:
+ POPQ BP
+ RET
+
+TEXT runtime·write_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 buf
+ MOVL 16(DI), DX // arg 3 count
+ MOVL 0(DI), DI // arg 1 fd
+ CALL libc_write(SB)
+ TESTL AX, AX
+ JGE noerr
+ CALL libc_errno(SB)
+ MOVL (AX), AX // errno
+ NEGL AX // caller expects negative errno value
+noerr:
+ POPQ BP
+ RET
+
+TEXT runtime·pipe2_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 8(DI), SI // arg 2 flags
+ MOVQ 0(DI), DI // arg 1 filedes
+ CALL libc_pipe2(SB)
+ TESTL AX, AX
+ JEQ 3(PC)
+ CALL libc_errno(SB)
+ MOVL (AX), AX // errno
+ NEGL AX // caller expects negative errno value
+ POPQ BP
+ RET
+
+TEXT runtime·setitimer_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 new
+ MOVQ 16(DI), DX // arg 3 old
+ MOVL 0(DI), DI // arg 1 which
+ CALL libc_setitimer(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·usleep_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 0(DI), DI // arg 1 usec
+ CALL libc_usleep(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·sysctl_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 8(DI), SI // arg 2 miblen
+ MOVQ 16(DI), DX // arg 3 out
+ MOVQ 24(DI), CX // arg 4 size
+ MOVQ 32(DI), R8 // arg 5 dst
+ MOVQ 40(DI), R9 // arg 6 ndst
+ MOVQ 0(DI), DI // arg 1 mib
+ CALL libc_sysctl(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·kqueue_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ CALL libc_kqueue(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·kevent_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 keventt
+ MOVL 16(DI), DX // arg 3 nch
+ MOVQ 24(DI), CX // arg 4 ev
+ MOVL 32(DI), R8 // arg 5 nev
+ MOVQ 40(DI), R9 // arg 6 ts
+ MOVL 0(DI), DI // arg 1 kq
+ CALL libc_kevent(SB)
+ CMPL AX, $-1
+ JNE ok
+ CALL libc_errno(SB)
+ MOVL (AX), AX // errno
+ NEGL AX // caller expects negative errno value
+ok:
+ POPQ BP
+ RET
+
+TEXT runtime·clock_gettime_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP // make a frame; keep stack aligned
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 tp
+ MOVL 0(DI), DI // arg 1 clock_id
+ CALL libc_clock_gettime(SB)
+ TESTL AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ POPQ BP
+ RET
+
+TEXT runtime·fcntl_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVL 4(DI), SI // arg 2 cmd
+ MOVL 8(DI), DX // arg 3 arg
+ MOVL 0(DI), DI // arg 1 fd
+ XORL AX, AX // vararg: say "no float args"
+ CALL libc_fcntl(SB)
+ POPQ BP
+ RET
+
+TEXT runtime·sigaction_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 new
+ MOVQ 16(DI), DX // arg 3 old
+ MOVL 0(DI), DI // arg 1 sig
+ CALL libc_sigaction(SB)
+ TESTL AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ POPQ BP
+ RET
+
+TEXT runtime·sigprocmask_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 new
+ MOVQ 16(DI), DX // arg 3 old
+ MOVL 0(DI), DI // arg 1 how
+ CALL libc_pthread_sigmask(SB)
+ TESTL AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ POPQ BP
+ RET
+
+TEXT runtime·sigaltstack_trampoline(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ MOVQ 8(DI), SI // arg 2 old
+ MOVQ 0(DI), DI // arg 1 new
+ CALL libc_sigaltstack(SB)
+ TESTQ AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ POPQ BP
+ RET
+
+// syscall calls a function in libc on behalf of the syscall package.
+// syscall takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
+TEXT runtime·syscall(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $16, SP
+ MOVQ (0*8)(DI), CX // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL CX
+
+ MOVQ (SP), DI
+ MOVQ AX, (4*8)(DI) // r1
+ MOVQ DX, (5*8)(DI) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMPL AX, $-1 // Note: high 32 bits are junk
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (6*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
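
The struct described in the comment above is never declared as a type; it is simply the contiguous argument-and-result frame of syscall_syscall in sys_openbsd3.go (fn, a1, a2, a3, then r1, r2, err), which //go:cgo_unsafe_args keeps addressable as a single block. Written out explicitly (editor's sketch):

    type syscallArgs struct { // layout of the block whose address arrives in DI
    	fn, a1, a2, a3 uintptr // inputs, filled in by the caller
    	r1, r2, err    uintptr // outputs, filled in by this assembly
    }
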
+// syscallX calls a function in libc on behalf of the syscall package.
+// syscallX takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscallX must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscallX is like syscall but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscallX(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $16, SP
+ MOVQ (0*8)(DI), CX // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL CX
+
+ MOVQ (SP), DI
+ MOVQ AX, (4*8)(DI) // r1
+ MOVQ DX, (5*8)(DI) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMPQ AX, $-1
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (6*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+// syscall6 calls a function in libc on behalf of the syscall package.
+// syscall6 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6 expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
+TEXT runtime·syscall6(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $16, SP
+ MOVQ (0*8)(DI), R11 // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), R9 // a6
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R11
+
+ MOVQ (SP), DI
+ MOVQ AX, (7*8)(DI) // r1
+ MOVQ DX, (8*8)(DI) // r2
+
+ CMPL AX, $-1
+ JNE ok
+
+ CALL libc_errno(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (9*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+// syscall6X calls a function in libc on behalf of the syscall package.
+// syscall6X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6X is like syscall6 but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscall6X(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $16, SP
+ MOVQ (0*8)(DI), R11 // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), R9 // a6
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R11
+
+ MOVQ (SP), DI
+ MOVQ AX, (7*8)(DI) // r1
+ MOVQ DX, (8*8)(DI) // r2
+
+ CMPQ AX, $-1
+ JNE ok
+
+ CALL libc_errno(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (9*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+// syscall10 calls a function in libc on behalf of the syscall package.
+// syscall10 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// a7 uintptr
+// a8 uintptr
+// a9 uintptr
+// a10 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall10 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+TEXT runtime·syscall10(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $48, SP
+
+ // Arguments a1 to a6 get passed in registers, with a7 onwards being
+ // passed via the stack per the x86-64 System V ABI
+ // (https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf).
+ MOVQ (7*8)(DI), R10 // a7
+ MOVQ (8*8)(DI), R11 // a8
+ MOVQ (9*8)(DI), R12 // a9
+ MOVQ (10*8)(DI), R13 // a10
+ MOVQ R10, (0*8)(SP) // a7
+ MOVQ R11, (1*8)(SP) // a8
+ MOVQ R12, (2*8)(SP) // a9
+ MOVQ R13, (3*8)(SP) // a10
+ MOVQ (0*8)(DI), R11 // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), R9 // a6
+ MOVQ DI, (4*8)(SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R11
+
+ MOVQ (4*8)(SP), DI
+ MOVQ AX, (11*8)(DI) // r1
+ MOVQ DX, (12*8)(DI) // r2
+
+ CMPL AX, $-1
+ JNE ok
+
+ CALL libc_errno(SB)
+ MOVLQSX (AX), AX
+ MOVQ (4*8)(SP), DI
+ MOVQ AX, (13*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+// syscall10X calls a function in libc on behalf of the syscall package.
+// syscall10X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// a7 uintptr
+// a8 uintptr
+// a9 uintptr
+// a10 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall10X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall10X is like syscall10 but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscall10X(SB),NOSPLIT,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ SUBQ $48, SP
+
+ // Arguments a1 to a6 get passed in registers, with a7 onwards being
+ // passed via the stack per the x86-64 System V ABI
+ // (https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf).
+ MOVQ (7*8)(DI), R10 // a7
+ MOVQ (8*8)(DI), R11 // a8
+ MOVQ (9*8)(DI), R12 // a9
+ MOVQ (10*8)(DI), R13 // a10
+ MOVQ R10, (0*8)(SP) // a7
+ MOVQ R11, (1*8)(SP) // a8
+ MOVQ R12, (2*8)(SP) // a9
+ MOVQ R13, (3*8)(SP) // a10
+ MOVQ (0*8)(DI), R11 // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), R9 // a6
+ MOVQ DI, (4*8)(SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R11
+
+ MOVQ (4*8)(SP), DI
+ MOVQ AX, (11*8)(DI) // r1
+ MOVQ DX, (12*8)(DI) // r2
+
+ CMPQ AX, $-1
+ JNE ok
+
+ CALL libc_errno(SB)
+ MOVLQSX (AX), AX
+ MOVQ (4*8)(SP), DI
+ MOVQ AX, (13*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ MOVQ BP, SP
+ POPQ BP
+ RET
diff --git a/src/runtime/sys_openbsd_arm.s b/src/runtime/sys_openbsd_arm.s
new file mode 100644
index 0000000..9e18ce0
--- /dev/null
+++ b/src/runtime/sys_openbsd_arm.s
@@ -0,0 +1,435 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for ARM, OpenBSD.
+// See /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME $0
+#define CLOCK_MONOTONIC $3
+
+// With OpenBSD 6.7 onwards, an armv7 syscall returns two instructions
+// after the SWI instruction, to allow for a speculative execution
+// barrier to be placed after the SWI without impacting performance.
+// For now use hardware no-ops as this works with both older and newer
+// kernels. After OpenBSD 6.8 is released this should be changed to
+// speculation barriers.
+#define NOOP MOVW R0, R0
+#define INVOKE_SYSCALL \
+ SWI $0; \
+ NOOP; \
+ NOOP
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0
+ MOVW code+0(FP), R0 // arg 1 - status
+ MOVW $1, R12 // sys_exit
+ INVOKE_SYSCALL
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-4
+ MOVW wait+0(FP), R0 // arg 1 - notdead
+ MOVW $302, R12 // sys___threxit
+ INVOKE_SYSCALL
+ MOVW.CS $1, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0
+ MOVW name+0(FP), R0 // arg 1 - path
+ MOVW mode+4(FP), R1 // arg 2 - mode
+ MOVW perm+8(FP), R2 // arg 3 - perm
+ MOVW $5, R12 // sys_open
+ INVOKE_SYSCALL
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0 // arg 1 - fd
+ MOVW $6, R12 // sys_close
+ INVOKE_SYSCALL
+ MOVW.CS $-1, R0
+ MOVW R0, ret+4(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0 // arg 1 - fd
+ MOVW p+4(FP), R1 // arg 2 - buf
+ MOVW n+8(FP), R2 // arg 3 - nbyte
+ MOVW $3, R12 // sys_read
+ INVOKE_SYSCALL
+ RSB.CS $0, R0 // caller expects negative errno
+ MOVW R0, ret+12(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT,$0-12
+ MOVW $r+0(FP), R0
+ MOVW $263, R12
+ INVOKE_SYSCALL
+ MOVW R0, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-16
+ MOVW $r+4(FP), R0
+ MOVW flags+0(FP), R1
+ MOVW $101, R12
+ INVOKE_SYSCALL
+ MOVW R0, errno+12(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0 // arg 1 - fd
+ MOVW p+4(FP), R1 // arg 2 - buf
+ MOVW n+8(FP), R2 // arg 3 - nbyte
+ MOVW $4, R12 // sys_write
+ INVOKE_SYSCALL
+ RSB.CS $0, R0 // caller expects negative errno
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16
+ MOVW usec+0(FP), R0
+ CALL runtime·usplitR0(SB)
+ MOVW R0, 4(R13) // tv_sec - l32
+ MOVW $0, R0
+ MOVW R0, 8(R13) // tv_sec - h32
+ MOVW $1000, R2
+ MUL R1, R2
+ MOVW R2, 12(R13) // tv_nsec
+
+ MOVW $4(R13), R0 // arg 1 - rqtp
+ MOVW $0, R1 // arg 2 - rmtp
+ MOVW $91, R12 // sys_nanosleep
+ INVOKE_SYSCALL
+ RET
+
+TEXT runtime·getthrid(SB),NOSPLIT,$0-4
+ MOVW $299, R12 // sys_getthrid
+ INVOKE_SYSCALL
+ MOVW R0, ret+0(FP)
+ RET
+
+TEXT runtime·thrkill(SB),NOSPLIT,$0-8
+ MOVW tid+0(FP), R0 // arg 1 - tid
+ MOVW sig+4(FP), R1 // arg 2 - signum
+ MOVW $0, R2 // arg 3 - tcb
+ MOVW $119, R12 // sys_thrkill
+ INVOKE_SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$12
+ MOVW $20, R12 // sys_getpid
+ INVOKE_SYSCALL
+ // arg 1 - pid, already in R0
+ MOVW sig+0(FP), R1 // arg 2 - signum
+ MOVW $122, R12 // sys_kill
+ INVOKE_SYSCALL
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$16
+ MOVW addr+0(FP), R0 // arg 1 - addr
+ MOVW n+4(FP), R1 // arg 2 - len
+ MOVW prot+8(FP), R2 // arg 3 - prot
+ MOVW flags+12(FP), R3 // arg 4 - flags
+ MOVW fd+16(FP), R4 // arg 5 - fd (on stack)
+ MOVW R4, 4(R13)
+ MOVW $0, R5 // arg 6 - pad (on stack)
+ MOVW R5, 8(R13)
+ MOVW off+20(FP), R6 // arg 7 - offset (on stack)
+ MOVW R6, 12(R13) // lower 32 bits (from Go runtime)
+ MOVW $0, R7
+ MOVW R7, 16(R13) // high 32 bits
+ ADD $4, R13
+ MOVW $197, R12 // sys_mmap
+ INVOKE_SYSCALL
+ SUB $4, R13
+ MOVW $0, R1
+ MOVW.CS R0, R1 // if error, move to R1
+ MOVW.CS $0, R0
+ MOVW R0, p+24(FP)
+ MOVW R1, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0 // arg 1 - addr
+ MOVW n+4(FP), R1 // arg 2 - len
+ MOVW $73, R12 // sys_munmap
+ INVOKE_SYSCALL
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0 // arg 1 - addr
+ MOVW n+4(FP), R1 // arg 2 - len
+ MOVW flags+8(FP), R2 // arg 3 - flags
+ MOVW $75, R12 // sys_madvise
+ INVOKE_SYSCALL
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$0
+ MOVW mode+0(FP), R0 // arg 1 - mode
+ MOVW new+4(FP), R1 // arg 2 - new value
+ MOVW old+8(FP), R2 // arg 3 - old value
+ MOVW $69, R12 // sys_setitimer
+ INVOKE_SYSCALL
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB), NOSPLIT, $32
+ MOVW CLOCK_REALTIME, R0 // arg 1 - clock_id
+ MOVW $8(R13), R1 // arg 2 - tp
+ MOVW $87, R12 // sys_clock_gettime
+ INVOKE_SYSCALL
+
+ MOVW 8(R13), R0 // sec - l32
+ MOVW 12(R13), R1 // sec - h32
+ MOVW 16(R13), R2 // nsec
+
+ MOVW R0, sec_lo+0(FP)
+ MOVW R1, sec_hi+4(FP)
+ MOVW R2, nsec+8(FP)
+
+ RET
+
+// int64 nanotime1(void) so really
+// void nanotime1(int64 *nsec)
+TEXT runtime·nanotime1(SB),NOSPLIT,$32
+ MOVW CLOCK_MONOTONIC, R0 // arg 1 - clock_id
+ MOVW $8(R13), R1 // arg 2 - tp
+ MOVW $87, R12 // sys_clock_gettime
+ INVOKE_SYSCALL
+
+ MOVW 8(R13), R0 // sec - l32
+ MOVW 12(R13), R4 // sec - h32
+ MOVW 16(R13), R2 // nsec
+
+ MOVW $1000000000, R3
+ MULLU R0, R3, (R1, R0)
+ MUL R3, R4
+ ADD.S R2, R0
+ ADC R4, R1
+
+ MOVW R0, ret_lo+0(FP)
+ MOVW R1, ret_hi+4(FP)
+ RET
+
+TEXT runtime·sigaction(SB),NOSPLIT,$0
+ MOVW sig+0(FP), R0 // arg 1 - signum
+ MOVW new+4(FP), R1 // arg 2 - new sigaction
+ MOVW old+8(FP), R2 // arg 3 - old sigaction
+ MOVW $46, R12 // sys_sigaction
+ INVOKE_SYSCALL
+ MOVW.CS $3, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·obsdsigprocmask(SB),NOSPLIT,$0
+ MOVW how+0(FP), R0 // arg 1 - mode
+ MOVW new+4(FP), R1 // arg 2 - new
+ MOVW $48, R12 // sys_sigprocmask
+ INVOKE_SYSCALL
+ MOVW.CS $3, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ MOVW R0, ret+8(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-16
+ MOVW sig+4(FP), R0
+ MOVW info+8(FP), R1
+ MOVW ctx+12(FP), R2
+ MOVW fn+0(FP), R11
+ MOVW R13, R4
+ SUB $24, R13
+ BIC $0x7, R13 // alignment for ELF ABI
+ BL (R11)
+ MOVW R4, R13
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$0
+ // Reserve space for callee-save registers and arguments.
+ MOVM.DB.W [R4-R11], (R13)
+ SUB $16, R13
+
+ // If called from an external code context, g will not be set.
+ // Save R0, since runtime·load_g will clobber it.
+ MOVW R0, 4(R13) // signum
+ MOVB runtime·iscgo(SB), R0
+ CMP $0, R0
+ BL.NE runtime·load_g(SB)
+
+ MOVW R1, 8(R13)
+ MOVW R2, 12(R13)
+ BL runtime·sigtrampgo(SB)
+
+ // Restore callee-save registers.
+ ADD $16, R13
+ MOVM.IA.W (R13), [R4-R11]
+
+ RET
+
+// int32 tfork(void *param, uintptr psize, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·tfork(SB),NOSPLIT,$0
+
+ // Copy mp, gp and fn off parent stack for use by child.
+ MOVW mm+8(FP), R4
+ MOVW gg+12(FP), R5
+ MOVW fn+16(FP), R6
+
+ MOVW param+0(FP), R0 // arg 1 - param
+ MOVW psize+4(FP), R1 // arg 2 - psize
+ MOVW $8, R12 // sys___tfork
+ INVOKE_SYSCALL
+
+ // Return if syscall failed.
+ B.CC 4(PC)
+ RSB $0, R0
+ MOVW R0, ret+20(FP)
+ RET
+
+ // In parent, return.
+ CMP $0, R0
+ BEQ 3(PC)
+ MOVW R0, ret+20(FP)
+ RET
+
+ // Initialise m, g.
+ MOVW R5, g
+ MOVW R4, g_m(g)
+
+ // Paranoia; check that stack splitting code works.
+ BL runtime·emptyfunc(SB)
+
+ // Call fn.
+ BL (R6)
+
+ // fn should never return.
+ MOVW $2, R8 // crash if reached
+ MOVW R8, (R8)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$0
+ MOVW new+0(FP), R0 // arg 1 - new sigaltstack
+ MOVW old+4(FP), R1 // arg 2 - old sigaltstack
+ MOVW $288, R12 // sys_sigaltstack
+ INVOKE_SYSCALL
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ MOVW $298, R12 // sys_sched_yield
+ INVOKE_SYSCALL
+ RET
+
+TEXT runtime·thrsleep(SB),NOSPLIT,$4
+ MOVW ident+0(FP), R0 // arg 1 - ident
+ MOVW clock_id+4(FP), R1 // arg 2 - clock_id
+ MOVW tsp+8(FP), R2 // arg 3 - tsp
+ MOVW lock+12(FP), R3 // arg 4 - lock
+ MOVW abort+16(FP), R4 // arg 5 - abort (on stack)
+ MOVW R4, 4(R13)
+ ADD $4, R13
+ MOVW $94, R12 // sys___thrsleep
+ INVOKE_SYSCALL
+ SUB $4, R13
+ MOVW R0, ret+20(FP)
+ RET
+
+TEXT runtime·thrwakeup(SB),NOSPLIT,$0
+ MOVW ident+0(FP), R0 // arg 1 - ident
+ MOVW n+4(FP), R1 // arg 2 - n
+ MOVW $301, R12 // sys___thrwakeup
+ INVOKE_SYSCALL
+ MOVW R0, ret+8(FP)
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$8
+ MOVW mib+0(FP), R0 // arg 1 - mib
+ MOVW miblen+4(FP), R1 // arg 2 - miblen
+ MOVW out+8(FP), R2 // arg 3 - out
+ MOVW size+12(FP), R3 // arg 4 - size
+ MOVW dst+16(FP), R4 // arg 5 - dest (on stack)
+ MOVW R4, 4(R13)
+ MOVW ndst+20(FP), R5 // arg 6 - newlen (on stack)
+ MOVW R5, 8(R13)
+ ADD $4, R13
+ MOVW $202, R12 // sys___sysctl
+ INVOKE_SYSCALL
+ SUB $4, R13
+ MOVW.CC $0, R0
+ RSB.CS $0, R0
+ MOVW R0, ret+24(FP)
+ RET
+
+// int32 runtime·kqueue(void);
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVW $269, R12 // sys_kqueue
+ INVOKE_SYSCALL
+ RSB.CS $0, R0
+ MOVW R0, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout);
+TEXT runtime·kevent(SB),NOSPLIT,$8
+ MOVW kq+0(FP), R0 // arg 1 - kq
+ MOVW ch+4(FP), R1 // arg 2 - changelist
+ MOVW nch+8(FP), R2 // arg 3 - nchanges
+ MOVW ev+12(FP), R3 // arg 4 - eventlist
+ MOVW nev+16(FP), R4 // arg 5 - nevents (on stack)
+ MOVW R4, 4(R13)
+ MOVW ts+20(FP), R5 // arg 6 - timeout (on stack)
+ MOVW R5, 8(R13)
+ ADD $4, R13
+ MOVW $72, R12 // sys_kevent
+ INVOKE_SYSCALL
+ RSB.CS $0, R0
+ SUB $4, R13
+ MOVW R0, ret+24(FP)
+ RET
+
+// func closeonexec(fd int32)
+TEXT runtime·closeonexec(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0 // arg 1 - fd
+ MOVW $2, R1 // arg 2 - cmd (F_SETFD)
+ MOVW $1, R2 // arg 3 - arg (FD_CLOEXEC)
+ MOVW $92, R12 // sys_fcntl
+ INVOKE_SYSCALL
+ RET
+
+// func runtime·setNonblock(fd int32)
+TEXT runtime·setNonblock(SB),NOSPLIT,$0-4
+ MOVW fd+0(FP), R0 // fd
+ MOVW $3, R1 // F_GETFL
+ MOVW $0, R2
+ MOVW $92, R12
+ INVOKE_SYSCALL
+ ORR $0x4, R0, R2 // O_NONBLOCK
+ MOVW fd+0(FP), R0 // fd
+ MOVW $4, R1 // F_SETFL
+ MOVW $92, R12
+ INVOKE_SYSCALL
+ RET
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ B runtime·armPublicationBarrier(SB)
+
+TEXT runtime·read_tls_fallback(SB),NOSPLIT|NOFRAME,$0
+ MOVM.WP [R1, R2, R3, R12], (R13)
+ MOVW $330, R12 // sys___get_tcb
+ INVOKE_SYSCALL
+ MOVM.IAW (R13), [R1, R2, R3, R12]
+ RET
diff --git a/src/runtime/sys_openbsd_arm64.s b/src/runtime/sys_openbsd_arm64.s
new file mode 100644
index 0000000..9b4acc9
--- /dev/null
+++ b/src/runtime/sys_openbsd_arm64.s
@@ -0,0 +1,700 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for arm64, OpenBSD
+// System calls are implemented in libc/libpthread; this file
+// contains trampolines that convert from the Go to the C calling convention.
+// Some direct system call implementations currently remain.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME $0
+#define CLOCK_MONOTONIC $3
+
+// mstart_stub is the first function executed on a new thread started by pthread_create.
+// It just does some low-level setup and then calls mstart.
+// Note: called with the C calling convention.
+TEXT runtime·mstart_stub(SB),NOSPLIT,$160
+ // R0 points to the m.
+ // We are already on m's g0 stack.
+
+ // Save callee-save registers.
+ MOVD R19, 8(RSP)
+ MOVD R20, 16(RSP)
+ MOVD R21, 24(RSP)
+ MOVD R22, 32(RSP)
+ MOVD R23, 40(RSP)
+ MOVD R24, 48(RSP)
+ MOVD R25, 56(RSP)
+ MOVD R26, 64(RSP)
+ MOVD R27, 72(RSP)
+ MOVD g, 80(RSP)
+ MOVD R29, 88(RSP)
+ FMOVD F8, 96(RSP)
+ FMOVD F9, 104(RSP)
+ FMOVD F10, 112(RSP)
+ FMOVD F11, 120(RSP)
+ FMOVD F12, 128(RSP)
+ FMOVD F13, 136(RSP)
+ FMOVD F14, 144(RSP)
+ FMOVD F15, 152(RSP)
+
+ MOVD m_g0(R0), g
+ BL runtime·save_g(SB)
+
+ BL runtime·mstart(SB)
+
+ // Restore callee-save registers.
+ MOVD 8(RSP), R19
+ MOVD 16(RSP), R20
+ MOVD 24(RSP), R21
+ MOVD 32(RSP), R22
+ MOVD 40(RSP), R23
+ MOVD 48(RSP), R24
+ MOVD 56(RSP), R25
+ MOVD 64(RSP), R26
+ MOVD 72(RSP), R27
+ MOVD 80(RSP), g
+ MOVD 88(RSP), R29
+ FMOVD 96(RSP), F8
+ FMOVD 104(RSP), F9
+ FMOVD 112(RSP), F10
+ FMOVD 120(RSP), F11
+ FMOVD 128(RSP), F12
+ FMOVD 136(RSP), F13
+ FMOVD 144(RSP), F14
+ FMOVD 152(RSP), F15
+
+ // Go is all done with this OS thread.
+ // Tell pthread everything is ok (we never join with this thread, so
+ // the value here doesn't really matter).
+ MOVD $0, R0
+
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R0
+ MOVD info+16(FP), R1
+ MOVD ctx+24(FP), R2
+ MOVD fn+0(FP), R11
+ BL (R11) // Alignment for ELF ABI?
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$192
+ // Save callee-save registers in case of signal forwarding.
+ // See https://golang.org/issue/31827.
+ MOVD R19, 8*4(RSP)
+ MOVD R20, 8*5(RSP)
+ MOVD R21, 8*6(RSP)
+ MOVD R22, 8*7(RSP)
+ MOVD R23, 8*8(RSP)
+ MOVD R24, 8*9(RSP)
+ MOVD R25, 8*10(RSP)
+ MOVD R26, 8*11(RSP)
+ MOVD R27, 8*12(RSP)
+ MOVD g, 8*13(RSP)
+ MOVD R29, 8*14(RSP)
+ FMOVD F8, 8*15(RSP)
+ FMOVD F9, 8*16(RSP)
+ FMOVD F10, 8*17(RSP)
+ FMOVD F11, 8*18(RSP)
+ FMOVD F12, 8*19(RSP)
+ FMOVD F13, 8*20(RSP)
+ FMOVD F14, 8*21(RSP)
+ FMOVD F15, 8*22(RSP)
+
+ // If called from an external code context, g will not be set.
+ // Save R0, since runtime·load_g will clobber it.
+ MOVW R0, 8(RSP) // signum
+ BL runtime·load_g(SB)
+
+ MOVD R1, 16(RSP)
+ MOVD R2, 24(RSP)
+ BL runtime·sigtrampgo(SB)
+
+ // Restore callee-save registers.
+ MOVD 8*4(RSP), R19
+ MOVD 8*5(RSP), R20
+ MOVD 8*6(RSP), R21
+ MOVD 8*7(RSP), R22
+ MOVD 8*8(RSP), R23
+ MOVD 8*9(RSP), R24
+ MOVD 8*10(RSP), R25
+ MOVD 8*11(RSP), R26
+ MOVD 8*12(RSP), R27
+ MOVD 8*13(RSP), g
+ MOVD 8*14(RSP), R29
+ FMOVD 8*15(RSP), F8
+ FMOVD 8*16(RSP), F9
+ FMOVD 8*17(RSP), F10
+ FMOVD 8*18(RSP), F11
+ FMOVD 8*19(RSP), F12
+ FMOVD 8*20(RSP), F13
+ FMOVD 8*21(RSP), F14
+ FMOVD 8*22(RSP), F15
+
+ RET
+
+//
+// These trampolines help convert from Go calling convention to C calling convention.
+// They should be called with asmcgocall.
+// A pointer to the arguments is passed in R0.
+// A single int32 result is returned in R0.
+// (For more results, make an args/results structure.)
+TEXT runtime·pthread_attr_init_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 - attr
+ CALL libc_pthread_attr_init(SB)
+ RET
+
+TEXT runtime·pthread_attr_destroy_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 - attr
+ CALL libc_pthread_attr_destroy(SB)
+ RET
+
+TEXT runtime·pthread_attr_getstacksize_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - size
+ MOVD 0(R0), R0 // arg 1 - attr
+ CALL libc_pthread_attr_getstacksize(SB)
+ RET
+
+TEXT runtime·pthread_attr_setdetachstate_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - state
+ MOVD 0(R0), R0 // arg 1 - attr
+ CALL libc_pthread_attr_setdetachstate(SB)
+ RET
+
+TEXT runtime·pthread_create_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R1 // arg 2 - attr
+ MOVD 8(R0), R2 // arg 3 - start
+ MOVD 16(R0), R3 // arg 4 - arg
+ SUB $16, RSP
+ MOVD RSP, R0 // arg 1 - &threadid (discard)
+ CALL libc_pthread_create(SB)
+ ADD $16, RSP
+ RET
+
+TEXT runtime·thrkill_trampoline(SB),NOSPLIT,$0
+ MOVW 8(R0), R1 // arg 2 - signal
+ MOVD $0, R2 // arg 3 - tcb
+ MOVW 0(R0), R0 // arg 1 - tid
+ CALL libc_thrkill(SB)
+ RET
+
+TEXT runtime·thrsleep_trampoline(SB),NOSPLIT,$0
+ MOVW 8(R0), R1 // arg 2 - clock_id
+ MOVD 16(R0), R2 // arg 3 - abstime
+ MOVD 24(R0), R3 // arg 4 - lock
+ MOVD 32(R0), R4 // arg 5 - abort
+ MOVD 0(R0), R0 // arg 1 - id
+ CALL libc_thrsleep(SB)
+ RET
+
+TEXT runtime·thrwakeup_trampoline(SB),NOSPLIT,$0
+ MOVW 8(R0), R1 // arg 2 - count
+ MOVD 0(R0), R0 // arg 1 - id
+ CALL libc_thrwakeup(SB)
+ RET
+
+TEXT runtime·exit_trampoline(SB),NOSPLIT,$0
+ MOVW 0(R0), R0 // arg 1 - status
+ CALL libc_exit(SB)
+ MOVD $0, R0 // crash on failure
+ MOVD R0, (R0)
+ RET
+
+TEXT runtime·getthrid_trampoline(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+ CALL libc_getthrid(SB)
+ MOVW R0, 0(R19) // return value
+ RET
+
+TEXT runtime·raiseproc_trampoline(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+ CALL libc_getpid(SB) // arg 1 - pid
+ MOVW 0(R19), R1 // arg 2 - signal
+ CALL libc_kill(SB)
+ RET
+
+TEXT runtime·sched_yield_trampoline(SB),NOSPLIT,$0
+ CALL libc_sched_yield(SB)
+ RET
+
+TEXT runtime·mmap_trampoline(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+ MOVD 0(R19), R0 // arg 1 - addr
+ MOVD 8(R19), R1 // arg 2 - len
+ MOVW 16(R19), R2 // arg 3 - prot
+ MOVW 20(R19), R3 // arg 4 - flags
+ MOVW 24(R19), R4 // arg 5 - fd
+ MOVW 28(R19), R5 // arg 6 - offset
+ CALL libc_mmap(SB)
+ MOVD $0, R1
+ CMP $-1, R0
+ BNE noerr
+ CALL libc_errno(SB)
+ MOVW (R0), R1 // errno
+ MOVD $0, R0
+noerr:
+ MOVD R0, 32(R19)
+ MOVD R1, 40(R19)
+ RET
+
+TEXT runtime·munmap_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - len
+ MOVD 0(R0), R0 // arg 1 - addr
+ CALL libc_munmap(SB)
+ CMP $-1, R0
+ BNE 3(PC)
+ MOVD $0, R0 // crash on failure
+ MOVD R0, (R0)
+ RET
+
+TEXT runtime·madvise_trampoline(SB), NOSPLIT, $0
+ MOVD 8(R0), R1 // arg 2 - len
+ MOVW 16(R0), R2 // arg 3 - advice
+ MOVD 0(R0), R0 // arg 1 - addr
+ CALL libc_madvise(SB)
+ // ignore failure - maybe pages are locked
+ RET
+
+TEXT runtime·open_trampoline(SB),NOSPLIT,$0
+ MOVW 8(R0), R1 // arg 2 - flags
+ MOVW 12(R0), R2 // arg 3 - mode
+ MOVD 0(R0), R0 // arg 1 - path
+ MOVD $0, R3 // varargs
+ CALL libc_open(SB)
+ RET
+
+TEXT runtime·close_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 - fd
+ CALL libc_close(SB)
+ RET
+
+TEXT runtime·read_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - buf
+ MOVW 16(R0), R2 // arg 3 - count
+ MOVW 0(R0), R0 // arg 1 - fd
+ CALL libc_read(SB)
+ CMP $-1, R0
+ BNE noerr
+ CALL libc_errno(SB)
+ MOVW (R0), R0 // errno
+ NEG R0, R0 // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·write_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - buf
+ MOVW 16(R0), R2 // arg 3 - count
+ MOVW 0(R0), R0 // arg 1 - fd
+ CALL libc_write(SB)
+ CMP $-1, R0
+ BNE noerr
+ CALL libc_errno(SB)
+ MOVW (R0), R0 // errno
+ NEG R0, R0 // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·pipe2_trampoline(SB),NOSPLIT,$0
+ MOVW 8(R0), R1 // arg 2 - flags
+ MOVD 0(R0), R0 // arg 1 - filedes
+ CALL libc_pipe2(SB)
+ CMP $-1, R0
+ BNE noerr
+ CALL libc_errno(SB)
+ MOVW (R0), R0 // errno
+ NEG R0, R0 // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·setitimer_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - new
+ MOVD 16(R0), R2 // arg 3 - old
+ MOVW 0(R0), R0 // arg 1 - which
+ CALL libc_setitimer(SB)
+ RET
+
+TEXT runtime·usleep_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 - usec
+ CALL libc_usleep(SB)
+ RET
+
+TEXT runtime·sysctl_trampoline(SB),NOSPLIT,$0
+ MOVW 8(R0), R1 // arg 2 - miblen
+ MOVD 16(R0), R2 // arg 3 - out
+ MOVD 24(R0), R3 // arg 4 - size
+ MOVD 32(R0), R4 // arg 5 - dst
+ MOVD 40(R0), R5 // arg 6 - ndst
+ MOVD 0(R0), R0 // arg 1 - mib
+ CALL libc_sysctl(SB)
+ RET
+
+TEXT runtime·kqueue_trampoline(SB),NOSPLIT,$0
+ CALL libc_kqueue(SB)
+ RET
+
+TEXT runtime·kevent_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - keventt
+ MOVW 16(R0), R2 // arg 3 - nch
+ MOVD 24(R0), R3 // arg 4 - ev
+ MOVW 32(R0), R4 // arg 5 - nev
+ MOVD 40(R0), R5 // arg 6 - ts
+ MOVW 0(R0), R0 // arg 1 - kq
+ CALL libc_kevent(SB)
+ CMP $-1, R0
+ BNE noerr
+ CALL libc_errno(SB)
+ MOVW (R0), R0 // errno
+ NEG R0, R0 // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·clock_gettime_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - tp
+ MOVD 0(R0), R0 // arg 1 - clock_id
+ CALL libc_clock_gettime(SB)
+ CMP $-1, R0
+ BNE 3(PC)
+ MOVD $0, R0 // crash on failure
+ MOVD R0, (R0)
+ RET
+
+TEXT runtime·fcntl_trampoline(SB),NOSPLIT,$0
+ MOVW 4(R0), R1 // arg 2 - cmd
+ MOVW 8(R0), R2 // arg 3 - arg
+ MOVW 0(R0), R0 // arg 1 - fd
+ MOVD $0, R3 // vararg
+ CALL libc_fcntl(SB)
+ RET
+
+TEXT runtime·sigaction_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - new
+ MOVD 16(R0), R2 // arg 3 - old
+ MOVW 0(R0), R0 // arg 1 - sig
+ CALL libc_sigaction(SB)
+ CMP $-1, R0
+ BNE 3(PC)
+ MOVD $0, R0 // crash on syscall failure
+ MOVD R0, (R0)
+ RET
+
+TEXT runtime·sigprocmask_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - new
+ MOVD 16(R0), R2 // arg 3 - old
+ MOVW 0(R0), R0 // arg 1 - how
+ CALL libc_pthread_sigmask(SB)
+ CMP $-1, R0
+ BNE 3(PC)
+ MOVD $0, R0 // crash on syscall failure
+ MOVD R0, (R0)
+ RET
+
+TEXT runtime·sigaltstack_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - old
+ MOVD 0(R0), R0 // arg 1 - new
+ CALL libc_sigaltstack(SB)
+ CMP $-1, R0
+ BNE 3(PC)
+ MOVD $0, R0 // crash on syscall failure
+ MOVD R0, (R0)
+ RET
+
+// syscall calls a function in libc on behalf of the syscall package.
+// syscall takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
+TEXT runtime·syscall(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+
+ MOVD (0*8)(R19), R11 // fn
+ MOVD (1*8)(R19), R0 // a1
+ MOVD (2*8)(R19), R1 // a2
+ MOVD (3*8)(R19), R2 // a3
+ MOVD $0, R3 // vararg
+
+ CALL R11
+
+ MOVD R0, (4*8)(R19) // r1
+ MOVD R1, (5*8)(R19) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMPW $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVW (R0), R0
+ MOVD R0, (6*8)(R19) // err
+
+ok:
+ RET
+
+// syscallX calls a function in libc on behalf of the syscall package.
+// syscallX takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscallX must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscallX is like syscall but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscallX(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+
+ MOVD (0*8)(R19), R11 // fn
+ MOVD (1*8)(R19), R0 // a1
+ MOVD (2*8)(R19), R1 // a2
+ MOVD (3*8)(R19), R2 // a3
+ MOVD $0, R3 // vararg
+
+ CALL R11
+
+ MOVD R0, (4*8)(R19) // r1
+ MOVD R1, (5*8)(R19) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMP $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVW (R0), R0
+ MOVD R0, (6*8)(R19) // err
+
+ok:
+ RET
+
+// syscall6 calls a function in libc on behalf of the syscall package.
+// syscall6 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6 expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
+TEXT runtime·syscall6(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+
+ MOVD (0*8)(R19), R11 // fn
+ MOVD (1*8)(R19), R0 // a1
+ MOVD (2*8)(R19), R1 // a2
+ MOVD (3*8)(R19), R2 // a3
+ MOVD (4*8)(R19), R3 // a4
+ MOVD (5*8)(R19), R4 // a5
+ MOVD (6*8)(R19), R5 // a6
+ MOVD $0, R6 // vararg
+
+ CALL R11
+
+ MOVD R0, (7*8)(R19) // r1
+ MOVD R1, (8*8)(R19) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMPW $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVW (R0), R0
+ MOVD R0, (9*8)(R19) // err
+
+ok:
+ RET
+
+// syscall6X calls a function in libc on behalf of the syscall package.
+// syscall6X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6X is like syscall6 but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscall6X(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+
+ MOVD (0*8)(R19), R11 // fn
+ MOVD (1*8)(R19), R0 // a1
+ MOVD (2*8)(R19), R1 // a2
+ MOVD (3*8)(R19), R2 // a3
+ MOVD (4*8)(R19), R3 // a4
+ MOVD (5*8)(R19), R4 // a5
+ MOVD (6*8)(R19), R5 // a6
+ MOVD $0, R6 // vararg
+
+ CALL R11
+
+ MOVD R0, (7*8)(R19) // r1
+ MOVD R1, (8*8)(R19) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMP $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVW (R0), R0
+ MOVD R0, (9*8)(R19) // err
+
+ok:
+ RET
+
+// syscall10 calls a function in libc on behalf of the syscall package.
+// syscall10 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// a7 uintptr
+// a8 uintptr
+// a9 uintptr
+// a10 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall10 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+TEXT runtime·syscall10(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+
+ MOVD (0*8)(R19), R11 // fn
+ MOVD (1*8)(R19), R0 // a1
+ MOVD (2*8)(R19), R1 // a2
+ MOVD (3*8)(R19), R2 // a3
+ MOVD (4*8)(R19), R3 // a4
+ MOVD (5*8)(R19), R4 // a5
+ MOVD (6*8)(R19), R5 // a6
+ MOVD (7*8)(R19), R6 // a7
+ MOVD (8*8)(R19), R7 // a8
+ MOVD (9*8)(R19), R8 // a9
+ MOVD (10*8)(R19), R9 // a10
+ MOVD $0, R10 // vararg
+
+ CALL R11
+
+ MOVD R0, (11*8)(R19) // r1
+ MOVD R1, (12*8)(R19) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMPW $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVW (R0), R0
+ MOVD R0, (13*8)(R19) // err
+
+ok:
+ RET
+
+// syscall10X calls a function in libc on behalf of the syscall package.
+// syscall10X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// a7 uintptr
+// a8 uintptr
+// a9 uintptr
+// a10 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall10X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall10X is like syscall10 but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscall10X(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+
+ MOVD (0*8)(R19), R11 // fn
+ MOVD (1*8)(R19), R0 // a1
+ MOVD (2*8)(R19), R1 // a2
+ MOVD (3*8)(R19), R2 // a3
+ MOVD (4*8)(R19), R3 // a4
+ MOVD (5*8)(R19), R4 // a5
+ MOVD (6*8)(R19), R5 // a6
+ MOVD (7*8)(R19), R6 // a7
+ MOVD (8*8)(R19), R7 // a8
+ MOVD (9*8)(R19), R8 // a9
+ MOVD (10*8)(R19), R9 // a10
+ MOVD $0, R10 // vararg
+
+ CALL R11
+
+ MOVD R0, (11*8)(R19) // r1
+ MOVD R1, (12*8)(R19) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMP $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVW (R0), R0
+ MOVD R0, (13*8)(R19) // err
+
+ok:
+ RET
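The six trampolines above all consume the same kind of argument block. A minimal Go-side sketch of that layout, assuming illustrative field names (the real definitions live in the syscall package and the runtime, not here):

package sketch

// libcArgs mirrors the struct documented in the comments above for the
// three-argument variants (syscall, syscallX); the syscall6*/syscall10*
// variants extend it with a4..a6 / a4..a10 ahead of r1.
type libcArgs struct {
	fn         uintptr // libc function to call (loaded into R11)
	a1, a2, a3 uintptr // integer arguments, placed in R0-R2
	r1, r2     uintptr // raw return values (R0, R1) after the call
	err        uintptr // errno, stored only when the call returns -1
}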
diff --git a/src/runtime/sys_openbsd_mips64.s b/src/runtime/sys_openbsd_mips64.s
new file mode 100644
index 0000000..3e4d209
--- /dev/null
+++ b/src/runtime/sys_openbsd_mips64.s
@@ -0,0 +1,400 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for mips64, OpenBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME $0
+#define CLOCK_MONOTONIC $3
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0
+ MOVW code+0(FP), R4 // arg 1 - status
+ MOVV $1, R2 // sys_exit
+ SYSCALL
+ BEQ R7, 3(PC)
+ MOVV $0, R2 // crash on syscall failure
+ MOVV R2, (R2)
+ RET
+
+// func exitThread(wait *uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0
+ MOVV wait+0(FP), R4 // arg 1 - notdead
+ MOVV $302, R2 // sys___threxit
+ SYSCALL
+ MOVV $0, R2 // crash on syscall failure
+ MOVV R2, (R2)
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0
+ MOVV name+0(FP), R4 // arg 1 - path
+ MOVW mode+8(FP), R5 // arg 2 - mode
+ MOVW perm+12(FP), R6 // arg 3 - perm
+ MOVV $5, R2 // sys_open
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R4 // arg 1 - fd
+ MOVV $6, R2 // sys_close
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+8(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R4 // arg 1 - fd
+ MOVV p+8(FP), R5 // arg 2 - buf
+ MOVW n+16(FP), R6 // arg 3 - nbyte
+ MOVV $3, R2 // sys_read
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+// func pipe() (r, w int32, errno int32)
+TEXT runtime·pipe(SB),NOSPLIT|NOFRAME,$0-12
+ MOVV $r+0(FP), R4
+ MOVW $0, R5
+ MOVV $101, R2 // sys_pipe2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, errno+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ MOVV $r+8(FP), R4
+ MOVW flags+0(FP), R5
+ MOVV $101, R2 // sys_pipe2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, errno+16(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0
+ MOVV fd+0(FP), R4 // arg 1 - fd
+ MOVV p+8(FP), R5 // arg 2 - buf
+ MOVW n+16(FP), R6 // arg 3 - nbyte
+ MOVV $4, R2 // sys_write
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$24-4
+ MOVWU usec+0(FP), R3
+ MOVV R3, R5
+ MOVW $1000000, R4
+ DIVVU R4, R3
+ MOVV LO, R3
+ MOVV R3, 8(R29) // tv_sec
+ MOVW $1000, R4
+ MULVU R3, R4
+ MOVV LO, R4
+ SUBVU R4, R5
+ MOVV R5, 16(R29) // tv_nsec
+
+ ADDV $8, R29, R4 // arg 1 - rqtp
+ MOVV $0, R5 // arg 2 - rmtp
+ MOVV $91, R2 // sys_nanosleep
+ SYSCALL
+ RET
+
+TEXT runtime·getthrid(SB),NOSPLIT,$0-4
+ MOVV $299, R2 // sys_getthrid
+ SYSCALL
+ MOVW R2, ret+0(FP)
+ RET
+
+TEXT runtime·thrkill(SB),NOSPLIT,$0-16
+ MOVW tid+0(FP), R4 // arg 1 - tid
+ MOVV sig+8(FP), R5 // arg 2 - signum
+ MOVW $0, R6 // arg 3 - tcb
+ MOVV $119, R2 // sys_thrkill
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$0
+ MOVV $20, R4 // sys_getpid
+ SYSCALL
+ MOVV R2, R4 // arg 1 - pid
+ MOVW sig+0(FP), R5 // arg 2 - signum
+ MOVV $122, R2 // sys_kill
+ SYSCALL
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$0
+ MOVV addr+0(FP), R4 // arg 1 - addr
+ MOVV n+8(FP), R5 // arg 2 - len
+ MOVW prot+16(FP), R6 // arg 3 - prot
+ MOVW flags+20(FP), R7 // arg 4 - flags
+ MOVW fd+24(FP), R8 // arg 5 - fd
+ MOVW $0, R9 // arg 6 - pad
+ MOVW off+28(FP), R10 // arg 7 - offset
+ MOVV $197, R2 // sys_mmap
+ SYSCALL
+ MOVV $0, R4
+ BEQ R7, 3(PC)
+ MOVV R2, R4 // if error, move to R4
+ MOVV $0, R2
+ MOVV R2, p+32(FP)
+ MOVV R4, err+40(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVV addr+0(FP), R4 // arg 1 - addr
+ MOVV n+8(FP), R5 // arg 2 - len
+ MOVV $73, R2 // sys_munmap
+ SYSCALL
+ BEQ R7, 3(PC)
+ MOVV $0, R2 // crash on syscall failure
+ MOVV R2, (R2)
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVV addr+0(FP), R4 // arg 1 - addr
+ MOVV n+8(FP), R5 // arg 2 - len
+ MOVW flags+16(FP), R6 // arg 3 - flags
+ MOVV $75, R2 // sys_madvise
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+24(FP)
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$0
+ MOVW mode+0(FP), R4 // arg 1 - mode
+ MOVV new+8(FP), R5 // arg 2 - new value
+ MOVV old+16(FP), R6 // arg 3 - old value
+ MOVV $69, R2 // sys_setitimer
+ SYSCALL
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB), NOSPLIT, $32
+ MOVW CLOCK_REALTIME, R4 // arg 1 - clock_id
+ MOVV $8(R29), R5 // arg 2 - tp
+ MOVV $87, R2 // sys_clock_gettime
+ SYSCALL
+
+ MOVV 8(R29), R4 // sec
+ MOVV 16(R29), R5 // nsec
+ MOVV R4, sec+0(FP)
+ MOVW R5, nsec+8(FP)
+
+ RET
+
+// int64 nanotime1(void) so really
+// void nanotime1(int64 *nsec)
+TEXT runtime·nanotime1(SB),NOSPLIT,$32
+ MOVW CLOCK_MONOTONIC, R4 // arg 1 - clock_id
+ MOVV $8(R29), R5 // arg 2 - tp
+ MOVV $87, R2 // sys_clock_gettime
+ SYSCALL
+
+ MOVV 8(R29), R3 // sec
+ MOVV 16(R29), R5 // nsec
+
+ MOVV $1000000000, R4
+ MULVU R4, R3
+ MOVV LO, R3
+ ADDVU R5, R3
+ MOVV R3, ret+0(FP)
+ RET
+
+TEXT runtime·sigaction(SB),NOSPLIT,$0
+ MOVW sig+0(FP), R4 // arg 1 - signum
+ MOVV new+8(FP), R5 // arg 2 - new sigaction
+ MOVV old+16(FP), R6 // arg 3 - old sigaction
+ MOVV $46, R2 // sys_sigaction
+ SYSCALL
+ BEQ R7, 3(PC)
+ MOVV $3, R2 // crash on syscall failure
+ MOVV R2, (R2)
+ RET
+
+TEXT runtime·obsdsigprocmask(SB),NOSPLIT,$0
+ MOVW how+0(FP), R4 // arg 1 - mode
+ MOVW new+4(FP), R5 // arg 2 - new
+ MOVV $48, R2 // sys_sigprocmask
+ SYSCALL
+ BEQ R7, 3(PC)
+ MOVV $3, R2 // crash on syscall failure
+ MOVV R2, (R2)
+ MOVW R2, ret+8(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R4
+ MOVV info+16(FP), R5
+ MOVV ctx+24(FP), R6
+ MOVV fn+0(FP), R25 // Must use R25, needed for PIC code.
+ CALL (R25)
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT,$192
+ // initialize REGSB = PC&0xffffffff00000000
+ BGEZAL R0, 1(PC)
+ SRLV $32, R31, RSB
+ SLLV $32, RSB
+
+ // this might be called in external code context,
+ // where g is not set.
+ MOVB runtime·iscgo(SB), R1
+ BEQ R1, 2(PC)
+ JAL runtime·load_g(SB)
+
+ MOVW R4, 8(R29)
+ MOVV R5, 16(R29)
+ MOVV R6, 24(R29)
+ MOVV $runtime·sigtrampgo(SB), R1
+ JAL (R1)
+ RET
+
+// int32 tfork(void *param, uintptr psize, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·tfork(SB),NOSPLIT,$0
+
+ // Copy mp, gp and fn off parent stack for use by child.
+ MOVV mm+16(FP), R16
+ MOVV gg+24(FP), R17
+ MOVV fn+32(FP), R18
+
+ MOVV param+0(FP), R4 // arg 1 - param
+ MOVV psize+8(FP), R5 // arg 2 - psize
+ MOVV $8, R2 // sys___tfork
+ SYSCALL
+
+ // Return if syscall failed.
+ BEQ R7, 4(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+40(FP)
+ RET
+
+ // In parent, return.
+ BEQ R2, 3(PC)
+ MOVW R2, ret+40(FP)
+ RET
+
+ // Initialise m, g.
+ MOVV R17, g
+ MOVV R16, g_m(g)
+
+ // Call fn.
+ CALL (R18)
+
+ // fn should never return.
+ MOVV $2, R8 // crash if reached
+ MOVV R8, (R8)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$0
+ MOVV new+0(FP), R4 // arg 1 - new sigaltstack
+ MOVV old+8(FP), R5 // arg 2 - old sigaltstack
+ MOVV $288, R2 // sys_sigaltstack
+ SYSCALL
+ BEQ R7, 3(PC)
+ MOVV $0, R8 // crash on syscall failure
+ MOVV R8, (R8)
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ MOVV $298, R2 // sys_sched_yield
+ SYSCALL
+ RET
+
+TEXT runtime·thrsleep(SB),NOSPLIT,$0
+ MOVV ident+0(FP), R4 // arg 1 - ident
+ MOVW clock_id+8(FP), R5 // arg 2 - clock_id
+ MOVV tsp+16(FP), R6 // arg 3 - tsp
+ MOVV lock+24(FP), R7 // arg 4 - lock
+ MOVV abort+32(FP), R8 // arg 5 - abort
+ MOVV $94, R2 // sys___thrsleep
+ SYSCALL
+ MOVW R2, ret+40(FP)
+ RET
+
+TEXT runtime·thrwakeup(SB),NOSPLIT,$0
+ MOVV ident+0(FP), R4 // arg 1 - ident
+ MOVW n+8(FP), R5 // arg 2 - n
+ MOVV $301, R2 // sys___thrwakeup
+ SYSCALL
+ MOVW R2, ret+16(FP)
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOVV mib+0(FP), R4 // arg 1 - mib
+ MOVW miblen+8(FP), R5 // arg 2 - miblen
+ MOVV out+16(FP), R6 // arg 3 - out
+ MOVV size+24(FP), R7 // arg 4 - size
+ MOVV dst+32(FP), R8 // arg 5 - dest
+ MOVV ndst+40(FP), R9 // arg 6 - newlen
+ MOVV $202, R2 // sys___sysctl
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+48(FP)
+ RET
+
+// int32 runtime·kqueue(void);
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVV $269, R2 // sys_kqueue
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout);
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVW kq+0(FP), R4 // arg 1 - kq
+ MOVV ch+8(FP), R5 // arg 2 - changelist
+ MOVW nch+16(FP), R6 // arg 3 - nchanges
+ MOVV ev+24(FP), R7 // arg 4 - eventlist
+ MOVW nev+32(FP), R8 // arg 5 - nevents
+ MOVV ts+40(FP), R9 // arg 6 - timeout
+ MOVV $72, R2 // sys_kevent
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+48(FP)
+ RET
+
+// func closeonexec(fd int32)
+TEXT runtime·closeonexec(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R4 // arg 1 - fd
+ MOVV $2, R5 // arg 2 - cmd (F_SETFD)
+ MOVV $1, R6 // arg 3 - arg (FD_CLOEXEC)
+ MOVV $92, R2 // sys_fcntl
+ SYSCALL
+ RET
+
+// func runtime·setNonblock(int32 fd)
+TEXT runtime·setNonblock(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW fd+0(FP), R4 // arg 1 - fd
+ MOVV $3, R5 // arg 2 - cmd (F_GETFL)
+ MOVV $0, R6 // arg 3
+ MOVV $92, R2 // sys_fcntl
+ SYSCALL
+ MOVV $4, R6 // O_NONBLOCK
+ OR R2, R6 // arg 3 - flags
+ MOVW fd+0(FP), R4 // arg 1 - fd
+ MOVV $4, R5 // arg 2 - cmd (F_SETFL)
+ MOVV $92, R2 // sys_fcntl
+ SYSCALL
+ RET
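In libc terms, setNonblock is the usual F_GETFL / F_SETFL dance. A hedged user-space sketch using golang.org/x/sys/unix (the runtime itself must issue raw SYSCALLs and cannot import this package; the snippet only restates the intent):

package main

import (
	"log"

	"golang.org/x/sys/unix"
)

// setNonblock mirrors the two fcntl calls in the assembly above: read the
// current file status flags, then set them again with O_NONBLOCK added.
func setNonblock(fd int) error {
	flags, err := unix.FcntlInt(uintptr(fd), unix.F_GETFL, 0)
	if err != nil {
		return err
	}
	_, err = unix.FcntlInt(uintptr(fd), unix.F_SETFL, flags|unix.O_NONBLOCK)
	return err
}

func main() {
	if err := setNonblock(0); err != nil { // stdin, just as a demo
		log.Fatal(err)
	}
}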
diff --git a/src/runtime/sys_plan9_386.s b/src/runtime/sys_plan9_386.s
new file mode 100644
index 0000000..f9969f6
--- /dev/null
+++ b/src/runtime/sys_plan9_386.s
@@ -0,0 +1,252 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// setldt(int entry, int address, int limit)
+TEXT runtime·setldt(SB),NOSPLIT,$0
+ RET
+
+TEXT runtime·open(SB),NOSPLIT,$0
+ MOVL $14, AX
+ INT $64
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·pread(SB),NOSPLIT,$0
+ MOVL $50, AX
+ INT $64
+ MOVL AX, ret+20(FP)
+ RET
+
+TEXT runtime·pwrite(SB),NOSPLIT,$0
+ MOVL $51, AX
+ INT $64
+ MOVL AX, ret+20(FP)
+ RET
+
+// int32 _seek(int64*, int32, int64, int32)
+TEXT _seek<>(SB),NOSPLIT,$0
+ MOVL $39, AX
+ INT $64
+ RET
+
+TEXT runtime·seek(SB),NOSPLIT,$24
+ LEAL ret+16(FP), AX
+ MOVL fd+0(FP), BX
+ MOVL offset_lo+4(FP), CX
+ MOVL offset_hi+8(FP), DX
+ MOVL whence+12(FP), SI
+ MOVL AX, 0(SP)
+ MOVL BX, 4(SP)
+ MOVL CX, 8(SP)
+ MOVL DX, 12(SP)
+ MOVL SI, 16(SP)
+ CALL _seek<>(SB)
+ CMPL AX, $0
+ JGE 3(PC)
+ MOVL $-1, ret_lo+16(FP)
+ MOVL $-1, ret_hi+20(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$0
+ MOVL $4, AX
+ INT $64
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·exits(SB),NOSPLIT,$0
+ MOVL $8, AX
+ INT $64
+ RET
+
+TEXT runtime·brk_(SB),NOSPLIT,$0
+ MOVL $24, AX
+ INT $64
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·sleep(SB),NOSPLIT,$0
+ MOVL $17, AX
+ INT $64
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·plan9_semacquire(SB),NOSPLIT,$0
+ MOVL $37, AX
+ INT $64
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·plan9_tsemacquire(SB),NOSPLIT,$0
+ MOVL $52, AX
+ INT $64
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT nsec<>(SB),NOSPLIT,$0
+ MOVL $53, AX
+ INT $64
+ RET
+
+TEXT runtime·nsec(SB),NOSPLIT,$8
+ LEAL ret+4(FP), AX
+ MOVL AX, 0(SP)
+ CALL nsec<>(SB)
+ CMPL AX, $0
+ JGE 3(PC)
+ MOVL $-1, ret_lo+4(FP)
+ MOVL $-1, ret_hi+8(FP)
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB),NOSPLIT,$8-12
+ CALL runtime·nanotime1(SB)
+ MOVL 0(SP), AX
+ MOVL 4(SP), DX
+
+ MOVL $1000000000, CX
+ DIVL CX
+ MOVL AX, sec_lo+0(FP)
+ MOVL $0, sec_hi+4(FP)
+ MOVL DX, nsec+8(FP)
+ RET
+
+TEXT runtime·notify(SB),NOSPLIT,$0
+ MOVL $28, AX
+ INT $64
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·noted(SB),NOSPLIT,$0
+ MOVL $29, AX
+ INT $64
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·plan9_semrelease(SB),NOSPLIT,$0
+ MOVL $38, AX
+ INT $64
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·rfork(SB),NOSPLIT,$0
+ MOVL $19, AX
+ INT $64
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·tstart_plan9(SB),NOSPLIT,$4
+ MOVL newm+0(FP), CX
+ MOVL m_g0(CX), DX
+
+ // Layout new m scheduler stack on os stack.
+ MOVL SP, AX
+ MOVL AX, (g_stack+stack_hi)(DX)
+ SUBL $(64*1024), AX // stack size
+ MOVL AX, (g_stack+stack_lo)(DX)
+ MOVL AX, g_stackguard0(DX)
+ MOVL AX, g_stackguard1(DX)
+
+ // Initialize procid from TOS struct.
+ MOVL _tos(SB), AX
+ MOVL 48(AX), AX
+ MOVL AX, m_procid(CX) // save pid as m->procid
+
+ // Finally, initialize g.
+ get_tls(BX)
+ MOVL DX, g(BX)
+
+ CALL runtime·stackcheck(SB) // smashes AX, CX
+ CALL runtime·mstart(SB)
+
+ // Exit the thread.
+ MOVL $0, 0(SP)
+ CALL runtime·exits(SB)
+ JMP 0(PC)
+
+// void sigtramp(void *ureg, int8 *note)
+TEXT runtime·sigtramp(SB),NOSPLIT,$0
+ get_tls(AX)
+
+ // check that g exists
+ MOVL g(AX), BX
+ CMPL BX, $0
+ JNE 3(PC)
+ CALL runtime·badsignal2(SB) // will exit
+ RET
+
+ // save args
+ MOVL ureg+0(FP), CX
+ MOVL note+4(FP), DX
+
+ // change stack
+ MOVL g_m(BX), BX
+ MOVL m_gsignal(BX), BP
+ MOVL (g_stack+stack_hi)(BP), BP
+ MOVL BP, SP
+
+ // make room for args and g
+ SUBL $24, SP
+
+ // save g
+ MOVL g(AX), BP
+ MOVL BP, 20(SP)
+
+ // g = m->gsignal
+ MOVL m_gsignal(BX), DI
+ MOVL DI, g(AX)
+
+ // load args and call sighandler
+ MOVL CX, 0(SP)
+ MOVL DX, 4(SP)
+ MOVL BP, 8(SP)
+
+ CALL runtime·sighandler(SB)
+ MOVL 12(SP), AX
+
+ // restore g
+ get_tls(BX)
+ MOVL 20(SP), BP
+ MOVL BP, g(BX)
+
+ // call noted(AX)
+ MOVL AX, 0(SP)
+ CALL runtime·noted(SB)
+ RET
+
+// Only used by the 64-bit runtime.
+TEXT runtime·setfpmasks(SB),NOSPLIT,$0
+ RET
+
+#define ERRMAX 128 /* from os_plan9.h */
+
+// void errstr(int8 *buf, int32 len)
+TEXT errstr<>(SB),NOSPLIT,$0
+ MOVL $41, AX
+ INT $64
+ RET
+
+// func errstr() string
+// Only used by package syscall.
+// Grab error string due to a syscall made
+// in entersyscall mode, without going
+// through the allocator (issue 4994).
+// See ../syscall/asm_plan9_386.s:/·Syscall/
+TEXT runtime·errstr(SB),NOSPLIT,$8-8
+ get_tls(AX)
+ MOVL g(AX), BX
+ MOVL g_m(BX), BX
+ MOVL (m_mOS+mOS_errstr)(BX), CX
+ MOVL CX, 0(SP)
+ MOVL $ERRMAX, 4(SP)
+ CALL errstr<>(SB)
+ CALL runtime·findnull(SB)
+ MOVL 4(SP), AX
+ MOVL AX, ret_len+4(FP)
+ MOVL 0(SP), AX
+ MOVL AX, ret_base+0(FP)
+ RET
diff --git a/src/runtime/sys_plan9_amd64.s b/src/runtime/sys_plan9_amd64.s
new file mode 100644
index 0000000..383622b
--- /dev/null
+++ b/src/runtime/sys_plan9_amd64.s
@@ -0,0 +1,253 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+TEXT runtime·open(SB),NOSPLIT,$0
+ MOVQ $14, BP
+ SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·pread(SB),NOSPLIT,$0
+ MOVQ $50, BP
+ SYSCALL
+ MOVL AX, ret+32(FP)
+ RET
+
+TEXT runtime·pwrite(SB),NOSPLIT,$0
+ MOVQ $51, BP
+ SYSCALL
+ MOVL AX, ret+32(FP)
+ RET
+
+// int32 _seek(int64*, int32, int64, int32)
+TEXT _seek<>(SB),NOSPLIT,$0
+ MOVQ $39, BP
+ SYSCALL
+ RET
+
+// int64 seek(int32, int64, int32)
+// Convenience wrapper around _seek, the actual system call.
+TEXT runtime·seek(SB),NOSPLIT,$32
+ LEAQ ret+24(FP), AX
+ MOVL fd+0(FP), BX
+ MOVQ offset+8(FP), CX
+ MOVL whence+16(FP), DX
+ MOVQ AX, 0(SP)
+ MOVL BX, 8(SP)
+ MOVQ CX, 16(SP)
+ MOVL DX, 24(SP)
+ CALL _seek<>(SB)
+ CMPL AX, $0
+ JGE 2(PC)
+ MOVQ $-1, ret+24(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$0
+ MOVQ $4, BP
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·exits(SB),NOSPLIT,$0
+ MOVQ $8, BP
+ SYSCALL
+ RET
+
+TEXT runtime·brk_(SB),NOSPLIT,$0
+ MOVQ $24, BP
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·sleep(SB),NOSPLIT,$0
+ MOVQ $17, BP
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·plan9_semacquire(SB),NOSPLIT,$0
+ MOVQ $37, BP
+ SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·plan9_tsemacquire(SB),NOSPLIT,$0
+ MOVQ $52, BP
+ SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·nsec(SB),NOSPLIT,$0
+ MOVQ $53, BP
+ SYSCALL
+ MOVQ AX, ret+8(FP)
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB),NOSPLIT,$8-12
+ CALL runtime·nanotime1(SB)
+ MOVQ 0(SP), AX
+
+ // generated code for
+ // func f(x uint64) (uint64, uint64) { return x/1000000000, x%1000000000 }
+ // adapted to reduce duplication
+ MOVQ AX, CX
+ MOVQ $1360296554856532783, AX
+ MULQ CX
+ ADDQ CX, DX
+ RCRQ $1, DX
+ SHRQ $29, DX
+ MOVQ DX, sec+0(FP)
+ IMULQ $1000000000, DX
+ SUBQ DX, CX
+ MOVL CX, nsec+8(FP)
+ RET
+
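The magic constant above is the usual reciprocal-multiplication trick for dividing by 10**9: 1360296554856532783 + 2**64 equals ceil(2**94 / 10**9), so the MULQ/ADDQ/RCRQ/SHRQ sequence computes floor(x / 10**9) without a DIVQ. A small Go sketch that mirrors the same bit manipulation with math/bits (illustrative only, not runtime code):

package main

import (
	"fmt"
	"math/bits"
)

// div1e9 reproduces the assembly sequence in walltime1:
// result = floor(x * (1360296554856532783 + 2**64) / 2**94) = x / 1e9.
func div1e9(x uint64) uint64 {
	hi, _ := bits.Mul64(x, 1360296554856532783) // MULQ: high 64 bits of the product
	sum, carry := bits.Add64(hi, x, 0)          // ADDQ CX, DX (carry plays the role of CF)
	return sum>>30 | carry<<34                  // RCRQ $1 then SHRQ $29: 65-bit value >> 30
}

func main() {
	for _, x := range []uint64{0, 999999999, 1000000000, 1234567890123456789} {
		fmt.Println(div1e9(x) == x/1000000000) // prints true
	}
}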
+TEXT runtime·notify(SB),NOSPLIT,$0
+ MOVQ $28, BP
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·noted(SB),NOSPLIT,$0
+ MOVQ $29, BP
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·plan9_semrelease(SB),NOSPLIT,$0
+ MOVQ $38, BP
+ SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·rfork(SB),NOSPLIT,$0
+ MOVQ $19, BP
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·tstart_plan9(SB),NOSPLIT,$8
+ MOVQ newm+0(FP), CX
+ MOVQ m_g0(CX), DX
+
+ // Layout new m scheduler stack on os stack.
+ MOVQ SP, AX
+ MOVQ AX, (g_stack+stack_hi)(DX)
+ SUBQ $(64*1024), AX // stack size
+ MOVQ AX, (g_stack+stack_lo)(DX)
+ MOVQ AX, g_stackguard0(DX)
+ MOVQ AX, g_stackguard1(DX)
+
+ // Initialize procid from TOS struct.
+ MOVQ _tos(SB), AX
+ MOVL 64(AX), AX
+ MOVQ AX, m_procid(CX) // save pid as m->procid
+
+ // Finally, initialize g.
+ get_tls(BX)
+ MOVQ DX, g(BX)
+
+ CALL runtime·stackcheck(SB) // smashes AX, CX
+ CALL runtime·mstart(SB)
+
+ // Exit the thread.
+ MOVQ $0, 0(SP)
+ CALL runtime·exits(SB)
+ JMP 0(PC)
+
+// This is needed by asm_amd64.s
+TEXT runtime·settls(SB),NOSPLIT,$0
+ RET
+
+// void sigtramp(void *ureg, int8 *note)
+TEXT runtime·sigtramp(SB),NOSPLIT,$0
+ get_tls(AX)
+
+ // check that g exists
+ MOVQ g(AX), BX
+ CMPQ BX, $0
+ JNE 3(PC)
+ CALL runtime·badsignal2(SB) // will exit
+ RET
+
+ // save args
+ MOVQ ureg+0(FP), CX
+ MOVQ note+8(FP), DX
+
+ // change stack
+ MOVQ g_m(BX), BX
+ MOVQ m_gsignal(BX), R10
+ MOVQ (g_stack+stack_hi)(R10), BP
+ MOVQ BP, SP
+
+ // make room for args and g
+ SUBQ $128, SP
+
+ // save g
+ MOVQ g(AX), BP
+ MOVQ BP, 32(SP)
+
+ // g = m->gsignal
+ MOVQ R10, g(AX)
+
+ // load args and call sighandler
+ MOVQ CX, 0(SP)
+ MOVQ DX, 8(SP)
+ MOVQ BP, 16(SP)
+
+ CALL runtime·sighandler(SB)
+ MOVL 24(SP), AX
+
+ // restore g
+ get_tls(BX)
+ MOVQ 32(SP), R10
+ MOVQ R10, g(BX)
+
+ // call noted(AX)
+ MOVQ AX, 0(SP)
+ CALL runtime·noted(SB)
+ RET
+
+TEXT runtime·setfpmasks(SB),NOSPLIT,$8
+ STMXCSR 0(SP)
+ MOVL 0(SP), AX
+ ANDL $~0x3F, AX
+ ORL $(0x3F<<7), AX
+ MOVL AX, 0(SP)
+ LDMXCSR 0(SP)
+ RET
+
+#define ERRMAX 128 /* from os_plan9.h */
+
+// void errstr(int8 *buf, int32 len)
+TEXT errstr<>(SB),NOSPLIT,$0
+ MOVQ $41, BP
+ SYSCALL
+ RET
+
+// func errstr() string
+// Only used by package syscall.
+// Grab error string due to a syscall made
+// in entersyscall mode, without going
+// through the allocator (issue 4994).
+// See ../syscall/asm_plan9_amd64.s:/·Syscall/
+TEXT runtime·errstr(SB),NOSPLIT,$16-16
+ get_tls(AX)
+ MOVQ g(AX), BX
+ MOVQ g_m(BX), BX
+ MOVQ (m_mOS+mOS_errstr)(BX), CX
+ MOVQ CX, 0(SP)
+ MOVQ $ERRMAX, 8(SP)
+ CALL errstr<>(SB)
+ CALL runtime·findnull(SB)
+ MOVQ 8(SP), AX
+ MOVQ AX, ret_len+8(FP)
+ MOVQ 0(SP), AX
+ MOVQ AX, ret_base+0(FP)
+ RET
diff --git a/src/runtime/sys_plan9_arm.s b/src/runtime/sys_plan9_arm.s
new file mode 100644
index 0000000..9fbe305
--- /dev/null
+++ b/src/runtime/sys_plan9_arm.s
@@ -0,0 +1,320 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// from ../syscall/zsysnum_plan9.go
+
+#define SYS_SYSR1 0
+#define SYS_BIND 2
+#define SYS_CHDIR 3
+#define SYS_CLOSE 4
+#define SYS_DUP 5
+#define SYS_ALARM 6
+#define SYS_EXEC 7
+#define SYS_EXITS 8
+#define SYS_FAUTH 10
+#define SYS_SEGBRK 12
+#define SYS_OPEN 14
+#define SYS_OSEEK 16
+#define SYS_SLEEP 17
+#define SYS_RFORK 19
+#define SYS_PIPE 21
+#define SYS_CREATE 22
+#define SYS_FD2PATH 23
+#define SYS_BRK_ 24
+#define SYS_REMOVE 25
+#define SYS_NOTIFY 28
+#define SYS_NOTED 29
+#define SYS_SEGATTACH 30
+#define SYS_SEGDETACH 31
+#define SYS_SEGFREE 32
+#define SYS_SEGFLUSH 33
+#define SYS_RENDEZVOUS 34
+#define SYS_UNMOUNT 35
+#define SYS_SEMACQUIRE 37
+#define SYS_SEMRELEASE 38
+#define SYS_SEEK 39
+#define SYS_FVERSION 40
+#define SYS_ERRSTR 41
+#define SYS_STAT 42
+#define SYS_FSTAT 43
+#define SYS_WSTAT 44
+#define SYS_FWSTAT 45
+#define SYS_MOUNT 46
+#define SYS_AWAIT 47
+#define SYS_PREAD 50
+#define SYS_PWRITE 51
+#define SYS_TSEMACQUIRE 52
+#define SYS_NSEC 53
+
+//func open(name *byte, mode, perm int32) int32
+TEXT runtime·open(SB),NOSPLIT,$0-16
+ MOVW $SYS_OPEN, R0
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+//func pread(fd int32, buf unsafe.Pointer, nbytes int32, offset int64) int32
+TEXT runtime·pread(SB),NOSPLIT,$0-24
+ MOVW $SYS_PREAD, R0
+ SWI $0
+ MOVW R0, ret+20(FP)
+ RET
+
+//func pwrite(fd int32, buf unsafe.Pointer, nbytes int32, offset int64) int32
+TEXT runtime·pwrite(SB),NOSPLIT,$0-24
+ MOVW $SYS_PWRITE, R0
+ SWI $0
+ MOVW R0, ret+20(FP)
+ RET
+
+//func seek(fd int32, offset int64, whence int32) int64
+TEXT runtime·seek(SB),NOSPLIT,$0-24
+ MOVW $ret_lo+16(FP), R0
+ MOVW 0(R13), R1
+ MOVW R0, 0(R13)
+ MOVW.W R1, -4(R13)
+ MOVW $SYS_SEEK, R0
+ SWI $0
+ MOVW.W R1, 4(R13)
+ CMP $-1, R0
+ MOVW.EQ R0, ret_lo+16(FP)
+ MOVW.EQ R0, ret_hi+20(FP)
+ RET
+
+//func closefd(fd int32) int32
+TEXT runtime·closefd(SB),NOSPLIT,$0-8
+ MOVW $SYS_CLOSE, R0
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+//func exits(msg *byte)
+TEXT runtime·exits(SB),NOSPLIT,$0-4
+ MOVW $SYS_EXITS, R0
+ SWI $0
+ RET
+
+//func brk_(addr unsafe.Pointer) int32
+TEXT runtime·brk_(SB),NOSPLIT,$0-8
+ MOVW $SYS_BRK_, R0
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+//func sleep(ms int32) int32
+TEXT runtime·sleep(SB),NOSPLIT,$0-8
+ MOVW $SYS_SLEEP, R0
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+//func plan9_semacquire(addr *uint32, block int32) int32
+TEXT runtime·plan9_semacquire(SB),NOSPLIT,$0-12
+ MOVW $SYS_SEMACQUIRE, R0
+ SWI $0
+ MOVW R0, ret+8(FP)
+ RET
+
+//func plan9_tsemacquire(addr *uint32, ms int32) int32
+TEXT runtime·plan9_tsemacquire(SB),NOSPLIT,$0-12
+ MOVW $SYS_TSEMACQUIRE, R0
+ SWI $0
+ MOVW R0, ret+8(FP)
+ RET
+
+//func nsec(*int64) int64
+TEXT runtime·nsec(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW $SYS_NSEC, R0
+ SWI $0
+ MOVW arg+0(FP), R1
+ MOVW 0(R1), R0
+ MOVW R0, ret_lo+4(FP)
+ MOVW 4(R1), R0
+ MOVW R0, ret_hi+8(FP)
+ RET
+
+// func walltime1() (sec int64, nsec int32)
+TEXT runtime·walltime1(SB),NOSPLIT,$12-12
+ // use nsec system call to get current time in nanoseconds
+ MOVW $sysnsec_lo-8(SP), R0 // destination addr
+ MOVW R0,res-12(SP)
+ MOVW $SYS_NSEC, R0
+ SWI $0
+ MOVW sysnsec_lo-8(SP), R1 // R1:R2 = nsec
+ MOVW sysnsec_hi-4(SP), R2
+
+ // multiply nanoseconds by reciprocal of 10**9 (scaled by 2**61)
+ // to get seconds (96 bit scaled result)
+ MOVW $0x89705f41, R3 // 2**61 * 10**-9
+ MULLU R1,R3,(R6,R5) // R5:R6:R7 = R1:R2 * R3
+ MOVW $0,R7
+ MULALU R2,R3,(R7,R6)
+
+ // unscale by discarding low 32 bits, shifting the rest by 29
+ MOVW R6>>29,R6 // R6:R7 = (R5:R6:R7 >> 61)
+ ORR R7<<3,R6
+ MOVW R7>>29,R7
+
+ // subtract (10**9 * sec) from nsec to get nanosecond remainder
+ MOVW $1000000000, R5 // 10**9
+ MULLU R6,R5,(R9,R8) // R8:R9 = R6:R7 * R5
+ MULA R7,R5,R9,R9
+ SUB.S R8,R1 // R1:R2 -= R8:R9
+ SBC R9,R2
+
+ // because reciprocal was a truncated repeating fraction, quotient
+ // may be slightly too small -- adjust to make remainder < 10**9
+ CMP R5,R1 // if remainder > 10**9
+ SUB.HS R5,R1 // remainder -= 10**9
+ ADD.HS $1,R6 // sec += 1
+
+ MOVW R6,sec_lo+0(FP)
+ MOVW R7,sec_hi+4(FP)
+ MOVW R1,nsec+8(FP)
+ RET
+
+//func notify(fn unsafe.Pointer) int32
+TEXT runtime·notify(SB),NOSPLIT,$0-8
+ MOVW $SYS_NOTIFY, R0
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+//func noted(mode int32) int32
+TEXT runtime·noted(SB),NOSPLIT,$0-8
+ MOVW $SYS_NOTED, R0
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+//func plan9_semrelease(addr *uint32, count int32) int32
+TEXT runtime·plan9_semrelease(SB),NOSPLIT,$0-12
+ MOVW $SYS_SEMRELEASE, R0
+ SWI $0
+ MOVW R0, ret+8(FP)
+ RET
+
+//func rfork(flags int32) int32
+TEXT runtime·rfork(SB),NOSPLIT,$0-8
+ MOVW $SYS_RFORK, R0
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+//func tstart_plan9(newm *m)
+TEXT runtime·tstart_plan9(SB),NOSPLIT,$4-4
+ MOVW newm+0(FP), R1
+ MOVW m_g0(R1), g
+
+ // Layout new m scheduler stack on os stack.
+ MOVW R13, R0
+ MOVW R0, g_stack+stack_hi(g)
+ SUB $(64*1024), R0
+ MOVW R0, (g_stack+stack_lo)(g)
+ MOVW R0, g_stackguard0(g)
+ MOVW R0, g_stackguard1(g)
+
+ // Initialize procid from TOS struct.
+ MOVW _tos(SB), R0
+ MOVW 48(R0), R0
+ MOVW R0, m_procid(R1) // save pid as m->procid
+
+ BL runtime·mstart(SB)
+
+ // Exit the thread.
+ MOVW $0, R0
+ MOVW R0, 4(R13)
+ CALL runtime·exits(SB)
+ JMP 0(PC)
+
+//func sigtramp(ureg, note unsafe.Pointer)
+TEXT runtime·sigtramp(SB),NOSPLIT,$0-8
+ // check that g and m exist
+ CMP $0, g
+ BEQ 4(PC)
+ MOVW g_m(g), R0
+ CMP $0, R0
+ BNE 2(PC)
+ BL runtime·badsignal2(SB) // will exit
+
+ // save args
+ MOVW ureg+0(FP), R1
+ MOVW note+4(FP), R2
+
+ // change stack
+ MOVW m_gsignal(R0), R3
+ MOVW (g_stack+stack_hi)(R3), R13
+
+ // make room for args, retval and g
+ SUB $24, R13
+
+ // save g
+ MOVW g, R3
+ MOVW R3, 20(R13)
+
+ // g = m->gsignal
+ MOVW m_gsignal(R0), g
+
+ // load args and call sighandler
+ ADD $4,R13,R5
+ MOVM.IA [R1-R3], (R5)
+ BL runtime·sighandler(SB)
+ MOVW 16(R13), R0 // retval
+
+ // restore g
+ MOVW 20(R13), g
+
+ // call noted(R0)
+ MOVW R0, 4(R13)
+ BL runtime·noted(SB)
+ RET
+
+//func sigpanictramp()
+TEXT runtime·sigpanictramp(SB),NOSPLIT,$0-0
+ MOVW.W R0, -4(R13)
+ B runtime·sigpanic(SB)
+
+//func setfpmasks()
+// Only used by the 64-bit runtime.
+TEXT runtime·setfpmasks(SB),NOSPLIT,$0
+ RET
+
+#define ERRMAX 128 /* from os_plan9.h */
+
+// func errstr() string
+// Only used by package syscall.
+// Grab error string due to a syscall made
+// in entersyscall mode, without going
+// through the allocator (issue 4994).
+// See ../syscall/asm_plan9_arm.s:/·Syscall/
+TEXT runtime·errstr(SB),NOSPLIT,$0-8
+ MOVW g_m(g), R0
+ MOVW (m_mOS+mOS_errstr)(R0), R1
+ MOVW R1, ret_base+0(FP)
+ MOVW $ERRMAX, R2
+ MOVW R2, ret_len+4(FP)
+ MOVW $SYS_ERRSTR, R0
+ SWI $0
+ MOVW R1, R2
+ MOVBU 0(R2), R0
+ CMP $0, R0
+ BEQ 3(PC)
+ ADD $1, R2
+ B -4(PC)
+ SUB R1, R2
+ MOVW R2, ret_len+4(FP)
+ RET
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ B runtime·armPublicationBarrier(SB)
+
+// never called (cgo not supported)
+TEXT runtime·read_tls_fallback(SB),NOSPLIT|NOFRAME,$0
+ MOVW $0, R0
+ MOVW R0, (R0)
+ RET
diff --git a/src/runtime/sys_ppc64x.go b/src/runtime/sys_ppc64x.go
new file mode 100644
index 0000000..796f27c
--- /dev/null
+++ b/src/runtime/sys_ppc64x.go
@@ -0,0 +1,22 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64 ppc64le
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
+
+func prepGoExitFrame(sp uintptr)
diff --git a/src/runtime/sys_riscv64.go b/src/runtime/sys_riscv64.go
new file mode 100644
index 0000000..e710840
--- /dev/null
+++ b/src/runtime/sys_riscv64.go
@@ -0,0 +1,18 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
diff --git a/src/runtime/sys_s390x.go b/src/runtime/sys_s390x.go
new file mode 100644
index 0000000..e710840
--- /dev/null
+++ b/src/runtime/sys_s390x.go
@@ -0,0 +1,18 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
diff --git a/src/runtime/sys_solaris_amd64.s b/src/runtime/sys_solaris_amd64.s
new file mode 100644
index 0000000..05fd187
--- /dev/null
+++ b/src/runtime/sys_solaris_amd64.s
@@ -0,0 +1,320 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for AMD64, SunOS
+// /usr/include/sys/syscall.h for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// This is needed by asm_amd64.s
+TEXT runtime·settls(SB),NOSPLIT,$8
+ RET
+
+// void libc_miniterrno(void *(*___errno)(void));
+//
+// Set the TLS errno pointer in M.
+//
+// Called using runtime·asmcgocall from os_solaris.c:/minit.
+// NOT USING GO CALLING CONVENTION.
+TEXT runtime·miniterrno(SB),NOSPLIT,$0
+ // asmcgocall will put first argument into DI.
+ CALL DI // SysV ABI so returns in AX
+ get_tls(CX)
+ MOVQ g(CX), BX
+ MOVQ g_m(BX), BX
+ MOVQ AX, (m_mOS+mOS_perrno)(BX)
+ RET
+
+// pipe(3c) wrapper that returns fds in AX, DX.
+// NOT USING GO CALLING CONVENTION.
+TEXT runtime·pipe1(SB),NOSPLIT,$0
+ SUBQ $16, SP // 8 bytes will do, but stack has to be 16-byte aligned
+ MOVQ SP, DI
+ LEAQ libc_pipe(SB), AX
+ CALL AX
+ MOVL 0(SP), AX
+ MOVL 4(SP), DX
+ ADDQ $16, SP
+ RET
+
+// Call a library function with SysV calling conventions.
+// The called function can take a maximum of 6 INTEGER class arguments,
+// see
+// Michael Matz, Jan Hubicka, Andreas Jaeger, and Mark Mitchell
+// System V Application Binary Interface
+// AMD64 Architecture Processor Supplement
+// section 3.2.3.
+//
+// Called by runtime·asmcgocall or runtime·cgocall.
+// NOT USING GO CALLING CONVENTION.
+TEXT runtime·asmsysvicall6(SB),NOSPLIT,$0
+ // asmcgocall will put first argument into DI.
+ PUSHQ DI // save for later
+ MOVQ libcall_fn(DI), AX
+ MOVQ libcall_args(DI), R11
+ MOVQ libcall_n(DI), R10
+
+ get_tls(CX)
+ MOVQ g(CX), BX
+ CMPQ BX, $0
+ JEQ skiperrno1
+ MOVQ g_m(BX), BX
+ MOVQ (m_mOS+mOS_perrno)(BX), DX
+ CMPQ DX, $0
+ JEQ skiperrno1
+ MOVL $0, 0(DX)
+
+skiperrno1:
+ CMPQ R11, $0
+ JEQ skipargs
+ // Load 6 args into corresponding registers.
+ MOVQ 0(R11), DI
+ MOVQ 8(R11), SI
+ MOVQ 16(R11), DX
+ MOVQ 24(R11), CX
+ MOVQ 32(R11), R8
+ MOVQ 40(R11), R9
+skipargs:
+
+ // Call SysV function
+ CALL AX
+
+ // Return result
+ POPQ DI
+ MOVQ AX, libcall_r1(DI)
+ MOVQ DX, libcall_r2(DI)
+
+ get_tls(CX)
+ MOVQ g(CX), BX
+ CMPQ BX, $0
+ JEQ skiperrno2
+ MOVQ g_m(BX), BX
+ MOVQ (m_mOS+mOS_perrno)(BX), AX
+ CMPQ AX, $0
+ JEQ skiperrno2
+ MOVL 0(AX), AX
+ MOVQ AX, libcall_err(DI)
+
+skiperrno2:
+ RET
+
+// uint32 tstart_sysvicall(M *newm);
+TEXT runtime·tstart_sysvicall(SB),NOSPLIT,$0
+ // DI contains first arg newm
+ MOVQ m_g0(DI), DX // g
+
+ // Make TLS entries point at g and m.
+ get_tls(BX)
+ MOVQ DX, g(BX)
+ MOVQ DI, g_m(DX)
+
+ // Layout new m scheduler stack on os stack.
+ MOVQ SP, AX
+ MOVQ AX, (g_stack+stack_hi)(DX)
+ SUBQ $(0x100000), AX // stack size
+ MOVQ AX, (g_stack+stack_lo)(DX)
+ ADDQ $const__StackGuard, AX
+ MOVQ AX, g_stackguard0(DX)
+ MOVQ AX, g_stackguard1(DX)
+
+ // Someday the convention will be D is always cleared.
+ CLD
+
+ CALL runtime·stackcheck(SB) // clobbers AX,CX
+ CALL runtime·mstart(SB)
+
+ XORL AX, AX // return 0 == success
+ MOVL AX, ret+8(FP)
+ RET
+
+// Careful, this is called by __sighndlr, a libc function. We must preserve
+// registers as per AMD 64 ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT,$0
+ // Note that we are executing on altsigstack here, so we have
+ // more stack available than NOSPLIT would have us believe.
+ // To defeat the linker, we make our own stack frame with
+ // more space:
+ SUBQ $184, SP
+
+ // save registers
+ MOVQ BX, 32(SP)
+ MOVQ BP, 40(SP)
+ MOVQ R12, 48(SP)
+ MOVQ R13, 56(SP)
+ MOVQ R14, 64(SP)
+ MOVQ R15, 72(SP)
+
+ get_tls(BX)
+ // check that g exists
+ MOVQ g(BX), R10
+ CMPQ R10, $0
+ JNE allgood
+ MOVQ SI, 80(SP)
+ MOVQ DX, 88(SP)
+ LEAQ 80(SP), AX
+ MOVQ DI, 0(SP)
+ MOVQ AX, 8(SP)
+ MOVQ $runtime·badsignal(SB), AX
+ CALL AX
+ JMP exit
+
+allgood:
+ // Save m->libcall and m->scratch. We need to do this because we
+ // might get interrupted by a signal in runtime·asmcgocall.
+
+ // save m->libcall
+ MOVQ g_m(R10), BP
+ LEAQ m_libcall(BP), R11
+ MOVQ libcall_fn(R11), R10
+ MOVQ R10, 88(SP)
+ MOVQ libcall_args(R11), R10
+ MOVQ R10, 96(SP)
+ MOVQ libcall_n(R11), R10
+ MOVQ R10, 104(SP)
+ MOVQ libcall_r1(R11), R10
+ MOVQ R10, 168(SP)
+ MOVQ libcall_r2(R11), R10
+ MOVQ R10, 176(SP)
+
+ // save m->scratch
+ LEAQ (m_mOS+mOS_scratch)(BP), R11
+ MOVQ 0(R11), R10
+ MOVQ R10, 112(SP)
+ MOVQ 8(R11), R10
+ MOVQ R10, 120(SP)
+ MOVQ 16(R11), R10
+ MOVQ R10, 128(SP)
+ MOVQ 24(R11), R10
+ MOVQ R10, 136(SP)
+ MOVQ 32(R11), R10
+ MOVQ R10, 144(SP)
+ MOVQ 40(R11), R10
+ MOVQ R10, 152(SP)
+
+ // save errno, it might be EINTR; stuff we do here might reset it.
+ MOVQ (m_mOS+mOS_perrno)(BP), R10
+ MOVL 0(R10), R10
+ MOVQ R10, 160(SP)
+
+ // prepare call
+ MOVQ DI, 0(SP)
+ MOVQ SI, 8(SP)
+ MOVQ DX, 16(SP)
+ CALL runtime·sigtrampgo(SB)
+
+ get_tls(BX)
+ MOVQ g(BX), BP
+ MOVQ g_m(BP), BP
+ // restore libcall
+ LEAQ m_libcall(BP), R11
+ MOVQ 88(SP), R10
+ MOVQ R10, libcall_fn(R11)
+ MOVQ 96(SP), R10
+ MOVQ R10, libcall_args(R11)
+ MOVQ 104(SP), R10
+ MOVQ R10, libcall_n(R11)
+ MOVQ 168(SP), R10
+ MOVQ R10, libcall_r1(R11)
+ MOVQ 176(SP), R10
+ MOVQ R10, libcall_r2(R11)
+
+ // restore scratch
+ LEAQ (m_mOS+mOS_scratch)(BP), R11
+ MOVQ 112(SP), R10
+ MOVQ R10, 0(R11)
+ MOVQ 120(SP), R10
+ MOVQ R10, 8(R11)
+ MOVQ 128(SP), R10
+ MOVQ R10, 16(R11)
+ MOVQ 136(SP), R10
+ MOVQ R10, 24(R11)
+ MOVQ 144(SP), R10
+ MOVQ R10, 32(R11)
+ MOVQ 152(SP), R10
+ MOVQ R10, 40(R11)
+
+ // restore errno
+ MOVQ (m_mOS+mOS_perrno)(BP), R11
+ MOVQ 160(SP), R10
+ MOVL R10, 0(R11)
+
+exit:
+ // restore registers
+ MOVQ 32(SP), BX
+ MOVQ 40(SP), BP
+ MOVQ 48(SP), R12
+ MOVQ 56(SP), R13
+ MOVQ 64(SP), R14
+ MOVQ 72(SP), R15
+
+ ADDQ $184, SP
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVQ fn+0(FP), AX
+ MOVL sig+8(FP), DI
+ MOVQ info+16(FP), SI
+ MOVQ ctx+24(FP), DX
+ PUSHQ BP
+ MOVQ SP, BP
+ ANDQ $~15, SP // alignment for x86_64 ABI
+ CALL AX
+ MOVQ BP, SP
+ POPQ BP
+ RET
+
+// Called from runtime·usleep (Go). Can be called on Go stack, on OS stack,
+// can also be called in cgo callback path without a g->m.
+TEXT runtime·usleep1(SB),NOSPLIT,$0
+ MOVL usec+0(FP), DI
+ MOVQ $usleep2<>(SB), AX // to hide from 6l
+
+ // Execute call on m->g0.
+ get_tls(R15)
+ CMPQ R15, $0
+ JE noswitch
+
+ MOVQ g(R15), R13
+ CMPQ R13, $0
+ JE noswitch
+ MOVQ g_m(R13), R13
+ CMPQ R13, $0
+ JE noswitch
+ // TODO(aram): do something about the cpu profiler here.
+
+ MOVQ m_g0(R13), R14
+ CMPQ g(R15), R14
+ JNE switch
+ // executing on m->g0 already
+ CALL AX
+ RET
+
+switch:
+ // Switch to m->g0 stack and back.
+ MOVQ (g_sched+gobuf_sp)(R14), R14
+ MOVQ SP, -8(R14)
+ LEAQ -8(R14), SP
+ CALL AX
+ MOVQ 0(SP), SP
+ RET
+
+noswitch:
+ // Not a Go-managed thread. Do not switch stack.
+ CALL AX
+ RET
+
+// Runs on OS stack. duration (in µs units) is in DI.
+TEXT usleep2<>(SB),NOSPLIT,$0
+ LEAQ libc_usleep(SB), AX
+ CALL AX
+ RET
+
+// Runs on OS stack, called from runtime·osyield.
+TEXT runtime·osyield1(SB),NOSPLIT,$0
+ LEAQ libc_sched_yield(SB), AX
+ CALL AX
+ RET
diff --git a/src/runtime/sys_wasm.go b/src/runtime/sys_wasm.go
new file mode 100644
index 0000000..9bf710b
--- /dev/null
+++ b/src/runtime/sys_wasm.go
@@ -0,0 +1,42 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type m0Stack struct {
+ _ [8192 * sys.StackGuardMultiplier]byte
+}
+
+var wasmStack m0Stack
+
+func wasmMove()
+
+func wasmZero()
+
+func wasmDiv()
+
+func wasmTruncS()
+func wasmTruncU()
+
+func wasmExit(code int32)
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ sp := buf.sp
+ if sys.RegSize > sys.PtrSize {
+ sp -= sys.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = 0
+ }
+ sp -= sys.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = buf.pc
+ buf.sp = sp
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
diff --git a/src/runtime/sys_wasm.s b/src/runtime/sys_wasm.s
new file mode 100644
index 0000000..e7a6570
--- /dev/null
+++ b/src/runtime/sys_wasm.s
@@ -0,0 +1,202 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT runtime·wasmMove(SB), NOSPLIT, $0-0
+loop:
+ Loop
+ // *dst = *src
+ Get R0
+ Get R1
+ I64Load $0
+ I64Store $0
+
+ // n--
+ Get R2
+ I32Const $1
+ I32Sub
+ Tee R2
+
+ // n == 0
+ I32Eqz
+ If
+ Return
+ End
+
+ // dst += 8
+ Get R0
+ I32Const $8
+ I32Add
+ Set R0
+
+ // src += 8
+ Get R1
+ I32Const $8
+ I32Add
+ Set R1
+
+ Br loop
+ End
+ UNDEF
+
+TEXT runtime·wasmZero(SB), NOSPLIT, $0-0
+loop:
+ Loop
+ // *dst = 0
+ Get R0
+ I64Const $0
+ I64Store $0
+
+ // n--
+ Get R1
+ I32Const $1
+ I32Sub
+ Tee R1
+
+ // n == 0
+ I32Eqz
+ If
+ Return
+ End
+
+ // dst += 8
+ Get R0
+ I32Const $8
+ I32Add
+ Set R0
+
+ Br loop
+ End
+ UNDEF
+
+TEXT runtime·wasmDiv(SB), NOSPLIT, $0-0
+ Get R0
+ I64Const $-0x8000000000000000
+ I64Eq
+ If
+ Get R1
+ I64Const $-1
+ I64Eq
+ If
+ I64Const $-0x8000000000000000
+ Return
+ End
+ End
+ Get R0
+ Get R1
+ I64DivS
+ Return
+
+TEXT runtime·wasmTruncS(SB), NOSPLIT, $0-0
+ Get R0
+ Get R0
+ F64Ne // NaN
+ If
+ I64Const $0x8000000000000000
+ Return
+ End
+
+ Get R0
+ F64Const $0x7ffffffffffffc00p0 // Maximum truncated representation of 0x7fffffffffffffff
+ F64Gt
+ If
+ I64Const $0x8000000000000000
+ Return
+ End
+
+ Get R0
+ F64Const $-0x7ffffffffffffc00p0 // Minimum truncated representation of -0x8000000000000000
+ F64Lt
+ If
+ I64Const $0x8000000000000000
+ Return
+ End
+
+ Get R0
+ I64TruncF64S
+ Return
+
+TEXT runtime·wasmTruncU(SB), NOSPLIT, $0-0
+ Get R0
+ Get R0
+ F64Ne // NaN
+ If
+ I64Const $0x8000000000000000
+ Return
+ End
+
+ Get R0
+ F64Const $0xfffffffffffff800p0 // Maximum truncated representation of 0xffffffffffffffff
+ F64Gt
+ If
+ I64Const $0x8000000000000000
+ Return
+ End
+
+ Get R0
+ F64Const $0.
+ F64Lt
+ If
+ I64Const $0x8000000000000000
+ Return
+ End
+
+ Get R0
+ I64TruncF64U
+ Return
+
+TEXT runtime·exitThread(SB), NOSPLIT, $0-0
+ UNDEF
+
+TEXT runtime·osyield(SB), NOSPLIT, $0-0
+ UNDEF
+
+TEXT runtime·usleep(SB), NOSPLIT, $0-0
+ RET // TODO(neelance): implement usleep
+
+TEXT runtime·currentMemory(SB), NOSPLIT, $0
+ Get SP
+ CurrentMemory
+ I32Store ret+0(FP)
+ RET
+
+TEXT runtime·growMemory(SB), NOSPLIT, $0
+ Get SP
+ I32Load pages+0(FP)
+ GrowMemory
+ I32Store ret+8(FP)
+ RET
+
+TEXT ·resetMemoryDataView(SB), NOSPLIT, $0
+ CallImport
+ RET
+
+TEXT ·wasmExit(SB), NOSPLIT, $0
+ CallImport
+ RET
+
+TEXT ·wasmWrite(SB), NOSPLIT, $0
+ CallImport
+ RET
+
+TEXT ·nanotime1(SB), NOSPLIT, $0
+ CallImport
+ RET
+
+TEXT ·walltime1(SB), NOSPLIT, $0
+ CallImport
+ RET
+
+TEXT ·scheduleTimeoutEvent(SB), NOSPLIT, $0
+ CallImport
+ RET
+
+TEXT ·clearTimeoutEvent(SB), NOSPLIT, $0
+ CallImport
+ RET
+
+TEXT ·getRandomData(SB), NOSPLIT, $0
+ CallImport
+ RET
diff --git a/src/runtime/sys_windows_386.s b/src/runtime/sys_windows_386.s
new file mode 100644
index 0000000..ef8a3dd
--- /dev/null
+++ b/src/runtime/sys_windows_386.s
@@ -0,0 +1,573 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// void runtime·asmstdcall(void *c);
+TEXT runtime·asmstdcall(SB),NOSPLIT,$0
+ MOVL fn+0(FP), BX
+
+ // SetLastError(0).
+ MOVL $0, 0x34(FS)
+
+ // Copy args to the stack.
+ MOVL SP, BP
+ MOVL libcall_n(BX), CX // words
+ MOVL CX, AX
+ SALL $2, AX
+ SUBL AX, SP // room for args
+ MOVL SP, DI
+ MOVL libcall_args(BX), SI
+ CLD
+ REP; MOVSL
+
+ // Call stdcall or cdecl function.
+ // DI SI BP BX are preserved, SP is not
+ CALL libcall_fn(BX)
+ MOVL BP, SP
+
+ // Return result.
+ MOVL fn+0(FP), BX
+ MOVL AX, libcall_r1(BX)
+ MOVL DX, libcall_r2(BX)
+
+ // GetLastError().
+ MOVL 0x34(FS), AX
+ MOVL AX, libcall_err(BX)
+
+ RET
+
+TEXT runtime·badsignal2(SB),NOSPLIT,$24
+ // stderr
+ MOVL $-12, 0(SP)
+ MOVL SP, BP
+ CALL *runtime·_GetStdHandle(SB)
+ MOVL BP, SP
+
+ MOVL AX, 0(SP) // handle
+ MOVL $runtime·badsignalmsg(SB), DX // pointer
+ MOVL DX, 4(SP)
+ MOVL runtime·badsignallen(SB), DX // count
+ MOVL DX, 8(SP)
+ LEAL 20(SP), DX // written count
+ MOVL $0, 0(DX)
+ MOVL DX, 12(SP)
+ MOVL $0, 16(SP) // overlapped
+ CALL *runtime·_WriteFile(SB)
+ MOVL BP, SI
+ RET
+
+// faster get/set last error
+TEXT runtime·getlasterror(SB),NOSPLIT,$0
+ MOVL 0x34(FS), AX
+ MOVL AX, ret+0(FP)
+ RET
+
+TEXT runtime·setlasterror(SB),NOSPLIT,$0
+ MOVL err+0(FP), AX
+ MOVL AX, 0x34(FS)
+ RET
+
+// Called by Windows as a Vectored Exception Handler (VEH).
+// First argument is pointer to struct containing
+// exception record and context pointers.
+// Handler function is stored in AX.
+// Return 0 for 'not handled', -1 for handled.
+TEXT sigtramp<>(SB),NOSPLIT,$0-0
+ MOVL ptrs+0(FP), CX
+ SUBL $40, SP
+
+ // save callee-saved registers
+ MOVL BX, 28(SP)
+ MOVL BP, 16(SP)
+ MOVL SI, 20(SP)
+ MOVL DI, 24(SP)
+
+ MOVL AX, SI // save handler address
+
+ // find g
+ get_tls(DX)
+ CMPL DX, $0
+ JNE 3(PC)
+ MOVL $0, AX // continue
+ JMP done
+ MOVL g(DX), DX
+ CMPL DX, $0
+ JNE 2(PC)
+ CALL runtime·badsignal2(SB)
+
+ // save g and SP in case of stack switch
+ MOVL DX, 32(SP) // g
+ MOVL SP, 36(SP)
+
+ // do we need to switch to the g0 stack?
+ MOVL g_m(DX), BX
+ MOVL m_g0(BX), BX
+ CMPL DX, BX
+ JEQ g0
+
+ // switch to the g0 stack
+ get_tls(BP)
+ MOVL BX, g(BP)
+ MOVL (g_sched+gobuf_sp)(BX), DI
+ // make it look like mstart called us on g0, to stop traceback
+ SUBL $4, DI
+ MOVL $runtime·mstart(SB), 0(DI)
+ // traceback will think that we've done SUBL
+ // on this stack, so subtract them here to match.
+ // (we need room for sighandler arguments anyway).
+ // and re-save old SP for restoring later.
+ SUBL $40, DI
+ MOVL SP, 36(DI)
+ MOVL DI, SP
+
+g0:
+ MOVL 0(CX), BX // ExceptionRecord*
+ MOVL 4(CX), CX // Context*
+ MOVL BX, 0(SP)
+ MOVL CX, 4(SP)
+ MOVL DX, 8(SP)
+ CALL SI // call handler
+ // AX is set to report result back to Windows
+ MOVL 12(SP), AX
+
+ // switch back to original stack and g
+ // no-op if we never left.
+ MOVL 36(SP), SP
+ MOVL 32(SP), DX
+ get_tls(BP)
+ MOVL DX, g(BP)
+
+done:
+ // restore callee-saved registers
+ MOVL 24(SP), DI
+ MOVL 20(SP), SI
+ MOVL 16(SP), BP
+ MOVL 28(SP), BX
+
+ ADDL $40, SP
+ // RET 4 (return and pop 4 bytes parameters)
+ BYTE $0xC2; WORD $4
+ RET // unreached; make assembler happy
+
+TEXT runtime·exceptiontramp(SB),NOSPLIT,$0
+ MOVL $runtime·exceptionhandler(SB), AX
+ JMP sigtramp<>(SB)
+
+TEXT runtime·firstcontinuetramp(SB),NOSPLIT,$0-0
+ // is never called
+ INT $3
+
+TEXT runtime·lastcontinuetramp(SB),NOSPLIT,$0-0
+ MOVL $runtime·lastcontinuehandler(SB), AX
+ JMP sigtramp<>(SB)
+
+// Called by OS using stdcall ABI: bool ctrlhandler(uint32).
+TEXT runtime·ctrlhandler(SB),NOSPLIT,$0
+ PUSHL $runtime·ctrlhandler1(SB)
+ NOP SP // tell vet SP changed - stop checking offsets
+ CALL runtime·externalthreadhandler(SB)
+ MOVL 4(SP), CX
+ ADDL $12, SP
+ JMP CX
+
+// Called by OS using stdcall ABI: uint32 profileloop(void*).
+TEXT runtime·profileloop(SB),NOSPLIT,$0
+ PUSHL $runtime·profileloop1(SB)
+ NOP SP // tell vet SP changed - stop checking offsets
+ CALL runtime·externalthreadhandler(SB)
+ MOVL 4(SP), CX
+ ADDL $12, SP
+ JMP CX
+
+TEXT runtime·externalthreadhandler(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ PUSHL BX
+ PUSHL SI
+ PUSHL DI
+ PUSHL 0x14(FS)
+ MOVL SP, DX
+
+ // setup dummy m, g
+ SUBL $m__size, SP // space for M
+ MOVL SP, 0(SP)
+ MOVL $m__size, 4(SP)
+ CALL runtime·memclrNoHeapPointers(SB) // smashes AX,BX,CX
+
+ LEAL m_tls(SP), CX
+ MOVL CX, 0x14(FS)
+ MOVL SP, BX
+ SUBL $g__size, SP // space for G
+ MOVL SP, g(CX)
+ MOVL SP, m_g0(BX)
+
+ MOVL SP, 0(SP)
+ MOVL $g__size, 4(SP)
+ CALL runtime·memclrNoHeapPointers(SB) // smashes AX,BX,CX
+ LEAL g__size(SP), BX
+ MOVL BX, g_m(SP)
+
+ LEAL -32768(SP), CX // must be less than SizeOfStackReserve set by linker
+ MOVL CX, (g_stack+stack_lo)(SP)
+ ADDL $const__StackGuard, CX
+ MOVL CX, g_stackguard0(SP)
+ MOVL CX, g_stackguard1(SP)
+ MOVL DX, (g_stack+stack_hi)(SP)
+
+ PUSHL AX // room for return value
+ PUSHL 16(BP) // arg for handler
+ CALL 8(BP)
+ POPL CX
+ POPL AX // pass return value to Windows in AX
+
+ get_tls(CX)
+ MOVL g(CX), CX
+ MOVL (g_stack+stack_hi)(CX), SP
+ POPL 0x14(FS)
+ POPL DI
+ POPL SI
+ POPL BX
+ POPL BP
+ RET
+
+GLOBL runtime·cbctxts(SB), NOPTR, $4
+
+TEXT runtime·callbackasm1(SB),NOSPLIT,$0
+ MOVL 0(SP), AX // will use to find our callback context
+
+ // remove return address from stack, we are not returning to callbackasm, but to its caller.
+ ADDL $4, SP
+
+ // address to callback parameters into CX
+ LEAL 4(SP), CX
+
+ // save registers as required for windows callback
+ PUSHL DI
+ PUSHL SI
+ PUSHL BP
+ PUSHL BX
+
+ // Go ABI requires DF flag to be cleared.
+ CLD
+
+ // determine index into runtime·cbs table
+ SUBL $runtime·callbackasm(SB), AX
+ MOVL $0, DX
+ MOVL $5, BX // divide by 5 because each call instruction in runtime·callbacks is 5 bytes long
+ DIVL BX
+ SUBL $1, AX // subtract 1 because return PC is to the next slot
+
+ // Create a struct callbackArgs on our stack.
+ SUBL $(12+callbackArgs__size), SP
+ MOVL AX, (12+callbackArgs_index)(SP) // callback index
+ MOVL CX, (12+callbackArgs_args)(SP) // address of args vector
+ MOVL $0, (12+callbackArgs_result)(SP) // result
+ LEAL 12(SP), AX // AX = &callbackArgs{...}
+
+ // Call cgocallback, which will call callbackWrap(frame).
+ MOVL $0, 8(SP) // context
+ MOVL AX, 4(SP) // frame (address of callbackArgs)
+ LEAL ·callbackWrap(SB), AX
+ MOVL AX, 0(SP) // PC of function to call
+ CALL runtime·cgocallback(SB)
+
+ // Get callback result.
+ MOVL (12+callbackArgs_result)(SP), AX
+ // Get popRet.
+ MOVL (12+callbackArgs_retPop)(SP), CX // Can't use a callee-save register
+ ADDL $(12+callbackArgs__size), SP
+
+ // restore registers as required for windows callback
+ POPL BX
+ POPL BP
+ POPL SI
+ POPL DI
+
+ // remove callback parameters before return (as per Windows spec)
+ POPL DX
+ ADDL CX, SP
+ PUSHL DX
+
+ CLD
+
+ RET
+
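The divide-by-5 above recovers which callback was invoked: entry k of the table starts 5*k bytes into runtime·callbackasm, and the 5-byte CALL pushes the address of the next entry. A tiny worked example of that arithmetic (hypothetical values, for illustration only):

package main

import "fmt"

func main() {
	const callSize = 5 // each CALL in the runtime·callbacks table is 5 bytes
	for k := uintptr(0); k < 3; k++ {
		retPC := callSize * (k + 1)      // return PC offset from runtime·callbackasm for entry k
		index := int(retPC)/callSize - 1 // what callbackasm1 computes (DIVL then SUBL $1)
		fmt.Println(k, index)            // the two columns match
	}
}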
+// void tstart(M *newm);
+TEXT tstart<>(SB),NOSPLIT,$0
+ MOVL newm+0(FP), CX // m
+ MOVL m_g0(CX), DX // g
+
+ // Layout new m scheduler stack on os stack.
+ MOVL SP, AX
+ MOVL AX, (g_stack+stack_hi)(DX)
+ SUBL $(64*1024), AX // initial stack size (adjusted later)
+ MOVL AX, (g_stack+stack_lo)(DX)
+ ADDL $const__StackGuard, AX
+ MOVL AX, g_stackguard0(DX)
+ MOVL AX, g_stackguard1(DX)
+
+ // Set up tls.
+ LEAL m_tls(CX), SI
+ MOVL SI, 0x14(FS)
+ MOVL CX, g_m(DX)
+ MOVL DX, g(SI)
+
+ // Someday the convention will be D is always cleared.
+ CLD
+
+ CALL runtime·stackcheck(SB) // clobbers AX,CX
+ CALL runtime·mstart(SB)
+
+ RET
+
+// uint32 tstart_stdcall(M *newm);
+TEXT runtime·tstart_stdcall(SB),NOSPLIT,$0
+ MOVL newm+0(FP), BX
+
+ PUSHL BX
+ CALL tstart<>(SB)
+ POPL BX
+
+ // Adjust stack for stdcall to return properly.
+ MOVL (SP), AX // save return address
+ ADDL $4, SP // remove single parameter
+ MOVL AX, (SP) // restore return address
+
+ XORL AX, AX // return 0 == success
+
+ RET
+
+// setldt(int entry, int address, int limit)
+TEXT runtime·setldt(SB),NOSPLIT,$0
+ MOVL base+4(FP), CX
+ MOVL CX, 0x14(FS)
+ RET
+
+// onosstack calls fn on OS stack.
+// func onosstack(fn unsafe.Pointer, arg uint32)
+TEXT runtime·onosstack(SB),NOSPLIT,$0
+ MOVL fn+0(FP), AX // to hide from 8l
+ MOVL arg+4(FP), BX
+
+ // Execute call on m->g0 stack, in case we are not actually
+ // calling a system call wrapper, like when running under WINE.
+ get_tls(CX)
+ CMPL CX, $0
+ JNE 3(PC)
+ // Not a Go-managed thread. Do not switch stack.
+ CALL AX
+ RET
+
+ MOVL g(CX), BP
+ MOVL g_m(BP), BP
+
+ // leave pc/sp for cpu profiler
+ MOVL (SP), SI
+ MOVL SI, m_libcallpc(BP)
+ MOVL g(CX), SI
+ MOVL SI, m_libcallg(BP)
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ LEAL fn+0(FP), SI
+ MOVL SI, m_libcallsp(BP)
+
+ MOVL m_g0(BP), SI
+ CMPL g(CX), SI
+ JNE switch
+ // executing on m->g0 already
+ CALL AX
+ JMP ret
+
+switch:
+ // Switch to m->g0 stack and back.
+ MOVL (g_sched+gobuf_sp)(SI), SI
+ MOVL SP, -4(SI)
+ LEAL -4(SI), SP
+ CALL AX
+ MOVL 0(SP), SP
+
+ret:
+ get_tls(CX)
+ MOVL g(CX), BP
+ MOVL g_m(BP), BP
+ MOVL $0, m_libcallsp(BP)
+ RET
+
+// Runs on OS stack. duration (in 100ns units) is in BX.
+TEXT runtime·usleep2(SB),NOSPLIT,$20
+ // Want negative 100ns units.
+ NEGL BX
+ MOVL $-1, hi-4(SP)
+ MOVL BX, lo-8(SP)
+ LEAL lo-8(SP), BX
+ MOVL BX, ptime-12(SP)
+ MOVL $0, alertable-16(SP)
+ MOVL $-1, handle-20(SP)
+ MOVL SP, BP
+ MOVL runtime·_NtWaitForSingleObject(SB), AX
+ CALL AX
+ MOVL BP, SP
+ RET
+
+// Runs on OS stack. duration (in 100ns units) is in BX.
+TEXT runtime·usleep2HighRes(SB),NOSPLIT,$36
+ get_tls(CX)
+ CMPL CX, $0
+ JE gisnotset
+
+ // Want negative 100ns units.
+ NEGL BX
+ MOVL $-1, hi-4(SP)
+ MOVL BX, lo-8(SP)
+
+ MOVL g(CX), CX
+ MOVL g_m(CX), CX
+ MOVL (m_mOS+mOS_highResTimer)(CX), CX
+ MOVL CX, saved_timer-12(SP)
+
+ MOVL $0, fResume-16(SP)
+ MOVL $0, lpArgToCompletionRoutine-20(SP)
+ MOVL $0, pfnCompletionRoutine-24(SP)
+ MOVL $0, lPeriod-28(SP)
+ LEAL lo-8(SP), BX
+ MOVL BX, lpDueTime-32(SP)
+ MOVL CX, hTimer-36(SP)
+ MOVL SP, BP
+ MOVL runtime·_SetWaitableTimer(SB), AX
+ CALL AX
+ MOVL BP, SP
+
+ MOVL $0, ptime-28(SP)
+ MOVL $0, alertable-32(SP)
+ MOVL saved_timer-12(SP), CX
+ MOVL CX, handle-36(SP)
+ MOVL SP, BP
+ MOVL runtime·_NtWaitForSingleObject(SB), AX
+ CALL AX
+ MOVL BP, SP
+
+ RET
+
+gisnotset:
+ // TLS is not configured. Call usleep2 instead.
+ MOVL $runtime·usleep2(SB), AX
+ CALL AX
+ RET
+
+// Runs on OS stack.
+TEXT runtime·switchtothread(SB),NOSPLIT,$0
+ MOVL SP, BP
+ MOVL runtime·_SwitchToThread(SB), AX
+ CALL AX
+ MOVL BP, SP
+ RET
+
+// See https://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information-the-kuser_shared_data-structure/
+// Must read hi1, then lo, then hi2. The snapshot is valid if hi1 == hi2.
+#define _INTERRUPT_TIME 0x7ffe0008
+#define _SYSTEM_TIME 0x7ffe0014
+#define time_lo 0
+#define time_hi1 4
+#define time_hi2 8
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$0-8
+ CMPB runtime·useQPCTime(SB), $0
+ JNE useQPC
+loop:
+ MOVL (_INTERRUPT_TIME+time_hi1), AX
+ MOVL (_INTERRUPT_TIME+time_lo), CX
+ MOVL (_INTERRUPT_TIME+time_hi2), DI
+ CMPL AX, DI
+ JNE loop
+
+ // wintime = DI:CX, multiply by 100
+ MOVL $100, AX
+ MULL CX
+ IMULL $100, DI
+ ADDL DI, DX
+ // wintime*100 = DX:AX
+ MOVL AX, ret_lo+0(FP)
+ MOVL DX, ret_hi+4(FP)
+ RET
+useQPC:
+ JMP runtime·nanotimeQPC(SB)
+ RET
+
+TEXT time·now(SB),NOSPLIT,$0-20
+ CMPB runtime·useQPCTime(SB), $0
+ JNE useQPC
+loop:
+ MOVL (_INTERRUPT_TIME+time_hi1), AX
+ MOVL (_INTERRUPT_TIME+time_lo), CX
+ MOVL (_INTERRUPT_TIME+time_hi2), DI
+ CMPL AX, DI
+ JNE loop
+
+ // w = DI:CX
+ // multiply by 100
+ MOVL $100, AX
+ MULL CX
+ IMULL $100, DI
+ ADDL DI, DX
+ // w*100 = DX:AX
+ MOVL AX, mono+12(FP)
+ MOVL DX, mono+16(FP)
+
+wall:
+ MOVL (_SYSTEM_TIME+time_hi1), CX
+ MOVL (_SYSTEM_TIME+time_lo), AX
+ MOVL (_SYSTEM_TIME+time_hi2), DX
+ CMPL CX, DX
+ JNE wall
+
+ // w = DX:AX
+ // convert to Unix epoch (but still 100ns units)
+ #define delta 116444736000000000
+ SUBL $(delta & 0xFFFFFFFF), AX
+ SBBL $(delta >> 32), DX
+
+ // nano/100 = DX:AX
+ // split into two decimal halves by div 1e9.
+ // (decimal point is two spots over from correct place,
+ // but we avoid overflow in the high word.)
+ MOVL $1000000000, CX
+ DIVL CX
+ MOVL AX, DI
+ MOVL DX, SI
+
+ // DI = nano/100/1e9 = nano/1e11 = sec/100, DX = SI = nano/100%1e9
+ // split DX into seconds and nanoseconds by div 1e7 magic multiply.
+ MOVL DX, AX
+ MOVL $1801439851, CX
+ MULL CX
+ SHRL $22, DX
+ MOVL DX, BX
+ IMULL $10000000, DX
+ MOVL SI, CX
+ SUBL DX, CX
+
+ // DI = sec/100 (still)
+ // BX = (nano/100%1e9)/1e7 = (nano/1e9)%100 = sec%100
+ // CX = (nano/100%1e9)%1e7 = (nano%1e9)/100 = nsec/100
+ // store nsec for return
+ IMULL $100, CX
+ MOVL CX, nsec+8(FP)
+
+ // DI = sec/100 (still)
+ // BX = sec%100
+ // construct DX:AX = 64-bit sec and store for return
+ MOVL $0, DX
+ MOVL $100, AX
+ MULL DI
+ ADDL BX, AX
+ ADCL $0, DX
+ MOVL AX, sec+0(FP)
+ MOVL DX, sec+4(FP)
+ RET
+useQPC:
+ JMP runtime·nowQPC(SB)
+ RET
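
The nanotime1 and time·now loops above read the 64-bit KUSER_SHARED_DATA timestamps with a hi1/lo/hi2 retry sequence instead of a lock. A minimal Go sketch of the same snapshot protocol, using made-up type and field names rather than anything from the runtime:

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // ksharedTime mirrors the lo/hi1/hi2 layout at _INTERRUPT_TIME and
    // _SYSTEM_TIME: the kernel updates the halves so that hi1 == hi2 only
    // when lo belongs to the same 64-bit value.
    type ksharedTime struct {
        lo  uint32
        hi1 uint32
        hi2 uint32
    }

    // read retries until it observes a consistent snapshot, exactly like
    // the loop/wall labels in the assembly above.
    func read(t *ksharedTime) uint64 {
        for {
            h1 := atomic.LoadUint32(&t.hi1)
            lo := atomic.LoadUint32(&t.lo)
            h2 := atomic.LoadUint32(&t.hi2)
            if h1 == h2 {
                return uint64(h1)<<32 | uint64(lo) // 100ns units; callers multiply by 100
            }
            // hi1 != hi2: the writer was mid-update, try again.
        }
    }

    func main() {
        t := ksharedTime{lo: 0x89abcdef, hi1: 0x01234567, hi2: 0x01234567}
        fmt.Printf("%#x\n", read(&t))
    }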
diff --git a/src/runtime/sys_windows_amd64.s b/src/runtime/sys_windows_amd64.s
new file mode 100644
index 0000000..d1690ca
--- /dev/null
+++ b/src/runtime/sys_windows_amd64.s
@@ -0,0 +1,579 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// maxargs should be divisible by 2, as Windows stack
+// must be kept 16-byte aligned on syscall entry.
+#define maxargs 16
+
+// void runtime·asmstdcall(void *c);
+TEXT runtime·asmstdcall(SB),NOSPLIT|NOFRAME,$0
+ // asmcgocall will put first argument into CX.
+ PUSHQ CX // save for later
+ MOVQ libcall_fn(CX), AX
+ MOVQ libcall_args(CX), SI
+ MOVQ libcall_n(CX), CX
+
+ // SetLastError(0).
+ MOVQ 0x30(GS), DI
+ MOVL $0, 0x68(DI)
+
+ SUBQ $(maxargs*8), SP // room for args
+
+ // Fast version, do not store args on the stack.
+ CMPL CX, $4
+ JLE loadregs
+
+ // Check we have enough room for args.
+ CMPL CX, $maxargs
+ JLE 2(PC)
+ INT $3 // not enough room -> crash
+
+ // Copy args to the stack.
+ MOVQ SP, DI
+ CLD
+ REP; MOVSQ
+ MOVQ SP, SI
+
+loadregs:
+ // Load the first 4 args into the corresponding registers.
+ MOVQ 0(SI), CX
+ MOVQ 8(SI), DX
+ MOVQ 16(SI), R8
+ MOVQ 24(SI), R9
+ // Floating point arguments are passed in the XMM
+ // registers. Set them here in case any of the arguments
+ // are floating point values. For details see
+ // https://msdn.microsoft.com/en-us/library/zthk2dkh.aspx
+ MOVQ CX, X0
+ MOVQ DX, X1
+ MOVQ R8, X2
+ MOVQ R9, X3
+
+ // Call stdcall function.
+ CALL AX
+
+ ADDQ $(maxargs*8), SP
+
+ // Return result.
+ POPQ CX
+ MOVQ AX, libcall_r1(CX)
+ // Floating point return values are returned in XMM0. Set r2 to this
+ // value in case this call returned a floating point value. For details,
+ // see https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention
+ MOVQ X0, libcall_r2(CX)
+
+ // GetLastError().
+ MOVQ 0x30(GS), DI
+ MOVL 0x68(DI), AX
+ MOVQ AX, libcall_err(CX)
+
+ RET
+
+TEXT runtime·badsignal2(SB),NOSPLIT|NOFRAME,$48
+ // stderr
+ MOVQ $-12, CX // stderr
+ MOVQ CX, 0(SP)
+ MOVQ runtime·_GetStdHandle(SB), AX
+ CALL AX
+
+ MOVQ AX, CX // handle
+ MOVQ CX, 0(SP)
+ MOVQ $runtime·badsignalmsg(SB), DX // pointer
+ MOVQ DX, 8(SP)
+ MOVL $runtime·badsignallen(SB), R8 // count
+ MOVQ R8, 16(SP)
+ LEAQ 40(SP), R9 // written count
+ MOVQ $0, 0(R9)
+ MOVQ R9, 24(SP)
+ MOVQ $0, 32(SP) // overlapped
+ MOVQ runtime·_WriteFile(SB), AX
+ CALL AX
+
+ RET
+
+// faster get/set last error
+TEXT runtime·getlasterror(SB),NOSPLIT,$0
+ MOVQ 0x30(GS), AX
+ MOVL 0x68(AX), AX
+ MOVL AX, ret+0(FP)
+ RET
+
+TEXT runtime·setlasterror(SB),NOSPLIT,$0
+ MOVL err+0(FP), AX
+ MOVQ 0x30(GS), CX
+ MOVL AX, 0x68(CX)
+ RET
+
+// Called by Windows as a Vectored Exception Handler (VEH).
+// First argument is pointer to struct containing
+// exception record and context pointers.
+// Handler function is stored in AX.
+// Return 0 for 'not handled', -1 for handled.
+TEXT sigtramp<>(SB),NOSPLIT|NOFRAME,$0-0
+ // CX: PEXCEPTION_POINTERS ExceptionInfo
+
+ // DI SI BP BX R12 R13 R14 R15 registers and DF flag are preserved
+ // as required by windows callback convention.
+ PUSHFQ
+ SUBQ $112, SP
+ MOVQ DI, 80(SP)
+ MOVQ SI, 72(SP)
+ MOVQ BP, 64(SP)
+ MOVQ BX, 56(SP)
+ MOVQ R12, 48(SP)
+ MOVQ R13, 40(SP)
+ MOVQ R14, 32(SP)
+ MOVQ R15, 88(SP)
+
+ MOVQ AX, R15 // save handler address
+
+ // find g
+ get_tls(DX)
+ CMPQ DX, $0
+ JNE 3(PC)
+ MOVQ $0, AX // continue
+ JMP done
+ MOVQ g(DX), DX
+ CMPQ DX, $0
+ JNE 2(PC)
+ CALL runtime·badsignal2(SB)
+
+ // save g and SP in case of stack switch
+ MOVQ DX, 96(SP) // g
+ MOVQ SP, 104(SP)
+
+ // do we need to switch to the g0 stack?
+ MOVQ g_m(DX), BX
+ MOVQ m_g0(BX), BX
+ CMPQ DX, BX
+ JEQ g0
+
+ // switch to g0 stack
+ get_tls(BP)
+ MOVQ BX, g(BP)
+ MOVQ (g_sched+gobuf_sp)(BX), DI
+ // make it look like mstart called us on g0, to stop traceback
+ SUBQ $8, DI
+ MOVQ $runtime·mstart(SB), SI
+ MOVQ SI, 0(DI)
+ // traceback will think that we've done PUSHFQ and SUBQ
+ // on this stack, so subtract them here to match.
+ // (we need room for sighandler arguments anyway).
+ // and re-save old SP for restoring later.
+ SUBQ $(112+8), DI
+ // save g, save old stack pointer.
+ MOVQ SP, 104(DI)
+ MOVQ DI, SP
+
+g0:
+ MOVQ 0(CX), BX // ExceptionRecord*
+ MOVQ 8(CX), CX // Context*
+ MOVQ BX, 0(SP)
+ MOVQ CX, 8(SP)
+ MOVQ DX, 16(SP)
+ CALL R15 // call handler
+ // AX is set to report result back to Windows
+ MOVL 24(SP), AX
+
+ // switch back to original stack and g
+ // no-op if we never left.
+ MOVQ 104(SP), SP
+ MOVQ 96(SP), DX
+ get_tls(BP)
+ MOVQ DX, g(BP)
+
+done:
+ // restore registers as required for windows callback
+ MOVQ 88(SP), R15
+ MOVQ 32(SP), R14
+ MOVQ 40(SP), R13
+ MOVQ 48(SP), R12
+ MOVQ 56(SP), BX
+ MOVQ 64(SP), BP
+ MOVQ 72(SP), SI
+ MOVQ 80(SP), DI
+ ADDQ $112, SP
+ POPFQ
+
+ RET
+
+TEXT runtime·exceptiontramp(SB),NOSPLIT|NOFRAME,$0
+ MOVQ $runtime·exceptionhandler(SB), AX
+ JMP sigtramp<>(SB)
+
+TEXT runtime·firstcontinuetramp(SB),NOSPLIT|NOFRAME,$0-0
+ MOVQ $runtime·firstcontinuehandler(SB), AX
+ JMP sigtramp<>(SB)
+
+TEXT runtime·lastcontinuetramp(SB),NOSPLIT|NOFRAME,$0-0
+ MOVQ $runtime·lastcontinuehandler(SB), AX
+ JMP sigtramp<>(SB)
+
+TEXT runtime·ctrlhandler(SB),NOSPLIT|NOFRAME,$8
+ MOVQ CX, 16(SP) // spill
+ MOVQ $runtime·ctrlhandler1(SB), CX
+ MOVQ CX, 0(SP)
+ CALL runtime·externalthreadhandler(SB)
+ RET
+
+TEXT runtime·profileloop(SB),NOSPLIT|NOFRAME,$8
+ MOVQ $runtime·profileloop1(SB), CX
+ MOVQ CX, 0(SP)
+ CALL runtime·externalthreadhandler(SB)
+ RET
+
+TEXT runtime·externalthreadhandler(SB),NOSPLIT|NOFRAME,$0
+ PUSHQ BP
+ MOVQ SP, BP
+ PUSHQ BX
+ PUSHQ SI
+ PUSHQ DI
+ PUSHQ 0x28(GS)
+ MOVQ SP, DX
+
+ // setup dummy m, g
+ SUBQ $m__size, SP // space for M
+ MOVQ SP, 0(SP)
+ MOVQ $m__size, 8(SP)
+ CALL runtime·memclrNoHeapPointers(SB) // smashes AX,BX,CX, maybe BP
+
+ LEAQ m_tls(SP), CX
+ MOVQ CX, 0x28(GS)
+ MOVQ SP, BX
+ SUBQ $g__size, SP // space for G
+ MOVQ SP, g(CX)
+ MOVQ SP, m_g0(BX)
+
+ MOVQ SP, 0(SP)
+ MOVQ $g__size, 8(SP)
+ CALL runtime·memclrNoHeapPointers(SB) // smashes AX,BX,CX, maybe BP
+ LEAQ g__size(SP), BX
+ MOVQ BX, g_m(SP)
+
+ LEAQ -32768(SP), CX // must be less than SizeOfStackReserve set by linker
+ MOVQ CX, (g_stack+stack_lo)(SP)
+ ADDQ $const__StackGuard, CX
+ MOVQ CX, g_stackguard0(SP)
+ MOVQ CX, g_stackguard1(SP)
+ MOVQ DX, (g_stack+stack_hi)(SP)
+
+ PUSHQ AX // room for return value
+ PUSHQ 32(BP) // arg for handler
+ CALL 16(BP)
+ POPQ CX
+ POPQ AX // pass return value to Windows in AX
+
+ get_tls(CX)
+ MOVQ g(CX), CX
+ MOVQ (g_stack+stack_hi)(CX), SP
+ POPQ 0x28(GS)
+ POPQ DI
+ POPQ SI
+ POPQ BX
+ POPQ BP
+ RET
+
+GLOBL runtime·cbctxts(SB), NOPTR, $8
+
+TEXT runtime·callbackasm1(SB),NOSPLIT,$0
+ // Construct args vector for cgocallback().
+ // By windows/amd64 calling convention first 4 args are in CX, DX, R8, R9
+ // args from the 5th on are on the stack.
+ // In any case, even if the function has 0, 1, 2, 3, or 4 args, there is reserved
+ // but uninitialized "shadow space" for the first 4 args.
+ // The values are in registers.
+ MOVQ CX, (16+0)(SP)
+ MOVQ DX, (16+8)(SP)
+ MOVQ R8, (16+16)(SP)
+ MOVQ R9, (16+24)(SP)
+ // R8 = address of args vector
+ LEAQ (16+0)(SP), R8
+
+ // remove return address from stack, we are not returning to callbackasm, but to its caller.
+ MOVQ 0(SP), AX
+ ADDQ $8, SP
+
+ // determine index into runtime·cbs table
+ MOVQ $runtime·callbackasm(SB), DX
+ SUBQ DX, AX
+ MOVQ $0, DX
+ MOVQ $5, CX // divide by 5 because each call instruction in runtime·callbackasm is 5 bytes long
+ DIVL CX
+ SUBQ $1, AX // subtract 1 because return PC is to the next slot
+
+ // DI SI BP BX R12 R13 R14 R15 registers and DF flag are preserved
+ // as required by windows callback convention.
+ PUSHFQ
+ SUBQ $64, SP
+ MOVQ DI, 56(SP)
+ MOVQ SI, 48(SP)
+ MOVQ BP, 40(SP)
+ MOVQ BX, 32(SP)
+ MOVQ R12, 24(SP)
+ MOVQ R13, 16(SP)
+ MOVQ R14, 8(SP)
+ MOVQ R15, 0(SP)
+
+ // Go ABI requires DF flag to be cleared.
+ CLD
+
+ // Create a struct callbackArgs on our stack to be passed as
+ // the "frame" to cgocallback and on to callbackWrap.
+ SUBQ $(24+callbackArgs__size), SP
+ MOVQ AX, (24+callbackArgs_index)(SP) // callback index
+ MOVQ R8, (24+callbackArgs_args)(SP) // address of args vector
+ MOVQ $0, (24+callbackArgs_result)(SP) // result
+ LEAQ 24(SP), AX
+ // Call cgocallback, which will call callbackWrap(frame).
+ MOVQ $0, 16(SP) // context
+ MOVQ AX, 8(SP) // frame (address of callbackArgs)
+ LEAQ ·callbackWrap(SB), BX
+ MOVQ BX, 0(SP) // PC of function value to call (callbackWrap)
+ CALL ·cgocallback(SB)
+ // Get callback result.
+ MOVQ (24+callbackArgs_result)(SP), AX
+ ADDQ $(24+callbackArgs__size), SP
+
+ // restore registers as required for windows callback
+ MOVQ 0(SP), R15
+ MOVQ 8(SP), R14
+ MOVQ 16(SP), R13
+ MOVQ 24(SP), R12
+ MOVQ 32(SP), BX
+ MOVQ 40(SP), BP
+ MOVQ 48(SP), SI
+ MOVQ 56(SP), DI
+ ADDQ $64, SP
+ POPFQ
+
+ // The return value was placed in AX above.
+ RET
+
+// uint32 tstart_stdcall(M *newm);
+TEXT runtime·tstart_stdcall(SB),NOSPLIT,$0
+ // CX contains first arg newm
+ MOVQ m_g0(CX), DX // g
+
+ // Layout new m scheduler stack on os stack.
+ MOVQ SP, AX
+ MOVQ AX, (g_stack+stack_hi)(DX)
+ SUBQ $(64*1024), AX // initial stack size (adjusted later)
+ MOVQ AX, (g_stack+stack_lo)(DX)
+ ADDQ $const__StackGuard, AX
+ MOVQ AX, g_stackguard0(DX)
+ MOVQ AX, g_stackguard1(DX)
+
+ // Set up tls.
+ LEAQ m_tls(CX), SI
+ MOVQ SI, 0x28(GS)
+ MOVQ CX, g_m(DX)
+ MOVQ DX, g(SI)
+
+ // Someday the convention will be D is always cleared.
+ CLD
+
+ CALL runtime·stackcheck(SB) // clobbers AX,CX
+ CALL runtime·mstart(SB)
+
+ XORL AX, AX // return 0 == success
+ RET
+
+// set tls base to DI
+TEXT runtime·settls(SB),NOSPLIT,$0
+ MOVQ DI, 0x28(GS)
+ RET
+
+// func onosstack(fn unsafe.Pointer, arg uint32)
+TEXT runtime·onosstack(SB),NOSPLIT,$0
+ MOVQ fn+0(FP), AX // to hide from 6l
+ MOVL arg+8(FP), BX
+
+ // Execute call on m->g0 stack, in case we are not actually
+ // calling a system call wrapper, like when running under WINE.
+ get_tls(R15)
+ CMPQ R15, $0
+ JNE 3(PC)
+ // Not a Go-managed thread. Do not switch stack.
+ CALL AX
+ RET
+
+ MOVQ g(R15), R13
+ MOVQ g_m(R13), R13
+
+ // leave pc/sp for cpu profiler
+ MOVQ (SP), R12
+ MOVQ R12, m_libcallpc(R13)
+ MOVQ g(R15), R12
+ MOVQ R12, m_libcallg(R13)
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ LEAQ fn+0(FP), R12
+ MOVQ R12, m_libcallsp(R13)
+
+ MOVQ m_g0(R13), R14
+ CMPQ g(R15), R14
+ JNE switch
+ // executing on m->g0 already
+ CALL AX
+ JMP ret
+
+switch:
+ // Switch to m->g0 stack and back.
+ MOVQ (g_sched+gobuf_sp)(R14), R14
+ MOVQ SP, -8(R14)
+ LEAQ -8(R14), SP
+ CALL AX
+ MOVQ 0(SP), SP
+
+ret:
+ MOVQ $0, m_libcallsp(R13)
+ RET
+
+// Runs on OS stack. duration (in 100ns units) is in BX.
+// The function leaves room for 4 syscall parameters
+// (as per windows amd64 calling convention).
+TEXT runtime·usleep2(SB),NOSPLIT|NOFRAME,$48
+ MOVQ SP, AX
+ ANDQ $~15, SP // alignment as per Windows requirement
+ MOVQ AX, 40(SP)
+ // Want negative 100ns units.
+ NEGQ BX
+ LEAQ 32(SP), R8 // ptime
+ MOVQ BX, (R8)
+ MOVQ $-1, CX // handle
+ MOVQ $0, DX // alertable
+ MOVQ runtime·_NtWaitForSingleObject(SB), AX
+ CALL AX
+ MOVQ 40(SP), SP
+ RET
+
+// Runs on OS stack. duration (in 100ns units) is in BX.
+TEXT runtime·usleep2HighRes(SB),NOSPLIT|NOFRAME,$72
+ get_tls(CX)
+ CMPQ CX, $0
+ JE gisnotset
+
+ MOVQ SP, AX
+ ANDQ $~15, SP // alignment as per Windows requirement
+ MOVQ AX, 64(SP)
+
+ MOVQ g(CX), CX
+ MOVQ g_m(CX), CX
+ MOVQ (m_mOS+mOS_highResTimer)(CX), CX // hTimer
+ MOVQ CX, 48(SP) // save hTimer for later
+ // Want negative 100ns units.
+ NEGQ BX
+ LEAQ 56(SP), DX // lpDueTime
+ MOVQ BX, (DX)
+ MOVQ $0, R8 // lPeriod
+ MOVQ $0, R9 // pfnCompletionRoutine
+ MOVQ $0, AX
+ MOVQ AX, 32(SP) // lpArgToCompletionRoutine
+ MOVQ AX, 40(SP) // fResume
+ MOVQ runtime·_SetWaitableTimer(SB), AX
+ CALL AX
+
+ MOVQ 48(SP), CX // handle
+ MOVQ $0, DX // alertable
+ MOVQ $0, R8 // ptime
+ MOVQ runtime·_NtWaitForSingleObject(SB), AX
+ CALL AX
+
+ MOVQ 64(SP), SP
+ RET
+
+gisnotset:
+ // TLS is not configured. Call usleep2 instead.
+ MOVQ $runtime·usleep2(SB), AX
+ CALL AX
+ RET
+
+// Runs on OS stack.
+TEXT runtime·switchtothread(SB),NOSPLIT|NOFRAME,$0
+ MOVQ SP, AX
+ ANDQ $~15, SP // alignment as per Windows requirement
+ SUBQ $(48), SP // room for SP and 4 args as per Windows requirement
+ // plus one extra word to keep stack 16 bytes aligned
+ MOVQ AX, 32(SP)
+ MOVQ runtime·_SwitchToThread(SB), AX
+ CALL AX
+ MOVQ 32(SP), SP
+ RET
+
+// See https://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information-the-kuser_shared_data-structure/
+// Must read hi1, then lo, then hi2. The snapshot is valid if hi1 == hi2.
+#define _INTERRUPT_TIME 0x7ffe0008
+#define _SYSTEM_TIME 0x7ffe0014
+#define time_lo 0
+#define time_hi1 4
+#define time_hi2 8
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$0-8
+ CMPB runtime·useQPCTime(SB), $0
+ JNE useQPC
+ MOVQ $_INTERRUPT_TIME, DI
+loop:
+ MOVL time_hi1(DI), AX
+ MOVL time_lo(DI), BX
+ MOVL time_hi2(DI), CX
+ CMPL AX, CX
+ JNE loop
+ SHLQ $32, CX
+ ORQ BX, CX
+ IMULQ $100, CX
+ MOVQ CX, ret+0(FP)
+ RET
+useQPC:
+ JMP runtime·nanotimeQPC(SB)
+ RET
+
+TEXT time·now(SB),NOSPLIT,$0-24
+ CMPB runtime·useQPCTime(SB), $0
+ JNE useQPC
+ MOVQ $_INTERRUPT_TIME, DI
+loop:
+ MOVL time_hi1(DI), AX
+ MOVL time_lo(DI), BX
+ MOVL time_hi2(DI), CX
+ CMPL AX, CX
+ JNE loop
+ SHLQ $32, AX
+ ORQ BX, AX
+ IMULQ $100, AX
+ MOVQ AX, mono+16(FP)
+
+ MOVQ $_SYSTEM_TIME, DI
+wall:
+ MOVL time_hi1(DI), AX
+ MOVL time_lo(DI), BX
+ MOVL time_hi2(DI), CX
+ CMPL AX, CX
+ JNE wall
+ SHLQ $32, AX
+ ORQ BX, AX
+ MOVQ $116444736000000000, DI
+ SUBQ DI, AX
+ IMULQ $100, AX
+
+ // generated code for
+ // func f(x uint64) (uint64, uint64) { return x/1000000000, x%1000000000 }
+ // adapted to reduce duplication
+ MOVQ AX, CX
+ MOVQ $1360296554856532783, AX
+ MULQ CX
+ ADDQ CX, DX
+ RCRQ $1, DX
+ SHRQ $29, DX
+ MOVQ DX, sec+0(FP)
+ IMULQ $1000000000, DX
+ SUBQ DX, CX
+ MOVL CX, nsec+8(FP)
+ RET
+useQPC:
+ JMP runtime·nowQPC(SB)
+ RET
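
The magic-multiply sequence in time·now above (MULQ by 1360296554856532783, ADDQ, RCRQ $1, SHRQ $29) computes x*(m+2^64) >> 94, and m+2^64 is ceil(2^94/1e9), so the result is x/1e9 without a DIVQ. A small standalone Go check of that identity, assuming nothing beyond math/bits:

    package main

    import (
        "fmt"
        "math/bits"
    )

    // splitSec mirrors the divide-by-1e9 sequence in time·now: multiply by
    // m, add x*2^64, then shift right by 94 (RCRQ $1 followed by SHRQ $29).
    func splitSec(x uint64) (sec, nsec uint64) {
        const m = 1360296554856532783      // ceil(2^94/1e9) - 2^64
        hi, _ := bits.Mul64(x, m)          // high 64 bits of x*m
        sum, carry := bits.Add64(hi, x, 0) // + x*2^64; carry feeds the RCR
        sec = sum>>30 | carry<<34          // (carry:sum) >> 30 == x*(m+2^64) >> 94
        nsec = x - sec*1000000000
        return
    }

    func main() {
        for _, x := range []uint64{0, 999999999, 1000000000, 1603059131123456789} {
            sec, nsec := splitSec(x)
            fmt.Println(x, sec == x/1000000000, nsec == x%1000000000)
        }
    }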
diff --git a/src/runtime/sys_windows_arm.s b/src/runtime/sys_windows_arm.s
new file mode 100644
index 0000000..fe26708
--- /dev/null
+++ b/src/runtime/sys_windows_arm.s
@@ -0,0 +1,694 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// void runtime·asmstdcall(void *c);
+TEXT runtime·asmstdcall(SB),NOSPLIT|NOFRAME,$0
+ MOVM.DB.W [R4, R5, R14], (R13) // push {r4, r5, lr}
+ MOVW R0, R4 // put libcall * in r4
+ MOVW R13, R5 // save stack pointer in r5
+
+ // SetLastError(0)
+ MOVW $0, R0
+ MRC 15, 0, R1, C13, C0, 2
+ MOVW R0, 0x34(R1)
+
+ MOVW 8(R4), R12 // libcall->args
+
+ // Do we have more than 4 arguments?
+ MOVW 4(R4), R0 // libcall->n
+ SUB.S $4, R0, R2
+ BLE loadregs
+
+ // Reserve stack space for remaining args
+ SUB R2<<2, R13
+ BIC $0x7, R13 // alignment for ABI
+
+ // R0: count of arguments
+ // R1:
+ // R2: loop counter, from 0 to (n-4)
+ // R3: scratch
+ // R4: pointer to libcall struct
+ // R12: libcall->args
+ MOVW $0, R2
+stackargs:
+ ADD $4, R2, R3 // r3 = args[4 + i]
+ MOVW R3<<2(R12), R3
+ MOVW R3, R2<<2(R13) // stack[i] = r3
+
+ ADD $1, R2 // i++
+ SUB $4, R0, R3 // while (i < (n - 4))
+ CMP R3, R2
+ BLT stackargs
+
+loadregs:
+ CMP $3, R0
+ MOVW.GT 12(R12), R3
+
+ CMP $2, R0
+ MOVW.GT 8(R12), R2
+
+ CMP $1, R0
+ MOVW.GT 4(R12), R1
+
+ CMP $0, R0
+ MOVW.GT 0(R12), R0
+
+ BIC $0x7, R13 // alignment for ABI
+ MOVW 0(R4), R12 // branch to libcall->fn
+ BL (R12)
+
+ MOVW R5, R13 // free stack space
+ MOVW R0, 12(R4) // save return value to libcall->r1
+ MOVW R1, 16(R4)
+
+ // GetLastError
+ MRC 15, 0, R1, C13, C0, 2
+ MOVW 0x34(R1), R0
+ MOVW R0, 20(R4) // store in libcall->err
+
+ MOVM.IA.W (R13), [R4, R5, R15]
+
+TEXT runtime·badsignal2(SB),NOSPLIT|NOFRAME,$0
+ MOVM.DB.W [R4, R14], (R13) // push {r4, lr}
+ MOVW R13, R4 // save original stack pointer
+ SUB $8, R13 // space for 2 variables
+ BIC $0x7, R13 // alignment for ABI
+
+ // stderr
+ MOVW runtime·_GetStdHandle(SB), R1
+ MOVW $-12, R0
+ BL (R1)
+
+ MOVW $runtime·badsignalmsg(SB), R1 // lpBuffer
+ MOVW $runtime·badsignallen(SB), R2 // lpNumberOfBytesToWrite
+ MOVW (R2), R2
+ ADD $0x4, R13, R3 // lpNumberOfBytesWritten
+ MOVW $0, R12 // lpOverlapped
+ MOVW R12, (R13)
+
+ MOVW runtime·_WriteFile(SB), R12
+ BL (R12)
+
+ MOVW R4, R13 // restore SP
+ MOVM.IA.W (R13), [R4, R15] // pop {r4, pc}
+
+TEXT runtime·getlasterror(SB),NOSPLIT,$0
+ MRC 15, 0, R0, C13, C0, 2
+ MOVW 0x34(R0), R0
+ MOVW R0, ret+0(FP)
+ RET
+
+TEXT runtime·setlasterror(SB),NOSPLIT|NOFRAME,$0
+ MRC 15, 0, R1, C13, C0, 2
+ MOVW R0, 0x34(R1)
+ RET
+
+// Called by Windows as a Vectored Exception Handler (VEH).
+// First argument is pointer to struct containing
+// exception record and context pointers.
+// Handler function is stored in R1
+// Return 0 for 'not handled', -1 for handled.
+// int32_t sigtramp(
+// PEXCEPTION_POINTERS ExceptionInfo,
+// func *GoExceptionHandler);
+TEXT sigtramp<>(SB),NOSPLIT|NOFRAME,$0
+ MOVM.DB.W [R0, R4-R11, R14], (R13) // push {r0, r4-r11, lr} (SP-=40)
+ SUB $(8+20), R13 // reserve space for g, sp, and
+ // parameters/retval to go call
+
+ MOVW R0, R6 // Save param0
+ MOVW R1, R7 // Save param1
+
+ BL runtime·load_g(SB)
+ CMP $0, g // is there a current g?
+ BL.EQ runtime·badsignal2(SB)
+
+ // save g and SP in case of stack switch
+ MOVW R13, 24(R13)
+ MOVW g, 20(R13)
+
+ // do we need to switch to the g0 stack?
+ MOVW g, R5 // R5 = g
+ MOVW g_m(R5), R2 // R2 = m
+ MOVW m_g0(R2), R4 // R4 = g0
+ CMP R5, R4 // if curg == g0
+ BEQ g0
+
+ // switch to g0 stack
+ MOVW R4, g // g = g0
+ MOVW (g_sched+gobuf_sp)(g), R3 // R3 = g->gobuf.sp
+ BL runtime·save_g(SB)
+
+ // traceback will think that we've done PUSH and SUB
+ // on this stack, so subtract them here to match.
+ // (we need room for sighandler arguments anyway).
+ // and re-save old SP for restoring later.
+ SUB $(40+8+20), R3
+ MOVW R13, 24(R3) // save old stack pointer
+ MOVW R3, R13 // switch stack
+
+g0:
+ MOVW 0(R6), R2 // R2 = ExceptionPointers->ExceptionRecord
+ MOVW 4(R6), R3 // R3 = ExceptionPointers->ContextRecord
+
+ // make it look like mstart called us on g0, to stop traceback
+ MOVW $runtime·mstart(SB), R4
+
+ MOVW R4, 0(R13) // Save link register for traceback
+ MOVW R2, 4(R13) // Move arg0 (ExceptionRecord) into position
+ MOVW R3, 8(R13) // Move arg1 (ContextRecord) into position
+ MOVW R5, 12(R13) // Move arg2 (original g) into position
+ BL (R7) // Call the go routine
+ MOVW 16(R13), R4 // Fetch return value from stack
+
+ // Compute the value of the g0 stack pointer after deallocating
+ // this frame, then allocating 8 bytes. We may need to store
+ // the resume SP and PC on the g0 stack to work around
+ // control flow guard when we resume from the exception.
+ ADD $(40+20), R13, R12
+
+ // switch back to original stack and g
+ MOVW 24(R13), R13
+ MOVW 20(R13), g
+ BL runtime·save_g(SB)
+
+done:
+ MOVW R4, R0 // move retval into position
+ ADD $(8 + 20), R13 // free locals
+ MOVM.IA.W (R13), [R3, R4-R11, R14] // pop {r3, r4-r11, lr}
+
+ // if return value is CONTINUE_SEARCH, do not set up control
+ // flow guard workaround
+ CMP $0, R0
+ BEQ return
+
+ // Check if we need to set up the control flow guard workaround.
+ // On Windows/ARM, the stack pointer must lie within system
+ // stack limits when we resume from exception.
+ // Store the resume SP and PC on the g0 stack,
+ // and return to returntramp on the g0 stack. returntramp
+ // pops the saved PC and SP from the g0 stack, resuming execution
+ // at the desired location.
+ // If returntramp has already been set up by a previous exception
+ // handler, don't clobber the stored SP and PC on the stack.
+ MOVW 4(R3), R3 // PEXCEPTION_POINTERS->Context
+ MOVW 0x40(R3), R2 // load PC from context record
+ MOVW $returntramp<>(SB), R1
+ CMP R1, R2
+ B.EQ return // do not clobber saved SP/PC
+
+ // Save resume SP and PC on g0 stack
+ MOVW 0x38(R3), R2 // load SP from context record
+ MOVW R2, 0(R12) // Store resume SP on g0 stack
+ MOVW 0x40(R3), R2 // load PC from context record
+ MOVW R2, 4(R12) // Store resume PC on g0 stack
+
+ // Set up context record to return to returntramp on g0 stack
+ MOVW R12, 0x38(R3) // save g0 stack pointer
+ // in context record
+ MOVW $returntramp<>(SB), R2 // save resume address
+ MOVW R2, 0x40(R3) // in context record
+
+return:
+ B (R14) // return
+
+//
+// Trampoline to resume execution from exception handler.
+// This is part of the control flow guard workaround.
+// It switches stacks and jumps to the continuation address.
+//
+TEXT returntramp<>(SB),NOSPLIT|NOFRAME,$0
+ MOVM.IA (R13), [R13, R15] // ldm sp, [sp, pc]
+
+TEXT runtime·exceptiontramp(SB),NOSPLIT|NOFRAME,$0
+ MOVW $runtime·exceptionhandler(SB), R1
+ B sigtramp<>(SB)
+
+TEXT runtime·firstcontinuetramp(SB),NOSPLIT|NOFRAME,$0
+ MOVW $runtime·firstcontinuehandler(SB), R1
+ B sigtramp<>(SB)
+
+TEXT runtime·lastcontinuetramp(SB),NOSPLIT|NOFRAME,$0
+ MOVW $runtime·lastcontinuehandler(SB), R1
+ B sigtramp<>(SB)
+
+TEXT runtime·ctrlhandler(SB),NOSPLIT|NOFRAME,$0
+ MOVW $runtime·ctrlhandler1(SB), R1
+ B runtime·externalthreadhandler(SB)
+
+TEXT runtime·profileloop(SB),NOSPLIT|NOFRAME,$0
+ MOVW $runtime·profileloop1(SB), R1
+ B runtime·externalthreadhandler(SB)
+
+// int32 externalthreadhandler(uint32 arg, int (*func)(uint32))
+// stack layout:
+// +----------------+
+// | callee-save |
+// | registers |
+// +----------------+
+// | m |
+// +----------------+
+// 20| g |
+// +----------------+
+// 16| func ptr (r1) |
+// +----------------+
+// 12| argument (r0) |
+//---+----------------+
+// 8 | param1 |
+// +----------------+
+// 4 | param0 |
+// +----------------+
+// 0 | retval |
+// +----------------+
+//
+TEXT runtime·externalthreadhandler(SB),NOSPLIT|NOFRAME,$0
+ MOVM.DB.W [R4-R11, R14], (R13) // push {r4-r11, lr}
+ SUB $(m__size + g__size + 20), R13 // space for locals
+ MOVW R0, 12(R13)
+ MOVW R1, 16(R13)
+
+ // zero out m and g structures
+ ADD $20, R13, R0 // compute pointer to g
+ MOVW R0, 4(R13)
+ MOVW $(m__size + g__size), R0
+ MOVW R0, 8(R13)
+ BL runtime·memclrNoHeapPointers(SB)
+
+ // initialize m and g structures
+ ADD $20, R13, R2 // R2 = g
+ ADD $(20 + g__size), R13, R3 // R3 = m
+ MOVW R2, m_g0(R3) // m->g0 = g
+ MOVW R3, g_m(R2) // g->m = m
+ MOVW R2, m_curg(R3) // m->curg = g
+
+ MOVW R2, g
+ BL runtime·save_g(SB)
+
+ // set up stackguard stuff
+ MOVW R13, R0
+ MOVW R0, g_stack+stack_hi(g)
+ SUB $(32*1024), R0
+ MOVW R0, (g_stack+stack_lo)(g)
+ MOVW R0, g_stackguard0(g)
+ MOVW R0, g_stackguard1(g)
+
+ // move argument into position and call function
+ MOVW 12(R13), R0
+ MOVW R0, 4(R13)
+ MOVW 16(R13), R1
+ BL (R1)
+
+ // clear g
+ MOVW $0, g
+ BL runtime·save_g(SB)
+
+ MOVW 0(R13), R0 // load return value
+ ADD $(m__size + g__size + 20), R13 // free locals
+ MOVM.IA.W (R13), [R4-R11, R15] // pop {r4-r11, pc}
+
+GLOBL runtime·cbctxts(SB), NOPTR, $4
+
+TEXT runtime·callbackasm1(SB),NOSPLIT|NOFRAME,$0
+ // On entry, the trampoline in zcallback_windows_arm.s left
+ // the callback index in R12 (which is volatile in the C ABI).
+
+ // Push callback register arguments r0-r3. We do this first so
+ // they're contiguous with stack arguments.
+ MOVM.DB.W [R0-R3], (R13)
+ // Push C callee-save registers r4-r11 and lr.
+ MOVM.DB.W [R4-R11, R14], (R13)
+ SUB $(16 + callbackArgs__size), R13 // space for locals
+
+ // Create a struct callbackArgs on our stack.
+ MOVW R12, (16+callbackArgs_index)(R13) // callback index
+ MOVW $(16+callbackArgs__size+4*9)(R13), R0
+ MOVW R0, (16+callbackArgs_args)(R13) // address of args vector
+ MOVW $0, R0
+ MOVW R0, (16+callbackArgs_result)(R13) // result
+
+ // Prepare for entry to Go.
+ BL runtime·load_g(SB)
+
+ // Call cgocallback, which will call callbackWrap(frame).
+ MOVW $0, R0
+ MOVW R0, 12(R13) // context
+ MOVW $16(R13), R1 // R1 = &callbackArgs{...}
+ MOVW R1, 8(R13) // frame (address of callbackArgs)
+ MOVW $·callbackWrap(SB), R1
+ MOVW R1, 4(R13) // PC of function to call
+ BL runtime·cgocallback(SB)
+
+ // Get callback result.
+ MOVW (16+callbackArgs_result)(R13), R0
+
+ ADD $(16 + callbackArgs__size), R13 // free locals
+ MOVM.IA.W (R13), [R4-R11, R12] // pop {r4-r11, lr=>r12}
+ ADD $(4*4), R13 // skip r0-r3
+ B (R12) // return
+
+// uint32 tstart_stdcall(M *newm);
+TEXT runtime·tstart_stdcall(SB),NOSPLIT|NOFRAME,$0
+ MOVM.DB.W [R4-R11, R14], (R13) // push {r4-r11, lr}
+
+ MOVW m_g0(R0), g
+ MOVW R0, g_m(g)
+ BL runtime·save_g(SB)
+
+ // do per-thread TLS initialization
+ BL init_thread_tls<>(SB)
+
+ // Layout new m scheduler stack on os stack.
+ MOVW R13, R0
+ MOVW R0, g_stack+stack_hi(g)
+ SUB $(64*1024), R0
+ MOVW R0, (g_stack+stack_lo)(g)
+ MOVW R0, g_stackguard0(g)
+ MOVW R0, g_stackguard1(g)
+
+ BL runtime·emptyfunc(SB) // fault if stack check is wrong
+ BL runtime·mstart(SB)
+
+ // Exit the thread.
+ MOVW $0, R0
+ MOVM.IA.W (R13), [R4-R11, R15] // pop {r4-r11, pc}
+
+// onosstack calls fn on OS stack.
+// adapted from asm_arm.s : systemstack
+// func onosstack(fn unsafe.Pointer, arg uint32)
+TEXT runtime·onosstack(SB),NOSPLIT,$0
+ MOVW fn+0(FP), R5 // R5 = fn
+ MOVW arg+4(FP), R6 // R6 = arg
+
+ // This function can be called when there is no g,
+ // for example, when we are handling a callback on a non-go thread.
+ // In this case we're already on the system stack.
+ CMP $0, g
+ BEQ noswitch
+
+ MOVW g_m(g), R1 // R1 = m
+
+ MOVW m_gsignal(R1), R2 // R2 = gsignal
+ CMP g, R2
+ B.EQ noswitch
+
+ MOVW m_g0(R1), R2 // R2 = g0
+ CMP g, R2
+ B.EQ noswitch
+
+ MOVW m_curg(R1), R3
+ CMP g, R3
+ B.EQ switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVW $runtime·badsystemstack(SB), R0
+ BL (R0)
+ B runtime·abort(SB)
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ MOVW $runtime·systemstack_switch(SB), R3
+ ADD $4, R3, R3 // get past push {lr}
+ MOVW R3, (g_sched+gobuf_pc)(g)
+ MOVW R13, (g_sched+gobuf_sp)(g)
+ MOVW LR, (g_sched+gobuf_lr)(g)
+ MOVW g, (g_sched+gobuf_g)(g)
+
+ // switch to g0
+ MOVW R2, g
+ MOVW (g_sched+gobuf_sp)(R2), R3
+ // make it look like mstart called systemstack on g0, to stop traceback
+ SUB $4, R3, R3
+ MOVW $runtime·mstart(SB), R4
+ MOVW R4, 0(R3)
+ MOVW R3, R13
+
+ // call target function
+ MOVW R6, R0 // arg
+ BL (R5)
+
+ // switch back to g
+ MOVW g_m(g), R1
+ MOVW m_curg(R1), g
+ MOVW (g_sched+gobuf_sp)(g), R13
+ MOVW $0, R3
+ MOVW R3, (g_sched+gobuf_sp)(g)
+ RET
+
+noswitch:
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVW.P 4(R13), R14 // restore LR
+ MOVW R6, R0 // arg
+ B (R5)
+
+// Runs on OS stack. Duration (in 100ns units) is in R0.
+TEXT runtime·usleep2(SB),NOSPLIT|NOFRAME,$0
+ MOVM.DB.W [R4, R14], (R13) // push {r4, lr}
+ MOVW R13, R4 // Save SP
+ SUB $8, R13 // R13 = R13 - 8
+ BIC $0x7, R13 // Align SP for ABI
+ RSB $0, R0, R3 // R3 = -R0
+ MOVW $0, R1 // R1 = FALSE (alertable)
+ MOVW $-1, R0 // R0 = handle
+ MOVW R13, R2 // R2 = pTime
+ MOVW R3, 0(R2) // time_lo
+ MOVW R0, 4(R2) // time_hi
+ MOVW runtime·_NtWaitForSingleObject(SB), R3
+ BL (R3)
+ MOVW R4, R13 // Restore SP
+ MOVM.IA.W (R13), [R4, R15] // pop {R4, pc}
+
+// Runs on OS stack. Duration (in 100ns units) is in R0.
+// TODO: needs to be implemented properly.
+TEXT runtime·usleep2HighRes(SB),NOSPLIT|NOFRAME,$0
+ B runtime·abort(SB)
+
+// Runs on OS stack.
+TEXT runtime·switchtothread(SB),NOSPLIT|NOFRAME,$0
+ MOVM.DB.W [R4, R14], (R13) // push {R4, lr}
+ MOVW R13, R4
+ BIC $0x7, R13 // alignment for ABI
+ MOVW runtime·_SwitchToThread(SB), R0
+ BL (R0)
+ MOVW R4, R13 // restore stack pointer
+ MOVM.IA.W (R13), [R4, R15] // pop {R4, pc}
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ B runtime·armPublicationBarrier(SB)
+
+// never called (cgo not supported)
+TEXT runtime·read_tls_fallback(SB),NOSPLIT|NOFRAME,$0
+ MOVW $0xabcd, R0
+ MOVW R0, (R0)
+ RET
+
+// See https://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-information-the-kuser_shared_data-structure/
+// Must read hi1, then lo, then hi2. The snapshot is valid if hi1 == hi2.
+#define _INTERRUPT_TIME 0x7ffe0008
+#define _SYSTEM_TIME 0x7ffe0014
+#define time_lo 0
+#define time_hi1 4
+#define time_hi2 8
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$0-8
+ MOVW $0, R0
+ MOVB runtime·useQPCTime(SB), R0
+ CMP $0, R0
+ BNE useQPC
+ MOVW $_INTERRUPT_TIME, R3
+loop:
+ MOVW time_hi1(R3), R1
+ MOVW time_lo(R3), R0
+ MOVW time_hi2(R3), R2
+ CMP R1, R2
+ BNE loop
+
+ // wintime = R1:R0, multiply by 100
+ MOVW $100, R2
+ MULLU R0, R2, (R4, R3) // R4:R3 = R1:R0 * R2
+ MULA R1, R2, R4, R4
+
+ // wintime*100 = R4:R3
+ MOVW R3, ret_lo+0(FP)
+ MOVW R4, ret_hi+4(FP)
+ RET
+useQPC:
+ B runtime·nanotimeQPC(SB) // tail call
+ RET
+
+TEXT time·now(SB),NOSPLIT,$0-20
+ MOVW $0, R0
+ MOVB runtime·useQPCTime(SB), R0
+ CMP $0, R0
+ BNE useQPC
+ MOVW $_INTERRUPT_TIME, R3
+loop:
+ MOVW time_hi1(R3), R1
+ MOVW time_lo(R3), R0
+ MOVW time_hi2(R3), R2
+ CMP R1, R2
+ BNE loop
+
+ // wintime = R1:R0, multiply by 100
+ MOVW $100, R2
+ MULLU R0, R2, (R4, R3) // R4:R3 = R1:R0 * R2
+ MULA R1, R2, R4, R4
+
+ // wintime*100 = R4:R3
+ MOVW R3, mono+12(FP)
+ MOVW R4, mono+16(FP)
+
+ MOVW $_SYSTEM_TIME, R3
+wall:
+ MOVW time_hi1(R3), R1
+ MOVW time_lo(R3), R0
+ MOVW time_hi2(R3), R2
+ CMP R1, R2
+ BNE wall
+
+ // w = R1:R0 in 100ns units
+ // convert to Unix epoch (but still 100ns units)
+ #define delta 116444736000000000
+ SUB.S $(delta & 0xFFFFFFFF), R0
+ SBC $(delta >> 32), R1
+
+ // Convert to nSec
+ MOVW $100, R2
+ MULLU R0, R2, (R4, R3) // R4:R3 = R1:R0 * R2
+ MULA R1, R2, R4, R4
+ // w = R2:R1 in nSec
+ MOVW R3, R1 // R4:R3 -> R2:R1
+ MOVW R4, R2
+
+ // multiply nanoseconds by reciprocal of 10**9 (scaled by 2**61)
+ // to get seconds (96 bit scaled result)
+ MOVW $0x89705f41, R3 // 2**61 * 10**-9
+ MULLU R1,R3,(R6,R5) // R7:R6:R5 = R2:R1 * R3
+ MOVW $0,R7
+ MULALU R2,R3,(R7,R6)
+
+ // unscale by discarding low 32 bits, shifting the rest by 29
+ MOVW R6>>29,R6 // R7:R6 = (R7:R6:R5 >> 61)
+ ORR R7<<3,R6
+ MOVW R7>>29,R7
+
+ // subtract (10**9 * sec) from nsec to get nanosecond remainder
+ MOVW $1000000000, R5 // 10**9
+ MULLU R6,R5,(R9,R8) // R9:R8 = R7:R6 * R5
+ MULA R7,R5,R9,R9
+ SUB.S R8,R1 // R2:R1 -= R9:R8
+ SBC R9,R2
+
+ // because reciprocal was a truncated repeating fraction, quotient
+ // may be slightly too small -- adjust to make remainder < 10**9
+ CMP R5,R1 // if remainder > 10**9
+ SUB.HS R5,R1 // remainder -= 10**9
+ ADD.HS $1,R6 // sec += 1
+
+ MOVW R6,sec_lo+0(FP)
+ MOVW R7,sec_hi+4(FP)
+ MOVW R1,nsec+8(FP)
+ RET
+useQPC:
+ B runtime·nowQPC(SB) // tail call
+ RET
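
The block above divides the 64-bit nanosecond count by 10**9 using the truncated reciprocal 0x89705f41 (floor(2**61/10**9)) and then repairs the quotient, which can come out one too small. A Go sketch of the same computation, done with 64-bit multiplies instead of the 32-bit MULLU/MULALU pairs and valid for wall-clock-sized inputs:

    package main

    import (
        "fmt"
        "math/bits"
    )

    // splitSec mirrors the reciprocal-multiply above: take the top bits of
    // ns*0x89705f41 shifted down by 61, then adjust once if the truncated
    // reciprocal undershot. One fixup suffices for realistic time values,
    // matching the single conditional adjustment in the assembly.
    func splitSec(ns uint64) (sec, nsec uint64) {
        const recip = 0x89705f41 // floor(2^61 / 1e9)
        hi, lo := bits.Mul64(ns, recip)
        sec = lo>>61 | hi<<3 // (hi:lo) >> 61
        nsec = ns - sec*1000000000
        if nsec >= 1000000000 { // quotient was one too small
            nsec -= 1000000000
            sec++
        }
        return
    }

    func main() {
        for _, ns := range []uint64{0, 999999999, 1000000001, 1603059131123456789} {
            sec, nsec := splitSec(ns)
            fmt.Println(ns, sec == ns/1000000000, nsec == ns%1000000000)
        }
    }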
+
+// save_g saves the g register (R10) into thread local memory
+// so that we can call externally compiled
+// ARM code that will overwrite those registers.
+// NOTE: runtime.gogo assumes that R1 is preserved by this function.
+// runtime.mcall assumes this function only clobbers R0 and R11.
+// Returns with g in R0.
+// Save the value in the _TEB->TlsSlots array.
+// Effectively implements TlsSetValue().
+// tls_g stores the TLS slot allocated by TlsAlloc().
+TEXT runtime·save_g(SB),NOSPLIT|NOFRAME,$0
+ MRC 15, 0, R0, C13, C0, 2
+ ADD $0xe10, R0
+ MOVW $runtime·tls_g(SB), R11
+ MOVW (R11), R11
+ MOVW g, R11<<2(R0)
+ MOVW g, R0 // preserve R0 across call to setg<>
+ RET
+
+// load_g loads the g register from thread-local memory,
+// for use after calling externally compiled
+// ARM code that overwrote those registers.
+// Get the value from the _TEB->TlsSlots array.
+// Effectively implements TlsGetValue().
+TEXT runtime·load_g(SB),NOSPLIT|NOFRAME,$0
+ MRC 15, 0, R0, C13, C0, 2
+ ADD $0xe10, R0
+ MOVW $runtime·tls_g(SB), g
+ MOVW (g), g
+ MOVW g<<2(R0), g
+ RET
+
+// This is called from rt0_go, which runs on the system stack
+// using the initial stack allocated by the OS.
+// It calls back into standard C using the BL below.
+// To do that, the stack pointer must be 8-byte-aligned.
+TEXT runtime·_initcgo(SB),NOSPLIT|NOFRAME,$0
+ MOVM.DB.W [R4, R14], (R13) // push {r4, lr}
+
+ // Ensure stack is 8-byte aligned before calling C code
+ MOVW R13, R4
+ BIC $0x7, R13
+
+ // Allocate a TLS slot to hold g across calls to external code
+ MOVW $runtime·_TlsAlloc(SB), R0
+ MOVW (R0), R0
+ BL (R0)
+
+ // Assert that slot is less than 64 so we can use _TEB->TlsSlots
+ CMP $64, R0
+ MOVW $runtime·abort(SB), R1
+ BL.GE (R1)
+
+ // Save Slot into tls_g
+ MOVW $runtime·tls_g(SB), R1
+ MOVW R0, (R1)
+
+ BL init_thread_tls<>(SB)
+
+ MOVW R4, R13
+ MOVM.IA.W (R13), [R4, R15] // pop {r4, pc}
+
+// void init_thread_tls()
+//
+// Does per-thread TLS initialization. Saves a pointer to the TLS slot
+// holding G, in the current m.
+//
+// g->m->tls[0] = &_TEB->TlsSlots[tls_g]
+//
+// The purpose of this is to enable the profiling handler to get the
+// current g associated with the thread. We cannot use m->curg because curg
+// only holds the current user g. If the thread is executing system code or
+// external code, m->curg will be NULL. The thread's TLS slot always holds
+// the current g, so save a reference to this location so the profiling
+// handler can get the real g from the thread's m.
+//
+// Clobbers R0-R3
+TEXT init_thread_tls<>(SB),NOSPLIT|NOFRAME,$0
+ // compute &_TEB->TlsSlots[tls_g]
+ MRC 15, 0, R0, C13, C0, 2
+ ADD $0xe10, R0
+ MOVW $runtime·tls_g(SB), R1
+ MOVW (R1), R1
+ MOVW R1<<2, R1
+ ADD R1, R0
+
+ // save in g->m->tls[0]
+ MOVW g_m(g), R1
+ MOVW R0, m_tls(R1)
+ RET
+
+// Holds the TLS Slot, which was allocated by TlsAlloc()
+GLOBL runtime·tls_g+0(SB), NOPTR, $4
diff --git a/src/runtime/sys_x86.go b/src/runtime/sys_x86.go
new file mode 100644
index 0000000..f917cb8
--- /dev/null
+++ b/src/runtime/sys_x86.go
@@ -0,0 +1,27 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build amd64 386
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ sp := buf.sp
+ if sys.RegSize > sys.PtrSize {
+ sp -= sys.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = 0
+ }
+ sp -= sys.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = buf.pc
+ buf.sp = sp
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
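
gostartcall rewrites the gobuf so that resuming it looks as if fn had just been called from buf.pc: the old PC is stored where a CALL would have left its return address, and buf.pc is retargeted at fn. A toy model of that transformation on a fake downward-growing stack (the types and the slice-indexed "stack" are made up for illustration; the RegSize > PtrSize padding is omitted):

    package main

    import "fmt"

    // fakeGobuf models only the fields gostartcall touches.
    type fakeGobuf struct {
        sp, pc, ctxt uintptr
    }

    // gostartcallSketch mirrors the logic above: push the old PC onto the
    // stack, then point the gobuf at fn with context ctxt.
    func gostartcallSketch(buf *fakeGobuf, stack []uintptr, fn, ctxt uintptr) {
        buf.sp--               // sp -= PtrSize (slice index stands in for an address)
        stack[buf.sp] = buf.pc // the "return address" a CALL would have pushed
        buf.pc = fn            // execution will resume at fn
        buf.ctxt = ctxt
    }

    func main() {
        stack := make([]uintptr, 8)
        buf := fakeGobuf{sp: 8, pc: 0x1234}
        gostartcallSketch(&buf, stack, 0x5678, 0)
        fmt.Printf("sp=%d pc=%#x pushed=%#x\n", buf.sp, buf.pc, stack[buf.sp])
    }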
diff --git a/src/runtime/syscall2_solaris.go b/src/runtime/syscall2_solaris.go
new file mode 100644
index 0000000..e098e80
--- /dev/null
+++ b/src/runtime/syscall2_solaris.go
@@ -0,0 +1,43 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import _ "unsafe" // for go:linkname
+
+//go:cgo_import_dynamic libc_chdir chdir "libc.so"
+//go:cgo_import_dynamic libc_chroot chroot "libc.so"
+//go:cgo_import_dynamic libc_close close "libc.so"
+//go:cgo_import_dynamic libc_execve execve "libc.so"
+//go:cgo_import_dynamic libc_fcntl fcntl "libc.so"
+//go:cgo_import_dynamic libc_forkx forkx "libc.so"
+//go:cgo_import_dynamic libc_gethostname gethostname "libc.so"
+//go:cgo_import_dynamic libc_getpid getpid "libc.so"
+//go:cgo_import_dynamic libc_ioctl ioctl "libc.so"
+//go:cgo_import_dynamic libc_pipe pipe "libc.so"
+//go:cgo_import_dynamic libc_setgid setgid "libc.so"
+//go:cgo_import_dynamic libc_setgroups setgroups "libc.so"
+//go:cgo_import_dynamic libc_setsid setsid "libc.so"
+//go:cgo_import_dynamic libc_setuid setuid "libc.so"
+//go:cgo_import_dynamic libc_setpgid setpgid "libc.so"
+//go:cgo_import_dynamic libc_syscall syscall "libc.so"
+//go:cgo_import_dynamic libc_wait4 wait4 "libc.so"
+
+//go:linkname libc_chdir libc_chdir
+//go:linkname libc_chroot libc_chroot
+//go:linkname libc_close libc_close
+//go:linkname libc_execve libc_execve
+//go:linkname libc_fcntl libc_fcntl
+//go:linkname libc_forkx libc_forkx
+//go:linkname libc_gethostname libc_gethostname
+//go:linkname libc_getpid libc_getpid
+//go:linkname libc_ioctl libc_ioctl
+//go:linkname libc_pipe libc_pipe
+//go:linkname libc_setgid libc_setgid
+//go:linkname libc_setgroups libc_setgroups
+//go:linkname libc_setsid libc_setsid
+//go:linkname libc_setuid libc_setuid
+//go:linkname libc_setpgid libc_setpgid
+//go:linkname libc_syscall libc_syscall
+//go:linkname libc_wait4 libc_wait4
diff --git a/src/runtime/syscall_aix.go b/src/runtime/syscall_aix.go
new file mode 100644
index 0000000..79b5124
--- /dev/null
+++ b/src/runtime/syscall_aix.go
@@ -0,0 +1,226 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// This file handles some syscalls from the syscall package.
+// In particular, the syscalls used during forkAndExecInChild must not split the stack.
+
+//go:cgo_import_dynamic libc_chdir chdir "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_chroot chroot "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_dup2 dup2 "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_execve execve "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_fcntl fcntl "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_fork fork "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_ioctl ioctl "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_setgid setgid "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_setgroups setgroups "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_setsid setsid "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_setuid setuid "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_setpgid setpgid "libc.a/shr_64.o"
+
+//go:linkname libc_chdir libc_chdir
+//go:linkname libc_chroot libc_chroot
+//go:linkname libc_dup2 libc_dup2
+//go:linkname libc_execve libc_execve
+//go:linkname libc_fcntl libc_fcntl
+//go:linkname libc_fork libc_fork
+//go:linkname libc_ioctl libc_ioctl
+//go:linkname libc_setgid libc_setgid
+//go:linkname libc_setgroups libc_setgroups
+//go:linkname libc_setsid libc_setsid
+//go:linkname libc_setuid libc_setuid
+//go:linkname libc_setpgid libc_setpgid
+
+var (
+ libc_chdir,
+ libc_chroot,
+ libc_dup2,
+ libc_execve,
+ libc_fcntl,
+ libc_fork,
+ libc_ioctl,
+ libc_setgid,
+ libc_setgroups,
+ libc_setsid,
+ libc_setuid,
+ libc_setpgid libFunc
+)
+
+// In syscall_syscall6 and syscall_rawsyscall6, r2 is always 0
+// as it's never used on AIX
+// TODO: remove r2 from zsyscall_aix_$GOARCH.go
+
+// Syscall is needed because some packages (like net) need it too.
+// The best way is to return EINVAL and let Go handle the failure.
+// If the syscall can't fail, this function can redirect it to a real syscall.
+//
+// This is exported via linkname to assembly in the syscall package.
+//
+//go:nosplit
+//go:linkname syscall_Syscall
+func syscall_Syscall(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ return 0, 0, _EINVAL
+}
+
+// This is syscall.RawSyscall; it exists to satisfy a build dependency,
+// but it doesn't work.
+//
+// This is exported via linkname to assembly in the syscall package.
+//
+//go:linkname syscall_RawSyscall
+func syscall_RawSyscall(trap, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ panic("RawSyscall not available on AIX")
+}
+
+// This is exported via linkname to assembly in the syscall package.
+//
+//go:nosplit
+//go:cgo_unsafe_args
+//go:linkname syscall_syscall6
+func syscall_syscall6(fn, nargs, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ c := libcall{
+ fn: fn,
+ n: nargs,
+ args: uintptr(unsafe.Pointer(&a1)),
+ }
+
+ entersyscallblock()
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+ exitsyscall()
+ return c.r1, 0, c.err
+}
+
+// This is exported via linkname to assembly in the syscall package.
+//
+//go:nosplit
+//go:cgo_unsafe_args
+//go:linkname syscall_rawSyscall6
+func syscall_rawSyscall6(fn, nargs, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ c := libcall{
+ fn: fn,
+ n: nargs,
+ args: uintptr(unsafe.Pointer(&a1)),
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ return c.r1, 0, c.err
+}
+
+//go:linkname syscall_chdir syscall.chdir
+//go:nosplit
+func syscall_chdir(path uintptr) (err uintptr) {
+ _, err = syscall1(&libc_chdir, path)
+ return
+}
+
+//go:linkname syscall_chroot1 syscall.chroot1
+//go:nosplit
+func syscall_chroot1(path uintptr) (err uintptr) {
+ _, err = syscall1(&libc_chroot, path)
+ return
+}
+
+// like close, but must not split stack, for fork.
+//go:linkname syscall_close syscall.close
+//go:nosplit
+func syscall_close(fd int32) int32 {
+ _, err := syscall1(&libc_close, uintptr(fd))
+ return int32(err)
+}
+
+//go:linkname syscall_dup2child syscall.dup2child
+//go:nosplit
+func syscall_dup2child(old, new uintptr) (val, err uintptr) {
+ val, err = syscall2(&libc_dup2, old, new)
+ return
+}
+
+//go:linkname syscall_execve syscall.execve
+//go:nosplit
+func syscall_execve(path, argv, envp uintptr) (err uintptr) {
+ _, err = syscall3(&libc_execve, path, argv, envp)
+ return
+}
+
+// like exit, but must not split stack, for fork.
+//go:linkname syscall_exit syscall.exit
+//go:nosplit
+func syscall_exit(code uintptr) {
+ syscall1(&libc_exit, code)
+}
+
+//go:linkname syscall_fcntl1 syscall.fcntl1
+//go:nosplit
+func syscall_fcntl1(fd, cmd, arg uintptr) (val, err uintptr) {
+ val, err = syscall3(&libc_fcntl, fd, cmd, arg)
+ return
+
+}
+
+//go:linkname syscall_forkx syscall.forkx
+//go:nosplit
+func syscall_forkx(flags uintptr) (pid uintptr, err uintptr) {
+ pid, err = syscall1(&libc_fork, flags)
+ return
+}
+
+//go:linkname syscall_getpid syscall.getpid
+//go:nosplit
+func syscall_getpid() (pid, err uintptr) {
+ pid, err = syscall0(&libc_getpid)
+ return
+}
+
+//go:linkname syscall_ioctl syscall.ioctl
+//go:nosplit
+func syscall_ioctl(fd, req, arg uintptr) (err uintptr) {
+ _, err = syscall3(&libc_ioctl, fd, req, arg)
+ return
+}
+
+//go:linkname syscall_setgid syscall.setgid
+//go:nosplit
+func syscall_setgid(gid uintptr) (err uintptr) {
+ _, err = syscall1(&libc_setgid, gid)
+ return
+}
+
+//go:linkname syscall_setgroups1 syscall.setgroups1
+//go:nosplit
+func syscall_setgroups1(ngid, gid uintptr) (err uintptr) {
+ _, err = syscall2(&libc_setgroups, ngid, gid)
+ return
+}
+
+//go:linkname syscall_setsid syscall.setsid
+//go:nosplit
+func syscall_setsid() (pid, err uintptr) {
+ pid, err = syscall0(&libc_setsid)
+ return
+}
+
+//go:linkname syscall_setuid syscall.setuid
+//go:nosplit
+func syscall_setuid(uid uintptr) (err uintptr) {
+ _, err = syscall1(&libc_setuid, uid)
+ return
+}
+
+//go:linkname syscall_setpgid syscall.setpgid
+//go:nosplit
+func syscall_setpgid(pid, pgid uintptr) (err uintptr) {
+ _, err = syscall2(&libc_setpgid, pid, pgid)
+ return
+}
+
+//go:linkname syscall_write1 syscall.write1
+//go:nosplit
+func syscall_write1(fd, buf, nbyte uintptr) (n, err uintptr) {
+ n, err = syscall3(&libc_write, fd, buf, nbyte)
+ return
+}
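
The wrappers above all follow one shape: pack the libc function pointer, the argument count, and the address of a contiguous argument block into a libcall, then hand it to an assembly stub via asmcgocall, bracketed by entersyscallblock/exitsyscall when the call may block. A self-contained sketch of that shape with a fake stub in place of the assembly; the package-level argument block keeps the uintptr address stable in ordinary Go, whereas the real wrappers rely on //go:cgo_unsafe_args and point args at their own parameters:

    package main

    import (
        "fmt"
        "unsafe"
    )

    // libcall mirrors the runtime struct consumed by asmsyscall6: the stub
    // calls fn with the n words starting at args and stores the results.
    type libcall struct {
        fn, n, args uintptr
        r1, r2, err uintptr
    }

    // fakeStub stands in for the assembly helper so the example runs
    // without libc: it just sums the argument words into r1.
    func fakeStub(c *libcall) {
        var sum uintptr
        for i := uintptr(0); i < c.n; i++ {
            sum += *(*uintptr)(unsafe.Pointer(c.args + i*unsafe.Sizeof(sum)))
        }
        c.r1 = sum
    }

    // args is package-level so its address cannot move under the sketch.
    var args = [3]uintptr{10, 20, 12}

    func main() {
        c := libcall{fn: 0, n: 3, args: uintptr(unsafe.Pointer(&args[0]))}
        fakeStub(&c)
        fmt.Println(c.r1) // 42
    }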
diff --git a/src/runtime/syscall_solaris.go b/src/runtime/syscall_solaris.go
new file mode 100644
index 0000000..0945169
--- /dev/null
+++ b/src/runtime/syscall_solaris.go
@@ -0,0 +1,319 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+var (
+ libc_chdir,
+ libc_chroot,
+ libc_close,
+ libc_execve,
+ libc_fcntl,
+ libc_forkx,
+ libc_gethostname,
+ libc_getpid,
+ libc_ioctl,
+ libc_setgid,
+ libc_setgroups,
+ libc_setsid,
+ libc_setuid,
+ libc_setpgid,
+ libc_syscall,
+ libc_wait4 libcFunc
+)
+
+//go:linkname pipe1x runtime.pipe1
+var pipe1x libcFunc // name to take addr of pipe1
+
+func pipe1() // declared for vet; do NOT call
+
+// Many of these are exported via linkname to assembly in the syscall
+// package.
+
+//go:nosplit
+//go:linkname syscall_sysvicall6
+func syscall_sysvicall6(fn, nargs, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ call := libcall{
+ fn: fn,
+ n: nargs,
+ args: uintptr(unsafe.Pointer(&a1)),
+ }
+ entersyscallblock()
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ exitsyscall()
+ return call.r1, call.r2, call.err
+}
+
+//go:nosplit
+//go:linkname syscall_rawsysvicall6
+func syscall_rawsysvicall6(fn, nargs, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ call := libcall{
+ fn: fn,
+ n: nargs,
+ args: uintptr(unsafe.Pointer(&a1)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.r1, call.r2, call.err
+}
+
+// TODO(aram): Once we remove all instances of C calling sysvicallN, make
+// sysvicallN return errors and replace the body of the following functions
+// with calls to sysvicallN.
+
+//go:nosplit
+//go:linkname syscall_chdir
+func syscall_chdir(path uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_chdir)),
+ n: 1,
+ args: uintptr(unsafe.Pointer(&path)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+//go:nosplit
+//go:linkname syscall_chroot
+func syscall_chroot(path uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_chroot)),
+ n: 1,
+ args: uintptr(unsafe.Pointer(&path)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+// like close, but must not split stack, for forkx.
+//go:nosplit
+//go:linkname syscall_close
+func syscall_close(fd int32) int32 {
+ return int32(sysvicall1(&libc_close, uintptr(fd)))
+}
+
+const _F_DUP2FD = 0x9
+
+//go:nosplit
+//go:linkname syscall_dup2
+func syscall_dup2(oldfd, newfd uintptr) (val, err uintptr) {
+ return syscall_fcntl(oldfd, _F_DUP2FD, newfd)
+}
+
+//go:nosplit
+//go:linkname syscall_execve
+func syscall_execve(path, argv, envp uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_execve)),
+ n: 3,
+ args: uintptr(unsafe.Pointer(&path)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+// like exit, but must not split stack, for forkx.
+//go:nosplit
+//go:linkname syscall_exit
+func syscall_exit(code uintptr) {
+ sysvicall1(&libc_exit, code)
+}
+
+//go:nosplit
+//go:linkname syscall_fcntl
+func syscall_fcntl(fd, cmd, arg uintptr) (val, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_fcntl)),
+ n: 3,
+ args: uintptr(unsafe.Pointer(&fd)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.r1, call.err
+}
+
+//go:nosplit
+//go:linkname syscall_forkx
+func syscall_forkx(flags uintptr) (pid uintptr, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_forkx)),
+ n: 1,
+ args: uintptr(unsafe.Pointer(&flags)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ if int(call.r1) != -1 {
+ call.err = 0
+ }
+ return call.r1, call.err
+}
+
+//go:linkname syscall_gethostname
+func syscall_gethostname() (name string, err uintptr) {
+ cname := new([_MAXHOSTNAMELEN]byte)
+ var args = [2]uintptr{uintptr(unsafe.Pointer(&cname[0])), _MAXHOSTNAMELEN}
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_gethostname)),
+ n: 2,
+ args: uintptr(unsafe.Pointer(&args[0])),
+ }
+ entersyscallblock()
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ exitsyscall()
+ if call.r1 != 0 {
+ return "", call.err
+ }
+ cname[_MAXHOSTNAMELEN-1] = 0
+ return gostringnocopy(&cname[0]), 0
+}
+
+//go:nosplit
+//go:linkname syscall_getpid
+func syscall_getpid() (pid, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_getpid)),
+ n: 0,
+ args: uintptr(unsafe.Pointer(&libc_getpid)), // it's unused but must be non-nil, otherwise crashes
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.r1, call.err
+}
+
+//go:nosplit
+//go:linkname syscall_ioctl
+func syscall_ioctl(fd, req, arg uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_ioctl)),
+ n: 3,
+ args: uintptr(unsafe.Pointer(&fd)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+//go:linkname syscall_pipe
+func syscall_pipe() (r, w, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&pipe1x)),
+ n: 0,
+ args: uintptr(unsafe.Pointer(&pipe1x)), // it's unused but must be non-nil, otherwise crashes
+ }
+ entersyscallblock()
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ exitsyscall()
+ return call.r1, call.r2, call.err
+}
+
+// This is syscall.RawSyscall; it exists to satisfy a build dependency,
+// but it doesn't work.
+//
+//go:linkname syscall_rawsyscall
+func syscall_rawsyscall(trap, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ panic("RawSyscall not available on Solaris")
+}
+
+// This is syscall.RawSyscall6; it exists to avoid a linker error because
+// syscall.RawSyscall6 is already declared. See golang.org/issue/24357
+//
+//go:linkname syscall_rawsyscall6
+func syscall_rawsyscall6(trap, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ panic("RawSyscall6 not available on Solaris")
+}
+
+//go:nosplit
+//go:linkname syscall_setgid
+func syscall_setgid(gid uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_setgid)),
+ n: 1,
+ args: uintptr(unsafe.Pointer(&gid)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+//go:nosplit
+//go:linkname syscall_setgroups
+func syscall_setgroups(ngid, gid uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_setgroups)),
+ n: 2,
+ args: uintptr(unsafe.Pointer(&ngid)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+//go:nosplit
+//go:linkname syscall_setsid
+func syscall_setsid() (pid, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_setsid)),
+ n: 0,
+ args: uintptr(unsafe.Pointer(&libc_setsid)), // it's unused but must be non-nil, otherwise crashes
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.r1, call.err
+}
+
+//go:nosplit
+//go:linkname syscall_setuid
+func syscall_setuid(uid uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_setuid)),
+ n: 1,
+ args: uintptr(unsafe.Pointer(&uid)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+//go:nosplit
+//go:linkname syscall_setpgid
+func syscall_setpgid(pid, pgid uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_setpgid)),
+ n: 2,
+ args: uintptr(unsafe.Pointer(&pid)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+//go:linkname syscall_syscall
+func syscall_syscall(trap, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_syscall)),
+ n: 4,
+ args: uintptr(unsafe.Pointer(&trap)),
+ }
+ entersyscallblock()
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ exitsyscall()
+ return call.r1, call.r2, call.err
+}
+
+//go:linkname syscall_wait4
+func syscall_wait4(pid uintptr, wstatus *uint32, options uintptr, rusage unsafe.Pointer) (wpid int, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_wait4)),
+ n: 4,
+ args: uintptr(unsafe.Pointer(&pid)),
+ }
+ entersyscallblock()
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ exitsyscall()
+ return int(call.r1), call.err
+}
+
+//go:nosplit
+//go:linkname syscall_write
+func syscall_write(fd, buf, nbyte uintptr) (n, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_write)),
+ n: 3,
+ args: uintptr(unsafe.Pointer(&fd)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.r1, call.err
+}
diff --git a/src/runtime/syscall_windows.go b/src/runtime/syscall_windows.go
new file mode 100644
index 0000000..7835b49
--- /dev/null
+++ b/src/runtime/syscall_windows.go
@@ -0,0 +1,397 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// cbs stores all registered Go callbacks.
+var cbs struct {
+ lock mutex
+ ctxt [cb_max]winCallback
+ index map[winCallbackKey]int
+ n int
+}
+
+// winCallback records information about a registered Go callback.
+type winCallback struct {
+ fn *funcval // Go function
+ retPop uintptr // For 386 cdecl, how many bytes to pop on return
+
+ // abiMap specifies how to translate from a C frame to a Go
+ // frame. This does not specify how to translate back because
+ // the result is always a uintptr. If the C ABI is fastcall,
+ // this assumes the four fastcall registers were first spilled
+ // to the shadow space.
+ abiMap []abiPart
+ // retOffset is the offset of the uintptr-sized result in the Go
+ // frame.
+ retOffset uintptr
+}
+
+// abiPart encodes a step in translating between calling ABIs.
+type abiPart struct {
+ src, dst uintptr
+ len uintptr
+}
+
+func (a *abiPart) tryMerge(b abiPart) bool {
+ if a.src+a.len == b.src && a.dst+a.len == b.dst {
+ a.len += b.len
+ return true
+ }
+ return false
+}
+
+type winCallbackKey struct {
+ fn *funcval
+ cdecl bool
+}
+
+func callbackasm()
+
+// callbackasmAddr returns the address of the runtime.callbackasm
+// function adjusted by i.
+// On x86 and amd64, runtime.callbackasm is a series of CALL instructions,
+// and we want the callback to arrive at
+// the corresponding call instruction instead of the start of
+// runtime.callbackasm.
+// On ARM, runtime.callbackasm is a series of mov and branch instructions.
+// R12 is loaded with the callback index. Each entry is two instructions,
+// hence 8 bytes.
+func callbackasmAddr(i int) uintptr {
+ var entrySize int
+ switch GOARCH {
+ default:
+ panic("unsupported architecture")
+ case "386", "amd64":
+ entrySize = 5
+ case "arm":
+ // On ARM, each entry is a MOV instruction
+ // followed by a branch instruction
+ entrySize = 8
+ }
+ return funcPC(callbackasm) + uintptr(i*entrySize)
+}
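
callbackasmAddr and the SUBQ/DIVL sequence in callbackasm1 are inverses of each other: one hands out the address of the i'th CALL entry, the other recovers i from the return address that CALL pushes. A small illustration with a hypothetical base address:

    package main

    import "fmt"

    const entrySize = 5 // bytes per CALL entry in runtime·callbackasm on 386/amd64

    // addrOf mirrors callbackasmAddr: the i'th callback is entered at the
    // i'th CALL instruction.
    func addrOf(base uintptr, i int) uintptr { return base + uintptr(i*entrySize) }

    // indexOf mirrors callbackasm1: the return address points just past the
    // CALL that fired, so divide by the entry size and subtract one.
    func indexOf(base, retPC uintptr) int { return int((retPC-base)/entrySize) - 1 }

    func main() {
        const base uintptr = 0x401000 // hypothetical address of runtime·callbackasm
        for i := 0; i < 3; i++ {
            retPC := addrOf(base, i) + entrySize // what the CALL pushes
            fmt.Println(i, indexOf(base, retPC) == i)
        }
    }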
+
+const callbackMaxFrame = 64 * sys.PtrSize
+
+// compileCallback converts a Go function fn into a C function pointer
+// that can be passed to Windows APIs.
+//
+// On 386, if cdecl is true, the returned C function will use the
+// cdecl calling convention; otherwise, it will use stdcall. On amd64,
+// it always uses fastcall. On arm, it always uses the ARM convention.
+//
+//go:linkname compileCallback syscall.compileCallback
+func compileCallback(fn eface, cdecl bool) (code uintptr) {
+ if GOARCH != "386" {
+ // cdecl is only meaningful on 386.
+ cdecl = false
+ }
+
+ if fn._type == nil || (fn._type.kind&kindMask) != kindFunc {
+ panic("compileCallback: expected function with one uintptr-sized result")
+ }
+ ft := (*functype)(unsafe.Pointer(fn._type))
+
+ // Check arguments and construct ABI translation.
+ var abiMap []abiPart
+ var src, dst uintptr
+ for _, t := range ft.in() {
+ if t.size > sys.PtrSize {
+ // We don't support this right now. In
+ // stdcall/cdecl, 64-bit ints and doubles are
+ // passed as two words (little endian); and
+ // structs are pushed on the stack. In
+ // fastcall, arguments larger than the word
+ // size are passed by reference. On arm,
+ // 8-byte aligned arguments round up to the
+ // next even register and can be split across
+ // registers and the stack.
+ panic("compileCallback: argument size is larger than uintptr")
+ }
+ if k := t.kind & kindMask; (GOARCH == "amd64" || GOARCH == "arm") && (k == kindFloat32 || k == kindFloat64) {
+ // In fastcall, floating-point arguments in
+ // the first four positions are passed in
+ // floating-point registers, which we don't
+ // currently spill. arm passes floating-point
+ // arguments in VFP registers, which we also
+ // don't support.
+ panic("compileCallback: float arguments not supported")
+ }
+
+ // The Go ABI aligns arguments.
+ dst = alignUp(dst, uintptr(t.align))
+ // In the C ABI, we're already on a word boundary.
+ // Also, sub-word-sized fastcall register arguments
+ // are stored to the least-significant bytes of the
+ // argument word and all supported Windows
+ // architectures are little endian, so src is already
+ // pointing to the right place for smaller arguments.
+ // The same is true on arm.
+
+ // Copy just the size of the argument. Note that this
+ // could be a small by-value struct, but C and Go
+ // struct layouts are compatible, so we can copy these
+ // directly, too.
+ part := abiPart{src, dst, t.size}
+ // Add this step to the adapter.
+ if len(abiMap) == 0 || !abiMap[len(abiMap)-1].tryMerge(part) {
+ abiMap = append(abiMap, part)
+ }
+
+ // cdecl, stdcall, fastcall, and arm pad arguments to word size.
+ src += sys.PtrSize
+ // The Go ABI packs arguments.
+ dst += t.size
+ }
+ // The Go ABI aligns the result to the word size. src is
+ // already aligned.
+ dst = alignUp(dst, sys.PtrSize)
+ retOffset := dst
+
+ if len(ft.out()) != 1 {
+ panic("compileCallback: expected function with one uintptr-sized result")
+ }
+ if ft.out()[0].size != sys.PtrSize {
+ panic("compileCallback: expected function with one uintptr-sized result")
+ }
+ if k := ft.out()[0].kind & kindMask; k == kindFloat32 || k == kindFloat64 {
+ // In cdecl and stdcall, float results are returned in
+ // ST(0). In fastcall, they're returned in XMM0.
+ // Either way, it's not AX.
+ panic("compileCallback: float results not supported")
+ }
+ // Make room for the uintptr-sized result.
+ dst += sys.PtrSize
+
+ if dst > callbackMaxFrame {
+ panic("compileCallback: function argument frame too large")
+ }
+
+ // For cdecl, the callee is responsible for popping its
+ // arguments from the C stack.
+ var retPop uintptr
+ if cdecl {
+ retPop = src
+ }
+
+ key := winCallbackKey{(*funcval)(fn.data), cdecl}
+
+ lock(&cbs.lock) // We don't unlock this in a defer because this is used from the system stack.
+
+ // Check if this callback is already registered.
+ if n, ok := cbs.index[key]; ok {
+ unlock(&cbs.lock)
+ return callbackasmAddr(n)
+ }
+
+ // Register the callback.
+ if cbs.index == nil {
+ cbs.index = make(map[winCallbackKey]int)
+ }
+ n := cbs.n
+ if n >= len(cbs.ctxt) {
+ unlock(&cbs.lock)
+ throw("too many callback functions")
+ }
+ c := winCallback{key.fn, retPop, abiMap, retOffset}
+ cbs.ctxt[n] = c
+ cbs.index[key] = n
+ cbs.n++
+
+ unlock(&cbs.lock)
+ return callbackasmAddr(n)
+}
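+// For illustration: on amd64, compiling a callback for
+//
+//	func(a uint8, b, c uintptr) uintptr
+//
+// produces abiMap = [{src: 0, dst: 0, len: 1}, {src: 8, dst: 8, len: 16}]
+// (the sub-word argument blocks the first merge, while b and c coalesce),
+// with retOffset = 24, so callbackWrap below copies those two ranges into
+// its frame and reads the uintptr result back from frame[24:32].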
+
+type callbackArgs struct {
+ index uintptr
+ // args points to the argument block.
+ //
+ // For cdecl and stdcall, all arguments are on the stack.
+ //
+ // For fastcall, the trampoline spills register arguments to
+ // the reserved spill slots below the stack arguments,
+ // resulting in a layout equivalent to stdcall.
+ //
+ // For arm, the trampoline stores the register arguments just
+ // below the stack arguments, so again we can treat it as one
+ // big stack arguments frame.
+ args unsafe.Pointer
+ // Below are out-args from callbackWrap
+ result uintptr
+ retPop uintptr // For 386 cdecl, how many bytes to pop on return
+}
+
+// callbackWrap is called by callbackasm to invoke a registered C callback.
+func callbackWrap(a *callbackArgs) {
+ c := cbs.ctxt[a.index]
+ a.retPop = c.retPop
+
+ // Convert from C to Go ABI.
+ var frame [callbackMaxFrame]byte
+ goArgs := unsafe.Pointer(&frame)
+ for _, part := range c.abiMap {
+ memmove(add(goArgs, part.dst), add(a.args, part.src), part.len)
+ }
+
+ // Even though this is copying back results, we can pass a nil
+ // type because those results must not require write barriers.
+ reflectcall(nil, unsafe.Pointer(c.fn), noescape(goArgs), uint32(c.retOffset)+sys.PtrSize, uint32(c.retOffset))
+
+ // Extract the result.
+ a.result = *(*uintptr)(unsafe.Pointer(&frame[c.retOffset]))
+}
+
+const _LOAD_LIBRARY_SEARCH_SYSTEM32 = 0x00000800
+
+// When LoadLibraryEx is available, this function uses it with the filename
+// parameter and the LOAD_LIBRARY_SEARCH_SYSTEM32 flag. On systems that
+// do not have that option, absoluteFilepath must contain a fallback:
+// the full path to the DLL inside system32, for use with plain LoadLibrary.
+//go:linkname syscall_loadsystemlibrary syscall.loadsystemlibrary
+//go:nosplit
+func syscall_loadsystemlibrary(filename *uint16, absoluteFilepath *uint16) (handle, err uintptr) {
+ lockOSThread()
+ c := &getg().m.syscall
+
+ if useLoadLibraryEx {
+ c.fn = getLoadLibraryEx()
+ c.n = 3
+ args := struct {
+ lpFileName *uint16
+ hFile uintptr // always 0
+ flags uint32
+ }{filename, 0, _LOAD_LIBRARY_SEARCH_SYSTEM32}
+ c.args = uintptr(noescape(unsafe.Pointer(&args)))
+ } else {
+ c.fn = getLoadLibrary()
+ c.n = 1
+ c.args = uintptr(noescape(unsafe.Pointer(&absoluteFilepath)))
+ }
+
+ cgocall(asmstdcallAddr, unsafe.Pointer(c))
+ handle = c.r1
+ if handle == 0 {
+ err = c.err
+ }
+ unlockOSThread() // not defer'd after the lockOSThread above to save stack frame size.
+ return
+}
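+// Illustrative caller (hypothetical, not part of this change), showing the
+// shape of the two arguments; a real caller should derive the absolute path
+// at run time rather than hard-coding it, since the doc comment above
+// requires the full system32 path:
+//
+//	name, _ := syscall.UTF16PtrFromString("ws2_32.dll")
+//	abs, _ := syscall.UTF16PtrFromString(`C:\Windows\System32\ws2_32.dll`)
+//	handle, errno := loadsystemlibrary(name, abs) // the linknamed name above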
+
+//go:linkname syscall_loadlibrary syscall.loadlibrary
+//go:nosplit
+func syscall_loadlibrary(filename *uint16) (handle, err uintptr) {
+ lockOSThread()
+ defer unlockOSThread()
+ c := &getg().m.syscall
+ c.fn = getLoadLibrary()
+ c.n = 1
+ c.args = uintptr(noescape(unsafe.Pointer(&filename)))
+ cgocall(asmstdcallAddr, unsafe.Pointer(c))
+ handle = c.r1
+ if handle == 0 {
+ err = c.err
+ }
+ return
+}
+
+//go:linkname syscall_getprocaddress syscall.getprocaddress
+//go:nosplit
+func syscall_getprocaddress(handle uintptr, procname *byte) (outhandle, err uintptr) {
+ lockOSThread()
+ defer unlockOSThread()
+ c := &getg().m.syscall
+ c.fn = getGetProcAddress()
+ c.n = 2
+ c.args = uintptr(noescape(unsafe.Pointer(&handle)))
+ cgocall(asmstdcallAddr, unsafe.Pointer(c))
+ outhandle = c.r1
+ if outhandle == 0 {
+ err = c.err
+ }
+ return
+}
+
+//go:linkname syscall_Syscall syscall.Syscall
+//go:nosplit
+func syscall_Syscall(fn, nargs, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ lockOSThread()
+ defer unlockOSThread()
+ c := &getg().m.syscall
+ c.fn = fn
+ c.n = nargs
+ c.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ cgocall(asmstdcallAddr, unsafe.Pointer(c))
+ return c.r1, c.r2, c.err
+}
+
+//go:linkname syscall_Syscall6 syscall.Syscall6
+//go:nosplit
+func syscall_Syscall6(fn, nargs, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ lockOSThread()
+ defer unlockOSThread()
+ c := &getg().m.syscall
+ c.fn = fn
+ c.n = nargs
+ c.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ cgocall(asmstdcallAddr, unsafe.Pointer(c))
+ return c.r1, c.r2, c.err
+}
+
+//go:linkname syscall_Syscall9 syscall.Syscall9
+//go:nosplit
+func syscall_Syscall9(fn, nargs, a1, a2, a3, a4, a5, a6, a7, a8, a9 uintptr) (r1, r2, err uintptr) {
+ lockOSThread()
+ defer unlockOSThread()
+ c := &getg().m.syscall
+ c.fn = fn
+ c.n = nargs
+ c.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ cgocall(asmstdcallAddr, unsafe.Pointer(c))
+ return c.r1, c.r2, c.err
+}
+
+//go:linkname syscall_Syscall12 syscall.Syscall12
+//go:nosplit
+func syscall_Syscall12(fn, nargs, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12 uintptr) (r1, r2, err uintptr) {
+ lockOSThread()
+ defer unlockOSThread()
+ c := &getg().m.syscall
+ c.fn = fn
+ c.n = nargs
+ c.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ cgocall(asmstdcallAddr, unsafe.Pointer(c))
+ return c.r1, c.r2, c.err
+}
+
+//go:linkname syscall_Syscall15 syscall.Syscall15
+//go:nosplit
+func syscall_Syscall15(fn, nargs, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15 uintptr) (r1, r2, err uintptr) {
+ lockOSThread()
+ defer unlockOSThread()
+ c := &getg().m.syscall
+ c.fn = fn
+ c.n = nargs
+ c.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ cgocall(asmstdcallAddr, unsafe.Pointer(c))
+ return c.r1, c.r2, c.err
+}
+
+//go:linkname syscall_Syscall18 syscall.Syscall18
+//go:nosplit
+func syscall_Syscall18(fn, nargs, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18 uintptr) (r1, r2, err uintptr) {
+ lockOSThread()
+ defer unlockOSThread()
+ c := &getg().m.syscall
+ c.fn = fn
+ c.n = nargs
+ c.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ cgocall(asmstdcallAddr, unsafe.Pointer(c))
+ return c.r1, c.r2, c.err
+}
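+// Editorial usage sketch (hypothetical, not part of this change): from user
+// code the syscall.Syscall* entry points above are typically reached through
+// (*syscall.Proc).Call, which boils down to something like:
+//
+//	proc := syscall.MustLoadDLL("kernel32.dll").MustFindProc("GetTickCount")
+//	r1, _, _ := syscall.Syscall(proc.Addr(), 0, 0, 0, 0)
+//	ticks := uint32(r1) // milliseconds since system start
+//
+// The createEvent and setEvent helpers in the test file below exercise the
+// same path directly.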
diff --git a/src/runtime/syscall_windows_test.go b/src/runtime/syscall_windows_test.go
new file mode 100644
index 0000000..fb215b3
--- /dev/null
+++ b/src/runtime/syscall_windows_test.go
@@ -0,0 +1,1223 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "bytes"
+ "fmt"
+ "internal/syscall/windows/sysdll"
+ "internal/testenv"
+ "io"
+ "math"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "reflect"
+ "runtime"
+ "strconv"
+ "strings"
+ "syscall"
+ "testing"
+ "unsafe"
+)
+
+type DLL struct {
+ *syscall.DLL
+ t *testing.T
+}
+
+func GetDLL(t *testing.T, name string) *DLL {
+ d, e := syscall.LoadDLL(name)
+ if e != nil {
+ t.Fatal(e)
+ }
+ return &DLL{DLL: d, t: t}
+}
+
+func (d *DLL) Proc(name string) *syscall.Proc {
+ p, e := d.FindProc(name)
+ if e != nil {
+ d.t.Fatal(e)
+ }
+ return p
+}
+
+func TestStdCall(t *testing.T) {
+ type Rect struct {
+ left, top, right, bottom int32
+ }
+ res := Rect{}
+ expected := Rect{1, 1, 40, 60}
+ a, _, _ := GetDLL(t, "user32.dll").Proc("UnionRect").Call(
+ uintptr(unsafe.Pointer(&res)),
+ uintptr(unsafe.Pointer(&Rect{10, 1, 14, 60})),
+ uintptr(unsafe.Pointer(&Rect{1, 2, 40, 50})))
+ if a != 1 || res.left != expected.left ||
+ res.top != expected.top ||
+ res.right != expected.right ||
+ res.bottom != expected.bottom {
+ t.Error("stdcall USER32.UnionRect returns", a, "res=", res)
+ }
+}
+
+func Test64BitReturnStdCall(t *testing.T) {
+
+ const (
+ VER_BUILDNUMBER = 0x0000004
+ VER_MAJORVERSION = 0x0000002
+ VER_MINORVERSION = 0x0000001
+ VER_PLATFORMID = 0x0000008
+ VER_PRODUCT_TYPE = 0x0000080
+ VER_SERVICEPACKMAJOR = 0x0000020
+ VER_SERVICEPACKMINOR = 0x0000010
+ VER_SUITENAME = 0x0000040
+
+ VER_EQUAL = 1
+ VER_GREATER = 2
+ VER_GREATER_EQUAL = 3
+ VER_LESS = 4
+ VER_LESS_EQUAL = 5
+
+ ERROR_OLD_WIN_VERSION syscall.Errno = 1150
+ )
+
+ type OSVersionInfoEx struct {
+ OSVersionInfoSize uint32
+ MajorVersion uint32
+ MinorVersion uint32
+ BuildNumber uint32
+ PlatformId uint32
+ CSDVersion [128]uint16
+ ServicePackMajor uint16
+ ServicePackMinor uint16
+ SuiteMask uint16
+ ProductType byte
+ Reserve byte
+ }
+
+ d := GetDLL(t, "kernel32.dll")
+
+ var m1, m2 uintptr
+ VerSetConditionMask := d.Proc("VerSetConditionMask")
+ m1, m2, _ = VerSetConditionMask.Call(m1, m2, VER_MAJORVERSION, VER_GREATER_EQUAL)
+ m1, m2, _ = VerSetConditionMask.Call(m1, m2, VER_MINORVERSION, VER_GREATER_EQUAL)
+ m1, m2, _ = VerSetConditionMask.Call(m1, m2, VER_SERVICEPACKMAJOR, VER_GREATER_EQUAL)
+ m1, m2, _ = VerSetConditionMask.Call(m1, m2, VER_SERVICEPACKMINOR, VER_GREATER_EQUAL)
+
+ vi := OSVersionInfoEx{
+ MajorVersion: 5,
+ MinorVersion: 1,
+ ServicePackMajor: 2,
+ ServicePackMinor: 0,
+ }
+ vi.OSVersionInfoSize = uint32(unsafe.Sizeof(vi))
+ r, _, e2 := d.Proc("VerifyVersionInfoW").Call(
+ uintptr(unsafe.Pointer(&vi)),
+ VER_MAJORVERSION|VER_MINORVERSION|VER_SERVICEPACKMAJOR|VER_SERVICEPACKMINOR,
+ m1, m2)
+ if r == 0 && e2 != ERROR_OLD_WIN_VERSION {
+ t.Errorf("VerifyVersionInfo failed: %s", e2)
+ }
+}
+
+func TestCDecl(t *testing.T) {
+ var buf [50]byte
+ fmtp, _ := syscall.BytePtrFromString("%d %d %d")
+ a, _, _ := GetDLL(t, "user32.dll").Proc("wsprintfA").Call(
+ uintptr(unsafe.Pointer(&buf[0])),
+ uintptr(unsafe.Pointer(fmtp)),
+ 1000, 2000, 3000)
+ if string(buf[:a]) != "1000 2000 3000" {
+ t.Error("cdecl USER32.wsprintfA returns", a, "buf=", buf[:a])
+ }
+}
+
+func TestEnumWindows(t *testing.T) {
+ d := GetDLL(t, "user32.dll")
+ isWindows := d.Proc("IsWindow")
+ counter := 0
+ cb := syscall.NewCallback(func(hwnd syscall.Handle, lparam uintptr) uintptr {
+ if lparam != 888 {
+ t.Error("lparam was not passed to callback")
+ }
+ b, _, _ := isWindows.Call(uintptr(hwnd))
+ if b == 0 {
+ t.Error("USER32.IsWindow returns FALSE")
+ }
+ counter++
+ return 1 // continue enumeration
+ })
+ a, _, _ := d.Proc("EnumWindows").Call(cb, 888)
+ if a == 0 {
+ t.Error("USER32.EnumWindows returns FALSE")
+ }
+ if counter == 0 {
+ t.Error("Callback has been never called or your have no windows")
+ }
+}
+
+func callback(timeFormatString unsafe.Pointer, lparam uintptr) uintptr {
+ (*(*func())(unsafe.Pointer(&lparam)))()
+ return 0 // stop enumeration
+}
+
+// nestedCall calls into Windows, back into Go, and finally to f.
+func nestedCall(t *testing.T, f func()) {
+ c := syscall.NewCallback(callback)
+ d := GetDLL(t, "kernel32.dll")
+ defer d.Release()
+ const LOCALE_NAME_USER_DEFAULT = 0
+ d.Proc("EnumTimeFormatsEx").Call(c, LOCALE_NAME_USER_DEFAULT, 0, uintptr(*(*unsafe.Pointer)(unsafe.Pointer(&f))))
+}
+
+func TestCallback(t *testing.T) {
+ var x = false
+ nestedCall(t, func() { x = true })
+ if !x {
+ t.Fatal("nestedCall did not call func")
+ }
+}
+
+func TestCallbackGC(t *testing.T) {
+ nestedCall(t, runtime.GC)
+}
+
+func TestCallbackPanicLocked(t *testing.T) {
+ runtime.LockOSThread()
+ defer runtime.UnlockOSThread()
+
+ if !runtime.LockedOSThread() {
+ t.Fatal("runtime.LockOSThread didn't")
+ }
+ defer func() {
+ s := recover()
+ if s == nil {
+ t.Fatal("did not panic")
+ }
+ if s.(string) != "callback panic" {
+ t.Fatal("wrong panic:", s)
+ }
+ if !runtime.LockedOSThread() {
+ t.Fatal("lost lock on OS thread after panic")
+ }
+ }()
+ nestedCall(t, func() { panic("callback panic") })
+ panic("nestedCall returned")
+}
+
+func TestCallbackPanic(t *testing.T) {
+ // Make sure panic during callback unwinds properly.
+ if runtime.LockedOSThread() {
+ t.Fatal("locked OS thread on entry to TestCallbackPanic")
+ }
+ defer func() {
+ s := recover()
+ if s == nil {
+ t.Fatal("did not panic")
+ }
+ if s.(string) != "callback panic" {
+ t.Fatal("wrong panic:", s)
+ }
+ if runtime.LockedOSThread() {
+ t.Fatal("locked OS thread on exit from TestCallbackPanic")
+ }
+ }()
+ nestedCall(t, func() { panic("callback panic") })
+ panic("nestedCall returned")
+}
+
+func TestCallbackPanicLoop(t *testing.T) {
+ // Make sure we don't blow out m->g0 stack.
+ for i := 0; i < 100000; i++ {
+ TestCallbackPanic(t)
+ }
+}
+
+func TestBlockingCallback(t *testing.T) {
+ c := make(chan int)
+ go func() {
+ for i := 0; i < 10; i++ {
+ c <- <-c
+ }
+ }()
+ nestedCall(t, func() {
+ for i := 0; i < 10; i++ {
+ c <- i
+ if j := <-c; j != i {
+ t.Errorf("out of sync %d != %d", j, i)
+ }
+ }
+ })
+}
+
+func TestCallbackInAnotherThread(t *testing.T) {
+ d := GetDLL(t, "kernel32.dll")
+
+ f := func(p uintptr) uintptr {
+ return p
+ }
+ r, _, err := d.Proc("CreateThread").Call(0, 0, syscall.NewCallback(f), 123, 0, 0)
+ if r == 0 {
+ t.Fatalf("CreateThread failed: %v", err)
+ }
+ h := syscall.Handle(r)
+ defer syscall.CloseHandle(h)
+
+ switch s, err := syscall.WaitForSingleObject(h, 100); s {
+ case syscall.WAIT_OBJECT_0:
+ break
+ case syscall.WAIT_TIMEOUT:
+ t.Fatal("timeout waiting for thread to exit")
+ case syscall.WAIT_FAILED:
+ t.Fatalf("WaitForSingleObject failed: %v", err)
+ default:
+ t.Fatalf("WaitForSingleObject returns unexpected value %v", s)
+ }
+
+ var ec uint32
+ r, _, err = d.Proc("GetExitCodeThread").Call(uintptr(h), uintptr(unsafe.Pointer(&ec)))
+ if r == 0 {
+ t.Fatalf("GetExitCodeThread failed: %v", err)
+ }
+ if ec != 123 {
+ t.Fatalf("expected 123, but got %d", ec)
+ }
+}
+
+type cbFunc struct {
+ goFunc interface{}
+}
+
+func (f cbFunc) cName(cdecl bool) string {
+ name := "stdcall"
+ if cdecl {
+ name = "cdecl"
+ }
+ t := reflect.TypeOf(f.goFunc)
+ for i := 0; i < t.NumIn(); i++ {
+ name += "_" + t.In(i).Name()
+ }
+ return name
+}
+
+func (f cbFunc) cSrc(w io.Writer, cdecl bool) {
+ // Construct a C function that takes a callback with
+ // f.goFunc's signature, and calls it with integers 1..N.
+ funcname := f.cName(cdecl)
+ attr := "__stdcall"
+ if cdecl {
+ attr = "__cdecl"
+ }
+ typename := "t" + funcname
+ t := reflect.TypeOf(f.goFunc)
+ cTypes := make([]string, t.NumIn())
+ cArgs := make([]string, t.NumIn())
+ for i := range cTypes {
+ // We included stdint.h, so this works for all sized
+ // integer types, and uint8Pair_t.
+ cTypes[i] = t.In(i).Name() + "_t"
+ if t.In(i).Name() == "uint8Pair" {
+ cArgs[i] = fmt.Sprintf("(uint8Pair_t){%d,1}", i)
+ } else {
+ cArgs[i] = fmt.Sprintf("%d", i+1)
+ }
+ }
+ fmt.Fprintf(w, `
+typedef uintptr_t %s (*%s)(%s);
+uintptr_t %s(%s f) {
+ return f(%s);
+}
+ `, attr, typename, strings.Join(cTypes, ","), funcname, typename, strings.Join(cArgs, ","))
+}
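+// For example, for the stdcall variant of the two-uintptr callback in
+// cbFuncs below, cSrc emits C along these lines:
+//
+//	typedef uintptr_t __stdcall (*tstdcall_uintptr_uintptr)(uintptr_t,uintptr_t);
+//	uintptr_t stdcall_uintptr_uintptr(tstdcall_uintptr_uintptr f) {
+//		return f(1,2);
+//	}
+//
+// so testOne expects the callback to be called with 1 and 2 and to return 3.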
+
+func (f cbFunc) testOne(t *testing.T, dll *syscall.DLL, cdecl bool, cb uintptr) {
+ r1, _, _ := dll.MustFindProc(f.cName(cdecl)).Call(cb)
+
+ want := 0
+ for i := 0; i < reflect.TypeOf(f.goFunc).NumIn(); i++ {
+ want += i + 1
+ }
+ if int(r1) != want {
+ t.Errorf("wanted result %d; got %d", want, r1)
+ }
+}
+
+type uint8Pair struct{ x, y uint8 }
+
+var cbFuncs = []cbFunc{
+ {func(i1, i2 uintptr) uintptr {
+ return i1 + i2
+ }},
+ {func(i1, i2, i3 uintptr) uintptr {
+ return i1 + i2 + i3
+ }},
+ {func(i1, i2, i3, i4 uintptr) uintptr {
+ return i1 + i2 + i3 + i4
+ }},
+ {func(i1, i2, i3, i4, i5 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5
+ }},
+ {func(i1, i2, i3, i4, i5, i6 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5 + i6
+ }},
+ {func(i1, i2, i3, i4, i5, i6, i7 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5 + i6 + i7
+ }},
+ {func(i1, i2, i3, i4, i5, i6, i7, i8 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8
+ }},
+ {func(i1, i2, i3, i4, i5, i6, i7, i8, i9 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9
+ }},
+
+ // Non-uintptr parameters.
+ {func(i1, i2, i3, i4, i5, i6, i7, i8, i9 uint8) uintptr {
+ return uintptr(i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9)
+ }},
+ {func(i1, i2, i3, i4, i5, i6, i7, i8, i9 uint16) uintptr {
+ return uintptr(i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9)
+ }},
+ {func(i1, i2, i3, i4, i5, i6, i7, i8, i9 int8) uintptr {
+ return uintptr(i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9)
+ }},
+ {func(i1 int8, i2 int16, i3 int32, i4, i5 uintptr) uintptr {
+ return uintptr(i1) + uintptr(i2) + uintptr(i3) + i4 + i5
+ }},
+ {func(i1, i2, i3, i4, i5 uint8Pair) uintptr {
+ return uintptr(i1.x + i1.y + i2.x + i2.y + i3.x + i3.y + i4.x + i4.y + i5.x + i5.y)
+ }},
+}
+
+type cbDLL struct {
+ name string
+ buildArgs func(out, src string) []string
+}
+
+func (d *cbDLL) makeSrc(t *testing.T, path string) {
+ f, err := os.Create(path)
+ if err != nil {
+ t.Fatalf("failed to create source file: %v", err)
+ }
+ defer f.Close()
+
+ fmt.Fprint(f, `
+#include <stdint.h>
+typedef struct { uint8_t x, y; } uint8Pair_t;
+`)
+ for _, cbf := range cbFuncs {
+ cbf.cSrc(f, false)
+ cbf.cSrc(f, true)
+ }
+}
+
+func (d *cbDLL) build(t *testing.T, dir string) string {
+ srcname := d.name + ".c"
+ d.makeSrc(t, filepath.Join(dir, srcname))
+ outname := d.name + ".dll"
+ args := d.buildArgs(outname, srcname)
+ cmd := exec.Command(args[0], args[1:]...)
+ cmd.Dir = dir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build dll: %v - %v", err, string(out))
+ }
+ return filepath.Join(dir, outname)
+}
+
+var cbDLLs = []cbDLL{
+ {
+ "test",
+ func(out, src string) []string {
+ return []string{"gcc", "-shared", "-s", "-Werror", "-o", out, src}
+ },
+ },
+ {
+ "testO2",
+ func(out, src string) []string {
+ return []string{"gcc", "-shared", "-s", "-Werror", "-o", out, "-O2", src}
+ },
+ },
+}
+
+func TestStdcallAndCDeclCallbacks(t *testing.T) {
+ if _, err := exec.LookPath("gcc"); err != nil {
+ t.Skip("skipping test: gcc is missing")
+ }
+ tmp, err := os.MkdirTemp("", "TestCDeclCallback")
+ if err != nil {
+ t.Fatal("TempDir failed: ", err)
+ }
+ defer os.RemoveAll(tmp)
+
+ for _, dll := range cbDLLs {
+ t.Run(dll.name, func(t *testing.T) {
+ dllPath := dll.build(t, tmp)
+ dll := syscall.MustLoadDLL(dllPath)
+ defer dll.Release()
+ for _, cbf := range cbFuncs {
+ t.Run(cbf.cName(false), func(t *testing.T) {
+ stdcall := syscall.NewCallback(cbf.goFunc)
+ cbf.testOne(t, dll, false, stdcall)
+ })
+ t.Run(cbf.cName(true), func(t *testing.T) {
+ cdecl := syscall.NewCallbackCDecl(cbf.goFunc)
+ cbf.testOne(t, dll, true, cdecl)
+ })
+ }
+ })
+ }
+}
+
+func TestRegisterClass(t *testing.T) {
+ kernel32 := GetDLL(t, "kernel32.dll")
+ user32 := GetDLL(t, "user32.dll")
+ mh, _, _ := kernel32.Proc("GetModuleHandleW").Call(0)
+ cb := syscall.NewCallback(func(hwnd syscall.Handle, msg uint32, wparam, lparam uintptr) (rc uintptr) {
+ t.Fatal("callback should never get called")
+ return 0
+ })
+ type Wndclassex struct {
+ Size uint32
+ Style uint32
+ WndProc uintptr
+ ClsExtra int32
+ WndExtra int32
+ Instance syscall.Handle
+ Icon syscall.Handle
+ Cursor syscall.Handle
+ Background syscall.Handle
+ MenuName *uint16
+ ClassName *uint16
+ IconSm syscall.Handle
+ }
+ name := syscall.StringToUTF16Ptr("test_window")
+ wc := Wndclassex{
+ WndProc: cb,
+ Instance: syscall.Handle(mh),
+ ClassName: name,
+ }
+ wc.Size = uint32(unsafe.Sizeof(wc))
+ a, _, err := user32.Proc("RegisterClassExW").Call(uintptr(unsafe.Pointer(&wc)))
+ if a == 0 {
+ t.Fatalf("RegisterClassEx failed: %v", err)
+ }
+ r, _, err := user32.Proc("UnregisterClassW").Call(uintptr(unsafe.Pointer(name)), 0)
+ if r == 0 {
+ t.Fatalf("UnregisterClass failed: %v", err)
+ }
+}
+
+func TestOutputDebugString(t *testing.T) {
+ d := GetDLL(t, "kernel32.dll")
+ p := syscall.StringToUTF16Ptr("testing OutputDebugString")
+ d.Proc("OutputDebugStringW").Call(uintptr(unsafe.Pointer(p)))
+}
+
+func TestRaiseException(t *testing.T) {
+ o := runTestProg(t, "testprog", "RaiseException")
+ if strings.Contains(o, "RaiseException should not return") {
+ t.Fatalf("RaiseException did not crash program: %v", o)
+ }
+ if !strings.Contains(o, "Exception 0xbad") {
+ t.Fatalf("No stack trace: %v", o)
+ }
+}
+
+func TestZeroDivisionException(t *testing.T) {
+ o := runTestProg(t, "testprog", "ZeroDivisionException")
+ if !strings.Contains(o, "panic: runtime error: integer divide by zero") {
+ t.Fatalf("No stack trace: %v", o)
+ }
+}
+
+func TestWERDialogue(t *testing.T) {
+ if os.Getenv("TESTING_WER_DIALOGUE") == "1" {
+ defer os.Exit(0)
+
+ *runtime.TestingWER = true
+ const EXCEPTION_NONCONTINUABLE = 1
+ mod := syscall.MustLoadDLL("kernel32.dll")
+ proc := mod.MustFindProc("RaiseException")
+ proc.Call(0xbad, EXCEPTION_NONCONTINUABLE, 0, 0)
+ println("RaiseException should not return")
+ return
+ }
+ cmd := exec.Command(os.Args[0], "-test.run=TestWERDialogue")
+ cmd.Env = []string{"TESTING_WER_DIALOGUE=1"}
+ // The child process should not open a WER dialogue, but should return immediately instead.
+ cmd.CombinedOutput()
+}
+
+func TestWindowsStackMemory(t *testing.T) {
+ o := runTestProg(t, "testprog", "StackMemory")
+ stackUsage, err := strconv.Atoi(o)
+ if err != nil {
+ t.Fatalf("Failed to read stack usage: %v", err)
+ }
+ if expected, got := 100<<10, stackUsage; got > expected {
+ t.Fatalf("expected < %d bytes of memory per thread, got %d", expected, got)
+ }
+}
+
+var used byte
+
+func use(buf []byte) {
+ for _, c := range buf {
+ used += c
+ }
+}
+
+func forceStackCopy() (r int) {
+ var f func(int) int
+ f = func(i int) int {
+ var buf [256]byte
+ use(buf[:])
+ if i == 0 {
+ return 0
+ }
+ return i + f(i-1)
+ }
+ r = f(128)
+ return
+}
+
+func TestReturnAfterStackGrowInCallback(t *testing.T) {
+ if _, err := exec.LookPath("gcc"); err != nil {
+ t.Skip("skipping test: gcc is missing")
+ }
+
+ const src = `
+#include <stdint.h>
+#include <windows.h>
+
+typedef uintptr_t __stdcall (*callback)(uintptr_t);
+
+uintptr_t cfunc(callback f, uintptr_t n) {
+ uintptr_t r;
+ r = f(n);
+ SetLastError(333);
+ return r;
+}
+`
+ tmpdir, err := os.MkdirTemp("", "TestReturnAfterStackGrowInCallback")
+ if err != nil {
+ t.Fatal("TempDir failed: ", err)
+ }
+ defer os.RemoveAll(tmpdir)
+
+ srcname := "mydll.c"
+ err = os.WriteFile(filepath.Join(tmpdir, srcname), []byte(src), 0)
+ if err != nil {
+ t.Fatal(err)
+ }
+ outname := "mydll.dll"
+ cmd := exec.Command("gcc", "-shared", "-s", "-Werror", "-o", outname, srcname)
+ cmd.Dir = tmpdir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build dll: %v - %v", err, string(out))
+ }
+ dllpath := filepath.Join(tmpdir, outname)
+
+ dll := syscall.MustLoadDLL(dllpath)
+ defer dll.Release()
+
+ proc := dll.MustFindProc("cfunc")
+
+ cb := syscall.NewCallback(func(n uintptr) uintptr {
+ forceStackCopy()
+ return n
+ })
+
+ // Use a new goroutine so that we get a small stack.
+ type result struct {
+ r uintptr
+ err syscall.Errno
+ }
+ want := result{
+ // Make it large enough to test issue #29331.
+ r: (^uintptr(0)) >> 24,
+ err: 333,
+ }
+ c := make(chan result)
+ go func() {
+ r, _, err := proc.Call(cb, want.r)
+ c <- result{r, err.(syscall.Errno)}
+ }()
+ if got := <-c; got != want {
+ t.Errorf("got %d want %d", got, want)
+ }
+}
+
+func TestFloatArgs(t *testing.T) {
+ if _, err := exec.LookPath("gcc"); err != nil {
+ t.Skip("skipping test: gcc is missing")
+ }
+ if runtime.GOARCH != "amd64" {
+ t.Skipf("skipping test: GOARCH=%s", runtime.GOARCH)
+ }
+
+ const src = `
+#include <stdint.h>
+#include <windows.h>
+
+uintptr_t cfunc(uintptr_t a, double b, float c, double d) {
+ if (a == 1 && b == 2.2 && c == 3.3f && d == 4.4e44) {
+ return 1;
+ }
+ return 0;
+}
+`
+ tmpdir, err := os.MkdirTemp("", "TestFloatArgs")
+ if err != nil {
+ t.Fatal("TempDir failed: ", err)
+ }
+ defer os.RemoveAll(tmpdir)
+
+ srcname := "mydll.c"
+ err = os.WriteFile(filepath.Join(tmpdir, srcname), []byte(src), 0)
+ if err != nil {
+ t.Fatal(err)
+ }
+ outname := "mydll.dll"
+ cmd := exec.Command("gcc", "-shared", "-s", "-Werror", "-o", outname, srcname)
+ cmd.Dir = tmpdir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build dll: %v - %v", err, string(out))
+ }
+ dllpath := filepath.Join(tmpdir, outname)
+
+ dll := syscall.MustLoadDLL(dllpath)
+ defer dll.Release()
+
+ proc := dll.MustFindProc("cfunc")
+
+ r, _, err := proc.Call(
+ 1,
+ uintptr(math.Float64bits(2.2)),
+ uintptr(math.Float32bits(3.3)),
+ uintptr(math.Float64bits(4.4e44)),
+ )
+ if r != 1 {
+ t.Errorf("got %d want 1 (err=%v)", r, err)
+ }
+}
+
+func TestFloatReturn(t *testing.T) {
+ if _, err := exec.LookPath("gcc"); err != nil {
+ t.Skip("skipping test: gcc is missing")
+ }
+ if runtime.GOARCH != "amd64" {
+ t.Skipf("skipping test: GOARCH=%s", runtime.GOARCH)
+ }
+
+ const src = `
+#include <stdint.h>
+#include <windows.h>
+
+float cfuncFloat(uintptr_t a, double b, float c, double d) {
+ if (a == 1 && b == 2.2 && c == 3.3f && d == 4.4e44) {
+ return 1.5f;
+ }
+ return 0;
+}
+
+double cfuncDouble(uintptr_t a, double b, float c, double d) {
+ if (a == 1 && b == 2.2 && c == 3.3f && d == 4.4e44) {
+ return 2.5;
+ }
+ return 0;
+}
+`
+ tmpdir, err := os.MkdirTemp("", "TestFloatReturn")
+ if err != nil {
+ t.Fatal("TempDir failed: ", err)
+ }
+ defer os.RemoveAll(tmpdir)
+
+ srcname := "mydll.c"
+ err = os.WriteFile(filepath.Join(tmpdir, srcname), []byte(src), 0)
+ if err != nil {
+ t.Fatal(err)
+ }
+ outname := "mydll.dll"
+ cmd := exec.Command("gcc", "-shared", "-s", "-Werror", "-o", outname, srcname)
+ cmd.Dir = tmpdir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build dll: %v - %v", err, string(out))
+ }
+ dllpath := filepath.Join(tmpdir, outname)
+
+ dll := syscall.MustLoadDLL(dllpath)
+ defer dll.Release()
+
+ proc := dll.MustFindProc("cfuncFloat")
+
+ _, r, err := proc.Call(
+ 1,
+ uintptr(math.Float64bits(2.2)),
+ uintptr(math.Float32bits(3.3)),
+ uintptr(math.Float64bits(4.4e44)),
+ )
+ fr := math.Float32frombits(uint32(r))
+ if fr != 1.5 {
+ t.Errorf("got %f want 1.5 (err=%v)", fr, err)
+ }
+
+ proc = dll.MustFindProc("cfuncDouble")
+
+ _, r, err = proc.Call(
+ 1,
+ uintptr(math.Float64bits(2.2)),
+ uintptr(math.Float32bits(3.3)),
+ uintptr(math.Float64bits(4.4e44)),
+ )
+ dr := math.Float64frombits(uint64(r))
+ if dr != 2.5 {
+ t.Errorf("got %f want 2.5 (err=%v)", dr, err)
+ }
+}
+
+func TestTimeBeginPeriod(t *testing.T) {
+ const TIMERR_NOERROR = 0
+ if *runtime.TimeBeginPeriodRetValue != TIMERR_NOERROR {
+ t.Fatalf("timeBeginPeriod failed: it returned %d", *runtime.TimeBeginPeriodRetValue)
+ }
+}
+
+// removeOneCPU removes one (any) CPU from the affinity mask and
+// returns the new affinity mask.
+func removeOneCPU(mask uintptr) (uintptr, error) {
+ if mask == 0 {
+ return 0, fmt.Errorf("cpu affinity mask is empty")
+ }
+ maskbits := int(unsafe.Sizeof(mask) * 8)
+ for i := 0; i < maskbits; i++ {
+ newmask := mask & ^(1 << uint(i))
+ if newmask != mask {
+ return newmask, nil
+ }
+
+ }
+ panic("not reached")
+}
+
+func resumeChildThread(kernel32 *syscall.DLL, childpid int) error {
+ _OpenThread := kernel32.MustFindProc("OpenThread")
+ _ResumeThread := kernel32.MustFindProc("ResumeThread")
+ _Thread32First := kernel32.MustFindProc("Thread32First")
+ _Thread32Next := kernel32.MustFindProc("Thread32Next")
+
+ snapshot, err := syscall.CreateToolhelp32Snapshot(syscall.TH32CS_SNAPTHREAD, 0)
+ if err != nil {
+ return err
+ }
+ defer syscall.CloseHandle(snapshot)
+
+ const _THREAD_SUSPEND_RESUME = 0x0002
+
+ type ThreadEntry32 struct {
+ Size uint32
+ tUsage uint32
+ ThreadID uint32
+ OwnerProcessID uint32
+ BasePri int32
+ DeltaPri int32
+ Flags uint32
+ }
+
+ var te ThreadEntry32
+ te.Size = uint32(unsafe.Sizeof(te))
+ ret, _, err := _Thread32First.Call(uintptr(snapshot), uintptr(unsafe.Pointer(&te)))
+ if ret == 0 {
+ return err
+ }
+ for te.OwnerProcessID != uint32(childpid) {
+ ret, _, err = _Thread32Next.Call(uintptr(snapshot), uintptr(unsafe.Pointer(&te)))
+ if ret == 0 {
+ return err
+ }
+ }
+ h, _, err := _OpenThread.Call(_THREAD_SUSPEND_RESUME, 1, uintptr(te.ThreadID))
+ if h == 0 {
+ return err
+ }
+ defer syscall.Close(syscall.Handle(h))
+
+ ret, _, err = _ResumeThread.Call(h)
+ if ret == 0xffffffff {
+ return err
+ }
+ return nil
+}
+
+func TestNumCPU(t *testing.T) {
+ if os.Getenv("GO_WANT_HELPER_PROCESS") == "1" {
+ // in child process
+ fmt.Fprintf(os.Stderr, "%d", runtime.NumCPU())
+ os.Exit(0)
+ }
+
+ switch n := runtime.NumberOfProcessors(); {
+ case n < 1:
+ t.Fatalf("system cannot have %d cpu(s)", n)
+ case n == 1:
+ if runtime.NumCPU() != 1 {
+ t.Fatalf("runtime.NumCPU() returns %d on single cpu system", runtime.NumCPU())
+ }
+ return
+ }
+
+ const (
+ _CREATE_SUSPENDED = 0x00000004
+ _PROCESS_ALL_ACCESS = syscall.STANDARD_RIGHTS_REQUIRED | syscall.SYNCHRONIZE | 0xfff
+ )
+
+ kernel32 := syscall.MustLoadDLL("kernel32.dll")
+ _GetProcessAffinityMask := kernel32.MustFindProc("GetProcessAffinityMask")
+ _SetProcessAffinityMask := kernel32.MustFindProc("SetProcessAffinityMask")
+
+ cmd := exec.Command(os.Args[0], "-test.run=TestNumCPU")
+ cmd.Env = append(os.Environ(), "GO_WANT_HELPER_PROCESS=1")
+ var buf bytes.Buffer
+ cmd.Stdout = &buf
+ cmd.Stderr = &buf
+ cmd.SysProcAttr = &syscall.SysProcAttr{CreationFlags: _CREATE_SUSPENDED}
+ err := cmd.Start()
+ if err != nil {
+ t.Fatal(err)
+ }
+ defer func() {
+ err = cmd.Wait()
+ childOutput := string(buf.Bytes())
+ if err != nil {
+ t.Fatalf("child failed: %v: %v", err, childOutput)
+ }
+ // removeOneCPU should have decreased child cpu count by 1
+ want := fmt.Sprintf("%d", runtime.NumCPU()-1)
+ if childOutput != want {
+ t.Fatalf("child output: want %q, got %q", want, childOutput)
+ }
+ }()
+
+ defer func() {
+ err = resumeChildThread(kernel32, cmd.Process.Pid)
+ if err != nil {
+ t.Fatal(err)
+ }
+ }()
+
+ ph, err := syscall.OpenProcess(_PROCESS_ALL_ACCESS, false, uint32(cmd.Process.Pid))
+ if err != nil {
+ t.Fatal(err)
+ }
+ defer syscall.CloseHandle(ph)
+
+ var mask, sysmask uintptr
+ ret, _, err := _GetProcessAffinityMask.Call(uintptr(ph), uintptr(unsafe.Pointer(&mask)), uintptr(unsafe.Pointer(&sysmask)))
+ if ret == 0 {
+ t.Fatal(err)
+ }
+
+ newmask, err := removeOneCPU(mask)
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ ret, _, err = _SetProcessAffinityMask.Call(uintptr(ph), newmask)
+ if ret == 0 {
+ t.Fatal(err)
+ }
+ ret, _, err = _GetProcessAffinityMask.Call(uintptr(ph), uintptr(unsafe.Pointer(&mask)), uintptr(unsafe.Pointer(&sysmask)))
+ if ret == 0 {
+ t.Fatal(err)
+ }
+ if newmask != mask {
+ t.Fatalf("SetProcessAffinityMask didn't set newmask of 0x%x. Current mask is 0x%x.", newmask, mask)
+ }
+}
+
+// See Issue 14959
+func TestDLLPreloadMitigation(t *testing.T) {
+ if _, err := exec.LookPath("gcc"); err != nil {
+ t.Skip("skipping test: gcc is missing")
+ }
+
+ tmpdir, err := os.MkdirTemp("", "TestDLLPreloadMitigation")
+ if err != nil {
+ t.Fatal("TempDir failed: ", err)
+ }
+ defer func() {
+ err := os.RemoveAll(tmpdir)
+ if err != nil {
+ t.Error(err)
+ }
+ }()
+
+ dir0, err := os.Getwd()
+ if err != nil {
+ t.Fatal(err)
+ }
+ defer os.Chdir(dir0)
+
+ const src = `
+#include <stdint.h>
+#include <windows.h>
+
+uintptr_t cfunc(void) {
+ SetLastError(123);
+ return 0;
+}
+`
+ srcname := "nojack.c"
+ err = os.WriteFile(filepath.Join(tmpdir, srcname), []byte(src), 0)
+ if err != nil {
+ t.Fatal(err)
+ }
+ name := "nojack.dll"
+ cmd := exec.Command("gcc", "-shared", "-s", "-Werror", "-o", name, srcname)
+ cmd.Dir = tmpdir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build dll: %v - %v", err, string(out))
+ }
+ dllpath := filepath.Join(tmpdir, name)
+
+ dll := syscall.MustLoadDLL(dllpath)
+ dll.MustFindProc("cfunc")
+ dll.Release()
+
+ // Get into the directory with the DLL we'll load by base name
+ // ("nojack.dll") Think of this as the user double-clicking an
+ // installer from their Downloads directory where a browser
+ // silently downloaded some malicious DLLs.
+ os.Chdir(tmpdir)
+
+ // First, check that we can load a DLL from the current directory,
+ // loading it only as "nojack.dll", without an absolute path.
+ delete(sysdll.IsSystemDLL, name) // in case test was run repeatedly
+ dll, err = syscall.LoadDLL(name)
+ if err != nil {
+ t.Fatalf("failed to load %s by base name before sysdll registration: %v", name, err)
+ }
+ dll.Release()
+
+ // And now verify that if we register it as a system32-only
+ // DLL, the implicit loading from the current directory no
+ // longer works.
+ sysdll.IsSystemDLL[name] = true
+ dll, err = syscall.LoadDLL(name)
+ if err == nil {
+ dll.Release()
+ if wantLoadLibraryEx() {
+ t.Fatalf("Bad: insecure load of DLL by base name %q before sysdll registration: %v", name, err)
+ }
+ t.Skip("insecure load of DLL, but expected")
+ }
+}
+
+// Test that C code called via a DLL can use large Windows thread
+// stacks and call back in to Go without crashing. See issue #20975.
+//
+// See also TestBigStackCallbackCgo.
+func TestBigStackCallbackSyscall(t *testing.T) {
+ if _, err := exec.LookPath("gcc"); err != nil {
+ t.Skip("skipping test: gcc is missing")
+ }
+
+ srcname, err := filepath.Abs("testdata/testprogcgo/bigstack_windows.c")
+ if err != nil {
+ t.Fatal("Abs failed: ", err)
+ }
+
+ tmpdir, err := os.MkdirTemp("", "TestBigStackCallback")
+ if err != nil {
+ t.Fatal("TempDir failed: ", err)
+ }
+ defer os.RemoveAll(tmpdir)
+
+ outname := "mydll.dll"
+ cmd := exec.Command("gcc", "-shared", "-s", "-Werror", "-o", outname, srcname)
+ cmd.Dir = tmpdir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build dll: %v - %v", err, string(out))
+ }
+ dllpath := filepath.Join(tmpdir, outname)
+
+ dll := syscall.MustLoadDLL(dllpath)
+ defer dll.Release()
+
+ var ok bool
+ proc := dll.MustFindProc("bigStack")
+ cb := syscall.NewCallback(func() uintptr {
+ // Do something interesting to force stack checks.
+ forceStackCopy()
+ ok = true
+ return 0
+ })
+ proc.Call(cb)
+ if !ok {
+ t.Fatalf("callback not called")
+ }
+}
+
+// wantLoadLibraryEx reports whether we expect LoadLibraryEx to work for tests.
+func wantLoadLibraryEx() bool {
+ return testenv.Builder() == "windows-amd64-gce" || testenv.Builder() == "windows-386-gce"
+}
+
+func TestLoadLibraryEx(t *testing.T) {
+ use, have, flags := runtime.LoadLibraryExStatus()
+ if use {
+ return // success.
+ }
+ if wantLoadLibraryEx() {
+ t.Fatalf("Expected LoadLibraryEx+flags to be available. (LoadLibraryEx=%v; flags=%v)",
+ have, flags)
+ }
+ t.Skipf("LoadLibraryEx not usable, but not expected. (LoadLibraryEx=%v; flags=%v)",
+ have, flags)
+}
+
+var (
+ modwinmm = syscall.NewLazyDLL("winmm.dll")
+ modkernel32 = syscall.NewLazyDLL("kernel32.dll")
+
+ procCreateEvent = modkernel32.NewProc("CreateEventW")
+ procSetEvent = modkernel32.NewProc("SetEvent")
+)
+
+func createEvent() (syscall.Handle, error) {
+ r0, _, e0 := syscall.Syscall6(procCreateEvent.Addr(), 4, 0, 0, 0, 0, 0, 0)
+ if r0 == 0 {
+ return 0, syscall.Errno(e0)
+ }
+ return syscall.Handle(r0), nil
+}
+
+func setEvent(h syscall.Handle) error {
+ r0, _, e0 := syscall.Syscall(procSetEvent.Addr(), 1, uintptr(h), 0, 0)
+ if r0 == 0 {
+ return syscall.Errno(e0)
+ }
+ return nil
+}
+
+func BenchmarkChanToSyscallPing(b *testing.B) {
+ n := b.N
+ ch := make(chan int)
+ event, err := createEvent()
+ if err != nil {
+ b.Fatal(err)
+ }
+ go func() {
+ for i := 0; i < n; i++ {
+ syscall.WaitForSingleObject(event, syscall.INFINITE)
+ ch <- 1
+ }
+ }()
+ for i := 0; i < n; i++ {
+ err := setEvent(event)
+ if err != nil {
+ b.Fatal(err)
+ }
+ <-ch
+ }
+}
+
+func BenchmarkSyscallToSyscallPing(b *testing.B) {
+ n := b.N
+ event1, err := createEvent()
+ if err != nil {
+ b.Fatal(err)
+ }
+ event2, err := createEvent()
+ if err != nil {
+ b.Fatal(err)
+ }
+ go func() {
+ for i := 0; i < n; i++ {
+ syscall.WaitForSingleObject(event1, syscall.INFINITE)
+ if err := setEvent(event2); err != nil {
+ b.Errorf("Set event failed: %v", err)
+ return
+ }
+ }
+ }()
+ for i := 0; i < n; i++ {
+ if err := setEvent(event1); err != nil {
+ b.Fatal(err)
+ }
+ if b.Failed() {
+ break
+ }
+ syscall.WaitForSingleObject(event2, syscall.INFINITE)
+ }
+}
+
+func BenchmarkChanToChanPing(b *testing.B) {
+ n := b.N
+ ch1 := make(chan int)
+ ch2 := make(chan int)
+ go func() {
+ for i := 0; i < n; i++ {
+ <-ch1
+ ch2 <- 1
+ }
+ }()
+ for i := 0; i < n; i++ {
+ ch1 <- 1
+ <-ch2
+ }
+}
+
+func BenchmarkOsYield(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ runtime.OsYield()
+ }
+}
+
+func BenchmarkRunningGoProgram(b *testing.B) {
+ tmpdir, err := os.MkdirTemp("", "BenchmarkRunningGoProgram")
+ if err != nil {
+ b.Fatal(err)
+ }
+ defer os.RemoveAll(tmpdir)
+
+ src := filepath.Join(tmpdir, "main.go")
+ err = os.WriteFile(src, []byte(benchmarkRunningGoProgram), 0666)
+ if err != nil {
+ b.Fatal(err)
+ }
+
+ exe := filepath.Join(tmpdir, "main.exe")
+ cmd := exec.Command(testenv.GoToolPath(b), "build", "-o", exe, src)
+ cmd.Dir = tmpdir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ b.Fatalf("building main.exe failed: %v\n%s", err, out)
+ }
+
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ cmd := exec.Command(exe)
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ b.Fatalf("running main.exe failed: %v\n%s", err, out)
+ }
+ }
+}
+
+const benchmarkRunningGoProgram = `
+package main
+
+import _ "os" // average Go program will use "os" package, do the same here
+
+func main() {
+}
+`
diff --git a/src/runtime/testdata/testfaketime/faketime.go b/src/runtime/testdata/testfaketime/faketime.go
new file mode 100644
index 0000000..1fb15eb
--- /dev/null
+++ b/src/runtime/testdata/testfaketime/faketime.go
@@ -0,0 +1,28 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Test faketime support. This is its own test program because we have
+// to build it with custom build tags and hence want to minimize
+// dependencies.
+
+package main
+
+import (
+ "os"
+ "time"
+)
+
+func main() {
+ println("line 1")
+ // Stream switch, increments time
+ os.Stdout.WriteString("line 2\n")
+ os.Stdout.WriteString("line 3\n")
+ // Stream switch, increments time
+ os.Stderr.WriteString("line 4\n")
+ // Time jump
+ time.Sleep(1 * time.Second)
+ os.Stdout.WriteString("line 5\n")
+ // Print the current time.
+ os.Stdout.WriteString(time.Now().UTC().Format(time.RFC3339))
+}
diff --git a/src/runtime/testdata/testprog/abort.go b/src/runtime/testdata/testprog/abort.go
new file mode 100644
index 0000000..9e79d4d
--- /dev/null
+++ b/src/runtime/testdata/testprog/abort.go
@@ -0,0 +1,23 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import _ "unsafe" // for go:linkname
+
+func init() {
+ register("Abort", Abort)
+}
+
+//go:linkname runtimeAbort runtime.abort
+func runtimeAbort()
+
+func Abort() {
+ defer func() {
+ recover()
+ panic("BAD: recovered from abort")
+ }()
+ runtimeAbort()
+ println("BAD: after abort")
+}
diff --git a/src/runtime/testdata/testprog/badtraceback.go b/src/runtime/testdata/testprog/badtraceback.go
new file mode 100644
index 0000000..d558adc
--- /dev/null
+++ b/src/runtime/testdata/testprog/badtraceback.go
@@ -0,0 +1,47 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "runtime"
+ "runtime/debug"
+ "unsafe"
+)
+
+func init() {
+ register("BadTraceback", BadTraceback)
+}
+
+func BadTraceback() {
+ // Disable GC to prevent traceback at an unexpected time.
+ debug.SetGCPercent(-1)
+
+ // Run badLR1 on its own stack to minimize the stack size and
+ // exercise the stack bounds logic in the hex dump.
+ go badLR1()
+ select {}
+}
+
+//go:noinline
+func badLR1() {
+ // We need two frames on LR machines because we'll smash this
+ // frame's saved LR.
+ badLR2(0)
+}
+
+//go:noinline
+func badLR2(arg int) {
+ // Smash the return PC or saved LR.
+ lrOff := unsafe.Sizeof(uintptr(0))
+ if runtime.GOARCH == "ppc64" || runtime.GOARCH == "ppc64le" {
+ lrOff = 32 // FIXED_FRAME or sys.MinFrameSize
+ }
+ lrPtr := (*uintptr)(unsafe.Pointer(uintptr(unsafe.Pointer(&arg)) - lrOff))
+ *lrPtr = 0xbad
+
+ // Print a backtrace. This should include diagnostics for the
+ // bad return PC and a hex dump.
+ panic("backtrace")
+}
diff --git a/src/runtime/testdata/testprog/checkptr.go b/src/runtime/testdata/testprog/checkptr.go
new file mode 100644
index 0000000..e0a2794
--- /dev/null
+++ b/src/runtime/testdata/testprog/checkptr.go
@@ -0,0 +1,51 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "unsafe"
+
+func init() {
+ register("CheckPtrAlignmentNoPtr", CheckPtrAlignmentNoPtr)
+ register("CheckPtrAlignmentPtr", CheckPtrAlignmentPtr)
+ register("CheckPtrArithmetic", CheckPtrArithmetic)
+ register("CheckPtrArithmetic2", CheckPtrArithmetic2)
+ register("CheckPtrSize", CheckPtrSize)
+ register("CheckPtrSmall", CheckPtrSmall)
+}
+
+func CheckPtrAlignmentNoPtr() {
+ var x [2]int64
+ p := unsafe.Pointer(&x[0])
+ sink2 = (*int64)(unsafe.Pointer(uintptr(p) + 1))
+}
+
+func CheckPtrAlignmentPtr() {
+ var x [2]int64
+ p := unsafe.Pointer(&x[0])
+ sink2 = (**int64)(unsafe.Pointer(uintptr(p) + 1))
+}
+
+func CheckPtrArithmetic() {
+ var x int
+ i := uintptr(unsafe.Pointer(&x))
+ sink2 = (*int)(unsafe.Pointer(i))
+}
+
+func CheckPtrArithmetic2() {
+ var x [2]int64
+ p := unsafe.Pointer(&x[1])
+ var one uintptr = 1
+ sink2 = unsafe.Pointer(uintptr(p) & ^one)
+}
+
+func CheckPtrSize() {
+ p := new(int64)
+ sink2 = p
+ sink2 = (*[100]int64)(unsafe.Pointer(p))
+}
+
+func CheckPtrSmall() {
+ sink2 = unsafe.Pointer(uintptr(1))
+}
diff --git a/src/runtime/testdata/testprog/crash.go b/src/runtime/testdata/testprog/crash.go
new file mode 100644
index 0000000..c4990cd
--- /dev/null
+++ b/src/runtime/testdata/testprog/crash.go
@@ -0,0 +1,66 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "fmt"
+ "runtime"
+)
+
+func init() {
+ register("Crash", Crash)
+ register("DoublePanic", DoublePanic)
+}
+
+func test(name string) {
+ defer func() {
+ if x := recover(); x != nil {
+ fmt.Printf(" recovered")
+ }
+ fmt.Printf(" done\n")
+ }()
+ fmt.Printf("%s:", name)
+ var s *string
+ _ = *s
+ fmt.Print("SHOULD NOT BE HERE")
+}
+
+func testInNewThread(name string) {
+ c := make(chan bool)
+ go func() {
+ runtime.LockOSThread()
+ test(name)
+ c <- true
+ }()
+ <-c
+}
+
+func Crash() {
+ runtime.LockOSThread()
+ test("main")
+ testInNewThread("new-thread")
+ testInNewThread("second-new-thread")
+ test("main-again")
+}
+
+type P string
+
+func (p P) String() string {
+ // Try to free the "YYY" string header when the "XXX"
+ // panic is stringified.
+ runtime.GC()
+ runtime.GC()
+ runtime.GC()
+ return string(p)
+}
+
+// Test that panic message is not clobbered.
+// See issue 30150.
+func DoublePanic() {
+ defer func() {
+ panic(P("YYY"))
+ }()
+ panic(P("XXX"))
+}
diff --git a/src/runtime/testdata/testprog/deadlock.go b/src/runtime/testdata/testprog/deadlock.go
new file mode 100644
index 0000000..781acbd
--- /dev/null
+++ b/src/runtime/testdata/testprog/deadlock.go
@@ -0,0 +1,363 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "fmt"
+ "runtime"
+ "runtime/debug"
+ "time"
+)
+
+func init() {
+ registerInit("InitDeadlock", InitDeadlock)
+ registerInit("NoHelperGoroutines", NoHelperGoroutines)
+
+ register("SimpleDeadlock", SimpleDeadlock)
+ register("LockedDeadlock", LockedDeadlock)
+ register("LockedDeadlock2", LockedDeadlock2)
+ register("GoexitDeadlock", GoexitDeadlock)
+ register("StackOverflow", StackOverflow)
+ register("ThreadExhaustion", ThreadExhaustion)
+ register("RecursivePanic", RecursivePanic)
+ register("RecursivePanic2", RecursivePanic2)
+ register("RecursivePanic3", RecursivePanic3)
+ register("RecursivePanic4", RecursivePanic4)
+ register("RecursivePanic5", RecursivePanic5)
+ register("GoexitExit", GoexitExit)
+ register("GoNil", GoNil)
+ register("MainGoroutineID", MainGoroutineID)
+ register("Breakpoint", Breakpoint)
+ register("GoexitInPanic", GoexitInPanic)
+ register("PanicAfterGoexit", PanicAfterGoexit)
+ register("RecoveredPanicAfterGoexit", RecoveredPanicAfterGoexit)
+ register("RecoverBeforePanicAfterGoexit", RecoverBeforePanicAfterGoexit)
+ register("RecoverBeforePanicAfterGoexit2", RecoverBeforePanicAfterGoexit2)
+ register("PanicTraceback", PanicTraceback)
+ register("GoschedInPanic", GoschedInPanic)
+ register("SyscallInPanic", SyscallInPanic)
+ register("PanicLoop", PanicLoop)
+}
+
+func SimpleDeadlock() {
+ select {}
+ panic("not reached")
+}
+
+func InitDeadlock() {
+ select {}
+ panic("not reached")
+}
+
+func LockedDeadlock() {
+ runtime.LockOSThread()
+ select {}
+}
+
+func LockedDeadlock2() {
+ go func() {
+ runtime.LockOSThread()
+ select {}
+ }()
+ time.Sleep(time.Millisecond)
+ select {}
+}
+
+func GoexitDeadlock() {
+ F := func() {
+ for i := 0; i < 10; i++ {
+ }
+ }
+
+ go F()
+ go F()
+ runtime.Goexit()
+}
+
+func StackOverflow() {
+ var f func() byte
+ f = func() byte {
+ var buf [64 << 10]byte
+ return buf[0] + f()
+ }
+ debug.SetMaxStack(1474560)
+ f()
+}
+
+func ThreadExhaustion() {
+ debug.SetMaxThreads(10)
+ c := make(chan int)
+ for i := 0; i < 100; i++ {
+ go func() {
+ runtime.LockOSThread()
+ c <- 0
+ select {}
+ }()
+ <-c
+ }
+}
+
+func RecursivePanic() {
+ func() {
+ defer func() {
+ fmt.Println(recover())
+ }()
+ var x [8192]byte
+ func(x [8192]byte) {
+ defer func() {
+ if err := recover(); err != nil {
+ panic("wrap: " + err.(string))
+ }
+ }()
+ panic("bad")
+ }(x)
+ }()
+ panic("again")
+}
+
+// Same as RecursivePanic, but do the first recover and the second panic in
+// separate defers, and make sure they are executed in the correct order.
+func RecursivePanic2() {
+ func() {
+ defer func() {
+ fmt.Println(recover())
+ }()
+ var x [8192]byte
+ func(x [8192]byte) {
+ defer func() {
+ panic("second panic")
+ }()
+ defer func() {
+ fmt.Println(recover())
+ }()
+ panic("first panic")
+ }(x)
+ }()
+ panic("third panic")
+}
+
+// Make sure that the first panic finished as a panic, even though the second
+// panic was recovered
+func RecursivePanic3() {
+ defer func() {
+ defer func() {
+ recover()
+ }()
+ panic("second panic")
+ }()
+ panic("first panic")
+}
+
+// Test case where a single defer recovers one panic but starts another panic. If
+// the second panic is never recovered, then the recovered first panic will still
+// appear on the panic stack (labeled '[recovered]') and the runtime stack.
+func RecursivePanic4() {
+ defer func() {
+ recover()
+ panic("second panic")
+ }()
+ panic("first panic")
+}
+
+// Test case where we have an open-coded defer higher up the stack (in two), and
+// in the current function (three) we recover in a defer while we still have
+// another defer to be processed.
+func RecursivePanic5() {
+ one()
+ panic("third panic")
+}
+
+//go:noinline
+func one() {
+ two()
+}
+
+//go:noinline
+func two() {
+ defer func() {
+ }()
+
+ three()
+}
+
+//go:noinline
+func three() {
+ defer func() {
+ }()
+
+ defer func() {
+ fmt.Println(recover())
+ }()
+
+ defer func() {
+ fmt.Println(recover())
+ panic("second panic")
+ }()
+
+ panic("first panic")
+}
+
+func GoexitExit() {
+ println("t1")
+ go func() {
+ time.Sleep(time.Millisecond)
+ }()
+ i := 0
+ println("t2")
+ runtime.SetFinalizer(&i, func(p *int) {})
+ println("t3")
+ runtime.GC()
+ println("t4")
+ runtime.Goexit()
+}
+
+func GoNil() {
+ defer func() {
+ recover()
+ }()
+ var f func()
+ go f()
+ select {}
+}
+
+func MainGoroutineID() {
+ panic("test")
+}
+
+func NoHelperGoroutines() {
+ i := 0
+ runtime.SetFinalizer(&i, func(p *int) {})
+ time.AfterFunc(time.Hour, func() {})
+ panic("oops")
+}
+
+func Breakpoint() {
+ runtime.Breakpoint()
+}
+
+func GoexitInPanic() {
+ go func() {
+ defer func() {
+ runtime.Goexit()
+ }()
+ panic("hello")
+ }()
+ runtime.Goexit()
+}
+
+type errorThatGosched struct{}
+
+func (errorThatGosched) Error() string {
+ runtime.Gosched()
+ return "errorThatGosched"
+}
+
+func GoschedInPanic() {
+ panic(errorThatGosched{})
+}
+
+type errorThatPrint struct{}
+
+func (errorThatPrint) Error() string {
+ fmt.Println("1")
+ fmt.Println("2")
+ return "3"
+}
+
+func SyscallInPanic() {
+ panic(errorThatPrint{})
+}
+
+func PanicAfterGoexit() {
+ defer func() {
+ panic("hello")
+ }()
+ runtime.Goexit()
+}
+
+func RecoveredPanicAfterGoexit() {
+ defer func() {
+ defer func() {
+ r := recover()
+ if r == nil {
+ panic("bad recover")
+ }
+ }()
+ panic("hello")
+ }()
+ runtime.Goexit()
+}
+
+func RecoverBeforePanicAfterGoexit() {
+ // 1. defer a function that recovers
+ // 2. defer a function that panics
+ // 3. call goexit
+ // Goexit runs the #2 defer. Its panic
+ // is caught by the #1 defer. For Goexit, we explicitly
+ // resume execution in the Goexit loop, instead of resuming
+ // execution in the caller (which would make the Goexit disappear!)
+ defer func() {
+ r := recover()
+ if r == nil {
+ panic("bad recover")
+ }
+ }()
+ defer func() {
+ panic("hello")
+ }()
+ runtime.Goexit()
+}
+
+func RecoverBeforePanicAfterGoexit2() {
+ for i := 0; i < 2; i++ {
+ defer func() {
+ }()
+ }
+ // 1. defer a function that recovers
+ // 2. defer a function that panics
+ // 3. call goexit
+ // Goexit runs the #2 defer. Its panic
+ // is caught by the #1 defer. For Goexit, we explicitly
+ // resume execution in the Goexit loop, instead of resuming
+ // execution in the caller (which would make the Goexit disappear!)
+ defer func() {
+ r := recover()
+ if r == nil {
+ panic("bad recover")
+ }
+ }()
+ defer func() {
+ panic("hello")
+ }()
+ runtime.Goexit()
+}
+
+func PanicTraceback() {
+ pt1()
+}
+
+func pt1() {
+ defer func() {
+ panic("panic pt1")
+ }()
+ pt2()
+}
+
+func pt2() {
+ defer func() {
+ panic("panic pt2")
+ }()
+ panic("hello")
+}
+
+type panicError struct{}
+
+func (*panicError) Error() string {
+ panic("double error")
+}
+
+func PanicLoop() {
+ panic(&panicError{})
+}
diff --git a/src/runtime/testdata/testprog/gc.go b/src/runtime/testdata/testprog/gc.go
new file mode 100644
index 0000000..74732cd
--- /dev/null
+++ b/src/runtime/testdata/testprog/gc.go
@@ -0,0 +1,302 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "fmt"
+ "os"
+ "runtime"
+ "runtime/debug"
+ "sync/atomic"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ register("GCFairness", GCFairness)
+ register("GCFairness2", GCFairness2)
+ register("GCSys", GCSys)
+ register("GCPhys", GCPhys)
+ register("DeferLiveness", DeferLiveness)
+ register("GCZombie", GCZombie)
+}
+
+func GCSys() {
+ runtime.GOMAXPROCS(1)
+ memstats := new(runtime.MemStats)
+ runtime.GC()
+ runtime.ReadMemStats(memstats)
+ sys := memstats.Sys
+
+ runtime.MemProfileRate = 0 // disable profiler
+
+ itercount := 100000
+ for i := 0; i < itercount; i++ {
+ workthegc()
+ }
+
+ // Should only be using a few MB.
+ // We allocated 100 MB or (if not short) 1 GB.
+ runtime.ReadMemStats(memstats)
+ if sys > memstats.Sys {
+ sys = 0
+ } else {
+ sys = memstats.Sys - sys
+ }
+ if sys > 16<<20 {
+ fmt.Printf("using too much memory: %d bytes\n", sys)
+ return
+ }
+ fmt.Printf("OK\n")
+}
+
+var sink []byte
+
+func workthegc() []byte {
+ sink = make([]byte, 1029)
+ return sink
+}
+
+func GCFairness() {
+ runtime.GOMAXPROCS(1)
+ f, err := os.Open("/dev/null")
+ if os.IsNotExist(err) {
+ // This test is only meaningful if writes to /dev/null are fast.
+ // If there is no /dev/null, we simply don't execute the test.
+ fmt.Println("OK")
+ return
+ }
+ if err != nil {
+ fmt.Println(err)
+ os.Exit(1)
+ }
+ for i := 0; i < 2; i++ {
+ go func() {
+ for {
+ f.Write([]byte("."))
+ }
+ }()
+ }
+ time.Sleep(10 * time.Millisecond)
+ fmt.Println("OK")
+}
+
+func GCFairness2() {
+ // Make sure user code can't exploit the GC's high priority
+ // scheduling to make scheduling of user code unfair. See
+ // issue #15706.
+ runtime.GOMAXPROCS(1)
+ debug.SetGCPercent(1)
+ var count [3]int64
+ var sink [3]interface{}
+ for i := range count {
+ go func(i int) {
+ for {
+ sink[i] = make([]byte, 1024)
+ atomic.AddInt64(&count[i], 1)
+ }
+ }(i)
+ }
+ // Note: If the unfairness is really bad, it may not even get
+ // past the sleep.
+ //
+ // If the scheduling rules change, this may not be enough time
+ // to let all goroutines run, but for now we cycle through
+ // them rapidly.
+ //
+ // OpenBSD's scheduler makes every usleep() take at least
+ // 20ms, so we need a long time to ensure all goroutines have
+ // run. If they haven't run after 30ms, give it another 1000ms
+ // and check again.
+ time.Sleep(30 * time.Millisecond)
+ var fail bool
+ for i := range count {
+ if atomic.LoadInt64(&count[i]) == 0 {
+ fail = true
+ }
+ }
+ if fail {
+ time.Sleep(1 * time.Second)
+ for i := range count {
+ if atomic.LoadInt64(&count[i]) == 0 {
+ fmt.Printf("goroutine %d did not run\n", i)
+ return
+ }
+ }
+ }
+ fmt.Println("OK")
+}
+
+func GCPhys() {
+ // This test ensures that heap-growth scavenging is working as intended.
+ //
+ // It sets up a specific scenario: it allocates two pairs of objects whose
+ // sizes sum to size. One object in each pair is "small" (though must be
+ // large enough to be considered a large object by the runtime) and one is
+ // large. The small objects are kept while the large objects are freed,
+ // creating two large unscavenged holes in the heap. The heap goal should
+ // also be small as a result (so size must be at least as large as the
+ // minimum heap size). We then allocate one large object, bigger than both
+ // pairs of objects combined. This allocation, because it will tip
+ // HeapSys-HeapReleased well above the heap goal, should trigger heap-growth
+ // scavenging and scavenge most, if not all, of the large holes we created
+ // earlier.
+ const (
+ // Size must also be large enough to be considered a large
+ // object (not in any size-segregated span).
+ size = 4 << 20
+ split = 64 << 10
+ objects = 2
+
+ // The page cache could hide 64 8-KiB pages from the scavenger today.
+ maxPageCache = (8 << 10) * 64
+
+ // Reduce GOMAXPROCS down to 4 if it's greater. We need to bound the amount
+ // of memory held in the page cache because the scavenger can't reach it.
+ // The page cache will hold at most maxPageCache of memory per-P, so this
+ // bounds the amount of memory hidden from the scavenger to 4*maxPageCache
+ // at most.
+ maxProcs = 4
+ )
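+ // As a rough worked example under these constants: the per-P page
+ // caches can hide at most 4*maxPageCache = 2 MiB from the scavenger,
+ // while the holes we expect to scavenge total (size-split)*objects,
+ // just under 8 MiB, so the ratio folded into the threshold computed
+ // below is roughly 25%, plus 5% for noise.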
+ // Set GOGC so that this test operates under consistent assumptions.
+ debug.SetGCPercent(100)
+ procs := runtime.GOMAXPROCS(-1)
+ if procs > maxProcs {
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(maxProcs))
+ procs = runtime.GOMAXPROCS(-1)
+ }
+ // Save objects which we want to survive, and condemn objects which we don't.
+ // Note that we condemn objects in this way and release them all at once in
+ // order to avoid having the GC start freeing up these objects while the loop
+ // is still running and filling in the holes we intend to make.
+ saved := make([][]byte, 0, objects+1)
+ condemned := make([][]byte, 0, objects)
+ for i := 0; i < 2*objects; i++ {
+ if i%2 == 0 {
+ saved = append(saved, make([]byte, split))
+ } else {
+ condemned = append(condemned, make([]byte, size-split))
+ }
+ }
+ condemned = nil
+ // Clean up the heap. This will free up every other object created above
+ // (i.e. everything in condemned) creating holes in the heap.
+	// Also, if the condemned objects are still being swept, it's possible that
+ // the scavenging that happens as a result of the next allocation won't see
+ // the holes at all. We call runtime.GC() twice here so that when we allocate
+ // our large object there's no race with sweeping.
+ runtime.GC()
+ runtime.GC()
+ // Perform one big allocation which should also scavenge any holes.
+ //
+ // The heap goal will rise after this object is allocated, so it's very
+ // important that we try to do all the scavenging in a single allocation
+ // that exceeds the heap goal. Otherwise the rising heap goal could foil our
+ // test.
+ saved = append(saved, make([]byte, objects*size))
+ // Clean up the heap again just to put it in a known state.
+ runtime.GC()
+ // heapBacked is an estimate of the amount of physical memory used by
+ // this test. HeapSys is an estimate of the size of the mapped virtual
+ // address space (which may or may not be backed by physical pages)
+ // whereas HeapReleased is an estimate of the amount of bytes returned
+ // to the OS. Their difference then roughly corresponds to the amount
+ // of virtual address space that is backed by physical pages.
+ var stats runtime.MemStats
+ runtime.ReadMemStats(&stats)
+ heapBacked := stats.HeapSys - stats.HeapReleased
+ // If heapBacked does not exceed the heap goal by more than retainExtraPercent
+ // then the scavenger is working as expected; the newly-created holes have been
+ // scavenged immediately as part of the allocations which cannot fit in the holes.
+ //
+ // Since the runtime should scavenge the entirety of the remaining holes,
+ // theoretically there should be no more free and unscavenged memory. However due
+ // to other allocations that happen during this test we may still see some physical
+ // memory over-use.
+ overuse := (float64(heapBacked) - float64(stats.HeapAlloc)) / float64(stats.HeapAlloc)
+ // Compute the threshold.
+ //
+ // In theory, this threshold should just be zero, but that's not possible in practice.
+ // Firstly, the runtime's page cache can hide up to maxPageCache of free memory from the
+	// scavenger per P. To account for this, we increase the threshold by the ratio of the
+	// total amount the runtime could hide from the scavenger to the amount of memory we expect
+	// to be able to scavenge here, which is (size-split)*objects. This computation is why we
+	// cap GOMAXPROCS above; if GOMAXPROCS is too high, the threshold just becomes 100%+ since
+	// the amount of memory being allocated is fixed. Then we add 5% to account for noise, such
+	// as other allocations this test may have performed that we don't explicitly account for.
+	// The baseline threshold here is around 11% for GOMAXPROCS=1, capping out at around 30%
+	// for GOMAXPROCS=4.
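+	// As a rough sanity check of those numbers: maxPageCache is 64*8 KiB = 512 KiB and
+	// (size-split)*objects is just under 8 MiB, so GOMAXPROCS=1 gives about
+	// 0.05 + 0.064 ≈ 11% and GOMAXPROCS=4 gives about 0.05 + 4*0.064 ≈ 30%.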
+ threshold := 0.05 + float64(procs)*maxPageCache/float64((size-split)*objects)
+ if overuse <= threshold {
+ fmt.Println("OK")
+ return
+ }
+ // Physical memory utilization exceeds the threshold, so heap-growth scavenging
+ // did not operate as expected.
+ //
+ // In the context of this test, this indicates a large amount of
+ // fragmentation with physical pages that are otherwise unused but not
+ // returned to the OS.
+ fmt.Printf("exceeded physical memory overuse threshold of %3.2f%%: %3.2f%%\n"+
+ "(alloc: %d, goal: %d, sys: %d, rel: %d, objs: %d)\n", threshold*100, overuse*100,
+ stats.HeapAlloc, stats.NextGC, stats.HeapSys, stats.HeapReleased, len(saved))
+ runtime.KeepAlive(saved)
+}
+
+// Test that defer closure is correctly scanned when the stack is scanned.
+func DeferLiveness() {
+ var x [10]int
+ escape(&x)
+ fn := func() {
+ if x[0] != 42 {
+ panic("FAIL")
+ }
+ }
+ defer fn()
+
+ x[0] = 42
+ runtime.GC()
+ runtime.GC()
+ runtime.GC()
+}
+
+//go:noinline
+func escape(x interface{}) { sink2 = x; sink2 = nil }
+
+var sink2 interface{}
+
+// Test zombie object detection and reporting.
+func GCZombie() {
+ // Allocate several objects of unusual size (so free slots are
+ // unlikely to all be re-allocated by the runtime).
+ const size = 190
+ const count = 8192 / size
+ keep := make([]*byte, 0, (count+1)/2)
+ free := make([]uintptr, 0, (count+1)/2)
+ zombies := make([]*byte, 0, len(free))
+ for i := 0; i < count; i++ {
+ obj := make([]byte, size)
+ p := &obj[0]
+ if i%2 == 0 {
+ keep = append(keep, p)
+ } else {
+ free = append(free, uintptr(unsafe.Pointer(p)))
+ }
+ }
+
+ // Free the unreferenced objects.
+ runtime.GC()
+
+ // Bring the free objects back to life.
+ for _, p := range free {
+ zombies = append(zombies, (*byte)(unsafe.Pointer(p)))
+ }
+
+ // GC should detect the zombie objects.
+ runtime.GC()
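+	// If zombie detection works, the GC above reports the zombies and crashes
+	// the program, so the "failed" line below is never printed.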
+ println("failed")
+ runtime.KeepAlive(keep)
+ runtime.KeepAlive(zombies)
+}
diff --git a/src/runtime/testdata/testprog/lockosthread.go b/src/runtime/testdata/testprog/lockosthread.go
new file mode 100644
index 0000000..e9d7fdb
--- /dev/null
+++ b/src/runtime/testdata/testprog/lockosthread.go
@@ -0,0 +1,246 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "os"
+ "runtime"
+ "sync"
+ "time"
+)
+
+var mainTID int
+
+func init() {
+ registerInit("LockOSThreadMain", func() {
+ // init is guaranteed to run on the main thread.
+ mainTID = gettid()
+ })
+ register("LockOSThreadMain", LockOSThreadMain)
+
+ registerInit("LockOSThreadAlt", func() {
+ // Lock the OS thread now so main runs on the main thread.
+ runtime.LockOSThread()
+ })
+ register("LockOSThreadAlt", LockOSThreadAlt)
+
+ registerInit("LockOSThreadAvoidsStatePropagation", func() {
+ // Lock the OS thread now so main runs on the main thread.
+ runtime.LockOSThread()
+ })
+ register("LockOSThreadAvoidsStatePropagation", LockOSThreadAvoidsStatePropagation)
+ register("LockOSThreadTemplateThreadRace", LockOSThreadTemplateThreadRace)
+}
+
+func LockOSThreadMain() {
+ // gettid only works on Linux, so on other platforms this just
+ // checks that the runtime doesn't do anything terrible.
+
+ // This requires GOMAXPROCS=1 from the beginning to reliably
+ // start a goroutine on the main thread.
+ if runtime.GOMAXPROCS(-1) != 1 {
+ println("requires GOMAXPROCS=1")
+ os.Exit(1)
+ }
+
+ ready := make(chan bool, 1)
+ go func() {
+ // Because GOMAXPROCS=1, this *should* be on the main
+ // thread. Stay there.
+ runtime.LockOSThread()
+ if mainTID != 0 && gettid() != mainTID {
+ println("failed to start goroutine on main thread")
+ os.Exit(1)
+ }
+ // Exit with the thread locked, which should exit the
+ // main thread.
+ ready <- true
+ }()
+ <-ready
+ time.Sleep(1 * time.Millisecond)
+ // Check that this goroutine is still running on a different
+ // thread.
+ if mainTID != 0 && gettid() == mainTID {
+ println("goroutine migrated to locked thread")
+ os.Exit(1)
+ }
+ println("OK")
+}
+
+func LockOSThreadAlt() {
+ // This is running locked to the main OS thread.
+
+ var subTID int
+ ready := make(chan bool, 1)
+ go func() {
+ // This goroutine must be running on a new thread.
+ runtime.LockOSThread()
+ subTID = gettid()
+ ready <- true
+ // Exit with the thread locked.
+ }()
+ <-ready
+ runtime.UnlockOSThread()
+ for i := 0; i < 100; i++ {
+ time.Sleep(1 * time.Millisecond)
+ // Check that this goroutine is running on a different thread.
+ if subTID != 0 && gettid() == subTID {
+ println("locked thread reused")
+ os.Exit(1)
+ }
+ exists, supported := tidExists(subTID)
+ if !supported || !exists {
+ goto ok
+ }
+ }
+ println("sub thread", subTID, "still running")
+ return
+ok:
+ println("OK")
+}
+
+func LockOSThreadAvoidsStatePropagation() {
+ // This test is similar to LockOSThreadAlt in that it will detect if a thread
+ // which should have died is still running. However, rather than do this with
+ // thread IDs, it does this by unsharing state on that thread. This way, it
+ // also detects whether new threads were cloned from the dead thread, and not
+ // from a clean thread. Cloning from a locked thread is undesirable since
+ // cloned threads will inherit potentially unwanted OS state.
+ //
+ // unshareFs, getcwd, and chdir("/tmp") are only guaranteed to work on
+ // Linux, so on other platforms this just checks that the runtime doesn't
+ // do anything terrible.
+ //
+ // This is running locked to the main OS thread.
+
+	// GOMAXPROCS=1 makes this fail much more reliably if a new thread is
+	// cloned from a tainted one.
+ if runtime.GOMAXPROCS(-1) != 1 {
+ println("requires GOMAXPROCS=1")
+ os.Exit(1)
+ }
+
+ if err := chdir("/"); err != nil {
+ println("failed to chdir:", err.Error())
+ os.Exit(1)
+ }
+ // On systems other than Linux, cwd == "".
+ cwd, err := getcwd()
+ if err != nil {
+ println("failed to get cwd:", err.Error())
+ os.Exit(1)
+ }
+ if cwd != "" && cwd != "/" {
+ println("unexpected cwd", cwd, " wanted /")
+ os.Exit(1)
+ }
+
+ ready := make(chan bool, 1)
+ go func() {
+ // This goroutine must be running on a new thread.
+ runtime.LockOSThread()
+
+ // Unshare details about the FS, like the CWD, with
+ // the rest of the process on this thread.
+ // On systems other than Linux, this is a no-op.
+ if err := unshareFs(); err != nil {
+ if err == errNotPermitted {
+ println("unshare not permitted")
+ os.Exit(0)
+ }
+ println("failed to unshare fs:", err.Error())
+ os.Exit(1)
+ }
+ // Chdir to somewhere else on this thread.
+ // On systems other than Linux, this is a no-op.
+ if err := chdir("/tmp"); err != nil {
+ println("failed to chdir:", err.Error())
+ os.Exit(1)
+ }
+
+ // The state on this thread is now considered "tainted", but it
+ // should no longer be observable in any other context.
+
+ ready <- true
+ // Exit with the thread locked.
+ }()
+ <-ready
+
+ // Spawn yet another goroutine and lock it. Since GOMAXPROCS=1, if
+ // for some reason state from the (hopefully dead) locked thread above
+ // propagated into a newly created thread (via clone), or that thread
+ // is actually being re-used, then we should get scheduled on such a
+ // thread with high likelihood.
+ done := make(chan bool)
+ go func() {
+ runtime.LockOSThread()
+
+ // Get the CWD and check if this is the same as the main thread's
+ // CWD. Every thread should share the same CWD.
+ // On systems other than Linux, wd == "".
+ wd, err := getcwd()
+ if err != nil {
+ println("failed to get cwd:", err.Error())
+ os.Exit(1)
+ }
+ if wd != cwd {
+ println("bad state from old thread propagated after it should have died")
+ os.Exit(1)
+ }
+ <-done
+
+ runtime.UnlockOSThread()
+ }()
+ done <- true
+ runtime.UnlockOSThread()
+ println("OK")
+}
+
+func LockOSThreadTemplateThreadRace() {
+ // This test attempts to reproduce the race described in
+ // golang.org/issue/38931. To do so, we must have a stop-the-world
+ // (achieved via ReadMemStats) racing with two LockOSThread calls.
+ //
+ // While this test attempts to line up the timing, it is only expected
+ // to fail (and thus hang) around 2% of the time if the race is
+ // present.
+
+ // Ensure enough Ps to actually run everything in parallel. Though on
+ // <4 core machines, we are still at the whim of the kernel scheduler.
+ runtime.GOMAXPROCS(4)
+
+ go func() {
+ // Stop the world; race with LockOSThread below.
+ var m runtime.MemStats
+ for {
+ runtime.ReadMemStats(&m)
+ }
+ }()
+
+ // Try to synchronize both LockOSThreads.
+ start := time.Now().Add(10 * time.Millisecond)
+
+ var wg sync.WaitGroup
+ wg.Add(2)
+
+ for i := 0; i < 2; i++ {
+ go func() {
+ for time.Now().Before(start) {
+ }
+
+ // Add work to the local runq to trigger early startm
+ // in handoffp.
+ go func() {}()
+
+ runtime.LockOSThread()
+ runtime.Gosched() // add a preemption point.
+ wg.Done()
+ }()
+ }
+
+ wg.Wait()
+ // If both LockOSThreads completed then we did not hit the race.
+ println("OK")
+}
diff --git a/src/runtime/testdata/testprog/main.go b/src/runtime/testdata/testprog/main.go
new file mode 100644
index 0000000..ae491a2
--- /dev/null
+++ b/src/runtime/testdata/testprog/main.go
@@ -0,0 +1,35 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "os"
+
+var cmds = map[string]func(){}
+
+func register(name string, f func()) {
+ if cmds[name] != nil {
+ panic("duplicate registration: " + name)
+ }
+ cmds[name] = f
+}
+
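+// registerInit records an init-time hook: f runs immediately (during package
+// initialization) when the requested test name matches, so per-test setup
+// happens before main dispatches to the test function.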
+func registerInit(name string, f func()) {
+ if len(os.Args) >= 2 && os.Args[1] == name {
+ f()
+ }
+}
+
+func main() {
+ if len(os.Args) < 2 {
+ println("usage: " + os.Args[0] + " name-of-test")
+ return
+ }
+ f := cmds[os.Args[1]]
+ if f == nil {
+ println("unknown function: " + os.Args[1])
+ return
+ }
+ f()
+}
diff --git a/src/runtime/testdata/testprog/map.go b/src/runtime/testdata/testprog/map.go
new file mode 100644
index 0000000..5524289
--- /dev/null
+++ b/src/runtime/testdata/testprog/map.go
@@ -0,0 +1,77 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "runtime"
+
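+// Each test below performs deliberately unsynchronized map operations; the
+// runtime's concurrent-map checks are expected to detect the race and crash
+// the program.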
+func init() {
+ register("concurrentMapWrites", concurrentMapWrites)
+ register("concurrentMapReadWrite", concurrentMapReadWrite)
+ register("concurrentMapIterateWrite", concurrentMapIterateWrite)
+}
+
+func concurrentMapWrites() {
+ m := map[int]int{}
+ c := make(chan struct{})
+ go func() {
+ for i := 0; i < 10000; i++ {
+ m[5] = 0
+ runtime.Gosched()
+ }
+ c <- struct{}{}
+ }()
+ go func() {
+ for i := 0; i < 10000; i++ {
+ m[6] = 0
+ runtime.Gosched()
+ }
+ c <- struct{}{}
+ }()
+ <-c
+ <-c
+}
+
+func concurrentMapReadWrite() {
+ m := map[int]int{}
+ c := make(chan struct{})
+ go func() {
+ for i := 0; i < 10000; i++ {
+ m[5] = 0
+ runtime.Gosched()
+ }
+ c <- struct{}{}
+ }()
+ go func() {
+ for i := 0; i < 10000; i++ {
+ _ = m[6]
+ runtime.Gosched()
+ }
+ c <- struct{}{}
+ }()
+ <-c
+ <-c
+}
+
+func concurrentMapIterateWrite() {
+ m := map[int]int{}
+ c := make(chan struct{})
+ go func() {
+ for i := 0; i < 10000; i++ {
+ m[5] = 0
+ runtime.Gosched()
+ }
+ c <- struct{}{}
+ }()
+ go func() {
+ for i := 0; i < 10000; i++ {
+ for range m {
+ }
+ runtime.Gosched()
+ }
+ c <- struct{}{}
+ }()
+ <-c
+ <-c
+}
diff --git a/src/runtime/testdata/testprog/memprof.go b/src/runtime/testdata/testprog/memprof.go
new file mode 100644
index 0000000..0392e60
--- /dev/null
+++ b/src/runtime/testdata/testprog/memprof.go
@@ -0,0 +1,51 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "os"
+ "runtime"
+ "runtime/pprof"
+)
+
+func init() {
+ register("MemProf", MemProf)
+}
+
+var memProfBuf bytes.Buffer
+var memProfStr string
+
+func MemProf() {
+ // Force heap sampling for determinism.
+ runtime.MemProfileRate = 1
+
+ for i := 0; i < 10; i++ {
+ fmt.Fprintf(&memProfBuf, "%*d\n", i, i)
+ }
+ memProfStr = memProfBuf.String()
+
+ runtime.GC()
+
+ f, err := os.CreateTemp("", "memprof")
+ if err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ if err := pprof.WriteHeapProfile(f); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ name := f.Name()
+ if err := f.Close(); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ fmt.Println(name)
+}
diff --git a/src/runtime/testdata/testprog/misc.go b/src/runtime/testdata/testprog/misc.go
new file mode 100644
index 0000000..7ccd389
--- /dev/null
+++ b/src/runtime/testdata/testprog/misc.go
@@ -0,0 +1,15 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "runtime"
+
+func init() {
+ register("NumGoroutine", NumGoroutine)
+}
+
+func NumGoroutine() {
+ println(runtime.NumGoroutine())
+}
diff --git a/src/runtime/testdata/testprog/numcpu_freebsd.go b/src/runtime/testdata/testprog/numcpu_freebsd.go
new file mode 100644
index 0000000..aff36ec
--- /dev/null
+++ b/src/runtime/testdata/testprog/numcpu_freebsd.go
@@ -0,0 +1,141 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "os"
+ "os/exec"
+ "regexp"
+ "runtime"
+ "strconv"
+ "strings"
+ "syscall"
+)
+
+var (
+ cpuSetRE = regexp.MustCompile(`(\d,?)+`)
+)
+
+func init() {
+ register("FreeBSDNumCPU", FreeBSDNumCPU)
+ register("FreeBSDNumCPUHelper", FreeBSDNumCPUHelper)
+}
+
+func FreeBSDNumCPUHelper() {
+ fmt.Printf("%d\n", runtime.NumCPU())
+}
+
+func FreeBSDNumCPU() {
+ _, err := exec.LookPath("cpuset")
+ if err != nil {
+		// Cannot test without the cpuset command.
+ fmt.Println("OK")
+ return
+ }
+ _, err = exec.LookPath("sysctl")
+ if err != nil {
+		// Cannot test without the sysctl command.
+ fmt.Println("OK")
+ return
+ }
+ cmd := exec.Command("sysctl", "-n", "kern.smp.active")
+ output, err := cmd.CombinedOutput()
+ if err != nil {
+		fmt.Printf("failed to launch '%s', error: %s, output: %s\n", strings.Join(cmd.Args, " "), err, output)
+ return
+ }
+	if !bytes.Equal(output, []byte("1\n")) {
+ // SMP mode deactivated in kernel.
+ fmt.Println("OK")
+ return
+ }
+
+ list, err := getList()
+ if err != nil {
+ fmt.Printf("%s\n", err)
+ return
+ }
+ err = checkNCPU(list)
+ if err != nil {
+ fmt.Printf("%s\n", err)
+ return
+ }
+ if len(list) >= 2 {
+ err = checkNCPU(list[:len(list)-1])
+ if err != nil {
+ fmt.Printf("%s\n", err)
+ return
+ }
+ }
+ fmt.Println("OK")
+ return
+}
+
+func getList() ([]string, error) {
+ pid := syscall.Getpid()
+
+ // Launch cpuset to print a list of available CPUs: pid <PID> mask: 0, 1, 2, 3.
+ cmd := exec.Command("cpuset", "-g", "-p", strconv.Itoa(pid))
+ cmdline := strings.Join(cmd.Args, " ")
+ output, err := cmd.CombinedOutput()
+ if err != nil {
+		return nil, fmt.Errorf("failed to execute '%s': %s", cmdline, err)
+ }
+ pos := bytes.IndexRune(output, '\n')
+ if pos == -1 {
+ return nil, fmt.Errorf("invalid output from '%s', '\\n' not found: %s", cmdline, output)
+ }
+ output = output[0:pos]
+
+ pos = bytes.IndexRune(output, ':')
+ if pos == -1 {
+ return nil, fmt.Errorf("invalid output from '%s', ':' not found: %s", cmdline, output)
+ }
+
+ var list []string
+ for _, val := range bytes.Split(output[pos+1:], []byte(",")) {
+ index := string(bytes.TrimSpace(val))
+ if len(index) == 0 {
+ continue
+ }
+ list = append(list, index)
+ }
+ if len(list) == 0 {
+ return nil, fmt.Errorf("empty CPU list from '%s': %s", cmdline, output)
+ }
+ return list, nil
+}
+
+func checkNCPU(list []string) error {
+ listString := strings.Join(list, ",")
+ if len(listString) == 0 {
+ return fmt.Errorf("could not check against an empty CPU list")
+ }
+
+ cListString := cpuSetRE.FindString(listString)
+ if len(cListString) == 0 {
+ return fmt.Errorf("invalid cpuset output '%s'", listString)
+ }
+	// Launch FreeBSDNumCPUHelper() with the specified CPU list.
+ cmd := exec.Command("cpuset", "-l", cListString, os.Args[0], "FreeBSDNumCPUHelper")
+ cmdline := strings.Join(cmd.Args, " ")
+ output, err := cmd.CombinedOutput()
+ if err != nil {
+		return fmt.Errorf("failed to launch child '%s', error: %s, output: %s", cmdline, err, output)
+ }
+
+	// NumCPU from FreeBSDNumCPUHelper comes with a trailing '\n'.
+	output = bytes.TrimSpace(output)
+	n, err := strconv.Atoi(string(output))
+	if err != nil {
+		return fmt.Errorf("failed to parse output from child '%s', error: %s, output: %s", cmdline, err, output)
+	}
+	if n != len(list) {
+		return fmt.Errorf("runtime.NumCPU() expected to be %d, got %d when run with CPU list %s", len(list), n, cListString)
+ }
+ return nil
+}
diff --git a/src/runtime/testdata/testprog/panicprint.go b/src/runtime/testdata/testprog/panicprint.go
new file mode 100644
index 0000000..c8deabe
--- /dev/null
+++ b/src/runtime/testdata/testprog/panicprint.go
@@ -0,0 +1,111 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
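+// Each helper below panics with a value of a distinct named type; the parent
+// test checks how the runtime prints these panic values.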
+type MyBool bool
+type MyComplex128 complex128
+type MyComplex64 complex64
+type MyFloat32 float32
+type MyFloat64 float64
+type MyInt int
+type MyInt8 int8
+type MyInt16 int16
+type MyInt32 int32
+type MyInt64 int64
+type MyString string
+type MyUint uint
+type MyUint8 uint8
+type MyUint16 uint16
+type MyUint32 uint32
+type MyUint64 uint64
+type MyUintptr uintptr
+
+func panicCustomComplex64() {
+ panic(MyComplex64(0.11 + 3i))
+}
+
+func panicCustomComplex128() {
+ panic(MyComplex128(32.1 + 10i))
+}
+
+func panicCustomString() {
+ panic(MyString("Panic"))
+}
+
+func panicCustomBool() {
+ panic(MyBool(true))
+}
+
+func panicCustomInt() {
+ panic(MyInt(93))
+}
+
+func panicCustomInt8() {
+ panic(MyInt8(93))
+}
+
+func panicCustomInt16() {
+ panic(MyInt16(93))
+}
+
+func panicCustomInt32() {
+ panic(MyInt32(93))
+}
+
+func panicCustomInt64() {
+ panic(MyInt64(93))
+}
+
+func panicCustomUint() {
+ panic(MyUint(93))
+}
+
+func panicCustomUint8() {
+ panic(MyUint8(93))
+}
+
+func panicCustomUint16() {
+ panic(MyUint16(93))
+}
+
+func panicCustomUint32() {
+ panic(MyUint32(93))
+}
+
+func panicCustomUint64() {
+ panic(MyUint64(93))
+}
+
+func panicCustomUintptr() {
+ panic(MyUintptr(93))
+}
+
+func panicCustomFloat64() {
+ panic(MyFloat64(-93.70))
+}
+
+func panicCustomFloat32() {
+ panic(MyFloat32(-93.70))
+}
+
+func init() {
+ register("panicCustomComplex64", panicCustomComplex64)
+ register("panicCustomComplex128", panicCustomComplex128)
+ register("panicCustomBool", panicCustomBool)
+ register("panicCustomFloat32", panicCustomFloat32)
+ register("panicCustomFloat64", panicCustomFloat64)
+ register("panicCustomInt", panicCustomInt)
+ register("panicCustomInt8", panicCustomInt8)
+ register("panicCustomInt16", panicCustomInt16)
+ register("panicCustomInt32", panicCustomInt32)
+ register("panicCustomInt64", panicCustomInt64)
+ register("panicCustomString", panicCustomString)
+ register("panicCustomUint", panicCustomUint)
+ register("panicCustomUint8", panicCustomUint8)
+ register("panicCustomUint16", panicCustomUint16)
+ register("panicCustomUint32", panicCustomUint32)
+ register("panicCustomUint64", panicCustomUint64)
+ register("panicCustomUintptr", panicCustomUintptr)
+}
diff --git a/src/runtime/testdata/testprog/panicrace.go b/src/runtime/testdata/testprog/panicrace.go
new file mode 100644
index 0000000..f058994
--- /dev/null
+++ b/src/runtime/testdata/testprog/panicrace.go
@@ -0,0 +1,27 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "runtime"
+ "sync"
+)
+
+func init() {
+ register("PanicRace", PanicRace)
+}
+
+func PanicRace() {
+ var wg sync.WaitGroup
+ wg.Add(1)
+ go func() {
+ defer func() {
+ wg.Done()
+ runtime.Gosched()
+ }()
+ panic("crash")
+ }()
+ wg.Wait()
+}
diff --git a/src/runtime/testdata/testprog/preempt.go b/src/runtime/testdata/testprog/preempt.go
new file mode 100644
index 0000000..1c74d0e
--- /dev/null
+++ b/src/runtime/testdata/testprog/preempt.go
@@ -0,0 +1,71 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "runtime"
+ "runtime/debug"
+ "sync/atomic"
+)
+
+func init() {
+ register("AsyncPreempt", AsyncPreempt)
+}
+
+func AsyncPreempt() {
+ // Run with just 1 GOMAXPROCS so the runtime is required to
+ // use scheduler preemption.
+ runtime.GOMAXPROCS(1)
+ // Disable GC so we have complete control of what we're testing.
+ debug.SetGCPercent(-1)
+
+ // Start a goroutine with no sync safe-points.
+ var ready, ready2 uint32
+ go func() {
+ for {
+ atomic.StoreUint32(&ready, 1)
+ dummy()
+ dummy()
+ }
+ }()
+ // Also start one with a frameless function.
+ // This is an especially interesting case for
+ // LR machines.
+ go func() {
+ atomic.AddUint32(&ready2, 1)
+ frameless()
+ }()
+ // Also test empty infinite loop.
+ go func() {
+ atomic.AddUint32(&ready2, 1)
+ for {
+ }
+ }()
+
+ // Wait for the goroutine to stop passing through sync
+ // safe-points.
+ for atomic.LoadUint32(&ready) == 0 || atomic.LoadUint32(&ready2) < 2 {
+ runtime.Gosched()
+ }
+
+ // Run a GC, which will have to stop the goroutine for STW and
+ // for stack scanning. If this doesn't work, the test will
+ // deadlock and timeout.
+ runtime.GC()
+
+ println("OK")
+}
+
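+// frameless is a tight leaf loop with no calls, so it has no synchronous
+// preemption points and can only be stopped by asynchronous preemption.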
+//go:noinline
+func frameless() {
+ for i := int64(0); i < 1<<62; i++ {
+ out += i * i * i * i * i * 12345
+ }
+}
+
+var out int64
+
+//go:noinline
+func dummy() {}
diff --git a/src/runtime/testdata/testprog/signal.go b/src/runtime/testdata/testprog/signal.go
new file mode 100644
index 0000000..417e105
--- /dev/null
+++ b/src/runtime/testdata/testprog/signal.go
@@ -0,0 +1,29 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !windows,!plan9
+
+package main
+
+import (
+ "syscall"
+ "time"
+)
+
+func init() {
+ register("SignalExitStatus", SignalExitStatus)
+}
+
+func SignalExitStatus() {
+ syscall.Kill(syscall.Getpid(), syscall.SIGTERM)
+
+ // Should die immediately, but we've seen flakiness on various
+ // systems (see issue 14063). It's possible that the signal is
+ // being delivered to a different thread and we are returning
+ // and exiting before that thread runs again. Give the program
+ // a little while to die to make sure we pick up the signal
+ // before we return and exit the program. The time here
+ // shouldn't matter--we'll never really sleep this long.
+ time.Sleep(time.Second)
+}
diff --git a/src/runtime/testdata/testprog/sleep.go b/src/runtime/testdata/testprog/sleep.go
new file mode 100644
index 0000000..86e2f6c
--- /dev/null
+++ b/src/runtime/testdata/testprog/sleep.go
@@ -0,0 +1,17 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "time"
+
+// for golang.org/issue/27250
+
+func init() {
+ register("After1", After1)
+}
+
+func After1() {
+ <-time.After(1 * time.Second)
+}
diff --git a/src/runtime/testdata/testprog/stringconcat.go b/src/runtime/testdata/testprog/stringconcat.go
new file mode 100644
index 0000000..f233e66
--- /dev/null
+++ b/src/runtime/testdata/testprog/stringconcat.go
@@ -0,0 +1,20 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "strings"
+
+func init() {
+ register("stringconcat", stringconcat)
+}
+
+func stringconcat() {
+ s0 := strings.Repeat("0", 1<<10)
+ s1 := strings.Repeat("1", 1<<10)
+ s2 := strings.Repeat("2", 1<<10)
+ s3 := strings.Repeat("3", 1<<10)
+ s := s0 + s1 + s2 + s3
+ panic(s)
+}
diff --git a/src/runtime/testdata/testprog/syscall_windows.go b/src/runtime/testdata/testprog/syscall_windows.go
new file mode 100644
index 0000000..b4b6644
--- /dev/null
+++ b/src/runtime/testdata/testprog/syscall_windows.go
@@ -0,0 +1,70 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "internal/syscall/windows"
+ "runtime"
+ "sync"
+ "syscall"
+ "unsafe"
+)
+
+func init() {
+ register("RaiseException", RaiseException)
+ register("ZeroDivisionException", ZeroDivisionException)
+ register("StackMemory", StackMemory)
+}
+
+func RaiseException() {
+ const EXCEPTION_NONCONTINUABLE = 1
+ mod := syscall.MustLoadDLL("kernel32.dll")
+ proc := mod.MustFindProc("RaiseException")
+ proc.Call(0xbad, EXCEPTION_NONCONTINUABLE, 0, 0)
+ println("RaiseException should not return")
+}
+
+func ZeroDivisionException() {
+ x := 1
+ y := 0
+ z := x / y
+ println(z)
+}
+
+func getPagefileUsage() (uintptr, error) {
+ p, err := syscall.GetCurrentProcess()
+ if err != nil {
+ return 0, err
+ }
+ var m windows.PROCESS_MEMORY_COUNTERS
+ err = windows.GetProcessMemoryInfo(p, &m, uint32(unsafe.Sizeof(m)))
+ if err != nil {
+ return 0, err
+ }
+ return m.PagefileUsage, nil
+}
+
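+// StackMemory estimates the per-thread memory commit: it parks goroutines on
+// locked OS threads and prints the growth in pagefile usage divided by the
+// thread count.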
+func StackMemory() {
+ mem1, err := getPagefileUsage()
+ if err != nil {
+ panic(err)
+ }
+ const threadCount = 100
+ var wg sync.WaitGroup
+ for i := 0; i < threadCount; i++ {
+ wg.Add(1)
+ go func() {
+ runtime.LockOSThread()
+ wg.Done()
+ select {}
+ }()
+ }
+ wg.Wait()
+ mem2, err := getPagefileUsage()
+ if err != nil {
+ panic(err)
+ }
+ print((mem2 - mem1) / threadCount)
+}
diff --git a/src/runtime/testdata/testprog/syscalls.go b/src/runtime/testdata/testprog/syscalls.go
new file mode 100644
index 0000000..098d5ca
--- /dev/null
+++ b/src/runtime/testdata/testprog/syscalls.go
@@ -0,0 +1,11 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "errors"
+)
+
+var errNotPermitted = errors.New("operation not permitted")
diff --git a/src/runtime/testdata/testprog/syscalls_linux.go b/src/runtime/testdata/testprog/syscalls_linux.go
new file mode 100644
index 0000000..48f8014
--- /dev/null
+++ b/src/runtime/testdata/testprog/syscalls_linux.go
@@ -0,0 +1,58 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "os"
+ "syscall"
+)
+
+func gettid() int {
+ return syscall.Gettid()
+}
+
+func tidExists(tid int) (exists, supported bool) {
+ stat, err := os.ReadFile(fmt.Sprintf("/proc/self/task/%d/stat", tid))
+ if os.IsNotExist(err) {
+ return false, true
+ }
+ // Check if it's a zombie thread.
+ state := bytes.Fields(stat)[2]
+ return !(len(state) == 1 && state[0] == 'Z'), true
+}
+
+func getcwd() (string, error) {
+ if !syscall.ImplementsGetwd {
+ return "", nil
+ }
+ // Use the syscall to get the current working directory.
+ // This is imperative for checking for OS thread state
+ // after an unshare since os.Getwd might just check the
+ // environment, or use some other mechanism.
+ var buf [4096]byte
+ n, err := syscall.Getcwd(buf[:])
+ if err != nil {
+ return "", err
+ }
+ // Subtract one for null terminator.
+ return string(buf[:n-1]), nil
+}
+
+func unshareFs() error {
+ err := syscall.Unshare(syscall.CLONE_FS)
+ if err != nil {
+ errno, ok := err.(syscall.Errno)
+ if ok && errno == syscall.EPERM {
+ return errNotPermitted
+ }
+ }
+ return err
+}
+
+func chdir(path string) error {
+ return syscall.Chdir(path)
+}
diff --git a/src/runtime/testdata/testprog/syscalls_none.go b/src/runtime/testdata/testprog/syscalls_none.go
new file mode 100644
index 0000000..7f8ded3
--- /dev/null
+++ b/src/runtime/testdata/testprog/syscalls_none.go
@@ -0,0 +1,27 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !linux
+
+package main
+
+func gettid() int {
+ return 0
+}
+
+func tidExists(tid int) (exists, supported bool) {
+ return false, false
+}
+
+func getcwd() (string, error) {
+ return "", nil
+}
+
+func unshareFs() error {
+ return nil
+}
+
+func chdir(path string) error {
+ return nil
+}
diff --git a/src/runtime/testdata/testprog/timeprof.go b/src/runtime/testdata/testprog/timeprof.go
new file mode 100644
index 0000000..1e90af4
--- /dev/null
+++ b/src/runtime/testdata/testprog/timeprof.go
@@ -0,0 +1,45 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "fmt"
+ "os"
+ "runtime/pprof"
+ "time"
+)
+
+func init() {
+ register("TimeProf", TimeProf)
+}
+
+func TimeProf() {
+ f, err := os.CreateTemp("", "timeprof")
+ if err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ if err := pprof.StartCPUProfile(f); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ t0 := time.Now()
+ // We should get a profiling signal 100 times a second,
+ // so running for 1/10 second should be sufficient.
+ for time.Since(t0) < time.Second/10 {
+ }
+
+ pprof.StopCPUProfile()
+
+ name := f.Name()
+ if err := f.Close(); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ fmt.Println(name)
+}
diff --git a/src/runtime/testdata/testprog/traceback_ancestors.go b/src/runtime/testdata/testprog/traceback_ancestors.go
new file mode 100644
index 0000000..0ee402c
--- /dev/null
+++ b/src/runtime/testdata/testprog/traceback_ancestors.go
@@ -0,0 +1,99 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "runtime"
+ "strings"
+)
+
+func init() {
+ register("TracebackAncestors", TracebackAncestors)
+}
+
+const numGoroutines = 3
+const numFrames = 2
+
+func TracebackAncestors() {
+ w := make(chan struct{})
+ recurseThenCallGo(w, numGoroutines, numFrames, true)
+ <-w
+ printStack()
+ close(w)
+}
+
+var ignoreGoroutines = make(map[string]bool)
+
+func printStack() {
+ buf := make([]byte, 1024)
+ for {
+ n := runtime.Stack(buf, true)
+ if n < len(buf) {
+ tb := string(buf[:n])
+
+ // Delete any ignored goroutines, if present.
+ pos := 0
+ for pos < len(tb) {
+ next := pos + strings.Index(tb[pos:], "\n\n")
+ if next < pos {
+ next = len(tb)
+ } else {
+ next += len("\n\n")
+ }
+
+ if strings.HasPrefix(tb[pos:], "goroutine ") {
+ id := tb[pos+len("goroutine "):]
+ id = id[:strings.IndexByte(id, ' ')]
+ if ignoreGoroutines[id] {
+ tb = tb[:pos] + tb[next:]
+ next = pos
+ }
+ }
+ pos = next
+ }
+
+ fmt.Print(tb)
+ return
+ }
+ buf = make([]byte, 2*len(buf))
+ }
+}
+
+func recurseThenCallGo(w chan struct{}, goroutines int, frames int, main bool) {
+	if goroutines == 0 {
+		// Signal to TracebackAncestors that we are done recursing and starting goroutines.
+		w <- struct{}{}
+		<-w
+		return
+	}
+	if frames == 0 {
+		// Record which goroutine this is so we can ignore it
+		// in the traceback if it hasn't finished exiting by
+		// the time we printStack.
+		if !main {
+			ignoreGoroutines[goroutineID()] = true
+		}
+
+		// Start the next goroutine now that there are no more recursions left
+		// for this current goroutine.
+		go recurseThenCallGo(w, goroutines-1, numFrames, false)
+		return
+	}
+	recurseThenCallGo(w, goroutines, frames-1, main)
+}
+
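+// goroutineID returns the current goroutine's ID, parsed from the
+// "goroutine N [...]" header of its own stack trace.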
+func goroutineID() string {
+ buf := make([]byte, 128)
+ runtime.Stack(buf, false)
+ const prefix = "goroutine "
+ if !bytes.HasPrefix(buf, []byte(prefix)) {
+ panic(fmt.Sprintf("expected %q at beginning of traceback:\n%s", prefix, buf))
+ }
+ buf = buf[len(prefix):]
+ n := bytes.IndexByte(buf, ' ')
+ return string(buf[:n])
+}
diff --git a/src/runtime/testdata/testprog/vdso.go b/src/runtime/testdata/testprog/vdso.go
new file mode 100644
index 0000000..d2a300d
--- /dev/null
+++ b/src/runtime/testdata/testprog/vdso.go
@@ -0,0 +1,54 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Invoke the signal handler in the VDSO context (see issue 32912).
+
+package main
+
+import (
+ "fmt"
+ "os"
+ "runtime/pprof"
+ "time"
+)
+
+func init() {
+ register("SignalInVDSO", signalInVDSO)
+}
+
+func signalInVDSO() {
+ f, err := os.CreateTemp("", "timeprofnow")
+ if err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ if err := pprof.StartCPUProfile(f); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ t0 := time.Now()
+ t1 := t0
+ // We should get a profiling signal 100 times a second,
+ // so running for 1 second should be sufficient.
+ for t1.Sub(t0) < time.Second {
+ t1 = time.Now()
+ }
+
+ pprof.StopCPUProfile()
+
+ name := f.Name()
+ if err := f.Close(); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ if err := os.Remove(name); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ fmt.Println("success")
+}
diff --git a/src/runtime/testdata/testprogcgo/aprof.go b/src/runtime/testdata/testprogcgo/aprof.go
new file mode 100644
index 0000000..aabca9e
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/aprof.go
@@ -0,0 +1,53 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+// Test that SIGPROF received in C code does not crash the process
+// looking for the C code's func pointer.
+
+// The failure occurred when the profiled function was the first C function.
+// The exported functions are the first C functions, so we call one of those.
+
+// extern void GoNop();
+import "C"
+
+import (
+ "bytes"
+ "fmt"
+ "runtime/pprof"
+ "time"
+)
+
+func init() {
+ register("CgoCCodeSIGPROF", CgoCCodeSIGPROF)
+}
+
+//export GoNop
+func GoNop() {}
+
+func CgoCCodeSIGPROF() {
+ c := make(chan bool)
+ go func() {
+ <-c
+ start := time.Now()
+ for i := 0; i < 1e7; i++ {
+ if i%1000 == 0 {
+ if time.Since(start) > time.Second {
+ break
+ }
+ }
+ C.GoNop()
+ }
+ c <- true
+ }()
+
+ var buf bytes.Buffer
+ pprof.StartCPUProfile(&buf)
+ c <- true
+ <-c
+ pprof.StopCPUProfile()
+
+ fmt.Println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/bigstack_windows.c b/src/runtime/testdata/testprogcgo/bigstack_windows.c
new file mode 100644
index 0000000..cd85ac8
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/bigstack_windows.c
@@ -0,0 +1,46 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This test source is used by both TestBigStackCallbackCgo (linked
+// directly into the Go binary) and TestBigStackCallbackSyscall
+// (compiled into a DLL).
+
+#include <windows.h>
+#include <stdio.h>
+
+#ifndef STACK_SIZE_PARAM_IS_A_RESERVATION
+#define STACK_SIZE_PARAM_IS_A_RESERVATION 0x00010000
+#endif
+
+typedef void callback(char*);
+
+// Allocate a stack that's much larger than the default.
+static const int STACK_SIZE = 16<<20;
+
+static callback *bigStackCallback;
+
+static void useStack(int bytes) {
+ // Windows doesn't like huge frames, so we grow the stack 64k at a time.
+ char x[64<<10];
+ if (bytes < sizeof x) {
+ bigStackCallback(x);
+ } else {
+ useStack(bytes - sizeof x);
+ }
+}
+
+static DWORD WINAPI threadEntry(LPVOID lpParam) {
+ useStack(STACK_SIZE - (128<<10));
+ return 0;
+}
+
+void bigStack(callback *cb) {
+ bigStackCallback = cb;
+ HANDLE hThread = CreateThread(NULL, STACK_SIZE, threadEntry, NULL, STACK_SIZE_PARAM_IS_A_RESERVATION, NULL);
+ if (hThread == NULL) {
+ fprintf(stderr, "CreateThread failed\n");
+ exit(1);
+ }
+ WaitForSingleObject(hThread, INFINITE);
+}
diff --git a/src/runtime/testdata/testprogcgo/bigstack_windows.go b/src/runtime/testdata/testprogcgo/bigstack_windows.go
new file mode 100644
index 0000000..f58fcf9
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/bigstack_windows.go
@@ -0,0 +1,27 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+/*
+typedef void callback(char*);
+extern void goBigStack1(char*);
+extern void bigStack(callback*);
+*/
+import "C"
+
+func init() {
+ register("BigStack", BigStack)
+}
+
+func BigStack() {
+ // Create a large thread stack and call back into Go to test
+ // if Go correctly determines the stack bounds.
+ C.bigStack((*C.callback)(C.goBigStack1))
+}
+
+//export goBigStack1
+func goBigStack1(x *C.char) {
+ println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/callback.go b/src/runtime/testdata/testprogcgo/callback.go
new file mode 100644
index 0000000..be0409f
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/callback.go
@@ -0,0 +1,93 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+package main
+
+/*
+#include <pthread.h>
+
+void go_callback();
+
+static void *thr(void *arg) {
+ go_callback();
+ return 0;
+}
+
+static void foo() {
+ pthread_t th;
+ pthread_attr_t attr;
+ pthread_attr_init(&attr);
+ pthread_attr_setstacksize(&attr, 256 << 10);
+ pthread_create(&th, &attr, thr, 0);
+ pthread_join(th, 0);
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "os"
+ "runtime"
+)
+
+func init() {
+ register("CgoCallbackGC", CgoCallbackGC)
+}
+
+//export go_callback
+func go_callback() {
+ runtime.GC()
+ grow()
+ runtime.GC()
+}
+
+var cnt int
+
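+// grow recurses roughly 10000 frames deep, filling the stack with live
+// pointers so stack growth and stack scanning both get exercised.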
+func grow() {
+ x := 10000
+ sum := 0
+ if grow1(&x, &sum) == 0 {
+ panic("bad")
+ }
+}
+
+func grow1(x, sum *int) int {
+ if *x == 0 {
+ return *sum + 1
+ }
+ *x--
+ sum1 := *sum + *x
+ return grow1(x, &sum1)
+}
+
+func CgoCallbackGC() {
+ P := 100
+ if os.Getenv("RUNTIME_TESTING_SHORT") != "" {
+ P = 10
+ }
+ done := make(chan bool)
+ // allocate a bunch of stack frames and spray them with pointers
+ for i := 0; i < P; i++ {
+ go func() {
+ grow()
+ done <- true
+ }()
+ }
+ for i := 0; i < P; i++ {
+ <-done
+ }
+ // now give these stack frames to cgo callbacks
+ for i := 0; i < P; i++ {
+ go func() {
+ C.foo()
+ done <- true
+ }()
+ }
+ for i := 0; i < P; i++ {
+ <-done
+ }
+ fmt.Printf("OK\n")
+}
diff --git a/src/runtime/testdata/testprogcgo/catchpanic.go b/src/runtime/testdata/testprogcgo/catchpanic.go
new file mode 100644
index 0000000..55a606d
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/catchpanic.go
@@ -0,0 +1,46 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+package main
+
+/*
+#include <signal.h>
+#include <stdlib.h>
+#include <string.h>
+
+static void abrthandler(int signum) {
+ if (signum == SIGABRT) {
+ exit(0); // success
+ }
+}
+
+void registerAbortHandler() {
+ struct sigaction act;
+ memset(&act, 0, sizeof act);
+ act.sa_handler = abrthandler;
+ sigaction(SIGABRT, &act, NULL);
+}
+
+static void __attribute__ ((constructor)) sigsetup(void) {
+ if (getenv("CGOCATCHPANIC_EARLY_HANDLER") == NULL)
+ return;
+ registerAbortHandler();
+}
+*/
+import "C"
+import "os"
+
+func init() {
+ register("CgoCatchPanic", CgoCatchPanic)
+}
+
+// Test that the SIGABRT raised by panic can be caught by an early signal handler.
+func CgoCatchPanic() {
+ if _, ok := os.LookupEnv("CGOCATCHPANIC_EARLY_HANDLER"); !ok {
+ C.registerAbortHandler()
+ }
+ panic("catch me")
+}
diff --git a/src/runtime/testdata/testprogcgo/cgo.go b/src/runtime/testdata/testprogcgo/cgo.go
new file mode 100644
index 0000000..a587db3
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/cgo.go
@@ -0,0 +1,108 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+/*
+void foo1(void) {}
+void foo2(void* p) {}
+*/
+import "C"
+import (
+ "fmt"
+ "os"
+ "runtime"
+ "strconv"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ register("CgoSignalDeadlock", CgoSignalDeadlock)
+ register("CgoTraceback", CgoTraceback)
+ register("CgoCheckBytes", CgoCheckBytes)
+}
+
+func CgoSignalDeadlock() {
+ runtime.GOMAXPROCS(100)
+ ping := make(chan bool)
+ go func() {
+ for i := 0; ; i++ {
+ runtime.Gosched()
+ select {
+ case done := <-ping:
+ if done {
+ ping <- true
+ return
+ }
+ ping <- true
+ default:
+ }
+ func() {
+ defer func() {
+ recover()
+ }()
+ var s *string
+ *s = ""
+ fmt.Printf("continued after expected panic\n")
+ }()
+ }
+ }()
+ time.Sleep(time.Millisecond)
+ start := time.Now()
+ var times []time.Duration
+ n := 64
+ if os.Getenv("RUNTIME_TEST_SHORT") != "" {
+ n = 16
+ }
+ for i := 0; i < n; i++ {
+ go func() {
+ runtime.LockOSThread()
+ select {}
+ }()
+ go func() {
+ runtime.LockOSThread()
+ select {}
+ }()
+ time.Sleep(time.Millisecond)
+ ping <- false
+ select {
+ case <-ping:
+ times = append(times, time.Since(start))
+ case <-time.After(time.Second):
+ fmt.Printf("HANG 1 %v\n", times)
+ return
+ }
+ }
+ ping <- true
+ select {
+ case <-ping:
+ case <-time.After(time.Second):
+ fmt.Printf("HANG 2 %v\n", times)
+ return
+ }
+ fmt.Printf("OK\n")
+}
+
+func CgoTraceback() {
+ C.foo1()
+ buf := make([]byte, 1)
+ runtime.Stack(buf, true)
+ fmt.Printf("OK\n")
+}
+
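+// CgoCheckBytes repeatedly passes a Go buffer to C; the parent test uses it to
+// compare run times with the cgo pointer checks enabled and disabled.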
+func CgoCheckBytes() {
+ try, _ := strconv.Atoi(os.Getenv("GO_CGOCHECKBYTES_TRY"))
+ if try <= 0 {
+ try = 1
+ }
+ b := make([]byte, 1e6*try)
+ start := time.Now()
+ for i := 0; i < 1e3*try; i++ {
+ C.foo2(unsafe.Pointer(&b[0]))
+ if time.Since(start) > time.Second {
+ break
+ }
+ }
+}
diff --git a/src/runtime/testdata/testprogcgo/crash.go b/src/runtime/testdata/testprogcgo/crash.go
new file mode 100644
index 0000000..4d83132
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/crash.go
@@ -0,0 +1,45 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "fmt"
+ "runtime"
+)
+
+func init() {
+ register("Crash", Crash)
+}
+
+func test(name string) {
+ defer func() {
+ if x := recover(); x != nil {
+ fmt.Printf(" recovered")
+ }
+ fmt.Printf(" done\n")
+ }()
+ fmt.Printf("%s:", name)
+ var s *string
+ _ = *s
+ fmt.Print("SHOULD NOT BE HERE")
+}
+
+func testInNewThread(name string) {
+ c := make(chan bool)
+ go func() {
+ runtime.LockOSThread()
+ test(name)
+ c <- true
+ }()
+ <-c
+}
+
+func Crash() {
+ runtime.LockOSThread()
+ test("main")
+ testInNewThread("new-thread")
+ testInNewThread("second-new-thread")
+ test("main-again")
+}
diff --git a/src/runtime/testdata/testprogcgo/deadlock.go b/src/runtime/testdata/testprogcgo/deadlock.go
new file mode 100644
index 0000000..2cc68a8
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/deadlock.go
@@ -0,0 +1,30 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+/*
+char *geterror() {
+ return "cgo error";
+}
+*/
+import "C"
+import (
+ "fmt"
+)
+
+func init() {
+ register("CgoPanicDeadlock", CgoPanicDeadlock)
+}
+
+type cgoError struct{}
+
+func (cgoError) Error() string {
+ fmt.Print("") // necessary to trigger the deadlock
+ return C.GoString(C.geterror())
+}
+
+func CgoPanicDeadlock() {
+ panic(cgoError{})
+}
diff --git a/src/runtime/testdata/testprogcgo/dll_windows.go b/src/runtime/testdata/testprogcgo/dll_windows.go
new file mode 100644
index 0000000..25380fb
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/dll_windows.go
@@ -0,0 +1,25 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+/*
+#include <windows.h>
+
+DWORD getthread() {
+ return GetCurrentThreadId();
+}
+*/
+import "C"
+import "runtime/testdata/testprogcgo/windows"
+
+func init() {
+ register("CgoDLLImportsMain", CgoDLLImportsMain)
+}
+
+func CgoDLLImportsMain() {
+ C.getthread()
+ windows.GetThread()
+ println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/dropm.go b/src/runtime/testdata/testprogcgo/dropm.go
new file mode 100644
index 0000000..9e782f5
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/dropm.go
@@ -0,0 +1,59 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+// Test that a sequence of callbacks from C to Go get the same m.
+// This failed to be true on arm and arm64, which was the root cause
+// of issue 13881.
+
+package main
+
+/*
+#include <stddef.h>
+#include <pthread.h>
+
+extern void GoCheckM();
+
+static void* thread(void* arg __attribute__ ((unused))) {
+ GoCheckM();
+ return NULL;
+}
+
+static void CheckM() {
+ pthread_t tid;
+ pthread_create(&tid, NULL, thread, NULL);
+ pthread_join(tid, NULL);
+ pthread_create(&tid, NULL, thread, NULL);
+ pthread_join(tid, NULL);
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "os"
+)
+
+func init() {
+ register("EnsureDropM", EnsureDropM)
+}
+
+var savedM uintptr
+
+//export GoCheckM
+func GoCheckM() {
+ m := runtime_getm_for_test()
+ if savedM == 0 {
+ savedM = m
+ } else if savedM != m {
+ fmt.Printf("m == %x want %x\n", m, savedM)
+ os.Exit(1)
+ }
+}
+
+func EnsureDropM() {
+ C.CheckM()
+ fmt.Println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/dropm_stub.go b/src/runtime/testdata/testprogcgo/dropm_stub.go
new file mode 100644
index 0000000..f7f142c
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/dropm_stub.go
@@ -0,0 +1,11 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import _ "unsafe" // for go:linkname
+
+// Defined in the runtime package.
+//go:linkname runtime_getm_for_test runtime.getm
+func runtime_getm_for_test() uintptr
diff --git a/src/runtime/testdata/testprogcgo/eintr.go b/src/runtime/testdata/testprogcgo/eintr.go
new file mode 100644
index 0000000..1722a75
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/eintr.go
@@ -0,0 +1,245 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+package main
+
+/*
+#include <errno.h>
+#include <signal.h>
+#include <string.h>
+
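+// clearRestart reinstalls the current handler for sig with the SA_RESTART
+// flag cleared, so interrupted system calls fail with EINTR instead of
+// restarting.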
+static int clearRestart(int sig) {
+ struct sigaction sa;
+
+ memset(&sa, 0, sizeof sa);
+ if (sigaction(sig, NULL, &sa) < 0) {
+ return errno;
+ }
+ sa.sa_flags &=~ SA_RESTART;
+ if (sigaction(sig, &sa, NULL) < 0) {
+ return errno;
+ }
+ return 0;
+}
+*/
+import "C"
+
+import (
+ "bytes"
+ "errors"
+ "fmt"
+ "io"
+ "log"
+ "net"
+ "os"
+ "os/exec"
+ "sync"
+ "syscall"
+ "time"
+)
+
+func init() {
+ register("EINTR", EINTR)
+ register("Block", Block)
+}
+
+// Test various operations when a signal handler is installed without
+// the SA_RESTART flag. This tests that the os and net APIs handle EINTR.
+func EINTR() {
+ if errno := C.clearRestart(C.int(syscall.SIGURG)); errno != 0 {
+ log.Fatal(syscall.Errno(errno))
+ }
+ if errno := C.clearRestart(C.int(syscall.SIGWINCH)); errno != 0 {
+ log.Fatal(syscall.Errno(errno))
+ }
+ if errno := C.clearRestart(C.int(syscall.SIGCHLD)); errno != 0 {
+ log.Fatal(syscall.Errno(errno))
+ }
+
+ var wg sync.WaitGroup
+ testPipe(&wg)
+ testNet(&wg)
+ testExec(&wg)
+ wg.Wait()
+ fmt.Println("OK")
+}
+
+// spin does CPU-bound spinning and allocating for a millisecond,
+// to get a SIGURG.
+//go:noinline
+func spin() (float64, []byte) {
+ stop := time.Now().Add(time.Millisecond)
+ r1 := 0.0
+ r2 := make([]byte, 200)
+ for time.Now().Before(stop) {
+ for i := 1; i < 1e6; i++ {
+ r1 += r1 / float64(i)
+ r2 = append(r2, bytes.Repeat([]byte{byte(i)}, 100)...)
+ r2 = r2[100:]
+ }
+ }
+ return r1, r2
+}
+
+// winch sends a few SIGWINCH signals to the process.
+func winch() {
+ ticker := time.NewTicker(100 * time.Microsecond)
+ defer ticker.Stop()
+ pid := syscall.Getpid()
+ for n := 10; n > 0; n-- {
+ syscall.Kill(pid, syscall.SIGWINCH)
+ <-ticker.C
+ }
+}
+
+// sendSomeSignals triggers a few SIGURG and SIGWINCH signals.
+func sendSomeSignals() {
+ done := make(chan struct{})
+ go func() {
+ spin()
+ close(done)
+ }()
+ winch()
+ <-done
+}
+
+// testPipe tests pipe operations.
+func testPipe(wg *sync.WaitGroup) {
+ r, w, err := os.Pipe()
+ if err != nil {
+ log.Fatal(err)
+ }
+ if err := syscall.SetNonblock(int(r.Fd()), false); err != nil {
+ log.Fatal(err)
+ }
+ if err := syscall.SetNonblock(int(w.Fd()), false); err != nil {
+ log.Fatal(err)
+ }
+ wg.Add(2)
+ go func() {
+ defer wg.Done()
+ defer w.Close()
+ // Spin before calling Write so that the first ReadFull
+ // in the other goroutine will likely be interrupted
+ // by a signal.
+ sendSomeSignals()
+ // This Write will likely be interrupted by a signal
+ // as the other goroutine spins in the middle of reading.
+ // We write enough data that we should always fill the
+ // pipe buffer and need multiple write system calls.
+ if _, err := w.Write(bytes.Repeat([]byte{0}, 2<<20)); err != nil {
+ log.Fatal(err)
+ }
+ }()
+ go func() {
+ defer wg.Done()
+ defer r.Close()
+ b := make([]byte, 1<<20)
+ // This ReadFull will likely be interrupted by a signal,
+ // as the other goroutine spins before writing anything.
+ if _, err := io.ReadFull(r, b); err != nil {
+ log.Fatal(err)
+ }
+ // Spin after reading half the data so that the Write
+ // in the other goroutine will likely be interrupted
+ // before it completes.
+ sendSomeSignals()
+ if _, err := io.ReadFull(r, b); err != nil {
+ log.Fatal(err)
+ }
+ }()
+}
+
+// testNet tests network operations.
+func testNet(wg *sync.WaitGroup) {
+ ln, err := net.Listen("tcp4", "127.0.0.1:0")
+ if err != nil {
+ if errors.Is(err, syscall.EAFNOSUPPORT) || errors.Is(err, syscall.EPROTONOSUPPORT) {
+ return
+ }
+ log.Fatal(err)
+ }
+ wg.Add(2)
+ go func() {
+ defer wg.Done()
+ defer ln.Close()
+ c, err := ln.Accept()
+ if err != nil {
+ log.Fatal(err)
+ }
+ defer c.Close()
+ cf, err := c.(*net.TCPConn).File()
+ if err != nil {
+ log.Fatal(err)
+ }
+ defer cf.Close()
+ if err := syscall.SetNonblock(int(cf.Fd()), false); err != nil {
+ log.Fatal(err)
+ }
+ // See comments in testPipe.
+ sendSomeSignals()
+ if _, err := cf.Write(bytes.Repeat([]byte{0}, 2<<20)); err != nil {
+ log.Fatal(err)
+ }
+ }()
+ go func() {
+ defer wg.Done()
+ sendSomeSignals()
+ c, err := net.Dial("tcp", ln.Addr().String())
+ if err != nil {
+ log.Fatal(err)
+ }
+ defer c.Close()
+ cf, err := c.(*net.TCPConn).File()
+ if err != nil {
+ log.Fatal(err)
+ }
+ defer cf.Close()
+ if err := syscall.SetNonblock(int(cf.Fd()), false); err != nil {
+ log.Fatal(err)
+ }
+ // See comments in testPipe.
+ b := make([]byte, 1<<20)
+ if _, err := io.ReadFull(cf, b); err != nil {
+ log.Fatal(err)
+ }
+ sendSomeSignals()
+ if _, err := io.ReadFull(cf, b); err != nil {
+ log.Fatal(err)
+ }
+ }()
+}
+
+func testExec(wg *sync.WaitGroup) {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ cmd := exec.Command(os.Args[0], "Block")
+ stdin, err := cmd.StdinPipe()
+ if err != nil {
+ log.Fatal(err)
+ }
+ cmd.Stderr = new(bytes.Buffer)
+ cmd.Stdout = cmd.Stderr
+ if err := cmd.Start(); err != nil {
+ log.Fatal(err)
+ }
+
+ go func() {
+ sendSomeSignals()
+ stdin.Close()
+ }()
+
+ if err := cmd.Wait(); err != nil {
+ log.Fatalf("%v:\n%s", err, cmd.Stdout)
+ }
+ }()
+}
+
+// Block blocks until stdin is closed.
+func Block() {
+ io.Copy(io.Discard, os.Stdin)
+}
diff --git a/src/runtime/testdata/testprogcgo/exec.go b/src/runtime/testdata/testprogcgo/exec.go
new file mode 100644
index 0000000..15723c7
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/exec.go
@@ -0,0 +1,106 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+package main
+
+/*
+#include <stddef.h>
+#include <signal.h>
+#include <pthread.h>
+
+// Save the signal mask at startup so that we see what it is before
+// the Go runtime starts setting up signals.
+
+static sigset_t mask;
+
+static void init(void) __attribute__ ((constructor));
+
+static void init() {
+ sigemptyset(&mask);
+ pthread_sigmask(SIG_SETMASK, NULL, &mask);
+}
+
+int SIGINTBlocked() {
+ return sigismember(&mask, SIGINT);
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "io/fs"
+ "os"
+ "os/exec"
+ "os/signal"
+ "sync"
+ "syscall"
+)
+
+func init() {
+ register("CgoExecSignalMask", CgoExecSignalMask)
+}
+
+func CgoExecSignalMask() {
+ if len(os.Args) > 2 && os.Args[2] == "testsigint" {
+ if C.SIGINTBlocked() != 0 {
+ os.Exit(1)
+ }
+ os.Exit(0)
+ }
+
+ c := make(chan os.Signal, 1)
+ signal.Notify(c, syscall.SIGTERM)
+ go func() {
+ for range c {
+ }
+ }()
+
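+	// Start several goroutines that repeatedly register and unregister a
+	// SIGUSR1 handler while sending SIGTERM to this process and exec'ing
+	// child processes; each child verifies (via "testsigint") that it did
+	// not start with SIGINT blocked.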
+ const goCount = 10
+ const execCount = 10
+ var wg sync.WaitGroup
+ wg.Add(goCount*execCount + goCount)
+ for i := 0; i < goCount; i++ {
+ go func() {
+ defer wg.Done()
+ for j := 0; j < execCount; j++ {
+ c2 := make(chan os.Signal, 1)
+ signal.Notify(c2, syscall.SIGUSR1)
+ syscall.Kill(os.Getpid(), syscall.SIGTERM)
+ go func(j int) {
+ defer wg.Done()
+ cmd := exec.Command(os.Args[0], "CgoExecSignalMask", "testsigint")
+ cmd.Stdin = os.Stdin
+ cmd.Stdout = os.Stdout
+ cmd.Stderr = os.Stderr
+ if err := cmd.Run(); err != nil {
+ // An overloaded system
+ // may fail with EAGAIN.
+ // This doesn't tell us
+ // anything useful; ignore it.
+ // Issue #27731.
+ if isEAGAIN(err) {
+ return
+ }
+ fmt.Printf("iteration %d: %v\n", j, err)
+ os.Exit(1)
+ }
+ }(j)
+ signal.Stop(c2)
+ }
+ }()
+ }
+ wg.Wait()
+
+ fmt.Println("OK")
+}
+
+// isEAGAIN reports whether err is an EAGAIN error from a process execution.
+func isEAGAIN(err error) bool {
+ if p, ok := err.(*fs.PathError); ok {
+ err = p.Err
+ }
+ return err == syscall.EAGAIN
+}
diff --git a/src/runtime/testdata/testprogcgo/lockosthread.c b/src/runtime/testdata/testprogcgo/lockosthread.c
new file mode 100644
index 0000000..b10cc4f
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/lockosthread.c
@@ -0,0 +1,13 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+#include <stdint.h>
+
+uint32_t threadExited;
+
+void setExited(void *x) {
+ __sync_fetch_and_add(&threadExited, 1);
+}
diff --git a/src/runtime/testdata/testprogcgo/lockosthread.go b/src/runtime/testdata/testprogcgo/lockosthread.go
new file mode 100644
index 0000000..36423d9
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/lockosthread.go
@@ -0,0 +1,111 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+package main
+
+import (
+ "os"
+ "runtime"
+ "sync/atomic"
+ "time"
+ "unsafe"
+)
+
+/*
+#include <pthread.h>
+#include <stdint.h>
+
+extern uint32_t threadExited;
+
+void setExited(void *x);
+*/
+import "C"
+
+var mainThread C.pthread_t
+
+func init() {
+ registerInit("LockOSThreadMain", func() {
+ // init is guaranteed to run on the main thread.
+ mainThread = C.pthread_self()
+ })
+ register("LockOSThreadMain", LockOSThreadMain)
+
+ registerInit("LockOSThreadAlt", func() {
+ // Lock the OS thread now so main runs on the main thread.
+ runtime.LockOSThread()
+ })
+ register("LockOSThreadAlt", LockOSThreadAlt)
+}
+
+func LockOSThreadMain() {
+ // This requires GOMAXPROCS=1 from the beginning to reliably
+ // start a goroutine on the main thread.
+ if runtime.GOMAXPROCS(-1) != 1 {
+ println("requires GOMAXPROCS=1")
+ os.Exit(1)
+ }
+
+ ready := make(chan bool, 1)
+ go func() {
+ // Because GOMAXPROCS=1, this *should* be on the main
+ // thread. Stay there.
+ runtime.LockOSThread()
+ self := C.pthread_self()
+ if C.pthread_equal(mainThread, self) == 0 {
+ println("failed to start goroutine on main thread")
+ os.Exit(1)
+ }
+ // Exit with the thread locked, which should exit the
+ // main thread.
+ ready <- true
+ }()
+ <-ready
+ time.Sleep(1 * time.Millisecond)
+ // Check that this goroutine is still running on a different
+ // thread.
+ self := C.pthread_self()
+ if C.pthread_equal(mainThread, self) != 0 {
+ println("goroutine migrated to locked thread")
+ os.Exit(1)
+ }
+ println("OK")
+}
+
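+// LockOSThreadAlt checks that a goroutine that exits while locked to a
+// non-main thread takes that thread down with it: it registers a pthread
+// destructor on the locked thread and then waits for the destructor to
+// signal that the thread has exited.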
+func LockOSThreadAlt() {
+ // This is running locked to the main OS thread.
+
+ var subThread C.pthread_t
+ ready := make(chan bool, 1)
+ C.threadExited = 0
+ go func() {
+ // This goroutine must be running on a new thread.
+ runtime.LockOSThread()
+ subThread = C.pthread_self()
+ // Register a pthread destructor so we can tell this
+ // thread has exited.
+ var key C.pthread_key_t
+ C.pthread_key_create(&key, (*[0]byte)(unsafe.Pointer(C.setExited)))
+ C.pthread_setspecific(key, unsafe.Pointer(new(int)))
+ ready <- true
+ // Exit with the thread locked.
+ }()
+ <-ready
+ for i := 0; i < 100; i++ {
+ time.Sleep(1 * time.Millisecond)
+ // Check that this goroutine is running on a different thread.
+ self := C.pthread_self()
+ if C.pthread_equal(subThread, self) != 0 {
+ println("locked thread reused")
+ os.Exit(1)
+ }
+ if atomic.LoadUint32((*uint32)(&C.threadExited)) != 0 {
+ println("OK")
+ return
+ }
+ }
+ println("sub thread still running")
+ os.Exit(1)
+}
diff --git a/src/runtime/testdata/testprogcgo/main.go b/src/runtime/testdata/testprogcgo/main.go
new file mode 100644
index 0000000..ae491a2
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/main.go
@@ -0,0 +1,35 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "os"
+
+var cmds = map[string]func(){}
+
+func register(name string, f func()) {
+ if cmds[name] != nil {
+ panic("duplicate registration: " + name)
+ }
+ cmds[name] = f
+}
+
+func registerInit(name string, f func()) {
+ if len(os.Args) >= 2 && os.Args[1] == name {
+ f()
+ }
+}
+
+func main() {
+ if len(os.Args) < 2 {
+ println("usage: " + os.Args[0] + " name-of-test")
+ return
+ }
+ f := cmds[os.Args[1]]
+ if f == nil {
+ println("unknown function: " + os.Args[1])
+ return
+ }
+ f()
+}
diff --git a/src/runtime/testdata/testprogcgo/needmdeadlock.go b/src/runtime/testdata/testprogcgo/needmdeadlock.go
new file mode 100644
index 0000000..5a9c359
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/needmdeadlock.go
@@ -0,0 +1,95 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+package main
+
+// This is for issue #42207.
+// During a call to needm we could get a SIGCHLD signal
+// which would itself call needm, causing a deadlock.
+
+/*
+#include <signal.h>
+#include <pthread.h>
+#include <sched.h>
+#include <unistd.h>
+
+extern void GoNeedM();
+
+#define SIGNALERS 10
+
+static void* needmSignalThread(void* p) {
+ pthread_t* pt = (pthread_t*)(p);
+ int i;
+
+ for (i = 0; i < 100; i++) {
+ if (pthread_kill(*pt, SIGCHLD) < 0) {
+ return NULL;
+ }
+ usleep(1);
+ }
+ return NULL;
+}
+
+// We don't need many calls, as the deadlock is only likely
+// to occur the first couple of times that needm is called.
+// After that there will likely be an extra M available.
+#define CALLS 10
+
+static void* needmCallbackThread(void* p) {
+ int i;
+
+ for (i = 0; i < SIGNALERS; i++) {
+ sched_yield(); // Help the signal threads get started.
+ }
+ for (i = 0; i < CALLS; i++) {
+ GoNeedM();
+ }
+ return NULL;
+}
+
+static void runNeedmSignalThread() {
+ int i;
+ pthread_t caller;
+ pthread_t s[SIGNALERS];
+
+ pthread_create(&caller, NULL, needmCallbackThread, NULL);
+ for (i = 0; i < SIGNALERS; i++) {
+ pthread_create(&s[i], NULL, needmSignalThread, &caller);
+ }
+ for (i = 0; i < SIGNALERS; i++) {
+ pthread_join(s[i], NULL);
+ }
+ pthread_join(caller, NULL);
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "os"
+ "time"
+)
+
+func init() {
+ register("NeedmDeadlock", NeedmDeadlock)
+}
+
+//export GoNeedM
+func GoNeedM() {
+}
+
+func NeedmDeadlock() {
+ // The failure symptom is that the program hangs because of a
+ // deadlock in needm, so set an alarm.
+ go func() {
+ time.Sleep(5 * time.Second)
+ fmt.Println("Hung for 5 seconds")
+ os.Exit(1)
+ }()
+
+ C.runNeedmSignalThread()
+ fmt.Println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/numgoroutine.go b/src/runtime/testdata/testprogcgo/numgoroutine.go
new file mode 100644
index 0000000..5bdfe52
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/numgoroutine.go
@@ -0,0 +1,92 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+package main
+
+/*
+#include <stddef.h>
+#include <pthread.h>
+
+extern void CallbackNumGoroutine();
+
+static void* thread2(void* arg __attribute__ ((unused))) {
+ CallbackNumGoroutine();
+ return NULL;
+}
+
+static void CheckNumGoroutine() {
+ pthread_t tid;
+ pthread_create(&tid, NULL, thread2, NULL);
+ pthread_join(tid, NULL);
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "runtime"
+ "strings"
+)
+
+var baseGoroutines int
+
+func init() {
+ register("NumGoroutine", NumGoroutine)
+}
+
+func NumGoroutine() {
+ // Test that there are just the expected number of goroutines
+ // running. Specifically, test that the spare M's goroutine
+ // doesn't show up.
+ if _, ok := checkNumGoroutine("first", 1+baseGoroutines); !ok {
+ return
+ }
+
+ // Test that the goroutine for a callback from C appears.
+ if C.CheckNumGoroutine(); !callbackok {
+ return
+ }
+
+ // Make sure we're back to the initial goroutines.
+ if _, ok := checkNumGoroutine("third", 1+baseGoroutines); !ok {
+ return
+ }
+
+ fmt.Println("OK")
+}
+
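+// checkNumGoroutine verifies that both runtime.NumGoroutine and a full
+// goroutine stack dump report want goroutines, returning the stack dump
+// and whether the check passed.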
+func checkNumGoroutine(label string, want int) (string, bool) {
+ n := runtime.NumGoroutine()
+ if n != want {
+ fmt.Printf("%s NumGoroutine: want %d; got %d\n", label, want, n)
+ return "", false
+ }
+
+ sbuf := make([]byte, 32<<10)
+ sbuf = sbuf[:runtime.Stack(sbuf, true)]
+ n = strings.Count(string(sbuf), "goroutine ")
+ if n != want {
+ fmt.Printf("%s Stack: want %d; got %d:\n%s\n", label, want, n, string(sbuf))
+ return "", false
+ }
+ return string(sbuf), true
+}
+
+var callbackok bool
+
+//export CallbackNumGoroutine
+func CallbackNumGoroutine() {
+ stk, ok := checkNumGoroutine("second", 2+baseGoroutines)
+ if !ok {
+ return
+ }
+ if !strings.Contains(stk, "CallbackNumGoroutine") {
+ fmt.Printf("missing CallbackNumGoroutine from stack:\n%s\n", stk)
+ return
+ }
+
+ callbackok = true
+}
diff --git a/src/runtime/testdata/testprogcgo/pprof.go b/src/runtime/testdata/testprogcgo/pprof.go
new file mode 100644
index 0000000..3b73fa0
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/pprof.go
@@ -0,0 +1,102 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+// Run a slow C function saving a CPU profile.
+
+/*
+#include <stdint.h>
+
+int salt1;
+int salt2;
+
+void cpuHog() {
+ int foo = salt1;
+ int i;
+
+ for (i = 0; i < 100000; i++) {
+ if (foo > 0) {
+ foo *= foo;
+ } else {
+ foo *= foo + 1;
+ }
+ }
+ salt2 = foo;
+}
+
+void cpuHog2() {
+}
+
+static int cpuHogCount;
+
+struct cgoTracebackArg {
+ uintptr_t context;
+ uintptr_t sigContext;
+ uintptr_t* buf;
+ uintptr_t max;
+};
+
+// pprofCgoTraceback is passed to runtime.SetCgoTraceback.
+// For testing purposes it pretends that all CPU hits in C code are in cpuHog.
+// Issue #29034: At least 2 frames are required to verify all frames are captured
+// since runtime/pprof ignores the runtime.goexit base frame if it exists.
+void pprofCgoTraceback(void* parg) {
+ struct cgoTracebackArg* arg = (struct cgoTracebackArg*)(parg);
+ arg->buf[0] = (uintptr_t)(cpuHog) + 0x10;
+ arg->buf[1] = (uintptr_t)(cpuHog2) + 0x4;
+ arg->buf[2] = 0;
+ ++cpuHogCount;
+}
+
+// getCpuHogCount fetches the number of times we've seen cpuHog in the
+// traceback.
+int getCpuHogCount() {
+ return cpuHogCount;
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "os"
+ "runtime"
+ "runtime/pprof"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ register("CgoPprof", CgoPprof)
+}
+
+func CgoPprof() {
+ runtime.SetCgoTraceback(0, unsafe.Pointer(C.pprofCgoTraceback), nil, nil)
+
+ f, err := os.CreateTemp("", "prof")
+ if err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ if err := pprof.StartCPUProfile(f); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
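+	// Run the C hog until the cgo traceback function has fired at least
+	// twice, giving up after a second so the test cannot hang.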
+ t0 := time.Now()
+ for C.getCpuHogCount() < 2 && time.Since(t0) < time.Second {
+ C.cpuHog()
+ }
+
+ pprof.StopCPUProfile()
+
+ name := f.Name()
+ if err := f.Close(); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ fmt.Println(name)
+}
diff --git a/src/runtime/testdata/testprogcgo/raceprof.go b/src/runtime/testdata/testprogcgo/raceprof.go
new file mode 100644
index 0000000..f7ca629
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/raceprof.go
@@ -0,0 +1,78 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux,amd64 freebsd,amd64
+
+package main
+
+// Test that we can collect a lot of colliding profiling signals from
+// an external C thread. This used to fail when built with the race
+// detector, because a call of the predeclared function copy was
+// turned into a call to runtime.slicecopy, which is not marked nosplit.
+
+/*
+#include <signal.h>
+#include <stdint.h>
+#include <pthread.h>
+#include <sched.h>
+
+struct cgoTracebackArg {
+ uintptr_t context;
+ uintptr_t sigContext;
+ uintptr_t* buf;
+ uintptr_t max;
+};
+
+static int raceprofCount;
+
+// We want a bunch of different profile stacks that collide in the
+// hash table maintained in runtime/cpuprof.go. This code knows the
+// size of the hash table (1 << 10) and knows that the hash function
+// is simply multiplicative.
+void raceprofTraceback(void* parg) {
+ struct cgoTracebackArg* arg = (struct cgoTracebackArg*)(parg);
+ raceprofCount++;
+ arg->buf[0] = raceprofCount * (1 << 10);
+ arg->buf[1] = 0;
+}
+
+static void* raceprofThread(void* p) {
+ int i;
+
+ for (i = 0; i < 100; i++) {
+ pthread_kill(pthread_self(), SIGPROF);
+ sched_yield();
+ }
+ return 0;
+}
+
+void runRaceprofThread() {
+ pthread_t tid;
+ pthread_create(&tid, 0, raceprofThread, 0);
+ pthread_join(tid, 0);
+}
+*/
+import "C"
+
+import (
+ "bytes"
+ "fmt"
+ "runtime"
+ "runtime/pprof"
+ "unsafe"
+)
+
+func init() {
+ register("CgoRaceprof", CgoRaceprof)
+}
+
+func CgoRaceprof() {
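+	// Profile into a throwaway buffer while a C thread sends itself a
+	// burst of SIGPROF signals; the test only cares that this completes
+	// without crashing.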
+ runtime.SetCgoTraceback(0, unsafe.Pointer(C.raceprofTraceback), nil, nil)
+
+ var buf bytes.Buffer
+ pprof.StartCPUProfile(&buf)
+
+ C.runRaceprofThread()
+ fmt.Println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/racesig.go b/src/runtime/testdata/testprogcgo/racesig.go
new file mode 100644
index 0000000..a079b3f
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/racesig.go
@@ -0,0 +1,102 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux,amd64 freebsd,amd64
+
+package main
+
+// Test that an external C thread that is calling malloc can be hit
+// with SIGCHLD signals. This used to fail when built with the race
+// detector, because in that case the signal handler would indirectly
+// call the C malloc function.
+
+/*
+#include <errno.h>
+#include <signal.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <pthread.h>
+#include <sched.h>
+#include <unistd.h>
+
+#define ALLOCERS 100
+#define SIGNALERS 10
+
+static void* signalThread(void* p) {
+ pthread_t* pt = (pthread_t*)(p);
+ int i, j;
+
+ for (i = 0; i < 100; i++) {
+ for (j = 0; j < ALLOCERS; j++) {
+ if (pthread_kill(pt[j], SIGCHLD) < 0) {
+ return NULL;
+ }
+ }
+ usleep(1);
+ }
+ return NULL;
+}
+
+#define CALLS 100
+
+static void* mallocThread(void* p) {
+ int i;
+ void *a[CALLS];
+
+ for (i = 0; i < ALLOCERS; i++) {
+ sched_yield();
+ }
+ for (i = 0; i < CALLS; i++) {
+ a[i] = malloc(i);
+ }
+ for (i = 0; i < CALLS; i++) {
+ free(a[i]);
+ }
+ return NULL;
+}
+
+void runRaceSignalThread() {
+ int i;
+ pthread_t m[ALLOCERS];
+ pthread_t s[SIGNALERS];
+
+ for (i = 0; i < ALLOCERS; i++) {
+ pthread_create(&m[i], NULL, mallocThread, NULL);
+ }
+ for (i = 0; i < SIGNALERS; i++) {
+ pthread_create(&s[i], NULL, signalThread, &m[0]);
+ }
+ for (i = 0; i < SIGNALERS; i++) {
+ pthread_join(s[i], NULL);
+ }
+ for (i = 0; i < ALLOCERS; i++) {
+ pthread_join(m[i], NULL);
+ }
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "os"
+ "time"
+)
+
+func init() {
+ register("CgoRaceSignal", CgoRaceSignal)
+}
+
+func CgoRaceSignal() {
+ // The failure symptom is that the program hangs because of a
+ // deadlock in malloc, so set an alarm.
+ go func() {
+ time.Sleep(5 * time.Second)
+ fmt.Println("Hung for 5 seconds")
+ os.Exit(1)
+ }()
+
+ C.runRaceSignalThread()
+ fmt.Println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/segv.go b/src/runtime/testdata/testprogcgo/segv.go
new file mode 100644
index 0000000..3237a8c
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/segv.go
@@ -0,0 +1,56 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+package main
+
+// static void nop() {}
+import "C"
+
+import (
+ "syscall"
+ "time"
+)
+
+func init() {
+ register("Segv", Segv)
+ register("SegvInCgo", SegvInCgo)
+}
+
+var Sum int
+
+func Segv() {
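+	// Keep another goroutine busy executing Go code so that the
+	// asynchronous SIGSEGV sent below may be delivered to a thread that
+	// is running Go code.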
+ c := make(chan bool)
+ go func() {
+ close(c)
+ for i := 0; ; i++ {
+ Sum += i
+ }
+ }()
+
+ <-c
+
+ syscall.Kill(syscall.Getpid(), syscall.SIGSEGV)
+
+ // Give the OS time to deliver the signal.
+ time.Sleep(time.Second)
+}
+
+func SegvInCgo() {
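+	// Keep another goroutine busy inside a cgo call so that the
+	// asynchronous SIGSEGV sent below may be delivered while C code is
+	// running.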
+ c := make(chan bool)
+ go func() {
+ close(c)
+ for {
+ C.nop()
+ }
+ }()
+
+ <-c
+
+ syscall.Kill(syscall.Getpid(), syscall.SIGSEGV)
+
+ // Give the OS time to deliver the signal.
+ time.Sleep(time.Second)
+}
diff --git a/src/runtime/testdata/testprogcgo/sigpanic.go b/src/runtime/testdata/testprogcgo/sigpanic.go
new file mode 100644
index 0000000..cb46030
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/sigpanic.go
@@ -0,0 +1,28 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+// This program will crash.
+// We want to test unwinding from sigpanic into C code (without a C symbolizer).
+
+/*
+#cgo CFLAGS: -O0
+
+char *pnil;
+
+static int f1(void) {
+ *pnil = 0;
+ return 0;
+}
+*/
+import "C"
+
+func init() {
+ register("TracebackSigpanic", TracebackSigpanic)
+}
+
+func TracebackSigpanic() {
+ C.f1()
+}
diff --git a/src/runtime/testdata/testprogcgo/sigstack.go b/src/runtime/testdata/testprogcgo/sigstack.go
new file mode 100644
index 0000000..21b668d
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/sigstack.go
@@ -0,0 +1,98 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+// Test handling of Go-allocated signal stacks when calling from
+// C-created threads with and without signal stacks. (See issue
+// #22930.)
+
+package main
+
+/*
+#include <pthread.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/mman.h>
+
+#ifdef _AIX
+// On AIX, SIGSTKSZ is too small to handle the Go signal handler.
+#define CSIGSTKSZ 0x4000
+#else
+#define CSIGSTKSZ SIGSTKSZ
+#endif
+
+extern void SigStackCallback();
+
+static void* WithSigStack(void* arg __attribute__((unused))) {
+	// Set up an alternate signal stack.
+ void* base = mmap(0, CSIGSTKSZ, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
+ if (base == MAP_FAILED) {
+ perror("mmap failed");
+ abort();
+ }
+ stack_t st = {}, ost = {};
+ st.ss_sp = (char*)base;
+ st.ss_flags = 0;
+ st.ss_size = CSIGSTKSZ;
+ if (sigaltstack(&st, &ost) < 0) {
+ perror("sigaltstack failed");
+ abort();
+ }
+
+ // Call Go.
+ SigStackCallback();
+
+ // Disable signal stack and protect it so we can detect reuse.
+ if (ost.ss_flags & SS_DISABLE) {
+ // Darwin libsystem has a bug where it checks ss_size
+ // even if SS_DISABLE is set. (The kernel gets it right.)
+ ost.ss_size = CSIGSTKSZ;
+ }
+ if (sigaltstack(&ost, NULL) < 0) {
+ perror("sigaltstack restore failed");
+ abort();
+ }
+ mprotect(base, CSIGSTKSZ, PROT_NONE);
+ return NULL;
+}
+
+static void* WithoutSigStack(void* arg __attribute__((unused))) {
+ SigStackCallback();
+ return NULL;
+}
+
+static void DoThread(int sigstack) {
+ pthread_t tid;
+ if (sigstack) {
+ pthread_create(&tid, NULL, WithSigStack, NULL);
+ } else {
+ pthread_create(&tid, NULL, WithoutSigStack, NULL);
+ }
+ pthread_join(tid, NULL);
+}
+*/
+import "C"
+
+func init() {
+ register("SigStack", SigStack)
+}
+
+func SigStack() {
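+	// Alternate C threads without and with an alternate signal stack,
+	// so that the Go signal handler is exercised in both configurations.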
+ C.DoThread(0)
+ C.DoThread(1)
+ C.DoThread(0)
+ C.DoThread(1)
+ println("OK")
+}
+
+var BadPtr *int
+
+//export SigStackCallback
+func SigStackCallback() {
+ // Cause the Go signal handler to run.
+ defer func() { recover() }()
+ *BadPtr = 42
+}
diff --git a/src/runtime/testdata/testprogcgo/stack_windows.go b/src/runtime/testdata/testprogcgo/stack_windows.go
new file mode 100644
index 0000000..846297a
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/stack_windows.go
@@ -0,0 +1,54 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "C"
+import (
+ "internal/syscall/windows"
+ "runtime"
+ "sync"
+ "syscall"
+ "unsafe"
+)
+
+func init() {
+ register("StackMemory", StackMemory)
+}
+
+func getPagefileUsage() (uintptr, error) {
+ p, err := syscall.GetCurrentProcess()
+ if err != nil {
+ return 0, err
+ }
+ var m windows.PROCESS_MEMORY_COUNTERS
+ err = windows.GetProcessMemoryInfo(p, &m, uint32(unsafe.Sizeof(m)))
+ if err != nil {
+ return 0, err
+ }
+ return m.PagefileUsage, nil
+}
+
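+// StackMemory locks many goroutines to OS threads and prints the average
+// increase in pagefile usage per thread, which the test uses to check how
+// much memory each extra thread consumes.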
+func StackMemory() {
+ mem1, err := getPagefileUsage()
+ if err != nil {
+ panic(err)
+ }
+ const threadCount = 100
+ var wg sync.WaitGroup
+ for i := 0; i < threadCount; i++ {
+ wg.Add(1)
+ go func() {
+ runtime.LockOSThread()
+ wg.Done()
+ select {}
+ }()
+ }
+ wg.Wait()
+ mem2, err := getPagefileUsage()
+ if err != nil {
+ panic(err)
+ }
+ print((mem2 - mem1) / threadCount)
+}
diff --git a/src/runtime/testdata/testprogcgo/threadpanic.go b/src/runtime/testdata/testprogcgo/threadpanic.go
new file mode 100644
index 0000000..f9b48a9
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/threadpanic.go
@@ -0,0 +1,24 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9
+
+package main
+
+// void start(void);
+import "C"
+
+func init() {
+ register("CgoExternalThreadPanic", CgoExternalThreadPanic)
+}
+
+func CgoExternalThreadPanic() {
+ C.start()
+ select {}
+}
+
+//export gopanic
+func gopanic() {
+ panic("BOOM")
+}
diff --git a/src/runtime/testdata/testprogcgo/threadpanic_unix.c b/src/runtime/testdata/testprogcgo/threadpanic_unix.c
new file mode 100644
index 0000000..c426452
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/threadpanic_unix.c
@@ -0,0 +1,26 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <pthread.h>
+
+void gopanic(void);
+
+static void*
+die(void* x)
+{
+ gopanic();
+ return 0;
+}
+
+void
+start(void)
+{
+ pthread_t t;
+ if(pthread_create(&t, 0, die, 0) != 0)
+ printf("pthread_create failed\n");
+}
diff --git a/src/runtime/testdata/testprogcgo/threadpanic_windows.c b/src/runtime/testdata/testprogcgo/threadpanic_windows.c
new file mode 100644
index 0000000..ba66d0f
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/threadpanic_windows.c
@@ -0,0 +1,23 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <process.h>
+#include <stdlib.h>
+#include <stdio.h>
+
+void gopanic(void);
+
+static unsigned int __attribute__((__stdcall__))
+die(void* x)
+{
+ gopanic();
+ return 0;
+}
+
+void
+start(void)
+{
+ if(_beginthreadex(0, 0, die, 0, 0, 0) != 0)
+ printf("_beginthreadex failed\n");
+}
diff --git a/src/runtime/testdata/testprogcgo/threadpprof.go b/src/runtime/testdata/testprogcgo/threadpprof.go
new file mode 100644
index 0000000..feb774b
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/threadpprof.go
@@ -0,0 +1,126 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+package main
+
+// Run a slow C function saving a CPU profile.
+
+/*
+#include <stdint.h>
+#include <time.h>
+#include <pthread.h>
+
+int threadSalt1;
+int threadSalt2;
+
+void cpuHogThread() {
+ int foo = threadSalt1;
+ int i;
+
+ for (i = 0; i < 100000; i++) {
+ if (foo > 0) {
+ foo *= foo;
+ } else {
+ foo *= foo + 1;
+ }
+ }
+ threadSalt2 = foo;
+}
+
+void cpuHogThread2() {
+}
+
+static int cpuHogThreadCount;
+
+struct cgoTracebackArg {
+ uintptr_t context;
+ uintptr_t sigContext;
+ uintptr_t* buf;
+ uintptr_t max;
+};
+
+// pprofCgoThreadTraceback is passed to runtime.SetCgoTraceback.
+// For testing purposes it pretends that all CPU hits in C code are in cpuHogThread.
+void pprofCgoThreadTraceback(void* parg) {
+ struct cgoTracebackArg* arg = (struct cgoTracebackArg*)(parg);
+ arg->buf[0] = (uintptr_t)(cpuHogThread) + 0x10;
+ arg->buf[1] = (uintptr_t)(cpuHogThread2) + 0x4;
+ arg->buf[2] = 0;
+ __sync_add_and_fetch(&cpuHogThreadCount, 1);
+}
+
+// getCPUHogThreadCount fetches the number of times we've seen cpuHogThread
+// in the traceback.
+int getCPUHogThreadCount() {
+ return __sync_add_and_fetch(&cpuHogThreadCount, 0);
+}
+
+static void* cpuHogDriver(void* arg __attribute__ ((unused))) {
+ while (1) {
+ cpuHogThread();
+ }
+ return 0;
+}
+
+void runCPUHogThread(void) {
+ pthread_t tid;
+ pthread_create(&tid, 0, cpuHogDriver, 0);
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "os"
+ "runtime"
+ "runtime/pprof"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ register("CgoPprofThread", CgoPprofThread)
+ register("CgoPprofThreadNoTraceback", CgoPprofThreadNoTraceback)
+}
+
+func CgoPprofThread() {
+ runtime.SetCgoTraceback(0, unsafe.Pointer(C.pprofCgoThreadTraceback), nil, nil)
+ pprofThread()
+}
+
+func CgoPprofThreadNoTraceback() {
+ pprofThread()
+}
+
+func pprofThread() {
+ f, err := os.CreateTemp("", "prof")
+ if err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ if err := pprof.StartCPUProfile(f); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ C.runCPUHogThread()
+
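+	// Wait until the cgo traceback function has seen the C hog thread at
+	// least twice, giving up after a second so the test cannot hang.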
+ t0 := time.Now()
+ for C.getCPUHogThreadCount() < 2 && time.Since(t0) < time.Second {
+ time.Sleep(100 * time.Millisecond)
+ }
+
+ pprof.StopCPUProfile()
+
+ name := f.Name()
+ if err := f.Close(); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ fmt.Println(name)
+}
diff --git a/src/runtime/testdata/testprogcgo/threadprof.go b/src/runtime/testdata/testprogcgo/threadprof.go
new file mode 100644
index 0000000..2d4c103
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/threadprof.go
@@ -0,0 +1,98 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// We only build this file with the tag "threadprof", since it starts
+// a thread running a busy loop at constructor time.
+
+// +build !plan9,!windows
+// +build threadprof
+
+package main
+
+/*
+#include <stdint.h>
+#include <signal.h>
+#include <pthread.h>
+
+volatile int32_t spinlock;
+
+static void *thread1(void *p) {
+ (void)p;
+ while (spinlock == 0)
+ ;
+ pthread_kill(pthread_self(), SIGPROF);
+ spinlock = 0;
+ return NULL;
+}
+
+__attribute__((constructor)) void issue9456() {
+ pthread_t tid;
+ pthread_create(&tid, 0, thread1, NULL);
+}
+
+void **nullptr;
+
+void *crash(void *p) {
+ *nullptr = p;
+ return 0;
+}
+
+int start_crashing_thread(void) {
+ pthread_t tid;
+ return pthread_create(&tid, 0, crash, 0);
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "os"
+ "os/exec"
+ "runtime"
+ "sync/atomic"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ register("CgoExternalThreadSIGPROF", CgoExternalThreadSIGPROF)
+ register("CgoExternalThreadSignal", CgoExternalThreadSignal)
+}
+
+func CgoExternalThreadSIGPROF() {
+	// This test checks that sending SIGPROF to foreign threads
+	// before we make any cgo call will not abort the whole process, so
+	// we cannot make any cgo call here. See https://golang.org/issue/9456.
+ atomic.StoreInt32((*int32)(unsafe.Pointer(&C.spinlock)), 1)
+ for atomic.LoadInt32((*int32)(unsafe.Pointer(&C.spinlock))) == 1 {
+ runtime.Gosched()
+ }
+ println("OK")
+}
+
+func CgoExternalThreadSignal() {
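+	// When invoked with the extra "crash" argument we are the child
+	// process: start a C thread that dereferences a null pointer and wait
+	// for it to crash us.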
+ if len(os.Args) > 2 && os.Args[2] == "crash" {
+ i := C.start_crashing_thread()
+ if i != 0 {
+ fmt.Println("pthread_create failed:", i)
+ // Exit with 0 because parent expects us to crash.
+ return
+ }
+
+ // We should crash immediately, but give it plenty of
+ // time before failing (by exiting 0) in case we are
+ // running on a slow system.
+ time.Sleep(5 * time.Second)
+ return
+ }
+
+ out, err := exec.Command(os.Args[0], "CgoExternalThreadSignal", "crash").CombinedOutput()
+ if err == nil {
+ fmt.Println("C signal did not crash as expected")
+ fmt.Printf("\n%s\n", out)
+ os.Exit(1)
+ }
+
+ fmt.Println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/traceback.go b/src/runtime/testdata/testprogcgo/traceback.go
new file mode 100644
index 0000000..e2d7599
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/traceback.go
@@ -0,0 +1,54 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+// This program will crash.
+// We want the stack trace to include the C functions.
+// We use a fake traceback, and a symbolizer that dumps a string we recognize.
+
+/*
+#cgo CFLAGS: -g -O0
+
+// Defined in traceback_c.c.
+extern int crashInGo;
+int tracebackF1(void);
+void cgoTraceback(void* parg);
+void cgoSymbolizer(void* parg);
+*/
+import "C"
+
+import (
+ "runtime"
+ "unsafe"
+)
+
+func init() {
+ register("CrashTraceback", CrashTraceback)
+ register("CrashTracebackGo", CrashTracebackGo)
+}
+
+func CrashTraceback() {
+ runtime.SetCgoTraceback(0, unsafe.Pointer(C.cgoTraceback), nil, unsafe.Pointer(C.cgoSymbolizer))
+ C.tracebackF1()
+}
+
+func CrashTracebackGo() {
+ C.crashInGo = 1
+ CrashTraceback()
+}
+
+//export h1
+func h1() {
+ h2()
+}
+
+func h2() {
+ h3()
+}
+
+func h3() {
+ var x *int
+ *x = 0
+}
diff --git a/src/runtime/testdata/testprogcgo/traceback_c.c b/src/runtime/testdata/testprogcgo/traceback_c.c
new file mode 100644
index 0000000..56eda8f
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/traceback_c.c
@@ -0,0 +1,65 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The C definitions for traceback.go. That file uses //export so
+// it can't put function definitions in the "C" import comment.
+
+#include <stdint.h>
+
+char *p;
+
+int crashInGo;
+extern void h1(void);
+
+int tracebackF3(void) {
+ if (crashInGo)
+ h1();
+ else
+ *p = 0;
+ return 0;
+}
+
+int tracebackF2(void) {
+ return tracebackF3();
+}
+
+int tracebackF1(void) {
+ return tracebackF2();
+}
+
+struct cgoTracebackArg {
+ uintptr_t context;
+ uintptr_t sigContext;
+ uintptr_t* buf;
+ uintptr_t max;
+};
+
+struct cgoSymbolizerArg {
+ uintptr_t pc;
+ const char* file;
+ uintptr_t lineno;
+ const char* func;
+ uintptr_t entry;
+ uintptr_t more;
+ uintptr_t data;
+};
+
+void cgoTraceback(void* parg) {
+ struct cgoTracebackArg* arg = (struct cgoTracebackArg*)(parg);
+ arg->buf[0] = 1;
+ arg->buf[1] = 2;
+ arg->buf[2] = 3;
+ arg->buf[3] = 0;
+}
+
+void cgoSymbolizer(void* parg) {
+ struct cgoSymbolizerArg* arg = (struct cgoSymbolizerArg*)(parg);
+ if (arg->pc != arg->data + 1) {
+ arg->file = "unexpected data";
+ } else {
+ arg->file = "cgo symbolizer";
+ }
+ arg->lineno = arg->data + 1;
+ arg->data++;
+}
diff --git a/src/runtime/testdata/testprogcgo/tracebackctxt.go b/src/runtime/testdata/testprogcgo/tracebackctxt.go
new file mode 100644
index 0000000..51fa4ad
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/tracebackctxt.go
@@ -0,0 +1,107 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The __attribute__((weak)) used below doesn't seem to work on Windows.
+
+package main
+
+// Test the context argument to SetCgoTraceback.
+// Use fake context, traceback, and symbolizer functions.
+
+/*
+// Defined in tracebackctxt_c.c.
+extern void C1(void);
+extern void C2(void);
+extern void tcContext(void*);
+extern void tcTraceback(void*);
+extern void tcSymbolizer(void*);
+extern int getContextCount(void);
+*/
+import "C"
+
+import (
+ "fmt"
+ "runtime"
+ "unsafe"
+)
+
+func init() {
+ register("TracebackContext", TracebackContext)
+}
+
+var tracebackOK bool
+
+func TracebackContext() {
+ runtime.SetCgoTraceback(0, unsafe.Pointer(C.tcTraceback), unsafe.Pointer(C.tcContext), unsafe.Pointer(C.tcSymbolizer))
+ C.C1()
+ if got := C.getContextCount(); got != 0 {
+ fmt.Printf("at end contextCount == %d, expected 0\n", got)
+ tracebackOK = false
+ }
+ if tracebackOK {
+ fmt.Println("OK")
+ }
+}
+
+//export G1
+func G1() {
+ C.C2()
+}
+
+//export G2
+func G2() {
+ pc := make([]uintptr, 32)
+ n := runtime.Callers(0, pc)
+ cf := runtime.CallersFrames(pc[:n])
+ var frames []runtime.Frame
+ for {
+ frame, more := cf.Next()
+ frames = append(frames, frame)
+ if !more {
+ break
+ }
+ }
+
+ want := []struct {
+ function string
+ line int
+ }{
+ {"main.G2", 0},
+ {"cFunction", 0x10200},
+ {"cFunction", 0x200},
+ {"cFunction", 0x10201},
+ {"cFunction", 0x201},
+ {"main.G1", 0},
+ {"cFunction", 0x10100},
+ {"cFunction", 0x100},
+ {"main.TracebackContext", 0},
+ }
+
+ ok := true
+ i := 0
+wantLoop:
+ for _, w := range want {
+ for ; i < len(frames); i++ {
+ if w.function == frames[i].Function {
+ if w.line != 0 && w.line != frames[i].Line {
+ fmt.Printf("found function %s at wrong line %#x (expected %#x)\n", w.function, frames[i].Line, w.line)
+ ok = false
+ }
+ i++
+ continue wantLoop
+ }
+ }
+ fmt.Printf("did not find function %s in\n", w.function)
+ for _, f := range frames {
+ fmt.Println(f)
+ }
+ ok = false
+ break
+ }
+ tracebackOK = ok
+ if got := C.getContextCount(); got != 2 {
+ fmt.Printf("at bottom contextCount == %d, expected 2\n", got)
+ tracebackOK = false
+ }
+}
diff --git a/src/runtime/testdata/testprogcgo/tracebackctxt_c.c b/src/runtime/testdata/testprogcgo/tracebackctxt_c.c
new file mode 100644
index 0000000..900cada
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/tracebackctxt_c.c
@@ -0,0 +1,91 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The C definitions for tracebackctxt.go. That file uses //export so
+// it can't put function definitions in the "C" import comment.
+
+#include <stdlib.h>
+#include <stdint.h>
+
+// Functions exported from Go.
+extern void G1(void);
+extern void G2(void);
+
+void C1() {
+ G1();
+}
+
+void C2() {
+ G2();
+}
+
+struct cgoContextArg {
+ uintptr_t context;
+};
+
+struct cgoTracebackArg {
+ uintptr_t context;
+ uintptr_t sigContext;
+ uintptr_t* buf;
+ uintptr_t max;
+};
+
+struct cgoSymbolizerArg {
+ uintptr_t pc;
+ const char* file;
+ uintptr_t lineno;
+ const char* func;
+ uintptr_t entry;
+ uintptr_t more;
+ uintptr_t data;
+};
+
+// Uses atomic adds and subtracts to catch the possibility of
+// erroneous calls from multiple threads; that should be impossible in
+// this test case, but we check just in case.
+static int contextCount;
+
+int getContextCount() {
+ return __sync_add_and_fetch(&contextCount, 0);
+}
+
+void tcContext(void* parg) {
+ struct cgoContextArg* arg = (struct cgoContextArg*)(parg);
+ if (arg->context == 0) {
+ arg->context = __sync_add_and_fetch(&contextCount, 1);
+ } else {
+ if (arg->context != __sync_add_and_fetch(&contextCount, 0)) {
+ abort();
+ }
+ __sync_sub_and_fetch(&contextCount, 1);
+ }
+}
+
+void tcTraceback(void* parg) {
+ int base, i;
+ struct cgoTracebackArg* arg = (struct cgoTracebackArg*)(parg);
+ if (arg->context == 0) {
+ // This shouldn't happen in this program.
+ abort();
+ }
+ // Return a variable number of PC values.
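+	// The PCs are context<<8 + i; combined with the symbolizer's two
+	// lines per PC, these are the values the want table in
+	// tracebackctxt.go expects.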
+ base = arg->context << 8;
+ for (i = 0; i < arg->context; i++) {
+ if (i < arg->max) {
+ arg->buf[i] = base + i;
+ }
+ }
+}
+
+void tcSymbolizer(void *parg) {
+ struct cgoSymbolizerArg* arg = (struct cgoSymbolizerArg*)(parg);
+ if (arg->pc == 0) {
+ return;
+ }
+	// Report two lines per PC returned by the traceback, to test handling of the more field.
+ arg->more = arg->file == NULL;
+ arg->file = "tracebackctxt.go";
+ arg->func = "cFunction";
+ arg->lineno = arg->pc + (arg->more << 16);
+}
diff --git a/src/runtime/testdata/testprogcgo/windows/win.go b/src/runtime/testdata/testprogcgo/windows/win.go
new file mode 100644
index 0000000..f2eabb9
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/windows/win.go
@@ -0,0 +1,16 @@
+package windows
+
+/*
+#cgo CFLAGS: -mnop-fun-dllimport
+
+#include <windows.h>
+
+DWORD agetthread() {
+ return GetCurrentThreadId();
+}
+*/
+import "C"
+
+func GetThread() uint32 {
+ return uint32(C.agetthread())
+}
diff --git a/src/runtime/testdata/testprognet/main.go b/src/runtime/testdata/testprognet/main.go
new file mode 100644
index 0000000..ae491a2
--- /dev/null
+++ b/src/runtime/testdata/testprognet/main.go
@@ -0,0 +1,35 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "os"
+
+var cmds = map[string]func(){}
+
+func register(name string, f func()) {
+ if cmds[name] != nil {
+ panic("duplicate registration: " + name)
+ }
+ cmds[name] = f
+}
+
+func registerInit(name string, f func()) {
+ if len(os.Args) >= 2 && os.Args[1] == name {
+ f()
+ }
+}
+
+func main() {
+ if len(os.Args) < 2 {
+ println("usage: " + os.Args[0] + " name-of-test")
+ return
+ }
+ f := cmds[os.Args[1]]
+ if f == nil {
+ println("unknown function: " + os.Args[1])
+ return
+ }
+ f()
+}
diff --git a/src/runtime/testdata/testprognet/net.go b/src/runtime/testdata/testprognet/net.go
new file mode 100644
index 0000000..714b101
--- /dev/null
+++ b/src/runtime/testdata/testprognet/net.go
@@ -0,0 +1,29 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "fmt"
+ "net"
+)
+
+func init() {
+ registerInit("NetpollDeadlock", NetpollDeadlockInit)
+ register("NetpollDeadlock", NetpollDeadlock)
+}
+
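+// NetpollDeadlockInit dials from package initialization (via registerInit),
+// exercising the network poller before main runs; the test simply checks
+// that the program still reaches "done" without deadlocking.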
+func NetpollDeadlockInit() {
+ fmt.Println("dialing")
+ c, err := net.Dial("tcp", "localhost:14356")
+ if err == nil {
+ c.Close()
+ } else {
+ fmt.Println("error: ", err)
+ }
+}
+
+func NetpollDeadlock() {
+ fmt.Println("done")
+}
diff --git a/src/runtime/testdata/testprognet/signal.go b/src/runtime/testdata/testprognet/signal.go
new file mode 100644
index 0000000..4d2de79
--- /dev/null
+++ b/src/runtime/testdata/testprognet/signal.go
@@ -0,0 +1,26 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !windows,!plan9
+
+// This is in testprognet instead of testprog because testprog
+// must not import anything (like net, but also like os/signal)
+// that kicks off background goroutines during init.
+
+package main
+
+import (
+ "os/signal"
+ "syscall"
+)
+
+func init() {
+ register("SignalIgnoreSIGTRAP", SignalIgnoreSIGTRAP)
+}
+
+func SignalIgnoreSIGTRAP() {
+ signal.Ignore(syscall.SIGTRAP)
+ syscall.Kill(syscall.Getpid(), syscall.SIGTRAP)
+ println("OK")
+}
diff --git a/src/runtime/testdata/testprognet/signalexec.go b/src/runtime/testdata/testprognet/signalexec.go
new file mode 100644
index 0000000..4a988ef
--- /dev/null
+++ b/src/runtime/testdata/testprognet/signalexec.go
@@ -0,0 +1,70 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build darwin dragonfly freebsd linux netbsd openbsd
+
+// This is in testprognet instead of testprog because testprog
+// must not import anything (like net, but also like os/signal)
+// that kicks off background goroutines during init.
+
+package main
+
+import (
+ "fmt"
+ "os"
+ "os/exec"
+ "os/signal"
+ "sync"
+ "syscall"
+ "time"
+)
+
+func init() {
+ register("SignalDuringExec", SignalDuringExec)
+ register("Nop", Nop)
+}
+
+func SignalDuringExec() {
+ pgrp := syscall.Getpgrp()
+
+ const tries = 10
+
+ var wg sync.WaitGroup
+ c := make(chan os.Signal, tries)
+ signal.Notify(c, syscall.SIGWINCH)
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ for range c {
+ }
+ }()
+
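+	// Repeatedly exec a no-op child while sending SIGWINCH to the whole
+	// process group, checking that signals arriving around fork/exec do
+	// not cause the child to fail.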
+ for i := 0; i < tries; i++ {
+ time.Sleep(time.Microsecond)
+ wg.Add(2)
+ go func() {
+ defer wg.Done()
+ cmd := exec.Command(os.Args[0], "Nop")
+ cmd.Stdout = os.Stdout
+ cmd.Stderr = os.Stderr
+ if err := cmd.Run(); err != nil {
+ fmt.Printf("Start failed: %v", err)
+ }
+ }()
+ go func() {
+ defer wg.Done()
+ syscall.Kill(-pgrp, syscall.SIGWINCH)
+ }()
+ }
+
+ signal.Stop(c)
+ close(c)
+ wg.Wait()
+
+ fmt.Println("OK")
+}
+
+func Nop() {
+ // This is just for SignalDuringExec.
+}
diff --git a/src/runtime/testdata/testwinlib/main.c b/src/runtime/testdata/testwinlib/main.c
new file mode 100644
index 0000000..e84a32f
--- /dev/null
+++ b/src/runtime/testdata/testwinlib/main.c
@@ -0,0 +1,57 @@
+#include <stdio.h>
+#include <windows.h>
+#include "testwinlib.h"
+
+int exceptionCount;
+int continueCount;
+LONG WINAPI customExceptionHandler(struct _EXCEPTION_POINTERS *ExceptionInfo)
+{
+ if (ExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_BREAKPOINT)
+ {
+ exceptionCount++;
+ // prepare context to resume execution
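+        // Assuming DebugBreak pushed no stack frame, the value at Rsp is the
+        // return address pushed by the call; load it into Rip and pop it so
+        // execution resumes at DebugBreak's caller.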
+ CONTEXT *c = ExceptionInfo->ContextRecord;
+ c->Rip = *(ULONG_PTR *)c->Rsp;
+ c->Rsp += 8;
+ return EXCEPTION_CONTINUE_EXECUTION;
+ }
+ return EXCEPTION_CONTINUE_SEARCH;
+}
+LONG WINAPI customContinueHandler(struct _EXCEPTION_POINTERS *ExceptionInfo)
+{
+ if (ExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_BREAKPOINT)
+ {
+ continueCount++;
+ return EXCEPTION_CONTINUE_EXECUTION;
+ }
+ return EXCEPTION_CONTINUE_SEARCH;
+}
+
+void throwFromC()
+{
+ DebugBreak();
+}
+int main()
+{
+    // Simulate a "lazily" attached debugger by calling some Go code before installing the exception/continue handlers.
+ Dummy();
+ exceptionCount = 0;
+ continueCount = 0;
+    void *exceptionHandlerHandle = AddVectoredExceptionHandler(0, customExceptionHandler);
+ if (NULL == exceptionHandlerHandle)
+ {
+ printf("cannot add vectored exception handler\n");
+ return 2;
+ }
+    void *continueHandlerHandle = AddVectoredContinueHandler(0, customContinueHandler);
+ if (NULL == continueHandlerHandle)
+ {
+ printf("cannot add vectored continue handler\n");
+ return 2;
+ }
+ CallMeBack(throwFromC);
+ RemoveVectoredContinueHandler(continueHandlerHandle);
+ RemoveVectoredExceptionHandler(exceptionHandlerHandle);
+ printf("exceptionCount: %d\ncontinueCount: %d\n", exceptionCount, continueCount);
+ return 0;
+}
\ No newline at end of file
diff --git a/src/runtime/testdata/testwinlib/main.go b/src/runtime/testdata/testwinlib/main.go
new file mode 100644
index 0000000..400eaa1
--- /dev/null
+++ b/src/runtime/testdata/testwinlib/main.go
@@ -0,0 +1,28 @@
+// +build windows,cgo
+
+package main
+
+// #include <windows.h>
+// typedef void(*callmeBackFunc)();
+// static void bridgeCallback(callmeBackFunc callback) {
+// callback();
+//}
+import "C"
+
+// CallMeBack calls back into C code by invoking the given callback.
+//export CallMeBack
+func CallMeBack(callback C.callmeBackFunc) {
+ C.bridgeCallback(callback)
+}
+
+// Dummy is called by the C code before it registers the exception/continue handlers that simulate a debugger.
+// This makes sure that the Go runtime's lastcontinuehandler is reached before the C continue handler, and thus
+// validates that it does not crash the program before another handler can take action.
+// The idea is to reproduce what happens when you attach a debugger to a running program.
+// It also simulates the behavior of the .NET debugger, which registers its exception/continue handlers lazily.
+//export Dummy
+func Dummy() int {
+ return 42
+}
+
+func main() {}
diff --git a/src/runtime/testdata/testwinlibsignal/dummy.go b/src/runtime/testdata/testwinlibsignal/dummy.go
new file mode 100644
index 0000000..82dfd91
--- /dev/null
+++ b/src/runtime/testdata/testwinlibsignal/dummy.go
@@ -0,0 +1,10 @@
+// +build windows
+
+package main
+
+//export Dummy
+func Dummy() int {
+ return 42
+}
+
+func main() {}
diff --git a/src/runtime/testdata/testwinlibsignal/main.c b/src/runtime/testdata/testwinlibsignal/main.c
new file mode 100644
index 0000000..1787fef
--- /dev/null
+++ b/src/runtime/testdata/testwinlibsignal/main.c
@@ -0,0 +1,50 @@
+#include <windows.h>
+#include <stdio.h>
+
+HANDLE waitForCtrlBreakEvent;
+
+BOOL WINAPI CtrlHandler(DWORD fdwCtrlType)
+{
+ switch (fdwCtrlType)
+ {
+ case CTRL_BREAK_EVENT:
+ SetEvent(waitForCtrlBreakEvent);
+ return TRUE;
+ default:
+ return FALSE;
+ }
+}
+
+int main(void)
+{
+ waitForCtrlBreakEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
+ if (!waitForCtrlBreakEvent) {
+ fprintf(stderr, "ERROR: Could not create event");
+ return 1;
+ }
+
+ if (!SetConsoleCtrlHandler(CtrlHandler, TRUE))
+ {
+ fprintf(stderr, "ERROR: Could not set control handler");
+ return 1;
+ }
+
+ // The library must be loaded after the SetConsoleCtrlHandler call
+ // so that the library handler registers after the main program.
+ // This way the library handler gets called first.
+ HMODULE dummyDll = LoadLibrary("dummy.dll");
+ if (!dummyDll) {
+ fprintf(stderr, "ERROR: Could not load dummy.dll");
+ return 1;
+ }
+
+ printf("ready\n");
+ fflush(stdout);
+
+ if (WaitForSingleObject(waitForCtrlBreakEvent, 5000) != WAIT_OBJECT_0) {
+ fprintf(stderr, "FAILURE: No signal received");
+ return 1;
+ }
+
+ return 0;
+}
diff --git a/src/runtime/testdata/testwinsignal/main.go b/src/runtime/testdata/testwinsignal/main.go
new file mode 100644
index 0000000..d8cd884
--- /dev/null
+++ b/src/runtime/testdata/testwinsignal/main.go
@@ -0,0 +1,19 @@
+package main
+
+import (
+ "fmt"
+ "os"
+ "os/signal"
+ "time"
+)
+
+func main() {
+ c := make(chan os.Signal, 1)
+ signal.Notify(c)
+
+ fmt.Println("ready")
+ sig := <-c
+
+ time.Sleep(time.Second)
+ fmt.Println(sig)
+}
diff --git a/src/runtime/textflag.h b/src/runtime/textflag.h
new file mode 100644
index 0000000..daca36d
--- /dev/null
+++ b/src/runtime/textflag.h
@@ -0,0 +1,37 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file defines flags attached to various functions
+// and data objects. The compilers, assemblers, and linker must
+// all agree on these values.
+//
+// Keep in sync with src/cmd/internal/obj/textflag.go.
+
+// Don't profile the marked routine. This flag is deprecated.
+#define NOPROF 1
+// It is ok for the linker to see multiple copies of this symbol. It will
+// pick one of the duplicates to use.
+#define DUPOK 2
+// Don't insert stack check preamble.
+#define NOSPLIT 4
+// Put this data in a read-only section.
+#define RODATA 8
+// This data contains no pointers.
+#define NOPTR 16
+// This is a wrapper function and should not count as disabling 'recover'.
+#define WRAPPER 32
+// This function uses its incoming context register.
+#define NEEDCTXT 64
+// Allocate a word of thread local storage and store the offset from the
+// thread local base to the thread local storage in this variable.
+#define TLSBSS 256
+// Do not insert instructions to allocate a stack frame for this function.
+// Only valid on functions that declare a frame size of 0.
+// TODO(mwhudson): only implemented for ppc64x at present.
+#define NOFRAME 512
+// Function can call reflect.Type.Method or reflect.Type.MethodByName.
+#define REFLECTMETHOD 1024
+// Function is the top of the call stack. Call stack unwinders should stop
+// at this function.
+#define TOPFRAME 2048
diff --git a/src/runtime/time.go b/src/runtime/time.go
new file mode 100644
index 0000000..517a493
--- /dev/null
+++ b/src/runtime/time.go
@@ -0,0 +1,1127 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Time-related runtime and pieces of package time.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// Package time knows the layout of this structure.
+// If this struct changes, adjust ../time/sleep.go:/runtimeTimer.
+type timer struct {
+ // If this timer is on a heap, which P's heap it is on.
+ // puintptr rather than *p to match uintptr in the versions
+ // of this struct defined in other packages.
+ pp puintptr
+
+ // Timer wakes up at when, and then at when+period, ... (period > 0 only)
+ // each time calling f(arg, now) in the timer goroutine, so f must be
+ // a well-behaved function and not block.
+ //
+ // when must be positive on an active timer.
+ when int64
+ period int64
+ f func(interface{}, uintptr)
+ arg interface{}
+ seq uintptr
+
+ // What to set the when field to in timerModifiedXX status.
+ nextwhen int64
+
+ // The status field holds one of the values below.
+ status uint32
+}
+
+// Code outside this file has to be careful in using a timer value.
+//
+// The pp, status, and nextwhen fields may only be used by code in this file.
+//
+// Code that creates a new timer value can set the when, period, f,
+// arg, and seq fields.
+// A new timer value may be passed to addtimer (called by time.startTimer).
+// After doing that no fields may be touched.
+//
+// An active timer (one that has been passed to addtimer) may be
+// passed to deltimer (time.stopTimer), after which it is no longer an
+// active timer. It is an inactive timer.
+// In an inactive timer the period, f, arg, and seq fields may be modified,
+// but not the when field.
+// It's OK to just drop an inactive timer and let the GC collect it.
+// It's not OK to pass an inactive timer to addtimer.
+// Only newly allocated timer values may be passed to addtimer.
+//
+// An active timer may be passed to modtimer. No fields may be touched.
+// It remains an active timer.
+//
+// An inactive timer may be passed to resettimer to turn into an
+// active timer with an updated when field.
+// It's OK to pass a newly allocated timer value to resettimer.
+//
+// Timer operations are addtimer, deltimer, modtimer, resettimer,
+// cleantimers, adjusttimers, and runtimer.
+//
+// We don't permit calling addtimer/deltimer/modtimer/resettimer simultaneously,
+// but adjusttimers and runtimer can be called at the same time as any of those.
+//
+// Active timers live in heaps attached to P, in the timers field.
+// Inactive timers live there too temporarily, until they are removed.
+//
+// addtimer:
+// timerNoStatus -> timerWaiting
+// anything else -> panic: invalid value
+// deltimer:
+// timerWaiting -> timerModifying -> timerDeleted
+// timerModifiedEarlier -> timerModifying -> timerDeleted
+// timerModifiedLater -> timerModifying -> timerDeleted
+// timerNoStatus -> do nothing
+// timerDeleted -> do nothing
+// timerRemoving -> do nothing
+// timerRemoved -> do nothing
+// timerRunning -> wait until status changes
+// timerMoving -> wait until status changes
+// timerModifying -> wait until status changes
+// modtimer:
+// timerWaiting -> timerModifying -> timerModifiedXX
+// timerModifiedXX -> timerModifying -> timerModifiedYY
+// timerNoStatus -> timerModifying -> timerWaiting
+// timerRemoved -> timerModifying -> timerWaiting
+// timerDeleted -> timerModifying -> timerModifiedXX
+// timerRunning -> wait until status changes
+// timerMoving -> wait until status changes
+// timerRemoving -> wait until status changes
+// timerModifying -> wait until status changes
+// cleantimers (looks in P's timer heap):
+// timerDeleted -> timerRemoving -> timerRemoved
+// timerModifiedXX -> timerMoving -> timerWaiting
+// adjusttimers (looks in P's timer heap):
+// timerDeleted -> timerRemoving -> timerRemoved
+// timerModifiedXX -> timerMoving -> timerWaiting
+// runtimer (looks in P's timer heap):
+// timerNoStatus -> panic: uninitialized timer
+// timerWaiting -> timerWaiting or
+// timerWaiting -> timerRunning -> timerNoStatus or
+// timerWaiting -> timerRunning -> timerWaiting
+// timerModifying -> wait until status changes
+// timerModifiedXX -> timerMoving -> timerWaiting
+// timerDeleted -> timerRemoving -> timerRemoved
+// timerRunning -> panic: concurrent runtimer calls
+// timerRemoved -> panic: inconsistent timer heap
+// timerRemoving -> panic: inconsistent timer heap
+// timerMoving -> panic: inconsistent timer heap
+
+// Values for the timer status field.
+const (
+ // Timer has no status set yet.
+ timerNoStatus = iota
+
+ // Waiting for timer to fire.
+ // The timer is in some P's heap.
+ timerWaiting
+
+ // Running the timer function.
+ // A timer will only have this status briefly.
+ timerRunning
+
+ // The timer is deleted and should be removed.
+ // It should not be run, but it is still in some P's heap.
+ timerDeleted
+
+ // The timer is being removed.
+ // The timer will only have this status briefly.
+ timerRemoving
+
+ // The timer has been stopped.
+ // It is not in any P's heap.
+ timerRemoved
+
+ // The timer is being modified.
+ // The timer will only have this status briefly.
+ timerModifying
+
+ // The timer has been modified to an earlier time.
+ // The new when value is in the nextwhen field.
+ // The timer is in some P's heap, possibly in the wrong place.
+ timerModifiedEarlier
+
+ // The timer has been modified to the same or a later time.
+ // The new when value is in the nextwhen field.
+ // The timer is in some P's heap, possibly in the wrong place.
+ timerModifiedLater
+
+ // The timer has been modified and is being moved.
+ // The timer will only have this status briefly.
+ timerMoving
+)
+
+// maxWhen is the maximum value for timer's when field.
+const maxWhen = 1<<63 - 1
+
+// verifyTimers can be set to true to add debugging checks that the
+// timer heaps are valid.
+const verifyTimers = false
+
+// Package time APIs.
+// Godoc uses the comments in package time, not these.
+
+// time.now is implemented in assembly.
+
+// timeSleep puts the current goroutine to sleep for at least ns nanoseconds.
+//go:linkname timeSleep time.Sleep
+func timeSleep(ns int64) {
+ if ns <= 0 {
+ return
+ }
+
+ gp := getg()
+ t := gp.timer
+ if t == nil {
+ t = new(timer)
+ gp.timer = t
+ }
+ t.f = goroutineReady
+ t.arg = gp
+ t.nextwhen = nanotime() + ns
+ if t.nextwhen < 0 { // check for overflow.
+ t.nextwhen = maxWhen
+ }
+ gopark(resetForSleep, unsafe.Pointer(t), waitReasonSleep, traceEvGoSleep, 1)
+}
+
+// resetForSleep is called after the goroutine is parked for timeSleep.
+// We can't call resettimer in timeSleep itself because if this is a short
+// sleep and there are many goroutines then the P can wind up running the
+// timer function, goroutineReady, before the goroutine has been parked.
+func resetForSleep(gp *g, ut unsafe.Pointer) bool {
+ t := (*timer)(ut)
+ resettimer(t, t.nextwhen)
+ return true
+}
+
+// startTimer adds t to the timer heap.
+//go:linkname startTimer time.startTimer
+func startTimer(t *timer) {
+ if raceenabled {
+ racerelease(unsafe.Pointer(t))
+ }
+ addtimer(t)
+}
+
+// stopTimer stops a timer.
+// It reports whether t was stopped before being run.
+//go:linkname stopTimer time.stopTimer
+func stopTimer(t *timer) bool {
+ return deltimer(t)
+}
+
+// resetTimer resets an inactive timer, adding it to the heap.
+//go:linkname resetTimer time.resetTimer
+// Reports whether the timer was modified before it was run.
+func resetTimer(t *timer, when int64) bool {
+ if raceenabled {
+ racerelease(unsafe.Pointer(t))
+ }
+ return resettimer(t, when)
+}
+
+// modTimer modifies an existing timer.
+//go:linkname modTimer time.modTimer
+func modTimer(t *timer, when, period int64, f func(interface{}, uintptr), arg interface{}, seq uintptr) {
+ modtimer(t, when, period, f, arg, seq)
+}
+
+// Go runtime.
+
+// Ready the goroutine arg.
+func goroutineReady(arg interface{}, seq uintptr) {
+ goready(arg.(*g), 0)
+}
+
+// addtimer adds a timer to the current P.
+// This should only be called with a newly created timer.
+// That avoids the risk of changing the when field of a timer in some P's heap,
+// which could cause the heap to become unsorted.
+func addtimer(t *timer) {
+ // when must be positive. A negative value will cause runtimer to
+ // overflow during its delta calculation and never expire other runtime
+ // timers. Zero will cause checkTimers to fail to notice the timer.
+ if t.when <= 0 {
+ throw("timer when must be positive")
+ }
+ if t.period < 0 {
+ throw("timer period must be non-negative")
+ }
+ if t.status != timerNoStatus {
+ throw("addtimer called with initialized timer")
+ }
+ t.status = timerWaiting
+
+ when := t.when
+
+ // Disable preemption while using pp to avoid changing another P's heap.
+ mp := acquirem()
+
+ pp := getg().m.p.ptr()
+ lock(&pp.timersLock)
+ cleantimers(pp)
+ doaddtimer(pp, t)
+ unlock(&pp.timersLock)
+
+ wakeNetPoller(when)
+
+ releasem(mp)
+}
+
+// doaddtimer adds t to the heap of pp.
+// The caller must have locked the timers for pp.
+func doaddtimer(pp *p, t *timer) {
+ // Timers rely on the network poller, so make sure the poller
+ // has started.
+ if netpollInited == 0 {
+ netpollGenericInit()
+ }
+
+ if t.pp != 0 {
+ throw("doaddtimer: P already set in timer")
+ }
+ t.pp.set(pp)
+ i := len(pp.timers)
+ pp.timers = append(pp.timers, t)
+ siftupTimer(pp.timers, i)
+ if t == pp.timers[0] {
+ atomic.Store64(&pp.timer0When, uint64(t.when))
+ }
+ atomic.Xadd(&pp.numTimers, 1)
+}
+
+// deltimer deletes the timer t. It may be on some other P, so we can't
+// actually remove it from the timers heap. We can only mark it as deleted.
+// It will be removed in due course by the P whose heap it is on.
+// Reports whether the timer was removed before it was run.
+func deltimer(t *timer) bool {
+ for {
+ switch s := atomic.Load(&t.status); s {
+ case timerWaiting, timerModifiedLater:
+ // Prevent preemption while the timer is in timerModifying.
+ // This could lead to a self-deadlock. See #38070.
+ mp := acquirem()
+ if atomic.Cas(&t.status, s, timerModifying) {
+ // Must fetch t.pp before changing status,
+ // as cleantimers in another goroutine
+ // can clear t.pp of a timerDeleted timer.
+ tpp := t.pp.ptr()
+ if !atomic.Cas(&t.status, timerModifying, timerDeleted) {
+ badTimer()
+ }
+ releasem(mp)
+ atomic.Xadd(&tpp.deletedTimers, 1)
+ // Timer was not yet run.
+ return true
+ } else {
+ releasem(mp)
+ }
+ case timerModifiedEarlier:
+ // Prevent preemption while the timer is in timerModifying.
+ // This could lead to a self-deadlock. See #38070.
+ mp := acquirem()
+ if atomic.Cas(&t.status, s, timerModifying) {
+ // Must fetch t.pp before setting status
+ // to timerDeleted.
+ tpp := t.pp.ptr()
+ if !atomic.Cas(&t.status, timerModifying, timerDeleted) {
+ badTimer()
+ }
+ releasem(mp)
+ atomic.Xadd(&tpp.deletedTimers, 1)
+ // Timer was not yet run.
+ return true
+ } else {
+ releasem(mp)
+ }
+ case timerDeleted, timerRemoving, timerRemoved:
+ // Timer was already run.
+ return false
+ case timerRunning, timerMoving:
+ // The timer is being run or moved, by a different P.
+ // Wait for it to complete.
+ osyield()
+ case timerNoStatus:
+ // Removing timer that was never added or
+ // has already been run. Also see issue 21874.
+ return false
+ case timerModifying:
+ // Simultaneous calls to deltimer and modtimer.
+ // Wait for the other call to complete.
+ osyield()
+ default:
+ badTimer()
+ }
+ }
+}
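+
+// The loop above is typical of how this file manipulates timer status: every
+// transition goes through an atomic Cas, transient states (timerModifying,
+// timerRemoving, timerMoving) are held only briefly, and a goroutine that
+// observes a transient state simply calls osyield and retries. A minimal
+// sketch of the pattern, with hypothetical state names:
+//
+//	for {
+//		switch s := atomic.Load(&t.status); s {
+//		case stateStable:
+//			if atomic.Cas(&t.status, s, stateBusy) {
+//				// ... do the work ...
+//				if !atomic.Cas(&t.status, stateBusy, stateDone) {
+//					badTimer()
+//				}
+//				return
+//			}
+//		case stateBusy:
+//			osyield()
+//		}
+//	}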
+
+// dodeltimer removes timer i from the current P's heap.
+// We are locked on the P when this is called.
+// It returns the smallest changed index in pp.timers.
+// The caller must have locked the timers for pp.
+func dodeltimer(pp *p, i int) int {
+ if t := pp.timers[i]; t.pp.ptr() != pp {
+ throw("dodeltimer: wrong P")
+ } else {
+ t.pp = 0
+ }
+ last := len(pp.timers) - 1
+ if i != last {
+ pp.timers[i] = pp.timers[last]
+ }
+ pp.timers[last] = nil
+ pp.timers = pp.timers[:last]
+ smallestChanged := i
+ if i != last {
+ // Moving to i may have moved the last timer to a new parent,
+ // so sift up to preserve the heap guarantee.
+ smallestChanged = siftupTimer(pp.timers, i)
+ siftdownTimer(pp.timers, i)
+ }
+ if i == 0 {
+ updateTimer0When(pp)
+ }
+ atomic.Xadd(&pp.numTimers, -1)
+ return smallestChanged
+}
+
+// dodeltimer0 removes timer 0 from the current P's heap.
+// We are locked on the P when this is called.
+// The caller must have locked the timers for pp.
+func dodeltimer0(pp *p) {
+ if t := pp.timers[0]; t.pp.ptr() != pp {
+ throw("dodeltimer0: wrong P")
+ } else {
+ t.pp = 0
+ }
+ last := len(pp.timers) - 1
+ if last > 0 {
+ pp.timers[0] = pp.timers[last]
+ }
+ pp.timers[last] = nil
+ pp.timers = pp.timers[:last]
+ if last > 0 {
+ siftdownTimer(pp.timers, 0)
+ }
+ updateTimer0When(pp)
+ atomic.Xadd(&pp.numTimers, -1)
+}
+
+// modtimer modifies an existing timer.
+// This is called by the netpoll code or time.Ticker.Reset or time.Timer.Reset.
+// Reports whether the timer was modified before it was run.
+func modtimer(t *timer, when, period int64, f func(interface{}, uintptr), arg interface{}, seq uintptr) bool {
+ if when <= 0 {
+ throw("timer when must be positive")
+ }
+ if period < 0 {
+ throw("timer period must be non-negative")
+ }
+
+ status := uint32(timerNoStatus)
+ wasRemoved := false
+ var pending bool
+ var mp *m
+loop:
+ for {
+ switch status = atomic.Load(&t.status); status {
+ case timerWaiting, timerModifiedEarlier, timerModifiedLater:
+ // Prevent preemption while the timer is in timerModifying.
+ // This could lead to a self-deadlock. See #38070.
+ mp = acquirem()
+ if atomic.Cas(&t.status, status, timerModifying) {
+ pending = true // timer not yet run
+ break loop
+ }
+ releasem(mp)
+ case timerNoStatus, timerRemoved:
+ // Prevent preemption while the timer is in timerModifying.
+ // This could lead to a self-deadlock. See #38070.
+ mp = acquirem()
+
+ // Timer was already run and t is no longer in a heap.
+ // Act like addtimer.
+ if atomic.Cas(&t.status, status, timerModifying) {
+ wasRemoved = true
+ pending = false // timer already run or stopped
+ break loop
+ }
+ releasem(mp)
+ case timerDeleted:
+ // Prevent preemption while the timer is in timerModifying.
+ // This could lead to a self-deadlock. See #38070.
+ mp = acquirem()
+ if atomic.Cas(&t.status, status, timerModifying) {
+ atomic.Xadd(&t.pp.ptr().deletedTimers, -1)
+ pending = false // timer already stopped
+ break loop
+ }
+ releasem(mp)
+ case timerRunning, timerRemoving, timerMoving:
+ // The timer is being run or moved, by a different P.
+ // Wait for it to complete.
+ osyield()
+ case timerModifying:
+ // Multiple simultaneous calls to modtimer.
+ // Wait for the other call to complete.
+ osyield()
+ default:
+ badTimer()
+ }
+ }
+
+ t.period = period
+ t.f = f
+ t.arg = arg
+ t.seq = seq
+
+ if wasRemoved {
+ t.when = when
+ pp := getg().m.p.ptr()
+ lock(&pp.timersLock)
+ doaddtimer(pp, t)
+ unlock(&pp.timersLock)
+ if !atomic.Cas(&t.status, timerModifying, timerWaiting) {
+ badTimer()
+ }
+ releasem(mp)
+ wakeNetPoller(when)
+ } else {
+ // The timer is in some other P's heap, so we can't change
+ // the when field. If we did, the other P's heap would
+ // be out of order. So we put the new when value in the
+ // nextwhen field, and let the other P set the when field
+		// when it is prepared to re-sort the heap.
+ t.nextwhen = when
+
+ newStatus := uint32(timerModifiedLater)
+ if when < t.when {
+ newStatus = timerModifiedEarlier
+ }
+
+ tpp := t.pp.ptr()
+
+ if newStatus == timerModifiedEarlier {
+ updateTimerModifiedEarliest(tpp, when)
+ }
+
+ // Set the new status of the timer.
+ if !atomic.Cas(&t.status, timerModifying, newStatus) {
+ badTimer()
+ }
+ releasem(mp)
+
+ // If the new status is earlier, wake up the poller.
+ if newStatus == timerModifiedEarlier {
+ wakeNetPoller(when)
+ }
+ }
+
+ return pending
+}
+
+// resettimer resets the time when a timer should fire.
+// If used for an inactive timer, the timer will become active.
+// This should be called instead of addtimer if the timer value has been,
+// or may have been, used previously.
+// Reports whether the timer was modified before it was run.
+func resettimer(t *timer, when int64) bool {
+ return modtimer(t, when, t.period, t.f, t.arg, t.seq)
+}
+
+// cleantimers cleans up the head of the timer queue. This speeds up
+// programs that create and delete timers; leaving them in the heap
+// slows down addtimer.
+// The caller must have locked the timers for pp.
+func cleantimers(pp *p) {
+ gp := getg()
+ for {
+ if len(pp.timers) == 0 {
+ return
+ }
+
+ // This loop can theoretically run for a while, and because
+ // it is holding timersLock it cannot be preempted.
+ // If someone is trying to preempt us, just return.
+ // We can clean the timers later.
+ if gp.preemptStop {
+ return
+ }
+
+ t := pp.timers[0]
+ if t.pp.ptr() != pp {
+ throw("cleantimers: bad p")
+ }
+ switch s := atomic.Load(&t.status); s {
+ case timerDeleted:
+ if !atomic.Cas(&t.status, s, timerRemoving) {
+ continue
+ }
+ dodeltimer0(pp)
+ if !atomic.Cas(&t.status, timerRemoving, timerRemoved) {
+ badTimer()
+ }
+ atomic.Xadd(&pp.deletedTimers, -1)
+ case timerModifiedEarlier, timerModifiedLater:
+ if !atomic.Cas(&t.status, s, timerMoving) {
+ continue
+ }
+ // Now we can change the when field.
+ t.when = t.nextwhen
+ // Move t to the right position.
+ dodeltimer0(pp)
+ doaddtimer(pp, t)
+ if !atomic.Cas(&t.status, timerMoving, timerWaiting) {
+ badTimer()
+ }
+ default:
+ // Head of timers does not need adjustment.
+ return
+ }
+ }
+}
+
+// moveTimers moves a slice of timers to pp. The slice has been taken
+// from a different P.
+// This is currently called when the world is stopped, but the caller
+// is expected to have locked the timers for pp.
+func moveTimers(pp *p, timers []*timer) {
+ for _, t := range timers {
+ loop:
+ for {
+ switch s := atomic.Load(&t.status); s {
+ case timerWaiting:
+ if !atomic.Cas(&t.status, s, timerMoving) {
+ continue
+ }
+ t.pp = 0
+ doaddtimer(pp, t)
+ if !atomic.Cas(&t.status, timerMoving, timerWaiting) {
+ badTimer()
+ }
+ break loop
+ case timerModifiedEarlier, timerModifiedLater:
+ if !atomic.Cas(&t.status, s, timerMoving) {
+ continue
+ }
+ t.when = t.nextwhen
+ t.pp = 0
+ doaddtimer(pp, t)
+ if !atomic.Cas(&t.status, timerMoving, timerWaiting) {
+ badTimer()
+ }
+ break loop
+ case timerDeleted:
+ if !atomic.Cas(&t.status, s, timerRemoved) {
+ continue
+ }
+ t.pp = 0
+ // We no longer need this timer in the heap.
+ break loop
+ case timerModifying:
+ // Loop until the modification is complete.
+ osyield()
+ case timerNoStatus, timerRemoved:
+ // We should not see these status values in a timers heap.
+ badTimer()
+ case timerRunning, timerRemoving, timerMoving:
+ // Some other P thinks it owns this timer,
+ // which should not happen.
+ badTimer()
+ default:
+ badTimer()
+ }
+ }
+ }
+}
+
+// adjusttimers looks through the timers in the current P's heap for
+// any timers that have been modified to run earlier, and puts them in
+// the correct place in the heap. While looking for those timers,
+// it also moves timers that have been modified to run later,
+// and removes deleted timers. The caller must have locked the timers for pp.
+func adjusttimers(pp *p, now int64) {
+ // If we haven't yet reached the time of the first timerModifiedEarlier
+ // timer, don't do anything. This speeds up programs that adjust
+ // a lot of timers back and forth if the timers rarely expire.
+ // We'll postpone looking through all the adjusted timers until
+ // one would actually expire.
+ first := atomic.Load64(&pp.timerModifiedEarliest)
+ if first == 0 || int64(first) > now {
+ if verifyTimers {
+ verifyTimerHeap(pp)
+ }
+ return
+ }
+
+ // We are going to clear all timerModifiedEarlier timers.
+ atomic.Store64(&pp.timerModifiedEarliest, 0)
+
+ var moved []*timer
+ for i := 0; i < len(pp.timers); i++ {
+ t := pp.timers[i]
+ if t.pp.ptr() != pp {
+ throw("adjusttimers: bad p")
+ }
+ switch s := atomic.Load(&t.status); s {
+ case timerDeleted:
+ if atomic.Cas(&t.status, s, timerRemoving) {
+ changed := dodeltimer(pp, i)
+ if !atomic.Cas(&t.status, timerRemoving, timerRemoved) {
+ badTimer()
+ }
+ atomic.Xadd(&pp.deletedTimers, -1)
+ // Go back to the earliest changed heap entry.
+ // "- 1" because the loop will add 1.
+ i = changed - 1
+ }
+ case timerModifiedEarlier, timerModifiedLater:
+ if atomic.Cas(&t.status, s, timerMoving) {
+ // Now we can change the when field.
+ t.when = t.nextwhen
+ // Take t off the heap, and hold onto it.
+ // We don't add it back yet because the
+ // heap manipulation could cause our
+ // loop to skip some other timer.
+ changed := dodeltimer(pp, i)
+ moved = append(moved, t)
+ // Go back to the earliest changed heap entry.
+ // "- 1" because the loop will add 1.
+ i = changed - 1
+ }
+ case timerNoStatus, timerRunning, timerRemoving, timerRemoved, timerMoving:
+ badTimer()
+ case timerWaiting:
+ // OK, nothing to do.
+ case timerModifying:
+ // Check again after modification is complete.
+ osyield()
+ i--
+ default:
+ badTimer()
+ }
+ }
+
+ if len(moved) > 0 {
+ addAdjustedTimers(pp, moved)
+ }
+
+ if verifyTimers {
+ verifyTimerHeap(pp)
+ }
+}
+
+// addAdjustedTimers adds any timers we adjusted in adjusttimers
+// back to the timer heap.
+func addAdjustedTimers(pp *p, moved []*timer) {
+ for _, t := range moved {
+ doaddtimer(pp, t)
+ if !atomic.Cas(&t.status, timerMoving, timerWaiting) {
+ badTimer()
+ }
+ }
+}
+
+// nobarrierWakeTime looks at P's timers and returns the time when we
+// should wake up the netpoller. It returns 0 if there are no timers.
+// This function is invoked when dropping a P, and must run without
+// any write barriers.
+//go:nowritebarrierrec
+func nobarrierWakeTime(pp *p) int64 {
+ next := int64(atomic.Load64(&pp.timer0When))
+ nextAdj := int64(atomic.Load64(&pp.timerModifiedEarliest))
+ if next == 0 || (nextAdj != 0 && nextAdj < next) {
+ next = nextAdj
+ }
+ return next
+}
+
+// runtimer examines the first timer in timers. If it is ready based on now,
+// it runs the timer and removes or updates it.
+// Returns 0 if it ran a timer, -1 if there are no more timers, or the time
+// when the first timer should run.
+// The caller must have locked the timers for pp.
+// If a timer is run, this will temporarily unlock the timers.
+//go:systemstack
+func runtimer(pp *p, now int64) int64 {
+ for {
+ t := pp.timers[0]
+ if t.pp.ptr() != pp {
+ throw("runtimer: bad p")
+ }
+ switch s := atomic.Load(&t.status); s {
+ case timerWaiting:
+ if t.when > now {
+ // Not ready to run.
+ return t.when
+ }
+
+ if !atomic.Cas(&t.status, s, timerRunning) {
+ continue
+ }
+ // Note that runOneTimer may temporarily unlock
+ // pp.timersLock.
+ runOneTimer(pp, t, now)
+ return 0
+
+ case timerDeleted:
+ if !atomic.Cas(&t.status, s, timerRemoving) {
+ continue
+ }
+ dodeltimer0(pp)
+ if !atomic.Cas(&t.status, timerRemoving, timerRemoved) {
+ badTimer()
+ }
+ atomic.Xadd(&pp.deletedTimers, -1)
+ if len(pp.timers) == 0 {
+ return -1
+ }
+
+ case timerModifiedEarlier, timerModifiedLater:
+ if !atomic.Cas(&t.status, s, timerMoving) {
+ continue
+ }
+ t.when = t.nextwhen
+ dodeltimer0(pp)
+ doaddtimer(pp, t)
+ if !atomic.Cas(&t.status, timerMoving, timerWaiting) {
+ badTimer()
+ }
+
+ case timerModifying:
+ // Wait for modification to complete.
+ osyield()
+
+ case timerNoStatus, timerRemoved:
+ // Should not see a new or inactive timer on the heap.
+ badTimer()
+ case timerRunning, timerRemoving, timerMoving:
+ // These should only be set when timers are locked,
+ // and we didn't do it.
+ badTimer()
+ default:
+ badTimer()
+ }
+ }
+}
+
+// runOneTimer runs a single timer.
+// The caller must have locked the timers for pp.
+// This will temporarily unlock the timers while running the timer function.
+//go:systemstack
+func runOneTimer(pp *p, t *timer, now int64) {
+ if raceenabled {
+ ppcur := getg().m.p.ptr()
+ if ppcur.timerRaceCtx == 0 {
+ ppcur.timerRaceCtx = racegostart(funcPC(runtimer) + sys.PCQuantum)
+ }
+ raceacquirectx(ppcur.timerRaceCtx, unsafe.Pointer(t))
+ }
+
+ f := t.f
+ arg := t.arg
+ seq := t.seq
+
+ if t.period > 0 {
+ // Leave in heap but adjust next time to fire.
+ delta := t.when - now
+ t.when += t.period * (1 + -delta/t.period)
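+		// The line above advances t.when past now by a whole number of
+		// periods, skipping firings that were missed while the timer sat
+		// in the heap. Worked example (illustrative numbers only):
+		// t.when = 100, now = 125, t.period = 10 gives delta = -25,
+		// -delta/t.period = 2, so t.when becomes 100 + 10*3 = 130.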
+ if t.when < 0 { // check for overflow.
+ t.when = maxWhen
+ }
+ siftdownTimer(pp.timers, 0)
+ if !atomic.Cas(&t.status, timerRunning, timerWaiting) {
+ badTimer()
+ }
+ updateTimer0When(pp)
+ } else {
+ // Remove from heap.
+ dodeltimer0(pp)
+ if !atomic.Cas(&t.status, timerRunning, timerNoStatus) {
+ badTimer()
+ }
+ }
+
+ if raceenabled {
+ // Temporarily use the current P's racectx for g0.
+ gp := getg()
+ if gp.racectx != 0 {
+ throw("runOneTimer: unexpected racectx")
+ }
+ gp.racectx = gp.m.p.ptr().timerRaceCtx
+ }
+
+ unlock(&pp.timersLock)
+
+ f(arg, seq)
+
+ lock(&pp.timersLock)
+
+ if raceenabled {
+ gp := getg()
+ gp.racectx = 0
+ }
+}
+
+// clearDeletedTimers removes all deleted timers from the P's timer heap.
+// This is used to avoid clogging up the heap if the program
+// starts a lot of long-running timers and then stops them.
+// For example, this can happen via context.WithTimeout.
+//
+// This is the only function that walks through the entire timer heap,
+// other than moveTimers which only runs when the world is stopped.
+//
+// The caller must have locked the timers for pp.
+func clearDeletedTimers(pp *p) {
+ // We are going to clear all timerModifiedEarlier timers.
+ // Do this now in case new ones show up while we are looping.
+ atomic.Store64(&pp.timerModifiedEarliest, 0)
+
+ cdel := int32(0)
+ to := 0
+ changedHeap := false
+ timers := pp.timers
+nextTimer:
+ for _, t := range timers {
+ for {
+ switch s := atomic.Load(&t.status); s {
+ case timerWaiting:
+ if changedHeap {
+ timers[to] = t
+ siftupTimer(timers, to)
+ }
+ to++
+ continue nextTimer
+ case timerModifiedEarlier, timerModifiedLater:
+ if atomic.Cas(&t.status, s, timerMoving) {
+ t.when = t.nextwhen
+ timers[to] = t
+ siftupTimer(timers, to)
+ to++
+ changedHeap = true
+ if !atomic.Cas(&t.status, timerMoving, timerWaiting) {
+ badTimer()
+ }
+ continue nextTimer
+ }
+ case timerDeleted:
+ if atomic.Cas(&t.status, s, timerRemoving) {
+ t.pp = 0
+ cdel++
+ if !atomic.Cas(&t.status, timerRemoving, timerRemoved) {
+ badTimer()
+ }
+ changedHeap = true
+ continue nextTimer
+ }
+ case timerModifying:
+ // Loop until modification complete.
+ osyield()
+ case timerNoStatus, timerRemoved:
+ // We should not see these status values in a timer heap.
+ badTimer()
+ case timerRunning, timerRemoving, timerMoving:
+ // Some other P thinks it owns this timer,
+ // which should not happen.
+ badTimer()
+ default:
+ badTimer()
+ }
+ }
+ }
+
+ // Set remaining slots in timers slice to nil,
+ // so that the timer values can be garbage collected.
+ for i := to; i < len(timers); i++ {
+ timers[i] = nil
+ }
+
+ atomic.Xadd(&pp.deletedTimers, -cdel)
+ atomic.Xadd(&pp.numTimers, -cdel)
+
+ timers = timers[:to]
+ pp.timers = timers
+ updateTimer0When(pp)
+
+ if verifyTimers {
+ verifyTimerHeap(pp)
+ }
+}
+
+// verifyTimerHeap verifies that the timer heap is in a valid state.
+// This is only for debugging, and is only called if verifyTimers is true.
+// The caller must have locked the timers.
+func verifyTimerHeap(pp *p) {
+ for i, t := range pp.timers {
+ if i == 0 {
+ // First timer has no parent.
+ continue
+ }
+
+ // The heap is 4-ary. See siftupTimer and siftdownTimer.
+ p := (i - 1) / 4
+ if t.when < pp.timers[p].when {
+ print("bad timer heap at ", i, ": ", p, ": ", pp.timers[p].when, ", ", i, ": ", t.when, "\n")
+ throw("bad timer heap")
+ }
+ }
+ if numTimers := int(atomic.Load(&pp.numTimers)); len(pp.timers) != numTimers {
+ println("timer heap len", len(pp.timers), "!= numTimers", numTimers)
+ throw("bad timer heap len")
+ }
+}
+
+// updateTimer0When sets the P's timer0When field.
+// The caller must have locked the timers for pp.
+func updateTimer0When(pp *p) {
+ if len(pp.timers) == 0 {
+ atomic.Store64(&pp.timer0When, 0)
+ } else {
+ atomic.Store64(&pp.timer0When, uint64(pp.timers[0].when))
+ }
+}
+
+// updateTimerModifiedEarliest updates pp.timerModifiedEarliest, the
+// earliest known nextwhen of a timerModifiedEarlier timer on pp, if
+// nextwhen is earlier than the currently recorded value.
+// The timers for pp will not be locked.
+func updateTimerModifiedEarliest(pp *p, nextwhen int64) {
+ for {
+ old := atomic.Load64(&pp.timerModifiedEarliest)
+ if old != 0 && int64(old) < nextwhen {
+ return
+ }
+ if atomic.Cas64(&pp.timerModifiedEarliest, old, uint64(nextwhen)) {
+ return
+ }
+ }
+}
+
+// timeSleepUntil returns the time when the next timer should fire,
+// and the P that holds the timer heap that that timer is on.
+// This is only called by sysmon and checkdead.
+func timeSleepUntil() (int64, *p) {
+ next := int64(maxWhen)
+ var pret *p
+
+ // Prevent allp slice changes. This is like retake.
+ lock(&allpLock)
+ for _, pp := range allp {
+ if pp == nil {
+ // This can happen if procresize has grown
+ // allp but not yet created new Ps.
+ continue
+ }
+
+ w := int64(atomic.Load64(&pp.timer0When))
+ if w != 0 && w < next {
+ next = w
+ pret = pp
+ }
+
+ w = int64(atomic.Load64(&pp.timerModifiedEarliest))
+ if w != 0 && w < next {
+ next = w
+ pret = pp
+ }
+ }
+ unlock(&allpLock)
+
+ return next, pret
+}
+
+// Heap maintenance algorithms.
+// These algorithms check for slice index errors manually.
+// Slice index errors can happen if the program is using racy
+// access to timers. We don't want to panic here, because
+// it will cause the program to crash with a mysterious
+// "panic holding locks" message. Instead, we panic while not
+// holding a lock.
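+//
+// For reference, the heap is 4-ary, so the index arithmetic used by
+// siftupTimer, siftdownTimer, and verifyTimerHeap is
+//
+//	parent(i)   = (i - 1) / 4
+//	children(i) = 4*i + 1, ..., 4*i + 4
+//
+// For example, index 0 has children 1 through 4, and index 6 has parent 1.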
+
+// siftupTimer puts the timer at position i in the right place
+// in the heap by moving it up toward the top of the heap.
+// It returns the smallest changed index.
+func siftupTimer(t []*timer, i int) int {
+ if i >= len(t) {
+ badTimer()
+ }
+ when := t[i].when
+ if when <= 0 {
+ badTimer()
+ }
+ tmp := t[i]
+ for i > 0 {
+ p := (i - 1) / 4 // parent
+ if when >= t[p].when {
+ break
+ }
+ t[i] = t[p]
+ i = p
+ }
+ if tmp != t[i] {
+ t[i] = tmp
+ }
+ return i
+}
+
+// siftdownTimer puts the timer at position i in the right place
+// in the heap by moving it down toward the bottom of the heap.
+func siftdownTimer(t []*timer, i int) {
+ n := len(t)
+ if i >= n {
+ badTimer()
+ }
+ when := t[i].when
+ if when <= 0 {
+ badTimer()
+ }
+ tmp := t[i]
+ for {
+ c := i*4 + 1 // left child
+ c3 := c + 2 // mid child
+ if c >= n {
+ break
+ }
+ w := t[c].when
+ if c+1 < n && t[c+1].when < w {
+ w = t[c+1].when
+ c++
+ }
+ if c3 < n {
+ w3 := t[c3].when
+ if c3+1 < n && t[c3+1].when < w3 {
+ w3 = t[c3+1].when
+ c3++
+ }
+ if w3 < w {
+ w = w3
+ c = c3
+ }
+ }
+ if w >= when {
+ break
+ }
+ t[i] = t[c]
+ i = c
+ }
+ if tmp != t[i] {
+ t[i] = tmp
+ }
+}
+
+// badTimer is called if the timer data structures have been corrupted,
+// presumably due to racy use by the program. We panic here rather than
+// panicking due to an invalid slice access while holding locks.
+// See issue #25686.
+func badTimer() {
+ throw("timer data corruption")
+}
diff --git a/src/runtime/time_fake.go b/src/runtime/time_fake.go
new file mode 100644
index 0000000..c64d299
--- /dev/null
+++ b/src/runtime/time_fake.go
@@ -0,0 +1,100 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build faketime
+// +build !windows
+
+// Faketime isn't currently supported on Windows. This would require:
+//
+// 1. Shadowing time_now, which is implemented in assembly on Windows.
+// Since that's exported directly to the time package from runtime
+// assembly, this would involve moving it from sys_windows_*.s into
+// its own assembly files build-tagged with !faketime and using the
+// implementation of time_now from timestub.go in faketime mode.
+//
+// 2. Modifying syscall.Write to call syscall.faketimeWrite,
+// translating the Stdout and Stderr handles into FDs 1 and 2.
+// (See CL 192739 PS 3.)
+
+package runtime
+
+import "unsafe"
+
+// faketime is the simulated time in nanoseconds since 1970 for the
+// playground.
+var faketime int64 = 1257894000000000000
+
+var faketimeState struct {
+ lock mutex
+
+ // lastfaketime is the last faketime value written to fd 1 or 2.
+ lastfaketime int64
+
+ // lastfd is the fd to which lastfaketime was written.
+ //
+ // Subsequent writes to the same fd may use the same
+ // timestamp, but the timestamp must increase if the fd
+ // changes.
+ lastfd uintptr
+}
+
+//go:nosplit
+func nanotime() int64 {
+ return faketime
+}
+
+func walltime() (sec int64, nsec int32) {
+ return faketime / 1000000000, int32(faketime % 1000000000)
+}
+
+func write(fd uintptr, p unsafe.Pointer, n int32) int32 {
+ if !(fd == 1 || fd == 2) {
+ // Do an ordinary write.
+ return write1(fd, p, n)
+ }
+
+ // Write with the playback header.
+
+ // First, lock to avoid interleaving writes.
+ lock(&faketimeState.lock)
+
+ // If the current fd doesn't match the fd of the previous write,
+ // ensure that the timestamp is strictly greater. That way, we can
+ // recover the original order even if we read the fds separately.
+ t := faketimeState.lastfaketime
+ if fd != faketimeState.lastfd {
+ t++
+ faketimeState.lastfd = fd
+ }
+ if faketime > t {
+ t = faketime
+ }
+ faketimeState.lastfaketime = t
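+	// For example (illustrative numbers): if the previous write went to
+	// fd 1 at t=100 and faketime is still 100, a write to fd 2 is stamped
+	// 101, so a reader merging the two streams can recover the order.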
+
+ // Playback header: 0 0 P B <8-byte time> <4-byte data length> (big endian)
+ var buf [4 + 8 + 4]byte
+ buf[2] = 'P'
+ buf[3] = 'B'
+ tu := uint64(t)
+ buf[4] = byte(tu >> (7 * 8))
+ buf[5] = byte(tu >> (6 * 8))
+ buf[6] = byte(tu >> (5 * 8))
+ buf[7] = byte(tu >> (4 * 8))
+ buf[8] = byte(tu >> (3 * 8))
+ buf[9] = byte(tu >> (2 * 8))
+ buf[10] = byte(tu >> (1 * 8))
+ buf[11] = byte(tu >> (0 * 8))
+ nu := uint32(n)
+ buf[12] = byte(nu >> (3 * 8))
+ buf[13] = byte(nu >> (2 * 8))
+ buf[14] = byte(nu >> (1 * 8))
+ buf[15] = byte(nu >> (0 * 8))
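+	// (The shifts above are equivalent to binary.BigEndian.PutUint64(buf[4:12], tu)
+	// and binary.BigEndian.PutUint32(buf[12:16], nu); the runtime cannot import
+	// encoding/binary, hence the spelled-out form.)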
+ write1(fd, unsafe.Pointer(&buf[0]), int32(len(buf)))
+
+ // Write actual data.
+ res := write1(fd, p, n)
+
+ unlock(&faketimeState.lock)
+ return res
+}
diff --git a/src/runtime/time_nofake.go b/src/runtime/time_nofake.go
new file mode 100644
index 0000000..1912a94
--- /dev/null
+++ b/src/runtime/time_nofake.go
@@ -0,0 +1,31 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !faketime
+
+package runtime
+
+import "unsafe"
+
+// faketime is the simulated time in nanoseconds since 1970 for the
+// playground.
+//
+// Zero means not to use faketime.
+var faketime int64
+
+//go:nosplit
+func nanotime() int64 {
+ return nanotime1()
+}
+
+func walltime() (sec int64, nsec int32) {
+ return walltime1()
+}
+
+// write must be nosplit on Windows (see write1)
+//
+//go:nosplit
+func write(fd uintptr, p unsafe.Pointer, n int32) int32 {
+ return write1(fd, p, n)
+}
diff --git a/src/runtime/time_test.go b/src/runtime/time_test.go
new file mode 100644
index 0000000..afd9af2
--- /dev/null
+++ b/src/runtime/time_test.go
@@ -0,0 +1,97 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "bytes"
+ "encoding/binary"
+ "errors"
+ "internal/testenv"
+ "os/exec"
+ "reflect"
+ "runtime"
+ "testing"
+)
+
+func TestFakeTime(t *testing.T) {
+ if runtime.GOOS == "windows" {
+ t.Skip("faketime not supported on windows")
+ }
+
+ // Faketime is advanced in checkdead. External linking brings in cgo,
+// which prevents checkdead from working.
+ testenv.MustInternalLink(t)
+
+ t.Parallel()
+
+ exe, err := buildTestProg(t, "testfaketime", "-tags=faketime")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ var stdout, stderr bytes.Buffer
+ cmd := exec.Command(exe)
+ cmd.Stdout = &stdout
+ cmd.Stderr = &stderr
+
+ err = testenv.CleanCmdEnv(cmd).Run()
+ if err != nil {
+ t.Fatalf("exit status: %v\n%s", err, stderr.String())
+ }
+
+ t.Logf("raw stdout: %q", stdout.String())
+ t.Logf("raw stderr: %q", stderr.String())
+
+ f1, err1 := parseFakeTime(stdout.Bytes())
+ if err1 != nil {
+ t.Fatal(err1)
+ }
+ f2, err2 := parseFakeTime(stderr.Bytes())
+ if err2 != nil {
+ t.Fatal(err2)
+ }
+
+ const time0 = 1257894000000000000
+ got := [][]fakeTimeFrame{f1, f2}
+ var want = [][]fakeTimeFrame{{
+ {time0 + 1, "line 2\n"},
+ {time0 + 1, "line 3\n"},
+ {time0 + 1e9, "line 5\n"},
+ {time0 + 1e9, "2009-11-10T23:00:01Z"},
+ }, {
+ {time0, "line 1\n"},
+ {time0 + 2, "line 4\n"},
+ }}
+ if !reflect.DeepEqual(want, got) {
+ t.Fatalf("want %v, got %v", want, got)
+ }
+}
+
+type fakeTimeFrame struct {
+ time uint64
+ data string
+}
+
+func parseFakeTime(x []byte) ([]fakeTimeFrame, error) {
+ var frames []fakeTimeFrame
+ for len(x) != 0 {
+ if len(x) < 4+8+4 {
+ return nil, errors.New("truncated header")
+ }
+ const magic = "\x00\x00PB"
+ if string(x[:len(magic)]) != magic {
+ return nil, errors.New("bad magic")
+ }
+ x = x[len(magic):]
+ time := binary.BigEndian.Uint64(x)
+ x = x[8:]
+ dlen := binary.BigEndian.Uint32(x)
+ x = x[4:]
+ data := string(x[:dlen])
+ x = x[dlen:]
+ frames = append(frames, fakeTimeFrame{time, data})
+ }
+ return frames, nil
+}
diff --git a/src/runtime/timeasm.go b/src/runtime/timeasm.go
new file mode 100644
index 0000000..82cf63e
--- /dev/null
+++ b/src/runtime/timeasm.go
@@ -0,0 +1,14 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Declarations for operating systems implementing time.now directly in assembly.
+
+// +build windows
+
+package runtime
+
+import _ "unsafe"
+
+//go:linkname time_now time.now
+func time_now() (sec int64, nsec int32, mono int64)
diff --git a/src/runtime/timestub.go b/src/runtime/timestub.go
new file mode 100644
index 0000000..459bf8e
--- /dev/null
+++ b/src/runtime/timestub.go
@@ -0,0 +1,18 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Declarations for operating systems implementing time.now
+// indirectly, in terms of walltime and nanotime assembly.
+
+// +build !windows
+
+package runtime
+
+import _ "unsafe" // for go:linkname
+
+//go:linkname time_now time.now
+func time_now() (sec int64, nsec int32, mono int64) {
+ sec, nsec = walltime()
+ return sec, nsec, nanotime()
+}
diff --git a/src/runtime/timestub2.go b/src/runtime/timestub2.go
new file mode 100644
index 0000000..68777ee
--- /dev/null
+++ b/src/runtime/timestub2.go
@@ -0,0 +1,14 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !aix
+// +build !darwin
+// +build !freebsd
+// +build !openbsd
+// +build !solaris
+// +build !windows
+
+package runtime
+
+func walltime1() (sec int64, nsec int32)
diff --git a/src/runtime/tls_arm.s b/src/runtime/tls_arm.s
new file mode 100644
index 0000000..e42de8d
--- /dev/null
+++ b/src/runtime/tls_arm.s
@@ -0,0 +1,98 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !windows
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// We have to resort to a TLS variable to save g (R10).
+// One reason is that external code might trigger
+// SIGSEGV, and our runtime.sigtramp doesn't even know we
+// are in external code and will continue to use R10;
+// this might well result in another SIGSEGV.
+// Note: both functions will clobber R0 and R11 and
+// can be called from 5c ABI code.
+
+// On android, runtime.tls_g is a normal variable.
+// TLS offset is computed in x_cgo_inittls.
+#ifdef GOOS_android
+#define TLSG_IS_VARIABLE
+#endif
+
+// save_g saves the g register into pthread-provided
+// thread-local memory, so that we can call externally compiled
+// ARM code that will overwrite those registers.
+// NOTE: runtime.gogo assumes that R1 is preserved by this function.
+// runtime.mcall assumes this function only clobbers R0 and R11.
+// Returns with g in R0.
+TEXT runtime·save_g(SB),NOSPLIT|NOFRAME,$0
+ // If the host does not support MRC the linker will replace it with
+ // a call to runtime.read_tls_fallback which jumps to __kuser_get_tls.
+ // The replacement function saves LR in R11 over the call to read_tls_fallback.
+ MRC 15, 0, R0, C13, C0, 3 // fetch TLS base pointer
+ BIC $3, R0 // Darwin/ARM might return unaligned pointer
+ MOVW runtime·tls_g(SB), R11
+ ADD R11, R0
+ MOVW g, 0(R0)
+ MOVW g, R0 // preserve R0 across call to setg<>
+ RET
+
+// load_g loads the g register from pthread-provided
+// thread-local memory, for use after calling externally compiled
+// ARM code that overwrote those registers.
+TEXT runtime·load_g(SB),NOSPLIT,$0
+ // See save_g
+ MRC 15, 0, R0, C13, C0, 3 // fetch TLS base pointer
+ BIC $3, R0 // Darwin/ARM might return unaligned pointer
+ MOVW runtime·tls_g(SB), R11
+ ADD R11, R0
+ MOVW 0(R0), g
+ RET
+
+// This is called from rt0_go, which runs on the system stack
+// using the initial stack allocated by the OS.
+// It calls back into standard C using the BL (R4) below.
+// To do that, the stack pointer must be 8-byte-aligned
+// on some systems, notably FreeBSD.
+// The ARM ABI says the stack pointer must be 8-byte-aligned
+// on entry to any function, but only FreeBSD's C library seems to care.
+// The caller was 8-byte aligned, but we push an LR.
+// Declare a dummy word ($4, not $0) to make sure the
+// frame is 8 bytes and stays 8-byte-aligned.
+TEXT runtime·_initcgo(SB),NOSPLIT,$4
+ // if there is an _cgo_init, call it.
+ MOVW _cgo_init(SB), R4
+ CMP $0, R4
+ B.EQ nocgo
+ MRC 15, 0, R0, C13, C0, 3 // load TLS base pointer
+ MOVW R0, R3 // arg 3: TLS base pointer
+#ifdef TLSG_IS_VARIABLE
+ MOVW $runtime·tls_g(SB), R2 // arg 2: &tls_g
+#else
+ MOVW $0, R2 // arg 2: not used when using platform tls
+#endif
+ MOVW $setg_gcc<>(SB), R1 // arg 1: setg
+ MOVW g, R0 // arg 0: G
+ BL (R4) // will clobber R0-R3
+nocgo:
+ RET
+
+// void setg_gcc(G*); set g called from gcc.
+TEXT setg_gcc<>(SB),NOSPLIT,$0
+ MOVW R0, g
+ B runtime·save_g(SB)
+
+#ifdef TLSG_IS_VARIABLE
+#ifdef GOOS_android
+// Use the free TLS_SLOT_APP slot #2 on Android Q.
+// Earlier androids are set up in gcc_android.c.
+DATA runtime·tls_g+0(SB)/4, $8
+#endif
+GLOBL runtime·tls_g+0(SB), NOPTR, $4
+#else
+GLOBL runtime·tls_g+0(SB), TLSBSS, $4
+#endif
diff --git a/src/runtime/tls_arm64.h b/src/runtime/tls_arm64.h
new file mode 100644
index 0000000..0804fa3
--- /dev/null
+++ b/src/runtime/tls_arm64.h
@@ -0,0 +1,48 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#ifdef GOOS_android
+#define TLS_linux
+#define TLSG_IS_VARIABLE
+#endif
+#ifdef GOOS_linux
+#define TLS_linux
+#endif
+#ifdef TLS_linux
+#define TPIDR TPIDR_EL0
+#define MRS_TPIDR_R0 WORD $0xd53bd040 // MRS TPIDR_EL0, R0
+#endif
+
+#ifdef GOOS_darwin
+#define TLS_darwin
+#endif
+#ifdef GOOS_ios
+#define TLS_darwin
+#endif
+#ifdef TLS_darwin
+#define TPIDR TPIDRRO_EL0
+#define TLSG_IS_VARIABLE
+#define MRS_TPIDR_R0 WORD $0xd53bd060 // MRS TPIDRRO_EL0, R0
+#endif
+
+#ifdef GOOS_freebsd
+#define TPIDR TPIDR_EL0
+#define MRS_TPIDR_R0 WORD $0xd53bd040 // MRS TPIDR_EL0, R0
+#endif
+
+#ifdef GOOS_netbsd
+#define TPIDR TPIDRRO_EL0
+#define MRS_TPIDR_R0 WORD $0xd53bd040 // MRS TPIDRRO_EL0, R0
+#endif
+
+#ifdef GOOS_openbsd
+#define TPIDR TPIDR_EL0
+#define MRS_TPIDR_R0 WORD $0xd53bd040 // MRS TPIDR_EL0, R0
+#endif
+
+// Define something that will break the build if
+// the GOOS is unknown.
+#ifndef TPIDR
+#define MRS_TPIDR_R0 TPIDR_UNKNOWN
+#endif
diff --git a/src/runtime/tls_arm64.s b/src/runtime/tls_arm64.s
new file mode 100644
index 0000000..085012f
--- /dev/null
+++ b/src/runtime/tls_arm64.s
@@ -0,0 +1,58 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+#include "tls_arm64.h"
+
+TEXT runtime·load_g(SB),NOSPLIT,$0
+#ifndef TLS_darwin
+#ifndef GOOS_openbsd
+ MOVB runtime·iscgo(SB), R0
+ CBZ R0, nocgo
+#endif
+#endif
+
+ MRS_TPIDR_R0
+#ifdef TLS_darwin
+ // Darwin sometimes returns unaligned pointers
+ AND $0xfffffffffffffff8, R0
+#endif
+ MOVD runtime·tls_g(SB), R27
+ MOVD (R0)(R27), g
+
+nocgo:
+ RET
+
+TEXT runtime·save_g(SB),NOSPLIT,$0
+#ifndef TLS_darwin
+#ifndef GOOS_openbsd
+ MOVB runtime·iscgo(SB), R0
+ CBZ R0, nocgo
+#endif
+#endif
+
+ MRS_TPIDR_R0
+#ifdef TLS_darwin
+ // Darwin sometimes returns unaligned pointers
+ AND $0xfffffffffffffff8, R0
+#endif
+ MOVD runtime·tls_g(SB), R27
+ MOVD g, (R0)(R27)
+
+nocgo:
+ RET
+
+#ifdef TLSG_IS_VARIABLE
+#ifdef GOOS_android
+// Use the free TLS_SLOT_APP slot #2 on Android Q.
+// Earlier androids are set up in gcc_android.c.
+DATA runtime·tls_g+0(SB)/8, $16
+#endif
+GLOBL runtime·tls_g+0(SB), NOPTR, $8
+#else
+GLOBL runtime·tls_g+0(SB), TLSBSS, $8
+#endif
diff --git a/src/runtime/tls_mips64x.s b/src/runtime/tls_mips64x.s
new file mode 100644
index 0000000..888c0ef
--- /dev/null
+++ b/src/runtime/tls_mips64x.s
@@ -0,0 +1,30 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips64 mips64le
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// If !iscgo, this is a no-op.
+//
+// NOTE: mcall() assumes this clobbers only R23 (REGTMP).
+TEXT runtime·save_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVB runtime·iscgo(SB), R23
+ BEQ R23, nocgo
+
+ MOVV R3, R23 // save R3
+ MOVV g, runtime·tls_g(SB) // TLS relocation clobbers R3
+ MOVV R23, R3 // restore R3
+
+nocgo:
+ RET
+
+TEXT runtime·load_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVV runtime·tls_g(SB), g // TLS relocation clobbers R3
+ RET
+
+GLOBL runtime·tls_g(SB), TLSBSS, $8
diff --git a/src/runtime/tls_mipsx.s b/src/runtime/tls_mipsx.s
new file mode 100644
index 0000000..d2ffcd9
--- /dev/null
+++ b/src/runtime/tls_mipsx.s
@@ -0,0 +1,29 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mips mipsle
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// If !iscgo, this is a no-op.
+// NOTE: gogo assumes load_g only clobbers g (R30) and REGTMP (R23)
+TEXT runtime·save_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVB runtime·iscgo(SB), R23
+ BEQ R23, nocgo
+
+ MOVW R3, R23
+ MOVW g, runtime·tls_g(SB) // TLS relocation clobbers R3
+ MOVW R23, R3
+
+nocgo:
+ RET
+
+TEXT runtime·load_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW runtime·tls_g(SB), g // TLS relocation clobbers R3
+ RET
+
+GLOBL runtime·tls_g(SB), TLSBSS, $4
diff --git a/src/runtime/tls_ppc64x.s b/src/runtime/tls_ppc64x.s
new file mode 100644
index 0000000..c697449
--- /dev/null
+++ b/src/runtime/tls_ppc64x.s
@@ -0,0 +1,51 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ppc64 ppc64le
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// We have to resort to a TLS variable to save g (R30).
+// One reason is that external code might trigger
+// SIGSEGV, and our runtime.sigtramp doesn't even know we
+// are in external code and will continue to use R30;
+// this might well result in another SIGSEGV.
+
+// save_g saves the g register into pthread-provided
+// thread-local memory, so that we can call externally compiled
+// ppc64 code that will overwrite this register.
+//
+// If !iscgo, this is a no-op.
+//
+// NOTE: setg_gcc<> assume this clobbers only R31.
+TEXT runtime·save_g(SB),NOSPLIT|NOFRAME,$0-0
+#ifndef GOOS_aix
+ MOVBZ runtime·iscgo(SB), R31
+ CMP R31, $0
+ BEQ nocgo
+#endif
+ MOVD runtime·tls_g(SB), R31
+ MOVD g, 0(R13)(R31*1)
+
+nocgo:
+ RET
+
+// load_g loads the g register from pthread-provided
+// thread-local memory, for use after calling externally compiled
+// ppc64 code that overwrote those registers.
+//
+// This is never called directly from C code (it doesn't have to
+// follow the C ABI), but it may be called from a C context, where the
+// usual Go registers aren't set up.
+//
+// NOTE: _cgo_topofstack assumes this only clobbers g (R30), and R31.
+TEXT runtime·load_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD runtime·tls_g(SB), R31
+ MOVD 0(R13)(R31*1), g
+ RET
+
+GLOBL runtime·tls_g+0(SB), TLSBSS+DUPOK, $8
diff --git a/src/runtime/tls_riscv64.s b/src/runtime/tls_riscv64.s
new file mode 100644
index 0000000..22b550b
--- /dev/null
+++ b/src/runtime/tls_riscv64.s
@@ -0,0 +1,30 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// If !iscgo, this is a no-op.
+//
+// NOTE: mcall() assumes this clobbers only X31 (REG_TMP).
+TEXT runtime·save_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVB runtime·iscgo(SB), X31
+ BEQ X0, X31, nocgo
+
+ MOV runtime·tls_g(SB), X31
+ ADD X4, X31 // add offset to thread pointer (X4)
+ MOV g, (X31)
+
+nocgo:
+ RET
+
+TEXT runtime·load_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOV runtime·tls_g(SB), X31
+ ADD X4, X31 // add offset to thread pointer (X4)
+ MOV (X31), g
+ RET
+
+GLOBL runtime·tls_g(SB), TLSBSS, $8
diff --git a/src/runtime/tls_s390x.s b/src/runtime/tls_s390x.s
new file mode 100644
index 0000000..cb6a21c
--- /dev/null
+++ b/src/runtime/tls_s390x.s
@@ -0,0 +1,51 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// We have to resort to a TLS variable to save g (R13).
+// One reason is that external code might trigger
+// SIGSEGV, and our runtime.sigtramp doesn't even know we
+// are in external code and will continue to use R13;
+// this might well result in another SIGSEGV.
+
+// save_g saves the g register into pthread-provided
+// thread-local memory, so that we can call externally compiled
+// s390x code that will overwrite this register.
+//
+// If !iscgo, this is a no-op.
+//
+// NOTE: setg_gcc<> assume this clobbers only R10 and R11.
+TEXT runtime·save_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVB runtime·iscgo(SB), R10
+ CMPBEQ R10, $0, nocgo
+ MOVW AR0, R11
+ SLD $32, R11
+ MOVW AR1, R11
+ MOVD runtime·tls_g(SB), R10
+ MOVD g, 0(R10)(R11*1)
+nocgo:
+ RET
+
+// load_g loads the g register from pthread-provided
+// thread-local memory, for use after calling externally compiled
+// s390x code that overwrote those registers.
+//
+// This is never called directly from C code (it doesn't have to
+// follow the C ABI), but it may be called from a C context, where the
+// usual Go registers aren't set up.
+//
+// NOTE: _cgo_topofstack assumes this only clobbers g (R13), R10 and R11.
+TEXT runtime·load_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW AR0, R11
+ SLD $32, R11
+ MOVW AR1, R11
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R10)(R11*1), g
+ RET
+
+GLOBL runtime·tls_g+0(SB),TLSBSS,$8
diff --git a/src/runtime/trace.go b/src/runtime/trace.go
new file mode 100644
index 0000000..bcd0b9d
--- /dev/null
+++ b/src/runtime/trace.go
@@ -0,0 +1,1231 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Go execution tracer.
+// The tracer captures a wide range of execution events like goroutine
+// creation/blocking/unblocking, syscall enter/exit/block, GC-related events,
+// changes of heap size, processor start/stop, etc., and writes them to a
+// buffer in a compact form. A nanosecond-precision timestamp and a stack
+// trace are captured for most events.
+// See https://golang.org/s/go15trace for more info.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// Event types in the trace, args are given in square brackets.
+const (
+ traceEvNone = 0 // unused
+ traceEvBatch = 1 // start of per-P batch of events [pid, timestamp]
+ traceEvFrequency = 2 // contains tracer timer frequency [frequency (ticks per second)]
+ traceEvStack = 3 // stack [stack id, number of PCs, array of {PC, func string ID, file string ID, line}]
+ traceEvGomaxprocs = 4 // current value of GOMAXPROCS [timestamp, GOMAXPROCS, stack id]
+ traceEvProcStart = 5 // start of P [timestamp, thread id]
+ traceEvProcStop = 6 // stop of P [timestamp]
+ traceEvGCStart = 7 // GC start [timestamp, seq, stack id]
+ traceEvGCDone = 8 // GC done [timestamp]
+ traceEvGCSTWStart = 9 // GC STW start [timestamp, kind]
+ traceEvGCSTWDone = 10 // GC STW done [timestamp]
+ traceEvGCSweepStart = 11 // GC sweep start [timestamp, stack id]
+ traceEvGCSweepDone = 12 // GC sweep done [timestamp, swept, reclaimed]
+ traceEvGoCreate = 13 // goroutine creation [timestamp, new goroutine id, new stack id, stack id]
+ traceEvGoStart = 14 // goroutine starts running [timestamp, goroutine id, seq]
+ traceEvGoEnd = 15 // goroutine ends [timestamp]
+ traceEvGoStop = 16 // goroutine stops (like in select{}) [timestamp, stack]
+ traceEvGoSched = 17 // goroutine calls Gosched [timestamp, stack]
+ traceEvGoPreempt = 18 // goroutine is preempted [timestamp, stack]
+ traceEvGoSleep = 19 // goroutine calls Sleep [timestamp, stack]
+ traceEvGoBlock = 20 // goroutine blocks [timestamp, stack]
+ traceEvGoUnblock = 21 // goroutine is unblocked [timestamp, goroutine id, seq, stack]
+ traceEvGoBlockSend = 22 // goroutine blocks on chan send [timestamp, stack]
+ traceEvGoBlockRecv = 23 // goroutine blocks on chan recv [timestamp, stack]
+ traceEvGoBlockSelect = 24 // goroutine blocks on select [timestamp, stack]
+ traceEvGoBlockSync = 25 // goroutine blocks on Mutex/RWMutex [timestamp, stack]
+ traceEvGoBlockCond = 26 // goroutine blocks on Cond [timestamp, stack]
+ traceEvGoBlockNet = 27 // goroutine blocks on network [timestamp, stack]
+ traceEvGoSysCall = 28 // syscall enter [timestamp, stack]
+ traceEvGoSysExit = 29 // syscall exit [timestamp, goroutine id, seq, real timestamp]
+ traceEvGoSysBlock = 30 // syscall blocks [timestamp]
+ traceEvGoWaiting = 31 // denotes that goroutine is blocked when tracing starts [timestamp, goroutine id]
+ traceEvGoInSyscall = 32 // denotes that goroutine is in syscall when tracing starts [timestamp, goroutine id]
+ traceEvHeapAlloc = 33 // memstats.heap_live change [timestamp, heap_alloc]
+ traceEvNextGC = 34 // memstats.next_gc change [timestamp, next_gc]
+ traceEvTimerGoroutine = 35 // not currently used; previously denoted timer goroutine [timer goroutine id]
+ traceEvFutileWakeup = 36 // denotes that the previous wakeup of this goroutine was futile [timestamp]
+ traceEvString = 37 // string dictionary entry [ID, length, string]
+ traceEvGoStartLocal = 38 // goroutine starts running on the same P as the last event [timestamp, goroutine id]
+ traceEvGoUnblockLocal = 39 // goroutine is unblocked on the same P as the last event [timestamp, goroutine id, stack]
+ traceEvGoSysExitLocal = 40 // syscall exit on the same P as the last event [timestamp, goroutine id, real timestamp]
+ traceEvGoStartLabel = 41 // goroutine starts running with label [timestamp, goroutine id, seq, label string id]
+ traceEvGoBlockGC = 42 // goroutine blocks on GC assist [timestamp, stack]
+ traceEvGCMarkAssistStart = 43 // GC mark assist start [timestamp, stack]
+ traceEvGCMarkAssistDone = 44 // GC mark assist done [timestamp]
+ traceEvUserTaskCreate = 45 // trace.NewContext [timestamp, internal task id, internal parent task id, stack, name string]
+ traceEvUserTaskEnd = 46 // end of a task [timestamp, internal task id, stack]
+ traceEvUserRegion = 47 // trace.WithRegion [timestamp, internal task id, mode(0:start, 1:end), stack, name string]
+ traceEvUserLog = 48 // trace.Log [timestamp, internal task id, key string id, stack, value string]
+ traceEvCount = 49
+	// A byte is used for the event type, but only 6 bits of it are available.
+	// The remaining 2 bits are used to specify the number of arguments.
+	// That means the max event type value is 63.
+)
+
+const (
+ // Timestamps in trace are cputicks/traceTickDiv.
+ // This makes absolute values of timestamp diffs smaller,
+	// and so they are encoded in fewer bytes.
+ // 64 on x86 is somewhat arbitrary (one tick is ~20ns on a 3GHz machine).
+ // The suggested increment frequency for PowerPC's time base register is
+ // 512 MHz according to Power ISA v2.07 section 6.2, so we use 16 on ppc64
+ // and ppc64le.
+ // Tracing won't work reliably for architectures where cputicks is emulated
+ // by nanotime, so the value doesn't matter for those architectures.
+ traceTickDiv = 16 + 48*(sys.Goarch386|sys.GoarchAmd64)
+ // Maximum number of PCs in a single stack trace.
+ // Since events contain only stack id rather than whole stack trace,
+ // we can allow quite large values here.
+ traceStackSize = 128
+ // Identifier of a fake P that is used when we trace without a real P.
+ traceGlobProc = -1
+ // Maximum number of bytes to encode uint64 in base-128.
+ traceBytesPerNumber = 10
+ // Shift of the number of arguments in the first event byte.
+ traceArgCountShift = 6
+ // Flag passed to traceGoPark to denote that the previous wakeup of this
+ // goroutine was futile. For example, a goroutine was unblocked on a mutex,
+ // but another goroutine got ahead and acquired the mutex before the first
+ // goroutine is scheduled, so the first goroutine has to block again.
+ // Such wakeups happen on buffered channels and sync.Mutex,
+ // but are generally not interesting for end user.
+	// but are generally not interesting to the end user.
+)
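+
+// For illustration only (the actual packing is done by the traceEvent
+// machinery, not shown here): the first byte of an event stores the event
+// type in its low 6 bits and an argument count in the top 2 bits, so a
+// hypothetical event of type 17 carrying two arguments would begin with
+// the byte
+//
+//	17 | 2<<traceArgCountShift // == 0x91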
+
+// trace is global tracing context.
+var trace struct {
+ lock mutex // protects the following members
+	lockOwner *g // to avoid deadlocks during recursive lock acquisition
+ enabled bool // when set runtime traces events
+ shutdown bool // set when we are waiting for trace reader to finish after setting enabled to false
+ headerWritten bool // whether ReadTrace has emitted trace header
+ footerWritten bool // whether ReadTrace has emitted trace footer
+ shutdownSema uint32 // used to wait for ReadTrace completion
+ seqStart uint64 // sequence number when tracing was started
+ ticksStart int64 // cputicks when tracing was started
+ ticksEnd int64 // cputicks when tracing was stopped
+ timeStart int64 // nanotime when tracing was started
+ timeEnd int64 // nanotime when tracing was stopped
+ seqGC uint64 // GC start/done sequencer
+ reading traceBufPtr // buffer currently handed off to user
+ empty traceBufPtr // stack of empty buffers
+ fullHead traceBufPtr // queue of full buffers
+ fullTail traceBufPtr
+ reader guintptr // goroutine that called ReadTrace, or nil
+ stackTab traceStackTable // maps stack traces to unique ids
+
+ // Dictionary for traceEvString.
+ //
+ // TODO: central lock to access the map is not ideal.
+ // option: pre-assign ids to all user annotation region names and tags
+ // option: per-P cache
+ // option: sync.Map like data structure
+ stringsLock mutex
+ strings map[string]uint64
+ stringSeq uint64
+
+ // markWorkerLabels maps gcMarkWorkerMode to string ID.
+ markWorkerLabels [len(gcMarkWorkerModeStrings)]uint64
+
+ bufLock mutex // protects buf
+ buf traceBufPtr // global trace buffer, used when running without a p
+}
+
+// traceBufHeader is per-P tracing buffer.
+type traceBufHeader struct {
+ link traceBufPtr // in trace.empty/full
+ lastTicks uint64 // when we wrote the last event
+ pos int // next write offset in arr
+ stk [traceStackSize]uintptr // scratch buffer for traceback
+}
+
+// traceBuf is per-P tracing buffer.
+//
+//go:notinheap
+type traceBuf struct {
+ traceBufHeader
+	arr [64<<10 - unsafe.Sizeof(traceBufHeader{})]byte // underlying event buffer; sized so that a whole traceBuf is 64 KiB
+}
+
+// traceBufPtr is a *traceBuf that is not traced by the garbage
+// collector and doesn't have write barriers. traceBufs are not
+// allocated from the GC'd heap, so this is safe, and are often
+// manipulated in contexts where write barriers are not allowed, so
+// this is necessary.
+//
+// TODO: Since traceBuf is now go:notinheap, this isn't necessary.
+type traceBufPtr uintptr
+
+func (tp traceBufPtr) ptr() *traceBuf { return (*traceBuf)(unsafe.Pointer(tp)) }
+func (tp *traceBufPtr) set(b *traceBuf) { *tp = traceBufPtr(unsafe.Pointer(b)) }
+func traceBufPtrOf(b *traceBuf) traceBufPtr {
+ return traceBufPtr(unsafe.Pointer(b))
+}
+
+// StartTrace enables tracing for the current process.
+// While tracing, the data will be buffered and available via ReadTrace.
+// StartTrace returns an error if tracing is already enabled.
+// Most clients should use the runtime/trace package or the testing package's
+// -test.trace flag instead of calling StartTrace directly.
+func StartTrace() error {
+ // Stop the world so that we can take a consistent snapshot
+ // of all goroutines at the beginning of the trace.
+ // Do not stop the world during GC so we ensure we always see
+ // a consistent view of GC-related events (e.g. a start is always
+ // paired with an end).
+ stopTheWorldGC("start tracing")
+
+ // Prevent sysmon from running any code that could generate events.
+ lock(&sched.sysmonlock)
+
+ // We are in stop-the-world, but syscalls can finish and write to trace concurrently.
+ // Exitsyscall could check trace.enabled long before and then suddenly wake up
+ // and decide to write to trace at a random point in time.
+	// However, such a syscall will use the global trace.buf buffer, because we've
+ // acquired all p's by doing stop-the-world. So this protects us from such races.
+ lock(&trace.bufLock)
+
+ if trace.enabled || trace.shutdown {
+ unlock(&trace.bufLock)
+ unlock(&sched.sysmonlock)
+ startTheWorldGC()
+ return errorString("tracing is already enabled")
+ }
+
+ // Can't set trace.enabled yet. While the world is stopped, exitsyscall could
+ // already emit a delayed event (see exitTicks in exitsyscall) if we set trace.enabled here.
+ // That would lead to an inconsistent trace:
+ // - either GoSysExit appears before EvGoInSyscall,
+ // - or GoSysExit appears for a goroutine for which we don't emit EvGoInSyscall below.
+ // To instruct traceEvent that it must not ignore events below, we set startingtrace.
+ // trace.enabled is set afterwards once we have emitted all preliminary events.
+ _g_ := getg()
+ _g_.m.startingtrace = true
+
+ // Obtain current stack ID to use in all traceEvGoCreate events below.
+ mp := acquirem()
+ stkBuf := make([]uintptr, traceStackSize)
+ stackID := traceStackID(mp, stkBuf, 2)
+ releasem(mp)
+
+ for _, gp := range allgs {
+ status := readgstatus(gp)
+ if status != _Gdead {
+ gp.traceseq = 0
+ gp.tracelastp = getg().m.p
+ // +PCQuantum because traceFrameForPC expects return PCs and subtracts PCQuantum.
+ id := trace.stackTab.put([]uintptr{gp.startpc + sys.PCQuantum})
+ traceEvent(traceEvGoCreate, -1, uint64(gp.goid), uint64(id), stackID)
+ }
+ if status == _Gwaiting {
+ // traceEvGoWaiting is implied to have seq=1.
+ gp.traceseq++
+ traceEvent(traceEvGoWaiting, -1, uint64(gp.goid))
+ }
+ if status == _Gsyscall {
+ gp.traceseq++
+ traceEvent(traceEvGoInSyscall, -1, uint64(gp.goid))
+ } else {
+ gp.sysblocktraced = false
+ }
+ }
+ traceProcStart()
+ traceGoStart()
+ // Note: ticksStart needs to be set after we emit traceEvGoInSyscall events.
+ // If we do it the other way around, it is possible that exitsyscall will
+ // query sysexitticks after ticksStart but before traceEvGoInSyscall timestamp.
+ // It will lead to a false conclusion that cputicks is broken.
+ trace.ticksStart = cputicks()
+ trace.timeStart = nanotime()
+ trace.headerWritten = false
+ trace.footerWritten = false
+
+ // string to id mapping
+ // 0 : reserved for an empty string
+ // remaining: other strings registered by traceString
+ trace.stringSeq = 0
+ trace.strings = make(map[string]uint64)
+
+ trace.seqGC = 0
+ _g_.m.startingtrace = false
+ trace.enabled = true
+
+ // Register runtime goroutine labels.
+ _, pid, bufp := traceAcquireBuffer()
+ for i, label := range gcMarkWorkerModeStrings[:] {
+ trace.markWorkerLabels[i], bufp = traceString(bufp, pid, label)
+ }
+ traceReleaseBuffer(pid)
+
+ unlock(&trace.bufLock)
+
+ unlock(&sched.sysmonlock)
+
+ startTheWorldGC()
+ return nil
+}
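+
+// For illustration, roughly how the runtime/trace package drives this
+// low-level API (a sketch, not the actual runtime/trace source):
+//
+//	if err := runtime.StartTrace(); err != nil {
+//		return err
+//	}
+//	go func() {
+//		for {
+//			data := runtime.ReadTrace()
+//			if data == nil {
+//				break
+//			}
+//			w.Write(data) // w is some io.Writer chosen by the caller
+//		}
+//	}()
+//	// ... later ...
+//	runtime.StopTrace()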
+
+// StopTrace stops tracing, if it was previously enabled.
+// StopTrace only returns after all the reads for the trace have completed.
+func StopTrace() {
+ // Stop the world so that we can collect the trace buffers from all p's below,
+ // and also to avoid races with traceEvent.
+ stopTheWorldGC("stop tracing")
+
+ // See the comment in StartTrace.
+ lock(&sched.sysmonlock)
+
+ // See the comment in StartTrace.
+ lock(&trace.bufLock)
+
+ if !trace.enabled {
+ unlock(&trace.bufLock)
+ unlock(&sched.sysmonlock)
+ startTheWorldGC()
+ return
+ }
+
+ traceGoSched()
+
+ // Loop over all allocated Ps because dead Ps may still have
+ // trace buffers.
+ for _, p := range allp[:cap(allp)] {
+ buf := p.tracebuf
+ if buf != 0 {
+ traceFullQueue(buf)
+ p.tracebuf = 0
+ }
+ }
+ if trace.buf != 0 {
+ buf := trace.buf
+ trace.buf = 0
+ if buf.ptr().pos != 0 {
+ traceFullQueue(buf)
+ }
+ }
+
+ for {
+ trace.ticksEnd = cputicks()
+ trace.timeEnd = nanotime()
+		// Windows time can tick only every 15ms; wait for at least one tick.
+ if trace.timeEnd != trace.timeStart {
+ break
+ }
+ osyield()
+ }
+
+ trace.enabled = false
+ trace.shutdown = true
+ unlock(&trace.bufLock)
+
+ unlock(&sched.sysmonlock)
+
+ startTheWorldGC()
+
+ // The world is started but we've set trace.shutdown, so new tracing can't start.
+ // Wait for the trace reader to flush pending buffers and stop.
+ semacquire(&trace.shutdownSema)
+ if raceenabled {
+ raceacquire(unsafe.Pointer(&trace.shutdownSema))
+ }
+
+ // The lock protects us from races with StartTrace/StopTrace because they do stop-the-world.
+ lock(&trace.lock)
+ for _, p := range allp[:cap(allp)] {
+ if p.tracebuf != 0 {
+ throw("trace: non-empty trace buffer in proc")
+ }
+ }
+ if trace.buf != 0 {
+ throw("trace: non-empty global trace buffer")
+ }
+ if trace.fullHead != 0 || trace.fullTail != 0 {
+ throw("trace: non-empty full trace buffer")
+ }
+ if trace.reading != 0 || trace.reader != 0 {
+ throw("trace: reading after shutdown")
+ }
+ for trace.empty != 0 {
+ buf := trace.empty
+ trace.empty = buf.ptr().link
+ sysFree(unsafe.Pointer(buf), unsafe.Sizeof(*buf.ptr()), &memstats.other_sys)
+ }
+ trace.strings = nil
+ trace.shutdown = false
+ unlock(&trace.lock)
+}
+
+// ReadTrace returns the next chunk of binary tracing data, blocking until data
+// is available. If tracing is turned off and all the data accumulated while it
+// was on has been returned, ReadTrace returns nil. The caller must copy the
+// returned data before calling ReadTrace again.
+// ReadTrace must be called from one goroutine at a time.
+func ReadTrace() []byte {
+ // This function may need to lock trace.lock recursively
+ // (goparkunlock -> traceGoPark -> traceEvent -> traceFlush).
+ // To allow this we use trace.lockOwner.
+ // Also this function must not allocate while holding trace.lock:
+ // allocation can call heap allocate, which will try to emit a trace
+ // event while holding heap lock.
+ lock(&trace.lock)
+ trace.lockOwner = getg()
+
+ if trace.reader != 0 {
+		// More than one goroutine reads trace. This is bad.
+		// But we'd rather not crash the program because of tracing,
+		// since tracing can be enabled at runtime on production servers.
+ trace.lockOwner = nil
+ unlock(&trace.lock)
+ println("runtime: ReadTrace called from multiple goroutines simultaneously")
+ return nil
+ }
+ // Recycle the old buffer.
+ if buf := trace.reading; buf != 0 {
+ buf.ptr().link = trace.empty
+ trace.empty = buf
+ trace.reading = 0
+ }
+ // Write trace header.
+ if !trace.headerWritten {
+ trace.headerWritten = true
+ trace.lockOwner = nil
+ unlock(&trace.lock)
+ return []byte("go 1.11 trace\x00\x00\x00")
+ }
+ // Wait for new data.
+ if trace.fullHead == 0 && !trace.shutdown {
+ trace.reader.set(getg())
+ goparkunlock(&trace.lock, waitReasonTraceReaderBlocked, traceEvGoBlock, 2)
+ lock(&trace.lock)
+ }
+ // Write a buffer.
+ if trace.fullHead != 0 {
+ buf := traceFullDequeue()
+ trace.reading = buf
+ trace.lockOwner = nil
+ unlock(&trace.lock)
+ return buf.ptr().arr[:buf.ptr().pos]
+ }
+ // Write footer with timer frequency.
+ if !trace.footerWritten {
+ trace.footerWritten = true
+ // Use float64 because (trace.ticksEnd - trace.ticksStart) * 1e9 can overflow int64.
+ freq := float64(trace.ticksEnd-trace.ticksStart) * 1e9 / float64(trace.timeEnd-trace.timeStart) / traceTickDiv
+ trace.lockOwner = nil
+ unlock(&trace.lock)
+ var data []byte
+ data = append(data, traceEvFrequency|0<<traceArgCountShift)
+ data = traceAppend(data, uint64(freq))
+ // This will emit a bunch of full buffers, we will pick them up
+ // on the next iteration.
+ trace.stackTab.dump()
+ return data
+ }
+ // Done.
+ if trace.shutdown {
+ trace.lockOwner = nil
+ unlock(&trace.lock)
+ if raceenabled {
+ // Model synchronization on trace.shutdownSema, which race
+ // detector does not see. This is required to avoid false
+ // race reports on writer passed to trace.Start.
+ racerelease(unsafe.Pointer(&trace.shutdownSema))
+ }
+		// trace.enabled is already reset, so we can call traceable functions.
+ semrelease(&trace.shutdownSema)
+ return nil
+ }
+ // Also bad, but see the comment above.
+ trace.lockOwner = nil
+ unlock(&trace.lock)
+ println("runtime: spurious wakeup of trace reader")
+ return nil
+}
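+
+// A minimal sketch of the consumption loop ReadTrace expects; runtime/trace.Start
+// runs essentially this loop in a background goroutine (w is an assumed io.Writer):
+//
+//	for {
+//		data := runtime.ReadTrace()
+//		if data == nil {
+//			break // tracing is off and all buffered data has been returned
+//		}
+//		w.Write(data) // copy or consume before the next ReadTrace call
+//	}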
+
+// traceReader returns the trace reader that should be woken up, if any.
+func traceReader() *g {
+ if trace.reader == 0 || (trace.fullHead == 0 && !trace.shutdown) {
+ return nil
+ }
+ lock(&trace.lock)
+ if trace.reader == 0 || (trace.fullHead == 0 && !trace.shutdown) {
+ unlock(&trace.lock)
+ return nil
+ }
+ gp := trace.reader.ptr()
+ trace.reader.set(nil)
+ unlock(&trace.lock)
+ return gp
+}
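+
+// Note the check-then-lock pattern above: the unlocked check is only a cheap
+// fast path for the scheduler; the re-check under trace.lock is what actually
+// decides whether to claim and wake the parked reader.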
+
+// traceProcFree frees trace buffer associated with pp.
+func traceProcFree(pp *p) {
+ buf := pp.tracebuf
+ pp.tracebuf = 0
+ if buf == 0 {
+ return
+ }
+ lock(&trace.lock)
+ traceFullQueue(buf)
+ unlock(&trace.lock)
+}
+
+// traceFullQueue queues buf into queue of full buffers.
+func traceFullQueue(buf traceBufPtr) {
+ buf.ptr().link = 0
+ if trace.fullHead == 0 {
+ trace.fullHead = buf
+ } else {
+ trace.fullTail.ptr().link = buf
+ }
+ trace.fullTail = buf
+}
+
+// traceFullDequeue dequeues from queue of full buffers.
+func traceFullDequeue() traceBufPtr {
+ buf := trace.fullHead
+ if buf == 0 {
+ return 0
+ }
+ trace.fullHead = buf.ptr().link
+ if trace.fullHead == 0 {
+ trace.fullTail = 0
+ }
+ buf.ptr().link = 0
+ return buf
+}
+
+// traceEvent writes a single event to trace buffer, flushing the buffer if necessary.
+// ev is event type.
+// If skip > 0, write current stack id as the last argument (skipping skip top frames).
+// If skip = 0, this event type should contain a stack, but we don't want
+// to collect and remember it for this particular call.
+func traceEvent(ev byte, skip int, args ...uint64) {
+ mp, pid, bufp := traceAcquireBuffer()
+ // Double-check trace.enabled now that we've done m.locks++ and acquired bufLock.
+ // This protects from races between traceEvent and StartTrace/StopTrace.
+
+	// The caller checked that trace.enabled == true, but trace.enabled might have been
+	// turned off between the check and now. Check again. traceAcquireBuffer did mp.locks++,
+	// StopTrace does stopTheWorld, and stopTheWorld waits for mp.locks to go back to zero,
+	// so if we see trace.enabled == true now, we know it's true for the rest of the function.
+	// Exitsyscall can run even during stopTheWorld. The race with StartTrace/StopTrace
+	// during tracing in exitsyscall is resolved by locking trace.bufLock in traceAcquireBuffer.
+ //
+ // Note trace_userTaskCreate runs the same check.
+ if !trace.enabled && !mp.startingtrace {
+ traceReleaseBuffer(pid)
+ return
+ }
+
+ if skip > 0 {
+ if getg() == mp.curg {
+ skip++ // +1 because stack is captured in traceEventLocked.
+ }
+ }
+ traceEventLocked(0, mp, pid, bufp, ev, skip, args...)
+ traceReleaseBuffer(pid)
+}
+
+func traceEventLocked(extraBytes int, mp *m, pid int32, bufp *traceBufPtr, ev byte, skip int, args ...uint64) {
+ buf := bufp.ptr()
+ // TODO: test on non-zero extraBytes param.
+ maxSize := 2 + 5*traceBytesPerNumber + extraBytes // event type, length, sequence, timestamp, stack id and two add params
+ if buf == nil || len(buf.arr)-buf.pos < maxSize {
+ buf = traceFlush(traceBufPtrOf(buf), pid).ptr()
+ bufp.set(buf)
+ }
+
+ ticks := uint64(cputicks()) / traceTickDiv
+ tickDiff := ticks - buf.lastTicks
+ buf.lastTicks = ticks
+ narg := byte(len(args))
+ if skip >= 0 {
+ narg++
+ }
+ // We have only 2 bits for number of arguments.
+ // If number is >= 3, then the event type is followed by event length in bytes.
+ if narg > 3 {
+ narg = 3
+ }
+ startPos := buf.pos
+ buf.byte(ev | narg<<traceArgCountShift)
+ var lenp *byte
+ if narg == 3 {
+ // Reserve the byte for length assuming that length < 128.
+ buf.varint(0)
+ lenp = &buf.arr[buf.pos-1]
+ }
+ buf.varint(tickDiff)
+ for _, a := range args {
+ buf.varint(a)
+ }
+ if skip == 0 {
+ buf.varint(0)
+ } else if skip > 0 {
+ buf.varint(traceStackID(mp, buf.stk[:], skip))
+ }
+ evSize := buf.pos - startPos
+ if evSize > maxSize {
+ throw("invalid length of trace event")
+ }
+ if lenp != nil {
+ // Fill in actual length.
+ *lenp = byte(evSize - 2)
+ }
+}
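+
+// For reference, the bytes traceEventLocked writes for a single event are
+//
+//	[ev | narg<<traceArgCountShift] [length, only if narg == 3] [tick delta] [args...] [stack id, only if skip >= 0]
+//
+// where everything after the first byte is a LEB128 varint and narg counts
+// the args plus the optional stack id, capped at 3.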
+
+func traceStackID(mp *m, buf []uintptr, skip int) uint64 {
+ _g_ := getg()
+ gp := mp.curg
+ var nstk int
+ if gp == _g_ {
+ nstk = callers(skip+1, buf)
+ } else if gp != nil {
+ gp = mp.curg
+ nstk = gcallers(gp, skip, buf)
+ }
+ if nstk > 0 {
+ nstk-- // skip runtime.goexit
+ }
+ if nstk > 0 && gp.goid == 1 {
+ nstk-- // skip runtime.main
+ }
+ id := trace.stackTab.put(buf[:nstk])
+ return uint64(id)
+}
+
+// traceAcquireBuffer returns trace buffer to use and, if necessary, locks it.
+func traceAcquireBuffer() (mp *m, pid int32, bufp *traceBufPtr) {
+ mp = acquirem()
+ if p := mp.p.ptr(); p != nil {
+ return mp, p.id, &p.tracebuf
+ }
+ lock(&trace.bufLock)
+ return mp, traceGlobProc, &trace.buf
+}
+
+// traceReleaseBuffer releases a buffer previously acquired with traceAcquireBuffer.
+func traceReleaseBuffer(pid int32) {
+ if pid == traceGlobProc {
+ unlock(&trace.bufLock)
+ }
+ releasem(getg().m)
+}
+
+// traceFlush puts buf onto the queue of full buffers and returns an empty buffer.
+func traceFlush(buf traceBufPtr, pid int32) traceBufPtr {
+ owner := trace.lockOwner
+ dolock := owner == nil || owner != getg().m.curg
+ if dolock {
+ lock(&trace.lock)
+ }
+ if buf != 0 {
+ traceFullQueue(buf)
+ }
+ if trace.empty != 0 {
+ buf = trace.empty
+ trace.empty = buf.ptr().link
+ } else {
+ buf = traceBufPtr(sysAlloc(unsafe.Sizeof(traceBuf{}), &memstats.other_sys))
+ if buf == 0 {
+ throw("trace: out of memory")
+ }
+ }
+ bufp := buf.ptr()
+ bufp.link.set(nil)
+ bufp.pos = 0
+
+ // initialize the buffer for a new batch
+ ticks := uint64(cputicks()) / traceTickDiv
+ bufp.lastTicks = ticks
+ bufp.byte(traceEvBatch | 1<<traceArgCountShift)
+ bufp.varint(uint64(pid))
+ bufp.varint(ticks)
+
+ if dolock {
+ unlock(&trace.lock)
+ }
+ return buf
+}
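+
+// Each buffer returned by traceFlush therefore begins with a batch header
+//
+//	[traceEvBatch | 1<<traceArgCountShift] [pid] [absolute ticks]
+//
+// and subsequent events in the buffer encode their timestamps as deltas
+// from buf.lastTicks.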
+
+// traceString adds a string to the trace.strings and returns the id.
+func traceString(bufp *traceBufPtr, pid int32, s string) (uint64, *traceBufPtr) {
+ if s == "" {
+ return 0, bufp
+ }
+
+ lock(&trace.stringsLock)
+ if raceenabled {
+ // raceacquire is necessary because the map access
+ // below is race annotated.
+ raceacquire(unsafe.Pointer(&trace.stringsLock))
+ }
+
+ if id, ok := trace.strings[s]; ok {
+ if raceenabled {
+ racerelease(unsafe.Pointer(&trace.stringsLock))
+ }
+ unlock(&trace.stringsLock)
+
+ return id, bufp
+ }
+
+ trace.stringSeq++
+ id := trace.stringSeq
+ trace.strings[s] = id
+
+ if raceenabled {
+ racerelease(unsafe.Pointer(&trace.stringsLock))
+ }
+ unlock(&trace.stringsLock)
+
+	// The memory allocation above may trigger tracing and
+	// cause *bufp to change. The code below works with *bufp,
+	// so there must be no memory allocation or any other activity
+	// that might trigger tracing after this point.
+
+ buf := bufp.ptr()
+ size := 1 + 2*traceBytesPerNumber + len(s)
+ if buf == nil || len(buf.arr)-buf.pos < size {
+ buf = traceFlush(traceBufPtrOf(buf), pid).ptr()
+ bufp.set(buf)
+ }
+ buf.byte(traceEvString)
+ buf.varint(id)
+
+	// Double-check that the string and its length can fit.
+	// Otherwise, truncate the string.
+ slen := len(s)
+ if room := len(buf.arr) - buf.pos; room < slen+traceBytesPerNumber {
+ slen = room
+ }
+
+ buf.varint(uint64(slen))
+ buf.pos += copy(buf.arr[buf.pos:], s[:slen])
+
+ bufp.set(buf)
+ return id, bufp
+}
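+
+// The string is recorded in-band as
+//
+//	[traceEvString] [id] [length] [bytes...]
+//
+// so that later events can refer to it compactly by id.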
+
+// traceAppend appends v to buf in little-endian-base-128 encoding.
+func traceAppend(buf []byte, v uint64) []byte {
+ for ; v >= 0x80; v >>= 7 {
+ buf = append(buf, 0x80|byte(v))
+ }
+ buf = append(buf, byte(v))
+ return buf
+}
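+
+// For example, traceAppend(nil, 300) produces the two bytes 0xAC 0x02:
+// the low 7 bits of 300 (0x2C) are emitted with the continuation bit set
+// (0xAC), followed by the remaining bits (300>>7 == 2).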
+
+// varint appends v to buf in little-endian-base-128 encoding.
+func (buf *traceBuf) varint(v uint64) {
+ pos := buf.pos
+ for ; v >= 0x80; v >>= 7 {
+ buf.arr[pos] = 0x80 | byte(v)
+ pos++
+ }
+ buf.arr[pos] = byte(v)
+ pos++
+ buf.pos = pos
+}
+
+// byte appends v to buf.
+func (buf *traceBuf) byte(v byte) {
+ buf.arr[buf.pos] = v
+ buf.pos++
+}
+
+// traceStackTable maps stack traces (arrays of PCs) to unique uint32 ids.
+// It is lock-free for reading.
+type traceStackTable struct {
+ lock mutex
+ seq uint32
+ mem traceAlloc
+ tab [1 << 13]traceStackPtr
+}
+
+// traceStack is a single stack in traceStackTable.
+type traceStack struct {
+ link traceStackPtr
+ hash uintptr
+ id uint32
+ n int
+ stk [0]uintptr // real type [n]uintptr
+}
+
+type traceStackPtr uintptr
+
+func (tp traceStackPtr) ptr() *traceStack { return (*traceStack)(unsafe.Pointer(tp)) }
+
+// stack returns slice of PCs.
+func (ts *traceStack) stack() []uintptr {
+ return (*[traceStackSize]uintptr)(unsafe.Pointer(&ts.stk))[:ts.n]
+}
+
+// put returns a unique id for the stack trace pcs and caches it in the table,
+// if it sees the trace for the first time.
+func (tab *traceStackTable) put(pcs []uintptr) uint32 {
+ if len(pcs) == 0 {
+ return 0
+ }
+ hash := memhash(unsafe.Pointer(&pcs[0]), 0, uintptr(len(pcs))*unsafe.Sizeof(pcs[0]))
+ // First, search the hashtable w/o the mutex.
+ if id := tab.find(pcs, hash); id != 0 {
+ return id
+ }
+ // Now, double check under the mutex.
+ lock(&tab.lock)
+ if id := tab.find(pcs, hash); id != 0 {
+ unlock(&tab.lock)
+ return id
+ }
+ // Create new record.
+ tab.seq++
+ stk := tab.newStack(len(pcs))
+ stk.hash = hash
+ stk.id = tab.seq
+ stk.n = len(pcs)
+ stkpc := stk.stack()
+ for i, pc := range pcs {
+ stkpc[i] = pc
+ }
+ part := int(hash % uintptr(len(tab.tab)))
+ stk.link = tab.tab[part]
+ atomicstorep(unsafe.Pointer(&tab.tab[part]), unsafe.Pointer(stk))
+ unlock(&tab.lock)
+ return stk.id
+}
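+
+// Readers may call find without tab.lock because buckets are only ever updated
+// by prepending a fully initialized traceStack with atomicstorep; existing
+// entries are never modified or removed until dump resets the table, which
+// happens only after tracing has stopped.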
+
+// find checks if the stack trace pcs is already present in the table.
+func (tab *traceStackTable) find(pcs []uintptr, hash uintptr) uint32 {
+ part := int(hash % uintptr(len(tab.tab)))
+Search:
+ for stk := tab.tab[part].ptr(); stk != nil; stk = stk.link.ptr() {
+ if stk.hash == hash && stk.n == len(pcs) {
+ for i, stkpc := range stk.stack() {
+ if stkpc != pcs[i] {
+ continue Search
+ }
+ }
+ return stk.id
+ }
+ }
+ return 0
+}
+
+// newStack allocates a new stack of size n.
+func (tab *traceStackTable) newStack(n int) *traceStack {
+ return (*traceStack)(tab.mem.alloc(unsafe.Sizeof(traceStack{}) + uintptr(n)*sys.PtrSize))
+}
+
+// allFrames returns all of the Frames corresponding to pcs.
+func allFrames(pcs []uintptr) []Frame {
+ frames := make([]Frame, 0, len(pcs))
+ ci := CallersFrames(pcs)
+ for {
+ f, more := ci.Next()
+ frames = append(frames, f)
+ if !more {
+ return frames
+ }
+ }
+}
+
+// dump writes all previously cached stacks to trace buffers,
+// releases all memory and resets state.
+func (tab *traceStackTable) dump() {
+ var tmp [(2 + 4*traceStackSize) * traceBytesPerNumber]byte
+ bufp := traceFlush(0, 0)
+ for _, stk := range tab.tab {
+ stk := stk.ptr()
+ for ; stk != nil; stk = stk.link.ptr() {
+ tmpbuf := tmp[:0]
+ tmpbuf = traceAppend(tmpbuf, uint64(stk.id))
+ frames := allFrames(stk.stack())
+ tmpbuf = traceAppend(tmpbuf, uint64(len(frames)))
+ for _, f := range frames {
+ var frame traceFrame
+ frame, bufp = traceFrameForPC(bufp, 0, f)
+ tmpbuf = traceAppend(tmpbuf, uint64(f.PC))
+ tmpbuf = traceAppend(tmpbuf, uint64(frame.funcID))
+ tmpbuf = traceAppend(tmpbuf, uint64(frame.fileID))
+ tmpbuf = traceAppend(tmpbuf, uint64(frame.line))
+ }
+ // Now copy to the buffer.
+ size := 1 + traceBytesPerNumber + len(tmpbuf)
+ if buf := bufp.ptr(); len(buf.arr)-buf.pos < size {
+ bufp = traceFlush(bufp, 0)
+ }
+ buf := bufp.ptr()
+ buf.byte(traceEvStack | 3<<traceArgCountShift)
+ buf.varint(uint64(len(tmpbuf)))
+ buf.pos += copy(buf.arr[buf.pos:], tmpbuf)
+ }
+ }
+
+ lock(&trace.lock)
+ traceFullQueue(bufp)
+ unlock(&trace.lock)
+
+ tab.mem.drop()
+ *tab = traceStackTable{}
+ lockInit(&((*tab).lock), lockRankTraceStackTab)
+}
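+
+// For reference, each stack is emitted as
+//
+//	[traceEvStack | 3<<traceArgCountShift] [payload length] [stack id] [frame count] ([pc] [funcID] [fileID] [line])*
+//
+// where funcID and fileID are string-table ids registered via traceFrameForPC.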
+
+type traceFrame struct {
+ funcID uint64
+ fileID uint64
+ line uint64
+}
+
+// traceFrameForPC records the frame information.
+// It may allocate memory.
+func traceFrameForPC(buf traceBufPtr, pid int32, f Frame) (traceFrame, traceBufPtr) {
+ bufp := &buf
+ var frame traceFrame
+
+ fn := f.Function
+ const maxLen = 1 << 10
+ if len(fn) > maxLen {
+ fn = fn[len(fn)-maxLen:]
+ }
+ frame.funcID, bufp = traceString(bufp, pid, fn)
+ frame.line = uint64(f.Line)
+ file := f.File
+ if len(file) > maxLen {
+ file = file[len(file)-maxLen:]
+ }
+ frame.fileID, bufp = traceString(bufp, pid, file)
+ return frame, (*bufp)
+}
+
+// traceAlloc is a non-thread-safe region allocator.
+// It holds a linked list of traceAllocBlock.
+type traceAlloc struct {
+ head traceAllocBlockPtr
+ off uintptr
+}
+
+// traceAllocBlock is a block in traceAlloc.
+//
+// traceAllocBlock is allocated from non-GC'd memory, so it must not
+// contain heap pointers. Writes to pointers to traceAllocBlocks do
+// not need write barriers.
+//
+//go:notinheap
+type traceAllocBlock struct {
+ next traceAllocBlockPtr
+ data [64<<10 - sys.PtrSize]byte
+}
+
+// TODO: Since traceAllocBlock is now go:notinheap, this isn't necessary.
+type traceAllocBlockPtr uintptr
+
+func (p traceAllocBlockPtr) ptr() *traceAllocBlock { return (*traceAllocBlock)(unsafe.Pointer(p)) }
+func (p *traceAllocBlockPtr) set(x *traceAllocBlock) { *p = traceAllocBlockPtr(unsafe.Pointer(x)) }
+
+// alloc allocates n-byte block.
+func (a *traceAlloc) alloc(n uintptr) unsafe.Pointer {
+ n = alignUp(n, sys.PtrSize)
+ if a.head == 0 || a.off+n > uintptr(len(a.head.ptr().data)) {
+ if n > uintptr(len(a.head.ptr().data)) {
+ throw("trace: alloc too large")
+ }
+ block := (*traceAllocBlock)(sysAlloc(unsafe.Sizeof(traceAllocBlock{}), &memstats.other_sys))
+ if block == nil {
+ throw("trace: out of memory")
+ }
+ block.next.set(a.head.ptr())
+ a.head.set(block)
+ a.off = 0
+ }
+ p := &a.head.ptr().data[a.off]
+ a.off += n
+ return unsafe.Pointer(p)
+}
+
+// drop frees all previously allocated memory and resets the allocator.
+func (a *traceAlloc) drop() {
+ for a.head != 0 {
+ block := a.head.ptr()
+ a.head.set(block.next.ptr())
+ sysFree(unsafe.Pointer(block), unsafe.Sizeof(traceAllocBlock{}), &memstats.other_sys)
+ }
+}
+
+// The following functions write specific events to trace.
+
+func traceGomaxprocs(procs int32) {
+ traceEvent(traceEvGomaxprocs, 1, uint64(procs))
+}
+
+func traceProcStart() {
+ traceEvent(traceEvProcStart, -1, uint64(getg().m.id))
+}
+
+func traceProcStop(pp *p) {
+	// Sysmon and stopTheWorld can stop Ps blocked in syscalls;
+	// to handle this we temporarily employ the P.
+ mp := acquirem()
+ oldp := mp.p
+ mp.p.set(pp)
+ traceEvent(traceEvProcStop, -1)
+ mp.p = oldp
+ releasem(mp)
+}
+
+func traceGCStart() {
+ traceEvent(traceEvGCStart, 3, trace.seqGC)
+ trace.seqGC++
+}
+
+func traceGCDone() {
+ traceEvent(traceEvGCDone, -1)
+}
+
+func traceGCSTWStart(kind int) {
+ traceEvent(traceEvGCSTWStart, -1, uint64(kind))
+}
+
+func traceGCSTWDone() {
+ traceEvent(traceEvGCSTWDone, -1)
+}
+
+// traceGCSweepStart prepares to trace a sweep loop. This does not
+// emit any events until traceGCSweepSpan is called.
+//
+// traceGCSweepStart must be paired with traceGCSweepDone and there
+// must be no preemption points between these two calls.
+func traceGCSweepStart() {
+ // Delay the actual GCSweepStart event until the first span
+ // sweep. If we don't sweep anything, don't emit any events.
+ _p_ := getg().m.p.ptr()
+ if _p_.traceSweep {
+ throw("double traceGCSweepStart")
+ }
+ _p_.traceSweep, _p_.traceSwept, _p_.traceReclaimed = true, 0, 0
+}
+
+// traceGCSweepSpan traces the sweep of a single page.
+//
+// This may be called outside a traceGCSweepStart/traceGCSweepDone
+// pair; however, it will not emit any trace events in this case.
+func traceGCSweepSpan(bytesSwept uintptr) {
+ _p_ := getg().m.p.ptr()
+ if _p_.traceSweep {
+ if _p_.traceSwept == 0 {
+ traceEvent(traceEvGCSweepStart, 1)
+ }
+ _p_.traceSwept += bytesSwept
+ }
+}
+
+func traceGCSweepDone() {
+ _p_ := getg().m.p.ptr()
+ if !_p_.traceSweep {
+ throw("missing traceGCSweepStart")
+ }
+ if _p_.traceSwept != 0 {
+ traceEvent(traceEvGCSweepDone, -1, uint64(_p_.traceSwept), uint64(_p_.traceReclaimed))
+ }
+ _p_.traceSweep = false
+}
+
+func traceGCMarkAssistStart() {
+ traceEvent(traceEvGCMarkAssistStart, 1)
+}
+
+func traceGCMarkAssistDone() {
+ traceEvent(traceEvGCMarkAssistDone, -1)
+}
+
+func traceGoCreate(newg *g, pc uintptr) {
+ newg.traceseq = 0
+ newg.tracelastp = getg().m.p
+ // +PCQuantum because traceFrameForPC expects return PCs and subtracts PCQuantum.
+ id := trace.stackTab.put([]uintptr{pc + sys.PCQuantum})
+ traceEvent(traceEvGoCreate, 2, uint64(newg.goid), uint64(id))
+}
+
+func traceGoStart() {
+ _g_ := getg().m.curg
+ _p_ := _g_.m.p
+ _g_.traceseq++
+ if _p_.ptr().gcMarkWorkerMode != gcMarkWorkerNotWorker {
+ traceEvent(traceEvGoStartLabel, -1, uint64(_g_.goid), _g_.traceseq, trace.markWorkerLabels[_p_.ptr().gcMarkWorkerMode])
+ } else if _g_.tracelastp == _p_ {
+ traceEvent(traceEvGoStartLocal, -1, uint64(_g_.goid))
+ } else {
+ _g_.tracelastp = _p_
+ traceEvent(traceEvGoStart, -1, uint64(_g_.goid), _g_.traceseq)
+ }
+}
+
+func traceGoEnd() {
+ traceEvent(traceEvGoEnd, -1)
+}
+
+func traceGoSched() {
+ _g_ := getg()
+ _g_.tracelastp = _g_.m.p
+ traceEvent(traceEvGoSched, 1)
+}
+
+func traceGoPreempt() {
+ _g_ := getg()
+ _g_.tracelastp = _g_.m.p
+ traceEvent(traceEvGoPreempt, 1)
+}
+
+func traceGoPark(traceEv byte, skip int) {
+ if traceEv&traceFutileWakeup != 0 {
+ traceEvent(traceEvFutileWakeup, -1)
+ }
+ traceEvent(traceEv & ^traceFutileWakeup, skip)
+}
+
+func traceGoUnpark(gp *g, skip int) {
+ _p_ := getg().m.p
+ gp.traceseq++
+ if gp.tracelastp == _p_ {
+ traceEvent(traceEvGoUnblockLocal, skip, uint64(gp.goid))
+ } else {
+ gp.tracelastp = _p_
+ traceEvent(traceEvGoUnblock, skip, uint64(gp.goid), gp.traceseq)
+ }
+}
+
+func traceGoSysCall() {
+ traceEvent(traceEvGoSysCall, 1)
+}
+
+func traceGoSysExit(ts int64) {
+ if ts != 0 && ts < trace.ticksStart {
+ // There is a race between the code that initializes sysexitticks
+ // (in exitsyscall, which runs without a P, and therefore is not
+ // stopped with the rest of the world) and the code that initializes
+ // a new trace. The recorded sysexitticks must therefore be treated
+ // as "best effort". If they are valid for this trace, then great,
+ // use them for greater accuracy. But if they're not valid for this
+ // trace, assume that the trace was started after the actual syscall
+ // exit (but before we actually managed to start the goroutine,
+ // aka right now), and assign a fresh time stamp to keep the log consistent.
+ ts = 0
+ }
+ _g_ := getg().m.curg
+ _g_.traceseq++
+ _g_.tracelastp = _g_.m.p
+ traceEvent(traceEvGoSysExit, -1, uint64(_g_.goid), _g_.traceseq, uint64(ts)/traceTickDiv)
+}
+
+func traceGoSysBlock(pp *p) {
+	// Sysmon and stopTheWorld can declare syscalls running on remote Ps as blocked;
+	// to handle this we temporarily employ the P.
+ mp := acquirem()
+ oldp := mp.p
+ mp.p.set(pp)
+ traceEvent(traceEvGoSysBlock, -1)
+ mp.p = oldp
+ releasem(mp)
+}
+
+func traceHeapAlloc() {
+ traceEvent(traceEvHeapAlloc, -1, memstats.heap_live)
+}
+
+func traceNextGC() {
+ if nextGC := atomic.Load64(&memstats.next_gc); nextGC == ^uint64(0) {
+ // Heap-based triggering is disabled.
+ traceEvent(traceEvNextGC, -1, 0)
+ } else {
+ traceEvent(traceEvNextGC, -1, nextGC)
+ }
+}
+
+// To access runtime functions from runtime/trace.
+// See runtime/trace/annotation.go
+
+//go:linkname trace_userTaskCreate runtime/trace.userTaskCreate
+func trace_userTaskCreate(id, parentID uint64, taskType string) {
+ if !trace.enabled {
+ return
+ }
+
+ // Same as in traceEvent.
+ mp, pid, bufp := traceAcquireBuffer()
+ if !trace.enabled && !mp.startingtrace {
+ traceReleaseBuffer(pid)
+ return
+ }
+
+ typeStringID, bufp := traceString(bufp, pid, taskType)
+ traceEventLocked(0, mp, pid, bufp, traceEvUserTaskCreate, 3, id, parentID, typeStringID)
+ traceReleaseBuffer(pid)
+}
+
+//go:linkname trace_userTaskEnd runtime/trace.userTaskEnd
+func trace_userTaskEnd(id uint64) {
+ traceEvent(traceEvUserTaskEnd, 2, id)
+}
+
+//go:linkname trace_userRegion runtime/trace.userRegion
+func trace_userRegion(id, mode uint64, name string) {
+ if !trace.enabled {
+ return
+ }
+
+ mp, pid, bufp := traceAcquireBuffer()
+ if !trace.enabled && !mp.startingtrace {
+ traceReleaseBuffer(pid)
+ return
+ }
+
+ nameStringID, bufp := traceString(bufp, pid, name)
+ traceEventLocked(0, mp, pid, bufp, traceEvUserRegion, 3, id, mode, nameStringID)
+ traceReleaseBuffer(pid)
+}
+
+//go:linkname trace_userLog runtime/trace.userLog
+func trace_userLog(id uint64, category, message string) {
+ if !trace.enabled {
+ return
+ }
+
+ mp, pid, bufp := traceAcquireBuffer()
+ if !trace.enabled && !mp.startingtrace {
+ traceReleaseBuffer(pid)
+ return
+ }
+
+ categoryID, bufp := traceString(bufp, pid, category)
+
+ extraSpace := traceBytesPerNumber + len(message) // extraSpace for the value string
+ traceEventLocked(extraSpace, mp, pid, bufp, traceEvUserLog, 3, id, categoryID)
+ // traceEventLocked reserved extra space for val and len(val)
+ // in buf, so buf now has room for the following.
+ buf := bufp.ptr()
+
+	// Double-check that the message and its length can fit.
+	// Otherwise, truncate the message.
+ slen := len(message)
+ if room := len(buf.arr) - buf.pos; room < slen+traceBytesPerNumber {
+ slen = room
+ }
+ buf.varint(uint64(slen))
+ buf.pos += copy(buf.arr[buf.pos:], message[:slen])
+
+ traceReleaseBuffer(pid)
+}
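+
+// For reference, the resulting UserLog record is the usual event header and
+// args written by traceEventLocked (task id, category string id, stack id),
+// followed immediately by the message length and the message bytes.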
diff --git a/src/runtime/trace/annotation.go b/src/runtime/trace/annotation.go
new file mode 100644
index 0000000..6e18bfb
--- /dev/null
+++ b/src/runtime/trace/annotation.go
@@ -0,0 +1,200 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package trace
+
+import (
+ "context"
+ "fmt"
+ "sync/atomic"
+ _ "unsafe"
+)
+
+type traceContextKey struct{}
+
+// NewTask creates a task instance with the type taskType and returns
+// it along with a Context that carries the task.
+// If the input context contains a task, the new task is its subtask.
+//
+// The taskType is used to classify task instances. Analysis tools
+// like the Go execution tracer may assume there are only a bounded
+// number of unique task types in the system.
+//
+// The returned end function is used to mark the task's end.
+// The trace tool measures task latency as the time between task creation
+// and when the end function is called, and provides the latency
+// distribution per task type.
+// If the end function is called multiple times, only the first
+// call is used in the latency measurement.
+//
+// ctx, task := trace.NewTask(ctx, "awesomeTask")
+// trace.WithRegion(ctx, "preparation", prepWork)
+// // preparation of the task
+// go func() { // continue processing the task in a separate goroutine.
+// defer task.End()
+// trace.WithRegion(ctx, "remainingWork", remainingWork)
+// }()
+func NewTask(pctx context.Context, taskType string) (ctx context.Context, task *Task) {
+ pid := fromContext(pctx).id
+ id := newID()
+ userTaskCreate(id, pid, taskType)
+ s := &Task{id: id}
+ return context.WithValue(pctx, traceContextKey{}, s), s
+
+ // We allocate a new task and the end function even when
+ // the tracing is disabled because the context and the detach
+ // function can be used across trace enable/disable boundaries,
+ // which complicates the problem.
+ //
+ // For example, consider the following scenario:
+ // - trace is enabled.
+ // - trace.WithRegion is called, so a new context ctx
+ // with a new region is created.
+ // - trace is disabled.
+ // - trace is enabled again.
+	//   - trace APIs with the ctx are called. Is the ID in the task
+ // a valid one to use?
+ //
+ // TODO(hyangah): reduce the overhead at least when
+ // tracing is disabled. Maybe the id can embed a tracing
+ // round number and ignore ids generated from previous
+ // tracing round.
+}
+
+func fromContext(ctx context.Context) *Task {
+ if s, ok := ctx.Value(traceContextKey{}).(*Task); ok {
+ return s
+ }
+ return &bgTask
+}
+
+// Task is a data type for tracing a user-defined, logical operation.
+type Task struct {
+ id uint64
+ // TODO(hyangah): record parent id?
+}
+
+// End marks the end of the operation represented by the Task.
+func (t *Task) End() {
+ userTaskEnd(t.id)
+}
+
+var lastTaskID uint64 = 0 // task id issued last time
+
+func newID() uint64 {
+ // TODO(hyangah): use per-P cache
+ return atomic.AddUint64(&lastTaskID, 1)
+}
+
+var bgTask = Task{id: uint64(0)}
+
+// Log emits a one-off event with the given category and message.
+// Category can be empty and the API assumes there are only a handful of
+// unique categories in the system.
+func Log(ctx context.Context, category, message string) {
+ id := fromContext(ctx).id
+ userLog(id, category, message)
+}
+
+// Logf is like Log, but the value is formatted using the specified format spec.
+func Logf(ctx context.Context, category, format string, args ...interface{}) {
+ if IsEnabled() {
+ // Ideally this should be just Log, but that will
+ // add one more frame in the stack trace.
+ id := fromContext(ctx).id
+ userLog(id, category, fmt.Sprintf(format, args...))
+ }
+}
+
+const (
+ regionStartCode = uint64(0)
+ regionEndCode = uint64(1)
+)
+
+// WithRegion starts a region associated with its calling goroutine, runs fn,
+// and then ends the region. If the context carries a task, the region is
+// associated with the task. Otherwise, the region is attached to the background
+// task.
+//
+// The regionType is used to classify regions, so there should be only a
+// handful of unique region types.
+func WithRegion(ctx context.Context, regionType string, fn func()) {
+	// NOTE:
+	// WithRegion helps avoid misuse of the API, but in practice
+	// it is very restrictive:
+	// - Use of WithRegion makes the stack traces captured at
+	//   region start and end identical.
+	// - Refactoring existing code to use WithRegion is sometimes
+	//   hard and makes the code less readable,
+	//   e.g. a code block nested deep in a loop with various
+	//   exit points returning values.
+	// - Refactoring the code to use this API with a closure can
+	//   cause different GC behavior, such as retaining some parameters
+	//   for longer.
+	// This causes more churn in code than I hoped, and sometimes
+	// makes the code less readable.
+
+ id := fromContext(ctx).id
+ userRegion(id, regionStartCode, regionType)
+ defer userRegion(id, regionEndCode, regionType)
+ fn()
+}
+
+// StartRegion starts a region and returns a function for marking the
+// end of the region. The returned Region's End function must be called
+// from the same goroutine where the region was started.
+// Within each goroutine, regions must nest. That is, regions started
+// after this region must be ended before this region can be ended.
+// Recommended usage is
+//
+// defer trace.StartRegion(ctx, "myTracedRegion").End()
+//
+func StartRegion(ctx context.Context, regionType string) *Region {
+ if !IsEnabled() {
+ return noopRegion
+ }
+ id := fromContext(ctx).id
+ userRegion(id, regionStartCode, regionType)
+ return &Region{id, regionType}
+}
+
+// Region is a region of code whose execution time interval is traced.
+type Region struct {
+ id uint64
+ regionType string
+}
+
+var noopRegion = &Region{}
+
+// End marks the end of the traced code region.
+func (r *Region) End() {
+ if r == noopRegion {
+ return
+ }
+ userRegion(r.id, regionEndCode, r.regionType)
+}
+
+// IsEnabled reports whether tracing is enabled.
+// The information is advisory only. The tracing status
+// may have changed by the time this function returns.
+func IsEnabled() bool {
+ enabled := atomic.LoadInt32(&tracing.enabled)
+ return enabled == 1
+}
+
+//
+// Function bodies are defined in runtime/trace.go
+//
+
+// emits UserTaskCreate event.
+func userTaskCreate(id, parentID uint64, taskType string)
+
+// emits UserTaskEnd event.
+func userTaskEnd(id uint64)
+
+// emits UserRegion event.
+func userRegion(id, mode uint64, regionType string)
+
+// emits UserLog event.
+func userLog(id uint64, category, message string)
diff --git a/src/runtime/trace/annotation_test.go b/src/runtime/trace/annotation_test.go
new file mode 100644
index 0000000..31fccef
--- /dev/null
+++ b/src/runtime/trace/annotation_test.go
@@ -0,0 +1,156 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package trace_test
+
+import (
+ "bytes"
+ "context"
+ "fmt"
+ "internal/trace"
+ "reflect"
+ . "runtime/trace"
+ "strings"
+ "sync"
+ "testing"
+)
+
+func BenchmarkStartRegion(b *testing.B) {
+ b.ReportAllocs()
+ ctx, task := NewTask(context.Background(), "benchmark")
+ defer task.End()
+
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ StartRegion(ctx, "region").End()
+ }
+ })
+}
+
+func BenchmarkNewTask(b *testing.B) {
+ b.ReportAllocs()
+ pctx, task := NewTask(context.Background(), "benchmark")
+ defer task.End()
+
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ _, task := NewTask(pctx, "task")
+ task.End()
+ }
+ })
+}
+
+func TestUserTaskRegion(t *testing.T) {
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ bgctx, cancel := context.WithCancel(context.Background())
+ defer cancel()
+
+ preExistingRegion := StartRegion(bgctx, "pre-existing region")
+
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+
+ // Beginning of traced execution
+ var wg sync.WaitGroup
+ ctx, task := NewTask(bgctx, "task0") // EvUserTaskCreate("task0")
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ defer task.End() // EvUserTaskEnd("task0")
+
+ WithRegion(ctx, "region0", func() {
+ // EvUserRegionCreate("region0", start)
+ WithRegion(ctx, "region1", func() {
+ Log(ctx, "key0", "0123456789abcdef") // EvUserLog("task0", "key0", "0....f")
+ })
+ // EvUserRegion("region0", end)
+ })
+ }()
+
+ wg.Wait()
+
+ preExistingRegion.End()
+ postExistingRegion := StartRegion(bgctx, "post-existing region")
+
+ // End of traced execution
+ Stop()
+
+ postExistingRegion.End()
+
+ saveTrace(t, buf, "TestUserTaskRegion")
+ res, err := trace.Parse(buf, "")
+ if err == trace.ErrTimeOrder {
+ // golang.org/issues/16755
+ t.Skipf("skipping trace: %v", err)
+ }
+ if err != nil {
+ t.Fatalf("Parse failed: %v", err)
+ }
+
+ // Check whether we see all user annotation related records in order
+ type testData struct {
+ typ byte
+ strs []string
+ args []uint64
+ setLink bool
+ }
+
+ var got []testData
+ tasks := map[uint64]string{}
+ for _, e := range res.Events {
+ t.Logf("%s", e)
+ switch e.Type {
+ case trace.EvUserTaskCreate:
+ taskName := e.SArgs[0]
+ got = append(got, testData{trace.EvUserTaskCreate, []string{taskName}, nil, e.Link != nil})
+ if e.Link != nil && e.Link.Type != trace.EvUserTaskEnd {
+ t.Errorf("Unexpected linked event %q->%q", e, e.Link)
+ }
+ tasks[e.Args[0]] = taskName
+ case trace.EvUserLog:
+ key, val := e.SArgs[0], e.SArgs[1]
+ taskName := tasks[e.Args[0]]
+ got = append(got, testData{trace.EvUserLog, []string{taskName, key, val}, nil, e.Link != nil})
+ case trace.EvUserTaskEnd:
+ taskName := tasks[e.Args[0]]
+ got = append(got, testData{trace.EvUserTaskEnd, []string{taskName}, nil, e.Link != nil})
+ if e.Link != nil && e.Link.Type != trace.EvUserTaskCreate {
+ t.Errorf("Unexpected linked event %q->%q", e, e.Link)
+ }
+ case trace.EvUserRegion:
+ taskName := tasks[e.Args[0]]
+ regionName := e.SArgs[0]
+ got = append(got, testData{trace.EvUserRegion, []string{taskName, regionName}, []uint64{e.Args[1]}, e.Link != nil})
+ if e.Link != nil && (e.Link.Type != trace.EvUserRegion || e.Link.SArgs[0] != regionName) {
+ t.Errorf("Unexpected linked event %q->%q", e, e.Link)
+ }
+ }
+ }
+ want := []testData{
+ {trace.EvUserTaskCreate, []string{"task0"}, nil, true},
+ {trace.EvUserRegion, []string{"task0", "region0"}, []uint64{0}, true},
+ {trace.EvUserRegion, []string{"task0", "region1"}, []uint64{0}, true},
+ {trace.EvUserLog, []string{"task0", "key0", "0123456789abcdef"}, nil, false},
+ {trace.EvUserRegion, []string{"task0", "region1"}, []uint64{1}, false},
+ {trace.EvUserRegion, []string{"task0", "region0"}, []uint64{1}, false},
+ {trace.EvUserTaskEnd, []string{"task0"}, nil, false},
+ // Currently, pre-existing region is not recorded to avoid allocations.
+ // {trace.EvUserRegion, []string{"", "pre-existing region"}, []uint64{1}, false},
+ {trace.EvUserRegion, []string{"", "post-existing region"}, []uint64{0}, false},
+ }
+ if !reflect.DeepEqual(got, want) {
+ pretty := func(data []testData) string {
+ var s strings.Builder
+ for _, d := range data {
+ s.WriteString(fmt.Sprintf("\t%+v\n", d))
+ }
+ return s.String()
+ }
+ t.Errorf("Got user region related events\n%+v\nwant:\n%+v", pretty(got), pretty(want))
+ }
+}
diff --git a/src/runtime/trace/example_test.go b/src/runtime/trace/example_test.go
new file mode 100644
index 0000000..ba96a82
--- /dev/null
+++ b/src/runtime/trace/example_test.go
@@ -0,0 +1,39 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package trace_test
+
+import (
+ "fmt"
+ "log"
+ "os"
+ "runtime/trace"
+)
+
+// Example demonstrates the use of the trace package to trace
+// the execution of a Go program. The trace output will be
+// written to the file trace.out.
+func Example() {
+ f, err := os.Create("trace.out")
+ if err != nil {
+ log.Fatalf("failed to create trace output file: %v", err)
+ }
+ defer func() {
+ if err := f.Close(); err != nil {
+ log.Fatalf("failed to close trace file: %v", err)
+ }
+ }()
+
+ if err := trace.Start(f); err != nil {
+ log.Fatalf("failed to start trace: %v", err)
+ }
+ defer trace.Stop()
+
+ // your program here
+ RunMyProgram()
+}
+
+func RunMyProgram() {
+ fmt.Printf("this function will be traced")
+}
diff --git a/src/runtime/trace/trace.go b/src/runtime/trace/trace.go
new file mode 100644
index 0000000..b34aef0
--- /dev/null
+++ b/src/runtime/trace/trace.go
@@ -0,0 +1,153 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package trace contains facilities for programs to generate traces
+// for the Go execution tracer.
+//
+// Tracing runtime activities
+//
+// The execution trace captures a wide range of execution events such as
+// goroutine creation/blocking/unblocking, syscall enter/exit/block,
+// GC-related events, changes of heap size, processor start/stop, etc.
+// A precise nanosecond-precision timestamp and a stack trace is
+// captured for most events. The generated trace can be interpreted
+// using `go tool trace`.
+//
+// Support for tracing tests and benchmarks built with the standard
+// testing package is built into `go test`. For example, the following
+// command runs the test in the current directory and writes the trace
+// file (trace.out).
+//
+// go test -trace=trace.out
+//
+// This runtime/trace package provides APIs to add equivalent tracing
+// support to a standalone program. See the Example that demonstrates
+// how to use this API to enable tracing.
+//
+// There is also a standard HTTP interface to trace data. Adding the
+// following line will install a handler under the /debug/pprof/trace URL
+// to download a live trace:
+//
+// import _ "net/http/pprof"
+//
+// See the net/http/pprof package for more details about all of the
+// debug endpoints installed by this import.
+//
+// User annotation
+//
+// Package trace provides user annotation APIs that can be used to
+// log interesting events during execution.
+//
+// There are three types of user annotations: log messages, regions,
+// and tasks.
+//
+// Log emits a timestamped message to the execution trace along with
+// additional information such as the category of the message and
+// which goroutine called Log. The execution tracer provides UIs to filter
+// and group goroutines using the log category and the message supplied
+// in Log.
+//
+// A region is for logging a time interval during a goroutine's execution.
+// By definition, a region starts and ends in the same goroutine.
+// Regions can be nested to represent subintervals.
+// For example, the following code records four regions in the execution
+// trace to trace the durations of sequential steps in a cappuccino making
+// operation.
+//
+// trace.WithRegion(ctx, "makeCappuccino", func() {
+//
+//	// orderID allows identifying a specific order
+// // among many cappuccino order region records.
+// trace.Log(ctx, "orderID", orderID)
+//
+// trace.WithRegion(ctx, "steamMilk", steamMilk)
+// trace.WithRegion(ctx, "extractCoffee", extractCoffee)
+// trace.WithRegion(ctx, "mixMilkCoffee", mixMilkCoffee)
+// })
+//
+// A task is a higher-level component that aids tracing of logical
+// operations such as an RPC request, an HTTP request, or an
+// interesting local operation which may require multiple goroutines
+// working together. Since tasks can involve multiple goroutines,
+// they are tracked via a context.Context object. NewTask creates
+// a new task and embeds it in the returned context.Context object.
+// Log messages and regions are attached to the task, if any, in the
+// Context passed to Log and WithRegion.
+//
+// For example, assume that we decided to froth milk, extract coffee,
+// and mix milk and coffee in separate goroutines. With a task,
+// the trace tool can identify the goroutines involved in a specific
+// cappuccino order.
+//
+// ctx, task := trace.NewTask(ctx, "makeCappuccino")
+// trace.Log(ctx, "orderID", orderID)
+//
+// milk := make(chan bool)
+// espresso := make(chan bool)
+//
+// go func() {
+// trace.WithRegion(ctx, "steamMilk", steamMilk)
+// milk <- true
+// }()
+// go func() {
+// trace.WithRegion(ctx, "extractCoffee", extractCoffee)
+// espresso <- true
+// }()
+// go func() {
+// defer task.End() // When assemble is done, the order is complete.
+// <-espresso
+// <-milk
+// trace.WithRegion(ctx, "mixMilkCoffee", mixMilkCoffee)
+// }()
+//
+//
+// The trace tool computes the latency of a task by measuring the
+// time between the task creation and the task end and provides
+// latency distributions for each task type found in the trace.
+package trace
+
+import (
+ "io"
+ "runtime"
+ "sync"
+ "sync/atomic"
+)
+
+// Start enables tracing for the current program.
+// While tracing, the trace will be buffered and written to w.
+// Start returns an error if tracing is already enabled.
+func Start(w io.Writer) error {
+ tracing.Lock()
+ defer tracing.Unlock()
+
+ if err := runtime.StartTrace(); err != nil {
+ return err
+ }
+ go func() {
+ for {
+ data := runtime.ReadTrace()
+ if data == nil {
+ break
+ }
+ w.Write(data)
+ }
+ }()
+ atomic.StoreInt32(&tracing.enabled, 1)
+ return nil
+}
+
+// Stop stops the current tracing, if any.
+// Stop only returns after all the writes for the trace have completed.
+func Stop() {
+ tracing.Lock()
+ defer tracing.Unlock()
+ atomic.StoreInt32(&tracing.enabled, 0)
+
+ runtime.StopTrace()
+}
+
+var tracing struct {
+ sync.Mutex // gate mutators (Start, Stop)
+ enabled int32 // accessed via atomic
+}
diff --git a/src/runtime/trace/trace_stack_test.go b/src/runtime/trace/trace_stack_test.go
new file mode 100644
index 0000000..be3adc9
--- /dev/null
+++ b/src/runtime/trace/trace_stack_test.go
@@ -0,0 +1,333 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package trace_test
+
+import (
+ "bytes"
+ "fmt"
+ "internal/testenv"
+ "internal/trace"
+ "net"
+ "os"
+ "runtime"
+ . "runtime/trace"
+ "strings"
+ "sync"
+ "testing"
+ "text/tabwriter"
+ "time"
+)
+
+// TestTraceSymbolize tests symbolization and that events have proper stacks.
+// In particular, that we strip uninteresting bottom frames like goexit
+// and uninteresting top frames (runtime guts).
+func TestTraceSymbolize(t *testing.T) {
+ skipTraceSymbolizeTestIfNecessary(t)
+
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+ defer Stop() // in case of early return
+
+ // Now we will do a bunch of things for which we verify stacks later.
+ // It is impossible to ensure that a goroutine has actually blocked
+ // on a channel, in a select or otherwise. So we kick off goroutines
+ // that need to block first in the hope that while we are executing
+ // the rest of the test, they will block.
+ go func() { // func1
+ select {}
+ }()
+ go func() { // func2
+ var c chan int
+ c <- 0
+ }()
+ go func() { // func3
+ var c chan int
+ <-c
+ }()
+ done1 := make(chan bool)
+ go func() { // func4
+ <-done1
+ }()
+ done2 := make(chan bool)
+ go func() { // func5
+ done2 <- true
+ }()
+ c1 := make(chan int)
+ c2 := make(chan int)
+ go func() { // func6
+ select {
+ case <-c1:
+ case <-c2:
+ }
+ }()
+ var mu sync.Mutex
+ mu.Lock()
+ go func() { // func7
+ mu.Lock()
+ mu.Unlock()
+ }()
+ var wg sync.WaitGroup
+ wg.Add(1)
+ go func() { // func8
+ wg.Wait()
+ }()
+ cv := sync.NewCond(&sync.Mutex{})
+ go func() { // func9
+ cv.L.Lock()
+ cv.Wait()
+ cv.L.Unlock()
+ }()
+ ln, err := net.Listen("tcp", "127.0.0.1:0")
+ if err != nil {
+ t.Fatalf("failed to listen: %v", err)
+ }
+ go func() { // func10
+ c, err := ln.Accept()
+ if err != nil {
+ t.Errorf("failed to accept: %v", err)
+ return
+ }
+ c.Close()
+ }()
+ rp, wp, err := os.Pipe()
+ if err != nil {
+ t.Fatalf("failed to create a pipe: %v", err)
+ }
+ defer rp.Close()
+ defer wp.Close()
+ pipeReadDone := make(chan bool)
+ go func() { // func11
+ var data [1]byte
+ rp.Read(data[:])
+ pipeReadDone <- true
+ }()
+
+ time.Sleep(100 * time.Millisecond)
+ runtime.GC()
+ runtime.Gosched()
+ time.Sleep(100 * time.Millisecond) // the last chance for the goroutines above to block
+ done1 <- true
+ <-done2
+ select {
+ case c1 <- 0:
+ case c2 <- 0:
+ }
+ mu.Unlock()
+ wg.Done()
+ cv.Signal()
+ c, err := net.Dial("tcp", ln.Addr().String())
+ if err != nil {
+ t.Fatalf("failed to dial: %v", err)
+ }
+ c.Close()
+ var data [1]byte
+ wp.Write(data[:])
+ <-pipeReadDone
+
+ oldGoMaxProcs := runtime.GOMAXPROCS(0)
+ runtime.GOMAXPROCS(oldGoMaxProcs + 1)
+
+ Stop()
+
+ runtime.GOMAXPROCS(oldGoMaxProcs)
+
+ events, _ := parseTrace(t, buf)
+
+ // Now check that the stacks are correct.
+ type eventDesc struct {
+ Type byte
+ Stk []frame
+ }
+ want := []eventDesc{
+ {trace.EvGCStart, []frame{
+ {"runtime.GC", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 0},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoStart, []frame{
+ {"runtime/trace_test.TestTraceSymbolize.func1", 0},
+ }},
+ {trace.EvGoSched, []frame{
+ {"runtime/trace_test.TestTraceSymbolize", 111},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoCreate, []frame{
+ {"runtime/trace_test.TestTraceSymbolize", 40},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoStop, []frame{
+ {"runtime.block", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func1", 0},
+ }},
+ {trace.EvGoStop, []frame{
+ {"runtime.chansend1", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func2", 0},
+ }},
+ {trace.EvGoStop, []frame{
+ {"runtime.chanrecv1", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func3", 0},
+ }},
+ {trace.EvGoBlockRecv, []frame{
+ {"runtime.chanrecv1", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func4", 0},
+ }},
+ {trace.EvGoUnblock, []frame{
+ {"runtime.chansend1", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 113},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoBlockSend, []frame{
+ {"runtime.chansend1", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func5", 0},
+ }},
+ {trace.EvGoUnblock, []frame{
+ {"runtime.chanrecv1", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 114},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoBlockSelect, []frame{
+ {"runtime.selectgo", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func6", 0},
+ }},
+ {trace.EvGoUnblock, []frame{
+ {"runtime.selectgo", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 115},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoBlockSync, []frame{
+ {"sync.(*Mutex).Lock", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func7", 0},
+ }},
+ {trace.EvGoUnblock, []frame{
+ {"sync.(*Mutex).Unlock", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 0},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoBlockSync, []frame{
+ {"sync.(*WaitGroup).Wait", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func8", 0},
+ }},
+ {trace.EvGoUnblock, []frame{
+ {"sync.(*WaitGroup).Add", 0},
+ {"sync.(*WaitGroup).Done", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 120},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoBlockCond, []frame{
+ {"sync.(*Cond).Wait", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func9", 0},
+ }},
+ {trace.EvGoUnblock, []frame{
+ {"sync.(*Cond).Signal", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 0},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoSleep, []frame{
+ {"time.Sleep", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 0},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGomaxprocs, []frame{
+ {"runtime.startTheWorld", 0}, // this is when the current gomaxprocs is logged.
+ {"runtime.startTheWorldGC", 0},
+ {"runtime.GOMAXPROCS", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 0},
+ {"testing.tRunner", 0},
+ }},
+ }
+	// Stacks for the following events are OS-dependent due to OS-specific code in the net package.
+ if runtime.GOOS != "windows" && runtime.GOOS != "plan9" {
+ want = append(want, []eventDesc{
+ {trace.EvGoBlockNet, []frame{
+ {"internal/poll.(*FD).Accept", 0},
+ {"net.(*netFD).accept", 0},
+ {"net.(*TCPListener).accept", 0},
+ {"net.(*TCPListener).Accept", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func10", 0},
+ }},
+ {trace.EvGoSysCall, []frame{
+ {"syscall.read", 0},
+ {"syscall.Read", 0},
+ {"internal/poll.ignoringEINTRIO", 0},
+ {"internal/poll.(*FD).Read", 0},
+ {"os.(*File).read", 0},
+ {"os.(*File).Read", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func11", 0},
+ }},
+ }...)
+ }
+ matched := make([]bool, len(want))
+ for _, ev := range events {
+ wantLoop:
+ for i, w := range want {
+ if matched[i] || w.Type != ev.Type || len(w.Stk) != len(ev.Stk) {
+ continue
+ }
+
+ for fi, f := range ev.Stk {
+ wf := w.Stk[fi]
+ if wf.Fn != f.Fn || wf.Line != 0 && wf.Line != f.Line {
+ continue wantLoop
+ }
+ }
+ matched[i] = true
+ }
+ }
+ for i, w := range want {
+ if matched[i] {
+ continue
+ }
+ seen, n := dumpEventStacks(w.Type, events)
+ t.Errorf("Did not match event %v with stack\n%s\nSeen %d events of the type\n%s",
+ trace.EventDescriptions[w.Type].Name, dumpFrames(w.Stk), n, seen)
+ }
+}
+
+func skipTraceSymbolizeTestIfNecessary(t *testing.T) {
+ testenv.MustHaveGoBuild(t)
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+}
+
+func dumpEventStacks(typ byte, events []*trace.Event) ([]byte, int) {
+ matched := 0
+ o := new(bytes.Buffer)
+ tw := tabwriter.NewWriter(o, 0, 8, 0, '\t', 0)
+ for _, ev := range events {
+ if ev.Type != typ {
+ continue
+ }
+ matched++
+ fmt.Fprintf(tw, "Offset %d\n", ev.Off)
+ for _, f := range ev.Stk {
+ fname := f.File
+ if idx := strings.Index(fname, "/go/src/"); idx > 0 {
+ fname = fname[idx:]
+ }
+ fmt.Fprintf(tw, " %v\t%s:%d\n", f.Fn, fname, f.Line)
+ }
+ }
+ tw.Flush()
+ return o.Bytes(), matched
+}
+
+type frame struct {
+ Fn string
+ Line int
+}
+
+func dumpFrames(frames []frame) []byte {
+ o := new(bytes.Buffer)
+ tw := tabwriter.NewWriter(o, 0, 8, 0, '\t', 0)
+
+ for _, f := range frames {
+ fmt.Fprintf(tw, " %v\t :%d\n", f.Fn, f.Line)
+ }
+ tw.Flush()
+ return o.Bytes()
+}
diff --git a/src/runtime/trace/trace_test.go b/src/runtime/trace/trace_test.go
new file mode 100644
index 0000000..b316eaf
--- /dev/null
+++ b/src/runtime/trace/trace_test.go
@@ -0,0 +1,591 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package trace_test
+
+import (
+ "bytes"
+ "flag"
+ "internal/race"
+ "internal/trace"
+ "io"
+ "net"
+ "os"
+ "runtime"
+ . "runtime/trace"
+ "strconv"
+ "sync"
+ "testing"
+ "time"
+)
+
+var (
+ saveTraces = flag.Bool("savetraces", false, "save traces collected by tests")
+)
+
+// TestEventBatch tests that Flush calls that happen during Start
+// don't produce corrupted traces.
+func TestEventBatch(t *testing.T) {
+ if race.Enabled {
+ t.Skip("skipping in race mode")
+ }
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ if testing.Short() {
+ t.Skip("skipping in short mode")
+ }
+	// During Start, a bunch of records are written to reflect the current
+	// snapshot of the program, including the state of each goroutine.
+	// Some string constants are also written to the trace to aid trace
+	// parsing. This test checks that a Flush of the buffer occurring during
+	// this process doesn't produce corrupted traces.
+	// Whether a Flush happens during Start is hard to control,
+	// so we test with a range of goroutine counts, hoping that one
+	// of them triggers a Flush.
+	// This range was chosen to fill up a ~64KB buffer with traceEvGoCreate
+	// and traceEvGoWaiting events (12~13 bytes per goroutine).
+ for g := 4950; g < 5050; g++ {
+ n := g
+ t.Run("G="+strconv.Itoa(n), func(t *testing.T) {
+ var wg sync.WaitGroup
+ wg.Add(n)
+
+ in := make(chan bool, 1000)
+ for i := 0; i < n; i++ {
+ go func() {
+ <-in
+ wg.Done()
+ }()
+ }
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+
+ for i := 0; i < n; i++ {
+ in <- true
+ }
+ wg.Wait()
+ Stop()
+
+ _, err := trace.Parse(buf, "")
+ if err == trace.ErrTimeOrder {
+ t.Skipf("skipping trace: %v", err)
+ }
+
+ if err != nil {
+ t.Fatalf("failed to parse trace: %v", err)
+ }
+ })
+ }
+}
+
+func TestTraceStartStop(t *testing.T) {
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+ Stop()
+ size := buf.Len()
+ if size == 0 {
+ t.Fatalf("trace is empty")
+ }
+ time.Sleep(100 * time.Millisecond)
+ if size != buf.Len() {
+ t.Fatalf("trace writes after stop: %v -> %v", size, buf.Len())
+ }
+ saveTrace(t, buf, "TestTraceStartStop")
+}
+
+func TestTraceDoubleStart(t *testing.T) {
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ Stop()
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+ if err := Start(buf); err == nil {
+ t.Fatalf("succeed to start tracing second time")
+ }
+ Stop()
+ Stop()
+}
+
+func TestTrace(t *testing.T) {
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+ Stop()
+ saveTrace(t, buf, "TestTrace")
+ _, err := trace.Parse(buf, "")
+ if err == trace.ErrTimeOrder {
+ t.Skipf("skipping trace: %v", err)
+ }
+ if err != nil {
+ t.Fatalf("failed to parse trace: %v", err)
+ }
+}
+
+func parseTrace(t *testing.T, r io.Reader) ([]*trace.Event, map[uint64]*trace.GDesc) {
+ res, err := trace.Parse(r, "")
+ if err == trace.ErrTimeOrder {
+ t.Skipf("skipping trace: %v", err)
+ }
+ if err != nil {
+ t.Fatalf("failed to parse trace: %v", err)
+ }
+ gs := trace.GoroutineStats(res.Events)
+ for goid := range gs {
+ // We don't do any particular checks on the result at the moment.
+ // But still check that RelatedGoroutines does not crash, hang, etc.
+ _ = trace.RelatedGoroutines(res.Events, goid)
+ }
+ return res.Events, gs
+}
+
+func testBrokenTimestamps(t *testing.T, data []byte) {
+	// On some processors cputicks (used to generate trace timestamps)
+	// produces non-monotonic timestamps. It is important that the parser
+	// distinguishes logically inconsistent traces (e.g. missing, excessive
+	// or misordered events) from broken timestamps. The former is a bug
+	// in the tracer, the latter is a machine issue.
+	// So now that we have a consistent trace, test that (1) the parser does
+	// not return a logical error in the case of broken timestamps
+	// and (2) broken timestamps are eventually detected and reported.
+ trace.BreakTimestampsForTesting = true
+ defer func() {
+ trace.BreakTimestampsForTesting = false
+ }()
+ for i := 0; i < 1e4; i++ {
+ _, err := trace.Parse(bytes.NewReader(data), "")
+ if err == trace.ErrTimeOrder {
+ return
+ }
+ if err != nil {
+ t.Fatalf("failed to parse trace: %v", err)
+ }
+ }
+}
+
+func TestTraceStress(t *testing.T) {
+ if runtime.GOOS == "js" {
+ t.Skip("no os.Pipe on js")
+ }
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ if testing.Short() {
+ t.Skip("skipping in -short mode")
+ }
+
+ var wg sync.WaitGroup
+ done := make(chan bool)
+
+ // Create a goroutine blocked before tracing.
+ wg.Add(1)
+ go func() {
+ <-done
+ wg.Done()
+ }()
+
+ // Create a goroutine blocked in syscall before tracing.
+ rp, wp, err := os.Pipe()
+ if err != nil {
+ t.Fatalf("failed to create pipe: %v", err)
+ }
+ defer func() {
+ rp.Close()
+ wp.Close()
+ }()
+ wg.Add(1)
+ go func() {
+ var tmp [1]byte
+ rp.Read(tmp[:])
+ <-done
+ wg.Done()
+ }()
+ time.Sleep(time.Millisecond) // give the goroutine above time to block
+
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+
+ procs := runtime.GOMAXPROCS(10)
+ time.Sleep(50 * time.Millisecond) // test proc stop/start events
+
+ go func() {
+ runtime.LockOSThread()
+ for {
+ select {
+ case <-done:
+ return
+ default:
+ runtime.Gosched()
+ }
+ }
+ }()
+
+ runtime.GC()
+ // Trigger GC from malloc.
+ n := int(1e3)
+ if isMemoryConstrained() {
+ // Reduce allocation to avoid running out of
+ // memory on the builder - see issue/12032.
+ n = 512
+ }
+ for i := 0; i < n; i++ {
+ _ = make([]byte, 1<<20)
+ }
+
+ // Create a bunch of busy goroutines to load all Ps.
+ for p := 0; p < 10; p++ {
+ wg.Add(1)
+ go func() {
+ // Do something useful.
+ tmp := make([]byte, 1<<16)
+ for i := range tmp {
+ tmp[i]++
+ }
+ _ = tmp
+ <-done
+ wg.Done()
+ }()
+ }
+
+ // Block in syscall.
+ wg.Add(1)
+ go func() {
+ var tmp [1]byte
+ rp.Read(tmp[:])
+ <-done
+ wg.Done()
+ }()
+
+ // Test timers.
+ timerDone := make(chan bool)
+ go func() {
+ time.Sleep(time.Millisecond)
+ timerDone <- true
+ }()
+ <-timerDone
+
+ // A bit of network.
+ ln, err := net.Listen("tcp", "127.0.0.1:0")
+ if err != nil {
+ t.Fatalf("listen failed: %v", err)
+ }
+ defer ln.Close()
+ go func() {
+ c, err := ln.Accept()
+ if err != nil {
+ return
+ }
+ time.Sleep(time.Millisecond)
+ var buf [1]byte
+ c.Write(buf[:])
+ c.Close()
+ }()
+ c, err := net.Dial("tcp", ln.Addr().String())
+ if err != nil {
+ t.Fatalf("dial failed: %v", err)
+ }
+ var tmp [1]byte
+ c.Read(tmp[:])
+ c.Close()
+
+ go func() {
+ runtime.Gosched()
+ select {}
+ }()
+
+ // Unblock helper goroutines and wait for them to finish.
+ wp.Write(tmp[:])
+ wp.Write(tmp[:])
+ close(done)
+ wg.Wait()
+
+ runtime.GOMAXPROCS(procs)
+
+ Stop()
+ saveTrace(t, buf, "TestTraceStress")
+ trace := buf.Bytes()
+ parseTrace(t, buf)
+ testBrokenTimestamps(t, trace)
+}
+
+// isMemoryConstrained reports whether the current machine is likely
+// to be memory constrained.
+// This was originally for the openbsd/arm builder (Issue 12032).
+// TODO: move this to testenv? Make this look at memory? Look at GO_BUILDER_NAME?
+func isMemoryConstrained() bool {
+ if runtime.GOOS == "plan9" {
+ return true
+ }
+ switch runtime.GOARCH {
+ case "arm", "mips", "mipsle":
+ return true
+ }
+ return false
+}
+
+// Do a bunch of various stuff (timers, GC, network, etc.) in a separate goroutine
+// and, concurrently with all of that, start/stop the trace 3 times.
+func TestTraceStressStartStop(t *testing.T) {
+ if runtime.GOOS == "js" {
+ t.Skip("no os.Pipe on js")
+ }
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(8))
+ outerDone := make(chan bool)
+
+ go func() {
+ defer func() {
+ outerDone <- true
+ }()
+
+ var wg sync.WaitGroup
+ done := make(chan bool)
+
+ wg.Add(1)
+ go func() {
+ <-done
+ wg.Done()
+ }()
+
+ rp, wp, err := os.Pipe()
+ if err != nil {
+ t.Errorf("failed to create pipe: %v", err)
+ return
+ }
+ defer func() {
+ rp.Close()
+ wp.Close()
+ }()
+ wg.Add(1)
+ go func() {
+ var tmp [1]byte
+ rp.Read(tmp[:])
+ <-done
+ wg.Done()
+ }()
+ time.Sleep(time.Millisecond)
+
+ go func() {
+ runtime.LockOSThread()
+ for {
+ select {
+ case <-done:
+ return
+ default:
+ runtime.Gosched()
+ }
+ }
+ }()
+
+ runtime.GC()
+ // Trigger GC from malloc.
+ n := int(1e3)
+ if isMemoryConstrained() {
+ // Reduce allocation to avoid running out of
+ // memory on the builder.
+ n = 512
+ }
+ for i := 0; i < n; i++ {
+ _ = make([]byte, 1<<20)
+ }
+
+ // Create a bunch of busy goroutines to load all Ps.
+ for p := 0; p < 10; p++ {
+ wg.Add(1)
+ go func() {
+ // Do something useful.
+ tmp := make([]byte, 1<<16)
+ for i := range tmp {
+ tmp[i]++
+ }
+ _ = tmp
+ <-done
+ wg.Done()
+ }()
+ }
+
+ // Block in syscall.
+ wg.Add(1)
+ go func() {
+ var tmp [1]byte
+ rp.Read(tmp[:])
+ <-done
+ wg.Done()
+ }()
+
+ runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))
+
+ // Test timers.
+ timerDone := make(chan bool)
+ go func() {
+ time.Sleep(time.Millisecond)
+ timerDone <- true
+ }()
+ <-timerDone
+
+ // A bit of network.
+ ln, err := net.Listen("tcp", "127.0.0.1:0")
+ if err != nil {
+ t.Errorf("listen failed: %v", err)
+ return
+ }
+ defer ln.Close()
+ go func() {
+ c, err := ln.Accept()
+ if err != nil {
+ return
+ }
+ time.Sleep(time.Millisecond)
+ var buf [1]byte
+ c.Write(buf[:])
+ c.Close()
+ }()
+ c, err := net.Dial("tcp", ln.Addr().String())
+ if err != nil {
+ t.Errorf("dial failed: %v", err)
+ return
+ }
+ var tmp [1]byte
+ c.Read(tmp[:])
+ c.Close()
+
+ go func() {
+ runtime.Gosched()
+ select {}
+ }()
+
+ // Unblock helper goroutines and wait for them to finish.
+ wp.Write(tmp[:])
+ wp.Write(tmp[:])
+ close(done)
+ wg.Wait()
+ }()
+
+ for i := 0; i < 3; i++ {
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+ time.Sleep(time.Millisecond)
+ Stop()
+ saveTrace(t, buf, "TestTraceStressStartStop")
+ trace := buf.Bytes()
+ parseTrace(t, buf)
+ testBrokenTimestamps(t, trace)
+ }
+ <-outerDone
+}
+
+func TestTraceFutileWakeup(t *testing.T) {
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(8))
+ c0 := make(chan int, 1)
+ c1 := make(chan int, 1)
+ c2 := make(chan int, 1)
+ const procs = 2
+ var done sync.WaitGroup
+ done.Add(4 * procs)
+ for p := 0; p < procs; p++ {
+ const iters = 1e3
+ go func() {
+ for i := 0; i < iters; i++ {
+ runtime.Gosched()
+ c0 <- 0
+ }
+ done.Done()
+ }()
+ go func() {
+ for i := 0; i < iters; i++ {
+ runtime.Gosched()
+ <-c0
+ }
+ done.Done()
+ }()
+ go func() {
+ for i := 0; i < iters; i++ {
+ runtime.Gosched()
+ select {
+ case c1 <- 0:
+ case c2 <- 0:
+ }
+ }
+ done.Done()
+ }()
+ go func() {
+ for i := 0; i < iters; i++ {
+ runtime.Gosched()
+ select {
+ case <-c1:
+ case <-c2:
+ }
+ }
+ done.Done()
+ }()
+ }
+ done.Wait()
+
+ Stop()
+ saveTrace(t, buf, "TestTraceFutileWakeup")
+ events, _ := parseTrace(t, buf)
+ // Check that (1) trace does not contain EvFutileWakeup events and
+ // (2) there are no consecutive EvGoBlock/EvGoStart/EvGoBlock events
+ // (we call runtime.Gosched between all operations, so these would be futile wakeups).
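+ // gs tracks a small per-goroutine state machine derived from the events:
+ // 1 after a channel-block event, 2 after the EvGoStart that follows it,
+ // and the entry is dropped once any other event shows real work was done.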
+ gs := make(map[uint64]int)
+ for _, ev := range events {
+ switch ev.Type {
+ case trace.EvFutileWakeup:
+ t.Fatalf("found EvFutileWakeup event")
+ case trace.EvGoBlockSend, trace.EvGoBlockRecv, trace.EvGoBlockSelect:
+ if gs[ev.G] == 2 {
+ t.Fatalf("goroutine %v blocked on %v at %v right after start",
+ ev.G, trace.EventDescriptions[ev.Type].Name, ev.Ts)
+ }
+ if gs[ev.G] == 1 {
+ t.Fatalf("goroutine %v blocked on %v at %v while blocked",
+ ev.G, trace.EventDescriptions[ev.Type].Name, ev.Ts)
+ }
+ gs[ev.G] = 1
+ case trace.EvGoStart:
+ if gs[ev.G] == 1 {
+ gs[ev.G] = 2
+ }
+ default:
+ delete(gs, ev.G)
+ }
+ }
+}
+
+func saveTrace(t *testing.T, buf *bytes.Buffer, name string) {
+ if !*saveTraces {
+ return
+ }
+ if err := os.WriteFile(name+".trace", buf.Bytes(), 0600); err != nil {
+ t.Errorf("failed to write trace file: %s", err)
+ }
+}
diff --git a/src/runtime/traceback.go b/src/runtime/traceback.go
new file mode 100644
index 0000000..2601cd6
--- /dev/null
+++ b/src/runtime/traceback.go
@@ -0,0 +1,1346 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/bytealg"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// The code in this file implements stack trace walking for all architectures.
+// The most important fact about a given architecture is whether it uses a link register.
+// On systems with link registers, the prologue for a non-leaf function stores the
+// incoming value of LR at the bottom of the newly allocated stack frame.
+// On systems without link registers, the architecture pushes a return PC during
+// the call instruction, so the return PC ends up above the stack frame.
+// In this file, the return PC is always called LR, no matter how it was found.
+//
+// To date, the opposite of a link register architecture is an x86 architecture.
+// This code may need to change if some other kind of non-link-register
+// architecture comes along.
+//
+// The other important fact is the size of a pointer: on 32-bit systems the LR
+// takes up only 4 bytes on the stack, while on 64-bit systems it takes up 8 bytes.
+// Typically this is ptrSize.
+//
+// As an exception, amd64p32 had ptrSize == 4 but the CALL instruction still
+// stored an 8-byte return PC onto the stack. To accommodate this, we used regSize
+// as the size of the architecture-pushed return PC.
+//
+// usesLR is defined below in terms of minFrameSize, which is defined in
+// arch_$GOARCH.go. ptrSize and regSize are defined in stubs.go.
+
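+// usesLR reports whether this architecture saves the return address in a
+// link register. It is derived from MinFrameSize: on LR machines every
+// frame reserves space at its bottom for the saved LR, so the minimum
+// frame size is nonzero (see the discussion above).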
+const usesLR = sys.MinFrameSize > 0
+
+// Traceback over the deferred function calls.
+// Report them like calls that have been invoked but not started executing yet.
+func tracebackdefers(gp *g, callback func(*stkframe, unsafe.Pointer) bool, v unsafe.Pointer) {
+ var frame stkframe
+ for d := gp._defer; d != nil; d = d.link {
+ fn := d.fn
+ if fn == nil {
+ // Defer of nil function. Args don't matter.
+ frame.pc = 0
+ frame.fn = funcInfo{}
+ frame.argp = 0
+ frame.arglen = 0
+ frame.argmap = nil
+ } else {
+ frame.pc = fn.fn
+ f := findfunc(frame.pc)
+ if !f.valid() {
+ print("runtime: unknown pc in defer ", hex(frame.pc), "\n")
+ throw("unknown pc")
+ }
+ frame.fn = f
+ frame.argp = uintptr(deferArgs(d))
+ var ok bool
+ frame.arglen, frame.argmap, ok = getArgInfoFast(f, true)
+ if !ok {
+ frame.arglen, frame.argmap = getArgInfo(&frame, f, true, fn)
+ }
+ }
+ frame.continpc = frame.pc
+ if !callback((*stkframe)(noescape(unsafe.Pointer(&frame))), v) {
+ return
+ }
+ }
+}
+
+const sizeofSkipFunction = 256
+
+// Generic traceback. Handles runtime stack prints (pcbuf == nil),
+// the runtime.Callers function (pcbuf != nil), as well as the garbage
+// collector (callback != nil). A little clunky to merge these, but avoids
+// duplicating the code and all its subtlety.
+//
+// The skip argument is only valid with pcbuf != nil and counts the number
+// of logical frames to skip rather than physical frames (with inlining, a
+// PC in pcbuf can represent multiple calls). If a PC is partially skipped
+// and max > 1, pcbuf[1] will be runtime.skipPleaseUseCallersFrames+N where
+// N indicates the number of logical frames to skip in pcbuf[0].
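+//
+// Callers may pass pc0 == sp0 == ^uintptr(0) as a signal to start from the
+// goroutine's saved scheduling (or syscall) state instead; see the check
+// near the top of the function and gcallers below.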
+func gentraceback(pc0, sp0, lr0 uintptr, gp *g, skip int, pcbuf *uintptr, max int, callback func(*stkframe, unsafe.Pointer) bool, v unsafe.Pointer, flags uint) int {
+ if skip > 0 && callback != nil {
+ throw("gentraceback callback cannot be used with non-zero skip")
+ }
+
+ // Don't call this "g"; it's too easy to get "g" and "gp" confused.
+ if ourg := getg(); ourg == gp && ourg == ourg.m.curg {
+ // The starting sp has been passed in as a uintptr, and the caller may
+ // have other uintptr-typed stack references as well.
+ // If during one of the calls that got us here or during one of the
+ // callbacks below the stack must be grown, all these uintptr references
+ // to the stack will not be updated, and gentraceback will continue
+ // to inspect the old stack memory, which may no longer be valid.
+ // Even if all the variables were updated correctly, it is not clear that
+ // we want to expose a traceback that begins on one stack and ends
+ // on another stack. That could confuse callers quite a bit.
+ // Instead, we require that gentraceback and any other function that
+ // accepts an sp for the current goroutine (typically obtained by
+ // calling getcallersp) must not run on that goroutine's stack but
+ // instead on the g0 stack.
+ throw("gentraceback cannot trace user goroutine on its own stack")
+ }
+ level, _, _ := gotraceback()
+
+ var ctxt *funcval // Context pointer for unstarted goroutines. See issue #25897.
+
+ if pc0 == ^uintptr(0) && sp0 == ^uintptr(0) { // Signal to fetch saved values from gp.
+ if gp.syscallsp != 0 {
+ pc0 = gp.syscallpc
+ sp0 = gp.syscallsp
+ if usesLR {
+ lr0 = 0
+ }
+ } else {
+ pc0 = gp.sched.pc
+ sp0 = gp.sched.sp
+ if usesLR {
+ lr0 = gp.sched.lr
+ }
+ ctxt = (*funcval)(gp.sched.ctxt)
+ }
+ }
+
+ nprint := 0
+ var frame stkframe
+ frame.pc = pc0
+ frame.sp = sp0
+ if usesLR {
+ frame.lr = lr0
+ }
+ waspanic := false
+ cgoCtxt := gp.cgoCtxt
+ printing := pcbuf == nil && callback == nil
+
+ // If the PC is zero, it's likely a nil function call.
+ // Start in the caller's frame.
+ if frame.pc == 0 {
+ if usesLR {
+ frame.pc = *(*uintptr)(unsafe.Pointer(frame.sp))
+ frame.lr = 0
+ } else {
+ frame.pc = uintptr(*(*sys.Uintreg)(unsafe.Pointer(frame.sp)))
+ frame.sp += sys.RegSize
+ }
+ }
+
+ f := findfunc(frame.pc)
+ if !f.valid() {
+ if callback != nil || printing {
+ print("runtime: unknown pc ", hex(frame.pc), "\n")
+ tracebackHexdump(gp.stack, &frame, 0)
+ }
+ if callback != nil {
+ throw("unknown pc")
+ }
+ return 0
+ }
+ frame.fn = f
+
+ var cache pcvalueCache
+
+ lastFuncID := funcID_normal
+ n := 0
+ for n < max {
+ // Typically:
+ // pc is the PC of the running function.
+ // sp is the stack pointer at that program counter.
+ // fp is the frame pointer (caller's stack pointer) at that program counter, or nil if unknown.
+ // stk is the stack containing sp.
+ // The caller's program counter is lr, unless lr is zero, in which case it is *(uintptr*)sp.
+ f = frame.fn
+ if f.pcsp == 0 {
+ // No frame information, must be external function, like race support.
+ // See golang.org/issue/13568.
+ break
+ }
+
+ // Found an actual function.
+ // Derive frame pointer and link register.
+ if frame.fp == 0 {
+ // Jump over system stack transitions. If we're on g0 and there's a user
+ // goroutine, try to jump. Otherwise this is a regular call.
+ if flags&_TraceJumpStack != 0 && gp == gp.m.g0 && gp.m.curg != nil {
+ switch f.funcID {
+ case funcID_morestack:
+ // morestack does not return normally -- newstack()
+ // gogo's to curg.sched. Match that.
+ // This keeps morestack() from showing up in the backtrace,
+ // but that makes some sense since it'll never be returned
+ // to.
+ frame.pc = gp.m.curg.sched.pc
+ frame.fn = findfunc(frame.pc)
+ f = frame.fn
+ frame.sp = gp.m.curg.sched.sp
+ cgoCtxt = gp.m.curg.cgoCtxt
+ case funcID_systemstack:
+ // systemstack returns normally, so just follow the
+ // stack transition.
+ frame.sp = gp.m.curg.sched.sp
+ cgoCtxt = gp.m.curg.cgoCtxt
+ }
+ }
+ frame.fp = frame.sp + uintptr(funcspdelta(f, frame.pc, &cache))
+ if !usesLR {
+ // On x86, call instruction pushes return PC before entering new function.
+ frame.fp += sys.RegSize
+ }
+ }
+ var flr funcInfo
+ if topofstack(f, gp.m != nil && gp == gp.m.g0) {
+ frame.lr = 0
+ flr = funcInfo{}
+ } else if usesLR && f.funcID == funcID_jmpdefer {
+ // jmpdefer modifies SP/LR/PC non-atomically.
+ // If a profiling interrupt arrives during jmpdefer,
+ // the stack unwind may see a mismatched register set
+ // and get confused. Stop if we see PC within jmpdefer
+ // to avoid that confusion.
+ // See golang.org/issue/8153.
+ if callback != nil {
+ throw("traceback_arm: found jmpdefer when tracing with callback")
+ }
+ frame.lr = 0
+ } else {
+ var lrPtr uintptr
+ if usesLR {
+ if n == 0 && frame.sp < frame.fp || frame.lr == 0 {
+ lrPtr = frame.sp
+ frame.lr = *(*uintptr)(unsafe.Pointer(lrPtr))
+ }
+ } else {
+ if frame.lr == 0 {
+ lrPtr = frame.fp - sys.RegSize
+ frame.lr = uintptr(*(*sys.Uintreg)(unsafe.Pointer(lrPtr)))
+ }
+ }
+ flr = findfunc(frame.lr)
+ if !flr.valid() {
+ // This happens if you get a profiling interrupt at just the wrong time.
+ // In that context it is okay to stop early.
+ // But if callback is set, we're doing a garbage collection and must
+ // get everything, so crash loudly.
+ doPrint := printing
+ if doPrint && gp.m.incgo && f.funcID == funcID_sigpanic {
+ // We can inject sigpanic
+ // calls directly into C code,
+ // in which case we'll see a C
+ // return PC. Don't complain.
+ doPrint = false
+ }
+ if callback != nil || doPrint {
+ print("runtime: unexpected return pc for ", funcname(f), " called from ", hex(frame.lr), "\n")
+ tracebackHexdump(gp.stack, &frame, lrPtr)
+ }
+ if callback != nil {
+ throw("unknown caller pc")
+ }
+ }
+ }
+
+ frame.varp = frame.fp
+ if !usesLR {
+ // On x86, call instruction pushes return PC before entering new function.
+ frame.varp -= sys.RegSize
+ }
+
+ // For architectures with frame pointers, if there's
+ // a frame, then there's a saved frame pointer here.
+ if frame.varp > frame.sp && (GOARCH == "amd64" || GOARCH == "arm64") {
+ frame.varp -= sys.RegSize
+ }
+
+ // Derive size of arguments.
+ // Most functions have a fixed-size argument block,
+ // so we can use metadata about the function f.
+ // Not all, though: there are some variadic functions
+ // in package runtime and reflect, and for those we use call-specific
+ // metadata recorded by f's caller.
+ if callback != nil || printing {
+ frame.argp = frame.fp + sys.MinFrameSize
+ var ok bool
+ frame.arglen, frame.argmap, ok = getArgInfoFast(f, callback != nil)
+ if !ok {
+ frame.arglen, frame.argmap = getArgInfo(&frame, f, callback != nil, ctxt)
+ }
+ }
+ ctxt = nil // ctxt is only needed to get arg maps for the topmost frame
+
+ // Determine frame's 'continuation PC', where it can continue.
+ // Normally this is the return address on the stack, but if sigpanic
+ // is immediately below this function on the stack, then the frame
+ // stopped executing due to a trap, and frame.pc is probably not
+ // a safe point for looking up liveness information. In this panicking case,
+ // the function either doesn't return at all (if it has no defers or if the
+ // defers do not recover) or it returns from one of the calls to
+ // deferproc a second time (if the corresponding deferred func recovers).
+ // In the latter case, use a deferreturn call site as the continuation pc.
+ frame.continpc = frame.pc
+ if waspanic {
+ if frame.fn.deferreturn != 0 {
+ frame.continpc = frame.fn.entry + uintptr(frame.fn.deferreturn) + 1
+ // Note: this may perhaps keep return variables alive longer than
+ // strictly necessary, as we are using "function has a defer statement"
+ // as a proxy for "function actually deferred something". It seems
+ // to be a minor drawback. (We used to actually look through the
+ // gp._defer for a defer corresponding to this function, but that
+ // is hard to do with defer records on the stack during a stack copy.)
+ // Note: the +1 is to offset the -1 that
+ // stack.go:getStackMap does to back up a return
+ // address to make sure the pc is in the CALL instruction.
+ } else {
+ frame.continpc = 0
+ }
+ }
+
+ if callback != nil {
+ if !callback((*stkframe)(noescape(unsafe.Pointer(&frame))), v) {
+ return n
+ }
+ }
+
+ if pcbuf != nil {
+ pc := frame.pc
+ // back up to CALL instruction to read inlining info (same logic as below)
+ tracepc := pc
+ // Normally, pc is a return address. In that case, we want to look up
+ // file/line information using pc-1, because that is the pc of the
+ // call instruction (more precisely, the last byte of the call instruction).
+ // Callers expect the pc buffer to contain return addresses and do the
+ // same -1 themselves, so we keep pc unchanged.
+ // When the pc is from a signal (e.g. profiler or segv) then we want
+ // to look up file/line information using pc, and we store pc+1 in the
+ // pc buffer so callers can unconditionally subtract 1 before looking up.
+ // See issue 34123.
+ // The pc can be at function entry when the frame is initialized without
+ // actually running code, like runtime.mstart.
+ if (n == 0 && flags&_TraceTrap != 0) || waspanic || pc == f.entry {
+ pc++
+ } else {
+ tracepc--
+ }
+
+ // If there is inlining info, record the inner frames.
+ if inldata := funcdata(f, _FUNCDATA_InlTree); inldata != nil {
+ inltree := (*[1 << 20]inlinedCall)(inldata)
+ for {
+ ix := pcdatavalue(f, _PCDATA_InlTreeIndex, tracepc, &cache)
+ if ix < 0 {
+ break
+ }
+ if inltree[ix].funcID == funcID_wrapper && elideWrapperCalling(lastFuncID) {
+ // ignore wrappers
+ } else if skip > 0 {
+ skip--
+ } else if n < max {
+ (*[1 << 20]uintptr)(unsafe.Pointer(pcbuf))[n] = pc
+ n++
+ }
+ lastFuncID = inltree[ix].funcID
+ // Back up to an instruction in the "caller".
+ tracepc = frame.fn.entry + uintptr(inltree[ix].parentPc)
+ pc = tracepc + 1
+ }
+ }
+ // Record the main frame.
+ if f.funcID == funcID_wrapper && elideWrapperCalling(lastFuncID) {
+ // Ignore wrapper functions (except when they trigger panics).
+ } else if skip > 0 {
+ skip--
+ } else if n < max {
+ (*[1 << 20]uintptr)(unsafe.Pointer(pcbuf))[n] = pc
+ n++
+ }
+ lastFuncID = f.funcID
+ n-- // offset n++ below
+ }
+
+ if printing {
+ // assume skip=0 for printing.
+ //
+ // Never elide wrappers if we haven't printed
+ // any frames. And don't elide wrappers that
+ // called panic rather than the wrapped
+ // function. Otherwise, leave them out.
+
+ // back up to CALL instruction to read inlining info (same logic as below)
+ tracepc := frame.pc
+ if (n > 0 || flags&_TraceTrap == 0) && frame.pc > f.entry && !waspanic {
+ tracepc--
+ }
+ // If there is inlining info, print the inner frames.
+ if inldata := funcdata(f, _FUNCDATA_InlTree); inldata != nil {
+ inltree := (*[1 << 20]inlinedCall)(inldata)
+ var inlFunc _func
+ inlFuncInfo := funcInfo{&inlFunc, f.datap}
+ for {
+ ix := pcdatavalue(f, _PCDATA_InlTreeIndex, tracepc, nil)
+ if ix < 0 {
+ break
+ }
+
+ // Create a fake _func for the
+ // inlined function.
+ inlFunc.nameoff = inltree[ix].func_
+ inlFunc.funcID = inltree[ix].funcID
+
+ if (flags&_TraceRuntimeFrames) != 0 || showframe(inlFuncInfo, gp, nprint == 0, inlFuncInfo.funcID, lastFuncID) {
+ name := funcname(inlFuncInfo)
+ file, line := funcline(f, tracepc)
+ print(name, "(...)\n")
+ print("\t", file, ":", line, "\n")
+ nprint++
+ }
+ lastFuncID = inltree[ix].funcID
+ // Back up to an instruction in the "caller".
+ tracepc = frame.fn.entry + uintptr(inltree[ix].parentPc)
+ }
+ }
+ if (flags&_TraceRuntimeFrames) != 0 || showframe(f, gp, nprint == 0, f.funcID, lastFuncID) {
+ // Print during crash.
+ // main(0x1, 0x2, 0x3)
+ // /home/rsc/go/src/runtime/x.go:23 +0xf
+ //
+ name := funcname(f)
+ file, line := funcline(f, tracepc)
+ if name == "runtime.gopanic" {
+ name = "panic"
+ }
+ print(name, "(")
+ argp := (*[100]uintptr)(unsafe.Pointer(frame.argp))
+ for i := uintptr(0); i < frame.arglen/sys.PtrSize; i++ {
+ if i >= 10 {
+ print(", ...")
+ break
+ }
+ if i != 0 {
+ print(", ")
+ }
+ print(hex(argp[i]))
+ }
+ print(")\n")
+ print("\t", file, ":", line)
+ if frame.pc > f.entry {
+ print(" +", hex(frame.pc-f.entry))
+ }
+ if gp.m != nil && gp.m.throwing > 0 && gp == gp.m.curg || level >= 2 {
+ print(" fp=", hex(frame.fp), " sp=", hex(frame.sp), " pc=", hex(frame.pc))
+ }
+ print("\n")
+ nprint++
+ }
+ lastFuncID = f.funcID
+ }
+ n++
+
+ if f.funcID == funcID_cgocallback && len(cgoCtxt) > 0 {
+ ctxt := cgoCtxt[len(cgoCtxt)-1]
+ cgoCtxt = cgoCtxt[:len(cgoCtxt)-1]
+
+ // skip only applies to Go frames.
+ // callback != nil is only used when we only care
+ // about Go frames.
+ if skip == 0 && callback == nil {
+ n = tracebackCgoContext(pcbuf, printing, ctxt, n, max)
+ }
+ }
+
+ waspanic = f.funcID == funcID_sigpanic
+ injectedCall := waspanic || f.funcID == funcID_asyncPreempt
+
+ // Do not unwind past the bottom of the stack.
+ if !flr.valid() {
+ break
+ }
+
+ // Unwind to next frame.
+ frame.fn = flr
+ frame.pc = frame.lr
+ frame.lr = 0
+ frame.sp = frame.fp
+ frame.fp = 0
+ frame.argmap = nil
+
+ // On link register architectures, sighandler saves the LR on stack
+ // before faking a call.
+ if usesLR && injectedCall {
+ x := *(*uintptr)(unsafe.Pointer(frame.sp))
+ frame.sp += sys.MinFrameSize
+ if GOARCH == "arm64" {
+ // arm64 needs 16-byte aligned SP, always
+ frame.sp += sys.PtrSize
+ }
+ f = findfunc(frame.pc)
+ frame.fn = f
+ if !f.valid() {
+ frame.pc = x
+ } else if funcspdelta(f, frame.pc, &cache) == 0 {
+ frame.lr = x
+ }
+ }
+ }
+
+ if printing {
+ n = nprint
+ }
+
+ // Note that panic != nil is okay here: there can be leftover panics,
+ // because the defers on the panic stack do not nest in frame order as
+ // they do on the defer stack. If you have:
+ //
+ // frame 1 defers d1
+ // frame 2 defers d2
+ // frame 3 defers d3
+ // frame 4 panics
+ // frame 4's panic starts running defers
+ // frame 5, running d3, defers d4
+ // frame 5 panics
+ // frame 5's panic starts running defers
+ // frame 6, running d4, garbage collects
+ // frame 6, running d2, garbage collects
+ //
+ // During the execution of d4, the panic stack is d4 -> d3, which
+ // is nested properly, and we'll treat frame 3 as resumable, because we
+ // can find d3. (And in fact frame 3 is resumable. If d4 recovers
+ // and frame 5 continues running, d3, d3 can recover and we'll
+ // resume execution in (returning from) frame 3.)
+ //
+ // During the execution of d2, however, the panic stack is d2 -> d3,
+ // which is inverted. The scan will match d2 to frame 2 but having
+ // d2 on the stack until then means it will not match d3 to frame 3.
+ // This is okay: if we're running d2, then all the defers after d2 have
+ // completed and their corresponding frames are dead. Not finding d3
+ // for frame 3 means we'll set frame 3's continpc == 0, which is correct
+ // (frame 3 is dead). At the end of the walk the panic stack can thus
+ // contain defers (d3 in this case) for dead frames. The inversion here
+ // always indicates a dead frame, and the effect of the inversion on the
+ // scan is to hide those dead frames, so the scan is still okay:
+ // what's left on the panic stack are exactly (and only) the dead frames.
+ //
+ // We require callback != nil here because only when callback != nil
+ // do we know that gentraceback is being called in a "must be correct"
+ // context as opposed to a "best effort" context. The tracebacks with
+ // callbacks only happen when everything is stopped nicely.
+ // At other times, such as when gathering a stack for a profiling signal
+ // or when printing a traceback during a crash, everything may not be
+ // stopped nicely, and the stack walk may not be able to complete.
+ if callback != nil && n < max && frame.sp != gp.stktopsp {
+ print("runtime: g", gp.goid, ": frame.sp=", hex(frame.sp), " top=", hex(gp.stktopsp), "\n")
+ print("\tstack=[", hex(gp.stack.lo), "-", hex(gp.stack.hi), "] n=", n, " max=", max, "\n")
+ throw("traceback did not unwind completely")
+ }
+
+ return n
+}
+
+// reflectMethodValue is a partial duplicate of reflect.makeFuncImpl
+// and reflect.methodValue.
+type reflectMethodValue struct {
+ fn uintptr
+ stack *bitvector // ptrmap for both args and results
+ argLen uintptr // just args
+}
+
+// getArgInfoFast returns the argument frame information for a call to f.
+// It is short and inlineable. However, it does not handle all functions.
+// If ok reports false, you must call getArgInfo instead.
+// TODO(josharian): once we do mid-stack inlining,
+// call getArgInfo directly from getArgInfoFast and stop returning an ok bool.
+func getArgInfoFast(f funcInfo, needArgMap bool) (arglen uintptr, argmap *bitvector, ok bool) {
+ return uintptr(f.args), nil, !(needArgMap && f.args == _ArgsSizeUnknown)
+}
+
+// getArgInfo returns the argument frame information for a call to f
+// with call frame frame.
+//
+// This is used for both actual calls with active stack frames and for
+// deferred calls or goroutines that are not yet executing. If this is an actual
+// call, ctxt must be nil (getArgInfo will retrieve what it needs from
+// the active stack frame). If this is a deferred call or unstarted goroutine,
+// ctxt must be the function object that was deferred or go'd.
+func getArgInfo(frame *stkframe, f funcInfo, needArgMap bool, ctxt *funcval) (arglen uintptr, argmap *bitvector) {
+ arglen = uintptr(f.args)
+ if needArgMap && f.args == _ArgsSizeUnknown {
+ // Extract argument bitmaps for reflect stubs from the calls they made to reflect.
+ switch funcname(f) {
+ case "reflect.makeFuncStub", "reflect.methodValueCall":
+ // These take a *reflect.methodValue as their
+ // context register.
+ var mv *reflectMethodValue
+ var retValid bool
+ if ctxt != nil {
+ // This is not an actual call, but a
+ // deferred call or an unstarted goroutine.
+ // The function value is itself the *reflect.methodValue.
+ mv = (*reflectMethodValue)(unsafe.Pointer(ctxt))
+ } else {
+ // This is a real call that took the
+ // *reflect.methodValue as its context
+ // register and immediately saved it
+ // to 0(SP). Get the methodValue from
+ // 0(SP).
+ arg0 := frame.sp + sys.MinFrameSize
+ mv = *(**reflectMethodValue)(unsafe.Pointer(arg0))
+ // Figure out whether the return values are valid.
+ // Reflect will update this value after it copies
+ // in the return values.
+ retValid = *(*bool)(unsafe.Pointer(arg0 + 3*sys.PtrSize))
+ }
+ if mv.fn != f.entry {
+ print("runtime: confused by ", funcname(f), "\n")
+ throw("reflect mismatch")
+ }
+ bv := mv.stack
+ arglen = uintptr(bv.n * sys.PtrSize)
+ if !retValid {
+ arglen = uintptr(mv.argLen) &^ (sys.PtrSize - 1)
+ }
+ argmap = bv
+ }
+ }
+ return
+}
+
+// tracebackCgoContext handles tracing back a cgo context value, from
+// the context argument to setCgoTraceback, for the gentraceback
+// function. It returns the new value of n.
+func tracebackCgoContext(pcbuf *uintptr, printing bool, ctxt uintptr, n, max int) int {
+ var cgoPCs [32]uintptr
+ cgoContextPCs(ctxt, cgoPCs[:])
+ var arg cgoSymbolizerArg
+ anySymbolized := false
+ for _, pc := range cgoPCs {
+ if pc == 0 || n >= max {
+ break
+ }
+ if pcbuf != nil {
+ (*[1 << 20]uintptr)(unsafe.Pointer(pcbuf))[n] = pc
+ }
+ if printing {
+ if cgoSymbolizer == nil {
+ print("non-Go function at pc=", hex(pc), "\n")
+ } else {
+ c := printOneCgoTraceback(pc, max-n, &arg)
+ n += c - 1 // +1 a few lines down
+ anySymbolized = true
+ }
+ }
+ n++
+ }
+ if anySymbolized {
+ arg.pc = 0
+ callCgoSymbolizer(&arg)
+ }
+ return n
+}
+
+func printcreatedby(gp *g) {
+ // Show what created the goroutine, except for the main goroutine (goid 1).
+ pc := gp.gopc
+ f := findfunc(pc)
+ if f.valid() && showframe(f, gp, false, funcID_normal, funcID_normal) && gp.goid != 1 {
+ printcreatedby1(f, pc)
+ }
+}
+
+func printcreatedby1(f funcInfo, pc uintptr) {
+ print("created by ", funcname(f), "\n")
+ tracepc := pc // back up to CALL instruction for funcline.
+ if pc > f.entry {
+ tracepc -= sys.PCQuantum
+ }
+ file, line := funcline(f, tracepc)
+ print("\t", file, ":", line)
+ if pc > f.entry {
+ print(" +", hex(pc-f.entry))
+ }
+ print("\n")
+}
+
+func traceback(pc, sp, lr uintptr, gp *g) {
+ traceback1(pc, sp, lr, gp, 0)
+}
+
+// tracebacktrap is like traceback but expects that the PC and SP were obtained
+// from a trap, not from gp->sched or gp->syscallpc/gp->syscallsp or getcallerpc/getcallersp.
+// Because they are from a trap instead of from a saved pair,
+// the initial PC must not be rewound to the previous instruction.
+// (All the saved pairs record a PC that is a return address, so we
+// rewind it into the CALL instruction.)
+// If gp.m.libcall{g,pc,sp} information is available, it uses that information in preference to
+// the pc/sp/lr passed in.
+func tracebacktrap(pc, sp, lr uintptr, gp *g) {
+ if gp.m.libcallsp != 0 {
+ // We're in C code somewhere, traceback from the saved position.
+ traceback1(gp.m.libcallpc, gp.m.libcallsp, 0, gp.m.libcallg.ptr(), 0)
+ return
+ }
+ traceback1(pc, sp, lr, gp, _TraceTrap)
+}
+
+func traceback1(pc, sp, lr uintptr, gp *g, flags uint) {
+ // If the goroutine is in cgo, and we have a cgo traceback, print that.
+ if iscgo && gp.m != nil && gp.m.ncgo > 0 && gp.syscallsp != 0 && gp.m.cgoCallers != nil && gp.m.cgoCallers[0] != 0 {
+ // Lock cgoCallers so that a signal handler won't
+ // change it, copy the array, reset it, unlock it.
+ // We are locked to the thread and are not running
+ // concurrently with a signal handler.
+ // We just have to stop a signal handler from interrupting
+ // in the middle of our copy.
+ atomic.Store(&gp.m.cgoCallersUse, 1)
+ cgoCallers := *gp.m.cgoCallers
+ gp.m.cgoCallers[0] = 0
+ atomic.Store(&gp.m.cgoCallersUse, 0)
+
+ printCgoTraceback(&cgoCallers)
+ }
+
+ var n int
+ if readgstatus(gp)&^_Gscan == _Gsyscall {
+ // Override registers if blocked in system call.
+ pc = gp.syscallpc
+ sp = gp.syscallsp
+ flags &^= _TraceTrap
+ }
+ // Print traceback. By default, omits runtime frames.
+ // If that means we print nothing at all, repeat forcing all frames printed.
+ n = gentraceback(pc, sp, lr, gp, 0, nil, _TracebackMaxFrames, nil, nil, flags)
+ if n == 0 && (flags&_TraceRuntimeFrames) == 0 {
+ n = gentraceback(pc, sp, lr, gp, 0, nil, _TracebackMaxFrames, nil, nil, flags|_TraceRuntimeFrames)
+ }
+ if n == _TracebackMaxFrames {
+ print("...additional frames elided...\n")
+ }
+ printcreatedby(gp)
+
+ if gp.ancestors == nil {
+ return
+ }
+ for _, ancestor := range *gp.ancestors {
+ printAncestorTraceback(ancestor)
+ }
+}
+
+// printAncestorTraceback prints the traceback of the given ancestor.
+// TODO: Unify this with gentraceback and CallersFrames.
+func printAncestorTraceback(ancestor ancestorInfo) {
+ print("[originating from goroutine ", ancestor.goid, "]:\n")
+ for fidx, pc := range ancestor.pcs {
+ f := findfunc(pc) // f previously validated
+ if showfuncinfo(f, fidx == 0, funcID_normal, funcID_normal) {
+ printAncestorTracebackFuncInfo(f, pc)
+ }
+ }
+ if len(ancestor.pcs) == _TracebackMaxFrames {
+ print("...additional frames elided...\n")
+ }
+ // Show what created the goroutine, except for the main goroutine (goid 1).
+ f := findfunc(ancestor.gopc)
+ if f.valid() && showfuncinfo(f, false, funcID_normal, funcID_normal) && ancestor.goid != 1 {
+ printcreatedby1(f, ancestor.gopc)
+ }
+}
+
+// printAncestorTracebackFuncInfo prints the given function info at a given pc
+// within an ancestor traceback. The precision of this info is reduced
+// because we only have access to the pcs captured at the time the caller
+// goroutine was created.
+func printAncestorTracebackFuncInfo(f funcInfo, pc uintptr) {
+ name := funcname(f)
+ if inldata := funcdata(f, _FUNCDATA_InlTree); inldata != nil {
+ inltree := (*[1 << 20]inlinedCall)(inldata)
+ ix := pcdatavalue(f, _PCDATA_InlTreeIndex, pc, nil)
+ if ix >= 0 {
+ name = funcnameFromNameoff(f, inltree[ix].func_)
+ }
+ }
+ file, line := funcline(f, pc)
+ if name == "runtime.gopanic" {
+ name = "panic"
+ }
+ print(name, "(...)\n")
+ print("\t", file, ":", line)
+ if pc > f.entry {
+ print(" +", hex(pc-f.entry))
+ }
+ print("\n")
+}
+
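+// callers fills pcbuf with the return PCs on the calling goroutine's stack,
+// skipping skip logical frames, and reports how many entries were stored.
+// It runs on the system stack so that growth of the current stack cannot
+// invalidate the sp it captured.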
+func callers(skip int, pcbuf []uintptr) int {
+ sp := getcallersp()
+ pc := getcallerpc()
+ gp := getg()
+ var n int
+ systemstack(func() {
+ n = gentraceback(pc, sp, 0, gp, skip, &pcbuf[0], len(pcbuf), nil, nil, 0)
+ })
+ return n
+}
+
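+// gcallers is like callers but walks the stack of gp, starting from its
+// saved scheduling state, instead of the current goroutine's stack.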
+func gcallers(gp *g, skip int, pcbuf []uintptr) int {
+ return gentraceback(^uintptr(0), ^uintptr(0), 0, gp, skip, &pcbuf[0], len(pcbuf), nil, nil, 0)
+}
+
+// showframe reports whether the frame with the given characteristics should
+// be printed during a traceback.
+func showframe(f funcInfo, gp *g, firstFrame bool, funcID, childID funcID) bool {
+ g := getg()
+ if g.m.throwing > 0 && gp != nil && (gp == g.m.curg || gp == g.m.caughtsig.ptr()) {
+ return true
+ }
+ return showfuncinfo(f, firstFrame, funcID, childID)
+}
+
+// showfuncinfo reports whether a function with the given characteristics should
+// be printed during a traceback.
+func showfuncinfo(f funcInfo, firstFrame bool, funcID, childID funcID) bool {
+ // Note that f may be a synthesized funcInfo for an inlined
+ // function, in which case only nameoff and funcID are set.
+
+ level, _, _ := gotraceback()
+ if level > 1 {
+ // Show all frames.
+ return true
+ }
+
+ if !f.valid() {
+ return false
+ }
+
+ if funcID == funcID_wrapper && elideWrapperCalling(childID) {
+ return false
+ }
+
+ name := funcname(f)
+
+ // Special case: always show runtime.gopanic frame
+ // in the middle of a stack trace, so that we can
+ // see the boundary between ordinary code and
+ // panic-induced deferred code.
+ // See golang.org/issue/5832.
+ if name == "runtime.gopanic" && !firstFrame {
+ return true
+ }
+
+ return bytealg.IndexByteString(name, '.') >= 0 && (!hasPrefix(name, "runtime.") || isExportedRuntime(name))
+}
+
+// isExportedRuntime reports whether name is an exported runtime function.
+// It is only for runtime functions, so ASCII A-Z is fine.
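+// For example, "runtime.GC" is exported while "runtime.gopanic" is not.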
+func isExportedRuntime(name string) bool {
+ const n = len("runtime.")
+ return len(name) > n && name[:n] == "runtime." && 'A' <= name[n] && name[n] <= 'Z'
+}
+
+// elideWrapperCalling reports whether a wrapper function that called
+// function id should be elided from stack traces.
+func elideWrapperCalling(id funcID) bool {
+ // If the wrapper called a panic function instead of the
+ // wrapped function, we want to include it in stacks.
+ return !(id == funcID_gopanic || id == funcID_sigpanic || id == funcID_panicwrap)
+}
+
+var gStatusStrings = [...]string{
+ _Gidle: "idle",
+ _Grunnable: "runnable",
+ _Grunning: "running",
+ _Gsyscall: "syscall",
+ _Gwaiting: "waiting",
+ _Gdead: "dead",
+ _Gcopystack: "copystack",
+ _Gpreempted: "preempted",
+}
+
+func goroutineheader(gp *g) {
+ gpstatus := readgstatus(gp)
+
+ isScan := gpstatus&_Gscan != 0
+ gpstatus &^= _Gscan // drop the scan bit
+
+ // Basic string status
+ var status string
+ if 0 <= gpstatus && gpstatus < uint32(len(gStatusStrings)) {
+ status = gStatusStrings[gpstatus]
+ } else {
+ status = "???"
+ }
+
+ // Override.
+ if gpstatus == _Gwaiting && gp.waitreason != waitReasonZero {
+ status = gp.waitreason.String()
+ }
+
+ // approx time the G is blocked, in minutes
+ var waitfor int64
+ if (gpstatus == _Gwaiting || gpstatus == _Gsyscall) && gp.waitsince != 0 {
+ waitfor = (nanotime() - gp.waitsince) / 60e9
+ }
+ print("goroutine ", gp.goid, " [", status)
+ if isScan {
+ print(" (scan)")
+ }
+ if waitfor >= 1 {
+ print(", ", waitfor, " minutes")
+ }
+ if gp.lockedm != 0 {
+ print(", locked to thread")
+ }
+ print("]:\n")
+}
+
+func tracebackothers(me *g) {
+ level, _, _ := gotraceback()
+
+ // Show the current goroutine first, if we haven't already.
+ curgp := getg().m.curg
+ if curgp != nil && curgp != me {
+ print("\n")
+ goroutineheader(curgp)
+ traceback(^uintptr(0), ^uintptr(0), 0, curgp)
+ }
+
+ // We can't take allglock here because this may be during fatal
+ // throw/panic, where locking allglock could be out-of-order or a
+ // direct deadlock.
+ //
+ // Instead, use atomic access to allgs which requires no locking. We
+ // don't lock against concurrent creation of new Gs, but even with
+ // allglock we may miss Gs created after this loop.
+ ptr, length := atomicAllG()
+ for i := uintptr(0); i < length; i++ {
+ gp := atomicAllGIndex(ptr, i)
+
+ if gp == me || gp == curgp || readgstatus(gp) == _Gdead || isSystemGoroutine(gp, false) && level < 2 {
+ continue
+ }
+ print("\n")
+ goroutineheader(gp)
+ // Note: gp.m == g.m occurs when tracebackothers is
+ // called from a signal handler initiated during a
+ // systemstack call. The original G is still in the
+ // running state, and we want to print its stack.
+ if gp.m != getg().m && readgstatus(gp)&^_Gscan == _Grunning {
+ print("\tgoroutine running on other thread; stack unavailable\n")
+ printcreatedby(gp)
+ } else {
+ traceback(^uintptr(0), ^uintptr(0), 0, gp)
+ }
+ }
+}
+
+// tracebackHexdump hexdumps part of stk around frame.sp and frame.fp
+// for debugging purposes. If the address bad is included in the
+// hexdumped range, it will mark it as well.
+func tracebackHexdump(stk stack, frame *stkframe, bad uintptr) {
+ const expand = 32 * sys.PtrSize
+ const maxExpand = 256 * sys.PtrSize
+ // Start around frame.sp.
+ lo, hi := frame.sp, frame.sp
+ // Expand to include frame.fp.
+ if frame.fp != 0 && frame.fp < lo {
+ lo = frame.fp
+ }
+ if frame.fp != 0 && frame.fp > hi {
+ hi = frame.fp
+ }
+ // Expand a bit more.
+ lo, hi = lo-expand, hi+expand
+ // But don't go too far from frame.sp.
+ if lo < frame.sp-maxExpand {
+ lo = frame.sp - maxExpand
+ }
+ if hi > frame.sp+maxExpand {
+ hi = frame.sp + maxExpand
+ }
+ // And don't go outside the stack bounds.
+ if lo < stk.lo {
+ lo = stk.lo
+ }
+ if hi > stk.hi {
+ hi = stk.hi
+ }
+
+ // Print the hex dump.
+ print("stack: frame={sp:", hex(frame.sp), ", fp:", hex(frame.fp), "} stack=[", hex(stk.lo), ",", hex(stk.hi), ")\n")
+ hexdumpWords(lo, hi, func(p uintptr) byte {
+ switch p {
+ case frame.fp:
+ return '>'
+ case frame.sp:
+ return '<'
+ case bad:
+ return '!'
+ }
+ return 0
+ })
+}
+
+// Does f mark the top of a goroutine stack?
+func topofstack(f funcInfo, g0 bool) bool {
+ return f.funcID == funcID_goexit ||
+ f.funcID == funcID_mstart ||
+ f.funcID == funcID_mcall ||
+ f.funcID == funcID_morestack ||
+ f.funcID == funcID_rt0_go ||
+ f.funcID == funcID_externalthreadhandler ||
+ // asmcgocall is TOS on the system stack because it
+ // switches to the system stack, but in this case we
+ // can come back to the regular stack and still want
+ // to be able to unwind through the call that appeared
+ // on the regular stack.
+ (g0 && f.funcID == funcID_asmcgocall)
+}
+
+// isSystemGoroutine reports whether the goroutine gp must be omitted
+// from stack dumps and the deadlock detector. This is any goroutine that
+// starts at a runtime.* entry point, except for runtime.main,
+// runtime.handleAsyncEvent (wasm only) and sometimes runtime.runfinq.
+//
+// If fixed is true, any goroutine that can vary between user and
+// system (that is, the finalizer goroutine) is considered a user
+// goroutine.
+func isSystemGoroutine(gp *g, fixed bool) bool {
+ // Keep this in sync with cmd/trace/trace.go:isSystemGoroutine.
+ f := findfunc(gp.startpc)
+ if !f.valid() {
+ return false
+ }
+ if f.funcID == funcID_runtime_main || f.funcID == funcID_handleAsyncEvent {
+ return false
+ }
+ if f.funcID == funcID_runfinq {
+ // We include the finalizer goroutine if it's calling
+ // back into user code.
+ if fixed {
+ // This goroutine can vary. In fixed mode,
+ // always consider it a user goroutine.
+ return false
+ }
+ return !fingRunning
+ }
+ return hasPrefix(funcname(f), "runtime.")
+}
+
+// SetCgoTraceback records three C functions to use to gather
+// traceback information from C code and to convert that traceback
+// information into symbolic information. These are used when printing
+// stack traces for a program that uses cgo.
+//
+// The traceback and context functions may be called from a signal
+// handler, and must therefore use only async-signal safe functions.
+// The symbolizer function may be called while the program is
+// crashing, and so must be cautious about using memory. None of the
+// functions may call back into Go.
+//
+// The context function will be called with a single argument, a
+// pointer to a struct:
+//
+// struct {
+// Context uintptr
+// }
+//
+// In C syntax, this struct will be
+//
+// struct {
+// uintptr_t Context;
+// };
+//
+// If the Context field is 0, the context function is being called to
+// record the current traceback context. It should record in the
+// Context field whatever information is needed about the current
+// point of execution to later produce a stack trace, probably the
+// stack pointer and PC. In this case the context function will be
+// called from C code.
+//
+// If the Context field is not 0, then it is a value returned by a
+// previous call to the context function. This case is called when the
+// context is no longer needed; that is, when the Go code is returning
+// to its C code caller. This permits the context function to release
+// any associated resources.
+//
+// While it would be correct for the context function to record a
+// complete stack trace whenever it is called, and simply copy that
+// out in the traceback function, in a typical program the context
+// function will be called many times without ever recording a
+// traceback for that context. Recording a complete stack trace in a
+// call to the context function is likely to be inefficient.
+//
+// The traceback function will be called with a single argument, a
+// pointer to a struct:
+//
+// struct {
+// Context uintptr
+// SigContext uintptr
+// Buf *uintptr
+// Max uintptr
+// }
+//
+// In C syntax, this struct will be
+//
+// struct {
+// uintptr_t Context;
+// uintptr_t SigContext;
+// uintptr_t* Buf;
+// uintptr_t Max;
+// };
+//
+// The Context field will be zero to gather a traceback from the
+// current program execution point. In this case, the traceback
+// function will be called from C code.
+//
+// Otherwise Context will be a value previously returned by a call to
+// the context function. The traceback function should gather a stack
+// trace from that saved point in the program execution. The traceback
+// function may be called from an execution thread other than the one
+// that recorded the context, but only when the context is known to be
+// valid and unchanging. The traceback function may also be called
+// deeper in the call stack on the same thread that recorded the
+// context. The traceback function may be called multiple times with
+// the same Context value; it will usually be appropriate to cache the
+// result, if possible, the first time this is called for a specific
+// context value.
+//
+// If the traceback function is called from a signal handler on a Unix
+// system, SigContext will be the signal context argument passed to
+// the signal handler (a C ucontext_t* cast to uintptr_t). This may be
+// used to start tracing at the point where the signal occurred. If
+// the traceback function is not called from a signal handler,
+// SigContext will be zero.
+//
+// Buf is where the traceback information should be stored. It should
+// be PC values, such that Buf[0] is the PC of the caller, Buf[1] is
+// the PC of that function's caller, and so on. Max is the maximum
+// number of entries to store. The function should store a zero to
+// indicate the top of the stack, or that the caller is on a different
+// stack, presumably a Go stack.
+//
+// Unlike runtime.Callers, the PC values returned should, when passed
+// to the symbolizer function, return the file/line of the call
+// instruction. No additional subtraction is required or appropriate.
+//
+// On all platforms, the traceback function is invoked when a call from
+// Go to C to Go requests a stack trace. On linux/amd64, linux/ppc64le,
+// and freebsd/amd64, the traceback function is also invoked when a
+// signal is received by a thread that is executing a cgo call. The
+// traceback function should not make assumptions about when it is
+// called, as future versions of Go may make additional calls.
+//
+// The symbolizer function will be called with a single argument, a
+// pointer to a struct:
+//
+// struct {
+// PC uintptr // program counter to fetch information for
+// File *byte // file name (NUL terminated)
+// Lineno uintptr // line number
+// Func *byte // function name (NUL terminated)
+// Entry uintptr // function entry point
+// More uintptr // set non-zero if more info for this PC
+// Data uintptr // unused by runtime, available for function
+// }
+//
+// In C syntax, this struct will be
+//
+// struct {
+// uintptr_t PC;
+// char* File;
+// uintptr_t Lineno;
+// char* Func;
+// uintptr_t Entry;
+// uintptr_t More;
+// uintptr_t Data;
+// };
+//
+// The PC field will be a value returned by a call to the traceback
+// function.
+//
+// The first time the function is called for a particular traceback,
+// all the fields except PC will be 0. The function should fill in the
+// other fields if possible, setting them to 0/nil if the information
+// is not available. The Data field may be used to store any useful
+// information across calls. The More field should be set to non-zero
+// if there is more information for this PC, zero otherwise. If More
+// is set non-zero, the function will be called again with the same
+// PC, and may return different information (this is intended for use
+// with inlined functions). If More is zero, the function will be
+// called with the next PC value in the traceback. When the traceback
+// is complete, the function will be called once more with PC set to
+// zero; this may be used to free any information. Each call will
+// leave the fields of the struct set to the same values they had upon
+// return, except for the PC field when the More field is zero. The
+// function must not keep a copy of the struct pointer between calls.
+//
+// When calling SetCgoTraceback, the version argument is the version
+// number of the structs that the functions expect to receive.
+// Currently this must be zero.
+//
+// The symbolizer function may be nil, in which case the results of
+// the traceback function will be displayed as numbers. If the
+// traceback function is nil, the symbolizer function will never be
+// called. The context function may be nil, in which case the
+// traceback function will only be called with the context field set
+// to zero. If the context function is nil, then calls from Go to C
+// to Go will not show a traceback for the C portion of the call stack.
+//
+// SetCgoTraceback should be called only once, ideally from an init function.
+func SetCgoTraceback(version int, traceback, context, symbolizer unsafe.Pointer) {
+ if version != 0 {
+ panic("unsupported version")
+ }
+
+ if cgoTraceback != nil && cgoTraceback != traceback ||
+ cgoContext != nil && cgoContext != context ||
+ cgoSymbolizer != nil && cgoSymbolizer != symbolizer {
+ panic("call SetCgoTraceback only once")
+ }
+
+ cgoTraceback = traceback
+ cgoContext = context
+ cgoSymbolizer = symbolizer
+
+ // The context function is called when a C function calls a Go
+ // function. As such it is only called by C code in runtime/cgo.
+ if _cgo_set_context_function != nil {
+ cgocall(_cgo_set_context_function, context)
+ }
+}
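+
+// A minimal usage sketch (illustrative only; the C function names are
+// hypothetical). A program using cgo would typically register the three
+// functions once, from an init function:
+//
+//	func init() {
+//		// myTraceback, myContext and mySymbolizer are C functions
+//		// declared in this package's cgo preamble.
+//		runtime.SetCgoTraceback(0, // version must currently be 0
+//			unsafe.Pointer(C.myTraceback),
+//			unsafe.Pointer(C.myContext),
+//			unsafe.Pointer(C.mySymbolizer))
+//	}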
+
+var cgoTraceback unsafe.Pointer
+var cgoContext unsafe.Pointer
+var cgoSymbolizer unsafe.Pointer
+
+// cgoTracebackArg is the type passed to cgoTraceback.
+type cgoTracebackArg struct {
+ context uintptr
+ sigContext uintptr
+ buf *uintptr
+ max uintptr
+}
+
+// cgoContextArg is the type passed to the context function.
+type cgoContextArg struct {
+ context uintptr
+}
+
+// cgoSymbolizerArg is the type passed to cgoSymbolizer.
+type cgoSymbolizerArg struct {
+ pc uintptr
+ file *byte
+ lineno uintptr
+ funcName *byte
+ entry uintptr
+ more uintptr
+ data uintptr
+}
+
+// printCgoTraceback prints a traceback of callers.
+func printCgoTraceback(callers *cgoCallers) {
+ if cgoSymbolizer == nil {
+ for _, c := range callers {
+ if c == 0 {
+ break
+ }
+ print("non-Go function at pc=", hex(c), "\n")
+ }
+ return
+ }
+
+ var arg cgoSymbolizerArg
+ for _, c := range callers {
+ if c == 0 {
+ break
+ }
+ printOneCgoTraceback(c, 0x7fffffff, &arg)
+ }
+ arg.pc = 0
+ callCgoSymbolizer(&arg)
+}
+
+// printOneCgoTraceback prints the traceback of a single cgo caller.
+// This can print more than one line because of inlining.
+// Returns the number of frames printed.
+func printOneCgoTraceback(pc uintptr, max int, arg *cgoSymbolizerArg) int {
+ c := 0
+ arg.pc = pc
+ for c <= max {
+ callCgoSymbolizer(arg)
+ if arg.funcName != nil {
+ // Note that we don't print any argument
+ // information here, not even parentheses.
+ // The symbolizer must add that if appropriate.
+ println(gostringnocopy(arg.funcName))
+ } else {
+ println("non-Go function")
+ }
+ print("\t")
+ if arg.file != nil {
+ print(gostringnocopy(arg.file), ":", arg.lineno, " ")
+ }
+ print("pc=", hex(pc), "\n")
+ c++
+ if arg.more == 0 {
+ break
+ }
+ }
+ return c
+}
+
+// callCgoSymbolizer calls the cgoSymbolizer function.
+func callCgoSymbolizer(arg *cgoSymbolizerArg) {
+ call := cgocall
+ if panicking > 0 || getg().m.curg != getg() {
+ // We do not want to call into the scheduler when panicking
+ // or when on the system stack.
+ call = asmcgocall
+ }
+ if msanenabled {
+ msanwrite(unsafe.Pointer(arg), unsafe.Sizeof(cgoSymbolizerArg{}))
+ }
+ call(cgoSymbolizer, noescape(unsafe.Pointer(arg)))
+}
+
+// cgoContextPCs gets the PC values from a cgo traceback.
+func cgoContextPCs(ctxt uintptr, buf []uintptr) {
+ if cgoTraceback == nil {
+ return
+ }
+ call := cgocall
+ if panicking > 0 || getg().m.curg != getg() {
+ // We do not want to call into the scheduler when panicking
+ // or when on the system stack.
+ call = asmcgocall
+ }
+ arg := cgoTracebackArg{
+ context: ctxt,
+ buf: (*uintptr)(noescape(unsafe.Pointer(&buf[0]))),
+ max: uintptr(len(buf)),
+ }
+ if msanenabled {
+ msanwrite(unsafe.Pointer(&arg), unsafe.Sizeof(arg))
+ }
+ call(cgoTraceback, noescape(unsafe.Pointer(&arg)))
+}
diff --git a/src/runtime/type.go b/src/runtime/type.go
new file mode 100644
index 0000000..81455f3
--- /dev/null
+++ b/src/runtime/type.go
@@ -0,0 +1,733 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Runtime type representation.
+
+package runtime
+
+import "unsafe"
+
+// tflag is documented in reflect/type.go.
+//
+// tflag values must be kept in sync with copies in:
+// cmd/compile/internal/gc/reflect.go
+// cmd/link/internal/ld/decodesym.go
+// reflect/type.go
+// internal/reflectlite/type.go
+type tflag uint8
+
+const (
+ tflagUncommon tflag = 1 << 0
+ tflagExtraStar tflag = 1 << 1
+ tflagNamed tflag = 1 << 2
+ tflagRegularMemory tflag = 1 << 3 // equal and hash can treat values of this type as a single region of t.size bytes
+)
+
+// Needs to be in sync with ../cmd/link/internal/ld/decodesym.go:/^func.commonsize,
+// ../cmd/compile/internal/gc/reflect.go:/^func.dcommontype,
+// ../reflect/type.go:/^type.rtype, and
+// ../internal/reflectlite/type.go:/^type.rtype.
+type _type struct {
+ size uintptr
+ ptrdata uintptr // size of memory prefix holding all pointers
+ hash uint32
+ tflag tflag
+ align uint8
+ fieldAlign uint8
+ kind uint8
+ // function for comparing objects of this type
+ // (ptr to object A, ptr to object B) -> ==?
+ equal func(unsafe.Pointer, unsafe.Pointer) bool
+ // gcdata stores the GC type data for the garbage collector.
+ // If the KindGCProg bit is set in kind, gcdata is a GC program.
+ // Otherwise it is a ptrmask bitmap. See mbitmap.go for details.
+ gcdata *byte
+ str nameOff
+ ptrToThis typeOff
+}
+
+func (t *_type) string() string {
+ s := t.nameOff(t.str).name()
+ if t.tflag&tflagExtraStar != 0 {
+ return s[1:]
+ }
+ return s
+}
+
+func (t *_type) uncommon() *uncommontype {
+ if t.tflag&tflagUncommon == 0 {
+ return nil
+ }
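+ // The uncommontype data is laid out immediately after the kind-specific
+ // type structure, so recover it by viewing t as the matching wrapper struct.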
+ switch t.kind & kindMask {
+ case kindStruct:
+ type u struct {
+ structtype
+ u uncommontype
+ }
+ return &(*u)(unsafe.Pointer(t)).u
+ case kindPtr:
+ type u struct {
+ ptrtype
+ u uncommontype
+ }
+ return &(*u)(unsafe.Pointer(t)).u
+ case kindFunc:
+ type u struct {
+ functype
+ u uncommontype
+ }
+ return &(*u)(unsafe.Pointer(t)).u
+ case kindSlice:
+ type u struct {
+ slicetype
+ u uncommontype
+ }
+ return &(*u)(unsafe.Pointer(t)).u
+ case kindArray:
+ type u struct {
+ arraytype
+ u uncommontype
+ }
+ return &(*u)(unsafe.Pointer(t)).u
+ case kindChan:
+ type u struct {
+ chantype
+ u uncommontype
+ }
+ return &(*u)(unsafe.Pointer(t)).u
+ case kindMap:
+ type u struct {
+ maptype
+ u uncommontype
+ }
+ return &(*u)(unsafe.Pointer(t)).u
+ case kindInterface:
+ type u struct {
+ interfacetype
+ u uncommontype
+ }
+ return &(*u)(unsafe.Pointer(t)).u
+ default:
+ type u struct {
+ _type
+ u uncommontype
+ }
+ return &(*u)(unsafe.Pointer(t)).u
+ }
+}
+
+func (t *_type) name() string {
+ if t.tflag&tflagNamed == 0 {
+ return ""
+ }
+ s := t.string()
+ i := len(s) - 1
+ for i >= 0 && s[i] != '.' {
+ i--
+ }
+ return s[i+1:]
+}
+
+// pkgpath returns the path of the package where t was defined, if
+// available. This is not the same as the reflect package's PkgPath
+// method, in that it returns the package path for struct and interface
+// types, not just named types.
+func (t *_type) pkgpath() string {
+ if u := t.uncommon(); u != nil {
+ return t.nameOff(u.pkgpath).name()
+ }
+ switch t.kind & kindMask {
+ case kindStruct:
+ st := (*structtype)(unsafe.Pointer(t))
+ return st.pkgPath.name()
+ case kindInterface:
+ it := (*interfacetype)(unsafe.Pointer(t))
+ return it.pkgpath.name()
+ }
+ return ""
+}
+
+// reflectOffs holds type offsets defined at run time by the reflect package.
+//
+// When a type is defined at run time, its *rtype data lives on the heap.
+// There is a wide range of possible addresses the heap may use, not all of
+// which are representable as a 32-bit offset. Moreover, the GC may one day
+// start moving heap memory, in which case there would be no stable offset
+// to record.
+//
+// To provide stable offsets, we pin *rtype objects in a global map and treat
+// the offset as an identifier. We use negative offsets that do not overlap
+// with any compile-time module offsets.
+//
+// Entries are created by reflect.addReflectOff.
+var reflectOffs struct {
+ lock mutex
+ next int32
+ m map[int32]unsafe.Pointer
+ minv map[unsafe.Pointer]int32
+}
+
+func reflectOffsLock() {
+ lock(&reflectOffs.lock)
+ if raceenabled {
+ raceacquire(unsafe.Pointer(&reflectOffs.lock))
+ }
+}
+
+func reflectOffsUnlock() {
+ if raceenabled {
+ racerelease(unsafe.Pointer(&reflectOffs.lock))
+ }
+ unlock(&reflectOffs.lock)
+}
+
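+// resolveNameOff resolves a nameOff relative to the module whose type data
+// contains ptrInModule. If ptrInModule does not point into any module's type
+// data, the offset is instead looked up in reflectOffs, which holds names
+// defined at run time by the reflect package.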
+func resolveNameOff(ptrInModule unsafe.Pointer, off nameOff) name {
+ if off == 0 {
+ return name{}
+ }
+ base := uintptr(ptrInModule)
+ for md := &firstmoduledata; md != nil; md = md.next {
+ if base >= md.types && base < md.etypes {
+ res := md.types + uintptr(off)
+ if res > md.etypes {
+ println("runtime: nameOff", hex(off), "out of range", hex(md.types), "-", hex(md.etypes))
+ throw("runtime: name offset out of range")
+ }
+ return name{(*byte)(unsafe.Pointer(res))}
+ }
+ }
+
+ // No module found. See if it is a run-time name.
+ reflectOffsLock()
+ res, found := reflectOffs.m[int32(off)]
+ reflectOffsUnlock()
+ if !found {
+ println("runtime: nameOff", hex(off), "base", hex(base), "not in ranges:")
+ for next := &firstmoduledata; next != nil; next = next.next {
+ println("\ttypes", hex(next.types), "etypes", hex(next.etypes))
+ }
+ throw("runtime: name offset base pointer out of range")
+ }
+ return name{(*byte)(res)}
+}
+
+func (t *_type) nameOff(off nameOff) name {
+ return resolveNameOff(unsafe.Pointer(t), off)
+}
+
+func resolveTypeOff(ptrInModule unsafe.Pointer, off typeOff) *_type {
+ if off == 0 || off == -1 {
+ // -1 is the sentinel value for unreachable code.
+ // See cmd/link/internal/ld/data.go:relocsym.
+ return nil
+ }
+ base := uintptr(ptrInModule)
+ var md *moduledata
+ for next := &firstmoduledata; next != nil; next = next.next {
+ if base >= next.types && base < next.etypes {
+ md = next
+ break
+ }
+ }
+ if md == nil {
+ reflectOffsLock()
+ res := reflectOffs.m[int32(off)]
+ reflectOffsUnlock()
+ if res == nil {
+ println("runtime: typeOff", hex(off), "base", hex(base), "not in ranges:")
+ for next := &firstmoduledata; next != nil; next = next.next {
+ println("\ttypes", hex(next.types), "etypes", hex(next.etypes))
+ }
+ throw("runtime: type offset base pointer out of range")
+ }
+ return (*_type)(res)
+ }
+ if t := md.typemap[off]; t != nil {
+ return t
+ }
+ res := md.types + uintptr(off)
+ if res > md.etypes {
+ println("runtime: typeOff", hex(off), "out of range", hex(md.types), "-", hex(md.etypes))
+ throw("runtime: type offset out of range")
+ }
+ return (*_type)(unsafe.Pointer(res))
+}
+
+func (t *_type) typeOff(off typeOff) *_type {
+ return resolveTypeOff(unsafe.Pointer(t), off)
+}
+
+func (t *_type) textOff(off textOff) unsafe.Pointer {
+ if off == -1 {
+ // -1 is the sentinel value for unreachable code.
+ // See cmd/link/internal/ld/data.go:relocsym.
+ return unsafe.Pointer(^uintptr(0))
+ }
+ base := uintptr(unsafe.Pointer(t))
+ var md *moduledata
+ for next := &firstmoduledata; next != nil; next = next.next {
+ if base >= next.types && base < next.etypes {
+ md = next
+ break
+ }
+ }
+ if md == nil {
+ reflectOffsLock()
+ res := reflectOffs.m[int32(off)]
+ reflectOffsUnlock()
+ if res == nil {
+ println("runtime: textOff", hex(off), "base", hex(base), "not in ranges:")
+ for next := &firstmoduledata; next != nil; next = next.next {
+ println("\ttypes", hex(next.types), "etypes", hex(next.etypes))
+ }
+ throw("runtime: text offset base pointer out of range")
+ }
+ return res
+ }
+ res := uintptr(0)
+
+ // The text, or instruction stream, is generated as one large buffer. The off (offset) for a method is
+ // its offset within this buffer. If the total text size gets too large, there can be issues on platforms like ppc64
+ // if the targets of calls are too far away for the call instruction. To resolve the large-text issue, the text is
+ // split into multiple text sections so that the linker can generate long calls when necessary. When this happens,
+ // the vaddr for each text section is set to its offset within the text. Each method's offset is compared against
+ // the section vaddrs and sizes to determine the containing section. Then the section-relative offset is added to
+ // the section's relocated baseaddr to compute the method address.
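+ //
+ // For example, if off is 0x200040 and the second text section has vaddr
+ // 0x200000 and a length of at least 0x40, the method lives in that section
+ // and its address is that section's relocated baseaddr + 0x40.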
+
+ if len(md.textsectmap) > 1 {
+ for i := range md.textsectmap {
+ sectaddr := md.textsectmap[i].vaddr
+ sectlen := md.textsectmap[i].length
+ if uintptr(off) >= sectaddr && uintptr(off) < sectaddr+sectlen {
+ res = md.textsectmap[i].baseaddr + uintptr(off) - uintptr(md.textsectmap[i].vaddr)
+ break
+ }
+ }
+ } else {
+ // single text section
+ res = md.text + uintptr(off)
+ }
+
+ if res > md.etext && GOARCH != "wasm" { // on wasm, functions do not live in the same address space as the linear memory
+ println("runtime: textOff", hex(off), "out of range", hex(md.text), "-", hex(md.etext))
+ throw("runtime: text offset out of range")
+ }
+ return unsafe.Pointer(res)
+}
+
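+// The parameter types of a functype are stored as a single array of *_type
+// laid out directly after the functype itself (and after its uncommontype,
+// if any): inCount input types followed by the output types. The top bit of
+// outCount records whether the function is variadic (see dotdotdot).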
+func (t *functype) in() []*_type {
+ // See funcType in reflect/type.go for details on data layout.
+ uadd := uintptr(unsafe.Sizeof(functype{}))
+ if t.typ.tflag&tflagUncommon != 0 {
+ uadd += unsafe.Sizeof(uncommontype{})
+ }
+ return (*[1 << 20]*_type)(add(unsafe.Pointer(t), uadd))[:t.inCount]
+}
+
+func (t *functype) out() []*_type {
+ // See funcType in reflect/type.go for details on data layout.
+ uadd := uintptr(unsafe.Sizeof(functype{}))
+ if t.typ.tflag&tflagUncommon != 0 {
+ uadd += unsafe.Sizeof(uncommontype{})
+ }
+ outCount := t.outCount & (1<<15 - 1)
+ return (*[1 << 20]*_type)(add(unsafe.Pointer(t), uadd))[t.inCount : t.inCount+outCount]
+}
+
+func (t *functype) dotdotdot() bool {
+ return t.outCount&(1<<15) != 0
+}
+
+type nameOff int32
+type typeOff int32
+type textOff int32
+
+type method struct {
+ name nameOff
+ mtyp typeOff
+ ifn textOff
+ tfn textOff
+}
+
+type uncommontype struct {
+ pkgpath nameOff
+ mcount uint16 // number of methods
+ xcount uint16 // number of exported methods
+ moff uint32 // offset from this uncommontype to [mcount]method
+ _ uint32 // unused
+}
+
+type imethod struct {
+ name nameOff
+ ityp typeOff
+}
+
+type interfacetype struct {
+ typ _type
+ pkgpath name
+ mhdr []imethod
+}
+
+type maptype struct {
+ typ _type
+ key *_type
+ elem *_type
+ bucket *_type // internal type representing a hash bucket
+ // function for hashing keys (ptr to key, seed) -> hash
+ hasher func(unsafe.Pointer, uintptr) uintptr
+ keysize uint8 // size of key slot
+ elemsize uint8 // size of elem slot
+ bucketsize uint16 // size of bucket
+ flags uint32
+}
+
+// Note: flag values must match those used in the TMAP case
+// in ../cmd/compile/internal/gc/reflect.go:dtypesym.
+func (mt *maptype) indirectkey() bool { // store ptr to key instead of key itself
+ return mt.flags&1 != 0
+}
+func (mt *maptype) indirectelem() bool { // store ptr to elem instead of elem itself
+ return mt.flags&2 != 0
+}
+func (mt *maptype) reflexivekey() bool { // true if k==k for all keys
+ return mt.flags&4 != 0
+}
+func (mt *maptype) needkeyupdate() bool { // true if we need to update key on an overwrite
+ return mt.flags&8 != 0
+}
+func (mt *maptype) hashMightPanic() bool { // true if hash function might panic
+ return mt.flags&16 != 0
+}
+
+type arraytype struct {
+ typ _type
+ elem *_type
+ slice *_type
+ len uintptr
+}
+
+type chantype struct {
+ typ _type
+ elem *_type
+ dir uintptr
+}
+
+type slicetype struct {
+ typ _type
+ elem *_type
+}
+
+type functype struct {
+ typ _type
+ inCount uint16
+ outCount uint16
+}
+
+type ptrtype struct {
+ typ _type
+ elem *_type
+}
+
+type structfield struct {
+ name name
+ typ *_type
+ offsetAnon uintptr
+}
+
+func (f *structfield) offset() uintptr {
+ return f.offsetAnon >> 1
+}
+
+type structtype struct {
+ typ _type
+ pkgPath name
+ fields []structfield
+}
+
+// name is an encoded type name with optional extra data.
+// See reflect/type.go for details.
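+//
+// As the accessors below read it, the layout is:
+//   byte 0:    flags (1<<0 exported, 1<<1 has tag, 1<<2 has pkgPath)
+//   bytes 1-2: big-endian length of the name data
+//   bytes 3-:  name data, optionally followed by a 2-byte big-endian tag
+//              length and the tag data, then a 4-byte pkgPath nameOff.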
+type name struct {
+ bytes *byte
+}
+
+func (n name) data(off int) *byte {
+ return (*byte)(add(unsafe.Pointer(n.bytes), uintptr(off)))
+}
+
+func (n name) isExported() bool {
+ return (*n.bytes)&(1<<0) != 0
+}
+
+func (n name) nameLen() int {
+ return int(uint16(*n.data(1))<<8 | uint16(*n.data(2)))
+}
+
+func (n name) tagLen() int {
+ if *n.data(0)&(1<<1) == 0 {
+ return 0
+ }
+ off := 3 + n.nameLen()
+ return int(uint16(*n.data(off))<<8 | uint16(*n.data(off+1)))
+}
+
+func (n name) name() (s string) {
+ if n.bytes == nil {
+ return ""
+ }
+ nl := n.nameLen()
+ if nl == 0 {
+ return ""
+ }
+ hdr := (*stringStruct)(unsafe.Pointer(&s))
+ hdr.str = unsafe.Pointer(n.data(3))
+ hdr.len = nl
+ return s
+}
+
+func (n name) tag() (s string) {
+ tl := n.tagLen()
+ if tl == 0 {
+ return ""
+ }
+ nl := n.nameLen()
+ hdr := (*stringStruct)(unsafe.Pointer(&s))
+ hdr.str = unsafe.Pointer(n.data(3 + nl + 2))
+ hdr.len = tl
+ return s
+}
+
+func (n name) pkgPath() string {
+ if n.bytes == nil || *n.data(0)&(1<<2) == 0 {
+ return ""
+ }
+ off := 3 + n.nameLen()
+ if tl := n.tagLen(); tl > 0 {
+ off += 2 + tl
+ }
+ var nameOff nameOff
+ copy((*[4]byte)(unsafe.Pointer(&nameOff))[:], (*[4]byte)(unsafe.Pointer(n.data(off)))[:])
+ pkgPathName := resolveNameOff(unsafe.Pointer(n.bytes), nameOff)
+ return pkgPathName.name()
+}
+
+func (n name) isBlank() bool {
+ if n.bytes == nil {
+ return false
+ }
+ if n.nameLen() != 1 {
+ return false
+ }
+ return *n.data(3) == '_'
+}
+
+// typelinksinit scans the types from extra modules and builds the
+// moduledata typemap used to de-duplicate type pointers.
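+//
+// typelinksinit walks the active modules in load order; for each module after
+// the first it records, for every typelink, either the module's own *_type or
+// an equal *_type already seen in an earlier module, so that later lookups
+// share a single canonical type pointer.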
+func typelinksinit() {
+ if firstmoduledata.next == nil {
+ return
+ }
+ typehash := make(map[uint32][]*_type, len(firstmoduledata.typelinks))
+
+ modules := activeModules()
+ prev := modules[0]
+ for _, md := range modules[1:] {
+ // Collect types from the previous module into typehash.
+ collect:
+ for _, tl := range prev.typelinks {
+ var t *_type
+ if prev.typemap == nil {
+ t = (*_type)(unsafe.Pointer(prev.types + uintptr(tl)))
+ } else {
+ t = prev.typemap[typeOff(tl)]
+ }
+ // Add to typehash if not seen before.
+ tlist := typehash[t.hash]
+ for _, tcur := range tlist {
+ if tcur == t {
+ continue collect
+ }
+ }
+ typehash[t.hash] = append(tlist, t)
+ }
+
+ if md.typemap == nil {
+ // If any of this module's typelinks match a type from a
+ // prior module, prefer that prior type by adding the offset
+ // to this module's typemap.
+ tm := make(map[typeOff]*_type, len(md.typelinks))
+ pinnedTypemaps = append(pinnedTypemaps, tm)
+ md.typemap = tm
+ for _, tl := range md.typelinks {
+ t := (*_type)(unsafe.Pointer(md.types + uintptr(tl)))
+ for _, candidate := range typehash[t.hash] {
+ seen := map[_typePair]struct{}{}
+ if typesEqual(t, candidate, seen) {
+ t = candidate
+ break
+ }
+ }
+ md.typemap[typeOff(tl)] = t
+ }
+ }
+
+ prev = md
+ }
+}
+
+type _typePair struct {
+ t1 *_type
+ t2 *_type
+}
+
+// typesEqual reports whether two types are equal.
+//
+// Everywhere in the runtime and reflect packages, it is assumed that
+// there is exactly one *_type per Go type, so that pointer equality
+// can be used to test if types are equal. There is one place that
+// breaks this assumption: buildmode=shared. In this case a type can
+// appear as two different pieces of memory. This is hidden from the
+// runtime and reflect package by the per-module typemap built in
+// typelinksinit. It uses typesEqual to map types from later modules
+// back into earlier ones.
+//
+// Only typelinksinit needs this function.
+func typesEqual(t, v *_type, seen map[_typePair]struct{}) bool {
+ tp := _typePair{t, v}
+ if _, ok := seen[tp]; ok {
+ return true
+ }
+
+ // Mark these types as seen, and thus equivalent, which prevents an infinite
+ // loop if the two types are identical but recursively defined and loaded
+ // from different modules.
+ seen[tp] = struct{}{}
+
+ if t == v {
+ return true
+ }
+ kind := t.kind & kindMask
+ if kind != v.kind&kindMask {
+ return false
+ }
+ if t.string() != v.string() {
+ return false
+ }
+ ut := t.uncommon()
+ uv := v.uncommon()
+ if ut != nil || uv != nil {
+ if ut == nil || uv == nil {
+ return false
+ }
+ pkgpatht := t.nameOff(ut.pkgpath).name()
+ pkgpathv := v.nameOff(uv.pkgpath).name()
+ if pkgpatht != pkgpathv {
+ return false
+ }
+ }
+ if kindBool <= kind && kind <= kindComplex128 {
+ return true
+ }
+ switch kind {
+ case kindString, kindUnsafePointer:
+ return true
+ case kindArray:
+ at := (*arraytype)(unsafe.Pointer(t))
+ av := (*arraytype)(unsafe.Pointer(v))
+ return typesEqual(at.elem, av.elem, seen) && at.len == av.len
+ case kindChan:
+ ct := (*chantype)(unsafe.Pointer(t))
+ cv := (*chantype)(unsafe.Pointer(v))
+ return ct.dir == cv.dir && typesEqual(ct.elem, cv.elem, seen)
+ case kindFunc:
+ ft := (*functype)(unsafe.Pointer(t))
+ fv := (*functype)(unsafe.Pointer(v))
+ if ft.outCount != fv.outCount || ft.inCount != fv.inCount {
+ return false
+ }
+ tin, vin := ft.in(), fv.in()
+ for i := 0; i < len(tin); i++ {
+ if !typesEqual(tin[i], vin[i], seen) {
+ return false
+ }
+ }
+ tout, vout := ft.out(), fv.out()
+ for i := 0; i < len(tout); i++ {
+ if !typesEqual(tout[i], vout[i], seen) {
+ return false
+ }
+ }
+ return true
+ case kindInterface:
+ it := (*interfacetype)(unsafe.Pointer(t))
+ iv := (*interfacetype)(unsafe.Pointer(v))
+ if it.pkgpath.name() != iv.pkgpath.name() {
+ return false
+ }
+ if len(it.mhdr) != len(iv.mhdr) {
+ return false
+ }
+ for i := range it.mhdr {
+ tm := &it.mhdr[i]
+ vm := &iv.mhdr[i]
+ // Note the mhdr array can be relocated from
+ // another module. See #17724.
+ tname := resolveNameOff(unsafe.Pointer(tm), tm.name)
+ vname := resolveNameOff(unsafe.Pointer(vm), vm.name)
+ if tname.name() != vname.name() {
+ return false
+ }
+ if tname.pkgPath() != vname.pkgPath() {
+ return false
+ }
+ tityp := resolveTypeOff(unsafe.Pointer(tm), tm.ityp)
+ vityp := resolveTypeOff(unsafe.Pointer(vm), vm.ityp)
+ if !typesEqual(tityp, vityp, seen) {
+ return false
+ }
+ }
+ return true
+ case kindMap:
+ mt := (*maptype)(unsafe.Pointer(t))
+ mv := (*maptype)(unsafe.Pointer(v))
+ return typesEqual(mt.key, mv.key, seen) && typesEqual(mt.elem, mv.elem, seen)
+ case kindPtr:
+ pt := (*ptrtype)(unsafe.Pointer(t))
+ pv := (*ptrtype)(unsafe.Pointer(v))
+ return typesEqual(pt.elem, pv.elem, seen)
+ case kindSlice:
+ st := (*slicetype)(unsafe.Pointer(t))
+ sv := (*slicetype)(unsafe.Pointer(v))
+ return typesEqual(st.elem, sv.elem, seen)
+ case kindStruct:
+ st := (*structtype)(unsafe.Pointer(t))
+ sv := (*structtype)(unsafe.Pointer(v))
+ if len(st.fields) != len(sv.fields) {
+ return false
+ }
+ if st.pkgPath.name() != sv.pkgPath.name() {
+ return false
+ }
+ for i := range st.fields {
+ tf := &st.fields[i]
+ vf := &sv.fields[i]
+ if tf.name.name() != vf.name.name() {
+ return false
+ }
+ if !typesEqual(tf.typ, vf.typ, seen) {
+ return false
+ }
+ if tf.name.tag() != vf.name.tag() {
+ return false
+ }
+ if tf.offsetAnon != vf.offsetAnon {
+ return false
+ }
+ }
+ return true
+ default:
+ println("runtime: impossible type kind", kind)
+ throw("runtime: impossible type kind")
+ return false
+ }
+}
diff --git a/src/runtime/typekind.go b/src/runtime/typekind.go
new file mode 100644
index 0000000..7087a9b
--- /dev/null
+++ b/src/runtime/typekind.go
@@ -0,0 +1,43 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ kindBool = 1 + iota
+ kindInt
+ kindInt8
+ kindInt16
+ kindInt32
+ kindInt64
+ kindUint
+ kindUint8
+ kindUint16
+ kindUint32
+ kindUint64
+ kindUintptr
+ kindFloat32
+ kindFloat64
+ kindComplex64
+ kindComplex128
+ kindArray
+ kindChan
+ kindFunc
+ kindInterface
+ kindMap
+ kindPtr
+ kindSlice
+ kindString
+ kindStruct
+ kindUnsafePointer
+
+ kindDirectIface = 1 << 5
+ kindGCProg = 1 << 6
+ kindMask = (1 << 5) - 1
+)
+
+// isDirectIface reports whether t is stored directly in an interface value.
+func isDirectIface(t *_type) bool {
+ return t.kind&kindDirectIface != 0
+}
diff --git a/src/runtime/utf8.go b/src/runtime/utf8.go
new file mode 100644
index 0000000..52b7576
--- /dev/null
+++ b/src/runtime/utf8.go
@@ -0,0 +1,132 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// Numbers fundamental to the encoding.
+const (
+ runeError = '\uFFFD' // the "error" Rune or "Unicode replacement character"
+ runeSelf = 0x80 // characters below runeSelf are represented as themselves in a single byte.
+ maxRune = '\U0010FFFF' // Maximum valid Unicode code point.
+)
+
+// Code points in the surrogate range are not valid for UTF-8.
+const (
+ surrogateMin = 0xD800
+ surrogateMax = 0xDFFF
+)
+
+const (
+ t1 = 0x00 // 0000 0000
+ tx = 0x80 // 1000 0000
+ t2 = 0xC0 // 1100 0000
+ t3 = 0xE0 // 1110 0000
+ t4 = 0xF0 // 1111 0000
+ t5 = 0xF8 // 1111 1000
+
+ maskx = 0x3F // 0011 1111
+ mask2 = 0x1F // 0001 1111
+ mask3 = 0x0F // 0000 1111
+ mask4 = 0x07 // 0000 0111
+
+ rune1Max = 1<<7 - 1
+ rune2Max = 1<<11 - 1
+ rune3Max = 1<<16 - 1
+
+ // The default lowest and highest continuation byte.
+ locb = 0x80 // 1000 0000
+ hicb = 0xBF // 1011 1111
+)
+
+// countrunes returns the number of runes in s.
+func countrunes(s string) int {
+ n := 0
+ for range s {
+ n++
+ }
+ return n
+}
+
+// decoderune returns the non-ASCII rune at the start of
+// s[k:] and the index after the rune in s.
+//
+// decoderune assumes that the caller has checked that
+// the rune to be decoded is a non-ASCII rune.
+//
+// If the string appears to be incomplete or decoding problems
+// are encountered, (runeError, k+1) is returned to ensure
+// progress when decoderune is used to iterate over a string.
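+//
+// For example, for s = "aé" the caller consumes the ASCII 'a' itself and then
+// calls decoderune(s, 1), which returns ('é', 3), since 'é' is encoded as the
+// two bytes 0xC3 0xA9.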
+func decoderune(s string, k int) (r rune, pos int) {
+ pos = k
+
+ if k >= len(s) {
+ return runeError, k + 1
+ }
+
+ s = s[k:]
+
+ switch {
+ case t2 <= s[0] && s[0] < t3:
+ // 0080-07FF two byte sequence
+ if len(s) > 1 && (locb <= s[1] && s[1] <= hicb) {
+ r = rune(s[0]&mask2)<<6 | rune(s[1]&maskx)
+ pos += 2
+ if rune1Max < r {
+ return
+ }
+ }
+ case t3 <= s[0] && s[0] < t4:
+ // 0800-FFFF three byte sequence
+ if len(s) > 2 && (locb <= s[1] && s[1] <= hicb) && (locb <= s[2] && s[2] <= hicb) {
+ r = rune(s[0]&mask3)<<12 | rune(s[1]&maskx)<<6 | rune(s[2]&maskx)
+ pos += 3
+ if rune2Max < r && !(surrogateMin <= r && r <= surrogateMax) {
+ return
+ }
+ }
+ case t4 <= s[0] && s[0] < t5:
+ // 10000-1FFFFF four byte sequence
+ if len(s) > 3 && (locb <= s[1] && s[1] <= hicb) && (locb <= s[2] && s[2] <= hicb) && (locb <= s[3] && s[3] <= hicb) {
+ r = rune(s[0]&mask4)<<18 | rune(s[1]&maskx)<<12 | rune(s[2]&maskx)<<6 | rune(s[3]&maskx)
+ pos += 4
+ if rune3Max < r && r <= maxRune {
+ return
+ }
+ }
+ }
+
+ return runeError, k + 1
+}
+
+// encoderune writes into p (which must be large enough) the UTF-8 encoding of the rune.
+// It returns the number of bytes written.
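+// For example, encoderune(p, 'A') writes the single byte 0x41 and returns 1,
+// while encoderune(p, 'é') writes the two bytes 0xC3, 0xA9 and returns 2.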
+func encoderune(p []byte, r rune) int {
+ // Negative values are erroneous. Making it unsigned addresses the problem.
+ switch i := uint32(r); {
+ case i <= rune1Max:
+ p[0] = byte(r)
+ return 1
+ case i <= rune2Max:
+ _ = p[1] // eliminate bounds checks
+ p[0] = t2 | byte(r>>6)
+ p[1] = tx | byte(r)&maskx
+ return 2
+ case i > maxRune, surrogateMin <= i && i <= surrogateMax:
+ r = runeError
+ fallthrough
+ case i <= rune3Max:
+ _ = p[2] // eliminate bounds checks
+ p[0] = t3 | byte(r>>12)
+ p[1] = tx | byte(r>>6)&maskx
+ p[2] = tx | byte(r)&maskx
+ return 3
+ default:
+ _ = p[3] // eliminate bounds checks
+ p[0] = t4 | byte(r>>18)
+ p[1] = tx | byte(r>>12)&maskx
+ p[2] = tx | byte(r>>6)&maskx
+ p[3] = tx | byte(r)&maskx
+ return 4
+ }
+}
diff --git a/src/runtime/vdso_elf32.go b/src/runtime/vdso_elf32.go
new file mode 100644
index 0000000..2720f33
--- /dev/null
+++ b/src/runtime/vdso_elf32.go
@@ -0,0 +1,80 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build 386 arm
+
+package runtime
+
+// ELF32 structure definitions for use by the vDSO loader
+
+type elfSym struct {
+ st_name uint32
+ st_value uint32
+ st_size uint32
+ st_info byte
+ st_other byte
+ st_shndx uint16
+}
+
+type elfVerdef struct {
+ vd_version uint16 /* Version revision */
+ vd_flags uint16 /* Version information */
+ vd_ndx uint16 /* Version Index */
+ vd_cnt uint16 /* Number of associated aux entries */
+ vd_hash uint32 /* Version name hash value */
+ vd_aux uint32 /* Offset in bytes to verdaux array */
+ vd_next uint32 /* Offset in bytes to next verdef entry */
+}
+
+type elfEhdr struct {
+ e_ident [_EI_NIDENT]byte /* Magic number and other info */
+ e_type uint16 /* Object file type */
+ e_machine uint16 /* Architecture */
+ e_version uint32 /* Object file version */
+ e_entry uint32 /* Entry point virtual address */
+ e_phoff uint32 /* Program header table file offset */
+ e_shoff uint32 /* Section header table file offset */
+ e_flags uint32 /* Processor-specific flags */
+ e_ehsize uint16 /* ELF header size in bytes */
+ e_phentsize uint16 /* Program header table entry size */
+ e_phnum uint16 /* Program header table entry count */
+ e_shentsize uint16 /* Section header table entry size */
+ e_shnum uint16 /* Section header table entry count */
+ e_shstrndx uint16 /* Section header string table index */
+}
+
+type elfPhdr struct {
+ p_type uint32 /* Segment type */
+ p_offset uint32 /* Segment file offset */
+ p_vaddr uint32 /* Segment virtual address */
+ p_paddr uint32 /* Segment physical address */
+ p_filesz uint32 /* Segment size in file */
+ p_memsz uint32 /* Segment size in memory */
+ p_flags uint32 /* Segment flags */
+ p_align uint32 /* Segment alignment */
+}
+
+type elfShdr struct {
+ sh_name uint32 /* Section name (string tbl index) */
+ sh_type uint32 /* Section type */
+ sh_flags uint32 /* Section flags */
+ sh_addr uint32 /* Section virtual addr at execution */
+ sh_offset uint32 /* Section file offset */
+ sh_size uint32 /* Section size in bytes */
+ sh_link uint32 /* Link to another section */
+ sh_info uint32 /* Additional section information */
+ sh_addralign uint32 /* Section alignment */
+ sh_entsize uint32 /* Entry size if section holds table */
+}
+
+type elfDyn struct {
+ d_tag int32 /* Dynamic entry type */
+ d_val uint32 /* Integer value */
+}
+
+type elfVerdaux struct {
+ vda_name uint32 /* Version or dependency names */
+ vda_next uint32 /* Offset in bytes to next verdaux entry */
+}
diff --git a/src/runtime/vdso_elf64.go b/src/runtime/vdso_elf64.go
new file mode 100644
index 0000000..6ded9d6
--- /dev/null
+++ b/src/runtime/vdso_elf64.go
@@ -0,0 +1,80 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build amd64 arm64 mips64 mips64le ppc64 ppc64le
+
+package runtime
+
+// ELF64 structure definitions for use by the vDSO loader
+
+type elfSym struct {
+ st_name uint32
+ st_info byte
+ st_other byte
+ st_shndx uint16
+ st_value uint64
+ st_size uint64
+}
+
+type elfVerdef struct {
+ vd_version uint16 /* Version revision */
+ vd_flags uint16 /* Version information */
+ vd_ndx uint16 /* Version Index */
+ vd_cnt uint16 /* Number of associated aux entries */
+ vd_hash uint32 /* Version name hash value */
+ vd_aux uint32 /* Offset in bytes to verdaux array */
+ vd_next uint32 /* Offset in bytes to next verdef entry */
+}
+
+type elfEhdr struct {
+ e_ident [_EI_NIDENT]byte /* Magic number and other info */
+ e_type uint16 /* Object file type */
+ e_machine uint16 /* Architecture */
+ e_version uint32 /* Object file version */
+ e_entry uint64 /* Entry point virtual address */
+ e_phoff uint64 /* Program header table file offset */
+ e_shoff uint64 /* Section header table file offset */
+ e_flags uint32 /* Processor-specific flags */
+ e_ehsize uint16 /* ELF header size in bytes */
+ e_phentsize uint16 /* Program header table entry size */
+ e_phnum uint16 /* Program header table entry count */
+ e_shentsize uint16 /* Section header table entry size */
+ e_shnum uint16 /* Section header table entry count */
+ e_shstrndx uint16 /* Section header string table index */
+}
+
+type elfPhdr struct {
+ p_type uint32 /* Segment type */
+ p_flags uint32 /* Segment flags */
+ p_offset uint64 /* Segment file offset */
+ p_vaddr uint64 /* Segment virtual address */
+ p_paddr uint64 /* Segment physical address */
+ p_filesz uint64 /* Segment size in file */
+ p_memsz uint64 /* Segment size in memory */
+ p_align uint64 /* Segment alignment */
+}
+
+type elfShdr struct {
+ sh_name uint32 /* Section name (string tbl index) */
+ sh_type uint32 /* Section type */
+ sh_flags uint64 /* Section flags */
+ sh_addr uint64 /* Section virtual addr at execution */
+ sh_offset uint64 /* Section file offset */
+ sh_size uint64 /* Section size in bytes */
+ sh_link uint32 /* Link to another section */
+ sh_info uint32 /* Additional section information */
+ sh_addralign uint64 /* Section alignment */
+ sh_entsize uint64 /* Entry size if section holds table */
+}
+
+type elfDyn struct {
+ d_tag int64 /* Dynamic entry type */
+ d_val uint64 /* Integer value */
+}
+
+type elfVerdaux struct {
+ vda_name uint32 /* Version or dependency names */
+ vda_next uint32 /* Offset in bytes to next verdaux entry */
+}
diff --git a/src/runtime/vdso_freebsd.go b/src/runtime/vdso_freebsd.go
new file mode 100644
index 0000000..122cc8b
--- /dev/null
+++ b/src/runtime/vdso_freebsd.go
@@ -0,0 +1,114 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build freebsd
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const _VDSO_TH_NUM = 4 // defined in <sys/vdso.h> #ifdef _KERNEL
+
+var timekeepSharedPage *vdsoTimekeep
+
+//go:nosplit
+func (bt *bintime) Add(bt2 *bintime) {
+ u := bt.frac
+ bt.frac += bt2.frac
+ if u > bt.frac {
+ bt.sec++
+ }
+ bt.sec += bt2.sec
+}
+
+//go:nosplit
+func (bt *bintime) AddX(x uint64) {
+ u := bt.frac
+ bt.frac += x
+ if u > bt.frac {
+ bt.sec++
+ }
+}
+
+var (
+ // binuptimeDummy is used in binuptime as the address of an atomic.Load, to simulate
+ // an atomic_thread_fence_acq() call, which acts as an instruction-reordering and
+ // memory barrier.
+ binuptimeDummy uint32
+
+ zeroBintime bintime
+)
+
+// based on /usr/src/lib/libc/sys/__vdso_gettimeofday.c
+//
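+// binuptime reads the timehands published by the kernel using a lock-free,
+// seqlock-style protocol: it loads the generation counter before reading the
+// timehands fields and retries if, by the end of the iteration, the generation
+// is zero, has changed, or the current timehands index has moved.
+//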
+//go:nosplit
+func binuptime(abs bool) (bt bintime) {
+ timehands := (*[_VDSO_TH_NUM]vdsoTimehands)(add(unsafe.Pointer(timekeepSharedPage), vdsoTimekeepSize))
+ for {
+ if timekeepSharedPage.enabled == 0 {
+ return zeroBintime
+ }
+
+ curr := atomic.Load(&timekeepSharedPage.current) // atomic_load_acq_32
+ th := &timehands[curr]
+ gen := atomic.Load(&th.gen) // atomic_load_acq_32
+ bt = th.offset
+
+ if tc, ok := th.getTimecounter(); !ok {
+ return zeroBintime
+ } else {
+ delta := (tc - th.offset_count) & th.counter_mask
+ bt.AddX(th.scale * uint64(delta))
+ }
+ if abs {
+ bt.Add(&th.boottime)
+ }
+
+ atomic.Load(&binuptimeDummy) // atomic_thread_fence_acq()
+ if curr == timekeepSharedPage.current && gen != 0 && gen == th.gen {
+ break
+ }
+ }
+ return bt
+}
+
+//go:nosplit
+func vdsoClockGettime(clockID int32) bintime {
+ if timekeepSharedPage == nil || timekeepSharedPage.ver != _VDSO_TK_VER_CURR {
+ return zeroBintime
+ }
+ abs := false
+ switch clockID {
+ case _CLOCK_MONOTONIC:
+ /* ok */
+ case _CLOCK_REALTIME:
+ abs = true
+ default:
+ return zeroBintime
+ }
+ return binuptime(abs)
+}
+
+func fallback_nanotime() int64
+func fallback_walltime() (sec int64, nsec int32)
+
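+// In a bintime, frac is a 64-bit binary fraction of a second, so
+// (1e9 * (frac>>32)) >> 32 converts the fractional part to nanoseconds in
+// nanotime1 and walltime1 below.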
+//go:nosplit
+func nanotime1() int64 {
+ bt := vdsoClockGettime(_CLOCK_MONOTONIC)
+ if bt == zeroBintime {
+ return fallback_nanotime()
+ }
+ return int64((1e9 * uint64(bt.sec)) + ((1e9 * uint64(bt.frac>>32)) >> 32))
+}
+
+func walltime1() (sec int64, nsec int32) {
+ bt := vdsoClockGettime(_CLOCK_REALTIME)
+ if bt == zeroBintime {
+ return fallback_walltime()
+ }
+ return int64(bt.sec), int32((1e9 * uint64(bt.frac>>32)) >> 32)
+}
diff --git a/src/runtime/vdso_freebsd_arm.go b/src/runtime/vdso_freebsd_arm.go
new file mode 100644
index 0000000..669fed0
--- /dev/null
+++ b/src/runtime/vdso_freebsd_arm.go
@@ -0,0 +1,21 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _VDSO_TH_ALGO_ARM_GENTIM = 1
+)
+
+func getCntxct(physical bool) uint32
+
+//go:nosplit
+func (th *vdsoTimehands) getTimecounter() (uint32, bool) {
+ switch th.algo {
+ case _VDSO_TH_ALGO_ARM_GENTIM:
+ return getCntxct(th.physical != 0), true
+ default:
+ return 0, false
+ }
+}
diff --git a/src/runtime/vdso_freebsd_arm64.go b/src/runtime/vdso_freebsd_arm64.go
new file mode 100644
index 0000000..7d9f62d
--- /dev/null
+++ b/src/runtime/vdso_freebsd_arm64.go
@@ -0,0 +1,21 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _VDSO_TH_ALGO_ARM_GENTIM = 1
+)
+
+func getCntxct(physical bool) uint32
+
+//go:nosplit
+func (th *vdsoTimehands) getTimecounter() (uint32, bool) {
+ switch th.algo {
+ case _VDSO_TH_ALGO_ARM_GENTIM:
+ return getCntxct(false), true
+ default:
+ return 0, false
+ }
+}
diff --git a/src/runtime/vdso_freebsd_x86.go b/src/runtime/vdso_freebsd_x86.go
new file mode 100644
index 0000000..1fa5d80
--- /dev/null
+++ b/src/runtime/vdso_freebsd_x86.go
@@ -0,0 +1,93 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build freebsd
+// +build 386 amd64
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const (
+ _VDSO_TH_ALGO_X86_TSC = 1
+ _VDSO_TH_ALGO_X86_HPET = 2
+)
+
+const (
+ _HPET_DEV_MAP_MAX = 10
+ _HPET_MAIN_COUNTER = 0xf0 /* Main counter register */
+
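+ // hpetDevPath is a template: the trailing 'X' is replaced with the
+ // timecounter's HPET unit index (a single decimal digit) before the
+ // device is opened.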
+ hpetDevPath = "/dev/hpetX\x00"
+)
+
+var hpetDevMap [_HPET_DEV_MAP_MAX]uintptr
+
+//go:nosplit
+func (th *vdsoTimehands) getTSCTimecounter() uint32 {
+ tsc := cputicks()
+ if th.x86_shift > 0 {
+ tsc >>= th.x86_shift
+ }
+ return uint32(tsc)
+}
+
+//go:systemstack
+func (th *vdsoTimehands) getHPETTimecounter() (uint32, bool) {
+ const digits = "0123456789"
+
+ idx := int(th.x86_hpet_idx)
+ if idx >= len(hpetDevMap) {
+ return 0, false
+ }
+
+ p := atomic.Loaduintptr(&hpetDevMap[idx])
+ if p == 0 {
+ var devPath [len(hpetDevPath)]byte
+ copy(devPath[:], hpetDevPath)
+ devPath[9] = digits[idx]
+
+ fd := open(&devPath[0], 0 /* O_RDONLY */, 0)
+ if fd < 0 {
+ atomic.Casuintptr(&hpetDevMap[idx], 0, ^uintptr(0))
+ return 0, false
+ }
+
+ addr, mmapErr := mmap(nil, physPageSize, _PROT_READ, _MAP_SHARED, fd, 0)
+ closefd(fd)
+ newP := uintptr(addr)
+ if mmapErr != 0 {
+ newP = ^uintptr(0)
+ }
+ if !atomic.Casuintptr(&hpetDevMap[idx], 0, newP) && mmapErr == 0 {
+ munmap(addr, physPageSize)
+ }
+ p = atomic.Loaduintptr(&hpetDevMap[idx])
+ }
+ if p == ^uintptr(0) {
+ return 0, false
+ }
+ return *(*uint32)(unsafe.Pointer(p + _HPET_MAIN_COUNTER)), true
+}
+
+//go:nosplit
+func (th *vdsoTimehands) getTimecounter() (uint32, bool) {
+ switch th.algo {
+ case _VDSO_TH_ALGO_X86_TSC:
+ return th.getTSCTimecounter(), true
+ case _VDSO_TH_ALGO_X86_HPET:
+ var (
+ tc uint32
+ ok bool
+ )
+ systemstack(func() {
+ tc, ok = th.getHPETTimecounter()
+ })
+ return tc, ok
+ default:
+ return 0, false
+ }
+}
diff --git a/src/runtime/vdso_in_none.go b/src/runtime/vdso_in_none.go
new file mode 100644
index 0000000..7f4019c
--- /dev/null
+++ b/src/runtime/vdso_in_none.go
@@ -0,0 +1,13 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux,!386,!amd64,!arm,!arm64,!mips64,!mips64le,!ppc64,!ppc64le !linux
+
+package runtime
+
+// A dummy version of inVDSOPage for targets that don't use a VDSO.
+
+func inVDSOPage(pc uintptr) bool {
+ return false
+}
diff --git a/src/runtime/vdso_linux.go b/src/runtime/vdso_linux.go
new file mode 100644
index 0000000..6e29424
--- /dev/null
+++ b/src/runtime/vdso_linux.go
@@ -0,0 +1,293 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build 386 amd64 arm arm64 mips64 mips64le ppc64 ppc64le
+
+package runtime
+
+import "unsafe"
+
+// Look up symbols in the Linux vDSO.
+
+// This code was originally based on the sample Linux vDSO parser at
+// https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/vDSO/parse_vdso.c
+
+// This implements the ELF dynamic linking spec at
+// http://sco.com/developers/gabi/latest/ch5.dynamic.html
+
+// The version section is documented at
+// https://refspecs.linuxfoundation.org/LSB_3.2.0/LSB-Core-generic/LSB-Core-generic/symversion.html
+
+const (
+ _AT_SYSINFO_EHDR = 33
+
+ _PT_LOAD = 1 /* Loadable program segment */
+ _PT_DYNAMIC = 2 /* Dynamic linking information */
+
+ _DT_NULL = 0 /* Marks end of dynamic section */
+ _DT_HASH = 4 /* Dynamic symbol hash table */
+ _DT_STRTAB = 5 /* Address of string table */
+ _DT_SYMTAB = 6 /* Address of symbol table */
+ _DT_GNU_HASH = 0x6ffffef5 /* GNU-style dynamic symbol hash table */
+ _DT_VERSYM = 0x6ffffff0
+ _DT_VERDEF = 0x6ffffffc
+
+ _VER_FLG_BASE = 0x1 /* Version definition of file itself */
+
+ _SHN_UNDEF = 0 /* Undefined section */
+
+ _SHT_DYNSYM = 11 /* Dynamic linker symbol table */
+
+ _STT_FUNC = 2 /* Symbol is a code object */
+
+ _STT_NOTYPE = 0 /* Symbol type is not specified */
+
+ _STB_GLOBAL = 1 /* Global symbol */
+ _STB_WEAK = 2 /* Weak symbol */
+
+ _EI_NIDENT = 16
+
+ // Maximum indices for the array types used when traversing the vDSO ELF structures.
+ // Computed from architecture-specific max provided by vdso_linux_*.go
+ vdsoSymTabSize = vdsoArrayMax / unsafe.Sizeof(elfSym{})
+ vdsoDynSize = vdsoArrayMax / unsafe.Sizeof(elfDyn{})
+ vdsoSymStringsSize = vdsoArrayMax // byte
+ vdsoVerSymSize = vdsoArrayMax / 2 // uint16
+ vdsoHashSize = vdsoArrayMax / 4 // uint32
+
+ // vdsoBloomSizeScale is a scaling factor for gnuhash tables, which are uint32-indexed
+ // but contain uintptrs.
+ vdsoBloomSizeScale = unsafe.Sizeof(uintptr(0)) / 4 // uint32
+)
+
+/* How to extract and insert information held in the st_info field. */
+func _ELF_ST_BIND(val byte) byte { return val >> 4 }
+func _ELF_ST_TYPE(val byte) byte { return val & 0xf }
+
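+// vdsoSymbolKey describes one symbol to look up in the vDSO: its name, the
+// precomputed ELF (DT_HASH) and GNU (DT_GNU_HASH) hash values of that name,
+// and the address of the variable that receives the symbol's address once it
+// is found.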
+type vdsoSymbolKey struct {
+ name string
+ symHash uint32
+ gnuHash uint32
+ ptr *uintptr
+}
+
+type vdsoVersionKey struct {
+ version string
+ verHash uint32
+}
+
+type vdsoInfo struct {
+ valid bool
+
+ /* Load information */
+ loadAddr uintptr
+ loadOffset uintptr /* loadAddr - recorded vaddr */
+
+ /* Symbol table */
+ symtab *[vdsoSymTabSize]elfSym
+ symstrings *[vdsoSymStringsSize]byte
+ chain []uint32
+ bucket []uint32
+ symOff uint32
+ isGNUHash bool
+
+ /* Version table */
+ versym *[vdsoVerSymSize]uint16
+ verdef *elfVerdef
+}
+
+// see vdso_linux_*.go for vdsoSymbolKeys[] and vdso*Sym vars
+
+func vdsoInitFromSysinfoEhdr(info *vdsoInfo, hdr *elfEhdr) {
+ info.valid = false
+ info.loadAddr = uintptr(unsafe.Pointer(hdr))
+
+ pt := unsafe.Pointer(info.loadAddr + uintptr(hdr.e_phoff))
+
+ // We need two things from the segment table: the load offset
+ // and the dynamic table.
+ var foundVaddr bool
+ var dyn *[vdsoDynSize]elfDyn
+ for i := uint16(0); i < hdr.e_phnum; i++ {
+ pt := (*elfPhdr)(add(pt, uintptr(i)*unsafe.Sizeof(elfPhdr{})))
+ switch pt.p_type {
+ case _PT_LOAD:
+ if !foundVaddr {
+ foundVaddr = true
+ info.loadOffset = info.loadAddr + uintptr(pt.p_offset-pt.p_vaddr)
+ }
+
+ case _PT_DYNAMIC:
+ dyn = (*[vdsoDynSize]elfDyn)(unsafe.Pointer(info.loadAddr + uintptr(pt.p_offset)))
+ }
+ }
+
+ if !foundVaddr || dyn == nil {
+ return // Failed
+ }
+
+ // Fish out the useful bits of the dynamic table.
+
+ var hash, gnuhash *[vdsoHashSize]uint32
+ info.symstrings = nil
+ info.symtab = nil
+ info.versym = nil
+ info.verdef = nil
+ for i := 0; dyn[i].d_tag != _DT_NULL; i++ {
+ dt := &dyn[i]
+ p := info.loadOffset + uintptr(dt.d_val)
+ switch dt.d_tag {
+ case _DT_STRTAB:
+ info.symstrings = (*[vdsoSymStringsSize]byte)(unsafe.Pointer(p))
+ case _DT_SYMTAB:
+ info.symtab = (*[vdsoSymTabSize]elfSym)(unsafe.Pointer(p))
+ case _DT_HASH:
+ hash = (*[vdsoHashSize]uint32)(unsafe.Pointer(p))
+ case _DT_GNU_HASH:
+ gnuhash = (*[vdsoHashSize]uint32)(unsafe.Pointer(p))
+ case _DT_VERSYM:
+ info.versym = (*[vdsoVerSymSize]uint16)(unsafe.Pointer(p))
+ case _DT_VERDEF:
+ info.verdef = (*elfVerdef)(unsafe.Pointer(p))
+ }
+ }
+
+ if info.symstrings == nil || info.symtab == nil || (hash == nil && gnuhash == nil) {
+ return // Failed
+ }
+
+ if info.verdef == nil {
+ info.versym = nil
+ }
+
+ if gnuhash != nil {
+ // Parse the GNU hash table header.
+ nbucket := gnuhash[0]
+ info.symOff = gnuhash[1]
+ bloomSize := gnuhash[2]
+ info.bucket = gnuhash[4+bloomSize*uint32(vdsoBloomSizeScale):][:nbucket]
+ info.chain = gnuhash[4+bloomSize*uint32(vdsoBloomSizeScale)+nbucket:]
+ info.isGNUHash = true
+ } else {
+ // Parse the hash table header.
+ nbucket := hash[0]
+ nchain := hash[1]
+ info.bucket = hash[2 : 2+nbucket]
+ info.chain = hash[2+nbucket : 2+nbucket+nchain]
+ }
+
+ // That's all we need.
+ info.valid = true
+}
+
+func vdsoFindVersion(info *vdsoInfo, ver *vdsoVersionKey) int32 {
+ if !info.valid {
+ return 0
+ }
+
+ def := info.verdef
+ for {
+ if def.vd_flags&_VER_FLG_BASE == 0 {
+ aux := (*elfVerdaux)(add(unsafe.Pointer(def), uintptr(def.vd_aux)))
+ if def.vd_hash == ver.verHash && ver.version == gostringnocopy(&info.symstrings[aux.vda_name]) {
+ return int32(def.vd_ndx & 0x7fff)
+ }
+ }
+
+ if def.vd_next == 0 {
+ break
+ }
+ def = (*elfVerdef)(add(unsafe.Pointer(def), uintptr(def.vd_next)))
+ }
+
+ return -1 // cannot match any version
+}
+
+func vdsoParseSymbols(info *vdsoInfo, version int32) {
+ if !info.valid {
+ return
+ }
+
+ apply := func(symIndex uint32, k vdsoSymbolKey) bool {
+ sym := &info.symtab[symIndex]
+ typ := _ELF_ST_TYPE(sym.st_info)
+ bind := _ELF_ST_BIND(sym.st_info)
+ // On ppc64x, VDSO functions are of type _STT_NOTYPE.
+ if typ != _STT_FUNC && typ != _STT_NOTYPE || bind != _STB_GLOBAL && bind != _STB_WEAK || sym.st_shndx == _SHN_UNDEF {
+ return false
+ }
+ if k.name != gostringnocopy(&info.symstrings[sym.st_name]) {
+ return false
+ }
+ // Check symbol version.
+ if info.versym != nil && version != 0 && int32(info.versym[symIndex]&0x7fff) != version {
+ return false
+ }
+
+ *k.ptr = info.loadOffset + uintptr(sym.st_value)
+ return true
+ }
+
+ if !info.isGNUHash {
+ // Old-style DT_HASH table.
+ for _, k := range vdsoSymbolKeys {
+ for chain := info.bucket[k.symHash%uint32(len(info.bucket))]; chain != 0; chain = info.chain[chain] {
+ if apply(chain, k) {
+ break
+ }
+ }
+ }
+ return
+ }
+
+ // New-style DT_GNU_HASH table.
+ for _, k := range vdsoSymbolKeys {
+ symIndex := info.bucket[k.gnuHash%uint32(len(info.bucket))]
+ if symIndex < info.symOff {
+ continue
+ }
+ for ; ; symIndex++ {
+ hash := info.chain[symIndex-info.symOff]
+ if hash|1 == k.gnuHash|1 {
+ // Found a hash match.
+ if apply(symIndex, k) {
+ break
+ }
+ }
+ if hash&1 != 0 {
+ // End of chain.
+ break
+ }
+ }
+ }
+}
+
+func vdsoauxv(tag, val uintptr) {
+ switch tag {
+ case _AT_SYSINFO_EHDR:
+ if val == 0 {
+ // Something went wrong
+ return
+ }
+ var info vdsoInfo
+ // TODO(rsc): I don't understand why the compiler thinks info escapes
+ // when passed to the three functions below.
+ info1 := (*vdsoInfo)(noescape(unsafe.Pointer(&info)))
+ vdsoInitFromSysinfoEhdr(info1, (*elfEhdr)(unsafe.Pointer(val)))
+ vdsoParseSymbols(info1, vdsoFindVersion(info1, &vdsoLinuxVersion))
+ }
+}
+
+// inVDSOPage reports whether pc is on the VDSO page.
+//go:nosplit
+func inVDSOPage(pc uintptr) bool {
+ for _, k := range vdsoSymbolKeys {
+ if *k.ptr != 0 {
+ page := *k.ptr &^ (physPageSize - 1)
+ return pc >= page && pc < page+physPageSize
+ }
+ }
+ return false
+}
diff --git a/src/runtime/vdso_linux_386.go b/src/runtime/vdso_linux_386.go
new file mode 100644
index 0000000..5092c7c
--- /dev/null
+++ b/src/runtime/vdso_linux_386.go
@@ -0,0 +1,21 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/x86/galign.go arch.MAXWIDTH initialization, but must also
+ // be constrained to max +ve int.
+ vdsoArrayMax = 1<<31 - 1
+)
+
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_2.6", 0x3ae75f6}
+
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__vdso_clock_gettime", 0xd35ec75, 0x6e43a318, &vdsoClockgettimeSym},
+}
+
+// initialize to fall back to syscall
+var vdsoClockgettimeSym uintptr = 0
diff --git a/src/runtime/vdso_linux_amd64.go b/src/runtime/vdso_linux_amd64.go
new file mode 100644
index 0000000..4e9f748
--- /dev/null
+++ b/src/runtime/vdso_linux_amd64.go
@@ -0,0 +1,23 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/amd64/galign.go arch.MAXWIDTH initialization.
+ vdsoArrayMax = 1<<50 - 1
+)
+
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_2.6", 0x3ae75f6}
+
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__vdso_gettimeofday", 0x315ca59, 0xb01bca00, &vdsoGettimeofdaySym},
+ {"__vdso_clock_gettime", 0xd35ec75, 0x6e43a318, &vdsoClockgettimeSym},
+}
+
+var (
+ vdsoGettimeofdaySym uintptr
+ vdsoClockgettimeSym uintptr
+)
diff --git a/src/runtime/vdso_linux_arm.go b/src/runtime/vdso_linux_arm.go
new file mode 100644
index 0000000..ac3bdcf
--- /dev/null
+++ b/src/runtime/vdso_linux_arm.go
@@ -0,0 +1,21 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/arm/galign.go arch.MAXWIDTH initialization, but must also
+ // be constrained to max +ve int.
+ vdsoArrayMax = 1<<31 - 1
+)
+
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_2.6", 0x3ae75f6}
+
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__vdso_clock_gettime", 0xd35ec75, 0x6e43a318, &vdsoClockgettimeSym},
+}
+
+// initialize to fall back to syscall
+var vdsoClockgettimeSym uintptr = 0
diff --git a/src/runtime/vdso_linux_arm64.go b/src/runtime/vdso_linux_arm64.go
new file mode 100644
index 0000000..2f003cd
--- /dev/null
+++ b/src/runtime/vdso_linux_arm64.go
@@ -0,0 +1,21 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/arm64/galign.go arch.MAXWIDTH initialization.
+ vdsoArrayMax = 1<<50 - 1
+)
+
+// The key and version are listed in man 7 vdso (aarch64).
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_2.6.39", 0x75fcb89}
+
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__kernel_clock_gettime", 0xd35ec75, 0x6e43a318, &vdsoClockgettimeSym},
+}
+
+// initialize to fall back to syscall
+var vdsoClockgettimeSym uintptr = 0
diff --git a/src/runtime/vdso_linux_mips64x.go b/src/runtime/vdso_linux_mips64x.go
new file mode 100644
index 0000000..3a0f947
--- /dev/null
+++ b/src/runtime/vdso_linux_mips64x.go
@@ -0,0 +1,28 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build mips64 mips64le
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/mips64/galign.go arch.MAXWIDTH initialization.
+ vdsoArrayMax = 1<<50 - 1
+)
+
+// See man 7 vdso (mips).
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_2.6", 0x3ae75f6}
+
+// The symbol name is not __kernel_clock_gettime as suggested by the manpage;
+// according to Linux source code it should be __vdso_clock_gettime instead.
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__vdso_clock_gettime", 0xd35ec75, 0x6e43a318, &vdsoClockgettimeSym},
+}
+
+// initialize to fall back to syscall
+var (
+ vdsoClockgettimeSym uintptr = 0
+)
diff --git a/src/runtime/vdso_linux_ppc64x.go b/src/runtime/vdso_linux_ppc64x.go
new file mode 100644
index 0000000..f30946e
--- /dev/null
+++ b/src/runtime/vdso_linux_ppc64x.go
@@ -0,0 +1,25 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+// +build ppc64 ppc64le
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/ppc64/galign.go arch.MAXWIDTH initialization.
+ vdsoArrayMax = 1<<50 - 1
+)
+
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_2.6.15", 0x75fcba5}
+
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__kernel_clock_gettime", 0xb0cd725, 0xdfa941fd, &vdsoClockgettimeSym},
+}
+
+// initialize with vsyscall fallbacks
+var (
+ vdsoClockgettimeSym uintptr = 0
+)
diff --git a/src/runtime/vlop_386.s b/src/runtime/vlop_386.s
new file mode 100644
index 0000000..b478ff8
--- /dev/null
+++ b/src/runtime/vlop_386.s
@@ -0,0 +1,56 @@
+// Inferno's libkern/vlop-386.s
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/vlop-386.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+#include "textflag.h"
+
+/*
+ * C runtime for 64-bit divide.
+ */
+
+// runtime·_mul64by32(lo64 *uint64, a uint64, b uint32) (hi32 uint32)
+// sets *lo64 = low 64 bits of the 96-bit product a*b; returns the high 32 bits.
+TEXT runtime·_mul64by32(SB), NOSPLIT, $0
+ MOVL lo64+0(FP), CX
+ MOVL a_lo+4(FP), AX
+ MULL b+12(FP)
+ MOVL AX, 0(CX)
+ MOVL DX, BX
+ MOVL a_hi+8(FP), AX
+ MULL b+12(FP)
+ ADDL AX, BX
+ ADCL $0, DX
+ MOVL BX, 4(CX)
+ MOVL DX, AX
+ MOVL AX, hi32+16(FP)
+ RET
+
+TEXT runtime·_div64by32(SB), NOSPLIT, $0
+ MOVL r+12(FP), CX
+ MOVL a_lo+0(FP), AX
+ MOVL a_hi+4(FP), DX
+ DIVL b+8(FP)
+ MOVL DX, 0(CX)
+ MOVL AX, q+16(FP)
+ RET
diff --git a/src/runtime/vlop_arm.s b/src/runtime/vlop_arm.s
new file mode 100644
index 0000000..9e19938
--- /dev/null
+++ b/src/runtime/vlop_arm.s
@@ -0,0 +1,260 @@
+// Inferno's libkern/vlop-arm.s
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/vlop-arm.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// func runtime·udiv(n, d uint32) (q, r uint32)
+// The compiler knows the register usage of this function.
+// Reference:
+// Sloss, Andrew et al.; ARM System Developer's Guide: Designing and Optimizing System Software
+// Morgan Kaufmann; 1st edition (April 8, 2004), ISBN 978-1558608740
+#define Rq R0 // input d, output q
+#define Rr R1 // input n, output r
+#define Rs R2 // three temporary variables: Rs, RM, Ra
+#define RM R3
+#define Ra R11
+
+// Be careful: Ra == R11 will be used by the linker for synthesized instructions.
+// Note: this function does not have a frame.
+TEXT runtime·udiv(SB),NOSPLIT|NOFRAME,$0
+ MOVBU internal∕cpu·ARM+const_offsetARMHasIDIVA(SB), Ra
+ CMP $0, Ra
+ BNE udiv_hardware
+
+ CLZ Rq, Rs // find normalizing shift
+ MOVW.S Rq<<Rs, Ra
+ MOVW $fast_udiv_tab<>-64(SB), RM
+ ADD.NE Ra>>25, RM, Ra // index by most significant 7 bits of divisor
+ MOVBU.NE (Ra), Ra
+
+ SUB.S $7, Rs
+ RSB $0, Rq, RM // M = -q
+ MOVW.PL Ra<<Rs, Rq
+
+ // 1st Newton iteration
+ MUL.PL RM, Rq, Ra // a = -q*d
+ BMI udiv_by_large_d
+ MULAWT Ra, Rq, Rq, Rq // q approx q-(q*q*d>>32)
+ TEQ RM->1, RM // check for d=0 or d=1
+
+ // 2nd Newton iteration
+ MUL.NE RM, Rq, Ra
+ MOVW.NE $0, Rs
+ MULAL.NE Rq, Ra, (Rq,Rs)
+ BEQ udiv_by_0_or_1
+
+ // q now accurate enough for a remainder r, 0<=r<3*d
+ MULLU Rq, Rr, (Rq,Rs) // q = (r * q) >> 32
+ ADD RM, Rr, Rr // r = n - d
+ MULA RM, Rq, Rr, Rr // r = n - (q+1)*d
+
+ // since 0 <= n-q*d < 3*d; thus -d <= r < 2*d
+ CMN RM, Rr // t = r-d
+ SUB.CS RM, Rr, Rr // if (t<-d || t>=0) r=r+d
+ ADD.CC $1, Rq
+ ADD.PL RM<<1, Rr
+ ADD.PL $2, Rq
+ RET
+
+// use hardware divider
+udiv_hardware:
+ DIVUHW Rq, Rr, Rs
+ MUL Rs, Rq, RM
+ RSB Rr, RM, Rr
+ MOVW Rs, Rq
+ RET
+
+udiv_by_large_d:
+ // at this point we know d>=2^(31-6)=2^25
+ SUB $4, Ra, Ra
+ RSB $0, Rs, Rs
+ MOVW Ra>>Rs, Rq
+ MULLU Rq, Rr, (Rq,Rs)
+ MULA RM, Rq, Rr, Rr
+
+ // q now accurate enough for a remainder r, 0<=r<4*d
+ CMN Rr>>1, RM // if(r/2 >= d)
+ ADD.CS RM<<1, Rr
+ ADD.CS $2, Rq
+ CMN Rr, RM
+ ADD.CS RM, Rr
+ ADD.CS $1, Rq
+ RET
+
+udiv_by_0_or_1:
+ // carry set if d==1, carry clear if d==0
+ BCC udiv_by_0
+ MOVW Rr, Rq
+ MOVW $0, Rr
+ RET
+
+udiv_by_0:
+ MOVW $runtime·panicdivide(SB), R11
+ B (R11)
+
+// var tab [64]byte
+// tab[0] = 255; for i := 1; i <= 63; i++ { tab[i] = (1<<14)/(64+i) }
+// laid out here as little-endian uint32s
+DATA fast_udiv_tab<>+0x00(SB)/4, $0xf4f8fcff
+DATA fast_udiv_tab<>+0x04(SB)/4, $0xe6eaedf0
+DATA fast_udiv_tab<>+0x08(SB)/4, $0xdadde0e3
+DATA fast_udiv_tab<>+0x0c(SB)/4, $0xcfd2d4d7
+DATA fast_udiv_tab<>+0x10(SB)/4, $0xc5c7cacc
+DATA fast_udiv_tab<>+0x14(SB)/4, $0xbcbec0c3
+DATA fast_udiv_tab<>+0x18(SB)/4, $0xb4b6b8ba
+DATA fast_udiv_tab<>+0x1c(SB)/4, $0xacaeb0b2
+DATA fast_udiv_tab<>+0x20(SB)/4, $0xa5a7a8aa
+DATA fast_udiv_tab<>+0x24(SB)/4, $0x9fa0a2a3
+DATA fast_udiv_tab<>+0x28(SB)/4, $0x999a9c9d
+DATA fast_udiv_tab<>+0x2c(SB)/4, $0x93949697
+DATA fast_udiv_tab<>+0x30(SB)/4, $0x8e8f9092
+DATA fast_udiv_tab<>+0x34(SB)/4, $0x898a8c8d
+DATA fast_udiv_tab<>+0x38(SB)/4, $0x85868788
+DATA fast_udiv_tab<>+0x3c(SB)/4, $0x81828384
+GLOBL fast_udiv_tab<>(SB), RODATA, $64
+
+// The linker will pass numerator in R8
+#define Rn R8
+// The linker expects the result in RTMP
+#define RTMP R11
+
+TEXT runtime·_divu(SB), NOSPLIT, $16-0
+ // It's not strictly true that there are no local pointers.
+ // It could be that the saved registers Rq, Rr, Rs, and Rm
+ // contain pointers. However, the only way this can matter
+ // is if the stack grows (which it can't, udiv is nosplit)
+ // or if a fault happens and more frames are added to
+ // the stack due to deferred functions.
+ // In the latter case, the stack can grow arbitrarily,
+ // and garbage collection can happen, and those
+ // operations care about pointers, but in that case
+ // the calling frame is dead, and so are the saved
+ // registers. So we can claim there are no pointers here.
+ NO_LOCAL_POINTERS
+ MOVW Rq, 4(R13)
+ MOVW Rr, 8(R13)
+ MOVW Rs, 12(R13)
+ MOVW RM, 16(R13)
+
+ MOVW Rn, Rr /* numerator */
+ MOVW g_m(g), Rq
+ MOVW m_divmod(Rq), Rq /* denominator */
+ BL runtime·udiv(SB)
+ MOVW Rq, RTMP
+ MOVW 4(R13), Rq
+ MOVW 8(R13), Rr
+ MOVW 12(R13), Rs
+ MOVW 16(R13), RM
+ RET
+
+TEXT runtime·_modu(SB), NOSPLIT, $16-0
+ NO_LOCAL_POINTERS
+ MOVW Rq, 4(R13)
+ MOVW Rr, 8(R13)
+ MOVW Rs, 12(R13)
+ MOVW RM, 16(R13)
+
+ MOVW Rn, Rr /* numerator */
+ MOVW g_m(g), Rq
+ MOVW m_divmod(Rq), Rq /* denominator */
+ BL runtime·udiv(SB)
+ MOVW Rr, RTMP
+ MOVW 4(R13), Rq
+ MOVW 8(R13), Rr
+ MOVW 12(R13), Rs
+ MOVW 16(R13), RM
+ RET
+
+TEXT runtime·_div(SB),NOSPLIT,$16-0
+ NO_LOCAL_POINTERS
+ MOVW Rq, 4(R13)
+ MOVW Rr, 8(R13)
+ MOVW Rs, 12(R13)
+ MOVW RM, 16(R13)
+ MOVW Rn, Rr /* numerator */
+ MOVW g_m(g), Rq
+ MOVW m_divmod(Rq), Rq /* denominator */
+ CMP $0, Rr
+ BGE d1
+ RSB $0, Rr, Rr
+ CMP $0, Rq
+ BGE d2
+ RSB $0, Rq, Rq
+d0:
+ BL runtime·udiv(SB) /* none/both neg */
+ MOVW Rq, RTMP
+ B out1
+d1:
+ CMP $0, Rq
+ BGE d0
+ RSB $0, Rq, Rq
+d2:
+ BL runtime·udiv(SB) /* one neg */
+ RSB $0, Rq, RTMP
+out1:
+ MOVW 4(R13), Rq
+ MOVW 8(R13), Rr
+ MOVW 12(R13), Rs
+ MOVW 16(R13), RM
+ RET
+
+TEXT runtime·_mod(SB),NOSPLIT,$16-0
+ NO_LOCAL_POINTERS
+ MOVW Rq, 4(R13)
+ MOVW Rr, 8(R13)
+ MOVW Rs, 12(R13)
+ MOVW RM, 16(R13)
+ MOVW Rn, Rr /* numerator */
+ MOVW g_m(g), Rq
+ MOVW m_divmod(Rq), Rq /* denominator */
+ CMP $0, Rq
+ RSB.LT $0, Rq, Rq
+ CMP $0, Rr
+ BGE m1
+ RSB $0, Rr, Rr
+ BL runtime·udiv(SB) /* neg numerator */
+ RSB $0, Rr, RTMP
+ B out
+m1:
+ BL runtime·udiv(SB) /* pos numerator */
+ MOVW Rr, RTMP
+out:
+ MOVW 4(R13), Rq
+ MOVW 8(R13), Rr
+ MOVW 12(R13), Rs
+ MOVW 16(R13), RM
+ RET
+
+// _mul64by32 and _div64by32 not implemented on arm
+TEXT runtime·_mul64by32(SB), NOSPLIT, $0
+ MOVW $0, R0
+ MOVW (R0), R1 // crash
+
+TEXT runtime·_div64by32(SB), NOSPLIT, $0
+ MOVW $0, R0
+ MOVW (R0), R1 // crash
diff --git a/src/runtime/vlop_arm_test.go b/src/runtime/vlop_arm_test.go
new file mode 100644
index 0000000..015126a
--- /dev/null
+++ b/src/runtime/vlop_arm_test.go
@@ -0,0 +1,128 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "testing"
+)
+
+// arm soft division benchmarks adapted from
+// https://ridiculousfish.com/files/division_benchmarks.tar.gz
+
+const numeratorsSize = 1 << 21
+
+var numerators = randomNumerators()
+
+type randstate struct {
+ hi, lo uint32
+}
+
+func (r *randstate) rand() uint32 {
+ r.hi = r.hi<<16 + r.hi>>16
+ r.hi += r.lo
+ r.lo += r.hi
+ return r.hi
+}
+
+func randomNumerators() []uint32 {
+ numerators := make([]uint32, numeratorsSize)
+ random := &randstate{2147483563, 2147483563 ^ 0x49616E42}
+ for i := range numerators {
+ numerators[i] = random.rand()
+ }
+ return numerators
+}
+
+func bmUint32Div(divisor uint32, b *testing.B) {
+ var sum uint32
+ for i := 0; i < b.N; i++ {
+ sum += numerators[i&(numeratorsSize-1)] / divisor
+ }
+}
+
+func BenchmarkUint32Div7(b *testing.B) { bmUint32Div(7, b) }
+func BenchmarkUint32Div37(b *testing.B) { bmUint32Div(37, b) }
+func BenchmarkUint32Div123(b *testing.B) { bmUint32Div(123, b) }
+func BenchmarkUint32Div763(b *testing.B) { bmUint32Div(763, b) }
+func BenchmarkUint32Div1247(b *testing.B) { bmUint32Div(1247, b) }
+func BenchmarkUint32Div9305(b *testing.B) { bmUint32Div(9305, b) }
+func BenchmarkUint32Div13307(b *testing.B) { bmUint32Div(13307, b) }
+func BenchmarkUint32Div52513(b *testing.B) { bmUint32Div(52513, b) }
+func BenchmarkUint32Div60978747(b *testing.B) { bmUint32Div(60978747, b) }
+func BenchmarkUint32Div106956295(b *testing.B) { bmUint32Div(106956295, b) }
+
+func bmUint32Mod(divisor uint32, b *testing.B) {
+ var sum uint32
+ for i := 0; i < b.N; i++ {
+ sum += numerators[i&(numeratorsSize-1)] % divisor
+ }
+}
+
+func BenchmarkUint32Mod7(b *testing.B) { bmUint32Mod(7, b) }
+func BenchmarkUint32Mod37(b *testing.B) { bmUint32Mod(37, b) }
+func BenchmarkUint32Mod123(b *testing.B) { bmUint32Mod(123, b) }
+func BenchmarkUint32Mod763(b *testing.B) { bmUint32Mod(763, b) }
+func BenchmarkUint32Mod1247(b *testing.B) { bmUint32Mod(1247, b) }
+func BenchmarkUint32Mod9305(b *testing.B) { bmUint32Mod(9305, b) }
+func BenchmarkUint32Mod13307(b *testing.B) { bmUint32Mod(13307, b) }
+func BenchmarkUint32Mod52513(b *testing.B) { bmUint32Mod(52513, b) }
+func BenchmarkUint32Mod60978747(b *testing.B) { bmUint32Mod(60978747, b) }
+func BenchmarkUint32Mod106956295(b *testing.B) { bmUint32Mod(106956295, b) }
+
+func TestUsplit(t *testing.T) {
+ var den uint32 = 1000000
+ for _, x := range []uint32{0, 1, 999999, 1000000, 1010101, 0xFFFFFFFF} {
+ q1, r1 := runtime.Usplit(x)
+ q2, r2 := x/den, x%den
+ if q1 != q2 || r1 != r2 {
+ t.Errorf("%d/1e6, %d%%1e6 = %d, %d, want %d, %d", x, x, q1, r1, q2, r2)
+ }
+ }
+}
+
+//go:noinline
+func armFloatWrite(a *[129]float64) {
+ // This used to miscompile on arm5.
+ // The offset is too big to fit in a load.
+ // So the code does:
+ // ldr r0, [sp, #8]
+ // bl 6f690 <_sfloat>
+ // ldr fp, [pc, #32] ; (address of 128.0)
+ // vldr d0, [fp]
+ // ldr fp, [pc, #28] ; (1024)
+ // add fp, fp, r0
+ // vstr d0, [fp]
+ // The software floating-point emulator gives up on the add.
+ // This causes the store to not work.
+ // See issue 15440.
+ a[128] = 128.0
+}
+func TestArmFloatBigOffsetWrite(t *testing.T) {
+ var a [129]float64
+ for i := 0; i < 128; i++ {
+ a[i] = float64(i)
+ }
+ armFloatWrite(&a)
+ for i, x := range a {
+ if x != float64(i) {
+ t.Errorf("bad entry %d:%f\n", i, x)
+ }
+ }
+}
+
+//go:noinline
+func armFloatRead(a *[129]float64) float64 {
+ return a[128]
+}
+func TestArmFloatBigOffsetRead(t *testing.T) {
+ var a [129]float64
+ for i := 0; i < 129; i++ {
+ a[i] = float64(i)
+ }
+ if x := armFloatRead(&a); x != 128.0 {
+ t.Errorf("bad value %f\n", x)
+ }
+}
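
Aside on the benchmarks above: they index the table with i&(numeratorsSize-1) rather than i%numeratorsSize so that the only division inside the timed loop is the one under test; because numeratorsSize is a power of two (1<<21), the mask is equivalent to the modulo for non-negative i. A tiny illustrative check (not part of the diff):

	package main

	import "fmt"

	func main() {
		const size = 1 << 21 // numeratorsSize
		for _, i := range []int{0, 3, size - 1, size, size + 3, 5000000} {
			fmt.Println(i&(size-1) == i%size) // true for every non-negative i
		}
	}
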
diff --git a/src/runtime/vlrt.go b/src/runtime/vlrt.go
new file mode 100644
index 0000000..996c061
--- /dev/null
+++ b/src/runtime/vlrt.go
@@ -0,0 +1,277 @@
+// Inferno's libkern/vlrt-arm.c
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/vlrt-arm.c
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+// +build arm 386 mips mipsle
+
+package runtime
+
+import "unsafe"
+
+const (
+ sign32 = 1 << (32 - 1)
+ sign64 = 1 << (64 - 1)
+)
+
+func float64toint64(d float64) (y uint64) {
+ _d2v(&y, d)
+ return
+}
+
+func float64touint64(d float64) (y uint64) {
+ _d2v(&y, d)
+ return
+}
+
+func int64tofloat64(y int64) float64 {
+ if y < 0 {
+ return -uint64tofloat64(-uint64(y))
+ }
+ return uint64tofloat64(uint64(y))
+}
+
+func uint64tofloat64(y uint64) float64 {
+ hi := float64(uint32(y >> 32))
+ lo := float64(uint32(y))
+ d := hi*(1<<32) + lo
+ return d
+}
+
+func _d2v(y *uint64, d float64) {
+ x := *(*uint64)(unsafe.Pointer(&d))
+
+ xhi := uint32(x>>32)&0xfffff | 0x100000
+ xlo := uint32(x)
+ sh := 1075 - int32(uint32(x>>52)&0x7ff)
+
+ var ylo, yhi uint32
+ if sh >= 0 {
+ sh := uint32(sh)
+ /* v = (hi||lo) >> sh */
+ if sh < 32 {
+ if sh == 0 {
+ ylo = xlo
+ yhi = xhi
+ } else {
+ ylo = xlo>>sh | xhi<<(32-sh)
+ yhi = xhi >> sh
+ }
+ } else {
+ if sh == 32 {
+ ylo = xhi
+ } else if sh < 64 {
+ ylo = xhi >> (sh - 32)
+ }
+ }
+ } else {
+ /* v = (hi||lo) << -sh */
+ sh := uint32(-sh)
+ if sh <= 11 {
+ ylo = xlo << sh
+ yhi = xhi<<sh | xlo>>(32-sh)
+ } else {
+ /* overflow */
+ yhi = uint32(d) /* causes something awful */
+ }
+ }
+ if x&sign64 != 0 {
+ if ylo != 0 {
+ ylo = -ylo
+ yhi = ^yhi
+ } else {
+ yhi = -yhi
+ }
+ }
+
+ *y = uint64(yhi)<<32 | uint64(ylo)
+}
+func uint64div(n, d uint64) uint64 {
+ // Check for 32 bit operands
+ if uint32(n>>32) == 0 && uint32(d>>32) == 0 {
+ if uint32(d) == 0 {
+ panicdivide()
+ }
+ return uint64(uint32(n) / uint32(d))
+ }
+ q, _ := dodiv(n, d)
+ return q
+}
+
+func uint64mod(n, d uint64) uint64 {
+ // Check for 32 bit operands
+ if uint32(n>>32) == 0 && uint32(d>>32) == 0 {
+ if uint32(d) == 0 {
+ panicdivide()
+ }
+ return uint64(uint32(n) % uint32(d))
+ }
+ _, r := dodiv(n, d)
+ return r
+}
+
+func int64div(n, d int64) int64 {
+ // Check for 32 bit operands
+ if int64(int32(n)) == n && int64(int32(d)) == d {
+ if int32(n) == -0x80000000 && int32(d) == -1 {
+ // special case: 32-bit -0x80000000 / -1 = -0x80000000,
+ // but 64-bit -0x80000000 / -1 = 0x80000000.
+ return 0x80000000
+ }
+ if int32(d) == 0 {
+ panicdivide()
+ }
+ return int64(int32(n) / int32(d))
+ }
+
+ nneg := n < 0
+ dneg := d < 0
+ if nneg {
+ n = -n
+ }
+ if dneg {
+ d = -d
+ }
+ uq, _ := dodiv(uint64(n), uint64(d))
+ q := int64(uq)
+ if nneg != dneg {
+ q = -q
+ }
+ return q
+}
+
+//go:nosplit
+func int64mod(n, d int64) int64 {
+ // Check for 32 bit operands
+ if int64(int32(n)) == n && int64(int32(d)) == d {
+ if int32(d) == 0 {
+ panicdivide()
+ }
+ return int64(int32(n) % int32(d))
+ }
+
+ nneg := n < 0
+ if nneg {
+ n = -n
+ }
+ if d < 0 {
+ d = -d
+ }
+ _, ur := dodiv(uint64(n), uint64(d))
+ r := int64(ur)
+ if nneg {
+ r = -r
+ }
+ return r
+}
+
+//go:noescape
+func _mul64by32(lo64 *uint64, a uint64, b uint32) (hi32 uint32)
+
+//go:noescape
+func _div64by32(a uint64, b uint32, r *uint32) (q uint32)
+
+//go:nosplit
+func dodiv(n, d uint64) (q, r uint64) {
+ if GOARCH == "arm" {
+ // arm doesn't have a division instruction, so
+ // slowdodiv is the best that we can do.
+ return slowdodiv(n, d)
+ }
+
+ if GOARCH == "mips" || GOARCH == "mipsle" {
+ // No _div64by32 on mips and using only _mul64by32 doesn't bring much benefit
+ return slowdodiv(n, d)
+ }
+
+ if d > n {
+ return 0, n
+ }
+
+ if uint32(d>>32) != 0 {
+ t := uint32(n>>32) / uint32(d>>32)
+ var lo64 uint64
+ hi32 := _mul64by32(&lo64, d, t)
+ if hi32 != 0 || lo64 > n {
+ return slowdodiv(n, d)
+ }
+ return uint64(t), n - lo64
+ }
+
+ // d is 32 bit
+ var qhi uint32
+ if uint32(n>>32) >= uint32(d) {
+ if uint32(d) == 0 {
+ panicdivide()
+ }
+ qhi = uint32(n>>32) / uint32(d)
+ n -= uint64(uint32(d)*qhi) << 32
+ } else {
+ qhi = 0
+ }
+
+ var rlo uint32
+ qlo := _div64by32(n, uint32(d), &rlo)
+ return uint64(qhi)<<32 + uint64(qlo), uint64(rlo)
+}
+
+//go:nosplit
+func slowdodiv(n, d uint64) (q, r uint64) {
+ if d == 0 {
+ panicdivide()
+ }
+
+ // Set up the divisor and find the number of iterations needed.
+ capn := n
+ if n >= sign64 {
+ capn = sign64
+ }
+ i := 0
+ for d < capn {
+ d <<= 1
+ i++
+ }
+
+ for ; i >= 0; i-- {
+ q <<= 1
+ if n >= d {
+ n -= d
+ q |= 1
+ }
+ d >>= 1
+ }
+ return q, n
+}
+
+// Floating point control word values.
+// Bits 0-5 are bits to disable floating-point exceptions.
+// Bits 8-9 are the precision control:
+// 0 = single precision a.k.a. float32
+// 2 = double precision a.k.a. float64
+// Bits 10-11 are the rounding mode:
+// 0 = round to nearest (even on a tie)
+// 3 = round toward zero
+var (
+ controlWord64 uint16 = 0x3f + 2<<8 + 0<<10
+ controlWord64trunc uint16 = 0x3f + 2<<8 + 3<<10
+)
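
Two reading aids for vlrt.go above, both illustrative only. First, slowdodiv is plain shift-and-subtract long division: the divisor is doubled until it reaches the numerator's magnitude, then one quotient bit is produced per right shift. A worked trace of slowdodiv(100, 7):

	setup:  d: 7 -> 14 -> 28 -> 56 -> 112, i = 4
	i=4: 100 < 112            q=0b0       d=56
	i=3: 100 >= 56, n=44      q=0b01      d=28
	i=2:  44 >= 28, n=16      q=0b011     d=14
	i=1:  16 >= 14, n=2       q=0b0111    d=7
	i=0:   2 < 7              q=0b01110   d=3
	result: q=14, r=2   (100 = 14*7 + 2)

Second, the two control-word constants evaluate to controlWord64 = 0x23f (all six exceptions masked, double precision, round to nearest) and controlWord64trunc = 0xe3f (the same, but round toward zero), which is exactly the bit layout described in the comment above them.
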
diff --git a/src/runtime/wincallback.go b/src/runtime/wincallback.go
new file mode 100644
index 0000000..fb45222
--- /dev/null
+++ b/src/runtime/wincallback.go
@@ -0,0 +1,95 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build ignore
+
+// Generate Windows callback assembly file.
+
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "os"
+)
+
+const maxCallback = 2000
+
+func genasm386Amd64() {
+ var buf bytes.Buffer
+
+ buf.WriteString(`// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.
+
+// +build 386 amd64
+// runtime·callbackasm is called by external code to
+// execute a Go-implemented callback function. It is not
+// called from the start; instead, runtime·compilecallback
+// always returns an address at some offset into runtime·callbackasm,
+// so different callbacks start at a different
+// CALL instruction in runtime·callbackasm. This determines
+// which Go callback function is executed later on.
+
+TEXT runtime·callbackasm(SB),7,$0
+`)
+ for i := 0; i < maxCallback; i++ {
+ buf.WriteString("\tCALL\truntime·callbackasm1(SB)\n")
+ }
+
+ filename := "zcallback_windows.s"
+ err := os.WriteFile(filename, buf.Bytes(), 0666)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "wincallback: %s\n", err)
+ os.Exit(2)
+ }
+}
+
+func genasmArm() {
+ var buf bytes.Buffer
+
+ buf.WriteString(`// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.
+
+// External code calls into callbackasm at an offset corresponding
+// to the callback index. Callbackasm is a table of MOV and B instructions.
+// The MOV instruction loads R12 with the callback index, and the
+// B instruction branches to callbackasm1.
+// callbackasm1 takes the callback index from R12 and
+// indexes into an array that stores information about each callback.
+// It then calls the Go implementation for that callback.
+#include "textflag.h"
+
+TEXT runtime·callbackasm(SB),NOSPLIT|NOFRAME,$0
+`)
+ for i := 0; i < maxCallback; i++ {
+ buf.WriteString(fmt.Sprintf("\tMOVW\t$%d, R12\n", i))
+ buf.WriteString("\tB\truntime·callbackasm1(SB)\n")
+ }
+
+ err := os.WriteFile("zcallback_windows_arm.s", buf.Bytes(), 0666)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "wincallback: %s\n", err)
+ os.Exit(2)
+ }
+}
+
+func gengo() {
+ var buf bytes.Buffer
+
+ buf.WriteString(fmt.Sprintf(`// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.
+
+package runtime
+
+const cb_max = %d // maximum number of windows callbacks allowed
+`, maxCallback))
+ err := os.WriteFile("zcallback_windows.go", buf.Bytes(), 0666)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "wincallback: %s\n", err)
+ os.Exit(2)
+ }
+}
+
+func main() {
+ genasm386Amd64()
+ genasmArm()
+ gengo()
+}
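
The generated table works because the return address pushed by whichever CALL (or, on arm, the index loaded into R12) identifies the slot that external code entered. A hedged sketch of the index recovery on 386/amd64, assuming every generated CALL assembles to the 5-byte rel32 form; in the real runtime this is done by runtime·callbackasm1, which is not part of this diff, and the function name here is hypothetical:

	package main

	import "fmt"

	// callbackIndex illustrates how a callback slot can be recovered from the
	// return address pushed by one of the generated CALL instructions,
	// assuming each CALL occupies 5 bytes.
	func callbackIndex(retAddr, callbackasmBase uintptr) int {
		const callSize = 5
		// Entering slot k runs the CALL at offset 5*k, which pushes
		// callbackasmBase + 5*(k+1) as the return address.
		return int((retAddr-callbackasmBase)/callSize) - 1
	}

	func main() {
		const base = 0x1000 // made-up base address for the check
		for k := uintptr(0); k < 4; k++ {
			fmt.Println(callbackIndex(base+5*(k+1), base) == int(k)) // true
		}
	}
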
diff --git a/src/runtime/write_err.go b/src/runtime/write_err.go
new file mode 100644
index 0000000..6b1467b
--- /dev/null
+++ b/src/runtime/write_err.go
@@ -0,0 +1,13 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !android
+
+package runtime
+
+import "unsafe"
+
+func writeErr(b []byte) {
+ write(2, unsafe.Pointer(&b[0]), int32(len(b)))
+}
diff --git a/src/runtime/write_err_android.go b/src/runtime/write_err_android.go
new file mode 100644
index 0000000..2419fc8
--- /dev/null
+++ b/src/runtime/write_err_android.go
@@ -0,0 +1,162 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+var (
+ writeHeader = []byte{6 /* ANDROID_LOG_ERROR */, 'G', 'o', 0}
+ writePath = []byte("/dev/log/main\x00")
+ writeLogd = []byte("/dev/socket/logdw\x00")
+
+ // guarded by printlock/printunlock.
+ writeFD uintptr
+ writeBuf [1024]byte
+ writePos int
+)
+
+// Prior to Android-L, logging was done through writes to /dev/log files implemented
+// in kernel ring buffers. In Android-L, those /dev/log files are no longer
+// accessible and logging is done through a centralized user-mode logger, logd.
+//
+// https://android.googlesource.com/platform/system/core/+/refs/tags/android-6.0.1_r78/liblog/logd_write.c
+type loggerType int32
+
+const (
+ unknown loggerType = iota
+ legacy
+ logd
+ // TODO(hakim): logging for emulator?
+)
+
+var logger loggerType
+
+func writeErr(b []byte) {
+ if logger == unknown {
+ // Use logd if /dev/socket/logdw is available.
+ if v := uintptr(access(&writeLogd[0], 0x02 /* W_OK */)); v == 0 {
+ logger = logd
+ initLogd()
+ } else {
+ logger = legacy
+ initLegacy()
+ }
+ }
+
+ // Write to stderr for command-line programs.
+ write(2, unsafe.Pointer(&b[0]), int32(len(b)))
+
+ // Log format: "<header>\x00<message m bytes>\x00"
+ //
+ // <header>
+ // In legacy mode: "<priority 1 byte><tag n bytes>".
+ // In logd mode: "<android_log_header_t 11 bytes><priority 1 byte><tag n bytes>"
+ //
+ // The entire log needs to be delivered in a single syscall (the NDK
+ // does this with writev). Each log is its own line, so we need to
+ // buffer writes until we see a newline.
+ var hlen int
+ switch logger {
+ case logd:
+ hlen = writeLogdHeader()
+ case legacy:
+ hlen = len(writeHeader)
+ }
+
+ dst := writeBuf[hlen:]
+ for _, v := range b {
+ if v == 0 { // android logging won't print a zero byte
+ v = '0'
+ }
+ dst[writePos] = v
+ writePos++
+ if v == '\n' || writePos == len(dst)-1 {
+ dst[writePos] = 0
+ write(writeFD, unsafe.Pointer(&writeBuf[0]), int32(hlen+writePos))
+ for i := range dst {
+ dst[i] = 0
+ }
+ writePos = 0
+ }
+ }
+}
+
+func initLegacy() {
+ // In legacy mode, logs are written to /dev/log/main
+ writeFD = uintptr(open(&writePath[0], 0x1 /* O_WRONLY */, 0))
+ if writeFD == 0 {
+ // It is hard to do anything here. Write to stderr just
+ // in case the user has root on the device and has run
+ // adb shell setprop log.redirect-stdio true
+ msg := []byte("runtime: cannot open /dev/log/main\x00")
+ write(2, unsafe.Pointer(&msg[0]), int32(len(msg)))
+ exit(2)
+ }
+
+ // Prepopulate the invariant header part.
+ copy(writeBuf[:len(writeHeader)], writeHeader)
+}
+
+// used in initLogd but defined here to avoid heap allocation.
+var logdAddr sockaddr_un
+
+func initLogd() {
+ // In logd mode, logs are sent to the logd via a unix domain socket.
+ logdAddr.family = _AF_UNIX
+ copy(logdAddr.path[:], writeLogd)
+
+ // We are not using non-blocking I/O because writes taking this path
+ // are most likely triggered by a panic; we see no advantage to
+ // non-blocking I/O during a panic, a clear disadvantage (dropping the
+ // panic message), and blocking I/O simplifies the code a lot.
+ fd := socket(_AF_UNIX, _SOCK_DGRAM|_O_CLOEXEC, 0)
+ if fd < 0 {
+ msg := []byte("runtime: cannot create a socket for logging\x00")
+ write(2, unsafe.Pointer(&msg[0]), int32(len(msg)))
+ exit(2)
+ }
+
+ errno := connect(fd, unsafe.Pointer(&logdAddr), int32(unsafe.Sizeof(logdAddr)))
+ if errno < 0 {
+ msg := []byte("runtime: cannot connect to /dev/socket/logdw\x00")
+ write(2, unsafe.Pointer(&msg[0]), int32(len(msg)))
+ // TODO(hakim): or should we just close fd and hope for better luck next time?
+ exit(2)
+ }
+ writeFD = uintptr(fd)
+
+ // Prepopulate invariant part of the header.
+ // The first 11 bytes will be populated later in writeLogdHeader.
+ copy(writeBuf[11:11+len(writeHeader)], writeHeader)
+}
+
+// writeLogdHeader populates the header and returns the header length.
+func writeLogdHeader() int {
+ hdr := writeBuf[:11]
+
+ // The first 11 bytes of the header correspond to android_log_header_t
+ // as defined in system/core/include/private/android_logger.h:
+ // hdr[0] log type id (unsigned char), defined in <log/log.h>
+ // hdr[1:3] tid (uint16_t)
+ // hdr[3:11] log_time, defined in <log/log_read.h>
+ // hdr[3:7] sec (uint32, little endian)
+ // hdr[7:11] nsec (uint32, little endian)
+ hdr[0] = 0 // LOG_ID_MAIN
+ sec, nsec := walltime()
+ packUint32(hdr[3:7], uint32(sec))
+ packUint32(hdr[7:11], uint32(nsec))
+
+ // TODO(hakim): hdr[1:3] = gettid?
+
+ return 11 + len(writeHeader)
+}
+
+func packUint32(b []byte, v uint32) {
+ // little-endian.
+ b[0] = byte(v)
+ b[1] = byte(v >> 8)
+ b[2] = byte(v >> 16)
+ b[3] = byte(v >> 24)
+}
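
To make the wire format above concrete: for each line, writeErr assembles an 11-byte android_log_header_t, the 4-byte priority/tag header ("\x06Go\x00"), and then the message itself, and hands the whole buffer to logd in a single write. A rough standalone sketch of that layout, using encoding/binary instead of the runtime's hand-rolled packUint32 and simplifying the trailing-NUL handling (illustration only, with a hypothetical helper name):

	package main

	import (
		"encoding/binary"
		"fmt"
		"time"
	)

	// buildLogdPacket sketches the datagram layout writeErr sends to logd:
	// [0]    log id (0 = LOG_ID_MAIN)
	// [1:3]  tid (left zero here)
	// [3:7]  seconds, little endian
	// [7:11] nanoseconds, little endian
	// then priority 6 (ANDROID_LOG_ERROR), the tag "Go\x00", and the message.
	func buildLogdPacket(msg string) []byte {
		now := time.Now()
		pkt := make([]byte, 11) // android_log_header_t
		pkt[0] = 0              // LOG_ID_MAIN; pkt[1:3] (tid) left zero here
		binary.LittleEndian.PutUint32(pkt[3:7], uint32(now.Unix()))
		binary.LittleEndian.PutUint32(pkt[7:11], uint32(now.Nanosecond()))
		pkt = append(pkt, 6, 'G', 'o', 0) // priority + tag
		return append(pkt, msg...)        // message, normally ending in '\n'
	}

	func main() {
		fmt.Printf("% x\n", buildLogdPacket("hello\n"))
	}
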
diff --git a/src/runtime/zcallback_windows.go b/src/runtime/zcallback_windows.go
new file mode 100644
index 0000000..2c3cb28
--- /dev/null
+++ b/src/runtime/zcallback_windows.go
@@ -0,0 +1,5 @@
+// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.
+
+package runtime
+
+const cb_max = 2000 // maximum number of windows callbacks allowed
diff --git a/src/runtime/zcallback_windows.s b/src/runtime/zcallback_windows.s
new file mode 100644
index 0000000..7772eef
--- /dev/null
+++ b/src/runtime/zcallback_windows.s
@@ -0,0 +1,2012 @@
+// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.
+
+// +build 386 amd64
+// runtime·callbackasm is called by external code to
+// execute a Go-implemented callback function. It is not
+// called from the start; instead, runtime·compilecallback
+// always returns an address at some offset into runtime·callbackasm,
+// so different callbacks start at a different
+// CALL instruction in runtime·callbackasm. This determines
+// which Go callback function is executed later on.
+
+TEXT runtime·callbackasm(SB),7,$0
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
diff --git a/src/runtime/zcallback_windows_arm.s b/src/runtime/zcallback_windows_arm.s
new file mode 100644
index 0000000..f943d84
--- /dev/null
+++ b/src/runtime/zcallback_windows_arm.s
@@ -0,0 +1,4012 @@
+// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.
+
+// External code calls into callbackasm at an offset corresponding
+// to the callback index. Callbackasm is a table of MOV and B instructions.
+// The MOV instruction loads R12 with the callback index, and the
+// B instruction branches to callbackasm1.
+// callbackasm1 takes the callback index from R12 and
+// indexes into an array that stores information about each callback.
+// It then calls the Go implementation for that callback.
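+//
+// Illustrative sketch only (not part of the generated table): once
+// callbackasm1 has the index from R12, the dispatch is conceptually
+//
+//	idx := R12            // callback index loaded by the MOVW in this table
+//	cb := callbacks[idx]  // "callbacks" is a placeholder name for the runtime's
+//	                      // per-index callback bookkeeping, not a real symbol here
+//	// ...then the Go implementation registered for cb is invoked with the
+//	// incoming arguments.
+//
+// The actual dispatch lives in callbackasm1 (see sys_windows_arm.s); the
+// sketch only shows how the R12 value selects which callback runs. By
+// contrast, the CALL-based table earlier in this change derives the index
+// from the return address pushed by CALL rather than from a register.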
+#include "textflag.h"
+
+TEXT runtime·callbackasm(SB),NOSPLIT|NOFRAME,$0
+ MOVW $0, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1, R12
+ B runtime·callbackasm1(SB)
+ MOVW $2, R12
+ B runtime·callbackasm1(SB)
+ MOVW $3, R12
+ B runtime·callbackasm1(SB)
+ MOVW $4, R12
+ B runtime·callbackasm1(SB)
+ MOVW $5, R12
+ B runtime·callbackasm1(SB)
+ MOVW $6, R12
+ B runtime·callbackasm1(SB)
+ MOVW $7, R12
+ B runtime·callbackasm1(SB)
+ MOVW $8, R12
+ B runtime·callbackasm1(SB)
+ MOVW $9, R12
+ B runtime·callbackasm1(SB)
+ MOVW $10, R12
+ B runtime·callbackasm1(SB)
+ MOVW $11, R12
+ B runtime·callbackasm1(SB)
+ MOVW $12, R12
+ B runtime·callbackasm1(SB)
+ MOVW $13, R12
+ B runtime·callbackasm1(SB)
+ MOVW $14, R12
+ B runtime·callbackasm1(SB)
+ MOVW $15, R12
+ B runtime·callbackasm1(SB)
+ MOVW $16, R12
+ B runtime·callbackasm1(SB)
+ MOVW $17, R12
+ B runtime·callbackasm1(SB)
+ MOVW $18, R12
+ B runtime·callbackasm1(SB)
+ MOVW $19, R12
+ B runtime·callbackasm1(SB)
+ MOVW $20, R12
+ B runtime·callbackasm1(SB)
+ MOVW $21, R12
+ B runtime·callbackasm1(SB)
+ MOVW $22, R12
+ B runtime·callbackasm1(SB)
+ MOVW $23, R12
+ B runtime·callbackasm1(SB)
+ MOVW $24, R12
+ B runtime·callbackasm1(SB)
+ MOVW $25, R12
+ B runtime·callbackasm1(SB)
+ MOVW $26, R12
+ B runtime·callbackasm1(SB)
+ MOVW $27, R12
+ B runtime·callbackasm1(SB)
+ MOVW $28, R12
+ B runtime·callbackasm1(SB)
+ MOVW $29, R12
+ B runtime·callbackasm1(SB)
+ MOVW $30, R12
+ B runtime·callbackasm1(SB)
+ MOVW $31, R12
+ B runtime·callbackasm1(SB)
+ MOVW $32, R12
+ B runtime·callbackasm1(SB)
+ MOVW $33, R12
+ B runtime·callbackasm1(SB)
+ MOVW $34, R12
+ B runtime·callbackasm1(SB)
+ MOVW $35, R12
+ B runtime·callbackasm1(SB)
+ MOVW $36, R12
+ B runtime·callbackasm1(SB)
+ MOVW $37, R12
+ B runtime·callbackasm1(SB)
+ MOVW $38, R12
+ B runtime·callbackasm1(SB)
+ MOVW $39, R12
+ B runtime·callbackasm1(SB)
+ MOVW $40, R12
+ B runtime·callbackasm1(SB)
+ MOVW $41, R12
+ B runtime·callbackasm1(SB)
+ MOVW $42, R12
+ B runtime·callbackasm1(SB)
+ MOVW $43, R12
+ B runtime·callbackasm1(SB)
+ MOVW $44, R12
+ B runtime·callbackasm1(SB)
+ MOVW $45, R12
+ B runtime·callbackasm1(SB)
+ MOVW $46, R12
+ B runtime·callbackasm1(SB)
+ MOVW $47, R12
+ B runtime·callbackasm1(SB)
+ MOVW $48, R12
+ B runtime·callbackasm1(SB)
+ MOVW $49, R12
+ B runtime·callbackasm1(SB)
+ MOVW $50, R12
+ B runtime·callbackasm1(SB)
+ MOVW $51, R12
+ B runtime·callbackasm1(SB)
+ MOVW $52, R12
+ B runtime·callbackasm1(SB)
+ MOVW $53, R12
+ B runtime·callbackasm1(SB)
+ MOVW $54, R12
+ B runtime·callbackasm1(SB)
+ MOVW $55, R12
+ B runtime·callbackasm1(SB)
+ MOVW $56, R12
+ B runtime·callbackasm1(SB)
+ MOVW $57, R12
+ B runtime·callbackasm1(SB)
+ MOVW $58, R12
+ B runtime·callbackasm1(SB)
+ MOVW $59, R12
+ B runtime·callbackasm1(SB)
+ MOVW $60, R12
+ B runtime·callbackasm1(SB)
+ MOVW $61, R12
+ B runtime·callbackasm1(SB)
+ MOVW $62, R12
+ B runtime·callbackasm1(SB)
+ MOVW $63, R12
+ B runtime·callbackasm1(SB)
+ MOVW $64, R12
+ B runtime·callbackasm1(SB)
+ MOVW $65, R12
+ B runtime·callbackasm1(SB)
+ MOVW $66, R12
+ B runtime·callbackasm1(SB)
+ MOVW $67, R12
+ B runtime·callbackasm1(SB)
+ MOVW $68, R12
+ B runtime·callbackasm1(SB)
+ MOVW $69, R12
+ B runtime·callbackasm1(SB)
+ MOVW $70, R12
+ B runtime·callbackasm1(SB)
+ MOVW $71, R12
+ B runtime·callbackasm1(SB)
+ MOVW $72, R12
+ B runtime·callbackasm1(SB)
+ MOVW $73, R12
+ B runtime·callbackasm1(SB)
+ MOVW $74, R12
+ B runtime·callbackasm1(SB)
+ MOVW $75, R12
+ B runtime·callbackasm1(SB)
+ MOVW $76, R12
+ B runtime·callbackasm1(SB)
+ MOVW $77, R12
+ B runtime·callbackasm1(SB)
+ MOVW $78, R12
+ B runtime·callbackasm1(SB)
+ MOVW $79, R12
+ B runtime·callbackasm1(SB)
+ MOVW $80, R12
+ B runtime·callbackasm1(SB)
+ MOVW $81, R12
+ B runtime·callbackasm1(SB)
+ MOVW $82, R12
+ B runtime·callbackasm1(SB)
+ MOVW $83, R12
+ B runtime·callbackasm1(SB)
+ MOVW $84, R12
+ B runtime·callbackasm1(SB)
+ MOVW $85, R12
+ B runtime·callbackasm1(SB)
+ MOVW $86, R12
+ B runtime·callbackasm1(SB)
+ MOVW $87, R12
+ B runtime·callbackasm1(SB)
+ MOVW $88, R12
+ B runtime·callbackasm1(SB)
+ MOVW $89, R12
+ B runtime·callbackasm1(SB)
+ MOVW $90, R12
+ B runtime·callbackasm1(SB)
+ MOVW $91, R12
+ B runtime·callbackasm1(SB)
+ MOVW $92, R12
+ B runtime·callbackasm1(SB)
+ MOVW $93, R12
+ B runtime·callbackasm1(SB)
+ MOVW $94, R12
+ B runtime·callbackasm1(SB)
+ MOVW $95, R12
+ B runtime·callbackasm1(SB)
+ MOVW $96, R12
+ B runtime·callbackasm1(SB)
+ MOVW $97, R12
+ B runtime·callbackasm1(SB)
+ MOVW $98, R12
+ B runtime·callbackasm1(SB)
+ MOVW $99, R12
+ B runtime·callbackasm1(SB)
+ MOVW $100, R12
+ B runtime·callbackasm1(SB)
+ MOVW $101, R12
+ B runtime·callbackasm1(SB)
+ MOVW $102, R12
+ B runtime·callbackasm1(SB)
+ MOVW $103, R12
+ B runtime·callbackasm1(SB)
+ MOVW $104, R12
+ B runtime·callbackasm1(SB)
+ MOVW $105, R12
+ B runtime·callbackasm1(SB)
+ MOVW $106, R12
+ B runtime·callbackasm1(SB)
+ MOVW $107, R12
+ B runtime·callbackasm1(SB)
+ MOVW $108, R12
+ B runtime·callbackasm1(SB)
+ MOVW $109, R12
+ B runtime·callbackasm1(SB)
+ MOVW $110, R12
+ B runtime·callbackasm1(SB)
+ MOVW $111, R12
+ B runtime·callbackasm1(SB)
+ MOVW $112, R12
+ B runtime·callbackasm1(SB)
+ MOVW $113, R12
+ B runtime·callbackasm1(SB)
+ MOVW $114, R12
+ B runtime·callbackasm1(SB)
+ MOVW $115, R12
+ B runtime·callbackasm1(SB)
+ MOVW $116, R12
+ B runtime·callbackasm1(SB)
+ MOVW $117, R12
+ B runtime·callbackasm1(SB)
+ MOVW $118, R12
+ B runtime·callbackasm1(SB)
+ MOVW $119, R12
+ B runtime·callbackasm1(SB)
+ MOVW $120, R12
+ B runtime·callbackasm1(SB)
+ MOVW $121, R12
+ B runtime·callbackasm1(SB)
+ MOVW $122, R12
+ B runtime·callbackasm1(SB)
+ MOVW $123, R12
+ B runtime·callbackasm1(SB)
+ MOVW $124, R12
+ B runtime·callbackasm1(SB)
+ MOVW $125, R12
+ B runtime·callbackasm1(SB)
+ MOVW $126, R12
+ B runtime·callbackasm1(SB)
+ MOVW $127, R12
+ B runtime·callbackasm1(SB)
+ MOVW $128, R12
+ B runtime·callbackasm1(SB)
+ MOVW $129, R12
+ B runtime·callbackasm1(SB)
+ MOVW $130, R12
+ B runtime·callbackasm1(SB)
+ MOVW $131, R12
+ B runtime·callbackasm1(SB)
+ MOVW $132, R12
+ B runtime·callbackasm1(SB)
+ MOVW $133, R12
+ B runtime·callbackasm1(SB)
+ MOVW $134, R12
+ B runtime·callbackasm1(SB)
+ MOVW $135, R12
+ B runtime·callbackasm1(SB)
+ MOVW $136, R12
+ B runtime·callbackasm1(SB)
+ MOVW $137, R12
+ B runtime·callbackasm1(SB)
+ MOVW $138, R12
+ B runtime·callbackasm1(SB)
+ MOVW $139, R12
+ B runtime·callbackasm1(SB)
+ MOVW $140, R12
+ B runtime·callbackasm1(SB)
+ MOVW $141, R12
+ B runtime·callbackasm1(SB)
+ MOVW $142, R12
+ B runtime·callbackasm1(SB)
+ MOVW $143, R12
+ B runtime·callbackasm1(SB)
+ MOVW $144, R12
+ B runtime·callbackasm1(SB)
+ MOVW $145, R12
+ B runtime·callbackasm1(SB)
+ MOVW $146, R12
+ B runtime·callbackasm1(SB)
+ MOVW $147, R12
+ B runtime·callbackasm1(SB)
+ MOVW $148, R12
+ B runtime·callbackasm1(SB)
+ MOVW $149, R12
+ B runtime·callbackasm1(SB)
+ MOVW $150, R12
+ B runtime·callbackasm1(SB)
+ MOVW $151, R12
+ B runtime·callbackasm1(SB)
+ MOVW $152, R12
+ B runtime·callbackasm1(SB)
+ MOVW $153, R12
+ B runtime·callbackasm1(SB)
+ MOVW $154, R12
+ B runtime·callbackasm1(SB)
+ MOVW $155, R12
+ B runtime·callbackasm1(SB)
+ MOVW $156, R12
+ B runtime·callbackasm1(SB)
+ MOVW $157, R12
+ B runtime·callbackasm1(SB)
+ MOVW $158, R12
+ B runtime·callbackasm1(SB)
+ MOVW $159, R12
+ B runtime·callbackasm1(SB)
+ MOVW $160, R12
+ B runtime·callbackasm1(SB)
+ MOVW $161, R12
+ B runtime·callbackasm1(SB)
+ MOVW $162, R12
+ B runtime·callbackasm1(SB)
+ MOVW $163, R12
+ B runtime·callbackasm1(SB)
+ MOVW $164, R12
+ B runtime·callbackasm1(SB)
+ MOVW $165, R12
+ B runtime·callbackasm1(SB)
+ MOVW $166, R12
+ B runtime·callbackasm1(SB)
+ MOVW $167, R12
+ B runtime·callbackasm1(SB)
+ MOVW $168, R12
+ B runtime·callbackasm1(SB)
+ MOVW $169, R12
+ B runtime·callbackasm1(SB)
+ MOVW $170, R12
+ B runtime·callbackasm1(SB)
+ MOVW $171, R12
+ B runtime·callbackasm1(SB)
+ MOVW $172, R12
+ B runtime·callbackasm1(SB)
+ MOVW $173, R12
+ B runtime·callbackasm1(SB)
+ MOVW $174, R12
+ B runtime·callbackasm1(SB)
+ MOVW $175, R12
+ B runtime·callbackasm1(SB)
+ MOVW $176, R12
+ B runtime·callbackasm1(SB)
+ MOVW $177, R12
+ B runtime·callbackasm1(SB)
+ MOVW $178, R12
+ B runtime·callbackasm1(SB)
+ MOVW $179, R12
+ B runtime·callbackasm1(SB)
+ MOVW $180, R12
+ B runtime·callbackasm1(SB)
+ MOVW $181, R12
+ B runtime·callbackasm1(SB)
+ MOVW $182, R12
+ B runtime·callbackasm1(SB)
+ MOVW $183, R12
+ B runtime·callbackasm1(SB)
+ MOVW $184, R12
+ B runtime·callbackasm1(SB)
+ MOVW $185, R12
+ B runtime·callbackasm1(SB)
+ MOVW $186, R12
+ B runtime·callbackasm1(SB)
+ MOVW $187, R12
+ B runtime·callbackasm1(SB)
+ MOVW $188, R12
+ B runtime·callbackasm1(SB)
+ MOVW $189, R12
+ B runtime·callbackasm1(SB)
+ MOVW $190, R12
+ B runtime·callbackasm1(SB)
+ MOVW $191, R12
+ B runtime·callbackasm1(SB)
+ MOVW $192, R12
+ B runtime·callbackasm1(SB)
+ MOVW $193, R12
+ B runtime·callbackasm1(SB)
+ MOVW $194, R12
+ B runtime·callbackasm1(SB)
+ MOVW $195, R12
+ B runtime·callbackasm1(SB)
+ MOVW $196, R12
+ B runtime·callbackasm1(SB)
+ MOVW $197, R12
+ B runtime·callbackasm1(SB)
+ MOVW $198, R12
+ B runtime·callbackasm1(SB)
+ MOVW $199, R12
+ B runtime·callbackasm1(SB)
+ MOVW $200, R12
+ B runtime·callbackasm1(SB)
+ MOVW $201, R12
+ B runtime·callbackasm1(SB)
+ MOVW $202, R12
+ B runtime·callbackasm1(SB)
+ MOVW $203, R12
+ B runtime·callbackasm1(SB)
+ MOVW $204, R12
+ B runtime·callbackasm1(SB)
+ MOVW $205, R12
+ B runtime·callbackasm1(SB)
+ MOVW $206, R12
+ B runtime·callbackasm1(SB)
+ MOVW $207, R12
+ B runtime·callbackasm1(SB)
+ MOVW $208, R12
+ B runtime·callbackasm1(SB)
+ MOVW $209, R12
+ B runtime·callbackasm1(SB)
+ MOVW $210, R12
+ B runtime·callbackasm1(SB)
+ MOVW $211, R12
+ B runtime·callbackasm1(SB)
+ MOVW $212, R12
+ B runtime·callbackasm1(SB)
+ MOVW $213, R12
+ B runtime·callbackasm1(SB)
+ MOVW $214, R12
+ B runtime·callbackasm1(SB)
+ MOVW $215, R12
+ B runtime·callbackasm1(SB)
+ MOVW $216, R12
+ B runtime·callbackasm1(SB)
+ MOVW $217, R12
+ B runtime·callbackasm1(SB)
+ MOVW $218, R12
+ B runtime·callbackasm1(SB)
+ MOVW $219, R12
+ B runtime·callbackasm1(SB)
+ MOVW $220, R12
+ B runtime·callbackasm1(SB)
+ MOVW $221, R12
+ B runtime·callbackasm1(SB)
+ MOVW $222, R12
+ B runtime·callbackasm1(SB)
+ MOVW $223, R12
+ B runtime·callbackasm1(SB)
+ MOVW $224, R12
+ B runtime·callbackasm1(SB)
+ MOVW $225, R12
+ B runtime·callbackasm1(SB)
+ MOVW $226, R12
+ B runtime·callbackasm1(SB)
+ MOVW $227, R12
+ B runtime·callbackasm1(SB)
+ MOVW $228, R12
+ B runtime·callbackasm1(SB)
+ MOVW $229, R12
+ B runtime·callbackasm1(SB)
+ MOVW $230, R12
+ B runtime·callbackasm1(SB)
+ MOVW $231, R12
+ B runtime·callbackasm1(SB)
+ MOVW $232, R12
+ B runtime·callbackasm1(SB)
+ MOVW $233, R12
+ B runtime·callbackasm1(SB)
+ MOVW $234, R12
+ B runtime·callbackasm1(SB)
+ MOVW $235, R12
+ B runtime·callbackasm1(SB)
+ MOVW $236, R12
+ B runtime·callbackasm1(SB)
+ MOVW $237, R12
+ B runtime·callbackasm1(SB)
+ MOVW $238, R12
+ B runtime·callbackasm1(SB)
+ MOVW $239, R12
+ B runtime·callbackasm1(SB)
+ MOVW $240, R12
+ B runtime·callbackasm1(SB)
+ MOVW $241, R12
+ B runtime·callbackasm1(SB)
+ MOVW $242, R12
+ B runtime·callbackasm1(SB)
+ MOVW $243, R12
+ B runtime·callbackasm1(SB)
+ MOVW $244, R12
+ B runtime·callbackasm1(SB)
+ MOVW $245, R12
+ B runtime·callbackasm1(SB)
+ MOVW $246, R12
+ B runtime·callbackasm1(SB)
+ MOVW $247, R12
+ B runtime·callbackasm1(SB)
+ MOVW $248, R12
+ B runtime·callbackasm1(SB)
+ MOVW $249, R12
+ B runtime·callbackasm1(SB)
+ MOVW $250, R12
+ B runtime·callbackasm1(SB)
+ MOVW $251, R12
+ B runtime·callbackasm1(SB)
+ MOVW $252, R12
+ B runtime·callbackasm1(SB)
+ MOVW $253, R12
+ B runtime·callbackasm1(SB)
+ MOVW $254, R12
+ B runtime·callbackasm1(SB)
+ MOVW $255, R12
+ B runtime·callbackasm1(SB)
+ MOVW $256, R12
+ B runtime·callbackasm1(SB)
+ MOVW $257, R12
+ B runtime·callbackasm1(SB)
+ MOVW $258, R12
+ B runtime·callbackasm1(SB)
+ MOVW $259, R12
+ B runtime·callbackasm1(SB)
+ MOVW $260, R12
+ B runtime·callbackasm1(SB)
+ MOVW $261, R12
+ B runtime·callbackasm1(SB)
+ MOVW $262, R12
+ B runtime·callbackasm1(SB)
+ MOVW $263, R12
+ B runtime·callbackasm1(SB)
+ MOVW $264, R12
+ B runtime·callbackasm1(SB)
+ MOVW $265, R12
+ B runtime·callbackasm1(SB)
+ MOVW $266, R12
+ B runtime·callbackasm1(SB)
+ MOVW $267, R12
+ B runtime·callbackasm1(SB)
+ MOVW $268, R12
+ B runtime·callbackasm1(SB)
+ MOVW $269, R12
+ B runtime·callbackasm1(SB)
+ MOVW $270, R12
+ B runtime·callbackasm1(SB)
+ MOVW $271, R12
+ B runtime·callbackasm1(SB)
+ MOVW $272, R12
+ B runtime·callbackasm1(SB)
+ MOVW $273, R12
+ B runtime·callbackasm1(SB)
+ MOVW $274, R12
+ B runtime·callbackasm1(SB)
+ MOVW $275, R12
+ B runtime·callbackasm1(SB)
+ MOVW $276, R12
+ B runtime·callbackasm1(SB)
+ MOVW $277, R12
+ B runtime·callbackasm1(SB)
+ MOVW $278, R12
+ B runtime·callbackasm1(SB)
+ MOVW $279, R12
+ B runtime·callbackasm1(SB)
+ MOVW $280, R12
+ B runtime·callbackasm1(SB)
+ MOVW $281, R12
+ B runtime·callbackasm1(SB)
+ MOVW $282, R12
+ B runtime·callbackasm1(SB)
+ MOVW $283, R12
+ B runtime·callbackasm1(SB)
+ MOVW $284, R12
+ B runtime·callbackasm1(SB)
+ MOVW $285, R12
+ B runtime·callbackasm1(SB)
+ MOVW $286, R12
+ B runtime·callbackasm1(SB)
+ MOVW $287, R12
+ B runtime·callbackasm1(SB)
+ MOVW $288, R12
+ B runtime·callbackasm1(SB)
+ MOVW $289, R12
+ B runtime·callbackasm1(SB)
+ MOVW $290, R12
+ B runtime·callbackasm1(SB)
+ MOVW $291, R12
+ B runtime·callbackasm1(SB)
+ MOVW $292, R12
+ B runtime·callbackasm1(SB)
+ MOVW $293, R12
+ B runtime·callbackasm1(SB)
+ MOVW $294, R12
+ B runtime·callbackasm1(SB)
+ MOVW $295, R12
+ B runtime·callbackasm1(SB)
+ MOVW $296, R12
+ B runtime·callbackasm1(SB)
+ MOVW $297, R12
+ B runtime·callbackasm1(SB)
+ MOVW $298, R12
+ B runtime·callbackasm1(SB)
+ MOVW $299, R12
+ B runtime·callbackasm1(SB)
+ MOVW $300, R12
+ B runtime·callbackasm1(SB)
+ MOVW $301, R12
+ B runtime·callbackasm1(SB)
+ MOVW $302, R12
+ B runtime·callbackasm1(SB)
+ MOVW $303, R12
+ B runtime·callbackasm1(SB)
+ MOVW $304, R12
+ B runtime·callbackasm1(SB)
+ MOVW $305, R12
+ B runtime·callbackasm1(SB)
+ MOVW $306, R12
+ B runtime·callbackasm1(SB)
+ MOVW $307, R12
+ B runtime·callbackasm1(SB)
+ MOVW $308, R12
+ B runtime·callbackasm1(SB)
+ MOVW $309, R12
+ B runtime·callbackasm1(SB)
+ MOVW $310, R12
+ B runtime·callbackasm1(SB)
+ MOVW $311, R12
+ B runtime·callbackasm1(SB)
+ MOVW $312, R12
+ B runtime·callbackasm1(SB)
+ MOVW $313, R12
+ B runtime·callbackasm1(SB)
+ MOVW $314, R12
+ B runtime·callbackasm1(SB)
+ MOVW $315, R12
+ B runtime·callbackasm1(SB)
+ MOVW $316, R12
+ B runtime·callbackasm1(SB)
+ MOVW $317, R12
+ B runtime·callbackasm1(SB)
+ MOVW $318, R12
+ B runtime·callbackasm1(SB)
+ MOVW $319, R12
+ B runtime·callbackasm1(SB)
+ MOVW $320, R12
+ B runtime·callbackasm1(SB)
+ MOVW $321, R12
+ B runtime·callbackasm1(SB)
+ MOVW $322, R12
+ B runtime·callbackasm1(SB)
+ MOVW $323, R12
+ B runtime·callbackasm1(SB)
+ MOVW $324, R12
+ B runtime·callbackasm1(SB)
+ MOVW $325, R12
+ B runtime·callbackasm1(SB)
+ MOVW $326, R12
+ B runtime·callbackasm1(SB)
+ MOVW $327, R12
+ B runtime·callbackasm1(SB)
+ MOVW $328, R12
+ B runtime·callbackasm1(SB)
+ MOVW $329, R12
+ B runtime·callbackasm1(SB)
+ MOVW $330, R12
+ B runtime·callbackasm1(SB)
+ MOVW $331, R12
+ B runtime·callbackasm1(SB)
+ MOVW $332, R12
+ B runtime·callbackasm1(SB)
+ MOVW $333, R12
+ B runtime·callbackasm1(SB)
+ MOVW $334, R12
+ B runtime·callbackasm1(SB)
+ MOVW $335, R12
+ B runtime·callbackasm1(SB)
+ MOVW $336, R12
+ B runtime·callbackasm1(SB)
+ MOVW $337, R12
+ B runtime·callbackasm1(SB)
+ MOVW $338, R12
+ B runtime·callbackasm1(SB)
+ MOVW $339, R12
+ B runtime·callbackasm1(SB)
+ MOVW $340, R12
+ B runtime·callbackasm1(SB)
+ MOVW $341, R12
+ B runtime·callbackasm1(SB)
+ MOVW $342, R12
+ B runtime·callbackasm1(SB)
+ MOVW $343, R12
+ B runtime·callbackasm1(SB)
+ MOVW $344, R12
+ B runtime·callbackasm1(SB)
+ MOVW $345, R12
+ B runtime·callbackasm1(SB)
+ MOVW $346, R12
+ B runtime·callbackasm1(SB)
+ MOVW $347, R12
+ B runtime·callbackasm1(SB)
+ MOVW $348, R12
+ B runtime·callbackasm1(SB)
+ MOVW $349, R12
+ B runtime·callbackasm1(SB)
+ MOVW $350, R12
+ B runtime·callbackasm1(SB)
+ MOVW $351, R12
+ B runtime·callbackasm1(SB)
+ MOVW $352, R12
+ B runtime·callbackasm1(SB)
+ MOVW $353, R12
+ B runtime·callbackasm1(SB)
+ MOVW $354, R12
+ B runtime·callbackasm1(SB)
+ MOVW $355, R12
+ B runtime·callbackasm1(SB)
+ MOVW $356, R12
+ B runtime·callbackasm1(SB)
+ MOVW $357, R12
+ B runtime·callbackasm1(SB)
+ MOVW $358, R12
+ B runtime·callbackasm1(SB)
+ MOVW $359, R12
+ B runtime·callbackasm1(SB)
+ MOVW $360, R12
+ B runtime·callbackasm1(SB)
+ MOVW $361, R12
+ B runtime·callbackasm1(SB)
+ MOVW $362, R12
+ B runtime·callbackasm1(SB)
+ MOVW $363, R12
+ B runtime·callbackasm1(SB)
+ MOVW $364, R12
+ B runtime·callbackasm1(SB)
+ MOVW $365, R12
+ B runtime·callbackasm1(SB)
+ MOVW $366, R12
+ B runtime·callbackasm1(SB)
+ MOVW $367, R12
+ B runtime·callbackasm1(SB)
+ MOVW $368, R12
+ B runtime·callbackasm1(SB)
+ MOVW $369, R12
+ B runtime·callbackasm1(SB)
+ MOVW $370, R12
+ B runtime·callbackasm1(SB)
+ MOVW $371, R12
+ B runtime·callbackasm1(SB)
+ MOVW $372, R12
+ B runtime·callbackasm1(SB)
+ MOVW $373, R12
+ B runtime·callbackasm1(SB)
+ MOVW $374, R12
+ B runtime·callbackasm1(SB)
+ MOVW $375, R12
+ B runtime·callbackasm1(SB)
+ MOVW $376, R12
+ B runtime·callbackasm1(SB)
+ MOVW $377, R12
+ B runtime·callbackasm1(SB)
+ MOVW $378, R12
+ B runtime·callbackasm1(SB)
+ MOVW $379, R12
+ B runtime·callbackasm1(SB)
+ MOVW $380, R12
+ B runtime·callbackasm1(SB)
+ MOVW $381, R12
+ B runtime·callbackasm1(SB)
+ MOVW $382, R12
+ B runtime·callbackasm1(SB)
+ MOVW $383, R12
+ B runtime·callbackasm1(SB)
+ MOVW $384, R12
+ B runtime·callbackasm1(SB)
+ MOVW $385, R12
+ B runtime·callbackasm1(SB)
+ MOVW $386, R12
+ B runtime·callbackasm1(SB)
+ MOVW $387, R12
+ B runtime·callbackasm1(SB)
+ MOVW $388, R12
+ B runtime·callbackasm1(SB)
+ MOVW $389, R12
+ B runtime·callbackasm1(SB)
+ MOVW $390, R12
+ B runtime·callbackasm1(SB)
+ MOVW $391, R12
+ B runtime·callbackasm1(SB)
+ MOVW $392, R12
+ B runtime·callbackasm1(SB)
+ MOVW $393, R12
+ B runtime·callbackasm1(SB)
+ MOVW $394, R12
+ B runtime·callbackasm1(SB)
+ MOVW $395, R12
+ B runtime·callbackasm1(SB)
+ MOVW $396, R12
+ B runtime·callbackasm1(SB)
+ MOVW $397, R12
+ B runtime·callbackasm1(SB)
+ MOVW $398, R12
+ B runtime·callbackasm1(SB)
+ MOVW $399, R12
+ B runtime·callbackasm1(SB)
+ MOVW $400, R12
+ B runtime·callbackasm1(SB)
+ MOVW $401, R12
+ B runtime·callbackasm1(SB)
+ MOVW $402, R12
+ B runtime·callbackasm1(SB)
+ MOVW $403, R12
+ B runtime·callbackasm1(SB)
+ MOVW $404, R12
+ B runtime·callbackasm1(SB)
+ MOVW $405, R12
+ B runtime·callbackasm1(SB)
+ MOVW $406, R12
+ B runtime·callbackasm1(SB)
+ MOVW $407, R12
+ B runtime·callbackasm1(SB)
+ MOVW $408, R12
+ B runtime·callbackasm1(SB)
+ MOVW $409, R12
+ B runtime·callbackasm1(SB)
+ MOVW $410, R12
+ B runtime·callbackasm1(SB)
+ MOVW $411, R12
+ B runtime·callbackasm1(SB)
+ MOVW $412, R12
+ B runtime·callbackasm1(SB)
+ MOVW $413, R12
+ B runtime·callbackasm1(SB)
+ MOVW $414, R12
+ B runtime·callbackasm1(SB)
+ MOVW $415, R12
+ B runtime·callbackasm1(SB)
+ MOVW $416, R12
+ B runtime·callbackasm1(SB)
+ MOVW $417, R12
+ B runtime·callbackasm1(SB)
+ MOVW $418, R12
+ B runtime·callbackasm1(SB)
+ MOVW $419, R12
+ B runtime·callbackasm1(SB)
+ MOVW $420, R12
+ B runtime·callbackasm1(SB)
+ MOVW $421, R12
+ B runtime·callbackasm1(SB)
+ MOVW $422, R12
+ B runtime·callbackasm1(SB)
+ MOVW $423, R12
+ B runtime·callbackasm1(SB)
+ MOVW $424, R12
+ B runtime·callbackasm1(SB)
+ MOVW $425, R12
+ B runtime·callbackasm1(SB)
+ MOVW $426, R12
+ B runtime·callbackasm1(SB)
+ MOVW $427, R12
+ B runtime·callbackasm1(SB)
+ MOVW $428, R12
+ B runtime·callbackasm1(SB)
+ MOVW $429, R12
+ B runtime·callbackasm1(SB)
+ MOVW $430, R12
+ B runtime·callbackasm1(SB)
+ MOVW $431, R12
+ B runtime·callbackasm1(SB)
+ MOVW $432, R12
+ B runtime·callbackasm1(SB)
+ MOVW $433, R12
+ B runtime·callbackasm1(SB)
+ MOVW $434, R12
+ B runtime·callbackasm1(SB)
+ MOVW $435, R12
+ B runtime·callbackasm1(SB)
+ MOVW $436, R12
+ B runtime·callbackasm1(SB)
+ MOVW $437, R12
+ B runtime·callbackasm1(SB)
+ MOVW $438, R12
+ B runtime·callbackasm1(SB)
+ MOVW $439, R12
+ B runtime·callbackasm1(SB)
+ MOVW $440, R12
+ B runtime·callbackasm1(SB)
+ MOVW $441, R12
+ B runtime·callbackasm1(SB)
+ MOVW $442, R12
+ B runtime·callbackasm1(SB)
+ MOVW $443, R12
+ B runtime·callbackasm1(SB)
+ MOVW $444, R12
+ B runtime·callbackasm1(SB)
+ MOVW $445, R12
+ B runtime·callbackasm1(SB)
+ MOVW $446, R12
+ B runtime·callbackasm1(SB)
+ MOVW $447, R12
+ B runtime·callbackasm1(SB)
+ MOVW $448, R12
+ B runtime·callbackasm1(SB)
+ MOVW $449, R12
+ B runtime·callbackasm1(SB)
+ MOVW $450, R12
+ B runtime·callbackasm1(SB)
+ MOVW $451, R12
+ B runtime·callbackasm1(SB)
+ MOVW $452, R12
+ B runtime·callbackasm1(SB)
+ MOVW $453, R12
+ B runtime·callbackasm1(SB)
+ MOVW $454, R12
+ B runtime·callbackasm1(SB)
+ MOVW $455, R12
+ B runtime·callbackasm1(SB)
+ MOVW $456, R12
+ B runtime·callbackasm1(SB)
+ MOVW $457, R12
+ B runtime·callbackasm1(SB)
+ MOVW $458, R12
+ B runtime·callbackasm1(SB)
+ MOVW $459, R12
+ B runtime·callbackasm1(SB)
+ MOVW $460, R12
+ B runtime·callbackasm1(SB)
+ MOVW $461, R12
+ B runtime·callbackasm1(SB)
+ MOVW $462, R12
+ B runtime·callbackasm1(SB)
+ MOVW $463, R12
+ B runtime·callbackasm1(SB)
+ MOVW $464, R12
+ B runtime·callbackasm1(SB)
+ MOVW $465, R12
+ B runtime·callbackasm1(SB)
+ MOVW $466, R12
+ B runtime·callbackasm1(SB)
+ MOVW $467, R12
+ B runtime·callbackasm1(SB)
+ MOVW $468, R12
+ B runtime·callbackasm1(SB)
+ MOVW $469, R12
+ B runtime·callbackasm1(SB)
+ MOVW $470, R12
+ B runtime·callbackasm1(SB)
+ MOVW $471, R12
+ B runtime·callbackasm1(SB)
+ MOVW $472, R12
+ B runtime·callbackasm1(SB)
+ MOVW $473, R12
+ B runtime·callbackasm1(SB)
+ MOVW $474, R12
+ B runtime·callbackasm1(SB)
+ MOVW $475, R12
+ B runtime·callbackasm1(SB)
+ MOVW $476, R12
+ B runtime·callbackasm1(SB)
+ MOVW $477, R12
+ B runtime·callbackasm1(SB)
+ MOVW $478, R12
+ B runtime·callbackasm1(SB)
+ MOVW $479, R12
+ B runtime·callbackasm1(SB)
+ MOVW $480, R12
+ B runtime·callbackasm1(SB)
+ MOVW $481, R12
+ B runtime·callbackasm1(SB)
+ MOVW $482, R12
+ B runtime·callbackasm1(SB)
+ MOVW $483, R12
+ B runtime·callbackasm1(SB)
+ MOVW $484, R12
+ B runtime·callbackasm1(SB)
+ MOVW $485, R12
+ B runtime·callbackasm1(SB)
+ MOVW $486, R12
+ B runtime·callbackasm1(SB)
+ MOVW $487, R12
+ B runtime·callbackasm1(SB)
+ MOVW $488, R12
+ B runtime·callbackasm1(SB)
+ MOVW $489, R12
+ B runtime·callbackasm1(SB)
+ MOVW $490, R12
+ B runtime·callbackasm1(SB)
+ MOVW $491, R12
+ B runtime·callbackasm1(SB)
+ MOVW $492, R12
+ B runtime·callbackasm1(SB)
+ MOVW $493, R12
+ B runtime·callbackasm1(SB)
+ MOVW $494, R12
+ B runtime·callbackasm1(SB)
+ MOVW $495, R12
+ B runtime·callbackasm1(SB)
+ MOVW $496, R12
+ B runtime·callbackasm1(SB)
+ MOVW $497, R12
+ B runtime·callbackasm1(SB)
+ MOVW $498, R12
+ B runtime·callbackasm1(SB)
+ MOVW $499, R12
+ B runtime·callbackasm1(SB)
+ MOVW $500, R12
+ B runtime·callbackasm1(SB)
+ MOVW $501, R12
+ B runtime·callbackasm1(SB)
+ MOVW $502, R12
+ B runtime·callbackasm1(SB)
+ MOVW $503, R12
+ B runtime·callbackasm1(SB)
+ MOVW $504, R12
+ B runtime·callbackasm1(SB)
+ MOVW $505, R12
+ B runtime·callbackasm1(SB)
+ MOVW $506, R12
+ B runtime·callbackasm1(SB)
+ MOVW $507, R12
+ B runtime·callbackasm1(SB)
+ MOVW $508, R12
+ B runtime·callbackasm1(SB)
+ MOVW $509, R12
+ B runtime·callbackasm1(SB)
+ MOVW $510, R12
+ B runtime·callbackasm1(SB)
+ MOVW $511, R12
+ B runtime·callbackasm1(SB)
+ MOVW $512, R12
+ B runtime·callbackasm1(SB)
+ MOVW $513, R12
+ B runtime·callbackasm1(SB)
+ MOVW $514, R12
+ B runtime·callbackasm1(SB)
+ MOVW $515, R12
+ B runtime·callbackasm1(SB)
+ MOVW $516, R12
+ B runtime·callbackasm1(SB)
+ MOVW $517, R12
+ B runtime·callbackasm1(SB)
+ MOVW $518, R12
+ B runtime·callbackasm1(SB)
+ MOVW $519, R12
+ B runtime·callbackasm1(SB)
+ MOVW $520, R12
+ B runtime·callbackasm1(SB)
+ MOVW $521, R12
+ B runtime·callbackasm1(SB)
+ MOVW $522, R12
+ B runtime·callbackasm1(SB)
+ MOVW $523, R12
+ B runtime·callbackasm1(SB)
+ MOVW $524, R12
+ B runtime·callbackasm1(SB)
+ MOVW $525, R12
+ B runtime·callbackasm1(SB)
+ MOVW $526, R12
+ B runtime·callbackasm1(SB)
+ MOVW $527, R12
+ B runtime·callbackasm1(SB)
+ MOVW $528, R12
+ B runtime·callbackasm1(SB)
+ MOVW $529, R12
+ B runtime·callbackasm1(SB)
+ MOVW $530, R12
+ B runtime·callbackasm1(SB)
+ MOVW $531, R12
+ B runtime·callbackasm1(SB)
+ MOVW $532, R12
+ B runtime·callbackasm1(SB)
+ MOVW $533, R12
+ B runtime·callbackasm1(SB)
+ MOVW $534, R12
+ B runtime·callbackasm1(SB)
+ MOVW $535, R12
+ B runtime·callbackasm1(SB)
+ MOVW $536, R12
+ B runtime·callbackasm1(SB)
+ MOVW $537, R12
+ B runtime·callbackasm1(SB)
+ MOVW $538, R12
+ B runtime·callbackasm1(SB)
+ MOVW $539, R12
+ B runtime·callbackasm1(SB)
+ MOVW $540, R12
+ B runtime·callbackasm1(SB)
+ MOVW $541, R12
+ B runtime·callbackasm1(SB)
+ MOVW $542, R12
+ B runtime·callbackasm1(SB)
+ MOVW $543, R12
+ B runtime·callbackasm1(SB)
+ MOVW $544, R12
+ B runtime·callbackasm1(SB)
+ MOVW $545, R12
+ B runtime·callbackasm1(SB)
+ MOVW $546, R12
+ B runtime·callbackasm1(SB)
+ MOVW $547, R12
+ B runtime·callbackasm1(SB)
+ MOVW $548, R12
+ B runtime·callbackasm1(SB)
+ MOVW $549, R12
+ B runtime·callbackasm1(SB)
+ MOVW $550, R12
+ B runtime·callbackasm1(SB)
+ MOVW $551, R12
+ B runtime·callbackasm1(SB)
+ MOVW $552, R12
+ B runtime·callbackasm1(SB)
+ MOVW $553, R12
+ B runtime·callbackasm1(SB)
+ MOVW $554, R12
+ B runtime·callbackasm1(SB)
+ MOVW $555, R12
+ B runtime·callbackasm1(SB)
+ MOVW $556, R12
+ B runtime·callbackasm1(SB)
+ MOVW $557, R12
+ B runtime·callbackasm1(SB)
+ MOVW $558, R12
+ B runtime·callbackasm1(SB)
+ MOVW $559, R12
+ B runtime·callbackasm1(SB)
+ MOVW $560, R12
+ B runtime·callbackasm1(SB)
+ MOVW $561, R12
+ B runtime·callbackasm1(SB)
+ MOVW $562, R12
+ B runtime·callbackasm1(SB)
+ MOVW $563, R12
+ B runtime·callbackasm1(SB)
+ MOVW $564, R12
+ B runtime·callbackasm1(SB)
+ MOVW $565, R12
+ B runtime·callbackasm1(SB)
+ MOVW $566, R12
+ B runtime·callbackasm1(SB)
+ MOVW $567, R12
+ B runtime·callbackasm1(SB)
+ MOVW $568, R12
+ B runtime·callbackasm1(SB)
+ MOVW $569, R12
+ B runtime·callbackasm1(SB)
+ MOVW $570, R12
+ B runtime·callbackasm1(SB)
+ MOVW $571, R12
+ B runtime·callbackasm1(SB)
+ MOVW $572, R12
+ B runtime·callbackasm1(SB)
+ MOVW $573, R12
+ B runtime·callbackasm1(SB)
+ MOVW $574, R12
+ B runtime·callbackasm1(SB)
+ MOVW $575, R12
+ B runtime·callbackasm1(SB)
+ MOVW $576, R12
+ B runtime·callbackasm1(SB)
+ MOVW $577, R12
+ B runtime·callbackasm1(SB)
+ MOVW $578, R12
+ B runtime·callbackasm1(SB)
+ MOVW $579, R12
+ B runtime·callbackasm1(SB)
+ MOVW $580, R12
+ B runtime·callbackasm1(SB)
+ MOVW $581, R12
+ B runtime·callbackasm1(SB)
+ MOVW $582, R12
+ B runtime·callbackasm1(SB)
+ MOVW $583, R12
+ B runtime·callbackasm1(SB)
+ MOVW $584, R12
+ B runtime·callbackasm1(SB)
+ MOVW $585, R12
+ B runtime·callbackasm1(SB)
+ MOVW $586, R12
+ B runtime·callbackasm1(SB)
+ MOVW $587, R12
+ B runtime·callbackasm1(SB)
+ MOVW $588, R12
+ B runtime·callbackasm1(SB)
+ MOVW $589, R12
+ B runtime·callbackasm1(SB)
+ MOVW $590, R12
+ B runtime·callbackasm1(SB)
+ MOVW $591, R12
+ B runtime·callbackasm1(SB)
+ MOVW $592, R12
+ B runtime·callbackasm1(SB)
+ MOVW $593, R12
+ B runtime·callbackasm1(SB)
+ MOVW $594, R12
+ B runtime·callbackasm1(SB)
+ MOVW $595, R12
+ B runtime·callbackasm1(SB)
+ MOVW $596, R12
+ B runtime·callbackasm1(SB)
+ MOVW $597, R12
+ B runtime·callbackasm1(SB)
+ MOVW $598, R12
+ B runtime·callbackasm1(SB)
+ MOVW $599, R12
+ B runtime·callbackasm1(SB)
+ MOVW $600, R12
+ B runtime·callbackasm1(SB)
+ MOVW $601, R12
+ B runtime·callbackasm1(SB)
+ MOVW $602, R12
+ B runtime·callbackasm1(SB)
+ MOVW $603, R12
+ B runtime·callbackasm1(SB)
+ MOVW $604, R12
+ B runtime·callbackasm1(SB)
+ MOVW $605, R12
+ B runtime·callbackasm1(SB)
+ MOVW $606, R12
+ B runtime·callbackasm1(SB)
+ MOVW $607, R12
+ B runtime·callbackasm1(SB)
+ MOVW $608, R12
+ B runtime·callbackasm1(SB)
+ MOVW $609, R12
+ B runtime·callbackasm1(SB)
+ MOVW $610, R12
+ B runtime·callbackasm1(SB)
+ MOVW $611, R12
+ B runtime·callbackasm1(SB)
+ MOVW $612, R12
+ B runtime·callbackasm1(SB)
+ MOVW $613, R12
+ B runtime·callbackasm1(SB)
+ MOVW $614, R12
+ B runtime·callbackasm1(SB)
+ MOVW $615, R12
+ B runtime·callbackasm1(SB)
+ MOVW $616, R12
+ B runtime·callbackasm1(SB)
+ MOVW $617, R12
+ B runtime·callbackasm1(SB)
+ MOVW $618, R12
+ B runtime·callbackasm1(SB)
+ MOVW $619, R12
+ B runtime·callbackasm1(SB)
+ MOVW $620, R12
+ B runtime·callbackasm1(SB)
+ MOVW $621, R12
+ B runtime·callbackasm1(SB)
+ MOVW $622, R12
+ B runtime·callbackasm1(SB)
+ MOVW $623, R12
+ B runtime·callbackasm1(SB)
+ MOVW $624, R12
+ B runtime·callbackasm1(SB)
+ MOVW $625, R12
+ B runtime·callbackasm1(SB)
+ MOVW $626, R12
+ B runtime·callbackasm1(SB)
+ MOVW $627, R12
+ B runtime·callbackasm1(SB)
+ MOVW $628, R12
+ B runtime·callbackasm1(SB)
+ MOVW $629, R12
+ B runtime·callbackasm1(SB)
+ MOVW $630, R12
+ B runtime·callbackasm1(SB)
+ MOVW $631, R12
+ B runtime·callbackasm1(SB)
+ MOVW $632, R12
+ B runtime·callbackasm1(SB)
+ MOVW $633, R12
+ B runtime·callbackasm1(SB)
+ MOVW $634, R12
+ B runtime·callbackasm1(SB)
+ MOVW $635, R12
+ B runtime·callbackasm1(SB)
+ MOVW $636, R12
+ B runtime·callbackasm1(SB)
+ MOVW $637, R12
+ B runtime·callbackasm1(SB)
+ MOVW $638, R12
+ B runtime·callbackasm1(SB)
+ MOVW $639, R12
+ B runtime·callbackasm1(SB)
+ MOVW $640, R12
+ B runtime·callbackasm1(SB)
+ MOVW $641, R12
+ B runtime·callbackasm1(SB)
+ MOVW $642, R12
+ B runtime·callbackasm1(SB)
+ MOVW $643, R12
+ B runtime·callbackasm1(SB)
+ MOVW $644, R12
+ B runtime·callbackasm1(SB)
+ MOVW $645, R12
+ B runtime·callbackasm1(SB)
+ MOVW $646, R12
+ B runtime·callbackasm1(SB)
+ MOVW $647, R12
+ B runtime·callbackasm1(SB)
+ MOVW $648, R12
+ B runtime·callbackasm1(SB)
+ MOVW $649, R12
+ B runtime·callbackasm1(SB)
+ MOVW $650, R12
+ B runtime·callbackasm1(SB)
+ MOVW $651, R12
+ B runtime·callbackasm1(SB)
+ MOVW $652, R12
+ B runtime·callbackasm1(SB)
+ MOVW $653, R12
+ B runtime·callbackasm1(SB)
+ MOVW $654, R12
+ B runtime·callbackasm1(SB)
+ MOVW $655, R12
+ B runtime·callbackasm1(SB)
+ MOVW $656, R12
+ B runtime·callbackasm1(SB)
+ MOVW $657, R12
+ B runtime·callbackasm1(SB)
+ MOVW $658, R12
+ B runtime·callbackasm1(SB)
+ MOVW $659, R12
+ B runtime·callbackasm1(SB)
+ MOVW $660, R12
+ B runtime·callbackasm1(SB)
+ MOVW $661, R12
+ B runtime·callbackasm1(SB)
+ MOVW $662, R12
+ B runtime·callbackasm1(SB)
+ MOVW $663, R12
+ B runtime·callbackasm1(SB)
+ MOVW $664, R12
+ B runtime·callbackasm1(SB)
+ MOVW $665, R12
+ B runtime·callbackasm1(SB)
+ MOVW $666, R12
+ B runtime·callbackasm1(SB)
+ MOVW $667, R12
+ B runtime·callbackasm1(SB)
+ MOVW $668, R12
+ B runtime·callbackasm1(SB)
+ MOVW $669, R12
+ B runtime·callbackasm1(SB)
+ MOVW $670, R12
+ B runtime·callbackasm1(SB)
+ MOVW $671, R12
+ B runtime·callbackasm1(SB)
+ MOVW $672, R12
+ B runtime·callbackasm1(SB)
+ MOVW $673, R12
+ B runtime·callbackasm1(SB)
+ MOVW $674, R12
+ B runtime·callbackasm1(SB)
+ MOVW $675, R12
+ B runtime·callbackasm1(SB)
+ MOVW $676, R12
+ B runtime·callbackasm1(SB)
+ MOVW $677, R12
+ B runtime·callbackasm1(SB)
+ MOVW $678, R12
+ B runtime·callbackasm1(SB)
+ MOVW $679, R12
+ B runtime·callbackasm1(SB)
+ MOVW $680, R12
+ B runtime·callbackasm1(SB)
+ MOVW $681, R12
+ B runtime·callbackasm1(SB)
+ MOVW $682, R12
+ B runtime·callbackasm1(SB)
+ MOVW $683, R12
+ B runtime·callbackasm1(SB)
+ MOVW $684, R12
+ B runtime·callbackasm1(SB)
+ MOVW $685, R12
+ B runtime·callbackasm1(SB)
+ MOVW $686, R12
+ B runtime·callbackasm1(SB)
+ MOVW $687, R12
+ B runtime·callbackasm1(SB)
+ MOVW $688, R12
+ B runtime·callbackasm1(SB)
+ MOVW $689, R12
+ B runtime·callbackasm1(SB)
+ MOVW $690, R12
+ B runtime·callbackasm1(SB)
+ MOVW $691, R12
+ B runtime·callbackasm1(SB)
+ MOVW $692, R12
+ B runtime·callbackasm1(SB)
+ MOVW $693, R12
+ B runtime·callbackasm1(SB)
+ MOVW $694, R12
+ B runtime·callbackasm1(SB)
+ MOVW $695, R12
+ B runtime·callbackasm1(SB)
+ MOVW $696, R12
+ B runtime·callbackasm1(SB)
+ MOVW $697, R12
+ B runtime·callbackasm1(SB)
+ MOVW $698, R12
+ B runtime·callbackasm1(SB)
+ MOVW $699, R12
+ B runtime·callbackasm1(SB)
+ MOVW $700, R12
+ B runtime·callbackasm1(SB)
+ MOVW $701, R12
+ B runtime·callbackasm1(SB)
+ MOVW $702, R12
+ B runtime·callbackasm1(SB)
+ MOVW $703, R12
+ B runtime·callbackasm1(SB)
+ MOVW $704, R12
+ B runtime·callbackasm1(SB)
+ MOVW $705, R12
+ B runtime·callbackasm1(SB)
+ MOVW $706, R12
+ B runtime·callbackasm1(SB)
+ MOVW $707, R12
+ B runtime·callbackasm1(SB)
+ MOVW $708, R12
+ B runtime·callbackasm1(SB)
+ MOVW $709, R12
+ B runtime·callbackasm1(SB)
+ MOVW $710, R12
+ B runtime·callbackasm1(SB)
+ MOVW $711, R12
+ B runtime·callbackasm1(SB)
+ MOVW $712, R12
+ B runtime·callbackasm1(SB)
+ MOVW $713, R12
+ B runtime·callbackasm1(SB)
+ MOVW $714, R12
+ B runtime·callbackasm1(SB)
+ MOVW $715, R12
+ B runtime·callbackasm1(SB)
+ MOVW $716, R12
+ B runtime·callbackasm1(SB)
+ MOVW $717, R12
+ B runtime·callbackasm1(SB)
+ MOVW $718, R12
+ B runtime·callbackasm1(SB)
+ MOVW $719, R12
+ B runtime·callbackasm1(SB)
+ MOVW $720, R12
+ B runtime·callbackasm1(SB)
+ MOVW $721, R12
+ B runtime·callbackasm1(SB)
+ MOVW $722, R12
+ B runtime·callbackasm1(SB)
+ MOVW $723, R12
+ B runtime·callbackasm1(SB)
+ MOVW $724, R12
+ B runtime·callbackasm1(SB)
+ MOVW $725, R12
+ B runtime·callbackasm1(SB)
+ MOVW $726, R12
+ B runtime·callbackasm1(SB)
+ MOVW $727, R12
+ B runtime·callbackasm1(SB)
+ MOVW $728, R12
+ B runtime·callbackasm1(SB)
+ MOVW $729, R12
+ B runtime·callbackasm1(SB)
+ MOVW $730, R12
+ B runtime·callbackasm1(SB)
+ MOVW $731, R12
+ B runtime·callbackasm1(SB)
+ MOVW $732, R12
+ B runtime·callbackasm1(SB)
+ MOVW $733, R12
+ B runtime·callbackasm1(SB)
+ MOVW $734, R12
+ B runtime·callbackasm1(SB)
+ MOVW $735, R12
+ B runtime·callbackasm1(SB)
+ MOVW $736, R12
+ B runtime·callbackasm1(SB)
+ MOVW $737, R12
+ B runtime·callbackasm1(SB)
+ MOVW $738, R12
+ B runtime·callbackasm1(SB)
+ MOVW $739, R12
+ B runtime·callbackasm1(SB)
+ MOVW $740, R12
+ B runtime·callbackasm1(SB)
+ MOVW $741, R12
+ B runtime·callbackasm1(SB)
+ MOVW $742, R12
+ B runtime·callbackasm1(SB)
+ MOVW $743, R12
+ B runtime·callbackasm1(SB)
+ MOVW $744, R12
+ B runtime·callbackasm1(SB)
+ MOVW $745, R12
+ B runtime·callbackasm1(SB)
+ MOVW $746, R12
+ B runtime·callbackasm1(SB)
+ MOVW $747, R12
+ B runtime·callbackasm1(SB)
+ MOVW $748, R12
+ B runtime·callbackasm1(SB)
+ MOVW $749, R12
+ B runtime·callbackasm1(SB)
+ MOVW $750, R12
+ B runtime·callbackasm1(SB)
+ MOVW $751, R12
+ B runtime·callbackasm1(SB)
+ MOVW $752, R12
+ B runtime·callbackasm1(SB)
+ MOVW $753, R12
+ B runtime·callbackasm1(SB)
+ MOVW $754, R12
+ B runtime·callbackasm1(SB)
+ MOVW $755, R12
+ B runtime·callbackasm1(SB)
+ MOVW $756, R12
+ B runtime·callbackasm1(SB)
+ MOVW $757, R12
+ B runtime·callbackasm1(SB)
+ MOVW $758, R12
+ B runtime·callbackasm1(SB)
+ MOVW $759, R12
+ B runtime·callbackasm1(SB)
+ MOVW $760, R12
+ B runtime·callbackasm1(SB)
+ MOVW $761, R12
+ B runtime·callbackasm1(SB)
+ MOVW $762, R12
+ B runtime·callbackasm1(SB)
+ MOVW $763, R12
+ B runtime·callbackasm1(SB)
+ MOVW $764, R12
+ B runtime·callbackasm1(SB)
+ MOVW $765, R12
+ B runtime·callbackasm1(SB)
+ MOVW $766, R12
+ B runtime·callbackasm1(SB)
+ MOVW $767, R12
+ B runtime·callbackasm1(SB)
+ MOVW $768, R12
+ B runtime·callbackasm1(SB)
+ MOVW $769, R12
+ B runtime·callbackasm1(SB)
+ MOVW $770, R12
+ B runtime·callbackasm1(SB)
+ MOVW $771, R12
+ B runtime·callbackasm1(SB)
+ MOVW $772, R12
+ B runtime·callbackasm1(SB)
+ MOVW $773, R12
+ B runtime·callbackasm1(SB)
+ MOVW $774, R12
+ B runtime·callbackasm1(SB)
+ MOVW $775, R12
+ B runtime·callbackasm1(SB)
+ MOVW $776, R12
+ B runtime·callbackasm1(SB)
+ MOVW $777, R12
+ B runtime·callbackasm1(SB)
+ MOVW $778, R12
+ B runtime·callbackasm1(SB)
+ MOVW $779, R12
+ B runtime·callbackasm1(SB)
+ MOVW $780, R12
+ B runtime·callbackasm1(SB)
+ MOVW $781, R12
+ B runtime·callbackasm1(SB)
+ MOVW $782, R12
+ B runtime·callbackasm1(SB)
+ MOVW $783, R12
+ B runtime·callbackasm1(SB)
+ MOVW $784, R12
+ B runtime·callbackasm1(SB)
+ MOVW $785, R12
+ B runtime·callbackasm1(SB)
+ MOVW $786, R12
+ B runtime·callbackasm1(SB)
+ MOVW $787, R12
+ B runtime·callbackasm1(SB)
+ MOVW $788, R12
+ B runtime·callbackasm1(SB)
+ MOVW $789, R12
+ B runtime·callbackasm1(SB)
+ MOVW $790, R12
+ B runtime·callbackasm1(SB)
+ MOVW $791, R12
+ B runtime·callbackasm1(SB)
+ MOVW $792, R12
+ B runtime·callbackasm1(SB)
+ MOVW $793, R12
+ B runtime·callbackasm1(SB)
+ MOVW $794, R12
+ B runtime·callbackasm1(SB)
+ MOVW $795, R12
+ B runtime·callbackasm1(SB)
+ MOVW $796, R12
+ B runtime·callbackasm1(SB)
+ MOVW $797, R12
+ B runtime·callbackasm1(SB)
+ MOVW $798, R12
+ B runtime·callbackasm1(SB)
+ MOVW $799, R12
+ B runtime·callbackasm1(SB)
+ MOVW $800, R12
+ B runtime·callbackasm1(SB)
+ MOVW $801, R12
+ B runtime·callbackasm1(SB)
+ MOVW $802, R12
+ B runtime·callbackasm1(SB)
+ MOVW $803, R12
+ B runtime·callbackasm1(SB)
+ MOVW $804, R12
+ B runtime·callbackasm1(SB)
+ MOVW $805, R12
+ B runtime·callbackasm1(SB)
+ MOVW $806, R12
+ B runtime·callbackasm1(SB)
+ MOVW $807, R12
+ B runtime·callbackasm1(SB)
+ MOVW $808, R12
+ B runtime·callbackasm1(SB)
+ MOVW $809, R12
+ B runtime·callbackasm1(SB)
+ MOVW $810, R12
+ B runtime·callbackasm1(SB)
+ MOVW $811, R12
+ B runtime·callbackasm1(SB)
+ MOVW $812, R12
+ B runtime·callbackasm1(SB)
+ MOVW $813, R12
+ B runtime·callbackasm1(SB)
+ MOVW $814, R12
+ B runtime·callbackasm1(SB)
+ MOVW $815, R12
+ B runtime·callbackasm1(SB)
+ MOVW $816, R12
+ B runtime·callbackasm1(SB)
+ MOVW $817, R12
+ B runtime·callbackasm1(SB)
+ MOVW $818, R12
+ B runtime·callbackasm1(SB)
+ MOVW $819, R12
+ B runtime·callbackasm1(SB)
+ MOVW $820, R12
+ B runtime·callbackasm1(SB)
+ MOVW $821, R12
+ B runtime·callbackasm1(SB)
+ MOVW $822, R12
+ B runtime·callbackasm1(SB)
+ MOVW $823, R12
+ B runtime·callbackasm1(SB)
+ MOVW $824, R12
+ B runtime·callbackasm1(SB)
+ MOVW $825, R12
+ B runtime·callbackasm1(SB)
+ MOVW $826, R12
+ B runtime·callbackasm1(SB)
+ MOVW $827, R12
+ B runtime·callbackasm1(SB)
+ MOVW $828, R12
+ B runtime·callbackasm1(SB)
+ MOVW $829, R12
+ B runtime·callbackasm1(SB)
+ MOVW $830, R12
+ B runtime·callbackasm1(SB)
+ MOVW $831, R12
+ B runtime·callbackasm1(SB)
+ MOVW $832, R12
+ B runtime·callbackasm1(SB)
+ MOVW $833, R12
+ B runtime·callbackasm1(SB)
+ MOVW $834, R12
+ B runtime·callbackasm1(SB)
+ MOVW $835, R12
+ B runtime·callbackasm1(SB)
+ MOVW $836, R12
+ B runtime·callbackasm1(SB)
+ MOVW $837, R12
+ B runtime·callbackasm1(SB)
+ MOVW $838, R12
+ B runtime·callbackasm1(SB)
+ MOVW $839, R12
+ B runtime·callbackasm1(SB)
+ MOVW $840, R12
+ B runtime·callbackasm1(SB)
+ MOVW $841, R12
+ B runtime·callbackasm1(SB)
+ MOVW $842, R12
+ B runtime·callbackasm1(SB)
+ MOVW $843, R12
+ B runtime·callbackasm1(SB)
+ MOVW $844, R12
+ B runtime·callbackasm1(SB)
+ MOVW $845, R12
+ B runtime·callbackasm1(SB)
+ MOVW $846, R12
+ B runtime·callbackasm1(SB)
+ MOVW $847, R12
+ B runtime·callbackasm1(SB)
+ MOVW $848, R12
+ B runtime·callbackasm1(SB)
+ MOVW $849, R12
+ B runtime·callbackasm1(SB)
+ MOVW $850, R12
+ B runtime·callbackasm1(SB)
+ MOVW $851, R12
+ B runtime·callbackasm1(SB)
+ MOVW $852, R12
+ B runtime·callbackasm1(SB)
+ MOVW $853, R12
+ B runtime·callbackasm1(SB)
+ MOVW $854, R12
+ B runtime·callbackasm1(SB)
+ MOVW $855, R12
+ B runtime·callbackasm1(SB)
+ MOVW $856, R12
+ B runtime·callbackasm1(SB)
+ MOVW $857, R12
+ B runtime·callbackasm1(SB)
+ MOVW $858, R12
+ B runtime·callbackasm1(SB)
+ MOVW $859, R12
+ B runtime·callbackasm1(SB)
+ MOVW $860, R12
+ B runtime·callbackasm1(SB)
+ MOVW $861, R12
+ B runtime·callbackasm1(SB)
+ MOVW $862, R12
+ B runtime·callbackasm1(SB)
+ MOVW $863, R12
+ B runtime·callbackasm1(SB)
+ MOVW $864, R12
+ B runtime·callbackasm1(SB)
+ MOVW $865, R12
+ B runtime·callbackasm1(SB)
+ MOVW $866, R12
+ B runtime·callbackasm1(SB)
+ MOVW $867, R12
+ B runtime·callbackasm1(SB)
+ MOVW $868, R12
+ B runtime·callbackasm1(SB)
+ MOVW $869, R12
+ B runtime·callbackasm1(SB)
+ MOVW $870, R12
+ B runtime·callbackasm1(SB)
+ MOVW $871, R12
+ B runtime·callbackasm1(SB)
+ MOVW $872, R12
+ B runtime·callbackasm1(SB)
+ MOVW $873, R12
+ B runtime·callbackasm1(SB)
+ MOVW $874, R12
+ B runtime·callbackasm1(SB)
+ MOVW $875, R12
+ B runtime·callbackasm1(SB)
+ MOVW $876, R12
+ B runtime·callbackasm1(SB)
+ MOVW $877, R12
+ B runtime·callbackasm1(SB)
+ MOVW $878, R12
+ B runtime·callbackasm1(SB)
+ MOVW $879, R12
+ B runtime·callbackasm1(SB)
+ MOVW $880, R12
+ B runtime·callbackasm1(SB)
+ MOVW $881, R12
+ B runtime·callbackasm1(SB)
+ MOVW $882, R12
+ B runtime·callbackasm1(SB)
+ MOVW $883, R12
+ B runtime·callbackasm1(SB)
+ MOVW $884, R12
+ B runtime·callbackasm1(SB)
+ MOVW $885, R12
+ B runtime·callbackasm1(SB)
+ MOVW $886, R12
+ B runtime·callbackasm1(SB)
+ MOVW $887, R12
+ B runtime·callbackasm1(SB)
+ MOVW $888, R12
+ B runtime·callbackasm1(SB)
+ MOVW $889, R12
+ B runtime·callbackasm1(SB)
+ MOVW $890, R12
+ B runtime·callbackasm1(SB)
+ MOVW $891, R12
+ B runtime·callbackasm1(SB)
+ MOVW $892, R12
+ B runtime·callbackasm1(SB)
+ MOVW $893, R12
+ B runtime·callbackasm1(SB)
+ MOVW $894, R12
+ B runtime·callbackasm1(SB)
+ MOVW $895, R12
+ B runtime·callbackasm1(SB)
+ MOVW $896, R12
+ B runtime·callbackasm1(SB)
+ MOVW $897, R12
+ B runtime·callbackasm1(SB)
+ MOVW $898, R12
+ B runtime·callbackasm1(SB)
+ MOVW $899, R12
+ B runtime·callbackasm1(SB)
+ MOVW $900, R12
+ B runtime·callbackasm1(SB)
+ MOVW $901, R12
+ B runtime·callbackasm1(SB)
+ MOVW $902, R12
+ B runtime·callbackasm1(SB)
+ MOVW $903, R12
+ B runtime·callbackasm1(SB)
+ MOVW $904, R12
+ B runtime·callbackasm1(SB)
+ MOVW $905, R12
+ B runtime·callbackasm1(SB)
+ MOVW $906, R12
+ B runtime·callbackasm1(SB)
+ MOVW $907, R12
+ B runtime·callbackasm1(SB)
+ MOVW $908, R12
+ B runtime·callbackasm1(SB)
+ MOVW $909, R12
+ B runtime·callbackasm1(SB)
+ MOVW $910, R12
+ B runtime·callbackasm1(SB)
+ MOVW $911, R12
+ B runtime·callbackasm1(SB)
+ MOVW $912, R12
+ B runtime·callbackasm1(SB)
+ MOVW $913, R12
+ B runtime·callbackasm1(SB)
+ MOVW $914, R12
+ B runtime·callbackasm1(SB)
+ MOVW $915, R12
+ B runtime·callbackasm1(SB)
+ MOVW $916, R12
+ B runtime·callbackasm1(SB)
+ MOVW $917, R12
+ B runtime·callbackasm1(SB)
+ MOVW $918, R12
+ B runtime·callbackasm1(SB)
+ MOVW $919, R12
+ B runtime·callbackasm1(SB)
+ MOVW $920, R12
+ B runtime·callbackasm1(SB)
+ MOVW $921, R12
+ B runtime·callbackasm1(SB)
+ MOVW $922, R12
+ B runtime·callbackasm1(SB)
+ MOVW $923, R12
+ B runtime·callbackasm1(SB)
+ MOVW $924, R12
+ B runtime·callbackasm1(SB)
+ MOVW $925, R12
+ B runtime·callbackasm1(SB)
+ MOVW $926, R12
+ B runtime·callbackasm1(SB)
+ MOVW $927, R12
+ B runtime·callbackasm1(SB)
+ MOVW $928, R12
+ B runtime·callbackasm1(SB)
+ MOVW $929, R12
+ B runtime·callbackasm1(SB)
+ MOVW $930, R12
+ B runtime·callbackasm1(SB)
+ MOVW $931, R12
+ B runtime·callbackasm1(SB)
+ MOVW $932, R12
+ B runtime·callbackasm1(SB)
+ MOVW $933, R12
+ B runtime·callbackasm1(SB)
+ MOVW $934, R12
+ B runtime·callbackasm1(SB)
+ MOVW $935, R12
+ B runtime·callbackasm1(SB)
+ MOVW $936, R12
+ B runtime·callbackasm1(SB)
+ MOVW $937, R12
+ B runtime·callbackasm1(SB)
+ MOVW $938, R12
+ B runtime·callbackasm1(SB)
+ MOVW $939, R12
+ B runtime·callbackasm1(SB)
+ MOVW $940, R12
+ B runtime·callbackasm1(SB)
+ MOVW $941, R12
+ B runtime·callbackasm1(SB)
+ MOVW $942, R12
+ B runtime·callbackasm1(SB)
+ MOVW $943, R12
+ B runtime·callbackasm1(SB)
+ MOVW $944, R12
+ B runtime·callbackasm1(SB)
+ MOVW $945, R12
+ B runtime·callbackasm1(SB)
+ MOVW $946, R12
+ B runtime·callbackasm1(SB)
+ MOVW $947, R12
+ B runtime·callbackasm1(SB)
+ MOVW $948, R12
+ B runtime·callbackasm1(SB)
+ MOVW $949, R12
+ B runtime·callbackasm1(SB)
+ MOVW $950, R12
+ B runtime·callbackasm1(SB)
+ MOVW $951, R12
+ B runtime·callbackasm1(SB)
+ MOVW $952, R12
+ B runtime·callbackasm1(SB)
+ MOVW $953, R12
+ B runtime·callbackasm1(SB)
+ MOVW $954, R12
+ B runtime·callbackasm1(SB)
+ MOVW $955, R12
+ B runtime·callbackasm1(SB)
+ MOVW $956, R12
+ B runtime·callbackasm1(SB)
+ MOVW $957, R12
+ B runtime·callbackasm1(SB)
+ MOVW $958, R12
+ B runtime·callbackasm1(SB)
+ MOVW $959, R12
+ B runtime·callbackasm1(SB)
+ MOVW $960, R12
+ B runtime·callbackasm1(SB)
+ MOVW $961, R12
+ B runtime·callbackasm1(SB)
+ MOVW $962, R12
+ B runtime·callbackasm1(SB)
+ MOVW $963, R12
+ B runtime·callbackasm1(SB)
+ MOVW $964, R12
+ B runtime·callbackasm1(SB)
+ MOVW $965, R12
+ B runtime·callbackasm1(SB)
+ MOVW $966, R12
+ B runtime·callbackasm1(SB)
+ MOVW $967, R12
+ B runtime·callbackasm1(SB)
+ MOVW $968, R12
+ B runtime·callbackasm1(SB)
+ MOVW $969, R12
+ B runtime·callbackasm1(SB)
+ MOVW $970, R12
+ B runtime·callbackasm1(SB)
+ MOVW $971, R12
+ B runtime·callbackasm1(SB)
+ MOVW $972, R12
+ B runtime·callbackasm1(SB)
+ MOVW $973, R12
+ B runtime·callbackasm1(SB)
+ MOVW $974, R12
+ B runtime·callbackasm1(SB)
+ MOVW $975, R12
+ B runtime·callbackasm1(SB)
+ MOVW $976, R12
+ B runtime·callbackasm1(SB)
+ MOVW $977, R12
+ B runtime·callbackasm1(SB)
+ MOVW $978, R12
+ B runtime·callbackasm1(SB)
+ MOVW $979, R12
+ B runtime·callbackasm1(SB)
+ MOVW $980, R12
+ B runtime·callbackasm1(SB)
+ MOVW $981, R12
+ B runtime·callbackasm1(SB)
+ MOVW $982, R12
+ B runtime·callbackasm1(SB)
+ MOVW $983, R12
+ B runtime·callbackasm1(SB)
+ MOVW $984, R12
+ B runtime·callbackasm1(SB)
+ MOVW $985, R12
+ B runtime·callbackasm1(SB)
+ MOVW $986, R12
+ B runtime·callbackasm1(SB)
+ MOVW $987, R12
+ B runtime·callbackasm1(SB)
+ MOVW $988, R12
+ B runtime·callbackasm1(SB)
+ MOVW $989, R12
+ B runtime·callbackasm1(SB)
+ MOVW $990, R12
+ B runtime·callbackasm1(SB)
+ MOVW $991, R12
+ B runtime·callbackasm1(SB)
+ MOVW $992, R12
+ B runtime·callbackasm1(SB)
+ MOVW $993, R12
+ B runtime·callbackasm1(SB)
+ MOVW $994, R12
+ B runtime·callbackasm1(SB)
+ MOVW $995, R12
+ B runtime·callbackasm1(SB)
+ MOVW $996, R12
+ B runtime·callbackasm1(SB)
+ MOVW $997, R12
+ B runtime·callbackasm1(SB)
+ MOVW $998, R12
+ B runtime·callbackasm1(SB)
+ MOVW $999, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1000, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1001, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1002, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1003, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1004, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1005, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1006, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1007, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1008, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1009, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1010, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1011, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1012, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1013, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1014, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1015, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1016, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1017, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1018, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1019, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1020, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1021, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1022, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1023, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1024, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1025, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1026, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1027, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1028, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1029, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1030, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1031, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1032, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1033, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1034, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1035, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1036, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1037, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1038, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1039, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1040, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1041, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1042, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1043, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1044, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1045, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1046, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1047, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1048, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1049, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1050, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1051, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1052, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1053, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1054, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1055, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1056, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1057, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1058, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1059, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1060, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1061, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1062, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1063, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1064, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1065, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1066, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1067, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1068, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1069, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1070, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1071, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1072, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1073, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1074, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1075, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1076, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1077, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1078, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1079, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1080, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1081, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1082, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1083, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1084, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1085, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1086, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1087, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1088, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1089, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1090, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1091, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1092, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1093, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1094, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1095, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1096, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1097, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1098, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1099, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1100, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1101, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1102, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1103, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1104, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1105, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1106, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1107, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1108, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1109, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1110, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1111, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1112, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1113, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1114, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1115, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1116, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1117, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1118, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1119, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1120, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1121, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1122, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1123, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1124, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1125, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1126, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1127, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1128, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1129, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1130, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1131, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1132, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1133, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1134, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1135, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1136, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1137, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1138, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1139, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1140, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1141, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1142, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1143, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1144, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1145, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1146, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1147, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1148, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1149, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1150, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1151, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1152, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1153, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1154, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1155, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1156, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1157, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1158, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1159, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1160, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1161, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1162, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1163, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1164, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1165, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1166, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1167, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1168, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1169, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1170, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1171, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1172, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1173, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1174, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1175, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1176, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1177, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1178, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1179, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1180, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1181, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1182, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1183, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1184, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1185, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1186, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1187, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1188, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1189, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1190, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1191, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1192, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1193, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1194, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1195, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1196, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1197, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1198, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1199, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1200, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1201, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1202, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1203, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1204, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1205, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1206, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1207, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1208, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1209, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1210, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1211, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1212, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1213, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1214, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1215, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1216, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1217, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1218, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1219, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1220, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1221, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1222, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1223, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1224, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1225, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1226, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1227, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1228, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1229, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1230, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1231, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1232, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1233, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1234, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1235, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1236, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1237, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1238, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1239, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1240, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1241, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1242, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1243, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1244, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1245, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1246, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1247, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1248, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1249, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1250, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1251, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1252, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1253, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1254, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1255, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1256, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1257, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1258, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1259, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1260, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1261, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1262, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1263, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1264, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1265, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1266, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1267, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1268, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1269, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1270, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1271, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1272, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1273, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1274, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1275, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1276, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1277, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1278, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1279, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1280, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1281, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1282, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1283, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1284, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1285, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1286, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1287, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1288, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1289, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1290, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1291, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1292, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1293, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1294, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1295, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1296, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1297, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1298, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1299, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1300, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1301, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1302, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1303, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1304, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1305, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1306, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1307, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1308, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1309, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1310, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1311, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1312, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1313, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1314, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1315, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1316, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1317, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1318, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1319, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1320, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1321, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1322, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1323, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1324, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1325, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1326, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1327, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1328, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1329, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1330, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1331, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1332, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1333, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1334, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1335, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1336, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1337, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1338, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1339, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1340, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1341, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1342, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1343, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1344, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1345, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1346, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1347, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1348, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1349, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1350, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1351, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1352, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1353, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1354, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1355, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1356, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1357, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1358, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1359, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1360, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1361, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1362, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1363, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1364, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1365, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1366, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1367, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1368, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1369, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1370, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1371, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1372, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1373, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1374, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1375, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1376, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1377, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1378, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1379, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1380, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1381, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1382, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1383, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1384, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1385, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1386, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1387, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1388, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1389, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1390, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1391, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1392, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1393, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1394, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1395, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1396, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1397, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1398, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1399, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1400, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1401, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1402, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1403, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1404, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1405, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1406, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1407, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1408, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1409, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1410, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1411, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1412, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1413, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1414, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1415, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1416, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1417, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1418, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1419, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1420, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1421, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1422, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1423, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1424, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1425, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1426, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1427, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1428, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1429, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1430, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1431, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1432, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1433, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1434, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1435, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1436, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1437, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1438, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1439, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1440, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1441, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1442, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1443, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1444, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1445, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1446, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1447, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1448, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1449, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1450, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1451, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1452, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1453, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1454, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1455, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1456, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1457, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1458, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1459, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1460, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1461, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1462, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1463, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1464, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1465, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1466, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1467, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1468, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1469, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1470, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1471, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1472, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1473, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1474, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1475, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1476, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1477, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1478, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1479, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1480, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1481, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1482, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1483, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1484, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1485, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1486, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1487, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1488, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1489, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1490, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1491, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1492, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1493, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1494, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1495, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1496, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1497, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1498, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1499, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1500, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1501, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1502, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1503, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1504, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1505, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1506, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1507, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1508, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1509, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1510, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1511, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1512, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1513, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1514, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1515, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1516, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1517, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1518, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1519, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1520, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1521, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1522, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1523, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1524, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1525, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1526, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1527, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1528, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1529, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1530, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1531, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1532, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1533, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1534, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1535, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1536, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1537, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1538, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1539, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1540, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1541, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1542, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1543, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1544, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1545, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1546, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1547, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1548, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1549, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1550, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1551, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1552, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1553, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1554, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1555, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1556, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1557, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1558, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1559, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1560, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1561, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1562, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1563, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1564, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1565, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1566, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1567, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1568, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1569, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1570, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1571, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1572, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1573, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1574, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1575, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1576, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1577, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1578, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1579, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1580, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1581, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1582, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1583, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1584, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1585, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1586, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1587, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1588, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1589, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1590, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1591, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1592, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1593, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1594, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1595, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1596, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1597, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1598, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1599, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1600, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1601, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1602, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1603, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1604, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1605, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1606, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1607, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1608, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1609, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1610, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1611, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1612, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1613, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1614, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1615, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1616, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1617, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1618, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1619, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1620, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1621, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1622, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1623, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1624, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1625, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1626, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1627, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1628, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1629, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1630, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1631, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1632, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1633, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1634, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1635, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1636, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1637, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1638, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1639, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1640, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1641, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1642, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1643, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1644, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1645, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1646, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1647, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1648, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1649, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1650, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1651, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1652, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1653, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1654, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1655, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1656, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1657, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1658, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1659, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1660, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1661, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1662, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1663, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1664, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1665, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1666, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1667, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1668, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1669, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1670, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1671, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1672, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1673, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1674, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1675, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1676, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1677, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1678, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1679, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1680, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1681, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1682, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1683, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1684, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1685, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1686, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1687, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1688, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1689, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1690, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1691, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1692, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1693, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1694, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1695, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1696, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1697, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1698, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1699, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1700, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1701, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1702, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1703, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1704, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1705, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1706, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1707, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1708, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1709, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1710, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1711, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1712, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1713, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1714, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1715, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1716, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1717, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1718, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1719, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1720, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1721, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1722, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1723, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1724, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1725, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1726, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1727, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1728, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1729, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1730, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1731, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1732, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1733, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1734, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1735, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1736, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1737, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1738, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1739, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1740, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1741, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1742, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1743, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1744, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1745, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1746, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1747, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1748, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1749, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1750, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1751, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1752, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1753, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1754, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1755, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1756, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1757, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1758, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1759, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1760, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1761, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1762, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1763, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1764, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1765, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1766, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1767, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1768, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1769, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1770, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1771, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1772, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1773, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1774, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1775, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1776, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1777, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1778, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1779, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1780, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1781, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1782, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1783, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1784, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1785, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1786, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1787, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1788, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1789, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1790, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1791, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1792, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1793, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1794, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1795, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1796, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1797, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1798, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1799, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1800, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1801, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1802, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1803, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1804, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1805, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1806, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1807, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1808, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1809, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1810, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1811, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1812, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1813, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1814, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1815, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1816, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1817, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1818, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1819, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1820, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1821, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1822, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1823, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1824, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1825, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1826, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1827, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1828, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1829, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1830, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1831, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1832, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1833, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1834, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1835, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1836, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1837, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1838, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1839, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1840, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1841, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1842, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1843, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1844, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1845, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1846, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1847, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1848, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1849, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1850, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1851, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1852, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1853, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1854, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1855, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1856, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1857, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1858, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1859, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1860, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1861, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1862, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1863, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1864, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1865, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1866, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1867, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1868, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1869, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1870, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1871, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1872, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1873, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1874, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1875, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1876, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1877, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1878, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1879, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1880, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1881, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1882, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1883, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1884, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1885, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1886, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1887, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1888, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1889, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1890, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1891, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1892, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1893, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1894, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1895, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1896, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1897, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1898, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1899, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1900, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1901, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1902, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1903, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1904, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1905, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1906, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1907, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1908, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1909, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1910, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1911, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1912, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1913, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1914, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1915, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1916, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1917, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1918, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1919, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1920, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1921, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1922, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1923, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1924, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1925, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1926, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1927, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1928, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1929, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1930, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1931, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1932, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1933, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1934, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1935, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1936, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1937, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1938, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1939, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1940, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1941, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1942, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1943, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1944, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1945, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1946, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1947, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1948, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1949, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1950, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1951, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1952, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1953, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1954, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1955, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1956, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1957, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1958, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1959, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1960, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1961, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1962, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1963, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1964, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1965, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1966, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1967, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1968, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1969, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1970, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1971, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1972, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1973, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1974, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1975, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1976, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1977, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1978, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1979, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1980, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1981, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1982, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1983, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1984, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1985, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1986, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1987, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1988, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1989, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1990, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1991, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1992, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1993, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1994, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1995, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1996, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1997, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1998, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1999, R12
+ B runtime·callbackasm1(SB)